The Generation of Chinese Resultatives in Machine Translation: Process and Difficulties

Started by wlsqfjaru, April 26, 2011, 09:17:29 AM

wlsqfjaru

1. Introduction

The idea of machine translation was put forward soon after the computer appeared in 1946, and the first machine translation experiment was carried out in 1954. Yet compared with the language information processing research and applications that came later (information retrieval, text categorization, automatic summarization, information extraction, etc.), machine translation has made the slowest progress. Scholars have devoted most of their careers to it and companies have invested substantial funds, but after fifty years of sustained research and development the results and products are often unsatisfactory. Why? From a linguistic perspective, the analysis, understanding, and generation capabilities of machine translation systems are still inadequate, and many language phenomena cannot be handled: some sentence structures are analyzed at the wrong level, some structural relations are wrong, some semantic relations between components are wrong, some word senses are misidentified, and some errors are caused by differences between the source and target languages. Below are several examples, taken from three machine translation systems, in which Chinese resultatives are handled unsuccessfully.

First, consider generation. Machine translation systems still find it hard to generate Chinese resultative (verb-complement) constructions, so resultative sentences rarely appear in their Chinese output. For the following English-to-Chinese example, none of the three systems produces a resultative translation; their outputs, glossed in English, are:

* His sweeping the floor clean.
* He sweep the floor clean.

Next, consider Chinese-to-English examples, which show how well current systems analyze and understand Chinese resultatives:

Source (gloss): He played three bad pairs of shoes.
* He plays three pairs of evil shoes.
* He kicked three pairs of shoes bad.
* Him kick spoil 3 pairs of shoes.

Source (gloss): This section of road walked my mother tired.
* The way make mother tired after the walk.
* This section of way was walked mother tired (ly).
* This road mother walk tired.

Source (gloss): You are tired of eating the leftovers.
* Everybody has fedded up with the leftovers.
* That everybody ate greasy (ly) surplus vegetable.
* All eat, is loathe to leave vegetable.

Here we discuss only the generation of resultatives, excluding the following three cases: 1) complements in a fixed relationship with the verb, such as: [...]

2. The resultative problem in machine translation

To show how machine translation handles the resultative predicate structure, we first need to look at the machine translation process. Figure 1 illustrates the principle of machine translation as well as the whole translation process.

Figure 1. The process of machine translation

Clearly, this is an idealized machine translation process. Going from S (source) to I (interlingua) and then to T (target) is the interlingua translation strategy; the interlingua is usually some kind of logical expression independent of both the source and the target language. For English-to-Chinese translation, the English is analyzed and understood from the surface down to the deep level, yielding a logical expression of the sentence's meaning; the Chinese is then generated step by step from the deep level back up to the surface. The further the analysis descends below the surface, the more complex the process of generating from that deep level back to the surface. This therefore requires in-depth study of the syntactic and semantic systems of both languages. In practice, most current machine translation systems do not reach this level; the common strategies are the direct method, the transfer method, or a hybrid of the two. We can compare the direct, transfer, and interlingua methods using an English-to-Chinese example:

You get good reception on your radio.
(from a product manual)

Translation 1 (gloss): you get a good reception on your radio.
Translation 2 (gloss): you get good reception of your radio.
Translation 3 (gloss): your radio reception is good.

In a machine translation system, the direct method can produce translation 1; the syntax-based transfer method, supplemented by some analysis of semantic relations, can produce translation 2; translation 3 is based on understanding and can only be produced by the interlingua method. Clearly, the English-to-Chinese and Chinese-to-English resultative examples in the previous section were not translated on the basis of understanding.

The resultative is structurally simple but semantically complex, and it is a highly characteristic structure of Chinese. Lü Shuxiang (1986) used it to demonstrate the flexibility of Chinese syntax. The resultative verb construction has been discussed from many angles, and it is often cited as a difficulty in teaching Chinese as a foreign language. It is likewise a problem for machine translation. In systems that translate Chinese into foreign languages, the difficulty is how to analyze and understand the resultative construction; in systems that translate foreign languages into Chinese, the difficulty is how to generate sentences with resultatives.

Here we discuss only generation. In this case the source language (e.g., English) often has no structure formally equivalent to the Chinese resultative, so it is hard to use transfer rules to link particular English structures with Chinese resultatives. Consequently, apart from isolated cases handled individually, systems using the direct method or the transfer method can hardly produce Chinese translations containing resultatives.
For a system to be able to generate resultatives, we must follow the interlingua approach: deepen the analysis so as to understand the meaning expressed by the source sentence (the conceptual meaning of each component, the semantic relations between components, the meaning of the sentence type, and so on), then select a resultative structure as that meaning requires, and then generate the surface sentence. Our current knowledge of Chinese is not sufficient to support this generation process. As a result, it is hard to find authentic resultative sentences in the Chinese output of existing machine translation systems. Instead we get translations such as:

He has made the question complicated.
MT output (gloss): He has to complicate matters.
Resultative alternative (gloss): He put out complex problems.

The children have had enough to eat.
MT output (gloss): The children have been eating enough.
Resultative alternative (gloss): The children fed.

Such machine translations are actually fairly good; they just read a little awkwardly, a little [...]

3. Generating resultatives in machine translation

The generation task in machine translation starts from the meaning to be expressed and proceeds in sub-steps: selecting words, determining the semantic relations between words, and determining the syntactic structure of the target sentence, finally outputting a surface string equivalent in meaning to the source sentence. Generating a resultative involves the following steps:

(a) determine the intended meaning
(b) select words and assign semantic roles
(c) check legality
(d) integrate the semantic structure
(e) select the syntactic form
(f) process the surface layer

For example, suppose the meaning to be generated is: Wang read the article, and as a result Wang understood the article. Before Chinese generation begins, the machine translation system has represented this meaning with a logical expression in the interlingua.
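The six steps above can be pictured as a pipeline. The following is a minimal Python sketch, assuming each step is a function that takes a working state (a plain dict) and returns an updated state, or None when it cannot reach a definite conclusion; in that case generation of the resultative is suspended, as the control rule in 3.7 requires. The step names, the dict-based state, and `run_generation` are illustrative assumptions, not the design of any particular system.

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical step type: takes the working state and returns a new state,
# or None when the step cannot reach a definite conclusion.
Step = Callable[[dict], Optional[dict]]

STEP_NAMES = [
    "determine meaning",           # (a)
    "select words, assign roles",  # (b)
    "check legality",              # (c)
    "integrate semantics",         # (d)
    "select syntactic form",       # (e)
    "process surface layer",       # (f)
]


def run_generation(meaning: dict, steps: list[tuple[str, Step]]) -> tuple[Optional[dict], list[str]]:
    """Run the steps in order; return (None, completed_steps) as soon as one is inconclusive."""
    state = dict(meaning)
    completed: list[str] = []
    for name, step in steps:
        result = step(state)
        if result is None:
            # Control rule: suspend resultative generation instead of guessing.
            return None, completed
        state = result
        completed.append(name)
    return state, completed


if __name__ == "__main__":
    # Toy steps that only record that they ran; the legality check "fails".
    def ok(label: str) -> Step:
        return lambda s: {**s, label: True}

    demo = [(n, ok(n)) for n in STEP_NAMES]
    demo[2] = ("check legality", lambda s: None)
    state, done = run_generation({"meaning": "Wang read the article; Wang understood it"}, demo)
    print(state, done)  # None ['determine meaning', 'select words, assign roles']
```

Returning the list of completed steps alongside the state makes it easy to see where generation was suspended, which is also where a system would fall back to a non-resultative translation.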
Generally speaking, if this expression contains two predicate structures, and between them [...]. The interlingua is a logical expression obtained by analyzing the source language (e.g., English); [whether] the source language contains such predication structures and [...] is not the only basis for deciding whether to generate a Chinese resultative. We noted in the previous section that English often has no structure formally equivalent to the Chinese resultative, so it is hard to use transfer rules to link particular English structures with Chinese resultatives; that point concerns syntactic structure. In fact, [a pattern that is] dominant in Chinese occupies a minor or even marginal position in English. Moreover, the two languages differ fundamentally in how they construe objective situations, because conceptualized experience takes different forms in each. [...] From the semantic expression obtained by analyzing the source language, a Chinese resultative can be generated. For example, what English describes as the object of an action can sometimes be expressed in Chinese as the result of the action:

English: She married the wrong person. / Chinese (gloss): She married the wrong man.
English: He entered the wrong door. / Chinese (gloss): He entered the wrong door.

What a Chinese result complement indicates sometimes corresponds in English to the state or degree of the action:

Chinese (gloss): watch TV for a long time / English: to watch TV for a very long time
Chinese (gloss): I am learning English late. / English: It was very late when I started to learn English.

[...] and in English these tend to be expressed by other components:

Chinese (gloss): In that terrible blizzard a lot of people freezing to death. / English: Many people froze to death in the terrible snowstorm.
Chinese (gloss): Sofa you sit lazy. / English: You are becoming lazy on the sofa.

We therefore need a set of rules for deciding, from the semantic expression to be generated, whether a Chinese resultative should be used to express its predication relations. In this set of rules, in addition to the two predicate structures and the [...] between them, [...].
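As a minimal sketch of this starting point, the intermediary expression for the Wang example can be encoded as two predications plus a causal link, and a first filter can require exactly two predications joined by a cause-result relation. The `Predication` and `Expression` classes, the `"result"` link label, and the use of the role names from Figure 2 are hypothetical choices for illustration. As the text stresses, such a filter is at most a necessary condition; the language-specific rules that would actually decide the matter are still unknown.

```python
from dataclasses import dataclass, field


@dataclass
class Predication:
    pred: str                                 # predicate concept, e.g. "read"
    args: dict = field(default_factory=dict)  # role -> entity, e.g. {"Agen": "Wang"}


@dataclass
class Expression:
    predications: list
    links: set = field(default_factory=set)   # (cause_index, label, effect_index)


def resultative_candidate(expr: Expression) -> bool:
    """Necessary-condition filter: exactly two predications, the second the result of the first."""
    return len(expr.predications) == 2 and (0, "result", 1) in expr.links


# "Wang read the article; as a result Wang understood the article."
wang = Expression(
    predications=[
        Predication("read", {"Agen": "Wang", "Pati": "article"}),
        Predication("understand", {"Expe": "Wang", "Cont": "article"}),
    ],
    links={(0, "result", 1)},
)
print(resultative_candidate(wang))  # True
```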
This requires studying how English and Chinese differ in expressing [...]. Such differences sometimes show up across a whole category and sometimes are idiosyncratic, tied only to specific words. Current machine translation systems have yet to find such rules; for the time being we can only [...]

3.2 Word choice

Word choice requires a Chinese dictionary for information processing, which tells us what the words are, what they mean, and how they are used (for example, their predicate-argument structures and valence constraints). For the previous example, [...] must be selected from the dictionary; in the interlingua logical expression these words are entities and predicates. The result of word choice and semantic role assignment can be represented as a tree or as a set of features.

Figure 2. Results of word choice and semantic role assignment

Here Agen denotes the agent, Pred the predicate, Pati the patient, Cont the content, and Expe the experiencer.

Even with a detailed dictionary, getting the machine to choose the right word for a given meaning is not easy. We often need to choose among several near-synonyms. For example, by what rule is [...] chosen in Chinese? Current research on Chinese vocabulary and semantics cannot yet answer this question formally, so machine translation systems have to judge the priority relations among words themselves. One approach is to use a semantic dictionary that describes word concepts together with a statistical language model based on semantic similarity, letting the computer learn to represent and compare the priority relations among words. However, this engineering approach cannot bypass Chinese linguistic research, because how well a statistical language model performs depends largely on what linguistic knowledge it uses as parameters.
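The similarity-based approach mentioned above can be sketched in a few lines. Assuming each candidate word has a concept feature vector (here hand-made dicts standing in for entries of a semantic dictionary), the system picks the candidate whose vector is closest, by cosine similarity, to the concept expressed in the interlingua. The feature names, weights, and candidate labels `word_a` and `word_b` are invented for illustration only; a real system would learn such weights from data.

```python
import math


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse feature vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def choose_word(target: dict, candidates: dict) -> str:
    """Return the candidate word whose concept vector is most similar to the target concept."""
    return max(candidates, key=lambda word: cosine(target, candidates[word]))


# Hypothetical concept vectors for two near-synonymous candidates.
target = {"perceive": 1.0, "visual": 1.0}
candidates = {
    "word_a": {"perceive": 1.0, "visual": 1.0},
    "word_b": {"perceive": 1.0, "visual": 0.5, "text": 1.0},
}
print(choose_word(target, candidates))  # word_a
```

The ranking is only as good as the features: the point made in the text, that the model's quality depends on the linguistic knowledge supplied as parameters, shows up here as the choice of feature names and weights.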
3.3 Checking legality

Through word choice and role assignment we have obtained the two-predicate structure shown in Figure 2. The task of this step is to determine whether the structure and relation of these two predicates can be expressed with a resultative predicate structure. Specifically, [...]. The legality question is therefore: which verbs can combine with which adjectives (or verbs) to form resultatives that conform to Chinese usage?

One option is to give the machine translation system a word table that lists resultative phrases and [...]. For example, for the verb [...], such a dictionary is written for human readers; for use in machine translation it would also need to include more examples, such as [...]. In fact, a word table is suitable only for small-scale experimental translation systems. The resultative predicate is a free structure, formed on the fly as needed in speech, so its instances are too numerous to list.

We might think that [...], but people often say [...]. Under what conditions should these complements be generated? We discuss this further in Section 4.

If this step determines that the resultative to be generated is not legal, the system must go back and re-select words, repeating until no remaining words satisfy the intended semantics.

3.4 Semantic integration

As a predication structure in its own right, the resultative has its own semantically dominant components, including argument components and additional components; we refer to these relations as its semantic relations and semantic structure. In this step we take the semantic structures of the verb and the complement and determine the overall semantic structure of the resultative, chiefly its valence structure (the number of valence positions and their nature).

The valence structure of the resultative is related to the valence structures of its components (the verb and the complement), but it is not simply their sum.
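The word-table approach and the backtracking rule of 3.3 can be combined in a short sketch. Assume the candidate words from 3.2 arrive ranked by preference and that a small table of attested verb-complement pairs is available; the search re-selects words until a legal pair is found, and returns None (no resultative is generated) once no candidates remain. The labels `v1`, `v2`, `c1`, `c2` and the function `first_legal_pair` are hypothetical, and, as the text notes, a finite table cannot cover a free, productive structure.

```python
from itertools import product
from typing import Iterable, Optional, Set, Tuple


def first_legal_pair(
    verbs: Iterable[str],
    complements: Iterable[str],
    table: Set[Tuple[str, str]],
) -> Optional[Tuple[str, str]]:
    """Try (verb, complement) pairs in preference order; return the first attested one."""
    for verb, comp in product(verbs, complements):
        if (verb, comp) in table:
            return verb, comp
    # No candidate pair is attested: give up on generating a resultative.
    return None


# Hypothetical word table of attested resultative pairs.
TABLE = {("v2", "c1")}

print(first_legal_pair(["v1", "v2"], ["c1", "c2"], TABLE))  # ('v2', 'c1')
print(first_legal_pair(["v1"], ["c1", "c2"], TABLE))        # None
```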
Is there a correspondence between the valence of the resultative and the valence of its components? How do the valence structures of the verb and the complement yield the valence structure of the resultative? Yuan Yulin (2001), Guo Rui (1995), and Wang Hongqi (1995) have all studied this question, explaining its causes and at the same time looking for rules that control how the arguments of the component predicates are selected for the resultative. Applying these rules within a certain range, we can project the valence structure of the resultative from the valence structures of the verb and the complement. This covers the number of valence positions (whether the resultative is one-place, two-place, or three-place) and their nature (which semantic roles the arguments in the resultative's predicate-argument structure play, divided mainly into the subject argument and the object argument). For the previous sentence, we obtain [...]. The number and nature of the valence positions are the main basis for selecting the syntactic structure in the next step.

In generating resultatives, the rules that Chinese grammarians propose for integrating arguments directly affect the generation algorithm, yet such conclusions are still rare in current Chinese grammar research. Machine translation attaches great importance to this set of rules and looks forward to its further study and refinement. (Note: for example, the argument-integration rules proposed by Yuan Yulin (2001) are effective and operational when the number of arguments stays the same; rules exist for cases where the number increases, but they are not easy for a computer to apply, and no effective approach has yet been proposed for cases where it decreases.)

3.5 Selecting the syntactic form

This step chooses which syntactic means to use to express the semantic structure of the resultative.
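To make concrete the point that a resultative's valence is not the sum of its parts, here is a deliberately naive Python sketch: it merges the argument slots of V and C that refer to the same participant and counts the distinct participants. It assumes coreference between slots is already known from the interlingua, and `integrate_valence` is a hypothetical helper; it does not implement the rules of Yuan Yulin (2001), Guo Rui (1995), or Wang Hongqi (1995), which also deal with argument increase and reduction.

```python
def integrate_valence(verb_args: dict, comp_args: dict) -> tuple:
    """Merge V and C argument slots by referent; return (valence, slots per referent).

    verb_args and comp_args map a role label to the participant that fills it.
    """
    merged: dict = {}
    for pred, args in (("V", verb_args), ("C", comp_args)):
        for role, referent in args.items():
            merged.setdefault(referent, []).append(f"{pred}.{role}")
    return len(merged), merged


# "Wang read the article; as a result Wang understood the article."
valence, slots = integrate_valence(
    {"Agen": "Wang", "Pati": "article"},
    {"Expe": "Wang", "Cont": "article"},
)
print(valence)  # 2  (four slots, but only two participants)
print(slots)    # {'Wang': ['V.Agen', 'C.Expe'], 'article': ['V.Pati', 'C.Cont']}
```

The merged slot lists also record which role each participant plays in V and in C, which is the kind of information the control conditions in 5.2 consult.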
The resultative has many surface structures; Li Linding (1986, pp. 181-204) summarized five sentence patterns:

(1) N1 + V + C, e.g. (gloss) mother [...] cried
(2) N1 + V + N2 + V + C, e.g. (gloss) walk [...] tired him
(3) N1 + V + C + N2, e.g. (gloss) I lit the oil lamp
(4) N1 + V + N2 + V + C + N3, e.g. (gloss) he pounded the table and hurt his hand
(5) N1 + 把 + N2 + V + C + N3, e.g. (gloss) the fire burned several holes in his clothes

The first four patterns also have four possible surface transformations (de [...]). Which sentence pattern should ultimately be generated? This is a question of finding the correspondence between semantic structure and syntactic structure, which we discuss further in Section 5.

3.6 Surface-layer processing

Once the words, the surface structure, and the word order of the sentence have been selected, what remains is to express certain semantic categories by syntactic or lexical means, for example aspect, negation, reference, definiteness, and quantity. Then the final result is output. For our example, this is the sentence [...].

In generating Chinese resultatives, many surface-level problems must be dealt with in the sentence, and each is complex, such as aspect components, word order, and negation components; they need dedicated study.

3.7 Control

Note that the steps above are not all carried out unconditionally in sequence. When a step cannot reach a definite conclusion, the generation of the resultative should be suspended.

4. Combining verbs with result complements

The combination of a verb and a result complement should be based on semantics. Setting aside word-specific constraints, to determine which verbs can combine with which adjectives (or verbs) into resultatives that conform to Chinese usage, we need to start from the semantic types of these combinations and formulate rules.
Obviously, this is not something machine translation can accomplish on its own. We can start with a limited case: monosyllabic verbs with monosyllabic adjectives or verbs as complements. Verb [...]. In the [...] (Yu Shiwen, 1998, hereafter [...]), [...] which adjectives and verbs can serve as result complements of the verb. (Ma Zhen (1997) lists 153 monosyllabic adjectives that can serve as result complements.) Verbs: 112. After testing them one by one, 54 adjectives and 30 verbs were found to [...]. So from [...] we then followed HowNet (Dong Zhendong, 2001), which describes the relations between concepts and between the attributes of concepts, including hypernym-hyponym, synonymy, antonymy, converse, part-whole, material-product, attribute-host, attribute value-attribute, time, and role relations. Using its definitions, we marked the semantic category of each complement instance and then classified and sorted them, obtaining [...]. Classes A, E, and F are adjectival complements, and classes B, C, and D are verbal complements. They are presented below (in brackets is the [...]; the examples are English glosses of "learn" + complement):

A. Objective attributes (intelligence, behavior, age, appearance, character, economic status)
A1. [intelligence] learn + stupid, blacked out, crazy, wooden, chastened, silly, [...]
A2. [manner] learn + crafty, crooked, thievish, floating, oily, stubborn, hard, colloquial, dead, skin, mixed, acid, [...]
A3. [age] learn + old
A4. [economic status] learn + rich, poorer
A5. [character] learn + poor, wasted, black, bad
A6. [appearance] learn + pretty (qiao)

B. Subjective feelings (attitude, perception)
B1. [attitude] learn + enough, troubled, used to, annoyed, infatuated, tired of, afraid
B2. [perception] learn + [...], know how, through, tired, forgotten

C. State and behavior (state, action)
C1. [state] learn + sick, crazy, collapsed, deficit, too tired, dead, lame, paralyzed, dumb, dizzy
C2. [action] learn + cry, run, go, scattered

D. State of things (state)
learn + lost, gone

E. Characteristics (properties)
learn + reversed, biased, lively, deep, shallow, narrow, complicated, [...]

F. Characteristics of the event (properties)
learn + finished, late, wrong, heavy, more, long, complete, miscellaneous, less, thorough, early, late

These examples show that the combinations follow certain regularities. Can these regularities yield a computer-operable method for deciding whether resultatives within a certain vocabulary range are legal? For example, [...]. The computer can confirm its legality according to the HowNet definition [... intellectual / spiritual ...] (similarly with [...]).

Semantically, in the dynamic combination of a verb and a result complement, the conceptual meaning of the result complement is a conceptual role constrained by the meaning of the verb. Looking at [...] and [...], we see that similar verbs may therefore take similar result complements.
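One computer-operable reading of the question raised above is sketched below, under loudly stated assumptions: each complement is mapped to a semantic category (a hypothetical lookup standing in for a HowNet definition), the categories licensed for a verb are collected from its attested complements, and a new complement is accepted if its category is among them. The pinyin entries, the category labels, and both function names are illustrative only; as the text goes on to argue, conceptual meaning alone does not fully decide legality, so a real system would need further conditions.

```python
from typing import Dict, Iterable, Optional


def licensed_categories(attested: Iterable[str], category_of: Dict[str, str]) -> set:
    """Semantic categories observed among a verb's attested result complements."""
    return {category_of[c] for c in attested if c in category_of}


def judge_complement(
    complement: str, attested: Iterable[str], category_of: Dict[str, str]
) -> Optional[bool]:
    """True/False if the complement's category is (not) licensed; None if it is uncategorized."""
    category = category_of.get(complement)
    if category is None:
        return None  # cannot decide; per 3.7, do not force a resultative
    return category in licensed_categories(attested, category_of)


# Hypothetical data for the verb xue "learn".
CATEGORY_OF = {"sha": "intelligence", "ben": "intelligence", "lei": "state", "gao": "height"}
ATTESTED = ["sha", "lei"]

print(judge_complement("ben", ATTESTED, CATEGORY_OF))  # True
print(judge_complement("gao", ATTESTED, CATEGORY_OF))  # False
print(judge_complement("xyz", ATTESTED, CATEGORY_OF))  # None
```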
In fact, the [...] (Note: although the semantics of each may point to different things, such as [...]) [...] economic status), subjective feelings (attitude, perception), and state and behavior; or the effects that the related cognitive behavior has on things: changes in the state or nature of the object involved (length, width, depth, quantity, correctness, beauty and ugliness); or characteristics of the cognitive behavior itself (measure, frequency, degree). Different [...] even with the concept of [...].

Even rules based on word meaning cannot completely solve the problem. In the [...], in addition to [...], and [...]. This shows that the factors governing the combination of a verb with a result complement are not limited to the conceptual meaning of the words. What other factors, then, can serve as conditions for a legal resultative? How can these conditions be turned into rules that a computer can operate? These questions remain open.

5. From semantic structure to syntactic form

In the whole process of generating resultatives, choosing which syntactic means to use to express the semantic structure is a complex step. It is often said that in Chinese the link between syntactic structure and semantic structure is loose, or that the coordination between syntactic and semantic components is very flexible: one structure often has several meanings, and one semantic content can be expressed by several structures. This poses enormous difficulties for Chinese generation in machine translation. For choosing the surface syntactic structure of a resultative, the conditions we can present so far are very limited, so the patterns that can be generated are also very limited.

5.1 Limiting the generation target

The valence structure of the resultative, obtained through semantic integration, helps us choose a sentence pattern according to the number of valence positions.
If the resultative is one-place, we choose a sentence pattern with one nominal component; if it is two-place, we choose a pattern with two nominal components. Which of the several surface structures in the same category to choose requires more detailed conditions and rules. Below we discuss how the semantic relations among the components of the resultative serve as selection conditions.

To simplify the discussion, we consider only two-place resultatives (the [...]). The scope of discussion is thus limited to surface structures containing only two nominal components. Professor Lü Shuxiang (1986) divided resultative verb constructions into 15 categories according to the semantic relation between the complement and the subject or object (omitted [...], including patient and content).

Taken together, the above shows that the six semantic structures of two-place resultatives can be expressed by the following surface structures, within which the discussion proceeds:

Surface structure 1: S + V + C + O
Surface structure 2: S + V + C [...]
Surface structure 3: [...]
Surface structure 4: O + [...]

Of these, only surface structure 1 (S + V + C + O) corresponds to all six semantic relations, so it could serve as the preferred surface form for generating resultatives. On closer observation, however, surface structure 1 turns out to be restricted in expressing some semantic structures. For example, for semantic structure II we can say [...] but cannot say [...]. Which of these can and cannot be said ought to be controllable; the semantic relation between O and C may be one control condition, but no precise rule is available.

In this situation, surface structure 1 has to be avoided, and surface structures 2 and 3 are chosen instead.
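The valence-based first cut of 5.1 amounts to counting nominal slots. A minimal sketch, assuming the five patterns of Li Linding (1986) from 3.5 are encoded as token lists in which nominal slots start with "N" (the encoding, the placeholder token `MARK`, and `patterns_for_valence` are illustrative assumptions): a one-place resultative keeps the patterns with one nominal, a two-place one the patterns with two. Choosing among the survivors still needs the finer conditions discussed in the text.

```python
# The five patterns from 3.5 as token lists; nominal slots start with "N".
PATTERNS = {
    1: ["N1", "V", "C"],
    2: ["N1", "V", "N2", "V", "C"],
    3: ["N1", "V", "C", "N2"],
    4: ["N1", "V", "N2", "V", "C", "N3"],
    5: ["N1", "MARK", "N2", "V", "C", "N3"],  # MARK: the non-nominal marker in pattern (5)
}


def patterns_for_valence(valence: int) -> list:
    """Keep the patterns whose number of nominal slots equals the resultative's valence."""
    return [
        pid
        for pid, tokens in PATTERNS.items()
        if sum(tok.startswith("N") for tok in tokens) == valence
    ]


print(patterns_for_valence(1))  # [1]
print(patterns_for_valence(2))  # [2, 3]
```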
After a preliminary study, we found that, in terms of expressing the propositional meaning, surface structure 3 expresses semantic structure II, and surface structure 2 expresses semantic structures I, III, IV, V, and VI, with the fewest restrictions. The scope is thus narrowed to two surface structures, and the question becomes: how do we find conditions that control which of these two syntactic forms expresses each of the six semantic structures? We call this approach of gradually narrowing the scope of the problem "limiting the generation target". It is in fact a compromise for a complex problem: for two-place resultatives, we give up generating every possible surface sentence pattern and instead look for conditions that are as simple and effective as possible, so that each semantic structure first has at least one surface expression.

5.2 Control conditions for generating surface structures

The main condition controlling the generation of a surface structure from a semantic structure is the semantic relation among the components of the resultative. For a two-place resultative with components S, O, V, and C, if S is the subject argument of V, then:

(a) If any one of the following five conditions holds among S, O, V, and C, surface structure 2 can be used:
1. (O is the object argument of V) and (O is the subject argument of C) and (C is one-place)
2. (S is the subject argument of C) and (O is the object argument of V) and (O is the object argument of C)
3. (O is the subject argument of C) and (V and C are both one-place)
4. (S is the subject argument of C) and (O is the object argument of C) and (V is one-place)
5. (O is the object argument of V) and (C is a modifier of V)

(b) If condition 6 holds among S, O, V, and C, surface structure 3 can be used:
6. (O is the object argument of V) and (S is the subject argument of C) and (C is one-place)

In surface structures 2 and 3, the roles of components other than [...], the resultative and the [...].
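The control conditions of 5.2 are concrete enough to encode directly. The sketch below assumes the semantic relations are given as a set of (argument, role, predicate) triples such as `("O", "obj", "V")`, together with the valences of V and C and a flag for whether C modifies V. It checks conditions 1-5 before condition 6; the text does not say how overlaps are resolved, so that ordering, like the encoding and the function name `choose_surface_structure`, is an assumption. It returns 2 or 3 for the surface structure, or None when no condition applies (or S is not the subject argument of V), in which case generation would be suspended.

```python
from typing import Dict, Optional, Set, Tuple

Relation = Tuple[str, str, str]  # (argument, role, predicate), e.g. ("O", "obj", "V")


def choose_surface_structure(
    rel: Set[Relation], valence: Dict[str, int], c_modifies_v: bool = False
) -> Optional[int]:
    """Apply control conditions 1-6 of 5.2 to a two-place resultative S, O, V, C."""

    def has(arg: str, role: str, pred: str) -> bool:
        return (arg, role, pred) in rel

    if not has("S", "subj", "V"):
        return None  # the conditions presuppose that S is the subject argument of V

    conditions_for_2 = [
        has("O", "obj", "V") and has("O", "subj", "C") and valence["C"] == 1,      # 1
        has("S", "subj", "C") and has("O", "obj", "V") and has("O", "obj", "C"),  # 2
        has("O", "subj", "C") and valence["V"] == 1 and valence["C"] == 1,        # 3
        has("S", "subj", "C") and has("O", "obj", "C") and valence["V"] == 1,     # 4
        has("O", "obj", "V") and c_modifies_v,                                     # 5
    ]
    if any(conditions_for_2):
        return 2
    if has("O", "obj", "V") and has("S", "subj", "C") and valence["C"] == 1:  # 6
        return 3
    return None


# S -> V <- O & S -> C <- O, with both V and C two-place.
wang = {("S", "subj", "V"), ("O", "obj", "V"), ("S", "subj", "C"), ("O", "obj", "C")}
print(choose_surface_structure(wang, {"V": 2, "C": 2}))  # 2
```

The relation set in the usage line corresponds to the "Wang read the article and understood it" example analyzed in the text, which satisfies condition 2.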
(Note: in surface structure 2 of the resultative, the object [of ...] is O; in surface structure 3, it is S.) This reflects [the fact] that [...] step (surface-layer processing) provides adaptability in expression, which can often cover some of the controlling factors other than semantic relations. For example, when the complement is a disyllabic [...], surface structure S + V + C + O generally should not be chosen, but [...] words impose no such limit. Again, [...] the use of an appropriate [...] number.

Take the sentence (gloss) "Wang came to understand this article", in which the semantic relations among the components are:

S → V ← O & S → C ← O

This satisfies control condition 2 (S is the subject argument of both V and C, and O is the object argument of both V and C). So the surface syntactic form chosen is surface structure 2: S + [...] read the article [...].

6. Concluding remarks

Dai Haoyi (2002), discussing the philosophical basis of Chinese grammar, pointed out that grammatical phenomena are the result of systematic conceptualization. He proposed a macro process leading from conceptual structure to the syntactic and lexical forms of Chinese, in which principles of Chinese conceptualization play a role. Arguably, the machine translation process of generating resultatives from interlingua logical expressions is one example of such a process: [it requires] finding, from the formation mechanism, the conditions under which resultatives are composed. Over the years, Chinese grammar research has mostly emphasized the structural level, structural type, syntactic function, semantic relations, valence structure, and the characteristics and nature of the components of the resultative predicate structure, and has discussed many issues; however, most of its conclusions are oriented toward human readers and are of little use for Chinese information processing and machine translation.
For example, on how the resultative was formed, most scholars take a diachronic view and hold that it originated in a [...] usage of ancient Chinese. What machine translation needs, however, is to find, from a synchronic point of view, the control conditions for forming resultatives, and to tell the computer under what circumstances which verbs can combine with which adjectives (or verbs), and how to form resultative predicate structures that conform to Chinese usage. Further questions follow: How should the right words be chosen for the meaning of the sentence to be generated? How can the valence structure of the resultative be projected from the valence structures of the verb and the complement? How should the syntactic form of the resultative be selected from its semantic structure? And how should aspect components, negation components, reference relations, and quantity relations be handled in the surface sentence? Until effective rules are found for these problems, using a machine translation system to generate some Chinese resultatives remains only an expedient measure.

While emphasizing the role of rules, we also see that in recent years statistical language models have been increasingly used in language engineering, and example-based and statistical machine translation systems have appeared. Given our limited understanding of the mechanisms of human language and translation, and in the absence of a language theory applicable to information processing, statistical language models may play some role by relying on [data that] reflect the structural rules of language and the regularities of the cognitive process of verbal expression. In fact, this point has yet to be proved. Moreover, statistical language models must be founded on linguistic knowledge: how well a language model performs depends largely on what linguistic knowledge we can provide it as parameters. The computational task is to obtain the parameters, and the statistical regularities captured by the parameters are the foundation on which the model is built.
At present, statistical models perform poorly in machine translation systems. The main reason should not be the limited computing power of the models themselves, but rather that too little knowledge relevant to machine translation is provided to them. Chinese grammar research has not uncovered enough of this kind of knowledge, or has not yet structured and organized it systematically. This is also a serious problem for rule-based methods.

From the point of view of machine translation system design, generating Chinese resultatives involves two kinds of problems. One is the linguistic knowledge on which generation depends, which is closely tied to research on Chinese; the other is how to implement the generation process, which is a matter of formalization and algorithm design. What we have discussed concerns only the first: by describing the resultative generation process, we aimed to see what linguistic knowledge it requires. In fact, our account of the generation process is still rough, and the questions raised are only a small part of the whole. Bai Shuo (1996) has said that, after native speakers and non-native speakers, the computer brings a new frame of reference for language, one that helps bring to light linguistic phenomena and regularities that are hard to reveal under the old frames of reference. I hope the issues raised here can also serve as such a [...]
