Creation of a Russian-English Translation Program
Karen Shiells
Purpose
- Object-oriented approach
- Interactive machine translation
- Designed as an aid to human translators, not for independent translation
- Explore the algorithms used in machine translation
- Identify grammatical obstacles to translation
- Create a base that can be expanded later
Scope of Study
- Machine translation is, and will remain, imperfect
- Modern machine translation uses statistical methods
- This project is limited to:
  - Separating base words from their morphological endings
  - Constructing syntax trees from the source text
  - Generating simple English output from the tree
  - Identifying words already known to the program
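Separating base words from their endings can be sketched as a longest-match search against a table of known endings. The ending table below is a tiny, hypothetical sample in Latin transliteration, not the program's actual data. Because one ending can express several grammatical functions, the sketch returns every candidate split. A later dictionary lookup of the stem would discard impossible splits such as stolom with a zero ending.

```python
# Hypothetical sample of noun endings (transliterated) with the case/number
# readings each one may express. A real table would be far larger.
ENDINGS = {
    "ami": ["instr pl"],
    "ov": ["gen pl"],
    "om": ["instr sg"],
    "a": ["gen sg (masc)", "nom sg (fem)"],
    "u": ["dat sg (masc)", "acc sg (fem)"],
    "y": ["nom pl", "gen sg (fem)"],
    "": ["nom sg (masc)", "acc sg (masc, inanimate)"],
}

def split_word(word, min_stem=2):
    """Return every (stem, ending, readings) split whose ending is in ENDINGS,
    longest ending first, keeping at least min_stem letters in the stem."""
    splits = []
    for ending in sorted(ENDINGS, key=len, reverse=True):
        stem_len = len(word) - len(ending)
        if stem_len >= min_stem and word.endswith(ending):
            splits.append((word[:stem_len], ending, ENDINGS[ending]))
    return splits

print(split_word("stolom"))
# [('stol', 'om', ['instr sg']), ('stolom', '', ['nom sg (masc)', 'acc sg (masc, inanimate)'])]
```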
Other Research
Part-of-speech tagging:
- Uses probability to identify parts of speech
- Applied to unknown words and structures
- Uses complex labeling systems that go beyond the conventional parts of speech
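The probabilistic idea can be illustrated with a very small suffix-based guesser. It counts how often each word-final suffix occurs with each tag in a tagged sample, then gives an unknown word the most probable tag for its longest known suffix. This is a deliberately simplified stand-in: a real tagger, such as a hidden Markov model, also uses the surrounding tags. The training pairs are invented transliterated examples.

```python
from collections import Counter, defaultdict

def train_suffix_model(tagged_words, max_suffix=3):
    """Count (suffix -> tag) occurrences for suffixes of length 1..max_suffix."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        for n in range(1, min(max_suffix, len(word)) + 1):
            counts[word[-n:]][tag] += 1
    return counts

def guess_tag(word, counts, max_suffix=3, default="NOUN"):
    """Return (tag, relative frequency) for the longest suffix seen in training."""
    for n in range(min(max_suffix, len(word)), 0, -1):
        tag_counts = counts.get(word[-n:])   # .get does not create empty entries
        if tag_counts:
            tag, freq = tag_counts.most_common(1)[0]
            return tag, freq / sum(tag_counts.values())
    return default, 0.0

# Invented sample: transliterated words with hand-assigned tags.
SAMPLE = [("chitaet", "VERB"), ("dumaet", "VERB"), ("znaet", "VERB"),
          ("novyj", "ADJ"), ("krasnyj", "ADJ"), ("stol", "NOUN"),
          ("dom", "NOUN"), ("kniga", "NOUN")]

model = train_suffix_model(SAMPLE)
print(guess_tag("rabotaet", model))   # ('VERB', 1.0)
```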
Translation algorithms:
- Massive dictionaries store words and information about them
- Aided by verb categorization
- Unknown words are omitted, and the rest of the sentence is translated without them
- Output is usually comprehensible but requires human revision
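The strategy of omitting unknown words can be shown with a word-by-word lookup. It skips anything missing from the dictionary but records it, so that a human reviser knows what was dropped. The tiny dictionary and the already-lemmatized input are hypothetical.

```python
def translate_known(lemmas, dictionary):
    """Translate each lemma found in `dictionary`; collect the rest for review."""
    output, unknown = [], []
    for lemma in lemmas:
        if lemma in dictionary:
            output.append(dictionary[lemma])
        else:
            unknown.append(lemma)   # omitted from the output, flagged for a human
    return " ".join(output), unknown

DICTIONARY = {"mal'chik": "boy", "chitat'": "read", "kniga": "book"}
print(translate_known(["mal'chik", "chitat'", "interesnyj", "kniga"], DICTIONARY))
# ('boy read book', ['interesnyj'])
```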
Old Methods
Direct Translation:
- The first method developed
- Rearranges sentences without parsing them
- Based on rules of transfer written for a specific language pair
Interlingua:
- Dates from the era of interest in international languages
- Uses a single representation as an intermediary between languages
- The intermediary is usually a constructed language
- Makes it easier to add language pairs
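The claim that interlingua makes language pairs easier to add comes down to counting components. Suppose there are n languages and translation must work in every direction. A pair-specific design needs a separate system for each ordered pair, n(n-1) in all. An interlingua design needs only one analyzer (into the intermediary) and one generator (out of it) per language, 2n in all. The function below simply evaluates that standard comparison.

```python
def systems_needed(n_languages):
    """Components needed to translate among n languages in every direction."""
    pairwise = n_languages * (n_languages - 1)   # one system per ordered pair
    interlingua = 2 * n_languages                # one analyzer + one generator each
    return pairwise, interlingua

for n in (2, 3, 5, 10):
    print(n, systems_needed(n))
# 2 (2, 4)
# 3 (6, 6)
# 5 (20, 10)
# 10 (90, 20)
```

For two languages the pairwise design is actually cheaper. From three languages on, the interlingua wins: adding an eleventh language to ten costs 20 new pairwise systems but only 2 new interlingua components.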
Syntactic Transfer
- Similar to interlingua
- Generates a syntax tree using a parser specific to the source language
- Rearranges the tree to fit the target structure
- Uses a generation method specific to the target language to form the output
- The entire algorithm is specific to one language pair
- Produces the best-quality translations
- Relatively new
- Not as common in commercial software
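The rearrangement and generation stages of syntactic transfer can be sketched on one construction that differs between the languages. A Russian noun followed by a genitive possessor, such as kniga brata (literally "book of-brother"), is usually rendered in English with the possessor first: "brother's book". The tree encoding, the single transfer rule, and the lexicon are assumptions made for illustration. Analysis is skipped (the source tree is written by hand), and article insertion is not handled.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                 # e.g. "NP", "N", "GEN", "POSS"
    word: str = ""             # filled only for leaves
    children: list = field(default_factory=list)

LEXICON = {"kniga": "book", "brat": "brother"}   # hypothetical: lemma -> English

def transfer(node):
    """Translate leaves and rearrange a Russian NP [N, GEN] into English [POSS, N]."""
    kids = [transfer(c) for c in node.children]
    if node.label == "NP" and [k.label for k in kids] == ["N", "GEN"]:
        head, possessor = kids
        kids = [Node("POSS", possessor.word), head]
    return Node(node.label, LEXICON.get(node.word, node.word), kids)

def generate(node):
    """Read the English string off the leaves, adding the possessive marker."""
    if not node.children:
        return node.word + "'s" if node.label == "POSS" else node.word
    return " ".join(generate(c) for c in node.children)

# Hand-built analysis of "kniga brata" (brata = genitive of brat).
source = Node("NP", children=[Node("N", "kniga"), Node("GEN", "brat")])
print(generate(transfer(source)))   # brother's book
```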
Alternative Structures
Valency:
- Stores the number of complements each word takes
- Does not specify the type of the complements
- Occupies less space in the dictionary
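A valency entry is just a count, which is why it is compact. In the sketch below (hypothetical entries), the dictionary stores only how many complements a word takes. A check can therefore confirm that a clause has the right number of complements, but it cannot tell a direct object from any other complement, because valency leaves that information out.

```python
# Hypothetical valency dictionary: lemma -> number of complements.
VALENCY = {"spat'": 0, "chitat'": 1, "davat'": 2}   # sleep, read, give

def valency_ok(verb, complements):
    """True if the verb receives exactly as many complements as it takes."""
    return VALENCY.get(verb) == len(complements)

print(valency_ok("davat'", ["knigu", "bratu"]))   # True
print(valency_ok("davat'", ["knigu", "knigu"]))   # also True: types are not checked
print(valency_ok("spat'", ["knigu"]))             # False
```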
Phrase-Structure Representation:
- The most familiar representation: noun phrase, verb phrase, etc.
- Breaks the sentence into superstructures
- Puts terminal symbols (the words) only in the leaves
- Uses non-terminal symbols for the branches
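A phrase-structure tree can be represented with nested tuples whose first element is a non-terminal label. The words appear only at the leaves, so reading the sentence back means collecting the leaves from left to right. The labels follow the usual S/NP/VP convention, and the sentence is an illustrative example.

```python
# Branches are (label, child, ...) tuples; leaves are plain strings (the words).
# Mal'chik chitaet knigu, "The boy reads a book"
tree = ("S",
        ("NP", ("N", "mal'chik")),
        ("VP", ("V", "chitaet"),
               ("NP", ("N", "knigu"))))

def leaves(node):
    """Terminal symbols, left to right."""
    if isinstance(node, str):
        return [node]
    _label, *children = node
    return [word for child in children for word in leaves(child)]

def nonterminals(node):
    """Non-terminal labels in pre-order."""
    if isinstance(node, str):
        return []
    label, *children = node
    return [label] + [lab for child in children for lab in nonterminals(child)]

print(leaves(tree))        # ["mal'chik", 'chitaet', 'knigu']
print(nonterminals(tree))  # ['S', 'NP', 'N', 'VP', 'V', 'NP', 'N']
```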
Dependency Trees
- Uses words as nodes, not just as leaves
- Examples:
  - Verb dependent on subject
  - Objects dependent on verb
  - Adjectives dependent on nouns
  - Prepositions vary by type of prepositional phrase
- Easier to verify agreement between words
- Occupies less space
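Because every node in a dependency tree is a word, agreement can be checked directly on each head-dependent link. The sketch below stores case, number, and gender as attributes on each word and verifies that an adjective agrees with the noun it depends on. The attribute names and values are assumptions for illustration. Only the adjective-noun link is checked, and the fact that Russian plural adjectives do not distinguish gender is ignored.

```python
from dataclasses import dataclass, field

@dataclass
class Word:
    form: str
    pos: str
    case: str = ""
    number: str = ""
    gender: str = ""
    dependents: list = field(default_factory=list)

def agrees(adjective, noun):
    """Adjective-noun agreement in case, number, and gender."""
    return all(getattr(adjective, f) == getattr(noun, f)
               for f in ("case", "number", "gender"))

def check_tree(node):
    """Return every (adjective, noun) pair in the tree that fails to agree."""
    errors = []
    for dep in node.dependents:
        if node.pos == "NOUN" and dep.pos == "ADJ" and not agrees(dep, node):
            errors.append((dep.form, node.form))
        errors.extend(check_tree(dep))
    return errors

kniga = Word("knigu", "NOUN", "acc", "sg", "fem",
             dependents=[Word("novuyu", "ADJ", "acc", "sg", "fem")])
stol = Word("stol", "NOUN", "acc", "sg", "masc",
            dependents=[Word("novuyu", "ADJ", "acc", "sg", "fem")])
print(check_tree(kniga))  # []
print(check_tree(stol))   # [('novuyu', 'stol')]
```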
Object Orientation
- An object-oriented approach allows more flexibility
- Endings, cases, and declensions are classes
- Fewer hard-coded rules
- Methods for locating dependents belong to the classes
- Modular design allows gradual changes:
  - Changes in lexical analysis do not affect parsing
  - Changes in the dictionary do not affect translation
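One way to make endings and declensions classes, so that grammatical knowledge lives in objects instead of hard-coded rules, is sketched below. The class names, the two partial declension tables, and the `possible_cases` method are hypothetical; they illustrate the design, not the program's actual class hierarchy. Adding a declension means adding an object, and code that calls `possible_cases` does not change.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ending:
    text: str      # transliterated ending, "" for a zero ending
    case: str
    number: str

class Declension:
    """A declension class owns its endings and the method that interprets them."""

    def __init__(self, name, endings):
        self.name = name
        self.endings = endings

    def possible_cases(self, word, stem):
        """All (case, number) readings of `word`, given its stem."""
        if not word.startswith(stem):
            return []
        rest = word[len(stem):]
        return [(e.case, e.number) for e in self.endings if e.text == rest]

# Partial tables for two declension classes (hypothetical selection).
FEMININE_A = Declension("feminine in -a", [
    Ending("a", "nom", "sg"), Ending("u", "acc", "sg"),
    Ending("y", "gen", "sg"), Ending("y", "nom", "pl"),
])
MASCULINE_HARD = Declension("masculine, hard stem", [
    Ending("", "nom", "sg"), Ending("", "acc", "sg"),   # acc = nom for inanimates
    Ending("a", "gen", "sg"), Ending("om", "instr", "sg"),
])

print(FEMININE_A.possible_cases("komnaty", "komnat"))  # [('gen', 'sg'), ('nom', 'pl')]
print(MASCULINE_HARD.possible_cases("stol", "stol"))   # [('nom', 'sg'), ('acc', 'sg')]
```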
Verb Typing
- Divides verbs into categories, for example:
  - Transitive
  - Intransitive
  - Directional or non-directional motion
- Condenses structure storage
- The dictionary stores only the type of a verb
- Particular structures are taken from the general type
- Code can apply to general structures rather than to specific verbs
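Verb typing can be sketched as a small table of general types, each listing the complements it allows, with the dictionary storing only a type name per verb. The type names, complement frames, and verbs are hypothetical simplifications. For instance, idti "go" is treated as directional motion with an optional v/na + accusative goal, and chitat' "read" is treated as always transitive. Every verb of a type shares one frame, which is what condenses storage.

```python
# General structures, defined once per type (hypothetical frames).
VERB_TYPES = {
    "transitive":     {"required": ["acc"], "optional": []},
    "intransitive":   {"required": [],      "optional": []},
    "directional":    {"required": [],      "optional": ["v+acc", "na+acc"]},
    "nondirectional": {"required": [],      "optional": ["po+dat"]},
}

# The dictionary stores only the type of each verb.
VERB_DICTIONARY = {"chitat'": "transitive", "spat'": "intransitive",
                   "idti": "directional", "khodit'": "nondirectional"}

def complements_for(verb):
    """Look up the general frame for a verb through its type."""
    return VERB_TYPES[VERB_DICTIONARY[verb]]

def allowed(verb, complement):
    """True if the verb's type permits this kind of complement."""
    frame = complements_for(verb)
    return complement in frame["required"] + frame["optional"]

print(allowed("idti", "v+acc"))   # True
print(allowed("spat'", "acc"))    # False
```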
Dictionary
- Open, save, add, remove, and search functions
- Stores:
  - Russian nominative form
  - English nominative forms
  - Part of speech
  - Noun/pronoun attributes
  - Verb types
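A dictionary offering these operations could be sketched as follows, using JSON files for opening and saving. The file format, field names, and class names are assumptions. The fields mirror the stored items on this slide: Russian nominative, English nominatives, part of speech, noun/pronoun attributes, and verb type.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field

@dataclass
class Entry:
    russian: str                                    # Russian nominative form
    english: list                                   # English nominative forms
    pos: str                                        # part of speech
    attributes: dict = field(default_factory=dict)  # e.g. gender, animacy
    verb_type: str = ""                             # only for verbs

class Dictionary:
    def __init__(self):
        self.entries = {}

    def add(self, entry):
        self.entries[entry.russian] = entry

    def remove(self, russian):
        self.entries.pop(russian, None)

    def search(self, russian):
        return self.entries.get(russian)   # None if the word is unknown

    def save(self, path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump([asdict(e) for e in self.entries.values()], f, ensure_ascii=False)

    @classmethod
    def load(cls, path):
        """The 'open' function: rebuild a dictionary from a saved file."""
        d = cls()
        with open(path, encoding="utf-8") as f:
            for item in json.load(f):
                d.add(Entry(**item))
        return d

d = Dictionary()
d.add(Entry("kniga", ["book"], "NOUN", {"gender": "fem", "animate": False}))
d.add(Entry("chitat'", ["read"], "VERB", verb_type="transitive"))
path = os.path.join(tempfile.mkdtemp(), "dictionary.json")
d.save(path)
loaded = Dictionary.load(path)
print(loaded.search("kniga").english)   # ['book']
```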
Translator
- Uses transliteration for ease of testing
- Can easily be converted to Unicode Cyrillic
- Prints debugging output to the terminal window
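Because the program works on transliterated input, conversion to Cyrillic can be isolated in one function. The scheme the program actually uses is not specified, so the sketch assumes a simple one: multi-letter sequences such as shch, zh, kh, ts, ch, sh, ya, yu, and yo; j for й; and an apostrophe for the soft sign. Э and the hard sign are left out. Longest sequences are tried first, so shch is not read as sh + ch. The same greedy rule misreads a genuine t + s, as in detskij (детский), which a real scheme would need a separator for.

```python
# Assumed transliteration scheme (Latin -> Cyrillic), lower case only.
MULTI = {"shch": "щ", "zh": "ж", "kh": "х", "ts": "ц", "ch": "ч", "sh": "ш",
         "ya": "я", "yu": "ю", "yo": "ё"}
SINGLE = dict(zip("abvgdeziklmnoprstufyj'",
                  "абвгдезиклмнопрстуфыйь"))

def to_cyrillic(text):
    """Convert transliterated text to Cyrillic, trying longer sequences first."""
    out, i = [], 0
    while i < len(text):
        for length in (4, 2):
            chunk = text[i:i + length]
            if chunk in MULTI:
                out.append(MULTI[chunk])
                i += length
                break
        else:
            out.append(SINGLE.get(text[i], text[i]))   # unknown characters pass through
            i += 1
    return "".join(out)

print(to_cyrillic("mal'chik chitaet knigu"))   # мальчик читает книгу
```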
Results
- Subject, verb, and direct object are translated:
  - The subject is the first nominative
  - The verb is matched by gender, number, and person
  - The direct object is the first accusative
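The subject-verb-object rule reported here can be sketched as follows. The subject is the first nominative; the verb is the first verb whose agreement features (person and number in the present tense, gender and number in the past) match the subject; the direct object is the first accusative. Words are assumed to arrive already analyzed into feature dictionaries, and the feature names are illustrative.

```python
def agrees_with(verb, subject):
    """True if every agreement feature the verb carries equals the subject's."""
    return all(subject.get(f) == verb[f]
               for f in ("person", "number", "gender") if f in verb)

def find_svo(words):
    """Subject: first nominative. Verb: first verb agreeing with it.
    Direct object: first accusative. Word order is otherwise ignored."""
    subject = next((w for w in words if w.get("case") == "nom"), None)
    verb = None
    if subject is not None:
        verb = next((w for w in words
                     if w["pos"] == "VERB" and agrees_with(w, subject)), None)
    obj = next((w for w in words if w.get("case") == "acc"), None)
    return subject, verb, obj

# Knigu chitala devochka, "The girl was reading the book" (object-verb-subject).
sentence = [
    {"form": "knigu", "pos": "NOUN", "case": "acc", "number": "sg", "gender": "fem"},
    {"form": "chitala", "pos": "VERB", "number": "sg", "gender": "fem"},
    {"form": "devochka", "pos": "NOUN", "case": "nom", "number": "sg", "gender": "fem"},
]
subject, verb, obj = find_svo(sentence)
print(subject["form"], verb["form"], obj["form"])   # devochka chitala knigu
```

Here the Russian order is object-verb-subject, and the case endings alone recover the roles.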
- Adjectives are matched to nouns:
  - Matched by case, number, and gender
  - Word order is not considered
- Word order should be accounted for, but is not:
  - Adjectives should attach to the nearest agreeing noun, not simply to the first matching one
  - Prepositional objects should be located near their prepositions
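Attaching an adjective to the first agreeing noun and attaching it to the nearest agreeing noun give different results when two nouns share case, number, and gender. The sketch compares both strategies on a phrase of that kind. The feature dictionaries are illustrative, "nearest" is measured simply as distance in words, and ties are assumed to go to the following noun.

```python
FEATURES = ("case", "number", "gender")

def agree(a, b):
    return all(a[f] == b[f] for f in FEATURES)

def attach(words, strategy):
    """Map each adjective index to the index of the noun it modifies."""
    nouns = [j for j, w in enumerate(words) if w["pos"] == "NOUN"]
    links = {}
    for i, w in enumerate(words):
        if w["pos"] != "ADJ":
            continue
        candidates = [j for j in nouns if agree(w, words[j])]
        if not candidates:
            continue
        if strategy == "first":
            links[i] = candidates[0]
        else:  # "nearest": smallest distance, ties go to the following noun
            links[i] = min(candidates, key=lambda j: (abs(j - i), j < i))
    return links

def mk(form, pos):
    return {"form": form, "pos": pos, "case": "nom", "number": "sg", "gender": "masc"}

# staryj dom i novyj stol, "an old house and a new table"
phrase = [mk("staryj", "ADJ"), mk("dom", "NOUN"), {"form": "i", "pos": "CONJ"},
          mk("novyj", "ADJ"), mk("stol", "NOUN")]
print(attach(phrase, "first"))    # {0: 1, 3: 1}  (novyj wrongly attached to dom)
print(attach(phrase, "nearest"))  # {0: 1, 3: 4}
```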
Conclusions
- Part-of-speech guessing could be added easily:
  - When a subordinate is not found, add it to a list
  - For each unmatched word, prompt the user
  - Allow the user to choose among the subordinates not found
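The proposed interaction could work roughly like this. The parser records each subordinate slot it failed to fill; then, for each word left unmatched, the user chooses which slot, if any, the word fills. The slot names are hypothetical. The choice function is passed in as a parameter so that it can prompt at the console in the real program or be a scripted function in a test.

```python
def resolve_unmatched(unmatched_words, missing_slots, choose):
    """Ask `choose` to assign each unmatched word to one of the missing slots.

    choose(word, options) must return one of `options`, or None to skip.
    """
    assignments = {}
    remaining = list(missing_slots)
    for word in unmatched_words:
        if not remaining:
            break
        slot = choose(word, remaining)
        if slot in remaining:
            assignments[slot] = word
            remaining.remove(slot)   # each slot is filled at most once
    return assignments, remaining

def ask_user(word, options):
    """Console prompt for interactive use (not exercised below)."""
    print(f"Unmatched word: {word}")
    for n, option in enumerate(options, 1):
        print(f"  {n}. {option}")
    answer = input("Slot number (blank to skip): ").strip()
    if answer.isdigit() and 1 <= int(answer) <= len(options):
        return options[int(answer) - 1]
    return None

def scripted(word, options):
    return "direct object" if word == "zhurnal" else None

print(resolve_unmatched(["zhurnal", "vchera"], ["direct object", "subject"], scripted))
# ({'direct object': 'zhurnal'}, ['subject'])
```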
- Verb typing would be harder to add, but helpful:
  - Restricting complements makes parsing more precise
  - It is more efficient, since the parser need not search for every possible complement
  - Prepositions could be associated with nouns
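- Even in inflecting languages, word order matters:
  - Subordinates should be located by proximity
  - Multiple functions use the same inflections (for example, inanimate masculine nouns such as stol, "table", have identical nominative and accusative forms)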
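When an ending is ambiguous between nominative and accusative, word order is the remaining clue. In the textbook sentence Mat' lyubit doch', both nouns have identical nominative and accusative forms, and the default subject-verb-object reading is "The mother loves the daughter." The sketch assigns roles to unambiguous nouns first, then resolves leftover nominative/accusative ambiguity by position. The case readings are supplied by hand, standing in for the morphological analysis.

```python
def assign_roles(nouns):
    """nouns: (form, set of possible cases) in sentence order -> [(form, role)]."""
    subject_taken = any(cases == {"nom"} for _, cases in nouns)
    object_taken = any(cases == {"acc"} for _, cases in nouns)
    roles = []
    for form, cases in nouns:            # left to right, so order breaks ties
        if cases == {"nom"}:
            role = "subject"
        elif cases == {"acc"}:
            role = "object"
        elif cases == {"nom", "acc"} and not subject_taken:
            role, subject_taken = "subject", True
        elif cases == {"nom", "acc"} and not object_taken:
            role, object_taken = "object", True
        else:
            role = "unresolved"
        roles.append((form, role))
    return roles

# Mat' lyubit doch': both nouns ambiguous, so position decides.
print(assign_roles([("mat'", {"nom", "acc"}), ("doch'", {"nom", "acc"})]))
# Knigu chitaet doch': the unambiguous accusative fixes the roles despite the order.
print(assign_roles([("knigu", {"acc"}), ("doch'", {"nom", "acc"})]))
```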