Deep Grammars in Hybrid Machine Translation
University of Bergen
Helge Dyvik
Lexicon, Lexical Semantics, Grammar, and Translation for Norwegian
A 4-year project (2002 - 2006) involving groups at:
• The University of Oslo
• The University of Bergen
• NTNU (The University of Trondheim)
Cooperation with PARC (John Maxwell) and others
The LOGON system: schematic architecture
XLE: Xerox Linguistic Environment
A platform developed over more than 20 years at Xerox PARC (now PARC). Developer: John Maxwell.
• LFG grammar development
• Parsing
• Generation
• Transfer
• Stochastic parse selection
• Interaction with shallow methods
An LFG analysis:
Det regnet 'It rained'
ParGram: The Parallel Grammar Project
A long-term project (1993-)
• Develops parallel grammars on XLE: English, French, German, Norwegian, Japanese, Urdu, Welsh, Malagasy, Arabic, Hungarian, Chinese, Vietnamese
• 'Parallel grammars' means parallel f-structures: a common inventory of features and common principles of analysis
LOGON Analysis Modules
[Schematic: the input string passes through preprocessing (tokenization, named-entity recognition, compound analysis, morphology), drawing on the Norsk ordbank lexicon and LFG lexicons (NKL-derived and hand-coded), plus lexical templates and syntactic rules with rule templates; the XLE parser with NorGram turns the resulting string of stems and tags into c-structures, f-structures, and MRSs, which feed the supporting knowledge base.]
Scope of NorGram
Lexicon: about 80 000 lemmas. In addition:
• Automatically analyzed compounds
• Automatically recognized proper names
• "Guessed" nouns
Syntax: 229 complex rules, giving rise to about 48 000 arcs
Semantics: Minimal Recursion Semantics projections for all readings
Coverage
Performance on an unknown corpus of newspaper text:
• 17 randomly selected pieces of text, limited to coherent text,
• comprising 1000 sentences,
• taken from 9 newspapers: Adresseavisen, Aftenposten, Aftenposten nett, Bergens Tidende, Dagbladet, Dagens Næringsliv, Dagsavisen, Fædrelandsvennen, Nordlys,
• from the editions of November 11th, 2005.
The LOGON challenge:
From a resource grammar based on independent linguistic principles, derive MRS structures harmonized with the MRS structures of the HPSG English Resource Grammar.
Semantics for translation: two issues
• The representational subset problem. Desirable: normalization to flat structures with unordered elements.
• Complete and detailed semantic analyses may be unnecessary. Desirable: rich possibilities of underspecification.
Basics of Minimal Recursion Semantics
• Developers: A. Copestake, D. Flickinger, R. Malouf, S. Riehemann, I. Sag
• A framework for the representation of semantic information
• Developed in the context of HPSG and machine translation (Verbmobil)
• Sources of inspiration:
  - Quasi-Logical Form (H. Alshawi): underspecification, e.g. of quantifier scope
  - Shake-and-bake translation (P. Whitelock): a bag of words as interface structure
An MRS representation
• is a bag of semantic entities (some corresponding to words, some not), each with a handle,
• plus a bag of handle constraints allowing the underspecification of scope,
• plus a handle and an index.
• Each semantic entity is referred to as an Elementary Predication (EP).
• Relations among EPs are captured by means of shared variables.
• There are three elementary variable types:
  - handles (or 'labels') (h)
  - events (e)
  - referential indices (x)
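These ingredients can be sketched as a small data type. This is an illustrative Python sketch, not the LOGON implementation; the class names, field names, and the example predicates are my own.

```python
from dataclasses import dataclass

@dataclass
class EP:
    """Elementary Predication: a labelled predicate with arguments."""
    handle: str   # the EP's label, e.g. "h3"
    pred: str     # predicate name, e.g. "_katt_n"
    args: dict    # role -> variable, e.g. {"ARG0": "x1"}

@dataclass
class MRS:
    """A bag of EPs, a bag of handle constraints, a top handle, an index."""
    rels: list    # unordered bag of EPs
    hcons: list   # (hi, lo) pairs: hi is constrained to outscope lo (qeq)
    top: str      # top handle
    index: str    # index variable (usually an event)

# "Katten sover" ('The cat sleeps'), roughly:
katten_sover = MRS(
    rels=[
        EP("h3", "_katt_n", {"ARG0": "x1"}),
        EP("h1", "def_q", {"ARG0": "x1", "RSTR": "h2", "BODY": "h4"}),
        EP("h5", "_sove_v", {"ARG0": "e1", "ARG1": "x1"}),
    ],
    hcons=[("h2", "h3")],  # the quantifier's RSTR is qeq the noun's label
    top="h0",
    index="e1",
)
```

The shared variable x1 ties the noun, the quantifier, and the verb's subject argument together, exactly as the bullet list above describes.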
From standard logical form to MRS
«Every ferry crosses some fjord»
Two readings:
∀x(ferry(x) → ∃y(fjord(y) ∧ cross(x, y)))   (wide-scope every)
∃y(fjord(y) ∧ ∀x(ferry(x) → cross(x, y)))   (wide-scope some)
Replace operators with generalized quantifiers:
every(variable, restriction, body)
some(variable, restriction, body)
The first reading (wide-scope every):
every(x, ferry(x), some(y, fjord(y), cross(x, y)))
Make the structure flat:
• give each EP a handle
• replace embedded EPs by their handles
• collect all EPs on the same level (understood as conjunction)
Underspecified scope by means of handle constraints:
Wide scope: some
Wide scope: every
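The underspecification can be made concrete with a brute-force scope resolver: each total ordering of the quantifiers yields one way of plugging the open BODY holes. This is an illustrative Python sketch with invented handle names, not the LOGON machinery.

```python
from itertools import permutations

# Underspecified MRS for "Every ferry crosses some fjord" (sketch).
# Quantifier EPs have open BODY holes; qeq constraints already fix
# the RSTR holes, so only the BODY holes remain to be plugged.
quants = {"h1": "every(x)", "h5": "some(y)"}  # quantifier label -> gloss
bodies = {"h1": "h4", "h5": "h7"}             # quantifier label -> BODY hole
verb = "h9"                                   # label of cross(e, x, y)

def scopings():
    """Each total order of the quantifiers yields one reading: every
    quantifier's BODY hole is plugged with the next quantifier's
    label, and the innermost BODY hole gets the verb's label."""
    readings = []
    for order in permutations(quants):
        plug = {}
        for outer, inner in zip(order, order[1:]):
            plug[bodies[outer]] = inner
        plug[bodies[order[-1]]] = verb
        readings.append((order, plug))
    return readings
```

With two quantifiers this enumerates exactly the two readings above: every over some, and some over every.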
MRS as feature structure (also adding event variables):
Norwegian translation: «Hver ferge krysser en fjord»
Projecting MRS representations from f-structures
«Katten sover»'The cat sleeps'
Composition: top-level MRS with unions of HCONS and RELS:
Post-processing this structure brings us back to the LOGON MRS format:
http://decentius.aksis.uib.no/logon/xle-mrs.xml
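The union-based composition step can be sketched in a few lines. This is an illustrative Python sketch over a plain-dict representation; it is not the XLE projection mechanism, and the choice of taking TOP and INDEX from the first daughter is an assumption for the example.

```python
def compose(*parts):
    """Top-level MRS composition (sketch): union the RELS and HCONS
    bags of the daughters; TOP and INDEX are taken from the head
    daughter, here assumed to be the first argument."""
    return {
        "top": parts[0]["top"],
        "index": parts[0]["index"],
        "rels": [ep for p in parts for ep in p["rels"]],
        "hcons": [hc for p in parts for hc in p["hcons"]],
    }

# "Katten sover": verb and subject fragments share the variable x1.
verb = {"top": "h0", "index": "e1",
        "rels": [("h5", "_sove_v", ["e1", "x1"])],
        "hcons": []}
subj = {"top": "h1", "index": "x1",
        "rels": [("h3", "_katt_n", ["x1"]),
                 ("h1", "def_q", ["x1", "h2", "h4"])],
        "hcons": [("h2", "h3")]}

sentence = compose(verb, subj)
```

Because RELS and HCONS are bags, composition is just concatenation; no reordering or embedding is needed, which is what makes the flat representation attractive for transfer.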
bil 'car' (as in "Han kjøpte bil" 'He bought [a] car')
No SPEC
disse hans mange spørsmål 'these his many questions'
Multiple SPECs
Han jaget barnet ut nakent 'He chased the child out naked'
The Transfer Component
Developer of the formalism: Stephan Oepen
Example of transfer
Source sentence:
Henter han bilen sin?
fetches he car.DEF POSS.REFL.SG.MASC
'Does he fetch his car?'
Alternative reading:'Does he fetch the one of the car?'
Parse output:
Choosing the first reading of Henter han bilen sin?
The variables have features. Interrogative is coded as [SF ques] on the event variable.
Two of four transfer outputs
Norwegian transfer input
One of four English transfer outputs
Generator output from the chosen transfer output
Transfer formalism(Stephan Oepen)
The form of a transfer rule:
C = context
I = input
F = filter
O = output
Simple example: a lexical transfer rule, transferring bekk into creek.
No context, no filter; only the predicate is replaced.
Example with a context restriction: gå en tur (lit. 'go a trip') is transferred into the light-verb construction take a trip.
In the context of _tur_n as its second argument, _gå_v is transferred to _take_v.
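The effect of such a context-restricted rule can be mimicked over a toy EP list. This is an illustrative Python sketch, not the LOGON transfer formalism; the function name and the argument-matching convention are my own, and a companion rule transferring _tur_n itself is left out.

```python
def apply_context_rule(rels, target, replacement, ctx_pred, ctx_arg):
    """Transfer sketch: replace the `target` predicate with
    `replacement` only if some EP with predicate `ctx_pred` fills
    argument slot `ctx_arg` of the target EP (the context)."""
    out = []
    for handle, pred, args in rels:
        if pred == target:
            filler = args.get(ctx_arg)
            if any(p == ctx_pred and a.get("ARG0") == filler
                   for _, p, a in rels):
                pred = replacement
        out.append((handle, pred, args))
    return out

# "gå en tur": _gå_v becomes _take_v in the context of a _tur_n
# second argument (variable x2 links the slot to the noun's ARG0).
rels = [
    ("h1", "_gå_v", {"ARG0": "e1", "ARG1": "x1", "ARG2": "x2"}),
    ("h2", "_tur_n", {"ARG0": "x2"}),
]
transferred = apply_context_rule(rels, "_gå_v", "_take_v", "_tur_n", "ARG2")
```

With a different second argument, say _by_n 'city', the context fails and _gå_v is left alone for the default lexical rule.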
The SEM-I (Semantic Interface)
Documentation of the external semantic interface for a grammar, crucial for the writer of transfer rules.
To enforce the maintenance of a SEM-I, LOGON parsing returns failure if every parse contains at least one predicate not in the SEM-I.
A small section of the verb part of the NorGram SEM-I. Size of the Norwegian SEM-I: slightly less than 6000 entries.
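That guard can be sketched as a filter over parses: a parse survives only if all of its predicates are listed in the SEM-I, and the whole parsing step fails when no parse survives. This is an illustrative Python sketch with invented predicate names, not the actual LOGON check.

```python
def semi_check(parses, semi):
    """SEM-I guard (sketch): keep the parses whose predicates are
    all listed in the SEM-I; report failure if none survive."""
    ok = [parse for parse in parses if all(pred in semi for pred in parse)]
    return ok or "fail"

# A toy SEM-I and two competing parses; the first parse uses a
# predicate that was never registered in the SEM-I.
semi = {"_katt_n", "_sove_v", "def_q"}
parses = [["_katt_n", "_snorke_v"],
          ["_katt_n", "_sove_v", "def_q"]]
surviving = semi_check(parses, semi)
```

This makes the SEM-I an enforced contract between the grammar writers and the transfer-rule writers: an unregistered predicate shows up as a hard failure, not as a silently untranslatable EP.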
Parse Selection
Parsing, transfer and generation may each give many solutions, leading to a fanout tree:
The outputs at each of the three stages are statistically ranked.
Example of a four-way ambiguity:
Det regnet 'It rained'/'It calculated'/'That one calculated'/'That rain'
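One way to use the three stage-wise rankings end to end is to score each full path through the fanout tree by the product of its stage scores. This is a simplifying sketch: it treats the stage scores as independent probabilities and enumerates the full cross-product, which the real system need not do; all identifiers and numbers are invented.

```python
from itertools import product

def best_path(parse_scores, transfer_scores, gen_scores):
    """Rank full parse -> transfer -> generation paths by the
    product of the three stage scores (fanout-tree sketch)."""
    paths = product(parse_scores.items(),
                    transfer_scores.items(),
                    gen_scores.items())
    return max(paths, key=lambda path: path[0][1] * path[1][1] * path[2][1])

# Toy fanout: two parses, two transfer outputs, two realizations.
parse_scores = {"p1": 0.7, "p2": 0.3}
transfer_scores = {"t1": 0.4, "t2": 0.6}
gen_scores = {"g1": 0.9, "g2": 0.1}
best = best_path(parse_scores, transfer_scores, gen_scores)
```

Note that the globally best path need not take the top-ranked candidate at every stage, which is why ranking the combinations matters.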
The Parsebanker
Efficient treebank building by discriminants
Developer: Paul Meurer, Bergen
Predecessors in discriminant analysis:
David Carter (1997)
Stephan Oepen, Dan Flickinger et al. (2003)
Packed representations and discriminants(Paul Meurer)
Clicking on one discriminant is in this case sufficient to select a unique solution:
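The discriminant idea can be sketched over parses represented as sets of properties: a discriminant is any property that holds of some but not all parses, and each accept/reject decision discards the parses on the wrong side. This is an illustrative Python sketch with invented property names, not the Parsebanker's implementation.

```python
def discriminants(parses):
    """Properties that hold of some but not all parses: exactly
    the choices that discriminate between the analyses."""
    all_props = set().union(*parses)
    return {p for p in all_props
            if not all(p in parse for parse in parses)}

def select(parses, prop, keep=True):
    """Keep the parses that have (keep=True) or lack (keep=False)
    the chosen discriminant."""
    return [parse for parse in parses if (prop in parse) == keep]

# Four toy parses described by sets of analysis decisions
# (attachment and category choices; names are invented).
parses = [{"PP-attach:V", "det:pron"},
          {"PP-attach:N", "det:pron"},
          {"PP-attach:V", "det:det"},
          {"PP-attach:N", "det:det"}]
```

Because every decision halves the candidate set in the best case, a handful of clicks can disambiguate among very many parses, which is what makes discriminant-based treebanking efficient.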
The Parsebanker
'After all, a human being must be something more than a machine?'
TigerSearch
The implementation is under development by Paul Meurer.
Find selected prepositional phrases with sentential objects:
Find selected prepositional phrases with the preposition 'om' and nominal objects:
Find topicalized objects: