
Page 1: Deep Grammars in Hybrid Machine Translation

Deep Grammars in Hybrid Machine Translation

University of Bergen

Helge Dyvik

Page 2

Lexicon, Lexical Semantics, Grammar, and Translation for Norwegian

A 4-year project (2002-2006) involving groups at:

• The University of Oslo

• The University of Bergen

• NTNU (The University of Trondheim)

Cooperation with PARC (John Maxwell) and others

Page 3

The LOGON system: schematic architecture

Page 4

XLE: Xerox Linguistic Environment

A platform developed over more than 20 years at Xerox PARC (now PARC)

Developer: John Maxwell

• LFG grammar development
• Parsing
• Generation
• Transfer
• Stochastic parse selection
• Interaction with shallow methods

Page 5

An LFG analysis:

Det regnet 'It rained'

Page 6

ParGram: The Parallel Grammar Project

A long-term project (1993-)

• Develops parallel grammars on XLE: English, French, German, Norwegian, Japanese, Urdu, Welsh, Malagasy, Arabic, Hungarian, Chinese, Vietnamese

• 'Parallel grammars' means parallel f-structures:

- A common inventory of features

- Common principles of analysis

Page 7

LOGON Analysis Modules

[Diagram of the analysis pipeline. Components shown: input string; tokenization, named entities, compounds and morphology; the Norsk ordbank lexicon; a string of stems and tags; the XLE parser with NorGram (LFG lexicons, NKL-derived and hand coded; lexical templates; syntactic rules; rule templates); c-structures, f-structures and MRSs; a supporting knowledge base (output-input).]

Page 8

Scope of NorGram

Lexicon: about 80 000 lemmas. In addition:

- Automatically analyzed compounds
- Automatically recognized proper names
- "Guessed" nouns

Syntax: 229 complex rules, giving rise to about 48 000 arcs

Semantics: Minimal Recursion Semantics projections for all readings

Page 9

Coverage

Performance on an unknown corpus of newspaper text:

•17 randomly selected pieces of text, limited to coherent text,

•comprising 1000 sentences

•taken from 9 newspapers

Adresseavisen, Aftenposten, Aftenposten nett, Bergens Tidende, Dagbladet, Dagens Næringsliv, Dagsavisen, Fædrelandsvennen, Nordlys

•from the editions of November 11th, 2005.

Page 10
Page 11

The LOGON challenge:

From a resource grammar based on independent linguistic principles, derive MRS structures harmonized with the MRS structures of the HPSG English Resource Grammar.

Page 12

Semantics for translation: two issues

• The representational subset problem

- Desirable: normalization to flat structures with unordered elements.

• Complete and detailed semantic analyses may be unnecessary.

- Desirable: rich possibilities of underspecification

Page 13

Basics of Minimal Recursion Semantics

•Developers: A. Copestake, D. Flickinger, R. Malouf, S. Riehemann, I. Sag

•A framework for the representation of semantic information

•Developed in the context of HPSG and machine translation (Verbmobil)

•Sources of inspiration:

- Quasi-Logical Form (H. Alshawi): underspecification, e.g. of quantifier scope

- Shake-and-bake translation (P. Whitelock): a bag of words as interface structure

Page 14

An MRS representation

• is a bag of semantic entities (some corresponding to words, some not), each with a handle,

• plus a bag of handle constraints allowing the underspecification of scope,

• plus a handle and an index.

• Each semantic entity is referred to as an Elementary Predication (EP).

• Relations among EPs are captured by means of shared variables.

• There are three elementary variable types:

- handles (or 'labels') (h)

- events (e)

- referential indices (x)
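To make the definition concrete, here is a minimal Python sketch of an MRS as a data structure, under assumptions of our own: EPs are records with a label, a predicate name and a role-to-variable dictionary; handle constraints are `(hole, "qeq", label)` triples (qeq, 'equality modulo quantifiers', is the constraint type standardly used in MRS); and a variable's type is read off the first letter of its name. The predicate names `_the_q`, `_cat_n`, `_sleep_v`, the role names and the variable numbering are illustrative, not taken from a particular grammar.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class EP:
    """An elementary predication: a label (handle), a predicate and its arguments."""
    label: str
    pred: str
    args: dict  # role name -> variable, e.g. {"ARG0": "e2", "ARG1": "x3"}


@dataclass
class MRS:
    top: str                                   # the top handle
    index: str                                 # the distinguished index
    rels: list = field(default_factory=list)   # bag of EPs
    hcons: list = field(default_factory=list)  # bag of (hole, "qeq", label) constraints


# Assumed naming convention: the first letter gives the variable type.
VAR_TYPES = {"h": "handle", "e": "event", "x": "referential index"}


def var_type(var: str) -> str:
    return VAR_TYPES[var[0]]


def shared_variables(mrs: MRS) -> set:
    """Non-handle variables occurring in more than one EP: these link the EPs."""
    counts = Counter()
    for ep in mrs.rels:
        # count each variable at most once per EP
        counts.update({v for v in ep.args.values() if var_type(v) != "handle"})
    return {v for v, n in counts.items() if n > 1}


# 'The cat sleeps' (hypothetical predicate names and numbering)
cat_sleeps = MRS(
    top="h0",
    index="e2",
    rels=[
        EP("h1", "_the_q", {"ARG0": "x3", "RSTR": "h4", "BODY": "h5"}),
        EP("h6", "_cat_n", {"ARG0": "x3"}),
        EP("h7", "_sleep_v", {"ARG0": "e2", "ARG1": "x3"}),
    ],
    hcons=[("h0", "qeq", "h7"), ("h4", "qeq", "h6")],
)
```

Here `shared_variables(cat_sleeps)` returns `{"x3"}`: the referential index of the cat links the quantifier, the noun and the verb, which is how relations among EPs are expressed without any embedding.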

Page 15

From standard logical form to MRS

«Every ferry crosses some fjord»

Two readings:

Replace operators with generalized quantifiers:

every(variable, restriction, body)
some(variable, restriction, body)

The first reading (wide-scope every):

[Figure: the first reading, with variable, restriction and body marked]
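Written out in full, the two readings are as follows, first in standard first-order notation and then with generalized quantifiers of the form every(variable, restriction, body). The predicate names ferry, fjord and cross are chosen here for illustration, since the slide's own formulas did not survive extraction; (1) is the wide-scope every reading.

```latex
\begin{align*}
&(1)\quad \forall x\,\bigl(\mathit{ferry}(x) \rightarrow \exists y\,(\mathit{fjord}(y) \wedge \mathit{cross}(x,y))\bigr)\\
&(2)\quad \exists y\,\bigl(\mathit{fjord}(y) \wedge \forall x\,(\mathit{ferry}(x) \rightarrow \mathit{cross}(x,y))\bigr)\\[6pt]
&(1)\quad \mathit{every}\bigl(x,\ \mathit{ferry}(x),\ \mathit{some}(y,\ \mathit{fjord}(y),\ \mathit{cross}(x,y))\bigr)\\
&(2)\quad \mathit{some}\bigl(y,\ \mathit{fjord}(y),\ \mathit{every}(x,\ \mathit{ferry}(x),\ \mathit{cross}(x,y))\bigr)
\end{align*}
```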

Page 16

Make the structure flat:

• give each EP a handle
• replace embedded EPs by their handles
• collect all EPs on the same level (understood as conjunction)
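The three flattening steps can be sketched in Python as follows. This is an illustration under our own assumptions, not LOGON code: a formula is a nested tuple `(predicate, arg1, ...)`, an argument that is itself a tuple is an embedded EP, handles are generated as `h1`, `h2`, ..., and a hypothetical `("and", ...)` node stands for conjunction, whose conjuncts all receive the same handle.

```python
from itertools import count


def flatten(term):
    """Flatten a nested formula into (top handle, bag of EPs).

    A formula is a tuple (predicate, arg1, ...). Tuple arguments are embedded
    EPs and are replaced by their handles. A hypothetical ("and", f1, f2, ...)
    node gives all its conjuncts one shared handle.
    """
    eps = []            # all EPs end up in one flat bag
    fresh = count(1)

    def walk(t, label=None):
        handle = label or f"h{next(fresh)}"   # give each EP a handle
        if t[0] == "and":                     # conjunction: conjuncts share one handle
            for conjunct in t[1:]:
                walk(conjunct, handle)
            return handle
        # replace embedded EPs by their handles
        args = tuple(walk(a) if isinstance(a, tuple) else a for a in t[1:])
        eps.append((handle, t[0], args))
        return handle

    return walk(term), eps


# Wide-scope 'every' reading of «Every ferry crosses some fjord»
top, eps = flatten(("every", "x", ("ferry", "x"),
                    ("some", "y", ("fjord", "y"), ("cross", "x", "y"))))
```

For this reading, `flatten` returns the top handle `h1` and a bag of five EPs in which `every` takes the handles `h2` (restriction) and `h3` (body) in place of the embedded formulas.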

Page 17

Underspecified scope by means of handle constraints:


[Figure: two scope resolutions, labelled 'Wide scope: some' and 'Wide scope: every']
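The point of leaving the quantifier bodies unplugged can be shown with a small resolver. This is a deliberately simplified Python sketch, not a general MRS scope-resolution algorithm: the handle numbering is hypothetical, each qeq constraint on a restriction is treated as a direct plugging, and the only freedom left is the order of the quantifiers, which is enough for «Every ferry crosses some fjord».

```python
from itertools import permutations

TOP = "h0"  # the top handle
# Quantifier EPs: (label, predicate, bound variable, RSTR hole, BODY hole)
QUANTS = [("h1", "every", "x", "h2", "h3"),
          ("h4", "some", "y", "h5", "h6")]
# Other EPs: label -> printable predication
PREDS = {"h7": "ferry(x)", "h8": "fjord(y)", "h9": "cross(x,y)"}
# Handle constraints: RSTR hole qeq noun label (simplified to direct plugging)
QEQ = {"h2": "h7", "h5": "h8"}
MAIN = "h9"  # the verb's label must end up in the innermost body


def pluggings():
    """Yield one hole -> label plugging per ordering of the quantifiers."""
    for order in permutations(QUANTS):
        plug = dict(QEQ)               # restrictions: plugged as the qeqs dictate
        plug[TOP] = order[0][0]        # outermost quantifier fills the top hole
        for outer, inner in zip(order, order[1:]):
            plug[outer[4]] = inner[0]  # each BODY holds the next quantifier
        plug[order[-1][4]] = MAIN      # the innermost BODY holds the verb
        yield plug


def render(label, plug):
    """Rebuild the nested formula rooted at `label` under a plugging."""
    for qlabel, pred, var, rstr, body in QUANTS:
        if qlabel == label:
            return f"{pred}({var}, {render(plug[rstr], plug)}, {render(plug[body], plug)})"
    return PREDS[label]


readings = [render(p[TOP], p) for p in pluggings()]
```

`readings` contains exactly the two readings, wide-scope every first and wide-scope some second, both obtained from one underspecified structure.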

Page 18

MRS as feature structure (also adding event variables):

Norwegian translation: «Hver ferge krysser en fjord»

Page 19

Projecting MRS representations from f-structures

«Katten sover» 'The cat sleeps'
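The basic idea of the projection can be illustrated with a toy Python function. This is not NorGram's mechanism: the f-structure is a plain nested dictionary with simplified, hypothetical features (`PRED`, `DEF`), grammatical functions are mapped to roles through an assumed table (`SUBJ` to `ARG1`, `OBJ` to `ARG2`), and the predicate names `_katt_n`, `_sove_v` and quantifier names `_def_q`, `_udef_q` are illustrative.

```python
from itertools import count

# Hypothetical mapping from grammatical functions to MRS argument roles.
GF_TO_ROLE = {"SUBJ": "ARG1", "OBJ": "ARG2"}


def project(fs):
    """Project a toy verbal f-structure into a flat MRS (as a dict)."""
    fresh = count(1)
    rels, hcons = [], []

    def var(prefix):
        return f"{prefix}{next(fresh)}"

    def nominal(f):
        # A noun gets an index x, its own EP, and a quantifier whose RSTR is qeq the noun's label.
        x, label = var("x"), var("h")
        rels.append((label, f"_{f['PRED']}_n", {"ARG0": x}))
        q, rstr, body = var("h"), var("h"), var("h")
        quant = "_def_q" if f.get("DEF") == "+" else "_udef_q"
        rels.append((q, quant, {"ARG0": x, "RSTR": rstr, "BODY": body}))
        hcons.append((rstr, "qeq", label))
        return x

    # PRED values look like 'sove<SUBJ>': lemma plus subcategorized functions.
    lemma, _, frame = fs["PRED"].partition("<")
    gfs = [g for g in frame.rstrip(">").split(",") if g]
    event, label = var("e"), var("h")
    args = {"ARG0": event}
    for gf in gfs:
        args[GF_TO_ROLE[gf]] = nominal(fs[gf])
    rels.append((label, f"_{lemma}_v", args))
    return {"TOP": "h0", "INDEX": event, "RELS": rels,
            "HCONS": [("h0", "qeq", label)] + hcons}


# «Katten sover» 'The cat sleeps', with simplified, hypothetical features
katten_sover = {"PRED": "sove<SUBJ>", "TENSE": "pres",
                "SUBJ": {"PRED": "katt", "DEF": "+"}}
mrs = project(katten_sover)
```

For «Katten sover» the result has three EPs (noun, quantifier, verb) linked by the shared index `x3`, with the noun's label constrained to fill the quantifier's restriction.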

Page 20

Page 21
Page 22

Page 23

Page 24

Composition: top-level MRS with unions of HCONS and RELS:
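In its simplest form, this composition step can be sketched as follows, assuming each constituent's contribution is a dictionary with `TOP`, `INDEX`, `RELS` and `HCONS`, and assuming (a choice made for this sketch) that `TOP` and `INDEX` are inherited from the head. The RELS and HCONS bags are simply pooled; variable sharing between the parts is assumed to have been established already. The predicate names are hypothetical.

```python
def compose(head, *dependents):
    """Top-level MRS: TOP and INDEX from the head, RELS and HCONS pooled from all parts."""
    parts = (head, *dependents)
    return {
        "TOP": head["TOP"],
        "INDEX": head["INDEX"],
        # Bag union: every EP and every constraint of every part is kept.
        "RELS": [ep for part in parts for ep in part["RELS"]],
        "HCONS": [hc for part in parts for hc in part["HCONS"]],
    }


# Hypothetical contributions for «Katten sover»; x3 is already shared.
verb = {"TOP": "h0", "INDEX": "e1",
        "RELS": [("h2", "_sove_v", {"ARG0": "e1", "ARG1": "x3"})],
        "HCONS": [("h0", "qeq", "h2")]}
subj = {"TOP": "h4", "INDEX": "x3",
        "RELS": [("h4", "_katt_n", {"ARG0": "x3"}),
                 ("h5", "_def_q", {"ARG0": "x3", "RSTR": "h6", "BODY": "h7"})],
        "HCONS": [("h6", "qeq", "h4")]}
mrs = compose(verb, subj)
```

`compose(verb, subj)` yields an MRS with index `e1`, three EPs and two handle constraints.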

Page 25
Page 26

Post-processing this structure brings us back to the LOGON MRS format:

http://decentius.aksis.uib.no/logon/xle-mrs.xml

Page 28

bil 'car' (as in "Han kjøpte bil" 'He bought [a] car')

No SPEC

Page 29

disse hans mange spørsmål 'these his many questions'

Multiple SPECs

Page 30

Han jaget barnet ut nakent 'He chased the child out naked'

Page 31

The Transfer Component

Developer of the formalism: Stephan Oepen

Page 32

Example of transfer

Source sentence:

Henter han bilen sin?
fetches he car.DEF POSS.REFL.SG.MASC
'Does he fetch his car?'

Alternative reading: 'Does he fetch the one of the car?'

Page 33

Parse output:

Page 34

Choosing the first reading of Henter han bilen sin?

Page 35

The variables have features. Interrogative is coded as [SF ques] on the event variable.

Page 36

Two of four transfer outputs

Page 37

Norwegian transfer input

One of four English transfer outputs

Page 38

Generator output from the chosen transfer output

Page 39

Transfer formalism (Stephan Oepen)

The form of a transfer rule:

C = context
I = input
F = filter
O = output

Page 40

Simple example: lexical transfer rule, transferring bekk into creek

No context, no filter, only the predicate is replaced.

Page 41

Example with a context restriction: gå en tur (lit. 'go a trip') is transferred into the light-verb construction take a trip.

In the context of _tur_n as its second argument, _gå_v is transferred to _take_v.
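The two rules above can be mimicked with a toy rewriting engine over bags of EPs. The sketch assumes its own encoding (EPs as `(predicate, {role: variable})` pairs, pattern variables written `?x`) and a simplified reading of the C/I/F/O rule form: the rule fires when the input and context EPs match with consistent variable bindings and the filter does not match; the input EPs are then replaced by the output, and the context EPs are left untouched. It ignores much of the real formalism, for example how rules are ordered and iterated, and the predicate names are illustrative.

```python
def bind(roles, args, binding):
    """Extend `binding` so each pattern variable equals the EP's argument; None on clash."""
    b = dict(binding)
    for role, var in roles.items():
        if role not in args or b.setdefault(var, args[role]) != args[role]:
            return None
    return b


def match(patterns, eps, binding, used=()):
    """Yield (binding, indices) for every way of matching the patterns to distinct EPs."""
    if not patterns:
        yield binding, used
        return
    pred, roles = patterns[0]
    for i, (ep_pred, args) in enumerate(eps):
        if i not in used and ep_pred == pred:
            b = bind(roles, args, binding)
            if b is not None:
                yield from match(patterns[1:], eps, b, used + (i,))


def apply_rule(rule, eps):
    """Apply one C/I/F/O rule at its first admissible match; return eps unchanged if none."""
    n_input = len(rule["input"])
    for b, used in match(rule["input"] + rule.get("context", []), eps, {}):
        flt = rule.get("filter")
        if flt and next(match(flt, eps, b), None) is not None:
            continue                    # the filter matched, so the rule is blocked here
        consumed = set(used[:n_input])  # input EPs are replaced; context EPs are kept
        kept = [ep for i, ep in enumerate(eps) if i not in consumed]
        output = [(p, {r: b[v] for r, v in roles.items()}) for p, roles in rule["output"]]
        return kept + output
    return eps


# No context, no filter: only the predicate is replaced.
BEKK = {"input": [("_bekk_n", {"ARG0": "?x"})],
        "output": [("_creek_n", {"ARG0": "?x"})]}

# _gå_v becomes _take_v only with _tur_n as its second argument.
GAA_TUR = {"context": [("_tur_n", {"ARG0": "?y"})],
           "input": [("_gå_v", {"ARG0": "?e", "ARG1": "?x", "ARG2": "?y"})],
           "output": [("_take_v", {"ARG0": "?e", "ARG1": "?x", "ARG2": "?y"})]}
```

With `GAA_TUR`, the verb is rewritten only when a `_tur_n` EP fills its second argument; `_tur_n` itself is kept and would be left for a separate rule.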

Page 42

The SEM-I (Semantic Interface)

Documentation of the external semantic interface of a grammar, crucial for the writer of transfer rules.

To enforce maintenance of the SEM-I, LOGON parsing returns fail if every parse contains at least one predicate not in the SEM-I.
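The fail condition can be stated in a few lines of Python. In this sketch each parse is reduced to the list of predicate names it uses (an assumption), the SEM-I to a set of predicate names, and all entries are hypothetical. The slide does not say whether non-conforming parses are discarded when some other parse conforms, so the sketch returns all parses in that case.

```python
def check_semi(parses, semi):
    """Return 'fail' if every parse uses at least one predicate outside the SEM-I.

    No parses at all also counts as 'fail' here. When at least one parse
    conforms, all parses are returned unchanged (an assumption).
    """
    if all(any(pred not in semi for pred in parse) for parse in parses):
        return "fail"
    return parses


def unknown_predicates(parses, semi):
    """Predicates that would have to be added to the SEM-I (or fixed in the grammar)."""
    return sorted({pred for parse in parses for pred in parse if pred not in semi})


# Hypothetical SEM-I entries and parses of «Henter han bilen sin?»
SEMI = {"_hente_v", "pron", "_bil_n", "def_q", "poss"}
good = [["_hente_v", "pron", "_bil_n", "def_q", "poss"], ["_hente_v", "pron", "_bilen_n"]]
bad = [["_hente_v", "pron", "_bilen_n"]]
```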

Page 43

A small section of the verb part of the NorGram SEM-I

Size of the Norwegian SEM-I: slightly fewer than 6000 entries

Page 44

Parse Selection

Parsing, transfer and generation may each give many solutions, leading to a fanout tree:

The outputs at each of the three stages are statistically ranked.
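The fanout tree and the per-stage ranking can be sketched as a small beam search. Both the pruning to the `n_best` outputs at each node and the scoring of a complete path by the product of its stage probabilities are assumptions of this sketch; the toy probabilities and intermediate representations are invented for illustration.

```python
def fanout(source, stages, n_best=2):
    """Expand `source` through successive stages, keeping the n_best outputs at each node.

    Each stage maps an item to a list of (output, probability) pairs. A path is
    scored by the product of its probabilities (an assumption of this sketch);
    a path whose last item yields no outputs is dropped.
    """
    paths = [(1.0, [source])]
    for stage in stages:
        expanded = []
        for score, path in paths:
            ranked = sorted(stage(path[-1]), key=lambda c: c[1], reverse=True)[:n_best]
            expanded.extend((score * p, path + [out]) for out, p in ranked)
        paths = expanded
    return sorted(paths, key=lambda sp: sp[0], reverse=True)


# Invented toy data: parse, then transfer, then generation.
TOY = {
    "Det regnet": [("no:rain(e)", 0.7), ("no:calculate(e,it)", 0.2),
                   ("no:calculate(e,that)", 0.1)],
    "no:rain(e)": [("en:rain(e)", 1.0)],
    "no:calculate(e,it)": [("en:calculate(e,it)", 0.6), ("en:compute(e,it)", 0.4)],
    "en:rain(e)": [("It rained.", 0.9), ("It was raining.", 0.1)],
    "en:calculate(e,it)": [("It calculated.", 1.0)],
    "en:compute(e,it)": [("It computed.", 1.0)],
}


def toy_stage(item):
    return TOY.get(item, [])


paths = fanout("Det regnet", [toy_stage] * 3, n_best=2)
```

With `n_best=2`, the least probable parse is pruned at the first stage, four complete paths remain, and 'It rained.' comes out on top.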

Page 45

Example of a four-way ambiguity:

Det regnet 'It rained'/'It calculated'/'That one calculated'/'That rain'

The Parsebanker: efficient treebank building by discriminants

Developer: Paul Meurer, Bergen

Predecessors in discriminant analysis:

• David Carter (1997)

• Stephan Oepen, Dan Flickinger et al. (2003)
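The idea of discriminant-based disambiguation can be illustrated with the four readings of Det regnet: a discriminant is a property that holds for some but not all of the remaining readings, and accepting or rejecting it removes the readings that disagree. The property labels below are invented for illustration and do not correspond to the Parsebanker's actual discriminants.

```python
# The four readings of «Det regnet», each described by a set of properties.
# The property labels are invented for illustration.
READINGS = {
    "It rained": {"regnet:verb", "regnet:rain", "det:expletive"},
    "It calculated": {"regnet:verb", "regnet:calculate", "det:pronoun"},
    "That one calculated": {"regnet:verb", "regnet:calculate", "det:demonstrative"},
    "That rain": {"regnet:noun", "det:determiner"},
}


def discriminants(readings):
    """Properties holding for some, but not all, of the remaining readings."""
    props = set().union(*readings.values())
    return {p for p in props
            if 0 < sum(p in f for f in readings.values()) < len(readings)}


def decide(readings, prop, accept=True):
    """Keep the readings that have (accept=True) or lack (accept=False) the property."""
    return {r: f for r, f in readings.items() if (prop in f) == accept}
```

Accepting the single discriminant `regnet:rain` leaves only 'It rained', after which no discriminants remain; rejecting `regnet:verb` instead leaves only 'That rain'.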

Page 46

Page 47

Page 48

Packed representations and discriminants (Paul Meurer)

Page 49
Page 50

Clicking on one discriminant is in this case sufficient to select a unique solution:

Page 52
Page 53
Page 54

'After all, a human being must be something more than a machine?'

Page 55

TigerSearch

The implementation is under development by Paul Meurer

Find selected prepositional phrases with sentential objects:

Page 56

Find selected prepositional phrases with the preposition 'om' and nominal objects:

Page 57

Find topicalized objects: