grammar engineering: coordination and macros metarulemacro interfacing finite-state morphology

19
Grammar Engineering: Grammar Engineering: Coordination and Macros Coordination and Macros METARULEMACRO METARULEMACRO Interfacing finite-state Interfacing finite-state morphology morphology Miriam Butt Miriam Butt (University of Konstanz) (University of Konstanz) and and Martin Forst (NetBase Martin Forst (NetBase Solutions) Solutions) Colombo 2014 Colombo 2014

Upload: brenden-maynard

Post on 01-Jan-2016

18 views

Category:

Documents


1 download

DESCRIPTION

Grammar Engineering: Coordination and Macros METARULEMACRO Interfacing finite-state morphology. Miriam Butt (University of Konstanz) and Martin Forst ( NetBase Solutions). Colombo 2014. Coordination. Every attribute can only have one value - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Grammar Engineering: Coordination and Macros METARULEMACRO Interfacing finite-state morphology

Grammar Engineering:Grammar Engineering:Coordination and MacrosCoordination and MacrosMETARULEMACROMETARULEMACROInterfacing finite-state morphologyInterfacing finite-state morphology

Miriam Butt Miriam Butt (University of Konstanz)(University of Konstanz)and and Martin Forst (NetBase Solutions)Martin Forst (NetBase Solutions)

Colombo 2014Colombo 2014

Page 2: Grammar Engineering: Coordination and Macros METARULEMACRO Interfacing finite-state morphology

CoordinationCoordination Every attribute can only have one value So what do we do with coordinated

constituents?

Example: gorillas sleep and eatVP --> { …

| VP: ! $ ^

CONJ

VP: ! $ ^

}.

Page 3: Grammar Engineering: Coordination and Macros METARULEMACRO Interfacing finite-state morphology

Coordination (cont’d)Coordination (cont’d) Coordination can happen basically at any level

of the c-structure.

Example: the gorillas peel and eat the bananasV --> { …

| V: ! $ ^

CONJ

V: ! $ ^

}.

Page 4: Grammar Engineering: Coordination and Macros METARULEMACRO Interfacing finite-state morphology

Coordination (cont’d)Coordination (cont’d)

Basically any category can be coordinated.

Example: the gorillas eat the bananas in the cage and in the garden

PP --> { …

| PP: ! $ ^

CONJ

PP: ! $ ^

}.

Page 5: Grammar Engineering: Coordination and Macros METARULEMACRO Interfacing finite-state morphology

Coordination (cont’d)Coordination (cont’d)

How can we capture these generalizations?

Via regular-expression macros!

SC-COORD(CAT) = CAT: ! $ ^;

CONJ

CAT: ! $ ^.

PP --> { ...

| @(SC-COORD PP)

}.

Page 6: Grammar Engineering: Coordination and Macros METARULEMACRO Interfacing finite-state morphology

Nominal coordinationNominal coordination NP, N, etc. coordination is special because the

NUM attribute should typically have the value pl even when the individual set members are in the sg.

Examples: Mary and the gorilla like bananas.

The boys and girls like bananas.

Page 7: Grammar Engineering: Coordination and Macros METARULEMACRO Interfacing finite-state morphology

Nominal coordination (cont’d)Nominal coordination (cont’d)

NP-COORD(CAT) = CAT: ! $ ^;

CONJ: ^ = !

(^ NUM) = pl;

CAT: ! $ ^.

NP --> { ...

| @(NP-COORD NP)

}.

N --> { ...

| @(NP-COORD N)

}.

Page 8: Grammar Engineering: Coordination and Macros METARULEMACRO Interfacing finite-state morphology

METARULEMACROMETARULEMACRO Macros are nice But can‘t we do better? After all, it‘s pretty tedious to go into almost all

rules and invoke either the SC-COORD or the NP-COORD macro

XLE has a special macro called the METARULEMACRO

Every rule goes through the METARULEMACRO unless specified otherwise

Page 9: Grammar Engineering: Coordination and Macros METARULEMACRO Interfacing finite-state morphology

METARULEMACRO (cont’d)METARULEMACRO (cont’d) Takes three arguments: _CAT, _BASECAT,

and _RHS _CAT is the category on the left-hand side of

the rule _BASECAT is the same as _CAT unless you

are dealing with a complex-category rule _RHS is the right-hand side of the rule

Page 10: Grammar Engineering: Coordination and Macros METARULEMACRO Interfacing finite-state morphology

METARULEMACRO (cont’d)METARULEMACRO (cont’d)

METARULEMACRO(_CAT _BASECAT _RHS)=

{ _RHS

| e: _CAT $ { N NP };

@(NP-COORD _CAT)

| e: _CAT ~$ { N NP };

@(SC-COORD _CAT)

}.

Page 11: Grammar Engineering: Coordination and Macros METARULEMACRO Interfacing finite-state morphology

Interfacing finite-state transducersInterfacing finite-state transducers Maintaining a full-form lexicon is tedious Many lexicon entries look alike Is there a way to get the information about the

category of a word from somewhere, ideally along with information about morphosyntactic categories such as tense, mood, case, number, person, etc?

Finite-state morphologies!

Page 12: Grammar Engineering: Coordination and Macros METARULEMACRO Interfacing finite-state morphology

Interfacing finite-state transducersInterfacing finite-state transducers Cascade of finite-state transducers used is

specified in MORPHOLOGY section At least two subsections:

– TOKENIZE– ANALYZE

By default, the transducers listed are used both for parsing and for generation

This behavior can be altered by prefixing the names of transducer files with P! or G!

Page 13: Grammar Engineering: Coordination and Macros METARULEMACRO Interfacing finite-state morphology

TokenizationTokenization So far, only white spaces are considered as

token boundaries However, there are more kinds of token

boundaries in real-word text– Punctuation has to be split off the preceding token– Some white spaces should not be treated as token

boundaries, e.g. “Sri Lanka”– Upper-case letters at sentence beginnings should

optionally be lower-cased

A finite-state tokenizer takes care of these things

Page 14: Grammar Engineering: Coordination and Macros METARULEMACRO Interfacing finite-state morphology

Finite-state morphologiesFinite-state morphologies Map surface forms to canonical form (lemma)

and series of morphological tags

Examples:

rode ride +Verb +PastTense +123P

rides ride +Verb +Pres +3sg

ride +Noun +Pl

children child +Noun +Pl

Page 15: Grammar Engineering: Coordination and Macros METARULEMACRO Interfacing finite-state morphology

Interfacing Finite-state MorphologyInterfacing Finite-state Morphology Morphological tags need to be listed in the

lexicon– Sublexical lexicon entries look like regular lexicon

entries– Difference: morphcode xle instead of *

Lemmas with non-predictable subcategorization frames must be listed in the lexicon

Other lemmas can be dealt with by the -unknown entry

Page 16: Grammar Engineering: Coordination and Macros METARULEMACRO Interfacing finite-state morphology

Lexicon entries for morphology Lexicon entries for morphology outputoutput

+Verb V-POS XLE .

+Pres TNS XLE @VPRES.

+3sg PERS XLE @S-AGR.

wait V-S XLE (^ PRED)= ‘wait<(^SUBJ)(^OBL)>’.

-unknown A-S XLE @(PRED %stem);

N-S XLE @(PRED %stem).

Page 17: Grammar Engineering: Coordination and Macros METARULEMACRO Interfacing finite-state morphology

Interfacing Finite-state MorphologyInterfacing Finite-state Morphology

Morphology output needs to be parsed by sublexical rules– Look like regular rules– Have f-annotations like regular rules– Difference: Sublexical categories are marked with

the suffix _BASE

Page 18: Grammar Engineering: Coordination and Macros METARULEMACRO Interfacing finite-state morphology

Interfacing Finite-state MorphologyInterfacing Finite-state Morphology

V --> V-S_BASE

V-POS_BASE

{ TNS_BASE

PERS_BASE

| ASP_BASE

}.

Page 19: Grammar Engineering: Coordination and Macros METARULEMACRO Interfacing finite-state morphology

XLE Lookup ModelXLE Lookup Model Only one entry per headword per lexicon

section Same headword may be covered by an explicit

entry and by -unknown entry In order to allow this, we need to mark the

explicit entry with ; ETC

sleep V-S @(INTRANS sleep); ETC.

-unknown N-S @(PRED %stem).