grammar engineering: coordination and macros metarulemacro interfacing finite-state morphology
DESCRIPTION
Grammar Engineering: Coordination and Macros METARULEMACRO Interfacing finite-state morphology. Miriam Butt (University of Konstanz) and Martin Forst ( NetBase Solutions). Colombo 2014. Coordination. Every attribute can only have one value - PowerPoint PPT PresentationTRANSCRIPT
Grammar Engineering:Grammar Engineering:Coordination and MacrosCoordination and MacrosMETARULEMACROMETARULEMACROInterfacing finite-state morphologyInterfacing finite-state morphology
Miriam Butt Miriam Butt (University of Konstanz)(University of Konstanz)and and Martin Forst (NetBase Solutions)Martin Forst (NetBase Solutions)
Colombo 2014Colombo 2014
CoordinationCoordination Every attribute can only have one value So what do we do with coordinated
constituents?
Example: gorillas sleep and eatVP --> { …
| VP: ! $ ^
CONJ
VP: ! $ ^
}.
Coordination (cont’d)Coordination (cont’d) Coordination can happen basically at any level
of the c-structure.
Example: the gorillas peel and eat the bananasV --> { …
| V: ! $ ^
CONJ
V: ! $ ^
}.
Coordination (cont’d)Coordination (cont’d)
Basically any category can be coordinated.
Example: the gorillas eat the bananas in the cage and in the garden
PP --> { …
| PP: ! $ ^
CONJ
PP: ! $ ^
}.
Coordination (cont’d)Coordination (cont’d)
How can we capture these generalizations?
Via regular-expression macros!
SC-COORD(CAT) = CAT: ! $ ^;
CONJ
CAT: ! $ ^.
PP --> { ...
| @(SC-COORD PP)
}.
Nominal coordinationNominal coordination NP, N, etc. coordination is special because the
NUM attribute should typically have the value pl even when the individual set members are in the sg.
Examples: Mary and the gorilla like bananas.
The boys and girls like bananas.
Nominal coordination (cont’d)Nominal coordination (cont’d)
NP-COORD(CAT) = CAT: ! $ ^;
CONJ: ^ = !
(^ NUM) = pl;
CAT: ! $ ^.
NP --> { ...
| @(NP-COORD NP)
}.
N --> { ...
| @(NP-COORD N)
}.
METARULEMACROMETARULEMACRO Macros are nice But can‘t we do better? After all, it‘s pretty tedious to go into almost all
rules and invoke either the SC-COORD or the NP-COORD macro
XLE has a special macro called the METARULEMACRO
Every rule goes through the METARULEMACRO unless specified otherwise
METARULEMACRO (cont’d)METARULEMACRO (cont’d) Takes three arguments: _CAT, _BASECAT,
and _RHS _CAT is the category on the left-hand side of
the rule _BASECAT is the same as _CAT unless you
are dealing with a complex-category rule _RHS is the right-hand side of the rule
METARULEMACRO (cont’d)METARULEMACRO (cont’d)
METARULEMACRO(_CAT _BASECAT _RHS)=
{ _RHS
| e: _CAT $ { N NP };
@(NP-COORD _CAT)
| e: _CAT ~$ { N NP };
@(SC-COORD _CAT)
}.
Interfacing finite-state transducersInterfacing finite-state transducers Maintaining a full-form lexicon is tedious Many lexicon entries look alike Is there a way to get the information about the
category of a word from somewhere, ideally along with information about morphosyntactic categories such as tense, mood, case, number, person, etc?
Finite-state morphologies!
Interfacing finite-state transducersInterfacing finite-state transducers Cascade of finite-state transducers used is
specified in MORPHOLOGY section At least two subsections:
– TOKENIZE– ANALYZE
By default, the transducers listed are used both for parsing and for generation
This behavior can be altered by prefixing the names of transducer files with P! or G!
TokenizationTokenization So far, only white spaces are considered as
token boundaries However, there are more kinds of token
boundaries in real-word text– Punctuation has to be split off the preceding token– Some white spaces should not be treated as token
boundaries, e.g. “Sri Lanka”– Upper-case letters at sentence beginnings should
optionally be lower-cased
A finite-state tokenizer takes care of these things
Finite-state morphologiesFinite-state morphologies Map surface forms to canonical form (lemma)
and series of morphological tags
Examples:
rode ride +Verb +PastTense +123P
rides ride +Verb +Pres +3sg
ride +Noun +Pl
children child +Noun +Pl
Interfacing Finite-state MorphologyInterfacing Finite-state Morphology Morphological tags need to be listed in the
lexicon– Sublexical lexicon entries look like regular lexicon
entries– Difference: morphcode xle instead of *
Lemmas with non-predictable subcategorization frames must be listed in the lexicon
Other lemmas can be dealt with by the -unknown entry
Lexicon entries for morphology Lexicon entries for morphology outputoutput
+Verb V-POS XLE .
+Pres TNS XLE @VPRES.
+3sg PERS XLE @S-AGR.
wait V-S XLE (^ PRED)= ‘wait<(^SUBJ)(^OBL)>’.
-unknown A-S XLE @(PRED %stem);
N-S XLE @(PRED %stem).
Interfacing Finite-state MorphologyInterfacing Finite-state Morphology
Morphology output needs to be parsed by sublexical rules– Look like regular rules– Have f-annotations like regular rules– Difference: Sublexical categories are marked with
the suffix _BASE
Interfacing Finite-state MorphologyInterfacing Finite-state Morphology
V --> V-S_BASE
V-POS_BASE
{ TNS_BASE
PERS_BASE
| ASP_BASE
}.
XLE Lookup ModelXLE Lookup Model Only one entry per headword per lexicon
section Same headword may be covered by an explicit
entry and by -unknown entry In order to allow this, we need to mark the
explicit entry with ; ETC
sleep V-S @(INTRANS sleep); ETC.
-unknown N-S @(PRED %stem).