Parsing with Morphological Information for Treebank Construction
Seth Kulick, University of Pennsylvania
Outline
Motivation: Parsing for Treebanking
• Experience at Penn
• Issues for languages with more morphology
Parsing with Morphological Features
• Generative Parsers
• Discriminative Parsers
Conclusions, Issues, Questions
Parsing for Treebanking
Major area of NLP over last decade – development of statistical parsers
• Trained on treebank (sentence, tree) pairs
• Output most likely parse for new sentence
• Handling ambiguity
Utility for treebanking
• Quicker for an annotator to correct parser output than to create the structure from scratch (unless the parser is really bad!)
How Successful are Parsers?
What are parsers used for?
• Generate conference papers
—Evaluation can be done in isolation
• As step in NLP pipeline for information extraction systems, etc.
—Harder to evaluate across environments
• For Treebank construction
—Not a focus
How “Successful” are Parsers?
(Phrase-Structure) Parser evaluation
• Trained and tested on Penn Treebank
Based on matching brackets
• Treebank differences can be significant
Goal is only a “skeletal” tree
• Doesn’t test everything we need for treebank construction
What do We Want from a Parser?
(NP (NP answers) (SBAR (WHNP-6 that) (S (NP-SBJ-3 we) (VP 'd (VP like (S (NP-SBJ *-3) (VP to (VP have (NP *T*-6)))))))))
(Example from Penn Treebank)
What do We Want from a Parser?
(NP (NP answers) (SBAR (WHNP that) (S (NP we) (VP 'd (VP like (VP to (VP have)))))))
(Example from Penn Treebank)
• It's perfect! (by standard bracket evaluation, even though the function tags and empty categories are missing)
Improved Parsing for Treebanking
Some work on recovering empty categories (evaluation is not easy)
• Johnson '02, Levy & Manning '04, Campbell '04
Some work on function tags
• Blaheta ’03, Musillo & Merlo ‘05
Function tags and empty categories
• Gabbard, Kulick, Marcus '05 (still PTB-centric)
Experience at Penn building Treebanks
Parser provides function tags and empty categories
• Various Penn Treebank-style (English) treebanks
Parser provides function tags, not empty categories
• Arabic Treebank, Historical Corpora
• But tags are significantly different for historical corpora – no evaluation yet
Open Questions from Penn Experience
Improve parsing for treebanking
• Improve skeletal parse (Arabic, historical)
• How well can we recover function tags?
• How well can we recover empty categories?
How are these issues different for languages with more morphology?
Open Questions from Penn Experience
What problems arise for parsers from languages with more inflection, more free word order?
• Greater number of word forms and part-of-speech tags
• Correctly identifying arguments of a verb
What frequent syntactic phenomena can the parsers not handle well?
• Movement out of clause
Parsers: Generative vs. Discriminative
Generative
• Computationally reasonable (but still complicated!)
• Less flexibility on using features
• Examples: Collins (’99), Bikel (’04)
Discriminative
• Computationally harder
• Flexibility on using features
• Examples: Multilingual Dependency Parser, Reranking
Generative Parsers - Outline
Main properties, overview of Collins model
Greater morphology -> tagset games
• Spanish, Czech, Arabic
• Gets hard to “order” the information
More free Word Order -> ?
• Subcat frames – hasn’t been done(?)
• Problem of Long-distance movement
Generative Parsers
Decompose tree into parsing decisions.
Generate a tree node-by-node, top-down
Based on a Probabilistic CFG
• Nice computational properties
• Horrendous independence assumptions!
Additional annotation to get better dependencies…
And then deal with sparse data issues
PCFG Example: Lack of sensitivity to lexical information
[Slide diagram: a bare, unlexicalized PCFG tree fragment (S over NP and VP, with a V and an unresolved "?"), showing that nothing in the rule refers to the words themselves]
One solution (Collins and others):
• Add head (word, POS) to each nonterminal
• Argument marking and pseudo-subcategorization frames
Collins’ Modified PCFG
New Problem: Sparse Data
[Slide diagram: fully lexicalized tree, S(bought,V) → NP(yesterday,N) NP-A(IBM,N) VP(bought,V), with VP(bought,V) → V(bought) NP-A(Lotus,N)]
Collins: Dealing with Sparse Data
Decompose Tree: A head-centered, top-down derivation
Independence Assumption: Each word has an associated sub-derivation:
• Head-projection
• Subcategorization (careful!)
• Placement of "modifiers" (arguments and adjuncts)
• Lexical Dependencies
Top-down derivation
Generate head and subcat frames for left, right
• S(bought,V) → VP(bought,V)
• Left: {NP-A}  Right: {}
Top-down derivation
Generate "modifier" with POS tag
Generated as argument, so check off subcat entry (the verb has a subject)
• S(bought,V) → NP-A(N) VP(bought,V)
Top-down derivation
Generate head word of modifier
Recursively generate modifier's subderivation
• S(bought,V) → NP-A(IBM,N) VP(bought,V)
Top-down derivation
Skipping the subderivation of the subject...
• NP-A(IBM,N) → N(IBM)
Top-down derivation
Generate another modifier with POS tag
• S(bought,V) → NP(N) NP-A(IBM,N) VP(bought,V)
Top-down derivation
And then its head word
• S(bought,V) → NP(yesterday,N) NP-A(IBM,N) VP(bought,V)
Top-down derivation
And the recursive derivation of that modifier
Small locality of dependencies at each step
• NP(yesterday,N) → N(yesterday)
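To make the decomposition concrete, here is a minimal sketch (with invented probability tables standing in for the trained parameter classes of a Collins-style model; this is an illustration, not Collins' actual parameterization) of how the one-level tree just derived could be scored as a product of head, subcat, and modifier decisions:

    import math

    # Hypothetical, hand-filled probability tables (illustration only).
    P_head   = {("VP", "S", ("bought", "V")): 0.7}           # P(head child | parent, head word/tag)
    P_subcat = {(frozenset(["NP-A"]), "left"): 0.6,          # P(left subcat frame | parent, head)
                (frozenset(), "right"): 0.9}                 # P(right subcat frame | parent, head)
    P_mod    = {("NP-A", ("IBM", "N"), "left"): 0.05,        # P(modifier label, word/tag | head, direction)
                ("NP", ("yesterday", "N"), "left"): 0.01,
                ("STOP", None, "left"): 0.8,
                ("STOP", None, "right"): 0.9}

    def score_one_level():
        """Score S(bought,V) -> NP(yesterday,N) NP-A(IBM,N) VP(bought,V)
        as a product of small, (assumed-)independent decisions."""
        logp = 0.0
        logp += math.log(P_head[("VP", "S", ("bought", "V"))])       # generate head child VP
        logp += math.log(P_subcat[(frozenset(["NP-A"]), "left")])    # left subcat {NP-A}
        logp += math.log(P_subcat[(frozenset(), "right")])           # right subcat {}
        logp += math.log(P_mod[("NP-A", ("IBM", "N"), "left")])      # subject, checks off NP-A
        logp += math.log(P_mod[("NP", ("yesterday", "N"), "left")])  # adjunct modifier
        logp += math.log(P_mod[("STOP", None, "left")])              # stop generating on the left
        logp += math.log(P_mod[("STOP", None, "right")])             # stop on the right
        return logp

    print(score_one_level())

The point is only that each factor conditions on a small amount of local context, which is what makes the backoff story on the next slides possible.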
Backoff models to deal with sparse data
Lexicalization to sneak in some notion of linguistic locality.
Huge sparse data problem: rules like S(bought,V) → NP(yesterday,N) NP-A(IBM,N) VP(bought,V)
Decomposed the tree into a series of tree creation/attachment decisions
• A "history-based" model
But there’s still a sparse data problem…
Backoff models to deal with sparse data
How often will we see evidence of this?
P(IBM,n,NP | bought,v,S,VP)
Need to back off to use just the POS tag:
P(IBM,n,NP | v,S,VP)
POS tags – bootstrap syntactic parsing
• classify words based on syntactic behavior
[Tree fragment: S(bought,V) → NP-A(N) VP(bought,V)]
Tagset Games – Spanish (Cowan & Collins, 2005)
Plural subject (gatos), singular verb (corrio)
• How come this isn't ruled out?
P(gatos,n,NP | corrio,v,S,VP) = P1(n,NP | corrio,v,S,VP) x P2(gatos | n,NP,corrio,v,S,VP)
• Time for the backoff models…
[Tree fragment: S(corrio,v) → NP(gatos,n) VP(corrio,v)]
Tagset Games – Spanish (Cowan & Collins, 2005)
P(gatos,n,NP | corrio,v,S,VP) = P1(n,NP | corrio,v,S,VP) x P2(gatos | n,NP,corrio,v,S,VP)
P1(n,NP | corrio,v,S,VP) = λ1,1 P1,1(n,NP | corrio,v,S,VP) + λ1,2 P1,2(n,NP | v,S,VP) + λ1,3 P1,3(n,NP | S,VP)
If corrio not seen (P1,1), use evidence without lexical dependence (P1,2)
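A minimal sketch of this kind of interpolated backoff; the λ weights and the individual estimates below are invented placeholders, not Cowan & Collins' actual values:

    def interpolated(estimates, lambdas):
        """Back off through increasingly less specific estimates,
        weighting each by a confidence lambda; lambdas sum to 1."""
        return sum(lam * p for lam, p in zip(lambdas, estimates))

    # P1(n,NP | corrio,v,S,VP): three levels of backoff, as on the slide.
    p_1_1 = 0.0    # conditioned on the word "corrio" -- unseen, so no evidence
    p_1_2 = 0.3    # conditioned only on the POS tag of the head verb
    p_1_3 = 0.25   # conditioned only on the nonterminals
    lambdas = [0.0, 0.7, 0.3]   # low weight on the unseen lexical estimate

    print(interpolated([p_1_1, p_1_2, p_1_3], lambdas))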
Tagset Games – Spanish (Cowan & Collins, 2005)
P(gatos,n,NP | corrio,v,S,VP) = P1(n,NP | corrio,v,S,VP) x P2(gatos | n,NP,corrio,v,S,VP)
P2(gatos | n,NP,corrio,v,S,VP) = λ2,1 P2,1(gatos | n,NP,corrio,v,S,VP) + λ2,2 P2,2(gatos | n,NP,v,S,VP) + λ2,3 P2,3(gatos | n)
If corrio not seen (P2,1), use evidence without lexical dependence (P2,2)
Tagset Games – Spanish (Cowan & Collins, 2005)
P2(gatos | n,NP,corrio,v,S,VP) = λ2,1 P2,1(gatos | n,NP,corrio,v,S,VP) + λ2,2 P2,2(gatos | n,NP,v,S,VP) + λ2,3 P2,3(gatos | n)
If corrio not seen (P2,1), use evidence without lexical dependence (P2,2)
But P2,1 is the only probability that rules it out
• even worse – even if it was seen, it was probably not seen often, so P2,2 will overwhelm it
Tagset Games – Spanish (Cowan & Collins, 2005)
“The impoverished model can only capture morphological restrictions through lexically-specific estimates based on extremely sparse statistics”
But suppose the noun and verb part of speech tags had number information.
Tagset Games – Spanish (Cowan & Collins, 2005)
P1(pn,NP | corrio,sv,S,VP) = λ1,1 P1,1(pn,NP | corrio,sv,S,VP) + λ1,2 P1,2(pn,NP | sv,S,VP) + λ1,3 P1,3(pn,NP | S,VP)
pn = plural noun, sv = singular verb
P1,2 will be very low, with high confidence λ1,2
They tried a variety of ways to play with the tagset.
Tagset Games – Spanish (Cowan & Collins, 2005)
Scores not additive – sparse data?
Best model, number(A,D,N,P,V)+mode(V), helps with:
• Finding subjects
• Distinguishing infinitival and gerund VPs
• Attaching NP and PP postmodifiers to verbs
Baseline 81.0
number(Adj,Det,Noun,Pronoun,Verb) 82.8
mode(V) 82.4
person(V) 82.4
number(A,D,N,P,V)+mode(V) 83.5
number(A,D,N,P,V)+mode(V)+person(V) 83.2
Tagset Games – Czech (Collins, Hajic, Ramshaw, Tillmann 1999)
Convert from dependency to phrase structure
Baseline: use the main POS of each tag
• NNMP1-----A-- (noun, masculine, plural, nominative, "affirmative" negativeness) mapped to N
Two-letter tag: main POS plus either detailed POS (for D,J,V,X) or Case: 58 tags
Richer tagsets -> no improvement, "presumably" because of "damage from sparse data"
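As an illustration of the kind of tag reduction described here, a rough sketch using the positional layout of Czech tags (position 1 = main POS, position 2 = detailed POS, position 5 = case); the actual mapping in the paper may differ in details:

    def reduce_czech_tag(tag, two_letter=True):
        """Map a full positional tag like 'NNMP1-----A----' down to a small tagset:
        main POS alone, or main POS plus detailed POS (for D,J,V,X) or case."""
        main_pos = tag[0]
        if not two_letter:
            return main_pos                  # baseline: main POS only
        if main_pos in "DJVX":
            return main_pos + tag[1]         # main POS + detailed POS
        return main_pos + tag[4]             # main POS + case

    print(reduce_czech_tag("NNMP1-----A----"))                    # -> 'N1' (noun, nominative)
    print(reduce_czech_tag("NNMP1-----A----", two_letter=False))  # -> 'N'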
Tagset Games – Arabic Treebank (Bikel 2004; Kulick, Gabbard, Marcus 2006)
Lots of tags
• Usual sparse data problem
• Bikel: can even get new tags not seen in training
Map them down (DET+ADJ+NSUFF_FEM_SG+CASE_DEF_ACC → JJ)
The "Bies tag set" – it was just a quick hack!
(Kulick et al.) – keep the determiner at least (DET+ADJ+NSUFF_FEM_SG+CASE_DEF_ACC → DT+JJ)
The Case endings mostly aren't really there
Maybe: Case information to identify heads of constituents (e.g., ADJ heading NP)
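A sketch of the two reductions mentioned above; the string handling is illustrative only, and the real treebank mapping tables are more involved:

    def bies_style(full_tag):
        """Collapse a full Arabic Treebank tag to a single reduced tag,
        dropping the determiner and the case/inflection suffixes."""
        core = [p for p in full_tag.split("+")
                if not p.startswith(("NSUFF", "CASE", "DET"))]
        return {"ADJ": "JJ", "NOUN": "NN"}.get(core[0], core[0])

    def keep_determiner(full_tag):
        """Kulick et al.-style reduction: same collapse, but keep DET as DT+."""
        reduced = bies_style(full_tag)
        return ("DT+" + reduced) if full_tag.startswith("DET+") else reduced

    tag = "DET+ADJ+NSUFF_FEM_SG+CASE_DEF_ACC"
    print(bies_style(tag))        # -> 'JJ'
    print(keep_determiner(tag))   # -> 'DT+JJ'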
Tagset Games – Conclusion (still the same as in Collins '99)
More tag information w/o sparse data?
• P(modifier POS | head POS)
• Difficulty in doing this is motivation for other parsing models
Lots of word forms w/o sparse data?
• P(word-form | word-stem, POS tag)
Another question – for parsing, how important is such information compared to Case?
• Where does it help disambiguate the parse?
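For the P(word-form | word-stem, POS tag) idea, a minimal sketch (with invented counts and a made-up stem key) of how counts shared across a stem could soften the sparse-data problem for inflected forms:

    from collections import Counter

    # Hypothetical training counts: (stem, POS) -> observed full forms.
    counts = {("corr", "v"): Counter({"corrio": 3, "corrieron": 1})}

    def p_form_given_stem_pos(form, stem, pos, vocab_size=1000, alpha=1.0):
        """P(word-form | stem, POS) with simple add-alpha smoothing, so an
        unseen inflection of a seen stem still gets some probability mass."""
        c = counts.get((stem, pos), Counter())
        return (c[form] + alpha) / (sum(c.values()) + alpha * vocab_size)

    print(p_form_given_stem_pos("corrio", "corr", "v"))    # seen form
    print(p_form_given_stem_pos("corremos", "corr", "v"))  # unseen inflection, still nonzero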
What about Free Word Order?
Czech work
• mentioned as a problem, nothing done
May not have mattered for evaluation?
• If not scoring SBJ, OBJ, etc. labels, so what if the parser doesn't know what's what?
What about Free Word Order?
“Subcat” frame between each level
[Tree fragment: S(bought,V) → NP-A(N) VP(bought,V)]
Obvious thing to try: integrate Case assignment into subcat frames
• Verb requires NOM instead of NP-A, etc.
• Alluded to in Collins 2003, not done (but Zeman 2002 for dependency Czech parsing)
Ease the problem of using other morph info?
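A speculative sketch, not an existing implementation, of what a Case-based subcat frame might look like: the frame records required cases as a multiset that gets checked off as case-marked arguments are generated, in place of the purely categorial {NP-A}:

    from collections import Counter

    def check_off(subcat, modifier_case):
        """Return (is_argument, remaining_subcat): if the modifier's case is still
        required by the frame, treat it as an argument and remove that requirement."""
        if subcat[modifier_case] > 0:
            remaining = subcat.copy()
            remaining[modifier_case] -= 1
            return True, remaining
        return False, subcat          # otherwise treat it as an adjunct

    frame = Counter({"NOM": 1, "ACC": 1})      # verb requires a nominative and an accusative
    is_arg, frame = check_off(frame, "NOM")    # nominative subject found, checked off
    print(is_arg, dict(frame))                 # True {'NOM': 0, 'ACC': 1}
    is_arg, frame = check_off(frame, "DAT")    # dative NP: not required -> adjunct
    print(is_arg, dict(frame))                 # False {'NOM': 0, 'ACC': 1}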
What about Free Word Order?
Problem: "subcat" frames as implemented are near-meaningless
Independent horizontally
• Left and right independent
Independent vertically between one-level trees
• Sisters of VP are independent of sisters of V
Can this be fixed?
• Requires some greater amount of history between one-level trees, based on head percolation
Long-distance movement?
Not handled in most generative parsers
• Exception: CCG (Hockenmaier 2003)
Postprocessing:
• Who do you think John saw
• Who_i do you think John saw t_i
"Good enough", or way to integrate into parsing?
• "Good enough" for languages with more long-distance movement?
Discriminative Parsers - Outline
Basic idea and main properties
Dependency parsers –
• Easier handling of morphology and free word order – how successful?
• Long movement – “non-projective” – how successful?
Discriminative Phrase-Structure Parsers
• Handling of morphology and free word order – hasn’t really been tried
Discriminative Parsers
Conditional P(T|S) instead of joint P(T,S)
• Training requires reparsing training corpus to update parameters
• Can take a long time!
Easier to utilize dependent features
• Successfully used in other aspects of NLP
• How about for parsing? – computational problem
Discriminative Parsers
Dependency Parsing
• Not as computationally hard, still useful
Post-Processing: Parse reranking
• Just work with output of k-best generative parser
Phrase Structure Parsing
• Limited to sentences of length <=15
• Lots of pruning, doesn’t outperform generative parsers (but can still be promising for using morphology)
Multi-Lingual Dependency Parsing
CoNLL Shared Task, 2006, 2007
High-performing system: McDonald '06
Unlabelled Parsing
• Projective or non-projective ("long" movement)
• Still requires factoring the parsing problem
• Features between (head, modifier, previous modifier) – but within that can be very flexible
Labelled Parsing
• Postprocessing stage to add labels
• Features not limited
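A rough sketch of the kind of edge-factored features this allows; the feature templates below are made up for illustration (McDonald's system uses many more, including prefix/suffix and morphological cross-products):

    def edge_features(sent, head, mod, prev_mod=None):
        """Features for scoring a candidate dependency edge head -> mod,
        optionally conditioned on the previously attached modifier."""
        h, m = sent[head], sent[mod]
        feats = [
            "hpos=%s_mpos=%s" % (h["pos"], m["pos"]),
            "hword=%s_mpos=%s" % (h["form"], m["pos"]),
            "hpos=%s_mcase=%s" % (h["pos"], m.get("case", "NONE")),   # morphological feature
            "dir=%s_dist=%d" % ("R" if mod > head else "L", abs(mod - head)),
        ]
        if prev_mod is not None:
            feats.append("hpos=%s_prevmpos=%s_mpos=%s" %
                         (h["pos"], sent[prev_mod]["pos"], m["pos"]))
        return feats

    sent = [{"form": "gatos", "pos": "n", "case": "NOM"},
            {"form": "corrio", "pos": "v"}]
    print(edge_features(sent, head=1, mod=0))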
Parsing Results
Language          Regular  Projective  No-morph
Arabic (real)        66.9        66.9      65.1
Bulgarian            87.6        87.6      87.2
Chinese              85.9
Czech (real)         80.2
Danish (real)        84.8        84.1      84.0
Dutch                79.2        74.7      77.8
English              89.4
German               87.3
Japanese             90.7        90.7      90.6
Portuguese           86.8        87.0      85.7
Slovene (real)       73.4        72.6      71.5
Spanish              82.3        82.3      80.9
Swedish (real)       82.6        82.6      82.6
Turkish (real)       63.2        63.2      60.6
Effects of non-projectivity
Language          Regular  Projective  No-morph
Arabic (real)        66.9        66.9      65.1
Bulgarian            87.6        87.6      87.2
Danish (real)        84.8        84.1      84.0
Dutch                79.2        74.7      77.8
Japanese             90.7        90.7      90.6
Portuguese           86.8        87.0      85.7
Slovene (real)       73.4        72.6      71.5
Spanish              82.3        82.3      80.9
Swedish (real)       82.6        82.6      82.6
Turkish (real)       63.2        63.2      60.6
Effects of morphology
Language          Regular  Projective  No-morph
Arabic (real)        66.9        66.9      65.1
Bulgarian            87.6        87.6      87.2
Danish (real)        84.8        84.1      84.0
Dutch                79.2        74.7      77.8
Japanese             90.7        90.7      90.6
Portuguese           86.8        87.0      85.7
Slovene (real)       73.4        72.6      71.5
Spanish              82.3        82.3      80.9
Swedish (real)       82.6        82.6      82.6
Turkish (real)       63.2        63.2      60.6
Dependency Parsing
Improvement with morphology
• Effect of different types (Case, inflection?)
Improvement with freer word-order?
• Local freer word-order reflected in labeled accuracy – how does it compare?
Improvement with non-local movement?
• Anything is an improvement
• Czech, all sentences: 85.2; only sentences with a nonprojective dependency: 81.9 (unlabeled!)
Parse Reranking
Work with output of k-best parser
• Limited problem, computationally easier
• No tree decomposition, arbitrary features (e.g., trigrams with heads of arguments of PPs)
Spanish: 83.5 to 85.1 with reranking
• Cowan & Collins – same reranking features as for English
Reranking incorporating morphological features?
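A minimal sketch of the reranking setup: score each parse in the k-best list with a linear model over arbitrary features (here just the generative log-probability plus one invented agreement feature) and take the argmax; the feature names and weights are placeholders:

    def rerank(kbest, weights):
        """kbest: list of (tree, generative_logprob, feature_dict).
        Return the tree with the highest linear rerank score."""
        def score(item):
            tree, logprob, feats = item
            s = weights.get("gen_logprob", 1.0) * logprob
            s += sum(weights.get(f, 0.0) * v for f, v in feats.items())
            return s
        return max(kbest, key=score)[0]

    # Two candidate parses: the second violates subject-verb number agreement.
    kbest = [("tree_agreeing",   -42.0, {"subj_verb_number_match": 1.0}),
             ("tree_disagreeing", -41.5, {"subj_verb_number_match": 0.0})]
    weights = {"gen_logprob": 1.0, "subj_verb_number_match": 2.0}
    print(rerank(kbest, weights))   # -> 'tree_agreeing'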
Discriminative Phrase-Structure Parsing
Which discriminative parsers parse the usual set of phrase-structure sentences?
• Ratnaparkhi '98
— History-based, "local" features
• Collins & Roark '04
— Incremental parsing, "global" features
• Shen '06 – Tree Adjoining Grammar-based
— Related to the previous
— Different approach to non-projectivity
— But modifies trees severely
• Others? (CCG?)
Incremental Parsing – (Collins & Roark ’04)
Main properties
• About the same performance as generative
• Severe pruning of possible parses for a sentence
• Features same as generative
• Can use the generative score as a feature
Possibilities:
• Use morphological features
• Free word order, non-local dependencies?
• More tightly integrate generative and discriminative
—This should be done
Issues, Questions, Conclusions
Not the question:
• How can a parser work with a language with lots of morphology?
Issues, Questions, Conclusions
The question(s):
• What do we want the annotation to look like?
— An independent question
• Based on what we know of how parsers work, what will the parser have problems recovering?
— We can (mostly) answer this
• Where might morphological information be valuable (or not)?
— We can speculate about this
• What approaches should we use?
— It depends on the above
Some Questions from Talk
Sparse Data from greater morphology
• Can the tagset game be sufficient for utilizing what is valuable?
• How to back off from sparse word forms when there is more inflection?
Free Word Order: Can Case be integrated into subcat frames?
• For generative (need better subcat frames)
• For discriminative (need subcat frames?)
Function Tags/Labelled Dependency
• Good enough for what's needed?
Some Questions from Talk
Empty category recovery
• Features to use with morphology?
Long-distance movement
• Can be hacked into the generative model?
• How adequate are dependency parsers?
• How usable is the TAG-based approach?
• How much do we care for parsing?
Is reranking a reasonable approach?
• Make sense to throw morphological features in here?
Combining Solutions
Generative parser as input for discriminative
• Incremental parser (Collins & Roark)
• Rerankers
Better input for the parser by preprocessing
• Dependency parser as input for generative
• Chunk using morphology as input for either (maybe don't need to parse the entire sentence?)
Tighter integration?
• Move into discriminative mode at key points inside generative model
Extra Slides
The following slides don’t count
Issues for Discussion
Sneaking discriminative info into generative model
Dependency parsing as preprocessing constraint for phrase-structure parsing
How much does morphology matter for Icelandic anyway?
Will the treebank be phrase-structure or dependency?
The world could really use a high-quality phrase-structure treebank with lots of morphology
Issues for Discussion
Preprocessing constituent bracketing from Case info
• Do we even really need to parse the whole sentence?
Arabic hypothetical example of using Case information to identify heads before parsing
Linguistically interesting way to order morph info in generative model?
Issues for Discussion
And can't forget – what about morphology and function tags and empty categories?
How helpful will the morphology be? Examples?
Again, this depends on what the treebank will look like
Alternative Approach #2
Hack around with the parser to allow a little “discriminative” modelling to sneak in at key points. Probably need to save this point for after the end.
Generative Parsing - Summary
Pretty good at
• Skeletal structure for English
• Recovering function tags for the Penn Treebank
• Post-processing empty category recovery
Not good at (or unknown)
• Integrating complex POS/morph tags
• Long-distance movement
• Free word order (but perhaps integrated into post-processing)
Another note on Function Tags
Treebanker, faced with a function tag:
• If correct, nothing to do
• If incorrect, delete it and assign new one
• If none, add if necessary
Want to increase the precision to where the tags can be assumed to be correct
Possible for some tags – e.g., SBJ
Be wary of overall numbers