Parsing with Morphological Information for Treebank Construction
Seth Kulick, University of Pennsylvania
Outline
Motivation: Parsing for Treebanking
• Experience at Penn
• Issues for languages with more morphology
Parsing with Morphological Features
• Generative Parsers
• Discriminative Parsers
Conclusions, Issues, Questions
Parsing for Treebanking
Major area of NLP over last decade – development of statistical parsers
• Trained on treebank (sentence, tree) pairs
• Output most likely parse for new sentence
• Handling ambiguity
Utility for treebanking
• Quicker for an annotator to correct parser output than to create the structure from scratch (unless the parser is really bad!)
How Successful are Parsers?
What are parsers used for?
• Generate conference papers
—Evaluation can be done in isolation
• As step in NLP pipeline for information extraction systems, etc.
—Harder to evaluate across environments
• For Treebank construction
—Not a focus
How “Successful” are Parsers?
(Phrase-Structure) Parser evaluation
• Trained and tested on Penn Treebank
Based on matching brackets
• Treebank differences can be significant
Goal is only a “skeletal” tree
• Doesn’t test everything we need for treebank construction
What do We Want from a Parser?
(NP (NP answers) (SBAR (WHNP-6 that) (S (NP-SBJ-3 we) (VP 'd (VP like (S (NP-SBJ *-3) (VP to (VP have (NP *T*-6)))))))))
(Example from Penn Treebank)
What do We Want from a Parser?
(NP (NP answers) (SBAR (WHNP that) (S (NP we) (VP 'd (VP like (VP to (VP have)))))))
(Example from Penn Treebank)
• It's perfect! (by standard bracket evaluation, even though the function tags and empty categories are missing)
Improved Parsing for Treebanking
Some work on recovering empty categories (evaluation is not easy)
• Johnson '02, Levy & Manning '04, Campbell '04
Some work on function tags
• Blaheta ’03, Musillo & Merlo ‘05
Function tags and empty categories
• Gabbard, Kulick, Marcus '05 (still PTB-centric)
Experience at Penn building Treebanks
Parser provides function tags and empty categories
• Various Penn Treebank-style (English) treebanks
Parser provides function tags, not empty categories
• Arabic Treebank, Historical Corpora
• But tags are significantly different for historical corpora – no evaluation yet
Open Questions from Penn Experience
Improve parsing for treebanking
• Improve skeletal parse (Arabic, historical)
• How well can we recover function tags?
• How well can we recover empty categories?
How are these issues different for languages with more morphology?
Open Questions from Penn Experience
What problems arise for parsers from languages with more inflection, more free word order?
• Greater number of word forms and part-of-speech tags
• Correctly identifying arguments of a verb
What frequent syntactic phenomena can the parsers not handle well?
• Movement out of clause
Parsers: Generative vs. Discriminative
Generative
• Computationally reasonable (but still complicated!)
• Less flexibility on using features
• Examples: Collins (’99), Bikel (’04)
Discriminative
• Computationally harder
• Flexibility on using features
• Examples: Multilingual Dependency Parser, Reranking
Generative Parsers - Outline
Main properties, overview of Collins model
Greater morphology -> tagset games
• Spanish, Czech, Arabic
• Gets hard to “order” the information
More free Word Order -> ?
• Subcat frames – hasn’t been done(?)
• Problem of Long-distance movement
Generative Parsers
Decompose tree into parsing decisions.
Generate a tree node-by-node, top-down
Based on a Probabilistic CFG
• Nice computational properties
• Horrendous independence assumptions!
Additional annotation to get better dependencies…
And then deal with sparse data issues
PCFG Example: Lack of sensitivity to lexical information
[Slide diagram: a bare, unlexicalized PCFG tree fragment (S over NP and VP, with a V and an unresolved "?"), showing that nothing in the rule refers to the words themselves]
One solution (Collins and others):
• Add head (word, POS) to each nonterminal
• Argument marking and pseudo-subcategorization frames
Collins’ Modified PCFG
New Problem: Sparse Data
[Slide diagram: fully lexicalized tree, S(bought,V) → NP(yesterday,N) NP-A(IBM,N) VP(bought,V), with VP(bought,V) → V(bought) NP-A(Lotus,N)]
Collins: Dealing with Sparse Data
Decompose Tree: A head-centered, top-down derivation
Independence Assumption: Each word has an associated sub-derivation:
• Head-projection
• Subcategorization (careful!)
• Placement of "modifiers" (arguments and adjuncts)
• Lexical Dependencies
Top-down derivation
Generate head and subcat frames for left, right
• S(bought,V) → VP(bought,V)
• Left: {NP-A}  Right: {}
Top-down derivation
Generate "modifier" with POS tag
Generated as argument, so check off subcat entry (the verb has a subject)
• S(bought,V) → NP-A(N) VP(bought,V)
Top-down derivation
Generate head word of modifier
Recursively generate modifier's subderivation
• S(bought,V) → NP-A(IBM,N) VP(bought,V)
Top-down derivation
Skipping the subderivation of the subject...
• NP-A(IBM,N) → N(IBM)
Top-down derivation
Generate another modifier with POS tag
• S(bought,V) → NP(N) NP-A(IBM,N) VP(bought,V)
Top-down derivation
And then its head word
• S(bought,V) → NP(yesterday,N) NP-A(IBM,N) VP(bought,V)
Top-down derivation
And the recursive derivation of that modifier
Small locality of dependencies at each step
• NP(yesterday,N) → N(yesterday)
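To make the decomposition concrete, here is a minimal sketch (with invented probability tables standing in for the trained parameter classes of a Collins-style model; this is an illustration, not Collins' actual parameterization) of how the one-level tree just derived could be scored as a product of head, subcat, and modifier decisions:

    import math

    # Hypothetical, hand-filled probability tables (illustration only).
    P_head   = {("VP", "S", ("bought", "V")): 0.7}           # P(head child | parent, head word/tag)
    P_subcat = {(frozenset(["NP-A"]), "left"): 0.6,          # P(left subcat frame | parent, head)
                (frozenset(), "right"): 0.9}                 # P(right subcat frame | parent, head)
    P_mod    = {("NP-A", ("IBM", "N"), "left"): 0.05,        # P(modifier label, word/tag | head, direction)
                ("NP", ("yesterday", "N"), "left"): 0.01,
                ("STOP", None, "left"): 0.8,
                ("STOP", None, "right"): 0.9}

    def score_one_level():
        """Score S(bought,V) -> NP(yesterday,N) NP-A(IBM,N) VP(bought,V)
        as a product of small, (assumed-)independent decisions."""
        logp = 0.0
        logp += math.log(P_head[("VP", "S", ("bought", "V"))])       # generate head child VP
        logp += math.log(P_subcat[(frozenset(["NP-A"]), "left")])    # left subcat {NP-A}
        logp += math.log(P_subcat[(frozenset(), "right")])           # right subcat {}
        logp += math.log(P_mod[("NP-A", ("IBM", "N"), "left")])      # subject, checks off NP-A
        logp += math.log(P_mod[("NP", ("yesterday", "N"), "left")])  # adjunct modifier
        logp += math.log(P_mod[("STOP", None, "left")])              # stop generating on the left
        logp += math.log(P_mod[("STOP", None, "right")])             # stop on the right
        return logp

    print(score_one_level())

The point is only that each factor conditions on a small amount of local context, which is what makes the backoff story on the next slides possible.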
Backoff models to deal with sparse data
Lexicalization to sneak in some notion of linguistic locality.
Huge sparse data problem: rules like S(bought,V) → NP(yesterday,N) NP-A(IBM,N) VP(bought,V)
Decomposed the tree into a series of tree creation/attachment decisions
• A "history-based" model
But there’s still a sparse data problem…
Backoff models to deal with sparse data
How often will we see evidence of this?
P(IBM,n,NP | bought,v,S,VP)
Need to back off to use just the POS tag:
P(IBM,n,NP | v,S,VP)
POS tags – bootstrap syntactic parsing
• classify words based on syntactic behavior
[Tree fragment: S(bought,V) → NP-A(N) VP(bought,V)]
Tagset Games – Spanish (Cowan & Collins, 2005)
Plural subject (gatos), singular verb (corrio)
• How come this isn't ruled out?
P(gatos,n,NP | corrio,v,S,VP) = P1(n,NP | corrio,v,S,VP) x P2(gatos | n,NP,corrio,v,S,VP)
• Time for the backoff models…
[Tree fragment: S(corrio,v) → NP(gatos,n) VP(corrio,v)]
Tagset Games – Spanish (Cowan & Collins, 2005)
P(gatos,n,NP | corrio,v,S,VP) = P1(n,NP | corrio,v,S,VP) x P2(gatos | n,NP,corrio,v,S,VP)
P1(n,NP | corrio,v,S,VP) = λ1,1 P1,1(n,NP | corrio,v,S,VP) + λ1,2 P1,2(n,NP | v,S,VP) + λ1,3 P1,3(n,NP | S,VP)
If corrio not seen (P1,1), use evidence without lexical dependence (P1,2)
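A minimal sketch of this kind of interpolated backoff; the λ weights and the individual estimates below are invented placeholders, not Cowan & Collins' actual values:

    def interpolated(estimates, lambdas):
        """Back off through increasingly less specific estimates,
        weighting each by a confidence lambda; lambdas sum to 1."""
        return sum(lam * p for lam, p in zip(lambdas, estimates))

    # P1(n,NP | corrio,v,S,VP): three levels of backoff, as on the slide.
    p_1_1 = 0.0    # conditioned on the word "corrio" -- unseen, so no evidence
    p_1_2 = 0.3    # conditioned only on the POS tag of the head verb
    p_1_3 = 0.25   # conditioned only on the nonterminals
    lambdas = [0.0, 0.7, 0.3]   # low weight on the unseen lexical estimate

    print(interpolated([p_1_1, p_1_2, p_1_3], lambdas))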
Tagset Games – Spanish (Cowan & Collins, 2005)
P(gatos,n,NP | corrio,v,S,VP) = P1(n,NP | corrio,v,S,VP) x P2(gatos | n,NP,corrio,v,S,VP)
P2(gatos | n,NP,corrio,v,S,VP) = λ2,1 P2,1(gatos | n,NP,corrio,v,S,VP) + λ2,2 P2,2(gatos | n,NP,v,S,VP) + λ2,3 P2,3(gatos | n)
If corrio not seen (P2,1), use evidence without lexical dependence (P2,2)
Tagset Games – Spanish (Cowan & Collins, 2005)
P2(gatos | n,NP,corrio,v,S,VP) = λ2,1 P2,1(gatos | n,NP,corrio,v,S,VP) + λ2,2 P2,2(gatos | n,NP,v,S,VP) + λ2,3 P2,3(gatos | n)
If corrio not seen (P2,1), use evidence without lexical dependence (P2,2)
But P2,1 is the only probability that rules it out
• even worse – even if it was seen, it was probably not seen often, so P2,2 will overwhelm it
Tagset Games – Spanish (Cowan & Collins, 2005)
“The impoverished model can only capture morphological restrictions through lexically-specific estimates based on extremely sparse statistics”
But suppose the noun and verb part of speech tags had number information.
Tagset Games – Spanish (Cowan & Collins, 2005)
P1(pn,NP | corrio,sv,S,VP) = λ1,1 P1,1(pn,NP | corrio,sv,S,VP) + λ1,2 P1,2(pn,NP | sv,S,VP) + λ1,3 P1,3(pn,NP | S,VP)
pn = plural noun, sv = singular verb
P1,2 will be very low, with high confidence λ1,2
They tried a variety of ways to play with the tagset.
Tagset Games – Spanish (Cowan & Collins, 2005)
Scores not additive – sparse data?
Best model, number(A,D,N,P,V)+mode(V), helps with:
• Finding subjects
• Distinguishing infinitival and gerund VPs
• Attaching NP and PP postmodifiers to verbs
Baseline 81.0
number(Adj,Det,Noun,Pronoun,Verb) 82.8
mode(V) 82.4
person(V) 82.4
number(A,D,N,P,V)+mode(V) 83.5
number(A,D,N,P,V)+mode(V)+person(V) 83.2
Tagset Games – Czech (Collins, Hajic, Ramshaw, Tillmann 1999)
Convert from dependency to phrase structure
Baseline: use the main POS of each tag
• NNMP1-----A-- (noun, masculine, plural, nominative, "affirmative" negativeness) mapped to N
Two-letter tag: main POS plus either detailed POS (for D,J,V,X) or Case: 58 tags
Richer tagsets -> no improvement, "presumably" because of "damage from sparse data"
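As an illustration of the kind of tag reduction described here, a rough sketch using the positional layout of Czech tags (position 1 = main POS, position 2 = detailed POS, position 5 = case); the actual mapping in the paper may differ in details:

    def reduce_czech_tag(tag, two_letter=True):
        """Map a full positional tag like 'NNMP1-----A----' down to a small tagset:
        main POS alone, or main POS plus detailed POS (for D,J,V,X) or case."""
        main_pos = tag[0]
        if not two_letter:
            return main_pos                  # baseline: main POS only
        if main_pos in "DJVX":
            return main_pos + tag[1]         # main POS + detailed POS
        return main_pos + tag[4]             # main POS + case

    print(reduce_czech_tag("NNMP1-----A----"))                    # -> 'N1' (noun, nominative)
    print(reduce_czech_tag("NNMP1-----A----", two_letter=False))  # -> 'N'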
Tagset Games – Arabic Treebank (Bikel 2004; Kulick, Gabbard, Marcus 2006)
Lots of tags
• Usual sparse data problem
• Bikel: can even get new tags not seen in training
Map them down (DET+ADJ+NSUFF_FEM_SG+CASE_DEF_ACC → JJ)
The "Bies tag set" – it was just a quick hack!
(Kulick et al.) – keep the determiner at least (DET+ADJ+NSUFF_FEM_SG+CASE_DEF_ACC → DT+JJ)
The Case endings mostly aren't really there
Maybe: Case information to identify heads of constituents (e.g., ADJ heading NP)
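A sketch of the two reductions mentioned above; the string handling is illustrative only, and the real treebank mapping tables are more involved:

    def bies_style(full_tag):
        """Collapse a full Arabic Treebank tag to a single reduced tag,
        dropping the determiner and the case/inflection suffixes."""
        core = [p for p in full_tag.split("+")
                if not p.startswith(("NSUFF", "CASE", "DET"))]
        return {"ADJ": "JJ", "NOUN": "NN"}.get(core[0], core[0])

    def keep_determiner(full_tag):
        """Kulick et al.-style reduction: same collapse, but keep DET as DT+."""
        reduced = bies_style(full_tag)
        return ("DT+" + reduced) if full_tag.startswith("DET+") else reduced

    tag = "DET+ADJ+NSUFF_FEM_SG+CASE_DEF_ACC"
    print(bies_style(tag))        # -> 'JJ'
    print(keep_determiner(tag))   # -> 'DT+JJ'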
Tagset Games – Conclusion (still the same as in Collins '99)
More tag information w/o sparse data?
• P(modifier POS | head POS)
• Difficulty in doing this is motivation for other parsing models
Lots of word forms w/o sparse data?
• P(word-form | word-stem, POS tag)
Another question – for parsing, how important is such information compared to Case?
• Where does it help disambiguate the parse?
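For the P(word-form | word-stem, POS tag) idea, a minimal sketch (with invented counts and a made-up stem key) of how counts shared across a stem could soften the sparse-data problem for inflected forms:

    from collections import Counter

    # Hypothetical training counts: (stem, POS) -> observed full forms.
    counts = {("corr", "v"): Counter({"corrio": 3, "corrieron": 1})}

    def p_form_given_stem_pos(form, stem, pos, vocab_size=1000, alpha=1.0):
        """P(word-form | stem, POS) with simple add-alpha smoothing, so an
        unseen inflection of a seen stem still gets some probability mass."""
        c = counts.get((stem, pos), Counter())
        return (c[form] + alpha) / (sum(c.values()) + alpha * vocab_size)

    print(p_form_given_stem_pos("corrio", "corr", "v"))    # seen form
    print(p_form_given_stem_pos("corremos", "corr", "v"))  # unseen inflection, still nonzero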
What about Free Word Order?
Czech work
• mentioned as a problem, nothing done
May not have mattered for evaluation?
• If not scoring SBJ, OBJ, etc. labels, so what if the parser doesn't know what's what?
What about Free Word Order?
“Subcat” frame between each level
[Tree fragment: S(bought,V) → NP-A(N) VP(bought,V)]
Obvious thing to try: integrate Case assignment into subcat frames
• Verb requires NOM instead of NP-A, etc.
• Alluded to in Collins 2003, not done (but Zeman 2002 for dependency Czech parsing)
Ease the problem of using other morph info?
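A speculative sketch, not an existing implementation, of what a Case-based subcat frame might look like: the frame records required cases as a multiset that gets checked off as case-marked arguments are generated, in place of the purely categorial {NP-A}:

    from collections import Counter

    def check_off(subcat, modifier_case):
        """Return (is_argument, remaining_subcat): if the modifier's case is still
        required by the frame, treat it as an argument and remove that requirement."""
        if subcat[modifier_case] > 0:
            remaining = subcat.copy()
            remaining[modifier_case] -= 1
            return True, remaining
        return False, subcat          # otherwise treat it as an adjunct

    frame = Counter({"NOM": 1, "ACC": 1})      # verb requires a nominative and an accusative
    is_arg, frame = check_off(frame, "NOM")    # nominative subject found, checked off
    print(is_arg, dict(frame))                 # True {'NOM': 0, 'ACC': 1}
    is_arg, frame = check_off(frame, "DAT")    # dative NP: not required -> adjunct
    print(is_arg, dict(frame))                 # False {'NOM': 0, 'ACC': 1}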
What about Free Word Order?
Problem: "subcat" frames as implemented are near-meaningless
Independent horizontally
• Left and right independent
Independent vertically between one-level trees
• Sisters of VP are independent of sisters of V
Can this be fixed?
• Requires some greater amount of history between one-level trees, based on head percolation
Long-distance movement?
Not handled in most generative parsers
• Exception: CCG (Hockenmaier 2003)
Postprocessing:
• Who do you think John saw
• Who_i do you think John saw t_i
"Good enough", or way to integrate into parsing?
• "Good enough" for languages with more long-distance movement?
Discriminative Parsers - Outline
Basic idea and main properties
Dependency parsers –
• Easier handling of morphology and free word order – how successful?
• Long movement – “non-projective” – how successful?
Discriminative Phrase-Structure Parsers
• Handling of morphology and free word order – hasn’t really been tried
Discriminative Parsers
Conditional P(T|S) instead of joint P(T,S)
• Training requires reparsing training corpus to update parameters
• Can take a long time!
Easier to utilize dependent features
• Successfully used in other aspects of NLP
• How about for parsing? – computational problem
Discriminative Parsers
Dependency Parsing
• Not as computationally hard, still useful
Post-Processing: Parse reranking
• Just work with output of k-best generative parser
Phrase Structure Parsing
• Limited to sentences of length <=15
• Lots of pruning, doesn’t outperform generative parsers (but can still be promising for using morphology)
Multi-Lingual Dependency Parsing
CoNLL Shared Task, 2006, 2007
High-performing system: McDonald '06
Unlabelled Parsing
• Projective or non-projective ("long" movement)
• Still requires factoring the parsing problem
• Features between (head, modifier, previous modifier) – but within that can be very flexible
Labelled Parsing
• Postprocessing stage to add labels
• Features not limited
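A rough sketch of the kind of edge-factored features this allows; the feature templates below are made up for illustration (McDonald's system uses many more, including prefix/suffix and morphological cross-products):

    def edge_features(sent, head, mod, prev_mod=None):
        """Features for scoring a candidate dependency edge head -> mod,
        optionally conditioned on the previously attached modifier."""
        h, m = sent[head], sent[mod]
        feats = [
            "hpos=%s_mpos=%s" % (h["pos"], m["pos"]),
            "hword=%s_mpos=%s" % (h["form"], m["pos"]),
            "hpos=%s_mcase=%s" % (h["pos"], m.get("case", "NONE")),   # morphological feature
            "dir=%s_dist=%d" % ("R" if mod > head else "L", abs(mod - head)),
        ]
        if prev_mod is not None:
            feats.append("hpos=%s_prevmpos=%s_mpos=%s" %
                         (h["pos"], sent[prev_mod]["pos"], m["pos"]))
        return feats

    sent = [{"form": "gatos", "pos": "n", "case": "NOM"},
            {"form": "corrio", "pos": "v"}]
    print(edge_features(sent, head=1, mod=0))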
Parsing Results
Language          Regular  Projective  No-morph
Arabic (real)        66.9        66.9      65.1
Bulgarian            87.6        87.6      87.2
Chinese              85.9
Czech (real)         80.2
Danish (real)        84.8        84.1      84.0
Dutch                79.2        74.7      77.8
English              89.4
German               87.3
Japanese             90.7        90.7      90.6
Portuguese           86.8        87.0      85.7
Slovene (real)       73.4        72.6      71.5
Spanish              82.3        82.3      80.9
Swedish (real)       82.6        82.6      82.6
Turkish (real)       63.2        63.2      60.6
Effects of non-projectivity
Language          Regular  Projective  No-morph
Arabic (real)        66.9        66.9      65.1
Bulgarian            87.6        87.6      87.2
Danish (real)        84.8        84.1      84.0
Dutch                79.2        74.7      77.8
Japanese             90.7        90.7      90.6
Portuguese           86.8        87.0      85.7
Slovene (real)       73.4        72.6      71.5
Spanish              82.3        82.3      80.9
Swedish (real)       82.6        82.6      82.6
Turkish (real)       63.2        63.2      60.6
Effects of morphology
Language          Regular  Projective  No-morph
Arabic (real)        66.9        66.9      65.1
Bulgarian            87.6        87.6      87.2
Danish (real)        84.8        84.1      84.0
Dutch                79.2        74.7      77.8
Japanese             90.7        90.7      90.6
Portuguese           86.8        87.0      85.7
Slovene (real)       73.4        72.6      71.5
Spanish              82.3        82.3      80.9
Swedish (real)       82.6        82.6      82.6
Turkish (real)       63.2        63.2      60.6
Dependency Parsing
Improvement with morphology
• Effect of different types (Case, inflection?)
Improvement with freer word-order?
• Local freer word-order reflected in labeled accuracy – how does it compare?
Improvement with non-local movement?
• Anything is an improvement
• Czech, all sentences: 85.2; only sentences with a nonprojective dependency: 81.9 (unlabeled!)
Parse Reranking
Work with output of k-best parser
• Limited problem, computationally easier
• No tree decomposition, arbitrary features (e.g., trigrams with heads of arguments of PPs)
Spanish: 83.5 to 85.1 with reranking
• Cowan & Collins – same reranking features as for English
Reranking incorporating morphological features?
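A minimal sketch of the reranking setup: score each parse in the k-best list with a linear model over arbitrary features (here just the generative log-probability plus one invented agreement feature) and take the argmax; the feature names and weights are placeholders:

    def rerank(kbest, weights):
        """kbest: list of (tree, generative_logprob, feature_dict).
        Return the tree with the highest linear rerank score."""
        def score(item):
            tree, logprob, feats = item
            s = weights.get("gen_logprob", 1.0) * logprob
            s += sum(weights.get(f, 0.0) * v for f, v in feats.items())
            return s
        return max(kbest, key=score)[0]

    # Two candidate parses: the second violates subject-verb number agreement.
    kbest = [("tree_agreeing",   -42.0, {"subj_verb_number_match": 1.0}),
             ("tree_disagreeing", -41.5, {"subj_verb_number_match": 0.0})]
    weights = {"gen_logprob": 1.0, "subj_verb_number_match": 2.0}
    print(rerank(kbest, weights))   # -> 'tree_agreeing'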
Discriminative Phrase-Structure Parsing
Which discriminative parsers parse the usual set of phrase-structure sentences?
• Ratnaparkhi '98
— History-based, "local" features
• Collins & Roark '04
— Incremental parsing, "global" features
• Shen '06 – Tree Adjoining Grammar-based
— Related to the previous
— Different approach to non-projectivity
— But modifies trees severely
• Others? (CCG?)
Incremental Parsing – (Collins & Roark ’04)
Main properties
• About the same performance as generative
• Severe pruning of possible parses for a sentence
• Features same as generative
• Can use the generative score as a feature
Possibilities:
• Use morphological features
• Free word order, non-local dependencies?
• More tightly integrate generative and discriminative
—This should be done
Issues, Questions, Conclusions
Not the question:
• How can a parser work with a language with lots of morphology?
Issues, Questions, Conclusions
The question(s):
• What do we want the annotation to look like?
— An independent question
• Based on what we know of how parsers work, what will the parser have problems recovering?
— We can (mostly) answer this
• Where might morphological information be valuable (or not)?
— We can speculate about this
• What approaches should we use?
— It depends on the above
Some Questions from Talk
Sparse Data from greater morphology
• Can the tagset game be sufficient for utilizing what is valuable?
• How to back off from sparse word forms when there is more inflection?
Free Word Order: Can Case be integrated into subcat frames?
• For generative (need better subcat frames)
• For discriminative (need subcat frames?)
Function Tags/Labelled Dependency
• Good enough for what's needed?
Some Questions from Talk
Empty category recovery
• Features to use with morphology?
Long-distance movement
• Can be hacked into the generative model?
• How adequate are dependency parsers?
• How usable is the TAG-based approach?
• How much do we care for parsing?
Is reranking a reasonable approach?
• Make sense to throw morphological features in here?
Combining Solutions
Generative parser as input for discriminative
• Incremental parser (Collins & Roark)
• Rerankers
Better input for the parser by preprocessing
• Dependency parser as input for generative
• Chunk using morphology as input for either (maybe don't need to parse the entire sentence?)
Tighter integration?
• Move into discriminative mode at key points inside generative model
Extra Slides
The following slides don’t count
Issues for Discussion
Sneaking discriminative info into generative model
Dependency parsing as preprocessing constraint for phrase-structure parsing
How much does morphology matter for Icelandic anyway?
Will the treebank be phrase-structure or dependency?
The world could really use a high-quality phrase-structure treebank with lots of morphology
Issues for Discussion
Preprocessing constituent bracketing from Case info
• Do we even really need to parse the whole sentence?
Arabic hypothetical example of using Case information to identify heads before parsing
Linguistically interesting way to order morph info in generative model?
Issues for Discussion
And can't forget – what about morphology and function tags and empty categories?
How helpful will the morphology be? Examples?
Again, this depends on what the treebank will look like
Alternative Approach #2
Hack around with the parser to allow a little “discriminative” modelling to sneak in at key points. Probably need to save this point for after the end.
Generative Parsing - Summary
Pretty good at
• Skeletal structure for English
• Recovering function tags for the Penn Treebank
• Post-processing empty category recovery
Not good at (or unknown)
• Integrating complex POS/morph tags
• Long-distance movement
• Free word order (but perhaps integrated into post-processing)
Another note on Function Tags
Treebanker, faced with a function tag:
• If correct, nothing to do
• If incorrect, delete it and assign new one
• If none, add if necessary
Want to increase the precision to where the tags can be assumed to be correct
Possible for some tags – e.g., SBJ
Be wary of overall numbers