Dependency Hashing for n-best CCG Parsing

Dominick Ng and James R. Curran

Presented by Yun Huang

Background: CCG

• CCG derivation
• Dependency

• Evaluation
– All components of a dependency structure must match the gold standard
– Precision / Recall / F-score
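
The evaluation above can be sketched as set comparison over dependencies. This is a minimal illustration, not the official evaluation script: the four-field `Dependency` tuple and the `score` function are hypothetical names, and the example sentence is invented.

```python
from typing import Iterable, NamedTuple, Tuple


class Dependency(NamedTuple):
    """Assumed representation: every field must match for a dependency to count."""
    head: str
    category: str
    slot: int
    argument: str


def score(predicted: Iterable[Dependency],
          gold: Iterable[Dependency]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) over two sets of dependencies."""
    pred_set, gold_set = set(predicted), set(gold)
    correct = len(pred_set & gold_set)  # exact matches on all fields
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Invented example: the parser gets the subject right but the object wrong.
gold = [
    Dependency("likes", r"(S\NP)/NP", 1, "John"),
    Dependency("likes", r"(S\NP)/NP", 2, "Mary"),
]
predicted = [
    Dependency("likes", r"(S\NP)/NP", 1, "John"),
    Dependency("likes", r"(S\NP)/NP", 2, "John"),
]
print(score(predicted, gold))
```

A dependency that differs in any single field (here the argument of slot 2) counts as wrong, which is why one mistake costs both precision and recall.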

Background: CCGbank

• CCGbank was created by converting the phrase-structure trees in the PTB into normal-form CCG derivations. (99.44% covered)

Background: C&C parser

• Supertagger: assigns possible lexical categories to each word (e.g. S\NP, (S\NP)/PP for swim)
– Tag dictionary extracted from training data
– Adaptive supertagging: β and k

• C&C parser: log-linear model parser
– POS tags and lexical categories as input
– CKY chart parsing
– N-best reranking
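
As a rough illustration of the β parameter, the sketch below keeps every category whose probability is at least β times that of the best category for the word. This reading of β, the function name `assign_categories`, and the probabilities are assumptions made for the example. The tag dictionary cutoff k is not modelled.

```python
from typing import Dict, List


def assign_categories(probs: Dict[str, float], beta: float) -> List[str]:
    """Keep categories within a factor beta of the most probable one.

    Assumed interpretation of beta; a smaller beta lets more categories through.
    """
    best = max(probs.values())
    kept = [cat for cat, p in probs.items() if p >= beta * best]
    return sorted(kept, key=lambda cat: -probs[cat])


# Invented distribution for the word "swim".
probs = {r"S\NP": 0.7, r"(S\NP)/PP": 0.25, "N": 0.05}

for beta in (0.5, 0.1, 0.01):
    print(beta, assign_categories(probs, beta))
```

In an adaptive setup the parser would presumably start with a tight β and relax it only when no parse is found, trading speed for coverage.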

Ambiguity in n-best CCG parsing

• Spurious ambiguity
– Normal-form (usually right-branching)

• Absorption ambiguity

• Diversity problem: n-best CCG derivations, but with duplicated dependencies

Dependency Hashing (1)

• Constraint: any n-best candidate must not have the same dependencies as any candidate already in the list
– Similar to SMT: remove duplicated strings
– Which one to delete: the one inserted later? The one with the lower score?
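
The constraint can be sketched as a filter over scored candidates. This is only an illustration: it compares full dependency sets rather than hash values, and it resolves the "which one to delete" question by keeping the higher-scoring candidate, which is an assumption. `unique_by_dependencies` is a hypothetical name.

```python
from typing import FrozenSet, Hashable, Iterable, List, Sequence, Set, Tuple

Candidate = Tuple[float, Sequence[Hashable]]  # (model score, dependencies)


def unique_by_dependencies(candidates: Iterable[Candidate], n: int) -> List[Candidate]:
    """Return up to n candidates, best score first, with distinct dependency sets."""
    seen: Set[FrozenSet[Hashable]] = set()
    kept: List[Candidate] = []
    for cand_score, deps in sorted(candidates, key=lambda c: c[0], reverse=True):
        key = frozenset(deps)  # order of dependencies does not matter
        if key in seen:
            continue  # same dependencies as a better-scoring candidate
        seen.add(key)
        kept.append((cand_score, deps))
        if len(kept) == n:
            break
    return kept


# Two derivations with identical dependencies, one with a different set.
candidates = [
    (-1.0, [("a", 1), ("b", 2)]),
    (-1.5, [("b", 2), ("a", 1)]),  # duplicate of the first, different order
    (-2.0, [("a", 1), ("c", 3)]),
]
print(unique_by_dependencies(candidates, n=2))
```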

Dependency Hashing (2)

• Implementation:
– 32-bit hash value for each dependency
– Bit-wise XOR to combine sub-derivations
– Only hash value, no hash table

• Collision: two different sets of dependencies can share a hash value, so some useful dependencies are missed
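
A small sketch of the hashing scheme described above. The slides do not say which hash function is used, so `zlib.crc32` stands in here as an assumed 32-bit hash; `dep_hash` and `combine` are hypothetical helper names.

```python
import zlib
from functools import reduce
from typing import Tuple


def dep_hash(dep: Tuple) -> int:
    """32-bit hash of one dependency (CRC-32 is an assumption, not the paper's choice)."""
    return zlib.crc32("|".join(map(str, dep)).encode("utf-8"))


def combine(*hashes: int) -> int:
    """Combine hashes of sub-derivations and new dependencies with bit-wise XOR."""
    return reduce(lambda a, b: a ^ b, hashes, 0)


# Invented dependencies produced in two sub-derivations.
left = combine(dep_hash(("likes", r"(S\NP)/NP", 1, "John")))
right = combine(dep_hash(("likes", r"(S\NP)/NP", 2, "Mary")))

# XOR is commutative and associative, so the order of combination is irrelevant:
# derivations that differ only in bracketing end up with the same value.
print(combine(left, right) == combine(right, left))
```

Because only a single integer is stored per chart entry, two candidates are treated as duplicates whenever their values are equal. This is cheap, but it is also where collisions come from. One property of XOR worth noting is that hashing the same dependency twice cancels out (`x ^ x == 0`).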

Diversity experiments

• Dependency

• Grammatical relation

Parsing Results

• Oracle
– Reranking upper bound

• Reranking
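
The oracle score can be read as the best F-score a perfect reranker could reach on a given n-best list. A minimal sketch, assuming dependencies are hashable tuples and using hypothetical helper names `f_score` and `oracle`:

```python
from typing import Hashable, List, Set


def f_score(predicted: Set[Hashable], gold: Set[Hashable]) -> float:
    """Harmonic mean of precision and recall over dependency sets."""
    correct = len(predicted & gold)
    if not correct:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


def oracle(nbest: List[Set[Hashable]], gold: Set[Hashable]) -> Set[Hashable]:
    """Pick the candidate closest to the gold standard: the reranking upper bound."""
    return max(nbest, key=lambda deps: f_score(deps, gold))


# Invented 3-best list; the model's top candidate is not the best one.
gold = {("a",), ("b",), ("c",)}
nbest = [
    {("a",), ("x",)},                # 1-best according to the model
    {("a",), ("b",)},
    {("a",), ("b",), ("c",), ("d",)},
]
best = oracle(nbest, gold)
print(f_score(nbest[0], gold), f_score(best, gold))
```

If many candidates in the list share the same dependencies, the oracle has fewer genuinely different options to choose from, which is why diversity in the n-best list matters for reranking.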

Three types of error

• Grammar error
– Only a subset of CCGbank rules are used
– Seen rule constraint

• Supertagger error
– Restricted categories by frequency cutoff
– Probability threshold β and cutoff k

• Model error
– Suboptimal parse

Grammar Error

• Given gold-standard categories, the parser F-score is 99.49%, with 95.61% coverage

• Grammar error accounts for about 0.5% of overall parser errors, and for a 4.4% drop in coverage
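
Both figures appear to follow from the gold-category run, assuming each is measured as the gap to 100%:

```latex
\begin{align*}
100\% - 99.49\% &= 0.51\% \approx 0.5\% && \text{(F-score lost even with gold categories)}\\
100\% - 95.61\% &= 4.39\% \approx 4.4\% && \text{(sentences without a parse)}
\end{align*}
```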

Supertagger and model error

• Supertagger error: differs from oracle
• Model error: differs from baseline

More experiments

• Tradeoff of speed and accuracy

• Gold/automatic POS tags

Conclusion

• Dependency hashing for n-best CCG
– Avoid derivations with the same dependencies
– Increase diversity in the n-best list

• Comprehensive error analysis
– Grammar error: 0.5%
– Supertagger error: 5%
– Model error: 7.5%

Thank you
