Dependency Hashing for n-best CCG Parsing

Post on 05-Feb-2016

TRANSCRIPT

1

Dependency Hashing for n-best CCG Parsing

Dominick Ng and James R. Curran

Presented by Yun Huang

2

Background: CCG

• CCG derivation

• Dependency

• Evaluation

– All components of a dependency structure must match the gold standard

– Precision / recall / F-score
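
The evaluation described on this slide can be sketched as follows. This is a minimal illustration, not the evaluation script used by the authors: the 4-tuple layout of `Dependency` (head word, lexical category, argument slot, argument word) and the function name `dependency_prf` are assumptions made for the example. A predicted dependency counts as correct only if every component matches a gold dependency.

```python
from typing import Iterable, Tuple

# Assumed layout: (head word, lexical category, argument slot, argument word).
Dependency = Tuple[str, str, int, str]


def dependency_prf(predicted: Iterable[Dependency],
                   gold: Iterable[Dependency]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) over exact-match dependencies."""
    pred_set, gold_set = set(predicted), set(gold)
    # Set intersection enforces that all components of a tuple must match.
    correct = len(pred_set & gold_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


gold = [("likes", "(S\\NP)/NP", 1, "John"), ("likes", "(S\\NP)/NP", 2, "Mary")]
pred = [("likes", "(S\\NP)/NP", 1, "John"), ("likes", "(S\\NP)/NP", 2, "sushi")]
print(dependency_prf(pred, gold))  # (0.5, 0.5, 0.5)
```

One of the two predicted dependencies differs in a single component (the argument word), so it earns no partial credit.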

3

Background: CCGbank

• CCGbank was created by converting the phrase-structure trees in the PTB into normal-form CCG derivations. (99.44% covered)

4

Background: C&C parser

• Supertagger: assigns possible lexical categories to each word (e.g. S\NP, (S\NP)/PP for swim)

– Tag dictionary extracted from training data

– Adaptive supertagging: β and k

• C&C parser: log-linear model parser

– POS tags and lexical categories as input

– CKY chart parsing

– N-best reranking
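
To make the role of β concrete, here is a rough sketch of adaptive supertagging under stated assumptions. It assumes β acts as a relative probability threshold (a category is kept if its probability is at least β times that of the best category for the word) and that the parser is retried with a looser β when it fails. The function names, the `can_parse` callback, and the probabilities are all hypothetical; the tag-dictionary cutoff k is not modelled here.

```python
from typing import Callable, Dict, List, Sequence


def prune_categories(probs: Dict[str, float], beta: float) -> List[str]:
    """Keep categories whose probability is within a factor beta of the best."""
    best = max(probs.values())
    return [cat for cat, p in probs.items() if p >= beta * best]


def adaptive_supertag(sentence_probs: Sequence[Dict[str, float]],
                      betas: Sequence[float],
                      can_parse: Callable[[List[List[str]]], bool]) -> List[List[str]]:
    """Try each beta in turn (tightest first) until the parser succeeds."""
    assignment: List[List[str]] = []
    for beta in betas:
        assignment = [prune_categories(p, beta) for p in sentence_probs]
        if can_parse(assignment):
            break
    # If no beta succeeds, the loosest assignment tried is returned.
    return assignment


# Made-up probabilities for the word "swim".
swim = {"S\\NP": 0.7, "(S\\NP)/PP": 0.2, "N": 0.01}
tight = prune_categories(swim, 0.5)   # only the most likely category survives
loose = prune_categories(swim, 0.1)   # the second category is admitted too
```

A smaller β admits more categories per word, which raises the chance of finding a parse at the cost of a larger search space.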

5

Ambiguity in n-best CCG parsing

• Spurious ambiguity

– Normal-form (usually right-branching)

• Absorption ambiguity

• Diversity problem: n-best CCG derivations, but with duplicated dependencies

6

Dependency Hashing (1)

• Constraint: any n-best candidate must not have the same dependencies as any candidate already in the list.

– Similar to SMT, where duplicate strings are removed from the n-best list

– Open question: which duplicate to delete, the later-inserted one or the lower-scoring one?

7

Dependency Hashing (2)

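• Implementation:

– 32-bit hash value for each dependency

– Bit-wise XOR to combine sub-derivations

– Only the hash value is stored, no hash table

• Collision: two different dependency sets can share a hash value, so a distinct candidate may be discarded and some useful dependencies missed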
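
The hashing scheme on these two slides can be sketched as below. This is an illustration under assumptions, not the C&C implementation: `zlib.crc32` stands in for whatever 32-bit hash the parser uses, the `Dependency` tuple layout is invented for the example, and the choice to keep the higher-scoring duplicate is one possible answer to the "delete which" question raised earlier.

```python
import zlib
from typing import Iterable, List, Tuple

# Assumed layout: (head word, lexical category, argument slot, argument word).
Dependency = Tuple[str, str, int, str]
Candidate = Tuple[float, List[Dependency]]  # (model score, dependencies)


def hash_dependency(dep: Dependency) -> int:
    """32-bit hash of one dependency (CRC32 used here as a stand-in)."""
    return zlib.crc32(repr(dep).encode("utf-8")) & 0xFFFFFFFF


def hash_dependencies(deps: Iterable[Dependency]) -> int:
    """XOR the per-dependency hashes; the result is order-independent."""
    combined = 0
    for dep in deps:
        combined ^= hash_dependency(dep)
    return combined


def filter_nbest(candidates: List[Candidate], n: int) -> List[Candidate]:
    """Keep up to n candidates whose dependency hashes are pairwise distinct."""
    kept: List[Tuple[float, List[Dependency], int]] = []
    # Visit candidates best-first so the higher-scoring duplicate survives.
    for score, deps in sorted(candidates, key=lambda c: c[0], reverse=True):
        h = hash_dependencies(deps)
        # Only stored hash values are compared, never the dependency sets.
        if any(h == kept_hash for _, _, kept_hash in kept):
            continue
        kept.append((score, deps, h))
        if len(kept) == n:
            break
    return [(score, deps) for score, deps, _ in kept]


a = ("likes", "(S\\NP)/NP", 1, "John")
b = ("likes", "(S\\NP)/NP", 2, "Mary")
c = ("likes", "(S\\NP)/NP", 2, "sushi")
nbest = filter_nbest([(-1.0, [a, b]), (-2.0, [b, a]), (-3.0, [a, c])], n=2)
```

Because XOR is commutative and associative, the hash of a derivation equals the XOR of the hashes of its sub-derivations, so it can be built up incrementally during chart parsing. The second candidate in the example yields the same dependencies as the first in a different order and is filtered out.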

8

Diversity experiments

• Dependency

• Grammatical relation

9

Parsing Results

• Oracle

– Reranking upper bound

• Reranking

Gap
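
As a rough illustration of why the oracle is an upper bound for reranking, the sketch below picks the parse in an n-best list with the highest F-score against the gold dependencies. No reranker can do better than this choice, since it can only select from the same list. The function names and the tuple layout are assumptions for the example, and the toy data is invented.

```python
from typing import List, Set, Tuple

# Assumed layout: (head word, lexical category, argument slot, argument word).
Dependency = Tuple[str, str, int, str]


def f_score(pred: Set[Dependency], gold: Set[Dependency]) -> float:
    """Harmonic mean of precision and recall over exact-match dependencies."""
    correct = len(pred & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


def oracle_index(nbest: List[Set[Dependency]], gold: Set[Dependency]) -> int:
    """Index of the n-best entry with the highest F-score (first one on ties)."""
    return max(range(len(nbest)), key=lambda i: f_score(nbest[i], gold))


a = ("likes", "(S\\NP)/NP", 1, "John")
b = ("likes", "(S\\NP)/NP", 2, "Mary")
c = ("likes", "(S\\NP)/NP", 2, "sushi")
gold = {a, b}
nbest = [{a, c}, {a, b}]          # the model's 1-best parse is listed first
best = oracle_index(nbest, gold)  # the oracle prefers the second parse
gap = f_score(nbest[best], gold) - f_score(nbest[0], gold)
```

A more diverse n-best list gives the oracle more distinct dependency sets to choose from, which is how dependency hashing can raise this upper bound.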

10

Three types of error

• Grammar error

– Only a subset of CCGbank rules is used

– Seen rule constraint

• Supertagger error

– Categories restricted by frequency cutoff

– Probability threshold β and cutoff k

• Model error

– Suboptimal parse

11

Grammar Error

• Given gold-standard categories, the parser F-score is 99.49%, with 95.61% coverage

• Grammar error accounts for about 0.5% of overall parser errors (100% − 99.49% ≈ 0.5%) and a 4.4% drop in coverage (100% − 95.61% ≈ 4.4%)

12

Supertagger and model error

• Supertagger error: differs from oracle

• Model error: differs from baseline

13

More experiments

• Tradeoff of speed and accuracy

• Gold/automatic POS tags

14

Conclusion

• Dependency hashing for n-best CCG

– Avoid derivations with the same dependencies

– Increase diversity in the n-best list

• Comprehensive error analysis

– Grammar error: 0.5%

– Supertagger error: 5%

– Model error: 7.5%

15

Thank you

Q & A
