A Fast, Accurate Deterministic Parser for Chinese

Mengqiu Wang, Kenji Sagae, Teruko Mitamura
Language Technologies Institute, School of Computer Science, Carnegie Mellon University
{mengqiu,sagae,teruko}@cs.cmu.edu
ACL - 2006

Advisor: Hsin-Hsi Chen
Speaker: Yong-Sheng Lo
Date: 2007/07/26


Page 1: A Fast, Accurate Deterministic Parser for Chinese

A Fast, Accurate Deterministic Parser for Chinese

Mengqiu Wang, Kenji Sagae, Teruko Mitamura

Language Technologies Institute, School of Computer Science, Carnegie Mellon University

{mengqiu,sagae,teruko}@cs.cmu.edu

Advisor: Hsin-Hsi Chen
Speaker: Yong-Sheng Lo

Date: 2007/07/26

ACL - 2006

Page 2: A Fast, Accurate Deterministic Parser for Chinese

Agenda

Introduction
Deterministic parsing model
  The parsing task as a classification task
  The shift/reduce decision
  Four classifiers
  Feature selection
POS tagging
  Using gold-standard POS tags
  A simple POS tagger using an SVM classifier
Experiments
Conclusion

Word Segmentation -> POS tagging -> Parsing

Page 3: A Fast, Accurate Deterministic Parser for Chinese

Introduction

Traditional statistical approaches
  Build models that assign a probability to every possible parse tree for a sentence
Techniques
  Dynamic programming, beam search, and best-first search are then employed to find the parse tree with the highest probability
Disadvantage
  Too slow for many practical applications

Page 5: A Fast, Accurate Deterministic Parser for Chinese

Deterministic Parsing Model

Input
  A sentence that has already been segmented and tagged with part-of-speech (POS) information
Data structures
  Queue: stores the input word-POS tag pairs (e.g. 上海-NR)
  Stack: holds the partial trees that are built during parsing
At each parse state
  The classifier makes a shift/reduce decision based on contextual features
Output
  A full parse tree
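The queue/stack control loop described on this slide can be sketched as follows. This is a minimal illustration and not the authors' implementation: `ParseState`, `parse`, the tuple encoding of actions and trees, and `toy_decide` are all names and conventions assumed here. In particular `toy_decide` is a placeholder policy standing in for the trained classifier, and the label "X" is a dummy. The sketch assumes a non-empty input sentence.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ParseState:
    queue: deque                                 # unread (word, POS) pairs
    stack: list = field(default_factory=list)    # partial trees built so far

    def is_final(self):
        # Finished once the input is consumed and a single tree is left.
        return not self.queue and len(self.stack) == 1


def parse(tagged_words, decide):
    """Greedy loop: one decision per parse state, no backtracking."""
    state = ParseState(queue=deque(tagged_words))
    steps = 0
    while not state.is_final():
        action = decide(state)
        if action[0] == "shift":
            word, pos = state.queue.popleft()
            state.stack.append((pos, word))
        else:                                    # ("reduce", arity, label)
            _, arity, label = action
            children = state.stack[-arity:]
            del state.stack[-arity:]
            state.stack.append((label, *children))
        steps += 1
    return state.stack[0], steps


def toy_decide(state):
    # Placeholder policy, NOT a trained classifier: read all the input,
    # then fold the stack into a right-branching tree labelled "X".
    return ("shift",) if state.queue else ("reduce", 2, "X")


tree, steps = parse([("布朗", "NR"), ("訪問", "VV"), ("上海", "NR")], toy_decide)
```

Because every word is shifted exactly once and every reduce creates one new node, the number of loop iterations grows with the size of the output tree rather than with the number of possible trees.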

Page 7: A Fast, Accurate Deterministic Parser for Chinese

The shift/reduce decision

Four parsing actions (Sagae and Lavie, 2005):

Shift
  Remove the first item from the queue and put it onto the stack
Reduce-Unary-X
  Remove one item from the stack
  X is the label of a new tree node that dominates the removed item
Reduce-Binary-X-Left
  Remove two items from the stack
  The head node of the left sub-tree becomes the head of the new tree
Reduce-Binary-X-Right
  Remove two items from the stack
  The head node of the right sub-tree becomes the head of the new tree
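A minimal sketch of the four actions, with the head word carried up as each new node is built. The `Node` class and the function names are assumptions made for illustration. The sketch also assumes that each reduce pushes the newly built node back onto the stack and that the top of the stack is the right sub-tree.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Node:
    label: str            # POS tag or constituent label
    head: str             # head word of this subtree
    children: tuple = ()


def shift(queue, stack):
    # Move the first queue item onto the stack as a leaf node.
    word, pos = queue.popleft()
    stack.append(Node(pos, word))


def reduce_unary(stack, label):
    # Wrap the top item in a new node labelled `label`.
    child = stack.pop()
    stack.append(Node(label, child.head, (child,)))


def reduce_binary(stack, label, head_side):
    # Combine the top two items; `head_side` is "left" or "right".
    right = stack.pop()
    left = stack.pop()
    head = left.head if head_side == "left" else right.head
    stack.append(Node(label, head, (left, right)))


queue = deque([("訪問", "VV"), ("上海", "NR")])
stack = []
shift(queue, stack)                   # stack: VV
shift(queue, stack)                   # stack: VV NR
reduce_unary(stack, "NP")             # stack: VV NP
reduce_binary(stack, "VP", "left")    # stack: VP, headed by the verb
```

Keeping the head word on every node means later features can refer to the head of a partial tree without walking it again.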

Page 8: A Fast, Accurate Deterministic Parser for Chinese

Example (1/5)

Input: 布朗 (NR) 訪問 (VV) 上海 (NR)

Parse state: 1 (initialization)

Action: Shift

Page 9: A Fast, Accurate Deterministic Parser for Chinese

Example (2/5)

Parse state: 2

Action: Reduce-Unary-NP

Parse state: 3

Action: Shift

Page 10: A Fast, Accurate Deterministic Parser for Chinese

Example (3/5)

Parse state: 4

Action: Shift

Parse state: 5

Action: Reduce-Unary-NP

Page 11: A Fast, Accurate Deterministic Parser for Chinese

Example (4/5)

Parse state: 6

Action: Reduce-Binary-VP-Left

Parse state: 7

Action: Reduce-Binary-IP-Right

Page 12: A Fast, Accurate Deterministic Parser for Chinese

Example (5/5)

Parse state: final
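The action sequence of the example slides can be replayed to see the stack after each step. This is a sketch under assumptions: `apply` is a hypothetical helper, trees are kept as bracketed strings, action names are split on "-" (so labels containing a hyphen are not handled), and the head direction is ignored because it does not affect the bracketing.

```python
from collections import deque


def apply(action, queue, stack):
    """Apply one action named as on the slides, e.g. 'Reduce-Binary-VP-Left'."""
    parts = action.split("-")
    if parts[0] == "Shift":
        word, pos = queue.popleft()
        stack.append(f"({pos} {word})")
    elif parts[1] == "Unary":
        stack.append(f"({parts[2]} {stack.pop()})")
    else:
        # Binary reduce: the top of the stack is the right sub-tree.
        right, left = stack.pop(), stack.pop()
        stack.append(f"({parts[2]} {left} {right})")


actions = [
    "Shift", "Reduce-Unary-NP", "Shift", "Shift",
    "Reduce-Unary-NP", "Reduce-Binary-VP-Left", "Reduce-Binary-IP-Right",
]
queue = deque([("布朗", "NR"), ("訪問", "VV"), ("上海", "NR")])
stack = []
trace = []
for number, action in enumerate(actions, start=1):
    apply(action, queue, stack)
    trace.append(f"after state {number} ({action}): " + " ".join(stack))

print("\n".join(trace))
final_tree = stack[0]
```

Seven actions are enough for the three-word input: three shifts, two unary reductions and two binary reductions.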

Page 14: A Fast, Accurate Deterministic Parser for Chinese

Four classifiers

Support Vector Machine
  The TinySVM toolkit (Kudo and Matsumoto, 2000)
Maximum-Entropy Classifier
  Le's Maxent toolkit (Zhang, 2004)
Decision Tree Classifier
  The C4.5 decision tree classifier
Memory-Based Learning
  The TiMBL toolkit (Daelemans et al., 2004)

Page 16: A Fast, Accurate Deterministic Parser for Chinese

Feature selection

Page 18: A Fast, Accurate Deterministic Parser for Chinese

POS tagging

1. Using gold-standard POS tags
2. A simple POS tagger using an SVM classifier
  The SVM is trained on gold-standard POS tags
  The tagger works in two passes
  Pass 1: features are extracted from the two words preceding the current word and their POS tags, the two words following the current word, and the current word itself
    The tag is then assigned to the word according to the SVM classifier's output
  Pass 2: additional features are used, such as the POS tags of the two words following the current word and the POS tag of the current word (both assigned in the first pass)
  This tagger had a measured precision of 92.5% for sentences <= 40 words.

Word Segmentation -> POS tagging -> Parsing
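The two-pass feature extraction can be sketched roughly as below. The feature names, the `PAD` symbol and the stub classifiers are assumptions for illustration, and the slides do not say which tags of the preceding words are used in pass 2 (this sketch uses the tags already assigned in the current pass). A real system would replace `classify1` and `classify2` with trained SVM models.

```python
PAD = "<pad>"   # hypothetical symbol for positions outside the sentence


def at(seq, i):
    return seq[i] if 0 <= i < len(seq) else PAD


def pass1_features(words, tags_so_far, i):
    """Current word, two words on each side, and the two previous tags."""
    return {
        "w0": words[i],
        "w-1": at(words, i - 1), "w-2": at(words, i - 2),
        "w+1": at(words, i + 1), "w+2": at(words, i + 2),
        "t-1": at(tags_so_far, i - 1), "t-2": at(tags_so_far, i - 2),
    }


def pass2_features(words, pass1_tags, tags_so_far, i):
    """Pass-1 features plus first-pass tags of the current and next two words."""
    feats = pass1_features(words, tags_so_far, i)
    feats.update({
        "p0": pass1_tags[i],
        "p+1": at(pass1_tags, i + 1), "p+2": at(pass1_tags, i + 2),
    })
    return feats


def tag(words, classify1, classify2):
    first = []
    for i in range(len(words)):      # pass 1, left to right
        first.append(classify1(pass1_features(words, first, i)))
    second = []
    for i in range(len(words)):      # pass 2 can look at pass-1 tags to the right
        second.append(classify2(pass2_features(words, first, second, i)))
    return second


# Stub classifiers standing in for the trained SVMs.
lexicon = {"布朗": "NR", "訪問": "VV", "上海": "NR"}
words = ["布朗", "訪問", "上海"]
tags = tag(words, lambda f: lexicon.get(f["w0"], "NN"), lambda f: f["p0"])
```

The point of the second pass is that tags to the right of the current word, which do not exist yet during a left-to-right first pass, become available as features.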

Page 19: A Fast, Accurate Deterministic Parser for Chinese

Experiments

Corpus
  Penn Chinese Treebank
  Training: sections 001-270 (3,484 sentences, 84,873 words)
  Development: 271-300 (348 sentences, 7,980 words)
  Testing: 271-300 (348 sentences, 7,980 words)
  99,629 words
Evaluation
  Labeled recall (LR)
  Labeled precision (LP)
  F1 score (harmonic mean of LR and LP)
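The three measures can be computed from labeled brackets as in the sketch below. It assumes each constituent is represented as a (label, start, end) triple and counts matches as a multiset intersection; conventions of standard evaluation tools, such as how punctuation is treated, are not modelled. The function name and the sample brackets are made up for illustration.

```python
from collections import Counter


def labeled_prf(gold_brackets, predicted_brackets):
    """Return (LP, LR, F1) for lists of (label, start, end) triples."""
    gold, pred = Counter(gold_brackets), Counter(predicted_brackets)
    matched = sum((gold & pred).values())        # brackets present in both
    lp = matched / sum(pred.values()) if pred else 0.0
    lr = matched / sum(gold.values()) if gold else 0.0
    f1 = 2 * lp * lr / (lp + lr) if lp + lr else 0.0
    return lp, lr, f1


gold = [("IP", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("NP", 2, 3)]
pred = [("IP", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("VP", 2, 3)]
lp, lr, f1 = labeled_prf(gold, pred)
```

In this made-up case three of the four predicted brackets match, so LP and LR are both 0.75, and so is their harmonic mean.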

Page 20: A Fast, Accurate Deterministic Parser for Chinese

Experiments

Results of different classifiers
  On the development set, for sentences <= 40 words

Page 21: A Fast, Accurate Deterministic Parser for Chinese

Experiments

Comparison with related work
  On the test set

Page 22: A Fast, Accurate Deterministic Parser for Chinese

Experiments

Using gold-standard POS

Stacked classifier model
  The outputs of Maxent, DTree and TiMBL are used as features, in addition to the original feature set, to train a new SVM model on the original training set
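The stacking step amounts to extending each feature vector with the base classifiers' predictions before the SVM is trained. A minimal sketch, assuming features are stored as dictionaries: the feature names, the `pred_` prefix and the stub predictors are hypothetical, and training the final SVM is not shown.

```python
def stack_features(features, base_classifiers):
    """Copy `features` and add each base classifier's predicted action."""
    stacked = dict(features)
    for name, predict in base_classifiers.items():
        stacked[f"pred_{name}"] = predict(features)
    return stacked


# Stubs standing in for trained Maxent, DTree and TiMBL models.
base = {
    "maxent": lambda f: "Shift",
    "dtree": lambda f: "Reduce-Unary-NP",
    "timbl": lambda f: "Shift",
}
original = {"stack_top_pos": "NR", "queue_front_pos": "VV"}
stacked = stack_features(original, base)
```

The new SVM then sees both the original context and the three base predictions, so it can learn when the cheaper classifiers tend to agree or disagree.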

Page 23: A Fast, Accurate Deterministic Parser for Chinese

Conclusion

A novel classifier-based deterministic parser for Chinese constituency parsing was presented
The best model runs in linear time and has labeled precision and recall above 88% using gold-standard part-of-speech tags
The SVM parser is 2-13 times faster than state-of-the-art parsers, while producing more accurate results
The Maxent and DTree parsers run at speeds 40-270 times faster than state-of-the-art parsers, but with 5-6% losses in accuracy