The Third Chinese Language Processing Bakeoff: Word Segmentation and Named Entity Recognition
Gina-Anne Levow
Fifth SIGHAN Workshop, July 22, 2006


TRANSCRIPT

Page 1: The Third Chinese Language Processing Bakeoff: Word Segmentation and Named Entity Recognition

The Third Chinese Language Processing Bakeoff:
Word Segmentation and Named Entity Recognition

Gina-Anne Levow
Fifth SIGHAN Workshop
July 22, 2006

Page 2

Roadmap

- Bakeoff task motivation
- Bakeoff structure:
  - Materials and annotations
  - Tasks and conditions
  - Participants and timeline
- Results & discussion:
  - Word segmentation
  - Named entity recognition
- Observations & conclusions
- Thanks

Page 3

Bakeoff Task Motivation

- Core enabling technologies for Chinese language processing
- Word segmentation (WS)
  - Crucial tokenization in the absence of whitespace
  - Supports POS tagging, parsing, reference resolution, etc.
  - Fundamental challenges:
    - "Word" is not well or consistently defined; humans disagree
    - Unknown words impede performance
- Named entity recognition (NER)
  - Essential for reference resolution, IR, etc.
  - A common class of new, unknown words

Page 4

Data Source Characterization

- Five corpora, five providers
- Annotation guidelines available, but varied
- Simplified and traditional characters
- Range of encodings; all available in Unicode (UTF-8)
- Provided in a common XML format, converted to train/test form (LDC)

Page 5

Tasks and Tracks

Tasks:
- Word segmentation:
  - Training and truth: whitespace delimited
  - End-of-word tags replaced with a space; no others
- Named entity recognition:
  - Training and truth: similar to CoNLL 2-column format
  - NAMEX only: LOC, PER, ORG (LDC: +GPE)

Tracks:
- Closed: only the provided materials may be used
- Open: any materials may be used, but their use must be documented
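As a concrete illustration of the two training/truth formats above, here is a minimal sketch. The example sentence is invented, and the B-/I- tag prefixes are an assumption; the slide only specifies a CoNLL-like 2-column layout:

```python
def to_ws_format(words):
    """Word segmentation truth: words joined by single spaces
    (end-of-word tags replaced with a space, nothing else)."""
    return " ".join(words)

def to_ner_format(char_tag_pairs):
    """NER truth: CoNLL-like 2-column layout, one character per line."""
    return "\n".join(f"{ch}\t{tag}" for ch, tag in char_tag_pairs)

# Invented example: "Zhangsan goes to Beijing."
print(to_ws_format(["张三", "去", "北京"]))  # 张三 去 北京
print(to_ner_format([("张", "B-PER"), ("三", "I-PER"), ("去", "O"),
                     ("北", "B-LOC"), ("京", "I-LOC")]))
```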

Page 6

Structure: Participants & Timeline

Participants:
- 29 sites submitted runs for evaluation (36 initially registered)
- 144 runs submitted: ~2/3 WS, 1/3 NER
- Diverse groups: 11 PRC, 7 Taiwan, 5 US, 2 Japan; 1 each from Singapore, Korea, Hong Kong, Canada
- Mix of commercial sites (MSRA, Yahoo!, Alias-i, FR Telecom, etc.) and academic sites

Timeline:
- March 15: registration opened
- April 17: training data released
- May 15: test data released
- May 17: results due

Page 7

Word Segmentation: Results

Contrasts: left-to-right maximal match
- Baseline: uses only the training vocabulary
- Topline: uses only the testing vocabulary

Baseline:

Source  Recall  Prec   F-score  OOV    Roov   Riv
CITYU   0.930   0.882  0.906    0.049  0.009  0.969
CKIP    0.915   0.870  0.892    0.042  0.030  0.954
MSRA    0.949   0.900  0.924    0.034  0.022  0.981
UPUC    0.869   0.790  0.828    0.088  0.011  0.951

Topline:

Source  Recall  Prec   F-score  OOV    Roov   Riv
CITYU   0.982   0.985  0.984    0.040  0.993  0.981
CKIP    0.980   0.987  0.983    0.042  0.997  0.979
MSRA    0.991   0.993  0.992    0.034  0.999  0.991
UPUC    0.961   0.976  0.968    0.088  0.989  0.958
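The left-to-right maximal-match contrast above can be sketched as follows: a greedy matcher that, at each position, takes the longest dictionary word. The baseline fills the dictionary from the training data, the topline from the test data; `max_len` and the example vocabulary here are illustrative assumptions:

```python
def max_match(text, vocab, max_len=4):
    """Greedy left-to-right maximum matching: at each position, take the
    longest vocabulary word starting there; fall back to one character."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + n]
            if n == 1 or cand in vocab:
                words.append(cand)
                i += n
                break
    return words

vocab = {"北京", "大学", "北京大学", "学生"}
print(max_match("北京大学学生", vocab))  # ['北京大学', '学生']
```

Because the matcher never segments words outside its dictionary correctly, this contrast directly exposes the OOV effect the later slides discuss.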

Page 8

Word Segmentation: CityU

CityU Closed:

Site  RunID  R      P      F      Roov   Riv
15    D      0.973  0.972  0.972  0.787  0.981
15    B      0.973  0.972  0.972  0.787  0.981
20           0.972  0.971  0.971  0.792  0.979
32           0.969  0.970  0.970  0.773  0.978

CityU Open:

Site  RunID  R      P      F      Roov   Riv
20           0.978  0.977  0.977  0.840  0.984
32           0.979  0.976  0.977  0.813  0.985
34           0.971  0.967  0.969  0.795  0.978
22           0.970  0.965  0.967  0.761  0.979

Page 9

Word Segmentation: CKIP

CKIP Closed:

Site  RunID  R      P      F      Roov   Riv
20           0.961  0.955  0.958  0.702  0.972
15    A      0.961  0.953  0.957  0.658  0.974
15    B      0.961  0.952  0.956  0.656  0.974
32           0.958  0.948  0.953  0.646  0.972

CKIP Open:

Site  RunID  R      P      F      Roov   Riv
20           0.964  0.955  0.959  0.704  0.975
34           0.959  0.949  0.954  0.672  0.972
32           0.958  0.948  0.953  0.647  0.972
2     A      0.953  0.946  0.949  0.679  0.965

Page 10

Word Segmentation: MSRA

MSRA Closed:

Site  RunID  R      P      F      Roov   Riv
32           0.964  0.961  0.963  0.612  0.976
26           0.961  0.953  0.957  0.499  0.977
9            0.959  0.955  0.957  0.494  0.975
1     A      0.955  0.956  0.956  0.650  0.966

MSRA Open:

Site  RunID  R      P      F      Roov   Riv
11    A      0.980  0.978  0.979  0.839  0.985
11    B      0.977  0.976  0.977  0.840  0.982
14           0.975  0.976  0.975  0.811  0.981
32           0.977  0.971  0.974  0.675  0.988

Page 11

Word Segmentation: UPUC

UPUC Closed:

Site  RunID  R      P      F      Roov   Riv
20           0.940  0.926  0.933  0.707  0.963
32           0.936  0.923  0.930  0.683  0.961
1     A      0.940  0.914  0.927  0.634  0.969
26    A      0.936  0.917  0.926  0.617  0.966

UPUC Open:

Site  RunID  R      P      F      Roov   Riv
34           0.949  0.939  0.944  0.768  0.966
2            0.942  0.928  0.935  0.711  0.964
20           0.940  0.927  0.933  0.741  0.959
7            0.944  0.922  0.933  0.680  0.970

Page 12

Word Segmentation: Overview

Word Segmentation: Overview

- F-scores: 0.481-0.797
- Best score: MSRA open task (FR Telecom)
- Best relative to topline: CityU open: >99%
- Most frequent top rank: MSRA
- Both F-scores and OOV recall were higher in the open track
- Overall good results: most systems outperform the baseline
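The F-scores and the Roov/Riv split reported in these tables can be computed as in the following simplified sketch. This is not the official bakeoff scorer; it assumes the usual convention that a predicted word counts as correct only if its character span exactly matches a gold word's span:

```python
def word_spans(words):
    """Map a word sequence onto ((start, end), word) character spans."""
    out, i = [], 0
    for w in words:
        out.append(((i, i + len(w)), w))
        i += len(w)
    return out

def ws_score(gold, pred, train_vocab):
    """Word-level precision/recall/F, plus recall split into OOV words
    (Roov: gold words absent from training) and in-vocabulary words (Riv)."""
    g, p = word_spans(gold), word_spans(pred)
    pred_set = {s for s, _ in p}
    correct = sum(1 for s, _ in g if s in pred_set)
    prec, rec = correct / len(p), correct / len(g)
    f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    oov = [s for s, w in g if w not in train_vocab]
    iv = [s for s, w in g if w in train_vocab]
    roov = sum(1 for s in oov if s in pred_set) / len(oov) if oov else 0.0
    riv = sum(1 for s in iv if s in pred_set) / len(iv) if iv else 0.0
    return prec, rec, f, roov, riv

# Invented example: the system merged the last two gold words.
print(ws_score(["北京", "大学", "学生"], ["北京", "大学学生"], {"北京", "大学"}))
```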

Page 13

Word Segmentation: Discussion

- Continuing OOV challenges
- Highest F-scores on MSRA
  - Also the highest topline and baseline
  - Lowest OOV rate
- Lowest F-scores on UPUC
  - Also the lowest topline and baseline
  - Highest OOV rate (more than double that of every other corpus)
  - Smallest corpus (~1/3 the size of MSRA)
- Best scores on the most consistent corpus (vocabulary, annotation)
- UPUC also varies in genre: train: CTB; test: CTB, newswire (NW), broadcast news (BN)

Page 14

NER Results

Contrast: baseline labels a token as a named entity if it had a unique tag in training.

Source  P      R      F      PER-F  ORG-F  LOC-F  GPE-F
CITYU   0.611  0.467  0.529  0.587  0.516  0.503  N/A
LDC     0.493  0.378  0.428  0.395  0.290  0.259  0.539
MSRA    0.590  0.488  0.534  0.614  0.469  0.531  N/A
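The "unique tag in training" baseline above might be sketched as a simple lookup. The token-level granularity and the tag names used here are illustrative assumptions:

```python
from collections import defaultdict

def build_unique_tags(training_pairs):
    """Collect the set of tags each training token received, and keep only
    tokens that always carried exactly one entity (non-O) tag."""
    tags = defaultdict(set)
    for tok, tag in training_pairs:
        tags[tok].add(tag)
    return {tok: next(iter(ts)) for tok, ts in tags.items()
            if len(ts) == 1 and "O" not in ts}

def baseline_tag(tokens, unique):
    """Baseline: label a token as a named entity only if it had a unique
    entity tag in training; everything else gets O."""
    return [(t, unique.get(t, "O")) for t in tokens]

train = [("北京", "LOC"), ("张三", "PER"), ("去", "O"), ("北京", "LOC")]
unique = build_unique_tags(train)
print(baseline_tag(["张三", "去", "北京"], unique))
```

A token seen with two different entity tags in training is left untagged, which is one reason this baseline's recall stays low.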

Page 15

NER Results: CityU

CityU Closed:

Site  P      R      F      ORG-F  LOC-F  PER-F
3     0.914  0.867  0.890  0.805  0.921  0.909
19    0.920  0.854  0.886  0.805  0.925  0.887
21a   0.927  0.847  0.885  0.797  0.920  0.890
21b   0.924  0.849  0.885  0.798  0.924  0.892

CityU Open:

Site  P      R      F      ORG-F  LOC-F  PER-F
6     0.869  0.749  0.805  0.680  0.860  0.810

Page 16

NER Results: LDC

LDC Closed:

Site       P       R      F      ORG-F  LOC-F  PER-F
7          0.7616  0.662  0.708  0.521  0.286  0.742
6-gpe-loc  0.672   0.655  0.664  0.455  0.708  0.742
6          0.306   0.298  0.302  0.455  0.037  0.742

LDC Open:

Site       P       R      F      ORG-F  LOC-F  PER-F
3          0.803   0.726  0.763  0.658  0.305  0.788
8          0.814   0.594  0.688  0.585  0.170  0.657

Page 17

NER Results: MSRA

MSRA Closed:

Site  P      R      F      ORG-F  LOC-F  PER-F
14    0.889  0.842  0.865  0.831  0.854  0.901
21a   0.912  0.817  0.862  0.820  0.905  0.826
21b   0.884  0.829  0.856  0.770  0.901  0.849
3     0.881  0.823  0.851  0.815  0.906  0.794

MSRA Open:

Site  P      R      F      ORG-F  LOC-F  PER-F
10    0.922  0.902  0.912  0.859  0.903  0.960
14    0.908  0.892  0.899  0.840  0.910  0.926
11b   0.877  0.875  0.876  0.761  0.897  0.922
11a   0.864  0.840  0.852  0.694  0.874  0.920

Page 18

NER: Overview

NER: Overview

- Overall results:
  - Best F-score: MSRA open track: 0.91
  - Strong overall performance: only two results fell below the baseline
- Direct comparison of open vs. closed NER is difficult: only two sites entered both tracks
  - Only MSRA had large numbers of runs
  - There, open outperformed closed: the top 3 open runs beat the closed runs

Page 19

NER Observations

- Named entity recognition challenges: tagsets, variation, and corpus size
- Results on MSRA/CityU were much better than on LDC
  - The LDC corpus is substantially smaller
  - It also has a larger tagset: GPE, which is easily confused with ORG or LOC
- NER results are sensitive to corpus size, tagset, and genre

Page 20

Conclusions & Future Challenges

- Strong, diverse participation in WS & NER; many effective, competitive results
- Cross-task, cross-evaluation comparisons remain difficult
  - Scores are sensitive to corpus size, annotation consistency, tagset, genre, etc.
  - Need a corpus- and configuration-independent measure of progress
  - Encourage submissions that support comparisons
  - Extrinsic, task-oriented evaluation of WS/NER
- Continuing challenges: OOV, annotation consistency, encoding combinations and variation, code-switching

Page 21

Thanks

Data providers:
- Chinese Knowledge Information Processing Group, Academia Sinica, Taiwan: Keh-Jiann Chen, Henning Chiu
- City University of Hong Kong: Benjamin K. Tsou, Olivia Oi Yee Kwong
- Linguistic Data Consortium: Stephanie Strassel
- Microsoft Research Asia: Mu Li
- University of Pennsylvania / University of Colorado: Martha Palmer, Nianwen Xue

Workshop co-chairs: Hwee Tou Ng and Olivia Oi Yee Kwong

All participants!