Using Parallel Propbanks to Enhance Word-Alignments
Jinho D. Choi (Univ. of Colorado at Boulder)
Martha Palmer (Univ. of Colorado at Boulder)
Nianwen Xue (Brandeis University)
The 3rd Linguistic Annotation Workshop at ACL ’09, August 7th, 2009
Parallel Propbanks
• Propbank
- Corpus annotated with verbal propositions and their arguments (semantic roles)
• Parallel Propbanks
- Propbanks annotated on a parallel corpus
Example: [Arg0 Gansu Province] also actively [Rel explored] [Arg1 high risk business]
Arg0: explorer, Arg1: things explored
The parallel Chinese sentence is annotated with the same Arg0 and Arg1.
Word-Alignments
• Given parallel sentences, find the translation of each word
• GIZA++: a statistical machine translation toolkit
- It is hard to verify whether the alignments are correct.
- Words with low frequencies may not get aligned.
- It does not account for semantics.
Example: a Chinese sentence aligned with "Construction is a principal economic activity in developing Pudong"
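One common way to picture word-alignment output (an assumed representation, not GIZA++'s own file format) is a set of (Chinese index, English index) links. The minimal sketch below lists the Chinese tokens that received no link at all, which is what happens to the low-frequency words mentioned above; the function name and toy links are hypothetical.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]  # (chinese_index, english_index)

def unaligned(n_tokens: int, links: Set[Link]) -> List[int]:
    """Chinese token positions that no alignment link touches."""
    covered = {c for c, _ in links}
    return [i for i in range(n_tokens) if i not in covered]

# Hypothetical links for a 5-token Chinese sentence; token 3 got no link.
print(unaligned(5, {(0, 0), (1, 2), (2, 1), (4, 5)}))  # [3]
```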
Predicate Matching (based on GIZA++)
• English-Chinese Parallel Treebank (ECTB)
- Xinhua: Chinese newswire + literal translation
- Sinorama: Chinese news magazine + non-literal translation
Figure: distribution of what GIZA++ aligns Chinese verbs to, in the categories En.verb, En.be, En.else and En.none; Xinhua: 12,895, Sinorama: 40,086
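The four categories could be computed from an alignment roughly as follows. This is a sketch under stated assumptions, not the authors' procedure: Penn Treebank tags starting with `VB` identify English verbs, a fixed list of word forms stands in for "be", and when a Chinese verb has several links an English verb outranks "be", which outranks any other word. The sentence tags, links and names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[int, int]  # (chinese_index, english_index)

# Assumed stand-in for "be"; not taken from the slides.
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}

def categorize(zh_verbs: List[int], links: Set[Link],
               en_tokens: List[str], en_tags: List[str]) -> Dict[str, int]:
    """Count Chinese verbs by what they are aligned to (assumed priority:
    English verb > form of "be" > other word > nothing)."""
    counts = {"En.verb": 0, "En.be": 0, "En.else": 0, "En.none": 0}
    for c in zh_verbs:
        targets = [e for (cc, e) in links if cc == c]
        is_be = [en_tokens[e].lower() in BE_FORMS for e in targets]
        is_verb = [en_tags[e].startswith("VB") and not b
                   for e, b in zip(targets, is_be)]
        if not targets:
            counts["En.none"] += 1
        elif any(is_verb):
            counts["En.verb"] += 1
        elif any(is_be):
            counts["En.be"] += 1
        else:
            counts["En.else"] += 1
    return counts

en_tokens = ["Construction", "is", "a", "principal", "economic",
             "activity", "in", "developing", "Pudong"]
en_tags = ["NN", "VBZ", "DT", "JJ", "JJ", "NN", "IN", "VBG", "NNP"]
links = {(0, 0), (1, 7), (3, 1), (6, 5)}  # hypothetical links
print(categorize([1, 3, 5, 6], links, en_tokens, en_tags))
```

Here Chinese verb 1 links to "developing" (En.verb), verb 3 to "is" (En.be), verb 6 to the noun "activity" (En.else), and verb 5 has no link (En.none).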
Top-down Argument Matching
• Verify word-alignments
- For each Chinese verb vc aligned to some English verb ve
- Consider the alignment correct if the arguments of vc and ve match
Example: [Arg0 Gansu Province] [ArgM also] [ArgM actively] [Rel explored] [Arg1 high risk business]
Chinese: Arg0 ArgM ArgM Rel Arg1
Every argument matches, so the alignment is verified. Bingo!
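The slides do not spell out when two arguments "match". A minimal sketch, assuming an argument matches when the other side has an argument with the same label and at least one alignment link joins their words, scores a verb pair as the share of matched arguments and keeps the GIZA++ alignment at or above 0.5 (the Mac.th = 0.5 setting on the top-down evaluation slide). The helper names and the Chinese token indices are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Arg = Tuple[str, FrozenSet[int]]  # (label, token positions), predicate excluded
Link = Tuple[int, int]            # (chinese_index, english_index)

def _linked(c_span: FrozenSet[int], e_span: FrozenSet[int], links: Set[Link]) -> bool:
    """True if any alignment link connects the two spans."""
    return any((c, e) in links for c in c_span for e in e_span)

def argument_match_ratio(c_args: List[Arg], e_args: List[Arg], links: Set[Link]) -> float:
    """Share of arguments (both sides) that have a same-labeled, linked counterpart."""
    total = len(c_args) + len(e_args)
    if total == 0:
        return 0.0
    c_hit = sum(any(lab == el and _linked(span, es, links) for el, es in e_args)
                for lab, span in c_args)
    e_hit = sum(any(lab == cl and _linked(cs, span, links) for cl, cs in c_args)
                for lab, span in e_args)
    return (c_hit + e_hit) / total

def verify(c_args: List[Arg], e_args: List[Arg], links: Set[Link],
           threshold: float = 0.5) -> bool:
    """Keep a GIZA++ verb-to-verb alignment only if enough arguments match."""
    return argument_match_ratio(c_args, e_args, links) >= threshold

# "Gansu Province also actively explored high risk business": English indices
# 0-7 (4 is the predicate); the Chinese indices and links are hypothetical.
e_args = [("Arg0", frozenset({0, 1})), ("ArgM", frozenset({2})),
          ("ArgM", frozenset({3})), ("Arg1", frozenset({5, 6, 7}))]
c_args = [("Arg0", frozenset({0})), ("ArgM", frozenset({1})),
          ("ArgM", frozenset({2})), ("Arg1", frozenset({4, 5}))]
links = {(0, 0), (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 7)}
print(verify(c_args, e_args, links))  # True
```

Using a list of (label, span) pairs rather than a dict lets a predicate carry several ArgMs, as in the example.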
Bottom-up Argument Matching
• Expand word-alignments
- For each Chinese verb vc aligned to no English word
- Align vc to the English verb ve whose arguments best match those of vc
Example: Foreign funded enterprises in Gansu Province no longer worry about investment risk
English (worry): Arg0 ArgM ArgM Rel Arg1
Chinese: Arg0 ArgM ArgM ArgM Arg1 Rel
The same example with a second English predicate as a candidate:
English (funded): ArgM Rel Arg1
English (worry): Arg0 ArgM ArgM Rel Arg1
Chinese: Arg0 ArgM ArgM ArgM Arg1 Rel
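Once each candidate English verb has (macro, micro) scores, the bottom-up step reduces to an argmax plus a threshold check. The sketch below assumes the scores are already computed, ranks candidates by macro score with micro as the tie-breaker (an assumption), and defaults to the Mac.th = 0.8, Mic.th = 0.6 setting from the bottom-up evaluation slide. The function name and the candidate scores in the usage line are hypothetical, not results.

```python
from typing import Dict, Optional, Tuple

def expand(candidate_scores: Dict[str, Tuple[float, float]],
           mac_th: float = 0.8, mic_th: float = 0.6) -> Optional[str]:
    """Return the English verb whose (macro, micro) score is highest for one
    unaligned Chinese verb, or None if that best candidate misses a threshold."""
    if not candidate_scores:
        return None
    # Tuples compare macro first, then micro.
    best = max(candidate_scores, key=lambda verb: candidate_scores[verb])
    macro, micro = candidate_scores[best]
    return best if macro >= mac_th and micro >= mic_th else None

# Hypothetical scores for the two English predicates in the example sentence.
print(expand({"funded": (0.25, 0.10), "worry": (0.90, 0.70)}))  # worry
```

Returning None instead of the best-but-weak candidate keeps the verb unaligned, trading coverage for accuracy.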
Argument Matching Score
• Macro argument matching score
• Micro argument matching score
• Thresholds
- Top-down: thresholds on macro score
- Bottom-up: thresholds on both macro and micro scores
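The scoring formulas are not recoverable from these slides. As a hypothetical illustration of the macro/micro distinction only, the sketch below treats the macro score as argument-level (each argument counts once, matched or not) and the micro score as token-level (longer arguments weigh more), then applies the thresholds as the slide describes: macro only for top-down, both for bottom-up. The definitions, names and default thresholds (taken from the evaluation slides) are assumptions, not the authors' formulas.

```python
from typing import FrozenSet, List, Set, Tuple

Arg = Tuple[str, FrozenSet[int]]  # (label, token positions), predicate excluded
Link = Tuple[int, int]            # (chinese_index, english_index)

def _hit_tokens(label: str, span: FrozenSet[int], others: List[Arg],
                links: Set[Link], span_is_chinese: bool) -> Set[int]:
    """Tokens of `span` linked into a same-labeled argument on the other side."""
    hit = set()
    for other_label, other_span in others:
        if other_label != label:
            continue
        for t in span:
            for o in other_span:
                pair = (t, o) if span_is_chinese else (o, t)
                if pair in links:
                    hit.add(t)
    return hit

def scores(c_args: List[Arg], e_args: List[Arg], links: Set[Link]) -> Tuple[float, float]:
    """Hypothetical (macro, micro) argument matching scores."""
    arg_hits = tok_hits = tok_total = 0
    for args, others, is_zh in ((c_args, e_args, True), (e_args, c_args, False)):
        for label, span in args:
            hit = _hit_tokens(label, span, others, links, is_zh)
            arg_hits += bool(hit)   # macro: does this argument match at all?
            tok_hits += len(hit)    # micro: how many of its tokens match?
            tok_total += len(span)
    n_args = len(c_args) + len(e_args)
    macro = arg_hits / n_args if n_args else 0.0
    micro = tok_hits / tok_total if tok_total else 0.0
    return macro, micro

def passes_top_down(macro: float, mac_th: float = 0.5) -> bool:
    return macro >= mac_th

def passes_bottom_up(macro: float, micro: float,
                     mac_th: float = 0.8, mic_th: float = 0.6) -> bool:
    return macro >= mac_th and micro >= mic_th

c_args = [("Arg0", frozenset({0, 1})), ("Arg1", frozenset({3}))]
e_args = [("Arg0", frozenset({0, 1, 2})), ("Arg1", frozenset({5, 6}))]
links = {(0, 0), (1, 1), (3, 7)}  # hypothetical; the Arg1 spans are not linked
macro, micro = scores(c_args, e_args, links)
print(macro, micro, passes_top_down(macro), passes_bottom_up(macro, micro))
# 0.5 0.5 True False
```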
System Overview
• Source and target language corpora → GIZA++ → word alignments
• Verbs aligned to verbs → Top-down Matching (using the Parallel Propbanks) → verified alignments
• Verbs aligned to no word → Bottom-up Matching (using the Parallel Propbanks) → expanded alignments
• Verified + expanded alignments → enhanced alignments
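The overview can be read as the control flow sketched below. This is a sketch, not the authors' system: it assumes a caller-supplied function returning (macro, micro) for a Chinese/English verb pair, reuses the threshold settings from the evaluation slides, and returns only verb-level links. Chinese verbs that GIZA++ aligned to non-verbs fall into neither branch of the diagram and are left untouched. All names and the toy score table are hypothetical.

```python
from typing import Callable, List, Set, Tuple

Link = Tuple[int, int]                               # (chinese_index, english_index)
ScoreFn = Callable[[int, int], Tuple[float, float]]  # (zh verb, en verb) -> (macro, micro)

def enhance(links: Set[Link], zh_verbs: List[int], en_verbs: List[int],
            score: ScoreFn, td_mac: float = 0.5,
            bu_mac: float = 0.8, bu_mic: float = 0.6) -> Set[Link]:
    """Return verified plus expanded verb-to-verb links."""
    en_verb_set = set(en_verbs)
    verified: Set[Link] = set()
    expanded: Set[Link] = set()
    for c in zh_verbs:
        targets = {e for (cc, e) in links if cc == c}
        if not targets and en_verbs:
            # Bottom-up: verb aligned to no word -> best-matching English verb.
            best = max(en_verbs, key=lambda e: score(c, e))
            macro, micro = score(c, best)
            if macro >= bu_mac and micro >= bu_mic:
                expanded.add((c, best))
        else:
            # Top-down: verb aligned to verbs -> keep links passing the macro check.
            for e in targets & en_verb_set:
                if score(c, e)[0] >= td_mac:
                    verified.add((c, e))
    return verified | expanded

table = {(0, 1): (0.9, 0.9), (2, 4): (0.3, 0.2),
         (5, 1): (0.2, 0.1), (5, 4): (0.5, 0.5), (5, 7): (0.85, 0.7)}
result = enhance({(0, 1), (2, 4)}, [0, 2, 5], [1, 4, 7],
                 lambda c, e: table.get((c, e), (0.0, 0.0)))
print(sorted(result))  # [(0, 1), (5, 7)]
```

In the toy run, link (0, 1) is verified, link (2, 4) is dropped by the top-down check, and the unaligned verb 5 is expanded to English verb 7.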
Evaluations
• Test Corpus
- NIST-GALE Web Genre Test Data
- 100 parallel sentences, 365 verb tokens, 273 verb types
• Measurements
- Term Coverage: how many Chinese verb types are covered
- Term Expansion: how many English verb types are suggested
- Alignment Accuracy: how many of the suggested English verb types are correct
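One way to compute the three measurements from suggested (Chinese verb type, English verb type) pairs and a set of pairs judged correct is sketched below. Reading "Average Alignment Accuracy" as a per-Chinese-type accuracy averaged over covered types is an assumption, as are the function name and the toy judgments.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (Chinese verb type, English verb type)

def evaluate(suggestions: Iterable[Pair], correct: Set[Pair]) -> Tuple[int, int, float]:
    """Return (term coverage, term expansion, average alignment accuracy)."""
    by_zh: Dict[str, Set[str]] = defaultdict(set)
    for zh, en in suggestions:
        by_zh[zh].add(en)                                  # count each English type once
    coverage = len(by_zh)                                  # Chinese types with a suggestion
    expansion = sum(len(ens) for ens in by_zh.values())    # English types suggested
    if coverage == 0:
        return 0, 0, 0.0
    # Per Chinese type: share of its suggested English types judged correct.
    per_type = [sum((zh, en) in correct for en in ens) / len(ens)
                for zh, ens in by_zh.items()]
    return coverage, expansion, sum(per_type) / coverage

suggested = [("探索", "explore"), ("探索", "explore"),
             ("探索", "investigate"), ("担心", "worry")]
judged_correct = {("探索", "explore"), ("担心", "worry")}
print(evaluate(suggested, judged_correct))  # (2, 3, 0.75)
```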
Evaluations: Top-down
| Corpus | Method | Term Coverage | Average Alignment Accuracy |
|---|---|---|---|
| Xinhua | GIZA++ (Mac.th = 0.0) | 76 | 78.09% |
| Xinhua | TDAM (Mac.th = 0.5) | 62 | 83.71% |
| Sinorama | GIZA++ (Mac.th = 0.0) | 129 | 57.76% |
| Sinorama | TDAM (Mac.th = 0.5) | 79 | 83.35% |
Evaluations: Bottom-up
Figure: Term Coverage and Average Alignment Accuracy of bottom-up matching on Xinhua and Sinorama, with Mac.th = 0.8 and Mic.th = 0.6
- Plotted Term Coverage values: 27 and 18
- Plotted Average Alignment Accuracy values: 14.46% and 63.89%
- Annotations: 5.5% error reduction, 17% absolute improvement
Conclusions & Future Work
• Conclusions
- Top-down Argument Matching is most effective for verifying word-alignments based on non-literal translations that have proven difficult for GIZA++.
- Bottom-up Argument Matching shows promise for expanding the coverage of GIZA++ alignments based on literal translations.
• We will try to enhance word-alignments by using
- Automatically labeled Propbanks
- NomBanks and named-entity tags
- Parallel Propbanks before running GIZA++
Acknowledgements
• We gratefully acknowledge the support of National Science Foundation grants IIS-0325646 (Domain Independent Semantic Parsing) and CISE-CRI-0551615 (Towards a Comprehensive Linguistic Annotation), and of a grant from the Defense Advanced Research Projects Agency (DARPA/IPTO) under the GALE program, DARPA/CMO Contract No. HR0011-06-C-0022, subcontract from BBN, Inc.
• Special thanks to Daniel Gildea and Ding Liu (University of Rochester), who provided the word-alignments; Wei Wang (Information Sciences Institute, University of Southern California), who provided the test corpus; and Hua Zhong (University of Colorado at Boulder), who performed the evaluations.