ACL 2005 WORKSHOP ON BUILDING AND USING PARALLEL TEXTS (WPT-05), Ann Arbor, MI. June 2005
Competitive Grouping in Integrated Segmentation and Alignment Model
Ying Zhang Stephan Vogel
Language Technologies Institute
School of Computer Science
Carnegie Mellon University
Integrated Segmentation and Alignment Model
• Phrase alignment models (Och et al., 1999; Marcu and Wong, 2002; Koehn et al., 2003)
  – Many of these models rely on a pre-calculated word alignment.
  – They use different heuristics to extract phrase pairs from the Viterbi word alignment path.
• Integrated Segmentation and Alignment (ISA) model (Zhang 2003)
  – No such word alignment is needed.
  – Segments source and target sentences into phrases and aligns them simultaneously.
  – Uses chi-square(f, e) instead of the conditional probability P(f|e) for word pair associations.
  – Greedy search for phrase pairs.
  – Key idea: the competitive grouping algorithm.
  – Inspired by the competitive linking algorithm (Melamed 1997) for word alignment.
Competitive Linking Algorithm
• A greedy word alignment algorithm.
• The word pair with the highest likelihood L(f, e) “wins” the competition.
• One-to-one assumption: once the pair {f, e} is “linked”, neither f nor e can be aligned with any other word.
• Example:
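The greedy procedure described on this slide can be sketched as follows. This is a minimal illustration, not the original implementation: the function name `competitive_linking`, the dictionary layout of the scores, the cut-off at non-positive scores, and the toy values are all assumptions made for the example.

```python
def competitive_linking(scores):
    """Greedy one-to-one word alignment.

    scores maps (f_index, e_index) -> association score L(f, e).
    Returns the linked (f_index, e_index) pairs in the order they won.
    """
    linked_f, linked_e, links = set(), set(), []
    # Visit candidate pairs from the highest score to the lowest.
    for (f, e), score in sorted(scores.items(), key=lambda kv: -kv[1]):
        if score <= 0:
            break  # assumption: non-positive scores carry no evidence
        if f in linked_f or e in linked_e:
            continue  # one-to-one: each word is linked at most once
        links.append((f, e))
        linked_f.add(f)
        linked_e.add(e)
    return links


# Toy scores, made up for illustration: "je déclare" / "i declare".
toy_scores = {
    (0, 0): 9.0,  # je - i
    (0, 1): 1.0,  # je - declare
    (1, 0): 2.0,  # déclare - i
    (1, 1): 7.0,  # déclare - declare
}
links = competitive_linking(toy_scores)
```

With these toy values the pair (je, i) wins first and removes both words from the competition, so (déclare, declare) is the only pair left to link.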
Competitive Grouping Algorithm
• Discard the one-to-one assumption of competitive linking, making it less greedy.
• When a pair {e, f} wins the competition, it invites the neighboring pairs to join the “winner’s club”.
• Introduce the locality assumption: a source phrase of adjacent words can only be aligned to a target phrase of adjacent words.
  – Words inside an aligned phrase pair cannot be aligned to other words.
Expanding the Aligned Phrase Pair
• Two criteria have to be satisfied to expand the seed word pair into a phrase pair:
  1. If a new source word f is to be grouped, the best e that f is associated with should not be “blocked” by this expansion; similarly for grouping a new target word.
  2. The highest word pair likelihood value in the expanded area needs to be “similar” to the seed value.
• According to the locality assumption, words in an aligned phrase pair cannot be aligned with other words again.
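One possible reading of the seed-and-expand procedure is sketched below. Several details are assumptions, since the slides do not define them: "blocked" is taken to mean that the new word's best still-free partner lies outside the current block, "similar" is taken to mean at least `ratio` times the seed score, and the block grows one adjacent word at a time on either side. The names `expand_seed`, `competitive_grouping`, and `ratio`, and the toy scores, are hypothetical.

```python
def expand_seed(scores, seed, free_src, free_tgt, ratio=0.5):
    """Grow a seed word pair into a rectangular phrase pair.

    scores[i][j] is the association of source word i and target word j.
    free_src / free_tgt hold positions not claimed by earlier phrase pairs.
    Returns inclusive bounds (i1, i2, j1, j2).
    """
    si, sj = seed
    threshold = ratio * scores[si][sj]  # assumed meaning of "similar"
    i1 = i2 = si
    j1 = j2 = sj
    grew = True
    while grew:
        grew = False
        # Try to add the source word just before or just after the block.
        for i in (i1 - 1, i2 + 1):
            if i not in free_src:
                continue
            best_j = max(free_tgt, key=lambda j: scores[i][j])
            best_inside = max(scores[i][j] for j in range(j1, j2 + 1))
            # Criterion 1: the word's best partner is inside the block.
            # Criterion 2: the best new score is similar to the seed score.
            if j1 <= best_j <= j2 and best_inside >= threshold:
                i1, i2 = min(i1, i), max(i2, i)
                grew = True
        # Same test for the neighboring target words.
        for j in (j1 - 1, j2 + 1):
            if j not in free_tgt:
                continue
            best_i = max(free_src, key=lambda i: scores[i][j])
            best_inside = max(scores[i][j] for i in range(i1, i2 + 1))
            if i1 <= best_i <= i2 and best_inside >= threshold:
                j1, j2 = min(j1, j), max(j2, j)
                grew = True
    return i1, i2, j1, j2


def competitive_grouping(scores, ratio=0.5):
    """Repeatedly pick the best free word pair and expand it."""
    free_src = set(range(len(scores)))
    free_tgt = set(range(len(scores[0])))
    phrase_pairs = []
    while free_src and free_tgt:
        seed = max(((i, j) for i in free_src for j in free_tgt),
                   key=lambda c: scores[c[0]][c[1]])
        if scores[seed[0]][seed[1]] <= 0:
            break  # assumption: stop when no evidence is left
        i1, i2, j1, j2 = expand_seed(scores, seed, free_src, free_tgt, ratio)
        phrase_pairs.append(((i1, i2), (j1, j2)))
        # Locality: words in an aligned phrase pair are no longer available.
        free_src -= set(range(i1, i2 + 1))
        free_tgt -= set(range(j1, j2 + 1))
    return phrase_pairs


# Toy scores, made up: "la pomme de terre" / "the potato".
toy = [
    [8, 1],  # la
    [1, 9],  # pomme
    [2, 5],  # de
    [1, 7],  # terre
]
pairs = competitive_grouping(toy)
```

In this toy case the seed (pomme, potato) absorbs "de" and "terre", whose best partner is already inside the block, while "la" is left free to pair with "the".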
Exploring All Possible Phrase Pairs
• Criterion 2 controls the granularity of the aligned phrase pairs:
  – Two short phrase pairs
  – Or one long phrase pair
• Short phrases give better coverage of unseen test data.
• Long phrases encapsulate more context, e.g. local reordering and word sense.
• It is hard to decide on the optimal granularity without knowing the test data.
• Solution: for each grouping, try all possible granularities.
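A hedged sketch of what "trying all granularities" could look like: the same seed is expanded under several similarity thresholds and every distinct block is kept. The slides do not say how the granularities are enumerated, so the threshold sweep, the names `grow` and `all_granularities`, the `ratios` values, and the toy scores are assumptions. For brevity the sketch applies only the similarity criterion.

```python
def grow(scores, seed, ratio):
    """Expand a seed while the best new score stays >= ratio * seed score."""
    si, sj = seed
    threshold = ratio * scores[si][sj]
    i1 = i2 = si
    j1 = j2 = sj
    n_src, n_tgt = len(scores), len(scores[0])
    grew = True
    while grew:
        grew = False
        # Neighboring source words (rows).
        for i in (i1 - 1, i2 + 1):
            if 0 <= i < n_src and max(scores[i][j1:j2 + 1]) >= threshold:
                i1, i2 = min(i1, i), max(i2, i)
                grew = True
        # Neighboring target words (columns).
        for j in (j1 - 1, j2 + 1):
            if 0 <= j < n_tgt and max(
                    scores[i][j] for i in range(i1, i2 + 1)) >= threshold:
                j1, j2 = min(j1, j), max(j2, j)
                grew = True
    return (i1, i2, j1, j2)


def all_granularities(scores, seed, ratios=(1.0, 0.75, 0.5, 0.25)):
    """Collect the distinct blocks obtained from strict to loose thresholds."""
    blocks = []
    for ratio in ratios:
        block = grow(scores, seed, ratio)
        if block not in blocks:
            blocks.append(block)
    return blocks


# Toy scores, made up: "la session" / "the session".
toy = [
    [6, 3],   # la
    [3, 10],  # session
]
blocks = all_granularities(toy, seed=(1, 1))
```

With these values the strict thresholds keep the short pair (session, session), and the loosest one yields the long pair covering both words on each side, so both granularities end up in the phrase table.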
Exploring All Possible Phrase Pairs
French: Je déclare reprise la session
English: I declare resumed the session
The Likelihood of Word Associations
• The chi-square statistic is used to measure the likelihood of word association for a pair {e, f}.
• For each word pair {e, f}, the null hypothesis is that e and f are independent of each other.
• Calculate χ² to measure how well this hypothesis holds.
• Construct the contingency table using the counts from the corpus given the current alignment, e.g. a uniform alignment:
  – O11: number of times e and f are aligned
  – O12: number of times e is aligned with a word other than f
  – O21: number of times f is aligned with a word other than e
  – O22: number of times a word other than f is aligned with a word other than e

|    | f   | ~f  |
|----|-----|-----|
| e  | O11 | O12 |
| ~e | O21 | O22 |

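
For a 2×2 contingency table the statistic has a standard closed form, χ² = N (O11·O22 − O12·O21)² / ((O11+O12)(O21+O22)(O11+O21)(O12+O22)), with N the sum of the four cells. The sketch below computes it; the function name `chi_square`, the handling of an empty margin, and the example counts are assumptions for illustration, not taken from the slides.

```python
def chi_square(o11, o12, o21, o22):
    """Pearson chi-square statistic for a 2x2 contingency table."""
    n = o11 + o12 + o21 + o22
    row_e, row_not_e = o11 + o12, o21 + o22
    col_f, col_not_f = o11 + o21, o12 + o22
    denom = row_e * row_not_e * col_f * col_not_f
    if denom == 0:
        return 0.0  # assumption: an empty margin gives no evidence
    return n * (o11 * o22 - o12 * o21) ** 2 / denom


# Made-up counts: e and f always co-occur, or co-occur only by chance.
strong = chi_square(10, 0, 0, 10)
independent = chi_square(5, 5, 5, 5)
```

A high value means the independence hypothesis is hard to sustain. The statistic alone does not say whether the association is positive or negative, so in practice the sign of O11·O22 − O12·O21 may also need checking.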
In WPT-05
• Submitted results for all four languages
• Training data as provided
• Language model as provided
• Decoder (Pharaoh) as provided

| BLEU     | German | Spanish | Finnish | French |
|----------|--------|---------|---------|--------|
| Dev-test | 18.63  | 26.20   | 12.88   | 26.20  |
| Test     | 18.93  | 26.14   | 12.66   | 26.71  |

Conclusion
• Competitive grouping algorithm at the core of the ISA model
• Simple and efficient model
• Results comparable to those of other phrase alignment models
The Evolution of ISA
Matrix of the Likelihood
Expanding the Phrase Pairs