![Page 1: 1 Co-Training for Cross-Lingual Sentiment Classification Xiaojun Wan ( 萬小軍 ) Associate Professor, Peking University ACL 2009](https://reader036.vdocuments.mx/reader036/viewer/2022062308/56649eda5503460f94bea215/html5/thumbnails/1.jpg)
1
Co-Training for Cross-Lingual Sentiment Classification
Xiaojun Wan (萬小軍 )
Associate Professor, Peking University
ACL 2009
2
Research Gap
• Opinion mining has drawn much attention recently
  – Sentiment classification (POS, NEG, NEU)
  – Subjectivity analysis (subjective, objective)
• Annotated corpora are most important for training
• However, most of them are in English
• Corpora for other languages, including Chinese, are rare
3
Related Work
• Pilot studies on cross-lingual subjectivity classification
• Mihalcea et al., ACL 2007
  – Bilingual lexicon and manually translated parallel corpus
• Banea et al., EMNLP 2008
  – English annotation tool + MT
  – Build Romanian annotation tool
  – Not much loss compared to human translation
  – Suggesting MT is a viable way
4
Problem Definition
• Perform cross-lingual sentiment classification
  – Either positive or negative
• Source: English
• Target: Chinese
• Leverage
  – 8000 labeled English product reviews
  – 1000 unlabeled Chinese product reviews
  – Machine translation (MT)
• Derive
  – Sentiment classification tools for Chinese product reviews
5
Framework
• Training Phase
• Classification Phase
6
Training Phase (1): Machine Translation
7
Two Views
Chinese View English View
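The two views can be pictured as one record per review that holds both language versions, with machine translation filling in whichever side is missing. The following is a minimal sketch under stated assumptions: `TwoViewReview`, `from_english`, `from_chinese`, and the `translate` callable are hypothetical names introduced here for illustration (the slides do not name an MT system), and `fake_translate` is only a placeholder that tags text rather than translating it.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical MT signature: (text, source_language, target_language) -> text
Translator = Callable[[str, str, str], str]


@dataclass
class TwoViewReview:
    """One review seen through both views (illustrative type, not from the slides)."""
    english: str          # English view
    chinese: str          # Chinese view
    label: Optional[int]  # +1 positive, -1 negative, None if unlabeled


def from_english(text: str, label: int, translate: Translator) -> TwoViewReview:
    """Labeled English review: the Chinese view is produced by MT."""
    return TwoViewReview(english=text,
                         chinese=translate(text, "en", "zh"),
                         label=label)


def from_chinese(text: str, translate: Translator) -> TwoViewReview:
    """Unlabeled Chinese review: the English view is produced by MT."""
    return TwoViewReview(english=translate(text, "zh", "en"),
                         chinese=text,
                         label=None)


def fake_translate(text: str, src: str, dst: str) -> str:
    """Placeholder MT: tags the text instead of translating it."""
    return f"[{src}->{dst}] {text}"
```

With a real MT backend substituted for `fake_translate`, every labeled and unlabeled review would carry both an English and a Chinese feature source.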
8
Training Phase (2): The Co-Training Approach
English View
9
Label the unlabeled data (English)
• English classifier (SVM) labels the unlabeled reviews
• E_en: the top p positive and top n negative most confident reviews
10
Label the unlabeled data (Chinese)
• Chinese classifier (SVM) labels the unlabeled reviews
• E_cn: the top p positive and top n negative most confident reviews
11
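Remove from Unlabeled Data, Finish One Iteration
• E_en: the top p positive and top n negative most confident reviews (English classifier)
• E_cn: the top p positive and top n negative most confident reviews (Chinese classifier)
• E_en ∪ E_cn is removed from the unlabeled data
• Train both classifiers again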
12
Setting
• #Iteration = 40
• p = n = 5
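The training loop described on the preceding slides can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's implementation: the SVM is swapped for a tiny word-count scorer (`train`) so the example runs with the standard library only, adding the newly labeled reviews to the training set follows the usual co-training recipe rather than anything spelled out on the slides, and the names `Review`, `train`, and `co_train` are hypothetical. The defaults mirror the setting above (40 iterations, p = n = 5).

```python
from collections import Counter
from typing import Callable, List, Tuple

Review = Tuple[str, str]  # (english_text, chinese_text): both views of one review


def train(texts: List[str], labels: List[int]) -> Callable[[str], float]:
    """Stand-in for the SVM (assumption): a word-count polarity scorer."""
    pos, neg = Counter(), Counter()
    for text, y in zip(texts, labels):
        (pos if y > 0 else neg).update(text.split())

    def score(text: str) -> float:
        words = text.split()
        if not words:
            return 0.0
        # Each known word contributes a polarity in [-1, 1]; unknown words add 0.
        total = sum((pos[w] - neg[w]) / (pos[w] + neg[w])
                    for w in words if pos[w] + neg[w] > 0)
        return total / len(words)  # stays within [-1, 1]

    return score


def co_train(labeled: List[Tuple[Review, int]], unlabeled: List[Review],
             iterations: int = 40, p: int = 5, n: int = 5):
    """Return (english_classifier, chinese_classifier) after co-training."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(iterations):
        if not unlabeled:
            break
        labels = [y for _, y in labeled]
        clf_en = train([r[0] for r, _ in labeled], labels)
        clf_cn = train([r[1] for r, _ in labeled], labels)
        picked = {}  # index into `unlabeled` -> label assigned this iteration
        for clf, view in ((clf_en, 0), (clf_cn, 1)):
            ranked = sorted(range(len(unlabeled)),
                            key=lambda i: clf(unlabeled[i][view]))
            for i in ranked[:n]:        # lowest scores: most confident negatives
                picked.setdefault(i, -1)
            for i in ranked[::-1][:p]:  # highest scores: most confident positives
                picked.setdefault(i, +1)
            # setdefault keeps the first label if a review is picked twice.
        # Move E_en ∪ E_cn from the unlabeled pool into the training set.
        labeled += [(unlabeled[i], y) for i, y in picked.items()]
        unlabeled = [r for i, r in enumerate(unlabeled) if i not in picked]
    labels = [y for _, y in labeled]
    return (train([r[0] for r, _ in labeled], labels),
            train([r[1] for r, _ in labeled], labels))
```

Each iteration lets the classifier for one view hand confidently labeled reviews to the other, which is where the two-view setup is meant to pay off.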
13
Classification Phase
Chinese Classifier
English Classifier
Average of the two classifier outputs, in [-1, 1]
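A minimal sketch of this combination step, assuming each classifier returns a score in [-1, 1] and that the averaged score is thresholded at zero (the threshold and the tie handling are assumptions; the slide only states that the outputs are averaged). The name `classify` and the `Scorer` alias are hypothetical.

```python
from typing import Callable

Scorer = Callable[[str], float]  # maps a review text to a score in [-1, 1]


def classify(english_text: str, chinese_text: str,
             clf_en: Scorer, clf_cn: Scorer) -> str:
    """Average the per-view scores and threshold at zero (assumed rule)."""
    avg = (clf_en(english_text) + clf_cn(chinese_text)) / 2.0
    # A tie at exactly 0 falls to "negative" here; that choice is arbitrary.
    return "positive" if avg > 0 else "negative"
```

For a Chinese test review, `chinese_text` would be the original review and `english_text` its machine translation.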
14
Experiment Setting (Training)
• 8000 Amazon product reviews
  – 4000 positive, 4000 negative
  – Books, DVDs, electronics
• 1000 product reviews from www.it168.com
  – MP3 players, mobile phones, digital cameras (DC)
15
Experiment Setting (Testing)
• 886 Chinese product reviews from www.it168.com
  – 451 positive, 435 negative
  – Different from the unlabeled training data (outside testing)
16
Baseline
• SVM
  – Use only labeled data
• TSVM (Transductive SVM)
  – Joachims, 1999
  – Use both labeled and unlabeled data
17
SVM Baselines
SVM(EN), SVM(CN)
18
SVM Baselines
SVM(ENCN1)
19
SVM Baselines
SVM(ENCN2)
average
20
TSVM Baselines
TSVM(EN), TSVM(CN)
21
TSVM Baselines
TSVM(ENCN1)
22
TSVM Baselines
TSVM(ENCN2)
average
23
Result: Method Comparison (1)
24
Result: Method Comparison (2): Performance on Each Side
SVM(EN)
TSVM(EN)
CoTrain(EN)
25
Result: Method Comparison (3)
| Method | Accuracy |
|---|---|
| SVM(EN) | 0.738 |
| TSVM(EN) | 0.769 |
| CoTrain(EN) | 0.790 |
| SVM(CN) | 0.771 |
| TSVM(CN) | 0.767 |
| CoTrain(CN) | 0.775 |
CoTrain makes better use of the unlabeled Chinese reviews than TSVM
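One way to read this comparison is to measure each semi-supervised method against the SVM baseline on the same side, since SVM uses only the labeled data. The short sketch below does that arithmetic with the accuracies reported on this slide; the dictionary layout and the helper name `gain_over_svm` are illustrative choices, not part of the original work.

```python
# Accuracies as reported on the slide, grouped by side (EN / CN).
accuracy = {
    "EN": {"SVM": 0.738, "TSVM": 0.769, "CoTrain": 0.790},
    "CN": {"SVM": 0.771, "TSVM": 0.767, "CoTrain": 0.775},
}


def gain_over_svm(side: str, method: str) -> float:
    """Accuracy difference between `method` and the SVM baseline on one side."""
    return round(accuracy[side][method] - accuracy[side]["SVM"], 3)


for side in ("EN", "CN"):
    for method in ("TSVM", "CoTrain"):
        print(f"{method}({side}): {gain_over_svm(side, method):+.3f}")
```

On the English side the gains are +0.031 for TSVM and +0.052 for CoTrain; on the Chinese side TSVM is 0.004 below the SVM baseline while CoTrain is 0.004 above it.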
26
Result: Iteration Number
• Co-training outperforms TSVM(ENCN2) after 20 iterations
27
Result: Balance of (p, n)
• Unbalanced examples hurt the performance badly
28
Conclusion & Comment
• Co-Training approach for cross-lingual sentiment classification
• Future Work
  – Translated and natural text have different feature distributions
  – Domain adaptation algorithms (e.g., structural correspondence learning) for linking them
29
Comment
• Leverage word (phrase) alignment in translated text