finding deceptive opinion spam by any stretch of the imagination · 2020-06-08 · finding...
TRANSCRIPT
Finding Deceptive Opinion Spam by Any Stretch of the Imagination
Myle Ott,1 Yejin Choi,1 Claire Cardie,1 and Jeff Hancock2
Dept. of Computer Science,1 Communication2
Cornell University, Ithaca, NY
Motivation
• Consumers increasingly rate, review and research products online
• Potential for opinion spam
  – Disruptive opinion spam
  – Deceptive opinion spam
Motivation
Which of these two hotel reviews is deceptive opinion spam?
Answer:
Overview
• Motivation
• Gathering Data
• Human Performance
• Classifier Performance
• Conclusion
Gathering Data
• Label existing reviews
  – Can’t manually do this
  – Duplicate detection (Jindal and Liu, 2008)
• Create new reviews
  – Mechanical Turk
Gathering Data
• Mechanical Turk
  – 20 hotels
  – 20 reviews / hotel
  – Offer $1 / review
  – 400 reviews
• Average time spent: > 8 minutes
• Average length: > 115 words
Gathering Data
• 400 truthful reviews
  – TripAdvisor.com
  – Lengths distributed similarly to deceptive reviews
Human Performance
• Why bother?
  – Validates deceptive opinions
  – Baseline to compare other approaches
Human Performance
• 80 truthful and 80 deceptive reviews
• 3 undergraduate judges
  – Truth bias
• 2 meta-judges
Performed at chance (p-value = 0.1)
Performed at chance (p-value = 0.5)
Classified fewer than 12% of opinions as deceptive!
No more truth bias!
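The slides do not spell out how the two meta-judges combine the three human verdicts. A minimal sketch, assuming a majority-vote rule and a "skeptic" rule that flags a review as soon as any judge does, hints at why such a rule can offset truth bias. The function names and the toy verdicts are illustrative assumptions, not data from the study.

```python
def majority(verdicts):
    """Deceptive (True) when more than half of the judges say deceptive."""
    return sum(verdicts) * 2 > len(verdicts)


def skeptic(verdicts):
    """Deceptive (True) when at least one judge says deceptive."""
    return any(verdicts)


def deceptive_rate(labels):
    """Fraction of reviews labelled deceptive; a low rate signals truth bias."""
    return sum(labels) / len(labels)


# Hypothetical verdicts: one row per review, one column per judge.
# True means the judge called the review deceptive.
toy_verdicts = [
    (False, False, False),
    (True, False, False),
    (False, False, True),
    (True, True, False),
]

majority_labels = [majority(v) for v in toy_verdicts]
skeptic_labels = [skeptic(v) for v in toy_verdicts]

print(deceptive_rate(majority_labels))  # 0.25
print(deceptive_rate(skeptic_labels))   # 0.75
```

On this toy input the skeptic rule labels three times as many reviews deceptive as the majority rule, which is the direction needed to counter judges who rarely say "deceptive".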
Classifier Performance
• Three feature sets
  – Genre identification
  – Psycholinguistic deception detection
  – Text categorization
• Linear SVM
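The slides name a linear SVM but give no training details. Purely as an illustration of what such a classifier does with a feature vector, here is a small pure-Python sketch trained by sub-gradient descent on the L2-regularised hinge loss. The hyperparameters, function names and two-feature toy data are assumptions, not the authors' setup.

```python
def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Return (w, b) for labels in {-1, +1} using hinge-loss sub-gradient steps."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # A point violates the margin if it is misclassified or too close.
            violated = yi * (dot(w, xi) + b) < 1
            # The L2 regulariser shrinks the weights on every step.
            w = [wj * (1 - lr * lam) for wj in w]
            if violated:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical toy data: +1 = deceptive, -1 = truthful.
X = [[2, 0], [3, 1], [0, 2], [1, 3]]
y = [1, 1, -1, -1]

w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

In practice each review would be mapped to one of the three feature sets listed above before being passed to the classifier.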
Classifier Performance
• Genre identification
  – 48 part-of-speech (PoS) features
  – Baseline automated approach
• Expectations
  – Truth similar to informative writing
  – Deception similar to imaginative writing
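One plausible reading of "PoS features" is one feature per tag, holding that tag's relative frequency in the review. The sketch below assumes the text has already been tagged; the five-tag tagset and the hand-tagged fragment are hypothetical stand-ins for the 48 features on the slide.

```python
from collections import Counter


def pos_frequencies(tagged_tokens, tagset):
    """Relative frequency of each tag in `tagset` for one tagged review."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = len(tagged_tokens)
    return [counts[tag] / total if total else 0.0 for tag in tagset]


# Hypothetical hand-tagged fragment (Penn Treebank style tags).
review = [
    ("the", "DT"),
    ("finest", "JJS"),
    ("hotel", "NN"),
    ("I", "PRP"),
    ("visited", "VBD"),
]
tagset = ["DT", "JJS", "NN", "PRP", "VBD"]

features = pos_frequencies(review, tagset)
print(features)  # [0.2, 0.2, 0.2, 0.2, 0.2]
```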
Classifier Performance
Outperforms human judges! (p-values = {0.06, 0.01, 0.001})
Classifier Performance
• Rayson et al. (2001)
  – Informative on left, imaginative on right
e.g., best, finest
e.g., most
Classifier Performance
• Linguistic Inquiry and Word Count (Pennebaker et al., 2007)
  – Counts instances of ~4,500 keywords
    • Regular expressions, actually
  – Keywords are divided into 80 dimensions across 4 broad groups
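To make the "regular expressions, actually" point concrete, the following sketch counts pattern matches per dimension. The three dimensions and their patterns are a made-up mini-lexicon for illustration only; they are not taken from the actual LIWC dictionary.

```python
import re

# Hypothetical mini-lexicon: dimension name -> keyword patterns.
TOY_LEXICON = {
    "positive_emotion": [r"happ\w*", r"great"],
    "family": [r"husband", r"wife", r"famil\w*"],
    "first_person_singular": [r"i", r"my", r"me"],
}


def compile_lexicon(lexicon):
    """Build one case-insensitive, whole-word regex per dimension."""
    return {
        dim: re.compile(r"\b(?:" + "|".join(patterns) + r")\b", re.IGNORECASE)
        for dim, patterns in lexicon.items()
    }


def dimension_counts(text, compiled):
    """Number of keyword matches in `text` for each dimension."""
    return {dim: len(rx.findall(text)) for dim, rx in compiled.items()}


compiled = compile_lexicon(TOY_LEXICON)
counts = dimension_counts("My husband and I were happy with my room.", compiled)
print(counts)
# {'positive_emotion': 1, 'family': 1, 'first_person_singular': 3}
```

Each dimension then contributes one feature per review, so a full lexicon with 80 dimensions would yield an 80-dimensional feature vector.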
Classifier Performance
• Linguistic processes
  – e.g., average number of words per sentence
• Psychological processes
  – e.g., talk, happy, know, feeling, eat
• Personal concerns
  – e.g., job, cook, family
• Spoken categories
  – e.g., yes, umm, blah
Classifier Performance
Outperforms PoS! (p-value = 0.02)
Classifier Performance
• Text categorization (n-grams)
  – Unigrams
  – Bigrams+
    • Includes unigrams
  – Trigrams+
    • Includes unigrams and bigrams
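The cumulative "+" feature sets can be sketched as n-gram counts for every n up to a maximum. Lowercasing and whitespace tokenization are simplifying assumptions here, and the example sentence is invented.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-token sequences, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, max_n):
    """Counts of all n-grams for n = 1..max_n.

    max_n=1 gives Unigrams, max_n=2 gives Bigrams+, max_n=3 gives Trigrams+.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


sentence = "my husband loved the hotel"
unigrams = ngram_features(sentence, 1)
bigrams_plus = ngram_features(sentence, 2)
trigrams_plus = ngram_features(sentence, 3)

print(len(unigrams), len(bigrams_plus), len(trigrams_plus))  # 5 9 12
```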
Classifier Performance
Outperforms all other methods!
Classifier Performance
• Spatial difficulties (Vrij et al., 2009)
• Psychological distancing (Newman et al., 2003)
Conclusion
• First large-scale gold-standard deception dataset
  – http://www.cs.cornell.edu/~myleott/op_spam
• Evaluated human deception detection performance
• Developed automated classifiers capable of nearly 90% accuracy
  – Relationship between deceptive and imaginative text
  – Importance of moving beyond universal deception cues
Thank you. Questions?