Adaptive Subjective Triggers for Opinionated Document Retrieval
Kazuhiro Seki, Organization of Advanced Science & Technology, Kobe University
Kuniaki Uehara, Graduate School of Engineering, Kobe University
2/10/2009
Background
• Increasing amount of user-generated content (UGC) on the web
  – often contains personal, subjective opinions
• Such opinions can help personal and corporate decision making → demand for retrieving personal opinions about a given entity
• Traditional IR aims to find documents relevant to a given topic (entity)
  – not concerned with subjectivity
• Aim: retrieve documents that are not only pertinent to a given entity but also contain subjective opinions
An (existing) approach
• Lexicon-based (Mishne, 2006; Zhang et al., 2008; etc.)
  – Look for subjective words/phrases
    • “like” conveys favorable feelings: “I like the movie.”
  – Potential drawback
    • Words/phrases taken out of context do not necessarily indicate subjectivity: “It looks like a cat.”, “She likes singing.”
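The following minimal Python sketch makes the drawback concrete by implementing purely lexicon-based matching. The six-word `SUBJECTIVE_LEXICON` and the helper names are hypothetical, and real lexicons are far larger. Each token is looked up without any context, so the subjective “I like the movie.” and the non-subjective “It looks like a cat.” get exactly the same hit.

```python
import re

# Hypothetical toy lexicon; real lexicon-based systems use much larger word lists.
SUBJECTIVE_LEXICON = {"like", "love", "hate", "great", "awful", "feel"}

def tokenize(sentence: str) -> list[str]:
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())

def lexicon_hits(sentence: str) -> list[str]:
    """Return every token found in the lexicon; surrounding context is ignored."""
    return [t for t in tokenize(sentence) if t in SUBJECTIVE_LEXICON]

for s in ["I like the movie.", "It looks like a cat."]:
    print(s, "->", lexicon_hits(s))
```

The two sentences cannot be told apart by this matcher, which is the gap that context-aware models address.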
Another approach considering wider context
• n-gram language model
  – estimates word occurrence probabilities based on the prior context, or history, i.e., the preceding (n – 1) words
    • bigram: $P(w_i \mid w_{i-1})$
    • trigram: $P(w_i \mid w_{i-2}, w_{i-1})$
  – Typically, n is set to 2 or 3
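A bigram model can be estimated by maximum likelihood as $P(w_i \mid w_{i-1}) = c(w_{i-1}, w_i) / c(w_{i-1})$. The sketch below covers only this counting step, and the function name and toy corpus are illustrative. It applies no smoothing, so unseen bigrams simply have no entry. A practical model would need to handle those.

```python
from collections import Counter

def train_bigram(tokens: list[str]) -> dict[tuple[str, str], float]:
    """Maximum-likelihood estimates P(w_i | w_{i-1}) = c(w_{i-1}, w_i) / c(w_{i-1})."""
    # Every token except the last serves as the history of its successor.
    history_counts = Counter(tokens[:-1])
    pair_counts = Counter(zip(tokens[:-1], tokens[1:]))
    return {(prev, w): c / history_counts[prev] for (prev, w), c in pair_counts.items()}

model = train_bigram("i like the movie and i like the music".split())
print(model[("i", "like")])    # 1.0  ("i" is always followed by "like")
print(model[("the", "movie")])  # 0.5  ("the" is followed by "movie" once out of twice)
```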
Trigger models (Lau et al., 1993)
• Incorporate long distance dependency that cannot be handled by n-gram models
• Trigger pairs
  – word pairs in which one word tends to bring about the occurrence of the other
    • nor → either (syntactic dependency)
    • memory → GB (semantic dependency)
• Used by linear interpolation with an n-gram model:
  $(1-\lambda)\,P_B(w \mid h) + \lambda\,P_T(w \mid h)$
  where $P_B$ is the n-gram model and $P_T$ is the trigger model
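The interpolation above can be sketched for one fixed history h. Each model is represented as a dictionary mapping words to probabilities. The toy values are illustrative and do not come from the paper. Because the weights (1 − λ) and λ sum to 1, mixing two proper distributions yields another proper distribution.

```python
def interpolate(p_b: dict[str, float], p_t: dict[str, float], lam: float) -> dict[str, float]:
    """Return (1 - lam) * P_B(w|h) + lam * P_T(w|h) for every word w in either distribution."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    vocab = set(p_b) | set(p_t)
    # A word missing from one distribution gets probability 0 under that component.
    return {w: (1.0 - lam) * p_b.get(w, 0.0) + lam * p_t.get(w, 0.0) for w in vocab}

# Toy distributions for one history h (illustrative values only).
p_b = {"like": 0.1, "the": 0.6, "movie": 0.3}  # n-gram model P_B
p_t = {"like": 0.5, "movie": 0.5}              # trigger model P_T
mixed = interpolate(p_b, p_t, lam=0.2)
print(round(mixed["like"], 2))  # 0.18
```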
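Identifying trigger pairs (Tillmann and Ney, 1996)
1. Train an n-gram model P(w|h) on the corpus
2. Enumerate potential trigger pairs from the vocabulary
3. For each pair a → b, build a trigger model $P_T(w \mid h)$ and combine it with the n-gram model into the extended model $P_E(w \mid h)$
4. Evaluate each pair by the log-likelihood difference
   $\Delta_{a \to b} = \sum_i \{\log P_E(w_i \mid h_i) - \log P(w_i \mid h_i)\}$
• When P(b|h) < t → low-level triggers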
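The evaluation criterion $\Delta_{a \to b}$ can be sketched as follows, with each model represented as a function of (word, history). Two simplifications are assumptions made here. First, the history $h_i$ is every preceding token, whereas a real system would bound it. Second, the toy models `p_base` and `p_ext` are hypothetical and not normalised; they serve only to show the arithmetic. A positive Δ means the pair improves the corpus likelihood.

```python
import math
from typing import Callable, Sequence

# A language model is represented as a function (word, history) -> probability.
Model = Callable[[str, tuple[str, ...]], float]

def log_likelihood_gain(tokens: Sequence[str], p_base: Model, p_ext: Model) -> float:
    """Delta_{a->b} = sum_i [log P_E(w_i|h_i) - log P(w_i|h_i)].

    The history h_i is taken to be every token before position i (an assumption)."""
    delta = 0.0
    for i, w in enumerate(tokens):
        h = tuple(tokens[:i])
        delta += math.log(p_ext(w, h)) - math.log(p_base(w, h))
    return delta

# Hypothetical toy models: the extended model for the pair "like" -> "movie"
# doubles P(movie | h) whenever "like" appears in the history.
def p_base(w: str, h: tuple[str, ...]) -> float:
    return 0.1

def p_ext(w: str, h: tuple[str, ...]) -> float:
    return 0.2 if w == "movie" and "like" in h else 0.1

delta = log_likelihood_gain(["i", "like", "the", "movie"], p_base, p_ext)
print(round(delta, 3))  # 0.693, i.e. log 2
```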
Building trigger model PT
1. For each identified trigger pair (a→b), compute their association score α(b|a) based on their co-occurrences
2. Define a trigger model $P_T$ using α(·): $P_T(w \mid h)$ is the average association score between the words in history h and word w
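The slide does not specify how α is computed, so the sketch below rests on two assumptions. First, α(b|a) counts the positions where b follows a within a sentence, normalised over b for each triggering word a. Second, $P_T(w \mid h)$ is the plain average of α(w|a) over the words a in h, in the spirit of Tillmann and Ney (1996). The function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict

def association_scores(sentences: list[list[str]],
                       pairs: set[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """alpha(b|a): how often b follows a in the same sentence, normalised over b (assumed form)."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sent in sentences:
        for j, b in enumerate(sent):
            for a in set(sent[:j]):  # each distinct earlier word counts once per position
                if (a, b) in pairs:
                    counts[a][b] += 1
    return {a: {b: c / sum(cs.values()) for b, c in cs.items()} for a, cs in counts.items()}

def p_trigger(w: str, history: list[str], alpha: dict[str, dict[str, float]]) -> float:
    """P_T(w|h): average of alpha(w|a) over the words a in the history (0 for an empty history)."""
    if not history:
        return 0.0
    return sum(alpha.get(a, {}).get(w, 0.0) for a in history) / len(history)

sentences = [["i", "like", "it"], ["i", "like", "this", "movie"], ["i", "hate", "it"]]
alpha = association_scores(sentences, pairs={("i", "like"), ("i", "hate")})
print(alpha["i"])                                 # like: 2/3, hate: 1/3
print(p_trigger("like", ["i", "really"], alpha))  # (2/3 + 0) / 2
```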
Subjective trigger model
• Assumptions
  – A personal subjective opinion consists of two main components
    • the subject of the opinion (e.g., “I”, “you”) or the object the opinion is about (e.g., “The Curious Case of Benjamin Button”)
    • a subjective expression (e.g., “like”, “feel”)
    • Treat them as triggering and triggered words, respectively
  – Triggering words are expressed as pronouns
    • Empirical finding: proximity of pronouns and subjective expressions to objects is an effective measure of opinionatedness (Zhou et al., 2007; Yang et al., 2007)
Identifying “subjective” trigger pairs
• Pronouns considered
  – I, my, you, it, its, he, his, she, her, we, our, they, their, this
• History h: preceding words in the same sentence
• Corpus: 5,000 customer reviews from Amazon.com
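Under the settings above, candidate subjective trigger pairs pair a listed pronoun a with any later word b in the same sentence, since a is then part of b's history h. The sketch below assumes tokens are already lowercased. The candidates would still be screened by the log-likelihood criterion, which is not shown here. The function name is hypothetical.

```python
# Pronouns listed on the slide; tokens are assumed to be lowercased already.
PRONOUNS = {"i", "my", "you", "it", "its", "he", "his", "she", "her",
            "we", "our", "they", "their", "this"}

def candidate_pairs(sentence: list[str]) -> set[tuple[str, str]]:
    """Return pairs (a, b) where pronoun a occurs before word b in the same sentence."""
    pairs: set[tuple[str, str]] = set()
    for j, b in enumerate(sentence):
        for a in sentence[:j]:  # history h = preceding words in this sentence
            if a in PRONOUNS:
                pairs.add((a, b))
    return pairs

print(sorted(candidate_pairs(["i", "really", "like", "it"])))
# [('i', 'it'), ('i', 'like'), ('i', 'really')]
```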
Identifying “subjective” trigger pairs (cont.)
• Low-level triggers (P(w|h) < t) cause a problem
  – Penalize frequent w with infrequent history h
Opinion retrieval
(Pipeline: query q → IR by INM → documents d scored by $P_{INM}(q \mid d)$ → reranking with the subjective language model $P_E(w \mid h)$ → reranked documents d)
• Probability that d is relevant to q AND subjective
  – product of $P_{INM}(q \mid d)$ and $P_E(d) = \prod_i P_E(w_i \mid h_i)$
  – $P_E(d)$ is smaller for longer d
  – $P_{INM}(q \mid d)$ and $P_E(d)$ may have largely different variances
• Therefore, normalize $P_E(d)$ by length m and take a weighted sum of the logs
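The reranking step can be sketched once a concrete scoring form is chosen. The slide does not give the exact formula. Here it is assumed to be $(1-\beta)\log P_{INM}(q \mid d) + \beta \cdot \frac{1}{m}\log P_E(d)$, with β as the weight studied later in the β-vs-MAP experiment. The `Doc` class and toy numbers are hypothetical. The example shows why length normalisation matters: a long document with higher per-word probabilities would lose on raw $\log P_E(d)$ but wins once the log is divided by m.

```python
import math
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str
    p_inm: float             # P_INM(q|d) from the initial retrieval run
    word_probs: list[float]  # P_E(w_i|h_i) for each of the m words of d

def opinion_score(doc: Doc, beta: float) -> float:
    """Assumed form: (1 - beta) * log P_INM(q|d) + beta * (1/m) * log P_E(d)."""
    m = len(doc.word_probs)
    # log P_E(d) = sum_i log P_E(w_i|h_i); dividing by m removes the bias against long documents.
    norm_log_pe = sum(math.log(p) for p in doc.word_probs) / m
    return (1.0 - beta) * math.log(doc.p_inm) + beta * norm_log_pe

def rerank(docs: list[Doc], beta: float) -> list[str]:
    """Return document ids sorted by descending opinion score."""
    return [d.doc_id for d in sorted(docs, key=lambda d: opinion_score(d, beta), reverse=True)]

docs = [Doc("short_factual", 0.5, [0.02, 0.02]),
        Doc("long_opinionated", 0.5, [0.05] * 10)]
print(rerank(docs, beta=0.5))  # ['long_opinionated', 'short_factual']
```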
Dynamic model adaptation
• Motivation
  – Language models created from Amazon reviews may not be effective for some types of entities
• Procedure
  1. Carry out a keyword search for a given topic
  2. Use the k top-ranked blog posts to identify new trigger pairs (a → b) and compute their association scores α′(·)
  3. Update the trigger model using the new trigger pairs and their association scores
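The slide does not say how α and α′ are combined in step 3. The sketch below uses one hypothetical update rule, not the authors' method. For each triggering word a, it mixes the Amazon-derived scores α(·|a) with the blog-derived scores α′(·|a) using an illustrative weight γ. A word found on only one side keeps its scores unchanged, so each α(·|a) remains normalised whenever both inputs are.

```python
def adapt_alpha(alpha: dict[str, dict[str, float]],
                alpha_new: dict[str, dict[str, float]],
                gamma: float = 0.5) -> dict[str, dict[str, float]]:
    """Hypothetical update: (1 - gamma) * alpha(b|a) + gamma * alpha'(b|a) per triggering word a."""
    merged: dict[str, dict[str, float]] = {}
    for a in set(alpha) | set(alpha_new):
        old, new = alpha.get(a, {}), alpha_new.get(a, {})
        if not old or not new:
            # a has scores on only one side: keep them as they are.
            merged[a] = dict(old or new)
            continue
        merged[a] = {b: (1.0 - gamma) * old.get(b, 0.0) + gamma * new.get(b, 0.0)
                     for b in set(old) | set(new)}
    return merged

alpha = {"i": {"like": 0.6, "hate": 0.4}}                         # from Amazon reviews
alpha_new = {"i": {"like": 0.2, "support": 0.8}, "he": {"said": 1.0}}  # from top-k blog posts
print(adapt_alpha(alpha, alpha_new))
```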
Empirical evaluation
• Data
  – TREC Blog track test collection 2006
    • 3 million blog posts crawled from Dec 2005 to Feb 2006
    • 50 “topics” (user information needs)
    • Relevant & opinionated posts are explicitly labeled
• Two types of assessment
  – Evaluation of the language models
  – Their effects on opinion retrieval
Evaluation of language models
• Perplexity
  – Uncertainty of a language model L in predicting a word sequence $d = w_1, \ldots, w_m$
• Created two hypothetical documents from the Blog track collection
  – concatenate all the opinionated posts → $d_O$
  – concatenate all the relevant (but non-opinionated) posts → $d_N$
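The slide's own perplexity formula was lost in extraction. The sketch below uses the standard definition, which is assumed to match it: $PP(d) = P_L(w_1, \ldots, w_m)^{-1/m} = \exp\big(-\frac{1}{m}\sum_i \log P_L(w_i \mid h_i)\big)$. Lower values mean the model predicts the text better. A model that assigns probability 1/4 to every word is exactly as uncertain as a uniform choice among 4 words.

```python
import math

def perplexity(word_probs: list[float]) -> float:
    """PP = exp(-(1/m) * sum_i log P_L(w_i|h_i)) for a document of m words."""
    m = len(word_probs)
    if m == 0:
        raise ValueError("perplexity is undefined for an empty document")
    return math.exp(-sum(math.log(p) for p in word_probs) / m)

print(perplexity([0.25, 0.25, 0.25, 0.25]))  # approximately 4
```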
Perplexity Results
• Higher-order n-grams monotonically decrease perplexity, irrespective of the language model and document type
• The opinionated document $d_O$ leads to lower perplexity than $d_N$
• The subjective language model $P_E$ produces lower perplexity than the n-gram model $P_B$
Relation between parameter β and MAP
(Figure: MAP plotted against β, annotated with a +22.0% improvement)
Improvement for individual topics
Analysis on individual topics
• Topics with notable improvement
  – “MacBook Pro”: laptop (+0.22)
  – “Heineken”: company and brand name (+0.20)
  – “Shimano”: company and brand name (+0.19)
  – “Board chess”: board game (+0.13)
  – “Zyrtec”: medication (product name) (+0.12)
  – “Mardi Gras”: final day of carnival (+0.11)
• Most of them are products
  – The model learned from Amazon reviews is effective for products in general, including beer and medication
  – It is also effective for other types of entities
Analysis on individual topics (cont.)
• Topics with performance decline
  – “Jim Moran”: congressman (–0.15)
  – “World Trade Org.”: international organization (–0.05)
  – “Cindy Sheehan”: anti-war activist (–0.03)
  – “Ann Coulter”: political commentator (–0.01)
  – “West Wing”: TV drama set in the White House (–0.01)
  – “Sonic food industry”: fast-food restaurant chain (–0.01)
• Are politics and organizations difficult to improve?
  – But counterexamples exist: Bruce Bartlett (+0.07), Jihad (+0.06), McDonalds (+0.03), Qualcomm (+0.02)
Results for dynamic model adaptation
• Moderately improved performance
• For “Zyrtec”, AP improved by 47.7%
Results for model adaptation for difficult topics
• For most topics, AP slightly but consistently improved
Conclusions
• Proposed subjective trigger models reflecting subjective opinions
  – Two assumptions + a modification to low-level triggers
• Combined with an IR model for opinion retrieval
  – 22.0% improvement over INM in MAP
  – Effective for most topics; slight drop for topics concerning politics and organizations
• Dynamic model adaptation
  – Positive effect overall (+25.0% over initial search)
  – Moderately effective for politics- and organization-related topics
Future work
• Use of a larger corpus of customer reviews
• Use of labeled data in the Blog track test collection
• Refinement of the approach to model adaptation
References
Mishne, G.: Multiple Ranking Strategies for Opinion Retrieval in Blogs, Proceedings of the 15th Text Retrieval Conference (2006).
Zhang, M. and Ye, X.: A Generation Model to Unify Topic Relevance and Lexicon-Based Sentiment for Opinion Retrieval, Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 411–418 (2008).
Lau, R., Rosenfeld, R. and Roukos, S.: Trigger-Based Language Models: A Maximum Entropy Approach, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 2, pp. 45–48 (1993).
Tillmann, C. and Ney, H.: Selection Criteria for Word Trigger Pairs in Language Modeling, Grammatical Inference: Learning Syntax from Sentences, Lecture Notes in Computer Science, pp. 95–106, Springer Berlin / Heidelberg (1996).
Zhou, G., Joshi, H. and Bayrak, C.: Topic Categorization for Relevancy and Opinion Detection, Proceedings of the 16th Text Retrieval Conference (2007).
Yang, K., Yu, N. and Zhang, H.: WIDIT in TREC 2007 Blog Track: Combining Lexicon-Based Methods to Detect Opinionated Blogs, Proceedings of the 16th Text Retrieval Conference (2007).
Zhang, W., Yu, C. and Meng, W.: Opinion Retrieval from Blogs, Proceedings of the 16th ACM Conference on Information and Knowledge Management, pp. 831–840 (2007).
QUESTIONS?
Comparative experiments
2006
• TREC best: 0.1885
• Zhang et al.: 0.2726
• Ours w/ our baseline: 0.2398
• Ours w/ stronger baseline (baseline: 0.3022): 0.3221
Comparative experiments
2007
• TREC best: 0.4341
• TREC 2nd: 0.3453
• TREC 3rd: 0.3264
• Ours w/ our baseline (baseline: 0.2508): 0.3072
• Ours w/ stronger baseline (baseline: 0.3784): 0.4054
Comparative experiments
2008 (same baseline)
• TREC best: 0.4067
• TREC 2nd: 0.4006
• TREC 3rd: 0.3964
• Ours w/ stronger baseline (baseline: 0.3822): 0.3996
Comparative experiments (polarity task)
2008 (same baseline)
• TREC best (ours): 0.1448
• TREC 2nd: 0.1348
• TREC 3rd: 0.1129