adaptive subjective triggers for opinionated document retrieval

1

Adaptive Subjective Triggers for Opinionated Document Retrieval

Kazuhiro Seki, Organization of Advanced Science & Technology, Kobe University

Kuniaki UeharaGraduate School of Engineering, Kobe University

2/10/2009

2

Background

• Increasing volume of user-generated content (UGC) on the web
  – Often contains personal, subjective opinions

• Can be helpful for personal/corporate decision making → demand for retrieving personal opinions about a given entity

• Traditional IR aims to find documents relevant to a given topic (entity)
  – Not concerned with subjectivity

• Aim: retrieve documents that are not only pertinent to a given entity but also contain subjective opinions

3

An (existing) approach

• Lexicon-based (Mishne, 2006; Zhang et al., 2008; etc.)
  – Look for subjective words/phrases
    • e.g., “like” conveys favorable feelings: “I like the movie.”
  – Potential drawback
    • Words/phrases taken out of context do not necessarily indicate subjectivity
      – “It looks like a cat.”
      – “She likes singing.”
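A minimal Python sketch of a context-free lexicon lookup illustrates the drawback. The tiny lexicon and the function name `lexicon_hits` are hypothetical and not taken from the cited systems; the point is only that matching words out of context flags all three example sentences.

```python
import re

# Hypothetical toy lexicon; real lexicon-based systems use far larger word lists.
SUBJECTIVE_LEXICON = {"like", "likes", "love", "hate", "feel"}

def lexicon_hits(sentence: str) -> list[str]:
    """Return the lexicon words found in the sentence, ignoring their context."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [t for t in tokens if t in SUBJECTIVE_LEXICON]

for s in ["I like the movie.", "It looks like a cat.", "She likes singing."]:
    print(f"{s!r}: {lexicon_hits(s)}")
```

Each sentence yields one hit, although only the first expresses the writer's own opinion.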

4

Another approach considering wider context

• n-gram language model
  – Estimates word occurrence probabilities based on the prior context (history), i.e., the preceding n – 1 words
    • bigram: $P(w_i \mid w_{i-1})$
    • trigram: $P(w_i \mid w_{i-2}, w_{i-1})$
  – Generally, n is set to 2 or 3
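As a worked example of the bigram case, the sketch below computes unsmoothed maximum-likelihood estimates $P(w_i \mid w_{i-1}) = c(w_{i-1}, w_i) / c(w_{i-1})$ from a toy token list. The function name is hypothetical, and a practical model would also need smoothing for unseen bigrams.

```python
from collections import Counter

def bigram_model(tokens):
    """Unsmoothed MLE bigram estimates P(w_i | w_{i-1}) = c(w_{i-1}, w_i) / c(w_{i-1})."""
    history_counts = Counter(tokens[:-1])        # words that have a successor
    pair_counts = Counter(zip(tokens, tokens[1:]))
    return {(prev, w): c / history_counts[prev] for (prev, w), c in pair_counts.items()}

tokens = "i like the movie and i like the cast".split()
p = bigram_model(tokens)
print(p[("i", "like")])     # 1.0
print(p[("the", "movie")])  # 0.5
```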

5

Trigger models (Lau et al., 1993)

• Incorporate long-distance dependencies that cannot be handled by n-gram models

• Trigger pairs
  – Word pairs such that one tends to bring about the occurrence of the other
    • nor → either (syntactic dependency)
    • memory → GB (semantic dependency)

• Used by linear interpolation with an n-gram model:
  $P(w \mid h) = (1 - \lambda)\, P_B(w \mid h) + \lambda\, P_T(w \mid h)$
  where $P_B$ is the n-gram model and $P_T$ is the trigger model
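The interpolation can be sketched over whole next-word distributions. The two toy distributions below are hypothetical numbers for a history containing “memory”; because both inputs sum to 1 and λ lies in [0, 1], the mixture is again a probability distribution.

```python
def interpolate(p_base: dict[str, float], p_trigger: dict[str, float], lam: float) -> dict[str, float]:
    """P(w|h) = (1 - lam) * P_B(w|h) + lam * P_T(w|h) for every w in either model."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    vocab = set(p_base) | set(p_trigger)
    return {w: (1.0 - lam) * p_base.get(w, 0.0) + lam * p_trigger.get(w, 0.0) for w in vocab}

# Hypothetical next-word distributions for a history that contains "memory"
p_b = {"GB": 0.01, "card": 0.30, "lane": 0.69}  # n-gram model P_B
p_t = {"GB": 0.60, "card": 0.40}                # trigger model P_T
p = interpolate(p_b, p_t, lam=0.2)
print(round(p["GB"], 3))  # 0.128
```

The trigger model lifts “GB” from 0.01 to about 0.128 even though the n-gram model alone barely predicts it.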

6

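Identifying trigger pairs (Tillmann et al., 1996)

1. Start: build an n-gram model $P(w \mid h)$ from the corpus
2. Form potential trigger pairs from the vocabulary
3. For each pair a → b, build a trigger model $P_T(w \mid h)$ and an extended model $P_E(w \mid h)$ combining the two
4. Evaluate each pair by the log-likelihood difference
   $\Delta_{a \to b} = \sum_i \left[ \log P_E(w_i \mid h_i) - \log P(w_i \mid h_i) \right]$

• When $P(b \mid h) < t$, the pair is a low-level trigger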
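The selection criterion can be illustrated with a toy Python sketch. Everything here is an assumption for illustration: a uniform baseline over a four-word vocabulary, and an extended model that interpolates the baseline (weight 0.5) with a single trigger pair a → b which, when a is in the history, puts all of its mass on b. A pair that actually predicts the data earns a positive $\Delta_{a \to b}$; a spurious one is penalized.

```python
import math

VOCAB = ["nor", "either", "cat", "dog"]

def p_base(w, h):
    """Toy baseline P(w|h): uniform over VOCAB, ignoring the history."""
    return 1.0 / len(VOCAB)

def make_extended(a, b, lam=0.5):
    """Toy extended model P_E(w|h): baseline interpolated with the single pair a -> b."""
    def p_ext(w, h):
        if a in h:
            p_trig = 1.0 if w == b else 0.0  # trigger fires: all mass on b
        else:
            p_trig = p_base(w, h)            # trigger silent: fall back to baseline
        return (1 - lam) * p_base(w, h) + lam * p_trig
    return p_ext

def log_likelihood_difference(p_ext, events):
    """Delta_{a->b} = sum_i [log P_E(w_i|h_i) - log P(w_i|h_i)]."""
    return sum(math.log(p_ext(w, h)) - math.log(p_base(w, h)) for w, h in events)

# (word, history) events from a hypothetical corpus
events = [("either", ("nor",)), ("cat", ("the",)), ("dog", ("nor", "the"))]
scores = {
    (a, b): log_likelihood_difference(make_extended(a, b), events)
    for a, b in [("nor", "either"), ("nor", "cat")]
}
best = max(scores, key=scores.get)
print(best)  # ('nor', 'either')
```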

7

Building trigger model PT

1. For each identified trigger pair (a → b), compute an association score α(b|a) based on their co-occurrences

2. Define a trigger model $P_T$ by using α(·): $P_T(w \mid h)$ is the average association score between the words in history h and word w
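One way to realize these two steps, sketched in Python under assumptions: the slide does not say how α(b|a) is computed from co-occurrences, so the hypothetical `association_scores` simply normalizes pair counts per triggering word, $\alpha(b \mid a) = N(a, b) / \sum_{b'} N(a, b')$. The trigger model then averages over the history, $P_T(w \mid h) = \frac{1}{|h|} \sum_{a \in h} \alpha(w \mid a)$, as the slide describes. History words with no trigger pairs contribute zero, and the sketch does not renormalize for that.

```python
from collections import Counter

def association_scores(pair_counts: dict[tuple[str, str], int]) -> dict[str, dict[str, float]]:
    """Hypothetical alpha(b|a) = N(a, b) / sum_b' N(a, b'), i.e. counts normalized per trigger a."""
    totals = Counter()
    for (a, _), n in pair_counts.items():
        totals[a] += n
    alpha: dict[str, dict[str, float]] = {}
    for (a, b), n in pair_counts.items():
        alpha.setdefault(a, {})[b] = n / totals[a]
    return alpha

def p_trigger(w: str, history: list[str], alpha: dict[str, dict[str, float]]) -> float:
    """P_T(w|h): average association score alpha(w|a) over the words a in history h."""
    if not history:
        return 0.0
    return sum(alpha.get(a, {}).get(w, 0.0) for a in history) / len(history)

# Toy co-occurrence counts for identified trigger pairs (hypothetical numbers)
pair_counts = {("i", "like"): 3, ("i", "feel"): 1, ("it", "looks"): 2}
alpha = association_scores(pair_counts)
print(p_trigger("like", ["i", "really"], alpha))  # 0.375
```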

8

Subjective trigger model

• Assumptions
  – A personal subjective opinion consists of two main components
    • The subject of the opinion (e.g., “I”, “you”) or the object the opinion is about (e.g., “The Curious Case of Benjamin Button”)
    • A subjective expression (e.g., “like”, “feel”)
    • Treat them as triggering and triggered words, respectively
  – Triggering words are represented by pronouns
    • Empirical finding: the proximity of pronouns and subjective expressions to objects is an effective measure of opinionatedness (Zhou et al., 2007; Yang et al., 2007)

9

Identifying “subjective” trigger pairs

• Pronouns considered
  – I, my, you, it, its, he, his, she, her, we, our, they, their, this

• History h: preceding words in the same sentence

• Corpus: 5,000 customer reviews from Amazon.com
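Candidate subjective trigger pairs could be collected as below. This is a sketch assuming whitespace tokenization and counting a pair a → b whenever pronoun a occurs before word b in the same sentence (each distinct pronoun counted once per position of b); the function name is hypothetical. In the approach described here, candidates would then be scored with the selection criterion of Tillmann et al. rather than kept wholesale.

```python
from collections import Counter

# Triggering words: the pronouns listed on the slide
PRONOUNS = {"i", "my", "you", "it", "its", "he", "his", "she", "her",
            "we", "our", "they", "their", "this"}

def candidate_pairs(sentences: list[str]) -> Counter:
    """Count (pronoun a, later word b) pairs; the history of b is the preceding words of its sentence."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for j, b in enumerate(tokens):
            for a in set(tokens[:j]) & PRONOUNS:
                counts[(a, b)] += 1
    return counts

reviews = ["I really like this lens", "It feels cheap"]
counts = candidate_pairs(reviews)
print(counts[("i", "like")], counts[("it", "feels")])  # 1 1
```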

10

Identifying “subjective” trigger pairs (cont.)

• Low-level triggers ($P(w \mid h) < t$) cause a problem
  – Modification: penalize frequent w with an infrequent history h

11

Opinion retrieval

[Diagram: query q → IR by INM, scoring documents d by $P_{INM}(q \mid d)$ → reranking of documents d with the subjective language model $P_E(w \mid h)$]

• Probability that d is relevant to q AND subjective
  – Product of $P_{INM}(q \mid d)$ and $P_E(d) = \prod_i P_E(w_i \mid h_i)$
  – $P_E(d)$ is smaller for longer d
  – $P_{INM}(q \mid d)$ and $P_E(d)$ may have largely different variances
    • Normalize $P_E(d)$ by length m and take a weighted sum of logs
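A sketch of the reranking score, assuming the weighted sum takes the form $(1 - \beta) \log P_{INM}(q \mid d) + \beta \cdot \frac{1}{m} \log P_E(d)$, where β is presumably the parameter whose effect on MAP is examined later in the deck. The exact formula is not recoverable from this slide, and the function names are hypothetical. Dividing $\log P_E(d)$ by the length m keeps long documents from being penalized merely for having more words.

```python
import math

def opinion_score(log_p_inm: float, log_p_e_doc: float, m: int, beta: float) -> float:
    """Assumed score: (1 - beta) * log P_INM(q|d) + beta * (1/m) * log P_E(d)."""
    return (1.0 - beta) * log_p_inm + beta * log_p_e_doc / m

def rerank(docs, beta):
    """docs: (doc_id, log P_INM(q|d), [log P_E(w_i|h_i) for each word of d]) tuples."""
    scored = [(doc_id, opinion_score(lq, sum(lw), len(lw), beta)) for doc_id, lq, lw in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy inputs: d2 matches the query slightly better, d1 reads as more subjective.
docs = [
    ("d1", math.log(0.02), [math.log(0.10)] * 50),
    ("d2", math.log(0.03), [math.log(0.01)] * 200),
]
print([doc_id for doc_id, _ in rerank(docs, beta=0.0)])  # ['d2', 'd1']
print([doc_id for doc_id, _ in rerank(docs, beta=0.5)])  # ['d1', 'd2']
```

With β = 0 the ranking is the INM ranking; raising β lets the more subjective-looking d1 overtake d2.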

12

Dynamic model adaptation

• Motivation
  – Language models created from Amazon reviews may not be effective for some types of entities

• Procedure
  1. Carry out a keyword search for a given topic
  2. Use the k top-ranked blog posts to identify new trigger pairs (a → b) and compute their association scores α′(·)
  3. Update the trigger model by using the new trigger pairs
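The slide does not say how α′(·) is merged into the existing trigger model. The Python sketch below assumes, purely for illustration, a linear interpolation $\alpha_{adapted}(b \mid a) = (1 - \mu)\,\alpha(b \mid a) + \mu\,\alpha'(b \mid a)$ with a hypothetical weight μ; pairs found only in the top-k blog posts thus enter the model, and pairs found only in the review corpus are down-weighted.

```python
def adapt_alpha(alpha: dict[str, dict[str, float]],
                alpha_new: dict[str, dict[str, float]],
                mu: float) -> dict[str, dict[str, float]]:
    """Hypothetical update: alpha_adapted(b|a) = (1 - mu) * alpha(b|a) + mu * alpha'(b|a)."""
    adapted = {}
    for a in set(alpha) | set(alpha_new):
        old, new = alpha.get(a, {}), alpha_new.get(a, {})
        adapted[a] = {b: (1 - mu) * old.get(b, 0.0) + mu * new.get(b, 0.0)
                      for b in set(old) | set(new)}
    return adapted

# alpha from the review corpus; alpha_new from top-k posts for a hypothetical medication topic
alpha = {"i": {"like": 0.75, "feel": 0.25}}
alpha_new = {"i": {"sneeze": 0.5, "like": 0.5}, "it": {"works": 1.0}}
adapted = adapt_alpha(alpha, alpha_new, mu=0.5)
print(adapted["i"]["like"], adapted["i"]["sneeze"])  # 0.625 0.25
```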

13

Empirical evaluation

• Data
  – TREC 2006 Blog track test collection
    • 3 million blog posts crawled from Dec 2005 to Feb 2006
    • 50 “topics” (user information needs)
    • Relevant & opinionated posts are explicitly labeled

• Two types of assessment
  – Evaluation of the language models
  – Their effects on opinion retrieval

14

Evaluation of language models

• Perplexity
  – Uncertainty of a language model L in predicting a word sequence $d = w_1, \ldots, w_m$

• Created two hypothetical documents from the Blog track collection
  – Concatenate all the opinionated posts → $d_O$
  – Concatenate all the relevant (but non-opinionated) posts → $d_N$
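For reference, the standard definition of perplexity for $d = w_1, \ldots, w_m$ is $PP_L(d) = \exp\left(-\frac{1}{m} \sum_{i=1}^{m} \log P_L(w_i \mid h_i)\right)$, i.e. $P_L(d)^{-1/m}$; lower is better. A minimal Python version, taking the model's per-word conditional probabilities as input (the function name is ours):

```python
import math

def perplexity(word_probs: list[float]) -> float:
    """PP = exp(-(1/m) * sum_i log P_L(w_i|h_i)) for a sequence of m word probabilities."""
    m = len(word_probs)
    if m == 0:
        raise ValueError("empty sequence")
    return math.exp(-sum(math.log(p) for p in word_probs) / m)

print(round(perplexity([0.25] * 8), 6))  # 4.0
```

A model that assigns every word probability 1/4 has perplexity 4, the effective number of equally likely choices per word.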

15

Perplexity Results

• Higher-order n-grams monotonically decrease perplexity irrespective of language models and document types

• The opinionated document $d_O$ leads to lower perplexity

• The subjective language model $P_E$ produces lower perplexity than the n-gram model $P_B$

16

Relation between parameter β and MAP

[Figure: MAP as a function of parameter β, annotated “+22.0%”]

17

Improvement for individual topics

18

Analysis on individual topics

• Topics with notable improvement
  – “MacBook Pro”: laptop (+0.22)
  – “Heineken”: company and brand name (+0.20)
  – “Shimano”: company and brand name (+0.19)
  – “Board chess”: board game (+0.13)
  – “Zyrtec”: medication (product name) (+0.12)
  – “Mardi Gras”: final day of carnival (+0.11)

• Most of them are products
  – The model learned from Amazon reviews is effective for products in general, including beer and medication
  – Also effective for other types of entities

19

Analysis on individual topics (cont.)

• Topics with performance decline
  – “Jim Moran”: congressman (–0.15)
  – “World Trade Org.”: international organization (–0.05)
  – “Cindy Sheehan”: anti-war activist (–0.03)
  – “Ann Coulter”: political commentator (–0.01)
  – “West Wing”: TV drama set in the White House (–0.01)
  – “Sonic food industry”: fast-food restaurant chain (–0.01)

• Are politics and organizations difficult to improve?
  – Yet some such topics improved: Bruce Bartlett (+0.07), Jihad (+0.06), McDonalds (+0.03), Qualcomm (+0.02)

20

Results for dynamic model adaptation

• Moderately improved performance

• For “Zyrtec”, AP improved by 47.7%

21

Results for model adaptation for difficult topics

• For most topics, AP slightly but consistently improved

22

Conclusions

• Proposed subjective trigger models reflecting subjective opinions
  – Two assumptions + a modification to low-level triggers

• Combined with an IR model for opinion retrieval
  – 22.0% improvement over INM in MAP
  – Effective for most topics; slight drop for topics concerning politics and organizations

• Dynamic model adaptation
  – Positive effect overall (+25.0% over initial search)
  – Moderately effective for politics- and organization-related topics

23

Future work

• Use of a larger corpus of customer reviews

• Use of labeled data in the Blog track test collection

• Refinement of the approach to model adaptation

24

References

Mishne, G.: Multiple Ranking Strategies for Opinion Retrieval in Blogs, Proceedings of the 15th Text Retrieval Conference (2006).

Zhang, M. and Ye, X.: A generation model to unify topic relevance and lexicon-based sentiment for opinion retrieval, Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 411–418 (2008).

Lau, R., Rosenfeld, R. and Roukos, S.: Trigger-based language models: a maximum entropy approach, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 2, pp. 45–48 (1993).

Tillmann, C. and Ney, H.: Selection criteria for word trigger pairs in language modeling, Grammatical Inference: Learning Syntax from Sentences, Lecture Notes in Computer Science, pp. 95–106, Springer Berlin / Heidelberg (1996).

Zhou, G., Joshi, H. and Bayrak, C.: Topic Categorization for Relevancy and Opinion Detection, Proceedings of the 16th Text Retrieval Conference (2007).

Yang, K., Yu, N. and Zhang, H.: WIDIT in TREC 2007 Blog Track: Combining Lexicon-Based Methods to Detect Opinionated Blogs, Proceedings of the 16th Text Retrieval Conference (2007).

Zhang, W., Yu, C. and Meng, W.: Opinion retrieval from blogs, Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, pp. 831–840 (2007).

25

QUESTIONS?

26

Comparative experiments

2006

• TREC best: 0.1885
• Zhang et al.: 0.2726
• Ours w/ our baseline: 0.2398
• Ours w/ stronger baseline (baseline: 0.3022): 0.3221

27


2007

• TREC best: 0.4341
• TREC 2nd: 0.3453
• TREC 3rd: 0.3264
• Ours w/ our baseline (baseline: 0.2508): 0.3072
• 0.4054

28


2008 (same baseline)

• TREC best: 0.4067
• TREC 2nd: 0.4006
• TREC 3rd: 0.3964
• 0.3996

29

Comparative experiments (polarity task)

2008 (same baseline)

• TREC best (ours): 0.1448
• TREC 2nd: 0.1348
• TREC 3rd: 0.1129
