
Page 1: When relevance is not enough

When Relevance is not Enough: Promoting Diversity and Freshness in Personalized Question Recommendation

IDAN SZPEKTOR, YOELLE MAAREK, DAN PELLEG

YAHOO! RESEARCH

Page 2: When relevance is not enough

ABSTRACT

A good question recommendation system should:

1. Be designed around answerers, rather than exclusively for askers

2. Scale to many questions and users, and be fast enough to serve them

3. Recommend questions relevant to each user's interests

4. Promote diversity among the recommended questions

Page 3: When relevance is not enough

INTRODUCTION

Common approach: route questions only to the best possible answerers (“experts”)

This work: reach all potential answerers

Page 4: When relevance is not enough

INTRODUCTION

Relevance: to what degree the question matches the user's tastes

Relevance alone is not enough: diversity and freshness needs must also be met

Three requirements:

1. Questions need to be recommended for all types of users

2. The recommended questions have to be diverse

3. Recommendations need to be fresh and served fast:

a) serve questions as recommendations immediately after they are posted

b) instantly adapt to changes in users' tastes

Page 6: When relevance is not enough

RELATED WORK

Limitations of prior work:

1. No support for real-time ranking

2. The needs of new users with very little historical data are not addressed well

3. Focus on relevance only

Page 7: When relevance is not enough

FRAMEWORK

Question profile:

1. LDA model

2. Lexical model

3. Category model

User profile: aggregated from the questions the user answered

Question recommendation:

1. Matching question and user profiles

2. Proactive diversification

3. Recommendation merging

Page 8: When relevance is not enough

QUESTION PROFILE

Split the question corpus according to the 26 top categories in Yahoo! Answers

Two advantages:

1. Top categories represent disjoint user interests

2. Per-category modeling helps word sense disambiguation

Profile inputs:

1. Question textual content (title and body)

2. Question category

Page 9: When relevance is not enough

QUESTION PROFILE

Build a profile represented by three vectors:

1. a Latent Dirichlet Allocation (LDA) topic vector

2. a lexical vector

3. a category vector

Page 10: When relevance is not enough

LDA Model

1. Initial training: a random sample of up to 2 million resolved questions

2. Incremental learning: a random sample of up to half a million questions per top category

3. Inference: keep only topics that receive at least 10% of the probability mass (sketched below)
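A minimal sketch of the inference step in Python, assuming a gensim LDA model; the toy corpus below is illustrative only, not the paper's setup (which trains on millions of resolved questions per top category):

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Toy corpus just to make the sketch runnable.
docs = [["guitar", "chord", "amp"], ["puppy", "vet", "leash"],
        ["guitar", "pedal", "amp"], ["kitten", "vet", "food"]]
vocab = Dictionary(docs)
lda = LdaModel([vocab.doc2bow(d) for d in docs], num_topics=2, id2word=vocab)

def topic_vector(tokens, mass_cutoff=0.10):
    """Infer the question's topic distribution, keeping only topics that
    receive at least 10% of the probability mass (the paper's cutoff)."""
    bow = vocab.doc2bow(tokens)
    return dict(lda.get_document_topics(bow, minimum_probability=mass_cutoff))

print(topic_vector(["guitar", "amp"]))  # e.g. {0: 0.8, 1: 0.2}; topic ids vary
```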

Page 11: When relevance is not enough

Lexical Model

A unigram bag-of-words representation of the question

tf·idf scores, L1-normalized into a probability distribution

Category Model

Assigns probability 1 to the category in which the question was posted (both models are sketched below)
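A sketch of these two profile vectors, assuming a precomputed document-frequency table over the question corpus (the function and parameter names here are mine):

```python
import math
from collections import Counter

def lexical_vector(tokens, doc_freq, num_docs):
    """Unigram bag-of-words -> tf.idf weights -> L1-normalized so the
    vector is a probability distribution over words."""
    tf = Counter(tokens)
    weights = {w: c * math.log(num_docs / (1 + doc_freq.get(w, 0)))
               for w, c in tf.items()}
    total = sum(weights.values()) or 1.0
    return {w: s / total for w, s in weights.items()}

def category_vector(posted_category):
    """Degenerate distribution: probability 1 on the posting category."""
    return {posted_category: 1.0}

# Tiny usage example with made-up corpus statistics:
df = {"guitar": 120, "the": 9000, "pickup": 30}
print(lexical_vector(["guitar", "pickup", "the"], df, num_docs=10000))
```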

Page 12: When relevance is not enough

USER PROFILE

Based on the questions the user answered in the past

The user representation is generated by aggregating signals over these questions

User profile: a probability tree (one plausible encoding is sketched below)
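One plausible encoding of this probability tree as nested Python dicts; the three levels follow the paper's description, but the exact field layout is an assumption:

```python
# Level 1: interest per top category; level 2: per-category weight of each
# relevance model; level 3: the feature distribution of that model.
user_profile = {
    "Music": {
        "p": 0.7,  # level 1: P(top category)
        "models": {
            "lda":      {"p": 0.5, "features": {3: 0.6, 7: 0.4}},  # topic ids
            "lexical":  {"p": 0.3, "features": {"guitar": 0.8, "amp": 0.2}},
            "category": {"p": 0.2, "features": {"Music": 1.0}},
        },
    },
    "Pets": {
        "p": 0.3,
        "models": {
            "lda":      {"p": 0.4, "features": {1: 1.0}},
            "lexical":  {"p": 0.4, "features": {"vet": 1.0}},
            "category": {"p": 0.2, "features": {"Pets": 1.0}},
        },
    },
}
```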

Page 13: When relevance is not enough

Two steps:

1. Aggregate the profiles of the questions the user answered

2. Update the profile as new answers arrive

Page 14: When relevance is not enough

The first and third tree levels: apply a decaying factor to past questions

The second level:

1. Measure the similarity between the feature distribution of each model in the question and the corresponding feature distribution in the user profile

2. Normalize the similarities into a probability distribution (the update is sketched below)
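A sketch of the update, assuming exponential decay and a dot-product similarity; the decay constant and the similarity measure are my assumptions, since the slides do not specify them:

```python
def decayed_mix(old, new, decay=0.9):
    """Levels 1 and 3: decay the old distribution, mix in the new question's
    contribution, and renormalize back to a probability distribution."""
    merged = {k: decay * v for k, v in old.items()}
    for k, v in new.items():
        merged[k] = merged.get(k, 0.0) + (1.0 - decay) * v
    total = sum(merged.values()) or 1.0
    return {k: v / total for k, v in merged.items()}

def sparse_dot(p, q):
    return sum(v * q.get(k, 0.0) for k, v in p.items())

def model_weights(question_models, user_models):
    """Level 2: weight each model by the similarity between the question's
    and the user's feature distributions, normalized to sum to 1."""
    sims = {m: sparse_dot(question_models[m]["features"],
                          user_models[m]["features"]) for m in question_models}
    total = sum(sims.values()) or 1.0
    return {m: s / total for m, s in sims.items()}
```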

Page 15: When relevance is not enough

QUESTION RECOMMENDATION

Matching Question and User Profiles

A list of open questions ranked by a relevance score, calculated for each (question profile, user profile) pair

For question profiles:

1. Turn the three vectors forming the question profile into a single vector, multiplying the probability of each feature by 1/3 before storing it in the index

2. Index every question vector, building an inverted index (see the sketch below)
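A sketch of step 1, assuming features are namespaced by model so the three vectors can share one sparse vector (the namespacing convention is mine, not the paper's):

```python
def question_index_vector(lda_vec, lex_vec, cat_vec):
    """Merge the three profile vectors into one; each feature probability
    is multiplied by 1/3, so the combined vector still sums to 1."""
    combined = {}
    for prefix, vec in (("lda", lda_vec), ("lexical", lex_vec),
                        ("category", cat_vec)):
        for feat, p in vec.items():
            combined[f"{prefix}:{feat}"] = p / 3.0
    return combined

# Every question vector is then posted into an inverted index:
# feature -> [(question_id, weight), ...]
```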

Page 16: When relevance is not enough

QUESTION RECOMMENDATION

For user profile:

Associate with each user feature a score: the product of the probability scores on the tree path leading to that feature

Ranking:

Similarity: a simple dot product between the user vector and each indexed question vector (sketched below)
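A sketch of the ranking side, reusing the tree encoding and the index namespacing sketched above: each user feature is weighted by the product of the probabilities along its tree path, and relevance is a plain dot product.

```python
def user_query_vector(profile):
    """Score each user feature by P(category) * P(model | category)
    * P(feature | model): the product along its tree path."""
    query = {}
    for cat_node in profile.values():
        for model, node in cat_node["models"].items():
            for feat, p in node["features"].items():
                key = f"{model}:{feat}"  # matches the index namespacing above
                query[key] = query.get(key, 0.0) + cat_node["p"] * node["p"] * p
    return query

def relevance(question_vec, user_vec):
    """Relevance score: a simple dot product of the two sparse vectors."""
    return sum(p * user_vec.get(f, 0.0) for f, p in question_vec.items())
```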

Page 17: When relevance is not enough

QUESTION RECOMMENDATION

Proactive Diversification

Thematic sampling:

1. For each user vector u, generate N query vectors u1, u2, …, uN, each restricted by a thematic constraint

2. Retrieve N ranked lists, one per query vector

3. Blend them together into a final, diverse list

Two types of thematic constraints (constraint sampling is sketched below):

Specific top category: randomly select top categories as constraints by sampling without repetition, based on their distribution in the root node of the user's probability tree

Specific LDA topic: randomly sample LDA topics without repetition from the user profile by traversing the probability tree
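A sketch of the constraint sampling for the top-category case (the LDA-topic case is analogous); numpy's weighted sampling without replacement stands in for whatever the authors actually used:

```python
import numpy as np

def sample_constraints(distribution, n, seed=None):
    """Sample up to n themes without repetition, proportionally to their
    probability mass (e.g. the root-node category distribution)."""
    rng = np.random.default_rng(seed)
    keys = list(distribution)
    p = np.array([distribution[k] for k in keys], dtype=float)
    p /= p.sum()
    picks = rng.choice(len(keys), size=min(n, len(keys)), replace=False, p=p)
    return [keys[i] for i in picks]

# e.g. sample_constraints({"Music": 0.7, "Pets": 0.3}, 2)
# Each sampled theme constrains one of the N query vectors u1..uN.
```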

Page 18: When relevance is not enough

QUESTION RECOMMENDATION

Recommendation Merging

Blending algorithm (sketched below):

1. Each list is associated with a probability score

2. Sample an intermediate list, based on the assigned probabilities

3. Remove one recommendation from the sampled list and append it to the end of the final list

4. Repeat until the final list is full
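A sketch of the blend, assuming each ranked list keeps a fixed, nonzero sampling probability and duplicates across lists are skipped (the dedup step is my assumption):

```python
import random

def blend(ranked_lists, list_probs, k, seed=0):
    """Repeatedly sample a list by its probability, pop its top item, and
    append it to the final list; stop at k items or when all lists drain."""
    rng = random.Random(seed)
    lists = [list(l) for l in ranked_lists]
    final, seen = [], set()
    while len(final) < k and any(lists):
        i = rng.choices(range(len(lists)), weights=list_probs)[0]
        if not lists[i]:
            continue  # sampled an exhausted list; resample
        item = lists[i].pop(0)
        if item not in seen:
            seen.add(item)
            final.append(item)
    return final

print(blend([["q1", "q2"], ["q3", "q2"]], [0.7, 0.3], k=3))
```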

Page 19: When relevance is not enough

QUESTION RECOMMENDATION

Non-Thematic LDA Topics

Page 20: When relevance is not enough

QUESTION RECOMMENDATION

Non-Thematic LDA Topics

116 topics from 23 top categories were examined; 34% are non-thematic topics

A logistic regression classifier detects non-thematic topics automatically (sketched below)
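The slides do not list the classifier's features; the sketch below uses scikit-learn with two made-up features per topic (stopword ratio among the topic's top words, and how evenly the topic spreads across categories) purely to illustrate the setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One row per LDA topic; features and labels are fabricated for illustration.
# Feature 0: stopword ratio among the topic's top words.
# Feature 1: how evenly the topic spreads across top categories.
X = np.array([[0.82, 0.91], [0.12, 0.20], [0.75, 0.85],
              [0.20, 0.15], [0.70, 0.88], [0.15, 0.25]])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = non-thematic (manually annotated)

clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.78, 0.80]]))  # likely [1]: drop the topic from profiles
```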

Page 21: When relevance is not enough

EXPERIMENTS

Offline Experiment

Conducted over 8 different top categories

Active users: answered at least 21 questions as of January 2011

New users: answered at least two questions as of January 2011

Page 22: When relevance is not enough

EXPERIMENTS

Online Experiment

A/B test

Control bucket, CTL (n = 25,093)

Relevance bucket, R (n = 5,359)

Freshness bucket, F (n = 46,228): 50% recent questions, 20% thematic sampling

Diversity bucket, D (n = 42,041): 20% recent questions, 50% thematic sampling
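The bucket mixes, written out as configuration for concreteness; the slides only give the recent and thematic-sampling percentages, so the remaining share is assumed to be plain relevance-ranked recommendations:

```python
# Share of recommendation slots per source in each A/B bucket.
BUCKETS = {
    "CTL": {"n": 25093},                                    # control
    "R":   {"n": 5359},                                     # relevance only
    "F":   {"n": 46228, "recent": 0.50, "thematic": 0.20},  # freshness
    "D":   {"n": 42041, "recent": 0.20, "thematic": 0.50},  # diversity
}
```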

Page 26: When relevance is not enough

CONCLUSIONS

Recommendation should be driven not only by relevance, but also by freshness and diversity

Several relevance models, combined via a “question retrieval engine”

Diversity: thematic sampling

On content: different factors, models, and levels are combined

On writing: clear hierarchy, with each part building on the previous