
Personalizing Web Search

Jaime Teevan, MIT, with Susan T. Dumais and Eric Horvitz, MSR

Demo

Personalizing Web Search

Motivation, Algorithms, Results, Future Work

Study of Personal Relevancy

15 SIS users x ~10 queries each
Evaluate top 50 results
  Highly relevant / Relevant / Irrelevant
Query selection
  Previously issued query
  Chosen from 10 pre-selected queries
Collected evaluations for 137 queries
  53 for pre-selected queries (2-9 per query)

Relevant Results Have Low Rank

[Chart: number of Highly Relevant, Relevant, and Irrelevant results by rank (1-50)]

Same Query, Different Intent

Different meanings
  "Information about the astronomical/astrological sign of cancer"
  "Information about cancer treatments"
Different intents
  "is there any new tests for cancer?"
  "information about cancer treatments"

Same Intent, Different Evaluation

Query: Microsoft
  "information about microsoft, the company"
  "Things related to the Microsoft corporation"
  "Information on Microsoft Corp"

31/50 results rated not irrelevant by at least one evaluator
  For only 6 of the 31 does more than one evaluator agree
  All three agree only for www.microsoft.com

More to Understand

Do people cluster?
  Even if they can't state their intention
How are the differences reflected?
  Can they be seen from the information on a person's computer?
Can we do better than the ranking that would make everyone the most happy?
  Best common ranking: +38%
  Best personalized ranking: +55%

Personalizing Web Search

Motivation, Algorithms, Results, Future Work

Personalization Algorithms

Standard IR
  Related to relevance feedback
  Query expansion
vs. result re-ranking

[Diagram: document, query, and user model exchanged between server and client]

Result Re-Ranking

Takes full advantage of SIS
Ensures privacy
Good evaluation framework
Look at lightweight user models
  Collected on server side
  Sent as query expansion
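A minimal sketch of the re-ranking step, assuming each result carries a term-frequency map over its title and snippet, and that per-term weights wi have already been computed from the user model (function and field names here are illustrative, not from the slides):

```python
def rerank(results, weights):
    """Re-sort web results on the client by the personalized score
    Score = sum_i tf_i * w_i."""
    def personalized_score(result):
        # result["tf"]: term -> frequency in the result's title/snippet
        return sum(tf * weights.get(term, 0.0)
                   for term, tf in result["tf"].items())
    return sorted(results, key=personalized_score, reverse=True)
```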

BM25

wi = log (N / ni)

Score = Σ tfi * wi

BM25 with Relevance Feedback

wi = log [ ((ri + 0.5)(N - ni - R + ri + 0.5)) / ((ni - ri + 0.5)(R - ri + 0.5)) ]

Score = Σ tfi * wi

  N  = number of documents in the corpus
  ni = number of documents containing term i
  R  = number of relevant documents
  ri = number of relevant documents containing term i
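As a sketch, the relevance-feedback weight on this slide translates directly into code (the function name is mine):

```python
import math

def bm25_rf_weight(N, n_i, R, r_i):
    """Relevance-feedback term weight from the slide:
    N   documents in the corpus
    n_i documents containing term i
    R   relevant documents
    r_i relevant documents containing term i
    The 0.5 terms smooth away zero counts."""
    return math.log(
        ((r_i + 0.5) * (N - n_i - R + r_i + 0.5)) /
        ((n_i - r_i + 0.5) * (R - r_i + 0.5))
    )
```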

User Model as Relevance Feedback

wi = log [ ((ri + 0.5)(N' - ni' - R + ri + 0.5)) / ((ni' - ri + 0.5)(R - ri + 0.5)) ]

  N'  = N + R
  ni' = ni + ri

Score = Σ tfi * wi
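Treating the user's index as the relevant set only changes the corpus counts; a sketch reusing the function above:

```python
def user_model_weight(N, n_i, R, r_i):
    """User model as relevance feedback: the user's R documents are
    added to the world (N' = N + R, n_i' = n_i + r_i), then the
    standard relevance-feedback weight is applied."""
    return bm25_rf_weight(N + R, n_i + r_i, R, r_i)
```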

User Model as Relevance Feedback

[Venn diagrams: the world (N documents, ni containing term i) and the user (R documents, ri containing term i), each intersected with the set of documents related to the query]

Query focused matching: counts taken over the query-related subsets of the world and the user
World focused matching: counts taken over the full world representation

Score = Σ tfi * wi
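One way to read the matching variants, sketched under the assumption that documents are term sets and `related` is some query-relatedness predicate (both assumptions of mine, not from the slides):

```python
def term_counts(docs, term, related=None):
    """Return (number of documents, number containing the term),
    optionally restricted to query-related documents."""
    if related is not None:
        docs = [d for d in docs if related(d)]
    return len(docs), sum(1 for d in docs if term in d)

# Query focused matching: world and user counts both come from the
# query-related subsets, e.g.
#   N, n_i = term_counts(world, term, related)
#   R, r_i = term_counts(user_docs, term, related)
# World focused matching: the world counts use the full representation:
#   N, n_i = term_counts(world, term)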

Parameters

Matching: Query focused / World focused
User representation
World representation
Query expansion

User Representation

Stuff I've Seen (SIS) index
Recently indexed documents
Web documents in SIS index
Query history
Relevance judgments
None


World Representation

Document representation
  Full text
  Title and snippet
Corpus representation
  Web
  Result set – full text
  Result set – title and snippet


Query Expansion

All words in document
Query focused

Example snippet: "The American Cancer Society is dedicated to eliminating cancer as a major health problem by preventing cancer, saving lives, and diminishing suffering through ..."
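A sketch of query-focused expansion, under the assumption that "query focused" here means keeping only words that occur near a query term in the text (the window size is illustrative):

```python
def query_focused_terms(text, query_terms, window=5):
    """Expansion terms drawn only from words within `window` words
    of a query term, instead of all words in the document."""
    words = text.lower().split()
    keep = set()
    for i, w in enumerate(words):
        if w in query_terms:
            keep.update(words[max(0, i - window):i + window + 1])
    return keep

snippet = ("The American Cancer Society is dedicated to eliminating "
           "cancer as a major health problem by preventing cancer, "
           "saving lives, and diminishing suffering")
print(query_focused_terms(snippet, {"cancer"}))
```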

Parameters

Matching:             Query focused / World focused
User representation:  All SIS / Recent SIS / Web SIS / Query history / Relevance feedback / None
World representation: Full text or Title and snippet; from the Web, the result set (full text), or the result set (title and snippet)
Query expansion:      All words / Query focused
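The parameter space is the cross product of these options; a sketch of how the grid might be enumerated for evaluation (the option strings come from the table above, the evaluation step is hypothetical):

```python
from itertools import product

matching   = ["query focused", "world focused"]
user_rep   = ["all SIS", "recent SIS", "web SIS",
              "query history", "relevance feedback", "none"]
doc_rep    = ["full text", "title and snippet"]
corpus_rep = ["web", "result set (full text)",
              "result set (title and snippet)"]
expansion  = ["all words", "query focused"]

for setting in product(matching, user_rep, doc_rep, corpus_rep, expansion):
    print(setting)  # evaluate re-ranking under each parameter setting
```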


Personalizing Web Search

Motivation, Algorithms, Results, Future Work

Baselines

Best possible
Random
Text-based ranking
Web ranking
URL boost: results whose URL appears in the user's index (e.g., http://mail.yahoo.com/inbox/msg10) get +1
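A sketch of the URL boost baseline, assuming results carry a text-based score and the user's previously seen URLs are known (field names are illustrative):

```python
def url_boost(results, seen_urls, boost=1.0):
    """Add a constant boost to any result whose URL the user has
    already seen (i.e., it appears in the SIS index)."""
    for r in results:
        if r["url"] in seen_urls:
            r["score"] += boost
    return sorted(results, key=lambda r: r["score"], reverse=True)
```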

Best Parameter Settings

Richer user representation better
  SIS > Recent > Web > Query history > None
  Suggests rich client important
Efficiency hacks don't hurt
  Snippets, query focused
  Length normalization not an issue
Query focus good

Text Alone Not Enough

Better than some baselines
  Better than random
  Better than no user representation
  Better than relevance feedback
Worse than Web results
Blend in other features
  Web ranking
  URL boost

Good, but Lots of Room to Grow

Best combination: 9.1% improvement
Best possible: 51.5% improvement
Assumes best Web combination selected
Only improves results 2/3 of the time

Personalizing Web Search

Motivation, Algorithms, Results, Future Work

Finding the Best Parameter Setting

Almost always some parameter setting that improves results
Use learning to select parameters
  Based on individual
  Based on query
  Based on results
Give user control?

Further Exploration of Algorithms

Larger parameter space to explore
  More complex user model subsets
  Different parsing (e.g., phrases)
  Tune BM25 parameters
What is really helping?
  Generic user model or personal model
  Use different indices for the queries
Deploy system

Practical Issues

Efficiency issues
  Can interfaces mitigate some of the issues?
Merging server and client
  Query expansion
Get more relevant results in the set to be re-ranked
Design snippets for personalization

Thank you!
