Dragon Star Program Course (龙星计划课程): Information Retrieval (信息检索)
Personalized Search & User Modeling
ChengXiang Zhai (翟成祥), Department of Computer Science, Graduate School of Library & Information Science, Institute for Genomic Biology, and Statistics, University of Illinois at Urbana-Champaign
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 1
Dragon Star Program Course (龙星计划课程): Information Retrieval
Personalized Search & User Modeling
ChengXiang Zhai (翟成祥)
Department of Computer Science
Graduate School of Library & Information Science
Institute for Genomic Biology, and Statistics
University of Illinois at Urbana-Champaign
http://www-faculty.cs.uiuc.edu/~czhai, [email protected]
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 2
What is Personalized Search?
• Use more user information than the user's query in retrieval
  – "more information" = the user's interaction history → implicit feedback
  – "more information" = the user's judgments or answers to clarification questions → explicit feedback
• Personalization can be done in multiple ways:
  – Personalize the collection
  – Personalize ranking
  – Personalize result presentation
  – …
• Personalized search = user modeling + model exploitation
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 3
Why Personalized Search?
• The more we know about the user's information need, the more likely we are to retrieve relevant documents, so we should learn as much as we can about the user
• Personalized search can be especially helpful when the query alone doesn't work well
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 4
Client-Side vs. Server-Side Personalization
• Server-side (most work, including commercial products):
  – Sees global information (all documents, all users)
  – Limited user information (can't see activities outside the search results)
  – Privacy issue
• Client-side (UCAIR):
  – More information about the user, thus more accurate user modeling (complete interaction history + other user activities)
  – More scalable ("distributed personalization")
  – Alleviates the privacy problem
• Combination of server-side and client-side? How?
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 5
Outline
• A framework for optimal interactive retrieval
• Implicit feedback (no user effort)
  – Within a search session
  – For improving result organization
• Explicit feedback (with user effort)
  – Term feedback
  – Active feedback
• Improving search result organization
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 6
1. A Framework for Optimal Interactive Retrieval [Shen et al. 05]
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 7
IR as Sequential Decision Making
(User: information need; System: model of the information need)

• User action A1: enter a query
  → System decides: which documents to present? how to present them?
  → System response Ri: results (i = 1, 2, 3, …)
• User decides which documents to view
• User action A2: view a document
  → System decides: which part of the document to show? how?
  → System response R': document content
• User decides whether to view more
• User action A3: click on the "Back" button
  → …
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 8
Retrieval Decisions

User U actions:      A1   A2   …   At-1   At
System responses:    R1   R2   …   Rt-1   Rt = ?
History: H = {(Ai, Ri)}, i = 1, …, t-1
Document collection: C

Given U, C, At, and H, choose the best Rt from all possible responses r(At) to At.

Examples:
• At = query "Jaguar": r(At) = all possible rankings of C; the best Rt = the best ranking for the query
• At = click on the "Next" button: r(At) = all possible rankings of the unseen docs; the best Rt = the best ranking of the unseen docs
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 9
A Risk Minimization Framework

Observed: user U, interaction history H, current user action At, document collection C
Inferred: user model M = (θU, S, …) (information need θU, seen documents S, …)
All possible responses: r(At) = {r1, …, rn}
Loss function: L(ri, At, M)
Optimal response r*: the response with minimum expected loss (Bayes risk):

  $R_t = \arg\min_{r \in r(A_t)} \int_M L(r, A_t, M)\, P(M \mid U, H, A_t, C)\, dM$
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 10
A Simplified Two-Step Decision-Making Procedure
• Approximate the Bayes risk by the loss at the mode of the posterior distribution
• Two-step procedure (a small code sketch follows below):
  – Step 1: Compute an updated user model M* based on the currently available information
  – Step 2: Given M*, choose a response to minimize the loss function

  $R_t = \arg\min_{r \in r(A_t)} \int_M L(r, A_t, M)\, P(M \mid U, H, A_t, C)\, dM$
  $\;\;\approx \arg\min_{r \in r(A_t)} L(r, A_t, M^*)$
  where $M^* = \arg\max_M P(M \mid U, H, A_t, C)$
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 11
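The two-step procedure can be made concrete with a small sketch. This is an illustrative toy implementation, not the UCAIR code: the function name, the toy models, and the posterior and loss callables are hypothetical stand-ins for whatever model space and loss function a real system would use.

```python
def two_step_decision(responses, candidate_models, posterior, loss):
    """responses: candidate system responses r in r(A_t)
    candidate_models: candidate user models M
    posterior(M): P(M | U, H, A_t, C) (unnormalized is fine)
    loss(r, M): L(r, A_t, M)"""
    # Step 1: M* = argmax_M P(M | U, H, A_t, C)
    m_star = max(candidate_models, key=posterior)
    # Step 2: R_t = argmin_{r in r(A_t)} L(r, A_t, M*)
    return min(responses, key=lambda r: loss(r, m_star)), m_star

if __name__ == "__main__":
    # Toy example: models are word distributions, responses are rankings.
    models = [{"jaguar": 0.5, "car": 0.5}, {"jaguar": 0.5, "cat": 0.5}]
    posterior = lambda m: 0.8 if "car" in m else 0.2   # pretend the history favors "car"
    docs = ["doc_about_cars", "doc_about_cats"]
    rankings = [tuple(docs), tuple(reversed(docs))]
    loss = lambda r, m: 0.0 if ("car" in m) == ("cars" in r[0]) else 1.0
    best_ranking, m_star = two_step_decision(rankings, models, posterior, loss)
    print(best_ranking, m_star)
```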
Optimal Interactive Retrieval

User issues action A1 (given information need U and collection C)
  → System infers M*1 = argmax P(M1 | U, H, A1, C)
  → System returns the response R1 minimizing L(r, A1, M*1)
User issues action A2
  → System infers M*2 = argmax P(M2 | U, H, A2, C)
  → System returns the response R2 minimizing L(r, A2, M*2)
User issues action A3, and so on: the user and the IR system alternate over the collection.
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 12
Refinement of Risk Minimization
• r(At): decision space (depends on At)
  – r(At) = all possible subsets of C (document selection)
  – r(At) = all possible rankings of docs in C
  – r(At) = all possible rankings of unseen docs
  – r(At) = all possible subsets of C + summarization strategies
• M: user model
  – Essential component: θU = user information need
  – S = seen documents
  – n = "topic is new to the user"
• L(Rt, At, M): loss function
  – Generally measures the utility of Rt for a user modeled as M
  – Often encodes retrieval criteria (e.g., using θU to select a ranking of docs)
• P(M | U, H, At, C): user model inference
  – Often involves estimating a unigram language model θU
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 13
Case 1: Context-Insensitive IR
– At = "enter a query Q"
– r(At) = all possible rankings of docs in C
– M = θU, a unigram language model (word distribution)
– p(M | U, H, At, C) = p(θU | Q)

  $L(r, A_t, M) = L((d_1, \ldots, d_N), \theta_U) = \sum_{i=1}^{N} p(\text{viewed} \mid d_i)\, D(\theta_U \| \theta_{d_i})$

Since $p(\text{viewed} \mid d_1) \ge p(\text{viewed} \mid d_2) \ge \ldots$, the optimal ranking Rt is obtained by ranking documents by $D(\theta_U \| \theta_{d_i})$ (a small code sketch follows below).
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 14
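Since Case 1 reduces to ranking documents by D(θU || θd), a minimal sketch looks like the following. It assumes Dirichlet prior smoothing for the document models (a common choice, not mandated by the slides); the corpus, the query model, and the μ value are toy examples.

```python
import math
from collections import Counter

def dirichlet_doc_model(doc_tokens, collection_model, mu=2000):
    """p(w | theta_d) with Dirichlet prior smoothing from the collection model."""
    counts, dlen = Counter(doc_tokens), len(doc_tokens)
    return lambda w: (counts[w] + mu * collection_model(w)) / (dlen + mu)

def kl_score(query_model, doc_model):
    """Negative KL divergence -D(theta_U || theta_d); higher is better.
    The query-entropy term is constant across documents and omitted."""
    return sum(p * math.log(doc_model(w)) for w, p in query_model.items())

if __name__ == "__main__":
    docs = {"d1": "jaguar car racing speed".split(),
            "d2": "jaguar animal jungle cat".split()}
    all_tokens = [w for toks in docs.values() for w in toks]
    coll = Counter(all_tokens)
    coll_model = lambda w: coll[w] / len(all_tokens)
    query_model = {"jaguar": 0.5, "car": 0.5}            # toy theta_U
    ranked = sorted(docs, key=lambda d: kl_score(
        query_model, dirichlet_doc_model(docs[d], coll_model)), reverse=True)
    print(ranked)   # d1 (the car sense) should score at least as high as d2
```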
Case 2: Implicit Feedback
– At = "enter a query Q"
– r(At) = all possible rankings of docs in C
– M = θU, a unigram language model (word distribution)
– H = {previous queries} + {viewed snippets}
– p(M | U, H, At, C) = p(θU | Q, H)

  $L(r, A_t, M) = L((d_1, \ldots, d_N), \theta_U) = \sum_{i=1}^{N} p(\text{viewed} \mid d_i)\, D(\theta_U \| \theta_{d_i})$

As in Case 1, the optimal ranking Rt is obtained by ranking documents by $D(\theta_U \| \theta_{d_i})$; the difference is that θU is now estimated from both Q and the history H.
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 15
Case 3: General Implicit Feedback
– At = "enter a query Q", or click the "Back"/"Next" button
– r(At) = all possible rankings of the unseen docs in C
– M = (θU, S), where S = seen documents
– H = {previous queries} + {viewed snippets}
– p(M | U, H, At, C) = p(θU | Q, H)

  $L(r, A_t, M) = L((d_1, \ldots, d_N), \theta_U) = \sum_{i=1}^{N} p(\text{viewed} \mid d_i)\, D(\theta_U \| \theta_{d_i})$

The optimal ranking Rt is obtained by ranking the unseen documents by $D(\theta_U \| \theta_{d_i})$.
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 16
Case 4: User-Specific Result Summary
– At = "enter a query Q"
– r(At) = {(D, η)}, where D ⊆ C, |D| = k, and η ∈ {"snippet", "overview"}
– M = (θU, n), where n ∈ {0, 1} indicates "topic is new to the user"
– p(M | U, H, At, C) = p(θU, n | Q, H), with posterior mode M* = (θ*, n*)

  $L(r, A_t, M) = L(D, \eta, \theta^*, n^*) = L(D, \theta^*) + L(\eta, n^*) = \sum_{d_i \in D} D(\theta^* \| \theta_{d_i}) + L(\eta, n^*)$

  L(η, n*):               n* = 1    n* = 0
  η = snippet               1         0
  η = overview              0         1

Decision: choose the k most relevant docs; if the topic is new to the user (n* = 1), give an overview summary, otherwise a regular snippet summary.
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 17
What You Should Know
• Advantages and disadvantages of client-side vs. server-side personalization
• The optimal interactive retrieval framework provides a general way to model personalized search
  – Maximum user modeling
  – Immediate benefit ("eager feedback")
• Personalization can potentially be done for all the components and steps in a retrieval system
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 18
2. Implicit Feedback [Shen et al. 05, Tan et al. 06]
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 19
"Jaguar" Example

[Figure: top results for the query "Jaguar": four pages about the car, one about the software, one about the animal]

Suppose we know:
1. The previous query was "racing cars" vs. "Apple OS"
2. "car" occurs far more frequently than "Apple" in pages browsed by the user in the last 20 days
3. The user just viewed an "Apple OS" document
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 20
How can we exploit such implicit feedback information that already naturally exists to improve ranking
accuracy?
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 21
Risk Minimization for Implicit Feedback
– At = "enter a query Q"
– r(At) = all possible rankings of docs in C
– M = θU, a unigram language model (word distribution)
– H = {previous queries} + {viewed snippets}
– p(M | U, H, At, C) = p(θU | Q, H)

  $L(r, A_t, M) = L((d_1, \ldots, d_N), \theta_U) = \sum_{i=1}^{N} p(\text{viewed} \mid d_i)\, D(\theta_U \| \theta_{d_i})$

Since $p(\text{viewed} \mid d_1) \ge p(\text{viewed} \mid d_2) \ge \ldots$, the optimal ranking Rt is obtained by ranking documents by $D(\theta_U \| \theta_{d_i})$.

We therefore need to estimate a context-sensitive language model θU.
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 22
Scenario 1: Use Information in One Session [Shen et al. 05]

User queries:        Q1 (e.g., "Apple software"), Q2, …, Qk (e.g., "Jaguar")
User clickthrough:   C1 = {C1,1, C1,2, C1,3, …}, C2 = {C2,1, C2,2, C2,3, …}, …
                     (e.g., "Apple - Mac OS X: The Apple Mac OS X product page. Describes features in the current version of Mac OS X, …")

User model for the current query: combine the query history and the clickthrough history with Qk:

  $p(w \mid \theta_k) = p(w \mid Q_k, Q_1, \ldots, Q_{k-1}, C_1, \ldots, C_{k-1}) = \;?$
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 23
Method 1: Fixed Coefficient Interpolation (FixInt)

Average the user's query history and clickthrough history:

  $p(w \mid H_Q) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(w \mid Q_i)$
  $p(w \mid H_C) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(w \mid C_i)$

Linearly interpolate the two history models:

  $p(w \mid H) = \beta\, p(w \mid H_C) + (1 - \beta)\, p(w \mid H_Q)$

Linearly interpolate the current query and the history model (see the sketch below):

  $p(w \mid \theta_k) = \alpha\, p(w \mid Q_k) + (1 - \alpha)\, p(w \mid H)$
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 24
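A minimal sketch of FixInt, assuming simple maximum-likelihood models for queries and clicked snippets. The helper names lm and mix are made up for this example; the α = 0.1, β = 1.0 defaults follow the settings reported in the results later in this section.

```python
from collections import Counter

def lm(text):
    """Maximum-likelihood unigram model of a piece of text."""
    c = Counter(text.split())
    n = sum(c.values())
    return {w: cnt / n for w, cnt in c.items()}

def mix(models, weights):
    """Weighted mixture of unigram models (dicts of word -> probability)."""
    out = {}
    for m, a in zip(models, weights):
        for w, p in m.items():
            out[w] = out.get(w, 0.0) + a * p
    return out

def fixint(current_query, past_queries, past_clicks, alpha=0.1, beta=1.0):
    """FixInt: theta_k = alpha*p(w|Q_k) + (1-alpha)*[beta*p(w|H_C) + (1-beta)*p(w|H_Q)]."""
    h_q = mix([lm(q) for q in past_queries], [1 / len(past_queries)] * len(past_queries))
    h_c = mix([lm(c) for c in past_clicks], [1 / len(past_clicks)] * len(past_clicks))
    history = mix([h_c, h_q], [beta, 1 - beta])
    return mix([lm(current_query), history], [alpha, 1 - alpha])

if __name__ == "__main__":
    theta = fixint("jaguar", ["apple software"],
                   ["apple mac os x product page"], alpha=0.1, beta=1.0)
    print(sorted(theta.items(), key=lambda x: -x[1])[:5])
```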
Method 2: Bayesian Interpolation (BayesInt)

Average the user's query history and clickthrough history as before:

  $p(w \mid H_Q) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(w \mid Q_i), \quad p(w \mid H_C) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(w \mid C_i)$

Use the history models as a Dirichlet prior on the current query Qk (see the sketch below):

  $p(w \mid \theta_k) = \frac{c(w, Q_k) + \mu\, p(w \mid H_Q) + \nu\, p(w \mid H_C)}{|Q_k| + \mu + \nu}$

Intuition: trust the current query Qk more if it is longer.
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 25
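A corresponding sketch of the BayesInt update, assuming the history models p(w|H_Q) and p(w|H_C) have already been averaged as above; the example distributions are toy values.

```python
from collections import Counter

def bayes_int(current_query, hist_query_model, hist_click_model, mu=0.2, nu=5.0):
    """BayesInt: p(w|theta_k) = [c(w,Q_k) + mu*p(w|H_Q) + nu*p(w|H_C)] / (|Q_k| + mu + nu)."""
    counts = Counter(current_query.split())
    qlen = sum(counts.values())
    vocab = set(counts) | set(hist_query_model) | set(hist_click_model)
    denom = qlen + mu + nu
    return {w: (counts[w] + mu * hist_query_model.get(w, 0.0)
                + nu * hist_click_model.get(w, 0.0)) / denom
            for w in vocab}

if __name__ == "__main__":
    h_q = {"apple": 0.5, "software": 0.5}           # averaged past-query model (toy)
    h_c = {"apple": 0.4, "mac": 0.3, "os": 0.3}     # averaged clickthrough model (toy)
    theta = bayes_int("jaguar", h_q, h_c, mu=0.2, nu=5.0)
    print(sorted(theta.items(), key=lambda x: -x[1])[:5])
```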
Method 3: Online Bayesian Updating (OnlineUp)

Intuition: incrementally update the language model as each query Qi and clickthrough Ci arrives:

  $p(w \mid \theta_i) = \frac{c(w, Q_i) + \mu\, p(w \mid \theta'_{i-1})}{|Q_i| + \mu}$

  $p(w \mid \theta'_i) = \frac{c(w, C_i) + \nu\, p(w \mid \theta_i)}{|C_i| + \nu}$
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 26
Method 4: Batch Bayesian Updating (BatchUp)

Update the query model incrementally with each new query:

  $p(w \mid \theta_i) = \frac{c(w, Q_i) + \mu\, p(w \mid \theta_{i-1})}{|Q_i| + \mu}$

Then incorporate all of the clickthrough data in one batch:

  $p(w \mid \theta'_k) = \frac{\sum_{j=1}^{k-1} c(w, C_j) + \nu\, p(w \mid \theta_k)}{\sum_{j=1}^{k-1} |C_j| + \nu}$

Intuition: all clickthrough data are equally useful.
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 27
TREC Style Evaluation
• Data collection: TREC AP88-90
• Topics: 30 hard topics of TREC topics 1-150
• System: search engine + RDBMS
• Context: Query and clickthrough history of 3 participants (http://sifaka.cs.uiuc.edu/ir/ucair/QCHistory.zip)
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 28
Example of a Hard Topic
<topic>
<number> 2 (283 relevant docs in 242,918 documents)
<title> Acquisitions
<desc> Document discusses a currently proposed acquisition involving a U.S. company and a foreign company.
<narr> To be relevant, a document must discuss a currently proposed acquisition (which may or may not be identified by type, e.g., merger, buyout, leveraged buyout, hostile takeover, friendly acquisition). The suitor and target must be identified by name; the nationality of one of the companies must be identified as U.S. and the nationality of the other company must be identified as NOT U.S.
</topic>
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 29
Performance of the Hard Topic
Q1: acquisition u.s. foreign company (MAP: 0.004; Pr@20: 0.000)
Q2: acquisition merge takeover u.s. foreign company (MAP: 0.026; Pr@20: 0.100)
Q3: acquire merge foreign abroad international (MAP: 0.004; Pr@20: 0.050)
Q4: acquire merge takeover foreign european japan (MAP: 0.027; Pr@20: 0.200)
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 30
Overall Effect of Search Context

Query       | FixInt (α=0.1, β=1.0) | BayesInt (μ=0.2, ν=5.0) | OnlineUp (μ=5.0, ν=15.0) | BatchUp (μ=2.0, ν=15.0)
            | MAP     pr@20         | MAP     pr@20           | MAP     pr@20            | MAP     pr@20
Q3          | 0.0421  0.1483        | 0.0421  0.1483          | 0.0421  0.1483           | 0.0421  0.1483
Q3+HQ+HC    | 0.0726  0.1967        | 0.0816  0.2067          | 0.0706  0.1783           | 0.0810  0.2067
Improvement | 72.4%   32.6%         | 93.8%   39.4%           | 67.7%   20.2%            | 92.4%   39.4%
Q4          | 0.0536  0.1933        | 0.0536  0.1933          | 0.0536  0.1933           | 0.0536  0.1933
Q4+HQ+HC    | 0.0891  0.2233        | 0.0955  0.2317          | 0.0792  0.2067           | 0.0950  0.2250
Improvement | 66.2%   15.5%         | 78.2%   19.9%           | 47.8%   6.9%             | 77.2%   16.4%

• Short-term context helps the system improve retrieval accuracy
• BayesInt is better than FixInt; BatchUp is better than OnlineUp
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 31
Using Clickthrough Data Only

BayesInt (μ=0.0, ν=5.0), clickthrough history only:
Query   | MAP    | pr@20
Q3      | 0.0421 | 0.1483
Q3+HC   | 0.0766 | 0.2033
Improve | 81.9%  | 37.1%
Q4      | 0.0536 | 0.1930
Q4+HC   | 0.0925 | 0.2283
Improve | 72.6%  | 18.1%

Clickthrough is the major contributor.

Performance on unseen docs:
Query   | MAP    | pr@20
Q3      | 0.0331 | 0.125
Q3+HC   | 0.0661 | 0.178
Improve | 99.7%  | 42.4%
Q4      | 0.0442 | 0.165
Q4+HC   | 0.0739 | 0.188
Improve | 67.2%  | 13.9%

Query   | MAP    | pr@20
Q3      | 0.0421 | 0.1483
Q3+HC   | 0.0521 | 0.1820
Improve | 23.8%  | 23.0%
Q4      | 0.0536 | 0.1930
Q4+HC   | 0.0620 | 0.1850
Improve | 15.7%  | -4.1%

Snippets for non-relevant docs are still useful!
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 32
Sensitivity of BatchUp Parameters

[Figure: MAP of Q2+HQ+HC, Q3+HQ+HC, and Q4+HQ+HC as μ varies from 0 to 10 and as ν varies from 0 to 500 in the BatchUp model]

• BatchUp is stable across different parameter settings
• The best performance is achieved at μ = 2.0, ν = 15.0
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 33
A User Study of Implicit Feedback
• The UCAIR toolbar (a client-side personalized search agent using implicit feedback) was used in this study
• 6 participants used the UCAIR toolbar to do web search
• 32 topics were selected from the TREC Web track and Terabyte track
• Participants explicitly evaluated the relevance of the top 30 search results from Google and from UCAIR
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 34
UCAIR Outperforms Google: Precision at N Docs

Ranking Method | prec@5 | prec@10 | prec@20 | prec@30
Google         | 0.538  | 0.472   | 0.377   | 0.308
UCAIR          | 0.581  | 0.556   | 0.453   | 0.375
Improvement    | 8.0%   | 17.8%   | 20.2%   | 21.8%

More user interactions → better user models → better retrieval accuracy
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 35
UCAIR Outperforms Google: PR Curve
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 36
Scenario 2: Use the Entire History of a User [Tan et al. 06]
• Challenge: the search log is noisy
  – How do we handle the noise?
  – Can we still improve performance?
• Solution: assign weights to the history data (cosine similarity, EM algorithm)
• Conclusions:
  – All the history information is potentially useful
  – Most helpful for recurring queries
  – History weighting is crucial (EM better than cosine)
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 37
Algorithm Illustration
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 38
Sample Results: EM vs. Baseline
History is helpful and weighting is important
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 39
Sample Results: Different Weighting Methods
EM is better than Cosine; hybrid is feasible
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 40
What You Should Know
• All search history information helps
• Clickthrough information is especially useful; it is useful even when the clicked document is non-relevant
• Recurring queries get the most help, but fresh queries can also benefit from history information
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 41
3. Explicit Feedback [Shen et al. 05, Tan et al. 07]
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 42
Term Feedback for Information Retrieval with Language Models
Bin Tan, Atulya Velivelli, Hui Fang, ChengXiang Zhai
University of Illinois at Urbana-Champaign
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 43
Problems with Document-Based Feedback
• A relevant document may contain non-relevant parts
• Sometimes none of the top-ranked documents is relevant
• The user only indirectly controls the learned query model
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 44
What About Term Feedback?
• Present a list of terms to the user and ask for judgments
  – More direct contribution to estimating θq
  – Works even when no relevant document is on top
• Challenges:
  – How do we select terms to present to the user?
  – How do we exploit term feedback to improve our estimate of θq?
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 45
Improving θq with Term Feedback

Query → Retrieval Engine (over the document collection) → initial results (d1 3.5, d2 2.4, …)
  → Term extraction → terms presented to the user
  → User provides term judgments
  → Term feedback models → improved estimate of θq → re-retrieval
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 46
Feedback Term Selection
• General (old) idea:
  – The original query is used for an initial retrieval run
  – Feedback terms are selected from the top N documents
• New idea:
  – Model subtopics
  – Select terms to represent every subtopic well
  – Benefits:
    • Avoid bias in term feedback
    • Infer relevant subtopics, thus achieving subtopic feedback
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 47
User-Guided Query Model Refinement

[Figure: the document space contains an area already explored by the user and unexplored areas. Feedback terms t11, t12, t21, t22, t31, t32, … drawn from subtopics T1, T2, T3 are judged by the user (+/-); the judgments reveal an inferred topic preference direction, i.e., the most promising new topic areas to move to.]
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 48
Collaborative Estimation of θq

1. Rank documents by D(θq || θd) using the original query model p(w | θq); take the top N docs d1, …, dN.
2. Discover K subtopic clusters C1, …, CK with word distributions p(w | θ1), …, p(w | θK).
3. Present L feedback terms t1, …, tL drawn from the clusters; the user judges them.
4. TFB model: estimated directly from the judged terms, e.g., p(t1 | θTFB) = 0.2, …, p(t3 | θTFB) = 0.1, …
5. CFB model: a mixture of the cluster models weighted according to the term judgments (e.g., cluster weights C1: 0.2, C2: 0.1, C3: 0.3, …, CK: 0.1), i.e., p(w | θCFB) = 0.2·p(w | θ1) + 0.1·p(w | θ2) + …
6. TCFB combines TFB and CFB into a refined query model θq', and documents are re-ranked by D(θq' || θd) (a sketch of this combination follows below).
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 49
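A hedged sketch of how the three feedback models could be combined. The slides do not spell out the exact weighting, so the uniform TFB distribution, the cluster weights proportional to checked terms, and the interpolation weight lam are all assumptions.

```python
def tfb_model(checked_terms):
    """TFB: uniform distribution over the terms the user checked (an assumption;
    the slides only show example probabilities)."""
    p = 1.0 / len(checked_terms)
    return {t: p for t in checked_terms}

def cfb_model(cluster_models, cluster_terms, checked_terms):
    """CFB: mixture of the subtopic cluster models, weighting each cluster by the
    fraction of checked terms it contributed."""
    checked = set(checked_terms)
    weights = [len(checked & set(ts)) for ts in cluster_terms]
    total = sum(weights) or 1
    mixture = {}
    for w_k, model in zip(weights, cluster_models):
        for term, p in model.items():
            mixture[term] = mixture.get(term, 0.0) + (w_k / total) * p
    return mixture

def tcfb_model(tfb, cfb, lam=0.5):
    """TCFB: interpolation of the TFB and CFB models (lam is a hypothetical weight)."""
    vocab = set(tfb) | set(cfb)
    return {w: lam * tfb.get(w, 0.0) + (1 - lam) * cfb.get(w, 0.0) for w in vocab}
```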
Discovering Subtopic Clusters with PLSA [Hofmann 99, Zhai et al. 04]

Query = "transportation tunnel disaster"

Each word w in a document d of the top-ranked set is "generated" from a mixture of k themes plus a background model:
• Theme 1 (e.g., traffic 0.3, railway 0.2, …), Theme 2 (e.g., tunnel 0.1, fire 0.05, smoke 0.02, …), …, Theme k (e.g., tunnel 0.2, amtrak 0.1, train 0.05, …)
• Background θB (e.g., is 0.05, the 0.04, a 0.03, …) is chosen with probability λB; theme θj is chosen with probability (1 − λB)·πd,j

The theme models and mixing weights are fit with the maximum likelihood estimator (EM algorithm); a small sketch follows below.
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 50
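A compact EM sketch of a PLSA-style mixture with a fixed background model, in the spirit of [Hofmann 99, Zhai et al. 04] but not the authors' implementation; k, λB, and the toy corpus are arbitrary choices.

```python
import random
from collections import Counter

def plsa_with_background(docs, k=2, lambda_b=0.9, iters=30, seed=0):
    """docs: list of token lists. Returns (theme word distributions, per-doc mixing weights)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    counts = [Counter(d) for d in docs]
    total = Counter(w for d in docs for w in d)
    n_total = sum(total.values())
    p_b = {w: total[w] / n_total for w in vocab}          # background p(w|B)
    theta = [{w: rng.random() for w in vocab} for _ in range(k)]
    for t in theta:                                       # normalize random init
        s = sum(t.values())
        for w in t:
            t[w] /= s
    pi = [[1.0 / k] * k for _ in docs]                    # pi[d][j]
    for _ in range(iters):
        new_theta = [Counter() for _ in range(k)]
        for d, cnt in enumerate(counts):
            new_pi = [0.0] * k
            for w, c in cnt.items():
                p_themes = [pi[d][j] * theta[j][w] for j in range(k)]
                s = sum(p_themes)
                # E-step: posterior over themes, and prob. the word is non-background
                post = [p / s if s > 0 else 1.0 / k for p in p_themes]
                p_not_b = ((1 - lambda_b) * s / (lambda_b * p_b[w] + (1 - lambda_b) * s)
                           if s > 0 else 0.0)
                for j in range(k):
                    new_theta[j][w] += c * p_not_b * post[j]
                    new_pi[j] += c * p_not_b * post[j]
            z = sum(new_pi) or 1.0
            pi[d] = [x / z for x in new_pi]               # M-step for pi[d]
        for j in range(k):                                # M-step for theme models
            z = sum(new_theta[j].values()) or 1.0
            theta[j] = {w: new_theta[j].get(w, 0.0) / z for w in vocab}
    return theta, pi

if __name__ == "__main__":
    docs = ["tunnel fire smoke firefight".split(),
            "tunnel traffic railway bridge".split(),
            "tunnel fire smoke blaze".split()]
    themes, weights = plsa_with_background(docs, k=2)
    for t in themes:
        print(sorted(t.items(), key=lambda x: -x[1])[:3])
```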
Selecting Representative Terms
– Original query terms are excluded
– Shared terms are assigned to their most likely clusters

Cluster 1: tunnel (0.0768), transport (0.0364), traffic (0.0206), railwai (0.0186), harbor (0.0146), rail (0.0140), bridg (0.0139), kilomet (0.0136), truck (0.0133), construct (0.0131), …
Cluster 2: tunnel (0.0935), fire (0.0295), truck (0.0236), french (0.0220), smoke (0.0157), car (0.0154), italian (0.0152), firefight (0.0144), blaze (0.0127), blanc (0.0121), …
Cluster 3: tunnel (0.0454), transport (0.0406), toll (0.0166), amtrak (0.0153), train (0.0129), airport (0.0122), turnpik (0.0105), lui (0.0095), jersei (0.0093), pass (0.0087), …
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 51
User Interface for Term Feedback

[Screenshots of the three clarification-form layouts: feedback terms presented in 1 cluster, 3 clusters, and 6 clusters]
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 52
Experiment Setup
• TREC 2005 HARD Track
• AQUAINT corpus (3 GB)
• 50 hard query topics
• NIST assessors spent up to 3 minutes per topic providing feedback through a Clarification Form (CF)
• Submitted CFs: 1x48, 3x16, 6x8 (clusters x terms per cluster)
• Baseline: KL-divergence retrieval method with 5 pseudo-feedback docs
• 48 terms generated from the top 60 docs of the baseline run
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 53
Retrieval Accuracy Comparison
• 1C: 1x48; 3C: 3x16; 6C: 6x8
• Baseline < TFB < CFB < TCFB (except for CFB1C)
• CFB1C: user feedback plays no role (with only one cluster, the CFB mixture is fixed)

      | Baseline | TFB                 | CFB                 | TCFB
      |          | 1C     3C     6C    | 1C     3C     6C    | 1C     3C     6C
MAP   | 0.219    | 0.288  0.288  0.278 | 0.254  0.305  0.301 | 0.274  0.309  0.304
PR@30 | 0.393    | 0.467  0.475  0.457 | 0.399  0.480  0.473 | 0.431  0.491  0.473
RR    | 4339     | 4753   4762   4740  | 4600   4907   4872  | 4767   4947   4906
MAP%  | 0%       | 31.5%  31.5%  26.9% | 16.0%  39.3%  37.4% | 25.1%  41.1%  38.8%
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 54
Reduction of # Terms Presented (MAP)

        | TFB                 | CFB           | TCFB
#terms  | 1C     3C     6C    | 3C     6C     | 3C     6C
6       | 0.245  0.240  0.227 | 0.279  0.279  | 0.281  0.274
12      | 0.261  0.261  0.242 | 0.299  0.286  | 0.297  0.281
18      | 0.275  0.274  0.256 | 0.301  0.282  | 0.300  0.286
24      | 0.276  0.281  0.265 | 0.303  0.292  | 0.305  0.292
30      | 0.280  0.285  0.270 | 0.304  0.296  | 0.307  0.296
36      | 0.282  0.288  0.272 | 0.307  0.297  | 0.309  0.297
42      | 0.283  0.288  0.275 | 0.306  0.298  | 0.309  0.300
48      | 0.288  0.288  0.278 | 0.305  0.301  | 0.309  0.303

(#terms = 12 means 1x12 / 3x4 / 6x2, and so on.)
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 55
Clarification Form Completion Time
More than half completed in just 1 min
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 56
Term Relevance Judgment Quality

CF Type                 | 1x48  | 3x16  | 6x8
#checked terms          | 14.8  | 13.3  | 11.2
#relevant terms         | 15.0  | 12.6  | 11.2
#relevant checked terms | 7.9   | 6.9   | 5.9
precision               | 0.534 | 0.519 | 0.527
recall                  | 0.526 | 0.548 | 0.527

(Term relevance is defined following [Zaragoza et al. 04].)
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 57
Had the User Checked All "Relevant Terms"… (MAP)
TFB1:  0.288 -> 0.354
TFB3:  0.288 -> 0.354
TFB6:  0.278 -> 0.346
CFB3:  0.305 -> 0.325
CFB6:  0.301 -> 0.326
TCFB3: 0.309 -> 0.345
TCFB6: 0.304 -> 0.341
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 58
Comparison to Relevance Feedback

# FB Docs | MAP   | Pr@30 | RelRet
5         | 0.302 | 0.586 | 4779
10        | 0.345 | 0.670 | 4916
20        | 0.389 | 0.772 | 5004
TCFB3C    | 0.309 | 0.491 | 4947

MAP equivalence: TCFB3C ≈ relevance feedback with 5 docs
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 59
Term Feedback Helps Difficult Topics

[Figure: per-topic comparison; term feedback gives the largest gains on topics with no relevant docs in the top 5]
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 60
Related Work
• Early work: [Harman 88], [Spink 94], [Koenemann & Belkin 96], …
• More recent: [Ruthven 03], [Anick 03], …
• Main differences of this work:
  – Language-modeling approach
  – Consistently effective
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 61
Conclusions and Future Work
• A novel way of improving query model estimation through term feedback
  – Active feedback based on subtopics
  – User-system collaboration
  – Achieves a large performance improvement over the non-feedback baseline with a small amount of user effort
  – Can compete with relevance feedback, especially in situations where the latter is unable to help
• To explore: more complex interaction processes
  – Combination of term feedback and relevance feedback
  – Incremental feedback
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 62
What You Should Know
• Term feedback can be quite useful when the query is difficult and relevance feedback isn't feasible
• Language models can handle term weighting well in term feedback
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 63
Active Feedback in Ad Hoc IR
Xuehua Shen, ChengXiang ZhaiDepartment of Computer ScienceUniversity of Illinois, Urbana-Champaign
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 64
Normal Relevance Feedback (RF)

Query → Retrieval System (over the document collection) → top-K results (d1 3.5, d2 2.4, …, dk 0.5)
  → User judges them (d1 +, d2 -, …, dk -) → judgments are fed back to improve the query model
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 65
Document Selection in RF

Query → Retrieval System (over the document collection) → which K docs to present?
  → User judges the presented docs (d1 +, d2 -, …, dk -) → feedback

Can we do better than just presenting the top K? (Consider diversity…)
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 66
Active Feedback (AF)
An IR system actively selects documentsfor obtaining relevance judgments
If a user is willing to judge K documents,
which K documents should we present
in order to maximize learning effectiveness?
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 67
Outline
• Framework and specific methods
• Experiment design and results
• Summary and future work
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 68
A Framework for Active Feedback
• Consider active feedback as a decision problem
  – Decide on K documents (D) for relevance judgment
• Formalize it as an optimization problem
  – Optimize the expected learning benefit (loss) of requesting relevance judgments on D from the user
• Consider two cases of the loss function, according to the interaction between documents
  – Independent loss: the value of each judged document for learning is independent of the others
  – Dependent loss
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 69
Independent Loss

Rank documents by the expected loss of each individual document and select the top K.
• Constant loss for any relevant and non-relevant docs, or a smaller loss for relevant docs → Top K
• A document is more useful for learning if the prediction of its relevance is more uncertain → Uncertainty Sampling
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 70
Dependent Loss

Heuristics: consider relevance first, then diversity (a code sketch of the first two strategies follows below).
• Gapped Top-K: pick one document out of every G+1 documents in the initial ranking
• K-Cluster Centroid: first select the top N docs of the baseline retrieval, cluster them into K clusters, and present the K cluster centroids
• MMR: model diversity and relevance jointly
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 71
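A small sketch of the two simpler dependent-loss selection strategies referenced above. Gapped Top-K follows directly from the description; for K-Cluster Centroid the slides do not fix a clustering algorithm, so a tiny k-means over hypothetical dense document vectors stands in here.

```python
import math
import random

def gapped_top_k(ranked_docs, k, gap=2):
    """Gapped Top-K: take every (gap+1)-th document from the initial ranking."""
    return ranked_docs[::gap + 1][:k]

def k_cluster_centroid(ranked_docs, vectors, k, n=100, iters=20, seed=0):
    """K-Cluster Centroid sketch: cluster the top-N docs (dense vectors) with a
    tiny k-means and present the doc closest to each centroid."""
    rng = random.Random(seed)
    top = ranked_docs[:n]
    X = [vectors[d] for d in top]
    centers = [list(X[i]) for i in rng.sample(range(len(X)), k)]
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for i, x in enumerate(X):
            groups[min(range(k), key=lambda c: dist(x, centers[c]))].append(i)
        for c, g in enumerate(groups):
            if g:
                centers[c] = [sum(X[i][dim] for i in g) / len(g) for dim in range(len(X[0]))]
    picks = []
    for c, g in enumerate(groups):
        if g:
            picks.append(top[min(g, key=lambda i: dist(X[i], centers[c]))])
    return picks

if __name__ == "__main__":
    ranked = [f"d{i}" for i in range(1, 17)]
    print(gapped_top_k(ranked, k=5, gap=2))   # d1, d4, d7, d10, d13
```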
Illustration of Three AF Methods

[Figure: an initial ranking of documents 1-16. Top-K (normal feedback) takes documents 1 through K; Gapped Top-K skips G documents between picks; K-Cluster Centroid picks one representative per cluster, aiming at high diversity]
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 72
Evaluating Active Feedback

Query → select K docs (Top-K, Gapped, or Clustering) → the K docs are judged using the official judgment file (+ + + - -)
  → the judged docs are used for feedback → feedback results
The initial results (no feedback) serve as the reference run.
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 73
Retrieval Methods (Lemur toolkit)

• Query Q → query model θQ; document D → document model θD
• Documents are scored by the KL divergence D(θQ || θD)
• Feedback docs F = {d1, …, dn} come from active feedback; only relevant docs are learned from
• Mixture-model feedback: θQ' = (1 - α)·θQ + α·θF
• Default parameter settings unless otherwise stated
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 74
Comparison of Three AF Methods (judged docs included in the results)

Collection | Active FB Method | #AFRel per topic | MAP   | Pr@10doc
HARD       | Baseline         | /                | 0.301 | 0.501
HARD       | Pseudo FB        | /                | 0.320 | 0.515
HARD       | Top-K            | 3.0              | 0.325 | 0.527
HARD       | Gapped           | 2.6              | 0.330 | 0.548
HARD       | Clustering       | 2.4              | 0.332 | 0.565
AP88-89    | Baseline         | /                | 0.201 | 0.326
AP88-89    | Pseudo FB        | /                | 0.218 | 0.343
AP88-89    | Top-K            | 2.2              | 0.228 | 0.351
AP88-89    | Gapped           | 1.5              | 0.234 | 0.389
AP88-89    | Clustering       | 1.3              | 0.237 | 0.393

Top-K is the worst! Clustering uses the fewest relevant docs.
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 75
Appropriate Evaluation of Active Feedback

• Original DB with judged docs (AP88-89, HARD): can't tell whether the ranking of un-judged documents is improved
• Original DB without judged docs: different methods end up with different test documents
• New DB (AP88-89 for feedback, AP90 for testing): shows the learning effect more explicitly, but the new docs must be similar to the original docs
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 76
Retrieval Performance on the AP90 Dataset

Method | Baseline | Pseudo FB | Top K | Gapped Top K | K Cluster Centroid
MAP    | 0.203    | 0.220     | 0.220 | 0.222        | 0.223
pr@10  | 0.295    | 0.317     | 0.321 | 0.326        | 0.325

Top-K is consistently the worst!
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 77
Feedback Model Parameter Factor

[Figure: pr@10 as the mixture-model feedback parameter α varies from 0.5 to 0.98, for Top K, Gapped Top K, and K Cluster Centroid on HARD and on AP88-89]

The α parameter in θQ' = (1 - α)·θQ + α·θF can amplify the effect of feedback.
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 78
Summary
• Introduced the active feedback problem
• Proposed a preliminary framework and three methods (Top-K, Gapped Top-K, Clustering)
• Studied the evaluation strategy
• Experiment results show that
  – Presenting the top K is not the best strategy
  – Clustering can generate fewer, higher-quality feedback examples
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 79
Future Work
• Explore other methods for active feedback
• Develop a general framework
• Combine pseudo feedback and active feedback
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 80
What You Should Know
• What active feedback is
• Top-K isn't a good strategy for active feedback; diversifying the presented results is beneficial
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 81
Learn from Web Search Logs to Organize Search Results
Xuanhui Wang and ChengXiang ZhaiDepartment of Computer Science
University of Illinois, Urbana-Champaign
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 82
Motivation
• Search engine utility = Ranking accuracy + Result presentation + …
• Lots of research on improving ranking accuracy
• Relatively little work on improving result presentation
What’s the best way to present search results?
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 83
Ranked List Presentation
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 84
However, when the query is ambiguous…

Query = Jaguar
[Figure: a ranked list in which four results are about the car, one about the software, and one about the animal]

Unlikely to be optimal for any particular user!
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 85
Cluster Presentation (e.g., [Hearst et al. 96, Zamir & Etzioni 99])
From http://vivisimo.com
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 86
Deficiencies of Data-Driven Clustering
• Different users may prefer different ways to group the results. E.g., query = "area codes":
  – "phone codes" vs. "zip codes"
  – "international codes" vs. "local codes"
• Cluster labels may not be informative enough to help a user choose the right cluster. E.g., label = "panthera onca"

Need to group search results from a user's perspective.
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 87
Our Idea: User-Oriented Clustering
• User-oriented clustering:
  – Partition search results according to the aspects that interest users
  – Label each aspect with words meaningful to users
• Exploit search logs to do both:
  – Partitioning
    • Learn the "interesting aspects" of an arbitrary query
    • Classify results into these aspects
  – Labeling
    • Learn "representative queries" for the identified aspects
    • Use the representative queries to label the aspects
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 88
Rest of the Talk
• General Approach
• Technical Details
• Experiment Results
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 89
Illustration of the General Idea

Query = "car"
1. Retrieval (over the log): find related past queries, e.g., car rental, car pricing, used car, hertz car rental, car accidents, car audio, car crash, …
2. Clustering: group them into aspects, e.g.,
   1. {car rental, hertz car rental, …}
   2. {car pricing, used car, …}
   3. {car accidents, car crash, …}
   4. {car audio, car stereo, …}
   5. …
3. Categorization: assign the search results (www.avis.com, www.hertz.com, www.cars.com, …) to the aspects and label them, e.g.,
   Car rental: www.avis.com, www.hertz.com, …
   Used cars: www.cars.com, …
   Car accidents: …
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 90
User-Oriented Clustering via Log Mining

Query → retrieval against the search history collection (query pseudo-docs 1, 2, …) → similar queries
      → clustering → query aspects 1, …, k → labeling (label 1, label 2, …)
Search results → categorization into the query aspects → organized results
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 91
Implementation Strategy

• History collection: one pseudo-doc per query (query + clicked snippets), pooling identical queries
• Retrieval of similar queries: BM25 (Lemur)
• Clustering into query aspects: star clustering
• Labeling: the center query of each cluster
• Categorization of search results: centroid-based classification
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 92
More Details: Search Engine Log
• Records user activities (queries, clicks), organized into sessions
• Reflects user information needs
• A valuable resource for learning to improve search engine utility
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 93
More Details: Building the History Collection

For every query (e.g., "car rental"):
• Pool all sessions that contain the query, together with the clicked URLs U1, U2, …
• Recover the snippets of the clicked results
• Concatenate them into a query pseudo-doc, e.g.,
  "car rental" → "Car rental, rental cars …", "National car rental …", …
  "jaguar car" → "jaguar, car, parts …", …
The pseudo-docs form the history collection.
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 94
More Details: Star Clustering [Aslam et al. 04]

1. Form a similarity graph over the query pseudo-docs
   – TF-IDF weight vectors
   – Cosine similarity
   – Thresholding (keep an edge only if the similarity is high enough)
2. Iteratively identify a "star center" (an uncovered vertex of highest degree) and its "satellites" (its uncovered neighbors)

The "star center" query serves as the label for its cluster (a small code sketch follows below).
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 95
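A sketch of star clustering over query pseudo-docs, following the two steps above; the σ threshold and the toy vectors are made-up values.

```python
import math

def star_clustering(vectors, sigma=0.3):
    """vectors: dict name -> dict term -> TF-IDF weight. Returns a list of
    (star_center, satellites) pairs; the center query labels the cluster."""
    def cosine(u, v):
        dot = sum(u[t] * v.get(t, 0.0) for t in u)
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0
    names = list(vectors)
    # 1. Similarity graph: keep an edge if cosine similarity >= sigma.
    adj = {a: {b for b in names if a != b and cosine(vectors[a], vectors[b]) >= sigma}
           for a in names}
    clusters, uncovered = [], set(names)
    # 2. Repeatedly pick the uncovered vertex of highest degree as a star center.
    while uncovered:
        center = max(uncovered, key=lambda a: len(adj[a] & uncovered))
        satellites = adj[center] & uncovered
        clusters.append((center, sorted(satellites)))
        uncovered -= satellites | {center}
    return clusters

if __name__ == "__main__":
    print(star_clustering({
        "car rental": {"car": 1.0, "rental": 1.0},
        "hertz car rental": {"hertz": 1.0, "car": 0.8, "rental": 0.8},
        "used car": {"used": 1.0, "car": 1.0},
    }))
```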
Centroid-Based Classifier
• Represent each query pseudo-doc as a term vector (TF-IDF weighting)
• Compute a centroid vector for each cluster/aspect
• Assign a new result vector to the aspect whose centroid is closest to it (a small code sketch follows below)
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 96
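A minimal sketch of the centroid-based assignment; sparse TF-IDF vectors are represented as plain dicts, and the aspect names and weights are toy values.

```python
import math
from collections import defaultdict

def centroid(vectors):
    """Mean of a list of sparse term vectors."""
    c = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            c[t] += w / len(vectors)
    return dict(c)

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def assign_result(result_vector, aspect_vectors):
    """Assign a search result to the aspect whose centroid is most similar."""
    centroids = {a: centroid(vs) for a, vs in aspect_vectors.items()}
    return max(centroids, key=lambda a: cosine(result_vector, centroids[a]))

if __name__ == "__main__":
    aspects = {"car rental": [{"rental": 1.0, "car": 0.5}, {"hertz": 1.0, "rental": 0.8}],
               "used cars":  [{"used": 1.0, "car": 0.7}, {"pricing": 1.0, "car": 0.6}]}
    print(assign_result({"hertz": 1.0, "car": 0.4}, aspects))   # -> "car rental"
```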
Evaluation: Data Preparation
• Log data: May 2006 search log released by Microsoft Live Labs
• First 2/3 used to simulate history; last 1/3 used to simulate future queries
• History collection: 169,057 queries; 3.5 clicked URLs per query on average
• The "future" collection is further split into two sets, for validation and testing
• Test case: a session with more than 4 clicks and at least 100 matching queries in the history (172 and 177 test cases in the two test sets)
• Clicked URLs are used to approximate relevant documents [Joachims, 2002]
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 97
Experiment Design
• Baseline method: the original search engine ranking
• Cluster-based method: traditional clustering based solely on result content
• Log-based method: our method based on search logs
• Evaluation (a small sketch of the measures follows below):
  – Based on the user's perceived ranking accuracy
  – A user is assumed to first view the cluster with the largest number of relevant docs
  – Measures:
    • Precision at 5 documents (P@5)
    • Mean Reciprocal Rank (MRR) of the first relevant document
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 98
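A small sketch of the perceived-accuracy evaluation described above, under the stated assumption that the user first views the cluster containing the most relevant documents; P@5 and the reciprocal rank are computed within that cluster, and MRR is the mean of the reciprocal ranks over test cases. The function name and example data are hypothetical.

```python
def perceived_metrics(clusters, relevant, n=5):
    """clusters: list of ranked doc-id lists; relevant: set of relevant doc ids.
    Returns (P@n, reciprocal rank) for the cluster with the most relevant docs."""
    best = max(clusters, key=lambda c: sum(1 for d in c if d in relevant))
    p_at_n = sum(1 for d in best[:n] if d in relevant) / n
    rr = next((1.0 / (i + 1) for i, d in enumerate(best) if d in relevant), 0.0)
    return p_at_n, rr

if __name__ == "__main__":
    # Two clusters for one test case; d2 and d5 are the relevant (clicked) docs.
    print(perceived_metrics([["d1", "d2", "d5"], ["d3", "d4"]], {"d2", "d5"}))
```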
Overall Comparison
• Log-based >> baseline
• Log-based >> cluster-based
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 99
Diversity Analysis
• Do queries with diverse results benefit more?
• Bin the test cases by the size ratio of the two largest (primary/secondary) clusters; a smaller ratio means more diverse results

Queries with diverse results benefit more.
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 100
Query Difficulty Analysis
• Do difficult queries benefit more?
• Bin the test cases by Mean Average Precision (MAP); lower MAP means a more difficult query

Difficult queries benefit more.
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 101
Effectiveness of Learning

[Figure: P@5 of the log-based method improves as more history information becomes available]
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 102
Sample Results: Partitioning
• The log-based method and regular clustering partition the results differently
• Query: "area codes": one method groups results into "phone codes" vs. "zip codes", the other into "international codes" vs. "local codes"
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 103
Sample Results: LabelingQuery: apple
Query: jaguar
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 104
Related Work
• Categorization-based (e.g., [Chen & Dumais 00])
  – Labels are meaningful to users
  – The partitioning may not match a user's perspective
• Faceted search and browsing (e.g., [Yee et al. 03])
  – Labels are meaningful to users
  – The partitioning is generally useful for a user
  – Requires faceted metadata
• Rather than pre-specifying fixed categories/metadata, we learn them dynamically from the search log
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 105
Conclusions and Future Work
• Proposed a general strategy for organizing search results based on interesting topic aspects learned from search logs
• Experimented with one way to implement the strategy
• Results show that
  – User-oriented clustering is better than data-oriented clustering
  – It particularly helps difficult topics and topics with diverse results
• Future directions
  – Mixture of data-driven and user-driven clustering
  – Study user interaction/feedback with the cluster interface
  – Use the general search log to "smooth" a personal search log
  – Query-sensitive result presentation
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 106
What You Should Know
• The search history of many users can be combined to benefit a particular user's search
• The difference between user-oriented and data-oriented result organization, and their respective advantages and disadvantages
• How to evaluate clustering results indirectly based on perceived precision
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 107
Future Research Directions in Personalized Search
• Robust personalization:
  – An optimization framework for progressive personalization (gradually become more aggressive in using context/history information)
• More in-depth analysis of implicit feedback information
  – Why does a user add a query term and then drop it after viewing a particular document?
• More computer-user dialogue to help bridge the vocabulary gap
• In general, aim at improving performance for difficult topics
• What's the right architecture for supporting personalized search?
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 108
Roadmap
• This lecture: Personalized search (understanding users)
• Next lecture: NLP for IR (understanding documents)
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 109
User-Centered Search Engine

[Figure: a personalized search agent sits on the user's machine between the user and multiple sources (web search engines, desktop files). For a query such as "java", the agent can draw on the user's query history and previously viewed web pages.]

A search agent can know a particular user very well.
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 110
User-Centered Adaptive IR (UCAIR)
• A novel retrieval strategy emphasizing
  – user modeling ("user-centered")
  – search context modeling ("adaptive")
  – interactive retrieval
• Implemented as a personalized search agent that
  – sits on the client side (owned by the user)
  – integrates information around a user (1 user vs. N sources, as opposed to 1 source vs. N users)
  – collaborates with other agents
  – goes beyond search toward task support
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 112
Challenges in UCAIR
• What's an appropriate retrieval framework for UCAIR?
• How do we optimize retrieval performance in interactive retrieval?
• How do we develop robust and accurate retrieval models that exploit user information and search context?
• How do we evaluate UCAIR methods?
• ……
2008 © ChengXiang Zhai Dragon Star Lecture at Beijing University, June 21-30, 2008 113
The Rest of the Talk
• Part I: A risk minimization framework for UCAIR
• Part II: Improving document ranking with implicit feedback
• Part III: User-specific summarization of search results

Joint work with Xuehua Shen, Bin Tan, and Qiaozhu Mei