UCAIR Project
Xuehua Shen, Bin Tan, ChengXiang Zhai
http://sifaka.cs.uiuc.edu/ir/ucair/
2
Outline
• Motivation
• Progress
– Framework
– Model
– System
– Evaluation
• Road ahead
– Continuous work
– New direction
3
Problem of Context-Independent Search
Jaguar
Car
Apple Software
Animal
Chemistry Software
4
Put Search in Context
Apple software
Hobby…
Query History
Clickthrough
Other Context Info: Dwelling time, Mouse movement
5
Outline
• Motivation
• Progress
– Framework
– Model
– System
– Evaluation
• Road ahead
– Continuous work
– New direction
6
A Decision Theoretic Framework
• Model interactive IR as “action dialog”: cycles of user action and system response
User action | System response
Submit a new query | Retrieve new documents
View a document | Rerank documents
7
A Decision Theoretic Framework (cont.)
• Search for the optimal system response given a new user action
$$r_t^* = \arg\min_{r \in R(a_t)} \int_M L(a_t, r, m)\, P(m \mid U, D, A_t, R_{t-1})\, dm$$
$$m_t^* = \arg\max_{m} P(m \mid U, D, A_t, R_{t-1})$$
$$r_t^* \approx \arg\min_{r \in R(a_t)} L(a_t, r, m_t^*)$$
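The two forms of the decision rule above can be compared on a toy problem. The following is a minimal sketch, assuming a finite set of user models (so the integral becomes a sum) and a hand-written 0/1 loss; the function names, the toy models, and the loss are illustrative assumptions, not part of UCAIR.

```python
def bayes_optimal_response(action, candidates, user_models, posterior, loss):
    """Pick the response with the lowest expected loss.

    A finite sum over user_models stands in for the integral over M.
    """
    def risk(response):
        return sum(loss(action, response, m) * posterior(m) for m in user_models)
    return min(candidates, key=risk)


def approximate_response(action, candidates, user_models, posterior, loss):
    """Pick the response with the lowest loss under the posterior mode m*."""
    m_star = max(user_models, key=posterior)
    return min(candidates, key=lambda r: loss(action, r, m_star)), m_star


# Hypothetical toy setting: the query "jaguar" with two possible user models.
toy_posterior = {"car": 0.8, "animal": 0.2}
toy_candidates = ["car_first", "animal_first"]


def toy_loss(action, response, model):
    # Zero loss when the ranking leads with the sense the user wants.
    return 0.0 if response.startswith(model) else 1.0


exact = bayes_optimal_response(
    "jaguar", toy_candidates, list(toy_posterior), toy_posterior.get, toy_loss)
approx, mode = approximate_response(
    "jaguar", toy_candidates, list(toy_posterior), toy_posterior.get, toy_loss)
```

In this toy case both rules agree; the posterior-mode form only needs one loss evaluation per candidate, which is presumably why it is attractive for an interactive system.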
8
User Models
• Components of user model $M = (S, \vec{x}, A_t, R_{t-1})$
– User information need $\vec{x}$
– User viewed documents S
– User actions $A_t$ and system responses $R_{t-1}$
– …
9
Loss Functions
• Loss function for result reranking
$$L(a_t, r, m) = -\sum_{i=1}^{k} P(\text{view} \mid d_i)\, P(\text{relevant} \mid d_i, m)$$
• Loss function for query expansion, where $f(q)$ is the result list retrieved for query $q$
$$r_t^* = \arg\min_{r} L(a_t, r, m) = \arg\min_{f(q)} L(a_t, f(q), m) = f\big(\arg\min_{q} L(q, f(q), m)\big)$$
10
Implicit User Modeling
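• Update the user information need given a new query
• Learn a better user model when the user skips the top n documents and views the (n+1)-th document
$$\vec{x} = \alpha\, \vec{q} + (1-\alpha)\, \frac{1}{k} \sum_{i=1}^{k} \vec{s}_i$$
Here $\vec{q}$ is the current query vector and $\vec{s}_1, \dots, \vec{s}_k$ are the vectors of the k viewed document summaries.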
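The information-need update on this slide is a weighted average of the query vector and the mean of the viewed summaries. A minimal sketch follows, assuming raw term counts as vector weights and whitespace-tokenized text; the function name and the default `alpha` are hypothetical choices for illustration.

```python
from collections import Counter


def update_information_need(query_terms, viewed_summaries, alpha=0.5):
    """Return x = alpha * q + (1 - alpha) * mean(s_1..s_k) as a term->weight dict.

    query_terms: list of tokens in the current query.
    viewed_summaries: list of token lists, one per viewed summary.
    With no viewed summaries the result is just alpha * q.
    """
    x = {w: alpha * c for w, c in Counter(query_terms).items()}
    k = len(viewed_summaries)
    for summary in viewed_summaries:
        for w, c in Counter(summary).items():
            x[w] = x.get(w, 0.0) + (1.0 - alpha) * c / k
    return x


need = update_information_need(
    ["apple", "software"],
    [["apple", "mac"], ["apple", "os"]],
    alpha=0.5,
)
```

Terms that appear both in the query and in the viewed summaries (here "apple") end up with the largest weight, which is the effect the update is meant to produce.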
11
Outline
• Motivation
• Progress
– Framework
– Model
– System
– Evaluation
• Road ahead
– Continuous work
– New direction
12
Four Contextual Language Models
User Query: Q1, Q2, …, Qk
e.g., Apple software
User Clickthrough: C1 = {C1,1 , C1,2 , C1,3 , …}, C2 = {C2,1 , C2,2 , C2,3 , …}, …
e.g., Apple - Mac OS X The Apple Mac OS X product page. Describes features in the current version of Mac OS X, a screenshot gallery, latest software downloads, and a directory of ...
User Information Need: ?
How to model and use all the information?
13
Retrieval Model
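Basis: Unigram language model + KL divergence
Query $Q_k$ → query model $\theta_{Q_k}$, with $p(w \mid \theta_k) = p(w \mid Q_k)$
Document D → document model $\theta_D$
Similarity Measure: $D(\theta_{Q_k} \,\|\, \theta_D)$ → Results
Contextual search: query model update using user query and clickthrough history
User U: Query History, Clickthrough
$$p(w \mid \theta_k) = p(w \mid Q_k, Q_1, \dots, Q_{k-1}, C_1, \dots, C_{k-1})$$
Updated query model $\theta'_{Q_k}$, scored with $D(\theta'_{Q_k} \,\|\, \theta_D)$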
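Ranking by KL divergence between a query model and document models can be sketched in a few lines. This is an illustrative sketch only: the slide does not say how document models are smoothed, so Jelinek-Mercer smoothing with a collection model is assumed here, and all function names are hypothetical.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram model as a term->probability dict."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def smoothed_prob(word, doc_model, collection_model, lam=0.5, floor=1e-12):
    # Assumed Jelinek-Mercer smoothing; floor avoids log(0) for unseen words.
    p = lam * doc_model.get(word, 0.0) + (1.0 - lam) * collection_model.get(word, 0.0)
    return max(p, floor)


def kl_divergence(query_model, doc_model, collection_model, lam=0.5):
    """D(theta_Q || theta_D), summed over words with non-zero query probability."""
    return sum(
        p_q * math.log(p_q / smoothed_prob(w, doc_model, collection_model, lam))
        for w, p_q in query_model.items()
        if p_q > 0.0
    )


def rank(query_model, docs, lam=0.5):
    """Return document ids sorted by increasing divergence (best match first)."""
    collection = unigram_model([t for tokens in docs.values() for t in tokens])
    scores = {
        doc_id: kl_divergence(query_model, unigram_model(tokens), collection, lam)
        for doc_id, tokens in docs.items()
    }
    return sorted(scores, key=scores.get)


docs = {
    "d1": ["jaguar", "car", "dealer"],
    "d2": ["jaguar", "cat", "animal"],
}
ranking = rank(unigram_model(["jaguar", "car"]), docs)
```

Contextual search then only changes the first argument: the updated query model replaces the model estimated from the current query alone, and the ranking code stays the same.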
14
Fixed Coefficient Interpolation (FixInt)
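Inputs: current query $Q_k$; past queries $Q_1, \dots, Q_{k-1}$; past clickthrough $C_1, \dots, C_{k-1}$
Average user query history and clickthrough history
$$p(w \mid H_Q) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(w \mid Q_i)$$
$$p(w \mid H_C) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(w \mid C_i)$$
Linearly interpolate history models
$$p(w \mid H) = \beta\, p(w \mid H_C) + (1-\beta)\, p(w \mid H_Q)$$
Linearly interpolate current query and history model
$$p(w \mid \theta_k) = \alpha\, p(w \mid Q_k) + (1-\alpha)\, p(w \mid H)$$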
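The FixInt steps on this slide (average the history models, interpolate them with β, then interpolate with the current query using α) can be sketched as follows. Models are plain term->probability dicts; the helper names are hypothetical, and returning the current query model unchanged when there is no history is an assumption made here for the first query of a session.

```python
def average_models(models):
    """Uniform average of a non-empty list of term->probability dicts."""
    vocab = set().union(*models)
    return {w: sum(m.get(w, 0.0) for m in models) / len(models) for w in vocab}


def interpolate(model_a, model_b, weight_a):
    """weight_a * model_a + (1 - weight_a) * model_b."""
    vocab = set(model_a) | set(model_b)
    return {
        w: weight_a * model_a.get(w, 0.0) + (1.0 - weight_a) * model_b.get(w, 0.0)
        for w in vocab
    }


def fixint(current_query_model, past_query_models, past_click_models, alpha, beta):
    """p(w|theta_k) = alpha p(w|Q_k) + (1-alpha) [beta p(w|H_C) + (1-beta) p(w|H_Q)]."""
    if not past_query_models or not past_click_models:
        return dict(current_query_model)  # assumed fallback: no history yet
    h_q = average_models(past_query_models)
    h_c = average_models(past_click_models)
    history = interpolate(h_c, h_q, beta)
    return interpolate(current_query_model, history, alpha)


theta_k = fixint(
    {"apple": 0.5, "software": 0.5},
    past_query_models=[{"apple": 1.0}],
    past_click_models=[{"mac": 0.5, "os": 0.5}],
    alpha=0.1,
    beta=1.0,
)
```

With β = 1.0 the query history is ignored and only the clickthrough history contributes, so in this example all history mass goes to "mac" and "os".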
15
Bayesian Interpolation (BayesInt)
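Inputs: current query $Q_k$; past queries $Q_1, \dots, Q_{k-1}$; past clickthrough $C_1, \dots, C_{k-1}$
Average user query and clickthrough history
$$p(w \mid H_Q) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(w \mid Q_i)$$
$$p(w \mid H_C) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(w \mid C_i)$$
Intuition: if the current query Qk is longer, we should trust Qk more
Dirichlet Prior
$$p(w \mid \theta_k) = \frac{c(w, Q_k) + \mu\, p(w \mid H_Q) + \nu\, p(w \mid H_C)}{|Q_k| + \mu + \nu}$$
$$= \frac{|Q_k|}{|Q_k| + \mu + \nu}\, p(w \mid Q_k) + \frac{\mu + \nu}{|Q_k| + \mu + \nu} \left[ \frac{\mu}{\mu + \nu}\, p(w \mid H_Q) + \frac{\nu}{\mu + \nu}\, p(w \mid H_C) \right]$$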
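The BayesInt estimate treats the two history models as a Dirichlet prior on the current query, so the weight of the query grows with its length. A minimal sketch, assuming tokenized queries and history models given as term->probability dicts; the function names are hypothetical.

```python
from collections import Counter


def bayesint(query_tokens, hq_model, hc_model, mu, nu):
    """p(w|theta_k) = (c(w,Q_k) + mu p(w|H_Q) + nu p(w|H_C)) / (|Q_k| + mu + nu)."""
    counts = Counter(query_tokens)
    denom = len(query_tokens) + mu + nu
    vocab = set(counts) | set(hq_model) | set(hc_model)
    return {
        w: (counts[w] + mu * hq_model.get(w, 0.0) + nu * hc_model.get(w, 0.0)) / denom
        for w in vocab
    }


def query_weight(query_length, mu, nu):
    """Coefficient of p(w|Q_k) in the interpolated form: |Q_k| / (|Q_k| + mu + nu)."""
    return query_length / (query_length + mu + nu)


theta_k = bayesint(
    ["apple", "software"],
    hq_model={"apple": 1.0},
    hc_model={"mac": 0.5, "os": 0.5},
    mu=0.2,
    nu=5.0,
)
```

`query_weight` makes the slide's intuition concrete: for fixed μ and ν, a four-word query receives a larger coefficient than a two-word query.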
16
Online Bayesian Update (OnlineUp)
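Intuition: continuous belief update about user information need
Update chain: $Q_1 \to \theta_1$, $C_1 \to \theta'_1$, $Q_2 \to \theta_2$, $C_2 \to \theta'_2$, …, $\theta'_{k-1}$, $Q_k \to \theta_k$
$$p(w \mid \theta_i) = \frac{c(w, Q_i) + \mu\, p(w \mid \theta'_{i-1})}{|Q_i| + \mu}$$
$$p(w \mid \theta'_i) = \frac{c(w, C_i) + \nu\, p(w \mid \theta_i)}{|C_i| + \nu}$$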
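The OnlineUp chain alternates two Dirichlet-prior updates: each query updates the model with strength μ, and each clickthrough set updates it with strength ν. A minimal sketch, assuming tokenized, non-empty queries, μ and ν greater than zero, and a maximum-likelihood estimate for the very first query (where no prior exists yet); the function names are hypothetical.

```python
from collections import Counter


def dirichlet_update(tokens, prior_model, strength):
    """Posterior mean (c(w) + strength * p(w|prior)) / (|tokens| + strength)."""
    counts = Counter(tokens)
    if not prior_model:
        strength = 0.0  # assumed: no prior yet, fall back to maximum likelihood
    denom = len(tokens) + strength
    vocab = set(counts) | set(prior_model)
    return {w: (counts[w] + strength * prior_model.get(w, 0.0)) / denom for w in vocab}


def online_update(queries, clicks, mu, nu):
    """Return theta_k for queries Q_1..Q_k and clickthrough C_1..C_{k-1}.

    queries: list of token lists; clicks: list of token lists (pooled summary
    text clicked after each earlier query).
    """
    model = {}
    for i, query in enumerate(queries):
        model = dirichlet_update(query, model, mu)          # theta_i
        if i < len(clicks):
            model = dirichlet_update(clicks[i], model, nu)  # theta'_i
    return model


theta = online_update(
    queries=[["apple"], ["apple", "software"]],
    clicks=[["mac", "os"]],
    mu=1.0,
    nu=2.0,
)
```

Because every later update shrinks the earlier evidence, the contribution of old clickthrough decays as the session goes on.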
17
Batch Bayesian Update (BatchUp)
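Intuition: clickthrough data may not decay
Update chain: $Q_1 \to \theta_1$, $Q_2 \to \theta_2$, …, $Q_k \to \theta_k$; then $C_1, C_2, \dots, C_{k-1}$ together $\to \theta'_k$
$$p(w \mid \theta_i) = \frac{c(w, Q_i) + \mu\, p(w \mid \theta_{i-1})}{|Q_i| + \mu}$$
$$p(w \mid \theta'_k) = \frac{\sum_{j=1}^{k-1} c(w, C_j) + \nu\, p(w \mid \theta_k)}{\sum_{j=1}^{k-1} |C_j| + \nu}$$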
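BatchUp chains the query updates first and then applies all clickthrough data in a single update, so earlier clicks are not discounted by later queries. A minimal sketch, under the same assumptions as a per-step Dirichlet update: tokenized non-empty queries, μ and ν greater than zero, and a maximum-likelihood estimate for the first query. The function names are hypothetical.

```python
from collections import Counter


def dirichlet_update(tokens, prior_model, strength):
    """Posterior mean (c(w) + strength * p(w|prior)) / (|tokens| + strength)."""
    counts = Counter(tokens)
    if not prior_model:
        strength = 0.0  # assumed: no prior yet, fall back to maximum likelihood
    denom = len(tokens) + strength
    vocab = set(counts) | set(prior_model)
    return {w: (counts[w] + strength * prior_model.get(w, 0.0)) / denom for w in vocab}


def batch_update(queries, clicks, mu, nu):
    """Return theta'_k: query chain Q_1..Q_k, then one update with pooled clicks."""
    model = {}
    for query in queries:
        model = dirichlet_update(query, model, mu)      # theta_i
    pooled_clicks = [token for click in clicks for token in click]
    return dirichlet_update(pooled_clicks, model, nu)   # theta'_k


theta = batch_update(
    queries=[["apple"], ["apple", "software"]],
    clicks=[["mac", "os"]],
    mu=1.0,
    nu=2.0,
)
```

On the same toy session, the clicked terms "mac" and "os" keep a weight of 0.25 each here, compared with roughly 0.083 under the online chain, which illustrates the "may not decay" intuition.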
18
Outline
• Motivation
• Progress
– Framework
– Model
– System
– Evaluation
• Road ahead
– Continuous work
– New direction
19
UCAIR Toolbar Architecture (http://sifaka.cs.uiuc.edu/ir/ucair/download.html)
User ↔ UCAIR ↔ Search Engine (e.g., Google)
UCAIR components:
– Search History Log (e.g., past queries, clicked results)
– Query Modification
– Result Re-Ranking
– User Modeling
– Result Buffer
Data flowing between user, UCAIR, and search engine: query, results, clickthrough…
20
System Characteristics
• Client side personalization
– Privacy
– Distribution of computation
– More clues about the user
• Implicit user modeling
• Bayesian decision theory and statistical language model
21
User Actions
• Submit a keyword query
• View a document
• Click the “Back” button
• Click the “Next” link
22
System Responses
• Decide relatedness of neighboring queries and do query expansion
• Update user model according to clickthrough
• Rerank unseen documents
23
Outline
• Motivation
• Progress
– Framework
– Model
– System
– Evaluation
• Road ahead
– Continuous work
– New direction
24
TREC Style Evaluation – Data Set
• Data collection: TREC AP88-90
• Topics: 30 hard topics of TREC topics 1-150
• System: search engine + RDBMS
• Context: Query and clickthrough history of 3 participants (http://sifaka.cs.uiuc.edu/ir/ucair/QCHistory.zip)
25
Experiment Design
• Models: FixInt, BayesInt, OnlineUp and BatchUp
• Performance Comparison: Qk vs. Qk+HQ+HC
• Evaluation Metrics: MAP and Pr@20 docs
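For reference, the two evaluation metrics can be computed as below. This is a generic sketch of the standard definitions (non-interpolated average precision and precision at a cutoff), not the evaluation script used in the experiments; the function names and the toy data are illustrative.

```python
def precision_at_k(ranked_ids, relevant_ids, k=20):
    """Fraction of the top k ranked documents that are relevant."""
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids) / k


def average_precision(ranked_ids, relevant_ids):
    """Mean of precision values at the rank of each retrieved relevant document,
    divided by the total number of relevant documents."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)


def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3"}
ap = average_precision(ranked, relevant)    # (1/1 + 2/3) / 2
p_at_2 = precision_at_k(ranked, relevant, k=2)
```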
26
Overall Effect of Search Context
Query | FixInt (α=0.1, β=1.0) | BayesInt (μ=0.2, ν=5.0) | OnlineUp (μ=5.0, ν=15.0) | BatchUp (μ=2.0, ν=15.0)
 | MAP / pr@20 | MAP / pr@20 | MAP / pr@20 | MAP / pr@20
Q3 | 0.0421 / 0.1483 | 0.0421 / 0.1483 | 0.0421 / 0.1483 | 0.0421 / 0.1483
Q3+HQ+HC | 0.0726 / 0.1967 | 0.0816 / 0.2067 | 0.0706 / 0.1783 | 0.0810 / 0.2067
Improve | 72.4% / 32.6% | 93.8% / 39.4% | 67.7% / 20.2% | 92.4% / 39.4%
Q4 | 0.0536 / 0.1933 | 0.0536 / 0.1933 | 0.0536 / 0.1933 | 0.0536 / 0.1933
Q4+HQ+HC | 0.0891 / 0.2233 | 0.0955 / 0.2317 | 0.0792 / 0.2067 | 0.0950 / 0.2250
Improve | 66.2% / 15.5% | 78.2% / 19.9% | 47.8% / 6.9% | 77.2% / 16.4%
• Interaction history helps system improve retrieval accuracy
• BayesInt better than FixInt; BatchUp better than OnlineUp
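The "Improve" rows are relative improvements over the query-only baseline. A small sketch of that arithmetic, checked against two Q3 MAP entries from the table; the function name is an illustrative choice.

```python
def relative_improvement(baseline, with_context):
    """Relative improvement over the baseline, in percent."""
    return (with_context - baseline) / baseline * 100.0


# Q3 MAP: baseline 0.0421; FixInt reaches 0.0726 and BayesInt 0.0816.
fixint_gain = round(relative_improvement(0.0421, 0.0726), 1)
bayesint_gain = round(relative_improvement(0.0421, 0.0816), 1)
```

Because the baselines are small in absolute terms, modest absolute MAP gains translate into large percentages, which is worth keeping in mind when reading the table.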
27
Using Clickthrough Data Only
BayesInt (μ=0.0, ν=5.0)

Query | MAP | pr@20
Q3 | 0.0421 | 0.1483
Q3+HC | 0.0766 | 0.2033
Improve | 81.9% | 37.1%
Q4 | 0.0536 | 0.1930
Q4+HC | 0.0925 | 0.2283
Improve | 72.6% | 18.1%

Query | MAP | pr@20
Q3 | 0.0421 | 0.1483
Q3+HC | 0.0521 | 0.1820
Improve | 23.8% | 23.0%
Q4 | 0.0536 | 0.1930
Q4+HC | 0.0620 | 0.1850
Improve | 15.7% | -4.1%

Query | MAP | pr@20
Q3 | 0.0331 | 0.125
Q3+HC | 0.0661 | 0.178
Improve | 99.7% | 42.4%
Q4 | 0.0442 | 0.165
Q4+HC | 0.0739 | 0.188
Improve | 67.2% | 13.9%

Clickthrough data can improve retrieval accuracy of unseen relevant docs
Clickthrough data corresponding to non-relevant docs are useful for feedback
28
Sensitivity of BatchUp Parameters
[Figure: Sensitivity of mu in BatchUp Model. x-axis: mu (0 to 10); y-axis: MAP (0 to 0.1); series: Q2+Hq+Hc, Q3+Hq+Hc, Q4+Hq+Hc]
[Figure: Sensitivity of nu in BatchUp Model. x-axis: nu (0 to 500); y-axis: MAP (0 to 0.1); series: Q2+Hq+Hc, Q3+Hq+Hc, Q4+Hq+Hc]
• BatchUp is stable with different parameter settings
• Best performance is achieved when μ=2.0, ν=15.0
29
A User Study of Personalized Search
• Six participants use UCAIR toolbar to do web search
• Topics are selected from TREC web track and terabyte track
• Participants explicitly evaluate the relevance of top 30 search results from Google and UCAIR
30
Precision at Top N Documents
Ranking Method | prec@5 | prec@10 | prec@20 | prec@30
Google | 0.538 | 0.472 | 0.377 | 0.308
UCAIR | 0.581 | 0.556 | 0.453 | 0.375
Improvement | 8.0% | 17.8% | 20.2% | 21.8%
More user interaction, better user model and retrieval accuracy
31
Precision-Recall Curve
32
Outline
• Motivation
• Progress
– Framework
– Model
– System
– Evaluation
• Road ahead
– Continuous work
– New direction
33
Decision Theoretic Framework
• User model
– Include more factors (e.g., readability)
– Represent information need in a multi-theme way
– Learn user model from data accurately
– Compute user model efficiently
• Loss function goes beyond relevance
• Short-term context synergizes with long-term context
34
Retrieval Models
• Bridge existing retrieval models and decision theoretic framework (same for active feedback work)
• Deduce new retrieval models from decision theoretic framework
• Find effective and efficient retrieval models
35
Retrieval Models (cont.)
• Study specific parameter settings for personalized web search (e.g., ranking of snippets)
• Utilize context information at a finer granularity (e.g., query relationship and relative judgment of clickthrough data)
36
System
• Make system more robust and more efficient
• Enrich user profile (bookmark, local files, etc.)
• Study user interface design
– How many results are personalized
– Aggressive vs. conservative personalization
– Result representation
– …
• Study session boundary detection algorithms
37
System (cont.)
• Add new features to the UCAIR toolbar
– Incorporate clustering into the system
– Predict user preference based on non-textual features (e.g., website, document format)
• Analyze logs
– Simple statistics
– Query similarity in a community
• Distribute the toolbar
38
Evaluation
• Build an evaluation data set for contextual search (utilize TREC interactive track)
• Conduct a large-scale user study of contextual search
• Study privacy issues of the UCAIR toolbar
• Study how to share user logs
• Study when personalization is more effective than non-personalization, and vice versa
39
Outline
• Motivation
• Progress
– Framework
– Model
– System
– Evaluation
• Road ahead
– Continuous work
– New direction
40
Application
• Apply techniques in different domains
– Personalized tutoring system
– Personalized bioinfo system
• Collaborative filtering application
– Goodies for connecting people
– Social network?
• Combination of client and server for personalization
41
"Personalization is a dead end": Raul Valdes-Perez, CEO of Vivisimo, Nov. 2004
• People are not static
• Surfing data is weak
• Whole web page is misleading
• Home computers are shared by family members
• Query is short
Best personalization is done by individuals themselves
Vivisimo's way: clustering, then users explore the results themselves
42
"Personalization is the Holy Grail for search": Jerry Yang, co-founder of Yahoo!, March 2005
• One size does not fit all
CNN report: [Yang] also said that the key challenge for Yahoo! and all search companies going forward will be to find ways to increase the personalization of results, i.e. making sure that a user truly finds what he or she is looking for when typing in a keyword search.
"The relevance of search is still the Holy Grail for any search application," Yang said.
43
Thank you!
The End