UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai http://sifaka.cs.uiuc.edu/ir/ucair/


Post on 03-Jan-2016


Page 1: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai

UCAIR Project

Xuehua Shen, Bin Tan, ChengXiang Zhai

http://sifaka.cs.uiuc.edu/ir/ucair/

Page 2: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai


Outline

• Motivation

• Progress

– Framework

– Model

– System

– Evaluation

• Road ahead

– Continuous work

– New direction

Page 3: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai

Problem of Context-Independent Search

Query: Jaguar

• Car

• Apple Software

• Animal

• Chemistry Software

Page 4: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai

Put Search in Context

• Query History

• Clickthrough

• Other Context Info: Dwelling time, Mouse movement

• Hobby…

Apple software

Page 6: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai

A Decision Theoretic Framework

• Model interactive IR as “action dialog”: cycles of user action and system response

User action            System response
Submit a new query     Retrieve new documents
View a document        Rerank documents

Page 7: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai

A Decision Theoretic Framework (cont.)

• Search for the optimal system response given a new user action

$$r_t^* = \arg\min_{r \in R(a_t)} \int_M L(a_t, r, m)\, P(m \mid U, D, A_t, R_{t-1})\, dm$$

Approximation using the most probable user model:

$$m_t^* = \arg\max_{m} P(m \mid U, D, A_t, R_{t-1})$$

$$r_t^* \approx \arg\min_{r \in R(a_t)} L(a_t, r, m_t^*)$$
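The two-step approximation on this slide (take the most probable user model, then pick the response with the smallest loss under it) can be sketched as follows. This is a minimal sketch, assuming a finite set of candidate user models and responses; `choose_response`, the `posterior` dictionary, and the toy loss values are hypothetical and not taken from UCAIR.

```python
from typing import Callable, Dict, Hashable, Iterable


def choose_response(
    action: Hashable,
    candidate_responses: Iterable[Hashable],
    posterior: Dict[Hashable, float],
    loss: Callable[[Hashable, Hashable, Hashable], float],
) -> Hashable:
    """Pick the response with minimal loss under the most probable user model."""
    # Step 1: replace the integral over user models by the posterior mode m*.
    m_star = max(posterior, key=posterior.get)
    # Step 2: choose the response r that minimizes L(a, r, m*).
    return min(candidate_responses, key=lambda r: loss(action, r, m_star))


# Toy example (all values hypothetical): two user models, two candidate rankings.
posterior = {"wants_software": 0.7, "wants_animal": 0.3}
toy_loss = {
    ("software_first", "wants_software"): 0.1,
    ("software_first", "wants_animal"): 0.9,
    ("animal_first", "wants_software"): 0.8,
    ("animal_first", "wants_animal"): 0.2,
}
best = choose_response(
    "submit query 'jaguar'",
    ["software_first", "animal_first"],
    posterior,
    lambda a, r, m: toy_loss[(r, m)],
)
print(best)
```

Using the posterior mode avoids integrating over all user models, at the cost of ignoring uncertainty about the user.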

Page 8: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai

User Models

• Components of user model M

– User information need $\vec{x}$

– User viewed documents S

– User actions $A_t$ and system responses $R_{t-1}$

– …

$$M = (S, \vec{x}, A_t, R_{t-1})$$

Page 9: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai

Loss Functions

• Loss function for result reranking

$$L(a_t, r, m) = -\sum_{i=1}^{k} P(\text{view} \mid d_i)\, P(\text{relevant} \mid d_i, m)$$

• Loss function for query expansion

$$r_t^* = \arg\min_{r} L(a_t, r, m) = \arg\min_{f(q)} L(a_t, f(q), m) = f\bigl(\arg\min_{q} L(a_t, f(q), m)\bigr)$$
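To make the reranking loss concrete, here is a small sketch. It assumes the loss is the negative sum of P(view | d_i) times P(relevant | d_i, m), and it additionally assumes that the viewing probability depends only on rank position; both the sign convention and the numbers are assumptions for illustration.

```python
from itertools import permutations


def rerank_loss(ranking, p_view_at_rank, p_relevant):
    """Return -sum_i P(view | d_i) * P(relevant | d_i, m) for one candidate ranking."""
    return -sum(
        p_view_at_rank[position] * p_relevant[doc]
        for position, doc in enumerate(ranking)
    )


# Hypothetical numbers: the chance of being viewed drops with rank position,
# and p_relevant stands in for P(relevant | d_i, m) under the current user model.
p_view_at_rank = [0.9, 0.5, 0.2]
p_relevant = {"d1": 0.2, "d2": 0.7, "d3": 0.4}

# Brute-force search over all rankings (only feasible for tiny examples).
best = min(
    permutations(p_relevant),
    key=lambda r: rerank_loss(r, p_view_at_rank, p_relevant),
)

# Sorting by estimated relevance yields the same ranking.
by_relevance = tuple(sorted(p_relevant, key=p_relevant.get, reverse=True))
print(best, by_relevance)
```

Under these assumptions, minimizing the loss reduces to sorting documents by their estimated probability of relevance, so no search over rankings is needed in practice.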

Page 10: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai

Implicit User Modeling

$$\vec{x} = \alpha\, \vec{q} + (1 - \alpha)\, \frac{1}{k} \sum_{i=1}^{k} \vec{s}_i$$

• Update user information need given a new query

• Learn a better user model when the user skips the top n documents and views the (n+1)-th document
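The information-need update on this slide is a weighted mix of the query vector and the centroid of k other vectors. A minimal sketch follows, assuming the s_i are sparse term vectors of result summaries the user has viewed and that the mixing weight is called `alpha`; the function name and the toy vectors are hypothetical.

```python
from collections import Counter


def update_information_need(query_vec, viewed_summaries, alpha=0.5):
    """Compute x = alpha * q + (1 - alpha) * mean(s_1..s_k) on sparse term vectors."""
    if not viewed_summaries:
        # Nothing viewed yet: the information need is just the query.
        return dict(query_vec)
    k = len(viewed_summaries)
    centroid = Counter()
    for summary in viewed_summaries:
        for term, weight in summary.items():
            centroid[term] += weight / k
    terms = set(query_vec) | set(centroid)
    return {
        t: alpha * query_vec.get(t, 0.0) + (1 - alpha) * centroid.get(t, 0.0)
        for t in terms
    }


# Toy example: the query is ambiguous, the viewed summaries are about Mac OS.
q = {"jaguar": 1.0}
summaries = [{"jaguar": 1.0, "mac": 1.0}, {"mac": 1.0, "os": 1.0}]
x = update_information_need(q, summaries, alpha=0.5)
print(sorted(x.items()))
```

A larger `alpha` keeps the model closer to the literal query; a smaller one lets viewed content dominate.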

Page 12: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai

Four Contextual Language Models

User Query                     User Clickthrough
Q1                             C1 = {C1,1 , C1,2 , C1,3 , …}
Q2                             C2 = {C2,1 , C2,2 , C2,3 , …}
…                              …
Qk (e.g., Apple software)

Example clickthrough entry: Apple - Mac OS X The Apple Mac OS X product page. Describes features in the current version of Mac OS X, a screenshot gallery, latest software downloads, and a directory of ...

User Information Need: ?

How to model and use all the information?

Page 13: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai

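Retrieval Model

Basis: Unigram language model + KL divergence

• Query Qk → query model θQk (θk for short); document D → document model θD

• Similarity measure: $D(\theta_{Q_k} \,\|\, \theta_D)$ → Results

• $p(w \mid \theta_k) = p(w \mid Q_k)$

Contextual search: query model update using user query and clickthrough history

• Query History and Clickthrough → updated query model θ'Qk

• $p(w \mid \theta_k) = p(w \mid Q_k, Q_{k-1}, \ldots, Q_1, C_{k-1}, \ldots, C_1)$

• Similarity measure: $D(\theta'_{Q_k} \,\|\, \theta_D)$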
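As an illustration of the basis model, the sketch below ranks documents by the KL divergence between a unigram query model and a smoothed unigram document model (smaller is better). The Jelinek-Mercer smoothing with weight `lam`, the helper names, and the toy documents are assumptions; the slides do not say which smoothing method is used.

```python
import math
from collections import Counter


def unigram(tokens):
    """Maximum-likelihood unigram model: word -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def kl_divergence(query_model, doc_tokens, collection_model, lam=0.5):
    """D(theta_Q || theta_D) with the document model smoothed by the collection."""
    doc_model = unigram(doc_tokens)
    kl = 0.0
    for w, p_q in query_model.items():
        # Smoothing keeps p(w | theta_D) non-zero for words missing from the document.
        p_d = (1 - lam) * doc_model.get(w, 0.0) + lam * collection_model.get(w, 1e-9)
        kl += p_q * math.log(p_q / p_d)
    return kl


docs = {
    "d_mac": "apple mac os software download".split(),
    "d_cat": "jaguar animal habitat rainforest cat".split(),
}
collection_model = unigram([w for tokens in docs.values() for w in tokens])
query_model = unigram("apple software".split())

# Smaller divergence means the document model is closer to the query model.
ranked = sorted(docs, key=lambda d: kl_divergence(query_model, docs[d], collection_model))
print(ranked)
```

Contextual search keeps this ranking function unchanged and only replaces `query_model` with a model that also reflects query and clickthrough history.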

Page 14: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai

Fixed Coefficient Interpolation (FixInt)

Inputs: current query Qk; past queries Q1, …, Qk-1; past clickthrough C1, …, Ck-1

Average user query history and clickthrough

$$p(w \mid H_Q) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(w \mid Q_i)$$

$$p(w \mid H_C) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(w \mid C_i)$$

Linearly interpolate history models

$$p(w \mid H) = \beta\, p(w \mid H_C) + (1 - \beta)\, p(w \mid H_Q)$$

Linearly interpolate current query and history model

$$p(w \mid \theta_k) = \alpha\, p(w \mid Q_k) + (1 - \alpha)\, p(w \mid H)$$
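The three FixInt steps (average the history, interpolate the two history models, interpolate with the current query) can be sketched as below. Unigram models are plain dictionaries; the coefficient names `alpha` and `beta`, the function names, and the toy models are assumptions for illustration.

```python
def average_models(models):
    """Uniformly average a list of unigram models (dicts: word -> probability)."""
    vocab = set().union(*models)
    return {w: sum(m.get(w, 0.0) for m in models) / len(models) for w in vocab}


def interpolate(first, second, weight):
    """Return weight * first + (1 - weight) * second, word by word."""
    vocab = set(first) | set(second)
    return {
        w: weight * first.get(w, 0.0) + (1 - weight) * second.get(w, 0.0)
        for w in vocab
    }


def fixint(current_query, past_queries, past_clicks, alpha, beta):
    """Fixed-coefficient interpolation of the current query with its history."""
    h_q = average_models(past_queries)                  # p(w | H_Q)
    h_c = average_models(past_clicks)                   # p(w | H_C)
    history = interpolate(h_c, h_q, beta)               # p(w | H)
    return interpolate(current_query, history, alpha)   # p(w | theta_k)


# Toy example with one past query and one past clickthrough model.
theta = fixint(
    current_query={"jaguar": 1.0},
    past_queries=[{"apple": 0.5, "software": 0.5}],
    past_clicks=[{"mac": 0.5, "os": 0.5}],
    alpha=0.5,
    beta=0.5,
)
print(sorted(theta.items()))
```

Because both coefficients are fixed, a one-word query and a ten-word query are trusted equally, which is the limitation the Bayesian variant addresses.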

Page 15: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai

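Bayesian Interpolation (BayesInt)

Inputs: current query Qk; past queries Q1, …, Qk-1; past clickthrough C1, …, Ck-1

Average user query and clickthrough history

$$p(w \mid H_Q) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(w \mid Q_i)$$

$$p(w \mid H_C) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(w \mid C_i)$$

Intuition: if the current query Qk is longer, we should trust Qk more

Dirichlet Prior

$$p(w \mid \theta_k) = \frac{c(w, Q_k) + \mu\, p(w \mid H_Q) + \nu\, p(w \mid H_C)}{|Q_k| + \mu + \nu}$$

$$= \frac{|Q_k|}{|Q_k| + \mu + \nu}\, p(w \mid Q_k) + \frac{\mu + \nu}{|Q_k| + \mu + \nu} \left[ \frac{\mu}{\mu + \nu}\, p(w \mid H_Q) + \frac{\nu}{\mu + \nu}\, p(w \mid H_C) \right]$$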
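The intuition that a longer current query deserves more trust can be checked numerically. The sketch below treats the history models as a Dirichlet prior with strengths `mu` (query history) and `nu` (clickthrough history); these parameter names follow the mu and nu used on the sensitivity slide, while the function names and toy data are hypothetical.

```python
from collections import Counter


def bayesint(query_tokens, h_q, h_c, mu, nu):
    """p(w | theta_k) = (c(w, Q_k) + mu * p(w | H_Q) + nu * p(w | H_C)) / (|Q_k| + mu + nu)."""
    counts = Counter(query_tokens)
    denom = len(query_tokens) + mu + nu
    vocab = set(counts) | set(h_q) | set(h_c)
    return {
        w: (counts.get(w, 0) + mu * h_q.get(w, 0.0) + nu * h_c.get(w, 0.0)) / denom
        for w in vocab
    }


def query_weight(query_length, mu, nu):
    """Share of the final model contributed by the current query."""
    return query_length / (query_length + mu + nu)


h_q = {"apple": 0.5, "software": 0.5}   # toy p(w | H_Q)
h_c = {"mac": 0.5, "os": 0.5}           # toy p(w | H_C)

theta_short = bayesint(["jaguar"], h_q, h_c, mu=0.2, nu=5.0)
theta_long = bayesint(["jaguar", "mac", "os", "x"], h_q, h_c, mu=0.2, nu=5.0)
print(query_weight(1, 0.2, 5.0), query_weight(4, 0.2, 5.0))
```

The interpolation weight is no longer a constant: it grows with |Q_k|, so the history matters most when the current query is short.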

Page 16: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai

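Online Bayesian Update (OnlineUp)

Intuition: continuous belief update about user information need

Update chain: Q1 → θ1, C1 → θ'1, Q2 → θ2, C2 → θ'2, …, θ'k-1, Qk → θk

$$p(w \mid \theta_i) = \frac{c(w, Q_i) + \mu\, p(w \mid \theta'_{i-1})}{|Q_i| + \mu}$$

$$p(w \mid \theta'_i) = \frac{c(w, C_i) + \nu\, p(w \mid \theta_i)}{|C_i| + \nu}$$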
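The online update alternates between two Dirichlet-prior updates: one when a query arrives and one when its clickthrough arrives. A minimal sketch follows, assuming token lists for queries and clicked summaries, prior strengths named `mu` and `nu`, and a maximum-likelihood start for the first query (the slides do not state how the chain is initialized).

```python
from collections import Counter


def dirichlet_update(tokens, prior, strength):
    """Posterior mean (c(w, tokens) + strength * prior(w)) / (|tokens| + strength)."""
    counts = Counter(tokens)
    denom = len(tokens) + strength
    vocab = set(counts) | set(prior)
    return {w: (counts.get(w, 0) + strength * prior.get(w, 0.0)) / denom for w in vocab}


def online_update(queries, clicks, mu, nu):
    """Return the model after the last query, given Q_1..Q_k and C_1..C_{k-1}."""
    after_clicks = {}   # model after the latest clickthrough (empty before Q_1)
    after_query = {}
    for i, query in enumerate(queries):
        # Assumption: with no prior yet, the first query is used as-is (strength 0).
        strength = mu if after_clicks else 0.0
        after_query = dirichlet_update(query, after_clicks, strength)
        if i < len(clicks):
            after_clicks = dirichlet_update(clicks[i], after_query, nu)
    return after_query


theta = online_update(
    queries=[["jaguar"], ["jaguar", "mac"]],
    clicks=[["apple", "mac", "os"]],
    mu=5.0,
    nu=15.0,
)
print(sorted(theta.items()))
```

Because each step uses the previous model as its prior, evidence from early queries and clicks is diluted a little at every later update.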

Page 17: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai

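Batch Bayesian Update (BatchUp)

Intuition: clickthrough data may not decay

Update chain for queries: Q1 → θ1, Q2 → θ2, …, Qk → θk

$$p(w \mid \theta_i) = \frac{c(w, Q_i) + \mu\, p(w \mid \theta_{i-1})}{|Q_i| + \mu}$$

Batch update with all clickthrough C1, C2, …, Ck-1: θk → θ'k

$$p(w \mid \theta'_k) = \frac{\sum_{j=1}^{k-1} c(w, C_j) + \nu\, p(w \mid \theta_k)}{\sum_{j=1}^{k-1} |C_j| + \nu}$$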
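For comparison with the online variant, the sketch below updates the model query by query but folds in all clickthrough data in one batch at the end, so early clicks are not discounted relative to later ones. The parameter names `mu` and `nu`, the maximum-likelihood start for the first query, and the toy data are assumptions.

```python
from collections import Counter


def dirichlet_update(tokens, prior, strength):
    """Posterior mean (c(w, tokens) + strength * prior(w)) / (|tokens| + strength)."""
    counts = Counter(tokens)
    denom = len(tokens) + strength
    vocab = set(counts) | set(prior)
    return {w: (counts.get(w, 0) + strength * prior.get(w, 0.0)) / denom for w in vocab}


def batch_update(queries, clicks, mu, nu):
    """Chain the queries first, then apply all pooled clickthrough in one update."""
    query_model = {}
    for query in queries:
        # Assumption: the first query has no prior and is used as-is (strength 0).
        strength = mu if query_model else 0.0
        query_model = dirichlet_update(query, query_model, strength)
    pooled_clicks = [w for click in clicks for w in click]
    if not pooled_clicks:
        return query_model
    return dirichlet_update(pooled_clicks, query_model, nu)


theta = batch_update(
    queries=[["jaguar"], ["jaguar", "mac"]],
    clicks=[["apple", "mac", "os"]],
    mu=5.0,
    nu=15.0,
)
print(sorted(theta.items()))
```

Pooling the counts means every clicked summary contributes with the same weight regardless of when it was clicked.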

Page 19: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai

UCAIR Toolbar Architecture (http://sifaka.cs.uiuc.edu/ir/ucair/download.html)

[Figure: architecture diagram with the following labeled elements]

• User

• UCAIR

• Search Engine (e.g., Google)

• Search History Log (e.g., past queries, clicked results)

• Query Modification

• Result Re-Ranking

• User Modeling

• Result Buffer

• Flow labels: query, results, clickthrough…

Page 20: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai


System Characteristics

• Client side personalization

– Privacy

– Distribution of computation

– More clues about the user

• Implicit user modeling

• Bayesian decision theory and statistical language model

Page 21: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai


User Actions

• Submit a keyword query

• View a document

• Click the “Back” button

• Click the “Next” link

Page 22: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai


System Responses

• Decide relatedness of neighboring queries and do query expansion

• Update user model according to clickthrough

• Rerank unseen documents
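The last response, reranking unseen documents, can be sketched as follows: documents the user has already seen are left alone, and the rest of the result buffer is reordered by similarity to the current user model. Cosine similarity over term-frequency vectors, the function names, and the toy buffer are assumptions, not a description of the toolbar's actual code.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rerank_unseen(result_buffer, seen_ids, user_model):
    """Return ids of unseen results, most similar to the user model first."""
    unseen = [(doc_id, tokens) for doc_id, tokens in result_buffer if doc_id not in seen_ids]
    unseen.sort(key=lambda item: cosine(user_model, Counter(item[1])), reverse=True)
    return [doc_id for doc_id, _ in unseen]


# Toy result buffer of (id, summary tokens) in the search engine's original order.
result_buffer = [
    ("d1", ["jaguar", "car", "dealer"]),
    ("d2", ["jaguar", "cat", "animal"]),
    ("d3", ["jaguar", "mac", "os", "apple"]),
]
user_model = {"jaguar": 1.0, "apple": 1.0, "software": 1.0}
order = rerank_unseen(result_buffer, seen_ids={"d1"}, user_model=user_model)
print(order)
```

Restricting the reranking to unseen results keeps the part of the list the user has already looked at stable.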

Page 24: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai


TREC Style Evaluation – Data Set

• Data collection: TREC AP88-90

• Topics: 30 hard topics selected from TREC topics 1-150

• System: search engine + RDBMS

• Context: Query and clickthrough history of 3 participants (http://sifaka.cs.uiuc.edu/ir/ucair/QCHistory.zip)

Page 25: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai


Experiment Design

• Models: FixInt, BayesInt, OnlineUp and BatchUp

• Performance Comparison: Qk vs. Qk+HQ+HC

• Evaluation Metrics: mean average precision (MAP) and precision at the top 20 documents (pr@20)
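For readers unfamiliar with the two metrics, here is a short sketch of their standard definitions. The function names and the toy rankings are illustrative; the experiments themselves presumably used standard TREC tooling, which is not shown in the slides.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Average of average_precision over (ranking, relevant set) pairs, one per topic."""
    return sum(average_precision(ranked, relevant) for ranked, relevant in runs) / len(runs)


# Two toy topics.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),
    (["x", "y"], {"y"}),
]
print(precision_at_k(runs[0][0], runs[0][1], 2))
print(mean_average_precision(runs))
```

MAP rewards placing relevant documents early across the whole ranking, while pr@20 only looks at the first 20 results, which is closer to what a user actually reads.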

Page 26: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai

Overall Effect of Search Context

Query       FixInt            BayesInt          OnlineUp          BatchUp
            (α=0.1, β=1.0)    (μ=0.2, ν=5.0)    (μ=5.0, ν=15.0)   (μ=2.0, ν=15.0)
            MAP     pr@20     MAP     pr@20     MAP     pr@20     MAP     pr@20
Q3          0.0421  0.1483    0.0421  0.1483    0.0421  0.1483    0.0421  0.1483
Q3+HQ+HC    0.0726  0.1967    0.0816  0.2067    0.0706  0.1783    0.0810  0.2067
Improve     72.4%   32.6%     93.8%   39.4%     67.7%   20.2%     92.4%   39.4%
Q4          0.0536  0.1933    0.0536  0.1933    0.0536  0.1933    0.0536  0.1933
Q4+HQ+HC    0.0891  0.2233    0.0955  0.2317    0.0792  0.2067    0.0950  0.2250
Improve     66.2%   15.5%     78.2%   19.9%     47.8%   6.9%      77.2%   16.4%

• Interaction history helps system improve retrieval accuracy

• BayesInt better than FixInt; BatchUp better than OnlineUp
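The "Improve" rows appear to be relative gains over the query-only baseline. The short check below reproduces two FixInt entries for Q3 under that assumption; the helper name is hypothetical.

```python
def relative_improvement(baseline, contextual):
    """Percentage gain of the contextual score over the baseline score."""
    return 100.0 * (contextual - baseline) / baseline


# FixInt, Q3: MAP 0.0421 -> 0.0726 and pr@20 0.1483 -> 0.1967.
map_gain = round(relative_improvement(0.0421, 0.0726), 1)
pr20_gain = round(relative_improvement(0.1483, 0.1967), 1)
print(map_gain, pr20_gain)
```

Because the baselines are small in absolute terms, large relative gains correspond to modest absolute differences in MAP.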

Page 27: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai

Using Clickthrough Data Only

Panel 1
Query     MAP      pr@20
Q3        0.0421   0.1483
Q3+HC     0.0766   0.2033
Improve   81.9%    37.1%
Q4        0.0536   0.1930
Q4+HC     0.0925   0.2283
Improve   72.6%    18.1%

Panel 2
Query     MAP      pr@20
Q3        0.0421   0.1483
Q3+HC     0.0521   0.1820
Improve   23.8%    23.0%
Q4        0.0536   0.1930
Q4+HC     0.0620   0.1850
Improve   15.7%    -4.1%

Panel 3
Query     MAP      pr@20
Q3        0.0331   0.125
Q3+HC     0.0661   0.178
Improve   99.7%    42.4%
Q4        0.0442   0.165
Q4+HC     0.0739   0.188
Improve   67.2%    13.9%

BayesInt (μ=0.0, ν=5.0)

Clickthrough data can improve retrieval accuracy of unseen relevant docs

Clickthrough data corresponding to non-relevant docs are useful for feedback

Page 28: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai

Sensitivity of BatchUp Parameters

[Figure: Sensitivity of μ in the BatchUp model. x-axis: μ (0 to 10); y-axis: MAP (0 to 0.1); series: Q2+HQ+HC, Q3+HQ+HC, Q4+HQ+HC]

[Figure: Sensitivity of ν in the BatchUp model. x-axis: ν (0, 1, 2, 5, 10, 15, 30, 100, 300, 500); y-axis: MAP (0 to 0.1); series: Q2+HQ+HC, Q3+HQ+HC, Q4+HQ+HC]

• BatchUp is stable with different parameter settings

• Best performance is achieved when μ=2.0; ν=15.0

Page 29: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai


A User Study of Personalized Search

• Six participants used the UCAIR toolbar for web search

• Topics were selected from the TREC web track and terabyte track

• Participants explicitly evaluated the relevance of the top 30 search results from Google and UCAIR

Page 30: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai

Precision at Top N Documents

Ranking Method   prec@5   prec@10   prec@20   prec@30
Google           0.538    0.472     0.377     0.308
UCAIR            0.581    0.556     0.453     0.375
Improvement      8.0%     17.8%     20.2%     21.8%

More user interaction, better user model and retrieval accuracy

Page 31: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai


Precision-Recall Curve

Page 33: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai


Decision Theoretic Framework

• User model

– Include more factors (e.g., readability)

– Represent information need in a multi-theme way

– Learn user model from data accurately

– Compute user model efficiently

• Extend the loss function beyond relevance

• Synergize short-term context with long-term context

Page 34: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai


Retrieval Models

• Bridge existing retrieval models and decision theoretic framework (same for active feedback work)

• Deduce new retrieval models from decision theoretic framework

• Find effective and efficient retrieval models

Page 35: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai


Retrieval Models (cont.)

• Study specific parameter settings for personalized web search (e.g., ranking of snippets)

• Utilize context information at a finer granularity (e.g., query relationships and relative judgments from clickthrough data)

Page 36: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai


System

• Make system more robust and more efficient

• Enrich user profile (bookmark, local files, etc.)

• Study user interface design

– How many results are personalized

– Aggressive vs. conservative personalization

– Result representation

– …

• Study session boundary detection algorithms

Page 37: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai

System (cont.)

• Add new features to the UCAIR toolbar

– Incorporate clustering into the system

– Predict user preference based on non-textual features (e.g. website, document format)

• Analyze logs

– Simple statistics

– Query similarity in a community

• Distribute the toolbar

Page 38: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai


Evaluation

• Build an evaluation data set for contextual search (utilize TREC interactive track)

• Conduct a large-scale user study of contextual search

• Study privacy issues of the UCAIR toolbar

• Study how to share user logs

• Study when personalization is more effective than non-personalization and vice versa

Page 40: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai


Application

• Apply techniques in different domains

– Personalized tutoring system

– Personalized bioinfo system

• Collaborative filtering application

– Goodies for connecting people

– Social network?

• Combination of client and server for personalization

Page 41: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai

Personalization is a dead end (Raul Valdes-Perez, CEO of Vivisimo, Nov. 2004)

• People are not static

• Surfing data is weak

• Whole web page is misleading

• Home computers are shared by family members

• Query is short

Best personalization is done by individuals themselves

The Vivisimo way: clustering, then users explore by themselves

Page 42: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai

Personalization is the Holy Grail for search (Jerry Yang, co-founder of Yahoo!, March 2005)

• One size does not fit all

CNN report: [Yang] also said that the key challenge for Yahoo! and all search companies going forward will be to find ways to increase the personalization of results, i.e. making sure that a user truly finds what he or she is looking for when typing in a keyword search.

"The relevance of search is still the Holy Grail for any search application," Yang said.

Page 43: UCAIR Project Xuehua Shen, Bin Tan, ChengXiang Zhai

Thank you!

The End