Context-aware Query Suggestion by Mining Click- through and Session Data Authors: H. Cao et.al KDD 08 Presented by Shize Su 1


Post on 15-Jan-2016


Page 1

Context-aware Query Suggestion by Mining Click-through and Session Data

Authors: H. Cao et al., KDD 08

Presented by Shize Su


Page 2

Outline

Introduction

Framework of the Proposed Method

Mining Query Concepts

Concept Sequence Suffix Tree

Experimental Evaluation

Summary


Page 3

Introduction

What is query suggestion in a search engine? The engine guesses the user’s search intent from the user query ($q_i$, or the query sequence $q_1 q_2 \cdots q_i$) and suggests queries ($q_j, q_k, \ldots$) that better describe the user’s information need.

Why is query suggestion important? Is it easy for users to issue an appropriate query? No! Issuing appropriate queries is a “bottleneck issue” of search engine usability (Google, Yahoo, Bing, Baidu, etc.).

Page 4

Introduction

Major existing approaches (using search log data):

Approach I: clustering queries using clicked-URL data to find similar queries (are $q_i$ and $q_j$ similar?);

Approach II: mining pairs of queries that are adjacent or co-occur in the same query session (is the pair $q_i q_j$ frequent?).

Fig. 1: An example of search log data

Page 5

Introduction

Key limitations: None of these approaches is context-aware; they do not consider the immediately preceding queries as context.

The clustering algorithms do not scale well to very large data sets.

An example: “apple” “steve jobs” “apple”

User’s search intent behind $q_i$ in the sequence $q_1 q_2 \cdots q_{i-1} q_i$?

Example scale: 1.8 billion queries (151 million unique), 2.6 billion clicked URLs (114 million unique).

Page 6

Proposed Method Framework

Key steps:
Capture the context as a concept sequence (by clustering queries into concepts).
Quickly find the queries that many users ask in that context (using the concept sequence suffix tree).

Page 7

An example of click-through bipartite data from a search log:

Mining Query Concepts

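For each query $q_i$: an $L_2$-normalized vector $\vec{q_i}$, where $w_{ij}$ is the weight of edge $e_{ij}$ between $q_i$ and URL $u_j$, and $U$ is the set of URLs:

$$\vec{q_i}[j] = \begin{cases} \mathrm{norm}(w_{ij}) & \text{if edge } e_{ij} \text{ exists} \\ 0 & \text{otherwise} \end{cases} \qquad \text{with} \quad \mathrm{norm}(w_{ij}) = \frac{w_{ij}}{\sqrt{\sum_{\forall e_{ik}} w_{ik}^2}}$$

$$\mathrm{distance}(q_i, q_j) = \sqrt{\sum_{u_k \in U} \left(\vec{q_i}[k] - \vec{q_j}[k]\right)^2}$$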
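A minimal sketch of the query representation and distance above, assuming each query's edge weights $w_{ij}$ are click counts stored in a sparse `dict` keyed by URL. The helper names `normalize` and `distance` are hypothetical; missing dimensions are treated as 0, matching the "0 otherwise" case.

```python
import math
from typing import Dict

Vector = Dict[str, float]  # sparse query vector: URL -> weight


def normalize(click_counts: Dict[str, float]) -> Vector:
    """L2-normalize the edge weights w_ij of one query over its clicked URLs."""
    norm = math.sqrt(sum(w * w for w in click_counts.values()))
    return {url: w / norm for url, w in click_counts.items()} if norm > 0 else {}


def distance(qi: Vector, qj: Vector) -> float:
    """Euclidean distance over the union of non-zero dimensions (others are 0)."""
    urls = set(qi) | set(qj)
    return math.sqrt(sum((qi.get(u, 0.0) - qj.get(u, 0.0)) ** 2 for u in urls))
```

Only URLs that appear in at least one of the two vectors contribute, so the sum over all of $U$ reduces to a sum over a handful of keys.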

Page 8

Key challenges in clustering queries:
The click-through bipartite of a search log can be huge: e.g., 151 million unique queries.
The number of clusters is unknown.
The query vectors have extremely high dimensionality: 114 million unique URLs.
Search logs grow dynamically.

Existing query clustering algorithms: hierarchical agglomerative methods, DBSCAN (Wen, WWW’01), K-means, etc.

Mining Query Concepts

Page 9

Proposed clustering method:


Mining Query Concepts

Page 10

For each query $q$:
Step 1: find the cluster $C$ closest to $q$ among the clusters obtained so far;
Step 2: compute the diameter of cluster $C \cup \{q\}$;
Step 3: 1) if the diameter $\le D_{max}$, $q$ is assigned to $C$;
2) otherwise, create a new cluster containing only $q$.

Quite efficient: only one scan of the queries is needed, and the algorithm can run efficiently on a PC with 2 GB of main memory.

Mining Query Concepts

Page 11

Tricks to improve the algorithm's efficiency:
A dimension array data structure is used in Step 1 (exploiting the sparsity of the data).
Prune edges with low weights.

Mining Query Concepts

$$\mathrm{distance}(q_i, q_j) = \sqrt{\sum_{u_k \in U} \left(\vec{q_i}[k] - \vec{q_j}[k]\right)^2}$$
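A small sketch of the two tricks, under stated assumptions. The "dimension array" is read here as an inverted index from each URL dimension to the clusters whose centroids are non-zero on it, so Step 1 only compares $q$ with clusters that share at least one clicked URL with it (treating clusters that share none as too far to matter). The pruning threshold `min_weight` is a hypothetical parameter, since the slide gives no value; `DimensionArray` and `prune_low_weights` are illustrative names.

```python
from collections import defaultdict
from typing import Dict, Set

Vector = Dict[str, float]  # sparse query vector: URL -> normalized weight


def prune_low_weights(q: Vector, min_weight: float) -> Vector:
    """Drop edges whose normalized weight falls below a hypothetical threshold."""
    return {url: w for url, w in q.items() if w >= min_weight}


class DimensionArray:
    """Inverted index: URL dimension -> ids of clusters whose centroid is non-zero there."""

    def __init__(self) -> None:
        self._index: Dict[str, Set[int]] = defaultdict(set)

    def register(self, cluster_id: int, q: Vector) -> None:
        # Call when query vector q joins cluster cluster_id.
        for url in q:
            self._index[url].add(cluster_id)

    def candidates(self, q: Vector) -> Set[int]:
        # Step 1 only compares q with clusters sharing at least one non-zero dimension.
        found: Set[int] = set()
        for url in q:
            found |= self._index.get(url, set())
        return found
```

Because each query clicks only a few URLs, the candidate set is usually tiny compared with the total number of clusters.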

Page 12

Extracting query session data: take each individual user's behavior (query/click) data, segment it into sessions wherever the time interval between consecutive events exceeds 30 minutes, and discard the click events.

Concept Sequence Suffix Tree

Fig: An example of search log data
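A minimal sketch of session extraction for one user, assuming events arrive as `(timestamp_seconds, kind, value)` tuples with `kind` either `"query"` or `"click"` (a hypothetical format). The 30-minute gap comes from the slide; `extract_query_sessions` is an illustrative name.

```python
from typing import List, Optional, Tuple

Event = Tuple[float, str, str]  # (timestamp in seconds, "query" or "click", value)

SESSION_GAP = 30 * 60  # 30 minutes, per the slide


def extract_query_sessions(events: List[Event], gap: float = SESSION_GAP) -> List[List[str]]:
    """Split one user's events into sessions, then keep only the queries."""
    sessions: List[List[Event]] = []
    last_ts: Optional[float] = None
    for ev in sorted(events, key=lambda e: e[0]):
        # Start a new session on the first event or after a gap longer than `gap`.
        if last_ts is None or ev[0] - last_ts > gap:
            sessions.append([])
        sessions[-1].append(ev)
        last_ts = ev[0]
    # Discard click events; drop sessions that end up with no query.
    query_sessions = [[v for _, kind, v in s if kind == "query"] for s in sessions]
    return [s for s in query_sessions if s]
```

Clicks still matter for segmentation (they keep a session alive) even though they are dropped from the output.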

Page 13

Concept sequence suffix tree: a structure used to efficiently find the queries that many users ask in a given context (concept sequence).

Concept Sequence Suffix Tree

Fig: An example

Page 14

Algorithm to build the concept sequence suffix tree:
1) Map each training session $qs = q_1 q_2 \cdots q_i$ to a concept sequence $cs = c_1 c_2 \cdots c_j$, $j \le i$;
2) Enumerate the subsequences $c_i c_{i+1} \cdots c_l$ ($1 \le i \le l \le j$) of $cs$ (distributed, MapReduce);
3) Get all frequent concept subsequences;
4) Organize these frequent concept sequences $cs$ into the concept sequence suffix tree.

Concept Sequence Suffix Tree
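A single-machine sketch of Steps 1 to 3; the slide distributes Step 2 with MapReduce, and a plain loop with `collections.Counter` stands in for it here. It assumes every training query already has a concept id in `concept_of`, that consecutive queries in the same concept are recorded once (one way to obtain $j \le i$), and that support is counted per occurrence. `min_support` and both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def to_concept_sequence(session: List[str], concept_of: Dict[str, int]) -> List[int]:
    """Step 1: map q_1..q_i to concept ids, recording consecutive repeats once."""
    cs: List[int] = []
    for q in session:
        c = concept_of[q]
        if not cs or cs[-1] != c:
            cs.append(c)
    return cs


def frequent_subsequences(
    sessions: List[List[str]], concept_of: Dict[str, int], min_support: int
) -> Dict[Tuple[int, ...], int]:
    """Steps 2-3: count every contiguous c_i..c_l and keep those with enough support."""
    counts: Counter = Counter()
    for session in sessions:
        cs = to_concept_sequence(session, concept_of)
        for i in range(len(cs)):
            for l in range(i + 1, len(cs) + 1):
                counts[tuple(cs[i:l])] += 1
    return {seq: n for seq, n in counts.items() if n >= min_support}
```

In a MapReduce version, the inner double loop would be the map step (emit each subsequence with count 1) and the `Counter` would be the reduce step.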

Page 15

Algorithm for organizing the frequent concept sequences $cs$ into the concept sequence suffix tree:

Concept Sequence Suffix Tree

Page 16

Organizing the frequent concept sequences into the concept sequence suffix tree:
1) Start from the root node (the empty sequence) and scan through all frequent concept sequences $cs = c_1 c_2 \cdots c_l$;
2) For each $cs$, first find the node $cr$ corresponding to $cs' = c_1 c_2 \cdots c_{l-1}$; if $cr$ does not exist, create it;
3) Update the list of candidate concepts of $cs'$ with $c_l$ if $c_l$ is among the top $K$ candidates so far ($K$ is a specified threshold, e.g., $K = 5$);
4) The representative queries of the top $K$ candidate concepts are the candidate suggestions for the sequence $cs'$.

Concept Sequence Suffix Tree
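A compact sketch of the organizing step. It assumes each node is reached from the root by reading its context $cs'$ backwards ($c_{l-1}$, then $c_{l-2}$, down to $c_1$), so that the online step can match the longest suffix of a user's concept sequence, and that candidates are ranked by the support of $cs$. `Node`, `build_suffix_tree`, and the `(support, concept)` candidate pairs are hypothetical; intermediate nodes are created on the fly instead of relying on a particular scan order.

```python
from typing import Dict, List, Tuple


class Node:
    """One context cs'; children are keyed by the next concept when reading cs' backwards."""

    def __init__(self) -> None:
        self.children: Dict[int, "Node"] = {}
        self.candidates: List[Tuple[int, int]] = []  # (support, concept), best first


def build_suffix_tree(frequent: Dict[Tuple[int, ...], int], k: int = 5) -> Node:
    root = Node()  # the empty context
    for cs, support in frequent.items():
        context, last = cs[:-1], cs[-1]  # cs' = c_1..c_{l-1} and candidate c_l
        # Step 2: find (or create) the node for cs', walking c_{l-1}, ..., c_1 from the root.
        node = root
        for c in reversed(context):
            node = node.children.setdefault(c, Node())
        # Step 3: keep c_l only if it is among the top-k candidates seen so far.
        node.candidates.append((support, last))
        node.candidates.sort(key=lambda pair: -pair[0])  # stable, so ties keep insertion order
        del node.candidates[k:]
    return root
```

Capping every candidate list at $K$ keeps each node small, so the tree's size is bounded by the number of frequent contexts rather than by the raw log.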

Page 17

Review: an example of a concept sequence suffix tree:

Concept Sequence Suffix Tree

Fig: An example, with $cs = c_1 c_2 \cdots c_l$ and $cs' = c_1 c_2 \cdots c_{l-1}$

Page 18

Online query suggestion algorithm:


Concept Sequence Suffix Tree

Page 19

For a query sequence $qs = q_1 q_2 \cdots q_l$:
Map it to a concept sequence $cs = c_1 c_2 \cdots c_l$; if $q_i$ is a new query, stop mapping and return the concept sequence corresponding to $q_{i+1} q_{i+2} \cdots q_l$;
Search the tree to find the longest matched subsequence of the form $c_j c_{j+1} \cdots c_l$, $j \ge 1$;
Use the candidate suggestions for $c_j c_{j+1} \cdots c_l$ as the query suggestions for $q_1 q_2 \cdots q_l$.

Concept Sequence Suffix Tree
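A self-contained sketch of the online step. It redefines a minimal hypothetical `Node` whose path from the root reads a context backwards, and assumes `concept_of` maps known queries to concept ids and `representative` maps concept ids to their representative query (both hypothetical names). Mapping scans from $q_l$ backwards and stops at the first unseen query; the walk keeps the deepest matched node that has candidates, i.e. the longest matched $c_j \cdots c_l$.

```python
from typing import Dict, List, Optional, Tuple


class Node:
    """Suffix-tree node: children keyed by concept id, candidates as (support, concept)."""

    def __init__(self) -> None:
        self.children: Dict[int, "Node"] = {}
        self.candidates: List[Tuple[int, int]] = []


def map_to_concepts(queries: List[str], concept_of: Dict[str, int]) -> List[int]:
    """Map q_1..q_l to concepts, scanning backwards and stopping at the first new query."""
    cs: List[int] = []
    for q in reversed(queries):
        if q not in concept_of:
            break
        c = concept_of[q]
        if not cs or cs[-1] != c:  # collapse consecutive repeats
            cs.append(c)
    cs.reverse()
    return cs


def suggest(queries: List[str], concept_of: Dict[str, int], root: Node,
            representative: Dict[int, str]) -> List[str]:
    cs = map_to_concepts(queries, concept_of)
    node, best = root, None  # type: Node, Optional[Node]
    # Follow c_l, c_{l-1}, ...; the deepest node reached is the longest matched c_j..c_l.
    for c in reversed(cs):
        if c not in node.children:
            break
        node = node.children[c]
        if node.candidates:
            best = node
    if best is None:
        return []
    return [representative[c] for _, c in best.candidates]
```

Stopping at an unseen query means a brand-new query earlier in the session simply shortens the context instead of blocking suggestions.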

Page 20

Review: an example of a concept sequence suffix tree:

Concept Sequence Suffix Tree

Fig: An example, with $qs = q_1 q_2 \cdots q_i$ and $cs = c_1 c_2 \cdots c_j$, $j \le i$

Page 21

Experimental Evaluation

Training data:
A search log from a commercial search engine (Bing) in the US: 1.8 billion queries (151 million unique), 2.6 billion URL clicks (115 million unique URLs), and 840 million sessions.

Baseline algorithms:
Adjacency: given $q_i$, rank $q_j$ based on the frequency of $q_i q_j$.
N-Gram: given $qs = q_1 q_2 \cdots q_i$, rank $q_j$ based on the frequency of $q_1 q_2 \cdots q_i q_j$.

Test data:
Test-0: 1000 randomly selected single-query sessions.
Test-1: 1000 randomly selected multi-query sessions.
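A minimal sketch of the two baselines, assuming sessions are lists of query strings and that an N-Gram match may occur anywhere in a session (the slide does not say whether it must be a session prefix). Function names and the `top` cut-off are hypothetical.

```python
from collections import Counter
from typing import List


def adjacency_suggest(sessions: List[List[str]], q_i: str, top: int = 5) -> List[str]:
    """Adjacency: rank q_j by how often q_j immediately follows q_i."""
    counts: Counter = Counter()
    for s in sessions:
        for a, b in zip(s, s[1:]):
            if a == q_i:
                counts[b] += 1
    return [q for q, _ in counts.most_common(top)]


def ngram_suggest(sessions: List[List[str]], qs: List[str], top: int = 5) -> List[str]:
    """N-Gram: rank q_j by how often the whole sequence qs is immediately followed by q_j."""
    n = len(qs)
    counts: Counter = Counter()
    for s in sessions:
        for start in range(len(s) - n):
            if s[start:start + n] == qs:
                counts[s[start + n]] += 1
    return [q for q, _ in counts.most_common(top)]
```

The contrast with the proposed method is visible here: both baselines match exact query strings, so a context made of rare or unseen queries yields few or no suggestions, whereas the suffix tree matches at the concept level and backs off to shorter contexts.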

Page 22

Experimental Results

Coverage of suggestion:


Fig: The coverage of the three methods on (a) Test-0 and (b) Test-1

Page 23

Experimental Results

Quality of suggestion (relevance grades collected from 10 judges):

Fig: The quality of the three methods on (a) Test-0 and (b) Test-1

Page 24

Summary

Three things to know:
Some basics of query suggestion using search logs
The proposed efficient query clustering algorithm for click-through bipartite data from search logs
The proposed efficient context-aware query suggestion method using the concept sequence suffix tree

Hints: a “concept”-level N-gram with varied length N, plus a structure for efficient search

Page 25

Thank You!
