query log analysis

Query Log Analysis

Naama Kraus

Slides are based on the papers:
• Andrei Broder, A taxonomy of web search
• Ricardo Baeza-Yates, Graphs from Search Engine Queries
• Hassan, Jones, Klinkner, Beyond DCG: User Behavior as a Predictor of a Successful Search

A Taxonomy of Web Searches

• [Andrei Broder] classifies web queries according to their intent:
– Navigational - reach a particular site
  • Example: cnn, Oracle
– Informational - acquire some information
  • Example: the history of haifa, information retrieval
– Transactional - perform some web-mediated activity. Further interaction is expected.
  • E.g. shopping, downloading files, accessing databases
  • Example: new balance shoes, Israel flights

Query Log

• Search Engine Query Log records users’ searches

• A typical record contains
– Anonymous user id u
– Search query q
– Returned documents V
– Clicked documents C
– Timestamp t

Query Log Example

1234, apple, 12:04
1234, apple ipod, 12:05
1234, ynet, 12:13
145, google, 12:20
145, eBay, 12:56
32, ynet news, 12:59
145, Solaris systen, 13:01
145, Solaris system, 13:05
…

Session

• A sequence of searches of one particular user u within a specific time limit

• S = < <u, q1, t1>, …, <u, qk, tk> >
• t1 < … < tk (=> ordered sequence)
• ti+1 – ti < t0 (=> t0 is a timeout threshold)

• Note 1: a session may contain unrelated queries
• Note 2: identifying sessions is easy
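The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any cited paper: the function name `split_sessions`, the helper `at`, and the tuple layout `(user, query, timestamp)` are assumptions made for the example, and the sample records are the ones from the Session Example slide.

```python
from datetime import datetime, timedelta

def split_sessions(records, timeout=timedelta(minutes=30)):
    """Group (user, query, timestamp) records into per-user sessions.

    A query joins the user's current session only if the gap to that
    user's previous query is strictly smaller than `timeout` (t0).
    """
    sessions = []
    last_seen = {}  # user -> (timestamp of last query, that user's open session)
    for user, query, ts in sorted(records, key=lambda r: (r[0], r[2])):
        prev = last_seen.get(user)
        if prev is not None and ts - prev[0] < timeout:
            session = prev[1]          # continue the open session
        else:
            session = []               # timeout exceeded or first query: new session
            sessions.append(session)
        session.append((user, query, ts))
        last_seen[user] = (ts, session)
    return sessions

def at(hhmm):
    """Hypothetical helper: parse an HH:MM time of day."""
    return datetime.strptime(hhmm, "%H:%M")

log = [
    ("1234", "apple", at("12:04")),
    ("1234", "apple ipod", at("12:05")),
    ("1234", "ynet", at("12:13")),
    ("1234", "apple store", at("12:20")),
    ("1234", "cnn news", at("12:56")),
    ("1234", "cnn webcast", at("12:59")),
    ("1234", "apple apps", at("13:01")),
]
sessions = split_sessions(log)
session_queries = [[q for _, q, _ in s] for s in sessions]
```

Only the timestamp gap is inspected, which is why session detection is easy and why a session may still mix unrelated queries.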

Session Example

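• 1234, apple, 12:04
• 1234, apple ipod, 12:05
• 1234, ynet, 12:13
• 1234, apple store, 12:20
• 1234, cnn news, 12:56
• 1234, cnn webcast, 12:59
• 1234, apple apps, 13:01

• Timeout threshold = 30 minutes
• Session 1: the four queries from 12:04 to 12:20 (every gap is under 30 minutes)
• Session 2: the three queries from 12:56 to 13:01 (the 12:56 query comes 36 minutes after the previous one)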

Query Chain

• A sequence of queries by a particular user that share a similar information need
– Also known as a mission or logical session

• Example: haifa maps, haifa travel, attractions in haifa

• Note 1: a chain contains related queries only
• Note 2: identifying chains is difficult

Query Chain Example

• 1234, apple, 12:04
• 1234, apple ipod, 12:05
• 1234, ynet, 12:13
• 1234, apple store, 12:20
• 1234, cnn news, 12:56
• 1234, cnn webcast, 12:59
• 1234, apple apps, 13:01

(The slide marks two chains in this log: chain1 and chain2.)

Click Graph

Bipartite graph
Nodes on the left side are unique queries
Nodes on the right side are unique URLs

An edge connects q and u if the log contains a click on u for query q

Edges may be weighted according to the number of clicks

This graph is used by numerous algorithms for various purposes, e.g. query and URL clustering, query recommendations …
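As a small sketch of the structure described above, the click graph can be held as a weighted edge map. The click records and URLs below are hypothetical, and representing each click as a `(query, url)` pair is an assumption of this example.

```python
from collections import Counter

# Hypothetical click records: one (query, clicked_url) pair per click in the log.
clicks = [
    ("cnn", "cnn.com"),
    ("cnn news", "cnn.com"),
    ("cnn", "cnn.com"),
    ("apple ipod", "apple.com"),
]

# Edge weight = number of clicks on the URL for the query.
edge_weight = Counter(clicks)

# The two node sets of the bipartite graph.
query_nodes = {q for q, _ in edge_weight}
url_nodes = {u for _, u in edge_weight}
```

Queries that click the same URL (here "cnn" and "cnn news") become neighbors at distance two, which is the property that clustering and recommendation methods typically exploit.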

Query Graphs

Each unique query is a node in the graph

Next slides: connection types between queries (edges)

Proposed by [Ricardo Baeza-Yates]

Query Graphs – Word Graph

An edge between two nodes exists if the queries share common terms

Possible node weight: number of occurrences in the log

Possible edge weight: Jaccard distance

[Figure: example graph over the queries "paris hotels", "cheap paris hotels", "paris attractions" and "london attractions"]
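A minimal sketch of the word graph over the four example queries, assuming whitespace tokenization and the usual definition of Jaccard distance (one minus the ratio of shared to total distinct terms). The names `terms`, `jaccard_distance` and `word_graph` are illustrative only.

```python
from itertools import combinations

def terms(query):
    """Assumed tokenization: split the query on whitespace."""
    return set(query.split())

def jaccard_distance(q1, q2):
    a, b = terms(q1), terms(q2)
    return 1 - len(a & b) / len(a | b)

queries = [
    "paris hotels",
    "cheap paris hotels",
    "paris attractions",
    "london attractions",
]

# Undirected edges, stored once per pair, only where terms are shared.
word_graph = {}
for q1, q2 in combinations(queries, 2):
    if terms(q1) & terms(q2):
        word_graph[(q1, q2)] = jaccard_distance(q1, q2)
```

Note that with a distance as the edge weight, a smaller weight means more closely related queries.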

Query Graphs – Session Graph

The weight of node q is the number of sessions that contain the query q (usually equal to the number of query occurrences)

A directed edge from q1 to q2 exists if q1 occurred before q2 in the same session

The edge’s weight is the number of such occurrences

[Figure: example graph over the queries "paris hotels", "paris attractions", "cheap paris hotels" and "london attractions"]
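The session graph can be sketched as follows. The sessions below are hypothetical, each given as an ordered list of query strings. One assumption is worth flagging: "before" is read here as "anywhere earlier in the session", not only "immediately before", and a query repeated within a session contributes one count per ordered pair of positions.

```python
from collections import Counter
from itertools import combinations

def build_session_graph(sessions):
    node_weight = Counter()  # query -> number of sessions containing it
    edge_weight = Counter()  # (q1, q2) -> times q1 preceded q2 in a session
    for queries in sessions:
        node_weight.update(set(queries))
        # combinations() keeps input order, so q1 always precedes q2 here.
        for q1, q2 in combinations(queries, 2):
            if q1 != q2:
                edge_weight[(q1, q2)] += 1
    return node_weight, edge_weight

# Hypothetical sessions, already split by user and timeout.
sessions = [
    ["paris hotels", "cheap paris hotels", "paris attractions"],
    ["paris hotels", "paris attractions"],
    ["london attractions", "paris attractions"],
]
node_weight, edge_weight = build_session_graph(sessions)
```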

Query Graphs – URL Cover Graph

An edge exists between q1 and q2 if they share clicked URLs

Node weight = #occurrences

The edge’s weight is the number of common clicks

[Figure: example graph over the queries "paris hotels", "paris attractions", "cheap paris hotels" and "london attractions"]
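A minimal sketch of the URL cover graph, assuming "number of common clicks" means the number of distinct URLs clicked for both queries. The `clicked` mapping and its URLs are hypothetical.

```python
from itertools import combinations

# Hypothetical data: query -> set of URLs clicked for that query.
clicked = {
    "paris hotels": {"hotels.example/paris", "booking.example/paris"},
    "cheap paris hotels": {"booking.example/paris", "hostels.example/paris"},
    "paris attractions": {"visit.example/paris"},
    "london attractions": {"visit.example/london"},
}

# Undirected edges, stored once per pair, weighted by the number of shared URLs.
cover_graph = {}
for q1, q2 in combinations(clicked, 2):
    shared = clicked[q1] & clicked[q2]
    if shared:
        cover_graph[(q1, q2)] = len(shared)
```

Unlike the word graph, two queries can be linked here without sharing any term, as long as users clicked the same result for both.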

Query Graph – URL Link Graph

An edge exists between q1 and q2 if there is at least one hyperlink between a URL clicked for q1 and a URL clicked for q2

Node weight = #occurrences

The edge’s weight is the number of such links

[Figure: example graph over the queries "paris hotels", "paris attractions", "cheap paris hotels" and "london attractions"]

Query Graph – URL Terms Graph

Represent a clicked URL by a set of terms (whole page, snippet, anchors, title, a combination …)

Weight terms by their frequencies

Node weight = #occurrences

There is an edge between q1 and q2 if at least one clicked URL of q1 and one clicked URL of q2 have at least m terms in common

The edge weight is the sum of the frequencies of the common terms

[Figure: example graph over the queries "paris hotels", "paris attractions", "cheap paris hotels" and "london attractions"]

User Behavior as a Predictor of a Successful Search

• Goal: given a sequence of user actions within a specific logical session, predict whether the search goal ended successfully or not
– Success: the user is satisfied with the results
– Failure: the user is unsatisfied

• Method:
– Analyze the query log and learn success/failure patterns
– Use the learned models for prediction

• Proposed by [Hassan, Jones and Klinkner]

Data

• A rich query log of queries and user actions:
– Query (Q)
– Search Click (SR)
– Sponsored Search Click (AD)
– Related Search Click (RL)
  • Query recommendations
– Spelling Suggestion Click (SP)
– Shortcut Click (SC)
  • E.g. image, video, news …
– Any Other Click (OTH)
  • E.g. browser tab

Data Labeling

• Random sample of user sessions

• Human editors labeled the data:
– Detected logical sessions
– Success/Failure
  • definitely successful, probably successful, unsure, probably unsuccessful, and definitely unsuccessful

Markov Models

• Partition the training data into two splits
– successful goals
– unsuccessful goals

• For each split, construct a Markov model derived from the observed action sequences
– A model describes the user behavior in case of a successful/unsuccessful search goal
– Each action type is a state
– Each transition from one state to another is weighted by its probability as observed in the data (MLE)

Transition Weighting - MLE

$$\Pr_{MLE}(S_i, S_j) = \frac{N_{S_i, S_j}}{N_{S_i}}$$

$N_{S_i, S_j}$: number of times we saw a transition from $S_i$ to $S_j$

$N_{S_i}$: number of times we saw $S_i$
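The estimate above amounts to counting. The sketch below is illustrative only: the function name and the training sequences are hypothetical, and $N_{S_i}$ is counted as the number of times $S_i$ is the source of a transition, which equals its total number of occurrences when every sequence ends in a terminal END state.

```python
from collections import Counter

def estimate_transitions(sequences):
    """Maximum-likelihood transition probabilities from action sequences."""
    pair_counts = Counter()    # (S_i, S_j) -> N_{S_i, S_j}
    source_counts = Counter()  # S_i -> N_{S_i}
    for seq in sequences:
        for s_i, s_j in zip(seq, seq[1:]):
            pair_counts[(s_i, s_j)] += 1
            source_counts[s_i] += 1
    return {
        (s_i, s_j): n / source_counts[s_i]
        for (s_i, s_j), n in pair_counts.items()
    }

# Hypothetical labeled goals, encoded with the action codes from the Data slide.
sequences = [
    ["START", "Q", "SR", "END"],
    ["START", "Q", "Q", "SR", "END"],
    ["START", "Q", "AD", "END"],
]
probs = estimate_transitions(sequences)
```

In this toy data Q is followed by SR twice, by Q once and by AD once, so the outgoing probabilities of Q are 0.5, 0.25 and 0.25 and sum to 1.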

Illustration

[Figure: example Markov model over the states START, Q, SR, AD, RL and END, with each edge labeled by its transition probability]

Prediction (1)

• Given a user’s action sequence, we need to predict whether it is successful or not
• We’ve learned two models, Ms and Mf, of successful and unsuccessful patterns
• Compute the probability that a given sequence S = {S1, …, Sn} was generated from Ms, and likewise for Mf
• Predict success/failure by computing the log likelihood
– Formulas in next slide

Prediction (2)

Formulas taken from the paper
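The paper's exact formulas are not reproduced in these notes, so the following is only a sketch of the general idea under stated assumptions: each model is a first-order Markov chain, the log likelihood of a sequence is the sum of the log transition probabilities, and the prediction is whichever model scores higher. The additive smoothing (`alpha`) is an assumption added here so that unseen transitions do not produce log(0). All names and training sequences are hypothetical.

```python
import math
from collections import Counter

STATES = ["START", "Q", "SR", "AD", "RL", "SP", "SC", "OTH", "END"]

def train(sequences, alpha=0.1):
    """Return prob(a, b): a smoothed transition probability function."""
    pairs, sources = Counter(), Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
            sources[a] += 1

    def prob(a, b):
        # Additive smoothing (an assumption of this sketch, not from the slides).
        return (pairs[(a, b)] + alpha) / (sources[a] + alpha * len(STATES))

    return prob

def log_likelihood(prob, seq):
    return sum(math.log(prob(a, b)) for a, b in zip(seq, seq[1:]))

def predict_success(seq, prob_s, prob_f):
    """True if the success model explains the sequence better."""
    return log_likelihood(prob_s, seq) > log_likelihood(prob_f, seq)

# Hypothetical training data for the two models Ms and Mf.
successful = [["START", "Q", "SR", "END"], ["START", "Q", "SR", "SR", "END"]]
failed = [["START", "Q", "Q", "Q", "END"], ["START", "Q", "RL", "Q", "END"]]
ms, mf = train(successful), train(failed)

clicked_and_left = predict_success(["START", "Q", "SR", "END"], ms, mf)
reformulated_and_left = predict_success(["START", "Q", "Q", "END"], ms, mf)
```

With this toy data, a query followed by a result click is classified as successful, while repeated reformulation without a click is classified as unsuccessful.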
