Unsupervised Query Segmentation Using Clickthrough for Information Retrieval
Yanen Li (1), Bo-June (Paul) Hsu (2), ChengXiang Zhai (1) and Kuansan Wang (2)
(1) Department of Computer Science, University of Illinois at Urbana-Champaign
(2) Microsoft Research, One Microsoft Way, Redmond, WA
Email: [email protected]
07/25/2011, SIGIR 2011, Beijing China
Outline
• Motivation and Related Work
• Unsupervised Query Segmentation Model with Clickthrough
• Query Segmentation Evaluation
• Integrated Language Model with Query Segmentation (QSLM)
• Evaluation of QSLM
• Conclusion and Future Work
Motivation
Query segmentation: breaking a query into semantically meaningful segments
bank of america online banking -> [bank of america] [online banking]
Query segmentation is useful for: (1) noun phrase discovery; (2) query reformulation; (3) phrase-based retrieval models; (4) user intent analysis
This Work:
• Task 1: probabilistic query segmentation
bank of america online banking
{[bank of america] [online banking], 0.502}, {[bank of america online banking], 0.428}, {[bank of] [america] [online banking], 0.001}
• Task 2: retrieval model with query segmentation
Q -> {A(Q)} -> D
Related Work of Query Segmentation
• Mutual information based models [Risvik WWW 03, Jones WWW 06]
• Supervised query segmentation models
– MRF [Yu KEYS 09]
– Limitation: need labeled training examples
• Simple N-gram probability models [Hagen SIGIR 10]
• Unsupervised models
– [Tan WWW 2008]
– Minimum description length
Limitation: no relevance information (example: "of the"; query: president of the united states -> president | of the | united states?)
We model query segmentation with clickthrough data, which was previously unexplored
Unsupervised Query Segmentation Model using Clickthrough
Intuitions
• Segments appear both in the query and in the clicked doc
• Relevance information
• How to model?
Our Segmentation Model
• A generative model
• Generating a query:
1. Pick a query length n under a length distribution; e.g. n = 4
2. Select a segmentation partition B ∈ B_n according to a segmentation partition model P(B | n, ψ); e.g. [X X] [X X]
3. Generate query segments S_m consistent with B, according to a segment unigram model P(S_m | θ); e.g. [food network] [coupon codes]
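The generative steps can be sketched in a few lines of Python. This is only an illustration: the probabilities in `theta` are made up, the partition model is assumed to be uniform, and the length distribution is left out because n is fixed once the query is given. The names `segmentations` and `joint_prob` are hypothetical helpers, not names from the talk.

```python
from itertools import combinations
from math import prod

def segmentations(words):
    """Yield every split of `words` into contiguous segments (2**(n-1) in total)."""
    n = len(words)
    for k in range(n):                                  # k = number of break points
        for breaks in combinations(range(1, n), k):
            bounds = (0, *breaks, n)
            yield [" ".join(words[i:j]) for i, j in zip(bounds, bounds[1:])]

def joint_prob(segments, theta, p_partition, unseen=1e-9):
    """P(B | n, psi) * prod_m P(S_m | theta) for one segmentation."""
    return p_partition * prod(theta.get(s, unseen) for s in segments)

words = "food network coupon codes".split()
all_segs = list(segmentations(words))

# Hypothetical segment unigram probabilities, for illustration only.
theta = {
    "food network": 0.01, "coupon codes": 0.02,
    "food": 0.005, "network": 0.004, "coupon": 0.003, "codes": 0.002,
}
uniform = 1 / len(all_segs)                             # assumed uniform partition model
best = max(all_segs, key=lambda s: joint_prob(s, theta, uniform))
print(len(all_segs), best)
```

With these toy numbers the two-segment reading scores highest because each multi-word segment is far more probable than the product of its single words.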
• Under this model:
e.g. P([the cuban swimmer paper] | θ) vs. P(the | θ) P(cuban | θ) P(swimmer | θ) P(paper | θ)
B: segmentation partition
θ: segment unigram distribution. Vocabulary space: 1, 2, …, K
Infinitely strong prior that penalizes longer segments
Prob of seeing Q given B
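One way to read the three generative steps as a query likelihood is sketched below. It is a reconstruction from the steps listed under "Our Segmentation Model", not the formula as it appeared on the slide. P(n) for the length distribution, the set symbol for partitions of length n, and M_B for the number of segments in B are assumed notation.

```latex
P(Q \mid \theta, \psi)
  = P(n) \sum_{B \in \mathcal{B}_n} P(B \mid n, \psi)
    \prod_{m=1}^{M_B} P(S_m \mid \theta)
```

The product term is the probability of seeing Q given B; the sum marginalizes over all partitions consistent with the query length.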
• Extending to <query, doc> pairs
An interpolated model, combining a global component and a document-specific component
Query: [President] [of the] [united states]
Clicked docs:
1. the White House and President Barack Obama, the 44th President of the United States
2. the united states President Barack Obama …
3. President Obama remained unable to break a stalemate over the debt… Few investors believe the United States …
Prob is not high for this segmentation
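The interpolation formula itself did not survive in this transcript. A minimal sketch, assuming a plain linear interpolation with a hypothetical weight `lam` on the global component and made-up probabilities, could look like this:

```python
def interpolated_prob(segment, theta_global, theta_doc, lam=0.5):
    """lam * P(segment | global) + (1 - lam) * P(segment | clicked doc).

    `lam` and the linear form are assumptions for illustration.
    """
    return lam * theta_global.get(segment, 0.0) + (1 - lam) * theta_doc.get(segment, 0.0)

# Hypothetical numbers: "of the" is common globally but carries little weight
# in the clicked documents, while "united states" is prominent in them.
theta_global = {"of the": 0.02, "united states": 0.005}
theta_doc = {"of the": 0.001, "united states": 0.03}

p_of_the = interpolated_prob("of the", theta_global, theta_doc)
p_united_states = interpolated_prob("united states", theta_global, theta_doc)
print(p_of_the, p_united_states)
```

Under these toy numbers the document-specific component pulls the probability of "of the" down relative to "united states", which is the effect the slide describes.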
• Parameter estimation
θ: segment unigram distribution
Estimated by maximizing the likelihood over all query-doc pairs
An EM algorithm, e.g. oxford real estate advisors
E step: given θ(k-1), for each Q compute the posterior probability of each valid segmentation given Q
e.g. P([X] [X X] [X] | oxford real estate advisors, θD, ψ)
M step: update θ(k):
P(real estate | θ(k)) ∝ P([X] [X X] [X] | oxford real estate advisors, θD, ψ) + P([X X] [X] | real estate california, θD, ψ) + P([X] [X] [X X] [X] | find a real estate agent, θD, ψ) + …
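The E and M steps can be sketched on a toy query log. This is a simplified illustration, not the estimator from the talk: it uses queries only, drops the document-specific component θD, and treats the partition model ψ as uniform. The helper names `segmentations` and `em_step` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import prod

def segmentations(words):
    """All contiguous segmentations of a word list, as tuples of segment strings."""
    n = len(words)
    for k in range(n):
        for breaks in combinations(range(1, n), k):
            bounds = (0, *breaks, n)
            yield tuple(" ".join(words[i:j]) for i, j in zip(bounds, bounds[1:]))

def em_step(candidates, theta):
    """One EM iteration over a list of per-query candidate segmentations."""
    counts = defaultdict(float)
    for segs in candidates:
        # E step: posterior of each segmentation given the query and current theta.
        scores = [prod(theta[s] for s in b) for b in segs]
        z = sum(scores)
        for b, score in zip(segs, scores):
            for s in b:
                counts[s] += score / z          # expected count of segment s
    # M step: renormalize expected counts into a new segment unigram distribution.
    total = sum(counts.values())
    return {s: counts[s] / total for s in theta}

queries = ["oxford real estate advisors", "real estate california", "find a real estate agent"]
candidates = [list(segmentations(q.split())) for q in queries]
vocab = {s for segs in candidates for b in segs for s in b}
theta = {s: 1 / len(vocab) for s in vocab}      # uniform start
for _ in range(3):
    theta = em_step(candidates, theta)
```

Without a prior that penalizes longer segments (which the slides mention but whose form is not reproduced here), this toy version tends to favor leaving each query as one segment, so it shows the mechanics of the updates rather than realistic output.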
Query Segmentation Evaluation
• Datasets
– Training set from Bing query log
– Test set 1: 500 queries from [Bergsma EMNLP-CoNLL 2007], 3 annotators
– Test set 2: 1000 queries from Bing query log, 3 annotators
• Metrics
– query accuracy
– classify accuracy
– segment precision
– segment recall
– segment F
– Reported on annotation sets A, B and C, and on their Intersection and Conjunction
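The slide lists the metrics without defining them. The sketch below uses the usual definitions from the query segmentation literature, which is an assumption here: query accuracy is the share of queries segmented exactly as the annotator did, classify accuracy is the share of correct break/no-break decisions between adjacent words, and segment precision, recall and F compare predicted and gold segments as word spans.

```python
def spans(segments):
    """Convert ['new york', 'times'] into a set of (start, end) word spans."""
    out, start = set(), 0
    for seg in segments:
        end = start + len(seg.split())
        out.add((start, end))
        start = end
    return out

def evaluate(predicted, gold):
    """predicted, gold: parallel lists of segmentations (lists of segment strings)."""
    exact = hit = n_pred = n_gold = agree = gaps = 0
    for p, g in zip(predicted, gold):
        sp, sg = spans(p), spans(g)
        n_words = max(end for _, end in sg)
        bp = {end for _, end in sp} - {n_words}     # predicted break positions
        bg = {end for _, end in sg} - {n_words}     # gold break positions
        exact += sp == sg
        hit += len(sp & sg)
        n_pred += len(sp)
        n_gold += len(sg)
        agree += sum((i in bp) == (i in bg) for i in range(1, n_words))
        gaps += n_words - 1
    precision, recall = hit / n_pred, hit / n_gold
    f = 2 * precision * recall / (precision + recall) if hit else 0.0
    return {
        "query accuracy": exact / len(gold),
        "classify accuracy": agree / gaps if gaps else 1.0,
        "segment precision": precision,
        "segment recall": recall,
        "segment F": f,
    }

predicted = [["oxford", "real estate", "advisors"], ["bank of", "america", "online banking"]]
gold = [["oxford", "real estate", "advisors"], ["bank of america", "online banking"]]
metrics = evaluate(predicted, gold)
print(metrics)
```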
Result Snapshot
Test Set 1
30 [elizabeth nj] [factory outlets]
31 [rush university] [medical center]
32 [pitch card game] [program]
33 [hillsborough] [river] [state park]
34 [trane] [vs] [american standard] [a c]
35 [jefferson county al] [school system]
36 [oxford] [real estate] [advisors]
37 [johnson county] [community college]
38 [new york] [insight meditation]
39 [aurora ohio] [movie theater]
40 [trigun] [maximum] [graphic novels]
41 [animals] [redwood] [national park]
42 [prime time] [male] [exotic] [dances]
43 [pacific grove] [adult] [school]
44 [ralph] [m] [brown] [act]
45 [chicago] [gay pride parade]
46 [livermore] [mobile home parks]
47 [vintage] [harley davidson] [soft] [tail] [standard]
48 [aerotemp] [heat pump] [pools]
49 [american indian] [salt] [deficiency]
50 [cheap] [crossword puzzle] [books]
Test Set 2
2030822 [beauty and the beast]
2025251 [history] [of] [armenia]
2030690 [american saddlery country flex saddle]
2024252 [funny] [award] [certificates]
2023090 [champion] [mobile homes]
2027667 [pictures] [of] [best friend] [woman] [hugging]
2022846 [budget driving school] [san diego]
2027746 [publishing] [web site] [internet]
2030341 [you tube] [american idol] [results] [april 2 2008]
… …
Evaluation on Test Set 1
Subset        Metric             Baseline (MI)  Tan's model (EM + corpus)  Our model (EM + clicked doc)
Annotation A  query accuracy     0.274          0.414                      0.440
              classify accuracy  0.693          0.762                      0.776
              segment precision  0.469          0.562                      0.598
              segment recall     0.534          0.555                      0.639
              segment F          0.499          0.558                      0.618
Annotation B  query accuracy     0.244          0.440                      0.410
              classify accuracy  0.634          0.774                      0.750
              segment precision  0.408          0.568                      0.521
              segment recall     0.472          0.578                      0.631
              segment F          0.438          0.573                      0.571
Annotation C  query accuracy     0.264          0.416                      0.402
              classify accuracy  0.666          0.759                      0.756
              segment precision  0.451          0.558                      0.548
              segment recall     0.519          0.561                      0.619
              segment F          0.483          0.559                      0.582
Intersection  query accuracy     0.343          0.528                      0.586
              classify accuracy  0.728          0.815                      0.842
              segment precision  0.510          0.640                      0.681
              segment recall     0.550          0.650                      0.747
              segment F          0.530          0.645                      0.713
-- Clearly outperforms the MI baseline.
-- Outperforms the [Tan WWW 2008] model on segment F for annotations A and C and for the Intersection
-- Our model + MS Web N-gram beats other models that use additional resources
Segmentation Performance with Respect to Penalty Factor
1. The penalty factor f can affect the results substantially
2. f = 2 achieves good results
Integrated Language Model with Query Segmentation (QSLM)
• Traditional IR models
– TF-IDF, BM25, Unigram LM …
– Terms are scored independently
• Proximity heuristics [Tao SIGIR 07]
• Higher order LMs (biterm LM [Srikanth SIGIR 02])
• Capturing linkage [Gao SIGIR 04]
Simple Oracle Ranker
Oracle Ranker Procedure
qID      Unigram  Bigram  Oracle
2024077  0.33     0.25    0.33
2024272  0.3      0.34    0.34
2024291  0.29     0.36    0.36
…
Result
Remarks:
1. Oracle ranker performs very well
2. Simulate similar behavior with query segmentation
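In the three rows shown, the Oracle column equals the larger of the Unigram and Bigram scores for each query. A minimal sketch of that reading follows; the per-query maximum is inferred from the table rather than stated on the slide, and the metric behind the scores is not named in the transcript.

```python
# (qID, unigram score, bigram score) copied from the slide's table.
rows = [
    ("2024077", 0.33, 0.25),
    ("2024272", 0.30, 0.34),
    ("2024291", 0.29, 0.36),
]

def oracle_score(unigram, bigram):
    """Assumed oracle: for each query, keep whichever model scored better."""
    return max(unigram, bigram)

oracle = {qid: oracle_score(u, b) for qid, u, b in rows}
print(oracle)
```

The oracle needs to know in advance which model wins for each query, so it acts as an upper bound rather than a usable ranker.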
QSLM Model
Components labeled in the formula:
• Query segmentation probability
• LM, built from:
1. doc LM model
2. background LM model
How to score docs under QSLM
Query: bank of america online
Doc 1: AOL Inc. (NYSE: AOL, stylized as "Aol.", and previously known as America Online) is an American global Internet services and media company
Doc 2: Online Banking from Bank of America lets you manage your accounts, pay your bills, view credit card activity and more.
Document  Query Segmentation            Prob  a/(a+b)  Ranking score
Doc 1     [bank of america] [online]    0.94  0.6      0.564
          [bank] [of] [america online]  0.02  0.8      0.016
          Total                                        0.58
Doc 2     [bank of america] [online]    0.94  0.9      0.846
          [bank] [of] [america online]  0.02  0.4      0.008
          Total                                        0.854
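The arithmetic in the scoring table can be reproduced with a short sketch: each segmentation contributes its probability times its a/(a+b) value, and the contributions are summed per document. The function name `qslm_score` is hypothetical, and the a/(a+b) values are taken from the table as given since the transcript does not define a and b.

```python
def qslm_score(rows):
    """Sum over segmentations of P(segmentation) * per-segmentation score."""
    return sum(prob * score for prob, score in rows)

# (segmentation probability, a/(a+b)) pairs copied from the table.
doc1 = [(0.94, 0.6), (0.02, 0.8)]   # [bank of america] [online], [bank] [of] [america online]
doc2 = [(0.94, 0.9), (0.02, 0.4)]

score1 = qslm_score(doc1)
score2 = qslm_score(doc2)
print(round(score1, 3), round(score2, 3))
```

Doc 2 ranks higher because the likely segmentation [bank of america] [online] matches it well, while the segmentation that favors the AOL page carries only 0.02 of the probability mass.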
Evaluation of QSLM on Search Ranking
Dataset from Bing: 12,064 queries
Baselines: BM25, Unigram LM, Bigram LM
Results on Web Search
1. Better performance than BM25 and the Unigram and Bigram LMs
2. Results more significant on longer queries
How many segmentations are needed?
1. More segmentations give better search ranking
2. A small number of segmentations is enough
Conclusions and Future Work
• Unsupervised model using clickthrough is effective on query segmentation
• LM with query segmentation can improve search ranking
• But QSLM still underperforms the Oracle Ranker
• A better model to incorporate query segmentation is desirable
Acknowledgement
We thank SIGIR for the Travel Grant support!
Thank You!