
Page 1: Term Necessity Prediction P(t | Rq). Le Zhao and Jamie Callan, Language Technologies Institute, School of Computer Science, Carnegie Mellon University. Oct 27, CIKM 2010


Term Necessity Prediction P(t | Rq)

Le Zhao and Jamie Callan
Language Technologies Institute
School of Computer Science, Carnegie Mellon University

Oct 27, CIKM 2010

Main Points

• Necessity is as important as idf (theory)

• Explains behavior of IR models (practice)

• Can be predicted

• Performance gain

Page 2:

Definition of Necessity P(t | Rq)

Directly calculated given relevance judgements for q

(Venn diagram: within the collection, the set of documents relevant to q overlaps the set of documents that contain t; the fraction of relevant docs in the overlap gives P(t | Rq) = 0.4.)

Necessity == 1 − mismatch == term recall
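As a minimal sketch, necessity can be computed directly from judged-relevant documents exactly as defined on this slide. The documents and judgments below are toy data, not TREC.

```python
def term_necessity(term, relevant_docs):
    """P(t | Rq): fraction of judged-relevant documents containing the term."""
    if not relevant_docs:
        return 0.0
    hits = sum(1 for doc in relevant_docs if term in doc)
    return hits / len(relevant_docs)

# Toy relevance set: 5 documents (as term sets) judged relevant for a query q.
relevant = [
    {"third", "party", "politics"},
    {"third", "party", "candidate"},
    {"party", "election"},
    {"third", "party"},
    {"party", "viability"},
]

print(term_necessity("party", relevant))      # 1.0 (in all 5 relevant docs)
print(term_necessity("viability", relevant))  # 0.2 (in 1 of 5 relevant docs)
```

Necessity is a per-query quantity: the same term can score differently against another query's relevance set.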

Page 3:

Why Necessity? Roots in Probabilistic Models

• Binary Independence Model [Robertson and Spärck Jones 1976]
  – “Relevance Weight”, “Term Relevance”

• P(t | R) is effectively the only part about relevance.

RSJ weight = log [ P(t|R) / (1 − P(t|R)) ] + log [ (1 − P(t|N)) / P(t|N) ]
             (necessity odds)                (idf / sufficiency)
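The standard RSJ/BIM term weight separates into a necessity-odds part, which depends only on P(t | R), and an idf-like sufficiency part. A small numeric sketch, with illustrative probability values:

```python
import math

def rsj_weight(p, q):
    """RSJ/BIM term weight; p = P(t | R), q = P(t | N)."""
    necessity_odds = math.log(p / (1 - p))  # depends only on relevance
    sufficiency = math.log((1 - q) / q)     # idf-like, depends on non-relevant docs
    return necessity_odds + sufficiency

# A rare term (low q) with low necessity (low p): the large idf part is
# partly cancelled by a strongly negative necessity-odds part.
w = rsj_weight(p=0.02, q=0.001)
```

With p = q the two parts cancel to zero, which is why ignoring P(t | R) (as idf-only weighting does) throws away the entire relevance signal.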

Main Points

• Necessity is as important as idf (theory)

• Explains behavior of IR models (practice)

• Can be predicted

• Performance gain

Page 4:

Without Necessity

• The emphasis problem for idf-only term weighting
  – Emphasizes high-idf terms in the query

• “prognosis/viability of a political third party in U.S.” (Topic 206)


Page 5:

Ground Truth

Term          party    political   third    viability   prognosis
True P(t|R)   0.9796   0.7143      0.5918   0.0408      0.0204
idf           2.402    2.513       2.187    5.017       7.471

(“Emphasis”: the highest-idf terms, viability and prognosis, have the lowest necessity. TREC 4 topic 206.)

Page 6:

Indri Top Results

1. (ZF32-220-147) Recession concerns lead to a discouraging prognosis for 1991

2. (AP880317-0017) Politics … party … Robertson's viability as a candidate

3. (WSJ910703-0174) political parties …

4. (AP880512-0050) there is no viable opposition …

5. (WSJ910815-0072) A third of the votes

6. (WSJ900710-0129) politics, party, two thirds

7. (AP880729-0250) third ranking political movement…

8. (AP881111-0059) political parties

9. (AP880224-0265) prognosis for the Sunday school

10. (ZF32-051-072) third party provider

(Google and Bing also return false positives in their top 10. Emphasis is a problem for large search engines too!)

Page 7:

Without Necessity

• The emphasis problem for idf-only term weighting– Emphasize high idf terms in query

• “prognosis/viability of a political third party in U.S.” (Topic 206)

– False positives throughout the rank list
  • especially detrimental at top ranks

– Lacking term recall hurts precision at all recall levels
– (True for BIM, and also for BM25 and LM, which use tf.)

• How significant is the emphasis problem?

Page 8:

Failure Analysis of 44 Topics from TREC 6-8
RIA workshop 2003 (7 top research IR systems, >56 expert-weeks):
Emphasis 64%, Mismatch 27%, Precision 9%

Necessity term weighting

Necessity guided expansion

Basis: Term Necessity Prediction

& Bigrams, & Term restriction using doc fields

Main Points

• Necessity is as important as idf (theory)

• Explains behavior of IR models (practice)

• Can be predicted

• Performance gain

Page 9:

Given True Necessity

• +100% over BIM (in precision at all recall levels)
  • [Robertson and Spärck Jones 1976]

• +30-80% over Language Model, BM25 (in MAP)
  • This work

• For a new query without relevance judgements, necessity must be predicted.
  – Predictions don’t need to be very accurate to show performance gain.

Page 10:

How Necessary are Words? (Examples from TREC 3 topics)

Term in Query                                                               P(t | R)
Oil Spills                                                                  0.9914
Term limitations for US Congress members                                    0.9831
Insurance Coverage which pays for Long Term Care                            0.6885
School Choice Voucher System and its effects on the US educational program  0.2821
Vitamin the cure or cause of human ailments                                 0.1071

Page 11:

Mismatch Statistics

• Mismatch variation across terms

(TREC 3 title) (TREC 9 desc)

– Not constant, need prediction

(Bar charts of word necessity, 0 to 1, per query term, for TREC 3 title and TREC 9 desc queries; labeled examples include stock, compute, voucher, fundamentalism.)

Page 12:

Mismatch Statistics (2)

• Mismatch variation for the same term in different queries

TREC 3 recurring words

– Query-dependent features are needed
  (1/3 of term occurrences have necessity variation > 0.1)

Page 13:

Prior Prediction Approaches

• Croft/Harper combination match (1979)
  – treats P(t | R) as a tuned constant
  – when > 0.5, rewards docs that match more query terms

• Greiff’s (1998) exploratory data analysis
  – used idf to predict overall term weighting
  – improved over BIM

• Metzler’s (2008) generalized idf
  – used idf to predict P(t | R)
  – improved over BIM

• Years of the simple idf feature, limited success
  – Missing piece: P(t | R) = term necessity = term recall

Page 14:

Factors that Affect Necessity

What causes a query term to not appear in relevant documents?

• Topic Centrality (Concept Necessity)
  – E.g., Laser research related or potentially related to US defense; Welfare laws propounded as reforms

• Synonyms
  – E.g., movie == film == …

• Abstractness
  – E.g., Ailments in the vitamin query; Dog Maulings; Christian Fundamentalism
  – Worst case: a rare and abstract term, e.g. prognosis

Page 15:

Features

• We need to
  – identify synonyms/searchonyms of a query term
  – in a query-dependent way

• Use thesauri?
  – Biased (not collection dependent)
  – Static (not query dependent)
  – Not promising, not easy

• Term-term similarity in concept space!
  – Local LSI (Latent Semantic Indexing)
    • LSI of the (e.g. 200) top-ranked documents
    • keep (e.g. 150) dimensions
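The local-LSI step can be sketched as follows: take the term-document matrix of the top-ranked documents, reduce it to k dimensions via SVD, and compare terms in the reduced concept space. The matrix below is a toy stand-in for the real top-ranked documents (the slide suggests roughly 200 documents and 150 dimensions).

```python
import numpy as np

def local_lsi_similarities(term_doc, k):
    """term_doc: terms x docs count matrix. Returns the term-term cosine
    similarity matrix after rank-k dimension reduction (local LSI)."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    concept = U[:, :k] * s[:k]          # term vectors in concept space
    norms = np.linalg.norm(concept, axis=1, keepdims=True)
    norms[norms == 0] = 1.0             # guard against zero vectors
    unit = concept / norms
    return unit @ unit.T

# Toy matrix: 4 terms x 5 "top-ranked" docs; terms 0 and 1 co-occur often.
X = np.array([[1, 1, 0, 1, 0],
              [1, 1, 0, 1, 1],
              [0, 0, 1, 0, 0],
              [0, 0, 1, 0, 1]], dtype=float)
S = local_lsi_similarities(X, k=2)
```

The Synonymy feature on the next slide would then average, for each query term, the similarity scores of its top 5 most similar terms in S.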

Page 16:

Features

• Topic Centrality
  – Length of the term vector after dimension reduction (local LSI)

• Synonymy (Concept Necessity)
  – Average similarity score of the top 5 similar terms

• Replaceability
  – Adjusts the Synonymy measure by how many new documents the synonyms match

• Abstractness
  – Users modify abstract terms with concrete terms
    • effects on the US educational program
    • prognosis of a political third party

Page 17:

Experiments

• Necessity Prediction Error
  – Regression problem
    • Model: RBF kernel regression, M: <f1, f2, …, f5> → P(t | R)

• Necessity for Term Weighting
  – End-to-end retrieval performance
  – How to weight terms by their necessity
    • In BM25
      – Binary Independence Model
    • In Language Models
      – Relevance model Pm(t | R), multinomial (Lavrenko and Croft 2001)
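The model M maps the five features to necessity. A minimal sketch of RBF kernel regression in Nadaraya-Watson form; the feature vectors and targets below are made up, and the paper's exact learner and training setup may differ.

```python
import numpy as np

def rbf_kernel_regress(train_X, train_y, x, gamma=1.0):
    """Predict y at x as an RBF-kernel-weighted average of training targets."""
    d2 = np.sum((train_X - x) ** 2, axis=1)   # squared distances to train points
    w = np.exp(-gamma * d2)                   # RBF kernel weights
    return float(np.dot(w, train_y) / np.sum(w))

# Toy training data: 5-dim feature vectors (centrality, synonymy,
# replaceability, abstractness, idf; values illustrative) with known P(t | R).
train_X = np.array([[0.9, 0.8, 0.7, 0.1, 0.2],
                    [0.2, 0.1, 0.3, 0.9, 0.8],
                    [0.5, 0.5, 0.5, 0.5, 0.5]])
train_y = np.array([0.95, 0.10, 0.55])

# A query term whose features resemble the first training example.
pred = rbf_kernel_regress(train_X, train_y, np.array([0.85, 0.75, 0.7, 0.15, 0.25]))
```

Since the prediction is a convex combination of training targets, it stays inside [min(y), max(y)], which is convenient for a probability like P(t | R).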

Page 18:

Necessity Prediction Example

Term          party    political   third    viability   prognosis
True P(t|R)   0.9796   0.7143      0.5918   0.0408      0.0204
Predicted     0.7585   0.6523      0.6236   0.3080      0.2869

(Emphasis terms: viability, prognosis. Trained on TREC 3, tested on TREC 4.)

Page 19:

Necessity Prediction Error

(Bar chart: average absolute error (L1 loss) on TREC 4, lower is better, for five predictors: Average (constant), IDF only, All 5 features, + tuning meta-parameters, + TREC 3 recurring words. Errors range from about 0.35 down toward 0.)

Main Points

• Necessity is as important as idf

• Explains behavior of IR models

• Can be predicted

• Performance gain

Page 20:

Predicted Necessity Weighting

TREC train sets         3        3-5      3-7      7
Test/x-validation       4        6        8        8
LM desc, Baseline       0.1789   0.1586   0.1923   0.1923
LM desc, Necessity      0.2261   0.1959   0.2314   0.2333
Improvement             26.38%   23.52%   20.33%   21.32%

P@10, Baseline          0.4160   0.2980   0.3860   0.3860
P@10, Necessity         0.4940   0.3420   0.4220   0.4380

P@20, Baseline          0.3450   0.2440   0.3310   0.3310
P@20, Necessity         0.4180   0.2900   0.3540   0.3610

10-25% gain in MAP (necessity weighting); 10-20% gain in top precision.

Page 21:

TREC train sets         3-9      9        11       13
Test/x-validation       10       10       12       14
LM desc, Baseline       0.1627   0.1627   0.0239   0.1789
LM desc, Necessity      0.1813   0.1810   0.0597   0.2233
Improvement             11.43%   11.25%   149.8%   24.82%

P@10, Baseline          0.3180   0.3180   0.0200   0.4720
P@10, Necessity         0.3280   0.3400   0.0467   0.5360

P@20, Baseline          0.2400   0.2400   0.0211   0.4460
P@20, Necessity         0.2790   0.2810   0.0411   0.5030

Predicted Necessity Weighting (ctd.)

Main Points

• Necessity is as important as idf

• Explains behavior of IR models

• Can be predicted

• Performance gain

Page 22:

vs. Relevance Model

Test/x-validation           4        6        8        8        10       10       12       14
Relevance Model desc        0.2423   0.1799   0.2352   0.2352   0.1888   0.1888   0.0221   0.1774
RM reweight-Only desc       0.2215   0.1705   0.2435   0.2435   0.1700   0.1700   0.0692   0.1945
RM reweight-Trained desc    0.2330   0.1921   0.2542   0.2563   0.1809   0.1793   0.0534   0.2258

Weight Only ≈ Expansion
Supervised > Unsupervised (5-10%)

Relevance Model:
#weight( (1-λ) #combine( t1 t2 )
             λ #weight( w1 t1  w2 t2  w3 t3  … ) )

x ~ y; w1 ~ P(t1 | R), w2 ~ P(t2 | R), …
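A sketch of how predicted necessities could be assembled into an Indri query of the form above: original terms under #combine, expansion terms weighted by predicted P(t | R), mixed with weight λ. The helper name and the weight values here are illustrative, not from the paper.

```python
def necessity_weighted_query(query_terms, weighted_terms, lam=0.5):
    """Build an Indri #weight query mixing the plain query (weight 1-lam)
    with necessity-weighted terms (weight lam)."""
    combine = "#combine( " + " ".join(query_terms) + " )"
    weighted = "#weight( " + " ".join(
        f"{w:.3f} {t}" for t, w in weighted_terms) + " )"
    return f"#weight( {1 - lam:.2f} {combine} {lam:.2f} {weighted} )"

q = necessity_weighted_query(
    ["third", "party"],
    [("third", 0.62), ("party", 0.76), ("politics", 0.40)],
    lam=0.5)
# q is a single Indri query string ready to submit to the retrieval engine.
```

Setting λ = 1 reduces to pure necessity reweighting of the (possibly expanded) term set; λ = 0 recovers the plain query.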

Page 23:

Take Home Messages

• Necessity is as important as idf (theory)

• Explains behavior of IR models (practice)

• Effective features can predict necessity

• Performance gain

Page 24:

Acknowledgements

• Reviewers from multiple venues

• Ni Lao, Frank Lin, Yiming Yang, Stephen Robertson, Bruce Croft, Matthew Lease
  – Discussions & references

• David Fisher, Mark Hoy
  – Maintaining the Lemur toolkit

• Andrea Bastoni and Lorenzo Clemente
  – Maintaining LSI code for the Lemur toolkit

• SVM-light, Stanford parser

• TREC
  – All the data

• NSF Grants IIS-0707801 and IIS-0534345

Feedback: Le Zhao ([email protected])