Tie-Breaking Bias: Effect of an Uncontrolled Parameter on Information Retrieval Evaluation
Guillaume Cabanac, Gilles Hubert, Mohand Boughanem, Claude Chrisment
CLEF’10: Conference on Multilingual and Multimodal Information Access Evaluation, September 20–23, Padua, Italy


Page 1: CLEF 2010 - Tie-Breaking Bias: Effect of an Uncontrolled Parameter on Information Retrieval Evaluation

Tie-Breaking Bias: Effect of an Uncontrolled Parameter on Information Retrieval Evaluation

Guillaume Cabanac, Gilles Hubert, Mohand Boughanem, Claude Chrisment

CLEF’10: Conference on Multilingual and Multimodal Information Access Evaluation
September 20–23, Padua, Italy

Page 2

Outline

1. Motivation: A tale about two TREC participants
2. Context: IRS effectiveness evaluation
   Issue: Tie-breaking bias effects
3. Contribution: Reordering strategies
4. Experiments: Impact of the tie-breaking bias
5. Conclusion and Future Work

Effect of the Tie-Breaking Bias G. Cabanac et al.

Page 3: Outline (repeat)

Page 4

A tale about two TREC participants (1/2)

Topic 031 “satellite launch contracts”: 5 relevant documents

Chris: C = ( , 0.8), ( , 0.8), ( , 0.5)
Ellen: E = ( , 0.8), ( , 0.8), ( , 0.5)

One single difference, yet why such a huge difference? One participant is unlucky, the other lucky.

Page 5

A tale about two TREC participants (2/2)

Chris: C = ( , 0.8), ( , 0.8), ( , 0.5)
Ellen: E = ( , 0.8), ( , 0.8), ( , 0.5)

Only difference: the name of one document
After 15 days of hard work…

Page 6: Outline (repeat)

Page 7

Measuring the effectiveness of IRSs

User-centered vs. system-focused evaluation [Spärck Jones & Willett, 1997]

Evaluation campaigns:
1958 Cranfield (UK)
1992 TREC, Text REtrieval Conference (USA)
1999 NTCIR, NII Test Collection for IR Systems (Japan)
2001 CLEF, Cross-Language Evaluation Forum (Europe)

“Cranfield” methodology: a task, a test collection (corpus, topics, qrels), and measures (MAP, P@X, …) computed using trec_eval


[Voorhees, 2007]

Page 8

Runs are reordered prior to their evaluation

Qrels = ⟨qid, iter, docno, rel⟩    Run = ⟨qid, iter, docno, rank, sim, run_id⟩

( , 0.8), ( , 0.8), ( , 0.5)

Reordering by trec_eval: qid asc, sim desc, docno desc

( , 0.8), ( , 0.8), ( , 0.5)

Effectiveness measure (MAP, P@X, MRR, …) = f(intrinsic_quality, luck)

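The reordering on this slide can be sketched in Python. This is an illustrative reimplementation of the sort order trec_eval applies before scoring, not the tool's actual C code; the run layout and docnos below are assumptions for the example.

```python
# Sketch of trec_eval's pre-evaluation reordering (illustrative, not the
# real implementation). A run entry here is a (qid, docno, sim) tuple.
# Python's sort is stable, so sorting in passes from the least significant
# key to the most significant one yields: qid asc, sim desc, docno desc.
def trec_eval_order(run):
    run = sorted(run, key=lambda r: r[1], reverse=True)  # docno desc (tie-breaker)
    run = sorted(run, key=lambda r: r[2], reverse=True)  # sim desc
    run = sorted(run, key=lambda r: r[0])                # qid asc
    return run

# Hypothetical run: two documents tied at 0.8 for topic 031.
run = [("031", "AP880212-0001", 0.8),
       ("031", "WSJ870108-0012", 0.8),
       ("031", "FR890101-0003", 0.5)]
# Among the tied documents, "WSJ..." > "AP..." alphabetically, so the
# docno-descending tie-break ranks the WSJ document first.
ordered = trec_eval_order(run)
```

The multi-pass stable sort mirrors the `qid asc, sim desc, docno desc` clause directly: the last pass applied is the most significant key, and ties within it keep the order produced by the earlier passes.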

Page 9: Outline (repeat)

Page 10

Consequences of run reordering

Measures of effectiveness for an IRS s:
RR(s,t): reciprocal rank, 1/rank of the 1st relevant document, for topic t
P(s,t,d): precision at document d, for topic t
AP(s,t): average precision for topic t
MAP(s): mean average precision

Tie-breaking bias

Is the Wall Street Journal collection more relevant than Associated Press?

Problem 1: comparing 2 systems, AP(s1, t) vs. AP(s2, t) (Chris vs. Ellen)
Problem 2: comparing 2 topics, AP(s, t1) vs. AP(s, t2)

Both are sensitive to document rank.
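Because all of these measures depend on document rank, a one-position shift caused by tie-breaking changes their values. A minimal sketch of the measures (invented function names, not the trec_eval implementation) makes this concrete:

```python
# Minimal sketch of the rank-based measures on this slide (illustrative).
# `ranked` is a result list already in evaluation order; `relevant` is the
# set of docnos judged relevant in the qrels for that topic.
def rr(ranked, relevant):
    """Reciprocal rank: 1/rank of the first relevant document (0 if none)."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def ap(ranked, relevant):
    """Average precision: mean of P@rank taken at each relevant document."""
    hits, score = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            score += hits / rank  # precision at this relevant document
    return score / len(relevant) if relevant else 0.0

def mean_ap(run_per_topic, qrels):
    """MAP: AP averaged over all topics of the campaign."""
    return sum(ap(run_per_topic[t], qrels[t]) for t in qrels) / len(qrels)

# The tale of Chris and Ellen: identical scores, one docno differs, and the
# relevant document lands at rank 1 in one run but rank 2 in the other.
assert rr(["rel_doc", "other"], {"rel_doc"}) == 1.0
assert rr(["other", "rel_doc"], {"rel_doc"}) == 0.5
```

The two assertions show the bias in miniature: the same retrieval quality yields RR = 1.0 or RR = 0.5 depending solely on where the tie-break places the relevant document.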

Page 11

Alternative unbiased reordering strategies

Conventional reordering (TREC): tied (ex aequo) documents sorted Z→A; qid asc, sim desc, docno desc

Realistic reordering: relevant docs last among ties; qid asc, sim desc, rel asc, docno desc

Optimistic reordering: relevant docs first among ties; qid asc, sim desc, rel desc, docno desc

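The three strategies differ only in their sort keys, so they can be sketched with stable sorts (illustrative Python; the (docno, sim, rel) tuple layout is an assumption, with rel = 1 for judged-relevant documents, and qid is omitted since the example covers a single topic):

```python
# Sketch of the three reordering strategies (illustrative, single-topic).
# Stable sorting in passes, least significant key first, most significant last.
def _multisort(run, passes):
    for key, reverse in reversed(passes):
        run = sorted(run, key=key, reverse=reverse)
    return run

def conventional(run):   # TREC: sim desc, then docno desc among ties
    return _multisort(run, [(lambda r: r[1], True), (lambda r: r[0], True)])

def realistic(run):      # relevant docs last among ties (lower bound on AP)
    return _multisort(run, [(lambda r: r[1], True),
                            (lambda r: r[2], False),
                            (lambda r: r[0], True)])

def optimistic(run):     # relevant docs first among ties (upper bound on AP)
    return _multisort(run, [(lambda r: r[1], True),
                            (lambda r: r[2], True),
                            (lambda r: r[0], True)])

# Hypothetical tied group: AP-1 is relevant, WSJ-9 is not, both scored 0.8.
ties = [("AP-1", 0.8, 1), ("WSJ-9", 0.8, 0), ("FR-3", 0.5, 0)]
# conventional and realistic both rank WSJ-9 before the relevant AP-1 here;
# optimistic pulls AP-1 above WSJ-9 within the tied group.
```

Comparing the conventional ordering against the realistic and optimistic ones bounds how much of a run's measured effectiveness is attributable to tie-breaking luck rather than to the ranking function itself.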

Page 12: Outline (repeat)

Page 13

Effect of the tie-breaking bias

Study of 4 TREC tasks: 22 editions, 1360 runs

Assessing the effect of tie-breaking:
Proportion of document ties: how frequent is the bias?
Effect on measure values: top 3 observed differences; observed difference in %
Significance of the observed difference: Student’s t-test (paired, one-tailed)

[Timeline: the adhoc, routing, filtering, and web tasks, spanning 1993–2009; 3 GB of data from trec.nist.gov]

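The significance test named above, a paired one-tailed Student's t-test over per-run measure values, reduces to a short formula. A pure-Python sketch (the function name and the sample AP values below are invented for illustration):

```python
import math
from statistics import mean, stdev

# Paired Student's t statistic, as used on this slide to test whether two
# reorderings of the same runs yield significantly different measure values.
def paired_t_statistic(xs, ys):
    """t = mean(d) / (sd(d) / sqrt(n)), where d are the paired differences."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Invented per-run AP values under two reorderings of the same five runs:
ap_conventional = [0.31, 0.28, 0.40, 0.35, 0.22]
ap_realistic    = [0.30, 0.26, 0.39, 0.33, 0.21]
t = paired_t_statistic(ap_conventional, ap_realistic)
# Compare t against the critical value of Student's t distribution with
# n-1 degrees of freedom at the chosen one-tailed significance level.
```

Pairing is what makes the test appropriate here: both measure values come from the same run, so only the within-run difference induced by the reordering is tested.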

Page 14

Ties demographics

89.6% of the runs contain ties

Ties are present throughout the runs


Page 15

Proportion of tied documents in submitted runs

On average, a tied group contains 10.6 documents; on average, 25.2% of a result list consists of tied documents


Page 16

Effect on Reciprocal Rank (RR)


Page 17

Effect on Average Precision (AP)


Page 18

Effect on Mean Average Precision (MAP)

Difference in system rankings computed on MAP not significant (Kendall’s τ)


Page 19

What we learnt: Beware of tie-breaking for AP

Small effect on MAP, larger effect on AP

Measure bounds: AP_realistic ≤ AP_conventional ≤ AP_optimistic

Failure analysis for the ranking process: the error bar is an element of chance, i.e., potential for improvement (illustrated on run padre1, adhoc’94)

Page 20

Related works in IR evaluation

[Voorhees, 2007]

Topics reliability? [Buckley & Voorhees, 2000] 25 topics; [Voorhees & Buckley, 2002] error rate; [Voorhees, 2009] n collections

Qrels reliability? [Voorhees, 1998] quality; [Al-Maskari et al., 2008] TREC vs. TREC

Measures reliability? [Buckley & Voorhees, 2000] MAP; [Sakai, 2008] ‘system bias’; [Moffat & Zobel, 2008] new measures; [Raghavan et al., 1989] Precall; [McSherry & Najork, 2008] tied scores

Pooling reliability? [Zobel, 1998] approximation; [Sanderson & Joho, 2004] manual; [Buckley et al., 2007] size adaptation; [Cabanac et al., 2010] tie-breaking bias


Page 21: Outline (repeat)

Page 22

Conclusions and future work

Context: IR evaluation (TREC and other campaigns based on trec_eval)

Contributions:
Measure = f(intrinsic_quality, luck): the tie-breaking bias
Measure bounds (realistic ≤ conventional ≤ optimistic)

Study of the tie-breaking bias effect (conventional vs. realistic) for RR, AP, and MAP

Strong correlation, yet significant difference

No difference on system rankings (based on MAP)

Future work:
Study of other / more recent evaluation campaigns
Reordering-free measures
Finer-grained analyses: finding vs. ranking

Page 23

Thank you

CLEF’10: Conference on Multilingual and Multimodal Information Access Evaluation
September 20–23, Padua, Italy