WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION

Page 1: WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION

WXGB6106 INFORMATION RETRIEVAL

Week 3 RETRIEVAL EVALUATION

Page 2: WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION

INTRODUCTION

Evaluation is necessary.

Why evaluate?

What to evaluate?

How to evaluate?

Page 3: WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION

WHY EVALUATE

Need to know the advantages and disadvantages of using a particular IRS. The user should be able to decide whether he/she wants to use an IRS based on evaluation results.

The user should also be able to decide whether it is cost-effective to use a particular IRS based on evaluation results.

Page 4: WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION

WHAT TO EVALUATE

What can be measured, and should reflect the ability of the IRS to satisfy user needs:

Coverage of the system – to what extent the IRS includes relevant material

Time lag – average interval between the time the user query request is made and the time taken to obtain an answer set

Form of presentation of output

Effort involved on the part of the user in getting answers to his/her query request

Recall of the IRS – % of relevant material actually retrieved in the answer to a query request

Precision of the IRS – % of retrieved material that is actually relevant

Page 5: WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION

HOW TO EVALUATE?

Various methods available.

Page 6: WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION

EVALUATION

2 main processes in IR:

User query request / information query / query retrieval strategy / search request

Answer set / hits

Need to know whether the documents retrieved in the answer set fulfil the user query request. This evaluation process is known as retrieval performance evaluation. Evaluation is based on 2 main components:

Test reference collection

Evaluation measure

Page 7: WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION

EVALUATION

A test reference collection consists of:

A collection of documents

A set of example information requests

A set of relevant documents (provided by specialists) for each information request

2 interrelated measures – RECALL and PRECISION

Page 8: WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION

RETRIEVAL PERFORMANCE EVALUATION

Relevance, Recall and Precision

Parameters defined:

I = information request
R = set of relevant documents
|R| = number of documents in this set
A = document answer set retrieved by the information request
|A| = number of documents in this set
|Ra| = number of documents in the intersection of sets R and A

Page 9: WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION

RETRIEVAL PERFORMANCE EVALUATION

Recall = fraction of the relevant documents (set R) which have been retrieved:

R = |Ra| / |R|

Precision = fraction of the retrieved documents (set A) which are relevant:

P = |Ra| / |A|
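
A minimal sketch of these two formulas in Python (the names R, A and Ra follow the slide's notation; the sample document IDs below are made up purely for illustration):

# Recall and precision for a single information request,
# using the slide's notation: R = relevant set, A = answer set, Ra = intersection of R and A.

def recall(relevant, answer):
    # |Ra| / |R| : fraction of the relevant documents that were retrieved
    return len(relevant & answer) / len(relevant)

def precision(relevant, answer):
    # |Ra| / |A| : fraction of the retrieved documents that are relevant
    return len(relevant & answer) / len(answer)

# Hypothetical example data, not from the slides
R = {"d3", "d9", "d25"}           # relevant documents (judged by specialists)
A = {"d9", "d25", "d84", "d6"}    # answer set returned by the IRS

print(recall(R, A))     # 2/3 = 0.67
print(precision(R, A))  # 2/4 = 0.50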

Page 10: WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION

[Venn diagram: the document collection, the relevant documents |R|, the answer set |A|, and the relevant documents in the answer set |Ra| – Precision and Recall for a given example information request]

Page 11: WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION

RETRIEVAL PERFORMANCE EVALUATION

Recall and precision are expressed as percentages.

Retrieved documents are sorted by degree of relevance (ranking).

User will see a ranked list.

Page 12: WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION

RETRIEVAL PERFORMANCE EVALUATION

a. 10 documents in an IRS with a collection of 100 documents have been identified by specialists as being relevant to a particular query request: d3, d5, d9, d25, d39, d44, d56, d71, d89, d123.

b. A query request was submitted, and the following documents were retrieved and ranked according to relevance.

Page 13: WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION

RETRIEVAL PERFORMANCE EVALUATION

1. d123*
2. d84
3. d56*
4. d6
5. d8
6. d9*
7. d511
8. d129
9. d187
10. d25*
11. d38
12. d48
13. d250
14. d113
15. d3*

Page 14: WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION

RETRIEVAL PERFORMANCE EVALUATION

c. Only 5 of the retrieved documents (d123, d56, d9, d25, d3) are relevant to the query and match the ones in (a).

Page 15: WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION

d123 ranked 1st: R = 1/10 x 100% = 10%, P = 1/1 x 100% = 100%

d56 ranked 3rd: R = 2/10 x 100% = 20%, P = 2/3 x 100% = 66%

d9 ranked 6th: R = 3/10 x 100% = 30%, P = 3/6 x 100% = 50%

d25 ranked 10th: R = 4/10 x 100% = 40%, P = 4/10 x 100% = 40%

d3 ranked 15th: R = 5/10 x 100% = 50%, P = 5/15 x 100% = 33%
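
The same stepwise calculation can be reproduced with a short Python sketch over the ranked list from pages 12–15 (document IDs and counts taken directly from the example):

# Recall and precision at each rank where a relevant document appears,
# for the worked example: 10 relevant documents, 15 retrieved.

relevant = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
           "d187", "d25", "d38", "d48", "d250", "d113", "d3"]

found = 0
for rank, doc in enumerate(ranking, start=1):
    if doc in relevant:
        found += 1
        r = found / len(relevant) * 100   # recall so far
        p = found / rank * 100            # precision at this rank
        print(f"{doc} ranked {rank}: R = {r:.0f}%, P = {p:.0f}%")

# Prints 10%/100%, 20%/67%, 30%/50%, 40%/40%, 50%/33% at ranks 1, 3, 6, 10, 15.

(Rounding gives 67% at rank 3, where the slide truncates 2/3 to 66%.)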

Page 16: WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION

A = relevant documents
Ā = non-relevant documents
C = retrieved documents
Ĉ = not retrieved documents
N = total number of documents in the system

Contingency table:

               Relevant   Non-relevant
Retrieved      A ∩ C      Ā ∩ C
Not retrieved  A ∩ Ĉ      Ā ∩ Ĉ

Page 17: WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION

RETRIEVAL PERFORMANCE EVALUATION

Contingency table for the example:

N = 100
A = 10, Ā = 90
C = 15, Ĉ = 85

               Relevant      Non-relevant
Retrieved      5             15 - 5 = 10
Not retrieved  10 - 5 = 5    100 - 10 - 10 = 80

Recall = 5/10 x 100% = 50%, Precision = 5/15 x 100% = 33%
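
As a rough sketch, the same contingency-table arithmetic in Python (the variable names mirror the slide's A, Ā, C, Ĉ; only the four counts above are used):

# Contingency-table view of the example: N = 100 documents,
# |A| = 10 relevant, |C| = 15 retrieved, 5 of the retrieved are relevant.

N = 100        # total documents in the system
A = 10         # relevant documents
C = 15         # retrieved documents
A_and_C = 5    # relevant and retrieved

notA_and_C = C - A_and_C            # retrieved but non-relevant      -> 10
A_and_notC = A - A_and_C            # relevant but not retrieved      -> 5
notA_and_notC = N - A - notA_and_C  # neither relevant nor retrieved  -> 80

recall = A_and_C / A * 100          # 50%
precision = A_and_C / C * 100       # 33.3%
print(recall, precision)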

Page 18: WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION

OTHER ALTERNATIVE MEASURES

Harmonic mean – a single measure which combines R and P (see the sketch after this list)

E measure – a single measure which combines R and P; the user specifies whether he/she is more interested in R or in P

User-oriented measures – based on the user's interpretation of which documents are relevant and which are not

Expected search length

Satisfaction – focuses only on relevant docs

Frustration – focuses only on non-relevant docs
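
The slide gives no formulas for the first two measures; as an assumption, the usual textbook definitions are the harmonic mean F = 2PR / (P + R) and the E measure E_b = 1 - (1 + b^2)PR / (b^2 P + R), where b weights the user's relative interest in R versus P. A minimal Python sketch under that assumption:

# Harmonic mean (F) and E measure combining recall r and precision p.
# Formulas assumed from the standard definitions, not stated on the slide.

def harmonic_mean(r, p):
    # Single value combining recall and precision; 0 if both are 0
    return 2 * p * r / (p + r) if (p + r) else 0.0

def e_measure(r, p, b=1.0):
    # b lets the user weight recall against precision; b = 1 gives 1 - F
    denom = b * b * p + r
    return 1 - (1 + b * b) * p * r / denom if denom else 1.0

print(harmonic_mean(0.5, 0.33))      # ~0.40 for the running example (R = 50%, P = 33%)
print(e_measure(0.5, 0.33, b=1.0))   # ~0.60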

Page 19: WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION

REFERENCE COLLECTION

Experimentation in IR is done on test collections. Example of a test collection:

1. A yearly conference known as TREC – Text Retrieval Conference. Dedicated to experimentation with a large test collection of over 1 million documents; testing is time consuming. For each TREC conference, a set of reference experiments is designed, and research groups use these reference experiments to compare their IRSs.

TREC NIST site – http://trec.nist.gov

Page 20: WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION
Page 21: WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION

REFERENCE COLLECTION

The collection is known as TIPSTER. The TIPSTER/TREC test collection is composed of:

Documents

A set of example information requests, or topics

A set of relevant documents for each example information request

Page 22: WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION

OTHER TEST COLLECTIONS

ADI – documents on information science

CACM – computer science

INSPEC – abstracts on electronics, computer and physics

ISI – library science

Medlars – medical articles

Developed by E.A. Fox for his PhD thesis at Cornell University, Ithaca, New York, in 1983 – Extending the Boolean and vector space models of information retrieval with p-norm queries and multiple concept types – http://www.ncstrl.org