proximity within paragraph: a measure to enhance document retrieval performance

2
PROXIMITY WITHIN PARAGRAPH: A MEASURE TO ENHANCE DOCUMENT RETRIEVAL PERFORMANCE Srisupa Palakvangsa-Na- Ayudhya John A. Keane School of Informatics, The University of Manchester, UK QUERY = Informatics and Manchester and Course How to calculate a score? Informat ics Manchest er Course Block 1 = 3 Informat ics Block 2 = 1/3 Informat ics Course Block 3 = 2/3 44 . 0 3 3 3 3 / 2 3 / 1 3 Score List Returned 1.Doc R: 0.85 2.Doc M: 0.71 3.Doc L: 0.69 4.Doc A: 0.56 1) Assign weights to each queried word 2) Split a document into blocks 3) Calculate a similarity score 4) Order documents by their scores

Upload: erica-hardin

Post on 31-Dec-2015

21 views

Category:

Documents


0 download

DESCRIPTION

3) Calculate a similarity score. 1) Assign weights to each queried word. 4) Order documents by their scores. 2) Split a document into blocks. Block 1. Informatics Manchester Course. = 3. Block 2. Informatics. = 1/3. Block 3. Informatics Course. = 2/3. John A. Keane. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: PROXIMITY WITHIN PARAGRAPH: A MEASURE TO ENHANCE DOCUMENT RETRIEVAL PERFORMANCE

PROXIMITY WITHIN PARAGRAPH: A MEASURE TO ENHANCE DOCUMENT

RETRIEVAL PERFORMANCESrisupa Palakvangsa-Na-Ayudhya John A. Keane

School of Informatics, The University of Manchester, UK

QUERY = Informatics and Manchester and Course

How to calculate a score?

Informatics Manchester

Course

Block 1

= 3

Informatics

Block 2

= 1/3

Informatics Course

Block 3

= 2/3

44.0333

3/23/13Score

List Returned

1.Doc R: 0.85

2.Doc M: 0.71

3.Doc L: 0.69

4.Doc A: 0.56

1) Assign weights to each queried word

2) Split a document into blocks

3) Calculate a similarity score

4) Order documents by their scores

Page 2: PROXIMITY WITHIN PARAGRAPH: A MEASURE TO ENHANCE DOCUMENT RETRIEVAL PERFORMANCE

This work was supported by a PhD studentship from the Royal Thai Government

Experimental Setup Compared with Minimum Distance Between Queried Pair (MQP) Employed three test sets based on WT10G data set and 50 queries (Topics 501-550) Used two criteria: the average interpolated precision-recall (AvgPrec) and the average Discounted Cumulative Gain (AvgDCG).

Results

Test Set 1

0

2

4

6

8

10

12

Rank

Av

gD

CG

.

PWP

MQP

Test Set 2

0

2

4

6

8

10

12

Rank

Av

gD

CG

.

PWP

MQP

Test Set 3

0123456789

Rank

Av

gD

CG

.

PWP

MQP

Test Set 1

0

10

20

30

40

50

60

70

0 10 20 30 40 50 60 70 80 90 100

Recall

Av

gP

rec

PWP

MQP

Test Set 2

0

10

20

30

40

50

60

0 10 20 30 40 50 60 70 80 90 100

Recall

Av

gP

rec

PWP

MQP

Test Set 3

0

10

20

30

40

50

60

70

0 10 20 30 40 50 60 70 80 90 100

Recall

Av

gP

rec

PWP

MQP

Discussion

AdvantagesPWP MQP

tends to obtain higher AvgDCG values considers all logical blocks more fine-grained

Disadvantages

PWP MQP

considers only the number of

non-queried words between unique

queried words pairs difficult to rank documents as the same

score is given to many documents more coarse-grained

multiple-topics issue likely to obtain lower DCG values than

MQP, due to the use of multiple logical

blocks