
Page 1: Mining Dependency Relations for Query Expansion in Passage Retrieval

Renxu Sun, Chai-Huat Ong, Tat-Seng Chua
National University of Singapore

SIGIR 2006

Page 2: Introduction

Query expansion (QE) is a method for improving the effectiveness of IR
– by providing additional contextual information to the original queries

Traditional passage retrieval algorithms perform a density-based weighting of query terms
– they prefer passages containing query terms that are close together
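To make the idea of density-based weighting concrete, here is a minimal Python sketch. It is a toy scoring function written for illustration, not the retrieval algorithm used in the talk: the function name, the IDF values, and the "matched IDF mass divided by covering span" rule are all assumptions.

```python
def density_score(passage_tokens, query_terms, idf):
    """Toy density score: IDF mass of the matched query terms divided by
    the width of the token span from the first match to the last match."""
    positions = [i for i, tok in enumerate(passage_tokens) if tok in query_terms]
    if not positions:
        return 0.0
    matched = {passage_tokens[i] for i in positions}
    mass = sum(idf.get(term, 1.0) for term in matched)
    span = positions[-1] - positions[0] + 1
    return mass / span


# Hypothetical query and IDF values, chosen only for the demonstration.
query = {"apple", "founded"}
idf = {"apple": 2.0, "founded": 3.0}
close = "apple was founded in 1976".split()
far = "apple sells phones and was long ago founded".split()

print(density_score(close, query, idf))
print(density_score(far, query, idf))
```

Both passages contain the same query terms, but the first scores higher because the terms sit closer together.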

Page 3: Introduction

Local Context Analysis (LCA) [Croft, 1996]
– a common QE technique based on term co-occurrence statistics
– utilizes only statistical information instead of semantic information
– unable to differentiate between noisy and good-quality expansion terms

Page 4: Introduction

[Cui et al., 2005]
– The use of a fuzzy dependency relation matching method for passage retrieval
– Significant improvement in MRR over density-based passage retrieval systems
– This work points towards the importance of performing syntactic analysis
– Longer queries benefit more from this method

Query expansion is needed for short queries

Page 5: Introduction

The main contribution of this paper is employing a relation-based model to perform:
– contextual term selection to enhance density-based passage retrieval
– relation extraction to enhance the fuzzy dependency relation matching approach

To make the expansion process more robust, it extracts relations and terms from an external corpus (the Web).

Page 6: Query Expansion Based on Dependency Relation

Fig. Framework of Relation Based Query Expansion

Page 7: Dependency Relation Paths from Web Snippets

The Web is considered as a parallel corpus:

1. Send the queries to Google and collect the top k snippets
2. Each sentence is considered as a passage, and each snippet contains 2 sentences on average (k=100, similar to LCA [Croft, 1996])
3. Use Minipar, a dependency grammar parser, to parse the passages.

Page 8: Examples of Parse Tree

Fig. The parse trees of the sample question and sentence. <When, wha, head, purchased> is a relation path. The directions of relations are ignored in experiments.
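A relation path of this kind can be read off a dependency tree by walking the edges between two nodes while ignoring edge direction. The Python sketch below illustrates that on a hand-written toy tree; the edge list, the intermediate node name `E0`, and the function name are assumptions made for the example and are not Minipar output.

```python
from collections import deque


def relation_path(edges, start, end):
    """Return [start, rel_1, ..., rel_m, end] along the tree path between
    two nodes, or None if they are not connected. Direction is ignored."""
    graph = {}
    for head, rel, dep in edges:
        graph.setdefault(head, []).append((dep, rel))
        graph.setdefault(dep, []).append((head, rel))  # undirected view
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, rels = queue.popleft()
        if node == end:
            return [start, *rels, end]
        for nxt, rel in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, rels + [rel]))
    return None


# Toy (head, relation, dependent) triples, invented for illustration.
edges = [
    ("E0", "wha", "When"),
    ("E0", "head", "purchased"),
    ("purchased", "obj", "company"),
]
print(relation_path(edges, "When", "purchased"))
# ['When', 'wha', 'head', 'purchased']
```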

Page 9: Term Expansion for Density Based Retrieval System (1/2)

Ranking candidate expanded terms
– A variant of the formula used in LCA
– Global importance: IDF of the expanded term
– Local importance: the relation path linking to the query term

Adding the top k terms to the original query with weight (1-0.9*i/k)
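The weighting rule (1-0.9*i/k) can be sketched in a few lines of Python. The sketch assumes that i is the 1-based rank of the expansion term, which the slide does not state explicitly; the function name and the example terms are hypothetical.

```python
def expansion_weights(ranked_terms, k):
    """Weight the top-k ranked expansion terms by 1 - 0.9 * i / k,
    where i is assumed to be the 1-based rank of the term."""
    top = ranked_terms[:k]
    return {term: 1.0 - 0.9 * i / k for i, term in enumerate(top, start=1)}


# Hypothetical ranked candidates; only the first k receive a weight.
weights = expansion_weights(["alaska", "russia", "seward", "treaty"], k=3)
for term, weight in weights.items():
    print(term, round(weight, 3))
```

Under this reading the best-ranked term gets a weight close to 1 and the k-th term gets 0.1, so lower-ranked expansion terms influence the query less.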

Page 10: Term Expansion for Density Based Retrieval System (2/2)

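$$
\mathrm{Score}(T_k, Q) = \prod_{t_i \in Q} \left( \delta + \frac{\log_{10}\!\left( \sum_{j=1}^{n} \mathrm{path\_score}(T_k, t_i, j) \right) \cdot idf_{T_k}}{\log_{10}(N)} \right)^{idf_{t_i}}
$$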

where
T_k = the term to be ranked;
idf_{T_k} = max(1.0, log10(N / N_{T_k}));
idf_{t_i} = max(1.0, log10(N / N_{t_i}));
p_j = the jth passage in the passage set P;
score(Rel_i) = the score of an individual relation, which is obtained through training;
δ is set to 0.1 to avoid zero values

$$
\mathrm{path\_score}(T_k, t_i, j) = \sum_{Rel \,\in\, \mathrm{path}(T_k, t_i, j)} \mathrm{score}(Rel), \qquad T_k \in p_j,\; t_i \in p_j
$$
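The term-ranking score on this slide multiplies, over the query terms t_i, a factor of the form (δ + log10(summed path scores) · idf_{T_k} / log10(N)) raised to idf_{t_i}. The Python sketch below follows that reading. Two things are assumptions of the sketch rather than statements from the slide: the log term is clamped to 0 when the summed path score is not above 1 (so the base stays positive), and all names and numbers are invented.

```python
from math import log10


def term_score(path_score_sums, idf_query, idf_term, big_n, delta=0.1):
    """Score a candidate term T_k against a query.

    path_score_sums[t_i]: sum over passages j of path_score(T_k, t_i, j)
    idf_query[t_i]:       idf of query term t_i (used as the exponent)
    idf_term:             idf of the candidate term T_k
    big_n:                the N inside log10(N)
    """
    score = 1.0
    for t_i, total in path_score_sums.items():
        # Assumption: treat the log as 0 when there is little or no path evidence.
        log_part = log10(total) if total > 1.0 else 0.0
        base = delta + log_part * idf_term / log10(big_n)
        score *= base ** idf_query[t_i]
    return score


idf_query = {"a": 1.0, "b": 2.0}  # hypothetical query-term idf values
strong = term_score({"a": 10.0, "b": 10.0}, idf_query, idf_term=2.0, big_n=1000)
weak = term_score({"a": 10.0, "b": 0.0}, idf_query, idf_term=2.0, big_n=1000)
print(strong, weak)
```

A candidate linked to both query terms scores higher than one linked to only one of them, and δ keeps the product from collapsing to zero for the unlinked term.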

Page 11: Relation Based Retrieval Method (RBM)

RBM is used to perform passage re-ranking based on the initial retrieval result obtained by the density based method (DBM).

The similarity between passage S and Q is computed by finding all possible relation path pairs (P_S, P_Q) from S and Q that have the same starting and ending nodes.

The translation probability Prob(P_S|P_Q) is the sum over all possible alignments:

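$$
\mathrm{Prob}(P_S \mid P_Q) = \frac{\varepsilon}{m^{n}} \sum_{a_1=1}^{m} \cdots \sum_{a_n=1}^{m} \prod_{i=1}^{n} P_t\!\left( Rel_i^{(S)} \mid Rel_{a_i}^{(Q)} \right)
$$

where n and m are the lengths of P_S and P_Q.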
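The sum over all alignments can be evaluated directly for short paths, and it also factorizes into a product of per-relation sums, which is the standard shortcut for this kind of alignment model. The Python sketch below computes both and checks that they agree. The relation labels, the translation table `p_t`, and the choice eps=1.0 are made-up values for illustration.

```python
from itertools import product
from math import isclose, prod


def prob_bruteforce(path_s, path_q, p_t, eps=1.0):
    """Sum over every alignment a_1..a_n of relations in P_S to relations in P_Q."""
    n, m = len(path_s), len(path_q)
    total = 0.0
    for alignment in product(range(m), repeat=n):
        total += prod(
            p_t.get((path_s[i], path_q[a]), 0.0) for i, a in enumerate(alignment)
        )
    return eps / m ** n * total


def prob_factored(path_s, path_q, p_t, eps=1.0):
    """Same quantity, using sum-over-alignments == product of per-position sums."""
    n, m = len(path_s), len(path_q)
    per_position = (sum(p_t.get((rs, rq), 0.0) for rq in path_q) for rs in path_s)
    return eps / m ** n * prod(per_position)


# Hypothetical translation probabilities P_t(rel_in_S | rel_in_Q).
p_t = {
    ("subj", "subj"): 0.8, ("subj", "obj"): 0.1,
    ("obj", "subj"): 0.2, ("obj", "obj"): 0.7,
}
path_s, path_q = ["subj", "obj"], ["subj", "obj"]
print(prob_bruteforce(path_s, path_q, p_t), prob_factored(path_s, path_q, p_t))
```

The brute-force version enumerates m^n alignments, so the factored form is the one to use for anything longer than a toy path.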

Page 12: Relation Path Expansion

A technique to be used on top of the fuzzy relation based retrieval [Cui, 2005]

The path expansion technique extracts additional relation paths linking the expanded terms with original query terms.

Select the path associated with T_k that has the maximum path_score(T_k, t, j) to be expanded, weighted by (1-0.9*i/k)

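$$
\mathrm{ex\_path}(T_k) = \left\{ \mathrm{path}(T_k, t, j) \;\middle|\; \mathrm{path\_score}(T_k, t, j) = \max_{t \in Q,\; 1 \le j \le n} \{ \mathrm{path\_score}(T_k, t, j) \} \right\}
$$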
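Selecting the path to expand amounts to an argmax over the candidate paths of an expanded term. A minimal Python sketch, assuming each candidate is stored as a (query term, passage index, relation labels) triple and that a path's score is the sum of its relation scores; the data and names are hypothetical.

```python
def best_expansion_path(candidates, rel_score):
    """Return the candidate (query_term, passage_index, relations) whose
    summed relation score is highest. Ties go to the earliest candidate."""
    def path_score(candidate):
        return sum(rel_score.get(rel, 0.0) for rel in candidate[2])
    return max(candidates, key=path_score)


# Invented relation scores and candidate paths for one expanded term T_k.
rel_score = {"subj": 0.6, "obj": 0.5, "mod": 0.2}
candidates = [
    ("purchased", 0, ["mod"]),
    ("purchased", 3, ["subj", "obj"]),
    ("when", 1, ["mod", "mod"]),
]
print(best_expansion_path(candidates, rel_score))
```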

Page 13: Model Training

Retrieve the top 100 snippets from Google for each Q_i.

A path <Start_Node, Rel_1, …, Rel_m, End_Node> in the snippets is “relevant” if

$$
\mathrm{Start\_Node} \in A_i \quad \text{and} \quad \mathrm{End\_Node} \in Q_i
$$

– The relevant paths are those inferring a useful term to the question.

Employ a unigram language model to train the weight of each relation:

$$
P(Rel_i) = \frac{C_{Rel_i \in \mathrm{relevant\_path}} + 1}{\sum_{j=1}^{N} C_{Rel_j \in \mathrm{relevant\_path}} + N}
$$

$$
\mathrm{score}(Rel_i) = \frac{\log\!\left( C_{Rel_i \in \mathrm{relevant\_path}} + 1 \right)}{\log\!\left( \sum_{j=1}^{N} C_{Rel_j \in \mathrm{relevant\_path}} + N \right)}
$$
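The relation weights can be estimated by counting how often each relation occurs in the relevant paths. The Python sketch below assumes an add-one smoothed unigram estimate and a log-ratio score, with N taken to be the number of distinct relations seen; these are readings of the slide's formulas, not confirmed details, and the training paths are invented.

```python
from collections import Counter
from math import log


def train_relation_scores(relevant_paths):
    """relevant_paths: list of relation-label lists, one per relevant path.
    Returns (prob, score) dictionaries keyed by relation label."""
    counts = Counter(rel for path in relevant_paths for rel in path)
    n_relations = len(counts)      # assumed meaning of N
    total = sum(counts.values())   # occurrences of all relations in relevant paths
    prob = {rel: (c + 1) / (total + n_relations) for rel, c in counts.items()}
    score = {rel: log(c + 1) / log(total + n_relations) for rel, c in counts.items()}
    return prob, score


# Hypothetical relevant paths extracted from training snippets.
paths = [["subj", "obj"], ["subj"], ["mod"]]
prob, score = train_relation_scores(paths)
print(prob)
print(score)
```

Relations that appear more often in relevant paths receive higher scores, which is what lets path_score favor paths built from reliable relations.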

Page 14: Evaluations

The evaluations aim to verify three hypotheses:

1. It is effective to incorporate the dependency relation based query expansion technique to select high-quality terms in a density based method.
2. The use of the dependency relation based query expansion technique to extract relation paths further improves the precision of passage ranking when integrated with the fuzzy relation matching method.
3. As short queries with fewer key terms are likely to have word mismatch problems

Page 15: Experiment Setup

Training data
– 10,255 factoid QA pairs from TREC-8 and TREC-9 QA tasks
– The top 100 snippets from Google for each question
– 8,892 relevant paths extracted

Testing data
– The AQUAINT news corpus
– 324 factoid questions in TREC-12 QA task
  Excluding 30 questions with NIL answers and 59 questions that do not have any ground truth passages

5 comparison systems
– DBS, DBS+LCA, DBS+DRQET, RBS, RBS+DRQER

Page 16: Experiment Result-1

Table 1. Overall performance comparison. All improvements are significant.

Page 17: Experiment Result-2

Fig. MRR before and after query expansion vs. number of non-trivial question terms.
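MRR (mean reciprocal rank), the measure plotted in this figure, averages 1/rank of the first relevant passage over all questions, counting 0 when no relevant passage is returned. A minimal Python sketch with made-up passage ids:

```python
def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """ranked_lists[q]: passage ids in ranked order for question q.
    relevant_sets[q]: set of ground-truth passage ids for question q."""
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, passage_id in enumerate(ranked, start=1):
            if passage_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant passage counts
    return total / len(ranked_lists)


# Three hypothetical questions: hit at rank 2, hit at rank 1, no hit.
ranked = [["p3", "p1"], ["p2", "p9"], ["p5"]]
relevant = [{"p1"}, {"p2"}, {"p0"}]
print(mean_reciprocal_rank(ranked, relevant))  # 0.5
```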

Page 18: Experiment Result-3

Testing dataset 2: 356 short queries in TREC-11 and TREC-12 QA tasks

The improvement is more significant than that in Table 1.

DBS+DRQET performs better than RBS.

Page 19: Conclusion and Future Work

Two dependency relation based query expansion techniques, DRQET and DRQER, are presented.

The experimental results show that RBS+DRQER performs best among the 5 systems.

We also studied the relationship between query lengths and improvements by query expansion.

Directions for future work: (1) explore the use of different models and their combinations for relation based query expansion; (2) conduct detailed analysis on the performance of RBS+DRQER on different types of queries.