
Investigating per Topic Upper Bound for Session Search Evaluation

Zhiwen Tang, Grace Hui Yang
Department of Computer Science, Georgetown University
zt79@georgetown.edu, huiyang@cs.georgetown.edu

Session Search

• Multiple runs of search
• Complex information need
• Evaluation needs to consider the whole process

1

• Useful information that the user gains
  • Raw relevance score
• Discounting
  • Based on document ranking
  • Based on diversity
• User's efforts
  • Time spent
  • Lengths of documents being viewed

2

Evaluation of Session Search

• Most session search metrics combine all of those factors into one overwhelmingly complex formula
• The optimal value, i.e. the upper bound, of those metrics varies greatly across search topics
• In Cranfield-like settings (e.g. TREC), this difference is often ignored

3

The Problem

• Two systems
• All systems return 5 docs per round
• Each system conducts one round of interaction
• Metric: Cube Test
  • Luo, Jiyun, et al. "The water filling model and the cube test: multi-dimensional evaluation for professional search." CIKM, 2013.

4

Toy example

$$CT = \frac{\sum_{i=1}^{m}\sum_{j=1}^{|q_i|}\sum_{c}\theta_c\, rel_c(i,j)\,\gamma^{n(c,i,j-1)}}{\sum_{i=1}^{m}\sum_{j=1}^{|q_i|} cost(i,j)}$$

where rel_c(i,j) is the relevance to subtopic c of the j-th document returned for the i-th query, θ_c is the subtopic weight, γ is the novelty discount factor, and n(c,i,j−1) counts the relevant documents for subtopic c seen before position (i,j).

5

Toy example

Doc | Relevance score regarding topic-subtopic
    | 1-1 | 1-2 | 2-1 | 2-2 | 2-3 | 2-4 | 2-5
d1  |  1  |     |  4  |     |     |     |
d2  |     |  3  |     |  4  |     |     |
d3  |     |     |     |     |  4  |     |
d4  |     |     |     |     |     |  4  |
d5  |     |     |     |     |     |     |  4

System   | Topic 1                        | CT-topic1 | Topic 2               | CT-topic2 | CT-avg | Normalized CT-avg
System 1 | d1, irrel, irrel, irrel, irrel | 1         | d1, d3, d4, d5, irrel | 16        | 8.5    | 0.596
System 2 | d2, irrel, irrel, irrel, irrel | 3         | d1, d3, d4, d5, irrel | 14        | 8.5    | 0.787
Optimal  | d1, d2, irrel, irrel, irrel    | 4         | d1, d2, d3, d4, d5    | 17        |        |

• What is the optimal metric value that a system can achieve?
• How to get the upper bound for each search topic?
• How does it affect the evaluation conclusions?
  • Variance of different topics
  • Normalization

6

Research Questions

$$score_{topic,A} = \frac{raw\_score(topic, A) - lower\_bound(topic)}{upper\_bound(topic) - lower\_bound(topic)}$$
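As a concrete illustration of this rescaling, here is a minimal Python sketch (the function name is illustrative; the lower bounds are assumed to be 0, matching the toy CT table earlier):

```python
def normalize_score(raw_score, lower_bound, upper_bound):
    """Rescale a per-topic metric value into [0, 1] using that topic's
    lower and upper bounds, following the formula above."""
    if upper_bound == lower_bound:
        return 0.0          # degenerate topic: no spread to rescale
    return (raw_score - lower_bound) / (upper_bound - lower_bound)

# Toy CT example: topic 1 optimum is 4, topic 2 optimum is 17 (lower bounds assumed 0).
system1 = (normalize_score(1, 0, 4) + normalize_score(16, 0, 17)) / 2   # ~0.596
system2 = (normalize_score(3, 0, 4) + normalize_score(14, 0, 17)) / 2   # ~0.787
```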

• Session-DCG (sDCG)
  • Järvelin, Kalervo, et al. "Discounted cumulated gain based evaluation of multiple-query IR sessions." Advances in Information Retrieval (2008): 4-15.
• Cube Test (CT)
  • Luo, Jiyun, et al. "The water filling model and the cube test: multi-dimensional evaluation for professional search." CIKM, 2013.
• Expected Utility (EU)
  • Yang, Yiming, and Abhimanyu Lad. "Modeling expected utility of multi-session information distillation." Conference on the Theory of Information Retrieval. Springer, Berlin, Heidelberg, 2009.

7

Session search metrics

$$EU = \sum_{\omega} P(\omega)\sum_{(i,j)\in\omega}\Big(\sum_{c\in C_{i,j}} \theta_c\,\gamma^{n(c,i,j-1)} - a\cdot cost(i,j)\Big)$$

$$CT = \frac{\sum_{i=1}^{m}\sum_{j=1}^{|q_i|}\sum_{c}\theta_c\, rel_c(i,j)\,\gamma^{n(c,i,j-1)}}{\sum_{i=1}^{m}\sum_{j=1}^{|q_i|} cost(i,j)}$$

$$sDCG = \sum_{i=1}^{m}\sum_{j=1}^{|q_i|} \frac{rel(i,j)}{(1+\log_b j)\,(1+\log_{bq} i)}$$

• Gain
  • The amount of useful information a user can learn from a document
• Cost
  • The effort the user spends on that document
• Ranking discounts
  • Based on the original ranking position of a document
  • Assumption: the lower a document ranks, the less likely the user will read it
• Novelty discounts
  • Measure the user's knowledge coverage; a general form of ranking discount
  • Assumption: if a document is related to a subtopic/nugget that the user has read before, then it contributes less novel information about this subtopic/nugget

8

Deconstruct the metrics

• sDCG
• Cube Test
• Expected Utility

9

Deconstruct the metrics

Components of each metric: Gain, Cost, Rank discount, Novelty discount

$$sDCG = \sum_{i=1}^{m}\sum_{j=1}^{|q_i|} \frac{rel(i,j)}{(1+\log_b j)\,(1+\log_{bq} i)}$$

$$CT = \frac{\sum_{i=1}^{m}\sum_{j=1}^{|q_i|}\sum_{c}\theta_c\, rel_c(i,j)\,\gamma^{n(c,i,j-1)}}{\sum_{i=1}^{m}\sum_{j=1}^{|q_i|} cost(i,j)}$$

$$EU = \sum_{\omega} P(\omega)\sum_{(i,j)\in\omega}\Big(\sum_{c\in C_{i,j}} \theta_c\,\gamma^{n(c,i,j-1)} - a\cdot cost(i,j)\Big)$$

• sDCG
• Cube Test
• Expected Utility

10

Deconstruct the metrics

$$sDCG = DiscountedGain = \sum_{d} rank\_discount_d \cdot gain_d$$

$$CT = \frac{DiscountedGain}{Cost} = \frac{\sum_{d}\sum_{c} novelty\_discount_{d,c}\cdot gain_{d,c}}{\sum_{d} cost_d}$$

$$EU = DiscountedGain - DiscountedCost = \sum_{d}\sum_{c} novelty\_discount_{d,c}\cdot rank\_discount_d\cdot gain_{d,c} \;-\; \sum_{d} rank\_discount_d\cdot cost_d$$
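To make the shared structure explicit, here is a minimal Python sketch (names are illustrative, not from the paper) that evaluates the three decompositions above from per-document gain, cost, and discount components:

```python
def sdcg_from_parts(gain, rank_discount):
    # sDCG = DiscountedGain = sum_d rank_discount_d * gain_d
    return sum(rd * g for rd, g in zip(rank_discount, gain))

def ct_from_parts(gain, novelty_discount, cost):
    # CT = DiscountedGain / Cost; gain[d][c] and novelty_discount[d][c]
    # are indexed by document d and subtopic c
    discounted_gain = sum(nd * g
                          for nd_d, g_d in zip(novelty_discount, gain)
                          for nd, g in zip(nd_d, g_d))
    return discounted_gain / sum(cost)

def eu_from_parts(gain, novelty_discount, rank_discount, cost):
    # EU = DiscountedGain - DiscountedCost
    discounted_gain = sum(rd * nd * g
                          for rd, nd_d, g_d in zip(rank_discount, novelty_discount, gain)
                          for nd, g in zip(nd_d, g_d))
    discounted_cost = sum(rd * c for rd, c in zip(rank_discount, cost))
    return discounted_gain - discounted_cost
```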

• Factors considered in the metrics: Gain, Cost, Ranking discount, Novelty discount
• We are dealing with rankings
  • How to maximize/minimize the discounted sum?

11

Optimization Method

• Rearrangement Inequality
• In IR, the Probability Ranking Principle [4]
  • The overall effectiveness of an IR system is best achieved by ranking documents by their usefulness in descending order

12

Our solution

$$x_1 y_n + x_2 y_{n-1} + \dots + x_n y_1 \;\le\; x_{\sigma(1)} y_1 + x_{\sigma(2)} y_2 + \dots + x_{\sigma(n)} y_n \;\le\; x_1 y_1 + x_2 y_2 + \dots + x_n y_n$$
$$\text{for } x_1 \le x_2 \le \dots \le x_n \text{ and } y_1 \le y_2 \le \dots \le y_n$$
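A quick numeric check of the inequality, with illustrative values:

```python
x = [1, 2, 3]   # x1 <= x2 <= x3
y = [4, 5, 6]   # y1 <= y2 <= y3

same_order    = sum(a * b for a, b in zip(x, y))            # 1*4 + 2*5 + 3*6 = 32 (maximum)
reverse_order = sum(a * b for a, b in zip(x, reversed(y)))  # 1*6 + 2*5 + 3*4 = 28 (minimum)
# Any other pairing, e.g. 1*5 + 2*4 + 3*6 = 31, lies between the two.
```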

13

Our solution

• But in our problem:
  • Multiple ranking lists are required to be optimized simultaneously
  • E.g. maximize the gain on all the subtopics simultaneously
• How?
  • Optimize each required ranking list independently to approximate the overall bound
• Only one ranking list needs to be optimized

14

sDCG

$$sDCG = DiscountedGain = \sum_{d} rank\_discount_d \cdot gain_d$$

$$sDCG = \sum_{i=1}^{m}\sum_{j=1}^{|q_i|} \frac{rel(i,j)}{(1+\log_b j)\,(1+\log_{bq} i)}$$

$$\text{maximize } \sum_{i=1}^{m}\sum_{j=1}^{|q_i|} \frac{rel(i,j)}{(1+\log_b j)\,(1+\log_{bq} i)}$$

• #(C)+1 ranking lists need to be optimized
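The sketch below illustrates this for sDCG: it computes the metric for a given session and approximates the per-topic upper bound by pairing the largest relevance grades with the least-discounted slots (rearrangement inequality / PRP). The parameter values b=2, bq=4 and the function names are assumptions for illustration, not the paper's settings.

```python
import math

def sdcg(session, b=2, bq=4):
    """session: list of queries; each query is a list of graded relevance
    scores in ranked order. Implements the sDCG formula above."""
    total = 0.0
    for i, ranking in enumerate(session, start=1):        # query position i
        for j, rel in enumerate(ranking, start=1):        # rank j within query i
            total += rel / ((1 + math.log(j, b)) * (1 + math.log(i, bq)))
    return total

def sdcg_upper_bound(topic_rels, n_queries, docs_per_query, b=2, bq=4):
    """Per-topic upper bound: sort the topic's relevance grades in descending
    order and assign them to the result slots with the largest weights."""
    weights = sorted(
        (1.0 / ((1 + math.log(j, b)) * (1 + math.log(i, bq)))
         for i in range(1, n_queries + 1)
         for j in range(1, docs_per_query + 1)),
        reverse=True)
    grades = sorted(topic_rels, reverse=True)
    return sum(w * g for w, g in zip(weights, grades))
```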

15

Cube Test (CT)

$$CT = \frac{\sum_{c}\theta_c\sum_{i=1}^{m}\sum_{j=1}^{|q_i|} rel_c(i,j)\,\gamma^{n(c,i,j-1)}}{\sum_{i=1}^{m}\sum_{j=1}^{|q_i|} cost(i,j)}$$

$$CT = \frac{DiscountedGain}{Cost} = \frac{\sum_{d}\sum_{c} novelty\_discount_{d,c}\cdot gain_{d,c}}{\sum_{d} cost_d}$$

$$\text{maximize } \sum_{i=1}^{m}\sum_{j=1}^{|q_i|} rel_c(i,j)\,\gamma^{\sum_{i'=1}^{i-1}|q_{i'}|+j-1}\quad \forall c$$

$$\text{minimize } \sum_{i=1}^{m}\sum_{j=1}^{|q_i|} cost(i,j)$$
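A hedged sketch of this approximation for CT: each subtopic's discounted gain list is maximized independently (most relevant documents for that subtopic first) and the cost in the denominator is minimized separately, so the combined result only approximates the true upper bound. The names, the novelty counter, and the per-document costs are illustrative assumptions.

```python
def ct_upper_bound(rels_by_subtopic, theta, doc_costs, n_slots, gamma=0.5):
    """rels_by_subtopic: {subtopic c: list of rel_c for all candidate docs};
    theta: {subtopic c: weight}; doc_costs: cost of each candidate doc;
    n_slots: total number of result positions in the session."""
    best_gain = 0.0
    for c, rels in rels_by_subtopic.items():
        top = sorted(rels, reverse=True)[:n_slots]           # PRP: most relevant first
        # novelty discount gamma^k for the k-th document contributing to subtopic c
        best_gain += theta[c] * sum(r * gamma ** k for k, r in enumerate(top))
    min_cost = sum(sorted(doc_costs)[:n_slots])               # cheapest possible session
    return best_gain / min_cost
```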

• An approximation of EU [2]
• ω: the subset of documents the user checked
• #(C)+1 ranking lists need to be optimized

16

Expected Utility (EU)

$$EU = \frac{1}{1-\gamma}\sum_{c}\theta_c\Big(1-\gamma^{\sum_{\omega} P(\omega)\, n(c,\omega)}\Big) - a\sum_{\omega} P(\omega)\, len(\omega)$$

where n(c, ω) is the number of documents in ω that are relevant to subtopic c.

$$\text{maximize } \sum_{i=1}^{m}\sum_{j=1}^{|q_i|} rel_c(i,j)\,(1-p)^{j-1}\quad \forall c$$

$$\text{minimize } \sum_{i=1}^{m}\sum_{j=1}^{|q_i|} cost(i,j)\,(1-p)^{j-1}$$

• Dataset
  • Submitted runs of the TREC 2016 Dynamic Domain track
• Some statistics of the TREC 2016 DD corpus
  • #Topics = 53
  • #Subtopics = 242
  • #Relevant docs = 14,597

17

Experiments

18

Bounds on different topics

sDCG = DiscountedGain

19

Bounds on different topics

CT = DiscountedGain / Cost

20

Bounds on different topics

EU = DiscountedGain − DiscountedCost

• The difference between the optimal values a metric produces on different topics is large and should not be ignored.

21

Conclusion 1

22

Normalization Effect
sDCG = DiscountedGain

23

Normalization Effect
CT = DiscountedGain / Cost

24

Normalization Effect
EU = DiscountedGain − a · DiscountedCost, a = 0.01

25

Normalization Effect
EU = DiscountedGain − a · DiscountedCost, a = 0.001

• Using the bounds for normalization brings more fairness into the evaluation

26

Conclusion 2

• Deconstruction of session search metrics
• Computing the upper bound on each search topic
• Huge variance in the upper bounds among topics
• Normalization provides another viewpoint

27

Summary

• Can this bound help us design a better session search system?
• Lazy user, smart system
• If the system has completed the first k iterations and knows its actual score
• If it also knows the upper bound score for k+1 iterations
• Stop or continue?
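One possible reading of this question, as a hedged sketch (the threshold and names are illustrative assumptions, not from the slides): after k iterations the system weighs what it has already achieved against the most it could still gain, and stops when the possible improvement becomes negligible.

```python
def should_stop(actual_score_k, upper_bound_remaining, min_relative_gain=0.05):
    """Stop if even an optimal continuation would improve the session score
    by less than min_relative_gain of what has already been achieved."""
    if actual_score_k <= 0:
        return False                      # nothing gained yet: keep searching
    return upper_bound_remaining / actual_score_k < min_relative_gain
```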

28

Discussion

• Used in this year's TREC-DD evaluation
  • https://github.com/trec-dd/trec-dd-jig
  • http://trec-dd.org/

29

Resource

30

Thank you!

31

Reference

• [1] Kalervo Järvelin, Susan L. Price, Lois M. L. Delcambre, and Marianne Lykke Nielsen. 2008. Discounted cumulated gain based evaluation of multiple-query IR sessions. In European Conference on Information Retrieval. Springer, 4-15.
• [2] Jiyun Luo, Christopher Wing, Hui Yang, and Marti Hearst. 2013. The water filling model and the cube test: multi-dimensional evaluation for professional search. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. ACM, 709-714.
• [3] Yiming Yang and Abhimanyu Lad. 2009. Modeling expected utility of multi-session information distillation. In Conference on the Theory of Information Retrieval. Springer, 164-175.
• [4] Stephen E. Robertson. 1977. The probability ranking principle in IR. Journal of Documentation 33(4): 294-304.
