efficiency/effectiveness trade-offs in learning to...

Efficiency/EffectivenessTrade-offsinLearningtoRank

Tutorial@ICTIR2017http://learningtorank.isti.cnr.it/

ClaudioLuccheseCa’Foscari UniversityofVenice

Venice,Italy

FrancoMariaNardiniHPCLab,ISTI-CNR

Pisa,Italy

l a b o r a t o r y

TheRankingProblem

RankingisatthecoreofseveralIRTasks:

• DocumentRankinginWebSearch

• AdsRankinginWebAdvertising

• Querysuggestion&completion

• ProductRecommendation

• SongRecommendation

• …

LuccheseC.,NardiniF.M.Efficiency/EffectivenessTrade-offsinLearningtoRank 2

TheRankingProblem

Definition:Givenaqueryq andasetofobjects/documentsD,torankD soastomaximizeusers’satisfactionQ.

LuccheseC.,NardiniF.M.Efficiency/EffectivenessTrade-offsinLearningtoRank 3[KDF+13]Kohavi,R.,Deng,A.,Frasca,B.,Walker,T.,Xu,Y.,&Pohlmann,N.(2013,August).Onlinecontrolledexperimentsatlargescale.In Proceedingsofthe19thACMSIGKDDinternationalconferenceonKnowledgediscoveryanddatamining (pp.1168-1176).ACM.

Goal#1:Effectiveness• MaximizeQ !

• buthowtomeasureQ?

Goal#2:Efficiency• Makesuretherankingprocessisfeasibleandnottooexpensive• InBing...“every100msecimprovesrevenueby0.6%.Everymillisecondcounts.”[KDF+13]

Agenda

1. IntroductiontoLearningtoRank(LtR)• Background,algorithms,sourcesofcostinLtR,multi-stageranking

2. DealingwiththeEfficiency/Effectivenesstrade-off• FeatureSelection,EnhancedLearning,Approximatescoring,FastScoring

3. Hands-onI• Software,dataandpubliclyavailabletools• TraversingRegressionForests,SoA toolsandanalysis

4. Hands-onII• Trainingmodels,Pruningstrategies,Efficientscoring

Attheendofthedayyou’llbeabletotrainahighqualityrankingmodel,andtoexploitSoAtoolsandtechniquestoreduceitscomputationalcostupto18x!


DocumentRepresentationsandRanking

DocumentRepresentations

Adocumentisamulti-setofwords

Adocumentmayhavefields,itcanbesplitintozones,itcanbeenrichedwithexternaltextdata(e.g.,anchors)

Additionalinformationmaybeuseful,suchasIn-Links,Out-Links,PageRank,#clicks,sociallinks,etc.

HundredsignalsinpublicLtR Datasets


RankingFunctions

Term-weighting[SJ72]VectorSpaceModel[SB88]

BM25[JWR00],BM25f[RZT04]LanguageModeling[PC98]

LinearCombinationoffeatures[MC07]

Howtocombinehundredsofsignals?

[SJ72] Karen Sparck Jones. A statistical interpretation of term specificity and its application in retrieval. Journal of documentation, 28(1):11–21, 1972.[SB88]GerardSaltonandChristopherBuckley.Term-weightingapproachesinautomatictextretrieval.Informationprocessing&management,24(5):513–523,1988.[JWR00]KSparck Jones,SteveWalker,andStephenE.Robertson.Aprobabilisticmodelofinformationretrieval:developmentandcomparativeexperiments.Informationprocessing&management,36(6):809–840,2000[RZT04]StephenRobertson,HugoZaragoza,andMichaelTaylor.Simplebm25extensiontomultipleweightedfields.InProceedingsofthethirteenthACMinternationalconferenceonInformationandknowledgemanagement,pages42–49.ACM,2004.[PC98]JayMPonteandWBruceCroft.Alanguagemodelingapproachtoinformationretrieval.InProceedingsofthe21stannualinternationalACMSIGIRconferenceonResearchanddevelopmentininformationretrieval,pages275–281.ACM,1998.[MC07]DonaldMetzlerandWBruceCroft.Linearfeature-basedmodelsforinformationretrieval.InformationRetrieval,10(3):257–274,2007.

RankingasaSupervisedLearningTask


d1

y1

Training Instance

Machine Learning Algo(NeuralNet, SVM, Decision-Tree)

q d2

y2

d3

y3

di

yi

…

…

…

…

LossFunction

RankingModel

Query/DocumentRepresentationUsefulsignals

• LinkAnalysis[H+00]• Termproximity[RS03]• Queryclassification[BSD10]• Queryintentmining[JLN16,LOP+13]• Findingentitiesdocuments[MW08]andinqueries[BOM15]

• Documentrecency [DZK+10]• Distributedrepresentationsofwordsandtheircompositionality[MSC+13]

• Convolutionalneuralnetworks[SHG+14]

• ….


• ExplicitFeedback• ThousandsofSearchQualityRaters

• Absolutevs.RelativeJudgments[CBCD08]

• ImplicitFeedback• clicks/querychains[JGP+05,Joa02,RJ05]• De-biasing/clickmodels[JSS17]

• Minimizeannotationcost• ActiveLearning[LCZ+10]• DeepversusShallowlabelling[YR09]

RelevanceLabelsGeneration

d

q y

EvaluationMeasuresforRanking

Manyareintheform:

• (N)DCG[JK00]:• RBP[MZ08]:• ERR[CMZG09]:

DotheymatchUsersatisfaction?• ERRcorrelatesbetterwithusersatisfaction(clicksandeditorials)[CMZG09]

• ResultsInterleavingtocomparetworankings[CJRY12]• “majorrevisionsofthewebsearchrankers[Bing] ...Thedifferencesbetweentheserankersinvolvechangesofoverhalfapercentagepoint,inabsoluteterms,ofNDCG”


Gain(d) = 2

y � 1 Discount(r) =1

log(r + 1)

Q@k =X

ranks r=1...k

Gain(dr) · Discount(r)

Gain(d) = I(y) Discount(r) = (1� p)pr�1

Gain(d) = Ri

i�1Y

j=1

(1�Rj) with Ri = (2y � 1)/2ymax

Discount(r) = 1/r

[JK00]Kalervo Jarvelin andJaana Kekalainen.Ir evaluationmethodsforretrievinghighlyrelevantdocuments.InProceedingsofthe23rdannualinternationalACMSIGIRconferenceonResearchanddevelopmentininformationretrieval,pages41–48.ACM,2000.[MZ08]AlistairMoffatandJustinZobel.Rank-biasedprecisionformeasurementofretrievaleffectiveness.ACMTransactionsonInformationSystems(TOIS),27(1):2,2008.[CMZG09]OlivierChapelle,DonaldMetlzer,Ya Zhang,andPierreGrinspan.Expectedreciprocalrankforgradedrelevance.InProceedingsofthe18thACMconferenceonInformationandknowledgemanagement,pages621–630.ACM,2009.[CJRY12]OlivierChapelle,ThorstenJoachims,FilipRadlinski,andYisong Yue.Large-scalevalidationandanalysisofinterleavedsearchevaluation.ACMTransactionsonInformationSystems(TOIS),30(1):6,2012.

Isitaneasyordifficulttask?

Gradientdescentcannotbeapplieddirectly

Rank-basedmeasures(NDCG,ERR,MAP,…)dependondocumentssortedorder

• gradientiseither0(sortedorderdidnotchange)orundefined (discontinuity)

Solution:weneedaproxyLossfunction• itshouldbedifferentiable• andwithasimilarbehavioroftheoriginalcostfunction

di document score(model parameters)

ND

CG

@k

d0

d1

d2

d3


di document score(model parameters)

Pro

xy Q

ualit

y Fu

nctio

n

d0

d1

d2

d3

Point-WiseAlgorithms

di yi

Training InstanceEachdocumentisconsideredindependently fromtheothers

• Noinformationaboutothercandidatesforthesamequeryisusedattrainingtime

Adifferentcost-functionisoptimized• Severalapproaches:Regression,Multi-ClassClassification,Ordinalregression,… [Liu11]

AmongRegression-Based:GradientBoostingRegressionTrees [Fri01]

• MeanSquaredError isminimized

LuccheseC.,NardiniF.M.Efficiency/EffectivenessTrade-offsinLearningtoRank 10[Liu11] Tie-Yan Liu. Learning to rank for information retrieval, 2011. Springer.[Fri01]JeromeHFriedman.Greedyfunctionapproximation:agradientboostingmachine.Annalsofstatistics,pages1189–1232,2001.

Training Algo: GBRTLoss Function: MSE

…

GradientBoostingRegressionTrees

Iterativealgorithm:

Eachfi isregardedasastepinthebestoptimizationdirection,i.e.,asteepestdescentstep:

GivenL = MSE/2:

Gradientgi isapproximatedbyaRegressionTreeti

F (d) =X

i

fi(d)

fi(d) = �⇢i gi(d) � gi(d) = �@L(y, f(d))

@f(d)

�

f=P

j<i fj

�@⇥12MSE(y, f(d))

⇤

@f(d)= �

@⇥12

P(y � f(d))2

⇤

@f(d)= y � f(d)

WeakLearner

negativegradientbyline-search

pseudo-response

d

pre

dic

ted

doc

umen

t sco

re

f 1(d)

y

t1

y-f 1(

d)

f 2(d)

t2

f 3(d)

t3

ErrorY-F(d)

Pair-wiseAlgorithms:RankNet[BSR+05]DocumentsareconsideredinpairsEstimatedprobabilitythatdi isbetterthandj is:

LetQij bethetrueprobability,theCrossEntropyLoss is:

Weconsideronlypairswheredi isbetterthandj ,ie.,yi >yj :

Thisisdifferentiable: usedtotrainaNeuralNetworkwithback-propagation.

Otherapproaches:Ranking-SVM[Joa02],RankBoost[FISS03],…


[BSR+05]ChrisBurges,TalShaked,ErinRenshaw,AriLazier,MattDeeds,NicoleHamilton,andGregHullender.Learningtorankusinggradientdescent.InProceedingsofthe22ndinternationalconferenceonMachinelearning,pages89–96.ACM,2005.[Joa02]ThorstenJoachims.Optimizingsearchenginesusingclickthrough data.InProceedingsoftheeighthACMSIGKDDinternationalconferenceonKnowledgediscoveryanddatamining,pages133–142.ACM,2002.[FISS03]Yoav Freund,RajIyer,RobertESchapire,andYoram Singer.Anefficientboostingalgorithmforcombiningpreferences.Journalofmachinelearningresearch,4(Nov):933–969,2003.

di

Training Instance

Training Algo: ANNLoss: Cross Entropy

dj

with yi>yj

Pij

=eoij

1 + eoijoij = F(di)-F(dj)

Cij = �Qij logPij � (1�Qij) log(1� Pij)

Cij

= log(1 + e�oij)

Ifoij → +∞(i.e.,correctlyordered)

Cij → 0

Ifoij → -∞(i.e.,mis-ordered)

Cij → +∞

Pair-wiseAlgorithms

LuccheseC.,NardiniF.M.Efficiency/EffectivenessTrade-offsinLearningtoRank 13[CQL+07]Zhe Cao,TaoQin,Tie-YanLiu,Ming-FengTsai,andHangLi.Learningtorank:frompairwiseapproachtolistwise approach.InProceedingsofthe24thinternationalconferenceonMachinelearning,pages129–136.ACM,2007.

RankNet performsbetter thanotherpairwisealgorithms

RankNet costisnotnicelycorrelatedwith

NDCGquality

List-wiseAlgorithms:LambdaMart[Bur10]

Training Algo: GBRTLoss Function: MSE

di 𝜆i

Training Instance

q: …d1 d2 d3 dj d|q|

Recall:GBRTrequiresagradientgi foreverydi

First:estimatethegradientcomparingtodj,withyi>yj :

Then:estimatethegradientcomparingtoeveryotherdj forq

LuccheseC.,NardiniF.M.Efficiency/EffectivenessTrade-offsinLearningtoRank 14[Bur10]ChristopherJ.C.Burges.Fromranknet tolambdarank tolambdamart:Anoverview.TechnicalReportMSR-TR-2010-82,June2010.

…

Δ Qualityafterswappingdi withdj

derivativeofthenegativeRankNet cost

Ifoij → +∞ (i.e.,correctlyordered)

𝜆ij → 0

Ifoij → -∞ (i.e.,mis-ordered)𝜆ij → |Δ NDCG|

gi = �i =X

yi>yj

�ij �X

yi<yj

�ij

�ij

=1

1 + eoij|�NDCG| = ��

ji

Topdocumentsaremorerelevant!

List-wiseAlgorithms:someresults

• NDCG@10onpublicLtR Datasets

Otherapproaches:ListNet/ListMLE[CQL+07],ApproximateRank[QLL10],SVMAP[YFRJ07],RankGP[YLKY07],others...


Algorithm MSN10K Y!S1 Y!S2 Istella-SRankingSVM 0.4012 0.7238 0.7306 N/A

GBRT 0.4602 0.7555 0.7620 0.7313LambdaMART 0.4618 0.7529 0.7531 0.7537

[CQL+07]Zhe Cao,TaoQin,Tie-YanLiu,Ming-FengTsai,andHangLi.Learningtorank:frompairwiseapproachtolistwise approach.InProceedingsofthe24thinternationalconferenceonMachinelearning,pages129–136.ACM,2007.[QLL10]TaoQin,Tie-YanLiu,andHangLi.Ageneralapproximationframeworkfordirectoptimizationofinformationretrievalmeasures.Informationretrieval,13(4):375–397,2010.[YFRJ08]Yisong Yue,ThomasFinley,FilipRadlinski,andThorstenJoachims.Asupportvectormethodforoptimizingaverageprecision.InProceedingsofthe30thannualinternationalACMSIGIRconferenceonResearchanddevelopmentininformationretrieval,pages271– 278.ACM,2007.[YLKY07]Jen-YuanYeh,Jung-YiLin,Hao-RenKe,andWei-PangYang.Learningtorankforinformationretrievalusinggeneticprogramming.InProceedingsofSIGIR2007WorkshoponLearningtoRankforInformationRetrieval(LR4IR2007),2007.

LearningtoRankAlgorithms• NewapproachestooptimizeIR

measures:• DirectRank[XLL+08],LambdaMart[Bur10],

BLMart[GCL11],SSLambdaMART[SY11],CoList[GY14],LogisticRank[YHT+16],…See[Liu11][TBH15].

• DeepLearning toimprovequery-documentmatching:• Conv.DNN[SM15],DSSM[HHG+13],

Dual-Embedding[MNCC16],LocalandDistributedrepr.[MDC17],WeakSupervision[DZS+17],NeuralClickModel[BMdRS16],…

• On-linelearning:• Multi-armedbandits[RKJ08],

Duelingbandits[YJ09],K-armedduelingbandits[YBKJ12],onlinelearning[HSWdR13][HWdR13],…

16[Liu11] Tie-Yan Liu. Learning to rank for information retrieval, 2011. Springer.[TBH15]Niek Tax,SanderBockting,andDjoerd Hiemstra.Across-benchmarkcomparisonof87learningtorankmethods.Informationprocessing&management,51(6):757–772,2015.

Figurefrom[Liu11]

InthistutorialwefocusonGBRTs

AdsClickPrediction:GBDTasafeatureextractor,thenLogReg [HPJ+14]

AdsClickPrediction:refine/boostNN output[LDG+17]

ProductRanking:100GBDTswithpairwiseranking[SCP16]

DocumentRanking:GBDTnamedLogisticRank [YHT+16]

Ranking,forecasting&recommendations:ObliviousGBRT

17

[HPJ+14] XinranHe,JunfengPan,OuJin,TianbingXu,BoLiu,TaoXu,YanxinShi,AntoineAtallah,RalfHerbrich,StuartBowers,etal.Practicallessonsfrompredictingclicksonadsatfacebook.InProceedingsoftheEighthInternationalWorkshoponDataMiningforOnlineAdvertising,pages1–9.ACM,2014.[LDG+17] Xiaoliang Ling,Weiwei Deng, Chen Gu, Hucheng Zhou, Cui Li, and Feng Sun.Model ensemble for click prediction in bing search ads. In Proceedings of the 26th InternationalConferenceonWorldWideWebCompanion,pages689–698.InternationalWorldWideWebConferencesSteeringCommittee,2017.[SCP16]DariaSorokinaandErickCantu-Paz.Amazonsearch:The joyof rankingproducts. InProceedingsof the39th InternationalACMSIGIRconferenceonResearchandDevelopment inInformationRetrieval,pages459–460.ACM,2016.[YHT+16]DaweiYin,YueningHu,JiliangTang,TimDaly,MianweiZhou,HuaOuyang,JianhuiChen,ChangsungKang,HongboDeng,ChikashiNobata,etal.Rankingrelevanceinyahoosearch.InProceedingsofthe22ndACMSIGKDDInternationalConferenceonKnowledgeDiscoveryandDataMining,pages323–332.ACM,2016.

InthistutorialwefocusonGBRTs

• Successful in several Data Challenges:• Winner of the Yahoo! LtR Challenge: combination of 12 ranking models,

8 of which were Lambda-MART models, each having up to 3,000 trees [CC11]• According to the 2015 statistics, GBRTs were adopted by the majority of the

winning solutions among the Kaggle competitions, even more than thepopular deep networks, and all the top-10 teams qualified in the KDDCup2015 used GBRT-based algorithms [CG16]

• New interesting open-source implementations:• XGBoost, LightGBM byMicrosoft, CatBoost by Yandex

• Pluggable within Apache Lucene/Solr18

[CC11] Olivier Chapelle and Yi Chang. Yahoo! learning to rank challenge overview. In Proceedings of the Learning to Rank Challenge, pages 1–24, 2011.[CG16] Tianqi Chen and Carlos Guestrin. Xgboost: A scalable tree boosting system. In Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,KDD ’16, pages 785–794, New York, NY, USA, 2016. ACM.

Single-StageRanking

Requirestoapplythelearntmodel toeverymatchingdocument,andtogeneratetherequiredfeatures.Notfeasible!Wehaveatleast3 efficiencyvs.effectivenesstrade-offs.


ResultsRANKERQuery

Single-StageRanking

①FeatureComputationTrade-off• ComputationallyExpensive &highlydiscriminative featuresvs.computationallyCheap&slightlydiscriminative features


ResultsRANKERQuery

Two-StageRanking

Expensivefeaturesarecomputedonlyforthetop-K candidatedocuments passingthefirststage.HowtochoseK?②NumberofMatchingCandidatesTrade-off :• a Largeset ofcandidatesisExpensiveandproduceshigh-quality resultsvs.aSmallsetofcandidatesisCheap andproduceslow-quality results• 1000documents[DBC13] (Gov2,ClueWeb09-Bcollections)• 1500-2000documents[MSO13](ClueWeb09-B)• “hundredsofthousands”(over“hundredsofmachines”)[YHT+16a]


Query+top-K docs

STAGE1:

Matching/Recall-oriented

Ranking

STAGE2:

Precision-orientedRanking

Query Results

[DBC13]VanDang,MichaelBendersky,andWBruceCroft.Two-stagelearningtorankforinformationretrieval.InAdvancesinInformationRetrieval,pages423–434.Springer,2013.[MSO13]CraigMacdonald,Rodrygo LTSantos,andIadh Ounis.Thewhens andhows oflearningtorankforwebsearch.InformationRetrieval,16(5):584–628,2013.[YHT+16]Dawei Yin,Yuening Hu,Jiliang Tang,TimDaly,Mianwei Zhou,HuaOuyang,Jianhui Chen,Changsung Kang,Hongbo Deng,Chikashi Nobata,etal.Rankingrelevanceinyahoosearch.InProceedingsofthe22ndACMSIGKDDInternationalConferenceonKnowledgeDiscoveryandDataMining,pages323–332.ACM,2016.

Multi-StageRanking

• 3stages[YHT+16]:Contextualfeatures areconsideredinthe3rd stage• Contextual=>aboutthecurrentresultset• Rankbasedonspecificfeatures,Mean,Variance,Standardizedfeatures(seealso[LNO+15a]),

Topicmodelsimilarity• Firsttwostagesareexecutedateachservingnode

• N stages[CGBC17]:Whichmodel ineachstage?Whichfeatures?Howmanydocuments?• About200configurationstested• bestresultswithN=3stages,2500and700docsbetweenstages

• Apropermethodology/algorithmforchoosingthebestconfigurationisstillmissing.


STAGE1:

Matching/Recall-oriented

Ranking

STAGE2:

Precision-orientedRanking

Query Query+Top30

[YHT+16]Dawei Yin,Yuening Hu,Jiliang Tang,TimDaly,Mianwei Zhou,HuaOuyang,Jianhui Chen,Changsung Kang,Hongbo Deng,Chikashi Nobata,etal.Rankingrelevanceinyahoosearch.InProceedingsofthe22ndACMSIGKDDInternationalConferenceonKnowledgeDiscoveryandDataMining,pages323–332.ACM,2016.[CGBC17]Ruey-ChengChen,LukeGallagher,Roi Blanco,andJ.ShaneCulpepper.Efficientcost-awarecascaderankinginmulti-stageretrieval.InProceedingsofthe40thInternationalACMSIGIRConferenceonResearchandDevelopmentinInformationRetrieval,SIGIR’17,pages445–454,NewYork,NY,USA,2017.ACM.

STAGE3:

ContextualRanking

Results

Multi-StageRanking

③ModelComplexityTrade-off :• Complex &Slow high-quality vs.Simple &Fast low-qualitymodels:

• Complex as:RandomForest,GBRT,InitializedGBRT,Lambda-MART,• Simple as:CoordinateAscent,RidgeRegression,SVM-Rank,RankBoost• In-between as:ObliviousLambda-Mart,ListNet


STAGEi-1:

CheapRanker

STAGEi:

AccurateRanker

Query

STAGEi+1:

VeryAccurateRanker

Results

ModelComplexityTrade-off

• Comparisononvaryingtrainingparameters[CLN+16]:• #trees,#leaves,learningrate,etc.

• Complexmodelsachievesignificantlyhigherquality• Bestmodeldependsontimebudget

• TodayisaboutModelComplexityTrade-off!

LuccheseC.,NardiniF.M.Efficiency/EffectivenessTrade-offsinLearningtoRank 24[CLN+16]GabrieleCapannini,ClaudioLucchese,FrancoMariaNardini,SalvatoreOrlando,RaffaelePerego,andNicolaTonellotto.Qualityversusefficiencyindocumentscoringwithlearning-to-rankmodels.InformationProcessing&Management,2016.

Next…

Efficiency/Effectivenesstrade-offsin:• FeatureSelection• EnhancedLearningAlgorithms• Approximatescoring• FastScoring


References[BMdRS16]AlexeyBorisov,IlyaMarkov,MaartendeRijke,andPavelSerdyukov.A neuralclickmodelforwebsearch.In Proceedingsofthe25thInternationalConferenceonWorldWideWeb,pages531--541.InternationalWorldWideWebConferencesSteeringCommittee,2016.

[BOM15]Roi Blanco,GiuseppeOttaviano,andEdgarMeij.Fast andspace-efficiententitylinkingforqueries.In ProceedingsoftheEighthACMInternationalConferenceonWebSearchandDataMining,pages179--188.ACM,2015.

[BSD10]PaulNBennett,KrystaSvore,andSusanTDumais.Classification-enhancedranking.In Proceedingsofthe19thinternationalconferenceonWorldwideweb,pages111--120.ACM,2010.

[BSR+05]ChrisBurges,TalShaked,ErinRenshaw,AriLazier,MattDeeds,NicoleHamilton,andGregHullender.Learning torankusinggradientdescent.In Proceedingsofthe22ndinternationalconferenceonMachinelearning,pages89--96.ACM,2005.

[Bur10]ChristopherJ.C.Burges.From ranknet tolambdarank tolambdamart:Anoverview.Technical ReportMSR-TR-2010-82,June2010.

[CBCD08]BenCarterette,PaulBennett,DavidChickering,andSusanDumais.Here orthere:PreferenceJudgmentsforRelevance.AdvancesinInformationRetrieval,pages16--27,2008.

[CC11]OlivierChapelle andYiChang.Yahoo!learningtorankchallengeoverview.In ProceedingsoftheLearningtoRankChallenge,pages1--24,2011.

[CCL11]OlivierChapelle,YiChang,andT-YLiu.Future directionsinlearningtorank.In ProceedingsoftheLearningtoRankChallenge,pages91--100,2011.

[CG16]Tianqi ChenandCarlosGuestrin.Xgboost:Ascalabletreeboostingsystem.In Proceedingsofthe22NdACMSIGKDDInternationalConferenceonKnowledgeDiscoveryandDataMining,KDD'16,pages785--794,NewYork,NY,USA,2016.ACM.


References[CGBC17]Ruey-ChengChen,LukeGallagher,Roi Blanco,andJ.ShaneCulpepper.Efficient cost-awarecascaderankinginmulti-stageretrieval.InProceedingsofthe40thInternationalACMSIGIRConferenceonResearchandDevelopmentinInformationRetrieval,SIGIR'17,pages445--454,NewYork,NY,USA,2017.ACM.

[CJRY12]OlivierChapelle,ThorstenJoachims,FilipRadlinski,andYisong Yue.Large-scalevalidationandanalysisofinterleavedsearchevaluation.ACM TransactionsonInformationSystems(TOIS),30(1):6,2012.

[CLN+16]GabrieleCapannini,ClaudioLucchese,FrancoMariaNardini,SalvatoreOrlando,RaffaelePerego,andNicolaTonellotto.Qualityversusefficiencyindocumentscoringwithlearning-to-rankmodels.Inf.Process.Manage.,52(6):1161--1177,November2016.

[CMZG09]OlivierChapelle,DonaldMetlzer,Ya Zhang,andPierreGrinspan.Expected reciprocalrankforgradedrelevance.In Proceedingsofthe18thACMconferenceonInformationandknowledgemanagement,pages621--630.ACM,2009.

[CQL+07]Zhe Cao,TaoQin,Tie-YanLiu,Ming-FengTsai,andHangLi.Learning torank:frompairwiseapproachtolistwise approach.InProceedingsofthe24thinternationalconferenceonMachinelearning,pages129--136.ACM,2007.

[DBC13]VanDang,MichaelBendersky,andWBruceCroft.Two-stagelearningtorankforinformationretrieval.In AdvancesinInformationRetrieval,pages423--434.Springer,2013.

[DZK+10]Anlei Dong,Ruiqiang Zhang,Pranam Kolari,JingBai,FernandoDiaz,YiChang,Zhaohui Zheng,andHongyuan Zha.Time isoftheessence:improvingrecency rankingusingtwitterdata.In Proceedingsofthe19thinternationalconferenceonWorldwideweb,pages331--340.ACM,2010.

[DZS+17]MostafaDehghani,Hamed Zamani,Aliaksei Severyn,Jaap Kamps,andW.BruceCroft.Neural rankingmodelswithweaksupervision.In Proceedingsofthe40thInternational{ACM{SIGIRConferenceonResearchandDevelopmentinInformationRetrieval,Shinjuku,Tokyo,Japan,August7-11,2017,pages65--74.{ACM,2017.


References[FISS03]Yoav Freund,RajIyer,RobertESchapire,andYoram Singer.An efficientboostingalgorithmforcombiningpreferences.Journal ofmachinelearningresearch,4(Nov):933--969,2003.

[Fri01]JeromeHFriedman.Greedy functionapproximation:agradientboostingmachine.Annals ofstatistics,pages1189--1232,2001.

[GCL11]YasserGanjisaffar,RichCaruana,andCristinaVideira Lopes.Bagging gradient-boostedtreesforhighprecision,lowvariancerankingmodels.In Proceedingsofthe34thinternationalACMSIGIRconferenceonResearchanddevelopmentinInformationRetrieval,pages85--94.ACM,2011.

[GY14]WeiGaoandPeiYang.Democracy isgoodforranking:Towardsmulti-viewranklearningandadaptationinwebsearch.In Proceedingsofthe7thACMinternationalconferenceonWebsearchanddatamining,pages63--72.ACM,2014.

[H+00]MonikaRauchHenzinger etal.Link analysisinwebinformationretrieval.IEEE DataEng.Bull.,23(3):3--8,2000.

[HHG+13]Po-SenHuang,Xiaodong He,Jianfeng Gao,LiDeng,AlexAcero,andLarryHeck.Learning deepstructuredsemanticmodelsforwebsearchusingclickthrough data.In Proceedingsofthe22ndACMinternationalconferenceonConferenceoninformation& knowledgemanagement,pages2333--2338.ACM,2013.

[HPJ+14]XinranHe,Junfeng Pan,Ou Jin,Tianbing Xu,BoLiu,TaoXu,Yanxin Shi,AntoineAtallah,RalfHerbrich,StuartBowers,etal.Practicallessonsfrompredictingclicksonadsatfacebook.In ProceedingsoftheEighthInternationalWorkshoponDataMiningforOnlineAdvertising,pages1--9.ACM,2014.


References[HSWdR13]Katja Hofmann,AnneSchuth,ShimonWhiteson,andMaartendeRijke.Reusing historicalinteractiondataforfasteronlinelearningtorankforir.In ProceedingsofthesixthACMinternationalconferenceonWebsearchanddatamining,pages183--192.ACM,2013.

[HWdR13]Katja Hofmann,ShimonWhiteson,andMaartendeRijke.Balancing explorationandexploitationinlistwise andpairwiseonlinelearningtorankforinformationretrieval.Information Retrieval,16(1):63--90,2013.

[JGP+05]ThorstenJoachims,LauraGranka,BingPan,HeleneHembrooke,andGeriGay.Accurately interpretingclickthrough dataasimplicitfeedback.In Proceedingsofthe28thannualinternationalACMSIGIRconferenceonResearchanddevelopmentininformationretrieval,pages154--161.Acm,2005.

[JK00]Kalervo J{\"arvelin andJaana Kekalainen.Ir evaluationmethodsforretrievinghighlyrelevantdocuments.In Proceedingsofthe23rdannualinternationalACMSIGIRconferenceonResearchanddevelopmentininformationretrieval,pages41--48.ACM,2000.

[JLN16]DiJiang,KennethWai-TingLeung,andWilfredNg.Query intentminingwithmultipledimensionsofwebsearchdata.World WideWeb,19(3):475--497,2016.

[Joa02]ThorstenJoachims.Optimizingsearchenginesusingclickthrough data.InProceedingsoftheeighthACMSIGKDDinternationalconferenceonKnowledgediscoveryanddatamining,pages133--142.ACM,2002.

[JSS17]ThorstenJoachims,Adith Swaminathan,andTobiasSchnabel.Unbiasedlearning-to-rankwithbiasedfeedback.ProceedingsoftheTenthACMInternationalConferenceonWebSearchandDataMining.ACM,2017.

[JWR00]KSparck Jones,SteveWalker,andStephenE.Robertson.A probabilisticmodelofinformationretrieval:developmentandcomparativeexperiments:Part2.Informationprocessing& management,36(6):809--840,2000.


References[KDF+13]RonKohavi,AlexDeng,BrianFrasca,TobyWalker,Ya Xu,andNilsPohlmann.Online controlledexperimentsatlargescale.InProceedingsofthe19thACMSIGKDDinternationalconferenceonKnowledgediscoveryanddatamining,pages1168--1176.ACM,2013.

[LCZ+10]BoLong,OlivierChapelle,Ya Zhang,YiChang,Zhaohui Zheng,andBelleTseng.Active learningforrankingthroughexpectedlossoptimization.In Proceedingsofthe33rdinternationalACMSIGIRconferenceonResearchanddevelopmentininformationretrieval,pages267--274.ACM,2010.

[LDG+17]Xiaoliang Ling,Weiwei Deng,ChenGu,Hucheng Zhou,CuiLi,andFengSun.Model ensembleforclickpredictioninbing searchads.InProceedingsofthe26thInternationalConferenceonWorldWideWebCompanion,pages689--698.InternationalWorldWideWebConferencesSteeringCommittee,2017.

[Liu11]Tie-YanLiu.Learning torankforinformationretrieval,2011.

[LNO+15]ClaudioLucchese,FrancoMariaNardini,SalvatoreOrlando,RaffaelePerego,andNicolaTonellotto.Speeding updocumentrankingwithrank-basedfeatures.In Proceedingsofthe38thInternationalACMSIGIRConferenceonResearchandDevelopmentinInformationRetrieval,pages895--898.ACM,2015.

[LOP+13]ClaudioLucchese,SalvatoreOrlando,RaffaelePerego,Fabrizio Silvestri,andGabrieleTolomei.Discovering tasksfromsearchenginequerylogs.ACM TransactionsonInformationSystems(TOIS),31(3):14,2013.

[MC07]DonaldMetzlerandWBruceCroft.Linear feature-basedmodelsforinformationretrieval.Information Retrieval,10(3):257--274,2007.

[MDC17]Bhaskar Mitra,FernandoDiaz,andNickCraswell.Learning tomatchusinglocalanddistributedrepresentationsoftextforwebsearch.In Proceedingsofthe26thInternationalConferenceonWorldWideWeb,pages1291--1299.InternationalWorldWideWebConferencesSteeringCommittee,2017.


References[MNCC16]Bhaskar Mitra,EricNalisnick,NickCraswell,andRichCaruana.A dualembeddingspacemodelfordocumentranking.arXiv preprintarXiv:1602.01137,2016.

[MSC+13]TomasMikolov,IlyaSutskever,KaiChen,GregSCorrado,andJeffDean.Distributed representationsofwordsandphrasesandtheircompositionality.In Advancesinneuralinformationprocessingsystems,pages3111--3119,2013.

[MSO13]CraigMacdonald,Rodrygo LTSantos,andIadh Ounis.The whens andhows oflearningtorankforwebsearch.Information Retrieval,16(5):584--628,2013.

[MW08]DavidMilneandIanHWitten.Learning tolinkwithwikipedia.In Proceedingsofthe17thACMconferenceonInformationandknowledgemanagement,pages509--518.ACM,2008.

[MZ08]AlistairMoffatandJustinZobel.Rank-biasedprecisionformeasurementofretrievaleffectiveness.ACM TransactionsonInformationSystems(TOIS),27(1):2,2008.

[PC98]JayMPonteandWBruceCroft.A languagemodelingapproachtoinformationretrieval.In Proceedingsofthe21stannualinternationalACMSIGIRconferenceonResearchanddevelopmentininformationretrieval,pages275--281.ACM,1998.

[QLL10]TaoQin,Tie-YanLiu,andHangLi.A generalapproximationframeworkfordirectoptimizationofinformationretrievalmeasures.Information retrieval,13(4):375--397,2010.

[RJ05]FilipRadlinski andThorstenJoachims.Query chains:learningtorankfromimplicitfeedback.In ProceedingsoftheeleventhACMSIGKDDinternationalconferenceonKnowledgediscoveryindatamining,pages239--248.ACM,2005.


References[RKJ08]FilipRadlinski,RobertKleinberg,andThorstenJoachims.Learning diverserankingswithmulti-armedbandits.In Proceedingsofthe25thinternationalconferenceonMachinelearning,pages784--791.ACM,2008.

[RS03]YvesRasolofo andJacquesSavoy.Term proximityscoringforkeyword-basedretrievalsystems.Advances ininformationretrieval,pages79--79,2003.

[RZT04]StephenRobertson,HugoZaragoza,andMichaelTaylor.Simple bm25extensiontomultipleweightedfields.In ProceedingsofthethirteenthACMinternationalconferenceonInformationandknowledgemanagement,pages42--49.ACM,2004.

[SB88]GerardSaltonandChristopherBuckley.Term-weightingapproachesinautomatictextretrieval.Information processing&management,24(5):513--523,1988.

[SCP16]DariaSorokina andErickCantu-Paz.Amazon search:Thejoyofrankingproducts.In Proceedingsofthe39thInternationalACMSIGIRconferenceonResearchandDevelopmentinInformationRetrieval,pages459--460.ACM,2016.

[SHG+14]Yelong Shen,Xiaodong He,Jianfeng Gao,LiDeng,andGregoire Mesnil.Learning semanticrepresentationsusingconvolutionalneuralnetworksforwebsearch.In Proceedingsofthe23rdInternationalConferenceonWorldWideWeb,pages373--374.ACM,2014.

[SJ72]KarenSparck Jones.A statisticalinterpretationoftermspecificityanditsapplicationinretrieval.Journal ofdocumentation,28(1):11--21,1972.

[SM15]Aliaksei Severyn andAlessandroMoschitti.Learning torankshorttextpairswithconvolutionaldeepneuralnetworks.In Proceedingsofthe38thInternationalACMSIGIRConferenceonResearchandDevelopmentinInformationRetrieval,pages373--382.ACM,2015.

[SY11]MartinSzummer andEmine Yilmaz.Semi-supervisedlearningtorankwithpreferenceregularization.In Proceedingsofthe20thACMinternationalconferenceonInformationandknowledgemanagement,pages269--278.ACM,2011.


References[TBH15]Niek Tax,SanderBockting,andDjoerd Hiemstra.A cross-benchmarkcomparisonof87learningtorankmethods.Informationprocessing&management,51(6):757--772,2015.

[XLL+08]JunXu,Tie-YanLiu,MinLu,HangLi,andWei-YingMa.Directly optimizingevaluationmeasuresinlearningtorank.In Proceedingsofthe31stannualinternationalACMSIGIRconferenceonResearchanddevelopmentininformationretrieval,pages107--114.ACM,2008.

[YBKJ12]Yisong Yue,JosefBroder,RobertKleinberg,andThorstenJoachims.The k-armedduelingbanditsproblem.Journal ofComputerandSystemSciences,78(5):1538--1556,2012.

[YFRJ07]Yisong Yue,ThomasFinley,FilipRadlinski,andThorstenJoachims.A supportvectormethodforoptimizingaverageprecision.InProceedingsofthe30thannualinternationalACMSIGIRconferenceonResearchanddevelopmentininformationretrieval,pages271--278.ACM,2007.

[YHT+16]Dawei Yin,Yuening Hu,Jiliang Tang,TimDaly,Mianwei Zhou,HuaOuyang,Jianhui Chen,Changsung Kang,Hongbo Deng,ChikashiNobata,etal.Ranking relevanceinyahoosearch.In Proceedingsofthe22ndACMSIGKDDInternationalConferenceonKnowledgeDiscoveryandDataMining,pages323--332.ACM,2016.

[YJ09]Yisong YueandThorstenJoachims.Interactively optimizinginformationretrievalsystemsasaduelingbanditsproblem.In Proceedingsofthe26thAnnualInternationalConferenceonMachineLearning,pages1201--1208.ACM,2009.

[YLKY07]Jen-YuanYeh,Jung-YiLin,Hao-RenKe,andWei-PangYang.Learning torankforinformationretrievalusinggeneticprogramming.InProceedingsofSIGIR2007WorkshoponLearningtoRankforInformationRetrieval(LR4IR2007),2007.

[YR09]Emine YilmazandStephenRobertson.Deep versusshallowjudgmentsinlearningtorank.In Proceedingsofthe32ndinternationalACMSIGIRconferenceonResearchanddevelopmentininformationretrieval,pages662--663.ACM,2009.