Page 1: Learning  to  Rank:  New Techniques  and  Applications

Learning to Rank: New Techniques and Applications

Martin Szummer
Microsoft Research

Cambridge, UK

Page 2: Learning  to  Rank:  New Techniques  and  Applications

Why learning to rank?

• Current rankers use many features, in complex combinations

• Applications
– Web search ranking, enterprise search
– Image search
– Ad selection
– Merging multiple results lists

• The good: uses training data to find combinations of features that optimize IR metrics

• The bad: requires judged training data. Expensive, subjective, not provided by end-users, out-of-date

Page 3: Learning  to  Rank:  New Techniques  and  Applications

This talk
• Learning to rank with IR metrics
A single, simple yet competition-winning recipe. Works for NDCG, MAP, and Precision, with linear or non-linear ranking functions (neural nets, boosted trees, etc.).

• Semi-supervised ranking
A new technique. Reduces the amount of judged training data required.

• Learning to merge
Application: merging results lists from multiple query reformulations.

Actually, I apply the same recipe in three different settings!

Page 4: Learning  to  Rank:  New Techniques  and  Applications

Ranking Background

• Classification: determine the class of an item i (operates on individual items)

• Ranking: determine the preference of item i versus j (operates on pairs of items)

• Ranking function: a score function $s_i = f(\mathbf{x}_i; \mathbf{w})$, where $\mathbf{x}_i$ are query-document features and $\mathbf{w}$ are the parameters.

Example: linear function $f(\mathbf{x}_i; \mathbf{w}) = \mathbf{w}^\top \mathbf{x}_i$.

The ranking function induces a preference $i \succ j$ when $s_i > s_j$.
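
To make the notation concrete, a minimal Python sketch of a linear scoring function and the ranking it induces; the feature and weight values are made up for illustration:

```python
import numpy as np

def score(X, w):
    """Linear ranking function: s_i = w . x_i for each query-document feature vector x_i."""
    return X @ w

# Illustrative example: 4 documents for one query, 3 features each (values are made up).
X = np.array([[0.2, 1.0, 0.5],
              [0.9, 0.1, 0.3],
              [0.4, 0.4, 0.8],
              [0.1, 0.2, 0.1]])
w = np.array([1.0, 0.5, 2.0])

s = score(X, w)
ranking = np.argsort(-s)   # sort documents by descending score
# Document i is preferred to document j whenever s[i] > s[j].
```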

Page 5: Learning  to  Rank:  New Techniques  and  Applications

From Ranking Function to the Ranking

• Applying the ranking function to define a ranking: $\operatorname{Sort}\{s_i\}$

• Above: had a deterministic model of preference
• Henceforth: a probabilistic model

A sigmoid translates score differences into a probability of preference: $P(i \succ j) = \frac{1}{1 + e^{-(s_i - s_j)}}$ (Bradley-Terry/Mallows).
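
A one-line sketch of this probabilistic preference, assuming the standard logistic (Bradley-Terry) form of the score difference:

```python
import numpy as np

def pref_prob(s_i, s_j):
    """P(i preferred to j) as a logistic function of the score difference (Bradley-Terry)."""
    return 1.0 / (1.0 + np.exp(-(s_i - s_j)))

# Example: a score gap of 1.0 gives roughly a 73% chance that i is preferred to j.
p = pref_prob(2.0, 1.0)   # ~0.73
```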

Page 6: Learning  to  Rank:  New Techniques  and  Applications

Learning to Rank

• Learning to rank: given the query-document features and the preference pairs in the training data, determine the parameters $\mathbf{w}$; then rank by $\operatorname{Sort}\{s_i\}$.

• Maximize the likelihood of the preference pairs given in the training data:

$\max_{\mathbf{w}} \; \prod_{(i,j) \in \mathcal{P}} P(i \succ j)$, where $\mathcal{P}$ indicates which preference pairs are in the training set.

e.g. the RankNet model [Burges et al 2005]
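
A hedged sketch of this pairwise objective: the negative log-likelihood of the observed preference pairs and its gradient with respect to the scores. The function name and the plain Python loop are illustrative; in practice the scores come from f(x; w) and the gradient is backpropagated into w:

```python
import numpy as np

def pairwise_nll_and_grad(s, pairs):
    """Negative log-likelihood of preference pairs under P(i > j) = sigmoid(s_i - s_j),
    and its gradient with respect to the scores s."""
    s = np.asarray(s, dtype=float)
    loss, grad = 0.0, np.zeros_like(s)
    for i, j in pairs:                        # each pair means "document i is preferred to j"
        p = 1.0 / (1.0 + np.exp(-(s[i] - s[j])))
        loss += -np.log(p)
        grad[i] += -(1.0 - p)                 # dC/ds_i: descent pushes the preferred document up
        grad[j] += (1.0 - p)                  # dC/ds_j: descent pushes the other document down
    return loss, grad
```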

Page 7: Learning  to  Rank:  New Techniques  and  Applications

Learning to Rank for IR metrics
• IR metrics such as NDCG, MAP or Precision depend on:
– the sorted order of items
– the ranks of items: weight the top of the ranking more
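
For concreteness, a standard NDCG@k computation (the textbook definition with 2^label − 1 gains and log2 rank discounts; nothing here is specific to this talk's recipe):

```python
import numpy as np

def ndcg_at_k(scores, labels, k=10):
    """NDCG@k: DCG of the ranking induced by the scores, normalized by the ideal DCG.
    It depends only on the sorted order of items, with rank discounts that weight
    the top of the ranking more."""
    scores = np.asarray(scores, dtype=float)
    gains = 2.0 ** np.asarray(labels, dtype=float) - 1.0
    order = np.argsort(-scores)[:k]                     # top-k documents by descending score
    discounts = 1.0 / np.log2(np.arange(k) + 2.0)
    dcg = gains[order] @ discounts[:len(order)]
    ideal = np.sort(gains)[::-1][:k] @ discounts[:min(k, len(gains))]
    return dcg / ideal if ideal > 0 else 0.0
```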

Recipe
1) Express the metric as a sum of pairwise swap deltas
2) Smooth it by multiplying by a Bradley-Terry term
3) Optimize parameters by gradient descent over a judged training set

LambdaRank & LambdaMART [Burges et al] are instances of this recipe. The latter won the Yahoo! Learning to Rank Challenge (2010).
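
Roughly, steps 1-3 can be sketched as a LambdaRank-style gradient for NDCG, written from the published descriptions [Burges et al]; the exact derivation on the next slide is unpublished, and the O(n²) loop and helper name here are illustrative:

```python
import numpy as np

def lambdarank_gradients(s, labels):
    """Gradients dC/ds_i: the pairwise Bradley-Terry term scaled by |delta NDCG|,
    the change in NDCG from swapping the two documents in the current ranking."""
    s = np.asarray(s, dtype=float)
    n = len(s)
    order = np.argsort(-s)
    rank = np.empty(n, dtype=int)
    rank[order] = np.arange(n)                        # rank[i] = position of document i
    gains = 2.0 ** np.asarray(labels, dtype=float) - 1.0
    discounts = 1.0 / np.log2(rank + 2.0)             # discount at each document's current rank
    # Assumes at least one relevant document, so ideal_dcg > 0.
    ideal_dcg = np.sort(gains)[::-1] @ (1.0 / np.log2(np.arange(n) + 2.0))
    grad = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue                               # only pairs where i should outrank j
            delta = abs((gains[i] - gains[j]) * (discounts[i] - discounts[j])) / ideal_dcg
            lam = -delta / (1.0 + np.exp(s[i] - s[j]))  # swap delta smoothed by Bradley-Terry term
            grad[i] += lam                              # dC/ds_i < 0: descent pushes document i up
            grad[j] -= lam                              # dC/ds_j > 0: descent pushes document j down
    return grad
```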

Page 8: Learning  to  Rank:  New Techniques  and  Applications

Example: Apply recipe to NDCG metric

Unpublished material. Email me if interested.

Page 9: Learning  to  Rank:  New Techniques  and  Applications

Gradients - intuition

• Gradients act as forces on doc pairs

[Figure: documents at ranks 1-5, with the pairwise gradients $\frac{dC}{ds_{ij}}$ drawn as forces acting between document pairs.]

Page 10: Learning  to  Rank:  New Techniques  and  Applications

Semi-supervised Ranking

[with Emine Yilmaz]

Train with judged AND unjudged query-document pairs

Page 11: Learning  to  Rank:  New Techniques  and  Applications

Semi-supervised Ranking

• Applications
– (Pseudo) relevance feedback
– Reduce the number of (expensive) human judgments
– Use when judgments are hard to obtain

• Customers may not want to judge their collections
– adaptation to a specific company in enterprise search
– ranking for small markets, special-interest domains

• Approach
– preference learning
– end-to-end optimization of ranking metrics (NDCG, MAP)
– multiple and completely unlabeled rank instances
– scalability

Page 12: Learning  to  Rank:  New Techniques  and  Applications

How to benefit from unlabeled data?

Unlabeled data gives information about the data distribution P(x). We must make assumptions about what the structure of the unlabeled data tells us about the ranking distribution P(R|x).

A common assumption: the cluster assumption. Unlabeled data defines the extent of clusters; labeled data determines the class/function value of each cluster.

Page 13: Learning  to  Rank:  New Techniques  and  Applications

Semi-supervised:
– classification: similar documents ⇒ same class
– regression: similar documents ⇒ similar function value
– ranking: similar documents ⇒ similar preference, i.e. neither is preferred to the other

• Differences from classification & regression:
– Preferences provide weaker constraints than function values or classes

This similarity constraint is a type of regularizer on the function we are learning.

Similarity can be defined based on content. Does not require judgments.

Page 14: Learning  to  Rank:  New Techniques  and  Applications

Quantify Similarity

similar documents ⇒ similar preference, i.e. neither is preferred to the other

Unpublished material. Email me if interested.

Page 15: Learning  to  Rank:  New Techniques  and  Applications

Semi-supervised Gradients

The overall gradient combines the labeled and unlabeled cost gradients: $\frac{dC_L}{ds_{ij}} + \beta \frac{dC_U}{ds_{ij}}$.

[Figure: a ranked list with labeled gradients $\frac{dC_L}{ds_{ij}}$ and unlabeled gradients $\frac{dC_U}{ds_{ij}}$ acting as forces on document pairs.]
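
A minimal sketch of this combination. The labeled term is the pairwise gradient from before; the unlabeled term shown, which pulls the scores of content-similar neighbor pairs together, is only an assumed stand-in for C_U, not the unpublished formulation:

```python
import numpy as np

def combined_gradient(s, labeled_pairs, similar_pairs, beta=0.5):
    """dC/ds = dC_L/ds + beta * dC_U/ds."""
    s = np.asarray(s, dtype=float)
    grad_L = np.zeros_like(s)
    for i, j in labeled_pairs:                   # judged: document i preferred to j
        p = 1.0 / (1.0 + np.exp(-(s[i] - s[j])))
        grad_L[i] += -(1.0 - p)
        grad_L[j] += (1.0 - p)

    grad_U = np.zeros_like(s)
    for i, j in similar_pairs:                   # unjudged pairs of content-similar neighbors
        # Illustrative unlabeled cost 0.5 * (s_i - s_j)^2: neither document preferred.
        grad_U[i] += s[i] - s[j]
        grad_U[j] -= s[i] - s[j]

    return grad_L + beta * grad_U                # beta trades off labeled vs. unlabeled terms
```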

Page 16: Learning  to  Rank:  New Techniques  and  Applications

Experiments
Relevance Feedback task:
1) the user issues a query and labels a few of the resulting documents from a traditional ranker (BM25)
2) the system trains a query-specific ranker and re-ranks

Data: TREC collection. 528,000 documents, 150 queries; 1000 total documents per query; 2-15 docs are labeled.

Features:
– ranking features (q, d): 22 features from LETOR
– content features (d1, d2): TF-IDF distance between top 50 words

Neighbors: defined in input space using either of the above.
Note: at test time, only ranking features are used; the method can use features of type (d1, d2) and (q, d1, d2) at training time that other algorithms cannot use.

Ranking function f(): neural network, 3 hidden units
K=5 neighbors

Page 17: Learning  to  Rank:  New Techniques  and  Applications

Relevance Feedback Task

[Figure: NDCG(10) versus number of labeled documents (2, 3, 5, 10, 15). Curves: LambdaRank L&U Cont, LambdaRank L&U, LambdaRank L, TSVM L&U, RankBoost L&U, RankingSVM L, RankBoost L.]

Page 18: Learning  to  Rank:  New Techniques  and  Applications

Novel Queries Task

90,000 training documents; 3500 preference pairs

40 million unlabeled pairs

Page 19: Learning  to  Rank:  New Techniques  and  Applications

Novel Queries Task

[Figure: NDCG(10) versus number of labeled preference pairs (10^2 to 10^3, log scale). Curves: LambdaRank L&U Cont, LambdaRank L&U, LambdaRank L, and an upper bound.]

Page 20: Learning  to  Rank:  New Techniques  and  Applications

Learning to Merge
Task: learn a ranker that merges results from other rankers.
Example application:

users do not know the best way to express their web search query
a single query may not be enough to reach all relevant documents

[Diagram: the user issues the query "wp7"; the solution reformulates it in parallel into "wp7 phone" and "microsoft wp7", then merges the results.]

Page 21: Learning  to  Rank:  New Techniques  and  Applications

Merging Multiple Queries [with Sheldon, Shokouhi, Craswell]

• Traditional approach: alter the query before retrieval
• Merging: alter after retrieval
– Prospecting: see results first, then decide
– Flexibility: any rewrite is allowed, arbitrary features
– Upside potential: better than any individual list
– Increased query load on the engine: use a cache to mitigate it

Page 22: Learning  to  Rank:  New Techniques  and  Applications

LambdaMerge: learn to merge

A weighted mixture of ranking functions

Rewrite features:
– Rewrite-difficulty: ListMean, ListStd, Clarity
– Rewrite-drift: IsRewrite, RewriteRank, RewriteScore, Overlap@N

Scoring features: Dynamic rank score, BM25, Rank, IsTopN

[Diagram: result lists for the rewrites "jupiters mass" and "mass of jupiter", each with per-document scoring features; per-list rewrite features determine the mixture weights.]
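
A rough sketch of a weighted mixture of ranking functions in this spirit: a gate computed from per-list rewrite features weights a shared scorer applied to each list's per-document scoring features. The linear gate, linear scorer, and all feature values below are illustrative assumptions, not the actual LambdaMerge model:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))
    return e / e.sum()

def merged_score(doc_feats, rewrite_feats, w_score, w_gate):
    """Merged score of one document: sum over result lists k of
    gate_k(rewrite features of list k) * f(scoring features of the document in list k)."""
    gates = softmax(np.array([z @ w_gate for z in rewrite_feats]))   # one gate weight per list
    scores = np.array([x @ w_score for x in doc_feats])              # shared scorer f() per list
    return gates @ scores

# Illustrative example: a document retrieved by two rewrites
# ("jupiters mass" and "mass of jupiter"), with made-up feature values.
doc_feats = [np.array([0.7, 0.2, 1.0]), np.array([0.5, 0.9, 0.4])]   # scoring features per list
rewrite_feats = [np.array([0.3, 1.2]), np.array([0.8, 0.1])]         # rewrite features per list
s = merged_score(doc_feats, rewrite_feats,
                 w_score=np.array([1.0, 0.5, 0.2]),
                 w_gate=np.array([0.4, 0.6]))
```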

Page 23: Learning  to  Rank:  New Techniques  and  Applications

Page 24: Learning  to  Rank:  New Techniques  and  Applications

Page 25: Learning  to  Rank:  New Techniques  and  Applications

[Figure: LambdaMerge results. Scatter of Merged – Original NDCG against Reformulation – Original NDCG.]

Page 26: Learning  to  Rank:  New Techniques  and  Applications

Summary

• Learning to Rank
– An indispensable tool
– Requires judgments, but semi-supervised learning can help

Crowd-sourcing is also a possibility; a research frontier is implicit judgments from clicks.

– Many applications beyond those shown
• Merging: multiple local search engines, multiple language engines
• Rank recommendations in collaborative filtering
• Many thresholding tasks (filtering) can be posed as ranking
• Rank ads for relevance
• Elections

– Use it!