learning to rank data2day 2017

Learning to Rank

Stefan Kühn

Join me on XING

data2day Heidelberg - September 28th, 2017

Stefan Kühn (XING) Ranking 28.09.2017 1 / 30

Contents

1 Rankings and Humans

2 Ranking and Machine Learning

3 Formalizing Ranking Problems

4 Rankings and Recommender Systems

Rankings in Everyday Life

- TODO Lists
- Prioritized Backlogs
- Top X songs/movies/...
- You get the idea...

Rankings in History

It all started with

Rankings Nowadays

German States by Employee Happiness (according to Kununu)

Rankings, Heuristics, Decisions

- Rankings are about comparisons
- Rankings are about decision-making
- Some heuristics are about both

Recognition Heuristic
If one of two objects is recognized and the other is not, then infer that the recognized object has the higher value with respect to the criterion.
Proposed by Gigerenzer and Goldstein, built upon the great works of Kahneman and Tversky.

Learning

Is Ranking a Machine Learning Problem?

Machine Learning Concepts

Supervised - Learning from Labels
Figure out how to generate correct labels using the given data

- Classification
- Regression

Unsupervised - Learning from Data
Identify hidden/inherent structure using the given data

- Clustering
- Dimensionality Reduction / Manifold Learning
- Outlier Detection

Supervised versus Unsupervised

Learning to Rank
Figure out how to generate a good ranking using the given data

What about Learning to Rank = Machine-Learned Ranking or MLR?
1 Supervised because ranks are like labels?
2 Unsupervised because ranks are typically based on implicit feedback, i.e. latent/hidden/inherent structure?
3 Mixed/intermediate/something else?
4 Ill-posed question?

Could you please rank these options according to whatever you think is appropriate?

And by the way, how did you do it?

Example: XING Stream

How to order News?

- By time?
- By content/topic?
- By popularity?
- By clicking probability?

Every choice changes the problem to solve while the result set is always the same - a ranked list of items. Every choice represents a different distance measure / objective function to minimize.
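A minimal sketch of this point: the same items, ranked under three different objectives, give three different lists. The `NewsItem` fields and all numbers are made up for illustration and are not taken from the XING stream.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NewsItem:
    title: str
    age_hours: float   # time since publication
    popularity: int    # e.g. total number of views
    click_prob: float  # estimated probability that this user clicks

# Hypothetical items; the numbers are invented for illustration.
ITEMS = [
    NewsItem("A", age_hours=1.0, popularity=120, click_prob=0.02),
    NewsItem("B", age_hours=30.0, popularity=9000, click_prob=0.05),
    NewsItem("C", age_hours=5.0, popularity=800, click_prob=0.11),
]

def rank(items, objective):
    """Return titles ordered by the objective (smaller value = better rank)."""
    return [item.title for item in sorted(items, key=objective)]

by_time = rank(ITEMS, lambda it: it.age_hours)          # ['A', 'C', 'B']
by_popularity = rank(ITEMS, lambda it: -it.popularity)  # ['B', 'C', 'A']
by_click_prob = rank(ITEMS, lambda it: -it.click_prob)  # ['C', 'B', 'A']
```

The output type never changes, only the objective passed to `rank` does.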

Ranking - Problem Formulation

Items x ∈ X

Ordered Labels or Ranks 1 > 2 > ... > k > ...

Ranking rule f that allows us to do the following:
- Input: Unordered subset {x, y, z, ...} ⊆ X
- Output: Ordered list, e.g. y > z > x > ...

Example: Text search
- Items: Set of Documents
- Ranking rule f: Similarity measure for documents and search terms
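The text search example can be sketched as follows. The slides do not say which similarity measure is meant; cosine similarity on raw word counts is an assumption chosen for brevity, and the documents are invented.

```python
import math
from collections import Counter

def tokens(text):
    """Very naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_documents(query, documents):
    """Ranking rule f: order documents by similarity to the search terms."""
    q = Counter(tokens(query))
    scored = [(cosine_similarity(q, Counter(tokens(d))), d) for d in documents]
    # Sort by descending score; sorted() is stable, so ties keep input order.
    return [d for _, d in sorted(scored, key=lambda pair: -pair[0])]

DOCS = [
    "cooking pasta at home",
    "rank aggregation and learning",
    "learning to rank with gradient boosting",
]
ranked = rank_documents("learning to rank", DOCS)
```

Here `ranked` starts with the document that shares all three search terms and ends with the one that shares none.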

Ranking and Level of Measurement

Supervised Learning Problems
- Classification - Nominal Scale - Class Labels
- Ranking - Ordinal Scale - Ranks
- Regression - Interval Scale - Real Values

Ranking is the task of predicting labels on an ordinal scale.

Informally: Learn ordering from labeled training data - typically ordered lists of items - and try to predict ordering for new sets of items.

What is special about this?
Ordering is context-dependent. One additional item (or one item less) can change all other ranks. This is clearly different from regression and classification.
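A small illustration of the context dependence, with made-up scores: no individual score changes, yet adding a single item shifts the rank of every other item.

```python
def ranks(scores):
    """Map each item to its rank (1 = best) within the given set of items."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {item: position for position, item in enumerate(ordered, start=1)}

scores = {"x": 0.2, "y": 0.9, "z": 0.5}

before = ranks(scores)               # {'y': 1, 'z': 2, 'x': 3}
after = ranks({**scores, "w": 1.0})  # {'w': 1, 'y': 2, 'z': 3, 'x': 4}
```

A class label or a regression target for `x` would be unaffected by the arrival of `w`; its rank is not.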

Ranking in Information Retrieval

CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=518546

Ranking - Pointwise

Approach Characteristics
- Input: Single items
- Evaluation: Scoring function evaluated for each point/item
- Optimization: Loss function derived from individual scores

Reduces Ranking Problem to either
- Regression
- Classification
- Ordinal Regression
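A minimal sketch of the regression variant, assuming a linear scoring function fitted by plain gradient descent on a squared loss. The features, relevance grades, and hyperparameters are hypothetical; none of the rankers named in these slides is implemented here.

```python
def fit_linear_scorer(features, labels, lr=0.5, epochs=5000):
    """Least-squares regression by gradient descent: one loss term per item."""
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for j in range(dim):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def score(model, x):
    """Scoring function evaluated for a single item."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical training items: two features each, relevance grade 0..2.
train_x = [[0.1, 0.2], [0.5, 0.9], [0.9, 0.4], [0.6, 0.1]]
train_y = [0, 1, 2, 1]
model = fit_linear_scorer(train_x, train_y)

# Ranking new items means scoring each one independently, then sorting.
new_items = {"a": [0.8, 0.3], "b": [0.2, 0.7], "c": [0.5, 0.5]}
ranking = sorted(new_items, key=lambda name: -score(model, new_items[name]))
```

The loss never sees a list or a position, only individual items; the ordering appears only in the final `sorted` call.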

Ranking - Pointwise

Image taken from Tie-Yan Liu @ WWW 2009 Tutorial on Learning to Rank
http://wwwconference.org/www2009/pdf/T7A-LEARNING TO RANK TUTORIAL.pdf

Ranking - Pointwise

Problems with the Pointwise Approach

- Length of item lists can differ significantly
  Example: There are more websites related to the search term Online (ca. 10 billion) than to Offline (ca. 666 million)
- Position of items on list is not taken into account
  Example: Incorrect ordering of the top 10 results will have a slightly bigger impact than errors/inversions below position 123456789

Consequence
Longer lists will dominate the optimization, while the shorter lists are actually more important for humans/customers.

Advantages

If all individual scores are known, all possible Rankings are determined.

Ranking - Pairwise

Approach Characteristics
- Input: Pairs of Items
- Evaluation: Preference function evaluated for each pair - binary classification
- Optimization: Pairwise Classification Loss derived from all pairings, weighted majority voting

Reduces Ranking Problem to
- Binary (or pairwise) Classification
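A minimal sketch of this reduction, assuming a linear preference function trained with logistic regression on feature differences. The final ranking uses unweighted majority voting over all pairs, a simplification of the weighted voting mentioned above. Data and hyperparameters are hypothetical.

```python
import math
from itertools import combinations

def make_pairs(features, labels):
    """Turn one labeled list into (difference vector, target) pairs.
    target is 1.0 if the first item should rank above the second, else 0.0."""
    pairs = []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            continue  # no preference, nothing to learn from this pair
        diff = [a - b for a, b in zip(features[i], features[j])]
        pairs.append((diff, 1.0 if labels[i] > labels[j] else 0.0))
    return pairs

def fit_preference_model(pairs, lr=0.5, epochs=500):
    """Logistic regression on difference vectors (binary classification)."""
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        for diff, target in pairs:
            z = sum(wi * di for wi, di in zip(w, diff))
            p = 1.0 / (1.0 + math.exp(-z))
            w = [wi + lr * (target - p) * di for wi, di in zip(w, diff)]
    return w

def prefers(w, x, y):
    """Preference function: True if x should be ranked above y."""
    return sum(wi * (a - b) for wi, a, b in zip(w, x, y)) > 0

def rank_by_votes(w, items):
    """Order items by the number of pairwise comparisons they win."""
    wins = {name: 0 for name in items}
    for a, b in combinations(items, 2):
        wins[a if prefers(w, items[a], items[b]) else b] += 1
    return sorted(items, key=lambda name: -wins[name])

# Hypothetical training list: two features per item, relevance grade 0..2.
train_x = [[0.1, 0.2], [0.5, 0.9], [0.9, 0.4], [0.6, 0.1]]
train_y = [0, 1, 2, 1]
w = fit_preference_model(make_pairs(train_x, train_y))

new_items = {"a": [0.8, 0.3], "b": [0.2, 0.7], "c": [0.5, 0.5]}
ranking = rank_by_votes(w, new_items)
```

Only the relative order of the training labels enters the loss; their absolute values are never used.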

Ranking - Pairwise

Problems with the Pairwise Approach

- Length of item lists can differ significantly
- Number of pairs depends quadratically on the length of the list
- Even bigger imbalance w.r.t. list length
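The quadratic growth is easy to check by enumeration. The list lengths below are arbitrary examples.

```python
from itertools import combinations

def number_of_pairs(list_length):
    """Count unordered item pairs by explicit enumeration."""
    return sum(1 for _ in combinations(range(list_length), 2))

short_list, long_list = 10, 1000

pairs_short = number_of_pairs(short_list)  # 45
pairs_long = number_of_pairs(long_list)    # 499500

# Matches the closed form n * (n - 1) / 2.
assert pairs_long == long_list * (long_list - 1) // 2
```

A list that is 100 times longer contributes 11100 times as many pairs, so the imbalance between queries is larger than in the pointwise case.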

Advantages

Comparing pairs of elements is a much more natural approach to Ranking than Regression or Classification.

Ranking - Listwise

Approach Characteristics
- Input: Set of Items
- Evaluation: Some Evaluation Metric
- Optimization:
  - Either: Directly optimize Evaluation Metric
  - Or: Loss function defined for permutations of the given input

Reduces Ranking Problem to either
- Direct Optimization of Evaluation Metric
- Listwise Loss Optimization (Distance between lists is non-trivial)
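To see why a distance between lists is non-trivial, consider the Kendall tau distance, which counts the item pairs that two rankings order differently. The slides do not name a specific distance; this one is chosen here as an assumption because it is simple to state.

```python
from itertools import combinations

def kendall_tau_distance(ranking_a, ranking_b):
    """Number of item pairs that the two rankings order differently."""
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    if pos_a.keys() != pos_b.keys():
        raise ValueError("both rankings must contain the same items")
    return sum(
        1
        for x, y in combinations(ranking_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )

truth = ["y", "z", "x", "w"]

swap_at_top = kendall_tau_distance(truth, ["z", "y", "x", "w"])     # 1
swap_at_bottom = kendall_tau_distance(truth, ["y", "z", "w", "x"])  # 1
reversed_list = kendall_tau_distance(truth, truth[::-1])            # 6
```

This distance treats a swap at the top exactly like a swap at the bottom, although the former usually matters far more to a user. A useful listwise loss has to encode such position effects.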

Ranking - Listwise

Problems with the Listwise Approach

- Huge complexity issue
- Direct Optimization: Non-smooth functions
- Often only incomplete knowledge about ground truth for lists (only tiny subset available for learning)

Advantages

Positions on lists are visible to the algorithms.

Important Contributions

Natural Language Processing
- tf-idf
- Okapi BM25
- Link to Information Theory

Interesting Nonlinear Evaluation Metrics
- P@k = Precision restricted to the best k items
- MAP = Mean Average Precision
- Discounted Cumulative Gain = DCG
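These metrics can be sketched in a few lines. Two conventions are assumptions here, since the slides do not fix them: average precision is normalized by the number of relevant items present in the list, and DCG uses the raw gain with a log2 position discount (another common variant uses 2^gain - 1).

```python
import math

def precision_at_k(relevant_flags, k):
    """Fraction of relevant items among the first k positions."""
    return sum(relevant_flags[:k]) / k

def average_precision(relevant_flags):
    """Mean of P@k over the positions k that hold a relevant item."""
    hits, total = 0, 0.0
    for k, rel in enumerate(relevant_flags, start=1):
        if rel:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0

def mean_average_precision(flag_lists):
    """MAP: average precision averaged over several queries."""
    return sum(average_precision(f) for f in flag_lists) / len(flag_lists)

def dcg(gains):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(pos + 1) for pos, g in enumerate(gains, start=1))

def ndcg(gains):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

flags = [1, 0, 1, 0, 0]      # relevance of a ranked list, best position first
p_at_3 = precision_at_k(flags, 3)
ap = average_precision(flags)
score_bad = ndcg([3, 2, 3, 0, 1])
score_ideal = ndcg([3, 3, 2, 1, 0])
```

For `flags`, P@3 is 2/3 and the average precision is (1/1 + 2/3) / 2 = 5/6; NDCG equals 1 only for the ideal ordering.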

Interesting Non-Standard Objective Functions
- (N)DCG as optimization objective
- non-continuous and non-smooth
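The non-smoothness can be made visible with a tiny experiment: vary the score of one item and watch NDCG. The gains and scores are made up, and the log2 discount is an assumed convention.

```python
import math

def ndcg_from_scores(scores, gains):
    """NDCG of the ordering obtained by sorting items by descending score."""
    def dcg(ordered_gains):
        return sum(g / math.log2(p + 1)
                   for p, g in enumerate(ordered_gains, start=1))
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ideal = dcg(sorted(gains, reverse=True))
    return dcg([gains[i] for i in order]) / ideal

gains = [3, 1, 0]  # true relevance of three items

def metric(s):
    """NDCG as a function of the score s of the most relevant item."""
    return ndcg_from_scores([s, 0.5, 0.2], gains)

# The metric only changes when s crosses another item's score (0.2 or 0.5).
values = [metric(s) for s in (0.0, 0.1, 0.3, 0.4, 0.6, 0.9)]
```

`values` is a step function of `s`: constant between crossings, with jumps at 0.2 and 0.5. Its gradient is zero almost everywhere, so plain gradient descent gets no signal from it.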

Interesting Rankers
- Pointwise: Subset Ranking; McRank; PRanking (Ordinal Regression)
- Pairwise: RankNet; FRank; RankBoost; Ranking SVM
- Listwise: SoftRank; SoftNDCG; SVM-MAP; Structural SVM; AdaRank

Example: Personalized Ad Recommendations

Standard Approaches
- Contextual Bandits
- Policies based on classifiers for each ad
- Collaborative Filtering
- Based on Latent Features, e.g. when using Matrix Factorization

Main Problem
- Extreme sparsity of positive feedback
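As a rough illustration of the bandit setting, here is a toy epsilon-greedy policy that keeps one click-rate estimate per (context, ad) pair. This is a deliberately simplified stand-in for a per-ad classifier, not a method from the slides; the class name, the context strings, and the ad names are hypothetical.

```python
import random
from collections import defaultdict

class EpsilonGreedyAdPolicy:
    """Toy contextual bandit: one click-rate estimate per (context, ad)."""

    def __init__(self, ads, epsilon=0.1, seed=0):
        self.ads = list(ads)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = defaultdict(int)
        self.clicks = defaultdict(int)

    def estimate(self, context, ad):
        """Observed click rate; 0.0 for pairs that were never shown."""
        shown = self.shows[(context, ad)]
        return self.clicks[(context, ad)] / shown if shown else 0.0

    def choose(self, context):
        """Explore with probability epsilon, otherwise show the best ad."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.ads)
        return max(self.ads, key=lambda ad: self.estimate(context, ad))

    def update(self, context, ad, clicked):
        """Record one impression and whether it led to a click."""
        self.shows[(context, ad)] += 1
        self.clicks[(context, ad)] += int(clicked)

policy = EpsilonGreedyAdPolicy(["ad_a", "ad_b"], epsilon=0.0)
policy.update("developer", "ad_a", clicked=False)
policy.update("developer", "ad_b", clicked=True)
best = policy.choose("developer")  # 'ad_b'
```

With realistic click rates, almost every update carries `clicked=False`, so most estimates stay at zero for a long time. That is the sparsity problem named above.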

Example: Personalized Ad Recommendations

New Approaches
- Still Contextual Bandits
- Policies based on rankers instead of classifiers

Recent Paper by Chaudhuri et al.
- Personalized Advertisement Recommendation: A Ranking Approach to Address the Ubiquitous Click Sparsity Problem
- Works best in the case of extreme sparsity

Thank you!
