learning to rank data2day 2017

Learning to Rank

Stefan Kühn

Join me on XING

data2day Heidelberg - September 28th, 2017

Stefan Kühn (XING) Ranking 28.09.2017 1 / 30

https://www.xing.com/profile/Stefan_Kuehn46

Contents

1 Rankings and Humans

2 Ranking and Machine Learning

3 Formalizing Ranking Problems

4 Rankings and Recommender Systems


Rankings in Everyday Life

- TODO Lists
- Prioritized Backlogs
- Top X songs/movies/...
- You get the idea...


Rankings in History

It all started with


Rankings Nowadays

German States by Employee Happiness (according to Kununu)


Rankings, Heuristics, Decisions

- Rankings are about comparisons
- Rankings are about decision-making
- Some heuristics are about both

Recognition Heuristic
If one of two objects is recognized and the other is not, then infer that the recognized object has the higher value with respect to the criterion.
Proposed by Gigerenzer and Goldstein, built upon the great works of Kahneman and Tversky.


Learning

Is Ranking a Machine Learning Problem?


Machine Learning Concepts

Supervised - Learning from Labels
Figure out how to generate correct labels using the given data
- Classification
- Regression

Unsupervised - Learning from Data
Identify hidden/inherent structure using the given data
- Clustering
- Dimensionality Reduction / Manifold Learning
- Outlier Detection


Supervised versus Unsupervised

Learning to Rank
Figure out how to generate a good ranking using the given data

What about Learning to Rank = Machine-Learned Ranking or MLR?
1 Supervised because ranks are like labels?
2 Unsupervised because ranks are typically based on implicit feedback, i.e. latent/hidden/inherent structure?
3 Mixed/intermediate/something else?
4 Ill-posed question?

Could you please rank these options according to whatever you think is appropriate?

And by the way, how did you do it?


Example: XING Stream

How to order News?

- By time?
- By content/topic?
- By popularity?
- By clicking probability?

Every choice changes the problem to solve while the result set is always the same - a ranked list of items. Every choice represents a different distance measure / objective function to minimize.


Ranking - Problem Formulation

Items x ∈ X

Ordered Labels or Ranks 1 > 2 > ... > k > ...

Ranking rule f that does the following:
- Input: Unordered subset {x, y, z, ...} ⊆ X
- Output: Ordered list, e.g. y > z > x > ...

Example: Text search
- Items: Set of Documents
- Ranking rule f: Similarity measure for documents and search terms
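The text search example can be sketched in a few lines of Python. This is only an illustration, assuming a bag-of-words cosine similarity as the similarity measure; the function names and the toy documents are made up for this sketch and are not from the talk.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer; real systems also stem and drop stop words.
    return text.lower().split()

def cosine_similarity(a, b):
    """Cosine similarity of two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_documents(query, documents):
    """Ranking rule f: order documents by similarity to the query, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine_similarity(q, Counter(tokenize(d))), d) for d in documents]
    # Sort by score only; ties keep their input order because sorting is stable.
    return [d for _, d in sorted(scored, key=lambda pair: pair[0], reverse=True)]

# Hypothetical item set X
docs = [
    "ranking news by time",
    "learning to rank with machine learning",
    "cooking pasta at home",
]
ranked = rank_documents("learning to rank", docs)
```

The input is an unordered collection and the output is an ordered list, which is all the problem formulation asks of f. Swapping the similarity measure changes the ordering but not the shape of the result.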


Ranking and Level of Measurement

Supervised Learning Problems
- Classification - Nominal Scale - Class Labels
- Ranking - Ordinal Scale - Ranks
- Regression - Interval Scale - Real Values

Ranking is the task of predicting labels on an ordinal scale.

Informally: Learn ordering from labeled training data - typically ordered lists of items - and try to predict ordering for new sets of items.

What is special about this?
Ordering is context-dependent. One additional item (or one item less) can change all other ranks. This is clearly different compared to regression and classification.
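A small sketch of this context dependence, assuming for illustration that ranks are derived from fixed per-item scores (the item names and scores are invented):

```python
def ranks(scores):
    """Map each item to its rank (1 = best) within the given set of items."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {item: position for position, item in enumerate(ordered, start=1)}

scores = {"a": 0.9, "b": 0.5, "c": 0.1}
before = ranks(scores)

# Add a single new item; the scores of a, b and c stay exactly the same.
scores_with_new = dict(scores, d=0.95)
after = ranks(scores_with_new)
```

No score of the original items changed, yet every one of their ranks did. A class label or a regression target for item a would be unaffected by the arrival of item d.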


Ranking in Information Retrieval

CC BY-SA 3.0,https://commons.wikimedia.org/w/index.php?curid=518546


Ranking - Pointwise

Approach Characteristics
- Input: Single items
- Evaluation: Scoring function evaluated for each point/item
- Optimization: Loss function derived from individual scores

Reduces Ranking Problem to either
- Regression
- Classification
- Ordinal Regression
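As a rough sketch of the reduction to regression: fit a scoring function on single items with a squared loss, then sort new items by predicted score. The linear model, the training loop and all data below are assumptions chosen for brevity, not a method from the talk.

```python
def score(w, x):
    """Linear scoring function evaluated for a single item."""
    return sum(wi * xi for wi, xi in zip(w, x))

def fit_pointwise(features, labels, epochs=500, lr=0.05):
    """Least-squares fit by stochastic gradient descent, one item at a time."""
    w = [0.0] * len(features[0])
    for _ in range(epochs):
        for x, y in zip(features, labels):
            error = score(w, x) - y  # loss depends on this item only
            w = [wi - lr * error * xi for wi, xi in zip(w, x)]
    return w

# Hypothetical training data: feature vectors with graded relevance labels
train_x = [[1.0, 0.0], [0.8, 0.2], [0.3, 0.9], [0.0, 1.0]]
train_y = [3.0, 2.0, 1.0, 0.0]
w = fit_pointwise(train_x, train_y)

# Ranking new items means scoring each one independently and sorting.
new_items = {"doc_a": [0.1, 0.9], "doc_b": [0.9, 0.1], "doc_c": [0.5, 0.5]}
ranking = sorted(new_items, key=lambda name: score(w, new_items[name]), reverse=True)
```

Note that the loss never sees which list an item belongs to or where it sits on that list.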


Ranking - Pointwise

Image taken from Tie-Yan Liu @ WWW 2009 Tutorial on Learning to Rank
http://wwwconference.org/www2009/pdf/T7A-LEARNING TO RANK TUTORIAL.pdf


Ranking - Pointwise

Problems with the Pointwise Approach

- Length of item lists can differ significantly
  Example: There are more websites related to the search term Online (ca. 10 billion) than to Offline (ca. 666 million)
- Position of items on the list is not taken into account
  Example: Incorrect ordering of the top 10 results will have a slightly bigger impact than errors/inversions below position 123456789

Consequence
Longer lists will dominate the optimization, while actually the shorter lists are more important for humans/customers.

Advantages

If all individual scores are known, all possible Rankings are determined.

Ranking - Pairwise

Approach Characteristics
- Input: Pairs of Items
- Evaluation: Preference function evaluated for each pair - binary classification
- Optimization: Pairwise Classification Loss derived from all pairings, weighted majority voting

Reduces Ranking Problem to
- Binary (or pairwise) Classification
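A minimal sketch of this reduction, assuming a linear preference function trained as a logistic regression on feature differences and a plain (unweighted) vote over all pairings to produce the final order. Names, hyperparameters and data are illustrative assumptions.

```python
import math
from itertools import combinations

def make_pairs(items):
    """Turn one labeled list into (feature difference, label) training pairs.
    The label is 1 if the first item should rank above the second, else 0."""
    pairs = []
    for (xa, ya), (xb, yb) in combinations(items, 2):
        if ya == yb:
            continue  # equal relevance carries no preference
        diff = [a - b for a, b in zip(xa, xb)]
        pairs.append((diff, 1 if ya > yb else 0))
    return pairs

def fit_pairwise(pairs, epochs=200, lr=0.5):
    """Binary classifier on pairs: logistic regression without a bias term."""
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        for diff, label in pairs:
            z = sum(wi * di for wi, di in zip(w, diff))
            p = 1.0 / (1.0 + math.exp(-z))
            w = [wi + lr * (label - p) * di for wi, di in zip(w, diff)]
    return w

def rank_by_votes(w, items):
    """Order items by how many pairwise comparisons each one wins."""
    def prefers(xa, xb):
        return sum(wi * (a - b) for wi, a, b in zip(w, xa, xb)) > 0
    wins = {
        name: sum(prefers(x, other) for o, other in items.items() if o != name)
        for name, x in items.items()
    }
    return sorted(items, key=wins.get, reverse=True)

# Hypothetical training list: (feature vector, graded relevance)
train = [([0.3, 0.9], 1), ([1.0, 0.0], 3), ([0.0, 1.0], 0), ([0.8, 0.2], 2)]
w = fit_pairwise(make_pairs(train))

new_items = {"doc_a": [0.1, 0.9], "doc_b": [0.9, 0.1], "doc_c": [0.5, 0.5]}
ranking = rank_by_votes(w, new_items)
```

Because the preference function here is linear, the vote agrees with sorting by the score w·x; a nonlinear or inconsistent preference function is where voting starts to matter.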


Ranking - Pairwise



Ranking - Pairwise

Problems with the Pairwise Approach

- Length of item lists can differ significantly
- Number of pairs depends quadratically on the length of the list
- Even bigger imbalance w.r.t. list length
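The quadratic growth is easy to check: a list of n items yields n(n-1)/2 unordered pairs. The two list lengths below are arbitrary example values.

```python
def number_of_pairs(n):
    """Number of unordered item pairs in a list of n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2

short_list, long_list = 10, 1000

# Pointwise view: the long list contributes 100 times as many training examples.
items_ratio = long_list / short_list

# Pairwise view: the long list contributes 11100 times as many training examples.
pairs_ratio = number_of_pairs(long_list) / number_of_pairs(short_list)
```

Without some per-list normalization, the long list outweighs the short one far more heavily in a pairwise loss than in a pointwise one.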

Advantages

Comparing pairs of elements is a much more natural approach to Ranking than Regression or Classification.


Ranking - Listwise

Approach Characteristics
- Input: Set of Items
- Evaluation: Some Evaluation Metric
- Optimization:
  - Either: Directly optimize the Evaluation Metric
  - Or: Loss function defined for permutations of the given input

Reduces Ranking Problem to either
- Direct Optimization of Evaluation Metric
- Listwise Loss Optimization (Distance between lists is non-trivial)
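One possible listwise loss, shown only as an example: the cross entropy between the "top-one" probability distributions of the labels and of the predicted scores, as used in ListNet. ListNet is not mentioned on these slides; it is brought in here as a simple smooth distance between lists, and the scores below are invented.

```python
import math

def softmax(values):
    """Turn a list of scores into a probability distribution over the items."""
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def listwise_loss(predicted_scores, relevance_labels):
    """Cross entropy between the top-one distributions of labels and scores.
    The loss is defined on the whole list, not on items or pairs."""
    target = softmax(relevance_labels)
    predicted = softmax(predicted_scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))

labels = [3.0, 1.0, 0.0]                        # ground truth for one list
good = listwise_loss([2.5, 0.8, 0.1], labels)   # same order as the labels
bad = listwise_loss([0.1, 0.8, 2.5], labels)    # reversed order
best = listwise_loss(labels, labels)            # lower bound for this list
```

Unlike a metric computed on the sorted order, this loss is smooth in the predicted scores, so gradient-based optimization applies.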


Ranking - Listwise



Ranking - Listwise

Problems with the Listwise Approach

- Huge complexity issue
- Direct Optimization: Non-smooth functions
- Often only incomplete knowledge about ground truth for lists (only tiny subset available for learning)

Advantages

Positions on lists are visible to the algorithms.


Important Contributions

Natural Language Processing
- tf-idf
- Okapi BM25
- Link to Information Theory

Interesting Nonlinear Evaluation Metrics
- P@k = Precision restricted to the best k items
- MAP = Mean Average Precision
- Discounted Cumulative Gain = DCG
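These metrics can be sketched as follows. The sketch assumes binary relevance flags for P@k and MAP, assumes every relevant item appears somewhere in the ranked list, and uses the common log2(position + 1) discount with the raw gain for DCG; other variants (for example a 2^gain - 1 numerator) exist.

```python
import math

def precision_at_k(relevant_flags, k):
    """Fraction of relevant items among the first k positions."""
    return sum(relevant_flags[:k]) / k

def average_precision(relevant_flags):
    """Mean of P@k over the positions k that hold a relevant item."""
    hits = [precision_at_k(relevant_flags, k)
            for k, rel in enumerate(relevant_flags, start=1) if rel]
    return sum(hits) / len(hits) if hits else 0.0

def mean_average_precision(lists_of_flags):
    """MAP: average precision averaged over several ranked lists (queries)."""
    return sum(average_precision(f) for f in lists_of_flags) / len(lists_of_flags)

def dcg(gains):
    """Discounted cumulative gain: gains further down the list count less."""
    return sum(g / math.log2(pos + 1) for pos, g in enumerate(gains, start=1))

def ndcg(gains):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

flags = [1, 0, 1, 0]          # ranked list, best position first
p_at_2 = precision_at_k(flags, 2)
ap = average_precision(flags)
perfect = ndcg([3, 2, 0])     # already in ideal order
inverted = ndcg([0, 2, 3])    # worst order for the same gains
```

All of these depend on the scores only through the sorted order, so a small change in a score either changes nothing or makes the metric jump.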

Interesting Non-Standard Objective Functions
- (N)DCG as optimization objective
- non-continuous and non-smooth

Interesting Rankers
- Pointwise: Subset Ranking; McRank; PRanking (Ordinal Regression)
- Pairwise: RankNet; FRank; RankBoost; Ranking SVM
- Listwise: SoftRank; SoftNDCG; SVM-MAP; Structural SVM; AdaRank


Example: Personalized Ad Recommendations

Standard Approaches
- Contextual Bandits
- Policies based on classifiers for each ad
- Collaborative Filtering
- Based on Latent Features, e.g. when using Matrix Factorization

Main Problem
- Extreme sparsity of positive feedback


Example: Personalized Ad Recommendations

New Approaches
- Still Contextual Bandits
- Policies based on rankers instead of classifiers

Recent Paper by Chaudhuri et al.
- Personalized Advertisement Recommendation: A Ranking Approach to Address the Ubiquitous Click Sparsity Problem
- Works best in the case of extreme sparsity


https://arxiv.org/pdf/1603.01870.pdf





Thank you!

