Learning to rank, Web Science 2013. Jaspreet Singh.

Page 1: Learning to rank Web Science 2013


Learning to rank

Web Science 2013

Jaspreet Singh

Page 2: Learning to rank Web Science 2013

Overview

• Optimizing search engines using clickthrough data. Thorsten Joachims, SIGKDD 2002.
• Large scale learning to rank. D. Sculley.

(Diagram: Machine Learning Algorithm → Retrieval function)

Page 3: Learning to rank Web Science 2013

Optimizing search engines using clickthrough data

• Explicit feedback vs. clickthrough data.
• Clickthrough data as triplets (q, r, c), where q is the query, r is the ranking presented to the user, and c is the set of links the user clicked on.

Assuming the user scans the ranking from top to bottom, a user who clicks on link 3 but not on link 2 must have observed link 2 and decided not to click it. This implies a relative preference for link 3 over link 2.
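This "clicked beats skipped-above" heuristic can be sketched in a few lines of Python (function and document names are illustrative, not from the paper):

```python
def preferences_from_clicks(ranking, clicked):
    """Extract pairwise preferences from a ranked list and the set of
    clicked links: a clicked link is preferred over every non-clicked
    link ranked above it."""
    prefs = []
    clicked = set(clicked)
    for i, doc in enumerate(ranking):
        if doc in clicked:
            for skipped in ranking[:i]:
                if skipped not in clicked:
                    prefs.append((doc, skipped))  # doc preferred over skipped
    return prefs

# User clicked d1 and d3 but skipped d2, so d3 is preferred over d2
print(preferences_from_clicks(["d1", "d2", "d3"], {"d1", "d3"}))
# [('d3', 'd2')]
```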

Page 4: Learning to rank Web Science 2013

Learning of retrieval functions

• Learning the exact ordering of documents is close to impossible.
• Instead, measure the similarity between the optimal ordering and the given ordering using Kendall's tau.
• Maximizing Kendall's tau is equivalent to minimizing the number of discordant pairs, which in turn reduces the average rank of the relevant documents (and thereby bounds average precision).
• For a fixed but unknown distribution Pr(q, r*) of queries and target rankings r* on a document collection D with m documents, the goal is to learn a retrieval function f(q) for which the expected Kendall's τ is maximal.
• This objective is equivalent to a risk functional with −τ as the loss.
• The empirical risk minimization principle states that the learning algorithm should choose a hypothesis that minimizes the empirical risk on the training data.
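For concreteness, Kendall's tau between two rankings of the same m documents can be computed by counting concordant and discordant pairs (an O(m²) illustrative sketch):

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings of the same documents:
    tau = (P - Q) / (P + Q), where P counts concordant pairs
    (ordered the same way in both rankings) and Q discordant pairs."""
    pos_a = {doc: i for i, doc in enumerate(rank_a)}
    pos_b = {doc: i for i, doc in enumerate(rank_b)}
    concordant = discordant = 0
    for d1, d2 in combinations(rank_a, 2):
        if (pos_a[d1] - pos_a[d2]) * (pos_b[d1] - pos_b[d2]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

# Swapping one adjacent pair in a 4-item ranking gives one discordant
# pair out of six: tau = (5 - 1) / 6
print(kendall_tau(["a", "b", "c", "d"], ["a", "b", "d", "c"]))
```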

Page 5: Learning to rank Web Science 2013

Rank SVM

• Is it possible to design an algorithm and a family of ranking functions F so that finding the function f ∈ F maximizing τ is efficient, and so that this function generalizes well beyond the training data?
• A weight vector w scores each document; sorting documents by score produces the ranking.
• Instead of maximizing τ directly, it is equivalent to minimize the number of discordant pairs in the calculation of τ. This is equivalent to finding the weight vector w so that the maximum number of the following inequalities is fulfilled: w · Φ(q, d_i) > w · Φ(q, d_j) for every pair where d_i should be ranked above d_j.
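These inequalities reduce ranking to binary classification on difference vectors: each preference pair becomes one example that must be classified positive. A minimal sketch with hypothetical feature vectors (not the SVM-light implementation):

```python
import numpy as np

def pairwise_transform(features, prefs):
    """For each preference (d_i, d_j), emit the difference vector
    Phi(q, d_i) - Phi(q, d_j); Rank SVM requires w . diff > 0."""
    return np.array([features[di] - features[dj] for di, dj in prefs])

# Hypothetical 2-d feature vectors for three documents of one query
feats = {"d1": np.array([1.0, 0.2]),
         "d2": np.array([0.3, 0.9]),
         "d3": np.array([0.8, 0.8])}

X = pairwise_transform(feats, [("d3", "d2")])  # d3 preferred over d2
w = np.array([1.0, 0.0])                       # an example weight vector
print(bool((X @ w > 0).all()))                 # True: w satisfies the inequality
```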

Page 6: Learning to rank Web Science 2013

Rank SVM

• Finding the weight vector that fulfills the maximum number of inequalities is NP-hard, just like the analogous problem in SVM classification.
• As in classification SVMs, slack variables and a regularization parameter are introduced to bound the training error and approximate the result.
• The resulting optimization problem can be solved with SVM-light.

Page 7: Learning to rank Web Science 2013

Experiments

• A meta-search engine collects results from the best search engines and combines them into a single list by union.
• To compare the quality of different retrieval functions, the key idea is to present two rankings at the same time in one combined list, then measure which ranking receives more clicks.

Ranking A: 1. D1  2. D2  3. D3
Ranking B: 1. D4  2. D5  3. D6
Union:     1. D1  2. D4  3. D2  4. D5  5. D3  6. D6
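The combined list above can be produced by a simple alternating interleave that skips duplicates (a sketch of the combination scheme, ignoring unequal list lengths):

```python
def combine_rankings(rank_a, rank_b):
    """Interleave two rankings into one list, alternating between them
    and skipping documents already placed."""
    combined, seen = [], set()
    for da, db in zip(rank_a, rank_b):
        for doc in (da, db):
            if doc not in seen:
                seen.add(doc)
                combined.append(doc)
    return combined

print(combine_rankings(["D1", "D2", "D3"], ["D4", "D5", "D6"]))
# ['D1', 'D4', 'D2', 'D5', 'D3', 'D6']
```

Because adjacent positions draw from both rankings, a user's clicks on the combined list compare the two retrieval functions under (roughly) equal exposure.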

Page 8: Learning to rank Web Science 2013

Experiments

• Offline experiment: verify that the Ranking SVM can indeed learn a retrieval function maximizing Kendall's tau on partial preference feedback.
• Split the collected queries into a training set and a test set, then train the classifier using SVM-light.
• Result: the Ranking SVM can learn regularities in the preferences; the more training queries, the lower the error.

• Online experiment: verify that the learned retrieval function improves retrieval quality as desired.
• The learned retrieval function is compared against Google, MSNSearch, and Toprank.
• Result: more links from the learned ranking were clicked on.

Page 9: Learning to rank Web Science 2013

Conclusion

• The key insight is that clickthrough data can provide training data in the form of relative preferences.
• The experimental results show that the Ranking SVM can successfully learn an improved retrieval function from clickthrough data. Without any explicit feedback or manual parameter tuning, it automatically adapted to the particular preferences of a group of 20 users (112 queries).
• There is a trade-off between the amount of training data (i.e. a large group) and maximum homogeneity (i.e. a single user).

Page 10: Learning to rank Web Science 2013

Overview

• Optimizing search engines using clickthrough data. Thorsten Joachims, SIGKDD 2002.
• Large scale learning to rank. D. Sculley.

(Diagram: Machine Learning Algorithm → Retrieval function)

Page 11: Learning to rank Web Science 2013

Large scale learning to rank

• Pair-wise learning-to-rank methods such as Rank SVM give good performance, but suffer from the computational burden of optimizing an objective defined over O(n²) possible pairs for data sets with n examples.
• The super-linear dependence on training-set size is removed by sampling pairs from an implicit pair-wise expansion and applying efficient stochastic gradient descent learners for approximate SVMs.
• The main approach of this paper is to adapt the pair-wise learning-to-rank problem to the stochastic gradient descent framework.

Page 12: Learning to rank Web Science 2013

Optimization and stochastic gradient descent

• The paper restricts itself to solving the classic Rank SVM optimization problem, first posed by Joachims: minimize the hinge loss over preference pairs.
• Stochastic gradient descent is a gradient descent optimization method for minimizing an objective function that is written as a sum of differentiable functions.
• The generalization ability of stochastic gradient descent depends only on the number of stochastic steps taken, not on the size of the data set.

Page 13: Learning to rank Web Science 2013

Indexed Sampling - GetRandomPair

• A two-level nested hashmap indexes the training data.
• First level: the query is the key.
• Second level: the rank (relevance label) is the key.
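A minimal sketch of such an index and a pair sampler, assuming examples of the form (query, relevance label, feature vector); the names are illustrative, not the paper's code:

```python
import random
from collections import defaultdict

def build_index(examples):
    """Two-level index: query -> relevance label -> list of feature vectors."""
    index = defaultdict(lambda: defaultdict(list))
    for query, label, features in examples:
        index[query][label].append(features)
    return index

def get_random_pair(index, rng=random):
    """Sample one candidate pair for pairwise training: pick a query,
    then two different relevance labels under it, then one example per
    label; the higher-labeled example should be ranked first."""
    query = rng.choice(list(index))
    labels = rng.sample(list(index[query]), 2)
    hi, lo = max(labels), min(labels)
    return rng.choice(index[query][hi]), rng.choice(index[query][lo])

examples = [("q1", 2, [1.0, 0.0]), ("q1", 0, [0.0, 1.0]),
            ("q2", 1, [0.5, 0.5]), ("q2", 0, [0.2, 0.2])]
idx = build_index(examples)
pos, neg = get_random_pair(idx)  # pos has the higher relevance label
```

Sampling this way draws pairs from the implicit pair-wise expansion without ever materializing the O(n²) pairs.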

Page 14: Learning to rank Web Science 2013

Stochastic gradient descent

• "Stochastic" implies sampling.
• Gradient descent is a step-wise process for finding a local minimum of a function.
• Rank SVM has a hinge loss function; the hinge loss is used for maximum-margin classification.
• Hence we need to minimize this function to get a good classifier.
• The hinge loss is a convex function, so many of the usual convex optimizers used in machine learning can work with it; hence we can use SGD.
• Depending on how they perform updates to the weight vector, there are many SGD variations.
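Putting the pieces together, one stochastic step on a sampled pair might look like the following. This is a simplified sketch with an illustrative fixed learning rate and synthetic data, not Sculley's exact update rule:

```python
import numpy as np

def sgd_rank_step(w, x_pos, x_neg, lam=0.01, eta=0.1):
    """One SGD step on the regularized pairwise hinge loss
    max(0, 1 - w . (x_pos - x_neg)) + (lam / 2) * ||w||^2."""
    diff = x_pos - x_neg
    grad = lam * w                # gradient of the regularizer
    if w @ diff < 1.0:            # pair violates the margin
        grad = grad - diff        # hinge-loss subgradient term
    return w - eta * grad

rng = np.random.default_rng(0)
w = np.zeros(2)
# Synthetic pairs where the first feature signals relevance
for _ in range(200):
    x_pos = rng.normal([1.0, 0.0], 0.1)
    x_neg = rng.normal([0.0, 0.0], 0.1)
    w = sgd_rank_step(w, x_pos, x_neg)
print(w[0] > w[1])  # the relevant feature ends up with the larger weight
```

Each step touches a single sampled pair, so the cost per step is independent of the number of training examples, which is exactly the property the paper exploits.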

Page 15: Learning to rank Web Science 2013

LETOR Experiment and Results

• LETOR: a benchmark collection for Learning to Rank for Information Retrieval.
• Ranking performance: comparable, if not better.
• Training speed: about 100 times faster.

Page 16: Learning to rank Web Science 2013

Conclusion

• Clickthrough data can be used as partial relevance feedback.
• We can learn a retrieval function that improves mean average precision.
• Learning retrieval functions can be done at large scale using stochastic gradient descent.

(Diagram: Machine Learning Algorithm → Retrieval function)