Learning to rank, Web Science 2013. Jaspreet Singh.

Page 1: Learning to rank Web Science 2013


Learning to rank

Web Science 2013

Jaspreet Singh

Page 2: Learning to rank Web Science 2013

Overview

• Optimizing search engines using clickthrough data. Thorsten Joachims, SIGKDD 2002.
• Large scale learning to rank. D. Sculley.

(Diagram: Machine Learning Algorithm → Retrieval function)

Page 3: Learning to rank Web Science 2013

Optimizing search engines using clickthrough data

• Explicit feedback vs. clickthrough data.
• Clickthrough data as triplets (q, r, c), where q is the query, r is the ranking presented to the user, and c is the set of links the user clicked on.

Assuming the user scans the ranking from top to bottom, a user who clicks on link 3 but not on link 2 must have observed link 2 and decided not to click it. This implies a relative preference for link 3 over link 2.
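This "clicked beats skipped-above" heuristic can be sketched in a few lines of Python (function and document names are illustrative, not from the paper):

```python
def preferences_from_clicks(ranking, clicked):
    """Extract pairwise preferences from a ranked list and the set of
    clicked links: a clicked link is preferred over every non-clicked
    link ranked above it."""
    prefs = []
    clicked = set(clicked)
    for i, doc in enumerate(ranking):
        if doc in clicked:
            for skipped in ranking[:i]:
                if skipped not in clicked:
                    prefs.append((doc, skipped))  # doc preferred over skipped
    return prefs

# User clicked d1 and d3 but skipped d2, so d3 is preferred over d2
print(preferences_from_clicks(["d1", "d2", "d3"], {"d1", "d3"}))
# [('d3', 'd2')]
```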

Page 4: Learning to rank Web Science 2013

Learning of retrieval functions

• Learning the exact ordering of documents is close to impossible.
• Instead, measure the similarity between the optimal ordering and the given ordering using Kendall's tau.
• Maximizing Kendall's tau is equivalent to minimizing the number of discordant pairs, which in turn reduces the average rank of the relevant documents (and thereby bounds average precision).
• For a fixed but unknown distribution Pr(q, r*) of queries and target rankings r* on a document collection D with m documents, the goal is to learn a retrieval function f(q) for which the expected Kendall's τ is maximal.
• This objective is equivalent to a risk functional with −τ as the loss.
• The empirical risk minimization principle states that the learning algorithm should choose a hypothesis that minimizes the empirical risk on the training data.
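For concreteness, Kendall's tau between two rankings of the same m documents can be computed by counting concordant and discordant pairs (an O(m²) illustrative sketch):

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings of the same documents:
    tau = (P - Q) / (P + Q), where P counts concordant pairs
    (ordered the same way in both rankings) and Q discordant pairs."""
    pos_a = {doc: i for i, doc in enumerate(rank_a)}
    pos_b = {doc: i for i, doc in enumerate(rank_b)}
    concordant = discordant = 0
    for d1, d2 in combinations(rank_a, 2):
        if (pos_a[d1] - pos_a[d2]) * (pos_b[d1] - pos_b[d2]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

# Swapping one adjacent pair in a 4-item ranking gives one discordant
# pair out of six: tau = (5 - 1) / 6
print(kendall_tau(["a", "b", "c", "d"], ["a", "b", "d", "c"]))
```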

Page 5: Learning to rank Web Science 2013

Rank SVM

• Is it possible to design an algorithm and a family of ranking functions F so that finding the function f ∈ F maximizing τ is efficient, and so that this function generalizes well beyond the training data?
• A weight vector w scores each document; sorting documents by score produces the ranking.
• Instead of maximizing τ directly, it is equivalent to minimize the number of discordant pairs in the calculation of τ. This is equivalent to finding the weight vector w so that the maximum number of the following inequalities is fulfilled: w · Φ(q, d_i) > w · Φ(q, d_j) for every pair where d_i should be ranked above d_j.
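These inequalities reduce ranking to binary classification on difference vectors: each preference pair becomes one example that must be classified positive. A minimal sketch with hypothetical feature vectors (not the SVM-light implementation):

```python
import numpy as np

def pairwise_transform(features, prefs):
    """For each preference (d_i, d_j), emit the difference vector
    Phi(q, d_i) - Phi(q, d_j); Rank SVM requires w . diff > 0."""
    return np.array([features[di] - features[dj] for di, dj in prefs])

# Hypothetical 2-d feature vectors for three documents of one query
feats = {"d1": np.array([1.0, 0.2]),
         "d2": np.array([0.3, 0.9]),
         "d3": np.array([0.8, 0.8])}

X = pairwise_transform(feats, [("d3", "d2")])  # d3 preferred over d2
w = np.array([1.0, 0.0])                       # an example weight vector
print(bool((X @ w > 0).all()))                 # True: w satisfies the inequality
```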

Page 6: Learning to rank Web Science 2013

Rank SVM

• Finding the weight vector that fulfills the maximum number of inequalities is NP-hard, just like the analogous problem in SVM classification.
• As in classification SVMs, slack variables and a regularization parameter are introduced to bound the training error and approximate the result.
• The resulting optimization problem can be solved with SVM-light.

Page 7: Learning to rank Web Science 2013

Experiments

• A meta-search engine collects results from the best search engines and combines them into a single list by union.
• To compare the quality of different retrieval functions, the key idea is to present two rankings at the same time in one combined list, then measure which ranking receives more clicks.

Ranking A: 1. D1  2. D2  3. D3
Ranking B: 1. D4  2. D5  3. D6
Union:     1. D1  2. D4  3. D2  4. D5  5. D3  6. D6
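The combined list above can be produced by a simple alternating interleave that skips duplicates (a sketch of the combination scheme, ignoring unequal list lengths):

```python
def combine_rankings(rank_a, rank_b):
    """Interleave two rankings into one list, alternating between them
    and skipping documents already placed."""
    combined, seen = [], set()
    for da, db in zip(rank_a, rank_b):
        for doc in (da, db):
            if doc not in seen:
                seen.add(doc)
                combined.append(doc)
    return combined

print(combine_rankings(["D1", "D2", "D3"], ["D4", "D5", "D6"]))
# ['D1', 'D4', 'D2', 'D5', 'D3', 'D6']
```

Because adjacent positions draw from both rankings, a user's clicks on the combined list compare the two retrieval functions under (roughly) equal exposure.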

Page 8: Learning to rank Web Science 2013

Experiments

• Offline experiment: verify that the Ranking SVM can indeed learn a retrieval function maximizing Kendall's tau on partial preference feedback.
• Split the collected queries into a training set and a test set, then train the classifier using SVM-light.
• Result: the Ranking SVM can learn regularities in the preferences; the more training queries, the lower the error.

• Online experiment: verify that the learned retrieval function improves retrieval quality as desired.
• The learned retrieval function is compared against Google, MSNSearch, and Toprank.
• Result: more links from the learned ranking were clicked on.

Page 9: Learning to rank Web Science 2013

Conclusion

• The key insight is that clickthrough data can provide training data in the form of relative preferences.
• The experimental results show that the Ranking SVM can successfully learn an improved retrieval function from clickthrough data. Without any explicit feedback or manual parameter tuning, it automatically adapted to the particular preferences of a group of 20 users (112 queries).
• There is a trade-off between the amount of training data (i.e. a large group) and maximum homogeneity (i.e. a single user).

Page 10: Learning to rank Web Science 2013

Overview

• Optimizing search engines using clickthrough data. Thorsten Joachims, SIGKDD 2002.
• Large scale learning to rank. D. Sculley.

(Diagram: Machine Learning Algorithm → Retrieval function)

Page 11: Learning to rank Web Science 2013

Large scale learning to rank

• Pair-wise learning-to-rank methods such as Rank SVM give good performance, but suffer from the computational burden of optimizing an objective defined over O(n²) possible pairs for data sets with n examples.
• The super-linear dependence on training-set size is removed by sampling pairs from an implicit pair-wise expansion and applying efficient stochastic gradient descent learners for approximate SVMs.
• The main approach of this paper is to adapt the pair-wise learning-to-rank problem to the stochastic gradient descent framework.

Page 12: Learning to rank Web Science 2013

Optimization and stochastic gradient descent

• The paper restricts itself to solving the classic Rank SVM optimization problem, first posed by Joachims: minimize the hinge loss over preference pairs.
• Stochastic gradient descent is a gradient descent optimization method for minimizing an objective function that is written as a sum of differentiable functions.
• The generalization ability of stochastic gradient descent depends only on the number of stochastic steps taken, not on the size of the data set.

Page 13: Learning to rank Web Science 2013

Indexed Sampling - GetRandomPair

• A two-level nested hashmap indexes the training data.
• First level: the query is the key.
• Second level: the rank (relevance label) is the key.
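A minimal sketch of such an index and a pair sampler, assuming examples of the form (query, relevance label, feature vector); the names are illustrative, not the paper's code:

```python
import random
from collections import defaultdict

def build_index(examples):
    """Two-level index: query -> relevance label -> list of feature vectors."""
    index = defaultdict(lambda: defaultdict(list))
    for query, label, features in examples:
        index[query][label].append(features)
    return index

def get_random_pair(index, rng=random):
    """Sample one candidate pair for pairwise training: pick a query,
    then two different relevance labels under it, then one example per
    label; the higher-labeled example should be ranked first."""
    query = rng.choice(list(index))
    labels = rng.sample(list(index[query]), 2)
    hi, lo = max(labels), min(labels)
    return rng.choice(index[query][hi]), rng.choice(index[query][lo])

examples = [("q1", 2, [1.0, 0.0]), ("q1", 0, [0.0, 1.0]),
            ("q2", 1, [0.5, 0.5]), ("q2", 0, [0.2, 0.2])]
idx = build_index(examples)
pos, neg = get_random_pair(idx)  # pos has the higher relevance label
```

Sampling this way draws pairs from the implicit pair-wise expansion without ever materializing the O(n²) pairs.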

Page 14: Learning to rank Web Science 2013

Stochastic gradient descent

• "Stochastic" implies sampling.
• Gradient descent is a step-wise process for finding a local minimum of a function.
• Rank SVM has a hinge loss function; the hinge loss is used for maximum-margin classification.
• Hence we need to minimize this function to get a good classifier.
• The hinge loss is a convex function, so many of the usual convex optimizers used in machine learning can work with it; hence we can use SGD.
• Depending on how they perform updates to the weight vector, there are many SGD variations.
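Putting the pieces together, one stochastic step on a sampled pair might look like the following. This is a simplified sketch with an illustrative fixed learning rate and synthetic data, not Sculley's exact update rule:

```python
import numpy as np

def sgd_rank_step(w, x_pos, x_neg, lam=0.01, eta=0.1):
    """One SGD step on the regularized pairwise hinge loss
    max(0, 1 - w . (x_pos - x_neg)) + (lam / 2) * ||w||^2."""
    diff = x_pos - x_neg
    grad = lam * w                # gradient of the regularizer
    if w @ diff < 1.0:            # pair violates the margin
        grad = grad - diff        # hinge-loss subgradient term
    return w - eta * grad

rng = np.random.default_rng(0)
w = np.zeros(2)
# Synthetic pairs where the first feature signals relevance
for _ in range(200):
    x_pos = rng.normal([1.0, 0.0], 0.1)
    x_neg = rng.normal([0.0, 0.0], 0.1)
    w = sgd_rank_step(w, x_pos, x_neg)
print(w[0] > w[1])  # the relevant feature ends up with the larger weight
```

Each step touches a single sampled pair, so the cost per step is independent of the number of training examples, which is exactly the property the paper exploits.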

Page 15: Learning to rank Web Science 2013

LETOR Experiment and Results

• LETOR: a benchmark collection for Learning to Rank for Information Retrieval.
• Ranking performance: comparable, if not better.
• Training speed: about 100 times faster.

Page 16: Learning to rank Web Science 2013

Conclusion

• Clickthrough data can be used as partial relevance feedback.
• We can learn a retrieval function that improves mean average precision.
• Learning retrieval functions can be done at large scale using stochastic gradient descent.

(Diagram: Machine Learning Algorithm → Retrieval function)