
Page 1

Learning to Rank: from Pairwise Approach to Listwise Approach

Tie-Yan Liu, Lead Researcher, Microsoft Research Asia

http://research.microsoft.com/users/tyliu/

Page 2

Talk Outline

• Introduction to Learning to Rank

• Previous work: Pairwise Approach

• Our proposal: Listwise Approach

– ListNet

– Relational Ranking

• Summary

2008/2/12 Tie-Yan Liu @ Tokyo Forum on Advanced NLP and TM 2

Page 3

Ranking Problem: Example = Information Retrieval

query

documents

ranking of documents D = {d_1, d_2, …, d_l}

Ranking System

[Figure: a query q^(i) is submitted to the ranking system, which returns the documents d_1^(i), d_2^(i), …, d_{n_i}^(i) in ranked order.]


Ranking is also important in NLP applications, such as first-pass attachment disambiguation and reranking the alternative parse trees generated for a sentence by a statistical parser.

Page 4

IR Evaluation Measures


• Precision at position n:

P@n = #{relevant documents in top n results} / n

• Average precision:

AP = Σ_n P@n · I{document at position n is relevant} / #{relevant documents}

• MAP: AP averaged over all queries in the test set

• NDCG at position n:

NDCG@n = Z_n Σ_{j=1}^{n} (2^{r_j} − 1) / log(1 + j)

where r_j is the rating of the document at position j and Z_n normalizes so that a perfect ranking has NDCG@n = 1.
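A minimal sketch of these three measures, assuming binary relevance judgments for P@n and AP and multi-level ratings for NDCG (function names are mine, not from the slides):

```python
import math

def precision_at_n(relevant, ranked, n):
    """P@n = #{relevant documents in top n results} / n."""
    return sum(1 for d in ranked[:n] if d in relevant) / n

def average_precision(relevant, ranked):
    """AP = sum_n P@n * I{document at position n is relevant},
    normalized by the total number of relevant documents."""
    total = 0.0
    for n, d in enumerate(ranked, start=1):
        if d in relevant:
            total += precision_at_n(relevant, ranked, n)
    return total / len(relevant)

def ndcg_at_n(ratings, n):
    """NDCG@n = Z_n * sum_{j=1..n} (2^r_j - 1) / log(1 + j), where r_j is
    the rating at position j; Z_n normalizes a perfect ranking to 1
    (the log base cancels in the ratio)."""
    def dcg(rs):
        return sum((2 ** r - 1) / math.log(1 + j)
                   for j, r in enumerate(rs[:n], start=1))
    ideal = dcg(sorted(ratings, reverse=True))
    return dcg(ratings) / ideal if ideal > 0 else 0.0

ranked = ["d1", "d2", "d3", "d4"]   # system output, best first
relevant = {"d1", "d3"}
print(precision_at_n(relevant, ranked, 2))   # 0.5
print(average_precision(relevant, ranked))   # (1 + 2/3) / 2 ≈ 0.833
print(ndcg_at_n([5, 3, 2], 3))               # 1.0: already perfectly sorted
```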

Page 5

Learning to Rank

[Figure: the learning-to-rank framework. The training data contain queries q^(1), …, q^(m); for each query q^(i), the documents d_1^(i), …, d_{n_i}^(i) come with relevance labels (e.g., ratings 4, 2, …, 1 for the documents of q^(1), and 5, 3, …, 2 for those of q^(m)). The learning system produces a model f(q, d); for a new test query q with documents d_1, d_2, …, d_n, the ranking system sorts the documents by their scores f(q, d_j).]

Labels: a total order or multiple-level ratings (as in this example)

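The pipeline in this figure can be sketched in a few lines; the linear model f(q, d) = w·x(q, d) and the feature values below are illustrative assumptions, not from the slides:

```python
def f(w, x):
    """Ranking model f(q, d): here a linear function of a
    query-document feature vector x(q, d)."""
    return sum(wi * xi for wi, xi in zip(w, x))

w = [0.8, 0.2]                # model output by the learning system
test_docs = {                 # test query: doc -> feature vector x(q, d)
    "d1": [0.1, 0.3],
    "d2": [0.9, 0.5],
    "d3": [0.4, 0.1],
}
# The ranking system sorts the documents by descending score.
ranking = sorted(test_docs, key=lambda d: f(w, test_docs[d]), reverse=True)
print(ranking)   # ['d2', 'd3', 'd1']
```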

Page 6

Previous Work: Pairwise Approach

• Transforming ranked list into document pairs

• Formalizing ranking as classification on document pairs

• Ranking SVM, RankNet, RankBoost

Transform: for query q^(i), the rated list (d_1^(i), 5), (d_2^(i), 3), …, (d_{n_i}^(i), 2) becomes the document pairs (d_1^(i), d_2^(i)), (d_1^(i), d_{n_i}^(i)), …, (d_2^(i), d_{n_i}^(i)), derived from the rating comparisons 5 > 3, 5 > 2, 3 > 2.
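The transform can be written directly; `to_pairs` is my name for it:

```python
from itertools import combinations

def to_pairs(rated_docs):
    """Turn one query's rated list [(doc, rating), ...] into
    (better, worse) preference pairs, as in the pairwise approach."""
    pairs = []
    for (d1, r1), (d2, r2) in combinations(rated_docs, 2):
        if r1 > r2:
            pairs.append((d1, d2))
        elif r2 > r1:
            pairs.append((d2, d1))
        # equal ratings yield no pair
    return pairs

print(to_pairs([("d1", 5), ("d2", 3), ("dn", 2)]))
# [('d1', 'd2'), ('d1', 'dn'), ('d2', 'dn')]  from 5 > 3, 5 > 2, 3 > 2
```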


Page 7

Problems with Pairwise Approach

• Loss function is suboptimal

– Does not optimize the evaluation measures (e.g., NDCG and MAP)

– Ignores the position of documents in the ranking and the number of documents per query

• Ranking function cannot represent relational information

• Generalization theory is limited

[Figure: pairwise loss vs. (1 − NDCG@5) on a TREC dataset]


Page 8

Our Proposal: Listwise Approach

• Listwise Loss Function

– Using permutation probability distribution to represent a ranked list

– Using KL-divergence between permutation probability distributions to define loss function

• Learning to rank relational objects

– Embed object relationship in the ranking function

– Formalize ranking function as a new optimization problem.

• Query-level Generalization Analysis


Page 9

ListNet: A Listwise Loss Function for Learning to Rank

(ICML, 2007)

Page 10

Distance between Ranked Lists

• A Simple Example:

– function f: f(A)=3, f(B)=1, f(C)=0 → ABC

– function h: h(A)=5, h(B)=2, h(C)=3 → ACB

– ground truth g: g(A)=5, g(B)=3, g(C)=2 → ABC

• Question: which function is closer to ground truth?

• Euclidean distance between score vectors?

– f: <3,1,0>, g: <5,3,2>, h: <5,2,3>

– d(f,g) = 3.5, d(g,h) = 1.4, so h looks closer to the ground truth g

– However, the ranked lists given by f and g are exactly the same!

Page 11

Representation of Ranked List

• Question:

– How to represent a ranked list?

• Our proposal:

– Ranked list ↔ Permutation probability distribution


Page 12

Permutation Probability

• Probability of permutation 𝜋 is defined as (Luce Model)

• Example:

P_s(π) = ∏_{j=1}^{n} exp(s_{π(j)}) / Σ_{k=j}^{n} exp(s_{π(k)})

P_f(ABC) = exp(f(A)) / (exp(f(A)) + exp(f(B)) + exp(f(C))) × exp(f(B)) / (exp(f(B)) + exp(f(C))) × exp(f(C)) / exp(f(C))

P(A ranked No.1)

P(B ranked No.2 | A ranked No.1)= P(B ranked No.1)/(1- P(A ranked No.1))

P(C ranked No.3 | A ranked No.1, B ranked No.2)
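The Luce model is a chain of softmax-like steps over the not-yet-ranked objects, which translates directly into code (a sketch; the function name is mine):

```python
import math

def permutation_probability(scores, perm):
    """Luce model: P_s(pi) = prod_j exp(s_pi(j)) / sum_{k>=j} exp(s_pi(k))."""
    e = [math.exp(scores[obj]) for obj in perm]
    p = 1.0
    for j in range(len(e)):
        p *= e[j] / sum(e[j:])   # P(perm[j] ranked first among the remaining)
    return p

f = {"A": 3.0, "B": 1.0, "C": 0.0}
print(permutation_probability(f, ["A", "B", "C"]))   # ≈ 0.617, the most likely order
```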


Page 13

Distance between Ranked Lists

d(f,g) = 0

d(g,h) = 0.5

φ = exp

Using KL-divergence to measure the difference between distributions
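These two numbers can be checked directly. Note that f = <3,1,0> is g = <5,3,2> shifted by a constant, and with φ = exp a constant shift cancels in the Luce model, so d(f,g) is exactly 0. A sketch (function names mine):

```python
import math
from itertools import permutations

def luce(scores, perm):
    """P_s(pi) under the Luce model with phi = exp."""
    e = [math.exp(scores[obj]) for obj in perm]
    p = 1.0
    for j in range(len(e)):
        p *= e[j] / sum(e[j:])
    return p

def kl_distance(s_true, s_model):
    """KL divergence between the two permutation distributions."""
    return sum(luce(s_true, list(pi)) *
               math.log(luce(s_true, list(pi)) / luce(s_model, list(pi)))
               for pi in permutations(s_true))

g = {"A": 5.0, "B": 3.0, "C": 2.0}   # ground truth
f = {"A": 3.0, "B": 1.0, "C": 0.0}
h = {"A": 5.0, "B": 2.0, "C": 3.0}
print(kl_distance(g, f))   # 0.0: identical distributions
print(kl_distance(g, h))   # ≈ 0.47, close to the slide's 0.5
```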

Page 14

Permutation Probability Loss

• Formulation

• Properties

– Continuous and differentiable

– Convex

• Each summand is a log-sum-exp of linear functions (convex) minus a linear function, and a sum of convex functions is convex.

– Consistent

• In the large-sample limit, minimizing the proposed listwise loss achieves the optimal Bayes error rate with respect to the 0-1 loss.


L(ω) = − Σ_{q∈Q} Σ_{∀G(j_1,…,j_n)} [ ∏_{t=1}^{n} exp(y_{j_t}) / Σ_{u=t}^{n} exp(y_{j_u}) ] · log [ ∏_{t=1}^{n} exp(ω·X_{j_t}) / Σ_{u=t}^{n} exp(ω·X_{j_u}) ]

(y: ground-truth scores, X: query-document feature vectors, ω: model parameters; the outer sums run over the queries q ∈ Q and over the permutations.)

Page 15

Top-k Probability

• Correctly ranking top-k documents is more critical

• Computation of Permutation Probability is intractable

• Top-k Probability

– Defining Top-k subgroup G(j1, …, jk) containing all permutations whose top-k documents are j1, …, jk

– Time complexity of computation: from n! to n!/(n−k)!

P(G(j_1, …, j_k)) = Σ_{π∈G(j_1,…,j_k)} P_s(π) = ∏_{t=1}^{k} exp(s_{j_t}) / Σ_{u=t}^{n} exp(s_{j_u})
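The top-k probability has the same chained form as the full permutation probability but stops after k factors, which is what brings the cost down from n! terms to a k-factor product (function name mine):

```python
import math

def top_k_probability(scores, top_docs):
    """P(G(j_1,...,j_k)) = prod_{t=1..k} exp(s_{j_t}) / sum_{u=t..n} exp(s_{j_u}):
    the probability that the given documents occupy the top k positions in
    this order, marginalizing over the n-k remaining positions."""
    remaining = dict(scores)          # copy; entries are popped as positions fill
    p = 1.0
    for d in top_docs:
        denom = sum(math.exp(s) for s in remaining.values())
        p *= math.exp(remaining.pop(d)) / denom
    return p

s = {"A": 3.0, "B": 1.0, "C": 0.0, "D": -1.0}
print(top_k_probability(s, ["A", "B"]))   # P(A ranked first, B ranked second)
```

The n!/(n−k)! ordered top-k subgroups partition all n! permutations, so their probabilities sum to 1.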


Page 16

ListNet Method

• Loss function = KL-divergence between two top-k probability distributions

• Model = Neural Network

• Algorithm = Stochastic Gradient Descent

L(ω) = − Σ_{q∈Q} Σ_{∀G(j_1,…,j_k)} [ ∏_{t=1}^{k} exp(y_{j_t}) / Σ_{u=t}^{n} exp(y_{j_u}) ] · log [ ∏_{t=1}^{k} exp(ω·X_{j_t}) / Σ_{u=t}^{n} exp(ω·X_{j_u}) ]
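For k = 1 the top-k distribution reduces to a softmax over the scores, and the loss becomes a per-query cross entropy. A minimal training sketch, with a linear scorer standing in for the slides' neural network and my own variable names (ListNet proper runs stochastic gradient descent over many queries):

```python
import numpy as np

def top1_probs(scores):
    """Top-1 probabilities: a softmax over the score vector."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

def listnet_loss_grad(w, X, y):
    """Cross entropy between the top-1 distributions induced by the
    labels y and by the model scores Xw, plus its gradient in w."""
    p_true = top1_probs(y)
    p_model = top1_probs(X @ w)
    loss = -np.sum(p_true * np.log(p_model))
    grad = X.T @ (p_model - p_true)   # standard softmax cross-entropy gradient
    return loss, grad

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                 # one query: 5 documents, 3 features
y = np.array([4.0, 2.0, 1.0, 0.0, 0.0])    # multi-level ratings
w = np.zeros(3)
for _ in range(200):                        # gradient descent on this query
    loss, grad = listnet_loss_grad(w, X, y)
    w -= 0.1 * grad
print(loss)   # lower than the loss at w = 0
```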


Page 17

Experimental Results

[Figure: training performance on a TREC dataset, pairwise (RankNet) vs. listwise (ListNet). The listwise loss is more correlated with 1 − NDCG.]

Page 18

Experimental Results

[Figure: testing performance (NDCG@n) on a TREC dataset.]

Page 19

Learning to Rank Relational Objects using a Listwise Ranking Function

(WWW, 2008)

Page 20

Motivation

• Traditional ranking function models independent relevance

– Working on features of single documents

• Many applications in IR are beyond independent relevance

– Subtopic retrieval

– Representative page retrieval

– Pseudo relevance feedback

– Topic distillation


Page 21

Relational Ranking Model

• Generic formulation

• Practical formulation


[Figure: the generic and practical formulations of the relational ranking model; the relationship term is application-dependent.]

Page 22

Pseudo Relevance Feedback

• A very common type of relationship between objects is similarity.

– Similarity relationship can be represented in an undirected graph.

– Such graphs are widely used in clustering and semi-supervised learning.


Page 23

Topic Distillation

• Topic distillation prefers parent objects to be ranked before child objects.

– Preference relationships can be represented by a directed graph.


Page 24

Learning of Relational Ranking Model

• Substitute the ranking model into conventional optimization problems for model learning

• Conventional Ranking SVM is a special case of relational ranking


Page 25

Challenges

• The optimization problem for relational ranking is a constrained optimization problem, and the constraint itself is also an optimization problem.

• No existing method can be directly applied to the problem.

• We propose solving the problem in two steps.

– Solve the optimization problem inside the constraint, obtaining an explicit form for the constraint

– Solve the original optimization problem with explicit constraint


Page 26

Explicit Model for Pseudo Relevance Feedback

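The figure for this slide did not survive the transcript. As a hedged reconstruction from the WWW 2008 paper: for pseudo relevance feedback the relational scores z trade off agreement with the content scores Xw against smoothness over the similarity graph S, i.e. z* = argmin_z β‖z − Xw‖² + (1 − β) zᵀLz with Laplacian L = D − S, which has the explicit form z* = β(βI + (1 − β)L)⁻¹Xw. A numpy sketch of that closed form (all names and the β value are mine):

```python
import numpy as np

def prf_scores(X, w, S, beta=0.7):
    """Explicit relational model for pseudo relevance feedback
    (reconstruction): z = beta * (beta*I + (1-beta)*L)^{-1} (X w),
    where L = D - S is the Laplacian of the similarity graph S."""
    n = S.shape[0]
    L = np.diag(S.sum(axis=1)) - S
    return beta * np.linalg.solve(beta * np.eye(n) + (1 - beta) * L, X @ w)

# Toy example: d0 and d1 are similar, so their scores are pulled together;
# d2 is isolated and keeps its content-based score.
X = np.array([[1.0], [0.0], [0.5]])
w = np.array([1.0])
S = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
print(prf_scores(X, w, S))   # ≈ [0.77, 0.23, 0.5]
```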

Page 27

Explicit Model for Topic Distillation


Page 28

Relational Ranking SVM


[Figure: Relational Ranking SVM formulations for pseudo relevance feedback and for topic distillation, with the original Ranking SVM shown for comparison.]

Page 29

Experimental Results


[Figure: experimental results for pseudo relevance feedback on OHSUMED and for topic distillation on TD2004.]

Page 30

Summary

• The listwise approach is more effective than the pairwise approach

• Future Work

– Combination of listwise loss function and listwise ranking function

– Investigation of the generalization ability of the listwise approach


Page 31

References

• T. Qin, T.-Y. Liu, et al. Learning to Rank Relational Objects, WWW 2008.

• X. Geng, T.-Y. Liu, et al. Feature Selection for Ranking, SIGIR 2007.

• M. Tsai, T.-Y. Liu, et al. FRank: A Ranking Method with Fidelity Loss, SIGIR 2007.

• T. Qin, T.-Y. Liu, et al. Ranking with Multiple Hyperplanes, SIGIR 2007.

• J. Xu, H. Li, AdaRank: A Boosting Approach to Information Retrieval, SIGIR 2007.

• T.-Y. Liu, T. Qin, et al. LETOR: Benchmark dataset for research on learning to rank for information retrieval, LR4IR 2007, in conjunction with SIGIR 2007.

• T. Joachims, H. Li, T.-Y. Liu, C. Zhai, SIGIR Workshop Report: Learning to Rank for Information Retrieval (LR4IR 2007), SIGIR Forum, 2007.

• T.-Y. Liu, J. Xu, et al. Learning to Rank for Information Retrieval, TCACS, 2007.

• Z. Cao, T. Qin, T.-Y. Liu, et al. Learning to Rank: From Pairwise Approach to Listwise Approach, ICML 2007.

• Y. Liu, T.-Y. Liu, et al. Supervised Rank Aggregation, WWW 2007.

• T. Qin, T.-Y. Liu, et al. Query-level Loss Function for Information Retrieval. IP&M 2007.

• Y. Cao, Jun Xu, T.-Y. Liu, et al. Adapting Ranking SVM to Document Retrieval, SIGIR 2006.


Page 32

Thanks!

Benchmark Dataset for Learning to Rank: http://research.microsoft.com/users/LETOR/