learning to rank - university of texas at austinmooney/ir-course/slides/ravikumar/ir_ltr.pdf · is...

Learning to Rank

Pradeep Ravikumar [email protected] !CS 371R !Adapted from Hang Li, MSR Asia, 09!

mailto:[email protected]

Ranking

Coke or Pepsi or Sprite)?

Ranking

Ranking Chess Players

Ranking

Ranking Chess Players

Golf Players, etc.

Whither Machine Learning?

Machine Learning Magic Box

Coke, Pepsi, Sprite Coke > Sprite > Pepsi

List of Objects Ranking of Objects

Whither Machine Learning?

Machine Learning Magic Box

Coke, Pepsi, Sprite Coke > Sprite > Pepsi

List of Objects Ranking of Objects

From “training” data

RankingRanking(Problem

16

sns

s

s

o

o

o

,

2,

1,

�s

request

offerings

ranking(of(offerings

� �NoooO ,,, 21 ��

),( osf

Objects

Ranking of Objects

Query

Ranking via Machine Learning

Learning(to(Rank

1,1

2,1

1,1

1

no

o

os

��

�

mnm

m

m

m

o

o

os

,

2,

1,

�

Learning(System

Ranking(System

1�ms

),(

),( ),(

11 ,11,1

2,112,1

1,111,1

��

��

��

mm nmmnm

mmm

mmm

osfo

osfo

osfo

�

),( osf

17

� �NoooO ,,, 21 ��

“Training” data

Is learning to rank necessary for IR?

• Has been successfully used in web-document search

• Has been the focus of increasing research at the intersection of machine learning + IR

Example: Machine TranslationRe5Ranking(in(Machine(Translation

138

Re5Ranking(Model

1000

2

1

e

ee

�ranked(sentencecandidates(in(target(language

GenerativeModel

f

sentence(source(language

e~re5ranked(topsentence(in(targetlanguage(

Example: Collaborative FilteringCollaborative(Filtering

130

Item1 I tem2 Item3 ...

User1 5 4

User2 1 2 2

... ? ? ?

User M 4 3


130


User1 5 4

User2 1 2 2

... ? ? ?

User M 4 3

Input: User Ratings on some itemsOutput: User Ratings on other items


130


User1 5 4

User2 1 2 2

... ? ? ?

User M 4 3

Any user is a query; based on this query, rank the set of items (based on user-item features)

Example: Key Phrase Extraction

• Input: Document

• Output: Key phrases in document

• Pre-processing step: extraction of possible phrases in document

• Ranking approach: rank the extracted phrases

Other IR applications of learning to rank

• Question Answering

• Document Summarization Opinion Mining

• Sentiment Analysis

• Machine Translation

• Sentence Parsing

• ….

Our focus: ranking documentsRanking(Problem:(Example(=(Document(Retrieval

qnq

q

q

d

d

d

,

2,

1,

�

qquery

documents

ranking(of(documents

� �NdddD ,,, 21 ��

),( dqf

20

ranking(based(on(relevance,(importance,(

preference

Traditional approach to “learning to rank”

Probabilistic(Model

Nd

dd

�

2

1

q

),|(~

),|(~),|(~

22

11

nn dqrPd

dqrPddqrPd

�

query

documents

ranking(of(documents

}0,1{),|(

�RdqrP

23

Modern approach to learning to rank

Learning(to(Rank

1,1

2,1

1,1

1

nd

d

dq

�

�

�

mnm

m

m

m

d

d

dq

,

2,

1,

�

Learning(System

Ranking(System

1�mq

),(

),( ),(

11 ,11,1

2,112,1

1,111,1

��

��

��

mm nmmnm

mmm

mmm

dqfd

dqfd

dqfd

�

),( dqf

26

� �NdddD ,,, 21 ��

Learning to Rank: Training Process

1.(Data(Labeling(rank) 3.(Learning

)(xf

2.(Feature(Extraction

&&

%

&&

$

#

1,1

2,1

1,1

1

nd

d

d

q�

&&

%

&&

$

#

mnm

m

m

m

d

d

d

q

,

2,

1,

�

&&

%

&&

$

#

11 ,1,1

2,12,1

1,11,1

1

nn yd

yd

yd

q�

&&

%

&&

$

#

mm nmnm

mm

mm

m

yd

yd

yd

q

,,

2,2,

1,1,

�

&&

%

&&

$

#

11 ,1,1

2,12,1

1,11,1

nn yx

yx

yx

�

&&

%

&&

$

#

mm nmnm

mm

mm

yx

yx

yx

,,

2,2,

1,1,

�

Training(Process

27

�

�

�

�

�

�

Learning to Rank: Training Process

1.(Data(Labeling(rank) 3.(Learning

)(xf


&&

%

&&

$

#

1,1

2,1

1,1

1

nd

d

d

q�

&&

%

&&

$

#

mnm

m

m

m

d

d

d

q

,

2,

1,

�

&&

%

&&

$

#

11 ,1,1

2,12,1

1,11,1

1

nn yd

yd

yd

q�

&&

%

&&

$

#

mm nmnm

mm

mm

m

yd

yd

yd

q

,,

2,2,

1,1,

�

&&

%

&&

$

#

11 ,1,1

2,12,1

1,11,1

nn yx

yx

yx

�

&&

%

&&

$

#

mm nmnm

mm

mm

yx

yx

yx

,,

2,2,

1,1,

�

Training(Process

27

�

�

�

�

�

�

features are functions of query AND document

Learning to Rank: Testing Process

1.(Data(Labeling(rank)


Testing(Process

28

3.((Ranking(with )(xf

&&

%

&&

$

#

��

��

��

)(

)( )(

111 ,1,1,1

2,12,12,1

1,11,11,1

mmm nmnmnm

mmm

mmm

yxfx

yxfx

yxfx

�

4.((Evaluation

EvaluationResult

&&

%

&&

$

#

��

�

�

�

1,1

2,1

1,1

1

mnm

m

m

m

d

d

d

q�

&&

%

&&

$

#

��

��

��

�

11 ,1,1

2,12,1

1,11,1

1

mm nmnm

mm

mm

m

yd

yd

yd

q�

&&

%

&&

$

#

��

��

��

11 ,1,1

2,12,1

1,11,1

mm nmnm

mm

mm

yx

yx

yx

�

Learning to Rank: Testing Process

1.(Data(Labeling(rank)


Testing(Process

28

3.((Ranking(with )(xf

&&

%

&&

$

#

��

��

��

)(

)( )(

111 ,1,1,1

2,12,12,1

1,11,11,1

mmm nmnmnm

mmm

mmm

yxfx

yxfx

yxfx

�

4.((Evaluation

EvaluationResult

&&

%

&&

$

#

��

�

�

�

1,1

2,1

1,1

1

mnm

m

m

m

d

d

d

q�

&&

%

&&

$

#

��

��

��

�

11 ,1,1

2,12,1

1,11,1

1

mm nmnm

mm

mm

m

yd

yd

yd

q�

&&

%

&&

$

#

��

��

��

11 ,1,1

2,12,1

1,11,1

mm nmnm

mm

mm

yx

yx

yx

�

Ground truth ranks

Issues in learning to rank

• Data Labeling

• Feature Extraction

• Evaluation Measure

• Learning Method (Model, Loss Function, Algorithm)

Data Labeling

Data(Labeling(Problem� E.g.,(relevance(of(documents(w.r.t.(query

32

Doc(A

Doc(B

Doc(C

Query

• Ground truth/supervision: “ranks” of set of documents/objects

Data Labeling: Supervision takes different forms

• Multiple levels (e.g., relevant, partially relevant, irrelevant)

‣ Widely used in IR

• Ordered Pairs

‣ e.g. ordered pairs between documents (e.g. A>B, B>C)

‣ e.g. implicit relevance judgments derived from click-through data

• Fully ordered list (i.e. a permutation) of documents is given

‣ Ideal but difficult to implement

Data Labeling: Implicit Relevance JudgementImplicit(Relevance(Judgment

Doc(A

Doc(B

Doc(CB(>(A

ranking(of(documents(at(search(system

users(often(clicked(on(Doc(B

ordered(pair

34

Feature ExtractionFeature(Extraction

35

Doc(A

Doc(B

Doc(C

Query

BM25

BM25

BM25

PageRank

PageRank

PageRank

.............

.............

.............

Query5document(feature

Document(feature

Feature(Vectors

Feature Extraction: Example Features

• BM25 (a classical ranking score function)

• Proximity/distance measures between query and document

• Whether query exactly occurs in document

• PageRank

• …

Evaluation Measures

• Allows us to quantify how good the predicted ranking is when compared with ground truth ranking

• Typical evaluation metrics care more about ranking top results correctly

• Popular Measures

‣ NDCG (Normalized Discounted Cumulative Gain)

‣ MAP (Mean Average Precision)

‣ MRR (Mean Reciprocal Rank)

‣ WTA (Winners Take All)

‣ Kendall’s Tau

Kendall’s Tau��-!��#

� Comparing(agreement(between(two(permutations

� Number(of(pairs(� Number(of(concordant(pairs(and(disconcordantpairs(

� Completely(agree� Completely(disagree(

)1(21

-

��

nn

QP�

)1(21

�nn

QP,

41

1��

-1��

Kendall’s Tau��-!��#�0��"-1

� ExampleA((B((CC((A((B

42

31

32-1

��

Learning to Rank Methods

• Three major approaches

‣ Pointwise approach

‣ Pairwise approach

‣ Listwise approach

Pointwise Approach

• Transforming ranking to regression, classification, or ordinal regression

• Query-document group structure is ignored

Pointwise ApproachPointwise(Approach

Learning

Regression Classification Ordinal2Regression

Input Feature(vector

Output Real(number Category Ordered category

Model Ranking(function

Loss2Function Regression(loss Classification loss Ordinal(regression(

loss

51

x

)(xfy � ))(sign( xfy � ))((thresold xfy �

)(xf

Pairwise Approach

• Transforming ranking to pairwise classification

• Query-document group structure is ignored

Pairwise ApproachPairwise(Approach

53

Learning Ranking

Input Ordered(feature(vector(pair Feature(vectors

Output Classification on(order(of(vector(pair Permutation on(vectors

Model Ranking function((

Loss2Function Pairwise classification(loss( Ranking evaluation(measure

)(xf

))(sign(, jiji xxfy ��

),( ji xx

))}({sort( 1niixf ��

niix 1}{ ��x

Pairwise(Approach

53

Learning Ranking

Input Ordered(feature(vector(pair Feature(vectors

Output Classification on(order(of(vector(pair Permutation on(vectors

Model Ranking function((

Loss2Function Pairwise classification(loss( Ranking evaluation(measure

)(xf

))(sign(, jiji xxfy ��

),( ji xx


niix 1}{ ��x

Pairwise RankingPairwise(Classification

� Converting(document(list(to(document(pairs

62

Doc(A

Doc(B

Doc(C

Query

Doc(A

Doc(B Doc(C

Query

Doc(ADoc(B

Doc(C

Rank(1

Rank(2

Rank(3

Transformed Pairwise Classification ProblemTransformed(Pairwise(Classification(Problem

66

);( wxf31 xx �

32 xx �

21 xx �

+151

12 xx �

23 xx �

13 xx �

Positive(Examples

Negative(Examples

Listwise Approach

• Entire list of documents as input-instance

• Query-document group structure is used

• Straightforwardly represents learning to rank problem

‣ Many state of the art methods: ListNet, ListMLE, etc.

Listwise ApproachListwise(Approach

55

Learning &2Ranking

Input Feature(vectors

Output Permutation(on(feature(vectors

Model Ranking(function

Loss2Function Listwise loss(function(ranking(evaluation(measure)

niix 1}{ ��x

)(xf


learning to rank - university of texas at austinmooney/ir-course/slides/ravikumar/ir_ltr.pdf · is...

Documents