learning to rank - university of texas at austinmooney/ir-course/slides/ravikumar/ir_ltr.pdf · is...
TRANSCRIPT
Learning to Rank
Pradeep Ravikumar [email protected] !CS 371R !Adapted from Hang Li, MSR Asia, 09!
Ranking
Coke or Pepsi or Sprite)?
Ranking
Ranking Chess Players
Ranking
Ranking Chess Players
Golf Players, etc.
Whither Machine Learning?
Machine Learning Magic Box
Coke, Pepsi, Sprite Coke > Sprite > Pepsi
List of Objects Ranking of Objects
Whither Machine Learning?
Machine Learning Magic Box
Coke, Pepsi, Sprite Coke > Sprite > Pepsi
List of Objects Ranking of Objects
From “training” data
RankingRanking(Problem
16
sns
s
s
o
o
o
,
2,
1,
�s
request
offerings
ranking(of(offerings
� �NoooO ,,, 21 ��
),( osf
Objects
Ranking of Objects
Query
Ranking via Machine Learning
Learning(to(Rank
1,1
2,1
1,1
1
no
o
os
��
�
mnm
m
m
m
o
o
os
,
2,
1,
�
Learning(System
Ranking(System
1�ms
),(
),( ),(
11 ,11,1
2,112,1
1,111,1
�� ���
���
���
mm nmmnm
mmm
mmm
osfo
osfo
osfo
�
),( osf
17
� �NoooO ,,, 21 ��
“Training” data
Is learning to rank necessary for IR?
• Has been successfully used in web-document search
• Has been the focus of increasing research at the intersection of machine learning + IR
Example: Machine TranslationRe5Ranking(in(Machine(Translation
138
Re5Ranking(Model
1000
2
1
e
ee
�ranked(sentencecandidates(in(target(language
GenerativeModel
f
sentence(source(language
e~re5ranked(topsentence(in(targetlanguage(
Example: Collaborative FilteringCollaborative(Filtering
130
Item1 I tem2 Item3 ...
User1 5 4
User2 1 2 2
... ? ? ?
User M 4 3
Example: Collaborative FilteringCollaborative(Filtering
130
Item1 I tem2 Item3 ...
User1 5 4
User2 1 2 2
... ? ? ?
User M 4 3
Input: User Ratings on some itemsOutput: User Ratings on other items
Example: Collaborative FilteringCollaborative(Filtering
130
Item1 I tem2 Item3 ...
User1 5 4
User2 1 2 2
... ? ? ?
User M 4 3
Any user is a query; based on this query, rank the set of items (based on user-item features)
Example: Key Phrase Extraction
• Input: Document
• Output: Key phrases in document
• Pre-processing step: extraction of possible phrases in document
• Ranking approach: rank the extracted phrases
Other IR applications of learning to rank
• Question Answering
• Document Summarization Opinion Mining
• Sentiment Analysis
• Machine Translation
• Sentence Parsing
• ….
Our focus: ranking documentsRanking(Problem:(Example(=(Document(Retrieval
qnq
q
q
d
d
d
,
2,
1,
�
qquery
documents
ranking(of(documents
� �NdddD ,,, 21 ��
),( dqf
20
ranking(based(on(relevance,(importance,(
preference
Traditional approach to “learning to rank”
Probabilistic(Model
Nd
dd
�
2
1
q
),|(~
),|(~),|(~
22
11
nn dqrPd
dqrPddqrPd
�
query
documents
ranking(of(documents
}0,1{),|(
�RdqrP
23
Modern approach to learning to rank
Learning(to(Rank
1,1
2,1
1,1
1
nd
d
dq
�
�
�
mnm
m
m
m
d
d
dq
,
2,
1,
�
Learning(System
Ranking(System
1�mq
),(
),( ),(
11 ,11,1
2,112,1
1,111,1
�� ���
���
���
mm nmmnm
mmm
mmm
dqfd
dqfd
dqfd
�
),( dqf
26
� �NdddD ,,, 21 ��
Learning to Rank: Training Process
1.(Data(Labeling(rank) 3.(Learning
)(xf
2.(Feature(Extraction
&&
%
&&
$
#
1,1
2,1
1,1
1
nd
d
d
q�
&&
%
&&
$
#
mnm
m
m
m
d
d
d
q
,
2,
1,
�
&&
%
&&
$
#
11 ,1,1
2,12,1
1,11,1
1
nn yd
yd
yd
q�
&&
%
&&
$
#
mm nmnm
mm
mm
m
yd
yd
yd
q
,,
2,2,
1,1,
�
&&
%
&&
$
#
11 ,1,1
2,12,1
1,11,1
nn yx
yx
yx
�
&&
%
&&
$
#
mm nmnm
mm
mm
yx
yx
yx
,,
2,2,
1,1,
�
Training(Process
27
�
�
�
�
�
�
Learning to Rank: Training Process
1.(Data(Labeling(rank) 3.(Learning
)(xf
2.(Feature(Extraction
&&
%
&&
$
#
1,1
2,1
1,1
1
nd
d
d
q�
&&
%
&&
$
#
mnm
m
m
m
d
d
d
q
,
2,
1,
�
&&
%
&&
$
#
11 ,1,1
2,12,1
1,11,1
1
nn yd
yd
yd
q�
&&
%
&&
$
#
mm nmnm
mm
mm
m
yd
yd
yd
q
,,
2,2,
1,1,
�
&&
%
&&
$
#
11 ,1,1
2,12,1
1,11,1
nn yx
yx
yx
�
&&
%
&&
$
#
mm nmnm
mm
mm
yx
yx
yx
,,
2,2,
1,1,
�
Training(Process
27
�
�
�
�
�
�
Learning to Rank: Training Process
1.(Data(Labeling(rank) 3.(Learning
)(xf
2.(Feature(Extraction
&&
%
&&
$
#
1,1
2,1
1,1
1
nd
d
d
q�
&&
%
&&
$
#
mnm
m
m
m
d
d
d
q
,
2,
1,
�
&&
%
&&
$
#
11 ,1,1
2,12,1
1,11,1
1
nn yd
yd
yd
q�
&&
%
&&
$
#
mm nmnm
mm
mm
m
yd
yd
yd
q
,,
2,2,
1,1,
�
&&
%
&&
$
#
11 ,1,1
2,12,1
1,11,1
nn yx
yx
yx
�
&&
%
&&
$
#
mm nmnm
mm
mm
yx
yx
yx
,,
2,2,
1,1,
�
Training(Process
27
�
�
�
�
�
�
features are functions of query AND document
Learning to Rank: Testing Process
1.(Data(Labeling(rank)
2.(Feature(Extraction
Testing(Process
28
3.((Ranking(with )(xf
&&
%
&&
$
#
��� ���
���
���
)(
)( )(
111 ,1,1,1
2,12,12,1
1,11,11,1
mmm nmnmnm
mmm
mmm
yxfx
yxfx
yxfx
�
4.((Evaluation
EvaluationResult
&&
%
&&
$
#
��
�
�
�
1,1
2,1
1,1
1
mnm
m
m
m
d
d
d
q�
&&
%
&&
$
#
�� ��
��
��
�
11 ,1,1
2,12,1
1,11,1
1
mm nmnm
mm
mm
m
yd
yd
yd
q�
&&
%
&&
$
#
�� ��
��
��
11 ,1,1
2,12,1
1,11,1
mm nmnm
mm
mm
yx
yx
yx
�
Learning to Rank: Testing Process
1.(Data(Labeling(rank)
2.(Feature(Extraction
Testing(Process
28
3.((Ranking(with )(xf
&&
%
&&
$
#
��� ���
���
���
)(
)( )(
111 ,1,1,1
2,12,12,1
1,11,11,1
mmm nmnmnm
mmm
mmm
yxfx
yxfx
yxfx
�
4.((Evaluation
EvaluationResult
&&
%
&&
$
#
��
�
�
�
1,1
2,1
1,1
1
mnm
m
m
m
d
d
d
q�
&&
%
&&
$
#
�� ��
��
��
�
11 ,1,1
2,12,1
1,11,1
1
mm nmnm
mm
mm
m
yd
yd
yd
q�
&&
%
&&
$
#
�� ��
��
��
11 ,1,1
2,12,1
1,11,1
mm nmnm
mm
mm
yx
yx
yx
�
Ground truth ranks
Learning to Rank: Testing Process
1.(Data(Labeling(rank)
2.(Feature(Extraction
Testing(Process
28
3.((Ranking(with )(xf
&&
%
&&
$
#
��� ���
���
���
)(
)( )(
111 ,1,1,1
2,12,12,1
1,11,11,1
mmm nmnmnm
mmm
mmm
yxfx
yxfx
yxfx
�
4.((Evaluation
EvaluationResult
&&
%
&&
$
#
��
�
�
�
1,1
2,1
1,1
1
mnm
m
m
m
d
d
d
q�
&&
%
&&
$
#
�� ��
��
��
�
11 ,1,1
2,12,1
1,11,1
1
mm nmnm
mm
mm
m
yd
yd
yd
q�
&&
%
&&
$
#
�� ��
��
��
11 ,1,1
2,12,1
1,11,1
mm nmnm
mm
mm
yx
yx
yx
�
Ground truth ranks
Learning to Rank: Testing Process
1.(Data(Labeling(rank)
2.(Feature(Extraction
Testing(Process
28
3.((Ranking(with )(xf
&&
%
&&
$
#
��� ���
���
���
)(
)( )(
111 ,1,1,1
2,12,12,1
1,11,11,1
mmm nmnmnm
mmm
mmm
yxfx
yxfx
yxfx
�
4.((Evaluation
EvaluationResult
&&
%
&&
$
#
��
�
�
�
1,1
2,1
1,1
1
mnm
m
m
m
d
d
d
q�
&&
%
&&
$
#
�� ��
��
��
�
11 ,1,1
2,12,1
1,11,1
1
mm nmnm
mm
mm
m
yd
yd
yd
q�
&&
%
&&
$
#
�� ��
��
��
11 ,1,1
2,12,1
1,11,1
mm nmnm
mm
mm
yx
yx
yx
�
Ground truth ranks
Issues in learning to rank
• Data Labeling
• Feature Extraction
• Evaluation Measure
• Learning Method (Model, Loss Function, Algorithm)
Data Labeling
Data(Labeling(Problem� E.g.,(relevance(of(documents(w.r.t.(query
32
Doc(A
Doc(B
Doc(C
Query
• Ground truth/supervision: “ranks” of set of documents/objects
Data Labeling: Supervision takes different forms
• Multiple levels (e.g., relevant, partially relevant, irrelevant)
‣ Widely used in IR
• Ordered Pairs
‣ e.g. ordered pairs between documents (e.g. A>B, B>C)
‣ e.g. implicit relevance judgments derived from click-through data
• Fully ordered list (i.e. a permutation) of documents is given
‣ Ideal but difficult to implement
Data Labeling: Implicit Relevance JudgementImplicit(Relevance(Judgment
Doc(A
Doc(B
Doc(CB(>(A
ranking(of(documents(at(search(system
users(often(clicked(on(Doc(B
ordered(pair
34
Feature ExtractionFeature(Extraction
35
Doc(A
Doc(B
Doc(C
Query
BM25
BM25
BM25
PageRank
PageRank
PageRank
.............
.............
.............
Query5document(feature
Document(feature
Feature(Vectors
Feature Extraction: Example Features
• BM25 (a classical ranking score function)
• Proximity/distance measures between query and document
• Whether query exactly occurs in document
• PageRank
• …
Evaluation Measures
• Allows us to quantify how good the predicted ranking is when compared with ground truth ranking
• Typical evaluation metrics care more about ranking top results correctly
• Popular Measures
‣ NDCG (Normalized Discounted Cumulative Gain)
‣ MAP (Mean Average Precision)
‣ MRR (Mean Reciprocal Rank)
‣ WTA (Winners Take All)
‣ Kendall’s Tau
Kendall’s Tau�������-!���#
� Comparing(agreement(between(two(permutations
� Number(of(pairs(� Number(of(concordant(pairs(and(disconcordantpairs(
� Completely(agree� Completely(disagree(
)1(21
-
��
nn
QP�
)1(21
�nn
QP,
41
1���
-1��
Kendall’s Tau�������-!���#�0���"-1
� ExampleA((B((CC((A((B
42
31
32-1
����
Learning to Rank Methods
• Three major approaches
‣ Pointwise approach
‣ Pairwise approach
‣ Listwise approach
Pointwise Approach
• Transforming ranking to regression, classification, or ordinal regression
• Query-document group structure is ignored
Pointwise ApproachPointwise(Approach
Learning
Regression Classification Ordinal2Regression
Input Feature(vector
Output Real(number Category Ordered category
Model Ranking(function
Loss2Function Regression(loss Classification loss Ordinal(regression(
loss
51
x
)(xfy � ))(sign( xfy � ))((thresold xfy �
)(xf
Pairwise Approach
• Transforming ranking to pairwise classification
• Query-document group structure is ignored
Pairwise ApproachPairwise(Approach
53
Learning Ranking
Input Ordered(feature(vector(pair Feature(vectors
Output Classification on(order(of(vector(pair Permutation on(vectors
Model Ranking function((
Loss2Function Pairwise classification(loss( Ranking evaluation(measure
)(xf
))(sign(, jiji xxfy ��
),( ji xx
))}({sort( 1niixf ���
niix 1}{ ��x
Pairwise(Approach
53
Learning Ranking
Input Ordered(feature(vector(pair Feature(vectors
Output Classification on(order(of(vector(pair Permutation on(vectors
Model Ranking function((
Loss2Function Pairwise classification(loss( Ranking evaluation(measure
)(xf
))(sign(, jiji xxfy ��
),( ji xx
))}({sort( 1niixf ���
niix 1}{ ��x
Pairwise RankingPairwise(Classification
� Converting(document(list(to(document(pairs
62
Doc(A
Doc(B
Doc(C
Query
Doc(A
Doc(B Doc(C
Query
Doc(ADoc(B
Doc(C
Rank(1
Rank(2
Rank(3
Transformed Pairwise Classification ProblemTransformed(Pairwise(Classification(Problem
66
);( wxf31 xx �
32 xx �
21 xx �
+151
12 xx �
23 xx �
13 xx �
Positive(Examples
Negative(Examples
Listwise Approach
• Entire list of documents as input-instance
• Query-document group structure is used
• Straightforwardly represents learning to rank problem
‣ Many state of the art methods: ListNet, ListMLE, etc.
Listwise ApproachListwise(Approach
55
Learning &2Ranking
Input Feature(vectors
Output Permutation(on(feature(vectors
Model Ranking(function
Loss2Function Listwise loss(function(ranking(evaluation(measure)
niix 1}{ ��x
)(xf
))}({sort( 1niixf ���