![Page 1: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/1.jpg)
Learning to Rank (part 1)
NESCAI 2008 Tutorial
Yisong Yue
Cornell University
![Page 2: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/2.jpg)
Booming Search Industry
![Page 3: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/3.jpg)
Goals for this Tutorial
Basics of information retrieval
What machine learning contributes
New challenges to address
New insights on developing ML algorithms
![Page 4: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/4.jpg)
(Soft) Prerequisites
Basic knowledge of ML algorithms
  • Support Vector Machines
  • Neural Nets
  • Decision Trees
  • Boosting
  • Etc…
Will introduce IR concepts as needed
![Page 5: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/5.jpg)
Outline (Part 1)
Conventional IR Methods (no learning)
  • 1970s to 1990s
Ordinal Regression
  • 1994 onwards
Optimizing Rank-Based Measures
  • 2005 to present
![Page 6: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/6.jpg)
Outline (Part 2)
Effectively collecting training data
  • E.g., interpreting clickthrough data
Beyond independent relevance
  • E.g., diversity
Summary & Discussion
![Page 7: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/7.jpg)
Disclaimer
This talk is very ML-centric
  • Use IR methods to generate features
  • Learn good ranking functions on feature space
  • Focus on optimizing cleanly formulated objectives
  • Outperform traditional IR methods
Information Retrieval
  • Broader than the scope of this talk
  • Deals with more sophisticated modeling questions
  • Will see more interplay between IR and ML in Part 2
![Page 9: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/9.jpg)
Brief Overview of IR
Predated the internet
  • "As We May Think" by Vannevar Bush (1945)
Active research topic by the 1960s
  • Vector Space Model (1970s)
  • Probabilistic Models (1980s)
Introduction to Information Retrieval (2008)
  • C. Manning, P. Raghavan & H. Schütze
![Page 10: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/10.jpg)
Basic Approach to IR
Given query q and set of docs d1, … dn
Find documents relevant to q
  • Typically expressed as a ranking on d1, …, dn
Similarity measure sim(a,b) → ℝ
  • Sort by sim(q,di)
  • Optimal if the relevance of each document is independent of the others. [Robertson, 1977]
![Page 12: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/12.jpg)
Vector Space Model
Represent documents as vectors
  • One dimension for each word
  • Queries as short documents
Similarity Measures
  • Cosine similarity = normalized dot product

$$\cos(A, B) = \frac{A \cdot B}{\|A\|\,\|B\|}$$
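As a rough illustration of the vector space model, the sketch below scores documents against a query by cosine similarity over raw term counts. The whitespace tokenizer and the helper names (`cosine_similarity`, `rank_by_cosine`) are assumptions made for this example, not part of the tutorial.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors (dict-like)."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for empty vectors (an assumption)
    return dot / (norm_a * norm_b)

def rank_by_cosine(query, docs):
    """Return document indices sorted by decreasing similarity to the query."""
    # Treat the query as a short document, one dimension per word.
    q_vec = Counter(query.lower().split())
    scored = [(cosine_similarity(q_vec, Counter(d.lower().split())), i)
              for i, d in enumerate(docs)]
    return [i for _, i in sorted(scored, key=lambda pair: -pair[0])]

docs = ["learning to rank", "cooking recipes", "rank aggregation"]
ranking = rank_by_cosine("learning to rank", docs)
```

Real systems would typically weight the counts (for example with TF-IDF) before taking the cosine.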
![Page 13: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/13.jpg)
Cosine Similarity Example
![Page 14: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/14.jpg)
Other Methods
TF-IDF [Salton & Buckley, 1988]
Okapi BM25 [Robertson et al., 1995]
Language Models [Ponte & Croft, 1998] [Zhai & Lafferty, 2001]
![Page 15: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/15.jpg)
Machine Learning
IR uses fixed models to define similarity scores
Many opportunities to learn models
  • Appropriate training data
  • Appropriate learning formulation
Will mostly use SVM formulations as examples
  • General insights are applicable to other techniques.
![Page 16: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/16.jpg)
Training Data
Supervised learning problem
Document/query pairs
  • Embedded in high dimensional feature space
Labeled by relevance of doc to query
  • Traditionally 0/1
  • Recently ordinal classes of relevance (0,1,2,3,…)
![Page 17: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/17.jpg)
Feature Space
Use to learn a similarity/compatibility function
Based on existing IR methods
  • Can use raw values
  • Or transformations of raw values
Based on raw words
  • Capture co-occurrence of words
![Page 18: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/18.jpg)
Training Instances
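$$
x_{q,d} = \begin{bmatrix}
\mathrm{TF}(q,d) > 0.1 \\
\mathrm{TF}(q,d) > 0.05 \\
\mathrm{rank}_{IR}(q,d) \text{ in top } 5 \\
\mathrm{rank}_{IR}(q,d) \text{ in top } 10 \\
\mathrm{sim}_{IR}(q,d) \\
\vdots \\
w_{i_j} \in q \,\wedge\, w_{i_j} \in d \\
w_{i_k} \in q \,\wedge\, w_{i_k} \in d
\end{bmatrix}
$$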
![Page 19: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/19.jpg)
Learning Problem
Given training instances:
  • $(x_{q,d}, y_{q,d})$ for q = 1..N, d = 1..N_q
Learn a ranking function
  • $f(x_{q,1}, \ldots, x_{q,N_q})$ → Ranking
Typically decomposed into per doc scores
  • f(x) → ℝ (doc/query compatibility)
  • Sort by scores for all instances of a given q
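The decomposition into per-document scores can be sketched as follows. A linear scoring function f(x) = w·x is assumed purely for illustration; the names `rank_documents`, `w`, and the feature values are hypothetical.

```python
def rank_documents(w, features_by_doc):
    """Score each document of one query with f(x) = w . x, then sort by score."""
    scores = {
        doc: sum(wi * xi for wi, xi in zip(w, x))
        for doc, x in features_by_doc.items()
    }
    # Highest score first: this ordering is the predicted ranking for the query.
    return sorted(scores, key=scores.get, reverse=True)

w = [1.0, 0.5]
features = {"d1": [0.2, 0.0], "d2": [0.9, 0.1], "d3": [0.1, 1.0]}
ranking = rank_documents(w, features)
```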
![Page 20: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/20.jpg)
How to Train?
Classification & Regression
  • Learn f(x) → ℝ in conventional ways
  • Sort by f(x) for all docs for a query
  • Typically does not work well
2 Major Problems
  • Labels have ordering
      • Additional structure compared to multiclass problems
  • Severe class imbalance
      • Most documents are not relevant
![Page 21: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/21.jpg)
Conventional multiclass learning does not incorporate ordinal structure of class labels
Not Relevant
Somewhat Relevant
Very Relevant
![Page 23: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/23.jpg)
Ordinal Regression
Assume class labels are ordered
  • True since class labels indicate level of relevance
Learn hypothesis function f(x) → ℝ
  • Such that the ordering of f(x) agrees with label ordering
  • Ex: given instances (x, 1), (y, 1), (z, 2)
      • f(x) < f(z)
      • f(y) < f(z)
      • Don’t care about f(x) vs f(y)
![Page 24: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/24.jpg)
Ordinal Regression
Compare with classification
  • Similar to multiclass prediction
  • But classes have ordinal structure
Compare with regression
  • Doesn’t necessarily care about value of f(x)
  • Only care that ordering is preserved
![Page 25: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/25.jpg)
Ordinal Regression Approaches
Learn multiple thresholds
Learn multiple classifiers
Optimize pairwise preferences
![Page 26: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/26.jpg)
Option 1: Multiple Thresholds
Maintain T thresholds (b1, … bT)
b1 < b2 < … < bT
Learn model parameters + (b1, …, bT)
Goal
  • Model predicts a score on input example
  • Minimize threshold violation of predictions
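A minimal sketch of how a thresholded model might turn a score into an ordinal label, assuming labels 0..T and sorted thresholds b1 < … < bT. The violation measure shown here (distance to the wrong side of each threshold, with no margin) is a simplification chosen for this example, not the exact loss of any cited method.

```python
from bisect import bisect_right

def predict_label(score, thresholds):
    """Label = number of thresholds at or below the score (so labels run 0..T)."""
    return bisect_right(thresholds, score)

def threshold_violation(score, label, thresholds):
    """Sum of distances by which the score lies on the wrong side of a threshold."""
    total = 0.0
    for k, b in enumerate(thresholds):
        if label > k:
            total += max(0.0, b - score)   # score should be at or above b
        else:
            total += max(0.0, score - b)   # score should be below b
    return total

thresholds = [-1.0, 0.5, 2.0]  # hypothetical learned values b1, b2, b3
```

For instance, a score of 0.0 falls between b1 and b2 and is assigned label 1; if its true label were 3, it would violate both b2 and b3.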
![Page 27: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/27.jpg)
Ordinal SVM Example
[Chu & Keerthi, 2005]
![Page 28: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/28.jpg)
Ordinal SVM Formulation
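$$
\operatorname*{arg\,min}_{w,\,b,\,\xi,\,\xi^*} \; \frac{1}{2}\|w\|^2 + \frac{C}{N}\sum_{j=0}^{T}\sum_{i}\left(\xi_{i,j} + \xi^*_{i,j+1}\right)
$$

Such that for j = 0..T :

$$
\begin{aligned}
w^T x_i - b_j &\le -1 + \xi_{i,j}, && \forall i : y_i = j \\
w^T x_i - b_j &\ge 1 - \xi^*_{i,j+1}, && \forall i : y_i = j+1 \\
\xi_{i,j},\ \xi^*_{i,j+1} &\ge 0, && \forall i
\end{aligned}
$$

And also: $b_1 \le b_2 \le \dots \le b_T$

[Chu & Keerthi, 2005]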
![Page 29: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/29.jpg)
Learning Multiple Thresholds
Gaussian Processes
  • [Chu & Ghahramani, 2005]
Decision Trees
  • [Kramer et al., 2001]
Neural Nets
  • RankProp [Caruana et al., 1996]
SVMs & Perceptrons
  • PRank [Crammer & Singer, 2001]
  • [Chu & Keerthi, 2005]
![Page 30: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/30.jpg)
Option 2: Voting Classifiers
Use T different training sets
  • Classifier 1 predicts 0 vs 1,2,…T
  • Classifier 2 predicts 0,1 vs 2,3,…T
  • …
  • Classifier T predicts 0,1,…,T-1 vs T
Final prediction is combination
  • E.g., sum of predictions
Recent work
  • McRank [Li et al., 2007]
  • [Qin et al., 2007]
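The T training sets and the "sum of predictions" combination can be sketched as below. The function names are hypothetical, and the toy threshold classifiers at the end merely stand in for whatever binary learners are actually trained.

```python
def binary_targets(labels, T):
    """Build T binary label sets: set k (1-based) marks whether label >= k."""
    return [[1 if y >= k else 0 for y in labels] for k in range(1, T + 1)]

def predict_ordinal(classifiers, x):
    """Combine the T binary votes by summing them into a label in 0..T."""
    return sum(clf(x) for clf in classifiers)

labels = [0, 2, 1, 3]
targets = binary_targets(labels, T=3)

# Summing the binary targets column-wise recovers the original labels.
recovered = [sum(col) for col in zip(*targets)]

# Toy stand-ins for trained classifiers: each thresholds a 1-D input.
classifiers = [lambda s, b=b: 1 if s >= b else 0 for b in (1, 2, 3)]
```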
![Page 31: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/31.jpg)
• Severe class imbalance
• Near-perfect performance by always predicting 0
![Page 32: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/32.jpg)
Option 3: Pairwise Preferences
Most popular approach for IR applications
Learn model to minimize pairwise disagreements
%(Pairwise Agreements) = ROC-Area
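A small sketch of the pairwise agreement rate. For binary labels and no tied scores this fraction coincides with the ROC area; counting tied scores as disagreements is a choice made for this example.

```python
def pairwise_agreement(scores, labels):
    """Fraction of label-ordered pairs (y_i > y_j) that the scores order correctly."""
    agree = total = 0
    n = len(scores)
    for i in range(n):
        for j in range(n):
            if labels[i] > labels[j]:
                total += 1
                if scores[i] > scores[j]:  # ties are counted as disagreements
                    agree += 1
    return agree / total if total else 1.0

scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
rate = pairwise_agreement(scores, labels)
```

Here one of the four (relevant, non-relevant) pairs is misordered, so the agreement rate is 3/4.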
![Page 33: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/33.jpg)
• 2 pairwise disagreements
![Page 34: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/34.jpg)
Optimizing Pairwise Preferences
Consider instances (x1,y1) and (x2,y2)
Label order has y1 > y2
Create new training instance (x’, +1) where x’ = (x1 – x2)
Repeat for all instance pairs with label order preference
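The construction of the pairwise training set can be sketched as follows, with feature vectors as plain lists. In practice the pairs are often represented implicitly rather than materialized; the explicit list here is only for illustration, and `pairwise_differences` is a hypothetical name.

```python
def pairwise_differences(X, y):
    """For every pair with y[i] > y[j], emit the instance (X[i] - X[j], +1)."""
    pairs = []
    for i in range(len(X)):
        for j in range(len(X)):
            if y[i] > y[j]:
                diff = [a - b for a, b in zip(X[i], X[j])]
                pairs.append((diff, +1))
    return pairs

X = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
y = [2, 1, 1]
pairs = pairwise_differences(X, y)
```

Instances with equal labels produce no pair, so only the two pairs involving the first instance appear.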
![Page 36: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/36.jpg)
Optimizing Pairwise Preferences
Result: new training set!
  • Often represented implicitly
Has only positive examples
Mispredicting means that a lower-ordered instance received a higher score than a higher-ordered instance.
![Page 37: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/37.jpg)
Pairwise SVM Formulation
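$$
\operatorname*{arg\,min}_{w,\,\xi} \; \frac{1}{2}\|w\|^2 + \frac{C}{N}\sum_{i,j}\xi_{i,j}
$$

Such that:

$$
\begin{aligned}
w^T x_i &\ge w^T x_j + 1 - \xi_{i,j}, && \forall i,j : y_i > y_j \\
\xi_{i,j} &\ge 0, && \forall i,j
\end{aligned}
$$

[Herbrich et al., 1999]
Can be reduced to $O(n \log(n))$ time [Joachims, 2005].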
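To make the role of the slack variables concrete, the sketch below evaluates the pairwise SVM objective for a fixed w, using the fact that at the optimum each slack equals the hinge value max(0, 1 − (wᵀxᵢ − wᵀxⱼ)). Taking N to be the number of preference pairs is an assumption of this example, and this naive version enumerates all pairs rather than using the faster method cited above.

```python
def pairwise_svm_objective(w, X, y, C=1.0):
    """Evaluate 0.5*||w||^2 + (C/N) * sum of pairwise hinge slacks."""
    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    # One slack per pair (i, j) whose labels say i should outrank j.
    slacks = [
        max(0.0, 1.0 - (dot(w, X[i]) - dot(w, X[j])))
        for i in range(len(X))
        for j in range(len(X))
        if y[i] > y[j]
    ]
    reg = 0.5 * dot(w, w)
    n_pairs = len(slacks)  # N is assumed here to be the number of pairs
    return reg + (C / n_pairs) * sum(slacks) if n_pairs else reg

w = [1.0, 0.0]
X = [[2.0, 0.0], [0.0, 0.0], [1.5, 0.0]]
y = [1, 0, 0]
value = pairwise_svm_objective(w, X, y)
```

In this toy case the first pair has margin 2 (no slack) and the second has margin 0.5 (slack 0.5).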
![Page 38: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/38.jpg)
Optimizing Pairwise Preferences
Neural Nets
  • RankNet [Burges et al., 2005]
Boosting & Hedge-Style Methods
  • [Cohen et al., 1998]
  • RankBoost [Freund et al., 2003]
  • [Long & Servedio, 2007]
SVMs
  • [Herbrich et al., 1999]
  • SVM-perf [Joachims, 2005]
  • [Cao et al., 2006]
![Page 39: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/39.jpg)
Rank-Based Measures
Pairwise Preferences not quite right
  • Assigns equal penalty for errors no matter where in the ranking
People (mostly) care about top of ranking
  • IR community uses rank-based measures which capture this property.
![Page 40: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/40.jpg)
Rank-Based Measures
Binary relevance
  • Precision@K (P@K)
  • Mean Average Precision (MAP)
  • Mean Reciprocal Rank (MRR)
Multiple levels of relevance
  • Normalized Discounted Cumulative Gain (NDCG)
![Page 41: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/41.jpg)
Precision@K
Set a rank threshold K
Compute % relevant in top K
Ignores documents ranked lower than K
Ex:
  • Prec@3 of 2/3
  • Prec@4 of 2/4
  • Prec@5 of 3/5
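A minimal sketch of Precision@K over a list of 0/1 relevance judgments in rank order. The example ranking (relevant documents at ranks 1, 3 and 5) is an assumption chosen to be consistent with the values quoted above.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(relevance[:k]) / k

# Relevance of documents in rank order (1 = relevant); hypothetical example.
relevance = [1, 0, 1, 0, 1]
p3 = precision_at_k(relevance, 3)
p4 = precision_at_k(relevance, 4)
p5 = precision_at_k(relevance, 5)
```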
![Page 42: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/42.jpg)
Mean Average Precision
Consider rank position of each relevant doc
  • K1, K2, … KR
Compute Precision@K for each K1, K2, … KR
Average precision = average of P@K
Ex: a ranking with relevant docs at ranks 1, 3 and 5 has AvgPrec of

$$\frac{1}{3}\left(\frac{1}{1} + \frac{2}{3} + \frac{3}{5}\right) \approx 0.76$$

MAP is Average Precision across multiple queries/rankings
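The computation can be sketched as below, assuming every relevant document appears somewhere in the ranked list (so the number of relevant documents R can be read off the list itself). Function names are hypothetical.

```python
def average_precision(relevance):
    """Average of Precision@K taken at the rank K of each relevant document."""
    hits = 0
    precisions = []
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(rankings):
    """MAP: mean of average precision over several queries' rankings."""
    return sum(average_precision(r) for r in rankings) / len(rankings)

ap = average_precision([1, 0, 1, 0, 1])   # relevant docs at ranks 1, 3, 5
map_score = mean_average_precision([[1, 0, 1, 0, 1], [1, 1, 0]])
```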
![Page 43: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/43.jpg)
Mean Reciprocal Rank
Consider rank position, K, of first relevant doc
Reciprocal Rank score = $\frac{1}{K}$
MRR is the mean RR across multiple queries
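A short sketch of reciprocal rank and MRR over 0/1 relevance lists in rank order. Returning 0 when a query has no relevant document is a common convention, assumed here.

```python
def reciprocal_rank(relevance):
    """1/K where K is the rank of the first relevant document (0.0 if none)."""
    for k, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / k
    return 0.0

def mean_reciprocal_rank(rankings):
    """Mean of the reciprocal rank over several queries."""
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)

mrr = mean_reciprocal_rank([[0, 0, 1, 0], [1, 0, 0], [0, 1]])
```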
![Page 44: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/44.jpg)
NDCG
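Normalized Discounted Cumulative Gain
  • Multiple Levels of Relevance
DCG:
  • contribution of ith rank position:

$$\frac{2^{y_i} - 1}{\log(i+1)}$$

  • Ex: a ranking with labels y = (1, 2, 1, 0, 1) has DCG score of

$$\frac{1}{\log(2)} + \frac{3}{\log(3)} + \frac{1}{\log(4)} + \frac{0}{\log(5)} + \frac{1}{\log(6)} \approx 5.45$$

NDCG is normalized DCG
  • Best possible ranking has score NDCG = 1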
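The DCG and NDCG definitions can be sketched as follows. The natural logarithm is used because it reproduces the 5.45 figure in the example; the label list (1, 2, 1, 0, 1) is inferred from the gains 1, 3, 1, 0, 1 in that example.

```python
import math

def dcg(labels_in_rank_order):
    """Sum of (2^y_i - 1) / log(i + 1) over rank positions i = 1, 2, ..."""
    return sum(
        (2 ** y - 1) / math.log(i + 1)
        for i, y in enumerate(labels_in_rank_order, start=1)
    )

def ndcg(labels_in_rank_order):
    """DCG divided by the DCG of the best possible ordering of the same labels."""
    ideal = dcg(sorted(labels_in_rank_order, reverse=True))
    return dcg(labels_in_rank_order) / ideal if ideal > 0 else 0.0

example_dcg = dcg([1, 2, 1, 0, 1])
example_ndcg = ndcg([1, 2, 1, 0, 1])
```

Other references use a base-2 logarithm, which rescales DCG by a constant but leaves NDCG unchanged.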
![Page 45: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/45.jpg)
Optimizing Rank-Based Measures
Let’s directly optimize these measures
  • As opposed to some proxy (pairwise prefs)
But…
  • Objective function no longer decomposes
      • Pairwise prefs decomposed into each pair
  • Objective function flat or discontinuous
![Page 46: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/46.jpg)
Discontinuity Example
NDCG = 0.63
|  | D1 | D2 | D3 |
|---|---|---|---|
| Retrieval Score | 0.9 | 0.6 | 0.3 |
| Rank | 1 | 2 | 3 |
| Relevance | 0 | 1 | 0 |
![Page 49: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/49.jpg)
Discontinuity Example
NDCG computed using rank positions
Ranking via retrieval scores
Slight changes to model parameters
  • Slight changes to retrieval scores
  • No change to ranking
  • No change to NDCG

|  | D1 | D2 | D3 |
|---|---|---|---|
| Retrieval Score | 0.9 | 0.6 | 0.3 |
| Rank | 1 | 2 | 3 |
NDCG discontinuous w.r.t model parameters!
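The effect can be reproduced numerically. The sketch below uses the three-document example with relevance labels (0, 1, 0) and the DCG definition from earlier in the talk (natural log); the perturbed score values are arbitrary choices for illustration.

```python
import math

def ndcg(scores, labels):
    """NDCG of the ranking induced by sorting documents by decreasing score."""
    order = sorted(range(len(scores)), key=lambda d: -scores[d])
    dcg = sum((2 ** labels[d] - 1) / math.log(r + 2) for r, d in enumerate(order))
    ideal = sorted(labels, reverse=True)
    idcg = sum((2 ** y - 1) / math.log(r + 2) for r, y in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

labels = [0, 1, 0]
base = ndcg([0.9, 0.6, 0.3], labels)      # original retrieval scores
nudged = ndcg([0.9, 0.89, 0.3], labels)   # D2 moves a lot, ranking unchanged
swapped = ndcg([0.9, 0.91, 0.3], labels)  # D2 just overtakes D1
```

NDCG stays flat while D2's score rises from 0.6 to 0.89, then jumps as soon as the score crosses 0.9 and the ranking changes.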
![Page 50: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/50.jpg)
[Yue & Burges, 2007]
![Page 51: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/51.jpg)
Optimizing Rank-Based Measures
Relaxed Upper Bound
  • Structural SVMs for hinge loss relaxation
      • SVM-map [Yue et al., 2007]
      • [Chapelle et al., 2007]
  • Boosting for exponential loss relaxation
      • [Zheng et al., 2007]
      • AdaRank [Xu et al., 2007]
Smooth Approximations for Gradient Descent
  • LambdaRank [Burges et al., 2006]
  • SoftRank GP [Snelson & Guiver, 2007]
![Page 52: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/52.jpg)
Structural SVMs
Let x denote the set of documents/query examples for a query
Let y denote a (weak) ranking
Same objective function:

$$\frac{1}{2}\|w\|^2 + \frac{C}{N}\sum_i \xi_i$$

Constraints are defined for each incorrect labeling y’ over the set of documents x (shown for a single query, with slack ξ):

$$\forall y' \ne y : \; w^T \Psi(y, x) \ge w^T \Psi(y', x) + \Delta(y') - \xi$$

After learning w, a prediction is made by sorting on $w^T x_i$
[Tsochantaridis et al., 2007]
![Page 53: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/53.jpg)
Structural SVMs for MAP
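Minimize

$$\frac{1}{2}\|w\|^2 + \frac{C}{N}\sum_i \xi_i$$

subject to

$$\forall y' \ne y : \; w^T \Psi(y, x) \ge w^T \Psi(y', x) + \Delta(y') - \xi$$

where ( $y_{ij} \in \{-1, +1\}$ )

$$\Psi(y, x) = \sum_{i:\,\mathrm{rel}} \; \sum_{j:\,!\mathrm{rel}} y_{ij}\,(x_i - x_j)$$

and

$$\Delta(y') = 1 - \mathrm{Avgprec}(y')$$

Sum of slacks upper bounds MAP loss.
[Yue et al., 2007]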
![Page 54: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/54.jpg)
Too Many Constraints!
For Average Precision, the true labeling is a ranking where the relevant documents are all ranked in the front, e.g.,
An incorrect labeling would be any other ranking, e.g.,
This ranking has Average Precision of about 0.8 with Δ(y,y’) ≈ 0.2
Intractable number of rankings, thus an intractable number of constraints!
![Page 55: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/55.jpg)
Structural SVM Training
STEP 1: Solve the SVM objective function using only the current working set of constraints.
STEP 2: Using the model learned in STEP 1, find the most violated constraint from the exponential set of constraints.
STEP 3: If the constraint returned in STEP 2 is more violated than the most violated constraint in the working set by some small constant, add that constraint to the working set.
Repeat STEPs 1-3 until no additional constraints are added. Return the most recent model that was trained in STEP 1.
STEPs 1-3 are guaranteed to loop for at most a polynomial number of iterations. [Tsochantaridis et al., 2005]
![Page 56: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/56.jpg)
Illustrative Example
Original SVM Problem
  • Exponential constraints
  • Most are dominated by a small set of “important” constraints
Structural SVM Approach
  • Repeatedly finds the next most violated constraint…
  • …until set of constraints is a good approximation.
![Page 60: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/60.jpg)
Finding Most Violated Constraint
Required for structural SVM training
  • Depends on structure of loss function
  • Depends on structure of joint discriminant
  • Efficient algorithms exist despite intractable number of constraints.
More than one approach
  • [Yue et al., 2007]
  • [Chapelle et al., 2007]
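To make the search problem concrete, the sketch below finds the most violated constraint by brute force for a tiny document set, maximizing Δ(y′) + wᵀΨ(y′, x) over all rankings y′ with Δ = 1 − average precision and Ψ summing y_ij (x_i − x_j) over relevant/non-relevant pairs. Exhaustive enumeration is exactly what the efficient algorithms cited above avoid; it is used here only because it is easy to verify. All function names are hypothetical.

```python
from itertools import permutations

def average_precision(ranking, relevant):
    """Average precision of a ranking (tuple of doc ids) given the relevant set."""
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)

def joint_score(w, ranking, X, relevant):
    """w . Psi(y', x), with y_ij = +1 if i is ranked above j and -1 otherwise."""
    pos = {doc: r for r, doc in enumerate(ranking)}
    s = [sum(wi * xi for wi, xi in zip(w, x)) for x in X]
    total = 0.0
    for i in relevant:
        for j in range(len(X)):
            if j not in relevant:
                y_ij = 1 if pos[i] < pos[j] else -1
                total += y_ij * (s[i] - s[j])
    return total

def most_violated_ranking(w, X, relevant):
    """Brute-force argmax over all n! rankings (feasible only for tiny n)."""
    def violation(ranking):
        loss = 1.0 - average_precision(ranking, relevant)
        return loss + joint_score(w, ranking, X, relevant)
    return max(permutations(range(len(X))), key=violation)

X = [[1.0], [0.0], [0.0]]   # one feature per document; doc 0 is relevant
untrained = most_violated_ranking([0.0], X, relevant={0})
trained = most_violated_ranking([5.0], X, relevant={0})
```

With w = 0 the loss term dominates, so the most violated ranking places the relevant document last; with a large weight on the informative feature, the correct ranking itself scores highest.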
![Page 61: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/61.jpg)
Gradient Descent
Objective function is discontinuous
  • Difficult to define a smooth global approximation
  • Upper-bound relaxations (e.g., SVMs, Boosting) sometimes too loose.
We only need the gradient!
  • But objective is discontinuous…
  • … so gradient is undefined
Solution: smooth approximation of the gradient
  • Local approximation
![Page 62: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/62.jpg)
LambdaRank
Assume implicit objective function C
Goal: compute dC/dsi
si = f(xi) denotes score of document xi
Given gradient on document scores
  • Use chain rule to compute gradient on model parameters (of f)
[Burges et al., 2006]
![Page 63: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/63.jpg)
Intuition:
• Rank-based measures emphasize top of ranking
• Higher ranked docs should have larger derivatives (Red Arrows)
• Optimizing pairwise preferences emphasizes bottom of ranking (Black Arrows)
[Burges, 2007]
![Page 64: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/64.jpg)
LambdaRank for NDCG
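The pairwise derivative of pair i,j is

$$\lambda_{ij} \equiv \Delta\mathrm{NDCG}(i,j)\,\frac{1}{1 + \exp(s_i - s_j)}$$

Total derivative of output si is

$$\frac{\partial C}{\partial s_i} = \sum_{j} \lambda_{ij} - \sum_{j} \lambda_{ji}$$

where each sum runs over the documents j in the set D paired with i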
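A rough sketch of how per-document gradients might be accumulated from pairwise terms. It assumes each pair (i, j) with y_i > y_j contributes |ΔNDCG(i, j)| / (1 + exp(s_i − s_j)), pushing i up and j down; the sign convention (positive means "raise this score") and the use of the absolute NDCG change are assumptions of this example rather than details confirmed by the slides.

```python
import math

def discount(rank):
    """DCG discount for a 1-based rank position."""
    return 1.0 / math.log(rank + 1)

def lambda_gradients(scores, labels):
    """Accumulate a per-document gradient from pairwise lambda terms."""
    n = len(scores)
    order = sorted(range(n), key=lambda d: -scores[d])
    rank = {d: r + 1 for r, d in enumerate(order)}
    ideal = sorted(labels, reverse=True)
    idcg = sum((2 ** y - 1) * discount(r + 1) for r, y in enumerate(ideal))
    lam = [0.0] * n
    for i in range(n):
        for j in range(n):
            if labels[i] > labels[j]:
                # |change in NDCG| if documents i and j swapped rank positions
                gain_diff = 2 ** labels[i] - 2 ** labels[j]
                disc_diff = discount(rank[i]) - discount(rank[j])
                delta_ndcg = abs(gain_diff * disc_diff) / idcg
                lam_ij = delta_ndcg / (1.0 + math.exp(scores[i] - scores[j]))
                lam[i] += lam_ij   # more relevant document is pushed up
                lam[j] -= lam_ij   # less relevant document is pushed down
    return lam

# Two documents ranked in the wrong order: doc 1 is relevant but scored lower.
lam = lambda_gradients([1.0, 0.0], [0, 1])
```

These per-score gradients would then be propagated to the model parameters with the chain rule.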
![Page 65: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/65.jpg)
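Properties of LambdaRank
There exists a cost function C if [1]

$$\forall i,j : \; \frac{\partial \lambda_i}{\partial s_j} = \frac{\partial \lambda_j}{\partial s_i}$$

where $\lambda_i$ denotes $\partial C / \partial s_i$
Amounts to the Hessian of C being symmetric
If Hessian also positive semi-definite, then C is convex.
[1] Subject to additional assumptions; see [Burges et al., 2006]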
![Page 66: Part 1](https://reader033.vdocuments.mx/reader033/viewer/2022051314/54c25eef4a7959ec478b4591/html5/thumbnails/66.jpg)
Summary (Part 1)
Machine learning is a powerful tool for designing information retrieval models
Requires clean formulation of objective
Advances
  • Ordinal regression
  • Dealing with severe class imbalances
  • Optimizing rank-based measures via relaxations
  • Gradient descent on non-smooth objective functions