Source: anand/tir15/lectures/ws15-tir-l2r.pdf · 2016-11-21

Page 1

Learning to Rank

Temporal Information Retrieval

pointwise, pairwise and listwise approaches


Page 4

Avishek Anand

Road so Far - Ranking in IR

2

• Ranking documents is important for coping with information overload: quickly finding documents that are “relevant” to the query

• Interpretations and modelling of relevance

• Geometric Interpretation — Vector Space Similarity, Okapi BM25

• Probabilistic interpretation — Probabilistic IR, Statistical LM

sim(d, q) = Σ_{i∈V} w_d(i) · w_q(i) / ( √(Σ_{i∈V} w_d(i)²) · √(Σ_{i∈V} w_q(i)²) )

P(D|Q) ∝ P(Q|D) · P(D)

Jelinek–Mercer (linear interpolation) smoothing, mixing a document contribution P(w_i|D) with a corpus contribution P(w_i|C):

P(Q|D) = ∏_{w_i∈Q} [ λ · P(w_i|D) + (1 − λ) · P(w_i|C) ]

Dirichlet smoothing:

P(Q|D) = ∏_{w_i∈Q} ( tf(w_i; D) + μ · P(w_i|C) ) / ( |D| + μ )
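As a concrete sketch, the two scoring models above can be written in a few lines of Python. Function names, the sparse-dict representation, and the tiny floor for unseen corpus terms are illustrative choices, not from the slides:

```python
import math

def cosine_sim(doc_weights, query_weights):
    """sim(d, q): cosine of the term-weight vectors, as in the formula above."""
    dot = sum(w * query_weights.get(t, 0.0) for t, w in doc_weights.items())
    norm_d = math.sqrt(sum(w * w for w in doc_weights.values()))
    norm_q = math.sqrt(sum(w * w for w in query_weights.values()))
    return dot / (norm_d * norm_q) if norm_d and norm_q else 0.0

def dirichlet_log_likelihood(query_terms, doc_tf, doc_len, corpus_prob, mu=2000.0):
    """log P(Q|D) with Dirichlet smoothing:
    P(w|D) = (tf(w; D) + mu * P(w|C)) / (|D| + mu)."""
    score = 0.0
    for w in query_terms:
        # 1e-9 is an illustrative floor for terms unseen in the corpus
        p = (doc_tf.get(w, 0) + mu * corpus_prob.get(w, 1e-9)) / (doc_len + mu)
        score += math.log(p)
    return score
```

Log-likelihoods are used instead of raw products to avoid numerical underflow over long queries.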


Page 7

Avishek Anand

Other Ranking features

3

• Link analysis: PageRank, HITS

• Temporal ranking

• Recency ranking

• Historical features

• Query similarity to the document title

• Any other features that you can think of?

• Anchor text similarity

• Proximity of query terms in the document

How do you design a ranking function if you have many such features?

Page 8

Avishek Anand

Learning for IR

4

• What if we can learn a model from a set of features?

• Focus on developing interesting and novel features, and less on how to put them together

• Why weren't learning approaches used earlier?

• Limited training data

• Machine learning techniques were not good enough

• Traditional IR methods had a small number of features and worked well

• few features ⇒ easily hand-tunable

• Not many specialised IR problems

Page 9

Avishek Anand

Supervised learning

5

• Given a set of training examples, how do we learn a model to predict, classify, or rank objects?

• classical tasks: regression, classification

• Terminology: Input space, output space, hypothesis, loss function

• Input space: (query, document) pairs

• Output space: {relevant, nonrelevant}

Page 10

Avishek Anand

Classification for IR

6

[Figure: relevant and non-relevant documents in feature space]

• Given a set of relevant and non-relevant documents find the hypothesis which best separates them

• Minimizes loss - e.g. number of misclassified documents (squared loss, hinge loss, cross entropy)

• Some classical classification approaches: Naive Bayes, decision trees, SVMs


Page 12

Avishek Anand

Support Vector Machines

7

[Figure: relevant and non-relevant points with candidate separating hyperplanes]

Page 17

Avishek Anand

Support Vector Machines

8

Large Margin Classifiers

Page 18

Avishek Anand

Support Vector Machines

9

⟨w, x⟩ + b ≤ −1        ⟨w, x⟩ + b ≥ +1

f(x) = ⟨w, x⟩ + b   (linear function: x is the feature vector for a (q, d) pair, w is the normal vector of the separator, ⟨·, ·⟩ is the dot product)

Assume: linearly separable

Page 19

Avishek Anand

Support Vector Machines

10

⟨w, x⟩ + b = −1        ⟨w, x⟩ + b = +1

margin:

⟨x⁺ − x⁻, w⟩ / (2‖w‖) = (1 / (2‖w‖)) · [ (⟨x⁺, w⟩ + b) − (⟨x⁻, w⟩ + b) ] = 1 / ‖w‖

(x⁺ and x⁻ lie on the two margin hyperplanes)

Page 20

Avishek Anand

Support Vector Machines

⟨w, x⟩ + b = −1        ⟨w, x⟩ + b = +1

Optimization problem:

maximize_{w,b}   1 / ‖w‖    subject to   y_i [⟨x_i, w⟩ + b] ≥ 1

Page 21

Avishek Anand

Support Vector Machines

⟨w, x⟩ + b = −1        ⟨w, x⟩ + b = +1

Optimization problem:

minimize_{w,b}   ½ ‖w‖²    subject to   y_i [⟨x_i, w⟩ + b] ≥ 1

(y_i = +1 if relevant, −1 otherwise)

Page 22

Avishek Anand

Support Vector Machines

13

• Weight vector w is a weighted linear combination of the training instances

• Only points on the margin matter (ignore the rest and get the same solution)

• Only inner products matter

• Quadratic program

• Keeps instances away from the margin


Page 25

Avishek Anand

Soft Margins

14

Convex optimization problem

⟨w, x⟩ + b ≤ −1 + ξ        ⟨w, x⟩ + b ≥ 1 − ξ

(minimize the amount of slack)

Page 26

Avishek Anand

Soft Margins

15

• Hard margin problem:

minimize_{w,b}   ½ ‖w‖²    subject to   y_i [⟨w, x_i⟩ + b] ≥ 1

• With slack variables:

minimize_{w,b}   ½ ‖w‖² + C Σ_i ξ_i    subject to   y_i [⟨w, x_i⟩ + b] ≥ 1 − ξ_i   and   ξ_i ≥ 0

What's the loss function here?
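The slack formulation can be sketched as plain subgradient descent on the equivalent hinge-loss objective. This is a toy sketch, not a production SVM solver; the function name, hyperparameters, and the small (q, d) feature vectors are all illustrative:

```python
import numpy as np

def train_soft_margin_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Minimize 1/2 ||w||^2 + C * sum_i max(0, 1 - y_i (<w, x_i> + b))
    by subgradient descent (each slack xi_i equals the hinge loss at the optimum)."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                      # points inside the margin
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# toy (q, d) feature vectors: relevant (+1) vs. non-relevant (-1)
X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_soft_margin_svm(X, y)
preds = np.sign(X @ w + b)
```

Real systems would use a dedicated QP or dual solver (e.g. SVMlight from the references slide) rather than this sketch.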

Page 27

Avishek Anand

Loss Function Perspective

16

• Binary loss:   𝟙{ y f(x) < 0 }

• Soft margin loss (hinge loss):   max(0, 1 − y f(x)), a convex upper bound on the binary loss
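A minimal numeric check of the two losses (the labels and classifier scores are made-up data) shows the hinge loss upper-bounding the binary loss:

```python
import numpy as np

def binary_loss(y, f):
    """0/1 loss: 1 when the sign of the score f(x) disagrees with the label y."""
    return (y * f < 0).astype(float)

def hinge_loss(y, f):
    """Soft margin loss max(0, 1 - y f(x)): a convex upper bound on the 0/1 loss."""
    return np.maximum(0.0, 1.0 - y * f)

y = np.array([1.0, 1.0, -1.0, -1.0])    # labels
f = np.array([2.0, 0.3, -0.5, 1.5])     # classifier scores f(x)
```

Note that the hinge loss also penalizes correctly classified points that fall inside the margin (here the second and third examples), which is exactly what the slack variables measure.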

Page 30

Avishek Anand

Evaluation and Relevance Judgements

17

• Construct a test set containing a large number of (randomly sampled) queries, their associated documents, and relevance judgment for each (query, document) pair.

• Relevance judgement is the assessment of the relevance of documents for a query

• Degree of relevance

• Binary: relevant vs. irrelevant

• Multiple ordered categories: Perfect > Excellent > Good > Fair > Bad

• Pairwise preference

• Document A is more relevant than document B

• Total order

• Documents are ranked as {A,B,C,..} according to their relevance

Page 31

Avishek Anand

Learning to Rank Framework

18

Page 32

Avishek Anand

Learning to Rank Approaches

19

• Pointwise Approaches

• Input space: single documents d_i (is a given document relevant to the query?)

• Output space: scores or relevance classes

• Pairwise Approaches

• Input space: (d1, d2) — pairs of documents such that d1 > d2

• Output space: preferences (yes/no) for a given document pair

• Listwise Approaches

• Input space: document set {d}

• Output space: permutation — a ranking of the documents

• Directly optimize an IR quality measure: MAP, NDCG, etc.
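The three input/output spaces can be illustrated by how a single labeled query is turned into training instances. The documents, grades, and variable names below are invented for illustration:

```python
from itertools import permutations

# one query with graded relevance labels (illustrative data)
docs = ["d1", "d2", "d3"]
labels = {"d1": 2, "d2": 0, "d3": 1}          # higher grade = more relevant

# pointwise: one (document, grade) instance per document
pointwise = [(d, labels[d]) for d in docs]

# pairwise: ordered pairs (a, b) with a preferred over b
pairwise = [(a, b) for a, b in permutations(docs, 2) if labels[a] > labels[b]]

# listwise: the whole list; the target is the permutation sorted by relevance
listwise_target = sorted(docs, key=lambda d: -labels[d])
```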

Page 33

Avishek Anand

Pointwise Ranking - Large Margin Ranking

20

• Margin: Σ_k (b_k − a_k)   (fixed margin strategy)

• Constraints: every document is correctly placed in its corresponding category

Original SVM formulation:

minimize_{w,b}   ½ ‖w‖² + C Σ_i ξ_i    subject to   y_i [⟨w, x_i⟩ + b] ≥ 1 − ξ_i   and   ξ_i ≥ 0

Page 34

Avishek Anand

Problems with Pointwise Ranking

21

• Some queries have a large number of documents and dominate the learned hypothesis

• Does not optimise for standard IR measures like NDCG, MAP, ERR, etc.

• The position of documents in the ranked list is not modelled by the loss function

• Approaches might put more emphasis on unimportant documents

• Sampling is essential (if non-relevant documents >> relevant documents)

Page 35

Avishek Anand

Pairwise Ranking

22

• Convert input into preferences

• Use preferences as input to learn the classes

• Define a loss L(.) on the difference vector and minimise the loss while learning f(.)

query: d1, d2, d3, d4, d5, d6

preferences: (d1 > d2), (d1 > d5), (d1 > d6), (d1 > d3), …

difference vector: (x₁ − x₂)

[RankBoost 2003]

Page 36

Avishek Anand

Pairwise Ranking

23

Original SVM formulation:

minimize_{w,b}   ½ ‖w‖² + C Σ_i ξ_i    subject to   y_i [⟨w, x_i⟩ + b] ≥ 1 − ξ_i   and   ξ_i ≥ 0

Ranking SVM:

minimize_{w,b}   ½ ‖w‖² + C Σ_i Σ_{u,v} ξ^i_{u,v}    subject to   y^i_{u,v} [⟨w, (x^i_u − x^i_v)⟩ + b] ≥ 1 − ξ^i_{u,v}   and   ξ^i_{u,v} ≥ 0

(i indexes queries; (u, v) ranges over document pairs of query i)

Page 37

Avishek Anand

Pros and Cons - Pairwise Approach

24

• Pros

• One step closer to ranking documents compared to predicting class labels

• Cons

• Distribution of document pairs is more skewed than the distribution of documents w.r.t. queries

• Some queries have more document pairs than others

• Still does not optimise for IR measures

• Rank ignorant — (d1 > d2) does not encode which ranks are being compared: Rank 1 vs. Rank 2, or Rank 99 vs. Rank 1000

Page 38

Avishek Anand

Pairwise Ranking — IR SVM

25

Page 39

Avishek Anand

Listwise Approaches

26

NDCG, ERR, MAP
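As one concrete instance of the measures named above, NDCG with exponential gain can be sketched as follows (this is one of several common gain/discount variants):

```python
import math

def dcg(grades):
    """Discounted cumulative gain: sum_i (2^rel_i - 1) / log2(i + 2), positions from 0."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades))

def ndcg(grades):
    """NDCG: DCG of the given ranking divided by the DCG of the ideal reordering."""
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal > 0 else 0.0
```

Because the discount depends on position, swapping two documents near the top changes NDCG far more than the same swap near the bottom, which is exactly the rank information that pointwise and pairwise losses ignore.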

Page 40

Avishek Anand

Learning to Rank — In Action

27

extract features from document collection and queries → feature file → create model → execute model on test data for results

Page 41

Avishek Anand

Learning to Rank - in Action

28

Example of a feature file
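The feature-file image did not survive extraction. Tools such as RankLib and SVMlight (both listed on the references slide) read SVMlight-style lines of the form `<grade> qid:<qid> <fid>:<value> ... # comment`. A small, illustrative writer/parser sketch (the grades, qid, and feature values are made up):

```python
def format_letor_line(grade, qid, features, comment=""):
    """One line of an SVMlight/LETOR-style feature file:
    <grade> qid:<qid> 1:<v1> 2:<v2> ... # <comment>"""
    feats = " ".join(f"{i}:{v}" for i, v in enumerate(features, start=1))
    tail = f" # {comment}" if comment else ""
    return f"{grade} qid:{qid} {feats}{tail}"

def parse_letor_line(line):
    """Invert format_letor_line, dropping the trailing comment."""
    body = line.split("#")[0].split()
    grade = int(body[0])
    qid = body[1].split(":")[1]
    feats = [float(tok.split(":")[1]) for tok in body[2:]]
    return grade, qid, feats

line = format_letor_line(2, "10", [0.03, 0.78, 0.12], comment="doc-GX001")
```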

Page 42

Avishek Anand

References and Further Readings

Books
Liu, Tie-Yan. Learning to Rank for Information Retrieval. Vol. 13. Springer, 2011.
Li, Hang. "Learning to Rank for Information Retrieval and Natural Language Processing." Synthesis Lectures on Human Language Technologies 4.1 (2011): 1-113.

Helpful pages
http://en.wikipedia.org/wiki/Learning_to_rank

Packages
RankingSVM: http://svmlight.joachims.org/
RankLib: http://people.cs.umass.edu/~vdang/ranklib.html

Data sets
LETOR: http://research.microsoft.com/en-us/um/beijing/projects/letor//
Yahoo! Learning to Rank Challenge: http://learningtorankchallenge.yahoo.com