learning to rank fulltext results from clicks

Learning to rank fulltext results from clicks Tomáš Kramár @tkramar @synopsitv

Upload: tkramar

Post on 14-Dec-2014


Page 1: Learning to rank fulltext results from clicks

Learning to rank fulltext results from clicks

Tomáš Kramár

@tkramar @synopsitv

Page 2: Learning to rank fulltext results from clicks

Let's build a fulltext search engine.

[Diagram: Query → Find matches → Rank results, with items numbered 1 to 4]

Page 3: Learning to rank fulltext results from clicks

Ways to find matches:

● ElasticSearch
● LIKE %%
● ...

Page 4: Learning to rank fulltext results from clicks

Ways to rank results:

● By number of hits
● By PageRank
● By date
● ...

Page 5: Learning to rank fulltext results from clicks
Page 6: Learning to rank fulltext results from clicks

How do you choose relevant results?

Page 7: Learning to rank fulltext results from clicks

Feature                        Document 1      Document 2
Number of keywords in title    2               2
Number of keywords in text     2               0
Domain                         careerjet.sk    vienna-rb.at
Category                       Job search      Programming
Language                       Slovak          English

Page 8: Learning to rank fulltext results from clicks

Document feature           How much I care about it (the higher, the more I care)
# keywords in title        2.1
# keywords in text         1
Domain is careerjet.sk     -2
Domain is vienna-rb.at     3.5
Category is Job Search     -1
Category is Programming    4.2
Language is Slovak         0.9
Language is English        1.5

Page 9: Learning to rank fulltext results from clicks

Document feature           How much I care about it (u)    Document 1 (d)    Document 2 (d)
# keywords in title        2.1                             2                 2
# keywords in text         1                               2                 0
Domain is careerjet.sk     -2                              1                 0
Domain is vienna-rb.at     3.5                             0                 1
Category is Job Search     -1                              1                 0
Category is Programming    4.2                             0                 1
Language is Slovak         0.9                             1                 0
Language is English        1.5                             0                 1

rank = d . u                                               = 4.1             = 13.4
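The ranking rule on this slide is a plain dot product, so it can be checked in a few lines. This is only a sketch: the list order follows the rows of the table, and the function name `rank` is chosen here for illustration. Summing the products by hand gives 4.1 for the first document and 13.4 for the second.

```python
# Preference weights u, in the row order of the table:
# title keywords, text keywords, domain (job site), domain (vienna-rb.at),
# category job search, category programming, Slovak, English.
u = [2.1, 1.0, -2.0, 3.5, -1.0, 4.2, 0.9, 1.5]

# Feature vectors d of the two documents (columns of the table).
d1 = [2, 2, 1, 0, 1, 0, 1, 0]
d2 = [2, 0, 0, 1, 0, 1, 0, 1]


def rank(d, u):
    """Return the dot product d . u of a document and the preferences."""
    return sum(di * ui for di, ui in zip(d, u))


print(round(rank(d1, u), 1))  # 4.1
print(round(rank(d2, u), 1))  # 13.4
```

The second document wins because the large positive weights (domain, category, language) all fire for it, while the first one collects two negative weights.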

Page 10: Learning to rank fulltext results from clicks

Rate each result on a scale of 1 to 5.

Page 11: Learning to rank fulltext results from clicks

\[ \text{rating} = d \cdot u = d_1 u_1 + d_2 u_2 + \dots + d_n u_n \]

\[
\begin{aligned}
d_{1,1} u_1 + d_{1,2} u_2 + \dots + d_{1,n} u_n &= 3 \\
d_{2,1} u_1 + d_{2,2} u_2 + \dots + d_{2,n} u_n &= 5 \\
d_{3,1} u_1 + d_{3,2} u_2 + \dots + d_{3,n} u_n &= 1 \\
d_{4,1} u_1 + d_{4,2} u_2 + \dots + d_{4,n} u_n &= 3
\end{aligned}
\]

Page 12: Learning to rank fulltext results from clicks

The $d_{i,j}$ are known; solve this system of equations and you have $u$. Done.
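As a sketch of what "solve this system" could look like, here is Gaussian elimination in plain Python. It assumes there are exactly as many independent equations as preferences, which real rating data rarely gives you. The two documents and their ratings are made up for illustration, and `solve_linear_system` is a hypothetical helper name.

```python
def solve_linear_system(d, ratings):
    """Solve d * u = ratings for u with Gauss-Jordan elimination.

    d is a square list of document feature rows, ratings the known ratings.
    """
    n = len(ratings)
    # Augmented matrix: each row is the features followed by the rating.
    m = [list(row) + [rating] for row, rating in zip(d, ratings)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("the system has no unique solution")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Hypothetical data: two documents with two features each, both rated 5.
documents = [[2.0, 1.0], [1.0, 3.0]]
ratings = [5.0, 5.0]
print(solve_linear_system(documents, ratings))  # close to [2.0, 1.0]
```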

Page 13: Learning to rank fulltext results from clicks

Except...

● You don't know the explicit ratings
● User preferences change over time
● Those equations probably don't have a solution

Page 14: Learning to rank fulltext results from clicks

Clicked! Assume rating 1.

Not clicked. Assume rating 0.
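A minimal sketch of this labelling rule, assuming a click log that records which results were shown and which were clicked. The function name and the document ids are hypothetical.

```python
def ratings_from_clicks(shown, clicked):
    """Assign rating 1 to every clicked result and 0 to every other shown result."""
    clicked = set(clicked)
    return {doc_id: (1 if doc_id in clicked else 0) for doc_id in shown}


# Hypothetical result page with four results; the user clicked only "b".
print(ratings_from_clicks(shown=["a", "b", "c", "d"], clicked=["b"]))
# {'a': 0, 'b': 1, 'c': 0, 'd': 0}
```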

Page 15: Learning to rank fulltext results from clicks


Page 16: Learning to rank fulltext results from clicks

Approximation function

\[ h(d): d \to \text{rank} \]

\[ h(d) = d_1 u_1 + \dots + d_n u_n = \text{estimated\_rank} \]

If the function is good, it should make minimal errors:

\[ \text{error} = (\text{estimated\_rank} - \text{real\_rank})^2 \]
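The approximation function and its error translate almost literally into code. The sketch below uses a made-up two-feature document and made-up preferences; only `h` and the squared error come from the slide.

```python
def h(d, u):
    """Estimated rank of document d under preferences u."""
    return sum(di * ui for di, ui in zip(d, u))


def squared_error(d, u, real_rank):
    """error = (estimated_rank - real_rank)^2 for a single document."""
    return (h(d, u) - real_rank) ** 2


# Hypothetical values: a clicked document (real rank 1) and a poor guess for u.
d = [2, 1]
u = [0.5, -1.0]
print(h(d, u))                 # 0.0
print(squared_error(d, u, 1))  # 1.0
```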

Page 17: Learning to rank fulltext results from clicks

Gradient descent

1. Set user preferences (u) to arbitrary values
2. Calculate the estimated rank h(d) for each document
3. Calculate the mean square error
4. Adjust preferences u in a way that minimizes the error
5. Repeat until the error converges
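The five steps above can be sketched as a short batch gradient descent loop. This is an illustration under stated assumptions, not the talk's implementation: the default learning rate, the stopping tolerance, and the tiny data set are all made up, and the data was generated from preferences [1.0, 0.5] so the result is easy to check.

```python
def h(d, u):
    """Estimated rank of document d under preferences u."""
    return sum(di * ui for di, ui in zip(d, u))


def mean_square_error(docs, ranks, u):
    return sum((h(d, u) - r) ** 2 for d, r in zip(docs, ranks)) / len(docs)


def gradient_descent(docs, ranks, alpha=0.1, tolerance=1e-12, max_steps=10000):
    n = len(docs[0])
    u = [0.0] * n                                  # 1. arbitrary start
    previous = mean_square_error(docs, ranks, u)   # 2. and 3.
    for _ in range(max_steps):
        # 4. Move against the gradient of the mean square error.
        gradient = [0.0] * n
        for d, r in zip(docs, ranks):
            residual = h(d, u) - r
            for i in range(n):
                gradient[i] += 2 * residual * d[i] / len(docs)
        u = [ui - alpha * gi for ui, gi in zip(u, gradient)]
        current = mean_square_error(docs, ranks, u)
        if abs(previous - current) < tolerance:    # 5. error has converged
            break
        previous = current
    return u


# Hypothetical documents (two features each) and their known ranks.
docs = [[1, 0], [0, 1], [1, 1], [2, 1]]
ranks = [1.0, 0.5, 1.5, 2.5]
learned = gradient_descent(docs, ranks)
print([round(x, 3) for x in learned])  # close to [1.0, 0.5]
```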

Page 18: Learning to rank fulltext results from clicks

[Plot: cost function. x-axis: u for # of keywords in title; y-axis: mean square error]

Page 19: Learning to rank fulltext results from clicks

Calculate the derivative of the cost function at the current value of u and it will give you the direction to move in.

Page 20: Learning to rank fulltext results from clicks

Preference update

\[ u_i := u_i - \alpha \, \frac{\partial\, \text{error}}{\partial u_i} \]

α: learning rate

∂error/∂u_i: partial derivative of the cost function (the error of h(d)) with respect to u_i

Page 21: Learning to rank fulltext results from clicks

The learning rate α sets how fast you move. Too low: slow progress. Too high: you will overshoot.
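To make the effect of α concrete, here is a one-dimensional sketch: a single preference u, a single feature d = 1, and a target rank of 3. All of these values, and the three learning rates, are made up for illustration.

```python
def descend(alpha, steps=20, d=1.0, target=3.0):
    """Run a fixed number of gradient steps on error = (u * d - target)^2."""
    u = 0.0
    for _ in range(steps):
        gradient = 2 * (u * d - target) * d
        u -= alpha * gradient
    return u


for alpha in (0.01, 0.4, 1.1):
    print(alpha, descend(alpha))
```

With these numbers the smallest rate is still far from 3 after 20 steps, the middle one lands on it, and the largest one overshoots further on every step and ends up farther away than where it started.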

Page 22: Learning to rank fulltext results from clicks

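Nothing scary. You can find these partial derivatives online for standard cost functions.

For the squared error (averaged over all documents for the mean square error):

\[ \frac{\partial\, \text{error}}{\partial u_i} = 2\,(h(d) - \text{rank}(d))\, d_i \]

The constant factor 2 is often folded into α.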
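Differentiating error = (h(d) - rank(d))^2 with respect to u_i gives 2 (h(d) - rank(d)) d_i. If you don't trust a formula you found online, a cheap sanity check is to compare it against a finite-difference estimate. The sketch below does that for one made-up document; the function names and values are illustrative only.

```python
def h(d, u):
    return sum(di * ui for di, ui in zip(d, u))


def error(d, u, real_rank):
    return (h(d, u) - real_rank) ** 2


def error_gradient(d, u, real_rank):
    """Analytic partial derivatives of the squared error by each u_i."""
    residual = h(d, u) - real_rank
    return [2 * residual * di for di in d]


def numeric_gradient(d, u, real_rank, eps=1e-6):
    """Central finite differences, used only to check the analytic formula."""
    gradient = []
    for i in range(len(u)):
        up, down = list(u), list(u)
        up[i] += eps
        down[i] -= eps
        gradient.append((error(d, up, real_rank) - error(d, down, real_rank)) / (2 * eps))
    return gradient


d, u, real_rank = [2.0, 1.0], [0.5, -1.0], 1.0
print(error_gradient(d, u, real_rank))    # [-4.0, -2.0]
print(numeric_gradient(d, u, real_rank))  # approximately the same values
```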

Page 23: Learning to rank fulltext results from clicks

Gradient descent

1. Set user preferences (u) to arbitrary values
2. Calculate the estimated rank h(d) for each document
3. Calculate the square error
4. Adjust preferences u in a way that minimizes the error
5. Repeat until the error converges

Page 24: Learning to rank fulltext results from clicks

Clicked! Assume rating 1.

Clicked! Assume rating 1. Or? Doesn't this mean result #1 is not relevant?

Page 25: Learning to rank fulltext results from clicks

Clicked! Assume nothing.

Clicked! Assume it is better than #2 and #3.
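One way to read this rule: a clicked result is preferred over every unclicked result that was shown above it, and a click on the top result tells you nothing. The sketch below encodes that reading. The function name is hypothetical, and the example assumes a page of four results where #1 and #4 were clicked.

```python
def preferences_from_clicks(ranking, clicked):
    """Return (better, worse) pairs implied by the clicks.

    ranking is the list of results in the order they were shown.
    """
    clicked = set(clicked)
    pairs = []
    for position, doc in enumerate(ranking):
        if doc in clicked:
            # Every unclicked result above a clicked one was skipped by the user.
            for skipped in ranking[:position]:
                if skipped not in clicked:
                    pairs.append((doc, skipped))
    return pairs


print(preferences_from_clicks([1, 2, 3, 4], clicked=[1, 4]))
# [(4, 2), (4, 3)]
```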

Page 26: Learning to rank fulltext results from clicks

What's changed?

We no longer have ratings, just document comparisons.

Cost function: something that considers ordering, e.g., Kendall's τ (tau), which is based on the number of concordant and discordant pairs.
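For reference, a small sketch of Kendall's τ in its simplest form (tau-a, with no special handling of ties): count the pairs that two score lists order the same way, subtract the pairs they order differently, and divide by the total number of pairs. The example scores are made up.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score lists over the same items."""
    concordant = discordant = 0
    for i, j in combinations(range(len(scores_a)), 2):
        product = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if product > 0:
            concordant += 1    # both lists order this pair the same way
        elif product < 0:
            discordant += 1    # the lists disagree on this pair
    total_pairs = len(scores_a) * (len(scores_a) - 1) // 2
    return (concordant - discordant) / total_pairs


print(kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0
print(kendall_tau([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant
```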

h is now a function of 2 parameters: h(d1, d2). But you can just do d2 - d1 and learn on that.

d4 > d3

d4 > d2
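A rough sketch of "learn on the difference": turn each comparison into one difference vector (preferred document minus the other one) and push its score towards a positive target with the same squared-error update as before. The feature vectors, the target value 1, the learning rate, and the step count are all assumptions made for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def learn_from_pairs(pairs, alpha=0.05, steps=500):
    """Learn preferences u from (better, worse) pairs of feature vectors."""
    differences = [[b - w for b, w in zip(better, worse)] for better, worse in pairs]
    u = [0.0] * len(differences[0])
    for _ in range(steps):
        for x in differences:
            # Squared-error step towards the assumed target u . x = 1.
            residual = dot(x, u) - 1.0
            u = [ui - alpha * 2 * residual * xi for ui, xi in zip(u, x)]
    return u


# Hypothetical feature vectors for results #2, #3 and #4.
d2, d3, d4 = [2, 0], [1, 1], [2, 2]
u = learn_from_pairs([(d4, d3), (d4, d2)])
print(dot(d4, u) > dot(d3, u), dot(d4, u) > dot(d2, u))  # True True
```

After training, scoring single documents with d . u reproduces both observed comparisons, so ranking still needs only the one-document scoring function.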

Page 27: Learning to rank fulltext results from clicks