learning to rank - hang li · 2014-08-16 · •pairwise approach and listwise approach perform...

142
Machine Learning and Applications to Social Media Hang Li Huawei Technologies CCF ADL 2014 Beijing Aug. 10, 2014

Upload: others

Post on 13-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Machine Learning and Applications to Social Media

Hang Li

Huawei Technologies

CCF ADL 2014 Beijing

Aug. 10, 2014

Page 2: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Outline

1. Introduction

2. Learning to Rank

3. Learning to Match

4. Applications to Social Media

5. Summary

2

Page 3: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Two Challenges: Information Overload and Information Shortage

Page 4: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Video: Weibo Robot

Page 5: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Main Features of Weibo Robot V1

• Features Developed

– Following People

– Re-Tweeting (Forwarding Tweets)

– Generating Short Comments

• Features To Be Developed

– Generating Original Tweets

– Generating Long Comments 小诺_Noah

Page 6: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank
Page 7: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

2. Learning to Rank

7

Page 8: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

2.1. Overview of Learning to Rank

8

Page 9: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Ranking Plays Key Role in Many Applications

9

Page 10: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Ranking Problem: Example = Document Retrieval

qnq

q

q

d

d

d

,

2,

1,

q

query

documents

ranking of documents

NdddD ,,, 21

),( dqf

10

ranking based on relevance, importance,

preference

Page 11: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Ranking Problem Example = Recommenders System

11

Item1 Item2 Item3 ... ItemN

User1 5 4

User2 1 2 2

... ? ? ?

UserM 4 3

Page 12: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Ranking Problem Example = Machine Translation

12

Re-Ranking

Model

1000

2

1

e

e

e

ranked sentence candidates in target language

Generative Model

f

sentence source language

e~

re-ranked top sentence in target language

Page 13: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Ranking Problem Example = Meta Search

13

Page 14: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Learning to Rank

• Definition 1 (in broad sense)

Learning to rank = any machine learning technology for ranking problem

• Definition 2 (in narrow sense)

Learning to rank = machine learning technology for ranking creation and ranking aggregation

• This tutorial takes Definition 2

14

Page 15: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Taxonomy of Problems in Learning to Rank

15

Page 16: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Ranking Creation (with Local Ranking Model)

16

requests

objects

ranking of objects

Mi qqqqS ,,,,, 21

Nj ooooO ,,,,, 21

iq

iniiii oooO ,2,1, ,,,

),( oqf

),(

),(

),(

,,

2,2,

1,1,

ii niini

iii

iii

oqfo

oqfo

oqfo

Page 17: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Learning for Ranking Creation

• Creating a ranking list of offerings based on request and offerings

• Feature-based

• Usually local ranking model

• Usually supervised learning

17

Page 18: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Ranking Aggregation

18

Page 19: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Learning for Ranking Aggregation

• Aggregating a ranking list from multiple ranking lists of offerings

• Ranking based

• Usually global ranking model

• Both supervised and unsupervised learning

19

Page 20: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

2.2. Problem and Approaches of Learning to Rank

20

Page 21: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Ranking Problem: Example = Document Search

qnq

q

q

d

d

d

,

2,

1,

q

query

documents

ranking of documents

NdddD ,,, 21

),( dqf

21

ranking based on relevance, importance,

preference

Page 22: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Traditional Approach = Probabilistic Model

Nd

d

d

2

1

q

),|(~

),|(~

),|(~

22

11

nn dqrPd

dqrPd

dqrPd

query

documents

ranking of documents

}0,1{

),|(

R

dqrP

22

Page 23: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

BM25 [Robertson & Walker 94]

Nd

d

d

2

1

qquery

documents

ranking function

qdw wtfavgdl

dlbkb

wtfk

)()1(

)()1(

23

),|(~

),|(~

),|(~

22

11

nn dqrPd

dqrPd

dqrPd

ranking of documents

}0,1{

),|(

R

dqrP

Page 24: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

PageRank [Page et al, 1999]

ndL

dPdP

ij dMd j

j

i

1)1(

)(

)()(

)(

Page 25: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

New Approach = Learning to Rank

1,1

2,1

1,1

1

nd

d

d

q

mnm

m

m

m

d

d

d

q

,

2,

1,

Learning System

Ranking System

1mq

),(

),(

),(

11 ,11,1

2,112,1

1,111,1

mm nmmnm

mmm

mmm

dqfd

dqfd

dqfd

),( dqf

25

NdddD ,,, 21

Page 26: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

1. Data Labeling (rank) 3. Learning

)(xf

2. Feature Extraction

1,1

2,1

1,1

1

nd

d

d

q

mnm

m

m

m

d

d

d

q

,

2,

1,

11 ,1,1

2,12,1

1,11,1

1

nn yd

yd

yd

q

mm nmnm

mm

mm

m

yd

yd

yd

q

,,

2,2,

1,1,

11 ,1,1

2,12,1

1,11,1

nn yx

yx

yx

mm nmnm

mm

mm

yx

yx

yx

,,

2,2,

1,1,

Training Process

26

Page 27: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

1. Data Labeling (rank)

2. Feature Extraction

Testing Process

27

3. Ranking with )(xf

)(

)(

)(

111 ,1,1,1

2,12,12,1

1,11,11,1

mmm nmnmnm

mmm

mmm

yxfx

yxfx

yxfx

4. Evaluation

Evaluation Result

1,1

2,1

1,1

1

mnm

m

m

m

d

d

d

q

11 ,1,1

2,12,1

1,11,1

1

mm nmnm

mm

mm

m

yd

yd

yd

q

11 ,1,1

2,12,1

1,11,1

mm nmnm

mm

mm

yx

yx

yx

Page 28: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Notes

• Features are functions of query and document

• Query and associated documents form a group

• Groups are i.i.d. data

• Feature vectors within group are not i.i.d. data

• Ranking model is function of features

• Several data labeling methods (here labeling of grade)

28

Page 29: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Issues in Learning to Rank

• Data Labeling

• Feature Extraction

• Evaluation Measure

• Learning Method (Model, Loss Function, Algorithm)

29

Page 30: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Data Labeling Problem

• E.g., relevance of documents w.r.t. query

30

Doc A

Doc B

Doc C

Query

Page 31: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Data Labeling Methods

• Labeling of Grades

– Multiple levels (e.g., relevant, partially relevant, irrelevant)

– Widely used in IR

• Labeling of Ordered Pairs

– Ordered pairs between documents (e.g. A>B, B>C)

– Implicit relevance judgment: derived from click-through data

• Creation of List

– List (or permutation) of documents is given

– Ideal but difficult to implement

31

Page 32: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Implicit Relevance Judgment

Doc A

Doc B

Doc C B > A

ranking of documents at search system

users often clicked on Doc B

ordered pair

32

Page 33: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Feature Extraction

33

Doc A

Doc B

Doc C

Query

BM25

BM25

BM25

PageRank

PageRank

PageRank

.............

.............

.............

Query-document feature

Document feature

Feature Vectors

Page 34: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Example Features

34

Page 35: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Evaluation Measures

• Important to rank top results correctly

• Measures – NDCG (Normalized Discounted Cumulative Gain)

– MAP (Mean Average Precision)

– MRR (Mean Reciprocal Rank)

– WTA (Winners Take All)

– Kendall’s Tau

35

Page 36: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

NDCG

• Evaluating ranking using labeled grades

• NDCG at position j

36

)1log(/)12(1

1

)( in

j

i

ir

j

Page 37: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

NDCG (cont’)

• Example: perfect ranking

– (3, 3, 2, 2, 1, 1, 1) grade r=3,2,1

– (7, 7, 3, 3, 1, 1, 1) gain

– (1, 0.63, 0.5, 0.43, 0.39, 0.36, 0.33) position discount

– (7, 11.41, 12.91, …) DCG

– (1/7, 1/11.41, 1/12.91, …) normalizing factor

– (1, 1,1,1,1,1,1) NDCG for perfect ranking

12 )( jr

)1log(/1 j

)1log(/)12(1

)( ij

i

ir

jn

37

Page 38: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

NDCG (cont’)

• Example: imperfect ranking

– (2, 3, 2, 3, 1, 1, 1)

– (3, 7, 3, 7, 1, 1, 1) Gain

– (1, 0.63, 0.5, 0.43, 0.39, 0.36, 0.33) Position discount

– (3, 7.41, 8.91, … ) DCG

– (1/7, 1/11.41, 1/12.91, …) normalizing factor

– (0.43, 0.65, 0.69, ….) NDCG

• Imperfect ranking decreases NDCG

38

Page 39: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Relations with Other Learning Tasks

• No need to predict category

vs Classification

• No need to predict value of

vs Regression

• Relative ranking order is more important

vs Ordinal regression

• Learning to rank can be approximated by classification, regression, ordinal regression

39

),( dqf

Page 40: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Ordinal Regression (Ordinal Classification)

• Categories are ordered

– 5, 4, 3, 2, 1

– e.g., rating restaurants

• Prediction

– Map to ordered categories

40

Page 41: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Three Major Approaches

• Pointwise approach

• Pairwise approach

• Listwise approach

• SVM based

• Boosting based

• Neural Network based

• Others

41

Page 42: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Categorization of Learning to rank Methods

42

Page 43: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Pointwise Approach

• Transforming ranking to regression, classification, or ordinal classification

• Query-document group structure is ignored

43

Page 44: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Pointwise Approach

44

Page 45: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Pointwise Approach

45

Page 46: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Pairwise Approach

• Transforming ranking to pairwise classification

• Query-document group structure is ignored

46

Page 47: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Pairwise Approach

47

Page 48: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Listwise Approach

• List as instance

• Query-document group structure is used

• Straightforwardly represents learning to rank problem

48

Page 49: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Listwise Approach

49

Page 50: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Evaluation Results

• Pairwise approach and listwise approach perform better than pointwise approach

• LabmdaMART performs best in Yahoo Learning to rank Challenge

• No significant difference among pairwise and listwise methods

50

Page 51: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

51

Page 52: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

2.3. Methods of Learning to Rank

52

Page 53: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Ranking SVM

53

Page 54: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Pairwise Classification

• Converting document list to document pairs

54

Doc A

Doc B

Doc C

Query

Doc A

Doc B Doc C

Query

Doc A Doc B

Doc C

grade 1

grade 2

grade 3

Page 55: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Transforming Ranking to Pairwise Classification

• Input space: X

• Ranking function

• Ranking:

• Linear ranking function:

• Transforming to pairwise classification:

RXf :

);();( wxfwxfxx jiji

xwwxf ,);(

);();( 0, wxfwxfxxw jiji

ij

ji

jixx

xxyzxx

1

1 ),,(

55

Page 56: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Ranking Problem

56

w

grade 1

grade 2

grade 3

1x

2x

3x

w

Page 57: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Transformed Pairwise Classification Problem

57

);( wxf31 xx

32 xx

21 xx

+1 -1

12 xx

23 xx

13 xx

Positive Examples

Negative Examples

Redundant

Page 58: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Ranking SVM (Hebrich et al., 1999)

• Pairwise classification on differences of feature vectors

• Corresponding positive and negative examples

• Negative examples are redundant and can be discarded

• Hyper plane passes the origin

• Soft margin and kernel can be used

• Ranking SVM = pairwise classification SVM

58

Page 59: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Learning of Ranking SVM

2

1

)2()1( ||||,1min wxxwyl

i

iiiw

59

),0max(][ ss C2

1

0

,,1 1,

||||2

1min

)2()1(

1

2

,

i

iiii

N

i

iw

Nixxwy

Cw

Page 60: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

IR SVM

60

Page 61: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Cost-sensitive Pairwise Classification

• Converting to document pairs

61

Doc A

Doc B

Doc C

Query

Doc A

Doc B Doc C

Query

Doc A Doc B

Doc C

grade 1

grade 2

grade 3

Critical Not Critical

Page 62: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Problems with Ranking SVM

• Not sufficient emphasis on correct ranking on top grades: 3, 2, 1 ranking 1: 2 3 2 1 1 1 1 ranking 2: 3 2 1 2 1 1 1 ranking 2 should be better than ranking 1 Ranking SVM views them as the same

• Numbers of pairs vary according to queries q1: 3 2 2 1 1 1 1 q2: 3 3 2 2 2 1 1 1 1 1 number of pairs for q1 : 2*(2-2) + 4*(3-1) + 8*(2-1) = 14 number of pairs for q2: 6*(3-2) + 10*(3-1) + 15*(2-1) = 31 Ranking SVM is biased toward q2

62

Page 63: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

IR SVM (Cao et al., 2006)

• Solving the two problems of Ranking SVM

• Higher weight on important grade pairs

• Normalization weight on pairs in query

• IR SVM = Ranking SVM using modified hinge loss

63

)(ik

)(iq

Page 64: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Modified Hinge Loss function

64

2)2()1(

)(

1

)( ||||,1min wxxwy iiiiq

l

i

ikw

1

2

0.5

1 )( )2()1( xxfy

Loss

Page 65: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Learning of IR SVM

65

2)2()1(

)(

1

)( ||||,1min wxxwy iiiiq

l

i

ikw

2

0

,,1 1,

||||2

1min

)()(

)2()1(

1

2

,

iqik

i

i

iiii

l

i

iiw

C

lixxwy

Cw

Page 66: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

AdaRank

66

Page 67: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Listwise Loss

67

111 ,1,1,1

2,11,22,1

1,11,11,1

1

nnn yx

yx

yx

q

mmm nmnmnm

mmm

mmm

m

yx

yx

yx

q

,,,

2,2,2,

1,1,1,

Page 68: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

AdaRank (Xu and Li, 2007)

• Optimizing exponential loss function

• Algorithm: AdaBoost-like algorithm for ranking

68

Page 69: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Loss Function of AdaRank

69

Any evaluation measure taking value between [-1,+1]

Page 70: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

AdaRank Algorithm

70

Page 71: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

3. Learning to Match

71

Page 72: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

3-1. Semantic Matching in Search

72

Page 73: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Query Document Mismatch is Biggest Challenge in Search

73

Page 74: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Query Document Mismatch

• Same intent can be represented by different queries (representations)

• Search is still mainly based on term level matching

• Query document mismatch occurs, when searcher and author use different representations

74

Page 75: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Examples of Query Document Mismatch

Query Document Term Matching

Semantic Matching

seattle best hotel seattle best hotels

no yes

pool schedule swimmingpool schedule

no yes

natural logarithm transformation

logarithm transformation

partial yes

china kong china hong kong partial no

why are windows so expensive

why are macs so expensive

partial no

75

Page 76: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Matching at Different Levels

Semantic Matching

Form Phrase Sense Topic Structure

Term Matching

Page 77: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Structure Identification

Topic Identification

Similar Query Finding

Phrase Identification

Spelling Error Correction

Sense

michael jordan berkele

query form: michael jordan berkeley

phrase: michael jordan

similar query: michael i. jordan

main phrase: michael jordan

Phrase

Term

Structure

topic: machine learning, berkeley

Topic

phrase: berkeley

Query Understanding

Page 78: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Title Structure Identification

Topic Identification

Key Phrase Identification

Phrase Identification Term

Topic

Homepage of Michael Jordan

Michael Jordan is Professor in the

Department of Electrical Engineering

……

key phrase: michael jordan, professor,

electrical engineering Key Phrase

topic: machine learning, berkeley

main phrase in title: michael jordan

phrase: michael jordan, professor,

department, electrical engineering]

Phrase

Structure

Document Understanding

Page 79: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Query

Representation

Document

Representation Query Document

Matching

Relevance Ranking

Query form: michael jordan berkeley

Similar query: michael i jordan

Main phrase: michael jordan

Phrase: michael jordan, berkeley

Topic : machine learning

Document: michael jordan homepage

Main phrase in title: michael jordan

Key phrase: michael jordan, berkeley

Phrase: michael jordan, professor,

department of electrical engineering

Topic : machine learning, berkeley

Semantic Matching

Page 80: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Matching in Different Ways

80

q d

q’

d’

c

Query Reformulation

Document transformation

Query and document transformation

No transformation

Page 81: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Web Search System

Ranking

Document

Understanding Indexing

Index

User Interface

Web

User

Crawling

Query

Document

Matching

Query

Understanding Retrieving

Page 82: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Machine Learning for Query Document Matching in Web Search

82

Page 83: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Learning for Matching between Query and Document

• Learning matching function

• Using training data

• and can be id’s or feature vectors

• can be binary or numerical values

• Using relations in data and/or prior knowledge

83

),|(or ),( dqrpdqf MM

),,(,),,,( 111 NNN rdqrdq

Nqqq ,,, 21 Nddd ,,, 21

Nrrr ,,, 21

Page 84: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Challenges in Machine Learning for Matching

• How to leverage relations in data

• How to handle complicated data (e.g., long queries, questions)

• How to scale up

• How to leverage prior knowledge

• How to deal with tail

84

Page 85: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Long Tail Challenge

• Head pages have rich anchor texts and click data

• Tail queries and pages suffer more from mismatch

• Problem of propagating information and knowledge from head to tail

85

Page 86: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Relation between Matching and Ranking

• In traditional IR: – Ranking = matching

• Web search:

– Ranking and matching become separated – Learning to rank becomes state-of-the-art

– Matching = feature learning for ranking

86

or ),(),( 25 dqfdqf BM

)(),(),( 25 dgdqfdqf PageRankBM

)|(),( qdPdqf LMIR

Page 87: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Matching vs Ranking

Matching Ranking

Prediction Matching degree between query and document

Ranking list of documents

Model f(q, d) f(q,d1), f(q,d2), … f(q,dn)

Challenge Mismatch Correct ranking on top

87

In search, first matching and then ranking

Page 88: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Matching Functions as Features in Learning to Rank

• Term level matching:

• Phrase level matching:

• Sense level matching:

• Topic level matching:

• Structure level matching:

• Term level matching (spelling, stemming):

88

),(25 dqfBM

),( dqfP

),( dqfT

),( dqfC

),( dqfS

qq '

),(25 dqf BMn

Page 89: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Approaches to Learning for Matching Between Query and Document

• Matching by Query Reformulation

• Matching with Term Dependency Model

• Matching with Translation Model

• Matching with Topic Model

• Matching with Latent Space Model

89

Page 90: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank
Page 91: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

3-2. Overview of Learning to Match

91

Page 92: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Matching between Heterogeneous Data is Everywhere

• Matching between user and product (collaborative filtering)

• Matching between text and image (image annotation)

• Matching between people (dating)

• Matching between languages (machine translation)

• Matching between receptor and ligand (drug design)

92

Page 93: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Formulation of Learning Problem

• Learning matching function

• Training data

• Generated according to

93

),( yxf

),,(,),,,( 111 NNN ryxryx

),|(~ ),|(~ ),(~ YXRPrXYPyXPx

Page 94: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Formulation of Learning Problem

• Loss Function

• Risk Function

• Objective Function in Learning

94

) ),(,( yxfrL

RYXryxdPyxfrLryxPyxfrR ),,() ),(,(),,() ),(,(

N

iiii

FffyxfrL

1

)() ),(,(min

Page 95: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Matching Problem: Instance Matching

1

4

5 1

1

x1

xm

y1 yn y2 y3

x2

x3

95

Instances

Can be represented as matching between nodes in bipartite graph

Page 96: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Matching Problem: Feature Matching

1

4

5 1

1

x1

xm

y1 yn y2 y3

x2

x3

96

Features

Can be represented as matching between objects in two spaces

Page 97: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Matching Problem: Structure Matching

1

4

5 1

1

x1

xm

y1 yn y2 y3

x2

x3

97

Structures

Page 98: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

3-3. Methods of Learning to Match

98

Page 99: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Regularized Matching in Latent Space (RMLS)

99

Page 100: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Matching in Latent Space (Wu et al., WSDM 2013, JMLR 2013)

• Motivation

– Matching between query and document in latent space

• Assumption

– Queries have similarity

– Document have similarity

– Click-through data represent “similarity” relations between queries and documents

• Approach

– Projection to latent space

– Regularization or constraints

• Results

– Significantly enhance accuracy of query document matching

Page 101: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Matching in Latent Space

q1

qm q2

d1

dn

d2

Query Space Document Space

q1 d2

qm

dn

d1

q2 Latent Space

qL

dL

Page 102: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

• Matching between Heterogeneous Data

• Example: Image Annotation

hook

fishing

singer

solider

worrier

microphone

Page 103: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Projecting Keywords and Images into Latent Space

Page 104: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

IR Models as Similarity Functions (Xu and Li 2010)

q1

qm q2

d1

dn

d2

Query Space Document Space

q1 d2

qm

dn

d1

q2

New Space

'

unigram

unigram

unigram

unigram

unigram

unigram

unigram unigram

unigram

VSM, BM25, LM, MRF

Mapping functions are diagonal matrices

Page 105: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

IR Models Are Similarity Functions

Page 106: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Partial Least Square (PLS) • Setting

– Two spaces:

• Input

– Training data:

• Output

– Matching function

• Assumption

– Two linear (and orthonormal ) transformations

– Dot product as similarity function

• Optimization

DdQq ,

Nccdq iiii )},,{(

),( dqf

dq LL ,

dLqL dq ,

ILLILLdLqLc d

T

dq

T

q

dq

idiqiLL

iidq

, st. , maxarg),(

,

Page 107: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Solution of Partial Least Square

• Non-convex optimization

• Global optimal solution exists

• Global optimum can be found by solving SVD (Singular Value Decomposition)

Page 108: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Regularized Mapping to Latent Space (RMLS)

• Setting

– Two spaces:

• Input

– Training data:

• Output

– Matching function

• Assumption

– Two linear (and sparse) transformations

– Dot product as similarity function

• Optimization

DdQq ,

Nccdq iiii )},,{(

),( dqf

dq LL ,

dLqL dq ,

ddqqddqq

dq

idiqiLL

lllldLqLcii

dq

||||,||||,||,|| st. , maxarg),(

,

Page 109: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Solution of Regularized Mapping to Latent Space (RMLS)

• Coordinate Descent

• Repeat

– Fix , update

– Fix , update

• No guarantee to find global optimum

• Updates can be parallelized by rows

qL dL

dLqL

Page 110: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Comparison between PLS and RMLS

PLS RMLS

Assumption Orthogonal L1 and L2 Regularization

Optimization Method

Singular Value Decomposition

Coordinate Descent

Optimality Global optimum

Local optimum

Efficiency Low High

Scalability Low High

Page 111: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

String Re-writing Kernel

111

Page 112: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Learning for String Re-wring

((distance between sun and earth), (how far sun earth), +1) ((distance between beijing and shanghai), (how far is beijing from shanghai), +1) … …. ((distance between moon and earth), (how far sun earth), -1)

((distance between moon and earth), (how far is earth from moon))

Learning System

Prediction System

Model

+1/-1?

Page 113: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

String Re-writing Kernel (Bu et al., 2012)

• Training data

• Model

• String Re-writing Kernel

)),,(()),,(( 111 nnn ytsyts

0 )),(),,(((sign1

iii

n

i

ii tstsKyy

)),(),,(( tstsK ii

Page 114: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Re-Writing Rule • Rule is applied to re-writing of substrings

Shakespeare wrote Hamlet in 16 century Hamlet was written by Shakespeare

Cao Xueqin wrote Dream of the Red Chamber Dream of the Red Camber was written by Cao Xueqin

* wrote * * was written by *

Re-writing Rule

Cao Xueqin wrote Dream of the Red Chamber Hamlet was written by Shakespeare

Page 115: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Re-Writing Rule (2) • Multiple re-writing rules may be applied

Shakespeare wrote Hamlet in 16 century Hamlet was written by Shakespeare

Re-writing Rule

* wrote * in 16 century * was written by *

Page 116: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Definition of String Re-writing Kernel (SRK)

),(),,()),(),,(( 22112211 tstststsK

Rrr tsts )),((),(

(0,1] ),( i

r nts

),(),()),(),,(( 22112211 tstststsK r

r

r

Page 117: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Similarity between Two Re-writings • Similarity between two re-writings with respect to re-

writing rules

Shakespeare wrote Hamlet in 16 century Hamlet was written by Shakespeare

Cao Xueqin wrote Dream of the Red Chamber Dream of the Red Camber was written by Cao Xueqin

,1,,1,,0 10011001 rrr ,0,,1,,0 10011001 rrr

* wrote * * was written by *

100r 1001r * wrote * in 16 century * was written by *

Page 118: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

A Sub-Class of String Re-writing Kernel

• Advantage: Matching between informally written sentences such as long queries in search can be effectively performed

• Challenge

• Number of re-writing rules is infinite

• Number of applicable rules increase exponentially when length of sentence increases

• Our Approach

• Sub-class: kb-SRK

Page 119: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Definition of kb-SRK

• Special class of SRK

• Re-writing rules in kb-SRK

– String patterns in rule are of length k

– Wildcard ? only substitutes a single character

– Alignment between string patterns is bijective

? wrote ? ? described ?

Page 120: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Reformulation of kb-SRK

otherwise ,0),( match, , if,),(

),(),()( )(

tsrts

i

tsr

sgramk tgramk

tsrr

s t

ts

),(),()),(),,(( 22112211 tstststsK r

r

r

Definition

Equivalent formulation based on fact that summations are exchangable

Rr

tsrtsrk

tstsk

sgramk

sgramk

tgramk

tgramk

K

KtstsK

s

s

t

t

),(),(

)),(),,(()),(),,((

2211

2211

22

11

22

11)(

)(

)(

)(

2211

Page 121: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Similarity between Re-writings of K-gram Pairs

abbccbb1s

cbcbbcb1t

abcccdd2s

cbccdcd2t

two k-gram pairs

,1,,1,,0 20012001 rrr ,0,,1,,0 20012001 rrr

a b ? c c ? ? c b c ? ? c ?

200r 2001r

abbccbb1s

cbcbbcb1t

abcccdd2s

cbccdcd2t

a b ? c c ? ? c ? c b ? c ?

applicable to both pairs

not applicable to second pair

Only need to consider the rules applicable to both k-gram pairs

two k-gram pairs

Page 122: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Efficiently Calculating Number of Applicable Rules on Combined k-gram Pairs

abbccbb1s

cbcbbcb1t

abcccdd2s

cbccdcd2t

cbc??c?

ab?cc??

),)(cc,)(,)(,)(cc,)(bb,)(cc,(

),)(,)(cc,)(cc,)(,)(bb,)(aa,(

dbdbcb

dbdbcb

t

s

?)cc,??()cc,)(bb,)(cc,(

??)cc,)(cc,?()bb,)(aa,(

applicable rule applicable rule

combined k-gram pairs two k-gram pairs

identical doubles can be either or not substituted non-identical doubles must be substituted

)661)(!2)()(1)(1( 42422 kK

Page 123: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Time Complexity of Calculation of kb-SRK

• Inner loop

– Time complexity

• Outer loop

– Empirical time complexity

123

)(kO

)( 2 klO

|)||,||,||,max(| 2211 tstsl

Page 124: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

4. Applications to Social Media

Page 125: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

4.1 Tweet Recommendation

Page 126: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Tweet Recommendation

126

1

1

1 1

1

u1

um

i1 in i2 i3

u2

u3

tweets

users

Page 127: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

SVD Feature: Feature-based Matrix Factorization

(Chen et al., 2012)

• Model

• Loss function: squared loss

• Algorithm: Gradient descent

127

iLuLilulgliur iugug ,,,,),(

Page 128: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Matching between User and Item

128

ui

g

uLu iLi

Matching in latent space

global feature vector

user vector item vector

Page 129: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

4.2 Short Text Conversation

Page 130: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Short Text Conversation

One Step toward Turing Test Important subtask of dialogue

Human: a message Computer: a reply

Page 131: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Massive Amount of Data Available

Our paper entitled learning to rank has been accepted by ACL .

Congratulations! It is a great achievement

We are lucky. Our paper has been accepted by SIGIR this year. We are going to present it.

Great news! Please accept my congrats!

The PC of WSDM noticed us that our paper has been accepted.

Awesome! It is a great achievement

…..

…..

Page 132: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

System of Retrieval-based Short Text Conversation

Given post, find most suitable response

Large repository of post-response pairs

Take it as search problem

index of post and response

matching

ranking

short text

retrieval

retrieved posts and responses

ranked responses

learning to match

learning to rank

online

offline

best response

matched responses

Page 133: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Learning to Match for Short Text Conversation

Learning System

Matching System

Model

t1 c1

t2

tn

c2

cn

c(n+1) t(n+1)

……

……

Page 134: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Linear Matching Model (RMLS) (Wang et al. EMNLP 2013)

• Short text = sum of word vectors

• Matching degree between short texts = Inner product of their document vectors

• Word vectors are learned from short text data

Page 135: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Deep Matching Model (Lu & Li, NIPS 2013)

• Take interactions between texts as input

• Learning features in different granularities

using LDA

• Learning parameters using back-propagation

Page 136: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Representing Posts and Comments as Bags of Words

Mats catch mice Great mats

Mats chase mice Poor mice

P_mats, P_catch, P_mice, C_great, C_mats

P_mats, P_chase, P_mice, C_poor, C_mice

….

….

….

Post Comment Bag of words

Words in post and comment are viewed as different words

Page 137: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Constructing Topics Using Latent Dirichlet Allocation

P_mats, P_catch, P_mice, C_great, C_mats

P_mats, P_chase, P_mice, C_poor, C_mice

Bag of words

….

….

P_mats P_mice ….

C_mice C_mats

P_chase P_catch ….

C_great C_poor ….

Topic

Topics in different granularities form different hidden layers

Page 138: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

matching degree

Architecture of Deep Matching Model

(Q,A) interaction space

局部匹配模型

烤鸭啊,想吃热乎的去烤鸭店,如全聚德,真空包装的超市就有

北京有什么出名的特产?

0 0 1 0 1 … 0 0 0 1 0

0

0

1

0

1

0

0

0

1

0

Examples Local Model 1: (特产, 土产, 味道, …) || (豆腐, 烤鸭, 甜, 野味, 糯米…) Local Model 2: (路程, 安排, 地点, …) || (距离, 安全, 隧道, 高速, 机票…)

multiple layers

Page 139: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

5. Summary

139

Page 140: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Summary

• Learning to rank and learning to match are key technologies for many applications including search and recommendation

• Learning to Rank – Ranking SVM – IR SVM – AdaRank

• Learning to Match – Regularized Matching in Semantic Space – String Re-writing Kernel

• Many applications on social media can be constructed with learning to rank and learning to match techniques

140

Page 141: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

References • Hang Li, Learning to Rank for Information Retrieval and Natural Language Processing,

Synthesis Lectures on Human Language Technology, Lecture 12, Morgan & Claypool Publishers, 2011.

• Hang Li, A Short Introduction to Learning to Rank, IEICE Transactions on Information and Systems, E94-D(10), 2011.

• Jun Xu and Hang Li. AdaRank: A Boosting Algorithm for Information Retrieval. In Proceedings of the 30th Annual International ACM SIGIR Conference (SIGIR’07), 391-398, 2007.

• Yunbo Cao, Jun Xu, Tie-Yan Liu, Hang Li, Yalou Huang, Hsiao-Wuen Hon. Adapting Ranking SVM to Document Retrieval. In Proceedings of the 29th Annual International ACM SIGIR Conference (SIGIR’06), 186-193, 2006.

• Hang Li, Jun Xu, Semantic Matching in Search, Foundations and Trends in Information Retrieval, Now Publishers, 2014, ISBN 978-1-60198-804-1.

• Wei Wu, Zhengdong Lu, Hang Li, Learning Bilinear Model for Matching Queries and Documents, Journal of Machine Learning Research (JMLR), 14: 2519-2548, 2013.

• Fan Bu, Hang Li, Xiaoyan Zhu, String Re-Writing Kernel, In Proceedings of the 50th Annual Meeting of Association for Computational Linguistics (ACL’12), 449-458, 2012.

• Zhengdong Lu, Hang Li, In Advances in Neural Information Processing Systems (NIPS 2013), 1367-1375.

141

Page 142: Learning to Rank - Hang Li · 2014-08-16 · •Pairwise approach and listwise approach perform better than pointwise approach •LabmdaMART performs best in Yahoo Learning to rank

Thank You!

[email protected]

142