TRANSCRIPT
Machine Learning and Applications to Social Media
Hang Li
Huawei Technologies
CCF ADL 2014 Beijing
Aug. 10, 2014
Outline
1. Introduction
2. Learning to Rank
3. Learning to Match
4. Applications to Social Media
5. Summary
Two Challenges: Information Overload and Information Shortage
Video: Weibo Robot
Main Features of Weibo Robot V1
• Features Developed
– Following People
– Re-Tweeting (Forwarding Tweets)
– Generating Short Comments
• Features To Be Developed
– Generating Original Tweets
– Generating Long Comments 小诺_Noah
2. Learning to Rank
2.1. Overview of Learning to Rank
Ranking Plays Key Role in Many Applications
Ranking Problem: Example = Document Retrieval
• Query: q
• Documents: D = {d_1, d_2, ..., d_N}
• Ranking model: f(q, d)
• Output: ranking of documents d_{q,1}, d_{q,2}, ..., d_{q,n_q}, sorted by f(q, d)
• Ranking based on relevance, importance, preference
Ranking Problem Example = Recommender System
Item1 Item2 Item3 ... ItemN
User1 5 4
User2 1 2 2
... ? ? ?
UserM 4 3
Ranking Problem Example = Machine Translation
• Source-language sentence: f
• Generative Model: produces ranked sentence candidates in target language e_1, e_2, ..., e_1000
• Re-Ranking Model: outputs re-ranked top sentence in target language ẽ
Ranking Problem Example = Meta Search
Learning to Rank
• Definition 1 (in broad sense)
Learning to rank = any machine learning technology for ranking problem
• Definition 2 (in narrow sense)
Learning to rank = machine learning technology for ranking creation and ranking aggregation
• This tutorial takes Definition 2
Taxonomy of Problems in Learning to Rank
Ranking Creation (with Local Ranking Model)
• Requests: S = {q_1, q_2, ..., q_i, ..., q_M}
• Objects: O = {o_1, o_2, ..., o_j, ..., o_N}
• For request q_i, the associated objects are O_i = {o_{i,1}, o_{i,2}, ..., o_{i,n_i}}
• Local ranking model: f(q, o)
• Output: o_{i,1}, o_{i,2}, ..., o_{i,n_i} ranked by f(q_i, o_{i,1}) ≥ f(q_i, o_{i,2}) ≥ ... ≥ f(q_i, o_{i,n_i})
Learning for Ranking Creation
• Creating a ranking list of offerings based on request and offerings
• Feature-based
• Usually local ranking model
• Usually supervised learning
Ranking Aggregation
Learning for Ranking Aggregation
• Aggregating a ranking list from multiple ranking lists of offerings
• Ranking based
• Usually global ranking model
• Both supervised and unsupervised learning
2.2. Problem and Approaches of Learning to Rank
Ranking Problem: Example = Document Search
• Query: q
• Documents: D = {d_1, d_2, ..., d_N}
• Ranking model: f(q, d)
• Output: ranking of documents d_{q,1}, d_{q,2}, ..., d_{q,n_q}, sorted by f(q, d)
• Ranking based on relevance, importance, preference
Traditional Approach = Probabilistic Model
• Query: q; documents: d_1, d_2, ..., d_N
• Relevance: r ∈ {0, 1}; probabilistic model: P(r | q, d)
• Ranking of documents: d_1 ~ P(r | q, d_1), d_2 ~ P(r | q, d_2), ..., d_n ~ P(r | q, d_n), sorted by probability of relevance
BM25 [Robertson & Walker 94]
• Query: q; documents: d_1, d_2, ..., d_N
• Ranking function:
  BM25(q, d) = Σ_{w ∈ q ∩ d} idf(w) · (k + 1) tf(w) / ( k ((1 − b) + b · dl / avgdl) + tf(w) )
  where tf(w) is the frequency of w in d, dl is the length of d, avgdl is the average document length, and k, b are parameters
• Ranking of documents: d_1 ~ P(r | q, d_1), ..., d_n ~ P(r | q, d_n), with r ∈ {0, 1}; the BM25 score serves as the ranking function
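As a quick illustration, the ranking function above can be sketched in a few lines; here idf is supplied as a given per-term weight, and the documents, weights, and parameter values are all illustrative.

```python
# A minimal sketch of BM25 scoring: term-frequency saturation with
# document-length normalization, weighted by a supplied idf value.
import math

def bm25(query, doc, idf, avgdl, k=1.2, b=0.75):
    dl = len(doc)
    score = 0.0
    for w in set(query):
        tf = doc.count(w)
        if tf == 0:
            continue
        score += idf.get(w, 0.0) * (k + 1) * tf / (k * ((1 - b) + b * dl / avgdl) + tf)
    return score

# toy collection: the document containing the rarer term should rank higher
docs = [["cat", "sat", "mat"], ["dog", "sat", "log", "dog"]]
avgdl = sum(len(d) for d in docs) / len(docs)
idf = {"cat": 1.0, "dog": 1.0, "sat": 0.1}
scores = [bm25(["cat", "sat"], d, idf, avgdl) for d in docs]
assert scores[0] > scores[1]
```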
PageRank [Page et al, 1999]
• P(d_i) = α Σ_{d_j ∈ M(d_i)} P(d_j) / L(d_j) + (1 − α) · (1 / n)
  where M(d_i) is the set of pages linking to d_i, L(d_j) is the number of outlinks of d_j, n is the total number of pages, and α is the damping factor
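The formula above can be computed by power iteration; the link graph below is illustrative.

```python
# A small sketch of PageRank by power iteration on the formula above.

def pagerank(links, alpha=0.85, iters=100):
    """links[j] = list of pages that page j links to."""
    n = len(links)
    p = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - alpha) / n] * n          # teleportation term (1 - alpha) / n
        for j, outs in enumerate(links):
            for i in outs:
                new[i] += alpha * p[j] / len(outs)  # d_j shares its score over its outlinks
        p = new
    return p

# page 0 is linked by pages 1 and 2; page 1 is linked by pages 0 and 2
links = [[1], [0], [0, 1]]
p = pagerank(links)
assert abs(sum(p) - 1.0) < 1e-6   # scores form a distribution
assert p[0] > p[2]                # page 0 receives the most links
```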
New Approach = Learning to Rank
• Training data: queries q_1, ..., q_m; for each query q_i, its documents d_{i,1}, d_{i,2}, ..., d_{i,n_i}, drawn from the collection D = {d_1, d_2, ..., d_N}
• Learning System: learns a ranking model f(q, d) from the training data
• Ranking System: for a new query q_{m+1}, ranks its documents d_{m+1,1}, d_{m+1,2}, ..., d_{m+1,n_{m+1}} by f(q_{m+1}, d_{m+1,1}), f(q_{m+1}, d_{m+1,2}), ..., f(q_{m+1}, d_{m+1,n_{m+1}})
Training Process
1. Data Labeling (rank): for each training query q_i (i = 1, ..., m), its documents d_{i,1}, d_{i,2}, ..., d_{i,n_i} are labeled with grades y_{i,1}, y_{i,2}, ..., y_{i,n_i}
2. Feature Extraction: each labeled query-document pair becomes a labeled feature vector, giving (x_{i,1}, y_{i,1}), (x_{i,2}, y_{i,2}), ..., (x_{i,n_i}, y_{i,n_i})
3. Learning: a ranking model f(x) is learned from the labeled feature vectors
Testing Process
1. Data Labeling (rank): the documents d_{m+1,1}, ..., d_{m+1,n_{m+1}} of test query q_{m+1} are labeled with grades y_{m+1,1}, ..., y_{m+1,n_{m+1}}
2. Feature Extraction: the query-document pairs become feature vectors x_{m+1,1}, ..., x_{m+1,n_{m+1}}
3. Ranking with f(x): the documents are ranked by f(x_{m+1,1}), f(x_{m+1,2}), ..., f(x_{m+1,n_{m+1}})
4. Evaluation: the predicted ranking is compared against the labels to produce the evaluation result
Notes
• Features are functions of query and document
• Query and associated documents form a group
• Groups are i.i.d. data
• Feature vectors within group are not i.i.d. data
• Ranking model is function of features
• Several data labeling methods (here labeling of grade)
Issues in Learning to Rank
• Data Labeling
• Feature Extraction
• Evaluation Measure
• Learning Method (Model, Loss Function, Algorithm)
Data Labeling Problem
• E.g., relevance of documents (Doc A, Doc B, Doc C) w.r.t. the query
Data Labeling Methods
• Labeling of Grades
– Multiple levels (e.g., relevant, partially relevant, irrelevant)
– Widely used in IR
• Labeling of Ordered Pairs
– Ordered pairs between documents (e.g. A>B, B>C)
– Implicit relevance judgment: derived from click-through data
• Creation of List
– List (or permutation) of documents is given
– Ideal but difficult to implement
Implicit Relevance Judgment
• Ranking of documents at search system: Doc A, Doc B, Doc C
• Users often clicked on Doc B
• Derived ordered pair: B > A
Feature Extraction
• For the query and each of Doc A, Doc B, Doc C, compute query-document features (e.g., BM25) and document features (e.g., PageRank)
• The features of each query-document pair form a feature vector
Example Features
Evaluation Measures
• Important to rank top results correctly
• Measures:
  – NDCG (Normalized Discounted Cumulative Gain)
  – MAP (Mean Average Precision)
  – MRR (Mean Reciprocal Rank)
  – WTA (Winners Take All)
  – Kendall’s Tau
NDCG
• Evaluating ranking using labeled grades
• NDCG at position j:
  NDCG(j) = n_j Σ_{i=1}^{j} (2^{r(i)} − 1) / log2(1 + i)
  where r(i) is the grade at position i and n_j is the normalizing factor, chosen so that the perfect ranking has NDCG(j) = 1
NDCG (cont’)
• Example: perfect ranking, with grades r ∈ {3, 2, 1}
  – (3, 3, 2, 2, 1, 1, 1) grades
  – (7, 7, 3, 3, 1, 1, 1) gain: 2^{r(j)} − 1
  – (1, 0.63, 0.5, 0.43, 0.39, 0.36, 0.33) position discount: 1 / log2(1 + j)
  – (7, 11.41, 12.91, …) DCG
  – (1/7, 1/11.41, 1/12.91, …) normalizing factor n_j
  – (1, 1, 1, 1, 1, 1, 1) NDCG for perfect ranking: n_j Σ_{i=1}^{j} (2^{r(i)} − 1) / log2(1 + i)
NDCG (cont’)
• Example: imperfect ranking
– (2, 3, 2, 3, 1, 1, 1)
– (3, 7, 3, 7, 1, 1, 1) Gain
– (1, 0.63, 0.5, 0.43, 0.39, 0.36, 0.33) Position discount
– (3, 7.41, 8.91, … ) DCG
– (1/7, 1/11.41, 1/12.91, …) normalizing factor
– (0.43, 0.65, 0.69, ….) NDCG
• Imperfect ranking decreases NDCG
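The perfect and imperfect rankings above can be checked with a short sketch of the NDCG formula (base-2 logarithm, gain 2^r − 1, discount 1/log2(1 + position)):

```python
# Sketch of the NDCG computation from the worked example above.
import math

def dcg(grades, j):
    # DCG at position j: sum of gain (2^r - 1) times discount 1/log2(1 + i)
    return sum((2 ** r - 1) / math.log2(1 + i)
               for i, r in enumerate(grades[:j], start=1))

def ndcg(grades, ideal, j):
    # normalize by the DCG of the perfect ranking at the same position
    return dcg(grades, j) / dcg(ideal, j)

ideal = [3, 3, 2, 2, 1, 1, 1]
imperfect = [2, 3, 2, 3, 1, 1, 1]
assert abs(ndcg(ideal, ideal, 7) - 1.0) < 1e-9
assert round(ndcg(imperfect, ideal, 1), 2) == 0.43
assert round(ndcg(imperfect, ideal, 3), 2) == 0.69
```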
Relations with Other Learning Tasks
• vs Classification: no need to predict category
• vs Regression: no need to predict value of f(q, d)
• vs Ordinal regression: relative ranking order is more important
• Learning to rank can be approximated by classification, regression, ordinal regression
Ordinal Regression (Ordinal Classification)
• Categories are ordered
– 5, 4, 3, 2, 1
– e.g., rating restaurants
• Prediction
– Map to ordered categories
Three Major Approaches
• Pointwise approach
• Pairwise approach
• Listwise approach

Categorization of Learning to Rank Methods
• Methods in each approach can be SVM based, Boosting based, Neural Network based, or others
Pointwise Approach
• Transforming ranking to regression, classification, or ordinal classification
• Query-document group structure is ignored
Pairwise Approach
• Transforming ranking to pairwise classification
• Query-document group structure is ignored
Listwise Approach
• List as instance
• Query-document group structure is used
• Straightforwardly represents learning to rank problem
Evaluation Results
• Pairwise approach and listwise approach perform better than pointwise approach
• LambdaMART performed best in the Yahoo! Learning to Rank Challenge
• No significant difference between pairwise and listwise methods
2.3. Methods of Learning to Rank
Ranking SVM
Pairwise Classification
• Converting document list to document pairs
• Example: for a query with Doc A (grade 3), Doc B (grade 2), Doc C (grade 1), the pairs are A > B, A > C, and B > C
Transforming Ranking to Pairwise Classification
• Input space: X
• Ranking function: f: X → R
• Ranking: x_i ≻ x_j ⇔ f(x_i; w) > f(x_j; w)
• Linear ranking function: f(x; w) = ⟨w, x⟩
• Transforming to pairwise classification: x_i ≻ x_j ⇔ ⟨w, x_i − x_j⟩ > 0
• Classification example: (z, y) = (x_i − x_j, +1) if x_i ≻ x_j, and (x_i − x_j, −1) if x_j ≻ x_i
Ranking Problem
• Feature vectors x_1 (grade 3), x_2 (grade 2), x_3 (grade 1)
• Ranking by projection onto weight vector w
Transformed Pairwise Classification Problem
• Positive examples (+1): x_1 − x_3, x_2 − x_3, x_1 − x_2
• Negative examples (−1, redundant): x_3 − x_1, x_3 − x_2, x_2 − x_1
• Classifier: f(x; w), a separating hyperplane through the origin
Ranking SVM (Herbrich et al., 1999)
• Pairwise classification on differences of feature vectors
• Corresponding positive and negative examples
• Negative examples are redundant and can be discarded
• Hyperplane passes through the origin
• Soft margin and kernel can be used
• Ranking SVM = pairwise classification SVM
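A minimal sketch of the idea: build the positive difference examples and minimize the hinge loss by plain subgradient descent (a stand-in for the QP solver of an actual SVM); the feature vectors are illustrative.

```python
# Sketch of the pairwise transformation behind Ranking SVM.

def make_pairs(vectors, grades):
    # create (x_i - x_j, +1) for every pair with grade_i > grade_j;
    # negative examples are redundant and omitted
    pairs = []
    for xi, gi in zip(vectors, grades):
        for xj, gj in zip(vectors, grades):
            if gi > gj:
                pairs.append(([a - b for a, b in zip(xi, xj)], 1))
    return pairs

def train(pairs, dim, epochs=100, lr=0.1):
    # subgradient descent on the hinge loss sum_i [1 - y_i <w, z_i>]_+
    w = [0.0] * dim
    for _ in range(epochs):
        for z, y in pairs:
            if y * sum(wk * zk for wk, zk in zip(w, z)) < 1:
                w = [wk + lr * y * zk for wk, zk in zip(w, z)]
    return w

X = [[3.0, 0.1], [2.0, 0.3], [1.0, 0.2]]   # toy feature vectors
grades = [3, 2, 1]
w = train(make_pairs(X, grades), dim=2)
scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in X]
assert scores[0] > scores[1] > scores[2]   # learned order matches grades
```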
Learning of Ranking SVM
• Hinge loss form:
  min_w Σ_{i=1}^{l} [1 − y_i ⟨w, x_i^(1) − x_i^(2)⟩]_+ + λ ||w||²
  where [s]_+ = max(0, s)
• Equivalent QP form:
  min_{w,ξ} (1/2) ||w||² + C Σ_{i=1}^{N} ξ_i
  s.t. y_i ⟨w, x_i^(1) − x_i^(2)⟩ ≥ 1 − ξ_i, ξ_i ≥ 0, i = 1, ..., N
IR SVM
Cost-sensitive Pairwise Classification
• Converting document list to document pairs, as in Ranking SVM
• Example: Doc A (grade 3), Doc B (grade 2), Doc C (grade 1)
• Pairs separating the top document (A > B, A > C) are critical; the pair B > C is not critical
Problems with Ranking SVM
• Not sufficient emphasis on correct ranking at the top (grades: 3, 2, 1):
  ranking 1: 2 3 2 1 1 1 1
  ranking 2: 3 2 1 2 1 1 1
  Ranking 2 should be better than ranking 1, but Ranking SVM views them as the same
• Numbers of pairs vary according to queries:
  q1: 3 2 2 1 1 1 1, number of pairs: 2 + 4 + 8 = 14
  q2: 3 3 2 2 2 1 1 1 1 1, number of pairs: 6 + 10 + 15 = 31
  Ranking SVM is biased toward q2
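The pair counts above follow from counting, for each query, every document pair whose grades differ; a quick check:

```python
# Each pair of documents with different grades yields one ordered training
# pair, so the count is a sum over grade pairs of frequency products.
from collections import Counter
from itertools import combinations

def num_pairs(grades):
    counts = Counter(grades)
    return sum(counts[a] * counts[b] for a, b in combinations(counts, 2))

assert num_pairs([3, 2, 2, 1, 1, 1, 1]) == 14           # q1
assert num_pairs([3, 3, 2, 2, 2, 1, 1, 1, 1, 1]) == 31  # q2
```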
IR SVM (Cao et al., 2006)
• Solving the two problems of Ranking SVM
• Higher weight τ_{k(i)} on important grade pairs, where k(i) is the grade-pair type of pair i
• Normalization weight μ_{q(i)} on pairs within query q(i)
• IR SVM = Ranking SVM using modified hinge loss
Modified Hinge Loss Function
  min_w Σ_{i=1}^{l} τ_{k(i)} μ_{q(i)} [1 − y_i ⟨w, x_i^(1) − x_i^(2)⟩]_+ + λ ||w||²
  (Figure: hinge loss as a function of y f(x^(1) − x^(2)); the weights scale the slope of the loss for different pair types)
Learning of IR SVM
• Loss minimization form:
  min_w Σ_{i=1}^{l} τ_{k(i)} μ_{q(i)} [1 − y_i ⟨w, x_i^(1) − x_i^(2)⟩]_+ + λ ||w||²
• Equivalent QP form:
  min_{w,ξ} (1/2) ||w||² + Σ_{i=1}^{l} C_i ξ_i, with C_i = τ_{k(i)} μ_{q(i)} C
  s.t. y_i ⟨w, x_i^(1) − x_i^(2)⟩ ≥ 1 − ξ_i, ξ_i ≥ 0, i = 1, ..., l
AdaRank
Listwise Loss
• Training data grouped by query:
  q_1: (x_{1,1}, y_{1,1}), (x_{1,2}, y_{1,2}), ..., (x_{1,n_1}, y_{1,n_1})
  ...
  q_m: (x_{m,1}, y_{m,1}), (x_{m,2}, y_{m,2}), ..., (x_{m,n_m}, y_{m,n_m})
• The loss is defined on the entire ranked list of each query
AdaRank (Xu and Li, 2007)
• Optimizing exponential loss function
• Algorithm: AdaBoost-like algorithm for ranking
Loss Function of AdaRank
• Exponential loss defined on an evaluation measure E:
  L = Σ_{i=1}^{m} exp(−E(π(q_i), y_i))
• E can be any evaluation measure taking values in [−1, +1]
AdaRank Algorithm
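The algorithm itself is not transcribed here; the following is a hedged sketch of the AdaBoost-like loop described in Xu & Li (2007), with hypothetical single-feature weak rankers and a toy ±1 measure (all names and data illustrative).

```python
# AdaRank-style loop: keep a weight distribution P over queries, each round
# add the weak ranker with the best weighted performance, then re-weight
# queries by the exponential loss of the current ensemble.
import math

def adarank(queries, weak_rankers, evaluate, rounds=10):
    m = len(queries)
    P = [1.0 / m] * m
    ensemble = []                     # list of (alpha, ranker)
    for _ in range(rounds):
        h = max(weak_rankers,
                key=lambda r: sum(p * evaluate(r, q) for p, q in zip(P, queries)))
        perf = [evaluate(h, q) for q in queries]
        num = sum(p * (1 + e) for p, e in zip(P, perf))
        den = sum(p * (1 - e) for p, e in zip(P, perf))
        if den <= 0:                  # weak ranker is perfect on weighted data
            ensemble.append((1.0, h))
            break
        ensemble.append((0.5 * math.log(num / den), h))
        scores = [sum(a * evaluate(r, q) for a, r in ensemble) for q in queries]
        Z = sum(math.exp(-s) for s in scores)
        P = [math.exp(-s) / Z for s in scores]
    return ensemble

# toy usage: documents are (features, grade); weak ranker k sorts by feature k;
# the measure is +1 if the feature reproduces the grade order, else -1
queries = [
    [([1.0, 0.0], 2), ([0.5, 1.0], 1)],
    [([0.9, 0.2], 2), ([0.2, 0.8], 1)],
]

def eval_by_feature(k, q):
    ranked = sorted(q, key=lambda d: d[0][k], reverse=True)
    grades = [g for _, g in ranked]
    return 1.0 if grades == sorted(grades, reverse=True) else -1.0

ens = adarank(queries, weak_rankers=[0, 1], evaluate=eval_by_feature, rounds=3)
assert ens[0][1] == 0  # feature 0 ranks both toy queries perfectly
```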
3. Learning to Match
3-1. Semantic Matching in Search
Query Document Mismatch is Biggest Challenge in Search
Query Document Mismatch
• Same intent can be represented by different queries (representations)
• Search is still mainly based on term level matching
• Query document mismatch occurs when searcher and author use different representations
Examples of Query Document Mismatch

Query                              Document                    Term Matching   Semantic Matching
seattle best hotel                 seattle best hotels         no              yes
pool schedule                      swimmingpool schedule       no              yes
natural logarithm transformation   logarithm transformation    partial         yes
china kong                         china hong kong             partial         no
why are windows so expensive       why are macs so expensive   partial         no
Matching at Different Levels
• Term matching works at the form level; semantic matching works at the phrase, sense, topic, and structure levels
• Corresponding techniques: spelling error correction (form), phrase identification (phrase), similar query finding (sense), topic identification (topic), structure identification (structure)

Query Understanding
• Example query: "michael jordan berkele"
  – Term/form: michael jordan berkeley (after spelling error correction)
  – Phrase: michael jordan; berkeley
  – Sense: similar query: michael i. jordan
  – Topic: machine learning, berkeley
  – Structure: main phrase: michael jordan

Document Understanding
• Example document: "Homepage of Michael Jordan / Michael Jordan is Professor in the Department of Electrical Engineering ..."
  – Phrase: michael jordan, professor, department, electrical engineering
  – Key phrase: michael jordan, professor, electrical engineering
  – Topic: machine learning, berkeley
  – Structure: main phrase in title: michael jordan

Query Document Matching
• Query representation: query form: michael jordan berkeley; similar query: michael i jordan; main phrase: michael jordan; phrase: michael jordan, berkeley; topic: machine learning
• Document representation: document: michael jordan homepage; main phrase in title: michael jordan; key phrase: michael jordan, berkeley; phrase: michael jordan, professor, department of electrical engineering; topic: machine learning, berkeley
• Semantic matching between the two representations is used in relevance ranking
Matching in Different Ways
• No transformation: match q with d directly
• Query reformulation: transform q to q', match q' with d
• Document transformation: transform d to d', match q with d'
• Query and document transformation: map both q and d to a common representation c
Web Search System
• Offline: Crawling the Web → Document Understanding → Indexing → Index
• Online: User issues a query through the User Interface → Query Understanding → Retrieving from Index → Query Document Matching → Ranking
Machine Learning for Query Document Matching in Web Search
Learning for Matching between Query and Document
• Learning matching function f_M(q, d) or p_M(r | q, d)
• Using training data (q_1, d_1, r_1), ..., (q_N, d_N, r_N)
• q_1, ..., q_N and d_1, ..., d_N can be ids or feature vectors; r_1, ..., r_N can be binary or numerical values
• Using relations in data and/or prior knowledge
Challenges in Machine Learning for Matching
• How to leverage relations in data
• How to handle complicated data (e.g., long queries, questions)
• How to scale up
• How to leverage prior knowledge
• How to deal with tail
Long Tail Challenge
• Head pages have rich anchor texts and click data
• Tail queries and pages suffer more from mismatch
• Problem of propagating information and knowledge from head to tail
Relation between Matching and Ranking
• In traditional IR: ranking = matching, e.g.,
  f(q, d) = f_BM25(q, d) or f(q, d) = f_LMIR(q, d) = P(q | d)
• Web search:
  – Ranking and matching become separated
  – Learning to rank becomes state-of-the-art
  – Matching = feature learning for ranking, e.g., f(q, d) = f(f_BM25(q, d), g_PageRank(d), ...)
Matching vs Ranking

            Matching                                     Ranking
Prediction  Matching degree between query and document   Ranking list of documents
Model       f(q, d)                                      f(q, d_1), f(q, d_2), ..., f(q, d_n)
Challenge   Mismatch                                     Correct ranking on top

In search, first matching and then ranking
Matching Functions as Features in Learning to Rank
• Term level matching: f_BM25(q, d)
• Phrase level matching: f_P(q, d)
• Sense level matching: f_S(q, d)
• Topic level matching: f_T(q, d)
• Structure level matching: f_C(q, d)
• Term level matching after query transformation (spelling, stemming): f_BM25(q', d), where q → q'
Approaches to Learning for Matching Between Query and Document
• Matching by Query Reformulation
• Matching with Term Dependency Model
• Matching with Translation Model
• Matching with Topic Model
• Matching with Latent Space Model
3-2. Overview of Learning to Match
Matching between Heterogeneous Data is Everywhere
• Matching between user and product (collaborative filtering)
• Matching between text and image (image annotation)
• Matching between people (dating)
• Matching between languages (machine translation)
• Matching between receptor and ligand (drug design)
Formulation of Learning Problem
• Learning matching function f(x, y)
• Training data: (x_1, y_1, r_1), ..., (x_N, y_N, r_N)
• Generated according to x ~ P(X), y ~ P(Y | X), r ~ P(R | X, Y)

Formulation of Learning Problem (cont’)
• Loss function: L(r, f(x, y))
• Risk function: R(f) = ∫_{X×Y×R} L(r, f(x, y)) dP(x, y, r)
• Objective function in learning: min_{f ∈ F} Σ_{i=1}^{N} L(r_i, f(x_i, y_i)) + Ω(f)
Matching Problem: Instance Matching
• Instances x_1, ..., x_m and y_1, ..., y_n with observed matching degrees (e.g., 5, 4, 1, ...) on some pairs
• Can be represented as matching between nodes in a bipartite graph

Matching Problem: Feature Matching
• Instances are represented by features
• Can be represented as matching between objects in two spaces

Matching Problem: Structure Matching
• Instances have internal structures
• Can be represented as matching between structures in two spaces
3-3. Methods of Learning to Match
Regularized Matching in Latent Space (RMLS)
Matching in Latent Space (Wu et al., WSDM 2013, JMLR 2013)
• Motivation
  – Matching between query and document in latent space
• Assumption
  – Queries have similarity
  – Documents have similarity
  – Click-through data represent “similarity” relations between queries and documents
• Approach
  – Projection to latent space
  – Regularization or constraints
• Results
  – Significantly enhances accuracy of query document matching
Matching in Latent Space
• Queries q_1, q_2, ..., q_m in query space; documents d_1, d_2, ..., d_n in document space
• Mappings L_q and L_d project queries and documents into a common latent space, where matching is computed
• Matching between heterogeneous data in general; example: image annotation, projecting keywords (hook, fishing, singer, soldier, warrior, microphone) and images into latent space
IR Models as Similarity Functions (Xu and Li 2010)
• Queries q_1, ..., q_m and documents d_1, ..., d_n are mapped into the same new space of unigrams
• Conventional IR models (VSM, BM25, LM, MRF) are similarity functions in this space
• The mapping functions are diagonal matrices
Partial Least Square (PLS)
• Setting
  – Two spaces: q ∈ Q, d ∈ D
• Input
  – Training data: {(q_i, d_i, c_i)}, i = 1, ..., N
• Output
  – Matching function f(q, d)
• Assumption
  – Two linear (and orthonormal) transformations L_q, L_d
  – Dot product as similarity function: f(q, d) = ⟨L_q q, L_d d⟩
• Optimization
  (L_q, L_d) = argmax Σ_i c_i ⟨L_q q_i, L_d d_i⟩  s.t. L_q^T L_q = I, L_d^T L_d = I
Solution of Partial Least Square
• Non-convex optimization
• Global optimal solution exists
• Global optimum can be found by solving SVD (Singular Value Decomposition)
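A sketch of that SVD solution for latent dimension 1: power iteration recovers the leading singular vectors of the weighted cross-product matrix M = Σ_i c_i q_i d_i^T; the click data below is illustrative.

```python
# Power iteration for the leading left/right singular vectors of M,
# i.e., the rank-1 PLS solution.

def top_singular_vectors(M, iters=200):
    m, n = len(M), len(M[0])
    v = [1.0 / n ** 0.5] * n
    u = [0.0] * m
    for _ in range(iters):
        u = [sum(M[r][j] * v[j] for j in range(n)) for r in range(m)]   # M v
        nu = sum(x * x for x in u) ** 0.5
        u = [x / nu for x in u]
        v = [sum(M[r][j] * u[r] for r in range(m)) for j in range(n)]   # M^T u
        nv = sum(x * x for x in v) ** 0.5
        v = [x / nv for x in v]
    return u, v

# toy click data: one-hot queries and documents with click counts c_i
queries = [[1, 0], [0, 1]]
docs = [[1, 0, 0], [0, 1, 0]]
clicks = [3, 1]
M = [[sum(c * q[r] * d[j] for q, d, c in zip(queries, docs, clicks))
      for j in range(3)] for r in range(2)]
u, v = top_singular_vectors(M)
# the dominant direction follows the most-clicked (query, document) pair
assert abs(u[0]) > 0.99 and abs(v[0]) > 0.99
```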
Regularized Mapping to Latent Space (RMLS)
• Setting
  – Two spaces: q ∈ Q, d ∈ D
• Input
  – Training data: {(q_i, d_i, c_i)}, i = 1, ..., N
• Output
  – Matching function f(q, d)
• Assumption
  – Two linear (and sparse) transformations L_q, L_d
  – Dot product as similarity function: f(q, d) = ⟨L_q q, L_d d⟩
• Optimization
  (L_q, L_d) = argmax Σ_i c_i ⟨L_q q_i, L_d d_i⟩  s.t. |l_q| ≤ λ_q, |l_d| ≤ λ_d, ||l_q|| ≤ θ_q, ||l_d|| ≤ θ_d
  where l_q and l_d are row vectors of L_q and L_d
Solution of Regularized Mapping to Latent Space (RMLS)
• Coordinate descent; repeat:
  – Fix L_q, update L_d
  – Fix L_d, update L_q
• No guarantee to find global optimum
• Updates can be parallelized by rows
Comparison between PLS and RMLS

                     PLS                            RMLS
Assumption           Orthogonality                  L1 and L2 regularization
Optimization Method  Singular Value Decomposition   Coordinate Descent
Optimality           Global optimum                 Local optimum
Efficiency           Low                            High
Scalability          Low                            High
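To make the latent-space idea concrete, here is a minimal sketch of the matching function f(q, d) = ⟨L_q q, L_d d⟩ with hand-picked (not learned) projection matrices; all values are illustrative.

```python
# Two linear maps project query and document term vectors into a shared
# k-dimensional space; the matching score is the projections' dot product.

def project(L, x):
    # L: k x n matrix, x: n-vector -> k-vector
    return [sum(row[j] * x[j] for j in range(len(x))) for row in L]

def match(Lq, Ld, q, d):
    u, v = project(Lq, q), project(Ld, d)
    return sum(a * b for a, b in zip(u, v))

# toy example: 3-term query space, 4-term document space, k = 2
Lq = [[1.0, 0.0, 0.5],
      [0.0, 1.0, 0.0]]
Ld = [[0.8, 0.0, 0.0, 0.2],
      [0.0, 0.9, 0.1, 0.0]]
q = [1, 0, 1]       # query contains terms 1 and 3
d1 = [1, 0, 0, 1]   # document aligned with latent dimension 1, like q
d2 = [0, 1, 1, 0]   # document aligned with latent dimension 2
assert match(Lq, Ld, q, d1) > match(Lq, Ld, q, d2)
```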
String Re-writing Kernel
Learning for String Re-writing
• Training data:
  ((distance between sun and earth), (how far sun earth), +1)
  ((distance between beijing and shanghai), (how far is beijing from shanghai), +1)
  ...
  ((distance between moon and earth), (how far sun earth), -1)
• Learning System produces a Model; the Prediction System outputs +1/-1 for a new pair, e.g., ((distance between moon and earth), (how far is earth from moon))
String Re-writing Kernel (Bu et al., 2012)
• Training data: ((s_1, t_1), y_1), ..., ((s_n, t_n), y_n)
• Model: f(s, t) = sign( Σ_{i=1}^{n} α_i y_i K((s_i, t_i), (s, t)) + b )
• String Re-writing Kernel: K((s_i, t_i), (s, t))
Re-Writing Rule
• A rule is applied to re-writing of substrings
• Re-writing rule: "* wrote *" → "* was written by *"
• Applies to: "Shakespeare wrote Hamlet in 16 century" → "Hamlet was written by Shakespeare"
• Applies to: "Cao Xueqin wrote Dream of the Red Chamber" → "Dream of the Red Chamber was written by Cao Xueqin"
• Does not relate the pair ("Cao Xueqin wrote Dream of the Red Chamber", "Hamlet was written by Shakespeare"), since the wildcards cannot be bound consistently
Re-Writing Rule (2)
• Multiple re-writing rules may be applied to the same pair
• Example: "* wrote * in 16 century" → "* was written by *" also re-writes "Shakespeare wrote Hamlet in 16 century" into "Hamlet was written by Shakespeare"
Definition of String Re-writing Kernel (SRK)
• K((s_1, t_1), (s_2, t_2)) = ⟨Φ(s_1, t_1), Φ(s_2, t_2)⟩ = Σ_{r ∈ R} φ_r(s_1, t_1) φ_r(s_2, t_2)
• Φ(s, t) = (φ_r(s, t))_{r ∈ R}, where φ_r(s, t) weights the number of times rule r re-writes s into t, with weight λ ∈ (0, 1]
Similarity between Two Re-writings
• Similarity between two re-writings is measured with respect to re-writing rules, e.g., r_100: "* wrote *" → "* was written by *" and r_1001: "* wrote * in 16 century" → "* was written by *"
• (Shakespeare wrote Hamlet in 16 century, Hamlet was written by Shakespeare) matches both r_100 and r_1001; (Cao Xueqin wrote Dream of the Red Chamber, Dream of the Red Chamber was written by Cao Xueqin) matches only r_100
• The corresponding rule-indicator vectors, e.g., (0, ..., 1, ..., 1, ...) and (0, ..., 1, ..., 0, ...), determine the inner-product similarity
A Sub-Class of String Re-writing Kernel
• Advantage: matching between informally written sentences, such as long queries in search, can be effectively performed
• Challenges:
  – Number of re-writing rules is infinite
  – Number of applicable rules increases exponentially with sentence length
• Our approach: a sub-class, kb-SRK
Definition of kb-SRK
• Special class of SRK
• Re-writing rules in kb-SRK
– String patterns in rule are of length k
– Wildcard ? only substitutes a single character
– Alignment between string patterns is bijective
• Example rule: "? wrote ?" → "? described ?"
Reformulation of kb-SRK
• Definition:
  φ_r(s, t) = Σ_{α ∈ k-grams(s)} Σ_{β ∈ k-grams(t)} φ̄_r(α, β)
  where φ̄_r(α, β) is the weight λ if rule r matches the k-gram pair (α, β), and 0 otherwise
• Since the summations are exchangeable, the kernel has the equivalent formulation:
  K((s_1, t_1), (s_2, t_2)) = Σ_{α_1 ∈ k-grams(s_1)} Σ_{β_1 ∈ k-grams(t_1)} Σ_{α_2 ∈ k-grams(s_2)} Σ_{β_2 ∈ k-grams(t_2)} k̄((α_1, β_1), (α_2, β_2))
  where k̄((α_1, β_1), (α_2, β_2)) = Σ_{r ∈ R} φ̄_r(α_1, β_1) φ̄_r(α_2, β_2)
Similarity between Re-writings of k-gram Pairs
• Two k-gram pairs: (s_1, t_1) = (abbccbb, cbcbbcb) and (s_2, t_2) = (abcccdd, cbccdcd)
• Rule "a b ? c c ? ?" → "c b c ? ? c ?" is applicable to both pairs
• Rule "a b ? c c ? ?" → "c ? c b ? c ?" is not applicable to the second pair
• Only the rules applicable to both k-gram pairs need to be considered
Efficiently Calculating Number of Applicable Rules on Combined k-gram Pairs
• Combine the two k-gram pairs position by position into "doubles": for the source side, (a,a)(b,b)(b,c)(c,c)(c,c)(b,d)(b,d); for the target side, (c,c)(b,b)(c,c)(b,c)(b,d)(c,c)(b,d)
• A rule applicable to both pairs corresponds to choosing, per double, whether to substitute it by a wildcard: identical doubles (e.g., (a,a), (c,c)) can be either substituted or not; non-identical doubles (e.g., (b,c), (b,d)) must be substituted
• The number of applicable rules can therefore be computed in closed form from the counts of identical and non-identical doubles
Time Complexity of Calculation of kb-SRK
• Inner loop: time complexity polynomial in k
• Outer loop: empirical time complexity O(l² k), where l = max(|s_1|, |t_1|, |s_2|, |t_2|)
4. Applications to Social Media
4.1 Tweet Recommendation
Tweet Recommendation
• Users u_1, ..., u_m and tweets i_1, ..., i_n form a bipartite graph; observed entries (e.g., 1s) indicate which tweets users have retweeted
• Task: predict the missing entries, i.e., recommend tweets to users
SVD Feature: Feature-based Matrix Factorization (Chen et al., 2012)
• Model: r(u, i) = g(u, i) + ⟨L_u u, L_i i⟩, combining a global feature term and matching in latent space
• Loss function: squared loss
• Algorithm: gradient descent

Matching between User and Item
• g: global feature vector; u: user vector; i: item vector
• ⟨L_u u, L_i i⟩: matching in latent space
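A minimal sketch of the latent-matching part of such a model: matrix factorization trained by gradient descent on squared loss (biases and global features omitted; the data is illustrative).

```python
# Factor an observed user-item matrix into U V^T by stochastic gradient
# descent on squared loss with L2 regularization.
import random

def factorize(ratings, m, n, k=2, lr=0.05, reg=0.02, epochs=1000, seed=0):
    """ratings: list of (user, item, value). Returns U (m x k), V (n x k)."""
    rnd = random.Random(seed)
    U = [[rnd.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(m)]
    V = [[rnd.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(a * b for a, b in zip(U[u], V[i]))
            for f in range(k):  # gradient step on both factors
                uf, vf = U[u][f], V[i][f]
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V

# toy user-tweet matrix with a few observed entries
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 1), (1, 2, 2), (2, 1, 4), (2, 2, 3)]
U, V = factorize(ratings, m=3, n=3)
pred = sum(a * b for a, b in zip(U[0], V[0]))
assert abs(pred - 5) < 1.5  # fitted prediction for an observed entry
```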
4.2 Short Text Conversation
Short Text Conversation
• One step toward the Turing Test; an important subtask of dialogue
• Human: a message; Computer: a reply
• Massive amount of data available, e.g.:
  Post: Our paper entitled learning to rank has been accepted by ACL.
  Reply: Congratulations! It is a great achievement
  Post: We are lucky. Our paper has been accepted by SIGIR this year. We are going to present it.
  Reply: Great news! Please accept my congrats!
  Post: The PC of WSDM notified us that our paper has been accepted.
  Reply: Awesome! It is a great achievement
System of Retrieval-based Short Text Conversation
• Given a post, find the most suitable response; take it as a search problem
• Offline: build an index of posts and responses from a large repository of post-response pairs; train models by learning to match and learning to rank
• Online: short text retrieval from the index → retrieved posts and responses → matching → matched responses → ranking → ranked responses → best response
Learning to Match for Short Text Conversation
• Learning System: trained on text-response pairs (t_1, c_1), (t_2, c_2), ..., (t_n, c_n) to produce a matching Model
• Matching System: uses the model to score a new pair (t_{n+1}, c_{n+1})
Linear Matching Model (RMLS) (Wang et al. EMNLP 2013)
• Short text = sum of word vectors
• Matching degree between short texts = Inner product of their document vectors
• Word vectors are learned from short text data
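A small sketch of this linear model: each short text is the sum of its word vectors, and the matching degree is their inner product; the word vectors below are illustrative stand-ins for learned ones.

```python
# Sum-of-word-vectors text representation with inner-product matching.

def text_vector(words, word_vecs, dim=2):
    v = [0.0] * dim
    for w in words:
        for f, x in enumerate(word_vecs.get(w, [0.0] * dim)):
            v[f] += x
    return v

def match(post, response, word_vecs):
    p, r = text_vector(post, word_vecs), text_vector(response, word_vecs)
    return sum(a * b for a, b in zip(p, r))

# toy vectors: "paper/accepted/congrats" share a latent direction
word_vecs = {
    "paper": [1.0, 0.0], "accepted": [1.0, 0.2], "congrats": [0.9, 0.1],
    "rain": [0.0, 1.0], "umbrella": [0.1, 1.0],
}
post = ["paper", "accepted"]
assert match(post, ["congrats"], word_vecs) > match(post, ["umbrella"], word_vecs)
```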
Deep Matching Model (Lu & Li, NIPS 2013)
• Take interactions between texts as input
• Learning features in different granularities using LDA
• Learning parameters using back-propagation
Representing Posts and Comments as Bags of Words
• Post "Mats catch mice" with comment "Great mats" → bag of words: P_mats, P_catch, P_mice, C_great, C_mats
• Post "Mats chase mice" with comment "Poor mice" → bag of words: P_mats, P_chase, P_mice, C_poor, C_mice
• Words in post and comment are viewed as different words (prefixes P_ and C_)
Constructing Topics Using Latent Dirichlet Allocation
• Bags of words: (P_mats, P_catch, P_mice, C_great, C_mats), (P_mats, P_chase, P_mice, C_poor, C_mice), ...
• Learned topics, e.g., (P_mats, P_mice, ..., C_mice, C_mats) and (P_chase, P_catch, ..., C_great, C_poor, ...)
• Topics in different granularities form different hidden layers
Architecture of Deep Matching Model
• Input: the (Q, A) interaction space between the binary word vectors of question and answer, e.g.:
  Q: “What are the famous local specialties of Beijing?” (北京有什么出名的特产?)
  A: “Roast duck! If you want it hot, go to a roast duck restaurant such as Quanjude; vacuum-packed ones are available in supermarkets” (烤鸭啊,想吃热乎的去烤鸭店,如全聚德,真空包装的超市就有)
• Multiple layers of local matching models, e.g.:
  Local Model 1: (specialty, local product, taste, ...) || (tofu, roast duck, sweet, game, glutinous rice, ...)
  Local Model 2: (journey, arrangement, location, ...) || (distance, safety, tunnel, highway, air ticket, ...)
• The top layer outputs the matching degree
5. Summary
Summary
• Learning to rank and learning to match are key technologies for many applications including search and recommendation
• Learning to Rank: Ranking SVM, IR SVM, AdaRank
• Learning to Match: Regularized Matching in Latent Space, String Re-writing Kernel
• Many applications on social media can be constructed with learning to rank and learning to match techniques
References
• Hang Li. Learning to Rank for Information Retrieval and Natural Language Processing. Synthesis Lectures on Human Language Technology, Lecture 12, Morgan & Claypool Publishers, 2011.
• Hang Li. A Short Introduction to Learning to Rank. IEICE Transactions on Information and Systems, E94-D(10), 2011.
• Jun Xu and Hang Li. AdaRank: A Boosting Algorithm for Information Retrieval. In Proceedings of the 30th Annual International ACM SIGIR Conference (SIGIR’07), 391-398, 2007.
• Yunbo Cao, Jun Xu, Tie-Yan Liu, Hang Li, Yalou Huang, Hsiao-Wuen Hon. Adapting Ranking SVM to Document Retrieval. In Proceedings of the 29th Annual International ACM SIGIR Conference (SIGIR’06), 186-193, 2006.
• Hang Li, Jun Xu. Semantic Matching in Search. Foundations and Trends in Information Retrieval, Now Publishers, 2014, ISBN 978-1-60198-804-1.
• Wei Wu, Zhengdong Lu, Hang Li. Learning Bilinear Model for Matching Queries and Documents. Journal of Machine Learning Research (JMLR), 14: 2519-2548, 2013.
• Fan Bu, Hang Li, Xiaoyan Zhu. String Re-Writing Kernel. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL’12), 449-458, 2012.
• Zhengdong Lu, Hang Li. A Deep Architecture for Matching Short Texts. In Advances in Neural Information Processing Systems 26 (NIPS 2013), 1367-1375.