who will follow you back? reciprocal relationship prediction* 1 john hopcroft, 2 tiancheng lou, 3...
Post on 15-Jan-2016
217 views
TRANSCRIPT
![Page 1: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/1.jpg)
Who Will Follow You Back? Reciprocal Relationship Prediction*
1John Hopcroft, 2Tiancheng Lou, 3Jie Tang1Department of Computer Science, Cornell University,2Institute for Interdisciplinary Information Sciences, Tsinghua University3Department of Computer Science, Tsinghua University
![Page 2: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/2.jpg)
Motivation
Two kinds of relationships in social network, one-way(called parasocial) relationship and, two-way(called reciprocal) relationship
Two-way(reciprocal) relationship usually developed from a one-way relationship more trustful.
Try to understand(predict) the formation of two-way relationships micro-level dynamics of the social network. underlying community structure? how users influence each other?
v1
v2v3
v4
v6
v5
v1
v2v3
v4
v6
v5
after 3 days prediction
reciprocalparasocial
![Page 3: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/3.jpg)
100%
30%
1%
60%
Example : real friend relationship
On Twitter : Who Will Follow You Back?
Ladygaga
?
?
?
?
Shiteng
Obama Huwei
JimmyQiao
![Page 4: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/4.jpg)
y1
y2
y3
y4
y1= 1
y2= ?
y4= 0
y3= ?
(v1, v5)
(v1, v2)
(v2, v4)
(v3, v4)
Several key challenges
How to model the formation of two-way relationships? SVM & LRC
How to combine many social theories into the prediction model?
v1
v2v3
v4
v6
v5
y1
y2
y3
y4
y1= 1
y2= ?
y4= 0
y3= ?
(v1, v5)
(v1, v2)
(v2, v4)
(v3, v4)
y1
y2
y3
y4
y1= 1
y2= ?
y4= 0
y3= ?
(v1, v5)
(v1, v2)
(v2, v4)
(v3, v4)
![Page 5: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/5.jpg)
Outline
Previous works Our approach Experimental results Conclusion & future works
![Page 6: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/6.jpg)
Link prediction
Unsupervised link prediction Scores & intution, such as preferential attachment [N01].
Supervised link prediction supervised random walks [BL11]. logistic regression model to predict positive and negative links [L10].
Main differences: We predict a directed link instead of only handles undirected social networks. Our model is dynamic and learned from the evolution of the Twitter network.
![Page 7: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/7.jpg)
Social behavior analysis
Existing works on social behavior analysis: The difference of the social influence on difference topics and to model the topic-
level social influence in social networks. [T09] How social actions evolve in a dynamic social network? [T10]
Main differences: The proposed methods in previous work can be used here but the problem is fundamentally different.
![Page 8: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/8.jpg)
Twitter study
The twitter network. The topological and geographical properties. [J07] Twittersphere and some notable properties, such as a non-power-law follower dis
tribution, and low reciprocity. [K10] The twitter users.
Influential users. Tweeting behaviors of users.
The tweets. Utilize the real-time nature to detect a target event. [S10] TwitterMonitor, to detect emerging topics. [M10]
![Page 9: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/9.jpg)
Outline
Previous works Our approach Experimental results Conclusion & future works
![Page 10: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/10.jpg)
Factor graph model
Problem definition Given a network at time t, i.e., Gt = (Vt, Et, Xt, Yt) Variables y are partially labeled. Goal : infer unknown variables.
Factor graph model P(Y | X, G) = P(X, G|Y) P(Y) / P(X, G) = C0 P(X | Y) P(Y | G) In P(X | Y), assuming that the generative probability is conditionally independent, P(Y | X, G) = C0P(Y | G)ΠP(xi|yi) How to deal with P(Y | G) ?
![Page 11: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/11.jpg)
Random Field
aYbY
cY
eY
fY
dY
Given an undirected graph G=(V, E) such that Y={Yv|v∈V}, if
the probability of Yv given X and those random variables corresponding to nodes neighboring v in G. Then (X, Y) is a conditional random field.
undirected graphical model globally conditioned on X
![Page 12: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/12.jpg)
Conditional random field(CRF)
Model them in a Markov random field, by the Hammersley-Clifford theorem, P(xi|yi) = 1/Z1 * exp {Σαj fj (xij, yi)}
P(Y|G) = 1/Z2 * exp {ΣcΣkμkhk(Yc)}
Z1 and Z2 are normalization factors.
So, the likelihood probability is : P(Y | X, G) = 1/Z0 * exp {Σαj fj (xij, yi) + ΣcΣkμkhk(Yc)}
Z0 is normalization factors.
![Page 13: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/13.jpg)
Maximize likelihood
Objective function O(θ) = log Pθ(Y | X, G) = ΣiΣjαj fj (xij, yi) + ΣΣμk hk(Yc) – log Z
Learning the model to estimate a parameter configuration θ= {α, μ} to maximize the objective functio
n : that is, the goal is to compute θ* = argmax O(θ)
![Page 14: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/14.jpg)
Learning algorithm
Goal : θ* = argmax O(θ) Newton methold. The gradient of each μk with r
egard to the objective function. dθ/ dμk= E[hk(Yc)] – EPμk(Yc|X, G)
[hk(Yc)]
( ) '( ) ( )
( )
( ) ( )
( ) ( )( ) '
( ) ( )
( )( )
( )
( )
( ( ))( , )
( )
( ) exp ( , )
exp( ( , )) ( , )( ( ))
( ) exp ( , )
exp( ( , )) ( , )
exp ( , )
( |
kk k
j kkj
k k
y
k kjk
y
k k
y
kk
jky
y
k
L Z xF y x
Z x
Z x F y x
F y x F y xZ x
Z x F y x
F y xF y x
F y x
p y x
( )
( )
( )( | )
) ( , )
( , )k
kj
y
kjp Y x
F y x
E F Y x
![Page 15: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/15.jpg)
Learning algorithm
One challenge : how to calculate the marginal distribution Pμk(yc|x, G). Intractable to directly calculate the marginal distribution. Use approximation algorithm : Belief propagation.
![Page 16: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/16.jpg)
Belief propagation
Sum-product The algorithm works by passing real values functions called “messages” along
the edges between the nodes. Extend edges to triangles
![Page 17: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/17.jpg)
Belief propagation
A “message” from a variable node v to a factor node u is μv->u(xv) = Πu*∈N(v)\{u}μu*->v(xv)
A “message” from a factor node u to a variable node v is μu->v(xu) = Σx’ux’v=xvΠu*∈N(u)\{v}μv*->u(x’u)
![Page 18: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/18.jpg)
Learning algorithm(TriFG model)
Input : network Gt, learning rateηOutput : estimated parametersθ
Initalize θ= 0;Repeat
Perform LBP to calculate marginal distribution of unknown variables P(y i|xi, G);Perform LBP to calculate marginal distribution of triad c, i.e. P(yc|Xc, G);Calculate the gradient of μk according to :
dθ/ dμk= E[hk(Yc)] – EPμk(Yc|X, G)[hk(Yc)]Update parameter θ with the learning rate η:
θ new = θold + ηd θ
Until Convergence;
![Page 19: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/19.jpg)
LocalGlobal
Prediction features
Geographic distance Global vs Local
Homophily Link homophily Status homophily
Implicit structure Retweet or reply Retweeting seems to be m
ore helpful Structural balance
Two-way relationships are balanced (88%),
But, one-way relationships are not (only 29%).
Users who share common links will have a tendency to follow each other.
Elite users have a much stronger tendency to follow each other
(A) and (B) are balanced, but (C) and (D) are not.
![Page 20: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/20.jpg)
Our approach : TriFG
TriFG model Features based on observations Partially labeled Conditional random field Triad correlation factors
![Page 21: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/21.jpg)
Outline
Previous works Our approach Experimental results Conclusion & future works
![Page 22: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/22.jpg)
Data collection
Huge sub-network of twitter 13,442,659 users and 56,893,234 following links. Extracted 35,746,366 tweets.
Dynamic networks With an average of 728,509 new links per day. Averagely 3,337 new follow-back links per day. 13 time stamps by viewing every four days as a time stamp
![Page 23: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/23.jpg)
Prediction performance
Baseline algorithms SVM & LRC & CRF
Accurately infer 90% of reciprocal relationships in twitter.
Data Algotithm Precision Recall F1Measure Accuracy
Test
Case
1
SVM 0.6908 0.6129 0.6495 0.9590
LRC 0.6957 0.2581 0.3765 0.9510
CRF 1.0000 0.6290 0.7723 0.9770
TriFG 1.0000 0.8548 0.9217 0.9910
Test
Case
2
SVM 0.7323 0.6212 0.6722 0.9534
LRC 0.8333 0.3030 0.4444 0.9417
CRF 1.0000 0.6333 0.7755 0.9717
TriFG 1.0000 0.8788 0.9355 0.9907
![Page 24: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/24.jpg)
Convergence property
BLP-based learning algorithm can converge in less than 10 iterations. After only 7 learning iterations, the prediction performance of TriFG become
s stable.
![Page 25: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/25.jpg)
Effect of Time Span
Distribution of time span in which follow back action is performed. 60% of follow-backs are performed in the next time stamp, 37% of follow-backs would be still performed in the following 3 time stamps.
How different settings of the time span. The prediction performance of TriFG drops sharply when setting the time span as
two or less. The performance is acceptable, when setting it as three time stamps.
![Page 26: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/26.jpg)
Factor contribution analysis
We consider five different factor functions Geographic distance(G) Link homophily(L) Status homophily(S) Implicit network correlation(I) Structural balance correlation(B)
![Page 27: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/27.jpg)
Outline
Previous works Our approach Experimental results Conclusion & future works
![Page 28: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/28.jpg)
Conclusion
Reciprocal relationship prediction in social network Incorporates social theories into prediction model. Several interesting phenomena.
Elite users tend to follow each other. Two-way relationships on Twitter are balanced, but one-way relationships are
not. Social networks are going global, but also stay local.
![Page 29: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/29.jpg)
Future works
Other social theories for reciprocal relationship prediction. User feedback. Incorporating user interactions. Building a theory for different kinds of networks.
![Page 30: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/30.jpg)
Thanks! Q & A
![Page 31: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/31.jpg)
Reference
[BL11] L.Backstrom and J.Leskovec. Supervised random walks : predicting and recommending links in social networks. In WSDM’11
[C10] D.J.Crandall, L.Backstrom, D. Cosley, S.Suri, D.Huttenlocher, and J. Kleinberg. Inferring social ties from geographic coincidences. PNAS, Dec. 2010
[W10] C.Wang, J. Han, Y.Jia, J.Tang, D.Zhang, Y. Yu and J.Guo. Mining advisor-advisee relationships from research publication networks. In KDD’10.
[N01]M.E.J. Newman. Clustering and preferential attachment in growing networks. Phys. Rev. E, 2001
[L10] J.Leskovec, D.Huttenlocher, and J.Kleinberg. Predicting positive and negative links in online social networks. In WWW10.
[T10] C.Tan, J. Tang, J. Sun, Q.Lin, and F.Wang. Social action tracking via noise tolerant time-varying factor graphs. In KDD10
[T09] J. Tang, J. Sun, C. Wang, and Z. Yang. Social influence analysis in large-scale networks. In KDD09.
![Page 32: Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,](https://reader034.vdocuments.mx/reader034/viewer/2022051417/56649d545503460f94a3163b/html5/thumbnails/32.jpg)
Reference
[J07]A. Java, X.Song, T.Finin, and B.L. Tseng. Why we twitter : An analysis of a microblogging community. In KDD2007.
[K10]H. Kwak, C.Lee, H.Park, and S.B. Moon. What is twitter, a social network or a news media? In WWW2010.
[M10]M.Mathioudakis and N.Koudas. Twittermonitor : trend detection over the twitter stream. In SIGMOD10.
[S10]T. Sakaki, M. Okazaki, and Y.Matsuo. Earthquake shakes twitter users: real-time event detection by social sensors. In WWW10.