Predicting Positive and Negative Links in Online Social Networks
Jure Leskovec (Stanford), Daniel Huttenlocher and Jon Kleinberg (Cornell)
Social Interaction on the Web
Rich social structure in online computing applications
Such structures are modeled by networks
Most social network analyses view links as positive: friends, fans, followers
But generally links can convey either friendship or antagonism
Our plan: Study social interactions on the Web that have positive and negative relationships
Questions: How do edge signs and network structure interact?
Approach: Edge sign prediction problem. Given a network and signs on all but one edge, predict the missing sign
Applications: Friend recommendation for social media. It is easier to predict whether you know someone than to predict what you think of them
[Figure: network with positive and negative edges and one edge marked “?”]
Each link A→B is explicitly tagged with a sign:
Epinions: Trust/Distrust. Does A trust B’s product reviews? (only positive links are visible)
Wikipedia: Support/Oppose. Does A support B to become Wikipedia administrator?
Slashdot: Friend/Foe. Does A like B’s comments?
Other examples: World of Warcraft [Szell et al. 2010]
Edge signs can be predicted with ~90% accuracy using only the local network structure
No need for global trust-propagation mechanisms
Data-oriented justification of classical theories from social psychology
Our models align with theories of Balance and Status
Near-perfect generalization: same underlying mechanism of signed edge creation
Can train on “how people vote” and predict trust as well as the model trained on trust itself
Machine Learning formulation: Predict sign of edge (u,v)
Class label: +1: positive edge, -1: negative edge
Learning method: Logistic regression
Evaluation: Accuracy and ROC curves
Dataset: Original: 80% + edges, Balanced: 50% + edges
Features for learning: Next slide
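The formulation above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the slides do not say how the balanced dataset was built, so downsampling the majority class is an assumption, and the names `balance_dataset`, `train_logreg`, and the two toy features are hypothetical.

```python
import math
import random

def balance_dataset(examples, seed=0):
    """Downsample the majority class so that 50% of edges are positive (assumed method)."""
    rng = random.Random(seed)
    pos = [e for e in examples if e[1] == 1]
    neg = [e for e in examples if e[1] == -1]
    k = min(len(pos), len(neg))
    return rng.sample(pos, k) + rng.sample(neg, k)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(examples, epochs=200, lr=0.1):
    """Fit weights [b0, b1, ...] by batch gradient ascent on the log-likelihood."""
    b = [0.0] * (len(examples[0][0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(b)
        for x, y in examples:
            target = 1.0 if y == 1 else 0.0
            p = sigmoid(b[0] + sum(w * xi for w, xi in zip(b[1:], x)))
            grad[0] += target - p
            for i, xi in enumerate(x):
                grad[i + 1] += (target - p) * xi
        b = [w + lr * g / len(examples) for w, g in zip(b, grad)]
    return b

def predict_sign(b, x):
    """Class label: +1 for a positive edge, -1 for a negative edge."""
    z = b[0] + sum(w * xi for w, xi in zip(b[1:], x))
    return 1 if z >= 0 else -1

def accuracy(b, examples):
    return sum(predict_sign(b, x) == y for x, y in examples) / len(examples)

# Toy data (made up): 80% positive edges, two hypothetical features per edge.
toy = [((2.0, 0.0), 1)] * 8 + [((0.0, 2.0), -1)] * 2
balanced = balance_dataset(toy)
model = train_logreg(balanced)
print(len(balanced), accuracy(model, toy))
```

Training on the balanced set and scoring on the original set mirrors the two dataset variants listed above; ROC curves are left out to keep the sketch short.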
[Figure: signed network around the edge (u,v) whose sign is to be predicted]
For each edge (u,v) create features:
Triad counts (16): counts of signed triads edge (u,v) takes part in
Node degree (7 features):
Signed degree: $d^+_{out}(u)$, $d^-_{out}(u)$, $d^+_{in}(v)$, $d^-_{in}(v)$
Total degree: $d_{out}(u)$, $d_{in}(v)$
Embeddedness of edge (u,v)
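As a rough sketch of how these features could be computed, the snippet below assumes the network is stored as a dictionary from directed pairs to signs, and takes embeddedness to mean the number of common neighbors of u and v. Both choices, and the names `edge_features` and `TRIAD_TYPES`, are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# All 16 triad types: direction and sign of the u-w link, then of the w-v link.
# "F" means the link points along u -> w -> v, "B" means it points the other way.
TRIAD_TYPES = list(product("FB", (1, -1), "FB", (1, -1)))

def edge_features(edges, u, v):
    """edges maps a directed pair (a, b) to its sign, +1 or -1 (assumed layout)."""
    # Hide the edge whose sign we want to predict.
    known = {e: s for e, s in edges.items() if e != (u, v)}

    out_u = [s for (a, _), s in known.items() if a == u]
    in_v = [s for (_, b), s in known.items() if b == v]

    def neighbors(n):
        return {b for a, b in known if a == n} | {a for a, b in known if b == n}

    common = (neighbors(u) & neighbors(v)) - {u, v}

    triads = Counter()
    for w in common:
        uw = [("F", known[(u, w)])] if (u, w) in known else []
        uw += [("B", known[(w, u)])] if (w, u) in known else []
        wv = [("F", known[(w, v)])] if (w, v) in known else []
        wv += [("B", known[(v, w)])] if (v, w) in known else []
        for (d1, s1), (d2, s2) in product(uw, wv):
            triads[(d1, s1, d2, s2)] += 1

    degree = {
        "d+out(u)": out_u.count(1),
        "d-out(u)": out_u.count(-1),
        "d+in(v)": in_v.count(1),
        "d-in(v)": in_v.count(-1),
        "dout(u)": len(out_u),
        "din(v)": len(in_v),
        "embeddedness": len(common),
    }
    return degree, [triads[t] for t in TRIAD_TYPES]

# Made-up example network.
example_edges = {
    ("u", "v"): 1, ("u", "x"): 1, ("x", "v"): -1,
    ("v", "y"): 1, ("y", "u"): -1, ("z", "v"): 1,
}
degree, triad_counts = edge_features(example_edges, "u", "v")
print(degree, triad_counts)
```

The 7 degree features and 16 triad counts together form the feature vector for one edge.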
Error rates: Epinions: 6.5%, Slashdot: 6.6%, Wikipedia: 19%
Signs can be modeled from local network structure alone
Trust propagation model of [Guha et al. ’04] has 14% error on Epinions
Triad features perform less well for less embedded edges
Wikipedia is harder to model: votes are publicly visible
[Figure: one panel each for Epinions, Slashdot, and Wikipedia]
Our goal is not just to predict signs but also to derive insights into usage of signed edges
Logistic regression learns a weight $b_i$ for each feature $x_i$: $P(+ \mid x) = \frac{1}{1 + e^{-(b_0 + \sum_i b_i x_i)}}$
Connection to theories from social psychology: structural balance and the theory of status, which both give predictions on the sign of the edge (u,v) based on the triad it is embedded in
Consider edges as undirected
Start with intuition [Heider ’46]:
Friend of my friend is my friend
Enemy of enemy is my friend
Enemy of friend is my enemy
Look at connected triples of nodes that are consistent with this logic:
[Figure: signed triads, balanced and unbalanced]
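One common way to state the balance intuition in code is through the product of edge signs: a triangle is balanced when the product is positive. The sketch below uses +1 and -1 for signs; the function names are hypothetical.

```python
def is_balanced(s_ab, s_bc, s_ca):
    """A triangle is balanced when it has an even number of negative edges."""
    return s_ab * s_bc * s_ca > 0

def balance_prediction(s_ux, s_xv):
    """Sign for (u,v) that makes the triad u, x, v balanced."""
    return s_ux * s_xv

print(balance_prediction(1, 1))    # friend of my friend
print(balance_prediction(-1, -1))  # enemy of my enemy
print(balance_prediction(1, -1))   # enemy of my friend
```

The three calls correspond to the three intuitions listed above.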
Status theory [Davis-Leinhardt ’68, Guha et al. ’04, Leskovec et al. ’10]
Link u → v with sign + means: v has higher status than u
Link u → v with sign − means: v has lower status than u
Based on signs/directions of links from/to node x, make a prediction
Status and balance can make different predictions:
[Figure: example triads on u, x, v with signed edges and the sign each method predicts for (u,v). Where the theories differ: Balance: +, Status: −, LogReg: −. Where they agree: Balance: −, Status: −, LogReg: −]
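The status rule can be sketched as follows, assuming each link is given as a (source, target, sign) tuple and that a prediction is made only when the two links through x order u and v consistently. The helper names are hypothetical, and returning 0 for "no prediction" is a choice made for this sketch.

```python
def relative_status(a, b, link):
    """Return +1 if the link implies b has higher status than a, else -1.

    link is (source, target, sign) and must connect a and b in either direction.
    """
    src, dst, sign = link
    if (src, dst) == (a, b):
        return sign    # a -> b: a positive link means b is higher
    if (src, dst) == (b, a):
        return -sign   # b -> a: a positive link means a is higher
    raise ValueError("link does not connect the given nodes")

def status_prediction(u, x, v, link_ux, link_xv):
    """Predict the sign of u -> v from the links between u, x and x, v."""
    first = relative_status(u, x, link_ux)
    second = relative_status(x, v, link_xv)
    if first == 1 and second == 1:
        return 1     # u < x < v, so v is above u
    if first == -1 and second == -1:
        return -1    # u > x > v, so v is below u
    return 0         # x does not order u and v

# Two negative links u -> x and x -> v: Balance says +, Status says -.
print(status_prediction("u", "x", "v", ("u", "x", -1), ("x", "v", -1)))
```

In the printed case the product of signs is positive, so Balance would predict +, while the status ordering u > x > v predicts −.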
Both theories agree well with learned models
Further observations:
Backward-backward triads have smaller weights than forward and mixed direction triads
Balance is in better agreement with Epinions and Slashdot, while Status is in better agreement with Wikipedia
The learned models consistently disagree with Balance on “enemy of my enemy is my friend”
Balance based and learned coefficients:
Triad signs   Balance theory   Epin    Slashdot   Wiki
const          0                0.43    1.49       0.04
+ +            1                0.05    0.04       0.05
+ -           -1               -0.11   -0.24      -0.16
- +           -1               -0.21   -0.35      -0.14
- -            1               -0.01   -0.03      -0.05
Balance theory column: model if signs were created purely based on Balance theory
Status based and learned coefficients:
Feature     Status theory   Epin    Slashdot   Wiki
const        0              -0.68   -1.39      -0.30
u < x < v    1               0.11    0.05       0.03
u > x > v   -1              -0.10   -0.11      -0.19
u < x > v    0               0.06    0.16       0.03
u > x < v    0              -0.01    0.04       0.05
[Figure: example triads on u, x, v, including triads where u > x > v]
Status theory column: model if signs were created purely based on Status theory
Do people use these very different linking systems by obeying the same principles?
How generalizable are the results across the datasets?
Train on row “dataset”, predict on “column”
[Table: training dataset (rows) by test dataset (columns) for Epinions, Slashdot, and Wikipedia]
Almost perfect generalization of the models even though networks come from very different applications
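The "train on row, predict on column" protocol can be sketched generically. The function below takes the training and evaluation routines as arguments; the majority-sign classifier and the two toy datasets "A" and "B" are stand-ins invented for this example, not the model or data from the talk.

```python
def generalization_matrix(datasets, train, evaluate):
    """datasets maps a name to (train_examples, test_examples).

    Returns {(row, col): score}, where the model is trained on `row`
    and evaluated on `col`.
    """
    return {
        (row, col): evaluate(train(datasets[row][0]), datasets[col][1])
        for row in datasets
        for col in datasets
    }

# Stand-in model: always predict the majority sign seen in training.
def train_majority(labels):
    return 1 if sum(labels) >= 0 else -1

def evaluate_majority(model, labels):
    return sum(1 for y in labels if y == model) / len(labels)

# Made-up edge signs for two hypothetical datasets.
toy_datasets = {
    "A": ([1, 1, -1], [1, 1, 1, -1]),
    "B": ([-1, -1, 1], [-1, -1, 1, 1]),
}
matrix = generalization_matrix(toy_datasets, train_majority, evaluate_majority)
for key in sorted(matrix):
    print(key, matrix[key])
```

Off-diagonal entries close to the diagonal ones would indicate the kind of cross-dataset generalization described above.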
Suppose we are only interested in predicting whether there is a trust edge or no edge
Does knowing negative edges help?
[Figure: network with both positive and negative edges vs. the same network with only positive edges, each with one edge marked “?”]
YES!
Heuristic predictors for (u,v):
Balance: Choose the sign that makes the majority of the triads balanced
Status: Predict based on status $\sigma(v) = d^+_{in}(v) + d^-_{out}(v) - d^+_{out}(v) - d^-_{in}(v)$
Out-sign of v: majority sign
In-sign of u: majority sign
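The first two heuristics can be sketched as below. Several details are assumptions, since the slide does not spell them out: ties in the balance vote go to +, a node's status is read as positive in-links plus negative out-links minus positive out-links minus negative in-links, and a positive sign is predicted when u's status does not exceed v's. The function names are hypothetical.

```python
def balance_heuristic(triad_signs):
    """triad_signs: list of (sign_ux, sign_xv), one pair per common neighbor x.

    Each triad votes for the sign that would make it balanced.
    """
    votes = sum(s_ux * s_xv for s_ux, s_xv in triad_signs)
    return 1 if votes >= 0 else -1  # tie broken toward + (assumption)

def status_score(edges, n):
    """Status of node n from its signed in-links and out-links.

    edges maps a directed pair (a, b) to +1 or -1 and should not
    contain the edge being predicted.
    """
    sigma = 0
    for (a, b), sign in edges.items():
        if b == n:
            sigma += sign  # positive in-link raises status, negative lowers it
        if a == n:
            sigma -= sign  # positive out-link lowers status, negative raises it
    return sigma

def status_heuristic(edges, u, v):
    """Predict + when u's status does not exceed v's (assumed decision rule)."""
    return 1 if status_score(edges, u) <= status_score(edges, v) else -1

# Made-up example network.
toy_edges = {("a", "v"): 1, ("b", "v"): 1, ("u", "c"): 1, ("d", "u"): -1}
print(status_score(toy_edges, "u"), status_score(toy_edges, "v"))
print(status_heuristic(toy_edges, "u", "v"))
print(balance_heuristic([(1, 1), (-1, -1), (1, -1)]))
```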
Observations: Triadic models do better with increasing embeddedness