predicting positive and negative links in online social networks

Jure Leskovec (Stanford), Daniel Huttenlocher and Jon Kleinberg (Cornell)

Rich social structure in online computing applications

Such structures are modeled by networks

Most social network analyses view links as positive Friends Fans Followers

But generally links can convey either friendship or antagonism

2

Our plan Study social interactions on the Web that have positive

and negative relationships Questions

How do edge signs and network structure interact? Approach

Edge sign prediction problem Given a network and signs on all but

one edge, predict the missing sign Applications

Friend recommendation for social media Easier to predict whether you know someone

vs. to predict what you think of them3

+

+?+

++

+

+–

––

–

––

–

Each link AB is explicitly tagged with a sign: Epinions: Trust/Distrust

Does A trust B’s product reviews?(only positive links are visible)

Wikipedia: Support/Oppose Does A support B to become

Wikipedia administrator? Slashdot: Friend/Foe

Does A like B’s comments? Other examples:

World of Warcraft [Szell et al. 2010]

+

++

+++

+

+–

––

–

––

–

4

Edge signs can be predicted with ~90% accuracy using only the local network structure No need for global trust-propagation mechanisms

Data oriented justification of classical theories from social psychology Our models align with theories of Balance and Status

Near perfect generalization: Same underlying mechanism of signed edge creation

Can train on “how people vote” and predict trust as well as the model trained on trust itself

5

Machine Learning formulation: Predict sign of edge (u,v) Class label:

+1: positive edge -1: negative edge

Learning method: Logistic regression

Evaluation: Accuracy and ROC curves

Dataset: Original: 80% +edges Balanced: 50% +edges

Features for learning: Next slide

6

uu

vv

+

+?+

+

++

+

––

–

–

––

–

For each edge (u,v) create features: Triad counts (16):

Counts of signed triads edge uv takes part in

Node degree (7 features): Signed degree:

d+out(u), d-

out(u), d+

in(v), d-in(v)

Total degree: dout(u), din(v)

Embeddedness of edge (u,v)

7

u v

-+++

--+-

Error rates: Epinions: 6.5% Slashdot: 6.6% Wikipedia: 19%

Signs can be modeled from local network structure alone Trust propagation model of

[Guha et al. ‘04] has 14% error on Epinions

Triad features perform less well for less embedded edges

Wikipedia is harder to model: Votes are publicly visible

8

Epin

Slash

Wiki

Our goal is not just to predict signs but also to derive insights into usage of signed edges

Logistic regression learns a weight bi for each feature xi:

Connection to theories from social psychology: Structural balance Theory of statuswhich both give predictions onthe sign of the edge (u,v) basedon the triad it is embedded into

9

u v

+-

Consider edges as undirected Start with intuition [Heider ’46]:

Friend of my friend is my friend Enemy of enemy is my friend Enemy of friend is my enemy

Look at connected triples of nodes that are consistent with this logic:

10

+++

--

+--+

Status theory [Davis-Leinhardt ‘68, Guha et al. ’04, Leskovec et al. ‘10] Link u v means: v has higher status than u Link u v means: v has lower status than u Based on signs/directions of links from/to node

x make a prediction Status and balance can make different predictions:

11

+

–

u v

x--

u v

x

++

Balance: +Status: –LogReg: –




u v

x -+

Balance: –Status: –LogReg: –

Balance: –Status: –LogReg: –

12

+ ++ -- +- -+ ++ -- +- -

+ ++ -- +- -

+ ++ -- +- -

Both theories agree well with learned models

Further observations: Backward-backward triads have smaller

weights than forward and mixed direction triads

Balance is in better agreement with Epinions and Slashdot while Status is with Wikipedia

Balance consistently disagrees with “enemy of my enemy is my friend”

13

vv

xx

uu

Balance based and learned coefficients:

14

FeatureBalancetheory

EpinSlashdot

Wiki

const 0 0.43 1.49 0.04

1 0.05 0.04 0.05

-1 -0.11 -0.24 -0.16

-1 -0.21 -0.35 -0.14

1 -0.01 -0.03 -0.05

+ --

- +-

- -+

+ ++

Model if signs would be created purely based on Balance theoryModel if signs would be created

purely based on Balance theory

Status based and learned coefficients:

15

FeatureStatustheory

EpinSlashdot

Wiki

const 0 -0.68 -1.39 -0.30

u < x < v 1 0.11 0.05 0.03

u > x > v -1 -0.10 -0.11 -0.19

u < x > v 0 0.06 0.16 0.03

u > x < v 0 -0.01 0.04 0.05

vv

xx

uu+ +

Triads whereu > x > v

Triads whereu > x > v

vv

xx

uu

vv

xx

uu

vv

xx

uu

+

+

––

–

–

Model if signs would be created purely based on Status theoryModel if signs would be created purely based on Status theory

16

Epin

Slash

Wiki

Do people use these very different linking systems by obeying the same principles? How generalizable are the results across the datasets?

Train on row “dataset”, predict on “column”

Almost perfect generalization of the models even though networks come from very different applications

17

Suppose we are only interested in predicting whether there is a trust edge or no edge

Does knowing negative edges help?

18

+

+?+

++

+

+–

––

–

––

–

+

+?+

++

+

+

Vs.

YES!

19

5

1

3

+ +-

22Jure Leskovec

Heuristic predictors for (u,v): Balance: Chose sign that makes

majority of the triads balanced Status: Predict in based on status

(v)=d+in(u)+d-

out(u)-d+out(v)-d-

in(v) Out-sign of v: majority sign In-sign of u: majority sign

Observations: Triadic models do better with

increasing embeddedness23

predicting positive and negative links in online social networks

Documents

edge signs

sign of edge u

positive edge

dinvembeddedness of

signed triads edge uv

ulink u v

theories of balance

uvx balance