Influence and Correlation in Social Networks Aris Anagnostopoulos Ravi Kumar Mohammad Mahdian

Post on 20-Dec-2015


Influence and Correlation in Social Networks

Aris Anagnostopoulos

Ravi Kumar

Mohammad Mahdian


Preliminaries

- Correlations exist in users' behaviors
- Representation:
    - individuals are nodes of a social graph G
    - every node is "active" or "inactive"
- Formally, correlation means: if u and v are adjacent in G, the event that u becomes active is correlated with v becoming active

- Want to distinguish between different sources of social correlation


Models of Social Correlation

- Homophily = tendency for individuals to choose friends with similar characteristics / preferences

- Confounding = external influence from elements in the environment (confounding factors)

- Influence = the action of one individual induces another individual to act in a similar way.


Motivation

- Useful to know when social influence is the source of correlation

- Viral marketing -> want to target select individuals

- Influence behavior -> create "role models" (e.g. in fashion)

- We want to identify situations when such techniques can be applied.

- Also useful for analysis (predicting future state of network)


Modeling Influence

1. Graph G is drawn according to some distribution.
2. In each of the time steps 1, ..., T, each non-active agent decides whether to become active.
3. An agent becomes active with probability p(a), a function of the number a of its neighbors that are already active:

$$p(a) = \frac{e^{\alpha \ln(a+1) + \beta}}{1 + e^{\alpha \ln(a+1) + \beta}}$$

or, alternatively,

$$\ln \frac{p(a)}{1 - p(a)} = \alpha \ln(a+1) + \beta$$


Some remarks...

- The coefficient α measures social correlation. 

- Since past actions are recorded, a is the number of neighbors that became active at any earlier time step

- This model is relatively simplistic:
    - the probability does not vary between nodes
    - or as time passes

- However, these simplifying assumptions are practical
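The model described above can be sketched as a small simulation. This is only an illustration: it assumes p(a) is a logistic function of ln(a + 1) with coefficients α and β, and the names `activation_probability`, `simulate_influence`, and `neighbors` are hypothetical, not taken from the slides.

```python
import math
import random


def activation_probability(a, alpha, beta):
    """Assumed form of p(a): a logistic function of ln(a + 1)."""
    z = alpha * math.log(a + 1) + beta
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate_influence(neighbors, alpha, beta, steps, seed=0):
    """Return {node: activation_time} for every node that became active.

    neighbors[u] lists the nodes whose activity u can observe.
    The probability is the same for every node and every time step.
    """
    rng = random.Random(seed)
    activation_time = {}
    for t in range(1, steps + 1):
        newly_active = []
        for u in neighbors:
            if u in activation_time:
                continue  # already active; activations persist
            # a = number of neighbors activated at any earlier step
            a = sum(1 for v in neighbors[u] if v in activation_time)
            if rng.random() < activation_probability(a, alpha, beta):
                newly_active.append(u)
        # Apply activations only after every node has decided for step t.
        for u in newly_active:
            activation_time[u] = t
    return activation_time
```

Because new activations are applied at the end of each step, a node's decision at step t depends only on neighbors that became active before t.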


Estimating α, β

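 - Can estimate α and β using maximum likelihood logistic regression

 - Maximize the likelihood

$$\prod_a p(a)^{Y_a} \, (1 - p(a))^{N_a}$$

where Y_a = \sum_t Y_{a,t} and N_a = \sum_t N_{a,t}; Y_{a,t} is the number of users who at the beginning of time t had a active friends and became active at time t, and N_{a,t} is the number of users who had a active friends but did not become active at time t.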
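A minimal sketch of the estimation step, assuming the data has already been grouped by a into counts of users who became active and users who did not, and assuming p(a) is logistic in ln(a + 1). The function `fit_alpha_beta` and the `counts` layout are hypothetical; the fit uses plain Newton iterations rather than any particular library.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_alpha_beta(counts, iterations=50, tol=1e-10):
    """Maximum likelihood fit of logit p(a) = alpha * ln(a + 1) + beta.

    counts maps a -> (activated, not_activated), summed over all time steps.
    """
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for a, (activated, not_activated) in counts.items():
            x = math.log(a + 1)
            p = sigmoid(alpha * x + beta)
            total = activated + not_activated
            resid = activated - total * p   # gradient contribution
            w = total * p * (1.0 - p)       # curvature contribution
            g_a += resid * x
            g_b += resid
            h_aa += w * x * x
            h_ab += w * x
            h_bb += w
        det = h_aa * h_bb - h_ab * h_ab
        if det <= 1e-12:
            break  # degenerate data, e.g. a single distinct value of a
        # Newton step: solve the 2x2 system H * delta = g.
        d_alpha = (h_bb * g_a - h_ab * g_b) / det
        d_beta = (h_aa * g_b - h_ab * g_a) / det
        alpha += d_alpha
        beta += d_beta
        if abs(d_alpha) + abs(d_beta) < tol:
            break
    return alpha, beta
```

For example, `fit_alpha_beta({0: (50, 950), 1: (80, 920), 2: (100, 900)})` returns the pair (α, β) that maximizes the grouped likelihood for those counts.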


The Shuffle Test

- Idea: if influence does not play a role, then the timing of activations amongst users should be independent of each other:

Pr(a active before b) = Pr(b active before a)


The Shuffle Test

1. Estimate α for the initial graph
2. Randomly permute the order in which active nodes have been activated: if node i was activated at time t_i, pick a random permutation π of the active nodes and set the activation time of node i to t_π(i)
3. Estimate α' for this configuration
4. If the values of α and α' are close to each other, the data exhibits little or no social influence.
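The shuffle step can be sketched as follows. The estimator is passed in as a callable because how α is computed from activation times is outside this snippet; `shuffle_activation_times`, `shuffle_test`, and `estimate_alpha` are hypothetical names.

```python
import random


def shuffle_activation_times(activation_time, seed=0):
    """Reassign the observed activation times to the same nodes at random.

    The set of active nodes and the multiset of times are unchanged;
    only which node received which time is permuted.
    """
    rng = random.Random(seed)
    nodes = list(activation_time)
    times = [activation_time[u] for u in nodes]
    rng.shuffle(times)
    return dict(zip(nodes, times))


def shuffle_test(activation_time, estimate_alpha, seed=0):
    """Return (alpha, alpha_shuffled) using a caller-supplied estimator."""
    shuffled = shuffle_activation_times(activation_time, seed)
    return estimate_alpha(activation_time), estimate_alpha(shuffled)
```

In practice one would likely repeat the shuffle with several seeds and compare α against the spread of shuffled estimates rather than a single value.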


The Edge-reversal Test

1. Reverse the direction of all the edges
2. Run the same logistic regression on the data using the new graph

If correlation is not due to influence, then α should not change
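A small sketch of the reversal step, assuming the graph is stored as a list of directed pairs. The convention that an edge (u, v) means u lists v as a contact (so v's activity is visible to u) is an assumption, as are the function names.

```python
def reverse_edges(edges):
    """Return the edge list with every directed edge (u, v) flipped to (v, u)."""
    return [(v, u) for (u, v) in edges]


def active_friend_count(edges, activation_time, node, t):
    """Count contacts of `node` that were active strictly before time t.

    Assumption: an edge (u, v) means u lists v as a contact.
    """
    return sum(
        1
        for (u, v) in edges
        if u == node and v in activation_time and activation_time[v] < t
    )
```

Recomputing the active-friend counts on `reverse_edges(edges)` and rerunning the same regression gives the α to compare against the original estimate.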


Generative Models

- No Correlation

- Influence

- Correlation, no influence


Generative Models - No Correlation

- network grows just as in the real data
- at every step, randomly pick n nodes and make them active
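A minimal sketch of the no-correlation model. It ignores network growth, and it assumes the n nodes at each step are drawn from the nodes that are still inactive; `simulate_no_correlation` and `activations_per_step` are hypothetical names.

```python
import random


def simulate_no_correlation(nodes, activations_per_step, seed=0):
    """Activate uniformly random inactive nodes, ignoring the graph.

    activations_per_step[i] is the number of nodes to activate at step i + 1.
    Returns {node: activation_time}.
    """
    rng = random.Random(seed)
    inactive = list(nodes)
    activation_time = {}
    for t, n in enumerate(activations_per_step, start=1):
        chosen = rng.sample(inactive, min(n, len(inactive)))
        chosen_set = set(chosen)
        for u in chosen:
            activation_time[u] = t
        inactive = [u for u in inactive if u not in chosen_set]
    return activation_time
```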


Influence Model

- network grows just as in the real data
- at every step, every inactive node flips a coin, becoming active with probability p(a)


Correlation, No Influence Model

- network grows just as in the real data
- Pick a subset S of G:
    - randomly pick centers, add a ball of radius 2 around each to S
    - do this until |S| reaches parameter L
- Pick nodes to become active uniformly at random from S
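The construction of S can be sketched as below. Network growth is ignored, centers are drawn without replacement, and S may overshoot L by part of the last ball; these choices and the names `ball` and `pick_correlated_subset` are assumptions made for illustration.

```python
import random


def ball(neighbors, center, radius):
    """Return all nodes within `radius` hops of `center` (breadth-first)."""
    seen = {center}
    frontier = [center]
    for _ in range(radius):
        next_frontier = []
        for u in frontier:
            for v in neighbors.get(u, ()):
                if v not in seen:
                    seen.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return seen


def pick_correlated_subset(neighbors, target_size, seed=0, radius=2):
    """Grow S from balls around random centers until |S| >= target_size."""
    rng = random.Random(seed)
    centers = list(neighbors)
    rng.shuffle(centers)
    subset = set()
    for center in centers:
        if len(subset) >= target_size:
            break
        subset |= ball(neighbors, center, radius)
    return subset
```

Activations are then drawn uniformly at random from the returned set, so active nodes cluster in the graph even though no node reacts to its neighbors.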


Distinguishing Influence: Shuffle Test

[Figures: shuffle test results for the influence model and for the correlation (no influence) model]


Distinguishing Influence: Edge Reversal

[Figures: edge-reversal test results for the correlation (no influence) model and for the influence model]


Real Data: the Flickr Dataset

- analyzed 800K users over 16 months
- about 340K exhibited tagging behavior

- size of giant component: 160K

- 2.8M directed edges, 28.5% not mutual

- analyzed 1,700 tags independently
    - various types (event, color, object, etc.)
    - various numbers of users
    - various growth patterns (bursty, smooth, periodic)


Distinguishing Influence in Flickr

[Figure: shuffle test results on the Flickr data]


Distinguishing Influence in Flickr

[Figure: edge-reversal test results on the Flickr data]


Some Influence

- can discover traces of influence by looking at similar tags
- for the tag "graffiti", the difference between the α values was 0

- however, for the misspelling "grafitti", the difference was slightly larger

- with the even less common misspelling "graffitti", the difference increased even more


Conclusions

- distinguishing between correlation and causation is difficult

- timing information can help answer the question (shuffle)

- knowledge of asymmetric social ties is also useful (edge-reversal)


Further research directions

- formal verification of results? (controlled experiments)
- quantification of the strength of influence?
- identify which nodes influence others
- what if social ties are symmetric?
- distinguishing between other forms of correlation

- distinguishing between different forms of social influence


Questions?