Page 1: Social Networks and Graph Mining Christos Faloutsos CMU - MLD

Social Networks and Graph Mining

Christos Faloutsos

CMU - MLD

MLD-AB '07 2

Outline

• Problem definition / Motivation

• Graphs and power laws

• [Virus propagation]

• [eBay fraud detection]

• Conclusions

Motivation

• Data mining: ~ find patterns (rules, outliers)

• Problem#1: What do real graphs look like?

• Problem#2: How do viruses propagate?

• Problem#3: How to spot fraudsters on eBay?

Problem#1: Joint work with

• Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)

Graphs - why should we care?

Internet Map [lumeta.com]

Food Web [Martinez ’91]

Protein Interactions [genomebiology.com]

Friendship Network [Moody ’01]

Graphs - why should we care?

• network of companies & board-of-directors members

• ‘viral’ marketing

• web-log (‘blog’) news propagation

• computer network security: email/IP traffic and anomaly detection

• ....

Problem #1 - network and graph mining

• What does the Internet look like?

• What does the web look like?

• What constitutes a ‘normal’ social network?

• What is ‘normal’/‘abnormal’?

• Which patterns/laws hold?

Graph mining

• Are real graphs random?

Laws and patterns

NO!!

• Diameter

• in- and out- degree distributions

• other (surprising) patterns

Solution

• Power law in the degree distribution [SIGCOMM99]

[Figure: internet domains, log(degree) versus log(rank); annotated slope -0.82; labeled points: att.com, ibm.com]
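
A slope like the -0.82 on this plot can be estimated by a least-squares line through the points (log(rank), log(degree)). The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the degrees are synthetic (generated to follow a power law exactly), and a plain least-squares fit is one of several possible estimators, not necessarily the one used in [SIGCOMM99].

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return sxy / sxx

def rank_exponent(degrees):
    """Sort degrees in decreasing order and fit log(degree) against log(rank)."""
    ranked = sorted(degrees, reverse=True)
    ranks = range(1, len(ranked) + 1)
    return loglog_slope(ranks, ranked)

# Synthetic (hypothetical) degrees that follow degree = 1000 * rank^(-0.82) exactly.
synthetic_degrees = [1000.0 * r ** -0.82 for r in range(1, 201)]
slope = rank_exponent(synthetic_degrees)
print(round(slope, 2))
```

On this synthetic input the fit recovers the exponent used to generate the data. Real degree sequences are noisy, so the fitted slope there is only an estimate.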

But:

• Q1: How about graphs from other domains?

• Q2: How about temporal evolution?

The Peer-to-Peer Topology

• Frequency versus degree

• Number of adjacent peers follows a power-law

[Jovanovic+]
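
A frequency-versus-degree plot starts from a count of how many nodes have each degree. The following is a minimal sketch of that counting step, assuming an undirected edge list; the function name and the toy peer topology are hypothetical and are not data from [Jovanovic+].

```python
from collections import Counter

def degree_frequencies(edges):
    """Return {degree: number of nodes with that degree} for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())

# Hypothetical toy topology: peer "a" is a hub, the other peers have few neighbors.
toy_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"), ("b", "c")]
freq = degree_frequencies(toy_edges)
for d in sorted(freq):
    print(d, freq[d])
```

Plotting these pairs on log-log axes gives the frequency-versus-degree view; a roughly straight line there is what the power-law claim refers to.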

More power laws:

citation counts: (citeseer.nj.nec.com 6/2001)

[Figure: log(count) versus log(#citations); labeled point: Ullman]

Swedish sex-web

Nodes: people (Females; Males)

Links: sexual relationships

Liljeros et al. Nature 2001

4781 Swedes; 18-74; 59% response rate.

Albert Laszlo Barabasi
http://www.nd.edu/~networks/Publication%20Categories/04%20Talks/2005-norway-3hours.ppt

More power laws:

• web hit counts [w/ A. Montgomery]

[Figure: Web Site Traffic, log(count) versus log(in-degree); annotations: Zipf, users, sites, “ebay”]

epinions.com

• who-trusts-whom [Richardson + Domingos, KDD 2001]

[Figure: count versus (out) degree; annotated point: trusts-2000-people user]

But:

• Q1: How about graphs from other domains?

• Q2: How about temporal evolution?

Time evolution

• with Jure Leskovec (CMU/MLD)

• and Jon Kleinberg (Cornell – sabb. @ CMU)

Evolution of the Diameter

• Prior work on Power Law graphs hints at slowly growing diameter:
  – diameter ~ O(log N)
  – diameter ~ O(log log N)

• What is happening in real data?

• Diameter shrinks over time
  – As the network grows the distances between nodes slowly decrease
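
To make "diameter shrinks as the network grows" concrete, the sketch below measures the diameter of a small graph before and after it gains a node. It rests on assumptions: the slides do not say which definition of diameter is plotted, so this uses the longest shortest path found by breadth-first search, and the two toy snapshots are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(edges):
    """Longest shortest-path length among connected pairs of an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return max(max(bfs_distances(adj, s).values()) for s in adj)

# Hypothetical growth: a 6-node path first, then a new node 6 that links to nodes 0, 3 and 5.
early = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
later = early + [(6, 0), (6, 3), (6, 5)]
print(diameter(early), diameter(later))
```

Here the graph grows from 6 to 7 nodes while the diameter drops from 5 to 3, because the new node adds shortcuts. Running one BFS per node is fine for toy inputs but too slow for graphs of the sizes discussed in the talk, where sampling or approximation would be needed.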

Diameter – ArXiv citation graph

• Citations among physics papers

• 1992–2003

• One graph per year

[Figure: diameter versus time [years]]

Diameter – “Autonomous Systems”

• Graph of Internet

• One graph per day

• 1997 – 2000

[Figure: diameter versus number of nodes]

Diameter – “Affiliation Network”

• Graph of collaborations in physics – authors linked to papers

• 10 years of data

[Figure: diameter versus time [years]]

Diameter – “Patents”

• Patent citation network

• 25 years of data

[Figure: diameter versus time [years]]

Temporal Evolution of the Graphs

• N(t) … nodes at time t

• E(t) … edges at time t

• Suppose that N(t+1) = 2 * N(t)

• Q: what is your guess for E(t+1) =? 2 * E(t)

• A: over-doubled!
  – But obeying the “Densification Power Law”

Densification – Physics Citations

• Citations among physics papers

• 2003:
  – 29,555 papers, 352,807 citations

[Figure: E(t) versus N(t); annotated slope 1.69; reference slope 1: tree]

Densification – Physics Citations

• Citations among physics papers

• 2003:
  – 29,555 papers, 352,807 citations

[Figure: E(t) versus N(t); annotated slope 1.69; reference slope for a clique: 2]
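
The two reference slopes on these plots (1 for a tree, 2 for a clique) follow from the edge counts of those graphs: a tree on N nodes has N - 1 edges, and a clique has N(N - 1)/2. The sketch below checks this numerically by computing the slope of log(E) against log(N) between two sizes. The function names and the chosen sizes are illustrative assumptions, not part of the original analysis.

```python
import math

def tree_edges(n):
    """A tree on n nodes has n - 1 edges."""
    return n - 1

def clique_edges(n):
    """A clique on n nodes has n * (n - 1) / 2 edges."""
    return n * (n - 1) // 2

def densification_exponent(n1, e1, n2, e2):
    """Slope of log(E) against log(N) between two snapshots."""
    return (math.log(e2) - math.log(e1)) / (math.log(n2) - math.log(n1))

n1, n2 = 1000, 2000
tree_slope = densification_exponent(n1, tree_edges(n1), n2, tree_edges(n2))
clique_slope = densification_exponent(n1, clique_edges(n1), n2, clique_edges(n2))
print(round(tree_slope, 2), round(clique_slope, 2))
```

The measured 1.69 sits between these two extremes: the citation graph gains edges faster than a tree but far slower than a clique.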

Densification – Patent Citations

• Citations among patents granted

• 1999
  – 2.9 million nodes
  – 16.5 million edges

• Each year is a datapoint

[Figure: E(t) versus N(t); annotated slope 1.66]

Densification – Autonomous Systems

• Graph of Internet

• 2000
  – 6,000 nodes
  – 26,000 edges

• One graph per day

[Figure: E(t) versus N(t); annotated slope 1.18]

Densification – Affiliation Network

• Authors linked to their publications

• 2002
  – 60,000 nodes
    • 20,000 authors
    • 38,000 papers
  – 133,000 edges

[Figure: E(t) versus N(t); annotated slope 1.15]
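
Reading the annotated slopes as exponents a in a relation of the form E(t) ∝ N(t)^a, doubling the number of nodes multiplies the number of edges by 2^a, which is more than 2 whenever a > 1. The sketch below evaluates that factor for the four exponents shown on the densification slides. The dictionary keys and the function name are hypothetical labels chosen for this example.

```python
# Exponents as annotated on the densification plots.
exponents = {
    "physics citations": 1.69,
    "patent citations": 1.66,
    "autonomous systems": 1.18,
    "affiliation network": 1.15,
}

def edge_growth_factor(a, node_factor=2.0):
    """If E is proportional to N**a, scaling N by node_factor scales E by node_factor**a."""
    return node_factor ** a

growth = {name: edge_growth_factor(a) for name, a in exponents.items()}
for name, factor in growth.items():
    print(f"{name}: edges grow by about {factor:.2f}x when nodes double")
```

Every factor exceeds 2, which is the "over-doubled" answer to the earlier question about E(t+1) versus 2 * E(t).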

Outline

• Problem definition / Motivation

• Graphs and power laws

• [Virus propagation]

• [eBay fraud detection]

• Conclusions

Virus propagation

• How do viruses/rumors propagate?

• Will a flu-like virus linger, or will it become extinct soon?

The model: SIS

• ‘Flu’ like: Susceptible-Infected-Susceptible

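• Virus ‘strength’ $s = \beta/\delta$

[Figure: nodes N, N1, N2, N3, shown as Infected or Healthy; transitions labeled Prob. $\beta$ (attack) and Prob. $\delta$ (recovery)]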
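
One way to read the SIS model is as a discrete-time simulation: in each step every infected neighbor attacks a healthy node with the attack probability, and every infected node recovers with the recovery probability and becomes susceptible again. The sketch below is a minimal illustration under assumptions: the synchronous update order, the function name `sis_step`, and the numeric probabilities are choices made for this example, not details taken from the slides.

```python
import random

def sis_step(adj, infected, beta, delta, rng):
    """One synchronous SIS time step; returns the new set of infected nodes."""
    new_infected = set()
    for node in adj:
        if node in infected:
            # An infected node recovers with probability delta and is susceptible again.
            if rng.random() >= delta:
                new_infected.add(node)
        else:
            # Each infected neighbor independently attacks with probability beta.
            for nb in adj[node]:
                if nb in infected and rng.random() < beta:
                    new_infected.add(node)
                    break
    return new_infected

# Hypothetical 4-node example: N is connected to N1, N2 and N3.
adj = {"N": ["N1", "N2", "N3"], "N1": ["N"], "N2": ["N"], "N3": ["N"]}
rng = random.Random(0)
infected = {"N1", "N3"}
for t in range(5):
    infected = sis_step(adj, infected, beta=0.3, delta=0.5, rng=rng)
    print(t, sorted(infected))
```

Repeating the step many times and tracking the number of infected nodes shows whether the infection lingers or dies out for a given attack and recovery probability.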

Epidemic threshold

What should $\tau$ depend on?

• avg. degree? and/or highest degree?

• and/or variance of degree?

• and/or third moment of degree?

• and/or diameter?

Epidemic threshold

• [Theorem] We have no epidemic, if

$\beta/\delta < \tau = 1/\lambda_{1,A}$

where:
  – $\beta$: attack prob.
  – $\delta$: recovery prob.
  – $\tau$: epidemic threshold
  – $\lambda_{1,A}$: largest eigenvalue of adj. matrix A

Proof: [Wang+03]
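
The threshold in the theorem can be computed for a small graph by estimating the largest eigenvalue of its adjacency matrix. The sketch below uses power iteration on A + I, a standard numerical technique chosen here for illustration; it is not claimed to be the method used in [Wang+03]. It assumes a connected, undirected graph with at least one edge, and all function names and example values are hypothetical.

```python
def largest_eigenvalue(adj_matrix, iterations=500):
    """Estimate the largest eigenvalue of a symmetric adjacency matrix.

    Runs power iteration on A + I (the shift avoids oscillation on bipartite
    graphs) and subtracts 1 at the end.
    """
    n = len(adj_matrix)
    x = [1.0] * n
    estimate = 1.0
    for _ in range(iterations):
        y = [x[i] + sum(adj_matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        estimate = max(abs(v) for v in y)
        x = [v / estimate for v in y]
    return estimate - 1.0

def epidemic_threshold(adj_matrix):
    """tau = 1 / lambda_1 for the given adjacency matrix."""
    return 1.0 / largest_eigenvalue(adj_matrix)

def below_threshold(beta, delta, tau):
    """True when beta / delta < tau, the no-epidemic condition of the theorem."""
    return beta / delta < tau

# Hypothetical example: a star with one hub (node 0) and four leaves.
star = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
tau_star = epidemic_threshold(star)
print(round(tau_star, 3), below_threshold(0.1, 0.5, tau_star))
```

For this star the largest eigenvalue is 2, so the threshold is 0.5, and an attack probability of 0.1 with a recovery probability of 0.5 falls below it. For large sparse graphs a dedicated sparse eigensolver would be the practical choice.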

Experiments (Oregon)

$\beta/\delta > \tau$ (above threshold)

$\beta/\delta = \tau$ (at the threshold)

$\beta/\delta < \tau$ (below threshold)

Outline

• Problem definition / Motivation

• Graphs and power laws

• [Virus propagation]

• [eBay fraud detection]

• Conclusions

eBay fraud detection

w/ Polo Chau, CMU

eBay fraud detection - NetProbe

Conclusions

• Graphs pose fascinating problems

• self-similarity/fractals and power laws work, when textbook methods fail!

• Need: ML/AI, Stat, NA, DB (Gb/Tb), Systems (Networks+), sociology, ++…

Contact info

• [email protected]

• www.cs.cmu.edu/~christos