cmu scs graph and stream mining christos faloutsos cmu

38
CMU SCS Graph and stream mining Christos Faloutsos CMU

Post on 21-Dec-2015

228 views

Category:

Documents


0 download

TRANSCRIPT

CMU SCS

Graph and stream mining

Christos Faloutsos

CMU

CALD IC 2004 © C. Faloutsos (2004) 2

CMU SCS

CONGRATULATIONS!

CALD IC 2004 © C. Faloutsos (2004) 3

CMU SCS

Outline

• Problem definition / Motivation

• Graphs and power laws

• Streams and forecasting

• Conclusions

CALD IC 2004 © C. Faloutsos (2004) 4

CMU SCS

Motivation

• Data mining: ~ find patterns

• How do real graphs look like?

• How do (numerical) streams look like?

CALD IC 2004 © C. Faloutsos (2004) 5

CMU SCS

Joint work with

• Deepayan Chakrabarti (CMU - CALD)

CALD IC 2004 © C. Faloutsos (2004) 6

CMU SCS

Graphs - why should we care?

Internet Map [lumeta.com]

Food Web [Martinez ’91]

Protein Interactions [genomebiology.com]

Friendship Network [Moody ’01]

CALD IC 2004 © C. Faloutsos (2004) 7

CMU SCS

Problem #1 - network and graph mining• How does the Internet look like?• How does the web look like?• What constitutes a ‘normal’ social

network?• What is the ‘network value’ of a

customer? • which gene/species affects the others

the most?

CALD IC 2004 © C. Faloutsos (2004) 8

CMU SCS

Problem#1Given a graph:

• which node to market-to / defend / immunize first?

• Are there un-natural sub-graphs? (criminals’ rings or terrorist cells)?

• How do P2P networks evolve?

CALD IC 2004 © C. Faloutsos (2004) 9

CMU SCS

Solution#1:• A1: Power law in the degree distribution

[SIGCOMM99]

log(rank)

log(degree)

-0.82

internet domains

att.com

ibm.com

CALD IC 2004 © C. Faloutsos (2004) 10

CMU SCS

Solution#1’: Eigen Exponent E

• power law in the eigenvalues of the adjacency matrix

E = -0.48

Exponent = slope

Eigenvalue

Rank of decreasing eigenvalue

May 2001

CALD IC 2004 © C. Faloutsos (2004) 11

CMU SCS

But:

• Q1: How about graphs from other domains?

• Q2: How about temporal evolution?

CALD IC 2004 © C. Faloutsos (2004) 12

CMU SCS

More power laws:

citation counts: (citeseer.nj.nec.com 6/2001)

log(#citations)

log(count)

Ullman

CALD IC 2004 © C. Faloutsos (2004) 13

CMU SCS

More power laws:

• web hit counts [w/ A. Montgomery]

Web Site Traffic

log(freq)

log(count)

Zipf

userssites

CALD IC 2004 © C. Faloutsos (2004) 14

CMU SCS

The Peer-to-Peer Topology

• Frequency versus degree • Number of adjacent peers follows a power-law

[Jovanovic+]

CALD IC 2004 © C. Faloutsos (2004) 15

CMU SCS

epinions.com

• who-trusts-whom [Richardson + Domingos, KDD 2001]

(out) degree

count

CALD IC 2004 © C. Faloutsos (2004) 16

CMU SCS

More Power laws

• Also hold for other web graphs [Barabasi+], [Tomkins+], with additional ‘rules’ (bi-partite cores follow power laws)

CALD IC 2004 © C. Faloutsos (2004) 17

CMU SCS

CALD IC 2004 © C. Faloutsos (2004) 18

CMU SCS

A famous power law: Zipf’s law

• Bible - rank vs frequency (log-log)

• similarly, in many other languages; for customers and sales volume; city populations etc etclog(rank)

log(freq)

CALD IC 2004 © C. Faloutsos (2004) 19

CMU SCS

Olympic medals (Sidney):

y = -0.9676x + 2.3054

R2 = 0.9458

0

0.5

1

1.5

2

2.5

0 0.5 1 1.5 2

Series1

Linear (Series1)

rank

log(#medals)

CALD IC 2004 © C. Faloutsos (2004) 20

CMU SCS

More power laws: areas – Korcak’s law

Scandinavian lakes area vs complementary cumulative count (log-log axes)

log(count( >= area))

log(area)

CALD IC 2004 © C. Faloutsos (2004) 21

CMU SCS

CALD IC 2004 © C. Faloutsos (2004) 22

CMU SCS

Time evolution

• Do these laws hold over time?

CALD IC 2004 © C. Faloutsos (2004) 23

CMU SCS

Time Evolution: rank R

-1

-0.9

-0.8

-0.7

-0.6

-0.50 200 400 600 800

Instances in time: Nov'97 - now

Ra

nk

ex

po

ne

nt

The rank exponent has not changed!

Domainlevel

log(rank)

log(degree)

-0.82

att.com

ibm.com

CALD IC 2004 © C. Faloutsos (2004) 24

CMU SCS

Outline

• Problem definition / Motivation

• Graphs and power laws

• Streams and forecasting

• Conclusions

CALD IC 2004 © C. Faloutsos (2004) 25

CMU SCS

Why care about streams?• Sensor devices

– Temperature, weather measurements– Road traffic data– Geological observations– Patient physiological data

• Embedded devices– Network routers– Intelligent (active) disks

CALD IC 2004 © C. Faloutsos (2004) 26

CMU SCS

Modeling bursty traffic

Given a signal (eg., bytes over time)

• model bursty traffic

• generate realistic traces

• (Poisson does not work)

time

# bytes

Poisson

CALD IC 2004 © C. Faloutsos (2004) 27

CMU SCS

Modeling bursty traffic

Given a signal (eg., bytes over time)

• give guarantees:

time

# bytes

Poisson

CALD IC 2004 © C. Faloutsos (2004) 28

CMU SCS

Solution: self-similarity

# bytes

time time

# bytes

CALD IC 2004 © C. Faloutsos (2004) 29

CMU SCS

Approach

• Q1: How to generate a sequence, that is– bursty– self-similar– and has ‘realistic’ queue length distributions

CALD IC 2004 © C. Faloutsos (2004) 30

CMU SCS

Binary multifractals20 80

CALD IC 2004 © C. Faloutsos (2004) 31

CMU SCS

Binary multifractals20 80

[Mengzhi Wang, best student paper award, PEVA’02]

CALD IC 2004 © C. Faloutsos (2004) 32

CMU SCS

Forecasting:

• AWSOM: using wavelets and AutoRegression [Papadimitriou+, 2003]

CALD IC 2004 © C. Faloutsos (2004) 33

CMU SCS

ResultsReal data – Sunspot

• Sunspot intensity – Slightly time-varying “period”• AR captures wrong trend (average)• Seasonal ARIMA

– Captures immediate wrong downward trend– Requires human to determine seasonal component period (fixed)

CALD IC 2004 © C. Faloutsos (2004) 34

CMU SCS

ResultsReal data – Sunspot

• Sunspot intensity – Slightly time-varying “period”

CALD IC 2004 © C. Faloutsos (2004) 35

CMU SCS

Conclusions

• Graphs & streams pose fascinating problems

• self-similarity, fractals and power laws work, when textbook methods fail!

CALD IC 2004 © C. Faloutsos (2004) 36

CMU SCS

Other on-going projects• video data mining [Pan + Yang]

• Disk traffic modeling [Ailamaki;

Ganger]

CALD IC 2004 © C. Faloutsos (2004) 37

CMU SCS

Books

• Manfred Schroeder: Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991 (Probably the BEST book on fractals!)

CALD IC 2004 © C. Faloutsos (2004) 38

CMU SCS

Contact info

[email protected]• www.cs.cmu.edu/~christos• Wean Hall 7107• Ph#: x8.1457