spotlight - cs.cmu.edudeswaran/talks/kdd18-spotlight-slides.pdf · problem spotlight sketching...


SpotLight: Detecting Anomalies in Streaming Graphs

Dhivya Eswaran, Christos Faloutsos, Sudipto Guha, Nina Mishra

Carnegie Mellon University, Amazon

This work was performed at Amazon.

ESWARAN, FALOUTSOS, GUHA & MISHRA

SPOTLIGHT: DETECTING ANOMALIES IN STREAMING GRAPHS

KDD 2018

Graphs are being created everywhere

INTRODUCTION

[Figure: a message between “You” and “Alice”, timestamped 6 Jun 2018, 1.34am, followed by further elided messages]

Many other settings…

‣ IM/e-mail networks
‣ Computer networks
‣ Transportation networks
‣ Edit networks

As a sequence of graph snapshots

[Figure: graph snapshots along a time axis. Mornings: Monday AM, Tuesday AM, Wednesday AM. Nights: Monday PM, Tuesday PM.]

But sometimes unusual events happen

[Figure: three graph snapshots labeled Normal, Tax scam and Network failure]

Unusual events in other settings

‣ Computer networks (e.g., port scans, denial-of-service)
‣ Transportation networks (events/weather; figure label: stadium)

How do we detect such anomalies in streaming graphs?

How do we even characterize these anomalies?

Outline: INSIGHT, PROBLEM, ALGORITHM, GUARANTEES, EXPERIMENTS

INSIGHT

Anomalies tend to involve…

sudden (dis)appearance of large dense directed subgraph

‣ directed: edges run from sources to destinations
‣ large: many nodes
‣ dense: many, many edges
‣ sudden: an abrupt change between the initial and final graph, as opposed to steady evolution
‣ (dis)appearance: the subgraph may either appear or disappear

PROBLEM

GIVEN: a stream of graphs

STREAMING MODEL

• (Un)directed weighted edges
• Time-evolving node set
• Known node-correspondence

FIND: the anomalous graphs

[Figure: graph snapshots along a time axis, most marked “Ok!” and one marked “anomaly!”]

ALGORITHMIC CONSTRAINTS

• Real-time and fast detection
• Bounded working memory

ALGORITHM

Overview of SpotLight

[Figure: two-stage pipeline. Graph Sketching maps each graph G1, G2, G3, G4 to a sketch vector v(G1), v(G2), v(G3), v(G4). Anomaly Detection then runs on the sketch vectors and marks v(G3) as “anomaly!”.]

Many off-the-shelf methods for anomaly detection:

‣ Robust Random Cut Forests [Guha, Mishra, Roy & Schrijvers; ICML 2016]

‣ Light-weight Online Detector of Anomalies [Pevny; ML 2016]

‣ Randomized Space Forests [Wu, Zhang, Fan, Edwards & Yu; ICDM 2014]
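The second stage only needs a detector that scores points in the K-dimensional sketch space. The snippet below is a minimal sketch of that interface, assuming a deliberately simple stand-in rule (distance to the nearest earlier sketch) in place of the cited methods; the function name, the scoring rule and the example vectors are all hypothetical.

```python
import math

def nearest_neighbor_scores(sketches):
    """Score each sketch by its Euclidean distance to the closest earlier sketch.

    Stand-in detector for illustration only: it is NOT one of the cited
    methods, and it keeps the full history, so it is not memory-bounded.
    The first sketch has no history and gets score 0.0.
    """
    scores = []
    history = []
    for v in sketches:
        if history:
            scores.append(min(math.dist(v, u) for u in history))
        else:
            scores.append(0.0)
        history.append(v)
    return scores

# Hypothetical sketch vectors v(G1)..v(G4); the third lies far from the rest.
sketches = [(0, 2, 3), (1, 2, 3), (40, 0, 55), (0, 3, 3)]
scores = nearest_neighbor_scores(sketches)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
```

Any detector with the same shape (sketch in, score out) could be swapped in here, which is the point of separating sketching from detection.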

SpotLight randomized graph sketching

[Figure: animation that fills in a graph's sketch one value at a time: 0, 100, 20]

THREE PARAMETERS:

‣ Probability of sampling a source, ‘p’
‣ Probability of sampling a destination, ‘q’
‣ Number of sketching dimensions, ‘K’
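To make the three parameters concrete, here is a minimal sketch of how a single graph snapshot could be turned into a K-dimensional vector. It assumes each dimension is the total weight of edges that go from a randomly sampled source set to a randomly sampled destination set; the function names `make_queries` and `spotlight_sketch` are hypothetical, not taken from the slides.

```python
import random

def make_queries(sources, destinations, p, q, K, seed=0):
    """Draw K random query subgraphs.

    Each query keeps every source with probability p and every destination
    with probability q, independently.
    """
    rng = random.Random(seed)
    queries = []
    for _ in range(K):
        src_subset = {s for s in sources if rng.random() < p}
        dst_subset = {d for d in destinations if rng.random() < q}
        queries.append((src_subset, dst_subset))
    return queries

def spotlight_sketch(edges, queries):
    """Return a K-dim vector; entry k is the total weight of the edges whose
    source and destination both fall inside the k-th query subgraph."""
    sketch = [0.0] * len(queries)
    for src, dst, weight in edges:
        for k, (src_subset, dst_subset) in enumerate(queries):
            if src in src_subset and dst in dst_subset:
                sketch[k] += weight
    return sketch

edges = [("a", "b", 1.0), ("b", "c", 2.0)]
# With K = p = q = 1 every node is sampled, so the sketch is the total edge weight.
total_weight = spotlight_sketch(edges, make_queries(["a", "b"], ["b", "c"], 1.0, 1.0, 1))
```

Smaller p and q make each dimension look at a narrower part of the graph, so a dense subgraph tends to show up as a large jump in the few dimensions that happen to cover it.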

SpotLight at work on a stream

STREAMING ANOMALY DETECTOR

Hashes: K source hashes hS: src → {1, …, 1/p} and K destination hashes hD: dst → {1, …, 1/q} (K = 3 in this example)

[Figure: anomaly score plotted against time]

‣ 5pm: the sketch starts at (0, 0, 0).
‣ Edge a → b with weight 1 arrives. a is hashed with each hS and b with each hD; the sketch becomes (0, 0, 1).
‣ Edge b → c with weight 2 arrives. b is hashed with each hS and c with each hD; the sketch becomes (0, 2, 3).
‣ 6pm: the sketch (0, 2, 3) for 5-6pm is handed to the anomaly detector, and the sketch resets to (0, 0, 0).
‣ Between 6pm and 7pm three more weighted edges arrive; the sketch becomes (1, 0, 2).
‣ 7pm: the sketch for 6-7pm is handed to the anomaly detector, and the sketch resets to (0, 0, 0).
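The walk-through above can be sketched as a small generator. This is an illustration under stated assumptions, not the authors' implementation: an edge is taken to count toward dimension k when both of its hashes land in bucket 1, SHA-256 stands in for the hash family, and the names `bucket` and `stream_sketches` are hypothetical.

```python
import hashlib

def bucket(node, salt, num_buckets):
    """Deterministically hash a node into {1, ..., num_buckets}."""
    digest = hashlib.sha256(f"{salt}:{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets + 1

def stream_sketches(edge_stream, p, q, K, window):
    """Yield (window_start, sketch) for each time window.

    edge_stream yields (time, src, dst, weight) in non-decreasing time order.
    Assumption: an edge counts toward dimension k when both the k-th source
    hash and the k-th destination hash map to bucket 1.
    """
    src_buckets, dst_buckets = round(1 / p), round(1 / q)
    sketch, window_start = [0.0] * K, None
    for time, src, dst, weight in edge_stream:
        if window_start is None:
            window_start = time - time % window
        while time >= window_start + window:  # emit every window that has closed
            yield window_start, sketch
            sketch, window_start = [0.0] * K, window_start + window
        for k in range(K):
            if (bucket(src, f"S{k}", src_buckets) == 1
                    and bucket(dst, f"D{k}", dst_buckets) == 1):
                sketch[k] += weight
    if window_start is not None:
        yield window_start, sketch  # flush the last, possibly partial, window

# Hypothetical hourly stream; with p = q = 1 every edge hits every dimension.
stream = [(17, "a", "b", 1.0), (17, "b", "c", 2.0), (18, "d", "a", 2.0)]
windows = list(stream_sketches(stream, p=1.0, q=1.0, K=3, window=1))
```

Working memory here is the K counters of the current window, independent of the number of nodes or edges seen, which matches the bounded-memory constraint stated earlier.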

GUARANTEES

Intuition behind our theorems

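Deterministic Experiment: Add ‘m’ unit-weight edges.

[Figure: graphs G, GR and GB mapped to the points v(G), v(GR) and v(GB) in the K-dim SpotLight space; $d_R$ and $d_B$ are the distances from v(G) to v(GR) and to v(GB).]

$d_R - d_B > O(K m^2)$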
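The intuition can be checked numerically with a small simulation. The slide text does not spell out how the two edits are chosen, so the setup below is an assumption: the "red" edit places all m edges on one source (a star), the "blue" edit scatters them over disjoint node pairs, and the sketch is taken to be a sum of edge weights per sampled subgraph. All names are hypothetical.

```python
import math
import random

def sketch_shift(added_edges, nodes, p, q, K, seed):
    """Euclidean distance between v(G) and v(G + added_edges).

    Assuming each sketch entry is a sum of edge weights, the shift depends
    only on the added unit-weight edges, not on the rest of G.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(K):
        src_subset = {n for n in nodes if rng.random() < p}
        dst_subset = {n for n in nodes if rng.random() < q}
        hits = sum(1 for s, d in added_edges
                   if s in src_subset and d in dst_subset)
        total += hits ** 2
    return math.sqrt(total)

m, nodes = 25, list(range(100))
# Assumed "red" edit: m edges concentrated on a single source (a star).
red_edges = [(0, 1 + i) for i in range(m)]
# Assumed "blue" edit: m edges scattered over disjoint node pairs.
blue_edges = [(2 * i, 2 * i + 1) for i in range(m)]

d_red = sketch_shift(red_edges, nodes, p=0.2, q=0.2, K=1000, seed=1)
d_blue = sketch_shift(blue_edges, nodes, p=0.2, q=0.2, K=1000, seed=1)
```

Under these assumptions the concentrated edit moves the sketch farther than the scattered one: a sampled star source brings many of its edges into the same dimension at once, while scattered edges are captured roughly independently.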

Thm 1: Focus-awareness in expectation

Randomized Experiment: Add ‘m’ unit-weight edges uniformly at random.

[Figure: G, GR and GB in the K-dim SpotLight space, with probability distributions over distance for $d_R$ and $d_B$ and their means marked.]

$E[d_B] < E[d_R]$

Focus-awareness property was introduced by Koutra, Vogelstein & Faloutsos [SDM 2013].

Thm 2: Criterion for anomaly detection

“EXPECTED” GAP

[Figure: probability distributions over distance for $d_R$ and $d_B$ that overlap around a decision threshold separating “normal” from “anomaly”; the overlap produces false negatives (FN) and false positives (FP).]

“HIGH PROBABILITY” GAP

[Figure: the two distributions separated by a gap $\epsilon$, with $\mathrm{FPR} \le \delta$.]

With sketch size $K \ge K^*$: $\Pr[d_R - d_B > \epsilon] \ge 1 - \delta$

EXPERIMENTS

The labeled DARPA dataset

‣ 4.5M edges in 87.7K time ticks
‣ 9.5K sources, 24K destinations
‣ Edges labeled as attack/not
‣ Stream of 1.5K hourly graphs (24% anomalous)

DARPA: Precision and recall

Precision = #graphs correctly flagged / #graphs flagged

Recall = #graphs correctly flagged / #anomalous graphs

Baselines:

‣ RHSS: (Ranshous, Harenburg, Sharma & Samatova, SDM 2016)
‣ STA: Streaming Tensor Analysis (Sun, Tao & Faloutsos, KDD 2006)
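The two metrics above can be computed directly from the set of flagged graphs and the set of truly anomalous graphs. The helper below is a minimal sketch; the function name and the example ids are hypothetical and are not results from the talk.

```python
def precision_recall(flagged, anomalous):
    """Return (precision, recall) for flagged graph ids vs. truly anomalous ids."""
    flagged, anomalous = set(flagged), set(anomalous)
    correct = len(flagged & anomalous)  # graphs correctly flagged
    precision = correct / len(flagged) if flagged else 0.0
    recall = correct / len(anomalous) if anomalous else 0.0
    return precision, recall

# Hypothetical example: 4 graphs flagged, 3 of them truly anomalous,
# 6 anomalous graphs overall.
precision, recall = precision_recall({1, 2, 3, 9}, {1, 2, 3, 4, 5, 6})
```

Returning 0.0 when nothing is flagged (or nothing is anomalous) is a convention chosen here to avoid division by zero; other conventions are possible.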

DARPA: Challenges and successes

‣ SpotLight (misses small attacks)
‣ Edge Weight = SL with K=p=q=1 (+ misses medium size attacks)
‣ RHSS = Edge likelihood function (+ misses repeated attacks)

CONCLUSION

Summary

PROBLEM

[Figure: graph snapshots along a time axis, most marked “Ok!” and one marked “anomaly!”]

SOLUTION

SpotLight sketching

‣ Real-time
‣ Memory efficient
‣ Theoretical guarantees

Future directions

MORE CHALLENGING ANOMALIES

‣ Slow and/or small attacks

‣ Sequence of suspicious events rather than a single event

STREAMING ANOMALY ATTRIBUTION

‣ Blame a small set of sources and destinations for the anomaly


Thank you! Questions:

[email protected]

[Figure: recap panels labeled ALGORITHM, THEORY and PRACTICE]