distributed clustering for robust aggregation in large networks

32
Distributed Clustering for Robust Aggregation in Large Networks Ittay Eyal, Idit Keidar, Raphi Rom Technion, Israel

Upload: clay

Post on 15-Jan-2016

45 views

Category:

Documents


0 download

DESCRIPTION

Distributed Clustering for Robust Aggregation in Large Networks. Ittay Eyal, Idit Keidar, Raphi Rom. Technion, Israel. 2. Aggregation in Sensor Networks – Applications. Temperature sensors thrown in the woods. Seismic sensors. Grid computing load. 3. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Distributed Clustering for Robust Aggregation in Large Networks

Distributed Clustering for Robust Aggregation in Large Networks

Ittay Eyal, Idit Keidar, Raphi Rom

Technion, Israel

Page 2: Distributed Clustering for Robust Aggregation in Large Networks

Aggregation in Sensor Networks – Applications

Temperature sensors thrown in the woods

Seismic sensors

Grid computing load

2

Page 3: Distributed Clustering for Robust Aggregation in Large Networks

Aggregation in Sensor Networks – Applications

• Large networks, light nodes, low bandwidth• Fault-prone sensors, network• Multi-dimensional (location X temperature)• Target is a function of all sensed data

Average temperature, max location, majority…

3

Page 4: Distributed Clustering for Robust Aggregation in Large Networks

What has been done?

Page 5: Distributed Clustering for Robust Aggregation in Large Networks

Tree Aggregation

Hierarchical solution

Fast - O(height of tree)

1 3 9 11

2 10

5

6

Page 6: Distributed Clustering for Robust Aggregation in Large Networks

Tree Aggregation

Hierarchical solution

Fast - O(height of tree)

Limited to static topology No failure robustness

1 3 9 11

2 1010

6

62

Page 7: Distributed Clustering for Robust Aggregation in Large Networks

Gossip

• D. Kempe, A. Dobra, and J. Gehrke. Gossip-based computation of aggregate information. In FOCS, 2003.

• S. Nath, P. B. Gibbons, S. Seshan, and Z. R. Anderson. Synopsis diffusion for robust aggregation in sensor networks. In SenSys, 2004.

Gossip:

Each node maintains a synopsis

7

11

9

3

1

Page 8: Distributed Clustering for Robust Aggregation in Large Networks

Gossip

• D. Kempe, A. Dobra, and J. Gehrke. Gossip-based computation of aggregate information. In FOCS, 2003.

• S. Nath, P. B. Gibbons, S. Seshan, and Z. R. Anderson. Synopsis diffusion for robust aggregation in sensor networks. In SenSys, 2004.

Gossip:

Each node maintains a synopsis

Occasionally, each node contacts a neighbor and they improve their synopses

8

11

9

3

1 7

5

5

7

Page 9: Distributed Clustering for Robust Aggregation in Large Networks

Gossip

• D. Kempe, A. Dobra, and J. Gehrke. Gossip-based computation of aggregate information. In FOCS, 2003.

• S. Nath, P. B. Gibbons, S. Seshan, and Z. R. Anderson. Synopsis diffusion for robust aggregation in sensor networks. In SenSys, 2004.

Gossip:

Each node maintains a synopsis

Occasionally, each node contacts a neighbor and they improve their synopses

Indifferent to topology changes Crash robust

No data error robustness

Proven convergence

9

7

5

5

7

6

6

6

6

Page 10: Distributed Clustering for Robust Aggregation in Large Networks

A closer look at the problem

Page 11: Distributed Clustering for Robust Aggregation in Large Networks

The Implications of Irregular Data

4A single erroneous sample can radically offset the data

31 106

27o

The average (47o) doesn’t tell the whole story

25o 26o 25o 28o 98o 120o 27o

11

Page 12: Distributed Clustering for Robust Aggregation in Large Networks

Sources of Irregular Data

Sensor Malfunction

Short circuit in a seismic sensor

Sensing Error

An animal sitting on a temperature sensor

Interesting Info: DDoS: Irregular load on some machines in a grid

Software bugs: In grid computing, a machine reports negative CPU usage

Interesting Info: Fire outbreak: Extremely high temperature in a certain area of the woods

Interesting Info: intrusion: A truck driving by a seismic detector

12

Page 13: Distributed Clustering for Robust Aggregation in Large Networks

It Would Help to Know The Data Distribution

27o

The average is 47o

Bimodal distribution with peaks at 26.3o and 109o

25o 26o 25o 28o 98o 120o 27o

13

Page 14: Distributed Clustering for Robust Aggregation in Large Networks

Estimate a range of distributions [1,2] or data clustering according to values [3,4]

Fast aggregation [1,2] Tolerate crash failures, dynamic networks [1,2] High bandwidth [3,4], multi-epoch [2,3,4] or One dimensional data only [1,2] No data error robustness [1,2]

Existing Distribution Estimation Solutions

1. M. Haridasan and R. van Renesse. Gossip-based distribution estimation in peer-to-peer networks. In InternationalWorkshop on Peer-to-Peer Systems (IPTPS 08), February 2008.

2. J. Sacha, J. Napper, C. Stratan, and G. Pierre. Reliable distribution estimation in decentralised environments. Submitted for Publication, 2009.

3. W. Kowalczyk and N. A. Vlassis. Newscast em. In Neural Information Processing Systems, 2004.4. N. A. Vlassis, Y. Sfakianakis, and W. Kowalczyk. Gossip-based greedy gaussian mixture learning. In

Panhellenic Conference on Informatics, 2005.

14

Page 15: Distributed Clustering for Robust Aggregation in Large Networks

Our Solution

Samples deviating from the distribution of the bulk of the data

Outliers:

15

Estimate a range of distributions by data clustering according to values

Fast aggregation Tolerate crash failures, dynamic networks Low bandwidth, single epoch Multi-dimensional data Data error robustness by outlier detection

Page 16: Distributed Clustering for Robust Aggregation in Large Networks

Outlier Detection Challenge

27o 25o 26o 25o 28o 98o 120o 27o

16

Page 17: Distributed Clustering for Robust Aggregation in Large Networks

Outlier Detection Challenge

A double bind:

27o 25o 26o 25o 28o 98o 120o 27o

Regular data distribution

~26o

Outliers{98o, 120o}

No one in the system has enough information

17

Page 18: Distributed Clustering for Robust Aggregation in Large Networks

Aggregating Data Into Clusters

• Each cluster has its own mean and mass • A bounded number (k) of clusters is maintained

Herek = 2

Original samples

1 3 5 10

1a b c d

1

Clustering a and b

1 3

a b

Clustering all

1 3 5 10

1

abc

d

3Clustering a, b and c

5 10

1

2

abc

2

1

18

Page 19: Distributed Clustering for Robust Aggregation in Large Networks

But What Does The Mean Mean?

New Sample

Mean A Mean B

The variance must be taken into account

Gaussian A

Gaussian B

19

Page 20: Distributed Clustering for Robust Aggregation in Large Networks

Gossip Aggregation of Gaussian Clusters

Distribution is described as k clusters Each cluster is described by: • Mass• Mean • Covariance matrix (variance for 1-d data)

20

Page 21: Distributed Clustering for Robust Aggregation in Large Networks

Gossip Aggregation of Gaussian Clusters

a

b

Merge

21

Keep half, Send half

Page 22: Distributed Clustering for Robust Aggregation in Large Networks

Distributed Clustering for Robust Aggregation 22

• Aggregate a mixture of Gaussian clusters• Merge when necessary (exceeding k)

Our solution:

Recognize outliers

By the time we need to merge, we can estimate the distribution

Page 23: Distributed Clustering for Robust Aggregation in Large Networks

Simulation Results 23

1. Data error robustness2. Crash robustness3. Elaborate multidimensional data

Simulation Results:

Page 24: Distributed Clustering for Robust Aggregation in Large Networks

It Works Where It Matters

Not Interesting

Easy

24

Page 25: Distributed Clustering for Robust Aggregation in Large Networks

It Works Where It Matters

Error

0 5 10 15 20 250

0.5

1E

rror

No outlier d

etection

With outlier detection

25

Page 26: Distributed Clustering for Robust Aggregation in Large Networks

Simulation Results 26

1. Data error robustness2. Crash robustness3. Elaborate multidimensional data

Simulation Results:

Page 27: Distributed Clustering for Robust Aggregation in Large Networks

Err

or

Round

Protocol is Crash Robust

0 10 20 30 40 500

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

Round

Ave

rage

Err

or

No outlier detection, 5% crash probability

No outlier detection, no crashes

Outlier detection

27

Page 28: Distributed Clustering for Robust Aggregation in Large Networks

Simulation Results 28

1. Data error robustness2. Crash robustness3. Elaborate multidimensional data

Simulation Results:

Page 29: Distributed Clustering for Robust Aggregation in Large Networks

Describe Elaborate Data

FireNo Fire

Distance

Tem

pera

ture

29

Page 30: Distributed Clustering for Robust Aggregation in Large Networks

The algorithm converges• Eventually all nodes have the same clusters forever• Note: this holds even without atomic actions

• The invariant is preserved by both send and receive

Theoretical Results (In Progress) 30

… to the “right” output• If outliers are “far enough” from other samples, then they

are never mixed into non-outlier clusters• They are discovered• They do not bias the good samples’ aggregate

(where it matters)

Page 31: Distributed Clustering for Robust Aggregation in Large Networks

Summary

Robust Aggregation requires outlier detection

27o 27o 27o 27o 27o98o 120o

31

We present outlier detection by Gaussian clustering:

Merge

Page 32: Distributed Clustering for Robust Aggregation in Large Networks

Summary – Our Protocol 32

Elaborate DataCrash Robustness

0 10 20 30 40 500

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

Round

Ave

rage

Err

or

Outlier Detection (where it matters)