
Page 1: Software-defined Measurement


Software-defined Measurement

Minlan Yu, University of Southern California

Joint work with Lavanya Jose, Rui Miao, Masoud Moshref, Ramesh Govindan, Amin Vahdat

Page 2: Software-defined Measurement

Management = Measurement + Control

• Accounting
– Count resource usage for tenants

• Traffic engineering
– Identify large traffic aggregates and traffic changes
– Understand flow characteristics (flow size, etc.)

• Performance diagnosis
– Why does my application have high delay or low throughput?

Page 3: Software-defined Measurement

Yet, measurement is underexplored

• Measurement is an afterthought in network devices
– Control functions are optimized with many resources
– Limited, fixed measurement support with NetFlow/sFlow

• Traffic analysis is incomplete and indirect
– Incomplete: may not catch all events from samples
– Indirect: offline analysis based on pre-collected logs

• A network-wide view of traffic is especially difficult
– Data are collected at different times and places

Page 4: Software-defined Measurement

Software-defined Measurement

• SDN offers unique opportunities for measurement
– Simple, reusable primitives at switches
– Diverse and dynamic analysis at the controller
– Network-wide view

[Figure: the controller runs measurement applications such as heavy hitter detection and change detection; it (1) configures switch resources, (2) fetches statistics, and then (re)configures resources (step 1 again) in a loop.]

Page 5: Software-defined Measurement

Challenges

• Diverse measurement tasks
– Generic measurement primitives for diverse tasks
– A measurement library for easy programming

• Limited resources at switches
– New data structures to reduce memory usage
– Multiplexing across many tasks

Page 6: Software-defined Measurement

Software-defined Measurement

OpenSketch (NSDI'13)
– Data plane primitives: sketch-based, built from commodity switch components
– Resource allocation across tasks: optimization with provable resource-accuracy bounds
– Prototype: open-source NetFPGA implementation + sketch library

DREAM (SIGCOMM'14)
– Data plane primitives: flow-based, using OpenFlow TCAM
– Resource allocation across tasks: dynamic allocation with an accuracy estimator
– Prototype: networks of hardware switches and Open vSwitch

Page 7: Software-defined Measurement

Software-defined Measurement with Sketches (NSDI'13)

Page 8: Software-defined Measurement


Software Defined Networking

API to the data plane (OpenFlow): rules of match fields, actions, and counters, e.g.
Fields: Src=1.2.3.4 | Action: drop | Counters: #packets, #bytes

Switches: forward and measure packets

Controller: configures devices and collects measurements

Rethink the abstractions for measurement
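To make the rule structure concrete, here is a minimal, hypothetical Python model of such an entry; the class and field names are ours for illustration, not an actual OpenFlow API.

```python
# A minimal model of the OpenFlow-style entry above: a match field,
# an action, and per-rule counters. Names are illustrative.

class FlowRule:
    def __init__(self, match_src, action):
        self.match_src = match_src   # e.g., "1.2.3.4"
        self.action = action         # e.g., "drop"
        self.packets = 0             # counters updated on every match
        self.bytes = 0

    def process(self, pkt):
        if pkt["src_ip"] == self.match_src:
            self.packets += 1
            self.bytes += pkt["bytes"]
            return self.action
        return None                  # no match: fall through to next rule

rule = FlowRule("1.2.3.4", "drop")
rule.process({"src_ip": "1.2.3.4", "bytes": 1500})
print(rule.packets, rule.bytes)      # 1 1500
```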

Page 9: Software-defined Measurement

Tradeoff of Generality and Efficiency

• Generality
– Supporting a wide variety of measurement tasks, e.g.:
– Who's sending a lot to 23.43.0.0/16?
– Is someone being DDoS-ed?
– How many people downloaded files from 10.0.2.1?

• Efficiency
– Enabling high link speeds (40 Gbps or more)
– Ensuring low cost (cheap switches with small memory)
– Easy to implement with commodity switch components

Page 10: Software-defined Measurement

NetFlow: General, Not Efficient

• Cisco NetFlow/sFlow
– Log sampled packets or flow-level counters

• General
– OK for many measurement tasks
– Not ideal for any single task

• Not efficient
– Hard to determine the right sampling rate
– Measurement accuracy depends on the traffic distribution
– Turned off, or not even available, in datacenters

Page 11: Software-defined Measurement

Streaming Algo: Efficient, Not General

• Streaming algorithms
– Summarize packet information with sketches
– E.g., Count-Min Sketch: who's sending a lot to host A?

• Not general: each algorithm answers just one question
– Requires customized hardware or network processors
– Hard to implement every solution in practice

[Figure: Count-Min Sketch. Data plane: three hash functions map "# bytes from 23.43.12.1" into three counter arrays. Control plane: a query for 23.43.12.1 reads one counter per array (5, 3, 4) and picks the minimum: 3.]
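A minimal Python sketch of a Count-Min Sketch as drawn above; the width, depth, and hash construction are illustrative choices, not what switch hardware would use.

```python
import random

class CountMinSketch:
    """Minimal Count-Min Sketch: `depth` rows of `width` counters."""

    def __init__(self, width=1024, depth=3, seed=42):
        rng = random.Random(seed)
        self.width = width
        # One hash seed per row; real switches would use simple
        # pairwise-independent hashes, not Python's hash().
        self.seeds = [rng.getrandbits(32) for _ in range(depth)]
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        return hash((self.seeds[row], key)) % self.width

    def update(self, key, count=1):
        # Data plane: each packet increments one counter per row.
        for r in range(len(self.rows)):
            self.rows[r][self._index(r, key)] += count

    def query(self, key):
        # Control plane: the row minimum upper-bounds the true count,
        # since collisions can only inflate counters.
        return min(row[self._index(r, key)] for r, row in enumerate(self.rows))

cms = CountMinSketch()
cms.update("23.43.12.1", 1500)   # e.g., a 1500-byte packet
cms.update("23.43.12.1", 1500)
print(cms.query("23.43.12.1"))   # 3000, unless a collision inflated it
```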

Page 12: Software-defined Measurement

Where is the Sweet Spot?

[Figure: generality vs. efficiency. NetFlow/sFlow is general but too expensive; streaming algorithms are efficient but not practical.]

OpenSketch
• A general and efficient data plane based on sketches
• A modularized control plane with automatic configuration

Page 13: Software-defined Measurement

Flexible Measurement Data Plane

• Picking the packets to measure
– Hashes to represent a compact set of flows (e.g., a set of blacklisted IPs)
– Classify flows with different resources/accuracy (e.g., filter out traffic for 23.43.0.0/16)

• Storing and exporting the data
– A table with flexible indexing
– Complex indexing using hashes and classification
– Diverse mappings between counters and flows

Page 14: Software-defined Measurement

A three-stage pipeline (modeled in code below)
– Hashing: a few hash functions on the packet source
– Classification: based on hash values or packet fields
– Counting: update a few counters with simple calculations

[Figure: the Count-Min Sketch from the previous slide, realized by this pipeline.]
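A minimal software model of this hash/classify/count pipeline; the rule and table layouts are hypothetical stand-ins for the hardware stages.

```python
# Stage 1: hashing -- a few hash functions over the packet source.
def make_hash(seed):
    return lambda key: hash((seed, key)) & 0xFFFFFFFF

HASH_FNS = [make_hash(s) for s in (1, 2, 3)]

# Stage 2: classification -- a TCAM-like rule, here matching
# destinations in 23.43.0.0/16 and selecting logical table 0.
def match_23_43(pkt, digests):
    return pkt["dst_ip"].startswith("23.43.")

RULES = [(match_23_43, 0)]          # (match function, logical table id)
TABLES = {0: [0] * 1024}            # logical counter tables in SRAM

def process(pkt):
    digests = [h(pkt["src_ip"]) for h in HASH_FNS]      # stage 1
    for match, tid in RULES:                            # stage 2
        if match(pkt, digests):
            table = TABLES[tid]
            for d in digests:                           # stage 3: counting
                table[d % len(table)] += pkt["bytes"]
            return

process({"src_ip": "10.0.0.1", "dst_ip": "23.43.12.1", "bytes": 1500})
```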

Page 15: Software-defined Measurement

Build on Existing Switch Components

• A few simple hash functions
– 4-8 three-wise or five-wise independent hash functions
– Leverage traffic diversity to approximate truly random functions

• A few TCAM entries for classification
– Match on both packets and hash values
– Avoid matching on individual micro-flow entries

• Flexible counters in SRAM
– Many logical tables for different sketches
– Different numbers and sizes of counters
– Counters accessed by address

Page 16: Software-defined Measurement

Modularized Measurement Library

• A measurement library of sketches
– Bitmap, Bloom filter, Count-Min Sketch, etc.
– Easy to implement with the data plane pipeline
– Supports diverse measurement tasks

• Implementing heavy hitters with OpenSketch (sketched below)
– Who's sending a lot to 23.43.0.0/16?
– A count-min sketch to count flow volumes
– A reversible sketch to identify the flows with heavy counts in the count-min sketch
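A short sketch of the heavy-hitter query, reusing the CountMinSketch class from the earlier example. OpenSketch recovers heavy keys with a reversible sketch; tracking candidate keys explicitly, as below, is a simplification for illustration only, not the paper's data structure.

```python
candidates = set()

def on_packet(cms, src_ip, nbytes):
    cms.update(src_ip, nbytes)       # count volume per source
    candidates.add(src_ip)           # stand-in for the reversible sketch

def heavy_hitters(cms, threshold):
    # Report sources whose estimated volume exceeds the threshold.
    return {k: cms.query(k) for k in candidates if cms.query(k) > threshold}
```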

Page 17: Software-defined Measurement

Support Many Measurement Tasks

Measurement program           | Building blocks                                | Lines of code
Heavy hitters                 | Count-min sketch; reversible sketch            | Config: 10, Query: 20
Superspreaders                | Count-min sketch; bitmap; reversible sketch    | Config: 10, Query: 14
Traffic change detection      | Count-min sketch; reversible sketch            | Config: 10, Query: 30
Traffic entropy on port field | Multi-resolution classifier; count-min sketch  | Config: 10, Query: 60
Flow size distribution        | Multi-resolution classifier; hash table        | Config: 10, Query: 109

Page 18: Software-defined Measurement

Resource management

• Automatic configuration within a task
– Pick the right sketches for the measurement task
– Allocate resources across sketches
– Based on provable resource-accuracy curves

• Resource allocation across tasks (sketched below)
– Operators simply specify the relative importance of tasks
– Minimize weighted error using convex optimization
– Decompose into optimization problems for individual tasks
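A toy version of the weighted-error minimization. The 1/m error curve below is an illustrative stand-in for the paper's provable resource-accuracy bounds; with that model the convex program has a closed form, m_i proportional to sqrt(w_i * c_i).

```python
import math

def allocate(total_mem, weights, costs):
    """Minimize sum_i w_i * c_i / m_i subject to sum_i m_i = total_mem.
    The error model err_i = c_i / m_i is illustrative; the KKT
    conditions give m_i proportional to sqrt(w_i * c_i)."""
    scores = [math.sqrt(w * c) for w, c in zip(weights, costs)]
    total = sum(scores)
    return [total_mem * s / total for s in scores]

# Two tasks, the first twice as important as the second.
print(allocate(4096, weights=[2.0, 1.0], costs=[1.0, 1.0]))
```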

Page 19: Software-defined Measurement

OpenSketch Architecture

Page 20: Software-defined Measurement

Evaluation

• Prototype on NetFPGA
– No effect on data plane throughput
– Line-speed measurement performance

• Trace-driven simulators
– OpenSketch, NetFlow, and streaming algorithms
– One-hour CAIDA packet traces from a backbone link

• Tradeoff between generality and efficiency
– How efficient is OpenSketch compared to NetFlow?
– How accurate is OpenSketch compared to task-specific streaming algorithms?

Page 21: Software-defined Measurement

Heavy Hitters: false positives/negatives

• Identify flows taking > 0.5% of bandwidth

OpenSketch requires less memory with higher accuracy

Page 22: Software-defined Measurement

Tradeoff Efficiency for Generality

In theory, OpenSketch requires 6x the memory of a complex streaming algorithm

Page 23: Software-defined Measurement

OpenSketch Conclusion

• OpenSketch: bridging the gap between theory and practice

• Leveraging the good properties of sketches
– Provable accuracy-memory tradeoffs

• Making sketches easy to implement and use
– Generic support for different measurement tasks
– Easy to implement with commodity switch hardware
– Modularized library for easy programming

Page 24: Software-defined Measurement

Dynamic Resource Allocation for TCAM-based Measurement (SIGCOMM'14)

Page 25: Software-defined Measurement

SDM Challenges

[Figure: the controller runs many management tasks (several heavy hitter detection instances, change detection) on top of a dynamic resource allocator, repeating the loop: (1) configure resources, (2) fetch statistics, (re)configure resources.]

Many management tasks, limited resources (TCAM)

Page 26: Software-defined Measurement

Dynamic Resource Allocator

• Diminishing returns of resources
– More resources yield smaller accuracy gains
– More resources find less significant outputs
– Operators can accept an accuracy bound below 100%

[Figure: recall (detected true HHs / all HHs) vs. resources (256 to 2048 counters), rising from 0 to 1 with diminishing returns.]

Page 27: Software-defined Measurement

Dynamic Resource Allocator

• Temporal and spatial resource multiplexing
– Traffic varies over time and across switches
– The resources needed for an accuracy bound depend on the traffic


Page 28: Software-defined Measurement

Challenges

• No ground truth for resource-accuracy curves
– Hard to apply traditional convex optimization
– Need new ways to estimate accuracy on the fly
– Adaptively increase/decrease resources accordingly

• Spatial and temporal changes
– Task and traffic dynamics
– Coordinate multiple switches to keep a task accurate
– Spatial and temporal resource adaptation

Page 29: Software-defined Measurement

Dynamic Resource Allocator

[Figure: each task (heavy hitter detection instances, change detection) reports its estimated accuracy to the dynamic resource allocator in the controller, which returns an allocated resource share per switch.]

• Decompose the resource allocator across switches
– Each switch separately increases/decreases resources
– When and how should resources change?

Page 30: Software-defined Measurement

Per-switch Resource Allocator: When?

• When does a task on a switch need more resources?
– A's local accuracy (25%) alone is not enough: if the bound is 40%, there is no need to increase A's resources
– The global accuracy (47%) alone is not enough: if the bound is 80%, increasing B's resources is not helpful
– Conclusion: increase when max(local, global) < accuracy bound (see the sketch below)

[Figure: a heavy hitter detection task spans switches A and B. A detects 5 of 20 HHs (local accuracy 25%); B detects 9 of 10 (local accuracy 90%); the controller sees 14 of 30 detected (global accuracy 47%).]
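The decision rule is simple enough to state in a few lines of Python; the numbers below come from the example on this slide.

```python
# Ask for more resources only when both the task's local accuracy on
# this switch and its global accuracy are below the operator's bound.

def needs_more_resources(local_acc, global_acc, bound):
    return max(local_acc, global_acc) < bound

# Switch A from the example (local 25%, global 47%):
print(needs_more_resources(0.25, 0.47, 0.40))  # False: global already meets 40%
print(needs_more_resources(0.25, 0.47, 0.80))  # True: both below 80%
# Switch B (local 90%): never needs more at an 80% bound.
print(needs_more_resources(0.90, 0.47, 0.80))  # False
```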

Page 31: Software-defined Measurement

Per-Switch Resource Allocator: How?

• How to adapt resources?
– Take from rich tasks, give to poor tasks

• How much resource to take or give?
– Adaptive change steps for fast convergence (sketched below)
– Small steps close to the bound, large steps otherwise

[Figure: allocated resource vs. time (0-500 s) under different adaptation policies (legend: Goal, MM, AM, AA, MA), showing convergence toward the goal allocation.]
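A rough sketch of the take-from-rich, give-to-poor adaptation with an accuracy-dependent step size; the step formula and task representation are our own illustration, not DREAM's exact algorithm.

```python
def change_step(accuracy, bound, max_step=256, min_step=8):
    # Large steps far from the bound for fast convergence,
    # small steps near it to avoid oscillation.
    gap = abs(bound - accuracy)
    return max(min_step, int(max_step * gap))

def rebalance(tasks, bound):
    """tasks: dict name -> {'accuracy': float, 'resources': int}."""
    rich = [t for t in tasks.values() if t["accuracy"] > bound]
    poor = [t for t in tasks.values() if t["accuracy"] < bound]
    for giver in rich:
        for taker in poor:
            step = min(change_step(giver["accuracy"], bound),
                       change_step(taker["accuracy"], bound))
            giver["resources"] -= step
            taker["resources"] += step

tasks = {"hh": {"accuracy": 0.95, "resources": 1024},
         "change": {"accuracy": 0.60, "resources": 256}}
rebalance(tasks, bound=0.80)
print(tasks)  # resources shift from the rich task to the poor one
```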

Page 32: Software-defined Measurement

Task Implementation

[Figure: the controller architecture again: each task implements the configure-resources / fetch-statistics loop and exchanges estimated accuracy and allocated resources with the dynamic resource allocator.]

Page 33: Software-defined Measurement

Flow-based algorithms using TCAM

• Goal: maximize accuracy given limited resources
• A general resource-aware algorithm (a toy model follows the figure below)
– Different tasks: e.g., HH, HHH, change detection
– Multiple switches: e.g., HHs from different switches
• Assumption: each flow is seen at one switch (e.g., at the source)

[Figure: a prefix tree over 3-bit source IPs (leaves 000-111) with per-prefix counts: root 36 splits into 26 and 10; 26 splits into 12 and 14; leaf counts include 5, 7, 12, 2, 5, 5. The current TCAM entries monitor 0** and 1**; the new configuration divides them into 00*, 01*, 10*, and 11*.]
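A toy model of the divide/merge moves on this prefix tree; prefixes are bit strings, so "0" stands for 0**, "00" for 00*, and so on.

```python
def divide(monitored, prefix):
    # Replace one entry with its two children: finer counts,
    # one extra TCAM entry.
    monitored.remove(prefix)
    monitored.update({prefix + "0", prefix + "1"})

def merge(monitored, prefix):
    # Replace two sibling entries with their parent, freeing an entry.
    monitored.difference_update({prefix + "0", prefix + "1"})
    monitored.add(prefix)

monitored = {"0", "1"}       # current: 0**, 1**
divide(monitored, "0")       # new: 00*, 01*, 1**
print(sorted(monitored))     # ['00', '01', '1']
```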

Page 34: Software-defined Measurement

Divide & Merge at Multiple Switches

• Divide: monitor the children to increase accuracy
– Requires more resources on a set of switches
– Example: needs an additional entry on switch B

• Merge: monitor the parent to free resources
– Each node keeps the set of switches it would free after a merge
– Finding the least important prefixes to merge is a minimum set cover problem

[Figure: prefix 0** (count 26) with children 00* (12) and 01* (14), annotated with the switch sets freed by merging: {A,B}, {B,C}, {A,B,C}. Current entries: A: 0**, B: 0**, C: 0**. New entries after dividing: A: 00*; B: 00*, 01*; C: 01*.]

Page 35: Software-defined Measurement

Accuracy Estimation: Heavy Hitter Detection

• Any monitored leaf with volume > threshold is a true HH
• Recall
– Estimate the number of missed HHs using the volume and level of each counter (sketched below)

[Figure: prefix tree with threshold = 10. Root 76 splits into 26 and 50; 26 splits into 12 and 14, with leaf counts 5, 7, 12, 2; 50 splits into 15 and 35, with leaf counts including 20 and 15. An unexpanded counter of size 26 hides at most 2 missed HHs (26 / 10); a counter at level 2 likewise hides at most 2 missed HHs.]
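An illustrative reconstruction of the recall estimator: an unexpanded counter of volume V sitting h levels above the leaves can hide at most min(V // threshold, 2**h) heavy hitters. The function names and the detected-count in the example are ours.

```python
def max_missed(volume, height, threshold):
    # A counter of this volume can hide at most volume // threshold HHs,
    # and never more than the 2**height leaves it covers.
    return min(volume // threshold, 2 ** height)

def estimated_recall(detected, internal_counters, threshold):
    """internal_counters: (volume, height) pairs for non-leaf counters."""
    missed = sum(max_missed(v, h, threshold) for v, h in internal_counters)
    total = detected + missed
    return detected / total if total else 1.0

# From the figure: threshold 10, a counter of size 26 two levels up
# hides at most min(26 // 10, 4) = 2 HHs.
print(estimated_recall(detected=4, internal_counters=[(26, 2)], threshold=10))
```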

Page 36: Software-defined Measurement

DREAM Overview

[Figure: the DREAM SDN controller hosts task objects 1..n and a resource allocator. Control loop: (1) instantiate task, (2) accept/reject, (3) configure counters, (4) fetch counters, (5) report, (6) estimate accuracy, (7) allocate/drop.]

A task is specified by:
• Task type (heavy hitter, hierarchical heavy hitter, change detection)
• Task-specific parameters (e.g., the HH threshold)
• Packet header field (e.g., source IP)
• Filter (e.g., src IP = 10/24, dst IP = 10.2/16)
• Accuracy bound (e.g., 80%)

Prototype: the DREAM algorithms implemented on Floodlight and Open vSwitch

Page 37: Software-defined Measurement

Evaluation

• Evaluation goals
– How accurate are tasks in DREAM? Satisfaction: the fraction of a task's lifetime spent above the given accuracy
– How many more accurate tasks can DREAM support? Measured as the % of rejected/dropped tasks
– How fast is the DREAM control loop?

• Baselines
– Equal: divide resources equally at each switch, never reject
– Fixed: give 1/n of resources to each task, reject extra tasks

Page 38: Software-defined Measurement

Prototype Results

[Figure: % of tasks vs. switch capacity (512-4096 TCAM entries), showing mean and 5th-percentile satisfaction along with DREAM-reject, Fixed-reject, and DREAM-drop rates. Setup: 256 tasks of various types on 8 switches.]

• DREAM: high satisfaction for the mean and 5th percentile of tasks, with low rejection
• Fixed: high rejection, as it over-provisions for small tasks
• Equal: only keeps small tasks satisfied

Page 39: Software-defined Measurement

Prototype Results (continued)

[Figure: % of tasks vs. switch capacity (512-4096), as on the previous slide.]

• DREAM: high satisfaction for the mean and 5th percentile of tasks, at the expense of more rejection
• Equal & Fixed: only keep small tasks satisfied

Page 40: Software-defined Measurement

Control Loop Delay

• Allocation delay is negligible compared to the other delays
• Incremental saving reduces the save delay

Page 41: Software-defined Measurement

DREAM Conclusion

• Challenges of software-defined measurement
– Diverse and dynamic measurement tasks
– Limited resources at switches

• Dynamic resource allocation across tasks
– Accuracy estimators for TCAM-based algorithms
– Spatial and temporal resource multiplexing

Page 42: Software-defined Measurement

Summary

• Software-defined measurement
– Measurement is important, yet underexplored
– SDN brings new opportunities to measurement
– It is time to rebuild the entire measurement stack

• Our work
– OpenSketch: generic, efficient measurement with sketches
– DREAM: dynamic resource allocation across many tasks

Page 43: Software-defined Measurement


Thanks!