Network Telemetry through Tomography - Stanford University

Post on 26-May-2022

Yilong Geng¹, Shiyu Liu¹, Zi Yin¹, Ashish Naik², Balaji Prabhakar¹, Mendel Rosenblum¹, and Amin Vahdat²

¹ Stanford University   ² Google Inc.

Motivation

• Want to observe and monitor network and application performance:
  • Connectivity and reachability
  • Regression tests for system changes/updates
  • Root-cause analysis of performance degradation
  • Security: monitor bad traffic

• Current approaches:
  • Switches report detailed stats:
    • Per-queue counters
    • Per-packet measurements
  • Expensive, power-hungry, require bandwidth to ship sensed data, need same vendor

• Our approach:
  • Telemetry: sense at the edge and reconstruct switch/link utilizations
  • More scalable:
    • No per-queue counters or per-packet measurement
    • No extra network traffic due to sensed data
  • Software-based, no hardware upgrade of network

Network Telemetry through Tomography

• Network tomography

• Network tomography with LASSO

• Network tomography with neural networks

• Network microscopy

➢ For each probe:

➢ Combine all probes:

➢ Solve for queueing delays:
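One way to read these three steps, consistent with the LASSO objective given on this poster (symbols D, A, Q, and α), is sketched below. The per-probe notation (d_i, a_ij, q_j, the noise term ε) and the interpretation of A as a 0/1 path-incidence matrix are assumptions for illustration, not taken from the poster.

```latex
% Assumed notation: probe i traverses a set of queues; a_{ij} = 1 if queue j is on its path.
% For each probe: the measured queueing delay is the sum of the delays of the queues it crosses.
d_i = \sum_{j} a_{ij}\, q_j + \varepsilon_i, \qquad a_{ij} \in \{0, 1\}

% Combine all probes: stack the per-probe equations into one linear system.
D = A Q + \varepsilon

% Solve for queueing delays: sparse recovery with an L1 penalty (LASSO).
\hat{Q} = \arg\min_{Q} \; \|D - AQ\|_2^2 + \alpha \|Q\|_1
```

Under this reading, the L1 penalty encodes the expectation that only a few queues are congested at any instant, so most entries of Q should be zero.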

System Pipeline

• Network tomography with packet counts

[Figure: relative error of reconstruction (axis 0% to 30%) for queue length, queue breakdown, and link utilization. Series: 10 ms counts, 5 ms counts, 2 ms counts, 1 ms counts, LASSO + per-packet.]

Cost for reconstruction of a 10 ms interval:

Method                        Computation time                        Storage space
10 ms counts                  0.13 ms                                 10.6 KB
5 ms counts                   0.26 ms                                 22.0 KB
2 ms counts                   0.58 ms                                 68.6 KB
1 ms counts                   1.14 ms                                 194.1 KB
LASSO + per-pkt injection     1.8 ms (LASSO), 40.7 ms (Microscopy)    1720.4 KB

Network tomography results (Stanford testbed)

$$\hat{Q} = \arg\min_{Q} \; \|D - AQ\|_2^2 + \alpha \|Q\|_1$$
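A minimal sketch of how this objective could be minimized, using cyclic coordinate descent in plain Python. The solver choice, the function names, and the toy routing matrix are all assumptions for illustration; the poster does not say which LASSO solver was used.

```python
def soft_threshold(x, t):
    """Shrink x toward zero by t (the proximal operator of t * |x|)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def lasso_coordinate_descent(A, D, alpha, sweeps=200):
    """Minimize ||D - A Q||_2^2 + alpha * ||Q||_1 by cyclic coordinate descent.

    A: list of rows, one per probe (hypothetical 0/1 path-incidence matrix).
    D: list of measured per-probe delays.
    """
    n_probes, n_queues = len(A), len(A[0])
    Q = [0.0] * n_queues
    residual = list(D)  # equals D - A Q, since Q starts at zero
    col_sq = [sum(A[i][j] ** 2 for i in range(n_probes)) for j in range(n_queues)]
    for _ in range(sweeps):
        for j in range(n_queues):
            if col_sq[j] == 0.0:
                continue  # no probe crosses this queue; leave it at zero
            # Correlation of column j with the residual that excludes queue j.
            rho = sum(A[i][j] * (residual[i] + A[i][j] * Q[j]) for i in range(n_probes))
            new_q = soft_threshold(rho, alpha / 2.0) / col_sq[j]
            delta = new_q - Q[j]
            if delta != 0.0:
                for i in range(n_probes):
                    residual[i] -= A[i][j] * delta
                Q[j] = new_q
    return Q


# Toy example with made-up numbers: 4 probes over 3 queues, only queue 1 congested.
A = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1],
     [1, 1, 1]]
true_Q = [0.0, 5.0, 0.0]
D = [sum(a * q for a, q in zip(row, true_Q)) for row in A]
Q_hat = lasso_coordinate_descent(A, D, alpha=0.01)
```

With a small α the estimate stays close to the true delays while the uncongested queues are driven to exactly zero. Library solvers often scale the squared-error term differently, so an α tuned here would not carry over unchanged.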

[Figure panels: queue length breakdown; link utilization breakdown.]

Algorithm             Relative error
Pseudo-inverse        44%
Linear regression     35%
LASSO regression      9%
ReLU NN               7.4%

Network tomography results (Google testbed)

➢ Use NN to approximate LASSO

• Goal: reduce the amount of data needed for tomography

• Don’t need to keep timestamps of all packets

• Simply use packet/byte counts

➢ Alerts and Replica

➢ Network tomography
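The count-based input described above can be illustrated with a small sketch that replaces per-packet timestamps with packet and byte counts per fixed-width bin. The function name, the trace, and the choice of a 10 ms window are hypothetical; the poster does not specify how its counts were collected.

```python
def bin_counts(timestamps_ms, sizes_bytes, window_ms, bin_ms):
    """Summarize packets in [0, window_ms) as per-bin packet and byte counts."""
    n_bins = int(round(window_ms / bin_ms))
    pkt_counts = [0] * n_bins
    byte_counts = [0] * n_bins
    for t, size in zip(timestamps_ms, sizes_bytes):
        if 0 <= t < window_ms:
            b = min(int(t // bin_ms), n_bins - 1)  # clamp guards against rounding at the edge
            pkt_counts[b] += 1
            byte_counts[b] += size
    return pkt_counts, byte_counts


# Made-up packet trace for one 10 ms interval (arrival time in ms, size in bytes).
timestamps_ms = [0.2, 0.9, 1.1, 4.7, 4.8, 9.99]
sizes_bytes = [1500, 64, 1500, 1500, 64, 1500]

pkts_5ms, bytes_5ms = bin_counts(timestamps_ms, sizes_bytes, window_ms=10, bin_ms=5)
pkts_1ms, bytes_1ms = bin_counts(timestamps_ms, sizes_bytes, window_ms=10, bin_ms=1)
```

Coarser bins keep fewer numbers per interval (2 values at 5 ms versus 10 at 1 ms in this example), which matches the direction of the storage figures in the table: finer counts cost more space but retain more timing detail.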