Post on 02-Nov-2014


Page 1: PPT

A Machine Learning-based Approach for Estimating Available Bandwidth

Ling-Jyh Chen¹, Cheng-Fu Chou² and Bo-Chun Wang²

¹Academia Sinica
²National Taiwan University

Page 2: PPT

Definition

Link Capacity: maximum IP-layer throughput that a flow can get, without any cross traffic.

Available Bandwidth: maximum IP-layer throughput that a flow can get, given (stationary) cross traffic.
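As a rough illustration of the two definitions, the sketch below computes the capacity and the available bandwidth of a multi-hop path. It assumes each link can be summarized by its capacity and an average cross-traffic utilization, and uses the common fluid approximation C(1 - u) for the unused capacity of a link. The `Link` type, its field names, and the sample numbers are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Link:
    capacity_mbps: float  # raw capacity of the link
    utilization: float    # fraction of capacity used by cross traffic (0..1)


def path_capacity(links):
    """Throughput a flow could get with no cross traffic: the narrow link."""
    return min(link.capacity_mbps for link in links)


def path_available_bandwidth(links):
    """Throughput a flow could get given cross traffic: the tight link."""
    return min(link.capacity_mbps * (1.0 - link.utilization) for link in links)


# Hypothetical three-hop path: the narrow link (50 Mbps) is idle,
# while the first link is the tight one (25 Mbps unused).
path = [Link(100.0, 0.75), Link(1000.0, 0.5), Link(50.0, 0.0)]
capacity = path_capacity(path)
available = path_available_bandwidth(path)
```

Note that the link limiting capacity and the link limiting available bandwidth need not be the same hop, which is why the two quantities are defined separately.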

Page 3: PPT

Related Work

Statistical cross-traffic models:
  Measure the time interval between the arrival of any two successive probe packets at the receiver and use the dispersion measurements to estimate the available bandwidth.
  E.g., Delphi, IGI, Spruce

Self-induced congestion models:
  Based on the intuition that if the probing rate is lower than the available bandwidth, the probe packets will not experience additional queueing delay during transmission.
  E.g., TOPP, Pathload, pathChirp

However, these approaches are either inaccurate or intrusive.
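To make the dispersion idea behind the statistical cross-traffic models more concrete, here is a minimal sketch of a probe gap calculation in the style of Spruce: the growth of the gap between two probe packets is attributed to cross traffic that arrived at the bottleneck in between. The function name and the sample values are hypothetical, the bottleneck capacity is assumed to be known in advance, and real tools add pacing, filtering, and many more probe pairs.

```python
def gap_model_estimate(capacity_mbps, gap_in, gaps_out):
    """Average available-bandwidth estimate from probe-pair dispersions.

    gap_in   -- sender-side gap between the two packets of each pair
    gaps_out -- receiver-side gaps, one per pair, in the same time unit
    """
    estimates = []
    for gap_out in gaps_out:
        # Fraction of the bottleneck consumed by cross traffic during the gap.
        cross_fraction = (gap_out - gap_in) / gap_in
        estimates.append(capacity_mbps * (1.0 - cross_fraction))
    return sum(estimates) / len(estimates)


# Hypothetical measurement: 1.0 ms input gap, three observed output gaps (ms).
estimate = gap_model_estimate(100.0, 1.0, [1.0, 1.5, 2.0])
```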

Page 4: PPT

Our Contribution

We propose a machine learning-based approach for accurate available bandwidth estimation.

The proposed approach can estimate available bandwidth even if there is no sample with similar properties to the measured path in the training dataset.

Using a set of simulations, we show that the proposed approach is fast, accurate, and non-intrusive.

Page 5: PPT

Basic Ideas

The fact:
  Due to the diversity and dynamics of the Internet, collecting and verifying the correctness of data of such a large-scale network is hard.

Our ideas:
  Create a representative network in the network simulator using realistic network traces with well-established network traffic models.
  Probe the network using an effective probing model and collect training data for the machine learning tool.
  Estimate the available bandwidth using the well-trained machine learning tool.

Page 6: PPT

System Settings

Network Scenarios
Topology: Tiscali topology of Rocketfuel’s trace [13]
  750 links and 506 nodes (221 are end-users)
We assume
  Propagation delay: uniformly distributed between [10,20] ms
  Buffer size of each link is 20
  50% of end-users are ADSL (3M/1Mbps), and the remainder are academic networks (100Mbps)
  Core router links are 1Gbps

Page 7: PPT

System Settings

Network Scenarios
Traffic: based on measurement results

[7] T. Karagiannis, K. Papagiannaki, and M. Faloutsos. BLINC: multilevel traffic classification in the dark. In ACM SIGCOMM, 2005.

Page 8: PPT

System Settings

Network Scenarios
Traffic: based on measurement results

[12] M. Roughan, S. Sen, O. Spatscheck, and N. Duffield. Class-of-service mapping for QoS: a statistical signature-based approach to IP traffic classification. In ACM IMC, 2004.

Page 9: PPT

System Settings

Probing Models
Packet Train model
  Send k packets in a burst (back-to-back)
  k-1 dispersions observed
  k = 11
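The relation between a k-packet train and its k-1 dispersions can be sketched as follows. The timestamps are made-up receiver arrival times in milliseconds, and the helper name `dispersions` is an assumption for illustration only.

```python
def dispersions(arrival_times):
    """Return the k-1 inter-arrival gaps of a k-packet train."""
    return [later - earlier
            for earlier, later in zip(arrival_times, arrival_times[1:])]


K = 11  # train length used in the slides

# Hypothetical receiver timestamps (ms): packets arrive 1.2 ms apart.
arrivals_ms = [i * 1.2 for i in range(K)]
gaps = dispersions(arrivals_ms)
```

With k = 11 this yields the 10 dispersion values that later appear as features of each sample.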

Page 10: PPT

System Settings

Probing Models
pathChirp-like model
  Send a chirp of fifteen packets each time
  The lowest sending rate is five percent of the bottleneck capacity
  Spread factor γ=1.2
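One way to picture such a chirp is to compute its sender-side inter-packet gaps. The sketch below assumes the usual pathChirp convention that each successive gap shrinks by the spread factor, so the instantaneous probing rate grows geometrically from the lowest rate. The 1000-byte packet size and the function name are hypothetical; the slides do not state the packet size.

```python
def chirp_gaps(bottleneck_mbps, n_packets=15, lowest_fraction=0.05,
               spread=1.2, packet_bytes=1000):
    """Sender-side gaps (seconds) between consecutive packets of one chirp."""
    lowest_rate_bps = lowest_fraction * bottleneck_mbps * 1e6
    packet_bits = packet_bytes * 8  # packet size is an assumed value
    gaps = []
    for i in range(n_packets - 1):
        # The probing rate grows by the spread factor from one gap to the next.
        rate_bps = lowest_rate_bps * spread ** i
        gaps.append(packet_bits / rate_bps)
    return gaps


# Hypothetical 100 Mbps bottleneck: the first gap corresponds to 5 Mbps.
gaps = chirp_gaps(100.0)
```

A chirp of fifteen packets therefore produces fourteen gaps, each probing a different rate within a single burst.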

Page 11: PPT

System Settings

Machine Learning Tools
  Unsupervised learning: EM, K-means clustering
  Supervised learning: k-NN, SVM

We use SVM in this study, because
  It can handle missing data caused by packet loss.
  It can interpolate/extrapolate the system output.
  The computation overhead is affordable.

Page 12: PPT

Performance Evaluation

Packet Train model
pathChirp-like model
Comparison with other tools
Scale-Free approach

Page 13: PPT

Evaluation: Packet Train Model

Each sample is the probing result of a randomly selected node pair.

16,000 samples as the training data.

1,500 samples as the test data.

Each sample of the training data comprises 13 properties: 10 dispersions, hop count, bottleneck capacity, and the tightest link’s available bandwidth.

Each sample of the test data contains all of the above information, except the available bandwidth.
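The layout of a training sample and a test sample can be sketched as plain feature lists. The function names, the ordering of the fields, and the numeric values are assumptions for illustration; the slides only state which 13 properties a training sample contains.

```python
N_DISPERSIONS = 10  # an 11-packet train yields 10 dispersions


def make_test_sample(dispersions, hop_count, bottleneck_mbps):
    """12 input properties: dispersions, hop count, bottleneck capacity."""
    if len(dispersions) != N_DISPERSIONS:
        raise ValueError("expected %d dispersions" % N_DISPERSIONS)
    return list(dispersions) + [hop_count, bottleneck_mbps]


def make_training_sample(dispersions, hop_count, bottleneck_mbps,
                         available_mbps):
    """13 properties: the inputs plus the available bandwidth as the label."""
    inputs = make_test_sample(dispersions, hop_count, bottleneck_mbps)
    return inputs + [available_mbps]


# Hypothetical probe result for one node pair.
observed = [1.2] * N_DISPERSIONS
train = make_training_sample(observed, 7, 3.0, 1.8)
test_sample = make_test_sample(observed, 7, 3.0)
```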

Page 14: PPT

Evaluation: Packet Train Model

The results are divided into three groups based on their bottleneck link capacity.

Page 15: PPT

Evaluation: pathChirp-like Model

Each chirp consists of 15 packets with a spread factor γ=1.2.

16,000 samples as the training data.

1,500 samples as the test data.

Each sample of the training data comprises 17 properties: 14 dispersions, hop count, bottleneck capacity, and the tightest link’s available bandwidth.

Each sample of the test data contains all of the above information, except the available bandwidth.

Page 16: PPT

Evaluation: pathChirp-like Model

The results are divided into two groups based on their bottleneck link capacity.

Page 17: PPT

Comparing with Other Tools

Compare the proposed approach using the pathChirp-like model with pathChirp and Spruce.

Run 1,500 tests for both pathChirp and Spruce in the same network scenario.

Page 18: PPT

Comparing with Other Tools

The results are divided into two groups based on their bottleneck link capacity.

Page 19: PPT

Scale-Free Approach

The proposed approach collects training data from a very limited network scenario.

Building a database covering all types of Internet scenarios is prohibitively expensive.

We propose a Scale-Free approach to normalize all properties in our system.
  The dispersion measurements are divided by the initial inter-packet gap.
  The observed available bandwidth is replaced with the utilization of the bottleneck link.
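The two normalization steps can be sketched as below. This is only an illustration under stated assumptions: utilization is taken to be 1 - A/C, where A is the available bandwidth and C the bottleneck capacity, a single initial gap is used for the whole probe, and all names and numbers are hypothetical.

```python
def normalize_sample(dispersions, initial_gap, available_mbps,
                     bottleneck_mbps):
    """Turn absolute measurements into scale-free quantities."""
    # Dispersions become ratios relative to the sender-side gap.
    relative_dispersions = [d / initial_gap for d in dispersions]
    # Assumed definition: utilization = 1 - available / capacity.
    utilization = 1.0 - available_mbps / bottleneck_mbps
    return relative_dispersions, utilization


def denormalize_estimate(utilization, bottleneck_mbps):
    """Map a predicted utilization back to an available bandwidth (Mbps)."""
    return bottleneck_mbps * (1.0 - utilization)


# Hypothetical sample from a 6 Mbps path, reused on a 50 Mbps path.
ratios, util = normalize_sample([2.0, 3.0], 2.0, 4.5, 6.0)
estimate_50 = denormalize_estimate(util, 50.0)
```

Because both the features and the label are ratios, a model trained on one bottleneck capacity can in principle be queried for a path with a different capacity.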

Page 20: PPT

Scale-Free Approach

Two scenarios (bottleneck link capacity of 6 Mbps and 50 Mbps)
  1,500 samples for each case

Page 21: PPT

Conclusion

We propose a machine learning-based approach for estimating the available bandwidth of a network path.

We show that the pathChirp-like model outperforms the packet train model in all test cases.

By normalizing all attributes, we show that this novel approach can accurately estimate available bandwidth, even if there is no sample with similar properties to the measured path in the training dataset.

Page 22: PPT

Thanks!

http://www.iis.sinica.edu.tw/~cclljj/
http://nrl.iis.sinica.edu.tw/