dealing with p2p traffic in modern networks: measurement, identification and control

Silvio Valenti Télécom ParisTech, France 21 September 2011 Dealing with P2P traffic in modern networks: measurement, identification and control Directeur de thèse: Dario Rossi

Page 1: Dealing with P2P traffic in modern networks: measurement, identification and control

Silvio Valenti, Télécom ParisTech, France

21 September 2011

Dealing with P2P traffic in modern networks: measurement, identification and control

Thesis advisor: Dario Rossi

Page 2: Dealing with P2P traffic in modern networks: measurement, identification and control

Outline

• Context and motivation
  • P2P applications
  • P2P traffic diffusion
• Contributions of this thesis
  1. Traffic classification
  2. Data reduction
  3. Congestion control for P2P traffic
• Summary and conclusion

Traffic classification
• State of the art
• Behavioral classification for P2P traffic: Abacus
• Methodology
• Experimental campaign
  • Dataset & metrics
  • Abacus vs KISS
  • Abacus & sampling

21/09/2011 S. Valenti, "Dealing with P2P traffic in modern networks", PhD Thesis

Page 3: Dealing with P2P traffic in modern networks: measurement, identification and control

P2P applications

Client-server systems
• resources on the server
• contents on the server
• clients exploit server resources

Peer-to-peer systems
• hosts share their resources with the others
• clients talk directly to each other and collaborate
• robust, scalable, autonomous
• many services: file-sharing, VoIP, live streaming

Page 4: Dealing with P2P traffic in modern networks: measurement, identification and control

P2P timeline

[Figure: timeline of P2P systems from 1999 to 2011, grouped by service type; the PhD thesis spans 2008 to 2011.]
• File-sharing: Napster, Gnutella, Kazaa, eMule, BitTorrent, Limewire, uTorrent 3.0
• Search: Chord, Kademlia
• VoIP: Skype
• Live streaming: PPLive, Sopcast, Joost, TVAnts, CoolStreaming
• Web based: Megaupload
• Music streaming: Spotify
• P2P in browsers

Page 5: Dealing with P2P traffic in modern networks: measurement, identification and control

P2P traffic in modern networks

• High volumes: in 2009, 40-70% of total traffic (source: Ipoque, Internet studies 2008-2009)
• Concerns among ISPs, especially for P2P-TV [1]
• Decline in the last few years
  • video traffic (YouTube)
  • web hosting (MegaUpload)
• …but P2P is not likely to disappear
  • absolute volume increases
  • users go back to P2P [2]
  • new services are still emerging

Page 6: Dealing with P2P traffic in modern networks: measurement, identification and control

Content of this thesis

Goal: develop tools and protocols to help operators deal with P2P traffic

1. Traffic classification (P2P? File-sharing? BitTorrent?)
2. Data reduction (sampling)
3. Congestion control for P2P

Page 8: Dealing with P2P traffic in modern networks: measurement, identification and control

P2P traffic classification

Problem: identify P2P traffic in the network…

…to better manage it
• Management: QoS, differential queuing
• Security: intrusion detection, lawful intercept

Technical challenges
• encryption, tunneling, proprietary protocols
• CPU power

Page 9: Dealing with P2P traffic in modern networks: measurement, identification and control

Abacus classifier

Contribution: Abacus
• behavioral classifier tailored for P2P-TV applications
• later generalized to P2P applications in general
• open-source demo software

Features
• behavioral approach
• based only on flow-level data: counts of packets and bytes
• fine-grained classification
• robust, portable
• as accurate as a payload-based classifier

Page 11: Dealing with P2P traffic in modern networks: measurement, identification and control

Data reduction and classification

Sampling
• common practice among ISPs
• reduces the load and the amount of data…
• …and information!

Goal: is traffic classification possible
• with flow-level data? (Netflow)
• with flow sampling? (routing)
• with packet sampling?

Contributions
• studied Abacus with Netflow data and flow sampling

Page 12: Dealing with P2P traffic in modern networks: measurement, identification and control

Packet sampling

Studied the impact of packet sampling on classification
• tstat flow monitor modified to apply sampling
• different sampling policies (systematic, random…) and rates

Findings
• heavy distortion, no matter the policy
• information content of features less impacted
• classification possible when sampled data is used for training (homogeneous policy)

[Figure: flow accuracy (50-100%) vs sampling step (1, 2, 5, 10, 20, 50, 100); curves: train = unsampled, train = sampled.]
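
As a rough illustration of the sampling policies named on this slide, the sketch below applies systematic and random sampling with a given step to a list of packets. The function names and the flat list used as a stand-in for packet records are assumptions made for illustration; this is not the modified tstat code.

```python
import random

def systematic_sample(packets, step):
    """Systematic sampling: keep one packet every `step` packets."""
    return packets[::step]

def random_sample(packets, step, seed=0):
    """Random sampling: keep each packet independently with probability 1/step."""
    rng = random.Random(seed)
    return [p for p in packets if rng.random() < 1.0 / step]

packets = list(range(1000))  # hypothetical stand-in for packet records
print(len(systematic_sample(packets, 10)))  # exactly 100 packets are kept
print(len(random_sample(packets, 10)))      # about 100, depends on the seed
```

Both policies keep roughly the same fraction of packets, which is consistent with the finding that the distortion appears no matter the policy.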

Page 14: Dealing with P2P traffic in modern networks: measurement, identification and control

Congestion control for P2P

Goal: a low-priority protocol for P2P applications

Requirements
• efficient use of bandwidth, detect congestion early
• automatically yield to other traffic (interactive, web)

Contributions
• implemented the new BitTorrent protocol (LEDBAT, or uTP)
  • delay-based, low-priority congestion control
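
To make "delay-based" concrete, here is a minimal sketch of a congestion window update, assuming the linear controller described in the LEDBAT specification (RFC 6817). The constants and the function name are illustrative assumptions, not values taken from the thesis.

```python
TARGET = 0.100  # target queuing delay in seconds (assumed value)
GAIN = 1.0      # controller gain (assumed value)
MSS = 1500      # maximum segment size in bytes (assumed value)

def ledbat_update(cwnd, base_delay, current_delay, bytes_acked):
    """Return the new congestion window (bytes) after an acknowledgment."""
    # Queuing delay is estimated as the excess over the minimum delay seen.
    queuing_delay = current_delay - base_delay
    # Positive below the target (grow), negative above it (yield).
    off_target = (TARGET - queuing_delay) / TARGET
    cwnd += GAIN * off_target * bytes_acked * MSS / cwnd
    return max(cwnd, MSS)  # never shrink below one segment

print(ledbat_update(15000, 0.020, 0.020, 1500))  # empty queue: window grows
print(ledbat_update(15000, 0.020, 0.220, 1500))  # delay above target: window shrinks
```

Because the window starts shrinking as soon as the queuing delay exceeds the target, the flow backs off before loss-based traffic experiences drops, which is what makes the protocol low priority.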

Page 15: Dealing with P2P traffic in modern networks: measurement, identification and control

Congestion control for P2P

Contributions
• evaluated through measurements and simulation
• discovered a fairness issue: latecomer advantage
• proposed an effective solution, also verified analytically

[Figure: two panels showing congestion window [pkts] vs time [s] (0-120 s) for Flow 1 and Flow 2.]

Page 17: Dealing with P2P traffic in modern networks: measurement, identification and control

Classifier families (1)

• Deep Packet Inspection (DPI): specific keyword (e.g. GET, MAIL FROM:, BT)
• Statistical classification: flow properties
• Behavior analysis (Abacus): algorithm design

[Figure: example sequences of signed packet sizes (+s1 -s2 +s3 …) illustrating flow properties.]

Page 18: Dealing with P2P traffic in modern networks: measurement, identification and control

Taxonomy of traffic classification

Approach | Features | Granularity | Timeliness | Training | Computational cost
Deep packet inspection | Signature in payload [3] | Fine grained | First payload packet | Difficult | High, access to payload
Stochastic packet inspection | Statistical properties of payload [4] | Fine grained | Online, after 80 packets (~100 ms) | Easy | High, access to payload
Statistical | Flow-level properties [5] | Coarse grained | Post mortem | Easy | Lightweight
Statistical | Packet-level properties [6] | Fine grained | After few packets (~5) | Easy | Lightweight
Behavioral | Host-level properties [7] | Coarse grained | Post mortem | Easy | Lightweight
Behavioral | Endpoint rate [8] | Fine grained | Online, after 5 s | Easy | Lightweight

Page 19: Dealing with P2P traffic in modern networks: measurement, identification and control

Abacus: the idea

Different kinds of people at a party
• some chat briefly with many others
• some talk at length with few others

…and different kinds of applications
• some download small pieces of video from many peers
• some download all the video from almost the same peers

Leverage this to classify traffic
1. Observe a host for a given time
2. Count the packets received from each other host
3. What kind of application?

[Figure: two example hosts, labeled APP1 and APP2.]

Page 20: Dealing with P2P traffic in modern networks: measurement, identification and control

Classification process

1. Statistical characterization of traffic
2. Assign traffic to the class that best fits it
   • Support Vector Machines (or another learning tool)
3. Validate the classification accuracy
   • compare with an "oracle" that knows the truth

[Figure: traffic goes through Phase 1 (signature), Phase 2 (decision) and Phase 3 (verify); labels: known, training, operation.]

Page 22: Dealing with P2P traffic in modern networks: measurement, identification and control

Abacus: signature definition

Procedure
1. Observe host X for ∆T = 5 s
2. Count the packets received from each peer Yi
3. Divide the peers into bins of exponential size (1, 2, 3-4, 5-8, 9-16 packets)
4. Normalize over the total number of peers
5. Repeat for bytes
6. The resulting distribution is the Abacus signature

Pros
• only lightweight operations
• no access to packet payloads
• focus on incoming traffic: more stable throughput for video

[Figure: host X receiving packets from peers Y1 … Y5, and the histogram of peers per bin.]
Example: distribution = [1, 1, 3, 0], signature = [0.2, 0.2, 0.6, 0]
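
The procedure can be sketched in a few lines. This is a minimal illustration and not the Abacus demo software: the function name, the dictionary input and the per-peer packet counts are assumptions, chosen so that the result matches the distribution [1, 1, 3, 0] shown on the slide.

```python
def abacus_signature(packets_per_peer, num_bins=5):
    """Normalized histogram of peers over exponential bins 1, 2, 3-4, 5-8, 9-16."""
    counts = [0] * num_bins
    for n in packets_per_peer.values():
        if n < 1:
            continue  # peers that sent nothing are not counted
        # (n - 1).bit_length() equals ceil(log2(n)); the last bin is open-ended here.
        bin_index = min((n - 1).bit_length(), num_bins - 1)
        counts[bin_index] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * num_bins

# Hypothetical packet counts received by host X from peers Y1..Y5 in one 5 s window.
observed = {"Y1": 1, "Y2": 2, "Y3": 3, "Y4": 4, "Y5": 3}
print(abacus_signature(observed))  # [0.2, 0.2, 0.6, 0.0, 0.0]
```

The byte-wise signature of step 5 would follow the same pattern, with byte counts and wider bins.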

Page 23: Dealing with P2P traffic in modern networks: measurement, identification and control

Signature comparison

[Figure: four panels (PPLive, TVAnts, Sopcast, Joost) showing the signature pmf over time, in steps of 5 s.]

Page 25: Dealing with P2P traffic in modern networks: measurement, identification and control

Support Vector Machines

Signatures are points in a multi-dimensional space
• complex surfaces separate the regions

SVM training phase
• start from a set of labeled points
• the kernel maps the points into a higher-dimensional space
• simple hyperplanes separate the points there
• the support vectors identify the hyperplanes

Decision phase
• map the new sample into the higher-dimensional space
• label the point according to the region it falls into

Unknown traffic
• rejection criterion or additional class

[Figure: the kernel trick maps the space of samples (dim. C) into the space of features (dim. ∞); training yields the classification decision.]
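
The decision phase can be illustrated with a two-class sketch that evaluates a Gaussian kernel against already-trained support vectors. The support vectors, coefficients and bias below are made-up values standing in for the output of a training phase; they are not parameters from the thesis.

```python
import math

def gaussian_kernel(x, y, gamma=1.0):
    """K(x, y) = exp(-gamma * ||x - y||^2), evaluated without mapping explicitly."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def svm_decision(sample, support_vectors, coeffs, bias, gamma=1.0):
    """Signed decision value: sum_i coeff_i * K(sv_i, sample) + bias."""
    return sum(c * gaussian_kernel(sv, sample, gamma)
               for sv, c in zip(support_vectors, coeffs)) + bias

# Hypothetical trained model: one support vector per class.
support_vectors = [[0.2, 0.2, 0.6], [0.7, 0.2, 0.1]]
coeffs = [1.0, -1.0]  # positive for APP1, negative for APP2
bias = 0.0

new_signature = [0.25, 0.2, 0.55]
value = svm_decision(new_signature, support_vectors, coeffs, bias)
print("APP1" if value > 0 else "APP2")  # the sample is closer to the APP1 vector
```

The kernel only needs distances between signatures, so the infinite-dimensional feature space is never built explicitly.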

Page 27: Dealing with P2P traffic in modern networks: measurement, identification and control

Overview of experiments

• Dataset and metrics
• Experimental results
  • accuracy results
  • portability analysis
• Abacus with Netflow
• Abacus in the core

Page 29: Dealing with P2P traffic in modern networks: measurement, identification and control

Dataset

Known issue: ground truth vs representativeness

Our dataset
• Active traces from a European testbed (2008)
  • P2P-TV apps (PPLive, Sopcast, Joost, TVAnts), 40 hosts, 26 GB of data
  • reliable ground truth
  • high heterogeneity (access network, location)
• Passive traces from an ISP and a campus network (2006–2009)
  • other P2P apps (BitTorrent, eMule, Skype) and generic traffic, ~4 GB of data
  • ground truth obtained with DPI or GT [8]
  • representative of a generic environment

Page 30: Dealing with P2P traffic in modern networks: measurement, identification and control

Metrics

• True Positive Rate (TPR): percentage of traffic correctly classified
• Misclassified (Mis): percentage of traffic classified as the wrong application
• Other (Ot): percentage of traffic classified as "unknown"

Percentages are computed…
• signature-wise: related to the performance of the classification engine
• byte-wise: related to the bulk of traffic (interesting for ISPs)
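
The difference between the signature-wise and byte-wise views can be shown with a short sketch. The tuple layout, the function name and the sample values are assumptions made for illustration only.

```python
def outcome_percentages(samples):
    """samples: (true_app, predicted_app, n_bytes) tuples, one per signature."""
    by_signature = {"TP": 0, "Mis": 0, "Other": 0}
    by_bytes = {"TP": 0, "Mis": 0, "Other": 0}
    for true_app, predicted_app, n_bytes in samples:
        if predicted_app == true_app:
            outcome = "TP"
        elif predicted_app == "unknown":
            outcome = "Other"
        else:
            outcome = "Mis"
        by_signature[outcome] += 1      # every signature weighs the same
        by_bytes[outcome] += n_bytes    # signatures weigh by the bytes they carry

    def to_percent(counts):
        total = sum(counts.values())
        return {k: 100.0 * v / total for k, v in counts.items()}

    return to_percent(by_signature), to_percent(by_bytes)

# Hypothetical outcomes for four signatures of one application.
samples = [
    ("PPLive", "PPLive", 9000),
    ("PPLive", "PPLive", 8000),
    ("PPLive", "SopCast", 2000),
    ("PPLive", "unknown", 1000),
]
sig, byt = outcome_percentages(samples)
print(sig)  # {'TP': 50.0, 'Mis': 25.0, 'Other': 25.0}
print(byt)  # {'TP': 85.0, 'Mis': 10.0, 'Other': 5.0}
```

In this made-up example the errors fall on signatures that carry few bytes, so the byte-wise true positive rate is higher than the signature-wise one.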

Page 32: Dealing with P2P traffic in modern networks: measurement, identification and control

Baseline results

Classification outcome (TP = true positive, Mis = misclassified, Unk = unknown)

Application | Signature % TP | Signature % Mis | Signature % Unk | Bytes % TP | Bytes % Mis | Bytes % Unk
PPLive | 95.42 | 2.44 | 2.14 | 98.32 | 1.54 | 0.14
TVAnts | 99.84 | 0.16 | 0.00 | 99.82 | 0.17 | 0.01
SopCast | 97.55 | 1.17 | 1.29 | 98.96 | 0.98 | 0.06
Joost | 94.97 | 0.23 | 4.80 | 99.62 | 0.23 | 0.15
Unk (UDP) | - | 0.1 | 99.9 | - | <0.1 | >99.9

• TP of about 95% or higher in terms of signatures and above 98% in terms of bytes
  • misclassification affects signatures carrying fewer bytes
• FPR for unknown traffic: 0.1%
  • effective rejection criterion

Page 33: Dealing with P2P traffic in modern networks: measurement, identification and control

Portability

Are Abacus signatures portable across…
• Networks?
  • train on one network, test on another one
  • 6% loss in the worst case
• Access technologies?
  • peers divided into high bandwidth (10 Mbps) and ADSL (2 Mbps)
  • OK overall; training on high-bandwidth peers causes some difficulties when testing on ADSL peers
• Channel popularity (number of peers in the swarm)?
  • second experiment with an unpopular channel
  • problems when training on a popular channel and testing on an unpopular one
• Time?
  • traces of P2P-TV from 2006 used as test set (training set from 2008)
  • classification possible unless the software version changes

Page 34: Dealing with P2P traffic in modern networks: measurement, identification and control

Overview of experiments

• Dataset and metrics
• Experimental results
  • baseline results
  • portability analysis
• Abacus with Netflow
• Abacus in the core

Page 35: Dealing with P2P traffic in modern networks: measurement, identification and control

Abacus and Netflow (1)

Netflow
• de facto standard for flow monitoring
• routers export data on flows when a flow terminates (explicitly or on timeout)
• Netflow data has a coarser time granularity (minutes)

For each flow, the Netflow router exports to the collector:
• source and destination IP
• source and destination port
• IP protocol
• #packets
• #bytes
• begin and end time
• …

Page 36: Dealing with P2P traffic in modern networks: measurement, identification and control

Abacus and Netflow (2)

Application | Bytes TP | Bytes Mis | Bytes Other | Signatures TP | Signatures Mis | Signatures Other
PPLive | 96 | 4 | - | 63.6 | 14.1 | 22.3
SopCast | 92.9 | 7.1 | - | 54.4 | 21.3 | 24.3
TVAnts | 99.4 | 0.6 | - | 49.7 | 22.2 | 28.1
Joost | 99.9 | 0.1 | - | 53.2 | 24.3 | 22.5
eDonkey | 98.9 | 0.2 | 0.9 | 94.4 | 0.8 | 4.8
BitTorrent | 89.1 | 10.3 | 0.6 | 12.5 | 86.9 | 0.6
Skype | 90.5 | 3.1 | 6.4 | 86.1 | 7.5 | 6.4
DNS | 92.1 | 5.0 | 2.9 | 63.9 | 7 | 29.1
Other | - | 0.2 | 99.8 | - | 12.4 | 87.6

The most significant signatures are correctly classified

Page 37: Dealing with P2P traffic in modern networks: measurement, identification and control

Abacus in the core (1)

• Abacus needs all the traffic of a host, which is available only at the edge
• In the core this is no longer possible, due to routing

[Figure: a classifier in the core sees only some of the flows between the target host and Host1 … Host4.]

Page 38: Dealing with P2P traffic in modern networks: measurement, identification and control

Abacus in the core (2)

Abacus signatures are normalized
• if there is no bias in peer selection, classification is possible

Experiment
• network traffic randomly sampled with rate 1/2, 1/4, 1/8
• train with unsampled traffic, test with sampled traffic

Results
• byte and signature accuracy degrade smoothly
• a test with real routing tables agrees with our experiments

[Figure: overall accuracy (65-100%) vs flow sampling rate (1, 1/2, 1/4, 1/8); curves: bytes, signatures, routing - bytes, routing - signatures.]

Page 39: Dealing with P2P traffic in modern networks: measurement, identification and control

Conclusion

• P2P has a central role in today's Internet traffic
• Operators need tools to manage such traffic

Abacus: our contribution to traffic classification
• behavioral classifier as accurate as payload-based algorithms (byte accuracy > 98%)
• portable (time, space)
• robust (low false alarm rate, < 0.1%)
• works with Netflow data
• may be deployed in the core

Page 40: Dealing with P2P traffic in modern networks: measurement, identification and control

Future work

1. Behavioral classification
   • test Abacus with TCP and other kinds of traffic
2. Data reduction
   • test Abacus with packet sampling
   • evaluate other smart policies
   • evaluate portability of sampled flow records
3. Congestion control
   • evaluate LEDBAT in the real world
   • evaluate BitTorrent + LEDBAT in the real world
   • improve the LEDBAT definition

Page 41: Dealing with P2P traffic in modern networks: measurement, identification and control

References

1. X. Hei, C. Liang, J. Liang, Y. Liu, and K. W. Ross. A Measurement Study of a Large-Scale P2P IPTV System. IEEE Transactions on Multimedia, Dec. 2007.

2. A. Finamore, M. Mellia, M. Meo, M. Munafo, and D. Rossi. Experiences of internet traffic monitoring with tstat. IEEE Network Magazine, Special Issue on Network Traffic Monitoring and Analysis, May 2011.

3. V. Paxson. Bro: a system for detecting network intruders in real-time. Elsevier Comput. Netw., 31:2435–2463, December 1999

4. A. Finamore, M. Mellia, M. Meo, and D. Rossi. Kiss: Stochastic packet inspection classifier for udp traffic. IEEE/ACM Trans. Netw., 18(5):1505–1515, 2010.

5. M. Crotti, M. Dusi, F. Gringoli, and L. Salgarelli. Traffic classification through simple statistical fingerprinting. ACM SIGCOMM Computer Communication Review, 37(1):5–16, January 2007.

6. A. Finamore, M. Mellia, M. Meo, and D. Rossi. Kiss: Stochastic packet inspection classifier for udp traffic. IEEE/ACM Trans. Netw., 18(5):1505–1515, 2010.

7. T. Z. J. Fu, Y. Hu, X. Shi, D.-M. Chiu, and J. C. S. Lui. PBS: Periodic Behavioral Spectrum of P2P Applications. In Proc. of PAM ’09, Seoul, South Korea, Apr 2009

8. F. Gringoli, Luca Salgarelli, M. Dusi, N. Cascarano, F. Risso, and k. c. claffy. GT: picking up the truth from the ground for internet traffic. SIGCOMM Comput. Commun. Rev. 39, 5 2009

Page 42: Dealing with P2P traffic in modern networks: measurement, identification and control

Publications

1. S. Valenti, D. Rossi, Fine-grained behavioral classification in the core: the issue of flow sampling, In TRaffic Analysis and Classification (TRAC) Workshop at IWCMC 2011

2. P. Bermolen, M. Mellia, M. Meo, D. Rossi, S. Valenti, Abacus: Accurate behavioral classification of P2P-TV traffic, Elsevier Computer Networks, 55(6):1394-1411, April 2011

3. S. Valenti, D. Rossi, Identifying key features for P2P traffic classification, In IEEE ICC'11, Kyoto, Japan, June 2011

4. G. Carofiglio, L. Muscariello, D. Rossi and S. Valenti, The quest for LEDBAT fairness, In IEEE Globecom'10

5. A. Finamore, M. Mellia, M. Meo, D. Rossi and S. Valenti, Peer-to-peer traffic classification: exploiting human communication dynamics, In IEEE Globecom'10, Demo Session

6. A. Pescape, D. Rossi, D. Tammaro and S. Valenti, On the impact of sampling on traffic monitoring and analysis, In Proceedings of the 22nd International Teletraffic Congress (ITC22), 2010

7. D. Rossi, C. Testa, S. Valenti and L. Muscariello, LEDBAT: the new BitTorrent congestion control protocol, In International Conference on Computer Communication Networks (ICCCN'10)

8. D. Rossi, S. Valenti, Fine-grained traffic classification with Netflow data, In TRaffic Analysis and Classification (TRAC) Workshop at IWCMC 2010

9. A. Finamore, M. Meo, D. Rossi, S. Valenti, Kiss to Abacus: a comparison of P2P-TV traffic classifiers, In Traffic Measurement and Analysis (TMA) Workshop at PAM'10

10. D. Rossi, C. Testa, S. Valenti, Yes, we LEDBAT: Playing with the new BitTorrent congestion control algorithm, In Passive and Active Measurement (PAM) 2010

11. D. Rossi, E. Sottile, S. Valenti and P. Veglia, Gauging the network friendliness of P2P applications, In SIGCOMM Demo Session

12. S. Valenti, D. Rossi, M. Meo, M. Mellia and P. Bermolen, Accurate and Fine-Grained Classification of P2P-TV Applications by Simply Counting Packets, In Traffic Measurement and Analysis (TMA) Workshop at IFIP Networking'09

13. S. Valenti, D. Rossi, M. Meo, M. Mellia and P. Bermolen, An Abacus for P2P-TV traffic classification, In IEEE INFOCOM 2009, Demo

Page 43: Dealing with P2P traffic in modern networks: measurement, identification and control

Thank you for your attention!

Page 44: Dealing with P2P traffic in modern networks: measurement, identification and control

Abacus and KISS

Characteristic | Abacus | KISS
Technique | Behavioral | Stochastic payload inspection
Input format | Netflow-like | Packet trace
Protocol family | P2P (especially P2P-TV) | Any
Train set size | 4000 samples | 300 samples
Time responsiveness | Deterministic (5 s) | Stochastic (80 pkts, ~2 s)
Memory occupation | 320 B | 384 B
Memory operations | 2 per pkt, 177 every 5 s | 49 per pkt, 768 every 80 pkts
CPU operations | 2 per pkt, 200 every 5 s | 24 per pkt, 1200 every 80 pkts

KISS [6] recognizes the protocol syntax
• analyzes the first payload bytes
• uses a χ²-like test to recognize fields

Comparison
• Abacus has the same accuracy
• Abacus outperforms KISS in computational cost

[Figure: first payload bytes of packets from two flows (F1, F2), with fields identified as ID, deterministic, random or counter.]
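
The idea of a χ²-like test on payload fields can be sketched as follows. This is a generic chi-square statistic against a uniform distribution, not the KISS implementation; the choice of 16 symbols (4-bit groups) and the synthetic field values are assumptions, while the window of 80 packets comes from the table above.

```python
from collections import Counter

def chi_square_uniform(values, num_symbols=16):
    """Chi-square statistic of the observed values against a uniform distribution."""
    expected = len(values) / num_symbols
    counts = Counter(values)
    return sum((counts.get(symbol, 0) - expected) ** 2 / expected
               for symbol in range(num_symbols))

# Hypothetical 4-bit fields observed over 80 packets of one flow.
constant_field = [0x2] * 80                  # e.g. a fixed identifier
uniform_field = [i % 16 for i in range(80)]  # e.g. a perfectly spread field

print(chi_square_uniform(constant_field))  # 1200.0
print(chi_square_uniform(uniform_field))   # 0.0
```

A large statistic suggests a deterministic field and a value near zero suggests a random one, so the vector of statistics over the first payload bytes acts as a fingerprint of the protocol syntax.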

Page 45: Dealing with P2P traffic in modern networks: measurement, identification and control

Performance metrics

Confusion matrix representation (rows: real label, columns: classification outcome)

Real label | Classified as App A | Classified as Other
App A | TP | FN
Other | FP | TN

Indexes
• TP rate (or recall) = TP / (TP + FN): recognizing the application traffic
• FP rate = FP / (FP + TN): recognizing other traffic
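
A small worked example of the two indexes, using made-up counts for the four cells of the confusion matrix:

```python
def rates(tp, fn, fp, tn):
    """Return (TP rate, FP rate) from the four confusion-matrix counts."""
    tp_rate = tp / (tp + fn)  # share of App A traffic recognized as App A
    fp_rate = fp / (fp + tn)  # share of other traffic wrongly labeled App A
    return tp_rate, fp_rate

# Hypothetical counts: 100 samples of App A, 1000 samples of other traffic.
print(rates(tp=95, fn=5, fp=1, tn=999))  # (0.95, 0.001)
```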

Page 46: Dealing with P2P traffic in modern networks: measurement, identification and control

Sensitivity

Impact of classifier parameters
1. Time interval
   • shorter windows (1 s): classification is difficult
   • longer windows (10, 15, 30, 60 s): similar performance
2. Training set size
   • we used 20% of the dataset (4000 signatures per app)
   • with 300 signatures: 10% reduction for some apps
3. Training set diversity
   • 1 or 2 peers per network are enough for a robust training
4. SVM kernel and bin size
   • a Gaussian kernel is better than a linear one
   • exponential binning is more efficient than linear binning

Page 47: Dealing with P2P traffic in modern networks: measurement, identification and control

Packet sampling

Studied the impact of packet sampling on classification
• tstat exports flow-level features
• modified to apply sampling
• different policies and rates

Findings
• heavy distortion in the measurement, no matter the policy
• information content of metrics less impacted
• classification possible when sampled data is used for training

[Figure: flow accuracy (50-100%) vs sampling step (1, 2, 5, 10, 20, 50, 100) for systematic, random, stratified and SYN sampling; heterogeneous vs homogeneous training.]

Page 48: Dealing with P2P traffic in modern networks: measurement, identification and control

Rejection criterion selection

• For R ~ 0: low TPR, low FPR
• For R ~ 1: high TPR, high FPR
• For R = 0.5: high TPR, low FPR

Page 49: Dealing with P2P traffic in modern networks: measurement, identification and control

Rejection criterion

The hyper-space is partitioned
• every point is given a label, even for "unknown" apps
• we need a way to recognize them

Procedure
• define a center for each class
• define a threshold R
• d = distance between the sample and the center of the assigned class
• if d > R, mark the new point as unknown

Bhattacharyya distance BD: distance between p.d.f.

BD(p, q) = \sqrt{1 - B}, \qquad B = \sum_{x=1}^{n} \sqrt{p(x)\, q(x)}

[Figure: training points around the center of the class with radius R; new points inside the radius are labeled "green", those outside are labeled "unknown".]
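
A minimal sketch of the rejection step, assuming the distance takes the form BD = sqrt(1 - B) with B = sum over x of sqrt(p(x) q(x)), which is bounded between 0 and 1. The class center, the threshold value and the function names are hypothetical.

```python
import math

def bhattacharyya_distance(p, q):
    """BD(p, q) = sqrt(1 - B), with B = sum_x sqrt(p(x) * q(x))."""
    b = sum(math.sqrt(px * qx) for px, qx in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - b))  # clamp guards against rounding above 1

def apply_rejection(signature, class_center, label, threshold=0.5):
    """Keep the SVM label only if the signature is close to the class center."""
    d = bhattacharyya_distance(signature, class_center)
    return label if d <= threshold else "unknown"

center = [0.2, 0.2, 0.6]  # hypothetical center of the "green" class
print(apply_rejection([0.25, 0.2, 0.55], center, "green"))  # close to the center: kept
print(apply_rejection([1.0, 0.0, 0.0], center, "green"))    # far from the center: rejected
```

With this form, identical distributions are at distance 0 and distributions with disjoint support are at distance 1, which matches the range of the threshold R discussed for the rejection criterion.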

Page 50: Dealing with P2P traffic in modern networks: measurement, identification and control

Signature comparison (mean)

[Figure: four panels (PPLive, TVAnts, Sopcast, Joost) showing the mean signature as a pmf (0.0-0.5) over the bins.]

Page 51: Dealing with P2P traffic in modern networks: measurement, identification and control

Abacus with Netflow

From packet-level to flow-level data
1. Use longer time scales (5 s -> 120 s)
   • applications may become similar -> more difficult to identify
2. Prorate flow records over time windows
3. Add a specific class for "unknown traffic"
   • this problem comes from SVM

[Figure: timeline with windows starting at t = 0, 120, 240 s; flow 1 fits in one window (ok), flow 2 spans two windows and must be split.]
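
Prorating a flow record over time windows can be sketched as follows, assuming packets are spread uniformly over the flow duration. The function name, the argument layout and the example flow are illustrative assumptions; only the 120 s window comes from the slide.

```python
def prorate(start, end, packets, window=120.0):
    """Split a flow's packet count over the time windows it overlaps.

    Returns {window_index: share_of_packets}, assuming a uniform packet rate.
    """
    if end <= start:
        # Degenerate record: assign everything to the window of the start time.
        return {int(start // window): float(packets)}
    duration = end - start
    shares = {}
    w = int(start // window)
    while w * window < end:
        overlap_start = max(start, w * window)
        overlap_end = min(end, (w + 1) * window)
        shares[w] = packets * (overlap_end - overlap_start) / duration
        w += 1
    return shares

print(prorate(10, 50, 40))     # {0: 40.0}: the flow fits in one window
print(prorate(100, 200, 100))  # {0: 20.0, 1: 80.0}: the flow is split
```

The same split would apply to byte counts, so that each 120 s window gets its own pair of signatures.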

Page 52: Dealing with P2P traffic in modern networks: measurement, identification and control

Oops… wrong key!