Network measurement and monitoring - assignment 1, group 3, "Classification"

Classification

Patrick Herbeuval University of Liège

1st Master in Computer Science [email protected]

Valentin Thirion University of Liège

1st Master in Computer Science [email protected]

Networking measurements and monitoring

1st assignment: Oral Presentation

Teacher: B. [email protected]

Plan

I. Introduction

Four papers

II. Early Application Identification

III. Multilevel classifier: BLINC

IV. Statistical: The ADSL Case

V. Application specific: Skype

VI. Comparative

VII. Conclusion

I - Introduction

The Internet is used more and more today

We want to keep the network comfortable enough for its users

Consumers' quality-of-service expectations rise as fast as the bandwidth consumed by applications

ISPs, companies and universities want to ban P2P

Port-based classifiers worked well years ago but are quite ineffective now
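
To make the weakness of port-based classification concrete, here is a minimal sketch of such a classifier. The `WELL_KNOWN_PORTS` table is only a small illustrative subset of registered ports, and the function name is an assumption made for this example. Any application running on an arbitrary port falls through as unknown.

```python
# Minimal port-based classifier (illustrative sketch, not a complete port registry).
WELL_KNOWN_PORTS = {21: "ftp", 22: "ssh", 25: "smtp", 80: "http", 443: "https"}

def classify_by_port(src_port: int, dst_port: int) -> str:
    """Return the application registered for either port, or "unknown"."""
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return "unknown"

web = classify_by_port(51514, 80)      # classic client -> web server flow
p2p = classify_by_port(51514, 38211)   # two peers talking on arbitrary ports
```

Here `web` is recognised, while `p2p` stays unknown because neither port carries any meaning.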

Why classify?

Classification is a key issue for today’s network administrators and companies for the following reasons:

• Improve the network infrastructure

• Ban undesired traffic

• Protect the network against potential attacks

• Global knowledge of trends

How to classify?

Deep Packet Inspection (DPI): a very precise technique, but with many drawbacks:

Huge computation power needed

Ineffective if packets are encrypted

Continuous need of database updates

Statistical analysis

Social

II - Early Application Identification

Goal: determine the app with the first few packets

Advantage: knowing the kind of traffic from the start makes it possible to block or redirect it

DPI consumes too many resources, and flows must have ended before they can be analysed

Statistical: relies on mean sizes, durations, … values that are not yet available after the first few packets

Clustering the flows

Techniques used: K-Means, Gaussian Mixture Model, spectral clustering

Values used:

Size of the first few packets

Duration of the first few packets (negotiation phase)

Data set

4 packet traces

3 from a University network

1 from an enterprise network

Keep only TCP packets and discard flows that began before the trace capture started

Features analysed: need for an efficient metric

Size and direction of the first 4 packets

We can observe that the range of these values is very similar across traces (see the graph on the next slide)

Size & Direction
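
As an illustration of this feature, a flow can be reduced to a short vector of signed packet sizes. The sketch below is an assumption-laden example: the `Packet` tuple, the sign convention (positive for client to server, negative for the reverse) and the choice to skip empty packets such as the handshake are all made up for illustration.

```python
from typing import List, NamedTuple

class Packet(NamedTuple):
    size: int           # payload size in bytes
    from_client: bool   # True if sent by the host that opened the connection

def flow_features(packets: List[Packet], p: int = 4) -> List[int]:
    """Signed sizes of the first p data packets (+ client->server, - server->client)."""
    data = [pkt for pkt in packets if pkt.size > 0]   # assumption: ignore empty packets
    if len(data) < p:
        raise ValueError("flow has fewer than p data packets")
    return [pkt.size if pkt.from_client else -pkt.size for pkt in data[:p]]

flow = [
    Packet(0, True), Packet(0, False), Packet(0, True),   # 3-way handshake, no payload
    Packet(120, True), Packet(1400, False), Packet(1400, False),
    Packet(80, True), Packet(500, False),
]
features = flow_features(flow)
```

The resulting vector has one dimension per packet, which is what the clustering step works on.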

Classification, 2 phases

Training phase: offline at management sites

Apply clustering techniques to samples of TCP connections for all target applications

Creation of a spatial representation based on the sizes of the first P packets (vector of P dimensions or HMM)

Then find applications that have the same behaviour

Best results: 40 clusters and the first 4 packets

Creation of two sets:

One with the description of each cluster

One with applications present in each cluster
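
The training phase can be sketched with a tiny pure-Python K-Means. This is only an illustration under stated assumptions: the initialisation is naive (the first k points), the sample flows and labels are invented, and k is 2 rather than the 40 clusters reported above. The two returned values correspond to the two sets: cluster descriptions (centres) and the applications present in each cluster.

```python
import math
from collections import defaultdict

def nearest(point, centers):
    """Index of the centre closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def kmeans(points, k, iterations=20):
    centers = [list(p) for p in points[:k]]   # naive deterministic init (assumption)
    for _ in range(iterations):
        groups = defaultdict(list)
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, members in groups.items():     # move each centre to the mean of its members
            centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def train(samples, k):
    """samples: list of (feature_vector, application_label) pairs."""
    centers = kmeans([f for f, _ in samples], k)
    apps = defaultdict(set)
    for f, app in samples:
        apps[nearest(f, centers)].add(app)
    return centers, dict(apps)

# Invented toy samples: signed sizes of the first 4 packets and a label.
samples = [
    ([100, -1400, 200, -1400], "http"),
    ([20, -60, 30, -40], "ssh"),
    ([110, -1380, 210, -1390], "http"),
    ([22, -58, 28, -44], "ssh"),
]
centers, cluster_apps = train(samples, k=2)
```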

Classification, 2 phases

Classification phase: online at management hosts

Extract the 5-tuple and analyse the sizes of packets in both directions

With these sizes, the assignment module associates the connection with a cluster

With the clusters, the labelling module selects the application associated with the connection
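
The two online modules can be sketched as follows. The cluster centres and per-cluster application counts are hypothetical values standing in for the output of the training phase, and picking the most frequent application in the cluster is just one possible labelling heuristic, chosen here for simplicity.

```python
import math

# Hypothetical output of the training phase (values invented for illustration).
CENTERS = [[105.0, -1390.0, 205.0, -1395.0], [21.0, -59.0, 29.0, -42.0]]
CLUSTER_APPS = [{"http": 40, "ftp": 2}, {"ssh": 15}]   # application counts per cluster

def assign(features):
    """Assignment module: index of the closest cluster centre (Euclidean distance)."""
    return min(range(len(CENTERS)), key=lambda i: math.dist(features, CENTERS[i]))

def label(cluster):
    """Labelling module: most frequent training application in that cluster."""
    counts = CLUSTER_APPS[cluster]
    return max(counts, key=counts.get)

def classify(features):
    return label(assign(features))

result = classify([98, -1410, 190, -1400])
```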

Evaluation & Conclusion

Evaluation

Assignment accuracy: above 95% for all heuristics

Labelling accuracy: between 85% and 98%

The size of the first few packets is a good metric

Quality of clustering is richer with HMM but comparable with Euclidean

GMM clustering with TCP ports classifies over 98% of known applications

Limitation: need the first 4 packets in the correct order

Heuristic: (Wikipedia) Where the exhaustive search is impractical (NP-complete for instance), heuristic methods are used to speed up the process of finding a satisfactory solution.

III – The BLINC Classifier

Stands for BLINd Classification

Avoid reading the whole content of the packet

Privacy, performance, encrypted packets

3 levels of classification

Social level

Functional level

Application level

The Social level

Finding host communities

Client-server, P2P, …

Analyse these communities

Perfect match: likely malicious

Partial overlap: P2P sources, websites, gaming, …

Partial overlap within the same subnet: farms
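
A very reduced sketch of the community comparison: given the sets of destinations contacted by two hosts, decide whether they match perfectly, overlap partially or are disjoint. This is a simplification for illustration only; the function name and the example addresses (taken from documentation address ranges) are assumptions, and the subnet check for farms is left out.

```python
def compare_communities(dests_a: set, dests_b: set) -> str:
    """Compare the destination sets of two source hosts."""
    if not dests_a or not dests_b:
        return "no data"
    if dests_a == dests_b:
        return "perfect match"      # same destinations exactly: likely malicious
    if dests_a & dests_b:
        return "partial overlap"    # shared interest: P2P sources, websites, gaming, ...
    return "disjoint"

scanner_1 = {"192.0.2.1", "192.0.2.2", "192.0.2.3"}
scanner_2 = {"192.0.2.1", "192.0.2.2", "192.0.2.3"}
peer = {"192.0.2.3", "198.51.100.7"}

same = compare_communities(scanner_1, scanner_2)
shared = compare_communities(scanner_1, peer)
```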

The Social level (2)

The functional level

Find if a host offers a service, uses it or both

Mostly depending on the port range used by this host

Works better when a host is connected to many servers

Typical schemes:

HTTP server: 1-2 ports

P2P: many ports (up to 1 per host)

Mail server: depending on services available
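
The port-usage idea can be sketched by counting how many distinct source ports a host uses across its flows. The threshold `max_server_ports`, the role names and the flow tuples are assumptions for this example, not values from the paper.

```python
def functional_role(flows, host, max_server_ports=2):
    """flows: iterable of (src_ip, src_port, dst_ip, dst_port) tuples."""
    src_ports = [sport for (src, sport, _dst, _dport) in flows if src == host]
    if not src_ports:
        return "unknown"
    distinct = len(set(src_ports))
    # Many flows sent from very few source ports suggests a host offering a service.
    if distinct <= max_server_ports and len(src_ports) > distinct:
        return "server-like"
    return "client-like"

flows = [
    ("10.0.0.1", 80, "10.0.1.%d" % i, 40000 + i) for i in range(5)
] + [
    ("10.0.0.9", 50000 + i, "10.0.2.%d" % i, 6881) for i in range(5)
]

web_server = functional_role(flows, "10.0.0.1")
client = functional_role(flows, "10.0.0.9")
```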

The application level

Using the connection's 4-tuple (possibly with other characteristics)

Create a model for every application type

Models are represented by small graphs called "graphlets"

BLINC: Results

Uses 2 metrics to evaluate the classifier

Completeness (% of traffic classified)

Accuracy (% of traffic correctly classified)

Some parameters can be used to tune the classifier

Changing a threshold can improve the results for one of the metrics but significantly degrade the other
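
The two metrics can be computed as in the sketch below. It assumes that accuracy is measured over the classified flows only and that an unclassified flow is represented by `None`; both are choices made for this example, and the labels are invented.

```python
def completeness_and_accuracy(predicted, truth):
    """predicted[i] is None when the classifier left flow i unclassified."""
    classified = [(p, t) for p, t in zip(predicted, truth) if p is not None]
    completeness = len(classified) / len(truth)
    if not classified:
        return completeness, 0.0
    accuracy = sum(p == t for p, t in classified) / len(classified)
    return completeness, accuracy

predicted = ["web", "p2p", None, "web"]
truth = ["web", "web", "mail", "web"]
completeness, accuracy = completeness_and_accuracy(predicted, truth)
```

In this toy case three of four flows are classified and two of those three are correct. A stricter threshold would typically classify fewer flows (lower completeness) but make fewer mistakes (higher accuracy).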

Global results

GN: Genome campus (~1,000 users), UN: university network (~20,000 users)

Tuning

Td: minimum number of destination IPs needed to classify the flow as P2P
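
The effect of this threshold can be sketched as a simple count of distinct destination IPs per source. This is a deliberate simplification (the actual graphlet matching involves more conditions), and the function name and flow data are assumptions for illustration.

```python
from collections import defaultdict

def p2p_candidates(flows, t_d):
    """flows: iterable of (src_ip, dst_ip); return sources with >= t_d distinct destinations."""
    destinations = defaultdict(set)
    for src, dst in flows:
        destinations[src].add(dst)
    return {src for src, dsts in destinations.items() if len(dsts) >= t_d}

flows = [("A", "h1"), ("A", "h2"), ("A", "h3"), ("A", "h4"), ("A", "h1"),
         ("B", "h1"), ("B", "h2")]

loose = p2p_candidates(flows, t_d=2)
strict = p2p_candidates(flows, t_d=3)
```

Raising `t_d` shrinks the set of hosts flagged as P2P, which is the kind of trade-off between the two metrics described for the tunable parameters.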

Results (2)

Good detection rate without reading any byte of the payload

Flows without payload are classified as well

Payload encryption is not a problem

Low resource consumption

Good detection of unknown flows

Difficult to distinguish applications of the same type (e.g. all VoIP protocols are grouped together)

Doesn’t work if the headers are encrypted

Hard to identify multiple sources behind NATs

Results come from the edge of the network; the classifier may behave differently in the backbone

BLINC: Conclusion

BLINC has a good detection rate without costing a lot of processing and without being intrusive

It can detect attacks and unknown protocols

It can be improved in some situations

IV – The ADSL Case

Test a statistical classifier on sites other than the ones it was trained on

Dataset:

4 packet traces collected at 3 different ADSL POPs from Orange

2 traces at the same time, different locations

2 traces at the same location, 17 days between

Reference used: ODP tool (provided by Orange)

Classification methodology

3 algorithms used to classify the traces

Naïve Bayes Kernel Estimation

Bayesian Network

C4.5 Decision Tree

Traces analysed on two feature sets

SET_A: Packet Level Information

SET_B: Flow Level Statistics

3 filters:

S/S: flows with a 3-way handshake

S/S+4D: same as S/S + at least 4 data packets

S/S+F/R: same as S/S + FIN or RST flag at the end
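
The three filters can be written as predicates over a flow summary. The `Flow` fields below are assumptions about what a trace parser would provide; only the filter names and their conditions come from the list above.

```python
from dataclasses import dataclass

@dataclass
class Flow:
    handshake_complete: bool       # SYN, SYN/ACK and ACK were all observed
    data_packets: int              # number of packets carrying payload
    closed_with_fin_or_rst: bool   # FIN or RST seen at the end of the flow

FILTERS = {
    "S/S": lambda f: f.handshake_complete,
    "S/S+4D": lambda f: f.handshake_complete and f.data_packets >= 4,
    "S/S+F/R": lambda f: f.handshake_complete and f.closed_with_fin_or_rst,
}

def apply_filter(flows, name):
    """Keep only the flows accepted by the named filter."""
    return [f for f in flows if FILTERS[name](f)]

flows = [
    Flow(True, 10, True),    # complete, long, properly closed
    Flow(True, 2, False),    # complete handshake but short and still open
    Flow(False, 50, True),   # started before the capture: no handshake seen
]
kept = {name: len(apply_filter(flows, name)) for name in FILTERS}
```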

Classification, 2 cases

Static case: classification on each site independently

Ideal number of packets: 4

Accuracy: about 90%

Very good classification of WEB and EDONKEY flows

Cross-site case:

SET_A: EDONKEY results are immune; spatial similarity seems more important than temporal similarity

Classifier very sensitive to the context in which it is trained

MAIL is often mistaken for FTP because of similar packet sizes

Using the port number improves the quality of results

Classification, 2 cases (continued)

SET_B: some degradations

Focus on a single feature: port number

Results are the opposite of the static case

Prediction of traffic using non-legacy ports is inefficient

Due to the heavy hitters (typically P2P)

Global results: the C4.5 algorithm is the best in terms of overall accuracy in almost all cases (static + cross-site)

Degradation: C4.5 is comparable with the other algorithms (≤17%)

Data overfitting problem

Unknown class + Conclusion

Looking into the flows marked as unknown

3-way handshake

Apply classifiers and get a confidence level; this value is then compared to the one returned by C4.5

Useful to detect malicious traffic and P2P

Should be integrated into an existing DPI tool

Conclusion:

Statistical tools are very useful to identify unknown traffic

Good performance if used on the same site as the training

Can detect applications among protocols

Really suffers from data overfitting (same behaviour from different apps)

Strength of this analysis: it uses commercial traffic, which is very diverse

V – Skype case

We want to detect Skype traffic

It’s already possible to detect VoIP traffic with other classifiers, but how can Skype be told apart from the rest?

Skype is a closed and encrypted protocol, which has to be analysed before starting the classification

Skype model

Using a controlled environment, detection of Skype traffic characteristics

2 kinds of connections: E2E and E2O

E2E: End 2 End, Skype to Skype

E2O: End 2 Out, Skype to telephone network

Skype works on TCP and UDP

Skype can carry text, voice, video and files

Everything multiplexed in 1 packet

In this case, only voice traffic is treated

Skype SoM

TCP packets are entirely encrypted, so they cannot be analysed

UDP has a small unencrypted overhead, called Start of Message (SoM)

E2E: id and message type (signaling or data)

E2O: unique connection identifier

Skype also always uses the same port number in UDP (12340)

Classifiers

Chi-Square Classifier (CSC)

Based on the randomness of bits in packets

Doesn’t work on TCP, since encrypted packets seem to be completely random

Naive Bayes Classifier (NBC)

Real-time voice protocol classifier

Based on message size (which depends on the audio codec) and on average inter-packet gap

Used on a short window of samples to cope with variability in packet size

Payload-based classifier

Used in the controlled environment to check whether CSC and NBC work well
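
The randomness test behind the Chi-Square Classifier can be illustrated with Pearson's chi-square statistic against a uniform distribution. The sketch below works on 4-bit groups of the first payload bytes; the group size, helper names and synthetic payloads are assumptions for this example, not the paper's exact procedure. A field that looks random across packets gives a small statistic, while a fixed (unencrypted) field gives a very large one.

```python
from collections import Counter

def nibble_at(payload: bytes, index: int) -> int:
    """Return the index-th 4-bit group of the payload (0 = high half of byte 0)."""
    byte = payload[index // 2]
    return byte >> 4 if index % 2 == 0 else byte & 0x0F

def chi_square_uniform(values, n_outcomes=16):
    """Pearson chi-square statistic of the observed values against a uniform distribution."""
    expected = len(values) / n_outcomes
    counts = Counter(values)
    return sum((counts[v] - expected) ** 2 / expected for v in range(n_outcomes))

# Synthetic payloads: high nibble always 0, low nibble takes each value exactly once.
payloads = [bytes([i]) for i in range(16)]
fixed_field = chi_square_uniform([nibble_at(p, 0) for p in payloads])
varying_field = chi_square_uniform([nibble_at(p, 1) for p in payloads])
```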

Experiments

NBC detects all kinds of VoIP traffic

CSC detects all kinds of Skype traffic

Using both of them should detect Skype voice traffic
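
The voice-detection side (NBC) can be sketched as a naive Bayes score over a short window of message sizes and inter-packet gaps, each modelled as an independent Gaussian. The model parameters below are invented placeholders, not measured codec values, and the function names are assumptions for this example.

```python
import math

def gaussian_logpdf(x, mean, std):
    """Log-density of a normal distribution with the given mean and standard deviation."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def window_score(sizes, gaps, model):
    """Average naive-Bayes log-likelihood of a window of (size, gap) samples."""
    total = 0.0
    for size, gap in zip(sizes, gaps):
        total += gaussian_logpdf(size, *model["size"])
        total += gaussian_logpdf(gap, *model["gap"])
    return total / len(sizes)

# Hypothetical voice model: (mean, std) for message size in bytes and gap in seconds.
voice_model = {"size": (60.0, 10.0), "gap": (0.020, 0.005)}

voice_like = window_score([58, 62, 60], [0.020, 0.021, 0.019], voice_model)
bulk_like = window_score([1400, 1400, 1400], [0.001, 0.001, 0.001], voice_model)
```

A window would be accepted as voice when its score exceeds some threshold; choosing that threshold is what trades false positives against false negatives.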

Results

Very low false positive rate

Higher false negative rate
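
For reference, the two rates can be computed from per-flow decisions as in this sketch. The boolean encoding (True meaning "Skype") and the toy data are assumptions for illustration and do not reproduce the paper's measurements.

```python
def error_rates(predicted, truth):
    """Return (false positive rate, false negative rate); True means "Skype"."""
    false_pos = sum(p and not t for p, t in zip(predicted, truth))
    false_neg = sum(t and not p for p, t in zip(predicted, truth))
    negatives = sum(not t for t in truth)
    positives = sum(truth)
    fp_rate = false_pos / negatives if negatives else 0.0
    fn_rate = false_neg / positives if positives else 0.0
    return fp_rate, fn_rate

truth = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False]
fp_rate, fn_rate = error_rates(predicted, truth)
```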

Skype: Conclusion

Skype is hard to classify because its encryption makes the protocol hard to analyse

But with this classifier, we have good results on UDP

The false positive rate is almost zero, which is good if the ISP wants to prioritize this traffic

The false negative rate is higher, but that is not really a problem as long as the ISP doesn’t want to block Skype

VI - Comparative

All these classifiers have good results, but each of them has its strengths and weaknesses

ADSL needs specific training, but has the best detection rate

BLINC and Early are less precise but more flexible

They are also faster and good at detecting attacks

BLINC detects unknown protocols but cannot discern apps

Early needs the first 4 packets in order, ADSL the 3-way handshake

Skype is more specific and cannot be compared directly

Good false positive rate but higher false negative rate

VII – Conclusion

We now have solutions that can replace DPI

Each classifier is good in its domain

Important network: early app detection (detect attacks soon)

ADSL and commercial: statistical (user trends, adapt infrastructure)

University or academy: BLINC (statistics, trends)

Everywhere we want to improve it: Skype classifier

Remarks:

Traces and classifiers are quite old (4 to 6 years)

What about mobile usage? Multimedia over 3G/4G networks?

Thanks for your attention

Any questions?

References:

T. Karagiannis, K. Papagiannaki, M. Faloutsos. BLINC: Multilevel Traffic Classification in the Dark. In Proc. ACM SIGCOMM. August 2005.

L. Bernaille, R. Teixeira, K. Salamatian. Early Application Identification. In Proc. ACM CoNEXT. December 2006.

M. Pietrzyk, J.-L. Costeux, G. Urvoy-Keller, T. En-Najjary. Challenging Statistical Classification for Operational Usage: the ADSL Case. In Proc. ACM/USENIX Internet Measurement Conference (IMC). November 2009.

D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, P. Tofanelli. Revealing Skype Traffic: When Randomness Plays with You. In Proc. ACM SIGCOMM. August 2007.
