Network measurement and monitoring - assignment 1, group 3, "Classification"
Created by Patrick Herbeuval and Valentin Thirion
Classification
Patrick Herbeuval, University of Liège
1st Master in Computer Science
Valentin Thirion, University of Liège
1st Master in Computer Science
Networking measurements and monitoring
1st assignment: Oral Presentation
Teacher: B.
Plan
I. Introduction: four papers
II. Early Application Identification
III. Multilevel classifier: BLINC
IV. Statistical: The ADSL Case
V. Application-specific: Skype
VI. Comparative
VII. Conclusion
I - Introduction
The Internet is used more and more today
We want to keep network performance acceptable for users
The quality of service demanded by consumers rises as fast as applications consume more bandwidth
ISPs, companies and universities want to block P2P traffic
Port-based classifiers worked well years ago but are quite ineffective now
Why classify?
Classification is a key issue for today’s network administrators and companies, for the following reasons:
• Improve the network infrastructure
• Ban undesired traffic
• Protect the network against potential attacks
• Gain global knowledge of usage trends
How to classify?
Deep Packet Inspection (DPI): a very precise technique, but with many drawbacks:
Requires huge computing power
Ineffective if packets are encrypted
Needs continuous database updates
Statistical analysis
Social analysis
II - Early Application Identification
Goal: determine the application from the first few packets of a connection
Advantage: the kind of traffic is known from the beginning, so it can be blocked or redirected
DPI consumes too many resources, and flows must end before they can be analysed
Statistical: uses mean sizes, durations, … which are not available from the first few packets
Clustering the flows
Techniques used: K-Means, Gaussian Mixture Model (GMM), spectral clustering on Hidden Markov Models (HMM)
Values used:
Size of the first few packets
Duration of the first few packets (negotiation phase)
Data set
4 packet traces:
3 from a university network
1 from an enterprise network
Keep only TCP packets and discard flows that began before the trace capture started
Features analysed: need for an efficient metric
Size and direction of the first 4 packets
We observe that the range of these values is very similar across traces (see graph on the next slide)
Size & Direction
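A minimal sketch of this feature vector, assuming a hypothetical `Packet` record and encoding direction as the sign of the size (positive for client to server, negative for server to client). This is an illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    # Hypothetical packet record: payload size in bytes and direction.
    size: int
    from_client: bool


def first_packets_vector(packets: List[Packet], p: int = 4) -> List[int]:
    """Signed sizes of the first p packets of a TCP connection.

    Assumed convention: positive = client to server, negative = server to client.
    """
    if len(packets) < p:
        raise ValueError(f"need at least {p} packets, got {len(packets)}")
    return [pkt.size if pkt.from_client else -pkt.size for pkt in packets[:p]]


# Toy HTTP-like exchange: request, two response segments, small client packet.
conn = [Packet(350, True), Packet(1448, False), Packet(1448, False), Packet(60, True)]
print(first_packets_vector(conn))  # [350, -1448, -1448, 60]
```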
Classification, 2 phases
Training phase: offline at management sites
Apply clustering techniques to samples of TCP connections from all target applications
Build a spatial representation based on the sizes of the first P packets (a P-dimensional vector, or an HMM)
Then find applications that have the same behaviour
Best results: 40 clusters and the first 4 packets
Creation of two sets:
One describing each cluster
One listing the applications present in each cluster
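The training phase could look roughly like this with plain K-Means, one of the clustering techniques listed above. This is a sketch under assumptions: Euclidean distance, initial centroids drawn at random from the data, and a hypothetical `train` helper returning the two sets (centroids as cluster descriptions, per-cluster application counts). The slides report 40 clusters; the toy data uses only 2.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def nearest(vec: Vector, centroids: Sequence[Vector]) -> int:
    """Index of the closest centroid (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(vec, centroids[i]))


def kmeans(points: Sequence[Vector], k: int, iters: int = 50, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's K-Means; initial centroids are k points drawn from the data."""
    centroids: List[Vector] = random.Random(seed).sample(list(points), k)
    for _ in range(iters):
        members: List[List[Vector]] = [[] for _ in range(k)]
        for pt in points:
            members[nearest(pt, centroids)].append(pt)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        updated = [
            tuple(sum(dim) / len(m) for dim in zip(*m)) if m else centroids[i]
            for i, m in enumerate(members)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def train(samples: Sequence[Tuple[Vector, str]], k: int) -> Tuple[List[Vector], Dict[int, Counter]]:
    """Return the two sets: cluster descriptions (centroids) and applications per cluster."""
    centroids = kmeans([vec for vec, _ in samples], k)
    composition: Dict[int, Counter] = {i: Counter() for i in range(k)}
    for vec, app in samples:
        composition[nearest(vec, centroids)][app] += 1
    return centroids, composition


samples = [
    ((350, -1448, -1448, 60), "HTTP"), ((340, -1448, -1400, 60), "HTTP"),
    ((360, -1448, -1448, 52), "HTTP"), ((-80, 20, -200, 30), "SMTP"),
    ((-82, 22, -210, 30), "SMTP"), ((-78, 18, -190, 28), "SMTP"),
]
centroids, composition = train(samples, k=2)
print(composition)
```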
Classification, 2 phases
Classification phase: online at management hosts
Extract the 5-tuple and the sizes of the packets in both directions
Using these sizes, the assignment module associates the connection with a cluster
From the cluster, the labelling module selects the application associated with the connection
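A sketch of the online phase, assuming the simplest heuristics (nearest centroid for assignment, majority application for labelling). The paper evaluates several heuristics; the centroids and counts below are made up.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

Vector = Tuple[float, ...]


def assign(vec: Vector, centroids: Sequence[Vector]) -> int:
    """Assignment module: index of the nearest cluster (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(vec, centroids[i]))


def label(cluster: int, composition: Dict[int, Counter]) -> str:
    """Labelling module: the application most represented in the cluster."""
    counts = composition.get(cluster)
    if not counts:
        return "unknown"
    return counts.most_common(1)[0][0]


# Hypothetical result of a training phase with two clusters.
centroids = [(350.0, -1448.0, -1432.0, 57.0), (-80.0, 20.0, -200.0, 29.0)]
composition = {0: Counter({"HTTP": 95, "HTTPS": 5}), 1: Counter({"SMTP": 80, "POP3": 20})}

new_conn = (345, -1448, -1448, 60)
cluster = assign(new_conn, centroids)
print(cluster, label(cluster, composition))  # 0 HTTP
```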
Evaluation & Conclusion
Evaluation
Assignment accuracy: above 95% for all heuristics
Labelling accuracy: between 85% and 98%
The size of the first few packets is a good metric
HMM-based clustering gives a richer description, but its quality is comparable to clustering with Euclidean distance
GMM clustering combined with TCP ports classifies over 98% of known applications
Limitation: needs the first 4 packets in the correct order
Heuristic (Wikipedia): where an exhaustive search is impractical (e.g. for NP-complete problems), heuristic methods are used to speed up finding a satisfactory solution.
III – The BLINC Classifier
Stands for BLINd Classification
Avoids reading the packet payload: privacy, performance, encrypted packets
3 levels of classification:
Social level
Functional level
Application level
The Social level
Finding host communities:
Client-server, P2P, …
Analysing these communities:
Perfect match: likely malicious
Partial overlap: P2P sources, websites, gaming, …
Partial overlap within the same subnet: server farms
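One way to express these cases in code, assuming Jaccard similarity as the overlap measure and a /24 prefix as "the same subnet" (both are hypothetical choices, not BLINC's exact rules):

```python
import ipaddress
from typing import Set


def overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two destination-IP sets (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def same_subnet(ip1: str, ip2: str, prefix: int = 24) -> bool:
    """True if both IPv4 addresses fall in the same /prefix network (assumed prefix)."""
    net = ipaddress.ip_network(f"{ip1}/{prefix}", strict=False)
    return ipaddress.ip_address(ip2) in net


def interpret(src1: str, dsts1: Set[str], src2: str, dsts2: Set[str]) -> str:
    """Map the overlap of two sources' destination sets to the cases on the slide."""
    sim = overlap(dsts1, dsts2)
    if sim == 1.0:
        return "perfect match: likely malicious"
    if sim > 0.0:
        if same_subnet(src1, src2):
            return "partial overlap within a subnet: farm"
        return "partial overlap: community (P2P, web, gaming, ...)"
    return "no overlap"


print(interpret("10.0.0.1", {"1.1.1.1", "2.2.2.2"}, "10.0.0.7", {"1.1.1.1", "3.3.3.3"}))
# partial overlap within a subnet: farm
```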
The Social level (2)
The functional level
Determine whether a host provides a service, uses it, or both
Mostly based on the ports used by this host
Works better when a host is connected to many servers
Typical patterns:
HTTP server: 1-2 ports
P2P: many ports (up to 1 per host)
Mail server: depends on the services available
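A coarse sketch of this idea, assuming a hypothetical `Flow` record and an illustrative threshold; BLINC's actual functional-level rules are more detailed.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Flow:
    # Hypothetical flow record, seen from the host being profiled.
    local_port: int
    remote_ip: str


def functional_profile(flows: Iterable[Flow], server_max_ports: int = 2) -> str:
    """Coarse role guess from the number of distinct local ports vs. distinct peers.

    server_max_ports is an illustrative threshold, not a value from BLINC.
    """
    flows = list(flows)
    if not flows:
        return "unknown"
    ports = {f.local_port for f in flows}
    peers = {f.remote_ip for f in flows}
    if len(ports) <= server_max_ports and len(peers) > len(ports):
        return "provides a service"          # e.g. web server on ports 80/443
    if len(ports) >= len(peers):
        return "many ports, ~1 per peer"     # client or P2P-like behaviour
    return "mixed"                           # e.g. mail server with several services


web = [Flow(80, f"192.0.2.{i}") for i in range(10)]
print(functional_profile(web))  # provides a service
```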
The application level
Uses the 4-tuple of the connections (possibly with other characteristics)
Creates a model for each application type
Models are represented by small graphs called « graphlets »
BLINC: Results
Uses 2 metrics to evaluate the classifier:
Completeness (% of traffic classified)
Accuracy (% of classified traffic that is correctly classified)
Some parameters can be used to tune the classifier:
Changing a threshold can improve the results for one metric but significantly degrade the other
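The two metrics, computed per flow for illustration (the same formulas apply to bytes or packets). Here `None` is an assumed marker for a flow the classifier left unclassified, and the labels are made up.

```python
from typing import Optional, Sequence, Tuple


def completeness_accuracy(predicted: Sequence[Optional[str]], truth: Sequence[str]) -> Tuple[float, float]:
    """Completeness = share of flows that received a label;
    accuracy = share of those labelled flows whose label is correct."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    classified = [(p, t) for p, t in zip(predicted, truth) if p is not None]
    completeness = len(classified) / len(truth) if truth else 0.0
    accuracy = sum(p == t for p, t in classified) / len(classified) if classified else 0.0
    return completeness, accuracy


pred = ["web", "p2p", None, "mail", "p2p"]
truth = ["web", "p2p", "web", "mail", "web"]
print(completeness_accuracy(pred, truth))  # (0.8, 0.75)
```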
Global results
GN: Genome campus (~1,000 users), UN: university network (~20,000 users)
Tuning
Td: minimum number of destination IPs needed to classify a flow as P2P
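A simplified stand-in for the Td rule, assuming flows are given as (source IP, destination IP) pairs; this sketch keeps only the threshold and none of BLINC's other P2P logic.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def p2p_candidates(flows: Iterable[Tuple[str, str]], td: int) -> Set[str]:
    """Source IPs that contacted at least `td` distinct destination IPs."""
    dests: Dict[str, Set[str]] = defaultdict(set)
    for src, dst in flows:
        dests[src].add(dst)
    return {src for src, d in dests.items() if len(d) >= td}


flow_pairs = [("A", f"10.0.0.{i}") for i in range(6)] + [("B", "10.0.0.1"), ("B", "10.0.0.2")]
print(sorted(p2p_candidates(flow_pairs, td=2)))  # ['A', 'B']
print(sorted(p2p_candidates(flow_pairs, td=5)))  # ['A']
```

Lowering `td` flags more hosts (higher completeness) at the risk of more wrong P2P labels (lower accuracy), which is the trade-off mentioned in the results.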
Results (2)
Good detection rate without reading any byte of the payload
Flows without payload are classified as well
Payload encryption is not a problem
Low resource consumption
Good detection of unknown flows
Difficult to distinguish applications of the same type (e.g. all VoIP protocols are grouped together)
Doesn’t work if the headers are encrypted
Hard to identify multiple sources behind NATs
Results were obtained at the edge of the network; the classifier may behave differently on the backbone
BLINC: Conclusion
BLINC achieves a good detection rate with little processing cost and without being intrusive
It can detect attacks and unknown protocols
It can be improved in some situations
IV – The ADSL Case
Test statistical classifiers on some sites after training them on others
Dataset: 4 packet traces collected at 3 different ADSL PoPs of Orange
2 traces captured at the same time, at different locations
2 traces captured at the same location, 17 days apart
Reference (ground truth): the ODP tool (provided by Orange)
Classification methodology
3 algorithms used to classify the traces:
Naïve Bayes with kernel estimation
Bayesian Network
C4.5 Decision Tree
Traces analysed with two feature sets:
SET_A: packet-level information
SET_B: flow-level statistics
3 filters:
S/S: flows with a 3-way handshake
S/S+4D: same as S/S + at least 4 data packets
S/S+F/R: same as S/S + a FIN or RST flag at the end
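The three filters can be written as predicates over the packets of one flow. The `Pkt` record and the letter-based flag encoding are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pkt:
    # Hypothetical per-packet record of one TCP flow, in arrival order.
    flags: str    # TCP flags as letters, e.g. "S", "SA", "A", "PA", "FA", "R"
    payload: int  # payload length in bytes


def s_s(pkts: List[Pkt]) -> bool:
    """S/S: the flow starts with a complete 3-way handshake (SYN, SYN/ACK, ACK)."""
    return (len(pkts) >= 3 and pkts[0].flags == "S"
            and pkts[1].flags == "SA" and "A" in pkts[2].flags)


def s_s_4d(pkts: List[Pkt]) -> bool:
    """S/S+4D: handshake plus at least 4 packets carrying data."""
    return s_s(pkts) and sum(p.payload > 0 for p in pkts) >= 4


def s_s_fr(pkts: List[Pkt]) -> bool:
    """S/S+F/R: handshake, and the flow is closed by a FIN or RST."""
    return s_s(pkts) and any("F" in p.flags or "R" in p.flags for p in pkts[3:])


flow = [Pkt("S", 0), Pkt("SA", 0), Pkt("A", 0)] + [Pkt("PA", 100)] * 4 + [Pkt("FA", 0), Pkt("A", 0)]
print(s_s(flow), s_s_4d(flow), s_s_fr(flow))  # True True True
```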
Classification, 2 cases
Static case: classification on each site independently
Ideal number of packets: 4
Accuracy: about 90%
Great classification of WEB and EDONKEY flows
Cross-site case:
SET_A: EDONKEY results are unaffected; spatial similarity seems more important than temporal similarity
The classifier is very sensitive to the context in which it is trained
MAIL is often mistaken for FTP because of similar packet sizes
Using the port number improves the results
Classification, 2 cases (continued)
SET_B: some degradations
Focus on a single feature: port number
Results are the opposite of the static case
Predicting traffic that uses non-legacy ports is ineffective
Due to heavy hitters (typically P2P)
Global results: C4.5 has the best overall accuracy in almost all cases (static and cross-site)
Degradation: C4.5 is comparable to the other algorithms (≤17%)
Data overfitting problem
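For context, C4.5 chooses splits by gain ratio: the information gain of an attribute divided by its split information. The sketch below handles only discrete attributes (C4.5 also splits continuous ones, such as packet sizes, on thresholds); the example data is made up.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """C4.5 split criterion for a discrete attribute: information gain / split information."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


apps = ["WEB", "WEB", "MAIL", "MAIL"]
print(gain_ratio([80, 80, 25, 25], apps))          # 1.0
print(gain_ratio(["a", "b", "a", "b"], apps))      # 0.0
```

Dividing by split information penalises attributes with many distinct values, such as raw port numbers, which would otherwise look artificially informative.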
Unknown class + Conclusion
Looking at the flows marked as unknown:
With a 3-way handshake
Apply the classifiers and obtain a confidence level, which is then compared to the one returned by C4.5
Useful to detect malicious traffic and P2P
Should be integrated into existing DPI tools
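A minimal sketch of confidence filtering for flows left unknown by the reference tool; the 0.9 threshold and the flow data are hypothetical.

```python
from typing import Dict, List, Tuple


def relabel_unknown(predictions: List[Tuple[str, str, float]], min_conf: float = 0.9) -> Dict[str, str]:
    """Keep a statistical label for an unknown flow only if the classifier is confident.

    predictions: (flow_id, predicted_app, confidence in [0, 1]); min_conf is assumed.
    """
    return {fid: app for fid, app, conf in predictions if conf >= min_conf}


preds = [("f1", "EDONKEY", 0.97), ("f2", "WEB", 0.55), ("f3", "BITTORRENT", 0.92)]
print(relabel_unknown(preds))  # {'f1': 'EDONKEY', 'f3': 'BITTORRENT'}
```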
Conclusion:
Statistical tools are very useful to identify unknown traffic
Good performance when used on the same site as training
Can detect applications among protocols
Really suffers from data overfitting (same behaviour from different apps)
A strength of this analysis: it used commercial traffic, which is very diverse
V – Skype case
We want to detect Skype traffic
It’s already possible to detect VoIP traffic with other classifiers, but how can Skype be distinguished from it?
Skype is a closed, encrypted protocol, which has to be analysed before classification can start
Skype model
Skype traffic characteristics are identified in a controlled environment
2 kinds of connections:
E2E: End to End, Skype to Skype
E2O: End to Out, Skype to the telephone network
Skype runs over TCP and UDP
Skype can carry text, voice, video and files:
Everything can be multiplexed in 1 packet
Here, only voice traffic is considered
Skype SoM
TCP packets are entirely encrypted; they cannot be analysed
UDP packets carry a small unencrypted header, called the Start of Message (SoM)
E2E: ID and message type (signaling or data)
E2O: unique connection identifier
Skype also always uses the same UDP port number (12340)
Classifiers
Chi-Square Classifier (CSC):
Based on the randomness of bits in packets
Doesn’t work on TCP, since encrypted packets appear completely random
Naive Bayes Classifier (NBC):
Real-time voice protocol classifier
Based on message size (which depends on the audio codec) and on the average inter-packet gap
Used on a short window of samples to cope with variability in packet size
Payload-based classifier:
Used in the controlled environment to check whether CSC and NBC work well
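A sketch of the statistic behind CSC: split the first bytes of each payload into 4-bit groups and, over a window of packets from one flow, compare each group's value distribution with a uniform one using Pearson's chi-square. Encrypted (random) groups give small values; fixed fields give large ones. The window size, group layout, and any decision thresholds are assumptions here.

```python
from collections import Counter
from typing import List, Sequence


def nibble(payload: bytes, index: int) -> int:
    """The index-th 4-bit group of the payload (high nibble of each byte first)."""
    byte = payload[index // 2]
    return (byte >> 4) if index % 2 == 0 else (byte & 0x0F)


def chi_square_uniform(observed: Sequence[int], n_bins: int = 16) -> float:
    """Pearson chi-square of observed nibble values against a uniform distribution."""
    counts = Counter(observed)
    expected = len(observed) / n_bins
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(n_bins))


def group_profile(payloads: List[bytes], n_groups: int) -> List[float]:
    """Chi-square value of each 4-bit group across a window of packets of one flow."""
    return [chi_square_uniform([nibble(p, g) for p in payloads]) for g in range(n_groups)]


# Toy window: byte 0 is a constant field, byte 1 cycles through 0x00, 0x11, ..., 0xFF.
window = [bytes([0x02, (i % 16) * 17]) for i in range(32)]
print(group_profile(window, 4))  # [480.0, 480.0, 0.0, 0.0]
```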
Experiments
NBC detects all kinds of VoIP traffic
CSC detects all kinds of Skype traffic
Using both of them together should detect Skype voice traffic
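A sketch of the NBC side and of the combination, assuming Gaussian likelihoods with invented parameters for two classes ("voice" and "other") and equal priors. The paper's NBC models are not reproduced here, so treat this only as an illustration of the decision structure.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple


def window_features(sizes: List[int], times: List[float]) -> Tuple[float, float]:
    """Average message size and average inter-packet gap over a short window."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    return mean(sizes), mean(gaps)


def gaussian_log_pdf(x: float, mu: float, sigma: float) -> float:
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


# Hypothetical per-class (mean, std) for (avg size in bytes, avg gap in seconds).
MODELS: Dict[str, List[Tuple[float, float]]] = {
    "voice": [(90.0, 30.0), (0.03, 0.01)],
    "other": [(700.0, 400.0), (0.5, 0.5)],
}


def nbc_is_voice(sizes: List[int], times: List[float]) -> bool:
    """Naive Bayes decision: equal priors, features treated as independent Gaussians."""
    feats = window_features(sizes, times)
    scores = {
        cls: sum(gaussian_log_pdf(x, mu, sd) for x, (mu, sd) in zip(feats, params))
        for cls, params in MODELS.items()
    }
    return max(scores, key=scores.get) == "voice"


def is_skype_voice(csc_says_skype: bool, sizes: List[int], times: List[float]) -> bool:
    """Skype voice only if CSC recognises Skype framing and NBC recognises voice."""
    return csc_says_skype and nbc_is_voice(sizes, times)


voice_sizes, voice_times = [80, 95, 88, 100, 92], [0.0, 0.02, 0.04, 0.06, 0.08]
print(is_skype_voice(True, voice_sizes, voice_times))  # True
```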
Results
Very low false positive rate
Higher false negative rate
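The two rates, using the usual definitions (FPR = FP / (FP + TN), FNR = FN / (FN + TP)); the counts are invented for illustration, not taken from the paper.

```python
from typing import Tuple


def error_rates(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """False positive rate and false negative rate from confusion-matrix counts."""
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return fpr, fnr


print(error_rates(tp=900, fp=1, tn=9999, fn=100))  # (0.0001, 0.1)
```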
Skype: Conclusion
Skype is hard to classify because its protocol is closed and encrypted, which makes it hard to analyse
But with this classifier, results on UDP are good
The false positive rate is almost zero, which is good if the ISP wants to prioritize Skype traffic
The false negative rate is higher, but this is not really a problem as long as the ISP does not want to block Skype
VI - Comparative
All these classifiers have good results, but each of them has its strengths and weaknesses
ADSL (statistical) needs site-specific training, but has the best detection rate
BLINC and Early are less precise but more flexible
They are also faster and good at detecting attacks
BLINC detects unknown protocols but cannot discern apps
Early needs the first 4 packets in order, ADSL the 3-way handshake
The Skype classifier is more specific and cannot be compared directly
Low false positive rate but higher false negative rate
VII – Conclusion
We now have solutions that can replace DPI
Each classifier is good in its domain:
Critical networks: early application identification (detect attacks early)
ADSL and commercial: statistical (user trends, adapt infrastructure)
University or academia: BLINC (statistics, trends)
Wherever Skype service needs to be improved: Skype classifier
Remarks:
Traces and classifiers are quite old (4 to 6 years)
What about mobile usage? Multimedia over 3G/4G networks?
Thanks for your attention
Any questions ?
References:
T. Karagiannis, K. Papagiannaki, M. Faloutsos. BLINC: Multilevel Traffic Classification in the Dark. In Proc. ACM SIGCOMM. August 2005.
L. Bernaille, R. Teixeira, K. Salamatian. Early Application Identification. In Proc. ACM CoNEXT. December 2006.
M. Pietrzyk, J.-L. Costeux, G. Urvoy-Keller, T. En-Najjary. Challenging Statistical Classification for Operational Usage: the ADSL Case. In Proc. ACM/USENIX Internet Measurement Conference (IMC). November 2009.
D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, P. Tofanelli. Revealing Skype Traffic: When Randomness Plays with You. In Proc. ACM SIGCOMM. August 2007.