Towards Understanding Network Traffic through Whole Packet Analysis

Abdulrahman Hijazi, Hajime Inoue, Ashraf Matrawy, P.C. van Oorschot, Anil Somayaji


Agenda

• Introduction

• Project in a nutshell

• ADHIC

• NetADHICT
    • Overview
    • In progress

• Results
    • Performance
    • Multimedia & encrypted traffic
    • P2P
    • No-headers

• Limitations

• Applications

Introduction

• Complexity of modern computer networks

• Common network analysis strategies
    • Predetermined classifiers (port, address, …)
    • Protocol dissectors (Wireshark, …)

• High-level view of network structure through packet clustering
    • Header information
    • Payload
        • Better distinguishes P2P, worms, …
        • Performance issue

Introduction

• We developed a packet clustering technique that:
    • finds semantically interesting clusters
    • adapts to the changing nature of traffic patterns
    • does not require explicit a priori information
    • does not rely on any specific fields in the packets
    • can run in sub-linear time (in packet length)

• Two innovations:
    • (p,n)-grams: n-byte substrings at byte offset p
    • ADHIC (Approximate Divisive HIerarchical Clustering)

• Two key features:
    • Network traffic redundancy
    • Optimal clustering is not required
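To make the (p,n)-gram idea concrete, here is a minimal Python sketch. The function names (`pn_grams`, `matches`, `gram_frequencies`) and the sample packets are illustrative assumptions, not part of NetADHICT.

```python
from collections import Counter

def pn_grams(packet: bytes, n: int = 2):
    """Yield every (p, gram) pair: the n-byte substring found at byte offset p."""
    for p in range(len(packet) - n + 1):
        yield p, packet[p:p + n]

def matches(packet: bytes, p: int, gram: bytes) -> bool:
    """True if the packet carries exactly `gram` at offset p."""
    return packet[p:p + len(gram)] == gram

def gram_frequencies(packets, n: int = 2):
    """Fraction of packets that contain each (p,n)-gram."""
    counts = Counter()
    for pkt in packets:
        counts.update(pn_grams(pkt, n))   # each offset occurs once per packet
    total = len(packets)
    return {key: c / total for key, c in counts.items()}

# Three toy "packets" (hypothetical data, only to exercise the functions).
packets = [b"\x08\x00AB", b"\x08\x00CD", b"\x08\x06AB"]
freq = gram_frequencies(packets)
```

Testing a single (p,n)-gram with `matches` reads only n bytes at a fixed offset, so its cost does not grow with packet length. That is one plausible reading of the sub-linear claim above.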

Project in a Nutshell

• NetADHICT: our implementation of ADHIC

• It can analyze data as it is received by a network interface, or offline using libpcap files.

• Observed data is used to generate & update a (p,n)-gram decision tree.

• This tree serves as a classifier tree reflecting the high-level structure of network traffic at a given time.

• The deduced structure corresponds to the typical network traffic division (TCP vs. UDP; web vs. non-web), and is arrived at using automatically generated, context-related (p,n)-grams.
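For the offline mode mentioned above, the following sketch reads raw packets from a classic libpcap capture using only the Python standard library. It is not how NetADHICT reads traces. The name `read_pcap` is an assumption, and the sketch handles only the classic microsecond-resolution format, not pcapng.

```python
import io
import struct

def read_pcap(stream):
    """Yield raw packet bytes from a classic (non-pcapng) libpcap capture."""
    header = stream.read(24)                  # global header is 24 bytes
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    magic = header[:4]
    if magic == b"\xd4\xc3\xb2\xa1":          # 0xa1b2c3d4 written little-endian
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":        # 0xa1b2c3d4 written big-endian
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file")
    while True:
        record = stream.read(16)              # per-packet record header
        if len(record) < 16:
            return
        _sec, _usec, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            return                            # truncated final record
        yield data

# Tiny in-memory capture (little-endian, Ethernet link type) to exercise the reader.
demo = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
for payload in (b"abc", b"hello"):
    demo += struct.pack("<IIII", 0, 0, len(payload), len(payload)) + payload
packets = list(read_pcap(io.BytesIO(demo)))
```

Each yielded byte string is a whole captured frame, link-layer header included, which is the form a whole-packet analysis works on.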

ADHIC

• Using a sampled measure of similarity, ADHIC recursively subdivides traffic into binary classes until the resulting traffic is:
    • below a certain threshold, or
    • too similar or dissimilar

• The resulting binary tree consists of:
    • internal decision nodes with one (p,n)-gram per node
    • leaf nodes that constitute the final clusters

• Classification rule is based on matching (p,n)-grams.

• Traffic at each terminal cluster is a result of a Boolean equation constructed by following the path from root to leaf.
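The tree structure and the root-to-leaf Boolean equation can be sketched as follows. The `Node` and `Leaf` classes, the `classify` function, and the two example grams (an IPv4 EtherType at offset 12 and a TCP protocol byte at offset 23) are illustrative assumptions, not a tree produced by ADHIC.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    name: str                       # a final cluster

@dataclass
class Node:
    p: int                          # byte offset of the (p,n)-gram
    gram: bytes                     # the n bytes expected at that offset
    match: Union["Node", Leaf]      # subtree for packets that match
    no_match: Union["Node", Leaf]   # subtree for packets that do not

def classify(tree, packet: bytes):
    """Walk from root to leaf; return the leaf name and the Boolean path taken."""
    path = []
    while isinstance(tree, Node):
        hit = packet[tree.p:tree.p + len(tree.gram)] == tree.gram
        path.append(("" if hit else "NOT ") + f"({tree.p}, {tree.gram.hex()})")
        tree = tree.match if hit else tree.no_match
    return tree.name, " AND ".join(path)

# Hand-built example tree (hypothetical): IP vs. non-IP, then TCP vs. other IP.
tree = Node(12, b"\x08\x00",
            match=Node(23, b"\x06", match=Leaf("tcp"), no_match=Leaf("ip-other")),
            no_match=Leaf("non-ip"))

tcp_frame = bytes(12) + b"\x08\x00" + bytes(9) + b"\x06" + bytes(30)
arp_frame = bytes(12) + b"\x08\x06" + bytes(28)
tcp_result = classify(tree, tcp_frame)
arp_result = classify(tree, arp_frame)
```

The second element of each result is the conjunction of matched and negated (p,n)-grams along the path, which is the Boolean equation that defines the terminal cluster.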

ADHIC

• ADHIC adapts to changing traffic by performing the following two tree operations:

• Splitting, when:
    • a leaf contains more than a preset threshold of traffic, and
    • there is a (p,n)-gram whose match percentage falls within a certain range (e.g. 40%-60%).

• Deletion, when:
    • a subtree has not matched a minimum threshold

• Both of these statistics are measured over a preset period of time called the maturation window.
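The two tree operations can be sketched as simple predicates over the counts gathered during one maturation window. All names and threshold values here are illustrative assumptions, not NetADHICT's parameters. Picking the candidate closest to a 50% match is also an assumption, since the slides only say that a gram within the range must exist.

```python
from dataclasses import dataclass

@dataclass
class Params:
    # Illustrative values only; not NetADHICT defaults.
    split_traffic_share: float = 0.05   # leaf must exceed this share of traffic
    match_low: float = 0.40             # lower bound of acceptable match fraction
    match_high: float = 0.60            # upper bound of acceptable match fraction
    delete_min_share: float = 0.001     # subtrees below this share are deleted

def choose_split(leaf_packets, total_packets, candidate_grams, params):
    """Return a (p, gram) that divides an over-full leaf near the middle, else None."""
    if not leaf_packets or len(leaf_packets) / total_packets <= params.split_traffic_share:
        return None
    best, best_dist = None, None
    for p, gram in candidate_grams:
        hits = sum(pkt[p:p + len(gram)] == gram for pkt in leaf_packets)
        frac = hits / len(leaf_packets)
        if params.match_low <= frac <= params.match_high:
            dist = abs(frac - 0.5)
            if best is None or dist < best_dist:
                best, best_dist = (p, gram), dist
    return best

def should_delete(subtree_matches, total_packets, params):
    """True if a subtree matched less than the minimum share in this window."""
    return total_packets > 0 and subtree_matches / total_packets < params.delete_min_share

# Toy leaf (hypothetical data): half the packets start with "AA".
leaf = [b"AAxx", b"AAyy", b"AAzz", b"BBxx", b"BByy", b"BBzz"]
candidates = [(2, b"xx"), (0, b"AA")]
split = choose_split(leaf, 10, candidates, Params())
```

Here `(2, b"xx")` matches only a third of the leaf and is rejected, while `(0, b"AA")` matches half and is selected.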

NetADHICT: Overview

• Licensed under GNU GPL

• It usually starts by separating IP from non-IP, then later in lower nodes it sequesters specific protocols.

• NetADHICT segregates packets by protocol and other characteristics (e.g. length).

• (p,n)-grams corresponding to special header or payload fields allow unconventional classification measures.

• NetADHICT was tested against four week-long traces from our CCSL lab.

NetADHICT: In progress

• Examples of interesting segregation through (p,n)-grams:
    • (51, 0x00 0x00): part of ARP’s Ethernet frame trailer
    • (64, 0x00 0x0f): part of EIGRP’s non-IP header
    • (22, 0x2c 0x06) and (54, 0x01 0x01): part of IMAPS’s TTL & protocol ID and “NOP, NOP” options field, respectively
    • (37, 0xc1 0x0c): HSRP’s 2nd byte of dest port & 1st byte of UDP length
    • (174, 0x00 0x00): part of NetBIOS-DGM’s payload
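As a worked check of how such offsets map onto header fields, the sketch below builds a synthetic frame and locates the two IMAPS grams listed above. It assumes an untagged Ethernet II frame carrying IPv4 without IP options. The field values other than the TTL, protocol and NOP bytes are placeholders.

```python
import struct

ETH_LEN = 14   # Ethernet II header, no VLAN tag (assumption)
IP_LEN = 20    # IPv4 header without options (assumption)
TCP_LEN = 20   # fixed part of the TCP header

eth = bytes(12) + struct.pack("!H", 0x0800)            # EtherType: IPv4
ip = struct.pack("!BBHHHBBH4s4s",
                 0x45, 0, 52, 0, 0,                     # version/IHL, ToS, length, id, frag
                 0x2C, 6, 0,                            # TTL = 44, protocol = TCP, checksum
                 bytes(4), bytes(4))                    # source and destination addresses
tcp = struct.pack("!HHIIBBHHH",
                  40000, 993, 0, 0,                     # ports (993 = IMAPS), seq, ack
                  0x80, 0x10, 0, 0, 0)                  # data offset 8 words, ACK flag
options = b"\x01\x01\x08\x0a" + bytes(8)                # NOP, NOP, timestamp option
frame = eth + ip + tcp + options

ttl_offset = ETH_LEN + 8                    # TTL is byte 8 of the IPv4 header
options_offset = ETH_LEN + IP_LEN + TCP_LEN # TCP options follow the fixed header
```

Under these assumptions `ttl_offset` is 22 and `options_offset` is 54, so the (22, 0x2c 0x06) gram covers the TTL and protocol bytes and the (54, 0x01 0x01) gram covers the two leading NOP options.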

Results: Performance

Single protocol cluster: clusters that the traditional classifier reports as containing packets of only one protocol.
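The definition above can be turned into a small computation. The sketch assumes each packet has already been given a cluster by the tree and a protocol label by a traditional classifier. The function name and the sample data are hypothetical, and the exact metric reported in the original results is not recoverable from this transcript.

```python
from collections import defaultdict

def single_protocol_fraction(assignments):
    """assignments: iterable of (cluster_id, protocol_label) pairs, one per packet.

    Returns the fraction of clusters whose packets all carry the same label.
    """
    protocols = defaultdict(set)
    for cluster, proto in assignments:
        protocols[cluster].add(proto)
    if not protocols:
        return 0.0
    single = sum(1 for labels in protocols.values() if len(labels) == 1)
    return single / len(protocols)

# Hypothetical labelled packets: clusters "a" and "b" are pure, "c" is mixed.
sample = [("a", "http"), ("a", "http"), ("b", "dns"), ("c", "ssh"), ("c", "http")]
fraction = single_protocol_fraction(sample)
```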

Results: Performance

• NetADHICT does well with most traffic types.
    • Structured packets (e.g. non-IP, UDP, …) are segregated through header and/or payload (p,n)-grams.
    • Unstructured packets (e.g. TCP) are segregated more through header (p,n)-grams, including fields like the five-tuple and others (e.g. packet length, QoS field, TTL, options, padding, …).

• NetADHICT also clusters packets of the same protocol running on different port numbers together (e.g. HTTP on 80 and 8080).

Results: Multimedia & Encrypted Traffic

• In addition, multimedia (e.g. MS-Streaming) and encrypted (e.g. SSH, HTTPS, IMAPS) traffic are both:
    • Segregated from unencrypted traffic: NetADHICT either segregates them through header (p,n)-grams or shunts them to default clusters.
    • Distinguished from each other: NetADHICT finds suitable header (p,n)-grams to separate different encrypted traffic from each other.

Results: P2P

• Many P2P applications use constantly changing, non-standard port numbers within the same network session.

• In all the experiments done, NetADHICT was able to:
    • cluster the P2P UDP tracker packets together through a non-IP-header (p,n)-gram.
    • cluster all other related TCP packets (data and control) to the tree’s global default cluster and its adjacent cluster.

• Even when the port of all the P2P packets was maliciously changed to the standard HTTP port number (i.e. 80), the packets were clustered exactly as before.
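The port-change result follows whenever none of the grams used for classification overlaps the port bytes. The sketch below illustrates that property with a flat rule list standing in for the decision tree. The rule, the frames, and all function names are hypothetical, and the offsets assume Ethernet plus a 20-byte IPv4 header.

```python
import struct

PORT_OFFSETS = range(34, 38)   # TCP/UDP source and destination port bytes (assumed layout)

def rewrite_dst_port(frame: bytes, port: int) -> bytes:
    """Return a copy of the frame with the destination port replaced."""
    return frame[:36] + struct.pack("!H", port) + frame[38:]

def touches_ports(p: int, gram: bytes) -> bool:
    """True if the gram at offset p overlaps any port byte."""
    return any(q in PORT_OFFSETS for q in range(p, p + len(gram)))

def cluster_of(frame: bytes, rules, default: str = "default") -> str:
    """First matching (p, gram, label) rule wins; everything else is the default cluster."""
    for p, gram, label in rules:
        if frame[p:p + len(gram)] == gram:
            return label
    return default

rules = [(54, b"GET ", "web")]   # a payload gram, chosen only for illustration

web = bytes(34) + struct.pack("!HH", 40000, 80) + bytes(16) + b"GET / HTTP/1.1\r\n"
p2p = bytes(34) + struct.pack("!HH", 51413, 51413) + bytes(16) + b"\x13BitTorrent protocol"

before = (cluster_of(web, rules), cluster_of(p2p, rules))
after = (cluster_of(rewrite_dst_port(web, 8080), rules),
         cluster_of(rewrite_dst_port(p2p, 80), rules))
```

Because the only rule reads payload bytes, rewriting the destination port leaves both assignments unchanged: the web frame stays in its cluster and the P2P frame stays in the default cluster.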

Results: P2P

• Two observations:
    • NetADHICT rarely uses ports to cluster traffic.
    • NetADHICT managed to segregate P2P traffic by characterizing other network traffic as having patterns that were absent in the P2P traffic.

• Conclusion:
    • So long as most well-behaved traffic can be appropriately clustered, evasive protocols can be identified.

Results: No-Headers

• NetADHICT can also do semantically meaningful clustering even without looking at the IP header (first 38 bytes).

• Although performance is occasionally degraded, decision trees made with no header information are qualitatively similar to those done using all packet information.

• The main difference is in NetADHICT’s inability to separate different encrypted traffic when headers are restricted.
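One simple way to approximate the no-headers setting is to restrict candidate (p,n)-grams to offsets at or beyond the skipped prefix. The sketch below assumes this reading. The function name is hypothetical, and the 38-byte figure is taken from the slide above.

```python
HEADER_BYTES = 38   # prefix ignored in the no-headers experiments, per the slide

def payload_only_grams(packet: bytes, n: int = 2, skip: int = HEADER_BYTES):
    """Return (p, gram) pairs whose offset lies at or beyond the skipped prefix."""
    return [(p, packet[p:p + n]) for p in range(skip, len(packet) - n + 1)]

# Hypothetical packet: 38 header bytes followed by a short payload.
sample = bytes(38) + b"GET "
grams = payload_only_grams(sample)
```

Packets shorter than the skipped prefix contribute no grams at all under this restriction.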

Limitations

• Analysis challenge:
    • Difficulty (work and time) in analyzing clusters both manually and automatically

• Privacy issues:
    • Our algorithm looks at both headers and payloads

• Sophisticated design:
    • Large configuration space, making it difficult to choose an optimal set of parameters

Applications

• Network administration:
    • understand the overall structure of network traffic and further assist in monitoring its changes.

• Network security:
    • isolate malicious traffic from normal traffic (featuring no outdated signatures, long training, or false alarms).

• Quality of Service:
    • actively manage bandwidth by giving each leaf cluster an equal share of the bandwidth.

• Other applications:
    • ADHIC has no built-in knowledge of networking!

Thank you