Evaluation of Header Field Entropy for Hash-Based Packet Selection

Christian Henke, Carsten Schmoll, Tanja Zseby

Fraunhofer Institute FOKUS, Berlin, Germany

Evaluation of Header Field Entropy for Hash-Based Packet Selection

PAM 2008, Cleveland

Outline

1. Introduction: Multipoint Sampling

2. Problem Statement

3. Approach

4. Measurement Setup

5. Measurement Results

6. Conclusion

Introduction: Multipoint Sampling

Passive Multipoint Measurements
– at each observation point, a packet ID and a timestamp are exported for each packet
– the trace of a packet is observable from the occurrences of its packet ID
– delay = difference between timestamp A and timestamp B of packets with equal ID

[Figure: observation points A, B and C export to a multipoint collector]
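The delay calculation from matching packet IDs can be sketched as follows. This is a minimal illustration, not the authors' collector: the record format `(packet ID, timestamp)`, the function name `one_way_delays`, and the sample values are assumptions, and the sketch assumes that packets pass point A before point B and that both clocks are synchronized.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical export format: (packet ID, timestamp in seconds)
Record = Tuple[int, float]


def one_way_delays(point_a: Iterable[Record], point_b: Iterable[Record]) -> Dict[int, float]:
    """Match records by packet ID and return timestamp_b - timestamp_a per ID."""
    seen_at_a = {pid: ts for pid, ts in point_a}
    # Only packets observed at both points yield a delay value.
    return {pid: ts - seen_at_a[pid] for pid, ts in point_b if pid in seen_at_a}


records_a = [(101, 10.000), (102, 10.004), (103, 10.009)]
records_b = [(101, 10.012), (103, 10.020)]  # packet 102 was not observed at B
delays = one_way_delays(records_a, records_b)
```

A packet ID that appears at only one point, such as 102 here, produces no delay value.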

Introduction: Multipoint Sampling

Challenge in Passive Multipoint Measurements
– immense amounts of measurement data
– high infrastructure costs: processing, storing, exporting

Random Packet Selection and Estimation
– random sampling (n-out-of-N, probabilistic) is unsuitable: it yields inconsistent samples at the observation points
– Duffield and Grossglauser propose hash-based packet selection in “Trajectory Sampling for Direct Traffic Observation”
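Why independent random sampling is inconsistent can be illustrated with a small simulation. This is a sketch under assumptions: the function name, the seeds and the 10% sampling probability are chosen for illustration only. Two observation points sample the same packets with their own random generators, so only about a tenth of the packets selected at A are also selected at B.

```python
import random


def sample_independently(packet_ids, probability, seed):
    """Probabilistic sampling with a random generator local to one observation point."""
    rng = random.Random(seed)
    return {pid for pid in packet_ids if rng.random() < probability}


packet_ids = range(100_000)
at_a = sample_independently(packet_ids, 0.1, seed=1)
at_b = sample_independently(packet_ids, 0.1, seed=2)

# Fraction of the packets sampled at A that were also sampled at B.
overlap = len(at_a & at_b) / len(at_a)
```

With a sampling probability p the expected overlap is about p, so most sampled packets cannot be matched across points.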

Introduction: Multipoint Sampling

Hash-Based Packet Selection

[Figure: a packet consists of IP header, transport header and payload; parts of it form the hash input, which is fed into the hash function]

$h : D \rightarrow R$, $x \in D$, $S \subset R$
– packet selected if $h(x) \in S$
– packet not selected if $h(x) \notin S$

The selected subset is consistent if x, h and S are equal at all observation points.
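The selection rule can be sketched in a few lines. The following is an illustration under assumptions, not the hash function evaluated by the authors: CRC32 from Python's `zlib` stands in for h, the selection range S is taken as the lower part of the hash range, and the function name `select` and the 8-byte inputs are hypothetical.

```python
import zlib


def select(hash_input: bytes, sampling_fraction: float) -> bool:
    """Return True if h(x) falls into the selection range S = [0, fraction * 2**32)."""
    h = zlib.crc32(hash_input)  # stand-in for h: D -> R with R = [0, 2**32)
    return h < sampling_fraction * 2**32


inputs = [i.to_bytes(8, "big") for i in range(10_000)]

# Two observation points that use the same x, h and S reach the same decisions.
selected_at_a = {x for x in inputs if select(x, 0.1)}
selected_at_b = {x for x in inputs if select(x, 0.1)}
```

Because the decision depends only on the hash input, both points select exactly the same packets without exchanging any state.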

Problem Statement

Which packet content to use as hash input?

Requirements for header fields
1. static between network nodes (this excludes IP TTL and header checksum, which change at each hop)
2. variable among packets

Challenge: HBS is deterministic, but the goal is to emulate random selection; the choice of hash input can introduce bias into the selection.

Problem Statement

How bias is introduced
- packets in a hash input collision have the same hash input
- their selection decisions are therefore not independent
- the more packets in a collision, the more severe the bias
- using the whole packet is unsuitable because hash value calculation time increases with hash input length
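The all-or-none effect of a hash input collision can be made concrete with a small sketch. The packets, the labels `flow-A` to `flow-C`, and the CRC32-based `select` function are hypothetical and only serve to show that identical hash inputs always receive the same decision.

```python
import zlib
from collections import defaultdict


def select(hash_input: bytes, fraction: float = 0.5) -> bool:
    """Deterministic selection decision; CRC32 is a stand-in hash function."""
    return zlib.crc32(hash_input) < fraction * 2**32


# Hypothetical packets as (hash input, packet number); three collide on b"flow-A".
packets = [(b"flow-A", 1), (b"flow-A", 2), (b"flow-A", 3), (b"flow-B", 4), (b"flow-C", 5)]

# Collect the set of decisions observed per hash input.
decisions = defaultdict(set)
for hash_input, _ in packets:
    decisions[hash_input].add(select(hash_input))

# Every hash input maps to exactly one decision: all packets in, or all out.
all_or_none = all(len(d) == 1 for d in decisions.values())
```

The three colliding packets are either all selected or all dropped, which is what distorts the sample when collisions are large.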

Approach

– packets differ more often in highly variable bytes
– entropy per byte is used to measure variability

Entropy: $H(B) = -\sum_{i} p_i \log_2 p_i$

Information efficiency: $H(B) / H_{max}$, with $H_{max} = \log_2 256 = 8$ bits for a single byte

$p_i$: probability that byte value i occurs
$H(B)$: entropy of the discrete random variable B of byte values
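A per-byte entropy measurement can be sketched as follows. This is not the authors' tooling: it assumes the standard Shannon entropy, H(B) = -Σ p_i log2 p_i, with p_i estimated from observed byte frequencies, and it takes information efficiency to be the entropy divided by the 8-bit maximum of a byte. The function names and the synthetic two-byte headers are hypothetical.

```python
import math
from collections import Counter


def byte_entropy(values):
    """Shannon entropy in bits of the byte values observed at one header position."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_efficiency(values):
    """Entropy divided by the maximum of 8 bits for a single byte."""
    return byte_entropy(values) / 8.0


def entropy_per_position(headers):
    """Entropy of each byte position over equally long header byte strings."""
    return [byte_entropy(column) for column in zip(*headers)]


# Synthetic headers: a constant first byte followed by a uniformly distributed byte.
headers = [bytes([0x45, i % 256]) for i in range(512)]
entropies = entropy_per_position(headers)
```

The constant byte yields an entropy of zero and the uniformly distributed byte reaches the 8-bit maximum, which marks it as a good candidate for the hash input.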

Measurement Setup

Evaluation dependent on analyzed traces
- 6 IPv4 trace groups, 1 IPv6
- geographical locations (NZ, AUT, FR, NED, 2 LEO)
- network location (university, peering point, large ISP)
- application mix

| Trace Name | Location | Duration | Packets (millions) | IP Address anonymized | IP Version | TCP (%) | UDP (%) | ICMP (%) | Others (%) | Main applications (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| NZIX | New Zealand | 30 hours | ~200 | Yes | 4 | 68 | 20 | 9 | 3 | http(50) quake(5) |
| FH Salzburg | Austria | 3 days | ~110 | Yes | 4 | 99 | 1 | 0 | 0 | http(90) |
| LEO 1 | | 3 hours | ~130 | No | 4 | 1 | 90 | 10 | 0 | edonkey(25) |
| LEO 2 | | 6 min | ~12 | No | 4 | 33 | 60 | 0 | 7 | tunnel(60) edonkey(10) |
| Twente | Netherlands | 10 days | ~380 | Yes | 4 | 89 | 7 | 1 | 3 | diversified |
| Ciril | France | 6 hours | ~100 | No | 4 | | | | | |
| Mawi | WIDE-6Bone | 40 days | ~80 | Yes | 6 | | | | | |

Measurement Results

Entropy IPv4

Measurement Results

High Entropy Header Fields
– IPv4: Identification, Length LSB, Src/Dst Address 2 LSB
– TCP: Checksum, SeqNo, AckNo, Src/Dst Port 2 LSB
– UDP: Checksum, Length LSB, Src/Dst Port 2 LSB
– ICMP: Checksum, Bytes 12, 13, 18, 19
– IPv6: Length LSB
  – more IPv6 traces required for further evaluation
  – addresses anonymized and no transport header: only 8 bytes could be evaluated

Recommended 8 Byte Configuration
IP ID field + 6 transport header bytes:
– TCP: Checksum, 2 LSB of SeqNo and AckNo
– UDP: Checksum, Source Port, LSB of Destination Port, LSB of Length
– ICMP: Checksum, Bytes 12, 13, 18, 19
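Extracting the recommended 8 bytes from a raw IPv4 packet could look like the sketch below. It is an illustration under assumptions: the function name `recommended_hash_input` is hypothetical, the byte offsets follow the standard IPv4, TCP and UDP header layouts, and the order in which the bytes are concatenated is chosen arbitrarily. ICMP is left out because the slides do not state where the numbering of bytes 12, 13, 18 and 19 starts.

```python
from typing import Optional

TCP, UDP = 6, 17  # IANA protocol numbers


def recommended_hash_input(packet: bytes) -> Optional[bytes]:
    """Return IP ID + 6 transport header bytes for an IPv4 TCP/UDP packet, else None."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return None
    ihl = (packet[0] & 0x0F) * 4  # IPv4 header length in bytes
    ip_id = packet[4:6]           # Identification field
    proto = packet[9]             # Protocol field
    t = packet[ihl:]              # transport header and payload
    if proto == TCP and len(t) >= 20:
        # checksum (bytes 16-17), 2 LSB of SeqNo (6-7), 2 LSB of AckNo (10-11)
        return ip_id + t[16:18] + t[6:8] + t[10:12]
    if proto == UDP and len(t) >= 8:
        # checksum (6-7), source port (0-1), LSB of dst port (3), LSB of length (5)
        return ip_id + t[6:8] + t[0:2] + t[3:4] + t[5:6]
    return None  # ICMP and other protocols are not handled in this sketch


# Synthetic UDP packet: IP ID 0xABCD, ports 12345 -> 53, length 8, checksum 0xBEEF.
ip = bytes([0x45, 0, 0, 28, 0xAB, 0xCD, 0, 0, 64, 17, 0, 0, 10, 0, 0, 1, 10, 0, 0, 2])
udp = bytes([0x30, 0x39, 0x00, 0x35, 0x00, 0x08, 0xBE, 0xEF])
hash_input = recommended_hash_input(ip + udp)
```

The resulting 8 bytes contain only fields that stay unchanged between observation points, which is the first requirement for a hash input.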

Measurement Results

Empirical Hash Input Collisions Evaluation
4 configurations used
1. whole IP and transport header (minimum reachable collisions)
2. only IP header (bad configuration)
3. 8 high entropy bytes
4. Molina's 16 bytes

Metric: sum of packets in the 20 largest collisions of each trace
– Large collisions: all-or-none decision for all packets that have the same attributes
– Small collisions: packets equal in one collision but different between
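The collision metric can be computed with a frequency count over the hash inputs. The sketch below reflects one reading of the metric, not necessarily the authors' exact procedure: a collision is taken to be a group of at least two packets with identical hash input, and the function name and sample inputs are hypothetical.

```python
from collections import Counter


def packets_in_largest_collisions(hash_inputs, top=20):
    """Sum the packets in the `top` largest groups of identical hash inputs."""
    counts = Counter(hash_inputs)
    # Groups of size 1 are unique hash inputs and do not count as collisions.
    collisions = [n for _, n in counts.most_common(top) if n > 1]
    return sum(collisions)


hash_inputs = [b"a", b"a", b"a", b"b", b"b", b"c", b"d"]
total = packets_in_largest_collisions(hash_inputs)
```

Applying the same function to the hash inputs of each configuration allows a direct comparison of how many packets end up in large collisions.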

Measurement Results

Hash Input Collision Comparison
– recommended 8 bytes perform better than Molina's 16 bytes, except for LEO 2
– the LEO 2 trace includes a large VPN traffic flow with UDP checksum == 0; more high entropy bytes should be used

| Trace Group | Trace Files | Packets/file (millions) | Identical IP + Transport Header | Identical IP Header | Recomm. 8 Bytes | Molina's 16 Bytes |
|---|---|---|---|---|---|---|
| FH Salzburg | 18 | 6 | 3,547 | 238,174 | 3,547 | 3,547 |
| NZIX | 19 | 10 | 484,034 | 1,564,246 | 484,405 | 1,562,066 |
| Twente | 36 | 10 | 13,120 | 475,570 | 16,004 | 49,477 |
| LEO 1 | 12 | 10 | 61,072 | 450,273 | 73,730 | 86,809 |
| LEO 2 | 1 | 10 | 949 | 8,116 | 7,919 | 1,121 |

Conclusion

Outcome
– a recommendation of 8 bytes for use as hash input for HBS
– the 8 recommended bytes are sufficient to gain unique hash inputs

Henke, Schmoll, Zseby: “Empirical Evaluation of Hash Functions for Multipoint Measurements”
– hash calculation time increases linearly with input length
– hash functions are able to select a representative subset based on 8 bytes

Future Work

Correlation between Bytes
– correlation between address bytes
– entropy of combined bytes expected to be average of entropy

IPv6
– entropy evaluation of IPv6 addresses and transport headers