preventing encrypted traffic analysis

Preventing Encrypted Traffic Analysis. Nabíl Adam Schear, University of Illinois at Urbana-Champaign, Department of Computer Science, [email protected]. 6 January 2011. Committee: Nikita Borisov (UIUC-ECE), Karen L. Bintz (Los Alamos National Laboratory), Matthew Caesar (UIUC-CS), Carl A. Gunter (UIUC-CS), David M. Nicol (UIUC-ECE). For a copy of the slides or dissertation: http://helious.net/. UNCLASSIFIED, Open Release, LA-UR 11-01317.


TRANSCRIPT

Page 1: Preventing Encrypted Traffic Analysis

Preventing Encrypted Traffic Analysis

Nabíl Adam Schear
University of Illinois at Urbana-Champaign
Department of Computer Science
[email protected]

6 January 2011

Committee:
Nikita Borisov (UIUC-ECE)
Karen L. Bintz (Los Alamos National Laboratory)
Matthew Caesar (UIUC-CS)
Carl A. Gunter (UIUC-CS)
David M. Nicol (UIUC-ECE)

For a copy of the slides or dissertation: http://helious.net/

UNCLASSIFIED
Open Release
LA-UR 11-01317

Page 2: Preventing Encrypted Traffic Analysis

Encrypted Traffic Analysis

• Encrypted protocols provide strong confidentiality through encryption of content
• But encryption does not mask packet sizes and timing
  – Privacy can be breached by traffic analysis attacks
• Traffic analysis can recover:
  – Browsed websites through encrypted proxies
  – Keystrokes in real-time systems
  – Language and phrases in VoIP
  – Identity in anonymity systems
  – Embedded protocols in tunnels
• Who needs defense against traffic analysis?
  – Privacy-seeking users and enterprises with VPNs, SSL, VoIP, anonymity networks, etc.

Page 3: Preventing Encrypted Traffic Analysis

SSH Traffic Analysis Attack

[Figure: Bob's computer connects through an SSH gateway to Internal Server A on the corporate network; only ciphertext is visible on the wire.]

• Bob logs into SSH gateway
• Bob starts login to A
• Bob types password for A: UIUC
• By keystroke timing, Bob typed: U-I-U-C

Page 4: Preventing Encrypted Traffic Analysis

Defense Detection Attack

[Figure: Alice wants to use Tor from China. She connects over port 443 to a private Tor bridge, which forwards to the Tor relay network. Only ciphertext is visible on the wire, but the observer concludes: "Traffic does not match real HTTPS. Signature: TOR".]

Page 5: Preventing Encrypted Traffic Analysis

Approach: Realistic Mimicry with TrafficMimic

• Tunneling real data over cover traffic
  – Force attacker to see cover packet sizes and timings, not real ones
  – Attacker cannot separate real data from cover padding because of encryption
• Use realistic model to generate cover traffic
  – Simultaneously prevent both types of attack
  – Less overhead and vulnerability than constant rate techniques

Page 6: Preventing Encrypted Traffic Analysis

Thesis Statement

“Tunneling real data through realistic cover traffic models is a robust defense that provides balanced performance and security against powerful traffic analysis attacks.”

Page 7: Preventing Encrypted Traffic Analysis

TrafficMimic Goals

• Offer the user choices for performance versus security trade-offs
  – Quantify risk and performance gain
• Robustly defend against traffic analysis and defense detection attacks by a powerful adversary
  – Favor adversary with resources and access
• Retain realism, practicality, and usability in implementation and evaluation
  – Ensure that real users can benefit from TrafficMimic

Page 8: Preventing Encrypted Traffic Analysis

Outline

• Introduction
• TrafficMimic design and implementation
• Independent cover traffic evaluation
• Simulation and modeling performance study
• Improving performance with biasing
• Conclusions and future work

Page 9: Preventing Encrypted Traffic Analysis

Cover Traffic Tunneling

[Figure: a packet from the User to Google (header1, data, encrypted as EA) is carried inside a tunnel packet (header2, tunnel header, padding, encrypted as EB).]

• Data size not leaked
• header1 no longer visible
• Timing changed

Page 10: Preventing Encrypted Traffic Analysis

Constant Rate Cover Traffic

• Current state-of-the-art defense
  – Effective at masking real protocol activity
• Drawbacks:
  1) Overhead in excess bytes and delay
  2) Vulnerable to defense detection
  3) Packet drops identify the flow in mix systems

Page 11: Preventing Encrypted Traffic Analysis

Realistic Cover Traffic

• Goal 1: Generate cover traffic tunnel that prevents attacker from recovering tunneled protocol
• Goal 2: Generate traffic of protocol X that is indistinguishable from real X traffic

[Figure: network data feeds a learning stage that produces models; playback of the models generates cover traffic between two proxies, and the real traffic is carried inside the cover.]

Page 12: Preventing Encrypted Traffic Analysis

Requirements for Secure Traffic Generation

1) Network agnostic
  – Closed-loop, TCP-based, reactive to live network conditions
  – Use Swing [Vishwanath06] to capture empirical distributions of protocol features from a network trace
2) Quality training
  – Sanitize training network trace data of anomalies
  – Care in selecting training data to avoid training-based attacks
3) Synchronized playback
  – Secure cover traffic generation is bidirectional
    • Requires both statistical and heuristic consistency
  – Single node controls all traffic generation

Page 13: Preventing Encrypted Traffic Analysis

TrafficMimic Design

• Tunnel uses SOCKS/HTTP or port forwarding
• Cover traffic specified by type, size, timing
• Master controls all cover traffic generation
• Real data encoded into messages
• Message data split into fragments that contain
  – message chunk, model traffic, and padding
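The fragment idea above can be sketched as follows. This is a minimal illustration, not TrafficMimic's wire format: the 2-byte length prefix, the zero-byte padding, and the names `pack_fragment` and `unpack_fragment` are assumptions made for the example, and the "model traffic" component is left out. In a real tunnel the whole fragment would be encrypted, so an observer sees only the cover size.

```python
import struct

# Hypothetical framing: a 2-byte big-endian length prefix, then the real
# chunk, then zero padding up to the size dictated by the cover model.
HEADER = struct.Struct("!H")


def pack_fragment(real: bytes, cover_size: int) -> tuple[bytes, bytes]:
    """Fill one cover fragment of exactly cover_size bytes.

    Returns (fragment, leftover), where leftover is the real data that did
    not fit and must wait for the next cover fragment.
    """
    if cover_size < HEADER.size:
        raise ValueError("cover size too small for the length header")
    room = min(cover_size - HEADER.size, 0xFFFF)
    chunk, leftover = real[:room], real[room:]
    padding = bytes(cover_size - HEADER.size - len(chunk))
    return HEADER.pack(len(chunk)) + chunk + padding, leftover


def unpack_fragment(fragment: bytes) -> bytes:
    """Recover the real chunk and discard the padding."""
    (length,) = HEADER.unpack_from(fragment)
    return fragment[HEADER.size:HEADER.size + length]
```

The fragment length depends only on `cover_size`, never on how much real data is waiting, which is the property the slide relies on.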

Page 14: Preventing Encrypted Traffic Analysis

Tor Integrations

• Protects individual Tor SSL links from traffic analysis
  – OP to OR links, OR to OR links, OP to Bridge links
• Addresses blocking resistant design [Dingledine06]
• No code changes required to any Tor components
• Integrate TM end-to-end in Tor with small code change to OR
  – Can advertise TM ability through existing Tor directory

Solves Alice’s Tor Problem

Page 15: Preventing Encrypted Traffic Analysis

Outline

• Introduction
• TrafficMimic design and implementation
• Independent cover traffic evaluation
• Simulation and modeling performance study
• Improving performance with biasing
• Conclusions and future work

Page 16: Preventing Encrypted Traffic Analysis

Evaluation Setup

• System implementation tested on real Internet wide-area links
  – Montreal (MN) to London (UK)
  – Urbana (IL) to San Diego (CA)
• Cover protocol models:
  – HTTPS, SMTP, and SSH
  – Fixed 28 kbps constant rate model
• Three sets of network traces (~1 hour each)
  – CAIDA, Jan/Feb 2009, equinix-chicago monitor
  – UNC, April 20/29, 2003, campus border link
  – LANL, Sept 2009, external gateway link

Link    RTT
MN-UK   85 ms
IL-CA   63 ms

Page 17: Preventing Encrypted Traffic Analysis

Protocol Classification Attack

• Distill connections into network agnostic feature vectors
  – Feature selection based, in part, on prior work
• Use supervised learning to identify unknown protocols
  – Weighted K-nearest neighbor algorithm for classification, K=3
  – Inter-cluster and distance threshold anomaly detection schemes

CAIDA/CAIDA
Protocol  Accuracy  F-Score
SSH       95.1%     0.886
SMTP      88.7%     0.863
HTTPS     90.5%     0.874

CAIDA/LANL
Protocol  Accuracy  F-Score
SSH       92.3%     0.78
SMTP      85.2%     0.748
HTTPS     82.0%     0.791

• Consistent with [Wright06]
• First to evaluate realistic attack
• Accuracy suffers ~10% with different test network
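A weighted K-nearest-neighbor classifier of the kind named above can be sketched in a few lines. This is a generic illustration under stated assumptions: the inverse-distance weighting, the Euclidean metric, and the function name `weighted_knn` are choices made for the example, and the slides do not say which weighting or features the attack actually used.

```python
import math
from collections import defaultdict


def weighted_knn(train, query, k=3):
    """Classify `query` by an inverse-distance-weighted vote of its k nearest
    training points.

    train: list of (feature_vector, label) pairs.
    query: feature vector with the same length as the training vectors.
    """
    # Sort all training points by Euclidean distance and keep the k closest.
    nearest = sorted((math.dist(vec, query), label) for vec, label in train)[:k]
    votes = defaultdict(float)
    for distance, label in nearest:
        # Closer neighbours count for more; the epsilon avoids division by zero.
        votes[label] += 1.0 / (distance + 1e-9)
    return max(votes, key=votes.get)
```

In use, each connection would first be reduced to a feature vector (for instance, hypothetical features such as mean packet size and mean inter-arrival time) and then labeled with the protocol that wins the vote.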

Page 18: Preventing Encrypted Traffic Analysis

Classifying Cover Traffic

• 73% accuracy fooling classifier with realistic traffic
  – 91% of best rate for real traffic
• Constant rate schemes easily detected with anomaly algorithms
• What if we let the attacker train on TrafficMimic generated traffic with a binary classifier?
  – With independent HTTPS training/test sets and Internet links
  – Accuracy: 49.4% (worse than random guessing)

Page 19: Preventing Encrypted Traffic Analysis

100KB Transfer over Cover Traffic

• Need to put real load on cover traffic to test performance
• HTTPS-resp and SMTP: high bandwidth, low overhead
• HTTPS-req and SSH: asymmetric bytes sent/received
  – Poor performance and high overhead

Page 20: Preventing Encrypted Traffic Analysis

Web Site Load Time Slowdown

• HTTPS provides best real/cover match
• SMTP (not shown) very poor
  – Due to long wait times and relatively small response stream
• Performance impact is considerable when compared with native

Page 21: Preventing Encrypted Traffic Analysis

Outline

• Introduction
• TrafficMimic design and implementation
• Independent cover traffic evaluation
• Simulation and modeling performance study
• Improving performance with biasing
• Conclusions and future work

Page 22: Preventing Encrypted Traffic Analysis

Performance Study

• Deeper understanding of performance with simulation/modeling
• Questions
  – Cover traffic tunneling impact on user experience?
  – Overhead compared to transmitting without cover traffic?
  – Dependence on relationship between real and cover traffic?
• We assess these impacts with:
  – Tunnel-free network properties derived from simulation
  – Real trace-driven protocol models
  – Analytic models of cover traffic tunneling

Page 23: Preventing Encrypted Traffic Analysis

Simulation Model

• Larger cover sizes have higher TCP efficiency
• Conversely, large cover sizes delay real traffic
  – On startup waiting for next cover to begin
  – On final chunk waiting for end
• Use bidirectional model of traffic based on HTTP
• Collect real HTTP traffic patterns from UNC traces
  – Requests are normally distributed
  – Responses are not, heavy tails
  – Use clustering to create response categories

[Figure: Simulated Network. Endpoints: client, server. Link labels: 100 Mbit/s with 50 ms delay; 1.5 Mbit/s with 20 ms delay (two links).]

Page 24: Preventing Encrypted Traffic Analysis

Developing an Analytic Model

• Create bidirectional HTTP-like model based on
  – On-Off renewal processes
• How do we model the real delay of sending data?
  – Three components: startup, request, response

Page 25: Preventing Encrypted Traffic Analysis

Model Validation

• Use tunnel-free network data from simulation to validate tunneled response delay
• Need to make some simplifying assumptions about data
  – Normally distributed responses
  – Exponentially distributed inter-session timing
  – Capture all network link properties with single transmission cost
• Model error: 3.57%
• Simulation error: 3.97%
• Larger real load yields higher model accuracy

Page 26: Preventing Encrypted Traffic Analysis

Investigating Slowdown

• Use discrete-time Markov chain
• Find slowdown: steady-state probability for real transmission conditioned on availability of real
  – Decreasing function of cover session utilization
  – Real size larger than cover yields higher slowdown
  – Increasing cover sizes with high utilization yields best performance

Page 27: Preventing Encrypted Traffic Analysis

Performance Observations

• Best when there is plenty of cover traffic to carry real
  – Similar to intuition for constant rate cover traffic
• But mismatches between real and cover are painful
  – Cover too small, wait time hurts
  – Cover too large, real traffic has to wait for next cover to begin
• Even when size ratio is favorable, low utilization yields high slowdown
• Waiting times dominate effects of padding and network transmission

Page 28: Preventing Encrypted Traffic Analysis

Outline

• Introduction
• TrafficMimic design and implementation
• Independent cover traffic evaluation
• Simulation and modeling performance study
• Improving performance with biasing
• Conclusions and future work

Page 29: Preventing Encrypted Traffic Analysis

Performance Enhancements

• So far: prevent attack using independent cover traffic
  – Attacker inference cannot recover information from real flow
• Can we relax strict independence without sacrificing security?
  – Against both traffic analysis and defense detection
  – Need to quantify security impact
• Concept: influence the traffic generation process with biasing

Page 30: Preventing Encrypted Traffic Analysis

Recall: Model-Based Traffic Generation

• Collect ECDF of structural features
• Sample features using:
  – uniform [0,1] random variable
  – inverse transform sampling
• Bias parameter selection given current real state
  – Biased samples still come from empirical data

[Figure: Swing model hierarchy (user sessions, application protocol connections, feature ECDFs); the bias stage combines the real state with the feature ECDFs to produce an optimized sample.]
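Inverse transform sampling from an empirical CDF, as recalled above, can be sketched as below. The function name `ecdf_sampler` and the example feature values are assumptions for illustration; Swing's own data structures are not shown in the slides.

```python
import math
import random


def ecdf_sampler(observations, rng=random):
    """Build a sampler that draws from the empirical distribution of
    `observations` using inverse transform sampling."""
    data = sorted(observations)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one observation")

    def sample(u=None):
        # u is a uniform [0, 1] variate; draw one if the caller gave none.
        if u is None:
            u = rng.random()
        # Generalized inverse of the ECDF: the smallest observed value x
        # with ECDF(x) >= u.
        index = max(0, math.ceil(u * n) - 1)
        return data[min(index, n - 1)]

    return sample
```

Because the result is always one of the observed values, any bias applied to `u` before the lookup still yields samples that come from the empirical data, which is the point made on the slide.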

Page 31: Preventing Encrypted Traffic Analysis

31

Biasing: Two Techniques

• Functional– Create probability distribution that parameterizes biasing

effect– Derive CDF by integration and invert to select samples

• Algorithmic– Select optimized structural sample with iterative algorithm

• Goals:– Avoid splitting real objects– Minimize overhead

Minimize waiting

Page 32: Preventing Encrypted Traffic Analysis

Functional Biasing Example: Probability Split

• PDF given by
• x0 is the optimal point, perfect biasing
  – Given by the current/recent needs of the real traffic
• Derive inverse CDF F^-1 of distribution by integration
• Parameter p controls height of left side of x0
  – Standardize bias factor relation to p

p 1
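One way to picture a "probability split" is a piecewise-constant density on [0, 1] with height p to the left of x0 and whatever height is needed on the right for the total mass to be 1. That density is an assumption made here for illustration and may differ from the PDF used in the talk. Under it, the inverse CDF follows by integrating each piece, as sketched below; `split_inverse_cdf` is a hypothetical name.

```python
def split_inverse_cdf(u, x0, p):
    """Inverse CDF of an assumed piecewise-constant density on [0, 1]:
    height p on [0, x0) and height q on [x0, 1], where q is chosen so the
    density integrates to 1.
    """
    if not 0.0 < x0 < 1.0:
        raise ValueError("x0 must lie strictly between 0 and 1")
    left_mass = p * x0  # probability of landing left of x0
    if not 0.0 < left_mass < 1.0:
        raise ValueError("p * x0 must lie strictly between 0 and 1")
    q = (1.0 - left_mass) / (1.0 - x0)
    if u < left_mass:
        return u / p
    return x0 + (u - left_mass) / q
```

With p = 1 the function is the identity, so sampling is unbiased; with p < 1 more probability mass falls at or above x0. The returned value would then be used as the quantile for the empirical CDF lookup, so biased samples still come from the empirical data.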

Page 33: Preventing Encrypted Traffic Analysis


Functional Biasing Distributions

Page 34: Preventing Encrypted Traffic Analysis

Algorithmic Biasing

• Directly try to achieve biasing goals with algorithm
• Try Try Again (TTA)
  – Select sample from empirical CDF
  – If greater than real buffer size, use it
  – If not, repeat up to R times
• Linear algorithm takes first sample that does not split
• Optimal algorithm finds minimum of all R samples that does not split
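The two Try Try Again variants can be sketched as follows. The slides do not say what happens when none of the R samples fits, so the fallbacks here (last draw for the linear variant, largest draw for the optimal variant) are assumptions, as are the names `linear_tta` and `optimal_tta` and the choice to treat a sample equal to the real size as fitting.

```python
def linear_tta(draw, real_size, retries):
    """Linear TTA: return the first of up to `retries` samples that can carry
    `real_size` without splitting. Assumed fallback: the last sample drawn."""
    sample = draw()
    for _ in range(retries - 1):
        if sample >= real_size:
            break
        sample = draw()
    return sample


def optimal_tta(draw, real_size, retries):
    """Optimal TTA: draw `retries` samples and return the smallest one that
    does not split `real_size`. Assumed fallback: the largest sample drawn."""
    samples = [draw() for _ in range(retries)]
    fitting = [s for s in samples if s >= real_size]
    return min(fitting) if fitting else max(samples)
```

Here `draw` stands for any zero-argument function that returns one sample from the empirical CDF. The linear variant stops early and so uses fewer draws; the optimal variant always uses R draws but wastes less padding.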

Page 35: Preventing Encrypted Traffic Analysis

Algorithmic Biasing Illustrated

[Figure panels: Unbiased, Geometric, Optimal TTA, Linear TTA. Optimal value shown in red.]

Page 36: Preventing Encrypted Traffic Analysis

36

Simulator

• Simulate 20,000 real objects being sent by a variable number of cover sizes

• Estimate performance with– Number of cover sessions (num splits)– Excess padding needed in final cover (overage)

• Simulation outputs results in tuples:– Denote random variables corresponding to these values Qsim

and Csim

qsim i,c i( )
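The two performance measures can be illustrated with a small sketch that carries one real object over successive cover sessions. The function name `send_object` and the byte-count model are assumptions for illustration, not the simulator used in the talk.

```python
def send_object(real_size, draw_cover):
    """Carry one real object of `real_size` bytes over successive cover
    sessions whose sizes come from `draw_cover()`.

    Returns (sessions, overage): the number of cover sessions used and the
    padding bytes left over in the final cover session.
    """
    remaining = real_size
    sessions = 0
    while remaining > 0:
        cover = draw_cover()
        if cover <= 0:
            raise ValueError("cover sizes must be positive")
        sessions += 1
        remaining -= cover
    # Once the object is fully sent, the unused part of the last cover is padding.
    return sessions, -remaining
```

Running this for many real object sizes against a given cover size sampler gives the per-object split counts and overage that the slide uses to compare biasing techniques.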

Page 37: Preventing Encrypted Traffic Analysis

Attacking Biasing

• Deduce the real size given observation of cover size using:
  – Bayesian inference
  – Maximum likelihood estimation
• Real attack contains history of multiple cover sizes
  – Approximate real attack by inferring estimate qest from individual ci observations
• Theoretical attack used for quantifying security impact of biasing
  – Performance in practice does not recover actionable information (acc ~5%)
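A single-observation Bayesian attack of this kind can be sketched with joint counts: for a given cover size c, the most probable real size q is the one that co-occurred with c most often in the training tuples. The binning of sizes, the name `train_map_estimator`, and the prior-mode fallback for unseen cover sizes are assumptions for illustration.

```python
from collections import Counter, defaultdict


def train_map_estimator(pairs):
    """Learn a maximum a posteriori estimator from (q, c) training tuples,
    where q is a real size bin and c an observed cover size bin.

    Returns a function mapping c to argmax_q P(q | c). Since
    P(q | c) is proportional to the joint count of (q, c), counting suffices.
    """
    joint = defaultdict(Counter)
    prior = Counter()
    for q, c in pairs:
        joint[c][q] += 1
        prior[q] += 1
    fallback = prior.most_common(1)[0][0]  # assumed fallback: most common q overall

    def estimate(c):
        if c not in joint:
            return fallback
        return joint[c].most_common(1)[0][0]

    return estimate
```

The attack accuracy is then the fraction of held-out tuples for which `estimate(c)` equals the true q.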

Page 38: Preventing Encrypted Traffic Analysis

Quantifying Information Leakage

• Use information theory: Mutual Information
• Intuitively: given two RVs Qsim and Qest, measure how much knowing one variable reduces our uncertainty about the other
  – Measured in bits per sample pair
• Mathematically:
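For reference, the standard definition of mutual information between two discrete random variables, in bits, is

$$I(Q_{sim}; Q_{est}) = \sum_{q,\hat{q}} p(q,\hat{q}) \log_2 \frac{p(q,\hat{q})}{p(q)\,p(\hat{q})}$$

A minimal plug-in estimator from sample pairs is sketched below. The function name `mutual_information` is hypothetical, and the plug-in (frequency count) estimator is an assumption; the slides do not state which estimator was used.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of mutual information in bits from (x, y) sample
    pairs, using empirical frequencies for the joint and marginal
    distributions."""
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    # p(x,y) / (p(x) p(y)) simplifies to n_xy * n / (n_x * n_y).
    return sum(
        (n_xy / n) * math.log2(n_xy * n / (px[x] * py[y]))
        for (x, y), n_xy in joint.items()
    )
```

Independent variables give 0 bits, and two identical fair binary variables give 1 bit per sample pair.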

Page 39: Preventing Encrypted Traffic Analysis

Mutual Information Estimator

Our Bayesian inference:

where m represents the additional information about the distribution of Csim

Now consider a simplified inference function:

Qest has strictly more information than Qg, thus:

MI computed from ci is an order of magnitude faster to compute

Page 40: Preventing Encrypted Traffic Analysis

HTTPS Biasing Performance

[Figure panels: Performance, Security]

• Biasing can improve performance by factor of 2
  – Significantly fewer splits than constantMTU
• Information leakage higher with linear functional techniques
• Lowest MI with algorithmic and expBias

Page 41: Preventing Encrypted Traffic Analysis


SSH over SMTP Responses

• Swapping causes large information leakage in exchange for huge performance improvement

• SMTP large request stream, small protocol-only response stream

• SSH smaller request stream, variable response stream

• Causes SMTP protocol model to “swap” sent/received ratio

Page 42: Preventing Encrypted Traffic Analysis

HTTP Request over HTTP Response

• Over-provisioned cover model
  – Minimal impact on session performance
• Several techniques can reduce overhead

Page 43: Preventing Encrypted Traffic Analysis

Defense Detection

• Recall: using realistic models prevents adversary from detecting TrafficMimic
• But biasing destroys pristine cover model!
• Solution: sample without replacement within a window of L samples
  – Replenish when subset is exhausted
  – Distribution is consistent with original within L observations
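Windowed sampling without replacement can be sketched as a pool of L unbiased draws from which the biasing rule picks, refilled only when empty. The class name `WindowedSampler` and the `choose` callback are assumptions for illustration, not TrafficMimic's interface.

```python
class WindowedSampler:
    """Hold a pool of `window` unbiased samples; let a biasing rule pick from
    what is left, and replenish only when the pool is exhausted."""

    def __init__(self, draw, window):
        if window < 1:
            raise ValueError("window must be at least 1")
        self.draw = draw      # zero-argument function returning one unbiased sample
        self.window = window  # L in the slides
        self.pool = []

    def take(self, choose):
        """Remove and return one sample. `choose(pool)` returns the index of
        the preferred sample, e.g. the smallest one that fits the real data."""
        if not self.pool:
            self.pool = [self.draw() for _ in range(self.window)]
        return self.pool.pop(choose(self.pool))
```

Every run of L consecutive outputs that starts at a refill is a reordering of L unbiased draws, so the biasing rule only changes the order of samples within a window, not which values appear.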

Page 44: Preventing Encrypted Traffic Analysis

Defense Detection Attack: Kolmogorov-Smirnov Test

• Attacker checks observed distribution with window W
  – Make determination about defense detection quicker
• Attacker detection advantage when L>W
• Defender performance advantage with larger L
• Vary L and show maximum attack W with 95% confidence of proving that observed distribution is abnormal
  – Because of noise in real distributions, attacker must limit W to 50
  – While defender can safely use up to L=300
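A two-sample Kolmogorov-Smirnov check of the kind the attacker would run can be sketched with the standard library alone. The function names are hypothetical, and the decision rule uses the usual large-sample critical value, which is only an approximation for small windows; the slides do not say how the test was implemented.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def ks_rejects(a, b, alpha=0.05):
    """Return True if the samples look like they come from different
    distributions at significance level alpha (0.05 matches 95% confidence).
    Uses the asymptotic critical value c(alpha) * sqrt((n + m) / (n * m))."""
    n, m = len(a), len(b)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return ks_statistic(a, b) > c_alpha * math.sqrt((n + m) / (n * m))
```

Here `a` would be the W cover sizes the attacker observed and `b` a reference sample of real protocol traffic; a smaller W makes the threshold larger, which is why a short observation window weakens the attack.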

Page 45: Preventing Encrypted Traffic Analysis

Performance of Limiting L

[Figure: OptimalTTA on HTTPS]

• Larger than expected improvement even with very small L
• Performance continues to improve modestly
  – 7.8% fewer sessions at L=5000
• Conservatively use L=100 for further experiments

Page 46: Preventing Encrypted Traffic Analysis


Sampling Without Replacement

• Much lower information leakage compared to sampling with replacement– Very low Bayesian attack accuracy

• Performance improvement due to splitting avoided when subset is still large (early optimization)– 55% increase in splits without replacement

Page 47: Preventing Encrypted Traffic Analysis

Trade-Offs

• Simulations can help drive traffic combination choices
  – Relative importance of MI, overhead, and splits varies

Page 48: Preventing Encrypted Traffic Analysis

Bias Implementation

• Critical parameter to biasing: current real state
  – Master needs slave and local event loop buffer state to select biased traffic parameters
  – Maintain synchronous control of traffic generation at master
• Feedback
  – Report real buffer state to master model thread through traffic confirmations used for model synchronization
  – Potential for stale information, especially from slave
• Split Biasing
  – Pre-sample R samples from distribution in model thread and defer parameter selection until just before transmission
  – For algorithmic techniques only where R is relatively small

Page 49: Preventing Encrypted Traffic Analysis

Bulk Transfer over HTTP-resp

• Linear algorithms can best use distribution tails
  – Due to x0 ≈ 1 for much of the transfer
• Algorithmic approaches have smaller working set (<10)
  – Still provide modest improvements in bandwidth
  – Safest approaches for unknown/variable traffic combinations

Page 50: Preventing Encrypted Traffic Analysis

Bulk Transfer with OptimalTTA

• 3.5-5.5x improvement in bandwidth
  – Small SSH request stream not well suited to bulk transfer
• Bidirectional overhead reduction for HTTPS and SSH
  – SMTP already minimal overhead because of small response stream

Page 51: Preventing Encrypted Traffic Analysis

Website Load Times with OptimalTTA

• 2.5-9.5x improvement in load time compared to independent cover models
  – HTTPS-Split takes 6-19 seconds to load pages
• SMTP cover traffic still impractical for most uses
• Split biasing provides small advantage over feedback

Page 52: Preventing Encrypted Traffic Analysis

Outline

• Introduction
• TrafficMimic design and implementation
• Independent cover traffic evaluation
• Simulation and modeling performance study
• Improving performance with biasing
• Conclusions and future work

Page 53: Preventing Encrypted Traffic Analysis

Recall: Thesis Statement

• Robust defense to powerful attack
  – Tested against generic protocol classification, Bayesian inference, and KS-test attacks
• Offers users more performance/security trade-offs
  – Biasing study helps users decide how to combine protocols
  – Algorithmic biasing provides safest performance boost
• Practical design, evaluation, and deployment
  – Used real network data, system implementations, and Internet evaluation to prove effectiveness
  – Provided integration with existing privacy tools to aid adoption

Page 54: Preventing Encrypted Traffic Analysis

Conclusions

• Traffic analysis is a daunting security challenge with few deployable defenses
  – New and existing attacks growing in effectiveness
• TrafficMimic addresses the traffic analysis defense gap
  – Mimicked realistic cover traffic with 91% accuracy against powerful protocol classification attack
  – Resists defense detection and traffic analysis simultaneously
  – Up to 9x performance increase with biasing

Page 55: Preventing Encrypted Traffic Analysis

Usable Technology Transition

• Integration with existing services
  – E.g., Tor anonymity network, VPN clients
• Integrate with OpenSSL library
  – Allows arbitrary applications to utilize TrafficMimic without code modification
  – Platform dependent if using library hooking
• Continue development/adoption of TrafficMimic as a stand-alone proxy
  – Could be jack of all trades, but master of none