Sampling Dynamic Dataflow Analyses
Joseph L. Greathouse
Advanced Computer Architecture Laboratory
University of Michigan
University of British Columbia
June 10, 2011
2
Software Errors Abound
NIST: SW errors cost U.S. ~$60 billion/year as of 2002
FBI CCS: Security Issues $67 billion/year as of 2005
    >⅓ from viruses, network intrusion, etc.
[Chart: CERT-Cataloged Vulnerabilities per year, 2000 to 2008 (proj.); y-axis 0 to 9,000]
3
Simple Buffer Overflow Example
void foo()
{
    int local_variables;
    int buffer[256];
    …
    buffer = read_input();
    …
    return;
}
[Diagram: stack frame holding, from top to bottom, Return address, Local variables, buffer]
If read_input() reads 200 ints: Buffer Fill stays inside buffer.
If read_input() reads >256 ints: Buffer Fill runs past buffer, leaving Bad Local variables and a New Return address.
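The two cases on this slide can be sketched in C without overflowing a real stack, which would be undefined behavior. The sketch below models the frame as a flat array; the layout, the names `stack_frame`, `fill_unchecked`, and `fill_checked`, and the sentinel values 1111 and 2222 are all assumptions made for illustration, not code from the talk.

```c
#include <stddef.h>

#define BUF_INTS    256
#define LOCAL_SLOT  (BUF_INTS)        /* assumed position of local_variables */
#define RET_SLOT    (BUF_INTS + 1)    /* assumed position of the return address */
#define FRAME_SLOTS (BUF_INTS + 2)

/* Hypothetical model of foo()'s frame: buffer slots first, then the local,
   then the return address, all inside one array so every write is defined. */
typedef struct {
    long slots[FRAME_SLOTS];
} stack_frame;

/* Give the local and the return address recognizable sentinel values. */
void frame_init(stack_frame *f)
{
    for (size_t i = 0; i < FRAME_SLOTS; i++)
        f->slots[i] = 0;
    f->slots[LOCAL_SLOT] = 1111;
    f->slots[RET_SLOT] = 2222;
}

/* Unchecked fill in the spirit of the slide's read_input(): writes n values
   starting at the buffer. Writes are clamped to the modeled frame.
   Returns 1 if the return-address slot no longer holds its sentinel. */
int fill_unchecked(stack_frame *f, size_t n, long value)
{
    for (size_t i = 0; i < n && i < FRAME_SLOTS; i++)
        f->slots[i] = value;
    return f->slots[RET_SLOT] != 2222;
}

/* Bounds-checked fill: rejects input that does not fit in the buffer.
   Returns 0 on success and -1 if the input was refused. */
int fill_checked(stack_frame *f, size_t n, long value)
{
    if (n > BUF_INTS)
        return -1;
    for (size_t i = 0; i < n; i++)
        f->slots[i] = value;
    return 0;
}
```

In this model, filling 200 slots leaves both sentinels intact, while filling 258 slots overwrites the local and the return address. The checked variant refuses the oversized input instead.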
4
Concurrency Bugs Also Matter
Nov. 2010 OpenSSL Security Flaw
Thread 1: mylen=small
Thread 2: mylen=large
if(ptr == NULL) {
    len = thread_local->mylen;
    ptr = malloc(len);
    memcpy(ptr, data, len);
}
5
Concurrency Bugs Also Matter
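[Timeline diagram, TIME running downward; ptr starts as ∅]
Thread 1 (mylen=small):
    if(ptr==NULL)
    len1=thread_local->mylen; ptr=malloc(len1);
    memcpy(ptr, data1, len1)
Thread 2 (mylen=large):
    if(ptr==NULL)
    len2=thread_local->mylen; ptr=malloc(len2);
    memcpy(ptr, data2, len2)
Both threads pass the NULL check, so both allocate, and one ptr is LEAKED.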
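One way to see why this race is dangerous is to replay a single interleaving deterministically. The sketch below assumes that Thread 2 allocates first and Thread 1 then overwrites ptr; the slide's exact ordering is not fully recoverable, so treat that order as an assumption. It tracks only the size of the current allocation instead of touching real memory, and the names `ptr_model` and `replay_race` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical model of the shared pointer: whether it is still NULL, how
   large the allocation it refers to is, and how many blocks were leaked. */
typedef struct {
    int    is_null;
    size_t capacity;
    int    leaked;
} ptr_model;

/* The unsynchronized check: if(ptr == NULL). */
int check_null(const ptr_model *p)
{
    return p->is_null;
}

/* ptr = malloc(len). Overwriting a live pointer leaks the previous block. */
void model_malloc(ptr_model *p, size_t len)
{
    if (!p->is_null)
        p->leaked++;
    p->is_null = 0;
    p->capacity = len;
}

/* memcpy(ptr, data, len). Returns 1 if the copy would overflow. */
int model_memcpy_overflows(const ptr_model *p, size_t len)
{
    return len > p->capacity;
}

/* Replays the assumed interleaving. Returns 1 if any copy would overflow
   and stores the number of leaked allocations in *leaked_out. */
int replay_race(size_t small_len, size_t large_len, int *leaked_out)
{
    ptr_model p = { 1, 0, 0 };
    int t1_enters = check_null(&p);              /* Thread 1: if(ptr==NULL) */
    int t2_enters = check_null(&p);              /* Thread 2: if(ptr==NULL) */
    int overflow = 0;

    if (t2_enters) model_malloc(&p, large_len);  /* Thread 2: ptr=malloc(len2) */
    if (t1_enters) model_malloc(&p, small_len);  /* Thread 1: ptr=malloc(len1) */
    if (t1_enters) overflow |= model_memcpy_overflows(&p, small_len);
    if (t2_enters) overflow |= model_memcpy_overflows(&p, large_len);

    *leaked_out = p.leaked;
    return overflow;
}
```

With a small length of 16 and a large length of 4096, the replay reports one leaked allocation and an overflowing copy. Holding a lock across the check, allocation, and copy would prevent this interleaving.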
6
One Layer of a Solution
High quality dynamic software analysis
    Find difficult bugs that other analyses miss
Distribute Tests to Large Populations
    Low overhead or users get angry
Accomplished by sampling the analyses
    Each user only tests part of the program
7
Dynamic Dataflow Analysis
Associate meta-data with program values
Propagate/Clear meta-data while executing
Check meta-data for safety & correctness
Forms dataflows of meta/shadow information
8
Example Dynamic Dataflow Analysis
[Diagram: Data and Meta-data flowing from an Input through the statements below]
x = read_input()    Associate
validate(x)         Clear
y = x * 1024        Propagate
w = x + 42          Check w
a += y              Check a
z = y * 75          Check z
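The associate, propagate, clear, and check steps can be sketched as a one-bit taint analysis in C. This is a minimal illustration rather than the mechanism of any particular tool; the `shadowed` type and the `taint_*` function names are hypothetical.

```c
#include <stdbool.h>

/* A program value paired with one bit of meta-data (hypothetical layout). */
typedef struct {
    long value;
    bool tainted;
} shadowed;

/* Associate: values that come from untrusted input start out tainted. */
shadowed taint_from_input(long raw)
{
    shadowed s = { raw, true };
    return s;
}

/* Constants in the program carry no meta-data. */
shadowed taint_constant(long c)
{
    shadowed s = { c, false };
    return s;
}

/* Propagate: a result is tainted if either operand is tainted. */
shadowed taint_add(shadowed a, shadowed b)
{
    shadowed s = { a.value + b.value, a.tainted || b.tainted };
    return s;
}

shadowed taint_mul(shadowed a, shadowed b)
{
    shadowed s = { a.value * b.value, a.tainted || b.tainted };
    return s;
}

/* Clear: validation removes the meta-data but keeps the value. */
shadowed taint_validate(shadowed s)
{
    s.tainted = false;
    return s;
}

/* Check: returns true when the value is safe to use. */
bool taint_check(shadowed s)
{
    return !s.tainted;
}
```

Applied to the slide's statements, `y = x * 1024` and `z = y * 75` stay tainted and fail their checks, while `w = x + 42` computed from a validated `x` passes.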
9
Distributed Dynamic Dataflow Analysis
Split analysis across large populations
    Observe more runtime states
    Report problems developer never thought to test
[Diagram: Instrumented Program distributed to users, who report Potential problems]
10
Problem: DDAs are Slow
Taint Analysis (e.g. TaintCheck)
Dynamic Bounds Checking
FP Accuracy Verification
Symbolic Execution
Data Race Detection (e.g. Helgrind)
Memory Checking (e.g. Dr. Memory)
[Slowdowns shown on the slide, pairing with the analyses above lost in extraction: 10-200x, 2-200x, 10-80x, 5-50x, 100-500x, 2-300x]
11
One Solution: Sampling
Lower overheads by skipping some analyses
[Plot: Detection Accuracy (%), 0 to 100, versus Overhead, with a curve labeled Ideal running from No Analysis to Complete Analysis]
12
Sampling Allows Distribution
[Plot: Detection Accuracy (%), 0 to 100, versus Overhead, with a curve labeled Ideal and points marked Developer, Beta Testers, End Users]
Many users testing at little overhead see more errors than one user at high overhead.
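A back-of-the-envelope model shows why many low-overhead users can beat one high-overhead user. The sketch assumes each user independently detects a given error with the same probability p; independence and a uniform p are simplifying assumptions for illustration, not claims made in the talk, and `detect_any` is a hypothetical helper.

```c
/* Probability that at least one of n users observes an error, assuming each
   user independently detects it with probability p (illustrative model). */
double detect_any(double p, unsigned n)
{
    double miss_all = 1.0;              /* probability that every user misses */
    for (unsigned i = 0; i < n; i++)
        miss_all *= 1.0 - p;
    return 1.0 - miss_all;
}
```

Under this model, 1000 users who each catch an error 1% of the time jointly catch it with probability above 99.9%.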
13
Cannot Naïvely Sample Code
[Diagram: the example dataflow from Input (x = read_input(), validate(x), y = x * 1024, w = x + 42, a += y, z = y * 75, followed by Check w, Check z, Check a) with two steps marked Skip Instr.]
Skipping the instrumentation of individual instructions leaves the meta-data inconsistent: one check reports a False Positive and another a False Negative.
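Both failure modes can be reproduced with a one-bit taint model. The sketch assumes that a skipped instruction still executes natively but its meta-data update is lost. Which instructions the slide skips is not recoverable, so the choice of `validate(x)` and `y = x * 1024` here is an assumption, and the type and function names are hypothetical.

```c
#include <stdbool.h>

/* A value plus one taint bit (hypothetical layout). */
typedef struct {
    long value;
    bool tainted;
} shadowed;

/* Runs x = read_input(); validate(x); w = x + 42; Check w.
   When skip_validate is set, the instrumentation for validate(x) is
   skipped, so the taint bit is never cleared.
   Returns true if Check w raises an alarm. */
bool check_w_alarms(long raw, bool skip_validate)
{
    shadowed x = { raw, true };                    /* Associate */
    if (!skip_validate)
        x.tainted = false;                         /* Clear */
    shadowed w = { x.value + 42, x.tainted };      /* Propagate */
    return w.tainted;                              /* Check w */
}

/* Runs x = read_input(); y = x * 1024; a += y; Check a, with no validation.
   When skip_propagate is set, the instrumentation for y = x * 1024 is
   skipped, so y never receives x's taint.
   Returns true if Check a raises an alarm. */
bool check_a_alarms(long raw, bool skip_propagate)
{
    shadowed x = { raw, true };                    /* Associate */
    shadowed y = { x.value * 1024, skip_propagate ? false : x.tainted };
    shadowed a = { y.value, y.tainted };           /* a += y, with a initially 0 */
    return a.tainted;                              /* Check a */
}
```

Skipping the clear makes a validated value raise an alarm (false positive), and skipping the propagation lets an unvalidated value pass silently (false negative).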
14
Sample Data, not Code
Sampling must be aware of meta-data
Remove meta-data from skipped dataflows
    Prevents false positives
15
Dataflow Sampling Example
[Diagram: the same example dataflow from Input (x = read_input(), validate(x), y = x * 1024, w = x + 42, a += y, z = y * 75, followed by Check w, Check z, Check a) with two dataflows marked Skip Dataflow]
Skipping a whole dataflow removes its meta-data, so the only error shown is a False Negative.
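Sampling by dataflow can be sketched by making the skip decision once, at the point where meta-data would be associated. In this minimal model a skipped dataflow simply never becomes tainted, which stands in for removing its meta-data; the names are hypothetical and the two paths mirror the slide's statements.

```c
#include <stdbool.h>

/* A value plus one taint bit (hypothetical layout). */
typedef struct {
    long value;
    bool tainted;
} shadowed;

/* Associate, unless this dataflow is skipped. A skipped dataflow carries no
   meta-data at all, so nothing downstream can raise a stale alarm. */
shadowed sampled_input(long raw, bool skip_dataflow)
{
    shadowed x = { raw, !skip_dataflow };
    return x;
}

/* Validated path: validate(x); w = x + 42; Check w.
   Returns true if Check w raises an alarm. */
bool check_w_alarms(shadowed x)
{
    x.tainted = false;                             /* Clear */
    shadowed w = { x.value + 42, x.tainted };      /* Propagate */
    return w.tainted;
}

/* Unvalidated path: y = x * 1024; a += y; Check a.
   Returns true if Check a raises an alarm. */
bool check_a_alarms(shadowed x)
{
    shadowed y = { x.value * 1024, x.tainted };    /* Propagate */
    shadowed a = { y.value, y.tainted };           /* a += y, with a initially 0 */
    return a.tainted;
}
```

A tracked dataflow behaves as in the full analysis. A skipped one can miss the unvalidated use (false negative) but never flags the validated one, which is the trade the slides describe.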
16
Mechanisms for Dataflow Sampling (1)
Start with demand analysis
[Diagram: Demand Analysis Tool. A Shadow Value Detector switches between the Native Application (No meta-data) and the Instrumented Application, e.g. Valgrind (Meta-data)]
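The switching decision in demand analysis can be sketched as a lookup on whether the accessed data has meta-data. The page-granularity table below is an assumption chosen for simplicity, and `shadow_detector`, `detector_mark`, and `mode_for_access` are hypothetical names, not the interface of Valgrind or any other tool.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_BYTES 4096u
#define NUM_PAGES  16u      /* size of the toy address space, in pages */

typedef enum { MODE_NATIVE, MODE_INSTRUMENTED } exec_mode;

/* Hypothetical shadow value detector: one flag per page that records
   whether any value on that page currently has meta-data. */
typedef struct {
    bool has_metadata[NUM_PAGES];
} shadow_detector;

/* Start with no meta-data anywhere. */
void detector_init(shadow_detector *d)
{
    for (size_t i = 0; i < NUM_PAGES; i++)
        d->has_metadata[i] = false;
}

/* Record that the value at addr has meta-data associated with it.
   Addresses outside the toy address space are ignored. */
void detector_mark(shadow_detector *d, size_t addr)
{
    size_t page = addr / PAGE_BYTES;
    if (page < NUM_PAGES)
        d->has_metadata[page] = true;
}

/* Decide how to execute an access: run natively unless the page holds
   shadowed values, in which case the instrumented version is needed. */
exec_mode mode_for_access(const shadow_detector *d, size_t addr)
{
    size_t page = addr / PAGE_BYTES;
    if (page < NUM_PAGES && d->has_metadata[page])
        return MODE_INSTRUMENTED;
    return MODE_NATIVE;
}
```

The program pays analysis overhead only while it touches data that has meta-data, which is the starting point that sampling then builds on.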
17
Mechanisms for Dataflow Sampling (2)
Remove dataflows if execution is too slow
[Diagram: Sampling Analysis Tool. The Shadow Value Detector moves execution from the Native Application to the Instrumented Application when it finds Meta-data; once the OH Threshold is reached, Clear meta-data returns execution to the Native Application]
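The overhead threshold can be sketched as a budget of analysis time per fixed window. This is a minimal illustration under assumed semantics: time is passed in as seconds, the window restarts at the first event after it expires, and the `overhead_manager` type and `ohm_*` functions are hypothetical names, not the prototype's implementation.

```c
#include <stdbool.h>

/* Hypothetical overhead manager: tracks analysis time within a window and
   reports when a user-chosen share of that window has been used up. */
typedef struct {
    double window_s;        /* window length in seconds */
    double max_fraction;    /* allowed share of the window spent in analysis */
    double window_start_s;  /* start time of the current window */
    double analysis_s;      /* analysis time accumulated in this window */
} overhead_manager;

void ohm_init(overhead_manager *m, double window_s, double max_fraction,
              double now_s)
{
    m->window_s = window_s;
    m->max_fraction = max_fraction;
    m->window_start_s = now_s;
    m->analysis_s = 0.0;
}

/* Record spent_s seconds of analysis that finished at time now_s.
   Returns true when the budget for the current window is exceeded, meaning
   the tool should clear meta-data and resume native execution. */
bool ohm_record(overhead_manager *m, double now_s, double spent_s)
{
    if (now_s - m->window_start_s >= m->window_s) {
        m->window_start_s = now_s;   /* simplification: new window starts now */
        m->analysis_s = 0.0;
    }
    m->analysis_s += spent_s;
    return m->analysis_s > m->max_fraction * m->window_s;
}
```

With a 10 second window and a 1% limit, the budget is 0.1 seconds of analysis per window. Once a dataflow pushes the total past that, the caller would drop its meta-data and continue natively until the next window.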
18
Prototype Setup
Taint analysis sampling system
    Network packets untrusted
Xen-based demand analysis
    Whole-system analysis with modified QEMU
Overhead Manager (OHM) is user-controlled
[Diagram: Xen Hypervisor with a Shadow Page Table, hosting a Linux guest (OS and Applications: App, App, App, …) and an Admin VM containing Taint Analysis QEMU, Net Stack, and OHM]
19
Benchmarks
Performance – Network Throughput
    Example: ssh_receive
Accuracy of Sampling Analysis
    Real-world Security Exploits

Name      Error Description
Apache    Stack overflow in Apache Tomcat JK Connector
Eggdrop   Stack overflow in Eggdrop IRC bot
Lynx      Stack overflow in Lynx web browser
ProFTPD   Heap smashing attack on ProFTPD Server
Squid     Heap smashing attack on Squid proxy server
20
Performance of Dataflow Sampling (1)
[Plot: ssh_receive. Throughput (MB/s), 0 to 25, versus Maximum % Time in Analysis, 0 to 100; reference line: Throughput with no analysis]
21
Performance of Dataflow Sampling (2)
[Plot: netcat_receive. Throughput (MB/s), 0 to 60, versus Maximum % Time in Analysis, 0 to 100; reference line: Throughput with no analysis]
22
Performance of Dataflow Sampling (3)
[Plot: ssh_transmit. Throughput (MB/s), 0 to 25, versus Maximum % Time in Analysis, 0 to 100; reference line: Throughput with no analysis]
23
Accuracy at Very Low Overhead
Max time in analysis: 1% every 10 seconds
Always stop analysis after threshold
    Lowest probability of detecting exploits

Name      Chance of Detecting Exploit
Apache    100%
Eggdrop   100%
Lynx      100%
ProFTPD   100%
Squid     100%
24
Accuracy with Background Tasks
netcat_receive running with benchmark
[Bar chart: % Chance of Detecting Exploit, 0 to 100, versus Maximum Allowed Overhead % (10%, 25%, 50%, 75%, 90%) for Apache, Eggdrop, Lynx, ProFTPD, Squid; data labels shown: 0.38, 0.4]
25
Accuracy with Background Tasks
ssh_receive running in background
[Bar chart: % Chance of Detecting Exploit, 0 to 100, versus Maximum % Time in Analysis (10%, 25%, 50%, 75%, 90%) for Apache, Eggdrop, Lynx, ProFTPD, Squid; data labels shown: 0.1, 0.7, 2]
26
Conclusion & Future Work
Dynamic dataflow sampling gives users a knob to control accuracy vs. performance
Better methods of sample choices
Combine static information
New types of sampling analysis
27
BACKUP SLIDES
28
Outline
Software Errors and Security
Dynamic Dataflow Analysis
Sampling and Distributed Analysis
Prototype System
Performance and Accuracy
29
Detecting Security Errors
Static Analysis
    Analyze source, formal reasoning
    + Find all reachable, defined errors
    – Intractable, requires expert input, no system state
Dynamic Analysis
    Observe and test runtime state
    + Find deep errors as they happen
    – Only along traversed path, very slow
30
Width Test