
Page 1:

Hardware Mechanisms for Distributed Dynamic Software Analysis

Joseph L. Greathouse

Advisor: Prof. Todd Austin

May 10, 2012

Page 2:

Software Errors Abound

NIST: Software errors cost the U.S. ~$60 billion/year.
FBI: Security issues cost the U.S. $67 billion/year, more than ⅓ of it from viruses, network intrusion, etc.

Page 3:

Example of a Modern Bug

Nov. 2010 OpenSSL security flaw: two threads (Thread 1 with mylen=small, Thread 2 with mylen=large) race through lazy initialization of a shared pointer ptr, initially NULL:

    if (ptr == NULL) {
        len = thread_local->mylen;
        ptr = malloc(len);
        memcpy(ptr, data, len);
    }

Page 4:

Example of a Modern Bug

One problematic interleaving (Thread 1: mylen=small, Thread 2: mylen=large):

    Thread 1                          Thread 2
    if (ptr == NULL)
                                      if (ptr == NULL)
    len1 = thread_local->mylen;
    ptr = malloc(len1);
                                      len2 = thread_local->mylen;
                                      ptr = malloc(len2);    // Thread 1's ptr LEAKED
    memcpy(ptr, data1, len1);
                                      memcpy(ptr, data2, len2);
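For contrast, a conventional fix (a sketch, not from the slides) makes the check-allocate-copy sequence atomic so the two threads cannot interleave:

    #include <pthread.h>
    #include <stdlib.h>
    #include <string.h>

    static pthread_mutex_t ptr_lock = PTHREAD_MUTEX_INITIALIZER;
    static char *ptr;   /* shared, initially NULL */

    void init_and_copy(const char *data, size_t mylen) {
        pthread_mutex_lock(&ptr_lock);   /* serialize check + allocate + copy */
        if (ptr == NULL) {
            ptr = malloc(mylen);
            memcpy(ptr, data, mylen);
        }
        pthread_mutex_unlock(&ptr_lock);
    }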

Page 5:

Hardware Plays a Role in this Problem

This bug happened in spite of many proposed hardware solutions: hardware data race recording, deterministic execution/replay, atomicity violation detectors, bug-free memory models, bulk memory commits, and transactional memory.

Page 6:

Dynamic Software Analyses

Taint analysis, dynamic bounds checking, data race detection (e.g. Inspector XE), and memory checking (e.g. MemCheck).

Analyze the program as it runs.
+ Find errors on any executed path.
– LARGE overheads (reported ranges: 2-200x, 2-80x, 5-50x, 2-300x); only test one path at a time.

Page 7:

Goals of this Thesis

Allow high-quality dynamic software analyses: find difficult bugs that weaker analyses miss.
Distribute the tests to large populations: must be low overhead, or users will get angry.
Sampling + hardware accomplish this: each user tests only a small part of the program, and each test is helped by hardware.

Page 8:

Meeting These Goals - Thesis Overview

Allow high-quality dynamic software analyses. Two analyses (dataflow analysis and data race detection) are each supported in two ways (software support and hardware support).

Page 9:

Meeting These Goals - Thesis Overview

Dataflow analysis: Dataflow Analysis Sampling in software (CGO'11) and in hardware (MICRO'08).
Data race detection: Hardware-Assisted Demand-Driven Race Detection (ISCA'11) and the Unlimited Watchpoint System (ASPLOS'12).

Distribute the tests + sample the analyses. Tests must be low overhead.

Page 10:

Outline

Problem Statement
Distributed Dynamic Dataflow Analysis
Demand-Driven Data Race Detection
Unlimited Watchpoints

Page 11:

Outline

Problem Statement
Distributed Dynamic Dataflow Analysis (SW Dataflow Sampling)
Demand-Driven Data Race Detection
Unlimited Watchpoints

Page 12:

Distributed Dynamic Dataflow Analysis

Split the analysis across large populations: observe more runtime states and report problems the developer never thought to test. Each user runs an instrumented program and reports potential problems.

Page 13:

The Problem: OVERHEADS

Taint analysis (e.g. TaintCheck), dynamic bounds checking, data race detection (e.g. Thread Analyzer), and memory checking (e.g. MemCheck).

Analyze the program as it runs.
+ System state; find errors on any executed path.
– LARGE runtime overheads (reported ranges: 2-200x, 2-80x, 5-50x, 2-300x); only test one path.

Page 14:

Current Options Limited

The current choice is all-or-nothing: complete analysis or no analysis.

Page 15:

Solution: Sampling

Lower overheads by skipping some analyses, landing between complete analysis and no analysis.

Page 16:

Sampling Allows Distribution

Lower overheads mean more users: few users (developers) can afford complete analysis, while many users (beta testers, end users) can run cheap sampled analysis.

Many users testing at little overhead see more errors than one user at high overhead.

Page 17:

Example Dynamic Dataflow Analysis

    x = read_input()   ← Input: Associate meta-data with x
    validate(x)        ← Clear meta-data
    y = x * 1024       ← Propagate meta-data to y
    w = x + 42         ← Propagate, then Check w
    a += y             ← Propagate, then Check a
    z = y * 75         ← Propagate, then Check z

Program values (data) flow together with their shadow state (meta-data).
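As a concrete, purely illustrative sketch of the associate/propagate/clear/check pattern above, with one taint bit per variable (all names here are assumptions, not the thesis' tool):

    #include <stdio.h>

    /* One taint bit per variable; a real tool shadows every memory word. */
    static int t_x, t_y, t_w, t_a, t_z;

    static int read_input(void) { t_x = 1; return 42; }  /* Associate */
    static void validate(int x) { (void)x; t_x = 0; }    /* Clear     */
    static void check(const char *name, int taint) {
        if (taint) printf("tainted value used at %s!\n", name);
    }

    int main(void) {
        int x = read_input();
        validate(x);                     /* Clear: without this, every Check fires */
        int y = x * 1024;  t_y = t_x;    /* Propagate */
        int w = x + 42;    t_w = t_x;
        check("w", t_w);
        int a = 0;  a += y;  t_a |= t_y;
        check("a", t_a);
        int z = y * 75;    t_z = t_y;
        check("z", t_z);
        return (a + w + z) != 0;
    }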

Page 18:

Sampling Dataflows

Sampling must be aware of meta-data: remove meta-data from skipped dataflows.

Page 19:

Dataflow Sampling

A sampling tool mediates between the application and the analysis tool. Meta-data detection hands execution to the analysis tool when shadowed data is touched; once the overhead (OH) threshold is exceeded, the sampling tool clears the dataflow's meta-data and lets the application run natively.
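A sketch of that control loop (all names are illustrative hooks, not the thesis' implementation): run the analysis until the overhead budget is spent, then clear meta-data so skipped dataflows cannot cause false positives later.

    extern int    program_running(void);     /* hypothetical hooks */
    extern double overhead_so_far(void);
    extern void   run_with_analysis(void);   /* propagate + check meta-data */
    extern void   run_natively(void);        /* uninstrumented execution    */
    extern void   clear_all_meta_data(void); /* drop shadow state           */

    #define OH_THRESHOLD 0.10                /* e.g. a 10% overhead budget  */

    void sampling_loop(void) {
        while (program_running()) {
            if (overhead_so_far() < OH_THRESHOLD) {
                run_with_analysis();
            } else {
                clear_all_meta_data();       /* skip this dataflow safely   */
                run_natively();
            }
        }
    }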

Page 20:

Finding Meta-Data

There should be no additional overhead when no meta-data is touched; this needs hardware support. The mechanism: take a fault when touching shadowed data. Solution: virtual memory watchpoints; the V→P page mapping for shadowed data is removed, so touching it FAULTs.
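A minimal sketch of such virtual-memory watchpoints on Linux (assuming 4 KiB pages; a real tool would hand control to the analysis in the handler and re-protect the page afterwards):

    #include <signal.h>
    #include <stdint.h>
    #include <sys/mman.h>

    #define PAGE_SIZE 4096UL

    static void on_fault(int sig, siginfo_t *si, void *ctx) {
        (void)sig; (void)ctx;
        /* si->si_addr: the faulting access touched shadowed data. */
        void *page = (void *)((uintptr_t)si->si_addr & ~(PAGE_SIZE - 1));
        mprotect(page, PAGE_SIZE, PROT_READ | PROT_WRITE);  /* let it proceed */
    }

    void watch_page(void *page) {
        struct sigaction sa = {0};
        sa.sa_sigaction = on_fault;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);
        mprotect(page, PAGE_SIZE, PROT_NONE);  /* any touch now faults */
    }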

Page 21:

Prototype Setup

Xen + QEMU taint-analysis sampling system; network packets are untrusted.
Performance tests: network throughput (example: ssh_receive).
Sampling accuracy tests: real-world security exploits.

The OS and applications run on Linux atop the Xen hypervisor. An admin VM holds the shadow page table, and the overhead manager (OHM) switches execution into a QEMU-based taint-analysis environment when data from the network stack is touched.

Page 22:

Performance of Dataflow Sampling

[Chart: ssh_receive network throughput under dataflow sampling, compared to throughput with no analysis.]

Page 23:

Accuracy with Background Tasks

[Chart: exploit-detection accuracy with ssh_receive running in the background.]

Page 24:

Outline

Problem Statement
Distributed Dynamic Dataflow Analysis
Demand-Driven Data Race Detection
Unlimited Watchpoints

Page 25:

Dynamic Data Race Detection

Add checks around every memory access.
Find inter-thread sharing.
Is there synchronization between write-shared accesses? No? Data race.
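An illustrative per-access check capturing that question (a crude sketch, not the thesis' detector; thread_epoch() is a hypothetical counter bumped at every lock, unlock, or barrier):

    #include <stdio.h>

    extern unsigned thread_epoch(int tid);   /* hypothetical sync counter */

    typedef struct {
        int      last_tid;        /* thread that last touched this word */
        int      last_was_write;
        unsigned last_epoch;      /* that thread's epoch at the access  */
    } shadow_t;

    void check_access(shadow_t *s, int tid, int is_write) {
        if (s->last_tid != tid &&                        /* inter-thread...  */
            (is_write || s->last_was_write) &&           /* ...write-shared  */
            thread_epoch(s->last_tid) == s->last_epoch)  /* ...and no sync   */
            printf("potential data race!\n");
        s->last_tid = tid;
        s->last_was_write = is_write;
        s->last_epoch = thread_epoch(tid);
    }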

Page 26:

SW Race Detection is Slow

[Chart: software race-detection slowdowns on the Phoenix and PARSEC suites.]

Page 27:

Inter-thread Sharing is What's Important

In the OpenSSL example, accesses to thread-local data (mylen, len1, len2) involve NO SHARING, and even accesses to shared data mostly involve NO INTER-THREAD SHARING EVENTS; only the few interleaved accesses to ptr matter:

    Thread 1                          Thread 2
    if (ptr == NULL)                  if (ptr == NULL)
    len1 = thread_local->mylen;       len2 = thread_local->mylen;
    ptr = malloc(len1);               ptr = malloc(len2);
    memcpy(ptr, data1, len1);         memcpy(ptr, data2, len2);

Page 28:

Very Little Dynamic Sharing

[Chart: fraction of dynamic accesses that are inter-thread sharing events in Phoenix and PARSEC; it is very small.]

Page 29:

Run the Analysis On Demand

An inter-thread sharing monitor watches the multi-threaded application: local accesses run natively, and only inter-thread sharing is passed to the software race detector.

Page 30:

Finding Inter-thread Sharing

Virtual memory watchpoints? Nearly 100% of accesses would cause page faults, and there is a granularity gap: page protections are per-process, not per-thread, so they cannot isolate inter-thread sharing.

Page 31:

Hardware Sharing Detector

HITM in cache memory indicates W→R data sharing: Core 1 writes Y=5 (its cache line goes to Modified), Core 2 then reads Y, and Core 1's cache answers the snoop with a HITM (hit-to-modified) response as the line returns to Shared. Hardware performance counters can count such HITM events and raise a fault-like interrupt on them.
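A hedged sketch of arming such a counter through Linux's real perf_event_open syscall; the raw event encoding is microarchitecture-specific (e.g. an OTHER_CORE_L2_HITM-style event) and is left as a parameter, and wiring the returned fd to deliver a signal on overflow is omitted:

    #include <linux/perf_event.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int open_hitm_counter(unsigned long long hitm_raw_encoding) {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.size          = sizeof(attr);
        attr.type          = PERF_TYPE_RAW;
        attr.config        = hitm_raw_encoding; /* CPU-specific HITM event */
        attr.sample_period = 1;                 /* overflow on every event */
        attr.disabled      = 1;
        /* pid = 0, cpu = -1: count this thread on any CPU. */
        return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    }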

Page 32:

Potential Accuracy & Performance Problems

Limitations of performance counters: Intel's HITM event only finds W→R data sharing.
Limitations of cache events: SMT sharing can't be counted, and cache evictions cause missed events.
Events are delivered through the kernel.

Page 33:

On-Demand Analysis on Real HW

For each instruction: if analysis is enabled, run the SW race detection and then ask "sharing recently?"; if NO, disable analysis. If analysis is disabled, execute the instruction natively; on a HITM interrupt, enable analysis. More than 97% of execution takes the fast disabled path; less than 3% runs the detector.
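The flow chart written as code (illustrative names; the >97% common case is a single predicate check in the disabled state):

    extern void execute_instruction(void);
    extern void sw_race_detection(void);
    extern int  sharing_recently(void);

    static int analysis_enabled;

    void on_instruction(void) {
        if (analysis_enabled) {
            sw_race_detection();          /* slow path: < 3% of execution  */
            if (!sharing_recently())
                analysis_enabled = 0;     /* back to fast native execution */
        }
        execute_instruction();            /* fast path: > 97% of execution */
    }

    void on_hitm_interrupt(void) {        /* W->R sharing just observed    */
        analysis_enabled = 1;
    }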

Page 34:

Performance Increases

[Chart: demand-driven race detection speedups on Phoenix and PARSEC, up to 51x.]
Accuracy vs. continuous analysis: 97%.

Page 35:

Outline

Problem Statement
Distributed Dynamic Dataflow Analysis
Demand-Driven Data Race Detection
Unlimited Watchpoints

Page 36:

Watchpoints Work for Many Analyses

Bounds checking, taint analysis, data race detection, deterministic execution, transactional memory, and speculative parallelization.

Page 37:

Desired Watchpoint Capabilities

Large number: store in memory, cache on chip.
Fine-grained: watch the full virtual address space (coarse-grained watchpoints raise false faults on unwatched data near watched data).
Per thread: cached per HW thread.
Ranges: range cache.

Page 38:

Range Cache

Each valid entry holds a start address, an end address, and watchpoint bits. Initially one entry covers 0x0-0xffff_ffff as Not Watched. After "Set Addresses 0x5-0x2000 R-Watched" the cache holds three entries:

    Start Address    End Address     Watchpoint?    Valid
    0x0              0x4             Not Watched    1
    0x5              0x2000          R-Watched      1
    0x2001           0xffff_ffff     Not Watched    1

A load of address 0x400 compares against all entries in parallel (start ≤ 0x400 and 0x400 ≤ end), hits the R-Watched entry, and raises a watchpoint (WP) interrupt.
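A small software model of that lookup (illustrative; real hardware performs the comparisons on all entries in parallel rather than in a loop):

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        uint32_t start, end;    /* inclusive virtual-address range */
        bool     valid;
        bool     r_watched, w_watched;
    } rc_entry_t;

    /* Returns true if the access should raise a watchpoint interrupt. */
    bool rc_lookup(const rc_entry_t *rc, int n, uint32_t addr, bool is_write) {
        for (int i = 0; i < n; i++) {
            if (rc[i].valid && rc[i].start <= addr && addr <= rc[i].end)
                return is_write ? rc[i].w_watched : rc[i].r_watched;
        }
        return false;  /* miss: a software handler refills from memory */
    }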

Page 39:

Watchpoint System Design

Store ranges in main memory: per-thread ranges (T1, T2), with a per-core range cache.
A software handler runs on RC miss or overflow.
The write-back RC works as a write filter for watchpoint changes.
Precise, user-level watchpoint faults.

Page 40:

Experimental Evaluation Setup

Trace-based timing simulator built on Pin.
Taint analysis on SPEC INT2000; race detection on Phoenix and PARSEC.
Comparing only the shadow value checks.

Page 41:

Watchpoint-Based Taint Analysis

[Chart: slowdowns of prior mechanisms (10x, 19x, 23x, 28x, 30x, 206x, 423x, 1429x) vs. a 128-entry RC, or a 64-entry RC plus a 2KB bitmap, at roughly a 20% slowdown.]

Page 42:

Watchpoint-Based Data Race Detection

[Chart: with the RC+bitmap hardware, demand-driven race detection adds roughly +10% to +20% overhead.]

Page 43:

Future Directions

Dataflow tests find bugs on executed code; what about code that is never executed?
Sampling + demand-driven race detection: good synergy between the two, as with taint analysis.
Further watchpoint hardware studies: clear microarchitectural analysis; more software systems and different algorithms.

Page 44:

Conclusions

Sampling allows distributed dataflow analysis (software and hardware dataflow analysis sampling).
Existing hardware can speed up race detection (hardware-assisted demand-driven data race detection).
Watchpoint hardware is useful everywhere (the unlimited watchpoint system).
Together, these enable distributed dynamic software analysis.

Page 45:

Thank You

Page 46:

BACKUP SLIDES

Page 47:

Finding Errors

Brute force: code review, fuzz testing, whitehat/grayhat hackers. Time-consuming and difficult.
Static analysis: automatically analyze source with formal reasoning and compiler checks. Intractable in general, requires expert input, and has no system state.

Page 48:

Dynamic Dataflow Analysis

Associate meta-data with program values.
Propagate/clear meta-data while executing.
Check meta-data for safety & correctness.
This forms dataflows of meta/shadow information.

Page 49:

Demand-Driven Dataflow Analysis

Only analyze shadowed data: meta-data detection routes non-shadowed data to the native application and shadowed data to the instrumented application.

Page 50:

Results by Ho et al.

lmbench best-case results, and results when everything is tainted:

    System                       Slowdown
    Taint Analysis               101.7x
    On-Demand Taint Analysis     1.98x

Page 51:

Sampling Allows Distribution

Between complete analysis and no analysis: the developer runs heavyweight analysis, while beta testers and end users run progressively lighter sampled analyses.

Many users testing at little overhead see more errors than one user at high overhead.

Page 52:

Cannot Naïvely Sample Code

Skipping individual instructions corrupts the meta-data. If "validate(x)" is skipped, x's meta-data is never cleared, and "Check w" raises a false positive. If "x = read_input()" is skipped, no meta-data is associated with the input, and "Check a" / "Check z" miss real errors: false negatives.

Page 53:

Dataflow Sampling Example

Skip whole dataflows instead: when a dataflow is skipped, its meta-data is removed at the source (e.g. at "x = read_input()"), so no false positives arise. A skipped dataflow can still produce a false negative (e.g. a missed "Check a"), which sampling accepts by design.

Page 54:

Benchmarks

Performance: network throughput (example: ssh_receive). Accuracy of sampling analysis: real-world security exploits.

    Name       Error Description
    Apache     Stack overflow in Apache Tomcat JK Connector
    Eggdrop    Stack overflow in Eggdrop IRC bot
    Lynx       Stack overflow in Lynx web browser
    ProFTPD    Heap smashing attack on ProFTPD server
    Squid      Heap smashing attack on Squid proxy server

Page 55:

Performance of Dataflow Sampling (2)

[Chart: netcat_receive throughput vs. throughput with no analysis.]

Page 56:

Performance of Dataflow Sampling (3)

[Chart: ssh_transmit throughput vs. throughput with no analysis.]

Page 57:

Accuracy at Very Low Overhead

Max time in analysis: 1% every 10 seconds; analysis always stops after the threshold. This gives the lowest probability of detecting exploits, yet:

    Name       Chance of Detecting Exploit
    Apache     100%
    Eggdrop    100%
    Lynx       100%
    ProFTPD    100%
    Squid      100%

Page 58:

Accuracy with Background Tasks

[Chart: detection accuracy with netcat_receive running alongside the benchmark.]

Page 59:

Outline

Problem Statement
Proposed Solutions: Distributed Dynamic Dataflow Analysis; Testudo: Hardware-Based Dataflow Sampling; Demand-Driven Data Race Detection
Future Work
Timeline

Page 60:

Virtual Memory Not Ideal

[Chart: netcat_receive with virtual-memory watchpoints; page-fault costs dominate.]

Page 61:

Word-Accurate Meta-Data

A word-accurate meta-data cache sits next to the data cache in the pipeline. What happens when it overflows? Increase the size of main memory? Store into virtual memory? Instead: use sampling to throw away meta-data.

Page 62:

On-Chip Sampling Mechanism

[Chart: average number of executions to find errors with 512-entry and 1024-entry sample caches; one bar reaches 17,601.]

Page 63:

Useful for Scaling to Complex Analyses

If each shadow operation uses 1000 instructions: [Chart: average % overhead with a 1024-entry sample cache on a telnet server benchmark; 17.3% overhead at about 3,500 executions vs. 0.3% overhead at about 169,000 executions.]

Page 64:

Example of Data Race Detection

For each access in the OpenSSL example (Thread 1: mylen=small, Thread 2: mylen=large), the detector asks: is ptr write-shared? Was there interleaved synchronization? Both threads run the check-allocate-copy sequence on ptr with no synchronization, so the accesses are flagged as a race.

Page 65:

Demand-Driven Analysis Algorithm

[Figure: algorithm flow chart.]

Page 66:

Demand-Driven Analysis on Real HW

[Figure: mapping of the algorithm onto commodity hardware.]

Page 67:

Performance Difference

[Chart: performance comparison on Phoenix and PARSEC.]

Page 68:

Demand-Driven Analysis Accuracy

[Chart: races found vs. continuous analysis per benchmark: 1/1, 2/4, 3/3, 4/4, 3/3, 4/4, 4/4, 2/4.]
Accuracy vs. continuous analysis: 97%.

Page 69:

Accuracy on Real Hardware

    Sharing   kmeans       facesim     ferret      freqmine    vips        x264        streamcluster
    W→W       1/1 (100%)   0/1 (0%)    -           -           1/1 (100%)  -           1/1 (100%)
    R→W       -            0/1 (0%)    2/2 (100%)  2/2 (100%)  1/1 (100%)  3/3 (100%)  1/1 (100%)
    W→R       -            2/2 (100%)  1/1 (100%)  2/2 (100%)  1/1 (100%)  3/3 (100%)  1/1 (100%)

    Sharing   SpiderMonkey-0   SpiderMonkey-1   SpiderMonkey-2   NSPR-1      Memcached-1   Apache-1
    W→W       9/9 (100%)       1/1 (100%)       1/1 (100%)       3/3 (100%)  -             1/1 (100%)
    R→W       3/3 (100%)       -                1/1 (100%)       1/1 (100%)  1/1 (100%)    7/7 (100%)
    W→R       8/8 (100%)       1/1 (100%)       2/2 (100%)       4/4 (100%)  -             2/2 (100%)

Page 70:

Hardware-Assisted Watchpoints

A HW interrupt fires when the program touches watched data, so software knows it is touching important data AT NO OVERHEAD in the common case. Such watchpoints are normally used for debugging.

Example: eight bytes at addresses 0-7 hold A-H. After "R-Watch 2-4", a load of address 2 traps; after "W-Watch 6-7", a write of X to address 7 traps.
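Today's debug-register-style watchpoints can be driven from user space through Linux's real perf_event_open interface (illustrative usage; only a handful of such registers exist, which motivates this work):

    #include <linux/hw_breakpoint.h>
    #include <linux/perf_event.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int watch_write_4bytes(void *addr) {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.size          = sizeof(attr);
        attr.type          = PERF_TYPE_BREAKPOINT;
        attr.bp_type       = HW_BREAKPOINT_W;       /* trigger on writes  */
        attr.bp_addr       = (unsigned long)addr;
        attr.bp_len        = HW_BREAKPOINT_LEN_4;   /* 1/2/4/8 bytes only */
        attr.sample_period = 1;
        return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    }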

Page 71:

Existing Watchpoint Solutions

Watchpoint registers: limited number (4-16), small reach (4-8 bytes).
Virtual memory: coarse-grained, per-process, only aligned ranges.
ECC mangling: per physical address, affects all cores, no ranges.

Page 72:

Meeting These Requirements

Unlimited number of watchpoints: store in memory, cache on chip.
Fine-grained: watch full virtual addresses.
Per-thread: watchpoints cached per core/thread, with TID registers.
Ranges: range cache.

Page 73:

The Need for Many Small Ranges

Some watchpoints are well suited to ranges: with 32-bit addresses, 2 ranges × 64 bits each = 16 B. Others need a large number of small watchpoints: 51 one-word ranges × 64 bits each = 408 B, yet stored as a bitmap the same information takes only 51 bits. Taint analysis has good ranges; byte-accurate race detection does not.

Page 74:

Watchpoint System Design II

Make some RC entries point to bitmaps: an entry holds a start address, an end address, R/W watch bits, a valid (V) bit, and a bitmap (B) bit; when B is set, the entry instead points to a watchpoint bitmap. Ranges and bitmaps live in main memory; on the core, a range cache and a bitmap cache are accessed in parallel.
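A sketch of that extended entry format (field names assumed from the slide, not a definitive layout):

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        uint32_t start, end;          /* inclusive virtual-address range  */
        bool     valid;               /* V bit                            */
        bool     is_bitmap;           /* B bit                            */
        union {
            struct { bool r, w; } wp; /* whole-range watch bits           */
            uint8_t *bitmap;          /* fine-grained per-word watch bits */
        } u;
    } rc_bitmap_entry_t;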

Page 75:

Watchpoint-Based Taint Analysis (Range Cache Only)

[Chart: with a 128-entry range cache alone, prior mechanisms' slowdowns (10x, 19x, 23x, 28x, 30x, 206x, 423x, 1429x) drop to roughly a 20% slowdown.]

Page 76:

Width Test
