Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Emery Berger, University of Massachusetts Amherst, Department of Computer Science, 2006

Upload: emery-berger

Post on 10-May-2015


DESCRIPTION

Multicore CPUs are here. Conventional wisdom holds that, to take best advantage of these processors, we now need to rewrite sequential applications to make them multithreaded. Because it is difficult to write correct and efficient multithreaded applications (race conditions, deadlocks, and scalability bottlenecks are all common), this is a major challenge.

This talk presents two alternative approaches that bring the power of multiple cores to today's software. The first approach focuses on building highly concurrent client-server applications from legacy code. I present a system called Flux that allows users to take unmodified off-the-shelf *sequential* C and C++ code and build concurrent applications. The Flux compiler combines the Flux program and the sequential code to generate a deadlock-free, high-concurrency server. Flux also generates discrete event simulators that accurately predict actual server performance under load. While the Flux language was initially targeted at servers, we have found it to be a useful abstraction for sensor networks, and I will briefly talk about our use of an energy-aware variant of Flux in a deployment on the backs of endangered turtles.

The second approach uses the extra processing power of multicore CPUs to make legacy C/C++ applications more reliable. I present a system called DieHard that uses randomization and replication to transparently harden programs against a wide range of errors, including buffer overflows and dangling pointers. Instead of crashing or running amok, DieHard lets programs continue to run correctly in the face of memory errors with high probability.

This is joint work with Brendan Burns, Kevin Grimaldi, Alex Kostadinov, Jacob Sorber, and Mark Corner (University of Massachusetts Amherst), and Ben Zorn (Microsoft Research).

TRANSCRIPT

Page 1: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Emery Berger
University of Massachusetts Amherst, Department of Computer Science, 2006

Page 2: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Research Overview

- High-performance memory managers
  - Hoard allocator for concurrent apps [ASPLOS-IX]
  - Heap Layers infrastructure [PLDI 01]
  - Reaps (regions + heaps) [OOPSLA 02]
- Cooperative memory management (OS + GC)
  - Bookmarking: GC without paging [PLDI 04]
  - CRAMM: VM + any GC, max throughput [ISMM 04, OSDI 06]
- Memory management studies
  - Custom allocation [OOPSLA 02], GC vs. malloc [OOPSLA 05]
- Support for contributory applications
  - Transparent contribution: memory, disk [USENIX 06, FAST 07]
- Plus other compiler & runtime stuff

Transparently improving performance, robustness & reliability (PL + OS)

Page 3: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Concurrent Memory Allocators

Previous allocators unsuitable for multithreaded apps:
- Serialized heap: protected by lock
- Allocator-induced false sharing
- Poor space bounds: blowup O(P), O(T), or unbounded increase in memory

[Figure: "pure private heaps" (STL, Cilk, others). One processor runs x1 = malloc(1), x2 = malloc(1), x3 = malloc(1) while the other runs free(x1), free(x2), free(x3). Key: in use, processor 0; free, on heap 1.]
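The unbounded blowup of pure private heaps can be illustrated with a toy simulation. This is a minimal sketch, not code from Hoard or any real allocator: the function name, the `return_to_owner` flag, and the block bookkeeping are all assumptions made for illustration. One processor allocates, the other frees, and a freed block lands on the heap of whichever processor freed it.

```python
def blocks_from_os(rounds, return_to_owner=False):
    """Simulate a producer/consumer pair and count blocks obtained from the OS.

    Processor 0 calls malloc(1); processor 1 calls free() on that block.
    With pure private heaps (return_to_owner=False) the freed block goes to
    heap 1, so processor 0 can never reuse it.
    """
    free_lists = {0: [], 1: []}   # per-processor free blocks
    from_os = 0                   # blocks requested from the OS so far
    for _ in range(rounds):
        # Processor 0: x = malloc(1)
        if free_lists[0]:
            block = free_lists[0].pop()
        else:
            block = from_os
            from_os += 1
        # Processor 1: free(x)
        destination = 0 if return_to_owner else 1
        free_lists[destination].append(block)
    return from_os
```

At most one block is live at any time, yet `blocks_from_os(1000)` requests a fresh block on every round. Passing `return_to_owner=True` is only a contrast case (it is not a description of Hoard's algorithm) and shows the same workload running in a single block.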

Page 4: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Hoard Memory Allocator

Hoard: scalable heap
- Provably low synchronization overhead
- Optimal space consumption: blowup = O(1)
- Avoids false sharing

www.hoard.org
- 40,000+ downloads
- AOL, BT, Philips, Credit Suisse, Novell, etc.

Page 5: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

The Cores Have Arrived

Hurray! Now what?

Multithreading problems:
- Data races
- Deadlock & livelock
- Scalability bottlenecks

Automatic parallelization?

Page 6: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Exploit Multicores Now!

Taking advantage of multicores without rewriting a line of code:

- Build scalable applications from parts
  - Flux: "glue" language for easily building highly-concurrent servers [USENIX 06]
- Increase reliability
  - DieHard: lets C/C++ programs run correctly in face of memory errors with high probability [PLDI 06]

Page 7: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Flux: A Language for Programming High-Performance Servers

Joint work with Brendan Burns, Kevin Grimaldi, Alex Kostadinov, Mark Corner

University of Massachusetts Amherst

Page 8: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Motivating Example: Image Server

Client
- Requests image @ desired quality, size

Server
- Images: RAW
- Compresses to JPG
- Caches requests
- Sends to client

Example request: http://server/Easter-bunny/200x100/75

[Figure: client and image server; labels include "not found"]

Page 9: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Problem: Concurrency

Could write sequential code, but…
- More clients (latency)
- Bigger server: multicores, multiprocessors

One approach: threads
- Risk deadlock, etc.
- Mixes program logic & concurrency control; ties to runtime (threads?!)

[Figure: multiple clients connecting to the image server]

Page 10: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

The Flux Programming Language

High-performance & deadlock-free concurrent programming w/ sequential components

Unmodified C, C++ (or Java): black boxes
+ Compose with Flux program
  - Assume #clients » #cores
= High-quality server + performance tools:
  - Statically enforces atomicity w/o deadlock
  - Path profiling
  - Discrete event simulator

Page 11: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Flux Server "Main"

source Listen => Image;

Source nodes originate flows
- Conceptually in separate thread
- Executes inside implicit infinite loop
- Initiates flow ("thread") for each image request

[Figure: image server; Listen starts a separate ReadRequest, Compress, Write, Complete flow for each request]

Page 12: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Flux Image Server

Image = ReadRequest -> Compress -> Write -> Complete;

Basic image server requires:
- HTTP parsing (http)
- Socket handling (socket)
- JPEG compression (libjpeg)
All UNIX-style C libraries

Abstract node = flow across nodes
- Concrete or abstract

[Figure: image server flow ReadRequest, Compress, Write, Complete, with each node labeled by the library behind it (http, libjpeg, socket)]
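To make the idea of an abstract node as a flow across sequential nodes concrete, here is a minimal sketch in Python rather than Flux. The `flow` helper and the four node functions are hypothetical stand-ins for the C library calls behind each node; they are not part of Flux.

```python
from functools import reduce

def flow(*nodes):
    """Compose sequential nodes left to right: each output feeds the next node."""
    return lambda request: reduce(lambda value, node: node(value), nodes, request)

# Hypothetical sequential components (the "black boxes").
def read_request(path):
    name, size, quality = path.strip("/").split("/")
    return {"name": name, "size": size, "quality": int(quality)}

def compress(req):
    label = f"jpeg({req['name']},{req['size']},q={req['quality']})"
    return {**req, "jpeg": label}

def write(req):
    return {**req, "sent": True}

def complete(req):
    return req["jpeg"] if req["sent"] else None

# Abstract node: Image = ReadRequest -> Compress -> Write -> Complete
image = flow(read_request, compress, write, complete)
```

Calling `image("/Easter-bunny/200x100/75")` runs the four components in order. None of them knows about the others or about concurrency, which is the property the slide emphasizes.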

Page 13: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Control Flow

Image = ReadRequest -> CheckCache -> Handler -> Write -> Complete;

typedef hit TestInCache;
Handler:[_,_,hit] = ;
Handler:[_,_,_] = ReadInFromDisk -> Compress -> StoreInCache;

Direct flow via user-supplied predicate types
- Type test applied to output
- Note: no variables; dispatch on output "type"

Here: cache frequently requested images

[Figure: flow graph Listen, ReadRequest, CheckCache, handler, Write, Complete; on a hit the handler does nothing, otherwise it runs ReadInFromDisk, Compress, StoreInCache]
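Dispatch on output "type" can be sketched as an ordered list of rules, each pairing a pattern of predicates with the flow to run. This is an illustration in Python, not Flux semantics: `is_hit` plays the role of the user-supplied TestInCache predicate, `None` plays the role of the `_` wildcard, and the tuple layout of the output is an assumption.

```python
def is_hit(value):
    """Hypothetical predicate behind the 'hit' type: the cache lookup succeeded."""
    return value is not None

# Rules are tried in order, like the two Handler definitions on the slide.
RULES = [
    # Handler:[_,_,hit] = ;   (empty flow: pass the output through)
    ((None, None, is_hit), lambda out: out),
    # Handler:[_,_,_] = ReadInFromDisk -> Compress -> StoreInCache;
    ((None, None, None), lambda out: (out[0], out[1], "loaded:" + out[1])),
]

def handler(output, rules=RULES):
    """Run the first rule whose predicates all accept the matching output field."""
    for pattern, body in rules:
        if all(pred is None or pred(field) for pred, field in zip(pattern, output)):
            return body(output)
    raise ValueError("no rule matches output")
```

The program itself names no variables; the choice of path depends only on whether the third output field passes the predicate.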

Page 14: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Supporting Concurrency

Many clients = concurrent flows
- Must keep cache consistent

Atomicity constraints
- Same name = mutual exclusion (2PL)
- Apply to nodes or whole flow (abstract node)

atomic CheckCache {};
atomic Complete {, };
atomic StoreInCache {};

[Figure: several concurrent copies of the image server flow graph (Listen, ReadRequest, CheckCache, handler, ReadInFromDisk, Compress, StoreInCache, Write, Complete)]

Page 15: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

More Atomicity

Reader / writer constraints
- Multiple readers or single writer (default)

atomic ReadList: {listAccess?};
atomic AddToList: {listAccess!};

Per-session constraints
- User-supplied function ≈ hash on source
- Added to flow ≈ chooses from array of locks

atomic AddHasChunk: {chunks(session)};
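A per-session constraint can be pictured as a hash that selects one lock from a fixed array. The sketch below is an assumption about how such a scheme might look, not the code Flux generates; the class name, the use of CRC32 as the hash, and the array size are all illustrative.

```python
import threading
import zlib

class SessionLocks:
    """Array of locks; a session identifier is hashed to pick one of them."""

    def __init__(self, size=16):
        self._locks = [threading.Lock() for _ in range(size)]

    def lock_for(self, session_id):
        # Deterministic hash, so the same session always maps to the same lock.
        index = zlib.crc32(session_id.encode("utf-8")) % len(self._locks)
        return self._locks[index]

chunks = SessionLocks(size=8)

def add_has_chunk(session_id, table, chunk):
    """Hypothetical node body guarded by the per-session lock."""
    with chunks.lock_for(session_id):
        table.setdefault(session_id, []).append(chunk)
```

Flows from different sessions usually take different locks and so proceed in parallel, while two flows from the same session are serialized.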

Page 16: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Preventing Deadlock

Naïve execution can deadlock:

atomic A: {z,y};
atomic B: {y,z};

Establish canonical lock order
- Partial order
- Alphabetic by name

atomic A: {y,z};
atomic B: {y,z};
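The canonical-order rule can be sketched with ordinary locks: whatever order a node lists its constraints in, the locks are always acquired alphabetically by name. The `LOCKS` table and the `atomic` helper below are hypothetical names for illustration, not Flux runtime code.

```python
import threading
from contextlib import ExitStack, contextmanager

LOCKS = {name: threading.Lock() for name in ("y", "z")}

def canonical(names):
    """Canonical lock order: alphabetic by constraint name, duplicates removed."""
    return sorted(set(names))

@contextmanager
def atomic(*names):
    """Hold the named locks for the duration of the block, acquired in canonical order."""
    with ExitStack() as stack:
        for name in canonical(names):
            stack.enter_context(LOCKS[name])
        yield
    # ExitStack releases the locks in reverse order on exit.
```

With this helper, `atomic("z", "y")` and `atomic("y", "z")` both take y before z, so two nodes written as A: {z,y} and B: {y,z} can no longer wait on each other in a cycle.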

Page 17: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Preventing Deadlock, II

Harder with abstract nodes:

A = B;
C = D;
atomic A: {z};
atomic B: {y};
atomic C: {y,z};

[Figure: flows A (containing B) and C with their lock sets before elevation; A holds {z} while B needs {y}]

Solution: elevate constraints; fixed point

A = B;
C = D;
atomic A: {y,z};
atomic B: {y};
atomic C: {y,z};

[Figure: the same flows after elevation, with A holding {y,z}]
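One way to read "elevate constraints; fixed point" is as an iteration that lifts a child's constraint into its parent whenever the child would otherwise acquire a lock that sorts before one the parent already holds. The sketch below is a simplified interpretation under that assumption, not the Flux compiler's algorithm; it only looks at direct children of nodes that have constraints of their own.

```python
def elevate(flows, constraints):
    """Lift out-of-order child constraints into parents until nothing changes.

    flows:       abstract node name -> list of child node names
    constraints: node name -> set of lock names
    """
    result = {node: set(locks) for node, locks in constraints.items()}
    changed = True
    while changed:
        changed = False
        for parent, children in flows.items():
            held = result.get(parent)
            if not held:
                continue  # parent holds nothing, so no ordering conflict here
            for child in children:
                for lock in sorted(result.get(child, ())):
                    # The child would take `lock` while the parent already
                    # holds a later lock: out of canonical order, so elevate.
                    if lock not in held and lock < max(held):
                        held.add(lock)
                        changed = True
    return result
```

Applied to the slide's example (A = B, C = D, A: {z}, B: {y}, C: {y,z}), the loop adds y to A and then reaches a fixed point, giving A: {y,z} as on the slide.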

Page 18: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Almost Complete Flux Image Server

source Listen => Image;
Image = ReadRequest -> CheckCache -> Handler -> Write -> Complete;
Handler:[_,_,hit] = ;
Handler:[_,_,_] = ReadInFromDisk -> Compress -> StoreInCache;

atomic CheckCache: {cacheLock};
atomic StoreInCache: {cacheLock};
atomic Complete: {cacheLock};

handle error ReadInFromDisk => FourOhFour;

Concise, readable expression of server logic
- No threads, etc.: simplifies programming, debugging

[Figure: image server flow graph (Listen, ReadRequest, CheckCache, handler, ReadInFromDisk, Compress, StoreInCache, Write, Complete)]

Page 19: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Flux Outline

- Intro to Flux: building a server
  - Components, flow
  - Atomicity, deadlock avoidance
- Performance results
  - Server performance
  - Performance prediction
- Future work

Page 20: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Flux Results

Four servers:
- Image server [23] + libjpeg
- Multi-player game [54]
- BitTorrent [84]: 2 undergrads, 1 week!
- Web server [36] + PHP

Evaluation
- Benchmark: variant of SPECweb99
- Compared to Capriccio [SOSP03], SEDA [SOSP01]

[Figure: the image server flow graph executed under three runtime models: thread-per-connection, event-driven, thread pool]

Page 21: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software


Web Server

Page 22: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Performance Prediction

[Figure: performance prediction plot; label: observed parameters]

Page 23: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Performance Prediction

[Figure: performance prediction plot; label: observed parameters]

Page 24: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Flux Conclusion

Flux language & system
- Concurrency made easier
- Build high-performance servers from sequential parts
- Deadlock-free
- Predict & debug performance before deployment

Page 25: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Future Work: eFlux

eFlux: language for perpetual computing
- Sensors ≈ client-server!
- Energy-aware language
  - Flows decorated with power states (e.g., "high", "low")
  - Provide different levels of service depending on available & predicted energy

[Photo: wood turtle (Clemmys insculpta)]

Page 26: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

DieHard: Probabilistic Memory Safety for Unsafe Programming Languages

Joint work with Ben Zorn (Microsoft Research)

Page 27: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Problems with Unsafe Languages

C, C++: pervasive apps, but the languages are memory-unsafe

Numerous opportunities for security vulnerabilities, errors:
- Double free
- Invalid free
- Uninitialized reads
- Dangling pointers
- Buffer overflows (stack & heap)

Page 28: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Current Approaches

Unsound, may work or abort
- Windows, GNU libc, etc., Rx [Zhou]

Unsound, will definitely continue
- Failure oblivious [Rinard]

Sound, definitely aborts (fail-safe)
- CCured [Necula], CRED [Ruwase & Lam], SAFECode [Dhurjati, Kowshik & Adve], &c.
  - Slowdowns: 30% - 20X
  - Requires C source, programmer intervention
  - Garbage collection or partially sound (pools)
  - Good for debugging, less for deployment

Page 29: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Soundness for "Erroneous" Programs

Normally: memory errors ⇒ ? …

Consider infinite-heap allocator:
- All news fresh; ignore delete
  - No dangling pointers, invalid frees, double frees
- Every object infinitely large
  - No buffer overflows, data overwrites

Transparent to correct program
"Erroneous" programs sound

Page 30: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Probabilistic Memory Safety

Approximate with M-heaps (e.g., M=2)

Naïve: pad allocations, defer deallocations
- (–) No protection from larger overflows: pad = 8 bytes, overflow = 9 bytes…
- (–) Deterministic: overflow crashes everyone

DieHard: randomize M-heap
- (+) Probabilistic memory safety
- (+) Independent across heaps
- (?) Efficient implementation…

Page 31: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Implementation Choices

Conventional, freelist-based heaps
- Hard to randomize, protect from errors
  - Double frees, heap corruption

What about bitmaps? [Wilson90]
- (–) Catastrophic fragmentation
  - Each small object likely to occupy one page

[Figure: small objects scattered one per page]

Page 32: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Randomized Heap Layout

Bitmap-based, segregated size classes
- Bit represents one object of given size
  - i.e., one bit = 2^(i+3) bytes, etc.
- Prevents fragmentation

[Figure: heap divided into regions for object sizes 2^(i+3), 2^(i+4), 2^(i+5); metadata holds one allocation bitmap per region (00000001, 1010, 10)]

Page 33: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Randomized Allocation

malloc(8):
- compute size class = ceil(log2 sz) – 3
- randomly probe bitmap for zero-bit (free)

Fast: runtime O(1)
- M=2 ⇒ E[# of probes] ≤ 2

[Figure: heap divided into regions for object sizes 2^(i+3), 2^(i+4), 2^(i+5); metadata bitmaps 00000001, 1010, 10]
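The two allocation steps can be sketched on a toy heap. This is an illustrative model in Python, not DieHard's implementation: the class name, the fixed number of slots per size class, and the refusal to fill a class past half (modeling M = 2) are assumptions. Because at least half the bits are zero at every probe, each probe succeeds with probability at least 1/2, which is where the bound of 2 expected probes comes from.

```python
import math
import random

class RandomizedHeap:
    """Toy bitmap heap with segregated power-of-two size classes."""

    def __init__(self, slots_per_class=16, num_classes=4, seed=None):
        self.rng = random.Random(seed)
        self.slots = slots_per_class
        self.bitmaps = [[0] * slots_per_class for _ in range(num_classes)]
        self.live = [0] * num_classes

    @staticmethod
    def size_class(sz):
        """Class i holds objects of 2**(i+3) bytes; requests under 8 bytes use class 0."""
        return max(0, math.ceil(math.log2(sz)) - 3)

    def malloc(self, sz):
        """Return (size class, slot index), or None if the request cannot be served."""
        cls = self.size_class(sz)
        if cls >= len(self.bitmaps):
            return None  # larger than this toy heap supports
        if 2 * (self.live[cls] + 1) > self.slots:
            return None  # keep each class at most half full (M = 2)
        while True:
            index = self.rng.randrange(self.slots)  # random probe
            if self.bitmaps[cls][index] == 0:       # zero bit means free
                self.bitmaps[cls][index] = 1
                self.live[cls] += 1
                return cls, index
```

For example, `RandomizedHeap.size_class(8)` is class 0 and `size_class(9)` rounds up to class 1 (16-byte objects).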

Page 34: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Randomized Allocation

malloc(8):
- compute size class = ceil(log2 sz) – 3
- randomly probe bitmap for zero-bit (free)

Fast: runtime O(1)
- M=2 ⇒ E[# of probes] ≤ 2

[Figure: same heap after the allocation; the first bitmap is now 00010001, the others remain 1010, 10]

Page 35: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Randomized Deallocation

free(ptr):
- Ensure object valid: aligned to right address
- Ensure allocated: bit set
- Resets bit

Prevents invalid frees, double frees

[Figure: heap with metadata bitmaps 00010001, 1010, 10 for object sizes 2^(i+3), 2^(i+4), 2^(i+5)]

Page 36: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Randomized Deallocation

free(ptr):
- Ensure object valid: aligned to right address
- Ensure allocated: bit set
- Resets bit

Prevents invalid frees, double frees

[Figure: heap with metadata bitmaps 00010001, 1010, 10 for object sizes 2^(i+3), 2^(i+4), 2^(i+5)]
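The three checks in free(ptr) can be sketched for a single size-class region. This is a simplified model, assuming a region is described by a base address, an object size, and a bitmap; the class and method names are hypothetical and do not come from DieHard's source.

```python
class BitmapRegion:
    """One size class: `slots` objects of `obj_size` bytes starting at `base`."""

    def __init__(self, base, obj_size, slots):
        self.base = base
        self.obj_size = obj_size
        self.bitmap = [0] * slots

    def mark_allocated(self, index):
        """Set the bit for slot `index` and return the object's address."""
        self.bitmap[index] = 1
        return self.base + index * self.obj_size

    def free(self, ptr):
        """Return True if the object was freed, False if the call was ignored."""
        offset = ptr - self.base
        if offset < 0 or offset >= self.obj_size * len(self.bitmap):
            return False  # address is outside this region
        if offset % self.obj_size != 0:
            return False  # not aligned to an object start: invalid free
        index = offset // self.obj_size
        if self.bitmap[index] == 0:
            return False  # bit already clear: double free or never allocated
        self.bitmap[index] = 0  # reset bit
        return True
```

An invalid or repeated free is simply ignored here rather than corrupting metadata, which is the behavior the slide describes.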

Page 37: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Randomized Deallocation

free(ptr):
- Ensure object valid: aligned to right address
- Ensure allocated: bit set
- Resets bit

Prevents invalid frees, double frees

[Figure: same heap after the free; the first bitmap is back to 00000001, the others remain 1010, 10]

Page 38: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Randomized Heaps & Reliability

- Objects randomly spread across heap
- Different run = different heap
- Errors across heaps independent

[Figure: two randomized heap layouts with numbered objects placed differently in the regions for object sizes 2^(i+3) and 2^(i+4). My Mozilla: "malignant" overflow. Your Mozilla: "benign" overflow.]

Page 39: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

DieHard Software Architecture

[Figure: input is broadcast to replica1 (seed1), replica2 (seed2), replica3 (seed3), which execute as separate processes; a vote on their results produces the output]

- Each replica has different allocator
- "Output equivalent": kill failed replicas
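The vote step can be sketched as picking the output that the most surviving replicas agree on. The rule used below (a crashed replica reports `None`, and at least two replicas must agree unless only one is left) is an assumption made for illustration; the slide does not spell out the exact voting rule.

```python
from collections import Counter

def vote(outputs):
    """Pick the agreed output among replicas.

    outputs: replica name -> output bytes, or None if that replica crashed.
    Returns (agreed output, names of the replicas that produced it).
    """
    alive = {name: out for name, out in outputs.items() if out is not None}
    if not alive:
        raise RuntimeError("all replicas failed")
    winner, count = Counter(alive.values()).most_common(1)[0]
    if len(alive) > 1 and count < 2:
        raise RuntimeError("no two replicas agree")
    survivors = [name for name, out in alive.items() if out == winner]
    return winner, survivors
```

Replicas whose output differs from the winner, or that crashed, are the ones a real system would kill; here they are just left out of the survivor list.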

Page 40: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

DieHard Results

Analytical results (pictures!)
- Buffer overflows
- Uninitialized reads
- Dangling pointer errors (the best)

Empirical results
- Runtime overhead
- Error avoidance
  - Injected faults & actual applications

Page 41: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Analytical Results: Buffer Overflows

- Model overflow as write of live data
- Heap half full (max occupancy)

Page 42: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Analytical Results: Buffer Overflows

- Model overflow as write of live data
- Heap half full (max occupancy)

Page 43: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Analytical Results: Buffer Overflows

- Model overflow: random write of live data
- Heap half full (max occupancy)

Page 44: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Analytical Results: Buffer Overflows

Replicas: increase odds of avoiding overflow in at least one replica

[Figure: heaps of several replicas]

Page 45: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Analytical Results: Buffer Overflows

Replicas: increase odds of avoiding overflow in at least one replica

[Figure: heaps of several replicas]

Page 46: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Analytical Results: Buffer Overflows

Replicas: increase odds of avoiding overflow in at least one replica

[Figure: heaps of several replicas]

P(Overflow in all replicas) = (1/2)^3 = 1/8
P(No overflow in ≥ 1 replica) = 1 − (1/2)^3 = 7/8

Page 47: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Analytical Results: Buffer Overflows

- F = free space
- H = heap size
- N = # objects' worth of overflow
- k = replicas

Overflow one object
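The formula graphic for this slide did not survive in the transcript. As a hedged sketch, one expression in F, H, N, and k that reproduces the 7/8 figure from the preceding slide is 1 − (1 − (F/H)^N)^k: an overflow of N objects' worth lands entirely in free space with probability (F/H)^N in one replica, and the error is masked if that happens in at least one of k independent replicas. Treat this as an assumption about the missing formula, not a quotation of it; the function name is made up for the example.

```python
from fractions import Fraction

def p_overflow_masked(free, heap, objects, replicas):
    """Probability that at least one of `replicas` independent heaps absorbs
    an overflow of `objects` objects' worth entirely in free space.

    Assumed form: 1 - (1 - (F/H)**N)**k, with F=free, H=heap, N=objects, k=replicas.
    """
    lands_in_free_space = Fraction(free, heap) ** objects
    return 1 - (1 - lands_in_free_space) ** replicas
```

With a half-full heap, a one-object overflow, and three replicas, `p_overflow_masked(1, 2, 1, 3)` evaluates to 7/8, matching the arithmetic shown earlier in the talk.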

Page 48: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software


Empirical Results: Runtime

Page 49: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software


Empirical Results: Runtime

Page 50: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Empirical Results: Error Avoidance

Injected faults:
- Dangling pointers (@50%, 10 allocations)
  - glibc: crashes; DieHard: 9/10 correct
- Overflows (@1%, 4 bytes over)
  - glibc: crashes 9/10, inf loop; DieHard: 10/10 correct

Real faults:
- Avoids Squid web cache overflow
  - Crashes BDW & glibc
- Avoids dangling pointer error in Mozilla
  - DoS in glibc & Windows

Page 51: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

DieHard Conclusion

Randomization + replicas = probabilistic memory safety
- Improves over today (0%)
- Useful point between absolute soundness (fail-stop) and unsound
- Future work: locate & fix errors automatically

Trades hardware resources (RAM, CPU) for reliability
- Hardware trends: larger memories, multi-core CPUs
- Follows in footsteps of ECC memory, RAID

Page 52: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

The End

DieHard: http://www.cs.umass.edu/~emery/diehard
- Linux, Solaris (stand-alone & replicated)
- Windows (stand-alone only)

Flux: http://flux.cs.umass.edu
- flux: from Latin fluxus, p.p. of fluere = "to flow"
- Hosted by Flux web server
- Download via Flux BitTorrent

Page 53: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software


Backup

Page 54: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Handling Errors

What if image requested doesn't exist?
- Error = negative return value from component
- Remember: nodes oblivious to Flux

Solution: error handlers
- Go to alternate paths on error
- Possible extension: can match on error paths

handle error ReadInFromDisk => FourOhFour;

[Figure: image server flow graph (Listen, ReadRequest, CheckCache, handler, ReadInFromDisk, Compress, StoreInCache, Write, Complete) with an error path from ReadInFromDisk to FourOhFour]
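The error-handler rule can be sketched as a small flow runner: each node returns a status and a value, and a negative status diverts the flow to the handler registered for that node. The runner, the `(status, value)` convention, and the node bodies below are assumptions for illustration, written in Python rather than Flux.

```python
def run_flow(nodes, handlers, request):
    """Run (name, function) nodes in order; on a negative status, take the error path.

    Each function returns (status, value). The node itself knows nothing
    about handlers; it only reports an error through its return value.
    """
    value = request
    for name, node in nodes:
        status, output = node(value)
        if status < 0:
            if name not in handlers:
                raise RuntimeError(f"unhandled error in {name}")
            return handlers[name](value)  # alternate path on error
        value = output
    return value

# Hypothetical node bodies.
IMAGES = {"Easter-bunny": b"raw-bytes"}

def read_in_from_disk(name):
    if name not in IMAGES:
        return -1, None
    return 0, IMAGES[name]

def compress(raw):
    return 0, b"jpeg:" + raw

def four_oh_four(name):
    return f"404 Not Found: {name}"

NODES = [("ReadInFromDisk", read_in_from_disk), ("Compress", compress)]
HANDLERS = {"ReadInFromDisk": four_oh_four}   # handle error ReadInFromDisk => FourOhFour
```

A request for a missing image takes the FourOhFour path, while a request for an existing one continues through Compress as usual.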

Page 55: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Flux Outline

- Intro to Flux: building a server
  - Components
  - Flows
  - Atomicity
- Performance results
  - Server performance
  - Performance prediction
- Future work

Page 56: Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf Software

Probabilistic Memory Safety

DieHard: correct execution in face of errors with high probability

Fully-randomized memory manager
- Increases odds of benign memory errors
- Ensures independent heaps across users

Replication
- Run multiple replicas simultaneously, vote on results
- Detects crashing & non-crashing errors