u niversity of m assachusetts d epartment of c omputer s cience reconsidering custom memory...

30
UNIVERSITY NIVERSITY OF OF M MASSACHUSETTS ASSACHUSETTS D DEPARTMENT EPARTMENT OF OF COMPUTER OMPUTER SCIENCE CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

Upload: tristan-cubberley

Post on 16-Dec-2015

225 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE

Reconsidering Custom Memory Allocation

Emery Berger, Ben Zorn, Kathryn McKinley

Page 2: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 2

Custom Memory Allocation

Very common practice Apache, gcc, lcc,

STL, database servers…

Language-level support in C++

Widely recommended

Programmers replace new/delete, bypassingsystem allocator Reduce runtime – often Expand functionality –

sometimes Reduce space – rarely

“Use custom allocators”

Page 3: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 3

Drawbacks of Custom Allocators

Avoiding system allocator: More code to maintain & debug Can’t use memory debuggers Not modular or robust:

Mix memory from customand general-purpose allocators → crash!

Increased burden on programmers

Are custom allocators really a win?

Page 4: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 4

Overview

Introduction Perceived benefits and drawbacks Three main kinds of custom allocators Comparison with general-purpose

allocators Advantages and drawbacks of regions Reaps – generalization of regions &

heaps

Page 5: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 5

Class1 free list

(1) Per-Class Allocators

a

b

c

a = new Class1;b = new Class1;c = new Class1;delete a;delete b;delete c;a = new Class1;b = new Class1;c = new Class1;

Recycle freed objects from a free list

+ Fast+ Linked list operations

+ Simple+ Identical semantics+ C++ language

support- Possibly space-

inefficient

Page 6: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 6

(II) Custom Patterns

Tailor-made to fit allocation patterns Example: 197.parser (natural

language parser)char[MEMORY_LIMIT]

a = xalloc(8);b = xalloc(16);c = xalloc(8);xfree(b);xfree(c);d = xalloc(8);

a b cd

end_of_arrayend_of_arrayend_of_arrayend_of_arrayend_of_arrayend_of_array

+ Fast+ Pointer-bumping allocation

- Brittle- Fixed memory size- Requires stack-like

lifetimes

Page 7: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 7

(III) Regions

+ Fast+ Pointer-bumping

allocation+ Deletion of chunks

+ Convenient+ One call frees all memory

regionmalloc(r, sz)regiondelete(r)

Separate areas, deletion only en masse

regioncreate(r) r

- Risky- Dangling

references- Too much

space

Increasingly popular custom allocator

Page 8: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 8

Overview

Introduction Perceived benefits and drawbacks Three main kinds of custom allocators Comparison with general-purpose

allocators Advantages and drawbacks of regions Reaps – generalization of regions &

heaps

Page 9: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 9

Custom Allocators Are Faster…

Runtime - Custom Allocator Benchmarks

0

0.25

0.5

0.75

1

1.25

1.5

1.75

No

rma

lize

d R

un

tim

e

Custom Win32

non-regions regions

As good as and sometimes much faster than Win32

Page 10: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 10

Not So Fast…Runtime - Custom Allocator Benchmarks

0

0.25

0.5

0.75

1

1.25

1.5

1.75

No

rma

lize

d R

un

tim

e

Custom Win32 DLmalloc

non-regions regions

DLmalloc: as fast or faster for most benchmarks

Page 11: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 11

The Lea Allocator (DLmalloc 2.7.0)

Mature public-domain general-purpose allocator

Optimized for common allocation patterns Per-size quicklists ≈ per-class allocation

Deferred coalescing(combining adjacent free objects) Highly-optimized fastpath

Space-efficient

Page 12: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 12

Space Consumption: Mixed Results

Space - Custom Allocator Benchmarks

0

0.25

0.5

0.751

1.25

1.5

1.75

197.

pars

er

boxe

d-sim

c-br

eeze

175.

vpr

176.

gcc

apac

he lcc

mud

lle

No

rmal

ized

Sp

ace

Custom DLmalloc

regionsnon-regions

Page 13: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 13

Overview

Introduction Perceived benefits and drawbacks Three main kinds of custom allocators Comparison with general-purpose

allocators Advantages and drawbacks of regions Reaps – generalization of regions &

heaps

Page 14: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 14

Regions – Pros and Cons

+ Fast, convenient, etc.+ Avoid resource leaks (e.g., Apache)

Tear down memory for terminated connections

- No individual object deletion Unbounded memory consumption

(producer-consumer, long-running computations, off-the-shelf programs)

Apache: vulnerable to DoS, memory leaks

Page 15: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 15

Reap = region + heap Adds individual object deletion & heap

Reap Hybrid Allocator

reapmalloc(r, sz)

reapdelete(r)

reapcreate(r)r

reapfree(r,p)

+ Can reduce memory consumption+ Fast

+ Adapts to use (region or heap style)+ Cheap deletion

Page 16: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 16

Reap Runtime

Runtime - Custom Allocation Benchmarks

00.25

0.50.75

11.25

1.51.75

197.

pars

er

boxe

d-sim

c-br

eeze

175.

vpr

176.

gcc

apac

he lcc

mud

lle

No

rma

lize

d r

un

tim

e

Custom Win32 DLmalloc Reap

non-regions regions

Page 17: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 17

Reap Space

Space - Custom Allocator Benchmarks

00.25

0.50.75

11.25

1.51.75

No

rmal

ized

Sp

ace

Custom DLmalloc Reap

non-regions regions

Page 18: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 18

Reap: Best of Both Worlds

Allows mixing of regions and new/delete

Case study: New Apache module “mod_bc”

bc: C-based arbitrary-precision calculator

Changed 20 lines out of 8000 Benchmark: compute 1000th prime

With Reap: 240K Without Reap: 7.4MB

Page 19: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 19

Conclusions and Future Work

Empirical study of custom allocators Lea allocator often as fast or faster Non-region custom allocation

ineffective Reap: region performance without

drawbacks Future work:

Reduce space with per-page bitmaps Combine with scalable general-purpose

allocator (e.g., Hoard)

Page 20: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 20

Software

http://www.cs.umass.edu/~emery(Reap: part of Heap Layers

distribution)

http://g.oswego.edu(DLmalloc 2.7.0)

Page 21: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 21

If You Can Read This,I Went Too Far

Page 22: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 22

Backup Slides

Page 23: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 23

Experimental Methodology

Comparing to general-purpose allocators Same semantics: no problem

E.g., disable per-class allocators Different semantics: use emulator

Uses general-purpose allocator Adds bookkeeping to supportregion semantics

Page 24: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 24

Why Did They Do That?

Recommended practice Premature optimization

Microbenchmarks vs. actual performance

Drift Not bottleneck anymore

Improved competition Modern allocators are better

Page 25: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 25

Reaps as Regions: Runtime

Runtime - Region-Based Benchmarks

0

0.25

0.5

0.75

1

1.25

1.5

1.75

lcc mudlle

No

rma

lize

d R

un

tim

e

Custom Win32 DLmalloc Reap

Reap performance nearly matches regions

Page 26: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 26

Using Reap as Regions

Runtime - Region-Based Benchmarks

0

0.5

1

1.5

2

2.5

lcc mudlle

No

rma

lize

d R

un

tim

e

Original Win32 DLmalloc WinHeap Vmalloc Reap

4.08

Reap performance nearly matches regions

Page 27: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 27

Drawbacks of Regions

Can’t reclaim memory within regions Bad for long-running computations,

producer-consumer patterns, “malloc/free” programs

unbounded memory consumption

Current situation for Apache: vulnerable to denial-of-service limits runtime of connections limits module programming

Page 28: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 28

Use Custom Allocators?

Strongly recommended by practitioners

Little hard data on performance/space improvements Only one previous study [Zorn 1992] Focused on just one type of allocator Custom allocators: waste of time

Small gains, bad allocators Different allocators better? Trade-

offs?

Page 29: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 29

Kinds of Custom Allocators

Three basic types of custom allocators Per-class

Fast Custom patterns

Fast, but very special-purpose Regions

Fast, possibly more space-efficient Convenient Variants: nested, obstacks

Page 30: U NIVERSITY OF M ASSACHUSETTS D EPARTMENT OF C OMPUTER S CIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 30

Optimization Opportunity

Time Spent in Memory Operations

0

20

40

60

80

100

% o

f ru

nti

me

Memory Operations Other