u niversity of m assachusetts d epartment of c omputer s cience reconsidering custom memory...
TRANSCRIPT
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE
Reconsidering Custom Memory Allocation
Emery Berger, Ben Zorn, Kathryn McKinley
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 2
Custom Memory Allocation
Very common practice Apache, gcc, lcc,
STL, database servers…
Language-level support in C++
Widely recommended
Programmers replace new/delete, bypassingsystem allocator Reduce runtime – often Expand functionality –
sometimes Reduce space – rarely
“Use custom allocators”
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 3
Drawbacks of Custom Allocators
Avoiding system allocator: More code to maintain & debug Can’t use memory debuggers Not modular or robust:
Mix memory from customand general-purpose allocators → crash!
Increased burden on programmers
Are custom allocators really a win?
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 4
Overview
Introduction Perceived benefits and drawbacks Three main kinds of custom allocators Comparison with general-purpose
allocators Advantages and drawbacks of regions Reaps – generalization of regions &
heaps
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 5
Class1 free list
(1) Per-Class Allocators
a
b
c
a = new Class1;b = new Class1;c = new Class1;delete a;delete b;delete c;a = new Class1;b = new Class1;c = new Class1;
Recycle freed objects from a free list
+ Fast+ Linked list operations
+ Simple+ Identical semantics+ C++ language
support- Possibly space-
inefficient
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 6
(II) Custom Patterns
Tailor-made to fit allocation patterns Example: 197.parser (natural
language parser)char[MEMORY_LIMIT]
a = xalloc(8);b = xalloc(16);c = xalloc(8);xfree(b);xfree(c);d = xalloc(8);
a b cd
end_of_arrayend_of_arrayend_of_arrayend_of_arrayend_of_arrayend_of_array
+ Fast+ Pointer-bumping allocation
- Brittle- Fixed memory size- Requires stack-like
lifetimes
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 7
(III) Regions
+ Fast+ Pointer-bumping
allocation+ Deletion of chunks
+ Convenient+ One call frees all memory
regionmalloc(r, sz)regiondelete(r)
Separate areas, deletion only en masse
regioncreate(r) r
- Risky- Dangling
references- Too much
space
Increasingly popular custom allocator
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 8
Overview
Introduction Perceived benefits and drawbacks Three main kinds of custom allocators Comparison with general-purpose
allocators Advantages and drawbacks of regions Reaps – generalization of regions &
heaps
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 9
Custom Allocators Are Faster…
Runtime - Custom Allocator Benchmarks
0
0.25
0.5
0.75
1
1.25
1.5
1.75
No
rma
lize
d R
un
tim
e
Custom Win32
non-regions regions
As good as and sometimes much faster than Win32
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 10
Not So Fast…Runtime - Custom Allocator Benchmarks
0
0.25
0.5
0.75
1
1.25
1.5
1.75
No
rma
lize
d R
un
tim
e
Custom Win32 DLmalloc
non-regions regions
DLmalloc: as fast or faster for most benchmarks
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 11
The Lea Allocator (DLmalloc 2.7.0)
Mature public-domain general-purpose allocator
Optimized for common allocation patterns Per-size quicklists ≈ per-class allocation
Deferred coalescing(combining adjacent free objects) Highly-optimized fastpath
Space-efficient
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 12
Space Consumption: Mixed Results
Space - Custom Allocator Benchmarks
0
0.25
0.5
0.751
1.25
1.5
1.75
197.
pars
er
boxe
d-sim
c-br
eeze
175.
vpr
176.
gcc
apac
he lcc
mud
lle
No
rmal
ized
Sp
ace
Custom DLmalloc
regionsnon-regions
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 13
Overview
Introduction Perceived benefits and drawbacks Three main kinds of custom allocators Comparison with general-purpose
allocators Advantages and drawbacks of regions Reaps – generalization of regions &
heaps
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 14
Regions – Pros and Cons
+ Fast, convenient, etc.+ Avoid resource leaks (e.g., Apache)
Tear down memory for terminated connections
- No individual object deletion Unbounded memory consumption
(producer-consumer, long-running computations, off-the-shelf programs)
Apache: vulnerable to DoS, memory leaks
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 15
Reap = region + heap Adds individual object deletion & heap
Reap Hybrid Allocator
reapmalloc(r, sz)
reapdelete(r)
reapcreate(r)r
reapfree(r,p)
+ Can reduce memory consumption+ Fast
+ Adapts to use (region or heap style)+ Cheap deletion
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 16
Reap Runtime
Runtime - Custom Allocation Benchmarks
00.25
0.50.75
11.25
1.51.75
197.
pars
er
boxe
d-sim
c-br
eeze
175.
vpr
176.
gcc
apac
he lcc
mud
lle
No
rma
lize
d r
un
tim
e
Custom Win32 DLmalloc Reap
non-regions regions
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 17
Reap Space
Space - Custom Allocator Benchmarks
00.25
0.50.75
11.25
1.51.75
No
rmal
ized
Sp
ace
Custom DLmalloc Reap
non-regions regions
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 18
Reap: Best of Both Worlds
Allows mixing of regions and new/delete
Case study: New Apache module “mod_bc”
bc: C-based arbitrary-precision calculator
Changed 20 lines out of 8000 Benchmark: compute 1000th prime
With Reap: 240K Without Reap: 7.4MB
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 19
Conclusions and Future Work
Empirical study of custom allocators Lea allocator often as fast or faster Non-region custom allocation
ineffective Reap: region performance without
drawbacks Future work:
Reduce space with per-page bitmaps Combine with scalable general-purpose
allocator (e.g., Hoard)
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 20
Software
http://www.cs.umass.edu/~emery(Reap: part of Heap Layers
distribution)
http://g.oswego.edu(DLmalloc 2.7.0)
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 21
If You Can Read This,I Went Too Far
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 22
Backup Slides
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 23
Experimental Methodology
Comparing to general-purpose allocators Same semantics: no problem
E.g., disable per-class allocators Different semantics: use emulator
Uses general-purpose allocator Adds bookkeeping to supportregion semantics
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 24
Why Did They Do That?
Recommended practice Premature optimization
Microbenchmarks vs. actual performance
Drift Not bottleneck anymore
Improved competition Modern allocators are better
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 25
Reaps as Regions: Runtime
Runtime - Region-Based Benchmarks
0
0.25
0.5
0.75
1
1.25
1.5
1.75
lcc mudlle
No
rma
lize
d R
un
tim
e
Custom Win32 DLmalloc Reap
Reap performance nearly matches regions
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 26
Using Reap as Regions
Runtime - Region-Based Benchmarks
0
0.5
1
1.5
2
2.5
lcc mudlle
No
rma
lize
d R
un
tim
e
Original Win32 DLmalloc WinHeap Vmalloc Reap
4.08
Reap performance nearly matches regions
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 27
Drawbacks of Regions
Can’t reclaim memory within regions Bad for long-running computations,
producer-consumer patterns, “malloc/free” programs
unbounded memory consumption
Current situation for Apache: vulnerable to denial-of-service limits runtime of connections limits module programming
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 28
Use Custom Allocators?
Strongly recommended by practitioners
Little hard data on performance/space improvements Only one previous study [Zorn 1992] Focused on just one type of allocator Custom allocators: waste of time
Small gains, bad allocators Different allocators better? Trade-
offs?
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 29
Kinds of Custom Allocators
Three basic types of custom allocators Per-class
Fast Custom patterns
Fast, but very special-purpose Regions
Fast, possibly more space-efficient Convenient Variants: nested, obstacks
UUNIVERSITYNIVERSITY OFOF M MASSACHUSETTSASSACHUSETTS • • D DEPARTMENTEPARTMENT OF OF CCOMPUTER OMPUTER SSCIENCECIENCE 30
Optimization Opportunity
Time Spent in Memory Operations
0
20
40
60
80
100
% o
f ru
nti
me
Memory Operations Other