How’s the Parallel Computing Revolution Going?
DESCRIPTION
How’s the Parallel Computing Revolution Going? Kathryn S. McKinley, The University of Texas at Austin. 20th Century Simplicity. Hardware: software does not change, it just runs faster. 20th Century Simplicity. Software: hardware does not change, it just runs faster.
TRANSCRIPT
How’s the Parallel Revolution Going? 1
How’s the Parallel Computing Revolution Going?
McKinley
Kathryn S. McKinley, The University of Texas at Austin
20th Century Simplicity
Hardware: software does not change, it just runs faster
20th Century Simplicity
Software: hardware does not change, it just runs faster
How could they pretend?
20th Century Virtuous Cycle
Hardware Capabilities & Complexity <-> sequential interface <-> Software Capabilities & Complexity
The sequential interface hid an explosion in capability & complexity
20th Century Languages
Insufficient for software complexity
Native Programming Languages
21st Century Managed Language Revolution
PHP
20th Century Virtuous Cycle
Hardware Capabilities & Complexity <-> sequential interface <-> Managed Languages, Software Capabilities
Processor Evolution
Power 4: 1.3 GHz, 130 nm, 174M transistors, 267 mm², 2 cores (2001)
Power 5: 2.3 GHz, 90 nm, 276M transistors, 389 mm², 2 cores (2005)
i7: 2.7 GHz, 45 nm, 731M transistors, 263 mm², 4 cores x 2 SMT (2008)
i5: 3.4 GHz, 32 nm, 382M transistors, 81 mm², 2C x 2T (2010)
Processor Evolution: Why multicore?
On-chip power constraints & wire delay slowed clock scaling
20th Century Virtuous Cycle
Hardware Capabilities & Complexity <-> sequential interface <-> Managed Languages, Software Capabilities
✗ Both sequential-interface links are crossed out
21st Century Virtuous Cycle?
Parallel Hardware Capabilities <-> parallel interface <-> Managed Languages (?), Software Capabilities
21st Century Virtuous Cycle?
Parallel interface combines time and space: wicked to program
[Figure: cache hierarchies. Pentium 4 w/ SMT: two hardware threads sharing an 8KB L1 and a 512KB L2. Core 2 Quad: four CPUs with 32KB L1 caches, each pair sharing a 4MB L2, connected by a system bus. Core i7: four CPUs, each with a 32KB L1 and a 256KB L2, sharing an 8MB L3.]
How is this new virtuous cycle going?
What should we measure?
Performance, power, energy
Native languages, managed languages
Sequential & parallel programs
How do we measure power?
Measured Power, Performance & Scaling
Esmaeilzadeh et al.
Looking Back on the Language & Hardware Revolutions: Measured Power, Performance, and Scaling
ASPLOS 2011
Stephen M. Blackburn, Australian National University
Kathryn S. McKinley, University of Texas at Austin
Hadi Esmaeilzadeh, University of Washington
Ting Cao, Australian National University
Xi Yang, Australian National University
Workload: 4 groups weighted equally
61 benchmarks from 6 suites
• Native Non-Scalable: SPEC CPU2006
• Native Scalable: PARSEC 2008
• Java Non-Scalable: SPECjvm98, JBB’05, DaCapo’06
• Java Scalable: DaCapo’09
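One way to weight four groups of unequal size equally is to summarize each group on its own and then combine the four group summaries. The sketch below assumes each group is summarized by the geometric mean of its per-benchmark ratios and that the group means are combined the same way; the slides do not say which mean the study used, and the group names and ratios are hypothetical.

```python
from math import prod


def geomean(values):
    """Geometric mean of positive ratios such as speedups or energy ratios."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return prod(values) ** (1.0 / len(values))


def equally_weighted(groups):
    """Summarize each group separately, then give every group the same weight,
    no matter how many benchmarks it contains."""
    return geomean(geomean(ratios) for ratios in groups.values())


# Hypothetical per-benchmark ratios, not measured data.
groups = {
    "native_non_scalable": [1.0, 4.0],
    "native_scalable": [2.0],
    "java_non_scalable": [8.0, 2.0],
    "java_scalable": [1.0],
}
print(equally_weighted(groups))
```

A single geometric mean over all six ratios would instead let groups with more benchmarks count for more.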
Intel Processors
5 technology generations from similar price points
Pentium 4: 130 nm, 55M transistors, 131 mm², 1C x 2T (2003)
Core 2 Duo: 65 nm, 291M transistors, 143 mm², 2C (2006)
i7: 45 nm, 731M transistors, 263 mm², 4C x 2T (2008)
Atom: 45 nm, 47M transistors, 36 mm², 1C x 2T (2008)
Core 2 Duo: 45 nm, 228M transistors, 82 mm², 2C (2009)
Atom D: 45 nm, 176M transistors, 87 mm², 2C x 2T + GPU (2009)
i5: 32 nm, 382M transistors, 81 mm², 2C x 2T (2010)
TDP & Measured Power (Esmaeilzadeh et al.)
[Figure: measured power (W, log scale) vs. TDP (W, log scale) for P4 (130), C2D (65), C2Q (65), i7 (45), Atom (45), C2D (45), AtomD (45), i5 (32); parenthesized numbers are process nodes in nm]
Measured Power vs Performance (Esmaeilzadeh et al.)
[Figure: power (W) vs. performance / reference performance for Pentium 4 (130), 2003; Core 2 Duo (65), 2006; Core 2 Duo (45), 2008; i7 (45), 2008; i5 (32), 2010; the plot carries a "??" annotation]
How is this new virtuous cycle going for native non-scalable?
Native Non-Scalable Performance
[Figure: performance / 1C1T performance (0 to 5) for 2C1T, 4C1T, and 4C2T on SPEC CPU2006: 470.lbm, 465.tonto, 437.leslie3d, 435.gromacs, 434.zeusmp, 462.libquantum, 464.h264ref, 445.gobmk, 458.sjeng, 459.GemsFDTD, 416.gamess, 444.namd, 436.cactusADM, 400.perlbench, 454.calculix, 401.bzip2, 447.dealII, 483.xalancbmk, 482.sphinx3, 456.hmmer, 471.omnetpp, 453.povray, 429.mcf, 473.astar, 403.gcc, 450.soplex, 433.milc]
Native Non-Scalable Energy
[Figure: energy / 1C1T energy (0 to 1.8) for 2C1T, 4C1T, and 4C2T on the same SPEC CPU2006 benchmarks]
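Both the performance and the energy figures normalize each configuration to the single-core, single-thread (1C1T) run. Because energy is average power multiplied by execution time, a configuration can draw more power and still use less energy if it finishes enough sooner. A minimal sketch of that arithmetic follows; the timings and power readings are hypothetical, not measurements from the study.

```python
def normalize(config, baseline):
    """Return (performance ratio, energy ratio) of config relative to baseline.

    Each argument is a (seconds, average_watts) pair for the same benchmark.
    Performance is the inverse of execution time; energy = power x time.
    """
    seconds, watts = config
    base_seconds, base_watts = baseline
    performance_ratio = base_seconds / seconds  # > 1: faster than the baseline
    energy_ratio = (watts * seconds) / (base_watts * base_seconds)  # < 1: less energy
    return performance_ratio, energy_ratio


one_core = (10.0, 20.0)         # hypothetical 1C1T run: 10 s at 20 W = 200 J
single_threaded = (10.0, 30.0)  # hypothetical: no speedup, higher average power
scalable = (4.0, 40.0)          # hypothetical: 2.5x faster at twice the power

print(normalize(single_threaded, one_core))  # performance unchanged, energy up
print(normalize(scalable, one_core))         # faster and less energy
```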
How is this new virtuous cycle going for Java single threaded?
Java Single Threaded Performance
[Figure: performance / 1C1T performance (0 to 5) for 2C1T, 4C1T, and 4C2T on antlr, fop, luindex, _209_db, bloat, _228_jack, _213_javac, _202_jess, _222_mpegaudio, _201_compress]
Java Single Threaded Energy
[Figure: energy / 1C1T energy (0 to 1.8) for 2C1T, 4C1T, and 4C2T on the same Java single-threaded benchmarks]
How is this new virtuous cycle going for native scalable?
Native Scalable Performance
[Figure: performance / 1C1T performance (0 to 5) for 2C1T, 4C1T, and 4C2T on PARSEC: ferret, swaptions, blackscholes, raytrace, fluidanimate, x264, facesim, bodytrack, streamcluster, vips, canneal]
Native Scalable Energy
[Figure: energy / 1C1T energy (0 to 1.8) for 2C1T, 4C1T, and 4C2T on the same PARSEC benchmarks]
How is this new virtuous cycle going for Java scalable?
Java Multithreaded Performance
[Figure: performance / 1C1T performance (0 to 5) for 2C1T, 4C1T, and 4C2T on sunflow, tomcat, xalan, lusearch, eclipse, pjbb2005, _227_mtrt, tradebeans, jython, batik, avrora, pmd, h2]
Java Multithreaded Energy
[Figure: energy / 1C1T energy (0 to 1.8) for 2C1T, 4C1T, and 4C2T on the same Java multithreaded benchmarks]
Is there hope?
Parallel interface combines time and space: wicked to program
[Figure: cache hierarchies of the Pentium 4 w/ SMT, Core 2 Quad, and Core i7]
Vision
• Algorithms must be space and time efficient
• Scalable Runtimes
  – Runtime & application parallelism & concurrency
  – CMP-aware runtime improves application scalability
• Communication
  – Cache coherency is expensive and performance sensitive
  – Memory bandwidth scaling is problematic
• Heterogeneity
  – Move work that is not on the critical path off power-hungry cores
  – Smarter, more aggressive analysis
• Specialization?
  – Tuned cores? Special purpose cores?
Managed Languages: Challenges & Opportunities
Must start with a scalable managed runtime
Sequential Managed Programs
[Figure: application and managed runtime sharing a single core over time]
• Profiling
• Dynamic Analysis
• Compilation
• Garbage Collection
• Other Helper Threads
• …
Steps towards scalability
Step 1. Parallel application
[Figure: application threads on cores 0-7 over time]
Unused cores: each thread has a different running time
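The unused cores in Step 1 can be put into numbers. Assuming each application thread runs alone on its own core and the program finishes when its slowest thread does, utilization is total busy time divided by cores times elapsed time. A small sketch with hypothetical thread times:

```python
def core_utilization(thread_times, cores):
    """Fraction of available core-time spent running application threads.

    Assumes each thread runs alone on its own core and the program's elapsed
    time equals the running time of the slowest thread.
    """
    if not thread_times or len(thread_times) > cores:
        raise ValueError("sketch assumes 1 <= threads <= cores")
    elapsed = max(thread_times)
    return sum(thread_times) / (cores * elapsed)


# Hypothetical: four threads of unequal length on an eight-core machine.
print(core_utilization([4.0, 3.0, 2.0, 1.0], cores=8))  # 10 / 32 = 0.3125
```

Both idle cores and uneven thread lengths lower the result; only equal-length threads on every core reach 1.0.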
Steps towards scalability
Step 2. Parallel runtime
[Figure: managed application threads and runtime threads on cores 0-7 over time]
Runtime waits for all application threads to pause
Steps towards scalability
Step 3. Parallel & concurrent runtime
[Figure: managed application threads and runtime threads on cores 0-7 over time]
Managed runtime on the application’s critical path may perturb performance
Steps towards scalability: ideal model
Step 4. Minimize perturbation
[Figure: application threads and concurrent analysis threads on cores 0-7 over time]
Offload work to concurrent runtime threads
Whole runtime task taken off the critical path
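Offloading runtime work usually means the application thread does only a cheap hand-off and a helper thread does the analysis concurrently. The sketch below shows that structure with a standard-library queue and a profiling "analysis"; the function and event names are hypothetical, and because of CPython's global interpreter lock it illustrates the shape of the design, not a parallel speedup.

```python
import queue
import threading
from collections import Counter


def run_with_offloaded_profiling(calls):
    """Run a toy 'application' loop while a helper thread builds a call profile.

    The application thread only enqueues one event per call; the helper thread
    aggregates concurrently, off the application's critical path.
    """
    events = queue.Queue()
    profile = Counter()
    stop = object()  # sentinel that tells the helper thread to finish

    def helper():
        while True:
            event = events.get()
            if event is stop:
                return
            profile[event] += 1  # the "analysis" work

    worker = threading.Thread(target=helper)
    worker.start()

    result = 0
    for name in calls:
        result += len(name)  # stand-in for real application work
        events.put(name)     # cheap hand-off instead of inline analysis
    events.put(stop)
    worker.join()            # profile is complete and safe to read after join
    return result, profile


result, profile = run_with_offloaded_profiling(["parse", "eval", "eval"])
print(result, profile.most_common(1))
```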
Steps towards scalability: ideal model
Step 4. Minimize perturbation
[Figure: application threads and concurrent analysis threads on cores 0-7 over time]
Worst case is parallel & concurrent
Scalable VM Services
• Profiling (feedback-directed optimization)
  – Concurrent analysis
  – More invasive analysis on low-power cores
  – J. Ha et al. OOPSLA’09, Bond et al. PLDI’10, etc.
• GC
  – High performance parallel & concurrent GC
  – High performance mostly non-moving GC
  – Reduced synchronization overheads
  – Distributed & scratchpad GC
  – Blackburn et al. PLDI’10, CACM’08, PLDI’08, SIGMETRICS’04, etc.
• JIT
  – Concurrent, parallel JIT
  – Cost-benefit shift with low-power cores
  – Ha et al. PESPMA’09
• Architecture
  – Tuned and/or specialized cores for runtime services
  – Coherence tailored for restricted, common case of GC
Today
• Profiling (feedback-directed optimization)
  – Concurrent analysis
  – More invasive analysis on low-power cores
  – J. Ha et al. OOPSLA’09, Bond et al. PLDI’10, etc.
• GC
  – High performance parallel & concurrent GC
  – High performance mostly non-moving GC
  – Reduced synchronization overheads
  – Distributed & scratchpad GC
  – Blackburn et al. PLDI’10, CACM’08, PLDI’08, SIGMETRICS’04, etc.
• JIT
  – Concurrent, parallel JIT
  – Cost-benefit shift with low-power cores
  – Ha et al. PESPMA’09
• Architecture
  – Tuned and/or specialized cores for runtime services
  – Coherence tailored for restricted, common case of GC
Garbage Collection
Isn’t Garbage Collection retro?
Canonical algorithms:
• Mark-Sweep, McCarthy, 1960
• Mark-Compact, Styger, 1967
• Semi-Space, Cheney, 1970
Programmer Productivity
Programmer Productivity & Performance?
GC Fundamentals: Algorithmic Components
• Allocation: bump allocation, free list
• Identification: tracing (implicit), reference counting (explicit)
• Reclamation: sweep-to-free, compact, evacuate
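The two allocation components differ in how much work each allocation does and in where consecutive objects land. A minimal sketch over integer addresses: the bump allocator advances a cursor through one free region, so consecutive objects are adjacent, while the free-list allocator searches for a cell that fits. The first-fit policy and the class names are assumptions for illustration; production free-list allocators often use segregated size classes instead.

```python
class BumpAllocator:
    """Contiguous allocation: hand out the next bytes of a free region."""

    def __init__(self, start, limit):
        self.cursor = start
        self.limit = limit

    def alloc(self, size):
        if self.cursor + size > self.limit:
            return None  # region exhausted: get a new region or trigger GC
        address = self.cursor
        self.cursor += size
        return address


class FirstFitFreeList:
    """Free-list allocation: search (address, size) free cells for a fit."""

    def __init__(self, cells):
        self.cells = list(cells)

    def alloc(self, size):
        for i, (address, cell_size) in enumerate(self.cells):
            if cell_size >= size:
                if cell_size == size:
                    del self.cells[i]  # cell used up exactly
                else:
                    self.cells[i] = (address + size, cell_size - size)  # split
                return address
        return None  # no cell large enough


bump = BumpAllocator(0, 64)
print(bump.alloc(16), bump.alloc(16), bump.alloc(40))  # 0 16 None
free_list = FirstFitFreeList([(0, 8), (32, 32)])
print(free_list.alloc(16), free_list.alloc(8), free_list.cells)  # 32 0 [(48, 16)]
```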
GC Fundamentals: Canonical Garbage Collectors
• Mark-Sweep [McCarthy 1960]: free-list + trace + sweep-to-free
• Mark-Compact [Styger 1967]: bump allocation + trace + compact
• Semi-Space [Cheney 1970]: bump allocation + trace + evacuate
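In Semi-Space, tracing and evacuation happen together: each live object is copied to to-space the first time it is reached, and the copied region doubles as the work queue. The sketch below follows Cheney's breadth-first scheme on a toy heap of named objects rather than raw memory; appending to a Python list stands in for bump-allocating in to-space, and the heap representation and names are assumptions for illustration.

```python
def cheney_copy(roots, from_space):
    """Semi-space evacuation in the style of Cheney's breadth-first copy.

    from_space maps an object id to the ids it references. Returns
    (new_roots, to_space, forwarding): to_space lists copied objects as
    [old_id, child indices] in copy order, and forwarding maps old ids to
    their to-space index. Unreachable objects are never copied.
    """
    to_space = []
    forwarding = {}

    def evacuate(old_id):
        if old_id not in forwarding:  # first visit: copy and forward
            forwarding[old_id] = len(to_space)
            to_space.append([old_id, list(from_space[old_id])])
        return forwarding[old_id]

    new_roots = [evacuate(r) for r in roots]
    scan = 0  # objects before scan have updated references; the rest are queued
    while scan < len(to_space):
        obj = to_space[scan]
        obj[1] = [evacuate(child) for child in obj[1]]  # redirect references
        scan += 1
    return new_roots, to_space, forwarding


heap = {"a": ["b", "c"], "b": ["a"], "c": [], "d": ["c"]}  # "d" is garbage
new_roots, to_space, forwarding = cheney_copy(["a"], heap)
print(new_roots, to_space)  # [0] [['a', [1, 2]], ['b', [0]], ['c', []]]
```

The work is proportional to live objects only, but the collector needs a second space as large as the first, which is the space inefficiency the next slide points out.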
Garbage Collection: Performance Pathologies of Mark-Sweep, Mark-Compact, Semi-Space
[Figure: total performance and mutator time (time vs. space), and minimum heap (space), for SemiSpace, MarkCompact, and MarkSweep; geometric mean of DaCapo’06, jvm98, and jbb2000 on a 2.4 GHz Core 2 Duo]
• Mark-Sweep: poor locality
• Semi-Space: space inefficient
• Mark-Compact: expensive multi-pass
Can we have space and time efficiency?
Mark-Region, PLDI 2008
Kathryn S. McKinley, University of Texas at Austin
Stephen M. Blackburn, Australian National University
Mark-Region with Sweep-to-Region
Reclamation: sweep-to-free, compact, evacuate, sweep-to-region
• Mark-Sweep: free-list + trace + sweep-to-free
• Mark-Compact: bump allocation + trace + compact
• Semi-Space: bump allocation + trace + evacuate
• Mark-Region: bump allocation + trace + sweep-to-region
Naïve Mark-Region
• Contiguous allocation into regions
  – Excellent locality
  – Objects cannot span regions
• Simple mark phase
  – Mark objects and their region
• Free unmarked regions
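The naïve scheme above can be sketched in a few lines: trace from the roots, mark each reached object together with the region that holds it, then free every region that received no mark. The heap representation (address mapped to referenced addresses) and the parameter names are assumptions for illustration; because objects cannot span regions, an object's region is simply its address divided by the region size.

```python
def naive_mark_region(heap, roots, region_size, num_regions):
    """Trace from roots, marking each live object and its region, then
    return the regions that can be freed whole.

    heap maps an object's address to the addresses it references; objects
    never span regions, so an object's region is address // region_size.
    """
    marked_objects = set()
    marked_regions = set()
    stack = list(roots)
    while stack:  # mark phase
        address = stack.pop()
        if address in marked_objects:
            continue
        marked_objects.add(address)
        marked_regions.add(address // region_size)  # mark the region too
        stack.extend(heap[address])
    return set(range(num_regions)) - marked_regions  # free unmarked regions


heap = {0: [300], 300: [], 600: [0], 700: []}  # 600 and 700 are unreachable
print(naive_mark_region(heap, roots=[0], region_size=256, num_regions=4))  # {2, 3}
```

A single live object keeps its whole region from being freed, so region size directly trades contiguous allocation against fragmentation.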
Region Size? Lines and Blocks
• Small regions: ✗ Fragmentation (can’t fill blocks)
• Large regions: ✓ More contiguous allocation; ✗ Fragmentation (false marking)
• Lines & Blocks: block = N pages, line = approx. 1 cache line
  – ✓ Less fragmentation: objects span lines
  – ✓ Fast common case: lines marked with objects
  – ✗ Increased metadata o/h; ✗ Constrained object sizes
  – TLB locality, cache locality; block > 4 x max object size
[Figure: blocks containing free and recyclable lines]
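With lines and blocks, the collector marks the lines that live objects touch, and the allocator later bump-allocates into runs of unmarked lines ("holes") within a block. A minimal sketch of both halves follows; the 128-byte line and 32 KB block sizes are assumptions chosen for illustration, and the function names are hypothetical.

```python
LINE_SIZE = 128        # assumed bytes per line (on the order of a cache line)
LINES_PER_BLOCK = 256  # assumed: 256 x 128 B = 32 KB blocks


def empty_block():
    """Line-mark table for a fresh block: no line is live."""
    return [False] * LINES_PER_BLOCK


def mark_lines(line_marks, offset, size):
    """Mark every line touched by a live object at byte offset within a block.
    Objects may span lines, so one object can mark several lines."""
    first = offset // LINE_SIZE
    last = (offset + size - 1) // LINE_SIZE
    for line in range(first, last + 1):
        line_marks[line] = True


def holes(line_marks):
    """Yield (first_line, line_count) for each run of unmarked lines.
    An allocator can bump-allocate contiguously within each hole."""
    start = None
    for i, marked in enumerate(line_marks):
        if not marked and start is None:
            start = i
        elif marked and start is not None:
            yield start, i - start
            start = None
    if start is not None:
        yield start, len(line_marks) - start


block = empty_block()
mark_lines(block, offset=130, size=200)  # touches lines 1 and 2
mark_lines(block, offset=1024, size=64)  # touches line 8
print(list(holes(block)))  # [(0, 1), (3, 5), (9, 247)]
```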
Allocation Policy (Recycling)
• Recycle partially marked blocks first
  – Minimizes fragmentation
  – Maximizes sharing of freed blocks
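The recycling policy can be read as an ordering over blocks after a collection. A sketch, assuming each block is described only by its line marks: partially marked (recyclable) blocks are handed out first so their holes get filled, and completely free blocks come last, which keeps them whole. That is one plausible reading of "maximizes sharing of freed blocks", for example if free blocks can be handed to other allocators. Block names are hypothetical.

```python
def allocation_order(blocks):
    """Order in which a mark-region allocator could hand out blocks:
    recyclable (partially marked) blocks first, then completely free blocks.
    Fully marked blocks have no room and are skipped.

    blocks maps a block id to its list of line marks (True = live line).
    """
    recyclable = [b for b, marks in blocks.items() if any(marks) and not all(marks)]
    free = [b for b, marks in blocks.items() if not any(marks)]
    return recyclable + free


blocks = {
    "A": [True, True, True],     # full
    "B": [False, False, False],  # free
    "C": [True, False, False],   # recyclable
    "D": [False, True, False],   # recyclable
}
print(allocation_order(blocks))  # ['C', 'D', 'B']
```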
Immix Mark-Region
• Parallel
• Opportunistic defragmentation
• Overflow allocation
• Implicit marking
Garbage Collection
Immix Mark-Region: bump allocation + trace + sweep-to-region
[Figure: total performance and mutator time (time vs. space), and minimum heap (space), for MarkSweep, MarkCompact, SemiSpace, and Immix; geometric mean of DaCapo’06, jvm98, and jbb2000 on a 2.4 GHz Core 2 Duo]
✓ Simple, very fast collection
✓ Space efficient
✓ Good locality
✓ Excellent performance
A Better Space-Time Tradeoff 65
Space & time efficiency: Why now?
The Present
Parallel interface combines space & time: wicked to program
[Figure: cache hierarchies of the Pentium 4 w/ SMT, Core 2 Quad, and Core i7]
The Future: A parallel ecosystem?
Space & time efficiency
Parallel software stack: runtime, applications, algorithms
Software Challenges and Opportunities
• Communication (efficient coherency)
• Analysis (off critical path, new analyses)
• GC (concurrent, parallel, high throughput)
• JIT (concurrent, parallel, more aggressive)
• Heterogeneity (exploit it)
• Memory (PCM, bandwidth limits)
Hardware Challenges and Opportunities
• Heterogeneity
  – Tune cores to specific workloads?
  – Specialize for workloads?
• Coherence
  – SMT coherency does not scale
  – Software guarantees for simplified protocols?
• Memory/Cache
  – Optimize access behavior of managed languages
The Future?
Thank you