the dark silicon implications for microprocessors
DESCRIPTION
The Dark Silicon Implications for Microprocessors. Karu Sankaralingam University of Wisconsin-Madison Collaborators: Hadi Esmaeilzadeh , Emily Blem , Renee St. Amant , and Doug Burger. Multicore Decade?. We have relied on multicore scaling for over five years. ?. 2000200520102015. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/1.jpg)
Karu SankaralingamUniversity of Wisconsin-Madison
Collaborators: Hadi Esmaeilzadeh, Emily Blem, Renee
St. Amant, and Doug Burger
The Dark Silicon Implications for Microprocessors
![Page 2: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/2.jpg)
Multicore Decade?
2
We have relied on multicore scaling for over five years.
How much longer will it be our primary performance scaling technique?
2000 2005 2010 2015Pentium Extreme
Dual-Core
Core 2 Quad-Core
i7 980x Hex-Core
?
![Page 3: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/3.jpg)
Finding Optimal Multicore Designs
3
Comprehensive design space:Fixed area budgetFixed power budgetTwo sets of CMOS scaling projectionsOptimal core and diverse multicore organizationsParallel benchmarks
For next 5 technology generations, we find the best performing multicore from a
comprehensive design space search for each of the PARSEC benchmarks
![Page 4: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/4.jpg)
Symmetric Multicore Projections
4
Symmetric multicores alone will not sustain the multicore era.
0 2 4 6 8 100
4
8
12
16
20
TargetSymmetric
Year
Spee
dup
3.4x in 10 years
18x
![Page 5: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/5.jpg)
Multicore Solutions
5
Asymmetric Topologies
0 2 4 6 8 100
4
8
12
16
20TargetSym-metric
Year
Spee
dup
0 2 4 6 8 100
4
8
12
16
20TargetSymmetric
Year
Spee
dup
3.5x
![Page 6: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/6.jpg)
Multicore Solutions
6
Dynamic Topologies
[Chakraborty (2008), Suleman et al (2009)]
0 2 4 6 8 100
4
8
12
16
20 TargetSym-metricAsym-metric
Year
Spee
dup
0 2 4 6 8 100
4
8
12
16
20TargetSym-metric
Year
Spee
dup
3.5x
![Page 7: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/7.jpg)
Multicore Solutions
7
Composed/Fused Topologies
[Ipek et al (2007), Kim et al (2007)]
0 2 4 6 8 100
4
8
12
16
20 TargetSym-metricAsym-metricDynamic
Year
Spee
dup
0 2 4 6 8 100
4
8
12
16
20 TargetSym-metricAsym-metric
Year
Spee
dup
3.7x
![Page 8: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/8.jpg)
Multicore Solutions
8
GPU-Style Cores
0 2 4 6 8 100
4
8
12
16
20TargetSym-metricAsym-metricDynamic
Year
Spee
dup
0 2 4 6 8 100
4
8
12
16
20 TargetSym-metricAsym-metricDynamic
Year
Spee
dup
2.7x
![Page 9: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/9.jpg)
Multicore Era Projections
9
The best designs speed up 14% per year rather than the recent trend of 34% per year
0 2 4 6 8 100
4
8
12
16
20
TargetComposed
Year
Spee
dup
Composed
3.7x
18x
![Page 10: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/10.jpg)
Why Diminishing Returns?
10
Transistor area is still scalingVoltage and capacitance scaling have
slowedResult: designs are power, not area,
limited
![Page 11: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/11.jpg)
Overview
11
![Page 12: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/12.jpg)
Conservative Optimistic
Area 32x 32x
Power 4.5x 8.3x
Frequency 1.3x 3.9x
[Borkar 2007]
[ITRS 2010]
Device Scaling Projections
12
From 45 nm to 8 nm:
![Page 13: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/13.jpg)
0 5 10 15 20 25 30 35 400
5
10
15
20
25
30Intel Nehalem AMD Shanghai Intel Core Intel Atom
SPECmark Score
Pow
er (T
DP,
Wat
ts)
Modeling Ideal Core Power/Perf.
13
Atom
Nehalem
0 5 10 15 20 25 300
5
10
15
20
25
30Intel Nehalem AMD Shanghai Intel Core Intel Atom
SPECmark Score
Pow
er (T
DP,
Wat
ts)
Pareto Frontier includes all optimal power/performance points
Repeat using core area for optimal area/performance points
![Page 14: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/14.jpg)
0 5 10 15 20 25 30 35 400
5
10
15
20
25
30
SPECmark Score
Pow
er (T
DP,
Wat
ts)
Combining Device and Core Models
14
45 nm Frontie
r32 nm Frontie
r
Device Scaling
![Page 15: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/15.jpg)
Overview
15
![Page 16: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/16.jpg)
What belongs in multicore model?
16
Styles
Applications
Topologies
Area & Power / Performance Tradeoffs
Architectures
PARSEC fparallel,Data Use
Number of Threads,
Cache Sizes
Area & Power Budget
Cache & memory latencies, memory
bandwidth
Pareto Frontiers
![Page 17: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/17.jpg)
Multicore Speedup Model
17
MulticoreSpeedup
11-fparallelSerial
Speedup
fparallelParallel
Speedup
=+
![Page 18: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/18.jpg)
Multicore Performance Model
18
Performance is limited by:
and
Memory bandwidthBWmax / (instructions per byte from
memory)
ComputationNcores (core frequency/CPIexe) core
utilization
[Guz et al, 2009]
![Page 19: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/19.jpg)
Core Utilization Model
19
Core utilization is limited by:
Fraction of Time Core is Ready to IssueNumber of Threads in Core / Number of Threads to
Keep Busy
[Guz et al, 2009]
![Page 20: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/20.jpg)
Multicore Model & Pareto Frontiers
200 5 10 15 20 25 30 35 40
0
5
10
15
20
25
30
100 points
A(q),
P(q)
q
![Page 21: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/21.jpg)
Translating from SPECmark
21
1. From q, find core’s SPECmark speedup
2. Frequency linearly distributed from Atom to Nehalem
3. Recall: model predicts benchmark performance as f(benchmark chars, frequency, CPIexe)
4. Compute CPIexe such that Benchmark Speedup = SPECmark Speedup
![Page 22: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/22.jpg)
Area and Power Constraints
22
Ncores x A(q) ≤ Area Budget
Ncores x P(q) ≤ Power Budget
Dark silicon = Ncores / # of cores that fit in chip area
![Page 23: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/23.jpg)
Overview
23
![Page 24: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/24.jpg)
Dark Silicon
24
blacksholes bodytrack canneal ferret streamcluster GM0%
20%
40%
60%
80%
100%ITRS Conservative
Perc
enta
ge D
ark
Silic
on
blacksholes bodytrack canneal ferret streamcluster GM0%
20%
40%
60%
80%
100%ITRS Conservative
Perc
enta
ge D
ark
Silic
on
Sources of Dark Silicon:Power + Limited Parallelism
At 22 nm:
At 8 nm:
17%
26%
51%
71%
![Page 25: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/25.jpg)
0 2 4 6 8 100
4
8
12
16
20
ITRS: All TopologiesITRS: Symmetric
Year
Spee
dup
0 2 4 6 8 100
4
8
12
16
20Conservative: Symmet...
Year
Spee
dup
0 2 4 6 8 100
4
8
12
16
20Conservative: All Topologies
Year
Spee
dup
0 2 4 6 8 100
4
8
12
16
20
ITRS: All TopologiesITRS: Symmetric
Year
Spee
dup
Overall Performance
25
Target
fparallel = 0.99
18x16x
8x6x3x
![Page 26: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/26.jpg)
ConclusionsMulticore performance gains are limited
Need at least 18%-40% per generation from architecture alone without
additional power26
Unicore Era
Multicore Era ?
![Page 27: The Dark Silicon Implications for Microprocessors](https://reader036.vdocuments.mx/reader036/viewer/2022062501/568161ac550346895dd16be3/html5/thumbnails/27.jpg)
27
Specialization
Shrinking chips
PervasiveEfficiency