Process Variation in Near-threshold Wide SIMD Architectures
![Page 1: Process Variation in Near-threshold Wide SIMD Architectures](https://reader030.vdocuments.mx/reader030/viewer/2022032710/56813af1550346895da36be6/html5/thumbnails/1.jpg)
Process Variation in Near-threshold Wide SIMD Architectures
Sangwon Seo¹, Ronald G. Dreslinski¹, Mark Woh¹, Yongjun Park¹, Chaitali Chakrabarti², Scott Mahlke¹, David Blaauw¹, Trevor Mudge¹
University of Michigan¹, Arizona State University²
![Page 2: Process Variation in Near-threshold Wide SIMD Architectures](https://reader030.vdocuments.mx/reader030/viewer/2022032710/56813af1550346895da36be6/html5/thumbnails/2.jpg)
Near Threshold Computing
- Super-threshold: high performance, high energy consumption
- Near-threshold: 10x energy reduction, 10x performance degradation
- Sub-threshold: exponentially decreasing performance; increasing leakage becomes dominant
![Page 3: Process Variation in Near-threshold Wide SIMD Architectures](https://reader030.vdocuments.mx/reader030/viewer/2022032710/56813af1550346895da36be6/html5/thumbnails/3.jpg)
Near-threshold Computing
- Advantage: high energy efficiency
- Disadvantage: low performance (throughput)
  - Compensated with a very wide SIMD architecture
- Disadvantage: sensitive to variations in threshold voltage
  - More critical issues in wide SIMD architectures:
    - Increased probability of timing errors
    - Expensive error recovery mechanisms
![Page 4: Process Variation in Near-threshold Wide SIMD Architectures](https://reader030.vdocuments.mx/reader030/viewer/2022032710/56813af1550346895da36be6/html5/thumbnails/4.jpg)
Near-threshold Computing
- How bad is the delay variation in wide SIMD architectures running at near-threshold voltages?
- How to mitigate the variation-induced timing errors?
![Page 5: Process Variation in Near-threshold Wide SIMD Architectures](https://reader030.vdocuments.mx/reader030/viewer/2022032710/56813af1550346895da36be6/html5/thumbnails/5.jpg)
Delay Variations in 90nm
[Figure: delay variation plot, with annotations ~2.3x and ~1.6x]
Uncorrelated variations are averaged out over the chain.
![Page 6: Process Variation in Near-threshold Wide SIMD Architectures](https://reader030.vdocuments.mx/reader030/viewer/2022032710/56813af1550346895da36be6/html5/thumbnails/6.jpg)
Delay Variations – f(Vdd=0.55V, N)
- A long chain helps, but the effect diminishes as N increases.
- Variations are exacerbated with technology scaling.
![Page 7: Process Variation in Near-threshold Wide SIMD Architectures](https://reader030.vdocuments.mx/reader030/viewer/2022032710/56813af1550346895da36be6/html5/thumbnails/7.jpg)
Delay Variations – f(Vdd, N=50)
- LER (line edge roughness) causes high variations in advanced technology nodes
  - Strict design rules
  - Metal gates with high-k material, or SOI
  - Advanced lithography
![Page 8: Process Variation in Near-threshold Wide SIMD Architectures](https://reader030.vdocuments.mx/reader030/viewer/2022032710/56813af1550346895da36be6/html5/thumbnails/8.jpg)
Delay Distribution – 90nm GP
- 1 critical path delay = delay of a chain of 50 FO4 inverters
- 1-wide system delay = max(delays of 100 critical paths)
- 128-wide system delay = max(delays of 128 1-wide systems)
[Figure annotation: Performance Drop]
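
The three delay definitions on this slide can be sketched as a small Monte Carlo experiment. This is only an illustration, not the authors' simulation setup: it assumes independent Gaussian gate delays with mean 1.0 (arbitrary units), and `GATE_SIGMA` is a hypothetical value chosen for the example. The slides do not give the actual delay distributions.

```python
import math
import random
import statistics

N_GATES = 50        # FO4 inverters per critical path (from the slide)
N_PATHS = 100       # critical paths per 1-wide system (from the slide)
GATE_SIGMA = 0.15   # ASSUMPTION: relative per-gate delay sigma (hypothetical)

def path_delay(rng):
    # The sum of N_GATES independent Gaussian gate delays (mean 1.0) is itself
    # Gaussian with mean N_GATES and sigma GATE_SIGMA * sqrt(N_GATES), so the
    # relative variation of the chain shrinks as 1 / sqrt(N_GATES).
    return rng.gauss(N_GATES, GATE_SIGMA * math.sqrt(N_GATES))

def lane_delay(rng):
    # 1-wide system delay = max over its critical paths.
    return max(path_delay(rng) for _ in range(N_PATHS))

def system_delay(rng, width):
    # width-wide system delay = max over its 1-wide systems.
    return max(lane_delay(rng) for _ in range(width))

def mean_system_delay(width, trials=20, seed=1):
    # Average the system delay over several simulated chips.
    rng = random.Random(seed)
    return statistics.mean(system_delay(rng, width) for _ in range(trials))
```

Comparing `mean_system_delay(1)` with `mean_system_delay(128)` shows the kind of performance drop the slide refers to: taking the maximum over more lanes pushes the expected system delay further into the tail of the distribution.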
![Page 9: Process Variation in Near-threshold Wide SIMD Architectures](https://reader030.vdocuments.mx/reader030/viewer/2022032710/56813af1550346895da36be6/html5/thumbnails/9.jpg)
Variation Effects on 128-wide SIMD Architecture
- Structural duplication
- Voltage margining
- Frequency margining
![Page 10: Process Variation in Near-threshold Wide SIMD Architectures](https://reader030.vdocuments.mx/reader030/viewer/2022032710/56813af1550346895da36be6/html5/thumbnails/10.jpg)
Near-threshold Wide SIMD Architecture: Diet SODA
[Seo et al. ISLPED 2010]
![Page 11: Process Variation in Near-threshold Wide SIMD Architectures](https://reader030.vdocuments.mx/reader030/viewer/2022032710/56813af1550346895da36be6/html5/thumbnails/11.jpg)
Structural Duplication
[Figure: 8-wide + 2-spare system. SIMD Function Units #0 to #9 connect through a crossbar to Datapaths #0 to #7.]
Increase the number of processing resources.
![Page 12: Process Variation in Near-threshold Wide SIMD Architectures](https://reader030.vdocuments.mx/reader030/viewer/2022032710/56813af1550346895da36be6/html5/thumbnails/12.jpg)
Structural Duplication
[Figure: 8-wide + 2-spare system. SIMD Function Units #0 to #9 connect through a crossbar to eight datapaths.]
Use the spares if required.
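
A minimal sketch of the remapping idea, assuming each function unit has already been classified as meeting timing or not. The function name `assign_lanes` and the pass/fail list are hypothetical; the slides do not describe how the crossbar is actually configured.

```python
def assign_lanes(unit_ok, n_lanes):
    """Map each datapath lane to a distinct working function unit.

    unit_ok: list of booleans, one per physical function unit
             (True means the unit meets timing). Hypothetical input format.
    Returns a list where entry i is the unit index serving datapath i,
    or None if fewer than n_lanes units are usable.
    """
    working = [i for i, ok in enumerate(unit_ok) if ok]
    if len(working) < n_lanes:
        return None
    # Use the first n_lanes working units; any leftover working units stay idle.
    return working[:n_lanes]

# 8-wide + 2-spare system in which units #2 and #5 fail timing:
units = [True] * 10
units[2] = units[5] = False
mapping = assign_lanes(units, 8)   # datapaths use units 0, 1, 3, 4, 6, 7, 8, 9
```

With two failing units both spares are consumed; a third failure would leave fewer than eight usable units and the function returns `None`.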
![Page 13: Process Variation in Near-threshold Wide SIMD Architectures](https://reader030.vdocuments.mx/reader030/viewer/2022032710/56813af1550346895da36be6/html5/thumbnails/13.jpg)
Structural Duplication – 90nm GP
6 spares are required to match the chip delay of the baseline architecture.
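
One way to see why spares reduce chip delay: if the slowest lanes can be left unused, the chip delay becomes an order statistic of the lane delays instead of their maximum. The sketch below assumes per-lane delays are known and that any lane can replace any other; the function name and the example delays are hypothetical, not data from the slides.

```python
def delay_with_spares(lane_delays, width):
    """Chip delay when only the `width` fastest physical lanes are used.

    lane_delays: delays of all physical lanes (width + number of spares).
    """
    if len(lane_delays) < width:
        raise ValueError("fewer physical lanes than the required width")
    # The slowest lane still in use is the width-th smallest delay.
    return sorted(lane_delays)[width - 1]

# Hypothetical 4-wide system, delays in arbitrary units.
no_spares = [1.00, 1.02, 1.30, 0.98]
two_spares = no_spares + [1.01, 1.25]

baseline = delay_with_spares(no_spares, 4)    # limited by the 1.30 lane
improved = delay_with_spares(two_spares, 4)   # the 1.30 and 1.25 lanes are skipped
```

In this example the spare lanes let the system avoid the two slowest lanes, so the chip delay falls from 1.30 to 1.02.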
![Page 14: Process Variation in Near-threshold Wide SIMD Architectures](https://reader030.vdocuments.mx/reader030/viewer/2022032710/56813af1550346895da36be6/html5/thumbnails/14.jpg)
Voltage Margining
- Increase supply voltage
- Delay distributions: 45nm PTM model is used
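
As a rough illustration of voltage margining, the sketch below searches for the smallest supply boost that cancels a given variation-induced slowdown. The delay model is an assumption: it uses the alpha-power law, delay proportional to Vdd / (Vdd - Vth)^alpha, which is only approximate near threshold. The values of `vth` and `alpha` are hypothetical. The slides rely on simulated delay distributions instead, so this is not their method.

```python
def delay(vdd, vth=0.35, alpha=1.5):
    # ASSUMPTION: alpha-power-law delay model with hypothetical vth and alpha.
    return vdd / (vdd - vth) ** alpha

def min_boost(v_nom, slowdown, max_boost=0.2, tol=1e-4):
    """Smallest boost dv (volts) with slowdown * delay(v_nom + dv) <= delay(v_nom).

    slowdown: factor by which variation stretches the delay (e.g. 1.05).
    Returns None if even max_boost is not enough.
    """
    target = delay(v_nom)
    if slowdown * delay(v_nom + max_boost) > target:
        return None
    lo, hi = 0.0, max_boost
    # Bisection works because the modeled delay decreases as vdd rises.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if slowdown * delay(v_nom + mid) <= target:
            hi = mid
        else:
            lo = mid
    return hi

boost = min_boost(0.6, 1.05)   # boost needed to absorb a 5% slowdown at 0.6 V
```

The resulting number depends entirely on the assumed model parameters, so it should not be compared with the boost values reported in the slides.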
![Page 15: Process Variation in Near-threshold Wide SIMD Architectures](https://reader030.vdocuments.mx/reader030/viewer/2022032710/56813af1550346895da36be6/html5/thumbnails/15.jpg)
Frequency Margining
- Increase clock period
  - Applicable for applications with relaxed time constraints
  - For advanced technology nodes, this is impractical
- Caveat: consider its impact on the system
  - SIMD subsystem clock period (Tclk@NTV)
  - Memory subsystem clock period (Tclk@FV)
![Page 16: Process Variation in Near-threshold Wide SIMD Architectures](https://reader030.vdocuments.mx/reader030/viewer/2022032710/56813af1550346895da36be6/html5/thumbnails/16.jpg)
Structural Duplication vs. Voltage Margining
![Page 17: Process Variation in Near-threshold Wide SIMD Architectures](https://reader030.vdocuments.mx/reader030/viewer/2022032710/56813af1550346895da36be6/html5/thumbnails/17.jpg)
Combination of two schemes – 45nm GP
128-wide system @ 0.6V. Design points annotated in the figure:
- 26 spares
- 17 mV boost
- 5 mV + 8 spares
- 10 mV + 2 spares
![Page 18: Process Variation in Near-threshold Wide SIMD Architectures](https://reader030.vdocuments.mx/reader030/viewer/2022032710/56813af1550346895da36be6/html5/thumbnails/18.jpg)
Variation-Aware Diet SODA
![Page 19: Process Variation in Near-threshold Wide SIMD Architectures](https://reader030.vdocuments.mx/reader030/viewer/2022032710/56813af1550346895da36be6/html5/thumbnails/19.jpg)
Conclusions
- Near-threshold operation of a wide SIMD system can have timing problems due to process variations.
- Variation effects on a 128-wide SIMD architecture are marginal for the 90nm technology node, but could be non-negligible for current and future technology nodes.
- A combination of structural duplication and voltage margining mitigates variation-induced timing problems in wide SIMD architectures with minimal power overhead.
![Page 20: Process Variation in Near-threshold Wide SIMD Architectures](https://reader030.vdocuments.mx/reader030/viewer/2022032710/56813af1550346895da36be6/html5/thumbnails/20.jpg)
Questions?
Thank you!
![Page 21: Process Variation in Near-threshold Wide SIMD Architectures](https://reader030.vdocuments.mx/reader030/viewer/2022032710/56813af1550346895da36be6/html5/thumbnails/21.jpg)
Backup Slides
![Page 22: Process Variation in Near-threshold Wide SIMD Architectures](https://reader030.vdocuments.mx/reader030/viewer/2022032710/56813af1550346895da36be6/html5/thumbnails/22.jpg)
Local Spares vs. Global Spares
Local sparing, 1 out of 4 (2 spares)
+ small overhead
- cannot tolerate burst errors
Global sparing (2 spares)
+ tolerates burst errors
- large overhead
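
The burst-error trade-off can be made concrete with a small repairability check. This sketch assumes the spares themselves are fault-free. It also assumes that "1 out of 4" means one spare per group of four adjacent lanes. The function names are hypothetical.

```python
def repairable_global(faulty_lanes, n_spares):
    # Global sparing: any spare can replace any faulty lane.
    return len(set(faulty_lanes)) <= n_spares

def repairable_local(faulty_lanes, group_size=4, spares_per_group=1):
    # Local sparing: a spare can only replace a faulty lane in its own group.
    faults_per_group = {}
    for lane in set(faulty_lanes):
        group = lane // group_size
        faults_per_group[group] = faults_per_group.get(group, 0) + 1
    return all(n <= spares_per_group for n in faults_per_group.values())

# 8-wide system with 2 spares in both schemes.
burst = [2, 3]       # two adjacent faulty lanes, both in group 0
scattered = [1, 6]   # one faulty lane in each group

burst_local = repairable_local(burst)            # False: group 0 has one spare
burst_global = repairable_global(burst, 2)       # True
scattered_local = repairable_local(scattered)    # True
```

Both schemes use two spares here, but only global sparing survives the burst, at the cost of routing that can connect any spare to any lane.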
![Page 23: Process Variation in Near-threshold Wide SIMD Architectures](https://reader030.vdocuments.mx/reader030/viewer/2022032710/56813af1550346895da36be6/html5/thumbnails/23.jpg)
Local Spares vs. Global Spares
- Global sparing is better than local sparing.
- XRAM crossbar supports global sparing.
[Figure legend: 128 + 8 global spares; 128 + 32 local spares (1 out of 4)]
![Page 24: Process Variation in Near-threshold Wide SIMD Architectures](https://reader030.vdocuments.mx/reader030/viewer/2022032710/56813af1550346895da36be6/html5/thumbnails/24.jpg)
Variation-Aware Diet SODA
Delay variation problems can be mitigated with little area and power overhead.