Hierarchically Focused Guardbanding: An Adaptive Approach to Mitigate PVT Variations and Aging
Abbas Rahimi, Luca Benini, Rajesh K. Gupta
UC San Diego and Università di Bologna
Apr 22, 2023 Rajesh K. Gupta / UC San Diego
Outline
• Device Variability
– Process, voltage, and temperature, and aging
• Resilient Techniques
• Hierarchically Focused Guardbanding
– Analysis Flow for Timing Error Rate
– Parametric Model Fitting
• Hierarchical Sensors Observability
• Online Utilization of HFG
– Throughput improvement
• Conclusion
Ever-increasing PVTA Variations
• Variability in transistor characteristics (PVTA) is a major challenge in nanoscale CMOS
– Static process variation: effective transistor channel length and threshold voltage
– Dynamic variations: temperature fluctuations, supply voltage droops, and device aging (NBTI, HCI)
• To handle variations, designers use conservative guardbands, at the cost of operational efficiency
[Figure: clock period divided into actual circuit delay and guardband; guardband components labeled Temperature, Aging, VCC Droop, and Across-wafer Frequency]
Resilient Techniques
I. Sense & Adapt: observation using in situ monitors (Razor, EDS) with cycle-by-cycle corrections (leveraging CMOS knobs or replay)
II. Predict & Prevent: relying on external or replica monitors; a model-based rule derives an adaptive guardband to prevent errors
[Figure: two diagrams over example path delays of 1ns, 3ns, 4ns, and 5ns; one labeled Sense (detect) and Adapt (correct), the other labeled Sensors, Model, and Prevent]
Our Resilient View
I. Sense & Adapt: We have done cross-layer vulnerability analysis, tracing the manifestation of variability from instruction level to task level
– Instruction-level Vulnerability (ILV)
– Sequence-level Vulnerability (SLV)
– Procedure-level Vulnerability (PLV)
– Task-level Vulnerability (TLV)
II. Model & Prevent: In this work, we present Hierarchically Focused Guardbanding (HFG), a model-based rule to derive the guardband adaptively and avoid PVTA-induced timing errors.
[ILV] A. Rahimi, L. Benini, R. K. Gupta, “Analysis of Instruction-level Vulnerability to Dynamic Voltage and Temperature Variations,” DATE, 2012.
[SLV] A. Rahimi, L. Benini, R. K. Gupta, “Application-Adaptive Guardbanding to Mitigate Static and Dynamic Variability,” IEEE Trans. on Computers, 2013.
[PLV] A. Rahimi, L. Benini, R. K. Gupta, “Procedure Hopping: a Low Overhead Solution to Mitigate Variability in Shared-L1 Processor Clusters,” ISLPED, 2012.
[TLV] A. Rahimi, A. Marongiu, P. Burgio, R. K. Gupta, L. Benini, “Variation-Tolerant OpenMP Tasking on Tightly-Coupled Processor Clusters,” DATE, 2013.
Contributions
1. A new high-level model for the Timing Error Rate of various integer and floating-point functional units (FUs) in the presence of PVTA variations.
– Online: a model-based rule to derive the guardband from the PVTA sensor readings
– Offline: identifying vulnerable FUs
2. The notion of Hierarchically “Focused” Guardbanding (HFG), which is guided by online utilization of the model in view of monitors, observation granularity, and reaction times.
3. Applying HFG on a GPU at two distinct granularities:
i. Fine-grained: instruction-by-instruction monitoring and adaptive guardbanding
ii. Coarse-grained: kernel-level monitoring and adaptive guardbanding
HFG Analysis Flow for TER
• The model takes into account:
1. PVTA parameter variations
2. Clock frequency
3. Physical details of placed-and-routed FUs in 45nm TSMC technology
• Analyzed FUs:
– 10 32-bit integer
– 15 single-precision floating-point (fully compatible with the IEEE 754 standard)
• A full permutation of PVTA parameters and clock frequency is applied:
Parameter         Start Point   End Point   Step     # of Points
Voltage           0.88V         1.10V       0.01V    23
Temperature       0°C           120°C       10°C     13
Process (σWID)    0%            9.6%        3.2%     4
Aging (∆Vth)      0mV           100mV       25mV     5
tclk              0.2ns         5.0ns       0.2ns    25
• For each FUi working with tclk under given PVTA variations, we define the Timing Error Rate (TER):
$$\mathrm{TER}(FU_i, t_{clk}, V, T, P, A) = \frac{\mathrm{CriticalPaths}(FU_i, t_{clk}, V, T, P, A)}{\mathrm{Paths}(FU_i)} \times 100$$
[Figure: analysis flow. FUs in Verilog and DesignWare libs go through Design Compiler and IC Compiler with 45nm corner libs, producing a netlist and SPEF; PrimeTime runs SSTA and STA with 45nm process VA libs over the variable parameters (voltage, temperature, process, tclk, Vth); the timing error rate analysis feeds a MATLAB linear classifier that yields the parametric model]
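The TER definition and the characterization sweep above can be sketched in a few lines of Python. This is only an illustration, not the authors' flow: it assumes a path counts as critical when its delay exceeds tclk, and the names `timing_error_rate`, `sweep`, and `GRID` are hypothetical. The grid values come from the sweep table.

```python
def timing_error_rate(path_delays_ns, tclk_ns):
    """TER in percent: share of paths whose delay exceeds the clock period.

    Assumption: a path is "critical" when its delay is larger than tclk.
    """
    if not path_delays_ns:
        raise ValueError("need at least one path")
    critical = sum(1 for delay in path_delays_ns if delay > tclk_ns)
    return 100.0 * critical / len(path_delays_ns)


def sweep(start, step, count, digits=2):
    """Evenly spaced characterization points, rounded to avoid float drift."""
    return [round(start + i * step, digits) for i in range(count)]


# Characterization grid taken from the sweep table (start, step, # of points).
GRID = {
    "V": sweep(0.88, 0.01, 23),      # volts
    "T": sweep(0, 10, 13),           # degrees Celsius
    "P": sweep(0.0, 3.2, 4, 1),      # sigma_WID in percent
    "A": sweep(0, 25, 5),            # delta Vth in mV
    "tclk": sweep(0.2, 0.2, 25, 1),  # ns
}

# Size of the full permutation applied to each FU.
num_points = 1
for values in GRID.values():
    num_points *= len(values)

print(num_points)
print(timing_error_rate([0.8, 1.0, 1.2, 1.4], tclk_ns=1.1))
```

With these step sizes the full permutation covers 23 × 13 × 4 × 5 × 25 = 149,500 operating points per FU.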
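Parametric Model Fitting
• We used supervised learning (linear discriminant analysis) to generate a parametric model at the FU level that relates PVTA parameter variations and tclk to classes of TER.
• On average across all FUs, the resubstitution error is 0.036, meaning the models classify nearly all data correctly.
• For extra characterization points, the model makes correct estimates for 97% of out-of-sample data. The remaining 3% are misclassified into the high-error-rate class, CH, and therefore still receive a safe guardband.
Classes of TER:
Class0 (C0)         TER = 0%
ClassLow (CL)       0% < TER <= 33%
ClassMedium (CM)    33% < TER <= 66%
ClassHigh (CH)      66% < TER <= 100%
Linear discriminant analysis:
$$\hat{y} = \arg\min_{y=1,\dots,K} \sum_{k=1}^{K} \hat{P}(k \mid x)\, C(y \mid k)$$
$$P(x \mid k) = \frac{1}{(2\pi |\Sigma_k|)^{0.5}} \exp\left(-\frac{1}{2} (x-\mu_k)^{T} \Sigma_k^{-1} (x-\mu_k)\right)$$
$$\hat{P}(k \mid x) = \frac{P(x \mid k)\, P(k)}{P(x)}$$
$$\mathrm{cost}(k) = \sum_{i=1}^{K} \hat{P}(i \mid x)\, C(k \mid i)$$
Here x is the vector of PVTA parameters and tclk, k indexes the K TER classes, μ_k and Σ_k are the mean and covariance of class k, P(k) is the prior of class k, and C(y|k) is the cost of classifying an observation as y when its true class is k.
[Figure: PVTA and tclk enter the HFG ASIC analysis flow for TER; the resulting TER is mapped to a TER class, and linear discriminant analysis produces the parametric model]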
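The class boundaries and the minimum-expected-cost decision rule can be illustrated with a small Python sketch. It is not the authors' MATLAB code: the posterior values and the asymmetric cost matrix are made up for illustration, and `ter_class`, `min_cost_class`, `ZERO_ONE`, and `PESSIMISTIC` are hypothetical names.

```python
CLASSES = ["C0", "CL", "CM", "CH"]


def ter_class(ter_percent):
    """Map a TER value in percent to one of the four TER classes."""
    if not 0.0 <= ter_percent <= 100.0:
        raise ValueError("TER must lie in [0, 100]")
    if ter_percent == 0.0:
        return "C0"
    if ter_percent <= 33.0:
        return "CL"
    if ter_percent <= 66.0:
        return "CM"
    return "CH"


def min_cost_class(posterior, cost):
    """Pick y minimizing sum_k P(k|x) * C(y|k).

    posterior[k] is P(k|x); cost[y][k] is the cost of predicting y
    when the true class is k.
    """
    def expected_cost(y):
        return sum(posterior[k] * cost[y][k] for k in CLASSES)
    return min(CLASSES, key=expected_cost)


# Zero-one cost: every misclassification costs the same.
ZERO_ONE = {y: {k: 0.0 if y == k else 1.0 for k in CLASSES} for y in CLASSES}

# Hypothetical asymmetric cost: predicting a lower error class than the
# true one (an unsafe guardband) costs 5, the opposite mistake costs 1.
PESSIMISTIC = {
    y: {
        k: 0.0 if y == k
        else (5.0 if CLASSES.index(y) < CLASSES.index(k) else 1.0)
        for k in CLASSES
    }
    for y in CLASSES
}

posterior = {"C0": 0.5, "CL": 0.3, "CM": 0.2, "CH": 0.0}  # made-up values
print(min_cost_class(posterior, ZERO_ONE))
print(min_cost_class(posterior, PESSIMISTIC))
```

With the zero-one cost the rule reduces to picking the most probable class. The asymmetric cost shows how the same rule can be biased toward higher-error classes, which matches the safe direction of the 3% misclassification noted above.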
Delay Variation and TER Characterization
• During design time the delay of the FP adder has a large uncertainty of [0.73ns, 1.32ns], since the actual values of PVTA parameters are unknown.
[Figure: tree of delay (ns) as more parameters become known, from (P,?,?,?) to (P,A,?,?) to (P,A,T,?) to (P,A,T,V); branches span P(σWID) = 0% to 9.6%, A(∆Vth) = 0mV to 100mV, T = 0°C to 120°C, and V = 1.10V to 0.88V]
[Figure: four plots of Timing Error Rate (%) versus Temperature (°C) and VDD (V), for (P(σWID) = 0%, A(∆Vth) = 0mV), (P(σWID) = 0%, A(∆Vth) = 100mV), (P(σWID) = 9.6%, A(∆Vth) = 0mV), and (P(σWID) = 9.6%, A(∆Vth) = 100mV)]
Hierarchical Sensors Observability
• The question is: which mix of monitors would be useful?
• The more sensors we provide for an FU, the more of its conservative guardband can be removed.
• The guardband of the FP adder can be reduced by up to:
– 8% (P_sensor)
– 24% (PA_sensors)
– 28% (PAT_sensors)
– 44% (PATV_sensors)
• Sensor overheads:
– In-situ PVT sensors impose 1 to 3% area overhead [Bowman’09]
– Five replica PVT sensors increase area by 0.2% [Lefurgy’11]
– The banks of 96 NBTI aging sensors occupy less than 0.01% of the core's area [Singh’11]
[Figure: tclk (ns) for FP_exp, FP_add, and INT_mac under the P_sensor, PA_sensors, PAT_sensors, and PATV_sensors configurations]
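The reason more sensors shrink the guardband can be sketched as follows: every parameter that is not observed must be assumed at its worst case, so the safe tclk is the maximum delay over all unobserved parameters. The delay function below is a made-up linear toy whose coefficients are chosen only so that it spans the [0.73ns, 1.32ns] range quoted for the FP adder; it is not the fitted model, and `toy_delay_ns` and `required_tclk_ns` are hypothetical names.

```python
from itertools import product

# Characterization ranges from the analysis flow (P in %, A in mV, T in C, V in V).
GRID = {
    "P": [0.0, 3.2, 6.4, 9.6],
    "A": [0, 25, 50, 75, 100],
    "T": list(range(0, 121, 10)),
    "V": [round(0.88 + 0.01 * i, 2) for i in range(23)],
}


def toy_delay_ns(P, A, T, V):
    """HYPOTHETICAL delay model; coefficients are illustrative only."""
    return (0.73
            + 0.05 * P / 9.6
            + 0.14 * A / 100
            + 0.04 * T / 120
            + 0.36 * (1.10 - V) / 0.22)


def required_tclk_ns(sensed):
    """Worst-case delay over every parameter that no sensor reports.

    `sensed` maps a parameter name to its measured value; parameters
    missing from it are swept over their full characterization range.
    """
    axes = [[sensed[name]] if name in sensed else values
            for name, values in GRID.items()]
    return max(toy_delay_ns(*point) for point in product(*axes))


reading = {"P": 3.2, "A": 25, "T": 40, "V": 1.00}  # example sensor readings
for config in (["P"], ["P", "A"], ["P", "A", "T"], ["P", "A", "T", "V"]):
    sensed = {name: reading[name] for name in config}
    print("".join(config), round(required_tclk_ns(sensed), 3))
```

Each added sensor replaces one worst-case assumption with a measurement, so the required tclk can only stay the same or decrease as the configuration grows from P to PATV.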
Online Utilization of HFG
• The control system tunes the clock frequency through an online model-based rule.
• To support fast computation in the controller, the parametric model generates a distinct Look-Up Table (LUT) for every FU.
• We apply HFG to the architecture at two granularities:
1. Fine-grained: instruction-by-instruction monitoring and adaptation, where the PATV sensor signals come from individual FUs
2. Coarse-grained: kernel-level monitoring, which uses one representative set of PATV sensors for the entire execution stage of the pipeline
[Figure: offline, TER raw data pass through the classifier and parametric model, which, given PATV_config and target_TER, fill per-FU LUTs (FUi, FUj, FUk) with P, A, T, V, tclk entries; online, in the GPU SIMD pipeline, the PATV sensor readings P (2-bit), A (3-bit), T (3-bit), V (3-bit) and the instruction index the LUTs, and a max block passes tclk (5-bit) to the CLK control]
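A minimal sketch of the LUT-based clock selection at the two granularities is shown below. The LUT contents, the sensor codes, and the 5-bit tclk encoding (code c meaning 0.2 × (c + 1) ns) are assumptions made for illustration; the source only gives the bit widths. The helper names are hypothetical, and for brevity the sketch reuses one sensor reading for all FUs, whereas the fine-grained scheme reads sensors at each FU.

```python
from typing import Dict, Tuple

SensorCode = Tuple[int, int, int, int]  # quantized (P, A, T, V) readings
Lut = Dict[SensorCode, int]             # sensor codes -> 5-bit tclk code


def decode_tclk_ns(code: int) -> float:
    """HYPOTHETICAL encoding: 5-bit code c stands for 0.2 * (c + 1) ns."""
    if not 0 <= code < 32:
        raise ValueError("tclk code must fit in 5 bits")
    return round(0.2 * (code + 1), 1)


def fine_grained_code(luts: Dict[str, Lut], fu: str, sensors: SensorCode) -> int:
    """Instruction-level: look up only the FU used by the current instruction."""
    return luts[fu][sensors]


def coarse_grained_code(luts: Dict[str, Lut], sensors: SensorCode) -> int:
    """Kernel-level: one clock for the whole stage, so take the slowest FU."""
    return max(lut[sensors] for lut in luts.values())


reading: SensorCode = (1, 1, 3, 4)  # example quantized sensor codes

# Made-up LUT entries for three FUs at this single sensor reading.
LUTS: Dict[str, Lut] = {
    "INT_mac": {reading: 3},
    "FP_add": {reading: 5},
    "FP_exp": {reading: 9},
}

print(decode_tclk_ns(fine_grained_code(LUTS, "INT_mac", reading)))
print(decode_tclk_ns(coarse_grained_code(LUTS, reading)))
```

Keeping the model in precomputed tables means the online controller only performs a lookup and, at the coarse granularity, a maximum over FUs.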
Throughput Benefit of HFG
1. With kernel-level monitoring, the throughput increases by 70% on average when the PE moves from the P_sensor-only scenario to the PATV_sensors scenario. The target TER is set to 0, to suit error-intolerant applications.
2. Instruction-by-instruction monitoring and adaptation improves the throughput by 1.8× to 2.1×, depending on the PATV sensor configuration and the kernel's instructions.
[Figure: two plots of Throughput (GIPS) for the P_sensor, PA_sensors, PAT_sensors, and PATV_sensors configurations]
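The difference between the two granularities can be illustrated with a toy throughput calculation. It assumes one instruction issues per clock cycle and that the clock can be retuned per instruction at no cost; the per-FU tclk values and the instruction mix are made up, so the resulting numbers are not the measured results.

```python
def throughput_gips(cycle_times_ns):
    """Instructions per nanosecond (GIPS), one instruction per cycle."""
    if not cycle_times_ns:
        raise ValueError("need at least one instruction")
    return len(cycle_times_ns) / sum(cycle_times_ns)


# HYPOTHETICAL safe clock periods per FU under some PATV reading.
TCLK_NS = {"INT_mac": 0.5, "FP_add": 1.0, "FP_exp": 2.0}

# HYPOTHETICAL kernel: the FU used by each instruction, in order.
kernel = ["INT_mac", "INT_mac", "FP_add", "FP_exp"]

# Fine-grained: each instruction runs at the clock of its own FU.
fine = throughput_gips([TCLK_NS[fu] for fu in kernel])

# Coarse-grained: the whole kernel runs at the clock of its slowest FU.
slowest = max(TCLK_NS[fu] for fu in kernel)
coarse = throughput_gips([slowest] * len(kernel))

print(fine, coarse)
```

Under these assumptions the fine-grained scheme is never slower than the coarse-grained one, and the gap grows with the share of instructions that use fast FUs.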
Conclusion
• We present a model‡ and its use for online variation-aware resource management, as well as for design-time analysis of vulnerable functional units, through an accurate 45nm TSMC flow.
• The model is used as an adaptive resource management technique that proactively prevents timing errors by applying a focused guardband.
• We demonstrate the effectiveness of HFG on a GPU architecture at two granularities of observation and adaptation: (i) fine-grained instruction-level; and (ii) coarse-grained kernel-level.
‡ Publicly available for download at: http://mesl.ucsd.edu/site/PVTA_MODELS/models.htm
Thank You!
NSF Variability Expedition
ERC MultiTherman