can we save energy if we allow errors in computing? janak h. patel department of electrical and...
DESCRIPTION
3 Why Energy vs. Accuracy? Manufacturing Process Variations unavoidable Gate Oxide thickness (T OX ) Random Doping Fluctuations (RDF) Device geometry, Lithography (W, L) Resulting Transistor Threshold Variations ( σ V T ) Designing for Worst Case for error-free computing Wastes Energy, Performance and Yield Designing for Typical Case may cause errors Tolerating occasional errors saves Energy, gives higher Performance and improves Yield In Future technologies, Errors are inevitableTRANSCRIPT
Can we Save Energy if we allow Errors in Computing?
Janak H. PatelDepartment of Electrical and Computer Engineering
University of Illinois at [email protected]
Auburn UniversityFebruary 18, 2016
2
OutlineWhy Energy versus Accurate Computing?An example of Energy versus Accuracy TradeoffChallenge to the Accepted Tradeoff Model
A Thought Experiment Two Real Experiments
Potential exploits of the new hypothesis Power savings in large data-centers Predicting early failures in microprocessors
3
Why Energy vs. Accuracy?Manufacturing Process Variations unavoidable
Gate Oxide thickness (TOX) Random Doping Fluctuations (RDF) Device geometry, Lithography (W, L) Resulting Transistor Threshold Variations (σVT)
Designing for Worst Case for error-free computing Wastes Energy, Performance and Yield
Designing for Typical Case may cause errors Tolerating occasional errors saves Energy, gives
higher Performance and improves Yield In Future technologies, Errors are inevitable
4
Within-Die Variation Example
“TeraflopResearch Chip”Courtesy Intel,
ISSCC 2010
80 Identical CoresBut different FmaxWithin a single chip
5
Designing with a Perfect Process
S63 S62 S61 S60 S3 S2 S1 S0
15ps60ps 45ps 30ps960ps
Carry Propagations
Slack time/Guard band/
Safety margin
960ps
40ps
A 1GHz 64-bit Adder
1000ps
A0 B0A1 B1A63 B63
FullAdder
SC
15ps
15ps
6
Modeling Process VariationsMeasured Data from FabricationAnalytical Process Models
Die-to-Die and Within-Die Variations
FullAdder
S
C
15ps±2ps
15ps±2ps
15ps 20ps10ps
7
Worst Case Design
S63 S62 S61 S60 S3 S2 S1 S0
≈15ps≈60ps ≈45ps ≈30ps960 ± 128ps
Carry Propagations830ps
A 64-bit Adder with Process Variations
1090ps
260ps
Increase Voltage to meet 1ns cycle time. Costs More Energy.Increase Cycle Time to 1.1ns. Costs in Performance.Throw away the Slow Chips! Loss in Yield and Revenue
A0 B0A1 B1A63 B63
FullAdder
S
C15 ± 2ps
15 ± 2ps
8
Saving Energy and Performance
S63 S62 S61 S60 S3 S2 S1 S0
≈15ps≈60ps ≈45ps ≈30ps≈960ps
Carry Propagations
≈960ps
A 64-bit Adder
1000ps
Errors in most significant bits on some computationsIf we can live with these errors, we case save Energy, Performance and Yield.
830ps 1090ps
A0 B0A1 B1A63 B63
9
Energy Savings with Errors
Register
ALU
Register
Register
Clock1 GHz 960 ps
±130 ps
ALU Outputs
0ps 960ps
1
2
ErrorsPer
Min/Sec
Voltage Reduction830ps 1090ps
10
Basics of Power and EnergyEnergy to Switch a Logic Gate Output
½ CV2 JoulesC is capacitance as seen by the Gate OutputV is the Supply Voltage
Energy spent by the chip during one Clock Cycle kV2 Joules
k is a constant reflecting average chip Activity and Total Gate Capacitance
Power for the chip running at F cycles/sec kV2F Joules/sec or Watts
11
Power or Energy? Common PitfallsEnergy = kV2 Joules per Clock CyclePower = kV2F Joules/secA Task that takes N Cycles to compute, consumes
kV2N Joules Independent of Frequency!!Some Observations
Modern Chips also have Static or DC power, S.Power = S + kV2F Joules/secN-cycle Task, completes in N/F sec and consumes
(S + kV2F)*(N/F) = SN/F + kV2N JoulesLowering Power by Frequency increases Energy!
12
Energy Savings with Errors
Register
ALU
Register
Register
Clock1 GHz 960 ps
±130 ps
ALU Outputs
0ps 960ps
1
2
ErrorsPer
Min/Sec
Voltage Reduction830ps 1090ps
13
Protecting against Errors If the error rate from added delays remains
relatively small, we can utilize some of the established techniques Circuit Level – iROC, Razor, etc. Error Coding – Parity codes, Arithmetic codes,
Residue codes, Parity prediction, Algorithm-based fault-tolerance, TMR etc.
Time Redundancy – RESOWhat if the error rate is massive?Are massive errors possible in a good chip?
14
Signal Propagations
CombinationalLogic
PresentState
NextState
FFs FFs
clock
0 900 960 1000 ps
Calculate from timing analysis
How many flip-flops on critical-paths?
Threshold
15
Consider a 1-Ghz chip with a million flip-flops Let us divide the 1ns Clock period in to 1000 bins Put a FF in bin p if the longest path at its input has a delay
of p picoseconds How many FFs are in bins 900ps to 950ps?
Path Delay in picoseconds →
How many flip-flops on critical-paths?
16
Consider a 1-Ghz chip with a million flip-flops Let us divide the 1ns Clock period in to 1000 bins Put a FF in bin p if the longest path at its input has a delay
of p picoseconds How many FFs are in bins 900ps to 950ps?
Path Delay in picoseconds →
How many flip-flops on critical-paths?
17
Signal Propagations
CombinationalLogic
PresentState
NextState
FFs 1M FFs
clock
0 900 960 1000 ps
200,000? 100,000? 50,000?
How many flip-flops on critical-paths?
18
A Thought ExperimentLet us conservatively assume 100,000 ffs are on
critical paths (10% of total)Now Reduce the supply Voltage, thus increasing
DelayAssume just 10% of critical ffs get its inputs late
this cycle This implies 10,000 flip-flops produce errors in a
single clock cycle!Massive number of errors result in a few clock
cycles
19
Do your own AnalysisAnalytical Procedure
Estimate number (n) of FFs on critical paths from timing analysis or synthesis report -
Compute percent (p) of FFs that switch per cycle from functional simulation
p X n = errors/cycleFor example:
Total Number of Flip-Flops: 400,000 5% of these are on critical paths: n = 20,000 FFs Only 5% of these switch every cycle: p = 5% Error Rate = p X n = 1000 errors/cycle In 10 consecutive clock cycles: 10,000 errors!
20
A new hypothesis
Errors/Cycle
104
106 NEW(large chips)
1
2 OLD(small chips)
Errors
PerMin/sec Zero errors
Massiveerrors
Reduce Supply Voltage Reduce Supply Voltage
21
Hypothesis of Critical Operation Point In large CMOS circuits there exists a Critical
Voltage VC for a fixed Frequency F and ambient temperature T, such that Any voltage below VC , massive errors result Any voltage above VC , no delay related errors
occurNo tradeoff is possible between Energy and
Errors in large chips No practical ways exist to handle massive errors! No error handling needed above VC
22
Experiments to disprove the hypothesisSubject a large chip to slowly decreasing supply
voltage, keeping the Frequency fixed. At each step, exercise the chip extensively and
monitor continuously for any errors
23
Experiments on MicroprocessorsPowerPC 750
2.5V, 233MHz Voltage reduced in steps of 10 mV A C-program monitored the errors while stressing
the processorPentium-M
(1.308V, 2GHz) down to (0.700V, 0.798GHz) For each of the 9 different frequencies, Voltage
reduced in steps of 16 mV Third-party program to keep the cpu 100% busy
and report errors (more like a power virus!)
24
Experiment to find which of these two?
Errors/Cycle
104
106NEW
Decrease Supply Voltage
1
2 OLD
Errors
PerMin/sec
Decrease Supply Voltage
25
Experimental Set-Up for PowerPCA Single-Board-Computer with PowerPC 750
233MHz, 2.5V Power Supply A Hewlett-Packard E3631A Power Supply
Digital control in units of 10 miliVolts steps A Blow-Drier to raise the ambient temperature
A Program written to stress all major functional blocks Tried to maximize execution rate (load) Tried to maximize logic switching rate Every operation was checked against known good
values and instantly reported for any error
27
Results of Lowering Supply VoltagePower PC-750 μP
Chip No.
No. Tests
Critical Supply Voltage VC
Observations System Hangs
Program Crashed
1 45 1.99 V 31 14 2 35 2.00 V 26 9 3 25 2.10 V 18 7 4 25 2.08 V 17 8
Nominal Supply Voltage of 2.5 V is reduced in steps of1/100th Volt with clock frequency constant at 233MHz
No Data Error was ever Observed at user visible Registers!
28
Intel Pentium-M Processor“Speed step” technology with 9 different speedsRated at 2GHz at core voltage of 1.308VExperiment
Choose one of the nine frequencies F1, … , F9 While keeping cpu 100% busy reduce the voltage
in steps of 16mV Monitor for errors
No errors observed – only crashes!
29
Experiment on Pentium-M
No errors!
Catastrophic failure!
30
Some Remarks on ExperimentPossible explanation for the observations
A modern processor has a large number of flip-flops that are not user visible
e.g. Pre-fetch buffers, history tables, reservation stations, write buffers, and state controllers for everything from moving instructions and data to controlling a cache
Control Logic fails simultaneously with ALU datapath
Massive errors in control and data in a single cycle Instruction flow is completely disrupted. Therefore
no error could be reported. Catastrophic failure!
32
Exploiting Process Variations If the “critical operation point hypothesis” holds
Above critical supply voltage VC no errors occur. In large data-centers with 1000’s of
microprocessors, operating each with the lowest VC for a given frequency can save lots of power
As the number of cores approach 100 or more, it would be imperative to use different voltage-frequency pair (FC, VC) for each core on the same die.
33
Dynamic Power Savings in Pentium-M
34
Energy Saving Ideas for Microprocessors
Simplify CPU Remove Floating Point Unit Remove Divider Remove Multiplier Reduce Word Width
From 32 bits to 16 or 8 bitsSimplify Memory Hierarchy
Remove Virtual Addresses Manage Memory Hierarchy explicitly
35
Accuracy vs. QualityEnergy and Quality of Solution Tradeoff
All Iterative Algorithms that improve the Quality of Solution with more iterations (more Energy)
Simulated Annealing, Genetic, Steepest Decent All Algorithms that use Grid of various dimensions,
improve the Quality of Solution with finer Grid, or more Resolution (more Energy)
Image Processing, Weather Modeling, Device Physics Simulation
Signal Processing Signal/Noise Ratio, Less Power, More Errors
36
Final ObservationsFor Large Chips and Data-centers
There is no possible Trade-off between Energy and Inaccurate Computing
Must avoid confusion between Energy and Power
Frequency can not affect the Energy of a TaskLower Frequency implies Higher Energy!
Critical Voltage is easy to determine for a chip Aging of Chips can be easily detected All future researchers must keep in mind the
large number of flips-flops on Critical Paths!
37
Questions? Comments?
38
39
40