can we save energy if we allow errors in computing? janak h. patel department of electrical and...

Can we Save Energy if we allow Errors in Computing?

Janak H. PatelDepartment of Electrical and Computer Engineering

University of Illinois at [email protected]

Auburn UniversityFebruary 18, 2016

2

OutlineWhy Energy versus Accurate Computing?An example of Energy versus Accuracy TradeoffChallenge to the Accepted Tradeoff Model

A Thought Experiment Two Real Experiments

Potential exploits of the new hypothesis Power savings in large data-centers Predicting early failures in microprocessors

3

Why Energy vs. Accuracy?Manufacturing Process Variations unavoidable

Gate Oxide thickness (TOX) Random Doping Fluctuations (RDF) Device geometry, Lithography (W, L) Resulting Transistor Threshold Variations (σVT)

Designing for Worst Case for error-free computing Wastes Energy, Performance and Yield

Designing for Typical Case may cause errors Tolerating occasional errors saves Energy, gives

higher Performance and improves Yield In Future technologies, Errors are inevitable

4

Within-Die Variation Example

“TeraflopResearch Chip”Courtesy Intel,

ISSCC 2010

80 Identical CoresBut different FmaxWithin a single chip

5

Designing with a Perfect Process

S63 S62 S61 S60 S3 S2 S1 S0

15ps60ps 45ps 30ps960ps

Carry Propagations

Slack time/Guard band/

Safety margin

960ps

40ps

A 1GHz 64-bit Adder

1000ps

A0 B0A1 B1A63 B63

FullAdder

SC

15ps

15ps

6

Modeling Process VariationsMeasured Data from FabricationAnalytical Process Models

Die-to-Die and Within-Die Variations

FullAdder

S

C

15ps±2ps

15ps±2ps

15ps 20ps10ps

7

Worst Case Design

S63 S62 S61 S60 S3 S2 S1 S0

≈15ps≈60ps ≈45ps ≈30ps960 ± 128ps

Carry Propagations830ps

A 64-bit Adder with Process Variations

1090ps

260ps

Increase Voltage to meet 1ns cycle time. Costs More Energy.Increase Cycle Time to 1.1ns. Costs in Performance.Throw away the Slow Chips! Loss in Yield and Revenue

A0 B0A1 B1A63 B63

FullAdder

S

C15 ± 2ps

15 ± 2ps

8

Saving Energy and Performance

S63 S62 S61 S60 S3 S2 S1 S0

≈15ps≈60ps ≈45ps ≈30ps≈960ps

Carry Propagations

≈960ps

A 64-bit Adder

1000ps

Errors in most significant bits on some computationsIf we can live with these errors, we case save Energy, Performance and Yield.

830ps 1090ps

A0 B0A1 B1A63 B63

9

Energy Savings with Errors

Register

ALU

Register

Register

Clock1 GHz 960 ps

±130 ps

ALU Outputs

0ps 960ps

1

2

ErrorsPer

Min/Sec

Voltage Reduction830ps 1090ps

10

Basics of Power and EnergyEnergy to Switch a Logic Gate Output

½ CV2 JoulesC is capacitance as seen by the Gate OutputV is the Supply Voltage

Energy spent by the chip during one Clock Cycle kV2 Joules

k is a constant reflecting average chip Activity and Total Gate Capacitance

Power for the chip running at F cycles/sec kV2F Joules/sec or Watts

11

Power or Energy? Common PitfallsEnergy = kV2 Joules per Clock CyclePower = kV2F Joules/secA Task that takes N Cycles to compute, consumes

kV2N Joules Independent of Frequency!!Some Observations

Modern Chips also have Static or DC power, S.Power = S + kV2F Joules/secN-cycle Task, completes in N/F sec and consumes

(S + kV2F)*(N/F) = SN/F + kV2N JoulesLowering Power by Frequency increases Energy!

12

Energy Savings with Errors

Register

ALU

Register

Register

Clock1 GHz 960 ps

±130 ps

ALU Outputs

0ps 960ps

1

2

ErrorsPer

Min/Sec

Voltage Reduction830ps 1090ps

13

Protecting against Errors If the error rate from added delays remains

relatively small, we can utilize some of the established techniques Circuit Level – iROC, Razor, etc. Error Coding – Parity codes, Arithmetic codes,

Residue codes, Parity prediction, Algorithm-based fault-tolerance, TMR etc.

Time Redundancy – RESOWhat if the error rate is massive?Are massive errors possible in a good chip?

14

Signal Propagations

CombinationalLogic

PresentState

NextState

FFs FFs

clock

0 900 960 1000 ps

Calculate from timing analysis

How many flip-flops on critical-paths?

Threshold

15

Consider a 1-Ghz chip with a million flip-flops Let us divide the 1ns Clock period in to 1000 bins Put a FF in bin p if the longest path at its input has a delay

of p picoseconds How many FFs are in bins 900ps to 950ps?

Path Delay in picoseconds →


16

Consider a 1-Ghz chip with a million flip-flops Let us divide the 1ns Clock period in to 1000 bins Put a FF in bin p if the longest path at its input has a delay

of p picoseconds How many FFs are in bins 900ps to 950ps?

Path Delay in picoseconds →


17

Signal Propagations

CombinationalLogic

PresentState

NextState

FFs 1M FFs

clock

0 900 960 1000 ps

200,000? 100,000? 50,000?


18

A Thought ExperimentLet us conservatively assume 100,000 ffs are on

critical paths (10% of total)Now Reduce the supply Voltage, thus increasing

DelayAssume just 10% of critical ffs get its inputs late

this cycle This implies 10,000 flip-flops produce errors in a

single clock cycle!Massive number of errors result in a few clock

cycles

19

Do your own AnalysisAnalytical Procedure

Estimate number (n) of FFs on critical paths from timing analysis or synthesis report -

Compute percent (p) of FFs that switch per cycle from functional simulation

p X n = errors/cycleFor example:

Total Number of Flip-Flops: 400,000 5% of these are on critical paths: n = 20,000 FFs Only 5% of these switch every cycle: p = 5% Error Rate = p X n = 1000 errors/cycle In 10 consecutive clock cycles: 10,000 errors!

20

A new hypothesis

Errors/Cycle

104

106 NEW(large chips)

1

2 OLD(small chips)

Errors

PerMin/sec Zero errors

Massiveerrors

Reduce Supply Voltage Reduce Supply Voltage

21

Hypothesis of Critical Operation Point In large CMOS circuits there exists a Critical

Voltage VC for a fixed Frequency F and ambient temperature T, such that Any voltage below VC , massive errors result Any voltage above VC , no delay related errors

occurNo tradeoff is possible between Energy and

Errors in large chips No practical ways exist to handle massive errors! No error handling needed above VC

22

Experiments to disprove the hypothesisSubject a large chip to slowly decreasing supply

voltage, keeping the Frequency fixed. At each step, exercise the chip extensively and

monitor continuously for any errors

23

Experiments on MicroprocessorsPowerPC 750

2.5V, 233MHz Voltage reduced in steps of 10 mV A C-program monitored the errors while stressing

the processorPentium-M

(1.308V, 2GHz) down to (0.700V, 0.798GHz) For each of the 9 different frequencies, Voltage

reduced in steps of 16 mV Third-party program to keep the cpu 100% busy

and report errors (more like a power virus!)

24

Experiment to find which of these two?

Errors/Cycle

104

106NEW

Decrease Supply Voltage

1

2 OLD

Errors

PerMin/sec

Decrease Supply Voltage

25

Experimental Set-Up for PowerPCA Single-Board-Computer with PowerPC 750

233MHz, 2.5V Power Supply A Hewlett-Packard E3631A Power Supply

Digital control in units of 10 miliVolts steps A Blow-Drier to raise the ambient temperature

A Program written to stress all major functional blocks Tried to maximize execution rate (load) Tried to maximize logic switching rate Every operation was checked against known good

values and instantly reported for any error

27

Results of Lowering Supply VoltagePower PC-750 μP

Chip No.

No. Tests

Critical Supply Voltage VC

Observations System Hangs

Program Crashed

1 45 1.99 V 31 14 2 35 2.00 V 26 9 3 25 2.10 V 18 7 4 25 2.08 V 17 8

Nominal Supply Voltage of 2.5 V is reduced in steps of1/100th Volt with clock frequency constant at 233MHz

No Data Error was ever Observed at user visible Registers!

28

Intel Pentium-M Processor“Speed step” technology with 9 different speedsRated at 2GHz at core voltage of 1.308VExperiment

Choose one of the nine frequencies F1, … , F9 While keeping cpu 100% busy reduce the voltage

in steps of 16mV Monitor for errors

No errors observed – only crashes!

29

Experiment on Pentium-M

No errors!

Catastrophic failure!

30

Some Remarks on ExperimentPossible explanation for the observations

A modern processor has a large number of flip-flops that are not user visible

e.g. Pre-fetch buffers, history tables, reservation stations, write buffers, and state controllers for everything from moving instructions and data to controlling a cache

Control Logic fails simultaneously with ALU datapath

Massive errors in control and data in a single cycle Instruction flow is completely disrupted. Therefore

no error could be reported. Catastrophic failure!

32

Exploiting Process Variations If the “critical operation point hypothesis” holds

Above critical supply voltage VC no errors occur. In large data-centers with 1000’s of

microprocessors, operating each with the lowest VC for a given frequency can save lots of power

As the number of cores approach 100 or more, it would be imperative to use different voltage-frequency pair (FC, VC) for each core on the same die.

33

Dynamic Power Savings in Pentium-M

34

Energy Saving Ideas for Microprocessors

Simplify CPU Remove Floating Point Unit Remove Divider Remove Multiplier Reduce Word Width

From 32 bits to 16 or 8 bitsSimplify Memory Hierarchy

Remove Virtual Addresses Manage Memory Hierarchy explicitly

35

Accuracy vs. QualityEnergy and Quality of Solution Tradeoff

All Iterative Algorithms that improve the Quality of Solution with more iterations (more Energy)

Simulated Annealing, Genetic, Steepest Decent All Algorithms that use Grid of various dimensions,

improve the Quality of Solution with finer Grid, or more Resolution (more Energy)

Image Processing, Weather Modeling, Device Physics Simulation

Signal Processing Signal/Noise Ratio, Less Power, More Errors

36

Final ObservationsFor Large Chips and Data-centers

There is no possible Trade-off between Energy and Inaccurate Computing

Must avoid confusion between Energy and Power

Frequency can not affect the Energy of a TaskLower Frequency implies Higher Energy!

Critical Voltage is easy to determine for a chip Aging of Chips can be easily detected All future researchers must keep in mind the

large number of flips-flops on Critical Paths!

37

Questions? Comments?

can we save energy if we allow errors in computing? janak h. patel department of electrical and...

Documents