methods for true energy-performance optimization

METHODS FOR TRUE ENERGY-PERFORMANCE OPTIMIZATION

Naga Harika Chinta

OVERVIEW

• Introduction

• Optimization Methods

A. Gate size

B. Supply voltage

C. Threshold voltage

• Circuit-level optimization

A. Technology calibration

B. Sensitivity to gate sizing, supply and threshold voltage

C. Optimization example: 64-bit adder

• Micro-architectural optimization

A. Choosing optimal circuit topology

B. Parallelism versus pipelining

C. Choosing optimal VDD and VTH

D. Optimal level of parallelism

E. Energy-area tradeoff

• Conclusion

• References

INTRODUCTION

• Maximum achievable performance depends on the efficiency of computation per unit energy.

• Optimization is achieved by

- minimizing energy consumption.

- maximizing the amount of computation.

• Optimization is performed at three layers of abstraction:

- System architecture (outer layer).

- Micro-architecture (intermediate layer).

- Fixed circuit topology (inner layer).

• The inner layer deals with:

- Circuit-specific supply voltage (VDD).

- Threshold voltage (VTH).

- Gate sizing.

OPTIMIZATION METHODS

• The tradeoff achieved by tuning a design variable x is given by the energy-delay sensitivity to x:
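A minimal sketch, assuming the sensitivity is defined as the ratio of the energy and delay derivatives with respect to x, S_x = (dE/dx) / (dD/dx) evaluated at the operating point x0. The callables `energy` and `delay` and the toy model in the usage lines are hypothetical placeholders, not the calibrated models used in the talk.

```python
def sensitivity(energy, delay, x0, h=1e-4):
    """Estimate S_x = (dE/dx) / (dD/dx) at x0 using central differences.

    `energy` and `delay` are callables of the tuning variable x
    (hypothetical stand-ins for real circuit models).
    """
    dE = (energy(x0 + h) - energy(x0 - h)) / (2 * h)
    dD = (delay(x0 + h) - delay(x0 - h)) / (2 * h)
    return dE / dD


# Toy model (assumed for illustration): energy grows with x, delay shrinks.
def toy_energy(x):
    return x ** 2


def toy_delay(x):
    return 1.0 / x


s = sensitivity(toy_energy, toy_delay, x0=2.0)
print(f"sensitivity at x0 = 2: {s:.3f}")
```

A negative value means that energy is being traded for delay: moving the knob in the direction that lowers delay raises energy. Comparing the magnitudes for different knobs indicates which one buys the most energy per unit of delay.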

• Tuning knobs in circuit optimization:

- Gate size.

- Supply voltage.

- Threshold voltage.

• Gate sizing: performed by TILOS (Timed Logic Synthesizer), a tool that adjusts transistor sizes and reorders series devices to meet performance constraints.

• Supply voltage: scaling is employed in low-power DSP.

• Threshold voltage: tuning has become more prominent as leakage power consumption has grown.

CIRCUIT-LEVEL OPTIMIZATION:

TECHNOLOGY CALIBRATION

A. Delay Model:

• The alpha-power law model is used as the baseline for the gate-delay model.

B. Energy Model:

• Two components of energy are considered:

1. Switching component:

2. Leakage component:
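The delay and energy models above can be sketched in their common textbook forms. This is an illustration under stated assumptions, not the calibrated model from the talk: the function names, the fitting constants `k_d`, `alpha_d`, `i0` and `v0`, and all default values are hypothetical.

```python
import math


def gate_delay(vdd, vth, c_load, k_d=1.0, alpha_d=1.3):
    """Alpha-power-law delay: rises sharply as vdd approaches vth."""
    return k_d * c_load * vdd / (vdd - vth) ** alpha_d


def switching_energy(vdd, c_load, activity=0.1):
    """Dynamic energy per cycle: activity * C * Vdd^2."""
    return activity * c_load * vdd ** 2


def leakage_energy(vdd, vth, cycle_time, i0=1.0, v0=0.04):
    """Subthreshold leakage energy per cycle.

    Leakage current falls exponentially with vth; the energy is that
    current times vdd times the cycle time.
    """
    return i0 * math.exp(-vth / v0) * vdd * cycle_time


d = gate_delay(vdd=1.0, vth=0.3, c_load=1.0)
print("delay:", d)
print("switching energy:", switching_energy(1.0, 1.0))
print("leakage energy:", leakage_energy(1.0, 0.3, cycle_time=d))
```

The leakage term depends on the cycle time, which is why any knob that slows the circuit also affects leakage energy.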


REFERENCE DESIGN PROPERTIES


SENSITIVITY TO GATE SIZING

• The sensitivity of circuit energy to delay due to a change in the size of a gate in the i-th stage is given by

• The largest potential for energy savings is offered by a design sized for minimum delay.

• Decreasing gate size reduces the leakage current but increases the cycle time D, which increases the leakage energy.

• To achieve equal sensitivity in all stages, the difference in effective fan-outs must increase in proportion to the energy of the gate.

SENSITIVITY TO SUPPLY VOLTAGE

• The sensitivity of circuit energy to the delay increase caused by a global supply reduction is given by

• A design sized for minimum delay at the maximum supply voltage offers the greatest potential for energy reduction.

• A decrease in supply voltage:

- decreases both switching and leakage energy

- results in a negative sensitivity (energy falls as delay rises)

SENSITIVITY TO THRESHOLD VOLTAGE

• The sensitivity of circuit energy to delay due to a change in threshold voltage is given by

• This sensitivity decays exponentially as Vth increases.

• For designs with low leakage, Vdd can be reduced:

- by lowering the threshold voltage while maintaining circuit speed.

• Total energy is minimized:

- when the leakage and switching components are comparable to each other.
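The balance between leakage and switching energy can be illustrated with a small sweep. This is a sketch under stated assumptions: it reuses textbook alpha-power-law and exponential-leakage forms, and every constant (`ALPHA`, `V0`, `I0`, `K`, the reference point Vdd = 1.0, Vth = 0.3) is a hypothetical value chosen for illustration. For each Vth, the supply is adjusted by bisection so that the delay stays at the reference value, and the total energy is recorded.

```python
import math

ALPHA, V0, I0, K = 1.5, 0.05, 20.0, 1.0  # hypothetical model constants


def delay(vdd, vth):
    """Alpha-power-law delay (decreases monotonically with vdd)."""
    return K * vdd / (vdd - vth) ** ALPHA


def vdd_for_delay(target, vth, hi=5.0):
    """Bisection for the supply that meets `target` delay at this vth."""
    lo = vth + 1e-6
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if delay(mid, vth) > target:
            lo = mid  # too slow: raise the supply
        else:
            hi = mid
    return hi


def energies(vdd, vth, cycle):
    """Return (switching, leakage) energy per cycle, unit capacitance."""
    e_sw = vdd ** 2
    e_lk = I0 * math.exp(-vth / V0) * vdd * cycle
    return e_sw, e_lk


target = delay(1.0, 0.3)  # reference design point (assumed)
best = None
for i in range(81):
    vth = 0.10 + 0.005 * i
    vdd = vdd_for_delay(target, vth)
    e_sw, e_lk = energies(vdd, vth, target)
    if best is None or e_sw + e_lk < best[0]:
        best = (e_sw + e_lk, vth, vdd, e_lk / e_sw)

best_energy, best_vth, best_vdd, ratio = best
print(f"Vth={best_vth:.3f} Vdd={best_vdd:.3f} E_lk/E_sw={ratio:.2f}")
```

In this toy model the minimum falls at an interior point where neither component is negligible. The exact ratio at the optimum depends on the assumed constants.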

64-BIT ADDER

16-bit Kogge-Stone tree adder:

• All paths are assumed to be critical paths.

• Calculates the carry signals in advance, based on the input signals.

• Carry propagation is recursive.

Boolean Equations:

$P_i = A_i \oplus B_i$ (carry propagate)

$G_i = A_i \cdot B_i$ (carry generate)

$S_i = P_i \oplus C_i$ (sum)

$C_{i+1} = G_i + P_i C_i$ (carry out)

• Signals P and G depend only on the input bits.
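The Boolean equations above can be exercised with a bit-level software model. This is a minimal sketch of a Kogge-Stone parallel-prefix carry computation on Python integers, assuming a zero carry-in. The function name `kogge_stone_add` and its interface are illustrative, and the model says nothing about gate sizing or timing.

```python
def kogge_stone_add(a, b, width=16):
    """Add two `width`-bit integers with a Kogge-Stone prefix network.

    Returns (sum, carry_out). Carry-in is assumed to be zero.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    p = a ^ b  # P_i = A_i xor B_i, one bit per position
    g = a & b  # G_i = A_i and B_i

    # Prefix stages: after the stage with distance d, bit i of G holds
    # the generate signal of the group ending at i and spanning 2d bits.
    G, P = g, p
    d = 1
    while d < width:
        G = G | (P & (G << d))
        P = P & (P << d)
        d *= 2

    carries_in = (G << 1) & mask      # C_i is the carry out of bit i-1
    total = (p ^ carries_in) & mask   # S_i = P_i xor C_i
    carry_out = (G >> (width - 1)) & 1
    return total, carry_out


print(kogge_stone_add(0x1234, 0x0FF0))
print(kogge_stone_add(0xFFFF, 0x0001))
```

For a 16-bit word the loop runs four times (d = 1, 2, 4, 8), which matches the logarithmic depth of the tree.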

16-BIT KOGGE-STONE TREE ADDER

ENERGY MAP FOR 64-BIT ADDER


COMPARISON BETWEEN DIFFERENT LOGIC CIRCUITS

• Gate sizing is the most effective knob at small incremental delays.

• Data below shows the potential energy reduction due to W, Vdd and Vth:

COMPARISON BETWEEN DIFFERENT LOGIC CIRCUITS (CONT’D)

• Supply voltage becomes the more effective knob at larger delays, providing larger energy savings.

• The figure below shows the energy-delay sensitivity to each of the tuning variables in the adder example.

OPTIMAL ENERGY TRADE-OFF IN 64-BIT ADDER

• Energy-delay tradeoff curve obtained by jointly optimizing gate size, supply and threshold voltage:

• Circuit optimization is effective only within about 30% of the reference delay.

MICRO-ARCHITECTURAL OPTIMIZATION

• Various abstraction layers in the optimization

OPTIMAL CIRCUIT TOPOLOGY

• 64-bit ALU:

• The ALU consists of two registers that drive a 64-bit Kogge-Stone tree adder:

1. Simple cycle latches (CL).

2. Static master-slave latch pairs.

• The output load (C_L) is due to register, wire and bus capacitances.

• The input capacitance of the adder is fixed to reduce the search space in global optimization.

• The major difference between the register and the adder is the switching activity.

OPTIMAL CIRCUIT TOPOLOGY (CONT’D)

• The figure below shows the energy-efficient curves for the register, adder and ALU:

• At the circuit level, the total energy of the ALU is minimized subject to a cycle-time constraint.

MICRO-ARCHITECTURAL OPTIMIZATION

PARALLELISM VERSUS PIPELINING

• Threshold voltage is introduced as an additional tuning variable in the optimization.

• Micro-architectural design options: nominal, parallel and pipelined.

• Nominal: an add-compare unit that uses a 64-bit adder for both the adder (A) and the comparator (B).

• Pipelined: an extra register is inserted between blocks A and B.

• Parallel: area is doubled by operating two blocks in parallel.

PARALLELISM VERSUS PIPELINING (CONT’D)

• The figure below shows the energy per operation of all three designs, compared with the nominal case:

• Conclusions from the graph:

1. Parallelism is more energy-efficient than pipelining when leakage energy is smaller than switching energy.

2. As devices become leakier, parallel implementations achieve smaller energy savings.

3. Total energy is minimized at the point where the leakage energy is about half of the active energy.
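The second conclusion can be illustrated with a toy model of a P-way parallel design at fixed throughput. This is a sketch under stated assumptions, not the analysis from the talk: each of the P units is allowed P times the reference delay, so its supply can be lowered, but all P units leak during every operation. The constants `ALPHA`, `V0`, `VTH`, the leakage scale `i0`, and the omission of multiplexing overhead are all hypothetical simplifications.

```python
import math

ALPHA, V0, VTH = 1.5, 0.05, 0.3  # hypothetical model constants


def delay(vdd):
    """Alpha-power-law delay at fixed threshold voltage."""
    return vdd / (vdd - VTH) ** ALPHA


def vdd_for_delay(target, hi=5.0):
    """Bisection for the supply voltage that meets `target` delay."""
    lo = VTH + 1e-6
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if delay(mid) > target:
            lo = mid
        else:
            hi = mid
    return hi


def energy_per_op(p, i0):
    """Energy per operation for p parallel units at fixed throughput."""
    d_ref = delay(1.0)               # one operation completes every d_ref
    vdd = vdd_for_delay(p * d_ref)   # each unit may be p times slower
    e_sw = vdd ** 2                  # switching energy, unit capacitance
    # p units leak for the whole interval d_ref between results
    e_lk = p * i0 * math.exp(-VTH / V0) * vdd * d_ref
    return e_sw + e_lk


ratio_low_leak = energy_per_op(2, i0=20.0) / energy_per_op(1, i0=20.0)
ratio_high_leak = energy_per_op(2, i0=200.0) / energy_per_op(1, i0=200.0)
print(f"P=2 vs nominal, low leakage:  {ratio_low_leak:.2f}")
print(f"P=2 vs nominal, high leakage: {ratio_high_leak:.2f}")
```

In this model the two-way parallel design saves energy in both cases, but the saving shrinks as the leakage scale grows, because the lower supply only partly offsets the doubled number of leaking units.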


OPTIMAL VDD AND VTH

• The dependence of the critical-path delay and the ELk/ESw ratio on Vdd and Vth yields:

where init = initial design point

opt = optimal design point

• The equations are based on:

- changing the threshold voltage first.

- then changing Vdd to achieve the desired performance.

OPTIMAL VDD AND VTH (CONT’D)

OPTIMAL LEVEL OF PARALLELISM

• The figure below shows the energy-performance space for designs with parallelism P=2 to P=5.

• The external load used was CL=32.

• Conclusions:

1. A parallel architecture provides an increase in performance at a very small marginal cost in energy.

2. Adding more parallel stages increases the throughput.

ENERGY-AREA TRADEOFF

• An optimal tradeoff between energy and area reduces the cost of the design.

• The design cost function C(x) is given by:

where E(x) and A(x) are the total energy and area,

and β is the weight factor.

• Time-multiplexing is used to reduce the area at the expense of some increase in energy.

ENERGY-AREA TRADEOFF (CONT’D)

• Optimization in this case is achieved at the point where the overall chip cost is minimized.

CONCLUSIONS


REFERENCES

• D. Markovic, V. Stojanovic, B. Nikolic, M. A. Horowitz, and R. W. Brodersen, “Methods for true energy-performance optimization,” IEEE J. Solid-State Circuits, vol. 39, no. 8, Aug. 2004.

• A. E. Dunlop, J. P. Fishburn, D. D. Hill, and D. D. Shugard, “Experiments using automatic physical design techniques for optimizing circuit performance,” Proc. IEEE, vol. 1, 1989.

• N. H. E. Weste and D. M. Harris, “CMOS VLSI Design: A Circuits and Systems Perspective,” 4th ed., 2011.
