system/behavioural level low power design


Post on 18-Mar-2022


Dimitrios Soudris, NTUA Low Power Design Course

System-level Estimation


Power savings in terms of Design Level

[Figure: design levels from highest to lowest abstraction: system, behavior, RT, logic, transistor, layout. Savings ranges shown on the figure: 10-20x, 2-5x, 20-50%. Arrow label: increasing power savings.]

Techniques to reduce supply voltage

• Algorithm: transformations to exploit concurrency
• Architecture: parallelism and pipelining
• Circuit/Logic: transistor sizing, fast logic structures
• Technology: threshold voltage reduction, feature size scaling

Techniques for minimizing the switched capacitance

• System: partitioning, power-down, power states
• Algorithm: complexity, concurrency, regularity, locality, data representation
• Architecture: concurrency, instruction set selection, signal correlations, data representation, data encoding
• Circuit/Logic: transistor sizing, logic optimization, power-down, layout optimization
• Technology: advanced packaging, SOI

Instruction Level Power Estimation methodology

[Flow diagram with the following blocks:
• Assembly/Machine Code
• Determination of Basic Blocks
• Execution Profiling
• Base Cost Table
• Stall Analysis
• Basic Block Cost Estimation
• Global Program Cost Estimation
• Cache Penalty Estimation (Cache Simulation)
• Final Program Cost]
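
The blocks of this flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the base-cost table, the stall and cache-miss penalties, the block names, and the execution counts are all made-up placeholder values, and the way the blocks are combined (block cost weighted by profiled execution count, plus a separately estimated cache penalty) is one plausible reading of the diagram.

```python
# Illustrative sketch of instruction-level energy estimation.
# All costs are placeholder values in nJ, not measured data.
BASE_COST_NJ = {"add": 1.0, "mul": 3.0, "load": 2.5, "store": 2.5, "branch": 1.5}


def basic_block_cost(instructions, stall_cycles=0, stall_cost_nj=0.5):
    """Base cost of one execution of a basic block plus its stall overhead."""
    base = sum(BASE_COST_NJ[op] for op in instructions)
    return base + stall_cycles * stall_cost_nj


def program_cost(blocks, exec_counts, cache_misses=0, miss_penalty_nj=10.0):
    """Block costs weighted by profiled execution counts, plus a cache
    penalty that would come from a separate cache simulation."""
    total = 0.0
    for name, (instructions, stalls) in blocks.items():
        total += exec_counts[name] * basic_block_cost(instructions, stalls)
    return total + cache_misses * miss_penalty_nj


# Hypothetical program: one loop body executed 100 times and an exit block.
blocks = {
    "loop_body": (["load", "mul", "add", "store", "branch"], 1),
    "exit": (["add", "branch"], 0),
}
exec_counts = {"loop_body": 100, "exit": 1}
total_nj = program_cost(blocks, exec_counts, cache_misses=4)
print(f"estimated program energy: {total_nj:.1f} nJ")
```

In a real flow the base costs would come from measurements on the target processor, and the execution counts from a profiler.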

Power consumption of microprocessors

[Figure: three bar charts of power (axis label: watts) by operating mode.
• PowerPC 603 microprocessor: normal operation 2.2; reduced-power modes (Doze, Nap) 0.366, 0.135, 0.047
• MIPS 4200 microprocessor: normal operation 1.5; reduced power 0.4
• Hitachi SH7032 microprocessor: normal operation 130; reduced-power mode (Doze) values 50 and 6.6]

Power consumption of transfer and storage over datapath operations both in hardware and software

[Figure, first panel: relative energy per operation for a 16-bit carry-select adder, a 16-bit multiplier, an 8x128x16 SRAM read, an 8x128x16 SRAM write, an external I/O access, and a 16-bit memory access; bar labels as extracted: 13.6, 4.4, 910, 33. Second panel: relative energy of storage, interconnect, and other RISC components, on an axis with ticks 0.0, 0.2, 0.4.]

System-level Optimization


Variable-Voltage Techniques

• Existing low-power techniques
– static variable-voltage techniques
– efficient design of the voltage converter
– dynamic variable-voltage techniques

System for Variable Voltage Supplies

[Block diagram: workload filter, FIFO, rate controller, voltage regulator, ring oscillator, and an arbitrary synchronous DSP; signals VDD and clock]

Architecture/RTL Power Optimization Techniques


Architecture Power Optimization Techniques

• Architecture-driven voltage reduction: The key idea is to speed up the circuit so that the supply voltage can be reduced while still meeting throughput constraints. Voltage reduction can be achieved by introducing parallelism in hardware or by inserting flip-flops (pipelining)

• Switching activity minimization: Try to prevent the generation and propagation of spurious transitions, or to reduce the number of transitions, e.g. retiming, path balancing, data representation

• Switched capacitance minimization: Aim at minimizing the capacitance that is switched

• Dynamic power management: Under certain conditions a circuit part becomes inactive, which avoids unnecessary calculations, e.g. gated clocks, operand isolation, pre-computation, and guarded evaluation

Architecture Trade-offs: Reference Data Path

• Critical path delay: $T_{adder} + T_{comparator} = 25$ ns, so $f_{ref} = 40$ MHz
• Total capacitance being switched: $C_{ref}$
• $V_{dd} = V_{ref} = 5$ V
• Power for reference datapath: $P_{ref} = C_{ref} V_{ref}^2 f_{ref}$

Voltage Reduction Technique: Parallelism

• The clock rate can be reduced by half with the same throughput: $f_{par} = f_{ref} / 2$
• $V_{par} = V_{ref} / 1.7$, $C_{par} = 2.15\, C_{ref}$
• $P_{par} = (2.15\, C_{ref}) (V_{ref}/1.7)^2 (f_{ref}/2) \approx 0.36\, P_{ref}$

Voltage Reduction Technique: Pipeline

• $f_{pipe} = f_{ref}$, $C_{pipe} = 1.1\, C_{ref}$, $V_{pipe} = V_{ref}/1.7$
• Voltage can be dropped while maintaining the original throughput
• $P_{pipe} = C_{pipe} V_{pipe}^2 f_{pipe} = (1.1\, C_{ref}) (V_{ref}/1.7)^2 f_{ref} \approx 0.37\, P_{ref}$
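
The parallel and pipelined figures both follow from P = C V² f with every quantity expressed relative to the reference datapath. A minimal Python check, where `relative_power` is a helper name introduced here for illustration: with the rounded divisor 1.7 the ratios come out near 0.37 and 0.38, slightly above the 0.36 and 0.37 quoted on the slides, presumably because 1.7 is itself a rounded voltage ratio.

```python
def relative_power(cap_ratio, volt_ratio, freq_ratio):
    """P / Pref for P = C * V^2 * f, each argument relative to the reference."""
    return cap_ratio * volt_ratio ** 2 * freq_ratio


# Parallel datapath: 2.15x capacitance, Vref / 1.7, half the clock rate.
p_par = relative_power(2.15, 1 / 1.7, 0.5)

# Pipelined datapath: 1.1x capacitance, Vref / 1.7, same clock rate.
p_pipe = relative_power(1.1, 1 / 1.7, 1.0)

print(f"parallel:  {p_par:.3f} Pref")
print(f"pipelined: {p_pipe:.3f} Pref")
```

The quadratic voltage term dominates in both cases: the capacitance overhead of the extra hardware is more than offset by the lower supply voltage.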

Comparisons

Logic Style and Power Consumption

• Power-delay product improves as voltage decreases
• The “best” logic style minimizes power-delay for a given delay constraint

Glitching: Chain of NOR Gates

Glitching: Adder Circuit

Switching Activity in Adders

Switching Activity in Multipliers

Resource Sharing Can Increase Activity (1)

Resource Sharing Can Increase Activity (2)

Data representation

• Sign-extension activity significantly reduced using sign-magnitude representation

Delay and Power under Voltage Scaling

Glitching activity reduction (1)

• Glitching activity depends heavily on the topology of the circuit

• Circuit topology is also important for the clock selection

• The clock period chosen during scheduling may affect the glitching activity, since a long clock period leads to schedules that chain many functional units within one cycle

[Figure: two adder topologies over the inputs A, B, C, D, each built from three adders: (a) and (b)]

Glitching activity reduction (2)

• Sometimes the architecture topology is not detailed

• RTL transformations for reducing glitching activity:

– Architectural delay balancing using buffers and transparent latches

– Use of the clock signal to suppress glitchy transitions

– Selective delay insertion to minimize glitch propagation

– Multiplexer decomposition and multiplexer tree structuring to eliminate the use of glitchy control signals and to minimize glitch propagation on data and control signals

Glitching activity reduction (3)

Function: if (x < y) then z = c + d else z = a + b

• Architecture 1 power consumption: 823.9 μW without glitches, 1650 μW with glitches
• Architecture 2 power consumption: 951.7 μW without glitches, 1357.7 μW with glitches

[Figure: two architectures implementing the function with a comparator on x and y, adders, and 2-to-1 multiplexers over the inputs a, b, c, d, producing z]
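
The share of power caused by glitches in the two architectures can be read off directly from the four quoted numbers. A small Python calculation (the helper name `glitch_share` is introduced here for illustration):

```python
# (power without glitches, power with glitches) in microwatts, from the slide.
ARCHITECTURES = {
    "Architecture 1": (823.9, 1650.0),
    "Architecture 2": (951.7, 1357.7),
}


def glitch_share(without_uw, with_uw):
    """Fraction of the total power that is attributable to glitches."""
    return (with_uw - without_uw) / with_uw


for name, (clean, total) in ARCHITECTURES.items():
    extra = total - clean
    share = 100 * glitch_share(clean, total)
    print(f"{name}: glitch power {extra:.1f} uW ({share:.1f}% of total)")
```

Architecture 1 is cheaper when glitches are ignored but more expensive once they are counted, which is why glitch-aware estimation can change the ranking of candidate architectures.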

Signals and Operations Reordering

• Example: complex multiplication. Trading a multiplication for an addition

[Figure: (a) direct form computing Yr and Yi from Xr, Xi, Ar, Ai with four multipliers, one subtractor and one adder; (b) reordered form using the terms Ar, Ai-Ar and Ai+Ar with three multipliers and three adders/subtractors]
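
The reordering can be checked numerically. The sketch below is a reconstruction from the terms visible in the figure (Ar, Ai-Ar, Ai+Ar), not a transcription of it, and it assumes that A is a fixed coefficient so that Ai-Ar and Ai+Ar are available as precomputed constants. The function names are illustrative.

```python
def cmul_direct(ar, ai, xr, xi):
    """Direct form: 4 multiplications, 2 additions/subtractions."""
    yr = ar * xr - ai * xi
    yi = ai * xr + ar * xi
    return yr, yi


def cmul_reordered(ar, ai, xr, xi):
    """Reordered form: 3 multiplications, 3 additions/subtractions,
    assuming (ai - ar) and (ai + ar) are precomputed constants."""
    diff = ai - ar            # constant when the coefficient A is fixed
    summ = ai + ar            # constant when the coefficient A is fixed
    shared = ar * (xr + xi)   # one addition, one multiplication
    yr = shared - summ * xi   # one multiplication, one subtraction
    yi = shared + diff * xr   # one multiplication, one addition
    return yr, yi


print(cmul_direct(3, -2, 5, 7))
print(cmul_reordered(3, -2, 5, 7))
```

Both calls print the same pair, since Ar(Xr+Xi) - (Ai+Ar)Xi = ArXr - AiXi and Ar(Xr+Xi) + (Ai-Ar)Xr = ArXi + AiXr.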

Module Selection

[Figure: (a), (c), (d) data-flow graphs with multiplications *i, *ii, *iii and additions +i, +ii under different module assignments; (b) RTL library]

RTL library:
• Ripple adder: Area = 2744, Latency = 30 ns, Power = 1199 μW
• Carry-lookahead adder: Area = 3959, Latency = 20 ns, Power = 1467 μW
• Array multiplier: Area = 16185, Latency = 60 ns, Power = 18540 μW
• Wallace multiplier: Area = 18443, Latency = 40 ns, Power = 23545 μW
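
Module selection can be illustrated with the library figures above. The sketch assumes a hypothetical path of one multiplication followed by one addition, sums latencies and powers naively, and picks the lowest-power pair that meets a latency budget. The pairing of the extracted numbers with the module names is itself inferred from the slide layout.

```python
from itertools import product

# (area, latency in ns, power in uW) per module, as read from the RTL library.
ADDERS = {"ripple": (2744, 30, 1199), "carry-lookahead": (3959, 20, 1467)}
MULTIPLIERS = {"array": (16185, 60, 18540), "wallace": (18443, 40, 23545)}


def select_modules(latency_budget_ns):
    """Return (adder, multiplier, power, latency) for the lowest-power pair
    whose summed latency meets the budget, or None if no pair fits."""
    best = None
    for (a_name, a), (m_name, m) in product(ADDERS.items(), MULTIPLIERS.items()):
        latency = a[1] + m[1]
        power = a[2] + m[2]
        if latency <= latency_budget_ns and (best is None or power < best[2]):
            best = (a_name, m_name, power, latency)
    return best


for budget in (90, 70, 50):
    print(budget, "ns ->", select_modules(budget))
```

A relaxed budget allows the slow, low-power modules; tightening it forces the faster and more power-hungry ones, and a budget below 60 ns cannot be met by any pair in this library.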

Power Management Techniques in RT-level

• Power management reduces unnecessary transitions under certain conditions

• Power management techniques
– Clock-based power management: automatic synthesis of gated-clock circuits, clock gating techniques for data path registers, clock tree construction to facilitate clock gating, and power management using multiple non-overlapping clocks

Power Management Techniques in RT-level (cont’d)

– Pre-computation
– Operand isolation
  • Guarded evaluation
  • Operand isolation in the context of high-level synthesis
– Dynamic frequency scaling

The concept of gating clock signals

[Figure: (a) register REG with a multiplexer over X and Y controlled by the comparison A < B; (b) two gated-clock schemes (scheme 1 and scheme 2) that derive the register clock from the comparator output; (c) waveforms over one clock period: clock, comparator output, gated clock (scheme 1), gated clock (scheme 2)]

Automatic synthesis of gated clocks

• Reactive systems wait for a certain event to occur before changing state. During the wait periods the outputs of the system do not change, and if the system is still clocked, power is wasted. This method recognizes these idle states and inserts the appropriate logic that stops the clock

[Figure: original FSM with combinational logic and a STATE register, inputs IN, outputs OUT, clocked by CLK; gated version with an added activation function Fa and latch L producing the gated clock GCLK]
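
The idea can be illustrated with a behavioral Python model. The two-state controller below is hypothetical (it is not the circuit from the figure), its outputs are assumed to depend only on the state, and `is_idle` plays the role of the activation function: it is true exactly when the next state equals the current one, so the clock edge to the state register can be suppressed.

```python
def next_state(state, go):
    """Hypothetical reactive controller: WAIT until `go`, then RUN for one cycle."""
    if state == "WAIT":
        return "RUN" if go else "WAIT"
    return "WAIT"


def is_idle(state, go):
    """Activation function: the state register would reload the same value."""
    return next_state(state, go) == state


def simulate(inputs):
    """Return (final_state, clock_edges_delivered) for the gated design."""
    state, edges = "WAIT", 0
    for go in inputs:
        if not is_idle(state, go):
            state = next_state(state, go)
            edges += 1  # the gated clock pulses only on non-idle cycles
    return state, edges


stimulus = [0, 0, 0, 1, 0, 0, 0, 0]
print(simulate(stimulus), "of", len(stimulus), "cycles")
```

For this stimulus the state register is clocked in 2 of 8 cycles. In hardware, the saving has to be weighed against the power and delay of the added activation logic.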

Gated-clock techniques for data path registers

• The aim is to determine the conditions under which the register retains or re-loads its value

• The condition can be expressed in terms of the select signals of the individual multiplexers along the path

[Figure: register fed through a network of 2-to-1 multiplexers with select signals constr(0), constr(1), constr(2)]

Clock tree design to derive gated-clock signals

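[Figure: gated-clock cell combining clock and idle condition into a gated clock; (a) clock tree from clock through nodes A and B to registers R1 to R4, gated with x1, x2, x1+x3 and x2+x4 respectively; (b) alternative tree in which x1 and x2 are applied higher in the tree and x3, x4 gate R3 and R4]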

Power Management Using Multiple Non-Overlapping Clocks

• With gated clocks, the clock signal feeding a sub-circuit is suppressed whenever the registers in that sub-circuit do not need to load a new value. In general, the cycles in which the clock transitions are suppressed need not follow any regular pattern, since the suppression is data-dependent. Some types of designs, however, contain sub-circuits whose idle clock cycles follow a simple, regular pattern. For example, a component may be active and idle in alternating clock cycles. If the cycles in which a sub-circuit is idle follow a regular pattern, the clock generation circuitry need not be data-dependent.

Pre-computation

• Pre-computation [Ald94] is another RT-level and gate-level power management technique. It relies on duplicating part of the logic in order to pre-compute the output values one clock cycle earlier than required. When this succeeds, the original logic is turned off in the next clock cycle, eliminating activity in its internal nodes. For pre-computation to save power, there must be combinational blocks for which a relatively large percentage of the output values can be pre-computed by a significantly smaller block

Pre-computation: Example (1)

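[Figure: original circuit with combinational block A between input register R1 (inputs X1, X2, ..., XN) and output register R2 (output f); pre-computation architecture in which predictor functions g1 and g0, followed by flip-flops (FF), generate the load-enable signal LE of R1]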

Pre-computation: Example (2)

• The Boolean functions g1 and g0 serve as the predictor functions of the whole architecture, according to the following implications:

$g_1 = 1 \Rightarrow f = 1$

$g_0 = 1 \Rightarrow f = 0$

• Therefore, if either g1 or g0 is high during clock cycle T, the load enable signal (LE) goes low, and the inputs to block A are forced to retain their values during clock cycle T+1. Hence no gate output transitions occur inside block A, while the correct output value for the next time frame is provided by the two registers located at the outputs of g1 and g0
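
A commonly used illustration of predictor functions, which is an assumption here and not taken from these slides, is an n-bit comparator f = (C > D) whose most significant bits decide the result whenever they differ. The Python sketch below checks both implications exhaustively for 4-bit operands and counts how often the load enable would go low for uniformly distributed inputs.

```python
N = 4  # operand width in bits (illustrative choice)


def msb(value):
    """Most significant bit of an N-bit value."""
    return (value >> (N - 1)) & 1


def predictors(c, d):
    """Predictors for f = (c > d): g1 = 1 implies f = 1, g0 = 1 implies f = 0."""
    g1 = msb(c) & (1 - msb(d))
    g0 = (1 - msb(c)) & msb(d)
    return g1, g0


disabled = 0
for c in range(2 ** N):
    for d in range(2 ** N):
        f = int(c > d)
        g1, g0 = predictors(c, d)
        if g1:
            assert f == 1
        if g0:
            assert f == 0
        load_enable = 1 - (g1 | g0)  # LE low: inputs of block A hold their values
        disabled += 1 - load_enable

fraction = disabled / 4 ** N
print(f"block A is shut off in {100 * fraction:.0f}% of input combinations")
```

The predictors look at only two of the 2N input bits, yet they decide the output for half of all input pairs, which is the situation the technique needs in order to pay off.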

The concept of operand isolation

• In operand isolation, transparent latches are inserted at all the inputs of an embedded logic block, and control circuitry is added to detect the idle conditions for the block. When the block is not required to perform any useful operation, the transparent latches at its inputs are disabled and retain the previous cycle's values, avoiding unnecessary power dissipation in the idle block

[Figure: combinational logic containing an embedded block, with transparent latches at the block's inputs controlled by circuitry detecting the idle condition]

Guarded evaluation

• Guarded evaluation [Tiwari95] is a shut-down technique at the RT and gate levels that does not require additional logic to be synthesized for the shut-down mechanism; rather, it exploits existing signals in the original circuit. The approach is based on placing transparent latches with an enable signal at the inputs of each block of the circuit that needs to be power managed.

[Figure: original circuit and guarded version, with block F, signals X, Y, Z, O, and guard signal S']

Operand isolation during high-level synthesis: RTL circuit (1)

• For functional units that have one or more idle controller states, it is possible to insert transparent latches at the functional unit's inputs to perform operand isolation. The latch-enable signals for the latches at the inputs of a functional unit can be derived directly from its idle controller states

• The expressions for the latch signals LE1, …, LE4 are:

• LE1 = LE3 = x4

• LE2 = x1 + x2

• LE4 = x2 + x3
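
The expressions can be tabulated per controller state. This sketch assumes that x1 to x4 are one-hot state-decode signals (xi is 1 only in state si) and that "+" denotes logical OR; both are assumptions made for illustration, and the function name is hypothetical.

```python
def latch_enables(state):
    """Evaluate LE1..LE4 for controller state 1..4 under a one-hot decoding."""
    x = {i: int(state == i) for i in range(1, 5)}
    le1 = le3 = x[4]
    le2 = x[1] | x[2]
    le4 = x[2] | x[3]
    return {"LE1": le1, "LE2": le2, "LE3": le3, "LE4": le4}


for s in range(1, 5):
    print(f"s{s}:", latch_enables(s))
```

Under these assumptions each latch-enable signal is asserted in at most two of the four states and stays constant in the remaining ones.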

Operand isolation during high-level synthesis: RTL circuit (2)

[Figure: RTL circuit with operand isolation.
• Functional units and latch enables: MUL1 (*1, *3, *5) with LE1, SUB1 (-1, -2) with LE2, MUL2 (*2, *4, *6) with LE3, ADD1 (+1, +2) with LE4, comparator CMP (<1)
• Registers: R1 (v1, v5, v6), R2 (u, u1, v7), R3 (v2, v3, v4), R4 (y, y1), R5 (x, x1)
• Control FSM with states s1 to s4 and inputs reset and c1, driving the control signals contr(1) to contr(13) and the transparent-latch enables LE1 to LE4
• Operations annotated on the states: c1 = a < x1, x1 = x + dx, y1 = y + v4; v2 = 3*x, v3 = 3*y; v4 = u*dx; v = u - v5, u1 = v7 - v6; v1 = u*dx, v5 = v1*v2, v6 = v3*dx]

Glitching in Static CMOS

[Figure: static CMOS circuit with inputs A, B, C, internal node X and output Z; waveforms of X and Z under a unit-delay model for the input transition ABC: 101 → 000]

Also called: dynamic hazards

Observe: no glitching in dynamic circuits

RTL Power Estimation Techniques

RTL Estimation Classification

• RTL Power Estimation
– Analytical Methods
  • Complexity-based Models
  • Information theoretic-based Models
– Empirical Methods
  • Constant-Activity Models
  • Variable Activity-based Models

RTL Estimation Methods

• Analytical Methods
– attempt to relate the power consumption of a particular RTL description to fundamental quantities that describe the physical capacitance and activity of a design

• Empirical Methods
– the strategy is to “measure” the power consumption of existing implementations and produce a model based on those measurements. These techniques employ the so-called macromodelling approach to architectural power estimation

RTL Macromodelling (1)

• An RTL power estimation flow consists of the following steps:
– Characterize every component in the high-level design library by simulating it under pseudo-random data and fitting the power macro-model equation to the power dissipation results using a least-mean-square-error fit

– Extract the variable values for the macro-model equation from either static analysis of the circuit structure and functionality, or by performing a behavioral simulation. In the latter case, a power co-simulator linked with a standard RTL simulator can be used to collect input data statistics for various RTL modules in the design

RTL Macromodelling (2)

– Evaluate the power macro-model equations for high-level design components which are found in the library by plugging the parameter values in the corresponding macro-model equations

– Estimate the power dissipation for random logic or interface circuitry by simulating the gate-level description of these components, or by performing probabilistic power estimation. The low level simulation can be significantly sped up by the application of statistical sampling techniques
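
The characterization and evaluation steps can be sketched with a one-variable macro-model. The linear form P = c0 + c1 · activity, the sample data, and the function names are assumptions made for illustration; real macro-models usually have several variables and are fitted to simulated power figures.

```python
def fit_macromodel(activities, powers):
    """Least-squares fit of P = c0 + c1 * activity (simple linear regression)."""
    n = len(activities)
    mean_a = sum(activities) / n
    mean_p = sum(powers) / n
    sxx = sum((a - mean_a) ** 2 for a in activities)
    sxy = sum((a - mean_a) * (p - mean_p) for a, p in zip(activities, powers))
    c1 = sxy / sxx
    c0 = mean_p - c1 * mean_a
    return c0, c1


def evaluate(c0, c1, activity):
    """Plug a measured or simulated activity value into the fitted model."""
    return c0 + c1 * activity


# Made-up characterization samples: (input switching activity, power in uW).
samples = [(0.1, 120.0), (0.2, 220.0), (0.3, 320.0), (0.4, 420.0), (0.5, 520.0)]
c0, c1 = fit_macromodel(*zip(*samples))
print(f"P = {c0:.1f} + {c1:.1f} * activity")
print(f"estimate at activity 0.25: {evaluate(c0, c1, 0.25):.1f} uW")
```

The fit is done once per library component; during estimation only the cheap evaluation step runs, with the activity taken from static analysis or behavioral simulation.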


Low-Power Design at the Logic Level


Retiming: Flip-flop insertion to minimize hazard activity

• The method is based on repositioning the flip-flops in the circuit so as to minimize either the number of flip-flops or the delay through the longest pipeline stage

[Figure: gate g driving load capacitance CL, and the same gate with a register R placed between g and CL]

Two-Level Logic Circuits Switching Activity Minimization (1)

• Taking into account the static and transition probabilities (i.e. temporal correlation) of the primary inputs, we can insert additional input signals into certain gates of the first logic level (i.e. the AND gates), resulting in reduced switching activity

• Appropriately selected input signals force the outputs of the AND gates to logic level zero for a number of combinations of the binary input signals

Two-Level Logic Circuits Switching Activity Minimization (2)

• Example: $F = x_0 x_1 + x_0 x_2 + x_0 x_3$

• Signal x3 exhibits low transition probability and high static-1 probability, while the signals x0, x1, and x2 are characterized by high transition probabilities

[Figure: initial logic circuit with gates g1, g2, g3 (inputs x0x1, x0x2, x0x3; outputs y1, y2, y3) feeding gate g4 with output F; modified logic circuit with x3 as an additional input signal, intermediate outputs y'1, y'2, y'3 and output F']