Adders (Cont.), Multipliers · bwrcs.eecs.berkeley.edu/classes/icdesign/ee141_f05/... · 2005. 11....



Post on 15-Sep-2020


Page 1: Adders (Cont.) Multipliersbwrcs.eecs.berkeley.edu/Classes/icdesign/ee141_f05/... · 2005. 11. 3. · 10 EE141 19 Courtesy: R. Krishnamurthy (Intel) Sparse-Tree Architecture Performance


EE141 – Fall 2005, Lecture 19

Adders (Cont.)
Multipliers

EE141 2

Administrative Stuff

Homework 7 due today

Midterm 2 material
• Wires
• Logic gates
• Logical effort
• Adders

Review session on Tue Nov 8, North Gate Hall, Room 105, 6:30-8:30pm


EE141 3

Class Material

Last lecture
• Adders

Today’s lecture
• Adders (Cont.)
• Multipliers and other arithmetic
• Intro to power

Adders (Cont.)


EE141 5

Carry Look-Ahead

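$\mathrm{Sum}_i = A_i \oplus B_i \oplus \mathrm{Carry}_{i-1}$, where $A_i \oplus B_i$ is the partial sum

$\mathrm{Carry}_i = A_i \cdot B_i + (A_i + B_i) \cdot \mathrm{Carry}_{i-1}$, with generate $G_i = A_i \cdot B_i$ and propagate $P_i = A_i + B_i$

$\mathrm{Carry}_i = G_i + P_i \cdot \mathrm{Carry}_{i-1}$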
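The carry recurrence above can be exercised directly in software. The following is a minimal sketch, not part of the lecture: the helper names (`ripple_carries`, `to_bits`, `from_bits`) and the LSB-first bit-list convention are assumptions made for illustration.

```python
def ripple_carries(a_bits, b_bits, carry_in=0):
    """Return (sum bits, carry-out bits) for LSB-first bit lists."""
    sums, carries = [], []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        g = a & b                    # generate: G_i = A_i * B_i
        p = a | b                    # propagate: P_i = A_i + B_i (OR form)
        sums.append(a ^ b ^ carry)   # Sum_i = A_i xor B_i xor Carry_{i-1}
        carry = g | (p & carry)      # Carry_i = G_i + P_i * Carry_{i-1}
        carries.append(carry)
    return sums, carries


def to_bits(value, width):
    """Hypothetical helper: integer to LSB-first bit list."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Hypothetical helper: LSB-first bit list to integer."""
    return sum(bit << i for i, bit in enumerate(bits))


sums, carries = ripple_carries(to_bits(11, 4), to_bits(6, 4))
print(from_bits(sums), carries[-1])  # 11 + 6 = 17: low four bits 1, carry-out 1
```

Each carry depends on the previous one, which is the rippling that look-ahead tries to remove.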

EE141 6

Look-Ahead: Basic Idea

$C_{o,k} = f(A_k, B_k, C_{o,k-1}) = G_k + P_k C_{o,k-1}$

The idea is to eliminate the carry rippling effect.

[Figure: N bit slices with inputs (A0, B0) … (AN-1, BN-1), carry inputs Ci,0 … Ci,N-1, propagate signals P0 … PN-1, and sum outputs S0 … SN-1]


EE141 7

Look-Ahead: Topology

Expanding Look-Ahead equations:

$C_{o,k} = G_k + P_k (G_{k-1} + P_{k-1} C_{o,k-2})$

All the way:

$C_{o,k} = G_k + P_k (G_{k-1} + P_{k-1} (\cdots + P_1 (G_0 + P_0 C_{i,0})))$

Implementation issues:
- long stack (N+1)
- or multiple stages

still linear delay!

[Figure: circuit schematic generating Co,3 from Ci,0, G0–G3, and P0–P3, supplied from VDD]

EE141 8

Logarithmic Look-Ahead Adder

Idea: large stacks limit carry look-ahead to 2-4 bits; organize carry P and G into recursive trees.

[Figure: output F computed from inputs A0–A7 as a linear chain (tp ∼ N) and as a binary tree (tp ∼ log2(N))]


EE141 9

Carry Look-Ahead Trees

$C_{o,0} = G_0 + P_0 C_{i,0}$

$C_{o,1} = G_1 + P_1 G_0 + P_1 P_0 C_{i,0}$

$C_{o,2} = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{i,0}$

$\phantom{C_{o,2}} = (G_2 + P_2 G_1) + (P_2 P_1)(G_0 + P_0 C_{i,0}) = G_{2:1} + P_{2:1} C_{o,0}$

Can continue building the tree hierarchically...
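As a quick sanity check of the grouping step, the sketch below compares the flat expression for Co,2 with the grouped form G2:1 + P2:1·Co,0 over every Boolean input combination. The function names `merge`, `carry_out`, and `check_all` are hypothetical and chosen only for this illustration.

```python
from itertools import product


def merge(hi, lo):
    """Combine (G, P) of an upper group with (G, P) of the group below it."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def carry_out(group, carry_in):
    """Carry out of a group: G + P * carry_in."""
    g, p = group
    return g | (p & carry_in)


def check_all():
    """True if the grouped form matches the flat form for all inputs."""
    for g0, p0, g1, p1, g2, p2, ci in product((0, 1), repeat=7):
        flat = g2 | (p2 & g1) | (p2 & p1 & g0) | (p2 & p1 & p0 & ci)
        co0 = carry_out((g0, p0), ci)            # C_{o,0}
        group_21 = merge((g2, p2), (g1, p1))     # (G_{2:1}, P_{2:1})
        if carry_out(group_21, co0) != flat:
            return False
    return True


print(check_all())
```

Because `merge` returns another (G, P) pair, it can be applied again to larger groups, which is what allows the tree to be built hierarchically.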

EE141 10

High-Performance Adders: Kogge-Stone Tree Adder

Generate all 32 carries
• Full-blown binary tree ⇒ energy-inefficient

# carry-merge stages = log2(32) ⇒ 5 stages

Carry-merge: $GG = G_i + P_i G_{i-1}$, $GP = P_i P_{i-1}$

[Figure: odd and even input bits pass through PG generation, carry-merge stages CM1–CM5, and an XOR to produce Sum_odd and Sum_even; stages numbered 1–7. Courtesy: R. Krishnamurthy (Intel)]
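The stage count can be illustrated with a small software model of a radix-2 Kogge-Stone prefix tree. This is a sketch under stated assumptions, not the circuit from the slides: the function name `kogge_stone_add` is hypothetical, the carry-in is assumed to be 0, and an XOR-based propagate is used so the same signal can form the final sum.

```python
def kogge_stone_add(a, b, width=32):
    """Add two unsigned ints modulo 2**width; returns (sum, merge stages)."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    g = [x & y for x, y in zip(a_bits, b_bits)]   # bit generate
    p = [x ^ y for x, y in zip(a_bits, b_bits)]   # bit propagate (XOR form)
    half_sum = p[:]                               # kept for the final XOR
    stages, span = 0, 1
    while span < width:
        new_g, new_p = g[:], p[:]
        for i in range(span, width):
            # carry-merge with the group ending `span` bits below
            new_g[i] = g[i] | (p[i] & g[i - span])
            new_p[i] = p[i] & p[i - span]
        g, p = new_g, new_p
        span *= 2
        stages += 1
    # g[i] is now the carry out of bit i, so the carry into bit i is g[i-1]
    carries_in = [0] + g[:-1]
    total = sum((h ^ c) << i
                for i, (h, c) in enumerate(zip(half_sum, carries_in)))
    return total, stages


print(kogge_stone_add(123456789, 987654321))  # (1111111110, 5)
```

Every bit position is merged in every stage, which is why all 32 carries are available after 5 stages and also why the gate count is high.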


EE141 11

Kogge-Stone Adder

Critical path = PG + 5 + XOR = 7 gate stages
Generate, Propagate fanout (FO) of 2, 3
Maximum interconnect spans 16b
Energy inefficient

[Figure: 32-bit Kogge-Stone tree over bits 0–31 with a PG row, carry-merge gates, and an XOR row. Courtesy: R. Krishnamurthy (Intel)]

EE141 12

Tree Adders

16-bit radix-2 Kogge-Stone tree

[Figure: tree with inputs (A0, B0) through (A15, B15) and sum outputs S0 through S15]


EE141 13

Example: Domino Adder

[Figure: two domino gates clocked by Clk, with inputs ai and bi]

Propagate: $P_i = a_i + b_i$

Generate: $G_i = a_i b_i$

EE141 14

Example: Domino Adder

The “dot” operator (carry-merge)

[Figure: two domino carry-merge gates clocked by Clk. Propagate gate: inputs Pi:i-k+1 and Pi-k:i-2k+1, output Pi:i-2k+1. Generate gate: inputs Gi:i-k+1, Pi:i-k+1, and Gi-k:i-2k+1, output Gi:i-2k+1]


EE141 15

Example: Domino Sum

[Figure: domino sum circuit with keeper; signals Clk, Clkd, Gi:0, Si0, Si1, and Sum]

EE141 16

Tree Adders

16-bit radix-4 Kogge-Stone Tree

[Figure: tree with inputs (a0, b0) through (a15, b15) and sum outputs S0 through S15]


EE141 17

Sparse-Tree Adder Architecture

Generate every 4th carry in parallel
Side-path: 4-bit conditional sum generator
73% fewer carry-merge gates ⇒ energy-efficient

Courtesy: R. Krishnamurthy (Intel)

EE141 18

Adder Core Critical Path

Critical path: 7 gates, same as KS
Sparse-tree: single-rail dynamic
Exploit non-criticality of sum generator
Convert to static logic ⇒ semi-dynamic design

[Figure: single-rail dynamic sparse-tree path and static sum generator; labels include adder inputs, PG, GG1, GG3, GG7, GG15, GG27, C27, Latch, CM0, CM1, XOR, Sum31_0, Sum31_1, Sum31, and clocks clk, clk2, clk3. Courtesy: R. Krishnamurthy (Intel)]


EE141 19

Sparse-Tree Architecture

Performance impact: 20% speedup
• 33-50% reduced G/P fanouts
• 80% reduced wiring complexity
• 30% reduction in maximum interconnect

Power impact: 56% reduction
• 73% fewer carry-merge gates
• 50% reduction in average transistor size

Courtesy: R. Krishnamurthy (Intel)

EE141 20

Energy-Delay Space

20% speedup over Kogge-Stone
56% worst-case energy reduction

[Figure: worst-case energy (pJ, 0–100) versus delay (ps, 140–280) for the dynamic Kogge-Stone and semi-dynamic sparse-tree adders, with the 4GHz design marked and the 20% and 56% gaps annotated; 130nm CMOS, 1.2V, 110°C. Courtesy: R. Krishnamurthy (Intel)]


Multipliers

EE141 22

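The Binary Multiplication

$$Z = X \times Y = \sum_{k=0}^{M+N-1} Z_k 2^k = \left(\sum_{i=0}^{M-1} X_i 2^i\right)\left(\sum_{j=0}^{N-1} Y_j 2^j\right) = \sum_{i=0}^{M-1}\left(\sum_{j=0}^{N-1} X_i Y_j 2^{i+j}\right)$$

with

$$X = \sum_{i=0}^{M-1} X_i 2^i, \qquad Y = \sum_{j=0}^{N-1} Y_j 2^j$$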


EE141 23

The Binary Multiplication

          1 0 1 0 1 0    Multiplicand
x             1 0 1 1    Multiplier

          1 0 1 0 1 0
        1 0 1 0 1 0      Partial products
      0 0 0 0 0 0
+   1 0 1 0 1 0

    1 1 1 0 0 1 1 1 0    Result
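The worked example on this slide (multiplicand 101010, multiplier 1011) can be reproduced with a few lines of shift-and-add code. This is an illustrative sketch; the function name `partial_products` is an assumption, not something defined in the lecture.

```python
def partial_products(multiplicand, multiplier, n_bits):
    """One shifted copy of the multiplicand (or zero) per multiplier bit,
    starting from the least significant multiplier bit."""
    return [(multiplicand << j) if (multiplier >> j) & 1 else 0
            for j in range(n_bits)]


rows = partial_products(0b101010, 0b1011, n_bits=4)
result = sum(rows)
print([format(r, "b") for r in rows])
print(format(result, "b"))  # 111001110
```

Each row corresponds to one term $X \cdot Y_j 2^j$ of the double sum, and adding the rows gives the product.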

EE141 24

The Array Multiplier

[Figure: 4×4 array multiplier with inputs X0–X3 and Y0–Y3, rows of half-adder (HA) and full-adder (FA) cells, and product outputs Z0–Z7]


EE141 25

The M-by-N Array Multiplier: Critical Path

[Figure: array of HA and FA cells with Critical Path 1, Critical Path 2, and the segment shared by Critical Path 1 & 2 marked]

$t_{mult} \approx [(M-1) + (N-2)] \cdot t_{carry} + (N-1) \cdot t_{sum} + (N-1) \cdot t_{and}$

EE141 26

Transmission-Gate Full Adder

Balanced $t_{sum}$ and $t_{carry}$

[Figure: transmission-gate full adder schematic with Setup, Sum Generation, and Carry Generation sections; signals A, B, Ci, P, S, and Co]


EE141 27

Carry-Save Multiplier

[Figure: rows of HA and FA cells feeding a Vector Merging Adder]

$t_{mult} = (N-1) \cdot t_{carry} + (N-1) \cdot t_{and} + t_{merge}$
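To compare the array and carry-save critical-path estimates numerically, the two expressions can be wrapped in small functions. This is a sketch under stated assumptions: the function names are hypothetical, the coefficients (including the one on t_and) follow the slide text as far as it can be read from the extraction, and the delay values in the usage lines are made-up unit delays rather than measured numbers.

```python
def array_multiplier_delay(m, n, t_carry, t_sum, t_and):
    """Array multiplier estimate:
    [(M-1) + (N-2)]*t_carry + (N-1)*t_sum + (N-1)*t_and."""
    return ((m - 1) + (n - 2)) * t_carry + (n - 1) * t_sum + (n - 1) * t_and


def carry_save_multiplier_delay(n, t_carry, t_and, t_merge):
    """Carry-save estimate: (N-1)*t_carry + (N-1)*t_and + t_merge."""
    return (n - 1) * t_carry + (n - 1) * t_and + t_merge


# Hypothetical unit delays for a 4x4 multiplier.
print(array_multiplier_delay(4, 4, t_carry=1, t_sum=1, t_and=1))
print(carry_save_multiplier_delay(4, t_carry=1, t_and=1, t_merge=3))
```

With these placeholder values the carry-save estimate depends on how fast the vector-merging adder is, which is where a fast adder from the first half of the lecture would be used.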

EE141 28

Multiplier Floorplan

[Figure: floorplan of a 4×4 multiplier with inputs X0–X3 and Y0–Y3 and outputs Z0–Z7; legend: HA Multiplier Cell, FA Multiplier Cell, Vector Merging Cell; cells labeled S and C]

X and Y signals are broadcast through the complete array.


EE141 29

Wallace-Tree Multiplier

[Figure: dot diagrams over bit positions 6–0 showing (a) partial products, (b) first stage, (c) second stage, and (d) final adder, with FA and HA groupings marked]

EE141 30

Wallace-Tree Multiplier

[Figure: 4×4 Wallace-tree multiplier. The partial products xiyj (x0y0 through x3y3) are reduced by a first stage, a second stage, and a final adder built from HA and FA cells, producing outputs z7–z0]
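The column-compression idea behind the Wallace tree can be sketched in software. The model below is a simplification and not the exact grouping shown in the figure: it uses only full adders (no half adders), so its stage count can differ from the slide, and the final adder is modeled with ordinary integer addition. The function name `wallace_multiply` is hypothetical.

```python
def wallace_multiply(x, y, bits=4):
    """Multiply by compressing partial-product columns with full adders.
    Returns (product, number of compression stages)."""
    # columns[k] holds all partial-product bits of weight 2**k
    columns = [[] for _ in range(2 * bits)]
    for i in range(bits):
        for j in range(bits):
            columns[i + j].append(((x >> i) & 1) & ((y >> j) & 1))
    stages = 0
    while any(len(col) > 2 for col in columns):
        nxt = [[] for _ in range(len(columns) + 1)]
        for k, col in enumerate(columns):
            pos = 0
            while len(col) - pos >= 3:
                a, b, c = col[pos:pos + 3]
                nxt[k].append(a ^ b ^ c)                        # sum bit
                nxt[k + 1].append((a & b) | (a & c) | (b & c))  # carry bit
                pos += 3
            nxt[k].extend(col[pos:])   # one or two leftover bits pass through
        columns = nxt
        stages += 1
    # At most two bits remain per column; add them as the final adder would.
    total = sum(bit << k for k, col in enumerate(columns) for bit in col)
    return total, stages


product_value, n_stages = wallace_multiply(13, 11)
print(product_value)  # 143
```

All columns are compressed in parallel within a stage, which is the source of the logarithmic rather than linear depth.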


EE141 31

Wallace-Tree Multiplier

[Figure: two arrangements of four FA cells summing inputs y0–y5, with carry inputs Ci-1 and carry outputs Ci, producing outputs C and S]

EE141 32

Multipliers – Summary

Optimization goals different than in binary adder

Once again: Identify critical path

Other possible techniques
• Logarithmic versus linear (Wallace Tree Mult)
• Data encoding (Booth)
• Pipelining

First glimpse at system level optimization


EE141 33

The Binary Shifter

[Figure: bit-slice i of the shifter with inputs Ai and Ai-1, outputs Bi and Bi-1, and control signals Right, Left, and nop]

EE141 34

The Barrel Shifter

Area Dominated by Wiring

[Figure: barrel shifter with data wires A0–A3 and B0–B3 and control wires Sh0–Sh3]


EE141 35

4x4 Barrel Shifter

[Figure: layout of a 4×4 barrel shifter with inputs A0–A3, control lines Sh0–Sh3, and an output buffer]

$\mathrm{Width}_{barrel} \sim 2\,p_m\,M$

EE141 36

Logarithmic Shifter

[Figure: logarithmic shifter with inputs A0–A3, outputs B0–B3, and stages controlled by Sh1, Sh2, and Sh4]
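A logarithmic shifter composes the total shift from stages of 1, 2, and 4 positions, each enabled by one control signal. The sketch below models that behaviour in software; the function name `log_shift_right`, the 8-bit default width, and the choice of a right shift are assumptions for illustration, since the shift direction is not recoverable from the extracted figure.

```python
def log_shift_right(value, amount, width=8):
    """Shift right by 0..7 using three stages of 1, 2, and 4 positions."""
    mask = (1 << width) - 1
    value &= mask
    for stage_shift in (1, 2, 4):      # stages controlled by Sh1, Sh2, Sh4
        if amount & stage_shift:       # this stage's control bit is set
            value >>= stage_shift
    return value


print(format(log_shift_right(0b10110000, 5), "b"))  # 101
```

Three stages cover all eight shift amounts, whereas a barrel shifter needs one control line per shift amount.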


EE141 37

0-7 bit Logarithmic Shifter

[Figure: layout with inputs A0–A3 and outputs Out0–Out3]

$\mathrm{width}_{log} \approx p_m \cdot [2K + (1 + 2 + \dots + 2^{K-1})] = p_m \cdot (2K + 2^K - 1)$

Power


EE141 39

The Power Challenge

400 million computers in the world
• 0.16 PW (PetaWatt = $10^{15}$ W) of power dissipation
• Equivalent to 26 nuclear plants!

Data centers represent the absolute challenge
• 1 single server rack is between 5 and 20 kW
• 100’s of those racks in a single room!

EE141 40

Power and Energy Challenges

Power and energy management and minimization have emerged as some of the most dominant roadblocks. The best opportunity lies in a very aggressive scaling and adaptation of supply and threshold values in concert with a careful orchestration of the system activity.

[Figure labeled 138 W/cm²; courtesy of IBM]


EE141 41

Portability: Battery Storage is the Limiting Factor

Little change in basic technology
• store energy using a chemical reaction

Battery capacity doubles every 10 years
Energy density/size and safe handling are the limiting factors

Energy density of material (kWh/kg)
Gasoline      14
Lead-Acid     0.04
Li polymer    0.15

EE141 42

Battery Progress

Factor 4 over the last 10 years!

[Figure: energy density (Wh/kg, 0–160) versus year of first commercial use (1940–2010) for NiCd, SLA, NiMH, Li-Ion, Reusable Alkaline, and Li-Polymer, with a trend line]


EE141 43

Power Dissipation in CMOS

Dynamic power
• Charging capacitances
• Dominant today

Leakage power
• Leaky transistors
• Concern in low-activity, portable devices

Short circuit power

Static power
• E.g. pseudo-NMOS

EE141 44

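Dynamic Power Consumption

$$E_{0\to1} = \int_0^T P_{DD}(t)\,dt = V_{DD}\int_0^T i_{DD}(t)\,dt = V_{DD}\int_0^{V_{DD}} C_L\,dv_{out} = C_L V_{DD}^2$$

$$E_C = \int_0^T P_C(t)\,dt = \int_0^T v_{out}\,i_L(t)\,dt = \int_0^{V_{DD}} C_L v_{out}\,dv_{out} = \frac{1}{2} C_L V_{DD}^2$$

$E_{0\to1} = C_L V_{DD}^2$

[Figure: CMOS gate with inputs A1 … AN driving a PMOS network and an NMOS network between Vdd and ground; output Vout, load current iL, load capacitance CL]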


EE141 45

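Dynamic Power Consumption

One half of the energy drawn from the supply is dissipated in the pull-up network and one half is stored on $C_L$.
Charge from $C_L$ is dumped during the 1→0 transition.

$E_{0\to1} = C_L V_{DD}^2$

$E_R = \frac{1}{2} C_L V_{DD}^2$

$E_C = \frac{1}{2} C_L V_{DD}^2$

[Figure: CMOS gate with inputs A1 … AN driving a PMOS network and an NMOS network between Vdd and ground; output Vout, load current iL, load capacitance CL]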
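The half-and-half energy split during a 0→1 transition can be checked numerically. The sketch below assumes the pull-up network behaves like a fixed resistor charging the load (a simplification that is not stated on the slides), and the function name `charging_energies` and all component values are hypothetical.

```python
import math


def charging_energies(c_load, v_dd, r_pullup=1e3, steps=20_000, n_tau=20.0):
    """Integrate a 0->1 transition of an RC model with the midpoint rule.
    Returns (energy from supply, energy dissipated in R, energy stored on C)."""
    tau = r_pullup * c_load
    dt = n_tau * tau / steps
    e_supply = 0.0
    e_resistor = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        i = (v_dd / r_pullup) * math.exp(-t / tau)   # charging current
        e_supply += v_dd * i * dt                    # integral of V_DD * i
        e_resistor += i * i * r_pullup * dt          # integral of i^2 * R
    return e_supply, e_resistor, e_supply - e_resistor


e_supply, e_r, e_c = charging_energies(c_load=10e-15, v_dd=1.2)
print(e_supply, e_r, e_c)
```

For the 10 fF, 1.2 V placeholder values, the supply energy should come out close to $C_L V_{DD}^2$ and the other two close to half of that, independent of the resistance chosen.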

EE141 46

Dynamic Power Consumption

Power = Energy/transition • Transition rate
= $C_L V_{DD}^2 \cdot f_{0\to1}$
= $C_L V_{DD}^2 \cdot f \cdot P_{0\to1}$
= $C_{switched} V_{DD}^2 \cdot f$

Power dissipation is data dependent: it depends on the switching probability.

Switched capacitance: $C_{switched} = C_L \cdot P_{0\to1}$


EE141 47

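Transition Activity and Power

Energy consumed in N cycles, $E_N$:

$E_N = C_L \cdot V_{DD}^2 \cdot n_{0\to1}$

$n_{0\to1}$: number of 0→1 transitions in N cycles

$$P_{avg} = \lim_{N\to\infty} \frac{E_N}{N} \cdot f = \left(\lim_{N\to\infty} \frac{n_{0\to1}}{N}\right) \cdot C_L \cdot V_{DD}^2 \cdot f$$

$$\alpha_{0\to1} = \lim_{N\to\infty} \frac{n_{0\to1}}{N}$$

$P_{avg} = \alpha_{0\to1} \cdot C_L \cdot V_{DD}^2 \cdot f$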
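The average-power expression is easy to evaluate once a transition count is available, for example from a logic simulation. The sketch below estimates the activity from a finite count rather than the limit; the function name `average_dynamic_power` and the numeric values are hypothetical.

```python
def average_dynamic_power(c_load, v_dd, freq, n_transitions, n_cycles):
    """P_avg = alpha * C_L * V_DD^2 * f, with alpha estimated as n/N."""
    alpha = n_transitions / n_cycles   # finite-sample estimate of alpha_{0->1}
    return alpha * c_load * v_dd ** 2 * freq


# Placeholder numbers: 10 fF load, 1.2 V supply, 1 GHz clock,
# 250 rising transitions observed over 1000 cycles.
print(average_dynamic_power(10e-15, 1.2, 1e9, 250, 1000))
```

Halving the supply voltage in this model cuts the result by a factor of four, while halving the activity or the capacitance only halves it.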

EE141 48

Factors Affecting Transition Activity

“Static” component (does not account for timing)
• Type of logic function (NOR vs. XOR)
• Type of logic style (Static vs. Dynamic)
• Signal statistics
• Inter-signal correlations

“Dynamic” or timing dependent component
• Circuit topology
• Signal statistics and correlations


EE141 49

Type of Logic Function: NOR vs. XOR

Example: Static 2-input NOR Gate

A  B  Out
0  0  1
0  1  0
1  0  0
1  1  0

Assume signal probabilities
$p_{A=1} = 1/2$
$p_{B=1} = 1/2$

Then transition probability
$p_{0\to1} = p_{Out=0} \times p_{Out=1} = 3/4 \times 1/4 = 3/16$

If inputs switch every cycle: $\alpha_{0\to1} = 3/16$

EE141 50

Type of Logic Function: NOR vs. XOR

Example: Static 2-input XOR Gate

A  B  Out
0  0  0
0  1  1
1  0  1
1  1  0

Assume signal probabilities
$p_{A=1} = 1/2$
$p_{B=1} = 1/2$

Then transition probability
$p_{0\to1} = p_{Out=0} \times p_{Out=1} = 1/2 \times 1/2 = 1/4$

If inputs switch in every cycle: $\alpha_{0\to1} = 1/4$
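The NOR and XOR figures can be reproduced by enumerating the truth table. The sketch below assumes independent inputs that are 1 with the same probability; the function name `static_transition_probability` is hypothetical, and exact fractions are used so the results match the slide values.

```python
from fractions import Fraction
from itertools import product


def static_transition_probability(gate, p_one=Fraction(1, 2), n_inputs=2):
    """p(0->1) = p(Out=0) * p(Out=1) for a static gate with independent
    inputs, each equal to 1 with probability p_one."""
    p_out_one = Fraction(0)
    for bits in product((0, 1), repeat=n_inputs):
        weight = Fraction(1)
        for bit in bits:
            weight *= p_one if bit else 1 - p_one
        if gate(*bits):
            p_out_one += weight
    return (1 - p_out_one) * p_out_one


def nor(a, b):
    return int(not (a or b))


def xor(a, b):
    return a ^ b


print(static_transition_probability(nor))  # 3/16
print(static_transition_probability(xor))  # 1/4
```

Changing `p_one` shows how strongly the activity depends on signal statistics, which is the first item in the list of static factors.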


EE141 51

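Power Consumption of Dynamic Gates

Power only dissipated when previous Out = 0

[Figure: dynamic gate with inputs In1, In2, In3 driving the pull-down network (PDN), transistors Mp and Me clocked by CLK, and output Out loaded by CL]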

EE141 52

Dynamic Power Consumption is Data Dependent

Dynamic 2-input NOR Gate

A  B  Out
0  0  1
0  1  0
1  0  0
1  1  0

Assume signal probabilities
$P_{A=1} = 1/2$
$P_{B=1} = 1/2$

Then transition probability
$P_{0\to1} = P_{out=0} \times P_{out=1} = 3/4 \times 1 = 3/4$

Switching activity always higher in dynamic gates!
$P_{0\to1} = P_{out=0}$
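The comparison between the dynamic and static NOR gate can be written out in a few lines. The sketch assumes equally likely, independent input combinations, and the helper name `p_out_zero` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def p_out_zero(gate, n_inputs=2):
    """Probability that the gate output is 0 for equally likely inputs."""
    rows = list(product((0, 1), repeat=n_inputs))
    zeros = sum(1 for bits in rows if not gate(*bits))
    return Fraction(zeros, len(rows))


def nor(a, b):
    return int(not (a or b))


p0 = p_out_zero(nor)
dynamic_activity = p0             # output is precharged to 1 every cycle
static_activity = p0 * (1 - p0)   # static gate: p(Out=0) * p(Out=1)
print(dynamic_activity, static_activity)  # 3/4 3/16
```

Since $p_{Out=1} \le 1$, the dynamic figure can never be smaller than the static one for the same logic function.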


EE141 53

Principles for Power Reduction

Prime choice: Reduce voltage!
• Recent years have seen an acceleration in supply voltage reduction
• Design at very low voltages still open question (0.6 … 0.9 V by 2010!)
• Reducing thresholds to improve performance increases leakage

Reduce switching activity

Reduce physical capacitance

EE141 54

Next Lecture

Power
Sequential Logic