Adders (Cont.), Multipliers · bwrcs.eecs.berkeley.edu/classes/icdesign/ee141_f05/... · 2005. 11....
EE141 – Fall 2005, Lecture 19
Adders (Cont.), Multipliers
Administrative Stuff
Homework 7 due today
Midterm 2 material
• Wires
• Logic gates
• Logical effort
• Adders
Review session on Tue Nov 8, North Gate Hall, Room 105, 6:30-8:30pm
Class Material
Last lecture
• Adders
Today’s lecture
• Adders (Cont.)
• Multipliers and other arithmetic
• Intro to power
Adders (Cont.)
Carry Look-Ahead
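$\text{Sum}_i = A_i \oplus B_i \oplus \text{Carry}_{i-1}$, where $A_i \oplus B_i$ is the partial sum
$\text{Carry}_i = A_i \cdot B_i + (A_i + B_i) \cdot \text{Carry}_{i-1}$, where $G_i = A_i \cdot B_i$ (generate) and $P_i = A_i + B_i$ (propagate)
$\text{Carry}_i = G_i + P_i \cdot \text{Carry}_{i-1}$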
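The generate/propagate recurrence can be checked with a small bit-level model. This is a minimal sketch, not part of the lecture: the function names and the little-endian bit-list convention are assumptions made for illustration.

```python
def to_bits(value, n_bits):
    """Little-endian list of the n_bits low-order bits of value."""
    return [(value >> i) & 1 for i in range(n_bits)]

def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))

def gp_ripple_add(a_bits, b_bits, carry_in=0):
    """Add two bit lists using Carry_i = G_i + P_i * Carry_{i-1}."""
    sums = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        g = a & b                     # generate: this bit creates a carry
        p = a | b                     # propagate: this bit passes a carry on
        sums.append(a ^ b ^ carry)    # partial sum XOR incoming carry
        carry = g | (p & carry)       # carry into the next bit
    return sums, carry

sum_bits, carry_out = gp_ripple_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(sum_bits), carry_out)
```

Because the carry still passes through every bit in turn, this model has the linear delay that the look-ahead schemes aim to remove.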
$C_{o,k} = f(A_k, B_k, C_{o,k-1}) = G_k + P_k C_{o,k-1}$
[Figure: chain of bit slices (A0, B0), (A1, B1), …, (AN-1, BN-1) with propagate signals P0 … PN-1, carry inputs Ci,0, Ci,1, …, Ci,N-1, and sum outputs S0 … SN-1]
Look-Ahead: Basic Idea
The idea is to eliminate the carry rippling effect
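Expanding Look-Ahead equations:
$C_{o,k} = G_k + P_k (G_{k-1} + P_{k-1} C_{o,k-2})$
All the way:
$C_{o,k} = G_k + P_k (G_{k-1} + P_{k-1} (\ldots + P_1 (G_0 + P_0 C_{i,0})))$
[Figure: transistor-level look-ahead gate producing Co,3 from Ci,0, P0 … P3, and G0 … G3, supplied from VDD]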
Look-Ahead: Topology
Implementation issues:
- long stack (N+1)
- or multiple stages
still linear delay!
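The fully expanded look-ahead expression can be compared against the rippled recurrence. The sketch below is illustrative only: `lookahead_carry`, `ripple_carry`, and `expansions_agree` are hypothetical helper names, and bits are held in little-endian lists.

```python
from itertools import product

def lookahead_carry(g, p, c_in, k):
    """Carry out of bit k as a flat OR of AND terms (the expanded form)."""
    carry = 0
    for j in range(k + 1):
        term = g[j]                    # G_j ...
        for m in range(j + 1, k + 1):
            term &= p[m]               # ... passed through P_{j+1} .. P_k
        carry |= term
    term = c_in                        # C_{i,0} passed through P_0 .. P_k
    for m in range(k + 1):
        term &= p[m]
    return carry | term

def ripple_carry(g, p, c_in, k):
    """Same carry computed one bit at a time."""
    carry = c_in
    for i in range(k + 1):
        carry = g[i] | (p[i] & carry)
    return carry

def expansions_agree(n_bits=4):
    """Exhaustively compare both forms for all inputs of n_bits."""
    for bits in product((0, 1), repeat=2 * n_bits + 1):
        a, b, c_in = bits[:n_bits], bits[n_bits:2 * n_bits], bits[-1]
        g = [x & y for x, y in zip(a, b)]
        p = [x | y for x, y in zip(a, b)]
        for k in range(n_bits):
            if lookahead_carry(g, p, c_in, k) != ripple_carry(g, p, c_in, k):
                return False
    return True

print(expansions_agree(4))
```

The longest product term for bit k has k + 2 factors, which mirrors the long-stack issue noted on the slide.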
[Figure: an 8-input function F of A0 … A7 built as a linear chain (tp ∼ N) and as a tree of input pairs (tp ∼ log2(N))]
Logarithmic Look-Ahead Adder
Idea: large stacks limit carry look-ahead to 2-4 bits, so organize carry P and G into recursive trees
Carry Look-Ahead Trees
$C_{o,0} = G_0 + P_0 C_{i,0}$
$C_{o,1} = G_1 + P_1 G_0 + P_1 P_0 C_{i,0}$
$C_{o,2} = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{i,0}$
$\phantom{C_{o,2}} = (G_2 + P_2 G_1) + (P_2 P_1)(G_0 + P_0 C_{i,0}) = G_{2:1} + P_{2:1} C_{o,0}$
Can continue building the tree hierarchically...
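The regrouping of the carry for bit 2 into group signals can be verified exhaustively. This is a small sketch; the `dot` helper and the (G, P) tuple convention are assumptions for illustration, not notation from the slides.

```python
from itertools import product

def dot(hi, lo):
    """Merge two adjacent (G, P) groups; hi covers the more significant bits."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)

def carry_out_2_flat(g, p, c_in):
    """C_{o,2} written as the flat sum of products."""
    return (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
            | (p[2] & p[1] & p[0] & c_in))

def carry_out_2_tree(g, p, c_in):
    """C_{o,2} = G_{2:1} + P_{2:1} * C_{o,0} using group signals."""
    g21, p21 = dot((g[2], p[2]), (g[1], p[1]))
    c_o0 = g[0] | (p[0] & c_in)
    return g21 | (p21 & c_o0)

def forms_agree():
    """Check both forms over every combination of G, P, and carry-in."""
    for bits in product((0, 1), repeat=7):
        g, p, c_in = bits[:3], bits[3:6], bits[6]
        if carry_out_2_flat(g, p, c_in) != carry_out_2_tree(g, p, c_in):
            return False
    return True

print(forms_agree())
```

Applying `dot` repeatedly to wider and wider groups is what lets the tree be built hierarchically.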
$GG = G_i + P_i G_{i-1}$, $GP = P_i P_{i-1}$
[Figure: adder organization with separate paths for odd input bits and even input bits; each path runs PG Gen. → CM1 → CM2 → CM3 → CM4 → CM5 → XOR (gate stages 1 to 7) and produces Sum_odd and Sum_even. Courtesy: R. Krishnamurthy (Intel)]
High-Performance Adders: Kogge-Stone Tree Adder
Generate all 32 carries
• Full-blown binary tree ⇒ energy-inefficient
# carry-merge stages = log2(32) ⇒ 5 stages
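A behavioural model makes the stage count concrete. The following is a sketch under stated assumptions: it models only the logic (no timing or energy), folds the carry-in into bit 0, and uses the hypothetical name `kogge_stone_add`.

```python
def kogge_stone_add(a, b, n_bits=32, c_in=0):
    """Return (sum, carry_out, stages) for an n_bits radix-2 prefix adder."""
    a_bits = [(a >> i) & 1 for i in range(n_bits)]
    b_bits = [(b >> i) & 1 for i in range(n_bits)]
    g = [x & y for x, y in zip(a_bits, b_bits)]          # generate
    p = [x | y for x, y in zip(a_bits, b_bits)]          # propagate
    half_sum = [x ^ y for x, y in zip(a_bits, b_bits)]   # partial sum
    g[0] |= p[0] & c_in            # fold the carry-in into bit 0

    stages = 0
    dist = 1
    while dist < n_bits:
        new_g, new_p = g[:], p[:]
        for i in range(dist, n_bits):
            # carry-merge: combine group at i with the group dist bits below
            new_g[i] = g[i] | (p[i] & g[i - dist])
            new_p[i] = p[i] & p[i - dist]
        g, p = new_g, new_p
        dist *= 2
        stages += 1

    # g[i] is now the carry out of bit i; bit i receives the carry from i-1
    carries_in = [c_in] + g[:-1]
    total = sum((h ^ c) << i
                for i, (h, c) in enumerate(zip(half_sum, carries_in)))
    return total, g[-1], stages

print(kogge_stone_add(0xFFFFFFFF, 1))
```

Every bit position has a merge gate in every stage once it is at least `dist` bits from the bottom, which is the gate count that makes the full tree energy-hungry.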
[Figure: 32-bit Kogge-Stone tree over bit positions 0 to 31, showing the PG row, the rows of carry-merge gates, and the XOR row; annotated "Energy inefficient". Courtesy: R. Krishnamurthy (Intel)]
Critical path = PG + 5 + XOR = 7 gate stages
Generate, Propagate fanout (FO) of 2, 3
Maximum interconnect spans 16b
Kogge-Stone Adder
Tree Adders
16-bit radix-2 Kogge-Stone tree
[Figure: 16-bit radix-2 Kogge-Stone tree with inputs (A0, B0) … (A15, B15) and outputs S0 … S15]
Example: Domino Adder
[Figure: two domino gates supplied from VDD and clocked by Clk, one for Propagate and one for Generate, with inputs ai and bi]
Propagate: $P_i = a_i + b_i$
Generate: $G_i = a_i b_i$
Example: Domino Adder
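[Figure: two domino carry-merge gates supplied from VDD and clocked by Clkk. Propagate gate: inputs Pi:i-k+1 and Pi-k:i-2k+1, output Pi:i-2k+1. Generate gate: inputs Gi:i-k+1, Pi:i-k+1, and Gi-k:i-2k+1, output Gi:i-2k+1]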
The “dot” operator (carry-merge)
Example: Domino Sum
[Figure: domino sum circuit supplied from VDD, clocked by Clk and Clkd, with inputs Gi:0, Si0, and Si1, a keeper, and output Sum]
Tree Adders
[Figure: tree adder with inputs (a0, b0) … (a15, b15) and outputs S0 … S15]
16-bit radix-4 Kogge-Stone Tree
Courtesy:R. Krishnamurthy(Intel)
Generate every 4th carry in parallel
Side-path: 4-bit conditional sum generator
73% fewer carry-merge gates ⇒ energy-efficient
Sparse-Tree Adder Architecture
[Figure: adder core critical path. Labels: Adder Inputs, PG, GG1, GG3, GG7, GG15, GG27, C27, Latch, CM0, CM1, XOR, Sum31_0, Sum31_1, Sum31, clocks clk, clk2, clk3; regions marked "Single-rail dynamic sparse-tree path" and "Static sum generator". Courtesy: R. Krishnamurthy (Intel)]
Adder Core Critical Path
Critical path: 7 gates, same as Kogge-Stone (KS)
Sparse-tree: single-rail dynamic
Exploit non-criticality of sum generator
Convert to static logic: semi-dynamic design
Courtesy:R. Krishnamurthy(Intel)
Sparse-Tree Architecture
Performance impact: 20% speedup
• 33-50% reduced G/P fanouts
• 80% reduced wiring complexity
• 30% reduction in maximum interconnect
Power impact: 56% reduction
• 73% fewer carry-merge gates
• 50% reduction in average transistor size
[Figure: worst-case energy (pJ, 0 to 100) versus delay (ps, 140 to 280) for the dynamic Kogge-Stone and semi-dynamic sparse-tree adders, with the 4 GHz design point and the 20% and 56% differences marked. 130nm CMOS, 1.2V, 110°C. Courtesy: R. Krishnamurthy (Intel)]
Energy-Delay Space
20% speedup over Kogge-Stone
56% worst-case energy reduction
Multipliers
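$Z = X \times Y = \sum_{k=0}^{M+N-1} Z_k 2^k = \left( \sum_{i=0}^{M-1} X_i 2^i \right) \left( \sum_{j=0}^{N-1} Y_j 2^j \right) = \sum_{i=0}^{M-1} \left( \sum_{j=0}^{N-1} X_i Y_j 2^{i+j} \right)$
with
$X = \sum_{i=0}^{M-1} X_i 2^i$
$Y = \sum_{j=0}^{N-1} Y_j 2^j$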
The Binary Multiplication
        1 0 1 0 1 0      Multiplicand
x           1 0 1 1      Multiplier
  -----------------
        1 0 1 0 1 0
      1 0 1 0 1 0        Partial products
    0 0 0 0 0 0
+ 1 0 1 0 1 0
  -----------------
  1 1 1 0 0 1 1 1 0      Result
The Binary Multiplication
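The partial-product view of multiplication can be reproduced in a few lines. This sketch uses the operands from the slide (101010 and 1011); the function names are illustrative assumptions.

```python
def partial_products(multiplicand, multiplier, multiplier_bits):
    """One shifted copy of the multiplicand (or zero) per multiplier bit."""
    return [(multiplicand << j) if (multiplier >> j) & 1 else 0
            for j in range(multiplier_bits)]

def shift_and_add_multiply(multiplicand, multiplier, multiplier_bits):
    """Sum of all partial products."""
    return sum(partial_products(multiplicand, multiplier, multiplier_bits))

rows = partial_products(0b101010, 0b1011, 4)
result = shift_and_add_multiply(0b101010, 0b1011, 4)
print([bin(r) for r in rows], bin(result))
```

Each row is the AND of the multiplicand with one multiplier bit, shifted by that bit's position, which is what the hardware array computes in parallel.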
[Figure: 4x4 array multiplier. Inputs X3 … X0 and Y0 … Y3 feed three rows of half adders (HA) and full adders (FA), producing outputs Z7 … Z0]
The Array Multiplier
[Figure: array of half adders (HA) and full adders (FA) with Critical Path 1, Critical Path 2, and the segment shared by both highlighted]
$t_{mult} \approx [(M-1) + (N-2)] \cdot t_{carry} + (N-1) \cdot t_{sum} + (N-1) \cdot t_{and}$
The M-by-N Array Multiplier: Critical Path
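To get a feel for how the critical path scales, the delay expression for this slide can be evaluated numerically. This is a sketch only: the coefficients follow the expression as transcribed from the slide, and the gate delays passed in below are made-up unit values, not figures from the lecture.

```python
def array_multiplier_delay(m, n, t_carry, t_sum, t_and):
    """Critical-path estimate for an M-by-N array multiplier.

    Coefficients follow the slide's expression as transcribed here;
    all delays are in the same (arbitrary) time unit.
    """
    return (((m - 1) + (n - 2)) * t_carry
            + (n - 1) * t_sum
            + (n - 1) * t_and)

# Hypothetical delays: t_carry = 2, t_sum = 3, t_and = 1 (arbitrary units)
for size in (4, 8, 16, 32):
    print(size, array_multiplier_delay(size, size, 2, 3, 1))
```

The delay grows linearly in both M and N, and both t_carry and t_sum appear with large coefficients, which is why a full adder with balanced sum and carry delays is attractive here.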
[Figure: transmission-gate full adder supplied from VDD, with inputs A, B, and Ci, internal propagate signal P, and outputs S and Co; sections labelled Setup, Sum Generation, and Carry Generation]
Transmission-Gate Full Adder
Balanced tsum and tcarry
Carry-Save Multiplier
[Figure: carry-save array of half adders (HA) and full adders (FA) feeding a vector merging adder]
$t_{mult} = (N-1) \cdot t_{carry} + (N-1) \cdot t_{and} + t_{merge}$
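The carry-save idea can be modelled functionally: carries are kept in a separate vector instead of rippling along each row, and a single final addition merges the two vectors. This is a behavioural sketch with hypothetical names; it does not model the cell layout or timing.

```python
def carry_save_multiply(x, y, n_bits):
    """Multiply by accumulating partial products in (sum, carry) form."""
    sum_vec, carry_vec = 0, 0
    for j in range(n_bits):
        pp = (x << j) if (y >> j) & 1 else 0
        # One row of full adders working bitwise: no carry moves sideways
        # within the row, carries are just passed to the next row.
        new_sum = sum_vec ^ carry_vec ^ pp
        new_carry = ((sum_vec & carry_vec)
                     | (sum_vec & pp)
                     | (carry_vec & pp)) << 1
        sum_vec, carry_vec = new_sum, new_carry
    # Vector-merging adder: the only place a carry propagates
    return sum_vec + carry_vec

print(carry_save_multiply(0b1010, 0b1011, 4))
```

The invariant is that `sum_vec + carry_vec` always equals the running total of the partial products added so far.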
Multiplier Floorplan
[Figure: multiplier floorplan with rows of cells (each marked S and C), inputs X0 … X3 and Y0 … Y3, and outputs Z0 … Z7. Legend: HA Multiplier Cell, FA Multiplier Cell, Vector Merging Cell]
X and Y signals are broadcast through the complete array.
Wallace-Tree Multiplier
[Figure: Wallace-tree reduction by bit position (6 … 0): (a) partial products, (b) first stage, (c) second stage, (d) final adder; FA and HA groupings marked]
Wallace-Tree Multiplier
[Figure: Wallace-tree multiplier for 4x4 operands. The partial products x0y0 … x3y3 pass through a first stage and a second stage of full adders (FA) and half adders (HA), then a final adder produces z7 … z0]
Wallace-Tree Multiplier
[Figure: two arrangements of full adders (FA) adding six inputs y0 … y5, each FA with carry-in Ci-1 and carry-out Ci: one producing S, and one with FAs on (y0, y1, y2) and (y3, y4, y5) producing C and S]
Multipliers – Summary
Optimization goals different than in binary adder
Once again: Identify critical path
Other possible techniques
• Logarithmic versus linear (Wallace Tree Mult)
• Data encoding (Booth)
• Pipelining
First glimpse at system level optimization
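The logarithmic-versus-linear point can be illustrated by counting reduction stages. The sketch below is a simplification that is not from the slides: it assumes every stage uses only 3:2 compressors (full adders) and reduces until two rows remain for the final adder; both function names are hypothetical.

```python
def wallace_stages(rows):
    """Stages of 3:2 compression needed to reduce `rows` operands to 2."""
    stages = 0
    while rows > 2:
        # each full group of 3 rows becomes 2 rows; leftovers pass through
        rows = 2 * (rows // 3) + rows % 3
        stages += 1
    return stages

def linear_stages(rows):
    """Stages when rows are absorbed one at a time, as in an array."""
    return max(rows - 2, 0)

for n in (4, 8, 16, 32):
    print(n, wallace_stages(n), linear_stages(n))
```

For 4 partial-product rows the tree needs two stages, matching the first and second stage shown for the 4x4 Wallace-tree example.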
The Binary Shifter
[Figure: bit-slice i of the binary shifter, with inputs Ai and Ai-1, outputs Bi and Bi-1, and control signals Right, nop, and Left]
The Barrel Shifter
[Figure: barrel shifter with data inputs A3 … A0, outputs B3 … B0, and shift controls Sh0 … Sh3; the legend distinguishes control wires from data wires]
Area Dominated by Wiring
4x4 Barrel Shifter
[Figure: 4x4 barrel shifter layout with buffer, shift controls Sh0 … Sh3, and inputs A0 … A3]
$\text{Width}_{barrel} \sim 2\, p_m M$
Logarithmic Shifter
[Figure: logarithmic shifter with stages controlled by the signal pairs Sh1, Sh2, and Sh4, inputs A3 … A0, and outputs B3 … B0]
[Figure: shifter layout with inputs A3 … A0 and outputs Out3 … Out0]
0-7 bit Logarithmic Shifter
$\text{Width}_{log} \approx p_m \cdot [2K + (1 + 2 + \ldots + 2^{K-1})] = p_m \cdot (2^K + 2K - 1)$
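A logarithmic shifter applies shifts of 1, 2, 4, … in successive stages, each enabled by one bit of the shift amount. The sketch below is a behavioural model with assumptions that are not on the slides: a right shift with zero fill, an 8-bit data word, and K = 3 stages for the 0-7 bit case. The width helpers evaluate the two width expressions with p_m as an arbitrary pitch unit.

```python
def log_shift_right(value, shift, width=8, stages=3):
    """Shift right by 0 .. 2**stages - 1 using one stage per shift bit."""
    value &= (1 << width) - 1
    for stage in range(stages):
        if (shift >> stage) & 1:
            value >>= (1 << stage)     # this stage shifts by 1, 2, 4, ...
    return value

def log_shifter_width(k, p_m):
    """Width estimate p_m * (2^K + 2K - 1) for K stages."""
    return p_m * (2 ** k + 2 * k - 1)

def barrel_shifter_width(m, p_m):
    """Width estimate 2 * p_m * M for a maximum shift width M."""
    return 2 * p_m * m

print(bin(log_shift_right(0b10110000, 5)))
print(log_shifter_width(3, 1), barrel_shifter_width(8, 1))
```

With K stages the shifter covers 2^K shift amounts, so the stage count grows only with the logarithm of the maximum shift.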
Power
400 million computers in the world
• 0.16 PW (PetaWatt = 10^15 W) of power dissipation
• Equivalent to 26 nuclear plants!
Data centers represent the absolute challenge
• 1 single server rack is between 5 and 20 kW
• 100’s of those racks in a single room!
The Power Challenge
Courtesy of IBM
Power and energy management and minimization have emerged as some of the most dominant roadblocks. The best opportunity lies in a very aggressive scaling and adaptation of supply and threshold values in concert with a careful orchestration of the system activity.
138 W/cm²
Power and Energy Challenges
Little change in basic technology
• store energy using a chemical reaction
Battery capacity doubles every 10 years
Energy density/size, safe handling are limiting factors
Energy density of material (kWh/kg)
Gasoline     14
Lead-Acid    0.04
Li polymer   0.15
Portability: Battery Storage is the Limiting Factor
[Figure: battery energy density (Wh/kg, 0 to 160) versus year of first commercial use (1940 to 2010), with trend line, for NiCd, SLA, NiMH, Li-Ion, Reusable Alkaline, and Li-Polymer]
Factor 4 over the last 10 years!
Battery Progress
Power Dissipation in CMOS
Dynamic power
• Charging capacitances
• Dominant today
Leakage power
• Leaky transistors
• Concern in low-activity, portable devices
Short circuit power
Static power
• E.g. pseudo-NMOS
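$E_{0\to1} = \int_0^T P_{DD}(t)\,dt = V_{DD} \int_0^T i_{DD}(t)\,dt = V_{DD} \int_0^{V_{DD}} C_L\,dv_{out} = C_L V_{DD}^2$
$E_C = \int_0^T P_C(t)\,dt = \int_0^T v_{out}\, i_L(t)\,dt = \int_0^{V_{DD}} C_L v_{out}\,dv_{out} = \frac{1}{2} C_L V_{DD}^2$
[Figure: CMOS gate with PMOS network and NMOS network, supply Vdd, inputs A1 … AN, and output Vout driving load CL with current iL]
$E_{0\to1} = C_L V_{DD}^2$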
Dynamic Power Consumption
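One half of the energy drawn from the supply is dissipated in the pull-up network and one half is stored on CL
Charge from CL is dumped during the 1→0 transition
[Figure: CMOS gate with PMOS network and NMOS network, supply Vdd, inputs A1 … AN, and output Vout driving load CL with current iL]
$E_{0\to1} = C_L V_{DD}^2$
$E_R = \frac{1}{2} C_L V_{DD}^2$
$E_C = \frac{1}{2} C_L V_{DD}^2$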
Dynamic Power Consumption
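The even split of the supply energy can be checked numerically. The sketch below makes an assumption that is not on the slides: the pull-up network is modelled as a fixed resistor, and the charging transient is integrated with a simple forward-Euler step. The component values are arbitrary.

```python
def charging_energies(c_load, v_dd, r_pullup, steps=100_000, span_tau=20.0):
    """Integrate a 0 -> 1 output transition through a resistive pull-up.

    Returns (energy from supply, energy dissipated in resistor,
    energy stored on the load capacitor).
    """
    tau = r_pullup * c_load
    dt = span_tau * tau / steps
    v_out = 0.0
    e_supply = 0.0
    e_resistor = 0.0
    for _ in range(steps):
        current = (v_dd - v_out) / r_pullup
        e_supply += v_dd * current * dt
        e_resistor += current * current * r_pullup * dt
        v_out += current * dt / c_load
    e_cap = 0.5 * c_load * v_out * v_out
    return e_supply, e_resistor, e_cap

c_load, v_dd = 10e-15, 1.2          # hypothetical 10 fF load, 1.2 V supply
for r in (1e3, 10e3):
    e_sup, e_res, e_cap = charging_energies(c_load, v_dd, r)
    unit = c_load * v_dd ** 2
    print(r, e_sup / unit, e_res / unit, e_cap / unit)
```

The ratios come out close to 1, 1/2, and 1/2 regardless of the resistance chosen, consistent with the energies not depending on the size of the pull-up device.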
Power = Energy/transition • Transition rate
$= C_L V_{DD}^2 \cdot f_{0\to1}$
$= C_L V_{DD}^2 \cdot f \cdot P_{0\to1}$
$= C_{switched} V_{DD}^2 \cdot f$
Power dissipation is data dependent: it depends on the switching probability
Switched capacitance $C_{switched} = C_L \cdot P_{0\to1}$
Dynamic Power Consumption
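Energy consumed in N cycles, $E_N$:
$E_N = C_L \cdot V_{DD}^2 \cdot n_{0\to1}$
$n_{0\to1}$: number of 0→1 transitions in N cycles
$P_{avg} = \lim_{N\to\infty} \frac{E_N}{N} \cdot f = \left( \lim_{N\to\infty} \frac{n_{0\to1}}{N} \right) \cdot C_L \cdot V_{DD}^2 \cdot f$
$\alpha_{0\to1} = \lim_{N\to\infty} \frac{n_{0\to1}}{N}$
$P_{avg} = \alpha_{0\to1} \cdot C_L \cdot V_{DD}^2 \cdot f$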
Transition Activity and Power
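For a finite trace, the transition activity can be estimated by counting 0→1 transitions and dividing by the number of cycles. This is a minimal sketch: the function names and the example waveform are made up, and the load, supply, and frequency values are illustrative only.

```python
def transition_activity(samples):
    """Fraction of cycles in which the node makes a 0 -> 1 transition.

    `samples` holds the node value at the end of each cycle, preceded by
    its initial value, so there are len(samples) - 1 cycles.
    """
    cycles = len(samples) - 1
    rising = sum(1 for prev, cur in zip(samples, samples[1:])
                 if prev == 0 and cur == 1)
    return rising / cycles

def average_power(alpha, c_load, v_dd, freq):
    """P_avg = alpha * C_L * V_DD^2 * f."""
    return alpha * c_load * v_dd ** 2 * freq

trace = [0, 1, 0, 1, 0]                       # hypothetical node waveform
alpha = transition_activity(trace)
print(alpha, average_power(alpha, 1e-12, 1.0, 1e9))
```

A node that toggles every cycle makes a 0→1 transition in half of the cycles, which is the largest activity a non-glitching node can have.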
Factors Affecting Transition Activity
“Static” component (does not account for timing)
• Type of logic function (NOR vs. XOR)
• Type of logic style (Static vs. Dynamic)
• Signal statistics
• Inter-signal correlations
“Dynamic” or timing dependent component
• Circuit topology
• Signal statistics and correlations
A  B  Out
0  0  1
0  1  0
1  0  0
1  1  0
Example: Static 2-input NOR Gate
Assume signal probabilities
$p_{A=1} = 1/2$
$p_{B=1} = 1/2$
Then transition probability
$p_{0\to1} = p_{Out=0} \times p_{Out=1} = 3/4 \times 1/4 = 3/16$
If inputs switch every cycle: $\alpha_{0\to1} = 3/16$
Type of Logic Function: NOR vs. XOR
A  B  Out
0  0  0
0  1  1
1  0  1
1  1  0
Example: Static 2-input XOR Gate
Assume signal probabilities
$p_{A=1} = 1/2$
$p_{B=1} = 1/2$
Then transition probability
$p_{0\to1} = p_{Out=0} \times p_{Out=1} = 1/2 \times 1/2 = 1/4$
If inputs switch in every cycle: $\alpha_{0\to1} = 1/4$
Type of Logic Function: NOR vs. XOR
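The NOR and XOR figures can be reproduced by enumerating the truth table. This sketch assumes independent inputs, as the slides' calculation implicitly does; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def output_one_probability(gate, p_a, p_b):
    """P(Out = 1) for a 2-input gate with independent inputs."""
    total = Fraction(0)
    for a, b in product((0, 1), repeat=2):
        weight = (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
        if gate(a, b):
            total += weight
    return total

def static_transition_probability(gate, p_a=Fraction(1, 2), p_b=Fraction(1, 2)):
    """p(0->1) = P(Out = 0) * P(Out = 1) for a static gate."""
    p_one = output_one_probability(gate, p_a, p_b)
    return (1 - p_one) * p_one

def nor(a, b):
    return int(not (a or b))

def xor(a, b):
    return a ^ b

print(static_transition_probability(nor), static_transition_probability(xor))
```

Passing other input probabilities shows how strongly the activity depends on signal statistics, for example `static_transition_probability(nor, Fraction(1, 4), Fraction(1, 4))`.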
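[Figure: dynamic gate with inputs In1, In2, and In3 driving a pull-down network (PDN), precharge transistor Mp and evaluation transistor Me clocked by CLK, and output Out loaded by CL]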
Power only dissipated when previous Out = 0
Power Consumption of Dynamic Gates
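A  B  Out
0  0  1
0  1  0
1  0  0
1  1  0
Dynamic 2-input NOR Gate
Assume signal probabilities
$P_{A=1} = 1/2$
$P_{B=1} = 1/2$
Then transition probability
$P_{0\to1} = P_{Out=0} \times P_{Out=1} = 3/4 \times 1 = 3/4$
(the output is precharged to 1 every cycle, so the second factor is 1)
Switching activity always higher in dynamic gates!
$P_{0\to1} = P_{Out=0}$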
Dynamic Power Consumption is Data Dependent
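The static and dynamic cases can be compared side by side. The sketch assumes independent inputs and a dynamic gate whose output is precharged to 1 every cycle, so it must be recharged whenever it evaluated to 0; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def output_zero_probability(gate, p_a, p_b):
    """P(Out = 0) for a 2-input gate with independent inputs."""
    total = Fraction(0)
    for a, b in product((0, 1), repeat=2):
        weight = (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
        if not gate(a, b):
            total += weight
    return total

def static_activity(gate, p_a=Fraction(1, 2), p_b=Fraction(1, 2)):
    """Static gate: P(0->1) = P(Out = 0) * P(Out = 1)."""
    p_zero = output_zero_probability(gate, p_a, p_b)
    return p_zero * (1 - p_zero)

def dynamic_activity(gate, p_a=Fraction(1, 2), p_b=Fraction(1, 2)):
    """Precharged dynamic gate: P(0->1) = P(Out = 0)."""
    return output_zero_probability(gate, p_a, p_b)

def nor(a, b):
    return int(not (a or b))

print(static_activity(nor), dynamic_activity(nor))
```

Since P(Out = 1) is at most 1, the dynamic activity can never be lower than the static one for the same function and input statistics.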
Prime choice: Reduce voltage!
• Recent years have seen an acceleration in supply voltage reduction
• Design at very low voltages still open question (0.6 … 0.9 V by 2010!)
• Reducing thresholds to improve performance increases leakage
Reduce switching activity
Reduce physical capacitance
Principles for Power Reduction
Next Lecture
Power
Sequential Logic