low-power, high-speed multiplier architectures
Low-power, High-speed Multiplier Architectures. Shawn Nicholl, ELEC-5705y, March 7, 2005. Agenda/Overview: Design Abstraction; Numbering Systems; Addition and Subtraction; Adder Architectures; Multiplication; Traditional Multiplier Architectures; Advanced Multiplier Architectures.
Low-power, High-speed Multiplier Architectures
Shawn Nicholl
ELEC-5705y
March 7, 2005
2005/03/07 Low-Power, High-Speed Multiplier Architectures 2
Agenda/Overview
Design Abstraction
Numbering Systems
Addition and Subtraction
Adder Architectures
Multiplication
Traditional Multiplier Architectures
Advanced Multiplier Architectures
Levels of Abstraction in Digital ICs
Higher levels of abstraction have a greater effect on overall system performance
Systems
Modules
Logic Gates
Circuits
Devices
Low-power, high-speed techniques can be used at many levels of abstraction
[Figure labels: "Increasing Abstraction" (arrow alongside the levels listed above); "Multiplier Architectures" (callout)]
Numbering Systems – A Quick Review
Some common numbering systems:
Decimal: $D = \sum_{i=0}^{n-1} d_i \cdot 10^i$; Range: 0 to $10^n - 1$
Unsigned Binary: $B = \sum_{i=0}^{n-1} b_i \cdot 2^i$; Range: 0 to $2^n - 1$
Two’s-Complement: Range: $-2^{n-1}$ to $+(2^{n-1} - 1)$

Signed Decimal    Sign + Unsigned Binary    Two’s Complement (Sign: N/A)
+10               + 0000 1010               0000 1010
-45               - 0010 1101               1101 0011

E.g., 45d = 0 + 0 + 2^5 + 0 + 2^3 + 2^2 + 0 + 2^0 = 0010 1101b
2’s Comp of 45d: invert the bits (1101 0010b), then add 1: 1101 0011b = -45d
Adding and Subtracting
Two’s-complement algorithm is consistent
Addition and subtraction behave the same
Negative numbers are treated the same as positive numbers
Example: Add -45d to 10d
Signed Decimal Method
Step1) Initialize: 10d + (-45d)
Step2) Compare so that augend holds larger number: -45d + 10d
Step3) Treat as a subtraction: 45d - 10d
Step4) Do subtraction (borrows may be required): 45d - 10d = 35d
Step5) Negate result (knowing that augend was negative): -35d
Two’s Complement Method
Step1) Initialize: 10d = 0000 1010b, -45d = 1101 0011b
Step2) Add (no special rules): 0000 1010b + 1101 0011b = 1101 1101b
Converting 2’s Comp back to decimal: 1101 1101b = -35d
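The two's-complement addition above can be checked with a short Python sketch. The helper names (`to_twos`, `from_twos`, `add_twos`) and the fixed 8-bit width are assumptions made for illustration; they are not part of the slides.

```python
def to_twos(value: int, bits: int = 8) -> int:
    """Encode a signed integer as an unsigned two's-complement bit pattern."""
    return value & ((1 << bits) - 1)


def from_twos(pattern: int, bits: int = 8) -> int:
    """Decode a two's-complement bit pattern back into a signed integer."""
    sign_bit = 1 << (bits - 1)
    return (pattern ^ sign_bit) - sign_bit


def add_twos(a: int, b: int, bits: int = 8) -> int:
    """Add two bit patterns, discarding any carry out of the top bit."""
    return (a + b) & ((1 << bits) - 1)


# Same operands as the slide: 10d + (-45d)
total = add_twos(to_twos(10), to_twos(-45))
print(format(total, "08b"), from_twos(total))
```

No comparison, borrow, or negate step is needed: the same adder handles both signs.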
Adding and Subtracting (Example 2)
Example 2: Subtract -45d from 10d
Signed Decimal Method
Step1) Initialize: 10d - (-45d)
Step2) Subtrahend is negative, so negate it and do an addition: 10d + 45d = 55d
Two’s Complement Method
Step1) Initialize: 10d = 0000 1010b, -45d = 1101 0011b
Step2) Invert subtrahend and set CIN = 1: 0000 1010b + 0010 1100b + 1b = 0011 0111b
Converting 2’s Comp back to decimal: 0011 0111b = 55d
Subtraction logic can be shared with addition logic!
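As a rough illustration of how subtraction can reuse the adder, the sketch below inverts the subtrahend and feeds a carry-in of 1 into an ordinary addition. The function names and the 8-bit width are assumptions for this example only.

```python
def add_with_carry(a: int, b: int, cin: int, bits: int = 8) -> int:
    """Model an n-bit adder with a carry-in; the carry-out is discarded."""
    return (a + b + cin) & ((1 << bits) - 1)


def subtract(a: int, b: int, bits: int = 8) -> int:
    """Compute a - b by inverting b and setting the carry-in to 1."""
    mask = (1 << bits) - 1
    inverted_b = ~b & mask
    return add_with_carry(a, inverted_b, 1, bits)


minuend = 0b00001010       # 10d
subtrahend = 0b11010011    # -45d in 8-bit two's complement
difference = subtract(minuend, subtrahend)
print(format(difference, "08b"), difference)
```

The only extra hardware this implies is the inverter on one operand and control of the carry-in.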
Adder Building Blocks
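Half Adder (inputs An, Bn; outputs Sn, COn)
$S_n = A_n \oplus B_n$
$CO_n = A_n \cdot B_n$
Full Adder (inputs An, Bn, CINn; outputs Sn, COUTn)
$S_n = A_n \oplus B_n \oplus CIN_n$
$COUT_n = A_n \cdot B_n + A_n \cdot CIN_n + B_n \cdot CIN_n$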
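A minimal sketch of the two building blocks, written as single-bit Python functions. It uses the standard majority form for the full-adder carry-out; the function names are illustrative.

```python
from itertools import product


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry_out) for two input bits."""
    return a ^ b, a & b


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry_out) for two input bits plus a carry-in bit."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


# Print the full-adder truth table; sum + 2*carry equals the number of 1 inputs.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    print(a, b, cin, "->", s, cout)
```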
Adder Architectures (CRA)
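Carry Ripple Adder (CRA)
Gate Count ∝ N
Area ∝ N
Delay ∝ N
Power ∝ N
Layout friendly (low fan-in/fan-out; regular structure)
[Figure: chain of full adders (FA) from stage 0 to stage N; stage n takes An and Bn and produces Sn; CIN0 enters stage 0 and COUTN leaves stage N]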
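The ripple structure can be sketched by chaining a one-bit full adder, with each stage waiting on the carry from the stage before it. This is a behavioural model under assumed names (`ripple_carry_add`, an 8-bit default width), not a gate-level description.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)


def ripple_carry_add(a: int, b: int, n: int = 8, cin: int = 0) -> tuple[int, int]:
    """Add two n-bit patterns stage by stage; returns (sum, carry_out)."""
    total = 0
    carry = cin
    for i in range(n):
        # Stage i cannot finish until the carry from stage i-1 is known,
        # which is why the delay grows linearly with n.
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


print(ripple_carry_add(0b00001010, 0b11010011))
```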
Adder Architectures (CLA)
Carry Lookahead Adder (CLA)
Generate: $G_n = A_n \cdot B_n$
Propagate: $P_n = A_n + B_n$
Recursive Relationship: $CIN_n = G_{n-1} + P_{n-1} \cdot CIN_{n-1}$
(stage n-1 either generates CINn itself, or propagates its own carry-in CINn-1)
Expanded: $CIN_n = G_{n-1} + P_{n-1} G_{n-2} + \cdots + P_{n-1} P_{n-2} \cdots P_1 G_0 + P_{n-1} P_{n-2} \cdots P_0 \, CIN_0$
CLA:
Delay ∝ log2 N (if built right)
Gate count, power are greater than CRA
Not layout friendly (high fan-in; difficult to route)
[Figure: carry lookahead logic computing CINN from the G and P terms and CIN0, above adder stages with inputs An, Bn and outputs Sn]
Source: Patterson and Hennessy, Figure A.14
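To make the expanded carry equation concrete, the following sketch computes every carry directly from the generate and propagate bits and CIN0, without rippling. It is a software model only (the loops here run sequentially, whereas the hardware evaluates the terms in parallel), and the function names are assumptions.

```python
def lookahead_carries(a: int, b: int, n: int = 8, cin0: int = 0) -> list[int]:
    """Return [CIN_0, ..., CIN_n] using the expanded (non-recursive) form."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]   # generate bits
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]   # propagate bits
    carries = [cin0]
    for k in range(1, n + 1):
        # Term P_{k-1} ... P_0 * CIN_0
        carry = cin0
        for j in range(k):
            carry &= p[j]
        # Terms P_{k-1} ... P_{j+1} * G_j for each lower stage j
        for j in range(k):
            term = g[j]
            for m in range(j + 1, k):
                term &= p[m]
            carry |= term
        carries.append(carry)
    return carries


def cla_add(a: int, b: int, n: int = 8, cin0: int = 0) -> tuple[int, int]:
    """Add two n-bit patterns using lookahead carries; returns (sum, carry_out)."""
    c = lookahead_carries(a, b, n, cin0)
    total = 0
    for i in range(n):
        total |= (((a >> i) ^ (b >> i) ^ c[i]) & 1) << i
    return total, c[n]


print(cla_add(0b00001010, 0b11010011))
```

The number of product terms per carry grows with the stage index, which is the high fan-in the slide warns about.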
Adder Architectures (CSA)
Carry Save Adder
Adders work independently, so very fast
Pipelined architecture results in flops and control logic, which increase area and latency
[Figure: a row of independent full adders (FA); stage n has inputs An, Bn, CINn and outputs Sn, COUTn, for n = 0 to N; below it, an array of three rows of four FAs]
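A word-level sketch of a carry-save adder: three operands go in, and a sum word plus a carry word come out, with no carry passing between bit positions. The function name and the non-negative integer operands are assumptions made for the example.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Reduce three non-negative operands to a (sum_word, carry_word) pair.

    Every bit position is an independent full adder; the carry outputs are
    saved in a separate word (shifted left by one) instead of rippling.
    """
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


s, c = carry_save_add(118, 99, 45)
print(s, c, s + c)   # one ordinary addition is still needed at the very end
```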
Unsigned Multiplication
Shift-and-Add Algorithm
Example: Multiply 118d by 99d
Decimal Method
Step1) Initialize: Multiplicand = 118d, Multiplier = 99d
Step2) Find partial products: 1062d and 1062d (one per digit of the multiplier)
Step3) Sum up the shifted partial products: 1062d + 1062d shifted one digit left = 11682d
Two’s Complement Method
Step1) Initialize: 118d = 0111 0110b, 99d = 0110 0011b
Step2) Find partial products (one per multiplier bit, low-order bit first):
01110110b
01110110b
00000000b
00000000b
00000000b
01110110b
01110110b
00000000b
Step3) Sum up the shifted partial products: 010110110100010b
Convert 2’s-Comp back to decimal: 0010 1101 1010 0010 = 11682d
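The three steps above map onto a few lines of Python. This sketch builds the list of shifted partial products explicitly and then sums them; `partial_products` is a hypothetical helper name and the operands are assumed to be unsigned.

```python
def partial_products(multiplicand: int, multiplier: int, n: int = 8) -> list[int]:
    """One partial product per multiplier bit, already shifted into position."""
    return [
        (multiplicand if (multiplier >> i) & 1 else 0) << i
        for i in range(n)
    ]


pps = partial_products(0b01110110, 0b01100011)   # 118d x 99d
product = sum(pps)
print(format(product, "016b"), product)
```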
Shift-and-Add Multiplier
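[Figure: registers A (multiplier, N bits) and B (multiplicand, N bits) with Load A and Load B controls; An x B feeds an N-bit Adder whose sum S and carry-out COUT (N+1 bits) go to the 2N-bit product register P, under Shift and Add controls]
B (Multiplicand) x A (Multiplier) = P (Product)
Shift-and-Add Multiplier
Takes N cycles to complete: $T_{Lat} = (T_{N\text{-bit ADD}} + T_{shift}) \times N$
Requires minimal logic (most logic is in the adder)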
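A cycle-level sketch of the sequential multiplier follows. It assumes one common register arrangement (the multiplier starts in the low half of P and is shifted out as product bits shift in), which may differ in detail from the figure. The `latency` helper simply evaluates the formula above; its argument values in the usage line are made-up units.

```python
def shift_and_add(a: int, b: int, n: int = 8) -> tuple[int, int]:
    """Multiply unsigned n-bit a (multiplier) by b (multiplicand).

    Returns (product, cycles). Assumed arrangement: P is a 2n-bit register
    whose low half initially holds the multiplier.
    """
    p = a
    cycles = 0
    for _ in range(n):
        if p & 1:            # low-order multiplier bit selects "add"
            p += b << n      # add the multiplicand into the upper half of P
        p >>= 1              # shift P right by one bit
        cycles += 1
    return p, cycles


def latency(t_add: float, t_shift: float, n: int) -> float:
    """T_Lat = (T_add + T_shift) x N, in whatever time unit the inputs use."""
    return (t_add + t_shift) * n


print(shift_and_add(99, 118))
print(latency(2.0, 0.5, 8))
```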
Basic Signed Multiplication
Basic Idea
1. Convert to Unsigned
2. Use Shift-and-Add Multiplier
3. Convert to Signed
Extra Hardware!
[Figure: A and B (N bits each) pass through "Convert to Unsigned" blocks into the Shift-and-Add Multiplier; a "Determine Sign of Result" block controls a "Convert to Signed" block that outputs P (2N bits)]
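The three-step idea can be sketched as follows: strip the signs, multiply the magnitudes with an unsigned shift-and-add loop, and restore the sign at the end. The function name and the 8-bit operand width are assumptions for the example.

```python
def signed_multiply(a: int, b: int, n: int = 8) -> int:
    """Multiply two signed n-bit values via an unsigned shift-and-add core."""
    negative = (a < 0) != (b < 0)      # determine sign of result
    mag_a, mag_b = abs(a), abs(b)      # 1. convert to unsigned

    product = 0                        # 2. unsigned shift-and-add
    for i in range(n):
        if (mag_a >> i) & 1:
            product += mag_b << i

    return -product if negative else product   # 3. convert to signed


print(signed_multiply(-118, -99))
```

The sign detection and the two conversion steps correspond to the extra hardware that Booth recoding avoids.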
Signed Multiplication
Booth Recoding
Reduce the number of partial products by re-coding the multiplier operand
Works for signed numbers
Example: Multiply -118d by -99d
Recall, 99d = 0110 0011b
Invert: 1001 1100b, then add 1b
-99d = 1001 1101b
Radix-2 Booth Recoding
An    An-1    Partial Product
0     0       0
0     1       +B
1     0       -B
1     1       0
An: Low-order Bit; An-1: Last Bit Shifted Out
-99d recoded, low-order digit first: -1 +1 -1 0 0 +1 0 -1
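The recoding table can be expressed as "last bit shifted out minus current bit". The sketch below applies that rule to an 8-bit multiplier; `booth_radix2_digits` is a hypothetical name, and the code relies on Python's arithmetic right shift to read the two's-complement bits of a negative integer.

```python
def booth_radix2_digits(a: int, n: int = 8) -> list[int]:
    """Recode an n-bit signed multiplier into digits in {-1, 0, +1}.

    Digits are returned low-order first. Each digit is A(i-1) - A(i),
    with A(-1) taken as 0, which reproduces the table: 01 -> +1, 10 -> -1,
    00 and 11 -> 0.
    """
    digits = []
    prev = 0                      # last bit shifted out; 0 before the first step
    for i in range(n):
        bit = (a >> i) & 1        # current low-order bit
        digits.append(prev - bit)
        prev = bit
    return digits


digits = booth_radix2_digits(-99)
print(digits)
print(sum(d * 2**i for i, d in enumerate(digits)))   # weighted sum recovers the multiplier
```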
Radix-2 Booth Multiplication
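Radix-2 Booth
Example: Multiply -118d by -99d
Step1) Initialize
B = -118d = 1000 1010b
-B = 118d = 0111 0110b
A = -99d = 1001 1101b
Step2) Find partial products (low-order digit first): -B, +B, -B, 0, 0, +B, 0, -B
Step3) Sum up the shifted partial products (with Sign Extension to 16 bits)
-B shifted left 0 = 0000 0000 0111 0110b
+B shifted left 1 = 1111 1111 0001 0100b
-B shifted left 2 = 0000 0001 1101 1000b
 0 shifted left 3 = 0000 0000 0000 0000b
 0 shifted left 4 = 0000 0000 0000 0000b
+B shifted left 5 = 1111 0001 0100 0000b
 0 shifted left 6 = 0000 0000 0000 0000b
-B shifted left 7 = 0011 1011 0000 0000b
Sum               = 0010 1101 1010 0010b
Convert 2’s-Comp back to decimal: 0010 1101 1010 0010 = 11682d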
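A behavioural sketch of radix-2 Booth multiplication: it scans the multiplier one bit at a time, compares each bit with the last bit shifted out, and adds or subtracts the shifted multiplicand accordingly. Python integers are unbounded, so sign extension is implicit here; the function name is an assumption.

```python
def booth_multiply_radix2(a: int, b: int, n: int = 8) -> int:
    """Multiply signed n-bit multiplier a by multiplicand b (radix-2 Booth)."""
    acc = 0
    prev = 0                          # last bit shifted out
    for i in range(n):
        bit = (a >> i) & 1            # current low-order bit
        if (bit, prev) == (0, 1):
            acc += b << i             # table row 0 1 -> +B
        elif (bit, prev) == (1, 0):
            acc -= b << i             # table row 1 0 -> -B
        prev = bit                    # rows 0 0 and 1 1 add nothing
    return acc


result = booth_multiply_radix2(-99, -118)
print(result, format(result & 0xFFFF, "016b"))
```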
Array Multiplier
Array Multiplier
Combinatorial, so it is very fast; delay ∝ N
Can be pipelined
Very regular structure
[Figure: the Booth partial products of the -118d x -99d example (-B, +B, -B, 0, 0, +B, 0, -B) feed successive rows of full adders (FA) forming carry-save adders (CSA); a final carry-propagate adder (CPA) produces the product 0010110110100010b]
Array Multiplier Structure
Source: J. Kuo and J. Lou, Low-Voltage CMOS VLSI Circuits, 1999
Radix-4 Booth Multiplication
Similar to Radix-2, but looks at two low-order bits at a time (instead of 1)
A2n+1    A2n    A2n-1    Partial Product
0        0      0        0
0        0      1        +B
0        1      0        +B
0        1      1        +2B
1        0      0        -2B
1        0      1        -B
1        1      0        -B
1        1      1        0
A2n+1, A2n: Low-order Bits; A2n-1: Last Bit Shifted Out
Recall, 99d = 0110 0011b
Invert: 1001 1100b, then add 1b
-99d = 1001 1101b
Radix-4 Booth Recoding
-99d recoded, low-order digit first: +1 -1 +2 -2
Radix-4 Booth Multiplication
Radix-4 Booth
Example: Multiply -118d by -99d
Step1) Initialize
B = -118d = 1000 1010b
-B = 118d = 0111 0110b
2B = -236d = 1 0001 0100b
-2B = 236d = 0 1110 1100b
A = -99d = 1001 1101b; recoded, low-order digit first: +1 -1 +2 -2
Step2) Find partial products (low-order digit first): +B, -B, +2B, -2B
Step3) Sum up the shifted partial products (with Sign Extension to 16 bits)
 +B shifted left 0 = 1111 1111 1000 1010b
 -B shifted left 2 = 0000 0001 1101 1000b
+2B shifted left 4 = 1111 0001 0100 0000b
-2B shifted left 6 = 0011 1011 0000 0000b
Sum                = 0010 1101 1010 0010b
Convert 2’s-Comp back to decimal: 0010 1101 1010 0010 = 11682d
Reduces number of partial products by half!
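The radix-4 scheme can be sketched as a table lookup on overlapping bit triplets followed by a sum of the shifted multiples. The names `RADIX4_TABLE`, `booth_radix4_digits`, and `booth_multiply_radix4` are illustrative, and the operand width `n` is assumed to be even.

```python
# (A2n+1, A2n, A2n-1) -> multiple of B, following the radix-4 Booth table
RADIX4_TABLE = {
    (0, 0, 0): 0,  (0, 0, 1): 1,  (0, 1, 0): 1,  (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_radix4_digits(a: int, n: int = 8) -> list[int]:
    """Recode a signed n-bit multiplier (n even) into n/2 digits, low-order first."""
    digits = []
    prev = 0                                  # last bit shifted out
    for i in range(0, n, 2):
        lo = (a >> i) & 1
        hi = (a >> (i + 1)) & 1
        digits.append(RADIX4_TABLE[(hi, lo, prev)])
        prev = hi
    return digits


def booth_multiply_radix4(a: int, b: int, n: int = 8) -> int:
    """Sum the partial products d * B, each shifted left by two bits per digit."""
    return sum((d * b) << (2 * k) for k, d in enumerate(booth_radix4_digits(a, n)))


print(booth_radix4_digits(-99))
print(booth_multiply_radix4(-99, -118))
```

For an 8-bit multiplier this yields four partial products instead of eight, at the cost of needing the 2B and -2B multiples.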
Tree Multiplier
Wallace Tree
Reduces the total number of full-adders
Uses 3:2 Compressor (aka Full Adder)
Delay ∝ $\log_{3/2} N$
Irregular structure is difficult to lay out
Source: J. Kuo, et al., Low-Voltage CMOS VLSI Circuits, 1999
[Figure: "Original Structure" and "Tree Structure" for summing the partial products B7A0 ... B0A0 through B7A8 ... B0A8]
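A word-level sketch of Wallace-style reduction: at each level the rows are grouped in threes and every group is replaced by the two outputs of a 3:2 compressor, until only two rows remain for a final ordinary addition. Real Wallace trees are wired per bit column; this simplified model, with assumed names and non-negative operands, only illustrates how quickly the row count shrinks.

```python
def compress_3_2(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 compressor applied to whole words: returns (sum_word, carry_word)."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def wallace_reduce(operands: list[int]) -> tuple[int, int]:
    """Sum non-negative operands; returns (total, number_of_compressor_levels)."""
    rows = list(operands)
    levels = 0
    while len(rows) > 2:
        next_rows = []
        full_groups = len(rows) // 3
        for g in range(full_groups):
            next_rows.extend(compress_3_2(*rows[3 * g:3 * g + 3]))
        next_rows.extend(rows[3 * full_groups:])   # leftover rows pass through
        rows = next_rows
        levels += 1
    return sum(rows), levels                       # final carry-propagate addition


# Eight shifted partial products of 118d x 99d
pps = [(118 if (99 >> i) & 1 else 0) << i for i in range(8)]
print(wallace_reduce(pps))
```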
Twin Pipe Serial-Parallel Multiplier
Features
Parallel Feed One Operand
Serial Feed One Operand
Even data bits on rising clock
Odd data bits on falling clock
Low Area
High latency
Low Power
Source: S. Shah, et al., “Comparison of 32-bit Multipliers for Various Performance Measures”, 2000.
Cluster Multiplication
Divide circuit into clusters of nibble-wide multiplications
If all bits in a nibble are zeroes, then use clock-gating to gate multiplication for that nibble
[Figure: operands split into 4-bit nibbles A0 ... A(N-1) and B0 ... B(N-1), feeding an array of nibble products from A0xB0 through A(N-1)xB(N-1)]
Features
Low Power (claims 13% savings)
Source: A. Fayed, M. Bayoumi, “A Novel Architecture for Low-Power Design of Parallel Multipliers”, 2001.
Multiplexer-Based Array Multiplier
Characteristics
Fast (because it is array-based)
Unlike Booth, does not require encoding logic
Processes 1 bit of multiplier and 1 bit of multiplicand at a time, thus it is symmetric
Has a zigzag shape, thus not layout-friendly
Source: K. Pekmestzi, “Multiplexer-Based Array Multipliers”, 1999.
Area-Efficient Multiplexer-Based Multiplier
Characteristics
Increases each row to have N+1 cells (instead of N)
Depth is cut in half (increases “squareness”)
Source: Y. Wang, Y. Jiang, E. Sha, “On Area-Efficient Low Power Array Multipliers”, 2001.
Low Latency Booth-Encoding-based Pipeline Multiplier
Features
Delay ∝ N/4
Needs (N+N/2)-bit addition at end
Uses CLAs instead of CSAs because longest stage (i.e. adder at end) determines fastest operating frequency
Source: X. Wu, H. Chen, S. Wei, “Design of a Low Latency High Speed Pipelining Multiplier”, 2001.
Two’s Complement Gray-Encoded Array Multiplier
Characteristics
Uses Gray code to reduce the switching activity of multiplier
Claims that traditional Booth uses 45% more power
Greater area than traditional Booth
Source: E. Costa, et al., “A New Architecture for 2’s Complement Gray Encoded Array Multiplier”, 2002.
Project Plan
Start End Task
- 03/05 Research Multiplier Circuits
03/06 03/12 Code multipliers in Verilog HDL
03/13 03/19 Synthesize all multiplier circuits
03/20 03/26 Analyze results (delay/power/area)
03/27 04/02 Prepare report
04/03 04/09 Prepare for final exam
04/10 04/16 Complete Report and Submit
References
S. Shah, A.J. Al-Khalili, D. Al-Khalili, “Comparison of 32-bit Multipliers for Various Performance Measures”, Proc. 2000 Int’l Conf. Microelectronics, pp. 75-80, 2000.
D. Patterson, J. Hennessy, Computer Architecture – A Quantitative Approach, 2nd ed., San Francisco, CA: Morgan Kaufmann Publishers, Inc., 1996.
X. Wu, H. Chen, S. Wei, “Design of a Low Latency High Speed Pipelining Multiplier”, Proc. 2001 Int’l Conf. on ASIC, pp. 551-554, 2001.
J. Wakerly, Digital Design – Principles and Practices, 2nd ed., Englewood Cliffs, NJ: Prentice Hall, 1994.
J. Kuo and J. Lou, Low-Voltage CMOS VLSI Circuits, New York, NY: John Wiley & Sons, Inc., 1999.
K. Pekmestzi, “Multiplexer-Based Array Multipliers”, IEEE Trans. on Computers, vol. 48, pp. 15-23, 1999.
A. Fayed, M. Bayoumi, “A Novel Architecture for Low-Power Design of Parallel Multipliers”, Proc. 2001 IEEE Computer Society Workshop on VLSI, pp. 149-154, 2001.
Y. Wang, Y. Jiang, E. Sha, “On Area-Efficient Low Power Array Multipliers”, Proc. 2001 IEEE Int’l Conf. on Electronics, Circuits and Systems, vol. 3, pp. 1429-1432, 2001.