Aug 2007 2
Shifter Implementation
Regular layout, can be compact, use transmission gates to avoid threshold drop.
Not amenable to synthesis; high capacitive loading for large arrays.
Source: David Harris
Aug 2007 5
Array Multiplier with CPAs
Source: Jan Rabaey
Array multiplier built from carry-propagate adders (CPAs); it has multiple near-critical paths.
Aug 2007 7
How do CSAs work?
CSA: Carry Save Adder
Want to add these four numbers together (same problem as adding partial products in a multiplier)
Source: David Harris
Aug 2007 8
How do CSAs work? (cont)
Can use a full adder network to add three numbers together if we view the carry-in inputs as a bus that contains the third number.
The output produces a sum vector and a carry vector, and these have to be added to produce the final result.
Source: David Harris
Aug 2007 9
How do CSAs work? (cont)
The carry vector has to be shifted left by 1 before being added to the sum vector, because the COUT bit has twice the weight of the sum bit.
Source: David Harris
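A minimal Python sketch of the carry-save step described above, modelling each operand as an integer bit vector. The function name `carry_save_add` is made up for illustration; the slides show this as a row of full adders, not as code.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """Add three non-negative integers with one row of full adders.

    Returns (sum_vector, carry_vector). The carry vector is already
    shifted left by one, so sum_vector + carry_vector == a + b + c.
    """
    sum_vector = a ^ b ^ c                     # per-bit sum output
    carry_out = (a & b) | (a & c) | (b & c)    # per-bit carry output (majority)
    carry_vector = carry_out << 1              # COUT has twice the weight of the sum bit
    return sum_vector, carry_vector


s, c = carry_save_add(0b0101, 0b0110, 0b0011)  # 5 + 6 + 3
total = s + c                                  # the one carry-propagate addition
```

No carry ripples inside `carry_save_add`; the only carry-propagate addition is the final `s + c`.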
Aug 2007 10
CSA Multiplier
The carry is shifted left before being added.
The final addition is always N/2 bits wide if the product has N bits. Large multipliers need a fast adder structure for this addition.
Source: Jan Rabaey
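As a rough behavioural model (not the gate-level array in the figure), a CSA multiplier can be sketched as one carry-save row per partial product followed by a single carry-propagate add. The names `csa` and `csa_multiply` are hypothetical, and the model does not track the width of the final adder.

```python
def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """One carry-save row: three vectors in, sum and shifted carry out."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def csa_multiply(x: int, y: int, n: int) -> int:
    """Unsigned n x n multiply using carry-save accumulation."""
    sum_vec, carry_vec = 0, 0
    for i in range(n):
        # Row i of AND gates: Y gated by bit i of X, weighted by 2**i.
        partial_product = (y << i) if (x >> i) & 1 else 0
        sum_vec, carry_vec = csa(sum_vec, carry_vec, partial_product)
    # Only here do carries propagate; this is the adder that needs to be fast.
    return sum_vec + carry_vec
```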
Aug 2007 11
Multiplier Layout
Layout can be made to be rectangular
Source: David Harris
Aug 2007 12
2’s Complement Multiply Definition
MSb has negative weight
4 bit 2’s complement example:
-5 = 0xB = 1011 = -1*2^3 + 0*2^2 + 1*2^1 + 1*2^0 = -8 + 0 + 2 + 1 = -5
Source: David Harris
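The negative-weight rule can be checked with a few lines of Python. This is only an illustration; `twos_complement_value` is a hypothetical helper that takes the bits as a string, MSb first.

```python
def twos_complement_value(bits: str) -> int:
    """Evaluate a two's complement bit string; the MSb has negative weight."""
    n = len(bits)
    value = 0
    for position, bit in enumerate(bits):
        weight = 1 << (n - 1 - position)
        if position == 0:          # most significant bit
            weight = -weight
        value += weight * int(bit)
    return value


example = twos_complement_value("1011")  # -8 + 0 + 2 + 1
```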
Aug 2007 14
Modified Baugh-Wooley Multiplier (2’s complement)
Pre-compute sums of constant ‘1’, push some terms upwards.
Source: David Harris
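A behavioural sketch of one common formulation of the modified Baugh-Wooley scheme, assuming N-bit operands passed as raw bit patterns. Partial products that involve exactly one sign bit are inverted, and the constant 1s that this inversion creates are pre-summed into two bits at weights 2^N and 2^(2N-1). The function name and the exact placement of the constants are assumptions for this sketch, not taken from the slide.

```python
def baugh_wooley_multiply(x: int, y: int, n: int) -> int:
    """Signed n x n multiply; x and y are n-bit two's complement patterns."""
    def bit(value: int, index: int) -> int:
        return (value >> index) & 1

    total = 0
    for i in range(n):
        for j in range(n):
            cell = bit(x, i) & bit(y, j)
            # Cells with exactly one sign bit are the modified (inverted) cells.
            if (i == n - 1) != (j == n - 1):
                cell ^= 1
            total += cell << (i + j)
    # Pre-computed sum of the constant 1s left over from the inversions.
    total += (1 << n) + (1 << (2 * n - 1))
    total &= (1 << (2 * n)) - 1            # keep 2n product bits
    if total >> (2 * n - 1):               # reinterpret as a signed result
        total -= 1 << (2 * n)
    return total
```

For example, with n = 4 the patterns 1011 (-5) and 0011 (3) give -15.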
Aug 2007 15
Multiplier Layout For Two’s Complement
Shaded Cells are modified cells for Baugh-Wooley.
Source: David Harris
Aug 2007 16
Booth Encoding
Previous multipliers use radix-2: one bit of the multiplier is observed at a time.
In general, radix-2^r multipliers produce N/r partial products (assuming an NxN multiplier).
Fewer partial products lead to smaller/faster CSA arrays.
A radix-4 = radix-2^2 multiplier produces N/2 partial products.
Two-bits * two bits = Y1Y0 * X1X0 = Y*X
= Y*0, Y*1, Y*2, Y*3
Y*0, Y*1, Y*2 are easy/fast (Y*2 is a shift).
Y*3 is hard: it has to be computed as Y*3 = Y*(2+1) = 2Y + Y, which involves a carry-propagate addition.
Aug 2007 17
Radix-4 Partial Products
    Y
  * X(N-1) X(N-2) ... X3 X2 X1 X0
  = Y * X1X0
  + Y * X3X2
  + ...
  + Y * X(N-1)X(N-2)
Number of partial products is reduced.
Source: David Harris
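The radix-4 decomposition can be sketched in Python by taking the multiplier X two bits at a time. `radix4_partial_products` is a hypothetical name, N is assumed even, and each partial product is returned already weighted by 4^i.

```python
def radix4_partial_products(x: int, y: int, n: int) -> list[int]:
    """Split an unsigned n-bit X into 2-bit digits; one partial product per digit."""
    partial_products = []
    for i in range(n // 2):
        digit = (x >> (2 * i)) & 0b11        # X(2i+1) X(2i), a value in 0..3
        # digit == 3 is the "hard" multiple: it needs 2Y + Y.
        partial_products.append((y * digit) << (2 * i))
    return partial_products


pps = radix4_partial_products(0b11011001, 0b00010111, 8)
```

An 8-bit X yields 4 partial products here, compared with 8 in the radix-2 array.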
Aug 2007 18
Booth Encoding (cont.)
Observe that 2Y = 4Y - 2Y and 3Y = 4Y - Y.
4Y is simply Y in the next partial-product row (each row has 4 times the weight of the one before), so use -2Y or -Y in the current row and add Y to the next row. In both cases, a Y has to be added into the next row's partial product.
Booth encoding looks at the current 2 bits and the MSB of the previous 2 bits, and modifies the partial product accordingly.
If the MSB of the previous pair is ‘1’, add Y to the current value.
Aug 2007 19
Booth Encoding (cont)
PP = 0*Y
PP = 0*Y + Y = Y
PP = Y + 0 = Y
PP = Y + Y = 2Y
PP = -2Y + 0 = -2Y
PP = -2Y + Y = -Y
PP = -Y + 0 = -Y
PP = -Y + Y = 0
Select signals: sign bit select, 2Y select, 1Y select
Negative operations are done at bit level as complements with +1 added to PP to complete 2’s complement
Source: David Harris
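A sketch of radix-4 Booth recoding for an unsigned multiplier, assuming the eight table rows correspond to the inputs (X(2i+1), X(2i), MSB of previous pair) counted from 000 to 111. The names `BOOTH_TABLE`, `booth_digits_unsigned` and `booth_multiply_unsigned` are made up for illustration. Negative multiples are modelled here as ordinary negative integers, not as complement-plus-one bit patterns.

```python
# (X(2i+1), X(2i), MSB of previous pair) -> multiple of Y for this row
BOOTH_TABLE = {
    (0, 0, 0): 0,  (0, 0, 1): 1,  (0, 1, 0): 1,  (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_digits_unsigned(x: int, n: int) -> list[int]:
    """Recode an unsigned n-bit X (n even) into n/2 + 1 digits in {-2..2}."""
    digits = []
    previous_msb = 0
    for i in range(n // 2 + 1):            # one extra digit absorbs the last +Y
        low = (x >> (2 * i)) & 1
        high = (x >> (2 * i + 1)) & 1
        digits.append(BOOTH_TABLE[(high, low, previous_msb)])
        previous_msb = high
    return digits


def booth_multiply_unsigned(x: int, y: int, n: int) -> int:
    """Sum the Booth-recoded partial products, each weighted by 4**i."""
    return sum((d * y) << (2 * i)
               for i, d in enumerate(booth_digits_unsigned(x, n)))
```

For X = 1110 (14) the digits come out as [-2, 0, 1], that is -2 + 0*4 + 1*16 = 14, and no digit ever needs the hard 3Y multiple.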
Aug 2007 20
Booth Selection Logic
Replaces AND gates in CSA array
When –Y is chosen, have a problem in that a ‘1’ has to be added to complete two’s complement
Source: David Harris
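The complement-plus-one issue can be illustrated with a small model of the selection logic. The parameter names `single`, `double` and `negative` stand in for the 1Y select, 2Y select and sign bit select signals and are assumptions of this sketch, as is the fixed `width` of the partial-product row.

```python
def booth_select(y: int, single: int, double: int, negative: int,
                 width: int) -> tuple[int, int]:
    """Return (pp_bits, plus_one) for one Booth-selected partial product.

    pp_bits holds Y, 2Y or 0, bitwise complemented when `negative` is set.
    plus_one is the extra 1 that still has to be added elsewhere in the
    array to complete the two's complement.
    """
    mask = (1 << width) - 1
    if single:
        magnitude = y
    elif double:
        magnitude = y << 1
    else:
        magnitude = 0
    pp_bits = (magnitude ^ (mask if negative else 0)) & mask
    return pp_bits, negative


bits, plus_one = booth_select(5, single=1, double=0, negative=1, width=8)
minus_y = (bits + plus_one) % 256    # 8-bit pattern for -5
```

Returning `plus_one` separately mirrors the point made above: the selection cell only inverts bits, and the extra 1 is folded in elsewhere.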
Aug 2007 21
Unsigned R-4 Booth Array (16 x 16)
Sign extension: either all 1’s or all 0’s for -Y terms.
‘1’ or ‘0’ needed to complete 2’s complement.
Extra PP in case the last PP needs a ‘Y’ added in (the last two X bits were either 2 or 3).
Source: David Harris
Aug 2007 22
Optimized R-4 Booth Array (unsigned)
SSSS = 1111 + S' (S' is the complement of the sign bit S; the carry out of the top bit is discarded).
Additional reduction produces this.
Source: David Harris
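The sign-extension identity can be checked numerically. This sketch assumes the S on the right-hand side of the identity is the complement of the sign bit (an overbar is easily lost in a transcript) and that the addition is taken modulo 2^k for a k-bit extension field. Both function names are hypothetical.

```python
def sign_extension(s: int, k: int) -> int:
    """k copies of the sign bit s, as an integer bit field."""
    return ((1 << k) - 1) * s


def sign_extension_via_ones(s: int, k: int) -> int:
    """The same field built as all ones plus the inverted sign bit."""
    all_ones = (1 << k) - 1
    return (all_ones + (s ^ 1)) % (1 << k)   # carry out of the field is dropped
```

With S = 0 the all-ones field plus 1 wraps around to all zeros; with S = 1 nothing is added and the field stays all ones.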
Aug 2007 23
Signed R-4 Booth Array (16 x 16)
e_i = M_i xor y_15
Last PP8 is not needed for signed multiply
Source: David Harris
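Why the extra partial product disappears for signed operands can be seen in a short sketch: when X is a two's complement pattern, the negative weight of its top bit is already captured by the last Booth digit, so N/2 digits suffice. The helper names are hypothetical, N is assumed even, and the sign-extension signal e_i is not modelled.

```python
def to_signed(pattern: int, n: int) -> int:
    """Interpret an n-bit pattern as a two's complement value."""
    return pattern - (1 << n) if pattern >> (n - 1) else pattern


def booth_digits_signed(x: int, n: int) -> list[int]:
    """Recode an n-bit two's complement X into exactly n/2 Booth digits."""
    digits = []
    previous_msb = 0
    for i in range(n // 2):
        low = (x >> (2 * i)) & 1
        high = (x >> (2 * i + 1)) & 1
        digits.append(low + previous_msb - 2 * high)   # value in {-2..2}
        previous_msb = high
    return digits


def booth_multiply_signed(x: int, y: int, n: int) -> int:
    """Signed product of two n-bit two's complement patterns."""
    y_value = to_signed(y, n)
    return sum((d * y_value) << (2 * i)
               for i, d in enumerate(booth_digits_signed(x, n)))
```

For X = 1011 (-5) the digits are [-1, -1], that is -1 - 4 = -5, using only two partial products.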
Aug 2007 24
Booth Speedup
• Radix-4 arrays are 20 to 50% smaller than CSA arrays and up to 20% faster.
• Higher Radix multipliers are possible, but not worth it except for larger multipliers (at least 64 bits).
Aug 2007 25
Wallace Trees
A CSA adder just adds the PPs together one at a time.
A 3,2 counter is another name for a full adder.
Source: David Harris
Aug 2007 26
Wallace Trees (cont.)
A Wallace tree adds the partial products in parallel!
Number of levels is:
Layout is not regular, long wires can cause delay.
Source: David Harris
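A word-level sketch of the parallel reduction, assuming rows are grouped three at a time into 3,2 counters on every level until two rows remain. `wallace_reduce` is a hypothetical name. Real Wallace trees are wired per bit column, so this models only the level count and the arithmetic, not the layout.

```python
def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """Row of 3,2 counters: three vectors in, sum and shifted carry out."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def wallace_reduce(operands: list[int]) -> tuple[int, int, int]:
    """Reduce the rows to two vectors; return (sum_vec, carry_vec, levels)."""
    rows = list(operands)
    levels = 0
    while len(rows) > 2:
        groups = len(rows) // 3
        next_rows = []
        for g in range(groups):                 # all groups work in parallel
            next_rows.extend(csa(*rows[3 * g:3 * g + 3]))
        next_rows.extend(rows[3 * groups:])     # leftover rows pass through
        rows = next_rows
        levels += 1
    rows += [0] * (2 - len(rows))
    return rows[0], rows[1], levels


sum_vec, carry_vec, levels = wallace_reduce(list(range(1, 17)))  # 16 rows
```

In this model 16 rows shrink as 16, 11, 8, 6, 4, 3, 2, which is 6 levels, whereas adding the rows one at a time takes 14 CSA rows in series.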
Aug 2007 27
4-2 Compressor
Used to reduce the number of levels in a Wallace Tree
Number of levels is:
Logic more complex than Full Adder
Layout is more regular.
Source: David Harris
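One common way to build a 4-2 compressor is from two chained full adders, sketched below at the single-bit level. The function names are hypothetical, and faster dedicated circuits exist, so this is a reference model only. `compressor_levels` counts levels in a simplified word-level model where every group of four rows becomes two.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry) for three input bits."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compressor_4_2(a: int, b: int, c: int, d: int, cin: int) -> tuple[int, int, int]:
    """Return (sum, carry, cout) with a+b+c+d+cin == sum + 2*(carry + cout)."""
    partial_sum, cout = full_adder(a, b, c)    # cout does not depend on cin
    total_sum, carry = full_adder(partial_sum, d, cin)
    return total_sum, carry, cout


def compressor_levels(rows: int) -> int:
    """Levels needed to reduce `rows` partial products to two rows."""
    levels = 0
    while rows > 2:
        leftover = rows % 4
        # Three leftover rows still fit in one compressor (fourth input tied to 0).
        rows = 2 * (rows // 4) + (2 if leftover == 3 else leftover)
        levels += 1
    return levels
```

In this model 16 rows reduce as 16, 8, 4, 2, which is 3 levels.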