
Simulation of Booth Multiplier with Verilog-XL

November 30, 2011

Robert D’Angelo & Scott Smith

Tufts University

Electrical and Computer Engineering

EE-103 Lab 3: Part II

Professor:

Dr. Valencia Joyner Koomson


Abstract

In this lab, an 8x8 modified Booth multiplier is designed and tested at the gate level using the AMS simulator in the Cadence design system. Using the logic gate primitives constructed in Part I of this lab exercise, the following blocks were simulated to complete the Booth multiplier architecture: half adder, full adder, 4-bit carry lookahead adder (CLA), 12-bit CLA, CLA logic, Booth encoder, and Booth decoder. Each of these blocks was simulated successfully. The complete multiplier was also simulated, although four of its output bits are incorrect (see Section 2.5). The measured worst-case delay is 35 ns per multiply operation.

Contents

Abstract

1 Booth Multiplier Overview
  1.1 Booth’s Algorithm
  1.2 Architecture

2 Design and Simulation
  2.1 Booth Encoder
  2.2 Booth Decoder
  2.3 Twelve-bit Carry Lookahead Adder
  2.4 Sign Extension Trick
  2.5 Signed 8x8 Modified Booth Multiplier

3 Conclusions

List of Figures

1 Modified Booth Multiplier Architecture
2 Booth Encoder Schematic
3 Booth Encoder Symbol
4 Booth Encoder Test Bench Schematic
5 Booth Encoder AMS Simulation
6 4-bit Booth Encoder Schematic
7 4-bit Booth Encoder Symbol
8 4-bit Booth Encoder Test Bench Schematic
9 4-bit Booth Encoder AMS Simulation
10 Booth Decoder Schematic
11 Booth Decoder Symbol
12 Half Adder Schematic
13 Half Adder Symbol
14 Half Adder Verilog X Simulation
15 8-bit Booth Decoder Symbol
16 8-bit Booth Decoder Schematic
17 8-bit Booth Decoder Test Bench Schematic
18 8-bit Booth Decoder AMS Simulation
19 Full Adder Schematic
20 Full Adder Symbol
21 Full Adder Verilog X Simulation
22 4-bit Adder Schematic
23 4-bit Adder Symbol
24 Four input NAND gate
25 Five input NAND gate
26 Carry Lookahead Logic Schematic
27 4-bit Adder Simulation
28 12-bit Carry Lookahead Adder Symbol
29 12-bit Carry Lookahead Adder Testbench Schematic
30 12-bit Carry Lookahead Adder Schematic
31 12-bit Carry Lookahead Adder AMS Simulation
32 Sign Extension Trick with Half Adders
33 Sign Extension Trick with Inverter
34 8x8 Booth Multiplier Symbol
35 8x8 Booth Multiplier Testbench Schematic
36 8x8 Booth Multiplier Schematic
37 8x8 Booth Multiplier AMS Simulation

List of Tables

1 Booth Encoding
2 Booth Encoder Truth Table
3 Booth Decoder Truth Table (NEG not included)
4 Sign Extension Truth Table
5 Delay Summary


1 Booth Multiplier Overview

1.1 Booth’s Algorithm

Traditional hardware multiplication is performed the same way multiplication is done by hand: partial products are computed, shifted appropriately, and summed. This approach can be slow when there are many partial products (i.e. many bits) because the output must wait until every sum has been performed. Booth’s algorithm, in the modified form used here, cuts the number of required partial products in half. This increases the speed by reducing the total number of partial product sums that must take place.

The algorithm exploits the fact that multiplication by a sequence of 1’s can be computed simply with inversions and shifts, which are simpler operations than adding. The algorithm first encodes the start, middle, end, or absence of a sequence of 1’s in the multiplier term from groupings of three bits, each of which overlaps the previous grouping by one bit. These encodings are then used to compute the partial products from the multiplicand by multiplying it by 0, by 1 (i.e. no change), by 2 (shift left one bit), or by the negative of one of these (2’s complement). The encodings are shown in Table 1. Each partial product after the first is shifted left by two bits relative to the one before it. The product is equal to the sum of these terms.

Table 1: Booth Encoding

Grouping    Partial Product
000          0*Multiplicand
001          1*Multiplicand
010          1*Multiplicand
011          2*Multiplicand
100         -2*Multiplicand
101         -1*Multiplicand
110         -1*Multiplicand
111          0*Multiplicand
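The recoding in Table 1 can be checked with a small behavioral model. The following is a minimal sketch, not the gate-level design described in this report: the module name `booth_radix4_model_tb` and the function `booth_digit` are hypothetical. It recodes the multiplier into four digits, weights each partial product by a further factor of four, and compares the sum with the signed product for every pair of 8-bit operands.

```verilog
// Behavioral check of the Table 1 recoding (illustrative sketch only).
module booth_radix4_model_tb;

  // Booth digit (-2..2) for one grouping {x(2i+1), x(2i), x(2i-1)}, per Table 1.
  function integer booth_digit;
    input [2:0] grp;
    begin
      case (grp)
        3'b000, 3'b111: booth_digit = 0;
        3'b001, 3'b010: booth_digit = 1;
        3'b011:         booth_digit = 2;
        3'b100:         booth_digit = -2;
        default:        booth_digit = -1;  // 3'b101 and 3'b110
      endcase
    end
  endfunction

  integer   x, y, i, product, errors;
  reg [8:0] xe;  // multiplier bits with the implicit 0 appended below bit 0

  initial begin
    errors = 0;
    for (x = -128; x <= 127; x = x + 1)
      for (y = -128; y <= 127; y = y + 1) begin
        xe      = {x[7:0], 1'b0};
        product = 0;
        // Four overlapping groupings; partial product i is weighted by 4^i.
        for (i = 0; i < 4; i = i + 1)
          product = product + booth_digit(xe[2*i +: 3]) * y * (1 << (2*i));
        if (product !== x * y)
          errors = errors + 1;
      end
    $display("booth_radix4_model_tb: %0d mismatches", errors);
    $finish;
  end

endmodule
```

The model uses integer arithmetic throughout, so it says nothing about bit widths, sign extension, or delay; it only illustrates why four partial products are sufficient for an 8-bit multiplier.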

1.2 Architecture

Figure 1 below shows a block diagram of the proposed Booth multiplier implementation. This circuit takes in two 8-bit binary numbers and outputs the 16-bit product. The multiplier, X[7:0], is divided into four overlapping groupings, listed here from the least significant bit upward: (0, X0, X1); (X1, X2, X3); (X3, X4, X5); (X5, X6, X7). Each of these groupings is passed into a Booth encoder, which outputs bits corresponding to the operations described in Table 1 (x0, x1, x2, x-1). Each group of these selection bits is sent to a Booth decoder block, which outputs the appropriate partial product term based on the selected operation. These partial products are then sign extended so that the sign bits are taken into account during the summing. Finally, the canonical shift-and-add multiplication is implemented using 12-bit carry lookahead adders (CLA). The two low-order bits at each stage are passed directly to the output to account for the shifting. A standard array multiplier would typically require 8 partial products, and correspondingly more adder stages. This implementation reduces the number of partial products to only four, significantly improving speed. Furthermore, the CLA provides another speed boost to the system.


Figure 1: Modified Booth Multiplier Architecture

2 Design and Simulation

2.1 Booth Encoder

Single Booth Encoder

Table 2 shows the truth table for a Booth encoder. The encoder takes inputs x_{2i+1}, x_{2i}, and x_{2i-1} from the multiplier bus and produces a 1 or a 0 for each operation: single, double, negate. Figure 2 shows the schematic that implements Table 2. Figure 3 shows a symbol view of the encoder. This block was simulated using the Analog Mixed Signal (AMS) simulator. Figure 4 shows the test bench schematic, and Figure 5 shows the simulation results. Based on the simulation, this block has a propagation delay of ???. The Verilog test bench code, which sweeps all test cases, can be seen in Program Listing 1.


Table 2: Booth Encoder Truth Table

x_{2i+1}  x_{2i}  x_{2i-1}  PPi      Single  Double  Negate
0         0       0         0        0       0       0
0         0       1         Y        1       0       0
0         1       0         Y        1       0       0
0         1       1         2Y       0       1       0
1         0       0         -2Y      0       1       1
1         0       1         -Y       1       0       1
1         1       0         -Y       1       0       1
1         1       1         -0 (=0)  0       0       1
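One set of Boolean equations that reproduces Table 2 is sketched below. These equations are an assumption derived from the truth table alone; they are not read from the schematic in Figure 2, which may be factored differently. The module and signal names are hypothetical. The test loop compares the equations with every row of Table 2.

```verilog
// Behavioral sketch of one Booth encoder, checked against Table 2.
module booth_encoder_model_tb;

  reg  [2:0] grp;  // {x(2i+1), x(2i), x(2i-1)}

  // Assumed equations (derived from Table 2, not from Figure 2).
  wire sng = grp[1] ^ grp[0];
  wire dbl = (grp[2] & ~grp[1] & ~grp[0]) | (~grp[2] & grp[1] & grp[0]);
  wire neg = grp[2];

  reg  [2:0] expected;  // {Single, Double, Negate} columns of Table 2
  integer    k, errors;

  initial begin
    errors = 0;
    for (k = 0; k < 8; k = k + 1) begin
      grp = k[2:0];
      case (grp)
        3'b000:  expected = 3'b000;
        3'b001:  expected = 3'b100;
        3'b010:  expected = 3'b100;
        3'b011:  expected = 3'b010;
        3'b100:  expected = 3'b011;
        3'b101:  expected = 3'b101;
        3'b110:  expected = 3'b101;
        default: expected = 3'b001;  // 3'b111
      endcase
      #1;  // let the continuous assignments settle
      if ({sng, dbl, neg} !== expected) begin
        errors = errors + 1;
        $display("mismatch for grouping %b", grp);
      end
    end
    $display("booth_encoder_model_tb: %0d mismatches", errors);
    $finish;
  end

endmodule
```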

Figure 2: Booth Encoder Schematic



Figure 3: Booth Encoder Symbol

Figure 4: Booth Encoder Test Bench Schematic



Figure 5: Booth Encoder AMS Simulation

// Verilog HDL for "lab3_sims", "verilog_3in_stim" "functional"

`timescale 1ns/1ps

module verilog_3in_stim ( inout wire VDD,
                          inout wire VSS,
                          output reg [2:0] OUT );

  assign VDD = 1'b1;
  assign VSS = 1'b0;
  reg clock;

  initial begin
    clock = 1'b0;
    OUT = 3'b000;
  end

  always begin
    #25.0 clock = ~clock;
  end

  always @ (posedge clock) begin
    OUT = #1.0 OUT + 3'b001;
  end

  always @ (negedge clock) begin
    if ( OUT == 3'b111 )
      #50 $finish;
  end

endmodule

Listing 1: Booth Encoder Verilog Stimulus

Complete Booth Encoder

Four single bit encoders are combined to produce a complete Booth encoder as shown in Figure 6. This block was generated to simplify the final schematic. Figure 7 shows a symbol view of this block. This block was simulated using AMS. Figure 8 shows the test bench schematic. The Verilog test bench block with all test cases can be viewed in Program Listing 2. The simulation results are displayed in Figure 9.



Figure 6: 4-bit Booth Encoder Schematic



Figure 7: 4-bit Booth Encoder Symbol

Figure 8: 4-bit Booth Encoder Test Bench Schematic


// Verilog HDL for "lab3_sims", "verilog_8in_stim" "functional"

`timescale 1ns/1ps

module verilog_8in_stim ( inout wire VDD,
                          inout wire VSS,
                          output reg [7:0] OUT );

  assign VDD = 1'b1;
  assign VSS = 1'b0;
  reg clock;

  initial begin
    clock = 1'b0;
    OUT = 8'h00;
  end

  always begin
    #25.0 clock = ~clock;
  end

  always @ (posedge clock) begin
    OUT = #1.0 OUT + 8'h01;
  end

  always @ (negedge clock) begin
    if ( OUT == 8'hFF )
      $finish;
  end

endmodule

Listing 2: Booth Encoder Verilog Stimulus


Figure 9: 4-bit Booth Encoder AMS Simulation

2.2 Booth Decoder

Single Bit Booth Decoder

Table 3 shows a truth table of the decoder block that generates a single bit of a partial product from the encoder output bits and two of the multiplicand bits. This truth table does not show the partial product when it is negated. Negation is accomplished by XORing PPij from the table with the NEG output of the Booth encoder, so that if NEG is asserted the output is inverted, and if NEG is low the output is unchanged. From Table 3 it can be seen that if SNG is asserted, Yj is passed to the output, indicating multiplication by unity. If DBL is asserted, Yj-1 is passed to the output, indicating a shift by one bit to the left. If SNG and DBL are both zero, the output is zero. SNG and DBL cannot both be asserted, so these cases are not shown. Figure 10 shows a gate level implementation of the single bit Booth decoder, and Figure 11 shows a symbol view. This block has a worst case delay of about 7 ns.



Table 3: Booth Decoder Truth Table (NEG not included)

SNG  Yj  DBL  Yj-1  PPij (pre-xor)
0    0   0    0     0
0    0   0    1     0
0    0   1    0     0
0    0   1    1     1
0    1   0    0     0
0    1   0    1     0
0    1   1    0     0
0    1   1    1     1
1    0   0    0     0
1    0   0    1     0
1    1   0    0     1
1    1   0    1     1

Figure 10: Booth Decoder Schematic



Figure 11: Booth Decoder Symbol

Half Adder

The single bit decoder blocks invert the partial product bits when NEG is asserted; however, one must also be added to the least significant bit (LSB) so that the 2’s complement convention is followed. A cascade of half adders at the output (shown in Figure 16) implements this sum. Figure 12 shows a gate level view of the half adder. This circuit produces a sum bit from the XOR of inputs A and B and a carry bit from the function A AND B. The NAND and inverter primitives are used to create the AND function. Figure 13 shows the symbol view, Figure 14 shows a functional simulation, and Listing 3 shows the Verilog stimulus code.

Figure 12: Half Adder Schematic



Figure 13: Half Adder Symbol

Figure 14: Half Adder Verilog X Simulation

// Verilog stimulus file.
// Please do not create a module in this file.

// Sweep all four input combinations of the half adder, 10 time units apart.
initial
begin
  A = 1'b0; B = 1'b0;
  #10 A = 1'b0; B = 1'b1;
  #10 A = 1'b1; B = 1'b0;
  #10 A = 1'b1; B = 1'b1;
  #10 $finish;
end

Listing 3: Half Adder Verilog Stimulus




Eight-bit Decoder

In order to produce a complete partial product, an 8-bit Booth decoder block was implemented. This decoder consists of nine single bit Booth decoders, each of which sends its partial product bit to a half adder. Each half adder sums the current partial product bit with the carry from the previous half adder. The first half adder sums the NEG bit with the first partial product bit, PP<0>. This step completes the 2’s complement representation of the partial product by adding ’1’ to the LSB if NEG is asserted. In the first decoder, the Yj−1 bit is tied to VSS (’0’). This connection ensures that when DBL is asserted, a zero is shifted in.

Note that this schematic actually produces a 9-bit output. The 9th bit is produced by connecting Yj−1 of the 9th decoder to Y<7>, the MSB of the multiplicand. If a shift is occurring, the 9th bit, PP<8>, will be correctly replaced with the 8th bit, Y<7>. If no shift occurs, PP<8> will become a sign extension bit that is simply equal to the MSB, i.e. the sign bit, Y<7>.

Figure 15 shows a symbol view of the 8-bit decoder, and Figure 16 shows a gate level schematic. This circuit was simulated using AMS. Figure 17 shows the test bench schematic used in the simulation, and Figure 18 shows the simulation results. The Verilog code of the test bench block is shown in Listing 4. All test cases simulated successfully. This block has a worst case delay of about 33 ns.
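The decoder behavior described above (select Yj or Yj-1, XOR with NEG, then add NEG at the LSB) can be written as a short behavioral sketch. This is an illustration under stated assumptions rather than the schematic in Figure 16: the module name and the vector form of the select are hypothetical, and the half-adder chain is modeled as a single 9-bit addition.

```verilog
// Behavioral sketch of the 9-bit partial product generation.
module booth_decoder9_model_tb;

  reg  [7:0] y;
  reg        sng, dbl, neg;
  reg  [8:0] sel, pp;
  integer    yi, ys, k, digit, expected, errors;

  initial begin
    errors = 0;
    for (yi = 0; yi < 256; yi = yi + 1) begin
      y  = yi[7:0];
      ys = (yi < 128) ? yi : yi - 256;  // signed value of the multiplicand
      for (k = 0; k < 8; k = k + 1) begin
        {neg, dbl, sng} = k[2:0];
        if (!(sng && dbl)) begin        // SNG and DBL are never both asserted
          // Table 3 per bit: SNG passes Yj, DBL passes Yj-1.
          // Bit 8 sees Y<7> in both cases; bit 0 sees VSS when DBL is asserted.
          sel = ({9{sng}} & {y[7], y}) | ({9{dbl}} & {y, 1'b0});
          // XOR with NEG inverts every bit; adding NEG completes the 2's complement.
          pp  = (sel ^ {9{neg}}) + neg;
          digit = sng ? 1 : (dbl ? 2 : 0);
          if (neg) digit = -digit;
          expected = digit * ys;
          // Only the 9-bit pattern is compared: -2 * (-128) = +256 does not
          // fit in a 9-bit 2's complement value.
          if (pp !== expected[8:0])
            errors = errors + 1;
        end
      end
    end
    $display("booth_decoder9_model_tb: %0d mismatches", errors);
    $finish;
  end

endmodule
```

The comparison is made on the 9-bit pattern only. For a multiplicand of -128 and a -2 operation, the true value +256 is outside the 9-bit 2’s complement range, so that corner may deserve a separate look in the gate-level design.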

Figure 15: 8-bit Booth Decoder Symbol



Figure 16: 8-bit Booth Decoder Schematic



Figure 17: 8-bit Booth Decoder Test Bench Schematic



Figure 18: 8-bit Booth Decoder AMS Simulation



// Verilog HDL for "lab3_sims", "booth_decode_9bit_v1_tb1" "functional"
// Testbench for booth_decode_9bit_v1_tb1

module booth_decode_9bit_v1_tb1 ( output wire Xm1,
                                  output wire X0,
                                  output wire Xp1,
                                  output reg [7:0] Y,
                                  input wire [8:0] PP,
                                  inout wire VDD,
                                  inout wire VSS );

  parameter NUM0 = 8'b00000000;
  parameter NUM1 = 8'b00000001;
  parameter NUM2 = 8'b11111110;
  parameter NUM3 = 8'b11111111;
  parameter NUM4 = 8'b10101010;
  parameter NUM5 = 8'b01010101;
  parameter NUM6 = 8'b10000000;
  parameter NUM7 = 8'b11001100;

  reg clk;
  reg [2:0] X;
  reg [3:0] sel;

  assign VDD = 1'b1;
  assign VSS = 1'b0;

  assign Xm1 = X[0];
  assign X0 = X[1];
  assign Xp1 = X[2];

  // Initialize variables.
  initial begin
    clk = 1'b0;
    X = 3'b000;
    sel = 4'h0;
  end

  // Clock for advancing through the test cases.
  always begin
    #50.0 clk = ~clk;
  end

  // Counter for X to generate all possibilities.
  always @ (posedge clk) begin
    X = X + 3'b001;
  end

  // 'sel' variable to iterate through the case statement below.
  always @ (posedge clk) begin
    if ( X == 3'b000 ) begin
      sel = sel + 4'h1;
    end
  end

  // Different cases for Y.
  always @ (sel or Y) begin
    case ( sel )
      4'h1: begin
        Y = NUM0;
      end
      4'h2: begin
        Y = NUM1;
      end
      // ...
      default: begin
        Y = 8'h00;
      end
    endcase
  end

  // End simulation when we finish looking at all our cases.
  always @ (posedge clk) begin
    if ( sel == 4'h8 ) begin
      if ( X == 3'b111 ) begin
        #100;
        $finish;
      end
    end
  end

endmodule

Listing 4: 8-bit Booth Decoder Verilog Test Bench



2.3 Twelve-bit Carry Lookahead Adder

Full Adder

The first block required for a multi-bit adder is the full adder. This circuit is identical to the half adder except that it has an additional input, Cin, so that a carry from a previous addition may be passed along. Furthermore, instead of a carry out, Cout, propagate (P) and generate (G) signals are produced. The logic for this schematic is as follows:

S = A ⊕ B ⊕ Cin (2.1)

P = A ⊕ B (2.2)

G = A × B (2.3)

These two signals can be used to generate the carry out bit simultaneously for faster addition (see Four-bit Carry Lookahead Adder). Figure 19 shows a gate level implementation of the full adder schematic. Note that ~G is also generated. This bit can be used instead of G to simplify some of the carry lookahead logic. Figure 20 shows a symbol view of the full adder. This block was verified using Verilog X. The simulation output can be seen in Figure 21, and the stimulus code can be seen in Listing 5. All possible inputs were swept, and all outputs were correct. There is an average propagation delay of 4 ns from input to sum.

Figure 19: Full Adder Schematic



Figure 20: Full Adder Symbol

Figure 21: Full Adder Verilog X Simulation


// Sweep all eight input combinations of the full adder, 10 time units apart.
initial
begin
  A = 1'b0; B = 1'b0; Cin = 1'b0;
  #10 A = 1'b0; B = 1'b0; Cin = 1'b1;
  #10 A = 1'b0; B = 1'b1; Cin = 1'b0;
  #10 A = 1'b0; B = 1'b1; Cin = 1'b1;
  #10 A = 1'b1; B = 1'b0; Cin = 1'b0;
  #10 A = 1'b1; B = 1'b0; Cin = 1'b1;
  #10 A = 1'b1; B = 1'b1; Cin = 1'b0;
  #10 A = 1'b1; B = 1'b1; Cin = 1'b1;
  #10 $finish;
end

Listing 5: Full Adder Verilog Stimulus




Four-bit Carry Lookahead Adder

In order to improve the speed of the adder, carry lookahead logic was employed to compute the carry bits while simultaneously computing the sum bits. This method becomes less effective as the number of bits surpasses four; therefore, a 4-bit CLA was implemented. This block can be cascaded to produce wider adders.

Figure 22: 4-bit Adder Schematic

Figure 23: 4-bit Adder Symbol

Figure 22 shows the schematic diagram of the 4-bit CLA, and Figure 23 shows a symbol view. This circuit consists of four full adders, each of which passes propagate and generate signals to the CLA logic, which computes the carry bits and feeds them into the carry-in input of the appropriate full adders. The carry bits are generated by the CLA based on the following logic:

S_i = A_i ⊕ B_i ⊕ C_i = P_i ⊕ C_i (2.4)

C_{i+1} = G_i + C_i P_i (2.5)

Expanding Eq 2.5 for each carry bit of the four bit adder yields Eq 2.6-2.9, which contain only the first carry-in bit and the propagate and generate bits. Thus, the carry bits can be computed in parallel with the sum bits, which increases the speed of the adder compared to a ripple style adder.

C1 = G0 + C0P0 (2.6)

C2 = G1 + G0P1 + C0P0P1 (2.7)

C3 = G2 + G1P2 + G0P1P2 + C0P0P1P2 (2.8)

C4 = G3 + G2P3 + G1P2P3 + G0P1P2P3 + C0P0P1P2P3 (2.9)

Observing these equations reveals that two more primitive gates are required to implement this logic: four and five input NAND gates. These gates were constructed using the logic primitives designed in Part I of this lab exercise. The schematics for the four and five input NAND gates are shown in Figure 24 and Figure 25, respectively. The CLA logic circuit that implements Eq 2.6-2.9 is shown in Figure 26. The code in Program Listing 6 was used to simulate the adder. Results are shown in Figure 27. All test cases were successful.
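Eq 2.6-2.9 can be exercised directly in a behavioral model. The sketch below is an illustration with hypothetical module and signal names, using the P and G definitions from Eq 2.2 and 2.3. It is written with AND/OR operators rather than the NAND gates of Figure 26, and it compares the lookahead result with ordinary addition for all 512 input combinations.

```verilog
// Behavioral sketch of the 4-bit carry lookahead equations (2.6)-(2.9).
module cla4_model_tb;

  reg  [3:0] a, b;
  reg        c0;

  wire [3:0] p = a ^ b;  // propagate, Eq 2.2
  wire [3:0] g = a & b;  // generate,  Eq 2.3

  // Carries expressed only in terms of c0, p and g.
  wire c1 = g[0] | (c0 & p[0]);
  wire c2 = g[1] | (g[0] & p[1]) | (c0 & p[0] & p[1]);
  wire c3 = g[2] | (g[1] & p[2]) | (g[0] & p[1] & p[2])
                 | (c0 & p[0] & p[1] & p[2]);
  wire c4 = g[3] | (g[2] & p[3]) | (g[1] & p[2] & p[3])
                 | (g[0] & p[1] & p[2] & p[3])
                 | (c0 & p[0] & p[1] & p[2] & p[3]);

  // Sum bits, Eq 2.4: S_i = P_i xor C_i.
  wire [3:0] s = p ^ {c3, c2, c1, c0};

  reg  [4:0] expected;
  integer    k, errors;

  initial begin
    errors = 0;
    for (k = 0; k < 512; k = k + 1) begin
      {a, b, c0} = k[8:0];
      #1;  // let the continuous assignments settle
      expected = a + b + c0;
      if ({c4, s} !== expected)
        errors = errors + 1;
    end
    $display("cla4_model_tb: %0d mismatches", errors);
    $finish;
  end

endmodule
```

The widest product term in c4 has five inputs and the widest in c3 has four, which is consistent with the need for four and five input NAND gates noted above.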

Figure 24: Four input NAND gate

Figure 25: Five input NAND gate



Figure 26: Carry Lookahead Logic Schematic



Figure 27: 4-bit Adder Simulation


initial
begin
  // A = 8, B = 8, no carry in.
  A0 = 1'b0; A1 = 1'b0; A2 = 1'b0; A3 = 1'b1;
  B0 = 1'b0; B1 = 1'b0; B2 = 1'b0; B3 = 1'b1;
  Cin = 1'b0;

  // A = 15, B = 15, carry in.
  #10 A0 = 1'b1; A1 = 1'b1; A2 = 1'b1; A3 = 1'b1;
  B0 = 1'b1; B1 = 1'b1; B2 = 1'b1; B3 = 1'b1;
  Cin = 1'b1;

  // A = 0, B = 0, carry in.
  #10 A0 = 1'b0; A1 = 1'b0; A2 = 1'b0; A3 = 1'b0;
  B0 = 1'b0; B1 = 1'b0; B2 = 1'b0; B3 = 1'b0;
  Cin = 1'b1;

  #20 $finish;
end

Listing 6: 4-bit Adder Verilog Stimulus




12-bit Carry Lookahead Adder

In order to sum the 9-bit partial products from the decoders, a 12-bit CLA was implemented from the 4-bit CLA blocks. The block level schematic is shown in Figure 30. A symbol view of this circuit is shown in Figure 28. This circuit cascades 4-bit CLAs in the same way that the 4-bit CLAs cascade full adders. Within the original CLA logic schematic in Figure 26, the carry out bits of each 4-bit adder are also computed. This implementation provides another layer of carry lookahead logic for the 12-bit adder to further improve speed. This circuit was tested using the testbench schematic shown in Figure 29. The simulation results can be seen in Figure 31. The Verilog code of the stimulus block is shown in Program Listing 7. All test cases were successful in this simulation. The average propagation delay was 8 ns to compute one sum.

Figure 28: 12-bit Carry Lookahead Adder Symbol

Figure 29: 12-bit Carry Lookahead Adder Testbench Schematic



Figure 30: 12-bit Carry Lookahead Adder Schematic



Figure 31: 12-bit Carry Lookahead Adder AMS Simulation

// Verilog HDL for "ee103", "adder_12bit_vstim" "functional"

module adder_12bit_vstim ( output reg [11:0] A,
                           output reg [11:0] B,
                           output reg Cin,
                           inout wire VDD,
                           inout wire VSS );

  reg clk;
  reg [3:0] sel;

  initial begin
    sel = 4'b0000;
    clk = 1'b0;
    A = 12'h000;
    B = 12'h000;
    Cin = 1'b0;
  end

  always begin
    #25.0 clk = ~clk;
  end

  always @ (posedge clk) begin
    sel = sel + 4'b0001;
  end

  // Each case lists the expected sum; a leading (1) marks a carry out of bit 11.
  always @ (sel) begin
    case ( sel )
      4'b0001: begin A = 12'h7FF; B = 12'h001; Cin = 1'b0; end // S = 800
      4'b0010: begin A = 12'h08F; B = 12'h07F; Cin = 1'b0; end // S = 10E
      4'b0011: begin A = 12'hF00; B = 12'h100; Cin = 1'b0; end // S = (1) 000
      4'b0100: begin A = 12'h800; B = 12'hFFF; Cin = 1'b0; end // S = (1) 7FF
      4'b0101: begin A = 12'h07F; B = 12'hF71; Cin = 1'b0; end // S = FF0
      4'b0110: begin A = 12'h7FF; B = 12'h001; Cin = 1'b1; end // S = 801
      4'b0111: begin A = 12'h08F; B = 12'h07F; Cin = 1'b1; end // S = 10F
      4'b1000: begin A = 12'hF00; B = 12'h100; Cin = 1'b1; end // S = (1) 001
      4'b1001: begin A = 12'h800; B = 12'hFFF; Cin = 1'b1; end // S = (1) 800
      4'b1010: begin A = 12'h07F; B = 12'hF71; Cin = 1'b1; end // S = FF1
      default: begin
        #50 $finish;
      end
    endcase
  end

endmodule

Listing 7: 12-bit Carry Lookahead Adder Verilog Stimulus



2.4 Sign Extension Trick

When summing the partial products to compute the final product, each partial product must be sign extended by a different amount (i.e. the first partial product must be extended by 6 bits, the second by 4, etc.) so that the sign bits of each partial product are reproduced correctly. The sign extension would require extra logic; however, the following technique preserves the sign bits without having to sign extend each partial product:

1. Invert the MSB of each partial product generated by the decoder.

2. Add ’1’ to the MSB of the first partial product.

3. Add ’1’ in front of each partial product.

This technique was implemented by first placing an inverter between the MSB of each 9-bit partial product and the corresponding input bit of the 12-bit adder. Then, the 10th input bit to each of the 12-bit adders is connected to VDD. This effectively adds ’1’ in front of each partial product. For the first partial product, however, this cannot be done. Adding ’1’ to the MSB could produce a carry bit, which must be taken into account. Furthermore, adding ’1’ in front of the MSB of the first partial product would produce a carry if the previous addition of ’1’ to the MSB produced a carry. This can be accomplished by cascading two half adders where each has one input tied to VDD, the first adder has a second input tied to the MSB of the first partial product, and the second adder has a second input connected to the carry out of the first adder. The two sums and final carry are then tied to input bits 6, 7 and 8 of the 12-bit adder, respectively. However, this method will be quite slow. Table 4 shows the truth table for the addition. PP08 is the MSB of the first partial product, and ~PP08 is the inverted PP08 from step 1 above. B6-B8 refer to the inputs to the 12-bit adder. Since ~PP08 can only be 1 or 0, there are only two possibilities for the addition. If ~PP08 = 1, a carry will be produced by the first addition, and another carry will be produced by the second addition, resulting in B8B7B6 = 011. If ~PP08 = 0, no carry will be produced and B8B7B6 = 100. From the truth table it can be seen that B6 = B7 = ~PP08, and B8 = PP08. Thus, this circuit can be simplified by connecting these nodes directly. The two approaches are shown in Figure 32 and Figure 33.

Table 4: Sign Extension Truth Table

~PP08  PP08  B8  B7  B6
0      1     1   0   0
1      0     0   1   1
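The arithmetic behind steps 1 to 3 for the first partial product can be evaluated with a short sketch. This is an illustration under one specific reading of the text, namely that the first half adder receives the inverted bit ~PP08 from step 1. The module and signal names are hypothetical, and the sketch does not model the schematics in Figure 32 or Figure 33.

```verilog
// Sketch: apply steps 1-3 to the MSB of the first partial product.
module sign_ext_first_pp_sketch;

  reg       pp08;            // MSB of the first partial product
  reg       s1, c1, s2, c2;  // half-adder sums and carries
  reg [2:0] steps;           // {B8, B7, B6} from direct arithmetic
  integer   k, errors;

  initial begin
    errors = 0;
    for (k = 0; k < 2; k = k + 1) begin
      pp08 = k[0];
      // Step 1: invert the MSB. Step 2: add 1 at the MSB position.
      // Step 3: add 1 one position above the MSB (weight 2 here).
      steps = {2'b00, ~pp08} + 3'b001 + 3'b010;
      // The same sum as a half-adder cascade with one input of each adder at VDD.
      s1 = (~pp08) ^ 1'b1;  c1 = (~pp08) & 1'b1;  // first adder:  ~PP08 + 1
      s2 = c1 ^ 1'b1;       c2 = c1 & 1'b1;       // second adder: carry + 1
      if ({c2, s2, s1} !== steps)
        errors = errors + 1;
      $display("PP08=%b ~PP08=%b -> B8 B7 B6 = %b", pp08, ~pp08, steps);
    end
    $display("sign_ext_first_pp_sketch: %0d mismatches", errors);
    $finish;
  end

endmodule
```

Under this reading the sketch gives B8 B7 B6 = 100 when ~PP08 = 1 and 011 when ~PP08 = 0, that is, B8 = ~PP08 and B7 = B6 = PP08. This is the complement of the assignment printed in Table 4, which instead matches the case where the uninverted PP08 drives the first half adder. Since Section 2.5 attributes the incorrect output bits to the sign extension, this polarity may be a reasonable first thing to check.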


Figure 32: Sign Extension Trick with Half Adders

Figure 33: Sign Extension Trick with Inverter


2.5 Signed 8x8 Modified Booth Multiplier

From the blocks discussed above, an 8x8 modified Booth multiplier was constructed following the architecture proposed in Section 1.2. The final schematic of the modified Booth multiplier is shown in Figure 36, and a symbol view is shown in Figure 34. This circuit takes in two 8-bit 2’s complement binary numbers and produces a 16-bit output equal to the product of the two inputs. This circuit was simulated using AMS. The test bench schematic is shown in Figure 35. The simulation results are shown in Figure 37, and the stimulus code is shown in Program Listing 8. Unfortunately, this circuit does not operate as desired. Four of the output bits are incorrect. We believe this is due to a flaw in our implementation of the sign extension trick. The test case shows the output for every combination of 107, 105, -107 and -105. The delay is measured at 35 ns per multiply, worst case.

Figure 34: 8x8 Booth Multiplier Symbol

Figure 35: 8x8 Booth Multiplier Testbench Schematic



Figure 36: 8x8 Booth Multiplier Schematic



Figure 37: 8x8 Booth Multiplier AMS Simulation

// Verilog HDL for "lab3_sims", "booth_multiplier_top_tb1" "functional"

// Testbench for booth_multiplier_top_v1
// Scott Smith
// November 28, 2011
// STATUS - SEEMS TO WORK
module booth_multiplier_top_tb1 ( output reg [7:0] X,
                                  output reg [7:0] Y,
                                  input wire [15:0] P,
                                  inout wire VDD,
                                  inout wire VSS );

  // This test uses two numbers in various sign configurations
  // and orders (8 cases total).
  // parameter NUM1p = 8'b01101001; // +105
  // parameter NUM1n = 8'b10010111; // -105 (151 unsigned)
  // parameter NUM2p = 8'b01101011; // +107
  // parameter NUM2n = 8'b10010101; // -107 (149 unsigned)

  parameter NUM1p = 8'b00000000; // 0
  parameter NUM1n = 8'b00000000; // -0 (carry lost)
  parameter NUM2p = 8'b00000001; // 1
  parameter NUM2n = 8'b11111111; // -1

  reg clk;
  reg [3:0] sel;

  assign VDD = 1'b1;
  assign VSS = 1'b0;

  // Initialize variables.
  initial begin
    clk = 1'b0;
    sel = 4'h0;
    X = 8'h00;
    Y = 8'h00;
  end

  // Clock for advancing through the test cases.
  always begin
    #50.0 clk = ~clk;
  end

  // 'sel' variable to iterate through the case statement below.
  always @ (posedge clk) begin
    sel = sel + 4'h1;
  end

  // Different test cases using the parameters above.
  always @ (sel or X or Y) begin
    case ( sel )
      4'h0: begin X = NUM1p; Y = NUM2p; end
      4'h1: begin X = NUM1n; Y = NUM2p; end
      4'h2: begin X = NUM1p; Y = NUM2n; end
      4'h3: begin X = NUM1n; Y = NUM2n; end
      4'h4: begin X = NUM2p; Y = NUM1p; end
      4'h5: begin X = NUM2n; Y = NUM1p; end
      4'h6: begin X = NUM2p; Y = NUM1n; end
      4'h7: begin X = NUM2n; Y = NUM1n; end
      //-----cases where a number is multiplied by itself-----//
      4'h8: begin X = NUM1p; Y = NUM1p; end
      4'h9: begin X = NUM1p; Y = NUM1n; end
      4'hA: begin X = NUM1n; Y = NUM1p; end
      4'hB: begin X = NUM1n; Y = NUM1n; end
      4'hC: begin X = NUM2p; Y = NUM2p; end
      4'hD: begin X = NUM2p; Y = NUM2n; end
      4'hE: begin X = NUM2n; Y = NUM2p; end
      4'hF: begin X = NUM2n; Y = NUM2n; end
      default: begin
        X = X;
        Y = Y;
      end
    endcase
  end

  // End simulation when we finish looking at all our cases.
  always @ (negedge clk) begin
    if ( sel == 4'hF ) begin
      // Add some delay since this clk causes the inputs to update,
      // but does not tell us what is going on at the output.
      #200;
      $finish;
    end
  end

  // Add code to do error-checking automatically...

endmodule

Listing 8: 8x8 Booth Multiplier Verilog Test Bench
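The closing comment in the test bench mentions automatic error checking. One way this could look is sketched below, with several assumptions: the module name, the `check` task, and the 100 ns settling time are hypothetical, and the product `P` is driven by a behavioral stand-in because the schematic multiplier is not available as Verilog source. In the AMS test bench, `P` would instead come from the schematic block.

```verilog
`timescale 1ns/1ps

// Sketch of a self-checking stimulus for the 8x8 multiplier.
module booth_selfcheck_sketch;

  reg  signed [7:0]  X, Y;
  wire signed [15:0] P;
  integer            errors;

  // Hypothetical stand-in for the schematic multiplier.
  assign P = X * Y;

  // Apply one operand pair, wait for the output to settle, compare.
  task check;
    input signed [7:0] xv, yv;
    reg   signed [15:0] expected;
    begin
      X = xv;
      Y = yv;
      #100;  // assumed settling time, longer than the reported 35 ns delay
      expected = xv * yv;
      if (P !== expected) begin
        errors = errors + 1;
        $display("FAIL: %0d * %0d gave %0d, expected %0d", xv, yv, P, expected);
      end
    end
  endtask

  initial begin
    errors = 0;
    // The sign configurations of 105 and 107 used in Section 2.5.
    check( 8'sd105,  8'sd107);
    check(-8'sd105,  8'sd107);
    check( 8'sd105, -8'sd107);
    check(-8'sd105, -8'sd107);
    $display("booth_selfcheck_sketch: %0d mismatches", errors);
    $finish;
  end

endmodule
```

With the stand-in in place the comparison is trivially satisfied; the structure is only meant to show where a reference product could be compared against the circuit output.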



3 Conclusions

Table 5 below summarizes the delay seen through each block in this design. Several techniques can be used in the final project to reduce these delay times.

Table 5: Delay Summary

Block                 Propagation Delay
Half Adder            4 ns
Full Adder            4 ns
4-bit Adder           8 ns
12-bit Adder          8 ns
Single Bit Decoder    7 ns
Booth Multiplier      35 ns

This lab demonstrates the gate level implementation of an 8-bit signed modified Booth multiplier. However, the four most significant output bits are incorrect. We believe this is due to our sign extension implementation. The first step of the final project will be to debug this technique. The design will then be optimized for maximum speed using IBM’s 180 nm process, CMRF7SF. A layout will be constructed and post-layout simulation will be performed so that this device can be manufactured.
