table of contents - university of floridaplaza.ufl.edu/krishnat/report1.pdfabstract this project is...

TABLE OF CONTENTS

1. ABSTRACT

2. INTRODUCTION

3. FUNCTIONAL BLOCK DIAGRAM

4. PIN DIAGRAM

5. SPECIFICATIONS AND PIN FUNCTIONS

6. FLOOR LAYOUT

7. INDIVIDUAL BLOCK DESCRIPTION

DECODER

SUB / ADD

COMPARATOR

MULTIPLIER

PARITY GENERATOR

BARREL SHIFTER

LOGICAL OPERATIONS

AND OR NOT EXOR

8. POWER, AREA AND DELAY CALCULATIONS

9. CONCLUSION

10. FUTURE EXPANSION

11. APPENDIX

TEST RUN SCHEMATICS

LAYOUTS OF IMPORTANT BLOCKS

FINAL LAYOUT

ABSTRACT

This project is an implementation of a 4-bit Arithmetic Logic Unit (ALU) using Cadence tools. The project is divided into two parts: simulating the ALU with the SPECTRE software and testing all the built-in functions, and then producing the optimized design of the same ALU in CADENCE. The built-in functions of this 4-bit ALU comprise the following.

• Adder

• Subtractor

• Magnitude comparator

• Multiplier (Baugh-Wooley)

• Parity generator

• Logical operations

1. 8-bit NOT

2. bit-wise OR

3. bit-wise AND

4. bit-wise XOR

• Bit Shifter (Barrel shifter)

To implement addition and subtraction, a 4-bit transmission gate adder and subtractor design was used, with a SUB pin provided to choose between the two operations. A Baugh-Wooley multiplier has been used to implement multiplication; both signed and unsigned multiplication were implemented using this type. Bit shifting has been implemented using a barrel shifter. The rest of the functions were implemented using transmission gates. A 4-to-16 decoder is used to select the functional unit that carries out the desired function. The ALU takes input data from two 4-bit latches which are controlled by a clock. The outputs of the latches are fed into all the functional units of the ALU. The function of the decoder is to select the required functional unit and pass the data to that particular unit, which is accomplished by the input select logic.

INTRODUCTION

The ALU is the part of the central processing unit (CPU) which performs operations such as addition, subtraction and multiplication of integers, and bit-wise AND, OR, NOT, XOR and other Boolean operations.

[Figure: the ALU, divided into arithmetic operations and logical operations]

The CPU’s instruction decode logic determines which particular operation the ALU should perform, the source of the operands and the destination of the result. The width in bits of the words which the ALU handles is usually the same as that quoted for the processor as a whole, whereas its external buses may be narrower. Floating point operations are usually done by a separate “floating point unit”. Some processors use the ALU for address calculations. Typically, the ALU has direct input and output access to the processor controller, main memory and input and output devices. The inputs and the outputs follow an electronic path called the bus. The input consists of an instruction word (otherwise known as the machine instruction word) that contains an operation code, one or more operands and sometimes a format code. The operation code instructs the ALU to perform a stipulated operation. The flow of bits and the operations performed in the subunits of the ALU are controlled by gated circuits. The gates in these circuits are controlled by a sequence of logic units that use a particular algorithm for each operation code.


FUNCTIONAL BLOCK DIAGRAM

[Figure: functional block diagram. Labels: latched inputs; 4-to-16 decoder; function enable signals; NOT, SHIFT, MULTIPLY, AND, OR, EXOR, COMPARE, ADD/SUB and PARITY blocks; transmission gates on the output from every block; latched enabled outputs.]

PIN DIAGRAM

[Figure: pin diagram of the ALU. Pins: IN1-IN8, OUT1-OUT8, E0-E3, S0-S1, SUB, CLK, PWR, GND.]

ALU Specifications

0.6 micron HP CMOS process

Size: 240 x 120 (in microns)

3 Volt Power Supply

26 pins

Operating Temperature: Commercial (0°C to 70°C)

Power dissipation: 0.0403 mW

Number of functions supported: 10

PIN SPECIFICATIONS

PIN          DESCRIPTION

IN1-IN8      Input A

E0-E3        FUNCTION SELECT

S0-S1        SHIFT SELECT

SUB          ADD / SUBTRACT SELECT

OUT1-OUT8    Output

CLK          Clock

GND          Ground

PWR          Power Supply

FLOOR LAYOUT

INDIVIDUAL DIGITAL BLOCKS

Decoder

A decoder is a combinational circuit that converts binary information from n input lines to a maximum of 2^n unique output lines. We have implemented a 4-to-16 decoder, of which only nine combinations are used; the rest are treated as don’t cares. As a result, this decoder can accommodate seven more functional blocks. The output of the decoder is connected to the output transmission gates; that is, it is the gating signal for the transmission gates. This block utilizes 4 four-input AND gates; the bit combinations and the operations they select are shown in the following table:

E3 E2 E1 E0 OPERATION

0 0 0 0 NOT

0 0 0 1 OR

0 0 1 0 AND

0 1 0 0 COMPARATOR

1 0 0 0 BARREL SHIFTER

0 0 1 1 XOR

0 1 1 0 MULTIPLIER

0 1 0 1 ADD/SUB

1 0 0 1 PARITY
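The selection table can be modelled in a few lines of Python. This is only a behavioural sketch, not the gate-level circuit: the names `OPERATIONS`, `decode` and `selected_operation` are hypothetical, and the numbering of the sixteen enable lines by the binary value of E3 E2 E1 E0 is an assumption.

```python
# Select codes copied from the table, written as (E3, E2, E1, E0).
OPERATIONS = {
    (0, 0, 0, 0): "NOT",
    (0, 0, 0, 1): "OR",
    (0, 0, 1, 0): "AND",
    (0, 0, 1, 1): "XOR",
    (0, 1, 0, 0): "COMPARATOR",
    (0, 1, 0, 1): "ADD/SUB",
    (0, 1, 1, 0): "MULTIPLIER",
    (1, 0, 0, 0): "BARREL SHIFTER",
    (1, 0, 0, 1): "PARITY",
}


def decode(e3, e2, e1, e0):
    """Return the 16 one-hot enable lines of a 4-to-16 decoder."""
    # Assumption: line k is enabled when E3 E2 E1 E0 read as binary equals k.
    index = (e3 << 3) | (e2 << 2) | (e1 << 1) | e0
    return [1 if line == index else 0 for line in range(16)]


def selected_operation(e3, e2, e1, e0):
    """Name of the enabled block, or None for an unused (don't-care) code."""
    return OPERATIONS.get((e3, e2, e1, e0))
```

Exactly one enable line is high for any select code, which is what lets a single decoder output gate the transmission gates of one functional block at a time. The seven codes missing from `OPERATIONS` are the spare combinations mentioned in the text.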

The following figure illustrates the decoder implemented in this project

Transmission gate ADDER and SUBTRACTOR

This is a dynamic implementation of both the adder and the subtractor in the same circuit. It makes extensive use of exclusive-OR (XOR) gates. The following figure shows the schematic of a transmission gate adder and subtractor.

By using a combination of transmission gates, inverters and XOR gates an adder may be constructed. A ⊕ B and its complement are formed using the transmission gate XOR. The sum A ⊕ B ⊕ C is formed by a multiplexer controlled by A ⊕ B and its complement. It can be seen that CARRY = C when A ⊕ B is true and CARRY = A (or B) when A ⊕ B is false. This adder has the advantage of having equal sum and carry propagation delay times. In addition, the sum and carry signals are not inverted. One of the disadvantages of this circuitry is that, since the adder is implemented using XOR gates, switch-level simulators have problems with it. The number of transistors used in this circuitry can be minimized if speed of computation is not the main goal. The same circuitry is also used to perform subtraction, depending on whether the SUB signal is high or low: one of the inputs is inverted and then added to the other input data to perform subtraction.
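The multiplexer view of the sum and carry can be checked with a short behavioural model. This is a sketch under stated assumptions: the function names are hypothetical, the B input is taken as the one that is inverted, and SUB is assumed to also drive the carry-in (the usual way to complete a two's complement subtraction), which the report does not state explicitly.

```python
def full_adder(a, b, c):
    """One adder bit modelled as two multiplexers controlled by P = A xor B."""
    p = a ^ b
    # SUM mux: passes the complement of C when P is 1, C itself when P is 0,
    # which equals A xor B xor C.
    s = (c ^ 1) if p else c
    # CARRY mux: passes C when P is 1, A (which equals B) when P is 0.
    carry = c if p else a
    return s, carry


def add_sub(a, b, sub, width=4):
    """Ripple the single-bit cell across `width` bits.

    Assumptions: B is the inverted operand and SUB also feeds the carry-in.
    Returns (result, carry_out).
    """
    carry = sub
    result = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ sub  # XOR with SUB inverts B when subtracting
        s, carry = full_adder(ai, bi, carry)
        result |= s << i
    return result, carry
```

For example, `add_sub(9, 5, 0)` models 9 + 5 and `add_sub(9, 5, 1)` models 9 - 5. In subtract mode the carry-out is 1 exactly when A is greater than or equal to B.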

Magnitude comparator

A magnitude comparator is used to compare the magnitudes of two binary numbers. A comparator built from an adder and a complementer functions as follows. A zero-detect NOR gate provides the A=B signal, while the final carry output provides the B>A signal. Other signals such as A<B or A<=B may be generated by logical combinations of these signals. If one only needs to check equality between two binary numbers, then XNOR gates and an AND gate are all that is necessary. A pass-gate logic implementation can also be used instead of a gate implementation. Single-polarity transmission gates can also be used and are very appropriate in low-power circuits. The following are some examples of the use of magnitude comparators:

There are times when it is useful to compare the magnitudes of two registers. For example, during a search, a value is often compared with another to determine if a match is found. Another example is sorting; in this case one is normally concerned with a "Less Than" or "Greater Than" result so that a value can be inserted into a list. The comparator example outputs a: 2 if "A" is greater than "B", 1 if they are equal, and 0 if "A" is less than "B". This output can be used for many things. If the system designer wanted to select the larger of 2 values, the output would be sent to the select line of a mux that would then pass the proper register. Another place something like this may be used is in a CPU’s compare and branch instruction. The program counter could be incremented by the result of the comparison to select the appropriate branch address.

The magnitude comparator works on the following principle:

Let A and B be two numbers, each with four digits, i.e.

A = A3 A2 A1 A0

B = B3 B2 B1 B0

where each subscripted letter represents a digit in the number. The two numbers are equal if and only if all pairs of significant digits are equal. When the numbers are binary, the digits are either one or zero and the equality relation of each pair of bits can be expressed logically with an equivalence function (a prime denotes the complement):

Xi = Ai * Bi + Ai' * Bi',   i = 0, 1, 2, 3

where Xi = 1 only if the pair of bits in position i are equal. The equality of the two numbers is displayed in a combinational circuit by an output variable which we designate by the symbol (A=B). This binary variable is equal to 1 if the numbers A and B are equal and is equal to 0 otherwise. For the equality condition to exist, all Xi variables must be equal to 1. This indicates an AND operation of all variables:

(A=B) = X3 * X2 * X1 * X0

The binary variable (A=B) is equal to 1 only if all pairs of digits of the two numbers are equal. To determine if A<B or A>B, we inspect the relative magnitudes of pairs of significant digits starting from the most significant position. If the two digits are equal, we compare the next lower significant pair of digits. This comparison continues until a pair of unequal digits is reached. If the corresponding digit of A is 1 and that of B is 0, we conclude that A>B. If the corresponding digit of A is 0 and that of B is 1, we have that A<B. This sequential comparison can be expressed logically by the following two Boolean functions:

(A>B) = A3 * B3' + X3 * A2 * B2' + X3 * X2 * A1 * B1' + X3 * X2 * X1 * A0 * B0'

(A<B) = A3' * B3 + X3 * A2' * B2 + X3 * X2 * A1' * B1 + X3 * X2 * X1 * A0' * B0

Condition Eo Fo

A < B 0 0

A = B 0 1

A > B 1 0
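The bit-wise equality terms and the two Boolean functions for (A>B) and (A<B) translate directly into code. The following is a minimal sketch with a hypothetical function name, `compare4`; it evaluates the equations literally rather than modelling the adder-based circuit.

```python
def compare4(a, b):
    """Evaluate the comparator equations for two 4-bit numbers.

    Returns a tuple (eq, gt, lt) of 0/1 flags for A=B, A>B and A<B.
    """
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    nA = [bit ^ 1 for bit in A]  # complements of the A bits
    nB = [bit ^ 1 for bit in B]  # complements of the B bits

    # Xi = Ai*Bi + Ai'*Bi' is 1 only when the bits in position i are equal.
    X = [(A[i] & B[i]) | (nA[i] & nB[i]) for i in range(4)]

    eq = X[3] & X[2] & X[1] & X[0]
    gt = ((A[3] & nB[3])
          | (X[3] & A[2] & nB[2])
          | (X[3] & X[2] & A[1] & nB[1])
          | (X[3] & X[2] & X[1] & A[0] & nB[0]))
    lt = ((nA[3] & B[3])
          | (X[3] & nA[2] & B[2])
          | (X[3] & X[2] & nA[1] & B[1])
          | (X[3] & X[2] & X[1] & nA[0] & B[0]))
    return eq, gt, lt
```

For any pair of inputs exactly one of the three flags is 1, because each product term fires only at the most significant position where the two numbers differ.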

Baugh Wooley Multiplier

Baugh-Wooley algorithm

An algorithm for direct 2’s complement array multiplication has been proposed by Baugh and Wooley. The primary advantage of this algorithm is that the signs of all the partial products are positive, thus allowing the array to be entirely the same as conventional standard array structures.

The following are some of the highlights of the Baugh-Wooley algorithm:

• Algorithm for two’s-complement multiplication.

• Adjusts partial products to maximize regularity of the multiplication array.

• Moves partial products with negative signs to the last steps; also adds the negation of partial products rather than subtracting them.

Two’s Complement

Before starting with signed multiplication, a quick review of the 2’s complement system of signed number representation is helpful in understanding the derivation of the algorithm. In the 2’s complement system, the leftmost bit (MSB) indicates the sign of the number, with 0 being positive and 1 being negative. To obtain a negative number, simply subtract the corresponding positive number from 2^n, where n is the number of bits of the original number. For example, to obtain the 4-bit signed number -4, take 2^4 (b’10000) and subtract from it the corresponding positive number, 4 (b’0100); the result is (b’1100), -4 in the 2’s complement representation. Alternatively, the MSB can be treated as carrying a weight of -2^(n-1), so that the value of a negative number is obtained by subtracting 2^(n-1) from the value of the remaining lower bits. Using the same example, to obtain the number -4, take the lower bits 4 (b’100) and subtract 2^(4-1) (b’1000) from them; the result is (b’1100). The multiplier algorithm discussed here takes advantage of the latter method. One important note on using 2’s complement number representation is the need for sign extension. That is, to represent a negative number using a greater number of bits, simply repeat the sign bit to the left until the desired number of bits are filled. For example, to extend the number -4 (b’1100) from 4 bits to 8 bits, the resultant sign-extended number would be (b’11111100).
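The two's complement rules reviewed here can be tried out with a small sketch. The function names are hypothetical, and bit patterns are held as non-negative Python integers. Decoding uses the view that the MSB carries a weight of -2^(n-1).

```python
def to_twos_complement(value, n):
    """Encode a signed integer as an n-bit two's complement pattern."""
    if not -(1 << (n - 1)) <= value < (1 << (n - 1)):
        raise ValueError("value does not fit in n bits")
    if value < 0:
        return (1 << n) - (-value)  # subtract the positive number from 2^n
    return value


def from_twos_complement(pattern, n):
    """Decode an n-bit pattern, treating the MSB as weight -2^(n-1)."""
    msb = (pattern >> (n - 1)) & 1
    lower = pattern & ((1 << (n - 1)) - 1)
    return lower - (msb << (n - 1))


def sign_extend(pattern, n, m):
    """Widen an n-bit pattern to m bits by repeating the sign bit."""
    if (pattern >> (n - 1)) & 1:
        pattern |= ((1 << (m - n)) - 1) << n  # fill the new high bits with 1s
    return pattern
```

With these definitions, `to_twos_complement(-4, 4)` gives the pattern 1100 and `sign_extend(0b1100, 4, 8)` gives 11111100, matching the examples in the text.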

Basic Binary Multiplication:

Baugh-Wooley Algorithm

The Baugh-Wooley algorithm for unsigned binary multiplication is based on the concept shown in the figure. The algorithm specifies that all possible AND terms are created first, and then sent through an array of half-adders and full-adders with the carry-outs chained to the next most significant bit at each level of addition. For signed multiplication, by utilizing the properties of the two’s complement system, the Baugh-Wooley algorithm can implement signed multiplication in almost the same way as the unsigned multiplication shown above.
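A behavioural sketch of signed multiplication with only positive partial products is given below. It uses one common formulation of the Baugh-Wooley scheme: the AND terms that involve exactly one sign bit are complemented, and the constants 2^n and 2^(2n-1) are added. This formulation is an assumption. The cell arrangement of the multiplier in this project is not recoverable from the text, and the function name is hypothetical.

```python
def baugh_wooley_multiply(a, b, n=4):
    """Multiply two n-bit signed integers by summing positive partial products."""
    mask = (1 << n) - 1
    a_bits = [((a & mask) >> i) & 1 for i in range(n)]
    b_bits = [((b & mask) >> j) & 1 for j in range(n)]

    total = 0
    for i in range(n):
        for j in range(n):
            term = a_bits[i] & b_bits[j]          # AND term for this array cell
            if (i == n - 1) != (j == n - 1):
                term ^= 1                         # exactly one sign bit: complement
            total += term << (i + j)              # every term is added, never subtracted

    total += (1 << n) + (1 << (2 * n - 1))        # correction constants
    total &= (1 << (2 * n)) - 1                   # keep the 2n-bit product

    # Interpret the 2n-bit pattern as a signed number.
    if total >> (2 * n - 1):
        total -= 1 << (2 * n)
    return total
```

For instance, `baugh_wooley_multiply(-4, 3)` sums sixteen single-bit terms and two constants, all positive, and the 8-bit result decodes to -12. The regular structure (every cell is an AND or a NAND feeding an adder) is what keeps the array close to the unsigned one.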

MULTIPLIER CELLS

TOTAL SCHEMATIC

PARITY GENERATOR

An N-bit parity generator is a combinational Boolean function block which has N parallel inputs and one output. The output (parity) bit is:

• Logic "1" if the number of "1"s in the input vector is odd

• Logic "0" if the number of "1"s in the input vector is even

As an example, the truth table of a 4-bit parity generator is given below.

Although the function looks complex, it can easily be realized by using XOR gates. The parity function equals:

P = D0 ⊕ D1 ⊕ D2 ⊕ D3 ⊕ ....

A four-bit parity generator can be realized using a tree structure as follows:

P = ( ( D0 ⊕ D1 ) ⊕ ( D2 ⊕ D3 ) )

The parity function is used as a simple means of verifying the correctness of data transmission in digital communications. In some serial communication protocols, the data bits are sent together with the corresponding parity value. The receiver checks the parity. If one bit was misinterpreted during transmission, the parity will not match, and the receiver will ask the transmitter to re-transmit the data. This is a very simple method for verifying that the transmitted data is received correctly. This method also has some weaknesses: for example, if the communication channel is very noisy and an even number of bits is misinterpreted, the parity still matches and this method will not be able to detect the error. Still, simple parity checking is one of the most popular error detection methods in data transmission.
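The XOR realization of the parity function and its blind spot for paired errors can be illustrated with a short sketch. The function names and the sample data word are hypothetical.

```python
from functools import reduce
from operator import xor


def parity(bits):
    """XOR of all input bits: 1 if the number of 1s is odd, else 0."""
    return reduce(xor, bits, 0)


def parity4(d0, d1, d2, d3):
    """Tree form for four bits: two XORs side by side, then one more."""
    return (d0 ^ d1) ^ (d2 ^ d3)


def check(received_bits, received_parity):
    """Receiver-side check: True if the recomputed parity matches."""
    return parity(received_bits) == received_parity


data = [1, 0, 1, 1]          # hypothetical data word with three 1s
p = parity4(*data)           # parity bit sent along with the data
one_error = [0, 0, 1, 1]     # first bit flipped in transit
two_errors = [0, 1, 1, 1]    # first two bits flipped in transit
```

Here `check(one_error, p)` is False, so the single-bit error is caught, while `check(two_errors, p)` is True, so the double-bit error passes unnoticed.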

BARREL SHIFTER

A barrel shifter is a hardware device that can shift or rotate a data word by any number of bits in a single operation. It is implemented like a multiplexer: each output can be connected to any input depending on the shift distance. The operations supported by barrel shifters are left shift, right shift and rotations.

A Barrel shifter cell

The following are some of the salient features in the layout of a barrel shifter

In addition to the above features, it is quite evident that in a barrel shifter the majority of the area is consumed by wiring. The propagation delay is theoretically a constant, at most one transmission gate, independent of shifter size and number of shifts. Finally, the capacitance at the buffer input is proportional to the maximum shift width.
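The behaviour of a 4-bit barrel shifter can be sketched as follows. The mode names and the function signature are hypothetical: the report only states that S0-S1 select the shift, so how the direction or rotation is chosen here is an assumption.

```python
def barrel_shift(word, amount, mode, width=4):
    """Shift or rotate `word` by `amount` positions in a single step.

    `amount` plays the role of the two shift-select bits (0 to 3 for width 4).
    `mode` is one of "left", "right" or "rotate_left" (hypothetical names).
    """
    mask = (1 << width) - 1
    word &= mask
    amount %= width
    if mode == "left":
        return (word << amount) & mask            # bits shifted out are dropped
    if mode == "right":
        return word >> amount                     # zeros enter from the left
    if mode == "rotate_left":
        # Bits leaving on the left re-enter on the right.
        return ((word << amount) | (word >> (width - amount))) & mask
    raise ValueError("unknown mode: " + mode)
```

Whatever the shift distance, the result is produced in one pass rather than by repeated single-bit shifts, which is the software analogue of every output being one transmission gate away from its selected input.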

LOGICAL OPERATIONS

4-bit AND operation:

The AND operation will be signified by AB or A*B. Other common mathematical notations for it are A∧B and A∩B, called the intersection of A and B. The logical operation of the AND gate is such that the output is HIGH (1) when all the inputs are HIGH; otherwise it is LOW (0). The 4-bit AND gate receives its inputs from A0, A1, A2, A3 and B0, B1, B2, B3, where each of these elements taken in a pair forms the input to one of the 4 two-input AND gates. The truth table and schematic of the AND gate are as follows:

Ai Bi output

0 0 0

0 1 0

1 0 0

1 1 1

4-bit OR operation:

The OR operation will be signified by A+B. Other common mathematical notations for it are A∨B and A∪B, called the union of A and B. The logical operation of the OR gate is such that the output is HIGH (1) when one of the inputs is HIGH or both the inputs are HIGH (1); otherwise it is LOW (0). The 4-bit OR gate receives its inputs from A0, A1, A2, A3 and B0, B1, B2, B3, where each of these elements taken in a pair forms the input to one of the 4 two-input OR gates. The truth table of the OR gate is as follows:

Ai Bi output

0 0 0

0 1 1

1 0 1

1 1 1

8-bit NOT gate:

A logical inverter, sometimes called a NOT gate to differentiate it from other types of electronic inverter devices, has only one input. It reverses the logic state. The NOT gate is a circuit which produces at its output the negated (inverted) version of its input logic. The inverter (NOT circuit) performs a basic logic function called inversion or complementation. The purpose of the inverter is to change one logic level (HIGH / LOW) to the opposite logic level. In terms of bits, it changes a ‘1’ to a ‘0’ and vice versa. This inversion in the output is made possible by connecting a PMOS switch from Vdd to the output when the input is 0 and connecting an NMOS switch from ground to the output when the input is 1. The following is the truth table and schematic of the inverter.

Input Output

0 1

1 0

4-bit exclusive-OR

The functional operation of a XOR gate is represented as A ⊕ B. The exclusive-OR gate (XOR) subsystem provides an output signal that is low if either both input signals are high or both input signals are low. Otherwise, the output signal is high. The XOR gate responds almost exactly like the OR gate, except that it produces a zero output when BOTH inputs are 1. In that case the OR gate produces a 1, but the XOR gate produces a 0. In conclusion, the XOR produces a 1 when exactly one of the inputs is 1; in all other cases it produces a zero. A 4-bit exclusive-OR gate has been implemented using 4 XOR gates in conjunction. The following is the truth table for the XOR gate:

Ai Bi output

0 0 0

0 1 1

1 0 1

1 1 0
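The four logical operations can be summarized in one behavioural sketch. The function name and the string operation codes are hypothetical. Treating the 8-bit NOT as an inversion of all eight input bits as one word is an assumption, since the report does not say how the two 4-bit operands map onto it.

```python
def logic_unit(op, a, b=0):
    """Bit-wise logic on 4-bit operands; NOT works on one 8-bit word."""
    if op == "AND":
        return (a & b) & 0xF   # 4 two-input AND gates, one per bit pair
    if op == "OR":
        return (a | b) & 0xF   # 4 two-input OR gates
    if op == "XOR":
        return (a ^ b) & 0xF   # 4 two-input XOR gates
    if op == "NOT":
        return ~a & 0xFF       # assumption: all 8 input bits inverted together
    raise ValueError("unknown operation: " + op)
```

With A = 1100 and B = 1010, the three two-input operations reproduce the columns of their truth tables bit by bit: AND gives 1000, OR gives 1110 and XOR gives 0110.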

Edge triggered D flip-flop

For the latch design a positive edge triggered D flip-flop is used. This design has the following advantages: there is no transparency problem, and clock skew is minimized by balancing the CLK and CLK~ delays using buffers and inverters.

Input Latch

The input latch acts for one clock cycle as a register for a single set of data, and during the subsequent clock cycle sends the data to the ALU inputs. Each latch is a positive edge triggered D flip-flop implemented using dynamic logic.

Output latch

It acts as a register for data during one clock cycle and pushes it out during the next clock cycle.
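The "no transparency" property of a positive edge triggered flip-flop can be illustrated with a purely behavioural model. The class and method names are hypothetical, and the dynamic-logic implementation used in the layout is not modelled.

```python
class DFlipFlop:
    """Behavioural model of a positive edge triggered D flip-flop."""

    def __init__(self):
        self.q = 0           # stored output
        self._prev_clk = 0   # clock level seen on the previous call

    def tick(self, clk, d):
        """Apply the current clock level and D input, then return Q."""
        if clk == 1 and self._prev_clk == 0:
            self.q = d       # capture D only on the rising edge
        self._prev_clk = clk
        return self.q
```

While the clock stays high, changes on D do not reach Q; the output only updates at the next low-to-high transition. This is how the input and output latches hold one set of data for a full clock cycle.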

FUTURE EXPANSION

The 4-bit ALU can be extended to an 8-bit ALU owing to the simplicity of the design, in which concatenation of the blocks is easily achieved. The ALU can also be used along with an on-chip memory (like the SRAM designed earlier in the course) to store the outputs temporarily.

CONCLUSION

This project gave an in-depth picture of the complexities involved in designing large circuits and also gave us ample exposure to the use of Cadence tools in designing. The splitting up of the work amongst the group members and the parallel execution of different blocks in the layout made the job much simpler for final integration. In this way the project also shed light on the concept of teamwork. This project has also gone a long way in motivating all the team members to attack new challenges and design much more complex and efficient circuits in future. We would like to thank Dr. Eisenstadt for providing us this launch platform from where we are capable enough to reach new heights.

APPENDIX

LAYOUT OF IMPORTANT BLOCKS

DECODER

ADDER / SUBTRACTOR

COMPARATOR

MULTIPLIER

BARREL SHIFTER

PARITY GENERATOR

EDGE TRIGGERED D FLIP FLOP

INPUT PASS GATE

FINAL LAYOUT WITH PADS
