project report on multiplier design 1

A PIPELINED MIXED ARCHITECTURE OF 16x16

MULTIPLIER FOR LOW POWER AND HIGH SPEED DSP

APPLICATIONS

A Project Report submitted in the partial fulfilment for the award of Bachelor of Technology

in

Electronics and Communication Engineering

By

Nitesh Heda

Prasad Nirmal Kameshwar

Rohit Kumar

Bachelor of Technology, VII Semester,

Electronics and Communication Engineering (2010-11)

Under the guidance of

Mr. K V Krishna Rao

Assistant Professor

Electronics and Communication Engineering Department

MNNIT, Allahabad

ELECTRONICS AND COMMUNICATION ENGINEERING DEPARTMENT

MOTILAL NEHRU NATIONAL INSTITUTE OF TECHNOLOGY

ALLAHABAD-211004

MOTILAL NEHRU NATIONAL INSTITUTE OF TECHNOLOGY

Department of Electronics and Communication Engineering

Allahabad-211004

Certificate

This is to certify that the term paper project titled “A Pipelined Mixed Architecture of 16x16 bit Multiplier for low power and high speed DSP Applications”, submitted by Nitesh Heda, Prasad Nirmal Kameshwar and Rohit Kumar in partial fulfilment of the requirement for the award of Bachelor of Technology in Electronics and Communication Engineering to the Electronics and Communication Engineering Department, Motilal Nehru National Institute of Technology (Deemed University), Allahabad, is a bona fide work of the students carried out under my supervision.

Date:

Place:

(Mr. K V Krishna Rao)

Assistant Professor

Acknowledgement

It is a great privilege for us to express our deep sense of gratitude to our supervisor, Asst. Prof. K V Krishna Rao of the Electronics and Communication Engineering Department, MNNIT Allahabad, for his stimulating guidance and profound assistance. We shall always cherish our association with him for the constant encouragement and the freedom of thought and action that he gave us throughout the final year project. We also express our thanks to the head of the department, Prof. Sudarshan Tiwari, for his invaluable support and encouragement throughout the project. We also feel great pleasure in thanking Dr. Rajeev Tripathi, Asst. Prof. Amit Dhawan, Asst. Prof. Sanjeev Rai and Asst. Prof. Rajiv Gupta for their cooperation, which led to the successful completion of our work. Finally, we deem it a great pleasure to thank our families and everyone who helped us in carrying out this project.

Date:

Place:

Nitesh Heda (200750)

Prasad Nirmal Kameshwar (20075005)

Rohit Kumar (20075021)

Abstract

This project describes a pipelined mixed architecture of a 16x16 bit multiplier for low power and high speed DSP applications. Some of the key multiplier structures, namely the Array multiplier, the Wallace multiplier and the Bypass Tree multiplier, were implemented and their performance parameters compared. Pipelining of these multipliers was then considered, to meet the requirement of continuous multiplication in DSP processors. Finally, a mixed architecture consisting of a modified Wallace multiplier and a bypass tree multiplier, with pipelining, was simulated and its performance measured. Pipelining allows several multiplications to be in progress at the same time, while the low power dissipation of the bypass logic and the low delay of the Wallace structure are both exploited. The results indicate that this structure is a good choice of multiplier for DSP processors, which require continuous multiplication.

Keywords: Wallace Multiplier, Bypass Multiplier, Array Multiplier, Pipelining, DSP.

Contents

Certificate
Acknowledgement
Abstract
Contents
Chapter 1. Introduction
1.1 Background
1.2 Multiplier Features
1.3 Pipelining
1.4 Scenario
Chapter 2. Basic Multiplier Architectures
2.1 Introduction
2.2 Array Multiplier
2.3 Wallace Multiplier
2.4 Bypass Tree Multiplier
Chapter 3. Pipelining in Multipliers
3.1 Introduction
Chapter 4. Mixed Pipeline Multiplier Architecture
4.1 The Need of Mixed Architecture
4.2 Architecture Outline
4.3 Structure
Chapter 5. Simulation and Result
5.1 Introduction
5.2 Basic Multiplier
5.3 Pipelined Multiplier Structure (8x8 bit)
5.4 Pipelined Mixed Architecture (16x16 bit)
References


CHAPTER 1

INTRODUCTION

1.1 Background

In today’s fast-developing technological world, the trend is towards small and portable devices. As the number of these battery-operated, processor-driven devices increases and more performance is demanded of them, their processing speed must be increased and their power dissipation reduced. In such a consumer-driven scenario, these demands call for a serious look at how the devices are built. The processors used for such purposes are DSP processors, and in them the major operations, such as FIR filtering and the DCT, are carried out through multipliers. As multipliers are major components of a DSP, optimizing the multiplier design leads to a better performing DSP.

1.2 Multiplier Features

The features of the multiplier proposed in this report are:

1. Pipelining: Pipelining allows the multiplier to accept a new set of data and begin multiplying it while part of a previous multiplication is still in progress.

2. Mixed Architecture: A mixed architecture has been considered, consisting of Wallace and Bypass tree multipliers. This takes advantage of the low delay of the Wallace multiplier and the low power dissipation of the bypass multiplier.

3. Clocking: The clocking has been arranged so that the multiplier can work at its highest clock frequency without disturbing the correct flow of partial products through the structure.

4. Data range: The data range has been extended from the initial 4x4 bit to 16x16 bit, which is the working data range required by many DSP processors.

5. Structural Modelling: The design is described structurally, which gives direct control over its implementation, be it on an ASIC or an FPGA, and avoids the redundant hardware that other modelling styles may generate.
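The pipelining feature listed above can be illustrated with a small cycle-by-cycle model. This is only a behavioural sketch in Python, not the Verilog used in the project; the function name `run_pipeline`, the number of stages and the way the product is computed when an operand pair leaves the last register are all assumptions made for illustration.

```python
def run_pipeline(operand_pairs, stages):
    """Push operand pairs through `stages` pipeline registers, one per clock.

    Returns the products in order and the number of clock cycles used.
    The product is computed when a pair leaves the last register; a real
    design would do part of the work in every stage.
    """
    regs = [None] * stages          # pipeline registers, first stage first
    pending = list(operand_pairs)   # operands still waiting to enter
    results, cycles = [], 0
    while pending or any(r is not None for r in regs):
        leaving = regs[-1]                      # pair leaving the last stage
        entering = pending.pop(0) if pending else None
        regs = [entering] + regs[:-1]           # shift everything one stage
        if leaving is not None:
            a, b = leaving
            results.append(a * b)
        cycles += 1
    return results, cycles


products, cycles = run_pipeline([(3, 5), (7, 9), (11, 13), (2, 4)], stages=3)
print(products, cycles)
```

In this model a single multiplication occupies stages + 1 clock cycles, while N back-to-back multiplications finish in N + stages cycles, because a new pair enters on every clock.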

1.3 Pipelining

1.4 Scenario

CHAPTER 2

BASIC MULTIPLIER ARCHITECTURES

2.1 Introduction

A basic multiplier consists of ANDed terms (as shown in Fig. 1.1) and an array of full adders and/or half adders arranged so as to obtain partial products at each level. These partial products are added together to obtain the final result. It is the different arrangements of, and construction changes in, these adders that lead to the various structures of basic multipliers.
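The generation of the ANDed terms can be sketched as follows. This is an illustrative Python model rather than the project's Verilog; the names `partial_products` and `weighted_sum` are hypothetical.

```python
def partial_products(a, b, n):
    """Return the n x n grid of ANDed terms: rows[j][i] = a_i AND b_j."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> j) & 1 for j in range(n)]
    return [[a_i & b_j for a_i in a_bits] for b_j in b_bits]


def weighted_sum(rows):
    """Add every ANDed term at its weight 2^(i + j)."""
    return sum(bit << (i + j)
               for j, row in enumerate(rows)
               for i, bit in enumerate(row))


rows = partial_products(13, 11, 4)
print(rows)
print(weighted_sum(rows))   # equals 13 * 11
```

The multiplier structures described in this chapter differ only in how the adders that perform `weighted_sum` are arranged.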

2.2 Array Multiplier

This is the most basic form of binary multiplier. Its principle is exactly that of multiplication done with pen and paper. It consists of a highly regular array of full adders, the exact number depending on the length of the binary numbers to be multiplied. Each row of this array generates a partial product. This partial value is then added to the sum and carry generated in the next row. The final result of the multiplication is obtained directly after the last row.

Fig. 1.1: ANDed terms generated using logic AND gate

Fig. 1.2: Full Adder (FA) implementation showing the two bits (A, B) and Carry In (Ci) as inputs and Sum (S) and Carry Out (Co) as outputs.

Due to its highly regular structure, the array multiplier is very easy to construct and can be densely implemented in VLSI, taking less space. But compared to the multiplier structures proposed later, it has a high computational time. In fact, the computational time is of the order O(N), one of the highest of any multiplier structure.
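A bit-level sketch of the row-by-row accumulation is given below. It is a simplified Python model, assuming each row is added with a ripple chain of full adders; the actual cell arrangement of the project's array multiplier may differ, and the function names are hypothetical.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def array_multiply(x, y, n):
    """Multiply two n-bit numbers by adding one partial-product row at a time."""
    acc = [0] * (2 * n)                      # running sum bits, LSB first
    for j in range(n):                       # one row per multiplier bit
        yj = (y >> j) & 1
        carry = 0
        for i in range(n):                   # n full adders in the row
            pp = ((x >> i) & 1) & yj         # ANDed term for this cell
            acc[i + j], carry = full_adder(acc[i + j], pp, carry)
        acc[j + n] = carry                   # carry out of the row
    return sum(bit << k for k, bit in enumerate(acc))


print(array_multiply(13, 11, 4))
```

Every row has to wait for the row before it, which is why the delay of this structure grows with the number of rows.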

2.3 Wallace Multiplier

A Wallace tree is an efficient hardware implementation of a digital circuit that multiplies two integers. For an NxN bit multiplication, the partial products are formed by N^2 AND gates. Next, the N rows of partial products are grouped into sets of three rows each. Any remaining rows that are not members of these groups are passed to the next level without modification. For a column containing three partial products, a full adder is used, with the sum dropped down into the same column and the carry out moved to the next higher column. For a column with two partial products, a half adder is used in place of the full adder. The grouping is repeated until two rows remain. At the final stage, a carry propagation adder adds these rows, resolving all the propagating carries, to give the final result.

Fig 2.3: A pictorial description of 6x6 bit Array multiplier.

The computational complexity of the Wallace tree multiplier reaches the lower bound: the number of reduction levels grows as O(log_{3/2} N). Thus, the Wallace tree clearly offers an advantage over other types of multiplier in terms of speed.
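The three-row grouping described in this section can be sketched as a small Python model. It is an illustration under simplifying assumptions, not the project's Verilog: rows are stored as lists of bits, the final carry propagation adder is modelled by ordinary integer addition, and the name `wallace_multiply` is hypothetical.

```python
def wallace_multiply(x, y, n):
    """Multiply two n-bit numbers by Wallace reduction.

    Returns (product, number_of_reduction_levels).
    """
    width = 2 * n
    # rows[j][k] is the bit at weight 2^k, or None where the row has no bit
    rows = []
    for j in range(n):
        row = [None] * width
        for i in range(n):
            row[i + j] = ((x >> i) & 1) & ((y >> j) & 1)
        rows.append(row)

    levels = 0
    while len(rows) > 2:
        grouped = len(rows) - len(rows) % 3
        next_rows = []
        for g in range(0, grouped, 3):
            s_row, c_row = [None] * width, [None] * width
            for k in range(width):
                bits = [r[k] for r in rows[g:g + 3] if r[k] is not None]
                if len(bits) == 1:
                    s_row[k] = bits[0]            # single bit passes through
                elif len(bits) >= 2:              # half adder (2) or full adder (3)
                    total = sum(bits)
                    s_row[k] = total & 1          # sum stays in the same column
                    if k + 1 < width:
                        c_row[k + 1] = total >> 1  # carry moves one column up
            next_rows += [s_row, c_row]
        next_rows += rows[grouped:]               # leftover rows pass unchanged
        rows = next_rows
        levels += 1

    def value(row):
        return sum(b << k for k, b in enumerate(row) if b is not None)

    # Final stage: carry propagation adder, modelled by integer addition
    return sum(value(r) for r in rows), levels


print(wallace_multiply(255, 255, 8))
```

With this grouping the row count shrinks roughly by a factor of 3/2 per level (8, 6, 4, 3, 2 for an 8x8 multiplication), which is where the logarithmic number of levels comes from.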

2.4 Bypass Tree multiplier

The principle underlying the Bypass multiplier is to bypass those hardware cells whose input multiplicand and/or multiplier bit is 0. This disables the hardware whose contribution to the result would be 0 and so reduces the power consumption in those areas.

The basic structural arrangement of the Bypass multiplier is very similar to that of the Array multiplier. The difference lies in the construction of the basic cell of the bypass multiplier, which is used in place of the full adder of the array multiplier. It contains bypassing logic, which depends on the bit values of the multiplicand and multiplier inputs. The combinational part of the logic is implemented using two MUXes, which output the actual sum and carry out of the full adder if the input bits are 1; otherwise they bypass the FA by outputting the sum and carry out from the cell of the previous row.

Fig. 2.4: Dot diagram stages in 8x8 bit Wallace tree multiplier (Courtesy: W. J. Townsend, E. E. Swartzlander and J. A. Abraham)

Bypassing can be 1-D or 2-D. In 1-D (one-dimensional) bypassing, the bypassing logic depends only on the values of the multiplier bits. This logic is easy to implement but does not make full use of the bypassing technique. 2-D (two-dimensional) bypassing depends on the bit values of both the multiplier and the multiplicand. Its logic is harder to implement than that of 1-D bypassing, but it makes fuller use of the advantage offered by the bypassing technique.
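The difference between the two schemes can be illustrated with a word-level Python sketch. It is a deliberately simplified model: rows are added as whole words, and the count of active cells ignores the extra carry handling that 2-D bypassing needs. The function names are hypothetical.

```python
def bypass_multiply_1d(x, y, n):
    """1-D bypassing: a row is skipped whenever its multiplier bit is 0.

    Returns (product, number_of_rows_that_actually_added).
    """
    acc, active_rows = 0, 0
    for j in range(n):
        if (y >> j) & 1:
            acc += x << j        # row adds its partial product
            active_rows += 1
        # otherwise the running sum passes through the row unchanged
    return acc, active_rows


def active_cells(x, y, n):
    """Rough count of adder cells left active by 1-D and 2-D bypassing."""
    ones_x = bin(x & ((1 << n) - 1)).count("1")
    ones_y = bin(y & ((1 << n) - 1)).count("1")
    one_d = n * ones_y           # whole rows disabled by multiplier bits
    two_d = ones_x * ones_y      # rows and columns disabled
    return one_d, two_d


print(bypass_multiply_1d(13, 0b1001, 4))
print(active_cells(13, 0b1001, 4))
```

The more zeros the operands contain, the fewer cells switch, which is the source of the power saving.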

Fig. 2.5: A 4x4 bit bypassing Multiplier implementation showing the construction using the basic cell (courtesy: C. C. Wang and G. N. Sung)

Fig. 2.6: Basic Cell with bypassing logic for 1-D bypassing (courtesy: C. C. Wang and G. N. Sung)

Fig. 2.7: Basic Cell with bypassing logic for 2-D bypassing (courtesy: C. C. Wang and G. N. Sung)

The complexity of 2-D bypassing lies in the fact that not only must each row whose multiplier bit is 0 be bypassed, but so must each column whose multiplicand bit is 0. Bypassing a cell must still ensure that the carry out and the sum from the previous cell are added at their respective weights, even though that particular cell is bypassed. If care is not taken in the design, these carry outs and/or sums may be lost along the way and never be compensated for.

The main logic signals defining the working of bypass logic in 2-D bypass multiplier are as

follows: []

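\[ \mathrm{muxR\_bl}_{ij} = X_j \cdot Y_i + X_{j-1} \cdot Y_i \cdot C_{i+1,j-2} \cdot X_j, \quad i = 1,\ j = 3 \]

\[ \mathrm{muxR\_bl}_{ij} = X_j \cdot Y_i + X_{j-1} \cdot Y_i \cdot C_{i+1,j-2} \cdot X_j + X_{j-2} \cdot Y_i \cdot C_{i,j-2}, \quad n-3 \ge i \ge 2,\ n-1 \ge j \ge 4 \]

\[ \mathrm{muxC\_bl}_{ij} = Y_i + X_{j-1} \cdot Y_i \cdot C_{i+1,j-2} \cdot X_j, \quad i = 1,\ j = 3 \]

\[ \mathrm{muxC\_bl}_{ij} = Y_i + X_{j-1} \cdot Y_i \cdot C_{i+1,j-2} \cdot X_j + X_{j-1} \cdot Y_i \cdot C_{i,j-2}, \quad n-3 \ge i \ge 2,\ n-1 \ge j \ge 4 \]

\[ \mathrm{muxL\_bl}_{ij} = X_j + X_{j-1} \cdot Y_i \cdot C_{i+1,j-2} \cdot X_j, \quad i = 1,\ j = 3 \]

\[ \mathrm{muxL\_bl}_{ij} = X_j + X_{j-1} \cdot Y_i \cdot C_{i+1,j-2} \cdot X_j + X_{j-1} \cdot Y_i \cdot C_{i,j-2}, \quad n-3 \ge i \ge 2,\ n-1 \ge j \ge 4 \]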

Thus we observe that in 2-D bypassing, the logic not only depends on the multiplier and

multiplicand bits but also on the carry out bits from previous rows (maximum of 2).

CHAPTER 3

PIPELINING IN MULTIPLIERS

3.1 Introduction

CHAPTER 4

MIXED PIPELINE MULTIPLIER

ARCHITECTURE

4.1 The need of Mixed Architecture

During the simulation of the Pipelined Wallace Tree multiplier (PWTM) and the Pipelined Bypass multiplier (PBM), it was observed that the PWTM offered the lower delay, whereas the PBM had the upper hand in power dissipation, which was much lower for a comparable amount of total resources used. The natural next step in designing a low power, high speed multiplier architecture was to try to take advantage of both by mixing their architectures. This would meet our expectations of the multiplier in terms of power and delay, while remaining practically implementable.

4.2 Architecture Outline

Most DSP processors work on floating point data types. That is, the numbers to be multiplied are in the form of a mantissa and an exponent, with the mantissa represented in 1.M form and the exponent as 2^E. The actual multiplication is needed only between the mantissas of the two numbers, as the exponents simply need to be added.

One advantage of this representation is that the MSB of the mantissa is sure to be 1, whereas the LSBs may or may not be 1. This implies that the real gain of the mixed architecture is obtained if we use bypassing logic for the multiplication of the LSBs of the mantissa, as they have a higher probability of containing 0s, and the Wallace tree structure for the multiplication of the MSBs, so as to reduce the delay on that side.
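The separation of mantissa and exponent can be written out as a worked equation. The subscripts a and b for the two operands are notation assumed here for illustration:

```latex
\[
\left(1.M_a \times 2^{E_a}\right) \times \left(1.M_b \times 2^{E_b}\right)
  = \left(1.M_a \times 1.M_b\right) \times 2^{E_a + E_b}
\]
```

Only the product of the two mantissas needs a multiplier; the exponents are handled by an adder.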

4.3 Structure

The inputs considered for multiplication are 16 bits wide. Each has been divided into two 8-bit parts, consisting of the MSBs and the LSBs, so the multiplication is now carried out in four parts. The pipelined bypass multiplier has been used wherever an LSB part is involved, that is, for the product of the two LSB parts and for the two products of an MSB part with an LSB part. This follows from the explanation above and reduces the power dissipation. The MSB parts of the two binary numbers are multiplied using the pipelined Wallace tree multiplier, so as to reduce the delay of the multiplication.

The four products obtained are then fed to an adder arrangement, which adds them with their respective weights. The final result is the output of this adder arrangement.
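The four-part decomposition and the weighted addition can be sketched in Python as below. This is a behavioural illustration only: `wallace8` and `bypass8` are hypothetical stand-ins for the 8x8 pipelined Wallace and bypass multipliers and are modelled here by plain integer multiplication.

```python
def mixed_multiply_16(x, y,
                      wallace8=lambda a, b: a * b,
                      bypass8=lambda a, b: a * b):
    """16x16 multiplication built from four 8x8 products.

    X = X1 * 2^8 + X0 and Y = Y1 * 2^8 + Y0, so
    X * Y = X1*Y1 * 2^16 + (X1*Y0 + X0*Y1) * 2^8 + X0*Y0.
    """
    x1, x0 = (x >> 8) & 0xFF, x & 0xFF   # MSB and LSB halves of X
    y1, y0 = (y >> 8) & 0xFF, y & 0xFF   # MSB and LSB halves of Y

    hh = wallace8(x1, y1)                # MSB x MSB: Wallace tree, low delay
    hl = bypass8(x1, y0)                 # MSB x LSB: bypass, low power
    lh = bypass8(x0, y1)                 # LSB x MSB: bypass
    ll = bypass8(x0, y0)                 # LSB x LSB: bypass

    # Adder arrangement: add the four products at their respective weights
    return (hh << 16) + ((hl + lh) << 8) + ll


print(hex(mixed_multiply_16(0x1234, 0xABCD)))
```

Passing other 8x8 multiplier models as `wallace8` and `bypass8` would let the same wrapper be used to check them against the plain product.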

Fig. 4.1: Block Diagram of our Proposed Pipelined Mixed Multiplier Structure. X {X1,X0} and Y {Y1,Y0} are the 16 bit inputs.

CHAPTER 5

SIMULATION AND RESULT

5.1 Introduction

The tool used for simulation and verification of the results was Xilinx ISE (versions 11 and 12.2). The hardware implementation targeted a Virtex 5 device (XCV110T).

The complete hardware description has been written in Verilog. The whole implementation uses structural coding, which has the advantage of avoiding the redundant hardware that other modelling styles may generate. It is also easily and practically implementable.

For power analysis, we have used the XPower Analyzer tool of Xilinx.

5.2 Basic Multiplier

For an initial comparison and understanding of the differences between the basic multiplier structures, a 4x4 bit multiplier was implemented using each of the basic architectures explained above, and their performance was evaluated.

Architecture    Delay (ns)    Power Dissipation (mW)    Area overhead
Array           5.603
Bypass          6.538
Wallace Tree    6.685

Fig. 5.1: Technology Schematic of 4x4 bit Array Multiplier. (Generated from Xilinx Synthesis)

5.3 Pipelined Multiplier Structure (8x8 bit)

Architecture              Delay (ns)    Power Dissipation (mW)    Resources used
Array (Non-pipelined)     10.97         422                       -
Pipelined Wallace Tree    6.36          436 (56 mW Dynamic)       0.859% of Slice Registers, 0.796% of Slice LUTs
Pipelined Bypass          7.26          396 (16 mW Dynamic)       1.161% of Slice Registers, 0.776% of Slice LUTs

Maximum Clock Frequency:

Pipelined Wallace tree Multiplier: 411.00 MHz

Pipelined Bypass Multiplier : 423.99 MHz


bit Bypass Multiplier. (Generated from Xilinx Synthesis)

bit Wallace Tree Multiplier. (Generated from Xilinx Synthesis)

5.4 Pipelined Mixed Architecture (16x16 bit)
