A PIPELINED MIXED ARCHITECTURE OF 16x16
MULTIPLIER FOR LOW POWER AND HIGH SPEED DSP
APPLICATIONS
A Project Report submitted in partial fulfilment of the requirements for the award of Bachelor of Technology
in
Electronics and Communication Engineering
By
Nitesh Heda
Prasad Nirmal Kameshwar
Rohit Kumar
Bachelor of Technology, VII Semester,
Electronics and Communication Engineering (2010-11)
Under the guidance of
Mr. K V Krishna Rao
Assistant Professor
Electronics and Communication Engineering Department
MNNIT, Allahabad
ELECTRONICS AND COMMUNICATION ENGINEERING DEPARTMENT
MOTILAL NEHRU NATIONAL INSTITUTE OF TECHNOLOGY
ALLAHABAD-211004
MOTILAL NEHRU NATIONAL INSTITUTE OF TECHNOLOGY
Department of Electronics and Communication Engineering
Allahabad-211004
Certificate
This is to certify that the term paper project titled “A Pipelined Mixed Architecture of
16x16 bit Multiplier for low power and high speed DSP Applications”, submitted by
Nitesh Heda, Prasad Nirmal Kameshwar and Rohit Kumar in partial fulfilment of the
requirements for the award of Bachelor of Technology in Electronics and Communication
Engineering to the Electronics and Communication Engineering Department, Motilal Nehru
National Institute of Technology (Deemed University), Allahabad, is a bona fide work of
the students carried out under my supervision.
Date:
Place:
(Mr. K V Krishna Rao)
Assistant Professor
Acknowledgement
It is a great privilege for us to express our deep sense of gratitude to our supervisor, Asst.
Prof. K V Krishna Rao of the Electronics and Communication Engineering Department, MNNIT
Allahabad, for his stimulating guidance and profound assistance. We shall always cherish our
association with him for the constant encouragement and the freedom of thought and action that
he gave us throughout the final year project. We also express our thanks to the head of
department, Prof. Sudarshan Tiwari, for his invaluable support and encouragement throughout
the project. It is also a great pleasure to thank Dr. Rajeev Tripathi, Asst. Prof. Amit
Dhawan, Asst. Prof. Sanjeev Rai and Asst. Prof. Rajiv Gupta for their cooperation, which led
to the successful completion of our work. Finally, we deem it a great pleasure to thank our
families and everyone who helped us in carrying out this project.
Date:
Place:
Nitesh Heda (200750)
Prasad Nirmal Kameshwar (20075005)
Rohit Kumar (20075021)
Abstract
This project describes a pipelined mixed architecture of a 16x16 bit multiplier for low power
and high speed DSP applications. In this project, some of the key multiplier structures, namely
the Array multiplier, the Wallace multiplier and the Bypass Tree multiplier, have been
implemented and their performance parameters compared. Pipelining of these multipliers was then
considered to meet the requirement of continuous multiplication in DSP processors. Finally, a
mixed architecture consisting of the altered Wallace and Bypass Tree multipliers, with
pipelining, was simulated and its performance was measured. Pipelining allows several
multiplications to be in progress at the same time, while the low power dissipation of the
Bypass logic and the low delay of the Wallace structure are both exploited. It has been shown
that this structure is a good choice of multiplier for DSP processors, which require
continuous multiplication.
Keywords: Wallace Multiplier, Bypass Multiplier, Array Multiplier, Pipelining, DSP.
Contents
Certificate
Acknowledgement
Abstract
Contents
List of Figures
Chapter 1. Introduction
1.1 Background
1.2 Multiplier Features
1.3 Pipelining
1.4 Scenario
Chapter 2. Basic Multiplier Architectures
2.1 Introduction
2.2 Array Multiplier
2.3 Wallace Multiplier
2.4 Bypass Tree Multiplier
Chapter 3. Pipelining in Multipliers
3.1 Introduction
Chapter 4. Mixed Pipeline Multiplier Architecture
4.1 The Need of Mixed Architecture
4.2 Architecture Outline
4.3 Structure
Chapter 5. Simulation and Result
5.1 Introduction
5.2 Basic Multiplier
5.3 Pipelined Multiplier Structure (8x8 bit)
5.4 Pipelined Mixed Architecture (16x16 bit)
References
List of Figures
Fig. 1.1: ANDed terms generated using logic AND gate
Fig. 1.2: Full Adder (FA) implementation
Fig. 2.3: A pictorial description of 6x6 bit Array multiplier
Fig. 2.4: Dot diagram stages in 8x8 bit Wallace tree multiplier
Fig. 2.5: A 4x4 bit bypassing multiplier implementation
Fig. 2.6: Basic cell with bypassing logic for 1-D bypassing
Fig. 2.7: Basic cell with bypassing logic for 2-D bypassing
Fig. 4.1: Block diagram of the proposed pipelined mixed multiplier structure
Fig. 5.1: Technology schematic of 4x4 bit Array multiplier
Fig. 5.2: Technology schematic of 4x4 bit Bypass multiplier
Fig. 5.3: Technology schematic of 4x4 bit Wallace Tree multiplier
CHAPTER 1
INTRODUCTION
1.1 Background
In today’s rapidly developing technological world, the trend is towards small and portable
devices. As the number of these battery-operated, processor-driven devices grows and the
performance demanded of them rises, their processing speed must increase and their power
dissipation must fall. In such a consumer-driven scenario, these demands call for a serious
look at how the devices are built. The processors used for such purposes are typically DSP
processors, in which major operations such as FIR filtering, DCT, etc. are carried out through
multipliers. As multipliers are a major component of a DSP, optimizing the multiplier design
will lead to a better-performing DSP.
1.2 Multiplier Features
The features of the multiplier proposed in this report are:
1. Pipelining: Pipelining allows the multiplier to accept a new set of data and start
multiplying it while part of another multiplication is still taking place.
2. Mixed Architecture: A mixed architecture has been considered, consisting of the
Wallace and Bypass Tree multipliers. This takes advantage of the low delay of the
Wallace multiplier and the low power dissipation of the Bypass multiplier.
3. Clocking: Clocking has been arranged so that the multiplier works at its highest
clock frequency without disturbing the correct flow of partial products through the
structure.
4. Data range: The data range has been extended from the initial 4x4 bit to 16x16 bit,
which is the working data range required by many DSP processors.
5. Structural Modelling: Structural modelling gives direct control over the hardware
implemented, whether on an ASIC or an FPGA, and avoids redundant hardware that other
modelling styles may generate.
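The effect of pipelining described in feature 1 can be illustrated with a small clock-by-clock model. This is only a behavioural sketch in Python, not the Verilog used in this project; the function name `pipelined_multiply` and the choice of three stages are assumptions made for illustration. It models timing only: one operand pair enters per clock, and each product leaves after passing through all the stage registers.

```python
def pipelined_multiply(operand_pairs, stages=3):
    """Timing model of a multiplier with `stages` pipeline registers.

    Returns a list of (cycle, product) pairs, one per input pair.
    The arithmetic itself is done in one step when a job leaves the
    last stage; only the flow of jobs through the registers is modelled.
    """
    pipeline = [None] * stages                 # pipeline[k] = job held in stage k
    results = []
    # Feed the inputs, then `stages` empty cycles to drain the pipeline.
    stream = list(operand_pairs) + [None] * stages
    for cycle, new_job in enumerate(stream, start=1):
        finished = pipeline[-1]                # job leaving the last stage
        if finished is not None:
            results.append((cycle, finished[0] * finished[1]))
        # Every job moves one stage forward; the new job enters stage 0.
        pipeline = [new_job] + pipeline[:-1]
    return results


if __name__ == "__main__":
    for cycle, product in pipelined_multiply([(3, 5), (7, 9), (2, 4)]):
        print(f"cycle {cycle}: product = {product}")
```

After the initial latency of the pipeline, one product is delivered on every clock, which is the property that suits the continuous multiplication found in DSP workloads.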
1.3 Pipelining
1.4 Scenario
CHAPTER 2
BASIC MULTIPLIER ARCHITECTURES
2.1 Introduction
A basic multiplier consists of ANDed terms (as shown in Fig. 1.1) and an array of full adders
and/or half adders arranged so as to obtain partial products at each level. These partial
products are added together to obtain the final result. It is the different arrangements of,
and construction changes in, these adders that lead to the various types of basic multiplier
structure.
2.2 Array Multiplier
This is the most basic form of binary multiplier construction. Its principle is exactly that
of multiplication done with pen and paper. It consists of a highly regular array of full
adders, the exact number depending on the length of the binary numbers to be multiplied. Each
row of this array generates a partial product, which is then added to the sum and carry
generated in the next row. The final result of the multiplication is obtained directly after
the last row.
Fig. 1.1: ANDed terms generated using logic AND gate.
Fig. 1.2: Full Adder (FA) implementation showing the two bits (A, B) and Carry In (Ci) as
inputs and Sum (S) and Carry Out (Co) as outputs.
Due to its highly regular structure, the array multiplier is very easily constructed and can
be densely implemented in VLSI, so it takes less space. Compared to the other multiplier
structures discussed later, however, it shows a high computational time. In fact, its
computational time is of the order O(N), one of the highest of any multiplier structure.
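The row-by-row operation described above can be sketched as a bit-level model. This is a minimal illustration in Python rather than the structural Verilog of the project; the names `full_adder` and `array_multiply` are assumptions, and each row is modelled as a simple ripple of full adders, which may differ from the exact cell wiring of a particular array multiplier.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def array_multiply(x, y, n=4):
    """Multiply two n-bit unsigned integers one row of adders at a time."""
    acc = [0] * (2 * n)                      # running sum bits, LSB first
    for i in range(n):                       # one row per multiplier bit
        y_i = (y >> i) & 1
        carry = 0
        for j in range(n):                   # one adder cell per multiplicand bit
            pp = ((x >> j) & 1) & y_i        # ANDed term x_j . y_i
            acc[i + j], carry = full_adder(acc[i + j], pp, carry)
        acc[i + n] = carry                   # row carry-out goes to the next column
    return sum(bit << k for k, bit in enumerate(acc))


if __name__ == "__main__":
    print(array_multiply(13, 11))            # 4x4 bit example
```

The N rows are visited one after another, which reflects why the delay of this structure grows linearly with the operand length.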
2.3 Wallace Multiplier
A Wallace tree is an efficient hardware implementation of a digital circuit that multiplies two
integers. For an NxN bit multiplication, the partial products are formed by N^2 AND gates.
Next, the N rows of partial products are grouped into sets of three rows each. Any additional
rows that are not members of these groups are transferred to the next level without
modification. For a column containing three partial products, a full adder is used, with the
sum dropped down to the same column and the carry out brought to the next higher column. For
a column with two partial products, a half adder is used in place of the full adder. At the
final stage, a carry propagation adder adds the two remaining rows, resolving all the
propagating carries, to give the final result.
Fig 2.3: A pictorial description of 6x6 bit Array multiplier.
The computational complexity of the Wallace tree multiplier achieves the lowest bound, i.e.
$O(\log_{3/2} N)$, because each reduction level turns every three rows into two. Thus, the
Wallace tree clearly offers an advantage over other types of multiplier in terms of speed.
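The grouping of rows in threes can be sketched as follows. This is a behavioural Python model written for illustration, not the project's Verilog; the name `wallace_multiply` and the dictionary representation of a row (column weight mapped to bit value) are assumptions. It applies a full adder to columns with three bits, a half adder to columns with two bits, passes single bits and leftover rows through unchanged, and counts the reduction levels used.

```python
def wallace_multiply(x, y, n=8):
    """Return (product, reduction_levels) for two n-bit unsigned integers."""
    # One row per multiplier bit; each row maps column weight -> bit value.
    rows = [{i + j: ((x >> j) & 1) & ((y >> i) & 1) for j in range(n)}
            for i in range(n)]
    levels = 0
    while len(rows) > 2:
        next_rows = []
        full_groups = len(rows) // 3
        for g in range(full_groups):
            group = rows[3 * g: 3 * g + 3]
            sum_row, carry_row = {}, {}
            for col in sorted(set().union(*group)):
                bits = [r[col] for r in group if col in r]
                if len(bits) == 3:                    # full adder
                    a, b, c = bits
                    sum_row[col] = a ^ b ^ c
                    carry_row[col + 1] = (a & b) | (b & c) | (a & c)
                elif len(bits) == 2:                  # half adder
                    a, b = bits
                    sum_row[col] = a ^ b
                    carry_row[col + 1] = a & b
                else:                                 # lone bit passes through
                    sum_row[col] = bits[0]
            next_rows += [r for r in (sum_row, carry_row) if r]
        next_rows += rows[3 * full_groups:]           # leftover rows, unmodified
        rows = next_rows
        levels += 1
    # Final stage: carry-propagate addition of the remaining rows.
    product = sum(bit << col for r in rows for col, bit in r.items())
    return product, levels


if __name__ == "__main__":
    print(wallace_multiply(201, 173))
```

Because every level replaces three rows by two, the number of levels grows only logarithmically with the operand length, in contrast to the one-row-at-a-time behaviour of the array multiplier.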
2.4 Bypass Tree multiplier
The principle underlying the Bypass multiplier is to bypass those hardware cells whose input
multiplicand and/or multiplier bit is 0. This takes out of operation the hardware whose
output would be 0 and so reduces the power consumption in those areas.
The basic structural arrangement of the Bypass multiplier is very similar to that of the Array
multiplier. The difference lies in the construction of the basic cell of the Bypass
multiplier, used in place of the full adder of the Array multiplier. It contains bypassing
logic, which depends on the bit values of the multiplicand and multiplier inputs. The
combinational part of the logic is implemented using two MUXes, which output the actual sum
and carry out of the full adder if the input bits are 1; otherwise they bypass the FA by
outputting the sum and carry out from the cell of the previous row.
Fig. 2.4: Dot diagram stages in 8x8 bit Wallace tree multiplier
(Courtesy: W. J. Townsend, E. E. Swartzlander and J. A. Abraham)
Bypassing can be 1-D or 2-D. In 1-D (one dimensional) bypassing, the bypassing logic
depends only on the value of the multiplier bits. This logic is easy to implement but does not
make full use of the bypassing technique. 2-D (two dimensional) bypassing depends on the bit
values of both the multiplier and the multiplicand. Its logic is harder to implement than
that of 1-D bypassing, but it makes fuller use of the advantage offered by the technique.
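The idea of 1-D bypassing can be shown with a short behavioural model. The sketch below is written in Python for illustration and is not the MUX-based cell of the actual design; the name `bypass_multiply_1d` is an assumption, and the count of active rows is only a rough proxy for switching activity, not a power figure.

```python
def bypass_multiply_1d(x, y, n=4):
    """Row-bypassing model: returns (product, number_of_active_rows)."""
    acc = 0
    active_rows = 0
    for i in range(n):
        if (y >> i) & 1:
            # Multiplier bit is 1: this row of adders does real work.
            acc += x << i
            active_rows += 1
        # Multiplier bit is 0: the row is bypassed and the previous
        # sum passes through unchanged, so nothing is added here.
    return acc, active_rows


if __name__ == "__main__":
    product, rows_used = bypass_multiply_1d(13, 5)
    print(product, rows_used)
```

For a multiplier word with many 0 bits, most rows stay idle, which is where the saving in dynamic power is expected to come from.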
Fig. 2.5: A 4x4 bit bypassing Multiplier implementation showing the construction using the
basic cell (courtesy: C. C. Wang and G. N. Sung)
Fig. 2.6: Basic Cell with bypassing logic for 1-D bypassing (courtesy: C. C. Wang and
G. N. Sung)
Fig. 2.7: Basic Cell with bypassing logic for 2-D bypassing (courtesy: C. C. Wang and
G. N. Sung)
The complexity of 2-D bypassing lies in the fact that not only must each row whose
multiplier bit is 0 be bypassed, but each column whose multiplicand bit is 0 must be bypassed
as well. The bypassing in a cell must ensure that the carry out and the sum from the previous
cell are still added at their respective weights, even if that particular cell is bypassed.
If care is not taken in the design, these carry outs and/or sums may be lost along the way
and never be compensated for.
The main logic signals defining the working of bypass logic in 2-D bypass multiplier are as
follows: []
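$\mathrm{muxR\_bl}_{ij} = X_j \cdot Y_i + X_{j-1} \cdot Y_i \cdot C_{i+1,j-2} \cdot X_j$, for $i = 1,\ j = 3$
$\mathrm{muxR\_bl}_{ij} = X_j \cdot Y_i + X_{j-1} \cdot Y_i \cdot C_{i+1,j-2} \cdot X_j + X_{j-2} \cdot Y_i \cdot C_{i,j-2}$, for $n-3 \ge i \ge 2,\ n-1 \ge j \ge 4$
$\mathrm{muxC\_bl}_{ij} = Y_i + X_{j-1} \cdot Y_i \cdot C_{i+1,j-2} \cdot X_j$, for $i = 1,\ j = 3$
$\mathrm{muxC\_bl}_{ij} = Y_i + X_{j-1} \cdot Y_i \cdot C_{i+1,j-2} \cdot X_j + X_{j-1} \cdot Y_i \cdot C_{i,j-2}$, for $n-3 \ge i \ge 2,\ n-1 \ge j \ge 4$
$\mathrm{muxL\_bl}_{ij} = X_j + X_{j-1} \cdot Y_i \cdot C_{i+1,j-2} \cdot X_j$, for $i = 1,\ j = 3$
$\mathrm{muxL\_bl}_{ij} = X_j + X_{j-1} \cdot Y_i \cdot C_{i+1,j-2} \cdot X_j + X_{j-1} \cdot Y_i \cdot C_{i,j-2}$, for $n-3 \ge i \ge 2,\ n-1 \ge j \ge 4$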
Thus we observe that in 2-D bypassing, the logic not only depends on the multiplier and
multiplicand bits but also on the carry out bits from previous rows (maximum of 2).
CHAPTER 3
PIPELINING IN MULTIPLIERS
3.1 Introduction
CHAPTER 4
MIXED PIPELINE MULTIPLIER
ARCHITECTURE
4.1 The need of Mixed Architecture
During the simulation of the Pipelined Wallace Tree multiplier (PWTM) and the Pipelined Bypass
multiplier (PBM), it was observed that the PWTM offered the lower delay whereas the PBM had
the upper hand in power dissipation, with a similar amount of total resources used. The
natural next step in designing a low power and high speed multiplier was therefore to try to
take advantage of both by mixing their architectures. This would meet our expectations of the
multiplier in terms of power and delay, while remaining practically implementable.
4.2 Architecture Outline
Most DSP processors work on floating point data types. That is, the numbers to be
multiplied are in the form of a mantissa and an exponent, with the mantissa represented in
1.M form and the exponent as $2^E$. The real multiplication to be done is between the
mantissas of the two numbers only, as the exponents need only be added.
One advantage offered by this representation is that the MSB of the mantissa is certain to
be 1, whereas the LSBs may or may not be 1. This implies that the real gain of the mixed
architecture is obtained if we use bypassing logic for the multiplication of the LSBs of the
mantissa, as they have a higher probability of containing 0s, and use the Wallace tree
structure for the multiplication of the MSBs so as to reduce the delay on that side.
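The split between mantissa multiplication and exponent addition can be made concrete with a small numerical sketch. The Python code below is an illustration only: the 16-bit mantissa format (a leading 1 followed by 15 fraction bits), the constant `FRAC_BITS`, the function name `fp_multiply`, the truncation of the product and the normalisation step are all assumptions, since the report does not specify a floating point format.

```python
FRAC_BITS = 15   # assumed format: 16-bit mantissa = leading 1 + 15 fraction bits


def fp_multiply(m_a, e_a, m_b, e_b):
    """Multiply (m_a / 2**15) * 2**e_a by (m_b / 2**15) * 2**e_b.

    m_a and m_b are 16-bit integers encoding 1.M, so their MSB is 1.
    Returns (mantissa, exponent) in the same assumed format.
    """
    product = m_a * m_b                 # 16x16 bit integer multiplication
    exponent = e_a + e_b                # exponents are simply added
    if product >> (2 * FRAC_BITS + 1):  # product of mantissas is >= 2.0
        mantissa = product >> (FRAC_BITS + 1)
        exponent += 1                   # renormalise back to 1.M form
    else:
        mantissa = product >> FRAC_BITS
    return mantissa, exponent           # low-order product bits are truncated


if __name__ == "__main__":
    # 1.5 * 2**3 (= 12) times 1.25 * 2**2 (= 5)
    m, e = fp_multiply(49152, 3, 40960, 2)
    print(m / 2**FRAC_BITS, e)
```

The only wide operation is the integer product of the two mantissas, which is the part a 16x16 bit multiplier of the kind proposed here would carry out.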
4.3 Structure
The inputs considered for multiplication are 16 bits of data each. Each input has been
divided into two parts of 8 bits, consisting of the MSBs and the LSBs, so the multiplication
is now carried out in 4 parts. The pipelined Bypass multiplier has been used for the
multiplication of the two LSB parts, and of an MSB part with an LSB part. This follows from
the explanation given above and reduces the power dissipation. The MSB parts of the two
binary numbers have been multiplied using the pipelined Wallace tree multiplier so as to
reduce the delay in multiplication.
The four products obtained are then fed to an adder arrangement which adds all these
products, taking care of their respective weights. The final result is the output of this
adder arrangement.
Fig. 4.1: Block Diagram of our Proposed Pipelined Mixed Multiplier Structure. X {X1, X0} and
Y {Y1, Y0} are the 16 bit inputs.
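The weighting of the four 8x8 bit products described in this section can be checked with a short arithmetic sketch. This is a Python illustration, not the project's Verilog; the names `split16` and `mixed_multiply` are assumptions, and ordinary integer multiplication stands in for the pipelined Wallace tree and Bypass units.

```python
def split16(v):
    """Split a 16-bit value into (MSB byte, LSB byte)."""
    return (v >> 8) & 0xFF, v & 0xFF


def mixed_multiply(x, y):
    """16x16 bit product assembled from four 8x8 bit partial products."""
    x1, x0 = split16(x)
    y1, y0 = split16(y)
    hh = x1 * y1     # MSB x MSB: Wallace tree unit in the proposed structure
    hl = x1 * y0     # MSB x LSB: Bypass unit
    lh = x0 * y1     # LSB x MSB: Bypass unit
    ll = x0 * y0     # LSB x LSB: Bypass unit
    # Adder arrangement: each product is shifted to its weight before adding.
    return (hh << 16) + ((hl + lh) << 8) + ll


if __name__ == "__main__":
    x, y = 0xBEEF, 0x1234
    print(mixed_multiply(x, y) == x * y)
```

The shifts by 16 and 8 bits correspond to the weights that the final adder arrangement has to respect.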
CHAPTER 5
SIMULATION AND RESULT
5.1 Introduction
The tool used for simulation and verification of the results was Xilinx ISE (versions 11 and
12.2). The hardware implementation has been targeted at a Virtex-5 device (XCV110T).
The complete hardware coding has been done in Verilog. The whole implementation uses
structural coding, which has the advantage of avoiding the redundant hardware that other
modelling styles may generate, and is easily and practically implementable.
For power analysis, we have used the XPower Analyzer tool of Xilinx.
5.2 Basic Multiplier
For an initial comparison and understanding of the differences between the basic multiplier
structures, a 4x4 bit multiplier was implemented in each of the basic architectures explained
above and their performance was evaluated.
Architecture             Delay (ns)   Power Dissipation (mW)   Area overhead
Array                    5.603
Bypass                   6.538
Wallace Tree             6.685
Fig. 5.1: Technology Schematic of 4x4 bit Array Multiplier. (Generated from Xilinx Synthesis)
5.3 Pipelined Multiplier Structure (8x8 bit)
Architecture             Delay (ns)   Power Dissipation (mW)   Resources used
Array (Non-pipelined)    10.97        422                      -
Pipelined Wallace Tree   6.36         436 (56 mW Dynamic)      0.859% of Slice Registers, 0.796% of Slice LUTs
Pipelined Bypass         7.26         396 (16 mW Dynamic)      1.161% of Slice Registers, 0.776% of Slice LUTs
Maximum Clock Frequency:
Pipelined Wallace Tree Multiplier: 411.00 MHz
Pipelined Bypass Multiplier: 423.99 MHz
Fig. 5.2: Technology Schematic of 4x4 bit Bypass Multiplier. (Generated from Xilinx Synthesis)
Fig. 5.3: Technology Schematic of 4x4 bit Wallace Tree Multiplier. (Generated from Xilinx
Synthesis)
5.4 Pipelined Mixed Architecture (16x16 bit)