Design of High Performance Arithmetic
Circuits using Novel Two Transistor
(2T) XOR Gates
Thesis submitted in partial fulfilment
of the requirements for the degree of
Master of Science by Research
In
Electronics and Communication Engineering
by
Himani Upadhyay
201232697
International Institute of Information Technology, Hyderabad
(Deemed to be University)
Hyderabad-500032, INDIA
October 2015
Copyright© Himani Upadhyay, 2015
All Rights Reserved
Dedicated to my Guide, Family and Friends
INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY
HYDERABAD, INDIA
CERTIFICATE
This is to certify that the work presented in this thesis, titled “Design of High Performance
Arithmetic Circuits using Novel Two Transistor (2T) XOR Gate” by Himani Upadhyay
(201232697), submitted in partial fulfilment of the requirements for the award of the degree of
Master of Science (by Research) in VLSI & Embedded Systems, has been carried out under my
supervision and has not been submitted elsewhere for a degree.
Date Advisor: Prof. S R Chowdhury
Assistant Professor
IIIT, Hyderabad
Acknowledgements
The journey towards the completion of this dissertation has been an amazing one. I would like
to take this opportunity to acknowledge and appreciate the efforts of the people who helped me
during my research and while documenting this thesis. I extend my deepest gratitude to my
advisor, Prof. Shubhajit Roy Chowdhury, for his motivation, guidance, support and immense
knowledge, which played a great role in the development of the ideas in this thesis. I could
never have imagined a better advisor and mentor for my Master’s study. His valuable feedback
and flexible nature led to improvements in many aspects of the work and its approach.
I would also like to thank my colleagues in the CVEST lab for extending their support and for
being wonderful company. I am deeply grateful to my M. Tech friends Pankaj, Rutanshu, Rachit,
Aswathy, Anamika, Rajeev and many more, who have always been there to encourage and support
me throughout my work.
I would like to express my appreciation for the love and support of my family, who have always
been there through the thick and thin of my degree. I thank my mother Vindvasini Upadhyay and
father Narendra Prakash Upadhyay for their selfless love, infinite trust and true confidence in
me. I thank my sisters Aekta and Shivangi for their encouragement during my study. I would also
like to thank my brother Himanshu for the strong will he builds in me. With that, I hope to work
well and with pride in my future life.
I offer my thanks from the deepest part of my heart to GOD for filling my heart and soul with
inner enthusiasm, spirit of hard work and patience to complete my research.
Abstract
Ever since the transistor was invented in 1947, low area, low power and high speed have been
fundamental concerns for researchers in transistor based technology. Over the last few years,
minimization of power consumption has emerged as a key design constraint in very large scale
integrated (VLSI) circuit design, owing to the increasing demand for portable consumer
electronic products. Mobile phones, smart cards, PDAs and assistive listening devices such as
hearing aids are examples of such products. Many design styles exist in the literature that
address low power consumption, including complementary MOS, pass transistor logic and
transmission gate based logic. Low power designs can be developed at the system level,
technology level, architectural level and circuit level. For a combinational circuit, power can
be saved through a proper choice of logic style, because all the important parameters governing
power dissipation (switching capacitance, transition activity and short circuit currents) are
strongly influenced by the logic style of the circuit. Improving the logic style brings
advantages in terms of power, delay and layout implementation.
XOR gates and full adders are basic building blocks of various circuits such as Central
Processing Units (CPUs) and Digital Signal Processors (DSPs). Optimizing the XOR gate and the
full adder in terms of power consumption therefore helps to achieve low power circuits. This
thesis presents a novel design of a two transistor (2T) XOR gate and its application to the
design of an 8 bit x 8 bit multiplier. The design relies on suitably biasing the two PMOS pass
transistors and engineering their threshold voltage. Using the 2T XOR gates, a six transistor
(6T) full adder has been realised. Detailed simulations have been carried out to compare the
proposed 2T XOR gate and 6T full adder against existing XOR gates and full adders available in
the literature with respect to power delay product (PDP), noise margin and area. Significant
improvements in PDP have been achieved with the 2T XOR gate with respect to the existing XOR
gates. The area of the 6T adder has been found to be lower than that of the 8T adder, which
indicates that the 2T XOR gate occupies less silicon area than the 3T XOR gate.
The thesis also presents architectures of 5:3 compressors for low power multiplication. The
architectures use the novel two transistor XOR gate and a two transistor multiplexer design for
logic level implementation. In contrast with the conventional designs, the modified and proposed
compressor designs reduce the stage delay, transistor count, power delay product (PDP), energy
delay product (EDP) and area by using combinations of XOR-XNOR gates, MUX circuits and
transistor level implementation.
An 8 bit x 8 bit array multiplier has also been implemented using the 6T adder, and its
performance has been analysed and compared with similar multipliers built from peer adder
designs available in the literature. The power delay product (PDP) of the proposed multiplier
has been found to be as low as 1.854 pJ using the UMC 65-nm CMOS process. The design of the
8 bit x 8 bit multiplier has been extended to an 8 bit multiply-accumulate (MAC) unit, which has
been simulated using the 65-nm CMOS process. A delay of 3.977 ns and a power dissipation of
1.107 mW have been obtained with the MAC unit.
All the circuit simulations in this thesis have been carried out systematically. To validate
the applicability and accuracy of the transistor level models, process, voltage and temperature
(PVT) variation analysis has been performed. The proposed designs are a better choice for low
frequency (≤ 50 MHz) applications. The CADENCE Spectre simulation tool has been used from the
schematic design of the structures through to layout. ASSURA, a design verification suite of
tools within the Virtuoso custom design platform, has been used for layout purposes. Simulation
studies have been carried out in UMC 65-nm, 90-nm and 130-nm technologies to confirm the
interdependence of the proposed model.
Contents
List of Tables
List of Figures
List of Symbols and Equations
1. Introduction
1.1. The Historical Perspective
1.2. Prior Works
1.2.1. XOR Gates
1.2.2. Adders
1.2.3. Compressors
1.2.4. Multipliers
1.3. Motivation
1.4. Problem Statement
1.5. Contribution of the Thesis
1.6. Thesis Organization
2. Design of 2T XOR Gate
2.1. What is a XOR Gate?
2.2. Design of Two Transistor XOR Gate
2.2.1. Working of 2T XOR Gate
2.2.2. Simulation and Performance Analysis of Proposed 2T XOR Gate
2.2.3. Results and Discussions
3. Design of 6T Adder using Novel 2T XOR Gates
3.1. What is an adder?
3.1.1. Half Adders
3.1.2. Full Adders
3.2. Design of Proposed Six Transistor Full Adder
3.2.1. Simulation and Performance Analysis of Proposed 6T Full Adder
3.2.2. Layout Design of Proposed Six Transistor Adder
3.2.3. Results and Discussions
4. Design of 5:3 Compressor using Novel 2T XOR Gates
4.1. What are Compressors in VLSI Design?
4.2. Architecture of Proposed 5:3 Compressors
4.2.1. Circuit Design of Proposed 5:3 Compressors
4.2.2. Simulation and Performance Analysis of Proposed 5:3 Compressor Architectures
4.2.3. Layout Design of Proposed 5:3 Compressor Architectures
4.2.4. Results and Discussions
5. Design of 8 Bit x 8 Bit Multiplier using Novel 2T XOR Gates
5.1. What is a Multiplier?
5.1.1. Multiplication Algorithm
5.2. Design of Proposed 8 Bit x 8 Bit Multiplier
5.2.1. Array Multiplier
5.2.2. Simulation and Performance Analysis of Proposed 8 Bit x 8 Bit Multiplier
5.2.3. Layout Design of Proposed 8 Bit x 8 Bit Multiplier
5.2.4. Overview of Multiply and Accumulate (MAC) Unit
5.2.5. True Single Phased Clocked Register (TSPCR)
5.2.6. Results and Discussions
6. Conclusions
6.1. Summary of present work
6.2. Limitations of thesis work
6.3. Future work
List of Publications
Bibliography
List of Tables
1. Time and Area Requirements of Different Types of Adders
2. Simulation Logic Levels of 2T XOR Gate at Reverse Bias of 320 mV using 65-nm Technology
3. Comparison of Performance Analysis of Different XOR Gates
4. Comparison Result of Noise Margin of Different XOR Gates
5. Truth Table for Half Adder
6. Truth Table for Full Adder
7. Comparison of Performance Analysis of Different Adders
8. Comparative Study of Area of Different Adders
9. Counter Property of 5:3 Compressors
10. Comparative Analysis of Performance of Different 5:3 Compressors
11. Comparative Study of Area of Different 5:3 Compressors
12. Performance Analysis of 8 Bit x 8 Bit Multiplier using Different Adders
13. Comparative Study of Area of 8 Bit x 8 Bit Multiplier using Different Adders
List of Figures
1.1 Earlier Inventions
1.1(a). Diagrammatic Representation of IGFET
1.1(b). The 4004 Microprocessor [2]
1.1(c). The 8080 Microprocessor [3]
1.2 Graphical Depiction of Moore’s Law [4]
1.3 Static CMOS XOR Gates
1.4 Eight Transistor XOR Gate with CMOS Transmission Gate
1.5 Six Transistor XOR Gates
1.6 Previous Works on Design of 4 Transistor XOR Gates
1.7 Design of 3T XOR Gate
1.8 Improved Design of 3T XOR Gate
1.9 Adder Cell with Three Modules
Module 1: Generate XOR and XNOR functions
Module 2: Sum function
Module 3: Carry function
1.10 Topologies of Different Full Adder Designs with Reduced Number of Transistors over the years
1.10(a). 28 Transistor Full Adder
1.10(b). 20 Transistor Full Adder
1.10(c). 16 Transistor Full Adder
1.10(d). 14 Transistor Full Adder
1.10(e). 10 Transistor Full Adder
1.10(f). 8 Transistor Full Adder
1.11 Different Implementations of Compressor Designs
1.11(a). Conventional Design of 3:2 Compressors
1.11(b). Conventional Design of 4:2 Compressors
1.11(c). Conventional Design of 5:3 Compressors
1.11(d). Existing Implementation of 5:3 Compressors
1.12 CMOS Implementations of
1.12(a). MUX
1.12(b). XOR-XNOR
1.13 Algorithm for 8 bits X 8 bits Wallace Tree Multiplier [55]
1.14 Serial-Parallel Multiplier
2.1 Logic Symbol of XOR Gate
2.2 The wiring diagram depicting the control of single light source with two switches. The light is on when either both switches are switched up or both down
2.3 Proposed Design of 2T XOR Gate
2.4 Diode Connected NMOS
2.5 Input and Output Waveforms
2.5(a). XOR Gate at Reverse Bias of 320 mV
2.5(b). XOR Gate at Reverse Bias of 270 mV
2.6 Calculation of Propagation Delay
2.7 PDP (vs) Technology for XOR Gate Architectures
3.1 Circuit Diagram of Half Adder
3.2 Logic Circuit of Full Adder
3.3 Schematic Diagram of Proposed 6T Adder
3.4 Post Layout Simulation of 6 Transistor Adder at 65-nm Technology
3.5 PDP (vs) Technology for Adder Architectures
3.6 Layout View of Proposed 6T Full Adder
3.7 Area (vs) Technology for Adder Architectures
4.1 Block Diagram of 5:3 Compressors
4.2 Architecture of Proposed 5:3 Compressor
4.3 Two Transistor 2x1 Multiplexer Design
4.4 Schematic View of 5:3 Compressors
4.4(a). 3T XOR and 2T 2x1 MUX Compressor
4.4(b). 2T XOR and 2T 2x1 MUX Compressor
4.5 EDP (vs) Type of Compressor Circuit in Different Technology
4.6 Area (vs) Type of Compressor Circuit in Different Technology
4.7 Layout View of Proposed 5:3 Compressors in 90-nm Technology
4.7(a). 3T XOR and 2T 2x1 MUX
4.7(b). 2T XOR and 2T 2x1 MUX
5.1 Basic Multiplication
5.2 Signed Multiplication Algorithm
5.3 Product Matrix
5.4 Example: Multiplication of 8 bit x 8 bit Binary Numbers
5.5 Array Multiplier Architecture
5.5(a). An 8 bit x 8 bit Array Multiplier
5.5(b). Basic Building Block
5.6 PDP (vs) Technology for Different Multiplier Architectures
5.7 Layout Design of 8 Bit x 8 Bit Multiplier
5.8 Area (vs) Technology for Different Multiplier Architectures
5.9 Basic Multiply and Accumulate (MAC) Unit
5.10 True Single Phased Clocked Register (TSPC)
5.11 Positive and Negative Latches
List of Symbols and Equations
Symbols
$\alpha_{0\to1}$ = Switching activity factor
$\gamma$ = Bulk threshold coefficient
$\varphi_{0}$ = Fermi Potential
$\alpha_{v}$, $\alpha_{w}$ = Process dependent parameters
Equations
1. Equation for Total Power Consumption of VLSI Circuit
2. Equation for Representation for Static CMOS XOR Gate
3. Equation for XOR gates
4. Equation for the relation exhibited between channel length (L), width (W) and source to bulk voltage ($V_{SB}$) of transistor
5. Equation of Flicker Noise
6. Equation for Propagation Delay
7. Equation for subthreshold current
8. Equation for sum and carry of half adders
9. Equation for sum and carry of full adders
10. Equation for design of 5:3 compressors
11. Equation for addition of partial products in multipliers
12. Equation for working of array multiplier
Chapter 1
Introduction
1.1 The Historical Perspective
Digital electronic systems are undergoing revolutionary growth, driven by great improvements in
technology. Early digital electronic systems were based on magnetically controlled relays (or
switches) and were used mainly to implement very simple logic networks. Train safety systems,
which are still in use today, are an example of this kind of network. Vacuum tubes were the
dominant electronic device technology until the 1950s. The change in technology began in 1947
at Bell Telephone Laboratories with the invention of the transistor, followed by Shockley’s
development of the bipolar transistor in 1949. The first bipolar logic gate, introduced by
Harris, appeared in 1956, and it took even longer to translate it into a set of commercial
integrated-circuit logic gates, called the Fairchild Micro-logic family. The first truly
successful IC logic family was Transistor-Transistor Logic (TTL), which was pioneered in 1962.
The issues with bipolar junction transistors, particularly with respect to power dissipation,
scaling and noise immunity, became more and more serious over time, and bipolar logic ultimately
gave way to Metal Oxide Semiconductor Field Effect Transistors (MOSFETs).
The basic principle behind the MOSFET (originally called IGFET), shown in Figure 1.1(a), was
proposed in a patent by J. Lilienfeld (Canada) as early as 1925 and, independently, by O. Heil
in England in 1935 [1]. MOS digital integrated circuits started to take off in full swing in the
early 1970s. Remarkably, the first MOS logic gates introduced were of the CMOS (Complementary
MOS) variety, and this trend continued until the late 1960s. The first practical MOS integrated
circuits were implemented in PMOS-only logic and were used in applications such as calculators.
The second era of the digital integrated circuit revolution was inaugurated with the
introduction of the first microprocessors by Intel in 1971 (the 4004 microprocessor [2]
(Figure 1.1(b))) and 1974 (the 8080 microprocessor [3] (Figure 1.1(c))). These processors were
implemented in NMOS-only logic, which has the advantage of higher speed over PMOS-only logic
because the mobility of electrons in NMOS devices is higher than that of holes in PMOS devices.
Simultaneously, MOS technology also enabled the realization of the first high density
semiconductor memories.
Figure 1.1(a). Diagrammatic Representation of IGFET
Figure 1.1(b). The 4004 Microprocessor [2]
Figure 1.1(c). 8080 Microprocessor [3]
Figure 1.1. Earlier Inventions
The driving force of integrated electronics is to minimize the silicon area required by a
circuit while also reducing power consumption and delay. Reducing the number of transistors per
function has allowed more and more applications to be integrated, and the overhead in terms of
silicon area and power has fallen accordingly. The growth of transistor density in VLSI design
is well described by Moore’s Law [4], as shown in Figure 1.2. Moore’s Law states that “The
number of transistors per square inch on integrated circuits had doubled every year since the
integrated circuit was invented”. Thus, reducing the transistor count of circuits has been a
main focus of researchers for many years and continues to be so [5].
The challenging requirements of emerging low power, high speed communication and digital signal
processing chips can be addressed by exploiting well-engineered deep submicron MOSFET
technologies. The performance of the basic arithmetic circuits used to implement complex
algorithms such as convolution, correlation and digital filtering defines the performance of
many larger modules of Digital Signal Processors (DSPs). Over the last decade, the semiconductor
industry has witnessed explosive growth in the integration of sophisticated multimedia-based
applications into mobile electronic gadgets. However, power consumption is the critical area of
concern in this arena and has to be reduced for a given operating frequency. Moreover, the
explosive growth in the demand for and popularity of portable electronic products drives
designers to strive for smaller silicon area, higher speed, longer battery life and enhanced
reliability. XOR-XNOR circuits are basic building blocks in various circuits, especially
arithmetic circuits (adders and multipliers), compressors, comparators, parity checkers, code
converters, error-detecting or error-correcting codes and phase detectors. Adders and
multipliers, being fast arithmetic computation cells that are widely used in many VLSI circuits,
are frequent subjects of research.
Figure 1.2. Graphical Depiction of Moore’s Law [4]
The rise in chip density and the increase in power consumption of VLSI systems have further
added to reliability and packaging problems. The packaging and cooling costs of VLSI systems
also go up with high power dissipation. Nowadays, low power consumption, along with minimum
delay and area requirements, is one of the important design considerations for IC designers.
There are three major sources of power consumption in CMOS VLSI circuits:
1) Dynamic switching power due to charging and discharging of parasitic capacitances,
2) Short circuit power due to direct current flow from power supply to ground when the
p-network and n-network conduct simultaneously,
3) Leakage power due to leakage currents, which includes both the subthreshold leakage and
reverse bias leakage.
Different logic styles, each with its own advantages in terms of power, delay and layout
implementation, have been proposed for high speed and low power circuits [6].
The increasing demand for low-power Very Large Scale Integration (VLSI) can be addressed at four
different design levels: the architectural, circuit, layout and process technology levels [7].
At the circuit level, considerable potential for power savings exists through the proper choice
of a logic style for implementing combinational circuits. This is because all the important
parameters governing power dissipation (switching capacitance, transition activity and
short-circuit currents) are strongly influenced by the chosen logic style. At the technology
level, power consumption scales down as the channel length shrinks with each technology
generation. Power savings can thus be achieved through improvements in the fabrication process
such as smaller feature sizes, very low voltages, and interconnects and insulators with low
dielectric constants. The performance aspects depend on the application, the kind of circuit to
be implemented and the design technique used. Investigations of low-power logic styles reported
in the literature so far, however, have mainly focused on particular logic cells, namely full
adders, used in some arithmetic circuits. In this thesis, these observations and surveys have
been kept in mind, starting with a basic logic gate and extending the idea to a much broader set
of combinational arithmetic circuits. The power dissipation characteristics of various existing
logic styles are compared qualitatively and quantitatively through actual logic gate
implementations and simulations under experimental circuit arrangements and operating
conditions [8]. With the reduction of power at different design levels, the number of
transistors is also reduced. Reducing the number of transistors needed to design a circuit
reduces the silicon area during fabrication, giving way to compact digital logic design. Similar
investigations of sequential elements, such as latches and flip-flops, are not included in this
work but can be found elsewhere in the literature [7].
1.2 Literature Surveys
In the past decade a lot of work has been done and various architectural designs have been
proposed in the areas covered by this thesis. From the basic building block, i.e. the XOR gate,
up to the DSP (digital signal processor) level, novel designs have been implemented to achieve
minimum silicon area, minimum power and high speed. Working with nanometre technologies and
reducing the area have also been the prime focus in the preceding decades. Microprocessors and
digital signal processors rely on the efficient implementation of generic arithmetic logic
units and floating point units to execute dedicated algorithms.
1.2.1 XOR Gates
With the ever increasing demand for high speed processing and battery economy, the demand for
low power VLSI systems has increased steadily over the past decade. In this regard, the full
adder receives a lot of attention since it forms a basic element in any processor design. From
the gate level design point of view, it is well known that a full adder can be efficiently
implemented using XOR gates. The ‘sum’ can be implemented using two cascaded XOR gates and the
‘carry’ as a multiplexed operation on transistors. The XOR gate thus forms the primary block
that is used to design the full adder in Chapter 3.
Due to the increasing number of transistors on a digital chip, reducing power dissipation is an
important criterion in designing XOR gates. The total power consumption of VLSI circuits is
given by the following general equation [1]:
$$P = f \cdot C_{load} \cdot V_{DD}^{2} \cdot \alpha_{0\to1} + V_{DD} \cdot t_{SC} \cdot I_{SC} \cdot f + V_{DD} \cdot I_{l} \qquad (1)$$
where
$\alpha_{0\to1}$ = Switching activity factor for transitions,
$V_{DD}$ = Supply voltage,
$C_{load}$ = Output load capacitance,
$f$ = System clock frequency during transitions,
$I_{SC}$ = Short circuit current flowing from power supply to ground,
$t_{SC}$ = Time duration for flow of short circuit current,
$I_{l}$ = Leakage current.
The first term on the right hand side of Equation (1) represents the dynamic component of
power, the second term denotes the short circuit power and the third term defines the leakage
power. As the first term of Equation (1) shows, power consumption can be minimised by reducing
the supply voltage or the load capacitance, or by lowering the operating frequency of the
circuit. The switching activity is primarily addressed at the architectural and Register
Transfer Level (RTL) during synthesis. At the circuit level, the other factors in dynamic power
play a dominant role [8]. As the second term indicates, avoiding a direct path from $V_{DD}$ to
ground by balancing the rise and fall times of the transistor inputs helps to reduce power
consumption. Reducing the supply voltage leads to poor performance if the threshold voltage is
not scaled accordingly. To accomplish low-voltage/low-power digital designs, both supply and
threshold voltage scaling have to be taken care of, as explained in the literature [9–11]. Due
to circuit topology, the optimal operating point may vary significantly between sub-circuits,
depending on the activity and logic depth. The application of different supply voltages can
impose severe area penalties for fixed and inherent threshold voltages for the general selected
processes. The inherent variation in threshold voltages and supply will normally further reduce
the advantage of operation at ultra-low supplies [10].
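As a rough numerical illustration of Equation (1), the following Python sketch evaluates the three power components separately. The function name and all operating-point values are hypothetical and chosen only for illustration; they are not taken from the simulations reported in this thesis.

```python
import math

def power_components(f, c_load, vdd, alpha, t_sc, i_sc, i_leak):
    """Return (dynamic, short_circuit, leakage) power in watts, following Equation (1)."""
    dynamic = f * c_load * vdd ** 2 * alpha   # first term: switching power
    short_circuit = vdd * t_sc * i_sc * f     # second term: short circuit power
    leakage = vdd * i_leak                    # third term: leakage power
    return dynamic, short_circuit, leakage

# Hypothetical operating point (illustrative values only, not thesis data).
params = dict(f=50e6, c_load=10e-15, vdd=1.0, alpha=0.5,
              t_sc=20e-12, i_sc=10e-6, i_leak=1e-9)

dyn, sc, leak = power_components(**params)
total = dyn + sc + leak

# Halving the supply voltage (all else fixed) cuts the dynamic term to one quarter.
dyn_half, sc_half, leak_half = power_components(**{**params, "vdd": 0.5})

print(f"dynamic={dyn:.3e} W  short-circuit={sc:.3e} W  leakage={leak:.3e} W")
print(f"total={total:.3e} W  dynamic at VDD/2={dyn_half:.3e} W")
```

With the other quantities held fixed, the dynamic term depends quadratically on the supply voltage, whereas the other two terms, as written in Equation (1), depend on it only linearly. This is one way to see why supply scaling is such an effective lever for dynamic power.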
Circuit realization for low power and low area has become an important issue as integrated
circuits move towards very high integration density and high operating frequencies. Because XOR
and XNOR gates play an important role in various circuits, especially arithmetic circuits,
optimized XOR and XNOR designs that achieve low power, small size and low delay are needed. The
primary concern in designing an XOR-XNOR gate is to obtain low power consumption and low delay
in the critical path, and a correct output voltage swing, with the least number of transistors.
The XOR gate is an elementary building block of digital circuits and there is ongoing research
to enhance its performance.
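To make the building-block role of the XOR gate concrete, the full adder decomposition mentioned at the start of this section (sum from two cascaded XOR gates, carry from a multiplexer) can be sketched at the behavioural level in Python. The helper names and the particular multiplexer wiring (select = A ⊕ B, data inputs A and the carry-in) are assumptions made for illustration; this is one common arrangement and not necessarily the transistor-level circuit proposed in Chapter 3.

```python
from itertools import product

def xor2(a, b):
    """Behavioural 2-input XOR on bits (0 or 1)."""
    return a ^ b

def mux2(sel, d0, d1):
    """Behavioural 2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0

def full_adder(a, b, cin):
    """Full adder built from two cascaded XORs and one multiplexer."""
    p = xor2(a, b)           # first XOR: propagate signal
    s = xor2(p, cin)         # second XOR: sum bit
    cout = mux2(p, a, cin)   # p = 0 means a == b, so carry = a; p = 1 means carry = cin
    return s, cout

# Exhaustive check against integer addition.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin
```

The exhaustive check over all eight input combinations confirms that this XOR-plus-multiplexer structure reproduces binary addition.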
Ever since its inception, the XOR gate, a basic building block of digital VLSI circuits, has
undergone considerable improvement, motivated by three basic design goals, viz. minimizing the
transistor count, reducing the power consumption and increasing the speed [4-25]. Hosseinzadeh,
Jassbi and Navi emphasized that circuit performance can be improved through transistor count
minimization [5]. XOR gates play an important role in digital systems including arithmetic
circuits, encryption circuits, comparators, parity checkers and so on. Enhancing the performance
of the XOR gates can significantly improve the performance of these circuits. Many design
architectures and techniques have been developed to design XOR gates with reduced power
consumption [14]. The literature survey reveals a wide spectrum of XOR gates that have been
realized over the years. The dominant concern in designing an XOR gate is to obtain a correct
output voltage swing with the least number of transistors and, additionally, low power
consumption and low delay in the critical path.
There are many logic styles in which XOR gates can be designed, such as Pass Transistor Logic
(PTL), Double Pass Transistor Logic (DPL), inverter based logic circuits, transmission gate
based XOR gates and XOR gates with feedback transistors [26]. These techniques use different
methods to design XOR gates with different transistor counts, the minimum being three [24, 25].
Complementary MOS uses dual networks to implement a given function [6, 7, 14]. The first part
consists solely of a pull-up PMOS network while the second part consists of a pull-down NMOS
network. This technique is popular and produces results that are widely accepted, but it
requires a larger number of transistors. A static CMOS XOR gate is shown in Figure 1.3(a). The
circuit can operate with full output voltage swing. Different realizations of the XOR gate are
expressed in Equations (2)-(5), where A and B are the inputs and Z is the corresponding output
value.
Z = A ⊕ B = (A + B). (A′ + B′) (2)
Z′ = (A ⊕ B) ′ = {(A + B). (A′ + B′)}′ (3)
Z′ = AB + A′B′ (4)
Z = (AB + A′B′) ′ = A ⊕ B (5)
An alternative realization of the static XOR circuit with complementary CMOS transistors, using
the above input-output relations, is shown in Figure 1.3(b).
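The equivalence of the forms in Equations (2)-(5) can be checked exhaustively over the four input combinations. The short Python sketch below does this at the Boolean level only; the function names are hypothetical and the code says nothing about the transistor-level circuits of Figure 1.3.

```python
from itertools import product

def inv(x):
    """Logical complement of a bit (0 or 1)."""
    return 1 - x

def xor_forms(a, b):
    """Evaluate the right-hand sides of Equations (2) to (5) for input bits a and b."""
    eq2 = (a | b) & (inv(a) | inv(b))        # Z  = (A + B).(A' + B')
    eq3 = inv((a | b) & (inv(a) | inv(b)))   # Z' = {(A + B).(A' + B')}'
    eq4 = (a & b) | (inv(a) & inv(b))        # Z' = AB + A'B'
    eq5 = inv(eq4)                           # Z  = (AB + A'B')'
    return eq2, eq3, eq4, eq5

for a, b in product((0, 1), repeat=2):
    eq2, eq3, eq4, eq5 = xor_forms(a, b)
    assert eq2 == eq5 == (a ^ b)             # both Z forms equal XOR
    assert eq3 == eq4 == inv(a ^ b)          # both Z' forms equal XNOR
```

Equations (2) and (5) therefore describe the same XOR output, while Equations (3) and (4) describe its complement, the XNOR function.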
Figure 1.3(a)
Figure 1.3(b)
Figure 1.3. Static CMOS XOR gates
The early designs were also based on conventional XOR gates with eight transistors [14, 16]
(Figure 1.4) and six transistors [14, 16] (Figure 1.5), which were used in many applications.
The drawbacks of the 8 transistor XOR gate were its need for complementary inputs and its lack
of driving capability, caused by the transmission gates used. Figure 1.5(a) shows a six
transistor XOR gate in which, when A = “High”, the output is the complement of input B and the
transmission gate plays no role. When A = “Low”, the transmission gate passes the signal B to
the output directly and fully. So, both A’B and AB’ give a good signal level, and the function
is complete for all input cases. In Figure 1.5(b), an additional trailing inverter improves the
poor signal that comes from the output of the 4-transistor XNOR structure and outputs a good
signal level. In these two cases, complementary input signals are not required, and the driving
capability is also better than that of Figure 1.4. However, these structures still have some
drawbacks, such as the lack of full driving capability at the output or a longer delay.
Figure 1.4. Eight Transistor XOR Gate with CMOS Transmission Gate
Figure 1.5(a) Figure 1.5(b)
Figure 1.5. Six Transistor XOR Gates
Over the course of time, designs employing four transistors came into the picture [15, 16, 17, 18,
19, 20, 21, 22, 23]. D. Radhakrishnan proposed design techniques for implementation
employing formal design procedures using K-maps and the pass network theorem [27]. The
concept realized XOR-XNOR circuits using pass transistors. The CMOS
transmission gate logic XOR gate [14, 16] was replaced by the four transistor XOR gates shown
in Figure 1.6(a) and Figure 1.6(b), presented by Wang, Fang and Feng [16]. For the structure
in Figure 1.6(a), when A = “High”, A’ must be “Low”. The A and A’ signals are connected
to the 𝑉𝐷𝐷 end of the PMOS and the 𝑉𝑆𝑆 end of the NMOS in the second inverter, respectively. The
second inverter then functions like a standard inverter and outputs the signal B’.
Therefore, the output signal will be a perfect AB’ signal. On the other hand, when A = “Low”,
A’ must be “High”. The output of the second inverter will be a poor signal B because it
transmits a “High” signal through the NMOS and a “Low” signal through the PMOS. That is, if only 4
transistors are used to implement an XOR function based on the inverter configuration, its output will
be complete for AB’ but poor for A’B. To remedy this, an additional transmission
gate can be added, as shown in Figure 1.5.
These novel XOR gate architectures operated without complementary inputs, the need for which
was a major drawback of previous conventional XOR gate designs based on complementary
transmission gates. Later, power consumption and delay were reduced with a reformed XOR gate
layout without 𝑉𝐷𝐷, as shown in Figure 1.6(c) [17]. The XOR gates demonstrated were power
supply-less XOR gates (P-XOR) and, similarly, the XNOR gates were groundless XNOR gates (G-XNOR) with no
ground connection. The output for AB=01, 10, 11 will be complete, but for AB=00 it will differ from the
logic low level by the threshold voltage of the PMOS. So, the defect of this 4 transistor XOR gate is
that the output level will be higher or lower than in the normal case by one threshold voltage (𝑉𝑇). The
study of diverse XOR gates in Figure 1.6(d) by Bui, Wang, Jiang and Al-Sheraidah led to the
design of XOR gates with some improvement in PDP, though the silicon area remained the same
[18, 19].
Figure 1.6(a) [16] Figure 1.6(b) [16]
Figure 1.6(c) [17] Figure 1.6(d) [18, 19]
Figure 1.6. Previous Works on Design of 4 Transistor XOR Gates
Shams, Darwish and Bayoumi further studied the various forms of XOR gate designs given by
Bui, Wang and Jiang, offering a further optimization of performance [20]. Striking progress
came with the three transistor XOR gate design by Roy Chowdhury et al., which combines a
CMOS inverter with a pass transistor [24], as shown in Figure 1.7. The design suffers from
two drawbacks: firstly, voltage degradation due to threshold drop and, secondly, current
feedback through the transistor with aspect ratio 2/1 when the inputs are A=1 and B=0. This
can be overcome by decreasing the W/L ratio of that transistor, but doing so greatly affects the
current carrying capability, thereby reducing the steady state power dissipation. A different
version of the three transistor XOR gate, given by Tripti Sharma, K. G. Sharma, B. P. Singh
and Neha Arora [25], is shown in Figure 1.8. The reported simulation comparison showed it to
be the best among three transistor XOR gates, with the minimum power, delay and PDP
compared with other 3T XOR gates in literature.
Figure 1.7. Design of 3T XOR Gate [24]
Figure 1.8. Improved Design of 3T XOR gate [25]
Reducing transistor count, area and power delay product have remained the three basic goals in
refining XOR gate designs over the years [4-25]. With the objective of further reducing
the transistor count, a novel design of a two transistor XOR gate is proposed in this thesis. The
proposed XOR gate is found to occupy less silicon area, with a large improvement
in power-delay product.
1.2.2 Adders
Adders are indispensable in VLSI circuits, and their efficient implementation affects the
performance of the entire system [8]. There is a high demand for portable devices such as PDAs
and cell phones, which require high speed processing with low power consumption. Addition
is one of the basic and most commonly used arithmetic operations in signal processors, digital
filters, application specific Digital Signal Processors (DSPs), microprocessors and many other
applications. Designers face several basic constraints, such as high
throughput, low power consumption, high speed and small silicon area. Some applications of
adders are in the Arithmetic
Logic Unit (ALU), the floating-point unit, subtraction, multiplication, division and address
generation for cache memory access.
The purpose of integrated electronics is to compress complex electronic circuits into the minimum
area while reducing power dissipation and delay. With technological advancement,
reducing the number of transistors and ultra-low power design have become the driving forces
for integrating more and more applications without incurring any overhead in terms of
silicon area. The performance of a design is substantially governed by three important factors,
viz. area complexity, delay performance and regularity of interconnection. Regularity of
interconnection refers to the way transistors are laid out, routing the interconnects in the best
possible way and complying with the layout rules. The area of the circuit also depends on the
interconnecting wires, which take up most of the area of a VLSI circuit. Different logic styles
have been proposed over the years, each trading off one performance aspect at the expense
of another. The circuit delay is affected by the number of transistors in series, the wiring
capacitances of the interconnections, the transistor sizes and the number of inversion levels.
A full adder can be implemented using either one logic style or more than one
logic style. Power, in turn, is one of the vital resources and
a prime concern for designers. Power dissipation depends upon the power supply, switching
activity, frequency, load capacitances (made up of gate, diffusion, and wire capacitances) and
control circuit size. Equation 1 explains the dependence of power on these factors and
the issues related to it. An important means of reducing power consumption is the reduction
of the supply voltage 𝑉𝐷𝐷 and suitable use of the threshold voltage at the device level. However, this
increases the circuit delay, degrades the drivability of the adder cells and introduces the
threshold loss problem. These issues can be overcome by selecting a proper W/L ratio.
Over the years, significant research has been carried out on high performance adder units for
low power applications, and it is still continuing.
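Equation 1 is not reproduced in this section, so the sketch below assumes the widely used first-order switching power model, P = alpha * C_L * VDD^2 * f, to illustrate why lowering the supply voltage is so effective. The function name and all numerical values are hypothetical and chosen only for illustration.

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """First-order switching power in watts: alpha * C_L * VDD^2 * f (assumed model)."""
    return alpha * c_load * vdd ** 2 * freq

# Hypothetical operating point, not taken from the thesis.
base = dynamic_power(alpha=0.5, c_load=10e-15, vdd=1.2, freq=100e6)
scaled = dynamic_power(alpha=0.5, c_load=10e-15, vdd=0.6, freq=100e6)

# Halving VDD reduces the switching power to one quarter in this model.
print(f"ratio = {scaled / base:.2f}")
```

The quadratic dependence on the supply voltage is what makes voltage scaling attractive, while the delay and drivability penalties mentioned above are not captured by this simple model.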
A wide range of contemporary adder architectures has been surveyed in literature over the past
few decades [28-39]. Adder architectures can be classified into two broad domains, static and
dynamic. Dynamic full adders are advantageous with respect to faster switching
speed, fewer transistors, full dynamic range and ratioed logic. Ratioed logic is an
attempt to reduce the number of transistors required to implement a logic function, often at the
cost of reduced robustness and extra power dissipation. For an N input logic function, a static
logic style requires 2N transistors versus N+2 transistors for a dynamic logic style. Regular
structure, fast logic evaluation and compact circuit layout are the three goals that different logic
styles have pursued over time [39]. The concepts of static and dynamic adder architectures are
prominent and can be used efficiently for designing full adders with a large number of transistors.
The time and area requirements of various important adders are shown in Table 1.1, where CRA
stands for Carry Ripple Adder, CLA for Carry Look Ahead Adder, Parallel-Prefix CLA for
parallel-prefix Carry Look Ahead Adder and CSA for Carry Save Adder. The time and area
complexities of the different types of adders are given for n stages in the table below:
TABLE 1.1
TIME AND AREA REQUIREMENTS OF DIFFERENT TYPES OF ADDERS [39]
ADDER                  TIME           AREA
CRA                    O(n)           O(n)
CLA                    O(log n)       O(n log n)
Parallel-Prefix CLA    O(2 log n)     O(2n log n)
CSA                    O(√n)          O(n)
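To give a feel for how the entries of Table 1.1 scale, the growth terms can be evaluated for a few word lengths. This is only an illustrative sketch: the logarithm base is assumed to be 2, the constant factors are kept exactly as written in the table, and the resulting numbers are unitless growth terms rather than measured delays or areas.

```python
import math

# Growth terms copied from Table 1.1 as (time, area); log base 2 is an assumption.
ADDERS = {
    "CRA": (lambda n: n, lambda n: n),
    "CLA": (lambda n: math.log2(n), lambda n: n * math.log2(n)),
    "Parallel-Prefix CLA": (lambda n: 2 * math.log2(n), lambda n: 2 * n * math.log2(n)),
    "CSA": (lambda n: math.sqrt(n), lambda n: n),
}

def growth(name, n):
    """Return the (time, area) growth terms of the named adder for n stages."""
    time_fn, area_fn = ADDERS[name]
    return time_fn(n), area_fn(n)

for n in (8, 16, 32, 64):
    row = ", ".join(f"{name}: t={growth(name, n)[0]:.1f}" for name in ADDERS)
    print(f"n={n:2d}  {row}")
```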
There are many logic styles in which adders can be designed, such as standard CMOS, Differential
Cascode Voltage Switch (DCVS), Complementary Pass-Transistor Logic (CPL), Double Pass-
Transistor Logic (DPL), Swing Restored CPL (SR-CPL) and hybrid styles, to build up the general adder
module shown in Figure 1.9. There are multiple ways to design a full adder; this thesis
presents some of the conventional adders in literature with different transistor counts in order to
compare the performance of the proposed design with existing designs. The compared adders
are described briefly in this chapter.
A traditional low power 28 transistor design of a CMOS full adder adopts a pull-up PMOS
network and a pull-down NMOS network [29, 30] but requires a large chip area. This complementary
MOS logic style is built with an NMOS pull-down network and a PMOS pull-up network, as
shown in Figure 1.10(a). It is advantageous with regard to robustness, reliable operation, and easy
placement and routing, and is also efficient due to its complementary transistor pairs. Due to the high
number of transistors, its power consumption is high. The large PMOS transistors in the pull-up
network result in high input capacitances, which cause high delay and dynamic power. One of the most
significant advantages of this full adder is its high noise margin and thus reliable operation
at low voltages. Its disadvantages remain the high input loads due to the dual network
and the weak output driving capability. Figure 1.10(b) shows a 20 transistor adder design
based on transmission gates and CMOS inverters, operating with full output
voltage swing [30]. It has better critical delay, power dissipation and PDP than Conventional-
CMOS (CCMOS) and CPL adders. It is also faster than static CMOS and CPL adders and requires
fewer transistors. Due to the high number of internal nodes, there is an increase in parasitic
capacitance [5]. In large arithmetic circuits it performs poorly because its weak driving
capability requires additional buffers at each output, increasing power
consumption and area. A 16 transistor full adder, operating under the same conditions as the
20 transistor full adder [30], is depicted in [31] and Figure 1.10(c). Although it consumes more
power than the 20 transistor full adder, it works at a higher speed. It also has less short circuit power
dissipation than the 14 transistor full adder [32], which uses pass transistors with XOR
and XNOR gates as shown in Figure 1.10(d), where 𝐴 ⊕ 𝐵 is generated by an inverter [32]. This
adder has a better output than single logic style adders. It has a reduced number of
transistors and power dissipating nodes, but lower driving capability and noise immunity
[36]. With ongoing research to reduce transistor count, many versions of the 10 transistor adder
were proposed [18, 19]. Initially, a 10 transistor, PTL based static energy recovery
full adder was proposed, which suffered from low speed and severe threshold
loss [33, 34, 35]. Later, a systematic study led to an improved version of the 10 transistor full adder
by Bui, Wang and Al-Sheraidah [17], comprising XOR, XNOR, sum and 𝐶𝑜𝑢𝑡 modules.
Fayed and Bayoumi then proposed another, more efficient 10 transistor full adder using a
2-to-1 MUX and two pass transistor based XOR gates [35]. Still, the threshold loss problem persisted in these
designs; it was later minimized in the 10 transistor full adder reported in [36] by
Lin, Hwang, Sheu and Ho, known as the Complementary and Level Restoring Carry Logic (CLRCL)
adder and shown in Figure 1.10(e). However, the CLRCL adder requires a complemented 𝐶𝑖𝑛, which increases the
number of transistors, and it also has large stage delays for 𝐶𝑜𝑢𝑡 and Sum. To overcome
the problems of 10 transistor full adders and to meet the demand for fewer transistors, an 8
transistor full adder was implemented by combining CMOS logic with
pass transistor logic [24], as shown in Figure 1.10(f). The 8 transistor full adder produces its outputs with a maximum delay
of two stages. The noise margin is substantially increased by proper sizing of the 3 transistor XOR gate.
Its PDP and area have been found to be better than those of existing 10 transistor and 14 transistor full
adders, but the design suffers from higher power consumption due to short circuit current.
Figure 1.9. Adder cell with Three Modules
Module 1: Generate XOR and XNOR functions
Module 2: Sum function
Module 3: Carry function
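The three-module decomposition of Figure 1.9 can be illustrated with a behavioural model. The sketch below is an assumption about how the modules interact: the carry module is written as a 2:1 selection between Cin and A controlled by the XOR/XNOR outputs, which is one common formulation and is not necessarily the circuit of any specific adder in Figure 1.10. All function names are hypothetical.

```python
from itertools import product

def xor_xnor(a, b):
    """Module 1: generate XOR and XNOR of the two operand bits."""
    x = a ^ b
    return x, 1 - x

def sum_module(x, cin):
    """Module 2: Sum = (A xor B) xor Cin."""
    return x ^ cin

def carry_module(x, xn, a, cin):
    """Module 3: Cout = Cin when A xor B = 1, otherwise A (a 2:1 selection)."""
    return (x & cin) | (xn & a)

def full_adder(a, b, cin):
    x, xn = xor_xnor(a, b)
    return sum_module(x, cin), carry_module(x, xn, a, cin)

for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    # The two output bits must encode the arithmetic sum of the three inputs.
    assert 2 * cout + s == a + b + cin
```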
Figure 1.10(a). 28 Transistor Full Adder [29, 30]
Figure 1.10(b). 20 Transistor Full Adder [30]
Figure 1.10(c). 16 Transistor Full Adder [31]
Figure 1.10(d). 14 Transistor Full Adder [32]
Figure 1.10(e). 10 Transistor Full Adder [34]
Figure 1.10(f). 8 Transistor Full Adder [24]
Figure 1.10. Topologies of Different Full Adder Designs with Reduced Number of
Transistors over the years
This thesis describes a novel implementation of a 2T XOR gate with reduced transistor count
and thus the least silicon area. The design of the 2T XOR gate is based on two PMOS transistors. The
2T XOR gate is used in the design of a 6T full adder.
1.2.3 Compressors
Extensive research has been carried out on the implementation of fast and efficient adders and multipliers.
The choice of implementation technique and the choice of technology are two important criteria in the
VLSI industry. As illustrated in the previous sections, steady growth has been seen in the
integration of circuit components within limited silicon area [4-25]. The continuous drive to
integrate more and more components on the minimum area of silicon has motivated
scientists and researchers to employ new trends and techniques.
Multipliers are a central arithmetic block, and multiplication is imperative for many DSPs,
general purpose processors and digital filters [40-44]. Multiplication is a complex
operation that involves three principal stages [45, 46]: (1) partial product generation, (2)
partial product reduction and (3) final carry propagating addition. As the second stage is critical
to the overall performance of processors, reducing its critical path and minimizing its time and
power deserve the utmost attention in power efficient design. Compressors are considered
intermediate Processing Elements (PEs) for the accumulation of partial products in multiplication.
Compressors dictate the overall critical path of the circuit and have led to higher speed and reduced
power over the decades [47, 48, 49]. Compressors play an important role in the implementation
of partial product addition in multiplier algorithms. Compressors have been studied extensively in
order to minimize the computational complexity of multiplication and thus of higher blocks of
arithmetic circuits.
The simplest and most widely used compressors are the 3:2 and 4:2 compressors, which have been
modified over the decades for improved results. The conventional designs are
illustrated in Figure 1.11(a) and Figure 1.11(b). Conventional adders are chains of full
adders which generate a carry and a sum at each level, so there is a delay in generating the
final MSB bits of the result. During partial product addition, conventional adders are not
fast enough to meet the timing constraints. The carry travels from one adder to the next, which
produces a large carry propagation delay and ultimately lowers the efficiency of the whole
circuit. Compressors are used to minimize delay and area, which increases the
performance of the circuit. Compressors dictate the overall critical path of the circuit [47-49]. This
thesis presents a novel compressor architecture that replaces XOR gates in the critical path with
MUXes to improve overall performance [50-52]. A contemporary design of a 5:3 compressor using
full adders and a half adder is shown in Figure 1.11(c). Further optimization of the 5:3 compressor
is exhibited in the topology of S. Chowdhury, A. Banerjee and H. Saha in Figure 1.11(d) [50].
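A 5:3 compressor counts the number of ones among five input bits and encodes the count in three output bits. The sketch below is a behavioural model built from two full adders and one half adder; this particular wiring is an assumption made for illustration and may differ from the arrangement in Figure 1.11(c). All function names are hypothetical.

```python
from itertools import product

def half_adder(a, b):
    """Return (sum, carry) of two bits."""
    return a ^ b, a & b

def full_adder(a, b, c):
    """Return (sum, carry) of three bits."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return s, carry

def compressor_5_3(x1, x2, x3, x4, x5):
    """Return (o2, o1, o0), the 3-bit count of ones among five input bits."""
    s1, c1 = full_adder(x1, x2, x3)   # first full adder compresses three inputs
    o0, c2 = full_adder(s1, x4, x5)   # second full adder adds the remaining two
    o1, o2 = half_adder(c1, c2)       # the two weight-2 carries are combined
    return o2, o1, o0

for bits in product((0, 1), repeat=5):
    o2, o1, o0 = compressor_5_3(*bits)
    assert 4 * o2 + 2 * o1 + o0 == sum(bits)
```

In this arrangement the least significant output passes through two full adders in series, which illustrates why the XOR stages on that path are the natural target for delay optimization.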
Figure 1.11(a). Conventional Design of 3:2 Compressors
Figure 1.11(b). Conventional Design of 4:2 Compressors
Figure 1.11(c). Conventional Design of 5:3 Compressors
Figure 1.11(d). Existing Implementation of 5:3 Compressors
Figure 1.11. Different Implementations of Compressor Designs
This thesis presents two architectures of the 5:3 compressor based on an
XOR/MUX implementation. The idea has two characteristics. Firstly, a
two transistor 2x1 multiplexer is employed in lieu of XOR gates, reducing the critical path delay.
Secondly, the proposed novel two transistor XOR gate is used to minimize silicon
area, giving an overall improvement in performance. The approach also reduces the stage delays compared
with previous designs.
1.2.3.1 MUX vs XOR-XNOR
The most basic and common topologies of MUX and XOR-XNOR circuits are
shown in Figure 1.12(a) and 1.12(b) [29]. The inputs are A and B, the outputs are O and O’, where O’ is
the complement of O, and S is the select line. O is obtained as 𝐴 ⊕ 𝐵. Its complement O’ is
the XNOR function, given by the logic equation A’.B’ + A.B.
Figure 1.12(a). MUX Design
Figure 1.12(b). XOR-XNOR Design
Figure 1.12. CMOS Implementations of (a) MUX (b) XOR-XNOR
From Figure 1.12(a), it is evident that the transistors are already switched if both the select bit
and its complement are available before the inputs arrive, which leads to an overall reduction of delay [29].
Thus, eliminating the additional inverter stage leads to lower power consumption and area [7].
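The reason a 2:1 MUX can stand in for an XOR gate is that XOR is simply a selection between one input and its complement, controlled by the other input. The following sketch shows this at the logic level only; the function names are illustrative, and the choice of B as the select signal is an assumption.

```python
def mux2(sel, d0, d1):
    """2:1 multiplexer: returns d0 when sel = 0 and d1 when sel = 1."""
    return d1 if sel else d0

def xor_via_mux(a, b):
    """XOR: B selects between A (B = 0) and the complement of A (B = 1)."""
    return mux2(b, a, 1 - a)

def xnor_via_mux(a, b):
    """XNOR: the same selection with the data inputs swapped."""
    return mux2(b, 1 - a, a)

for a in (0, 1):
    for b in (0, 1):
        assert xor_via_mux(a, b) == (a ^ b)
        assert xnor_via_mux(a, b) == 1 - (a ^ b)
```

When the select signal and its complement are ready early, the data input only has to pass through the already-switched path, which is the delay advantage described above.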
1.2.4 Multipliers
Multipliers play a vital role in electronic hardware, whether in digital signal processors
(DSPs), digital filters or general purpose processors [40-44]. Digital signal processors are
used to perform common operations such as video processing, filtering and the Fast Fourier
Transform (FFT). Such modules perform extensive sequences of multiply and accumulate
computations. A large number of transistors with high switching activity is used to perform
a variety of multiplication operations. For example, in a 64 point radix-4 pipelined FFT processor,
the multiplier consumes 30% of the power and occupies 46% of the chip area. Therefore, as
technology has advanced, researchers over the decades have focused on the prime issues
in multiplier design. The desired targets are high speed, low power consumption,
a compact and balanced layout, regular interconnection and minimum silicon area. Power consumption
is the most important concern of all these parameters, and thus a lot of research has been reported
in literature on reducing it in the basic units. Reducing the power
consumption of the basic unit reduces the power consumption of the whole system and
minimizes energy wastage in upcoming technologies. CMOS and pass transistor logic
are the dominant technologies for high speed, low power and compact VLSI implementation,
each with its own advantages and disadvantages.
Addition and multiplication of two binary numbers are the two fundamental arithmetic
operations used in high performance DSP systems. Chapter 3 proposes a unique and
efficient design of a six transistor adder. According to historical statistics on the algorithms
performed by large systems, more than 70% of instructions are dominated by addition and
multiplication [53]. So, the critical delay of the whole system depends on these operations,
and a high speed multiplier is needed. As the VLSI industry expands in computer
and signal processing applications, the demand for high speed processing is increasing.
Designers therefore concentrate mainly on high speed and low power multipliers in order to
manufacture high quality DSP chips. The different types of multipliers available in literature
include the parallel multiplier, Booth multiplier, sequential multiplier, combinational multiplier
and Wallace tree multiplier.
1.2.4.1 Different Types of Multipliers
An efficient multiplier has the following characteristics:
1. Power: Multiplier should consume less power for high performance.
2. Area: Should occupy less area on silicon.
3. Speed: There should be less stage delays along a critical path for high speed operations.
4. Accuracy: A good multiplier should give correct result.
The multiplication operation is primarily based on the ‘add’ and ‘shift’ algorithm. Many variants of
multipliers have been proposed in literature for efficient and reliable
computation. The number of partial products to be added determines the performance of
parallel multipliers [43]. The Booth algorithm was proposed to design
a Booth multiplier with a reduced number of partial product additions, as this is the most critical
stage in multiplier design. Many other modifications of Booth multiplication have appeared in
literature [54]. In order to gain high speed, the Wallace tree algorithm can be used to reduce the number
of sequential adding stages, as explained extensively in [42] and shown through an example of an 8
bit x 8 bit multiplier in Figure 1.13 [55]. Later, the modified Booth algorithm and Wallace tree
multiplier techniques were combined to obtain the advantages of both techniques in
one multiplier. However, increasing parallelism may bring major disadvantages in terms of low speed,
an increase in silicon area due to the irregularity of the structure and an increase in power
consumption because of complex routing. Hence, a serial-parallel multiplier can be
designed for better area and power, compromising on speed. The design of a serial-parallel
multiplier is presented in Figure 1.14. Given the above metrics, it is clear that the
type of multiplier employed depends on the nature of the application. In this work, an array based
multiplier has been used, as it consumes low power and has relatively good performance
compared with Wallace tree multipliers. Other multipliers require additional hardware to
improve performance, at the cost of increased layout area and parasitics. The array multiplier,
on the other hand, has a smaller and regular layout. Therefore, the array multiplier is a better choice
due to its lower power consumption, smaller layout and relatively good performance [56-58].
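The ‘add’ and ‘shift’ principle behind an array multiplier can be sketched behaviourally: each bit of the multiplier gates a shifted copy of the multiplicand, and the resulting rows are accumulated. The model below assumes unsigned 8 bit operands and hypothetical function names; it says nothing about the adder cells or the layout of the actual array.

```python
def partial_products(a, b, width=8):
    """Row i is A gated by bit i of B, shifted left by i positions."""
    return [(a if (b >> i) & 1 else 0) << i for i in range(width)]

def shift_add_multiply(a, b, width=8):
    """Accumulate the partial product rows one after another."""
    mask = (1 << width) - 1
    a &= mask  # restrict both operands to unsigned `width`-bit values
    b &= mask
    result = 0
    for row in partial_products(a, b, width):
        result += row
    return result

assert shift_add_multiply(13, 11) == 143
assert shift_add_multiply(255, 255) == 65025
```

An 8 bit x 8 bit multiplication therefore produces eight partial product rows, and the way these rows are summed (row by row in an array, or in a tree as in Figure 1.13) is what distinguishes the multiplier architectures discussed above.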
Figure 1.13. Algorithm for 8 bit x 8 bit Wallace Tree Multiplier [55]
Figure 1.14. Serial-Parallel Multiplier
The main computational kernel of DSP architectures is the Multiply-Accumulate (MAC) unit
[59, 60]. It computes the product of two numbers and adds the product to an accumulator. The
energy consumption at each level affects the overall power of the MAC unit. An 8 bit MAC
unit has been designed in 65-nm technology, extending the use of 2T XOR gates, 6T adders
and 8 bit x 8 bit multipliers to DSP architectural operations and demonstrating their use in
Application-Specific Integrated Circuits (ASICs).
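The multiply-accumulate operation can be illustrated with a small behavioural model. The accumulator width is not stated in this section, so the 20 bit accumulator used below is purely a hypothetical choice, as are the class and method names.

```python
class Mac8:
    """Behavioural 8 bit MAC unit with a hypothetical 20 bit accumulator."""

    def __init__(self, acc_bits=20):
        self.acc_mask = (1 << acc_bits) - 1
        self.acc = 0

    def step(self, a, b):
        """Multiply two unsigned 8 bit operands and add the product to the accumulator."""
        product = (a & 0xFF) * (b & 0xFF)
        self.acc = (self.acc + product) & self.acc_mask  # wrap on overflow
        return self.acc

mac = Mac8()
for a, b in [(3, 4), (10, 20), (255, 255)]:
    mac.step(a, b)
print(mac.acc)  # 12 + 200 + 65025 = 65237
```

Each step reuses the multiplier and an adder, which is why the power of the basic XOR gate and adder cells propagates directly into the power of the MAC unit.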
1.3 Motivation
IC technology emerged in the 1960s, when Gordon Moore, then with Fairchild
Semiconductor and later a co-founder of Intel, predicted that the number of transistors that can be
integrated on a single die would grow exponentially with time (this prediction was later called
Moore’s Law). The progression from the integration of a few transistors (referred to as Small Scale Integration
(SSI)) to the integration of millions of transistors in the Very Large Scale Integration (VLSI) chips
currently in use [4] is shown in Figure 1.2. Early ICs were simple and
employed only a few logic gates and flip-flops. Some ICs were simply a
single transistor, along with a resistor network, performing a logic function. There have been
four generations of ICs, with the number of transistors on a single chip growing from a few to
over millions in a period of four decades.
The growing worldwide market for complex mobile systems observed over recent
years has led designers to take into account a new objective in the
design of complex digital circuits, i.e. the minimization of power consumption. The wide
spread of systems like laptop and palmtop computers, cellular phones, wireless modems
and portable multimedia applications is one of the most important reasons that fuel the critical
importance of low power design. The need to minimize the power dissipation of the
system is also driven by thermal considerations; a large amount of the energy drawn
by a device from the power supply is converted into heat. Heat dissipation systems
and cooling mechanisms thus become indispensable for the appropriate and safe operation of the
device and also for its reliability.
Over the years, researchers across the globe have made continuous efforts to come
up with new designs and techniques to reduce the number of transistors, power consumption and
delay of smaller circuits. These in turn can be used in more complex circuits at higher
architectural levels. XOR gates, adders, multipliers and compressors form the basic blocks of
many arithmetic circuits. Reducing power dissipation in digital circuits becomes more and
more important as the number of transistors on digital chips increases.
1.4 Problem Definition
The main aim of this thesis is to design an efficient, low power and high performance basic
XOR gate with the least transistor count. The thesis also presents its
application in implementing bigger blocks of arithmetic circuits, such as the full adder, compressor,
multiplier and MAC unit, making them efficient too. With this objective in mind, a unique two
transistor XOR gate is proposed in this thesis as a basic building block for VLSI circuit
design. Further, a novel 6 transistor full adder, 5:3 compressor and 8 bit x 8 bit multiplier are
also proposed in the upcoming chapters.
1.5 Contribution of the Thesis
The contributions of this thesis are:
A novel approach to designing the primary building block of arithmetic circuits, i.e. an XOR gate with
two transistors.
The design of a six transistor adder, which is the voltage mode adder with the
smallest transistor count reported so far.
A high performance 5:3 compressor designed from the novel two transistor XOR gate and
compared with other designs in literature.
An extension of the architecture to multipliers, whose fundamental block is the adder.
Adders and multipliers are combined to form the MAC (Multiply and
Accumulate) unit, which is used in DSPs and microcontrollers at the industrial level.
A MAC unit design implemented in 65-nm technology to illustrate its application at
the higher level of microprocessors.
The basic two transistor XOR gate reduces power consumption, area, PDP (Power Delay
Product) and EDP (Energy Delay Product), which serves the higher level design of
digital signal processors, FIR filters, general purpose processors etc.
1.6 Thesis Organization
The primary goal of this thesis is to demonstrate a circuit level design approach that
targets high speed and low power dissipation.
The thesis is organized as follows:
Chapter 1 gives an introduction to the background and outlines the
research work.
Chapter 2 proposes the design of the novel two transistor XOR gate. The simulation results and a
performance comparison of the proposed XOR gate with existing XOR gates in literature are also
tabulated.
Chapter 3 employs the 2T XOR gate to build a six transistor adder with a delay of only two stages,
reducing the logic depth. It also presents simulation results comparing
the proposed adder with conventional adders in literature in terms of
power, delay and area.
Chapter 4 presents the application of the novel two transistor XOR gate and six transistor adder
in the implementation of 5:3 compressors and their comparison with conventional designs.
Chapter 5 shows the application of the proposed 6T adder in an 8 bit x 8 bit multiplier, together with a
performance analysis with respect to power, delay, PDP (Power Delay Product) and area of 8
bit x 8 bit multipliers designed with different adders in literature. The work is extended
by the design of an 8 bit MAC unit in 65-nm technology.
To the best of my knowledge, the proposed adder is the voltage mode full adder with the least transistor
count designed so far. To demonstrate the merits of the proposed model, simulations are carried out in
three different technologies (65-nm, 90-nm and 130-nm). Power and delay simulations of
the XOR gates and adders have been carried out, along with an area comparison of the adders. All
simulations have been carried out using Cadence Spectre, with ASSURA used to verify the area of the
proposed and existing designs.
The remaining chapters present the Conclusion, Related Publications and Bibliography.
Chapter 2
Design of 2T XOR Gate
This chapter proposes the design of the novel two transistor XOR gate and compares its
performance with existing XOR gates found in literature. The comparison is made in
terms of power, delay and Power Delay Product (PDP) in three technologies, i.e. 65-nm, 90-nm
and 130-nm, with Process, Voltage and Temperature (PVT) variation analysis.
The chapter is organized as follows: Section 2.1 lays down the basic idea of XOR gates. Section
2.2 proposes the design of the novel 2T XOR gate, with Section 2.2.1 explaining its operation in detail.
Section 2.2.2 presents the simulation results of the performance analysis in the form of tables and
figures, and Section 2.2.3 discusses the results obtained.
2.1 What is a XOR Gate?
The XOR gate is a logic gate formally named exclusive OR. It performs the operation
“either A or B, but not both”, where A and B are the inputs of a two input XOR gate and Y is the output, as
shown in Figure 2.1. An XOR gate can act as a switch for on and off purposes
and is highly useful in circuits that compare values for equality, compute checksums or perform
arithmetic computations. Exclusive OR can also be found in everyday life; a simple
example, a two-switch lighting system, is illustrated in Figure 2.2. The light
can be turned on or off with either of the
switches, independent of the position of the other switch, and it glows only when both switches are in the same position.
Let a glowing light represent binary value
0 and darkness represent binary value 1. Then, if both switches are in the same
position there is light, and if they are in different positions there is darkness. The
switch combinations that give darkness correspond to the XOR outcome.
Figure 2.1. Logic Symbol of XOR Gate
Figure 2.2. The wiring diagram depicting the control of single light source with two
switches. The light is on when either both switches are switched up or both down
A two input XOR gate has two inputs (A, B) and one output (Y), with four
possible combinations of input values: 00, 01, 10 and 11. The XOR output has logic value 1 when
the inputs are at different logic levels. That is, it produces 0 for input combinations 00 and
11, and 1 for combinations 10 and 01 [14]. This operation is also called exclusive disjunction
and can be written as Equation 6:
𝐴 ⊕ 𝐵 = A’.B + A.B’ (6)
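The two-switch lamp of Figure 2.2 and Equation 6 can be tied together in a few lines. This sketch follows the encoding used in the text, where a glowing lamp represents 0 and darkness represents 1; the function names are illustrative.

```python
def light_is_on(switch_a, switch_b):
    """The lamp of Figure 2.2 glows when both switches are in the same position."""
    return switch_a == switch_b

def xor_eq6(a, b):
    """Equation 6: A'.B + A.B'."""
    return ((1 - a) & b) | (a & (1 - b))

for a in (0, 1):
    for b in (0, 1):
        # Darkness is encoded as 1 in the text, so it should match the XOR output.
        darkness = 0 if light_is_on(a, b) else 1
        assert darkness == xor_eq6(a, b)
        print(a, b, darkness)
```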
2.2 Design of the Proposed Two Transistor XOR Gate
The literature survey in Chapter 1 traced the evolution of XOR gates down to three transistors.
With a view to further minimizing the transistor count, Figure 2.3 shows the
proposed novel two transistor (2T) XOR gate. The design of the 2T XOR gate is based on two
PMOS pass transistors and a negative reverse biased bulk voltage. The central idea is to obtain
correct XOR logic values by changing 𝑉𝑇 (the threshold voltage of the PMOS transistors)
through the voltage of the bulk terminal, i.e. 𝑉𝑆𝐵 [61]. The PMOS transistors act as
pass transistors, which are much more efficient in terms of speed than CMOS. In pass
transistor logic, the source side is connected to an input signal rather than to the supply
voltage, as shown in Figure 2.3. This reduces the number of transistors, runs faster and requires less
power than CMOS logic. The disadvantage is that fewer devices can be cascaded in
subsequent stages, because of the small voltage difference at each cascaded stage. Logic
devices connected in series therefore need the signal voltage to be restored, as each
transistor in series is less saturated at its output than at its input [29].
The rationale behind the implementation of the two transistor XOR gate is, firstly, to further
reduce the transistor count along with power and silicon area and, secondly, to apply the gate
in the implementation of bigger digital circuit modules. The main aim was the transition from
the 3T XOR gate (CMOS + pass transistor PMOS logic) to the 2T XOR gate (two PMOS pass
transistors), keeping the following design considerations in mind:
1. The 3T XOR gate consists of CMOS logic and a pass transistor gate. CMOS logic is more
complex, more expensive and slower to fabricate than PMOS only or NMOS only logic.
2. The reduction in the number of transistors is intended to compete with the lower power
dissipation of CMOS logic relative to PMOS only or NMOS only logic, and to yield a design
with minimum power dissipation.
3. Compared with CMOS logic, PMOS only logic is faster to fabricate, less complex,
symmetric and less expensive, because the wafers used in fabrication generally have an
n-type substrate and, to create NMOS, n-wells of PMOS transistors are used as substrate
[1].
4. Compared with NMOS only logic, PMOS only logic has less flicker noise since the
mobility of PMOS is less than that of NMOS [62]. Flicker noise is directly proportional to
mobility, as can be inferred from [63]. This helps in reducing the noise in bigger circuits
derived from the smaller module.
5. PMOS only logic is used for the implementation because NMOS only logic is not able
to use substrate biasing to give the correct logic levels for an XOR gate with the
minimum transistor count of two.
The efficient implementation of the design at the smaller block level leads to optimization of
VLSI circuits and can be used in many electronic devices and signal processing applications.
2.2.1 Working of the 2T XOR gate
When A=1 and B=0, transistor M2 turns ON while M1 is nominally OFF; owing to the Gate Induced
Drain Leakage (GIDL) effect [1, 6] in transistor M1, the output is pulled to logic high. The
situation is similar for A=0 and B=1. For A=1, B=1 both PMOS transistors are OFF
and the substrate bias sets up an appropriate bulk to drain reverse bias leakage current to give
logic 0 at the output. With A=0 and B=0, both PMOS transistors are ON and hence it is
natural to get a logic 0 at the output. Because of the bias given at the substrate, a small
voltage appears at the output, which has been observed to be well below the switching threshold
voltage. The relation between the threshold voltage and the channel length (L), width (W) and
source to bulk voltage (𝑉𝑆𝐵) of a transistor is shown in Equation 7, as cited in [61].
(7)
Where,
𝑉𝑡0 = Zero bias threshold voltage,
𝛾 = Bulk threshold coefficient,
𝑉𝑆𝐵 = Bulk Potential,
𝜑0 = Fermi Potential,
𝑡𝑜𝑥 = The thickness of the oxide layer,
𝑉𝐷𝑆 = Drain to source voltage,
𝛼𝑣, 𝛼𝑤 = Process dependent parameters.
Figure 2.3. Proposed Design of 2T XOR Gate
The effect of 𝑉𝑆𝐵 on the channel can be most conveniently seen as a change in the threshold
voltage 𝑉𝑇. Specifically, if the PMOS substrate biasing is increased, the threshold voltage
decreases. A PMOS transistor has an n-type substrate and p+ type drain and source regions. When
a negative reverse bias voltage is applied, the electrons of the n-type substrate are repelled.
Due to this, current flows in the direction opposite to the flow of electrons, with the p+ type
drain or source regions and the n-type substrate acting like a forward biased diode. Thus, a
voltage drop occurs across the junction, pulling the circuit down to a lower logic value. A
reverse bias voltage source of 320 mV has been used in 65-nm technology. Table 2.1 shows the
output values obtained after the simulation.
TABLE 2.1
SIMULATION LOGIC LEVELS OF 2T XOR GATE AT REVERSE BIAS OF 320 mV
USING 65-nm TECHNOLOGY
INPUT INPUT OUTPUT
A(V) B(V) Y (mV) approx.
0 0 0.0
0 1 700.0
1 0 700.0
1 1 200.0
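The logic levels in Table 2.1 can be checked against the XOR truth table with a short Python sketch. The 500 mV decision threshold is an assumption here (half of the 1 V input swing used in 65-nm technology); the function names are illustrative only.

```python
# Approximate simulated outputs from Table 2.1 (65-nm, 320 mV reverse bias), in mV.
MEASURED_MV = {(0, 0): 0.0, (0, 1): 700.0, (1, 0): 700.0, (1, 1): 200.0}

# Assumed decision threshold: half of the 1 V input swing.
THRESHOLD_MV = 500.0


def to_logic(v_mv, threshold_mv=THRESHOLD_MV):
    """Map an analog output voltage (mV) to a logic value."""
    return 1 if v_mv > threshold_mv else 0


def check_xor(measured):
    """Return, per input pair, whether the thresholded output equals A XOR B."""
    return {ab: to_logic(v) == (ab[0] ^ ab[1]) for ab, v in measured.items()}


if __name__ == "__main__":
    for (a, b), ok in sorted(check_xor(MEASURED_MV).items()):
        print(f"A={a} B={b} -> matches XOR: {ok}")
```

Under this assumed threshold, the 200 mV output for A=1, B=1 is read as logic 0 and the 700 mV outputs as logic 1.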
The body bias value (𝑉𝑆𝐵 = 𝑉𝑟𝑒𝑓) in the circuit is derived from the supply voltage (𝑉𝐷𝐷)
through a voltage divider circuit, as shown in Figure 2.4. A diode connected NMOS circuit has
been implemented with consideration of noise sensitivity, area and PVT variations. The body
biasing circuit adds 1.114 µm² of area each time it is included in a circuit, but the area of
the 2T XOR gate, the 6T adder and the 8 bit x 8 bit multiplier still remains less than that of
their peer designs, as shown in Table 2.2, Table 3.5 and Table 5.2 later in the thesis. The
noise, including both flicker noise and thermal noise, is within the range of design
consideration for all frequencies. The circuit passes all the process corners. Flicker noise is
a type of electronic noise proportional to 1/f (i.e. inversely related to the frequency of the
electronic circuit) and depends on the area of the circuit [64]. It is due to traps near the
Si/SiO2 interface that randomly capture and release carriers. The bias circuit is affected less
by noise as the frequency increases, and the simulated flicker noise is -131 dB at 1 MHz. The
theoretical value of flicker noise from Equation 8 is -149 dB. The difference is attributed to
the non-ideal width and length of the NMOS transistors in practice and to material defects.
Both values are too small to affect the bias circuit. Thermal noise and resistor noise are very
low for the circuit at all frequencies, so the circuit performs well in terms of noise. Since
the effect of flicker noise on the bias circuit is negligible, the circuit works well with
smaller areas of PMOS and NMOS transistors. The diode connected NMOS requires less area and
offers higher mobility than a diode connected PMOS. The mobility and area advantages have been
preferred over the disadvantage of flicker noise in the NMOS only bias circuit. Also, the
effect of noise is very small even for NMOS only logic at high frequencies. There is a
trade-off in terms of power and area between the conventional diode connected NMOS circuit and
supply independent reference voltage circuits (the other method to implement the bias circuit).
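$$\text{Flicker noise} = \frac{K}{C_{ox} \cdot W \cdot L \cdot f} \qquad (8)$$
Where,
K = process dependent constant of the order of $10^{-25}\ \mathrm{V^2 F}$,
𝐶𝑜𝑥 = capacitance per unit gate area = $3.9\,\varepsilon_0 / t_{ox}$ = $17.7 \times 10^{-3}\ \mathrm{F/m^2}$,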
W = width of NMOS transistor,
L = Length of NMOS transistor,
f = frequency of operation.
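Equation 8 can be evaluated numerically with the sketch below. K and 𝐶𝑜𝑥 take the values quoted above, while the transistor width and length are hypothetical placeholders (the bias-circuit dimensions are not given in this section), and the conversion to decibels as 10·log10 is an assumption about how the dB figures were obtained.

```python
import math

K_FLICKER = 1e-25   # process dependent constant, order of magnitude from the text (V^2*F)
C_OX = 17.7e-3      # gate capacitance per unit area from the text (F/m^2)


def flicker_noise(width_m, length_m, freq_hz, k=K_FLICKER, c_ox=C_OX):
    """Equation 8: K / (C_ox * W * L * f)."""
    return k / (c_ox * width_m * length_m * freq_hz)


def to_db(value):
    """Power-style decibel conversion (assumed convention)."""
    return 10.0 * math.log10(value)


if __name__ == "__main__":
    # Hypothetical device dimensions, for illustration only.
    w, l = 120e-9, 60e-9
    for f in (1e5, 1e6, 1e7):
        print(f"f = {f:.0e} Hz -> {to_db(flicker_noise(w, l, f)):.1f} dB")
```

The sketch shows the 1/f behaviour directly: every tenfold increase in frequency lowers the result by 10 dB, and a larger gate area W·L lowers it further.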
Figure 2.4. Diode Connected NMOS
2.2.2 Simulation and Performance Analysis of Proposed 2T XOR Gate
An extensive simulation study of XOR gates has been carried out to compare the proposed design
of the 2T XOR gate with existing designs of XOR gates available in literature. The circuits are
simulated in the same testing environment. In order to cover all input combinations, the
XOR gates have been studied for four different input patterns. All the simulations and
extraction of net-lists have been done in Cadence Spectre at 65-nm, 90-nm and 130-nm
technologies. The designs are simulated at 50 MHz frequency with rise and fall times of 50 ps.
The proposed and existing XOR gates have been simulated and examined thoroughly.
Comparisons have been made with the existing peer designs in terms of power, delay and
Power Delay Product (PDP).
Figure 2.5 shows the simulation results of the 2T XOR gate in 65-nm technology in Cadence with
270 mV and 320 mV reverse bias bulk voltage, for analysing the effect of changing the threshold
voltage according to Equation 7. A and B are the input voltage waveforms in volts and Y is the
output voltage waveform in millivolts. The waveforms for the different input combinations show
how the output logic low level is altered by the change in 𝑉𝑇. As is evident from Figure
2.5(b), the output at 270 mV is further from the logic 0 value than the output at 320 mV. This
is because the threshold voltage at 320 mV is less than at 270 mV, so the voltage drop at
320 mV is larger than at 270 mV, as explained in Section 2.2.1. Increasing the reverse bias
voltage therefore results in a logic 0 closer to the ideal value. However, a trade-off has to be
maintained between the logic 0 and logic 1 values to achieve correct digital logic levels for
XOR gates, and an appropriate reverse bias voltage is chosen for each technology.
Figure 2.5(a) XOR Gate Simulation at Reverse Bias of 320 mV
Figure 2.5(b) XOR Gate Simulation at Reverse Bias of 270 mV
Figure 2.5. Input and Output Waveforms of XOR Gate
Figure 2.6. Calculation of Propagation Delay
Table 2.2 lists the power, delay and PDP (product of power and delay) of the proposed novel
XOR implementation as well as of the existing XOR gate designs in the literature. The
propagation delay is measured as the time difference between the input and the output
transitions crossing 50% of their logic swing, as shown in Figure 2.6. For this, the input
voltage is changed so as to produce a change in the output voltage. The rise and fall delays
for all the input combinations 00, 01, 10 and 11 are averaged for the evaluation of
delay in the XOR gates, so the delay reported is the average over all signal transitions
of the circuit. Equations (9), (10) and (11) describe the propagation delay in MOS
transistors:
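$$t_p = 0.69 \cdot R_{on} \cdot C_L \qquad (9)$$
$$R_{on} = \frac{3}{4} \cdot \frac{V_{DD}}{I_{Dsat}} \cdot \left(1 - \frac{7}{9} \cdot \lambda \cdot V_{DD}\right) \qquad (10)$$
$$I_{Dsat} = \mu_p \cdot C_{ox} \cdot \frac{W}{L} \cdot (V_{GS} - V_T)^2 \qquad (11)$$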
Where,
𝑡𝑝= Propagation delay of the circuit,
𝑅𝑜𝑛= Resistance of PMOS which is ON,
𝐶𝐿= Load capacitance = 0.05 fF,
𝐼𝐷𝑠𝑎𝑡= Current when PMOS is in saturation,
𝜆 = Channel length modulation of PMOS,
𝜇𝑝. 𝐶𝑜𝑥= constant of technology = 23.21 𝜇𝐴/𝑉2
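Equations (9) to (11) can be chained as in the following sketch. The load capacitance and the technology constant are the values listed above; the W/L ratio, gate drive and threshold voltage are hypothetical placeholders (magnitudes are used for the PMOS voltages), so the printed delay is illustrative and is not the thesis result.

```python
MU_P_COX = 23.21e-6   # technology constant from the text (A/V^2)
C_LOAD = 0.05e-15     # load capacitance from the text (F)


def i_dsat(w_over_l, v_gs, v_t, mu_cox=MU_P_COX):
    """Equation 11 as written in the text (voltage magnitudes for PMOS)."""
    return mu_cox * w_over_l * (v_gs - v_t) ** 2


def r_on(v_dd, i_sat, lam=0.0):
    """Equation 10: equivalent ON resistance; lam is channel length modulation."""
    return 0.75 * (v_dd / i_sat) * (1.0 - (7.0 / 9.0) * lam * v_dd)


def t_p(r, c_load=C_LOAD):
    """Equation 9: propagation delay of an RC-limited stage."""
    return 0.69 * r * c_load


if __name__ == "__main__":
    # Hypothetical device values, for illustration only.
    i_sat = i_dsat(w_over_l=2.0, v_gs=1.0, v_t=0.4)
    delay = t_p(r_on(v_dd=1.0, i_sat=i_sat, lam=0.0))
    print(f"I_Dsat = {i_sat * 1e6:.2f} uA, t_p = {delay * 1e12:.2f} ps")
```

Setting `lam=0.0` corresponds to neglecting channel length modulation; a positive `lam` lowers the equivalent resistance and hence the estimated delay.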
TABLE 2.2
COMPARISON OF PERFORMANCE ANALYSIS OF DIFFERENT XOR GATES
Type of XOR gate   Technology (nm)   Avg. power (µW)   Avg. delay (ps)   PDP (10^−18 J)
6T (Fig(1.5)) [15] 65 2.1880 15.600 34.132
4T (Fig(1.6(a))[16] 65 0.8660 8.125 7.036
4T (Fig(1.6(b))[16] 65 0.0650 13.250 0.861
4T (Fig(1.6(c))[17] 65 0.0340 14.625 0.497
4T (Fig(1.6(d))[18, 19] 65 0.0470 9.125 0.428
3T (Fig(1.7))[24] 65 0.0320 4.253 0.136
2T 65 0.0035 9.375 0.033
6T (Fig(1.5)) [15] 90 6.6300 18.625 123.483
4T (Fig(1.6(a)) [16] 90 0.1790 11.375 2.036
4T (Fig(1.6(b))[16] 90 0.1430 11.500 1.644
4T (Fig(1.6(c))[17] 90 0.0960 12.125 1.164
4T (Fig(1.6(d))[18, 19] 90 0.1020 11.125 1.134
3T (Fig(1.7))[24] 90 0.0410 9.750 0.399
2T 90 0.0039 19.375 0.075
6T (Fig(1.5))[15] 130 19.884 35.750 710.853
4T (Fig(1.6(a))[16] 130 4.4520 14.250 63.441
4T (Fig(1.6(b))[16] 130 0.4896 26.750 13.096
4T (Fig(1.6(c))[17] 130 0.2788 27.194 7.582
4T (fig(1.6(d))[18, 19] 130 0.3147 23.740 7.471
3T (Fig(1.7))[24] 130 0.1179 18.750 2.210
2T 130 0.0090 35.370 0.318
The mathematical evaluation of propagation delay with 𝜆 = 0 (no channel length modulation)
gives 19.79 ps in 65-nm technology for the critical path. The deviation from the simulated
value is due to other secondary effects in PMOS devices, which become prominent as the
technology scales down, and to the idealised treatment of channel length modulation.
The same input setting is followed for the measurement of power. There is a gradual decline in
power from the 6T XOR gate to the 2T XOR gate, with a moderate fall in PDP. The power delay
product is calculated by multiplying the average power by the average delay. The idea of
trading power against delay so as to obtain minimal PDP can be studied from previous designs
[65]. The dominant power dissipation of the proposed circuit is dynamic power
dissipation, which depends on the switching transitions. Compared with 3T XOR gates, 2T
XOR gates have less dynamic power dissipation because the smaller number of transistors leads
to fewer switching transitions. The dynamic power dissipation for a frequency of 50 MHz and a
load capacitance of 0.05 fF is theoretically equal to 2.5 nW. The difference from the simulated
value is due to additional static and leakage power and other secondary effects. Since
pass transistor logic has variable input signals rather than constant power lines at its
sources, only one signal path is active at a time, which avoids a short between inputs. This
gives lower power dissipation than the CMOS logic of the 3T XOR gate. The simulation results
in Table 2.2 confirm that there is a large drop in power consumption for the 2T XOR
gate. The comparison of performance of different XOR gates is also represented pictorially
through histograms in Figure 2.7. The histograms clearly indicate the difference in
PDP levels from the 6 transistor XOR gate to the proposed 2 transistor gate in 65-nm, 90-nm
and 130-nm technologies. The bars for 4T A, 4T B, 4T C and 4T D are for the four transistor
(4T) XOR gates in Figures 1.6(a), 1.6(b), 1.6(c) and 1.6(d) respectively.
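The theoretical dynamic power quoted above follows from the usual switching-power relation P = α·C_L·V_DD²·f. The sketch below assumes a supply of 1 V (the input swing used in 65-nm technology) and an activity factor α of 1; both are assumptions, since the text states only the frequency and the load capacitance.

```python
def dynamic_power(c_load_f, v_dd_v, freq_hz, activity=1.0):
    """Switching power: activity * C_L * V_DD^2 * f, in watts."""
    return activity * c_load_f * v_dd_v ** 2 * freq_hz


if __name__ == "__main__":
    # 0.05 fF load and 50 MHz from the text; V_DD = 1 V is assumed.
    p_dyn = dynamic_power(c_load_f=0.05e-15, v_dd_v=1.0, freq_hz=50e6)
    print(f"{p_dyn * 1e9:.2f} nW")  # prints "2.50 nW"
```

With these inputs the estimate is 2.5 nW, against the simulated average power of 0.0035 µW (3.5 nW) reported for the 2T XOR gate at 65 nm in Table 2.2.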
Figure 2.7(a) Comparative Analysis of PDP of XOR Gates at 65-nm Technology
Figure 2.7(b) Comparative Analysis of PDP of XOR Gates at 90-nm Technology
Figure 2.7(c) Comparative Analysis of PDP of XOR Gates at 130-nm Technology
Figure 2.7. PDP (vs) Technology for XOR Gate Architectures
Sub-threshold leakage, which contributes to the total power dissipation, occurs when devices
are in the off state, i.e. 𝑉𝐺𝑆 = 0. In order to sustain the improvement in gate delay for
digital circuits with scaling of technology, MOSFET devices must be scaled aggressively in
terms of threshold voltage. However, the reduction in device threshold voltage leads to an
exponential increase in subthreshold leakage. The expression for the drain current of a PMOS
transistor in the subthreshold region is given by Equations (12) and (13) below, which explain
this effect.
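$$I_D = I_0 \cdot \left(\frac{W}{L}\right) \exp\left(k \cdot \frac{V_G}{U_T}\right) \cdot \left[\exp\left(-\frac{V_S}{U_T}\right) - \exp\left(-\frac{V_D}{U_T}\right)\right] \qquad (12)$$
$$U_T = \frac{KT}{q} \qquad (13)$$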
Where,
𝐼𝐷 = Drain current of PMOS,
𝐼0 = Process dependent constant,
𝑊 = Width of PMOS,
𝐿 = Length of PMOS,
𝑘 = Gate coupling coefficient,
𝑉𝐺 = Gate voltage,
𝑉𝐷 = Drain voltage,
𝑉𝑆 = Source voltage,
𝑈𝑇 = Thermal voltage = 26 mV.
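A small numerical sketch of Equations (12) and (13) is given below. The physical constants are standard; the values of 𝐼0, W/L and the gate coupling coefficient are hypothetical placeholders, and the sign convention follows Equation (12) exactly as written.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant (J/K)
Q_E = 1.602176634e-19   # elementary charge (C)


def thermal_voltage(temp_k):
    """Equation 13: U_T = K*T/q."""
    return K_B * temp_k / Q_E


def subthreshold_current(i0, w_over_l, kappa, v_g, v_s, v_d, u_t):
    """Equation 12, evaluated with the signs as written in the text."""
    return (i0 * w_over_l * math.exp(kappa * v_g / u_t)
            * (math.exp(-v_s / u_t) - math.exp(-v_d / u_t)))


if __name__ == "__main__":
    u_t = thermal_voltage(300.15)  # 27 degrees Celsius
    print(f"U_T = {u_t * 1e3:.2f} mV")
    # Hypothetical I0, W/L and coupling coefficient, for illustration only.
    for v_g in (0.0, 0.05, 0.10):
        i_d = subthreshold_current(1e-9, 2.0, 0.7, v_g, 0.0, 0.1, u_t)
        print(f"V_G = {v_g:.2f} V -> I_D = {i_d:.3e} A")
```

The thermal voltage at 27℃ evaluates to about 26 mV, in line with the value listed above, and the current grows by a factor of e for every 𝑈𝑇/𝑘 increase in gate voltage, which is the exponential sensitivity referred to in the text.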
The above equations describe the drain current in the subthreshold region, where the current
depends exponentially on the gate voltage and hence on how far the device is biased from its
threshold. Due to the topology of the proposed 2T XOR gate and appropriate biasing, the gate
leakage power has been found to be in the range of femto watts (fW) for the proposed XOR
circuit in all three technologies, viz. 65-nm, 90-nm and 130-nm. The leakage power obtained is
too small to affect the overall power consumption. The leakage power is calculated by computing
the gate leakage current of each individual transistor in the circuit when the transistors are
switched off. The total leakage current is then simply the sum of the individual leakage
currents of all the gates.
Leakage currents are mainly a concern in analog circuits. In digital circuits, the focus is
more on obtaining correct logic levels and on how to enhance the circuit to get
better logic levels. The body biasing method increases the sub-threshold leakage of the 2T
XOR gate but has a negligible effect on total power [66]. The gate leakage is in the
femto-watt (fW) range, equal to 263.9 fW, obtained by summing the leakage power of the
individual gates of the PMOS transistors, and is least for 65-nm due to technology scaling [67].
The XOR gates have also been examined with respect to noise margins in 65-nm, 90-nm and
130-nm. Table 2.3 compares the noise margins of the 2T XOR gate design with those of the
XOR gate designs available in literature. Noise margin is defined as the amount of noise a
circuit can withstand without compromising the output logic level, and it is input pattern
dependent [7]. The noise margins are found to be comparable. 𝑁𝑀𝐻 (High Noise Margin) and
𝑁𝑀𝐿 (Low Noise Margin) are obtained by performing the DC analysis of the circuit in Cadence to
find the Voltage Transfer Characteristics (VTC) and balancing the switching probabilities of
the two PMOS transistors at GND (logic ‘0’) and 𝑉𝐷𝐷 (logic ‘1’).
The XOR gate is further analysed for the impact of Process, Voltage and
Temperature (PVT) variation in 65-nm, 90-nm and 130-nm technologies. The worst-case/best-case
analysis has been performed by analysing the process corners of the circuits. The aim of
PVT analysis is to find the worst-case and best-case performance values across all PVT corners.
In PVT-aware design, the aim is a design that maximizes performance and meets
specifications across all PVT corners. The variation tolerance covers a bias voltage
change of (+/-) 100 mV (from the nominal value of 320 mV in 65-nm technology), temperature
variation from -20℃ to 70℃ (around a nominal room temperature of 27℃) and the
slow-slow, fast-fast, slow-fast and fast-slow process corners. So, the lowest and highest
temperatures at which the proposed XOR gate works correctly are -20℃ and 70℃ respectively,
and the maximum variation of bias voltage for correct logic values is (+/-) 100 mV.
Statistically, a circuit tolerating (+/-) 10% variation is considered an appropriate design.
TABLE 2.3
COMPARISON RESULT OF NOISE MARGIN OF DIFFERENT XOR GATES
Types of XOR gate   Technology (nm)   𝑉𝑂𝐻 (V)   𝑉𝑂𝐿 (V)   𝑉𝐼𝐻 (V)   𝑉𝐼𝐿 (V)   𝑁𝑀𝐻 (V)   𝑁𝑀𝐿 (V)
6T (Fig(2.5)) [9] 65 1.000 0.000 0.690 0.318 0.310 0.318
4T (Fig(2.6(a))[10] 65 1.000 0.000 0.667 0.357 0.333 0.357
4T (Fig(2.6(b))[10] 65 1.000 0.000 0.690 0.460 0.310 0.460
4T (Fig(2.6(c))[11] 65 1.000 0.000 0.600 0.420 0.400 0.420
4T (Fig(2.6(d))[12, 13] 65 1.000 0.000 0.630 0.450 0.370 0.450
3T (Fig(2.7))[18] 65 1.000 0.000 0.520 0.240 0.480 0.240
2T 65 1.000 0.000 0.650 0.280 0.350 0.280
6T (Fig(2.5)) [9] 90 1.000 0.000 0.680 0.480 0.320 0.480
4T (Fig(2.6(a)) [10] 90 1.000 0.000 0.699 0.372 0.301 0.372
4T (Fig(2.6(b))[10] 90 1.000 0.000 0.640 0.480 0.360 0.480
4T (Fig(2.6(c))[11] 90 1.000 0.000 0.600 0.360 0.400 0.360
4T (Fig(2.6(d))[12, 13] 90 1.000 0.000 0.600 0.400 0.400 0.400
3T (Fig(2.7))[18] 90 1.000 0.000 0.680 0.440 0.320 0.440
2T 90 1.000 0.000 0.640 0.280 0.360 0.280
6T (Fig(2.5))[9] 130 1.200 0.000 0.960 0.320 0.320 0.240
4T (Fig(2.6(a))[10] 130 1.200 0.000 0.920 0.360 0.301 0.280
4T (Fig(2.6(b))[10] 130 1.200 0.000 0.840 0.360 0.360 0.360
4T (Fig(2.6(c))[11] 130 1.200 0.000 0.960 0.264 0.400 0.240
4T (fig(2.6(d))[12, 13] 130 1.200 0.000 0.984 0.312 0.400 0.216
3T (Fig(2.7))[18] 130 1.200 0.000 0.910 0.320 0.320 0.290
2T 130 1.200 0.000 0.970 0.280 0.360 0.230
𝑉𝑂𝐻 = output high voltage
𝑉𝑂𝐿 = output low voltage
𝑉𝐼𝐻 = input high voltage
𝑉𝐼𝐿 = input low voltage
𝑁𝑀𝐻 = high noise margin = 𝑉𝑂𝐻-𝑉𝐼𝐻
𝑁𝑀𝐿 = low noise margin = 𝑉𝐼𝐿-𝑉𝑂𝐿
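The noise margins can be recomputed from the four voltage levels with the standard definitions 𝑁𝑀𝐻 = 𝑉𝑂𝐻 − 𝑉𝐼𝐻 and 𝑁𝑀𝐿 = 𝑉𝐼𝐿 − 𝑉𝑂𝐿, which reproduce the 65-nm entries of Table 2.3. The sketch below uses the 2T XOR row at 65 nm; the function name is illustrative.

```python
def noise_margins(v_oh, v_ol, v_ih, v_il):
    """Return (NM_H, NM_L) = (V_OH - V_IH, V_IL - V_OL), all in volts."""
    return v_oh - v_ih, v_il - v_ol


if __name__ == "__main__":
    # 2T XOR gate at 65 nm, values taken from Table 2.3.
    nm_h, nm_l = noise_margins(v_oh=1.000, v_ol=0.000, v_ih=0.650, v_il=0.280)
    print(f"NM_H = {nm_h:.3f} V, NM_L = {nm_l:.3f} V")
```

For this row the sketch gives 0.350 V and 0.280 V, matching the tabulated 𝑁𝑀𝐻 and 𝑁𝑀𝐿.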
2.2.3 Results and Discussions
The waveforms in Figure 2.5 clearly demonstrate the output of an XOR gate. The logic levels
change as the threshold voltage is varied through the reverse bias
bulk voltage. Appropriate bias values are chosen for each technology, maintaining a trade-off
between the logic levels.
A PMOS only circuit has been used in order to keep to the idea of implementing a two transistor
XOR gate, because CMOS logic cannot generate the required logic with the
least number of transistors. Although CMOS has less power dissipation than PMOS only
logic, it is more complex and expensive. PMOS transistors are faster to fabricate,
highly controllable and reliable. Moreover, compared with the existing XOR gates available
in literature, the two transistor PMOS only circuit still has the least power dissipation.
Secondly, PMOS is chosen over NMOS devices (though NMOS has higher mobility) because
applying a reverse bias voltage to an NMOS only circuit does not produce the desired logic
levels for an XOR gate, since NMOS devices behave differently from PMOS devices. Area is thus
compromised (the area occupied by NMOS is less than that of PMOS). It can be seen from
Table 2.2 that the average delay of the 2T XOR gate is more than that of the 3T XOR gate,
because PMOS logic has less mobility than the NMOS or CMOS logic used for the 3T XOR gate.
This overhead is compensated by the reduced power consumption. Other ways of implementing 2T
XOR gates using NMOS and CMOS transistors can be explored as part of future
research work.
The calculation and comparison details of power, delay and PDP are compiled in Table 2.2. The
power decreases as the number of transistors required to design XOR
gates has been reduced over the years. The power delay product of the proposed 2T XOR gate is
as low as 0.033 aJ, compared with 0.136 aJ for the 3T XOR gate in 65-nm technology. The
PDP is thus lowered by approximately 75.73% from the 3T XOR gate to the 2T XOR gate. The
highest PDP, for the 6T XOR gate, is 34.132 aJ in 65-nm technology. The power and delay of the
different XOR gate implementations follow a regular trend in 65-nm, 90-nm and 130-nm, with the
least power consumption for the proposed 2T XOR gate. The noise margins are examined and
listed in Table 2.3, which shows the efficiency of the proposed and existing XOR gate designs
and their effective employment over the decade in bigger units. The leakage power in terms of
gate leakage and sub-threshold leakage has also been determined. Process, Voltage and
Temperature (PVT) variations are taken into consideration by varying P, V and T over their
allowable ranges and analysing the resultant combinations, the so-called PVT corners.
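The PDP figures discussed above can be reproduced from the average power and delay entries of Table 2.2, as in the following sketch. Since 1 µW × 1 ps = 10⁻¹⁸ J, multiplying the tabulated units directly gives the PDP in attojoules; the function names are illustrative.

```python
def pdp_aj(power_uw, delay_ps):
    """Power delay product in attojoules (uW * ps = 1e-18 J)."""
    return power_uw * delay_ps


def percent_reduction(old, new):
    """Relative reduction from old to new, in percent."""
    return 100.0 * (old - new) / old


if __name__ == "__main__":
    # 65-nm entries from Table 2.2.
    pdp_3t = pdp_aj(power_uw=0.0320, delay_ps=4.253)
    pdp_2t = pdp_aj(power_uw=0.0035, delay_ps=9.375)
    print(f"3T: {pdp_3t:.3f} aJ, 2T: {pdp_2t:.3f} aJ")
    # Reduction computed from the rounded PDP values quoted in the text.
    print(f"reduction: {percent_reduction(0.136, 0.033):.1f} %")
```

The 2T gate is slower than the 3T gate, but its roughly ninefold lower power more than offsets the delay, which is why the PDP still falls by about three quarters.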
Chapter 3
Design of 6T Adder using Novel 2T XOR Gates
This chapter explains the design of the 6T adder, which uses two of the 2T XOR gates described
in Chapter 2. Using XOR gates is a general way to construct an adder with efficient and correct
operation. Chapter 3 is thus the next step, towards the design of adders using XOR gates, which
can be utilized further at higher levels of design.
The chapter is organized as follows: Section 3.1 explains the basic operating principle of
adders and their general applications in VLSI design. Section 3.2 proposes the novel
architecture of the 6T adder employing the 2T XOR gate from the previous chapter, followed by
Section 3.2.1 discussing the simulation and performance analysis of adders, Section 3.2.2
presenting the layout view of the proposed 6T adder and Section 3.2.3 outlining the analysis of
the results obtained in the chapter.
3.1 What is an Adder?
An adder, or summer, is a digital circuit that performs addition of numbers.
It adds two n-bit binary numbers, where n is the number of bits required.
Digital adders add two or more binary numbers to generate two outputs, sum and carry.
Adders can be classified as half adders and full adders according to the way they combine
binary numbers.
3.1.1 Half Adders
A half adder is a combinational circuit that takes two inputs A and B and produces two outputs,
Sum (S) and Carry (C), as shown in Figure 3.1. It is built using two logic gates, an XOR gate
for the sum and an AND gate for the carry. The input variables A and B are called the augend
and the addend. The truth table describing the operation is given in Table 3.1, and the logic
equations governing its operation are given by Equation (14) and Equation (15) as follows:
𝑆 = 𝐴 ⊕ 𝐵 (14)
𝐶 = 𝐴. 𝐵 (15)
Figure 3.1. Circuit Diagram of Half Adder
TABLE 3.1
TRUTH TABLE FOR HALF ADDERS
INPUT INPUT OUTPUT OUTPUT
A B SUM(S) CARRY(C)
0 0 0 0
0 1 1 0
1 0 1 0
1 1 0 1
3.1.2 Full Adders
The functionality of a 1-bit full adder circuit is summarized by Equations (16) and (17): given
the three 1-bit inputs A, B and Cin, it produces the outputs Sum and carry (Cout), as shown in
Figure 3.2. The logic circuit of the full adder uses two half adders and one OR gate. It is
usually used in a cascade of adders, such as a ripple carry adder, in which the Cout of one
adder is the Cin of the next adder. The critical path runs through the two XOR gates to the sum
bit output. Using only two types of gates is convenient if the circuit is being implemented
using simple IC chips which contain only one gate type per chip. The truth table is given in
Table 3.2, and Equations (16) and (17) are the governing Boolean equations:
𝑆𝑢𝑚 = 𝐴⨁𝐵⨁Cin (16)
Cout = A’.B. Cin + A.B’. Cin + A.B.Cin’ + A.B.Cin = Cin (A’B +AB’) + AB (Cin + Cin’)
Cout = Cin (𝐴⨁𝐵) + AB (17)
Figure 3.2. Logic Circuit of Full Adder
TABLE 3.2
TRUTH TABLE FOR FULL ADDER
A B 𝑪𝒊𝒏 SUM 𝑪𝒐𝒖𝒕
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 1 1 0 1
1 0 0 1 0
1 0 1 0 1
1 1 0 0 1
1 1 1 1 1
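The construction described above, two half adders and one OR gate, can be checked exhaustively against Equations (16) and (17) and Table 3.2 with a short behavioural sketch. It models logic values only, not the transistor-level circuit, and the function names are illustrative.

```python
from itertools import product


def half_adder(a, b):
    """Equations (14) and (15): S = A xor B, C = A and B."""
    return a ^ b, a & b


def full_adder(a, b, c_in):
    """Full adder built from two half adders and one OR gate."""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, c_in)
    return s, c1 | c2


def verify_full_adder():
    """Check every input row against Equations (16), (17) and binary addition."""
    for a, b, c_in in product((0, 1), repeat=3):
        s, c_out = full_adder(a, b, c_in)
        assert s == a ^ b ^ c_in                      # Equation (16)
        assert c_out == (c_in & (a ^ b)) | (a & b)    # Equation (17)
        assert 2 * c_out + s == a + b + c_in          # arithmetic meaning
    return True


if __name__ == "__main__":
    for row in product((0, 1), repeat=3):
        print(row, full_adder(*row))
    print("all rows consistent:", verify_full_adder())
```

The last assertion states the arithmetic meaning of the two outputs: Cout carries weight 2 and Sum weight 1, so together they encode A + B + Cin.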
3.2 Design of Six Transistor Full Adder
The proposed 2T XOR gate has been used to design a 6T full adder. The two outputs SUM and
CARRY (𝐶𝑜𝑢𝑡) are generated based on the Boolean Equations (16) and (17) of the full adder.
The approach to implementing the full adder in this thesis uses two XOR gates for the SUM
output and a 2 x 1 multiplexer to generate the carry output. The inputs to the circuit are A,
B and 𝐶𝑖𝑛. The critical three input XOR function required for the sum bit calculation is well
suited for implementation in pass transistor logic due to its multiplexer structure.
The exclusive OR operation is realized using wired logic [34] of the 2T XOR gate, as in
Equation (16), to give the sum output, and the final carry output given by Equation (17) is
implemented using the M5 and M6 pass transistors. Transistors M5 and M6 have W = 300 nm and
L = 60 nm in 65-nm technology. The W and L of transistors M1 to M4 are the same as defined
for the 2T XOR gate. The schematic of the proposed six transistor full adder is shown in Figure
3.3. A reverse bias voltage of 320 mV is applied in order to obtain appropriate logic high
and logic low levels at the output of the simulated circuit in 65-nm technology. For the
three input combinations there is a two stage delay for the sum and carry outputs. The delay of
the carry output, which is the critical delay of the circuit used for finding the PDP [65], is
less than that of the previously designed eight transistor adder [24] (as explained later
in this chapter).
Minimum width and length are used to minimize the power consumption of
the circuit [61]. The design has been simulated in three technologies, viz. 65-nm,
90-nm and 130-nm, and suitable reverse bias voltages have been applied in the different
technologies to achieve the desired output.
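The sum-by-two-XORs and carry-by-multiplexer structure can be sketched behaviourally as follows. The text states only that a 2 x 1 multiplexer built from M5 and M6 generates the carry; the particular wiring used here (select = A ⊕ B, data inputs A and 𝐶𝑖𝑛) is an assumption, chosen because it is a common way to realize Equation (17), and may differ from the schematic in Figure 3.3.

```python
from itertools import product


def xor2(a, b):
    """Behavioural model of a two input XOR gate."""
    return a ^ b


def mux2(sel, d0, d1):
    """2 x 1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def full_adder_mux(a, b, c_in):
    """Sum from two cascaded XORs; carry from a mux (assumed wiring)."""
    p = xor2(a, b)               # first XOR stage
    s = xor2(p, c_in)            # second XOR stage, Equation (16)
    c_out = mux2(p, a, c_in)     # assumed: pass C_in if A != B, else pass A
    return s, c_out


if __name__ == "__main__":
    for a, b, c_in in product((0, 1), repeat=3):
        s, c_out = full_adder_mux(a, b, c_in)
        expected = (c_in & (a ^ b)) | (a & b)   # Equation (17)
        print(a, b, c_in, "->", s, c_out, "ok" if c_out == expected else "MISMATCH")
```

When A and B differ, the carry equals 𝐶𝑖𝑛; when they are equal, the carry equals A (which is then also A·B), so the multiplexer output agrees with Equation (17) for all eight input combinations.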
Figure 3.3. Schematic Diagram of Proposed 6T Adder in 65-nm Technology
3.2.1 Simulation and Performance Analysis of Proposed 6T Full Adder
The proposed 6T adder is simulated in the Cadence environment at 65-nm, 90-nm and 130-nm
technologies. The input and output voltage waveforms for the simulated schematic of the adder
in Figure 3.3 are shown in Figure 3.4. The output waveform is given for all combinations of the
three inputs, as the circuit responds differently to different input patterns. The post layout
simulation of the adder is performed using the proposed 2T XOR gate. The circuits are simulated
at 50 MHz with rise and fall times of 50 ps.
The voltage difference between logic levels in the circuit is very small, but the circuit shows
the correct output logic levels for the sum and carry. As the voltage swing of the inputs is
1 V in 65-nm technology, an output value above 0.5 V is considered logic ‘1’ and a value below
that is considered logic ‘0’. The voltage degradation in the waveform of the sum output is the
result of the cascaded XOR gates, which are implemented in pass transistor logic. This means
that the proposed XOR gate has the strength to drive only one XOR gate with correct logic
levels without any extra circuitry. Moreover, the circuits here have been analysed with minimum
width transistors. The voltage level difference can be increased by increasing the width of the
transistors, but that increases the silicon area. For further use of the adder circuit in
bigger modules, buffers, inverters or comparator circuits can be employed, if
required, to achieve a higher voltage difference. Level restorer circuits can be used at
different points of the circuit to reduce the effect of voltage degradation and noise [68].
Consequently, a suitable reverse bias value is used for proper operation of the adder. The
transistors used for implementing the XOR gates have minimum width, and with such widths the
difference in voltage levels between logic high and logic low is
as high as 138 mV for the sum output. The voltage swing is much higher, around 400
mV, for the carry output because of the 5X width of transistors M5 and M6.
Figure 3.4. Post Layout Simulation of 6 Transistor Adder at 65-nm Technology
The comparative performance analysis of different adders in terms of power, delay and PDP
is shown in Table 3.4, and also pictorially through histograms in Figure 3.5, comparing the
28T, 20T, 16T, 14T, 10T and 8T adders available in literature with the proposed 6T adder. The
results indicate that the power delay product of the 6T full adder is less than that of the
other adders available in literature. The 8T and 6T adders have been designed using the 3T XOR
gate available in literature [24] and the 2T XOR gate proposed in this thesis, respectively.
TABLE 3.4
COMPARISON OF PERFORMANCE ANALYSIS OF DIFFERENT ADDERS
Types of adder   Technology (nm)   Avg. power (µW)   Avg. delay (ps)   PDP (10^−18 J)
28T (Fig 1.10(a))[29,30] 65 0.481 11.875 5.711
20T (Fig 1.10(b)) [30] 65 0.317 7.812 2.476
16T (Fig 1.10(c)) [31] 65 0.393 4.625 1.817
14T (Fig 1.10(d)) [32] 65 0.511 3.187 1.628
10T (Fig 1.10(e)) [34] 65 0.129 11.625 1.499
8T (Fig 1.10(f)) [24] 65 0.127 8.625 1.095
6T 65 0.439 1.935 0.849
28T (Fig 1.10(a))[29,30] 90 0.806 21.750 17.530
20T (Fig 1.10(b)) [30] 90 0.281 9.812 2.757
16T (Fig 1.10(c)) [31] 90 0.318 8.500 2.703
14T (Fig 1.10(d)) [32] 90 0.610 4.320 2.635
10T (Fig 1.10(e)) [34] 90 0.665 3.750 2.493
8T (Fig 1.10(f)) [24] 90 0.232 9.500 2.204
6T 90 0.685 2.625 1.798
28T (Fig 1.10(a))[29,30] 130 7.107 20.680 146.972
20T (Fig 1.10(b)) [30] 130 3.768 14.750 55.578
16T (Fig 1.10(c)) [31] 130 5.547 8.500 47.149
14T (Fig 1.10(d)) [32] 130 6.572 3.375 22.180
10T (Fig 1.10(e)) [34] 130 1.510 14.218 21.469
8T (Fig 1.10(f)) [24] 130 1.590 10.437 16.594
6T 130 3.962 3.875 15.352
Figure 3.5(a) Comparative Analysis of PDP of Different Adders at 65-nm Technology
Figure 3.5(b) Comparative Analysis of PDP of Different Adders at 90-nm Technology
Figure 3.5(c) Comparative Analysis of PDP of Different Adders at 130-nm Technology
Figure 3.5. PDP (vs) Technology for Adder Architectures
The 6T adder is found to behave correctly for all five process corners, namely typical,
slow-slow, fast-fast, slow-fast and fast-slow, with bias voltage variation of (+/-) 20 mV and
temperature variation from -10℃ to 40℃ in 65-nm, 90-nm and 130-nm technologies. The dominating
parameters of MOSFETs, i.e. threshold voltage and (W/L) ratio, are randomly varied over
different values to conclude the analysis. Due to the reduced voltage swing,
the PVT ranges of the 6T adder are narrower than those of the 2T XOR gate. Still, the circuit
behaves correctly with a variation of (+/-) 10% in design parameters, within the
tolerance limits expected for VLSI circuits. The sub-threshold leakage is reduced by virtue
of reverse biasing, and the gate leakage power is in the femto-watt (fW) range, equal to 701.14
fW, which is almost negligible for the 6T adder.
3.2.2 Layout Design of Proposed Six Transistor Adder
Figure 3.6 shows the layout of the full adder in 65-nm technology in the Cadence Virtuoso
Layout Editor. The interconnect density is lower than that of the 8 transistor full adder [24],
leading to a low power delay product [69]. The layout is symmetric with the view of having big
sized PMOS on two p-wells and p-wells on n-type substrate.
Figure 3.6. Layout View of Proposed 6T Full Adder
A prime motivation for recent research is to reduce the chip area [4]. The silicon space used
defines the area of any circuit in VLSI design, and the circuit interconnections consume a
comparable amount of area. Adders are designed with an effort to find the optimal area
complexity, making the circuit as inexpensive as possible. Table 3.5 and Figure 3.7 show a
comparative study of area for different adders in three distinct technologies. The silicon area
is determined approximately by generating the layout of the adder modules with proper
Design Rule Check (DRC) and Layout Versus Schematic (LVS) checks. Theoretically and
experimentally, the area of the proposed design is the minimum. The silicon area depends on
how the blocks are placed and how efficiently the routing is done. Based on Table 3.5, the
proposed 6T adder has the smallest chip area even with the inclusion of the bias circuit area
shown in Figure 2.4.
TABLE 3.5
COMPARATIVE STUDY OF AREA OF DIFFERENT ADDERS
Type of adder  Technology (nm)  Area (µm²)
28T (Fig 1.10(a))[29,30] 65 114.519
20T (Fig 1.10(b)) [30] 65 83.723
16T (Fig 1.10(c)) [31] 65 78.723
14T (Fig 1.10(d)) [32] 65 60.083
10T (Fig 1.10(e)) [34] 65 44.208
8T (Fig 1.10(f)) [24] 65 39.214
6T 65 14.517
28T (Fig 1.10(a))[29,30] 90 259.364
20T (Fig 1.10(b)) [30] 90 155.703
16T (Fig 1.10(c)) [31] 90 146.577
14T (Fig 1.10(d)) [32] 90 116.741
10T (Fig 1.10(e)) [34] 90 81.624
8T (Fig 1.10(f)) [24] 90 75.247
6T 90 37.953
28T (Fig 1.10(a))[29,30] 130 290.565
20T (Fig 1.10(b)) [30] 130 195.048
16T (Fig 1.10(c)) [31] 130 179.626
14T (Fig 1.10(d)) [32] 130 127.110
10T (Fig 1.10(e)) [34] 130 93.427
8T (Fig 1.10(f)) [24] 130 85.140
6T 130 45.283
Figure 3.7(a) Comparative Analysis of Area of Different Adders at 65-nm Technology
Figure 3.7(b) Comparative Analysis of Area of Different Adders at 90-nm Technology
Figure 3.7(c) Comparative Analysis of Area of Different Adders at 130-nm Technology
Figure 3.7. Area (vs) Technology for Different Adder Architectures
3.2.3 Results and Discussions
The waveforms of Figure 3.4 depict the output of the full adder. There is voltage degradation
in the waveform of the sum output as a result of the cascaded XOR gates. Consequently, a
suitable reverse bias value is used for proper operation of the adder. Level restorers can also
be used at different points of the circuit to reduce the effect of voltage degradation and noise [68].
The power, delay and PDP values and their comparison are compiled in Table 3.4. The power
delay product diminishes from the 28 transistor full adder design to the 6 transistor full adder
design. The PDP of the six transistor adder is found to be as low as 0.849 aJ, compared with
1.095 aJ for the 8T full adder in 65-nm technology, a reduction of approximately 22.46%. The
percentage reduction is smaller for the 6T adder than for the 2T XOR gate because the larger
number of transistors increases the complexity and because the 6T adder has a lower voltage
swing than the 8T full adder. The highest PDP is 5.711 aJ, for the 28 transistor adder in 65-nm
technology. The power and delay of the different full adder architectures follow a similar trend
in 65-nm, 90-nm and 130-nm technologies, with the proposed 6T full adder consuming the least
power among the adder designs in the literature. Analysis under process, voltage and
temperature variations improves confidence in the correctness of the circuit and is valuable for
best and worst case analysis. The thesis also evaluates the leakage power of the circuit, which
is negligible due to the reverse back biasing technique.
Based on Table 3.5, in 65-nm technology, the proposed 6T adder has the smallest chip area of
14.517 μm² even with the insertion of the bias circuit (1.114 μm²). The area is found to be
least, equal to 16.745 μm², for the 6T adder as compared with 39.214 μm² for the 8T adder. A
similar trend is obtained in all three technologies, with the silicon area reduced by
approximately 58.57% from the 8T adder to the 6T adder. This novel adder with minimum
area allows more functions to be implemented per unit area, thus increasing the VLSI
integration density and reducing the die area.
Chapter 4
Design of 5:3 Compressor using Novel 2T XOR
Gates
This chapter proposes another arithmetic circuit, the 5:3 compressor, for low power
multiplication. The architecture uses a two transistor multiplexer design and the novel two
transistor XOR gates, giving the least number of transistors for logic level implementation.
Compared with the conventional designs, the modified and proposed compressor designs
reduce the stage delays, transistor count, PDP, EDP (Energy Delay Product) and silicon area
by using combinations of XOR-XNOR gates, MUX circuits and transistor level
implementation. Simulation studies have been carried out in 65-nm, 90-nm and 130-nm
technologies in Cadence Spectre. The XOR gate, full adder and multiplier can be further used
for many other applications. Other compressors such as 6:3, 7:3 and 8:3 compressors can also
be designed with the technique shown in this chapter and used for multiplication. The design
discussed in this chapter is an application of the proposed 2T XOR gates with additional
optimization.
The chapter is organized as follows: Section 4.1 describes the basic operation of compressors.
Section 4.2 explains the proposed model of 5:3 compressors. Section 4.2.1 presents the
schematics of the proposed compressor designs. Section 4.2.2 presents the simulation and
performance analysis of the proposed 5:3 compressors, compared with other designs proposed
in the literature to show the efficiency of the proposed designs. Section 4.2.3 gives the layout
view of the proposed 5:3 compressors for area comparisons and finally, Section 4.2.4 discusses
the results obtained.
4.1 What are compressors in VLSI design?
A compressor is a combinational device based on the counting logic of the full adder.
It is generally used in multipliers to reduce the number of operands while adding the partial
product terms. A typical m:n compressor takes m equally weighted input bits and produces an
n-bit binary number [70]. In other words, it counts the number of 1s in the input and outputs
the count as a binary value. The block diagram of the 5:3 compressor is shown in Figure 4.1.
The counter property of this compressor is shown in Table 4.1. The counting range of this
compressor is zero to five. The block diagrams of the 6:3 and 7:3 compressors are similar to
that of the 5:3 compressor; one more input is added for the 6:3 compressor and two more for
the 7:3 compressor. I1 to I5 are the inputs and X1 to X3 are the outputs of the 5:3 compressor.
Note that the outputs of the compressor have different power-of-2 weights. The weight of the
LSB (X1) of the compressor output is the same as the weight of each of the inputs, and the
remaining bits have increasingly higher weights.
Figure 4.1. Block Diagram of 5:3 Compressor
TABLE 4.1
COUNTER PROPERTY OF 5:3 COMPRESSOR
Input conditions  X3 X2 X1  Decimal value
All inputs are zero 0 0 0 0
Any one input is one 0 0 1 1
Any two inputs are one 0 1 0 2
Any three inputs are one 0 1 1 3
Any four inputs are one 1 0 0 4
Any five inputs are one 1 0 1 5
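The counter property of Table 4.1 can be illustrated with a short behavioural sketch. This is only a functional model written for illustration, not the transistor-level design of the thesis; the function name `compressor_5_3` is hypothetical.

```python
def compressor_5_3(inputs):
    """Behavioural model of a 5:3 compressor (illustrative only).

    Takes five equally weighted input bits and returns the tuple
    (X3, X2, X1), i.e. the binary count of ones among the inputs.
    """
    if len(inputs) != 5 or any(bit not in (0, 1) for bit in inputs):
        raise ValueError("expected exactly five bits, each 0 or 1")
    count = sum(inputs)          # number of inputs that are one (0 to 5)
    x3 = (count >> 2) & 1        # weight 4
    x2 = (count >> 1) & 1        # weight 2
    x1 = count & 1               # weight 1, same weight as each input
    return x3, x2, x1
```

For example, three inputs at one give the outputs (0, 1, 1), which is the "Any three inputs are one" row of Table 4.1.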
4.2 Architecture of Proposed 5:3 Compressors
The 5:3 compressor is a combinational logic circuit that accepts five inputs and generates
three outputs. The five input bits are summed to produce a three bit output. The conventional
design of the 5:3 compressor is an enhanced version of the 4:2 compressor [71, 72, 73]; its
output has a maximum value of 101, reached when all five input bits are 1. The conventional
designs of the 5:3 compressor are shown in Figure 1.11. Figure 1.11(a) is a straightforward
approach which leads to five stage delays, and the improved design in Figure 1.11(b) [52]
entails three stage delays. The current work uses 2x1 multiplexers in place of XOR gates at the
second and third stages, producing the output with a decreased critical path delay. Moreover,
the architecture also plays a significant role in decreasing the PDP, EDP and area. The design
of the 5:3 compressor has been derived by suitably rewriting the Boolean equations as follows:
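\[
\begin{aligned}
O_0 &= x_0 \oplus x_1 \oplus x_2 \oplus x_3 \oplus x_4 \\
    &= (x_0 \oplus x_1 \oplus x_2 \oplus x_3)\cdot\overline{x_4} + \overline{(x_0 \oplus x_1 \oplus x_2 \oplus x_3)}\cdot x_4 \\
    &= \left[(x_0 \oplus x_1)\cdot\overline{(x_2 \oplus x_3)} + \overline{(x_0 \oplus x_1)}\cdot(x_2 \oplus x_3)\right]\cdot\overline{x_4} \\
    &\quad + \left[(x_0 \oplus x_1)\cdot(x_2 \oplus x_3) + \overline{(x_0 \oplus x_1)}\cdot\overline{(x_2 \oplus x_3)}\right]\cdot x_4
\end{aligned}
\tag{18}
\]
\[
O_1 = \left[(x_0 \oplus x_1 \oplus x_2 \oplus x_3)\cdot x_4 + \overline{(x_0 \oplus x_1 \oplus x_2 \oplus x_3)}\cdot x_3\right] \oplus \left[(x_0 \oplus x_1)\cdot x_2 + \overline{(x_0 \oplus x_1)}\cdot x_0\right]
\tag{19}
\]
\[
O_2 = \left[(x_0 \oplus x_1 \oplus x_2 \oplus x_3)\cdot x_4 + \overline{(x_0 \oplus x_1 \oplus x_2 \oplus x_3)}\cdot x_3\right] \cdot \left[(x_0 \oplus x_1)\cdot x_2 + \overline{(x_0 \oplus x_1)}\cdot x_0\right]
\tag{20}
\]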
The proposed architectures are based on Equations (18), (19) and (20).
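As a sanity check, Equations (18), (19) and (20) can be evaluated for all 32 input patterns and compared with the count of ones. The sketch below is illustrative only. It reads the equations with the complement bars restored and takes the most significant output O2 as the AND of the two intermediate carry terms; both are interpretations, chosen because they are what makes the outputs agree with Table 4.1. The function names are hypothetical.

```python
from itertools import product


def compressor_outputs(x0, x1, x2, x3, x4):
    """Evaluate the 5:3 compressor Boolean equations; returns (O2, O1, O0)."""
    a = x0 ^ x1                  # first-stage XOR (x0 xor x1)
    b = a ^ x2 ^ x3              # x0 xor x1 xor x2 xor x3
    o0 = b ^ x4                  # Equation (18): XOR of all five inputs
    # MUX-style carry terms: b (or a) acts as the select signal.
    carry = (b & x4) | ((b ^ 1) & x3)
    cout = (a & x2) | ((a ^ 1) & x0)
    o1 = carry ^ cout            # Equation (19)
    o2 = carry & cout            # Equation (20), read as an AND (assumption)
    return o2, o1, o0


def equations_match_count():
    """True if 4*O2 + 2*O1 + O0 equals the number of ones for every input."""
    for bits in product((0, 1), repeat=5):
        o2, o1, o0 = compressor_outputs(*bits)
        if 4 * o2 + 2 * o1 + o0 != sum(bits):
            return False
    return True
```

The two carry terms have the select-and-pass form of a 2x1 multiplexer, which is consistent with the use of multiplexers in the second and third stages of the architecture.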
Figure 4.2 is the modified version of the 5:3 compressor of Figure 1.11(b), with a reduced
critical path delay. In theory, as explained above, the changes result in a more efficient design
than the earlier 5:3 compressor designs, and the simulations presented later confirm that the
changes yield an optimized design with high speed and low power. The W/L ratios are
minimum for the XOR gates and 5/1 for the multiplexer. The reverse bias voltages are adjusted
according to the desired logic levels of the design in the different technologies. MUX* is the
block incorporating the inverter stage. The alternative topology for the 2x1 multiplexer used in
the thesis is shown in Figure 4.3, with A and B as inputs and S as the select line; it is faster and
consumes less power than other CMOS multiplexer designs [34]. The design exploits the
advantages of pass transistor logic over CMOS logic.
Figure 4.2. Architecture of Proposed 5:3 Compressor Design
Figure 4.3. Two Transistor 2x1 Multiplexer Design
The XOR gate used in the proposed 5:3 compressor architectures is the design proposed in
Chapter 2 of the thesis and shown in Figure 2.3. To show that the designs do not depend on a
particular technology, they have been implemented in three technologies, viz. 65-nm, 90-nm
and 130-nm. Power delay product and energy delay product simulations have also been carried
out, and both are found to be lower than those of peer designs.
4.2.1 Circuit Design of Proposed 5:3 Compressors
The architectures have been designed and simulated in Cadence Spectre in 65-nm, 90-nm and
130-nm technologies. The schematic diagrams of the modified and proposed designs in 90-nm
technology are shown in Figure 4.4(a) and 4.4(b).
Figure 4.4(a). 3T XOR and 2T 2x1 MUX Compressor (Modified Design)
Figure 4.4(b). 2T XOR and 2T 2x1 MUX Compressor (Proposed Design)
Figure 4.4. Schematic View of 5:3 Compressors
4.2.2 Simulation and Performance Analysis of Proposed 5:3 Compressor Architectures
With the aim of evaluating and comparing the performance of the proposed designs with the
previously reported 5:3 compressor of Figure 1.11(b) (3 transistor XOR and 6 transistor MUX
compressor) [52], exhaustive simulation studies have been carried out with respect to number
of transistors, delay and power dissipation. The circuits are simulated under the same test
conditions and the response of the outputs to four different input patterns is studied. All
assessments have been carried out in 65-nm, 90-nm and 130-nm technologies. The simulations
are run at a frequency of 2.5 MHz with rise and fall times of 50 ps. The energy delay product is
computed by multiplying the PDP by the average delay [1]. The simulated results are given in
Table 4.2. They are also depicted as histograms in Figure 4.5 to give a clearer view of the
differences in EDP between the compressors of Table 4.2 at the different technologies.
TABLE 4.2
COMPARATIVE ANALYSIS OF PERFORMANCE OF DIFFERENT 5:3 COMPRESSORS
Technology (nm)  Type of Compressor Circuit  No. of transistors  Delay (ns)  Power (nW)  PDP (10⁻¹⁸ J)  EDP (10⁻²⁷ Js)
130  Conventional(Fig 4.2(b))[38]  28  1.371  2070  2837.97  3890.857
130  3T XOR-2T 2x1 MUX (Fig 4.6(a))  24  0.811  463.710  376.0688  304.991
130  2T XOR-2T 2x1 MUX (Fig 4.6(b))  21  1.207  154.460  186.433  225.024
90  Conventional(Fig.4.2(b))[38]  28  0.870  1305  1135.350  987.754
90  3T XOR-2T 2x1 MUX (Fig 4.6(a))  24  0.578  227.800  131.668  76.104
90  2T XOR-2T 2x1 MUX (Fig 4.6(b))  21  0.721  128.170  92.414  66.630
65  Conventional(Fig.27(b))[36]  28  0.604  1220  736.880  445.075
65  3T XOR-2T 2x1 MUX (Fig 4.6(a))  24  0.477  155.706  74.271  35.427
65  2T XOR-2T 2x1 MUX (Fig 4.6(b))  21  0.553  92.840  51.340  26.984
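The PDP and EDP columns follow directly from the delay and power columns: PDP is power multiplied by delay, and EDP is PDP multiplied by delay again. The sketch below shows the arithmetic in SI units; the function names are hypothetical and the figures used in the comments are taken from Table 4.2.

```python
def pdp_joules(power_w, delay_s):
    """Power delay product in joules: average power times average delay."""
    return power_w * delay_s


def edp_joule_seconds(power_w, delay_s):
    """Energy delay product in joule-seconds: PDP times average delay."""
    return pdp_joules(power_w, delay_s) * delay_s


def percent_reduction(reference, proposed):
    """Percentage by which `proposed` is lower than `reference`."""
    return 100.0 * (reference - proposed) / reference


# Conventional compressor at 130-nm in Table 4.2: 2070 nW and 1.371 ns.
pdp = pdp_joules(2070e-9, 1.371e-9)          # about 2837.97e-18 J
edp = edp_joule_seconds(2070e-9, 1.371e-9)   # about 3890.86e-27 Js
```

The same `percent_reduction` helper can be applied to any pair of table entries, for example two PDP or area values of competing designs at one technology.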
Figure 4.5. EDP (vs) Type of Compressor Circuit in Different Technology
4.2.3 Layout of Proposed 5:3 Compressor Architectures
A comparative study of silicon area is done for the proposed designs and the conventional
design. The results are grouped in Table 4.3 and shown graphically in Figure 4.6. The
proposed design has the lowest transistor count and should therefore occupy the least silicon
area when fabricated. This expectation is confirmed by the layout designs presented in Figure
4.7(a) and Figure 4.7(b) for the modified and proposed versions respectively. The layouts of
the 5:3 compressor architectures are drawn in 90-nm technology in Cadence Virtuoso. The
layout is designed with the lowest interconnect density (i.e. routing is kept close to the
minimum), leading to low power consumption [69]. The layout is made symmetric by placing
the large PMOS transistors with proper orientation of the substrates so that it covers minimum
space, is free of errors and follows the DRC constraints. Cadence allows the orientation of a
transistor to be changed to obtain the required symmetry.
TABLE 4.3
COMPARATIVE STUDY OF THE AREA OF DIFFERENT 5:3 COMPRESSORS
Type of design  Technology (nm)  Area (μm²)
Conventional(Fig 4.2(b))[38] 130 278.141
3T XOR-2T 2x1 MUX (Fig 4.6(a)) 130 204.491
2T XOR-2T 2x1 MUX (Fig 4.6(b)) 130 181.249
Conventional(Fig 4.2(b))[38] 90 220.065
3T XOR-2T 2x1 MUX (Fig 4.6(a)) 90 153.960
2T XOR-2T 2x1 MUX(Fig 4.6(b)) 90 131.953
Conventional(Fig 4.2(b))[38] 65 76.497
3T XOR-2T 2x1 MUX (Fig 4.6(a)) 65 61.425
2T XOR-2T 2x1 MUX (Fig 4.6(b)) 65 45.578
Figure 4.6. Area (vs) Type of Compressor Circuit in Different Technology
Figure 4.7(a). 3T XOR and 2T 2x1 MUX
Figure 4.7(b). 2T XOR and 2T 2x1 MUX
Figure 4.7. Layout View of Proposed 5:3 Compressors in 90-nm Technology
4.2.4 Results and Discussions
A design of 5:3 compressors using 3T XOR gates and 2T XOR gates combined with 2x1
MUXes has been implemented. The designs are simulated and examined in terms of power,
delay, PDP and EDP, and their layouts are drawn for area estimation.
Table 4.2 indicates that the 3T XOR and 2T 2x1 MUX 5:3 compressor has a lower delay than
the 2T XOR and 2T 2x1 MUX 5:3 compressor, but the latter dissipates approximately half the
power, giving a reduced PDP and EDP. From Table 4.2, the PDP of the proposed compressor
is lower than that of the modified compressor by approximately 30% at 65-nm and 90-nm and
by approximately 50% at 130-nm. The trade-off between power and delay has also been found
in other peer designs in the literature [50]. The proposed models show a marked improvement
in power, PDP, EDP and area over the conventional design.
Table 4.3 gives the silicon area evaluation: the area is least for the 2T XOR and 2x1 MUX
design, and the 3T XOR and 2x1 MUX design also has less area than the conventional 5:3
compressor design in the literature. The tables also show that as the technology scales from
130-nm to 65-nm, the area, power, delay, PDP and EDP all decrease. The smallest area
obtained is 45.578 μm² for the 2T XOR and 2x1 MUX 5:3 compressor, an approximate
reduction of 25.79 % in silicon area compared with the 3T XOR and 2x1 MUX design.
Fabrication is expected to be faster because the design uses more PMOS transistors rather than
full CMOS. The complexity decreases with fewer transistors, which also affects the routing
space. So, along with a new architecture, the choice of technology is also an important
criterion. As we move to smaller technology nodes, the efficiency of the design improves,
provided the accuracy of implementation is kept in mind. A genuine trade-off should be
maintained among the design parameters. Hence, the proposed design can be incorporated
efficiently into design systems such as CPUs or DSPs to increase the overall performance of
the system. Starting with smaller modules, a more complex module can be built with good
results. This can influence the emerging trend in the VLSI industry and open the way for new
research.
Chapter 5
Design of 8 Bit x 8 Bit Multiplier using Novel
2T XOR Gates
This chapter explores a novel design of an 8 bit x 8 bit multiplier using two transistor XOR
gates and six transistor full adders. It takes the thesis to the next level of arithmetic operation,
which can be further used in industrial microprocessors and digital signal processors (DSPs).
The design has been compared with other multipliers (multipliers formed using the XOR gates
and adders of different transistor counts from Chapter 2 and Chapter 3) in terms of power,
delay, Power Delay Product (PDP) and area. The idea has been extended by applying the two
transistor XOR gates, six transistor adders and the 8 bit x 8 bit multiplier to build an 8 bit
Multiply-Accumulate (MAC) unit in 65-nm technology. The comparisons are made in
Cadence Virtuoso Spectre in UMC 65-nm, 90-nm and 130-nm technologies.
The chapter is organized as follows: Section 5.1 explains the basic operation of multipliers in
a general sense. Section 5.2 proposes the 8 bit x 8 bit multiplier design, with Section 5.2.1
describing the working of the array multiplier. Section 5.2.2 gives details of the simulation and
performance analysis of the proposed multiplier design, compared with multipliers built from
adder designs in the literature. Section 5.2.3 presents the layout of the 8 bit x 8 bit multiplier
architecture and Section 5.2.4 gives an overview of the Multiply-Accumulate (MAC) unit.
Section 5.2.5 briefly explains one module of the MAC, the register/accumulator (True Single
Phased Clocked Register (TSPCR)), and Section 5.2.6 gives the results and conclusions of the
chapter.
5.1 What is a Multiplier?
A multiplier is a circuit in the VLSI domain that performs the multiplication operation. As a
mathematical operation, multiplication is an abbreviated process of adding an integer to itself
a given number of times [74]. In its most basic form, multiplication is the product of two
binary numbers, namely the multiplicand and the multiplier. At the elementary level, the
operation is performed by placing the multiplicand on top of the multiplier. The result is
obtained by multiplying the multiplicand by each digit of the multiplier, beginning with the
least significant digit (LSD). The initial stage generates the partial products, which are then
compressed through compressors to generate the product matrix. The intermediate results
(partial products) are placed one atop the other, each offset by one position, so that digits of
the same weight are aligned. The final product is determined by summing all the intermediate
results. The multiplication technique applies equally to all bases, including binary. The basic
data flow for multiplication is described in Figure 5.1, with one black dot for each digit.
Figure 5.1. Basic Multiplication
5.1.1 Multiplication Algorithm
Multiplication is a simple operation in digital electronics. The classical algorithm for
multiplying two binary numbers is given by the flowchart in Figure 5.2, where the Most
Significant Bit (MSB) represents the sign of the number. The algorithm multiplies an n-bit
multiplicand by an m-bit multiplier to generate m partial products and n + m bits of product,
arranged in array form as shown in Figure 5.3.
Y = Yn-1 Yn-2 … Y2 Y1 Y0 (Multiplicand)
X = Xn-1 Xn-2 … X2 X1 X0 (Multiplier)
Figure 5.2. Signed Multiplication Algorithm
Figure 5.3. Product Matrix
The product, expressed as an addition of weighted bit products, is:
\[
P(m+n) = Y(m) \cdot X(n) = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} y_i \, x_j \, 2^{i+j}
\tag{21}
\]
where P represents the product, y_i is the i-th bit of Y and x_j is the j-th bit of X.
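Equation (21) can be checked numerically by summing the weighted bit products directly. The following sketch is for illustration only and assumes unsigned operands; the function name `product_from_bits` is hypothetical.

```python
def product_from_bits(y, x, m, n):
    """Evaluate the double sum of Equation (21) for unsigned operands.

    y is treated as an m-bit number and x as an n-bit number; each term
    y_i * x_j is a one-bit AND weighted by 2**(i + j).
    """
    total = 0
    for i in range(m):
        y_i = (y >> i) & 1
        for j in range(n):
            x_j = (x >> j) & 1
            total += (y_i & x_j) << (i + j)
    return total
```

For two 4-bit operands 13 and 11, the double sum gives 143, the same as the ordinary product.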
Generally, the product matrix is evaluated by add and shift operations. Thus, multiplication
involves three main steps [45, 46, 53]:
1. Generation of partial product
2. Accumulation of shifted partial product and its reduction
3. Final addition
AND gates are used to generate the partial products; multiplier designs differ in the way these
partial products are generated and summed. The logical AND operation is followed by
decomposition of the multiplication into addition operations.
As an 8 bit x 8 bit multiplier is proposed in the thesis, an example of the multiplication of two
8 bit numbers A and B to produce a 16 bit product P is shown in Figure 5.4.
Figure 5.4. Example: Multiplication of 8 bit x 8 bit Binary Numbers
The above matrix shows clearly that the multiplication has been reduced to the addition of
binary numbers, which proceeds in three phases:
1. Add the multiplicand to an accumulator if the Least Significant Bit (LSB) of the multiplier
is ‘1’.
2. Shift the multiplicand one bit to the left and the multiplier one bit to the right.
3. Halt the operation when all the bits of the multiplier are zero.
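The three phases above can be sketched in software as a shift-and-add loop. This is a minimal illustration for unsigned operands only (sign handling, as in the signed algorithm of Figure 5.2, is not modelled); the function name is hypothetical.

```python
def shift_add_multiply(multiplicand, multiplier):
    """Multiply two unsigned integers by repeated shift and add."""
    if multiplicand < 0 or multiplier < 0:
        raise ValueError("this sketch handles unsigned operands only")
    accumulator = 0
    while multiplier != 0:               # phase 3: stop when the multiplier is zero
        if multiplier & 1:               # phase 1: add if the LSB is 1
            accumulator += multiplicand
        multiplicand <<= 1               # phase 2: multiplicand one bit left,
        multiplier >>= 1                 #          multiplier one bit right
    return accumulator
```

Each pass through the loop corresponds to one row of partial products in the product matrix.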
A serial multiplier, which needs the least hardware, results when the partial products are added
serially. When the partial products are added using one combinational circuit, a parallel
multiplier is formed. Different compression techniques can also be exploited to reduce the
partial products, giving rise to different kinds of multiplier.
5.2 Design of Proposed 8 Bit x 8 Bit Multiplier
An 8 bit x 8 bit multiplier has been implemented using the proposed 6T adder. The result is
obtained by multiplying two 8 bit numbers in a traditional array architecture, as shown in
Figure 5.4, to get the desired 16 bit output. The array multiplier is chosen to achieve low power
and high speed multiplication with lower hardware cost.
5.2.1 Array Multiplier
The array multiplier is a multiplier with a traditional structure. The architecture is regular and
operates by repeated addition and shifting. The algorithm multiplies the multiplicand bits by
one bit of the multiplier at a time, starting from its Least Significant Bit (LSB). Shifting is then
done according to the bit position. The structure is organized as several stages of AND gates
and full adder cells. To multiply two N bit numbers, N² adders and N² AND gates are required.
If A and B are the multiplicand and multiplier binary numbers respectively, then P denotes the
product and the intermediate results are the partial products. 𝑆𝑖 and 𝐶𝑖 represent the input sum
and carry of the 𝑖𝑡ℎ stage, and 𝑆0 and 𝐶0 represent the output sum and carry of that stage,
which are passed on to the next block. The logic style of the array multiplier is shown in
Figure 5.5(a).
Each individual block is made using an AND gate and a full adder, as shown in Figure 5.5(b).
The Boolean equations governing the working of the array multiplier are as follows:
𝑃 = 𝐴. 𝐵 (22)
𝑆0 = 𝑆𝑖 ⊕ 𝑃 ⊕ 𝐶𝑖 (23)
𝐶0 = 𝑆𝑖. 𝑃 + 𝐶𝑖(𝑆𝑖 ⊕ 𝑃) (24)
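Equations (22), (23) and (24) describe one cell of the array. The sketch below models such a cell and wires N x N of them into a multiplier to check that the array computes the product. It is a behavioural illustration only: the wiring used here (carry rippling along each row, sums shifted by one position between rows) is one common arrangement assumed for the sketch and is not taken from Figure 5.5(a). The function names are hypothetical.

```python
def multiplier_cell(a, b, s_in, c_in):
    """One array cell: AND gate plus full adder, Equations (22) to (24)."""
    p = a & b                                   # Equation (22)
    s_out = s_in ^ p ^ c_in                     # Equation (23)
    c_out = (s_in & p) | (c_in & (s_in ^ p))    # Equation (24)
    return s_out, c_out


def array_multiply(a, b, n=8):
    """Multiply two unsigned n-bit numbers using n*n cells (assumed wiring)."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> j) & 1 for j in range(n)]
    running = [0] * n            # partial sum entering the current row
    product_bits = []
    for j in range(n):           # one row of cells per multiplier bit
        carry = 0
        row = []
        for i in range(n):       # carry ripples along the row
            s, carry = multiplier_cell(a_bits[i], b_bits[j], running[i], carry)
            row.append(s)
        product_bits.append(row[0])      # bit of weight 2**j is now final
        running = row[1:] + [carry]      # shift by one position for the next row
    product_bits.extend(running)         # upper n bits of the product
    return sum(bit << k for k, bit in enumerate(product_bits))
```

With n = 8 the model uses 64 cells, matching the 64 adders and 64 AND gates of the proposed multiplier.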
5.2.2 Simulation and Performance Analysis of Proposed 8 Bit x 8 Bit Multiplier
The proposed 8 bit x 8 bit multiplier has been designed using 64 six transistor adders and 64
AND gates arranged in a symmetric array, as depicted in Figure 5.5(a), to generate the
product. The performance of the proposed multiplier has been analysed and compared with
8 bit x 8 bit multipliers built from adder designs available in the literature
[24, 29, 30, 31, 32, 33]. The adders used are the same ones shown in Chapter 3, from 28T to
6T. The comparison of power, delay and Power Delay Product (PDP) in 65-nm, 90-nm and
130-nm technologies in Cadence Spectre is given in Table 5.1 and Figure 5.6. The simulations
are run in three different technologies so that the comparison does not depend on a single
technology. The simulations are carried out at a frequency of 50 MHz with 50 ps rise and fall
times. The results indicate that the PDP of the multiplier employing the six transistor adder is
the lowest in all three technologies. Process corner analysis has also been performed for all
process corners with (+/-) 10% variations in bias voltage and temperature.
Figure 5.5(a). An 8 bit x 8 bit Array Multiplier
Figure 5.5(b). Basic Building Block
Figure 5.5. Array Multiplier Architecture
TABLE 5.1
PERFORMANCE ANALYSIS OF 8 BIT x 8 BIT MULTIPLIER USING DIFFERENT ADDERS
Type of adder used in multiplier  Technology (nm)  Avg. power (μW)  Avg. delay (ps)  PDP (10⁻¹⁸ J)
28T (Fig 1.10(a))[29,30] 65 42.600 198.557 8458.528
20T (Fig 1.10(b)) [30] 65 36.500 180.468 6587.082
16T (Fig 1.10(c)) [31] 65 27.460 140.083 3846.679
14T (Fig 1.10(d)) [32] 65 24.570 136.774 3360.537
10T (Fig 1.10(e)) [34] 65 23.630 122.866 2903.323
8T (Fig 1.10(f)) [24] 65 20.140 118.018 2376.882
6T 65 15.220 121.816 1854.039
28T (Fig 1.10(a))[29,30] 90 82.120 339.278 27861.509
20T (Fig 1.10(b)) [30] 90 80.470 330.829 26621.809
16T (Fig 1.10(c)) [31] 90 50.290 315.788 15880.978
14T (Fig 1.10(d)) [32] 90 48.460 285.399 13830.435
10T (Fig 1.10(e)) [34] 90 45.900 220.926 10140.503
8T (Fig 1.10(f)) [24] 90 32.168 207.338 6669.648
6T 90 29.060 216.472 6290.676
28T (Fig 1.10(a))[29,30] 130 228.132 392.806 89611.618
20T (Fig 1.10(b)) [30] 130 223.800 364.718 81623.888
16T (Fig 1.10(c)) [31] 130 201.840 306.412 61846.198
14T (Fig 1.10(d)) [32] 130 148.440 303.844 45102.603
10T (Fig 1.10(e)) [34] 130 144.960 301.091 43646.151
8T (Fig 1.10(f)) [24] 130 105.336 258.312 27209.552
6T 130 96.420 264.900 25541.658
Figure 5.6(a) Comparative Analysis of PDP of Different Multipliers at 65-nm Technology
Figure 5.6(b) Comparative Analysis of PDP of Different Multipliers at 90-nm Technology
Figure 5.6(c) Comparative Analysis of PDP of Different Multipliers at 130-nm Technology
Figure 5.6. PDP (vs) Technology of Different Multiplier Architectures
5.2.3 Layout Design of Proposed 8 Bit x 8 Bit Multiplier
The layout of the 8 bit x 8 bit multiplier using the proposed 6T adder has been designed in
65-nm technology and is shown in Figure 5.7. The individual array blocks have been positioned
so that the interconnection complexity is reduced and the layout is symmetric. The array
multiplier gives a small and regular layout, which leads to a robust and compact design. The
layout is also free from DRC errors in the Cadence Virtuoso ASSURA verification suite.
A comparative study of the silicon area of 8 bit x 8 bit multipliers employing adders of
different transistor counts is shown in Table 5.2 and Figure 5.8. The table clearly shows that
the multiplier implemented with the 6T adder has the least area, leaving room for more
functions on a chip.
Figure 5.7. Layout Design of Proposed 8 Bit x 8 Bit Multiplier
TABLE 5.2
COMPARATIVE STUDY OF AREA OF 8 BIT x 8 BIT MULTIPLIER USING DIFFERENT ADDERS
Type of adder used in multiplier  Technology (nm)  Area (µm²)
28T (Fig 1.10(a))[29,30] 65 11740.087
20T (Fig 1.10(b)) [30] 65 8411.571
16T (Fig 1.10(c)) [31] 65 8264.371
14T (Fig 1.10(d)) [32] 65 6466.196
10T (Fig 1.10(e)) [34] 65 4276.463
8T (Fig 1.10(f)) [24] 65 3735.890
6T 65 1581.069
28T (Fig 1.10(a))[29,30] 90 26668.557
20T (Fig 1.10(b)) [30] 90 18246.264
16T (Fig 1.10(c)) [31] 90 18107.889
14T (Fig 1.10(d)) [32] 90 13507.574
10T (Fig 1.10(e)) [34] 90 12504.680
8T (Fig 1.10(f)) [24] 90 11535.308
6T 90 6253.650
28T (Fig 1.10(a))[29,30] 130 28488.613
20T (Fig 1.10(b)) [30] 130 21300.440
16T (Fig 1.10(c)) [31] 130 20619.317
14T (Fig 1.10(d)) [32] 130 18231.370
10T (Fig 1.10(e)) [34] 130 15272.972
8T (Fig 1.10(f)) [24] 130 12391.646
6T 130 7719.373
Figure 5.8(a) Comparative Analysis of Area of Different Multipliers at 65-nm Technology
Figure 5.8(b) Comparative Analysis of Area of Different Multipliers at 90-nm Technology
Figure 5.8(c) Comparative Analysis of Area of Different Multipliers at 130-nm Technology
Figure 5.8. Area (vs) Technology of Different Multiplier Architectures
5.2.4 Overview of Design of Multiply and Accumulate (MAC) Unit
The next level of design is the Multiply and Accumulate (MAC) unit, in which multipliers and
adders are the fundamental components, as shown in Figure 5.9 [75, 76]. A high speed, high
throughput MAC is required to achieve a high performance digital signal processing system
for computationally intensive applications and real-time signal processing. The major criteria
in MAC unit design over the last few years have been speed and power consumption. For
personal communication, low power designs are generally preferred. The simulation results of
the adders in Chapter 3 and of the multiplier in Section 5.2.2 clearly indicate the improvement
in overall performance of the proposed designs in terms of power, PDP and area. Hence, the
proposed architecture is useful for implementing a Multiply-Accumulate (MAC) unit with
high speed, low power and minimal area on the chip.
A typical MAC unit has three sub-units, namely a multiplier, an adder and an accumulator
register. The multiplier computes the partial products involved. The adder adds up the partial
products generated and saves the result in the accumulator register. Figure 5.9 depicts the
MAC architecture for two binary inputs of N bits each; it is thus a general design which can be
implemented for any number of bits. The two N-bit inputs are given to the multiplier, which
generates a 2N-bit output. This 2N-bit value is given to the adder for computation. The output
of the adder is 2N+1 bits, i.e. 2N bits plus one bit for the carry. The output is then given to the
accumulator register. The accumulator register used in this design is a True Single Phased
Clocked Register. The output of the accumulator register is taken out or fed back as one of the
inputs to the carry save adder.
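The data flow of the MAC unit (multiply, add to the fed-back accumulator value, store) can be sketched behaviourally as follows. This is an illustration of the principle only, not a model of the simulated circuit; the class name `MacUnit` is hypothetical, and the accumulator width of 2N+1 bits with wrap-around on overflow is an assumption made for the sketch.

```python
class MacUnit:
    """Behavioural multiply-accumulate model for unsigned N-bit inputs."""

    def __init__(self, n_bits=8):
        self.n_bits = n_bits
        self.acc_bits = 2 * n_bits + 1      # assumed accumulator width
        self.accumulator = 0                # contents of the accumulator register

    def step(self, a, b):
        """Multiply a by b, add the result to the accumulator and store it."""
        limit = 1 << self.n_bits
        if not (0 <= a < limit and 0 <= b < limit):
            raise ValueError("inputs must be unsigned n-bit values")
        product = a * b                      # multiplier: 2N-bit result
        total = self.accumulator + product   # adder with fed-back accumulator
        self.accumulator = total & ((1 << self.acc_bits) - 1)
        return self.accumulator
```

Calling `step` once per clock cycle with a new pair of operands accumulates the running sum of products, which is the operation used in DSP kernels such as filters.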
Figure 5.9. Basic Multiply and Accumulate (MAC) Unit
The thesis therefore presents an implementation of a MAC unit in 65-nm technology to show
the application of the proposed models at a higher level of abstraction. The MAC unit can be
used in industry for the manufacture of DSP chips with reduced silicon area and enhanced
performance. An 8 bit Multiply and Accumulate (MAC) unit has been simulated in 65-nm
technology, comprising the 8 bit x 8 bit multiplier, 6T adders and registers. A True Single
Phased Clocked Register (TSPCR), as depicted in Figure 5.10, is used as the
accumulator/register unit of the MAC design [1].
5.2.5 True Single Phased Clocked Register (TSPCR)
The True Single Phased Clocked Register (TSPCR) logic combines basic single phase
positive and negative latches, as shown in Figure 5.11. The main aim of using the TSPCR is to
avoid clock skew (a phenomenon in synchronous circuits in which the clock signal arrives at
different components at different times). It thus eliminates the two phase clocking scheme and
uses a single phase clock. For the positive latch, when the clock CLK is high, the latch enters
transparent mode, forming two cascaded inverters; the latch is non-inverting and propagates
the input (IN) to the output (OUT). When CLK is low, on the other hand, both inverters are
disabled and the latch is in hold mode. The pull down circuits are deactivated but the pull up
circuits are still active. Since the circuit is a cascade of two such stages, no signal propagates
from the input of the latch to the output. A register is constructed by cascading positive and
negative latches. The load capacitances are C1 = 1 fF and C2 = 0.5 fF. The difference in the
capacitance values is due to the loading effect of the cascaded stages. The TSPCR offers
advantages such as a reduced delay overhead associated with the latches, but has the slight
disadvantage of an increased number of transistors.
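The transparent and hold modes of the latches, and the way a register results from cascading them, can be illustrated at the logic level. The sketch below ignores all circuit-level behaviour (dynamic nodes, capacitances, delays). The class names are hypothetical, and the ordering used (negative latch first, positive latch second, giving a register that updates on the rising clock edge) is an assumption of the sketch.

```python
class LevelLatch:
    """Logic-level latch: transparent at one clock level, holding at the other."""

    def __init__(self, transparent_level):
        self.transparent_level = transparent_level   # 1: positive, 0: negative
        self.q = 0                                    # stored value, initially 0

    def update(self, clk, d):
        if clk == self.transparent_level:
            self.q = d          # transparent mode: input propagates to output
        return self.q           # hold mode: output keeps the stored value


class RegisterModel:
    """Register built by cascading a negative and a positive latch."""

    def __init__(self):
        self.master = LevelLatch(transparent_level=0)   # negative latch
        self.slave = LevelLatch(transparent_level=1)    # positive latch

    def step(self, clk, d):
        """Apply one clock level and data value; return the register output."""
        m = self.master.update(clk, d)
        return self.slave.update(clk, m)
```

Because only one of the two latches is transparent at any clock level, the input never passes straight through to the output, and a single clock phase is sufficient.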
Figure 5.10. True single Phased Clock Register (TSPCR)
Figure 5.11(a). Positive Latch
Figure 5.11(b). Negative Latch
Figure 5.11. Positive and Negative Latches
5.2.6 Results and Discussions
An 8 bit x 8 bit multiplier has also been implemented using the 6T adder design, and its
performance has been analysed and compared with similar multipliers built from peer adder
designs available in the literature. The Power Delay Product (PDP) of the proposed multiplier
is as low as 1.854 pJ in 65-nm technology, compared with 2.376 pJ for the closest competing
multiplier, which uses the 8T adder. The overall reduction in PDP across all technologies is
28.2%. The proposed 8 bit x 8 bit multiplier shows a smaller percentage reduction in power
consumption relative to the multiplier employing the 8T full adder. This is because inverters
are used at the higher level of abstraction to restore the voltage swing, which was a drawback
of the cascaded 2T XOR gates and the 6T full adder. However, since very little power is
dissipated at the lower level, this trade-off can be handled well at higher levels. In
addition, the MAC unit realized in 65-nm technology exhibits a delay of 3.977 ns and a power
dissipation of 1.107 mW.
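As a quick arithmetic check on the figures quoted above, the following sketch computes the percentage PDP reduction for the 65-nm pair of values and the PDP implied by the reported MAC delay and power. The helper names are hypothetical. The 65-nm pair alone works out to roughly 22%, whereas the 28.2% figure is reported as the overall value across technologies. The MAC PDP of about 4.40 pJ is derived here from the two reported numbers and is not a value stated in the thesis.

```python
def percent_reduction(reference, proposed):
    """Percentage by which `proposed` is lower than `reference`."""
    return 100.0 * (reference - proposed) / reference


def pdp_pj(delay_ns, power_mw):
    """Power-delay product in pJ: ns * mW = 1e-9 s * 1e-3 W = 1e-12 J."""
    return delay_ns * power_mw


# Values quoted in the text for 65-nm technology.
pdp_8t_multiplier = 2.376  # pJ, multiplier with 8T adder
pdp_6t_multiplier = 1.854  # pJ, proposed multiplier with 6T adder

reduction_65nm = percent_reduction(pdp_8t_multiplier, pdp_6t_multiplier)
mac_pdp = pdp_pj(delay_ns=3.977, power_mw=1.107)  # derived, not reported in the thesis

print(f"PDP reduction at 65 nm: {reduction_65nm:.1f} %")
print(f"MAC PDP at 65 nm: {mac_pdp:.3f} pJ")
```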
Based on Table 5.2, it is evident that the silicon area is smallest for the proposed design in
all three technologies. The estimated silicon area is 1581.069 µm² for the multiplier with the
6T adder and 3735.890 µm² for the one with the 8T adder, so the proposed multiplier occupies
less than half the area (about 42%) of the existing multiplier. Area is also a prime concern
because it determines the overall cost of the system. A reduction in area at the basic level
therefore benefits the construction of bigger modules and allows more functions to be placed
on a single chip at lower cost.
The next chapter of this thesis focuses on conclusions we have drawn from all the experiments
performed.
Chapter 6
Conclusion
To err is digital, to forgive human
- Jonathan Fahey, Forbes Magazine
The thesis presents the simulation and performance analysis of the proposed two-transistor XOR
gate, six-transistor full adder, 5:3 compressor designs and 8 bit x 8 bit multiplier in three
technologies, viz. 65-nm, 90-nm and 130-nm. A multiply-accumulate (MAC) unit, which forms the
basic unit of Digital Signal Processors (DSPs), is also simulated in 65-nm technology. The
designs have been simulated in Cadence Virtuoso using UMC technologies.
6.1 Summary of Present Work
The thesis presented the design of a high performance 8 bit x 8 bit multiplier based on a
novel 2T XOR gate, which is the XOR gate with the smallest transistor count designed so far.
The XOR gate implementation is compared extensively with its peer designs in terms of
power-delay product and silicon area. Its power-delay product is found to be the lowest, with
a noise margin comparable to that of other XOR gate designs available in the literature. The
six-transistor full adder designed with these XOR gates also has the smallest transistor count
and the minimum power-delay product, and thus performs better than other adders in the
literature. The simulated designs of the multiplier, the 6T adder and the 2T XOR gate work
well up to a frequency of 2 GHz. To determine and verify the silicon area required, the
layouts of the designs have been drawn and checked for errors.
The current work also presents another arithmetic circuit built from two-transistor XOR gates:
the design, simulation and layout of novel 5:3 compressors. The compressors are implemented
with multiplexers replacing XOR gates, which reduces the critical path delay, while employing
the novel 2T XOR gates reduces the transistor count. The design uses the fewest transistors
for the logic-level implementation of compressors in the different technologies, and the
comparison covers delay, power, PDP, EDP and area. The layout has also been designed and
simulated. Further, the 5:3 compressors can be used to design other arithmetic circuits with
greater advantages, and the idea can also be employed in the implementation of other
processors.
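The logic function of such a compressor can be sketched at gate level. The sketch assumes the usual definition of a 5:3 compressor, namely a circuit that encodes the number of ones among five input bits as a 3-bit binary value. The construction below (two full adders, each built from two XORs and one multiplexer, plus an XOR and an AND on the carries) is a generic textbook-style arrangement chosen for illustration and is not claimed to be the architecture proposed in the thesis; all function names are hypothetical.

```python
from itertools import product


def xor2(a, b):
    return a ^ b


def mux2(sel, d0, d1):
    """2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def full_adder(a, b, cin):
    p = xor2(a, b)
    s = xor2(p, cin)
    # If a == b the carry equals a; otherwise the carry equals cin.
    cout = mux2(p, a, cin)
    return s, cout


def compressor_5_3(x1, x2, x3, x4, x5):
    """Return (out2, out1, out0) with 4*out2 + 2*out1 + out0 = number of ones."""
    s1, c1 = full_adder(x1, x2, x3)
    out0, c2 = full_adder(s1, x4, x5)
    out1 = xor2(c1, c2)  # weight-2 bit: exactly one carry was generated
    out2 = c1 & c2       # weight-4 bit: both carries were generated
    return out2, out1, out0


# Exhaustive check over all 32 input combinations.
for bits in product((0, 1), repeat=5):
    o2, o1, o0 = compressor_5_3(*bits)
    assert 4 * o2 + 2 * o1 + o0 == sum(bits)
```

The multiplexer-based carry illustrates the general idea of substituting a multiplexer for part of the XOR logic to shorten the critical path.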
An application of the proposed work has also been demonstrated through an 8 bit x 8 bit
multiplier, which can further be applied in the implementation of Multiply-Accumulate (MAC)
units of digital signal processors. A MAC unit, which forms the fundamental unit for the
architectural operations of Digital Signal Processors (DSPs) and Application-Specific
Integrated Circuits (ASICs), has been implemented in 65-nm technology.
6.2 Limitations of thesis work
Every design has both advantages and disadvantages, and the work presented in this thesis
has some limitations.
The novel 2T XOR gate reduces power consumption by more than 50% compared with the 3T XOR
gate, but its delay increases. At higher levels of implementation, the bigger modules require
voltage restorer circuits, which dissipate extra power. Therefore, the 75.73% reduction in PDP
obtained for the 2T XOR gate falls to 28.2% when the gate is used in the multiplier. The
benefit is thus greatest at the level of the 2T XOR gate and 6T adder modules, because further
up the hierarchy additional optimization constraints have to be met in terms of power
consumption, delay, PDP and area. The design of high-density chips in MOS VLSI (Very Large
Scale Integration) technology requires that the packing density of the MOSFETs used in the
circuits be as high as possible and, consequently, that the transistors be as small as
possible. The device geometry is kept at the minimum for 2T XOR gate operation. The device
widths can be increased for better voltage swing at the cost of silicon area. The frequency of
operation is 50 MHz; choosing a higher frequency in the GHz range will change the voltage
levels and power consumption and add glitches. Accordingly, resources must be chosen carefully
at the lower level to make the higher level more effective. This means that the amount of
power saved by the 2T XOR gate can increase in bigger implementations such as the MAC unit at
higher frequencies of operation. The substrate bias applied to the back gate of the PMOS in
the 2T XOR gate should be optimal for the voltage levels to be obtained correctly. This is
governed by the threshold voltage values of the different technologies: a logical ‘0’ must
stay below the threshold and a logical ‘1’ above it. Therefore, the substrate bias can only be
adjusted to a limited extent.
Technology plays an important role in any digital circuit. The evolution of technology is
leading to smaller nanometre nodes that achieve more functions on a single chip with the least
area. However, as device dimensions are systematically scaled down, various physical
limitations (short channel effects) such as velocity saturation, threshold voltage variation,
subthreshold conduction and leakage current come into play and ultimately restrict the
feasible scaling of some device dimensions. The operational characteristics of the MOS
transistor are expected to change as its dimensions are reduced. Scaling of MOS transistors is
concerned with the systematic reduction of the overall device dimensions as allowed by the
available technology, while preserving the geometric ratios found in the larger devices. The
proportional scaling of all devices in a circuit results in a reduction of the total silicon
area occupied by the circuit, thereby increasing the overall functional density of the chip.
Thus, to preserve the functionality of the circuits, special considerations have to be taken
into account to optimize their performance when going below 65-nm technology.
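The effect of proportional scaling on area can be illustrated with a small idealized calculation. It assumes that every lateral dimension shrinks by the same factor S, taken here as the ratio of the technology node names, so that area shrinks by 1/S². Real layouts deviate from this because of design rules, wiring and the short channel effects discussed above, so the numbers are indicative only; the function name is hypothetical.

```python
def ideal_area_ratio(old_node_nm, new_node_nm):
    """Area of the scaled circuit relative to the original under ideal proportional scaling."""
    s = old_node_nm / new_node_nm  # scaling factor S > 1 when shrinking
    return 1.0 / (s * s)           # both lateral dimensions shrink by 1/S


# The three technology nodes considered in this thesis.
for old, new in [(130, 90), (90, 65), (130, 65)]:
    ratio = ideal_area_ratio(old, new)
    print(f"{old} nm -> {new} nm: ideal area ratio {ratio:.2f}")
```

Under this idealization, moving from 130 nm to 65 nm (S = 2) would reduce the area to one quarter.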
6.3 Future work
Future work can focus on implementing more digital arithmetic circuits using the proposed
novel designs. Additional research could be spent on minimising the power and obtaining
better voltage swings for the adder. The drive strength of the 2T XOR gates could be increased
by introducing new techniques and solutions. Future work can also focus on increasing the
frequency of operation of the proposed design for effective use in the gigahertz range. The
design of DSP chips and high performance processors is a further prospect for these
arithmetic circuits.
A major driver for further research is FinFET technology, based on double-gate devices, as a
promising substitute for CMOS. This will eventually allow continued technology scaling below
65 nm by overcoming fundamental material and process technology limits in an efficient way.
Below 65-nm technology, short channel effects such as subthreshold conduction, channel length
modulation, velocity saturation, drain punch-through, impact ionization and mobility
variations start to play a dominant role. FinFETs are an innovative MOS device structure that
gives superior performance because it is less affected by short channel effects. This is a
result of the way FinFETs are fabricated: their thin body structure controls short channel
effects and suppresses leakage by keeping the gate capacitance in closer proximity to the
whole of the channel. FinFETs have been used in the literature to design XOR gates with as few
as three transistors, and efforts can therefore be made to design a two-transistor XOR gate.
List of Relevant Publications
Published
Himani Upadhyay, Shubhajit Roy Chowdhury, “Design of high speed and
low power 5:3 compressor architectures using novel two transistor XOR
gates”, International Journal of Electronics, Electrical and Computer
Systems (IJEECS), ISSN (Online): 2347-2820, Volume 2, Issue 7, 2014.
Himani Upadhyay, Shubhajit Roy Chowdhury, “A High Performance 8 Bit
x 8 Bit Multiplier Design using Novel Two Transistor (2T) XOR gates”,
Journal of Low Power Electronics (JOLPE), Volume 11, Number 1, March
2015, pp. 37-48.
Bibliography
1) Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolic, “Digital Integrated Circuits:
A Design Perspective”, Second Edition, Prentice Hall Electronics and VLSI Series,
2012.
2) F. Faggin, M.E. Hoff, Jr, H. Feeney, S. Mazor, M. Shima, “ The MCS-4 – An LSI
Micro-computer System,” 1972 IEEE Region Six Conference Record, San Diego, CA,
pp. 1-6, April 1972.
3) M. Shima, F. Faggin and S. Mazor, “ An N-Channel, 8-bit Single-Chip
Microprocessor,” ISSCC Digest of Technical Papers, pp. 56-57, Feb.1974.
4) Gordon E. Moore, “Cramming more components onto integrated circuits”, Electronics,
Volume 38, Number 8, April 19, 1965
5) M. Hosseinzadeh, S.J. Jassbi, and Keivan Navi, “A Novel Multiple Valued Logic
OHRNS Modulo 𝑟𝑛 Adder Circuit”, International Journal of Electronics, Circuits and
Systems, Vol. 1, No. 4, fall 2007, pp. 245-249
6) Neil H.E. Weste, CMOS VLSI Design Circuits & Systems Perspective, Addison
Wesley, 3rd Edition, 2005.
7) A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design. Norwell,
MA: Kluwer, 1995.
8) Manoj Kumar, Sandeep K. Arya, Sujata Pandey, “Single bit full adder design using 8
transistors with novel 3 transistors XNOR gate,” International Journal of VLSI Design
& Communication Systems, Vol. 2, pp. 47-59, Dec. 2011.
9) R. Zimmermann and R. Gupta, “Low-power logic styles: CMOS versus CPL,” in Proc.
22nd European Solid-State Circuits Conf., Neuchâtel, Switzerland, Sept. 1996, pp.
112–115.
10) J. Yuan and C. Svensson, “New single-clock CMOS latches and flip-flops with
improved speed and power savings,” IEEE J. Solid-State Circuits, vol. 32, pp. 62–69,
Jan. 1997.
11) Y. Leblebici, S.M. Kang, “CMOS Digital Integrated Circuits”, Singapore: McGraw
Hill, 2nd edition, 1999, Ch. 7
12) D. Radhakrishnan, “Low-voltage low-power CMOS full adder,” in Proc. IEEE Circuits
Devices Syst., vol. 148, Feb. 2001, pp. 19-24.
13) J. Wang, S. Fang, and W. Feng, “New efficient designs for XOR and XNOR functions
on the transistor level,” IEEE J. Solid-State Circuits,vol. 29, no. 7, Jul. 1994, pp. 780–
786.
14) H. T. Bui, A. K. Al-Sheraidah, and Y. Wang, “New 4-transistor XOR and XNOR
designs,” in Proc. 2nd IEEE Asia Pacific Conf. ASICs, 2000, pp. 25–28.
15) H.T. Bui, Y. Wang, A. K. Al-Sheraidah, “Design and analysis of 10-transistor full
adders using novel XOR–XNOR gates,” in Proc. 5th Int. Conf. Signal Process., vol. 1,
Aug. 21–25, 2000, pp. 619–622.
16) H. T. Bui, Y. Wang, and Y. Jiang, “Design and analysis of low-power 10-transistor full
adders using XOR-XNOR gates,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal
Process, vol. 49, no. 1, Jan. 2002, pp. 25–30.
17) A. M. Shams, T. K. Darwish, and M. A. Bayoumi, “Performance analysis of low-power
1-bit CMOS full adder cells,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol.
10, no. 1, Feb. 2002, pp. 20–29.
18) K.-H. Cheng and C.-S. Huang, “The novel efficient design of XOR/XNOR function for
adder applications,” in Proc. IEEE Int. Conf. Elect, Circuits Syst., vol. 1, Sep. 5–8,
1999, pp. 29–32.
19) H. Lee and G. E. Sobelman, “New low-voltage circuits for XOR and XNOR,” in Proc.
IEEE Southeastcon, Apr. 12–14, 1997, pp. 225–229.
20) M. Vesterbacka, “A 14-transistor CMOS full adder with full voltage swing nodes,” in
Proc. IEEE Workshop. Signal Process. Syst., Oct. 20–22, 1999, pp. 713–722.
21) Shubhajit Roy Chowdhury, Aritra Banerjee, Aniruddha Roy, and Hiranmay Saha, “A
High Speed 8 Transistor Full Adder Design using Novel 3 Transistor XOR Gates”,
International Journal of Electronics, Circuits and Systems, WASET Fall, (2008)
22) Tripti Sharma, K.G.Sharma, B.P.Singh and Neha Arora, “New Efficient Design for
XOR Function on the Transistor Level”, International Conference on Methods and
Models in Science and Technology, 2010 American Institute of Physics.
23) Ahmed M. Shams and Magdy A, “A structured approach for designing low power
adders,” Conference Record of the Thirty-First Asilomar Conference on Signals,
Systems & Computers, vol.1, pp.757-761, Nov. 1997.
24) R. Zimmermann, and W. Fichtner, “Low-power logic styles: CMOS versus pass-
transistor logic,” IEEE J. Solid State Circuits, vol. 32, no. 7, pp. 1079-1090, Jul. 1997.
25) N. Weste and K. Eshraghian, Principles of CMOS VLSI Design, a System Perspective,
Addison-Wesley, 1993.
26) N. Zhuang and H. Wu, “A new design of the CMOS full adder,” IEEE J. Solid-State
Circuits, vol.27, no. 5, pp. 840–844, May 1992.
27) A. M. Shams and M. Bayoumi, “A novel high-performance CMOS 1-bit full adder cell,”
IEEE Transaction on Circuits Systems II, Analog Digital Signal Process, vol. 47, no. 5,
pp. 478–481, May 2000.
28) Yingtao Jiang, A. Al-Sheraidah, Yuke Wang, E. Sha, and Jin-Gyun Chung, “A novel
multiplexer based low-power full adder,” IEEE Transactions on Circuits and Systems:
Express Briefs, vol. 51, no. 7, pp. 345-348, Jul. 2004.
29) R. Shalem, E. John, and L. K. John, “A novel low-power energy recovery full adder
cell,” in Proc.Great Lakes Symposium on VLSI, pp. 380–383, Feb. 1999.
30) A. Fayed and M. Bayoumi, “A low-power 10-transistor full adder cell for embedded
architectures,” in Proc. IEEE Symp. Circuits Syst., Sydney, Australia, May 2001, pp.
226–229.
31) J.F. Lin, Y.T.Hwang, M.H. Sheu, C.C. Ho, “A novel high speed and energy efficient
10 transistor full adder design”, IEEE Trans. Circuits Syst. I, Regular papers, Vol. 54,
No.5, May 2007, pp. 1050-1059.
32) S. Goel, A. Kumar, M. A. Bayoumi, “Design of robust, energy-efficient full adders for
deep sub micrometre design using hybrid-CMOS logic style,” IEEE Transactions on
Very Large Scale Integration (VLSI) Systems, vol. 14, no. 12, pp. 1309-1321, Dec. 2006.
33) Zhang, M., J. Gu and C.H. Chang, “A novel hybrid pass logic with static CMOS output
drive full adder cell,” IEEE Int. Symposium on Circuits Systems, vol. 5, pp. 317-320,
May 2003.
34) G.A. Ruiz, M. Granda, “An area-efficient static CMOS carry-select adder based on a
compact carry look-ahead unit”, Microelectronics Journal, Vol. 35, No. 12, 2004, pp.
939-944.
35) Z. Wang, G. A. Jullien, and W. C. Miller, “A new design technique for column
compression multipliers,” IEEE Trans. Comput., vol. 44, pp. 962–970, Aug. 1995.
36) Milos Ercegovac, Tomas Lang, “Digital Arithmetic”, Morgan Kaufmann, 2004.
37) I. Koren, Computer Arithmetic Algorithms. Englewood Cliffs, NJ, Prentice Hall, 1993.
38) Shubhajit Roy Chowdhury, Aritra Banerjee, Aniruddha Roy, Hiranmay Saha, “Design,
Simulation and Testing of a High Speed Low Power 15-4 Compressor for High Speed
Multiplication Applications”, First International Conference on Emerging Trends in
Engineering and Technology, pp. 434–438, 2008.
39) K. Prasad and K. K. Parhi, “Low-power 4-2 and 5-2 compressors,” in Proc. of the 35th
Asilomar Conf. on Signals, Systems and Computers, vol. 1, 2001, pp. 129–133.
40) C. H. Chang, J. Gu, M. Zhang, “Ultra low-voltage low-power CMOS 4-2 and 5-2
compressors for fast arithmetic circuits” IEEE Transactions on Circuits and Systems I:
Regular Papers, Volume 51, Issue 10, Oct. 2004 Page(s):1985 – 1997
41) Ma GK, Taylor FJ (1990). Multiplier policies for digital signal processing. IEEE
ASSP., 7(1): 6-20.
42) S.-N. Tang, J.-W. Tsai, and T.-Y. Chang, “A 2.4-GS/s FFT processor for OFDM-based
WPAN applications,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 6, pp. 451–
455, Jun. 2010.
43) V. Gowrishankar, D. Manoranjitham and P. Jagadeesh, “Efficient FIR filter design
using modified carry select adder & Wallace tree multiplier”, International Journal of
Science, Engineering and Technology Research, Vol. 2, pp. 703-711, March 2013.
44) D. Radhakrishnan, A.P. Preethy, “Low Power CMOS pass logic 4-2 compressor for
high speed multiplication”, Proceedings of 43rd IEEE Midwest Symposium on Circuits
and Systems, Vol. 3, 2000, pp. 1296-1298.
45) S.F. Hsiao, M.R. Jiang, J.S. Yeh, “Design of high-speed low-power 3-2 counter and 4-2
compressor for fast multipliers”, Electronics Letters, Vol. 34, No. 4, 1998, pp. 341-343.
46) S. O'uchi, K. Sakamoto, K. Endo, M. Masahara, T. Matsukawa, Y.X. Liu, M. Hioki, T.
Nakagawa, T. Sekigawa, H. Koike and E. Suzuki, “Variable-Threshold-Voltage
FinFETs with a Control-Voltage Range within the Logic-Level Swing Using
Asymmetric Work-Function Double Gates,” in VLSI Technology, Systems and
Applications, 2008.
47) M. C. Wang, “Independent-Gate FinFET Circuit Design Methodology”, International
Journal of Computer Science, 37:1. Feb. 2010.
48) L. Dadda, “Some Schemes for Parallel Multipliers,” Alta Freq., vol. 34, 1965, pp. 349–
356.
49) C. S. Wallace, “A Suggestion for a Fast Multiplier,” IEEE Trans. on Electronic
Computers, vol. EC-13, pp. 14–17, 1964.
50) P. Balasubramanian, R.T. Naayagi, “Critical Path Delay and Net Delay Reduced Tree
Structure for Combinational Logic Circuits”, International Journal of Electronics,
Circuits and Systems, Vol. 1, No.1, 2007, pp. 19-29.
51) J. B. Burr and A. M. Peterson, “Ultra low power CMOS technology,” NASA VLSI
Design Symposium, 1991, pp. 4.2.1–4.2.13.
52) R. Gonzalez, B. M. Gordon, and M. A. Horowitz, “Supply and threshold voltage scaling
for low power CMOS,” IEEE J. Solid-State Circuits, vol.32, pp. 1210–1216, Aug.
1997.
53) Y. Berg and T. S. Lande, “Programmable floating-gate mos logic for low-power
operation,” in Proc. IEEE ISCAS, Hong Kong, June 1997, pp. 1792–1795.
54) Shiv Shankar Mishra, Adarsh Kumar Agrawal and R.K. Nagaria, “A comparative
performance analysis of various CMOS design techniques for XOR and XNOR
circuits”, International Journal on Emerging Technologies 1(1): 1-10(2010) ISSN:
0975-8364.
55) D. Radhakrishnan, S.R. Whitaker, and G.K. Maki, “Formal design
procedures for pass transistor switching circuits,” IEEE J. Solid-State Circuits,
vol. SC-20, 1985, pp. 531-536.
56) P. Gaubert, A. Teramoto, W. Cheng, and T. Ohmi, “Relation between the mobility, 1/f
noise, and channel direction in MOSFETs fabricated on (100) and (110) silicon-
oriented wafers,” IEEE Trans. Electron Devices, vol. 57, no. 7, pp. 1597–1607, Jul.
2010.
57) K. K. Hung, P. K. Ko, C. Hu, and Y. C. Cheng, “A unified model for the flicker noise
in metal-oxide-semiconductor field-effect transistors,” IEEE Trans. Electron Devices,
vol. 37, pp. 654–665, 1990.
58) Y. Tsividis, Mixed Analog-Digital VLSI Devices and Technology, Singapore: McGraw
Hill, 1st edition, 1996.
59) Behzad Razavi, “Design of Analog CMOS Integrated Circuits”, Tata McGraw Hill
Edition, 2002.
60) Amrita Oza, Poonam Kadam, “Techniques for Sub-threshold Leakage Reduction in
Low Power CMOS Circuit Designs”, International Journal of Computer Applications
(0975 – 8887), Volume 97– No.15, July 2014.
61) A. Muttreja, N. Agarwal, and N.K. Jha, “CMOS logic design with independent gate
FinFETs,” in Proc. Int. Conf. Computer Design, Oct. 2007, pp. 560–567.
62) S. Goel, M.A. Elgamel, M.A. Bayoumi, Y. Hanafy, “Design Methodologies for high
performance noise tolerant XOR-XNOR circuits”, IEEE Transactions on Circuits and
Systems – I: Regular Papers, Vol. 53, No. 4, 2006, pp. 867-878.
63) A. Yurdakul, “Multiplierless implementation of 2D FIR filters”, Integration: The VLSI
Journal, Vol. 38, No. 4, 2005, pp. 597-613.
64) S. B Sukhavasi, S. B Sukhasavi, V. B Madivada, H. Khan, and S. R S.Kalavakolanu,
“Implementation of low power parallel compressor for multiplier using self-resetting
logic,” International Journal of Computer Applications, vol. 47, no. 3, June, 2012.
65) V.G. Oklobdzija, D. Villeger, S.S. Liu, “A method for speed optimized partial product
reduction and generation of fast parallel multipliers using an algorithmic approach”,
IEEE Transactions on Computers, Vol. 45, No. 3, 1996.
66) P. Stelling, C. Martel, V.G. Oklobdzija, R. Ravi, “Optimal circuit for parallel
multipliers”, IEEE Transactions on Computers, Vol. 47, No. 3, 1998.
67) V.G. Oklobdzija, “High speed VLSI arithmetic unit: Adders and Multipliers”, in
“Design of High Performance Microprocessor Circuits”, Editor A. Chandrakasan, IEEE
Press, 2000.
68) J. J. F. Cavanagh, Digital Computer Arithmetic. New York: McGraw- Hill, 1984.
69) Naveen Kumar, Manu Bansal, Navnish Kumar, “VLSI Architecture of Pipelined Booth
Wallace MAC unit”, International Journal of Computer Application (0975-8887).
70) Fayed, Ayman A., Bayoumi, Magdy A., “A Merged Multiplier-Accumulator for high
speed signal processing applications”, IEEE International Conference on Acoustics,
Speech, and Signal Processing (ICASSP), pp 3212 -3215, 2002.
71) S. Knowles, “A Family of Adders”, Proceedings of the 15th IEEE Symposium of
Computer Arithmetic, pp. 271-281, June 2001.
72) F. Carbognani, F. Buergin, N. Felber, H. Kaeslin and W. Fichtner. A low-power
transmission-gate-based 16-bit multiplier for digital hearing aids. Analog Integrated
Circuits and Signal Processing. vol. 56, pp. 5-12 (2008).
73) P.V. Rao, C. Prasanna Raj P, and S. Ravi. VLSI design and analysis of multipliers for
low power. Fifth IEEE International Conference on Intelligent Information Hiding and
Multimedia Signal Processing, (2009).
74) C.-Y. Han, H.-J. Park, and L.-S. Kim. A low-power array multiplier using separated
multiplication technique. IEEE Transactions on Circuits and Systems II: Analog and
Digital Signal Processing. vol. 48, pp. 866-871 (2001).
75) Avisek Sen, Partha Mitra, Debarshi Datta, “Low Power MAC Unit for DSP Processor”,
International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-
3878, Volume-1, Issue-6, January 2013.
76) P.Jagadeesh, S.Ravi, Dr.Kittur Harish Mallikarjun, “Design of High Performance 64-
Bit MAC Unit”, Proceedings of IEEE International Conference on Circuits, Power and
Computing Technologies, Tamilnadu, pp.782-786, 2013.