
An Efficient Approach to Low-leakage Power VLSI Design using Variable Body Biasing

A Thesis Presented

By

Md. Asif Jahangir Chowdhury

Student Id. 0606047

&

Md. Shahriar Rizwan

Student Id. 0606072

In partial fulfillment of the requirements for the B.Sc. in Electrical and Electronics Engineering

Department of Electrical and Electronics Engineering,

BUET, Bangladesh

March 2012


Bangladesh University of Engineering and Technology

CERTIFICATE

This is to certify that the thesis report entitled “An Efficient Approach to Low-leakage Power VLSI Design using Variable Body Biasing”, submitted by Md. Asif Jahangir Chowdhury (student id. 0606047) and Md. Shahriar Rizwan (student id. 0606072) in partial fulfillment of the requirements for the award of the B.Sc. degree in the Department of Electrical and Electronics Engineering, Bangladesh University of Engineering and Technology, is an authentic work carried out under my supervision and guidance. To the best of my knowledge, the matter embodied in the thesis has not been submitted to any other university or institute for the award of any degree or diploma.

Approved by:

Dr. Md. Shafiqul Islam

Professor

Department of Electrical and Electronics Engineering

BUET.


Dedicated

To Our Parents


ACKNOWLEDGEMENTS

We would like to express our sincere gratitude and appreciation to everyone who made this thesis possible. Most of all, we would like to thank our advisor, Professor Dr. Md. Shafiqul Islam, for giving us the opportunity to work under him and for his support at every stage of this thesis. We are deeply indebted to him for his esteemed guidance, constant encouragement and fruitful suggestions from the beginning to the end of this thesis. His trust and support inspired us at the most important moments of making the right decisions, and we are delighted to have worked under his supervision.

We would also like to express our gratitude to our beloved parents, who inspired us at each and every step of our lives.

THE AUTHORS


TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS OR ABBREVIATIONS
ABSTRACT
CHAPTER 1: INTRODUCTION
1.1 PROBLEM STATEMENT
1.2 CONTRIBUTIONS
1.3 THESIS ORGANIZATION
CHAPTER 2: MOTIVATION
CHAPTER 3: NOTATION AND BACKGROUND
3.1 LEAKAGE POWER
3.2 SRAM CELL LEAKAGE PATHS
3.3 SWITCHING POWER AND DELAY TRADEOFFS
3.4 CIRCUIT PERFORMANCE ESTIMATION
CHAPTER 4: PREVIOUS WORKS
4.1 STATIC POWER REDUCTION VLSI RESEARCH
4.1.1 Static power reduction research for generic logic circuits
4.1.1.1 Sleep transistor
4.1.1.2 Forced Stack
4.1.1.3 Sleepy Stack
4.1.1.4 Sleepy Keeper
4.1.1.5 Dual Sleep
4.1.1.6 Dual Stack
4.1.2 Static power reduction research for SRAM
4.1.2.1 Sleep transistor
4.1.2.2 Dual Sleep
4.1.2.3 Dual Stack
4.1.2.4 Sleepy Keeper in SRAM
CHAPTER 5: VARIABLE BODY BIASING TECHNIQUE
5.1 VARIABLE BODY BIASING APPROACH
5.2 VARIABLE BODY BIASING STRUCTURE
5.3 VARIABLE BODY BIASING OPERATION
5.4 ANALYSIS OF SUBTHRESHOLD LEAKAGE REDUCTION
5.5 ESTIMATION OF DELAY FOR VARIABLE BODY BIASING TECHNIQUE
CHAPTER 6: EXPERIMENTAL RESULTS
6.1 EXPERIMENTAL RESULTS FOR GENERAL LOGIC CIRCUITS
6.1.1 Experimental results for CO4
6.1.2 Experimental results for FA
6.2 EXPERIMENTAL RESULTS FOR SRAM
6.3 COMPARISON WITH PREVIOUS METHODS
CHAPTER 7: CONCLUSION
7.1 CONCLUSION
7.2 SUGGESTIONS FOR FUTURE WORK
APPENDIX
A. AREA ESTIMATION
B. CIRCUIT DIAGRAMS
BIBLIOGRAPHY


LIST OF TABLES

Table 1 Power and area results from [15]
Table 2 Energy consumption scenario of a cell phone (0.07µ) from [15]
Table 3 Leakage model parameters (0.5µ tech)
Table 4 Chosen technology and Vdd value
Table 5 Static power data for chain of 4 inverters (nanowatts)
Table 6 Dynamic power data for chain of 4 inverters (microwatts)
Table 7 Propagation delay data for chain of 4 inverters (picoseconds)
Table 8 Power delay product data for chain of 4 inverters (femtojoules)
Table 9 Area data for chain of 4 inverters (µm²)
Table 10 Static power data for 1-bit full adder (nanowatts)
Table 11 Dynamic power data for 1-bit full adder (microwatts)
Table 12 Propagation delay data for 1-bit full adder (nanoseconds)
Table 13 Power delay product data for 1-bit full adder (femtojoules)
Table 14 Area data for 1-bit full adder (µm²)
Table 15 Static power data for SRAM (nanowatts)
Table 16 Dynamic power data for SRAM (microwatts)
Table 17 Propagation delay data for SRAM (nanoseconds)
Table 18 Power delay product data for SRAM (femtojoules)
Table 19 Area data for SRAM (µm²)
Table 20 Comparison of VBB approach for a chain of four inverters (90 nm process)
Table 21 Comparison of VBB approach for a 1-bit full adder (90 nm process)
Table 22 Comparison of VBB approach for an SRAM (90 nm process)


LIST OF FIGURES

Figure 1 Sub-threshold leakage of an nFET
Figure 2 (a) A single transistor (left) and (b) stacked transistors (right)
Figure 3 SRAM cell leakage paths
Figure 4 Logical efforts of basic logic gates
Figure 5 Sleep transistor
Figure 6 “Forced Stack”
Figure 7 Sleepy stack
Figure 8 Sleepy keeper
Figure 9 Dual Sleep
Figure 10 Dual Stack
Figure 11 SLEEP TRANSISTOR IN SRAM
Figure 12 “DUAL SLEEP” IN SRAM
Figure 13 “DUAL STACK” IN SRAM
Figure 14 “SLEEPY KEEPER” IN SRAM
Figure 15 An inverter with (a) Sleepy Keeper (left) and (b) Variable body biasing structure (right)
Figure 16 (a) Sleep transistor without body biasing transistor
Figure 17 (a) Inverter with VBB technique (left) and (b) inverter of equal strength (right)
Figure 18 Static Power Consumption (CO4)
Figure 19 Dynamic power consumption (CO4)
Figure 20 Propagation delay (CO4)
Figure 21 Power delay product
Figure 22 Area comparison (CO4)
Figure 23 Static power consumption for FA
Figure 24 Dynamic power consumption for FA
Figure 25 Propagation delay comparison in FA
Figure 26 Power Delay Product for FA
Figure 27 Area Comparison for FA
Figure 28 Static power consumption for SRAM cell
Figure 29 Dynamic power consumption for SRAM
Figure 30 Propagation delay comparison for SRAM
Figure 31 Power Delay Product of SRAM
Figure 32 Area comparison for SRAM
Figure 33 SLEEP TRANSISTOR
Figure 34 “FORCED STACK” METHOD
Figure 35 “SLEEPY KEEPER” METHOD
Figure 36 “DUAL SLEEP” METHOD
Figure 37 “DUAL STACK” METHOD
Figure 38 “VARIABLE BODY BIASING TECHNIQUE”
Figure 39 “SLEEPY KEEPER” (FA)
Figure 40 “DUAL SLEEP” (FA)
Figure 41 “DUAL STACK” (FA)
Figure 42 “VBB” (FA)


LIST OF SYMBOLS OR ABBREVIATIONS

6-T Six-Transistor
CMOS Complementary Metal Oxide Semiconductor
CO4 Chain of Four Inverters
DIBL Drain-Induced Barrier Lowering
FA Full Adder
FBB Forward Body Bias
ITRS International Technology Roadmap for Semiconductors
MTCMOS Multi-Threshold-Voltage CMOS
RBB Reverse Body Bias
SRAM Static Random Access Memory
VBB Variable Body Bias
VLSI Very Large Scale Integration
ZBB Zero Body Bias


ABSTRACT

The ubiquitous era of emerging portable devices demands long battery life as a primary design goal. Subthreshold circuit design can reduce the energy per cycle by an order of magnitude relative to circuits operating at nominal voltage, by scaling the power supply voltage (Vdd) below the device threshold voltage. However, this significantly degrades circuit performance. The stringent energy budgets and moderate speed requirements of ultra-low-power systems on the market may not be best satisfied by scaling a single supply voltage alone. Optimized circuits with dual supply voltages provide an opportunity to meet these demands.

The primary focus of this thesis is to provide more efficient low-power solutions for Very Large Scale Integration (VLSI) designers, with particular emphasis on leakage power reduction. Although leakage power was negligible at 0.18µ technology and above, in nano-scale technologies such as 0.07µ, leakage power is almost equal to dynamic power consumption.

In this thesis we present a new CMOS circuit design technique called “VARIABLE BODY BIASING”, which dramatically reduces leakage. It aims to combine the good features of the sleep transistor, sleepy keeper, dual sleep and dual stack techniques. The sleep transistor technique can achieve ultra-low leakage power consumption, but loses logic state during sleep mode. The sleepy keeper, dual sleep and dual stack techniques retain state, but their static power consumption is not satisfactory; to achieve satisfactory leakage reduction, these techniques require dual-Vth transistors. The threshold voltage can be controlled through body biasing: if a reverse bias is applied between the body and source of a MOSFET, its threshold voltage increases (the body effect). A higher threshold voltage reduces the leakage current and hence the static (leakage) power. However, if high-Vth transistors are used in the circuit (e.g., in sleepy keeper, dual sleep or dual stack), delay increases. Therefore, if the Vth of the sleep transistors can be kept low during active mode and raised during sleep mode, both leakage power and delay can be kept within the desired limits. Our design implements this arrangement.


The sleep transistor technique uses two sleep transistors, one PMOS and one NMOS. In our design, we add two more transistors, one PMOS and one NMOS, such that the drain of each added transistor is connected to the body of the corresponding sleep transistor, while its gate and source are connected in parallel with those of that sleep transistor. In addition, we add two transistors, as in the sleepy keeper technique, to retain state. When the circuit is in active mode, there is no voltage difference between the body and source of the sleep transistors, because the added transistors are on and provide a low-resistance path between body and source (i.e., body and source are effectively shorted). When the circuit is in sleep mode, there is a high resistance between body and source, resulting in a higher Vth. Consequently, leakage power consumption is reduced. Moreover, because Vth is low during active mode, the delay remains within a reasonable range.
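A rough numerical illustration of this mechanism is sketched below. It is a minimal sketch, assuming the textbook long-channel body-effect equation Vth = Vth0 + γ(√(2φF + VSB) − √(2φF)) and a subthreshold current proportional to exp(−Vth/(n·kT/q)). The parameter values (VTH0, GAMMA, TWO_PHI_F, N_SLOPE) are assumed for illustration only and are not taken from this thesis or from any specific process. The body-source voltage in sleep mode is treated as a given input, whereas in the structure described above it is set by the high-resistance body node.

```python
import math

# Illustrative NMOS parameters (assumed, not taken from the thesis)
VTH0 = 0.30       # zero-bias threshold voltage (V)
GAMMA = 0.40      # body-effect coefficient (V^0.5)
TWO_PHI_F = 0.80  # surface potential term 2*phi_F (V)
N_SLOPE = 1.5     # subthreshold slope factor n
V_T = 0.0259      # thermal voltage kT/q near 300 K (V)


def threshold_voltage(v_sb: float) -> float:
    """Long-channel body-effect equation; v_sb > 0 means reverse body bias."""
    return VTH0 + GAMMA * (math.sqrt(TWO_PHI_F + v_sb) - math.sqrt(TWO_PHI_F))


def leakage_reduction(v_sb: float) -> float:
    """Ratio I_leak(zero bias) / I_leak(v_sb), using I_sub ~ exp(-Vth / (n * V_T))."""
    delta_vth = threshold_voltage(v_sb) - threshold_voltage(0.0)
    return math.exp(delta_vth / (N_SLOPE * V_T))


if __name__ == "__main__":
    for v_sb in (0.0, 0.2, 0.4):
        print(f"V_SB = {v_sb:.1f} V: Vth = {threshold_voltage(v_sb):.3f} V, "
              f"leakage reduced {leakage_reduction(v_sb):.1f}x")
```

Under these assumed values, a 0.4 V reverse bias raises Vth by roughly 80 mV and cuts leakage by roughly 8X. The body effect tends to weaken in deeply scaled devices, so the actual benefit depends on the process.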

One of the advantages of the “Variable Body Biasing” technique is that it saves state. Therefore, the “Variable Body Biasing” technique is applicable to memory design, e.g., Static Random Access Memory (SRAM). When we apply the “Variable Body Biasing” technique to SRAM cell design, we observe new Pareto points that had not been presented prior to the research in this thesis. Although the “Variable Body Biasing” technique incurs some area overhead, the “Variable Body Biasing” SRAM cell can achieve ultra-low leakage power consumption compared to a high-Vth SRAM cell (the best prior state-saving SRAM cell), while suppressing the two main leakage paths in an SRAM cell.


CHAPTER 1

INTRODUCTION

Ultra-low power applications such as micro-sensor networks, pacemakers, and many portable devices operate under extreme energy constraints and require long battery life. Subthreshold operation, with its very low energy consumption, presents an opportunity for such energy-constrained applications [1-6]. Subthreshold circuits offer a promising solution for implementing highly energy-constrained systems at low to medium clock frequencies for remote or mobile applications.

As the power supply voltage (Vdd) is scaled below the device threshold voltage (Vth), the subthreshold current slowly charges and discharges the nodes that implement the circuit's logic function [4]. This weak drive current inherently limits performance, but the circuit achieves minimum-energy operation with reduced dynamic and leakage power, resulting in long battery life [7-9].

In past decades, subthreshold circuit design was not well recognized in the area of digital circuits, because high performance was the major concern. Lately, however, portability has become a trend in the electronics marketplace, and low energy per operation is a primary design parameter in such applications. Without a performance requirement, a subthreshold circuit can operate at its minimum-energy operating point, which is only slightly above the absolute minimum voltage [10] that guarantees correct logic function. Even for applications requiring high peak performance, ultra-dynamic voltage scaling (UDVS) [11] provides an opportunity for subthreshold circuit design: the circuit switches between a nominal-voltage, high-performance mode and an energy-efficient subthreshold mode according to the system workload.

Even before the mobile era, power consumption was a fundamental problem. To solve the power dissipation problem, many researchers have proposed different ideas, from the device level to the architectural level and above. However, there is no universal way to avoid tradeoffs between power, delay and area, and thus designers must choose techniques that satisfy application and product needs. The power consumption of CMOS circuits consists of dynamic and static components. Dynamic power is consumed when transistors switch, whereas static power is consumed regardless of transistor switching. Dynamic power consumption was previously (at 0.18µ technology and above) the single largest concern for low-power chip designers, since dynamic power accounted for 90% or more of the total chip power. Therefore, many previously proposed techniques, such as voltage and frequency scaling, focused on dynamic power reduction. However, as the feature size shrinks, e.g., to 0.09µ and 0.065µ, static power has become a great challenge for current and future technologies.

Based on the International Technology Roadmap for Semiconductors (ITRS) [12], Kim et al. report that the subthreshold leakage power dissipation of a chip may exceed its dynamic power dissipation at the 65 nm feature size [13]. One of the main reasons for the increase in leakage power is the increase in subthreshold leakage. When the technology feature size scales down, the supply voltage and threshold voltage also scale down, and subthreshold leakage power increases exponentially as the threshold voltage decreases. Furthermore, short-channel effects such as drain-induced barrier lowering (DIBL) reduce the threshold voltage even further. In addition to subthreshold leakage, another contributor to leakage power is gate-oxide leakage due to tunneling current through the gate-oxide insulator. Since gate-oxide thickness is reduced as the technology scales down, in nano-scale technology gate-oxide leakage power may become comparable to subthreshold leakage power if not handled properly. However, we assume that other techniques will address gate-oxide leakage; for example, high-k dielectric gate insulators may provide a solution to reduce gate leakage [13]. Therefore, this thesis focuses on reducing subthreshold leakage power consumption.
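The exponential sensitivity of subthreshold leakage to Vth can be made concrete with the common simplified weak-inversion model I_sub = I0·exp((VGS − Vth)/(n·VT))·(1 − exp(−VDS/VT)). This is a minimal sketch, assuming illustrative values for the prefactor I0 and the slope factor n; it ignores DIBL, body effect and temperature dependence, so it shows the trend rather than the behavior of any particular technology.

```python
import math

V_T = 0.0259    # thermal voltage kT/q near 300 K (V)
N_SLOPE = 1.5   # subthreshold slope factor n (assumed)
I0 = 1e-7       # process-dependent current prefactor (A), assumed


def subthreshold_current(vgs: float, vds: float, vth: float) -> float:
    """Simplified weak-inversion drain current of an nFET (no DIBL, no body effect)."""
    return I0 * math.exp((vgs - vth) / (N_SLOPE * V_T)) * (1.0 - math.exp(-vds / V_T))


if __name__ == "__main__":
    vdd = 1.0
    for vth in (0.4, 0.3, 0.2):
        # An "off" transistor: gate at 0 V, full supply across drain-source
        i_off = subthreshold_current(0.0, vdd, vth)
        print(f"Vth = {vth:.1f} V -> I_off = {i_off:.3e} A")
    ratio = subthreshold_current(0.0, vdd, 0.3) / subthreshold_current(0.0, vdd, 0.4)
    print(f"Lowering Vth by 100 mV multiplies leakage by about {ratio:.1f}x")
```

With n = 1.5 at room temperature, each 100 mV reduction in Vth multiplies the off-state current by roughly 13X in this model, which is why threshold-voltage scaling drives leakage up so quickly.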

Many static power reduction methods currently exist, and most of them strike a different balance in the tradeoff between power and delay. For dynamic power, one of the most effective reduction techniques is lowering the supply voltage of CMOS circuits, because dynamic power consumption is proportional to the square of the supply voltage. However, lowering the supply voltage increases transistor switching delays. Therefore, designing CMOS circuits typically necessitates tradeoffs between performance (in terms of delay) and power consumption. In this thesis, we provide a circuit structure named variable body biasing as a new remedy for designers in terms of static power. The variable body biasing method reduces static power by almost 95% without degrading the delay or dynamic power consumption of the circuit, which makes this approach very attractive to circuit designers.
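The power-delay tradeoff of supply voltage scaling can be illustrated with two standard first-order models: switching power P = a·C·Vdd²·f, and the Sakurai-Newton alpha-power-law delay t_d ∝ Vdd/(Vdd − Vth)^α. The sketch below is a minimal illustration under assumed parameter values; the activity factor, capacitance, frequency, Vth and α are not taken from this thesis.

```python
# Illustrative parameters (assumed, not from the thesis)
ACTIVITY = 0.1      # switching activity factor a
C_LOAD = 10e-15     # switched capacitance per cycle (F)
FREQ = 500e6        # clock frequency (Hz)
VTH = 0.3           # threshold voltage (V)
ALPHA_SAT = 1.3     # velocity-saturation index in the alpha-power law


def dynamic_power(vdd: float) -> float:
    """Switching power P_dyn = a * C * Vdd^2 * f (W)."""
    return ACTIVITY * C_LOAD * vdd ** 2 * FREQ


def relative_delay(vdd: float) -> float:
    """Gate delay up to a constant factor: t_d ~ Vdd / (Vdd - Vth)^alpha."""
    return vdd / (vdd - VTH) ** ALPHA_SAT


if __name__ == "__main__":
    for vdd in (1.2, 0.9, 0.6):
        print(f"Vdd = {vdd:.1f} V: P_dyn = {dynamic_power(vdd) * 1e6:.2f} uW, "
              f"delay = {relative_delay(vdd) / relative_delay(1.2):.2f}x of 1.2 V")
```

With these assumptions, halving Vdd from 1.2 V to 0.6 V cuts dynamic power to one quarter but roughly doubles gate delay.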

1.1 Problem Statement

This research work addresses new low-power approaches for Very Large Scale Integration (VLSI) logic and memory. Power dissipation is one of the major concerns when designing a VLSI system. Until recently, dynamic power was the main concern. However, as the technology feature size shrinks, static power, which was negligible before, becomes as important an issue as dynamic power. Since static power increases dramatically (indeed, exponentially) in nano-scale silicon VLSI technology, the importance of reducing leakage power consumption cannot be overstated. A well-known previous technique, the sleep transistor technique, cuts off the Vdd and/or Gnd connections of transistors to save leakage power. However, when transistors are allowed to float, a system may have to wait a long time to reliably restore lost state and thus may suffer seriously degraded performance. Therefore, retaining state is crucial for a system that requires a fast response even while inactive. Our research provides new VLSI techniques that achieve ultra-low leakage power consumption while maintaining logic state, and thus can be used in systems with long inactive periods but a fast response time requirement.

1.2 Contributions

The following items are the main contributions of this research:

Design of the variable body biasing technique for logic circuits

The “VARIABLE BODY BIASING” technique is applied to generic logic circuits, and we achieve orders-of-magnitude leakage power reduction compared to the best prior state-saving techniques we could find (namely, sleepy stack [14,15], sleepy keeper [16], dual sleep [17] and dual stack [18]).

Design of a Variable body biasing SRAM cell


Static Random Access Memory (SRAM) is a power-hungry component in a VLSI chip. Therefore, we apply the “VARIABLE BODY BIASING” technique to SRAM design and provide new Pareto points that can be used by designers who want extremely low leakage power consumption.

1.3 Thesis organization

The thesis is organized into seven chapters:

CHAPTER 1: INTRODUCTION. This chapter introduces power consumption issues in

VLSI. This chapter also summarizes the contributions of this thesis. Finally, this

chapter explains organization of the thesis.

CHAPTER 2: MOTIVATION. This chapter addresses our motivation for this research.

CHAPTER 3: NOTATION AND BACKGROUND. This chapter explains important notation and background used throughout this thesis.

CHAPTER 4: PREVIOUS WORK. This chapter describes previous work in power

reduction research and explains key differences between our solutions and previous

work.

CHAPTER 5: VARIABLE BODY BIASING TECHNIQUE APPLICATION. This

chapter introduces the “VARIABLE BODY BIASING” technique. First the structure of

the circuit is explained followed by a detailed explanation of the circuit operation. An

analytical model of the circuit is derived and compared to the previous techniques.

CHAPTER 6: EXPERIMENTAL RESULTS. This chapter discusses the experimental

results from various applications of the technique. The “variable body biasing”

technique is empirically compared to well-known previous approaches. The

comparisons are assessed in terms of static power, dynamic power, delay, power delay

product and area occupied while changing numerous VLSI and CMOS circuit

parameters.

CHAPTER 7: CONCLUSION. This chapter summarizes the major accomplishments of this thesis and offers suggestions for future work.


CHAPTER 2

MOTIVATION

Subthreshold circuit design is well suited to emerging portable applications that require extremely low-energy operation. Its main limitation is very slow operation due to the aggressively scaled-down supply voltage. Despite its very high energy efficiency, subthreshold design has been applied only in niche markets because of its low performance. Depending upon the application, size, weight and cost can be as important as performance. Low power is especially significant for remote, portable and mobile applications: reduced power consumption makes circuits lighter, reduces or eliminates cooling subsystems, and reduces the weight and extends the life of the energy source.

The multi-Vdd technique has been widely implemented with two supply voltages [19]. The dual-Vdd design is also well suited to exploiting the time slack in a subthreshold circuit. Although gate delay depends exponentially on Vdd in the subthreshold region, it may be possible to find an optimal lower supply voltage for the available time slack in the circuit. A DC-DC voltage converter [20] can then provide the required voltage management.
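One way to picture how an optimal lower supply could be chosen for a given slack is sketched below. It is a hypothetical illustration, assuming a subthreshold gate delay proportional to Vdd/exp((Vdd − Vth)/(n·VT)) (load charge divided by an exponential on-current) and a slack expressed as a multiple of the nominal path delay; `lowest_vdd_for_slack` and `slack_factor` are names introduced here, not part of any cited tool. Because delay decreases monotonically with Vdd in this model (for Vdd above n·VT), bisection finds the smallest supply that still meets the timing budget.

```python
import math

# Illustrative subthreshold parameters (assumed, not from the thesis)
VTH = 0.40      # threshold voltage (V); both supplies are below it
N_SLOPE = 1.5   # subthreshold slope factor n
V_T = 0.0259    # thermal voltage kT/q near 300 K (V)


def subthreshold_delay(vdd: float) -> float:
    """Gate delay up to a constant: t_d ~ C*Vdd / I_on, I_on ~ exp((Vdd - Vth)/(n*V_T))."""
    return vdd / math.exp((vdd - VTH) / (N_SLOPE * V_T))


def lowest_vdd_for_slack(v_high: float, slack_factor: float, v_min: float = 0.05) -> float:
    """Smallest Vdd in [v_min, v_high] whose delay fits in slack_factor * delay(v_high).

    Uses bisection, valid because delay decreases monotonically with Vdd for Vdd > n*V_T.
    """
    budget = slack_factor * subthreshold_delay(v_high)
    lo, hi = v_min, v_high
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if subthreshold_delay(mid) <= budget:
            hi = mid          # mid meets timing, so search lower supplies
        else:
            lo = mid          # mid is too slow, so search higher supplies
    return hi                 # invariant: delay(hi) <= budget


if __name__ == "__main__":
    v_high = 0.30
    v_low = lowest_vdd_for_slack(v_high, slack_factor=2.0)
    energy_ratio = (v_low / v_high) ** 2   # switching energy scales with Vdd^2
    print(f"V_low = {v_low:.3f} V, switching energy = {energy_ratio:.2f} of V_high")
```

For example, with a 0.30 V high supply and a path allowed twice its nominal delay, this model settles near 0.27 V, lowering switching energy to about 80% of its value at the high supply.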

Historically, CMOS technology took over the mainstream of VLSI design in the 1980s because CMOS consumes far less power than its predecessors (NMOS, bipolar, etc.). Although this advantage still holds, the power dissipation of CMOS has nonetheless become a problem. For a long time, dynamic power accounted for more than 90% (typically, over 99%) of total chip power, and thus was frequently used as the metric for total power consumption for technologies of 0.18µ and above. However, as technology scales down to tens of nanometers, leakage power becomes as important as dynamic power, and many ideas have been proposed to tackle the leakage power problem. Although cutting off transistors from the power rails, e.g., using the sleep transistor technique, is one possible solution, losing state during inactive mode incurs a long wake-up time and thus may not be appropriate for a system that requires fast response times.

To illustrate the possible impact of this thesis, let us consider the effect of static (leakage) power consumption in a cell phone. We assume that the cell phone is always on (i.e., 24 hours a day), but that its actual usage time is very limited. If we assume a 500-minute calling plan with all 500 minutes used per month, the cell phone is active only about 1.16% of the total on-time:

$$\frac{500\ \text{min}}{30\ \text{days} \times 24\ \text{hours/day} \times 60\ \text{min/hour}} = \frac{500}{43200} \approx 1.16\%$$

During the remaining 98.84% of the time the cell phone is inactive; however, because of static power consumption, the phone still consumes energy in standby and drains the battery. In technologies such as 0.07µ, the impact of leakage power is large. Let us consider an energy consumption scenario for a cell phone based on experimental results from [15]. Specifically, Table 1 shows results from [15] for 0.07µ technology at 25 °C, and Table 2 shows a hypothetical energy consumption scenario.

Table 1 Power and area results from [15]

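| Circuit | Forced stack: active power (W) | Forced stack: leakage power (W) | Forced stack: area (µm²) | Sleepy stack: active power (W) | Sleepy stack: leakage power (W) | Sleepy stack: area (µm²) |
|---|---|---|---|---|---|---|
| 4 inverters | 1.25E-06 | 9.81E-10 | 5.97E+00 | 1.09E-06 | 4.56E-12 | 9.03E+00 |
| 512B SRAM | 5.22E-04 | 5.39E-06 | 2.00E+01 | 5.80E-04 | 3.24E-07 | 3.66E+01 |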


Table 2 Energy consumption scenario of a cell phone (0.07µ) from [15]

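| Component | Forced stack: active power (W) | Forced stack: leakage power (W) | Forced stack: area (µm²) | Forced stack: energy per month (J) | Sleepy stack: active power (W) | Sleepy stack: leakage power (W) | Sleepy stack: area (µm²) | Sleepy stack: energy per month (J) |
|---|---|---|---|---|---|---|---|---|
| Processor logic circuits | 1.38E-01 | 1.02E-01 | 6.61E+05 | 2.65E+05 | 1.47E-01 | 5.74E-04 | 1.21E+06 | 5.87E+03 |
| 32KB SRAM | 5.54E-03 | 4.15E-02 | 6.61E+05 | 1.06E+05 | 6.09E-03 | 2.44E-03 | 1.21E+06 | 6.44E+03 |
| TOTAL | 1.43E-01 | 1.43E-01 | 1.32E+06 | 3.72E+05 | 1.53E-01 | 3.01E-03 | 2.42E+06 | 1.23E+04 |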

First, we assume a single chip containing an embedded processor core in 0.07µ technology. The chip largely consists of logic circuits and a 32KB SRAM; note that we exclude I/Os and the pad frame. Furthermore, we only consider the digital chip here; i.e., the liquid crystal display, radio frequency (RF) circuitry, etc., are all ignored. Second, we assume that the SRAM and the logic circuits each occupy half of the digital chip area. We estimate the 32KB SRAM area based on the SRAM cell area, which we will present in Chapter 5; note that in all cases we exclude test circuitry, e.g., our SRAM does not include Built-In Self Test (BIST). The forced stack 32KB SRAM area is 6.61 × 10⁵ µm², and the sleepy stack SRAM area is 1.21 × 10⁶ µm². We then estimate that the processor logic gates occupy the same area as the 32KB SRAM, as shown in the area columns of Table 2.

Third, we assume that at 0.07µ technology, leakage power consumption is as large as active power consumption when the forced stack technique is used. We multiply the forced stack leakage values from Table 1 by a factor (specifically, 939) so that forced stack leakage power becomes equal to forced stack active power, i.e., 143 mW. We then apply the same factor (939) to the sleepy stack leakage power from Table 1, resulting in a sleepy stack leakage power of 3.01 mW. In other words, while Table 1 is based on the Berkeley Predicted Technology Model (BPTM) [21], we instead assume a scenario where leakage power equals active power (a hypothetical situation that, we believe, may arise in the future). Now, recalling that our cell phone is active 500 minutes per month and thus inactive 42700 minutes per month, we calculate the forced stack digital chip energy per month as follows:


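$$\text{Energy}_1 = 143\,\text{mW} \times 500 \times 60\,\text{s} + 143\,\text{mW} \times 42700 \times 60\,\text{s} \approx 3.7 \times 10^{5}\,\text{J}$$

Similarly, we calculate the sleepy stack digital chip energy per month as follows:

$$\text{Energy}_2 = 153\,\text{mW} \times 500 \times 60\,\text{s} + 3.01\,\text{mW} \times 42700 \times 60\,\text{s} \approx 1.23 \times 10^{4}\,\text{J}$$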

The result predicts that the ultra-low leakage power technique, i.e., sleepy stack, reduces total energy consumption by about 30X compared to the best prior work, i.e., forced stack. Therefore, in this motivational example, the ultra-low leakage power technique can potentially extend cell phone battery life by 30X. This 30X saving comes at a cost, however: the overall area increases by 83% (from 1.32 mm² to 2.42 mm², see Table 2).
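The monthly energy figures and the 30X and 83% numbers can be re-derived with a few lines of Python. This is only a sketch of the arithmetic: the function name monthly_energy and the constant names are our own (hypothetical), and the inputs are the rounded TOTAL-row powers of Table 2, so the forced stack result comes out near 3.71 × 10^5 J rather than exactly the tabulated 3.72 × 10^5 J.

```python
# Sketch: re-derive the monthly energy figures of the cell-phone scenario.
SECONDS_PER_MINUTE = 60
ACTIVE_MIN = 500     # active minutes per month (from the text)
IDLE_MIN = 42700     # inactive minutes per month (from the text)


def monthly_energy(active_w: float, leakage_w: float) -> float:
    """Joules per month: active power while active plus leakage power while idle."""
    return (active_w * ACTIVE_MIN + leakage_w * IDLE_MIN) * SECONDS_PER_MINUTE


forced = monthly_energy(0.143, 0.143)     # forced stack TOTAL row of Table 2
sleepy = monthly_energy(0.153, 0.00301)   # sleepy stack TOTAL row of Table 2
area_increase_pct = (2.42 / 1.32 - 1) * 100

print(f"forced stack: {forced:.3e} J/month")
print(f"sleepy stack: {sleepy:.3e} J/month")
print(f"energy ratio: {forced / sleepy:.1f}x")
print(f"area increase: {area_increase_pct:.0f}%")
```

The idle term dominates the forced stack energy, which is why cutting leakage by roughly 50X translates into a roughly 30X total energy reduction here.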

Although many low-leakage techniques already exist, the best prior technique in terms of leakage power reduction, the sleep transistor technique, loses logic state during sleep mode. Therefore, the sleep transistor technique requires non-negligible time to wake the device up from sleep mode. In a situation such as an emergency call, this wake-up time may not be acceptable. Therefore, an ultra-low-leakage technique that can save state even in non-active mode can be quite important in nanoscale VLSI technology.

In this dissertation, we use circuit-based techniques to reduce leakage power consumption. In particular, our technique retains logic state, and thus a fast response time can be achieved even during non-active mode. Because it retains state, the technique is applicable to generic logic circuits as well as to memory, i.e., SRAM.

In this chapter, some motivation for the importance of this research is provided. In the

next chapter, we explain expressions, notation and background important for this thesis.


CHAPTER 3

NOTATION AND BACKGROUND

In this chapter, we explain important notation and VLSI background used in this

dissertation. First, we introduce subthreshold leakage power consumption on which our

research focuses. Next, we explain the background underlying a particular leakage

power model able to explain the stack effect. We then explain the body-bias effect,

which is an important leakage reduction factor in our research. Furthermore, we explain

subthreshold leakage power consumption of a conventional 6 Transistor (6-T) SRAM

cell. Finally, we explain switching power and delay tradeoffs of CMOS circuits and some key terms used in circuit performance estimation.

3.1 Leakage power

In this section, we explain notation and background relevant to leakage power

consumption.

Although dynamic power is dominant for technologies at 180nm and above, leakage (static) power consumption becomes another dominant factor at 130nm and below. One of the main contributors to static power consumption in CMOS is the subthreshold leakage current shown in Figure 1, i.e., the drain-to-source current when the gate voltage is smaller than the transistor threshold voltage. Since subthreshold current increases exponentially as the threshold voltage decreases, nanoscale technologies with scaled-down threshold voltages suffer severely from subthreshold leakage power consumption.

Figure 1 Sub-threshold leakage of an nFET

Assuming the leakage current is constant, the static power dissipation is the product of the total leakage current and the supply voltage:

$$P_{static} = I_{static} V_{dd} \qquad (3.1)$$

Static power reduction involves minimizing $I_{static}$, which is almost equal to the subthreshold leakage current $I_{sub}$ for $V_{gs} < V_{th}$.
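To make the exponential sensitivity concrete, the sketch below assumes, as the leakage model in this chapter does, that the subthreshold current of an off transistor scales as exp(−Vth/(nV_T)). The value n = 1.5 is taken from Table 3; the thermal voltage V_T = 25.9 mV (about 300 K), the 1 nA per-gate leakage, the one million gates and Vdd = 1 V are our own illustrative assumptions, and the function names are hypothetical.

```python
import math

N = 1.5        # subthreshold swing coefficient (value from Table 3)
V_T = 0.0259   # thermal voltage at about 300 K (assumed)


def leakage_scale(delta_vth: float) -> float:
    """Factor by which I_sub grows when Vth is lowered by delta_vth volts,
    assuming I_sub is proportional to exp(-Vth / (n * V_T))."""
    return math.exp(delta_vth / (N * V_T))


def static_power(i_static: float, vdd: float) -> float:
    """Equation (3.1): P_static = I_static * Vdd."""
    return i_static * vdd


swing = N * V_T * math.log(10)   # volts of Vth change per decade of I_sub
print(f"subthreshold swing: {swing * 1e3:.0f} mV/decade")
print(f"Vth lowered by 100 mV -> leakage x{leakage_scale(0.1):.1f}")

# Hypothetical chip: 1 nA leakage per gate, one million gates, Vdd = 1 V.
i_base = 1e-9 * 1e6
print(f"P_static before: {static_power(i_base, 1.0) * 1e3:.1f} mW")
print(f"P_static after:  {static_power(i_base * leakage_scale(0.1), 1.0) * 1e3:.1f} mW")
```

Under these assumptions every ~90 mV of threshold-voltage reduction multiplies the leakage, and hence the static power of Equation (3.1), by ten.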

Subthreshold leakage can be reduced by stacking transistors, i.e., by taking advantage of the so-called “stack effect” [22], or alternatively by applying variable body biasing (a nonzero source-to-body voltage $V_{sb}$), which we use in Chapter 5. The stack effect occurs when two or more stacked transistors are turned off together; the result is reduced leakage power consumption. Let us explain an important stack effect leakage reduction model, based on the leakage models in [22] and [23]. For a single turned-off transistor, shown in Figure 2(a), the leakage current ($I_{sub0}$) can be expressed as follows:

$$I_{sub0} = A\,e^{\frac{1}{nV_T}\left(V_{gs0} - V_{th0} - \gamma V_{sb0} + \eta V_{ds0}\right)}\left(1 - e^{-V_{ds0}/V_T}\right) = A\,e^{\frac{1}{nV_T}\left(-V_{th0} + \eta V_{dd}\right)}$$

Figure 2 (a) A single transistor (left) and (b) stacked transistors (right)

where
$A = \mu_0 C_{ox} \frac{W}{L_{eff}} V_T^2 e^{1.8}$,
$n$ is the subthreshold swing coefficient,
$V_T$ is the thermal voltage,
$V_{gs0}$, $V_{th0}$, $V_{sb0}$ and $V_{ds0}$ are the gate-to-source voltage, the zero-bias threshold voltage, the source-to-base voltage and the drain-to-source voltage, respectively,
$\gamma$ is the body-bias effect coefficient,
$\eta$ is the Drain Induced Barrier Lowering (DIBL) coefficient,
$\mu_0$ is the zero-bias mobility,
$C_{ox}$ is the gate-oxide capacitance,
$W$ is the width of the transistor, and
$L_{eff}$ is the effective channel length [24].

(Note that throughout this thesis we assume $\mu_n = 2\mu_p$, i.e., nMOS carrier mobility is twice pMOS carrier mobility. Also note that we use W/L ratios based on actual transistor sizes, so that the W/L ratio properly characterizes the circuit models used in this thesis.) The second form of the equation follows because the transistor is off ($V_{gs0} = 0$, $V_{sb0} = 0$, $V_{ds0} = V_{dd}$) and we assume $e^{-V_{ds0}/V_T} \ll 1$.

Let us assume that the two stacked transistors (M1 and M2) in Figure 2(b) are turned off. We also assume that the transistor width of each of M1 and M2 is the same as the transistor width of M0 ($W_{M0} = W_{M1} = W_{M2}$). The two leakage currents, $I_{sub1}$ of transistor M1 and $I_{sub2}$ of transistor M2, can be expressed as follows:

$$I_{sub1} = A\,e^{\frac{1}{nV_T}\left(V_{gs1} - V_{th1} - \gamma V_{sb1} + \eta V_{ds1}\right)}\left(1 - e^{-V_{ds1}/V_T}\right) = A\,e^{\frac{1}{nV_T}\left(-V_x - V_{th0} - \gamma V_x + \eta (V_{dd} - V_x)\right)}$$

$$I_{sub2} = A\,e^{\frac{1}{nV_T}\left(V_{gs2} - V_{th2} - \gamma V_{sb2} + \eta V_{ds2}\right)}\left(1 - e^{-V_{ds2}/V_T}\right) = A\,e^{\frac{1}{nV_T}\left(-V_{th0} + \eta V_x\right)}\left(1 - e^{-V_x/V_T}\right)$$

where $V_x$ is the voltage at the node between M1 and M2, and we assume $e^{-V_{ds1}/V_T} \ll 1$.

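Now consider the leakage current reduction between $I_{sub0}$ and $I_{sub1}$ ($= I_{sub2}$). The reduction factor X can be expressed as follows:

$$X = \frac{I_{sub0}}{I_{sub1}} = \frac{A\,e^{\frac{1}{nV_T}\left(-V_{th0} + \eta V_{dd}\right)}}{A\,e^{\frac{1}{nV_T}\left(-V_x - V_{th0} - \gamma V_x + \eta (V_{dd} - V_x)\right)}} = e^{\frac{V_x}{nV_T}\left(1 + \gamma + \eta\right)} \qquad (3.2)$$

$V_x$ in Equation (3.2) can be derived by letting $I_{sub1} = I_{sub2}$ and solving the following equation:

$$e^{\frac{1}{nV_T}\left(\eta V_{dd} - V_x (1 + \gamma + 2\eta)\right)} + e^{-V_x/V_T} = 1 \qquad (3.3)$$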

If all the parameters are known, we can calculate the stack effect leakage power reduction using Equations (3.2) and (3.3). As an example, we consider the leakage model parameter values targeting 0.5µ technology in Table 3, obtained from [22]. From Equation (3.3), we calculate $V_x$ = 0.0443 V, and from Equation (3.2), we obtain a leakage reduction factor X = 4.188.
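Equation (3.3) has no closed-form solution for $V_x$, but its left-hand side decreases monotonically as $V_x$ grows, so a simple bisection finds the root. The sketch below is our own illustration, not the procedure used in [22]. It uses the Table 3 parameters and assumes a thermal voltage of 25.9 mV (about 300 K), which the source does not state; with that assumption it gives $V_x$ ≈ 0.043 V and X ≈ 4.2, close to (but not exactly) the quoted 0.0443 V and 4.188, the difference depending on the assumed $V_T$.

```python
import math

# Table 3 parameters (0.5u technology)
VDD = 1.0      # supply voltage (V)
N = 1.5        # subthreshold swing coefficient
ETA = 0.05     # DIBL coefficient (V/V)
GAMMA = 0.24   # linearized body-bias effect coefficient (V/V)
V_T = 0.0259   # thermal voltage (V): assumed, not given in Table 3


def eq_3_3(vx: float) -> float:
    """Left-hand side of Equation (3.3) minus 1; zero at the stack node voltage."""
    return (math.exp((ETA * VDD - vx * (1 + GAMMA + 2 * ETA)) / (N * V_T))
            + math.exp(-vx / V_T) - 1.0)


def solve_vx(lo: float = 0.0, hi: float = VDD, tol: float = 1e-12) -> float:
    """Bisection: eq_3_3 is positive at Vx = 0, negative at Vx = Vdd, and
    decreases monotonically in between, so the root is unique."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if eq_3_3(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


vx = solve_vx()
x = math.exp(vx * (1 + GAMMA + ETA) / (N * V_T))   # Equation (3.2)
print(f"Vx = {vx:.4f} V, leakage reduction X = {x:.2f}")
```

Bisection is chosen only because it is guaranteed to converge for a monotone function; any standard root finder would do.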

Table 3 Leakage model parameters (0.5µ tech)

Parameter                            | Value
Vdd                                  | 1 V
Vth                                  | 0.2 V
n (subthreshold slope coefficient)   | 1.5
η (DIBL coefficient)                 | 0.05 V/V
γ (body-bias effect coefficient)     | 0.24 V/V

Although the reduction is 4.188X at 0.5µ technology, the reduction increases at nanoscale technology because η increases as the technology feature size shrinks.

The threshold voltage of a CMOS transistor can be controlled using body bias. In general, we apply Vdd to the body (e.g., an n-well or n-tub) of a PMOS transistor and apply Gnd to the body (e.g., a p-well or p-substrate) of an NMOS transistor. This condition, in which the source voltage and the body voltage of a transistor are the same, is called Zero-Body Bias (ZBB), and the threshold voltage at ZBB is called the ZBB threshold voltage. For an NMOS transistor, when the body voltage is made lower than the source voltage by applying a negative voltage to the body, the condition is called Reverse-Body Bias (RBB). Alternatively, when the body voltage is made higher than the source voltage by applying a positive voltage to the body, the condition is called Forward-Body Bias (FBB). (For a PMOS transistor the polarities are reversed.) When RBB is applied to a transistor, its threshold voltage increases, and when FBB is applied, its threshold voltage decreases. This phenomenon is called the body-bias effect, and it is frequently used to control threshold voltage dynamically [25].

In this section, Section 3.1, we explained subthreshold leakage power consumption, the stack effect, and the body-bias effect, which can alter subthreshold leakage power consumption. In the next section, we explain the leakage current of an SRAM cell.

3.2 SRAM cell leakage paths

In this section, we explain the major subthreshold leakage components in a 6-T SRAM cell. The subthreshold leakage current in an SRAM cell is typically categorized into two kinds [26], as shown in Figure 3: (i) cell leakage current, which flows from Vdd to Gnd internal to the cell, and (ii) bitline leakage current, which flows from bitline (or bitline′) to Gnd.

Although an SRAM cell has two bitline leakage paths, the bitline leakage current and the bitline′ leakage current differ according to the value stored in the SRAM bit. If an SRAM cell holds '1' as shown in Figure 3, the bitline leakage current passing through N3 and N2 is effectively suppressed for two reasons. First, after bitline and bitline′ are both precharged to '1', the source voltage and the drain voltage of N3 are the same, and thus essentially no current flows through N3. Second, the two stacked, turned-off transistors (N2 and N3) induce the stack effect. Meanwhile, in this case where the SRAM bit holds '1', a large bitline′ leakage current flows through N4 and N1. If, on the other hand, the SRAM cell holds '0', a large bitline leakage current flows while the bitline′ leakage current is suppressed.

Figure 3 SRAM cell leakage paths

In this section, Section 3.2, we explained the two major types of leakage paths in an SRAM cell (cell leakage and bitline leakage). In the next section, we explain the tradeoffs between switching power and delay.

3.3 Switching power and delay tradeoffs

In this section, we explain the tradeoffs between switching power and delay. In CMOS, power consumption consists of leakage power and dynamic power; note that dynamic power includes both switching power and short-circuit power. Switching power is consumed when a gate charges its output load capacitance, and short-circuit power is consumed when the pull-up network and the pull-down network are on together for an instant while transistors are turning on and off. For 0.18μ channel lengths and above, leakage power is very small compared to dynamic power. Furthermore, short-circuit power is also less than 10% of the dynamic power for a typical CMOS design, and the ratio between dynamic power and short-circuit power does not change as long as the ratio between supply voltage and threshold voltage remains the same [27]. Since, for 0.18μ and above, short-circuit power and leakage power are relatively small compared to switching power, the power consumption of a particular CMOS gate under consideration can be represented by the following switching power ($P_{switching}$) equation for 0.18μ and above:

$$P_{switching} = p_t C_L V_{dd}^2 f \qquad (3.4)$$

where $C_L$, $V_{dd}$ and $f$ denote the load capacitance of a CMOS gate, the supply voltage and the clock frequency, respectively [28]. The notation $p_t$ denotes the switching ratio of a gate output; this switching ratio represents the average number of times per clock cycle that the particular gate's output changes from Gnd to Vdd. Note that when the output capacitance discharges from Vdd to Gnd, no switching power is drawn from the supply, because the stored charge is simply dumped to Gnd (e.g., discharging to Gnd does not consume battery power). The switching ratio varies according to the input vectors and benchmark programs, and thus an average value for each benchmark may be used as the switching ratio.
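Equation (3.4) can be evaluated directly. The sketch below uses a hypothetical gate (p_t = 0.1, C_L = 10 fF, f = 500 MHz; none of these values come from this thesis) and sweeps Vdd to show the quadratic dependence; the function name switching_power is our own.

```python
def switching_power(p_t: float, c_load: float, vdd: float, f: float) -> float:
    """Equation (3.4): P_switching = p_t * C_L * Vdd^2 * f.

    p_t    -- average Gnd-to-Vdd output transitions per clock cycle
    c_load -- load capacitance in farads
    vdd    -- supply voltage in volts
    f      -- clock frequency in hertz
    """
    return p_t * c_load * vdd ** 2 * f


# Hypothetical gate: p_t = 0.1, C_L = 10 fF, f = 500 MHz
for vdd in (1.8, 1.2, 0.9):
    p = switching_power(0.1, 10e-15, vdd, 500e6)
    print(f"Vdd = {vdd:.1f} V -> P_switching = {p * 1e6:.3f} uW")
```

Halving Vdd (1.8 V to 0.9 V) cuts the switching power to one quarter, which is the quadratic reduction discussed next.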


Equation (3.4) shows that lowering Vdd decreases CMOS switching power consumption quadratically. Unfortunately, this power reduction entails an increase in the gate delay of a CMOS circuit, as shown in the following approximate equation:

$$T_d \propto \frac{V_{dd}}{(V_{dd} - V_{th})^{\alpha}} \qquad (3.5)$$

where $T_d$, $V_{th}$ and $\alpha$ denote the gate delay of a CMOS circuit, the threshold voltage and the velocity saturation index of a transistor, respectively. It is well known that α has values close to 2 for technologies above 2.0μ, that α is between 1.3 and 1.5 for 0.25μ, and that α is close to 1 below 0.1μ [28-29]. However, instead of scaling down the α value along with the technology feature size, CMOS technology may keep α constant to avoid hot-carrier-related problems [30]. A constant α value could be accomplished by changing Vth, because α is a function of the gate-source voltage [31]. If we scale down Vdd, the switching power in Equation (3.4) decreases, while the gate delay in Equation (3.5) increases. Therefore, CMOS circuit speed can be traded for switching power consumption, as shown by Equations (3.4) and (3.5).
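The tradeoff can be illustrated by sweeping Vdd through Equations (3.4) and (3.5) together. In the sketch below, Vth = 0.4 V, α = 1.3 and the 1.8 V reference supply are hypothetical values chosen only for illustration, p_t, C_L and f are held fixed, and both power and delay are reported relative to the reference point because Equation (3.5) is only a proportionality.

```python
def relative_delay(vdd: float, vth: float, alpha: float) -> float:
    """Equation (3.5) without its constant factor: Td ~ Vdd / (Vdd - Vth)**alpha."""
    if vdd <= vth:
        raise ValueError("Equation (3.5) requires Vdd > Vth")
    return vdd / (vdd - vth) ** alpha


VTH = 0.4      # threshold voltage, V (hypothetical)
ALPHA = 1.3    # velocity saturation index (hypothetical)
REF_VDD = 1.8  # reference supply voltage, V (hypothetical)

for vdd in (1.8, 1.5, 1.2, 0.9):
    power = (vdd / REF_VDD) ** 2   # Equation (3.4) with p_t, C_L and f held fixed
    delay = relative_delay(vdd, VTH, ALPHA) / relative_delay(REF_VDD, VTH, ALPHA)
    print(f"Vdd = {vdd:.1f} V: switching power x{power:.2f}, delay x{delay:.2f}")
```

Power falls steadily as Vdd drops, while delay rises ever faster as Vdd approaches Vth.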

When there are tradeoffs between multiple criteria, e.g., power and delay, one design may be better than another only in specific criteria. A point in the design space is called a Pareto point if no other point is at least as good in every objective and strictly better in at least one [32]. In this thesis, we estimate leakage power consumption by measuring static power when transistors are not switching. Furthermore, we estimate active power consumption by measuring power when transistors are switching; this active power includes both dynamic power consumption and leakage power consumption. In this chapter, we explained important notation and VLSI background used in this thesis. In the next chapter, we explain previous low-power research related to our research.
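The Pareto-point definition can be checked mechanically for a set of (power, delay) pairs where lower is better for both. The sketch below is a minimal illustration; the design names and numbers are hypothetical and do not correspond to any circuit in this thesis.

```python
from typing import List, Tuple

Design = Tuple[str, float, float]   # (name, power, delay); lower is better for both


def dominates(a: Design, b: Design) -> bool:
    """True if design a is no worse than b in both objectives and strictly better in one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_points(designs: List[Design]) -> List[Design]:
    """Keep only the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs if o is not d)]


# Hypothetical (power in uW, delay in ps) pairs for four design variants
candidates = [("A", 10.0, 50.0), ("B", 8.0, 60.0), ("C", 12.0, 55.0), ("D", 6.0, 80.0)]
print([name for name, _, _ in pareto_points(candidates)])
```

Design C is dropped because A is both faster and lower-power; A, B and D remain, each trading speed against power.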

3.4 Circuit Performance Estimation

In this section, we introduce a widely used technique for estimating the propagation delay of VLSI circuits, the Linear Delay Model. Using this model, one can quickly and crudely calculate the propagation delay in units of $\tau = 3RC$ (the parasitic delay of a unit inverter). For a 180nm process, $\tau \approx 15$ ps.


In general, the propagation delay of a gate can be written as

$$d = f + p \qquad (3.6)$$

where p is the parasitic delay inherent to the gate when no load is attached, and f is the effort delay, which depends on the complexity and fan-out of the gate [33]:

$$f = gh$$

The complexity is represented by the logical effort, g. An inverter is defined to have a logical effort of 1. The logical effort of a gate is defined as the ratio of its input capacitance to the input capacitance of an inverter that can deliver the same output current. For a 2-input NAND gate and a 2-input NOR gate, g is 4/3 and 5/3, respectively (Figure 4). In general, it can be shown that the logical effort of an n-input NAND gate is g = (n+2)/3 and that of an n-input NOR gate is g = (2n+1)/3. A gate driving h identical copies of itself is said to have a fan-out, or electrical effort, of h. If the load is not identical, h is defined as

$$h = \frac{C_{out}}{C_{in}}$$

The parasitic delay p of a gate is the delay of the gate when it drives zero load. A crude way to estimate it is to count the diffusion capacitance on the output node. It can be shown that the parasitic delay p of an n-input NAND gate and of an n-input NOR gate equals n.

Figure 4 Logical efforts of basic logic gates

For calculating delay in multistage logic networks, we define the following terms:

Path logical effort: $G = \prod_i g_i$
Path electrical effort: $H = \frac{C_{out(path)}}{C_{in(path)}}$
Branching effort: $b = \frac{C_{on\text{-}path} + C_{off\text{-}path}}{C_{on\text{-}path}}$
Path branching effort: $B = \prod_i b_i$
Path effort: $F = GBH$
Path effort delay: $D_F = \sum_i f_i$
Path parasitic delay: $P = \sum_i p_i$

Finally, the path delay D is the sum of the delays of each stage:

$$D = \sum_i d_i = D_F + P$$

In this section, we discussed a simple model for estimating the propagation delay of a circuit by hand calculation. This model gives designers insight into a circuit, which they can use to design faster circuits.
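The linear delay model lends itself to a few lines of code. The sketch below applies d = gh + p stage by stage, using the NAND/NOR logical-effort formulas and the parasitic delays (inverter 1, n-input NAND/NOR n) stated above; the three-stage path and its electrical efforts are hypothetical, and exact fractions are used so that the result in units of τ is exact.

```python
from fractions import Fraction


def nand_effort(n: int) -> Fraction:
    """Logical effort of an n-input NAND gate: (n + 2) / 3."""
    return Fraction(n + 2, 3)


def nor_effort(n: int) -> Fraction:
    """Logical effort of an n-input NOR gate: (2n + 1) / 3."""
    return Fraction(2 * n + 1, 3)


def path_delay(stages):
    """Sum of per-stage delays d = g*h + p (Equation 3.6), in units of tau.

    Each stage is a tuple (g, h, p): logical effort, electrical effort, parasitic delay.
    """
    return sum(g * h + p for g, h, p in stages)


TAU_PS = 15  # tau for a 180nm process, from the text

# Hypothetical 3-stage path: inverter (h=2), 2-input NAND (h=3), 2-input NOR (h=4)
path = [(Fraction(1), 2, 1), (nand_effort(2), 3, 2), (nor_effort(2), 4, 2)]
d = path_delay(path)
print(f"D = {float(d):.2f} tau = {float(d) * TAU_PS:.0f} ps")
```

For this path the stage delays are 3, 6 and 26/3, giving D = 53/3 τ, or roughly 265 ps at τ = 15 ps.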


CHAPTER 4

PREVIOUS WORKS

In this chapter, we review important prior work that is closely related to our research and compare it with our research. We mainly explore prior work targeting leakage power reduction, but we also shed light on other performance criteria such as dynamic power, propagation delay, power-delay product and area.

4.1 Static Power Reduction VLSI research

In this section, we discuss previous low-power techniques that primarily target

reducing leakage power consumption of CMOS circuits. Techniques for leakage power

reduction can be grouped into two categories: (I) state-saving techniques where circuit

state (present value) is retained and (II) state-destructive techniques where the current

Boolean output value of the circuit might be lost [13]. A state-saving technique has an

advantage over a state-destructive technique in that with a state-saving technique the

circuitry can immediately resume operation at a point much later in time without

having to somehow regenerate state. We characterize each low-leakage technique

according to this criterion. We first study low-leakage techniques for generic logic circuits and then, separately, low-leakage SRAM designs.

4.1.1 Static power reduction research for generic logic circuits

This section explains previously proposed low-leakage techniques for generic logic

circuits. As introduced, previously proposed work can be divided into techniques that

either (i) save state or (ii) destroy state. Although our research focuses on techniques

which save state, we also review the state-destructive techniques for the purposes of

comparison. The state-destructive category includes the sleep transistor technique and the forced stack technique. The state-saving category includes the sleepy stack, sleepy keeper, dual sleep and dual stack methods.


4.1.1.1 Sleep transistor

State-destructive techniques cut off transistor networks (pull-up, pull-down or both) from the supply voltage or ground using sleep transistors [34]. These types of techniques are also called gated-Vdd and gated-Gnd (note that a gated clock is generally used for dynamic power reduction). Mutoh et al. propose a technique they call Multi-Threshold Voltage CMOS (MTCMOS) [34], which adds high-Vth sleep transistors between the pull-up networks and Vdd and between the pull-down networks and ground, as shown in Figure 5, while the logic circuits use low-Vth transistors in order to maintain fast logic switching speeds. The sleep transistors are turned off when the logic circuits are not in use. By isolating the logic networks using sleep transistors, the sleep transistor technique dramatically reduces leakage power during sleep mode. However, the additional sleep transistors increase area and delay. Furthermore, the pull-up and pull-down networks have floating values and thus lose state during sleep mode. These floating values significantly affect the wake-up time and energy of the sleep technique, because the transistors that lost state during sleep must be recharged (this issue is nontrivial, especially for registers and flip-flops).

Comparison with prior works using sleep transistors

The sleep transistor technique and the “Variable Body Biasing” technique both achieve roughly the same static power savings over conventional CMOS. However, unlike the sleep transistor technique, the “Variable Body Biasing” technique saves logic state during low-leakage mode (sleep mode), which is a significant advantage over the state-destructive sleep transistor technique. The sleep transistor technique requires non-negligible power consumption to restore lost state. Further, the wake-up time of the sleep transistor technique is significant, while the “Variable Body Biasing” technique needs only a very small extra wake-up time (a few clock cycles).

Figure 5 Sleep transistor

4.1.1.2 Forced Stack

Another technique to reduce leakage power is transistor stacking. Transistor stacking exploits the stack effect explained in Chapter 3; the stack effect results in substantial subthreshold leakage current reduction when two or more stacked transistors are turned off together.

Example 1: The stack effect can be understood from the forced stack inverter example shown in Figure 6. Unlike a generic CMOS inverter, this forced stack inverter consists of two pull-up transistors and two pull-down transistors. All inputs share the same input 'A'. If A = 0, both transistors M1 and M2 are turned off. Due to the internal resistance of M2, the intermediate node voltage Vx is higher than Gnd. The positive potential of Vx results in a negative gate-source voltage (Vgs) for M1 and a negative body-source voltage (Vbs) for M1, i.e., M1 is reverse body biased. Furthermore, M1 has a reduced drain-source voltage (Vds), which weakens the Drain Induced Barrier Lowering (DIBL) effect [35]. All three effects together change the leakage reduction factor X in Equation (3.2) (see Chapter 3), reducing leakage current by an order of magnitude for today's channel lengths (0.18µ, 0.13µ, 0.10µ and 0.07µ) [36].

Narendra et al. study the effectiveness of the stack effect, including the effects of increasing the channel length [37]. Since forcing a stack in place of what was previously a single transistor increases delay, Johnson et al. propose an algorithm that finds circuit input vectors that maximize the number of stacked off transistors in existing complex logic [38].

Figure 6 “Forced Stack”

Comparison with prior work using the forced stack approach

Compared to the forced stack technique, the “Variable Body Biasing” technique potentially achieves more power savings because “Variable Body Biasing” can change the body bias during circuit operation. The forced stack technique cannot use high-Vth transistors without a dramatic delay increase (more than 5X compared to conventional CMOS).

4.1.1.3 Sleepy Stack

The sleepy stack approach combines the stack and sleep approaches by dividing every transistor into two transistors of half width and placing a sleep transistor in parallel with one of the divided transistors [14, 15]. As shown in Figure 7, sleep transistors are placed in parallel with the divided transistor closest to Vdd in the pull-up network and with the divided transistor closest to GND in the pull-down network. The sleepy stack approach can thus have the advantages of both the stack approach and the sleep approach. During active mode, the sleepy stack approach results in lower delay than the stack approach because the sleep transistors placed in parallel (i) reduce resistance and (ii) are already on. When the sleep transistors are turned off, the remaining path to either Vdd or GND prevents a floating output. Also, leakage current can be reduced further by using high-Vth sleep transistors and high-Vth transistors in parallel with them. However, the area penalty is significant, since every transistor is replaced by three transistors and additional wires are added for the sleep signals S and S′.

Figure 7 SLEEPY STACK


Comparison with prior work using the sleepy stack approach

Compared to the sleepy stack technique, the “Variable Body Biasing” technique achieves 86% more power savings because “Variable Body Biasing” can change the body bias during circuit operation. The sleepy stack technique requires 32.3% more area than the “Variable Body Biasing” technique, so this smaller area overhead is a major improvement over the sleepy stack technique.

4.1.1.4 Sleepy keeper

Another approach, called sleepy keeper, uses a leakage feedback technique [16] and is shown in Figure 8. In this approach, a PMOS transistor is placed in parallel with the sleep transistor (S) and an NMOS transistor is placed in parallel with the sleep transistor (S'). The two transistors are driven by the output of the inverter. During sleep mode, the sleep transistors are turned off, and one of the transistors in parallel with the sleep transistors keeps the connection to the appropriate power rail.

Comparison with prior work using the sleepy keeper approach

Compared to the sleepy keeper technique, the “Variable Body Biasing” technique achieves 50% more power savings because “Variable Body Biasing” can change the body bias during circuit operation. The sleepy keeper requires less area than the “Variable Body Biasing” technique, but the greater reduction in leakage power outweighs this area cost.

Figure 8 Sleepy keeper


4.1.1.5 Dual Sleep

The dual sleep approach uses two extra pull-up and two extra pull-down transistors, which are either in the OFF state or in the ON state during sleep mode. Since the dual sleep portion can be made common to all logic circuitry, fewer transistors are needed to implement a given logic circuit. In the OFF state, each of the pull-up and pull-down networks contains both PMOS and NMOS transistors in order to reduce the leakage power. The approach has two further advantages. First, it maintains state in sleep mode. Second, as in the sleep, sleepy stack and sleepy keeper approaches, dual-Vth technology can be applied in the dual sleep approach to obtain greater leakage power reduction [17].

Comparison with prior work using the dual sleep approach

The dual sleep method requires 94.7%, 95.3% and 80.49% more leakage power than the “Variable Body Biasing” technique for a chain of 4 inverters, a 1-bit full adder and an SRAM cell, respectively. The “Variable Body Biasing” technique also improves propagation delay by around 7% for logic circuits compared to the dual sleep method.

Figure 9 Dual Sleep


4.1.1.6 Dual Stack

The dual stack approach (Figure 10) uses 2 extra PMOS transistors in the pull-down network and 2 extra NMOS transistors in the pull-up network. As a result, the NMOS transistors degrade the high logic level and the PMOS transistors degrade the low logic level. Due to the body effect, they reduce the voltage level further, so these pass transistors decrease the voltage applied across the main circuit. The stacked transistors are held in reverse body bias, so their threshold voltage is high. A high threshold voltage causes low leakage current and hence low leakage power. In addition, minimum-size transistors (aspect ratio 1) are used to reduce the static power further [18].

Comparison with prior work using the dual stack approach

The dual stack method requires 92.93%, 94.58% and 77.14% more leakage power than the “Variable Body Biasing” technique for a chain of 4 inverters, a 1-bit full adder and an SRAM cell, respectively. The “Variable Body Biasing” technique also improves propagation delay for both logic circuits and memory circuits (i.e., SRAM) compared to the dual stack method.

Figure 10 Dual Stack


4.1.2 Static power reduction research for SRAM

In this section, we discuss state-of-the-art low-power memory techniques, especially for SRAM, on which our research focuses, and compare them with our approach.

4.1.2.1 Sleep transistor

This is the same as applying sleep transistors in generic logic circuits, e.g., the chain of four inverters. The Vdd and Gnd rails are separated from the circuit through a PMOS and an NMOS transistor, respectively.

Comparison with our work

The sleep transistor method requires 35.4% more static power and 35.7% more dynamic power than our approach. It has 0.89% less delay and 31.9% less area than our approach. Overall, the sleep transistor approach has a power-delay product 35.05% higher than our approach.

Figure 11 SLEEP TRANSISTOR IN SRAM


4.1.2.2 Dual Sleep

Sleep transistors are a crucial part of any low-leakage power design. Generally, the sleep transistors are used to reduce leakage power in off mode, and other techniques are adopted to save the state. In this method, each of the rails is separated from the circuit by a header or footer sleep transistor, as in the case of logic circuits. We apply S=1 when the circuit is in active mode and S=0 when it is in sleep mode.

The dual sleep method requires 84.12% more static power and 35.49% more dynamic power than our approach. It has 0.10% more delay than our approach. Overall, the dual sleep approach has a power-delay product 35.35% higher than our approach.

Figure 12 “DUAL SLEEP” IN SRAM


4.1.2.3 Dual Stack

Figure 13 shows the configuration of the dual stack method for an SRAM. In this method, there are two extra MOSFETs in parallel with the sleep transistors. These extra MOSFETs help retain state, which is crucial for SRAM operation. Because the retention transistors are stacked, they also help reduce leakage power. The operation is similar to the case of logic circuits. We apply S=1 when the circuit is in active mode and S=0 when it is in sleep mode.

The dual stack method requires 80.73% more static power and 0.48% less dynamic power than our approach. It has 0.05% less delay than our approach. Overall, the dual stack approach has a power-delay product 0.46% lower than our approach.

Figure 13 “DUAL STACK” IN SRAM


4.1.2.4 Sleepy Keeper in SRAM

Figure 14 shows the configuration of the sleepy keeper method for an SRAM. In this method, there is an extra MOSFET in parallel with each sleep transistor. These extra MOSFETs help retain state, which is required for SRAM operation. In this case, the retention transistors are not stacked, so they offer only a small reduction in leakage power. The operation is similar to the case of logic circuits. We apply S=1 when the circuit is in active mode and S=0 when it is in sleep mode.

The sleepy keeper method requires 37.1795% more static power and 39.5954% more dynamic power than our approach. It has 0.43471% more delay than our approach. Overall, the sleepy keeper approach has a power-delay product 39.86% higher than our approach.

Figure 14 “SLEEPY KEEPER” IN SRAM


CHAPTER 5

VARIABLE BODY BIASING TECHNIQUE

In this chapter, we introduce a new leakage power reduction technique, which we name “Variable Body Biasing.” We derived this technique by varying the Vth of the sleep transistors of the sleepy keeper technique according to the operating mode, so that subthreshold leakage is minimized in sleep mode. Unlike the sleep transistor technique, the variable body biasing technique retains the exact logic state; and, unlike the sleepy keeper technique, our technique can vary Vth through the body effect without suffering delay penalties. Therefore, the variable body biasing technique can achieve ultra-low leakage power consumption while saving state, far better than any prior approach known to the authors of this thesis.

We first explain the structure of the variable body biasing technique using an inverter. We then describe the details of variable body biasing operation in active mode and sleep mode, and explore the advantages of the variable body biasing technique over the sleep transistor technique and the sleepy keeper technique. Finally, we apply the linear delay model to our variable body biasing technique to estimate its propagation delay.

5.1 Variable body biasing approach

In this section, we explain our variable body biasing structure in comparison with the sleepy keeper technique for an inverter. The details of the variable body biasing inverter are described as an example, and the two operation modes of the variable body biasing technique, active mode and sleep mode, are explored.


5.2 Variable body biasing structure

We have already described the sleepy keeper structure in Section 4.1.1.4. In the sleepy keeper structure, the sleep transistors still have some subthreshold leakage in sleep mode, which can be reduced by increasing their Vth through the body effect. To implement this, we use a PMOS (M2) and an NMOS (M5) (Figure 15(b)). The drain of the PMOS (M2) is connected to the body of the sleep PMOS (M1), and the source of the PMOS (M2) is connected to Vdd. Similarly, the drain of the NMOS (M5) is connected to the body of the sleep NMOS (M4), and the source of the NMOS (M5) is connected to Gnd. The other PMOS (M3) and NMOS (M6) are used as keepers to retain the state of the output in sleep mode. The W/L ratio of the inverter PMOS is 6 and that of the inverter NMOS is 3; all other transistors have a W/L ratio of unity.

Figure 15 An Inverter with (a) Sleepy Keeper (left) (b) Variable body biasing structure (right)

5.3 Variable body biasing operation

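During active mode ($S = 1$, $S' = 0$), transistors M1, M2, M4 and M5 are on and act as short circuits, so the inverter works normally. Because M2 and M5 connect the bodies of M1 and M4 to Vdd and Gnd, respectively, the Vsb of the sleep PMOS (M1) and the sleep NMOS (M4) is almost zero. As a result, the Vth of the sleep transistors stays at its low, zero-bias value, as given by the body-effect relation of Equation (5.1):

$$V_{th} = V_{th0} + \gamma\left(\sqrt{\phi_s + V_{sb}} - \sqrt{\phi_s}\right) \tag{5.1}$$

where $V_{th0}$ is the threshold voltage when the source is at the body potential ($V_{sb} = 0$), $\phi_s = 2V_T \ln(N_A/n_i)$ is the surface potential at threshold, $\gamma$ is the body-bias effect coefficient, typically in the range 0.4 to 1 V$^{1/2}$, $N_A$ is the doping level, $n_i$ is the intrinsic carrier concentration, and $V_T$ is the thermal voltage.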
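To make Equation (5.1) concrete, the short Python sketch below evaluates the threshold voltage of a sleep transistor whose body is tied to its source ($V_{sb} = 0$, as in active mode) and with a nonzero source-to-body voltage (as in sleep mode). All numerical values ($V_{th0} = 0.3$ V, $\gamma = 0.4$ V$^{1/2}$, $N_A = 10^{17}$ cm$^{-3}$, $n_i = 10^{10}$ cm$^{-3}$, $V_T = 25.85$ mV, $V_{sb} = 0.5$ V) are illustrative assumptions, not values extracted from the PTM models used in this thesis.

```python
import math

# Illustrative parameters (assumptions, not taken from the PTM model cards)
V_T = 0.02585      # thermal voltage at about 300 K, in volts
N_A = 1e17         # channel doping, cm^-3
N_I = 1e10         # intrinsic carrier concentration, cm^-3
GAMMA = 0.4        # body-effect coefficient, V^0.5 (low end of the 0.4-1 range)
V_TH0 = 0.3        # zero-bias threshold voltage, volts


def surface_potential(v_t: float = V_T, n_a: float = N_A, n_i: float = N_I) -> float:
    """phi_s = 2 * V_T * ln(N_A / n_i), the surface potential at threshold."""
    return 2.0 * v_t * math.log(n_a / n_i)


def threshold_voltage(v_sb: float, v_th0: float = V_TH0, gamma: float = GAMMA) -> float:
    """Equation (5.1): Vth = Vth0 + gamma * (sqrt(phi_s + Vsb) - sqrt(phi_s))."""
    phi_s = surface_potential()
    return v_th0 + gamma * (math.sqrt(phi_s + v_sb) - math.sqrt(phi_s))


if __name__ == "__main__":
    for v_sb in (0.0, 0.5):
        print(f"Vsb = {v_sb:.1f} V -> Vth = {threshold_voltage(v_sb):.3f} V")
```

With these assumed values, Vth rises from 0.300 V at $V_{sb} = 0$ to about 0.397 V at $V_{sb} = 0.5$ V, which is the mechanism the technique relies on in sleep mode.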

On the other hand, during sleep mode ($S = 0$, $S' = 1$), transistors M1, M2, M4 and M5 are cut off, which makes the Vsb of the sleep transistors nonzero. As a result, the Vth of the sleep transistors increases, and the subthreshold leakage is reduced according to the following equation:

$$I_{sub} = A\, e^{\frac{1}{nV_T}\left(V_{gs} - V_{th0} - \gamma' V_{sb} + \eta V_{ds}\right)} \left(1 - e^{-V_{ds}/V_T}\right) \tag{5.2}$$

where $A = \mu_0 C_{ox} \frac{W}{L_{eff}} V_T^2 e^{1.8}$, $n$ is the subthreshold swing coefficient and $V_T$ is the thermal voltage. $V_{gs}$, $V_{th0}$, $V_{sb}$ and $V_{ds}$ are the gate-to-source voltage, the zero-bias threshold voltage, the source-to-body voltage and the drain-to-source voltage, respectively. $\gamma'$ is the (linearized) body-bias effect coefficient, and $\eta$ is the Drain Induced Barrier Lowering (DIBL) coefficient. $\mu_0$ is the zero-bias mobility, $C_{ox}$ is the gate-oxide capacitance, $W$ is the width of the transistor, and $L_{eff}$ is the effective channel length.

Reducing the subthreshold leakage lowers the static dissipation. Thus the Vth of the sleep transistors (M1, M4) is varied according to the mode: it is raised in sleep mode to cut leakage, and kept low in active mode to reduce the propagation delay, since

$$T_d \propto \frac{V_{dd}}{(V_{dd} - V_{th})^{\alpha}} \tag{5.3}$$

where $T_d$, $V_{th}$ and $\alpha$ denote the gate delay in a CMOS circuit, the threshold voltage and the velocity saturation index of a transistor, respectively.
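Equation (5.3) can be used to gauge how much a higher Vth would slow a gate if it were kept in active mode. The sketch below compares the relative gate delay for two threshold voltages at the 1.2 V supply used for the 90 nm node in Chapter 6. The velocity saturation index $\alpha = 1.3$ and the two Vth values are illustrative assumptions, and because (5.3) is only a proportionality, only the ratio of the two delays is meaningful.

```python
def relative_delay(v_dd: float, v_th: float, alpha: float) -> float:
    """Equation (5.3) without its unknown constant: Td ~ Vdd / (Vdd - Vth)**alpha."""
    if v_th >= v_dd:
        raise ValueError("the alpha-power model assumes Vdd > Vth")
    return v_dd / (v_dd - v_th) ** alpha


V_DD = 1.2        # volts (90 nm supply in Table 4)
ALPHA = 1.3       # velocity saturation index (assumed)
VTH_LOW = 0.3     # Vth with the body tied to the source (assumed)
VTH_HIGH = 0.4    # Vth with body bias applied (assumed)

slowdown = relative_delay(V_DD, VTH_HIGH, ALPHA) / relative_delay(V_DD, VTH_LOW, ALPHA)
print(f"Raising Vth from {VTH_LOW} V to {VTH_HIGH} V multiplies the delay by {slowdown:.3f}")
```

Under these assumptions a 0.1 V increase in Vth makes the gate roughly 17% slower, which is why the technique restores the low Vth in active mode.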

State retention is provided by the keeper pair, NMOS M3 and PMOS M6. During sleep mode, if the output is high, the NMOS M3 keeps it high, and if the output is low, the PMOS M6 keeps it low, so the output is actively held rather than left floating.

5.4 Analysis of subthreshold leakage reduction

An analytical argument shows that the variable body biasing technique reduces the subthreshold current. Figure 16(a) shows a sleep transistor without a body biasing transistor, and Figure 16(b) shows a sleep transistor (M1) with a body biasing transistor (M2).

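For a sleep transistor without body biasing, the subthreshold leakage is given by

$$I_{sub} = A\, e^{\frac{1}{nV_T}\left(V_{gs} - V_{th0} - \gamma' V_{sb} + \eta V_{ds}\right)} \left(1 - e^{-V_{ds}/V_T}\right) \tag{5.4}$$

During sleep mode, $S'$ is high, so $V_{gs}$ is zero. As the source is at the body potential ($V_{sb} = 0$), the subthreshold leakage becomes

$$I_{sub} = A\, e^{\frac{1}{nV_T}\left(-V_{th0} + \eta V_{ds}\right)} \left(1 - e^{-V_{ds}/V_T}\right) \tag{5.5}$$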

Now, for a sleep transistor with a body biasing transistor, the subthreshold leakage through M2 is

$$I_{sub2} = A\, e^{\frac{1}{nV_T}\left(V_{gs2} - V_{th0} - \gamma' V_{sb2} + \eta V_{ds2}\right)} \left(1 - e^{-V_{ds2}/V_T}\right) \tag{5.6}$$

Here, during sleep mode, $V_{gs2}$ is zero. Unlike the sleep transistor M1, the body biasing transistor M2 has its source connected to its body, so $V_{sb2}$ is also zero. Hence

$$I_{sub2} = A\, e^{\frac{1}{nV_T}\left(-V_{th0} + \eta V_{ds2}\right)} \left(1 - e^{-V_{ds2}/V_T}\right) \tag{5.7}$$

$$I_{sub2} = A\, e^{\frac{1}{nV_T}\left(-V_{th0} + \eta V_{ds2}\right)} - A\, e^{\frac{1}{nV_T}\left(-V_{th0} + (\eta - n) V_{ds2}\right)} \tag{5.8}$$

Figure 16 (a) Sleep transistor without body biasing transistor (b) With body biasing transistor

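Now the subthreshold leakage through M1 is given by

$$I_{sub1} = A\, e^{\frac{1}{nV_T}\left(V_{gs1} - V_{th0} - \gamma' V_{sb1} + \eta V_{ds1}\right)} \left(1 - e^{-V_{ds1}/V_T}\right) \tag{5.9}$$

Here $V_{gs1}$ is zero and $V_{sb1}$ is equal to $V_{ds2}$. Hence $I_{sub1}$ becomes

$$I_{sub1} = A\, e^{\frac{1}{nV_T}\left(-V_{th0} - \gamma' V_{ds2} + \eta V_{ds1}\right)} \left(1 - e^{-V_{ds1}/V_T}\right) \tag{5.10}$$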

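From Figure 16(b), only $I_{sub1}$ contributes to the static current. Comparing the subthreshold leakage of the sleep transistor without body biasing, Equation (5.5), with that of the sleep transistor with body biasing, Equation (5.10), the only new term is $-\gamma' V_{ds2}$ in the exponent. Assuming comparable drain-to-source voltages across the sleep transistor in both cases ($V_{ds1} \approx V_{ds}$), the ratio is $I_{sub1}/I_{sub} \approx e^{-\gamma' V_{ds2}/(nV_T)}$, which is less than one for any positive $V_{ds2}$. Hence

$$I_{sub1} < I_{sub}$$

Therefore, body biasing reduces the static current during sleep mode, which in turn reduces the static power (according to Equation 3.1).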
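The comparison between Equations (5.5) and (5.10) can be checked numerically. The Python sketch below implements the subthreshold model of Equation (5.2) and evaluates it for a sleep transistor without body biasing ($V_{sb} = 0$) and with body biasing ($V_{sb1} = V_{ds2}$). The prefactor $A$ is set to 1 because it cancels in the ratio; the values of $n$, $\eta$, $\gamma'$, $V_{th0}$, $V_{ds}$ and $V_{ds2}$ are illustrative assumptions chosen only to show the trend, not simulated values.

```python
import math

# Illustrative device parameters (assumptions, not PTM data)
V_T = 0.02585     # thermal voltage, volts
V_TH0 = 0.3       # zero-bias threshold voltage, volts
N_SWING = 1.5     # subthreshold swing coefficient n
GAMMA_P = 0.2     # linearized body-bias coefficient gamma'
ETA = 0.05        # DIBL coefficient


def i_sub(v_gs: float, v_sb: float, v_ds: float, a: float = 1.0) -> float:
    """Subthreshold current of Equation (5.2), in arbitrary units when a = 1."""
    exponent = (v_gs - V_TH0 - GAMMA_P * v_sb + ETA * v_ds) / (N_SWING * V_T)
    return a * math.exp(exponent) * (1.0 - math.exp(-v_ds / V_T))


V_DS = 1.2    # voltage across the sleep transistor in sleep mode (assumed)
V_DS2 = 0.2   # voltage across the body biasing transistor M2 (assumed)

# Eq. (5.5): no body biasing, Vgs = 0 and the source is at body potential
leak_plain = i_sub(v_gs=0.0, v_sb=0.0, v_ds=V_DS)
# Eq. (5.10): with body biasing, Vgs1 = 0 and Vsb1 = Vds2, taking Vds1 = Vds
leak_vbb = i_sub(v_gs=0.0, v_sb=V_DS2, v_ds=V_DS)

ratio = leak_vbb / leak_plain
closed_form = math.exp(-GAMMA_P * V_DS2 / (N_SWING * V_T))
print(f"I_sub1 / I_sub = {ratio:.3f} (closed form {closed_form:.3f})")
```

With these assumptions the leakage falls to roughly 36% of its unbiased value; a larger $V_{ds2}$ or $\gamma'$ gives a larger reduction, since the ratio decays exponentially in both.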

5.5 Estimation of delay for the variable body biasing technique

In this section, we estimate the propagation delay of an inverter by applying the linear delay model discussed in Section 3.4.

Figure 17 (a) Inverter with VBB technique (left) (b) Inverter of equal strength (right)

To find the logical effort of the inverter with the variable body biasing technique (Figure 17(a)), we have to determine its input capacitance and the input capacitance of a simple inverter of equal strength (Figure 17(b)).

The input capacitance of the inverter with the variable body biasing technique is $C_{in} = 6 + 3 = 9$.

The input capacitance of a simple inverter of equal strength is $C_{in}' = 3/2 + 3/4 = 9/4$.

Logical effort: $g = C_{in}/C_{in}' = 9/(9/4) = 4$.

Transistors M3 and M6 contribute to the parasitic capacitance, so $p = 6 + 3 + 1 + 1 = 11$.

Assuming a fan-out of one identical gate, the electrical effort is $h = 1$.

According to the linear delay model, the propagation delay is $d = gh + p = 4 \times 1 + 11 = 15$ in units of $\tau = 3RC$, the parasitic delay of a unit inverter. For a 180 nm process, $\tau = 15$ ps. So the propagation delay of the inverter with the variable body biasing technique is estimated to be $15 \times 15$ ps $= 225$ ps for a 180 nm process [33].
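The linear delay model arithmetic of this section can be reproduced in a few lines of Python; exact fractions avoid any rounding in $9/(9/4)$. The capacitance and parasitic values are the ones stated above, and $\tau = 15$ ps is the 180 nm figure quoted from [33]; the variable names are our own.

```python
from fractions import Fraction

# Values from Section 5.5 (in units of a unit transistor's gate capacitance)
C_IN_VBB = Fraction(6) + Fraction(3)            # inverter PMOS (6) + NMOS (3)
C_IN_REF = Fraction(3, 2) + Fraction(3, 4)      # equal-strength simple inverter
P_PARASITIC = Fraction(6 + 3 + 1 + 1)           # includes keepers M3 and M6
H_ELECTRICAL = Fraction(1)                      # fan-out of one identical gate
TAU_PS = 15                                     # tau = 3RC for a 180 nm process [33]

g = C_IN_VBB / C_IN_REF                 # logical effort
d = g * H_ELECTRICAL + P_PARASITIC      # normalized delay d = g*h + p, in units of tau
delay_ps = d * TAU_PS                   # absolute delay in picoseconds

print(f"g = {g}, p = {P_PARASITIC}, d = {d} tau = {delay_ps} ps")
```

Keeping $d$ as an exact fraction and multiplying by $\tau$ only at the end makes it easy to see which term dominates: here the parasitic delay (11) outweighs the effort delay (4).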

In this chapter, we introduced the variable body biasing technique for leakage power reduction. In this technique, the Vth of the sleep transistors of the sleepy keeper technique is controlled through the body effect so that the subthreshold leakage is minimized in sleep mode. In sleep mode, the Vth of the sleep transistors is increased, which reduces the subthreshold leakage and hence the static power dissipation. During active mode, the body biasing transistors lower the Vth of the sleep transistors, so the propagation delay is kept within acceptable limits.

In the next chapter we apply the variable body biasing structure to generic logic circuits

and to SRAM, explaining in detail our methodology.


CHAPTER 6

EXPERIMENTAL RESULTS

We compare the “VARIABLE BODY BIASING” technique with a number of key, well-known low-leakage techniques. First, we present the experimental results for general logic circuits; then we present the experimental results for SRAM cell design.

6.1 Experimental results for general logic circuits

In this section, we present the experimental results for generic logic circuits. We use two logic designs: (1) a chain of four inverters (CO4) and (2) a 1 bit full adder (FA). The circuits are simulated with the BSIM4 Predictive Technology Model (PTM) [36] at five technology nodes, whose supply voltages are given in Table 4.

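Table 4 Chosen technologies and supply voltages (Vdd)

| Technology | 130 nm | 90 nm | 65 nm | 45 nm | 32 nm |
| --- | --- | --- | --- | --- | --- |
| Vdd | 1.3 V | 1.2 V | 1.1 V | 1.0 V | 0.9 V |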

6.1.1 Experimental results for CO4

We have considered the following three techniques for comparison with our proposed technique:

SLEEPY STACK

DUAL SLEEP

DUAL STACK

The static power consumption of these methods at the different technology nodes is shown in Table 5. The table shows that the variable body biasing technique has the lowest static power consumption of all the techniques at every node.

Table 5 Static power data for chain of 4 inverters (nW)

| Technology | SLEEPY STACK | DUAL SLEEP | DUAL STACK | VBB |
| --- | --- | --- | --- | --- |
| 130 nm | 10.3 | 26 | 21.6 | 1.44 |
| 90 nm | 9.16 | 21.3 | 16.9 | 1.13 |
| 65 nm | 8.82 | 15.5 | 11.6 | 0.82 |
| 45 nm | 4.41 | 9.23 | 6.22 | 0.494 |
| 32 nm | 2.44 | 6.59 | 3.79 | 0.364 |

Figure 18 Static Power Consumption (CO4)

Figure 18 shows the static power consumption of the CO4 for the different methods at the different technology nodes. The graph shows that the variable body biasing method has the lowest static power consumption of all the methods at every technology node.

The dynamic power consumption of these methods at the different technology nodes is shown in Table 6. The data show that the dynamic power of the variable body biasing method is almost equal to that of the dual stack method and less than that of the sleepy stack and dual sleep methods.


Table 6 Dynamic power data for chain of 4 inverters (µW)

| Technology | SLEEPY STACK | DUAL SLEEP | DUAL STACK | VBB |
| --- | --- | --- | --- | --- |
| 130 nm | 21.1 | 26.5 | 12 | 12.2 |
| 90 nm | 12.6 | 16.2 | 7.31 | 7.21 |
| 65 nm | 6.98 | 9.47 | 4.31 | 4.22 |
| 45 nm | 3.26 | 4.67 | 2.06 | 2.05 |
| 32 nm | 1.61 | 2.33 | 1.03 | 1.03 |

Figure 19 Dynamic power consumption (CO4)

Figure 19 shows the dynamic power consumption of the CO4 for the different methods at the different technology nodes. The dual sleep method has the highest dynamic power, while the dual stack and variable body biasing methods have almost equal power consumption.

The propagation delay of the aforementioned methods at the different technology nodes is shown in Table 7.


Table 7 Propagation delay data for chain of 4 inverters (ps)

| Technology | SLEEPY STACK | DUAL SLEEP | DUAL STACK | Variable body biasing |
| --- | --- | --- | --- | --- |
| 130 nm | 85.9 | 53.5 | 84.3 | 57.5 |
| 90 nm | 68.6 | 39.1 | 38.2 | 34.9 |
| 65 nm | 57 | 35.8 | 61.5 | 32.8 |
| 45 nm | 50.3 | 30.9 | 38.9 | 44.9 |
| 32 nm | 46.8 | 35.9 | 74.2 | 78.6 |

Figure 20 Propagation delay (CO4)

Figure 20 shows the graphical representation of the propagation delay in a CO4 for different

technologies in different methods.

The power delay product (PDP) of the aforementioned methods at the different technology nodes is shown in Table 8.


Table 8 Power delay product data for chain of 4 inverters (fJ)

| Technology | SLEEPY STACK | DUAL SLEEP | DUAL STACK | Variable body biasing |
| --- | --- | --- | --- | --- |
| 130 nm | 1.8133748 | 1.419141 | 1.013421 | 0.701583 |
| 90 nm | 0.8649884 | 0.634253 | 0.279888 | 0.251668 |
| 65 nm | 0.3983627 | 0.339581 | 0.265778 | 0.138443 |
| 45 nm | 0.1641998 | 0.144588 | 0.080376 | 0.092067 |
| 32 nm | 0.0754622 | 0.083884 | 0.076707 | 0.080987 |

Figure 21 Power delay product (CO4)

Figure 21 shows the graphical representation of the power delay product in a CO4 for

different technologies in different methods.
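The tabulated power delay products are consistent with the product of the dynamic power in Table 6 and the propagation delay in Table 7. The Python sketch below checks this for the 90 nm row; since 1 µW × 1 ps = 10⁻³ fJ, the product is multiplied by 10⁻³. The remaining differences, all below 0.5%, presumably come from rounding of the tabulated power and delay values.

```python
# 90 nm rows copied from Tables 6, 7 and 8 (chain of four inverters)
dynamic_power_uw = {"sleepy stack": 12.6, "dual sleep": 16.2, "dual stack": 7.31, "VBB": 7.21}
delay_ps = {"sleepy stack": 68.6, "dual sleep": 39.1, "dual stack": 38.2, "VBB": 34.9}
pdp_table_fj = {"sleepy stack": 0.8649884, "dual sleep": 0.634253,
                "dual stack": 0.279888, "VBB": 0.251668}

UW_PS_TO_FJ = 1e-3  # 1e-6 W * 1e-12 s = 1e-18 J = 1e-3 fJ

rel_errors = {}
for method, p_uw in dynamic_power_uw.items():
    pdp_fj = p_uw * delay_ps[method] * UW_PS_TO_FJ   # PDP = P_dyn * t_pd
    rel_errors[method] = abs(pdp_fj - pdp_table_fj[method]) / pdp_table_fj[method]
    print(f"{method:>12}: computed {pdp_fj:.4f} fJ, Table 8 {pdp_table_fj[method]:.4f} fJ")
```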

The area of the aforementioned methods at the different technology nodes is shown in Table 9. The area of the variable body biasing method is smaller than that of the dual stack and sleepy stack methods but greater than that of the dual sleep method.


Table 9 Area data for chain of 4 inverters (µm²)

| Technology | SLEEPY STACK | DUAL SLEEP | DUAL STACK | Variable body biasing |
| --- | --- | --- | --- | --- |
| 130 nm | 57.87067 | 34.0692 | 40.851525 | 39.17843 |
| 90 nm | 27.73683 | 16.3053 | 19.579725 | 18.77783 |
| 65 nm | 14.4676675 | 8.504925 | 10.21288125 | 9.794606 |
| 45 nm | 6.9342075 | 4.076325 | 4.89493125 | 4.694456 |
| 32 nm | 3.5064832 | 2.061312 | 2.475264 | 2.373888 |

Figure 22 Area comparison (CO4)

Figure 22 shows the area of the CO4 for the different methods at the different technology nodes.
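From the above data, the proposed method reduces the static power consumption of the CO4 by a factor of about 6.7 to 10.8 relative to the best previous method (sleepy stack) at every node. Its delay and area stay close to those of the previous methods at most nodes: the area lies between that of the dual sleep method and those of the dual stack and sleepy stack methods, and the delay is the lowest of all methods at 90 nm and 65 nm. At 45 nm and 32 nm, however, the variable body biasing method is slower than the dual sleep method (44.9 ps versus 30.9 ps, and 78.6 ps versus 35.9 ps).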
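The reduction factors quoted above follow directly from Table 5: at each node, the static power of the best competing technique is divided by that of VBB. A short Python sketch of that calculation, using only the tabulated values:

```python
# Static power of the chain of four inverters in nW (Table 5)
static_nw = {
    "130 nm": {"sleepy stack": 10.3, "dual sleep": 26, "dual stack": 21.6, "VBB": 1.44},
    "90 nm": {"sleepy stack": 9.16, "dual sleep": 21.3, "dual stack": 16.9, "VBB": 1.13},
    "65 nm": {"sleepy stack": 8.82, "dual sleep": 15.5, "dual stack": 11.6, "VBB": 0.82},
    "45 nm": {"sleepy stack": 4.41, "dual sleep": 9.23, "dual stack": 6.22, "VBB": 0.494},
    "32 nm": {"sleepy stack": 2.44, "dual sleep": 6.59, "dual stack": 3.79, "VBB": 0.364},
}

factors = {}
for node, row in static_nw.items():
    # Lowest static power among the competing (non-VBB) techniques at this node
    best_prior = min(v for k, v in row.items() if k != "VBB")
    factors[node] = best_prior / row["VBB"]
    print(f"{node}: best prior {best_prior} nW, VBB {row['VBB']} nW, factor {factors[node]:.1f}x")
```

Comparing against the best competitor, rather than an average, gives the most conservative reading of the improvement.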


6.1.2 Experimental results for FA

We have considered the following four techniques for comparison with our proposed

technique:

SLEEP TRANSISTOR

SLEEPY KEEPER

DUAL SLEEP

DUAL STACK

The data of static power consumption of these methods for different technologies are shown

in Table 10.

Table 10 Static power data for 1 bit full adder (nW)

| Technology | SLEEP TRANSISTOR | SLEEPY KEEPER | DUAL SLEEP | DUAL STACK | VBB |
| --- | --- | --- | --- | --- | --- |
| 130 nm | 0.327 | 0.365 | 4.11 | 3.6544 | 0.18814 |
| 90 nm | 0.278 | 0.305 | 3.3417 | 2.8934 | 0.15712 |
| 65 nm | 0.251 | 0.273 | 2.7807 | 2.3359 | 0.13521 |
| 45 nm | 0.176 | 0.189 | 1.6361 | 1.2675 | 0.089568 |
| 32 nm | 0.164 | 0.173 | 1.2712 | 0.86129 | 0.0759 |

Figure 23 Static power consumption for FA


Figure 23 shows the graphical representation of static power consumption in an FA for

different technologies in different methods. The data of dynamic power consumption of these

methods for different technologies are shown in Table 11.

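Table 11 Dynamic power data for 1 bit full adder (µW)

| Technology | SLEEP TRANSISTOR | SLEEPY KEEPER | DUAL SLEEP | DUAL STACK | VBB |
| --- | --- | --- | --- | --- | --- |
| 130 nm | 26.2 | 26.4 | 26.2 | 17.3 | 16 |
| 90 nm | 15 | 15 | 15 | 9.44 | 8.54 |
| 65 nm | 8.28 | 8.35 | 8.28 | 5.14 | 4.63 |
| 45 nm | 3.57 | 3.59 | 3.57 | 2.34 | 2.13 |
| 32 nm | 1.6 | 1.61 | 1.6 | 1.1 | 1.01 |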

Figure 24 Dynamic power consumption for FA

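Figure 24 shows the dynamic power consumption of the FA for the different methods at the different technology nodes. The propagation delay of the aforementioned methods at the different technology nodes is shown in Table 12.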


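Table 12 Data of propagation delay for 1 bit full adder (ns)

| Technology | SLEEP TRANSISTOR | SLEEPY KEEPER | DUAL SLEEP | DUAL STACK | VBB |
| --- | --- | --- | --- | --- | --- |
| 130 nm | 20 | 20 | 20 | 19.9 | 19.9 |
| 90 nm | 20 | 20 | 20 | 20 | 18.7 |
| 65 nm | 20 | 20 | 20 | 18.8 | 18.8 |
| 45 nm | 20 | 20 | 20 | 18.8 | 18.8 |
| 32 nm | 20 | 20 | 20 | 18.8 | 18.8 |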

Figure 25 Propagation delay comparison in FA

Figure 25 shows the propagation delay of the FA for the different methods at the different technology nodes.

The power delay product of the aforementioned methods at the different technology nodes is shown in Table 13.


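Table 13 Power delay product data for 1 bit full adder (fJ)

| Technology | SLEEP TRANSISTOR | SLEEPY KEEPER | DUAL SLEEP | DUAL STACK | VBB |
| --- | --- | --- | --- | --- | --- |
| 130 nm | 524 | 528 | 524 | 344 | 317 |
| 90 nm | 299 | 301 | 300 | 188 | 160 |
| 65 nm | 166 | 167 | 166 | 96.7 | 87 |
| 45 nm | 71.4 | 71.8 | 71.4 | 44 | 39.9 |
| 32 nm | 32 | 32.2 | 32 | 20.6 | 19 |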

Figure 26 Power Delay Product for FA

Figure 26 shows the power delay product of the FA for the different methods at the different technology nodes.

The area of the aforementioned methods at the different technology nodes is shown in Table 14.


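Table 14 Area data for 1 bit full adder (µm²)

| Technology | SLEEP TRANSISTOR | SLEEPY KEEPER | DUAL SLEEP | DUAL STACK | VBB |
| --- | --- | --- | --- | --- | --- |
| 130 nm | 317.1 | 312.36 | 317.98 | 359.15 | 330.28 |
| 90 nm | 151.98 | 149.71 | 152.4 | 172.14 | 163.9 |
| 65 nm | 79.28 | 78.09 | 79.5 | 89.79 | 85.07 |
| 45 nm | 37.996 | 37.43 | 38.1 | 43.03 | 40.98 |
| 32 nm | 19.21 | 18.93 | 19.3 | 21.76 | 20.19 |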

Figure 27 Area Comparison for FA

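Figure 27 shows the area of the FA for the different methods at the different technology nodes.

As for the chain of four inverters, the variable body biasing method gives the lowest static power of all the methods at every node for the 1 bit full adder. For the FA it also has the lowest dynamic power and power delay product, and a propagation delay equal to or lower than that of the other methods. Its area is larger than that of the sleep transistor, sleepy keeper and dual sleep methods but smaller than that of the dual stack method.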


6.2 Experimental results for SRAM

In this section, we present the experimental results for the SRAM cell variations. As with the generic circuit comparisons in Section 6.1, we consider the following four techniques for comparison with our proposed technique:

SLEEP TRANSISTOR

SLEEPY KEEPER

DUAL SLEEP

DUAL STACK

The data of static power of aforementioned methods for different technologies are shown in

Table 15.

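Table 15 Static power data for SRAM (nW)

| Technology | SLEEP TRANSISTOR | SLEEPY KEEPER | DUAL SLEEP | DUAL STACK | VBB |
| --- | --- | --- | --- | --- | --- |
| 130 nm | 5.31 | 5.46 | 21.6 | 17.8 | 3.43 |
| 90 nm | 4.25 | 4.39 | 16.5 | 14.5 | 2.88 |
| 65 nm | 3.27 | 3.41 | 12.3 | 10.5 | 2.4 |
| 45 nm | 1.87 | 1.95 | 7.31 | 5.9 | 1.36 |
| 32 nm | 1.37 | 1.41 | 5.35 | 3.97 | 0.948 |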

Figure 28 Static power consumption for SRAM cell


Figure 28 shows the static power consumption of the SRAM cell for the different methods at the different technology nodes.

The dynamic power of the aforementioned methods at the different technology nodes is shown in Table 16.

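Table 16 Dynamic power data for SRAM (µW)

| Technology | SLEEP TRANSISTOR | SLEEPY KEEPER | DUAL SLEEP | DUAL STACK | VBB |
| --- | --- | --- | --- | --- | --- |
| 130 nm | 32.5 | 34.6 | 32.4 | 20.8 | 20.9 |
| 90 nm | 22 | 21.2 | 21.7 | 13.2 | 13.2 |
| 65 nm | 14.6 | 14.2 | 14.7 | 8.35 | 7.66 |
| 45 nm | 7.62 | 7.85 | 7.62 | 4.2 | 4.19 |
| 32 nm | 4.27 | 4.05 | 4.27 | 2.23 | 2.27 |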

Figure 29 Dynamic power consumption for SRAM

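Figure 29 shows the dynamic power consumption of the SRAM cell for the different methods at the different technology nodes. The propagation delay of the aforementioned methods at the different technology nodes is shown in Table 17.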


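Table 17 Data of propagation delay for SRAM (ns)

| Technology | SLEEP TRANSISTOR | SLEEPY KEEPER | DUAL SLEEP | DUAL STACK | VBB |
| --- | --- | --- | --- | --- | --- |
| 130 nm | 5.97 | 6.05 | 6.03 | 6.0207 | 6.0237 |
| 90 nm | 6.03 | 5.97 | 6 | 6.0194 | 6.0221 |
| 65 nm | 5.99 | 5.97 | 5.95 | 6.0208 | 6.0158 |
| 45 nm | 6 | 6.01 | 6 | 6.0228 | 6.0232 |
| 32 nm | 6.08 | 6.08 | 6.08 | 6.04 | 6.06 |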

Figure 30 Propagation delay comparison for SRAM

Figure 30 shows the graphical representation of the propagation delay comparison in SRAM

for different technologies in different methods.

The data of power delay product of aforementioned methods for different technologies are

shown in Table 18.


Table 18 Power delay product data for SRAM (fJ)

| Technology | SLEEP TRANSISTOR | SLEEPY KEEPER | DUAL SLEEP | DUAL STACK | VBB |
| --- | --- | --- | --- | --- | --- |
| 130 nm | 0.194057 | 0.209363 | 0.195502 | 0.125338 | 0.125916 |
| 90 nm | 0.132686 | 0.12659 | 0.130299 | 0.079543 | 0.079509 |
| 65 nm | 0.087474 | 0.084794 | 0.087538 | 0.050337 | 0.046095 |
| 45 nm | 0.045731 | 0.04719 | 0.045764 | 0.025331 | 0.025245 |
| 32 nm | 0.02597 | 0.024633 | 0.025994 | 0.013493 | 0.013762 |

Figure 31 Power Delay Product of SRAM

Figure 31 shows the power delay product of the SRAM cell for the different methods at the different technology nodes.

The area of the aforementioned methods at the different technology nodes is shown in Table 19.

Table 19 Area data for SRAM (µm²)

| Technology | SLEEP TRANSISTOR | SLEEPY KEEPER | DUAL SLEEP | DUAL STACK | Variable body biasing |
| --- | --- | --- | --- | --- | --- |
| 130 nm | 51.1284 | 56.3022 | 17 | 33 | 34.85625 |
| 90 nm | 42.56 | 49 | 9 | 16 | 16.70625 |
| 65 nm | 32.968 | 36 | 4 | 10 | 8.714063 |
| 45 nm | 25 | 30 | 3 | 2.7 | 4.176563 |
| 32 nm | 14 | 20 | 2 | 2.2 | 2.112 |


Figure 32 Area comparison for SRAM

Figure 32 shows the graphical representation of the area in SRAM for different technologies in

different methods.

6.3 Comparison with previous methods

When we compare our result to another result, we often say one is “less than” the other. In particular, “X is n% less than Y” means what Eq. 6.1 shows [39]:

$$n = \frac{\text{previous method data} - \text{new method data}}{\text{previous method data}} \times 100\% \tag{6.1}$$

For example, when two propagation delay measurements are X = 8.18E-10 s (new method) and Y = 1.23E-09 s (previous method), Eq. 6.1 gives n = (1.23E-09 − 8.18E-10)/1.23E-09 × 100% ≈ 33.5. In this case, we say X has 33.5% less delay than Y. The same equation is used for all other comparisons, such as area and power consumption.
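Eq. 6.1 is simple enough to state as a function. The Python sketch below (the function name `percent_less` is ours) reproduces the worked delay example above and two of the 90 nm dual sleep entries of the chain-of-four-inverters comparison, using the static power values of Table 5 and the area values of Table 9.

```python
def percent_less(previous: float, new: float) -> float:
    """Eq. 6.1: n = (previous - new) / previous * 100.

    A positive n means the new method has the lower (better) value; a negative n
    means it is worse, which the comparison tables mark with "-".
    """
    if previous == 0:
        raise ValueError("previous method data must be nonzero")
    return (previous - new) / previous * 100.0


# Worked example from the text: Y (previous) = 1.23e-9 s, X (new) = 8.18e-10 s
print(f"delay example: {percent_less(1.23e-9, 8.18e-10):.1f}%")
# 90 nm chain of four inverters, dual sleep (previous) vs VBB (new)
print(f"static power: {percent_less(21.3, 1.13):.1f}%")    # Table 5 values, nW
print(f"area: {percent_less(16.3053, 18.77783):.2f}%")     # Table 9 values, um^2
```

Dividing by the previous method's value keeps n bounded above by 100%, so a large power reduction reads as, for example, 94.7% rather than as a multiple.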

The comparisons of the VBB approach at 90 nm with the existing methods for a chain of four inverters, a 1 bit full adder and an SRAM cell are summarized in Tables 20, 21 and 22, respectively. Here “+” denotes improved and “-” denotes degraded performance.


Table 20 Comparison of VBB approach for a chain of four inverters (90 nm process)

| Methods | Delay | Static power | Dynamic power | Area |
| --- | --- | --- | --- | --- |
| Dual sleep | +8.37% | +94.7% | +55.4% | -15.16% |
| Dual stack | +46.67% | +92.93% | +2.09% | +4.09% |

Here the VBB approach shows 8.37%, 94.7% and 55.4% improvement with respect to the dual sleep technique in delay, static power and dynamic power, respectively, while incurring a 15.16% penalty in area. With respect to the dual stack technique, it shows 46.67%, 92.93%, 2.09% and 4.09% improvement in delay, static power, dynamic power and area, respectively.

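Table 21 Comparison of VBB approach for a 1 bit full adder (90 nm process)

| Methods | Delay | Static power | Dynamic power | Area |
| --- | --- | --- | --- | --- |
| Dual sleep | +6.5% | +95.3% | +40.97% | -7.55% |
| Dual stack | +6.5% | +94.58% | +9.53% | +4.7887% |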

Here the VBB approach shows 6.5%, 95.3% and 40.97% improvement with respect to the dual sleep technique in delay, static power and dynamic power, respectively, while incurring a 7.55% penalty in area. With respect to the dual stack technique, it shows 6.5%, 94.58%, 9.53% and 4.7887% improvement in delay, static power, dynamic power and area, respectively.

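Table 22 Comparison of VBB approach for an SRAM cell (90 nm process)

| Methods | Delay | Static power | Dynamic power | Area |
| --- | --- | --- | --- | --- |
| Dual sleep | -1.1% | +80.49% | +47.89% | -74.28% |
| Dual stack | +0.08% | +77.14% | +8.26% | +12.86% |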

Here the VBB approach shows 80.49% and 47.89% improvement with respect to the dual sleep technique in static power and dynamic power, respectively, while incurring 1.1% and 74.28% penalties in delay and area, respectively. With respect to the dual stack technique, it shows 0.08%, 77.14%, 8.26% and 12.86% improvement in delay, static power, dynamic power and area, respectively.

CHAPTER 7

CONCLUSION

This chapter provides a summary of our contribution, the rationale for this work and some suggestions for future work.

7.1 CONCLUSION

With a rigid energy budget in energy-constrained systems, subthreshold circuit design has become a predominant technique in recent years. The battery life of remote or portable devices may not meet system demands. In an extreme case, micro-sensor networks may need to consume so little energy that it can be supplied by converting ambient energy into electrical energy, known as energy harvesting or energy scavenging. These challenges can be addressed by designing systems for a very low supply voltage below Vth, but subthreshold circuits still suffer a performance penalty. Without a performance requirement, minimum-energy operation can be the primary goal. On the other hand, some energy-efficient systems have a wide range of speed requirements, so they may operate at a non-minimum energy point. We utilize the body biasing effect to further lower the energy budget of energy-constrained systems, with or without speed requirements. In all the circuits and technology nodes we simulated, the variable body biasing design had lower static power than the prior techniques while keeping delay comparable to theirs in most cases.

In this dissertation, we proposed a new static power reduction technique named “Variable Body Biasing”. With this technique, we were able to reduce the static power consumption of low power CMOS circuits with little penalty in delay or area in most of the cases studied. This design technique gives low power CMOS circuit designers a new tool for leakage reduction.

7.2 SUGGESTIONS FOR FUTURE WORK

We have implemented our design in a chain of four inverters, a 1 bit full adder and an SRAM cell. More tests could be done on the ISCAS benchmark circuits for further verification.

In our design, we tried to keep the delay and area equal to those of the previous techniques. Further research could explore design techniques that reduce delay and area as well as static power, improving overall circuit performance.

The smallest technology node used in our research is 32 nm. Smaller processes could be used to explore static power consumption at deeper nanometer nodes.

APPENDIX

A. AREA ESTIMATION

Layouts of all the considered approaches are designed for a 130 nm process using a standard layout design application. Areas for technologies below 130 nm are estimated by scaling the area of each approach's 130 nm layout. The areas are scaled by the ratio of the squares of the feature sizes, with an additional 10% overhead for layers that do not scale linearly (i.e., metal layers). For example, if an area of 100.00 µm² is measured for the 130 nm technology, the area for a 120 nm technology would be

$$100.00\ \mu\text{m}^2 \times \frac{120^2}{130^2} \times 1.1 = 93.73\ \mu\text{m}^2$$
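The scaling rule of this appendix can be written as a small Python function (the name and signature are ours) that reproduces the 120 nm example above.

```python
def scale_area(area_130_um2: float, node_nm: float, overhead: float = 1.1) -> float:
    """Scale a 130 nm layout area to a smaller node as described in Appendix A.

    Area scales with the square of the feature size; `overhead` adds the 10%
    allowance for layers (e.g., metal) that do not scale linearly.
    """
    return area_130_um2 * (node_nm / 130.0) ** 2 * overhead


print(f"{scale_area(100.00, 120):.2f} um^2")
```

Passing `overhead=1.0` gives the pure square-law scaling, which is convenient for checking how much of an estimated area comes from the 10% allowance.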

B. CIRCUIT DIAGRAMS

Figure 33 SLEEP TRANSISTOR


Figure 34 “FORCED STACK” METHOD

Figure 35 “SLEEPY KEEPER” METHOD


Figure 36 “DUAL SLEEP” METHOD

Figure 37 “DUAL STACK” METHOD


Figure 38 “VARIABLE BODY BIASING TECHNIQUE”


Figure 39 “SLEEPY KEEPER” (FA)

Figure 40 “DUAL SLEEP” (FA)


Figure 41 “DUAL STACK” (FA)

Figure 42 “VBB” (FA)


BIBLIOGRAPHY

[1] C. H. I. Kim, H. Soeleman, and K. Roy, “Ultra-Low-Power DLMS Adaptive Filter for Hearing Aid Applications,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 11, no. 6, pp. 1058–1067, 2003.

[2] M. Seok, S. Hanson, Y. S. Lin, Z. Foo, D. Kim, Y. Lee, N. Liu, D. Sylvester, and D. Blaauw, “The Phoenix Processor: a 30pW Platform for Sensor Applications,” in Proceedings of IEEE Symposium on VLSI Circuits, 2008, pp. 188–189.

[3] R. Vaddi, S. Dasgupta, and R. P. Agarwal, “Device and Circuit Design Challenges in the Digital Subthreshold Region for Ultra low-Power Applications,” VLSI Design, vol. 2009, pp. 1–14, Jan. 2009.

[4] A. Wang, B. H. Calhoun, and A. P. Chandrakasan, Subthreshold Design for Ultra Low-Power Systems. Springer, 2006.

[5] A. Wang and A. Chandrakasan, “A 180mV FFT Processor Using Subthreshold Circuit Techniques,” in IEEE International Solid-State Circuits Conference Digest of Technical Papers, 2004, pp. 292–529.

[6] B. Zhai, S. Pant, L. Nazhandali, S. Hanson, J. Olson, A. Reeves, M. Minuth, R. Helfand, T. Austin, D. Sylvester, and D. Blaauw, “Energy-Efficient Subthreshold Processor Design,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 17, no. 8, pp. 1127–1137, Aug. 2009.

[7] M. Kulkarni, “A Reduced Constraint Set Linear Program for Low-Power Design of Digital Circuits,” Master's thesis, Auburn University, Dept. of ECE, Auburn, Alabama, Dec. 2010.

[8] M. Kulkarni and V. D. Agrawal, “A Tutorial on Battery Simulation-Matching Power Source to Electronic System,” in Proceedings of 14th IEEE VLSI Design and Test Symposium, July 2010.

[9] M. Kulkarni and V. D. Agrawal, “Energy Source Lifetime Optimization for a Digital System through Power Management,” in Proceedings of 43rd IEEE Southeastern Symposium on System Theory, Mar. 2011, pp. 75–80.

[10] B. Zhai, D. Blaauw, D. Sylvester, and K. Flautner, “Theoretical and Practical Limits of Dynamic Voltage Scaling,” in Proceedings of 41st Design Automation Conference, 2004, pp. 868–873.

[11] B. H. Calhoun and A. P. Chandrakasan, “Ultra-Dynamic Voltage Scaling (UDVS) Using Subthreshold Operation and Local Voltage Dithering,” IEEE Journal of Solid-State Circuits, vol. 41, no. 1, pp. 238–245, 2006.

[12] International Technology Roadmap for Semiconductors, Semiconductor Industry Association, 2002. [Online]. Available: http://public.itrs.net

[13] N. Kim, T. Austin, D. Blaauw, T. Mudge, K. Flautner, J. Hu, M. Irwin, M. Kandemir, and V. Narayanan, “Leakage Current: Moore's Law Meets Static Power,” IEEE Computer, vol. 36, pp. 68–75, December 2003.

[14] J. C. Park, V. J. Mooney III, and P. Pfeiffenberger, “Sleepy Stack Reduction of Leakage Power,” in Proceedings of the International Workshop on Power and Timing Modeling, Optimization and Simulation, pp. 148–158, September 2004.

[15] J. Park, “Sleepy Stack: a New Approach to Low Power VLSI and Memory,” Ph.D. Dissertation, School of Electrical and Computer Engineering, Georgia Institute of Technology, 2005. [Online]. Available: http://etd.gatech.edu/theses

[16] S. Kim and V. Mooney, “The Sleepy Keeper Approach: Methodology, Layout and Power Results for a 4 bit Adder,” Technical Report GIT-CERCS-06-03, Georgia Institute of Technology, March 2006. [Online]. Available: http://www.cercs.gatech.edu/tech-reports/tr2006/git-cercs-06-03.pdf

[17] N. Karmakar, M. Z. Sadi, M. K. Alam, and M. S. Islam, “A Novel Dual Sleep Approach to Low Leakage and Area Efficient VLSI Design,” in Proceedings of 2009 IEEE Regional Symposium on Micro and Nano Electronics (RSM2009), Kota Bharu, Malaysia, August 10–12, 2009, pp. 409–414.

[18] M. S. Islam, M. Sultana Nasrin, N. Mansur, and N. Tasneem, “Dual Stack Method: A Novel Approach to Low Leakage and Speed Power Product VLSI Design,” in Proceedings of International Conference on Electrical and Computer Engineering (ICECE) 2010, Dhaka, Bangladesh, 18–20 December 2010, pp. 89–92.

[19] V. Kursun and E. G. Friedman, Multi-Voltage CMOS Circuit Design. Wiley, 2006.

[20] Y. Ramadass and A. Chandrakasan, “Voltage Scalable Switched Capacitor DC-DC Converter for Ultra-Low-Power On-Chip Applications,” in Proceedings of Power Electronics Specialists Conference, 2007, pp. 2353–2359.

[21] Berkeley Predictive Technology Model (BPTM). [Online]. Available: http://www.device.eecs.berkeley.edu/~ptm/

[22] M. C. Johnson, D. Somasekhar, and K. Roy, “Models and Algorithms for Bounds on Leakage in CMOS Circuits,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 18, no. 6, pp. 714–725, June 1999.

[23] S. Narendra, V. De, S. Borkar, D. A. Antoniadis, and A. P. Chandrakasan, “Full-Chip Subthreshold Leakage Power Prediction and Reduction Techniques for Sub-0.18µm CMOS,” IEEE Journal of Solid-State Circuits, vol. 39, no. 2, pp. 501–510, February 2004.

[24] B. Sheu, D. Scharfetter, P.-K. Ko, and M.-C. Jeng, “BSIM: Berkeley Short-Channel IGFET Model for MOS Transistors,” IEEE Journal of Solid-State Circuits, vol. 22, pp. 558–566, August 1987.

[25] J. P. Uyemura, CMOS Logic Circuit Design, 2nd ed. Norwell, Massachusetts, USA: Kluwer Academic Publishers, 1999.

[26] C. Kim and K. Roy, “Dynamic Vt SRAM: a Leakage Tolerant Cache Memory for Low Voltage Microprocessors,” in Proceedings of the International Symposium on Low Power Electronics and Design, pp. 251–254, August 2002.

[27] K. Nose and T. Sakurai, “Analysis and Future Trend of Short Circuit Power,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 19, no. 9, pp. 1023–1030, September 2000.

[28] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, “Low-Power CMOS Digital Design,” IEEE Journal of Solid-State Circuits, vol. 27, no. 4, pp. 473–484, April 1992.

[29] M. M. Khellah and M. I. Elmasry, “Power Minimization of High-Performance Submicron CMOS Circuits Using a Dual-Vdd Dual-Vth (DVDV) Approach,” in Proceedings of the International Symposium on Low Power Electronics and Design, pp. 106–108, 1999.

[30] T. Sakurai and A. R. Newton, “Alpha-Power Law MOSFET Model and Its Application to CMOS Inverter Delay and Other Formulas,” IEEE Journal of Solid-State Circuits, vol. 25, no. 2, pp. 584–593, April 1990.

[31] K. A. Bowman, B. L. Austin, J. C. Eble, X. Tang, and J. D. Meindl, “A Physical Alpha-Power Law MOSFET Model,” IEEE Journal of Solid-State Circuits, vol. 34, no. 10, pp. 1410–1414, October 1999.

[32] G. De Micheli, Synthesis and Optimization of Digital Circuits. USA: McGraw-Hill, 1994.

[33] N. H. E. Weste, D. Harris, and A. Banerjee, CMOS VLSI Design: A Circuits and Systems Perspective, 3rd ed. Pearson, 2006, pp. 116–117.

[34] S. Mutoh, T. Douseki, Y. Matsuya, T. Aoki, S. Shigematsu, and J. Yamada, “1-V Power Supply High-Speed Digital Circuit Technology with Multithreshold-Voltage CMOS,” IEEE Journal of Solid-State Circuits, vol. 30, no. 8, pp. 847–854, August 1995.

[35] Z. Chen, M. Johnson, L. Wei, and K. Roy, “Estimation of Standby Leakage Power in CMOS Circuits Considering Accurate Modeling of Transistor Stacks,” in Proceedings of the International Symposium on Low Power Electronics and Design, pp. 239–244, August 1998.

[36] Berkeley Predictive Technology Model (BPTM). [Online]. Available: http://www.device.eecs.berkeley.edu/~ptm/

[37] S. Narendra, S. Borkar, V. De, D. Antoniadis, and A. Chandrakasan, “Scaling of Stack Effect and its Application for Leakage Reduction,” in Proceedings of the International Symposium on Low Power Electronics and Design, pp. 195–200, August 2001.

[38] M. Johnson, D. Somasekhar, L.-Y. Chiou, and K. Roy, “Leakage Control with Efficient Use of Transistor Stacks in Single Threshold CMOS,” IEEE Transactions on VLSI Systems, vol. 10, no. 1, pp. 1–5, February 2002.

[39] D. Patterson and J. Hennessy, Computer Architecture: A Quantitative Approach. Palo Alto, California: Morgan Kaufmann Publishers, 1990, pp. 5–7.
