
2600 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 42, NO. 11, NOVEMBER 2007

Portless SRAM—A High-Performance Alternative to the 6T Methodology

Michael Wieckowski, Student Member, IEEE, Sandeep Patil, Student Member, IEEE, and Martin Margala, Senior Member, IEEE

Abstract—A novel memory cell, termed “portless” SRAM, is presented as a direct alternative to the standard 6T design. The new cell consists of only five transistors and does not make use of any pass-transistor ports. A complete theoretical and functional analysis is presented along with a design methodology for implementing the new memory cell. In addition, simulations are presented on the cell level and on the cache level exhibiting comparative improvements on the order of 19 and 6 in dynamic power and leakage power, respectively. This is augmented by a 20% improvement in static noise margin for a comparable cell area. A test chip was fabricated, and measured results are presented demonstrating functionality of the new cell.

Index Terms—CMOS memory, low power, SRAM, 6T, 5T.

I. INTRODUCTION

FOR OVER 45 years, the architecture of MOS static memory (SRAM) cells has remained relatively unchanged, based primarily on a technique referred to in this paper as the 6T methodology [1]. It is characterized by the interaction of two cross-coupled inverters with a set of access pass-transistors, or ports, as shown in Fig. 1. This circuit suffers from a well-known problem inherent to its structure: the more stable a cell is during a read operation, the more difficult it is to change its contents during a write operation. This is a result of the voltage divider formed by the port pass-transistor and the inverter pull-down, a characteristic specific to the 6T design methodology. While a great variety of novel static memory cells have been proposed in the past [2]–[4], their collective goals have been limited to the gradual improvement of 6T performance metrics such as dynamic power, area, leakage, and stability. None of these designs has strayed from the basic 6T methodology of manipulating storage nodes through pass-transistor ports. It is worth mentioning that some designs have proposed adding a seventh or eighth transistor to the 6T cell for added write functionality [5]–[7]. The tradeoff of area for performance in these designs is generally unacceptable for most applications, and as such, they have not generated significant interest.

Manuscript received January 30, 2007; revised June 22, 2007.

M. Wieckowski was with the Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY 14627 USA. He is now with the University of Michigan, Ann Arbor, MI 48109-2122 USA (e-mail: wieckows@umich.edu).

S. Patil was with the Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY 14627 USA (e-mail: spatil.sndp@gmail.com).

M. Margala is with the Department of Electrical and Computer Engineering, University of Massachusetts–Lowell, Lowell, MA 01854 USA.

Digital Object Identifier 10.1109/JSSC.2007.907173

Fig. 1. Single-port memory cell based on the standard 6T methodology.

In this work, a new “portless” memory cell is presented that implements SRAM functionality in an entirely novel way [8]. This new cell not only offers certain performance improvements as a direct alternative to current 6T designs, but also gives designers a new starting point for developing novel power reduction techniques based on sleep and drowsy modes.

The remainder of this paper is organized as follows. Section II presents the basic operation of the new design, and Section III describes how the cell functions through a theoretical analysis of its stability. Section IV details the design methodology for the proposed memory cell, and Section V presents some of the scaling considerations, such as physical layout and variation analysis. Sections VI and VII present the simulated and experimental results, respectively, along with various comparisons to standard designs. Section VIII concludes this work by summarizing its contributions.

II. PRINCIPLE OF OPERATION

Data retention in the portless SRAM is based on the classic cross-coupled inverter structure used in the standard 6T methodology. Complementary data is stored on the two inverter output nodes, as in the conceptual example shown in Fig. 2(a). The positive feedback connection ensures static stability during standby. Portless SRAM builds upon this structure with the insertion of transistor M5, as shown in Fig. 2(b). When this transistor is off, the portless cell is stable and data is retained as long as power is supplied. When this transistor is on, current flows from the high data node to the low data node.

At first glance, it would seem that this additional transistor would have the effect of shorting the two data node voltages, essentially erasing the stored data. Indeed, such a scheme was employed by Khellah to erase and write new data into a seven-transistor (7T) cell [7]. Reading in the 7T cell was performed using standard wordline port transistors. It can be shown, however, that such equalization of the data nodes is not the only stable state when the seventh transistor is on. In fact, there exists a variety of stable states in which the seventh transistor is on and the cell data is still retained. This behavior is fundamental to the operation of portless SRAM and allows both reading and writing of cell data without the need for wordline port transistors.

0018-9200/$25.00 © 2007 IEEE

WIECKOWSKI et al.: PORTLESS SRAM—A HIGH-PERFORMANCE ALTERNATIVE TO THE 6T METHODOLOGY 2601

Fig. 2. (a) Cross-coupled inverters and (b) portless memory cell structure.

Fig. 3. Current flow when a portless cell is accessed and M5 is conducting.

Considering the basic portless cell structure in Fig. 3, current will flow through transistor M5 when it is turned on, in a direction that depends on the data stored in the cell. This current is supplied by the PFET attached to the high data node and sunk by the NFET attached to the low data node. It can only flow, however, if a finite drain-source voltage is present across transistors M2, M5, and M4. Therefore, the portless cell will behave in one of two ways when a positive gate-source voltage is applied to M5:

1) A current will flow through M5 until the cell nodes are equalized. At that point, an equilibrium condition is reached where the drain-source voltage of M5 approaches zero and the current ceases to flow.

2) A current will flow through M5 continuously, creating an equilibrium condition where the two data nodes are separated by a constant voltage.

In portless SRAM, the first behavior is used for writing data into the cell, much in the same way as the previously mentioned 7T design. The second behavior, however, preserves the data in the cell, since the two data nodes remain distinct. This is the situation utilized in portless SRAM for reading data from the cell.

In order to employ such a methodology within an SRAM array, the cells themselves must be grouped into columns and need to interact with common bitlines during read and write operations. In standard SRAM designs, the access port transistors fulfill this role. In the new portless topology, this is accomplished by removing the cell PFET connections to the global supply and, instead, powering the cells directly from the bitlines.

Fig. 4. New portless SRAM cell and its column bitline connection.

A typical portless cell connection within a single SRAM column is shown in Fig. 4, where BLL and BLR are the left and right bitlines, respectively. With this column structure, read and write operations are performed by pulsing the access (AXS) transistor (M5 in Figs. 2 and 3). This is functionally equivalent to asserting the wordline (WL) ports in the 6T methodology. To write to the cell, AXS is driven such that the cell nodes equalize and new data can be injected. To read from the cell, AXS is driven such that the cell data persists and, in turn, causes one of the PFETs to draw additional current from its bitline supply. This generates a differential bitline current that can then be detected by a sense amplifier at the bottom of the column. The specific techniques for designing and controlling portless SRAM cells are presented in Section IV.
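The read/write protocol described above can be summarized at the bit level by the following Python sketch. It is a hypothetical behavioral model, not a circuit simulation: the class and method names are illustrative, and mapping a stored 1 to the BLL side of the cell is an assumed convention rather than something the paper specifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PortlessColumn:
    """Hypothetical bit-level model of one portless SRAM column.

    Assumed convention (not from the paper): a stored 1 means the data node
    whose PFET is powered from BLL is high, so that PFET sources the read current.
    """
    cells: List[int]

    def pulse_axs(self, row: int, write_enable: bool, data: int = 0) -> int:
        """Pulse AXS on one row and return the sign of I(BLL) - I(BLR).

        write_enable=True: the cell nodes equalize and the bitline differential
        forces the new value into the cell (returns 0, nothing is sensed).
        write_enable=False: the data persists and the PFET on the high node
        draws extra current from its own bitline.
        """
        if write_enable:
            self.cells[row] = data
            return 0
        return 1 if self.cells[row] == 1 else -1

    def write(self, row: int, data: int) -> None:
        self.pulse_axs(row, write_enable=True, data=data)

    def read(self, row: int) -> int:
        # A sense amplifier at the bottom of the column resolves the sign.
        return 1 if self.pulse_axs(row, write_enable=False) > 0 else 0


if __name__ == "__main__":
    col = PortlessColumn([0, 0, 0, 0])
    col.write(2, 1)
    print(col.read(2), col.cells)
```

The model deliberately hides all analog behavior; its only purpose is to show that a single AXS signal, qualified by a column write signal, plays the role of the 6T wordline.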

III. CELL MODEL AND STABILITY ANALYSIS

Analysis of cell stability is a fundamental necessity in the development of a successful SRAM design flow. It is widely accepted that, in this context, stability is best defined by a cell’s static noise margin (SNM) during standby, reading, and writing. The static noise margin of the new portless cell is best characterized in the small-signal domain by breaking the cell into a forward gain path and a feedback gain path, $A_F$ and $A_{FB}$, respectively, as shown in Fig. 5. Following the technique from [9], the loop gain of this system can be described as

$$A_L = A_F \, A_{FB} \tag{1}$$


Fig. 5. Equivalent small-signal gain model of the portless cell.

In this context, the SNM is defined as the difference between and when the voltage at results in a loop gain of one. This is mathematically equivalent to the smallest butterfly square method originally used by Seevinck [10], and can be written as

(2)

where is the DC input voltage resulting in unity loop gain [11]. To obtain some insight into how the static noise margin relates to the cell transistor sizes, the loop gain can be derived analytically using small-signal MOSFET models. During a read, the gate of transistor M5 is driven high, as shown in Fig. 5. Since the stored data must not be lost during this operation, it is assumed that the two data nodes remain separated by a large voltage. In the following analysis, therefore, the high data node is near the bitline supply voltage and the low data node is near ground. This forces M1, M3, and M5 into saturation while M2 and M4 operate in the triode region. The positive and negative forward gain components can be combined as shown in Fig. 6, where the elements represent the MOSFET equivalent resistances. One can then determine an expression for the total forward gain of the 5T cell during a read operation. The open-circuit gain of this model can be expressed as

(3)

Fig. 6. Equivalent forward gain model of the portless cell.

where is the ratio, by equating the currents from, to , and from , to , , . Substitutions can be made for the and terms to obtain the forward gain in terms of MOSFET parameters as in (4), shown at the bottom of the page, where the respective terms represent the gate overdrive voltage of each transistor, the process transconductance parameters, and the channel-length modulation parameters. The gate overdrives can be written in terms of the MOSFET threshold voltage and the cell node voltages. The feedback gain component can be modeled as a basic common-source amplifier, and its open-circuit gain written as

(5)

The loop gain for the entire cell is simply the product of the forward gain and the feedback gain. In the case of the portless cell, the feedback gain in (5) is always negative. The forward gain, however, is a strong function of the operating conditions of transistor M5. Considering the MOSFET equivalent resistances, it is clear by definition that the denominator of (4) is positive. Therefore, in order to ensure an overall closed-loop gain between zero and unity, the second term of the numerator must remain smaller than the first term. This directly implies that the SNM of the portless cell can be controlled through either the sizing of M5 or its gate overdrive voltage. This result is pivotal to the correct operation of portless SRAM and is analyzed in more detail in the following sections using SPICE simulations.

(4)


Fig. 7. SNM as a function of AXS length using the loop gain method.

A. Stability Simulations

The read SNM of a portless cell, measured in simulation using BSIM4 foundry models for a 90 nm CMOS process, is shown in Fig. 7 as a function of the M5 length, henceforth referred to as the AXS length. The simulation was performed by inserting DC sources into the cross-coupling paths of the cell and executing a stability analysis to determine the unity loop gain condition. As can be seen in Fig. 7, the SNM increases with AXS length and only achieves a measurable value beyond approximately 200 nm (twice the process minimum). At lengths smaller than 200 nm, the cell effectively fails to retain data during a read. In addition, as the AXS transistor gets longer and the SNM increases, the current through the AXS transistor (and hence through the cell itself) decreases. The optimal SNM for a particular application is therefore a tradeoff between cell area, read current, and stability.
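The DC-source procedure above can be illustrated with a small Python sketch. It is not the authors' simulation setup: the inverter transfer curves are an idealized, symmetric tanh model (an assumption standing in for SPICE-derived curves with M5 conducting), and all function names are hypothetical. A DC noise voltage is inserted in series with both cross-coupling paths with opposing polarity, and bisection finds the largest value for which the cell still has three fixed points. At that boundary the stable and metastable solutions merge, which is exactly where the loop gain reaches unity, and the result matches the butterfly-square SNM of Seevinck [10] for the same curves.

```python
import math


def inverter_vtc(v_in: float, vdd: float, gain: float) -> float:
    """Idealized symmetric inverter: output spans 0..vdd, trips at vdd/2,
    peak slope magnitude gain*vdd/2 (stand-in for simulated curves)."""
    return 0.5 * vdd * (1.0 - math.tanh(gain * (v_in - 0.5 * vdd)))


def count_fixed_points(v_noise: float, vdd: float, gain: float,
                       points: int = 4001) -> int:
    """Count solutions of A = f(f(A + vn) - vn) via sign changes on a grid.

    vn is a DC noise source in series with both cross-coupling paths, with
    opposing polarity so that it works against the (A low, B high) state.
    The grid extends slightly past [0, vdd] so that roots pinned at the rails
    by floating-point saturation of tanh are still bracketed.
    """
    lo, hi = -0.05 * vdd, 1.05 * vdd
    last_sign, changes = 0, 0
    for i in range(points):
        a = lo + (hi - lo) * i / (points - 1)
        b = inverter_vtc(a + v_noise, vdd, gain)        # first inverter
        h = inverter_vtc(b - v_noise, vdd, gain) - a    # second inverter
        sign = (h > 0) - (h < 0)
        if sign != 0:
            if last_sign != 0 and sign != last_sign:
                changes += 1
            last_sign = sign
    return changes


def static_noise_margin(vdd: float = 1.0, gain: float = 10.0,
                        iterations: int = 40) -> float:
    """Largest series DC noise voltage for which the cell remains bistable.

    Three fixed points (two stable, one metastable) mean the cell is bistable;
    losing two of them marks the unity loop-gain condition.
    """
    if count_fixed_points(0.0, vdd, gain) < 3:
        return 0.0  # loop gain below one everywhere: no data retention
    lo, hi = 0.0, vdd  # lo is bistable; hi is assumed not to be
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if count_fixed_points(mid, vdd, gain) >= 3:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    for g in (3.0, 10.0, 50.0):
        print(f"gain={g:5.1f}  SNM ~ {static_noise_margin(gain=g) * 1000:.0f} mV")
```

Raising `gain` steepens both transfer curves and enlarges the computed SNM; in the portless cell, the corresponding knobs are the AXS length and the inverter ratio, whose real effect would have to come from transistor-level sweeps.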

Similar simulations were performed to determine the relationship between the inverter MOSFETs and the cell SNM. Since the DC voltages at the two data nodes are functions of both cell current and transistor sizes, one can intuit that the SNM is more accurately described by the ratio of the AXS transistor dimensions to the inverter dimensions. This is similar to the ratio concept in the 6T methodology, except that the inverter PFETs play a more important role. As can be seen in Fig. 8, an increase in the inverter ratio for a fixed AXS length results in an increase in SNM. As such, the portless cell offers designers significant flexibility when tailoring a cell to a specific application, since both the cell current and the SNM can be tuned over a wide range.

In addition to analyzing the SNM of the active cell during a read, it is important in the case of portless SRAM to consider the SNM of the inactive cells in the same column. This stems from the fact that all cells in a column share common bitlines as their power supply. One must therefore ensure that read and write operations to one cell do not cause instability in the other inactive cells of the same column. Fig. 9 depicts the simulated relationship between the bitline voltage differential of a column and the SNM of an inactive cell in that column for a 1.0 V process. For this particular case, a positive differential corresponds to the high node of the inactive cell being connected to the bitline that is pulled low through the active cell PFET. It is clear from this figure that the inactive cells of a particular column remain stable with a large SNM even for bitline swings greater than would typically be present in a modern SRAM architecture.

Fig. 8. SNM as a function of the cell inverter ratio.

Fig. 9. SNM as a function of bitline voltage differential (1.0 V supply).

IV. DESIGN METHODOLOGY

Designing portless SRAM cells introduces performance tradeoffs not found in any previously developed SRAM architecture. These tradeoffs add a new dimension to the SRAM design space that affords finer control when targeting a particular application. More specifically, the proposed portless technique allows for the design of high-SNM, low-leakage cells with no penalty in area and without resorting to multithreshold techniques. In addition, the proposed design is well suited for drowsy and sleep-mode architectures without the overhead of specialized power routing, as is demonstrated in Section V-B. With this in mind, the following constraints serve as the generalized foundation for designing portless SRAM cells:

1) Since all of the cells in a particular column are powered by the bitlines, the differential bitline voltage swing during reads and writes must not catastrophically affect the stability of other cells in the same column.

2) During a read operation, the SNM must remain sufficiently large to ensure data stability.


3) During a read operation, the active cell must induce a differential bitline current large enough for sensing that overcomes the various bitline leakage currents.

4) During a write operation, the bitline differential must be large enough to change the state of the active cell without catastrophically affecting the stability of other cells in the same column.
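The four constraints listed above lend themselves to a simple screening step when sweeping candidate sizings. The Python sketch below is hypothetical: the field names, units, and the threshold values `min_snm_mv` and `sense_margin` are illustrative placeholders, not values from the paper, and the metrics are assumed to come from separate circuit simulations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateCell:
    """Simulated metrics for one candidate sizing (hypothetical fields)."""
    inactive_snm_mv: float       # SNM of unselected cells at the largest (write) swing
    read_snm_mv: float           # SNM of the selected cell during a read
    read_diff_current_ua: float  # differential bitline current during a read
    bitline_leakage_ua: float    # summed leakage of the unselected cells
    write_flips_cell: bool       # does the write differential flip the active cell?


def violations(c: CandidateCell, min_snm_mv: float = 100.0,
               sense_margin: float = 2.0) -> List[str]:
    """Return the constraints 1)-4) that the candidate fails.

    Because inactive_snm_mv is assumed to be taken at the largest write swing,
    check 1 also covers the "other cells" clause of constraint 4.
    """
    failed = []
    if c.inactive_snm_mv < min_snm_mv:
        failed.append("1: bitline swing destabilizes inactive cells")
    if c.read_snm_mv < min_snm_mv:
        failed.append("2: read SNM too small")
    if c.read_diff_current_ua < sense_margin * c.bitline_leakage_ua:
        failed.append("3: read current does not overcome bitline leakage")
    if not c.write_flips_cell:
        failed.append("4: write differential does not flip the active cell")
    return failed


if __name__ == "__main__":
    print(violations(CandidateCell(150.0, 200.0, 10.0, 1.0, True)))
    print(violations(CandidateCell(50.0, 200.0, 1.0, 1.0, False)))
```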

Given these constraints, a design methodology for reading from and writing to the portless SRAM cell has been developed and is presented in this work. It makes use of a long-channel access transistor to satisfy the stability criteria while maintaining acceptable performance with respect to cell current, leakage power, and area.

A. The Long AXS Design Technique

The use of a long access transistor is a straightforward sizing technique to ensure proper operation of the portless cell. In order to satisfy the second design criterion of data stability during read operations, it is desirable to keep the two data nodes separated by a relatively large voltage. It follows that there is a small drain-source voltage across both of the active transistors supplying the data nodes, M2 and M4 in Fig. 3. This implies that they are operating in the triode region or at the onset of saturation, with a relatively small current flowing through them. This same current must also flow through the access transistor, M5 in Fig. 3. In this case, it will be assumed that the gate-source voltage of M5 is large and that its drain-source voltage is comparable, which places its operating point near or in saturation. If all transistors in the cell were minimum sized, M2 and M4 would be unable to source and sink the necessary current to keep M5 saturated. As a result, the data nodes would begin to equalize, during which time the inactive transistors, M1 and M3, would begin to turn on. The cell would begin drawing current from both bitlines, and the stored data would eventually be lost.
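The current-balance argument in this paragraph can be made concrete with long-channel square-law expressions. The Python sketch below assumes that model throughout (it ignores channel-length modulation, velocity saturation, and body effect, all of which matter in 90 nm and 0.18 µm devices); the function names and the normalized example values are hypothetical. It compares the current a minimum-length M5 would demand in saturation with the small triode currents that M2 and M4 can supply, and solves for the M5 length that would balance them.

```python
def triode_current(kp: float, w_over_l: float, vov: float, vds: float) -> float:
    """Square-law drain current in the triode region."""
    return kp * w_over_l * (vov * vds - 0.5 * vds ** 2)


def saturation_current(kp: float, w_over_l: float, vov: float) -> float:
    """Square-law drain current in saturation."""
    return 0.5 * kp * w_over_l * vov ** 2


def min_axs_length(w5: float, vov5: float, kp5: float, m2: tuple, m4: tuple) -> float:
    """M5 length at which its saturation current equals the smaller of the
    triode currents M2 and M4 deliver at their (small) V_DS.

    m2 and m4 are (kp, W/L, V_ov, V_DS) tuples; w5 and the returned length
    share the same (arbitrary) length unit.
    """
    i_supply = min(triode_current(*m2), triode_current(*m4))
    # I_sat(M5) = 0.5 * kp5 * (w5 / L5) * vov5**2, solved for L5
    return 0.5 * kp5 * w5 * vov5 ** 2 / i_supply


if __name__ == "__main__":
    m2 = (0.5, 1.0, 1.0, 0.1)   # normalized PFET: lower kp, small V_DS
    m4 = (1.0, 1.0, 1.0, 0.1)   # normalized NFET
    i_sup = min(triode_current(*m2), triode_current(*m4))
    print("min-length M5 demands", saturation_current(1.0, 1.0, 1.0), "vs", i_sup)
    print("balancing L5 / L_min =", min_axs_length(1.0, 1.0, 1.0, m2, m4))
```

With these normalized numbers, a minimum-length M5 would demand roughly ten times the current the pull-up and pull-down can supply at a small drain-source voltage, which is the imbalance that forces the data nodes to equalize.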

To avoid this situation, one only has to increase the length of M5 to ensure an equilibrium condition where its drain current in saturation is equal to the smaller drain currents of M2 and M4. This corresponds to the loop gain stability conditions with respect to SNM discussed in Section III. A parametric simulation is shown in Fig. 10, in which the cell data nodes are plotted over time for different AXS transistor lengths and minimum-sized inverter transistors in a 0.18 µm CMOS process. For the minimum-sized AXS transistor, the cell data nodes quickly equalize to approximately 700 mV. As the access transistor gets longer, however, equilibrium conditions can be realized where the cell data node voltages are kept distinct. It should be noted that for the 475 nm case in the figure, the time constant of equilibration is a function of the access transistor current and the cell node capacitance. For designs aggressively optimized for speed, one can choose an access transistor length that results in an equilibration time longer than the cell read time. For example, for an access length of 475 nm, the cell data will be retained successfully as long as the cell access time is less than approximately 0.5 ns. This allows the designer to greatly reduce both the cycle time and the area while still guaranteeing cell stability.
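The speed-optimized option in this paragraph amounts to comparing the read time against the time the AXS current needs to disturb the data nodes. A first-order Python sketch, assuming the node behaves as a capacitor discharged by a constant current (so t = C·ΔV/I; the real AXS current varies as the nodes move) and using hypothetical function names and example values:

```python
def equilibration_time(c_node_f: float, delta_v: float, i_axs_a: float) -> float:
    """Constant-current estimate of the time for the AXS current to move a
    data node by delta_v volts: t = C * dV / I (seconds)."""
    return c_node_f * delta_v / i_axs_a


def read_is_safe(t_read_s: float, c_node_f: float, delta_v: float,
                 i_axs_a: float, guard: float = 1.0) -> bool:
    """True if the read (scaled by a guard factor) finishes before the
    nodes have moved far enough to lose the data."""
    return guard * t_read_s < equilibration_time(c_node_f, delta_v, i_axs_a)


if __name__ == "__main__":
    # Illustrative values only: 2 fF node, 0.4 V tolerable disturbance, 2 uA.
    print(equilibration_time(2e-15, 0.4, 2e-6))       # ~0.4 ns
    print(read_is_safe(0.3e-9, 2e-15, 0.4, 2e-6))     # shorter read: safe
```

A longer AXS transistor lowers `i_axs_a` and so stretches the safe read window, which is the same tradeoff the simulated 475 nm case illustrates.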

Fig. 10. Dynamic response of cell nodes for various AXS transistor lengths.

Waveforms for a simulated memory using the long AXS technique and voltage-mode signaling are shown in Fig. 11. Data is written into a cell and then read back in an alternating fashion. The corresponding bitline voltages and cell node voltages are shown along with the AXS and column write signals. When a write signal is asserted with an AXS pulse, the bitline voltage differential of approximately 15% of the supply voltage lowers the SNM of the selected cell such that it changes state to reflect the data on the bitlines. When the AXS signal is asserted without a write pulse, the cell remains stable and induces a bitline differential of approximately 10% of the supply voltage to reflect the data it is storing.

Table I presents a comparison between the portless cell and two standard 6T cells, using the long AXS technique and logic design rules. The first 6T design was taken from [4] and uses larger pull-down NFETs to improve cell current. The second design was taken from [12] and uses longer-channel wordline transistors to minimize bitline leakage current. In each column of the table under the portless design, the portless cell is constructed to match the 6T cell as closely as possible in the highlighted metric, to demonstrate the resulting tradeoffs. Considering high-speed, low-power applications in particular, the portless cell matched in area to the standard 6T cell exhibits a 44% increase in SNM and a 39% decrease in standby leakage, with a 29% decrease in cell read current. As long as this read current is still large enough to overcome bitline leakage and ensure successful data sensing, a reduction in dynamic power consumption can be expected as well.

V. SCALING CONSIDERATIONS

One of the primary motivations behind the aggressive scaling seen in modern CMOS technologies has been the improved performance with respect to speed and density. In SRAM design, these improvements come at the cost of increased sensitivity to process variation, operating environment, and transistor leakage within the cell array. While there have been some attempts at mitigating these issues in the traditional 6T methodology, penalties in complexity and area have generally been unavoidable [13], [14]. It is for this reason that the new portless cell is proposed as an attractive alternative to its 6T counterpart, exhibiting many of the features desirable for sub-100 nm applications, such as low leakage, high SNM, and variation tolerance, without any of the associated overhead.


Fig. 11. Simulated timing diagram using long AXS portless cell.

TABLE I
PORTLESS CELL COMPARED TO TWO STANDARD 6T DESIGNS IN 0.18 µm CMOS @ 110 °C

A. Physical Layout

The physical layout of the portless memory cell is similar to previous 6T designs [12]. This is beneficial in many ways, since the highly optimized SRAM design rules can still be applied. A generalized stick layout is shown in Fig. 12 for a standard 6T design and the new portless design. In both layouts, the ground and bitline contacts are shared with neighboring cells. In the 6T design, the supply and wordline contacts are also shared. This is not the case in the portless layout, where there is no supply contact and the AXS contact is contained within the cell itself. The area comparison between the two memory cells is presented for several different designs in Table I.

B. Power—Dynamic Consumption and Leakage

In static cache architectures, power consumption stems from two dominant sources. The first is dynamic, resulting mainly from the charging and discharging of both the bitline and the wordline capacitance [15]. In the new portless cell, bitline swing is minimized and power consumption reduced by using current-mode signaling during read operations and by dynamically altering the cell stability during write operations. The wordline load is generally also reduced for portless cells, since the gate area, and hence the gate-source capacitance, of the AXS transistor is smaller than that of the two wordline transistors in 6T designs.

Fig. 12. Layout of the (a) standard 6T cell and the (b) new portless cell.
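The wordline-load argument can be checked with simple arithmetic: the load each cell places on its access line scales with the total access-device gate area, and the energy per full charge/discharge cycle is C·V². The Python sketch below is illustrative only: the function names, the gate-capacitance density, and the device dimensions are hypothetical, and overlap and wire capacitance are ignored. Whether the portless cell wins depends on the chosen AXS length, since the long AXS technique increases L.

```python
def wordline_gate_area(cell: str, w_um: float, l_um: float) -> float:
    """Access-device gate area (um^2) one cell adds to its wordline/AXS line.

    "6T": two pass-gates of size w x l; "portless": a single AXS device.
    """
    devices = {"6T": 2, "portless": 1}[cell]
    return devices * w_um * l_um


def switching_energy_fj(area_um2: float, cox_ff_per_um2: float, vdd: float) -> float:
    """Energy drawn per full charge/discharge cycle, E = C * V^2, with the
    gate capacitance approximated as C = Cox * area."""
    return cox_ff_per_um2 * area_um2 * vdd ** 2


if __name__ == "__main__":
    a_6t = wordline_gate_area("6T", 0.2, 0.1)              # hypothetical pass-gates
    a_portless = wordline_gate_area("portless", 0.12, 0.3)  # hypothetical long AXS
    print(a_6t, a_portless)
    print(switching_energy_fj(a_6t, 10.0, 1.0), switching_energy_fj(a_portless, 10.0, 1.0))
```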

The second dominant contributor to cache power consumption is leakage current during standby. An analysis of cell-level leakage power through extracted layout simulation suggests that the new portless SRAM cell outperforms the standard 6T design used for comparison. This is also well supported by more complex simulations of full caches based on the new portless cell. Such a result can be explained by comparing the dominant leakage paths of the two cell structures, as shown in Fig. 13.

Considering the subthreshold leakage currents and their associated transistors, several important observations can be made. The first relates to the magnitude of the total leakage per cell. Even though both the 6T and the portless cell contain three dominant leakage paths, the total leakage of the long AXS portless cell will be less than that of the 6T cell under the same standby operating conditions. This result stems exclusively from the geometries of the transistors in the cell and the associated proportionality of their leakage currents.

The second observation to be made from Fig. 13 relates to bitline leakage current. One of the most important trends seen with the aggressive scaling of CMOS technologies with respect to SRAM is the ratio of the cell current during a read to the bitline leakage current [16], [17]. At the 180 nm node and below, the bitline leakage current of the inactive cells masks the read current of the active cell and, in turn, reduces the realizable column height. Since the portless cell uses a PFET for the bitline connection instead of an NFET as in the 6T methodology, bitline leakage is expected to be reduced based solely on the difference in carrier mobility.
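The link between bitline leakage and column height can be sketched as a worst-case current budget: the active cell's read current must exceed some sensing margin times the summed leakage of the N - 1 inactive cells on the same bitline. The Python function below is a hypothetical illustration of that budget; the margin of 2 and the example currents are assumptions, not figures from the paper.

```python
import math


def max_column_height(i_read: float, i_leak_per_cell: float,
                      margin: float = 2.0) -> int:
    """Largest cells-per-column N with i_read >= margin * (N - 1) * i_leak.

    Worst case: every inactive cell leaks onto the sensed bitline. Currents
    may be in any consistent unit.
    """
    if i_leak_per_cell <= 0:
        raise ValueError("leakage per cell must be positive")
    return math.floor(i_read / (margin * i_leak_per_cell)) + 1


if __name__ == "__main__":
    # Illustrative: 10 uA read current, 10 nA vs 5 nA leakage per cell (in nA).
    print(max_column_height(10000.0, 10.0))
    print(max_column_height(10000.0, 5.0))
```

Halving the per-cell leakage, as a lower-mobility PFET bitline connection is expected to help with, roughly doubles the column height allowed by this budget.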

Finally, it should be noted that since the bitlines of the portless SRAM cache serve to power the individual cells, a sleep mode can be implemented by simply reducing the bitline precharge voltage during standby. Such a scheme is significantly simpler than any of the previously proposed methods for 6T caches, which typically require additional transistors within the cell itself [18], [19]. As shown in Fig. 14, cell stability in terms of SNM for a portless cell in a 1.0 V technology can be maintained near 175 mV when the bitline voltage is reduced to around 50% of the supply voltage. This reduction of the cell supply provides a proportional savings in leakage power with no added overhead in the cell array itself.

Fig. 13. Subthreshold leakage paths in (a) 6T and (b) portless cells.

Fig. 14. SNM as a function of bitline precharge voltage for sleep mode.
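The proportional-savings statement follows from P = V·I when the leakage current is held fixed. The Python sketch below, with hypothetical function names, makes that assumption explicit and also accepts a measured sleep-mode leakage current; a lower drain bias usually reduces subthreshold leakage as well (for example through DIBL), so the constant-current figure should be read as a conservative estimate.

```python
from typing import Optional


def leakage_power(v_supply: float, i_leak: float) -> float:
    """Standby leakage power of cells supplied from the bitlines."""
    return v_supply * i_leak


def sleep_savings(v_nominal: float, v_sleep: float, i_leak_nominal: float,
                  i_leak_sleep: Optional[float] = None) -> float:
    """Fractional leakage-power saving when the bitline precharge is lowered.

    If i_leak_sleep is omitted, the leakage current is assumed unchanged,
    which yields a saving exactly proportional to the supply reduction.
    """
    if i_leak_sleep is None:
        i_leak_sleep = i_leak_nominal
    p_nominal = leakage_power(v_nominal, i_leak_nominal)
    p_sleep = leakage_power(v_sleep, i_leak_sleep)
    return 1.0 - p_sleep / p_nominal


if __name__ == "__main__":
    print(sleep_savings(1.0, 0.5, 1e-6))          # constant-current assumption
    print(sleep_savings(1.0, 0.5, 1e-6, 0.5e-6))  # if leakage current also halves
```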

C. Bitline Capacitance Mismatch

It is important to point out that the new portless SRAM architecture introduces a dependency of bitline capacitance on cache data, an effect not seen in other SRAM methodologies. This relationship results from the direct connection of the cell PFETs to the bitlines. Whichever PFET is connected to the high data node operates in the triode region, while the low data node PFET is effectively off. The high-side bitline will therefore experience a much higher gate-source capacitance than the low side.

This capacitive mismatch effect is insignificant in current-mode architectures, where the bitlines are typically clamped to a fixed voltage. For voltage-mode operation, however, one must design for the worst-case scenario where all the cells in a particular column are storing the same value. This results in a maximum differential in bitline capacitance and can potentially have a negative impact on the overall speed of read and write operations if the bitline capacitance is not dominated by interconnect parasitics. The fabricated chip in Section VII was designed for this case using voltage-mode signaling, and no failures resulting from column data dependence could be detected experimentally.
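The data dependence described above can be tallied per column: each cell adds a large capacitance to the bitline on its high-node side and a small one to the other bitline. The Python sketch below uses hypothetical names, an assumed convention (a stored 1 puts the BLL-side node high), and arbitrary capacitance units; it reproduces the worst case named in the text, where every cell stores the same value.

```python
from typing import Sequence, Tuple


def bitline_caps(column_data: Sequence[int], c_on: float, c_off: float,
                 c_wire: float) -> Tuple[float, float]:
    """Return (C_BLL, C_BLR) for one column.

    Assumed convention: a stored 1 puts the BLL-side node high, so that PFET
    is in triode (contributes c_on to BLL) while the BLR-side PFET is off
    (contributes c_off to BLR); a stored 0 is the mirror image. c_wire is
    the interconnect parasitic of each bitline.
    """
    ones = sum(1 for bit in column_data if bit)
    zeros = len(column_data) - ones
    c_bll = c_wire + ones * c_on + zeros * c_off
    c_blr = c_wire + zeros * c_on + ones * c_off
    return c_bll, c_blr


if __name__ == "__main__":
    print(bitline_caps([1, 1, 1, 1], c_on=2.0, c_off=0.5, c_wire=10.0))  # worst case
    print(bitline_caps([1, 0, 1, 0], c_on=2.0, c_off=0.5, c_wire=10.0))  # balanced
```

The mismatch C_BLL - C_BLR peaks for uniform column data, and its relative effect on timing shrinks as `c_wire` grows, consistent with the remark about interconnect-dominated bitlines.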

TABLE II
TRANSISTOR SIZES FOR 90 nm SRAM CELLS

D. Variation Analysis

The aggressive scaling of feature sizes that has led to sub-100 nm technology nodes has unavoidably been accompanied by concern over variations in process parameters after fabrication. Such variation was also present in past technologies and was more or less accounted for by treating the resulting deviations in threshold voltage, mobility, etc. as inter-die phenomena [20]. In newer technologies, however, this generalization no longer applies, since feature sizes have become small enough that the assumption of dopant homogeneity is false. This is especially true for SRAM, where the design rules are pushed to reduce chip area and the transistors themselves are sized near the allowed minimums. The resulting local, or intra-die, variations can cause significant deviations in cell performance or even functional failures [9], [21], [22]. It is therefore essential to characterize how the new portless cell behaves when local variations in process, temperature, and supply are introduced, including mismatch within the memory cell itself.

Fig. 15. Monte Carlo SNM analysis of standard 90 nm 6T cell.

Fig. 16. Monte Carlo SNM analysis of a 90 nm portless cell.

To gain some insight into how the new portless cell performs under parameter variation, Monte Carlo simulations were carried out in a 90 nm process for a foundry-standard 6T cell and for a portless cell designed to occupy comparable area. The transistor sizes for the two designs are given in Table II. The simulations were run for 1000 iterations, and the induced process variation was based on a foundry model that included device mismatch. Figs. 15 and 16 show the resulting read SNM for the 6T and portless cells, respectively, at 110 °C. The portless cell exhibits approximately 23% higher SNM than the 6T cell, while the standard deviations of the 6T and portless cells are comparable, at around 12% and 16%, respectively. Figs. 17 and 18 show how the cell currents drawn from the bitlines during a read operation vary for the same two cell designs at 27 °C and 110 °C. Comparing these two figures, the portless cell exhibits a slightly higher sensitivity to temperature than the 6T design, varying 10% over the 83 °C range versus 6% in the 6T case. On the other hand, SNM as a function of temperature, shown in Fig. 19, remains fairly constant for the portless cell, while the SNM of the 6T design decreases by about 5% over the same temperature range. The portless design is therefore slightly better suited for applications requiring good SNM performance. Overall, one can conclude that the new portless memory cell performs at a level comparable to the 6T design when process variation, mismatch, and temperature are considered together, and that it can be successfully employed in future sub-100 nm technologies.
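The statistics quoted above (mean SNM gain, spread, and drift over temperature) can be reproduced from raw Monte Carlo output with a few standard-library calls. This is a minimal sketch, assuming the per-run SNM values have been exported from the simulator as a list of floats; it also assumes the "12% and 16%" spreads are standard deviations expressed relative to the mean, which is one plausible reading rather than something the text states. No simulation data is included here.

```python
import statistics
from typing import Dict, Sequence


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """Mean, sample standard deviation, relative spread, and worst case."""
    mean = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    return {
        "mean": mean,
        "sigma": sigma,
        "sigma_rel": sigma / mean,   # spread as a fraction of the mean
        "worst": min(samples),       # lowest SNM seen across all runs
    }


def relative_gain(new_mean: float, ref_mean: float) -> float:
    """Fractional improvement of one design's mean over a reference."""
    return new_mean / ref_mean - 1.0


def drift_per_degree(percent_change: float, t_low: float, t_high: float) -> float:
    """Average percent change per degree C across a temperature span."""
    return percent_change / (t_high - t_low)


if __name__ == "__main__":
    # Read-current drift over 27-110 degC, using the percentages in the text.
    for name, pct in (("portless", 10.0), ("6T", 6.0)):
        print(f"{name}: {drift_per_degree(pct, 27.0, 110.0):.3f} %/degC")
```

With real samples, `relative_gain(summarize(portless)["mean"], summarize(six_t)["mean"])` would express the roughly 23% SNM advantage; the `portless` and `six_t` lists are hypothetical names for the exported runs.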

VI. CACHE LEVEL SIMULATION RESULTS

Fig. 17. Monte Carlo analysis of standard 90 nm 6T cell current.

Fig. 18. Monte Carlo analysis of 90 nm portless cell current.

Fig. 19. SNM as a function of temperature.

To gain some understanding of the new portless memory's performance on a larger scale, a full 32 kb cache was simulated from an extracted custom layout designed using logic rules in a 0.18 μm process. The cache was designed with 128 rows and 256 columns, and the columns were divided into 32 words, each 8 bits long. A 7-bit row decoder and a 5-bit column decoder were implemented in the dynamic NOR logic style [23]. Write logic for each individual column was designed with basic pass-transistor logic. Fig. 20 illustrates the circuit used to write data to the cells. The complementary data inputs were combined with the column select signal and the read/write signal to determine when, and which, bitline would be pulled down during a write cycle.

Fig. 20. Column write circuitry for the simulated cache.
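The column write decision described above reduces to a small truth table: a bitline is pulled down only when the column is selected and the cycle is a write, and the data value chooses which side. The sketch below models that logic behaviorally; the polarity (writing 0 pulls BL low, writing 1 pulls BLB low) and the names `WriteDrive` and `column_write_drive` are hypothetical assumptions, since the exact gate-level circuit of Fig. 20 is not reproduced in the text.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class WriteDrive:
    pull_bl: bool    # True: BL is pulled to ground this cycle
    pull_blb: bool   # True: BLB is pulled to ground this cycle


def column_write_drive(col_sel: bool, write: bool, data: bool) -> WriteDrive:
    """Decide which bitline of one column, if any, is pulled down.

    Hypothetical polarity: BL supplies the PFET on node Q, so writing 0 removes
    that supply by pulling BL low, and writing 1 pulls BLB low instead.
    Unselected columns and read cycles leave both bitlines at precharge.
    """
    enable = col_sel and write
    return WriteDrive(pull_bl=enable and not data, pull_blb=enable and data)


if __name__ == "__main__":
    print("sel write data | pull_BL pull_BLB")
    for sel, wr, d in product((False, True), repeat=3):
        drive = column_write_drive(sel, wr, d)
        print(f"{sel:d}   {wr:d}     {d:d}    | {drive.pull_bl:d}       {drive.pull_blb:d}")
```

The key property, whatever the actual polarity, is that at most one bitline per column is ever pulled down, and only on a selected write cycle.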

In addition, current-sensing circuits were used to amplify the bitline and dataline signals. A hybrid circuit with a unity-gain current conveyor and a clamped-bitline sense amplifier from [24] was used for sensing and is shown in Fig. 21. The current conveyor in the first stage acts as a multiplexer to select groups of columns, and each column contains one such current conveyor. The clamped-bitline sense amplifier forms the second stage of the sensing circuit and detects the differential current on the local datalines of the cache. The core of this cache was based on the long AXS transistor technique from Section IV-A, and was characterized for area, active power during consecutive reads and writes at a 2.5 ns cycle time, leakage power during standby at full supply voltage, and static noise margin. The comparison of these metrics to two other previously published, similar designs is presented in Table III, where standard scaling techniques have been employed to compensate for differences in frequency or voltage, and unreported metrics are denoted as “NR”. As can be seen in the table, the new portless cell occupies area comparable to the two reference designs, but consumes 6× less leakage power at the same temperature of 110 °C. While a 19× reduction in active power was also observed, one must also consider the differences in power consumed by the peripheral and supporting circuitry, neither of which was reported for the reference designs. In addition, the new portless design exhibits a 20% increase in static noise margin, demonstrating the stability of the new cell.
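The text does not spell out which scaling rules were used for Table III. One common first-order normalization for dynamic power follows P = αCV²f, holding the activity factor α and switched capacitance C fixed. The sketch below applies that rule; the reference power, voltage, and frequency in the usage block and the 1.8 V target are hypothetical numbers chosen only to show the calculation, and the 2.5 ns cycle time is the one quoted above.

```python
def cycle_to_frequency(t_cycle_s: float) -> float:
    """Clock frequency in Hz for a given cycle time in seconds."""
    return 1.0 / t_cycle_s


def scale_dynamic_power(
    p_ref: float, v_ref: float, f_ref: float, v_target: float, f_target: float
) -> float:
    """Rescale dynamic power with P = alpha * C * V^2 * f.

    Activity factor alpha and switched capacitance C are assumed unchanged,
    so only the supply voltage and clock frequency are normalized.
    """
    return p_ref * (v_target / v_ref) ** 2 * (f_target / f_ref)


def improvement(p_ref: float, p_new: float) -> float:
    """How many times less power the new design consumes."""
    return p_ref / p_new


if __name__ == "__main__":
    f_test = cycle_to_frequency(2.5e-9)  # 2.5 ns cycle used for Table III
    # Hypothetical reference: 20 mW reported at 1.5 V and 200 MHz,
    # normalized to a hypothetical 1.8 V supply at the test clock.
    p_norm = scale_dynamic_power(20e-3, 1.5, 200e6, 1.8, f_test)
    print(f"test clock: {f_test / 1e6:.0f} MHz")
    print(f"normalized reference power: {p_norm * 1e3:.1f} mW")
```

This normalization ignores leakage and any change in switched capacitance, which is one reason the 19× active-power figure should be read alongside the caveat about peripheral circuitry.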

VII. EXPERIMENTAL RESULTS

As a proof of concept, a 16 kb portless cache was fabricated in a three-metal 0.5 μm CMOS process using the long AXS technique discussed in Section IV-A. The memory was divided into 128 columns, each 128 cells tall. A two-stage row decoder and a single-stage column decoder, both using transmission-gate logic, were used to decode the 11-bit input address. Latch-based, voltage-mode sense amplifiers were used for the read cycle, and the output was latched using a D flip-flop on the falling clock edge. All of the necessary signals were generated and timed internally from the global clock using address transition detectors. The final layout and chip micrograph are shown in Fig. 22.

TABLE III
A 32 kb EXTRACTED CACHE COMPARISON IN 0.18 μm CMOS
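The array organizations given for the test chip and for the simulated cache can be cross-checked arithmetically. The sketch below assumes 8-bit words for both arrays (stated for the simulated cache and implied by the 8-bit word of Fig. 23) and a plain binary split of the address into row and column fields; the 7-bit row / 4-bit column split it derives for the test chip is an inference from the organization, not a detail stated in the text.

```python
from typing import Dict


def address_bits(n: int) -> int:
    """Bits needed to select one of n items (n >= 1)."""
    return (n - 1).bit_length()


def organization(rows: int, cols: int, word_bits: int) -> Dict[str, int]:
    """Capacity and decoder widths for a rows x cols array of word_bits words."""
    if cols % word_bits:
        raise ValueError("columns must be a whole number of words")
    words_per_row = cols // word_bits
    row_bits = address_bits(rows)
    col_bits = address_bits(words_per_row)
    return {
        "total_bits": rows * cols,
        "words": rows * words_per_row,
        "row_bits": row_bits,
        "col_bits": col_bits,
        "addr_bits": row_bits + col_bits,
    }


if __name__ == "__main__":
    for label, (r, c) in {"test chip": (128, 128), "simulated cache": (128, 256)}.items():
        org = organization(r, c, 8)
        print(f"{label}: {org['total_bits'] // 1024} kb, "
              f"{org['row_bits']}-bit row + {org['col_bits']}-bit column "
              f"= {org['addr_bits']}-bit address")
```

For the test chip this gives 16,384 bits (16 kb) organized as 2048 eight-bit words behind an 11-bit address, and for the simulated cache 32 kb behind the 7-bit row and 5-bit column decoders described in Section VI.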

The fabricated chip was tested both on a logic analyzer and as a standalone memory in a DSP system. Fig. 23 shows a single bit of an 8-bit word during write and read operations for different address locations at 45 MHz. As can be seen in the left half of the figure, the data sequence “00101011” is written to a random sequence of addresses. The latched data output during this time is invalid, since the sense amplifiers float during the write cycle. Read operations are performed on the same address sequence in the right half of the figure, and the same data sequence is read out and latched on the falling edge of each clock pulse. It is important to mention that two of the locations written to and read from shared the same column circuitry, an operational concern specific to the 5T design, where cells of the same column interact through their shared bitline supplies. In addition, all of these measurements were performed without detectable errors for worst-case capacitive mismatch in the bitlines, as discussed in Section V-C.
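The measurement procedure (write a bit pattern to random addresses, read the same addresses back, and note which locations share column circuitry) can be expressed as a small behavioral harness. In the sketch below, `BehavioralMemory` is a hypothetical ideal model standing in for the device under test, so it always passes; on a real bench the read values would come from the logic-analyzer capture. The assumption that the low-order address bits select the column group is also hypothetical.

```python
import random
from itertools import combinations
from typing import List, Tuple


class BehavioralMemory:
    """Ideal word-addressable memory: no timing, no analog effects."""

    def __init__(self, words: int, word_bits: int = 8) -> None:
        self.mask = (1 << word_bits) - 1
        self.cells = [0] * words

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = value & self.mask

    def read(self, addr: int) -> int:
        return self.cells[addr]


def write_read_test(mem, pattern: str, bit: int = 0, seed: int = 0) -> Tuple[List[int], str]:
    """Write one pattern bit per random address, then read them back in order."""
    rng = random.Random(seed)
    addrs = rng.sample(range(len(mem.cells)), len(pattern))
    for addr, ch in zip(addrs, pattern):
        mem.write(addr, int(ch) << bit)
    observed = "".join(str((mem.read(addr) >> bit) & 1) for addr in addrs)
    return addrs, observed


def shared_column_pairs(addrs: List[int], col_bits: int = 4) -> List[Tuple[int, int]]:
    """Address pairs that map to the same column group (low bits assumed)."""
    sel = (1 << col_bits) - 1
    return [(a, b) for a, b in combinations(addrs, 2) if (a & sel) == (b & sel)]


if __name__ == "__main__":
    mem = BehavioralMemory(words=2048)  # 16 kb organized as 2048 x 8 bits
    addrs, seen = write_read_test(mem, "00101011")
    print("pass" if seen == "00101011" else "FAIL", addrs)
    print("pairs sharing a column group:", shared_column_pairs(addrs))
```

Checking for shared-column pairs matters for the portless cell in particular, because cells in one column interact through their common bitline supplies.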

VIII. CONCLUSION

A new type of memory cell, termed “portless” SRAM, is presented as an alternative to the standard 6T SRAM design methodology. A complete functional and theoretical analysis is given to explain how the proposed cell operates with respect to timing, stability, variation, and cache implementation. In addition, a design methodology for creating portless SRAM cells is presented, along with its merits compared to standard 6T cell techniques. It is shown that the proposed cell is superior in the following ways:

1) The portless cell gives the designer a new degree of freedom in the SRAM design space when targeting the cell for particular applications, such as low leakage and high SNM.

2) When sized for minimum area, the portless cell offers lower leakage current and higher static noise margins than a standard 6T cell.

3) The portless cell is well suited for current-mode operation. In this regime, dynamic power consumption is reduced through minimized bitline and wordline capacitance and reduced bitline voltage swings.

4) The portless cell offers a simple drowsy/sleep implementation, since the cells are powered from the bitlines instead of the global supply network.

Simulations and measurements have confirmed these benefits and show significant improvements in power consumption (19× dynamic reduction and 6× leakage reduction), along with a 20% increase in static noise margin. The portless cell is also robust, demonstrating high tolerance to process and temperature variation. A test chip has been fabricated and tested successfully as a proof of concept to verify the theory behind the proposed cell.

Fig. 21. Column read circuitry for the simulated cache.

Fig. 22. The 16 kb portless cache layout and chip micrograph.

Fig. 23. Measured waveforms of the new portless cache at 45 MHz.

ACKNOWLEDGMENT

The authors would like to acknowledge the insight from their colleague, Dr. J. Liobe, and the support in test-chip fabrication by the MOSIS service and AMI Semiconductor.

REFERENCES

[1] J. D. Schmidt, “Integrated MOS random-access memory,” Solid-State Design, pp. 21–25, 1965.

[2] K. Takeda, Y. Aimoto, N. Nakamura, H. Toyoshima, T. Iwasaki, K. Noda, K. Matsui, S. Itoh, S. Masuoka, T. Horiuchi, A. Nakagawa, K. Shimogawa, and H. Takahashi, “A 16 Mb 400 MHz loadless CMOS four-transistor SRAM macro,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 2000, pp. 264–265.

[3] B. Wang and J. B. Kuo, “A novel two-port 6T CMOS SRAM cell structure for low-voltage VLSI SRAM with single-bit-line simultaneous read-and-write access (SBLSRWA) capability,” in Proc. 2000 IEEE ISCAS, Geneva, Switzerland, 2000, vol. 5, pp. 733–736.

[4] I. Carlson, S. Andersson, S. Natarajan, and A. Alvandpour, “A high density, low leakage, 5T SRAM for embedded caches,” in Proc. 30th Eur. Solid-State Circuits Conf., Leuven, Belgium, 2004, pp. 215–218.

[5] R. E. Aly, M. I. Faisal, and M. A. Bayoumi, “Novel 7T SRAM cell for low power cache design,” in Proc. IEEE Int. SOC Conf., Herndon, VA, 2005, pp. 171–174.


[6] K. Takeda, Y. Hagihara, Y. Aimoto, M. Nomura, Y. Nakazawa, T. Ishii, and H. Kobatake, “A read-static-noise-margin-free SRAM cell for low-Vdd and high-speed applications,” IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 113–121, Jan. 2006.

[7] M. M. Khellah and M. I. Elmasry, “A low-power high-performance current-mode multiport SRAM,” IEEE Trans. Very Large Scale Integrat. (VLSI) Syst., vol. 9, no. 5, pp. 590–598, Oct. 2001.

[8] M. Wieckowski and M. Margala, “A novel five-transistor (5T) SRAM cell for high performance cache,” in Proc. IEEE Int. SOC Conf., 2005, pp. 101–102.

[9] K. Agarwal and S. Nassif, “Statistical analysis of SRAM cell stability,” in Proc. 43rd ACM/IEEE Design Automation Conf., San Francisco, CA, 2006, pp. 57–62.

[10] E. Seevinck, F. List, and J. Lohstroh, “Static noise margin analysis of MOS SRAM cells,” IEEE J. Solid-State Circuits, vol. SC-22, no. 5, pp. 748–754, Oct. 1987.

[11] J. Lohstroh, E. Seevinck, and J. de Groot, “Worst-case static noise margin criteria for logic circuits and their mathematical equivalence,” IEEE J. Solid-State Circuits, vol. SC-18, no. 6, pp. 803–807, Dec. 1983.

[12] R. W. Mann, W. W. Abadeer, M. J. Breitwisch, O. Bula, J. S. Brown, and B. C. Colwill, “Ultralow-power SRAM technology,” IBM J. Res. Dev., vol. 47, pp. 553–566, Sep./Nov. 2003.

[13] C. H. Kim, K. Jae-Joon, S. Mukhopadhyay, and K. Roy, “A forward body-biased low-leakage SRAM cache: Device, circuit and architecture considerations,” IEEE Trans. Very Large Scale Integrat. (VLSI) Syst., vol. 13, no. 3, pp. 349–357, Mar. 2005.

[14] P. Elakkumanan, C. Thondapu, and R. Sridhar, “DG-SRAM: A low leakage memory circuit,” in Proc. IEEE Int. SOC Conf., 2005, pp. 167–170.

[15] K. Kanda, K. Sadaaki, and T. Sakurai, “90% write power-saving SRAM using sense-amplifying memory cell,” IEEE J. Solid-State Circuits, vol. 39, no. 6, pp. 927–933, Jun. 2004.

[16] K. Agawa, H. Hara, T. Takayanagi, and T. Kuroda, “A bitline leakage compensation scheme for low-voltage SRAMs,” IEEE J. Solid-State Circuits, vol. 36, no. 5, pp. 726–734, May 2001.

[17] Y. Ye, M. Khellah, D. Somasekhar, A. Farhang, and V. De, “A 6-GHz 16-kb L1 cache in a 100-nm dual-Vt technology using a bitline leakage reduction (BLR) technique,” IEEE J. Solid-State Circuits, vol. 38, no. 5, pp. 839–842, May 2003.

[18] K. N. Sung, K. Flautner, D. Blaauw, and T. Mudge, “Circuit and microarchitectural techniques for reducing cache leakage power,” IEEE Trans. Very Large Scale Integrat. (VLSI) Syst., vol. 12, pp. 167–184, 2004.

[19] F. Frustaci, P. Corsonello, S. Perri, and G. Cocorullo, “Techniques for leakage energy reduction in deep submicrometer cache memories,” IEEE Trans. Very Large Scale Integrat. (VLSI) Syst., vol. 14, no. 11, pp. 1238–1249, Nov. 2006.

[20] D. Ji-Seong, K. Dae-Wook, L. Sang-Hoon, L. Jong-Bae, P. Young-kwan, Y. Moon-Hyun, and K. Jeong-Taek, “A unified statistical model for inter-die and intra-die process variation,” in Proc. Int. Conf. Simulation Semicond. Processes Devices, 2005, pp. 131–134.

[21] R. Venkatraman, R. Castagnetti, and S. Ramesh, “The statistics of device variations and its impact on SRAM bitcell performance, leakage and stability,” in Proc. 7th Int. Symp. Quality Electronics Design, San Jose, CA, 2006, p. 6.

[22] R. Heald and P. Wang, “Variability in sub-100 nm SRAM designs,” in Proc. Int. Conf. Computer-Aided Design, 2004, pp. 347–352.

[23] B. S. Amrutur and M. A. Horowitz, “Fast low-power decoders for RAMs,” IEEE J. Solid-State Circuits, vol. 36, no. 10, pp. 1506–1515, Oct. 2001.

[24] P. Y. Chee, P. C. Liu, and L. Siek, “High-speed hybrid current-mode sense amplifier for CMOS SRAMs,” Electron. Lett., vol. 28, no. 9, pp. 871–873, Apr. 1992.

[25] R. V. Joshi, S. P. Kowalczyk, Y. H. Chan, W. V. Huott, S. C. Wilson, and G. J. Scharff, “A 2 GHz cycle, 430 ps access time 34 kb L1 directory SRAM in 1.5 V, 0.18 μm CMOS bulk technology,” in Symp. VLSI Circuits Dig. Tech. Papers, 2000, pp. 222–225.

Michael Wieckowski (S’99) received the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Rochester, Rochester, NY, in 2004 and 2007, respectively.

He is currently a Postdoctoral Research Fellow at the University of Michigan, Ann Arbor. He has authored or co-authored nine technical papers in journals and conference proceedings and holds one pending patent. His main research interests are high-speed, low-power memory architectures, optoelectronic self-testing circuits, and low-cost embedded system design.

Sandeep Patil (S’02) received the B.S. degree in electrical and computer engineering from the University of Texas in 2005 and the M.S. degree in electrical engineering from the University of Rochester, Rochester, NY, in 2007. His thesis focused on the design and test of low-energy memory architectures.

His main research interests are high-speed, low-power memory design, memory testing, and sense amplifier design.

Martin Margala (S’92–M’98–SM’04) received the M.S. degree in microelectronics from Slovak Technical University, Slovakia, in 1990 and the Ph.D. degree in electrical and computer engineering from the University of Alberta, Canada, in 1998.

He is currently an Associate Professor with the Electrical and Computer Engineering Department, University of Massachusetts, Lowell. Previously, he was with the University of Rochester, Rochester, NY, and with the University of Alberta. From 1998 to 2003, he was an Adjunct Scientist with the Telecommunications Research Labs, Edmonton, Canada. He is a member of the program committees of many conferences and symposia on design and test. He holds one patent (five others pending) and is author or coauthor of more than 100 publications in peer-reviewed journals and conference proceedings on integrated circuit design and test. His main research interests are SoC and SoP testing, parametric monitoring, adaptive built-in self-test systems, energy-efficient low-voltage mixed-signal design, and high-bandwidth and data-processing architectures.

Dr. Margala is a member of STC, ITRS workgroup on DFT.