Sp09 CMPEN 411 L23 S.1
CMPEN 411 VLSI Digital Circuits
Spring 2009
Lecture 23: Memory Cell Designs: SRAM, DRAM
[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]
Sp09 CMPEN 411 L23 S.2
Heads-up
Kerry Bernstein (IBM) talk: Thursday, 4 PM, IST 333
- To prepare for his talk, go to the ANGEL system and find the file “New dimensions in performance” under “Interesting Reading Materials”
To make up the last cancelled lecture: Kerry Bernstein’s talk “Microarchitecture’s Race for Performance and Power” (PSU talk, 11/2004)
- Slides and videos are online in the ANGEL system under “Interesting Reading Materials”
DAC Young Student Scholarship: www.dac.com
Sp09 CMPEN 411 L23 S.3
Review: Basic Building Blocks
Datapath
- Execution units: adder, multiplier, divider, shifter, etc.
- Register file and pipeline registers
- Multiplexers, decoders
Control
- Finite state machines (PLA, ROM, random logic)
Interconnect
- Switches, arbiters, buses
Memory
- ROM, caches (SRAMs), CAM, DRAMs, buffers
Sp09 CMPEN 411 L23 S.4
2D 4x4 SRAM Memory Bank
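[Figure: 4x4 SRAM array organized as 2-bit words. Address inputs A0, A1, A2 feed the row decoder and the column decoder; the row decoder drives word lines WL[0] to WL[3]; each column has a bit line pair BL/!BL (BLi, BLi+1) with bit line precharge (read precharge enable) at the top and sense amplifiers and write circuitry at the bottom; a clocking and control block generates the enables.]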
Sp09 CMPEN 411 L23 S.5
6-Transistor SRAM Storage Cell
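[Figure: 6-transistor SRAM cell. Cross-coupled inverters M1 to M4 hold Q and !Q; pass transistors M5 and M6, gated by WL, connect the storage nodes to !BL and BL. Example state shown: Q = 1, !Q = 0, with the corresponding devices marked on or off.]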
Sp09 CMPEN 411 L23 S.6
SRAM Cell Analysis (Read)
[Figure: read of a cell storing Q = 1, !Q = 0. Both bit lines are precharged (!BL = BL = 2.5 V, each loaded by Cbit) and WL = 1; devices shown: M1, M4, M5, M6.]
Read-disturb (read-upset): must limit the voltage rise on !Q to prevent read upsets while maintaining acceptable circuit speed and area
- M1 must be stronger than M5 when storing a 1 (as shown)
- M3 must be stronger than M6 when storing a 0
Sp09 CMPEN 411 L23 S.7
Read Voltage Ratios
[Plot: voltage rise on !Q (0 to 1.2 V) versus cell ratio CR (0 to 3), for VDD = 2.5 V and VTn = 0.4 V; CR = 1.2 is marked.]
CR is the cell ratio = (W1/L1)/(W5/L5)
Keep cell size minimal while maintaining read stability
- Make M1 minimum size and increase the L of M5 (to make it weaker)
  - increases load on WL
- Make M5 minimum size and increase the W of M1 (to make it stronger)
Similar constraints on (W3/L3)/(W6/L6) when storing a 0
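The shape of the read-voltage curve can be approximated with a short sketch. It assumes the usual first-order analysis: pass transistor M5 velocity-saturated, pull-down M1 in the linear region, equal currents at !Q. The saturation voltage `vdsat_n = 0.63` V is an assumed value for a generic 0.25 µm process and does not appear on the slide; the function name is illustrative.

```python
import math

def read_voltage_rise(cr, vdd=2.5, vtn=0.4, vdsat_n=0.63):
    """Approximate voltage rise on !Q (volts) during a read.

    cr      : cell ratio (W1/L1)/(W5/L5)
    vdsat_n : assumed NMOS saturation voltage (not given on the slide)
    """
    a = vdd - vtn  # gate overdrive of M1 with Q = VDD
    # Closed-form root of the current balance between M5 and M1
    root = math.sqrt(vdsat_n ** 2 * (1 + cr) + (cr * a) ** 2)
    return (vdsat_n + cr * a - root) / cr

if __name__ == "__main__":
    for cr in (0.5, 1.0, 1.2, 1.5, 2.0, 3.0):
        print(f"CR = {cr:3.1f}  rise on !Q = {read_voltage_rise(cr):.3f} V")
```

Under these assumptions the rise drops just below VTn = 0.4 V at CR = 1.2, which matches the point marked on the plot.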
Sp09 CMPEN 411 L23 S.8
SRAM Cell Analysis (Write)
[Figure: write of a 0 into a cell storing Q = 1, !Q = 0. !BL = 2.5 V, BL = 0 V, WL = 1; each bit line loaded by Cbit; devices shown: M1, M4, M5, M6.]
The !Q side of the cell cannot be pulled high enough to ensure writing of a 0 (because M1 is on and sized to protect against read upset). So the new value of the cell has to be written through M6.
- M6 must be able to overpower M4 when storing a 1 and writing a 0
- M5 must be able to overpower M2 when storing a 0 and writing a 1
Sp09 CMPEN 411 L23 S.9
Write Voltage Ratios
[Plot: write voltage VQ (0 to 0.5 V) versus pull-up ratio PR (0 to 2), for VDD = 2.5 V, |VTp| = 0.4 V, µp/µn = 0.5; PR = 1.8 is marked.]
PR is the pull-up ratio = (W4/L4)/(W6/L6)
Keep cell size minimal while allowing writes
- Make M4 and M6 minimum size
Sp09 CMPEN 411 L23 S.10
Cell Sizing and Performance
Keeping cell size minimal is critical for large SRAMs
Minimum-sized pull-down FETs (M1 and M3)
- Requires longer-than-minimum channel length, L, pass transistors (M5 and M6) to ensure proper CR
- But up-sizing L of the pass transistors increases the capacitive load on the word lines and limits the current discharged on the bit lines, both of which can adversely affect the speed of the read cycle
Minimum width and length pass transistors
- Boost the width of the pull-downs (M1 and M3)
- Reduces the loading on the word lines and increases the storage capacitance in the cell (both are good!), but cell size may be slightly larger
Performance is determined by the read operation
- To accelerate the read time, SRAMs use sense amplifiers (so that the bit line doesn’t have to make a full swing)
Sp09 CMPEN 411 L23 S.11
6-T SRAM Layout
[Figure: 6-T SRAM cell layout showing VDD, GND, WL, the two bit lines, pull-downs M1 and M3, pull-ups M2 and M4, and pass transistors M5 and M6.]
Simple and reliable, but big: signal routing and connections to two bit lines, a word line, and both supply rails
Area is dominated by the wiring and contacts
Other alternatives to the 6-T cell include the resistive-load 4-T cell and the TFT cell, neither of which is available in a standard CMOS logic process
Sp09 CMPEN 411 L23 S.12
Multiple Read/Write Port Storage Cell
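[Figure: two-port SRAM cell. The storage core (M1 to M4, nodes Q and !Q) is reached through pass transistors M5 and M6 on bit lines BL1/!BL1 under WL1, and through a second pair M7 and M8 on bit lines BL2/!BL2 under WL2.]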
To avoid read upset, the widths of M1 and M3 will have to be sized up by a factor equal to the number of simultaneously open read ports
Sp09 CMPEN 411 L23 S.13
Resistance-load SRAM Cell
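[Figure: resistance-load SRAM cell. Load resistors RL connect storage nodes Q and !Q to VDD; transistors M1 and M2 form the cross-coupled pair; M3 and M4, gated by WL, connect the cell to BL and !BL.]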
Sp09 CMPEN 411 L23 S.14
Remove R
[Figure: the same cell with the load resistors RL removed, leaving M1, M2, M3, M4, WL, and the two bit lines.]
Sp09 CMPEN 411 L23 S.15
Remove R
[Figure: the cell with RL removed and M1 removed as well, leaving M2, M3, M4, and WL.]
Further remove one transistor
Sp09 CMPEN 411 L23 S.16
3-Transistor DRAM Cell
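[Figure: 3-T DRAM cell with transistors M1, M2, M3, storage node X (capacitance Cs), write bit line BL1, read bit line BL2, write word line WWL, and read word line RWL. Waveforms for a write followed by a read: WWL, BL1 (VDD), X (VDD - VT), RWL, BL2 (VDD - VT, swing ΔV).]
Write: Cs is charged (or discharged) by asserting WWL and BL1
- Value stored at node X when writing a 1 is VWWL - VTn
Read: Cs is “sensed” by asserting RWL and observing BL2
- Read is non-destructive and inverting (ratioless)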
Sp09 CMPEN 411 L23 S.17
3-Transistor DRAM Cell
[Figure: 3-T DRAM cell and write/read waveforms, repeated from the previous slide.]
Refresh: read stored data, put its inverse on BL1 and assert WWL (need to do this every 1 to 4 msec)
Note the VT drop at node X: how to fix it?
Sp09 CMPEN 411 L23 S.18
3-T DRAM Layout
[Figure: 3-T DRAM cell layout showing BL2, BL1, GND, RWL, WWL, and transistors M1, M2, M3.]
Fewer contacts & wires
Total cell area is 576 λ² (compared to 1,092 λ² for the 6-T SRAM cell)
No special processing steps are needed (so compatible with logic CMOS process)
Can use bootstrapping (raise VWWL to a value higher than VDD) to eliminate threshold drop when storing a “1”
Sp09 CMPEN 411 L23 S.19
1-Transistor DRAM Cell
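[Figure: 1-T DRAM cell. Access transistor M1, gated by WL, connects bit line BL (capacitance CBL) to storage node X with capacitor Cs. Waveforms: write of a 1 (WL asserted, BL = VDD, X charges to VDD - VT), then read of a 1 with BL at VDD/2 and sensing.]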
Write: Cs is charged (or discharged) by asserting WL and BL
Read: Charge redistribution occurs between CBL and Cs
Read is destructive, so must refresh after read
Voltage swing is small
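A minimal sketch of the charge-redistribution read, assuming the standard charge-sharing result ΔV = (VX - VPRE) x Cs / (Cs + CBL). The capacitance values in the example are illustrative assumptions, not taken from the slides.

```python
def bitline_swing(v_cell, v_pre, c_s, c_bl):
    """Bit line voltage change after WL connects Cs to the precharged bit line.

    v_cell : voltage stored on node X before the read
    v_pre  : bit line precharge voltage
    c_s    : storage capacitance
    c_bl   : bit line capacitance
    """
    return (v_cell - v_pre) * c_s / (c_s + c_bl)

if __name__ == "__main__":
    vdd, vt = 2.5, 0.4            # supply and threshold (threshold assumed)
    c_s, c_bl = 30e-15, 300e-15   # hypothetical 30 fF cell, 300 fF bit line
    v_pre = vdd / 2
    print(f"read 1: {bitline_swing(vdd - vt, v_pre, c_s, c_bl) * 1e3:+.1f} mV")
    print(f"read 0: {bitline_swing(0.0, v_pre, c_s, c_bl) * 1e3:+.1f} mV")
```

With CBL ten times Cs, the swing is under a tenth of the stored voltage difference, which is why the sense amp is needed for functionality and not just speed.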
Sp09 CMPEN 411 L23 S.20
Sense Amp Operation
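[Plot: bit line voltage VBL versus time t. VBL starts at VPRE; the word line is activated first, then the sense amp is activated and VBL resolves to V(1) or V(0).]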
Sp09 CMPEN 411 L23 S.21
1-T DRAM Cell Observations
Cell is single ended (complicates the design of the sense amp)
Cell requires a sense amp for each bit line due to the charge-redistribution-based read
- BLs are precharged to VDD/2 (not VDD as with SRAM design)
- All previous designs used SAs for speed, not functionality
Cell read is destructive; refresh must follow to restore data
Cell requires an extra capacitor (CS) that must be explicitly included in the design
- May not be compatible with a logic CMOS process
A threshold voltage is lost when writing a 1 (can be circumvented by bootstrapping the word lines to a higher value than VDD)
Sp09 CMPEN 411 L23 S.22
1-T DRAM (3-D capacitor)
[Figure: 1-T DRAM cell with a 3-D capacitor (non-CMOS process). Source: IBM]
Sp09 CMPEN 411 L23 S.23
Peripheral Memory Circuitry
Row and column decoders
Read bit line precharge logic
Sense amplifiers
Timing and control
Speed
Power consumption
Area – pitch matching
Sp09 CMPEN 411 L23 S.24
2D 4x4 __RAM Memory
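[Figure: 4x4 memory array with 2-bit words and differential bit line pairs BL/!BL (BLi, BLi+1); row decoder driving WL[0] to WL[3], column decoder, bit line precharge (read precharge enable), sense amplifiers, write circuitry, and a clocking and control block. Address inputs A0, A1, A2.]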
Sp09 CMPEN 411 L23 S.25
2D 4x4 ___RAM Memory
[Figure: 4x4 memory array with 2-bit words and single-ended bit lines BL0 to BL3; row decoder driving WL[0] to WL[3], column decoder, bit line precharge (read precharge enable), sense amplifiers, write circuitry, and a clocking, control, and refresh block. Address inputs A0, A1, A2.]
Sp09 CMPEN 411 L23 S.26
Row Decoders
Collection of 2^M complex logic gates organized in a regular, dense fashion
(N)AND decoder for 8 address bits
WL(0) = !A7 & !A6 & !A5 & !A4 & !A3 & !A2 & !A1 & !A0
…
WL(255) = A7 & A6 & A5 & A4 & A3 & A2 & A1 & A0
NOR decoder for 8 address bits
WL(0) = !(A7 | A6 | A5 | A4 | A3 | A2 | A1 | A0)
…
WL(255) = !(!A7 | !A6 | !A5 | !A4 | !A3 | !A2 | !A1 | !A0)
Goals: Pitch matched, fast, low power
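The two decoder forms above are logically equivalent (De Morgan). A small behavioral sketch, with illustrative function names, evaluates both forms for every word line and address and checks that exactly one word line is selected.

```python
def wl_and(row, addr, bits=8):
    """(N)AND form: AND of Ai where the row index has a 1, !Ai where it has a 0."""
    selected = True
    for i in range(bits):
        a = (addr >> i) & 1
        want = (row >> i) & 1
        literal = a if want else 1 - a
        selected = selected and bool(literal)
    return int(selected)

def wl_nor(row, addr, bits=8):
    """NOR form: NOR of the complementary literals (Ai for a 0 bit, !Ai for a 1 bit)."""
    any_high = False
    for i in range(bits):
        a = (addr >> i) & 1
        want = (row >> i) & 1
        literal = (1 - a) if want else a
        any_high = any_high or bool(literal)
    return int(not any_high)

if __name__ == "__main__":
    addr = 0b10100101
    active = [r for r in range(256) if wl_nor(r, addr)]
    print("address", addr, "selects word lines", active)
```

This models logic function only; it says nothing about the pitch, speed, or power goals listed above.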
Sp09 CMPEN 411 L23 S.27
Dynamic Decoders
[Figure: two dynamic 2-to-4 decoders, a 2-input NOR decoder and a 2-input NAND decoder. Each has precharge devices connected to VDD, a GND rail, inputs A0, !A0, A1, !A1, and outputs WL0 to WL3.]
Which one is faster? Smaller? Low power?
Sp09 CMPEN 411 L23 S.28
Pass Transistor Based Column Decoder
[Figure: a 2-input NOR decoder (inputs A1, A0) generates select signals S3 to S0, which gate pass transistors connecting BL3 to BL0 to data_out and !BL3 to !BL0 to !data_out.]
Read: connect BLs to the sense amps (SA)
Writes: drive one of the BLs low to write a 0 into the cell
Fast, since there is only one transistor in the signal path. However, there is a large transistor count: (K+1) x 2^K + 2 x 2^K
- For K = 2: 3 x 2^2 (decoder) + 2 x 2^2 (PTs) = 12 + 8 = 20
Sp09 CMPEN 411 L23 S.29
Tree Based Column Decoder
[Figure: binary tree of pass transistors controlled by A0, !A0, A1, !A1, connecting BL3 to BL0 to data_out and !BL3 to !BL0 to !data_out.]
Number of transistors = 2 x 2 x (2^K - 1)
- For K = 2: 2 x 2 x (2^2 - 1) = 4 x 3 = 12
Delay increases quadratically with the number of sections (K), so prohibitive for large decoders
- Can fix with buffers, progressive sizing, or a combination of tree and pass transistor approaches
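The transistor counts quoted for the two column decoder styles can be tabulated for larger K with a short sketch. It uses only the two formulas from the slides; the function names are illustrative.

```python
def pass_transistor_decoder_count(k):
    """NOR pre-decoder ((K+1) * 2^K) plus pass transistors on BL and !BL (2 * 2^K)."""
    return (k + 1) * 2 ** k + 2 * 2 ** k

def tree_decoder_count(k):
    """Binary tree on BL and on !BL: 2 * 2 * (2^K - 1)."""
    return 2 * 2 * (2 ** k - 1)

if __name__ == "__main__":
    print(" K  pass-transistor  tree")
    for k in range(1, 7):
        print(f"{k:2d}  {pass_transistor_decoder_count(k):15d}  {tree_decoder_count(k):4d}")
```

The tree always needs fewer devices, but its signal path passes through K series transistors instead of one, which is the delay penalty noted above.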
Sp09 CMPEN 411 L23 S.30
Bit Line Precharge Logic
[Figure: precharge circuit. !PC drives the precharge devices on BL and !BL and the equalization transistor between them.]
Equalization transistor: speeds up equalization of the two bit lines by allowing the capacitance and pull-up device of the nondischarged bit line to assist in precharging the discharged line
First step of a Read cycle is to precharge (PC) the bit lines to VDD
- Every differential signal in the memory must be equalized to the same voltage level before Read
Turn off PC and enable the WL
- The grounded PMOS load limits the bit line swing (speeding up the next precharge cycle)
Sp09 CMPEN 411 L23 S.31
Sense Amplifiers
Amplification: resolves data with small bit line swings (in some DRAMs required for proper functionality)
Delay reduction: compensates for the limited drive capability of the memory cell to accelerate BL transition
[Figure: sense amplifier (SA) block with input and output.]
tp = (C * ΔV) / Iav
- C is large and Iav is small, so make ΔV as small as possible
Power reduction: eliminates a large part of the power dissipation due to charging and discharging bit lines
Signal restoration: for DRAMs, need to drive the bit lines full swing after sensing (read) to do data refresh
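A quick numeric illustration of the bit line delay relation on this slide, with the bit line swing written as ΔV. The capacitance and cell current below are hypothetical values chosen only to show the scaling.

```python
def bitline_delay(c, delta_v, i_av):
    """tp = C * dV / Iav, with C in farads, dV in volts, Iav in amperes."""
    return c * delta_v / i_av

if __name__ == "__main__":
    c_bl = 1e-12     # assumed 1 pF bit line capacitance
    i_cell = 50e-6   # assumed 50 uA average cell current
    for swing in (2.5, 0.5, 0.25):
        tp = bitline_delay(c_bl, swing, i_cell)
        print(f"swing = {swing:4.2f} V  ->  tp = {tp * 1e9:5.1f} ns")
```

Because C and Iav are set by the array and the cell, cutting the required swing is the only lever left, and the delay falls in direct proportion to it.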
Sp09 CMPEN 411 L23 S.32
Differential Sense Amplifier
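Directly applicable to SRAMs
[Figure: differential sense amplifier with transistors M1 to M5, inputs bit and !bit, sense enable SE, internal node y, output Out, and supply VDD.]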
Sp09 CMPEN 411 L23 S.33
Differential Sensing ― SRAM
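[Figure: (a) SRAM sensing scheme: bit lines BL and !BL with precharge (PC) and equalization (EQ) devices to VDD, SRAM cell i selected by WL i, and a differential sense amp on nodes x and !x, enabled by SE, producing Output. (b) Two-stage differential amplifier: stages built from M1 to M5, enabled by SE, with inputs x and !x, intermediate nodes 2x and !2x, and Output.]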
Sp09 CMPEN 411 L23 S.35
Redundancy in the Memory Structure
Row address
Column address
Redundant row
Redundant columns
Fuse bank
Sp09 CMPEN 411 L23 S.36
Row Redundancy
[Figure: the functional address is compared (== ?) against each fused repair address. A match selects the corresponding redundant wordline; the comparison results also drive the enable of the normal wordline decoder, which drives the normal wordlines.]
Sp09 CMPEN 411 L23 S.37
Column Redundancy
[Figure: eight normal data columns and one redundant data column; a fuse on each output (Data 0 to Data 7) selects which column drives it.]
Sp09 CMPEN 411 L23 S.38
Error-Correcting Codes
Example: Hamming Codes
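e.g. if B3 flips, the three parity checks evaluate to 1, 1, 0, which identifies position 3
2^k >= m + k + 1, where m = number of data bits and k = number of check bits
For 64 data bits, 7 check bits are needed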
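A minimal sketch of both points on this slide: the check-bit bound and single-error location. It assumes the conventional (7,4) Hamming layout with parity bits at positions 1, 2, and 4 and data bits at positions 3, 5, 6, and 7; the function names are illustrative.

```python
def min_check_bits(m):
    """Smallest k with 2^k >= m + k + 1."""
    k = 0
    while 2 ** k < m + k + 1:
        k += 1
    return k

def encode(b3, b5, b6, b7):
    """Return a dict position -> bit; parity bits sit at positions 1, 2, 4."""
    word = {3: b3, 5: b5, 6: b6, 7: b7}
    word[1] = b3 ^ b5 ^ b7   # covers positions with bit 0 of the index set
    word[2] = b3 ^ b6 ^ b7   # covers positions with bit 1 of the index set
    word[4] = b5 ^ b6 ^ b7   # covers positions with bit 2 of the index set
    return word

def syndrome(word):
    """Sum of the failing parity-check weights; 0 means no single-bit error."""
    s = 0
    for p in (1, 2, 4):
        parity = 0
        for pos in range(1, 8):
            if pos & p:
                parity ^= word[pos]
        if parity:
            s += p
    return s

if __name__ == "__main__":
    print("check bits for 64 data bits:", min_check_bits(64))
    w = encode(1, 0, 1, 1)
    w[3] ^= 1  # flip B3
    print("syndrome after flipping B3:", syndrome(w))
```

Flipping the bit at the position named by the syndrome restores the original word.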
Sp09 CMPEN 411 L23 S.39
Performance and area overhead for ECC
Sp09 CMPEN 411 L23 S.40
Redundancy and Error Correction
Sp09 CMPEN 411 L23 S.41
Soft Errors
Nonrecurrent and nonpermanent errors from
alpha particles (from the packaging materials)
neutrons from cosmic rays
As feature size decreases, the charge stored at each node decreases (due to a lower node capacitance and lower VDD) and thus Qcritical (the charge necessary to cause a bit flip) decreases leading to an increase in the soft error rate (SER)
[Plot: system FITs (log scale, 1 to 10,000) versus process technology (0.25, 0.18, 0.13, 0.09, 0.05 µm). From Semico Research Corp.]
MTBF (hours)                0.13 µm   0.09 µm
Ground-based                    895       448
Civilian Avionics System        324       162
Military Avionics System         18         9
From Actel
Sp09 CMPEN 411 L23 S.42
Scary Fact
Avionics system in civilian aviation: an altitude of 30,000 feet and a route crossing the North Pole both increase the neutron flux
If the avionics board uses four 1M 130 nm SRAM-based FPGAs, it would be subject to 0.074 upsets per day
- That is 324 hours between upsets, or about 3 million FITs
Assume one such system on board each commercial aircraft, 4,000 civilian flights per day, and a 3-hour average flight time
- About 37 aircraft per day will experience a neutron-induced SRAM-based FPGA configuration failure during their flight
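The arithmetic behind these numbers can be checked with a short sketch. It takes 1 FIT as one failure per 10^9 device-hours (the standard definition) and uses only the figures quoted on the slide; the function names are illustrative.

```python
def hours_between_upsets(upsets_per_day):
    """Mean time between upsets, in hours."""
    return 24.0 / upsets_per_day

def fits(upsets_per_day):
    """Failures in time: failures per 1e9 hours of operation."""
    return 1e9 / hours_between_upsets(upsets_per_day)

def expected_upsets_per_day(upsets_per_day, flights_per_day, hours_per_flight):
    """Expected number of flights per day that see an upset."""
    flight_hours = flights_per_day * hours_per_flight
    return flight_hours / hours_between_upsets(upsets_per_day)

if __name__ == "__main__":
    rate = 0.074  # upsets per day for the four-FPGA board
    print(f"hours between upsets: {hours_between_upsets(rate):.0f}")
    print(f"FITs: {fits(rate):.2e}")
    print(f"affected flights per day: {expected_upsets_per_day(rate, 4000, 3):.1f}")
```

The estimate treats upsets as uniformly spread over flight hours, so it is an expected count and not a guarantee for any given day.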
Sp09 CMPEN 411 L23 S.43
Modeling of a particle strike
Sp09 CMPEN 411 L23 S.44
A SPICE simulation for SRAM
[Figure: SPICE simulation of a particle strike on an SRAM cell (WL, BL, !BL shown); the two storage nodes flip, 0->1 and 1->0.]
Sp09 CMPEN 411 L23 S.45
On-chip Memory: ITRS roadmap
[Plot: % die utilization (0 to 100) split into Area Reused Logic, Area New Logic, and Area Memory.]
Sp09 CMPEN 411 L23 S.46
State of Art
Sp09 CMPEN 411 L23 S.47
State of Art