1
SRAM - Basics
Jaydeep P. Kulkarni
Slides are adopted from Prof. Kaushik Roy, Prof. Chris Kim , Prof. SaibalMukhopadhyay, and Prof. Jan Rabaey’s slides
2
Semiconductor Memory Trends
From [Itoh01]
3
4
5
6
7
8
9Variation in Process Parameters
Inter and Intra-die Variations
10
100
1000
10000
1000 500 250 130 65 32
Technology Node (nm)
# d
op
ant
ato
ms Source: Intel
Random Dopant Fluctuation (RDF)
• SRAM Transistors – Minimum geometry
• Inter-die and Intra-die variations
• LER, RDF induced device mismatch
Line Edge Roughness (LER)
10Parametric Failures: Read Failure
Read failure => Flipping of Cell Data while Reading
BL BR
WL
VL=‘1’
VR=‘0’VREAD
+∆∆∆∆
-∆∆∆∆
+∆∆∆∆
-∆∆∆∆NR
PR
NL
PLAXRAXL
VTRIPRDVR=VREAD
VL
WL
Vo
ltag
e
Time ->
WL
VR
VL
Vo
ltag
e
Time ->( )TRIPRDREADRF VVPP >=
11Mechanisms of Parametric Failures
WLVR
VL
Vo
ltag
e
Time ->
WL
VR
VL
Vo
ltag
e
Time -> Time ->
VR VL
Vo
ltag
eVDDH
Read Failure
Write Failure Hold Failure
Time ->
∆∆∆∆MIN
WL
BL
BR
Access Failure
Vo
ltag
e
12
13
14
15
16
17
18
19
20
21SRAM Bitcell Layouts
M. Ishida, IEDM 98
Tall-cell
Thin-cell
22
Tall Cell Layout
23
Thin Cell Layout
• 2 Poly-pitch, N-well continuous, Shared Contacts
• WL (Horizontal), BL(Vertical), Lower bitline capacitance
24
Upsizing Thin-cell size
Increasing cell area by 2X increase device widths by 4X !
25
Trends in Memory Cell Area
From [Itoh01]
26
Row
Dec
oder
Bit line2L 2 K
Word line
AK
AK1 1
AL 2 1
A0
M.2K
AK2 1
Sense amplifiers / Drivers
Column decoder
Input-Output(M bits)
Storage cell
Array-Structured Memory ArchitectureProblem: ASPECT RATIO or HEIGHT >> WIDTH
Amplify swing torail-to-rail amplitude
Selects appropriateword
27
Hierarchical Memory Architecture
Advantages:Advantages:1. Shorter wires within blocks1. Shorter wires within blocks2. Block address activates only 1 block => power savings2. Block address activates only 1 block => power savings
Globalamplifier/driver
Controlcircuitry
Global data bus
Block selector
Block 0
Rowaddress
Columnaddress
Blockaddress
Block i Block P 2 1
I/O
28
Memory Architecture: Decoders
Word 0
Word 1
Word 2
Word N2 2
Word N2 1
Storagecell
M bits M bits
N
w o r d s
S0
S1
S2
SN2 2
A 0
A 1
A K2 1
K 5 log2N
SN2 1
Word 0
Word 1
Word 2
Word N2 2
Word N2 1
Storagecell
S0
Input-Output(M bits)
Intuitive architecture for N x M memoryToo many select signals:
N words == N select signals K = log2NDecoder reduces the number of select signals
Input-Output(M bits)
D e c o d e r
29Hierarchical Decoders
• • •
• • •
A2A2
A2A3
WL 0
A2A3A2A3A2A3
A3 A3A 0A0
A0A1A0A1A0A1A0A1
A1 A1
WL 1
Multi-stage implementation improves performance
NAND decoder usingNAND decoder using22--input preinput pre--decodersdecoders
30
Dynamic Decoders
Precharge devices
VDD φφφφ
GND
WL3
WL2
WL1
WL0
A0A0
GND
A1A1φφφφ
WL3
A0A0 A1A1
WL 2
WL 1
WL 0
VDD
VDD
VDD
VDD
2-input NOR decoder 2-input NAND decoder
31
Column Decoder
Advantages: speed (tpd does not add to overall memory access time)Only one extra transistor in signal path
Disadvantage: Large transistor count
2 - i n p u t N O R d e c o d e r
A0S0
BL 0 BL 1 BL 2 BL 3
A1
S1
S2
S3
D
32Tree based column decoder
Number of devices drastically reducedDelay increases quadratically with # of sections; prohibitive for large decoders
buffersprogressive sizingcombination of tree and pass transistor approaches
Solutions:
BL 0 BL 1 BL 2 BL 3
D
A 0
A 0
A1
A 1
33
Sense Amplifiers
tpC ∆V×
Iav----------------=
make ∆V as smallas possible
smalllarge
Idea: Use Sense Amplifer
outputinput
s.a.smalltransition
34
Differential Sense Amplifier
M 4
M 1
M 5
M 3
M 2
V DD
bitbit
SE
Outy
35Differential Sensing ― SRAMVDD
VDD
VDD
VDD
BL
EQ
Diff.SenseAmp
(a) SRAM sensing scheme (b) two stage differential amplifier
SRAM cell i
WL i
2xx
VDD
Output
BL
PC
M3
M1
M5
M2
M4
x
SE
SE
SE
Output
SE
x2x 2x
y
y
2y
36Latch-Based Sense Amplifier
Initialized in its meta-stable point with EQ
Once adequate voltage gap created, sense amp enabled with SEPositive feedback quickly forces output to a stable operating point.
EQ
VDD
BL BL
SE
SE
37Decoupled Sense Amplifier
Y. Wang, Intel, ISSCC’07
38
39
40
41
42
43
44
45
46
47
48Bitline Leakage Compensation - II
M. Khellah, Intel, VLSI’05
• Less transistor count
• Easy to implement in within column pitch
• No complicated timing
• Calibration after every write
49
50
51