semiconductor memories mohammad sharifkhani. outline introduction non-volatile memories
TRANSCRIPT
Semiconductor Memories
Mohammad Sharifkhani
Outline
• Introduction
• Non-volatile memories
Semiconductor Memory Classification
Read-Write MemoryNon-VolatileRead-Write
Memory
Read-Only Memory
EPROM
E2PROM
FLASH
RandomAccess
Non-RandomAccess
SRAM
DRAM
Mask-Programmed
Programmable (PROM)
FIFO
Shift Register
CAM
LIFO
Memory Timing: Definitions
Write cycleRead access Read access
Read cycle
Write access
Data written
Data valid
DATA
WRITE
READ
Memory Architecture: Decoders
Word 0
Word 1
Word 2
WordN22
WordN21
Storagecell
M bits M bits
S0
S1
S2
SN - 2
A0
A1
AK-1
K = log2N
SN- 1
Word 0
Word 1
Word 2
WordN22
WordN21
Storagecell
S0
Input-Output(M bits)
Intuitive architecture for N x M memoryToo many select signals:
N words == N select signals K = log2NDecoder reduces the number of select signals
Input-Output(M bits)
Array-Structured Memory ArchitectureProblem: ASPECT RATIO or HEIGHT >> WIDTH
Amplify swing torail-to-rail amplitude
Selects appropriateword
Row
Dec
oder
Bit line2m
Word line
An
An+1
An+m
A0
M.2n
An - 1Column decoder
Input-Output(M bits)
Storage cell in the accessed
word
Hierarchical Memory Architecture
Advantages:Advantages:1. Shorter wires within blocks1. Shorter wires within blocks2. Block address activates only 1 block => power savings2. Block address activates only 1 block => power savings
Globalamplifier/driver
Controlcircuitry
Global data bus
Block selector
Block 0
Rowaddress
Columnaddress
Blockaddress
Blocki BlockP 2 1
I/O
Block Diagram of 4 Mbit SRAM
Clockgenerator
CS, WEbuffer
I/Obuffer
Y-addressbuffer
X-addressbuffer
x1/x4controller
Z-addressbuffer
X-addressbuffer
Predecoder and block selectorBit line load
Transfer gateColumn decoder
Sense amplifier and write driver
[Hirose90]
Contents-Addressable Memory
Ad
dre
ss D
eco
de
r
Data (64 bits)
I/O
Bu
ffe
rs
Comparand
CAM Array29 words3 64 bits
Mask
Control LogicR/W Address (9 bits)
Co
mm
an
ds
29 Va
lid
ity
Bit
s
Prio
rity
En
cod
er
Memory Timing: Approaches
DRAM TimingMultiplexed Adressing
SRAM TimingSelf-timed
Addressbus
RAS
RAS-CAS timing
Row Address
AddressBus
Address transitioninitiates memory operation
Address
Column Address
CAS
• Introduction
• Non volatile memories
Non-Volatile MemoriesThe Floating-gate transistor
(FAMOS)
Floating gate
Source
Substrate
Gate
Drain
n+ n+_p
tox
tox
Device cross-section Schematic symbol
G
S
D
Floating-Gate Transistor Programming
0 V
- 5 V 0 V
DS
Removing programming voltage leaves charge trapped
5 V
- 2.5 V 5 V
DS
Programming results in higher VT.
20 V
10 V 5 V 20 V
DS
Avalanche injection
A “Programmable-Threshold” Transistor
“0”-state “1”-state
DVT
VWL VGS
“ON”
“OFF”
FLOTOX EEPROM
Floating gate
Source
Substratep
Gate
Drain
n1 n1
FLOTOX transistorFowler-Nordheim I-V characteristic
20–30 nm
10 nm
-10 V
10 V
I
VGD
EEPROM Cell
WL
BL
VDD
Absolute threshold controlis hardUnprogrammed transistor might be depletion always on 2 transistor cell
Flash EEPROM
Control gate
erasure
p-substrate
Floating gate
Thin tunneling oxide
n1source n1drainprogramming
Many other options …
Cross-sections of NVM cells
EPROMFlashCourtesy Intel
Basic Operations in a NOR Flash Memory―
Erase
S D
12 VG
cell arrayBL0 BL1
open open
WL0
WL1
0 V
0 V
Basic Operations in a NOR Flash Memory―
Write
S D
12 V
6 VG
BL0 BL1
6 V 0 V
WL0
WL1
12 V
0 V
Basic Operations in a NOR Flash Memory―
Read5 V
1 VG
S D
BL0 BL1
1 V 0 V
WL0
WL1
5 V
0 V
NAND Flash Memory
Unit Cell
Word line(poly)
BL
Courtesy Toshiba
Gate
ONO
FGGateOxide
Select line
Select line
Source line(Diff. Layer)
NAND Flash Memory
Word linesSelect transistor
Bit line contact Source line contact
Active area
STI
Courtesy Toshiba
Characteristics of State-of-the-art NVM
Outline
• Introduction
• Non-volatile memories
• RAM
Read-Write Memories (RAM) STATIC (SRAM)
DYNAMIC (DRAM)
Data stored as long as supply is appliedLarge (6 transistors/cell)FastDifferential
Periodic refresh requiredSmall (1-3 transistors/cell)SlowerSingle Ended
6-transistor CMOS SRAM Cell
WL
BL
VDD
M5M6
M4
M1
M2
M3
BL
CMOS SRAM Analysis (Read)WL
BL
VDD
M 5
M 6
M 4
M1VDDVDD VDD
BL
Q = 1Q = 0
Cbit Cbit
CMOS SRAM Analysis (Read)
0
0
0.2
0.4
0.6
0.8
1
1.2
0.5 1 1.2 1.5 2Cell Ratio (CR)
2.5 3
Vo
ltage
Ris
e (
V)
CMOS SRAM Analysis (Write)
BL = 1 BL = 0
Q = 0
Q = 1
M1
M4
M5
M6
VDD
VDD
WL
CMOS SRAM Analysis (Write)
6T-SRAM — Layout
VDD
GND
WL
BLBL
M1 M3
M4M2
M5 M6
Decreasing Word Line Delay
Metal bypass
Polysilicon word lineK cells
Polysilicon word lineWL
Driver
(b) Using a metal bypass
(a) Driving the word line from both sides
Metal word line
WL
(c) Use silicides
Resistance-load SRAM Cell
Static power dissipation -- Want RL largeBit lines precharged to VDD to address tp problem
M3
RL RL
VDD
WL
Q Q
M1 M2
M4
BL BL
SRAM Characteristics
• Introduction
• Non-volatile memories
• RAM– SRAM– DRAM
3-Transistor DRAM Cell
No constraints on device ratiosReads are non-destructiveValue stored at node X when writing a “1” = VWWL-VTn
WWL
BL1
M1 X
M3
M2
CS
BL2
RWL
VDD
VDD 2VT
DVVDD 2VTBL2
BL1
X
RWL
WWL
3T-DRAM — Layout
BL2 BL1 GND
RWL
WWL
M3
M2
M1
1-Transistor DRAM Cell
Write: CS is charged or discharged by asserting WL and BL.Read: Charge redistribution takes places between bit line and storage capacitance
Voltage swing is small; typically around 250 mV.
M1
CS
WL
BL
CBL
VDD2 VT
WL
X
sensing
BL
GND
Write 1 Read 1
VDD
VDD /2 V
V BL VPRE– VBIT VPRE–CS
CS CBL+------------= =V
DRAM Cell Observations 1T DRAM requires a sense amplifier for each bit line, due to charge redistribution read-out. DRAM memory cells are single ended in contrast to SRAM cells.The read-out of the 1T DRAM cell is destructive; read and refresh operations are necessary for correct operation. Unlike 3T cell, 1T cell requires presence of an extra capacitance that must be explicitly included in the design. When writing a “1” into a DRAM cell, a threshold voltage is lost. This charge loss can be circumvented by bootstrapping the word lines to a higher value than VDD
Sense Amp Operation
DV(1)
V(1)
V(0)
t
VPRE
VBL
Sense amp activatedWord line activated
1-T DRAM Cell
Uses Polysilicon-Diffusion Capacitance
Expensive in Area
M1 wordline
Diffusedbit line
Polysilicongate
Polysiliconplate
Capacitor
Cross-section Layout
Metal word line
Poly
SiO2
Field Oxiden+ n+
Inversion layerinduced byplate bias
Poly
SEM of poly-diffusion capacitor 1T-DRAM
Advanced 1T DRAM Cells
Cell Plate Si
Capacitor Insulator
Storage Node Poly
2nd Field Oxide
Refilling Poly
Si Substrate
Trench Cell Stacked-capacitor Cell
Capacitor dielectric layerCell plateWord line
Insulating Layer
IsolationTransfer gate
Storage electrode
GND
Static CAM Memory Cell
CAM
Bit
Word
Bit
••• CAM
Bit Bit
CAM
Word
Wired-NOR Match Line
Match M1
M2
M7M6
M4 M5M8 M9
M3int
SWord
••• CAM
Bit Bit
S
CAM in Cache Memory
CAM
ARRAY
Input Drivers
Tag HitAddress
SRAM
ARRAY
Sense Amps / Input Drivers
DataR/W
• Introduction
• Non-volatile memories
• RAM
• Periphery circuits
Periphery
Decoders Sense Amplifiers Input/Output Buffers Control / Timing Circuitry
Row DecodersCollection of 2M complex logic gatesOrganized in regular and dense fashion
(N)AND Decoder
NOR Decoder
Hierarchical Decoders
• • •
• • •
A2A2
A 2A3
WL 0
A2A3A2A 3A2A3
A3 A3A 0A0
A0A 1A 0A1A0A1A0A1
A 1 A1
WL 1
Multi-stage implementation improves performance
NAND decoder usingNAND decoder using2-input pre-decoders2-input pre-decoders
Dynamic Decoders
Precharge devices
VDD
GND
WL3
WL2
WL1
WL0
A0A0
GND
A1A1
WL3
A0A0 A1A1
WL 2
WL 1
WL 0
VDD
VDD
VDD
VDD
2-input NOR decoder 2-input NAND decoder
Active low inputs (all are high except for the selected WL which is low)
4-input pass-transistor based column decoder
Advantages: speed (tpd does not add to overall memory access time) Only one extra transistor in signal pathDisadvantage: Large transistor count
A0S0
BL 0 BL 1 BL 2 BL 3
A1
S1
S2
S3
D
4-to-1 tree based column decoder
Number of devices drastically reducedDelay increases quadratically with # of sections; prohibitive for large decoders
buffersprogressive sizingcombination of tree and pass transistor approaches
Solutions:
BL 0 BL 1 BL 2 BL 3
D
A 0
A 0
A1
A 1
Decoder for circular shift-register
VDD
VDD
R
WL0
VDD
f
ff
f
VDD
R
WL1
VDD
f
ff
f
VDD
R
WL2
VDD
f
ff
f• • •
Sense Amplifiers
tpC V
Iav----------------=
make V as smallas possible
smalllarge
Idea: Use Sense Amplifer
outputinput
s.a.smalltransition
Differential Sense Amplifier
Directly applicable toSRAMs
M4
M1
M5
M3
M2
VDD
bitbit
SE
Outy
Differential Sensing ― SRAMVDD
VDD
VDD
VDD
BL
EQ
Diff.SenseAmp
(a) SRAM sensing scheme (b) two stage differential amplifier
SRAM cell i
WL i
2xx
VDD
Output
BL
PC
M3
M1
M5
M2
M4
x
SE
SE
SE
Output
SE
x2x 2x
Latch-Based Sense Amplifier (DRAM)
Initialized in its meta-stable point with EQOnce adequate voltage gap created, sense amp enabled with SEPositive feedback quickly forces output to a stable operating point.
EQ
VDD
BL BL
SE
SE
Charge-Redistribution Amplifier
0.5
1.0
1.5
2.0
2.5
0.0
0.0 1.00 2.00time (nsec)
V
Vin
Vref5 3V
VL
VS
(b) Transient response
3.00
Concept
M2 M3
M1VL VS
Vref
CsmallClarge
Transient Response
Charge-Redistribution Amplifier―EPROM
SE
VDD
WLC
Load
Cascodedevice
Columndecoder
EPROMarray
BL
WL
Vcasc
Out
Cout
Ccol
CBLM1
M2
M3
M4
Single-to-Differential Conversion
How to make a good Vref?
Diff.S.A.Cell
2xx
Output
WL
Vref
BL
12
Open bitline architecture with dummy cells
CS CS CS CS
BLL
L L1 L0 R0
CS
R1
CS
L
… …
BLR
VDD
SE
SE
EQ
Dummy cell Dummy cell
DRAM Read Process with Dummy Cell
3
2
1
00 1 2 3
BL
BL
t (ns)
reading 03
2
1
00 1 2 3
SE
EQ WL
t (ns)
control signals
3
2
1
00 1 2 3
BL
BL
t (ns)
reading 1
Voltage Regulator
-
+
VDD
VREF
Vbias
Mdrive
Mdrive
VDL
VDL
VREF
Equivalent Model
Charge Pump
CLK
VDD
A BM1
M2Vload
Cload
Cpump
2VDD 2 VT
VDD 2 VT
0 V
VB
Vload
0 V
Q=Cpump (VDD-Vt)
-
-
DRAM Timing
SDRAM Timing
A chunk of data is processed at the same time effective when data is written in large sequential blocks
RDRAM Architecture
memoryarray
Databus
Clocks
Column
Rowdemux packet dec.
packet dec.
Bus
k kx l
demux
Rambus DRAM to reduce the access time
Synch. DRAM
Operates at uP clock speed
up to 1.6 GB/sec bandwidth
Highly parallel: A large number of bits can be read/write at the same time interface ; fast and synch
Address Transition Detection
DELAYtdA0
DELAYtdA1
DELAYtdAN2 1
VDD
ATD ATD
…
• Introduction
• Non-volatile memories
• RAM
• Periphery
• Reliability
Reliability and Yield
Sensing Parameters in DRAM
From [Itoh01]
4K
10
100
1000
64K 1M 16M256M 4G 64GMemory Capacity (bits/chip)
CD(1F)
CS(1F)
QS(1C)
Vsmax(mv)
VDD(V)
QS= CS VDD/2Vsmax= QS/(CS 1 CD)
Noise Sources in 1T DRam
Ccross
electrode
a-particles
leakage CS
WL
BL substrate Adjacent BL
CWBL
Open Bit-line Architecture —Cross Coupling
SenseAmplifierC
WL1
BL
CBL
CWBL CWBL
CC
WL0
CCBL
C C
WLD WLD WL0 WL1
BL
EQ
Folded-Bitline Architecture
SenseAmplifier
C
WL1
CWBL
CWBL
C
WL0 WL0 WLD
CC
WL1
CC
WLD
BL CBL
BL CBL
EQ
x
x
y
Transposed-Bitline Architecture
SA
Ccross
(a) Straightforward bit-line routing
(b) Transposed bit-line architecture
BL9
BL
BL
BL99
SA
Ccross
BL9
BL
BL
BL99
Alpha-particles (or Neutrons)
1 Particle ~ 1 Million Carriers
WL
BL
VDD
n1
a-particle
SiO21
111
11
22
22
22
Yield
Yield curves at different stages of process maturity(from [Veendrick92])
Redundancy
MemoryArray
Column Decoder
Redundantrows
Redundantcolumns
RowAddress
ColumnAddress
FuseBank:
Error-Correcting Codes
Example: Hamming Codes
with
e.g. B3 Wrong
1
1
0
= 3
Redundancy and Error Correction
Sources of Power Dissipation in Memories
PERIPHERY
ROWDEC
selected
non-selected
CHIP
COLUMN DEC
nCDEV INTf
mCDEV INTf
CPTV INTf
IDCP
ARRAY
m
n
m(n- 1)ihld
miact
VDD
VSS
IDD = ΣCiΔV if+Σ IDCP
From [Itoh00]
Data Retention in SRAM1.30u
1.10u
900n
700n
500n
300n
100n
0.00 .600 1.20 1.80
Factor 7
0.13 m CMOSm
0.18 m CMOSm
VDD
I lea
kag
e
SRAM leakage increases with technology scaling
Suppressing Leakage in SRAM
SRAMcell
SRAMcell
SRAMcell
VDD,int
VDD
VDD VDDL
VSS,int
sleep
sleep
SRAMcell
SRAMcell
SRAMcell
VDD,int
sleep
low-threshold transistor
Reducing the supply voltageReducing the supply voltageInserting Extra ResistanceInserting Extra Resistance
Data Retention in DRAM101
100
102 1
102 2
102 3
102 4
102 5
102 6
15M 64M 255M 1G 4G 15G 64G
Capacity (bit)
Curr
ent
(A)
3.3 2.5 2.0 1.5 1.2 1.0 0.8
Operating voltage (V)
0.53 0.40 0.32 0.24 0.19 0.16 0.13
Extrapolated threshold voltage at 25 C (V)
IACT
IAC
IDC
Cycle time : 150 nsT 5 75 C,S
From [Itoh00]
Case Studies
• SRAM
• Flash Memory
4 Mbit SRAMHierarchical Word-line Architecture
Global word line
Sub-global word line
Block groupselect
Blockselect
Blockselect
Memory cell
Localword line
Block 0
•••
Localword line
Block 1
•••
Block 2...
•••
Bit-line Circuitry
Bit-lineload
Blockselect ATD
BEQ
Local WL
Memory cell
I/O lineI/O
B/T
CD
Sense amplifier
CD CD
I/O
B/T
Sense Amplifier (and Waveforms)
BS
I /O I /O
DATA
Blockselect ATD
BSSA SA
BS
SEQ
SEQ
SEQ
SEQSEQ
Dei
I/O Lines
Address
Data-cut
ATD
BEQ
SEQ
DATA
Vdd
GND
SA, SA
Vdd
GND
1 Gbit Flash Memory
Sense Latches(10241 32)3 8
Data Caches(10241 32)3 8
Sense Latches(10241 32)3 8
Data Caches(10241 32)3 8
Wo
rd L
ine
Dri
ver
Wo
rd L
ine
Dri
ver
Wo
rd L
ine
Dri
ver
Wo
rd L
ine
Dri
ver
512Mb Memory Array 512Mb Memory Array
BL0 BL1 ····· BL16895 BL16996 BL16897··· BL33791
SGDWL31
WL0SGS
Block0
BLT0
Block1023
Block0
Block1023
Bit Line Control CircuitBLT1
I/O
From [Nakamura02]
Writing Flash MemoryN
um
be
r of
me
mo
ry c
ell
s
0V 1V 2V
Vt of memory cells
Verify level5 0.8 V Word-line level5 4.5 V
(a)
3V 4V
Result of 4 timesprogram
100
0V 1V 2V
Vt of memory cells
3V 4V
102
104
106
108
Evolution of thresholds Final Distribution
From [Nakamura02]
125mm2 1Gbit NAND Flash Memory
10.7
mm
11.7mm
2kB
Pa
ge b
uffe
r &
ca
che
Ch
arg
e p
ump
16896 bit lines
32 word lines x 1024 blocks
From [Nakamura02]
125mm2 1Gbit NAND Flash Memory
• Technology 0.13m p-sub CMOS triple-well 1poly, 1polycide, 1W, 2Al• Cell size 0.077m2• Chip size 125.2mm2• Organization 2112 x 8b x 64 page x 1k block• Power supply 2.7V-3.6V• Cycle time 50ns• Read time 25s• Program time 200s / page• Erase time 2ms / block
• Technology 0.13m p-sub CMOS triple-well 1poly, 1polycide, 1W, 2Al• Cell size 0.077m2• Chip size 125.2mm2• Organization 2112 x 8b x 64 page x 1k block• Power supply 2.7V-3.6V• Cycle time 50ns• Read time 25s• Program time 200s / page• Erase time 2ms / block
From [Nakamura02]
Semiconductor Memory Trends(up to the 90’s)
Memory Size as a function of time: x 4 every three years
Semiconductor Memory Trends(updated)
From [Itoh01]
Trends in Memory Cell Area
From [Itoh01]
Future generations
• Very specialized technologies for stand alone memories expensive
• Reliability is going to be a very important issue (SER) particularly for SRAMs and DRAMs
• Power is going to be the limiting factor particularly when it comes to standby currents
• Embedded memories is the prominent market thrust driven by all mobile/SoC applications