RAM (Random Access Memory)
Speaker: Lung-Sheng Chien
Reference:
[1] Bruce Jacob, Spencer W. Ng, David T. Wang, Memory Systems: Cache, DRAM, Disk
[2] Hideo Sunami, The invention and development of the first trench-capacitor DRAM cell, http://www.cmoset.com/uploads/4.1-08.pdf
[3] JEDEC Standard: DDR2 SDRAM Specification
[4] John P. Uyemura, Introduction to VLSI Circuits and Systems
[5] Benson, University Physics
Outline
• Preliminary
- parallel-plate capacitor
- RC circuit
- MOSFET
• DRAM cell
• DRAM device
• DRAM access protocol
• DRAM timing parameter
• DDR SDRAM
Typical PC organization: the cache uses SRAM; main memory uses DRAM.
Basic organization of DRAM internals
DRAM versus SRAM

                     Dynamic RAM    Static RAM
Cost                 Low            High
Speed                Slow           Fast
# of transistors     1              6
Density              High           Low
Target               Main memory    Cache
• Random access: each location in
memory has a unique address. The
time to access a given location is
independent of the sequence of
prior accesses and is constant.
DRAM cell (1T1C cell)
SRAM cell
Question 1: what is a capacitor?
Question 2: what is a transistor?
Electric flux
Electric flux = number of field lines passing through a surface.
1. For a uniform electric field, the electric flux is defined by
$\Phi_E = \vec{E} \cdot \vec{A}$
2. If the surface is not flat or the field is not uniform, then one must sum the contributions of all tiny elements of area:
$\Phi_E \approx \vec{E}_1 \cdot \Delta\vec{A}_1 + \vec{E}_2 \cdot \Delta\vec{A}_2 + \cdots = \sum_j \vec{E}_j \cdot \Delta\vec{A}_j \to \int \vec{E} \cdot d\vec{A}$
Gauss's Law
When the flux leaving a surface equals the flux entering it, the net flux is 0:
$\Phi_E = \oint \vec{E} \cdot d\vec{A} = 0$
Gauss's Law: the net flux through a closed surface is proportional to the net charge enclosed by the surface:
$\Phi_E = \oint \vec{E} \cdot d\vec{A} = \frac{Q_{enc}}{\varepsilon_0}$
$Q_{enc}$ = net charge enclosed by a closed surface
$\varepsilon_0 = 8.85 \times 10^{-12}\ \mathrm{C^2/(N \cdot m^2)}$ (equivalently F/m): permittivity of free space
Conductor
• When a net charge is added to a conductor, free electrons redistribute themselves in a short time (~1 ps, where 1 ps = $10^{-12}$ s) such that the internal electric field is 0.
• If we draw a Gaussian surface (dashed line) inside a conductor, then zero flux implies zero charge inside the conductor.
Example: infinite conducting plate (metal) with a Gaussian pillbox
$\Phi_E = \oint \vec{E} \cdot d\vec{A} = E_{upper} A_{upper} + E_{down} A_{down} = EA$
$Q_{enc} = \sigma A$, where $\sigma$ = surface charge density
$\Phi_E = \oint \vec{E} \cdot d\vec{A} = \frac{Q_{enc}}{\varepsilon_0} \Rightarrow E = \frac{\sigma}{\varepsilon_0}$
Capacitor: parallel plate [1]
Consider two parallel plates (metal) with total charge Q on each plate respectively, separated by distance d.
Surface charge density: $\sigma = \frac{Q/2}{A}$
Field at the surface of each plate taken alone: $E = \frac{\sigma}{\varepsilon_0}$
[Figure: field of the positively charged plate, field of the negatively charged plate, and their superposition]
Field between the two plates: $E = \frac{2\sigma}{\varepsilon_0}$
Capacitor: parallel plate [2]
In most cases, we don't care about the thickness of the plate. For simplicity we may assume the plate has no thickness (a flat sheet), with total charge Q on each plate respectively. Then the definition of surface charge density is different:
Surface charge density: $\sigma = \frac{Q}{A}$
For a single sheet: $\Phi_E = \oint \vec{E} \cdot d\vec{A} = E_1 A_1 + E_2 A_2 = \frac{Q}{\varepsilon_0} \Rightarrow E_1 = E_2 = \frac{\sigma}{2\varepsilon_0}$
Between the two sheets, separated by distance d: $E = \frac{\sigma}{\varepsilon_0}$
Capacitance [1]
[Figure: voltage source V in series with R (resistance) and C (capacitor)]
Kirchhoff's voltage law: $V = V_R + V_C$, with $V_R = I(t) \cdot R$, $V_C = \frac{Q(t)}{C}$, $I(t) = \frac{dQ(t)}{dt}$
$T = 0$, no charge on the capacitor: $V_C = 0$, $I(0) = \frac{V}{R}$ charges the capacitor
$T > 0$, the capacitor has some charge: $V_C > 0$, $I(t) < \frac{V}{R}$ still charges the capacitor
$T \gg 1$, the capacitor contains maximum charge: $V_C = V$, $I(t) = 0$ doesn't charge the capacitor anymore
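Capacitance [2]
Between the plates: $E = \frac{\sigma}{\varepsilon_0}$, $V_C = E d$
Capacitance is the capability of storing charge. It is defined by
$C = \frac{Q}{V_C} = \frac{\varepsilon_0 A}{d}$
1. $C \propto \varepsilon_0$, since from Gauss's law $\Phi_E = \oint \vec{E} \cdot d\vec{A} = \frac{Q_{enc}}{\varepsilon_0}$, so $E \propto \frac{1}{\varepsilon_0}$
2. $C \propto \frac{1}{d}$, since if we fix total charge Q and area A, then $\sigma = \frac{Q}{A}$ is fixed $\Rightarrow E = \frac{\sigma}{\varepsilon_0}$ is fixed $\Rightarrow V = E d \propto d$
3. $C \propto A$, since if we fix potential difference V and spacing d, then $E = \frac{V}{d}$ is fixed $\Rightarrow \sigma$ is fixed due to $E = \frac{\sigma}{\varepsilon_0}$ $\Rightarrow Q = \sigma A \propto A$
The electric field is not uniform near the edge; this is called the fringe field.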
Capacitance [3]
Suppose we add an insulator between the parallel metal plates. What happens to the capacitor?
• No charge on the capacitor: nothing happens.
• When charge is stored on the capacitor, the electric field separates positive and negative charge inside the insulator, forming dipoles.
Dipole moment: $\vec{p} = q\vec{d}$ (charges $+q$ and $-q$ separated by $\vec{d}$)
$E_0$: field produced by charge on the capacitor
$E_i$: field induced by the separated charge of the insulator
$E_D$: net field within the insulator (dielectric)
Capacitance [4]
Dipole moment: $\vec{p} = q\vec{d}$ (charges $+q$ and $-q$ separated by $\vec{d}$)
Polarization: $P = \frac{\text{dipole moment}}{\text{unit volume}}$
Constitutive equation: $P = \varepsilon_0 \chi_e E_{total}$, where $\chi_e$ is the electric susceptibility
$E_{total} = E_{ext} - \frac{P}{\varepsilon_0} \Rightarrow E_{total} = \frac{1}{1 + \chi_e} E_{ext} \equiv \frac{1}{\varepsilon_r} E_{ext}$
$\varepsilon_r$: dielectric constant

Material                   Dielectric constant   Material                  Dielectric constant
Vacuum                     1                     Benzene                   2.28
Silicon dioxide            3.9                   Diamond                   5.7
Ta2O5                      25                    Salt                      5.9
BST                        >200                  Silicon                   11.8
TiO2 (titanium dioxide)    85                    Methanol                  33
ZrO2                       23                    SrZrO3                    30
Al2O3                      9.1                   La2O3 (lanthanum oxide)   16
HfO2 (hafnium oxide)       25                    Water                     80.1
BaTiO3 (barium titanate)   3000~8000             KTaNbO3                   34000
Capacitance [5]
Without dielectric: $E_{ext} = \frac{\sigma}{\varepsilon_0}$, $V_C = E_{ext} d$, $C_0 = \frac{Q}{V_C} = \frac{\varepsilon_0 A}{d}$
With insulator (dielectric): $E = \frac{1}{\varepsilon_r} E_{ext}$, $V_C = E d$, $C = \frac{Q}{V_C} = \frac{\varepsilon_r \varepsilon_0 A}{d} = \varepsilon_r C_0$
1. Keeping all geometrical parameters (area A and height d) fixed, we can add an insulator to increase the capacitance of the capacitor.
2. The insulator induces polarization that cancels part of the external field, so a smaller voltage gap can store the same charge. In other words, the capability of charge storage increases, so the capacitance also increases.
3. Design parameters of a capacitor are
- Area of plate: A
- Distance between the two plates: d
- Dielectric constant: $\varepsilon_r$
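The three design parameters can be explored numerically. The following is a minimal sketch of the formula $C = \varepsilon_r \varepsilon_0 A / d$, ignoring the fringe field. The function name `plate_capacitance` and the plate geometry (1 µm × 1 µm plates, 5 nm gap) are assumptions chosen only for illustration; the dielectric constants are taken from the table of materials.

```python
# Ideal parallel-plate capacitance: C = eps_r * eps_0 * A / d (fringe field ignored).
EPS_0 = 8.85e-12  # permittivity of free space, F/m


def plate_capacitance(area_m2, gap_m, eps_r=1.0):
    """Return the capacitance in farads of an ideal parallel-plate capacitor."""
    return eps_r * EPS_0 * area_m2 / gap_m


# Hypothetical geometry, for illustration only: 1 um x 1 um plates, 5 nm apart.
area = 1e-6 * 1e-6
gap = 5e-9

c_vacuum = plate_capacitance(area, gap)           # eps_r = 1
c_sio2 = plate_capacitance(area, gap, eps_r=3.9)  # silicon dioxide
c_hfo2 = plate_capacitance(area, gap, eps_r=25.0)  # hafnium oxide

for name, c in (("vacuum", c_vacuum), ("SiO2", c_sio2), ("HfO2", c_hfo2)):
    print(f"{name}: {c * 1e15:.2f} fF")
```

With the geometry held fixed, the capacitance scales linearly with the dielectric constant, which is why high-$\varepsilon_r$ materials are attractive when the plate area cannot grow.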
RC circuit
Kirchhoff's voltage law: $V = V_R + V_C$, with $V_R = I(t) \cdot R$, $V_C = \frac{Q(t)}{C}$, $I(t) = \frac{dQ(t)}{dt}$
First-order ODE: $V = R\frac{dQ}{dt} + \frac{Q}{C}$, with initial condition $Q(0) = q$
1. Charging: $Q(0) = 0$ with $V = R\frac{dQ}{dt} + \frac{Q}{C}$ gives
$V_C = V\left[1 - \exp\left(-\frac{t}{RC}\right)\right]$
2. Discharging: $Q(0) = CV$ with $0 = R\frac{dQ}{dt} + \frac{Q}{C}$ gives
$V_C = V\exp\left(-\frac{t}{RC}\right)$
Typical time: $T = RC$ (RC time constant)
For the discharging case, when $t = RC$, $V_C = 0.37V$
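As a quick numerical check of the two solutions, the sketch below evaluates the capacitor voltage at a few multiples of the time constant. The component values (R = 10 kΩ, C = 30 fF, V = 1.8 V) and the function names are assumptions for illustration only, not values from the slides.

```python
import math


def v_charging(t, v_supply, r, c):
    """Capacitor voltage while charging from Q(0) = 0."""
    return v_supply * (1.0 - math.exp(-t / (r * c)))


def v_discharging(t, v_supply, r, c):
    """Capacitor voltage while discharging from Q(0) = C * V."""
    return v_supply * math.exp(-t / (r * c))


# Hypothetical component values, for illustration only.
R, C, V = 10e3, 30e-15, 1.8
tau = R * C  # RC time constant

for k in (0.5, 1, 2, 5):
    t = k * tau
    print(f"t = {k} RC: charging {v_charging(t, V, R, C):.3f} V, "
          f"discharging {v_discharging(t, V, R, C):.3f} V")
```

At t = RC the discharging voltage is $V e^{-1} \approx 0.37V$, matching the figure quoted above, and the two curves always sum to V.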
CMOS inverter
FET (Field-Effect Transistor)
Logical symbol and truth table: input $x$, output $\bar{x}$
$x = 0 \Rightarrow \bar{x} = 1$; $x = 1 \Rightarrow \bar{x} = 0$
[Figure: current flow through the inverter]
MOSFET (Metal-Oxide-Semiconductor) [1]
[Figure: top view and side view of a MOSFET, showing the polysilicon (poly) gate and the SiO2 layer]
L: channel length, also called feature size, down to 45 nm so far
http://ezphysics.nchu.edu.tw/prophys/electron/lecturenote/7_5.pdf
MOSFET [2]
[Figure: nFET cross section and pFET cross section, each with G (gate), S (source) and D (drain)]
Typical oxide thickness: $t_{ox} = 5$ nm
Typical gate capacitance: $C_G \sim$ fF (femtofarad, $10^{-15}$ F)
MOSFET operation [1]
Zero gate voltage: open switch.
[Figure: nFET with n+ source and drain regions in a p substrate, channel length L and width W]
pn junction: forward bias gives forward current; reverse bias gives reverse blocking.
n+ / p / n+ = two pn junctions, so no current flows.
MOSFET operation [2]
Positive gate voltage: closed switch.
An electron channel forms under the gate; current flows through the thin electron channel from source to drain.
[Figure: high gate voltage → open switch; negative gate voltage → closed switch]
MOSFET layers in an n-well process
CMOSFET layers
Metal interconnect layers
Outline
• Preliminary
• DRAM cell
- 1T1C structure
- trench capacitor, stack capacitor
- array structure
- sense amplifier
- read / write operation
• DRAM device
• DRAM access protocol
• DRAM timing parameter
• DDR SDRAM
DRAM cell
DRAM cell = cell transistor + storage capacitor
[Figure: the cell modeled as an RC circuit with source V, resistance R and capacitor C: (1) charging, (2) leakage]
Scaling of memory cell and die size of DRAM
Storage capacitance should be kept constant despite the cell scaling to provide
adequate operational margin with sufficient signal-to-noise ratio
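To address the problem the capacitor faces as the process scales down, that is, to increase the area of the parallel plates without increasing the cell size, there are two process approaches that keep the capacitance above the allowable value: the trench capacitor and the stack capacitor.
[Figure: trench capacitor]
[Figure: stack capacitor]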
Popular model of DRAM cell
A scaling limit of capacitor structure
The dielectric film should be physically thin enough not to fill up the trench:
$2 T_i < F$
F: feature size
$T_i$: dielectric film thickness
Cross-section of storage node DRAM capacity (bits/die)
After K. Itoh, H. Sunami, K. Nakazato, and M. Horiguchi, ECS Spring Meeting, May 4, 1998
Objective: decrease feature size to increase density of DRAM cells
Material                   Dielectric constant   Material                   Dielectric constant
Silicon dioxide            3.9                   Al2O3                      9.1
Ta2O5                      25                    La2O3 (lanthanum oxide)    16
TiO2 (titanium dioxide)    85                    BaTiO3 (barium titanate)   3000~8000
ZrO2                       23                    SrZrO3                     30
HfO2 (hafnium oxide)       25                    KTaNbO3                    34000
Read operation in DRAM [1]
Suppose a DRAM cell holds a high voltage (data value 1) in its capacitor. In a read operation, the address line (word line) is selected and the value of the capacitor is extracted.
Step 1: Precharge the bit line to the reference voltage $V_{ref}$. The transistor is off and the capacitor is at $V_{dd}$.
Step 2: Turn on (open) the transistor: the word line is selected. Current flows out of the capacitor (discharging) and the bit line rises to $V_{ref} + \Delta V$. Since $\Delta V > 0$, the sense amplifier sets the bit line to 1.
capacitance of storage capacitor / capacitance of bitline = 1/10
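The size of the voltage change ΔV on the bit line follows from charge conservation when the storage capacitor is connected to the precharged bit line. The sketch below is a simplified model, not taken from the slides: it assumes ideal charge sharing with no leakage, a supply of 1.8 V (hypothetical), a bit line precharged to half the supply, and the 1:10 capacitance ratio quoted above.

```python
def bitline_swing(v_cell, v_ref, c_storage, c_bitline):
    """Voltage change on the bit line after ideal charge sharing.

    Charge is conserved: C_s * V_cell + C_bl * V_ref = (C_s + C_bl) * V_final.
    """
    v_final = (c_storage * v_cell + c_bitline * v_ref) / (c_storage + c_bitline)
    return v_final - v_ref


V_DD = 1.8          # hypothetical supply voltage
V_REF = V_DD / 2    # assumed precharge level of the bit line
C_S, C_BL = 1.0, 10.0  # only the 1:10 ratio matters here

dv_one = bitline_swing(V_DD, V_REF, C_S, C_BL)  # cell stores 1
dv_zero = bitline_swing(0.0, V_REF, C_S, C_BL)  # cell stores 0

print(f"stored 1: dV = {dv_one * 1000:+.1f} mV")
print(f"stored 0: dV = {dv_zero * 1000:+.1f} mV")
```

Under these assumptions the swing is only about one eleventh of $V_{dd}/2$, which is why a sense amplifier is needed to detect it.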
Read operation in DRAM [2]
Step 3: Data restoration. The sense amplifier drives the bit line to $V_{dd}$ while the transistor is still on, so current flows into the capacitor (charging).
Step 4: Turn off the transistor. This completes one read operation.
Reading the data discharges the capacitor, so the cell could not be read again; hence data restoration is necessary.
Question 3: what do you think of the statement “if the transistor is off, then the capacitor is isolated and no leakage current flows out”?
DRAM array structure
Open bitline: area per cell = $6F^2$
Folded bitline: area per cell = $8F^2$
A differential sense amplifier uses a pair of bitlines to sense the voltage value in a DRAM cell.
Functionality of sense amplifier
• Sense the minute change in voltage
- access transistor is turned on
- storage capacitor places its charge on the bitline
- sense amplifier compares voltage on that bitline against a reference
voltage on a separate bitline
• Restores the value of cell after the voltage on the bitline is sensed
• Temporary data storage, called row buffer
Basic sense amplifier circuit diagram
4 steps of amplifier operation [1]
Signal EQ activates two transistors such that the source, at $V_{ref} = \frac{V_{cc}}{2}$, charges the two drains (bitlines).
4 steps of amplifier operation [2]
• Signal EQ is deactivated such that the equalization circuit is disabled.
• The word line is selected and the access transistor turns on. The storage capacitor discharges (current flows out) until its voltage equals the voltage of the bitline, which is then a little larger than the reference voltage: $V_{ref} + \Delta V$ with $\Delta V > 0$.
[Figure: capacitor, open transistor and sense amplifier, as in step 2 of the read operation; since $\Delta V > 0$ the sense amplifier sets the bit line to 1]
4 steps of amplifier operation [3]
1. $V_{ref} + \Delta V > \frac{1}{2}V_{cc}$ exceeds the threshold, so the transistor is turned on.
2. Signal SAN is set to GND (ground): SAN = 0.
3. Current from the bitline at $V_{ref}$ flows into SAN, so the voltage of that bitline decreases ($V_{ref}$ ↘) until it is 0.
4. $V < \frac{1}{2}V_{cc}$: its complement exceeds the threshold, so the transistor is turned on.
5. Signal SAP is set to $V_{cc}$ (power line): SAP = $V_{cc}$.
6. Current from SAP flows into the bitline at $V_{ref} + \Delta V$, so the voltage of that bitline increases ($V_{ref} + \Delta V$ ↗) until it is $V_{cc}$.
4 steps of amplifier operation [4]
[Figure: bi-stable circuit with SAN = 0, SAP = $V_{cc}$ and addr = 1; one bitline at $V_{cc}$, the other at V = 0]
7. Current from the bitline flows into the capacitor, so the capacitor is charging (data restoration).
8. Signal CSL (column-select line) is activated and the transistor is turned on, so current flows into the output. After the output voltage is stable, CSL is deactivated and the transistor is turned off; the data is then stored in the output (row buffer).
Writing into the DRAM array
• Data written by the memory controller is buffered by the I/O buffer of the DRAM device and used to overwrite the sense amplifiers and DRAM cells.
• The time period required for write data to overdrive the sense amplifiers and be written through into the DRAM cells is t_WR.
• The row cycle time of a DRAM device is write-cycle limited due to t_WR.
Outline
• Preliminary
• DRAM cell
• DRAM device
- DRAM SPEC
- input/output signal
- channel, rank, bank, row, column
• DRAM access protocol
• DRAM timing parameter
• DDR SDRAM
Typical 16Mbit DRAM (4M x 4)
RAS: row address select
CAS: column address select
WE: write enable (write operation)
OE: output enable (read operation)
A[0:10]: address lines, 11 bits, for row and column
D[0:3]: data lines, 4 bits
refresh counter: time to do refresh
Packaging of 16Mbit DRAM (4M x 4)
A[0:10]: address lines, 11 bits, for row and column
D[1:4]: data lines, 4 bits
RAS: row address select
CAS: column address select
WE: write enable (write operation)
OE: output enable (read operation)
Vcc x 2: power supply
Vss x 2: ground pin
NC: no connect
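The pin list explains why 11 address lines suffice for 4M locations: the same pins carry the row address (with RAS) and then the column address (with CAS). The sketch below works through that arithmetic. The function `split_address` and the choice of putting the row in the upper 11 bits are hypothetical, chosen only to illustrate the multiplexing.

```python
ADDRESS_PINS = 11  # A[0:10], shared by row and column
DATA_PINS = 4      # 4 data lines

rows = 2 ** ADDRESS_PINS          # selected while RAS is active
cols = 2 ** ADDRESS_PINS          # selected while CAS is active
locations = rows * cols           # 4M addressable locations
capacity_bits = locations * DATA_PINS  # 16 Mbit


def split_address(addr):
    """Split a 22-bit location address into (row, column).

    Assumed mapping (illustrative): upper 11 bits = row, lower 11 bits = column.
    """
    row = addr >> ADDRESS_PINS
    col = addr & (cols - 1)
    return row, col


print(f"{locations // 2**20}M locations x {DATA_PINS} bits = "
      f"{capacity_bits // 2**20} Mbit")
print(split_address((5 << ADDRESS_PINS) | 7))
```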
Spec of DDR (double data rate)
from http://en.wikipedia.org/wiki/DDR_SDRAM

Standard name   Memory clock   Cycle time   I/O bus clock   Data transfers per second   Module name   Peak transfer rate
DDR-200         100 MHz        10 ns        100 MHz         200 Million                 PC-1600       1600 MB/s
DDR-266         133 MHz        7.5 ns       133 MHz         266 Million                 PC-2100       2100 MB/s
DDR-333         166 MHz        6 ns         166 MHz         333 Million                 PC-2700       2700 MB/s
DDR-400         200 MHz        5 ns         200 MHz         400 Million                 PC-3200       3200 MB/s

DDR prefetch buffer is 2 bits deep.
JEDEC document: http://www.jedec.org/Catalog/display.cfm
DDR-xxx denotes the data transfer rate.
Bandwidth is calculated by taking transfers per second and multiplying by eight. This is because DDR memory modules transfer data on a bus that is 64 data bits (8 bytes) wide.
1 ns (nanosecond) = $10^{-9}$ second
64 data bits = 8 (chips per side) x 8 (bits per chip)
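The "multiply by eight" rule can be checked directly. This is a minimal sketch; the function name `peak_transfer_rate_mb_s` is an assumption for illustration, and the inputs are the nominal transfer rates from the table.

```python
BUS_WIDTH_BITS = 64  # data width of one DIMM


def peak_transfer_rate_mb_s(mega_transfers_per_s):
    """Peak bandwidth in MB/s: transfers per second times 8 bytes per transfer."""
    return mega_transfers_per_s * BUS_WIDTH_BITS // 8


for name, rate in (("DDR-200", 200), ("DDR-266", 266),
                   ("DDR-333", 333), ("DDR-400", 400)):
    print(f"{name}: {peak_transfer_rate_mb_s(rate)} MB/s")
```

For DDR-266 and DDR-333 the computed figures differ slightly from the module names (PC-2100, PC-2700), because the nominal transfer rates are rounded and the module names are rounded again.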
Spec of DDR2 (double data rate)
From http://en.wikipedia.org/wiki/DDR2_SDRAM

Standard name   Memory clock   Cycle time   I/O bus clock   Data transfers per second   Module name          Peak transfer rate
DDR2-400        100 MHz        10 ns        200 MHz         400 Million                 PC2-3200             3200 MB/s
DDR2-533        133 MHz        7.5 ns       266 MHz         533 Million                 PC2-4200, PC2-4300   4266 MB/s
DDR2-667        166 MHz        6 ns         333 MHz         667 Million                 PC2-5300, PC2-5400   5333 MB/s
DDR2-800        200 MHz        5 ns         400 MHz         800 Million                 PC2-6400             6400 MB/s
DDR2-1066       266 MHz        3.75 ns      533 MHz         1066 Million                PC2-8500, PC2-8600   8533 MB/s

DDR2 prefetch buffer is 4 bits deep.
DDR2-xxx denotes the data transfer rate.
PC2-xxxx denotes the theoretical bandwidth and is used to describe assembled DIMMs.
Bandwidth is calculated by taking transfers per second and multiplying by eight. This is because DDR2 memory modules transfer data on a bus that is 64 data bits wide.
Spec of DDR3 (double data rate)
From http://en.wikipedia.org/wiki/DDR3_SDRAM

Standard name   Memory clock   Cycle time   I/O bus clock   Data transfers per second   Module name   Peak transfer rate
DDR3-800        100 MHz        10 ns        400 MHz         800 Million                 PC3-6400      6400 MB/s
DDR3-1066       133 MHz        7.5 ns       533 MHz         1066 Million                PC3-8500      8533 MB/s
DDR3-1333       166 MHz        6 ns         667 MHz         1333 Million                PC3-10600     10667 MB/s
DDR3-1600       200 MHz        5 ns         800 MHz         1600 Million                PC3-12800     12800 MB/s

DDR3 prefetch buffer is 8 bits deep > DDR2 (4 bits deep) > DDR (2 bits deep)
1GB DDR2-800, 240 pins
From http://shopping.pchome.com.tw/
A DIMM (dual in-line memory module) comprises a series of DRAM ICs.
Full data bit-width of the DIMM: 64 bits
Motherboard P5Q PRO
North bridge: Intel P45 chipset (memory controller)
DDR2 DIMM_A1 240-pin module
DDR2 DIMM_A2 240-pin module
DDR2 DIMM_B1 240-pin module
DDR2 DIMM_B2 240-pin module
FSB / CPU base clock table
Front-side bus (FSB)   FSB 1600   FSB 1333   FSB 1066   FSB 800
CPU base clock         400 MHz    333 MHz    266 MHz    200 MHz
Channel A: DIMM_A1 and DIMM_A2
Channel B: DIMM_B1 and DIMM_B2
DIMMs, ranks, banks, and arrays
[Figure: north bridge and memory slots]
• A system has many DIMMs, each of which contains ranks.
• Each rank is a set of ganged DRAM devices, each of which has many banks.
• Each bank has many constituent arrays.
Nomenclature: Channel
DMC (DRAM memory controller)
Two channels
CPU
North bridge
DIMM
Nomenclature: rank
Memory system with 2 ranks of DRAM devices
A “rank” is a set of DRAM devices that operate in lockstep in response to a given command.
Chip-select signal is used to select appropriate rank of DRAM devices to respond to a given
command.
Nomenclature: bank
SDRAM device with 4 banks of DRAM arrays internally
A “bank” is a set of independent memory arrays inside a DRAM device.
All banks can be read in a pipelined fashion and refreshed simultaneously.
Nomenclature: row
Generic DRAM device with 4 banks, 8192 rows, 512 columns per row and 16 data bits per column.
A “row” is a group of storage cells that are activated in parallel in response to a row
activation command.
size of row = size of row of a DRAM device x # of DRAM devices in a given rank
Nomenclature: column
A column of data is the smallest addressable unit of memory
width of data bus = 16 x 4 = 64 (bits)
Nomenclature: DIMM (dual In-line Memory Module)
240-pin fully buffered DDR2 DIMM Standard 240-pin DDR2 DIMM
http://www.simmtester.com/page/news/showpubnews.asp?title=Memory+Module+Picture+2007&num=150
• A dual inline memory module (DIMM) consists of a number of memory components
(usually black) that are attached to a printed circuit board (usually green).
• Each 240-pin DIMM provides a 64-bit data path (72-bit for ECC or registered
modules).
• DIMM has 120 pins on the front and 120 pins on the back, for a total of 240 pins.
• A standard DDR2 DIMM has 8 chips (blocks) on one side, 16 chips in total.
• A fully buffered DDR2 DIMM has 9 chips on one side, 18 chips in total.
Configuration of DRAM [1]
DIMMs are built using "x4" (by 4) or "x8" (by 8) memory chips, with 8 (9) chips per side. "x4" or "x8" refers to the data width of the DRAM chip in bits.
Example: a x4 DRAM has at least four memory arrays in a single bank, and its column width is 4 bits.
Configuration of DRAM [2]
256-Mbit SDRAM device configuration

Device configuration   64 M x 4   32 M x 8   16 M x 16
Number of banks        4          4          4
Number of rows         8192       8192       8192
Number of columns      2048       1024       512
Data bus width         4          8          16

Configuration = (number of addressable locations, number of data bits per location)
1. 256 Mbit = 64 M (locations) x 4 (bits per location)
2. 64 M (locations) = 8192 (rows) x 2048 (cols) x 4 (banks)
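The same arithmetic applies to every column of the configuration table. The sketch below recomputes the number of locations and the total capacity for each configuration; the dictionary layout and variable names are assumptions for illustration, while the numbers come from the table.

```python
MBIT = 2 ** 20

# Values from the 256-Mbit SDRAM device configuration table.
configs = {
    "64 M x 4": {"banks": 4, "rows": 8192, "cols": 2048, "width": 4},
    "32 M x 8": {"banks": 4, "rows": 8192, "cols": 1024, "width": 8},
    "16 M x 16": {"banks": 4, "rows": 8192, "cols": 512, "width": 16},
}

capacities = {}
for name, c in configs.items():
    locations = c["banks"] * c["rows"] * c["cols"]  # addressable locations
    capacities[name] = locations * c["width"]       # total bits
    print(f"{name}: {locations // MBIT} M locations, "
          f"{capacities[name] // MBIT} Mbit")
```

Halving the number of columns while doubling the data width keeps the device capacity at 256 Mbit.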
Outline
• Preliminary
• DRAM cell
• DRAM device
• DRAM access protocol
- pipeline-based resource usage model
- read / write operation
• DRAM timing parameter
• DDR SDRAM
Basic DRAM Memory-Access Protocol
[Figure: command and data movement on a generic SDRAM device]
The DRAM memory-access protocol defines the commands and timing constraints that a DRAM memory controller uses to manage the movement of data between itself and DRAM devices.
Five basic DRAM commands:
- row access command
- column-read command
- column-write command
- precharge command
- refresh command
Resource usage model: at any given instant, 4 operations exist in 4 phases, which constitutes a 4-stage pipeline. Resources are not shared among these 4 phases.
Generic DRAM command format
t_parameter1 measures the duration of “phase 2” (time spent using the selected bank). It is the minimum time between two commands whose relative timing is limited by the sharing of resources within a given bank of DRAM arrays.
t_parameter2 measures the duration of “phase 3” (time spent using resources shared by multiple banks of DRAM). It is the minimum time between two commands whose relative timing is limited by the sharing of resources by multiple banks of DRAM arrays within the same DRAM device.
parameter description
t_CMD Command transport duration. The time period that a command occupies on the command bus as it is transported from the DRAM controller to the DRAM devices.
Row Access Command
Objective: move data from the cells in DRAM arrays to sense amplifiers and then
restore the data back into the cells in DRAM array.
parameter description
t_RCD Row to Column command Delay. The time interval between row access and data ready at sense amplifiers.
The time required between RAS (Row Address Select) and CAS (Column Address Select).
t_RAS Row Access Strobe latency. The time interval between row access command and data restoration in DRAM array. A DRAM bank cannot be precharged until at least t_RAS time after the previous bank activation.
Column-Read Command [1]
Objective: move data from array of sense amplifiers through data bus back to
memory controller
parameter description
t_CAS ( t_CL ) Column Access Strobe latency. The time interval between column access command and start of data return by DRAM devices.
Column-Read Command [2]
parameter description
t_BURST Data burst duration. The time period that data burst occupies on the data bus.
In DDR2 SDRAM, 4 beats of data occupy 2 full clock cycles.
• A one-beat burst means one column of data.
• Each column of SDRAM is individually addressable. Given a column address in the middle of a 4-column burst, the SDRAM reorders the burst to provide the data of the requested address first. This is called critical-word forwarding.
Column-Read Command [3]
parameter description
t_CCD Column-to-Column Delay. The minimum column command timing, determined by internal burst (prefetch) length. Multiple internal bursts are used to form longer burst for column read.
t_CCD is 2 beats (1 cycle) for DDR SDRAM
t_CCD is 4 beats (2 cycles) for DDR2 SDRAM
t_CCD is 8 beats (4 cycles) for DDR3 SDRAM
Column-Write Command [1]
Objective: move data from memory controller to sense amplifiers of targeted bank.
Clearly ordering of phases is reversed between column-read and column-write
commands.
parameter description
t_CWD Column Write Delay. The time interval between issuance of column-write command and placement of data on the bus by DRAM controller.
SDRAM: t_CWD = 0 cycle
DDR SDRAM: t_CWD = 1 cycle
DDR2 SDRAM: t_CWD = t_CAS – t_CMD cycles
DDR3 SDRAM : t_CWD = 5 ~ 8 cycles
Column-Write Command [2]
parameter description
t_WTR Write To Read delay time. The minimum time interval between the end of a write data burst and the start of a column-read command. I/O gating is released by the write command.
Write command → read command
t_WR Write Recovery time. The minimum time interval between the end of a write data burst and the start of a precharge command. Allows sense amplifiers to restore data to cells.
Write command → precharge command
Precharge Command [1]
Data access in a typical DRAM device is a two-step process:
• Step 1: a row access command moves data from the DRAM cells to the sense amplifiers (the data is cached), then a column access command moves data between the DRAM device and the memory controller.
• Step 2: a precharge command completes the row access sequence: it resets the sense amplifiers and bitlines and prepares them for another row access command to the same DRAM array.
Precharge Command [2]
parameter description
t_RP Row Precharge. The time it takes for a DRAM array to be precharged (bitlines and sense amplifiers) for another row access. Switching between memory banks.
t_RC Row Cycle. The time interval between accesses to different rows in a bank.
t_RC = t_RAS + t_RP
Refresh Command [1]
• Non-persistent charge storage in DRAM cells means that the charge stored in the capacitor gradually leaks out through the access transistor.
• To maintain data integrity, DRAM cells must be periodically read out and restored before the charge decays to an indistinguishable level.
parameter description
t_RFC ReFresh Cycle time. The time interval between refresh and activation commands.
One refresh command may refresh 1, 2, 4 or 8 rows. The more rows are refreshed, the longer t_RFC is.
Refresh Command [2]
A refresh command refreshes DRAM cells in all banks, since all banks can operate independently.

DRAM device family   Capacity   Number of rows   Refresh count   Rows per refresh command   t_RC    t_RFC
DDR                  512MB      8192             8192            1                          55 ns   70 ns
DDR2                 512MB      16384            8192            2                          55 ns   105 ns
                     4096MB     65536            8192            8                          ~       327.5 ns

Suppose the memory is DDR2-800 2GB (memory clock = 200 MHz, 5 ns/clock). Then
t_RFC = 52 (memory clocks) = 52 x 5 (ns/clock) = 260 ns
Read Cycle [1]
A[i][j] → A[i][j+1] (same row): row access, column access, column access
A[i][j] → A[i+1][j] (different rows): row access, column access, precharge, row access, column access
Principle of spatial locality: a row access command fetches a whole row into the sense amplifiers.
Read Cycle [2], [3], [4]
[Figure, repeated on three slides: read-cycle timing diagram against time. Lanes: cmd & addr bus (row acc, col read, prec., row act), bank utilization (data sense, data restore, array precharge), device utilization (I/O gating), data bus (data burst). Annotated intervals: t_RCD, t_CAS, t_BURST, t_RAS, t_RP, t_RC. Phases: row access, column read, precharge.]
Write Cycle [1]
Row cycle time is limited by the duration of the write cycle, since the data path of a write is
memory controller → data bus → I/O gating MUX → sense amplifier → DRAM cells
t_RAS(write) = t_RCD + t_CWD + t_BURST + t_WR
t_RAS(read) = t_RCD + t_CAS + t_BURST + (remaining t_restore)
t_RAS(write) > t_RAS(read)
Write Cycle [2]
[Figure: write-cycle timing diagram against time. Lanes: cmd & addr bus (row acc, col write, prec., row act), bank utilization (data sense, write data restore, array precharge), device utilization (I/O gating), data bus (data burst). Annotated intervals: t_RCD, t_CWD, t_BURST, t_WR, t_RAS, t_RP, t_RC.]
Consecutive reads and writes to same open bank [1]
Two column-read commands to the same row are issued.
Precharge is not necessary, since one row of data has already been latched in the sense amplifiers.
[Figure: Read 0 and Read 1 to row x, open in bank “i” of rank “m”. Lanes: cmd & addr bus, bank “i” utilization, rank “m” utilization (I/O gating), data bus (data burst). Annotated intervals: t_CAS, t_BURST. Shown next to the full read-cycle timing diagram for comparison.]
Consecutive reads and writes to same open bank [2]
[Figure: one row access followed by column reads Read 0 and Read 1. Lanes: cmd & addr bus, bank utilization (data sense, data restore), device utilization (I/O gating), data bus (data burst). Annotated intervals: t_RCD, t_CAS, t_BURST.]
Data restore for “read 0” and “read 1” can be done simultaneously.
N consecutive column reads need time t_RCD + t_CAS + N · t_BURST, not N · (t_RCD + t_CAS + t_BURST).
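The saving from keeping the row open can be put in numbers. The sketch below compares the two expressions in clock cycles. The timing values are assumptions for illustration: t_RCD = t_CAS = 5 cycles and t_BURST = 2 cycles (the slides state that 4 beats occupy 2 full clock cycles in DDR2 SDRAM); the function names are hypothetical.

```python
def open_row_reads(n, t_rcd, t_cas, t_burst):
    """Cycles for n column reads to one open row: one row access, pipelined bursts."""
    return t_rcd + t_cas + n * t_burst


def repeated_row_accesses(n, t_rcd, t_cas, t_burst):
    """Cycles if every read paid the full row access and column latency again."""
    return n * (t_rcd + t_cas + t_burst)


# Assumed timing values in clock cycles, for illustration only.
T_RCD, T_CAS, T_BURST = 5, 5, 2

for n in (1, 2, 4, 8):
    fast = open_row_reads(n, T_RCD, T_CAS, T_BURST)
    slow = repeated_row_accesses(n, T_RCD, T_CAS, T_BURST)
    print(f"N = {n}: {fast} cycles with the row kept open, {slow} cycles otherwise")
```

For N = 1 the two expressions agree; as N grows, the open-row cost per read approaches t_BURST.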
Consecutive reads to different rows of same bank
[Figure: read 0 (row x open, data restore) followed by read 1 to row y of the same bank i in rank “m”. Lanes: cmd & addr, bank “i” utilization, rank “m” utilization (I/O gating), data bus (data burst). Bank i is precharged (t_RP) and row y is activated (data sense) before read 1. Scheduling distance: t_RAS + t_RP.]
The access A[i][j] → A[i+1][j] destroys spatial locality, since A[i][j] and A[i+1][j] are on different rows.
Accessing A[i][j] or A[i+1][j] then requires a whole row cycle time t_RC.
Consecutive writes to different rows of same bank
[Figure: write 0 followed by write 1 to a different row of bank “i” in rank “m”. Lanes: cmd & addr, bank “i” utilization (data restore, array precharge, data sense), rank “m” utilization (I/O gating), data bus (data burst). Annotated intervals: t_CWD, t_BURST, t_WR, t_RP, t_RCD. Scheduling distance: t_CWD + t_BURST + t_WR + t_RP + t_RCD.]
Consecutive reads to different banks (bank conflict)
[Figure: read 0 to bank “i” of rank “m” (bank i open) and read 1 to bank “j” of rank “m” (row x open, bank j precharge, row acc, data sense). Lanes: cmd & addr, bank “i” and bank “j” utilization, rank “m” utilization (I/O gating), data bus (data burst). Annotated intervals: t_RP, t_RCD. Scheduling distance: t_RP + t_RCD.]
Bank i and bank j are open together, but the read request to bank j targets a row different from the active row in its sense amplifiers, hence bank j must precharge its bitlines first. This is called a “bank conflict”.
Consecutive reads to different ranks
[Figure: read 0 to bank “i” of rank “m” (bank i open) and read 1 to bank “j” of rank “n” (bank j open). Lanes against time: cmd & addr, bank utilization, rank “m” and rank “n” utilization (I/O gating), data bus (data burst, sync). Annotated intervals: t_CAS, t_BURST, t_OST. Scheduling distance: t_BURST + t_RTRS.]
parameter description
t_RTRS Rank-To-Rank-Switching time. Used in DDR and DDR2 SDRAM systems. 1 full cycle in DDR SDRAM.
Consecutive writes to different ranks
[Figure: write 0 to bank “i” of rank “m” (bank i access) and write 1 to bank “j” of rank “n” (bank j access). Lanes against time: cmd & addr, bank utilization, rank “m” and rank “n” utilization (I/O gating), data bus (data burst). Annotated intervals: t_CWD, t_BURST, t_OST. Scheduling distance: t_BURST + t_OST.]
Write command following read command to open banks
[Figure: read 0 to bank “i” of rank “m” (data restore) followed by write 1 to bank “j” of rank “m” (row x open). Lanes: cmd & addr, bank utilization, rank “m” utilization (I/O gating), data bus (data burst, sync). Annotated intervals: t_CAS, t_BURST, t_RTRS, t_CWD. Scheduling distance: t_CAS + t_BURST + t_RTRS - t_CWD.]
Read command following write command to open banks
[Figure: write 0 to bank “i” of rank “m” (row x open) followed by read 1 to bank “j” of rank “m” (data restore). Lanes: cmd & addr, bank utilization, rank “m” utilization (I/O gating), data bus (data burst). Annotated intervals: t_CWD, t_BURST, t_WTR. Scheduling distance: t_CWD + t_BURST + t_WTR.]
DRAM protocol overheads for DDR and DDR2 SDRAM
R = read; W = write; s = same; d = different

prev   next   rank   bank   row   scheduling distance between column access commands (no command reordering)
R      R      s      s      s     t_BURST
R      R      s      d            t_BURST
R      R      s      s      d     t_RAS + t_RP
R      R      d      s/d          t_RTRS + t_BURST
R      W      s      d            t_CAS + t_BURST + t_RTRS - t_CWD
W      R      s      d            t_CWD + t_BURST + t_WTR
W      W      s      s      s     t_BURST
W      W      s      s      d     t_CWD + t_BURST + t_WR + t_RP + t_RCD
W      W      d      s/d          t_OST + t_BURST

Later on, we will determine the values of the timing parameters and calculate the overhead explicitly.
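The overhead table can be turned into a small lookup function. The following is a minimal sketch, not a complete controller model. The timing values are assumptions in clock cycles: t_CAS, t_RCD, t_RP and t_RAS follow the 5-5-5-18 setting quoted in this deck, t_BURST = 2 and t_CWD = t_CAS - t_CMD follow the DDR2 rules stated earlier, and the remaining values are placeholders. Cases the table does not list fall back to the nearest listed row, which is also an assumption.

```python
# Assumed timing values in clock cycles (placeholders unless noted in the lead-in).
T = {"CAS": 5, "RCD": 5, "RP": 5, "RAS": 18, "BURST": 2,
     "CMD": 1, "WR": 6, "WTR": 3, "RTRS": 1, "OST": 1}
T["CWD"] = T["CAS"] - T["CMD"]  # DDR2: t_CWD = t_CAS - t_CMD


def scheduling_distance(prev, nxt, same_rank, same_bank, same_row, t=T):
    """Minimum distance between two column access commands, per the overhead table."""
    if prev == "R" and nxt == "R":
        if not same_rank:
            return t["RTRS"] + t["BURST"]
        if same_bank and not same_row:
            return t["RAS"] + t["RP"]
        return t["BURST"]
    if prev == "R" and nxt == "W":
        return t["CAS"] + t["BURST"] + t["RTRS"] - t["CWD"]
    if prev == "W" and nxt == "R":
        return t["CWD"] + t["BURST"] + t["WTR"]
    if prev == "W" and nxt == "W":
        if not same_rank:
            return t["OST"] + t["BURST"]
        if same_bank and not same_row:
            return t["CWD"] + t["BURST"] + t["WR"] + t["RP"] + t["RCD"]
        return t["BURST"]
    raise ValueError("prev and nxt must each be 'R' or 'W'")


print(scheduling_distance("R", "R", same_rank=True, same_bank=True, same_row=True))
print(scheduling_distance("R", "R", same_rank=True, same_bank=True, same_row=False))
print(scheduling_distance("W", "W", same_rank=True, same_bank=True, same_row=False))
```

With these placeholder values, a read to a different row of the same bank costs more than ten times as much as a read to the open row, which is the motivation for scheduling accesses to exploit open rows.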
Outline
• Preliminary
• DRAM cell
• DRAM device
• DRAM access protocol
• DRAM timing parameter
- CL value
- system calibration
• DDR SDRAM
CL value of commodity DDRx SDRAM
From http://shopping.pchome.com.tw/
[Figure: product listings annotated with their timing values]
value: CL - t_RCD - t_RP - t_RAS (memory clock speed = 533)
value: CL - t_RCD - t_RP (memory clock speed = 533)
(memory clock speed = 400)
from http://en.wikipedia.org/wiki/CAS_latency
• When DDR is read, a single read produces 64 bits of data from 8 chips, 8 bits per chip.
• The time between bits refers to the time from the appearance of the first group of bits (8 bits per chip) until the appearance of the next group of bits.
• CAS latency only specifies the delay between the request and the first bit.
• The remaining 7 bits are fetched one bit per cycle.
CAS latency (CL value) [1]

type        Data rate   ns/bit   Command rate   ns/cycle   CL   first word (ns)   8 words (ns)
DDR-400     400MHz      2.5      200MHz         5          3    15                32.5
DDR2-800    800MHz      1.25     400MHz         2.5        5    12.5              21.25
DDR2-1066   1066MHz     0.94     533MHz         1.88       5    9.4               15.98
DDR3-1333   1333MHz     0.75     666MHz         1.5        9    13.5              18.75
DDR3-1600   1600MHz     0.625    800MHz         1.25       8    10                14.375

Example: DDR-400
The first word needs time $CL \times \frac{1}{\text{command rate}} = 3 \times 5 = 15$ ns
The remaining 7 words need time $7 \times \frac{1}{\text{data rate}} = 7 \times 2.5 = 17.5$ ns
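The table entries follow from two divisions and one sum. The sketch below reproduces the calculation for any row; the function name `read_latency_ns` is an assumption for illustration, and the inputs are the data rate, command rate and CL from the table.

```python
def read_latency_ns(data_rate_mhz, command_rate_mhz, cl, words=8):
    """Return (time to first word, time to last word) in nanoseconds.

    first word = CL * (1 / command rate)
    each remaining word adds 1 / data rate
    """
    ns_per_cycle = 1000.0 / command_rate_mhz
    ns_per_word = 1000.0 / data_rate_mhz
    first = cl * ns_per_cycle
    total = first + (words - 1) * ns_per_word
    return first, total


ddr_400 = read_latency_ns(400, 200, cl=3)
ddr2_800 = read_latency_ns(800, 400, cl=5)

print(f"DDR-400:  first word {ddr_400[0]} ns, 8 words {ddr_400[1]} ns")
print(f"DDR2-800: first word {ddr2_800[0]} ns, 8 words {ddr2_800[1]} ns")
```

Passing a different command rate changes only the first-word term; the per-word term depends on the data rate alone.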
CAS latency [2]
Command rate taken as the I/O bus clock:

type        Data rate   ns/bit   Command rate   ns/cycle   CL   first word (ns)   8 words (ns)
DDR-400     400MHz      2.5      200MHz         5          3    15                32.5
DDR2-800    800MHz      1.25     400MHz         2.5        5    12.5              21.25
DDR2-1066   1066MHz     0.94     533MHz         1.88       5    9.4               15.98
DDR3-1333   1333MHz     0.75     666MHz         1.5        9    13.5              18.75
DDR3-1600   1600MHz     0.625    800MHz         1.25       8    10                14.375

Correct “command rate” (memory clock):

type        Data rate   ns/bit   Command rate (memory clock)   ns/cycle   CL   first word (ns)   8 words (ns)
DDR-400     400MHz      2.5      200MHz                        5          3    15                32.5
DDR2-800    800MHz      1.25     200MHz                        5          5    25                33.75
DDR2-1066   1066MHz     0.94     266MHz                        3.75       5    18.75             25.33
DDR3-1333   1333MHz     0.75     166MHz                        6          9    54                59.25
DDR3-1600   1600MHz     0.625    200MHz                        5          8    40                44.375
Memory divider
• A Memory divider is a ratio which is used to determine the operating clock frequency
of computer memory in accordance with Front Side Bus frequency, if memory system
is dependent on FSB clock speed
• Memory Divider is also commonly referred as "DRAM:FSB Ratio".
• Ideally, Front Side Bus and system memory should run at the same clock speed
because FSB connects memory system to the CPU. But, it is sometimes desired to
run the FSB and system memory at different clock speeds when you overclock FSB
from http://en.wikipedia.org/wiki/Memory_divider
type Data rate ns/bit Command rate (memory clock) ns/cycle CL first word (ns) 8 word (ns)
DDR2-800 800MHz 1.25 200MHz 5 5 25 33.75
DDR2-1066 1066MHz 0.94 266MHz 3.75 5 18.75 25.33
Motherboard: P5Q PRO with system clock 266MHz, FSB = 1066 MHz
Memory: DDR2-800 with CL = 5
http://www.lavalys.com/
System information tool that reports detailed hardware information about the PC
System calibration Software
http://www.tweakers.fr/memset.html
Memory system
Memory timing parameter (from BIOS of motherboard P5Q PRO)
Timing parameter: 5-5-5-18-3-52-6-3 8-3-5-4-6-4-6 14-5-1-6-6
Parameter names, by column:
Column 1: CAS Latency; DRAM RAS to CAS Delay; DRAM RAS Precharge; DRAM RAS Activate to Precharge Time; RAS to RAS Delay; Row Refresh Cycle Time; Write Recovery Time; Read to Precharge Time
Column 2: READ to WRITE Delay (S/D); WRITE to READ Delay (S); WRITE to READ Delay (D); READ to READ Delay (S); READ to READ Delay (D); WRITE to WRITE Delay (S); WRITE to WRITE Delay (D)
Column 3: WRITE to PRE Delay; READ to PRE Delay; PRE to PRE Delay; ALL PRE to ACT Delay; ALL PRE to REF Delay
CL value: CL - t_RCD - t_RP - t_RAS = 5 - 5 - 5 - 18
System information
1 System clock (external clock) = 267.3 MHz
2 CPU multiplier = 9
3 CPU clock = 2405.4 MHz
CPU clock = system clock x CPU multiplier
4 Memory bus = 400.9 MHz
5 DRAM : FSB ratio = 12 : 8
DRAM I/O bus clock = 400 MHz
System clock = 266 MHz
Standard name | Memory clock | Cycle time | I/O Bus clock | Data transfers per second | Module name | Peak transfer rate
DDR2-800 | 200 MHz | 5 ns | 400 MHz | 800 Million | PC2-6400 | 6400 MB/s
6 Memory type: DDR2-800
7 Dual-channel: enabled
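The relations between system clock, CPU multiplier and DRAM : FSB ratio can be checked numerically. The sketch below assumes the ratio is applied directly to the system clock to obtain the DRAM I/O bus clock, which is consistent with the reported 400.9 MHz; the function names are illustrative.

```python
def cpu_clock_mhz(system_clock_mhz, multiplier):
    # CPU clock = system clock x CPU multiplier
    return system_clock_mhz * multiplier


def dram_io_clock_mhz(system_clock_mhz, dram_ratio, fsb_ratio):
    # DRAM I/O bus clock = system clock x (DRAM : FSB ratio)
    return system_clock_mhz * dram_ratio / fsb_ratio


system_clock = 267.3  # MHz, as reported by the tool (a rounded value)
print(cpu_clock_mhz(system_clock, 9))          # close to the reported 2405.4 MHz
print(dram_io_clock_mhz(system_clock, 12, 8))  # close to the reported 400.9 MHz
```

Small differences from the reported values come from the rounding of the displayed system clock.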
Performance of cache, memory
Cache parameters of processors based on Intel Core Microarchitecture
level | capacity | Associativity (ways) | Line size (bytes) | Access Latency (clocks) | Access Throughput (clocks) | Write update policy
L1 data cache | 32 KB | 8 | 64 | 3 | 1 | Write-back
L2 cache | 2, 4 MB | 8 or 16 | 64 | 14 | 2 | Write-back
CPU clock = 2405.4 MHz implies 1 ns = 2.4 cycles
L1 cache has estimated latency 1.2 ns = 2.88 cycles ~ 3 cycles
L2 cache has estimated latency 5.6 ns = 13.44 cycles ~ 14 cycles
DDR2-800 SDRAM has estimated latency 91.5 ns = 220 cycles
type Data rate ns/bit memory clock ns/cycle CL first word (ns) 8 word (ns)
DDR2-800 800MHz 1.25 200MHz 5 5 25 (60 CPU cycles) 33.75 (81 CPU cycles)
The estimated DRAM latency (91.5 ns) is larger than the best case (25 ns)
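The conversion from nanoseconds to CPU cycles used above is a single multiplication. A minimal sketch, assuming the rounded factor of 2.4 cycles per ns from the slide (the helper name is illustrative):

```python
CYCLES_PER_NS = 2.4  # rounded from the 2405.4 MHz CPU clock


def ns_to_cpu_cycles(ns, cycles_per_ns=CYCLES_PER_NS):
    # latency in CPU cycles = latency in ns x CPU cycles per ns
    return ns * cycles_per_ns


for label, ns in [("L1 cache", 1.2), ("L2 cache", 5.6),
                  ("DRAM first word (best case)", 25.0),
                  ("DRAM 8th word (best case)", 33.75),
                  ("DRAM estimated", 91.5)]:
    print(f"{label}: {ns} ns = {ns_to_cpu_cycles(ns):.2f} CPU cycles")
```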
EVEREST: CPU information
CPU multiplier = 9
System clock (external clock) = 267 MHz
2 cores share 8 MB L2 cache
feature size of MOS = 65 nm
Huge number of transistors
Density of MOS transistor
Feature size L = 65 nm
Suppose the length of a MOS is L_max = 5L and the MOS is square; then area of MOS = 25 L^2 = 105625 nm^2
Area of die = 286 mm^2 = 286 x (10^6 nm)^2 = 286 x 10^12 nm^2
Maximum number of MOS in a die = (area of die) / (area of MOS) = (286 x 10^12 nm^2) / (105625 nm^2) ~ 2700 million
Number of MOS of CPU = 582 M, about 582 M / 2700 M = 21.6% of the die
This means a large part of the die is reserved for other uses
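The density estimate can be recomputed directly. This sketch only repeats the slide's assumptions (square transistor with side 5L, die area 286 mm^2, 582 million transistors); the variable names are illustrative.

```python
FEATURE_SIZE_NM = 65
side_nm = 5 * FEATURE_SIZE_NM          # assumed side length of one MOS: 5L
mos_area_nm2 = side_nm ** 2            # 25 L^2 = 105625 nm^2

NM_PER_MM = 10 ** 6
die_area_nm2 = 286 * NM_PER_MM ** 2    # 286 mm^2 expressed in nm^2

max_mos = die_area_nm2 / mos_area_nm2  # upper bound on transistor count
actual_mos = 582e6                     # transistor count quoted on the slide

print(f"max MOS per die: {max_mos / 1e6:.0f} million")
# The slide rounds the bound to 2700 million before dividing, giving 21.6%;
# the unrounded bound gives a slightly smaller fraction.
print(f"fraction of die used: {actual_mos / max_mos:.1%}")
```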
EVEREST: motherboard information
bus width of FSB = 64 bits
Bandwidth = 1067MHz (data rate) x 64 bits (bus width)
Type Data rate ns/bit memory clock ns/cycle CL first word (ns) 8 word (ns)
DDR2-800 800MHz 1.25 200MHz 5 5 25 (60 CPU cycles) 33.75 (81 CPU cycles)
Memory bus = 64 (bits per channel) x 2 (dual-channel)
Bandwidth = 800MHz (data rate) x 128 bits (bus width)
EVEREST: SDRAM information
t_REF = 3120 (memory clock) = 3120 x 5 (ns/clock) = 15600 ns = 15.6 µs
EVEREST: SPD information
The JEDEC standards require certain parameters to be placed in the lower 128 bytes of an EEPROM located on the memory module. These bytes contain timing parameters, manufacturer, serial number and other useful information about the module.
Module has two sides, one rank per side; each rank has 8 SDRAM chips, 8 banks per chip
Data width per rank = 8 (bits per chip) x 8 (chips per rank) = 64 bits
Concrete timing parameter
from EVEREST
parameter | description | Memory Clocks
t_CAS | Column Access Strobe latency | 5
t_RCD | Row to Column command Delay | 5
t_RP | Row precharge | 5
t_RAS | Row Access Strobe | 18
t_RTRS | Rank-To-Rank-Switching time | 1
t_BURST | Data burst duration | 2
t_CMD | Command transport duration | 2
t_CWD | Column Write Delay | t_CAS - t_CMD = 3
t_WR | Write recovery time | 14
t_WTR | Write to read delay | same rank: 11, different rank: 5
t_OST | ODT switching time | 1
t_WTP | Write to precharge delay | 14
t_RTP | Read to precharge delay | 5
DRAM protocol overheads for DDR2-800 SDRAM
Scheduling distance between column access commands (no command reordering)
R = read ; W = write ; s = same ; d = different
prev | next | rank bank row | scheduling distance | Memory clocks | CPU clocks
R | R | s s | t_BURST | 2 | 24
R | R | s d | t_RP + t_RCD | 10 | 120
R | R | s s d | t_RAS + t_RP | 23 | 276
R | R | d s/d | t_RTRS + t_BURST | 3 | 36
R | W | s d | t_CAS + t_BURST + t_RTRS - t_CWD | 5 | 60
W | R | s d | t_CWD + t_BURST + t_WTR | 16 | 192
W | W | s s | t_BURST | 2 | 24
W | W | s s d | t_CWD + t_BURST + t_WR + t_RP + t_RCD | 29 | 348
W | W | d s/d | t_OST + t_BURST | 3 | 36
1 memory clock = 5 ns = 5 ns x (2.4 CPU cycles/ns) = 12 CPU cycles
Observation: different combinations of commands incur different overheads; we expect that false sharing would have a large overhead
Estimated DDR2-800 latency: 219.6 CPU cycles
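The overhead values follow mechanically from the timing parameters. The sketch below evaluates each scheduling-distance formula with the DDR2-800 values listed earlier and converts the result to CPU cycles; the dictionary layout is an illustrative choice, and the same-rank value of t_WTR is assumed for the write-to-read case.

```python
# Timing parameters in memory clocks (DDR2-800 values from the slides)
t = {"CAS": 5, "RCD": 5, "RP": 5, "RAS": 18, "RTRS": 1, "BURST": 2,
     "CMD": 2, "WR": 14, "WTR": 11, "OST": 1}
t["CWD"] = t["CAS"] - t["CMD"]  # column write delay = 3

CPU_CYCLES_PER_MEMORY_CLOCK = 12  # 5 ns x 2.4 CPU cycles/ns

# Scheduling-distance formulas, keyed by the expression used in the table
overheads = {
    "t_BURST": t["BURST"],
    "t_RP + t_RCD": t["RP"] + t["RCD"],
    "t_RAS + t_RP": t["RAS"] + t["RP"],
    "t_RTRS + t_BURST": t["RTRS"] + t["BURST"],
    "t_CAS + t_BURST + t_RTRS - t_CWD":
        t["CAS"] + t["BURST"] + t["RTRS"] - t["CWD"],
    "t_CWD + t_BURST + t_WTR": t["CWD"] + t["BURST"] + t["WTR"],
    "t_CWD + t_BURST + t_WR + t_RP + t_RCD":
        t["CWD"] + t["BURST"] + t["WR"] + t["RP"] + t["RCD"],
    "t_OST + t_BURST": t["OST"] + t["BURST"],
}

for formula, clocks in overheads.items():
    cpu = clocks * CPU_CYCLES_PER_MEMORY_CLOCK
    print(f"{formula}: {clocks} memory clocks = {cpu} CPU clocks")
```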
CAS latency, memory speed, and price
from http://shopping.pchome.com.tw/
dual-channel: t_CL = 5, t_CL = 7
tri-channel: t_CL = 9, t_CL = 8
Question: Is DDR3 faster than DDR2?
Objective: choose a memory module with a low CL value and a high clock speed
OutLine
• Preliminary
• DRAM cell
• DRAM device
• DRAM access protocol
• DRAM timing parameter
• DDR SDRAM
- DDR2-SDRAM, DDR3-SDRAM
- dual-channel, tri-channel
- memory controller
SDRAM
1 data out per cycle
• An SDRAM device operates its data bus at the same data rate as the address and command buses.
• A DDR SDRAM device operates its data bus at twice the data rate of the address and command buses.
DDR SDRAM [1]
DDR SDRAM
Two data out per cycle
DDR SDRAM [2]
SDRAM device architecture with 4 banks
DDR SDRAM device I/O
The rate of internal data transfer in DDR SDRAM is not increased. DDR SDRAM uses 2-bit prefetch to increase bandwidth: the I/O bus runs at the memory clock but transfers data on both the rising edge and the falling edge of the clock signal, such that the I/O bus can transfer 2N data per time unit.
Standard name | Memory clock | Cycle time | I/O Bus clock | Data transfers per second | Module name | Peak transfer rate
DDR-266 | 133 MHz | 7.5 ns | 133 MHz | 266 Million | PC-2100 | 2100 MB/s
DDR-333 | 166 MHz | 6 ns | 166 MHz | 333 Million | PC-2700 | 2700 MB/s
DDR-400 | 200 MHz | 5 ns | 200 MHz | 400 Million | PC-3200 | 3200 MB/s
DDR2 SDRAM
DDR2 SDRAM device I/O
Standard name | Memory clock | Cycle time | I/O Bus clock | Data transfers per second | Module name | Peak transfer rate
DDR2-533 | 133 MHz | 7.5 ns | 266 MHz | 533 Million | PC2-4300 | 4266 MB/s
DDR2-667 | 166 MHz | 6 ns | 333 MHz | 667 Million | PC2-5300 | 5333 MB/s
DDR2-800 | 200 MHz | 5 ns | 400 MHz | 800 Million | PC2-6400 | 6400 MB/s
The rate of internal data transfer in DDR2 SDRAM is not increased.
DDR2 SDRAM uses 4-bit prefetch to increase bandwidth.
The I/O bus clock runs twice as fast as the memory clock and samples data at both the rising edge and the falling edge of the clock signal, such that the I/O bus can transfer 4N data per time unit.
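The peak transfer rates in the DDR and DDR2 tables follow from the memory clock, the prefetch depth and the 64-bit (8-byte) module data bus. A minimal sketch, with an illustrative function name:

```python
def peak_transfer_mb_s(memory_clock_mhz, prefetch, bus_bytes=8):
    """Peak transfer rate in MB/s.

    prefetch is 2 for DDR SDRAM and 4 for DDR2 SDRAM; bus_bytes is the
    module data-bus width in bytes (64 bits = 8 bytes).
    """
    transfers_per_us = memory_clock_mhz * prefetch  # million transfers/s
    return transfers_per_us * bus_bytes


print(peak_transfer_mb_s(200, prefetch=2))  # DDR-400  -> 3200 MB/s (PC-3200)
print(peak_transfer_mb_s(200, prefetch=4))  # DDR2-800 -> 6400 MB/s (PC2-6400)
```

Both devices run the memory core at 200 MHz; only the prefetch depth differs, which is why DDR2-800 doubles the peak rate of DDR-400.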
1GB addressing
512MB addressing
DDR2 SDRAM SPEC [1]
DDR2 SDRAM SPEC [2]
Simplified state diagram (not real)
DRAM controller
• Row-Buffer-Management Policy
- open-page policy
- close-page policy
• Address Mapping Scheme
- minimize bank address conflicts in temporally adjacent requests and maximize the parallelism in the memory system (parallelism of channels, ranks, banks, rows, and columns)
- utilize dual-channel architecture
- flexibility for inserting/removing memory modules
• DRAM Command Ordering Scheme
http://www.intel.com/products/desktop/chipsets/p45/p45-overview.htm
North-bridge on P5Q PRO motherboard
Dual-channel architecture describes a technology that theoretically doubles data
throughput from the memory to the memory controller. Dual-channel-enabled memory
controllers utilize two 64-bit data channels, resulting in a 128-bit data path
Intel Dual-Channel DDR Memory Architecture White Paper
Single-channel memory feeds data to the CPU through a single pipe. Data is transferred 64 bits at a time.
With two channels, data is transferred 128 bits at a time.
Possible allocation of memory modules
Peak bandwidth
Standard name | Memory clock | Cycle time | I/O Bus clock | Data transfers per second | Peak transfer rate (single channel) | Dual-channel
DDR2-800 | 200 MHz | 5 ns | 400 MHz | 800 Million | 6.4 GB/s | 12.8 GB/s
P5Q PRO: FSB / CPU external clock table
Front-side bus (FSB) | FSB 1600 | FSB 1333 | FSB 1066 | FSB 800
CPU external clock | 400 MHz | 333 MHz | 266 MHz | 200 MHz
Bandwidth of FSB = 1066 (MHz) x 8 (bytes) = 8528 MB/s, about 8.5 GB/s
Note: the external clock is the CPU's external frequency. The front-side bus (FSB) is the link that connects the CPU to the chipset. FSB speed is based on the CPU external clock: a multiplying technique raises the number of data transfers per cycle from one to two or four (double or quad pumping), e.g. 266MHz (133x2), 333MHz (166x2).
Quad data rate (or quad pumping) is a communication signaling technique wherein data is transmitted at four points in the clock cycle: on the rising and falling edges, and at two intermediate points between them. The intermediate points are defined by a 2nd clock that is 90° out of phase from the first.
bandwidth of FSB < bandwidth of dual-channel DRAM
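The comparison between FSB bandwidth and dual-channel DRAM bandwidth can be verified with the numbers above. A minimal sketch, assuming an 8-byte bus per channel and using an illustrative helper name:

```python
def bandwidth_mb_s(data_rate_mt_s, bus_bytes, channels=1):
    # bandwidth = transfers per second x bytes per transfer x channels
    return data_rate_mt_s * bus_bytes * channels


fsb = bandwidth_mb_s(1066, 8)               # FSB 1066, 64-bit bus
dram = bandwidth_mb_s(800, 8, channels=2)   # DDR2-800, dual-channel

print(f"FSB: {fsb} MB/s, dual-channel DRAM: {dram} MB/s")
print("FSB is the bottleneck" if fsb < dram else "DRAM is the bottleneck")
```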
http://www.d-cross.com/show_article.asp?page=2&article_id=693
Intel 82955X MCH (Memory Controller Hub)
Two channels, 4 ranks per channel
Definition
symbol | description | Number
K | Number of channels in system | 2^k
L | Number of ranks per channel | 2^l
B | Number of banks per rank | 2^b
R | Number of rows per bank | 2^r
C | Number of columns per row | 2^c
V | Number of bytes per column | 2^v
Z | Number of bytes per cacheline | 2^z
N | Number of cachelines per row | 2^n
Number of bytes per row per bank = C x V = N x Z
A memory system has capacity of K x L x B x R x C x V bytes
A memory system needs k + l + b + r + c + v or k + l + b + r + z + n address bits
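The capacity and address-bit formulas can be checked on a concrete configuration. The sketch below uses the 1024 MB rank configuration (8 banks x 16384 rows x 1024 columns x 8 bytes) listed in the rank table of this section, a single channel and a single rank, and a 64-byte cacheline; the helper names are illustrative.

```python
def log2_exact(n):
    """Return log2(n) for a power of two; reject anything else."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def capacity_bytes(K, L, B, R, C, V):
    return K * L * B * R * C * V


def address_bits(*counts):
    return sum(log2_exact(n) for n in counts)


K, L, B, R, C, V = 1, 1, 8, 16384, 1024, 8
Z = 64                 # bytes per cacheline
N = C * V // Z         # cachelines per row, from C x V = N x Z

print(capacity_bytes(K, L, B, R, C, V) // 2**20, "MB")  # 1024 MB
print(address_bits(K, L, B, R, C, V))                   # k+l+b+r+c+v
print(address_bits(K, L, B, R, Z, N))                   # k+l+b+r+z+n, same total
```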
symmetric dual channel mode: sequentially consecutive cacheline addresses are mapped to alternating channels
so that requests from a streaming request sequence are mapped to both channels
concurrently.
Rank capacity (MB) | Device configuration: bank count x row count x col count x col size (bytes) | Rank composition: device density x device count | Rank configuration: bank count x row count x col count x col size (B x R x C x V) | Bank address bits (b) | Row address bits (r) | Column address bits (c) | Column address offset (v)
128 | 4 x 8192 x 512 x 2 | 256Mbit x 4 | 4 x 8192 x 512 x 8 | 2 | 13 | 9 | 3
256 | 4 x 8192 x 1024 x 2 | 512Mbit x 4 | 4 x 8192 x 1024 x 8 | 2 | 13 | 10 | 3
512 | 4 x 16384 x 1024 x 1 | 512Mbit x 8 | 4 x 16384 x 1024 x 8 | 2 | 14 | 10 | 3
512 | 8 x 8192 x 1024 x 2 | 1Gbit x 4 | 8 x 8192 x 1024 x 8 | 3 | 13 | 10 | 3
1024 | 8 x 16384 x 1024 x 1 | 1Gbit x 8 | 8 x 16384 x 1024 x 8 | 3 | 14 | 10 | 3
Address Mapping in Intel 82955X MCH
[Figure: one rank built from 4 devices (Device 0 to Device 3); each device has 4 banks (Bank 0 to Bank 3), and each bank has 8192 rows x 1024 columns with 2 bytes per column]
4 banks per device
4 devices per rank
col size per rank = col size per device x device count per rank = 2 x 4 = 8 (bytes)
Per-channel, per-rank address mapping scheme for single/asymmetric channel mode
[Figure: physical address bits 31 to 0 mapped, from high to low, to the fields Row ID | bank ID | col ID | Byte offset for each rank configuration below; the 3 lowest address bits (X X X) are the byte offset]
Rank capacity (MB) | Rank configuration: row count x bank count x col count x col size
128 | 8192x4x512x8
256 | 8192x4x1024x8
512 | 16384x4x1024x8
512 | 8192x8x1024x8
1024 | 16384x8x1024x8
Channel address and rank address are mapped to the highest bit field such that each rank or channel is a contiguous block of memory
Per-rank address mapping scheme for dual channel mode
[Figure: physical address bits 31 to 0 mapped to the fields Row ID, bank ID, col ID, Channel ID and Byte offset for each rank configuration below; the 3 lowest address bits (X X X) are the byte offset]
Rank capacity (MB) | Rank configuration: row count x bank count x col count x col size
128 | 8192x4x512x8
256 | 8192x4x1024x8
512 | 16384x4x1024x8
512 | 8192x8x1024x8
1024 | 16384x8x1024x8
• Each channel is assigned contiguous 64-byte blocks (a cacheline is 64 bytes), hence consecutive cacheline addresses are interleaved across channels
• Rank address is mapped to the highest bit field such that a rank is a contiguous memory block
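The effect of cacheline interleaving can be illustrated with a small address decoder. This is only a sketch under stated assumptions: two channels, 64-byte cachelines, and a single channel-select bit placed directly above the 6 cacheline-offset bits. The function name is hypothetical and the full 82955X bit layout is not reproduced here.

```python
CACHELINE_BYTES = 64
NUM_CHANNELS = 2


def channel_of(physical_address):
    # Consecutive cachelines alternate between the two channels.
    return (physical_address // CACHELINE_BYTES) % NUM_CHANNELS


# A streaming access pattern touches both channels alternately.
for address in range(0, 4 * CACHELINE_BYTES, CACHELINE_BYTES):
    print(hex(address), "-> channel", channel_of(address))
```

All bytes inside one cacheline map to the same channel, so a single cacheline fill is served by one channel while a stream of fills keeps both channels busy.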