Lec11 Computer Architecture by Hsien-Hsin Sean Lee, Georgia Tech -- Memory Part 3


Page 1: ECE 4100/6100 Advanced Computer Architecture

Lecture 11: DRAM and Storage

Prof. Hsien-Hsin Sean Lee, School of Electrical and Computer Engineering, Georgia Institute of Technology

Page 2: The DRAM Cell

• Why DRAMs
  – Higher density than SRAMs

• Disadvantages
  – Longer access times
  – Leaky, needs to be refreshed
  – Cannot be easily integrated with CMOS

Stack capacitor (vs. trench capacitor). Source: Memory Arch Course, INSA Toulouse

[Figure: 1T1C DRAM cell: word line (control), bit line (information), storage capacitor]

Page 3: One DRAM Bank

[Figure: one DRAM bank: the address feeds a row decoder that drives the wordlines and a column decoder that selects the data out; the cell array's bitlines feed the sense amps and I/O gating]

Page 4: Example: 512Mb 4-bank DRAM (x4)

[Figure: a x4 DRAM chip with four banks, each 16384 rows x 2048 columns x 4 bits (Bank0 shown); each bank has its own row decoder, sense amps, and column decoder, sharing I/O gating; data out D[3:0]]

• Address multiplexing: row address A[13:0] (16K rows), column address A[10:0] (2K columns), bank address BA[1:0]
• A DRAM page = 2K x 4 bits = 1KB
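To make the address multiplexing above concrete, here is a minimal sketch of splitting a flat cell address into bank, row, and column fields for this 512Mb x4 example. The field order (row above bank above column) and the names are illustrative assumptions; real controllers pick the ordering to maximize row-buffer and bank-level locality, and the row and column halves are sent over the same address pins at RAS and CAS time.

#include <stdint.h>
#include <stdio.h>

/* Field widths follow the slide's example: 14 row bits (16K rows),
 * 11 column bits (2K columns), 2 bank bits. The ordering below is an
 * assumption, not something the slide specifies. */
#define COL_BITS  11
#define BANK_BITS 2
#define ROW_BITS  14

typedef struct { uint32_t row, bank, col; } dram_addr_t;

static dram_addr_t split_address(uint32_t addr)
{
    dram_addr_t d;
    d.col  =  addr                            & ((1u << COL_BITS)  - 1);
    d.bank = (addr >> COL_BITS)               & ((1u << BANK_BITS) - 1);
    d.row  = (addr >> (COL_BITS + BANK_BITS)) & ((1u << ROW_BITS)  - 1);
    return d;
}

int main(void)
{
    dram_addr_t d = split_address(0x0123456);
    printf("row=%u bank=%u col=%u\n", d.row, d.bank, d.col);
    return 0;
}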

Page 5: DRAM Cell Array

[Figure: DRAM cell array: wordline0 through wordline1023 crossing bitline0 through bitline15, with one cell at each intersection]

Page 6: DRAM Sensing (Open Bitline Array)

[Figure: two DRAM subarrays (WL0-WL127 and WL128-WL255) with a shared row of sense amps between them; each amp compares a bitline from one subarray against the corresponding bitline of the other]

Page 7: DRAM Basics

• Address multiplexing
  – Send the row address when RAS is asserted
  – Send the column address when CAS is asserted
• DRAM reads are self-destructive
  – Must rewrite the row after a read
• Memory array
  – All bits within an array work in unison
• Memory bank
  – Different banks can operate independently
• DRAM rank
  – Chips inside the same rank are accessed simultaneously

Page 8: Examples of DRAM DIMM Standards

[Figure: two DIMM layouts: a x64 (no ECC) DIMM built from eight x8 chips covering data lines D0-D63, and a x72 (ECC) DIMM built from nine x8 chips covering D0-D63 plus check bits CB0-CB7]

Page 9: DRAM Ranks

[Figure: a memory controller driving two ranks, each built from eight x8 chips covering D0-D63; chip-select signals CS0 and CS1 choose between Rank0 and Rank1]

Page 10: DRAM Ranks

[Figure: rank configurations: a single rank of eight x8 chips forming a 64-bit channel; a single rank of sixteen x4 chips forming a 64-bit channel; and a dual-rank module in which each rank of eight x8 chips provides its own 64-bit width]

Page 11: DRAM Organization

Source: Memory Systems Architecture Course, Bruce Jacob, University of Maryland

Page 12: Organization of DRAM Modules

Source: Memory Systems Architecture Course, Bruce Jacob, University of Maryland

[Figure: a memory controller connected over a channel (address/command bus plus data bus) to multi-banked DRAM chips]

Page 13: DRAM Configuration Example

Source: Micron DDR3 DRAM

Page 14: DRAM Access (Non-Nibble Mode)

[Figure: the memory controller drives the DRAM module over the address bus with WE, CAS, and RAS controls: assert RAS with the row address to open the row, then assert CAS with a column address to transfer data; the timing diagram shows one row address followed by multiple column addresses, each returning data]

Page 15: DRAM Refresh

• Leaky storage
• Periodic refresh across DRAM rows
• Rows are inaccessible while being refreshed
• Refresh = read a row and write the same data back

• Example:
  – 4K rows in a DRAM
  – 100ns read cycle
  – Cells decay in 64ms
  – 4096 x 100ns = 410µs to refresh the whole device once
  – 410µs / 64ms = 0.64% unavailability
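A small sketch that reproduces the arithmetic above, using the slide's example values (4096 rows, 100ns per row refresh, 64ms retention):

#include <stdio.h>

int main(void)
{
    const double rows       = 4096;      /* rows to refresh          */
    const double read_cycle = 100e-9;    /* 100 ns per row refresh   */
    const double retention  = 64e-3;     /* cells decay in 64 ms     */

    double refresh_time = rows * read_cycle;         /* ~410 us */
    double overhead     = refresh_time / retention;  /* ~0.64%  */

    printf("time to refresh all rows: %.1f us\n", refresh_time * 1e6);
    printf("unavailability: %.2f%%\n", overhead * 100.0);
    /* With distributed refresh (next slide), one row is refreshed every
     * 64 ms / 4096 = ~15.6 us instead of refreshing all rows back to back. */
    return 0;
}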

Page 16: DRAM Refresh Styles

• Bursty
  – All rows refreshed back to back: 410µs (= 100ns x 4096) out of every 64ms retention period

• Distributed
  – One 100ns row refresh every 15.6µs (= 64ms / 4096), spread across the 64ms period

Page 17: DRAM Refresh Policies

• RAS-Only Refresh
  – The memory controller drives a row address on the address bus and asserts RAS; the DRAM refreshes that row

• CAS-Before-RAS (CBR) Refresh
  – The controller asserts CAS before RAS with WE# held high; no address is involved, since the DRAM's internal address counter selects the row to refresh and is incremented afterward
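To make CBR concrete, here is a minimal sketch of the device-internal refresh counter behavior described above. NUM_ROWS and the refresh_row() helper are hypothetical names, and real DRAMs implement this in hardware rather than software.

#include <stdint.h>
#include <stdio.h>

#define NUM_ROWS 4096u            /* hypothetical row count */

static uint32_t refresh_counter;  /* DRAM-internal; never seen by the controller */

/* Hypothetical helper standing in for the internal row-refresh operation. */
static void refresh_row(uint32_t row) { (void)row; }

/* One CBR refresh command: the controller supplies no address; the DRAM
 * refreshes the row chosen by its own counter, then increments the counter. */
static void cbr_refresh(void)
{
    refresh_row(refresh_counter);
    refresh_counter = (refresh_counter + 1) % NUM_ROWS;
}

int main(void)
{
    for (int i = 0; i < 3; i++)
        cbr_refresh();
    printf("next row to refresh: %u\n", refresh_counter);
    return 0;
}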

Page 18: Types of DRAM

• Asynchronous DRAM
  – Normal: responds to RAS and CAS signals (no clock)
  – Fast Page Mode (FPM): row remains open after RAS for multiple CAS commands
  – Extended Data Out (EDO): changes output drivers to latches; data can be held on the bus for a longer time
  – Burst Extended Data Out (BEDO): an internal counter drives the address latch; able to provide data in burst mode

• Synchronous DRAM
  – SDRAM: all of the above, plus a clock; adds predictability to DRAM operation
  – DDR, DDR2, DDR3: transfer data on both edges of the clock
  – FB-DIMM: DIMMs connected using point-to-point links instead of a bus; allows more DIMMs to be incorporated in server-based systems

• RDRAM
  – Low pin count

Page 19: Disk Storage

Page 20: Disk Organization

[Figure: disk organization: 1 to 12 platters spinning at 3600 to 15000 RPM; 5000 to 30000 tracks per surface; 100 to 500 sectors per track, 512 bytes per sector; a cylinder is the set of tracks at the same arm position across all platters]

Page 21: Disk Organization

[Figure: arm with read/write head flying tens of nanometers above the magnetic surface]

Page 22: Disk Access Time

• Seek time
  – Move the arm to the desired track
  – 5ms to 12ms
• Rotational latency (or delay)
  – For example, the average rotational latency for a 10,000 RPM disk is 3ms (= 0.5 / (10,000/60))
• Data transfer latency (or throughput)
  – Tens to hundreds of MB per second
  – E.g., Seagate Cheetah 15K.6 sustains 164MB/s
• Disk controller overhead

• Use a disk cache (or cache buffer) to exploit locality
  – 4 to 32MB today
  – Comes with the embedded controller in the HDD
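A small sketch that adds up the access-time components listed above. The seek time range, RPM, and bandwidth come from the slide's examples; the request size and controller overhead are assumptions made only so the sum is concrete.

#include <stdio.h>

int main(void)
{
    double seek_ms       = 8.0;                        /* within the slide's 5-12ms range */
    double rpm           = 10000.0;
    double rotation_ms   = 0.5 / (rpm / 60.0) * 1e3;   /* average: half a revolution      */
    double bandwidth_mbs = 164.0;                      /* Seagate Cheetah 15K.6 sustained */
    double transfer_kb   = 4.0;                        /* assumed request size            */
    double transfer_ms   = transfer_kb / 1024.0 / bandwidth_mbs * 1e3;
    double controller_ms = 0.2;                        /* assumed controller overhead     */

    printf("average access time ~= %.2f ms\n",
           seek_ms + rotation_ms + transfer_ms + controller_ms);
    return 0;
}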

Page 23: Reliability, Availability, Dependability

• Program faults

Page 24: Reliability, Availability, Dependability

• Program faults
• Static (permanent) faults
  – Design flaws
    • e.g., the Pentium FDIV bug (~$500 million)
  – Manufacturing
    • Stuck-at faults
    • Process variability
• Dynamic faults
  – Soft errors
  – Noise-induced
  – Wear-out

Page 25: Solution Space

• DRAM / SRAM
  – Use ECC (SECDED)

• Disks
  – Use redundancy
    • User's backup
    • Disk arrays

Page 26: RAID

• Reliability and performance considerations
• Redundant Array of Inexpensive Disks
• Combines multiple small, inexpensive disk drives
• Breaks arrays into "reliability groups"
• Data are divided and replicated across multiple disk drives
• RAID-0 to RAID-5

• Hardware RAID
  – Dedicated HW controller
• Software RAID
  – Implemented in the OS

Page 27: Basic Principles

• Data mirroring
• Data striping (a block-mapping sketch follows this list)
• Error correction code
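A minimal sketch of block-level data striping as used by the RAID levels on the following slides: logical blocks are laid out round-robin across the data disks. The round-robin layout and the names are illustrative assumptions, not something the slides prescribe.

#include <stdint.h>
#include <stdio.h>

/* Round-robin block striping across N data disks (illustrative). */
#define N_DATA_DISKS 4

typedef struct { uint32_t disk, stripe; } location_t;

static location_t map_block(uint32_t logical_block)
{
    location_t loc;
    loc.disk   = logical_block % N_DATA_DISKS;  /* which disk holds the block  */
    loc.stripe = logical_block / N_DATA_DISKS;  /* block offset on that disk   */
    return loc;
}

int main(void)
{
    for (uint32_t b = 0; b < 8; b++) {
        location_t loc = map_block(b);
        printf("logical block %u -> disk %u, stripe %u\n", b, loc.disk, loc.stripe);
    }
    return 0;
}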

Page 28: RAID-1

• Mirrored disks
• Most expensive (100% overhead)
• Every write to the data disk also writes to the check disk
• Can improve read/seek performance with a sufficient number of controllers

[Figure: blocks A0-A4 stored on Disk 0 (data disk) and mirrored on Disk 1 (check disk)]

Page 29: RAID-10

• Combines data striping on top of RAID-1

[Figure: six disks forming three mirrored pairs (Data Disks 0/1, 2/3, 4/5); blocks A0, A1, A2, ..., B0, B1, ... are striped across the pairs, and each disk's contents are duplicated on its mirror]

Page 30: RAID-2

• Bit-interleaved striping
• Uses a Hamming code to generate and store ECC on check disks, e.g., Hamming(7,4) (a small encoder sketch follows this slide)
  – Space: 4 data disks need 3 check disks (75% overhead), 10 data disks need 4 check disks (40% overhead), 25 data disks need 5 check disks (20%)
  – The CPU needs more compute power to generate a Hamming code than simple parity
• Complex controller
• Not really used today!

[Figure: data blocks A-D bit-interleaved across Data Disks 0-3, with ECC bits stored on Check Disks 0-2]
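Since the slide cites Hamming(7,4), here is a minimal encoder sketch for one 4-bit value. The codeword layout (p1 p2 d1 p3 d2 d3 d4) follows the common textbook convention and is an assumption about presentation; RAID-2 would apply such a code bit position by bit position across the striped disks.

#include <stdint.h>
#include <stdio.h>

/* Hamming(7,4): encode 4 data bits (d1..d4) into a 7-bit codeword laid out
 * as p1 p2 d1 p3 d2 d3 d4 (bit 6 down to bit 0). Any single-bit error can
 * later be located by recomputing the three parity checks. */
static uint8_t hamming74_encode(uint8_t data)
{
    uint8_t d1 = (data >> 3) & 1, d2 = (data >> 2) & 1;
    uint8_t d3 = (data >> 1) & 1, d4 = data & 1;

    uint8_t p1 = d1 ^ d2 ^ d4;   /* covers codeword positions 1,3,5,7 */
    uint8_t p2 = d1 ^ d3 ^ d4;   /* covers codeword positions 2,3,6,7 */
    uint8_t p3 = d2 ^ d3 ^ d4;   /* covers codeword positions 4,5,6,7 */

    return (uint8_t)((p1 << 6) | (p2 << 5) | (d1 << 4) |
                     (p3 << 3) | (d2 << 2) | (d3 << 1) | d4);
}

int main(void)
{
    printf("codeword for 0x%X: 0x%02X\n", 0xB, hamming74_encode(0xB));
    return 0;
}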

Page 31: RAID-3

• Byte-level striping
• Uses XOR parity, generated and stored on the check disk (see the parity sketch after the figure)
• At least 3 disks: 2 data disks + 1 check disk

[Figure: one transfer unit spread as A0-A3 across Data Disks 0-3, with parity bytes ECCa-ECCd on Check Disk 0]
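A minimal sketch of the XOR parity used by RAID-3 (and by RAID-4/5 on the next slides): each parity byte is the XOR of the corresponding bytes on the data disks, so the contents of any single failed disk can be rebuilt by XORing the survivors with the parity. The strip size and names are illustrative.

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define N_DATA 4      /* data disks, as in the figure        */
#define UNIT   512    /* bytes per strip (illustrative size) */

/* Compute the parity strip as the XOR of all data strips. */
static void make_parity(uint8_t data[N_DATA][UNIT], uint8_t parity[UNIT])
{
    for (size_t i = 0; i < UNIT; i++) {
        uint8_t p = 0;
        for (int d = 0; d < N_DATA; d++)
            p ^= data[d][i];
        parity[i] = p;
    }
}

/* Rebuild a failed data strip by XORing the surviving strips with the parity. */
static void rebuild(uint8_t data[N_DATA][UNIT], const uint8_t parity[UNIT],
                    int failed, uint8_t out[UNIT])
{
    for (size_t i = 0; i < UNIT; i++) {
        uint8_t v = parity[i];
        for (int d = 0; d < N_DATA; d++)
            if (d != failed)
                v ^= data[d][i];
        out[i] = v;
    }
}

int main(void)
{
    static uint8_t data[N_DATA][UNIT], parity[UNIT], recovered[UNIT];
    for (int d = 0; d < N_DATA; d++)
        memset(data[d], 0x10 + d, UNIT);   /* dummy contents per disk */
    make_parity(data, parity);
    rebuild(data, parity, 2, recovered);   /* pretend disk 2 failed   */
    printf("disk 2 recovered correctly: %s\n",
           memcmp(recovered, data[2], UNIT) == 0 ? "yes" : "no");
    return 0;
}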

Page 32: RAID-4

• Block-level striping
• Keeps each individually accessed unit on one disk
  – Does not access all disks for (small) transfers
  – Improved parallelism
• Uses XOR parity, generated and stored on the check disk
• Check info is calculated over a piece of each transfer unit
• Small read: one read on one disk
• Small write: two reads and two writes (data and check disks)
  – New parity = (old data XOR new data) XOR old parity (see the sketch after this slide's figure)
  – No need to read B0, C0, and D0 when read-modify-writing A0
• Writes are the bottleneck, since every write accesses the check disk

[Figure: blocks A0-A3 on Data Disk 0, B0-B3 on Data Disk 1, C0-C3 on Data Disk 2, D0-D3 on Data Disk 3, and parity blocks ECC0-ECC3 on Check Disk 0]
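A minimal sketch of the small-write parity update described above: read the old data block and the old parity, then compute new parity = (old data XOR new data) XOR old parity, with no access to the other data disks. The block size and names are illustrative.

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

#define BLOCK 8   /* tiny block size, just for illustration */

/* Small write: given the old data, new data, and old parity for one block,
 * compute the new parity without reading any other data disk. */
static void small_write_parity(const uint8_t old_data[BLOCK],
                               const uint8_t new_data[BLOCK],
                               const uint8_t old_parity[BLOCK],
                               uint8_t new_parity[BLOCK])
{
    for (size_t i = 0; i < BLOCK; i++)
        new_parity[i] = (uint8_t)((old_data[i] ^ new_data[i]) ^ old_parity[i]);
}

int main(void)
{
    uint8_t a0_old[BLOCK] = {1, 2, 3, 4, 5, 6, 7, 8};
    uint8_t a0_new[BLOCK] = {9, 9, 9, 9, 9, 9, 9, 9};
    uint8_t parity[BLOCK] = {0};   /* pretend this was read from the check disk */
    uint8_t parity_new[BLOCK];

    small_write_parity(a0_old, a0_new, parity, parity_new);
    printf("new parity[0] = 0x%02X\n", parity_new[0]);
    return 0;
}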

Page 33: RAID-3 vs. RAID-4

[Figure: writing a new D0: in RAID-3 the new parity P is the XOR of the new D0 with the other data blocks D1, D2, D3; in RAID-4 the new parity is computed from the new D0, the old D0, and the old parity P]

Page 34: RAID-5

• Block-level striping
• Distributed parity to enable write parallelism; removes the bottleneck of accessing a single parity disk (a rotation sketch follows the figure)
• Example: a write to "sector A" and a write to "sector B" can be performed simultaneously

[Figure: five disks holding data blocks A-E with parity blocks ECC0-ECC4 rotated across all five disks, so no single disk holds all the parity]
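A minimal sketch of one possible RAID-5 parity rotation: for each stripe, one disk holds the parity block and the remaining disks hold data. The exact rotation direction and offset are a layout policy choice and an assumption here, so this mapping may not match the slide's figure exactly.

#include <stdio.h>

#define N_DISKS 5   /* as in the figure: 4 data blocks + 1 parity block per stripe */

/* Which disk holds parity for a given stripe (simple rotation, assumed). */
static int parity_disk(int stripe) { return stripe % N_DISKS; }

int main(void)
{
    for (int stripe = 0; stripe < 5; stripe++) {
        printf("stripe %d: parity on disk %d, data on disks", stripe, parity_disk(stripe));
        for (int d = 0; d < N_DISKS; d++)
            if (d != parity_disk(stripe))
                printf(" %d", d);
        printf("\n");
    }
    return 0;
}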

Page 35: RAID-6

• Similar to RAID-5, but with "dual distributed parity"
• ECC_p = XOR(A0, B0, C0); ECC_q = Code(A0, B0, C0, ECC_p) (a P/Q sketch follows the figure)
• Sustains 2 drive failures with no data loss
• Minimum requirement: 4 disks
  – 2 for data striping
  – 2 for dual parity

[Figure: five disks holding data blocks A-E with two parity blocks per stripe (ECCnp and ECCnq) rotated across all the disks]
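The slide writes the second parity abstractly as Code(...). One common realization (used, for example, by Linux md RAID-6) computes P as plain XOR and Q as a Reed-Solomon syndrome over GF(2^8) computed from the data blocks alone; the sketch below follows that formulation as an assumption, and it is not necessarily the exact code the lecture has in mind.

#include <stdint.h>
#include <stdio.h>

#define N_DATA 3   /* data blocks per stripe, matching the slide's A0, B0, C0 */

/* Multiply by x (i.e., by the generator 2) in GF(2^8) with the usual
 * polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x1D). */
static uint8_t gf_mul2(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1D : 0x00));
}

/* P = D0 ^ D1 ^ ... ; Q = g^0*D0 ^ g^1*D1 ^ ... evaluated with Horner's rule.
 * Losing any two of {data blocks, P, Q} still leaves the stripe recoverable. */
static void pq_parity(const uint8_t d[N_DATA], uint8_t *p, uint8_t *q)
{
    uint8_t P = 0, Q = 0;
    for (int i = N_DATA - 1; i >= 0; i--) {
        P ^= d[i];
        Q = (uint8_t)(gf_mul2(Q) ^ d[i]);
    }
    *p = P;
    *q = Q;
}

int main(void)
{
    uint8_t d[N_DATA] = {0x11, 0x22, 0x33};   /* one byte each from A0, B0, C0 */
    uint8_t p, q;
    pq_parity(d, &p, &q);
    printf("P = 0x%02X, Q = 0x%02X\n", p, q);
    return 0;
}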