memory hierarchy part 1

Faculty of Computer Science CMPUT 229 © 2006 Memory Hierarchy Part 1 Refreshing Memory

Upload: eurydice-charis

Post on 02-Jan-2016


Page 1: Memory Hierarchy Part 1

Faculty of Computer Science

CMPUT 229 © 2006

Memory Hierarchy Part 1

Refreshing Memory

Page 2: Memory Hierarchy Part 1

© 2006

Department of Computing Science

CMPUT 229

Reading Assignment

Optional: Bryant, Randal E., O’Hallaron, David, Computer Systems: A Programmer’s Perspective, Prentice Hall, 2003. (B&H)

Chapter 6: The Memory Hierarchy

Required: Sections 8.4 and 12.4 of the Clements textbook.

Page 3: Memory Hierarchy Part 1

Types of Memories

Read/Write Memory (RWM): we can store and retrieve data.

Random Access Memory (RAM): the time required to read or write a bit of memory is independent of the bit’s location.

Static Random Access Memory (SRAM): once a word is written to a location, it remains stored as long as power is applied to the chip, unless the location is written again.

Dynamic Random Access Memory (DRAM): the data stored at each location must be refreshed periodically by reading it and then writing it back again, or else it disappears.

Page 4: Memory Hierarchy Part 1

[Figure: an 8-word × 4-bit SRAM. A 3-to-8 decoder driven by address lines A2, A1, A0 selects one of eight rows (0 to 7) of four storage cells, each cell with IN, OUT, SEL, and WR terminals. Data enter on DIN3 to DIN0 and leave on DOUT3 to DOUT0. The control inputs are CS_L, WE_L, and OE_L, together with the internal signals WR_L and IOE_L.]
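The 8 × 4 SRAM in the figure can be described behaviourally in a few lines of C. This is a minimal sketch, not a gate-level model: it assumes the usual reading of the `_L` suffix (a control input is asserted when it is 0) and assumes the outputs are disabled during a write. The type `sram8x4` and the functions `decode3to8` and `sram_cycle` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical behavioural model of an 8-word x 4-bit SRAM. */
typedef struct {
    uint8_t word[8];   /* each entry holds 4 bits (0x0 to 0xF) */
} sram8x4;

/* 3-to-8 decoder: one-hot select pattern for address A2..A0. */
static uint8_t decode3to8(unsigned addr) {
    assert(addr < 8);
    return (uint8_t)(1u << addr);
}

/* One access cycle. Control inputs are active low (0 = asserted).
   Returns the value driven on DOUT3..DOUT0, or -1 when the outputs
   are not enabled (high impedance). */
static int sram_cycle(sram8x4 *m, unsigned addr, uint8_t din,
                      int cs_l, int we_l, int oe_l) {
    if (cs_l) return -1;                  /* chip not selected */
    uint8_t sel = decode3to8(addr);       /* exactly one row is selected */
    for (unsigned row = 0; row < 8; row++) {
        if (!(sel & (1u << row))) continue;
        if (!we_l) {                      /* write: latch DIN3..DIN0 */
            m->word[row] = din & 0x0F;
            return -1;                    /* assumption: outputs off while writing */
        }
        if (!oe_l) return m->word[row];   /* read: drive DOUT3..DOUT0 */
    }
    return -1;                            /* selected, but OE_L not asserted */
}
```

For example, a cycle at address 3 with CS_L = 0 and WE_L = 0 stores the value on DIN, and a later cycle at address 3 with CS_L = 0, WE_L = 1, and OE_L = 0 drives the same 4 bits onto DOUT3 to DOUT0.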

Page 5: Memory Hierarchy Part 1

[Figure: the 8 × 4 SRAM diagram from the previous page, repeated.]

Page 6: Memory Hierarchy Part 1

[Figure: the 8 × 4 SRAM diagram from the previous page, repeated.]

Page 7: Memory Hierarchy Part 1

[Figure: the 8 × 4 SRAM diagram from the previous page, repeated.]

Page 8: Memory Hierarchy Part 1

Refreshing the Memory

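[Figure: capacitor voltage Vcap versus time, between 0 V and VCC, with HIGH and LOW levels marked, showing a stored 0, a written 1, and the refreshes that restore the 1.]

Because the charge stored on a DRAM cell's capacitor leaks away over time, the solution is to periodically refresh the memory cells by reading each one of them and writing it back.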
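The effect of refresh can be illustrated with a toy numerical model. The decay factor `LEAK`, the sensing threshold of VCC/2, and the step-based timing are all assumptions chosen for illustration, not properties of a real DRAM; the functions `sense`, `write_bit`, and `simulate` are hypothetical.

```c
#include <assert.h>

/* Hypothetical parameters, for illustration only. */
#define VCC 1.0
#define LEAK 0.9              /* fraction of the charge kept per time step */
#define THRESHOLD (VCC / 2)   /* assumed sensing threshold */

/* Sense the cell: a voltage above the threshold reads as 1. */
static int sense(double vcap) { return vcap > THRESHOLD; }

/* Write a bit: charge the capacitor to VCC for a 1, discharge it for a 0. */
static double write_bit(int bit) { return bit ? VCC : 0.0; }

/* Store `bit`, let `steps` time steps elapse, and refresh the cell every
   `refresh_interval` steps (0 = never). Returns the bit read at the end. */
static int simulate(int bit, int steps, int refresh_interval) {
    assert(steps >= 0 && refresh_interval >= 0);
    double vcap = write_bit(bit);
    for (int t = 1; t <= steps; t++) {
        vcap *= LEAK;                              /* charge leaks away */
        if (refresh_interval && t % refresh_interval == 0)
            vcap = write_bit(sense(vcap));         /* read, then write back */
    }
    return sense(vcap);
}
```

With these made-up numbers, a stored 1 survives when it is refreshed every 6 steps or less, but is read back as 0 when refreshes are 7 or more steps apart: the refresh then faithfully rewrites the already-decayed value.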

Page 9: Memory Hierarchy Part 1

SRAM with Bi-directional Data Bus

[Figure: an SRAM word of four cells (IN, OUT, SEL, WR) that share a bidirectional data bus, DIO3 to DIO0, with the microprocessor. Control inputs: CS_L, WE_L, and OE_L; internal signals: WR_L and IOE_L.]

Page 10: Memory Hierarchy Part 1

DRAM High Level View

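[Figure: a DRAM chip organized as a 4 × 4 array of supercells (rows 0 to 3, columns 0 to 3) with an internal row buffer. A memory controller, which connects to the CPU, drives the chip's 2-bit address lines (addr) and exchanges data over 8 data lines.]

Bryant/O’Hallaron, pp. 459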

Page 11: Memory Hierarchy Part 1

DRAM RAS Request

[Figure: the memory controller sends RAS = 2 on the 2-bit address lines, and the DRAM chip copies all of row 2 from its 4 × 4 array of supercells into the internal row buffer.]

RAS = Row Address Strobe

Bryant/O’Hallaron, pp. 460

Page 12: Memory Hierarchy Part 1

DRAM CAS Request

[Figure: the memory controller sends CAS = 1 on the 2-bit address lines, and the DRAM chip copies supercell (2,1) from its internal row buffer onto the 8 data lines, back to the memory controller.]

CAS = Column Address Strobe

Bryant/O’Hallaron, pp. 460
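The RAS/CAS sequence on the last two pages can be mimicked with a toy model of the 16 × 8-bit DRAM in the figure. Because the chip has only 2 address pins, the row and the column must be sent one after the other; this sketch models that by splitting an access into two calls. The names `dram`, `dram_ras`, and `dram_cas` are hypothetical, and the model ignores timing and the write-back of the row buffer.

```c
#include <assert.h>
#include <stdint.h>

#define ROWS 4
#define COLS 4

/* Toy 16 x 8-bit DRAM: a 4 x 4 array of 8-bit supercells. */
typedef struct {
    uint8_t cell[ROWS][COLS];
    uint8_t row_buffer[COLS];   /* internal row buffer */
    int open_row;               /* row currently in the buffer, -1 = none */
} dram;

/* RAS: copy an entire row into the internal row buffer. */
static void dram_ras(dram *d, unsigned row) {
    assert(row < ROWS);
    for (unsigned c = 0; c < COLS; c++)
        d->row_buffer[c] = d->cell[row][c];
    d->open_row = (int)row;
}

/* CAS: return one supercell (8 bits) from the row buffer. */
static uint8_t dram_cas(const dram *d, unsigned col) {
    assert(d->open_row >= 0 && col < COLS);
    return d->row_buffer[col];
}
```

Reading supercell (2,1) is then `dram_ras(&d, 2)` followed by `dram_cas(&d, 1)`, mirroring RAS = 2 and CAS = 1 in the figures.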

Page 13: Memory Hierarchy Part 1

Memory Modules

[Figure: a 64 MB memory module consisting of eight 8M × 8 DRAMs, DRAM 0 to DRAM 7. The memory controller sends addr (row = i, col = j) to every chip, and each chip returns its supercell (i,j): DRAM 0 supplies bits 0-7, the following chips supply bits 8-15, 16-23, 24-31, 32-39, 40-47, and 48-55, and DRAM 7 supplies bits 56-63. Together they form the 64-bit doubleword at main memory address A, which is sent to the CPU chip.]

Bryant/O’Hallaron, pp. 461

Page 14: Memory Hierarchy Part 1

Read Cycle on an Asynchronous DRAM

Step 1: Apply the row address.

Step 2: RAS goes from high to low and remains low.

Step 3: Apply the column address.

Step 4: WE must be high.

Step 5: CAS goes from high to low and remains low.

Step 6: OE goes low.

Step 7: Data appears.

Step 8: RAS and CAS return to high.

Page 15: Memory Hierarchy Part 1

Improved DRAMs

Central Idea: Each read to a DRAM actually reads a complete row of bits, or word line, from the DRAM core into an array of sense amps.

A traditional asynchronous DRAM interface then selects a small number of these bits to be delivered to the cache/microprocessor.

All the other bits already extracted from the DRAM cells into the sense amps are wasted.

Page 16: Memory Hierarchy Part 1

Fast Page Mode DRAMs

In a DRAM with Fast Page Mode, a page is defined as all memory addresses that have the same row address.

To read in fast page mode, all the steps from 1 to 7 of a standard read cycle are performed.

Then OE and CAS are switched high, but RAS remains low.

Then steps 3 to 7 (providing a new column address, asserting CAS and OE) are performed for each new memory location to be read.
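The saving from Fast Page Mode can be counted directly: a full read cycle (steps 1 to 7) is needed only when the row changes, and every other access needs only steps 3 to 7. The sketch counts RAS activations for a trace of word addresses, assuming a word address splits into row = address / cols; the function `ras_cycles` and this address layout are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many times RAS must be asserted (steps 1-2 of the read cycle)
   to read the word addresses in addr[], for a DRAM whose rows hold `cols`
   words. Assumes row = address / cols. */
static size_t ras_cycles(const unsigned *addr, size_t n, unsigned cols,
                         int fast_page_mode) {
    assert(cols > 0);
    size_t count = 0;
    long open_row = -1;                  /* no row open yet */
    for (size_t i = 0; i < n; i++) {
        long row = (long)(addr[i] / cols);
        if (!fast_page_mode || row != open_row) {
            count++;                     /* full cycle: new row address + RAS */
            open_row = row;
        }
        /* otherwise only steps 3-7 (new column, CAS, OE) are needed */
    }
    return count;
}
```

For the sequential trace 0 to 7 with 4 words per row, Fast Page Mode needs 2 row activations instead of 8; an alternating trace such as 0, 4, 1, 5 gains nothing, because every access opens a different row.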

Page 17: Memory Hierarchy Part 1

A Fast Page Mode Read Cycle on an Asynchronous DRAM

Page 18: Memory Hierarchy Part 1

Enhanced Data Output RAMs (EDO-RAM)

The process to read multiple locations in an EDO-RAM is very similar to Fast Page Mode.

The difference is that the output drivers are not disabled when CAS goes high.

This distinction allows the data from the current read cycle to remain present at the outputs while the next cycle begins.

As a result, faster read cycle times are allowed.

Page 19: Memory Hierarchy Part 1

An Enhanced Data Output Read Cycle on an Asynchronous DRAM

Page 20: Memory Hierarchy Part 1

Synchronous DRAMs (SDRAM)

A Synchronous DRAM (SDRAM) has a clock input. It operates in a similar fashion to the fast page mode and EDO DRAMs. However, consecutive data are output synchronously on the rising or falling edge of the clock, instead of on command by CAS.

The number of data elements output in one burst (the burst length) is programmable, up to the size of a full row.

The SDRAM clock typically runs about an order of magnitude faster than the rate of individual random accesses; that is, a clock period is roughly a tenth of the access time.

Page 21: Memory Hierarchy Part 1

DDR SDRAM

A Double Data Rate (DDR) SDRAM is an SDRAM that transfers data on both the rising and the falling edge of the clock.

Thus the peak data transfer rate of a DDR SDRAM is twice that of a standard SDRAM with the same clock frequency.
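The factor of two is simple arithmetic: peak rate = clock frequency × transfers per clock × bus width in bytes. The sketch assumes an ideal bus that transfers on every clock edge it is allowed to use and ignores RAS/CAS latency and refresh; `peak_bytes_per_sec` is a hypothetical name, and the 100 MHz clock and 64-bit bus in the usage note are hypothetical values.

```c
#include <assert.h>
#include <stdint.h>

/* Peak transfer rate in bytes per second of an idealized memory bus.
   transfers_per_clock is 1 for a standard SDRAM (one clock edge)
   and 2 for a DDR SDRAM (both clock edges). */
static uint64_t peak_bytes_per_sec(uint64_t clock_hz, unsigned transfers_per_clock,
                                   unsigned bus_width_bits) {
    assert(transfers_per_clock == 1 || transfers_per_clock == 2);
    assert(bus_width_bits % 8 == 0);
    return clock_hz * transfers_per_clock * (bus_width_bits / 8);
}
```

With a hypothetical 100 MHz clock and a 64-bit bus, `peak_bytes_per_sec(100000000, 1, 64)` gives 800,000,000 bytes/s for a standard SDRAM and `peak_bytes_per_sec(100000000, 2, 64)` gives 1,600,000,000 bytes/s for a DDR SDRAM.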

Page 22: Memory Hierarchy Part 1

The Rambus DRAM (RDRAM)

Multiple memory arrays (banks).

Rambus DRAMs are synchronous and transfer data on both edges of the clock.

Page 23: Memory Hierarchy Part 1

SDRAM Memory Systems

Complex circuits for RAS/CAS/OE.

Each DIMM is connected in parallel with the memory controller (DIMM = Dual In-line Memory Module).

Often requires buffering.

Needs the whole clock cycle to establish valid data.

Making the bus wider is mechanically complicated.

Page 24: Memory Hierarchy Part 1

RDRAM Memory Systems

Page 25: Memory Hierarchy Part 1

Locality

We say that a computer program exhibits good locality if it tends to reference data that is near other recently referenced data, or data that has itself been referenced recently.

Because a program might do one of these things but not the other, the principle of locality is separated into two flavors:

Temporal locality: a memory location that is referenced once is likely to be referenced again multiple times in the near future.

Spatial locality: if a memory location is referenced once, then nearby memory locations are likely to be referenced in the near future.

Bryant/O’Hallaron, pp. 478

Page 26: Memory Hierarchy Part 1

Examples

In the Sampler function below, RandInt returns a randomly selected integer within the specified interval. Which program has better locality?

int SumVec(int v[], int N)
{
    int i;
    int sum = 0;

    for (i = 0; i < N; i = i + 1)
        sum += v[i];
    return sum;
}

int Sampler(int v[], int N, int K)
{
    int i, j;
    int sum = 0;

    for (i = 0; i < K; i = i + 1)
    {
        j = RandInt(0, N - 1);
        sum += v[j];
    }
    return sum / K;
}

Bryant/O’Hallaron, pp. 479
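One rough way to compare the two functions is to count how often an access falls in a different memory block from the previous access, i.e., how many block loads a cache that keeps only the last-used block would need. This is a deliberately simplified, hypothetical model, not a real cache: the function `block_loads`, a 32-byte block, and 4-byte `int`s are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Number of block loads needed by an access trace if only the most
   recently used block is kept (a tiny, hypothetical "cache").
   idx[] holds array indices; each element occupies elem_size bytes. */
static size_t block_loads(const size_t *idx, size_t n, size_t elem_size,
                          size_t block_size) {
    assert(elem_size > 0 && block_size >= elem_size);
    size_t loads = 0;
    size_t current = (size_t)-1;          /* no block loaded yet */
    for (size_t i = 0; i < n; i++) {
        size_t block = idx[i] * elem_size / block_size;
        if (block != current) {           /* access leaves the current block */
            loads++;
            current = block;
        }
    }
    return loads;
}
```

Under these assumptions, SumVec's stride-1 scan of 64 ints needs only 8 block loads (one per 8 consecutive ints), while Sampler's random indices, for a large N, typically need close to one block load per access.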

Page 27: Memory Hierarchy Part 1

Memory Hierarchy

Smaller, faster, and costlier (per byte) storage devices are at the top; larger, slower, and cheaper (per byte) storage devices are at the bottom.

L0: Registers. CPU registers hold words retrieved from cache memory.

L1: On-chip L1 cache (SRAM). L1 cache holds cache lines retrieved from the L2 cache.

L2: Off-chip L2 cache (SRAM). L2 cache holds cache lines retrieved from memory.

L3: Main memory (DRAM). Main memory holds disk blocks retrieved from local disks.

L4: Local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.

L5: Remote secondary storage (distributed file systems, Web servers).

Bryant/O’Hallaron, pp. 483

Page 28: Memory Hierarchy Part 1

Caching Principle

[Figure: level k holds copies of blocks 4, 9, 14, and 3; level k+1 is partitioned into blocks 0 to 15.]

Level k: the smaller, faster, more expensive device at level k caches a subset of the blocks from level k+1.

Level k+1: the larger, slower, cheaper storage device at level k+1 is partitioned into blocks.

Data is copied between levels in block-sized transfer units.

Bryant/O’Hallaron, pp. 484

Page 29: Memory Hierarchy Part 1

Cache Misses

Cold Misses, or compulsory misses, occur the first time a block of data is referenced, since it cannot yet be in the cache.

Conflict Misses occur when two memory references map to the same cache line. They can occur even when the rest of the cache is unused.

Capacity Misses occur when there are no free lines left in the cache, because the data being used no longer fits.
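The distinction between cold misses and the other kinds can be made concrete with a tiny direct-mapped cache simulator. The sketch assumes a 4-line cache holding one block per line and a trace of block numbers below 64 (both hypothetical). It labels a miss cold when the block has never been referenced before, and lumps conflict and capacity misses together, since telling those two apart needs a second, fully associative simulation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NLINES 4          /* hypothetical tiny direct-mapped cache */
#define MAX_BLOCKS 64     /* block numbers in the trace must be < MAX_BLOCKS */

typedef struct { size_t hits, cold, other; } miss_stats;

/* Simulate a direct-mapped cache on a trace of block numbers.
   A miss on a block never referenced before is cold; any other miss is
   a conflict or capacity miss (this sketch does not separate the two). */
static miss_stats simulate_dm(const unsigned *blocks, size_t n) {
    bool valid[NLINES] = {false};
    unsigned tag[NLINES] = {0};
    bool seen[MAX_BLOCKS] = {false};
    miss_stats s = {0, 0, 0};
    for (size_t i = 0; i < n; i++) {
        unsigned b = blocks[i];
        assert(b < MAX_BLOCKS);
        unsigned line = b % NLINES, t = b / NLINES;
        if (valid[line] && tag[line] == t) {
            s.hits++;
        } else {
            if (seen[b]) s.other++; else s.cold++;
            valid[line] = true;           /* evict whatever was there */
            tag[line] = t;
        }
        seen[b] = true;
    }
    return s;
}
```

On the trace 0, 4, 0, 4, blocks 0 and 4 both map to line 0, so after the two cold misses every access is a conflict miss, even though lines 1 to 3 stay empty.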

Page 30: Memory Hierarchy Part 1

Simplest Cache: Direct Mapped

[Figure: memory locations 0 to F (hex) mapped onto a 4 Byte Direct Mapped Cache with Cache Index 0 to 3.]

Location 0 can be occupied by data from:

– Memory locations 0, 4, 8, ...

– In general: any memory location whose address has 0s in its 2 least significant bits

– Address<1:0> => cache index

Which one should we place in the cache?

How can we tell which one is in the cache?
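For the 4-byte cache above, the index is just the two low address bits. The second question is answered by also storing the remaining upper address bits (the tag, introduced on the next page) with each entry. The function names below are hypothetical, and the sketch assumes 1-byte blocks and the 16 addresses 0 to F from the figure.

```c
#include <assert.h>

/* Hypothetical helpers for the 4-byte direct-mapped cache:
   16 memory locations (0x0 to 0xF), 1-byte blocks, 4 cache entries. */
static unsigned dm4_index(unsigned addr) {
    assert(addr <= 0xF);
    return addr & 0x3;          /* Address<1:0> => cache index */
}

static unsigned dm4_tag(unsigned addr) {
    assert(addr <= 0xF);
    return addr >> 2;           /* Address<3:2> tells the candidates apart */
}

/* An entry holds `addr` only if it is valid and its stored tag matches. */
static int dm4_holds(unsigned addr, int valid, unsigned stored_tag) {
    return valid && stored_tag == dm4_tag(addr);
}
```

Locations 0x0, 0x4, 0x8, and 0xC all have index 0 but tags 0, 1, 2, and 3, so the stored tag identifies which one currently occupies entry 0.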

Page 31: Memory Hierarchy Part 1

1 KB Direct Mapped Cache, 32 B blocks

For a 2^N byte cache:

– The uppermost (32 - N) bits are always the Cache Tag

– The lowest M bits are the Byte Select (Block Size = 2^M)

[Figure: the 32-bit address is split into Cache Tag (bits 31-10, e.g. 0x50), Cache Index (bits 9-5, e.g. 0x01), and Byte Select (bits 4-0, e.g. 0x00). The cache has 32 entries (Cache Index 0 to 31); each holds a Valid Bit, a Cache Tag stored as part of the cache “state” (e.g. 0x50), and 32 bytes of Cache Data: entry 0 holds Byte 0 to Byte 31, entry 1 holds Byte 32 to Byte 63, and entry 31 holds Byte 992 to Byte 1023.]
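The 1 KB example can be checked with bit operations. With N = 10 and M = 5, a 32-bit address splits into a 22-bit tag (bits 31-10), a 5-bit cache index (bits 9-5), and a 5-bit byte select (bits 4-0). The sketch assumes 32-bit addresses as on the slide; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 1 KB direct-mapped cache, 32-byte blocks, 32-bit addresses:
   cache size 2^N_BITS bytes, block size 2^M_BITS bytes,
   so there are 2^(N_BITS - M_BITS) = 32 entries. */
enum { N_BITS = 10, M_BITS = 5 };

static uint32_t byte_select(uint32_t a) {          /* bits 4..0 */
    return a & ((1u << M_BITS) - 1);
}
static uint32_t cache_index(uint32_t a) {          /* bits 9..5 */
    return (a >> M_BITS) & ((1u << (N_BITS - M_BITS)) - 1);
}
static uint32_t cache_tag(uint32_t a) {            /* bits 31..10 */
    return a >> N_BITS;
}

/* Rebuild an address from its three fields (inverse of the above). */
static uint32_t make_addr(uint32_t tag, uint32_t index, uint32_t byte) {
    assert(tag < (1u << (32 - N_BITS)) && index < 32 && byte < 32);
    return (tag << N_BITS) | (index << M_BITS) | byte;
}
```

Putting the slide's example fields together, tag 0x50, index 0x01, and byte select 0x00 give address 0x14020.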

Page 32: Memory Hierarchy Part 1

Direct-mapped Cache

Clements pp. 346

Page 33: Memory Hierarchy Part 1

Identifying sets in Direct-mapped Caches

Clements pp. 347

Page 34: Memory Hierarchy Part 1

Operation of a Direct-mapped Cache

Clements pp. 348

Page 35: Memory Hierarchy Part 1

Fully Associative Cache

Clements pp. 348

Page 36: Memory Hierarchy Part 1

Two-way Set Associative Cache

N-way set associative: N entries for each Cache Index

– N direct mapped caches operate in parallel (N is typically 2 to 4)

Example: Two-way set associative cache

– Cache Index selects a “set” from the cache

– The two tags in the set are compared in parallel

– Data is selected based on the tag result

[Figure: two ways, each an array of entries with Valid, Cache Tag, and Cache Data (Cache Block 0, ...), both indexed by the Cache Index. The address tag (Adr Tag) is compared with the tag of each way; the two Compare outputs drive Sel1 and Sel0 of a mux that outputs the selected Cache Block, and are ORed to produce Hit.]

Page 37: Memory Hierarchy Part 1

Set associative-mapped cache

Clements pp. 349

Page 38: Memory Hierarchy Part 1

L1 and L2 Bus System

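[Figure: the CPU chip contains the register file, the ALU, the L1 cache, and a bus interface. The L2 cache is attached to the bus interface by a cache bus; the system bus connects the bus interface to an I/O bridge, which connects to main memory over the memory bus.]

Bryant/O’Hallaron, pp. 488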

Page 39: Memory Hierarchy Part 1

Cache Organization

[Figure: a cache with S = 2^s sets (Set 0, Set 1, ..., Set S-1) and E lines per set. Each line has 1 valid bit, t tag bits, and a cache block of B = 2^b bytes (bytes 0 to B-1).]

Cache size: C = B x E x S data bytes

Bryant/O’Hallaron, pp. 488

Page 40: Memory Hierarchy Part 1

Address Partition

Address (m bits, from bit m-1 down to bit 0):

Tag (t bits) | Set index (s bits) | Block offset (b bits)

– Tag: compared with the tags in the cache to find a match.

– Set index: used to find the set where the data might be found in the cache.

– Block offset: selects which word, inside the block, is referenced.

Bryant/O’Hallaron, pp. 488
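The formulas on the last two pages fit in a few lines of C: C = B x E x S, and since an m-bit address holds s set-index bits and b block-offset bits, the tag has t = m - (s + b) bits. The struct `cache_params` and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Slide notation: S = 2^s sets, E lines per set, B = 2^b bytes per block,
   m address bits. */
typedef struct { unsigned s, E, b, m; } cache_params;

static uint64_t cache_size(cache_params p) {               /* C = B x E x S */
    return ((uint64_t)1 << p.b) * p.E * ((uint64_t)1 << p.s);
}
static unsigned tag_bits(cache_params p) {                 /* t = m - (s + b) */
    assert(p.s + p.b <= p.m);
    return p.m - (p.s + p.b);
}
static uint64_t block_offset(cache_params p, uint64_t a) { /* low b bits */
    return a & (((uint64_t)1 << p.b) - 1);
}
static uint64_t set_index(cache_params p, uint64_t a) {    /* next s bits */
    return (a >> p.b) & (((uint64_t)1 << p.s) - 1);
}
static uint64_t tag_of(cache_params p, uint64_t a) {       /* top t bits */
    return a >> (p.s + p.b);
}
```

For example, the 1 KB direct-mapped cache with 32-byte blocks from earlier has s = 5, E = 1, b = 5, and m = 32, so C = 32 x 1 x 32 = 1024 bytes and t = 22, matching its 22-bit cache tag.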

Page 41: Memory Hierarchy Part 1

Multi-Level Cache Organization

[Figure: the CPU's registers (Regs) are backed by an L1 i-cache and an L1 d-cache, then a unified L2 cache, main memory, and disk.]

Bryant/O’Hallaron, pp. 504