Chapter 7 · 2005-12-12 · Computer Architecture CS 35101-002

Chapter 7: Large and Fast: Exploiting Memory Hierarchy


Basic Memory Requirements

Users/programmers demand:
• Large computer memory
• Very fast memory access

Technology limitations:
• Large computer memory: relatively slower access
• Small computer memory: relatively faster access

So how do you build a large computer memory with fast access?

Computer Memory: Use-Case Scenarios

If a memory item is referenced:
• It will most likely be referenced again soon (temporal locality)
• Its neighbors will tend to be referenced soon (spatial locality)

Basic philosophy: use the basic requirements, the technology limitations, and the stated use-case scenarios to architect the memory.

Memory Hierarchy: Design Philosophy

• Build a hierarchy of memories with the fastest access closest to the CPU
• Employ temporal locality and spatial locality in the design

[Figure: levels in the memory hierarchy. The CPU sits above Level 1, Level 2, ..., Level n. Labels: increasing distance from the CPU in access time (moving down the levels); size of the memory at each level; increasing speed (moving toward the CPU).]

Memory Speed: Technology Trend

Technology      Access time                  $ per GB in 2004
SRAM            0.5 – 5 ns                   $4,000 – $10,000
DRAM            50 – 70 ns                   $100 – $200
Magnetic disk   5,000,000 – 20,000,000 ns    $0.50 – $2

Three-Level Computer Memory Hierarchy

Level (closest to CPU first)    Technology      Notes
Cache                           SRAM            Small memory and fastest
Main                            DRAM
Virtual (stores data)           Magnetic disk   Biggest and slowest

The fastest memory is closest to the CPU.

Memory Hierarchy: Upper & Lower Levels

[Figure: the CPU issues a data request to the upper level; data is transferred between the upper and lower levels one block at a time.]

• If the requested data appears in a cache block: hit
• If it is not in the cache: miss

Memory Performance
• hit rate = hits / memory accesses
• miss rate = 1 - hit rate
• hit time = time to access the cache
• miss penalty = time to access the lower level + time to transfer the block to the upper level + access time of the upper memory
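The performance definitions above can be written out as a minimal Python sketch. The function names and the sample numbers (3 hits in 8 accesses, and the nanosecond timings) are illustrative assumptions, not figures from the slides.

```python
def hit_rate(hits: int, accesses: int) -> float:
    """hit rate = hits / memory accesses"""
    return hits / accesses


def miss_rate(hits: int, accesses: int) -> float:
    """miss rate = 1 - hit rate"""
    return 1.0 - hit_rate(hits, accesses)


def miss_penalty(lower_access_ns: float, transfer_ns: float, upper_access_ns: float) -> float:
    """miss penalty = lower-level access + block transfer + upper-level access"""
    return lower_access_ns + transfer_ns + upper_access_ns


# Hypothetical numbers, chosen only to exercise the formulas.
print(hit_rate(3, 8))             # fraction of accesses that hit
print(miss_rate(3, 8))            # fraction of accesses that miss
print(miss_penalty(60, 10, 1))    # total time charged to one miss, in ns
```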

Basics of Caches

Simple cache:
• The CPU requests Xn
• But Xn is not in the cache
• Xn is copied into the cache from memory (a trip to memory)
• Xn is returned to the CPU

Cache contents before the request: X3, X2, Xn-1, Xn-2, X1, X4
Cache contents after the request:  X3, Xn, X2, Xn-1, Xn-2, X1, X4

How do we know if a data item is in the cache, and how do we find it?

Cache Structure: Direct Mapped

Each memory location is mapped directly to exactly one location in the cache.

Example mapping scheme: cache location = (block address) modulo (number of cache blocks in the cache)

[Figure: a memory of 32 words (addresses 0 - 31) mapped onto a cache with 8 entries ($8 = 2^3$), each a 1-word block. Memory addresses 00001, 01001, 10001, and 11001 map to cache entry 001; addresses 00101, 01101, 10101, and 11101 map to cache entry 101.]

The total number of entries in a direct-mapped cache must be a power of 2.
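As a small illustration of the modulo mapping, the sketch below computes the cache entry for a block address. The function name `cache_index` is an assumption made for this example; the 8-entry cache matches the figure.

```python
def cache_index(block_address: int, num_blocks: int = 8) -> int:
    """Direct-mapped placement: block address modulo number of cache blocks."""
    return block_address % num_blocks


# Because the number of blocks is a power of 2, the modulo is simply
# the low-order bits of the address (3 bits for 8 blocks).
for address in (0b00001, 0b01001, 0b10001, 0b11001, 0b00101, 0b11101):
    print(f"{address:05b} -> entry {cache_index(address):03b}")
```

With a power-of-2 entry count the hardware needs no divider: the index is read directly from the low-order address bits.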

Basics of Cache

Cache mapping: many words map to one cache location.

How do we know whether the data in the cache corresponds to a requested word?
• Add a set of tags to the cache
• The tags contain the upper bits of the word address that are not used in the indexing

How do we know if a cache block contains valid information?
• Cache contents are invalid (empty) during CPU initialization
• Solution: add valid bits to indicate that an entry holds a valid address

Accessing a Direct-Mapped Cache

Assume a direct-mapped cache with 1-word blocks.

Request  Decimal ref address  Binary ref address  Hit/Miss  Assigned cache block
1.       22                   10110
2.       26                   11010
3.       22                   10110
4.       26                   11010
5.       16                   10000
6.       3                    00011
7.       16                   10000
8.       18                   10010

Accessing a Cache: State of Cache at Initialization

Index  Valid  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    N
111    N

The cache is empty.

Accessing a Cache, Request #1: after memory reference 10110

Index  Valid  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    Y      10   Memory contents 10110
111    N

The CPU encounters a miss; the cache copies the contents of memory address 10110.

Accessing a Cache, Request #2: after memory reference 11010

Index  Valid  Tag  Data
000    N
001    N
010    Y      11   Memory contents 11010
011    N
100    N
101    N
110    Y      10   Memory contents 10110
111    N

The CPU encounters a miss; the cache copies the contents of memory address 11010.

Accessing a Cache, Request #3: after memory reference 10110

Index  Valid  Tag  Data
000    N
001    N
010    Y      11   Memory contents 11010
011    N
100    N
101    N
110    Y      10   Memory contents 10110
111    N

The CPU encounters a hit.

Accessing a Cache, Request #4: after memory reference 11010

Index  Valid  Tag  Data
000    N
001    N
010    Y      11   Memory contents 11010
011    N
100    N
101    N
110    Y      10   Memory contents 10110
111    N

The CPU encounters a hit.

Accessing a Cache, Request #5: after memory reference 10000

Index  Valid  Tag  Data
000    Y      10   Memory contents 10000
001    N
010    Y      11   Memory contents 11010
011    N
100    N
101    N
110    Y      10   Memory contents 10110
111    N

The CPU encounters a miss; the cache copies the contents of memory address 10000.

Accessing a Cache, Request #6: after memory reference 00011

Index  Valid  Tag  Data
000    Y      10   Memory contents 10000
001    N
010    Y      11   Memory contents 11010
011    Y      00   Memory contents 00011
100    N
101    N
110    Y      10   Memory contents 10110
111    N

The CPU encounters a miss; the cache copies the contents of memory address 00011.

Accessing a Cache, Request #7: after memory reference 10000

Index  Valid  Tag  Data
000    Y      10   Memory contents 10000
001    N
010    Y      11   Memory contents 11010
011    Y      00   Memory contents 00011
100    N
101    N
110    Y      10   Memory contents 10110
111    N

The CPU encounters a hit.


Accessing a Cache, Request #8: after memory reference 10010

Index  Valid  Tag  Data
000    Y      10   Memory contents 10000
001    N
010    Y      10   Memory contents 10010
011    Y      00   Memory contents 00011
100    N
101    N
110    Y      10   Memory contents 10110
111    N

The CPU encounters a miss; the cache copies the contents of memory address 10010, replacing the entry for 11010.

Accessing a Direct-Mapped Cache

Assume a direct-mapped cache with 1-word blocks.

Request  Decimal ref address  Binary ref address  Hit/Miss           Assigned cache block
1.       22                   10110               Miss (fig. 7.6b)   110
2.       26                   11010               Miss (fig. 7.6c)   010
3.       22                   10110               Hit                110
4.       26                   11010               Hit                010
5.       16                   10000               Miss (fig. 7.6d)   000
6.       3                    00011               Miss (fig. 7.6e)   011
7.       16                   10000               Hit                000
8.       18                   10010               Miss (fig. 7.6f)   010
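The request sequence can be replayed with a short simulation. This is a minimal sketch, assuming an 8-entry direct-mapped cache with 1-word blocks and 5-bit word addresses as in the walkthrough; the function name `simulate_direct_mapped` is made up for the example.

```python
def simulate_direct_mapped(addresses, num_blocks=8):
    """Replay word addresses through a direct-mapped cache with 1-word blocks."""
    valid = [False] * num_blocks   # valid bit per entry
    tags = [0] * num_blocks        # stored tag per entry
    results = []
    for addr in addresses:
        index = addr % num_blocks  # low-order bits select the entry
        tag = addr // num_blocks   # remaining upper bits form the tag
        hit = valid[index] and tags[index] == tag
        if not hit:
            # Miss: copy the word from memory, record its tag, mark valid.
            valid[index] = True
            tags[index] = tag
        results.append((addr, format(index, "03b"), "hit" if hit else "miss"))
    return results


trace = [22, 26, 22, 26, 16, 3, 16, 18]
for addr, index, outcome in simulate_direct_mapped(trace):
    print(f"{addr:2d} -> block {index}: {outcome}")
```

Request 8 (address 18) misses even though entry 010 is valid, because the stored tag belongs to address 26; that is the tag check doing its job.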


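MIPS Direct-Mapped Cache: 32-bit Byte Reference Address

Address fields (showing bit positions):
• Tag field: bits 12 - 31 (20 bits)
• Cache index: bits 2 - 11 (10 bits)
• Byte offset: bits 0 - 1, ignored (not significant)

Cache size:
• $2^{10}$ blocks (entries 0 - 1023)
• 1 word/block

[Figure: the CPU presents the address to the cache. The 10-bit index selects one of the 1024 entries, each holding a valid bit, a 20-bit tag, and 32 bits of data. The stored tag is compared (=) with the address tag to produce Hit, and the entry supplies Data.]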
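The field boundaries above translate directly into shifts and masks. The sketch below is illustrative only: `split_address` is a hypothetical helper name and the sample address is arbitrary.

```python
def split_address(addr: int):
    """Split a 32-bit byte address into (tag, index, byte_offset)."""
    byte_offset = addr & 0x3        # bits 0-1
    index = (addr >> 2) & 0x3FF     # bits 2-11 (10 bits, 1024 entries)
    tag = (addr >> 12) & 0xFFFFF    # bits 12-31 (20 bits)
    return tag, index, byte_offset


tag, index, byte_offset = split_address(0x12345678)
print(hex(tag), index, byte_offset)
```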

Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words

Index  Word 0  Word 1  Word 2  Word 3
00
01
10
11

How do we map a memory address to a direct-mapped cache with four-word blocks?

Four-Word Blocks & Total Size 16 Words

Consider the following reference memory addresses:

Request  Decimal ref address  Binary ref address
1.       22                   10110
2.       20                   10100
3.       26                   11010
4.       23                   10111
5.       28                   11100
6.       16                   10000
7.       3                    00011
8.       16                   10000
9.       18                   10010

Show the direct-mapped cache contents at each stage.

Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words

Initial content of cache

Index  V  Tag  Word 0  Word 1  Word 2  Word 3
00     N
01     N
10     N
11     N

The cache is empty.

Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words

Request #1: after memory reference to address 22 (1 01 10)

• Word within block (trailing 2 bits of 22): 22 mod 4 = 2 = binary 10
• Index (next 2 upper bits of 22; four blocks need 2 bits): 01
• Tag (remaining bits of 22): 1
• Entry 01 is empty, so the address tag differs from the cache tag content: miss
• Transfer the data to the cache, then set the tag to 1 and the V bit to Y

Index  V  Tag  Word 0   Word 1   Word 2   Word 3
00     N
01     Y  1    Con(20)  Con(21)  Con(22)  Con(23)
10     N
11     N

Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words

Request #2: after memory reference to address 20 (1 01 00)

• Word within block: 20 mod 4 = 0 = binary 00
• Index (2 bits): 01
• Tag (remaining bits): 1
• The tag and V bit of entry 01 are already set to 1 and Y respectively: hit

Index  V  Tag  Word 0   Word 1   Word 2   Word 3
00     N
01     Y  1    Con(20)  Con(21)  Con(22)  Con(23)
10     N
11     N

Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words

Request #3: after memory reference to address 26 (1 10 10)

• Word within block: 26 mod 4 = 2 = binary 10
• Index (2 bits): 10
• Tag (remaining bits): 1
• The tag of entry 10 is not set and its V bit is N: miss
• Transfer the data to the cache, then set the tag to 1 and the V bit to Y

Index  V  Tag  Word 0   Word 1   Word 2   Word 3
00     N
01     Y  1    Con(20)  Con(21)  Con(22)  Con(23)
10     Y  1    Con(24)  Con(25)  Con(26)  Con(27)
11     N

Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words

Request #4: after memory reference to address 23 (1 01 11)

• Word within block: 23 mod 4 = 3 = binary 11
• Index (2 bits): 01
• Tag (remaining bits): 1
• The tag of entry 01 is already set to 1 and its V bit is Y: hit

Index  V  Tag  Word 0   Word 1   Word 2   Word 3
00     N
01     Y  1    Con(20)  Con(21)  Con(22)  Con(23)
10     Y  1    Con(24)  Con(25)  Con(26)  Con(27)
11     N

Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words

Request #5: after memory reference to address 28 (1 11 00)

• Word within block: 28 mod 4 = 0 = binary 00
• Index (2 bits): 11
• Tag (remaining bits): 1
• The tag of entry 11 is not set and its V bit is N: miss
• Transfer the data to the cache, then set the tag to 1 and the V bit to Y

Index  V  Tag  Word 0   Word 1   Word 2   Word 3
00     N
01     Y  1    Con(20)  Con(21)  Con(22)  Con(23)
10     Y  1    Con(24)  Con(25)  Con(26)  Con(27)
11     Y  1    Con(28)  Con(29)  Con(30)  Con(31)

Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words

Request #6: after memory reference to address 16 (1 00 00)

• Word within block: 16 mod 4 = 0 = binary 00
• Index (2 bits): 00
• Tag (remaining bits): 1
• The tag of entry 00 is not set and its V bit is N: miss
• Transfer the data to the cache, then set the tag to 1 and the V bit to Y

Index  V  Tag  Word 0   Word 1   Word 2   Word 3
00     Y  1    Con(16)  Con(17)  Con(18)  Con(19)
01     Y  1    Con(20)  Con(21)  Con(22)  Con(23)
10     Y  1    Con(24)  Con(25)  Con(26)  Con(27)
11     Y  1    Con(28)  Con(29)  Con(30)  Con(31)

Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words

Request #7: after memory reference to address 3 (0 00 11)

• Word within block: 3 mod 4 = 3 = binary 11
• Index (2 bits): 00
• Tag (remaining bits): 0
• But the tag of entry 00 is set to 1: miss
• Transfer the data to the cache, then set the tag to 0 and the V bit to Y

Index  V  Tag  Word 0   Word 1   Word 2   Word 3
00     Y  0    Con(0)   Con(1)   Con(2)   Con(3)
01     Y  1    Con(20)  Con(21)  Con(22)  Con(23)
10     Y  1    Con(24)  Con(25)  Con(26)  Con(27)
11     Y  1    Con(28)  Con(29)  Con(30)  Con(31)

Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words

Request #8: after memory reference to address 16 (1 00 00)

• Word within block: 16 mod 4 = 0 = binary 00
• Index (2 bits): 00
• Tag (remaining bits): 1
• But the tag of entry 00 is set to 0: miss
• Transfer the data to the cache, then set the tag to 1 and the V bit to Y

Index  V  Tag  Word 0   Word 1   Word 2   Word 3
00     Y  1    Con(16)  Con(17)  Con(18)  Con(19)
01     Y  1    Con(20)  Con(21)  Con(22)  Con(23)
10     Y  1    Con(24)  Con(25)  Con(26)  Con(27)
11     Y  1    Con(28)  Con(29)  Con(30)  Con(31)

Direct-Mapped Cache: Four-Word Blocks & Total Size 16 Words

Request #9: after memory reference to address 18 (1 00 10)

• Word within block: 18 mod 4 = 2 = binary 10
• Index (2 bits): 00
• Tag (remaining bits): 1
• The tag of entry 00 is already set to 1 and its V bit is Y: hit

Index  V  Tag  Word 0   Word 1   Word 2   Word 3
00     Y  1    Con(16)  Con(17)  Con(18)  Con(19)
01     Y  1    Con(20)  Con(21)  Con(22)  Con(23)
10     Y  1    Con(24)  Con(25)  Con(26)  Con(27)
11     Y  1    Con(28)  Con(29)  Con(30)  Con(31)
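The nine requests of the four-word-block walkthrough can be checked with a short simulation. This is a sketch under the slide's assumptions (4 cache blocks, 4 words per block, 5-bit word addresses); the function name is hypothetical.

```python
def simulate_blocked_direct_mapped(word_addresses, num_blocks=4, words_per_block=4):
    """Replay word addresses through a direct-mapped cache with multi-word blocks."""
    valid = [False] * num_blocks
    tags = [None] * num_blocks
    outcomes = []
    for addr in word_addresses:
        offset = addr % words_per_block          # word within the block (low 2 bits)
        block_address = addr // words_per_block  # drop the offset bits
        index = block_address % num_blocks       # next 2 bits select the entry
        tag = block_address // num_blocks        # remaining bit is the tag
        hit = valid[index] and tags[index] == tag
        if not hit:
            # Miss: the whole 4-word block is transferred, then tag and V are set.
            valid[index], tags[index] = True, tag
        outcomes.append((addr, tag, index, offset, "hit" if hit else "miss"))
    return outcomes


for addr, tag, index, offset, outcome in simulate_blocked_direct_mapped(
        [22, 20, 26, 23, 28, 16, 3, 16, 18]):
    print(f"{addr:2d}: tag={tag} index={index:02b} offset={offset:02b} {outcome}")
```

Requests 2 and 4 hit only because their neighbors were fetched in the same block, which is spatial locality at work; requests 7 and 8 show two addresses competing for entry 00.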

Handling Hits and Misses: Control Unit

Read hits
• Business as usual (BAU): this is what we want!

Read misses
• Stall the CPU; the control unit fetches the block from memory, delivers it to the cache, and restarts the access

Write hits
• Replace the data in both the cache and memory (write-through), or
• Write the data only into the cache and write the cache block back to memory later (write-back)

Write misses
• Read the entire block into the cache, then write the word
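The two write-hit policies can be contrasted with a toy model. Everything here is an illustrative assumption: the class name `TinyCache`, the use of a dict for main memory, one word per line, and the dirty set used to remember which cached words still need writing back (the slide only says the write happens "later").

```python
class TinyCache:
    """Toy cache keyed by address; `memory` is a dict standing in for main memory."""

    def __init__(self, memory, write_back=False):
        self.memory = memory
        self.write_back = write_back
        self.lines = {}     # address -> cached value
        self.dirty = set()  # addresses changed in the cache but not yet in memory

    def load(self, addr):
        """Bring a word into the cache, as on a read miss or write miss."""
        self.lines[addr] = self.memory[addr]

    def write(self, addr, value):
        if addr not in self.lines:
            self.load(addr)            # write miss: fetch first, then write
        self.lines[addr] = value
        if self.write_back:
            self.dirty.add(addr)       # write-back: defer the memory update
        else:
            self.memory[addr] = value  # write-through: update memory now

    def evict(self, addr):
        """Remove a line, writing it to memory first if it is dirty."""
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)
        self.lines.pop(addr, None)


memory = {0: 1}
cache = TinyCache(memory, write_back=True)
cache.write(0, 5)
print(memory[0])   # memory still holds the old value
cache.evict(0)
print(memory[0])   # memory is updated when the line is written back
```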

Final Schedule

• Wed. Dec. 14 at 5:45 pm
• Extra class: Thursday, Dec. 8, from 5:15 pm in room 108!
• No class next Monday, but Dr. Samba will teach next Wednesday!

Other Cache Structures: Reducing Cache Misses

Fully associative cache
• Each block in memory can be placed anywhere in the cache

Set-associative cache
• Each block in memory can be placed in a fixed number of locations within a “set”
• In a 2-way set-associative cache, each set has 2 elements (cache locations)

Set-Associative Cache: Mapping of Memory Lines

• Each set can hold E lines, typically between 2 and 8
• A given memory line can map to any entry within its given set

Eviction policy
• Decides which line gets kicked out when a new line is brought in
• Commonly either “Least Recently Used” (LRU) or pseudo-random
• LRU: the least recently accessed (read or written) line gets evicted

[Figure: set i holds lines 0 through E–1, each with a tag, a valid bit, and data bytes 0 ... B–1, plus the LRU state for the set.]

Indexing into a 2-Way Associative Cache

Physical address fields: tag (t bits) | set index (s bits) | offset (b bits)

• Use the middle s bits to select from among $S = 2^s$ sets

[Figure: sets 0 through S–1, each holding two lines; every line has a tag, a valid bit, and data bytes 0 ... B–1.]

2-Way Associative Cache: Tag Matching

Physical address fields: tag (t bits) | set index (s bits) | offset (b bits)

Identifying the line within the selected set:
• One of the tags must match the high-order bits of the address (= ?)
• That line must have Valid = 1 (= 1?)
• The lower bits of the address select the byte or word within the cache line

2-Way Set-Associative Simulation

M = 16 addresses, B = 2 bytes/line, S = 2 sets, E = 2 entries/set
Address fields: t = 2 (tag), s = 1 (set index), b = 1 (offset)

Address trace (reads): 0 [0000], 1 [0001], 13 [1101], 8 [1000], 0 [0000]

Contents of set 0 after each miss (v, tag, data):

Access                       First entry         Second entry
0 (miss)                     1 00 m[1] m[0]
13 (miss)                    1 00 m[1] m[0]      1 11 m[13] m[12]
8 (miss, LRU replacement)    1 10 m[9] m[8]      1 11 m[13] m[12]
0 (miss, LRU replacement)    1 10 m[9] m[8]      1 00 m[1] m[0]
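The trace above can be replayed with a small LRU simulation. This is a sketch, not the slide author's code: the function name is invented, and each set is modeled as a Python list of tags ordered from least to most recently used.

```python
def simulate_set_associative(addresses, num_sets, ways, block_size):
    """Replay byte addresses through a set-associative cache with LRU eviction."""
    sets = [[] for _ in range(num_sets)]  # per set: tags, least recently used first
    outcomes = []
    for addr in addresses:
        block = addr // block_size        # strip the offset bits
        set_index = block % num_sets      # middle bits pick the set
        tag = block // num_sets           # high-order bits form the tag
        lines = sets[set_index]
        if tag in lines:
            lines.remove(tag)             # hit: will be re-appended as most recent
            outcomes.append("hit")
        else:
            if len(lines) == ways:
                lines.pop(0)              # set full: evict the least recently used
            outcomes.append("miss")
        lines.append(tag)                 # this tag is now the most recently used
    return outcomes, sets


outcomes, sets = simulate_set_associative([0, 1, 13, 8, 0], num_sets=2, ways=2, block_size=2)
print(outcomes)
print([format(tag, "02b") for tag in sets[0]])
```

The read of address 1 hits in the line loaded for address 0; the other four accesses all land in set 0 and evict one another, which is why the final read of 0 misses again.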

Two-Way Set-Associative Cache: Implementation

• The set index selects a set from the cache
• The two tags in the set are compared in parallel
• Data is selected based on the tag comparison result

[Figure: two banks, each with Valid, Cache Tag, and Cache Data columns, both indexed by the set index. The address tag (Adr Tag) is compared with the stored tag of each bank; the two compare outputs are ORed to produce Hit and drive the Sel1/Sel0 inputs of a mux that selects the cache line.]

Two-Way Set-Associative Cache

Show the cache contents after each request.

Order of requests  Decimal reference address  Binary reference address
1.                 0                          00000
2.                 8                          01000
3.                 0                          00000
4.                 6                          00110
5.                 8                          01000

Two-Way Set-Associative Cache: State of Cache at Initialization

Set  Element 1 (V, Tag, Data)  Element 2 (V, Tag, Data)
0    N                         N
1    N                         N

We will assume:
2. The set-associative cache has 2 sets ($2^1$) with two elements per set
3. The cache block size is 1 byte = $2^0$

Number of bits in the tag
= (bits to represent the address) - (bits to represent the set) - (bits for the cache block size)
= (bits to represent the address) - 1 - 0

Two-Way Set-Associative Cache Contents

Request #1: after memory reference to address 0 (00000)

• Locate set: 0 mod 2 = 0 = binary 0
• Tag bits: 5 - 1 = 4, the leading 4 bits (0000)
• Search: the tag does not exist in set 0: miss
• Transfer the data to the cache, then set the tag to 0000 and the V bit to Y

Set  Element 1 (V, Tag, Data)  Element 2 (V, Tag, Data)
0    Y  0000  CON[0]
1

Two-Way Set-Associative Cache Contents

Request #2: after memory reference to address 8 (01000)

• Locate set: 8 mod 2 = 0 = binary 0
• Tag bits: 5 - 1 = 4, the leading 4 bits (0100)
• Search: the tag does not exist in set 0: miss
• Transfer the data to the cache, then set the tag to 0100 and the V bit to Y

Set  Element 1 (V, Tag, Data)  Element 2 (V, Tag, Data)
0    Y  0000  CON[0]           Y  0100  CON[8]
1

Two-Way Set-Associative Cache Contents

Request #3: after memory reference to address 0 (00000)

• Locate set: 0 mod 2 = 0 = binary 0
• Tag bits: 5 - 1 = 4, the leading 4 bits (0000)
• Search the tag bits in set 0 (in parallel) for 0000: the tag exists in set 0: hit
• Send the contents of the memory address to the CPU

Set  Element 1 (V, Tag, Data)  Element 2 (V, Tag, Data)
0    Y  0000  CON[0]           Y  0100  CON[8]
1

Two-Way Set-Associative Cache Contents

Request #4: after memory reference to address 6 (00110)

• Locate set: 6 mod 2 = 0 = binary 0
• Tag bits: 5 - 1 = 4, the leading 4 bits (0011)
• Search: the tag does not exist in set 0: miss
• Cache replacement algorithm: replace the least recently used block, which is CON[8]
• Transfer the data to the cache, then set the tag to 0011 and the V bit to Y

Set  Element 1 (V, Tag, Data)  Element 2 (V, Tag, Data)
0    Y  0000  CON[0]           Y  0011  CON[6]
1

Two-Way Set-Associative Cache Contents

Request #5: after memory reference to address 8 (01000)

• Locate set: 8 mod 2 = 0 = binary 0
• Tag bits: 5 - 1 = 4, the leading 4 bits (0100)
• Search: the tag does not exist in set 0: miss
• Cache replacement algorithm: replace the least recently used block, which is CON[0]
• Transfer the data to the cache, then set the tag to 0100 and the V bit to Y

Set  Element 1 (V, Tag, Data)  Element 2 (V, Tag, Data)
0    Y  0100  CON[8]           Y  0011  CON[6]
1

Fully Associative Cache: Mapping of Memory Lines

• The cache consists of a single set holding E lines
• A given memory line can map to any line in the set
• Only practical for small caches

[Figure: the entire cache is one set of lines 0 through E–1, each with a tag, a valid bit, and data bytes 0 ... B–1, plus the LRU state.]

Fully Associative Cache: Tag Matching

Physical address fields: tag (t bits) | offset (b bits)

Identifying the line:
• Must check all of the tags for a match (= ?)
• The matching line must have Valid = 1 (= 1?)
• The lower bits of the address select the byte or word within the cache line

Fully Associative Cache Simulation

M = 16 addresses, B = 2 bytes/line, S = 1 set, E = 4 entries/set
Address fields: t = 3 (tag), s = 0 (set index), b = 1 (offset)

Address trace (reads): 0 [0000], 1 [0001], 13 [1101], 8 [1000], 0 [0000]

Contents of the single set (set 0) after each miss (v, tag, data):

(1) 0 (miss):   1 000 m[1] m[0]
(2) 13 (miss):  1 000 m[1] m[0]; 1 110 m[13] m[12]
(3) 8 (miss):   1 000 m[1] m[0]; 1 110 m[13] m[12]; 1 100 m[9] m[8]

Set-Associative Cache: Block Replacement Algorithm

Least Recently Used (LRU)
• The block replaced is the one that has been unused for the longest time
• Tracks the usage of each element in a set
• Expensive with increased associativity (m-way set associative, where m is very large)

Most Recently Used (MRU)
Least Frequently Used (LFU)
Most Frequently Used (MFU)
First In First Out (FIFO)
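To see that the choice of policy changes the outcome, the sketch below counts hits for LRU and FIFO on one set. The function `count_hits`, its string `policy` argument, and the sample trace of tags are assumptions made for this illustration.

```python
def count_hits(trace, capacity, policy):
    """Count hits in a single set of `capacity` lines under "LRU" or "FIFO"."""
    lines = []  # front of the list is the next victim
    hits = 0
    for tag in trace:
        if tag in lines:
            hits += 1
            if policy == "LRU":
                # LRU refreshes a line on every use; FIFO ignores hits.
                lines.remove(tag)
                lines.append(tag)
        else:
            if len(lines) == capacity:
                lines.pop(0)  # evict the victim at the front
            lines.append(tag)
    return hits


trace = [0, 8, 0, 6, 8]  # hypothetical tags that all map to one 2-entry set
print("LRU hits: ", count_hits(trace, 2, "LRU"))
print("FIFO hits:", count_hits(trace, 2, "FIFO"))
```

Neither policy is better in general; on this particular trace FIFO happens to keep the block that is reused, while LRU evicts it.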

Let’s Summarize: M-Way Set-Associative Cache

• 1-way set-associative cache (M = 1) (√): direct mapped
• 2-way set-associative cache (M = 2) (√): cache block replacement: LRU
• 4-way set-associative cache (M = 4) (√): cache block replacement: LRU
• Fully associative cache: a block can be placed in any location in the cache, and all entries in the cache must be searched in response to a cache request (expensive)

Direct-Mapped Cache: Cache Size

Let’s assume:
• The index field occupies n bits, so the cache holds $2^n$ blocks
• The block size is $2^m$ words = $2^{m+5}$ bits

For a 32-bit byte address:
• Number of bits for the tag field = $32 - (n + m + 2)$
• Size of cache = $2^n \times (\text{block size} + \text{tag size} + \text{valid size})$

Since the block size is $2^{m+5}$ bits and the valid field is 1 bit, the number of bits is:

$2^n \times (2^{m+5} + [32 - (n + m + 2)] + 1) = 2^n \times (2^m \times 32 + [32 - (n + m + 2)] + 1)$

Cache size = $2^n \times (2^m \times 32 + 31 - n - m)$ bits

Exercise

How many total bits are required for a direct-mapped cache with 16 KB of data and 4-word blocks, assuming a 32-bit address?
• How many words are in each block?
• How many blocks?
• How many bits for the tag field?
• The total cache size = ?
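One way to work the exercise is to code the direct-mapped size relationship directly. The sketch assumes 32-bit words, a block of 2^m words (so 2^m × 32 data bits per block), one valid bit per block, and an n-bit index; the function name is hypothetical.

```python
def direct_mapped_cache_bits(n: int, m: int, address_bits: int = 32) -> int:
    """Total bits for a direct-mapped cache with 2**n blocks of 2**m words each."""
    block_bits = (2 ** m) * 32              # data bits per block
    tag_bits = address_bits - (n + m + 2)   # 2 bits are the byte offset
    valid_bits = 1
    return (2 ** n) * (block_bits + tag_bits + valid_bits)


data_bytes = 16 * 1024
words = data_bytes // 4          # 4-byte words
blocks = words // 4              # 4-word blocks
n = blocks.bit_length() - 1      # index bits, since blocks is a power of 2
m = 2                            # 4 words per block = 2**2

total = direct_mapped_cache_bits(n, m)
print(f"words={words}, blocks={blocks}, n={n}, tag bits={32 - (n + m + 2)}")
print(f"total = {total} bits = {total // 1024} Kbits")
```

The total is noticeably larger than the 16 KB of data alone because every block also carries its tag and valid bit.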
