Caches
J. Nelson Amaral, University of Alberta

Post on 20-Dec-2015

Page 1

Caches

J. Nelson Amaral, University of Alberta

Page 2

Processor-Memory Performance Gap

Bauer p. 47

Page 3

Memory Hierarchy

Bauer p. 48

Page 4

Principle of Locality

• Temporal Locality: what was used in the past is likely to be reused in the near future

• Spatial Locality: what is close to what is being used now is likely also to be used in the near future

Bauer p. 48

Page 5

Hits and Misses

• Cache hit: the requested location is in the cache

• Cache miss: the requested location is not in the cache

Bauer p. 48

Page 6

Cache Organizations

• When to bring the content of a memory location into the cache? On demand.

• Where to put it? Depends on the cache organization.

• How do we know it is there? Tag entries.

• What happens if the cache is full and we need to bring the content of a location into the cache? Use a replacement algorithm.

Bauer p. 49

Page 7

Cache Organization

Bauer p. 50

Page 8

Mapping

Bauer p. 51

Page 9

Content-Addressable Memories (CAMs)

• Indexed by matching (part of) the content of entries

• All entries are searched in parallel

• Drawbacks:
– expensive hardware
– consume more power
– difficult to modify

Bauer p. 50

Page 10

Cache Geometry

• C: number of cache lines
• m: number of banks in the cache (associativity)
• L: line size
• S: cache size (or capacity)
• S = C × L
• (S, L, m) gives the geometry of a cache
• d: number of bits needed for displacement

Bauer p. 52
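The geometry parameters above fix how an address splits into tag, index, and displacement fields. A minimal sketch of that split for 32-bit byte addresses (the function name and layout are illustrative, not from the textbook):

```python
# Sketch: splitting a 32-bit byte address into (tag, index, displacement)
# for a cache of geometry (S, L, m). Names are illustrative.

def address_fields(addr, S, L, m, addr_bits=32):
    """Return the (tag, index, displacement) fields of addr."""
    C = S // L                      # C: number of cache lines
    d = (L - 1).bit_length()        # d = log2(L) displacement bits
    i = (C // m - 1).bit_length()   # i = log2(C/m) index bits
    t = addr_bits - i - d           # t: remaining bits form the tag
    disp = addr & (L - 1)
    index = (addr >> d) & (C // m - 1)
    tag = addr >> (d + i)
    return tag, index, disp, (t, i, d)

# Geometry from the slide: (S, L, m) = (32KB, 16B, 1)
tag, index, disp, (t, i, d) = address_fields(0x12345678, 32 * 1024, 16, 1)
print((t, i, d))   # field widths (17, 11, 4), matching the slide
```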

Page 11

Hit and Miss Detection

Cache geometry: (S, L, m) = (32KB, 16B, 1)

Memory reference: (t, i, d) = (?, ?, ?)

C = S/L = 32KB/16B = 2048

d = log2 L = log2 16 = 4

i = log2 (C/m) = log2 2048 = 11

t = 32 - i - d = 32 - 11 - 4 = 17

Bauer p. 52

• C: # of cache lines
• m: associativity
• L: line size
• S: cache size
• S = C × L
• (S, L, m): geometry
• d: # displacement bits

(t, i, d) = (tag, index, displacement)

Page 12

Hit and Miss Detection

What happens to t if we double the line size?

(S, L, m) = (32KB, 32B, 1)

C = S/L = 32KB/32B = 1024

d = log2 L = log2 32 = 5

i = log2 (C/m) = log2 1024 = 10

t = 32 - i - d = 32 - 10 - 5 = 17

Doubling the line size adds one displacement bit and removes one index bit, so t is unchanged.

(t, i, d) = (tag, index, displacement)

Bauer p. 52

Page 13

Hit and Miss Detection

What happens to t if we change to a 2-way associativity?

(S, L, m) = (32KB, 16B, 2)

C = S/L = 32KB/16B = 2048

d = log2 L = log2 16 = 4

i = log2 (C/m) = log2 1024 = 10

t = 32 - i - d = 32 - 10 - 4 = 18

The index shrinks by one bit and the tag grows by one bit. Need one more comparator and a multiplexor.

(t, i, d) = (tag, index, displacement)

Bauer p. 52

Page 14

Replacement Algorithm

• Direct mapped
– There is only one location for a block
– If the location is occupied, the block that is there is evicted

• m-way set associative
– If all m entries are valid, a victim must be selected
– Low associativity: evict the Least-Recently Used (LRU) entry
– High associativity: do not evict the (two) Most-Recently Used (MRU) entries

Bauer p. 53
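The LRU policy mentioned above can be sketched for a single m-way set: an ordered map tracks recency, and on a miss with all m entries valid the least-recently used tag is the victim. The class and its names are illustrative, not from the textbook.

```python
# Sketch of LRU eviction within one m-way set: on a miss with all m
# entries valid, the least-recently used tag is the victim.
from collections import OrderedDict

class LRUSet:
    def __init__(self, m):
        self.m = m                  # associativity of the set
        self.lines = OrderedDict()  # tags, least recently used first

    def access(self, tag):
        """Return True on a hit, False on a miss (allocating the tag)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark most recently used
            return True
        if len(self.lines) == self.m:        # set full: evict LRU entry
            self.lines.popitem(last=False)
        self.lines[tag] = None
        return False

s = LRUSet(2)
hits = [s.access(tag) for tag in [1, 2, 1, 3, 2]]
print(hits)   # [False, False, True, False, False]: tag 2 was evicted by 3
```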

Page 15

Write Strategies (on a hit)

• Write back
– Write only to the cache (memory becomes stale)
– Add a dirty bit to each cache line
– Must write back to memory when the entry is evicted

• Write through
– Write to both cache and memory
– No need for a dirty bit
– Memory is consistent at all times

Bauer p. 54

Page 16

Write Strategies (on a miss)

• Write allocate
– read the line from memory
– write to the line to modify it

• Write around
– write to the next level only

• Combinations that make sense:
– write back with write allocate
– write through with write around

Bauer p. 54
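The "write back with write allocate" combination can be sketched for one direct-mapped cache line: write hits touch only the cache, a dirty bit marks modified lines, and a dirty line is written to memory only when it is evicted. All names below are illustrative.

```python
# Minimal sketch of "write back with write allocate" for one
# direct-mapped cache line. A dirty line reaches memory only on eviction.

class WriteBackLine:
    def __init__(self):
        self.valid, self.tag, self.dirty = False, None, False

    def write(self, tag, memory_writes):
        if self.valid and self.tag == tag:   # write hit: cache only
            self.dirty = True
            return "hit"
        if self.valid and self.dirty:        # eviction: write back first
            memory_writes.append(self.tag)
        self.valid, self.tag, self.dirty = True, tag, True  # allocate
        return "miss"

mem = []
line = WriteBackLine()
results = [line.write(0xA, mem), line.write(0xA, mem), line.write(0xB, mem)]
print(results, mem)   # ['miss', 'hit', 'miss'] [10]: 0xA written back on eviction
```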

Page 17

Write Buffer

[Diagram: the processor reads through the cache, which reads from memory; processor writes go to the cache and into a write buffer, which drains the writes to memory.]

Bauer p. 54

Page 18

The three C's

• Compulsory (cold) misses
– first time a memory block is referenced

• Conflict misses
– more than m blocks compete for the same cache entries in an m-way cache

• Capacity misses
– more than C blocks compete for space in a cache with C lines

• Coherence misses
– needed blocks are invalidated because of I/O or multiprocessor operations.

Bauer p. 54
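A conflict miss can be demonstrated with the direct-mapped (m = 1) geometry used earlier: two blocks whose addresses are exactly one cache capacity apart share an index and keep evicting each other, even though the rest of the cache is empty. This is a minimal sketch; the names are illustrative.

```python
# Sketch of conflict misses in a direct-mapped (32KB, 16B, 1) cache:
# two addresses one cache-capacity apart map to the same index.

LINES, LINE_SIZE = 2048, 16        # the (32KB, 16B, 1) cache

cache = [None] * LINES             # tag currently held by each line

def access(addr):
    index = (addr // LINE_SIZE) % LINES
    tag = addr // (LINE_SIZE * LINES)
    hit = cache[index] == tag
    cache[index] = tag
    return hit

a = 0x0000
b = a + LINES * LINE_SIZE          # same index, different tag
hits = [access(x) for x in [a, b, a, b]]
print(hits)   # [False, False, False, False]: every access misses
```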

Page 19

Caches and I/O (read)

Bauer p. 55

What happens to the cache when data needs to move from disk to memory?

1. Invalidate cache data using valid bit.

Page 20

Caches and I/O (read)

Bauer p. 55

2. Update cache with new data.

What happens to the cache when data needs to move from disk to memory?

Page 21

Caches and I/O (Write)

Bauer p. 55

What happens to the cache when data needs to move from memory to disk?

Purge dirty lines.

Alternative: Hardware Snoopy Protocol.

Page 22

Cache Performance

Hit ratio:

h = (number of memory references that hit the cache) / (total number of memory references to the cache)

miss ratio = 1 - h

Average Memory Access Time: AMAT = h × Tcache + (1 - h) × Tmem

For two levels of cache:

AMAT = h1 × TL1 + (1 - h1) × h2 × TL2 + (1 - h1) × (1 - h2) × Tmem

Bauer p. 56
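The two AMAT formulas above can be written directly as small helper functions. Times are in cycles; the example numbers below are illustrative and do not come from the textbook.

```python
# The one-level and two-level AMAT formulas from the slide.

def amat_one_level(h, t_cache, t_mem):
    return h * t_cache + (1 - h) * t_mem

def amat_two_level(h1, t_l1, h2, t_l2, t_mem):
    return (h1 * t_l1
            + (1 - h1) * h2 * t_l2
            + (1 - h1) * (1 - h2) * t_mem)

# Illustrative: 90% L1 hits at 1 cycle, 80% L2 hits at 10 cycles,
# 100-cycle memory access
x = amat_two_level(0.90, 1, 0.80, 10, 100)
print(round(x, 2))   # 3.7 cycles on average
```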

Page 23

Cache Performance

AMAT = h × Tcache + (1 - h) × Tmem

Goal: Reduce AMAT

Strategies:
1. Increase the hit ratio (h)
2. Reduce Tcache

Parameters:
1. Cache capacity
2. Cache associativity
3. Cache line size

Bauer p. 56

Page 24

Influence of Capacity on Miss Rate

Cache is (S, 2, 64). Application: 176.gcc

Bauer p. 57

Page 25

Associativity X Miss Rate

Cache is (32KB, m, 64) Application: 176.gcc

Page 26

Line Size X Miss Rate

Cache is (16KB, 1, L)

Page 27

Memory Access Time

AMAT = h × Tcache + (1 - h) × Tmem

Tmem = Tacc + (L/w) × Tbus

Tacc: time to send the address + time to read
L: L2 cache line size
w: bus width
Tbus: bus cycle time

AMAT = h × Tcache + (1 - h) × (Tacc + (L/w) × Tbus)

Page 28

AMAT Example

Tacc : 5 cycles

w : 64 bits

Tbus : 2 cycles

Cache CA: hA = 0.88, LA = 16 bytes

Cache CB: hB = 0.92, LB = 32 bytes

The access time of both CA and CB is 1 cycle.

We will study two alternative configurations, CA and CB, for a single level of cache. What is the AMAT in each case?

AMAT = h × Tcache + (1 - h) × (Tacc + (L/w) × Tbus)

AMATA = 0.88 × 1 + (1 - 0.88) × (5 + (16/8) × 2) = 1.96

AMATB = 0.92 × 1 + (1 - 0.92) × (5 + (32/8) × 2) = 1.96
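The example above can be checked numerically with the AMAT and Tmem formulas given earlier (w = 64 bits = 8 bytes); both configurations come out to the same average access time.

```python
# Recomputing the slide's example: AMAT = h*Tcache + (1-h)*Tmem,
# with Tmem = Tacc + (L/w)*Tbus and w = 64 bits = 8 bytes.

T_ACC, W_BYTES, T_BUS = 5, 8, 2

def amat(h, t_cache, line_bytes):
    t_mem = T_ACC + (line_bytes // W_BYTES) * T_BUS
    return h * t_cache + (1 - h) * t_mem

amat_a = amat(0.88, 1, 16)   # CA: h = 0.88, L = 16 bytes -> Tmem = 9
amat_b = amat(0.92, 1, 32)   # CB: h = 0.92, L = 32 bytes -> Tmem = 13
print(round(amat_a, 2), round(amat_b, 2))   # 1.96 1.96
```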