
CS 152 L13 Cache I () UC Regents Fall 2004 © UCB

2004-10-14 Dave Patterson

(www.cs.berkeley.edu/~patterson)

John Lazzaro (www.cs.berkeley.edu/~lazzaro)

www-inst.eecs.berkeley.edu/~cs152/

CS152 – Computer Architecture and Engineering

Lecture 13 – Cache I


The Big Picture: Where are we now?

[Figure: block diagram of a computer: Processor (Control, Datapath), Memory, Input, Output]

So far: Focus on processor datapath and control

Next: Focus on the memory system


Today’s Lecture - Caches

Memory hierarchy

Static memory design

Locality

Cache design


1977: DRAM faster than microprocessors

Apple ][ (1977)

Steve Wozniak, Steve Jobs

CPU: 1000 ns DRAM: 400 ns


Since 1980, CPU has outpaced DRAM ...

CPU: 60% per yr (2X in 1.5 yrs)

DRAM: 9% per yr (2X in 10 yrs)

[Figure: Performance (1/latency), log scale from 10 to 1000, vs. Year, 1980 to 2000, with one curve for CPU and one for DRAM]

Gap grew 50% per year

Q. How do architects address this gap?

A. Put smaller, faster “cache” memories between CPU and DRAM. Create a “memory hierarchy”.
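The growth figures on this slide can be checked with compound-growth arithmetic. The sketch below is not from the lecture; the function names are illustrative. It suggests the slide's numbers are rounded: 9% per year doubles in roughly 8 years rather than 10, and the gap grows by a little under 50% per year.

```python
import math

# Annual performance growth rates quoted on the slide.
CPU_GROWTH = 0.60   # 60% per year
DRAM_GROWTH = 0.09  # 9% per year


def doubling_time_years(rate: float) -> float:
    """Years needed to double at a compound annual growth rate."""
    return math.log(2) / math.log(1 + rate)


def gap_growth(cpu_rate: float, dram_rate: float) -> float:
    """Annual growth of the CPU/DRAM performance ratio."""
    return (1 + cpu_rate) / (1 + dram_rate) - 1


print(f"CPU doubles every {doubling_time_years(CPU_GROWTH):.1f} years")
print(f"DRAM doubles every {doubling_time_years(DRAM_GROWTH):.1f} years")
print(f"Gap grows {gap_growth(CPU_GROWTH, DRAM_GROWTH):.0%} per year")
```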


Basic Idea: Variable-latency memory port

[Figure: the CPU port (From CPU, To CPU) connects to an Upper Level Memory (small, fast; holds Blk X), which exchanges blocks with a Lower Level Memory (large, slow; holds Blk Y)]

Data in upper memory returned with lower latency.

Data in lower level returned with higher latency.


Administrivia - Lab 3, HW 3 ...

Lab 3 “no forwarding” Xilinx demo on 10/15 (tomorrow)

Homework 3 due 10/20 (Wednesday), 283 Soda, in CS 152 box at 5 PM

Lab 3 final demo on 10/22 (Friday). Report due 10/25 (Monday, 11:59 PM)


2004 Memory Hierarchy: Apple iMac G5

iMac G5, 1.6 GHz, $1299.00

                   Reg   L1 Inst   L1 Data   L2     DRAM   Disk
Size               1K    64K       32K       512K   256M   80G
Latency (cycles)   1     3         3         11     88     1e7

Let programs address a memory space that scales to the disk size, at a speed that is usually as fast as register access

Managed by compiler: Reg

Managed by hardware: L1 Inst, L1 Data, L2

Managed by OS, hardware, application: DRAM, Disk

Goal: Illusion of large, fast, cheap memory


iMac’s PowerPC 970: All caches on-chip

[Die photo labels: Registers (1K), L1 (64K Instruction), L1 (32K Data), L2 (512K)]

Latency: A closer look

                   Reg    L1 Inst   L1 Data   L2     DRAM   Disk
Size               1K     64K       32K       512K   256M   80G
Latency (cycles)   1      3         3         11     88     1e7
Latency (sec)      0.6n   1.9n      1.9n      6.9n   55n    12.5m
Hz                 1.6G   533M      533M      145M   18M    80

Read latency: Time to return first byte of a random access

Architect’s latency toolkit:

(1) Parallelism. Request data from N 1-bit-wide memories at the same time. Overlaps latency cost for all N bits. Provides N times the bandwidth.

(2) Pipeline memory. If memory has N cycles of latency, issue a request each cycle, receive it N cycles later.
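The latency rows follow from the 1.6 GHz clock, and the pipelining item can be put in numbers. This is a rough sketch with illustrative function names, not code from the course. One caveat: 1e7 cycles at 1.6 GHz works out to about 6 ms, so the Disk cycle count looks like an order-of-magnitude figure next to the 12.5 ms entry.

```python
CLOCK_HZ = 1.6e9  # iMac G5 clock rate from the slide

# Latency in cycles, copied from the table.
LATENCY_CYCLES = {
    "Reg": 1, "L1 Inst": 3, "L1 Data": 3, "L2": 11, "DRAM": 88, "Disk": 1e7,
}


def latency_seconds(cycles: float, clock_hz: float = CLOCK_HZ) -> float:
    """Convert a latency in clock cycles to seconds."""
    return cycles / clock_hz


def unpipelined_cycles(requests: int, latency: int) -> int:
    """Each request waits for the previous one to finish."""
    return requests * latency


def pipelined_cycles(requests: int, latency: int) -> int:
    """One request is issued per cycle; each result returns `latency` cycles later."""
    return latency + (requests - 1) if requests > 0 else 0


for level, cycles in LATENCY_CYCLES.items():
    print(f"{level}: {latency_seconds(cycles) * 1e9:.1f} ns")

# Eight back-to-back reads from a memory with 11 cycles of latency.
print(unpipelined_cycles(8, 11), "cycles without pipelining")
print(pipelined_cycles(8, 11), "cycles with pipelining")
```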


Programs with locality cache well ...

Donald J. Hatfield, Jeanette Gerald: Program Restructuring for Virtual Memory. IBM Systems Journal 10(3): 168-192 (1971)

[Figure: Memory Address (one dot per access) plotted against Time, with regions labeled Spatial Locality, Temporal Locality, and Bad]

Q. Point out bad locality behavior ...


The caching algorithm in one slide

Temporal locality: Keep most recently accessed data closer to processor.

Spatial locality: Move contiguous blocks in the address space to upper levels.

[Figure: the processor port (To Processor, From Processor) connects to the Upper Level Memory, which holds Blk X; the Lower Level Memory holds Blk Y]
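A small simulation can make the two locality rules concrete. This is a minimal sketch under assumptions the slide does not state: the upper level holds 4 blocks of 32 bytes and evicts the least recently used block. The function name is hypothetical.

```python
from collections import OrderedDict

BLOCK_BYTES = 32  # assumed block size for this sketch


def hit_rate(addresses, num_blocks=4, block_bytes=BLOCK_BYTES):
    """Fraction of accesses that hit in a small LRU-managed upper level."""
    upper = OrderedDict()  # block number -> None, ordered oldest to newest
    hits = 0
    for addr in addresses:
        block = addr // block_bytes
        if block in upper:
            hits += 1
            upper.move_to_end(block)       # temporal locality: keep recent blocks
        else:
            if len(upper) >= num_blocks:
                upper.popitem(last=False)  # evict the least recently used block
            upper[block] = None            # spatial locality: fetch the whole block
    return hits / len(addresses)


sequential = list(range(256))                    # walks bytes 0..255 in order
strided = [i * BLOCK_BYTES for i in range(256)]  # touches a new block on every access

print("sequential:", hit_rate(sequential))
print("strided:   ", hit_rate(strided))
```

The sequential walk misses once per block and then hits on the remaining bytes of that block, while the strided walk never reuses a block it has fetched.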


Caching terminology

[Figure: the processor port (To Processor, From Processor) connects to the Upper Level Memory, which holds Blk X; the Lower Level Memory holds Blk Y]

Hit: Data appears in upper level block (ex: Blk X)

Miss: Data retrieval from lower level needed (ex: Blk Y)

Hit Rate: The fraction of memory accesses found in upper level.

Miss Rate: 1 - Hit Rate

Hit Time: Time to access upper level. Includes hit/miss check.

Miss Penalty: Time to replace block in upper level + deliver to CPU

Hit Time << Miss Penalty
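These terms combine into the usual average access time estimate, hit time + miss rate × miss penalty. That formula is standard textbook material rather than something stated on this slide, and the numbers below are made up for illustration.

```python
def average_access_time(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average access time in the same unit as hit_time and miss_penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    # Every access pays the hit time (it includes the hit/miss check);
    # only the misses pay the additional penalty.
    return hit_time + miss_rate * miss_penalty


# Hypothetical example: 3-cycle hit time, 5% miss rate, 88-cycle miss penalty.
print(average_access_time(hit_time=3, miss_rate=0.05, miss_penalty=88))
```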


Static Memory Design


Review: Two inverters store a bit

The other elements in a memory circuit control reading and writing

16-transistor circuit. Most transistors implement read/write semantics

[Figure: two inverters connected in a loop hold a value; node values around the loop are shown as 0 1 0 and, for the opposite stored value, 1 0 1]

Example: Flip-Flop

[Figure: flip-flop with D, Q, and CLK; labels: clk, setup time, clock to Q delay]

Delay in Flip-flops

• Setup time results from delay through first latch.

• Clock to Q delay results from delay through second latch.


For use in arrays: Static RAM (SRAM) cell

Writing a bit: Drive bit lines with new data and activate word line.

Reading a bit: Activate word line, let cell drive bit lines.

[Figure: SRAM cell attached to a word line and a pair of bit lines, drawn once for a write and once for a read, with example 1/0 values on the lines]

Volatile Memory Comparison

The primary difference between different memory types is the bit cell.

• SRAM Cell
  - Larger cell => lower density, higher cost/bit
  - No refresh required
  - Simple read => faster access
  - Standard IC process => natural for integration with logic

• DRAM Cell
  - Smaller cell => higher density, lower cost/bit
  - Needs periodic refresh, and refresh after read
  - Complex read => longer access time
  - Special IC process => difficult to integrate with logic circuits


Putting it all together: an SRAM array

4/12/04 ©UCB Spring 2004 CS152 / Kubiatowicz, Lec19.13

Random Access Memory (RAM) Technology

° Why do computer designers need to know about RAM technology?
  • Processor performance is usually limited by memory bandwidth
  • As IC densities increase, lots of memory will fit on processor chip
    - Tailor on-chip memory to specific needs
      - Instruction cache
      - Data cache
      - Write buffer

° What makes RAM different from a bunch of flip-flops?
  • Density: RAM is much denser

Static RAM Cell

6-Transistor SRAM Cell

[Figure: 6-transistor SRAM cell connected to the word (row select) line and to the complementary bit lines bit and bit-bar, with example values 0 and 1 on the two sides of the cell; annotation: “replaced with pullup to save area”]

° Write:
  1. Drive bit lines (bit = 1, bit-bar = 0)
  2. Select row

° Read:
  1. Precharge bit and bit-bar to Vdd or Vdd/2 => make sure equal!
  2. Select row
  3. Cell pulls one line low
  4. Sense amp on column detects difference between bit and bit-bar
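The write and read sequences for a single cell can be walked through in a purely behavioral model. This sketch ignores all electrical detail; the class and its method names are hypothetical, and precharging to a logic 1 stands in for "Vdd or Vdd/2".

```python
class SramCell:
    """Behavioral model of one SRAM bit cell (no electrical detail)."""

    def __init__(self, value: int = 0):
        self.value = value

    def write(self, bit: int, bit_bar: int, word_line: bool) -> None:
        """Write: drive the complementary bit lines, then select the row."""
        if bit == bit_bar:
            raise ValueError("bit lines must be driven to complementary values")
        if word_line:  # an unselected row keeps its old value
            self.value = bit

    def read(self, word_line: bool):
        """Read: precharge both lines, select the row, sense the difference."""
        bit, bit_bar = 1, 1          # step 1: precharge, lines equal
        if word_line:                # step 2: select row
            if self.value == 1:      # step 3: cell pulls one line low
                bit_bar = 0
            else:
                bit = 0
        if bit == bit_bar:           # row not selected: nothing to sense
            return None
        return 1 if bit > bit_bar else 0  # step 4: sense amp compares the lines


cell = SramCell()
cell.write(bit=1, bit_bar=0, word_line=True)
print(cell.read(word_line=True))
```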

Typical SRAM Organization: 16-word x 4-bit

[Figure: 16 rows (Word 0, Word 1, ..., Word 15) by 4 columns of SRAM cells. An address decoder driven by A0 to A3 selects one word line. Each column has a write driver and precharger (data inputs Din 0 to Din 3, controls WrEn and Precharge) and a sense amp (data outputs Dout 0 to Dout 3).]

Q: Which is longer: word line or bit line?
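The organization in the figure can be mimicked with a small behavioral model: a decoder turns the 4-bit address into a one-hot set of word lines, and the selected row is read or written across all four columns. This is a sketch only; the class name and its methods are illustrative.

```python
class SramArray:
    """Behavioral model of a words x bits SRAM array with an address decoder."""

    def __init__(self, words: int = 16, bits: int = 4):
        self.words = words
        self.bits = bits
        self.cells = [[0] * bits for _ in range(words)]

    def decode(self, address: int):
        """Address decoder: exactly one word line is active."""
        if not 0 <= address < self.words:
            raise ValueError("address out of range")
        return [row == address for row in range(self.words)]

    def write(self, address: int, din):
        """Write drivers put din on the bit lines; the selected row stores it."""
        if len(din) != self.bits:
            raise ValueError("din must supply one value per column")
        for row, selected in enumerate(self.decode(address)):
            if selected:
                self.cells[row] = list(din)

    def read(self, address: int):
        """The selected row drives the bit lines; sense amps produce dout."""
        for row, selected in enumerate(self.decode(address)):
            if selected:
                return list(self.cells[row])


ram = SramArray()
ram.write(5, [1, 0, 1, 1])
print(ram.read(5))
```

In this model a word line touches 4 cells and a bit line touches 16, which is one way to approach the slide's question if the physical dimensions of the cell are ignored.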

Logic Diagram of a Typical SRAM

[Figure: 2^N words x M bit SRAM block with an N-bit address input A, an M-bit data pin D, and control inputs WE_L and OE_L]

° Write Enable is usually active low (WE_L)

° Din and Dout are combined to save pins:
  • A new control signal, output enable (OE_L), is needed
  • WE_L is asserted (Low), OE_L is deasserted (High)
    - D serves as the data input pin
  • WE_L is deasserted (High), OE_L is asserted (Low)
    - D is the data output pin
  • Both WE_L and OE_L are asserted:
    - Result is unknown. Don’t do that!!!

° Although you could change the VHDL to do what you desire, you must do the best with what you’ve got (vs. what you need)
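The control-pin rules read naturally as a small truth table. The sketch below encodes them with a hypothetical function name; the "idle" result for the case where neither signal is asserted is an assumption, since the slide does not cover that case.

```python
def data_pin_mode(we_l: int, oe_l: int) -> str:
    """Role of the shared D pin for given active-low control levels (0 = asserted)."""
    write_asserted = we_l == 0
    output_asserted = oe_l == 0
    if write_asserted and output_asserted:
        return "unknown"  # both asserted: the slide says don't do that
    if write_asserted:
        return "input"    # D serves as the data input pin
    if output_asserted:
        return "output"   # D is the data output pin
    return "idle"         # assumption: neither asserted, not covered by the slide


for we_l in (0, 1):
    for oe_l in (0, 1):
        print(f"WE_L={we_l} OE_L={oe_l}: {data_pin_mode(we_l, oe_l)}")
```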

Word and bit lines slow down as array grows larger! Architects specify number of rows and columns.

[Figure: SRAM array with a write driver on each column and parallel data I/O lines; annotation: add muxes to select subset of bits]


Cache Design Example


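CPU address space: An array of “blocks”

32-bit Memory Address: bits 31 to 5 (27 bits) answer “Which block?”; bits 4 to 0 (5 bits) give the Byte #.

[Figure: the address space drawn as an array of 32-byte blocks, with Block # running from 0 to 2^27 - 1]

The job of a cache is to hold a “popular” subset of blocks.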
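The 27-bit/5-bit split is a shift and a mask. A minimal sketch, with hypothetical constant and function names:

```python
BYTE_BITS = 5                  # 32-byte blocks need 5 bits to pick a byte
BLOCK_BYTES = 1 << BYTE_BITS   # 32


def split_address(addr: int):
    """Split a 32-bit address into (block number, byte number within the block)."""
    if not 0 <= addr < 2**32:
        raise ValueError("address must fit in 32 bits")
    block = addr >> BYTE_BITS           # upper 27 bits: which block?
    byte = addr & (BLOCK_BYTES - 1)     # lower 5 bits: byte #
    return block, byte


print(split_address(0x00000021))
print(split_address(0xFFFFFFFF))
```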


One approach: Fully Associative Cache

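Address fields: Cache Tag (27 bits, address bits 31 to 5) | Byte Select (address bits 4 to 0, Ex: 0x01)

Each cache entry holds: Valid Bit | Block # (“Tag”, bits 26 to 0) | Cache Data (Byte 31 ... Byte 1, Byte 0)

Cache Data holds 4 blocks.

One comparator (=) per entry checks the Cache Tag of the address against the stored tag, so all four tags are compared at once.

Hit: Return bytes of “hit” cache line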
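The lookup can be sketched in software as a loop over the entries, standing in for the parallel comparators. This is an illustrative model only: the `fill` helper and the choice of which entry to overwrite are hypothetical, since the slide does not cover replacement.

```python
from dataclasses import dataclass


@dataclass
class Line:
    valid: bool = False
    tag: int = 0             # 27-bit block number
    data: bytes = bytes(32)  # one 32-byte block


class FullyAssociativeCache:
    def __init__(self, num_lines: int = 4):
        self.lines = [Line() for _ in range(num_lines)]

    def fill(self, index: int, addr: int, block: bytes) -> None:
        """Hypothetical helper: install the block containing addr in entry `index`."""
        if len(block) != 32:
            raise ValueError("a block is 32 bytes")
        self.lines[index] = Line(valid=True, tag=addr >> 5, data=block)

    def read_byte(self, addr: int):
        """Return the addressed byte on a hit, or None on a miss."""
        tag, byte_select = addr >> 5, addr & 0x1F
        for line in self.lines:  # hardware performs these compares in parallel
            if line.valid and line.tag == tag:
                return line.data[byte_select]
        return None


cache = FullyAssociativeCache()
cache.fill(0, 0x20, bytes(range(32)))  # block 1 holds byte values 0..31
print(cache.read_byte(0x21))           # same block as 0x20: a hit
print(cache.read_byte(0x41))           # block 2 is not cached: a miss
```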


Conclusions

Program locality is why building a memory hierarchy makes sense

Latency toolkit: hierarchy design, bit-wise parallelism, pipelining.

Cache operation: compare tags, detect hits, select bytes.

In practice: how many rows, how many columns, how many arrays.
