Page 1: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Fall 2015, Nov 2 . . . ELEC 5200-001/6200-001 Lecture 9

ELEC 5200-001/6200-001
Computer Architecture and Design
Fall 2015
Memory Organization (Chapter 5)

Vishwani D. Agrawal
James J. Danaher Professor
Department of Electrical and Computer Engineering
Auburn University, Auburn, AL 36849
http://www.eng.auburn.edu/~vagrawal
[email protected]

Page 2: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Types of Computer Memories

From the cover of: A. S. Tanenbaum, Structured Computer Organization, Fifth Edition, Upper Saddle River, New Jersey: Pearson Prentice Hall, 2006.

Page 3: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Random Access Memory (RAM)

[Figure: block diagram of a RAM. Labels: memory cell array, address decoder, read/write circuits, address bits, data bits.]

Page 4: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Six-Transistor SRAM Cell

[Figure: six-transistor SRAM cell. Labels: word line, a pair of bit lines, and the cell nodes bit and its complement.]

Page 5: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Dynamic RAM (DRAM) Cell

[Figure: DRAM cell. Labels: word line, bit line.]

“Single-transistor DRAM cell”: Robert Dennard’s 1967 invention

Page 6: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Electronic Memory Devices

Memory technology   Typical access time   Clock rate     Cost per GB in 2004
SRAM                0.5-5 ns              0.2-2.0 GHz    $4k-$10k
DRAM                50-70 ns              15-20 MHz      $100-$200
Magnetic disk       5-20 ms               50-200 Hz      $0.5-$2

For more on memories:
Semiconductor Memories: A Handbook of Design, Manufacture and Application, by Betty Prince, Wiley 1996.
Emerging Memories: Technologies and Trends, by Betty Prince, Springer 2002.
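The clock-rate figures in the table appear to be the reciprocal of the access times (an observation from the numbers; the slide does not state it). A minimal Python sketch of that conversion, where the helper name `equivalent_clock_hz` is a hypothetical one chosen for illustration:

```python
def equivalent_clock_hz(access_time_s: float) -> float:
    """Rate of back-to-back accesses, assumed here to be 1 / access time."""
    return 1.0 / access_time_s

# Access-time ranges from the table, in seconds: (fastest, slowest).
access_times = {
    "SRAM": (0.5e-9, 5e-9),
    "DRAM": (50e-9, 70e-9),
    "Magnetic disk": (5e-3, 20e-3),
}

for name, (fastest, slowest) in access_times.items():
    low = equivalent_clock_hz(slowest)
    high = equivalent_clock_hz(fastest)
    print(f"{name}: {low:.3g} Hz to {high:.3g} Hz")
```

Under this assumption a 70 ns DRAM access corresponds to roughly 14 MHz, which the table rounds to 15 MHz.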

Page 7: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Building a Computer with 1GHz Clock and 40GB Memory

Type of memory   Cost        Clock rate
SRAM             $160,000    0.2-2.0 GHz
DRAM             $4,000      15-20 MHz
Disk             $20         50-200 Hz
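The costs above match 40 GB multiplied by the low end of each 2004 cost-per-GB range from the "Electronic Memory Devices" slide. A short Python sketch of that arithmetic; the dictionary and the function name `system_cost` are illustrative assumptions, not part of the lecture:

```python
CAPACITY_GB = 40

# Cost per GB in 2004 as (low, high), copied from the earlier table.
cost_per_gb = {
    "SRAM": (4_000, 10_000),
    "DRAM": (100, 200),
    "Disk": (0.5, 2),
}

def system_cost(technology, capacity_gb=CAPACITY_GB):
    """Return the (low, high) cost in dollars of capacity_gb of one technology."""
    low, high = cost_per_gb[technology]
    return (low * capacity_gb, high * capacity_gb)

for tech in cost_per_gb:
    low, high = system_cost(tech)
    print(f"{tech}: ${low:,.0f} to ${high:,.0f}")
```

Only SRAM comes close to a 1 GHz clock, and it is also the option whose cost is prohibitive, which is the motivation for a memory hierarchy.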

Page 8: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Trying to Buy a Laptop Computer? (Three Years Ago)

IBM ThinkPad X Series by Lenovo 23717GU
1.20 GHz Low Voltage Intel® Pentium® M, 1MB L2 SRAM Cache ~$5
Microsoft® Windows® XP Professional
512 MB DRAM ~$100
40 GB Hard Drive ~$40
2.7 lbs, 12.1" XGA (1024x768)
IBM Embedded Security Subsystem 2.0
Intel PRO/Wireless Network Connection 802.11b, Gigabit Ethernet
Integrated graphics Intel Extreme Graphics 2
No CD/DVD drive PROP, Fixed Bay
Availability**: Within 2 weeks
$2,149.00 IBM web price*
$1,741.65 sale price*
Last year: $1,023.75, X61 with Intel Duo 2GHz Processor

Page 9: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

2006/07

Choose a Lenovo 3000 V Series to customize & buy
From: $999.00
Sale price: $949.00
Processor: Intel Core 2 Duo T5500 (1.66GHz, 2MB L2, 667MHz FSB)
Total memory: 512MB PC2-5300 DDR2 SDRAM
Hard drive: 80GB, 5400rpm Serial ATA
Weight: 4.0lbs

Page 10: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

2007 Executive-Class

The Reserve Edition features a leather exterior handmade by expert Japanese saddle makers.

The classic, award-winning ThinkPad design remains unchanged - why mess with success?

ThinkPad Reserve Verizon Edition or ThinkPad Reserve Cingular Edition
From $5,000

Page 11: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Nicholas Negroponte’s OLPC (One Laptop per Child)

http://www.flickr.com/photos/olpc/3145038187/

Page 12: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

XO-1

Manufacturer: Quanta Computer
Type: Subnotebook
Connectivity: 802.11b/g/s, wireless LAN, 3 USB 2.0 ports, MMC/SD card slot
Media: 1 GB flash memory
Operating system: Fedora-based (Linux)
Input: Keyboard, Touchpad, Microphone, Camera
Camera: Built-in video camera (640×480; 30 FPS)
Power: NiMH or LiFePO4 battery removable pack
CPU: AMD Geode [email protected] W
Memory: 256 MB DRAM
Display: Dual-mode 19.1 cm/7.5" diagonal TFT LCD, 1200×900
Dimensions: 242 mm × 228 mm × 32 mm
Weight: LiFeP battery: 1.45 kg [3.2 lbs]; NiMH battery: 1.58 kg [3.5 lbs]
Price: $100+

Page 13: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Today’s Laptop

Inspiron 15
Dell Price $379.99
Introducing the new Inspiron™ 15, a 15.6" laptop that gives you the everyday features you need, all at a great value.
• Up to Intel® Core™2 Duo processors
• Entertainment on the go with the HD display
• Personalize with a choice of six vibrant colors or choose from over 200+ artist designs with Design Studio

Page 14: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Cache

Processor does all memory operations with cache.
Miss: If the requested word is not in cache, a block of words containing the requested word is brought to cache, and then the processor request is completed.
Hit: If the requested word is in cache, the read or write operation is performed directly in cache, without accessing main memory.
Block: minimum amount of data transferred between cache and main memory.

[Figure: the processor exchanges words with the cache (small, fast memory); the cache exchanges blocks with main memory (large, inexpensive, slow).]

Page 15: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Inventor of Cache

M. V. Wilkes, “Slave Memories and Dynamic Storage Allocation,” IEEE Transactions on Electronic Computers, vol. EC-14, no. 2, pp. 270-271, April 1965.

Page 16: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Cache Performance

Average access time
= T1 × h + (Tm + T1) × (1 – h)
= T1 + Tm × (1 – h)

where
– T1 = cache access time (small)
– Tm = memory access time (large)
– h = hit rate (0 ≤ h ≤ 1)

Hit rate is also known as hit ratio; miss rate = 1 – hit rate.

[Figure: processor, cache (small, fast memory; access time = T1) and main memory (large, inexpensive, slow; access time = Tm).]
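The average access time formula can be sketched in a few lines of Python. The function name and the sample values (T1 = 1 cycle, Tm = 10 cycles, h = 0.95) are assumptions chosen for illustration only:

```python
def average_access_time(t1: float, tm: float, h: float) -> float:
    """T1 + Tm * (1 - h): every access pays T1, and misses also pay Tm."""
    if not 0.0 <= h <= 1.0:
        raise ValueError("hit rate h must lie between 0 and 1")
    return t1 + tm * (1.0 - h)

def average_access_time_expanded(t1: float, tm: float, h: float) -> float:
    """The unsimplified form: hits cost T1, misses cost Tm + T1."""
    return t1 * h + (tm + t1) * (1.0 - h)

# Illustrative values in cycles (not taken from the slide).
print(average_access_time(t1=1, tm=10, h=0.95))
print(average_access_time_expanded(t1=1, tm=10, h=0.95))
```

Both forms give the same result up to floating-point rounding, which is a quick check that the simplification on the slide is valid.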

Page 17: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Average Access Time

[Figure: access time plotted against miss rate, 1 – h, from 0 (h = 1) to 1 (h = 0). The line T1 + Tm × (1 – h) rises from T1 to Tm + T1 with slope = Tm. Desirable miss rate < 5%; acceptable miss rate < 10%.]

Page 18: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Comparing Performance

Ideal processor with 1 cycle memory access, CPI = 1
Processor without cache:
– Assume main memory access time of 10 cycles
– Assume 30% instructions require memory data access
Processor with cache:
– Assume cache access time of 1 cycle
– Assume hit rate 0.95 for instructions, 0.90 for data
– Assume miss penalty (time to read memory into cache and from cache to processor) is 11 cycles
Comparing times of 100 instructions:

Time without cache / Time with cache
= (100×10 + 30×10) / [100(0.95×1 + 0.05×11) + 30(0.9×1 + 0.1×11)]
= 1300 / 210
= 6.19
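The ratio can be checked numerically. The sketch below uses only the numbers stated on the slide; the function and variable names are illustrative assumptions:

```python
def access_cycles(hit_rate, hit_time, miss_penalty):
    """Average cycles per access: hits cost hit_time, misses cost miss_penalty."""
    return hit_rate * hit_time + (1 - hit_rate) * miss_penalty

instructions = 100
data_accesses = 30   # 30% of the 100 instructions access data

# Without a cache, every instruction fetch and data access takes 10 cycles.
time_without = instructions * 10 + data_accesses * 10

# With a cache: 1-cycle hits and an 11-cycle miss penalty.
time_with = (instructions * access_cycles(0.95, 1, 11)
             + data_accesses * access_cycles(0.90, 1, 11))

speedup = time_without / time_with
print(f"{time_without} / {time_with:.0f} = {speedup:.2f}")
```

The run should reproduce the 1300 and 210 cycle totals and a ratio of about 6.19, up to floating-point rounding.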

Page 19: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Controlling Miss Rate

Increase cache size
– More blocks can be kept in cache; lower miss rate.
– Larger cache is slower; expensive.
Increase block size
– More data available; may lower miss rate.
– Fewer blocks in cache increase miss rate.
– Larger blocks need more time to swap.

[Figure: two diagrams, each showing a large memory and a cache divided into blocks.]

Page 20: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Cache Size

[Figure: hit rate and 1/(cycle time) plotted against increasing cache size, with an optimum marked.]

Page 21: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Block Size

[Figure: hit rate (axis marks 0.8 and 1.0) plotted against increasing block size, with an optimum marked. Labels: localized data, fragmented data.]

Page 22: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Increasing Hit Rate

Hit rate increases with cache size.
Hit rate mildly depends on block size.

[Figure: miss rate = 1 – hit rate (0% to 10%; equivalently hit rate, h, 100% to 90%) plotted against block size (16B, 32B, 64B, 128B, 256B) for cache sizes 4KB, 16KB and 64KB. Annotations: "Decreasing chances of covering large data locality" and "Decreasing chances of getting fragmented data".]

Page 23: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

The Locality Principle

A program tends to access data that form a physical cluster in the memory: multiple accesses may be made within the same block.
Physical localities are temporal and may shift over longer periods of time: data not used for some time is less likely to be used in the future. Upon miss, the least recently used (LRU) block can be overwritten by a new block.
P. J. Denning, “The Locality Principle,” Communications of the ACM, vol. 48, no. 7, pp. 19-24, July 2005.
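LRU replacement can be sketched with an ordered dictionary. This is a minimal illustration that assumes any block may occupy any cache slot (a fully associative cache, which the slide does not specify); the class name, cache size and access trace are hypothetical:

```python
from collections import OrderedDict

class LRUBlockCache:
    """Tracks resident block numbers and evicts the least recently used one."""

    def __init__(self, num_blocks: int):
        self.num_blocks = num_blocks
        self.blocks = OrderedDict()   # block number -> None, oldest first
        self.hits = 0
        self.misses = 0

    def access(self, block: int) -> bool:
        """Return True on a hit, False on a miss (after loading the block)."""
        if block in self.blocks:
            self.blocks.move_to_end(block)    # now the most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.blocks) >= self.num_blocks:
            self.blocks.popitem(last=False)   # overwrite the LRU block
        self.blocks[block] = None
        return False

cache = LRUBlockCache(num_blocks=2)
trace = [0, 1, 0, 2, 0, 1]            # hypothetical block access sequence
results = [cache.access(b) for b in trace]
print(results, cache.hits, cache.misses)
```

In this trace block 0 is reused often and stays resident, while blocks 1 and 2 take turns being evicted, which is the behavior the locality argument relies on.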

Page 24: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Data Locality, Cache, Blocks

Increase block size to match locality size
Increase cache size to include most blocks

[Figure: memory and cache. Labels: data needed by a program, Block 1, Block 2.]

Page 25: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Types of Caches

Direct-mapped cache
– Memory is divided into partitions, each the size of the cache
– Each partition is subdivided into blocks
Set-associative cache

Page 26: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Direct-Mapped Cache

[Figure: memory and cache. Labels: data needed by a program, Block 1, Block 2, LRU, swap-out, swap-in, data needed.]

Page 27: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Set-Associative Cache

[Figure: memory and cache. Labels: data needed by a program, Block 1, Block 2, LRU, swap-out, swap-in, data needed.]

Page 28: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Direct-Mapped Cache

32-word word-addressable memory; cache of 8 blocks; block size = 1 word

Main memory (word addresses):
00000 00001 00010 00011 00100 00101 00110 00111
01000 01001 01010 01011 01100 01101 01110 01111
10000 10001 10010 10011 10100 10101 10110 10111
11000 11001 11010 11011 11100 11101 11110 11111

Cache index (local address): 000 001 010 011 100 101 110 111
Tags shown in cache: 00 10 11 01 01 00 10 11

Memory address 11 101 → cache address: tag = 11, index = 101
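The tag and index split for this 8-block, word-addressed example can be sketched with shifts and masks. The function name is an illustrative assumption:

```python
def split_word_address(addr: int, index_bits: int = 3):
    """Split a word address into (tag, index) for a direct-mapped cache
    with one word per block and 2**index_bits blocks."""
    index = addr & ((1 << index_bits) - 1)   # low bits select the cache block
    tag = addr >> index_bits                 # remaining high bits form the tag
    return tag, index

tag, index = split_word_address(0b11101)
print(f"tag={tag:02b} index={index:03b}")
```

For the address 11101 on the slide this yields tag 11 and index 101.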

Page 29: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Direct-Mapped Cache

32-word word-addressable memory; cache of 4 blocks; block size = 2 words

Main memory (word addresses):
00000 00001 00010 00011 00100 00101 00110 00111
01000 01001 01010 01011 01100 01101 01110 01111
10000 10001 10010 10011 10100 10101 10110 10111
11000 11001 11010 11011 11100 11101 11110 11111

Cache index (local address): 00 01 10 11
Block offset: 0 1
Tags shown in cache: 00 11 00 10

Memory address 11 10 1 → cache address: tag = 11, index = 10, block offset = 1
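With two words per block, the lowest address bit becomes the block offset and the index shrinks to two bits. A small Python sketch of the three-way split, with an assumed function name and default field widths matching this slide:

```python
def split_address(addr: int, index_bits: int = 2, offset_bits: int = 1):
    """Split a word address into (tag, index, block_offset)."""
    block_offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, block_offset

tag, index, offset = split_address(0b11101)
print(f"tag={tag:02b} index={index:02b} block offset={offset:b}")
```

The address 11101 splits into tag 11, index 10 and block offset 1, matching the slide. The tag width is unchanged because the cache still holds 8 words.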

Page 30: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Number of Tag and Index Bits

Main memory size = W words
Cache size = w words

Each word in cache has a unique index (local address).
Number of index bits = log2 w
Index bits are shared with the block offset when a block contains more than 1 word.

Assume partitions of w words each in the main memory.
There are W/w such partitions, each identified by a tag.
Number of tag bits = log2(W/w)
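The two logarithms can be computed exactly for power-of-two sizes. A minimal sketch, assuming W and w are both powers of two; the function name is hypothetical:

```python
def tag_and_index_bits(memory_words: int, cache_words: int):
    """Return (tag_bits, index_bits) for W = memory_words and w = cache_words.

    index_bits counts word-level bits, so it includes any block offset bits.
    """
    for n in (memory_words, cache_words):
        if n <= 0 or n & (n - 1):
            raise ValueError("sizes must be positive powers of two")
    index_bits = cache_words.bit_length() - 1                   # log2(w)
    tag_bits = (memory_words // cache_words).bit_length() - 1   # log2(W/w)
    return tag_bits, index_bits

print(tag_and_index_bits(32, 8))            # the 32-word memory, 8-word cache example
print(tag_and_index_bits(2**30, 2**14))     # 1G words of memory, 16K-word cache
```

Using `bit_length` avoids floating-point `log2`, which can be inexact for large integers.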

Page 31: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Direct-Mapped Cache (Byte Address)

32-word byte-addressable memory; cache of 8 blocks; block size = 1 word

Main memory (word address followed by byte offset):
00000 00   00001 00   00010 00   00011 00   00100 00   00101 00   00110 00   00111 00
01000 00   01001 00   01010 00   01011 00   01100 00   01101 00   01110 00   01111 00
10000 00   10001 00   10010 00   10011 00   10100 00   10101 00   10110 00   10111 00
11000 00   11001 00   11010 00   11011 00   11100 00   11101 00   11110 00   11111 00

Cache index: 000 001 010 011 100 101 110 111
Tags shown in cache: 00 10 11 01 01 00 10 11

Memory address 11 101 00 → cache address: tag = 11, index = 101, byte offset = 00

Page 32: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Finding a Word in Cache

[Figure: lookup in a direct-mapped cache for a 32-word, byte-addressed memory. The memory address b6 b5 b4 b3 b2 b1 b0 is split into tag, index and byte offset. The index (000 to 111) selects a cache entry holding a valid bit, a 2-bit tag and data. The stored tag is compared (=) with the address tag, giving 1 = hit, 0 = miss, and the data is read out. Cache size = 8 words; block size = 1 word.]
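The lookup can be sketched as follows: the index selects an entry, and a hit requires the entry to be valid and its tag to equal the address tag. The class and method names are illustrative assumptions, and the field widths follow the 7-bit byte address of this example:

```python
class DirectMappedCache:
    """8 one-word blocks; 7-bit byte address = tag (2) | index (3) | byte offset (2)."""

    def __init__(self, index_bits: int = 3, byte_offset_bits: int = 2):
        self.index_bits = index_bits
        self.byte_offset_bits = byte_offset_bits
        size = 1 << index_bits
        self.valid = [False] * size
        self.tags = [0] * size
        self.data = [None] * size

    def split(self, byte_addr: int):
        word_addr = byte_addr >> self.byte_offset_bits
        index = word_addr & ((1 << self.index_bits) - 1)
        tag = word_addr >> self.index_bits
        return tag, index

    def lookup(self, byte_addr: int):
        """Return (hit, data); data is None on a miss."""
        tag, index = self.split(byte_addr)
        hit = self.valid[index] and self.tags[index] == tag
        return hit, (self.data[index] if hit else None)

    def fill(self, byte_addr: int, word):
        """Install a word fetched from memory after a miss."""
        tag, index = self.split(byte_addr)
        self.valid[index], self.tags[index], self.data[index] = True, tag, word

cache = DirectMappedCache()
print(cache.lookup(0b1110100))    # entry not valid yet: miss
cache.fill(0b1110100, "word 11101")
print(cache.lookup(0b1110100))    # valid and tag matches: hit
print(cache.lookup(0b0010100))    # same index 101, different tag: miss
```

The last lookup shows why the tag is needed: two addresses that share an index compete for the same cache entry.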

Page 33: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

How Many Bits Does a Cache Have?

Consider a main memory:
– 32 words; byte address is 7 bits wide: b6 b5 b4 b3 b2 b1 b0
– Each word is 32 bits wide
Assume that cache block size is 1 word (32 bits data) and it contains 8 blocks.
Cache requires, for each word:
– 2 bit tag, and one valid bit
Total storage needed in cache
= #blocks in cache × (data bits/block + tag bits + valid bit)
= 8 × (32+2+1) = 280 bits
Physical storage/Data storage = 280/256 = 1.094
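The storage formula can be written as a small function. The name `cache_storage_bits` and its parameter names are assumptions made for this sketch:

```python
def cache_storage_bits(num_blocks, words_per_block, tag_bits,
                       word_bits=32, valid_bits=1):
    """#blocks x (data bits per block + tag bits + valid bit)."""
    data_bits_per_block = words_per_block * word_bits
    return num_blocks * (data_bits_per_block + tag_bits + valid_bits)

total_bits = cache_storage_bits(num_blocks=8, words_per_block=1, tag_bits=2)
data_bits = 8 * 32
print(total_bits, round(total_bits / data_bits, 3))
```

For the 8-block example this gives 280 bits in total against 256 data bits, an overhead of about 9.4%.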

Page 34: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

A More Realistic Cache

Consider 4 GB, byte-addressable main memory:
– 1G words; byte address is 32 bits wide: b31…b16 b15…b2 b1 b0
– Each word is 32 bits wide
Assume that cache block size is 1 word (32 bits data) and it contains 64 KB data, or 16K words, i.e., 16K blocks.
– Number of cache index bits = 14, because 16K = 2^14
– Tag size = 32 – byte offset – #index bits = 32 – 2 – 14 = 16 bits
Cache requires, for each word:
– 16 bit tag, and one valid bit
Total storage needed in cache
= #blocks in cache × (data bits/block + tag size + valid bits)
= 2^14 × (32+16+1) = 16×2^10×49 = 784×2^10 bits = 784 Kb = 98 KB
Physical storage/Data storage = 98/64 = 1.53
But, need to increase the block size to match the size of locality.

Page 35: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Data Organization in Cache

Block index   Valid bit   Tag    Data (4 word block; block offset 00 01 10 11)
000           0           0100
001           0           0011
010           0           0101
011           0           1011
100           0           1001
101           0           1010
110           0           1100
111           0           1110

Memory address of word in cache: 1011 011 10 00 (tag, block index, block offset, byte offset)
4 bit tag means memory is 16 times larger than cache.
Address mapping overhead: the valid bit and tag columns.

Page 36: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Cache Bits for 4-Word Block

Consider 4 GB, byte-addressable main memory:
– 1G words; byte address is 32 bits wide: b31…b16 b15…b2 b1 b0
– Each word is 32 bits wide
Assume that cache block size is 4 words (128 bits data) and it contains 64 KB data, or 16K words, i.e., 4K blocks.
– Number of cache index bits = 12, because 4K = 2^12
– Tag size = 32 – byte offset – #block offset bits – #index bits = 32 – 2 – 2 – 12 = 16 bits
Cache requires, for each block:
– 16 bit tag, and one valid bit
Total storage needed in cache
= #blocks in cache × (data bits/block + tag size + valid bit)
= 2^12 × (4×32+16+1) = 4×2^10×145 = 580×2^10 bits = 580 Kb = 72.5 KB
Physical storage/Data storage = 72.5/64 = 1.13
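The effect of block size on overhead can be reproduced by deriving the field widths from the address width and cache capacity. This is a sketch for a direct-mapped cache with power-of-two sizes; the function name and the returned dictionary keys are hypothetical:

```python
def cache_geometry(address_bits, data_bytes, words_per_block, bytes_per_word=4):
    """Field widths and total storage, assuming all sizes are powers of two."""
    num_blocks = data_bytes // (bytes_per_word * words_per_block)
    byte_offset_bits = bytes_per_word.bit_length() - 1
    block_offset_bits = words_per_block.bit_length() - 1
    index_bits = num_blocks.bit_length() - 1
    tag_bits = address_bits - byte_offset_bits - block_offset_bits - index_bits
    data_bits_per_block = words_per_block * bytes_per_word * 8
    total_bits = num_blocks * (data_bits_per_block + tag_bits + 1)  # +1 valid bit
    return {
        "index_bits": index_bits,
        "tag_bits": tag_bits,
        "total_kbytes": total_bits / 8 / 1024,
        "overhead_ratio": total_bits / (data_bytes * 8),
    }

for words in (1, 4):
    print(words, cache_geometry(32, 64 * 1024, words))
```

With one-word blocks the 64 KB cache needs 98 KB of storage; with four-word blocks the same tag and valid bit are shared by four words, so the total falls to 72.5 KB.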

Page 37: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Using Larger Cache Block (4 Words)

[Figure: lookup in a cache with 4-word blocks for a 4GB = 1G word, byte-addressed memory. The 32-bit memory address is split into a 16 bit tag, a 12 bit index, a 2 bit block offset and a byte offset. The index selects one of 4K entries (0000 0000 0000 to 1111 1111 1111), each holding a valid bit, a 16-bit tag and data (4 words = 128 bits). The stored tag is compared (=) with the address tag, giving 1 = hit, 0 = miss. A MUX controlled by the 2 bit block offset selects the data word. Cache size = 16K words; block size = 4 words.]

Page 38: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Handling a Miss

Miss occurs when data at the required memory address is not found in cache.
Controller actions:
– Stall pipeline
– Freeze contents of all registers
– Activate a separate cache controller
  If cache is full
  – Select the least recently used (LRU) block in cache for overwriting
  – If selected block has inconsistent data, take proper action
  Copy the block containing the requested address from memory
– Restart instruction

Page 39: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Miss During Instruction Fetch

– Send original PC value (PC – 4) to the memory.
– Instruct main memory to perform a read and wait for the memory to complete the access.
– Write cache entry.
– Restart the instruction whose fetch failed.

Page 40: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Writing to Memory

Cache and memory become inconsistent when data is written into cache, but not to memory: the cache coherence problem.
Strategies to handle inconsistent data:
– Write-through
  Write to memory and cache simultaneously always.
  Write to memory is ~100 times slower than to cache.
– Write buffer
  Write to cache and to buffer for writing to memory.
  If buffer is full, the processor must wait.

Page 41: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Writing to Memory: Write-Back

Write-back (or copy back) writes only to cache but sets a “dirty bit” in the block where write is performed.
When a block with dirty bit “on” is to be overwritten in the cache, it is first written to the memory.
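The dirty-bit mechanism can be sketched with a toy direct-mapped cache of one-word blocks. Several details here are assumptions for illustration rather than anything the slide specifies: the class name, the use of a dictionary as main memory, and fetching the block on a write miss (write-allocate):

```python
class WriteBackCache:
    """Direct-mapped, one word per block, with a dirty bit per block."""

    def __init__(self, num_blocks: int, memory: dict):
        self.num_blocks = num_blocks
        self.memory = memory                   # word address -> value
        self.lines = [None] * num_blocks       # each line: [address, value, dirty]
        self.memory_writes = 0

    def _line_for(self, addr: int):
        index = addr % self.num_blocks
        line = self.lines[index]
        if line is None or line[0] != addr:    # miss: replace the resident block
            if line is not None and line[2]:   # dirty block is written back first
                self.memory[line[0]] = line[1]
                self.memory_writes += 1
            line = [addr, self.memory.get(addr, 0), False]
            self.lines[index] = line
        return line

    def write(self, addr: int, value):
        line = self._line_for(addr)
        line[1] = value
        line[2] = True                         # set dirty bit; memory not updated

    def read(self, addr: int):
        return self._line_for(addr)[1]

memory = {}
cache = WriteBackCache(num_blocks=4, memory=memory)
cache.write(1, 10)
cache.write(1, 11)                 # two writes, still no memory traffic
print(cache.memory_writes, memory)
cache.read(5)                      # address 5 maps to the same block as address 1
print(cache.memory_writes, memory)
```

Repeated writes to the same block cost a single memory write at eviction time, which is the advantage of write-back over write-through.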

Page 42: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

AMD Opteron Microprocessor

L1 (split, 64KB each): block 64B, write-back

L2 (1MB): block 64B, write-back

Page 43: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Interleaved Memory
Reduces miss penalty.
Memory designed to read words of a block simultaneously in one read operation.
Example:
– Cache block size = 4 words
– Interleaved memory with 4 banks
– Suppose memory access ~15 cycles
– Miss penalty = 1 cycle to send address + 15 cycles to read a block + 4 cycles to send data to cache = 20 cycles
– Without interleaving, miss penalty = 65 cycles
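The two miss penalties in the example can be reproduced with a short calculation. This sketch assumes one decomposition that is consistent with the 65-cycle figure: without interleaving, the four words are read by four sequential 15-cycle accesses, and each word takes 1 cycle to transfer. The function name and parameters are illustrative.

```python
def miss_penalty(block_words, access_cycles, interleaved,
                 address_cycles=1, transfer_cycles_per_word=1):
    """Miss penalty in cycles for fetching one block from main memory."""
    # With one bank per word, all words are read in a single access time;
    # otherwise the words are read one after another (assumed here).
    memory_accesses = 1 if interleaved else block_words
    return (address_cycles
            + memory_accesses * access_cycles
            + block_words * transfer_cycles_per_word)


with_banks = miss_penalty(4, 15, interleaved=True)      # 1 + 15 + 4
without_banks = miss_penalty(4, 15, interleaved=False)  # 1 + 4*15 + 4
```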

[Figure: Processor connected to a small, fast cache (words); the cache exchanges blocks with main memory, which is organized as memory banks 0 to 3.]

Page 44: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Cache Hierarchy

Average access time = T1 + (1 - h1) [ T2 + (1 - h2) Tm ]

Where
T1 = L1 cache access time (smallest)
T2 = L2 cache access time (small)
Tm = memory access time (large)
h1, h2 = hit rates (0 ≤ h1, h2 ≤ 1)

Average access time reduces by adding a cache.
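The average access time formula translates directly into code. The sketch below is a plain transcription of the expression; the numeric values (1, 25, and 500 cycles, hit rates 0.95 and 0.90) are illustrative assumptions chosen only to exercise it.

```python
def average_access_time(t1, t2, tm, h1, h2):
    """T1 + (1 - h1) * [T2 + (1 - h2) * Tm], all times in the same unit."""
    return t1 + (1 - h1) * (t2 + (1 - h2) * tm)


typical = average_access_time(t1=1, t2=25, tm=500, h1=0.95, h2=0.90)
always_hit_l1 = average_access_time(t1=1, t2=25, tm=500, h1=1.0, h2=0.90)
always_miss = average_access_time(t1=1, t2=25, tm=500, h1=0.0, h2=0.0)
```

With a perfect L1 hit rate the result collapses to T1; with both hit rates at zero it grows to T1 + T2 + Tm.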

[Figure: Processor, L1 cache (SRAM, access time = T1), L2 cache (DRAM, access time = T2), and main memory (large, inexpensive, slow; access time = Tm).]

Page 45: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Average Access Time

[Figure: plot of access time, T1 + (1 - h1) [ T2 + (1 - h2) Tm ], versus L1 miss rate, 1 - h1, from 0 (h1 = 1) to 1 (h1 = 0), with T1 < T2 < Tm. All three lines start at T1 when the miss rate is 0. At miss rate 1 they reach T1 + T2 for h2 = 1, T1 + T2 + Tm/2 for h2 = 0.5, and T1 + T2 + Tm for h2 = 0.]

Page 46: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Processor Pipeline Performance Without Cache

5GHz processor, cycle time = 0.2ns

Memory access time = 100ns = 500 cycles

Ignoring memory access, CPI = 1

Assuming no memory data access (each instruction makes only its instruction fetch from memory):

CPI = 1 + # stall cycles = 1 + 500 = 501

Page 47: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Performance with 1 Level Cache

Assume hit rate, h1 = 0.95

L1 access time = 0.2ns = 1 cycle

CPI = 1 + # stall cycles
= 1 + 0.05×500
= 26

Processor speed increase due to cache = 501/26 = 19.3

Page 48: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Performance with 2 Level Caches
Assume:
– L1 hit rate, h1 = 0.95
– L2 hit rate, h2 = 0.90
– L2 access time = 5ns = 25 cycles

CPI = 1 + # stall cycles = 1 + 0.05 (25 + 0.10×500) = 1 + 3.75 = 4.75

Processor speed increase due to 2 level caches = 501/4.75 = 105.5

Speed increase due to L2 cache = 26/4.75 = 5.47
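The three CPI figures and the resulting speedups can be checked with a few lines of arithmetic. This is only a sketch of the calculation in the slides; the constant names are invented for illustration.

```python
MEMORY_CYCLES = 500   # 100 ns at a 0.2 ns cycle time
L2_CYCLES = 25        # 5 ns at a 0.2 ns cycle time
H1, H2 = 0.95, 0.90   # L1 and L2 hit rates

# CPI = 1 + stall cycles per instruction (instruction fetches only).
cpi_no_cache = 1 + MEMORY_CYCLES
cpi_one_level = 1 + (1 - H1) * MEMORY_CYCLES
cpi_two_level = 1 + (1 - H1) * (L2_CYCLES + (1 - H2) * MEMORY_CYCLES)

speedup_one_level = cpi_no_cache / cpi_one_level
speedup_two_level = cpi_no_cache / cpi_two_level
speedup_from_l2 = cpi_one_level / cpi_two_level

print(f"CPI: {cpi_no_cache}, {cpi_one_level:.2f}, {cpi_two_level:.2f}")
print(f"Speedups: {speedup_one_level:.1f}, {speedup_two_level:.1f}, {speedup_from_l2:.2f}")
```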

Page 49: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Miss Rate of Direct-Mapped Cache

[Figure: direct-mapped cache of 8 blocks (index 000 to 111, block size = 1 word) in front of a 32-word word-addressable memory (addresses 00000 00 to 11111 00). The memory address 11 101 00 splits into tag 11, index 101, and byte offset 00. The figure marks the least recently used (LRU) block and the block that is needed.]
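The tag, index, and byte-offset fields of the example address can be extracted with shifts and masks. The sketch below assumes the 7-bit byte address layout of the figure (2-bit tag, 3-bit index, 2-bit byte offset); the function name is hypothetical.

```python
def split_address(byte_address, index_bits=3, offset_bits=2):
    """Split a byte address into (tag, index, byte offset) for a direct-mapped cache."""
    offset = byte_address & ((1 << offset_bits) - 1)
    index = (byte_address >> offset_bits) & ((1 << index_bits) - 1)
    tag = byte_address >> (offset_bits + index_bits)
    return tag, index, offset


# The address 11 101 00 from the figure.
tag, index, offset = split_address(0b1110100)
```

Here `tag` is `0b11`, `index` is `0b101`, and `offset` is `0`, so the word can only reside in cache block 101.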

Page 50: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Fully-Associative Cache (8-Way Set Associative)

[Figure: fully-associative cache of 8 blocks (block size = 1 word) in front of a 32-word word-addressable memory (addresses 00000 00 to 11111 00). The memory address 11101 00 splits into a 5-bit tag 11101 and byte offset 00; there is no index. The figure marks the LRU block and the block that is needed.]

Page 51: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Miss Rate of Direct-Mapped Cache

[Figure: direct-mapped cache of 8 blocks (index 000 to 111, block size = 1 word) in front of a 32-word word-addressable memory. The memory address 11 101 00 splits into tag, index, and byte offset. Block 000 is overwritten repeatedly (tags 00 / 01 / 00 / 10), block 110 holds tag 00, and the other blocks are unused (xx).]

Memory references to addresses: 0, 8, 0, 6, 8, 16

1. miss
2. miss
3. miss
4. miss
5. miss
6. miss
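The six misses can be reproduced with a small simulation. This sketch assumes word addresses, one word per block, and an initially empty cache; the function name is illustrative.

```python
def simulate_direct_mapped(word_addresses, num_blocks=8):
    """Return 'hit' or 'miss' for each reference to a direct-mapped cache."""
    tags = [None] * num_blocks          # tag stored in each block, None = invalid
    results = []
    for addr in word_addresses:
        index = addr % num_blocks       # low-order bits select the block
        tag = addr // num_blocks        # remaining high-order bits
        if tags[index] == tag:
            results.append("hit")
        else:
            results.append("miss")
            tags[index] = tag           # the new word replaces the old one
    return results


direct_trace = simulate_direct_mapped([0, 8, 0, 6, 8, 16])
```

Addresses 0, 8, and 16 all map to block 000, so they keep evicting one another even though seven other blocks are free.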

Page 52: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Miss Rate: Fully-Associative Cache

[Figure: fully-associative cache of 8 blocks (block size = 1 word) in front of a 32-word word-addressable memory. The memory address 11101 00 splits into a 5-bit tag and byte offset. The cache holds tags 00000, 01000, 00110, 10000; the remaining four blocks are unused (xxxxx).]

Memory references to addresses: 0, 8, 0, 6, 8, 16

1. miss
2. miss
3. hit
4. miss
5. hit
6. miss
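The same reference string can be replayed against a fully-associative cache. The sketch below assumes LRU replacement, word addresses, and an initially empty cache, and uses the standard-library `OrderedDict` to keep blocks in recency order; the function name is illustrative.

```python
from collections import OrderedDict


def simulate_fully_associative(word_addresses, num_blocks=8):
    """Return 'hit' or 'miss' for each reference, using LRU replacement."""
    cache = OrderedDict()                   # keys ordered from LRU to most recent
    results = []
    for addr in word_addresses:
        if addr in cache:
            cache.move_to_end(addr)         # mark as most recently used
            results.append("hit")
        else:
            if len(cache) == num_blocks:
                cache.popitem(last=False)   # evict the least recently used block
            cache[addr] = True
            results.append("miss")
    return results


associative_trace = simulate_fully_associative([0, 8, 0, 6, 8, 16])
```

Only the four first-time references miss, because any word may occupy any of the eight blocks.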

Page 53: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Finding a Word in Associative Cache

[Figure: the memory address b6 b5 b4 b3 b2 b1 b0 (32 words, byte-addressed) consists of a 5-bit tag and a byte offset, with no index. Each cache entry holds a valid bit, a 5-bit tag, and data. A comparator (=) matches the address tag against a stored tag and outputs 1 = hit, 0 = miss, along with the data. Cache size = 8 words, block size = 1 word.]

No index, must compare with all tags in the cache.

Page 54: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Eight-Way Set-Associative Cache

[Figure: the memory address b6 b5 b4 b3 b2 b1 b0 (32 words, byte-addressed) consists of a 5-bit tag and a byte offset. Each of the eight cache entries (V | tag | data) has its own comparator (=) that matches the stored tag against the address tag. An 8 to 1 multiplexer selects the data from the matching entry; output 1 = hit, 0 = miss. Cache size = 8 words, block size = 1 word.]

Page 55: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Two-Way Set-Associative Cache

[Figure: two-way set-associative cache of 8 blocks (4 sets with index 00, 01, 10, 11; block size = 1 word) in front of a 32-word word-addressable memory (addresses 00000 00 to 11111 00). The memory address 111 01 00 splits into tag 111, index 01, and byte offset 00. The stored tags are 000 | 011, 100 | 001, 110 | 101, 010 | 111. The figure marks the LRU block and the block that is needed.]

Page 56: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Miss Rate: Two-Way Set-Associative Cache

[Figure: two-way set-associative cache of 8 blocks (4 sets with index 00, 01, 10, 11; block size = 1 word) in front of a 32-word word-addressable memory. The memory address 111 01 00 splits into tag, index, and byte offset. Set 00 holds tags 000 | 010, set 10 holds tag 001, and the other entries are unused (xxx).]

Memory references to addresses: 0, 8, 0, 6, 8, 16

1. miss
2. miss
3. hit
4. miss
5. hit
6. miss
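A general set-associative simulation reproduces this result and, by changing the number of ways, the other two organizations as well. The sketch assumes LRU replacement within each set, word addresses, and an initially empty cache; the function name is illustrative.

```python
def simulate_set_associative(word_addresses, num_blocks=8, ways=2):
    """Return 'hit' or 'miss' for each reference, with LRU replacement per set."""
    num_sets = num_blocks // ways
    sets = [[] for _ in range(num_sets)]    # each list holds tags, LRU first
    results = []
    for addr in word_addresses:
        index = addr % num_sets             # set selected by low-order bits
        tag = addr // num_sets
        lines = sets[index]
        if tag in lines:
            lines.remove(tag)               # re-appended below as most recent
            results.append("hit")
        else:
            if len(lines) == ways:
                lines.pop(0)                # evict the LRU tag of this set
            results.append("miss")
        lines.append(tag)
    return results


references = [0, 8, 0, 6, 8, 16]
two_way_trace = simulate_set_associative(references, ways=2)
one_way_trace = simulate_set_associative(references, ways=1)      # direct-mapped
eight_way_trace = simulate_set_associative(references, ways=8)    # fully associative
```

Address 16 misses in the two-way cache because it maps to set 00, which already holds 0 and 8; the LRU entry (address 0) is replaced.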

Page 57: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Two-Way Set-Associative Cache

[Figure: the memory address b6 b5 b4 b3 b2 b1 b0 (32 words, byte-addressed) consists of a 3-bit tag, a 2-bit index, and a byte offset. The index (00, 01, 10, 11) selects one of four sets, each holding two entries (V | tag | data). Two comparators (=) match the stored tags of the selected set against the address tag, and a 2 to 1 MUX selects the data; output 1 = hit, 0 = miss. Cache size = 8 words, block size = 1 word.]

Page 58: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Cache Miss and Page Fault

[Figure: Processor, Cache, and MMU connected to main memory (cached pages, page table), which exchanges pages with the disk (all data, organized in pages (~4KB), accessed by physical addresses). Pages are written back to disk (write-back, same as in cache).]

Page fault: a required page is not found in main memory

Cache miss: a required block is not found in cache

“Page fault” in virtual memory is similar to “miss” in cache.

Page 59: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Virtual vs. Physical Address
Processor assumes a virtual memory addressing scheme:
– Disk is a virtual memory (large, slow)
– A block of data is called a virtual page
– An address is called virtual (or logical) address (VA)

Main memory may have a different addressing scheme:
– Physical memory consists of caches
– Memory address is called physical address
– MMU translates virtual address to physical address
– Complete address translation table is large and is kept in main memory
– MMU contains TLB (translation-lookaside buffer), which keeps record of recent address translations.

Page 60: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Memory Hierarchy

[Figure: memory hierarchy consisting of registers, cache (1 or more levels), main memory (physical), and virtual memory. Words are transferred via load/store; blocks are transferred automatically upon cache miss; pages are transferred automatically upon page fault.]

Page 61: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Memory Hierarchy Example
32-bit address (byte addressing)

4 GB virtual main memory (disk space)
– Page size = 4 KB
– Number of virtual pages = $4 \times 2^{30} / (4 \times 2^{10}) = 1\mathrm{M}$
– Bits for virtual page number = $\log_2(1\mathrm{M}) = 20$

128 MB physical main memory
– Page size 4 KB
– Number of physical pages = $128 \times 2^{20} / (4 \times 2^{10}) = 32\mathrm{K}$
– Bits for physical page number = $\log_2(32\mathrm{K}) = 15$

Page table contains 1M records specifying where each virtual page is located.
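The page counts and field widths in this example follow from the memory and page sizes. The short sketch below recomputes them; the variable names are illustrative, and `bit_length() - 1` is used as an exact base-2 logarithm for powers of two.

```python
KB, MB, GB = 2**10, 2**20, 2**30

page_size = 4 * KB
virtual_pages = (4 * GB) // page_size        # 1M pages
physical_pages = (128 * MB) // page_size     # 32K pages

# For a power of two n, n.bit_length() - 1 equals log2(n).
offset_bits = page_size.bit_length() - 1
virtual_page_bits = virtual_pages.bit_length() - 1
physical_page_bits = physical_pages.bit_length() - 1

print(virtual_pages, virtual_page_bits)      # pages and bits for the virtual page number
print(physical_pages, physical_page_bits)    # pages and bits for the physical page number
```

The 20-bit virtual page number plus the 12-bit page offset account for the full 32-bit address.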


Page 62: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Page Table

[Figure: the page table register holds the address of the page table in main memory. The page table has one entry per virtual page number (1M entries, numbered 0, 1, 2, 3, ..., K, ...). Each entry holds a valid bit, other flags (e.g., dirty bit, LRU ref. bit), and the page location. The example page locations read -, 2, -, 1, 3, -, -, -, -, 0, -, -. Located pages are in physical main memory (Page 0 to Page 3); all pages reside in virtual main memory (pages on disk).]

Page 63: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

32-bit Virtual Address (4 KB Page)

[Figure: a virtual page contains 4KB of data, or 1K words of 32 bits (4 bytes) each.]

20-bit virtual page number | 10-bit word number within page | 2-bit byte offset

Page 64: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Virtual to Physical Address Translation

Virtual address: 20-bit virtual page number | 12-bit byte offset within page

Address translation

Physical address: 15-bit physical page number | 12-bit byte offset within page
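The translation replaces the virtual page number with a physical page number and copies the 12-bit offset unchanged. The sketch below models the page table as a dictionary; the table contents, the function name, and the use of `LookupError` to signal a page fault are assumptions made for illustration.

```python
OFFSET_BITS = 12   # 4 KB pages


def translate(virtual_address, page_table):
    """Map a 32-bit virtual address to a physical address via a page table."""
    virtual_page = virtual_address >> OFFSET_BITS
    offset = virtual_address & ((1 << OFFSET_BITS) - 1)
    physical_page = page_table.get(virtual_page)
    if physical_page is None:
        # The page is not in main memory and would have to be brought in from disk.
        raise LookupError(f"page fault: virtual page {virtual_page}")
    return (physical_page << OFFSET_BITS) | offset


# Hypothetical page table: virtual page number -> physical page number.
page_table = {1: 2, 3: 1, 4: 3, 9: 0}
physical_address = translate(0x00004ABC, page_table)   # virtual page 4, offset 0xABC
```

Virtual page 4 maps to physical page 3, so the result is `0x3ABC`; a reference to an unmapped page such as `0x2000` raises the page-fault error.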

Page 65: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Virtual Memory System

[Figure: the processor sends a virtual or logical address (VA) to the MMU (memory management unit with TLB), which produces the physical address (PTE). The physical address goes to the cache (SRAM) and to main memory with page table (DRAM); data flows between processor, cache, and main memory. DMA (direct memory access) connects the disk to main memory.]

Page 66: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

TLB: Translation-Lookaside Buffer

A processor request requires two accesses to main memory:
– Access page table to get physical address
– Access physical address

TLB acts as a cache of page table
– Holds recent virtual to physical page translations
– Eliminates one main memory access if requested virtual page address is found in TLB (hit)
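The effect of the TLB can be sketched by counting page table accesses. The class below is a hypothetical model: the TLB is an unbounded dictionary with no replacement policy, the page table is a dictionary standing in for the table in main memory, and all names are invented for illustration.

```python
OFFSET_BITS = 12   # 4 KB pages


class TranslatingMMU:
    """Toy MMU that consults a TLB before the page table (illustrative)."""

    def __init__(self, page_table):
        self.page_table = page_table    # virtual page -> physical page
        self.tlb = {}                   # recent translations (unbounded here)
        self.page_table_accesses = 0    # accesses to the table in main memory

    def translate(self, virtual_address):
        virtual_page = virtual_address >> OFFSET_BITS
        offset = virtual_address & ((1 << OFFSET_BITS) - 1)
        if virtual_page in self.tlb:
            physical_page = self.tlb[virtual_page]   # TLB hit: no table access
        else:
            self.page_table_accesses += 1            # TLB miss: read the page table
            physical_page = self.page_table[virtual_page]
            self.tlb[virtual_page] = physical_page
        return (physical_page << OFFSET_BITS) | offset


mmu = TranslatingMMU({4: 3, 1: 2})
first = mmu.translate(0x4010)    # miss in the TLB, page table is read
second = mmu.translate(0x4FF0)   # same virtual page, served from the TLB
```

Both addresses lie in virtual page 4, so only the first translation touches the page table.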


Page 67: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

TLB Organization

[Figure: the page table register holds the address of the page table in main memory. The page table has one entry per virtual page number (1M entries), each with a valid bit, other flags (e.g., dirty bit, LRU ref. bit), and the page location (example column: -, 2, -, 1, 3, -, -, -, -, 0, -, -). Located pages are in physical main memory (Page 0 to Page 3); all pages reside in virtual main memory (pages on disk). A TLB holds copies of two page table entries.]

V | D | R | Tag | Phy. Pg. Addr.
1 | 1 | 1 | 4 | 3
1 | 0 | 1 | 1 | 2

Page 68: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

TLB Data

V: Valid bit

D: Dirty bit

R: Reference bit (LRU)

Tag: Index in page table

Physical page address

Page 69: Fall 2015, Nov 2... ELEC 5200-001/6200-001 Lecture 9 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2015 Memory Organization (Chapter 5)

Typical TLB Characteristics

TLB size: 16 – 512 entries

Block size: 1 – 2 page table entries of 4 – 8 bytes each

Hit time: 0.5 – 1 clock cycle

Miss penalty: 10 – 100 clock cycles

Miss rate: 0.01% – 1%
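These ranges can be combined into a rough average translation cost, using the same hit time plus miss rate times miss penalty model applied to caches earlier. Pairing the low ends together and the high ends together is an assumption made only to bracket the result; the function name is illustrative.

```python
def average_translation_cycles(hit_time, miss_rate, miss_penalty):
    """Average cycles per address translation: hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Low end of every range: 0.5-cycle hit, 0.01% miss rate, 10-cycle penalty.
best_case = average_translation_cycles(0.5, 0.0001, 10)
# High end of every range: 1-cycle hit, 1% miss rate, 100-cycle penalty.
worst_case = average_translation_cycles(1.0, 0.01, 100)
```

Under these assumptions the average stays between roughly 0.5 and 2 clock cycles per translation.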
