
Page 1: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

THE MEMORY HIERARCHY

Jehan-François Pâris, jfparis@uh.edu

Page 2: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Chapter Organization

• Technology overview
• Caches
  – Cache associativity, write through and write back, …
• Virtual memory
  – Page table organization, the translation lookaside buffer (TLB), page fault handling, memory protection
• Virtual machines
• Cache consistency

Page 3: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

TECHNOLOGY OVERVIEW

Page 4: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Dynamic RAM

• Standard solution for main memory since the 70s
  – Replaced magnetic core memory
• Bits are stored on capacitors
  – A charged state represents a one
• Capacitors discharge
  – Must be dynamically refreshed
  – Achieved by accessing each cell several thousand times each second

Page 5: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Dynamic RAM

[Figure: a DRAM cell: an nMOS transistor gated by the row select line connects the column select line to a capacitor tied to ground]

Page 6: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The role of the nMOS transistor

[Figure: the cell transistor, with the row select line as gate, the column select line as source, and the capacitor as drain]

• When the gate is positive with respect to the ground, electrons are attracted to the gate (the "field effect") and current can go through
• Normally, no current can go from the source to the drain

Not on the exam

Page 7: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Magnetic disks

[Figure: a disk drive: platters, R/W heads on a moving arm, and the servo that positions the arm]

Page 8: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Magnetic disk (I)

• Data are stored on circular tracks
• Tracks are partitioned into a variable number of fixed-size sectors
• If the disk drive has more than one platter, all tracks corresponding to the same position of the R/W head form a cylinder

Page 9: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Magnetic disk (II)

• Disk spins at a speed varying between
  – 5,400 rpm (laptops) and
  – 15,000 rpm (Seagate Cheetah X15, …)
• Accessing data requires
  – Positioning the head on the right track: seek time
  – Waiting for the data to reach the R/W head: on the average, half a rotation

Page 10: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Disk access times

• Dominated by seek time and rotational delay
• We try to reduce seek times by placing all data that are likely to be accessed together on nearby tracks or the same cylinder
• Cannot do as much for rotational delay
  – On the average, half a rotation

Page 11: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Average rotational delay

RPM      Delay (ms)
 5,400   5.6
 7,200   4.2
10,000   3.0
15,000   2.0
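The table follows from the time of half a revolution; a quick Python check (not part of the original deck):

# Average rotational delay = time for half a revolution.
for rpm in (5400, 7200, 10000, 15000):
    delay_ms = 0.5 * 60_000 / rpm   # 60,000 ms per minute
    print(f"{rpm:>6} rpm: {delay_ms:.1f} ms")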

Page 12: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Overall performance

• Disk access times are still dominated by rotational latency
  – Were 8-10 ms in the late 70s, when rotational speeds were 3,000 to 3,600 RPM
• Disk capacities and maximum transfer rates have done much better
  – Pack many more tracks per platter
  – Pack many more bits per track

Page 13: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The internal disk controller

• Printed circuit board attached to the disk drive
  – As powerful as the CPU of a personal computer of the early 80s
• Functions include
  – Speed buffering
  – Disk scheduling
  – …

Page 14: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Reliability issues

• Disk drives have more reliability issues than most other computer components
  – Moving parts eventually wear out
  – Infant mortality
  – Producing perfect magnetic surfaces would be too costly
• Disks have bad blocks

Page 15: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Disk failure rates

• Failure rates follow a bathtub curve
  – High infant mortality
  – Low failure rate during useful life
  – Higher failure rates as disks wear out

Page 16: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Disk failure rates (II)

[Figure: the bathtub curve: failure rate vs. time, with infant mortality, useful life, and wearout phases]

Page 17: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Disk failure rates (III)

• The infant mortality effect can last for months for disk drives
• Cheap ATA disk drives seem to age less gracefully than SCSI drives

Page 18: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

MTTF

• Disk manufacturers advertise very high Mean Times To Fail (MTTF) for their products
  – 500,000 to 1,000,000 hours, that is, 57 to 114 years
• Does not mean that a disk will last that long!
• Means that disks will fail at an average rate of one failure per 500,000 to 1,000,000 hours during their useful life

Page 19: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

More MTTF Issues (I)

• Manufacturers' claims are not supported by solid experimental evidence
• They are obtained by submitting disks to a stress test at high temperature and extrapolating the results to ideal conditions
  – The procedure raises many issues

Page 20: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

More MTTF Issues (II)

• Failure rates observed in the field are much higher
  – Can go up to 8 to 9 percent per year
• The corresponding MTTFs are 11 to 12.5 years
• If we have 100 disks and an MTTF of 12.5 years, we can expect an average of 8 disk failures per year
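The arithmetic behind these figures, as a quick Python check (not part of the original deck):

HOURS_PER_YEAR = 24 * 365

for mttf_hours in (500_000, 1_000_000):
    print(f"MTTF {mttf_hours:,} h = {mttf_hours / HOURS_PER_YEAR:.0f} years")

# 100 disks with a 12.5-year MTTF fail, during their useful life, at a rate of
print(100 / 12.5, "failures per year")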

Page 21: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Bad blocks (I)

• Also known as
  – Irrecoverable read errors
  – Latent sector errors
• Can be caused by
  – Defects in the magnetic substrate
  – Problems during the last write

Page 22: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Bad blocks (II)

• The disk controller uses a redundant encoding that can detect and correct many errors
• When the internal disk controller detects a bad block, it
  – Marks it as unusable
  – Remaps the logical block address of the bad block to spare sectors
• Each disk is extensively tested during a burn-in period before being released

Page 23: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The memory hierarchy (I)

Level   Device                          Access time
1       Fastest registers (2 GHz CPU)   0.5 ns
2       Main memory                     10-60 ns
3       Secondary storage (disk)        7 ms
4       Mass storage (CD-ROM library)   a few s

Page 24: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The memory hierarchy (II)

• To make sense of these numbers, let us consider an analogy

Page 25: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Writing a paper (I)

Level   Resource            Access time
1       Open book on desk   1 s
2       Book on desk
3       Book in library
4       Book far away

Page 26: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Writing a paper (II)

Level   Resource            Access time
1       Open book on desk   1 s
2       Book on desk        20-120 s
3       Book in library
4       Book far away

Page 27: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Writing a paper (III)

Level   Resource            Access time
1       Open book on desk   1 s
2       Book on desk        20-140 s
3       Book in library     162 days
4       Book far away

Page 28: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Writing a paper (IV)

Level   Resource            Access time
1       Open book on desk   1 s
2       Book on desk        20-140 s
3       Book in library     162 days
4       Book far away       63 years

Page 29: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Major issues

• Huge gaps between
  – CPU speeds and SDRAM access times
  – SDRAM access times and disk access times
• The two problems have very different solutions
  – The gap between CPU speeds and SDRAM access times is handled by hardware
  – The gap between SDRAM access times and disk access times is handled by a combination of software and hardware

Page 30: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Why?

• Having hardware handle an issue
  – Complicates hardware design
  – Offers a very fast solution
  – Standard approach for very frequent actions
• Letting software handle an issue
  – Cheaper
  – Has a much higher overhead
  – Standard approach for less frequent actions

Page 31: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Will the problem go away?

• It will become worse
  – RAM access times are not improving as fast as CPU power
  – Disk access times are limited by the rotational speed of the disk drive

Page 32: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

What are the solutions?

• To bridge the CPU/DRAM gap:
  – Interpose between the CPU and the DRAM smaller, faster memories that cache the data the CPU currently needs
    • Cache memories
    • Managed by the hardware and invisible to the software (OS included)

Page 33: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

What are the solutions?

• To bridge the DRAM/disk drive gap:
  – Store in main memory the data blocks that are currently accessed (the I/O buffer)
  – Manage memory space and disk space as a single resource (virtual memory)
• The I/O buffer and virtual memory are managed by the OS and invisible to the user processes

Page 34: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Why do these solutions work?

• Locality principle:
  – Spatial locality: at any time a process only accesses a small portion of its address space
  – Temporal locality: this subset does not change too frequently

Page 35: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Can we think of examples?

• The way we write programs
• The way we act in everyday life
  – …

Page 36: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

CACHING

Page 37: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The technology

• Caches use faster static RAM (SRAM)
  – Similar organization to that of D flip-flops
• Can have
  – Separate caches for instructions and data
    • Great for pipelining
  – A unified cache

Page 38: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

A little story (I)

• Consider a closed-stack library
  – Customers bring book requests to the circulation desk
  – Librarians go to the stacks to fetch the requested books
• This solution is used in national libraries
  – Costlier than the open-stack approach
  – Much better control of assets

Page 39: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

A little story (II)

• Librarians have noted that some books get asked for again and again
  – Want to put them closer to the circulation desk
    • Would result in much faster service
• The problem is how to locate these books
  – They will not be at the right location!

Page 40: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

A little story (III)

• The librarians come up with a great solution
  – They put behind the circulation desk shelves with 100 book slots numbered from 00 to 99
  – Each slot is a home for the most recently requested book whose call number ends with the slot number
    • 3141593 can only go in slot 93
    • 1234567 can only go in slot 67

Page 41: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

A little story (IV)

The call number of the book I need is 3141593

Let me see if it's in bin 93

Page 42: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

A little story (V)

• To let the librarian do her job, each slot must contain either
  – Nothing, or
  – A book and its call number
• There are many books whose call number ends in 93, or in any two given digits

Page 43: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

A little story (VI)

Could I now get the book whose call number is 4444493?

Sure

Page 44: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

A little story (VII)

• This time the librarian will
  – Go to bin 93
  – Find that it contains a book with a different call number
• She will
  – Bring that book back to the stacks
  – Fetch the new book

Page 45: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Basic principles

• Assume we want to store in a faster memory the 2^n words that are currently accessed by the CPU
  – Can be instructions or data or even both
• When the CPU needs to fetch an instruction or load a word into a register
  – It will look first into the cache
  – Can have a hit or a miss

Page 46: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Cache hits

• Occur when the requested word is found in the cache
  – The cache avoided a memory access
  – The CPU can proceed

Page 47: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Cache misses

• Occur when the requested word is not found in the cache
  – Will need to access the main memory
  – Will bring the new word into the cache
    • Must make space for it by expelling one of the cache entries
      – Need to decide which one

Page 48: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Handling writes (I)

• When the CPU has to store the contents of a register into main memory
  – The write will update the cache
• If the modified word is already in the cache
  – Everything is fine
• Otherwise
  – Must make space for it by expelling one of the cache entries

Page 49: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Handling writes (II)

• Two ways to handle writes
  – Write through:
    • Each write updates both the cache and the main memory
  – Write back:
    • Writes are not propagated to the main memory until the updated word is expelled from the cache

Page 50: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Handling writes (III)

[Figure: with write through, each CPU write goes to both the cache and the RAM; with write back, the write updates the cache now and the RAM later]

Page 51: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Pros and cons

• Write through:
  – Ensures that memory is always up to date
    • Expelled cache entries can simply be overwritten
• Write back:
  – Faster writes
  – Complicates the cache expulsion procedure
    • Must write back cache entries that have been modified in the cache

Page 52: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Picking the right solution

• Caches use write through:
  – Provides simpler cache expulsions
  – Can minimize the write-through overhead with additional circuitry
• I/O buffers and virtual memory use write back:
  – The write-through overhead would be too high

Page 53: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

A better write through (I)

• Add a small buffer to speed up the write performance of write-through caches
  – At least four words
• Holds modified data until they are written into main memory
  – The cache can proceed as soon as the data are written into the write buffer

Page 54: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

A better write through (II)

[Figure: with plain write through, CPU writes go to the cache and the RAM; with the improved scheme, they go to the cache and the write buffer, which drains into the RAM later]

Page 55: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

A very basic cache

• Has 2^n entries
• Each entry contains
  – A word (4 bytes)
  – Its RAM address
    • The sole way to identify the word
  – A bit indicating whether the cache entry contains something useful

Page 56: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

A very basic cache (I)

[Figure: the cache as a table of entries, each holding a valid bit (Y/N), a tag containing the RAM address, and the word contents; actual caches are much bigger]

Page 57: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

A very basic cache (II)

[Figure: the same cache with its entries indexed 000 to 111; each entry holds a valid bit, a tag (RAM address), and the word contents]

Page 58: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Comments (I)

• The cache organization we have presented is nothing but a hardware implementation of a hash table
• Each entry has
  – A key: the word address
  – A value: the word contents plus a valid bit

Page 59: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Comments (II)

• The hash function is
  h(k) = (k / 4) mod N
  where k is the key and N is the cache size
  – Can be computed very fast
• Unlike conventional hash tables, this organization has no provision for handling collisions
  – It uses expulsion to resolve collisions

Page 60: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Managing the cache

• Each word fetched into the cache can occupy a single cache location
  – Specified by bits n+1 to 2 of its address
• Two words with the same bits n+1 to 2 cannot be in the cache at the same time
  – Happens whenever the addresses of the two words differ by a multiple K × 2^(n+2)

Page 61: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Example

• Assume the cache can contain 8 words
• If word 48 is in the cache, it will be stored at cache index (48/4) mod 8 = 12 mod 8 = 4
• In our case 2^(n+2) = 2^(3+2) = 32
• The only possible cache index for word 80 would also be (80/4) mod 8 = 20 mod 8 = 4
• Same for words 112, 144, 176, …

Page 62: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Managing the cache

• Each word fetched into the cache can occupy a single cache location
  – Specified by bits n+1 to 2 of its address
• Two words with the same bits n+1 to 2 cannot be in the cache at the same time
  – Happens whenever the addresses of the two words differ by a multiple K × 2^(n+2)

Page 63: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Saving cache space

• We do not need to store the whole address of each word in the cache
  – Bits 1 and 0 will always be zero
  – Bits n+1 to 2 can be inferred from the cache index
    • If the cache has 8 entries, bits 4 to 2
• Will only store in the tag the remaining bits of the address

Page 64: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

A very basic cache (III)

The cache uses bits 4 to 2 of the word address as its index.

[Figure: the 8-entry cache, entries indexed 000 to 111; each entry holds a valid bit, a tag storing bits 31:5 of the address, and the word]

Page 65: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Storing a new word in the cache

• The location of the new word's entry is obtained from the LSB of the word address
  – Discard the 2 LSB
    • Always zero for a well-aligned word
  – The n next LSB give the cache index for a cache of size 2^n

[Diagram: word address = MSB of word address | n next LSB | 00]

Page 66: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Accessing a word in the cache (I)

• Start with the word address
• Remove the two least significant bits
  – Always zero

[Diagram: word address → word address minus its two LSB]

Page 67: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Accessing a word in the cache (II)

• Split the remainder of the address into
  – The n least significant bits
    • The word's index in the cache
  – The cache tag

[Diagram: word address minus two LSB = cache tag | n LSB]

Page 68: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Towards a better cache

• Our cache takes into account the temporal locality of accesses
  – Repeated accesses to the same location
• But not their spatial locality
  – Accesses to neighboring locations
• Cache space is poorly used
  – Need 27 + 1 bits of overhead (a tag holding bits 31:5, plus a valid bit) to store 32 bits of data

Page 69: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Multiword cache (I)

• Each cache entry will contain a block of 2, 4, 8, … words with consecutive addresses
  – Requires words to be well aligned
    • A pair of words should start at an address that is a multiple of 2×4 = 8
    • A group of four words should start at an address that is a multiple of 4×4 = 16

Page 70: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Multiword cache (II)

[Figure: a multiword cache with entries indexed 000 to 111; each entry holds a valid bit, a tag storing bits 31:6 of the address, and a block of several words]

Page 71: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Multiword cache (III)

• Has 2^n entries, each containing 2^m words
• Each entry contains
  – 2^m words
  – A tag
  – A bit indicating whether the cache entry contains useful data

Page 72: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Storing a new word in the cache

• The location of the new word's entry is obtained from the LSB of the word address
  – Discard the 2 + m LSB
    • Always zero for a well-aligned group of words
  – Take the n next LSB for a cache of size 2^n

[Diagram: address = MSB of address | n next LSB | 2 + m LSB]

Page 73: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Example

• Assume
  – The cache can contain 8 entries
  – Each block contains 2 words
• Words 48 and 52 belong to the same block
  – If word 48 is in the cache, it will be stored at cache index (48/8) mod 8 = 6 mod 8 = 6
  – If word 52 is in the cache, it will be stored at cache index (52/8) mod 8 = 6 mod 8 = 6

Page 74: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Selecting the right block size

• Larger block sizes improve the performance of the cache
  – Allow us to exploit the spatial locality of accesses
• Three limitations
  – The spatial locality effect is less pronounced if the block size exceeds 128 bytes
  – Too many collisions in very small caches
  – Large blocks take more time to be fetched into the cache

Page 75: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

[Figure: miss rate (0% to 40%) vs. block size (16 to 256 bytes) for cache sizes of 1 KB, 8 KB, 16 KB, 64 KB, and 256 KB]

Page 76: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Collision effect in small cache

• Consider a 4 KB cache
  – If the block size is 16 B, that is, 4 words, the cache will have 256 blocks
  – …
  – If the block size is 128 B, that is, 32 words, the cache will have 32 blocks
    • Too many collisions

Page 77: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Problem

• Consider a very small cache with 8 entries and a block size of 8 bytes (2 words)
  – Which words will be fetched into the cache when the CPU accesses the words at addresses 32, 48, 60 and 80?
  – How will these words be stored in the cache?

Page 78: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Solution (I)

• Since the block size is 8 bytes
  – The 3 LSB of the address select one of the 8 bytes in a block
• Since the cache holds 8 blocks
  – The next 3 LSB of the address form the cache index
• As a result, the tag has 32 – 3 – 3 = 26 bits

Page 79: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Solution (II)

• Consider the word at address 32
• The cache index is (32/2^3) mod 2^3 = (32/8) mod 8 = 4
• The block tag is 32/2^6 = 32/64 = 0

Row 4   Tag=0   32 33 34 35 36 37 38 39

Page 80: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Solution (III)

• Consider the word at address 48
• The cache index is (48/8) mod 8 = 6
• The block tag is 48/64 = 0

Row 6   Tag=0   48 49 50 51 52 53 54 55

Page 81: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Solution (IV)

• Consider the word at address 60
• The cache index is (60/8) mod 8 = 7
• The block tag is 60/64 = 0

Row 7   Tag=0   56 57 58 59 60 61 62 63

Page 82: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Solution (V)

• Consider the word at address 80
• The cache index is (80/8) mod 8 = 10 mod 8 = 2
• The block tag is 80/64 = 1

Row 2   Tag=1   80 81 82 83 84 85 86 87
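The four answers can be checked with a short Python sketch (8-entry cache, 8-byte blocks, as stated in the problem; not part of the original deck):

# 3 offset bits (8-byte blocks) and 3 index bits (8 entries)
for addr in (32, 48, 60, 80):
    index = (addr // 8) % 8
    tag = addr // 64
    start = (addr // 8) * 8
    print(f"address {addr}: row {index}, tag {tag}, "
          f"block holds bytes {start}-{start + 7}")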

Page 83: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Set-associative caches (I)

• Can be seen as 2, 4, or 8 direct-mapped caches operating in parallel
• Reduce collisions

Page 84: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Set-associative caches (II)

[Figure: a two-way set-associative cache: two banks of entries, each indexed 000 to 111, and each entry holding a valid bit, a tag storing bits 31:5 of the address, and a block]

Page 85: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Set-associative caches (III)

• Advantage:
  – We take care of more collisions
    • Like a hash table with a fixed bucket size
  – Results in lower miss rates than direct-mapped caches
• Disadvantage:
  – Slower access
• Best solution if the miss penalty is very big

Page 86: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Fully associative caches

• The dream!
• A block can occupy any position in the cache
• Requires an associative memory
  – Content-addressable
  – Like our brain!
• Remains a dream

Page 87: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Designing RAM to support caches

• The RAM is connected to the CPU through a "bus"
  – Its clock rate is much slower than the CPU clock rate
• Assume that a RAM access takes
  – 1 bus clock cycle to send the address
  – 15 bus clock cycles to initiate a read
  – 1 bus clock cycle to send a word of data

Page 88: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Designing RAM to support caches

• Assume
  – The cache block size is 4 words
  – A one-word bank of DRAM
• Fetching a cache block would take 1 + 4×15 + 4×1 = 65 bus clock cycles
  – The transfer rate is 0.25 byte/bus cycle
    • Awful!

Page 89: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Designing RAM to support caches

• Could
  – Double the bus width (from 32 to 64 bits)
  – Have a two-word bank of DRAM
• Fetching a cache block would take 1 + 2×15 + 2×1 = 33 bus clock cycles
  – The transfer rate is 0.48 byte/bus cycle
    • Much better
• A costly solution

Page 90: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Designing RAM to support caches

• Could
  – Have an interleaved memory organization
  – Four one-word banks of DRAM
  – A 32-bit bus

[Figure: four RAM banks (0 to 3) sharing a 32-bit bus]

Page 91: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Designing RAM to support caches

• Can do the 4 accesses in parallel
• Must still transmit the block 32 bits by 32 bits
• Fetching a cache block would take 1 + 15 + 4×1 = 20 bus clock cycles
  – The transfer rate is 0.80 byte/bus cycle
• Even better
• Much cheaper than having a 64-bit bus
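A small Python sketch comparing the three organizations (16-byte blocks and the timing parameters from the slides; not part of the original deck):

block_bytes = 16  # four 4-byte words
configs = {
    "one-word bank, 32-bit bus": 1 + 4 * 15 + 4 * 1,   # 65 cycles
    "two-word bank, 64-bit bus": 1 + 2 * 15 + 2 * 1,   # 33 cycles
    "four interleaved banks":    1 + 15 + 4 * 1,       # 20 cycles
}
for name, cycles in configs.items():
    print(f"{name}: {cycles} cycles, {block_bytes / cycles:.2f} bytes/cycle")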

Page 92: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

ANALYZING CACHE PERFORMANCE

Page 93: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Memory stalls

• Can divide CPU time into
  – N_EXEC clock cycles spent executing instructions
  – N_MEM_STALLS cycles spent waiting for memory accesses
• We have
  CPU time = (N_EXEC + N_MEM_STALLS) × T_CYCLE

Page 94: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Memory stalls

• We assume that
  – Cache access times can be neglected
  – Most CPU cycles spent waiting for memory accesses are caused by cache misses
• Distinguishing between read stalls and write stalls:
  N_MEM_STALLS = N_RD_STALLS + N_WR_STALLS

Page 95: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Read stalls

• Fairly simple:
  N_RD_STALLS = N_MEM_RD × Read miss rate × Read miss penalty

Page 96: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Write stalls (I)

• Two causes of delays
  – Must fetch missing blocks before updating them
    • We update at most 8 bytes of the block!
  – Must take into account the cost of write through
    • The buffering delay depends on the proximity of writes, not the number of cache misses
      – Writes too close to each other

Page 97: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Write stalls (II)

• We have
  N_WR_STALLS = N_WRITES × Write miss rate × Write miss penalty + N_WR_BUFFER_STALLS
• In practice, there are very few buffer stalls if the buffer contains at least four words

Page 98: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Global impact

• We have
  N_MEM_STALLS = N_MEM_ACCESSES × Cache miss rate × Cache miss penalty
• and also
  N_MEM_STALLS = N_INSTRUCTIONS × (N_MISSES/instruction) × Cache miss penalty

Page 99: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Example

• The miss rate of the instruction cache is 2 percent
• The miss rate of the data cache is 4 percent
• In the absence of memory stalls, each instruction would take 2 cycles
• The miss penalty is 100 cycles
• 36 percent of instructions access the main memory
• How many cycles are lost due to cache misses?

Page 100: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Solution (I)

• Impact of instruction cache misses: 0.02×100 = 2 cycles/instruction
• Impact of data cache misses: 0.36×0.04×100 = 1.44 cycles/instruction
• Total impact of cache misses: 2 + 1.44 = 3.44 cycles/instruction

Page 101: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Solution (II)

• Average number of cycles per instruction: 2 + 3.44 = 5.44 cycles/instruction
• Fraction of time wasted: 3.44/5.44 = 63 percent

Page 102: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Problem

• Redo the example with the following data
  – The miss rate of the instruction cache is 3 percent
  – The miss rate of the data cache is 5 percent
  – In the absence of memory stalls, each instruction would take 2 cycles
  – The miss penalty is 100 cycles
  – 40 percent of instructions access the main memory

Page 103: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Solution

• The fraction of time wasted to memory stalls is 71 percent
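Both results can be checked with a small Python helper (parameters as in the two problems; not part of the original deck):

def stall_impact(i_miss, d_miss, mem_frac, base_cpi=2, penalty=100):
    stalls = (i_miss + mem_frac * d_miss) * penalty  # stall cycles per instruction
    return stalls, stalls / (base_cpi + stalls)      # and fraction of time wasted

print(stall_impact(0.02, 0.04, 0.36))  # (3.44, 0.632...) -> 63 percent
print(stall_impact(0.03, 0.05, 0.40))  # (5.0, 0.714...)  -> 71 percent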

Page 104: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Average memory access time

• Some authors call it AMAT:
  T_AVERAGE = T_CACHE + f × T_MISS
  where f is the cache miss rate
• Times can be expressed
  – In nanoseconds
  – In number of cycles

Page 105: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Example

• A cache has a hit rate of 96 percent
• Accessing data
  – In the cache requires one cycle
  – In the memory requires 100 cycles
• What is the average memory access time?

Page 106: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Solution

• Miss rate = 1 – Hit rate = 0.04
• Applying the formula:
  T_AVERAGE = 1 + 0.04×100 = 5 cycles


Page 107: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Impact of a better hit rate

• What would be the impact of improving the hit rate of the cache from 96 to 98 percent?

Page 108: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Solution

• New miss rate = 1 – New hit rate = 0.02
• Applying the formula:
  T_AVERAGE = 1 + 0.02×100 = 3 cycles

When the hit rate is above 80 percent, small improvements in the hit rate result in much lower miss rates
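A tiny Python helper reproducing both results (one-cycle cache access and a 100-cycle miss penalty assumed, as above; not part of the original deck):

def amat(hit_rate, t_cache=1, t_miss=100):
    return t_cache + (1 - hit_rate) * t_miss

print(round(amat(0.96), 2))  # 5.0 cycles
print(round(amat(0.98), 2))  # 3.0 cycles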

Page 109: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Examples

• Old hit rate: 80 percent; new hit rate: 90 percent
  – The miss rate goes from 20 to 10 percent!
• Old hit rate: 94 percent; new hit rate: 98 percent
  – The miss rate goes from 6 to 2 percent!

Page 110: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

In other words

It's the miss rate, stupid!

Page 111: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Improving cache hit rate

• Two complementary techniques
  – Using set-associative caches
    • Must check the tags of all blocks with the same index value
      – Slower
    • Have fewer collisions
      – Fewer misses
  – Using a cache hierarchy

Page 112: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

A cache hierarchy (I)

[Figure: a cache hierarchy: the CPU accesses L1; L1 misses go to L2, L2 misses go to L3, and L3 misses go to RAM]

Page 113: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

A cache hierarchy

• The topmost cache
  – Is optimized for speed, not miss rate
  – Is rather small
  – Uses a small block size
• As we go down the hierarchy
  – Cache sizes increase
  – Block sizes increase
  – Cache associativity levels increase

Page 114: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Example

• The cache miss rate per instruction is 2 percent
• In the absence of memory stalls, each instruction would take one cycle
• The cache miss penalty is 100 ns
• The clock rate is 4 GHz
• How many cycles are lost due to cache misses?

Page 115: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Solution (I)

• Duration of a clock cycle: 1/(4 GHz) = 0.25×10^-9 s = 0.25 ns
• Cache miss penalty: 100 ns = 400 cycles
• Total impact of cache misses: 0.02×400 = 8 cycles/instruction

Page 116: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Solution (II)

• Average number of cycles per instruction: 1 + 8 = 9 cycles/instruction
• Fraction of time wasted: 8/9 = 89 percent

Page 117: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Example (cont'd)

• How much faster would the processor be if we added an L2 cache that
  – Has a 5 ns access time
  – Would reduce the miss rate to main memory to 0.5 percent?
• We will see later how to get that

Page 118: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Solution (I)

• L2 cache access time: 5 ns = 20 cycles
• Impact of cache misses per instruction: L1 cache misses + L2 cache misses = 0.02×20 + 0.005×400 = 0.4 + 2.0 = 2.4 cycles/instruction
• Average number of cycles per instruction: 1 + 2.4 = 3.4 cycles/instruction

Page 119: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Solution (II)

• Fraction of time wasted: 2.4/3.4 = 71 percent
• CPU speedup: 9/3.4 = 2.6

Page 120: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

How to get the 0.005 miss rate

• The wanted miss rate corresponds to a combined cache hit rate of 99.5 percent
• Let H1 be the hit rate of the L1 cache and H2 the hit rate of the L2 cache
• The combined hit rate of the cache hierarchy is
  H = H1 + (1 – H1)H2

Page 121: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

How to get the 0.005 miss rate

• We have 0.995 = 0.98 + 0.02 H2
• H2 = (0.995 – 0.98)/0.02 = 0.75
  – Quite feasible!
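A two-line Python check of this computation (not part of the original deck):

H1 = 0.98
H2 = (0.995 - H1) / (1 - H1)
print(round(H2, 2))                  # 0.75
print(round(H1 + (1 - H1) * H2, 3))  # 0.995, the combined hit rate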

Page 122: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Can we do better? (I)

• Keep the 98 percent hit rate of the L1 cache
• Raise the hit rate of the L2 cache to 85 percent
  – The L2 cache is now slower: 6 ns, that is, 24 cycles
• Impact of cache misses per instruction: L1 cache misses + L2 cache misses = 0.02×24 + 0.02×0.15×400 = 0.48 + 1.2 = 1.68 cycles/instruction

Page 123: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The verdict

• Fraction of time wasted: 1.68/2.68 = 63 percent
• CPU speedup: 9/2.68 = 3.36

Page 124: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Would a faster L2 cache help?

• Redo the example assuming
  – The hit rate of the L1 cache is still 98 percent
  – A new, faster L2 cache
    • Access time reduced to 3 ns
    • Hit rate of only 50 percent

Page 125: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The verdict

• Fraction of time wasted: 81 percent
• CPU speedup: 1.72

The new L2 cache with a lower access time but a higher miss rate performs much worse than the original L2 cache

Page 126: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Cache replacement policy

• Not an issue in direct-mapped caches
  – We have no choice!
• An issue in set-associative caches
  – The best policy is least recently used (LRU)
    • Expels from the cache a block in the same set as the incoming block
    • Picks the block that has not been accessed for the longest period of time

Page 127: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Implementing LRU policy

• Easy when each set contains two blocks
  – We attach to each block a use bit that is
    • Set to 1 when the block is accessed
    • Reset to 0 when the other block is accessed
  – We expel the block whose use bit is 0
• Much more complicated for higher associativity levels
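A minimal Python sketch of one two-block set with its use bit; the names are illustrative, not from the slides (tracking the LRU way is equivalent to keeping a use bit per block):

class TwoWaySet:
    """One set of a two-way set-associative cache with LRU replacement."""
    def __init__(self):
        self.blocks = [None, None]   # tags of the two resident blocks
        self.lru = 0                 # index of the least recently used way

    def access(self, tag):
        if tag in self.blocks:
            way = self.blocks.index(tag)
            self.lru = 1 - way       # the other way is now least recently used
            return "hit"
        self.blocks[self.lru] = tag  # expel the LRU block
        self.lru = 1 - self.lru
        return "miss"

s = TwoWaySet()
print(s.access("A"), s.access("B"), s.access("A"), s.access("C"))
# miss miss hit miss -- the last access expels B, the LRU block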

Page 128: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

REALIZATIONS

Page 129: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Caching in a multicore organization

• Multicore organizations often involve multiple chips
  – Say four chips with four cores per chip
• Have a cache hierarchy on each chip
  – L1, L2, L3
  – Some caches are private, others are shared
• Accessing a cache on the same chip is much faster than accessing a cache on another chip

Page 130: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

AMD 16-core system (I)

• AMD 16-core system
  – Sixteen cores on four chips
• Each core has a 64-KB L1 and a 512-KB L2 cache
• Each chip has a 2-MB shared L3 cache

Page 131: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

[Figure: cache-to-cache access costs, each labeled X/Y where X is the latency in cycles and Y the bandwidth in bytes/cycle]

Page 132: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

AMD 16-core system (II)

• Observe that access times are non-uniform
  – It takes more time to access the L1 or L2 cache of another core than to access the shared L3 cache
  – It takes more time to access caches on another chip than local caches
  – Access times and bandwidths depend on the chip interconnect topology

Page 133: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

VIRTUAL MEMORY

Page 134: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Main objective (I)

• To allow programmers to write programs that reside
  – Partially in main memory
  – Partially on disk

Page 135: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Main objective (II)

[Figure: two address spaces, each larger than main memory, mapped partly into main memory]

Page 136: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Motivation

• Most programs do not access their whole address space at the same time
• Compilers go through several phases
  – Lexical analysis
  – Preprocessing (C, C++)
  – Syntactic analysis
  – Semantic analysis
  – …

Page 137: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Advantages (I)

• VM allows programmers to write programs that would not otherwise fit in main memory
  – They will run, although much more slowly
  – Very important in the 70s and 80s
• VM allows the OS to allocate the main memory much more efficiently
  – Do not waste precious memory space
  – Still important today

Page 138: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Advantages

• VM lets programmers use
  – Sparsely populated
  – Very large address spaces

Page 139: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Sparsely populated address spaces

• Lets programmers put different items apart from each other
  – Code segment
  – Data segment
  – Stack
  – Shared libraries
  – Mapped files

Wait until you take 4330 to study this

Page 140: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Big difference with caching

• The miss penalty is much bigger
  – Around 5 ms
  – Assuming a memory access time of 50 ns, 5 ms equals 100,000 memory accesses
  – For caches, the miss penalty was around 100 cycles

Page 141: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Consequences

• Will use much larger block sizes
  – Blocks, here called pages, measure 4 KB, 8 KB, … with 4 KB an unofficial standard
• Will use fully associative mapping to reduce misses, here called page faults
• Will use write back to reduce disk accesses
  – Must keep track of modified (dirty) pages in memory

Page 142: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Virtual memory

• Combines two big ideas
  – Non-contiguous memory allocation: processes are allocated page frames scattered all over the main memory
  – On-demand fetch: process pages are brought into main memory when they are accessed for the first time
• The MMU takes care of almost everything

Page 143: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Main memory

• Divided into fixed-size page frames
  – Allocation units
  – Sizes are powers of 2 (512 B, …, 4 KB, …)
  – Properly aligned
  – Numbered 0, 1, 2, …

Page 144: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Program address space

• Divided into fixed-size pages
  – Same sizes as page frames
  – Properly aligned
  – Also numbered 0, 1, 2, …

Page 145: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The mapping

• Will allocate non-contiguous page frames to the pages of a process

[Figure: pages 0 to 2 of a process mapped to scattered page frames in main memory]

Page 146: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The mapping

Page number   Frame number
0             0
1             4
2             2

Page 147: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The mapping

• Assuming 1 KB pages and page frames

Virtual addresses    Physical addresses
0 to 1,023           0 to 1,023
1,024 to 2,047       4,096 to 5,119
2,048 to 3,071       2,048 to 3,071

Page 148: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The mapping

• Observing that 2^10 is written in binary as a one followed by ten zeroes
• We will write 0-0 for ten zeroes and 1-1 for ten ones

Virtual addresses     Physical addresses
000 0-0 to 000 1-1    000 0-0 to 000 1-1
001 0-0 to 001 1-1    100 0-0 to 100 1-1
010 0-0 to 010 1-1    010 0-0 to 010 1-1

Page 149: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The mapping

• The ten least significant bits of the address do not change

Virtual addresses     Physical addresses
000 0-0 to 000 1-1    000 0-0 to 000 1-1
001 0-0 to 001 1-1    100 0-0 to 100 1-1
010 0-0 to 010 1-1    010 0-0 to 010 1-1

Page 150: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The mapping

• Must only map page numbers to page frame numbers

Page number   Page frame number
000           000
001           100
010           010

Page 151: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The mapping

• The same in decimal

Page number   Page frame number
0             0
1             4
2             2

Page 152: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The mapping

• Since page numbers are always in sequence, they are redundant; the table only needs to store the page frame numbers

Page frame number
0
4
2

Page 153: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The algorithm

• Assume the page size is 2^p
• Remove the p least significant bits from the virtual address to obtain the page number
• Use the page number to find the corresponding page frame number in the page table
• Append the p least significant bits of the virtual address to the page frame number to get the physical address
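A minimal Python sketch of this algorithm, using the 1 KB pages (p = 10) and the page table from the mapping slides above (not part of the original deck):

PAGE_SIZE = 1024                    # 2**10, so p = 10
page_table = {0: 0, 1: 4, 2: 2}     # page number -> page frame number

def translate(vaddr):
    page = vaddr >> 10               # drop the p least significant bits
    offset = vaddr & (PAGE_SIZE - 1) # the p bits that pass through unchanged
    frame = page_table[page]         # a missing entry would be a page fault
    return (frame << 10) | offset

print(translate(1500))  # page 1, offset 476 -> frame 4 -> 4572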

Page 154: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Realization

[Figure: address translation: a virtual address with page number 2 and 10-bit offset 897 is looked up in the page table; the page frame number found there is concatenated with the unchanged offset to form the physical address]

Page 155: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The offset

• The offset contains all the bits that remain unchanged through the address translation process
• It is a function of the page size

Page size   Offset
1 KB        10 bits
2 KB        11 bits
4 KB        12 bits

Page 156: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The page number

• Contains the other bits of the virtual address
• Assuming 32-bit addresses

Page size   Offset    Page number
1 KB        10 bits   22 bits
2 KB        11 bits   21 bits
4 KB        12 bits   20 bits

Page 157: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Internal fragmentation

• Each process now occupies an integer number of pages
• The actual process space is not a round number
  – The last page of a process is rarely full
• On the average, half a page is wasted per process
  – Not a big issue
  – Called internal fragmentation

Page 158: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

On-demand fetch (I)

• Most processes terminate without having accessed their whole address space
  – Code handling rare error conditions, …
• Other processes go through multiple phases during which they access different parts of their address space
  – Compilers

Page 159: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

On-demand fetch (II)

• VM systems do not fetch the whole address space of a process when it is brought into memory
• They fetch individual pages on demand, when they get accessed for the first time
  – Page miss or page fault
• When memory is full, they expel from memory pages that are not currently in use

Page 160: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

On-demand fetch (III)

• The pages of a process that are not in main memory reside on disk
  – In the executable file for the program being run, for the pages in the code segment
  – In a special swap area, for the data pages that were expelled from main memory

Page 161: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

On-demand fetch (IV)

[Figure: code pages come into main memory from the executable on disk; expelled data pages go to and from the swap area]

Page 162: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

On-demand fetch (V)

• When a process tries to access data that are not present in main memory
  – The MMU hardware detects that the page is missing and causes an interrupt
  – The interrupt wakes up the page fault handler
  – The page fault handler puts the process in the waiting state and brings the missing page into main memory

Page 163: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Advantages

• VM systems use main memory more efficiently than other memory management schemes
  – Give each process more or less what it needs
• Process sizes are not limited by the size of main memory
  – Greatly simplifies program organization

Page 164: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Sole disadvantage

• Bringing pages from disk is a relatively slow operation
  – Takes milliseconds, while memory accesses take nanoseconds
    • Ten thousand to a hundred thousand times slower

Page 165: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The cost of a page fault

• Let
  – Tm be the main memory access time
  – Td the disk access time
  – f the page fault rate
  – Ta the average access time of the VM

  Ta = (1 – f) Tm + f (Tm + Td) = Tm + f Td

Page 166: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Example

• Assume Tm = 50 ns and Td = 5 ms

f        Mean memory access time
10^-3    50 ns + 5 ms/10^3 = 5,050 ns
10^-4    50 ns + 5 ms/10^4 = 550 ns
10^-5    50 ns + 5 ms/10^5 = 100 ns
10^-6    50 ns + 5 ms/10^6 = 55 ns
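The table can be reproduced with a few lines of Python (not part of the original deck):

Tm = 50e-9   # 50 ns
Td = 5e-3    # 5 ms

for f in (1e-3, 1e-4, 1e-5, 1e-6):
    Ta = Tm + f * Td
    print(f"f = {f:.0e}: Ta = {Ta * 1e9:,.0f} ns")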

Page 167: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Conclusion

• Virtual memory works best when page fault rate is less than a page fault per 100,000 instructions

Page 168: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Locality principle (I)

• A process that would access its pages in a totally unpredictable fashion would perform very poorly in a VM system unless all its pages are in main memory

Page 169: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Locality principle (II)

• Process P randomly accesses a very large array consisting of n pages
• If m of these n pages are in main memory, the page fault frequency of the process will be (n – m)/n
• Must switch to another algorithm

Page 170: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Tuning considerations

• In order to achieve an acceptable performance, a VM system must ensure that each process has in main memory all the pages it is currently referencing
• When this is not the case, the system performance will quickly collapse

Page 171: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

First problem

• A virtual memory system has
  – 32-bit addresses
  – 8 KB pages
• What are the sizes of the
  – Page number field?
  – Offset field?

Page 172: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Solution (I)

• Step 1: Convert the page size to a power of 2
  8 KB = 2^__ B
• Step 2: The exponent is the length of the offset field

Page 173: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Solution (II)

• Step 3: Size of the page number field = address size – offset size
  Here 32 – __ = __ bits
• Highlight the text in the box to see the answers:

13 bits for the offset and 19 bits for the page number

Page 174: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

PAGE TABLE REPRESENTATION

Page 175: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Page table entries

• A page table entry (PTE) contains
  – A page frame number
  – Several special bits
• Assuming 32-bit addresses, all fit into four bytes

[PTE layout: page frame number | special bits]

Page 176: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The special bits (I)

• Valid bit: 1 if the page is in main memory, 0 otherwise
• Missing bit: 1 if the page is not in main memory, 0 otherwise
• The two serve the same function using opposite conventions

Page 177: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The special bits (II)

• Dirty bit: 1 if the page has been modified since it was brought into main memory, 0 otherwise
  – A dirty page must be saved in the process swap area on disk before being expelled from main memory
  – A clean page can be immediately expelled

Page 178: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The special bits (III)

• Page-referenced bit: 1 if the page has been recently accessed, 0 otherwise
  – Often simulated in software

Page 179: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Where to store page tables

• Use a three-level approach
• Store parts of the page table
  – In high-speed registers located in the MMU: the translation lookaside buffer (TLB) (good solution)
  – In main memory (bad solution)
  – On disk (ugly solution)

Page 180: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The translation look aside buffer

• A small high-speed memory
  – Contains a fixed number of PTEs
  – Content-addressable memory
• Entries include the page frame number and the page number

[TLB entry layout: page number | page frame number | special bits]

Page 181: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Realizations (I)

• TLB of the Intrinsity FastMATH
  – 32-bit addresses
  – 4 KB pages
  – Fully associative TLB with 16 entries
  – Each entry occupies 64 bits
    • 20 bits for the page number
    • 20 bits for the page frame number
    • Valid bit, dirty bit, …

Page 182: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Realizations (II)

• TLB of the UltraSPARC III
  – 64-bit addresses
    • The maximum program size is 2^44 bytes, that is, 16 TB
  – Supported page sizes are 4 KB, 16 KB, 64 KB, and 4 MB ("superpages")

Page 183: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Realizations (III)

• TLB of the UltraSPARC III
  – Dual direct-mapped (?) TLB
    • 64 entries for code pages
    • 64 entries for data pages
  – Each entry occupies 64 bits
    • Page number and page frame number
    • Context
    • Valid bit, dirty bit, …

Page 184: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The context (I)

• Conventional TLBs contain the PTEs of a specific address space
  – Must be flushed each time the OS switches from the current process to a new process
    • A frequent action in any modern OS
  – Introduces a significant time penalty

Page 185: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The context (II)

• The UltraSPARC III architecture adds to TLB entries a context identifying a specific address space
  – Page mappings from different address spaces can coexist in the TLB
  – A TLB hit now requires a match on both the page number and the context
  – Eliminates the need to flush the TLB

Page 186: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

TLB misses

• When a PTE cannot be found in the TLB, a TLB miss is said to occur
• TLB misses can be handled
  – By the computer firmware:
    • The cost of a miss is one extra memory access
  – By the OS kernel:
    • The cost of a miss is two context switches

Page 187: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Letting SW handle TLB misses

• As for other exceptions, must save the current value of the PC in the EPC register
• Must also assert the exception by the end of the clock cycle during which the memory access occurs
  – In MIPS, must prevent the WB cycle from occurring after the MEM cycle that generated the exception

Page 188: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Example

• Consider the instruction
  lw $1, 0($2)
  – If the translation of the address in $2 is not in the TLB, we must prevent any update of $1

Page 189: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Performance implications

• When TLB misses are handled by the firmware, they are very cheap
  – A TLB hit rate of 99% is very good: the average access cost will be
    Ta = 0.99×Tm + 0.01×2Tm = 1.01 Tm
• Less true if TLB misses are handled by the kernel

Page 190: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Storing the rest of the page table

• Page tables are too large to be stored whole in main memory
  – Will store the active part of the PT in main memory
  – Other entries on disk
• Three solutions
  – Linear page tables
  – Multilevel page tables
  – Hashed page tables

Page 191: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Storing the rest of the page table

• We will review these solutions even though page table organizations are an operating system topic

Page 192: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Linear page tables (I)

• Store the PT in virtual memory (the VMS solution)
• Very large page tables need more than 2 levels (3 levels on the MIPS R3000)

Page 193: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Linear page tables (II)

[Figure: linear page tables stored in virtual memory, with only parts of each PT resident in physical memory]

Page 194: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Linear page tables (III)

• Assuming a page size of 4 KB
  – Each page of virtual memory requires 4 bytes of physical memory (for its PTE)
  – Each PT maps 4 GB of virtual addresses
  – A PT will occupy 4 MB
  – Storing these 4 MB in virtual memory will require only 4 KB of physical memory (for the PTEs mapping the PT itself)

Page 195: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Multi-level page tables (I)

• The PT is divided into
  – A master index that always remains in main memory
  – Sub-indexes that can be expelled

Page 196: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Multi-level page tables (II)

[Figure: the virtual address is split into a primary index, a secondary index, and an offset; the primary index selects an entry of the master index, which points to a sub-index; the secondary index selects the frame number there, and the offset passes unchanged into the physical address]

Page 197: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Multi-level page tables (III)

• Especially suited to a page size of 4 KB and 32-bit virtual addresses
• Will allocate
  – 10 bits of the address for the first level,
  – 10 bits for the second level, and
  – 12 bits for the offset
• The master index and the sub-indexes will all have 2^10 entries and occupy 4 KB
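A small Python sketch of that bit slicing, assuming the 10/10/12 split above; the dict-based master index and sub-indexes are illustrative stand-ins for the in-memory tables (not part of the original deck):

def split(vaddr):
    """Split a 32-bit virtual address into (primary, secondary, offset)."""
    offset = vaddr & 0xFFF              # low 12 bits
    secondary = (vaddr >> 12) & 0x3FF   # next 10 bits
    primary = vaddr >> 22               # top 10 bits
    return primary, secondary, offset

def translate(vaddr, master_index):
    p1, p2, offset = split(vaddr)
    sub_index = master_index[p1]        # a missing sub-index would fault
    frame = sub_index[p2]
    return (frame << 12) | offset

master = {0: {1: 7}}                    # hypothetical mapping for the demo
print(hex(translate(0x1ABC, master)))   # 0x7abc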

Page 198: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Hashed page tables (I)

• Only contain the pages that are in main memory
  – PTs are much smaller
• Also known as inverted page tables

Page 199: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Hashed page table (II)

[Figure: the page number (PN) is hashed; the matching table entry holds the PN and the corresponding page frame number (PFN)]

Page 200: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Selecting the right page size

• Increasing the page size
  – Increases the length of the offset
  – Decreases the length of the page number
  – Reduces the size of page tables
    • Fewer entries
  – Increases internal fragmentation
• 4 KB seems to be a good choice

Page 201: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

MEMORY PROTECTION

Page 202: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Objective

• Unless we have an isolated single-user system, we must prevent users from
  – Accessing
  – Deleting
  – Modifying
  the address spaces of other processes, including the kernel

Page 203: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Historical considerations

• Earlier operating systems for personal computers did not have any protection
  – They were single-user machines
  – They typically ran one program at a time
• Windows 2000, Windows XP, Vista and Mac OS X are protected

Page 204: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Memory protection (I)

• VM ensures that processes cannot access page frames that are not referenced in their page table
• Can refine this control by distinguishing among
  – Read access
  – Write access
  – Execute access
• Must also prevent processes from modifying their own page tables

Page 205: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Dual-mode CPU

• Requires a dual-mode CPU
• Two CPU modes
  – A privileged mode, or executive mode, that allows the CPU to execute all instructions
  – A user mode that allows the CPU to execute only safe, unprivileged instructions
• The state of the CPU is determined by a special bit

Page 206: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Switching between states

• User mode will be the default mode for all programs
  – Only the kernel can run in supervisor mode
• Switching from user mode to supervisor mode is done through an interrupt
  – Safe, because the jump address is at a well-defined location in main memory

Page 207: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Memory protection (II)

• Has additional advantages:
  – Prevents programs from corrupting the address spaces of other programs
  – Prevents programs from crashing the kernel
    • Not true for device drivers, which are inside the kernel
• A required part of any multiprogramming system

Page 208: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

INTEGRATING CACHES AND VM

Page 209: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The problem

• In a VM system, each byte of memory has two addresses
  – A virtual address
  – A physical address
• Should cache tags contain virtual addresses or physical addresses?

Page 210: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Discussion

• Using virtual addresses
  – Directly available
  – Bypass the TLB
  – Cache entries are specific to a given address space
  – Must flush the caches when the OS selects another process
• Using physical addresses
  – Must access the TLB first
  – Cache entries are not specific to a given address space
  – Do not have to flush the caches when the OS selects another process

Page 211: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

The best solution

• Let the cache use physical addresses
  – No need to flush the cache at each context switch
  – The TLB access delay is tolerable

Page 212: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Processing a memory access (I)

if virtual_address in TLB:
    get physical_address
else:
    create TLB miss exception
    break

I use Python because it is very compact: hetland.org/writing/instant-python.html

Page 213: THE MEMORY HIERARCHY Jehan-François Pâris jfparis@uh.edu

Processing a memory access (II)

if read_access:
    while data not in cache:
        stall
    deliver data to CPU
else:  # write_access
    …  # continues on next page


Processing a memory access (III)

if write_access_OK:
    while data not in cache:
        stall
    write data into cache
    update dirty bit
    put data and address in write buffer
else:
    # illegal access
    create TLB miss exception
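
Putting the three fragments together, here is a minimal runnable sketch of the same flow. The dict-based TLB, cache and write buffer, the 4KB page size, and the read_from_ram stub are all assumptions made to keep the example self-contained; real hardware does this in parallel.

def read_from_ram(address):   # stub standing in for a slow RAM access
    return 0

tlb = {}             # virtual page number -> (frame number, write_ok bit)
cache = {}           # physical address -> data
write_buffer = []    # FIFO queue of (address, data) write-throughs

class TLBMiss(Exception):
    pass

def access(virtual_address, data=None, write=False):
    page, offset = divmod(virtual_address, 4096)     # assumed 4KB pages
    if page not in tlb:
        raise TLBMiss(page)       # the OS refills the TLB from the page table
    frame, write_ok = tlb[page]
    physical_address = frame * 4096 + offset
    if not write:
        if physical_address not in cache:            # "stall" until present
            cache[physical_address] = read_from_ram(physical_address)
        return cache[physical_address]               # deliver data to CPU
    if not write_ok:
        raise TLBMiss(page)       # illegal access, as in the slide above
    cache[physical_address] = data                   # write data into cache
    write_buffer.append((physical_address, data))    # queue the write-through

tlb[0] = (5, True)                # map page 0 to frame 5, writable
access(12, data=42, write=True)
print(access(12))                 # 42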


More Problems (I)

• A virtual memory system has a virtual address space of 4 Gigabytes and a page size of 4 Kilobytes. Each page table entry occupies 4 bytes.


More Problems (II)

• How many bits are used for the byte offset?
• Since 4K = 2___, the byte offset will use ___ bits.

Since 4KB = 2^12 bytes, the byte offset uses 12 bits


More Problems (III)

• How many bits are used for the page number?
• Since 4G = 2___, we will have ___-bit virtual addresses. Since the byte offset occupies ___ of these ___ bits, ___ bits are left for the page number.

The page number uses 20 bits of the address


More Problems (IV)

• What is the maximum number of page table entries in a page table?
• Address space / Page size = 2___ / 2___ = 2___ PTEs

2^20 page table entries
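
The arithmetic can be checked directly in Python (the 4 MB total uses the 4-byte PTE size given above):

print((2 ** 32) // (2 ** 12))   # 1048576 = 2^20 page table entries
print((2 ** 20) * 4)            # 4194304 bytes, i.e. 4 MB of page table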


More problems (V)

• A computer has 32 bit addresses and a page size of one kilobyte.

• How many bits are used to represent the page number?
  ___ bits
• What is the maximum number of entries in a process page table?
  2___ entries


Answer

• As 1KB = 2^10 bytes, the byte offset occupies 10 bits
• The page number uses the remaining 22 bits of the address


Some review questions

• Why are TLB entries 64 bits wide while page table entries only require 32 bits?

• What would be the main disadvantage of a virtual memory system lacking a dirty bit?

• What is the big limitation of VM systems that cannot prevent processes from executing the contents of any arbitrary page in their address space?


Answers

• We need extra space for storing the virtual page number: a page table locates an entry by its position, but a TLB must store the page number explicitly as a tag

• It would have to write back to disk all pages that it expels, even when they were not modified

• It would make the system less secure: a process could be tricked into executing code injected into its data pages


VIRTUAL MACHINES


Key idea

• Let different operating systems run at the same time on a single computer
  – Windows, Linux and Mac OS
  – A real-time OS and a conventional OS
  – A production OS and a new OS being tested


How it is done

• A hypervisor (also called a VM monitor) defines two or more virtual machines

• Each virtual machine has
  – Its own virtual CPU
  – Its own virtual physical memory
  – Its own virtual disk(s)


The virtualization process

[Diagram: the hypervisor maps the actual hardware (CPU, memory, disk) onto two or more sets of virtual hardware, each with its own virtual CPU, memory and disk.]


Reminder

• In a conventional OS,
  – Kernel executes in privileged/supervisor mode
    • Can do virtually everything
  – User processes execute in user mode
    • Cannot modify their page tables
    • Cannot execute privileged instructions


[Diagram: user processes run in user mode; the kernel runs in privileged mode; a system call switches the CPU from user mode to privileged mode.]


Two virtual machines

[Diagram: with two virtual machines, only the hypervisor runs in privileged mode; both VM kernels and all user processes run in user mode.]


Explanations (II)

• Whenever the kernel of a VM issues a privileged instruction, an interrupt occurs
  – The hypervisor takes control and does the physical equivalent of what the VM attempted to do:
    • Must convert virtual RAM addresses into physical RAM addresses
    • Must convert virtual disk block addresses into physical block addresses


Translating a block address

[Diagram: the VM kernel asks to access block x, y of its virtual disk; the hypervisor answers "that's block v, w of the actual disk" and issues an access to block v, w of the actual disk.]
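
A minimal sketch of that translation step, assuming the hypervisor keeps a simple per-VM mapping table; the dict, the block-address pairs and every number are made up for illustration:

block_map = {('vm1', (7, 3)): (42, 9)}   # virtual block (x, y) -> actual block (v, w)

def translate_block(vm, virtual_block):
    # The hypervisor turns a VM's virtual disk block into an actual disk block
    return block_map[(vm, virtual_block)]

print(translate_block('vm1', (7, 3)))    # (42, 9): block v, w of the actual disk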


Handling I/Os

• Difficult task because
  – Wide variety of devices
  – Some devices may be shared among several VMs
    • Printers
    • Shared disk partition
      – Want to let Linux and Windows access the same files


Virtual Memory Issues

• Each VM kernel manages its own memory
  – Its page tables map program virtual addresses into what it believes to be physical addresses


The dilemma

[Diagram: the VM kernel reports that page 735 of process A is stored in page frame 435; the hypervisor knows that this is really page frame 993 of the actual RAM.]


The solution (I)

• Address translation must remain fast!
  – The hypervisor lets each VM kernel manage its own page tables but does not use them
    • They contain bogus mappings!
  – It maintains instead its own shadow page tables with the correct mappings
    • Used to handle TLB misses


Why it works

• Most memory accesses go through the TLB

• The system can tolerate slower page table updates


The solution (II)

• To keep its shadow page tables up to date, the hypervisor must track any changes made by the VM kernels
• Mark the page tables read-only
  – Each attempt by a VM kernel to update them results in an interrupt
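
A toy sketch of the whole mechanism, with every structure assumed for illustration: the guest's page-table write traps (because its table is read-only) and the hypervisor mirrors the update into the shadow table after translating the frame number.

guest_page_table = {}    # guest virtual page -> guest "physical" frame (bogus)
frame_map = {435: 993}   # guest frame -> actual RAM frame; the hypervisor's secret
shadow_page_table = {}   # guest virtual page -> actual frame; feeds the TLB

def on_guest_pte_write(page, guest_frame):
    # The guest's table is read-only, so its write traps here; the hypervisor
    # applies the update and mirrors it with the correct mapping
    guest_page_table[page] = guest_frame
    shadow_page_table[page] = frame_map[guest_frame]

on_guest_pte_write(735, 435)     # the numbers from "The dilemma" above
print(shadow_page_table[735])    # 993: the frame the TLB will really use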


Nastiest Issue

• The whole VM approach assumes that a kernel executing in user mode will behave exactly like a kernel executing in privileged mode, except that privileged instructions will be trapped

• Not true for all architectures!
  – Intel x86 Pop flags (POPF) instruction
  – …


POPF instruction

• Pops the top of the stack into the lower 16 bits of EFLAGS
  – Designed for a 16-bit architecture
• EFLAGS contains the interrupt enable flag (IE)
• When executed in privileged mode, POPF updates all flags
• When executed in user mode, POPF updates all flags but the IE flag
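
A toy model of why this breaks trap-and-emulate: executed in user mode, POPF neither traps nor changes IE, so a guest kernel never learns that its attempt to disable interrupts was ignored. The flags layout below is drastically simplified and assumed.

IE = 0x0200    # interrupt-enable bit (position assumed for this toy model)

def popf(stack, flags, privileged):
    value = stack.pop()
    if privileged:
        return value                       # all flags updated, including IE
    return (value & ~IE) | (flags & IE)    # user mode: IE silently preserved

# A guest kernel in user mode tries to clear IE (disable interrupts)...
print(hex(popf([0x0000], flags=IE, privileged=False)))   # 0x200: IE still set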


Solutions

1. Modify the instruction set and eliminate instructions like POPF
   • IBM redesigned the instruction set of their 360 series for the 370 series
2. Mask it through clever software
   • Dynamic "binary translation" when direct execution of code could not work (VMware)


Other Approaches (I)

• Can use the VM approach to let binaries written in a specific machine language run on a machine with a different instruction set
• Called emulators
• Have a huge performance penalty
  – Still work fairly well when the target machine is much faster than the original architecture
  – Let us run very old binaries


Other Approaches (II)

• Can use the VM approach to let programs written in an arbitrary low-level language run on many different architectures
• Java virtual machine (JVM)
  – Ported to many architectures
  – Allows execution of programs written in "bytecode"
  – Professes to be inherently safe


CP/CMS (I)

• IBM was the dominant computer manufacturer during the 60's and the 70's
  – Machines were designed for batch processing
  – Lacked any decent time-sharing OS
• A time-sharing OS was wanted by universities
  – TSS/360 was not a great success


CP/CMS (II)

• IBM Cambridge Scientific Center
  – In Cambridge, MA
  – Developed a combination of
    • A Control Program (CP) supporting virtual machines
    • A time-sharing OS (CMS) for a single user
• Was a great success!


CP/CMS (III)

• How it worked

[Diagram: four copies of CMS, each running on its own VM, on top of CP (the hypervisor).]


CACHE CONSISTENCY


The problem

• Specific to architectures with
  – Several processors sharing the same main memory
  – Multicore architectures
• Each core/processor has its own private cache
  – Needed for performance
• Problems occur when the same data are present in two or more private caches


An example (I)

[Diagram: two CPUs, each with a private cache holding x = 0, share the same RAM.]


An example (II)

[Diagram: the first CPU increments x, so its cache now holds x = 1; the second CPU still assumes x = 0.]


Our Objective

• Single copy serializability
  – All operations on all the variables should have the same effect as if they were executed
    • in sequence, with
    • a single copy of each variable


One-copy serializability rules

1. Whenever a process accesses a variable, it always gets the value stored by the processor that updated that variable last
2. A processor accessing a variable sees all updates applied to that variable in the same order
   – The exact order does not matter as long as everybody agrees on it


An example

[Diagram: one CPU sets x to 1 while another resets x to 0; the two other CPUs, whose caches hold x = ?, must apply the two updates in the same order.]


Big problem

• When a processor updates a cached variable, the new value of the variable is not immediately written into the main memory
  – Perfect one-copy serializability is not feasible


New rules

1. Whenever a process accesses a variable, it always gets the value stored by the processor that updated that variable last, if the updates are sufficiently separated in time
2. A processor accessing a variable sees all updates applied to that variable in the same order
   – No compromise is possible here


A remark

• Data consistency issues appear in many disguises
  – Cache consistency
  – Distributed shared memory
    • work done in the early to mid 90's
  – Distributed file systems
  – Distributed databases


An example (I)

• UNIX workstations use a distributed file system called NFS (Network File System)
• An NFS installation comprises
  – client workstations
  – a centralized server
• NFS allows client workstations to cache the contents of the files they access
• What happens when two workstations access the same file?


An example (II)

[Diagram: workstations A and B each cache a modified copy (x' and x'') of the same file x stored on the server: inconsistent updates.]


Possible Approaches (I)

• Always keep a single copy:
  – Guarantees one-copy serializability
  – Would make the system too slow
    • No caching!
• Prevent shared access:
  – Guarantees one-copy serializability
  – Would be very slow and complicated


Possible Approaches (II)

• Replicate and update:
  – Allows multiple processors to cache variables already cached by other processors
  – Whenever a processor updates a cached variable, it propagates the update to all other caches holding a copy of the variable
  – Costly because processors tend to repeatedly update the same variable
    • Temporal locality of accesses


Possible Approaches (III)

• Replicate and invalidate:
  – Allows multiple processors to cache variables already cached by other processors
  – Whenever a processor updates a cached variable, we invalidate all other cached copies of the variable
    • Works well with write-through caches
      – Will get the correct value later from RAM


A realization: Snoopy caches

• All caches are linked to the main memory through a shared bus
  – All caches observe the writes performed by other caches
• When a cache notices that another cache performs a write on a memory location that it has in its cache, it invalidates the corresponding cache block
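
A compact sketch of the snooping idea; the Bus and Cache classes and their methods are invented for illustration, but the invalidate-on-observed-write behavior is the one described above:

class Bus:                                   # shared bus: every cache sees all writes
    def __init__(self):
        self.caches, self.ram = [], {}
    def write_through(self, source, address, value):
        self.ram[address] = value
        for cache in self.caches:
            if cache is not source:          # snoop: invalidate other copies
                cache.data.pop(address, None)

class Cache:
    def __init__(self, bus):
        self.data, self.bus = {}, bus
        bus.caches.append(self)
    def read(self, address):
        if address not in self.data:         # miss: fetch from RAM
            self.data[address] = self.bus.ram.get(address, 0)
        return self.data[address]
    def write(self, address, value):
        self.data[address] = value
        self.bus.write_through(self, address, value)

bus = Bus()
c1, c2 = Cache(bus), Cache(bus)
bus.ram['x'] = 2
c1.read('x'); c2.read('x')    # both caches hold x = 2
c1.write('x', 0)              # write-through; c2 invalidates its copy of x
print(c2.read('x'))           # 0: fetched again from RAM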


An example (I)

[Diagram: the first CPU's cache holds x = 2; the second CPU fetches x = 2 from RAM, which also holds x = 2.]


An example (II)

[Diagram: both caches have now fetched x, so both hold x = 2; RAM still holds x = 2.]


An example (III)

[Diagram: the first CPU resets x to 0 in its cache; the second cache and RAM still hold x = 2.]


An example (IV)

[Diagram: the first cache performs a write-through of x = 0 to RAM; the second cache detects the write-through and invalidates its copy of x.]


An example (V)

[Diagram: when the second CPU next wants to access x, its cache gets the correct value, x = 0, from RAM.]


A last correctness condition

• Caches cannot reorder their memory updates
  – The cache-to-RAM write buffer must be FIFO
    • First in, first out


Example

• A CPU performs
  – x = 0;
  – x++;  // sets x to 1
• The final value of x in the CPU cache is 1
• If the write buffer reorders the write-through requests, the final value of x in RAM (and in the other caches) will be 0
  – Ouch!
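
The same scenario in a few lines of Python, with an assumed deque-based write buffer standing in for the hardware:

from collections import deque

write_buffer = deque([('x', 0), ('x', 1)])   # the CPU performed x = 0, then x++
ram = {}
while write_buffer:
    address, value = write_buffer.popleft()  # FIFO drain: RAM ends with x = 1
    ram[address] = value
print(ram['x'])                              # 1, matching the CPU's cache
# A buffer that reordered the two writes would leave x = 0 in RAM. Ouch!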


Miscellaneous fallacies (I)

• Segmented address spaces

– Address is segment number + offset in segment

– Supposed to let programmers organize their address space into meaningful segments

– Programmers—and compilers—hate them


Miscellaneous fallacies (II)

• Ignoring virtual memory behavior when accessing large two-dimensional arrays

– Must access array in a way that minimizes number of page faults

– Done by all good mathematical software libraries
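
The classic illustration, sketched in Python; the point is the loop order, which decides whether consecutive accesses stay on the same page of a row-major array (Python's nested lists are not truly contiguous, so treat this as a stand-in for a C or NumPy array):

N = 1024
a = [[0] * N for _ in range(N)]

for i in range(N):        # row-major order: walks each row sequentially,
    for j in range(N):    # touching each page of the array only once
        a[i][j] += 1

for j in range(N):        # column-major order on the same array: each step
    for i in range(N):    # jumps a whole row and may fault on every access
        a[i][j] += 1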


Miscellaneous fallacies (III)

• Believing that you can virtualize any CPU architecture
  – Some are much more difficult than others


Concluding remarks

• As before, we have seen how human ingenuity has worked around hardware limitations
  – Cannot increase CPU speed above 3 to 4 GHz
    • Pipelining, multicore architectures
  – RAM is slower than the CPU
    • Caches
  – Hard disks are much slower than RAM
    • Virtual memory, I/O buffering