Chapter 12 – Memory Organization
Section 12.1 – Memory Hierarchy
The memory unit that communicates directly with the CPU is called the main memory; the programs and data currently needed by the processor reside there.
Devices that provide backup storage are called auxiliary memory; all other information is stored there and transferred to main memory when needed.
The goal of using a memory hierarchy is to obtain the highest possible access speed while minimizing the total cost of the entire memory system.
Auxiliary vs. Main Memory

Auxiliary Memory:
- Provides backup storage
- Holds data files, system programs, etc. not currently needed
- Low cost, large capacity
- No direct access by CPU

Main Memory:
- Holds programs and data currently needed
- Higher cost, smaller capacity
- Direct access by CPU
- Faster access, but still not fast enough…
Cache Memory:
- Very high-speed memory of smaller capacity and higher cost
- Access time is close to the processor logic clock cycle time
- Stores program segments currently being executed and data frequently accessed
- Increases the performance of the computer
Section 12.2 – Main Memory
RAM and ROM Chips:
(Figures: a typical RAM chip and a typical ROM chip)
Memory Address Map: The addressing of memory can be established by means of a table that specifies the memory address assigned to each chip.
Example: 512 bytes of RAM and 512 bytes of ROM
Memory Connection to CPU: RAM and ROM chips are connected to the CPU through the data and address buses. The low-order lines in the address bus select the byte within each chip, and the other lines in the address bus select a particular chip through its chip-select inputs.
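A minimal sketch of this chip-select decoding in Python. The bit assignments below are an assumption, but they follow the common map for this example: four 128-byte RAM chips at addresses 0–511 and one 512-byte ROM at 512–1023.

```python
def decode(addr):
    # Hypothetical 10-bit address decoder for a 4 x 128-byte RAM
    # plus 512-byte ROM memory map.
    assert 0 <= addr < 1024
    if addr & 0x200:                  # bit 9 set -> ROM region (512-1023)
        return ('ROM', addr & 0x1FF)  # low 9 bits select the byte in ROM
    chip = (addr >> 7) & 0b11         # bits 7-8 select one of four RAM chips
    return (f'RAM{chip}', addr & 0x7F)  # low 7 bits select the byte in chip

print(decode(0))     # ('RAM0', 0)
print(decode(300))   # ('RAM2', 44)
print(decode(700))   # ('ROM', 188)
```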
Section 12.4 – Associative Memory
The time required to find an item in memory can be reduced considerably if stored data can be identified for access by the content of the data itself rather than by an address
A memory unit accessed by content is called an associative memory or content addressable memory (CAM)
When a word is to be read from an associative memory, the content of the word, or part of the word, is specified.
It is more expensive than RAM as each cell must have storage capability as well as logic circuits for matching its content with an external argument
Each word in the CAM is compared in parallel with the content of A (the argument register); if CAM Word[i] = A, the match bit M(i) is set to 1. Matched words are then read in sequence by accessing each CAM word whose M(i) = 1.
K (the key register) provides a mask for choosing a particular field or key in the argument register A: only those bits of the argument that have 1's in the corresponding positions of K are compared.
Example:
A       101 111100
K       111 000000
Word 1  100 111100   no match
Word 2  101 000001   match
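The masked comparison above can be sketched in a few lines (bit widths follow the 9-bit example; a software loop stands in for the hardware's parallel match logic):

```python
def cam_search(words, argument, key):
    # Only bit positions where the key register K holds a 1 take part
    # in the comparison; the hardware does this for every word at once.
    return [i for i, w in enumerate(words)
            if (w & key) == (argument & key)]

A = 0b101111100          # argument register
K = 0b111000000          # key register: compare only the leftmost 3 bits
words = [0b100111100,    # word 1
         0b101000001]    # word 2
print(cam_search(words, A, K))  # [1] -> only word 2 matches
```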
Section 12.5 – Cache Memory
The references to memory at any given interval of time tend to be confined within a few localized areas in memory. This phenomenon is known as locality of reference.
Temporal Locality: Information that will be used in the near future is likely to be in use already (e.g., reuse of information in loops)
Spatial Locality: If a word is accessed, adjacent (nearby) words are likely to be accessed soon (e.g., related data items such as arrays are usually stored together; instructions are executed sequentially)
Cache:
- The property of locality of reference is what makes cache memory systems work
- Cache is a fast, small-capacity memory that should hold the information most likely to be accessed
- All memory accesses are directed first to the cache
- If the word is in cache, the cache provides it to the CPU
- If the word is not in cache, a block (or line) including that word is brought in, replacing a block now in cache
Performance of Cache Memory System:
Hit ratio (h): fraction of memory accesses satisfied by the cache
Te: effective memory access time in the cache memory system
Tc: cache access time
Tm: main memory access time

Te = Tc + (1 - h) Tm

Example: Tc = 0.4 ns, Tm = 1.2 ns, h = 0.85 (85%)
Te = 0.4 + (1 - 0.85) × 1.2 = 0.58 ns
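The example above, reproduced as a one-line calculation:

```python
def effective_access_time(tc, tm, h):
    # A hit costs Tc; a miss (probability 1 - h) additionally pays Tm.
    return tc + (1 - h) * tm

te = effective_access_time(0.4, 1.2, 0.85)
print(round(te, 2))  # 0.58 (ns), matching the worked example
```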
The transformation of data from main memory to cache memory is referred to as a mapping process
1. Associative mapping
2. Direct mapping
3. Set-associative mapping
Associative Mapping:
- Any block location in cache can store any block in memory
- Most flexible and fast, but very expensive
- The mapping table is implemented in an associative memory and stores both the address and the content of the memory word
- If the address is found, the 12-bit data is read and sent to the CPU; if not, main memory is accessed for the word and the address-data pair is transferred to the associative cache memory
- If the cache is full, an address-data pair is displaced according to a replacement policy (e.g., round-robin, FIFO) to make room for the pair needed
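A toy model of the scheme above: fully associative lookup keyed by the whole address, with FIFO displacement. The class name and cache size are illustrative, not from the text.

```python
from collections import OrderedDict

class AssociativeCache:
    """Any address-data pair may occupy any slot (fully associative)."""
    def __init__(self, size):
        self.size = size
        self.table = OrderedDict()   # address -> data, in insertion order

    def read(self, addr, memory):
        if addr in self.table:       # hardware matches all entries at once
            return self.table[addr], 'hit'
        if len(self.table) == self.size:
            self.table.popitem(last=False)   # FIFO: displace oldest pair
        data = memory[addr]
        self.table[addr] = data
        return data, 'miss'
```

Note that FIFO displaces by insertion order, ignoring how recently an entry was re-read; a round-robin counter would behave the same way here.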
Direct Mapping:
- Each memory block has only one place it can load into cache
- The mapping table is made of RAM instead of CAM
- An n-bit memory address consists of two parts: k bits of index field and n - k bits of tag field
- n-bit addresses are used to access main memory, and the k-bit index is used to access the cache
- When the CPU generates a memory request, the index field is used as the address to access the cache, and the tag field is compared with the tag in the word read from the cache
- If the two tags match, there is a hit and the desired data word is in cache
- If there is no match, there is a miss: the required word is read from main memory and stored in cache together with the new tag, replacing the old value
- The disadvantage is that the hit ratio drops if two or more words with the same index but different tags are accessed repeatedly (minimized by the fact that such words are far apart: 512 locations)
- The index field is now divided into two parts: the block field and the word field
- In a 512-word cache there are 64 blocks of 8 words each; the tag field stored within the cache is common to all eight words of the same block
- Every time a miss occurs, an entire block of eight words is transferred from main memory to cache memory; this takes extra time, but the hit ratio will most likely improve with a larger block size because of the sequential nature of programs
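A sketch of the direct-mapped tag/index split, using the 15-bit address and 512-word cache of this example (the dictionary stands in for the RAM mapping table; data values are illustrative):

```python
TAG_BITS, INDEX_BITS = 6, 9       # 15-bit address = 6-bit tag + 9-bit index

def split(addr):
    index = addr & ((1 << INDEX_BITS) - 1)   # low 9 bits pick the cache word
    tag = addr >> INDEX_BITS                 # high 6 bits stored for checking
    return tag, index

cache = {}                        # index -> (tag, data): one entry per index

def read(addr, memory):
    tag, index = split(addr)
    if index in cache and cache[index][0] == tag:
        return cache[index][1], 'hit'
    data = memory[addr]
    cache[index] = (tag, data)    # replaces any word sharing this index
    return data, 'miss'
```

Two addresses that share an index but differ in tag (e.g. 5 and 1029 here) evict each other on every access, which is exactly the hit-ratio drawback noted above.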
Set-Associative Mapping:
- Each word of cache can store two or more words of memory under the same index address
- The number of tag-data items in one word of cache is said to form a set
- With two tag-data items per cache word (a 6-bit tag and 12-bit data each), the word length is 2 × (6 + 12) = 36 bits; the index address is 9 bits, and the cache can accommodate 1024 words of main memory
- The index value of the address is used to access the cache, and the tag field of the CPU address is compared with both tags; the comparison logic is an associative search of the tags in the set, similar to an associative memory (thus the name set-associative)
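The two-way organization above, as a sketch (the FIFO choice within a set is an assumption; the text does not fix the within-set replacement rule):

```python
INDEX_BITS, WAYS = 9, 2              # 512 sets, two tag-data pairs per set

sets = {i: [] for i in range(1 << INDEX_BITS)}

def read(addr, memory):
    index = addr & ((1 << INDEX_BITS) - 1)
    tag = addr >> INDEX_BITS
    for t, data in sets[index]:      # associative search within the set
        if t == tag:
            return data, 'hit'
    data = memory[addr]
    if len(sets[index]) == WAYS:
        sets[index].pop(0)           # simple FIFO within the set (assumed)
    sets[index].append((tag, data))
    return data, 'miss'
```

Unlike the direct-mapped case, two addresses with the same index and different tags (5 and 1029 again) can now reside in the cache at the same time.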
Writing into Cache:
Write-Through method:
- Main memory is updated with every memory write operation, with cache memory updated in parallel if it contains the word at the specified address
- Memory is always up to date -> important when the CPU and an I/O device communicating through DMA are both executing
- Slow, due to the main memory access time on every write

Write-Back (Copy-Back) method:
- The cache location is updated during a write operation, and the location is marked by a flag so that later, when the word is removed from cache, it is copied into main memory
- Memory is not always up to date, i.e., the same item in cache and memory may have different values
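The two policies can be contrasted in a few lines; `dirty` plays the role of the flag that marks modified cache words (names are illustrative):

```python
def write_through(cache, memory, addr, value):
    # Main memory is updated on every write; cache in parallel if present.
    memory[addr] = value
    if addr in cache:
        cache[addr] = value

def write_back(cache, memory, dirty, addr, value):
    # Only the cache copy is updated; the dirty flag defers the memory write.
    cache[addr] = value
    dirty.add(addr)

def evict(cache, memory, dirty, addr):
    # When the word is removed from cache, a dirty copy goes back to memory.
    if addr in dirty:
        memory[addr] = cache[addr]
        dirty.discard(addr)
    del cache[addr]
```

Between a `write_back` and the eviction, cache and memory hold different values for the same word, which is exactly the inconsistency window noted above.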
Section 12.6 – Virtual Memory
Virtual memory gives the programmer the illusion that the system has a very large memory (equal to the totality of auxiliary memory), even though the computer actually has a relatively small main memory.
Address Space (Logical) and Memory Space (Physical):
An address used by a programmer is called a virtual address, and the set of such addresses is the address space (N)
An address in main memory is called a location or physical address, and the set of such locations is the memory space (M)
Portions of programs and data need not be in contiguous locations in memory since information is being moved in and out, and empty spaces may be available in scattered locations in memory
The address field of the instruction code has sufficient number of bits to specify all virtual addresses
A table is needed to map a virtual address of 20 bits to a physical address of 15 bits
The mapping is a dynamic operation: every address is translated immediately as a word is referenced by the CPU
Address Mapping Using Pages:
Memory space is divided into groups of equal size called blocks and address space is divided into groups of equal size called pages
The mapping from address space to memory space is facilitated if each virtual address is considered to be represented by two numbers: a page number address and a line within a page.
In a computer with 2p words per page, p bits specify the line address and the remaining high-order bits specify the page number
In Fig. 12-18 a virtual address has 13 bits. Since each page consists of 1K = 1024 words, the high-order three bits will specify one of the eight pages and the low-order 10 bits give the line address within the page
The line address in address space and memory space is the same, and the only mapping required is from a page number to a block number
The address in the page table denotes the page number and the content of the word gives the block number where that page is stored in main memory
In Fig. 12-19 the pages 1, 2, 5, and 6 are now available in main memory in blocks 3, 0, 1, and 2, respectively
The presence bit indicates whether the page has been transferred from auxiliary memory to main memory
The CPU references a word in memory with a virtual address of 13 bits
If the presence bit in the word read from the page table is 0, the content of the word referenced by the virtual address does not reside in main memory; a call to the operating system is generated to fetch the required page from auxiliary memory and place it into main memory
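A sketch of this translation using the Fig. 12-19 contents (pages 1, 2, 5, 6 in blocks 3, 0, 1, 2); `None` models a presence bit of 0, and the exception stands in for the call to the operating system:

```python
PAGE_BITS = 10   # 1K words/page; 13-bit virtual address = 3-bit page + line

# Page table per Fig. 12-19; None means the presence bit is 0.
page_table = {0: None, 1: 3, 2: 0, 3: None, 4: None, 5: 1, 6: 2, 7: None}

def translate(vaddr):
    page = vaddr >> PAGE_BITS                 # high-order 3 bits
    line = vaddr & ((1 << PAGE_BITS) - 1)     # low-order 10 bits, unchanged
    block = page_table[page]
    if block is None:                         # presence bit = 0
        raise LookupError(f'page fault: OS must load page {page}')
    return (block << PAGE_BITS) | line        # block number replaces page
```

The line address passes through untouched; only the page number is replaced by a block number, as the text describes.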
Associative Memory Page Table:
- A RAM page table is inefficient with respect to storage utilization
- In Fig. 12-19, eight words of memory are needed, one for each page, but only four blocks are available in main memory
- For example, with an address space of 1024K words and a memory space of 32K words, the number of pages is 1024 and the number of blocks is 32; therefore, at any given time at most 32 locations have a presence bit equal to 1, and at least 992 locations are empty and not in use
Efficient way is to organize the page table with number of words equal to the number of blocks in main memory
Implemented by means of an associative memory with each word in memory containing a page number together with its corresponding block number
Page Replacement:
Page fault is a condition that occurs when a program attempts to reference a page that is still in auxiliary memory
When a page fault occurs, the execution of the present program is suspended until the required page is brought into main memory
If the main memory is full, it would be necessary to remove a page from a memory block to make room for the new page according to a page replacement policy
The FIFO algorithm selects for replacement the page that has been in memory the longest time. Using a queue - every time a page is loaded, its identification number is inserted in the queue. Easy to implement as the identification number of the page to be removed can be accessed first
The OPT (Optimal Replacement) algorithm replaces that page which will not be used for the longest period of time. Lowest page fault rate of all the algorithms, but difficult to implement.
FIFO example (3 page frames):
Reference string: 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1
Page frames after each fault:
7 | 7 0 | 7 0 1 | 2 0 1 | 2 3 1 | 2 3 0 | 4 3 0 | 4 2 0 | 4 2 3 | 0 2 3 | 0 1 3 | 0 1 2 | 7 1 2 | 7 0 2 | 7 0 1
Total: 15 page faults
OPT example (3 page frames):
Reference string: 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1
Page frames after each fault:
7 | 7 0 | 7 0 1 | 2 0 1 | 2 0 3 | 2 4 3 | 2 0 3 | 2 0 1 | 7 0 1
Total: 9 page faults
The LRU algorithm replaces the page which has not been used for the longest period of time. It is implemented by associating a counter with every page that is in main memory. At fixed intervals of time, the counters associated with all pages presently in main memory are incremented by 1, and when a page is referenced, its associated counter is set to zero. The least recently used page is the page with the highest count.
LRU example (3 page frames):
Reference string: 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1
Page frames after each fault:
7 | 7 0 | 7 0 1 | 2 0 1 | 2 0 3 | 4 0 3 | 4 0 2 | 4 3 2 | 0 3 2 | 1 3 2 | 1 0 2 | 1 0 7
Total: 12 page faults
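The three policies can be checked against the reference string with a short simulation. This is a sketch: the LRU version uses an ordered dictionary rather than the per-page counters described above, but the two are equivalent in which page they evict.

```python
from collections import OrderedDict

def fifo_faults(refs, frames):
    queue, resident, faults = [], set(), 0
    for p in refs:
        if p in resident:
            continue
        faults += 1
        if len(resident) == frames:
            resident.remove(queue.pop(0))   # page resident longest leaves
        queue.append(p)
        resident.add(p)
    return faults

def lru_faults(refs, frames):
    cache, faults = OrderedDict(), 0
    for p in refs:
        if p in cache:
            cache.move_to_end(p)            # p becomes most recently used
            continue
        faults += 1
        if len(cache) == frames:
            cache.popitem(last=False)       # least recently used leaves
        cache[p] = True
    return faults

def opt_faults(refs, frames):
    resident, faults = set(), 0
    for i, p in enumerate(refs):
        if p in resident:
            continue
        faults += 1
        if len(resident) == frames:
            def next_use(q):
                try:
                    return refs.index(q, i + 1)
                except ValueError:
                    return len(refs)        # never used again: evict first
            resident.remove(max(resident, key=next_use))
        resident.add(p)
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(fifo_faults(refs, 3), opt_faults(refs, 3), lru_faults(refs, 3))
# 15 9 12 -- matching the three worked examples
```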