mamas – computer architecture address spaces and memory management dr. avi mendelson

© Avi Mendelson, 5/2005 1

Lecture 11-12 - Address spaces and Memory management

MAMAS – Computer Architecture

Address spaces and Memory management

Dr. Avi Mendelson

Some of the slides were taken from:

(1) Jim Smith (2) Patterson presentations.



Memory System Problems

Different Programs have different memory requirements–How to manage program placement?

Different machines have different amount of memory–How to run the same program on many different machines?

At any given time each machine runs a different set of programs

–How to fit the program mix into memory? Reclaiming unused memory? Moving code around?

The amount of memory consumed by each program is dynamic (changes over time)

–How to effect changes in memory location: add or subtract space?

Program bugs can cause a program to generate reads and writes outside the program address space

–How to protect one program from another?



Address Spaces



Address Space of a single process (Linux)

Each process has a fixed size address space that can be divided into:– Symbol tables and other management areas

– Code

– Data “static data” allocated by compiler “Dynamic data allocated by

“malloc”

– Stack Managed automatically

kernel virtual memory

runtime heap (via malloc)

program text (.text)

initialized data (.data)

uninitialized data (.bss)

stack

forbidden0

%esp

memory invisible to user code

the “brk” ptr



Main memory can act as a cache for the secondary storage (disk)

Virtual Memory

Divide memory (virtual and physical) into fixed size blocks (Pages, Frames)

– Pages in Virtual space, Frames in Physical space

– Page size = Frame size

– Page size is a power of 2: page size = 2k

All pages in the virtual address space are contiguous

Pages can be mapped into physical Frames in any order

Some of the pages are in main memory (DRAM), some of the pages are on disk.

Virtual Addresses

Physical Addresses

Address Translation

Disk Addresses



Managing Multiple Virtual Address spaces Each process “sees” a private

address space of 4G.– 2G private address space the process

can use

– 2G system address space, shared by all processes, and can be accessed only if the system at “supervisor” mode (kernel mode).

All processes are sharing the same (small) physical memory

– Small part of the address space is kept in main memory (DRAM), most of the address space is either not mapped or is kept on disk.

All programs are written using Virtual Memory Address Space

The hardware does on-the-fly translation between virtual and physical address spaces.

2GB

2GB

2GB

2GB

Virtual address spaces

U A

rea

U A

rea

U A

rea



How a process is created (exec)

The executable file contains the code, the initial values for the data area and initial tables (symbol tables etc).

A page table is created to map the virtual addresses to the physical addresses

At the start time, no page is in the main memory, so – Code pages are pointed to the disk where the code section within the file

is– The initial data pages (.data) are mapped to the .data section within the

executable file.– The system may allocate pages for initial stack space, data space (.BSS)

and for the heap area (this is a performance optimization, the system can avoid it).

As the execution continues, the system allocates pages in the main memory, copies their content from the disk and change the pointer to point to the main memory.

When a page is replaced from the main memory, the system keeps it in a special place on the disk, called swap area.



Virtual Address Space for Process 1:

Physical Address Space (DRAM)

VP 1VP 2

PP 2

Address Translation0

0

N-1

0

N-1M-1

VP 1VP 2

PP 7

PP 10

(e.g., read only or library code)

Virtual Memory can help to share address spaces

– If several processes map different pages to the same physical page, the data/code of that page is shared (we will discuss how the OS takes advantage of this feature later on)

...

...

Virtual Address Space for Process 2:



VM can help process protection

Page table entry contains access rights information– hardware enforces this protection (trap into OS if violation occurs)

Page Tables

Process i:

Physical AddrRead? Write?

PP 9Yes No

PP 4Yes Yes

XXXXXXX No No

VP 0:

VP 1:

VP 2:•••

•••

•••

Process j:

0:1:

N-1:

Memory

Physical AddrRead? Write?

PP 6Yes Yes

PP 9Yes No

XXXXXXX No No•••

•••

•••

VP 0:

VP 1:

VP 2:



Virtual Memory Implementation



Page Tables

Valid

1

Physical Memory

Disk

Page TablePhysical Page

Or Disk Address

1

1

1

1

1

11

1

0

0

0

Virtual page number



Virtual to Physical Address translation

3 2 1 011 10 9 815 14 13 1231 30 29 28 27

3 2 1 011 10 9 815 14 13 1229 28 27

Virtual page number

Physical page number

Page offset

Page offset

Virtual Address

Physical Addresses

Page size: 212 byte =4K byte

Page Table



The Page Table

31

Page offset

011

Virtual Page Number

Page offset

11 0

Physical Frame Number

29

Virtual Address

Physical Address

V D Frame number

1

Page table basereg

0

Valid bit

Dirty bit

12

AC

Access Control

12



If V = 1 then

page is in main memory at frame address stored in table

Access data

else (page fault)

need to fetch page from disk

causes a trap, usually accompanied by a context switch:

current process is suspended while page is fetched from disk

Access Control (R = Read-only, R/W = read/write, X = execute only)

If kind of access not compatible with specified access rights then protection_violation_fault

causes trap to hardware, or software fault handler

Missing item fetched from secondary memory only on the occurrence of a fault demand load policy

Address Mapping Algorithm



Managing virtual addresses

Virtual memory management is mainly done by the OS.– Deciding which pages should be in memory and which – on disk.

– Deciding when to write pages to disk.

– Deciding what access rights are assigned to pages.

– Et cetera, et cetera.

– The exact algorithms are out of the scope of this course.

Here, we study two hardware mechanisms – Pseudo LRU mechanism – to decide which pages to keep

– TLB – to perform fast lookup



Page Replacement Algorithm (pseudo LRU)

Not Recently Used (NRU)– Associated with each page is a reference flag such that

ref flag = 1 if the page has been referenced in recent past

If replacement is needed, choose any page frame such that its reference bit is 0. – This is a page that has not been referenced in the recent past

Clock implementation of NRU:last replaced pointer (lrp)if replacement is to take place,advance lrp to next entry (modtable size) until one with a 0 bitis found; this is the target forreplacement; As a side effect,all examined PTE's have theirreference bits set to zero.

1 01 000

page table entry

Ref bit

1 0

Possible optimization: search for a page that is both not recently referenced AND not dirty



Page Faults

Page faults: the data is not in memory retrieve it from disk– The CPU must detect situation

– The CPU cannot remedy the situation (has no knowledge of the disk) CPU must trap to the operating system so that it can remedy the situation

– Pick a page to discard (possibly writing it to disk)

– Load the page in from disk

– Update the page table

– Resume to program so HW will retry and succeed!

Page fault incurs a huge miss penalty – Pages should be fairly large (e.g., 4KB)

– Can handle the faults in software instead of hardware

– Page fault causes a context switch

– Using write-through is too expensive so we use write-back



Virtual Memory Unix (VAX)

VAX was the first system to run UNIX, and so many of the UNIX mechanisms are based on the VAX implementation.

Unix distinguishes between 3 different address spaces, each of them is controlled by a separate page table– The system address space – one per

system– User code + data address space –

one per process– User Stack address space – one per

process.

Only the system page table (one table) resists in the main memory all the other tables and address spaces are pageable.

Kernel

code

stack

code

stack

User sp

aceS

ystem sp

ace

Run

PP tableSys page table

User P

T

Swappable area



VM in VAX: Address Format

Page size: 29 = 512 bytes

31

Page offset

08

Virtual Page Number

Virtual Address930 29

0 0 - P0 process space code and data0 1 - P1 process space stack1 0 - S0 system space1 1 - S1 not used

Page offset

8 0


29

Physical Address

9



Page Table Entry (PTE)

V PROT M Z OWN S S

0


31 20

Valid bit =1 if page mapped to main memory, otherwise page on the swap area and the address indicates where I can find the page on the disk

4 Protection bits

Modified bit

3 ownership bits

This is the structure of all the entries in each of the page tables.

Indicate if the line was cleaned (zero)



Address Translation

The System address space is divided into “resident” part that always exists in the main memory and the rest of the 2G address space of the system is swappable.

User space is always swappable. The system’s page table is part of the resident area

and the SBR register points on its physical base address.

For each process, there are two dedicated registers:– P0BR: points to the virtual address (in the system’s address space)

of the code segment page table of the user

– P1BR: points to the virtual address (in the system’s address space) of the stack segment page table of the user



System Space - Address Translation

From the virtual address of the system’s address space we can extract the virtual page number (VPN).

Assuming that each entry in the system address space is 4 bytes, the PTE that contains the physical address of the line exists in SBR+VPN*4.

If V=1, we can extract the physical address of the page.

If V=0, it causes a page fault and we need to bring the page from the disk.

SBR

VPN*4

PFN

10 offset8 029 9

VPN

offset8 029 9

PFN

31



P0/P1 Address Translation

Address translation need to be done in few phases:1. Find the address of the proper PTE in the user level page table.

2. Since the PTE address is within the system’s address space, check if the page that contains the PTE is in the main memory

3. If page exists, check if the user page is in the main memory

4. If the page does not exist, causes a page fault to bring the value of the PTE and only than, we can check if the address exists in the memory or not.

We may have up to 2 page-faults in the translation process of the singe access to a user level address.



P0/P1 Space Address Translation (cont)

SBR

P0BR+VPN*4

Offset’8 029 9

PFN’

00 offset8 029 9

VPN31

10 Offset’8 029 9

VPN’31

PFN’

VPN’*4

Physical addrof PTE

PFN

Offset8 029 9

PFN



Handling large address space

For a large virtual address space, we may get a large page table, for example:– For a 2 GB virtual address space, page size of 4KB, we get

231/212=219 entries

– Assuming each entry is 4 bytes wide, page table alone will occupy 2MB in main memory

Two proposed techniques to handle it:– Hierarchical page tables

– “segmentation”



Large Address SpacesSolution: use two-level Page

Tables Master page table resides in physical

memory Secondary page tables reside in virtual

address space Virtual address format

At the lower levels of the tree we will keep only the segments of the tables we use

P1 index P2 index page offest

10 10 12

4 bytes

4 bytes

4KB1KPTEs



• We do not need to map the addresses between the TOS (Top Of Stack) and the Break point

• The system needs to know how to map new regions when needed

Virtual Memory – Memory-management

Virtual Addresses Physical Addresses

Address Translation

Disk Addresses



The VAX Solution Segmentation

Map only the sections the program uses

Need to extend at run time– Stack manipulation

– SBRK command

p0

p1



How a process is forked

When a process is forked, the new process and the old processes starts at the same point with the same state. The only difference between the two processes is that the Fork command returns the PID to the original process and 0 for the “new born” process.

When a process is forked, we try to avoid to copy physical pages– At the initial point, the translation tables are copied

– All pages in the memory that belong to the original process are marked as “read-only”

– Only when one of the processes tries to change the page, we copy the page to private copies and allow private writable copies. (this mechanism is called COW – Copy On Write)



TLB – hardware support for address translation

TLB is a cache that keeps the last translations the system made, using the virtual address as an index

If the translation was found in the TLB, no further translation is needed.

If it misses, we need to continue the process as described before.



TLB (Translation Lookaside Buffer) is a cache for recent address translations:

Valid

1

1

1

1

0

1

1

0

1

1

0

1

1

1

1

1

0

1

Making Address Translation Fast

Physical Memory

Disk

Virtual page number

Page Table

Valid Tag Physical PageTLB

Physical PageOr

Disk Address



P0 space Address translation Using TLB

Yes

No

ProcessTLB Access

ProcessTLB hit?

00 VPN offset

NoSystemTLB hit?

Yes

Get PTE of req page from the proc. TLB

Calculate PTE virtual addr (in S0): P0BR+4*VPN

System TLB Access

Get PTE from system TLB

Get PTE of req page from the process Page table

Access Sys Page Table inSBR+4*VPN(PTE)

Memory Access

Calculate physical address

PFN

PFN

Access Memory



Virtual Memory And Cache

Yes

No

TLB Access

TLB Hit ?Access

Page Table

Access Cache

Virtual Address

Cache Hit ?

Yes

No AccessMemory

Physical Addresses

Data

TLB access is serial with cache access



Overlapped TLB & Cache Access

Physical page number Page offset

Virtual Memory view of a Physical Address

Disp

Cache view of a Physical Address

# SetTag

In the above example #Set is not contained within the Page Offset The #Set is not known Until the Physical page number is known Cache can be accessed only after address translation done.

3 2 1 011 10 9 815 14 13 1229 28 27

3 2 1 011 10 9 815 14 13 1229 28 27



Overlapped TLB & Cache Access (cont)

Physical page number Page offset

Virtual Memory view of a Physical Address

Disp

Cache view of a Physical Address

# SetTag

In the above example #Set is contained within the Page Offset The #Set is known immediately Cache can be accessed in parallel with address translation First the tags from the appropriate set are brought

Tag comparison takes place only after the Physical page number is known

(after address translation done).

3 2 1 011 10 9 815 14 13 1229 28 27

3 2 1 011 10 9 815 14 13 1229 28 27



Overlapped TLB & Cache Access (cont)

First Stage– Virtual Page Number goes to TLB for translation

– Page offset goes to cache to get all tags from appropriate set

Second Stage– The Physical Page Number, obtained from the TLB,

goes to the cache for tag comparison.

Limitation: Cache < (page size * associativity) How can we overcome this limitation ?

– Fetch 2 sets and mux after translation

– Increase associativity



Virtual memory and process switch

Whenever we switch the execution between processes we need to:– Save the state of the process

– Load the state of the new process

– Make sure that the P0BPT and P1BPT registers are pointing to the new translation tables

– Clean the TLB

– We do not need to clean the cache (Why????)

mamas – computer architecture address spaces and memory management dr. avi mendelson

Documents