T3-Memory
3.2
Index
Memory management concepts
Basic services: program loading in memory, dynamic memory
HW support: memory assignment, address translation
Services to optimize physical memory usage: COW, virtual memory, prefetch
Linux on Pentium
3.3
CONCEPTS
Physical memory vs. Logical memory
Process address space
Addresses assignment to processes
Operating system tasks
Hardware support
3.4
Execution model
The CPU can access only memory and registers. Data and code must be loaded in memory to be referenced.
Program loading: allocate memory, write the executable into that memory, and pass execution control to the entry point of the program.
[Figure: the CPU sends an address (@) to memory and receives the memory content]
3.5
Multi-programmed systems
Several programs are loaded simultaneously in physical memory. This eases concurrent execution and simplifies the context switch mechanism: 1 process on the CPU but N processes in physical memory. When performing a context switch it is not necessary to load again the process that gets assigned the CPU.
The OS must guarantee physical memory protection between processes: each process can access only the physical memory that it is assigned. This must be enforced by hardware: the Memory Management Unit (MMU).
[Figure: without an MMU the CPU address reaches memory directly; with an MMU every CPU address is translated and checked (raising an exception on an illegal access) before reaching the physical memory where P1 and P2 reside]
3.6
Physical memory vs. Logical memory
“Types” of addresses:
Logical addresses: the memory addresses generated by the CPU.
Physical addresses: the memory addresses that arrive at memory.
Are they different??? They can be!!!
Current systems offer translation support based on the MMU, which provides:
– Memory translation
– Memory validation
3.7
Address space: range of addresses [@first_address … @last_address]. The concept applies to different contexts:
Processor address space.
Process logical address space: subset of logical addresses that a process can reference (the OS kernel decides which addresses are valid for each process).
Process physical address space.
Relationship between logical addresses and physical addresses:
Without translation: logical address space == physical address space.
With translation: it can be done at different moments.
Option 1, during program loading: the kernel decides where to place the process in memory and translates references at load time.
Option 2, during program execution: each issued reference is translated at runtime (this is the normal behavior in current systems).
Address Spaces
3.8
Other choices exist, but current general-purpose systems translate instruction and data addresses at runtime.
Since logical addresses are decoupled from physical addresses:
– We can have many processes with the same logical addresses without problems. FORK!!!!! Parent and child have the same logical address space without conflict.
– The compiler can translate program references to memory without worrying about other programs' references or about which physical addresses will be available when the process starts executing.
– Processes can change their position in memory without changing their logical address space. Example: paging (explained in the EC course).
Assignment of addresses to processes
3.9
Collaboration between MMU (HW) and kernel (SW).
MMU: it implements the mechanism to detect illegal accesses (an address out of the process logical address space, or a valid address with an invalid access) and throws an exception to the OS if a problem is detected during memory address translation.
Kernel: it configures the MMU and manages the exception according to the situation. For example, if the logical address is not valid it can kill the process (SIGSEGV signal).
Multi-programmed systems with MMU support
3.10
Multiprogrammed systems: whole picture
1. Process A is running (processes A and C are both loaded in memory)
2. Context switch to C
[Figure: the CPU issues logical addresses; the MMU translates them into physical addresses and protects memory, raising an exception on an invalid access; after the context switch the MMU translates and protects process C addresses instead of process A addresses]
3.13
Case 1: when assigning memory. Initialization when assigning new memory (creation, execlp); the address space grows or shrinks.
Case 2: when switching contexts. For the process that leaves the CPU: if it has not finished, keep in its data structures (PCB) the information needed to configure the MMU when it resumes execution. For the process that resumes execution: configure the MMU.
When does the OS need to update the MMU?
3.15
Program loading in memory: once loaded, we have already seen how it works!
Allocate/deallocate dynamic memory (requested through system calls).
Shared memory between processes: COW (transparent sharing of read-only regions between processes); shared memory explicitly requested through system calls (out of the scope of this course).
Optimization services: COW, virtual memory, prefetch.
OS tasks in memory management
3.16
OS BASIC SERVICES
Program loading
Dynamic memory
Memory assignment
Explicit shared memory between processes
3.17
The executable file is stored on disk, but it has to be in memory in order to be executed (execlp or similar).
The OS has to:
1. Read and interpret the format of the executable
2. Prepare the process layout in logical memory and assign physical memory to it:
– Initialize the PCB attributes related to memory management (the information to configure the MMU each time the process resumes execution)
– Initialize the MMU
3. Read the program sections from disk and write them to memory
4. Load the program counter register with the address of the entry-point instruction, which is defined in the executable file
Basic services: program loading
3.18
STEP 1: Interpret the executable format on disk. If address translation is performed at runtime, which kind of address is in the executable file on disk: logical or physical?
The header in the executable file defines sections: type of section, size and position in the file (try objdump -h program).
Several executable file formats exist. ELF (Executable and Linkable Format) is the most widespread executable format in POSIX systems.
Program loading: executable format
Some sections in ELF files
.text Code
.data Global data with initial value
.bss Global data without initial value
.debug Debug information
.comment Control information
.dynamic Information for dynamic linking
.init Initialization code for the process (contains @ of the 1st instruction)
3.19
STEP 2: Prepare the process layout in memory. Usual layout: code/data/heap/stack memory regions.
Program loading: Process layout in memory
[Figure: process logical address space from 0 to max: code (.text), data (.data and .bss: global variables), heap (dynamic memory, runtime allocation with sbrk), an invalid region, and the stack (local variables, parameters and execution control); the .text, .data and .bss sections come from the executable file]
3.20
Program loading
[Figure: program loading; the .text, .data and .bss sections of the executable on disk are copied into physical memory as the code and data regions, and a stack is created; steps: 1. allocate memory (kernel data, process PCB), 2. copy the executable file, 3. update the MMU; kernel data tracks the free memory regions and the PCB]
3.21
Optimizations on program loading (exec system call):
On-demand loading: not all the code lines are executed.
Shared libraries and dynamic linking: many parts of executables are read-only and can be shared by more than one process.
The goals are:
To save time by loading just parts of the executable.
To save memory and storage space by loading just parts of the executable into memory and by sharing parts of the executable (both on disk and in memory).
Optimizations on program loading
3.22
On-demand loading: the loading of routines is delayed until they are called. It requires a mechanism to detect whether an address is already in memory or not:
– The real MMU information is stored in the PCB
– The MMU exception code validates the memory address and, if it is correct:
» updates the memory content
» updates the MMU and the PCB attributes
» restarts the instruction
Optimizations on program loading
3.23
Shared libraries and dynamic linking. Libraries can be generated in two versions: static and dynamic. Executables can use the static or the dynamic version of a library (the default is dynamic).
Static: the library code is included in the executable file.
Dynamic: executable files (on disk) do not contain the dynamic library code, just a reference to it.
» That saves a lot of disk space! (The link phase is delayed until runtime.)
» When executed, stub code loads the library if it is not already loaded in memory and updates the process code to substitute the call to the stub with the call to the routine in the shared library.
Processes can share the memory areas holding the same code (it is read-only), including the code of libraries. It saves a lot of memory space.
Optimizations on program loading
3.24
BASIC SERVICES: DYNAMIC MEMORY
3.25
System calls to ask for extra memory space or to reduce a previously reserved memory area.
Heap area: region in the process address space that holds dynamic memory allocations.
Required when the size of a variable depends on runtime parameters. In this situation it is not desirable to fix sizes at compile time, since that causes over-allocation (memory waste) or under-allocation (runtime errors).
Optimization: physical memory assignment can be delayed until the first write access to the region; a zero-filled region can be temporarily assigned to serve read accesses (it depends on the interface).
Dynamic memory allocation/deallocation
3.26
Linux on Pentium: the traditional Unix interface is not user friendly.
brk and sbrk (we will use sbrk). Both system calls just update the heap limit. The OS does not control which variables are stored in the heap; it just increases or decreases the heap size. The programmer is responsible for controlling the position of each variable in the heap. The man pages recommend using malloc.
Returns: the previous heap limit. size_in_bytes values:
– >0 increases the heap limit by size bytes
– <0 decreases the heap limit by size bytes
– ==0 does not modify the heap limit (it is used to get the current heap limit)
Dynamic memory allocation/deallocation
void *sbrk(intptr_t size_in_bytes);
3.27
sbrk: example

int main(int argc, char *argv[]) {
    int procs_nb = atoi(argv[1]);
    int *pids;
    pids = sbrk(procs_nb * sizeof(int));
    for (int i = 0; i < procs_nb; i++) {
        pids[i] = fork();
        if (pids[i] == 0) {
            /* ... child code ... */
        }
    }
    sbrk(-1 * procs_nb * sizeof(int));
}
[Figure: process address space from 0 to max with code, data, heap and stack regions; sbrk grows and shrinks the heap]
3.28
The C library offers to programmers:
The deep management of the heap: it knows which parts are “reserved” and which parts are “free”.
The heap size management.
Memory allocation: void *malloc(size_t size_in_bytes). If possible, it “reserves” N consecutive unused bytes of the heap; otherwise, it asks the kernel to increase the heap size.
Implementation and optimizations: it tracks reserved/free areas, and it tries to reduce the number of system calls to save time, asking the kernel for extra memory space only when calling the kernel is mandatory.
Memory deallocation: void free(void *p). It marks as “free to use” a previously “reserved” area.
C library: Dynamic memory allocation/deallocation
3.29
malloc/free: example
int main(int argc, char *argv[]) {
    int procs_nb = atoi(argv[1]);
    int *pids;
    pids = malloc(procs_nb * sizeof(int));
    for (int i = 0; i < procs_nb; i++) {
        pids[i] = fork();
        if (pids[i] == 0) {
            /* ... child code ... */
        }
    }
    free(pids);
}

The malloc interface is like the sbrk interface. The free interface needs as input parameter a pointer to the base address of the region.
3.30
How does the heap change after executing the following examples? Example 1:
Example 2:
Does the heap size change in both examples?
Dynamic memory: examples
...new = sbrk(1000);...
...new = malloc(1000);...
3.31
How does the heap change after executing the following examples? Example 1:
Example 2:
Do both examples allocate the same logical memory addresses? Example 1: requires 1000 consecutive bytes. Example 2: requires 10 regions of 100 bytes each.
Dynamic memory: examples
...ptr = malloc(1000);...
...for (i = 0; i < 10; i++) ptr[i] = malloc(100);...
3.32
Which errors are in the following codes?
Code 1: Does the access to “*x” always produce the same error?
Code 2: What happens while executing the second iteration of the second loop?
Dynamic memory: examples
Code 1:

int *x, *ptr;
...
ptr = malloc(SIZE);
...
x = ptr;
...
free(ptr);
sprintf(buffer, "...%d", *x);

Code 2:

...
for (i = 0; i < 10; i++)
    ptr = malloc(SIZE);
// memory usage
// ...
for (i = 0; i < 10; i++)
    free(ptr);
...
3.33
BASIC SERVICES: MEMORY ASSIGNMENT
Fixed partitions: Paging
Variable partitions: Segmentation
3.34
Memory assignment is executed each time a process needs physical memory. In Linux: process creation (fork), loading of executable files (exec), dynamic memory usage, and the implementation of some optimizations (on-demand loading, virtual memory, COW…).
Steps:
Select free physical memory and mark it in the OS data structures as in-use memory.
Update the MMU with the mapping information logical @ → physical @ (necessary to implement address translation).
Basic services: memory assignment
3.35
First approach: contiguous assignment. The process physical address space is contiguous: the whole process is loaded into a partition selected at load time. It is not flexible, and it complicates applying optimizations (such as on-demand loading) and services such as dynamic memory.
Non-contiguous assignment: the process physical address space is not contiguous. It is flexible, but it increases the complexity of the OS and the MMU. Based on:
Fixed partitions: paging.
Variable partitions: segmentation.
Combined schemes: for example, segmentation at a first level and paging at a second level (explained in the EC course).
Basic services: memory assignment
3.36
Any non-contiguous space-allocation scheme suffers from fragmentation.
Fragmentation problem: it is not possible to satisfy a given memory request although the system has enough memory to do it; there is free memory, but it cannot be assigned to a process. It appears in disk management too.
Internal fragmentation: memory assigned to a process that it is not going to use.
External fragmentation: free memory that cannot be used to satisfy any memory request because it is not contiguous. It can be avoided by compacting the free memory, which requires the system to support address translation at runtime and slows down applications.
Memory assignment: fragmentation
3.37
Paging-based scheme. The logical address space is divided into fixed-size partitions: pages. Physical memory is divided into partitions of the same size: frames. Memory management is easy to implement since all the frames are equal:
– A global list of free frames
– Per-process MMU information stored in the PCB
Page: the working unit of the OS. It facilitates on-demand loading (one page at a time), enables protection at page level, and facilitates memory sharing between processes at page level. Usually, a page belongs to just one memory region, to match region protection requirements (code/data/heap/stack).
Assignment: Paging
3.38
MMU information: the Page Table (PT). One entry per page: validity, access permissions (rwx), associated frame, etc. One table per process. Typically, architectures have a register that points to the current page table.
Assignment: Paging
3.39
PROBLEM: page table size (the PT is stored in memory). The page size is usually a power of 2; a typical size is 4 KB (2^12). It affects internal fragmentation, management granularity and page table size.
Scheme to reduce the memory needed by the PT: multi-level PT. The PT is divided into sections, and more sections are added as the process address space grows.

Logical address of the processor | Number of pages | PT size
32-bit bus: 2^32                 | 2^20            | 4 MB
64-bit bus: 2^64                 | 2^52            | 4 PB
3.40
Multi-level page tables are a good solution in terms of space requirements, but many memory accesses are required to perform an address translation!!!
Current processors also have a TLB (Translation Lookaside Buffer): an associative memory (cache), faster to access than RAM, that keeps the translations of active pages. It is necessary to update/invalidate the TLB on each change in the MMU. Its management can be done by hardware or by software (the OS), depending on the architecture.
Multi-level page tables
3.41
Assignment: Paging
[Figure: the CPU issues a logical address (p, o); the MMU looks up page p in the TLB; on a TLB hit the frame f is obtained directly, on a TLB miss the page table is accessed (and the TLB updated); the physical address is (f, o); an invalid entry or a violation of the rw permissions raises an exception]
3.42
The logical address space is divided into variable-size partitions (segments) that fit the size that is really needed. At least 3 segments: one for code, one for the stack and one for data. References to memory are composed of a segment and an offset.
Every contiguous piece of free physical memory is an available partition; however, partitions are not all equal as in paging.
Assignment: for each segment of a process, look for a partition big enough to hold the segment (possible policies: first fit, best fit, worst fit), take from the partition just the amount of memory needed to hold the segment, and keep the rest of the partition in a free-partitions list.
It can cause external fragmentation.
Assignment: Segmentation
3.43
MMU: segment table. For each segment: base @ and limit (size). One table per process.
Assignment: Segmentation
[Figure: the CPU issues a logical address (s, o); the MMU indexes the segment table with s to obtain (limit, base); if o < limit, the physical address is base + o; otherwise an exception is raised (illegal @)]
3.44
Mixed schemes: paged segmentation
The process logical address space is divided into segments; segments are divided into pages. The segment size is a multiple of the page size, and the page is the OS working unit.
Assignment: Mixed schemes
[Figure: the CPU's logical address goes through the segmentation unit (producing a linear address) and then through the paging unit (producing the physical address) before reaching physical memory]
3.45
Explicit memory sharing between processes is useful as a method to share data between processes.
The OS must provide programmers with system calls to manage shared memory regions: allocate memory regions and mark them as sharable, so that other processes can map them into their address space.
Basic services: Explicit shared memory
3.46
SERVICES TO OPTIMIZE PHYSICAL MEMORY USAGE
COW
Virtual Memory
Prefetch
3.47
Idea: delay the allocation/initialization of physical memory until it is really necessary. If a new zone is never accessed, it is not necessary to assign physical memory to it; if a copied zone is never written, it is not necessary to replicate it. This saves time and physical memory space.
It can be applied when asking for dynamic memory and when creating a new process (fork).
Optimizations: COW (Copy on Write)
3.48
The kernel uses the MMU (exception mechanism) to detect write accesses to (speculatively) shared memory pages.
MMU: new (logical) pages are initialized with existing (physical) frames, but the permissions are set as write-protected (for both the source and the new page).
PCB: the real permissions are kept here, to differentiate faults caused by COW from real invalid accesses.
When a process tries to write to the new region or to the source region, the OS exception-management code performs the actual allocation and copy, updates the MMU with the real permissions for both regions, and restarts the instruction that generated the exception.
COW: Implementation
3.49
Compute: how many pages are modified (and thus cannot be shared)? How many pages are read-only (and thus can be shared)?
Process A physical memory assignment: code: 3 pages, data: 2 pages, stack: 1 page, heap: 1 page.
Let's consider that process A executes a fork system call. Just after the fork, total physical memory:
Without COW: process A = 7 pages + child = 7 pages → 14 pages. With COW: process A = 7 pages + child = 0 pages → 7 pages.
Later in the execution it depends on the code executed by the processes. For example, if the child executes an exec (and its new address space uses 10 pages):
Without COW: process A = 7 pages + child = 10 pages → 17 pages. With COW: the same, 17 pages.
If the child does not execute an exec, at least the code will always be shared between both processes; the rest of the address space depends on the code. If only the code is shared:
Without COW: process A = 7 pages + child = 7 pages → 14 pages. With COW: process A = 7 pages + child = 4 pages → 11 pages.
COW: example
3.50
OPTIMIZATIONS: VIRTUAL MEMORY
3.51
Goal: to reduce the amount of physical memory assigned to a process, and thus to increase the potential multiprogramming degree.
Idea: we don't need to have the whole process loaded in memory (we already know that). What if we introduce a mechanism to move pages out of memory to… (where)? From memory to disk!
Optimizations: Virtual memory
On-demand page loading + page swap in/out mechanism = virtual memory. New!
3.52
First approach: swapping of the whole process. Too much penalty to swap the whole process in from disk.
Next approach: use the MMU and the paging mechanism to offer virtual memory at page granularity. If we need a frame for a new frame request and no physical memory is available, we swap out one allocated frame and use the hole it leaves.
We need a memory replacement algorithm to select a victim frame to move from memory to disk.
Optimizations: Virtual memory
3.54
Memory replacement algorithm: executed when the OS needs to free frames.
It selects a victim page and deletes its translation from the MMU. It tries to select victim pages that are no longer necessary or that will not be needed for a long time (example: Least Recently Used (LRU) or approximations). It stores the victim page's contents in the swap area and assigns the freed frame to the page that requires it.
Page fault: when a non-present (but valid) page is referenced, the MMU throws an exception to the OS, since it cannot perform the translation.
Kernel exception code for page-fault management: it checks whether the access is valid (the PCB always contains full information), assigns a free frame to the page (starting the memory replacement algorithm if necessary), searches for the content of the page in the swap area, writes it into the selected frame, and updates the MMU with the physical address assigned.
Optimizations: Virtual memory
3.55
Optimizations: Virtual memory
[Figure: the MMU translates each logical @ into a physical @; on a page fault the OS runs memory replacement if needed, swaps pages between physical memory and the swap area, and updates the MMU]
Logical address space of a process is distributed across physical memory (present pages) and swap area (non-present pages)
3.56
Memory access steps:
Optimizations: Virtual memory
[Figure: memory access steps: the logical @ is first looked up in the TLB; on a hit, the physical @ is used to access memory; on a miss, the PT is accessed; if the logical @ is invalid, a signal is generated for the process; if it is valid and present, the TLB is updated and the access proceeds; if it is valid but not present (page fault), the OS blocks the process, allocates a frame (starting memory replacement if needed), reads the page from swap, updates the PT and restarts the instruction]
3.57
Effects of using virtual memory:
Physical memory can be smaller than the sum of the address spaces of the loaded processes.
Physical memory can be smaller than the logical address space of a single process.
Accessing non-present pages is slower than accessing present pages (exception + page loading), so it is important to reduce the number of page faults.
Optimizations: Virtual memory
3.59
A process is thrashing when it spends more time performing page swapping than executing program code: it is not able to keep simultaneously in memory the minimum number of pages required to make progress, and the memory system is overloaded.
Detection: monitor the page-fault rate per process.
Management: control the multiprogramming degree and swap out processes.
Optimizations: Virtual memory
3.60
Goal: to minimize the number of page faults.
Idea: to predict which pages a process will need (in the near future) and load them in advance.
Parameters to consider: prefetch distance (the time between the page loading and the page reference) and the number of pages to load in advance.
Some simple prediction algorithms: sequential, stride.
Optimizations: Prefetch
3.61
exec system call: loads a new program. PCB initialization with the description of the new address space, memory assignment, …
Process creation (fork): PCB initialization with the description of its address space (which is a copy of the parent's); uses COW; creation and initialization of the new process PT; the base address of the PT is kept in the PCB of the process.
Process scheduling: a context switch updates the MMU with the base address of the current PT and invalidates the TLB.
exit: deletes the process PT and deallocates the process frames (if those frames are not in use by another process).
Summary: Linux on Pentium
3.62
Virtual memory based on paged segmentation.
Multi-level page table (2 levels): one per process, stored in memory; a CPU register keeps the base address of the PT for the current process.
Memory replacement algorithm: LRU approximation, executed periodically and each time the number of free frames reaches a threshold.
COW at page level. On-demand loading. Support for shared libraries. Simple prefetch (sequential).
Summary: Linux on Pentium
3.63
Storage hierarchy
[Figure: storage hierarchy; storage capacity increases toward the lower levels while access speed increases toward the upper levels]