074.784 operating systems design and …comp7840/notes/1_osreview_2up.pdf1 074.784 operating systems...

61
1 074.784 Operating Systems Design and Implementation Peter Graham Spring 2007 (January – April) 04.784 OSDI Overview 1 Logistics Instructor: Peter Graham E2-572 EITC (474-8837) [email protected] Lectures: Wednesday 10:00 – 12:00 E2-461 EITC Course Homepage: www.cs.umanitoba.ca/~comp7840

Post on 17-Mar-2020

074.784 Operating Systems Design and Implementation

Peter Graham
Spring 2007 (January – April)

04.784 OSDI Overview 1

Logistics

Instructor: Peter Graham
  E2-572 EITC (474-8837)
  [email protected]

Lectures: Wednesday 10:00 – 12:00
  E2-461 EITC

Course Homepage: www.cs.umanitoba.ca/~comp7840

04.784 OSDI Overview 2

Course Overview

Goals:
• Deeper understanding of OS design and implementation principles: OS/architecture interface/interaction
• Current trends in OS research, with a focus on support for pervasive computing

Structure:
• Review basic material: OSs, pervasive concepts
• Read and discuss papers on advanced issues
• Write a survey paper on an OS topic
• Significant project: a taste of hands-on work

me

you

04.784 OSDI Overview 3

Topics

• OS Review: processes, threads, and synchronization; resource management; virtual memory; I/O and file systems
• Pervasive Computing Introduction: motivation, key issues, OS challenges
• Current research topics
  - Based on selected paper presentations

04.784 OSDI Overview 4

Course Details

Prerequisites
• Undergraduate OS and architecture courses
• Good programming skills (in C/C++ and UNIX)

What to expect
• Reading and critical analysis of others’ work
• Implementation project (with evaluation)
• Write a survey paper: no “double dipping”

Hence (evaluation):
• Paper presentation 30%
• Implementation project 30%
• Survey paper 40%

04.784 OSDI Overview 5

Project

Goals
• Learn to design, implement/simulate, and evaluate an OS component related to pervasive computing
• Improve systems programming skills

Structure
• Individual work (with a significant coding effort)
• Include a brief final project report

OS Mechanisms and Policies

04.784 OSDI Overview 7

What is an operating system?

• A software layer between the hardware and the application programs/users, which provides a virtual machine interface: easy and safe
• A resource manager that allows programs/users to share the hardware resources: fairly and efficiently

[Figure: layers, top to bottom: application (user), operating system, hardware]

04.784 OSDI Overview 8

How does an OS work?

• Receives requests from the application: system calls
• Satisfies the requests: may issue commands to hardware
• Handles hardware interrupts: may upcall the application
• OS complexity: synchronous calls + asynchronous events

[Figure: the application (user) and the OS exchange system calls and upcalls; the OS, split into H/W-independent and H/W-dependent parts, and the hardware exchange commands and interrupts]

04.784 OSDI Overview 9

Mechanism and policy

• Mechanisms: data structures and operations that implement an abstraction (e.g. the file buffer cache)
• Policies: the procedures that guide the selection of a certain course of action from among alternatives (e.g. the replacement policy for the buffer cache)
• Traditional OS is rigid: mechanism bundled together with policy

[Figure: layers, top to bottom: application (user), operating system (mechanism + policy), hardware]

04.784 OSDI Overview 10

Mechanism-policy split

• A single policy is often not the best for all cases
• So, separate mechanisms from policies:
  - OS provides the mechanism + some policy
  - Applications contribute to the policy
• Flexibility + efficiency: require new OS structures and/or new OS interfaces
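
One way to picture the split, using the buffer cache example from the previous slide: the cache code below is the mechanism, and the replacement decision is a function pointer the application supplies. This is only a sketch under assumptions; the names (`struct cache`, `cache_access`, `policy_lru`, `policy_fifo`) and the four-slot size are hypothetical, not taken from any real kernel.

```c
#include <stddef.h>

#define CACHE_SLOTS 4   /* assumed size, for illustration only */

struct cache;
typedef size_t (*victim_policy)(const struct cache *c);

/* Mechanism: bookkeeping for cached blocks. It never decides what to evict. */
struct cache {
    int           block[CACHE_SLOTS];      /* cached block number, -1 if empty */
    unsigned long last_use[CACHE_SLOTS];   /* logical time of last access */
    unsigned long loaded_at[CACHE_SLOTS];  /* logical time the block was loaded */
    unsigned long clock;                   /* one tick per access */
    victim_policy choose_victim;           /* policy hook */
};

/* Helper: index of the smallest time stamp. */
static size_t oldest(const unsigned long stamp[CACHE_SLOTS])
{
    size_t best = 0;
    for (size_t i = 1; i < CACHE_SLOTS; i++)
        if (stamp[i] < stamp[best])
            best = i;
    return best;
}

/* Two interchangeable policies. */
size_t policy_lru(const struct cache *c)  { return oldest(c->last_use); }
size_t policy_fifo(const struct cache *c) { return oldest(c->loaded_at); }

void cache_init(struct cache *c, victim_policy policy)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        c->block[i] = -1;
        c->last_use[i] = c->loaded_at[i] = 0;
    }
    c->clock = 0;
    c->choose_victim = policy;
}

/* Returns 1 on a hit, 0 on a miss (the block is loaded before returning). */
int cache_access(struct cache *c, int block)
{
    size_t slot = CACHE_SLOTS;
    c->clock++;
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (c->block[i] == block) {
            c->last_use[i] = c->clock;
            return 1;
        }
        if (c->block[i] == -1 && slot == CACHE_SLOTS)
            slot = i;                      /* remember the first free slot */
    }
    if (slot == CACHE_SLOTS)
        slot = c->choose_victim(c);        /* cache is full: ask the policy */
    c->block[slot] = block;
    c->last_use[slot] = c->loaded_at[slot] = c->clock;
    return 0;
}
```

The same access sequence can hit under one policy and miss under the other, which is the point of letting the application contribute the policy.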

04.784 OSDI Overview 11

System abstraction: processes

A process is a system abstraction: an illusion of being the only job in the system.

[Figure: layers, top to bottom: user (run application), operating system (process), hardware (computer). User/OS interface: create, kill processes, inter-process comm. OS/hardware interface: multiplex resources]

04.784 OSDI Overview 12

Processes: mechanism and policy

Mechanism:
• Creation, destruction, suspension, context switch, signaling, IPC, etc.

Policy:
• Minor policy questions:
  - Who can create/destroy/suspend processes?
  - How many active processes can each user have?
• Major policy questions:
  - How to share system resources between multiple processes?
  - Typically broken into a number of orthogonal policies for individual resources such as CPU, memory, and disk.

04.784 OSDI Overview 13

Processor abstraction: threads

A thread is a processor abstraction: an illusion of having one processor per execution context
- One or more threads per process

[Figure: layers, top to bottom: application (execution context), operating system (thread), hardware (processor). Application/OS interface: create, kill, synch. OS/hardware interface: context switch]

04.784 OSDI Overview 14

Threads: mechanism and policy

Mechanism:
• Creation, destruction, suspension, context switch, signaling, synchronization, etc.

Policy:
• How to share the CPU between threads from different processes?
• How to share the CPU between threads from the same process?

04.784 OSDI Overview 15

Memory abstraction: virtual memory

Virtual memory is a memory abstraction: an illusion of large contiguous memory, typically more memory than is physically available

[Figure: layers, top to bottom: application (address space), operating system (virtual memory), hardware (physical memory). Application/OS interface: virtual addresses. OS/hardware interface: physical addresses]

04.784 OSDI Overview 16

Virtual memory: mechanism

• Virtual-to-physical memory mapping, page fault, etc.
• Done with hardware support (DAT/MMU)

[Figure: the virtual address spaces of processes p1 and p2 are mapped onto physical memory through v-to-p memory mappings]

04.784 OSDI Overview 17

Virtual memory: policy

• How to multiplex a virtual memory that is larger than the physical memory onto what is available?
• How to share physical memory between multiple processes?

04.784 OSDI Overview 18

Storage abstraction: file system

A file system is a storage abstraction: an illusion of structured storage space

[Figure: layers, top to bottom: application/user (copy file1 file2), operating system (files, directories), hardware (disk). User/OS interface: naming, protection, operations on files. OS/hardware interface: operations on disk blocks]

04.784 OSDI Overview 19

File System

Mechanism:
• File creation, deletion, read, write, file-block-to-disk-block mapping, file buffer cache, etc.

Policy:
• Sharing vs. protection?
• Which block to allocate?
• File buffer cache management?

04.784 OSDI Overview 20

Communication abstraction: messaging

Message passing is a communication abstraction: an illusion of reliable (sometimes ordered) transport

[Figure: layers, top to bottom: application (sockets), operating system (TCP/IP protocols), hardware (network interface). Application/OS interface: naming, messages. OS/hardware interface: network packets]

04.784 OSDI Overview 21

Message Passing

Mechanism:
• Send, receive, buffering, retransmission, etc.

Policy:
• Congestion control and routing
• Multiplexing multiple connections onto a single NIC

04.784 OSDI Overview 22

Multiprocessors

[Figure: two CPUs, each with its own cache, share memory over a memory bus; an I/O bus connects the net interface and disk]

04.784 OSDI Overview 23

UMA Multiprocessors: OS issues

Processes
• How to divide processors among multiple processes?
• Time sharing vs. space sharing

Threads
• New synchronization mechanisms
• How to schedule threads of a single process on its allocated processors?
• Affinity scheduling?

OS Structure

04.784 OSDI Overview 25

Traditional OS structure

Monolithic/layered systems
• One/N layers, all executed in “kernel-mode”
• Good performance but rigid

[Figure: a user process issues system calls to the OS kernel, which contains the file system and memory system and runs on the hardware]

04.784 OSDI Overview 26

Micro-kernel OS

• Client-server model, IPC between clients and servers
• The micro-kernel provides protected communication
• Some OS functions implemented as user-level servers
• Flexible, but efficiency is the problem
• Easy to extend for distributed systems

[Figure: a client process, a file server, and a memory server run in user mode and communicate by IPC through the micro-kernel, which runs on the hardware]

04.784 OSDI Overview 27

Extensible OS kernel

• User processes can load customized OS services into the kernel
• Good performance, but protection and scalability become problems

[Figure: user-mode processes above an extensible kernel on the hardware; one process uses the default memory service, another loads its own (“my memory service”)]

04.784 OSDI Overview 28

Virtual Machines

• Old concept which is heavily revived today
• The real hardware is “cloned” into several identical virtual machines
• OS functionality is built on top of the virtual machine

[Figure: an exokernel on the hardware allocates resources; each user runs an OS on a virtual machine]

Processes, Threads, and Synchronization

04.784 OSDI Overview 30

Execution mode

• Most processors support at least two modes of execution for protection reasons
  - Privileged: kernel-mode
  - Non-privileged: user-mode
• The portion of the OS that executes in kernel-mode is called the kernel
  - Can freely access hardware resources
  - Protected from interference by user programs
• Code running in kernel-mode can do anything: no protection
• User code executes in user-mode
• OS functionality that does not need direct access to hardware may also run in user-mode

04.784 OSDI Overview 31

Interrupts and traps

Interrupt: an asynchronous event
• External events (not related to the processor state) which occur independently of the instruction execution in the processor
• Can be masked (specifically or not)
• e.g. I/O completion interrupt

Trap: a synchronous event
• Conditionally or unconditionally caused by the execution of the current instruction
• e.g., floating point error

Interrupt and trap events are predefined
• Each interrupt and trap has an associated interrupt vector
• The interrupt vector specifies the handler that should be called when the event occurs (i.e. points to the handler)

Interrupts and traps force the processor to save the current state of execution and transfer control to the handler

04.784 OSDI Overview 32

A process

• An “instantiation” of a program
• System abstraction: the set of resources required for executing a program
  - Execution context(s)
  - Address space
  - File handles, communication endpoints, etc.
  - Register contents (i.e. process execution “state”)
• Historically, all of the above were “lumped” into a single abstraction
• More recently, split into several abstractions
  - Threads, address space, protection domain, etc.

04.784 OSDI Overview 33

OS process management

• Supports user creation/destruction of processes and inter-process communication (IPC)
• Allocates resources to processes according to specific policies
• Interleaves the execution of multiple processes to increase system utilization and permit effective sharing of resources among several users

04.784 OSDI Overview 34

Process image

• The physical representation of a process in the OS
• Requires a process control data structure (the “PCB”, Process Control Block)
  - Identification: process, parent process, user
  - Control: scheduling (state, priority), resources (memory, open files), IPC
• Execution contexts: threads
• An address space consisting of code, data, and stack segments

04.784 OSDI Overview 35

User mode
• When running in user-mode, a process can only access its virtual memory and processor resources (registers) directly
• All other resources can only be accessed indirectly through the kernel by “calling the system”

System call
• A system call is a call because it looks like a procedure call
• In actuality, it’s a software trap

Why is a system call a “trap” instead of a procedure call?
• How it is done
• You end up running OS code that is not part of the user program

04.784 OSDI Overview 36

System calls in a monolithic OS

[Figure: in user mode, read(…) executes a trap; the interrupt vector for the trap instruction leads to the code for the read system call in kernel mode; iret returns to user mode. Labels: PC, PSW]

04.784 OSDI Overview 37

Process creation

• How to create a process? Use a system call (of course)!
• In UNIX, a process can create another process using the fork() system call

    int pid = fork();

• The creating process is called the parent and the new process is called the child
• The child process is created as a copy of the parent process (process image and process control structure) except for the identification and scheduling state
• Parent and child processes run in two different address spaces; by default there is no memory sharing
• Process creation is expensive because of this copying

04.784 OSDI Overview 38

Process creation using fork()

The UNIX shell is a command-line interpreter whose basic purpose is to allow users to run applications on a UNIX system

    cmd arg1 arg2 ... argN

    while (TRUE) {
        get_command(cmd, arguments);
        if (fork() != 0) {      /* parent: wait for the child to finish */
            wait(&status);
        } else {                /* child: replace its image with the command */
            exec(cmd, arguments);
        }
    }
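
The loop body above can be made concrete with the real POSIX calls. The following is a minimal sketch, not the shell's actual code: `run_command` is a hypothetical helper name, `execvp` stands in for the slide's generic `exec`, and the exit code 127 for a failed exec is a common shell convention assumed here.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with the NULL-terminated argument vector argv, wait for it,
   and return its exit status, or -1 on error or abnormal termination. */
int run_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed: no child was created */

    if (pid == 0) {                /* child */
        execvp(argv[0], argv);     /* only returns if the exec failed */
        _exit(127);
    }

    int status;                    /* parent */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A shell would call this once per command line, after splitting the line into `argv`.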

04.784 OSDI Overview 39

Inter-process communication

• Most operating systems provide several abstractions for inter-process communication: message passing, shared memory, etc.
• Communication requires synchronization between processes (i.e. data must be produced before it is consumed)
• Synchronization can be implicit (message passing) or may have to be explicit (shared memory)
• Explicit synchronization can be provided by the OS (semaphores, monitors, etc.) or can be achieved exclusively in user-mode (if processes share memory)
  - Ugly and tedious

04.784 OSDI Overview 40

Threads

• Why limit ourselves to a single execution context?
  - For example, we have to use select() to deal with multiple outstanding events. Having multiple execution contexts is more natural.
  - Nice fit with multiprocessor systems
• Multiple execution contexts: threads
• All the threads of a process share the same address space and the same resources
• Each thread contains
  - An execution state: running, ready, etc.
  - An execution context: PC, SP, other registers
  - A per-thread stack

04.784 OSDI Overview 41

Process address space revisited

[Figure: (a) single-threaded address space: OS, code, (global) data, stack, heap; (b) multi-threaded address space: OS, code, (global) data, heap, and one stack per thread]

04.784 OSDI Overview 42

Threads vs. processes

• Why multiple threads?
  - Can’t we use multiple processes to do whatever we can do with multiple threads?
  - Of course, we need to be able to share memory (and other resources) between multiple processes
  - But this sharing is already supported
• Operations on threads (creation, termination, scheduling, etc.) are cheaper than the corresponding operations on processes
  - This is because thread operations do not involve manipulations of other resources associated with processes (especially memory)
• Inter-thread communication is supported through shared memory without kernel intervention

04.784 OSDI Overview 43

Thread state diagram

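[Figure: thread states ready, running, blocked, and suspended (swapped out). Thread scheduling transitions: dispatch (ready to running), timeout (running to ready), wait for event (running to blocked), event occurred (blocked to ready). Process scheduling transitions: suspend and activate, to and from suspended]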

04.784 OSDI Overview 44

Thread switching

• Typically referred to as a context switch
• Context switching is the act of taking a thread off the processor and replacing it with another one that is waiting to run
• A context switch takes place when
  - The time quota allocated to the executing thread expires
  - The executing thread performs a blocking system call
  - A memory fault occurs due to a page miss
  - Etc.
• How to do a context switch?

04.784 OSDI Overview 45

Thread implementation

• Kernel-level threads (lightweight processes)
  - Kernel sees multiple execution contexts
  - Thread management done by the kernel
• User-level threads
  - Implemented as a thread library which contains the code for thread creation, termination, scheduling and switching
  - Kernel sees one execution context and is unaware of thread activity

04.784 OSDI Overview 46

Threads: user- vs. kernel-level

• Advantages of user-level threads
  - Performance: low-cost thread operations (do not require crossing protection domains)
  - Flexibility: scheduling can be application specific
  - Portability: a user-level thread library is easy to port
• Disadvantages of user-level threads
  - If a user-level thread is blocked in the kernel, the entire process (all threads of that process) is blocked
  - Cannot take advantage of multiprocessing (the kernel assigns one process to only one processor)

04.784 OSDI Overview 47

Synchronization

Why synchronization?

Problem
• Threads (or processes) must (sometimes) share data
• Data integrity must be maintained

Example
• Transfer $10 from account B to account A

    A ← A + 10
    B ← B - 10

• We don’t want anyone to be able to read A and B between the previous two statements

04.784 OSDI Overview 48

Some terminology

• Critical section: a section of code which reads or writes shared data
• Race condition: potential for interleaved execution of a critical section by multiple threads
  - Results are non-deterministic
• Mutual exclusion: synchronization mechanism to avoid race conditions by ensuring exclusive execution of critical sections
• Deadlock: permanent blocking of threads
• Starvation: execution but no progress

04.784 OSDI Overview 49

Requirements for mutex

• No assumptions on hardware: speed, # of processors
• Execution of CS takes a finite time
• A thread/process not in CS cannot prevent other threads/processes from entering the CS
• Entering CS cannot be delayed indefinitely: no deadlock or starvation

04.784 OSDI Overview 50

Synchronization primitives

Most common primitives
• Mutex/locks
• Condition variables
• Semaphores

04.784 OSDI Overview 51

Mutual Exclusion

• Mutual exclusion ≡ want to be the only thread modifying a set of data items
  - Can look at it as exclusive access to data items or to a piece of code
• Has three components: Acquire, Release, Waiting
  - Acquire/release operations are often termed Lock/Unlock
• Example: transferring $10 from B to A

  Using one lock for the transfer code:

    Function Transfer (Amount, A, B)
      Lock(Transfer_Lock)
      A ← A + 10
      B ← B - 10
      Unlock(Transfer_Lock)

  Using one lock per data item:

    Lock(A)
    Lock(B)
    A ← A + 10
    B ← B - 10
    Unlock(B)
    Unlock(A)
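
The single-lock version of the transfer can be written with a pthread mutex. This is a sketch under assumptions: `transfer_lock`, `transfer`, and `total` are hypothetical names, and every reader and writer of the balances is assumed to take the same lock.

```c
#include <pthread.h>

/* One lock guarding every transfer (the slide's Transfer_Lock). */
static pthread_mutex_t transfer_lock = PTHREAD_MUTEX_INITIALIZER;

/* Move `amount` from *from to *to. No thread that also takes
   transfer_lock can observe the state between the two updates. */
void transfer(long amount, long *to, long *from)
{
    pthread_mutex_lock(&transfer_lock);     /* acquire (may wait) */
    *to   += amount;
    *from -= amount;
    pthread_mutex_unlock(&transfer_lock);   /* release */
}

/* A reader takes the same lock, so it always sees a consistent sum. */
long total(const long *a, const long *b)
{
    pthread_mutex_lock(&transfer_lock);
    long sum = *a + *b;
    pthread_mutex_unlock(&transfer_lock);
    return sum;
}
```

A single global lock is simple but serializes all transfers, even between unrelated accounts; per-item locks allow more concurrency at the cost of the deadlock risk discussed on the following slides.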

04.784 OSDI Overview 52

What to do while waiting?

• Spinning
  - Waiting threads keep testing a location until it changes value
  - Not very efficient in uniprocessor systems
• Blocking
  - OS or RT system de-schedules waiting threads
• Spinning vs. blocking becomes an issue in multiprocessor systems
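
As an illustration of spinning, here is a minimal test-and-set spin lock built on C11 atomics. It is a sketch, not a production lock: the names `struct spinlock`, `spin_lock`, `spin_trylock`, and `spin_unlock` are hypothetical, and it has no back-off or fairness.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct spinlock {
    atomic_flag held;   /* set while some thread owns the lock */
};

/* Spin: keep testing (and setting) the flag until we are the one who set it. */
void spin_lock(struct spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;   /* busy-wait; burns CPU while the owner is in its critical section */
}

/* Single attempt: returns true if the lock was acquired. */
bool spin_trylock(struct spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

void spin_unlock(struct spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

On a uniprocessor the spinning thread cannot make progress until it is preempted and the owner runs, which is why spinning is inefficient there; a blocking lock would hand the processor to another thread instead.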

04.784 OSDI Overview 53

Deadlock

[Figure: deadlock involving two locks, A and B (“Lock A”, “Lock B”)]

04.784 OSDI Overview 56

Deadlock (cont’d)

• Deadlock can occur whenever multiple parties are competing for exclusive access to multiple resources
• How can we avoid deadlocks?
  - Deadlock prevention
    How? See a textbook …
    Expensive
    What to do when we discover that a deadlock is about to happen?
  - Deadlock detection and recovery
    How to detect? How to recover?
    Potentially expensive
  - Impose strict ordering on locks
    e.g., if you need to lock both A and B, always lock A first, then lock B
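
A sketch of the strict-ordering idea with per-account locks. The ordering rule used here (lower memory address first) is an assumption chosen for illustration; any fixed total order works. `struct account` and `account_transfer` are hypothetical names.

```c
#include <pthread.h>
#include <stdint.h>

struct account {
    pthread_mutex_t lock;
    long            balance;
};

/* Always take the lock at the lower address first, so two concurrent
   transfers between the same pair of accounts agree on the order. */
static void lock_pair(struct account *x, struct account *y)
{
    if ((uintptr_t)x < (uintptr_t)y) {
        pthread_mutex_lock(&x->lock);
        pthread_mutex_lock(&y->lock);
    } else {
        pthread_mutex_lock(&y->lock);
        pthread_mutex_lock(&x->lock);
    }
}

void account_transfer(struct account *to, struct account *from, long amount)
{
    if (to == from)
        return;                     /* nothing to do; avoids locking twice */
    lock_pair(to, from);
    to->balance   += amount;
    from->balance -= amount;
    pthread_mutex_unlock(&to->lock);
    pthread_mutex_unlock(&from->lock);
}
```

Without the ordering, one thread transferring A to B and another transferring B to A could each hold one lock and wait forever for the other.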

04.784 OSDI Overview 57

Semaphores

• Synchronized counting variables
• Formally, a semaphore comprises:
  - An integer value
  - Two operations: P() and V()
• P()
  - While value = 0, sleep
  - Decrement value and return
• V()
  - Increment value
  - If there are any threads sleeping, waiting for value to become non-zero, wake up at least one thread
• Used around critical sections to implement “locks” (or …)
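
P() and V() as described above can be sketched on top of a pthread mutex and condition variable. This is one possible construction, assumed for illustration; real systems usually provide semaphores directly. The names `struct csem`, `csem_init`, `csem_P`, and `csem_V` are hypothetical.

```c
#include <pthread.h>

struct csem {
    pthread_mutex_t lock;      /* protects value */
    pthread_cond_t  nonzero;   /* signalled when value is incremented */
    unsigned        value;
};

void csem_init(struct csem *s, unsigned initial)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->value = initial;
}

/* P(): while value = 0, sleep; then decrement value and return. */
void csem_P(struct csem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->value == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

/* V(): increment value and wake one sleeper, if any. */
void csem_V(struct csem *s)
{
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->lock);
}
```

Initialized to 1, the semaphore behaves like a lock: P() before the critical section, V() after it.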

04.784 OSDI Overview 58

Condition variables

• A condition variable is always associated with:
  - A condition
  - A lock
• Typically used to wait for the condition to take on a given value
• Three operations:
  - cond_wait(lock, cond_var)
  - cond_signal(cond_var)
  - cond_broadcast(cond_var)

04.784 OSDI Overview 59

Condition variables

• cond_wait(lock, cond_var)
  - Release the lock
  - Sleep on cond_var
  - When awakened by the system, re-acquire the lock and return
• cond_signal(cond_var)
  - If at least one thread is sleeping on cond_var, wake one up
  - Otherwise, no effect
• cond_broadcast(cond_var)
  - If at least one thread is sleeping on cond_var, wake everyone up
  - Otherwise, no effect
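
The usual way to use these operations is to re-check the condition in a loop around the wait, because a woken thread must re-acquire the lock and the condition may have changed again by then. The one-slot mailbox below is a hypothetical example (`struct mailbox`, `mailbox_put`, `mailbox_get` are not from the slides) using the pthreads equivalents of cond_wait and cond_broadcast.

```c
#include <pthread.h>
#include <stdbool.h>

struct mailbox {
    pthread_mutex_t lock;
    pthread_cond_t  changed;   /* broadcast whenever `full` changes */
    bool            full;      /* the condition being waited on */
    int             value;
};

void mailbox_init(struct mailbox *m)
{
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->changed, NULL);
    m->full = false;
    m->value = 0;
}

/* Producer: wait until the slot is empty, then fill it. */
void mailbox_put(struct mailbox *m, int v)
{
    pthread_mutex_lock(&m->lock);
    while (m->full)                               /* re-check after every wake-up */
        pthread_cond_wait(&m->changed, &m->lock); /* releases lock while asleep */
    m->value = v;
    m->full = true;
    pthread_cond_broadcast(&m->changed);
    pthread_mutex_unlock(&m->lock);
}

/* Consumer: wait until the slot is full, then empty it. */
int mailbox_get(struct mailbox *m)
{
    pthread_mutex_lock(&m->lock);
    while (!m->full)
        pthread_cond_wait(&m->changed, &m->lock);
    int v = m->value;
    m->full = false;
    pthread_cond_broadcast(&m->changed);
    pthread_mutex_unlock(&m->lock);
    return v;
}
```

Broadcast is used because producers and consumers share one condition variable here; with two separate condition variables, a signal to the right one would suffice.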

04.784 OSDI Overview 60

Condition variables

• Condition variables are implemented using locks
• Implementation is tricky because it involves multiple locks and a scheduling queue
• Implemented in the OS or run-time thread systems because they involve scheduling operations
  - Sleep/Wake

04.784 OSDI Overview 61

POSIX threads (pthreads)

• Thread creation and termination
    pthread_create(&tid, NULL, start_fn, arg);
    pthread_exit(status);
• Thread join
    pthread_join(tid, &status);
• Mutual exclusion
    pthread_mutex_lock(&lock);
    pthread_mutex_unlock(&lock);
• Condition variable
    pthread_cond_wait(&c, &lock);
    pthread_cond_signal(&c);

Memory Management

04.784 OSDI Overview 63

Memory hierarchy

[Figure: memory hierarchy, top to bottom: registers, cache, memory]

Question: What if we want to support programs that require more memory than is available in the system?

04.784 OSDI Overview 64

Memory hierarchy (2)

[Figure: memory hierarchy, top to bottom: registers, cache, memory, virtual memory]

Answer: Pretend we had something bigger → Virtual Memory

04.784 OSDI Overview 65

Virtual memory: paging

• A page is a cacheable unit of virtual memory
• The OS controls the mapping between pages of VM and “real” memory
  - More flexible (at a cost)

[Figure: the cache/memory pair shown beside the memory/VM pair; a page of VM corresponds to a frame of memory]

04.784 OSDI Overview 66

Two views of memory

• View from the hardware: physical memory
• View from the software: what the program sees
• Memory management in the OS coordinates these two views
  - Consistency: all address spaces can look “basically the same”
  - Relocation: processes can be loaded at any physical address
  - Protection: a process cannot maliciously access memory belonging to another process
  - Sharing: may allow sharing of physical memory (must implement control)

04.784 OSDI Overview 67

Virtual Memory

• Virtual memory is the OS abstraction that gives the programmer the illusion of an address space that may be larger than the physical address space
• Virtual memory can be implemented using either paging or segmentation, but paging is presently most common
• Virtual memory is motivated by both
  - Convenience: the programmer does not have to deal with the fact that individual machines may have very different amounts of physical memory, or with the sharing of memory among many users
  - Fragmentation in multi-programming environments

04.784 OSDI Overview 68

Hardware translation

• Translation from logical to physical addresses can be done in software, but without protection
• Hardware support is needed to ensure protection
• Simplest solution uses two registers: base and size

[Figure: processor → translation box (MMU) → physical memory]

04.784 OSDI Overview 69

Paging hardware

• Pages are of fixed size
• The physical memory corresponding to a page is called a page frame
• Translation is done through a page table indexed by page number
• Each entry in a page table contains the physical frame number that the virtual page is mapped to and the state of the page in memory
• State: valid/invalid, access permission, reference bit, modified bit, caching
• Paging is transparent to the programmer

[Figure: a virtual address is split into page # and offset; the page # indexes the page table, and the result is combined with the offset to form the physical address]
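
The translation step can be sketched in a few lines. The parameters are assumptions chosen for illustration (32-bit addresses, 4 KiB pages, a 16-entry single-level table), and `struct pte` and `translate` are hypothetical names; real hardware does this in the MMU.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                   /* assumed 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NUM_PAGES  16u                   /* assumed tiny address space */

struct pte {
    bool     valid;                      /* is the page in memory? */
    uint32_t frame;                      /* physical frame number */
};

/* Translate vaddr using `table`. Returns true and stores the physical
   address in *paddr, or returns false where hardware would raise a fault. */
bool translate(const struct pte table[NUM_PAGES], uint32_t vaddr, uint32_t *paddr)
{
    uint32_t page   = vaddr >> PAGE_SHIFT;        /* p: index into the table */
    uint32_t offset = vaddr & (PAGE_SIZE - 1);    /* d: unchanged by translation */

    if (page >= NUM_PAGES || !table[page].valid)
        return false;

    *paddr = (table[page].frame << PAGE_SHIFT) | offset;
    return true;
}
```

With page 1 mapped to frame 5, virtual address 0x1234 splits into page 1 and offset 0x234, and translates to 0x5234.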

04.784 OSDI Overview 70

Address translation

[Figure: the CPU issues a virtual address (p, d); page number p indexes the page table to obtain frame number f; the physical address (f, d) is sent to memory]

04.784 OSDI Overview 71

Translation Lookaside Buffers

• Translation happens on every memory access, so it must be fast
• What to do? Caching, of course …
  - Why does caching work? That is, we still have to look up the page table entry and use it to do translation, right?
  - Same as a normal memory cache: the cache is smaller, so we can spend more $$ to make it faster
• The cache for page table entries is called the Translation Lookaside Buffer (TLB)
  - Typically fully associative
  - No more than 64 entries
• Each TLB entry contains a page number and the corresponding PT entry
• On each memory access, we look for the page → frame mapping in the TLB

04.784 OSDI Overview 72

Address translation

[Figure: the CPU issues a virtual address (p, d); the TLB holds p/f pairs and maps page number p to frame number f; the physical address (f, d) is sent to memory]

04.784 OSDI Overview 73

TLB miss

• What if the TLB does not contain the appropriate PT entry?
  - TLB miss
  - Evict an existing entry if there are no free ones
    Replacement policy?
  - Bring in the missing entry from the PT
• TLB misses can be handled in hardware or software
  - Software handling allows the application to assist in replacement decisions
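
A software model of the hit/miss/refill sequence above. Everything here is assumed for illustration: a 4-entry fully associative TLB, round-robin replacement as the policy, and a flat page table given as an array of frame numbers; `struct tlb`, `tlb_init`, and `tlb_lookup` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TLB_ENTRIES 4u   /* assumed size; real TLBs are larger */

struct tlb_entry {
    bool     valid;
    uint32_t page;
    uint32_t frame;
};

struct tlb {
    struct tlb_entry e[TLB_ENTRIES];
    unsigned next;             /* round-robin replacement pointer */
    unsigned hits, misses;     /* statistics */
};

void tlb_init(struct tlb *t)
{
    memset(t, 0, sizeof *t);   /* all entries invalid, counters zero */
}

/* Return the frame for `page`. On a miss, refill from page_table,
   evicting the entry the round-robin pointer selects. */
uint32_t tlb_lookup(struct tlb *t, const uint32_t *page_table, uint32_t page)
{
    /* Fully associative: compare against every entry. */
    for (unsigned i = 0; i < TLB_ENTRIES; i++) {
        if (t->e[i].valid && t->e[i].page == page) {
            t->hits++;
            return t->e[i].frame;
        }
    }

    /* TLB miss: bring in the missing entry from the page table. */
    t->misses++;
    struct tlb_entry *victim = &t->e[t->next];
    t->next = (t->next + 1) % TLB_ENTRIES;
    victim->valid = true;
    victim->page  = page;
    victim->frame = page_table[page];
    return victim->frame;
}
```

Swapping the round-robin pointer for another victim choice is where an application-assisted replacement policy would plug in.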

04.784 OSDI Overview 74

Where to store address space?

• Address space may be larger than physical memory
• Where do we keep it?
• Where do we keep the page table?

04.784 OSDI Overview 75

Where to store address space?

On the next device down our storage hierarchy, of course …

[Figure: storage hierarchy: memory, VM, disk]

04.784 OSDI Overview 76

Where to store page table?

In memory, of course …

[Figure: memory layout showing OS, code, globals, stack, heap, the P1 page table, and the P0 page table]

• Interestingly, we use memory to “enlarge” the view of memory, leaving LESS physical memory
• This kind of overhead is common
• Got to know what the right trade-off is
• Have to understand common application characteristics
• Have to be common enough!

04.784 OSDI Overview 77

Page table structure

• Page table can become huge
• What to do?
  - Two-level PT: saves memory but requires two lookups per access
  - Page the page tables
  - Inverted page tables (one entry per page frame in physical memory): translation through hash tables

[Figure: a master PT points to 2nd-level PTs; within the OS segment, the kernel PT is non-page-able, while the P0 and P1 PTs are page-able]

04.784 OSDI Overview 78

How to deal with VM > RAM?

If address space of each process is ≤ size of physical memory, then no problem

Still useful to deal with fragmentation

When VM is larger than physical memory
  Part stored in memory
  Part stored on disk

How do we make this work?

04.784 OSDI Overview 79

Demand paging

To start a process (program), just load the code page where the process will start executing
As the process references memory (instructions or data) outside of the loaded page, bring in pages as necessary
How to represent the fact that a page of VM is not yet in memory?

[Figure: VM pages A, B, C; page table with valid (v) / invalid (i) bits; page A resident in Memory, pages B and C on Disk]


04.784 OSDI Overview 80

Page fault

What happens when a process references a page marked as invalid in the page table?
  Page fault trap
  Check that the reference is valid
  Find a free memory frame
  Read the desired page from disk
  Change the valid bit of the page to v
  Restart the instruction that was interrupted by the trap

Is it easy to restart an instruction?
What happens if there is no free frame?

04.784 OSDI Overview 81

Page fault (2)

So, what can happen on a memory access?
  TLB miss → read page table entry
  TLB miss → read kernel page table entry
  Page fault for necessary page of process page table
  All frames are used → need to evict a page → modify a process page table entry
    TLB miss → read kernel page table entry
    Page fault for necessary page of process page table
    Uh oh, how deep can this go?
  Read in needed page, modify page table entry, fill TLB


04.784 OSDI Overview 82

Cost of handling a page fault

Trap, check page table, find free memory frame (or find victim) … about 200 - 600 μs
Disk seek and read … about 10 ms
Memory access … about 100 ns
A page fault is ~100,000 times slower than a memory access (10 ms vs. 100 ns)!
  And this doesn’t even count all the additional things that can happen along the way

Better not have too many page faults!
  If we want no more than 10% degradation, we can only have 1 page fault for every 1,000,000 memory accesses
  OS had better do a great job of managing the movement of data between secondary storage and main memory
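The 1-in-1,000,000 figure can be checked with a short calculation. This sketch uses only the slide's round numbers (100 ns per memory access, about 10 ms per page fault) and ignores the 200 - 600 μs of trap handling; the function names are illustrative.

```python
MEMORY_ACCESS_NS = 100         # from the slide
PAGE_FAULT_NS = 10_000_000     # about 10 ms disk seek and read, from the slide

def effective_access_ns(fault_rate):
    """Average access time when a fraction fault_rate of accesses fault."""
    return MEMORY_ACCESS_NS + fault_rate * PAGE_FAULT_NS

def max_fault_rate(degradation):
    """Largest fault rate that keeps the slowdown within `degradation`."""
    return degradation * MEMORY_ACCESS_NS / PAGE_FAULT_NS

ratio = PAGE_FAULT_NS // MEMORY_ACCESS_NS   # how much slower a fault is
budget = max_fault_rate(0.10)               # fault rate allowed for 10% slowdown
```

With these numbers `ratio` is 100,000 and `budget` is one fault per million accesses, matching the slide.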

04.784 OSDI Overview 83

Page replacement

What if there’s no free frame left on a page fault?
  Free a frame that’s currently being used
    Select the frame to be replaced (victim)
    Write victim back to disk
    Change page table to reflect that victim is now invalid
    Read the desired page into the newly freed frame
    Change page table to reflect that new page is now valid
    Restart the faulting instruction

Optimization: do not need to write victim back if it has not been modified (need dirty bit per page).
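The replacement steps and the dirty-bit optimization can be sketched as a small simulation. The `Pager` class, its counters, and the FIFO choice of victim are assumptions made to keep the example short; any victim-selection policy could be substituted.

```python
class Pager:
    """Toy demand pager (hypothetical): counts disk reads and write-backs."""

    def __init__(self, nframes):
        self.nframes = nframes
        self.resident = {}      # page -> dirty bit; dict order = load order
        self.disk_reads = 0
        self.disk_writes = 0

    def access(self, page, write=False):
        if page not in self.resident:               # page fault
            if len(self.resident) >= self.nframes:  # no free frame
                victim = next(iter(self.resident))  # FIFO victim (assumed policy)
                dirty = self.resident.pop(victim)   # victim is now invalid
                if dirty:
                    self.disk_writes += 1           # write back only if modified
            self.disk_reads += 1                    # read the desired page
            self.resident[page] = False             # valid and clean
        if write:
            self.resident[page] = True              # set the dirty bit
```

Evicting a clean page costs one disk transfer instead of two, which is why the dirty bit is worth the extra state.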


04.784 OSDI Overview 84

Page replacement (2)

Highly motivated to find a good replacement policy
  That is, when evicting a page, how do we choose the best victim in order to minimize the page fault rate?

Is there an optimal replacement algorithm?
If yes, what is the optimal page replacement algorithm?
Let’s look at an example:

Suppose we have 3 memory frames and are running a program that has the following reference pattern

7, 0, 1, 2, 0, 3, 0, 4, 2, 3

Suppose we know the reference pattern in advance ...

04.784 OSDI Overview 85

Page replacement (3)

Suppose we know the access pattern in advance
  7, 0, 1, 2, 0, 3, 0, 4, 2, 3
Optimal algorithm is to replace the page that will not be used for the longest period of time
What’s the problem with this algorithm?
Realistic policies try to predict future behavior on the basis of past behavior
  Works because of locality
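A sketch of the optimal policy on the example reference string with 3 frames. The function name `opt_faults` is illustrative. Note that the code needs the entire future of the reference string, which is exactly why the policy is not implementable in a real system.

```python
def opt_faults(refs, nframes):
    """Count page faults under the optimal (farthest-future-use) policy."""
    frames = []
    faults = 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) < nframes:
            frames.append(page)
            continue
        future = refs[i + 1:]

        def next_use(p):
            # Distance to the next reference; never-used-again pages sort last.
            return future.index(p) if p in future else len(future)

        victim = max(frames, key=next_use)
        frames[frames.index(victim)] = page
    return faults

REFS = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3]
optimal = opt_faults(REFS, 3)
```

On this string the optimal policy incurs 6 faults, which is the lower bound against which the realistic policies can be compared.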


04.784 OSDI Overview 86

FIFO

First-in, First-out
  Be fair, let every page live in memory for about the same amount of time, then toss it.

What’s the problem?
  Is this compatible with what we know about behavior of programs?

How does it do on our example?

7, 0, 1, 2, 0, 3, 0, 4, 2, 3
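A sketch of FIFO on the example with 3 frames; `fifo_faults` is an illustrative name. A hit does not change a page's position in the queue, so a heavily used page is evicted as readily as an idle one.

```python
from collections import deque

def fifo_faults(refs, nframes):
    """Count page faults when the oldest-loaded page is always the victim."""
    queue = deque()
    faults = 0
    for page in refs:
        if page in queue:
            continue               # hit: load order is unchanged
        faults += 1
        if len(queue) == nframes:
            queue.popleft()        # evict the page loaded earliest
        queue.append(page)
    return faults

REFS = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3]
fifo = fifo_faults(REFS, 3)
```

FIFO takes 9 faults on this string. It can also fault more when given more frames (Belady's anomaly): the string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 causes 9 faults with 3 frames but 10 with 4.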

04.784 OSDI Overview 87

LRU

Least Recently Used
  On access to a page, timestamp it
  When need to evict a page, choose the one with the oldest timestamp
  What’s the motivation here?

Is LRU optimal?
  In practice, LRU is quite good for most programs

Is it easy to implement?
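A sketch of exact LRU on the example with 3 frames, using an ordered dictionary in place of per-page timestamps (an implementation shortcut assumed here; `lru_faults` is an illustrative name). The catch for a real system is that the bookkeeping runs on every memory reference, which is what makes exact LRU hard to implement without hardware help.

```python
from collections import OrderedDict

def lru_faults(refs, nframes):
    """Count page faults when the least recently used page is the victim."""
    frames = OrderedDict()               # least recently used page first
    faults = 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)     # the "timestamp" update on every access
            continue
        faults += 1
        if len(frames) == nframes:
            frames.popitem(last=False)   # evict the oldest timestamp
        frames[page] = True
    return faults

REFS = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3]
lru = lru_faults(REFS, 3)
```

LRU takes 8 faults on this string, so it is not optimal in general.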


04.784 OSDI Overview 88

Not frequently used (NFU)

Have a reference bit and software counter for each page frame
At each clock interrupt, the OS adds the reference bit of each frame to its counter and then clears the reference bit
When need to evict a page, choose the frame with the lowest counter
What’s the problem?
  Doesn’t forget anything, no sense of time – hard to evict a page that was referenced a lot sometime in the past but is no longer relevant to the computation
  Updating counters is expensive, especially since memory is getting rather large these days

Can be improved with an aging scheme: counters are shifted right before adding the reference bit and the reference bit is added to the leftmost bit (rather than to the rightmost one)
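A sketch of the aging scheme, assuming 8-bit counters; the width and the names `tick` and `choose_victim` are illustrative choices.

```python
COUNTER_BITS = 8   # assumed counter width

def tick(counters, ref_bits):
    """Run at each clock interrupt: age every counter, then clear the bits."""
    for frame in range(len(counters)):
        # Shift right, then put the reference bit into the leftmost position.
        counters[frame] = (counters[frame] >> 1) | (ref_bits[frame] << (COUNTER_BITS - 1))
        ref_bits[frame] = 0

def choose_victim(counters):
    """Evict the frame with the lowest counter."""
    return min(range(len(counters)), key=counters.__getitem__)
```

Because each shift halves the weight of older references, a page that was busy long ago but idle recently ends up with a small counter, which addresses the "doesn't forget anything" problem of plain NFU.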

04.784 OSDI Overview 89

Clock (second-chance)

Arrange physical pages in a circle, with a clock hand
Hardware keeps 1 used bit per frame. Sets used bit on memory reference to a frame.
  If bit is not set, hasn’t been used for a while

On page fault:
  Advance clock hand
  Check used bit
    If 1, has been used recently, clear and go on
    If 0, this is our victim

Can we always find a victim?
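A sketch of the clock algorithm on the earlier example string. Details the slide leaves open are assumptions here: empty frames are filled first, and a newly loaded page starts with its used bit set.

```python
def clock_faults(refs, nframes):
    """Count page faults under the clock (second-chance) policy."""
    frames = [None] * nframes
    used = [False] * nframes
    hand = 0
    faults = 0
    for page in refs:
        if page in frames:
            used[frames.index(page)] = True   # hardware sets the used bit
            continue
        faults += 1
        # Sweep: clear used bits until a frame with used == 0 (or empty) is found.
        while frames[hand] is not None and used[hand]:
            used[hand] = False
            hand = (hand + 1) % nframes
        frames[hand] = page                   # this frame is the victim
        used[hand] = True                     # assumed: loaded pages start as used
        hand = (hand + 1) % nframes
    return faults

REFS = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3]
clock = clock_faults(REFS, 3)
```

The sweep always terminates: in the worst case the hand clears every used bit in one full revolution and then takes the frame it started from, so a victim can always be found.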


04.784 OSDI Overview 90

Nth-chance

Similar to clock algorithm, except
  Maintain a counter as well as a used bit
  On page fault:
    Advance clock hand
    Check used bit
      If 1, clear and set counter to 0
      If 0, increment counter, if counter < N, go on, otherwise, this is our victim

Why?
  N larger → better approximation of LRU

What’s the problem if N is too large?
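A sketch of Nth-chance under the same assumptions as a basic clock simulation (empty frames filled first, new pages start with the used bit set); `nth_chance_faults` is an illustrative name.

```python
def nth_chance_faults(refs, nframes, n):
    """Count page faults under the Nth-chance variant of the clock policy."""
    frames = [None] * nframes
    used = [False] * nframes
    count = [0] * nframes
    hand = 0
    faults = 0
    for page in refs:
        if page in frames:
            used[frames.index(page)] = True
            continue
        faults += 1
        while frames[hand] is not None:
            if used[hand]:
                used[hand] = False        # used recently: clear and reset counter
                count[hand] = 0
            else:
                count[hand] += 1
                if count[hand] >= n:      # passed over N times unused: victim
                    break
            hand = (hand + 1) % nframes
        frames[hand] = page
        used[hand] = True
        count[hand] = 0
        hand = (hand + 1) % nframes
    return faults
```

With N = 1 this behaves exactly like the plain clock algorithm. The inner loop shows the cost of a large N: the hand may need up to N full revolutions over all frames before any counter reaches N.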

04.784 OSDI Overview 91

Multi-programming environment

Why?
  Better utilization of resources (CPU, disks, memory, etc.)

Problems?
  Mechanism – TLB?
  Fairness?
  Over commitment of memory

What’s the potential problem?
  Each process needs its working set to perform well
  If too many processes are running, can have thrashing


04.784 OSDI Overview 92

Support for multiple processes

More than one address space can be loaded in memory
A register points to the current page table
OS updates the register when context switching between threads from different processes
Most TLBs can cache entries from more than one PT
  Store the process id to distinguish between virtual addresses belonging to different processes
If TLB caches entries from only one PT then it must be flushed at process switch time
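The two TLB designs can be contrasted in a short sketch. Both classes and their method names are hypothetical; real TLBs are hardware structures with fixed capacity, which is omitted here.

```python
class TaggedTLB:
    """Entries are keyed by (process id, virtual page), so no flush is needed."""

    def __init__(self):
        self.entries = {}

    def insert(self, pid, vpn, frame):
        self.entries[(pid, vpn)] = frame

    def lookup(self, pid, vpn):
        return self.entries.get((pid, vpn))   # None means TLB miss


class UntaggedTLB:
    """Entries belong to one page table only; flush on every process switch."""

    def __init__(self):
        self.entries = {}

    def insert(self, vpn, frame):
        self.entries[vpn] = frame

    def lookup(self, vpn):
        return self.entries.get(vpn)          # None means TLB miss

    def context_switch(self):
        self.entries.clear()                  # stale translations must go
```

The flush is what makes process switches expensive on an untagged TLB: the incoming process starts with a run of misses until its translations are reloaded.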

04.784 OSDI Overview 93

Sharing

[Figure: processes p1 and p2 with their virtual address spaces; v-to-p memory mappings into physical memory]


Input / Output (I/O)

04.784 OSDI Overview 95

I/O Devices

So far we have talked about how to abstract and manage the CPU and memory
Computation “inside” a computer is useful only if some results are communicated “outside” of the computer
I/O devices are the computer’s interface to the outside world (I/O ≡ Input/Output)
  Example devices: display, keyboard, mouse, speakers, network interface, and disk


04.784 OSDI Overview 96

Basic Computer Structure

[Figure: CPU and Memory on the memory bus; Net interface and Disk on the I/O bus]

04.784 OSDI Overview 97

Intel SR440BX Motherboard

[Figure: CPU; System Bus & MMU/AGP/PCI Controller; I/O Bus; IDE Disk Controller; USB Controller; Another I/O Bus; Serial & Parallel Ports; Keyboard & Mouse]


04.784 OSDI Overview 98

CPU and I/O Device Communication

CPU/Memory ⇒ I/O Devices

How does the CPU communicate with I/O devices?
  Send/receive messages?
  Memory map
    Each I/O device is assigned a portion of the physical address space
    CPU → I/O device
      CPU writes to locations in this area to "talk" to the I/O device
    I/O device → CPU
      Polling: CPU repeatedly checks location(s) in the portion of address space assigned to the device
      Interrupt: device sends an interrupt (on an interrupt line) to get the attention of the CPU
    CPU writing to (or reading from) the address range of a device is called programmed I/O

04.784 OSDI Overview 99

Programmed I/O vs. DMA (1)

Programmed I/O is O.K. for sending commands, receiving status, and communication of a small amount of data
Inefficient for large amounts of data, however
  Keeps CPU busy (doing useless work) during the transfer
  Programmed I/O ≡ memory operations → slow

Direct Memory Access
  Device reads/writes directly from/to memory
  Memory → device typically initiated from CPU
  Device → memory can be initiated by either the device or the CPU


04.784 OSDI Overview 100

Programmed I/O vs. DMA (2)

[Figure: three diagrams of CPU, Memory, and Disk joined by an Interconnect, showing the data path for Programmed I/O; DMA, Device → Memory; DMA, Memory → Device]

04.784 OSDI Overview 101

Device Drivers

OS module controlling an I/O device
Hides the device specifics from the higher layers in the OS/kernel
  Support a common API
  UNIX: block or character device
    Block: device communicates with the CPU/memory in fixed-size blocks
    Character: stream of bytes
Translates logical I/O into device I/O
  E.g. logical disk blocks into {cylinder, head, & sector}
Performs data buffering and scheduling of I/O operations
Structure
  Several synchronous entry points (system calls): device initialization, queue I/O requests, state control, read/write
  An asynchronous entry point to handle interrupts
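The logical-to-device translation mentioned above can be sketched for a disk with a fixed geometry. The function name and the geometry in the usage note (16 heads, 63 sectors per track) are assumptions for illustration; sectors are numbered from 1, following the usual convention.

```python
def lba_to_chs(lba, heads, sectors_per_track):
    """Translate a logical block number into (cylinder, head, sector)."""
    cylinder, remainder = divmod(lba, heads * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1   # sectors conventionally start at 1
```

For example, with 16 heads and 63 sectors per track, logical block 0 maps to (0, 0, 1) and block 1008 (one full cylinder later) maps to (1, 0, 1).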


04.784 OSDI Overview 102

I/O Buffering

I/O Transfer – DMA
  After an I/O request is placed, the source/destination of the I/O transfer (i.e. a buffer) must be “page-fixed”/“pinned” in memory
  To allow the user process to continue (when possible), data is often copied from user address space to kernel buffers, which are also pinned in memory
    OK for write, not for read (no concurrency since waiting for input)
    Copying is expensive (and long block time for read)
      This is the motivation for “asynchronous I/O”

Devices are typically slow compared to CPU
  How do we speed up accesses? Caching, of course …

I/O buffering
  Buffer cache: a buffer in main memory for block devices
  Character queue: follows the producer/consumer model (characters in the queue are read once)

04.784 OSDI Overview 103

Buffer Cache

When an I/O request is made for a block, the buffer cache is checked first
If the data is missing from the cache, it is read into the buffer cache from the device
Exploits locality of reference as does any other cache
Replacement policies similar to those for VM
UNIX
  Historically, UNIX has a buffer cache for the disk which does not share buffers with character/stream devices
  Unfortunately adds overhead in a path that has become increasingly common: disk → NIC
    E.g. file service


04.784 OSDI Overview 104

File Systems

File system is an abstraction of the disk
  Track/sector → files
To a user process
  A file looks like a contiguous block of bytes
  A file system provides a coherent view of a group of files
    Typically also provides protection
API: create, delete, read, write files
Performance: throughput vs. response time
Reliability: goal is to minimize the potential for lost or destroyed data
  E.g. RAID could be implemented in the OS as part of the disk device driver

04.784 OSDI Overview 105

Unix File System

Ordinary files (uninterpreted byte streams)
Directories
  “File of files”
  Organized as a rooted “tree” (actually a DAG)
  Pathnames (relative and absolute)
  Contains links to parent and itself as well as contained files
  Multiple links to files can exist
    Link - hard OR symbolic
Typically tree-structured file hierarchies
  Mounted on existing space by using ‘mount’
  No hard links between different file systems (symbolic links may cross them)


04.784 OSDI Overview 106

Unix File System (2)

[Figure: directory tree rooted at / (root directory) with entries bin, usr, lib, tmp; the same file is reachable as /usr/lib/libc.a or /lib/libc.a]

Basically a tree, but links convert it to a DAG (no cycles!)

04.784 OSDI Overview 107

UNIX File System (3)

[Figure: mapping file systems to disks. Labels: logical file system; file systems (root, swap, bin, usr, usr2); logical disks; physical disks]


04.784 OSDI Overview 108

File Naming

Each file has a unique name
User visible (external) name must be symbolic
  In a hierarchical file system, unique external names are given as pathnames (path from the root to the file)
Internal names: i-node in UNIX - an index into a persistent array of file descriptors/headers for a specific partition
Directory: translation from external to internal names
  May have more than one external name for a single internal name (i.e. “name service”)
Information about file is split between the directory and the file descriptor: size, location on disk, owner, permissions, date created, date last modified, date last access, link count

04.784 OSDI Overview 109

Name Space

In UNIX, “devices are also treated as files”
  E.g. /dev/cdrom, /dev/fd0
  User process accesses devices by accessing the corresponding file in /dev
    Normally hidden from higher level Unix programs

[Figure: name space tree rooted at / with entries dev, A, B; dev contains ttyX and CDROM]


04.784 OSDI Overview 110

File Allocation

Contiguous: a contiguous set of blocks is allocated to a file at the time of file creation
  Good for sequential files
  File size must be known at the time of file creation
  External fragmentation – like memory allocation when giving a contiguous block to each job

Hmm, so what do we do?
  Use a disk block table (remember the page table?)
  Use Indexed allocation to avoid the problem
    No/little fragmentation
    Very flexible - no need to know sizes a priori and can change size dynamically

04.784 OSDI Overview 111

Free Space Management

No policy issues here – just mechanism
Bitmap: one bit for each block on the disk
  Good to find a contiguous group of free blocks
    Files are often accessed sequentially
  Small enough to be kept in memory and therefore fast!
Chained free portions: pointer to the next one
  Not so good for sequential access but very flexible
Index: treats free space as a file from which allocations are made to create/expand other files
  What is the difference in representation between a file that contains useful data and one that does not (i.e. contains free space)? Nothing!
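The bitmap's strength at finding contiguous free blocks can be sketched as a single linear scan. The list-of-integers representation (1 for used, 0 for free) and the name `find_free_run` are assumptions; a real implementation would pack the bits into words.

```python
def find_free_run(bitmap, n):
    """Return the first block number of a run of n free blocks, or None."""
    run_start = 0
    run_len = 0
    for block, used in enumerate(bitmap):
        if used:
            run_len = 0                 # run broken by an allocated block
            continue
        if run_len == 0:
            run_start = block           # a new free run begins here
        run_len += 1
        if run_len == n:
            return run_start
    return None
```

With a chained free list the same query would require following pointers across the disk, which is why the slide rates chaining as poor for sequential allocation.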


04.784 OSDI Overview 112

UNIX I-nodes

[Figure: a directory entry (file name) refers to an i-node holding Mode, Link count, Uid, Gid, Size, Times, direct pointers Disk block 1 … Disk block 12, and Single indirect, Double indirect, and Triple indirect pointers that lead through pointer blocks to data blocks]
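A sketch of how a file's logical block number maps onto the i-node's pointers. The 12 direct pointers follow the figure; the 4 KiB block size and 4-byte block pointers (so 1024 pointers per indirect block) are assumptions, and `locate` is an illustrative name.

```python
NDIRECT = 12             # direct pointers, as in the figure
PTRS_PER_BLOCK = 1024    # assumed: 4 KiB blocks holding 4-byte pointers

def locate(block_index):
    """Return (level, index within that level) for a file's logical block."""
    if block_index < NDIRECT:
        return "direct", block_index
    block_index -= NDIRECT
    for depth, level in enumerate(("single indirect", "double indirect", "triple indirect"), start=1):
        span = PTRS_PER_BLOCK ** depth     # data blocks reachable at this level
        if block_index < span:
            return level, block_index
        block_index -= span
    raise ValueError("block index exceeds maximum file size")

# Largest file, in blocks, that this layout can address.
MAX_BLOCKS = NDIRECT + PTRS_PER_BLOCK + PTRS_PER_BLOCK ** 2 + PTRS_PER_BLOCK ** 3
```

Small files are served entirely by the direct pointers with no extra disk reads, while very large files pay up to three additional pointer-block reads per access.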

04.784 OSDI Overview 113

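UNIX I-nodes (2)

[Figure: per-process file descriptor tables (Process i, Process j, Process k, with parent and child labelled) point into the open file descriptor table, whose entries hold R/W pointers and an i-node ptr; the i-node pointers refer to the i-nodes of active files in memory, which are copies of i-nodes on disk(s)]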


04.784 OSDI Overview 114

File System Buffer Cache

[Figure: layers. application: read/write files; OS: translate file to disk blocks, maintains buffer cache, controls disk accesses (read/write blocks); hardware]

04.784 OSDI Overview 115

File System Buffer Cache

Disks are “stable” while memory is volatile
  What happens if you buffer a write and the machine crashes before the write has been saved to disk?
  Can use write-through but write performance will suffer
    Greater write traffic to slow disk
In UNIX
  Use unbuffered I/O when writing i-nodes or pointer blocks
  Use buffered I/O for other writes and force sync every 30 seconds

What about replacement?
How can we further improve performance?
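The UNIX compromise described above can be sketched as a cache that writes metadata through immediately and holds data writes until a sync. The class `WriteBackCache`, its `metadata` flag, and the `crash` method are hypothetical devices for illustration; a dictionary stands in for the disk.

```python
class WriteBackCache:
    """Toy buffer cache: write-through for metadata, write-back for data."""

    def __init__(self, disk):
        self.disk = disk        # dict: block number -> bytes (stable storage)
        self.cache = {}         # volatile copies in memory
        self.dirty = set()      # blocks modified but not yet on disk

    def write(self, blockno, data, metadata=False):
        self.cache[blockno] = data
        if metadata:
            self.disk[blockno] = data    # i-nodes, pointer blocks: unbuffered
        else:
            self.dirty.add(blockno)      # data blocks: wait for the next sync

    def sync(self):
        for blockno in sorted(self.dirty):
            self.disk[blockno] = self.cache[blockno]
        self.dirty.clear()

    def crash(self):
        self.cache.clear()               # memory contents are lost
        self.dirty.clear()
```

A crash between syncs loses at most the data written since the last sync, while the metadata on disk stays current.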


04.784 OSDI Overview 116

File System Consistency

File systems almost always use a buffer/disk cache for performance reasons
Two copies of a disk block (in the buffer cache and on disk) → consistency problem if the system crashes before all the modified blocks are written back to disk
This problem is critical especially for the blocks that contain control information: i-node, free-list, directory blocks
  This is why we have utility programs for checking block and directory consistency and making repairs after system crashes
Write back critical blocks from the buffer cache to the disk immediately
Data blocks are also written back periodically: sync

04.784 OSDI Overview 117

More on File System Consistency

To maintain file system consistency, the ordering of updates from the buffer cache to the disk is critical
Example: if the directory block is written back before the i-node and the system crashes, the directory structure will be inconsistent
An elaborate solution: use dependencies between blocks containing control data in the buffer cache to specify the ordering of updates


04.784 OSDI Overview 118

Elements of storage management

[Figure: elements between Users and the disk. Labels: File Structure; Records; Block Caches; Controller Caches; Directory management; Access control; Access methods; Disk scheduling; File allocation; Free space management; Buffering; File manipulation]

04.784 OSDI Overview 119

Protection Mechanisms (1)

Files are OS objects with unique names and a finite set of operations that processes can perform on them
A protection domain is a set of {object, rights} pairs, where rights is the permission to perform one of the operations
At each instant in time, each process runs in some protection domain
In Unix, a protection domain is identified by {uid, gid}
The protection domain in Unix is switched when running a program with SETUID/SETGID set or when the process enters kernel mode by issuing a system call
Fundamental issue: how to manage all the protection domains?


04.784 OSDI Overview 120

Protection Mechanisms (2)

Access Control List (ACL): associates with each object a list of all the protection domains that may access the object and what they may do
  In Unix the ACL concept for files is reduced to three protection domains: owner, group and others
    Much smaller and therefore easier to manage
Capability List (C-list): associates with each process a list of objects that may be accessed along with the operations permitted on them
  C-list implementation issues: where/how to store the capabilities (hardware, kernel, encrypted in user space) and how to revoke them and control their distribution to other processes
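The reduced Unix ACL (owner, group, others) can be sketched as a check against the nine permission bits of a file's mode. The function `may_access` is hypothetical and deliberately ignores the superuser and any extended ACLs.

```python
PERMISSION_BIT = {"r": 4, "w": 2, "x": 1}

def may_access(mode, file_uid, file_gid, uid, gids, want):
    """Check one of 'r', 'w', 'x' against a Unix-style permission mode."""
    if uid == file_uid:
        shift = 6          # owner class
    elif file_gid in gids:
        shift = 3          # group class
    else:
        shift = 0          # others
    return bool((mode >> shift) & PERMISSION_BIT[want])
```

Exactly one class is consulted, chosen in the order owner, group, others. With mode 0o640 the owner may read and write, group members may only read, and everyone else is denied.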