cs 519: lecture 2

CS 519: Lecture 2

Processes, Threads, and Synchronization

CS 519Operating System

Theory2

Assignment 1

Design and implement a threads package (details left out on purpose!)

Check the Web page and the newsgroup for further details Groups of 3 students. Anybody still looking for a group? Due 2/12/2001 at 5pm


Theory3

Introduction: Von Neuman Model

Both text (program) and data reside in memory

Execution cycle Fetch instruction Decode instruction Execute instruction

OS is just a program OS text and data reside in memory

too Invoke OS functionality through

procedure calls

CPU

Memory


Theory4

Execution Mode

Most processors support at least two modes of execution for protection reasons Privileged - kernel-mode Non-privileged - user-mode

The portion of the OS that executes in kernel-mode is called the kernel Access hardware resources Protected from interference by user programs

Code running in kernel-mode can do anything - no protection User code executes in user-mode OS functionality that does not need direct access to

hardware may run in user-mode


Theory5

Interrupts and traps

Interrupt: an asynchronous event External events which occur independently of the instruction

execution in the processor, e.g. DMA completion Can be masked (specifically or not)

Trap: a synchronous event Conditionally or unconditionally caused by the execution of the

current instruction, e.g. floating point error, trap instruction Interrupt and trap events are predefined (typically as

integers) Each interrupt and trap has an associated interrupt vector Interrupt vector specifies a handler that should be called when

the event occurs Interrupts and traps force the processor to save the current

state of execution and jump to the handler


Theory6

Process

An “instantiation” of a program System abstraction – the set of resources required for executing

a program Execution context(s) Address space File handles, communication endpoints, etc.

Historically, all of the above “lumped” into a single abstraction More recently, split into several abstractions

Threads, address space, protection domain, etc. OS process management:

Supports creation of processes and interprocess communication (IPC)

Allocates resources to processes according to specific policies Interleaves the execution of multiple processes to increase system

utilization


Theory7

Process Image

The physical representation of a process in the OS

A process control structure includes Identification info: process, parent process, user Control info: scheduling (state, priority), resources

(memory, opened files), IPC Execution contexts – threads An address space consisting of code, data, and stack

segments


Theory8

Process Address Space

OS

Code

Globals

Stack

Heap

A’s activationframe

B

C

A: … B() …

B: … C() …

C:


Theory9

Virtual Memory

Process addresses are virtual memory addresses Mapping from virtual memory to physical

memory Details next lecture Translation: Translation Look-aside Buffer (TLB), page

table

Will cover more next time …


Theory10

User Mode

When running in user-mode, a process can only access its virtual memory and processor resources (registers) directly

All other resources can only be accessed through the kernel by “calling the system” System call A system call is a call because it looks like a procedure

call It’s a system call because, in reality, it’s a software trap

Why is a system call a trap instead of a procedure call?


Theory11

User Mode

When running in user-mode, a process can only access its virtual memory and processor resources (registers) directly

All other resources can only be accessed through the kernel by “calling the system” System call A system call is a call because it looks like a procedure call It’s a system call because, in actuality, it’s a software trap

Why is a system call a trap instead of a procedure call? Need to switch to kernel mode


Theory12

Mode switching

One or several bits in the processor status word (PSW) indicate the current mode of execution Sometimes also the previous mode (Other bits in PSW: interrupt enabled/disabled flags, alignment

check bit, arithmetic overflow bit, parity bit, carry bit, etc.) Mode bits can be changed explicitly while in kernel-mode

but not in user-mode The processor status word can be loaded from the interrupt

vector along with the address of the interrupt handler Setting of kernel mode bit in the interrupt vector allows

interrupt and trap handlers to be “automatically” executed in kernel mode


Theory13

System Call In Monolithic OS

kernel mode

user mode

read(....)

PC PSW

code for read system call

trap

interrupt vector for trap instruction

iret

in-kernel file system(monolithic OS)


Theory14

Signals

OS may need to “upcall” into user processes Signals

UNIX mechanism to upcall when an event of interest occurs

Potentially interesting events are predefined: e.g., segmentation violation, message arrival, kill, etc.

When interested in “handling” a particular event (signal), a process indicates its interest to the OS and gives the OS a procedure that should be invoked in the upcall.


Theory15

Signals (Cont’d)

When an event of interest occurs the kernel handles the event first, then modifies the process’s stack to look as if the process’s code made a procedure call to the signal handler.

When the user process is scheduled next it executes the handler first

From the handler the user process returns to where it was when the event occurred

A

B

A

B

Handler


Theory16

Process Creation

How to create a process? System call. In UNIX, a process can create another process using the

fork() system call int pid = fork()

The creating process is called the parent and the new process is called the child

The child process is created as a copy of the parent process (process image and process control structure) except for the identification and scheduling state

Parent and child processes run in two different address spaces by default no memory sharing

Process creation is expensive because of this copying


Theory17

Example of Process Creation Using Fork

The UNIX shell is command-line interpreter whose basic purpose is for user to run applications on a UNIX system

cmd arg1 arg2 ... argn


Theory18

Inter-Process Communication

Most operating systems provide several abstractions for inter-process communication: message passing, shared memory, etc

Communication requires synchronization between processes (i.e. data must be produced before it is consumed)

Synchronization can be implicit (message passing) or explicit (shared memory)

Explicit synchronization can be provided by the OS (semaphores, monitors, etc) or can be achieved exclusively in user-mode (if processes share memory)


Theory19

Inter-Process Communication

More on shared memory and message passing later

Synchronization after we talk about threads


Theory20

Threads

Why limit ourselves to a single execution context? For example, have to use select() to deal with multiple

outstanding events. Having multiple execution contexts is more natural.

Multiprocessor systems Multiple execution contexts threads All the threads of a process share the same address space

and the same resources Each thread contains

An execution state: running, ready, etc.. An execution context: PC, SP, other registers A per-thread stack


Theory21

Process Address Space Revisited

OS

Code

Globals

Stack

Heap

OS

Code

GlobalsStack

Heap

Stack

(a) Single-threaded address space (b) Multi-threaded address space


Theory22

Threads vs. Processes

Why multiple threads? Can’t we use multiple processes to do whatever it is that we

do with multiple threads? Sure, but we would need to be able to share memory (and other

resources) between multiple processes This sharing is directly supported by threads – see later in the

lecture Operations on threads (creation, termination, scheduling, etc)

are cheaper than the corresponding operations on processes This is because thread operations do not involve manipulations of

other resources associated with processes Inter-thread communication is supported through shared

memory without kernel intervention Why not? Have multiple other resources, why not execution

contexts


Theory23

Process state diagram

ready(in memory)

suspended(swapped out)

swap out swap in


Theory24

Thread State Diagram

ready running

blockedsuspended

dispatch

timeout

wait forevent

event occurred

thread scheduling

activatesuspend

suspend

processscheduling

(swapped out)


Theory25

Thread Switching

Typically referred to as context switching Context switching is the act of taking a thread off

of the processor and replacing it with another one that is waiting to run

A context switch takes place when Time quota allocated to the executing thread expires The executing thread performs a blocking system call A memory fault due to a page miss


Theory26

Context Switching

How to do a context switch? Very carefully!!

Save state of currently executing thread Copy all “live” registers to thread control block Need at least 1 scratch register – points to area of

memory in thread control block that registers should be saved to

Restore state of thread to run next Copy values of live registers from thread control block

to registers How to get into and out of the context switching

code?


Theory27

Context Switching

OS is just code stored in memory, so … Call context switching subroutine The subroutine saves context of current thread, restores

context of the next thread to be executed, and returns The subroutine returns in the context of another (the next)

thread! Eventually, will switch back to the current thread To thread, it appears as if the context switching subroutine just

took a long while to return


Theory28

Thread Switching (Cont’d)

What if switching to a thread of a different process? Context switch to process.

Implications for caches, TLB, page table, etc.? Caches

Physical addresses: no problem Virtual addresses: cache must either have process tag or

must flush cache on context switch TLB

Each entry must have process tag or flush TLB on context switch

Page table Typically have page table pointer (register) that must be

reloaded on context switch


Theory29

Process Swapping

May want to swap out entire process Thrashing if too many processes competing for resources

To swap out a process Suspend all of its threads

Must keep track of whether thread was blocked or ready Copy all of its information to backing store (except for

PCB)

To swap a process back in Copy needed information back into memory, e.g. page

table, thread control blocks Restore each thread to blocked or ready

Must check whether event(s) has (have) already occurred


Theory30

Threads & Signals (Usual Unix Implementation)

A signal is directed to a single thread (we don’t want a signal delivered multiple times), but the handler is associated with the process

Signals can be ignored/masked by threads. Common strategy: one thread responsible to field all signals (calls sigwait() in an infinite loop); all other threads mask all signals

What happens if kernel wants to signal a process when all of its threads are blocked? When there are multiple threads, which thread should the kernel deliver the signal to? OS writes into PCB that a signal should be delivered to the

process If one thread is responsible for all signals, it is scheduled to run

and fields the signal Otherwise, the next time any thread that does not mask the

signal is allowed to run, the signal is delivered to that thread as part of the context switch


Theory31

POSIX Threads (Pthreads) API Standard

thread creation and termination

pthread_create(&tid,NULL,start_fn,arg);

pthread_exit(status)’ thread join

pthread_join(tid, &status); mutual exclusion

pthread_mutex_lock(&lock);

pthread_mutex_unlock(&lock); condition variable

pthread_cond_wait(&c,&lock);

pthread_cond_signal(&c);


Theory32

Thread Implementation

Kernel-level threads (lightweight processes) Kernel sees multiple execution contexts Thread management done by the kernel

User-level threads Implemented as a thread library which contains the

code for thread creation, termination, scheduling and switching

Kernel sees one execution context and is unaware of thread activity

Can be preemptive or not


Theory33

User-Level vs. Kernel-Level Threads

Advantages of user-level threads Performance: low-cost thread operations and context

switching, since it is not necessary to go into the kernel Flexibility: scheduling can be application-specific Portability: user-level thread library easy to port

Disadvantages of user-level threads If a user-level thread is blocked in the kernel, the entire

process (all threads of that process) are blocked Cannot take advantage of multiprocessing (the kernel

assigns one process to only one processor)


Theory34


process

processor

user-levelthreads

threadscheduling

processscheduling

kernel-levelthreads

threadscheduling

kernel

user

processor

threads

threads

processscheduling


Theory35


No reason why we shouldn’t have both (e.g. Solaris)

Most systems now support kernel threads

User-level threads are available as linkable libraries

kernel-levelthreads

processor

user-levelthreads

threadscheduling

threadscheduling

kernel

user

processscheduling


Theory36

Kernel Support for User-Level Threads

Even kernel threads are not quite the right abstraction for supporting user-level threads

Mismatch between where the scheduling information is available (user) and where scheduling on real processors is performed (kernel) When the kernel thread is blocked, the corresponding

physical processor is lost to all user-level threads although there may be some ready to run.


Theory37

Why Kernel Threads Are Not The Right Abstraction

physical processor

kernel thread kernel

user

user-level threads

user-level scheduling

kernel-level schedulingblocked


Theory38

Scheduler Activations

Kernel allocates processors to processes as scheduler activations

Each process contains a user-level thread system which controls the scheduling of the allocated processors

Kernel notifies a process whenever the number of allocated processors changes or when a scheduler activation is blocked due to the user-level thread running on it

The process notifies the kernel when it needs more or fewer scheduler activations (processors)


Theory39

User-Level Threads On Top ofScheduler Activations

physical processor

scheduler activation

kernel

user

user-level threads

user-level scheduling

kernel-level schedulingblocked active

blocked active


Theory40

Thread/Process Operation Latencies

Operation User-level Threads

(s)

Kernel Threads

(s)

Processes (s)

Null fork 34 948 11,300

Signal-wait

37 441 1,840

VAX uniprocessor running UNIX-like OS, 1992.


Theory41

Synchronization

Why synchronization? Problem

Threads share data Data integrity must be maintained

Example Transfer $10 from account A to account B

A A + 10B B - 10

If was taking snapshot of account balances, don’t want to read A and B between the previous two statements


Theory42

Terminology

Critical section: a section of code which reads or writes shared data

Race condition: potential for interleaved execution of a critical section by multiple threads Results are non-deterministic

Mutual exclusion: synchronization mechanism to avoid race conditions by ensuring exclusive execution of critical sections

Deadlock: permanent blocking of threads Livelock: execution but no progress Starvation: one or more threads denied resources


Theory43

Requirements for Mutual Exclusion

No assumptions on hardware: speed, # of processors

Execution of CS takes a finite time A thread/process not in CS cannot prevent other

threads/processes to enter the CS Entering CS cannot de delayed indefinitely: no

deadlock or starvation


Theory44

Synchronization Primitives

Most common primitives Mutual exclusion Condition variables Semaphores Monitors

Need Semaphores Mutual exclusion and condition variables


Theory45

Mutual Exclusion

Lock(A)Lock(B)A A + 10B B - 10Unlock(B)Unlock(A)

Mutual exclusion used when want to be only thread modifying a set of data items

Have three components: Lock, Unlock, Waiting

Example: transferring $10 from A to BFunction Transfer (Amount, A, B) Lock(Transfer_Lock) A A + 10 B B - 10 Unlock(Transfer_Lock)


Theory46

Implementation of Mutual Exclusion

Many read/write algorithms for mutual exclusion See any OS book Disadvantages: difficult to get correct and not very

general

Simple with a little help from the hardware Atomic read-modify-write instructions

Test-and-set Fetch-and-increment Compare-and-swap


Theory47

Atomic Read-Modify-Write Instructions

Test-and-set Read a memory location and, if the value is 0, set it to 1

and return true. Otherwise, return false

Fetch-and-increment Return the current value of a memory location and

increment the value in memory by 1

Compare-and-swap Compare the value of a memory location with an old

value, and if the same, replace with a new value


Theory48

Simple Spin Lock

Test-and-set

Spin_lock(lock){

while (test-and-set(lock) == 0);}

Spin_unlock(lock){

lock = 0;}


Theory49

Load-Locked, Store-Conditional

Load-Linked, Store-Conditional (LLSC) Pair of instructions may be easier to implement in

hardware Load-linked (or load-locked) returns the value of a

memory location Store-conditional stores a new value to the same

memory location if the value of that location has not been changed since the LL. Returns 0 or 1 to indicate success or failure

If a thread is switched out between an LL and an SC, then the SC automatically fails


Theory50

What To Do While Waiting?

Spinning Waiting threads keep testing location until it changes

value Doesn’t work on uniprocessor system

Blocking OS or RT system deschedules waiting threads

Spinning vs. blocking becomes an issue in multiprocessor systems (with > 1 thread/processor)

What is the main trade-off?


Theory51

What To Do While Waiting?

Spinning Waiting threads keep testing location until it changes

value Doesn’t work on uniprocessor system

Blocking OS or RT system deschedules waiting threads

Spinning vs. blocking becomes an issue in multiprocessor systems (with > 1 thread/processor)

What is the main trade-off? Overhead of blocking vs. expected waiting time


Theory52

Deadlock

Lock A Lock B

A B


Theory53

Deadlock

Lock A Lock B

A B


Theory54

Deadlock

Lock A Lock B

A B


Theory55

Deadlock (Cont’d)

Deadlock can occur whenever multiple parties are competing for exclusive access to multiple resources

How can we deal deadlocks? Deadlock prevention

Design a system without one of mutual exclusion, hold and wait, no preemption or circular wait (four necessary conditions)

To prevent circular wait, impose a strict ordering on resources. For instance, if need to lock variables A and B, always lock A first, then lock B

Deadlock avoidance Deny requests that may lead to unsafe states (Banker’s algorithm) Running the algorithm on all resource requests is expensive

Deadlock detection and recovery Check for circular wait periodically. If circular wait is found, abort all

deadlocked processes (extreme solution but very common) Checking for circular wait is expensive


Theory56

Condition Variables

A condition variable is always associated with a condition and a lock

Typically used to wait for a condition to take on a given value

Three operations: cond_wait(lock, cond_var) cond_signal(cond_var) cond_broadcast(cond_var)


Theory57

Condition Variables

cond_wait(lock, cond_var) Release the lock Sleep on cond_var When awaken by the system, reacquire the lock and return

cond_signal(cond_var) If at least 1 thread is sleeping on cond_var, wake 1 up Otherwise, no effect

cond_broadcast(cond_var) If at least 1 thread is sleeping on cond_var, wake everyone

up Otherwise, no effect


Theory58

Producer/Consumer Example

Producer

lock(lock_bp)while (free_bp.is_empty()) cond_wait(lock_bp, cond_freebp_empty)buffer free_bp.get_buffer()unlock(lock_bp)

… produce data in buffer …

lock(lock_bp)data_bp.add_buffer(buffer)cond_signal(cond_databp_empty)unlock(lock_bp)

Consumer

lock(lock_bp)while (data_bp.is_empty()) cond_wait(lock_bp, cond_databp_empty)buffer data_bp.get_buffer()unlock(lock_bp)

… consume data in buffer …

lock(lock_bp)free_bp.add_buffer(buffer)cond_signal(cond_freebp_empty)unlock(lock_bp)


Theory59

Condition Variables

Condition variables are implemented using locks Implementation is tricky because it involves

multiple locks and scheduling queue Implemented in the OS or run-time thread

systems because they involve scheduling operations Sleep/Wakeup


Theory60

Semaphores

Synchronized counting variables A semaphore is comprised of:

An integer value Two operations: P() and V()

P() Decrement value While value < 0, sleep

V() Increments value If there are any threads sleeping, waiting for value to

become zero, wakeup 1 thread


Theory61

Monitors

Semaphores have a few limitations: unstructured, difficult to program correctly. Monitors eliminate these limitations and are as powerful as semaphores

A monitor consists of a software module with one or more procedures, an initialization sequence, and local data (can only be accessed by procedures)

Only one process can execute within the monitor at any one time (mutual exclusion)

Synchronization within the monitor implemented with condition variables (wait/signal) => one queue per condition variable


Theory62

Lock and Wait-free Synchronization (or Mutual Exclusion Considered Harmful)

Problem: Critical sections If a process is delayed or halted in a critical section,

hard or impossible for other processes to make progress

Lock-free (aka non-blocking) concurrent objects Some process will complete an operation on the object

in a finite number of steps

Wait-free concurrent objects Each process attempting to perform an operation on the

concurrent object will complete it in a finite number of steps

Essential difference between these two?


Theory63

Lock and Wait-free Synchronization (or Mutual Exclusion Considered Harmful)

Problem: Critical sections If a process is delayed or halted in a critical section,

hard or impossible for other processes to make progress

Lock-free (aka non-blocking) concurrent objects Some process will complete an operation on the object

in a finite number of steps

Wait-free concurrent objects Each process attempting to perform an operation on the

concurrent object will complete it in a finite number of steps

Essential difference between these two? Latter definition guarantees lack of starvation


Theory64

Herlihy’s Paper

What’s the point? Lock and wait-free synchronization can be very useful for

building multiprocessor systems; they provide progress guarantees

Building lock and wait-free concurrent objects is HARD, so a methodology for implementing these objects is required

Lock and wait-free synchronization provide guarantees but can be rather expensive – lots of copying

M. Greenwald and D. Cheriton. The Synergy between Non-blocking Synchronization and Operating System Structure. In Proceedings of OSDI ’96, Oct, 1996.


Theory65

Single Word Protocol

Fetch_and_add_sync(obj: integer, v: integer) returns(integer) success: boolean := false loop exit when success

old: integer := objnew: integer := old+vsuccess := compare_and_swap(obj, old, new)

end loop return oldend Fetch_and_add_sync

Lock-free but not wait-free!!


Theory66

Small Objects

Small object: occupies 1 block (or less) Basic idea of lock-free object implementation

1. Allocate a new block2. Copy object to new block, making sure copy is consistent3. Modify copied block, i.e. perform operation on copied

block 4. Try to atomically replace pointer to old block with new one

Basic idea of wait-free object implementation Same as before, except that processes should announce

their operations in a shared data structure Any process in step 3 should perform all previously

announced operations, not just its own operations


Theory67

Single Word Protocol

Fetch_and_add_sync(obj: integer, v: integer) returns(integer) success: boolean := false loop exit when success

old: integer := obj new: integer := old+v // allocate, copy and modify

// single word so consistentsuccess := compare_and_swap(obj, old, new) // replace

end loop return oldend Fetch_and_add_sync

Lock-free but not wait-free!!


Theory68

Large Objects

I’ll leave for you to read Basic idea is to make local changes in the data

structure using small object protocol Complicated but not different than locking: e.g., if

you lock an entire tree to modify some nodes, performance will go down the drain

cs 519: lecture 2

Documents