cs 519: lecture 2
DESCRIPTION
CS 519: Lecture 2. Processes, Threads, and Synchronization. Assignment 1. Design and implement a threads package (details left out on purpose!) Check the Web page and the newsgroup for further details Groups of 3 students. Anybody still looking for a group? Due 2/12/2001 at 5pm. - PowerPoint PPT PresentationTRANSCRIPT
CS 519: Lecture 2
Processes, Threads, and Synchronization
CS 519Operating System
Theory2
Assignment 1
Design and implement a threads package (details left out on purpose!)
Check the Web page and the newsgroup for further details Groups of 3 students. Anybody still looking for a group? Due 2/12/2001 at 5pm
CS 519Operating System
Theory3
Introduction: Von Neuman Model
Both text (program) and data reside in memory
Execution cycle Fetch instruction Decode instruction Execute instruction
OS is just a program OS text and data reside in memory
too Invoke OS functionality through
procedure calls
CPU
Memory
CS 519Operating System
Theory4
Execution Mode
Most processors support at least two modes of execution for protection reasons Privileged - kernel-mode Non-privileged - user-mode
The portion of the OS that executes in kernel-mode is called the kernel Access hardware resources Protected from interference by user programs
Code running in kernel-mode can do anything - no protection User code executes in user-mode OS functionality that does not need direct access to
hardware may run in user-mode
CS 519Operating System
Theory5
Interrupts and traps
Interrupt: an asynchronous event External events which occur independently of the instruction
execution in the processor, e.g. DMA completion Can be masked (specifically or not)
Trap: a synchronous event Conditionally or unconditionally caused by the execution of the
current instruction, e.g. floating point error, trap instruction Interrupt and trap events are predefined (typically as
integers) Each interrupt and trap has an associated interrupt vector Interrupt vector specifies a handler that should be called when
the event occurs Interrupts and traps force the processor to save the current
state of execution and jump to the handler
CS 519Operating System
Theory6
Process
An “instantiation” of a program System abstraction – the set of resources required for executing
a program Execution context(s) Address space File handles, communication endpoints, etc.
Historically, all of the above “lumped” into a single abstraction More recently, split into several abstractions
Threads, address space, protection domain, etc. OS process management:
Supports creation of processes and interprocess communication (IPC)
Allocates resources to processes according to specific policies Interleaves the execution of multiple processes to increase system
utilization
CS 519Operating System
Theory7
Process Image
The physical representation of a process in the OS
A process control structure includes Identification info: process, parent process, user Control info: scheduling (state, priority), resources
(memory, opened files), IPC Execution contexts – threads An address space consisting of code, data, and stack
segments
CS 519Operating System
Theory8
Process Address Space
OS
Code
Globals
Stack
Heap
A’s activationframe
B
C
A: … B() …
B: … C() …
C:
CS 519Operating System
Theory9
Virtual Memory
Process addresses are virtual memory addresses Mapping from virtual memory to physical
memory Details next lecture Translation: Translation Look-aside Buffer (TLB), page
table
Will cover more next time …
CS 519Operating System
Theory10
User Mode
When running in user-mode, a process can only access its virtual memory and processor resources (registers) directly
All other resources can only be accessed through the kernel by “calling the system” System call A system call is a call because it looks like a procedure
call It’s a system call because, in reality, it’s a software trap
Why is a system call a trap instead of a procedure call?
CS 519Operating System
Theory11
User Mode
When running in user-mode, a process can only access its virtual memory and processor resources (registers) directly
All other resources can only be accessed through the kernel by “calling the system” System call A system call is a call because it looks like a procedure call It’s a system call because, in actuality, it’s a software trap
Why is a system call a trap instead of a procedure call? Need to switch to kernel mode
CS 519Operating System
Theory12
Mode switching
One or several bits in the processor status word (PSW) indicate the current mode of execution Sometimes also the previous mode (Other bits in PSW: interrupt enabled/disabled flags, alignment
check bit, arithmetic overflow bit, parity bit, carry bit, etc.) Mode bits can be changed explicitly while in kernel-mode
but not in user-mode The processor status word can be loaded from the interrupt
vector along with the address of the interrupt handler Setting of kernel mode bit in the interrupt vector allows
interrupt and trap handlers to be “automatically” executed in kernel mode
CS 519Operating System
Theory13
System Call In Monolithic OS
kernel mode
user mode
read(....)
PC PSW
code for read system call
trap
interrupt vector for trap instruction
iret
in-kernel file system(monolithic OS)
CS 519Operating System
Theory14
Signals
OS may need to “upcall” into user processes Signals
UNIX mechanism to upcall when an event of interest occurs
Potentially interesting events are predefined: e.g., segmentation violation, message arrival, kill, etc.
When interested in “handling” a particular event (signal), a process indicates its interest to the OS and gives the OS a procedure that should be invoked in the upcall.
CS 519Operating System
Theory15
Signals (Cont’d)
When an event of interest occurs the kernel handles the event first, then modifies the process’s stack to look as if the process’s code made a procedure call to the signal handler.
When the user process is scheduled next it executes the handler first
From the handler the user process returns to where it was when the event occurred
A
B
A
B
Handler
CS 519Operating System
Theory16
Process Creation
How to create a process? System call. In UNIX, a process can create another process using the
fork() system call int pid = fork()
The creating process is called the parent and the new process is called the child
The child process is created as a copy of the parent process (process image and process control structure) except for the identification and scheduling state
Parent and child processes run in two different address spaces by default no memory sharing
Process creation is expensive because of this copying
CS 519Operating System
Theory17
Example of Process Creation Using Fork
The UNIX shell is command-line interpreter whose basic purpose is for user to run applications on a UNIX system
cmd arg1 arg2 ... argn
CS 519Operating System
Theory18
Inter-Process Communication
Most operating systems provide several abstractions for inter-process communication: message passing, shared memory, etc
Communication requires synchronization between processes (i.e. data must be produced before it is consumed)
Synchronization can be implicit (message passing) or explicit (shared memory)
Explicit synchronization can be provided by the OS (semaphores, monitors, etc) or can be achieved exclusively in user-mode (if processes share memory)
CS 519Operating System
Theory19
Inter-Process Communication
More on shared memory and message passing later
Synchronization after we talk about threads
CS 519Operating System
Theory20
Threads
Why limit ourselves to a single execution context? For example, have to use select() to deal with multiple
outstanding events. Having multiple execution contexts is more natural.
Multiprocessor systems Multiple execution contexts threads All the threads of a process share the same address space
and the same resources Each thread contains
An execution state: running, ready, etc.. An execution context: PC, SP, other registers A per-thread stack
CS 519Operating System
Theory21
Process Address Space Revisited
OS
Code
Globals
Stack
Heap
OS
Code
GlobalsStack
Heap
Stack
(a) Single-threaded address space (b) Multi-threaded address space
CS 519Operating System
Theory22
Threads vs. Processes
Why multiple threads? Can’t we use multiple processes to do whatever it is that we
do with multiple threads? Sure, but we would need to be able to share memory (and other
resources) between multiple processes This sharing is directly supported by threads – see later in the
lecture Operations on threads (creation, termination, scheduling, etc)
are cheaper than the corresponding operations on processes This is because thread operations do not involve manipulations of
other resources associated with processes Inter-thread communication is supported through shared
memory without kernel intervention Why not? Have multiple other resources, why not execution
contexts
CS 519Operating System
Theory23
Process state diagram
ready(in memory)
suspended(swapped out)
swap out swap in
CS 519Operating System
Theory24
Thread State Diagram
ready running
blockedsuspended
dispatch
timeout
wait forevent
event occurred
thread scheduling
activatesuspend
suspend
processscheduling
(swapped out)
CS 519Operating System
Theory25
Thread Switching
Typically referred to as context switching Context switching is the act of taking a thread off
of the processor and replacing it with another one that is waiting to run
A context switch takes place when Time quota allocated to the executing thread expires The executing thread performs a blocking system call A memory fault due to a page miss
CS 519Operating System
Theory26
Context Switching
How to do a context switch? Very carefully!!
Save state of currently executing thread Copy all “live” registers to thread control block Need at least 1 scratch register – points to area of
memory in thread control block that registers should be saved to
Restore state of thread to run next Copy values of live registers from thread control block
to registers How to get into and out of the context switching
code?
CS 519Operating System
Theory27
Context Switching
OS is just code stored in memory, so … Call context switching subroutine The subroutine saves context of current thread, restores
context of the next thread to be executed, and returns The subroutine returns in the context of another (the next)
thread! Eventually, will switch back to the current thread To thread, it appears as if the context switching subroutine just
took a long while to return
CS 519Operating System
Theory28
Thread Switching (Cont’d)
What if switching to a thread of a different process? Context switch to process.
Implications for caches, TLB, page table, etc.? Caches
Physical addresses: no problem Virtual addresses: cache must either have process tag or
must flush cache on context switch TLB
Each entry must have process tag or flush TLB on context switch
Page table Typically have page table pointer (register) that must be
reloaded on context switch
CS 519Operating System
Theory29
Process Swapping
May want to swap out entire process Thrashing if too many processes competing for resources
To swap out a process Suspend all of its threads
Must keep track of whether thread was blocked or ready Copy all of its information to backing store (except for
PCB)
To swap a process back in Copy needed information back into memory, e.g. page
table, thread control blocks Restore each thread to blocked or ready
Must check whether event(s) has (have) already occurred
CS 519Operating System
Theory30
Threads & Signals (Usual Unix Implementation)
A signal is directed to a single thread (we don’t want a signal delivered multiple times), but the handler is associated with the process
Signals can be ignored/masked by threads. Common strategy: one thread responsible to field all signals (calls sigwait() in an infinite loop); all other threads mask all signals
What happens if kernel wants to signal a process when all of its threads are blocked? When there are multiple threads, which thread should the kernel deliver the signal to? OS writes into PCB that a signal should be delivered to the
process If one thread is responsible for all signals, it is scheduled to run
and fields the signal Otherwise, the next time any thread that does not mask the
signal is allowed to run, the signal is delivered to that thread as part of the context switch
CS 519Operating System
Theory31
POSIX Threads (Pthreads) API Standard
thread creation and termination
pthread_create(&tid,NULL,start_fn,arg);
pthread_exit(status)’ thread join
pthread_join(tid, &status); mutual exclusion
pthread_mutex_lock(&lock);
pthread_mutex_unlock(&lock); condition variable
pthread_cond_wait(&c,&lock);
pthread_cond_signal(&c);
CS 519Operating System
Theory32
Thread Implementation
Kernel-level threads (lightweight processes) Kernel sees multiple execution contexts Thread management done by the kernel
User-level threads Implemented as a thread library which contains the
code for thread creation, termination, scheduling and switching
Kernel sees one execution context and is unaware of thread activity
Can be preemptive or not
CS 519Operating System
Theory33
User-Level vs. Kernel-Level Threads
Advantages of user-level threads Performance: low-cost thread operations and context
switching, since it is not necessary to go into the kernel Flexibility: scheduling can be application-specific Portability: user-level thread library easy to port
Disadvantages of user-level threads If a user-level thread is blocked in the kernel, the entire
process (all threads of that process) are blocked Cannot take advantage of multiprocessing (the kernel
assigns one process to only one processor)
CS 519Operating System
Theory34
User-Level vs. Kernel-Level Threads
process
processor
user-levelthreads
threadscheduling
processscheduling
kernel-levelthreads
threadscheduling
kernel
user
processor
threads
threads
processscheduling
CS 519Operating System
Theory35
User-Level vs. Kernel-Level Threads
No reason why we shouldn’t have both (e.g. Solaris)
Most systems now support kernel threads
User-level threads are available as linkable libraries
kernel-levelthreads
processor
user-levelthreads
threadscheduling
threadscheduling
kernel
user
processscheduling
CS 519Operating System
Theory36
Kernel Support for User-Level Threads
Even kernel threads are not quite the right abstraction for supporting user-level threads
Mismatch between where the scheduling information is available (user) and where scheduling on real processors is performed (kernel) When the kernel thread is blocked, the corresponding
physical processor is lost to all user-level threads although there may be some ready to run.
CS 519Operating System
Theory37
Why Kernel Threads Are Not The Right Abstraction
physical processor
kernel thread kernel
user
user-level threads
user-level scheduling
kernel-level schedulingblocked
CS 519Operating System
Theory38
Scheduler Activations
Kernel allocates processors to processes as scheduler activations
Each process contains a user-level thread system which controls the scheduling of the allocated processors
Kernel notifies a process whenever the number of allocated processors changes or when a scheduler activation is blocked due to the user-level thread running on it
The process notifies the kernel when it needs more or fewer scheduler activations (processors)
CS 519Operating System
Theory39
User-Level Threads On Top ofScheduler Activations
physical processor
scheduler activation
kernel
user
user-level threads
user-level scheduling
kernel-level schedulingblocked active
blocked active
CS 519Operating System
Theory40
Thread/Process Operation Latencies
Operation User-level Threads
(s)
Kernel Threads
(s)
Processes (s)
Null fork 34 948 11,300
Signal-wait
37 441 1,840
VAX uniprocessor running UNIX-like OS, 1992.
CS 519Operating System
Theory41
Synchronization
Why synchronization? Problem
Threads share data Data integrity must be maintained
Example Transfer $10 from account A to account B
A A + 10B B - 10
If was taking snapshot of account balances, don’t want to read A and B between the previous two statements
CS 519Operating System
Theory42
Terminology
Critical section: a section of code which reads or writes shared data
Race condition: potential for interleaved execution of a critical section by multiple threads Results are non-deterministic
Mutual exclusion: synchronization mechanism to avoid race conditions by ensuring exclusive execution of critical sections
Deadlock: permanent blocking of threads Livelock: execution but no progress Starvation: one or more threads denied resources
CS 519Operating System
Theory43
Requirements for Mutual Exclusion
No assumptions on hardware: speed, # of processors
Execution of CS takes a finite time A thread/process not in CS cannot prevent other
threads/processes to enter the CS Entering CS cannot de delayed indefinitely: no
deadlock or starvation
CS 519Operating System
Theory44
Synchronization Primitives
Most common primitives Mutual exclusion Condition variables Semaphores Monitors
Need Semaphores Mutual exclusion and condition variables
CS 519Operating System
Theory45
Mutual Exclusion
Lock(A)Lock(B)A A + 10B B - 10Unlock(B)Unlock(A)
Mutual exclusion used when want to be only thread modifying a set of data items
Have three components: Lock, Unlock, Waiting
Example: transferring $10 from A to BFunction Transfer (Amount, A, B) Lock(Transfer_Lock) A A + 10 B B - 10 Unlock(Transfer_Lock)
CS 519Operating System
Theory46
Implementation of Mutual Exclusion
Many read/write algorithms for mutual exclusion See any OS book Disadvantages: difficult to get correct and not very
general
Simple with a little help from the hardware Atomic read-modify-write instructions
Test-and-set Fetch-and-increment Compare-and-swap
CS 519Operating System
Theory47
Atomic Read-Modify-Write Instructions
Test-and-set Read a memory location and, if the value is 0, set it to 1
and return true. Otherwise, return false
Fetch-and-increment Return the current value of a memory location and
increment the value in memory by 1
Compare-and-swap Compare the value of a memory location with an old
value, and if the same, replace with a new value
CS 519Operating System
Theory48
Simple Spin Lock
Test-and-set
Spin_lock(lock){
while (test-and-set(lock) == 0);}
Spin_unlock(lock){
lock = 0;}
CS 519Operating System
Theory49
Load-Locked, Store-Conditional
Load-Linked, Store-Conditional (LLSC) Pair of instructions may be easier to implement in
hardware Load-linked (or load-locked) returns the value of a
memory location Store-conditional stores a new value to the same
memory location if the value of that location has not been changed since the LL. Returns 0 or 1 to indicate success or failure
If a thread is switched out between an LL and an SC, then the SC automatically fails
CS 519Operating System
Theory50
What To Do While Waiting?
Spinning Waiting threads keep testing location until it changes
value Doesn’t work on uniprocessor system
Blocking OS or RT system deschedules waiting threads
Spinning vs. blocking becomes an issue in multiprocessor systems (with > 1 thread/processor)
What is the main trade-off?
CS 519Operating System
Theory51
What To Do While Waiting?
Spinning Waiting threads keep testing location until it changes
value Doesn’t work on uniprocessor system
Blocking OS or RT system deschedules waiting threads
Spinning vs. blocking becomes an issue in multiprocessor systems (with > 1 thread/processor)
What is the main trade-off? Overhead of blocking vs. expected waiting time
CS 519Operating System
Theory52
Deadlock
Lock A Lock B
A B
CS 519Operating System
Theory53
Deadlock
Lock A Lock B
A B
CS 519Operating System
Theory54
Deadlock
Lock A Lock B
A B
CS 519Operating System
Theory55
Deadlock (Cont’d)
Deadlock can occur whenever multiple parties are competing for exclusive access to multiple resources
How can we deal deadlocks? Deadlock prevention
Design a system without one of mutual exclusion, hold and wait, no preemption or circular wait (four necessary conditions)
To prevent circular wait, impose a strict ordering on resources. For instance, if need to lock variables A and B, always lock A first, then lock B
Deadlock avoidance Deny requests that may lead to unsafe states (Banker’s algorithm) Running the algorithm on all resource requests is expensive
Deadlock detection and recovery Check for circular wait periodically. If circular wait is found, abort all
deadlocked processes (extreme solution but very common) Checking for circular wait is expensive
CS 519Operating System
Theory56
Condition Variables
A condition variable is always associated with a condition and a lock
Typically used to wait for a condition to take on a given value
Three operations: cond_wait(lock, cond_var) cond_signal(cond_var) cond_broadcast(cond_var)
CS 519Operating System
Theory57
Condition Variables
cond_wait(lock, cond_var) Release the lock Sleep on cond_var When awaken by the system, reacquire the lock and return
cond_signal(cond_var) If at least 1 thread is sleeping on cond_var, wake 1 up Otherwise, no effect
cond_broadcast(cond_var) If at least 1 thread is sleeping on cond_var, wake everyone
up Otherwise, no effect
CS 519Operating System
Theory58
Producer/Consumer Example
Producer
lock(lock_bp)while (free_bp.is_empty()) cond_wait(lock_bp, cond_freebp_empty)buffer free_bp.get_buffer()unlock(lock_bp)
… produce data in buffer …
lock(lock_bp)data_bp.add_buffer(buffer)cond_signal(cond_databp_empty)unlock(lock_bp)
Consumer
lock(lock_bp)while (data_bp.is_empty()) cond_wait(lock_bp, cond_databp_empty)buffer data_bp.get_buffer()unlock(lock_bp)
… consume data in buffer …
lock(lock_bp)free_bp.add_buffer(buffer)cond_signal(cond_freebp_empty)unlock(lock_bp)
CS 519Operating System
Theory59
Condition Variables
Condition variables are implemented using locks Implementation is tricky because it involves
multiple locks and scheduling queue Implemented in the OS or run-time thread
systems because they involve scheduling operations Sleep/Wakeup
CS 519Operating System
Theory60
Semaphores
Synchronized counting variables A semaphore is comprised of:
An integer value Two operations: P() and V()
P() Decrement value While value < 0, sleep
V() Increments value If there are any threads sleeping, waiting for value to
become zero, wakeup 1 thread
CS 519Operating System
Theory61
Monitors
Semaphores have a few limitations: unstructured, difficult to program correctly. Monitors eliminate these limitations and are as powerful as semaphores
A monitor consists of a software module with one or more procedures, an initialization sequence, and local data (can only be accessed by procedures)
Only one process can execute within the monitor at any one time (mutual exclusion)
Synchronization within the monitor implemented with condition variables (wait/signal) => one queue per condition variable
CS 519Operating System
Theory62
Lock and Wait-free Synchronization (or Mutual Exclusion Considered Harmful)
Problem: Critical sections If a process is delayed or halted in a critical section,
hard or impossible for other processes to make progress
Lock-free (aka non-blocking) concurrent objects Some process will complete an operation on the object
in a finite number of steps
Wait-free concurrent objects Each process attempting to perform an operation on the
concurrent object will complete it in a finite number of steps
Essential difference between these two?
CS 519Operating System
Theory63
Lock and Wait-free Synchronization (or Mutual Exclusion Considered Harmful)
Problem: Critical sections If a process is delayed or halted in a critical section,
hard or impossible for other processes to make progress
Lock-free (aka non-blocking) concurrent objects Some process will complete an operation on the object
in a finite number of steps
Wait-free concurrent objects Each process attempting to perform an operation on the
concurrent object will complete it in a finite number of steps
Essential difference between these two? Latter definition guarantees lack of starvation
CS 519Operating System
Theory64
Herlihy’s Paper
What’s the point? Lock and wait-free synchronization can be very useful for
building multiprocessor systems; they provide progress guarantees
Building lock and wait-free concurrent objects is HARD, so a methodology for implementing these objects is required
Lock and wait-free synchronization provide guarantees but can be rather expensive – lots of copying
M. Greenwald and D. Cheriton. The Synergy between Non-blocking Synchronization and Operating System Structure. In Proceedings of OSDI ’96, Oct, 1996.
CS 519Operating System
Theory65
Single Word Protocol
Fetch_and_add_sync(obj: integer, v: integer) returns(integer) success: boolean := false loop exit when success
old: integer := objnew: integer := old+vsuccess := compare_and_swap(obj, old, new)
end loop return oldend Fetch_and_add_sync
Lock-free but not wait-free!!
CS 519Operating System
Theory66
Small Objects
Small object: occupies 1 block (or less) Basic idea of lock-free object implementation
1. Allocate a new block2. Copy object to new block, making sure copy is consistent3. Modify copied block, i.e. perform operation on copied
block 4. Try to atomically replace pointer to old block with new one
Basic idea of wait-free object implementation Same as before, except that processes should announce
their operations in a shared data structure Any process in step 3 should perform all previously
announced operations, not just its own operations
CS 519Operating System
Theory67
Single Word Protocol
Fetch_and_add_sync(obj: integer, v: integer) returns(integer) success: boolean := false loop exit when success
old: integer := obj new: integer := old+v // allocate, copy and modify
// single word so consistentsuccess := compare_and_swap(obj, old, new) // replace
end loop return oldend Fetch_and_add_sync
Lock-free but not wait-free!!
CS 519Operating System
Theory68
Large Objects
I’ll leave for you to read Basic idea is to make local changes in the data
structure using small object protocol Complicated but not different than locking: e.g., if
you lock an entire tree to modify some nodes, performance will go down the drain