Advanced Programming. Rabie A. Ramadan. Lecture 7: Multithreading, An Overview. (Some of the slides are ...)
Processing Elements
Simple classification by Flynn: (No. of instruction and data streams)
• SISD - conventional
• SIMD - data parallel, vector computing
• MISD - systolic arrays
• MIMD - very general, multiple approaches
Current focus is on the MIMD model, using general-purpose processors (no shared memory).
SISD : A Conventional Computer
Speed is limited by the rate at which the computer can transfer information internally.
[Figure: a single processor with one instruction stream, taking data input and producing data output]
Ex: PC, Macintosh, Workstations
The MISD Architecture
More of an intellectual exercise than a practical configuration. A few have been built, but none are commercially available.
SIMD Architecture
Ex: CRAY machine vector processing, Ci ← Ai * Bi
[Figure: one instruction stream feeding Processors A, B, and C, each with its own data input stream and data output stream]
MIMD Architecture
Unlike SISD and MISD machines, an MIMD computer works asynchronously.
Shared memory (tightly coupled) MIMD
Distributed memory (loosely coupled) MIMD
[Figure: Processors A, B, and C, each with its own instruction stream and its own data input and output streams, connected to memory by a bus]
Shared Memory MIMD machine
Communication: a source PE writes data to global memory and the destination PE retrieves it.
Easy to build; conventional OSes for SISD machines can easily be ported.
Limitations: reliability and expandability. A memory component or processor failure affects the whole system, and increasing the number of processors leads to memory contention.
Ex.: Silicon Graphics supercomputers...
[Figure: Processors A, B, and C connected through memory buses to a shared Global Memory System]
Distributed Memory MIMD
Communication: based on a high-speed network. The network can be configured as a tree, mesh, cube, etc.
Unlike shared-memory MIMD:
• Easily/readily expandable
• Highly reliable (a CPU failure does not affect the whole system)
[Figure: Processors A, B, and C, each with its own memory bus and local memory system (Memory System A, B, C)]
Single and Multithreaded Processes
[Figure: a single-threaded process with a single instruction stream, versus a multithreaded process with multiple instruction streams: several threads of execution sharing a common address space]
OS: Multi-Processing, Multi-Threaded
• Better response times in multiple-application environments
• Higher throughput for parallelizable applications
• Threaded libraries, multi-threaded I/O
[Figure: several applications scheduled across one CPU versus across multiple CPUs]
Multi-threading, continued...
A multi-threaded OS enables parallel, scalable I/O.
[Figure: multiple applications issuing I/O requests to the OS kernel running across several CPUs]
Multiple, independent I/O requests can be satisfied simultaneously because all the major disk, tape, and network drivers have been multi-threaded, allowing any given driver to run on multiple CPUs simultaneously.
Applications Could Have One or More Processes
A program in execution consists of three components:
• An executable program
• Associated data needed by the program
• The execution context of the program: all the information the operating system needs to manage the process
What are Threads?
A thread is a piece of code that can execute concurrently with other threads.
It is a schedulable entity on a processor.
[Figure: a thread object holding local state, global/shared state, and a hardware context: registers, status word, and program counter]
What is a Thread ? A single sequential flow of control
A unit of concurrent execution.
Multiple threads can exist within the same process and share memory resources (processes, on the other hand, each have their own address space).
All programs have at least one thread, called the "main thread".
Thread Resources
Each thread has its own:
• Program Counter (point of execution)
• Control Stack (procedure call/return)
• Data Stack (local variables)
All threads share:
• Heap (objects): dynamically allocated memory for the process
• Program code
• Class and instance variables
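A minimal C sketch of this split (not from the original slides; the names worker, shared_results, and my_index are made up for illustration): each thread keeps its index and partial sum on its own stack, while the results array in shared memory is visible to every thread.

#include <pthread.h>
#include <stdio.h>

int shared_results[2];                  /* shared: globals/heap, visible to all threads */

void *worker(void *arg)
{
    int my_index = *(int *)arg;         /* private: lives on this thread's stack        */
    int local_sum = 0;                  /* private: never seen by the other thread      */
    for (int i = 0; i <= 10; i++)
        local_sum += i;
    shared_results[my_index] = local_sum;   /* publish the result through shared memory */
    return NULL;
}

int main(void)
{
    pthread_t t[2];
    int idx[2] = { 0, 1 };
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &idx[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    printf("%d %d\n", shared_results[0], shared_results[1]);   /* prints: 55 55 */
    return 0;
}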
Threaded Process Model
[Figure: threaded process model, showing each thread's stack alongside the thread data, thread text, and shared memory of the process]
Threads within a process are independently executable. Since all threads are part of the same process, communication between them is easier and simpler.
The Multi-Threading Concept
A threading library creates threads and assigns processor time to each thread.
[Figure: the threads T0, T1, and T2 of Task A time-sliced on a uniprocessor]
The Multi-Threading in Multi-Processors
[Figure: the threads T0, T1, and T2 of Task A distributed across Processors 1 through 4, running in parallel]
Why multiple Threads?
Speeding up the computations
• Two threads each solve half of the problem, then their results are combined (see the sketch after this list)
Improving Responsiveness
• One thread computes while another handles the user interface
• One thread loads an image from the net while the other computes
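As a rough sketch of the first point (hypothetical names such as sum_part and struct part; not from the slides), each thread sums one half of an array and the main thread combines the two partial results:

#include <pthread.h>
#include <stdio.h>

#define N 1000
static int data[N];

struct part { int start, end; long sum; };

void *sum_part(void *arg)
{
    struct part *p = arg;
    p->sum = 0;
    for (int i = p->start; i < p->end; i++)
        p->sum += data[i];              /* each thread touches only its own half */
    return NULL;
}

int main(void)
{
    for (int i = 0; i < N; i++) data[i] = i;

    struct part lo = { 0,     N / 2, 0 };
    struct part hi = { N / 2, N,     0 };
    pthread_t t1, t2;

    pthread_create(&t1, NULL, sum_part, &lo);
    pthread_create(&t2, NULL, sum_part, &hi);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("total = %ld\n", lo.sum + hi.sum);   /* combine the two partial results */
    return 0;
}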
Why multiple Threads?
Performing housekeeping tasks
• One thread does garbage collection while the other computes
• One thread rebalances the search tree while the other uses the tree
Performing multiuser tasks
• Several threads run animations simultaneously (as an example)
Scheduling
The scheduler is the part of the Operating System that determines which thread runs next.
Two types of schedulers:
• Pre-emptive: can interrupt the running thread
• Cooperative: a thread must voluntarily yield
Most modern OSes are pre-emptive.
Thread Life Cycle
New state: At this point, the thread is considered not alive.
Runnable (ready-to-run) state: entered when start() is invoked, but the thread is not actually running yet. The scheduler is aware of the thread, which may be scheduled some time later.
Running state: The thread is currently executing.
Dead state: Once a thread reaches this state, it can never run again.
Blocked state: A thread enters this state while waiting for resources that are held by another thread.
Software Models for Multithreaded Programming
Boss/worker model
Work crew model
Pipelining model
Combinations of models
Boss/Worker Model
One thread functions as the boss; it assigns tasks to worker threads to perform.
Each worker performs a different task until it has finished, at which point it notifies the boss that it is ready to receive another task.
Alternatively, the boss polls workers periodically to see whether or not each worker is ready to receive another task.
A variation of the boss/worker model is the work queue model: the boss places tasks in a queue, and workers check the queue and take tasks to perform.
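A minimal sketch of the work queue variation, assuming POSIX threads (the names boss_put, worker, and the fixed-size queue are made up for illustration): the boss thread places task numbers in a queue, and worker threads take and perform them.

/* Work-queue sketch: the boss pushes task numbers, workers pop and run them. */
#include <pthread.h>
#include <stdio.h>

#define QSIZE 16
static int queue[QSIZE];
static int head = 0, tail = 0, count = 0;
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  qcond = PTHREAD_COND_INITIALIZER;

static void boss_put(int task)
{
    pthread_mutex_lock(&qlock);
    queue[tail] = task;                /* boss places a task in the queue          */
    tail = (tail + 1) % QSIZE;         /* (never overflows here: 13 items total)   */
    count++;
    pthread_cond_signal(&qcond);       /* wake one waiting worker                  */
    pthread_mutex_unlock(&qlock);
}

static void *worker(void *arg)
{
    long id = (long)arg;
    for (;;) {
        pthread_mutex_lock(&qlock);
        while (count == 0)             /* nothing to do: wait for the boss         */
            pthread_cond_wait(&qcond, &qlock);
        int task = queue[head];
        head = (head + 1) % QSIZE;
        count--;
        pthread_mutex_unlock(&qlock);

        if (task < 0) break;           /* negative task number means "shut down"   */
        printf("worker %ld handles task %d\n", id, task);
    }
    return NULL;
}

int main(void)
{
    pthread_t w[3];
    for (long i = 0; i < 3; i++)
        pthread_create(&w[i], NULL, worker, (void *)i);

    for (int t = 1; t <= 10; t++) boss_put(t);   /* boss assigns real tasks        */
    for (int i = 0; i < 3; i++)  boss_put(-1);   /* one shutdown marker per worker */

    for (int i = 0; i < 3; i++) pthread_join(w[i], NULL);
    return 0;
}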
Work Crew Model
Multiple threads work together on a single task. The task is divided horizontally into pieces that are performed in parallel.
Each thread performs one piece.
Example: Group of people cleaning a building. Each person cleans certain rooms or performs certain types of work (washing floors, polishing furniture, and so forth), and each works independently.
Pipelining Model A task is divided vertically into steps. The steps must be performed in sequence to produce a single
instance of the desired result. The work done in each step (except for the first and last) is
based on the previous step and is a prerequisite for the work in the next step.
Combinations of Models
You may find it appropriate to combine the software models in a single program if your task is complex.
Bad News Multithreaded programs are hard to write
Hard to Understand
They are incredibly hard to debug
Anyone who thinks that concurrent programming is easy should have his/her threads examined.
Threads Assumptions
Threads are executed in any order
• They do not necessarily alternate line by line
• Bugs may show up rarely
• Bugs may be hard to reproduce
More than one thread may try to change memory at the same time
• Assumptions about the order of execution do not apply
• (E.g.) What is the value of i after i = 1?
Memory Conflicts
When two threads access the same memory location, they can conflict with each other.
The resulting state may be unexpected or wrong.
E.g., two threads may try to increment the same counter.
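For instance, a minimal sketch (assuming POSIX threads; the names bump and counter are hypothetical) in which two threads increment the same counter without any protection. Because counter++ is a read-modify-write sequence, interleaved executions can lose updates:

#include <pthread.h>
#include <stdio.h>

static int counter = 0;                 /* shared, deliberately unprotected */

static void *bump(void *arg)
{
    for (int i = 0; i < 100000; i++)
        counter++;                      /* read-modify-write: not atomic */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, bump, NULL);
    pthread_create(&t2, NULL, bump, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* Expected 200000, but interleaved increments often lose updates. */
    printf("counter = %d\n", counter);
    return 0;
}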
Terminology
Critical section: a section of code which reads or writes shared data.
Race condition: potential for interleaved execution of a critical section by multiple threads; results are non-deterministic.
Mutual exclusion: synchronization mechanism to avoid race conditions by ensuring exclusive execution of critical sections.
Deadlock: permanent blocking of threads.
Starvation: one or more threads are denied resources; without those resources, the program can never finish its task.
Four requirements for Deadlock
Mutual exclusion
• Only one thread at a time can use a resource.
Hold and wait
• A thread holding at least one resource is waiting to acquire additional resources held by other threads.
No preemption
• Resources are released only voluntarily by the thread holding them, after the thread is finished with them.
Circular wait
• There exists a set {T1, …, Tn} of waiting threads
• T1 is waiting for a resource that is held by T2
• T2 is waiting for a resource that is held by T3
• …
• Tn is waiting for a resource that is held by T1
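A minimal sketch of how these conditions can combine (assuming POSIX threads; the locks A and B and the thread functions are made up for illustration): each thread holds one lock and waits for the other, forming a circular wait.

/* Two threads acquire the same two locks in opposite order: each can end up
   holding one lock and waiting forever for the other (circular wait). */
#include <pthread.h>

pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t B = PTHREAD_MUTEX_INITIALIZER;

void *t1(void *arg)
{
    pthread_mutex_lock(&A);     /* holds A ...             */
    pthread_mutex_lock(&B);     /* ... and waits for B     */
    /* critical section */
    pthread_mutex_unlock(&B);
    pthread_mutex_unlock(&A);
    return NULL;
}

void *t2(void *arg)
{
    pthread_mutex_lock(&B);     /* holds B ...                          */
    pthread_mutex_lock(&A);     /* ... and waits for A: possible deadlock */
    /* critical section */
    pthread_mutex_unlock(&A);
    pthread_mutex_unlock(&B);
    return NULL;
}

int main(void)
{
    pthread_t x, y;
    pthread_create(&x, NULL, t1, NULL);
    pthread_create(&y, NULL, t2, NULL);
    pthread_join(x, NULL);      /* may never return if the deadlock occurs */
    pthread_join(y, NULL);
    return 0;
}
/* Fix: break the circular wait by acquiring A before B in every thread. */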
Mutex Locks
If a data item is shared by a number of threads, race conditions can occur if the shared item is not protected properly.
The easiest protection mechanism is a lock.
Before a thread accesses the set of data items, it acquires the lock.
Once the lock is successfully acquired, the thread becomes the owner of that lock and the lock is locked.
Then the owner can access the protected items. After this, the owner must release the lock, and the lock becomes unlocked.
Another thread can then acquire the lock.
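A minimal sketch of this discipline with a POSIX mutex (the names balance, balance_lock, and deposit_100 are hypothetical): the same kind of shared update as in the earlier race example, but now every access happens between acquire and release, so no updates are lost.

#include <pthread.h>
#include <stdio.h>

static int balance = 0;                           /* shared data item            */
static pthread_mutex_t balance_lock = PTHREAD_MUTEX_INITIALIZER;

static void *deposit_100(void *arg)
{
    for (int i = 0; i < 100; i++) {
        pthread_mutex_lock(&balance_lock);        /* acquire: become the owner   */
        balance += 1;                             /* access the protected item   */
        pthread_mutex_unlock(&balance_lock);      /* release: others may acquire */
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, deposit_100, NULL);
    pthread_create(&t2, NULL, deposit_100, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("balance = %d\n", balance);            /* always 200 with the lock */
    return 0;
}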
Mutex Locks
The use of a lock simply establishes a critical section.
Before entering a critical section, a thread acquires a lock.
If it is successful, the thread enters the critical section and the lock is locked.
As a result, all subsequent acquire requests are queued until the lock is unlocked.
Mutex Lock Restrictions
Only the owner can release the lock.
• Imagine the following situation. Suppose thread A is the current owner of lock L and thread B is a second thread that wants to lock the lock. If a non-owner could unlock a lock, thread B could unlock the lock that thread A owns; hence, either both threads may be executing in the same critical section, or thread B preempts thread A and executes the instructions of the critical section.
Recursive lock acquisition is not allowed.
• The current owner of the lock is not allowed to acquire the same lock again.
Mutex Example: The Dining Philosophers Problem
Imagine five philosophers who spend their lives just thinking and eating.
In the middle of the dining room is a circular table with five chairs.
The table has a big plate of spaghetti. However, there are only five chopsticks available.
Each philosopher thinks. When he gets hungry, he sits down and picks up the two chopsticks that are closest to him.
If a philosopher can pick up both chopsticks, he eats for a while.
After a philosopher finishes eating, he puts down the chopsticks and starts to think.
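A sketch of this setup with POSIX mutexes, one per chopstick (the names are made up for illustration). To keep the sketch from deadlocking it also applies one standard fix that is not part of the problem statement: every philosopher picks up the lower-numbered chopstick first, which breaks the circular wait.

#include <pthread.h>
#include <stdio.h>

#define N 5
pthread_mutex_t chopstick[N];           /* one mutex per chopstick */

void *philosopher(void *arg)
{
    long i = (long)arg;
    long left  = i;
    long right = (i + 1) % N;
    /* Always pick up the lower-numbered chopstick first; this breaks the
       circular wait, so the five philosophers cannot all deadlock. */
    long first  = left < right ? left : right;
    long second = left < right ? right : left;

    for (int round = 0; round < 3; round++) {
        /* think ... */
        pthread_mutex_lock(&chopstick[first]);
        pthread_mutex_lock(&chopstick[second]);
        printf("philosopher %ld eats\n", i);    /* eat for a while */
        pthread_mutex_unlock(&chopstick[second]);
        pthread_mutex_unlock(&chopstick[first]);
    }
    return NULL;
}

int main(void)
{
    pthread_t p[N];
    for (int i = 0; i < N; i++) pthread_mutex_init(&chopstick[i], NULL);
    for (long i = 0; i < N; i++) pthread_create(&p[i], NULL, philosopher, (void *)i);
    for (int i = 0; i < N; i++) pthread_join(p[i], NULL);
    return 0;
}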
C++ Language Support for Synchronization
Languages with exceptions like C++
• Languages that support exceptions are problematic (easy to make a non-local exit without releasing lock)
• Consider:
void Rtn() {
  lock.acquire();
  ...
  DoFoo();
  ...
  lock.release();
}

void DoFoo() {
  ...
  if (exception) throw errException;
  ...
}

Notice that an exception in DoFoo() will exit without releasing the lock.
C++ Language Support for Synchronization (con’t)
Must catch all exceptions in critical sections.
• Catch exceptions, release the lock, and re-throw the exception:

void Rtn() {
  lock.acquire();
  try {
    ...
    DoFoo();
    ...
  } catch (...) {       // catch exception
    lock.release();     // release lock
    throw;              // re-throw the exception
  }
  lock.release();
}

void DoFoo() {
  ...
  if (exception) throw errException;
  ...
}
Even better: the auto_ptr<T> facility (see the C++ spec).
It can release the lock regardless of how the function exits.
Java Language Support for Synchronization
Java has explicit support for threads and thread synchronization. Bank Account example:
class Account {
  private int balance;

  // object constructor
  public Account(int initialBalance) {
    balance = initialBalance;
  }

  public synchronized int getBalance() {
    return balance;
  }

  public synchronized void deposit(int amount) {
    balance += amount;
  }
}

Every object has an associated lock which gets automatically acquired and released on entry to and exit from a synchronized method.
Condition Variables (CV)
A condition variable allows a thread to block its own execution until some shared data reaches a particular state.
A condition variable is a synchronization object used in conjunction with a mutex.
A mutex controls access to shared data;
A condition variable allows threads to wait for that data to enter a defined state.
A mutex is combined with CV to avoid the race condition.
Condition Variable
Waiting and signaling on condition variables: routines
• pthread_cond_wait(condition, mutex)
  • Blocks the thread until the specified condition is signalled
  • Should be called with the mutex locked
  • Automatically releases the mutex lock while it waits
  • When it returns (the condition was signaled), the mutex is locked again
• pthread_cond_signal(condition)
  • Wakes up a thread waiting on the condition variable
  • Called after the mutex is locked; the caller must unlock the mutex afterwards
• pthread_cond_broadcast(condition)
  • Used when multiple threads are blocked on the condition
Condition Variable – for signaling
Think of the producer-consumer problem.
Producers and consumers run in separate threads.
Producer produces data and consumer consumes data.
Producer has to inform the consumer when data is available
Consumer has to inform producer when buffer space is available
First attempt, without a condition variable (the consumer must busy-wait):

/* Globals */
int data_avail = 0;
pthread_mutex_t data_mutex = PTHREAD_MUTEX_INITIALIZER;

void *producer(void *arg)
{
    pthread_mutex_lock(&data_mutex);
    /* produce data and insert it into the queue */
    data_avail = 1;
    pthread_mutex_unlock(&data_mutex);
    return NULL;
}

void *consumer(void *arg)
{
    while (!data_avail)
        ;   /* do nothing: keep looping (busy-wait)!! */

    pthread_mutex_lock(&data_mutex);
    /* extract data from queue;
       if (queue is empty)
           data_avail = 0;  */
    pthread_mutex_unlock(&data_mutex);
    consume_data();
    return NULL;
}
The same example with a condition variable (the consumer sleeps instead of spinning):

int data_avail = 0;
pthread_mutex_t data_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t data_cond = PTHREAD_COND_INITIALIZER;

void *producer(void *arg)
{
    pthread_mutex_lock(&data_mutex);
    /* produce data and insert it into the queue */
    data_avail = 1;
    pthread_cond_signal(&data_cond);   /* wake up a waiting consumer */
    pthread_mutex_unlock(&data_mutex);
    return NULL;
}

void *consumer(void *arg)
{
    pthread_mutex_lock(&data_mutex);
    while (!data_avail) {
        /* sleep on the condition variable; the mutex is released while waiting */
        pthread_cond_wait(&data_cond, &data_mutex);
    }
    /* woken up, with the mutex locked again:
       extract data from queue;
       if (queue is empty)
           data_avail = 0;  */
    pthread_mutex_unlock(&data_mutex);
    consume_data();
    return NULL;
}
So far ….
Condition variable: a queue of threads waiting for something inside a critical section.
Key idea: allow sleeping inside the critical section by atomically releasing the lock at the time we go to sleep.
Lock: provides mutual exclusion for shared data:
• Always acquire it before accessing the shared data structure
• Always release it after finishing with the shared data
Semaphore: an extension to mutex locks.
A semaphore is an object with two methods, Wait and Signal, a private integer counter, and a private queue (of threads).
Example Assume that in our corporate print room, we have 5 printers online.
Our print spool manager allocates a semaphore set with 5 semaphores in it, one for each printer on the system.
Since each printer is only physically capable of printing one job at a time, each of our five semaphores will be initialized to a value of 1 (one), meaning that they are all online, and accepting requests.
John sends a print request to the spooler. The print manager looks at the semaphore set and finds the first semaphore whose value is one. Before sending John's request to the physical device, the print manager decrements the semaphore for the corresponding printer by one. Now, that semaphore's value is zero.
Example A value of zero represents 100% resource utilization on that
semaphore. In our example, no other request can be sent to that printer until it is no longer equal to zero.
When John's print job has completed, the print manager increments the value of the semaphore which corresponds to the printer. Its value is now back up to one (1), which means it is available again.
Semaphore: synchronized counting variables.
Formally, a semaphore comprises:
• An integer value
• Two operations: P() and V()
P() (e.g. the consumer), also known as wait():
• While value == 0, sleep
• Decrement value
V() (e.g. the producer), also known as signal():
• Increment value
• If any threads are sleeping, waiting for the value to become non-zero, wake up at least one thread
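A minimal sketch using POSIX semaphores (the names printers and print_job are hypothetical): sem_wait plays the role of P()/wait() and sem_post the role of V()/signal(). Unlike the print-room example above, which uses one semaphore per printer, this sketch collapses them into a single counting semaphore initialized to 5, so at most five jobs print at once.

#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>

sem_t printers;                         /* counting semaphore: number of free printers */

void *print_job(void *arg)
{
    long id = (long)arg;
    sem_wait(&printers);                /* P(): sleep while value == 0, then decrement */
    printf("job %ld is printing\n", id);
    sem_post(&printers);                /* V(): increment, wake a sleeping thread      */
    return NULL;
}

int main(void)
{
    pthread_t jobs[8];
    sem_init(&printers, 0, 5);          /* five printers available */
    for (long i = 0; i < 8; i++)
        pthread_create(&jobs[i], NULL, print_job, (void *)i);
    for (int i = 0; i < 8; i++)
        pthread_join(jobs[i], NULL);
    sem_destroy(&printers);
    return 0;
}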