Advanced Programming. Rabie A. Ramadan. Lecture 7: Multithreading, An Overview. (Some of the slides are ...)
Processing Elements
Simple classification by Flynn: (No. of instruction and data streams)
• SISD - conventional
• SIMD - data parallel, vector computing
• MISD - systolic arrays
• MIMD - very general, multiple approaches
Current focus is on the MIMD model, using general-purpose processors (no shared memory).
SISD : A Conventional Computer
Speed is limited by the rate at which the computer can transfer information internally.
[Figure: a single processor with one instruction stream, taking data input and producing data output]
Ex: PC, Macintosh, Workstations
The MISD Architecture
More of an intellectual exercise than a practical configuration. A few have been built, but none are commercially available.
SIMD Architecture
Ex: CRAY machine vector processing, Ci ← Ai * Bi
[Figure: one instruction stream feeding Processors A, B, and C, each with its own data input stream and data output stream]
MIMD Architecture
Unlike SISD and MISD machines, an MIMD computer works asynchronously.
Shared memory (tightly coupled) MIMD
Distributed memory (loosely coupled) MIMD
[Figure: Processors A, B, and C, each with its own instruction stream and its own data input and output streams, connected to memory by a bus]
Shared Memory MIMD machine
Communication: a source PE writes data to global memory and the destination PE retrieves it.
Easy to build; conventional OSes for SISD machines can easily be ported.
Limitations: reliability and expandability. A memory component or processor failure affects the whole system, and increasing the number of processors leads to memory contention.
Ex.: Silicon Graphics supercomputers...
[Figure: Processors A, B, and C connected through memory buses to a shared Global Memory System]
Distributed Memory MIMD
Communication: based on a high-speed network. The network can be configured as a tree, mesh, cube, etc.
Unlike shared-memory MIMD:
• Easily/readily expandable
• Highly reliable (a CPU failure does not affect the whole system)
[Figure: Processors A, B, and C, each with its own memory bus and local memory system (Memory System A, B, C)]
Single and Multithreaded Processes
[Figure: a single-threaded process with a single instruction stream, versus a multithreaded process with multiple instruction streams: several threads of execution sharing a common address space]
OS: Multi-Processing, Multi-Threaded
• Better response times in multiple-application environments
• Higher throughput for parallelizable applications
• Threaded libraries, multi-threaded I/O
[Figure: several applications scheduled across one CPU versus across multiple CPUs]
Multi-threading, continued...
A multi-threaded OS enables parallel, scalable I/O.
[Figure: multiple applications issuing I/O requests to the OS kernel running across several CPUs]
Multiple, independent I/O requests can be satisfied simultaneously because all the major disk, tape, and network drivers have been multi-threaded, allowing any given driver to run on multiple CPUs simultaneously.
Applications Could Have One or More Processes
A program in execution consists of three components:
• An executable program
• Associated data needed by the program
• The execution context of the program: all the information the operating system needs to manage the process
What are Threads?
A thread is a piece of code that can execute concurrently with other threads.
It is a schedulable entity on a processor.
[Figure: a thread object holding local state, global/shared state, and a hardware context: registers, status word, and program counter]
What is a Thread ? A single sequential flow of control
A unit of concurrent execution.
Multiple threads can exist within the same process and share memory resources (processes, on the other hand, each have their own address space).
All programs have at least one thread, called the "main thread".
Thread Resources
Each thread has its own:
• Program Counter (point of execution)
• Control Stack (procedure call/return)
• Data Stack (local variables)
All threads share:
• Heap (objects): dynamically allocated memory for the process
• Program code
• Class and instance variables
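A minimal C sketch of this split (not from the original slides; the names worker, shared_results, and my_index are made up for illustration): each thread keeps its index and partial sum on its own stack, while the results array in shared memory is visible to every thread.

#include <pthread.h>
#include <stdio.h>

int shared_results[2];                  /* shared: globals/heap, visible to all threads */

void *worker(void *arg)
{
    int my_index = *(int *)arg;         /* private: lives on this thread's stack        */
    int local_sum = 0;                  /* private: never seen by the other thread      */
    for (int i = 0; i <= 10; i++)
        local_sum += i;
    shared_results[my_index] = local_sum;   /* publish the result through shared memory */
    return NULL;
}

int main(void)
{
    pthread_t t[2];
    int idx[2] = { 0, 1 };
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &idx[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    printf("%d %d\n", shared_results[0], shared_results[1]);   /* prints: 55 55 */
    return 0;
}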
Threaded Process Model
[Figure: threaded process model, showing each thread's stack alongside the thread data, thread text, and shared memory of the process]
Threads within a process are independently executable. Since all threads are part of the same process, communication between them is easier and simpler.
The Multi-Threading Concept
A threading library creates threads and assigns processor time to each thread.
[Figure: the threads T0, T1, and T2 of Task A time-sliced on a uniprocessor]
The Multi-Threading in Multi-Processors
[Figure: the threads T0, T1, and T2 of Task A distributed across Processors 1 through 4, running in parallel]
Why multiple Threads?
Speeding up the computations
• Two threads each solve half of the problem, then their results are combined (see the sketch after this list)
Improving Responsiveness
• One thread computes while another handles the user interface
• One thread loads an image from the net while the other computes
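As a rough sketch of the first point (hypothetical names such as sum_part and struct part; not from the slides), each thread sums one half of an array and the main thread combines the two partial results:

#include <pthread.h>
#include <stdio.h>

#define N 1000
static int data[N];

struct part { int start, end; long sum; };

void *sum_part(void *arg)
{
    struct part *p = arg;
    p->sum = 0;
    for (int i = p->start; i < p->end; i++)
        p->sum += data[i];              /* each thread touches only its own half */
    return NULL;
}

int main(void)
{
    for (int i = 0; i < N; i++) data[i] = i;

    struct part lo = { 0,     N / 2, 0 };
    struct part hi = { N / 2, N,     0 };
    pthread_t t1, t2;

    pthread_create(&t1, NULL, sum_part, &lo);
    pthread_create(&t2, NULL, sum_part, &hi);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("total = %ld\n", lo.sum + hi.sum);   /* combine the two partial results */
    return 0;
}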
Why multiple Threads?
Performing housekeeping tasks
• One thread does garbage collection while the other computes
• One thread rebalances the search tree while the other uses the tree
Performing multiuser tasks
• Several threads run animations simultaneously (as an example)
Scheduling
The scheduler is the part of the Operating System that determines which thread runs next.
Two types of schedulers:
• Pre-emptive: can interrupt the running thread
• Cooperative: a thread must voluntarily yield
Most modern OSes are pre-emptive.
Thread Life Cycle
New state: At this point, the thread is considered not alive.
Runnable (ready-to-run) state: entered when start() is invoked, but the thread is not actually running yet. The scheduler is aware of the thread, which may be scheduled some time later.
Running state: The thread is currently executing.
Dead state: Once a thread reaches this state, it can never run again.
Blocked state: A thread enters this state while waiting for resources that are held by another thread.
Software Models for Multithreaded Programming
Boss/worker model
Work crew model
Pipelining model
Combinations of models
Boss/Worker Model
One thread functions as the boss; it assigns tasks to worker threads to perform.
Each worker performs a different task until it has finished, at which point it notifies the boss that it is ready to receive another task.
Alternatively, the boss polls workers periodically to see whether or not each worker is ready to receive another task.
A variation of the boss/worker model is the work queue model: the boss places tasks in a queue, and workers check the queue and take tasks to perform.
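A minimal sketch of the work queue variation, assuming POSIX threads (the names boss_put, worker, and the fixed-size queue are made up for illustration): the boss thread places task numbers in a queue, and worker threads take and perform them.

/* Work-queue sketch: the boss pushes task numbers, workers pop and run them. */
#include <pthread.h>
#include <stdio.h>

#define QSIZE 16
static int queue[QSIZE];
static int head = 0, tail = 0, count = 0;
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  qcond = PTHREAD_COND_INITIALIZER;

static void boss_put(int task)
{
    pthread_mutex_lock(&qlock);
    queue[tail] = task;                /* boss places a task in the queue          */
    tail = (tail + 1) % QSIZE;         /* (never overflows here: 13 items total)   */
    count++;
    pthread_cond_signal(&qcond);       /* wake one waiting worker                  */
    pthread_mutex_unlock(&qlock);
}

static void *worker(void *arg)
{
    long id = (long)arg;
    for (;;) {
        pthread_mutex_lock(&qlock);
        while (count == 0)             /* nothing to do: wait for the boss         */
            pthread_cond_wait(&qcond, &qlock);
        int task = queue[head];
        head = (head + 1) % QSIZE;
        count--;
        pthread_mutex_unlock(&qlock);

        if (task < 0) break;           /* negative task number means "shut down"   */
        printf("worker %ld handles task %d\n", id, task);
    }
    return NULL;
}

int main(void)
{
    pthread_t w[3];
    for (long i = 0; i < 3; i++)
        pthread_create(&w[i], NULL, worker, (void *)i);

    for (int t = 1; t <= 10; t++) boss_put(t);   /* boss assigns real tasks        */
    for (int i = 0; i < 3; i++)  boss_put(-1);   /* one shutdown marker per worker */

    for (int i = 0; i < 3; i++) pthread_join(w[i], NULL);
    return 0;
}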
Work Crew Model
Multiple threads work together on a single task. The task is divided horizontally into pieces that are performed in parallel.
Each thread performs one piece.
Example: Group of people cleaning a building. Each person cleans certain rooms or performs certain types of work (washing floors, polishing furniture, and so forth), and each works independently.
Pipelining Model A task is divided vertically into steps. The steps must be performed in sequence to produce a single
instance of the desired result. The work done in each step (except for the first and last) is
based on the previous step and is a prerequisite for the work in the next step.
Combinations of Models
You may find it appropriate to combine the software models in a single program if your task is complex.
Bad News Multithreaded programs are hard to write
Hard to Understand
They are incredibly hard to debug
Anyone who thinks that concurrent programming is easy should have his/her threads examined.
Threads Assumptions
Threads are executed in any order
• They do not necessarily alternate line by line
• Bugs may show up rarely
• Bugs may be hard to reproduce
More than one thread may try to change memory at the same time
• Assumptions about the order of execution do not apply
• (E.g.) What is the value of i after i = 1?
Memory Conflicts
When two threads access the same memory location, they can conflict with each other.
The resulting state may be unexpected or wrong.
E.g., two threads may try to increment the same counter.
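For instance, a minimal sketch (assuming POSIX threads; the names bump and counter are hypothetical) in which two threads increment the same counter without any protection. Because counter++ is a read-modify-write sequence, interleaved executions can lose updates:

#include <pthread.h>
#include <stdio.h>

static int counter = 0;                 /* shared, deliberately unprotected */

static void *bump(void *arg)
{
    for (int i = 0; i < 100000; i++)
        counter++;                      /* read-modify-write: not atomic */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, bump, NULL);
    pthread_create(&t2, NULL, bump, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* Expected 200000, but interleaved increments often lose updates. */
    printf("counter = %d\n", counter);
    return 0;
}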
Terminology
Critical section: a section of code which reads or writes shared data.
Race condition: potential for interleaved execution of a critical section by multiple threads; results are non-deterministic.
Mutual exclusion: synchronization mechanism to avoid race conditions by ensuring exclusive execution of critical sections.
Deadlock: permanent blocking of threads.
Starvation: one or more threads are denied resources; without those resources, the program can never finish its task.
Four requirements for Deadlock
Mutual exclusion
• Only one thread at a time can use a resource.
Hold and wait
• A thread holding at least one resource is waiting to acquire additional resources held by other threads.
No preemption
• Resources are released only voluntarily by the thread holding them, after the thread is finished with them.
Circular wait
• There exists a set {T1, …, Tn} of waiting threads
• T1 is waiting for a resource that is held by T2
• T2 is waiting for a resource that is held by T3
• …
• Tn is waiting for a resource that is held by T1
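A minimal sketch of how these conditions can combine (assuming POSIX threads; the locks A and B and the thread functions are made up for illustration): each thread holds one lock and waits for the other, forming a circular wait.

/* Two threads acquire the same two locks in opposite order: each can end up
   holding one lock and waiting forever for the other (circular wait). */
#include <pthread.h>

pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t B = PTHREAD_MUTEX_INITIALIZER;

void *t1(void *arg)
{
    pthread_mutex_lock(&A);     /* holds A ...             */
    pthread_mutex_lock(&B);     /* ... and waits for B     */
    /* critical section */
    pthread_mutex_unlock(&B);
    pthread_mutex_unlock(&A);
    return NULL;
}

void *t2(void *arg)
{
    pthread_mutex_lock(&B);     /* holds B ...                          */
    pthread_mutex_lock(&A);     /* ... and waits for A: possible deadlock */
    /* critical section */
    pthread_mutex_unlock(&A);
    pthread_mutex_unlock(&B);
    return NULL;
}

int main(void)
{
    pthread_t x, y;
    pthread_create(&x, NULL, t1, NULL);
    pthread_create(&y, NULL, t2, NULL);
    pthread_join(x, NULL);      /* may never return if the deadlock occurs */
    pthread_join(y, NULL);
    return 0;
}
/* Fix: break the circular wait by acquiring A before B in every thread. */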
Mutex Locks
If a data item is shared by a number of threads, race conditions can occur if the shared item is not protected properly.
The easiest protection mechanism is a lock.
Before a thread accesses the set of data items, it acquires the lock.
Once the lock is successfully acquired, the thread becomes the owner of that lock and the lock is locked.
Then the owner can access the protected items. After this, the owner must release the lock, and the lock becomes unlocked.
Another thread can then acquire the lock.
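A minimal sketch of this discipline with a POSIX mutex (the names balance, balance_lock, and deposit_100 are hypothetical): the same kind of shared update as in the earlier race example, but now every access happens between acquire and release, so no updates are lost.

#include <pthread.h>
#include <stdio.h>

static int balance = 0;                           /* shared data item            */
static pthread_mutex_t balance_lock = PTHREAD_MUTEX_INITIALIZER;

static void *deposit_100(void *arg)
{
    for (int i = 0; i < 100; i++) {
        pthread_mutex_lock(&balance_lock);        /* acquire: become the owner   */
        balance += 1;                             /* access the protected item   */
        pthread_mutex_unlock(&balance_lock);      /* release: others may acquire */
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, deposit_100, NULL);
    pthread_create(&t2, NULL, deposit_100, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("balance = %d\n", balance);            /* always 200 with the lock */
    return 0;
}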
Mutex Locks
The use of a lock simply establishes a critical section.
Before entering a critical section, a thread acquires a lock.
If it is successful, the thread enters the critical section and the lock is locked.
As a result, all subsequent acquire requests are queued until the lock is unlocked.
Mutex Lock Restrictions
Only the owner can release the lock.
• Imagine the following situation. Suppose thread A is the current owner of lock L and thread B is a second thread that wants to lock the lock. If a non-owner could unlock a lock, thread B could unlock the lock that thread A owns; hence, either both threads may be executing in the same critical section, or thread B preempts thread A and executes the instructions of the critical section.
Recursive lock acquisition is not allowed.
• The current owner of the lock is not allowed to acquire the same lock again.
Mutex Example: The Dining Philosophers Problem
Imagine five philosophers who spend their lives just thinking and eating.
In the middle of the dining room is a circular table with five chairs.
The table has a big plate of spaghetti. However, there are only five chopsticks available.
Each philosopher thinks. When he gets hungry, he sits down and picks up the two chopsticks that are closest to him.
If a philosopher can pick up both chopsticks, he eats for a while.
After a philosopher finishes eating, he puts down the chopsticks and starts to think.
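A sketch of this setup with POSIX mutexes, one per chopstick (the names are made up for illustration). To keep the sketch from deadlocking it also applies one standard fix that is not part of the problem statement: every philosopher picks up the lower-numbered chopstick first, which breaks the circular wait.

#include <pthread.h>
#include <stdio.h>

#define N 5
pthread_mutex_t chopstick[N];           /* one mutex per chopstick */

void *philosopher(void *arg)
{
    long i = (long)arg;
    long left  = i;
    long right = (i + 1) % N;
    /* Always pick up the lower-numbered chopstick first; this breaks the
       circular wait, so the five philosophers cannot all deadlock. */
    long first  = left < right ? left : right;
    long second = left < right ? right : left;

    for (int round = 0; round < 3; round++) {
        /* think ... */
        pthread_mutex_lock(&chopstick[first]);
        pthread_mutex_lock(&chopstick[second]);
        printf("philosopher %ld eats\n", i);    /* eat for a while */
        pthread_mutex_unlock(&chopstick[second]);
        pthread_mutex_unlock(&chopstick[first]);
    }
    return NULL;
}

int main(void)
{
    pthread_t p[N];
    for (int i = 0; i < N; i++) pthread_mutex_init(&chopstick[i], NULL);
    for (long i = 0; i < N; i++) pthread_create(&p[i], NULL, philosopher, (void *)i);
    for (int i = 0; i < N; i++) pthread_join(p[i], NULL);
    return 0;
}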
C++ Language Support for Synchronization
Languages with exceptions like C++
• Languages that support exceptions are problematic (easy to make a non-local exit without releasing lock)
• Consider:
void Rtn() {
  lock.acquire();
  ...
  DoFoo();
  ...
  lock.release();
}

void DoFoo() {
  ...
  if (exception) throw errException;
  ...
}

Notice that an exception in DoFoo() will exit without releasing the lock.
C++ Language Support for Synchronization (con’t)
Must catch all exceptions in critical sections.
• Catch exceptions, release the lock, and re-throw the exception:

void Rtn() {
  lock.acquire();
  try {
    ...
    DoFoo();
    ...
  } catch (...) {       // catch exception
    lock.release();     // release lock
    throw;              // re-throw the exception
  }
  lock.release();
}

void DoFoo() {
  ...
  if (exception) throw errException;
  ...
}
Even better: the auto_ptr<T> facility (see the C++ spec).
It can release the lock regardless of how the function exits.
Java Language Support for Synchronization
Java has explicit support for threads and thread synchronization. Bank Account example:
class Account {
  private int balance;

  // object constructor
  public Account(int initialBalance) {
    balance = initialBalance;
  }

  public synchronized int getBalance() {
    return balance;
  }

  public synchronized void deposit(int amount) {
    balance += amount;
  }
}

Every object has an associated lock which gets automatically acquired and released on entry to and exit from a synchronized method.
Condition Variables (CV)
A condition variable allows a thread to block its own execution until some shared data reaches a particular state.
A condition variable is a synchronization object used in conjunction with a mutex.
A mutex controls access to shared data;
A condition variable allows threads to wait for that data to enter a defined state.
A mutex is combined with CV to avoid the race condition.
Condition Variable
Waiting and signaling on condition variables: routines
• pthread_cond_wait(condition, mutex)
  • Blocks the thread until the specified condition is signalled
  • Should be called with the mutex locked
  • Automatically releases the mutex lock while it waits
  • When it returns (the condition was signaled), the mutex is locked again
• pthread_cond_signal(condition)
  • Wakes up a thread waiting on the condition variable
  • Called after the mutex is locked; the caller must unlock the mutex afterwards
• pthread_cond_broadcast(condition)
  • Used when multiple threads are blocked on the condition
Condition Variable – for signaling
Think of the producer-consumer problem.
Producers and consumers run in separate threads.
Producer produces data and consumer consumes data.
Producer has to inform the consumer when data is available
Consumer has to inform producer when buffer space is available
First attempt, without a condition variable (the consumer must busy-wait):

/* Globals */
int data_avail = 0;
pthread_mutex_t data_mutex = PTHREAD_MUTEX_INITIALIZER;

void *producer(void *arg)
{
    pthread_mutex_lock(&data_mutex);
    /* produce data and insert it into the queue */
    data_avail = 1;
    pthread_mutex_unlock(&data_mutex);
    return NULL;
}

void *consumer(void *arg)
{
    while (!data_avail)
        ;   /* do nothing: keep looping (busy-wait)!! */

    pthread_mutex_lock(&data_mutex);
    /* extract data from queue;
       if (queue is empty)
           data_avail = 0;  */
    pthread_mutex_unlock(&data_mutex);
    consume_data();
    return NULL;
}
The same example with a condition variable (the consumer sleeps instead of spinning):

int data_avail = 0;
pthread_mutex_t data_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t data_cond = PTHREAD_COND_INITIALIZER;

void *producer(void *arg)
{
    pthread_mutex_lock(&data_mutex);
    /* produce data and insert it into the queue */
    data_avail = 1;
    pthread_cond_signal(&data_cond);   /* wake up a waiting consumer */
    pthread_mutex_unlock(&data_mutex);
    return NULL;
}

void *consumer(void *arg)
{
    pthread_mutex_lock(&data_mutex);
    while (!data_avail) {
        /* sleep on the condition variable; the mutex is released while waiting */
        pthread_cond_wait(&data_cond, &data_mutex);
    }
    /* woken up, with the mutex locked again:
       extract data from queue;
       if (queue is empty)
           data_avail = 0;  */
    pthread_mutex_unlock(&data_mutex);
    consume_data();
    return NULL;
}
So far ….
Condition variable: a queue of threads waiting for something inside a critical section.
Key idea: allow sleeping inside the critical section by atomically releasing the lock at the time we go to sleep.
Lock: provides mutual exclusion for shared data:
• Always acquire it before accessing the shared data structure
• Always release it after finishing with the shared data
Semaphore: an extension to mutex locks.
A semaphore is an object with two methods, Wait and Signal, a private integer counter, and a private queue (of threads).
Example Assume that in our corporate print room, we have 5 printers online.
Our print spool manager allocates a semaphore set with 5 semaphores in it, one for each printer on the system.
Since each printer is only physically capable of printing one job at a time, each of our five semaphores will be initialized to a value of 1 (one), meaning that they are all online, and accepting requests.
John sends a print request to the spooler. The print manager looks at the semaphore set and finds the first semaphore whose value is one. Before sending John's request to the physical device, the print manager decrements the semaphore for the corresponding printer by one. Now, that semaphore's value is zero.
Example A value of zero represents 100% resource utilization on that
semaphore. In our example, no other request can be sent to that printer until it is no longer equal to zero.
When John's print job has completed, the print manager increments the value of the semaphore which corresponds to the printer. Its value is now back up to one (1), which means it is available again.
Semaphore: synchronized counting variables.
Formally, a semaphore comprises:
• An integer value
• Two operations: P() and V()
P() (e.g. the consumer), also known as wait():
• While value == 0, sleep
• Decrement value
V() (e.g. the producer), also known as signal():
• Increment value
• If any threads are sleeping, waiting for the value to become non-zero, wake up at least one thread
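A minimal sketch using POSIX semaphores (the names printers and print_job are hypothetical): sem_wait plays the role of P()/wait() and sem_post the role of V()/signal(). Unlike the print-room example above, which uses one semaphore per printer, this sketch collapses them into a single counting semaphore initialized to 5, so at most five jobs print at once.

#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>

sem_t printers;                         /* counting semaphore: number of free printers */

void *print_job(void *arg)
{
    long id = (long)arg;
    sem_wait(&printers);                /* P(): sleep while value == 0, then decrement */
    printf("job %ld is printing\n", id);
    sem_post(&printers);                /* V(): increment, wake a sleeping thread      */
    return NULL;
}

int main(void)
{
    pthread_t jobs[8];
    sem_init(&printers, 0, 5);          /* five printers available */
    for (long i = 0; i < 8; i++)
        pthread_create(&jobs[i], NULL, print_job, (void *)i);
    for (int i = 0; i < 8; i++)
        pthread_join(jobs[i], NULL);
    sem_destroy(&printers);
    return 0;
}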