ceng334 introduction to operating systems


Page 1: CENG334 Introduction to Operating Systems

1

CENG334
Introduction to Operating Systems

Erol Sahin

Dept of Computer Eng.
Middle East Technical University

Ankara, TURKEY

URL: http://kovan.ceng.metu.edu.tr/ceng334

Monitors, Condition Variables

Topics:
• Monitors
• Condition Variables

Page 2: CENG334 Introduction to Operating Systems

2

Issues with Semaphores

Much of the power of semaphores derives from calls to down() and up() that are unmatched
• See previous example!

Unlike locks, acquire() and release() are not always paired.

This means it is a lot easier to get into trouble with semaphores.
• “More rope”

Would be nice if we had some clean, well-defined language support for synchronization...

Java does!

Adapted from Matt Welsh’s (Harvard University) slides.

Page 3: CENG334 Introduction to Operating Systems

3

Monitors

A monitor is an object intended to be used safely by more than one thread.

• The defining characteristic of a monitor is that its methods are executed with mutual exclusion.

• That is, at each point in time, at most one thread may be executing any of its methods.

• Monitors also provide Condition Variables (CVs), which let a thread temporarily give up exclusive access in order to wait for some condition to be met, before regaining exclusive access and resuming its task.

• Threads also use CVs to signal other threads that such conditions have been met.

Page 4: CENG334 Introduction to Operating Systems

4

Condition Variables

Conceptually, a condition variable (CV) is a queue of threads, associated with a monitor, upon which a thread may wait for some assertion to become true.

Threads can use CVs

• to temporarily give up exclusive access, in order to wait for some condition to be met, before regaining exclusive access and resuming their task.

• for signaling other threads that such conditions have been met.

Page 5: CENG334 Introduction to Operating Systems

5

Monitors

This style of using locks and CVs to protect access to a shared object is often called a monitor.

Think of a monitor as a lock protecting an object, plus a queue of waiting threads.

[Figure: a monitor enclosing shared data and the methods accessing it, with a queue of waiting threads. At most one thread is in the monitor at a time.]

How is this different from a lock?

Adapted from Matt Welsh’s (Harvard University) slides.

Page 6: CENG334 Introduction to Operating Systems

6

Monitors

[Figure: monitor with shared data and the methods accessing it; the monitor is unlocked.]

Adapted from Matt Welsh’s (Harvard University) slides.

Page 7: CENG334 Introduction to Operating Systems

7

Monitors

[Figure: monitor with shared data and the methods accessing it; the monitor is locked, and two threads are asleep (zzzz...).]

Sleeping thread no longer “in” the monitor. (But not on the waiting queue either! Why?)

Adapted from Matt Welsh’s (Harvard University) slides.

Page 8: CENG334 Introduction to Operating Systems

8

Monitors

[Figure: monitor with shared data and the methods accessing it; the monitor is locked, one thread is asleep (zzzz...), and notify() is called.]

Monitor stays locked! (Lock now owned by different thread...)

Adapted from Matt Welsh’s (Harvard University) slides.

Page 9: CENG334 Introduction to Operating Systems

9

Monitors

[Figure: monitor with shared data and the methods accessing it; the monitor is locked, and notify() is called.]

Adapted from Matt Welsh’s (Harvard University) slides.

Page 10: CENG334 Introduction to Operating Systems

10

Monitors

[Figure: monitor with shared data and the methods accessing it; the monitor is locked.]

No guarantee which order threads get into the monitor. (Not necessarily FIFO!)

Adapted from Matt Welsh’s (Harvard University) slides.

Page 11: CENG334 Introduction to Operating Systems

11

Bank Example

monitor Bank {
    int TL = 1000;
    condition haveTL;

    void withdraw(int amount) {
        if (amount > TL)        // not enough money: wait for a deposit
            wait(haveTL);
        TL -= amount;
    }

    void deposit(int amount) {
        TL += amount;
        notify(haveTL);
    }
}

Page 12: CENG334 Introduction to Operating Systems

12

Bank Example

monitor Bank {
    int TL = 1000;
    condition haveTL;

    void withdraw(int amount) {
        while (amount > TL)     // re-check the balance after every wakeup
            wait(haveTL);
        TL -= amount;
    }

    void deposit(int amount) {
        TL += amount;
        notifyAll(haveTL);      // wake all waiters; each re-checks its own amount
    }
}
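The Bank monitor can be approximated in a real language. The following is a minimal Python sketch, assuming that a `threading.Condition` (a lock plus a wait queue) stands in for the monitor lock and the `haveTL` condition; the class and method names are illustrative and not part of the original slides.

```python
import threading


class Bank:
    """Monitor-style account: every method body runs while holding one lock."""

    def __init__(self, balance=1000):
        self._balance = balance                 # plays the role of TL
        self._have_tl = threading.Condition()   # lock + condition variable

    def withdraw(self, amount):
        with self._have_tl:                     # enter the monitor
            # Re-check after every wakeup, as in the while-loop version.
            while amount > self._balance:
                self._have_tl.wait()            # releases the lock while asleep
            self._balance -= amount

    def deposit(self, amount):
        with self._have_tl:
            self._balance += amount
            self._have_tl.notify_all()          # wake all waiters; each re-checks

    def balance(self):
        with self._have_tl:
            return self._balance
```

A thread calling `withdraw(1500)` on a fresh account blocks until another thread deposits at least 500; `notify_all()` is used because waiters may be waiting for different amounts.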

Page 13: CENG334 Introduction to Operating Systems

13

Hoare vs. Mesa Monitor Semantics

The monitor notify() operation can have two different meanings:

Hoare monitors (1974)
• notify(CV) means to run the waiting thread immediately
• Causes notifying thread to block

Mesa monitors (Xerox PARC, 1980)
• notify(CV) puts waiting thread back onto the “ready queue” for the monitor
• But, notifying thread keeps running

Adapted from Matt Welsh’s (Harvard University) slides.

Page 14: CENG334 Introduction to Operating Systems

14

Hoare vs. Mesa Monitor Semantics

What's the practical difference?

In Hoare-style semantics, the “condition” that triggered the notify() will always be true when the awoken thread runs
• For example, that the buffer is now no longer empty

In Mesa-style semantics, awoken thread has to recheck the condition
• Since another thread might have beaten it to the punch

Adapted from Matt Welsh’s (Harvard University) slides.

Page 15: CENG334 Introduction to Operating Systems

15

Hoare Monitor Semantics

Hoare monitors (1974)
• notify(CV) means to run the waiting thread immediately
• Causes notifying thread to block

The signaling thread must wait outside the monitor (at least) until the signaled thread relinquishes occupancy of the monitor by either returning or by again waiting on a condition.

Page 16: CENG334 Introduction to Operating Systems

16

Mesa Monitor Semantics

Mesa monitors (Xerox PARC, 1980)
• notify(CV) puts waiting thread back onto the “ready queue” for the monitor
• But, notifying thread keeps running

Signaling does not cause the signaling thread to lose occupancy of the monitor. Instead, the signaled threads are moved to the monitor's entrance queue.

Page 17: CENG334 Introduction to Operating Systems

17

Hoare vs. Mesa monitors

Need to be careful about precise definition of signal and wait.

    while (n==0) {
        wait(not_empty);         // If nothing, sleep
    }
    item = getItemFromArray();   // Get next item

Why didn’t we do this?

    if (n==0) {
        wait(not_empty);         // If nothing, sleep
    }
    item = getItemFromArray();   // Get next item

Answer: depends on the type of scheduling

Hoare-style (most textbooks):
• Signaler gives lock, CPU to waiter; waiter runs immediately
• Waiter gives up lock, processor back to signaler when it exits critical section or if it waits again

Mesa-style (Java, most real operating systems):
• Signaler keeps lock and processor
• Waiter placed on ready queue with no special priority
• Practically, need to check condition again after wait
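To make the Mesa-style rule concrete, here is a small Python sketch of the consumer side of a queue. Python's `threading.Condition` behaves in the Mesa style described above: `notify()` only makes a waiter runnable, and the waiter must reacquire the lock before `wait()` returns. The `ItemQueue` class and the `run_demo` helper are hypothetical names introduced for illustration.

```python
import threading
from collections import deque


class ItemQueue:
    def __init__(self):
        self._items = deque()
        self._not_empty = threading.Condition()

    def put(self, item):
        with self._not_empty:
            self._items.append(item)
            self._not_empty.notify()     # waiter becomes runnable; we keep the lock

    def get(self):
        with self._not_empty:
            # 'while', not 'if': another consumer may have taken the item
            # between the notify() and the moment this thread reacquires the lock.
            while not self._items:
                self._not_empty.wait()
            return self._items.popleft()


def run_demo(n_consumers=3):
    """Start n consumers, then produce n items; return the items consumed."""
    q = ItemQueue()
    results = []
    results_lock = threading.Lock()

    def consume():
        item = q.get()
        with results_lock:
            results.append(item)

    threads = [threading.Thread(target=consume) for _ in range(n_consumers)]
    for t in threads:
        t.start()
    for i in range(n_consumers):
        q.put(i)
    for t in threads:
        t.join()
    return sorted(results)
```

With the `while` loop, a consumer that wakes up to an empty queue simply goes back to sleep, so no consumer ever pops from an empty deque.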

Page 18: CENG334 Introduction to Operating Systems

18

Revisit: Readers/Writers Problem

Correctness Constraints:
• Readers can access database when no writers
• Writers can access database when no readers or writers
• Only one thread manipulates state variables at a time

State variables (Protected by a lock called “lock”):
• int NReaders: Number of active readers; initially = 0
• int WaitingReaders: Number of waiting readers; initially = 0
• int NWriters: Number of active writers; initially = 0
• int WaitingWriters: Number of waiting writers; initially = 0
• Condition canRead = NIL
• Condition canWrite = NIL

Page 19: CENG334 Introduction to Operating Systems

19

Readers and Writers

Monitor ReadersNWriters {
    int WaitingWriters, WaitingReaders, NReaders, NWriters;
    Condition CanRead, CanWrite;

    Void BeginWrite() {
        if (NWriters == 1 || NReaders > 0) {
            ++WaitingWriters;
            Wait(CanWrite);
            --WaitingWriters;
        }
        NWriters = 1;
    }

    Void EndWrite() {
        NWriters = 0;
        if (WaitingReaders) Signal(CanRead);
        else Signal(CanWrite);
    }

    Void BeginRead() {
        if (NWriters == 1 || WaitingWriters > 0) {
            ++WaitingReaders;
            Wait(CanRead);
            --WaitingReaders;
        }
        ++NReaders;
        Signal(CanRead);
    }

    Void EndRead() {
        if (--NReaders == 0) Signal(CanWrite);
    }
}



Page 22: CENG334 Introduction to Operating Systems

22

Readers and Writers

Monitor ReadersNWriters {
    int WaitingWriters, WaitingReaders, NReaders, NWriters;
    Condition CanRead, CanWrite;

    Void BeginWrite() {
        if (NWriters == 1 || NReaders > 0) {
            ++WaitingWriters;
            Wait(CanWrite);
            --WaitingWriters;
        }
        NWriters = 1;
    }

    Void EndWrite() {
        NWriters = 0;
        if (WaitingReaders) notify(CanRead);
        else notify(CanWrite);
    }

    Void BeginRead() {
        if (NWriters == 1 || WaitingWriters > 0) {
            ++WaitingReaders;
            Wait(CanRead);
            --WaitingReaders;
        }
        ++NReaders;
        notify(CanRead);
    }

    Void EndRead() {
        if (--NReaders == 0) notify(CanWrite);
    }
}

Page 23: CENG334 Introduction to Operating Systems

23

Understanding the Solution

A writer can enter if there are no other active writers and no active readers


Page 25: CENG334 Introduction to Operating Systems

25

Understanding the Solution

A reader can enter if
• There are no writers active or waiting

So we can have many readers active all at once

Otherwise, a reader waits (maybe many do)


Page 27: CENG334 Introduction to Operating Systems

27

Understanding the Solution

When a writer finishes, it checks to see if any readers are waiting
• If so, it lets one of them enter
• That one will let the next one enter, etc…

Similarly, when a reader finishes, if it was the last reader, it lets a writer in (if any is there)


Page 29: CENG334 Introduction to Operating Systems

29

Understanding the Solution

It wants to be fair
• If a writer is waiting, readers queue up
• If a reader (or another writer) is active or waiting, writers queue up

… this is mostly fair, although once it lets a reader in, it lets ALL waiting readers in all at once, even if some showed up “after” other waiting writers
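For comparison, here is a hedged Python sketch of a readers/writers monitor. It is not a line-by-line port of the slide code: that code uses `if` and relies on a signaled thread running immediately, whereas Python condition variables are Mesa-style, so this sketch re-checks its conditions in `while` loops and uses a single condition variable with a simpler writer-preference policy (readers hold back whenever a writer is active or waiting). Under this policy a steady stream of writers can starve readers. All names are illustrative.

```python
import threading


class ReadersWriters:
    def __init__(self):
        self._cond = threading.Condition()   # one lock, one wait queue
        self._active_readers = 0
        self._active_writer = False
        self._waiting_writers = 0

    def begin_read(self):
        with self._cond:
            # Readers wait while a writer is active or waiting.
            while self._active_writer or self._waiting_writers > 0:
                self._cond.wait()
            self._active_readers += 1

    def end_read(self):
        with self._cond:
            self._active_readers -= 1
            if self._active_readers == 0:
                self._cond.notify_all()      # last reader out: let a writer in

    def begin_write(self):
        with self._cond:
            self._waiting_writers += 1
            # Writers wait while anyone else is inside.
            while self._active_writer or self._active_readers > 0:
                self._cond.wait()
            self._waiting_writers -= 1
            self._active_writer = True

    def end_write(self):
        with self._cond:
            self._active_writer = False
            self._cond.notify_all()          # everyone re-checks its own condition
```

Callers bracket each database access with `begin_read()`/`end_read()` or `begin_write()`/`end_write()`.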

Page 30: CENG334 Introduction to Operating Systems

30

The Big Picture

The point here is that getting synchronization right is hard

How to pick between locks, semaphores, condvars, monitors???

Locks are very simple for many cases.
• Issues: Maybe not the most efficient solution
• For example, can't allow multiple readers but one writer inside a standard lock.

Condition variables allow threads to sleep inside a critical section, releasing the lock while they wait
• Just be sure you understand whether they use Mesa or Hoare semantics!

Semaphores provide pretty general functionality
• But also make it really easy to botch things up.

Adapted from Matt Welsh’s (Harvard University) slides.

Page 31: CENG334 Introduction to Operating Systems

31

CENG334
Introduction to Operating Systems

Erol Sahin

Dept of Computer Eng.
Middle East Technical University

Ankara, TURKEY

URL: http://kovan.ceng.metu.edu.tr/~erol/Courses/CENG334

Synchronization patterns

Topics:
• Signalling
• Rendezvous
• Barrier

Page 32: CENG334 Introduction to Operating Systems

32

Signalling

Possibly the simplest use for a semaphore is signaling, which means that one thread sends a signal to another thread to indicate that something has happened.

Signaling makes it possible to guarantee that a section of code in one thread will run before a section of code in another thread; in other words, it solves the serialization problem.

Adapted from The Little Book of Semaphores.

Page 33: CENG334 Introduction to Operating Systems

33

Signalling

Imagine that a1 reads a line from a file, and b1 displays the line on the screen. The semaphore in this program guarantees that Thread A has completed a1 before Thread B begins b1.

Here’s how it works: if Thread B gets to the wait statement (sem.down()) first, it will find the initial value, zero, and it will block. Then when Thread A signals (sem.up()), Thread B proceeds.

Similarly, if Thread A gets to the signal first then the value of the semaphore will be incremented, and when Thread B gets to the wait, it will proceed immediately.

Either way, the order of a1 and b1 is guaranteed.

semaphore sem=0;

Thread A
    statement a1;
    sem.up();

Thread B
    sem.down();
    statement b1;

Adapted from The Little Book of Semaphores.
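The signalling pattern maps directly onto Python's `threading.Semaphore`, where `release()` corresponds to `up()` and `acquire()` to `down()`. The sketch below is illustrative; the function name `run_signal_demo` and the use of a list to record the order are assumptions made for the demonstration.

```python
import threading


def run_signal_demo():
    sem = threading.Semaphore(0)      # semaphore sem = 0
    order = []

    def thread_a():
        order.append("a1")            # statement a1
        sem.release()                 # sem.up()

    def thread_b():
        sem.acquire()                 # sem.down(): blocks until A has signaled
        order.append("b1")            # statement b1

    tb = threading.Thread(target=thread_b)
    ta = threading.Thread(target=thread_a)
    tb.start()                        # start B first on purpose
    ta.start()
    ta.join()
    tb.join()
    return order
```

Whichever thread is scheduled first, the returned list is `["a1", "b1"]`, because b1 cannot run until the semaphore has been signaled after a1.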

Page 34: CENG334 Introduction to Operating Systems

34

Rendezvous

Generalize the signal pattern so that it works both ways. Thread A has to wait for Thread B and vice versa. In other words, given this code we want to guarantee that a1 happens before b2 and b1 happens before a2.

Your solution should not enforce too many constraints. For example, we don’t care about the order of a1 and b1. In your solution, either order should be possible.

Two threads rendezvous at a point of execution, and neither is allowed to proceed until both have arrived.

Thread A

statement a1;

statement a2;

Thread B

statement b1;

statement b2;

Adapted from The Little Book of Semaphores.

Page 35: CENG334 Introduction to Operating Systems

35

Rendezvous - Hint

Hint: Create two semaphores, named aArrived and bArrived, and initialize them both to zero. aArrived indicates whether Thread A has arrived at the rendezvous, and bArrived likewise.

Thread A

statement a1;

statement a2;

Thread B

statement b1;

statement b2;

semaphore aArrived=0;

semaphore bArrived=0;

Adapted from The Little Book of Semaphores.

Page 36: CENG334 Introduction to Operating Systems

36

Rendezvous - Solution

Thread A

statement a1;

aArrived.up();

bArrived.down();

statement a2;

Thread B

statement b1;

bArrived.up();

aArrived.down();

statement b2;

semaphore aArrived=0;

semaphore bArrived=0;

Adapted from The Little Book of Semaphores.
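The rendezvous solution can be tried out with two Python semaphores. This is a sketch under the assumption that `release()` and `acquire()` stand in for `up()` and `down()`; the helper names and the log list are illustrative only.

```python
import threading


def run_rendezvous():
    a_arrived = threading.Semaphore(0)
    b_arrived = threading.Semaphore(0)
    log = []
    log_lock = threading.Lock()

    def record(step):
        with log_lock:
            log.append(step)

    def thread_a():
        record("a1")
        a_arrived.release()           # aArrived.up()
        b_arrived.acquire()           # bArrived.down()
        record("a2")

    def thread_b():
        record("b1")
        b_arrived.release()           # bArrived.up()
        a_arrived.acquire()           # aArrived.down()
        record("b2")

    threads = [threading.Thread(target=thread_a),
               threading.Thread(target=thread_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

The relative order of a1 and b1 (and of a2 and b2) may vary from run to run, but a1 always precedes b2 and b1 always precedes a2.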

Page 37: CENG334 Introduction to Operating Systems

37

Rendezvous – A less efficient solution

This solution also works, although it is probably less efficient, since it might have to switch between A and B one time more than necessary.

If A arrives first, it waits for B. When B arrives, it wakes A and might proceed immediately to its wait, in which case it blocks, allowing A to reach its signal, after which both threads can proceed.

Thread A

statement a1

bArrived.down()

aArrived.up()

statement a2

Thread B

statement b1;

bArrived.up();

aArrived.down();

statement b2;

semaphore aArrived=0;

semaphore bArrived=0;

Adapted from The Little Book of Semaphores.

Page 38: CENG334 Introduction to Operating Systems

38

Rendezvous – How about?

Thread A

statement a1

bArrived.down()

aArrived.up()

statement a2

Thread B

statement b1;

aArrived.down();

bArrived.up();

statement b2;

semaphore aArrived=0;

semaphore bArrived=0;

Adapted from The Little Book of Semaphores.
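In this variant both threads call down() before either calls up(), so each waits for a signal that the other can only send after its own wait returns: a deadlock. The Python sketch below makes that observable without hanging by giving each `acquire()` a timeout; the timeout, the function name, and the `outcome` dictionary are assumptions added for the demonstration.

```python
import threading


def run_wait_first_variant(timeout=0.2):
    a_arrived = threading.Semaphore(0)
    b_arrived = threading.Semaphore(0)
    outcome = {}

    def thread_a():
        got = b_arrived.acquire(timeout=timeout)   # bArrived.down()
        outcome["a"] = got
        if got:
            a_arrived.release()                    # aArrived.up(), never reached

    def thread_b():
        got = a_arrived.acquire(timeout=timeout)   # aArrived.down()
        outcome["b"] = got
        if got:
            b_arrived.release()                    # bArrived.up(), never reached

    threads = [threading.Thread(target=thread_a),
               threading.Thread(target=thread_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome
```

Both entries in the returned dictionary are `False`: neither wait ever succeeds, which is what an unbounded `down()` would experience as blocking forever.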

Page 39: CENG334 Introduction to Operating Systems

39

Barrier

Rendezvous solution does not work with more than two threads.

Puzzle: Generalize the rendezvous solution. Every thread should run the following code:

    rendezvous();
    criticalpoint();

The synchronization requirement is that no thread executes criticalpoint() until after all threads have executed rendezvous().

You can assume that there are n threads and that this value is stored in a variable, n, that is accessible from all threads.

When the first n − 1 threads arrive they should block until the nth thread arrives, at which point all the threads may proceed.

Adapted from The Little Book of Semaphores.

Page 40: CENG334 Introduction to Operating Systems

40

Barrier - Hint

n = thenumberofthreads;

count = 0;

Semaphore mutex=1, barrier=0;

count keeps track of how many threads have arrived. mutex provides exclusive access to count so that threads can increment it safely.

barrier is locked (zero or negative) until all threads arrive; then it should be unlocked (1 or more).

Adapted from The Little Book of Semaphores.

Page 41: CENG334 Introduction to Operating Systems

41

Barrier – Solution?

    n = thenumberofthreads;
    count = 0;
    Semaphore mutex=1, barrier=0;

    rendezvous();

    mutex.down();
    count = count + 1;
    mutex.up();

    if (count == n) barrier.up();
    else barrier.down();

    criticalpoint();

Since count is protected by a mutex, it counts the number of threads that pass. The first n−1 threads wait when they get to the barrier, which is initially locked. When the nth thread arrives, it unlocks the barrier.

What is wrong with this solution?

Adapted from The Little Book of Semaphores.

Page 42: CENG334 Introduction to Operating Systems

42

Barrier – Solution?

Imagine that n = 5 and that 4 threads are waiting at the barrier. The value of the semaphore is the number of threads in queue, negated, which is -4.

When the 5th thread signals the barrier, one of the waiting threads is allowed to proceed, and the semaphore is incremented to -3. But then no one signals the semaphore again and none of the other threads can pass the barrier.

Adapted from The Little Book of Semaphores.

Page 43: CENG334 Introduction to Operating Systems

43

Barrier – Solution

    n = thenumberofthreads;
    count = 0;
    Semaphore mutex=1, barrier=0;

    rendezvous();

    mutex.down();
    count = count + 1;
    mutex.up();

    if (count == n) barrier.up();
    else {
        barrier.down();
        barrier.up();
    }

    criticalpoint();

The only change is another signal after waiting at the barrier. Now as each thread passes, it signals the semaphore so that the next thread can pass.

Adapted from The Little Book of Semaphores.
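The barrier can be sketched in Python with two `threading.Semaphore` objects. One deliberate difference from the slide, flagged here as an assumption: each thread takes a snapshot of `count` while still holding the mutex, so exactly one thread sees `count == n`, whereas the slide reads `count` after releasing the mutex. The function and variable names are illustrative.

```python
import threading


def run_barrier(n=5):
    mutex = threading.Semaphore(1)
    barrier = threading.Semaphore(0)
    count = 0
    log = []
    log_lock = threading.Lock()

    def record(step):
        with log_lock:
            log.append(step)

    def worker():
        nonlocal count
        record("rendezvous")          # rendezvous()
        mutex.acquire()               # mutex.down()
        count += 1
        arrived = count               # snapshot taken under the mutex
        mutex.release()               # mutex.up()
        if arrived == n:
            barrier.release()         # last arrival opens the barrier
        else:
            barrier.acquire()         # wait at the barrier...
            barrier.release()         # ...then let the next thread through
        record("critical")            # criticalpoint()

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

In the returned log, all n "rendezvous" entries appear before the first "critical" entry. Python's standard library also offers `threading.Barrier` for this purpose; the semaphore version is shown only because it mirrors the slide.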

Page 44: CENG334 Introduction to Operating Systems

44

Barrier – Bad Solution

    n = thenumberofthreads;
    count = 0;
    Semaphore mutex=1, barrier=0;

    rendezvous();

    mutex.down();
    count = count + 1;
    if (count == n) barrier.up();
    barrier.down();
    barrier.up();
    mutex.up();

    criticalpoint();

Imagine that the first thread enters the mutex and then blocks. Since the mutex is locked, no other threads can enter, so the condition, count==n, will never be true and no one will ever unlock.

Adapted from The Little Book of Semaphores.

Page 45: CENG334 Introduction to Operating Systems

45

CENG334
Introduction to Operating Systems

Erol Sahin

Dept of Computer Eng.
Middle East Technical University

Ankara, TURKEY

URL: http://kovan.ceng.metu.edu.tr/ceng334

Real-world cases

Topics:
• Race conditions
• Priority Inversion

Page 46: CENG334 Introduction to Operating Systems

46

Therac-25

Computer-controlled radiation therapy machine

In operation between 1983 and 1987, 11 installations

Adapted from Matt Welsh’s (Harvard University) slides.

Page 47: CENG334 Introduction to Operating Systems

47

Therac-25

Capable of delivering electron and photon (X-Ray) treatments

Completely computer controlled
• No hardware interlocks to prevent misconfigurations or overdoses!

All software written in PDP-11 assembly language

Cryptic error messages delivered to operator console
• “Malfunction 23”
• No documentation of these error codes
• No indication of which errors are potentially life-threatening

Lots of smoke and mirrors by the manufacturer
• Claimed a 10^-11 chance of delivering wrong dose to patient
• No justification for this claim in the safety analysis documents

Adapted from Matt Welsh’s (Harvard University) slides.

Page 48: CENG334 Introduction to Operating Systems

48

Accidents

On several occasions between June '85 and Jan '87
• Massive overdoses to six people
• Some of these were lethal

Typical therapeutic doses in the 200 rad range

Several overdoses delivered doses of 15,000 – 20,000 rads

Various lawsuits, all settled out of court

Initially, manufacturer claimed that overdoses were impossible

Adapted from Matt Welsh’s (Harvard University) slides.

Page 49: CENG334 Introduction to Operating Systems

49

The problem

Therac-25 operator console layout. The lethal computer error occurs when the operator accidentally sets the field (here in red) to "X", notices her mistake, then changes it to "E".

Adapted from Matt Welsh’s (Harvard University) slides.

Page 50: CENG334 Introduction to Operating Systems

50

Race Condition #1

After some trial and error, it was discovered that overdose could be caused by operator editing the dosage on the console too quickly
• Operator would enter dosage on console
• Move cursor to bottom of screen, then move cursor back up to edit dosage

“Treat” task
• Periodically checks “entry done” flag
• If flag is set, call subroutine to configure the magnets
• Configuring magnets takes about 8 sec

“Magnet” task
• Called periodically to check if magnets are ready
• Checks if edits have been made to dosage
• If so, exits back to calling subroutine to restart the process
• Critical bug: Only checks if edits made on the first call!

How this led to overdose:
• Operator enters dosage: Triggers magnet setting routine
• Operator edits dosage while the magnets are being configured
• Magnet routine does not notice edits have been made after first call

Adapted from Matt Welsh’s (Harvard University) slides.

Page 51: CENG334 Introduction to Operating Systems

51

Race Condition #2

Second bug – totally different causes from the first

THERAC-25 has a “turntable” aperture that moves certain elements into the path of the beam

Field light mode used to position beam on patient
• No electron beam expected, instead, a light simulates the beam position
• Problem: Unfiltered beam exposed to patients on several occasions!

[Figure: turntable positions in the path of the beam: electron scan magnet, field light position (no electron beam), X-ray field flattener. Computer controls position of turntable.]

Adapted from Matt Welsh’s (Harvard University) slides.

Page 52: CENG334 Introduction to Operating Systems

52

Race Condition #2

1) Prescription entered on console

2) Operator must press “set” button to configure turntable

3) “Set up test” task runs periodically to check position of turntable
• Increments a variable “Class3” on each iteration
• If “Class3 == 0”, everything is ready and the dosage can begin
• Otherwise, a series of interlock checks are performed to ensure turntable is in the correct position
• These checks will set Class3 to 0 when they are complete

Can you spot the bug?

Adapted from Matt Welsh’s (Harvard University) slides.

Page 53: CENG334 Introduction to Operating Systems

53

Race Condition #2

The bug: “Class3” variable is 8 bits wide
• After 256 iterations of “set up test” routine, overflows and becomes zero!
• So, interlocking checks will not be performed
• Triggered only if the operator presses the “set” button during the short interval in which Class3 has overflowed to zero

Fix: Set “Class3” to some nonzero value, rather than incrementing it
• Why was this done? Probably because “inc” instruction was easy enough...

Adapted from Matt Welsh’s (Harvard University) slides.
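The overflow is easy to reproduce. The Python sketch below models an 8-bit counter by masking with 0xFF (Python integers do not overflow on their own); the function name and parameters are illustrative and are not taken from the Therac-25 code.

```python
def iterations_until_zero(start=0, bits=8):
    """Count how many increments it takes for a fixed-width counter to read 0."""
    mask = (1 << bits) - 1            # 0xFF for an 8-bit variable
    value = start
    steps = 0
    while True:
        value = (value + 1) & mask    # increment with wraparound, like "inc"
        steps += 1
        if value == 0:
            return steps


def set_flag_instead_of_incrementing():
    """The fix from the slide: assign a fixed nonzero value; it can never wrap."""
    class3 = 1
    return class3
```

Starting from zero, `iterations_until_zero()` returns 256, matching the slide: every 256th pass of the loop sees the counter equal to zero even though the interlock checks never cleared it.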

Page 54: CENG334 Introduction to Operating Systems

54

Mars Pathfinder

July 4, 1997 landing on Martian surface, followed by expeditions by Sojourner rover

Series of software glitches started a few days after landing
• Eventually debugged and patched remotely from Earth!

Read the full story at: http://www.ddj.com/184411097

Adapted from Matt Welsh’s (Harvard University) slides.

Page 55: CENG334 Introduction to Operating Systems

55

VxWorks Operating System

Developed by Wind River Systems – premier real time OS

Multiple tasks, each with an associated priority
• Higher priority tasks get to run before lower-priority tasks

Information bus – shared memory area used by various tasks
• Thread must obtain mutex to write data to the info bus – a monitor

[Figure: the Information Bus, guarded by a mutex, shared by the Weather Data Thread, the Communication Thread, and the Information Bus Thread. One thread obtains the mutex and writes data; another waits for the mutex to read data.]

Adapted from Matt Welsh’s (Harvard University) slides.

Page 56: CENG334 Introduction to Operating Systems

56

VxWorks Operating System

[Figure: the same Information Bus, mutex, and three threads; the mutex is freed.]

Adapted from Matt Welsh’s (Harvard University) slides.

Page 57: CENG334 Introduction to Operating Systems

57

VxWorks Operating System

[Figure: the same Information Bus, mutex, and three threads; the waiting thread locks the mutex and reads data.]

Adapted from Matt Welsh’s (Harvard University) slides.

Page 58: CENG334 Introduction to Operating Systems

58

Priority Inversion

What happens when threads have different priorities?

[Figure: Information Bus and mutex, with the Weather Data Thread (low priority), the Communication Thread (medium priority), and the Information Bus Thread (high priority).]

Adapted from Matt Welsh’s (Harvard University) slides.

Page 59: CENG334 Introduction to Operating Systems

59

Priority Inversion

[Figure: the same three threads and mutex. Interrupt! Schedule comm thread ... long running operation]

Adapted from Matt Welsh’s (Harvard University) slides.

Page 60: CENG334 Introduction to Operating Systems

60

Priority Inversion

What happens when threads have different priorities?
• Comm thread runs for a long time
• Comm thread has higher priority than weather data thread
• But ... the high priority info bus thread is stuck waiting!

This is called priority inversion

[Figure: Information Bus and mutex, with the Weather Data Thread (low priority), the Communication Thread (medium priority), and the Information Bus Thread (high priority).]

Adapted from Matt Welsh’s (Harvard University) slides.

Page 61: CENG334 Introduction to Operating Systems

61

What is the fix?

Problem with priority inversion:
• A high priority thread is stuck waiting for a low priority thread to finish its work
• In this case, the (medium priority) thread was holding up the low-prio thread

General solution: Priority inheritance
• If waiting for a low priority thread, allow that thread to inherit the higher priority
• High priority thread “donates” its priority to the low priority thread

Why does this fix the problem?
• Medium priority comm task cannot preempt weather task
• Weather task inherits high priority while it is being waited on

Adapted from Matt Welsh’s (Harvard University) slides.
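The effect of priority inheritance can be illustrated with a toy tick-based scheduler in Python. This is a simplified model, not VxWorks: the task names echo the slides, but the arrival times, work amounts, and the rule that a task holds the mutex for its whole run are assumptions chosen to keep the example short.

```python
def simulate(inheritance):
    """Return the tick at which the high-priority bus task finishes."""
    tasks = {
        # prio: base priority; arrive: first tick the task is ready;
        # left: ticks of work remaining; mutex: whether the work needs the mutex
        "weather": {"prio": 1, "arrive": 0, "left": 2, "mutex": True},
        "comm":    {"prio": 2, "arrive": 1, "left": 5, "mutex": False},
        "bus":     {"prio": 3, "arrive": 1, "left": 1, "mutex": True},
    }
    owner = None                      # current holder of the info bus mutex
    tick = 0
    while tasks["bus"]["left"] > 0:
        ready = [n for n, t in tasks.items()
                 if t["arrive"] <= tick and t["left"] > 0]
        # A task that needs the mutex while someone else holds it is blocked.
        blocked = [n for n in ready
                   if tasks[n]["mutex"] and owner not in (None, n)]
        runnable = [n for n in ready if n not in blocked]

        def effective(name):
            prio = tasks[name]["prio"]
            if inheritance and name == owner and blocked:
                # The owner inherits the highest priority among its waiters.
                prio = max(prio, max(tasks[b]["prio"] for b in blocked))
            return prio

        current = max(runnable, key=effective)   # highest priority runs this tick
        if tasks[current]["mutex"]:
            owner = current                      # take (or keep) the mutex
        tasks[current]["left"] -= 1
        if tasks[current]["left"] == 0 and owner == current:
            owner = None                         # release the mutex when done
        tick += 1
    return tick
```

In this model `simulate(False)` returns 8 and `simulate(True)` returns 3: without inheritance the medium-priority comm task runs all five of its ticks before the weather task can release the mutex, while with inheritance the weather task finishes first and the bus task runs right after it.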

Page 62: CENG334 Introduction to Operating Systems

62

How was this problem fixed?

JPL had a replica of the Pathfinder system on the ground

Special tracing mode maintains logs of all interesting system events
• e.g., context switches, mutex lock/unlock, interrupts

After much testing were able to replicate the problem in the lab

VxWorks mutex objects have an optional priority inheritance flag
• Engineers were able to upload a patch to set this flag on the info bus mutex
• After the fix, no more system resets occurred

Lessons:
• Automatically reset system to “known good” state if things run amok (far better than hanging or crashing)
• Ability to trace execution of complex multithreaded code is useful
• Think through all possible thread interactions carefully!!

Adapted from Matt Welsh’s (Harvard University) slides.