comp. architecture – assumed known

Advanced Operating Systems CA549 Dublin City University 2003 1.1

Comp. Architecture – assumed known

• Basic structure– CPU, ROM, RAM, I/O– Busses– Memory hierarchy – registers, cache, RAM, disk

• Fetch execute cycle– Machine code

• PC, SP, CC, stacks

• Interrupts– Hardware

Enabling, Disabling– Software– Vectors, Priority– System clock

• DMA

• Granularity– a = b + c how many machine instructions?

• Other architecture features such as TLB and Modes not assumed known


Let’s Operate

Key task

Loading a program into computer’s RAM memory

Running it

Monitoring what it does, and getting answer

Once upon a time ..

Use front panel switches and lights

Repeat:

Set up RAM address

Set up data

Until all program code and data have been entered

Set up register contents, in particular the start address in PC

Hit ‘Run’ button or keep hitting ‘Single Step’ button

Watch selected register or RAM location on lights


Loader to the rescue

Key problem – to do the above more easily

e.g. to load code and data direct from punched cards

Need : program to control the external device that reads data

able to detect errors – data is stored with a checksum

able to specify where in RAM to put code and data

- this may be specified in the input, ‘absolute address format’

- may be decided at load time, ‘relocatable’

Hierarchy of loader programs - Bootstrapping

Basic loader – loaded using Front Panel (usually in PROM today)

Absolute binary loader – loaded using basic loader

Relocating loader – loaded using absolute binary loader

User code and data is loaded using the relocating loader,

or perhaps using the absolute binary loader

Makes sense to keep loader in memory permanently if possible – PROM

Loaders are ‘system software’, not written by the user


Operator’s Tasks (1)

Key task

To load and run user program to process user data

Main steps

Set up the user’s machine code cards on the input device

Run the relocating loader – if not there, bootstrap it

Set up the machine code cards for the library routines

Continue the loader

Set up the user’s data cards on the input device

Run the user’s program – this then reads in its input cards

Once upon a time, above steps would be done by user

Then they became the job of the ‘operator’

Now, they are largely done by the ‘operating system’

Note the amount of time likely to be wasted setting things up


Operator’s Tasks (2)

Other tasks

Using other system software

Often, user provides source code, which must be compiled.

Operator loads up compiler, feeds source code as input.If compiles o.k., takes machine code and proceeds as above.If not o.k., gives error printout to user

Sometimes, operator asked to run system utility, e.g. sort

Now, these tasks mainly done by OS

Managing the system

Is this user authorised? * suborn the operator

Is this the correct program * fool the operator

Maintaining security * suborn the operator

Making backups * steal / copy the backups

These tasks are still done by hand – ‘system administrators’

* major security vulnerability


Improving operations

Key problem

Too much CPU time wasted while things being set up

Simple ‘batch’ operation

Set up multiple jobs (user code and data) for input as a ‘batch’

Now the CPU is idle for a smaller proportion of the time

But – needs a fancier version of the loader, a ‘monitor’

Monitor (N.B. this word is also used to with another meaning)

Able to read in code for a job, transfer control to it, get control back, then read in code for next job, and so on.

Understands ‘Job Control Language’ used on control cards which separate one job from another, and also on cards separating code from data within a job.

Contains loader, device drivers, JCL interpreter. Most of it stays resident in RAM permanently if possible.

Takes a lot of load of the operator – primitive ‘operating system’


Improving throughput - Spooling

Key problem

Too much CPU time wasted while external devices are reading or writing, e.g. card reader or printer

Spooling

When CPU needs input, provide it from a fast device, tape or disk

When CPU does output, ‘spool’ it to a fast device, tape or disk.

When CPU is idle, have it copy input cards onto fast device

When CPU is idle, have it send output from fast device to slow device.

Note there are now multiple threads of control. CPU can be outputting results for one job to printer, while processing another job’s input.

The CPU is still likely to be idle a lot waiting on input or output. Why not have additional jobs in memory for it to switch to, so waste fewer cycles?

Hence Multiprogramming


Multiprogramming (1)

Many programs in memory at the same time. Operating system allocates CPU in turn to runnable ones. Many threads of control / processes.

Involves interrupt hardware and a system timer.

Only one program running at any moment in uniprocessor system.

If this program is stopped, values in registers etc. must be saved if program is to be restarted later (note the etc. – it covers a lot)

This state information is sometimes called the program’s ‘context’

cf. http://www.linuxgazette.com/issue23/flower/context.html

process = program + context

The OS keeps the information on each process in a Process Control Block (PCB)

When the running process is stopped by the OS, its PCB is updated by the OS, and put on a queue to be continued later.

The OS then decides which of the halted processes should continue, restores the registers etc. from its PCB, and that process resumes.


Multiprogramming (2)

What happens:

Suppose Process A is running, Process B,C,D ready to run.

Interrupt occurs

may be caused by timer, or by Process A making a system call

Kernel of OS handles the interrupts

saves A’s context in its PCB, puts this PCB on queue

Scheduler determines that C (say) should run next

Dispatcher copies C’s PCB data back into registers etc.

Last register copied back is typically the Program Counter, so process C is now running.

Referred to as a process switch - save everything into PCB,or context switch – expect to be back in a moment, save less


Processes vs. threads

A process typically ‘owns’ a certain range of addresses, access to certain devices and files, and other resources allocated to it by the OS, as well as the register values and its stack space.

As a process is switched in and out, what it ‘owns’ will not change much. The register values and stack will most likely change a lot.

Some systems use a Thread Control Block (TCB) to hold the register values, keep track of the stack, keep track of whether runnable, and some other stuff. For a context switch, only the TCB needs to be involved. Process switching involves the whole PCB including the TCP.

Some systems allow multiple threads and multiple TCBs per process. All of them share the resources ‘owned’ by the process.

In an existing process, the overheads of creating a new thread, deleting a thread, switching between threads, or terminating a thread are low compared to creating, switching or terminating processes.

An example - the Java run time environment can be implemented as one process with multiple threads.


Multiprogramming problems

In a multi-programmed setup, cannot assume any order in the way threads of control are scheduled by the system.

When resources are shared by different threads, various problems emerge that are not present in a single threaded setup. They can be very tricky to find, as they can depend on a scheduling of the threads which happens very rarely.

Lost update : One thread works on data in the middle of another thread working on it, and result is overwritten.

Synchronisation : How to get consumer thread to wait on producer thread

Deadlock : Each thread needs a resource held by the other, both are blocked.

Solving these kinds of problems usually involves being able to ensure mutual exclusion – while one thread is working on shared data, able to exclude others from working on it.

The OS is responsible for setting up, scheduling, and closing down threads, and is the obvious location for tools to allow them co-operate and avoid these problems.

Multithreading in Java gives a simple example of how things can go wrong.


Multithreading in Java (1)

class Globals {

public int count,outs,ins;

Globals(int initcount) {

count = initcount;

outs = ins = 0;

}

public void updatecount(int i) {

count += i;

}

}



class Producer extends Thread {

Globals current;

public Producer(Globals current) {

this.current = current;

}

public void run() {

while(true) {

current.updatecount(1);

current.ins++;

//System.out.println(“Produced " + current.ins);

}

}

}



class Consumer extends Thread {

Globals current;

public Consumer(Globals current) {

this.current = current;

}

public void run() {

while(true) {

current.updatecount(-1);

current.outs++;

//System.out.println(“Consumed " + current.ins);

}

}

}



class Mythreads {

public static void main(String[] args) {

Globals current = new Globals(0);

Consumer outthread = new Consumer(current);

Producer inthread = new Producer(current);

inthread.start();

outthread.start();

while(true) {

System.out.println("Diff = "

+ (current.ins - current.outs - current.count)

+ " Count = " + current.count

+ " In = " + current.ins + " Out = " + current.outs);

}

}

}


Multithreading in C (1)

A C compiler can generate calls to the OS routines for creating and managing threads. In C these calls are mostly explicitly made by the programmer. The Java run time system hides this lower level from the programmer, hiding the OS calls it makes. We’ll be using C a lot.

C uses ‘fork()’ to start new thread which then continues same program as existing thread. Difference is that fork() returns process id of new thread to calling thread, returns 0 to child thread. Many other OS calls used.

main()

{

int pid; /* local variable */

pid = fork(); /* new thread, both threads continue */

if (pid) { /* true if non-zero – so in original caller thread */

caller thread activities go here – usually loop

} else { /* pid == 0, so in child thread */

child thread activities go here – usually loop

}

}


The Trouble with Threads

• They access shared data (almost always)

• They break in on each other at totally unpredictable times

Result : bad behaviour, such as in ‘Multithreading in Java 1 – 4’, which behaves unpredictably and not as might be expected.

Cure : we can’t avoid sharing data, but maybe we can prevent break-ins. (locks come into this somewhere)

Want to make some sections of code ‘atomic’, i.e. can’t break into – these are known as ‘critical sections’, only one thread can be in a CS at a time, others must wait.

Want ‘mutual exclusion’ – if one thread is working on some shared data, make other threads wait to get at it until first thread is done with it – put processing of shared data in a critical section.

Need a kind of ‘lock’ – if open, a thread can pass through, setting the lock to prevent other threads coming through after it - these then have to wait until the thread that has the lock releases it again.

In Java, each object has a ‘lock’, which a thread can try to acquire by using the keyword ‘synchronized’ - can be used to fix problems above.

In C, we interact directly with OS services to control threads.

More on these later, but first some thoughts on locks


The Trouble with Locks 1

What’s to stop two threads trying to set the lock at the same time?

This can actually happen in a multiprocessor system. With a single processor, both threads can’t be active simultaneously, but one might be just setting the lock when the other gets in to set it, and both end up thinking they can go ahead – lost update type of problem again.

Need to make the ‘lock’ operation itself atomic. Trickier than it seems – usually depends on some ‘atomic’ hardware feature, though pure software solutions do exist (cf Dekker’s Algorithm or Peterson’s Algorithm)

When a thread tries to execute a ‘locked’ section of code, it must wait, but how?

The disappointed thread might just keep trying, checking the lock over and until it is open and the thread can get it. If the lock gets set for only a very brief time this ‘busy waiting’ is o.k., but otherwise it is very wasteful of machine cycles, usually best avoided.

Instead, suspend the dissappointed thread and put it on a queue or set of threads waiting for this lock. Now when lock is released, need to be able to resume one of the waiting threads.

(So there is more to Java’s ‘synchronized’ than meets the eye)


The Trouble with Locks 2

Making the ‘lock’ operation atomic

At the lowest level, i.e. the hardware, two main approachs

Approach 1 : Turn off interrupts,

if lock not set then set itelse prepare to wait,

turn on interrupts

Often used on single processor systems – since interrupts are needed to break into a running thread, nothing can break into the thread that turns off interrupts. N.B. should only turn off interrupts for very short time. But won’t work on multiprocessor system, as threads keep running on other processors.

Approach 2 :

register = set, exchange(register,lock), if (register==set) wait

Requires an ‘atomic’ machine instruction that can exchange data in memory with data in a register. If the lock was already set, wait, otherwise you have set it, proceed. Works for multiprocessor.

These are used to build more convenient higher level constructs, such as ‘semaphores’, ‘monitors’ and ‘message passing’


Semaphores

Low level mechanism for mutual exclusion

Typically provided at the operating system level

Due to Dijkstra, who based it on train signals (i.e. semaphores)

Basically,

an integer variable, a queue for waiting threads,

and two operations, wait and signal, both’atomic’

Used (different versions exist):

init initialise the variable to something >= 0 (often 0)

wait if (--variable < 0)

suspend thread and put on this semaphore’s queue

signal if (++variable <= 0)

take a suspended process from the semaphore’s queue and put it on ‘ready to run’ queue used by operating system.

Widely used at OS level – care needed, as easy to get mixed up if many semaphores and lots of code


Monitors

Programming language level construct for mutual exclusion

Developed by Tony Hoare, also Per Brinch Hansen

N.B. totally different to batch job ‘monitor’ mentioned earlier.

Easier to use, fewer programming errors, than semaphores.

Compiler may implement it using operating system semaphore calls.

Basically

very like an object - data and methods - but with a lock and a queue for waiting threads

thread must acquire the lock in order to use any part of monitor, so only one thread using any part of monitor at a time

lock is re-opened when thread leaves monitor

lock also re-opened when thread in monitor waits on a condition – also need a way to keep track of threads that are waiting on a condition

need to be able to signal waiting threads when condition becomes true, so that one of them will again try to re-obtain the lock, continue in the monitor from where it did the wait

Modified monitor structure is used in Java to support multithreading


Locking hierarchy

The locking needed to support use of multithreading comes in various guises, typically :

High Level Language Level

Monitor type construct – easier for programmers to use

Compiler may implement monitor by generating calls to operating system semaphore functions

Programmers may also be able to use semaphores directly

Operating System Level

Message passing (distributed operating systems)

Semaphores or similar constructs

Trickier to program with than monitors

Implementation usually based on atomic hardware operations

Computer Hardware

Disable interrupts (single processor system)

Atomic ‘exchange’ or ‘test and set’ machine instructions


Java Threads - Locking

Quick summary

Based on monitor concept, but not exactly the same

Every object has a lock, a set to hold threads waiting on the lock, and a set to hold threads that have called wait().

A method can be made a critical section by using ‘synchronized’ – this obtains the lock on the object on method entry, releases it on exit, in between a thread that tries to use a synchronized method of the object must wait.

Lock can be obtained instead by using synchronized(myobject) { instructions} - now the {instructions} cannot be broken into by another thread which also uses synchronized(myobject) { … } N.B. if another thread does not use ‘synchronized(myobject)’, it can get in at any time, since it does not try to acquire the ‘myobject’ lock.

Once a thread has acquired a lock it can acquire it again many times, e.g. by nested calls to synchronised methods in the same class. The lock must be released as many times as it was gained to unlock the object.

wait() causes thread to release the lock, become suspended in wait() set.

notify() causes thread in wait() set to be moved to the ‘waiting on lock’ set. Method that used notify() runs on to completion. Moved thread will get the object lock again sometime – it then continues from where it called wait().

comp. architecture – assumed known

Documents