2010 1 Model Exam


Model Examination
Operating Systems / Besturingssystemen

Nr: 1

Q: Which of these statements on system calls are not correct?

A: (a) Invoking a system call triggers a transition from user to kernel space.
(b) The system call interface contributes to the protection of the operating system.
(c) The GUI API is part of the system call interface.
(d) Invoking a system call has similar performance to invoking a library function.

Nr: 2

Q: Give a short explanation for each of the following three questions regarding microkernels.
a) What is the main advantage of the microkernel approach to system design?
b) How do user programs and system services interact in a microkernel architecture?
c) What are the disadvantages of using the microkernel approach?

A: a) Benefits typically include the following: (a) adding a new service does not require modifying the kernel, (b) it is safer and more secure as more operations are done in user mode than in kernel mode, and (c) a simpler kernel design and functionality typically results in a more reliable operating system.

b) User programs and system services interact in a microkernel architecture by using interprocess communication mechanisms such as messaging. These messages are conveyed by the operating system.

c) The primary disadvantages of the microkernel architecture are the overheads associated with interprocess communication and the frequent use of the operating systemʼs messaging functions in order to enable the user process and the system service to interact with each other.

Nr: 3

Q: Describe four typical resources of a process / thread and state whether they are shared or disjoint when implementing a multi-process or multi-threaded program on a Linux system using the POSIX APIs to create processes and threads.

A: Address space: process: disjoint, thread: shared
Instruction pointer: process: disjoint, thread: disjoint
Stack pointer: process: disjoint, thread: disjoint
Global variables: process: disjoint, thread: shared


Nr: 4

Q: Add the names of the states to the process state diagram. Redraw the whole diagram on the answer sheet.

[Diagram: process state diagram with unlabeled state boxes; the transition labels read "admitted", "exit", "interrupt", "scheduler dispatch", "I/O or event completion", and "I/O or event wait".]

A: The five states are new, ready, running, waiting, and terminated. The transitions are:
new -> ready (admitted)
ready -> running (scheduler dispatch)
running -> ready (interrupt)
running -> waiting (I/O or event wait)
waiting -> ready (I/O or event completion)
running -> terminated (exit)

Nr: 5

Q: Mark the elements that are usually part of a Process Control Block (PCB).
a) Instruction pointer
b) Free frame list
c) List of open files
d) User ID
e) Global variables


A: a) Instruction pointer
b) Free frame list
c) List of open files
d) User ID
e) Global variables

Nr: 6

Q: What are typical benefits of multithreaded programming compared to using multiple processes within your application?

A: (a) Smaller memory footprint of the resulting application.
(b) Faster context switches between threads.
(c) Easier programming and less communication overhead.
(d) Easier sharing of resources and data.
(e) Usage of multiple CPU cores.

Nr: 7

Q: Which of the following components of program state are shared across threads in a multithreaded process?
a) Register values
b) Heap memory
c) Global variables
d) Stack memory

A: a) Register values
b) Heap memory
c) Global variables
d) Stack memory

Nr: 8


Q: The program shown below uses the Pthreads API. What would be the output of the program at LINE C and LINE P?

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int value = 0;

void *runner(void *param);

int main(int argc, char *argv[]) {
    pid_t pid;
    pthread_t tid;
    pthread_attr_t attr;

    pid = fork();

    if (pid == 0) {
        pthread_attr_init(&attr);
        pthread_create(&tid, &attr, runner, NULL);
        pthread_join(tid, NULL);
        printf("C: %d\n", value); /* LINE C */
    } else if (pid > 0) {
        wait(NULL);
        printf("P: %d\n", value); /* LINE P */
    }
}

void *runner(void *param) {
    value = 5;
    pthread_exit(0);
}

A: C: 5
P: 0

Nr: 9

Q: Give the exponential average formula used to predict the length of the next CPU burst. What are the implications of assigning the following values to the parameters used by the algorithm?
a) α = 0 and τ0 = 100 milliseconds
b) α = 0.99 and τ0 = 10 milliseconds

A: τn+1 = α tn + (1-α) τn (0.5 credit)

When α = 0 and τ0 = 100 milliseconds, the formula always makes a prediction of 100 milliseconds for the next CPU burst.
When α = 0.99 and τ0 = 10 milliseconds, the most recent behavior of the process is given much higher weight than the past history associated with the process. Consequently, the scheduling algorithm is almost memoryless, and simply predicts the length of the previous burst for the next quantum of CPU execution.


Nr: 10

Q: Mark the statements on scheduling that are true.
a) Preemptive scheduling always leads to better CPU utilization compared to non-preemptive scheduling.
b) Preemptive scheduling allows the OS to enforce fair scheduling, which cannot be guaranteed in non-preemptive scheduling.
c) With preemptive scheduling, the scheduler can schedule another process in case a process blocks on an I/O call. This cannot be done with non-preemptive scheduling.
d) Preemptive scheduling does not require special hardware support.

A: a) Preemptive scheduling always leads to better CPU utilization compared to non-preemptive scheduling.
b) Preemptive scheduling allows the OS to enforce fairness, which cannot be guaranteed in non-preemptive scheduling.
c) With preemptive scheduling, the scheduler can schedule another process in case a process blocks on an I/O call. This cannot be done with non-preemptive scheduling.
d) Preemptive scheduling does not require special hardware support.

Nr: 11

Q: Consider the following set of processes, with the length of the CPU-burst time given in milliseconds:

Process  Burst Time  Priority
P1       10          3
P2       3           1
P3       1           5
P4       4           4
P5       5           2

The processes are assumed to have arrived in the order P1, P2, P3, P4, P5, all at time 0.

a) Draw four Gantt charts illustrating the execution of these processes using FCFS, SJF, a nonpreemptive priority (a smaller priority number implies a higher priority), and RR (quantum = 1) scheduling.

b) What is the waiting time of each process for each of the scheduling algorithms in part a) and which of the schedules results in the minimal average waiting time?


A: a)
FCFS: P1 (0-10), P2 (10-13), P3 (13-14), P4 (14-18), P5 (18-23)
SJF: P3 (0-1), P2 (1-4), P4 (4-8), P5 (8-13), P1 (13-23)
Priority: P2 (0-3), P5 (3-8), P1 (8-18), P4 (18-22), P3 (22-23)
RR (q=1): P1 P2 P3 P4 P5 | P1 P2 P4 P5 | P1 P2 P4 P5 | P1 P4 P5 | P1 P5 | P1 P1 P1 P1 P1 (one 1-ms slot each, 0-23)

b)

Proc.  FCFS  SJF  PRIO  RR1
P1     0     13   8     13
P2     10    1    0     8
P3     13    0    22    2
P4     14    4    18    11
P5     18    8    3     13
AVG    11    5.2  10.2  9.4
SJF has the shortest average waiting time.

Nr: 12

Q: A husband and his wife are accessing their shared banking account in parallel. The initial balance is 500 EUR. The husband is withdrawing 100 EUR. The wife is depositing 200 EUR. Assume that the bank offers the following two operations to manipulate the account's balance:

int read_balance();
void store_balance(int new_balance);

a) Explain why there might be a problem and what the result of this might be.
b) What needs to be ensured to prevent this problem and how can this be achieved?


A: a) The account's current balance can be seen as a shared variable. Parallel access may lead to a race condition, so we have a critical section problem:

void change_account(int amount) {
    int balance;
    entry_section();
    balance = read_balance();
    balance = balance + amount;
    store_balance(balance);
    exit_section();
}

Depending on interleaving of operations, results may be 400 EUR, 600 EUR, or 700 EUR as new balance.

b) The critical section must be protected to fulfill the three requirements mutual exclusion, progress, and bounded waiting. This can, e.g., be achieved using monitors, semaphores, or similar synchronization mechanisms.

Nr: 13

Q: Mark those statements on synchronization mechanisms that are true.
a) Binary semaphores are also known as monitors.
b) Semaphores are implemented using busy waiting.
c) On single-processor systems, atomicity of a semaphore's wait() and signal() operations can be ensured by disabling interrupts during these functions.
d) To experience the priority inversion problem, your system must have at least three priority levels.

A: a) Binary semaphores are also known as monitors.
b) Semaphores are implemented using busy waiting.
c) On single-processor systems, atomicity of a semaphore's wait() and signal() operations can be ensured by disabling interrupts during these functions.
d) To experience the priority inversion problem, your system must have at least three priority levels.

Nr: 14

Q: Which of the listed alternatives is not a strategy to address the deadlock problem?
a) Deadlock prevention
b) Deadlock shifting
c) Deadlock avoidance
d) Deadlock detection


A: a) Deadlock prevention
b) Deadlock shifting
c) Deadlock avoidance
d) Deadlock detection

Nr: 15

Q: Explain the definitions of base and limit registers and also provide a diagram to illustrate their use to provide memory protection in contiguous memory allocation.

A: The base register holds the smallest legal physical memory address.
The limit register specifies the size of the legal address range.
Graphics: see Fig. 8.6 in the book.

Nr: 16

Q: Paging is an important memory-management scheme.
a) In your own words, explain the reason why paging is used in most operating systems (not considering virtual memory management).
b) In order to implement a paging scheme, frames and pages are very important. In your own words, explain frames and pages and also give the reason why the page or frame size is typically a power of 2.
c) Consider a paging system with the page table stored in memory. If a memory reference takes 200 nanoseconds, how long does a paged memory reference take?

A: a) It permits the physical address space of a process to be noncontiguous; avoids external fragmentation and the need for compaction; solves the problem of fitting memory chunks of varying sizes onto the backing store.

b) Frames: breaking physical memory into fixed-sized blocks. Pages: breaking logical memory into fixed-size blocks. Paging is implemented by breaking up an address into a page and offset number. It is most efficient to break the address into X page bits and Y offset bits, rather than performing arithmetic on the address to calculate the page number and offset. Because each bit position represents a power of 2, splitting an address between bits results in a page size that is a power of 2.

c) 400 nanoseconds: 200 nanoseconds to access the page table and 200 nanoseconds to access the word in memory.

Nr: 17

Q: What is the cause of thrashing? How does the system detect thrashing? Once it detects thrashing, what can the system do to eliminate this problem?

A: Thrashing is caused by under-allocating the minimum number of pages a process requires, forcing it to page-fault continuously.
The system can detect thrashing by evaluating the level of CPU utilization as compared to the level of multiprogramming.
Reducing the level of multiprogramming can eliminate it.

Nr: 18


Q: Given a physical memory consisting of only three frames, all frames initially empty:
a) How many page faults occur for the reference string given below for each of the following replacement algorithms: (1) FIFO, (2) OPT, and (3) LRU? In addition to counting the page faults, please give short explanations of the basic concepts of these three algorithms.
1, 2, 3, 4, 2, 1, 5, 6, 2, 1, 2, 3, 6
b) Of the three page-replacement algorithms considered here (according to the results of question a)), which one is the best and which one is impossible to implement exactly? Give a short explanation.

A: a) (1) FIFO: first-in, first-out page replacement algorithm. A FIFO replacement algorithm associates with each page the time when that page was brought into memory. When a page must be replaced, the oldest page is chosen. 11 page faults occur.
(2) OPT: optimal page replacement algorithm. Replace the page that will not be used for the longest period of time. 7 page faults occur.
(3) LRU: least-recently-used page replacement algorithm. Replace the page that has not been used for the longest period of time. 11 page faults occur.

b) The best one is OPT, since it causes the fewest page faults and therefore the lowest effective memory access time. OPT is impossible to implement exactly, since it requires future knowledge of the reference string.

Nr: 19

Q: List major operations in direct file access and illustrate what happens to the file pointer (cp) in each operation.

A: a) Reset: cp = 0;
b) Read next: read cp; cp = cp + 1;
c) Write next: write cp; cp = cp + 1;


Nr: 20

Q: Mark the correct statements.
a) A disk partition cannot be used raw.
b) RAID can be used for disk protection against failure.
c) Acyclic-graph directories never allow shared subdirectories and files.
d) Mean time to repair is the average time it takes to replace a failed disk and to restore the data on it.

A: a) A disk partition cannot be used raw.
b) RAID can be used for disk protection against failure.
c) Acyclic-graph directories never allow shared subdirectories and files.
d) Mean time to repair is the average time it takes to replace a failed disk and to restore the data on it.

Nr: 21

Q: Mark typical layers of a file system.

a) Logical file system
b) I/O control
c) Hash table
d) Basic file system
e) Extended file system
f) FAT system
g) File-organization module

A: a) Logical file system
b) I/O control
c) Hash table
d) Basic file system
e) Extended file system
f) FAT system
g) File-organization module

Nr: 22

Q: Some file systems allow disk storage to be allocated at different levels of granularity. For instance, a file system could allocate 4KB of space as a single 4KB block or as eight 512-byte blocks. How could we use this flexibility to improve performance? What modifications would have to be made to the free-space management scheme in order to support this feature?

A: Such a scheme would decrease internal fragmentation. If a file is 5KB, then it could be allocated a 4KB block and two contiguous 512-byte blocks. In addition to maintaining a bitmap of free blocks, one would also have to maintain extra state regarding which of the subblocks are currently being used inside a block. This allocator would then have to examine this extra state to allocate subblocks and coalesce the subblocks to obtain the larger block when all of the subblocks become free.


Nr: 23

Q: Illustrate and shortly explain the FAT structure for a file consisting of disk blocks 220, 498, 19, and 300.

A: a) Figure 11.7 in the book.
b) A FAT (File Allocation Table) is a method of disk-space allocation and represents a variant of linked allocation. A section of disk at the beginning of each volume is set aside to contain the table. The directory entry contains the block number of the first block of the file. The table entry indexed by that block contains the block number of the next block in the file. This chain continues until it reaches the last block, which has a special end-of-file value as the table entry.

Nr: 24

Q: Fill in: (p. 506)
Transfer rate is ____________________
Random-access time is ____________________

A: Transfer rate is the rate at which data flow between the drive and the computer. Random-access time consists of seek time (time necessary to move the disk arm to the desired cylinder) and rotational latency time (time necessary for the desired sector to rotate to the disk head).

Nr: 25

Q: Shortly explain the SSTF scheduling algorithm. What is the schedule sequence for the given example? What is the total head movement?
Example: Suppose that a disk drive has 300 cylinders (0-299). The disk needs to read cylinders 85, 170, 50, 125, 2, 200, 65, 67. The current head position is cylinder 90.

A: The SSTF algorithm selects the request with the least seek time from the current or initial head position; in other words, it chooses the request closest to the current position. In the given example, the schedule sequence is: 90-85-67-65-50-2-125-170-200. The total head movement is 5 + 18 + 2 + 15 + 48 + 123 + 45 + 30 = 286 cylinders.

Nr: 26


Q: Mark those statements that are true.

a) FCFS always reads the block closest to the current head position.
b) For the queue sequence 1, 5, 8, 15 and current head position 9, the SSTF algorithm will next read block 8.
c) In the SCAN algorithm the head continuously scans back and forth across the disk.
d) LOOK scheduling implements a variant of the FCFS algorithm.

A: a) FCFS always reads the block closest to the current head position.
b) For the queue sequence 1, 5, 8, 15 and current head position 9, the SSTF algorithm will next read block 8.
c) In the SCAN algorithm the head continuously scans back and forth across the disk.
d) LOOK scheduling implements a variant of the FCFS algorithm.

Nr: 27

Q: What is the purpose of interrupts? What are the differences between a trap and an interrupt? Can a user program generate traps intentionally? If so, what is the purpose?

A: An interrupt is a hardware-generated change-of-flow within the system. An interrupt handler is summoned to deal with the cause of the interrupt; control is then returned to the interrupted context and instruction. A trap is a software-generated interrupt. An interrupt can be used to signal the completion of an I/O to obviate the need for device polling. A trap can be used to call operating system routines or to catch arithmetic errors.

Nr: 28

Q: Consider the following I/O scenarios on a single-user PC.
a) A mouse used with a graphical user interface
b) A disk drive containing user files
For each of these I/O scenarios, would you design the operating system to use buffering, spooling, caching, or a combination? Would you use polled I/O or interrupt-driven I/O? Give reasons for your choices.

A: a) Buffering may be needed to record mouse movement during times when higher-priority operations are taking place. Spooling and caching are inappropriate. Interrupt-driven I/O is most appropriate.

b) Buffering can be used to hold data while in transit from user space to the disk, and vice versa. Caching can be used to hold disk-resident data for improved performance. Spooling is not necessary because disks are shared-access devices. Interrupt-driven I/O is best for devices such as disks that transfer data at slow rates.

Nr: 29


Q: Mark those statements on access control matrices that are correct.
a) Rows represent objects
b) Rows represent domains
c) Columns represent objects
d) Columns represent domains
e) Can be expanded to include dynamic protection
f) Cannot be expanded for dynamic protection
g) Access control matrices are not used in modern operating systems
h) Separates the mechanism from the policy

A: a) Rows represent objects
b) Rows represent domains
c) Columns represent objects
d) Columns represent domains
e) Can be expanded to include dynamic protection
f) Cannot be expanded for dynamic protection
g) Access control matrices are not used in modern operating systems
h) Separates the mechanism from the policy

Nr: 30

Q: Here is an excerpt from the output of an “ls -l /var/www” command:

drwxrwxrw- 1 wwwdata wwwdata 115K Dec 24 21:18 htdocs

1) Can the user mike (member of the group staff) execute the command "touch /var/www/htdocs/info.php"? If not, explain why.

2) Give the commands required so that any users of group webauthors can access and edit files that are created in /var/www/htdocs. It is a requirement that owner and group of the directory remains as is.

3) Write the sequence of commands that the wwwdata user should execute in order to allow the user root to write files in the htdocs directory.

A: 1) No, the user mike cannot write any file inside the directory because (i) he is not owner of the file, (ii) not member of group wwwdata, and (iii) other users do have write permissions but cannot access (x) the directory.

2) setfacl -m g:webauthors:rwx,d:g:webauthors:rwx ./htdocs
3) There is no need to execute any command; root has write permissions on any file.

Nr: 31

Q: Give a short explanation:
1) What are the setuid/setgid flags used for, and how do they apply to files and directories?
2) What is the sticky flag used for? Does the sticky flag also limit root?

A: 1) setuid and setgid allow users to run an executable with the permissions of the executable's owner or group, respectively. In the case of a directory, files created in it are given the specified UID/GID.

2) The sticky flag on a directory makes the operating system allow only a file's owner (or the directory's owner) to delete or rename the files inside it. No, root can always delete files and directories.


Nr: 32

Q: Describe a stack overflow and how an attacker can take advantage of it. Explain the term shellcode.

A: The attacker supplies input that writes to local (stack) variables beyond array boundaries.
The overflow overwrites the return address of the current function on the stack; the new return address points to the beginning of the exploited buffer, so the injected content gets executed when the current function returns.
The content that is written is usually executable code, crafted to execute, for example, the command-line interpreter (shell); such injected code is called shellcode.

Nr: 33

Q: Name at least 4 protections against stack overflows, and briefly describe how they work.

A: Use of type-safe languages: cannot overrun buffers

Stack Guard / Stack Protector with Canaries: place random nonce on the stack just below return address. Verify before return. Will detect overwrite.

Inverted Stack: Stack grows from bottom to top. Overflowing an array in a function will not affect return address (on same lexical level).

Address Space Randomization: Addresses on stack change whenever a program gets restarted.

W^X Memory Management: A page is either Writeable or eXecutable but not both at the same time.

NX Bit for memory pages: CPU can flag pages as non-executable and traps if IP is set to this page.

Nr: 35

Q: What is a Kernel Module? For what purposes are they often used in modern operating systems? Write the code used to initialize and destroy a Linux Kernel Module and to print “Hello World from the Kernel!”.


A: A KM is a piece of software that runs inside the kernel memory, can access kernel data structures, etc.

A KM can be used to access a physical device (such as a printer) or to add functionalities to the OS (e.g., a new file system).

1 credit for each of the above

#include <linux/init.h>    // needed by the module_init/module_exit macros
#include <linux/module.h>  // needed by any LKM
#include <linux/kernel.h>  // needed by printk

MODULE_LICENSE("Dual BSD/GPL"); // kernel will complain without this

int start(void) {
    printk(KERN_INFO "Hello World from the Kernel!\n");
    return 0; // a non-zero return value would mean the module couldn't be loaded
}

void stop(void) {
    printk(KERN_INFO "Cleaning up.\n");
}

module_init(start); // macro to set the init function
module_exit(stop);  // macro to set the exit function

Nr: 36

Q: Mark the correct statements.

a) A modern Linux OS has about 100 syscalls.
b) After a syscall ends, the OS transits from Ring 0 to Ring 3 and gives control back to the user-space application.
c) The function printf() is a system call.
d) A kernel module cannot ever subvert the operating system.
e) malloc() is used to allocate memory within a kernel module.
f) A modern Linux kernel is a micro-kernel.

A: a) A modern Linux OS has about 100 syscalls.
b) After a syscall ends, the OS transits from Ring 0 to Ring 3 and gives control back to the user-space application.
c) The function printf() is a system call.
d) A kernel module cannot ever subvert the operating system.
e) malloc() is used to allocate memory within a kernel module.
f) A modern Linux kernel is a micro-kernel.


Reading Assignment:

The following text is from an article from IBM Developer Works, June 2006. Read it carefully and answer the following questions. Answer the questions in your own words; do not excessively cite the article.

Background information on the O() notation:
O-notation gives you an idea how much time an algorithm will use. The time for an O(n) algorithm depends on the input (linear in n), whereas O(n^2) is quadratic in the input. O(1) is independent of the input and operates in constant time.

Expected reading time: 30 min.


Inside the Linux scheduler
The latest version of this all-important kernel component improves scalability
M. Tim Jones, Consultant Engineer, Emulex

This article reviews the Linux 2.6 task scheduler and its most important attributes. But before diving into the details of the scheduler, let's understand a scheduler's basic goals.

What is a scheduler?
An operating system, in a general sense, mediates between applications and available resources. Some typical resources are memory and physical devices. But a CPU can also be considered a resource to which a scheduler can temporarily allocate a task (in quantities called slices of time). The scheduler makes it possible to execute multiple programs at the same time, thus sharing the CPU with users of varying needs.
An important goal of a scheduler is to allocate CPU time slices efficiently while providing a responsive user experience. The scheduler can also be faced with such conflicting goals as minimizing response times for critical real-time tasks while maximizing overall CPU utilization. Let's see how the Linux 2.6 scheduler accomplishes these goals, compared to earlier schedulers.

Problems with earlier Linux schedulers
Before the 2.6 kernel, the scheduler had a significant limitation when many tasks were active. This was due to the scheduler being implemented using an algorithm with O(n) complexity. In this type of scheduler, the time it takes to schedule a task is a function of the number of tasks in the system. In other words, the more tasks (n) are active, the longer it takes to schedule a task. At very high loads, the processor can be consumed with scheduling and devote little time to the tasks themselves. Thus, the algorithm lacked scalability.
The pre-2.6 scheduler also used a single runqueue for all processors in a symmetric multiprocessing system (SMP). This meant a task could be scheduled on any processor, which can be good for load balancing but bad for memory caches. For example, suppose a task executed on CPU-1, and its data was in that processor's cache. If the task got rescheduled to CPU-2, its data would need to be invalidated in CPU-1 and brought into CPU-2.
The prior scheduler also used a single runqueue lock; so, in an SMP system, the act of choosing a task to execute locked out any other processors from manipulating the runqueues. The result was idle processors awaiting release of the runqueue lock and decreased efficiency.
Finally, preemption wasn't possible in the earlier scheduler; this meant that a lower-priority task could execute while a higher-priority task waited for it to complete.

Introducing the Linux 2.6 scheduler
The 2.6 scheduler was designed and implemented by Ingo Molnar. Ingo has been involved in Linux kernel development since 1995. His motivation in working on the new scheduler was to create a completely O(1) scheduler for wakeup, context-switch, and timer interrupt overhead. One of the issues that triggered the need for a new scheduler was the use of Java™ virtual machines (JVMs). The Java programming model uses many threads of execution, which results in lots of overhead for scheduling in an O(n) scheduler. An O(1) scheduler doesn't suffer under high loads, so JVMs execute efficiently.
The 2.6 scheduler resolves the primary three issues found in the earlier scheduler (the O(n) and SMP scalability issues), as well as other problems. Now we'll explore the basic design of the 2.6 scheduler.

Major scheduling structures
Let's start with a review of the 2.6 scheduler structures. Each CPU has a runqueue made up of 140 priority lists that are serviced in FIFO order. Tasks that are scheduled to execute are added to the end of their respective runqueue's priority list. Each task has a time slice that determines how much time it's permitted to execute. The first 100 priority lists of the runqueue are reserved for real-time tasks, and the last 40 are used for user tasks (see Figure 1). You'll see later why this distinction is important.

Figure 1: The Linux 2.6 scheduler runqueue structure.
In addition to the CPU's runqueue, which is called the active runqueue, there's also an expired runqueue. When a task on the active runqueue uses all of its time slice, it's moved to the expired runqueue. During the move, its time slice is recalculated (and so is its priority; more on this later). If no tasks exist on the active runqueue for a given priority, the pointers for the active and expired runqueues are swapped, thus making the expired priority list the active one.
The job of the scheduler is simple: choose the task on the highest priority list to execute. To make this process more efficient, a bitmap is used to define when tasks are on a given priority list. Therefore, on most architectures, a find-first-bit-set instruction is used to find the highest priority bit set in one of five 32-bit words (for the 140 priorities). The time it takes to find a task to execute depends not on the number of active tasks but instead on the number of priorities. This makes the 2.6 scheduler an O(1) process because the time to schedule is both fixed and deterministic regardless of the number of active tasks.


Better support for SMP systems
So, what is SMP? It's an architecture in which multiple CPUs are available to execute individual tasks simultaneously, and it differs from traditional asymmetrical processing in which a single CPU executes all tasks. The SMP architecture can be beneficial for multithreaded applications.
Even though the prior scheduler worked in SMP systems, its big-lock architecture meant that while a CPU was choosing a task to dispatch, the runqueue was locked by the CPU, and others had to wait. The 2.6 scheduler doesn't use a single lock for scheduling; instead, it has a lock on each runqueue. This allows all CPUs to schedule tasks without contention from other CPUs.
In addition, with a runqueue per processor, a task generally shares affinity with a CPU and can better utilize the CPU's hot cache.

Task preemption

Another advantage of the Linux 2.6 scheduler is that it allows preemption: a lower-priority task will not keep executing while a higher-priority task is ready to run. The scheduler preempts the lower-priority task, places it back on its priority list, and then reschedules.

But wait, there's more!

As if the O(1) nature of the 2.6 scheduler and preemption weren't enough, the scheduler also offers dynamic task prioritization and SMP load balancing. Let's discuss what these are and the benefits they provide.

Dynamic task prioritization

To prevent tasks from hogging the CPU and starving other tasks that need CPU access, the Linux 2.6 scheduler can dynamically alter a task's priority. It does so by penalizing tasks that are CPU-bound and rewarding tasks that are I/O-bound. I/O-bound tasks commonly use the CPU only to set up an I/O operation and then sleep awaiting its completion; this behavior gives other tasks access to the CPU.

Because I/O-bound tasks are viewed as altruistic with CPU access, their priority value is decreased (a reward) by a maximum of five levels, while CPU-bound tasks are punished by having their priority value increased by up to five levels. Whether a task is I/O-bound or CPU-bound is determined by an interactivity heuristic: a task's interactivity metric is calculated from how much time the task executes compared to how much time it sleeps. Because I/O-bound tasks schedule I/O and then wait, they spend more time sleeping, which increases their interactivity metric.

It is important to note that these priority adjustments are performed only on user tasks, not on real-time tasks.
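A simplified version of such a heuristic can be sketched as follows. The formula here is invented for illustration (the actual 2.6 kernel uses a sleep-average bookkeeping scheme); only the bounds, plus-or-minus five priority levels, come from the text above.

```python
MAX_BONUS = 5  # 2.6 adjusts a user task's priority by at most +/-5 levels

def priority_adjustment(sleep_time, run_time):
    """Hypothetical interactivity heuristic: the fraction of time spent
    sleeping maps to a bonus (negative = better priority) for I/O-bound
    tasks and a penalty (positive) for CPU-bound tasks."""
    total = sleep_time + run_time
    if total == 0:
        return 0
    interactivity = sleep_time / total  # 1.0 = pure sleeper, 0.0 = CPU hog
    # Scale [0, 1] linearly onto [+MAX_BONUS, -MAX_BONUS].
    return round(MAX_BONUS - 2 * MAX_BONUS * interactivity)

print(priority_adjustment(sleep_time=90, run_time=10))  # -4: I/O-bound, rewarded
print(priority_adjustment(sleep_time=10, run_time=90))  # +4: CPU-bound, punished
```

In a full implementation this adjustment would be recomputed when a task's time slice expires, and skipped entirely for real-time tasks.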

SMP load balancing

When tasks are created in an SMP system, they are placed on a given CPU's runqueue. In general, you cannot know whether a task will be short-lived or will run for a long time, so the initial allocation of tasks to CPUs is likely suboptimal.

To maintain a balanced workload across CPUs, work can be redistributed, taking tasks from an overloaded CPU and giving them to an underloaded one. The Linux 2.6 scheduler provides this functionality through load balancing: every 200 ms a processor checks whether the CPU loads are unbalanced and, if so, performs a cross-CPU balancing of tasks. A negative aspect of this process is that the new CPU's cache is cold for a migrated task (its data must be pulled into the cache).
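The balancing step can be sketched as follows. This is a deliberately naive illustration: it compares raw queue lengths, whereas the real balancer uses weighted load figures, and the function name is invented.

```python
def balance(runqueues):
    """Sketch: move one task from the busiest to the idlest runqueue when
    the imbalance exceeds one task. The real 2.6 balancer runs roughly
    every 200 ms and uses weighted loads rather than queue lengths."""
    busiest = max(runqueues, key=len)
    idlest = min(runqueues, key=len)
    if len(busiest) - len(idlest) > 1:
        # The migrated task starts with a cold cache on its new CPU.
        idlest.append(busiest.pop())

cpus = [["a", "b", "c", "d"], []]  # CPU 0 overloaded, CPU 1 idle
balance(cpus)
print(cpus)  # [['a', 'b', 'c'], ['d']]
```

The "imbalance exceeds one task" guard matters: migrating on every tiny difference would ping-pong tasks between CPUs and pay the cold-cache penalty repeatedly for no gain.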


Remember that a CPU's cache is local (on-chip) memory that offers faster access than system memory. If a task executes on a CPU and its data is brought into that CPU's local cache, the cache is considered hot for that task; if none of the task's data is in the local cache, the cache is considered cold. A cold cache for a migrated task is unfortunate, but the penalty is made up for by keeping all CPUs busy.


Nr: 37

Q: What were the two main problems addressed by the new scheduler in the 2.6 kernel, and by what mechanisms are they addressed?

A: The older kernels had an O(n) scheduler whose scheduling cost scaled linearly with the number of tasks to be scheduled; the new O(1) scheduler schedules in constant time. This is especially important for systems running Java VMs, which often come with many threads.

In addition, the scheduler includes a load-balancing feature and per-processor runqueues that allow better performance on SMP systems.

Nr: 38

Q: What were the main mechanisms that allow the new scheduler to schedule tasks in O(1)?

A: Having active and expired runqueues with individual lists per priority allows the scheduler to recalculate time slices without walking through task lists.

FIFO ordering of the priority runqueues allows selection of the next task independent of queue length.

Using a bitmap and a find-first-bit-set instruction, selecting the next task to be run does not even depend significantly on the number of priorities.

Nr: 39

Q: Mark those statements that are wrong.

a) On SMP systems, the pre-2.6 scheduler could schedule each job only on the same CPU.

b) The O(1) scheduler is an instance of a multi-level feedback scheduling system.

c) The size of the bitmap is determined by the number of priorities.

d) On an SMP system with 4 processors, the O(1) scheduler needs to lock all 4 runqueues when scheduling a new process for processor 1.

e) If load balancing initiates a transfer of a task from one runqueue to that of another processor, the destination's cache is hot.

f) Priority adjustment is done for both real-time and non-real-time tasks.

g) The O(1) scheduler achieves processor affinity and load balancing.

h) I/O-bound tasks will have a higher interactivity metric.


A: Statements a), d), e), and f) are wrong:

a) Wrong: the pre-2.6 scheduler used a single shared runqueue, so a job was not tied to one CPU.

b) Correct: with per-priority lists and dynamic priority adjustment, the O(1) scheduler is a multi-level feedback scheduler.

c) Correct: the bitmap holds one bit per priority (140 bits, stored in five 32-bit words).

d) Wrong: each runqueue has its own lock, so scheduling for processor 1 locks only that processor's runqueue.

e) Wrong: the destination processor's cache is cold for a migrated task.

f) Wrong: priority adjustment is performed only on user tasks, not on real-time tasks.

g) Correct: per-processor runqueues provide processor affinity, and periodic cross-CPU balancing redistributes load.

h) Correct: I/O-bound tasks spend more time sleeping, which increases their interactivity metric.