UnixForum Chicago - March 8, 2001 Daniel P. Bovet University of Rome "Tor Vergata" INSIDE THE LINUX KERNEL

Page 1: INSIDE THE LINUX KERNEL

UnixForum Chicago - March 8, 2001

Daniel P. Bovet, University of Rome "Tor Vergata"

INSIDE THE LINUX KERNEL

Page 2: INSIDE THE LINUX KERNEL

WHAT IS A KERNEL? (1/2)

it’s a program that runs in Kernel Mode

CPUs run either in Kernel Mode or in User Mode

when in User Mode, some parts of RAM can’t be addressed, some instructions can’t be executed, and I/O ports can’t be accessed

when in Kernel Mode, no restriction is put on the program

Page 3: INSIDE THE LINUX KERNEL

WHAT IS A KERNEL? (2/2)

besides running in Kernel Mode, kernels have three other peculiarities:

large size (millions of machine language instructions)

machine dependency (some parts of the kernel must be coded in Assembly language)

loading into RAM at boot time in a rather primitive way

Page 4: INSIDE THE LINUX KERNEL

ENTERING THE KERNEL PROGRAM (1/2)

when the CPU is running in User Mode

[Figure labels: User Mode, Kernel Mode]

Page 5: INSIDE THE LINUX KERNEL

ENTERING THE KERNEL PROGRAM (2/2)

when the CPU is running in Kernel Mode

[Figure labels: User Mode, Kernel Mode]

Page 6: INSIDE THE LINUX KERNEL

NESTED KERNEL INVOCATIONS

some similarity with nested function calls

[Figure labels: A, B, C]

different because events causing kernel invocations are not (usually) related to the running program

Page 7: INSIDE THE LINUX KERNEL

KERNEL ENTRY POINTS

Kernel

software interrupt --->

I/O device requires attention --->

time interval elapsed --->

hardware failure --->

faulty instruction --->

Page 8: INSIDE THE LINUX KERNEL

IS AN INSTRUCTION REALLY FAULTY?

faulty instructions may occur for two distinct reasons:

programming error

deferred allocation of some kind of resource

the kernel must be able to identify the reason that caused the exception

Page 9: INSIDE THE LINUX KERNEL

EXCEPTIONS RELATED TO DEFERRED ALLOCATION

two cases of deferred allocation of resources in Linux

page frames (demand paging, Copy On Write)

floating point registers
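
The Copy On Write case can be illustrated with a toy model. The sketch below is not kernel code: `Frame` and `AddressSpace` are hypothetical names, and a "write fault" is modelled as a plain check on a reference counter. It only shows why such a fault is a deferred allocation rather than a programming error: the page frame is duplicated the first time one of the sharers writes to it.

```python
class Frame:
    """A page frame with a reference counter (hypothetical model)."""
    def __init__(self, data):
        self.data = bytearray(data)  # private copy of the contents
        self.refs = 1                # number of address spaces mapping it


class AddressSpace:
    """A list of page frames; fork() shares them instead of copying."""
    def __init__(self, pages):
        self.pages = pages

    def fork(self):
        # No frame is copied here: parent and child map the same frames.
        for frame in self.pages:
            frame.refs += 1
        return AddressSpace(list(self.pages))

    def read(self, page, offset):
        return self.pages[page].data[offset]

    def write(self, page, offset, value):
        frame = self.pages[page]
        if frame.refs > 1:
            # "Write fault" on a shared frame: allocate the copy only now.
            frame.refs -= 1
            frame = Frame(frame.data)
            self.pages[page] = frame
        frame.data[offset] = value


parent = AddressSpace([Frame(b"abcd")])
child = parent.fork()
child.write(0, 0, ord("X"))
print(chr(parent.read(0, 0)), chr(child.read(0, 0)))
```

After the write, the parent still sees its original byte while the child sees the new one, and each address space owns its own frame.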

Page 10: INSIDE THE LINUX KERNEL

WHY IS A KERNEL SO COMPLEX?

large program with many entry points

must offer disk caching to lower average disk access time

must support nested kernel invocations --> must run with the interrupts enabled most of the time

must be updated quite frequently to support new hardware circuits and devices

Page 11: INSIDE THE LINUX KERNEL

HW CONCURRENCY (1/2)

[Figure: I/O device, I/O APIC, and CPU, connected by the IRQ, INT, and INT ACK signals]

the I/O APIC polls the devices and issues interrupts

no new interrupt can be issued until the CPU acknowledges the previous one

good kernels run with interrupts enabled most of the time

Page 12: INSIDE THE LINUX KERNEL

HW CONCURRENCY (2/2)

Symmetrical MultiProcessor architectures (SMP) include two or more CPUs

SMP kernels must be able to execute concurrently on available CPUs

one service routine related to networking runs on a CPU while another routine related to file system runs concurrently on another CPU

Page 13: INSIDE THE LINUX KERNEL

LIMITING KERNEL SIZE

try to distribute kernel functions in smaller programs that can be linked separately

two approaches: microkernels and modules

Linux prefers modules for reasons of efficiency

Page 14: INSIDE THE LINUX KERNEL

MICROKERNELS

only a few functions, such as process scheduling and interprocess communication, are included in the microkernel

other kernel functions such as memory allocation, file system handling, and device drivers are implemented as system processes running in User Mode

microkernels introduce a lot of interprocess communication

Page 15: INSIDE THE LINUX KERNEL

MODULES (1/2)

modules are object files containing kernel functions that are linked dynamically to the kernel

Linux offers excellent support for implementing and handling modules

Page 16: INSIDE THE LINUX KERNEL

MODULES (2/2)

[Figure: object module mmm.o, its external references to kernel symbols, and the kernel symbol table]

thanks to the kernel symbol table, it is possible to defer linking of an object module
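
A minimal sketch of deferred linking, assuming the kernel symbol table can be modelled as a name-to-address dictionary. The addresses are made up for illustration and `link_module` is a hypothetical helper, not a kernel or modutils function. Each external reference of the object module is resolved against the table at load time; loading fails if any reference cannot be resolved.

```python
# Hypothetical kernel symbol table: exported name -> address (made-up values).
KERNEL_SYMBOLS = {
    "printk": 0xC0112340,
    "kmalloc": 0xC012A5C0,
    "kfree": 0xC012A700,
}


def link_module(undefined_refs, symtab):
    """Resolve a module's external references against the symbol table.

    Returns a name -> address map, or raises LookupError when at least
    one reference is not exported by the kernel.
    """
    missing = [name for name in undefined_refs if name not in symtab]
    if missing:
        raise LookupError("unresolved symbols: " + ", ".join(missing))
    return {name: symtab[name] for name in undefined_refs}


# The module mmm.o is assumed to reference two kernel functions.
resolved = link_module(["printk", "kmalloc"], KERNEL_SYMBOLS)
for name, addr in resolved.items():
    print(f"{name} -> {addr:#010x}")
```

Because resolution happens only when the module is loaded, the module can be compiled separately from the kernel it will eventually be linked to.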

Page 17: INSIDE THE LINUX KERNEL

MODULES AND DISTRIBUTIONS

modern computer architectures based on PCI buses support autoprobing of installed I/O devices while booting the system

recent Linux distributions put all non-critical I/O drivers into modules

at boot time, only the I/O modules of identified I/O devices are dynamically linked to the kernel

Page 18: INSIDE THE LINUX KERNEL

SUPPORT FOR CLIENT/SERVER APPLICATIONS

scenario: many tasks executing concurrently on a common address space (for instance, a web server handling thousands of requests per second)

problem: implementing each client request as a new process causes a lot of overhead

process creation/elimination are time-consuming kernel functions

Page 19: INSIDE THE LINUX KERNEL

THE THREAD SOLUTION

introduce a new kernel object called thread

each process includes one or more threads

all threads associated with a given process share the same address space

CPU scheduling is done at the thread level (Windows NT)

thread switching is more efficient than process switching

Page 20: INSIDE THE LINUX KERNEL

THE CLONE SOLUTION

introduce groups of lightweight processes called clones that share a common address space, open files, signals, etc.

CPU scheduling is done at the process level in a standard way

clones were invented by Linux

the mpmt_pthread and dexter modules used by the Linux version of Apache 2.0 are both based on clones

Page 21: INSIDE THE LINUX KERNEL

LINUX PEARLS

we selected in a rather arbitrary way a few pearls related to two distinct kernel design areas:

clever design choices

efficient coding

Page 22: INSIDE THE LINUX KERNEL

CLEVER DESIGN CHOICES

isolate the architecture-dependent code

rely on the VFS abstraction

avoid over-designing

Page 23: INSIDE THE LINUX KERNEL

ISOLATE THE ARCHITECTURE-DEPENDENT CODE (1/2)

Linux source code includes two architecture-dependent directories: /usr/src/linux/arch and /usr/src/linux/include

arch/
    i386 ... s390

include/
    asm  asm-i386 ... asm-s390

Page 24: INSIDE THE LINUX KERNEL

ISOLATE THE ARCHITECTURE-DEPENDENT CODE (2/2)

the schedule() function invokes the switch_to() Assembly language function to perform process switching

the code for switch_to() is stored in the include/asm/system.h file

depending on the target system, the asm symbolic link is set to asm-i386, asm-s390, etc.

Page 25: INSIDE THE LINUX KERNEL

RELY ON THE VFS ABSTRACTION

VFS is an abstraction for representing several kinds of information containers (IC) in a common way

standard operations on ICs: open(), close(), seek(), ioctl(), read(), write()

VFS associates a logical inode with each opened IC

Page 26: INSIDE THE LINUX KERNEL

EXAMPLES OF ICs

files stored in a disk-based filesystem

files stored in a network filesystem

disk partitions

kernel data structures (/proc filesystem)

RAM content (/dev/mem)

RAM disk (/dev/ram0)

serial port (/dev/ttyS0)
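
The idea of a common interface over different information containers can be sketched with an abstract base class. Everything below is illustrative: `InformationContainer`, `DiskFile`, `ProcEntry`, and `copy_ic` are hypothetical names, and only `read()` and `write()` from the list of standard operations are modelled. Generic code is written once against the abstraction and works on any kind of IC.

```python
from abc import ABC, abstractmethod


class InformationContainer(ABC):
    """Common interface that every kind of IC must implement."""

    @abstractmethod
    def read(self, offset, count):
        """Return up to count bytes starting at offset."""

    @abstractmethod
    def write(self, offset, data):
        """Store data at offset and return the number of bytes written."""


class DiskFile(InformationContainer):
    """Stands in for a file stored in a disk-based filesystem."""
    def __init__(self, data=b""):
        self._data = bytearray(data)

    def read(self, offset, count):
        return bytes(self._data[offset:offset + count])

    def write(self, offset, data):
        self._data[offset:offset + len(data)] = data
        return len(data)


class ProcEntry(InformationContainer):
    """Stands in for a /proc file: content is produced on each read."""
    def __init__(self, producer):
        self._producer = producer

    def read(self, offset, count):
        return self._producer().encode()[offset:offset + count]

    def write(self, offset, data):
        raise PermissionError("this entry is read-only")


def copy_ic(src, dst, count):
    """Generic code: it only relies on the common interface."""
    return dst.write(0, src.read(0, count))


status = ProcEntry(lambda: "uptime 42\n")
target = DiskFile()
copy_ic(status, target, 6)
print(target.read(0, 6))
```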

Page 27: INSIDE THE LINUX KERNEL

AVOID OVER-DESIGNING

Linux scheduler is simple and works for most applications

no attempt to transform Linux into a real-time system

Page 28: INSIDE THE LINUX KERNEL

A GENERAL-PURPOSE SCHEDULER

the scheduler of the System V Release 4 provides a set of class-independent routines that implement common services

object-oriented approach based on scheduling class: the scheduler represents an abstract base class, and each scheduling class acts as a subclass

Page 29: INSIDE THE LINUX KERNEL

A HEATED DISCUSSION

If the Linux development community is not responsive to the end user community, refusing to incorporate necessary functionality on the basis of aesthetics, then that community will abandon Linux in favor of something else. Is that really what you want?

Yes - If it turns into a pile of shit they'll abandon it even faster. I'd rather have a decent OS that works and does the right thing for most people than a single OS that tries to do everything and does nothing right (Alan Cox)

Page 30: INSIDE THE LINUX KERNEL

EXAMPLES OF EFFICIENT CODING

retrieving the process descriptor of the running process

handling dynamic timers

catching invalid addresses passed as system call parameters

Page 31: INSIDE THE LINUX KERNEL

RETRIEVING THE PROCESS DESCRIPTOR OF THE RUNNING PROCESS (1/3)

classic solution: introduce an array current[NCPU] whose components point to the process descriptors of the processes running on the CPUs

clever solution: store the process Kernel Mode stack and the process descriptor into contiguous addresses so that the value of the CPU stack pointer register (esp register) is linked to that of the process descriptor

Page 32: INSIDE THE LINUX KERNEL

DESCRIPTOR OF THE RUNNING PROCESS (2/3)

Kernel Mode stack + process descriptor are stored in 2 contiguous page frames (8 KB)

[Figure: the 8 KB region holds the fixed-length process descriptor and the variable-length Kernel Mode stack; esp points into the stack]

Page 33: INSIDE THE LINUX KERNEL

DESCRIPTOR OF THE RUNNING PROCESS (3/3)

[Figure: fixed-length process descriptor, variable-length Kernel Mode stack, esp, and the mask applied to esp]

value of esp register: 0x00bdbad4

mask: 0xffffe000

starting address of process descriptor: 0x00bda000
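
The masking step can be checked with a few lines of arithmetic. The sketch assumes only what the slides state: the region is 8 KB long and aligned on an 8 KB boundary, so clearing the low 13 bits of esp (mask 0xffffe000 on a 32-bit CPU) yields the start of the region. `descriptor_base` is a hypothetical helper name.

```python
REGION_SIZE = 8192  # two contiguous 4 KB page frames


def descriptor_base(esp, region_size=REGION_SIZE):
    """Return the start address of the aligned region that contains esp."""
    mask = ~(region_size - 1) & 0xFFFFFFFF  # 0xffffe000 for an 8 KB region
    return esp & mask


esp = 0x00BDBAD4
print(f"{descriptor_base(esp):#010x}")  # 0x00bda000
```

No memory access and no per-CPU array is needed: a single AND on the stack pointer is enough.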

Page 34: INSIDE THE LINUX KERNEL

HANDLING DYNAMIC TIMERS (1/3)

I/O drivers and user applications may create hundreds of timers

find an efficient way to check at each timer interrupt whether at least one timer has expired

trivial solution: maintain a list of timers ordered by increasing expiration time and start checking from the first element of the list
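
A minimal sketch of the trivial solution, assuming timers are `(expires, name)` pairs and time is counted in ticks. `SortedTimerList` is a hypothetical name. Checking at each tick is cheap because only the head of the list is inspected, but every insertion has to find its place in the ordered list.

```python
import bisect


class SortedTimerList:
    """Timers kept in a single list ordered by expiration tick."""

    def __init__(self):
        self._timers = []  # (expires, name) pairs, ascending by expires

    def add(self, expires, name):
        # Ordered insertion: cost grows with the number of pending timers.
        bisect.insort(self._timers, (expires, name))

    def run_expired(self, now):
        """Pop and return the names of all timers with expires <= now."""
        fired = []
        while self._timers and self._timers[0][0] <= now:
            fired.append(self._timers.pop(0)[1])
        return fired


timers = SortedTimerList()
timers.add(5, "b")
timers.add(2, "a")
timers.add(9, "c")
for tick in range(1, 10):
    for name in timers.run_expired(tick):
        print(f"tick {tick}: timer {name} expired")
```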

Page 35: INSIDE THE LINUX KERNEL

HANDLING DYNAMIC TIMERS (2/3)

clever solution (timing wheel): use percolation and maintain strict ordering only for the next 256 ticks (in Linux/i386, one tick = 10 ms)

use several lists of timers

Page 36: INSIDE THE LINUX KERNEL

HANDLING DYNAMIC TIMERS (3/3)

tv1: slots 0 1 2 ... 255 (index incremented by 1 once every tick)

tv2: slots 0 1 2 ... 63 (index incremented by 1 once every 256 ticks)

when tv1 becomes empty, it is replenished by emptying one slot of tv2, and so forth
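
The percolation scheme can be simulated in a few lines. This is a simplified model, not the kernel's implementation: only the two vectors shown on the slide (tv1 with 256 slots, tv2 with 64 slots) are modelled, so timers must expire within 256 * 64 ticks, and `TimingWheel` is a hypothetical name. Timers due within the next 256 ticks go straight into tv1; later ones wait in tv2 and are moved into tv1 each time the tv1 index wraps around.

```python
TV1_SIZE = 256  # one slot per tick
TV2_SIZE = 64   # one slot per 256 ticks


class TimingWheel:
    def __init__(self):
        self.now = 0
        self.tv1 = [[] for _ in range(TV1_SIZE)]
        self.tv2 = [[] for _ in range(TV2_SIZE)]

    def add(self, expires, name):
        delta = expires - self.now
        if not 1 <= delta < TV1_SIZE * TV2_SIZE:
            raise ValueError("expiration outside the range of this sketch")
        if delta < TV1_SIZE:
            self.tv1[expires % TV1_SIZE].append((expires, name))
        else:
            self.tv2[(expires // TV1_SIZE) % TV2_SIZE].append((expires, name))

    def tick(self):
        """Advance time by one tick and return the timers that expire."""
        self.now += 1
        if self.now % TV1_SIZE == 0:
            # tv1 has been fully consumed: empty one slot of tv2 into it.
            slot = (self.now // TV1_SIZE) % TV2_SIZE
            pending, self.tv2[slot] = self.tv2[slot], []
            for expires, name in pending:
                self.tv1[expires % TV1_SIZE].append((expires, name))
        slot = self.now % TV1_SIZE
        fired, self.tv1[slot] = self.tv1[slot], []
        return [name for _, name in fired]


wheel = TimingWheel()
wheel.add(3, "a")
wheel.add(300, "b")
for _ in range(400):
    for name in wheel.tick():
        print(f"tick {wheel.now}: timer {name} expired")
```

Each tick inspects a single tv1 slot, and inserting a timer is a constant-time append, whatever the number of pending timers.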

Page 37: INSIDE THE LINUX KERNEL

CATCHING INVALID ADDRESSES (1/4)

many system calls require one or more addresses specified as parameters

invalid addresses passed as parameters should not cause a system crash

classic solution: perform a preliminary check before servicing the system call

clever solution: defer checking until an exception caused by the invalid address occurs in Kernel Mode

Page 38: INSIDE THE LINUX KERNEL

CATCHING INVALID ADDRESSES (2/4)

deferred checking is more efficient since system calls are issued most of the time with correct parameters

if an addressing error occurs in Kernel Mode, the kernel must be able to distinguish whether it is caused by a faulty process or by a kernel bug

in the first case, the kernel sends a SIGSEGV signal to the faulty process

Page 39: INSIDE THE LINUX KERNEL

CATCHING INVALID ADDRESSES (3/4)

clever idea: force the kernel to always use the same group of functions when copying data to or from the process address space

if an addressing error occurs while doing that, the CPU will signal the address of the instruction that contained an invalid address operand

Page 40: INSIDE THE LINUX KERNEL

CATCHING INVALID ADDRESSES (4/4)

the kernel knows from the address of the faulty instruction that it belongs to one of the functions used to access data in the process address space

it can then execute some kind of “fixup code”: as a result, the system call returns an error code
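
The lookup described above can be sketched as a search in a table that maps the address of each instruction allowed to touch the process address space to the address of its fixup code. The table contents are made up and the function names are hypothetical; the sketch only assumes that the error code returned by the system call is EFAULT ("bad address").

```python
import bisect
import errno

# Hypothetical table, sorted by instruction address:
# (address of an instruction that accesses user memory, address of its fixup).
EXCEPTION_TABLE = [
    (0xC0101000, 0xC0300000),
    (0xC0101234, 0xC0300010),
    (0xC0105678, 0xC0300020),
]
_INSN_ADDRS = [insn for insn, _ in EXCEPTION_TABLE]


def search_exception_table(fault_ip):
    """Return the fixup address for fault_ip, or None if it is not listed."""
    i = bisect.bisect_left(_INSN_ADDRS, fault_ip)
    if i < len(_INSN_ADDRS) and _INSN_ADDRS[i] == fault_ip:
        return EXCEPTION_TABLE[i][1]
    return None


def handle_kernel_mode_fault(fault_ip):
    """Classify an addressing error raised while running in Kernel Mode."""
    fixup = search_exception_table(fault_ip)
    if fixup is None:
        # The instruction is not part of the user-access functions.
        return ("kernel bug", None, None)
    # Resume at the fixup code; the system call then fails with -EFAULT.
    return ("fixup", fixup, -errno.EFAULT)


print(handle_kernel_mode_fault(0xC0101234))
print(handle_kernel_mode_fault(0xC0101235))
```

The cost of the check is paid only when a fault actually occurs; a system call issued with valid addresses never consults the table.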