CSC 660: Advanced Operating Systems Slide #1 CSC 660: Advanced OS System Calls

Upload: maria-holt

Post on 31-Dec-2015


CSC 660: Advanced Operating Systems Slide #1

CSC 660: Advanced OS

System Calls


CSC 660: Advanced Operating Systems Slide #2

A Different Kind of C

1. No access to C library.

2. ISO C99 + GNU C extensions.

3. No memory protection.

4. Small fixed-size (8KB) stack.

5. Limited floating point support.

6. Concurrency and synchronization.

7. Portability.

8. Coding style and idioms.

9. Debugging.


CSC 660: Advanced Operating Systems Slide #3

No access to C library

Why not?
– Bootstrapping (C library uses system calls…)
– Performance and size.

Kernel equivalent functions
– Use lib/string.c for string operations.
– Use printk() instead of printf().


CSC 660: Advanced Operating Systems Slide #4

ISO C99

Inline Functions
    static inline void dog(int tail)

Struct Assignment (designated initializers)
    struct file_operations fops = {
            .read = device_read,
            .write = device_write,
            .open = device_open,
            .release = device_release
    };
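The two C99 features above can be tried outside the kernel. The following is a minimal user-space sketch; `struct demo_ops` and the `demo_*` functions are hypothetical stand-ins for `struct file_operations` and a driver's callbacks, not kernel definitions.

```c
#include <stddef.h>

/* Hypothetical operations table, loosely modeled on struct file_operations. */
struct demo_ops {
	int (*open)(void);
	int (*read)(char *buf, size_t len);
	int (*write)(const char *buf, size_t len);
	int (*release)(void);
};

/* C99 inline function: a hint that calls may be expanded in place. */
static inline int demo_open(void)
{
	return 0;
}

/* Stores an empty string in buf (if there is room) and reports 0 bytes read. */
static int demo_read(char *buf, size_t len)
{
	if (len > 0)
		buf[0] = '\0';
	return 0;
}

/*
 * Designated initializer: members are named, so order does not matter,
 * and members that are not named (.write, .release) are set to NULL.
 */
static const struct demo_ops demo = {
	.read = demo_read,
	.open = demo_open,
};
```

Because unnamed members default to NULL, code that walks such a table usually checks a pointer before calling through it.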


CSC 660: Advanced Operating Systems Slide #5

GNU C

Inline Assembly (asm or __asm__ keyword)
    asm ( assembler template
        : output operands
        : input operands
        : list of clobbered registers
    );
Example from arch/i386/signal.c:
    __asm__("movl %%gs,%0" : "=r"(tmp) : "0"(tmp));

Branch Annotation
– Optimize branch for most likely decision.
– likely() and unlikely() macros
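As a sketch of how the branch annotations work: the kernel builds likely() and unlikely() on GCC's `__builtin_expect` builtin. The macro definitions below follow that pattern; `sum_bytes` is a hypothetical helper used only to show an annotated error check.

```c
#include <stddef.h>

/* Same shape as the kernel macros: !!(x) normalizes any value to 0 or 1. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical helper: sums a buffer, treating a NULL pointer as a rare error. */
static long sum_bytes(const unsigned char *buf, size_t len)
{
	long total = 0;
	size_t i;

	/* Tell the compiler the error path is the cold one. */
	if (unlikely(buf == NULL))
		return -1;

	for (i = 0; i < len; i++)
		total += buf[i];
	return total;
}
```

The annotation changes only code layout and prediction hints, never the result, so a wrong guess costs speed but not correctness.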


CSC 660: Advanced Operating Systems Slide #6

GNU C

asmlinkage
– Function attribute to allow C functions to be called from assembly language (prevents parameters from being placed in registers).

volatile
– Warns compiler that variable may be changed asynchronously by other threads (prevents compiler from optimizing away reads).

static inline
– Inline function expansion to improve speed.


CSC 660: Advanced Operating Systems Slide #7

No Memory Protection

Kernel traps illegal memory access for users
– Sends SIGSEGV to kill offending process.

No one to look out for kernel.
– Memory violations result in kernel oops.

Kernel memory is not pageable.
– Uses physical memory, not swap space.


CSC 660: Advanced Operating Systems Slide #8

Small Fixed Stack

Kernel stack is two 4KB pages
– Cannot create many local variables.
– No deep recursion.


CSC 660: Advanced Operating Systems Slide #9

Floating Point

Floating point used to be handled by a separate FPU coprocessor.
– Integrated into CPU with 80486DX.
– Still performed with ESCAPE instructions.

FPU has own FP registers.
– Shared with MMX unit.
– Not saved by default on context switch.

Must use FP carefully in kernel
– Call kernel_fpu_begin() before using FPU.
– Call kernel_fpu_end() after using FPU.


CSC 660: Advanced Operating Systems Slide #10

Concurrency

Asynchronous interrupts
– Interrupt handlers may access resources at the same time as your function.

Multiprocessing
– Another processor may be executing function at the same time.

Preemptive kernel
– Scheduler can preempt your kernel thread in favor of another thread.

Synchronization Solutions
– Spinlocks
– Semaphores


CSC 660: Advanced Operating Systems Slide #11

Portability

Kernel runs on 22 architectures.
– Different endianness.
– Different word sizes.
– Different page sizes.

Kernel code must be
– Endian neutral
– 64-bit clean
– Free of assumptions about word or page size.


CSC 660: Advanced Operating Systems Slide #12

Portability

A char is always 8 bits (may be signed or unsigned).

A short is currently 16 bits on all archs.

An int is currently 32 bits on all archs.

A long may be 32 or 64 bits.

A pointer may be 32 or 64 bits.

Use explicitly sized types when necessary:
– s8, u8, s16, u16, s32, u32, s64, u64

Use opaque types for portability
– atomic_t, pid_t
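A small user-space sketch of what "endian neutral" with explicitly sized types can look like. The `u8` and `u32` typedefs here are stand-ins built on `<stdint.h>` (the kernel supplies its own), and `load_be32`/`store_be32` are hypothetical helper names, not kernel functions.

```c
#include <stdint.h>

/* User-space stand-ins for the kernel's explicitly sized types. */
typedef uint8_t u8;
typedef uint32_t u32;

/*
 * Decode a 32-bit big-endian value one byte at a time. Because the code
 * never reinterprets the buffer as an integer, the result is the same on
 * little-endian and big-endian hosts.
 */
static u32 load_be32(const u8 *p)
{
	return ((u32)p[0] << 24) | ((u32)p[1] << 16) |
	       ((u32)p[2] << 8) | (u32)p[3];
}

/* Encode a 32-bit value as four big-endian bytes. */
static void store_be32(u8 *p, u32 v)
{
	p[0] = (u8)(v >> 24);
	p[1] = (u8)(v >> 16);
	p[2] = (u8)(v >> 8);
	p[3] = (u8)v;
}
```

Using `u32` rather than `long` also keeps the code 64-bit clean, since `long` may be 32 or 64 bits depending on the architecture.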


CSC 660: Advanced Operating Systems Slide #13

Coding Style

Indentation
– Tabs that are 8 characters in length.

Braces
– Conditionals/loops: initial { at end of statement
    if (foo) {
            …
    } else {
            …
    }
– Functions: { on separate line
    int foo()
    {
            …
    }


CSC 660: Advanced Operating Systems Slide #14

Coding Style

Naming
– Lower case, words separated by underscores.
– Use descriptive names, especially for globals.

Functions
– No longer than 2 screens of text.
– Fewer than 10 local variables.

Comments
– Describe what and why, not how your code works.

Ifdefs
– Restrict them to include (.h) files.


CSC 660: Advanced Operating Systems Slide #15

Idioms

do { stmt1; stmt2; } while (0)
– Found in macros.
– Allows multi-statement macros in if/else

Heavy use of bit operators
– and (&), or (|), xor (^), not (~)

Heavy use of goto
– Often used to exit control structures on error.
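Two of these idioms are easy to show in a few lines of user-space C. In this sketch `SWAP_INT` and `make_pair` are hypothetical names chosen for illustration; the goto labels unwind in the reverse order of the allocations, which is the usual kernel pattern.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Multi-statement macro wrapped in do { } while (0) so that it expands to
 * exactly one statement and stays correct inside an unbraced if/else.
 */
#define SWAP_INT(a, b) do { int tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/*
 * Allocates two zeroed buffers. On failure, jumps to a label that frees
 * only what was already allocated. Returns 0 on success, -1 on failure.
 */
static int make_pair(size_t len, char **first, char **second)
{
	char *a, *b;

	a = malloc(len);
	if (!a)
		goto err;
	b = malloc(len);
	if (!b)
		goto err_free_a;

	memset(a, 0, len);
	memset(b, 0, len);
	*first = a;
	*second = b;
	return 0;

err_free_a:
	free(a);
err:
	return -1;
}
```

Without the do/while wrapper, `if (x < y) SWAP_INT(x, y); else ...` would fail to compile, because the macro would expand to several statements followed by a stray semicolon before the else.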


CSC 660: Advanced Operating Systems Slide #16

Kernel Debugging: Oops

An oops is a major kernel failure.
– Ex: dereferencing a null pointer

If kernel cannot recover, a panic results.

Information sent to console
– Text description
– Register contents
– Stack backtrace


CSC 660: Advanced Operating Systems Slide #17

Kernel Debugging: Oops

Unable to handle kernel NULL pointer dereference at virtual address 00000000
c0203c18
EIP: 0060:[<c0203c18>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010086
eax: c137a800 ebx: c0e80200 ecx: c1379050 edx: 00000000
esi: c137a800 edi: c13d0000 ebp: 00000246 esp: c13d1f2c
ds: 007b es: 007b ss: 0068
Stack: c1379050 00000002 c137a800 00000008 00000000 c137a800 c02060b3 c137a800
       0001221e 00000000 c030b004 c030b000 c13fdc10 c02037c0 c137a800 00000293
       c0125b6d 00000000 c13fdc28 c13fdc20 c13d0000 c13d0000 c13d0000 00000000
Call Trace:
 [<c02060b3>] is_complete+0x2c3/0x310
 [<c02037c0>] run+0x30/0x40
 [<c0125b6d>] worker_thread+0x1bd/0x2b0
 [<c0203790>] run+0x0/0x40
 [<c0113b10>] default_wake_function+0x0/0x20
 [<c0108fd6>] ret_from_fork+0x6/0x20
 [<c0113b10>] default_wake_function+0x0/0x20
 [<c01259b0>] worker_thread+0x0/0x2b0


CSC 660: Advanced Operating Systems Slide #18

printk()

Robust and callable except early in boot
– Enable early_printk() option for that.

Sends output to klog circular log buffer
– klogd reads /proc/kmsg
– syslogd gets data from klogd, writes to a file under /var/log
– can also access with dmesg

Message priorities
– 0 (high) .. 7 (low)
– Named: KERN_EMERG, _ALERT, _CRIT, _ERR, _WARNING, _NOTICE, _INFO, _DEBUG


CSC 660: Advanced Operating Systems Slide #19

Printing Debugging Information

printk()

Assertions
    BUG_ON(bad_condition) causes oops

Panics
    if (terrible_condition)
            panic("Terrible condition!");

Stack traces
    if (!debug_check) {
            printk(KERN_DEBUG "Check x failed\n");
            dump_stack();
    }


CSC 660: Advanced Operating Systems Slide #20

System Calls

System calls provide the interface between user programs and kernel.

1. Abstracted hardware interface.

2. Security and stability.

3. Allows virtualization.

Programmers typically don’t invoke system calls directly, but rather use libc library calls.


CSC 660: Advanced Operating Systems Slide #21

libc: C standard library/POSIX API

System call users

– malloc()

– free()

– exec()

– fork()

– printf()

– fopen()

– fputc()

– fclose()

– socket()

Non-system call users

– asin()

– log()

– sin()

– strcmp()

– strcpy()

– atoi()

– bsearch()

– qsort()

– rand()


CSC 660: Advanced Operating Systems Slide #22

User to Kernel Mode Transition


CSC 660: Advanced Operating Systems Slide #23

Hello World

> cat >hello.c
#include <stdio.h>

int main(int argc, char *argv[])
{
        printf("Hello world!\n");
        return 0;
}
> gcc -o hello hello.c
> ltrace ./hello
__libc_start_main(0x8048394, 1, 0xbffff914, 0x80483b8, 0x8048400 <unfinished ...>
printf("Hello world!\n"Hello world!
) = 13
+++ exited (status 0) +++


CSC 660: Advanced Operating Systems Slide #24

Hello World

> strace ./hello
execve("./hello", ["./hello"], [/* 40 vars */]) = 0
uname({sys="Linux", node="tara", ...}) = 0
brk(0) = 0x804a000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
old_mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fe9000
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=50648, ...}) = 0
old_mmap(NULL, 50648, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7fdc000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/tls/i686/cmov/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\215Y\1"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=1222116, ...}) = 0


CSC 660: Advanced Operating Systems Slide #25

Hello World

old_mmap(NULL, 1232428, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xb7eaf000
old_mmap(0xb7fd1000, 36864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x121000) = 0xb7fd1000
old_mmap(0xb7fda000, 7724, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7fda000
close(3) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7eae000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7eae080, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
munmap(0xb7fdc000, 50648) = 0
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fe8000
write(1, "Hello world!\n", 13Hello world!
) = 13
munmap(0xb7fe8000, 4096) = 0
exit_group(0) = ?


CSC 660: Advanced Operating Systems Slide #26

Using a System Call

Application
– Calls printf()

C library (glibc)
– printf() function issues write() system call.
– Sets global errno variable if the system call reports an error.

Kernel
– write() system call manages output.
– Returns to user application (a negative error code on failure).
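The same path can be followed by hand by calling the libc write() wrapper directly instead of going through printf(). This is a minimal sketch for a POSIX system; `write_all` is a hypothetical helper name.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Writes len bytes to fd using the write() wrapper directly.
 * Returns 0 on success, or the errno value reported for the failed call.
 */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal: retry */
			return errno;		/* wrapper returned -1, errno holds the cause */
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}
```

Calling `write_all(1, "Hello world!\n", 13)` produces the same write(1, ...) line in strace as the printf() version, while a call on an invalid descriptor such as -1 comes back with EBADF.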


CSC 660: Advanced Operating Systems Slide #27

Making a System Call

Software Interrupt
– Historically: int $0x80
– Modern: sysenter

System Call Number
– Put in %eax register before interrupt
– sys_call_table in arch/i386/kernel/entry.S

Parameters
– 1-5 args: %ebx, %ecx, %edx, %esi, %edi
– 6+ args: one register has pointer to user space params

Returning
– Return from software interrupt: iret or sysexit
– Return value stored in %eax register.


CSC 660: Advanced Operating Systems Slide #28

System Call Handler

Invoked on all system calls.

Functionality:

1. Saves register contents.

2. Reads syscall number from %EAX.

3. Invokes system call service routine found at sys_call_table + 4 * %EAX.

4. Stores syscall return value over the saved %EAX on the stack.

5. Restores registers (moving return val to %EAX).

6. Switches from Kernel Mode to User Mode.


CSC 660: Advanced Operating Systems Slide #29

Kernel System Call Handler
arch/i386/kernel/entry.S

ENTRY(system_call)
        pushl %eax                      # save orig_eax
        SAVE_ALL
        GET_THREAD_INFO(%ebp)
                                        # system call tracing in operation
        testb $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT),TI_flags(%ebp)
        jnz syscall_trace_entry
        cmpl $(nr_syscalls), %eax
        jae syscall_badsys
syscall_call:
        call *sys_call_table(,%eax,4)
        movl %eax,EAX(%esp)             # store return value
syscall_exit:
        cli
        movl TI_flags(%ebp), %ecx
        testw $_TIF_ALLWORK_MASK, %cx   # current->work
        jne syscall_exit_work
restore_all:
        RESTORE_ALL
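The bounds check against nr_syscalls and the indexed call through sys_call_table can be modeled in user-space C. This is only an illustration of the dispatch step: every `demo_*` name is hypothetical, and the real handler also saves registers, handles tracing, and returns to user mode.

```c
#include <errno.h>

/* A service routine takes up to three arguments in this simplified model. */
typedef long (*syscall_fn)(long, long, long);

static long demo_getpid(long a, long b, long c)
{
	(void)a; (void)b; (void)c;
	return 42;	/* fixed value standing in for a process ID */
}

static long demo_add(long a, long b, long c)
{
	(void)c;
	return a + b;
}

/* Stand-in for sys_call_table: the index is the system call number. */
static const syscall_fn demo_call_table[] = {
	demo_getpid,	/* number 0 */
	demo_add,	/* number 1 */
};

#define DEMO_NR_SYSCALLS (sizeof(demo_call_table) / sizeof(demo_call_table[0]))

/*
 * Mirrors "cmpl $(nr_syscalls), %eax; jae syscall_badsys" followed by
 * "call *sys_call_table(,%eax,4)": reject out-of-range numbers, then
 * call through the table.
 */
static long demo_dispatch(unsigned long nr, long a, long b, long c)
{
	if (nr >= DEMO_NR_SYSCALLS)
		return -ENOSYS;
	return demo_call_table[nr](a, b, c);
}
```

The unsigned comparison matters: like the `jae` instruction, it rejects negative numbers as well as numbers past the end of the table with a single test.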


CSC 660: Advanced Operating Systems Slide #30

System Call Parameters

• Must use registers instead of stack
– System call runs in kernel mode.
– User process doesn’t have access to kernel stack.
– Copying user stack to kernel stack is slow.

• Register limitations
– Parameter size <= register size (32 bits)
– x86 only has a few registers, so #params limited.

• Solutions
– Pass large parameters by reference.
– If >6 params needed, use reference to params in memory.
– System call handler saves registers to stack before calling system call service routine, allowing service routine to use parameters like a normal C function.


CSC 660: Advanced Operating Systems Slide #31

Verifying Parameters

• Must ensure users cannot access files, processes, or memory that they don’t have permission to access.

• Before accessing a user pointer, must ensure

1. Pointer points to user, not kernel memory.
2. Pointer points to region of memory in process’s address space.
3. Access is permitted by memory access restrictions (read, write, execute).


CSC 660: Advanced Operating Systems Slide #32

Accessing User Memory

• To copy between user memory and kernel memory with the appropriate safety checks, use
– copy_from_user(kern_buf, user_buf, len)
– copy_to_user(user_buf, kern_buf, len)

• Both functions return number of bytes they failed to copy on error, 0 on success.
– Syscall returns -EFAULT on such an error.
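The return convention is easy to get backwards, so here is a user-space mock of the calling pattern. Nothing below is kernel code: `mock_user_mem`, `mock_copy_from_user`, and `mock_sys_setname` are hypothetical, and the mock copies all or nothing, whereas the real function may copy part of the range before failing.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Pretend "user address space": only pointers into this array are valid. */
static char mock_user_mem[64];

/* Like copy_from_user(): returns the number of bytes NOT copied, 0 on success. */
static unsigned long mock_copy_from_user(void *to, const void *from,
					 unsigned long n)
{
	uintptr_t base = (uintptr_t)mock_user_mem;
	uintptr_t addr = (uintptr_t)from;

	/* Reject any range that is not entirely inside the mock user region. */
	if (addr < base || addr - base > sizeof(mock_user_mem) ||
	    n > sizeof(mock_user_mem) - (addr - base))
		return n;

	memcpy(to, from, n);
	return 0;
}

/* Hypothetical service routine: validates the length, then copies the name. */
static long mock_sys_setname(const char *user_name, unsigned long len)
{
	char kbuf[16];

	if (len > sizeof(kbuf))
		return -EINVAL;
	if (mock_copy_from_user(kbuf, user_name, len))
		return -EFAULT;		/* nonzero means some bytes were not copied */
	return (long)len;
}
```

Note that the check is `if (copy...)` rather than `if (copy... < 0)`: a nonzero count of uncopied bytes is the failure signal.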


CSC 660: Advanced Operating Systems Slide #33

System Call Errors

• System calls return errors as -ESYMBOL

• Error #s in include/asm-generic/errno-base.h
– ENOSYS: No such system call.
– EPERM: Operation not permitted.
– EAGAIN: Try again.
– EIO: I/O error

• User program API returns a -1 error value.
– Actual error # stored in errno variable.
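The translation from the kernel's -ESYMBOL return value to the user-level "-1 plus errno" convention happens in the library wrapper. The sketch below shows the idea; `demo_syscall_return` is a hypothetical name, and the 4095 cutoff is an assumption based on the range glibc treats as error codes on Linux (the 2.6-era `__syscall_return` macro used a smaller range).

```c
#include <errno.h>

/*
 * Converts a raw kernel return value into the user-level convention.
 * Values in [-4095, -1] are treated as negated error numbers; anything
 * else (including large unsigned-looking values such as addresses) is
 * returned unchanged.
 */
static long demo_syscall_return(long res)
{
	if ((unsigned long)res >= (unsigned long)-4095L) {
		errno = (int)-res;
		return -1;
	}
	return res;
}
```

Limiting the error range to a small band of negative values is what lets calls like mmap() return high addresses without being mistaken for failures.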


CSC 660: Advanced Operating Systems Slide #34

Adding a System Call

1. Write system call function sys_mycall.

2. Add entry to end of sys_call_table
   In arch/i386/kernel/entry.S add
       .long sys_mycall

3. Define system call number for user
   In include/asm-i386/unistd.h
       #define __NR_mycall 289

4. Update max # of system calls.

5. Compile kernel.


CSC 660: Advanced Operating Systems Slide #35

Defining a System Call

System call name: getpid()

System call function: sys_getpid()

asmlinkage long sys_getpid(void)

{

return current->tgid;

}


CSC 660: Advanced Operating Systems Slide #36

Invoking System Calls

• Standard syscalls called indirectly via libc.

• What if you’ve created a new system call?
– Manually write assembly to create a software interrupt and pass parameters in registers.
– Or use _syscall macros in <linux/unistd.h> to automatically generate a function that calls your new system call.


CSC 660: Advanced Operating Systems Slide #37

System Call Declaration Macros
include/asm-i386/unistd.h

_syscall0(int, fork)
– fork is the system call to be invoked.
– int is the type of the return value

#define _syscall0(type,name) \
type name(void) \
{ \
long __res; \
__asm__ volatile ("int $0x80" \
        : "=a" (__res) \
        : "0" (__NR_##name)); \
__syscall_return(type,__res); \
}


CSC 660: Advanced Operating Systems Slide #38

System Call Declaration Macros
include/asm-i386/unistd.h

_syscall3(int,write,int,fd,const char *,buf,unsigned int,count)
– write is the system call with 3 arguments to be called.
– 3 parameters are fd, buf, and count.

#define _syscall3(type,name,type1,arg1,type2,arg2,type3,arg3) \
type name(type1 arg1,type2 arg2,type3 arg3) \
{ \
long __res; \
__asm__ volatile ("int $0x80" \
        : "=a" (__res) \
        : "0" (__NR_##name),"b" ((long)(arg1)),"c" ((long)(arg2)), \
          "d" ((long)(arg3))); \
__syscall_return(type,__res); \
}


CSC 660: Advanced Operating Systems Slide #39

Calling your new syscall

#include <errno.h>              /* __syscall_return stores the error in errno */
#include <linux/unistd.h>
#define __NR_current_time 289
_syscall0(long, current_time)
#include <stdio.h>

int main()
{
        long retval = 1;
        retval = current_time();
        printf("The return value is %ld\n", retval);
        return 0;
}
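The _syscallN macros were dropped from the user-visible kernel headers in later 2.6 releases, so on a current Linux system the usual way to call a system call by number is the C library's generic syscall() function. The sketch below uses the existing getpid call, because the number 289 for current_time applies only to the modified kernel described here; `raw_getpid` is a hypothetical name.

```c
#define _GNU_SOURCE		/* ask glibc to declare syscall() */
#include <sys/syscall.h>	/* SYS_getpid and the other SYS_* numbers */
#include <unistd.h>

/* Invokes the getpid system call by number, bypassing the getpid() wrapper. */
static long raw_getpid(void)
{
	return syscall(SYS_getpid);
}
```

For a newly added call the same pattern would be `syscall(289)` on a kernel built with that table entry; like the macros, syscall() returns -1 and sets errno on failure.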
