unix processes - academysys.academy.lv/lection/5-unixprocesses.pdf · 2016-02-18 · processes •...
UNIX Processes
System Calls for Process Management
Return-value legend: s is an error code; pid is a process ID; residual is the remaining time from the previous alarm.
System Lifecycle: Ups & Downs
[Figure: system lifecycle. Power on, bootloader, kernel init, OS init, RUN!, shutdown, power off. The figure also labels processes such as httpd and lpd.]
[Figure: process creation at boot, from kernel mode to user mode.]
• Process 0 (kernel): kernel bootstrap. Starts process 1.
• Process 1 (/etc/init): creates processes to allow login, such as inetd and /etc/getty, by fork followed by exec.
• /etc/getty: conditions the terminal for login, then execs /bin/login.
• /bin/login: checks the password, then execs the shell.
• shell: the command interpreter.
Illustration of Process Control Calls
The ls Command
Steps in executing the command ls typed to the shell
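The steps a shell takes for a command such as ls (fork a child, exec the program in the child, wait in the parent) can be sketched roughly as follows. This is a minimal sketch, not how any particular shell is implemented; run_command is a hypothetical helper name, and it assumes the program can be found through PATH.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* run_command is a hypothetical helper: fork, exec argv[0] in the child,
 * and wait in the parent. Returns the child's exit status (0-255), or -1
 * if fork/waitpid failed or the child was terminated by a signal. */
int run_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: replace this process image with the requested program. */
        execvp(argv[0], argv);
        /* Only reached if exec failed. */
        perror("execvp");
        _exit(127);
    }

    /* Parent: block until the child terminates. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called with an argument vector such as {"ls", "-l", NULL}, the helper returns whatever exit status ls reports.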
Processes
• Processes can run in two modes: user level and kernel level.
• A process switches between these two modes by means of system calls.
• Process resources are likewise divided into two parts: user-level resources and kernel-level resources.
• User-level process resources: the CPU general-purpose registers, program counter, CPU status registers, stack registers, and the process memory segments (text segments, data segments, shared libraries, stack).
• Kernel-level resources: in most cases the resources that matter to the underlying hardware, such as registers, program counter, stack pointer, scheduling information, and system call information.
• The kernel state of a process is divided into two parts: the process structure and the user structure.
• The process structure holds the data that must always stay in memory and cannot be swapped out. It must contain pointers to all other resident structures.
• The user structure has to be resident in memory only while the process is executing. Otherwise it can be swapped out to disk.
• User structures can be allocated to a process dynamically by the memory management routines.
• Multitasking is achieved by context switching. Because context switches take place very often, minimizing context-switch time is an effective way to improve performance.
Parts of process memory structure
• Program code
• Initialised data
• Non-initialised data
  bash$ size testhand2
  92763 + 7564 + 2320 = 102647
  (text + data + bss = total, in bytes)
• Stack: stack frames of invoked functions
• Arena/heap: malloc
• The process switches to kernel mode on a system call (trap, software interrupt). Per-process state listed in the figure:
– user-id
– open files
– saved register states
– environment
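To make the memory parts listed above concrete, the sketch below prints one sample address from each region of a running C program. It is only an illustration: print_segments and the two global variables are hypothetical names, and the actual addresses and their ordering depend on the platform.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int initialised_global = 42;  /* placed in the initialised data segment */
int uninitialised_global;     /* placed in the non-initialised data (bss) */

/* print_segments is a hypothetical helper. It prints one address per
 * region and returns 0 on success, or -1 if malloc failed. */
int print_segments(void)
{
    int local = 0;                     /* lives in this function's stack frame */
    int *heap = malloc(sizeof *heap);  /* lives in the arena/heap */
    if (heap == NULL)
        return -1;

    printf("program code:         %p\n", (void *)(uintptr_t)print_segments);
    printf("initialised data:     %p\n", (void *)&initialised_global);
    printf("non-initialised data: %p\n", (void *)&uninitialised_global);
    printf("arena/heap (malloc):  %p\n", (void *)heap);
    printf("stack frame:          %p\n", (void *)&local);

    free(heap);
    return 0;
}
```

Running size on the compiled binary reports the sizes of the first three regions; the heap and stack exist only at run time.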
• Every process has a unique identifier, the PID. It is the common handle by which the kernel and other processes refer to a process.
• The process structure contains:
– Process identifier (PID)
– Signal state: pending signals, signal mask, and a summary of signal actions
– Profiling information
– Timers: real-time timers and CPU usage counters
– Various process substructures
• Process group identification: the process group and session the process belongs to.
• User credentials: real, effective, and saved user and group IDs.
• Memory management: describes the virtual address space of every process in the system.
• File descriptors: an array of pointers to open files, indexed by file descriptor, together with the open-file flags.
• System call vector: by using a different system call vector for different object files, it is possible to run object files compiled for different UNIX systems.
• Resource accounting: the rlimit structure, which is used to account for various system resources.
• Statistics: information gathered while the process runs and written to the accounting file when the process exits, including process timers and, if necessary, profiling information.
• Signal actions: the action to be taken when a signal is sent to the process.
• Thread structure.
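The three pieces of signal state kept per process (pending signals, signal mask, signal actions) can be observed from user space with standard POSIX calls. The following is a minimal sketch for a single-threaded program; blocked_signal_is_pending is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical helper: block SIGUSR1, send it to ourselves, and report
 * whether the kernel now records it as pending.
 * Returns 1 if pending, 0 if not, -1 on error. */
int blocked_signal_is_pending(void)
{
    sigset_t block, old_mask, pending;
    struct sigaction ignore, old_action;

    /* Signal mask: SIGUSR1 is held back instead of being delivered. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &old_mask) < 0)
        return -1;

    /* Pending signals: the raised signal waits until it is unblocked. */
    if (raise(SIGUSR1) != 0 || sigpending(&pending) < 0) {
        sigprocmask(SIG_SETMASK, &old_mask, NULL);
        return -1;
    }
    int result = sigismember(&pending, SIGUSR1);

    /* Signal action: setting the action to SIG_IGN discards the pending
     * instance, so restoring the mask does not terminate the process. */
    ignore.sa_handler = SIG_IGN;
    sigemptyset(&ignore.sa_mask);
    ignore.sa_flags = 0;
    sigaction(SIGUSR1, &ignore, &old_action);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGUSR1, &old_action, NULL);

    return result;
}
```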
Big Picture: Another look
[Figure: three processes, each with its own data and stack, a text segment marked as shared, and a kernel stack/u area. The process structures are held in kernel memory.]
Threads
UNIX Scheduler
The UNIX scheduler is based on a multilevel queue structure
• Process status: NEW, NORMAL (RUNNABLE, SLEEPING, STOPPED), ZOMBIE.
• The kernel uses two lists to hold processes in different states: zombproc and allproc.
• In most cases threads are organized in two queues: a runnable queue and a waiting queue.
• Threads that are ready to run go to the runnable queue; threads that are waiting for some event are placed in the waiting queue.
• The queues are organized by process and thread priority. The waiting queue is hashed by event ID to make lookups faster.
• Processes are organized in groups.
• A process can be created with one of the system calls pid_t fork(void); pid_t rfork(int flags); pid_t vfork(void);
• A child process created by fork() is an exact copy of the parent process except for the following:
– The child process has a unique process ID.
– The child process has a different parent process ID (i.e., the process ID of the parent process).
– The child process has its own copy of the parent's descriptors. These descriptors reference the same underlying objects, so that, for instance, file pointers in file objects are shared between the child and the parent: an lseek(2) on a descriptor in the child process can affect a subsequent read(2) or write(2) by the parent. This descriptor copying is also used by the shell to establish standard input and output for newly created processes as well as to set up pipes.
– The child process's resource utilizations are set to 0; see setrlimit(2).
– All interval timers are cleared; see setitimer(2).
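The shared file pointer described above can be demonstrated with a small experiment: the child moves the offset of an inherited descriptor, and the parent then sees the new offset. This is a sketch under the assumption that a temporary file can be created; offset_seen_by_parent is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child calls lseek() on a descriptor inherited
 * across fork(); the parent then reads the current offset of its own copy.
 * Returns the offset seen by the parent, or -1 on error. */
off_t offset_seen_by_parent(off_t child_seek_to)
{
    FILE *tmp = tmpfile();  /* anonymous temporary file, opened before fork */
    if (tmp == NULL)
        return -1;
    int fd = fileno(tmp);

    pid_t pid = fork();
    if (pid < 0) {
        fclose(tmp);
        return -1;
    }
    if (pid == 0) {
        /* Child: the descriptor is a copy, but it refers to the same open
         * file object, so this moves the offset shared with the parent. */
        lseek(fd, child_seek_to, SEEK_SET);
        _exit(0);
    }

    waitpid(pid, NULL, 0);                /* ensure the child's lseek is done */
    off_t seen = lseek(fd, 0, SEEK_CUR);  /* the parent itself never moved it */
    fclose(tmp);
    return seen;
}
```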
• rfork(): forking, vforking or rforking are the only ways new processes are created. The flags argument to rfork() selects which resources of the invoking process (parent) are shared by the new process (child) or initialized to their default values. The resources include the open file descriptor table (which, when shared, permits processes to open and close files for other processes), and open files.
• The vfork() system call can be used to create new processes without fully copying the address space of the old process, which is horrendously inefficient in a paged environment. It is useful when the purpose of fork(2) would have been to create a new system context for an execve(2). The vfork() system call differs from fork(2) in that the child borrows the parent's memory and thread of control until a call to execve(2) or an exit (either by a call to _exit(2) or abnormally). The parent process is suspended while the child is using its resources.
• A process terminates either by calling exit() or by receiving a signal. Either way, its exit status is delivered to the parent process through the wait4() system call.
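The vfork()/execve() pairing and the delivery of the termination status through wait4() can be sketched together. The helper below is hypothetical (run_script is not a system interface); it assumes /bin/sh exists and, on glibc, that _DEFAULT_SOURCE is needed to expose vfork() and wait4().

```c
#define _DEFAULT_SOURCE 1  /* assumption: glibc; exposes vfork() and wait4() */
#include <signal.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run "/bin/sh -c script" in a vfork()ed child and
 * collect its status with wait4(). Returns the exit code (0-255) after a
 * normal exit, the negated signal number if the child was terminated by a
 * signal, or -1000 if vfork/wait4 failed. */
int run_script(const char *script)
{
    char *const argv[] = { "sh", "-c", (char *)script, NULL };

    pid_t pid = vfork();
    if (pid < 0)
        return -1000;
    if (pid == 0) {
        /* Child: it borrows the parent's memory, so it does nothing
         * except call execve() or _exit(). */
        execve("/bin/sh", argv, environ);
        _exit(127);
    }

    /* Parent: resumed once the child has exec'ed or exited. */
    int status;
    struct rusage usage;  /* resource usage of the child, filled by wait4 */
    if (wait4(pid, &status, 0, &usage) < 0)
        return -1000;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);
    return -1000;
}
```

For example, the script "exit 7" yields 7, while a script in which the shell sends itself SIGTERM yields the negated signal number.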
POSIX Signals
The signals required by POSIX.
The /proc pseudo filesystem
• The /proc directory contains virtual files that are windows into the current state of the running kernel. This lets the user peer into a vast array of information, effectively seeing the system from the kernel's point of view. In addition, the user can use the /proc directory to communicate particular configuration changes to the kernel.
• The /proc directory contains files that are not part of any filesystem associated with your hard disks, CD-ROM, or any other physical storage device connected to your system (except, arguably, your RAM). Rather, these files are part of a virtual filesystem, enabled or disabled in the kernel when it is compiled.
• Support for the /proc virtual filesystem is a kernel configuration option that is turned on by default. If, for whatever reason, you would like to completely disable /proc on your system, de-select /proc file system support within the File system configuration section of config, menuconfig, or xconfig when rebuilding your kernel. Alternatively, you can simply comment out the /proc line in /etc/fstab to prevent it from being mounted.
• The /proc virtual files exhibit some interesting qualities. First, most of them are 0 bytes in size. However, when such a file is viewed, it likely contains quite a bit of information. In addition, most of their time and date settings reflect the current time and date, meaning that they are constantly changing.
• A system administrator can use /proc as an easy method of accessing information about the state of the kernel, the attributes of the machine, the states of individual processes, and more. Most of the files in this directory, such as interrupts, meminfo, mounts, and partitions, provide an up-to-the-moment glimpse of a system's environment.
• Another interesting quality of virtual files can be seen when viewing them with the more command, which usually gives your location in the file by displaying the percentage of the document you have seen so far. This percentage usually climbs the further you navigate down a long file. However, when viewing a /proc virtual file, the percentage never changes, always staying at 0%.
• Be sure to avoid viewing the kcore file in /proc. This virtual file contains an image of the kernel's memory, and the contents of the file will do strange things to your terminal. You may need to type reset after hitting [Ctrl]-[C] to get back to a proper command line prompt.
• Most of the files at the top-level of the /proc directory hold key pieces of information about the state of the Linux kernel and your system in general.
• It is important to remember that the content of the files in the /proc directory and its various sub-directories is entirely dependent on information concerning your system. In other words, do not expect to see the exact same information in the same /proc file on two different machines.
Top-Level Files in /proc
• /proc/apm: This file provides information about the Advanced Power Management (APM) state and options on the system. This information is used by the kernel to provide information for the apm command.
• /proc/cmdline: This file essentially shows the parameters passed to the Linux kernel at the time it is started.
• /proc/cpuinfo: This file changes based on the type of processor in your system. The output is fairly easy to understand.
• /proc/devices: This file displays the various character and block devices currently configured for use with the kernel. It does not include modules that are available but not loaded into the kernel. The output from /proc/devices includes the major number and name of the device.
• /proc/dma: This file contains a list of the registered ISA direct memory access (DMA) channels in use.
• /proc/execdomains: This file lists the execution domains currently supported by the Linux kernel, along with the range of personalities they support. Think of execution domains as a kind of "personality" of a particular operating system. Other binary formats, such as Solaris, UnixWare, and FreeBSD, can be used with Linux. By changing the personality of a task running in Linux, a programmer can change the way the operating system treats particular system calls from a certain binary.
• /proc/fb: This file contains a list of frame buffer devices, with the frame buffer device number and the driver that controls it.
• /proc/filesystems: This file displays a list of the filesystem types currently supported by the kernel.
• /proc/interrupts: This file records the number of interrupts per IRQ on the x86 architecture.
• /proc/iomem: This file shows you the current map of the system's memory for its various devices.
• /proc/ioports: In a way similar to /proc/iomem, /proc/ioports provides a list of currently registered port regions used for input or output communication with a device.
• /proc/isapnp: This file lists Plug and Play (PnP) cards in ISA slots on the system. This is most often seen with sound cards but may include any number of devices.
• /proc/kcore: This file represents the physical memory of the system and is stored in the core file format. Unlike most /proc files, kcore does display a size. This value is given in bytes and is equal to the size of physical memory (RAM) used plus 4KB.
• /proc/kmsg: This file is used to hold messages generated by the kernel. These messages are then picked up by other programs, such as klogd.
• /proc/ksyms: This file holds the kernel exported symbol definitions used by the modules tools to dynamically link and bind loadable modules.
• /proc/loadavg: This file provides a look at the load average, or the utilization of the processor, over time, as well as giving additional data used by uptime and other commands.
• /proc/locks: This file displays the files currently locked by the kernel. Its content is kernel internal debugging data and can vary greatly, depending on the use of the system.
• /proc/mdstat: This file contains the current information for multiple-disk, RAID configurations. If your system does not contain such a configuration, then your mdstat file will look similar to this:
    Personalities :
    read_ahead not set
    unused devices: <none>
• /proc/meminfo: This is one of the more commonly used /proc files, as it reports back plenty of valuable information about the current utilization of RAM on the system.
• /proc/misc: This file lists miscellaneous drivers registered on the miscellaneous major device, which is number 10.
• /proc/modules: This file displays a list of all modules that have been loaded by the system. Its contents will vary based on the configuration and use of your system.
• /proc/mounts: This file provides a quick list of all mounts in use by the system.
• /proc/mtrr: This file refers to the current Memory Type Range Registers (MTRRs) in use with the system.
• /proc/partitions: This file gives very detailed information on the various partitions currently available to the system.
• /proc/pci: This file contains a full listing of every PCI device on your system. Depending on the number of PCI devices you have, /proc/pci can get rather long.
• /proc/slabinfo: This file gives information about memory usage on the slab level. Linux kernels greater than 2.2 use slab pools to manage memory above the page level. Commonly used objects have their own slab pools.
• /proc/stat: This file keeps track of a variety of different statistics about the system since it was last restarted.
• /proc/swaps: This file measures swap space and its utilization.
• /proc/uptime: This file contains information about how long the system has been on since its last restart.
• /proc/version: This file tells you the versions of the Linux kernel and gcc.
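Because the top-level /proc files are plain text, a program can read them with ordinary file I/O. As a sketch, the hypothetical helper read_loadavg below parses the first three fields of /proc/loadavg (the 1, 5 and 15 minute load averages); it assumes a Linux system with /proc mounted.

```c
#include <stdio.h>

/* Hypothetical helper: read the 1, 5 and 15 minute load averages from
 * /proc/loadavg into avg[0..2]. Returns 0 on success, -1 on failure.
 * Assumes Linux with /proc mounted. */
int read_loadavg(double avg[3])
{
    FILE *f = fopen("/proc/loadavg", "r");
    if (f == NULL)
        return -1;

    /* The file starts with three decimal numbers separated by spaces. */
    int fields = fscanf(f, "%lf %lf %lf", &avg[0], &avg[1], &avg[2]);
    fclose(f);
    return fields == 3 ? 0 : -1;
}
```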
Directories in /proc
• Common groups of information concerning the kernel are grouped into directories and sub-directories within /proc.
• Process Directories
– The /proc directory contains quite a few directories whose names are numbers. These are called process directories: each is named after a process ID and contains information specific to that process. The owner and group of each process directory are set to the user running the process. When the process terminates, its /proc process directory vanishes. While the process is running, however, a great deal of information specific to it is contained in the files of its process directory. Each process directory contains the following files:
  • cmdline: Contains the command line arguments that started the process.
  • cpu: Provides specific information about the utilization of each of the system's CPUs.
  • cwd: A link to the current working directory for the process.
  • environ: Gives a list of the environment variables for the process. Variable names are conventionally written in upper-case characters.
  • exe: A link to the executable of this process.
  • fd: A directory containing all of the file descriptors for a particular process.
  • maps: Contains memory maps to the various executables and library files associated with this process.
  • mem: The memory held by the process.
  • root: A link to the root directory of the process.
  • stat: A status of the process.
  • statm: A status of the memory in use by the process. The seven columns relate to different memory statistics for the process. In the order in which they are displayed, from left to right, they report the following, each measured in pages:
    – Total program size
    – Size of the memory portions that are resident
    – Number of pages that are shared
    – Number of pages that are code
    – Number of pages of library
    – Number of pages of data/stack
    – Number of dirty pages
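The seven statm columns can be read with a single fscanf call. The sketch below is an illustration only: struct statm_info and read_self_statm are hypothetical names, the field order follows the proc(5) manual page (size, resident, shared, text, lib, data, dt, all counted in pages), and it reads the calling process's own entry through /proc/self.

```c
#include <stdio.h>

/* Hypothetical container for the seven statm columns, in file order. */
struct statm_info {
    unsigned long size;      /* total program size */
    unsigned long resident;  /* resident set size */
    unsigned long shared;    /* shared pages */
    unsigned long text;      /* code pages */
    unsigned long lib;       /* library pages */
    unsigned long data;      /* data + stack pages */
    unsigned long dirty;     /* dirty pages */
};

/* Hypothetical helper: parse /proc/self/statm for the calling process.
 * Returns 0 on success, -1 on failure. Assumes Linux with /proc mounted. */
int read_self_statm(struct statm_info *m)
{
    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL)
        return -1;

    int fields = fscanf(f, "%lu %lu %lu %lu %lu %lu %lu",
                        &m->size, &m->resident, &m->shared, &m->text,
                        &m->lib, &m->data, &m->dirty);
    fclose(f);
    return fields == 7 ? 0 : -1;
}
```

Multiplying a column by the page size (for instance from sysconf(_SC_PAGESIZE)) converts it to bytes.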
  • status: Provides the status of the process in a form that is much more readable than stat or statm.
– /proc/self: The /proc/self directory is a link to the currently running process. This allows a process to look at itself without having to know its process ID. Within a shell environment, a listing of the /proc/self directory produces the same contents as listing the process directory for that process.
– /proc/bus: This directory contains information specific to the various busses available on the system. So, for example, on a standard system containing ISA, PCI, and USB busses, current data on each of these busses is available in its directory under /proc/bus. The contents of the sub-directories and files available vary greatly with the precise configuration of your system. However, each of the directories for each of the bus types contains at least one directory for each bus of that type.
– /proc/driver: This directory contains information for specific drivers in use by the kernel. A common file found here is rtc, which provides output from the driver for the system's Real Time Clock (RTC), the device that keeps the time while the system is switched off.
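The /proc/self link described above can be checked from a program: resolving the link yields the PID of the process that resolves it. A minimal sketch, assuming Linux with /proc mounted; pid_from_proc_self is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: resolve the /proc/self symbolic link, whose target
 * is the PID of the calling process written as a decimal string.
 * Returns that PID, or -1 on failure. */
pid_t pid_from_proc_self(void)
{
    char target[32];
    ssize_t len = readlink("/proc/self", target, sizeof target - 1);
    if (len < 0)
        return -1;
    target[len] = '\0';  /* readlink() does not terminate the string */
    return (pid_t)strtol(target, NULL, 10);
}
```

The returned value should match what getpid() reports for the same process.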
– /proc/fs: This directory contains specific filesystem, file handle, inode, dentry and quota information. This information is actually located in /proc/sys/fs.
– /proc/ide: This directory holds an assorted array of information about IDE devices on the system. Each IDE channel is represented as a separate directory, such as /proc/ide/ide0 and /proc/ide/ide1.
• Device Directories
– Some of the most useful data can be found in the device directories within the channel directory. Each device, such as a hard drive or CD-ROM, on that channel will have its own directory containing its own collection of information and statistics. The contents of these directories vary according to the type of device connected. Some of the more useful files common to different devices include:
  • cache: The device's cache.
  • capacity: The capacity of the device, in 512 byte blocks.
  • driver: The driver and version used to control the device.
  • geometry: The physical and logical geometry of the device.
  • media: The type of device, such as a disk.
  • model: The model name or number of the device.
  • settings: A collection of current parameters of the device.
– /proc/irq: This directory is used to set IRQ to CPU affinity, which allows you to connect a particular IRQ to only one CPU. Alternatively, you can exclude a CPU from handling any IRQs. Each IRQ has its own directory, so each IRQ can be configured differently from any other. The /proc/irq/prof_cpu_mask file is a bitmask that contains the default values for the smp_affinity file in the IRQ directory. The values in smp_affinity specify which CPUs handle that particular IRQ.
– /proc/net: This directory provides a comprehensive look at various networking parameters and statistics.
– /proc/scsi: In the same way that the /proc/ide directory exists only if an IDE controller is connected to the system, the /proc/scsi directory is available only if you have a SCSI host adapter.
– /proc/sys: This directory is special and different from the others in /proc, as it not only provides a lot of information about the system but also allows you to make configuration changes to a running kernel.
Warning: Never attempt to tweak your kernel's settings on a production system using the various files in the /proc/sys directory. Occasionally, changing a setting may render the kernel unstable, requiring a reboot of the system. As this would obviously disrupt any users currently using the system, use a similar development system to try out changes before applying them to any production machines.
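The smp_affinity value mentioned under /proc/irq is a hexadecimal bitmask in which bit n stands for CPU n. The arithmetic can be sketched as below. cpus_to_affinity_mask is a hypothetical helper that only computes the string; it does not write to /proc, and for simplicity it assumes CPU numbers 0 to 31.

```c
#include <stdio.h>

/* Hypothetical helper: build the hexadecimal bitmask for a set of CPU
 * numbers (bit n set means CPU n is allowed). Only CPUs 0-31 are accepted
 * in this sketch. Returns 0 on success, -1 on a bad CPU number or if the
 * output buffer is too small. */
int cpus_to_affinity_mask(const int *cpus, int count, char *out, size_t size)
{
    unsigned long mask = 0;

    for (int i = 0; i < count; i++) {
        if (cpus[i] < 0 || cpus[i] > 31)
            return -1;
        mask |= 1UL << cpus[i];  /* set the bit that stands for this CPU */
    }

    int written = snprintf(out, size, "%lx", mask);
    return (written < 0 || (size_t)written >= size) ? -1 : 0;
}
```

For CPUs 0 and 2 the mask is binary 101, so the helper produces the string 5.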
The /proc/sys directory contains several different directories that control different aspects of a running kernel.
– /proc/sys/dev: This directory provides parameters for particular devices on the system. Most systems have at least two directories, cdrom and raid, but customized kernels can have others, such as parport, which provides the ability to share one parallel port between multiple device drivers.
– /proc/sys/fs: This directory contains an array of options and information concerning various aspects of the filesystem, including quota, file handle, inode, and dentry information.
– /proc/sys/kernel: This directory contains a variety of different configuration files that directly affect the operation of the kernel.
– /proc/sys/net: This directory contains assorted directories of its own concerning various networking topics, including assorted protocols and centers of emphasis. Various configurations at the time of kernel compilation make available different directories here, such as appletalk, ethernet, ipv4, ipx, and ipv6. Within these directories, you can adjust the assorted networking values for that configuration on a running system.
– /proc/sys/vm: This directory facilitates the configuration of the Linux kernel's virtual memory (VM) subsystem. The kernel makes extensive and intelligent use of virtual memory, which is backed in part by swap space.
– /proc/sysvipc: This directory contains information about System V IPC resources. The files in this directory relate to System V IPC calls for messages (msg), semaphores (sem), and shared memory (shm).
– /proc/tty: This directory contains information about the available and currently used tty devices on the system. Originally named after teletype devices, all character-based data terminals are called tty devices. In Linux, there are three different kinds of tty devices. Serial devices are used with serial connections, such as over a modem or using a serial cable. Virtual terminals create the common console connection, such as the virtual consoles available when pressing [Alt]-[<F-key>] at the system console. Pseudo terminals create a two-way communication that is used by some higher level applications, such as X11.
Using sysctl
• Setting kernel parameters in the /proc/sys directory need not be a manual process, or one that requires echoing values into a virtual file and hoping they are correct. The sysctl command can make viewing, setting, and automating special kernel settings very easy.
• To get a quick overview of all settings configurable in the /proc/sys directory, type the sysctl -a command as root. This will create a large, comprehensive list.
• This is the same basic information you would see if you viewed each of the files individually. The only difference is how the location is written. The file /proc/sys/net/ipv4/route/min_delay is referred to as net.ipv4.route.min_delay, with the directory slashes replaced by dots and the /proc/sys portion assumed.
• While quickly setting single values like this in /proc/sys is helpful during testing, it does not work as well on a production system, as all /proc/sys special settings are lost when the machine is rebooted. To preserve the settings that you would like to make permanent, add them to the /etc/sysctl.conf file.
• Even though the /proc filesystem is a great resource to exploit, sometimes it is just missing. The filesystem is not vital to system operation, and there are cases when you choose to leave it out of the kernel image or simply don't mount it. When you build an embedded system, for example, saving 40-50 kB can be an interesting option; if you are very concerned about security, on the other hand, you might decide to hide system information and leave /proc unmounted.
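The naming rule used by the sysctl command (slashes become dots, the /proc/sys prefix is assumed) is easy to express in code. The sketch below goes in the reverse direction, from a dotted name to a file path. sysctl_name_to_path is a hypothetical helper, and it deliberately ignores corner cases such as name components that themselves contain dots.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: translate a dotted sysctl name such as
 * "net.ipv4.route.min_delay" into the matching path under /proc/sys.
 * Returns 0 on success, -1 if the output buffer is too small. */
int sysctl_name_to_path(const char *name, char *path, size_t size)
{
    static const char prefix[] = "/proc/sys/";

    int written = snprintf(path, size, "%s%s", prefix, name);
    if (written < 0 || (size_t)written >= size)
        return -1;

    /* Replace every dot after the prefix with a directory slash. */
    for (char *p = path + strlen(prefix); *p != '\0'; p++) {
        if (*p == '.')
            *p = '/';
    }
    return 0;
}
```

The resulting path can then be opened and read like any other /proc file.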
• The system call interface to kernel tuning, namely sysctl, is an alternative way to peek into configurable parameters and to modify them. An additional advantage of the system call interface is that it is faster, as no fork/exec is involved, nor any directory lookup. Unless you run a very old platform, however, the performance savings are irrelevant.
• To use the system call, the header <sys/sysctl.h> must be included. It declares the function as:
    int sysctl(int *name, int nlen, void *oldval, size_t *oldlenp, void *newval, size_t newlen);
• The arguments of the function have the following meaning:
– name points to an array of integers: each of the integer values identifies a sysctl item, either a directory or a leaf node file. The symbolic names for such values are defined in <linux/sysctl.h>.
– nlen states how many integers are listed in the array name: to reach a particular entry you need to specify the path through the subdirectories, so you need to tell how long that path is.
– oldval is a pointer to a data buffer where the old value of the sysctl item is to be stored. If it is NULL, the system call won't return values to user space.
– oldlenp points to an integer stating the length of the oldval buffer. The system call changes the value to reflect how much data has been written, which can be less than the buffer length.
– newval points to a data buffer holding replacement data: the kernel will read this buffer to change the sysctl entry being acted upon. If it is NULL, the kernel value is not changed.
– newlen is the length of newval. The kernel will read no more than newlen bytes from newval.
Using sysctl (FreeBSD specific)
• The FreeBSD sysctl mechanism is based on the so-called linker set technology[1]. It lets us gather information from a running kernel and configure it to some extent without rebuilding the kernel.
• All the information is stored inside the kernel and is organized into a Management Information Base (MIB) tree. To access the MIB tree, you use sysctl variables, whose names are organized hierarchically.
• Most sysctl variables have ASCII names separated by dots. For example, the read-only sysctl variable kern.ostype contains the type of the kernel. This naming scheme is very similar to filenames, except that dots rather than slashes separate the component names. To list all sysctl variables by their ASCII names, you can issue the following command:
    $ sysctl -a
• The types of sysctl variables include node, integer, string, structure and opaque data. A node is like a directory in a filesystem. The kern.ostype variable is a string; its value is "FreeBSD". The sysctl command that you use on a command line only accepts the ASCII name of a sysctl variable. Unlike filenames, wildcard characters like "*" and "?" are not accepted. However, you do not have to specify a full name to display sysctl variables.
• All sysctl names are implemented internally as arrays of integers. These are called "integer names" here to distinguish them from "ASCII names". Only integer names can be used with the system call __sysctl(). A user who knows only the ASCII name of a sysctl variable must pass the special integer name {0,3} along with the ASCII name to obtain the integer name of the variable. This indirection cannot be avoided.
• The maximum number of integers making up a sysctl name is limited to CTL_MAXNAME (12). The internal name corresponding to kern.ostype is an array of two integers: {CTL_KERN, KERN_OSTYPE} or {1,1}. Note that some sysctl variables have only integer names. For example, {CTL_KERN, KERN_PROF, GPROF_STATE} is the name of the kernel profiling sysctl variable that records whether the kernel is currently being profiled. It has no corresponding ASCII name and therefore cannot be accessed by the sysctl command.
Resources
1. Red Hat - The Official Red Hat Linux Reference Guide
2. InformIT - The /proc File System
3. Jonathon T. Giffin, George S. Kola - Linux Process Control via the File System
4. Zhihui Zhang (Department of Computer Science, SUNY at Binghamton), Daemon News - FreeBSD 4.0 Sysctl Mechanism
5. Sean Davis - sysctl On NetBSD - An Easy Way To Get Process Data
6. Oskar Andreasson - Ipsysctl tutorial 1.0.4
7. FreeBSD Documentation Project - FreeBSD Handbook
8. Marshall Kirk McKusick, George V. Neville-Neil - Design and Implementation of the FreeBSD Operating System
9. Gerhard Mourani and Open Network Architecture, Inc. - Securing and Optimizing Linux: The Ultimate Solution