Systems Programming
00. Introduction
Alexander Holupirek
Database and Information Systems Group
Department of Computer & Information Science
University of Konstanz
Summer Term 2008
1
Welcome All!
1995 United Feature Syndicate, Inc. (NYC)
2
Visiting Card
Alexander Holupirek
http://www.inf.uni-konstanz.de/~holupire
88 4440 E 217
- E-mail is the best way to reach me.
- You are welcome in my office whenever you have a question (no need to make an appointment first).
3
Your Tutors
Jochen Oekonomopulos
Enrolled in master studies Information Engineering
V 504
Tuesday, 18:00-19:30, Room C 252/Computer Pool
Thomas Zink
Enrolled in master studies Information Engineering
V 504
Friday, 12:00-13:30, Room D 406/Computer Pool
4
-
Tutorial Groups
Subversion Repository for the Tutorial
- We have set up a version control system for the tutorials.
- Please use it to commit your solutions to the assignments.
- Source code from the lecture is available in the /pub directory.
- Once registered for the tutorial, you will receive your credentials.
Command line approach to check out the repository
$ svn --username holu --password XXXX co \
    svn://phobos29.inf.uni-konstanz.de/sys_S08
5
How You Will Benefit
Assignments & Tutorials
- Work on the weekly assignments.
- Hand them in on time.
- Jochen and Thomas will review them.
- Attend the tutorials and the discussion of solutions.
6
How You Will Benefit (cont.)
Lecture Material
- Use the material provided on the course website to prepare for the lectures.
- Don't hesitate to ask questions.
- Let me know if I can improve the lecture material and/or its presentation.
7
How You Will Benefit (cont.)
Account & Mailinglist
- Use the Account Tool to register for the course.
- You will automagically become a member of the mailing list sys [email protected].
- Feel free to post and discuss problems, questions, and comments on that list.
- Make sure to receive the e-mails. [1]
- Any information about changes etc. will be posted there.
[1] These are sent to @inf.uni-konstanz.de
8
-
How You Will Benefit (cont.)
Examination and Credits
- Register for the course (via StudIS) within 4 weeks.
- Pass the examination at the end of the semester.
- Examination dates:
  - July 18th, 12:00 - 13:30, D 406
  - October 17th, 12:00 - 13:30, D 406
- 6 ECTS, Informatik der Systeme
Have fun!
9
Organizational Matters
Website for this course
- Please check this site regularly for the latest information.
http://www.inf.uni-konstanz.de/dbis/teaching/ss08/sys/
Schedule (OK for everybody?)
- Monday, 18:00-19:30, Room C 252
- Tuesday, 18:00-19:30, Room D 247/Computer Pool
10
Literature
The IEEE and The Open Group. Single UNIX Specification, Version 3, 2004 Edition. http://www.unix.org/single_unix_specification/
Brian W. Kernighan, Dennis M. Ritchie. The C Programming Language. ISBN 0-13-110370-9, 1988, 41st Printing. Prentice Hall Software Series.
W. Richard Stevens, Stephen A. Rago. Advanced Programming in the UNIX Environment. ISBN 978-0201433074. Addison-Wesley Professional; 2nd edition (June 27, 2005).
12
-
What Is This Course About?
Systems Programming
- By systems we mean operating systems.
- By programming we mean using the interface an operating system (OS) provides.
- By OS we mean UNIX-like OSs.
Operating System
- Layer of software on top of bare hardware.
- Shields programmers from the complexity of the hardware.
- Presents an interface (of a virtual machine) that is easier to understand and program.
13
The UNIX System Interface
The UNIX operating system provides its services through a set of system calls, which are in effect functions within the operating system that may be called by user programs.
- Syscalls determine a direct interface to the kernel.
  - Employed for maximum efficiency.
  - Access some facility that is not in the libraries.
- The service calls available in the interface vary from OS to OS; however, the underlying concepts tend to be similar.
- The ISO C library is (in many cases) modeled on UNIX facilities.
14
Standardization Of The UNIX System Interface
During the 1980s the proliferation of UNIX versions and the differences between them led many large users (such as the U.S. government) to call for standardization.
- Among others, ANSI [2] C and the IEEE [3] POSIX emerged
- POSIX stands for Portable Operating System Interface
- POSIX refers to a family of related standards [4]
- POSIX was originally used as a synonym for IEEE Std 1003.1-1988
- POSIX.1 emerged as the preferred term
- The latest version of POSIX.1 was published on April 30th, 2004.
- It is called IEEE Std 1003.1, 2004 Edition (POSIX.1)
[2] American National Standards Institute
[3] Institute of Electrical and Electronics Engineers
[4] IEEE Std 1003.n (where n is a number) and the parts of ISO/IEC 9945
15
Systems Programming With POSIX.1
application using the API
POSIX.1 system call interface
OS as Black Box
Figure: POSIX.1 as interface to UNIX OSs
16
-
Systems vs. Kernel Programming.
- The black box model is suitable for systems programming.
- Knowledge about the system's internals, however, helps to use the system properly and not to work against it.
- Providing the system services is (mostly) kernel programming.
application using the API
POSIX.1 system call interface
OS as Black Box
application using the API
POSIX.1 system call interface
OS kernel
Figure: Black vs. White Box View of a UNIX System
17
The Joint Standard
The latest version of POSIX.1 has been jointly developed by the IEEE and The Open Group [5]. As such it is both an IEEE standard and an Open Group Technical Standard:
- IEEE Std 1003.1, 2004 Edition
- The Open Group Technical Standard Base Specifications, Issue 6
- It is also an international standard: ISO/IEC 9945:2003
[5] http://www.opengroup.org/overview/members/membership_list.htm
18
The Single UNIX Specification, Version 3
The standard is published free of charge on the web [6] as
The Single UNIX Specification, Version 3, 2004 Edition
Conceptually, this standard describes a set of fundamental services needed for the efficient construction of application programs. Access to these services has been provided by defining an interface, using the C programming language, a command interpreter, and common utility programs that establish standard semantics and syntax.
[IEEE/The Open Group, 2004, Preface]
[6] http://www.unix.org/single_unix_specification/
19
The Single UNIX Specification (SUSv3)
The document is broken into four parts:
- Part 1: Base Definitions (XBD)
- Part 2: System Interfaces (XSH)
- Part 3: Shell and Utilities (XCU)
- Part 4: Rationale
The System Interfaces volume (XSH) [7] describes a set of system interfaces offered to application programs by systems conformant to this part of the Single UNIX Specification. Readers are expected to be experienced C language programmers.
http://www.opengroup.org/onlinepubs/009695399/functions/contents.html
[7] http://www.unix.org/version3/xsh_contents.html
20
-
Part 2: System Interfaces Volume (XSH)
Because POSIX.1 specifies an interface and not an implementation, no distinction is made between system calls and library functions.
Example
System Interface Table. Lists 1123 interfaces.
http://www.opengroup.org/onlinepubs/009695399/functions/atoi.html
http://www.opengroup.org/onlinepubs/009695399/functions/read.html
21
UNIX Architecture
kernel
shell
system calls
applications
library routines
22
System Calls - Section 2
The system call interface has traditionally been documented in Section 2 of the UNIX Programmer's Manual.
1 General commands (tools and utilities).
2 System calls and error numbers.
3 Libraries.
3p perl(1) programmers reference guide.
4 Device drivers.
5 File formats.
6 Games.
7 Miscellaneous.
8 System maintenance and operation commands.
9 Kernel internals.
X11 An alias for X11R6.
X11R6 X Window System.
local Pages located in /usr/local.
man(1) on OpenBSD
23
System Calls - Section 2
The system call interface has traditionally been documented in Section 2 of the UNIX Programmer's Manual.
0 Header files (usually found in /usr/include)
1 Executable programs or shell commands
2 System calls (functions provided by the kernel)
3 Library calls (functions within program libraries)
4 Special files (usually found in /dev)
5 File formats and conventions eg /etc/passwd
6 Games
7 Miscellaneous (including macro packages and
conventions), e.g. man(7), groff(7)
8 System administration commands (usually only for root)
9 Kernel routines [Non standard]
man(1) on Linux
24
-
System Call Definition & C Library Functions
- The system call interface is defined in the C language [8].
- A standard technique on UNIX systems is for each system call to have a function of the same name in the Standard C Library.
- Those functions invoke the appropriate kernel service, using whatever technique is required on the system.
  - The function may put one or more of the C arguments into general registers and then execute some machine instruction that generates a software interrupt in the kernel.
- We can consider the system calls as being C functions.
[8] Regardless of the actual implementation technique used to invoke a call
25
Library Calls - Section 3
- Section 3 of the UNIX Programmer's Manual defines the general-purpose functions available to programmers.
- These functions are not entry points into the kernel.
  - They may use the kernel's system calls, however.
  - printf(3): may invoke write(2) to perform output.
  - atoi(3) (convert ASCII string to integer): does not involve the OS at all.
- Implementor's view (kernel programming): the distinction between system call and library function is fundamental.
- User's perspective (systems programming): not as critical; both exist to provide services for application programs, but . . .
26
System Calls vs. Library Calls
Example to illustrate the difference: current time and date
- Some OSs have one syscall to return the time and another to return the date. Special handling (switching to or from daylight saving time) is done by the kernel or requires human intervention.
- UNIX provides one syscall (gettimeofday(2)) that returns the time elapsed since the Epoch [9] in seconds (and microseconds).
- Any interpretation (local time zone, converting to human-readable time) is left to the user process.
- Syscalls usually provide a minimal interface, while library functions often provide more elaborate functionality.
[9] Midnight, January 1, 1970, Coordinated Universal Time
27
Essentials
- Good knowledge of C.
- Knowledge about the services an OS provides:
  - system calls.
  - C libraries.
- Some knowledge about kernel internals.
- Some knowledge about operating system concepts.
- Some knowledge about the underlying hardware.
28
-
Systems Programming
01. The C Programming Language
Alexander Holupirek
Database and Information Systems Group
Department of Computer & Information Science
University of Konstanz
Summer Term 2008
29
A Tutorial Introduction
Variables and Arithmetic Expressions
Character Input and Output
Arrays
Functions
Call by Value, Call by Reference
Character Arrays
Variables, Declarations and Scope
30
Schedule For Today: A First Glance At C
- Quick introduction
- Show essential elements of the language
- No details, rules, and exceptions
- Provide examples
- Show the basics, such as
  - variables and constants
  - arithmetic
  - control flow
  - functions
  - rudiments of input and output
- Leave out anything else, such as
  - pointers
  - structures
  - standard library
31
The First Program Is Always The Same
Print the words: Hello, world
Not that easy, because you have to:
- Create the program text
- Compile it successfully
- Run it
- Get the output
#include <stdio.h>

int
main(void)
{
    printf("Hello, world\n");
    return (0);
}
32
-
Compilation On A UNIX-like OS
$ cc -Wall hello.c
$ ls
hello.c a.out
$ ./a.out
Hello, world
$
engine        filename  description
              hello.c   source code
preprocessor  hello.i   source w/ preproc. directives expanded
compiler      hello.s   assembler code
assembler     hello.o   object code ready to be linked
linker        a.out     executable
33
C Programs
Basic building blocks
- functions
- statements
- variables
- arguments

- functions contain statements
- statements specify computing operations to be done
- variables store values used during computation
- arguments (one way to) communicate data between functions
34
Building Blocks Of Our Example
- A function called main
  - Liberty to name functions whatever you like, but . . .
  - main is special, a program begins execution at the beginning of main
  - Every program must have a main somewhere
- main will usually call other functions to help perform its job
  - Functions that you wrote
  - Functions that are provided for you, e.g. printf
35
Some Explanations About The Program Itself
1 #include <stdio.h>
2
3 int
4 main(void)
5 {
6     printf("Hello, world\n");
7     return (0);
8 }
- line 1: tells the compiler to include information about the standard input/output library
- line 3/4: define a function named main, which receives no argument values. Parentheses after the function name surround the argument list (empty list). Returns an int.
- line 5/8: the statements of main are enclosed in braces
- line 6: main calls the library function printf, which prints this sequence of characters; \n represents the newline character.
36
-
Line 6: Print A String
- A function is called by naming it, followed by a parenthesized list of arguments:
    printf("Hello world\n");
  calls the function printf with the argument
    "Hello world\n"
- printf is a library function that prints output (in this case the string of characters between the quotes)
37
Character String/String Constant
- A sequence of characters in double quotes is called a character string or string constant
- The sequence \n stands for the newline character, which when printed advances the output to the left margin of the next line
- We have to use \n to include a newline character with printf
printf("Hello, world
");
$ cc hello.c
hello.c:6:16: missing terminating " character
hello.c:7:9: missing terminating " character
hello.c: In function main:
hello.c:8: error: syntax error before "return"
38
Printing Hello, world
- printf never supplies a newline automatically
- so several calls can build up an output line in stages
- our first program could just as well have been written as below to produce identical output
#include <stdio.h>

int
main(void)
{
    printf("Hello, ");
    printf("world");
    printf("\n");
    return (0);
}
39
Escape Sequences
- Notice that \n represents only a single character
- An escape sequence like \n provides a general and extensible mechanism for hard-to-type or invisible characters.
\a  alert (bell) character    \\    backslash
\b  backspace                 \?    question mark
\f  formfeed                  \'    single quote
\n  newline                   \"    double quote
\r  carriage return           \ooo  octal number
\t  horizontal tab            \xhh  hexadecimal number
\v  vertical tab
Table: The complete set of escape sequences
40
-
Fahrenheit-Celsius: C = (5/9)(F - 32)
#include <stdio.h>
/* print fahrenheit-celsius table
   for fahrenheit = 0, 20, ..., 300 */
int
main(void)
{
    int fahr, celsius;
    int lower, upper, step;

    lower = 0;      /* lower limit */
    upper = 300;    /* upper limit */
    step = 20;      /* step size */

    fahr = lower;
    while (fahr <= upper) {
        celsius = 5 * (fahr - 32) / 9;
        printf("%d\t%d\n", fahr, celsius);
        fahr = fahr + step;
    }

    return (0);
}
-
Data Types And Sizes
Sizes are machine-dependent
- Each compiler is free to choose appropriate sizes for its own hardware. ISO C defines compile-time limits.
- short and int are at least 16 bits
- long is at least 32 bits
- short is no longer than int, int is no longer than long
- Numerical limits [10] are documented in <limits.h> and <float.h>. Additional limits are specified in <stdint.h> [11]
Assignment
[10] ISO C99: 7.10/5.2.4.2: Numerical limits
[11] ISO C99: 7.18: Integer types
45
The while Loop
Each line in the result table is computed the same way:
while (fahr <= upper) {
    celsius = 5 * (fahr - 32) / 9;
    printf("%d\t%d\n", fahr, celsius);
    fahr = fahr + step;
}
-
Fahrenheit-Celsius Converter Bug List
Fixing problems
- Pretty printing: Right-justified output
- Switch from integer to floating-point arithmetic
Construct a patch for the changes using diff(1)
NAME
    diff - compare files line by line
SYNOPSIS
    diff [OPTION]... FILES
DESCRIPTION
    Compare files line by line.
    -u  -U NUM  --unified[=NUM]
        Output NUM (default 3) lines of unified context.
    -p  --show-c-function
        Show which C function each change is in.
49
$ diff -up fahrenheit_v1.c fahrenheit_v2.c
--- fahrenheit_v1.c Sat Apr 19 08:58:48 2008
+++ fahrenheit_v2.c Sat Apr 19 08:58:05 2008
@@ -4,7 +4,7 @@
 int
 main(void)
 {
-    int fahr, celsius;
+    float fahr, celsius;
     int lower, upper, step;
 
     lower = 0;      /* lower limit */
@@ -13,8 +13,8 @@ main(void)
 
     fahr = lower;
     while (fahr <= upper) {
-        celsius = 5 * (fahr - 32) / 9;
-        printf("%d\t%d\n", fahr, celsius);
+        celsius = (5.0 / 9.0) * (fahr - 32.0);
+        printf("%3.0f %6.1f\n", fahr, celsius);
         fahr = fahr + step;
     }
 
-
Printing With printf(3)
specifier  print as . . .
%d         decimal integer
%6d        decimal, at least 6 characters wide
%f         floating point
%6f        floating point, at least 6 characters wide
%.2f       floating point, 2 characters after decimal point
%6.2f      floating point, at least 6 wide and 2 after decimal point
- Further, printf(3) recognizes %o for octal, %x for hexadecimal, %c for character, %s for string, %p for address (pointer)
- ISO C: 7.19.6: Formatted input/output functions
53
The for Loop, Fahrenheit-Celsius v3
#include <stdio.h>
/* print fahrenheit-celsius table
   for fahrenheit = 0, 20, ..., 300 */
int
main(void)
{
    int fahr;

    for (fahr = 0; fahr <= 300; fahr = fahr + 20)
        printf("%3d %6.1f\n", fahr, (5.0 / 9.0) * (fahr - 32));

    return (0);
}
-
Character Input And Output
Processing character data
- Text I/O is dealt with as streams of characters
- A text stream is a sequence of characters divided into lines
- Each line consists of zero or more characters followed by a newline character (regardless of where the stream originates or where it goes to). The library makes each input or output stream conform to this model
- The standard library provides several functions for reading and writing one character at a time, of which getchar(3) and putchar(3) are the simplest.
57
getchar(3) and putchar(3)
#include <stdio.h>

int getchar(void);
int putchar(int c);
- getchar(3) reads the next input character from a text stream
- Why does getchar(3) return an int?
  - getchar(3) returns a distinctive value when there is no more input: a value called EOF (end of file) that cannot be confused with any real data. EOF is defined in <stdio.h>
  - The return type must be big enough to hold EOF in addition to any possible char.
- putchar(3) prints a character each time it is called
58
File Copying
Given getchar(3) and putchar(3) . . .
. . . we can write a surprising amount of useful code without knowing anything more about input and output
Copying input to output one character at a time
read a character
while (character is not end-of-file indicator)
    output the character just read
    read a character
59
File Copying, v1
read a character
while (character is not end-of-file indicator)
    output the character just read
    read a character
#include <stdio.h>

/* copy input to output, v1 */
int
main(void)
{
    int c;

    c = getchar();
    while (c != EOF) {
        putchar(c);
        c = getchar();
    }

    return (0);
}
60
-
File Copying, v2
- An assignment, such as c = getchar(), is an expression and has a value (the value of the left-hand side after the assignment)
- An assignment can appear as part of a larger expression
#include <stdio.h>

/* copy input to output, v2 */
int
main(void)
{
    int c;

    while ((c = getchar()) != EOF)
        putchar(c);

    return (0);
}
61
Character Counting, v1
#include <stdio.h>

/* count characters in input, v1 */
int
main(void)
{
    long nc;

    nc = 0;
    while (getchar() != EOF)
        ++nc;
    printf("%ld\n", nc);

    return (0);
}
62
Character Counting, v2
#include <stdio.h>

/* count characters in input, v2 */
int
main(void)
{
    double nc;

    for (nc = 0; getchar() != EOF; ++nc)
        ;   /* nothing */
    printf("%.0f\n", nc);

    return (0);
}
63
Line Counting
- The standard library ensures that an input text stream appears as a sequence of lines, each terminated by a newline
#include <stdio.h>

/* count lines in input */
int
main(void)
{
    int c, nl;

    nl = 0;
    while ((c = getchar()) != EOF)
        if (c == '\n')
            ++nl;
    printf("%d\n", nl);

    return (0);
}
64
-
Word Counting
NAME
wc - word, line, and byte or character count
SYNOPSIS
wc [-c | -m] [-hlw] [file ...]
DESCRIPTION
The wc utility reads one or more input text files, and,
by default, writes the number of lines, words, and bytes
contained in each input file to the standard output
$ wc /etc/services
285 1398 9732 /etc/services
$ cc count_words.c
$ cat /etc/services | ./a.out
285 1398 9732
65
#include <stdio.h>

#define IN  1   /* inside a word */
#define OUT 0   /* outside a word */

/* count lines, words, and characters in input */
int
main(void)
{
    int c, nl, nw, nc, state;

    state = OUT;
    nl = nw = nc = 0;
    while ((c = getchar()) != EOF) {
        ++nc;
        if (c == '\n')
            ++nl;
        if (c == ' ' || c == '\n' || c == '\t')
            state = OUT;
        else if (state == OUT) {
            state = IN;
            ++nw;
        }
    }
    printf("%d %d %d\n", nl, nw, nc);
    return (0);
}
66
Counting Digits, White Spaces, And The Rest
Next is an artificial program, which counts the number of occurrences of each digit, of white space characters (blank, tab, newline), and of all other characters.
It will help us to . . .
- introduce arrays
- talk about initialization
- see that chars are, by definition, just small integers
- speak about coding conventions
The output of the program on itself is:
$ cat count_digits.c | ./a.out
digits = 10 3 0 0 0 0 0 0 0 1, white space = 122, other = 361
$ wc -m count_digits.c
497 count_digits.c
68
-
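#include <stdio.h>

/* count digits, white space, others */
int
main(void)
{
    int c, i, nwhite, nother;
    int ndigit[10];

    nwhite = nother = 0;
    for (i = 0; i < 10; ++i)
        ndigit[i] = 0;

    while ((c = getchar()) != EOF)
        if (c >= '0' && c <= '9')
            ++ndigit[c - '0'];
        else if (c == ' ' || c == '\n' || c == '\t')
            ++nwhite;
        else
            ++nother;

    printf("digits =");
    for (i = 0; i < 10; ++i)
        printf(" %d", ndigit[i]);
    printf(", white space = %d, other = %d\n", nwhite, nother);

    return (0);
}

16 /* power: raise base to n-th power; n >= 0 */
17 int
18 power(int base, int n)
19 {
20     int i, p;
21
22     p = 1;
23     for (i = 1; i <= n; ++i)
24         p = p * base;
25     return p;
26 }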
-
Function Terminology
line 3: function declaration (function prototype); says that power is a function that expects two int arguments and returns an int
line 17: function definition; starts with the declaration of the parameter types and names, and the type of the result that the function returns (has to match the prototype)
- parameter: a variable named in the parenthesized list in a function definition
- argument: a value used in a call of the function
- parameter and argument are sometimes referred to as formal and actual argument
73
Arguments: Call by Value/Reference
In C, all function arguments are passed by value
- The called function is given the values of its arguments in temporary variables rather than the originals
- The callee can't directly alter a variable in the calling function
Call by reference is possible
- The caller must provide the address of the variable to be set (technically a pointer to the variable), and the called function must declare the parameter to be a pointer and access the variable indirectly through it
- We will discuss pointers in more detail at a later point
75
Passing An Array As Argument
When the name of an array is used as an argument,
- the value passed to the function is the location or address of the beginning of the array
- there is no copying of array elements
- the function can access and alter any element of the array
76
-
Character Arrays
The most common type of array in C is the array of characters
longline.c reads a set of text lines and prints the longest
Program outline:
while (there is another line)
    if (it's longer than the previous longest)
        save it
        save its length
print longest line
78
Splitting The Program
The program divides naturally into pieces
- Function getline fetches the next line of input
  - It has to return a signal about end-of-file
  - We let it return the length of the line, or zero on EOF
  - Zero is appropriate because it is never a valid line length
- Function copy copies a line to a safe place
- Function main controls getline and copy
#include <stdio.h>
#define MAXLINE 1000    /* maximum input line size */

int getline(char line[], int maxline);
void copy(char to[], char from[]);
79
The Controlling Function
/* print longest input line */
int
main(void)
{
    int len;                /* current line length */
    int max;                /* maximum length seen so far */
    char line[MAXLINE];     /* current input line */
    char longest[MAXLINE];  /* longest line saved here */

    max = 0;
    while ((len = getline(line, MAXLINE)) > 0)
        if (len > max) {
            max = len;
            copy(longest, line);
        }
    if (max > 0)    /* there was a line */
        printf("%s", longest);
    return (0);
}
80
-
Getting A Line
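/* getline: read a line into s, return length */
int
getline(char s[], int lim)
{
    int c, i;

    for (i = 0; i < lim - 1
        && (c = getchar()) != EOF && c != '\n'; ++i)
        s[i] = c;
    if (c == '\n') {
        s[i] = c;
        ++i;
    }
    s[i] = '\0';
    return i;
}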
-
External Variables
Global, external variables of a program
- As an alternative to automatic variables, it is possible to define variables that are external to all functions.
- Those can be accessed by name by any function
- Because external variables are globally accessible, they can be used instead of argument lists to communicate data between functions (but, beware!)
- External variables remain in existence permanently
- They retain their values even after the functions that set them have returned
85
External Variables (cont.)
Definition and declaration of external variables
- An external variable must be defined, exactly once, outside a function; this sets aside storage for it.
- The variable must also be declared in each function that wants to access it; this states the type of the variable.
- In general: all variables (automatic or extern) must be declared, either explicitly or implicitly from context
- Definition of a variable refers to the place where the variable is created and assigned storage
- Declaration of a variable refers to places where the nature of the variable is stated but no storage is allocated
86
#include <stdio.h>
#define MAXLINE 1000    /* maximum input line size */

int max;                /* maximum length seen so far */
char line[MAXLINE];     /* current input line */
char longest[MAXLINE];  /* longest line saved here */

int getline(void);
void copy(void);

/* print longest line; external objects, weak solution */
int
main(void)
{
    int len;    /* current line length */
    extern int max;
    extern char longest[];

    max = 0;
    while ((len = getline()) > 0)
        if (len > max) {
            max = len;
            copy();
        }
    if (max > 0)    /* there was a line */
        printf("%s", longest);
    return (0);
}

int
getline(void)
{
    int c, i;
    extern char line[];

    for (i = 0; i < MAXLINE - 1
        && (c = getchar()) != EOF && c != '\n'; ++i)
        line[i] = c;
    if (c == '\n') {
        line[i] = c;
        ++i;
    }
    line[i] = '\0';
    return i;
}

void
copy(void)
{
    int i;
    extern char line[], longest[];

    i = 0;
    while ((longest[i] = line[i]) != '\0')
        ++i;
}
87
-
Terminology: External vs. Internal
- A C program consists of a set of external objects, which are either variables or functions
- Functions are always external, because C does not allow functions to be defined inside other functions
- External is used in contrast to internal, which describes the arguments and variables used inside functions
- By default, external variables and functions have the property that all references to them by the same name, even from functions compiled separately, are references to the same thing (this is called external linkage in the standard) [15]
[15] We will see later how to define external variables and functions that are visible only within a single source file; once again, the keyword is static
89
Static Internal Variables
The static declaration can be applied to internal variables
- Internal static variables are local to a particular function (just like automatic variables), but unlike automatics, they remain in existence across different invocations of the function
- This means that internal static variables provide private, permanent storage within a single function
void
f(unsigned int m, long n)
{
static int i;
...
}
90
Static External Variables And Functions
The static declaration can be applied to external objects
- Applied to an external variable or function, static limits the scope of that object to the rest of the source file
- It provides a way to hide names otherwise globally visible
static char buf[BUFSIZE ];
static int bufp = 0;
static void
f(register unsigned int m, register long n)
{
...
}
91
Register Variables
The register declaration
- advises the compiler that the variable in question will be heavily used. The idea is to place it in a machine register
- Compilers are free to ignore the advice
- Can only be used with automatics and formal arguments
- Not possible to take the address of a register variable
void
f(register unsigned int m, register long n)
{
register int i;
...
}
92
-
Initialization
- In the absence of explicit initialization, external and static variables are guaranteed to be initialized to zero.
- Scalar variables may be initialized when they are defined, by following the name with an equals sign and an expression:
int x = 1;
char squote = '\'';
long day = 1000L * 60L * 60L * 24L; /* milliseconds/day */
- For external and static variables, the initializer must be a constant expression; the initialization is done once, conceptually before the program begins execution.
- For automatic and register variables, it is done each time the function or block is entered (and the initializer is not restricted to being a constant)
93
Block Structure And Scope
- Declarations of variables (including initializations) may follow the left brace that introduces any compound statement, not just the one that begins a function
- They hide any identically named variables in outer blocks
- They remain in existence until the matching right brace
- What is the scope of i?
if (n > 0) {
    int i;  /* declare a new int */
    for (i = 0; i < n; i++)
        ...
}
94
Block Structure And Scope
- An automatic variable declared and initialized in a block is initialized each time the block is entered
- A static variable is initialized only the first time the block is entered
- Automatic variables, including formal parameters, also hide external variables and functions of the same name
int x;
int y;
void
f(double x)
{
double y;
...
}
95
Systems Programming
02. C Programs in Space and Time
Alexander Holupirek
Database and Information Systems Group
Department of Computer & Information Science
University of Konstanz
Summer Term 2008
96
-
C Programs In (Address) Space And (Run-)time
Where is my data and why do I have to know?
- C is closely related to the machine. Before talking about pointers, storage allocation, etc., some background knowledge about address space, (virtual) memory, and its allocation during program execution comes in handy
- Knowledge about the memory layout of a program is quite helpful when debugging
- Knowledge about what is happening inside the machine during program execution is fundamental both to debugging programs and, in the first place, to writing clean code
97
Repetition Computer Architecture
Storage Classes
From Source Code To Executable Code
Construction of an Executable
Relocation Process
98
C, Assembler, And Machine Code
C source code:
    int a, b;
    a = b * b;

Intel IA-32 assembler source code (machine instructions, i.e. processor instructions):
    mov  0x403030,%eax
    imul 0x403030,%eax
    mov  %eax,0x403020

Executable binary code (shown in hexadecimal):
    Address  Content (1 byte each)
    4012ee   a1
    4012ef   30
    4012f0   30
    4012f1   40
    4012f2   00
    4012f3   0f
    4012f4   af
    4012f5   05
    4012f6   30
    4012f7   30
    4012f8   40
    4012f9   00
    4012fa   a3
    4012fb   20
    4012fc   30
    4012fd   40
    4012fe   00
99
C, Assembler, And Machine Code
int a=4, b;
int main(void) {
if (a>5)
b=1;
else
b=0;
}
Columns: memory address, memory content (= machine instruction, executable binary code), assembler source code

8048344: 83 3d 94 94 04 08 05    cmpl $0x5,0x8049494
804834b: 7e 0c                   jle 8048359
804834d: c7 05 8c 95 04 08 01    movl $0x1,0x804958c
8048354: 00 00 00
8048357: eb 0a                   jmp 8048363
8048359: c7 05 8c 95 04 08 00    movl $0x0,0x804958c
8048360: 00 00 00
8048363: c9                      ...

a is located at address 0x8049494, b at address 0x804958c.
All numeric values in the binary and assembler code are hexadecimal.
100
Address Space
Figure: the address space as a column of memory addresses with their memory contents.
Lowest possible address (start of memory): 0
Highest possible address (end of memory): max.
Data block with a size of 16 bytes:
    start address of the data block: 0x10000000
    last byte address of the data block: 0x1000000f
    address of the first byte after the data block: 0x10000010
Addresses of individual bytes: 0x50000000, 0x50000001 (example contents shown: 0x56, 0xfc)
101
Byte Ordering
Data (4 bytes), accessed in the program through address n:
MSB           LSB
d3  d2  d1  d0

Address   Big-endian system   Little-endian system
n         d3 (MSB)            d0 (LSB)
n+1       d2                  d1
n+2       d1                  d2
n+3       d0 (LSB)            d3 (MSB)

MSB = Most Significant Byte
LSB = Least Significant Byte
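The byte order of the machine at hand can be checked by looking at the byte stored at the lowest address of a known 32-bit value. The following is a minimal sketch; the function names are chosen for this illustration only, and machines that are neither big- nor little-endian are not considered.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian machine, 0 otherwise. */
int
is_little_endian(void)
{
    uint32_t value = 0x0a0b0c0dU;       /* d3 = 0x0a ... d0 = 0x0d */
    unsigned char bytes[sizeof value];

    memcpy(bytes, &value, sizeof value);
    return bytes[0] == 0x0d;            /* is the LSB stored at address n? */
}

/* Assemble a 32-bit value from four bytes stored in little-endian order. */
uint32_t
from_little_endian(const unsigned char *b)
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
        ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Consistency check: on a little-endian machine the in-memory bytes,
 * read back in little-endian order, give the original value. */
void
endian_self_check(void)
{
    uint32_t value = 0x0a0b0c0dU;
    unsigned char bytes[sizeof value];

    memcpy(bytes, &value, sizeof value);
    if (is_little_endian())
        assert(from_little_endian(bytes) == value);
}
```

from_little_endian() works on any host because it uses shifts instead of reinterpreting memory, which is the usual way to read data with a fixed byte order.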
102
Alignment Rules
Goal: Optimal Performance
I Determine the address locations for variables and instructions
I Great impact on compiler, assembler, linker tools
Figure: a misaligned data long word in the address space, occupying the addresses 0x35 to 0x38 (hexadecimal).
Long-word boundaries in the address space are the addresses divisible by 4 without remainder (0x34, 0x38, ...).
Long-word boundaries on the data bus correspond to the address offsets (byte addresses) +0 to +3.
1st access: 0x34 0x35 0x36 0x37
2nd access: 0x38 0x39 0x3a 0x3b
103
Alignment Rules (cont.)
For derived types [16] (constructed from the basic types) alignment rules apply to each single component:
struct artikel {
    char name[5];       /* alignment(1) */
    int anzahl;         /* alignment(4) */
    double preis;
};
Alignment rules may be influenced through compiler directives (-malign-int aligns variables on 32-bit boundaries, producing code that runs somewhat faster on processors with 32-bit busses at the expense of memory)
[16] arrays, functions, pointers, structures, unions (we will discuss them later)
104
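How a particular compiler lays out the structure can be inspected with offsetof. This is a sketch under the assumption of a C11 compiler (for _Alignof); the helper names are hypothetical, and the concrete offsets depend on the platform, so none are hard-coded.

```c
#include <assert.h>
#include <stddef.h>     /* offsetof, size_t */

struct artikel {
    char name[5];
    int anzahl;
    double preis;
};

/* Number of padding bytes the compiler inserted between name and anzahl. */
size_t
padding_after_name(void)
{
    size_t name_size = sizeof(((struct artikel *)0)->name);

    assert(offsetof(struct artikel, anzahl) >= name_size);
    return offsetof(struct artikel, anzahl) - name_size;
}

/* Every member offset must be a multiple of that member's alignment. */
int
members_are_aligned(void)
{
    return offsetof(struct artikel, anzahl) % _Alignof(int) == 0 &&
        offsetof(struct artikel, preis) % _Alignof(double) == 0;
}
```

On a typical 32-bit or 64-bit system padding_after_name() is likely to return 3, since name occupies 5 bytes and anzahl wants a 4-byte boundary, but that value is an expectation rather than a guarantee.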
Repetition Computer Architecture
Storage Classes
From Source Code To Executable Code
Construction of an Executable
Relocation Process
105
Storage Classes
Placement of data in memory depends on storage class
I An object, such as a variable, is a location in storage, and its interpretation depends on two main attributes: its storage class and its type
I The storage class determines the lifetime of the storage associated with the identified object
I The type determines the meaning of the values found in the identified object
I In C we have two storage classes: automatic and static
I Storage class specifiers (auto, extern, register, static), together with the context of an object's declaration, specify its storage class
106
Automatic Storage Class
Automatic Objects
I auto and register give the declared objects automatic storage class, and may be used only within functions
I They are local to a block [17] and discarded on exit from the block
I Declarations within a block create automatic objects if no storage class specification is mentioned or auto is used
I Initialization of automatic objects is performed each time the block is entered at the top (if a jump into the block is executed, the initializations are not performed)
I Objects declared register are automatic, and are (if possible) stored in fast registers of the machine
I For register objects the address operator & is not allowed
[17] aka compound statement, such as the body of a function
107
Static Storage Class
Static Objects
I May be local to a block or external to all blocks
I In both cases, they retain their values across exit from and reentry to functions and blocks
I Within a block, static objects are declared with static
I Objects declared outside of all blocks (at the same level as function definitions) are always static
I On the outer level, the keyword static makes them local to a particular translation unit (internal linkage)
I They are global to an entire program by omitting an explicit storage class, or by using extern (external linkage)
108
Storage Class And Sections
Intermediate Summary
I A running program uses storage not only for its instructions, but additionally needs space for, e.g., variables
I Variables may be temporary, dynamically allocated, or static (i.e., permanent in terms of storage allocation), initialized or uninitialized, declared as constant (const) and thus read-only
I Placement of data in memory depends on its storage class
I During the translation process the compiler uses sections to divide the address space into logical units
I Details vary with the operating system and compiler used
109
Typical Program Organisation
A typical program divides naturally in sections
Code: machine instructions, should be unmodifiable, size is known after compilation and does not change (.text)
Data:
I static data
    I initialized (.data) / uninitialized (.bss)
    I constant address in memory
    I permanent life time
I dynamic data
    I stack or heap
    I storage space not known
    I volatile life time
110
Program Sections
Section   Memory type   Note
.text     PROM or RAM   write-protected
.data     RAM
.bss      RAM
(Figure: the sections are placed in the address space.)
PROM: Programmable Read Only Memory (memory chip that cannot be written during operation)
RAM: Random Access Memory
111
Virtual Memory And Segments
Virtual Memory
I Whenever a process is created, the kernel provides a chunk of physical memory which can be located anywhere
I Through the magic of virtual memory (VM), the process believes it has all the memory on the computer
Typically the VM space is laid out in a similar manner:
I Text Segment (.text)
I Initialized Data Segment (.data)
I Uninitialized Data Segment (.bss)
I The Stack
I The Heap
112
A Program In Memory
Figure: a program in memory (addresses increase from 0).
Code, constants: loaded from the executable file
Initialized data: loaded from the executable file
Uninitialized data: provided at process start and initialized with 0 (cleared)
Heap: provided at process start, used for dynamic memory allocation, grows toward the stack
Stack: provided at process start, grows toward lower addresses (or toward higher addresses; processor dependent)
Initialized and uninitialized data are static data; heap and stack are dynamic data.
113
Different Memory Layouts
Figure: two memory layouts, each showing the regions code and constants, initialized data, uninitialized data, heap, and stack above address 0, with the program start address marked.
(A) Solution on a PC (IA-32)
(B) Stack growing in the opposite direction
114
Memory Segments
Text Segment. The text segment contains the actual code (including constants) to be executed. It's usually sharable, so multiple instances of a program can share the text segment to lower memory requirements. This segment is usually marked read-only so a program can't modify its own instructions.
Initialized Data Segment. This segment contains global variables which are initialized by the programmer.
Uninitialized Data Segment. Also named .bss (block started by symbol), which was an operator used by an old assembler. This segment contains uninitialized global variables. All variables in this segment are initialized to 0 or NULL pointers before the program begins to execute.
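That objects without an initializer start out as zero can be observed directly. A minimal sketch; the variable and function names are illustrative, and the section names in the comments describe the typical placement rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>

int counter;                    /* no initializer: typically placed in .bss */
static double ratio;            /* no initializer, file-local: typically .bss */
char *name;                     /* pointer without initializer: null pointer */
int answer = 42;                /* initialized: typically placed in .data */

/* Returns 1 if all objects without an initializer start out as zero. */
int
bss_is_zeroed(void)
{
    return counter == 0 && ratio == 0.0 && name == NULL;
}

/* Intended to be called first thing in main(), before anything is assigned. */
void
check_startup_values(void)
{
    assert(bss_is_zeroed());
    assert(answer == 42);
}
```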
115
Memory Segments (cont.)
The Stack: The stack is a collection of stack frames which we will discuss later. When a new frame needs to be added (as a result of a newly called function), the stack grows downward.
The Heap: Dynamic memory, where storage can be (de-)allocated via C's malloc(3)/free(3). The C library also gets dynamic memory for its own personal workspace from the heap. As more memory is requested on the fly, the heap grows upward.
116
Variable Placement And Life Time (Code)
int a;
static int b;
void
func(void)
{
char c;
static int d;
}
int
main(void)
{
int e;
int *pi = (int*) malloc(sizeof(int));
func ();
func ();
free(pi);
return (0);
}
117
Variable Placement And Life Time (Code)
#include <stdlib.h>     /* malloc(3), free(3) */

int a;                  /* permanent life time */
static int b;           /* ditto, but reduced scope */

void
func(void)
{
    char c;             /* only for the life time of func(), */
                        /* but 2x; visible only in func() */
    static int d;       /* I'm unique, exist once at a stable */
                        /* address, visible only in func() */
}

int
main(void)
{
    int e;              /* life time of main() */
    int *pi = (int *) malloc(sizeof(int)); /* newborn */

    func();
    func();
    free(pi);           /* RIP, pi points to an invalid address */
    return (0);
}
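Two of the comments above can be checked at run time: a static local lives at one stable address, and heap storage lives until free(3) is called. The sketch below uses hypothetical helper names that are not part of the lecture code.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns the address of a static local: the same on every call. */
int *
static_slot(void)
{
    static int d;

    return &d;
}

/* Allocates an int on the heap, stores value, reads it back, frees it.
 * Returns the value read, or -1 if malloc(3) failed. */
int
heap_roundtrip(int value)
{
    int result;
    int *pi = malloc(sizeof *pi);

    if (pi == NULL)
        return -1;
    *pi = value;
    result = *pi;
    free(pi);           /* from here on pi must not be dereferenced */
    return result;
}

/* The static object keeps both its address and its value between calls. */
void
check_static_slot(void)
{
    int *first = static_slot();

    *first = 5;
    assert(static_slot() == first);
    assert(*static_slot() == 5);
}
```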
118
Variable Placement And Life Time (Diagram)
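Figure: the address space (address 0 to max.) of the example program at two points in time.
t=0: program execution is started, i.e., the execution environment is already initialized
t=x: arbitrary point in time during program execution
Code: 1st instruction, 2nd instruction, 3rd instruction, 4th instruction, ...; PC(t=0) and PC(t=x) point into this region
Data: a, b, d
Heap: the int that pi points to
Stack: c, pi, e; SP(t=0) and SP(t=x) mark the stack pointer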
119
Variable Placement
Variables (outside a function): Globally declared variables go to the Uninitialized Data Segment if they are not initialized, to the Initialized Data Segment otherwise. The distinction is necessary for the OS to decide if storage has to be loaded with initialization data from the executable binary.
Variables (inside a function): Implicit assumption of auto; they go to The Stack. If declared static, see above.
Constants (const): Text Segment
Function Parameters: Are pushed on The Stack or stored in registers. If pointers are passed, the data is elsewhere.
120
Repetition Computer Architecture
Storage Classes
From Source Code To Executable Code
Construction of an Executable
Relocation Process
121
From Source Code To Executable Code
Translation Steps (multi-phase compilation)
Compilation: HLL source code to assembler source code
Assembly: Assembler source code to object code
Linking: Object code to executable code
Compilers and assemblers create object files containing the generated binary code and data for a source file. Linkers combine multiple object files into one; loaders take object files and load them into memory.
Goal: An executable binary file (a.out)
From high-level language (HLL) source code to executable code, i.e., concrete processor instructions in combination with data.
122
Translation Steps Using gcc(1)
Stage          Input files                                      Output files
Preprocessor   *.c/*.cc/*.cpp (C/C++ source code)               *.i/*.ii (preprocessed C/C++ source code)
Compiler       *.i/*.ii (preprocessed C/C++ source code)        *.s (assembler source code)
Assembler      *.s (assembler source code)                      *.o (object file, unlinked)
Linker         *.o/*.a (object file, unlinked; library file)    a.out (executable file = loadable object file)
123
File Suffixes And Their Meaning
For any given input file, the file name suffix determines what kind of compilation is done (see gcc(1) for more details and suffixes):
suffix compilation step
.c C source code which must be preprocessed
.i C source code which should not be preprocessed
.h Header file to be turned into a precompiled header
.s Assembler code
.o An object file to be fed straight into linking
124
Creation Of An Executable File
(Filename).c -> gcc (compile) -> (Filename).s -> gas (assemble) -> (Filename).o
(Filename).o + object/library files -> ld (link) -> a.out
Legend: operation (compile, assemble, link); command (gcc, gas, ld); input or output (files)
125
The C Preprocessor
The C preprocessor performs . . .
I Inclusion of named files
I Macro Substitution
I Conditional Compilation
126
File Inclusion
A control line of the form
#include <filename>
causes the replacement of that line by the entire contents of the file filename.
Note: The characters in the name filename must not include > or \n, and the effect is undefined if it contains any of ", ', \, or /*.
Location: The named file is searched for in a sequence of implementation-dependent places (often starting in /usr/include).
127
Macro Substitution
A control line of the form
#define identifier token-sequence
causes the preprocessor to replace subsequent instances of theidentifier with the given sequence of tokens.
Example
#define EXIT_FAILURE 1
#define EXIT_SUCCESS 0
#define S_IRWXU 0000700 /* RWX mask for owner */
#define S_IRUSR 0000400 /* R for owner */
#define S_IWUSR 0000200 /* W for owner */
#define S_IXUSR 0000100 /* X for owner */
128
Macro Substitution (cont.)
A control line of the form
#define identifier(identifier-list) token-sequence
where there is no space between the first identifier and the (, is a macro definition with parameters given by the identifier list.
Example
#define S_ISDIR(m) ((m & 0170000) == 0040000) /* directory */
#define S_ISCHR(m) ((m & 0170000) == 0020000) /* char sp. */
#define S_ISBLK(m) ((m & 0170000) == 0060000) /* block sp.*/
#define S_ISREG(m) ((m & 0170000) == 0100000) /* regular */
#define S_ISFIFO(m) ((m & 0170000) == 0010000) /* fifo */
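Because macro parameters are substituted textually, it is usually safer to parenthesize each use of a parameter. The sketch below reuses the octal values from the slide under hypothetical MODE_ names (to avoid clashing with the real macros from the system headers) and contrasts a parenthesized with an unparenthesized variant.

```c
#include <assert.h>

/* Hypothetical names; the octal values are those shown on the slide. */
#define MODE_TYPE_MASK 0170000
#define MODE_DIR       0040000
#define MODE_REG       0100000

/* Parenthesizing the parameter keeps the macro correct for any argument. */
#define MODE_ISDIR(m) (((m) & MODE_TYPE_MASK) == MODE_DIR)
#define MODE_ISREG(m) (((m) & MODE_TYPE_MASK) == MODE_REG)

/* Without parentheses, an argument such as a | b is expanded to
 * a | b & MODE_TYPE_MASK, and & binds tighter than |. */
#define UNSAFE_ISDIR(m) ((m & MODE_TYPE_MASK) == MODE_DIR)

/* Function wrapper: the argument is evaluated once, before the test. */
int
is_directory(unsigned mode)
{
    assert(mode <= 0177777);    /* sketch: only 16 mode bits are expected */
    return MODE_ISDIR(mode);
}
```

With the argument 0755 | 0040000 the parenthesized macro reports a directory, while the unparenthesized one compares 0040755 against 0040000 and reports no directory.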
129
Macro Substitution (cont.)
A control line of the form
#undef identifier
causes the identifier's preprocessor definition to be forgotten. It is not erroneous to apply #undef to an unknown identifier.
Example
/*
 * Some header files may define an abs macro.
 * If defined, undef it to prevent a syntax error
 * and issue a warning.
 * #warning is an implementation-dependent directive
 * (similar in spirit to a pragma).
 */
#ifdef abs
#undef abs
#warning abs macro collides with abs() prototype, undefining
#endif
130
Conditional Inclusion
Parts of a program may be compiled conditionally
Example
#ifndef NULL
#ifdef __GNUG__
#define NULL __null
#else
#define NULL 0L
#endif
#endif
131
Predefined Names
Several identifiers are predefined, and expand to produce special information. They, and also the preprocessor expression operator defined, may not be undefined or redefined.
__LINE__  A decimal constant containing the current source line number
__FILE__  A string literal containing the name of the file being compiled
__DATE__  A string literal containing the date of compilation, in the form "Mmm dd yyyy"
__TIME__  A string literal containing the time of compilation, in the form "hh:mm:ss"
__STDC__  The constant 1. It is intended that this identifier be defined to be 1 only in standard-conforming implementations
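A short sketch of how the predefined names might be used; TRACE, current_line, date_length and show_build_info are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper macro: prefix a message with file name and line. */
#define TRACE(msg) fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, (msg))

/* Returns the source line number of the return statement itself. */
int
current_line(void)
{
    return __LINE__;
}

/* __DATE__ has the fixed form "Mmm dd yyyy", i.e., 11 characters. */
size_t
date_length(void)
{
    return strlen(__DATE__);
}

/* Prints when this translation unit was compiled. */
void
show_build_info(void)
{
    assert(strlen(__TIME__) == 8);      /* "hh:mm:ss" */
    printf("compiled on %s at %s\n", __DATE__, __TIME__);
    TRACE("build info printed");
}
```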
132
Compilation
Figure: compilation.
Input: HLL source code (text)
Tool: compiler
Output: assembler source code (text); translation listing with error messages (text)
Possibly temporary files
133
Assembly
Figure: assembly.
Input: assembler source code (text)
Tool: assembler
Output: machine code and additional information (object format); translation listing with error messages and symbol table (text)
Possibly temporary files
134
Linking
Figure: linking.
Input: machine code and additional information (object format, several files); machine code and additional information (library object format, found by library search)
Tool: linker
Output: absolute code or relocatable code with additional information (binary code or object format); link map (address space usage) and symbol list (text)
Possibly temporary files
135
Repetition Computer Architecture
Storage Classes
From Source Code To Executable Code
Construction of an Executable
Relocation Process
136
Program Section In Virtual Memory
After compilation: each section begins at address 0; sections are logical address spaces of the compiler
Section .text (code): addresses 0 to xx
Section .data (initialized data): addresses 0 to yy
After linking: all sections are placed absolutely in the address space
Address space (addresses marked in the figure): 0, 0x08048244, 0x08049370, 0xffffffff
137
Linking An Executable Binary
Input data: unlinked object files
OBJ1: .text1 .data1 .bss1
OBJ2: .text2 .bss2
OBJ3: .text3 .data3 .bss3
Linking
Processing result: executable file (linked, relocated)
OBJtotal: .text1 .text2 .text3 .data1 .data3 .bss1 .bss2 .bss3
.text: code
.data: initialized variables
.bss: uninitialized variables
I Each object code (compiled separately) starts at address 0
I Linking them together involves
    I centralization of sections
    I relocation of addresses
138
Relocation Records
I Once the sections are placed one after another, relocation can start
I Executable code contains embedded addresses
I Static data, function calls, jump targets
I On relocation those have to be changed inside the code
I Without a relocation table this is not possible
I A relocation record holds the relative address of a symbol (name of a variable, a function etc.)
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
0000001a R_386_32 b
00000023 R_386_32 a
00000029 R_386_32 b
139
Source File: compile.c
int a = 1;              /* Global variable, initialized -> .data */
int b;                  /* Global variable, uninitialized -> .bss */

int
main(void)
{
    static int c;       /* Local, static variable -> .bss */

    b = 5;
    c = b + a + 16;
    return c;
}
I Compile a relocatable object file
cc -c compile.c (creates compile.o)
I Linking an executable binary (one-step compilation)
cc compile.c -o compile
140
Analysis of Object Files: compile.o
$ file compile.o
ELF 32-bit LSB relocatable, Intel 80386, version 1, not stripped
$ objdump -x compile.o
compile.o: file format elf32-i386
compile.o
architecture: i386, flags 0x00000011:
HAS_RELOC, HAS_SYMS
start address 0x00000000
Sections:
Idx Name     Size      VMA       LMA       File off  Algn
  0 .text    0000005a  00000000  00000000  00000034  2**2
             CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data    00000004  00000000  00000000  00000090  2**2
             CONTENTS, ALLOC, LOAD, DATA
  2 .bss     00000004  00000000  00000000  00000094  2**2
             ALLOC
  3 .rodata  00000005  00000000  00000000  00000094  2**0
             CONTENTS, ALLOC, LOAD, READONLY, DATA
141
Object File: compile.o (cont.)
SYMBOL TABLE:
00000000 l df *ABS* 00000000 compile.c
00000000 l d .text 00000000
00000000 l d .data 00000000
00000000 l d .bss 00000000
00000000 l O .bss 00000004 c.0
00000000 l d .rodata 00000000
00000000 g O .data 00000004 a
00000000 g F .text 0000005a main
00000004 O *COM* 00000004 b
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
0000001a R_386_32 b
00000023 R_386_32 a
00000029 R_386_32 b
00000031 R_386_32 .bss
00000036 R_386_32 .bss
0000004c R_386_32 .rodata
142
compile.o: file format elf32-i386
Disassembly of section .text:
00000000 <main>:
 0: 55                      push %ebp
 1: 89 e5                   mov %esp,%ebp
 3: 83 ec 18                sub $0x18,%esp
 6: 83 e4 f0                and $0xfffffff0,%esp
 9: b8 00 00 00 00          mov $0x0,%eax
 e: 29 c4                   sub %eax,%esp
10: a1 00 00 00 00          mov 0x0,%eax
15: 89 45 e8                mov %eax,0xffffffe8(%ebp)
18: c7 05 00 00 00 00 05    movl $0x5,0x0
1f: 00 00 00
22: a1 00 00 00 00          mov 0x0,%eax
27: 03 05 00 00 00 00       add 0x0,%eax
2d: 83 c0 10                add $0x10,%eax
30: a3 00 00 00 00          mov %eax,0x0
35: a1 00 00 00 00          mov 0x0,%eax
3a: 8b 55 e8                mov 0xffffffe8(%ebp),%edx
3d: 3b 15 00 00 00 00       cmp 0x0,%edx
43: 74 13                   je 58
45: 83 ec 08                sub $0x8,%esp
48: ff 75 e8                pushl 0xffffffe8(%ebp)
4b: 68 00 00 00 00          push $0x0
50: e8 fc ff ff ff          call 51
55: 83 c4 10                add $0x10,%esp
58: c9                      leave
59: c3                      ret
143
compile.o: file format elf32-i386
Disassembly of section .text:
00000000 <main>:
int b; /* Global variable, uninitialized -> .bss */
int
main(void)
{
 0: 55                      push %ebp
... 6 more lines ...
15: 89 45 e8                mov %eax,0xffffffe8(%ebp)
static int c; /* Local, static variable -> .bss */
b = 5;
18: c7 05 00 00 00 00 05    movl $0x5,0x0
1f: 00 00 00
c = b + a + 16;
22: a1 00 00 00 00          mov 0x0,%eax
27: 03 05 00 00 00 00       add 0x0,%eax
2d: 83 c0 10                add $0x10,%eax
30: a3 00 00 00 00          mov %eax,0x0
return c;
35: a1 00 00 00 00          mov 0x0,%eax
}
... 10 more lines ...
144
Executable Binary File: compile
compile: file format elf32-i386
compile
architecture: i386, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x1c000408
Sections:
Idx Name     Size      VMA       LMA       File off  Algn
...
  9 .text    00000214  1c000408  1c000408  00000408  2**2
             CONTENTS, ALLOC, LOAD, READONLY, CODE
...
 12 .data    00000014  3c001008  3c001008  00001008  2**2
             CONTENTS, ALLOC, LOAD, DATA
...
 20 .bss     00000184  3c003100  3c003100  00001100  2**5
             ALLOC
SYMBOL TABLE:
3c003140 l O .bss 00000004 c.0
3c003280 g O .bss 00000004 b
1c0005c0 g F .text 0000005a main
3c001018 g O .data 00000004 a
145
1c0005c0 <main>:
int b; /* Global variable, uninitialized -> .bss */
int
main(void)
{
1c0005c0: 55                      push %ebp
1c0005c1: 89 e5                   mov %esp,%ebp
1c0005c3: 83 ec 18                sub $0x18,%esp
1c0005c6: 83 e4 f0                and $0xfffffff0,%esp
1c0005c9: b8 00 00 00 00          mov $0x0,%eax
1c0005ce: 29 c4                   sub %eax,%esp
1c0005d0: a1 00 31 00 3c          mov 0x3c003100,%eax
1c0005d5: 89 45 e8                mov %eax,0xffffffe8(%ebp)
static int c; /* Local, static variable -> .bss */
b = 5;
1c0005d8: c7 05 80 32 00 3c 05    movl $0x5,0x3c003280
1c0005df: 00 00 00
c = b + a + 16;
1c0005e2: a1 18 10 00 3c          mov 0x3c001018,%eax
1c0005e7: 03 05 80 32 00 3c       add 0x3c003280,%eax
1c0005ed: 83 c0 10                add $0x10,%eax
1c0005f0: a3 40 31 00 3c          mov %eax,0x3c003140
return c;
1c0005f5: a1 40 31 00 3c          mov 0x3c003140,%eax
}
146
Repetition Computer Architecture
Storage Classes
From Source Code To Executable Code
Construction of an Executable
Relocation Process
147
Relocation Of An Assembler Instruction
During the linking process relocated addresses are injected into the code, for example for the assignment b = 5;
Before relocation (relocatable compile.o):
18: c7 05 00 00 00 00 05 movl $0x5,0x0
After relocation (executable compile):
1c0005d8: c7 05 80 32 00 3c 05 movl $0x5,0x3c003280
The proper address for b can be found in the symbol table.
SYMBOL TABLE: (compile)
3c003280 g O .bss 00000004 b
I The symbol table for compile yields 3c003280 for variable b
148
Relocation Of An Assembler Instruction (cont.)
? How to find the right places in the machine code to performthe substitutions?
I Linker has relocation record (relative address) of b
RELOCATION RECORDS FOR [.text]: (compile.o)
0000001a R_386_32 b
I Linker has absolute address of main from symbol table
SYMBOL TABLE: (compile)
3c003280 g O .bss 00000004 b
1c0005c0 g F .text 0000005a main
149
Relocation Of An Assembler Instruction (cont.)
Putting it all together:
RELOCATION RECORDS FOR [.text]: (compile.o)
0000001a R_386_32 b (relative offset)
SYMBOL TABLE: (compile)
3c003280 g O .bss 00000004 b (abs. address of b)
1c0005c0 g F .text 0000005a main (abs. address of main)
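Computing the address where substitution must be performed:
1c0005c0 + 0000001a = 1c0005da
The four address bytes at 1c0005da follow the two opcode bytes c7 05 that start at 1c0005d8:
Before: 18: c7 05 00 00 00 00 05 movl $0x5,0x0
After: 1c0005d8: c7 05 80 32 00 3c 05 movl $0x5,0x3c003280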
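The arithmetic of this relocation can be replayed in a few lines of C. This is a simplified sketch, not how ld is implemented: the function names are hypothetical, only an absolute 32-bit relocation is modeled (stored addend plus symbol address), and the byte values are the ones printed on the slides.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Absolute address of the bytes to patch: start of the code plus offset. */
uint32_t
patch_address(uint32_t code_start, uint32_t reloc_offset)
{
    return code_start + reloc_offset;
}

/* Read a 32-bit little-endian value (the byte order IA-32 uses). */
static uint32_t
read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
        ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit value in little-endian byte order. */
static void
write_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Absolute 32-bit relocation: stored addend plus symbol address. */
void
apply_abs32(unsigned char *code, uint32_t reloc_offset, uint32_t symbol_addr)
{
    unsigned char *field = code + reloc_offset;

    write_le32(field, read_le32(field) + symbol_addr);
}

/* Replays the b = 5 example; returns 1 if the result matches the executable. */
int
relocate_example(void)
{
    static const unsigned char before[10] =
        { 0xc7, 0x05, 0x00, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00 };
    static const unsigned char after[10] =
        { 0xc7, 0x05, 0x80, 0x32, 0x00, 0x3c, 0x05, 0x00, 0x00, 0x00 };
    unsigned char text[0x22] = { 0 };   /* holds offsets 0x18..0x21 of .text */

    memcpy(text + 0x18, before, sizeof before);
    apply_abs32(text, 0x1a, 0x3c003280U);       /* record 0000001a R_386_32 b */
    assert(patch_address(0x1c0005c0U, 0x1a) == 0x1c0005daU);
    return memcmp(text + 0x18, after, sizeof after) == 0;
}
```

The address 0x3c003280 appears in the code as the bytes 80 32 00 3c because IA-32 stores the least significant byte first.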
150
Systems Programming
03. Functions and Program Structure
Alexander Holupirek
Database and Information Systems Group
Department of Computer & Information Science
University of Konstanz
Summer Term 2008
151
Schedule For Today
Please make sure to register for the course via StudIS. Otherwise, you cannot attend the examination.
So far: Static view of the program (before run-time)
I Compilation in different steps
I Program files (e.g., in the ELF) contain sections
I Sections are mapped to VM segments
I Observed correlation between static storage class specifier, sections in ELF file and location in virtual memory
Today: Dynamic view on the program (during run-time)
I A closer look at functions
I Automatic allocation of memory on the stack/the heap
152
Basics of Functions
Functions Returning Non-integers
External Variables
Scope Rules
Header Files
Static Variables
A Program in Execution - Unix Run-time
153
Basics Of Functions
Basics of Functions
I Break large computer tasks into smaller ones
I Enable people to build on what others have done
I No starting over from scratch
I Hide details of operation from parts of the program that don't need to know about them
I Structure the program
I Ease the pain of making changes
154
A Simple Version Of The Unix Tool grep(1)
Basic task for simple grep:
Print each line of input that contains a particular pattern
Example:
Input: Text in /etc/services
Pattern: http
$ ./a.out < /etc/services
# See also http://www.iana.org/assignments/port-numbers
www 80/tcp http # WorldWideWeb HTTP
https 443/tcp # secure http (SSL)
155
Program Layout Of Simple grep
Simple grep falls neatly into three pieces:
while (there is another line)
    if (the line contains the pattern)
        print it
I As said, small pieces are easier to deal with than one big one
I Irrelevant details can be buried in the functions
I Chance of unwanted interactions is minimized
I Pieces may even be useful in other programs
156
A Function For Each Problem
Simple grep falls neatly into three pieces:
while (there is another line)               getline()
    if (the line contains the pattern)
        print it                            printf(3)
Decide whether the line contains an occurrence of the pattern
We write strindex(s, t) that returns the position or index in the string s where the string t begins, or -1 if s does not contain t
If we later want to switch to more sophisticated pattern matching, we only have to replace strindex; the rest of the code remains the same. [18]
[18] The standard library provides strstr(3) that is similar to strindex, except that it returns a pointer instead of an index
157
Source Code grep:main
#include <stdio.h>

#define MAXLINE 1000 /* maximum input line length */

int getline(char line[], int max);
int strindex(char source[], char searchfor[]);

char pattern[] = "http"; /* pattern to search for */

/* find all lines matching pattern */
int
main(void)
{
    char line[MAXLINE];
    int found = 0;

    while (getline(line, MAXLINE) > 0)
        if (strindex(line, pattern) >= 0) {
            printf("%s", line);
            found++;
        }
    return found;
}
158
Source Code grep:strindex
/* strindex: return index of t in s, -1 if none */
int
strindex(char *s, char *t)
{
    int i, j, k;

    for (i = 0; s[i] != '\0'; i++) {
        for (j = 0, k = i; t[j] == s[k] && t[j] != '\0'; j++, k++)
            ;
        if (j > 0 && t[j] == '\0')
            return i;
    }
    return (-1);
}
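To see what strindex returns for concrete inputs, here is a self-contained sketch. It repeats the algorithm with const-qualified parameters and adds a hypothetical wrapper, line_matches, that mirrors the test used in the main loop of simple grep.

```c
#include <assert.h>
#include <stddef.h>

/* strindex: return index of t in s, -1 if none (same algorithm as above) */
int
strindex(const char *s, const char *t)
{
    int i, j, k;

    assert(s != NULL && t != NULL);
    for (i = 0; s[i] != '\0'; i++) {
        for (j = 0, k = i; t[j] != '\0' && s[k] == t[j]; j++, k++)
            ;
        if (j > 0 && t[j] == '\0')
            return i;
    }
    return -1;
}

/* line_matches: 1 if line contains pattern, 0 otherwise */
int
line_matches(const char *line, const char *pattern)
{
    return strindex(line, pattern) >= 0;
}
```

Note that an empty pattern never matches in this version, because the condition j > 0 requires at least one matched character.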
159
Function Definition
A function definition has the form:
return-type
function-name(parameter declarations, if any)
{
    declarations
    statements
}
I Various parts may be absent; a minimal function is
void dummy(void) { }
which does nothing, accepts nothing, and returns nothing [19]
I If the return-type is omitted, int is assumed
[19] . . . but may be used as a place holder during program development
160
A C Program Seen As Set Of External Objects
A C program is just a set of definitions of variables and functions
I Communication between the functions is
    I by arguments
    I by values returned by the functions
    I through external variables
I The functions can occur in any order in the source file
I The source program can be split into multiple files, so long as no function is split.
161
Returning From Functions
The return Statement . . .
. . . is the mechanism for returning a value from the called function to its caller.
I Any expression can follow return
I The expression will be converted to the return-type of the function
I The calling function is free to ignore the returned value
I There need be no expression after return
    I in that case, no value is returned to the caller (garbage)
I Control also returns with no value when execution falls off the end of the function by reaching the closing right brace
I It is not illegal, but probably a sign of trouble, if a function returns a value from one place and no value from another
I If a function fails to return a value, its value is likely garbage
162
Basics of Functions
Functions Returning Non-integers
External Variables
Scope Rules
Header Files
Static Variables
A Program in Execution - Unix Run-time
163
Functions Returning Non-integers
Functions returning non-integer values
I So far we have only returned either no value (void) or an int
I What if a function must return some other type?
I To illustrate how to deal with this, we write and use atof(s).
The function atof(s): Converts the string s to its double-precision floating-point equivalent. It handles an optional sign and decimal point, and the presence or absence of either integer part or fractional part [20].
[20] Use atof(3) declared by <stdlib.h> in real life
164
Source Code: atof
#include <ctype.h> /* isspace, isdigit ... */

double /* atof: convert string s to double */
atof(char s[])
{
    double val, power;
    int i, sign;

    for (i = 0; isspace(s[i]); i++)
        ; /* skip white space */
    sign = (s[i] == '-') ? -1 : 1;
    if (s[i] == '+' || s[i] == '-')
        i++;
    for (val = 0.0; isdigit(s[i]); i++)
        val = 10.0 * val + (s[i] - '0');
    if (s[i] == '.')
        i++;
    for (power = 1.0; isdigit(s[i]); i++) {
        val = 10.0 * val + (s[i] - '0');
        power *= 10.0;
    }
    return sign * val / power;
}
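The conversion can be exercised with a self-contained sketch. The function is renamed str_to_double here (a hypothetical name) so that it cannot clash with the library's atof(3); nearly_equal is a helper added for comparing floating-point results, and the casts to unsigned char are a precaution for the ctype functions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* str_to_double: same algorithm as atof above, under a different name */
double
str_to_double(const char *s)
{
    double val, power;
    int i, sign;

    assert(s != NULL);
    for (i = 0; isspace((unsigned char)s[i]); i++)
        ;                               /* skip white space */
    sign = (s[i] == '-') ? -1 : 1;
    if (s[i] == '+' || s[i] == '-')
        i++;
    for (val = 0.0; isdigit((unsigned char)s[i]); i++)
        val = 10.0 * val + (s[i] - '0');
    if (s[i] == '.')
        i++;
    for (power = 1.0; isdigit((unsigned char)s[i]); i++) {
        val = 10.0 * val + (s[i] - '0');
        power *= 10.0;
    }
    return sign * val / power;
}

/* nearly_equal: 1 if a and b differ by less than 1e-9 */
int
nearly_equal(double a, double b)
{
    double diff = a - b;

    if (diff < 0.0)
        diff = -diff;
    return diff < 1e-9;
}
```

A string without any digits, such as "abc", converts to 0, since the function has no way to report an error.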
165
Declare To Use A Function
I Calling function must know atof(s) returns a non-int value
I One way to ensure this:
    I Declare atof() explicitly in the calling function
I This kind of declaration is shown in a primitive calculator:
#include <stdio.h>

#define MAXLINE 100

int /* rudimentary calculator */
main(void)
{
    double sum, atof(char []);
    char line[MAXLINE];
    int getline(char line[], int max);

    sum = 0;
    while (getline(line, MAXLINE) > 0)
        printf("\t%g\n", sum += atof(line));
    return (0);
}
Sample session (each input line is followed by the running sum that is printed):
+123.2
        123.2
-0.2
        123
+0.7
        123.7
-123.1
        0.6
166
Inconsistent Return Types
The declaration
double sum, atof(char []);
says that sum is a double variable, and that atof is a function that takes one char[] argument and returns double.
I The function atof must be declared and defined consistently
I If atof itself and the call to it have inconsistent types in the same source file, the error will be detected by the compiler
I But if (as is more likely) atof were compiled separately, the mismatch would not be detected, atof would return a double that main would treat as an int, and meaningless answers would result
167
Function Declaration By Context
A mismatch can happen
I if there is no function prototype,
I and a function is implicitly declared by its first appearance in an expression, just like in our calculator expression sum += atof(line)
Function Declaration By Context
I If a name that has not been previously declared occurs in an expression and is followed by a left parenthesis, it is declared by context to be a function name
I The function is assumed to return an int
I Nothing is assumed about its arguments
168
Missing Function Arguments
If a function declaration does not include arguments, as in
double atof();
this is taken to mean that nothing is to be assumed about the arguments of atof; all parameter checking is turned off.
I This special meaning of the empty argument list is intended to permit older C programs to compile with ANSI/ISO compilers
I If the function takes arguments, declare them
I If the function takes no arguments, use void
169
Explicit Cast Of The Return Type
Given atof, properly declared, we could write atoi in terms of it:
/* atoi: convert string s to integer using atof */
int
atoi(char s[])
{
double atof(char s[]);
return (int) atof(s);
}
I The value of the return expression is converted to the type of the function before the return is taken.
I Therefore, the value of atof, a double, is converted automatically to int when it appears in this return
I This operation potentially discards information, so some compilers issue a warning
I The cast states explicitly that the operation is intended
170
Basics of Functions
Functions Returning Non-integers
External Variables
Scope Rules
Header Files
Static Variables
A Program in Execution - Unix Run-time
171
External Objects
As mentioned, a program is just a set of definitions of variables and functions. These can be considered as external objects.
I Functions are always external [21]
I External is used in contrast to internal, which describes the arguments and variables used inside functions
I By default, external variables and functions have the property that all references to them by the same name, even from functions compiled separately, are references to the same thing (this is called external linkage in the standard)
[21] C does not allow functions to be defined inside other functions
172
A Reverse Polish Notation Calculator
We will build a reverse Polish notation calculator to discuss
I Function evaluation
I Splitting up a program in several source files
I Scope Rules
Infix Notation vs. Reverse Polish Notation
( 1 - 2 ) * ( 4 + 5 )
1 2 - 4 5 + *
Parentheses are not needed; the notation is unambiguous as long as we know how many operands each operator expects.
173
Calculator Design Using A Stack
stack:  1    1    -1   -1   -1   -1   -9
             2         4    4    9
                            5
input:  1    2    -    4    5    +    *
Program description
I Each operand arriving is pushed on the stack
I Once an operator arrives
    I Pop the appropriate number of operands (e.g., two for binary operators)
    I Apply the operator to them
    I Push the result back onto the stack
I The value on the top of the stack is popped and printed when the end of the input line is encountered.
174
Calculator Program Layout
Basic structure of our calculator (controlling main function):
while (next operator or operand is not EOF)
    if (number)
        push it
    else if (operator)
        pop operands
        do operation
        push result
    else if (newline)
        pop and print top of stack
    else
        error
175
Program Design Considerations
I Pushing and popping a stack are trivial, but once error handling is added they are long enough to put each in a separate function
I A function for fetching the next input operator or operand
Where to put the stack? Who should access it directly?
I Keep it in main and pass the stack to the routines that push and pop it
  I But main doesn't need to know about the stack
  I main only does push and pop operations
I Store the stack and its pointer in external variables
  I Accessible to the push and pop functions but not to main
Program Layout In One Source File
Let's think of the program as existing in one source file:
#includes
#defines
function declarations for main
int main(void) { }
external variables for push and pop
void push(double f) { }
double pop(void) { }
int getop(char s[]) { }
routines called by getop
The main loop switches on the type of operator or operand:

#include <stdio.h>
#include <stdlib.h>     /* for atof() */

#define MAXOP   100     /* max size of operand or operator */
#define NUMBER  '0'     /* signal that a number was found */

int getop(char []);
void push(double);
double pop(void);

/* reverse Polish calculator */
int
main(void)
{
    int type;
    char s[MAXOP];

    while ((type = getop(s)) != EOF) {
        /* the case bodies are filled in on the following slides */
        switch (type) {
        case NUMBER:
            break;
        case '+':
            break;
        case '*':
            break;
        case '-':
            break;
        case '/':
            break;
        case '\n':
            break;
        default:
            printf("error: unknown command %s\n", s);
        }
    }
    return (0);
}
Order Of Function Evaluation
What about the following implementation of the switch?

switch (type) {
case NUMBER:
    push(atof(s));
    break;
case '+':
    push(pop() + pop());
    break;
case '*':
    push(pop() * pop());
    break;
case '-':
    push(pop() - pop());    /* wrong: operand order is not guaranteed */
    break;
case '/':
    push(pop() / pop());    /* wrong: operand order, and no zero-divisor check */
    break;
case '\n':
    printf("\t%.8g\n", pop());
    break;
default:
    printf("error: unknown command %s\n", s);
}

I + and * are commutative, so the order in which the popped operands are combined is irrelevant
I For - and /, the left and right operands must be distinguished
I For /, a zero divisor must be detected and reported as an error
I The order in which the two calls of pop are evaluated is not defined, so either popped value may end up as the left operand
The implementation is erroneous.
Steering The Order Of Function Evaluation
switch (type) {
case NUMBER:
    push(atof(s));
    break;
case '+':
    push(pop() + pop());
    break;
case '*':
    push(pop() * pop());
    break;
case '-':
    op2 = pop();            /* right operand */
    push(pop() - op2);
    break;
case '/':
    op2 = pop();            /* divisor */
    if (op2 != 0.0)
        push(pop() / op2);
    else
        printf("error: zero divisor\n");
    break;
case '\n':
    printf("\t%.8g\n", pop());
    break;
default:
    printf("error: unknown command %s\n", s);
}

To guarantee the right order, it is necessary to pop the first value into a temporary variable (here op2, which has to be declared as a double in main).
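The effect of the temporary variable can be checked in isolation. The following is a minimal sketch under stated assumptions: the stack is a stripped-down copy of the one used in this lecture (error messages omitted), and sub_in_order is a hypothetical helper that exists only for this illustration.

```c
#define MAXVAL 100

double val[MAXVAL];     /* value stack */
int sp = 0;             /* next free stack position */

/* push: push f onto the value stack (silently drops f if the stack is full) */
void
push(double f)
{
    if (sp < MAXVAL)
        val[sp++] = f;
}

/* pop: pop and return the top value (0.0 if the stack is empty) */
double
pop(void)
{
    return (sp > 0) ? val[--sp] : 0.0;
}

/* sub_in_order: compute a - b via the stack (hypothetical helper) */
double
sub_in_order(double a, double b)
{
    double op2;

    push(a);
    push(b);
    op2 = pop();            /* this statement completes first, so op2 == b */
    push(pop() - op2);      /* only one pop() is left in the expression: a - b */
    return pop();
}
```

Because the first pop is a complete statement of its own, it is finished before the second pop is evaluated; sub_in_order(1.0, 2.0) therefore yields -1 with every conforming compiler, whereas pop() - pop() may yield either -1 or 1.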
Source Code Stack
I The stack itself and its fill level (the stack pointer) are shared by push and pop
I Since they are defined outside any function, they are external

#define MAXVAL 100      /* maximum depth of val stack */

int sp = 0;             /* next free stack position */
double val[MAXVAL];     /* value stack */

/* push: push f onto value stack */
void
push(double f)
{
    if (sp < MAXVAL)
        val[sp++] = f;
    else
        printf("error: stack full, can't push %g\n", f);
}
Source Code Stack (cont.)

/* pop: pop and return top value from stack */
double
pop(void)
{
    if (sp > 0)
        return val[--sp];
    else {
        printf("error: stack empty\n");
        return (0.0);
    }
}
Source Code To Get Operands And Operators
#include <ctype.h>      /* for isdigit() */

int getch(void);
void ungetch(int);

/* getop: get next operator or numeric operand */
int
getop(char s[])
{
    int i, c;

    /* skip blanks and tabs */
    while ((s[0] = c = getch()) == ' ' || c == '\t')
        ;
    s[1] = '\0';
    if (!isdigit(c) && c != '.')
        return c;       /* not a number */
    i = 0;
    if (isdigit(c))     /* collect integer part */
        while (isdigit(s[++i] = c = getch()))
            ;
    if (c == '.')       /* collect fraction part */
        while (isdigit(s[++i] = c = getch()))
            ;
    s[i] = '\0';
    if (c != EOF)
        ungetch(c);     /* put back the character read too far */
    return NUMBER;
}
What Are getch And ungetch?
It is often the case that a program cannot determine that it has read enough input until it has read too much.
Example: Collecting the characters that make up a number
Problem: Until the first non-digit is seen, the number is not complete. But then the program has read one character too far.
Solution: It would be nice if it were possible to un-read the unwanted character.
The Functions getch And ungetch
getch    delivers the next input character to be considered
ungetch  remembers the characters put back on the input. Subsequent calls to getch will return them before reading new input [22].
I They work together via a shared buffer and an index into the buffer.
I Because the buffer and the index are shared by both functions and must retain their values between calls, they must be external to both functions.
[22] ungetc(3), declared in <stdio.h>, un-gets a character from an input stream
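For comparison, the same un-read idea can be tried with the standard library function ungetc mentioned in the footnote. This is a minimal sketch, not part of the calculator: read_uint is a hypothetical helper, and it relies only on getc and ungetc from <stdio.h>. Note that the standard guarantees just one character of pushback per stream, which is all this example needs.

```c
#include <ctype.h>
#include <stdio.h>

/* read_uint: read an unsigned decimal number from fp (hypothetical helper).
 * The first non-digit is pushed back with ungetc() so that the next
 * read on fp sees it again. Overflow is not checked. */
unsigned
read_uint(FILE *fp)
{
    unsigned n = 0;
    int c;

    while ((c = getc(fp)) != EOF && isdigit(c))
        n = 10 * n + (unsigned)(c - '0');
    if (c != EOF)
        ungetc(c, fp);  /* un-read the character read too far */
    return n;
}
```

If the stream contains 123+4, the first call returns 123, the next getc returns the '+' that was pushed back, and a second call returns 4.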
Source Code: (un-)getch
#define BUFSIZE 100

char buf[BUFSIZE];      /* buffer for ungetch */
int bufp = 0;           /* next free position in buf */

/* getch: get a (possibly pushed back) character */
int
getch(void)
{
    return (bufp > 0) ? buf[--bufp] : getchar();
}

/* ungetch: push character back on input */
void
ungetch(int c)
{
    if (bufp >= BUFSIZE)
        printf("ungetch: too many characters\n");
    else
        buf[bufp++] = c;
}
A Program In Several Files
I As seen in the assignments, the functions and variables that make up a C program need not all be compiled at the same time.
I The source text may be kept in several files, and previously compiled routines may be loaded from libraries.
This raises some questions:
I How are declarations written so that variables are properly declared during compilation?
I How are declarations arranged so that all the pieces will be properly connected when the program is loaded?
I How are declarations organized so there is only one copy?
I How are external variables initialized?
Visibility Scope
The scope of a name is the part of the program within which the name can be used.
I For an automatic variable declared at the beginning of a function, the scope is the function in which the name is declared
  I Local variables of the same name in different functions are unrelated
  I The same is true of the parameters of the function, which are in effect local variables
I The scope of an external variable or a function lasts from the point at which it is declared to the end of the file being compiled
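These rules can be seen in a small file. The following sketch uses hypothetical names (twice, square, calls, counted_twice) chosen only for this illustration.

```c
/* Both functions have a local variable named n; the two are unrelated. */
int
twice(int x)
{
    int n = 2 * x;      /* scope: the body of twice only */
    return n;
}

int
square(int x)
{
    int n = x * x;      /* a different n; scope: the body of square only */
    return n;
}

int calls = 0;          /* external: in scope from here to the end of the file */

int
counted_twice(int x)
{
    calls++;            /* fine: calls is declared above this point */
    return twice(x);    /* fine: twice is declared above this point */
}
```

Inside twice and square the name calls is not in scope, because its declaration appears later in the file; using it there would require an extern declaration ahead of the use.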
Scope In Natural Order Of Appearance
Suppose main, sp, val, push, and pop are defined in one file, in the order shown:

int
main(void)
{ ... }

int sp = 0;
double val[MAXVAL];

void
push(double f)
{ ... }

double
pop(void)
{ ... }

Variables sp and val may be used in push and pop simply by naming them; no further declarations are needed.
But these names are not visible in main, nor are push and pop themselves.
Definition & Declaration Of External Variables
I If an external variable is to be referred to before it is defined
I Or if it is defined in a different source file from the one where it is being used
I Then an extern declaration is mandatory.
It is important to distinguish between the declaration of an external variable and its definition.
definition   causes storage to be set aside for the variable
declaration  announces the properties of a variable (primarily its type)
Definition And Declaration Of External Variables
Consider the following lines, appearing outside of any function:
    int sp;
    double val[MAXVAL];
I They define the external variables sp and val,
I cause storage to be set aside,
I and serve as the declaration for the rest of that source file.
On the other hand, consider the lines:
    extern int sp;
    extern double val[];
I They declare for the rest of the source file that sp is an int and val is an array of double (whose size is determined elsewhere)
I They do not create the variables or reserve storage for them.
Wrap Up Definition And Declaration
I There must be only one definition of an external variable among all the files that make up the program.
I Initialization of an external variable is possible only within the definition
I Other files may contain extern declarations to access it [23]
I Array sizes must be specified with the definition, but are optional with an extern declaration.
[23] There may also be extern declarations in the file containing the definition
Definition/Declaration Of Externals
Although it is not a likely organization for this program,
I functions push and pop could be defined in one file
I variables val and sp could be defined and initialized in another.
These definitions and declarations tie them together:

Left file:
    extern int sp;
    extern double val[];

    void push(double f) { ... }
    double pop(void) { ... }

Right file:
    #define MAXVALUE 100

    int sp = 0;
    double val[MAXVALUE];

I Because the extern declarations lie ahead of and outside the function definitions, they apply to all functions
I One set of declarations suffices for all of the left file
I The same organization would also be needed if the definitions of sp and val followed their use in one file
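The single-file case from the last bullet can be sketched as follows. This is an illustration only: the bodies of push and pop are shortened versions of the ones shown earlier (error messages omitted), and the name MAXVAL is used as in the stack source code.

```c
#define MAXVAL 100

extern int sp;          /* declaration only: no storage is reserved here */
extern double val[];    /* declaration only: the size comes from the definition */

/* push: uses sp and val before their definitions appear in the file */
void
push(double f)
{
    if (sp < MAXVAL)
        val[sp++] = f;
}

/* pop: returns 0.0 if the stack is empty */
double
pop(void)
{
    return (sp > 0) ? val[--sp] : 0.0;
}

int sp = 0;             /* definition: reserves storage and initializes */
double val[MAXVAL];     /* definition: fixes the array size */
```

Without the two extern lines the compiler would reject push and pop, because sp and val would not yet be in scope at the point of use.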
Program Organisation In Different Files
Let us now divide the calculator program into several source files (as a stand-in for substantially bigger programs)
I main -> main.c
I push and pop, and their variables -> stack.c
I getop -> getop.c
I getch and ungetch -> getch.c [24]
[24] We separate them from the others because they would come from a separately compiled library in a realistic program
Header File
What about the definitions and declarations shared among files?
I As much as possible, we want to centralize them
I As a consequence, there is only one copy to get right and keep right as the program evolves
I We will place this common material in a header file, calc.h
I It will be included by the other files as necessary
I There is a tradeoff between the desire that each file have access only to the information it needs for its job and the practical reality that it is harder to maintain more header files
I Up to some moderate program size, it is probably best to have one header file that contains everything that is to be shared between any two parts of the program
Program Structure
calc.h:
    #define NUMBER '0'
    void push(double);
    double pop(void);
    int getop(char []);
    int getch(void);
    void ungetch(int);

main.c:
    #include <stdio.h>
    #include <stdlib.h>
    #include "calc.h"
    #define MAXOP 100
    int main(void) { ... }

getch.c:
    #include <stdio.h>
    #define BUFSIZE 100
    char buf[BUFSIZE];
    int bufp = 0;
    int getch(void) { ... }
    void ungetch(int) { ... }

stack.c:
    #include <stdio.h>
    #include "calc.h"
    #define MAXVAL 100
    int sp = 0;
    double val[MAXVAL];
    void push(double) { ... }
    double pop(void) { ... }

getop.c:
    #include <stdio.h>
    #include <ctype.h>
    #include "calc.h"
    int getop(char []) { ... }
Static Variables
The variables
I sp and val in stack.c, and buf and bufp in getch.c,
I are for the private use of the functions in their respective source files
I are not meant to be accessed by anything else
The static declaration
I applied to an external variable or function
I limits the scope of that object to the rest of the source file being compiled
External static thus provides a way to hide names like buf and bufp in the getch-ungetch combination, which must be external so they can be shared, yet which should not be visible to users of getch and ungetch.
External Static Example
If the two functions and the two variables are compiled in one file:
    static char buf[BUFSIZE];   /* buffer for ungetch */
    static int bufp = 0;        /* next free position in buf */

    int getch(void) { ... }
    void ungetch(int c) { ... }
I No function outside this file will be able to access buf and bufp
I The names will not conflict with the same names in other files of the same program
I sp and val in stack.c can be hidden in the same way by declaring them static
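The same pattern works for any file-private state. The following is a minimal sketch with hypothetical names (next_id, bump, new_id); it also applies static to a helper function, which hides the function name in the same way as a variable name.

```c
static int next_id = 1;     /* internal linkage: invisible to other files */

/* bump: static helper, callable only from within this file */
static int
bump(void)
{
    return next_id++;
}

/* new_id: the only name this file exports (hypothetical example) */
int
new_id(void)
{
    return bump();
}
```

Another source file can call new_id after declaring int new_id(void), but a declaration such as extern int next_id there would lead to an unresolved reference when the program is linked, because next_id is not visible outside this file.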
Specifier static For Functions
I The external static declaration is most often used for variables
I But it can be applied to functions as well
I Normally, function names are global; if a function is declared static, however, its name is invisible outside of the file in which it is declared.
$ readelf -s global.o
Symbol table '.symtab' contains 7 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 FILE LOCAL DEFAULT ABS global.c
2: 00000000 0 SECTION LOCAL DEFAULT 1
3: