1
Department of Computer Science and Engineering, HKUST
HKUST Summer Programming Course 2008
C API
~ Interfacing C programming
2
Overview
Introduction
Streams
Output Functions
Input Functions
File-Related Functions
Memory Management Functions
Other aspects of C Programming
3
C API
Introduction
4
Motivation
The system offers programmers a lot of functionality, for example:
Creating and destroying processes.
Reading/writing files.
Opening network connections.
….
Historically, this functionality was implemented as C functions. Some library authors provide C++ wrappers, but many programmers still prefer the C versions.
C functions may be faster than their C++ wrappers, since the wrappers add some overhead.
5
C functions to cover in this lecture
Formatted I/O
Output: printf, fprintf, sprintf, snprintf
Input: scanf, fscanf, sscanf
File I/O
fopen, feof, fgetc, fputc, fread, fwrite, fclose, fseek, rewind, fflush
Memory Allocation
malloc, realloc, calloc, free, memcpy, memset
6
References
ALWAYS look up the manual page before using a function for the first time.
Important: learn how to read a manual page.
Most of this material is extracted from: http://www.cplusplus.com/ref/
If anything is unclear, you can always refer to that website.
7
C API
Streams
8
Streams in C
[Figure: the OS holds file1, file2, …, fileN. Our C program consists of File I/O Code and Other Code; the File I/O Code works with Stream1, Stream2, …, StreamM, and each stream is attached to a file.]
Screen and keyboard are emulated as files in the OS.
9
Streams in C
There are three standard I/O streams.
stdin – standard input (i.e. keyboard) (C++: cin)
stdout – standard output (i.e. screen) (C++: cout)
stderr – standard error (i.e. screen) (C++: cerr)
You have been using stdout whenever you use cout!
You can use cerr << "ERROR" << endl; to print to the standard error stream.
I/O operations are performed in C using streams.
Files are also accessed through streams; these streams can be created or destroyed whenever necessary.
In C, a stream is identified by a pointer of type FILE*; the operating system identifies the underlying open file by a number, called the file descriptor.
E.g. printf outputs to stdout, while fprintf can output to any stream. (We will cover the fprintf function soon.)
10
Output Streams in C
Besides standard output (stdout), there is another way to output data to the screen: the standard error stream (stderr).
With two different output streams (both directed to the screen), we can manage the display better:
Normal output (input prompts, messages, information) -> stdout
Error output (error messages) -> stderr
We can even separate the two types of output by redirection (next slide).
11
Capture Output in OS
It is possible to capture ALL the output generated by a program using the redirection operators of the shell (e.g. C-shell or Bash on Linux, or cmd.exe on Windows). Log files are very useful in debugging!!!
Example (Linux, C-shell): ./a.out > output.txt
Here, instead of being printed to the screen, the normal output is saved to a file named output.txt.
Data printed to the error stream is not redirected (it is still displayed on the screen).
(./a.out > output.txt) >& error.txt
This redirects the normal output to output.txt and the error output to error.txt.
12
Redirect Input Stream
Similarly, we can redirect the contents of a file to the standard input stream.
Example:
// program
#include <iostream>
using namespace std;
int main( ) {
    int x, y; double z;
    cin >> x >> y >> z;
    return 0;
}
// input.txt
100 90
80.2
// command line
./a.out < input.txt
13
C API
Output Functions
14
Formatted I/O
The printf function provides a convenient way to output formatted data to stdout.
int printf(const char * format [ , argument , ...]);
Print formatted data
Prints the arguments formatted as the format argument specifies.
The format is a C-string that specifies what the output looks like.
The arguments are the data to substitute into the format template.
Recall that printf accepts any number of parameters, which is implemented with an ellipsis (variable argument list) in C.
It returns the number of characters written, or a negative number when an error occurs.
15
Formatted I/O
Example
printf("Hello\n");
printf("Hello %d\n", 100);
printf("Hello %f %s\n", 2.5, "abc");
….
The % marks a place in the output where an argument is substituted.
%d -> integer
%f -> floating point number
%s -> C-string
%% -> the % sign
…. and many others (check the documentation).
16
Formatted I/O
Recall that an ellipsis argument list cannot check the types of its parameters.
A common pitfall in using printf is passing an argument whose type does not match the format. For example,
printf("Hello %f\n", 100);
This will print some garbage value (strictly, the behavior is undefined), since printf tries to interpret the binary representation of an integer as a floating point number.
17
Formatting strings
The same formatting can be used to build a string using sprintf.
int sprintf ( char * buffer, const char * format [ , argument , ...] );
Print formatted data to a string.
Writes a sequence of arguments to the given buffer, formatted as the format argument specifies, instead of to stdout.
The first argument is a character array "buffer" into which the function writes the formatted string.
fprintf is a similar function, which can print to any stream, including file streams.
18
sprintf Example
#include <cstdio>
int main() {
    char* buffer = new char[30];   // buffer on the heap
    char buffer2[40];              // buffer on the stack
    sprintf(buffer, "Hello %s", "World");
    sprintf(buffer2, "Mario %s", "World");
    delete[] buffer;
    return 0;
}
19
The buffer – and the buffer overflow problem
The sprintf() function's first argument requires a buffer to store the formatted output string.
The buffer should be allocated, either on the stack (fixed-size array) or on the heap (dynamic array).
What if it points to somewhere that cannot be written?
A runtime error "may" occur, because the write corrupts a memory location that may be in use by another variable.
It is just like throwing rubbish on the street. If there is a policeman, you are caught; otherwise you get away with it. DON'T take the chance.
What if it points to an allocated buffer that is too small to store the formatted output?
char buffer[5]; sprintf( buffer, "%s", "Hello World" );
Heap -> may corrupt other objects.
Stack -> may corrupt the stack -> serious problem.
20
Buffer Overflow Attack
A buffer overflow attack is a typical method crackers use to break a program.
Usually, they feed the program an input string that is too long, so that it eventually corrupts the stack and makes the program do something "strange". Don't do this if you are not an expert.
In the extreme case, crackers can execute arbitrary code on another machine.
Dangerous? Yes! You must be careful about the size of every buffer and whether it can overflow.
The function snprintf helps to avoid buffer overflows.
21
snprintf
int snprintf(char *s, size_t size, const char *template, ...)
The snprintf function is similar to sprintf, except that the size argument specifies the maximum number of characters to produce. The trailing null character is counted towards this limit, so you should allocate at least size characters for the string s.
As a reminder, size_t is an unsigned integer type.
22
C API
Input Functions
23
scanf
Now we turn from formatted output to formatted input.
int scanf(const char * format [ , argument , ...]);
Read formatted data from stdin.
Reads data from the standard input (stdin) and stores it into the locations given by the argument(s). The location pointed to by each argument is filled with a value of the type requested in the format string.
There is NO pass-by-reference in C; passing a non-const pointer is the only way to let a function modify a parameter in C.
There must be as many type specifiers in the format string as there are arguments passed.
24
scanf - pointers
Notice that, to use scanf, you must pass the address of the variable that stores the input.
Examples:
int x; scanf("%d", &x);
Input: 22 -> x = 22
float f; int y; scanf("%f=%d", &f, &y);
Input: 0.2=3 -> f = 0.2, y = 3
int x; char remain[1024]; scanf( "%d", &x ); scanf( "%s", remain );
Input: 2.2 -> x = 2, remain = .2
25
fscanf, sscanf
fscanf – accepts an extra parameter that specifies which stream to read from.
sscanf – accepts input from a C-string. You can use this to "parse" a string, i.e. to separate it into components according to a format.
26
scanf returns …
Return Value.
The number of items successfully read. This count doesn't include any ignored fields with asterisks (*).
A seldom-used feature: an asterisk in a type specifier (e.g. %*d) reads a value of that type but does not store it.
int a, b; scanf("%d %*d %d", &a, &b);
Input: 1 2 3 -> a = 1, b = 3, returns 2
If EOF is returned, the input ended or an error occurred before the first assignment could be done.
Tips:
Always check the return value.
Output an understandable error message and ask the user to re-enter the input or quit the program.
For more robust parsing, use a more sophisticated parsing method instead of scanf.
27
Summary
                             Input     Output
Standard I/O                 scanf     printf
File                         fscanf    fprintf
String (character buffer)    sscanf    sprintf, snprintf
28
C API
File-Related Functions
29
File I/O: Overview
File I/O is a service provided by the operating system through a set of functions.
These functions must know which file the user is reading/writing.
That's why all file-related functions (except fopen) require a parameter of type FILE*. This argument (loosely, the file descriptor) uniquely identifies each opened file. Strictly speaking, the file descriptor is the number the operating system uses for the file; the FILE structure typically wraps it and stores other properties (such as the access mode).
30
Various functions for File I/O
fopen – open a file, with various access modes.
feof – check whether the end of the file has been reached.
fgetc – read a character from the file
fputc – write a character to the file
fread – read a block of data from the file
fwrite – write a block of data to the file
fclose – close the file
fseek – move the current read/write position
rewind – move the current read/write position back to the beginning of the file.
fflush – flush the buffer and write to the file immediately.
31
Binary File / Text File
A text file is simply a human-readable file.
A binary file is not human-readable.
Comparison between text files and binary files:
Advantages of text files:
Human readable.
Compressible (since the text form inherently wastes a lot of space).
Advantages of binary files:
No precision problem, unlike when floating point numbers are printed as text.
Usually smaller in size.
Hides some information (you can't understand the file without knowing the file format).
Easier to read an arbitrary element of an array (e.g. if each integer is stored as 4 bytes, the 101st element starts at the 401st byte).
32
Typical File I/O Scenarios (1)
The first thing you need to decide is whether the file is a text file or a binary file.
The next thing you need to decide is whether you want to read, write or append to the file.
Now you can open the file, using the fopen function. Remember to check whether the file was opened successfully (fopen returns NULL when it fails).
Then you need to decide whether you want to process the file by byte, by block, or by token.
For byte-level access, you may want to use fgetc or fputc (text file).
For block access, you may want to use fread/fwrite (binary file).
For token access, you can use fscanf (text file).
33
Typical File I/O Scenarios (2)
Whenever you read/write a file, the file pointer associated with the file descriptor moves to the end of your last accessed location, and your next operation starts there. Sometimes you want to move the file pointer around in the file; then you use fseek/rewind.
You can also check whether you have reached the end of the file, using the function feof.
Last, but still very important, is to fclose the file. Otherwise, the content might not be saved into the file (or sometimes others cannot delete/open the file).
34
More about fgetc()
Recall that the default behavior of cin is to skip whitespace (space, newline).
fgetc (or getchar, which reads from standard input) won't skip whitespace.
You can hence use fgetc to count how many spaces there are in an input.
35
More about feof()
The concept of "end-of-file" is a little bit strange; it can be interpreted in this way.
The file is appended with a character, called the "eof" character.
Only after this character has been read does feof() return true.
Before that, feof() returns false, even if the last character of the original file has already been read.
For example
while (!feof(file)) { cout << fgetc(file) << endl; }
This is NOT correct! It will actually output the end-of-file character once, since feof() still returns false after the last character has been read.
The corrected version:
while (true) { int v = fgetc(file); if (feof(file)) break; cout << v << endl; }
36
More about fflush()
By default, C I/O is buffered (that's true for C++ I/O as well).
Buffered I/O means that the data to be output is stored in a piece of memory before it is truly written out.
This is to take advantage of block I/O performance.
Writing to memory (RAM) is much more efficient than writing to disk.
E.g. the DMA technique can be used to write a block of data to disk efficiently.
It is up to the library when to truly write out the data and when to wait.
To force the library to write out the data, you should use fflush().
When the file is closed using fclose(), the data is flushed.
37
C API
Memory Management Functions
38
Memory Management
Memory is NOT an infinite asset, and it is shared across all programs.
Remember to deallocate memory once it is no longer needed.
The operating system has a module called the memory manager, which ensures that each process accesses only its own memory and does not corrupt that of others.
Memory management is computationally expensive. If memory allocation/deallocation happens too often, it leads to:
Slower system performance.
Memory fragmentation – where a big piece of contiguous memory is hard to find.
39
Memory Fragmentation Illustrated
Suppose there are 100 bytes of memory.
A cell means 10 bytes.
A cell marked with "A" is allocated.
[Figure: a row of ten cells, seven of them marked "A".]
When another 20 bytes of memory are requested, the request cannot be fulfilled, even though 30 bytes of memory are actually not allocated: no two free cells are next to each other.
40
Memory Related Functions
malloc – allocate a contiguous piece of memory of the specified size
realloc – resize an allocated piece of memory; the contents are retained.
calloc – same as malloc, except that it takes the element count and element size as two arguments and initializes the memory to zero
free – release the memory back to the memory manager
The same function is used for both dynamic variables and dynamic arrays (this is different from delete and delete[] in C++)
memcpy – byte-wise copy from one memory location to another
memset – quickly set each byte of a piece of memory to a value
41
Operator sizeof
Many memory management functions require the size of a type.
It is bad practice to hard-code the size (say, hard-code 4 for an integer).
The sizeof operator returns the size of a variable:
int x; sizeof(x) -> 4 bytes (on a Win32 machine)
double x; sizeof(x) -> 8 bytes (on a Win32 machine)
You can also apply sizeof to a type: sizeof(int)
Always use sizeof, even if you know the size of the datatype on your machine. This produces more portable code.
42
More about malloc / free
To allocate a dynamic object, you can use
int* pInt = (int*) malloc( sizeof(int) );
Obj* pObj = (Obj*) malloc( sizeof(Obj) );
To allocate a dynamic array of length = ELEMENT, you can use
int* pIntArray = (int*) malloc( sizeof(int) * ELEMENT );
Obj* pObjArray = (Obj*) malloc( sizeof(Obj) * ELEMENT );
If there is not enough space to allocate, malloc returns NULL. Always check for this in your programs.
To deallocate memory, you can use
free(pInt); free(pObj); free(pIntArray); free(pObjArray);
43
More about realloc
Why can realloc be faster than allocate-then-copy?
Because it tries to expand the original allocated piece to the specified size in place, if there is sufficient free space around the original allocation.
This may improve the efficiency of the solution for the last question in the Midterm.
What will happen if I use realloc but the current address does NOT allow expansion?
It falls back to allocate-then-copy.
44
More about memset / memcpy
You can initialize an array to 0 by:
memset( pIntArray, 0, sizeof(int)*ELEMENT );
You can copy an array to another by:
int* pIntArray2 = (int*) malloc( sizeof(int) * ELEMENT );
memcpy( pIntArray2, pIntArray, sizeof(int)*ELEMENT );
Again, this is much more efficient than the solution for the last question in the Midterm, which uses a for-loop to copy the elements.
45
C API
Other aspects of C Programming
46
Other aspects of C Programming
There are some more strange conventions when writing a C program.
All variables must be declared at the beginning of a scope (this is required by C89; C99 and later allow declarations anywhere in a block).
Use a C compiler – gcc, instead of g++.
Use extern "C" in a function prototype of a C++ program to link to functions compiled with a C compiler (more about this in the next slide).
You can also use a C++ compiler to compile most C programs (C++ is almost a superset of C).
47
Other aspects of C Programming
Assume func_c.obj, which defines a function add, is compiled with a C compiler. And assume you cannot access the source code of that object file.
Now, if we want to use the function add in a C++ program:
// A C++ program, using C function
#include <iostream>
using namespace std;
extern "C" { int add( int, int ); }
int main( ) {
    int x = 10, y = 12;
    cout << add(x,y) << endl;
    return 0;
}