COIT29222-Structured Programming
Lecture Week 12
Reading: Textbook (4th Ed.), Chapter 14; Textbook (6th Ed.), Chapter 17; Study Guide Book 3, Module 4
This week, we will cover the following topics:
– Files
– Physical and Logical Files
– Text and Binary Files
– File Processing
– Sequential Access File (SAF)
– Random Access File (RAF)
– Reading/Writing Data from/to SAF
– Reading/Writing Data from/to RAF
Physical files
You are familiar with files, e.g.:
– word processor documents
– text-editor files
– C++ source, object & executable files
– etc.
Files are stored on:
– hard disk, floppy disk, CD-ROM, memory stick, etc.
– their storage is persistent, i.e. the computer can be turned off and the files are accessible when the computer is turned on again
Primary & secondary storage
Files are stored in secondary storage
– a collective name for all storage not consisting of the computer’s main memory
A computer’s main memory or primary storage is volatile (not persistent)
– storage for variables in a running program is allocated in primary storage
– data stored in variables is temporary: it can be accessed during program execution, but is lost when the program terminates execution
Files – a means of communication
Files provide a means of communication between a running program and the ‘outside world’, i.e. the environment in which the program runs
– data can be read from a file into a program and
– a program can communicate with the ‘outside world’ by writing data to a file
– Example:
  • a program gets and validates employee time-sheet entries input by a user and stores the data in a file
  • this file is subsequently read by another program to generate fortnightly pay cheques
Secondary storage devices
Sequential-access devices
– tape devices
– to get to point q on the tape, the drive needs to pass through points a to p
– analogous to audio tapes
Direct-/random-access devices
– magnetic disk, floppy disk and CD-ROM
– allows direct access to a particular file or a particular position within a file
– analogous to audio CDs
Data encoding in files
All data stored in physical files is encoded into ones and zeros. There are 2 basic types of encoding:
– Text files
  • separate encoding for each character in the underlying character set (typically ASCII)
  • human readable, i.e. can be read by a text editor
– Binary files
  • data must be interpreted by a program (or processor) that understands the formatting of the file
  • not human readable
  • executable programs and certain data files are encoded in binary format
Efficiency: Text files vs. binary files
Storage efficiency
– Binary files are usually more storage-efficient than text files
– Example:
  • In binary files, integers are stored in the same fixed number of bytes as in main memory
    – Example: the value 123 fits in 1 byte (in fact, 7 bits); a variable of type int occupies sizeof(int) bytes whatever value it holds
  • In text files the length of a formatted integer determines the storage required
    – Example: “123” requires 3 bytes
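The storage comparison can be sketched in a few lines. This is only an illustration: the function names TextBytes and BinaryBytes are hypothetical, and the binary size depends on the platform (sizeof(int) is commonly 4).

```cpp
#include <cstddef>
#include <string>

// Bytes needed to store Value as formatted text: one byte per character.
std::size_t TextBytes(int Value)
{
    return std::to_string(Value).size();
}

// Bytes needed to store an int in binary form: fixed, whatever the value.
std::size_t BinaryBytes()
{
    return sizeof(int);
}
```

The text size grows with the number of digits (and a sign, if any), whereas the binary size is the same for every value.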
Logical files
In order for a program to read from, or write to, a physical file, we must be able to represent the file (at an abstract level) in the program
– a logical file is an abstraction that can be viewed as a ‘channel’ that connects the program to a physical file
– all references to a physical file within the program are made via its logical representation
The logical file has a logical name (a variable) which is used to refer to the file inside the program
Logical file representation of a physical file
The operating system is responsible for associating a logical file in a program to a physical file on an external storage medium
I/O devices (e.g. keyboard, console, printer) are also represented by logical files in a computer program
The logical file is a data structure
A logical file is a data structure which consists of a sequence of components of the same type
– Similar to the array construct
Significant differences – a logical file:
– is (theoretically) of unlimited size
– has a concept of current position, which is an implicit reference to some element in the sequence
Logical file: Sequence of components of the same type
C++ file streams
The file stream is the C++ logical file structure
– C++ programs communicate with I/O devices (keyboard, printer, console) and physical files in secondary storage via stream objects
– familiar examples:
  • cin, the pre-defined input stream object which, by default, is connected to the keyboard
  • cout, the pre-defined output stream object which, by default, is connected to the console
What are file stream objects?
File stream objects are objects of a pre-defined C++ class
– you can think of a class as a type and an object as a variable declared to be of that type
– cin and cout are objects (variables) of the istream and ostream class types, respectively
  • these objects are pre-defined in the iostream library
  • hence the requirement for #include <iostream> in programs which use these objects
C++ logical file – A sequence of bytes
A C++ file stream is a sequence of bytes
– i.e. the components of C++ logical files are bytes
– there is no inherent “record” structure in the C++ view of a file
– any such structure must be imposed by the C++ program reading or writing the file
– individual bytes could be read into, and written from, the fields of a struct-type object
Defining your own file stream objects
To access files in secondary storage from a C++ program you need to define your own file stream objects.
– These objects must be defined to be one of the following class types:
  • ifstream – for input (read) operations only
  • ofstream – for output (write) operations only
  • fstream – for input & output (read/write) operations
– These classes are all declared in the fstream header file (#include <fstream>)
Defining your own file stream objects - Example
Example: 3 file stream objects defined:
– OutFile - an output file stream object
– InFile - an input file stream object
– InOutFile - an object that can be used for both input and output
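The slide's code was not captured in this transcript; a minimal sketch of what the three definitions could look like, using the object names given above, is:

```cpp
#include <fstream>
using namespace std;

ofstream OutFile;    // output (write) operations only
ifstream InFile;     // input (read) operations only
fstream  InOutFile;  // input & output (read/write) operations
```

At this point the objects exist but are not yet connected to any physical file.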
Current position
During program execution:
– When a logical file is associated with a physical file the notion of current position becomes well-defined, i.e. it refers to a particular element in the linear sequence of components
– Each read or write operation advances this reference one position, i.e. successive operations access successive elements automatically.
– This reference to the current position in the file is called the file window (or file pointer).
File window (file pointer)
The file window is automatically created when a logical file is associated with a physical file.
An individual component of a file can be “seen” (is accessible) in the program, only when the file window is positioned over it:
Automatic advance of the file window
File access types
The logical file access types are:
– Sequential access
  • components can be accessed in the sequence in which the data is stored in the file
  • automatic advance of the file window after a read/write operation is the only way of changing the current position
– Random (direct) access
  • components can be accessed in any order (including sequentially)
  • the file window is implicitly advanced after read/write operations and can be explicitly positioned with a seek operation
Associating logical & physical files
At the point of association between a logical file name and a physical file, the following are usually specified or take certain default values:
– the position of the file window in the file
– the type of data encoding (text or binary)
– the access type (sequential or random)
Attaching file stream objects to external files/devices
In C++, before a file stream object (logical file) can be used, it must be associated with an external file or device (physical file).
– achieved with a call to the open() function
– Syntax: <file stream object>.open(<physical file name>, <file access mode>)
<physical file name>:
– a C-style character array (since C++11, a string object is also accepted)
– must specify a file name which adheres to the file-naming requirements of the operating system
<file access mode>:
– allows specification of:
  • file window position: defaults to beginning of file (can be set to end of file)
  • data encoding: defaults to text (can be set to binary)
– Note: no file access type is specified since files in C++ are not distinguished as direct-access or sequential-access files
– optional for file stream objects of type ifstream and ofstream
– for objects of type fstream, standard C++ defaults to ios::in | ios::out, but it is good practice to specify the mode explicitly
Attaching file stream objects to external files/devices - Example
– associates clients.dat with OutFile: writing to OutFile generates output to clients.dat
– associates trans.dat with InFile: reading from InFile will take input from trans.dat
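The two associations described above could be written as follows. This is a sketch only: the wrapper function AttachFiles is a hypothetical name introduced for illustration, and the default access modes are assumed.

```cpp
#include <fstream>
using namespace std;

ofstream OutFile;
ifstream InFile;

// Hypothetical helper: attaches both stream objects to their physical files.
void AttachFiles()
{
    OutFile.open("clients.dat");  // writing to OutFile now generates output to clients.dat
    InFile.open("trans.dat");     // reading from InFile now takes input from trans.dat
}
```

Opening trans.dat for input only succeeds if the file already exists.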
C++ file access modes
The file mode designators are ORed together, using the bitwise OR operator, |, to achieve the required file-access type.
– Thus, to open the disk file, emp.dat, for both input and output:
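A possible form of that call is sketched below. The comment lists the standard mode designators, since the slide's table was not captured; the wrapper function OpenEmployeeFile is a hypothetical name.

```cpp
#include <fstream>
using namespace std;

// Standard mode designators (each is a bit flag):
//   ios::in      open for input (reading)
//   ios::out     open for output (writing)
//   ios::app     position the file window at end of file before every write
//   ios::ate     position the file window at end of file on opening
//   ios::trunc   discard any existing contents
//   ios::binary  binary rather than text encoding

// Opens emp.dat for both input and output; returns true on success.
bool OpenEmployeeFile(fstream& InOutFile)
{
    InOutFile.open("emp.dat", ios::in | ios::out);
    return !InOutFile.fail();
}
```

With ios::in | ios::out the file must already exist; the open fails otherwise.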
C++ file access modes
Position of the file window
– by default, the beginning of the file
– mode designators, ios::ate and ios::app, can be specified to alter this default status
Data encoding
– by default, C++ files are text files
  • The underlying character set on an IBM compatible PC is the ASCII character set, so C++ text files are encoded in ASCII format.
– the mode designator, ios::binary, must be used to specify binary encoding in a file
C++ file access modes - Defaults for output file streams
When a file stream object is opened for output the default file status depends on whether the file associated with the object exists or not.
– Examples:
  • if a disk file exists and is opened for output, the contents of the file will be lost
  • if the same file is opened for input and output, the contents of the file remain unchanged
Testing the success of the open() operation
An attempt to associate a file stream object with a physical file/device might fail for various reasons
– Examples:
  • attempting to open a non-existent file for reading
  • attempting to open a file for writing when no disk space is available (i.e., the disk is full)
The success of an open() operation can be determined with the use of the fail() function, invoked on a file stream object.
– returns false if the last operation on the file stream was successful, and true otherwise
Testing the success of the open() operation - Example
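The slide's code is missing from this transcript. A minimal sketch of testing open() with fail() might look like this; the function name OpenForReading and the wording of the message are assumptions.

```cpp
#include <fstream>
#include <iostream>
using namespace std;

// Tries to attach InFile to FileName; reports the problem and returns false on failure.
bool OpenForReading(ifstream& InFile, const char* FileName)
{
    InFile.open(FileName);
    if (InFile.fail()) {
        cerr << "Error: unable to open " << FileName << endl;
        return false;
    }
    return true;
}
```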
Fatal error: Terminate program execution
The inability to open a file stream object is usually a fatal error, one from which the program cannot recover.
When a fatal error occurs, program execution is generally terminated.
– In C++, this can be achieved with the use of the exit() function (#include <cstdlib>)
– The argument to exit() is returned to the environment in which the program was executed.
  • An argument of 0 indicates that the program terminated normally;
  • an argument other than 0 indicates that the program terminated due to an error.
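Treating a failed open() as a fatal error could be sketched as below. The function name OpenOrExit and the exit status 1 are illustrative choices; any non-zero value signals an error to the environment.

```cpp
#include <cstdlib>
#include <fstream>
#include <iostream>
using namespace std;

// Attaches InFile to FileName, or terminates the program if that is not possible.
void OpenOrExit(ifstream& InFile, const char* FileName)
{
    InFile.open(FileName);
    if (InFile.fail()) {
        cerr << "Fatal error: cannot open " << FileName << endl;
        exit(1);  // non-zero: the program terminated due to an error
    }
}
```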
Closing the association between a logical & physical file
This operation is like “hanging up” the connection between a physical file and a program.
In C++ the association between an external file/device and a file stream is terminated with a call to the close() function
– Syntax: <file stream object>.close()
– Example: InOutFile.close();
File stream objects should be closed once processing on the file is complete.
Writing sequentially to a C++ file stream
An output file stream object that has been defined and associated with an external file via a call to the open() function can be used in the same way as the pre-defined output stream object, cout - Example:
Writing sequentially to a C++ file stream - Example
To generate output to a sequential-access file in C++, 6 basic steps:
– include the file stream library: #include <fstream>
– define an output file stream object of the ofstream class
– associate the output file stream object with an external file using the open() function on the file stream object
– test to ensure that the open() operation succeeded
– transfer data to the external file by using the stream insertion operator on the file stream object
– close the file stream object when data transfer is complete
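The six steps can be put together as in the sketch below. The function name WriteClients and the client data written are hypothetical, chosen only to show the stream insertion operator at work.

```cpp
#include <cstdlib>
#include <fstream>                                // step 1
#include <iostream>
using namespace std;

// Writes two illustrative client records (id, name, balance), one per line.
void WriteClients(const char* FileName)
{
    ofstream OutFile;                             // step 2: define the object
    OutFile.open(FileName);                       // step 3: associate with a file
    if (OutFile.fail()) {                         // step 4: test the open()
        cerr << "Cannot open " << FileName << endl;
        exit(1);
    }
    OutFile << 100 << ' ' << "Jones" << ' ' << 24.98 << endl;   // step 5
    OutFile << 200 << ' ' << "Doe" << ' ' << 345.67 << endl;
    OutFile.close();                              // step 6
}
```

The spaces written between fields matter: they are the white space that separates the data items when the file is read back with >>.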
Reading sequentially from a C++ file stream
To read a sequential-access file we define an input file stream object and associate it with an external file via a call to the open() function.
Data can be read from this file stream object using the stream extraction operator, >>, in the same way as data can be read from the pre-defined input stream object, cin.
– Recall from your experience with cin, that when reading from an input stream, “white space” serves to separate data items.
Reading sequentially from a C++ file stream - Example
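The original example code is not in this transcript. A sketch under stated assumptions follows: each record is assumed to hold an id, a single-word name and a balance separated by white space, and the function name ReadClients is hypothetical.

```cpp
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

// Reads white-space separated client records from FileName, echoes them to
// cout, and returns the number of records read (-1 if the open fails).
int ReadClients(const char* FileName)
{
    ifstream InFile;
    InFile.open(FileName);
    if (InFile.fail()) {
        cerr << "Cannot open " << FileName << endl;
        return -1;
    }
    int Id;
    string Name;
    double Balance;
    int Count = 0;
    while (InFile >> Id >> Name >> Balance) {  // stops at end of file or bad data
        cout << Id << ' ' << Name << ' ' << Balance << endl;
        ++Count;
    }
    InFile.close();
    return Count;
}
```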
Copying files
To copy the entire contents of a text file (including “white space” characters) to the screen/another file, we can use the get() function on an input file stream object.
– Example: InFile.get(CharRead); stores the next character in the input file stream, InFile, in the character variable, CharRead.
However, to read all the characters in a file we would need to know the number of characters in the file (in general, an unlikely scenario), or we need to know when we have reached the end of the file.
End-of-file status
A logical file is theoretically of unlimited size since it must be able to represent physical files of arbitrary size.
However, given a logical file that has been associated with an existing physical file, there are a fixed number of components that can be read from the file.
– a logical file must provide a means of determining when all the components in the file have been read
– we can view a logical file as having an end-of-file component following the last component in the file
End-of-file status
Conceptual view of the logical file after a read operation:
Detecting end-of-file in C++
There is an end-of-file function, eof(), which can be called on objects of the file stream classes.
– This function returns true if an attempt has been made to read beyond the last component in the file.
– With reference to the previous slide, this corresponds to the file window being positioned over the EOF component when a read operation is performed.
Detecting end-of-file in C++ - Example
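The slide's code is missing; one common way to use eof() with get() is sketched here. The function name CopyUntilEof and the use of an ostream parameter for the destination are assumptions.

```cpp
#include <fstream>
#include <iostream>
using namespace std;

// Copies every character of FileName (white space included) to Out and
// returns the number of characters copied, or -1 if the file cannot be opened.
int CopyUntilEof(const char* FileName, ostream& Out)
{
    ifstream InFile;
    InFile.open(FileName);
    if (InFile.fail())
        return -1;

    int Count = 0;
    char CharRead;
    InFile.get(CharRead);        // priming read
    while (!InFile.eof()) {      // true only after a read has hit end-of-file
        Out << CharRead;
        ++Count;
        InFile.get(CharRead);
    }
    InFile.close();
    return Count;
}
```

Calling CopyUntilEof("trans.dat", cout) would display the file on the console. Note that this loop tests only for end-of-file, not for other read failures.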
C++ I/O state bits
However, the eof() function tests only for the end-of-file condition.
– In the example of the previous slide there is no test to ensure that the get() operation succeeded.
– to do this requires a review of the C++ I/O state bits:
  • eofbit: set if an attempt has been made to read the EOF marker
  • failbit: set if an operation failed on a stream, for example, on bad format of input data; note that this includes an attempt to read the EOF marker
  • badbit: set when the stream becomes unstable due to some unrecoverable I/O systems or hardware error; usually involves a loss of data
Checking the status of C++ file streams
The status of a file stream can be tested with the following functions:
– eof(): true if the eofbit is set
– bad(): true if the badbit is set
– fail(): true if either the badbit is set or the failbit is set
– good(): true if none of the state bits (eofbit, failbit, badbit) are set
Copying files - Example
The success of the get() operation can be ensured with the use of the fail() function
– i.e. loop until:
  • an operation fails on the stream or
  • the end-of-file status has been set
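A file copy using that loop condition could be sketched as follows. The function name CopyTextFile and the return convention are assumptions; the original slide code is not in this transcript.

```cpp
#include <fstream>
using namespace std;

// Copies InName to OutName character by character; returns the number of
// characters copied, or -1 if either file cannot be opened.
long CopyTextFile(const char* InName, const char* OutName)
{
    ifstream InFile;
    InFile.open(InName);
    if (InFile.fail())
        return -1;

    ofstream OutFile;
    OutFile.open(OutName);
    if (OutFile.fail())
        return -1;

    long Count = 0;
    char CharRead;
    InFile.get(CharRead);
    while (!InFile.fail() && !InFile.eof()) {  // stop on failure or end-of-file
        OutFile.put(CharRead);
        ++Count;
        InFile.get(CharRead);
    }
    InFile.close();
    OutFile.close();
    return Count;
}
```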
Reading from a sequential-access file in C++
6 basic steps:
– include the file stream library: #include <fstream>
– define an input file stream object - ifstream class
– associate the input file stream object with an external file using the open() function
– test to ensure that the open() operation succeeded
– transfer data from the external file to the program
  • use stream extraction operator if “white space” characters are to be ignored
  • use the get()/getline() function if “white space” characters are to be read
– close the file stream when data transfer is complete
Numeric data in sequential-access files – Example:
A disk file, in.dat, contains a set of integer value pairs.
– A program is to read this file and generate a disk file, out.dat, consisting of the product of each pair of numbers.
– The format of these disk files is shown below:
Numeric data in sequential-access files – Example:
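The file layouts and the original code were on slides not captured here, so the sketch below makes assumptions: the integers in the input file are separated by white space, and each product is written on its own line. The function name WriteProducts is hypothetical.

```cpp
#include <fstream>
using namespace std;

// Reads integer pairs from InName and writes the product of each pair to
// OutName; returns the number of pairs processed, or -1 if an open fails.
int WriteProducts(const char* InName, const char* OutName)
{
    ifstream InFile;
    InFile.open(InName);
    if (InFile.fail())
        return -1;

    ofstream OutFile;
    OutFile.open(OutName);
    if (OutFile.fail())
        return -1;

    int First, Second;
    int Count = 0;
    while (InFile >> First >> Second) {  // fails at end of file or on bad data
        OutFile << First * Second << endl;
        ++Count;
    }
    InFile.close();
    OutFile.close();
    return Count;
}
```

A call such as WriteProducts("in.dat", "out.dat") would process the files named in the example.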
Reading and writing structured data
The data processed in programs generally has some sort of structure.
However, from the C++ viewpoint, a file stream is a sequence of bytes terminated by an EOF marker.
– there is no facility for reading or writing entire records in C++
– any record structure is imposed by the C++ program reading or writing the file
– In C++, records must be read or written one field at a time
Reading and writing structured data - Example
To write a function to read employee data from the external file, emp.dat:
– where the fields of a record are:
  • employee id
  • employee name
  • employee age
  • employee sex
Reading and writing structured data - Example
Read data into an array of employee records of type:
struct Employee {
    int id;
    string name;
    int age;
    char sex;
};
Reading and writing structured data - Example
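The function itself was on a slide not captured here. A sketch that reads one field at a time is given below; it assumes the fields are separated by white space and that names are single words, and the function name ReadEmployees is hypothetical.

```cpp
#include <fstream>
#include <string>
using namespace std;

struct Employee {
    int id;
    string name;
    int age;
    char sex;
};

// Reads up to MaxEmps records from FileName into Employees, field by field.
// Returns the number of records read, or -1 if the file cannot be opened.
int ReadEmployees(const char* FileName, Employee Employees[], int MaxEmps)
{
    ifstream InFile;
    InFile.open(FileName);
    if (InFile.fail())
        return -1;

    int NumEmps = 0;
    while (NumEmps < MaxEmps &&
           InFile >> Employees[NumEmps].id >> Employees[NumEmps].name
                  >> Employees[NumEmps].age >> Employees[NumEmps].sex)
        ++NumEmps;  // count the record only if all four fields were read

    InFile.close();
    return NumEmps;
}
```

For the file in the example this would be called as ReadEmployees("emp.dat", Employees, MaxEmps), with the array and its capacity declared by the caller.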
Why we need to access files randomly/directly
Formatted data in a sequential-access file cannot be modified without the risk of destroying other data in the file.
– Problem: formatted I/O using the stream insertion and extraction operators produces variable-length “fields”
– the formatted I/O model is not usually used to update “records” in place
Accessing a file sequentially is inappropriate for the instant retrieval of particular “records”
– sequential search from the beginning of a large file is too slow
Fixed-length records
Fixed-length records make it easy for a program to calculate the location of any record with respect to the beginning of the file.
– This location is a function of the record key and the record size.
– Example: a file of 100-byte, fixed-length records
  • key: id number (1000 to 9999)
  • the locations of records as byte offsets from the beginning of the file are shown in the diagram on the next slide
Fixed-length records – byte offset from beginning of file
• The location of the record for id number, N, is calculated as: (N – 1000) * 100 bytes from the beginning of the file
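The offset calculation can be expressed directly in code. The constant and function names below are hypothetical; the values 1000 and 100 come from the example above.

```cpp
const long FirstId = 1000;     // smallest id number in the example
const long RecordSize = 100;   // bytes per fixed-length record

// Byte offset, from the beginning of the file, of the record for IdNumber.
long RecordOffset(long IdNumber)
{
    return (IdNumber - FirstId) * RecordSize;
}
```

For instance, the record for id number 1001 starts 100 bytes into the file, immediately after the record for id number 1000.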
Direct-access file – Fixed-length records
In a direct-access file with fixed-length records, storage for all “records” is allocated and initialised when the file is created.
– whereas, in a sequential-access file, new “records” are appended to the end of the file.
Direct access to fixed-length records is more time-efficient than sequential access, but there is a trade-off.
– Fixed-length records are less storage-efficient than variable-length records (see wasted space in the diagram on the next slide)
Fixed-length records - Employee data
Creating a file of fixed-length records
Assuming appropriate declarations, initialisation of the 10-record employee file can be achieved with:
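The slide's code and declarations are not in this transcript, so the sketch below supplies its own. It assumes a hypothetical fixed-length record type, EmployeeRec, in which a 20-character array replaces the string field, and it marks an empty slot with an id of 0.

```cpp
#include <fstream>
using namespace std;

// Hypothetical fixed-length record: every record occupies sizeof(EmployeeRec) bytes.
struct EmployeeRec {
    int  id;        // 0 marks an empty ("invalid") slot
    char name[20];
    int  age;
    char sex;
};

const int NumRecords = 10;

// Creates FileName holding NumRecords blank records; returns true on success.
bool CreateEmployeeFile(const char* FileName)
{
    ofstream OutFile;
    OutFile.open(FileName, ios::out | ios::binary);
    if (OutFile.fail())
        return false;

    EmployeeRec Blank = { 0, "", 0, ' ' };
    for (int i = 0; i < NumRecords; ++i)
        OutFile.write(reinterpret_cast<const char*>(&Blank), sizeof(EmployeeRec));

    OutFile.close();
    return !OutFile.fail();
}
```

The write() function transfers the bytes of the record unformatted, which is why the file is opened with ios::binary.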
Positioning file windows in C++
C++ maintains two file windows (file pointers), one for reading and one for writing.
To reposition these file pointers to a specific byte location in a file, two functions are provided:
– seekp() – to reposition the file pointer for writing (‘p’ is for put)
– seekg() – to reposition the file pointer for reading (‘g’ is for get)
File pointers in C++
Objects of type ifstream have a ‘get pointer’ indicating the byte position in the file from which the next read operation is to occur
Objects of type ofstream have a ‘put pointer’ indicating the byte position in the file at which the next write operation is to occur
Objects of type fstream have a ‘get pointer’ and a ‘put pointer’
Positioning file pointers in C++
The seek functions take the general form:
<file stream object>.seekX(<byte offset>, <seek direction>)
– seekX: seekp or seekg
– <byte offset>: no. of byte locations to move
– <seek direction>: relative position for <byte offset>
  • ios::beg: from beginning of the file
  • ios::cur: from current position of the file pointer
  • ios::end: from the end of the file
Positioning file pointers in C++
The <seek direction> argument is optional
– if unspecified defaults to ios::beg - byte offset relative to the beginning of the file
Examples:
seekp(0);             // move put ptr to beg. of file
seekg(45, ios::cur);  // move get ptr 45 bytes forward
// move put ptr to the position 10 bytes from the beginning of the file
seekp(10, ios::beg);
Access to file pointers
To return the current position of the “get” and “put” pointers:
– “get pointer” – use tellg() function
  Example (assuming appropriate declarations):
  long GetPosition;
  GetPosition = InFile.tellg();
– “put pointer” – use tellp() function
  Example (assuming appropriate declarations):
  long PutPosition;
  PutPosition = OutFile.tellp();
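One use of seekg() and tellg() together is finding the size of a file: move the get pointer to the end and ask for its position. The sketch below is illustrative only; the function name FileSize is hypothetical.

```cpp
#include <fstream>
using namespace std;

// Returns the size of FileName in bytes, or -1 if it cannot be opened.
long FileSize(const char* FileName)
{
    ifstream InFile;
    InFile.open(FileName, ios::in | ios::binary);
    if (InFile.fail())
        return -1;

    InFile.seekg(0, ios::end);                      // move get ptr to end of file
    long Size = static_cast<long>(InFile.tellg());  // its position is the byte count
    InFile.close();
    return Size;
}
```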
Writing data to a random-access file - Example
Code to write NumEmps employee records, stored in an array Employees, to the file stream, OutFile:
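The original code is missing from this transcript. The sketch below makes several assumptions: a hypothetical fixed-length record type EmployeeRec, a hypothetical smallest id FirstId of 1000, and a record's slot computed from its id in the same way as the byte-offset formula for fixed-length records.

```cpp
#include <fstream>
using namespace std;

// Hypothetical fixed-length record (a char array instead of string).
struct EmployeeRec {
    int  id;
    char name[20];
    int  age;
    char sex;
};

const int FirstId = 1000;  // hypothetical: id of the record stored in slot 0

// Writes each record to the slot given by its id. OutFile should be opened on
// an existing file with ios::in | ios::out | ios::binary so that the records
// already in the file are not discarded.
void WriteEmployees(ofstream& OutFile, const EmployeeRec Employees[], int NumEmps)
{
    for (int i = 0; i < NumEmps; ++i) {
        long Offset = static_cast<long>(Employees[i].id - FirstId)
                      * static_cast<long>(sizeof(EmployeeRec));
        OutFile.seekp(Offset, ios::beg);  // position the put pointer
        OutFile.write(reinterpret_cast<const char*>(&Employees[i]),
                      sizeof(EmployeeRec));
    }
}
```

Because each record is placed by seekp(), the records in the array do not need to be in id order.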
Reading data sequentially from a random-access file
Data in a random-access file can be read into an array of records for processing.
In C++, reading random-access files sequentially is basically the same as reading sequential-access files, except that:
– only those records which contain a “valid” entry are retrieved.
  • i.e. skip “invalid” records
– Employee file example:
  • “valid” records have a non-zero employee id number
Reading data sequentially from a random-access file - Example
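The slide's code was not captured. A sketch of the idea, assuming a hypothetical fixed-length record type EmployeeRec and a hypothetical function name ReadValidEmployees, is:

```cpp
#include <fstream>
using namespace std;

// Hypothetical fixed-length record (a char array instead of string).
struct EmployeeRec {
    int  id;        // 0 marks an "invalid" (empty) record
    char name[20];
    int  age;
    char sex;
};

// Reads FileName from beginning to end, keeping only records with a non-zero
// id. Returns the number of records stored, or -1 if the open fails.
int ReadValidEmployees(const char* FileName, EmployeeRec Employees[], int MaxEmps)
{
    ifstream InFile;
    InFile.open(FileName, ios::in | ios::binary);
    if (InFile.fail())
        return -1;

    int NumEmps = 0;
    EmployeeRec Rec;
    InFile.read(reinterpret_cast<char*>(&Rec), sizeof(EmployeeRec));
    while (!InFile.fail() && NumEmps < MaxEmps) {
        if (Rec.id != 0)                 // skip "invalid" records
            Employees[NumEmps++] = Rec;
        InFile.read(reinterpret_cast<char*>(&Rec), sizeof(EmployeeRec));
    }
    InFile.close();
    return NumEmps;
}
```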
Reading data randomly from a random-access file - Example
The function below locates and reads the record for the employee id number given by the parameter, IdKey.
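The original function is not in this transcript. A sketch of such a function follows, assuming a hypothetical fixed-length record type EmployeeRec, a hypothetical smallest id FirstId of 1000, and a hypothetical function name ReadEmployee; only the parameter name IdKey comes from the slide.

```cpp
#include <fstream>
using namespace std;

// Hypothetical fixed-length record (a char array instead of string).
struct EmployeeRec {
    int  id;        // 0 marks an "invalid" (empty) record
    char name[20];
    int  age;
    char sex;
};

const int FirstId = 1000;  // hypothetical: id of the record stored in slot 0

// Seeks directly to the slot for IdKey and reads it into Rec. Returns true
// only if a record was read and its id matches IdKey.
bool ReadEmployee(ifstream& InFile, int IdKey, EmployeeRec& Rec)
{
    long Offset = static_cast<long>(IdKey - FirstId)
                  * static_cast<long>(sizeof(EmployeeRec));
    InFile.clear();                   // clear any earlier eof/fail state
    InFile.seekg(Offset, ios::beg);   // position the get pointer
    InFile.read(reinterpret_cast<char*>(&Rec), sizeof(EmployeeRec));
    return !InFile.fail() && Rec.id == IdKey;
}
```

No records before the one requested are read, which is the time advantage of direct access over a sequential search.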