file systems

Unit VIIIFILE SYSTEM

1. File ConceptComputers can store information on various storage media, such as magnetic disks, magnetic tapes,and optical disks. So that the computer system will be convenient to use, the operating systemprovides a uniform logical view of information storage. The operating system abstracts from thephysical properties of its storage devices to define a logical storage unit, the file. Files are mappedby the operating system onto physical devices. These storage devices are usually nonvolatile, so thecontents are persistent through power failures and system reboots.

A file is a named collection of related information that is recorded on secondary storage. Theinformation in a file is defined by its creator. Many different types of information may be stored in afile- such as source programs, object programs, executable programs, numeric data, text, payrollrecords, graphic images, sound recordings, and so on. A file has a certain defined structure, whichdepends on its type. A text file is a sequence of characters organized into lines (and possibly pages).A source file is a sequence of subroutines and functions, each of which is further organized asdeclarations followed by executable statements. An object file is a sequence of bytes organized intoblocks understandable by the system's linker. An executable file is a series of code sections that theloader can bring into memory and execute.

File management system is the software which empowers users and applications to organize andmanage the data that resides on secondary storage. The file management system is supposed to hideall device-specific aspects of file manipulation from users. The organization and management offiles may involve access, updates and several other file operations. The common responsibilities ofthe file-management system include the following:

• Mapping of access requests from logical to physical file-address apace• Transmission of file elements between main and secondary storage• Management of the secondary storage, such as keeping track of the status, allocation, and

deallocation of space• Support for protection and sharing of files and the recovery and possible restoration of files

after system crashes

A file's attributes consist of the following:

• Name. The symbolic file name is the only information kept in human-readable form.

• Identifier. This unique tag, usually a number, identifies the file within the file system; it isthe non-human-readable name for the file.

• Type. This information is needed for systems that support different types of files.

• Location. This information is a pointer to a device and to the location of the file on thatdevice.

• Size. The current size of the file (in bytes, words, or blocks) and possibly the maximumallowed size are included in this attribute.

• Protection. Access-control information determines who can do reading, writing, executing,and so on.

• Time, date, and user identification. This information may be kept for creation, lastmodification, and last use. These data can be useful for protection, security, and usagemonitoring.

A generalization of file services.

File operations are the functions that can be carried out on a file. For handling these operations,operating system provides some services in the form of system calls. The various operations and therelated system calls are given below:

• Create a file. Two steps are necessary to create a file. First, space in the file system must befound for the file. Second, an entry for the new file must be made in the directory, whichcontains name, location and other such pieces of information. The system call used for thisoperation is create.

• Open a file. The system call open, accepts the file name and the access mode (read, writeand execute) as parameters. The OS searches the directory entry table for the file name andchecks if access mode is allowed. It then copies the directory entry of the file to the open-file table (this table stores information about the files that are open at a particular time).

• Write to a file. To write a file, write system call is uded which requires both the name of thefile and the information to be written to the file. Given the name of the file, the systemsearches the directory to find the file's location. The system must keep a write pointer to thelocation in the file where the next write is to take place. The write pointer must be updatedwhenever a write occurs.

• Read a file. To read from a file, we use system call read that specifies the name of the fileand where (in memory) the next block of the file should be put. Again, the directory issearched for the associated entry, and the system needs to keep a read pointer to the locationin the file where the next read is to take place. Once the read has taken place, the readpointer is updated.

• Repositioning within a file/Seek file. To position the pointer to a specific position in a file,the seek system call is used. Once the pointer is positioned, data can be read from andwritten to that position.

• Close a file. When all operations on a file are completed, it must be closed using the closesystem call. The OS searches and erase the file entry from the open-file table.

• Delete a file. When a file is not required, the delete system call searches the directory for thenamed file. Having found the associated directory entry, all file space is released, so that itcan be reused by other files, and erase the directory entry.

• Append a file. To add data at the end of an existing file, append system call is used. Itworks similar to the write system call, except that it positions the pointer at the end of thefile and then performs the write operation.

• Rename file. To change the name of existing file, rename system call is used. This systemcall changes the existing entry for the file name in the directory to the new file name.

Listing of some file types commonly used:

Ques. What is a file? Define File Management System.

Ques. List file attributes. Which operations are performed on files? List some file types commonly used.

2. Access MethodsInformation stored in files are accessed and read into computer memory in several ways:

1. Sequential Access: In simplest access method information in the file is processed in order,one record after the other. A read operation-read next-reads the next portion of the file andautomatically advances a file pointer, which tracks the I/O location. Similarly, the writeoperation-write next-appends to the end of the file and advances to the end of the newlywritten material (the new end of file). For example, editors and compilers usually accessfiles in this fashion. Such a file can be reset to the beginning by rewind operation.

2. Direct Access: For direct access, a file is viewed as a numbered sequence of fixed-lengthblocks or records, that allow programs to read and write records rapidly in no particularorder. Thus, we may read block 14, then read block 53, and then write block 7. Direct-accessfiles are of great use for immediate access to large amounts of information. Databases areoften of this type.

For the direct-access method, the file operations must be modified to include the blocknumber as a parameter. Thus, we have read n, where n is the block number, rather than readnext, and write n rather than write next.

3. Other Access Methods (Indexed sequential access): This method involves theconstruction of an index for the file. The index contains pointers to the various blocks. Tofind a record in the file, we first search the index and then use the pointer to access the filedirectly and to find the desired record. Following figure shows an example of indexed sequential-access method, where the file iskept sorted on a defined key (last name). To find a particular item, we first make a search ofthe index file, which provides the block number of the secondary index. This blockcontaining the desired record is read in and thus sequential search on index is performed.

With large files, the index file itself may become too large to be kept in memory. Onesolution is to create an index for the index file. The primary index file would containpointers to secondary index files, which would point to the actual data items.

3. Directory structureA computer system stores numerous data on disks. To manage this data, the disk is divided into oneor more partitions (also known as volumes) and each partition contains information about the filesstored in it. This information is stored in a directory. In simplest terms, directory is a flat file thatstores information about the files and subdirectories and organizes files hierarchically. The systemviews them differently from ordinary files.

Directory operations include:

• Create a file – new files can be created and added to the directory as new directory entry.• Search for a file – when a file is required, corresponding entry is serached in the directory.• Delete a file – a file erase from the directory, when it is no longer required.• List a directory – contents of the directory entry is listed.• Rename a file – a file may be renamed, wchich may change its position in the directory.• Traverse the file system – every file in the directory can be accessed.

Following are the commonly used directory structure:

1. Single-Level DirectoryIt is the simplest form of directory structure, in which a single directory contains all files for allusers. Sometimes this directory is referred to as root directory. It is simple to implement, but thedrawback is that no two files have the same name.

2. Two-Level DirectoryIn this directory structure, each user gets their own directory space called user file directory(UFD). File names only need to be unique within a given user's directory.

A master file directory (MFD) is used to keep track of each users directory, and must bemaintained when users are added to or removed from the system.

3. Hierarchical-Level Directory/Tree-Structured DirectoriesThe two-level hierarchy eliminates file name conflicts between users. But another problem is thatusers with many files may want to group them in smaller subgroups. So, two-tiered directorystructure is extended for new directory structure called Hierarchical Directory Structure.

Here each user / process has the concept of a current directory from which all ( relative ) searchestake place. Files may be accessed using either absolute pathnames (relative to the root of the tree)or relative pathnames (relative to the current directory).

All these three directory structures are shown as (a), (b) and (c) respectively in the above figure. A,B and C represents three different users. Enclosing rectangles represent directories of correspondingusers and enclosing circles represent files of those users.

4. Acyclic-Graph Directories:A tree structure prohibits the sharing of files or directories. An acyclic graph( a graph with nocycles) allows directories to share subdirectories and files. The acyclic graph is a naturalgeneralization of the tree-structured directory scheme.

Shared files and subdirectories can be implemented in several ways: 1) A common way, used bymany of the UNIX systems, is to create a new directory entry called a link. A link is effectively apointer to another file or subdirectory. 2) Another common approach to implement shared files issimply to duplicate all information. Thus, both entries are identical and equal.

5. General Graph DirectoryThe primary advantage of an acyclic graph is the relative simplicity of the algorithms to traverse thegraph and to determine when there are no more references to a file. We want to avoid traversingshared sections of an acyclic graph twice, mainly for performance reasons. With acyclic-graphdirectory structures, a value of 0 in the reference count means that there are no more references tothe file or directory, and the file can be deleted. However, when cycles exist, the reference countmay not be 0 even when it is no longer possible to refer to a directory or file, because of self-referencing concept.

So, A serious problem with using an acyclic-graph structure is ensuring that there are no cycles. Ifwe start with a two-level directory and allow users to create subdirectories, a tree-structureddirectory results. It should be fairly easy to see that simply adding new files and subdirectories to anexisting tree-structured directory preserves the tree-structured nature. However when we add links,the tree structure is destroyed, resulting in a simple graph structure.

Ques: What is a Directory? What are the commonly used directory structure?Ques: Differentiate between a file and a directory.

Ques: Justify the need of file system.Ans: File systems provide efficient and convenient access to the disk by allowing data to be stored,located, and retrieved easily. A file system provides solution to following design problems:

1. To define how the file system should look to the user. This task involves defining a file andits attributes, the operations allowed on a file, and the directory structure for organizingfiles.

2. To create algorithms and data structures to map the logical file system onto the physicalsecondary-storage devices.

The file system itself is generally composed of many different levels as shown in following figure.

Each level uses the features of lower levels to create new features for use by higher levels.

The lowest level, the I/O control, consists of device drivers and interrupt handlers to transferinformation between the main memory and the disk system.

The basic file system needs only to issue generic commands to the appropriate device driver to readand write physical blocks on the disk. Each physical block is identified by its numeric disk address(for example, drive 1, cylinder 73, track 2, sector 10).

The file-organization module knows about files and their logical blocks, as well as physical blocks.It also includes the free-space manager, which tracks unallocated blocks and provides these blocksto the file-organization module when requested.

Finally, the logical file system manages metadata information, which includes all of the file-systemstructure except the actual data (or contents of the files). It is also responsible for protection andsecurity.

To create a new file, an application program calls the logical file system. The logical file systemknows the format of the directory structures. To create a new file, it allocates a new FCB. Thesystem then reads the appropriate directory into memory, updates it with the new file name andFCB, and writes it back to the disk.

Ques: What are the structures used in file-system implementation? File system provide efficient and convenient access to the disk by allowing data to be stored,located, and retrieved easily. On disk, the file system may contain information about the totalnumber of blocks, the number and location of free blocks, the directory structure, and individualfiles. Several on-disk and in-memory structures are used to implement a file system.

Important data structures stored on disk are briefly explained here:

• A boot-control block, ( per volume ) can contain information needed by the system to boot

an operating system from that volume. If the disk does not contain an operating system, thisblock will be left empty.

• A volume control block, (per volume) contains volume (or partition) details such number of

blocks in the partition, size of blocks, free-block counts, and free-block pointers.

• A directory structure (per file system), containing file names and pointers to corresponding

FCBs. UNIX uses inode numbers, and windows uses a master file table.

• The File Control Block, FCB, (per file) contains many details about the file. It has a unique

identifier number to allow association with a directory entry. UNIX stores this informationin inodes, and windows maintains a relational database structure within the master file table.

There are also several key data structures stored in memory:• An in-memory mount table, contains information about each mounted volumes. (Before

you can access the files on a file system, you need to mount the file system. Mounting a filesystem attaches that file system to a directory (mount point) and makes it available to thesystem.)

• An in-memory directory cache holds the directory information of recently accesseddirectories.

• A system-wide open file table, contains a copy of the FCB of each open file, as well assome other related information.

• A per-process open file table, contains a pointer to the appropriate entry in the system-wideopen-file table, as well as other information.

Ques: Explain allocation strategies: Contoguous, linked and indexed.Files are allocated disk spaces by operating system. The main problem is how to allocate space tothese files so that disk space is utilized effectively and files can be accessed quickly. Three majormethods of allocating disk space are in wide use:

1. Contiguous Allocation: It requires that each file occupy a set of contiguous blocks on disk.Disk addresses define a linear ordering on the disk. Contiguous allocation of a file is definedby the disk address and length (in block units) of the first block. If the file is n blocks longand starts at location b, then it occupies blocks b, b + 1, b + 2, ... , b + n – 1. Advantage:

1. Accessing a file that has been allocated contiguously is easy.2. Both sequential and direct access can be supported by contiguous allocation.

Drawbacks:1. difficulty in finding space for a new file.2. As files are allocated and deleted, the free disk space is broken into little pieces

resulting into external fragmentation.

2. Linked Allocation: With this linked allocation, each file is a linked list of disk blocks; thedisk blocks may be scattered anywhere on the disk. The directory contains a pointer to thefirst and last blocks of the file. Each block contains a pointer to the next block. Thesepointers are not made available to the user. For example, linked allocation for a file of fiveblocks might start at block 9 and continue at block 16, then block 1, then block 10, andfinally block 25 can be done in following way:

To create a new file, we simply create a new entry in the directory. With linked allocation,each directory entry has a pointer to the first disk block of the file. The size of a file need notbe declared when that file is created. A file can continue to grow as long as free blocks areavailable. To read a file, we simply read blocks by following the pointers from block toblock.

Advantage: There is no external fragmentation with linked allocation, and any free block onthe free-space list can be used to satisfy a request.

Drawbacks: 1. The major problem is that it can be used effectively only for sequential-access files.

To find the ith block of a file, we must start at the begining of that file and follow thepointers until we get to the ith block. So, it is inefficient to support a direct-accesscapability for linked-allocation files.

2. Each file requires slightly more space because it stores pointers to next block

3. Indexed Allocation: Linked allocation solves the external-fragmentation and size-declaration problems of contiguous allocation. However, linked allocation cannot supportefficient direct access, since the pointers to the blocks are scattered all over the disk andmust be retrieved in order. Indexed allocation solves this problem by bringing all thepointers together into one location: the index block.

Each file has its own index block, which is an array of disk-block addresses. The directorycontains the address of the index block only. When the file is created, all pointers in theindex block are set to nil. When the ith block is first written, a block is obtained from thefree-space manager and its address is put in the ith index-block entry. To find and read theith block, we use the pointer in the ith index-block entry.

Advantage: Indexed allocation supports direct access, without suffering from externalfragmentation, because any free block on the disk can satisfy a request for more space.

Drawback: Indexed allocation does suffer from wasted space because with indexedallocation, an entire index block must be allocated, even if only one or two pointers will benon-nil. However, the pointer overhead of the index block is generally greater than thepointer overhead of linked allocation.

This point raises the questions of how big the index block should be, and how it should beimplemented. There are several approaches:

1. Linked scheme. An index block is normally one disk block. Thus, it can be read andwritten directly by itself. To allow for large files, we can link together several indexblocks.

2. Multilevel index. A variant of linked representation uses a first-level index block topoint to a set of second-level index blocks, which in turn point to the file blocks. Toaccess a block, the operating system uses the first-level index to find a second-levelindex block and then uses that block to find the desired data block. This approachcould be continued to a third or fourth level, depending on the desired maximum filesize.

3. Combined scheme. Another alternative, used in the Unix File Systetm, is to keep thefirst, say, 15 pointers of the index block in the file's inode. The first 12 of thesepointers point to direct blocks (data blocks). Thus, the data for small files (of nomore than 12 blocks) do not need a separate index block. If the block size is 4 KB,then up to 48 KB of data can be accessed directly. The next three pointers point toindirect blocks. The first points to a single indirect block, which is an index blockcontaining not data but the addresses of blocks that do contain data. The secondpoints to a double indirect block, which contains the address of a block thatcontains the addresses of blocks that contain pointers to the actual data blocks. Thelast pointer contains the address of a triple indirect block.

file systems

Education