System Software And Operating System
SYSTEM SOFTWARE & OPERATING SYSTEM
UNIT I
Introduction –System Software and machine architecture-Assemblers-Basic assembler functions -
Machine dependent features-program relocation-Machine independent features – literals - symbol
defining statements-expressions-program blocks-control sections and program linking - Assembler
design options-one pass assemblers-multi pass assemblers.
UNIT II
Loader and Linkers: Basic Loader Functions - Machine dependent loader features – relocation –
program – linking - Machine independent loader features - Automatic Library search - Loader
options - Loader design options - linkage editor - dynamic linking - Bootstrap loader.
Text Editors: Overview of editing process - user interface - editor structure
UNIT III
Machine dependent compiler features - Intermediate form of the program-Machine dependent code
optimization-machine independent compiler features-Compiler design options-division into passes-
interpreters - P-code compilers - compiler-compilers.
UNIT IV
Introduction: Definition of DOS – History of DOS – Definition of Process - Process states - process
states transition – Interrupt processing – interrupt classes - Storage Management Real Storage: Real
storage management strategies – Contiguous versus
Non-contiguous storage allocation – Single User Contiguous Storage allocation- Fixed partition
multiprogramming – Variable partition multiprogramming. Virtual Storage: Virtual storage
management strategies – Page replacement strategies – Working sets – Demand paging – page size.
UNIT V
Processor Management Job and Processor Scheduling: Preemptive Vs Non-preemptive scheduling –
Priorities – Deadline scheduling - Device and Information Management Disk Performance
Optimization: Operation of moving head disk storage – Need for disk scheduling – Seek
Optimization .File and Database Systems: File System – Functions – Organization – Allocating and
freeing space – File descriptor – Access control matrix.
UNIT I
Introduction –System Software and machine architecture-Assemblers-Basic assembler functions -
Machine dependent features-program relocation-Machine independent features – literals - symbol
defining statements-expressions-program blocks-control sections and program linking - Assembler
design options-one pass assemblers-multi pass assemblers.
Loader and Linkers: Basic Loader Functions - Machine dependent loader features – relocation –
program – linking - Machine independent loader features - Automatic Library search - Loader
options - Loader design options - linkage editor - dynamic linking - Bootstrap loader.
INTRODUCTION
System Software consists of a variety of programs that support the operation of a computer.
It makes it possible for the user to focus on an application or other problem to be solved,
without needing to know the details of how the machine works internally.
You probably wrote programs in a high-level language like C, C++, or VC++, using a text
editor to create and modify the program.
You translated these programs into machine languages using a compiler.
The resulting machine language program was loaded into memory and prepared for
execution by a loader and linker. A debugger was also used to find errors in the programs.
System software refers to the files and programs that make up your computer's operating
system. System files include libraries of functions, system services, drivers for printers
and other hardware, system preferences, and other configuration files.
The programs that are part of the system software include assemblers, compilers, file
management tools, system utilities, and debuggers.
The system software is installed on your computer when you install your operating
system.
You can update the software by running programs such as "Windows Update" for
Windows or "Software Update" for Mac OS X. Unlike application programs, however,
system software is not meant to be run by the end user.
For example, while you might use your Web browser every day, you probably don't have
much use for an assembler program (unless, of course, you are a computer programmer).
Since system software runs at the most basic level of your computer, it is called "low-level" software.
It generates the user interface and allows the operating system to interact with the
hardware. Fortunately, you don't have to worry about what the system software is doing
since it just runs in the background.
One characteristic in which most system software differs from application software is machine
dependency.
System software – support operation and use of computer.
Application software - solution to a problem.
Application Software
Different types of application software are used by individual users and business
enterprises alike, and both derive many benefits from doing so.
It includes word processing software, database software, multimedia software, editing
software, and many other kinds as well.
All of this software is either sold individually, or packaged together and sold by
business-to-business sellers.
When a whole variety of them are integrated collectively and sold to a business, they can
take the form of enterprise software, educational software, simulation software,
information worker software, etc.
Advantages
If you compare the two, you will find that the pros easily outweigh the cons.
With that in mind, here are some of the most popular and widely accepted benefits.
Note that in this scenario, we are speaking of application software that is designed for a
specific purpose, to be used either by individuals or by businesses.
Their single biggest advantage is that they meet the exact needs of the user. Since such
software is designed with one specific purpose in mind, the user knows exactly which
software to use to accomplish the task.
The threat of viruses invading custom-made applications is very small, since any business
that incorporates it can restrict access and can come up with means to protect their
network as well.
Licensed application software gets regular updates from the developer for security
reasons.
Additionally, the developer also regularly sends personnel to correct any problems that
may arise from time to time.
Disadvantages
As with all such matters, there are certain disadvantages of such software as well.
Though these are not often spoken about or highlighted, the fact is that they do exist
and affect certain users.
People have accepted these shortcomings and still continue to use such software because
its utility and importance are much more profound than its weaknesses.
Developing application software designed to meet specific purposes can prove to be quite
costly for developers.
This can affect their budget and their revenue flow, especially if too much time is spent
developing software that is not generally acceptable.
Some software that is designed specifically for a certain business may not be
compatible with other general software.
This is something that can prove to be a major stumbling block for many corporations.
Developing them is something that takes a lot of time, because it needs constant
communication between the developer and the customer. This delays the entire
production process, which can prove to be harmful in some cases.
Application software that is used commonly by many people, and then shared online,
carries a very real threat of infection by a computer virus or other malicious programs.
So whether you are buying them off the shelf, or whether you are hiring a developer to build
specific software for you, all of these points will seem pertinent to you. Many individuals and
businesses have regularly found the need and the requirement for such software, and the fact
remains that any computing device will be utterly useless without such software running on
it.
Assembler
An assembler translates mnemonic instructions into machine code. The instruction formats,
addressing modes, etc., are of direct concern in assembler design. Similarly, compilers must
generate machine language code, taking into account such hardware characteristics as the number
and type of registers and the machine instructions available.
Operating systems
An operating system is directly concerned with the management of nearly all of the resources
of a computing system.
There are aspects of system software that do not directly depend upon the type of computing
system: the general design and logic of an assembler, the general design and logic of a compiler,
and code optimization techniques that are independent of target machines.
Likewise, the process of linking together independently assembled subprograms does not
usually depend on the computer being used.
Simplified Instructional Computer (SIC) is a hypothetical computer that includes the hardware
features most often found on real machines. There are two versions of SIC, they are, standard model
(SIC), and, extension version (SIC/XE) (extra equipment or extra expensive).
Later, you probably wrote programs in assembler language, using macro instructions to
read and write data. You used an assembler, which included a macro processor, to translate these
programs into machine language.
You controlled all these processes by interacting with the operating system of the computer.
The operating system took care of all the machine-level details for you. You could concentrate on
what you wanted to do, without worrying about how it was accomplished.
You will come to understand the processes that were going on “ behind the scenes” as you
used the computer in previous courses. By understanding the system software, you will gain a deeper
understanding of how computers actually work.
SYSTEM SOFTWARE AND MACHINE ARCHITECTURE
One characteristic in which most system software differs from application software is
machine dependency. An application program is primarily concerned with the solution of some
problem, using the computer as a tool. The focus is on the application, not on the computing system.
System programs, on the other hand, are intended to support the operation and use of the computer
itself, rather than any particular application. For this reason, they are usually related to the
architecture of the machine on which they are to run.
For example, assemblers translate mnemonic instructions into machine code; the instruction
formats, addressing modes, etc., are of direct concern in assembler design. Similarly, compilers must
generate machine language code, taking into account such hardware characteristics as the number
and type of registers and the machine instructions available.
Operating systems are directly concerned with the management of nearly all of the resources
of a computing system. Many other examples of such machine dependencies may be found throughout
this book. On the other hand, there are some aspects of system software that do not directly
depend upon the type of computing system being supported.
For example, the general design and logic of an assembler is basically the same on most
computers. Some of the code optimization techniques used by compilers are independent of the
target machine (although there are also machine-dependent optimizations). Likewise, the process of
linking together independently assembled subprograms does not usually depend on the computer
being used.
Assembler is system software which is used to convert an assembly language program to its
equivalent object code.
The input to the assembler is a source code written in assembly language (using mnemonics)
and the output is the object code. The design of an assembler depends upon the machine architecture
as the language used is mnemonic language.
An application program is primarily concerned with the solution of some problem, using the
computer as a tool. The focus is on the application, not on the computing system. System programs,
on the other hand, are intended to support the operation and use of the computer itself, rather than
any particular application. For this reason, they are usually related to the architecture of the machine
on which they are to run.
For example,
Assemblers translate mnemonic instructions into machine code, the instruction formats,
addressing modes, etc., are of direct concern in assembler design.
Compilers generate machine code, taking into account such hardware characteristics as the
number and type of registers and the machine instructions available.
The operating system is concerned with the management of nearly all resources of a computing
system.
Some system software is machine independent; the process of linking together independently
assembled subprograms, for example, does not usually depend on the computer being used. Other
system software is machine dependent, so we must include real machines and real pieces of
software in our study.
However, most real computers have certain characteristics that are unusual or even unique, and it
can be difficult to distinguish the fundamental features of the software from machine-specific
idiosyncrasies. To avoid this problem, we present the fundamental functions of each piece of
software through discussion of a Simplified Instructional Computer (SIC).
SIC is a hypothetical computer that has been carefully designed to include the hardware
features most often found on real machines, while avoiding unusual or irrelevant complexities.
SIC MACHINE ARCHITECTURE
Memory
Memory consists of 8-bit bytes; any three consecutive bytes form a word (24 bits). All addresses on
SIC are byte addresses; words are addressed by the location of their lowest-numbered byte. There is
a total of 32768 bytes in the computer memory.
Registers
There are five registers, all of which have special uses. Each register is 24 bits in length.
Mnemonic   Number   Special Use
A          0        Accumulator; used for arithmetic operations
X          1        Index register; used for addressing
L          2        Linkage register; the Jump to Subroutine (JSUB) instruction stores the
                    return address in this register
PC         8        Program counter; contains the address of the next instruction to be
                    fetched for execution
SW         9        Status word; contains a variety of information, including a Condition
                    Code (CC)
Data format
Integers are stored as 24-bit binary numbers; 2's complement representation is used for
negative values.
Characters are stored using their 8-bit ASCII codes.
There is no floating-point hardware on SIC.
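The 24-bit two's-complement integer format described above can be sketched in Python as follows (to_sic_int and from_sic_int are illustrative names, not part of any real assembler):

```python
def to_sic_int(n):
    """Encode a Python integer as a 24-bit two's-complement SIC word."""
    if not -(2**23) <= n < 2**23:
        raise ValueError("value out of 24-bit range")
    return n & 0xFFFFFF          # masking yields the two's-complement pattern

def from_sic_int(word):
    """Decode a 24-bit two's-complement word back to a signed integer."""
    return word - 0x1000000 if word & 0x800000 else word
```

For example, -1 is stored as FFFFFF, and the sign is carried entirely by the high-order bit of the word.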
Instruction Format
All machine instructions on SIC have the following 24-bit format:

opcode (8 bits) | x (1 bit) | address (15 bits)

The flag bit x is used to indicate indexed-addressing mode.
Addressing Modes
Only two addressing modes are supported, selected by the flag bit x:
– Direct (x = 0): the operand address is taken directly from the address field.
– Indexed (x = 1): the contents of the index register X are added to the address field to
form the operand address.
Parentheses are used to indicate the contents of a register; e.g., (X) denotes the contents of the
index register X.
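A minimal sketch of the two addressing modes, in Python with hypothetical names; it simply applies the rule that indexed mode adds the contents of register X to the address field:

```python
def target_address(addr_field, x_bit, X=0):
    """Compute the SIC effective (target) address.
    Direct (x=0):  TA = address field.
    Indexed (x=1): TA = address field + (X), where (X) is the contents
    of the index register X; wrap within the 15-bit address space."""
    if x_bit:
        return (addr_field + X) & 0x7FFF
    return addr_field
```

This is exactly what an instruction such as STCH BUFFER,X relies on: the label supplies the address field, and register X supplies the running offset.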
Instruction Set
SIC provides a basic set of instructions that are sufficient for most simple tasks.
Load and store registers (LDA, LDX, STA, STX)
Integer arithmetic (ADD, SUB, MUL, DIV), all involve register A and a word in
memory.
Comparison (COMP), involve register A and a word in memory.
Conditional jump (JLE, JEQ, JGT, etc.)
Subroutine linkage (JSUB, RSUB).
INPUT AND OUTPUT
I/O is performed by transferring one byte at a time to or from the rightmost 8 bits of register A.
Each device has a unique 8-bit ID code.
Test device (TD): tests whether a device is ready to send or receive a byte of data.
Read data (RD): reads a byte from the device into register A.
Write data (WD): writes a byte from register A to the device.
SIC/XE Machine Architecture
Mnemonic Number Special Use
B 3 Used for addressing; known as the base register.
S 4 No special use, general purpose register.
T 5 No special use, general purpose register.
F 6 Floating-point accumulator (this register is 48 bits long instead of 24).
Memory
Two versions: SIC and SIC/XE (extra equipment). A SIC program can be executed on
SIC/XE.
Memory consists of 8-bit bytes; 3 consecutive bytes form a word (24 bits).
In total, there are 2^15 (32768) bytes in the memory.
There are 5 registers. Each is 24 bits in length.
Addressing Modes for SIC and SIC/XE
The Simplified Instructional Computer has three instruction formats, and the Extra Equipment
add-on includes a fourth. The instruction formats provide a model for memory and data
management. Each format has a different representation in memory:
Format 1: Consists of 8 bits of allocated memory to store the instruction.
Format 2: Consists of 16 bits of allocated memory: 8 bits to store the instruction
and two 4-bit segments to store operands.
Format 3: Consists of a 6-bit operation code, 6 bits of flag values, and 12
bits of displacement.
Format 4: Only valid on SIC/XE machines; consists of the same elements as
format 3, but instead of a 12-bit displacement stores a 20-bit address.
Both format 3 and format 4 contain the following six flag bits:
n: Indirect addressing flag
i: Immediate addressing flag
x: Indexed addressing flag
b: Base address-relative flag
p: Program counter-relative flag
e: Format 4 instruction flag
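Assuming the format 3 bit layout above (a 6-bit opcode, then the n, i, x, b, p, e flags, then a 12-bit displacement), the flags could be decoded from a 24-bit instruction word with a sketch like this (illustrative Python, not production code):

```python
def decode_flags(instr3):
    """Extract the n,i,x,b,p,e flags and displacement from a 24-bit
    format 3 instruction word laid out as:
    opcode(6) | n | i | x | b | p | e | disp(12)."""
    return {
        'opcode': (instr3 >> 18) & 0x3F,
        'n': (instr3 >> 17) & 1,   # indirect addressing
        'i': (instr3 >> 16) & 1,   # immediate addressing
        'x': (instr3 >> 15) & 1,   # indexed addressing
        'b': (instr3 >> 14) & 1,   # base-relative
        'p': (instr3 >> 13) & 1,   # PC-relative
        'e': (instr3 >> 12) & 1,   # extended (format 4)
        'disp': instr3 & 0xFFF,
    }
```

A format 4 instruction would be handled the same way, except that e = 1 and the low 20 bits hold an address rather than a 12-bit displacement.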
SIC PROGRAMMING EXAMPLES:
COPY    START   1000
FIRST   STL     RETADR
CLOOP   JSUB    RDREC
        LDA     LENGTH
        COMP    ZERO
        JEQ     ENDFIL
        JSUB    WRREC
        J       CLOOP
ENDFIL  LDA     EOF
        STA     BUFFER
        LDA     THREE
        STA     LENGTH
        JSUB    WRREC
        LDL     RETADR
        RSUB
EOF     BYTE    C'EOF'
THREE   WORD    3
ZERO    WORD    0
RETADR  RESW    1
LENGTH  RESW    1
BUFFER  RESB    4096
.
.       SUBROUTINE TO READ RECORD INTO BUFFER
.
RDREC   LDX     ZERO
        LDA     ZERO
RLOOP   TD      INPUT
        JEQ     RLOOP
        RD      INPUT
        COMP    ZERO
        JEQ     EXIT
        STCH    BUFFER,X
        TIX     MAXLEN
        JLT     RLOOP
EXIT    STX     LENGTH
        RSUB
INPUT   BYTE    X'F1'
MAXLEN  WORD    4096
.
.       SUBROUTINE TO WRITE RECORD FROM BUFFER
.
WRREC   LDX     ZERO
WLOOP   TD      OUTPUT
        JEQ     WLOOP
        LDCH    BUFFER,X
        WD      OUTPUT
        TIX     LENGTH
        JLT     WLOOP
        RSUB
OUTPUT  BYTE    X'05'
        END     FIRST
ASSEMBLERS
The design and implementation of assemblers. There are certain fundamental functions that
any assembler must perform, such as translating mnemonic operation codes to their machine
language equivalents and assigning machine addresses to symbolic labels used by the programmer.
If we consider only these fundamental functions, most assemblers are very much alike. Beyond this
most basic level, however, the features and design of an assembler depend heavily upon the source
language it translates and the machine language it produces.
One aspect of this dependence is, of course, the existence of different machine instruction
formats and codes to accomplish (for example) an ADD operation. As we shall see, there are also
many subtler ways that assemblers depend upon machine architecture. On the other hand, there are
some features of an assembler language (and the corresponding assembler) that have no direct
relation to machine architecture—they are, in a sense, arbitrary decisions made by the designers of
the language. We begin by considering the design of a basic assembler for the standard version of
our Simplified Instructional Computer (SIC).
It introduces the most fundamental operations performed by a typical assembler, and
describes common ways of accomplishing these functions. The algorithms and data structures that
we describe are shared by almost all assemblers. Thus this level of presentation gives us a starting
point from which to approach the study of more advanced assembler features. We can also use this
basic structure as a framework from which to begin the design of an assembler for a completely new
or unfamiliar machine. We examine some typical extensions to the basic assembler structure that
might be dictated by hardware considerations.
An assembler is a program that takes basic computer instructions and converts them into a
pattern of bits that the computer's processor can use to perform its basic operations. Some people call
these instructions assembler language and others use the term assembly language.
The design of an assembler can be seen as performing the following:
Scanning (tokenizing)
Parsing (validating the instructions)
Creating the symbol table
Resolving the forward references
Converting into the machine language
In other words, the design of the assembler must:
Convert mnemonic operation codes to their machine language equivalents
Convert symbolic operands to their equivalent machine addresses
Decide the proper instruction format
Convert the data constants to internal machine representations
Write the object program and the assembly listing. So for the design of the assembler we
need to concentrate on the machine architecture of the SIC/XE machine.
BASIC ASSEMBLER FUNCTIONS
We use variations of this program throughout this chapter to show different assembler
features. The line numbers are for reference only and are not part of the program. These numbers
also help to relate corresponding parts of different versions of the program. The mnemonic
instructions used are those introduced in Appendix A. Indexed addressing is indicated by adding
the modifier, "X" following the operand (see line 160). Lines beginning with "." contain comments
only. In addition to the mnemonic machine instructions, we have used the following assembler
directives:
START   Specify the name and starting address for the program.
END     Indicate the end of the source program and (optionally) specify the first
        executable instruction in the program.
BYTE    Generate a character or hexadecimal constant, occupying as many bytes as needed
        to represent the constant.
WORD    Generate a one-word integer constant.
RESB    Reserve the indicated number of bytes for a data area.
RESW    Reserve the indicated number of words for a data area.
The program contains a main routine that reads records from an input device (identified with
device code F1) and copies them to an output device (code 05). This main routine calls subroutine
RDREC to read a record into a buffer and subroutine WRREC to write the record from the buffer to
the output device.
Line  Source statement
5     COPY    START  1000            COPY FILE FROM INPUT TO OUTPUT
10    FIRST   STL    RETADR          SAVE RETURN ADDRESS
15    CLOOP   JSUB   RDREC           READ INPUT RECORD
20            LDA    LENGTH          TEST FOR EOF (LENGTH = 0)
25            COMP   ZERO
30            JEQ    ENDFIL          EXIT IF EOF FOUND
35            JSUB   WRREC           WRITE OUTPUT RECORD
40            J      CLOOP           LOOP
45    ENDFIL  LDA    EOF             INSERT END OF FILE MARKER
50            STA    BUFFER
55            LDA    THREE           SET LENGTH = 3
60            STA    LENGTH
65            JSUB   WRREC           WRITE EOF
70            LDL    RETADR          GET RETURN ADDRESS
75            RSUB                   RETURN TO CALLER
80    EOF     BYTE   C'EOF'
85    THREE   WORD   3
90    ZERO    WORD   0
95    RETADR  RESW   1
100   LENGTH  RESW   1               LENGTH OF RECORD
105   BUFFER  RESB   4096            4096-BYTE BUFFER AREA
110   .
115   .       SUBROUTINE TO READ RECORD INTO BUFFER
120   .
125   RDREC   LDX    ZERO            CLEAR LOOP COUNTER
130           LDA    ZERO            CLEAR A TO ZERO
135   RLOOP   TD     INPUT           TEST INPUT DEVICE
140           JEQ    RLOOP           LOOP UNTIL READY
145           RD     INPUT           READ CHARACTER INTO REGISTER A
150           COMP   ZERO            TEST FOR END OF RECORD (X'00')
155           JEQ    EXIT            EXIT LOOP IF EOR
160           STCH   BUFFER,X        STORE CHARACTER IN BUFFER
165           TIX    MAXLEN          LOOP UNLESS MAX LENGTH
170           JLT    RLOOP           HAS BEEN REACHED
175   EXIT    STX    LENGTH          SAVE RECORD LENGTH
Example of a SIC assembler language program.
Each subroutine must transfer the record one character at a time because the only I/O
instructions available are RD and WD. The buffer is necessary because the I/O rates for the two
devices, such as a disk and a slow printing terminal, may be very different. (In Chapter 6, we see
how to use channel programs and operating system calls on a SIC/XE system to accomplish the
same functions.) The end of each record is marked with a null character (hexadecimal 00). If a
record is longer than the length of the buffer (4096 bytes), only the first 4096 bytes are copied. (For
simplicity, the program does not deal with error recovery when a record containing 4096 bytes or
more is read.) The end of the file to be copied is indicated by a zero-length record. When the end of
file is detected, the program writes EOF on the output device and terminates by executing an RSUB
instruction. We assume that this program was called by the operating system using a JSUB
instruction; thus, the RSUB will return control to the operating system.
A Simple SIC Assembler
The listing that follows shows the generated object code for each statement. The column headed Loc gives the machine
address (in hexadecimal) for each part of the assembled program. We have assumed that the program
starts at address 1000. (In an actual assembler listing, of course, the comments would be retained;
they have been eliminated here to save space.) The translation of source program to object code
requires us to accomplish the following functions (not necessarily in the order given):
1. Convert mnemonic operation codes to their machine language equivalents—e.g., translate STL to
14 (line 10).
2. Convert symbolic operands to their equivalent machine addresses—e.g., translate RETADR to
1033 (line 10).
3. Build the machine instructions in the proper format.
4. Convert the data constants specified in the source program into their internal machine
representations—e.g., translate EOF to 454F46 (line 80).
5. Write the object program and the assembly listing.
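Function 4 above (converting data constants) can be sketched as follows; the helper names are hypothetical, and the behavior shown is the BYTE/WORD conversion described in the text, e.g. C'EOF' becoming 454F46:

```python
def byte_constant(operand):
    """Translate a BYTE operand into object-code hex.
    C'...' -> the ASCII code of each character; X'...' -> the hex
    digits exactly as written."""
    if operand.startswith("C'") and operand.endswith("'"):
        return ''.join('%02X' % ord(c) for c in operand[2:-1])
    if operand.startswith("X'") and operand.endswith("'"):
        return operand[2:-1].upper()
    raise ValueError("unrecognized BYTE operand")

def word_constant(n):
    """Translate a WORD operand into a 6-hex-digit (3-byte) constant."""
    return '%06X' % (n & 0xFFFFFF)
```

These conversions are context free, which is why function 4 (unlike address translation) needs no second pass.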
All of these functions except number 2 can easily be accomplished by sequential processing of the
source program, one line at a time. The translation of addresses, however, presents a problem.
Consider the statement
10   1000   FIRST   STL   RETADR   141033
Line Loc Source statement Object code
5 1000 COPY START 1000
10 1000 FIRST STL RETADR 141033
15 1003 CLOOP JSUB RDREC 482039
20 1006 LDA LENGTH 001036
25 1009 COMP ZERO 281030
30 100C JEQ ENDFIL 301015
35 100F JSUB WRREC 482061
40 1012 J CLOOP 3C1003
45 1015 ENDFIL LDA EOF 00102A
50 1018 STA BUFFER 0C1039
55 101B LDA THREE 00102D
60 101E STA LENGTH 0C1036
65 1021 JSUB WRREC 482061
70 1024 LDL RETADR 081033
75 1027 RSUB 4C0000
80 102A EOF BYTE C'EOF' 454F46
85 102D THREE WORD 3 000003
90 1030 ZERO WORD 0 000000
95 1033 RETADR RESW 1
100 1036 LENGTH RESW 1
Program with object code
This instruction contains a forward reference, that is, a reference to a label (RETADR) that is
defined later in the program. If we attempt to translate the program line by line, we will be unable to
process this statement because we do not know the address that will be assigned to RETADR.
Because of this, most assemblers make two passes over the source program. The first pass does little
more than scan the source program for label definitions and assign addresses.
The second pass performs most of the actual translation previously described. In addition to
translating the instructions of the source program, the assembler must process statements called
assembler directives (or pseudo-instructions). These statements are not translated into machine
instructions (although they may have an effect on the object program).
Instead, they provide instructions to the assembler itself. Examples of assembler directives
are statements like BYTE and WORD, which direct the assembler to generate constants as part of
the object program, and RESB and RESW, which instruct the assembler to reserve memory locations
without generating data values. The other assembler directives in our sample program are START,
which specifies the starting memory address for the object program, and END, which marks the end
of the program.
Finally, the assembler must write the generated object code onto some output device. This
object program will later be loaded into memory for execution. The simple object program format
we use contains three types of records: Header, Text, and End. The Header record contains the
program name, starting address, and length.
Text records contain the translated (i.e., machine code) instructions and data of the program,
together with an indication of the addresses where these are to be loaded. The End record marks the
end of the object program and specifies the address in the program where execution is to begin. (This
is taken from the operand of the program's END statement. If no operand is specified, the address of
the first executable instruction is used.)
The formats we use for these records are as follows. The details of the formats (column
numbers, etc.) are arbitrary; however, the information contained in these records must be present (in
some form) in the object program.
Header record:
Col. 1 H
Col. 2-7 Program name
Col. 8-13 Starting address of object program (hexadecimal)
Col. 14-19 Length of object program in bytes (hexadecimal)
Text record:
Col. 1 T
Col. 2-7 Starting address for object code in this record (hexadecimal)
Col. 8-9 Length of object code in this record in bytes (hexadecimal)
Col. 10-69 Object code, represented in hexadecimal (2 columns per byte of object code)
End record:
Col. 1 E
Col. 2-7 Address of first executable instruction in object program (hexadecimal)
To avoid confusion, we have used the term column rather than byte to refer to positions within
object program records. This is not meant to imply the use of any particular medium for the object
program.
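Under the column layout just described, the three record types could be emitted with a sketch like this (Python, illustrative function names; the field widths follow the format above):

```python
def header_record(name, start, length):
    """H record: program name in cols 2-7 (space padded), then the
    starting address and program length, each as 6 hex digits."""
    return 'H%-6s%06X%06X' % (name[:6], start, length)

def text_record(start, objcodes):
    """T record: starting address, byte count of the object code in
    this record (2 hex digits), then the object code itself."""
    body = ''.join(objcodes)               # each item is hex, 2 cols/byte
    return 'T%06X%02X%s' % (start, len(body) // 2, body)

def end_record(first_exec):
    """E record: address of the first executable instruction."""
    return 'E%06X' % first_exec
```

For instance, the COPY program starting at 1000 with length 107A would produce the header record HCOPY  00100000107A.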
The symbol ^ is used to separate fields visually. Of course, such symbols are not present in
the actual object program. Note that there is no object code corresponding to addresses 1033-2038.
This storage is simply reserved by the loader for use by the program during execution. (Chapter 3
contains a detailed discussion of the operation of the loader.) We can now give a general description
of the functions of the two passes of our simple assembler.
Pass 1 (define symbols):
1. Assign addresses to all statements in the program.
2. Save the values (addresses) assigned to all labels for use in pass 2.
3. Perform some processing of assembler directives. (This includes processing that affects
address assignment, such as determining the length of data areas defined by BYTE, RESW, etc.)
Pass 2 (assemble instructions and generate object program):
1. Assemble instructions (translating operation codes and looking up addresses).
2. Generate data values defined by BYTE, WORD, etc.
3. Perform processing of assembler directives not done during Pass 1.
4. Write the object program and the assembly listing.
In the next section we discuss these functions in more detail, describe the internal tables required by
the assembler, and give an overall description of the logic flow of each pass.
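The two passes described above can be sketched as a toy two-pass assembler for plain SIC, where every machine instruction is 3 bytes. The OPTAB subset, the tuple representation of statements, and the function names are all simplifying assumptions for illustration, not a real assembler:

```python
# Hypothetical opcode table: mnemonic -> machine code (standard SIC,
# so every instruction occupies 3 bytes).
OPTAB = {'LDA': 0x00, 'STL': 0x14, 'JSUB': 0x48, 'RSUB': 0x4C}

def pass1(lines):
    """Pass 1: assign an address to every statement and define symbols.
    Each line is a (label, opcode, operand) tuple."""
    symtab, locctr, stmts = {}, 0, []
    for label, opcode, operand in lines:
        if opcode == 'START':
            locctr = int(operand, 16)        # operand is a hex address
        if label:
            symtab[label] = locctr
        stmts.append((locctr, label, opcode, operand))
        if opcode in OPTAB or opcode == 'WORD':
            locctr += 3
        elif opcode == 'RESW':
            locctr += 3 * int(operand)
    return symtab, stmts

def pass2(symtab, stmts):
    """Pass 2: translate instructions using OPTAB and SYMTAB."""
    out = []
    for loc, label, opcode, operand in stmts:
        if opcode in OPTAB:
            addr = symtab.get(operand, 0)    # 0 when no operand (RSUB)
            out.append('%02X%04X' % (OPTAB[opcode], addr))
    return out
```

Note how the forward-reference problem disappears: by the time pass 2 runs, every label (such as RETADR) already has an address in SYMTAB.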
ASSEMBLER ALGORITHM AND DATA STRUCTURES
The simple assembler uses two major internal data structures:
Operation Code Table (OPTAB)
Symbol Table (SYMTAB)
Operation Code Table (OPTAB):
OPTAB is used to look up mnemonic operation codes and translate them to their machine
language equivalents.
In more complex assemblers the table contains information about instruction format and
length.
In pass 1, OPTAB is used to look up and validate the operation code in the source
program.
In pass 2, it is used to translate the operation codes to machine language. On the simple SIC
machine this process can be performed either in pass 1 or in pass 2.
But for a machine like SIC/XE that has instructions of different lengths, we must search
OPTAB in the first pass to find the instruction length for incrementing LOCCTR.
In pass 2 we take the information from OPTAB to tell us which instruction format to use in
assembling the instruction, and any peculiarities of the object code instruction.
OPTAB is usually organized as a hash table, with mnemonic operation code as the key. The
hash table organization is particularly appropriate, since it provides fast retrieval with a minimum of
searching. In most cases OPTAB is a static table; that is, entries are not normally added to or
deleted from it. In such cases it is possible to design a special hashing function or other data
structure to give optimum performance for the particular set of keys being stored.
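A Python dict is itself a hash table, so an OPTAB sketch keyed by mnemonic might look like this; the entries shown and the format field are an illustrative subset, used here to compute the instruction length that pass 1 needs:

```python
# OPTAB sketch for SIC/XE: each entry records the machine code and the
# default instruction format, so pass 1 can advance LOCCTR correctly.
OPTAB = {
    'LDA':   {'opcode': 0x00, 'format': 3},
    'COMP':  {'opcode': 0x28, 'format': 3},
    'JSUB':  {'opcode': 0x48, 'format': 3},
    'TIX':   {'opcode': 0x2C, 'format': 3},
    'CLEAR': {'opcode': 0xB4, 'format': 2},   # SIC/XE register instruction
}

def instruction_length(mnemonic, extended=False):
    """Length in bytes for pass 1; a '+' prefix in the source selects
    format 4, which is one byte longer than format 3."""
    entry = OPTAB[mnemonic]            # KeyError here = invalid opcode
    fmt = entry['format']
    return fmt + 1 if (fmt == 3 and extended) else fmt
```

A lookup that raises KeyError doubles as the pass 1 validity check for operation codes.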
Symbol Table (SYMTAB):
The symbol table includes the name and value for each label in the source program, together with
flags to indicate error conditions (e.g., a symbol defined in two different places).
During Pass 1: labels are entered into the symbol table along with their assigned address
values as they are encountered. All symbol address values should be resolved by the end of Pass 1.
During Pass 2: symbols used as operands are looked up in the symbol table to obtain the
address values to be inserted in the assembled instructions.
SYMTAB is usually organized as a hash table for efficiency of insertion and retrieval. Since
entries are rarely deleted, efficiency of deletion is not an important criterion for optimization. Both Pass 1
and Pass 2 require reading the source program. In addition, an intermediate file is created by Pass
1 that contains each source statement together with its assigned address, error indicators, etc. This
file is one of the inputs to Pass 2.
This copy of the source program serves to retain the results of operations performed during Pass 1
(such as scanning the operand field for symbols and addressing flags), so that these need not be
performed again during Pass 2. Similarly, pointers into OPTAB and SYMTAB are retained for each
operation code and symbol used. This avoids the need to repeat many of the table-searching operations.
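A minimal SYMTAB sketch (our own illustration, not the book's code): Pass 1 inserts each label with its LOCCTR value and flags duplicates, while Pass 2 only performs lookups:

```python
class SymTab:
    """Toy symbol table: name -> (address, error_flag)."""
    def __init__(self):
        self.entries = {}

    def define(self, label, locctr):
        # Pass 1: flag the symbol if it is defined in two different places.
        if label in self.entries:
            addr, _ = self.entries[label]
            self.entries[label] = (addr, "duplicate symbol")
            return False
        self.entries[label] = (locctr, None)
        return True

    def lookup(self, label):
        # Pass 2: return the address, or None for an undefined symbol.
        entry = self.entries.get(label)
        return entry[0] if entry else None

symtab = SymTab()
symtab.define("RETADR", 0x1033)
symtab.define("LENGTH", 0x1036)
```

A second definition of RETADR would leave the original address in place but mark the entry with a duplicate-symbol flag, matching the error handling described above.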
LOCCTR:
System Software And Operating System
20
The location counter helps in the assignment of addresses. LOCCTR is initialized to the beginning
address mentioned in the START statement of the program. After each statement is processed, the
length of the assembled instruction is added to LOCCTR so that it points to the next instruction.
Whenever a label is encountered in an instruction, the current LOCCTR value gives the address to be
associated with that label.
The Algorithm for Pass 1:
Begin
    read first input line
    if OPCODE = 'START' then
        begin
            save #[OPERAND] as starting address
            initialize LOCCTR to starting address
            write line to intermediate file
            read next input line
        end {if START}
    else
        initialize LOCCTR to 0
    while OPCODE != 'END' do
        begin
            if this is not a comment line then
                begin
                    if there is a symbol in the LABEL field then
                        begin
                            search SYMTAB for LABEL
                            if found then
                                set error flag (duplicate symbol)
                            else
                                insert (LABEL, LOCCTR) into SYMTAB
                        end {if symbol}
                    search OPTAB for OPCODE
                    if found then
                        add 3 (instruction length) to LOCCTR
                    else if OPCODE = 'WORD' then
                        add 3 to LOCCTR
                    else if OPCODE = 'RESW' then
                        add 3 * #[OPERAND] to LOCCTR
                    else if OPCODE = 'RESB' then
                        add #[OPERAND] to LOCCTR
                    else if OPCODE = 'BYTE' then
                        begin
                            find length of constant in bytes
                            add length to LOCCTR
                        end
                    else
                        set error flag (invalid operation code)
                end {if not a comment}
            write line to intermediate file
            read next input line
        end {while not END}
    write last line to intermediate file
    save (LOCCTR - starting address) as program length
End {Pass 1}
The algorithm scans the first statement; if it is START, it saves the operand field (the address)
as the starting address of the program, initializes LOCCTR to this address, and writes the line
to the intermediate file. If no operand is mentioned, LOCCTR is initialized to zero. If a label is
encountered, the symbol is entered in the symbol table along with its associated address value.
If the symbol already exists in the table, it is a duplicate definition, so an error flag is set.
The algorithm next checks the mnemonic code by searching for it in OPTAB. If found, the length of
the instruction is added to LOCCTR to make it point to the next instruction.
If the opcode is the directive WORD, it adds 3 to LOCCTR. If it is RESW, it adds 3 times the
number of words to LOCCTR. If it is BYTE, it adds the length of the constant in bytes; if RESB,
it adds the number of bytes reserved. When the END directive is reached, the program length is
found by subtracting the starting address from the current LOCCTR. Each processed line is written
to the intermediate file.
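The LOCCTR increments described above can be sketched as a small helper (an illustration with our own naming, using the fixed 3-byte SIC instruction length):

```python
def pass1_increment(opcode, operand):
    """Bytes to add to LOCCTR for one statement (simple SIC rules)."""
    if opcode == "WORD":
        return 3                       # one word constant
    if opcode == "RESW":
        return 3 * int(operand)        # reserve words
    if opcode == "RESB":
        return int(operand)            # reserve bytes
    if opcode == "BYTE":
        # C'...' takes one byte per character; X'...' one byte per hex pair
        body = operand[2:-1]
        return len(body) if operand.startswith("C") else len(body) // 2
    return 3                           # every simple SIC instruction is 3 bytes

locctr = 0x1000
locctr += pass1_increment("STL", "RETADR")   # ordinary instruction: +3
locctr += pass1_increment("RESW", "1")       # reserve one word: +3
```

For example, BYTE C'EOF' advances LOCCTR by 3 bytes and BYTE X'05' by 1 byte, matching the "find length of constant in bytes" step in the algorithm.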
The Algorithm for Pass 2:
Begin
    read first input line (from intermediate file)
    if OPCODE = 'START' then
        begin
            write listing line
            read next input line
        end {if START}
    write Header record to object program
    initialize first Text record
    while OPCODE != 'END' do
        begin
            if this is not a comment line then
                begin
                    search OPTAB for OPCODE
                    if found then
                        begin
                            if there is a symbol in OPERAND field then
                                begin
                                    search SYMTAB for OPERAND
                                    if found then
                                        store symbol value as operand address
                                    else
                                        begin
                                            store 0 as operand address
                                            set error flag (undefined symbol)
                                        end
                                end {if symbol}
                            else
                                store 0 as operand address
                            assemble the object code instruction
                        end {if opcode found}
                    else if OPCODE = 'BYTE' or 'WORD' then
                        convert constant to object code
                    if object code will not fit into the current Text record then
                        begin
                            write Text record to object program
                            initialize new Text record
                        end
                    add object code to Text record
                end {if not a comment}
            write listing line
            read next input line
        end {while not END}
    write last Text record to object program
    write End record to object program
    write last listing line
End {Pass 2}
Here the first input line is read from the intermediate file. If the opcode is START, then this
line is directly written to the list file. A header record is written in the object program which gives
the starting address and the length of the program (which is calculated during pass 1). Then the first
text record is initialized.
Comment lines are ignored. For each instruction, OPTAB is searched for the opcode to find
the object code. If there is a symbol in the operand field, the symbol table is searched to get its
address value, which is combined with the opcode's object code. If the symbol is not found, zero
is stored as the operand address and an error flag is set indicating an undefined symbol. If there
is no symbol in the operand field, zero is stored as the operand address. The object code
instruction is then assembled.
If the opcode is BYTE or WORD, the constant value is converted to its equivalent
object code (for example, the character constant C'EOF' is stored as its hexadecimal equivalent
454F46). If the object code cannot fit into the current text record, the record is written to the
object program and a new text record is started for the remaining object code. Once the whole
program is assembled and the END directive is encountered, the End record is written.
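The Text-record handling described above can be sketched as follows (our own simplified helper; it assumes each record holds at most 30 bytes of object code, the conventional limit):

```python
def make_text_records(start_addr, codes, max_bytes=30):
    """Pack pieces of object code (hex strings) into Text records.

    Each record is 'T' + 6-digit start address + 2-digit length + object code,
    mirroring the fixed-column object program format described in the text.
    """
    records, current, rec_start = [], "", start_addr
    addr = start_addr
    for code in codes:
        nbytes = len(code) // 2
        if len(current) // 2 + nbytes > max_bytes:   # doesn't fit: flush record
            records.append("T%06X%02X%s" % (rec_start, len(current) // 2, current))
            current, rec_start = "", addr
        current += code
        addr += nbytes
    if current:                                      # write the last Text record
        records.append("T%06X%02X%s" % (rec_start, len(current) // 2, current))
    return records

recs = make_text_records(0x1000, ["141033", "482039", "001036"])
```

The three 3-byte instructions fit in one record, giving T00100009141033482039001036; a longer stream is split whenever the 30-byte limit would be exceeded.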
MACHINE-DEPENDENT ASSEMBLER FEATURES
We now consider the design and implementation of an assembler for the more complex XE version of SIC. In
doing so, we examine the effect of the extended hardware on the structure and functions of the
assembler. Many real machines have architectural features that are similar to those we
consider here. Thus our discussion applies in large part to these machines as well as to SIC/XE. The
example program might be rewritten to take advantage of the SIC/XE instruction set. In our assembler language,
indirect addressing is indicated by adding the prefix @ to the operand; immediate operands are denoted
with the prefix # (lines 25, 55, 133).
Instructions that refer to memory are normally assembled using either the program-counter
relative or the base relative mode. The assembler directive BASE (line 13) is used in conjunction
with base relative addressing. (See Section 2.2.1 for a discussion and examples.) If the displacements
required for both program-counter relative and base relative addressing are too large to fit into a
3-byte instruction, the 4-byte extended format (Format 4) must be used.
• Instructions can be:
1. Register-to-register instructions
2. Instructions with one operand in memory and the other in the accumulator (single-operand instructions)
3. Extended-format instructions
• Addressing modes are:
Index addressing (SIC): OPCODE m, X
Indirect addressing: OPCODE @m
PC-relative: OPCODE m
Base-relative: OPCODE m
Immediate addressing: OPCODE #c
1. Translation of instructions involving the register-to-register addressing mode:
During Pass 1 the registers can be entered as part of the symbol table itself, with their
equivalent numeric codes as values. During Pass 2, these values are assembled along with the
mnemonic's object code. If required, a separate table can be created with the register names
and their equivalent numeric values.
2. Translation involving register-memory instructions: The SIC/XE machine has four
instruction formats and five addressing modes. Among the instruction formats, Format 3 and
Format 4 instructions are of the register-memory type: one operand is always in a register and
the other operand is in memory. The addressing mode tells us the way in which the operand is
to be fetched from memory.
There are two relative modes: program-counter relative and base relative. These use the
Format 3 layout, in which the instruction has the opcode followed by a 12-bit displacement
value in the address field; when the displacement is out of range, Format 4 is used instead,
which contains the opcode followed by a 20-bit address value in the address field.
Program-Counter Relative: This mode usually uses the Format 3 instruction layout. The
instruction contains the opcode followed by a 12-bit displacement value, whose range is
-2048 to +2047. This displacement (which must be small enough to fit in the 12-bit field)
is added to the current contents of the program counter to get the target address of the operand
required by the instruction. This is a relative way of calculating the address of the operand:
the displacement of the operand is relative to the current program counter value.
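The displacement computation can be illustrated with a short sketch (our own helper; disp = target address minus PC, where the PC already holds the address of the next instruction):

```python
def pc_relative_disp(target_addr, next_instr_addr):
    """Return the 12-bit PC-relative displacement, or None if out of range."""
    disp = target_addr - next_instr_addr   # PC points past the current instruction
    if -2048 <= disp <= 2047:
        return disp & 0xFFF                # two's-complement, 12 bits
    return None                            # fall back to base-relative or Format 4

# Example: a 3-byte instruction at 0x0000 referring to a symbol at 0x0030;
# the PC holds 0x0003 when the instruction executes.
disp = pc_relative_disp(0x0030, 0x0003)
```

A negative displacement (a backward reference) is stored in two's-complement form, and a target too far away yields None, signalling that another addressing mode is needed.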
Line Source statement
5 COPY START 1000 COPY FILE FROM INPUT TO OUTPUT
10 FIRST STL RETADR SAVE RETURN ADDRESS
15 CLOOP JSUB RDREC READ INPUT RECORD
20 LDA LENGTH TEST FOR EOF (LENGTH = 0)
25 COMP ZERO
30 JEQ ENDFIL EXIT IF EOF FOUND
35 JSUB WRREC WRITE OUTPUT RECORD
40 J CLOOP LOOP
45 ENDFIL LDA EOF INSERT END OF FILE MARKER
50 STA BUFFER
55 LDA THREE SET LENGTH = 3
60 STA LENGTH
65 JSUB WRREC WRITE EOF
70 LDL RETADR GET RETURN ADDRESS
75 RSUB RETURN TO CALLER
80 EOF BYTE C'EOF'
85 THREE WORD 3
90 ZERO WORD 0
95 RETADR RESW 1
100 LENGTH RESW 1 LENGTH OF RECORD
Example of a SIC program
If the displacements required for both program-counter relative and base relative addressing are
too large to fit into a 3-byte instruction, then the 4-byte extended format (Format 4) must be used.
The extended instruction format is specified with the prefix + added to the operation code in
the source statement (see lines 15, 35, 65). It is the programmer's responsibility to specify this form
of addressing when it is required. The main differences between this version of the program and the
version in Fig. 2.1 involve the use of register-to-register instructions wherever possible. In addition,
immediate and indirect addressing have been used as much as possible (for example, lines 25, 55, and
70).
These changes take advantage of the more advanced SIC/XE architecture to improve the
execution speed of the program. Register-to-register instructions are faster than the corresponding
register-to-memory operations because they are shorter, and, more importantly, because they do not
require another memory reference. (Fetching an operand from a register is much faster than
retrieving it from main memory.)
Likewise, when using immediate addressing, the operand is already present as part of the
instruction and need not be fetched from anywhere. The use of indirect addressing often avoids the
need for another instruction (as in the "return" operation on line 70). You may notice that some of
the changes require the addition of other instructions to the program. This still results in an
improvement in execution speed.
The CLEAR instruction is executed only once for each record read, whereas the benefits of COMPR (as
opposed to COMP) are realized for every byte of data transferred. In Section 2.2.1, we examine the
assembly of this SIC/XE program, focusing on the differences in the assembler that are required by
the new addressing modes. These changes are direct consequences of the extended hardware
functions.
There is also an indirect consequence of the change to SIC/XE. The larger main memory of SIC/XE
means that we may have room to load and run several programs at the same time. This kind of
sharing of the machine between programs is called multiprogramming. Such sharing often results in
more productive use of the hardware.
INSTRUCTION FORMATS AND ADDRESSING MODES
In this section we consider the object code generated for each statement in the program, paying
particular attention to the handling of different instruction formats and different addressing modes.
Note that the START statement now specifies a beginning program address of 0. As we discuss in
the next section, this indicates a relocatable program. For the purposes of instruction assembly,
however, the program will be translated exactly as if it were really to be loaded at machine
address 0.
Translation of register-to-register instructions such as CLEAR and COMPR presents no new
problems. The assembler must simply convert the mnemonic operation code to machine language
(using OPTAB) and change each register mnemonic to its numeric equivalent. This translation is
done during Pass 2, at the same point at which the other types of instructions are assembled.
The conversion of register mnemonics to numbers can be done with a separate table;
however, it is often convenient to use the symbol table for this purpose. To do this, SYMTAB would
be preloaded with the register names (A, X, etc.) and their values (0, 1, etc.).
Most of the register-to-memory instructions are assembled using either program-counter
relative or base relative addressing. The assembler must, in either case, calculate a displacement to
be assembled as part of the object instruction.
This is computed so that the correct target address results when the displacement is added to
the contents of the program counter (PC) or the base register (B). Of course, the resulting
displacement must be small enough to fit in the 12-bit field in the instruction. This means that the
displacement must be between 0 and 4095 (for base relative mode) or between -2048 and +2047
(for program-counter relative mode).
If neither program-counter relative nor base relative addressing can be used (because the
displacements are too large), then the 4-byte extended instruction format (Format 4) must be used.
This 4-byte format contains a 20-bit address field, which is large enough to contain the full memory
address. In this case, there is no displacement to be calculated. For example, in the instruction
15 0006 CLOOP +JSUB RDREC 4B101036
The operand address is 1036. This full address is stored in the instruction, with bit e set to 1
to indicate extended instruction format. Note that the programmer must specify the extended format
by using the prefix + (as on line 15). If extended format is not specified, our assembler first attempts
to translate the instruction using program-counter relative addressing. If this is not possible (because
the required displacement is out of range), the assembler then attempts to use base relative
addressing.
Line Loc Source statement Object code
5 1000 COPY START 1000
10 1000 FIRST STL RETADR 141033
15 1003 CLOOP JSUB RDREC 482039
20 1006 LDA LENGTH 001036
25 1009 COMP ZERO 281030
30 100C JEQ ENDFIL 301015
35 100F JSUB WRREC 482061
40 1012 J CLOOP 3C1003
45 1015 ENDFIL LDA EOF 00102A
50 1018 STA BUFFER 0C1039
55 101B LDA THREE 00102D
60 101E STA LENGTH 0C1036
65 1021 JSUB WRREC 482061
70 1024 LDL RETADR 081033
75 1027 RSUB 4C0000
80 102A EOF BYTE C'EOF' 454F46
85 102D THREE WORD 3 000003
90 1030 ZERO WORD 0 000000
95 1033 RETADR RESW 1
100 1036 LENGTH RESW 1
Program with object code
If neither form of relative addressing is applicable and extended format is not specified, then the
instruction cannot be properly assembled. In this case, the assembler must generate an error message.
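The fallback order just described — extended format if requested, otherwise PC-relative, then base-relative, then an error — can be sketched as follows (a hypothetical helper for illustration only):

```python
def choose_addressing(target, pc, base, extended=False):
    """Pick an addressing mode for a Format 3/4 instruction.

    Returns ('format4', address), ('pc', disp), ('base', disp),
    or ('error', None).
    """
    if extended:                       # programmer wrote the '+' prefix
        return ("format4", target)     # 20-bit address, no displacement needed
    disp = target - pc
    if -2048 <= disp <= 2047:          # try PC-relative first
        return ("pc", disp & 0xFFF)
    if base is not None:
        disp = target - base
        if 0 <= disp <= 4095:          # then try base-relative
            return ("base", disp)
    return ("error", None)             # cannot assemble; report an error
```

The 12-bit ranges match the limits stated above: -2048 to +2047 for PC-relative and 0 to 4095 for base-relative.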
PROGRAM RELOCATION
Sometimes it is required to load and run several programs at the same time. The system must
be able to load these programs wherever there is room in memory. Therefore the exact starting
address is not known until load time.
Absolute Program
In this case the address is specified during assembly itself. This is called absolute assembly.
Consider the instruction:
55 101B LDA THREE 00102D
This statement says that register A is loaded with the value stored at location 102D.
Suppose it is decided to load and execute the program at location 2000 instead of location 1000.
Then the value which needs to be loaded into register A is no longer available at address 102D:
that address has changed by the displacement of the program.
Hence we need to make some changes in the address portion of the instruction so that we can
load and execute the program at location 2000. Apart from the instructions whose operand
addresses change as the program load address changes, some parts of the program remain the
same regardless of where the program is loaded.
Since the assembler does not know the actual location where the program will be loaded, it cannot
make the necessary changes in the addresses used in the program. However, the assembler identifies
for the loader those parts of the program which need modification. An object program that has the
information necessary to perform this kind of modification is called a relocatable program.
Relocatable Program
The above diagram shows the concept of relocation. Initially the program is loaded at
location 0000. The instruction JSUB is loaded at location 0006, and its address field contains
01036, the address of the instruction labeled RDREC. The second figure shows the program loaded
at a new location, 5000: the address in the JSUB instruction is modified to the new location 6036.
Likewise, the third figure shows that if the program is relocated to location 7420, the JSUB
instruction would need to be changed to 4B108456, which corresponds to the new address of
RDREC.
The only parts of the program that require modification at load time are those that specify
direct addresses. The rest of the instructions need not be modified: those whose operands are not
memory addresses (immediate addressing) and those that use PC-relative or base-relative
addressing. From the object program alone, however, it is not possible to distinguish addresses
from constants.
The role of relocation, the ability to execute processes independently from their physical
location in memory, is central for memory management: virtually all the techniques in this field
rely on the ability to relocate processes efficiently. The need for relocation is immediately
evident when one considers that in a general-purpose multiprogramming environment a program
cannot know in advance (before execution, i.e. at compile time) what processes will be running
in memory when it is executed, nor how much memory the system has available for it, nor where
it is located. Hence a program must be compiled and linked in such a way that it can later be
loaded starting from an unpredictable address in memory, an address that can even change
during the execution of the process itself, if any swapping occurs.
It's easy to identify the basic requirement for a (binary executable) program to be
relocatable: all the references to memory it makes during its execution must not contain absolute
(i.e. physical) addresses of memory cells, but must be generated relatively, i.e. as a distance,
measured in number of contiguous memory words, from some known point. The memory
references a program can generate are of two kinds: references to instructions and references to
data. The former kind is implied in the execution of program branches or subroutine calls: a
jump machine instruction always involves the loading of the CPU program counter register with
the address of the memory word containing the instruction to jump to. The executable code of a
relocatable program must then contain only relative branch machine instructions, in which the
address to branch to is specified as an increment (or decrement) with respect to the address of the
current instruction (or to the content of a register or memory word). The latter kind comes into
play whenever program variables (including program execution variables, like a subroutine
call stack) are accessed. In this case relocation is made possible by the use of indexed or
increment processor addressing modes, in which the address of a memory word is computed at
reference time as the sum of the content of a register plus an increment or a decrement.
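The core idea — every memory reference generated as a base plus a relative distance — can be shown with a tiny sketch (illustrative names and addresses only):

```python
def effective_address(load_base, relative_offset):
    """A relocatable reference: physical address = load base + relative offset."""
    return load_base + relative_offset

# The same program image, loaded at two different places in memory:
load1, load2 = 0x0000, 0x5000
rdrec_offset = 0x1036                  # RDREC's distance from the program start
addr_at_1 = effective_address(load1, rdrec_offset)
addr_at_2 = effective_address(load2, rdrec_offset)
```

Because only the base changes between loads, the stored offset 1036 resolves to 1036 at load address 0 and to 6036 at load address 5000, just as in the JSUB relocation example above.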
As we'll see later, the memory references of a process in a multitasking environment must
somehow be bounded, so as to protect from unwanted interference memory areas such as the
unwritable parts of the process itself, or the memory areas containing the images of other
processes. This is usually accomplished in hardware by comparing the address of each
memory reference produced by a process with the content of one or more bound registers or
memory words, so that the processor traps an exception to block the process should an illegal
address be generated.
A scheme of the address computation involved in the memory references of a relocatable
program is shown.
The assembler must keep some information to tell the loader which addresses to change. The
object program that contains modification records is called a relocatable program. For an address
label, the address is assigned relative to the start of the program (START 0). The assembler
produces a Modification record to store the starting location and the length of the address field
to be modified. These commands for the loader are also part of the object program. The
Modification record has the following format:
Modification record
Col. 1 M
Col. 2-7 Starting location of the address field to be modified, relative to the beginning of
the program (Hex)
Col. 8-9 Length of the address field to be modified, in half-bytes (Hex)
One modification record is created for each address to be modified. The length is stored in
half-bytes (4 bits). The starting location is the location of the byte containing the leftmost bits of the
address field to be modified. If the field contains an odd number of half-bytes, the starting location
begins in the middle of the first byte.
(The figure referenced here shows the object program with the address fields requiring
modification highlighted.) The modification records at the end of the object program describe the
changes needed if relocation occurs. For example, M00000705 indicates that the address field
beginning at location 0007 must be modified, and that the field is 5 half-bytes long. Similar
modification records are generated for the other instructions that need change.
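Building a Modification record with the column layout given above can be sketched as follows (our own helper name):

```python
def modification_record(field_start, half_bytes):
    """Build 'M' + 6-hex-digit starting location + 2-hex-digit length in half-bytes."""
    return "M%06X%02X" % (field_start, half_bytes)

# The +JSUB at location 0006: its address field begins one byte later,
# at 0007, and occupies 5 half-bytes (the 20-bit address of a Format 4 instruction).
rec = modification_record(0x000007, 5)
```

This reproduces the M00000705 record discussed in the text: column 1 is M, columns 2-7 the relative starting location, columns 8-9 the length in half-bytes.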
MACHINE-INDEPENDENT ASSEMBLER FEATURES
These are the features which do not depend on the architecture of the machine. They are:
Literals
Symbol-Defining Statements
Expressions
Program blocks
Control sections and program linking.
LITERALS
In programming, a literal is a value written exactly as it is meant to be interpreted. In contrast, a
variable is a name that can represent different values during the execution of the program, and a
constant is a name that represents the same value throughout a program. A literal is not a
name; it is the value itself.
A literal can be a number, a character, or a string. For example, in the expression,
x = 3
x is a variable, and 3 is a literal.
A literal is defined with a prefix = followed by a specification of the literal value.
Example:
45 001A ENDFIL LDA =C'EOF' 032010
This statement specifies a 3-byte operand whose value is the character string EOF.
215 1062 WLOOP TD =X'05' E32011
This statement specifies a 1-byte literal with the hexadecimal value 05.
93 LTORG
002D * =C'EOF' 454F46
The object code for the instruction is also shown above. The operand field holds the
relative displacement of the location where the literal value is stored. In the example the value is at
location 002D, and hence the displacement value is 010.
It is important to understand the difference between a constant defined as a literal and a
constant defined as an immediate operand. In the case of literals, the assembler generates the
specified value as a constant at some other memory location. In immediate mode the operand value
is assembled as part of the instruction itself. Example:
55 0020 LDA #03 010003
All the literal operands used in a program are gathered together into one or more literal pools.
Normally the pool is placed at the end of the program. The assembly listing of a program containing
literals usually includes a listing of this literal pool, which shows the assigned addresses and the
generated data values. In some cases the pool is placed at some other location in the object program
by using the assembler directive LTORG. Whenever LTORG is encountered, the assembler creates
a literal pool that contains all the literal operands used since the previous pool (or the beginning
of the program). It is often better to place the literals close to the instructions that use them.
A literal table is created for the literals which are used in the program. The literal table
contains the literal name, operand value and length. The literal table is usually created as a hash table
on the literal name.
IMPLEMENTATION OF LITERALS
During Pass-1:
Each literal encountered is searched for in the literal table. If the literal already exists, no action
is taken; if it is not present, the literal is added to LITTAB, with its address left unassigned until a
literal pool is created. When Pass 1 encounters an LTORG statement or the end of the program, the
assembler makes a scan of the literal table. At this time each literal currently in the table is
assigned an address. As addresses are assigned, the location counter is updated to reflect the
number of bytes occupied by each literal.
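Pass 1's literal handling can be sketched as follows (an illustration only; LITTAB is keyed on the literal name, as described above):

```python
LITTAB = {}          # literal name -> [operand bytes, length, address or None]

def note_literal(name, value_bytes):
    """Pass 1: record a literal the first time it is seen; address unassigned."""
    if name not in LITTAB:
        LITTAB[name] = [value_bytes, len(value_bytes), None]

def pool_literals(locctr):
    """At LTORG or END: assign addresses and advance the location counter."""
    for entry in LITTAB.values():
        if entry[2] is None:
            entry[2] = locctr
            locctr += entry[1]
    return locctr

note_literal("=C'EOF'", b"EOF")
note_literal("=X'05'", b"\x05")
note_literal("=C'EOF'", b"EOF")        # duplicate: no new entry is made
end = pool_literals(0x002D)            # pool placed at 002D, as in the example
```

Pooling at 002D gives =C'EOF' the address 002D (3 bytes) and =X'05' the address 0030 (1 byte), after which LOCCTR has advanced past both literals.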
During Pass-2:
The assembler searches LITTAB for each literal encountered in an instruction and
replaces it with its assigned address, assembling the literal values as if they had been generated
by BYTE or WORD. If a literal represents an address in the program, the assembler must also
generate a Modification record, since the value is affected by relocation. The following figure
shows the difference between the SYMTAB and LITTAB.
SYMBOL-DEFINING STATEMENTS
EQU Statement:
Most assemblers provide an assembler directive that allows the programmer to define
symbols and specify their values. The directive used for this is EQU (Equate). The general form of
the statement is
Symbol EQU value
This statement defines the given symbol (i.e., enters it in the SYMTAB) and assigns to it
the value specified. The value can be a constant or an expression involving constants and any other
symbol which is already defined. One common usage is to define symbolic names that can be used
in place of numeric values to improve readability. For example,
+LDT #4096
This loads register T with the immediate value 4096, but it does not make clear what exactly this
value indicates. If a statement is included as:
MAXLEN EQU 4096 and then
+LDT #MAXLEN
Then it clearly indicates that MAXLEN is some maximum length value. When
the assembler encounters the EQU statement, it enters the symbol MAXLEN along with its value in
the symbol table. While assembling the LDT instruction, the assembler searches SYMTAB for
MAXLEN and uses its value as the operand in the instruction. The object code generated is the
same for both forms, but the second is easier to understand.
If the maximum length is changed from 4096 to 1024, it is difficult to make the change if the
value is mentioned as an immediate operand wherever required in the instructions: we have to
scan the whole program and make changes wherever 4096 is used. If instead we use the symbol
defined by EQU, we need not search the whole program but change only the value of MAXLEN
in the EQU statement (once).
Another common usage of the EQU statement is to define values for the general-purpose
registers. The assembler can use mnemonics for registers, like A for the accumulator, X for the
index register, and so on. But some instructions require numbers in place of names: for example,
RMO 0,1 instead of RMO A,X. The programmer can assign the numerical values to these
registers using the EQU directive.
A EQU 0
X EQU 1 and so on
These statements will cause the symbols A, X, L, ... to be entered into the symbol table with
their respective values, so an instruction such as RMO A,X would then be allowed. As another
usage, a machine may have many general-purpose registers named R1, R2, ..., of which some may
be used as base registers and some as accumulators, and their usage may change from one program
to another. In this case we can define these requirements using EQU statements.
BASE EQU R1
INDEX EQU R2
COUNT EQU R3
One restriction on the usage of EQU is that any symbol occurring on the right-hand side of
the EQU must be predefined. For example, the following sequence is not valid:
BETA EQU ALPHA
ALPHA RESW 1
Here the symbol ALPHA is used to define BETA before ALPHA itself is defined, so the value of ALPHA is not known.
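The EQU rule — the right-hand side must already be defined — can be sketched as follows (an illustration under our own naming; the operand here is either a decimal constant or a previously defined symbol):

```python
def process_equ(symtab, symbol, operand):
    """Define `symbol` as the value of `operand` (a constant or a prior symbol).

    Returns the assigned value, or None if the operand is an undefined
    forward reference (as with BETA EQU ALPHA in the text).
    """
    if operand.isdigit():
        value = int(operand)
    elif operand in symtab:
        value = symtab[operand]
    else:
        return None            # forward reference: flag an error
    symtab[symbol] = value
    return value

symtab = {}
process_equ(symtab, "MAXLEN", "4096")
ok = process_equ(symtab, "LIMIT", "MAXLEN")
bad = process_equ(symtab, "BETA", "ALPHA")   # ALPHA is not yet defined
```

MAXLEN and LIMIT succeed because their operands are resolvable at the point of definition; BETA fails because ALPHA is a forward reference.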
ORG STATEMENT
This directive can be used to indirectly assign values to symbols. The directive is usually
called ORG (for origin). Its general format is:
ORG value
where value is a constant or an expression involving constants and previously defined symbols.
When this statement is encountered during assembly of a program, the assembler resets its location
counter (LOCCTR) to the specified value. Since the values of symbols used as labels are taken from
LOCCTR, the ORG statement will affect the values of all labels defined until the next ORG is
encountered. ORG is used to control the assignment of storage in the object program.
Sometimes altering the location counter values in this way may result in incorrect assembly.
ORG can be useful in label definition. Suppose we need to define a symbol table with the following
structure:
SYMBOL 6 Bytes
VALUE 3 Bytes
FLAG 2 Bytes
The table looks like the one given below.
The SYMBOL field contains a 6-byte user-defined symbol; VALUE is a one-word (3-byte) representation of the
value assigned to the symbol; FLAG is a 2-byte field that specifies the symbol type and other information.
The space for the table can be reserved by the statement:
STAB RESB 1100
If we want to refer to the entries of the table using indexed addressing, place the offset value of the
desired entry from the beginning of the table in the index register. To refer to the fields SYMBOL,
VALUE, and FLAGS individually, we need to assign the values first as shown below:
SYMBOL EQU STAB
VALUE EQU STAB+6
FLAGS EQU STAB+9
To retrieve the VALUE field from the table indicated by register X, we can write a statement:
LDA VALUE, X
The same thing can also be done using ORG statement in the following way:
STAB RESB 1100
ORG STAB
SYMBOL RESB 6
VALUE RESW 1
FLAG RESB 2
ORG STAB+1100
The first statement allocates 1100 bytes of memory for the table and assigns the label STAB. In the second statement
the ORG directive resets the location counter to the value of STAB, so LOCCTR now points
to STAB. The next three lines assign appropriate memory offsets to the symbols SYMBOL, VALUE and
FLAG. The last ORG statement reinitializes LOCCTR to a
new value after skipping the required amount of memory for the table STAB (i.e., STAB+1100).
While using ORG, the symbol occurring in the statement should be predefined, as is required for EQU
statements. For example, consider the sequence of statements below:
ORG ALPHA
BYTE1 RESB 1
BYTE2 RESB 1
BYTE3 RESB 1
ORG
ALPHA RESB 1
The sequence could not be processed because the symbol used to assign the new location counter value is
not yet defined. In the first pass the assembler would not know what value to assign to ALPHA, so the
symbols on the following lines could not be entered into the symbol table either. This is a form
of the forward reference problem.
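The ORG sequence above can be sketched in Python as a toy location counter (the directive handling and helper names here are invented for illustration, not an actual assembler implementation):

```python
def assemble_directives(lines, start=0x1000):
    """Process (label, directive, operand) tuples; return the symbol table."""
    symtab = {}
    locctr = start
    for label, directive, operand in lines:
        if label:
            symtab[label] = locctr          # a label takes the current LOCCTR
        if directive == "RESB":
            locctr += operand               # reserve N bytes
        elif directive == "RESW":
            locctr += 3 * operand           # one SIC word = 3 bytes
        elif directive == "ORG":
            # reset LOCCTR; a symbolic operand must already be defined,
            # mirroring the forward-reference restriction described above
            locctr = symtab[operand] if isinstance(operand, str) else operand
    return symtab

stab_layout = [
    ("STAB",   "RESB", 1100),
    (None,     "ORG",  "STAB"),    # point LOCCTR back at the table start
    ("SYMBOL", "RESB", 6),
    ("VALUE",  "RESW", 1),
    ("FLAG",   "RESB", 2),
]
```

Running this on `stab_layout` gives SYMBOL, VALUE and FLAG the offsets 0, 6 and 9 from STAB, exactly the values the EQU version assigns.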
EXPRESSIONS:
Assembler language statements have used single terms (labels, literals, etc.) as instruction
operands. Most assemblers allow the use of expressions wherever such a single operand is permitted.
Each such expression must, of course, be evaluated by the assembler to produce a single operand
address or value. Assemblers generally allow arithmetic expressions formed according to the normal
rules using the operators +, -, *, and /. Division is usually defined to produce an integer result. Individual
terms in the expression may be constants, user-defined symbols, or special terms. The most common
such special term is the current value of the location counter (often designated by *). This term
represents the value of the next unassigned memory location.
106 BUFEND EQU *
gives BUFEND a value that is the address of the next byte after the buffer area.
Earlier we discussed the problem of program relocation. We saw that some values in the object
program are relative to the beginning of the program, while others are absolute (independent of
program location). Similarly, the values of terms and expressions are either relative or absolute. A
constant is, of course, an absolute term. Labels on instructions and data areas, and references to the
location counter value, are relative terms.
A symbol whose value is given by EQU (or some similar assembler directive) may be either
an absolute term or a relative term depending upon the expression used to define its value.
Expressions are classified as either absolute expressions or relative expressions depending upon the
type of value they produce. An expression that contains only absolute terms is, of course, an absolute
expression. However, absolute expressions may also contain relative terms provided the relative
terms occur in pairs and the terms in each such pair have opposite signs. It is not necessary that the
paired terms be adjacent to each other in the expression; however, all relative terms must be capable
of being paired in this way. None of the relative terms may enter into a multiplication or division
operation.
A relative expression is one in which all of the relative terms except one can be paired as
described above; the remaining unpaired relative term must have a positive sign. As before, no
relative term may enter into a multiplication or division operation. Expressions that do not meet the
conditions given for either absolute or relative expressions should be flagged by the assembler as
errors. Although the rules given above may seem arbitrary, they are actually quite reasonable. The
expressions that are legal under these definitions include exactly those expressions whose value
remains meaningful when the program is relocated.
A relative term or expression represents some value that may be written as S + r, where S is the
starting address of the program and r is the value of the term or expression relative to the starting
address. Thus a relative term usually represents some location within the program. When relative
terms are paired with opposite signs, the dependency on the program starting address is cancelled out;
the result is an absolute value. In the statement
107 MAXLEN EQU BUFEND-BUFFER
Both BUFEND and BUFFER are relative terms, each representing an address within the program.
However, the expression represents an absolute value: the difference between the two addresses,
which is the length of the buffer area in bytes.
Expressions such as BUFEND + BUFFER, 100 - BUFFER, or 3 * BUFFER represent neither
absolute values nor locations within the program. The values of these expressions depend upon the
program starting address in a way that is unrelated to anything within the program itself. Because
such expressions are very unlikely to be of any use, they are considered errors.
To determine the type of an expression, we must keep track of the types of all symbols
defined in the program. For this purpose we need a flag in the symbol table to indicate the type of value
(absolute or relative) in addition to the value itself. Thus for the program, some of the symbol table
entries might be
With this information the assembler can easily determine the type of each expression used as
an operand and generate Modification records in the object program for relative values. Later
we consider programs that consist of several parts that can be relocated independently of each
other. As discussed in that section, our rules for determining the type of an expression must be
modified in such instances.
Assemblers also allow the use of expressions in place of operands in the instruction. Each
such expression must be evaluated to generate a single operand value or address. Assemblers
generally allow arithmetic expressions formed according to the normal rules using the arithmetic operators +, -,
*, /. Division is usually defined to produce an integer result. Individual terms may be constants, user-
defined symbols, or special terms. The only special term used is * (the current value of the location
counter), which indicates the value of the next unassigned memory location. Thus the statement
BUFEND EQU *
assigns a value to BUFEND, which is the address of the next byte following the buffer
area. Some values in the object program are relative to the beginning of the program and some are
absolute (independent of the program location, like constants). Hence, expressions are classified as
either absolute expression or relative expressions depending on the type of value
they produce.
Absolute Expressions: An expression that uses only absolute terms is an absolute expression.
An absolute expression may also contain relative terms, provided the relative terms occur in pairs with
opposite signs in each pair. Example:
MAXLEN EQU BUFEND-BUFFER
In the above instruction the difference in the expression gives a value that does not depend on
the location of the program and hence gives an absolute value, immaterial to the relocation of the
program. The expression can also consist of only absolute terms. Example:
MAXLEN EQU 1000
Relative Expressions:
All the relative terms except one can be paired as described in “absolute”. The remaining
unpaired relative term must have a positive sign.
Example:
STAB EQU OPTAB + (BUFEND - BUFFER)
Handling the type of expressions: to find the type of an expression, we must keep track of the types
of the symbols used. This can be achieved by recording the type in the symbol table against each
symbol, as shown in the table below:
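The pairing rule can be sketched in Python (the symbol kinds "A"/"R" for absolute/relative and all names here are invented for illustration; a real assembler would also reject relative terms in multiplication or division):

```python
def classify(terms, symtab):
    """terms: list of (sign, name-or-int); symtab maps name -> (value, kind).

    Relative terms must cancel in +/- pairs; exactly one unpaired positive
    relative term makes the whole expression relative.
    """
    rel = 0
    for sign, t in terms:
        kind = "A" if isinstance(t, int) else symtab[t][1]
        if kind == "R":
            rel += 1 if sign == "+" else -1
    if rel == 0:
        return "absolute"
    if rel == 1:
        return "relative"
    return "error"          # flagged as an error by the assembler

# Example entries (hypothetical values, all relative symbols):
symtab = {"BUFEND": (0x1063, "R"), "BUFFER": (0x1003, "R"),
          "OPTAB": (0x1100, "R")}
```

With this table, BUFEND-BUFFER classifies as absolute, OPTAB+(BUFEND-BUFFER) as relative, and BUFEND+BUFFER as an error, matching the rules above.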
PROGRAM BLOCKS
Program blocks allow the generated machine instructions and data to appear in the object
program in a different order, by separating blocks for storing code, data, stack, and larger data
blocks.
Assembler Directive USE:
USE [blockname]
At the beginning, statements are assumed to be part of the unnamed (default) block. If no
USE statements are included, the entire program belongs to this single block. Each program block
may actually contain several separate segments of the source program.
The assembler rearranges these segments to gather together the pieces of each block and assigns
addresses, separating the program into blocks in a particular order. The large buffer area is moved to the end
of the object program. Program readability is better if data areas are placed in the source program
close to the statements that reference them.
The assembler directive USE indicates which portions of the source program belong to the
various blocks. At the beginning of the program, statements are assumed to be part of the unnamed
(default) block; if no USE statements are included, the entire program belongs to this single block.
The USE statement on line 92 signals the beginning of the block named CDATA. Source
statements are associated with this block until the USE statement on line 103, which begins the
block named CBLKS. The USE statement may also indicate a continuation of a previously begun
block. Thus the statement on line 123 resumes the default block, and the statement on line 183
resumes the block named CDATA.
As we can see, each program block may actually contain several separate segments of the
source program. The assembler will (logically) rearrange these segments to gather together the
pieces of each block. These blocks will then be assigned addresses in the object program, with the
blocks appearing in the same order in which they were first begun in the source program. The result
is the same as if the programmer had physically rearranged the source statements to group together
all the source lines belonging to each block.
The assembler accomplishes this logical rearrangement of code by maintaining, during Pass
1, a separate location counter for each program block. The location counter for a block is initialized
to 0 when the block is first begun. The current value of this location counter is saved when switching
to another block and the saved value is restored when resuming a previous block. Thus during Pass 1
each label in the program is assigned an address that is relative to the start of the block that contains
it.
When labels are entered into the symbol table, the block name or number is stored along with
the assigned relative address. At the end of Pass 1 the latest value of the location counter for each
block indicates the length of that block. The assembler can then assign to each block a starting
address in the object program (beginning with relative location 0). For code generation during Pass2,
the assembler needs the address for each symbol relative to the start of the object program (not the
start of an individual program block).
Example of program with multiple blocks
This is easily found from the information in SYMTAB. The assembler simply adds the
location of the symbol, relative to the start of its block, to the assigned block starting address. The
figure demonstrates this process applied to our sample program. The column headed Loc/Block shows the
relative address (within a program block) assigned to each source line and a block number indicating
which program block is involved (0 = default block, 1 = CDATA, 2 = CBLKS). This is essentially
the same information that is stored in SYMTAB for each symbol. Notice that the value of the
symbol MAXLEN (line 107) is shown without a block number.
This indicates that MAXLEN is an absolute symbol, whose value is not relative to the start of
any program block. At the end of Pass 1 the assembler constructs a table that contains the starting
addresses and lengths for all blocks. For our sample program, this table looks like
Now consider the instruction
20 0006 LDA LENGTH 032060
SYMTAB shows the value of the operand (the symbol LENGTH) as relative location 0003 within
program block 1 (CDATA). The starting address for CDATA is 0066. Thus the desired target address
for this instruction is 0003 + 0066 = 0069. The instruction is to be assembled using program-counter
relative addressing. When the instruction is executed, the program counter contains the address of
the following instruction (line 25). The address of this instruction is relative location 0009 within
the default block. Since the default block starts at location 0000, this address is simply 0009. Thus
the required displacement is 0069 - 0009 = 0060. The calculation of the other addresses during Pass 2
follows a similar pattern.
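The arithmetic can be checked directly, using the values quoted for the sample program (the CDATA starting address of 0066 comes from the block table, which is assumed here since the figure is not reproduced):

```python
cdata_start = 0x0066     # starting address assigned to block 1 (CDATA)
length_rel  = 0x0003     # address of LENGTH relative to the start of CDATA
target = cdata_start + length_rel   # desired target address

pc = 0x0000 + 0x0009     # default block start + relative address of line 25
disp = target - pc       # program-counter relative displacement
```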
Fig: Program blocks trace through the assembly and loading processes
In the example below three blocks are used:
Default: executable instructions
CDATA: all data areas that are smaller in length
CBLKS: all data areas that consist of larger blocks of memory
Example Code
Arranging code into program blocks:
Pass 1
•A separate location counter for each program block is maintained.
•Save and restore LOCCTR when switching between blocks.
•At the beginning of a block, LOCCTR is set to 0.
•Assign each label an address relative to the start of the block.
•Store the block name or number in the SYMTAB along with the assigned relative address of the label.
•Indicate the block length as the latest value of LOCCTR for each block at the end of Pass 1.
•Assign to each block a starting address in the object program by concatenating the program blocks in a particular order.
Pass 2
•Calculate the address for each symbol relative to the start of the object program by adding
the location of the symbol relative to the start of its block and the starting address of this block.
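The two passes can be sketched in Python; the block lengths used here follow the sample COPY program layout (an assumption about the figure not reproduced in the text), and the helper names are invented:

```python
def assign_block_addresses(block_lengths, order):
    """End of Pass 1: concatenate blocks in order; return each block's start."""
    starts, addr = {}, 0
    for name in order:
        starts[name] = addr
        addr += block_lengths[name]
    return starts

def final_address(rel_addr, block, starts):
    """Pass 2: block starting address + address relative to the block start."""
    return starts[block] + rel_addr

# Block lengths as in the sample program (default, CDATA, CBLKS):
starts = assign_block_addresses(
    {"default": 0x0066, "CDATA": 0x000B, "CBLKS": 0x1000},
    ["default", "CDATA", "CBLKS"])
```

With these lengths, CDATA starts at 0066 and CBLKS at 0071, and LENGTH (relative 0003 in CDATA) gets the final address 0069 used in the displacement calculation above.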
CONTROL SECTIONS AND PROGRAM LINKING
A control section is a part of the program that maintains its identity after assembly; each
control section can be loaded and relocated independently of the others. Different control sections
are most often used for subroutines or other logical subdivisions. The programmer can assemble,
load, and manipulate each of these control sections separately.
Because of this, there should be some means for linking control sections together. For
example, instructions in one control section may refer to the data or instructions of other control
sections. Since control sections are independently loaded and relocated, the assembler is unable to
process these references in the usual way. Such references between different control sections are
called external references.
The assembler generates the information about each of the external references that will allow
the loader to perform the required linking. When a program is written using multiple control
sections, the beginning of each control section is indicated by the assembler directive CSECT:
secname CSECT
The assembler maintains a separate location counter for each control section.
Control sections differ from program blocks in that they are handled separately by the
assembler. Symbols that are defined in one control section may not be used directly in another control
section; they must be identified as external references for the loader to handle. The external
references are indicated by two assembler directives:
EXTDEF (external definition):
The EXTDEF statement in a control section names symbols that are defined in this section but may
be used by other control sections. Control section names do not need to be named in the EXTDEF, as
they are automatically considered external symbols.
EXTREF (external Reference):
It names symbols that are used in this section but are defined in some other control section.
The order in which these symbols are listed is not significant. The assembler must include proper
information about the external references in the object program that will cause the loader to insert
the proper value where they are required.
Handling External Reference
Case 1
15 0003 CLOOP +JSUB RDREC 4B100000
The operand RDREC is an external reference.
The assembler has no idea where RDREC is, so it inserts an address of zero. It can only use extended
format to provide enough room for the actual address (that is, program-counter relative addressing is
invalid for an external reference).
The assembler generates information for each external reference that will allow the loader to
perform the required linking.
Case 2
190 0028 MAXLEN WORD BUFEND-BUFFER 000000
There are two external references in the expression, BUFEND and BUFFER.
The assembler inserts a value of zero and passes information to the loader to:
Add to this data area the address of BUFEND
Subtract from this data area the address of BUFFER
Case 3
On line 107, BUFEND and BUFFER are defined in the same control section and the
expression can be calculated immediately.
107 1000 MAXLEN EQU BUFEND-BUFFER
Object Code for the example program:
The assembler must also include information in the object program that will cause the loader to
insert the proper values where they are required. The assembler maintains two new records in the
object code and a changed version of the modification record.
Define record (EXTDEF)
Col. 1 D
Col. 2-7 Name of external symbol defined in this control section
Col. 8-13 Relative address within this control section (hexadecimal)
Col.14-73 Repeat information in Col. 2-13 for other external symbols
Refer record (EXTREF)
Col. 1 R
Col. 2-7 Name of external symbol referred to in this control section
Col. 8-73 Name of other external reference symbols
Modification record
Col. 1 M
Col. 2-7 Starting address of the field to be modified (hexadecimal)
Col. 8-9 Length of the field to be modified, in half-bytes (hexadecimal)
Col. 10 Modification flag (+ or -)
Col.11-16 External symbol whose value is to be added to or subtracted from the indicated field
A define record gives information about the external symbols that are defined in this control
section, i.e., symbols named by EXTDEF. A refer record lists the symbols that are used as external
references by the control section, i.e., symbols named by EXTREF. The new items in the
modification record specify the modification to be performed: adding or subtracting the value of
some external symbol. The symbol used for modification may be defined either in this control
section or in another section.
The object program is shown below. There is a separate object program for each of the
control sections. In the Define Record and refer record the symbols named in EXTDEF and
EXTREF are included. In the case of Define, the record also indicates the relative address of each
external symbol within the control section. For EXTREF symbols, no address information is
available. These symbols are simply named in the Refer record.
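The record emission can be sketched in Python following the column layouts above (the function names are invented; a real assembler would also split records that exceed the column limits):

```python
def define_record(extdef):
    """extdef: list of (symbol, relative address) pairs named by EXTDEF."""
    # 'D', then each symbol padded to 6 columns and its 6-digit hex address
    return "D" + "".join(f"{sym:<6}{addr:06X}" for sym, addr in extdef)

def refer_record(extref):
    """extref: list of symbols named by EXTREF (no address information)."""
    return "R" + "".join(f"{sym:<6}" for sym in extref)

def modification_record(addr, halfbytes, sign, symbol):
    """M record: field address, length in half-bytes, +/- flag, symbol."""
    return f"M{addr:06X}{halfbytes:02X}{sign}{symbol:<6}"
```

For example, a 5-half-byte field at address 000004 to which the value of RDREC must be added yields the record M00000405+RDREC.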
ASSEMBLER DESIGN OPTIONS
The existence of multiple control sections that can be relocated independently of one another
makes the handling of expressions complicated. It is required that in an expression all the
relative terms be paired (for an absolute expression), or that all except one be paired (for relative
expressions). In a program having multiple control sections we have an extended
restriction: both terms in each pair of an expression must be within the same control
section. If two terms represent relative locations within the same control section, their difference is
an absolute value (regardless of where the control section is located).
Legal: BUFEND-BUFFER (both are in the same control section). If the terms are located in different
control sections, their difference has a value that is unpredictable.
Illegal: RDREC-COPY (the two are in different control sections); it is the difference in the load
addresses of the two control sections. This value depends on the way run-time storage is allocated; it
is unlikely to be of any use.
How to enforce this restriction
When an expression involves external references, the assembler cannot determine whether or
not the expression is legal. The assembler evaluates all of the terms it can, combines these to form an
initial expression value, and generates Modification records. The loader checks the expression for
errors and finishes the evaluation.
One-Pass Assembler
The main problem in designing a one-pass assembler is resolving forward references. Forward
references to data items can be avoided to some extent by defining all the storage reservation
statements at the beginning of the program rather than at the end. Unfortunately, forward references
to labels on instructions (forward jumps) cannot be avoided, so one-pass assemblers typically handle
the problem by prohibiting forward references to data items only. There are two types of one-pass
assemblers: one produces object code directly in memory for immediate execution (load-and-go
assemblers); the other produces the usual kind of object code for later execution.
Load-and-Go Assembler
A load-and-go assembler generates its object code in memory for immediate execution. No
object program is written out, and no loader is needed. It is useful in a system with frequent program
development and testing, where the efficiency of the assembly process is an important consideration.
Forward Reference in One-Pass Assemblers: In a load-and-go assembler, when a forward reference is
encountered, the assembler:
Omits the operand address if the symbol has not yet been defined
Enters this undefined symbol into SYMTAB and indicates that it is undefined
Adds the address of this operand address to a list of forward references associated with the
SYMTAB entry
When the definition for the symbol is encountered, scans the reference list and inserts the
address.
At the end of the program, reports an error if there are still SYMTAB entries indicating
undefined symbols.
For a load-and-go assembler, the assembler searches SYMTAB for the symbol named in the END
statement and jumps to this location to begin execution if there is no error.
After scanning line 40 of the program:
40 2021 J CLOOP 302012
The status is that up to this point the symbol RDREC has been referred to once at location 2013, ENDFIL at
201F and WRREC at location 201C. None of these symbols is defined. The figure shows how
the pending definitions along with their addresses are included in the symbol table.
The status after scanning line 160, which has encountered the definition of RDREC and ENDFIL is
as given below:
If One-Pass needs to generate object code:
If the operand contains an undefined symbol, use 0 as the address and write the Text
record to the object program.
Forward references are entered into lists as in the load-and-go assembler.
When the definition of a symbol is encountered, the assembler generates another Text
record with the correct operand address of each entry in the reference list.
When loaded, the incorrect address 0 will be updated by the later Text record containing
the symbol definition.
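The forward-reference bookkeeping described above can be sketched in Python: undefined symbols carry a list of operand addresses to patch once the definition is found (class and method names here are invented for illustration):

```python
class Symtab:
    def __init__(self):
        self.entries = {}   # name -> {"value": int|None, "fixups": [addr]}

    def reference(self, name, operand_addr):
        """Look up a symbol used as an operand at operand_addr."""
        e = self.entries.setdefault(name, {"value": None, "fixups": []})
        if e["value"] is not None:
            return e["value"]            # already defined: use it directly
        e["fixups"].append(operand_addr) # remember where to patch later
        return 0                         # operand address omitted (0) for now

    def define(self, name, value, memory):
        """Define a symbol; patch every pending reference in memory."""
        e = self.entries.setdefault(name, {"value": None, "fixups": []})
        e["value"] = value
        for addr in e["fixups"]:
            memory[addr] = value
        e["fixups"].clear()

    def undefined(self):
        """Symbols still undefined (an error at the end of the program)."""
        return [n for n, e in self.entries.items() if e["value"] is None]
```

In a load-and-go assembler `memory` is the object code being built in place; an object-code-producing one-pass assembler would instead emit an extra Text record per fixup.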
Object Code Generated by One-Pass Assembler.
Multi-Pass Assembler:
For a two-pass assembler, forward references in symbol definitions are not allowed:
ALPHA EQU BETA
BETA EQU DELTA
DELTA RESW 1
Symbol definition must be completed in Pass 1. Prohibiting forward references in symbol
definitions is not a serious inconvenience; such references tend to create difficulty for a person
reading the program.
Implementation Issues for Modified Two-Pass Assembler:
When forward referencing is encountered in symbol-defining statements:
For a forward reference in a symbol definition, we store in the SYMTAB: the symbol name,
the defining expression, the number of undefined symbols in the defining expression, and the
undefined symbol (marked with a flag *) associated with a list of symbols that depend on this
undefined symbol. When a symbol is defined, we can recursively evaluate the symbol
expressions depending on the newly defined symbol.
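The recursive evaluation can be sketched in Python for the ALPHA/BETA/DELTA example above (the data layout and function names are invented; the value 0x2000 for DELTA is a hypothetical address):

```python
def eval_expr(expr, table):
    """expr: list of (sign, term). Returns (value, first undefined term)."""
    total = 0
    for sign, t in expr:
        if isinstance(t, str) and t not in table:
            return None, t               # still contains an undefined symbol
        v = t if isinstance(t, int) else table[t]
        total += v if sign == "+" else -v
    return total, None

def define(name, value, table, deps):
    """Define a symbol, then recursively evaluate expressions waiting on it."""
    table[name] = value
    for dependent, expr in deps.pop(name, []):
        v, missing = eval_expr(expr, table)
        if missing:                      # re-queue under the missing symbol
            deps.setdefault(missing, []).append((dependent, expr))
        else:
            define(dependent, v, table, deps)

# ALPHA EQU BETA and BETA EQU DELTA are pending until DELTA is defined:
table, deps = {}, {}
deps["BETA"]  = [("ALPHA", [("+", "BETA")])]
deps["DELTA"] = [("BETA",  [("+", "DELTA")])]
define("DELTA", 0x2000, table, deps)     # DELTA RESW 1 finally defines it
```

Defining DELTA triggers the evaluation of BETA, which in turn triggers ALPHA, emptying the dependency lists.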
LOADERS AND LINKERS
Introduction
The source program written in assembly language or high-level language will be converted
to an object program, which is in machine language form, for execution. This output, whether from an
assembler or from a compiler, contains translated instructions and data values from the source
program, and specifies addresses in primary memory where these items are to be loaded for execution.
This contains the following three processes, and they are,
Loading - which allocates memory location and brings the object program into memory for
execution - (Loader)
Linking- which combines two or more separate object programs and supplies the
information needed to allow references between them - (Linker)
Relocation - Which modifies the object program so that it can be loaded at an address
different from the location originally specified - (Linking Loader)
Linker:
High-level languages come with built-in header files or libraries. These libraries are
predefined and contain basic functions which are essential for executing the program. These
functions are linked to the libraries by a program called the linker. If the linker does not find the
library for a function, it informs the compiler, and the compiler generates an error. The compiler
automatically invokes the linker as the last step in compiling a program.
Besides built-in libraries, the linker also links user-defined functions to user-defined
libraries. Usually a longer program is divided into smaller subprograms called modules. These
modules must be combined to execute the program. The process of combining the modules is done
by the linker.
Loader:
A loader is a program that loads the machine code of a program into the system memory.
In computing, a loader is the part of an operating system that is responsible for loading programs.
It is one of the essential stages in the process of starting a program, because it places programs into
memory and prepares them for execution. Loading a program involves reading the contents
of the executable file into memory. Once loading is complete, the operating system starts the program
by passing control to the loaded program code. All operating systems that support program loading
have loaders. In many operating systems the loader is permanently resident in memory.
BASIC LOADER FUNCTIONS
A loader is a system program that performs the loading function. It brings the object program
into memory and starts its execution. The translator may be an assembler or a compiler, which
generates the object program that becomes input to the loader and is later loaded into memory for
execution.
Type of Loaders
The different types of loaders are, absolute loader, bootstrap loader, relocating loader
(relative loader), and, direct linking loader.
ABSOLUTE LOADER
The operation of an absolute loader is very simple. The object code is loaded to specified
locations in the memory. At the end, the loader jumps to the specified address to begin execution of
the loaded program. The advantage of the absolute loader is that it is simple and efficient. The
disadvantages are the need for the programmer to specify the actual load address, and the difficulty
of using subroutine libraries.
Fig : program loaded in memory (memory address, content)
The algorithm for this type of loader is given here. The object program, and the object
program loaded into memory by the absolute loader, are also shown. Each byte of assembled code is
given using its hexadecimal representation in character form, which is easy for human beings to read.
In practice, each byte of object code is stored as a single byte. Most machines store object programs
in a binary form, and we must be sure that our file and device conventions do not cause some of
the program bytes to be interpreted as control characters.
Begin
read Header record
verify program name and length
read first Text record
while record type <> ‘E’ do
begin
{if object code is in character form, convert into internal representation}
move object code to specified location in memory
read next object program record
end
jump to address specified in End record
end
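The algorithm above can be sketched in Python; the record layouts follow the Header/Text/End column formats used elsewhere in these notes, and the sample records in the usage note are illustrative values, not taken from a specific figure:

```python
def absolute_load(records, memory):
    """Load H/T/E object records (hex character form) into memory.

    Returns the address specified in the End record, to which the loader
    would jump to begin execution.
    """
    header = records[0]
    assert header[0] == "H"              # verify program name and length
    for rec in records[1:]:
        if rec[0] == "E":
            return int(rec[1:7], 16)     # jump address from the End record
        if rec[0] == "T":
            addr = int(rec[1:7], 16)     # cols 2-7: starting address
            code = bytes.fromhex(rec[9:])  # convert char form to bytes
            for i, b in enumerate(code):
                memory[addr + i] = b     # move object code into place
```

For example, loading the records `HCOPY  00100000107A`, `T00100009141033482039001036`, `E001000` places nine bytes starting at address 1000 and returns 1000 as the execution address.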
A SIMPLE BOOTSTRAP LOADER
When a computer is first turned on or restarted, a special type of absolute loader, called a
bootstrap loader, is executed. This bootstrap loads the first program to be run by the computer, usually
an operating system. The bootstrap itself begins at address 0. It loads the OS starting at address 0x80.
There is no header record or control information; the object code is consecutive bytes of memory.
The algorithm for the bootstrap loader is as follows
Begin
X ← 0x80 (the address of the next memory location to be loaded)
Loop
A ← GETC (and convert it from the ASCII character
code to the value of the hexadecimal digit)
save the value in the high-order 4 bits of S
A ← GETC
combine the value to form one byte: A ← (A+S)
store the value (in A) at the address in register X
X ← X+1
End
It uses a subroutine GETC, which is:
GETC A ← read one character
if A=0x04 then jump to 0x80
if A<48 then GETC
A ← A-48 (0x30)
if A<10 then return
A ← A-7
return
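The conversion performed by the bootstrap loop and GETC can be sketched in Python (function names are invented; control characters and the 0x04 end-of-input check are omitted for brevity):

```python
def hex_digit(ch):
    """GETC's conversion: one ASCII hex character to its 4-bit value."""
    a = ord(ch)
    a -= 48          # '0'..'9' (ASCII 48..57) -> 0..9, i.e. subtract 0x30
    if a < 10:
        return a
    return a - 7     # 'A'..'F' (ASCII 65..70) -> 10..15

def load_bytes(chars, start=0x80):
    """The bootstrap loop: pair hex characters into bytes at rising addresses."""
    memory = {}
    x = start                               # next location to be loaded
    for i in range(0, len(chars), 2):
        s = hex_digit(chars[i]) << 4        # first digit: high-order 4 bits
        memory[x] = s + hex_digit(chars[i + 1])  # combine into one byte
        x += 1
    return memory
```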
MACHINE-DEPENDENT LOADER FEATURES
The absolute loader is simple and efficient, but the scheme has potential disadvantages. One of
the biggest is that the programmer has to specify the actual starting address from where the
program is to be loaded. This does not create difficulty if only one program is to run, but it does for
several programs. Further, it is difficult to use subroutine libraries efficiently. This requires the design
and implementation of a more complex loader. The loader must provide program relocation and
linking, as well as simple loading functions.
RELOCATION
The concept of program relocation is the execution of the object program using any part of
the available and sufficient memory. The object program is loaded into memory wherever there is
room for it. The actual starting address of the object program is not known until load time.
Relocation provides the efficient sharing of the machine with larger memory and when several
independent programs are to be run together. It also supports the use of subroutine libraries
efficiently. Loaders that allow for program relocation are called relocating loaders or relative
loaders.
Methods for specifying relocation
Use of modification records and use of relocation bits are the two methods available for specifying
relocation. In the first case, a modification record M is used in the object program
to specify any relocation. In the second, each instruction is associated with one
relocation bit, and the relocation bits in a Text record are gathered into bit masks.
System Software And Operating System
51
Modification records are used for complex machines; this scheme is also called Relocation and Linkage Directory (RLD) specification. The format of the Modification (M) record is as follows.
Modification record
col 1:     M
col 2-7:   relocation address
col 8-9:   length (in half-bytes)
col 10:    flag (+/-)
col 11-17: segment name
The relocation-bit method is used for simple machines. A relocation bit of 0 means no modification is necessary; a bit of 1 means modification is needed. The bits are placed in columns 10-12 of the Text (T) record; the format of the Text record with relocation bits is as follows.
Text record
col 1:     T
col 2-7:   starting address
col 8-9:   length (in bytes)
col 10-12: relocation bits
col 13-72: object code
A twelve-bit mask is used in each Text record (columns 10-12 hold the relocation bits). Since each Text record contains fewer than 12 words, unused bits are set to 0, and any value that is to be modified during relocation must coincide with one of these 3-byte (word) segments. For an absolute loader there are no relocation bits, and columns 10-69 contain object code. In an object program relocated by bit mask, FFC (1111 1111 1100) means that all ten words of that record are to be modified, and E00 (1110 0000 0000) means that the first three words are to be modified.
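The bit-mask scheme can be sketched as follows. The function name and list-based representation of the record's words are illustrative assumptions; the mask interpretation (leftmost bit corresponds to the first word, and a set bit means "add the load address") follows the description above.

```python
def relocate_text_record(words, mask_hex, prog_addr):
    """Apply a 12-bit relocation mask to one Text record.

    words     - list of 3-byte word values from the Text record
    mask_hex  - the 3 hex digits from columns 10-12, e.g. "FFC" or "E00"
    prog_addr - actual starting (load) address assigned at load time
    """
    mask = int(mask_hex, 16)
    out = []
    for i, w in enumerate(words):
        bit = (mask >> (11 - i)) & 1                 # leftmost bit = first word
        out.append((w + prog_addr) & 0xFFFFFF if bit else w)
    return out

# E00 = 1110 0000 0000: only the first three words are modified.
relocated = relocate_text_record([0x000000, 0x000001, 0x000002, 0x000003],
                                 "E00", 0x4000)
```

With mask E00, only the first three words get the load address 0x4000 added; the fourth word is left unchanged.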
PROGRAM LINKING
The goal of program linking is to resolve external references (EXTREF) and external definitions (EXTDEF) among different control sections.
EXTDEF (external definition) - the EXTDEF statement in a control section names symbols, called external symbols, that are defined in this (present) control section and may be used by other sections.
ex: EXTDEF BUFFER, BUFFEND, LENGTH
EXTDEF LISTA, ENDA
EXTREF (external reference) - the EXTREF statement names symbols that are used in this (present) control section and defined elsewhere.
ex: EXTREF RDREC, WRREC
EXTREF LISTB, ENDB, LISTC, ENDC
How to implement EXTDEF and EXTREF
The assembler must include information in the object program that will cause the loader to insert the proper values where they are required, in the form of Define (D) records and Refer (R) records.
Define record
The format of the Define record (D) along with examples is as shown here.
Col. 1 D
Col. 2-7 Name of external symbol defined in this control section
Col. 8-13 Relative address within this control section (hexadecimal)
Col.14-73 Repeat information in Col. 2-13 for other external symbols
Example records
D LISTA 000040 ENDA 000054
D LISTB 000060 ENDB 000070
Refer record
The format of the Refer record (R) along with examples is as shown here.
Col. 1 R
Col. 2-7 Name of external symbol referred to in this control section
Col. 8-73 Name of other external reference symbols
Example records
R LISTB ENDB LISTC ENDC
R LISTA ENDA LISTC ENDC
R LISTA ENDA LISTB ENDB
Here are three programs, named PROGA, PROGB and PROGC, which are separately assembled and each of which consists of a single control section. LISTA and ENDA in PROGA, LISTB and ENDB in PROGB, and LISTC and ENDC in PROGC are the external definitions in each control section. Similarly, LISTB, ENDB, LISTC, ENDC in PROGA; LISTA, ENDA, LISTC, ENDC in PROGB; and LISTA, ENDA, LISTB, ENDB in PROGC are the external references. These sample programs are used to illustrate linking and relocation. The listings below give the sample programs and their corresponding object programs; observe that the object programs contain D and R records along with the other records.
0000 PROGA START 0
EXTDEF LISTA, ENDA
EXTREF LISTB, ENDB, LISTC, ENDC
………..
……….
0020 REF1 LDA LISTA 03201D
0023 REF2 +LDT LISTB+4 77100004
0027 REF3 LDX #ENDA-LISTA 050014
. .
0040 LISTA EQU *
0054 ENDA EQU *
0054 REF4 WORD ENDA-LISTA+LISTC 000014
0057 REF5 WORD ENDC-LISTC-10 FFFFF6
005A REF6 WORD ENDC-LISTC+LISTA-1 00003F
005D REF7 WORD ENDA-LISTA-(ENDB-LISTB) 000014
0060 REF8 WORD LISTB-LISTA FFFFC0
END REF1
0000 PROGB START 0
EXTDEF LISTB, ENDB
EXTREF LISTA, ENDA, LISTC, ENDC
………..
……….
0036 REF1 +LDA LISTA 03100000
003A REF2 LDT LISTB+4 772027
003D REF3 +LDX #ENDA-LISTA 05100000
. .
0060 LISTB EQU *
0070 ENDB EQU *
0070 REF4 WORD ENDA-LISTA+LISTC 000000
0073 REF5 WORD ENDC-LISTC-10 FFFFF6
0076 REF6 WORD ENDC-LISTC+LISTA-1 FFFFFF
0079 REF7 WORD ENDA-LISTA-(ENDB-LISTB) FFFFF0
007C REF8 WORD LISTB-LISTA 000060
END
0000 PROGC START 0
EXTDEF LISTC, ENDC
EXTREF LISTA, ENDA, LISTB, ENDB
………..
………..
0018 REF1 +LDA LISTA 03100000
001C REF2 +LDT LISTB+4 77100004
0020 REF3 +LDX #ENDA-LISTA 05100000
. .
0030 LISTC EQU *
0042 ENDC EQU *
0042 REF4 WORD ENDA-LISTA+LISTC 000030
0045 REF5 WORD ENDC-LISTC-10 000008
0048 REF6 WORD ENDC-LISTC+LISTA-1 000011
004B REF7 WORD ENDA-LISTA-(ENDB-LISTB) 000000
004E REF8 WORD LISTB-LISTA 000000
END
H PROGA 000000 000063
D LISTA 000040 ENDA 000054
R LISTB ENDB LISTC ENDC
. .
T 000020 0A 03201D 77100004 050014
. .
T 000054 0F 000014 FFFFF6 00003F 000014 FFFFC0
M 000024 05 +LISTB
M 000054 06 +LISTC
M 000057 06 +ENDC
M 000057 06 -LISTC
M 00005A 06 +ENDC
M 00005A 06 -LISTC
M 00005A 06 +PROGA
M 00005D 06 -ENDB
M 00005D 06 +LISTB
M 000060 06 +LISTB
M 000060 06 -PROGA
E 000020
H PROGB 000000 00007F
D LISTB 000060 ENDB 000070
R LISTA ENDA LISTC ENDC
.
T 000036 0B 03100000 772027 05100000
.
T 000070 0F 000000 FFFFF6 FFFFFF FFFFF0 000060
M 000037 05 +LISTA
M 00003E 06 +ENDA
M 00003E 06 -LISTA
M 000070 06 +ENDA
M 000070 06 -LISTA
M 000070 06 +LISTC
M 000073 06 +ENDC
M 000073 06 -LISTC
M 000076 06 +ENDC
M 000076 06 -LISTC
M 000076 06 +LISTA
M 000079 06 +ENDA
M 000079 06 -LISTA
M 00007C 06 +PROGB
M 00007C 06 -LISTA
E
H PROGC 000000 000051
D LISTC 000030 ENDC 000042
R LISTA ENDA LISTB ENDB
.
T 000018 0C 03100000 77100004 05100000
.
T 000042 0F 000030 000008 000011 000000 000000
M 000019 05 +LISTA
M 00001D 06 +LISTB
M 000021 06 +ENDA
M 000021 06 -LISTA
M 000042 06 +ENDA
M 000042 06 -LISTA
M 000042 06 +PROGC
M 000048 06 +LISTA
M 00004B 06 +ENDA
M 00004B 06 -LISTA
M 00004B 06 -ENDB
M 00004B 06 +LISTB
M 00004E 06 +LISTB
M 00004E 06 -LISTA
E
These three programs, as they might appear in memory after loading and linking: PROGA has been loaded starting at address 4000, with PROGB and PROGC immediately following. For example, the value for REF4 in PROGA is located at address 4054 (the beginning address of PROGA plus 0054, the relative address of REF4 within PROGA). This value is computed as follows.
The initial value from the Text record T0000540F000014FFFFF600003F000014FFFFC0 is
000014. To this is added the address assigned to LISTC, which is 4112 (the beginning address of
PROGC plus 30). The result is 004126. That is REF4 in PROGA is ENDA-LISTA+LISTC=4054-
4040+4112=4126. Similarly the load address for symbols LISTA: PROGA+0040=4040, LISTB:
PROGB+0060=40C3 and LISTC: PROGC+0030=4112
Keeping these details in mind, work through the other references; the values of these references are the same in each of the three programs.
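The arithmetic above can be checked directly. The load addresses are the ones from the example (PROGA at 4000, PROGB at 4063, PROGC at 40E2, all hexadecimal); the variable names are just labels for this check.

```python
# Load addresses from the example, and the symbol addresses derived from them.
PROGA, PROGB, PROGC = 0x4000, 0x4063, 0x40E2
LISTA, ENDA = PROGA + 0x40, PROGA + 0x54
LISTB, ENDB = PROGB + 0x60, PROGB + 0x70
LISTC, ENDC = PROGC + 0x30, PROGC + 0x42

# REF4 = ENDA - LISTA + LISTC: the loader adds LISTC's address (0x4112)
# to the initial value 0x000014 stored in the Text record.
ref4 = ENDA - LISTA + LISTC
assert ref4 == 0x000014 + LISTC   # same result either way: 0x4126
```

The same substitution reproduces the loaded values of REF5 through REF8 in all three programs.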
ALGORITHM AND DATA STRUCTURES FOR A LINKING LOADER
The algorithm for a linking loader is considerably more complicated than the absolute loader program given earlier. The concepts from the program linking section are used in developing the algorithm, and Modification records are used for relocation, so that the linking and relocation functions are performed by the same mechanism. The linking loader uses two-pass logic; ESTAB (the external symbol table) is its main data structure.
Pass 1: Assign addresses to all external symbols
Pass 2: Perform the actual loading, relocation, and linking
ESTAB for the example (the three programs PROGA, PROGB and PROGC) is shown below. Each ESTAB entry has four fields: the name of the control section, a symbol appearing in the control section, the symbol's address, and the length of the control section.
Control section Symbol Address Length
PROGA 4000 63
LISTA 4040
ENDA 4054
PROGB 4063 7F
LISTB 40C3
ENDB 40D3
PROGC 40E2 51
LISTC 4112
ENDC 4124
Program Logic for Pass 1
Pass 1 assigns addresses to all external symbols. The variables and data structures used during Pass 1 are PROGADDR (the program load address, obtained from the OS), CSADDR (control section address), CSLTH (control section length) and ESTAB. Pass 1 processes the Define (D) records. The algorithm for Pass 1 of the linking loader is given below.
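The Pass 1 logic can be sketched in Python as follows. This is a minimal illustration, not the textbook's exact algorithm: the record strings follow the H and D record formats shown earlier, and the dictionary layout of ESTAB is an assumption made for the example.

```python
def pass1(object_programs, progaddr):
    """Build ESTAB, assigning load addresses to all external symbols.

    object_programs - list of record lists, one list per object program
    progaddr        - PROGADDR, the load address obtained from the OS
    """
    estab = {}                  # symbol -> (address, length or None)
    csaddr = progaddr           # CSADDR: address of the current control section
    for records in object_programs:
        cslth = 0
        for rec in records:
            fields = rec.split()
            if fields[0] == "H":            # H  name  start  length
                name, cslth = fields[1], int(fields[3], 16)
                estab[name] = (csaddr, cslth)
            elif fields[0] == "D":          # D  sym addr  sym addr ...
                it = iter(fields[1:])
                for sym, addr in zip(it, it):
                    estab[sym] = (csaddr + int(addr, 16), None)
        csaddr += cslth         # the next control section follows this one

    return estab

progs = [
    ["H PROGA 000000 000063", "D LISTA 000040 ENDA 000054"],
    ["H PROGB 000000 00007F", "D LISTB 000060 ENDB 000070"],
    ["H PROGC 000000 000051", "D LISTC 000030 ENDC 000042"],
]
estab = pass1(progs, 0x4000)
```

With PROGADDR = 4000, this reproduces the ESTAB entries of the example (LISTB at 40C3, LISTC at 4112, and so on).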
Program Logic for Pass 2
Pass 2 of the linking loader performs the actual loading, relocation, and linking. It uses the Modification records, looking up each symbol in ESTAB to obtain its address. Finally, it uses the End record of the main program to obtain the transfer address, the starting address needed for execution of the program. Pass 2 processes the Text and Modification records of the object programs. The algorithm for Pass 2 of the linking loader is given below.
Improving Efficiency
Can the efficiency of the linking loader be improved? Observe that, although the Refer (R) record has been defined, it has not yet been used. Efficiency can be improved by avoiding multiple searches of ESTAB for the same symbol: assign a reference number to each external symbol in the Refer record, and use that reference number, instead of the symbol name, in the Modification records. Reference number 01 is assigned to the control section name, and the other numbers to the external reference symbols.
The object programs for PROGA, PROGB and PROGC, with this modification to the Refer records, are shown below (observe the R records). The symbols and addresses in PROGA, PROGB and PROGC, listed next, are the entries of ESTAB. The main advantage of the reference-number mechanism is that it avoids multiple searches of ESTAB for the same symbol during the loading of a control section.
Ref No. Symbol Address
1 PROGA 4000
2 LISTB 40C3
3 ENDB 40D3
4 LISTC 4112
5 ENDC 4124
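The reference-number mechanism can be sketched as follows: when a control section's Refer record is read, each symbol's ESTAB address is looked up once and cached in a small table indexed by reference number; Modification records then use the number rather than the symbol name. The table layout is an illustrative assumption; the addresses are the ones from the example.

```python
# ESTAB addresses from the example (PROGA loaded at 0x4000).
estab = {"PROGA": 0x4000, "LISTB": 0x40C3, "ENDB": 0x40D3,
         "LISTC": 0x4112, "ENDC": 0x4124}

def build_reftab(csect_name, refer_symbols):
    """01 is the control section itself; 02.. are the EXTREF symbols."""
    reftab = {1: estab[csect_name]}
    for i, sym in enumerate(refer_symbols, start=2):
        reftab[i] = estab[sym]          # one ESTAB search per symbol, total
    return reftab

reftab = build_reftab("PROGA", ["LISTB", "ENDB", "LISTC", "ENDC"])

# A modification record such as "M 000054 06 +04" now needs only a table
# index (04 = LISTC) rather than a fresh ESTAB search:
value = 0x000014 + reftab[4]            # +LISTC applied to REF4
```

Every subsequent Modification record that names reference 04 reuses the cached address, which is the whole point of the scheme.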
MACHINE-INDEPENDENT LOADER FEATURES
Machine-independent loader features are not directly related to the machine architecture and design. Automatic library search and loader options are two such features.
AUTOMATIC LIBRARY SEARCH
This feature allows a programmer to use standard subroutines without explicitly including them in the program to be loaded: the routines are automatically retrieved from a library as they are needed during linking. This allows the programmer to use subroutines from one or more libraries. The subroutines called by the program being loaded are automatically fetched from the library, linked with the main program, and loaded. The loader searches the specified library or libraries for routines that contain the definitions of the symbols still unresolved in the main program.
Ref No. Symbol Address
1 PROGB 4063
2 LISTA 4040
3 ENDA 4054
4 LISTC 4112
5 ENDC 4124
Ref No. Symbol Address
1 PROGC 40E2
2 LISTA 4040
3 ENDA 4054
4 LISTB 40C3
5 ENDB 40D3
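The automatic library search described above can be sketched as follows. The representation of a library as a symbol-to-routine mapping is an assumption for illustration; real libraries keep a directory of the external symbols each member defines.

```python
def library_search(referenced, defined, libraries):
    """Resolve still-undefined external symbols from a list of libraries.

    referenced - set of symbols named in EXTREF statements
    defined    - set of symbols already defined by the loaded programs
    libraries  - list of dicts mapping symbol -> library routine name,
                 searched in order (user libraries before standard ones)
    """
    to_load = []
    for sym in sorted(referenced - defined):       # still-unresolved symbols
        for lib in libraries:
            if sym in lib:
                to_load.append(lib[sym])           # fetch routine from library
                break
        else:
            raise NameError(f"undefined external symbol: {sym}")
    return to_load

stdlib = {"SQRT": "MATHLIB.SQRT", "RDREC": "UTLIB.RDREC"}
mods = library_search({"RDREC", "LENGTH"}, {"LENGTH"}, [stdlib])
```

Only RDREC is unresolved here, so only its defining routine is added to the load.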
LOADER OPTIONS
Loader options allow the user to modify the standard processing. The options may be specified in three different ways: using a command language; as part of the job control language that is processed by the operating system; or using loader control statements in the source program.
Here are some examples of how options can be specified:
INCLUDE program-name (library-name) - read the designated object program from a library.
DELETE csect-name - delete the named control section from the set of programs being loaded.
CHANGE name1, name2 - cause the external symbol name1 to be changed to name2 wherever it appears in the object programs.
LIBRARY MYLIB - search the library MYLIB before the standard libraries.
NOCALL STDDEV, PLOT, CORREL - do not load and link the unneeded routines named.
Here is one more example showing how commands can be specified as part of the object file, with the respective changes carried out by the loader.
LIBRARY UTLIB
INCLUDE READ (UTLIB)
INCLUDE WRITE (UTLIB)
DELETE RDREC, WRREC
CHANGE RDREC, READ
CHANGE WRREC, WRITE
NOCALL SQRT, PLOT
These commands direct the loader to:
use UTLIB (say, a utility library);
include the READ and WRITE control sections from that library;
delete the control sections RDREC and WRREC from the load;
change all external references to the symbol RDREC to the symbol READ, and similarly change references to WRREC to WRITE;
finally, make no calls to the functions SQRT and PLOT, even if they are used in the program.
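A loader's front end for these control statements can be sketched as a small parser. The command names follow the text; everything else (the dictionary of actions, the tuple layouts) is an illustrative assumption about how a loader might record them.

```python
def parse_loader_commands(lines):
    """Parse loader control statements into a table of requested actions."""
    actions = {"library": [], "include": [], "delete": [],
               "change": [], "nocall": []}
    for line in lines:
        cmd, _, rest = line.strip().partition(" ")
        args = [a.strip(" ()") for a in rest.replace(",", " ").split()]
        if cmd == "LIBRARY":
            actions["library"].extend(args)           # search before std libs
        elif cmd == "INCLUDE":
            actions["include"].append(tuple(args))    # (csect, library)
        elif cmd == "DELETE":
            actions["delete"].extend(args)
        elif cmd == "CHANGE":
            actions["change"].append((args[0], args[1]))  # old -> new name
        elif cmd == "NOCALL":
            actions["nocall"].extend(args)
    return actions

acts = parse_loader_commands([
    "LIBRARY UTLIB",
    "INCLUDE READ (UTLIB)",
    "DELETE RDREC, WRREC",
    "CHANGE RDREC, READ",
    "NOCALL SQRT, PLOT",
])
```

The resulting table would then steer library search (LIBRARY, INCLUDE), the set of loaded control sections (DELETE, NOCALL), and symbol renaming (CHANGE) during the loader's passes.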
LOADER DESIGN OPTIONS
There are several alternatives for organizing the loading functions, including relocation and linking. Linking loaders perform all linking and relocation at load time. The alternatives are linkage editors, which perform linking prior to load time, and dynamic linking, in which the linking function is performed at execution time.
LINKING LOADERS
The processing of an object program using a linking loader is as follows: the source program is first assembled or compiled, producing an object program; a linking loader then performs all linking and relocation operations and loads the program into memory for execution.
LINKAGE EDITORS
A linkage editor produces a linked version of the program, often called a load module or an executable image, which is written to a file or library for later execution.
The linked program produced is generally in a form suitable for processing by a relocating loader. Some useful functions of a linkage editor: an absolute object program can be created if the starting address is already known; new versions of a library can be included without changing the source program; and linkage editors can be used to build packages of subroutines or other control sections that are generally used together.
Linkage editors often allow the user to specify that external references are not to be resolved by automatic library search; the linking is then done later by a linking loader. This combination of linkage editor plus linking loader yields savings in space.
Distinguishing a linking loader from a linkage editor:
Linking Loader
1. Performs linking and relocation at load time.
2. Loads the linked program directly into memory.
3. Offers less flexibility and control.
Linkage Editor
1. Performs linking prior to load time.
2. Writes a linked version of the program, which is later executed by a relocating loader.
3. Offers more flexibility and control.
DYNAMIC LINKING
Dynamic linking is the scheme that postpones the linking function until execution time: a subroutine is loaded and linked to the rest of the program when it is first called. This is usually called dynamic linking, dynamic loading, or load on call. Its advantages: it allows several executing programs to share one copy of a subroutine or library; in an object-oriented system, it makes it possible for one object to be shared by several programs; and it provides the ability to load routines only when (and if) they are needed. The actual loading and linking can be accomplished using an operating system service request.
BOOTSTRAP LOADERS
The bootstrap loader loads the first program to be run by the computer, usually an operating system. The bootstrap loader is a small program that runs before any other normal program can run. It is stored in non-volatile storage (normally the computer's ROM) so that it is still available after the computer has been switched off and then on again.
All of these bootstraps (except the paper-tape one) are designed to read 512 bytes into locations 0-776 (octal) and then start program execution at 0. If you are having system problems, it pays to have the bootstrap halt, so that you can check for error conditions in the device registers. Also check location 0, which is normally 240 for DEC bootstraps and 407 or 411 (separate I/D space) for Unix (unless it is a tight load and the out header has been removed). Some DEC bootstraps refuse to load boot blocks that do not begin with 240. If you have doubts, load location 0 with 777 before you start the bootstrap.
Toggle in these programs starting at location 1000. To be safe, load a trap catcher into the following locations; if you get a halt at 6 or 12, check the program or a missing device. If it halts at 26, you have power-supply problems. If the CPU loops on location 0 (which can show as location 2 on CPUs with data displays), then the boot block was not loaded.
Location Contents Comment
=======================================
000000 000777 Loop at location zero if
000002 000000 secondary bootstrap isn't loaded
000004 000006 Bus error
000006 000000
000010 000012 Reserved instruction
000012 000000
000024 000026 Power failure
000026 000000
The bootstrap gives instructions as to where the operating system on a microcomputer is to be found. How is the loader itself loaded into memory? When the computer is started, with no program in memory, a program present in ROM at an absolute address can be executed; this may be the OS itself or a bootstrap loader, which in turn loads the OS and prepares it for execution. The first record (or records) is generally referred to as a bootstrap loader; it causes the OS to be loaded. Such a loader is added to the beginning of all object programs that are to be loaded into an empty and idle system.
A bootstrap loader is a small program which is held in ROM.
The processor executes this code when it gets the reset (or powerup) signal.
The bootstrap loader does a few hardware checks and then causes the processor to load and
execute the code in the boot sector of the start-up hard disc.
Finally the processor will load the main part of the operating system from disk into main
memory.
Alternatively referred to as bootstrapping, a bootloader, or a boot program, a bootstrap loader is a program that resides in the computer's EPROM, ROM, or other non-volatile memory and is automatically executed by the processor when the computer is turned on. The bootstrap loader reads the hard drive's boot sector to continue the process of loading the computer's operating system. The term bootstrap comes from the old phrase "pull yourself up by your bootstraps."
Definition - What does Bootstrap mean?
A bootstrap is the process of starting up a computer. It also refers to the program that initializes
the operating system (OS) during start-up.
The term bootstrap or bootstrapping originated in the early 1950s. It referred to a bootstrap load
button that was used to initiate a hardwired bootstrap program, or smaller program that executed
a larger program such as the OS. The term was said to be derived from the expression “pulling
yourself up by your own bootstraps;” starting small and loading programs one at a time while
each program is “laced” or connected to the next program to be executed in sequence.
Bootstrap
Bootstrapping is the process of loading a set of instructions when a computer is first turned on or booted. During the start-up process, diagnostic tests such as the power-on self-test (POST) are performed; these set or check configurations for devices and implement routine testing for the connection of peripherals, hardware and external memory devices. The bootloader or bootstrap program is then loaded to initialize the OS.
Typical programs that load the OS are:
GNU GRand Unified Bootloader (GRUB): a multiboot loader that allows the user to choose one of several OSs.
NT Loader (NTLDR): a bootloader for Microsoft's Windows NT OS that usually runs from the hard drive.
Linux Loader (LILO): a bootloader for Linux that generally runs from a hard drive or floppy disc.
Network interface controller (NIC): uses a bootloader that supports booting from a network interface, such as Etherboot or the Preboot Execution Environment (PXE).
Prior to bootstrapping, a computer is said to start with a blank main memory and an intact magnetic core memory or kernel. The bootstrap allows the sequence of programs to load in order to initiate the OS. The OS is the main program that manages all programs running on a computer and performs tasks such as controlling peripheral devices like a disc drive, managing directories and files, transmitting output signals to a monitor, and identifying input signals from a keyboard.
Bootstrapping can also refer to preparing early programming environments incrementally to create more complex and user-friendly programming environments. For example, at one time the
programming environment might have consisted of an assembler program and a simple text
editor. Over time, gradual improvements have led to today's sophisticated object-oriented
programming languages and graphical integrated development environments (IDEs).
SUMMARY:
Relocation - which modifies the object program so that it can be loaded at an address
different from the location originally specified - (Linking Loader).
The different types of loaders are, absolute loader, bootstrap loader, relocating loader
(relative loader), and, direct linking loader
Dynamic linking is the scheme that postpones the linking function until execution time: a subroutine is loaded and linked to the rest of the program when it is first called (also called dynamic loading or load on call).
The bootstrap loader loads the first program to be run by the computer, usually it is an
operating system.
A linkage editor produces a linked version of the program, often called a load module or an executable image, which is written to a file or library for later execution.
A linking loader performs all linking and loading operations, and loads the program into
memory for execution.
Simplified Instructional Computer (SIC) is a hypothetical computer that includes the
hardware features most often found on real machines.
The simple assembler uses two major internal data structures:
• Operation Code Table(OPTAB)
• Symbol Table (SYMTAB).
SUMMARY:
System software – support operation and use of computer.
Application software - solution to a problem.
Assembler translates mnemonic instructions into machine code. The instruction formats,
addressing modes etc., are of direct concern in assembler design.
Compilers must generate machine language code, taking into account such hardware
characteristics as the number and type of registers and the machine instructions available.
Location counter helps in the assignment of the addresses.
The address is mentioned during assembling itself. This is called Absolute Assembly.
The actual address of a memory location is also called an absolute address.
These are the features which do not depend on the architecture of the machine. These are:
• Literals
• Symbol-Defining Statements
• Expressions
• Program blocks
• Control sections and program linking
A literal is defined with a prefix = followed by a specification of the literal value.
This directive can be used to indirectly assign values to the symbols. The directive is
usually called ORG (for origin). Its general format is: ORG value
Program blocks allow the generated machine instructions and data to appear in the object program in a different order, by separating blocks for storing code, data, stack, and larger data blocks.
A control section is a part of the program that maintains its identity after assembly; each
control section can be loaded and relocated independently of the others.
Loading - which allocates memory location and brings the object program into memory
for execution - (Loader)
Linking- which combines two or more separate object programs and supplies the
information needed to allow references between them - (Linker)
UNIT II
Macroprocessor: Basic macroprocessor functions - Machine independent macroprocessor features - concatenation of macro parameters - macro processor design options - recursive macro expansion - general purpose macro processor - macro processing within language translators.
Text Editors: Overview of editing process - user interface - editor structure
MACRO PROCESSORS
A macro represents a commonly used group of statements in the source programming language.
A macro instruction (macro) is a notational convenience for the programmer: it allows the programmer to write a shorthand version of a program (modular programming).
The macro processor replaces each macro instruction with the corresponding group of source language statements (expansion).
Normally, it performs no analysis of the text it handles and does not concern itself with the meaning of the involved statements during macro expansion.
The design of a macro processor is generally machine independent.
Two new assembler directives are used in macro definitions:
MACRO: identifies the beginning of a macro definition
MEND: identifies the end of a macro definition
PROTOTYPE FOR THE MACRO
Each parameter begins with &.
name MACRO parameters
:
body
:
MEND
Body: the statements that will be generated as the expansion of the macro.
Macro Definition and Expansion:
The figure shows macro expansion: the left block shows the macro definition and the right block shows the expanded macro, with the macro call replaced by its block of executable instructions.
M1 is a macro with two parameters, D1 and D2. The macro stores the contents of register A in D1 and the contents of register B in D2. Later, M1 is invoked with the parameters DATA1 and DATA2, and a second time with DATA4 and DATA3. Every call of the macro is expanded into the executable statements. The statement M1 DATA1, DATA2 is a macro invocation statement that gives the name of the macro instruction being invoked and the arguments (DATA1 and DATA2) to be used in expanding it. A macro invocation is also referred to as a macro call.
MACRO EXPANSION
The program with macros is supplied to the macro processor. Each macro invocation statement is expanded into the statements that form the body of the macro, with the arguments from the macro invocation substituted for the parameters in the macro prototype. During the expansion, the macro definition statements are deleted, since they are no longer needed.
The arguments and the parameters are associated with one another according to their positions: the first argument in the macro invocation matches the first parameter in the macro prototype, and so on.
After macro processing, the expanded file can become the input to the assembler. The macro invocation statement is treated as a comment, and the statements generated from the expansion are treated exactly as though they had been written directly by the programmer. The difference between macros and subroutines is that the statements from the body of a macro are expanded every time the macro invocation is encountered, whereas the statements of a subroutine appear only once, no matter how many times the subroutine is called. Macro instructions should be written so that the body of the macro contains no labels.
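Positional argument substitution can be sketched in a few lines of Python. The `?1`, `?2` markers anticipate the positional notation used by the macro processor's tables later in this unit; the STA/STB mnemonics and the list-of-strings body are illustrative assumptions standing in for the M1 example.

```python
# Body of the M1 macro with its parameters stored positionally:
M1_BODY = ["STA ?1",      # store register A into the first parameter
           "STB ?2"]      # store register B into the second parameter

def expand(body, args):
    """Substitute invocation arguments for positional parameters."""
    out = []
    for line in body:
        for i, arg in enumerate(args, start=1):
            line = line.replace(f"?{i}", arg)   # positional substitution
        out.append(line)
    return out

expanded = expand(M1_BODY, ["DATA1", "DATA2"])
# ['STA DATA1', 'STB DATA2']
```

Expanding the same body with `["DATA4", "DATA3"]` yields the second invocation's statements, which is exactly why duplicate labels in a macro body would collide.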
Problem of labels in the body of a macro:
If the same macro is expanded multiple times at different places in the program, there will be duplicate labels, which will be treated as errors by the assembler. Solution: do not use labels in the body of the macro; explicitly use PC-relative addressing instead. For example, in the RDBUFF and WRBUFF macros:
JEQ *+11
JLT *-14
This is inconvenient and error-prone.
Program with Macros Expanded
MACRO PROCESSOR ALGORITHM AND DATA STRUCTURES
The design can be done as a two-pass or a one-pass macro processor.
Two-pass macro processor
Pass 1: Process all macro definitions
Pass 2: Expand all macro invocation statements
However, a two-pass design requires all macros to be defined during the first pass, before any macro invocations are expanded, so it cannot handle the case where the body of one macro contains definitions of other macros. A one-pass processor, which requires only that the definition of a macro appear before any statements that invoke it, can handle such nested definitions.
Consider the example of a Macro defining another Macro.
In the example below, the body of the first Macro (MACROS) contains statement that
Define RDBUFF, WRBUFF and other macro instructions for SIC machine.
The body of the second Macro (MACROX) defines these same macros for SIC/XE machine.
A proper invocation would make the same program to perform macro invocation to run on
either SIC or SIC/XE machine.
A program that is to be run on SIC system could invoke MACROS whereas a program to be
run on SIC/XE can invoke MACROX.
However, defining MACROS or MACROX does not define RDBUFF and WRBUFF; these definitions are processed only when an invocation of MACROS or MACROX is expanded.
Example Of The Definition Of Macros Within A Macro Body
One-Pass Macro Processor:
A one-pass macro processor that alternates between macro definition and macro expansion in a recursive way is able to handle nested macro definitions.
Restriction:
The definition of a macro must appear in the source program before any statements that invoke that macro. This restriction does not create any real inconvenience.
The design considered here is for a one-pass macro processor. The data structures required are:
DEFTAB (definition table)
Stores the macro definitions, including the macro prototype and the macro body. Comment lines are omitted, and references to the macro instruction parameters are converted to a positional notation, for efficiency in substituting arguments.
NAMTAB (name table)
Stores macro names and serves as an index to DEFTAB, holding pointers to the beginning and the end of each macro definition in DEFTAB.
ARGTAB (argument table)
Stores the arguments according to their positions in the argument list. As the macro is expanded, arguments from ARGTAB are substituted for the corresponding parameters in the macro body.
The figure shows a portion of the contents of these tables during the processing of the program. The definition of RDBUFF is stored in DEFTAB, with an entry in NAMTAB holding pointers to the beginning and the end of the definition. Arguments referred to by the stored instructions are denoted by positional notation. For example:
TD =X'?1'
This instruction tests the availability of the device whose number is given by the parameter &INDEV; in the stored definition, the parameter is replaced by its positional notation ?1. ARGTAB, as it would appear during expansion, is shown for the RDBUFF invocation:
CLOOP RDBUFF F1, BUFFER, LENGTH
For this invocation of the macro RDBUFF, the first argument is F1 (the input device code), the second is BUFFER (the address where the characters read are to be stored), and the third is LENGTH (the total length of the record to be read). When the positional notation is encountered in a line from DEFTAB, a simple indexing operation supplies the proper argument from ARGTAB.
The algorithm of the macro processor is given below. It has a procedure DEFINE, which enters the macro name in NAMTAB and the macro prototype and body in DEFTAB; EXPAND, which is called to set up the argument values in ARGTAB and expand a macro invocation statement; and GETLINE, which is called to get the next line to be processed, either from DEFTAB or from the input file itself.
When a macro definition is encountered, it is entered in DEFTAB. The normal approach
is to continue entering lines until MEND is encountered. But if a program has a macro defined
within another macro, the very first MEND would wrongly be taken as the end of the outer
macro's definition; the outer macro's definition is only complete at its own MEND.
Therefore the DEFINE procedure keeps a counter
variable LEVEL. Every time a MACRO directive is encountered this counter is incremented by 1. The
System Software And Operating System
70
moment the innermost macro ends, indicated by the directive MEND, the counter is decremented by
one, and each subsequent MEND decrements it further. The last MEND brings the counter back to
zero; so when LEVEL becomes zero, the MEND corresponds to the original MACRO directive.
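The LEVEL bookkeeping can be sketched as follows. This is a hypothetical simplification (real SIC source lines have a label/opcode/operand layout; here a line counts as MACRO or MEND if that token appears anywhere in it):

```python
def define(lines):
    """Enter a macro body into DEFTAB, using a LEVEL counter so that
    nested MACRO/MEND pairs are copied as part of the body.  `lines`
    is an iterator positioned just after the outer MACRO directive."""
    deftab = []
    level = 1                       # we are inside one MACRO already
    for line in lines:
        fields = line.split()
        if "MACRO" in fields:       # a nested definition begins
            level += 1
        elif "MEND" in fields:      # some definition ends
            level -= 1
        deftab.append(line)
        if level == 0:              # the MEND matching the original MACRO
            break
    return deftab

body = define(iter([
    "CLEAR X",
    "RDCHAR MACRO &IN",   # nested macro definition
    "TD =X'?1'",
    "MEND",               # ends RDCHAR: LEVEL back to 1
    "STX ?3",
    "MEND",               # ends the outer macro: LEVEL = 0, stop
    "NEXT SOURCE LINE",   # not consumed
]))
print(len(body))   # 6
```

Only the second MEND terminates the definition; the inner MACRO/MEND pair is stored as ordinary body text.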
Most macro processors allow the definitions of the commonly used instructions to appear in a
standard system library, rather than in the source program. This makes the use of macros convenient;
definitions are retrieved from the library as they are needed during macro processing.
Comparison of Macro Processor Design
One-pass algorithm
Every macro must be defined before it is called
One-pass processor can alternate between macro definition and macro expansion
Nested macro definitions are allowed but nested calls are not allowed.
ALGORITHM FOR A ONE PASS MACRO PROCESSOR
Two-pass algorithm
Pass1: Recognize macro definitions
Pass2: Recognize macro calls.
Nested macro definitions are not allowed.
MACHINE-INDEPENDENT MACRO-PROCESSOR FEATURES
The design of the macro processor does not depend on the architecture of the target machine. We will
study some extended features of this macro processor. These features are:
Concatenation of Macro Parameters
Generation of unique labels
Conditional Macro Expansion
Keyword Macro Parameters
CONCATENATION OF MACRO PARAMETERS
Most macro processors allow parameters to be concatenated with other character strings.
Suppose that a program contains one series of variables named by the symbols XA1, XA2, XA3, ...,
another series named XB1, XB2, XB3, ..., etc. If similar processing is to be performed
on each series of variables, the programmer might write a single macro instruction for it. The parameter
to such a macro instruction could specify the series of variables to be operated on (A, B, etc.). The macro
processor would use this parameter to construct the symbols required in the macro expansion (XA1,
XB1, etc.).
Suppose that the parameter to such a macro instruction is named &ID. The body of the macro
definition might then contain a statement like LDA X&ID1. The character & marks the start of the
macro parameter, but the end of the parameter is not marked: in X&ID1 the macro processor
cannot tell whether the parameter is &ID (followed by the character 1) or &ID1. If the macro
definition contained both &ID and &ID1 as parameters, the situation would be unavoidably
ambiguous. Most macro processors deal with this problem by providing a special concatenation
operator; in the SIC macro language, this operator is the character ->. Thus the statement
LDA X&ID1 can be written as
LDA X&ID -> 1
so that the end of the parameter &ID is marked explicitly.
The invocation statements SUM A and SUM BETA and the corresponding macro expansions are
shown below.
Fig: Concatenation of macro parameters
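This substitution can be sketched in Python. This is a hypothetical helper, not the SIC processor itself; real processors scan tokens rather than doing whole-line text replacement, but the longest-name-first ordering shows how &ID and &ID1 could coexist:

```python
def expand_params(line, params):
    """Substitute &NAME macro parameters (longest name first, so &ID1
    wins over &ID when both exist), then delete the concatenation
    operator -> that marked the end of a parameter."""
    for name in sorted(params, key=len, reverse=True):
        line = line.replace("&" + name, params[name])
    return line.replace("->", "")

# SUM A: the body line LDA X&ID->1 expands to LDA XA1
print(expand_params("LDA X&ID->1", {"ID": "A"}))      # LDA XA1
print(expand_params("LDA X&ID->1", {"ID": "BETA"}))   # LDA XBETA1
```

The operator exists only at macro-processing time; it never reaches the assembler.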
GENERATION OF UNIQUE LABELS
As discussed earlier, it is not possible to use ordinary labels on the instructions in a macro definition,
since every expansion of the macro would repeat the same label, which is not allowed by the assembler.
This in turn forces us to use relative addressing in the jump instructions. Instead we can use the
technique of generating unique labels for every macro invocation and expansion. During macro
expansion each $ will be replaced with $xx, where xx is a two-character alphanumeric counter of
the number of macro instructions expanded.
For example,
XX = AA, AB, AC…
This allows 1296 macro expansions in a single program. The following program shows a macro
definition with labels on its instructions, and the figure shows the macro invocation and
expansion the first time.
Fig: Generation of unique labels within macro expansion
If the macro is invoked second time the labels may be expanded as
$ABLOOP
$ABEXIT.
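The counter scheme can be sketched as a mapping from the expansion number to a two-character alphanumeric suffix (hypothetical helper names; with 26 letters and 10 digits per position, 36 * 36 = 1296 combinations):

```python
import string

ALPHANUM = string.ascii_uppercase + string.digits   # 36 characters

def label_suffix(n):
    """Map expansion number n (0-based) to the two-character counter:
    AA, AB, ..., A9, BA, ...  36 * 36 = 1296 distinct values."""
    return ALPHANUM[n // 36] + ALPHANUM[n % 36]

def expand_labels(line, n):
    """Replace each $ in a macro body line with $ plus the counter."""
    return line.replace("$", "$" + label_suffix(n))

print(expand_labels("JLT $LOOP", 0))   # JLT $AALOOP
print(expand_labels("JLT $LOOP", 1))   # JLT $ABLOOP
```

The first expansion yields $AALOOP/$AAEXIT, the second $ABLOOP/$ABEXIT, matching the example above.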
CONDITIONAL MACRO EXPANSION
Conditional macro expansion allows the sequence of statements generated to depend on the
arguments supplied in the macro invocation. For example:
MACRO &COND
……..
IF (&COND NE “ “)
part I
ELSE
part II
ENDIF
………
MEND
Part I is expanded if the condition is true; otherwise part II is expanded. Comparison operators:
NE, EQ, LE, GT.
Macro-Time Variables:
Fig : use of macro time looping statement
Macro-time variables (often called SET symbols) can be used to store working values
during macro expansion. Any symbol that begins with & and is not a macro instruction
parameter is considered a macro-time variable. All such variables are initialized to 0. The figure
gives the definition of the macro RDBUFF with the parameters &INDEV, &BUFADR, &RECLTH,
&EOR and &MAXLTH. According to the program, if &EOR has any value, then &EORCK is set
to 1 by using the directive SET; otherwise it retains its default value 0.
The programs above show the expansion of macro invocation statements with different
values for the macro-time variables. In figure 4.9(b) the &EOR value is null. When the macro
invocation is processed, the IF statement is evaluated; if it is true, &EORCK is set to 1, otherwise
expansion of the rest of the macro body continues normally.
The macro processor must maintain a symbol table that contains the value of all macro-time
variables used. Entries in this table are modified when SET statements are processed. The table is
used to look up the current value of the macro-time variable whenever it is required. When an IF
statement is encountered during the expansion of a macro, the specified Boolean expression is
evaluated.
If the value of this expression TRUE,
The macro processor continues to process lines from the DEFTAB until it encounters the
ELSE or ENDIF statement.
If an ELSE is found, macro processor skips lines in DEFTAB until the next ENDIF.
Once it reaches ENDIF, it resumes expanding the macro in the usual way.
If the value of the expression is FALSE,
The macro processor skips ahead in DEFTAB until it encounters next ELSE or ENDIF
statement.
The macro processor then resumes normal macro expansion.
The macro-time IF-ELSE-ENDIF structure provides a mechanism for either generating (once) or
skipping selected statements in the macro body. There is another construct, the WHILE statement,
which specifies that the following lines, until the next ENDW statement, are to be generated repeatedly
as long as a particular condition is true. The testing of this condition and the looping are done while
the macro is being expanded. The example below shows the use of the macro-time looping
statement.
WHILE-ENDW structure
When a WHILE statement is encountered during the expansion of a macro, the specified Boolean
expression is evaluated.
TRUE
o The macro processor continues to process lines from DEFTAB until it encounters the next
ENDW statement.
o When ENDW is encountered, the macro processor returns to the preceding WHILE, re-
evaluates the Boolean expression, and takes action based on the new value.
FALSE
The macro processor skips ahead in DEFTAB until it finds the next ENDW statement and
then resumes normal macro expansion.
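The IF-ELSE-ENDIF skipping described above can be sketched as follows. This is a hypothetical simplification: each IF's Boolean expression is assumed to be pre-evaluated to a flag, and a real WHILE/ENDW would additionally rewind the DEFTAB pointer and re-evaluate the condition:

```python
def expand_body(body, conditions):
    """Emit lines of a macro body, honouring IF/ELSE/ENDIF.
    `conditions` is a list of booleans consumed by successive IFs,
    standing in for evaluating the Boolean expression on each IF."""
    out, emit_stack, cond_iter = [], [True], iter(conditions)
    for line in body:
        op = line.split()[0]
        if op == "IF":                    # push: emit only if enclosing
            emit_stack.append(emit_stack[-1] and next(cond_iter))
        elif op == "ELSE":                # flip within the enclosing state
            emit_stack[-1] = (not emit_stack[-1]) and emit_stack[-2]
        elif op == "ENDIF":               # pop back to enclosing state
            emit_stack.pop()
        elif emit_stack[-1]:              # ordinary line: emit or skip
            out.append(line)
    return out

body = ["IF (&EOR NE '')", "part I", "ELSE", "part II", "ENDIF"]
print(expand_body(body, [True]))    # ['part I']
print(expand_body(body, [False]))   # ['part II']
```

Skipped lines are simply not emitted; the selection happens once, at macro-expansion time, exactly as the TRUE/FALSE cases above describe.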
KEYWORD MACRO PARAMETERS
All the macro instruction definitions discussed so far used positional parameters: parameters and
arguments are matched according to their positions in the macro prototype and the macro invocation
statement. The programmer needs to be careful while specifying the arguments; if an argument is to
be omitted, the macro invocation statement must contain a null argument, marked by two
consecutive commas.
Positional parameters are quite suitable when a macro has relatively few parameters. But if a
macro has a large number of parameters, and only a few of the values need to be specified in a
typical invocation, a different form of parameter specification is required (for example, most of the
parameters may have default values, and the invocation may mention only the changes from the
default values).
Ex: XXX MACRO &P1, &P2, …., &P20, ….
XXX A1, A2,,,,,,,,,,…,,A20,…..
Keyword parameters
Each argument value is written with a keyword that names the corresponding parameter.
Arguments may appear in any order.
Null arguments no longer need to be used.
Ex: XXX P1=A1, P2=A2, P20=A20.
It is easier to read and much less error-prone than the positional method.
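Keyword argument matching can be sketched as a dictionary merge over the prototype's defaults (hypothetical parameter names and function name):

```python
def bind_arguments(defaults, invocation_args):
    """Match keyword arguments of the form NAME=VALUE against the
    parameter defaults declared in the macro prototype."""
    argtab = dict(defaults)                 # start from the defaults
    for arg in invocation_args:
        name, _, value = arg.partition("=")
        if name not in argtab:
            raise ValueError("unknown parameter: " + name)
        argtab[name] = value                # override just this one
    return argtab

defaults = {"INDEV": "F1", "BUFADR": "BUFFER", "EOR": "04"}
print(bind_arguments(defaults, ["EOR=05"]))
# {'INDEV': 'F1', 'BUFADR': 'BUFFER', 'EOR': '05'}
```

Because arguments are named, order is irrelevant and omitted parameters silently keep their defaults, which is exactly what makes the keyword form less error-prone.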
Use of keyword parameters in macro instructions
MACRO PROCESSOR DESIGN OPTIONS
RECURSIVE MACRO EXPANSION
We have seen an example of the definition of one macro instruction by another. But we have
not dealt with the invocation of one macro by another. The following example shows the invocation
of one macro by another macro.
Problem of Recursive Expansion
The macro processor design presented earlier cannot handle such recursive macro invocation and
expansion:
The procedure EXPAND would be called recursively, so the invocation arguments in
ARGTAB would be overwritten.
The Boolean variable EXPANDING would be set to FALSE when the "inner" macro
expansion finished; that is, the macro processor would forget that it had been in the middle of
expanding an "outer" macro.
Solutions
Write the macro processor in a programming language that allows recursive calls; local
variables are then retained automatically.
If you are writing in a language without recursion support, use an explicit stack to take care of
pushing and popping local variables and return addresses.
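The first solution comes essentially for free in a language with recursive procedures, because each activation of EXPAND gets its own argument table. A minimal Python sketch (hypothetical table layout; ?n is the positional parameter notation used earlier):

```python
import re

def expand(name, args, namtab, deftab):
    """Expand macro `name`.  Each activation has its own local argtab,
    so a nested invocation cannot clobber the arguments of the
    enclosing expansion."""
    argtab = list(args)                    # local to this activation
    begin, end = namtab[name]
    out = []
    for line in deftab[begin + 1:end]:     # body: skip prototype and MEND
        line = re.sub(r"\?(\d+)", lambda m: argtab[int(m.group(1)) - 1], line)
        fields = line.replace(",", " ").split()
        if fields and fields[0] in namtab:         # nested macro invocation
            out.extend(expand(fields[0], fields[1:], namtab, deftab))
        else:
            out.append(line)
    return out

deftab = [
    "RDBUFF &IN,&BUF",   # 0: prototype
    "CLEAR X",           # 1
    "RDCHAR ?1",         # 2: invokes the RDCHAR macro
    "STCH ?2,X",         # 3
    "MEND",              # 4
    "RDCHAR &IN",        # 5: prototype
    "TD =X'?1'",         # 6
    "RD =X'?1'",         # 7
    "MEND",              # 8
]
namtab = {"RDBUFF": (0, 4), "RDCHAR": (5, 8)}
print(expand("RDBUFF", ["F1", "BUFFER"], namtab, deftab))
# ['CLEAR X', "TD =X'F1'", "RD =X'F1'", 'STCH BUFFER,X']
```

After the inner RDCHAR expansion returns, the outer activation's argtab still holds BUFFER, so the STCH line expands correctly; this is precisely what the shared-ARGTAB design loses.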
The procedure EXPAND would be called when the macro was recognized. The arguments
from the macro invocation would be entered into ARGTAB as follows:
Parameter Value
1 BUFFER
2 LENGTH
3 F1
4 (unused)
The Boolean variable EXPANDING would be set to TRUE, and expansion of the macro
invocation statement would begin. The processing would proceed normally until the statement
invoking RDCHAR was processed. At that point, ARGTAB would look like
Parameter Value
1 F1
2 (Unused)
The expansion of RDCHAR would also proceed normally. At the end of this expansion, however, a
problem would appear. When the end of the definition of RDCHAR was recognized, EXPANDING
would be set to FALSE. Thus the macro processor would "forget" that it had been in the middle of
expanding a macro when it encountered the RDCHAR statement. In addition, the arguments from
the original macro invocation (RDBUFF) would be lost because the values in ARGTAB were
overwritten with the arguments from the invocation of RDCHAR. The cause of these difficulties is
the recursive calls of the procedure EXPAND. When the RDBUFF macro invocation is encountered,
EXPAND is called. Later, it calls PROCESSLINE for line 50, which results in another call to
EXPAND before a return is made from the original call. A similar problem would occur with
PROCESSLINE since this procedure too would be called recursively. For example, there might be
confusion about whether the return from PROCESSLINE should be made to the main (outermost)
loop of the macro processor logic or to the loop within EXPAND. These problems are not difficult to
solve if the macro processor is written in a programming language (such as Pascal or C) that
allows recursive calls. The compiler would ensure that previous values of any variables declared
within a procedure were saved when that procedure was called recursively. It would also take care of
other details involving return from the procedure.
If a programming language that supports recursion is not available, the programmer must take care
of handling such items as return addresses and values of local variables. In such a case,
PROCESSLINE and EXPAND would probably not be procedures at all. Instead, the same logic
would be incorporated into a looping structure, with data values being saved on a stack. The
algorithm for implementing recursive macro calls is the same as the algorithm for a one-pass macro
processor except for the EXPAND and GETLINE procedures, which are as follows.
procedure EXPAND
   begin
      {initially LEVEL = 0 and SP = -1}
      set S(SP + N + 2) = SP
      set SP = SP + N + 2
      set S(SP + 1) = DEFTAB index from NAMTAB
      set up macro call argument list array in
         S(SP + 2) .. S(SP + N + 1), where N = total number of arguments
      while not end of macro definition and LEVEL != 0 do
         begin
            GETLINE
            PROCESSLINE
         end {while}
      set N = SP - S(SP) - 2
      set SP = S(SP)
   end {EXPAND}
procedure GETLINE
   begin
      if SP != -1 then
         begin
            set S(SP + 1) = S(SP + 1) + 1   {advance DEFTAB pointer to next entry}
            get the line from DEFTAB with the pointer S(SP + 1)
            substitute arguments from macro call S(SP + 2) .. S(SP + N + 1)
         end
      else
         read next line from input file
   end {GETLINE}
RECURSIVE MACRO EXPANSION
(Figure: stack trace of the nested expansion. Initially SP = -1. The call RDBUFF
BUFFER,LENGTH,F1 pushes a stack frame containing the saved SP, the DEFTAB index, and the
arguments BUFFER, LENGTH and F1; on exit from each expansion, SP is restored by SP = S(SP).)
Example of nested macro invocation
GENERAL-PURPOSE MACRO PROCESSORS
Macro processors that do not depend on any particular programming language, but can be
used with a variety of different languages.
Pros
Programmers do not need to learn many macro languages.
Although its development costs are somewhat greater than those for a language-specific
macro processor, this expense does not need to be repeated for each language, saving
substantial overall cost.
Cons
A large number of details must be dealt with in a real programming language:
situations in which normal macro parameter substitution should not occur (e.g., comments);
facilities for grouping together terms, expressions, or statements;
tokens (e.g., identifiers, constants, operators, keywords);
the macro language syntax should be consistent with the source programming language.
MACRO PROCESSING WITHIN LANGUAGE TRANSLATORS
The macro processors we have discussed so far are called "preprocessors":
Process macro definitions
Expand macro invocations
Produce an expanded version of the source program, which is then used as input to an
assembler or compiler
You may also combine the macro processing functions with the language translator:
Line-by-line macro processor
Integrated macro processor
Line-by-Line Macro Processor
Used as a sort of input routine for the assembler or compiler
Read source program
Process macro definitions and expand macro invocations
Pass output lines to the assembler or compiler
Benefits
Avoid making an extra pass over the source program.
Data structures required by the macro processor and the language translator can be
combined (e.g., OPTAB and NAMTAB)
Utility subroutines can be used by both macro processor and the language translator.
o Scanning input lines
o Searching tables
o Data format conversion
It is easier to give diagnostic messages related to the source statements
Integrated Macro Processor
An integrated macro processor can potentially make use of any information about the source
program that is extracted by the language translator.
Ex (blanks are not significant in FORTRAN):
DO 100 I = 1,20 is a DO statement, but
DO 100 I = 1 is an assignment statement: it assigns 1 to the variable DO100I.
An integrated macro processor can support macro instructions that depend upon the context
in which they occur.
TEXT EDITORS
Text editors are the primary interface to the computer for all types of "knowledge workers"
as they compose, organize, study and manipulate computer-based information.
OVERVIEW OF THE EDITING PROCESS
An interactive editor is a computer program that allows a user to create and revise a target
document. The term document includes objects such as computer programs, texts, equations, tables,
diagrams, line art and photographs-anything that one might find on a printed page. Text editor is one
in which the primary elements being edited are character strings of the target text.
The document editing process is an interactive user-computer dialogue designed to
accomplish four tasks:
1. Select the part of the target document to be viewed and manipulated
2. Determine how to format this view on-line and how to display it.
3. Specify and execute operations that modify the target document.
4. Update the view appropriately.
Traveling: Selection of the part of the document to be viewed and edited. It involves first
traveling through the document to locate the area of interest, using commands such as "next
screenful", "bottom", and "find pattern". Traveling specifies where the area of interest is.
Filtering - The selection of what is to be viewed and manipulated is controlled by filtering.
Filtering extracts the relevant subset of the target document at the point of interest such as next
Screen full of text or next statement.
Formatting: Formatting determines how the result of filtering will be seen as a visible
representation (the view) on a display screen or other device.
Editing: In the actual editing phase, the target document is created or altered with a set of
operations such as insert, delete, replace, move or copy.
Manuscript oriented editors operate on elements such as single characters, words, lines,
sentences and paragraphs;
Program-oriented editors operate on elements such as identifiers, keywords and statements.
THE USER-INTERFACE OF AN EDITOR
The user of an interactive editor is presented with a conceptual model of the editing system.
This model is an abstract framework of the editor and the world on which its operations are
based.
Line editors simulated the world of the keypunch: they allowed operations on a numbered
sequence of 80-character card-image lines.
Screen editors define a world in which a document is represented as a quarter plane of
text lines, unbounded both down and to the right. The user sees, through a cutout, only a
rectangular subset of this plane on a multi-line display terminal. The cutout can be moved left
or right, and up or down, to display other portions of the document. The user interface is also
concerned with the input devices, the output devices, and the interaction language of the
system.
INPUT DEVICES: The input devices are used to enter elements of text being edited, to enter
commands, and to designate editable elements.
Input devices are categorized as:
1) Text or string devices
2) Button or choice devices
3) Locator devices
4) The data tablet
5) Text devices with arrow (cursor) keys
1) Text or string devices are typically typewriter-like keyboards on which the user presses and
releases keys, sending a unique code for each key. Virtually all computer keyboards are of the
QWERTY type.
2) Button or choice devices generate an interrupt or set a system flag, usually causing an
invocation of an associated application program. Special function keys are also available on the
keyboard. Alternatively, buttons can be simulated in software by displaying text strings or symbols
on the screen; the user chooses a string or symbol instead of pressing a button.
3) Locator devices: They are two-dimensional analog-to-digital converters that position a
cursor symbol on the screen by observing the user’s movement of the device. The most common
such devices are the mouse and the tablet.
4) The data tablet is a flat, rectangular, electromagnetically sensitive panel. Either a
ballpoint-pen-like stylus or a puck (a small device similar to a mouse) is moved over the surface.
The tablet returns to a system program the co-ordinates of the position on the data tablet at which the
stylus or puck is currently located. The program can then map these data-tablet co-ordinates to
screen coordinates and move the cursor to the corresponding screen position.
5) Text devices with arrow (cursor) keys can be used to simulate locator devices. Each of
these keys shows an arrow that points up, down, left or right. Pressing an arrow key typically
generates an appropriate character sequence; the program interprets this sequence and moves the
cursor in the direction of the arrow on the key pressed.
VOICE-INPUT DEVICES, which translate spoken words into their textual equivalents, may prove to
be the text input devices of the future. Voice recognizers are currently available for command input
on some systems.
OUTPUT DEVICES
The output devices let the user view the elements being edited and the result of the editing
operations.
The first output devices were teletypewriters and other character-printing terminals that
generated output on paper.
Next came "glass teletypes" based on cathode ray tube (CRT) technology, which used the
CRT screen essentially to simulate a hard-copy teletypewriter.
Today’s advanced CRT terminals use hardware assistance for such features as moving the
cursor, inserting and deleting characters and lines, and scrolling lines and pages.
Modern professional workstations are based on personal computers with high-resolution
displays; they support multiple proportionally spaced character fonts to produce
realistic facsimiles of hard-copy documents.
INTERACTION LANGUAGE
The interaction language of the text editor is generally one of several common types.
The typing oriented or text command-oriented method
It is the oldest of the major editing interfaces. The user communicates with the editor by typing text
strings both for command names and for operands. These strings are sent to the editor and are
usually echoed to the output device.
Typed specification often requires the user to remember the exact form of all commands, or at least
their abbreviations. If the command language is complex, the user must continually refer to a manual
or an on-line Help function. The typing required can be time consuming for inexperienced users.
Function key interfaces:
Each command is associated with a marked key on the keyboard. This eliminates much typing. E.g.:
Insert key, Shift key, Control key
Disadvantages:
Have too many unique keys
Multiple key stroke commands
Menu oriented interface
A menu is a multiple choice set of text strings or icons which are graphical symbols that represent
objects or operations. The user can perform actions by selecting items for the menus. The editor
prompts the user with a menu. One problem with menu oriented system can arise when there are
many possible actions and several choices are required to complete an action. The display area of the
menu is rather limited
EDITOR STRUCTURE
The Command Language Processor accepts input from the user's input devices, and analyzes
the tokens and syntactic structure of the commands. It functions much like the lexical and syntactic
phases of a compiler. The command language processor may invoke the semantic routines directly.
In a text editor, these semantic routines perform functions such as editing and viewing.
The semantic routines involve traveling, editing, viewing and display functions. Editing
operations are always specified by the user and display operations are specified implicitly by the
other three categories of operations. Traveling and viewing operations may be invoked either
explicitly by the user or implicitly by the editing operations.
Typical editor structure
Editing Component
In editing a document, the start of the area to be edited is determined by the current editing
pointer maintained by the editing component, which is the collection of modules dealing with
editing tasks. The current editing pointer can be set or reset explicitly by the user using travelling
commands, such as next paragraph and next screen, or implicitly as a side effect of the previous
editing operation such as delete paragraph.
Traveling Component
The traveling component of the editor actually performs the setting of the current editing and
viewing pointers, and thus determines the point at which the viewing and /or editing filtering begins.
Viewing Component
The start of the area to be viewed is determined by the current viewing pointer. This pointer
is maintained by the viewing component of the editor, which is a collection of modules responsible
for determining the next view. The current viewing pointer can be set or reset explicitly by the user
or implicitly by the system as a result of a previous editing operation. The viewing component
formulates an ideal view, often expressed in a device-independent intermediate representation. This
view may be a very simple one, consisting of a window's worth of text arranged so that lines are not
broken in the middle of words.
Display Component
It takes the idealized view from the viewing component and maps it to a physical output
device in the most efficient manner. The display component produces a display by mapping the
buffer to a rectangular subset of the screen, usually a window.
Simple relationship between Editing and viewing buffers
Editing Filter
Filtering consists of the selection of contiguous characters beginning at the current point. The
editing filter filters the document to generate a new editing buffer, based on the current editing
pointer as well as on the editing filter parameters.
Editing Buffer
It contains the subset of the document filtered by the editing filter based on the editing
pointer and editing filter parameters.
Viewing Filter
When the display needs to be updated, the viewing component invokes the viewing filter.
This component filters the document to generate a new viewing buffer based on the current viewing
pointer as well as on the viewing filter parameters.
Viewing Buffer
It contains the subset of the document selected by the viewing filter, based on the viewing
pointer and viewing filter parameters. As part of an editing command there is an implicit travel to the
first line of the file. Lines 1 through 50 are then filtered from the document to become the editing
buffer. Successive substitutions take place in this editing buffer without corresponding updates of the
view.
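The relationship between the document, the editing pointer, and the editing buffer can be sketched as follows (hypothetical function and variable names; a line-oriented filter, as in the 50-line example above):

```python
def editing_filter(document, pointer, size):
    """Extract `size` contiguous lines starting at the current editing
    pointer; the result becomes the editing buffer."""
    return document[pointer:pointer + size]

document = ["line %d" % i for i in range(1, 201)]   # a 200-line file
buffer = editing_filter(document, 0, 50)            # lines 1 through 50
buffer[9] = "edited " + buffer[9]                   # edits stay in the buffer
print(len(buffer), buffer[9])                       # 50 edited line 10
```

Edits land in the buffer, not in the view: the display is only refreshed when the viewing component reruns its own filter.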
In Line editors, the viewing buffer may contain the current line; in screen editors, this buffer
may contain rectangular cut out of the quarter-plane of text. This viewing buffer is then passed to the
display component of the editor, which produces a display by mapping the buffer to a rectangular
subset of the screen, usually called a window.
The editing and viewing buffers, while independent, can be related in many ways. In a
simplest case, they are identical: the user edits the material directly on the screen. On the other hand,
the editing and viewing buffers may be completely disjoint.
Windows typically cover the entire screen or rectangular portion of it. Mapping viewing
buffers to windows that cover only part of the screen is especially useful for editors on modern
graphics based workstations. Such systems can support multiple windows, simultaneously showing
different portions of the same file or portions of different files. This approach allows the user to
perform inter-file editing operations much more effectively than with a system having only a single
window.
The mapping of the viewing buffer to a window is accomplished by two components of the system.
(i) First, the viewing component formulates an ideal view, often expressed in a device-
independent intermediate representation. This view may be a very simple one, consisting of a
window's worth of text arranged so that lines are not broken in the middle of words. At the other
extreme, the idealized view may be a facsimile of a page of fully formatted and typeset text with
equations, tables and figures.
(ii) Second, the display component takes this idealized view from the viewing component
and maps it to a physical output device in the most efficient manner possible.
The components of the editor deal with a user document on two levels:
(i) In main memory and
(ii) In the disk file system.
Loading an entire document into main memory may be infeasible. However, if only part of a
document is loaded, and if many user-specified operations require a disk read by the editor to locate
the affected portions, editing might be unacceptably slow. In some systems this problem is solved by
mapping the entire file into virtual memory and letting the operating system perform efficient
demand paging.
An alternative is to provide editor paging routines, which read one or more logical
portions of a document into memory as needed. Such portions are often termed pages, although
there is usually no relationship between these pages and hard-copy document pages or virtual-
memory pages. These pages remain resident in main memory until a user operation requires that
another portion of the document be loaded.
Editors function in three basic types of computing environment:
(i) Time-sharing environment
(ii) Stand-alone environment and
(iii) Distributed environment.
Each type of environment imposes some constraint on the design of an editor.
The Time –Sharing Environment
The time sharing editor must function swiftly within the context of the load on the
computer’s processor, central memory and I/O devices.
The Stand-alone Environment
The editor on a stand-alone system must have access to the functions that time-sharing
editors obtain from the host operating system. These may be provided in part by a small local
operating system, or they may be built into the editor itself if the stand-alone system is dedicated to
editing.
Distributed Environment
The editor operating in a distributed resource-sharing local network must, like a stand-alone
editor, run independently on each user's machine and must, like a time-sharing editor, contend for
shared resources such as files.
SUMMARY:
MACRO: identifies the beginning of a macro definition
MEND: identifies the end of a macro definition
Basic macro processor functions:
Macro Definition and Expansion
Macro Processor Algorithms and Data structures
Two-pass macro processor:
Pass 1: Process all macro definitions
Pass 2: Expand all macro invocation statements
One-Pass Macro Processor:
A one-pass macro processor that alternates between macro definition and macro
expansion in a recursive way is able to handle nested macro definitions.
NAMTAB (Name Table):
Stores macro names
Serves as an index to DEFTAB
Pointers to the beginning and the end of the macro definition (DEFTAB)
ARGTAB (Argument Table):
Stores the arguments according to their positions in the argument list.
As the macro is expanded the arguments from the Argument table are
substituted for the corresponding parameters in the macro body.
Comparison of Macro Processor Design
One-pass algorithm
Every macro must be defined before it is called
One-pass processor can alternate between macro definition and macro expansion
Nested macro definitions are allowed but nested calls are not allowed.
Two-pass algorithm
Pass1: Recognize macro definitions.
Pass2: Recognize macro calls.
Nested macro definitions are not allowed.
General-purpose macro processors
Macro processors that do not depend on any particular programming language, but
can be used with a variety of different languages
Machine-independent macro-processor features:
o Concatenation of Macro Parameters
o Generation of unique labels
o Conditional Macro Expansion
o Keyword Macro Parameters
UNIT III
Machine dependent compiler features - Intermediate form of the program-Machine dependent code
optimization-machine independent compiler features-Compiler design options-division into passes-
interpreters-p –code compilers-compiler-compilers.
MACHINE INDEPENDENT COMPILER FEATURES
Machine-independent compiler features include methods for handling structured variables such
as arrays. The problems involved in compiling a block-structured language, and some possible
solutions, are also discussed.
STRUCTURED VARIABLES
The structured variables discussed here are arrays, records, strings and sets. The primary
consideration is the allocation of storage for such variables, followed by the generation of code to
reference them. The same principles can also be applied to other types of structured variables.
Arrays: Consider Pascal array declarations.
(i) Single-dimension array:
A : ARRAY [ 1 .. 10 ] OF INTEGER
If each integer variable occupies one word of memory, then we require 10 words of
memory to store this array. In general, an array declaration
ARRAY [ l .. u ] OF INTEGER
requires ( u - l + 1 ) words of memory.
(ii) Two-dimension array:
B : ARRAY [ 0 .. 3, 1 .. 3 ] OF INTEGER
In this declaration the total memory required is ( 3 - 0 + 1 ) = 4 rows times ( 3 - 1 + 1 ) = 3 columns,
i.e. 4 x 3 = 12 words of memory.
In general, ARRAY [ l1 .. u1, l2 .. u2 ] OF INTEGER requires ( u1 - l1 + 1 ) * ( u2 - l2 + 1 ) memory
words.
The data can be stored in memory in two different ways: row-major and column-major order.
When all array elements that have the same value of the first subscript are stored in contiguous
locations, this is called row-major order, shown in Fig. 30(a). Another way of looking at this is to
scan the words of the array in sequence and observe the subscript values: in row-major order, the
rightmost subscript varies most rapidly.
Fig. 30(b) shows the column-major way of storing the data in memory. All elements that
have the same value of the second subscript are stored together; this is called column-major order. In
other words, in column-major order the leftmost subscript varies most rapidly.
To refer to an element, we must calculate the address of the referenced element relative to
the base address of the array. The compiler generates code to place this relative address in an index
register; indexed addressing then makes it easy to access the desired array element.
(1) One-Dimensional Array: On a SIC machine, the address of A[6] is calculated as the
starting address of the data plus the size of each element times the number of preceding elements.
Assuming the starting address is 1000H, each element occupies 3 bytes on SIC, and 5 elements
precede A[6], the address of A[6] is 1000 + 3 * 5 = 1015. In general, for A : ARRAY [ l .. u ]
OF INTEGER, if each array element occupies W bytes of storage and the value of the subscript is S,
then the relative address of the referenced element A[S] is given by W * ( S - l ).
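The address calculation above can be sketched as follows (the function name is invented, and the 3-byte default matches the SIC word size assumed in the text):

```python
def relative_address(s, l, w=3):
    """Relative address of A[s] for A: ARRAY [l..u], w bytes per element
    (3 bytes per word on SIC). Implements W * (S - l)."""
    return w * (s - l)

# A: ARRAY [1..10] OF INTEGER starting at address 1000 on SIC:
base = 1000
print(base + relative_address(6, 1))  # address of A[6]: 1000 + 3 * 5 = 1015
```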
The code generation to perform such a calculation is shown in Fig. 31 for
A : ARRAY [ 1 .. 10 ] OF INTEGER
The notation A[i2] in quadruple 3 specifies that the generated machine code should refer to A
using indexed addressing, after having placed the value of i2 in the index register.
(2) Multi-Dimensional Array: For multi-dimensional arrays we assume row-major order. To access
element B[ 2, 3 ] of a matrix B : ARRAY [ 0 .. 3, 1 .. 6 ], we must skip over two complete rows
(rows 0 and 1) before arriving at the beginning of row 2. Each row contains 6 elements, so we skip
6 x 2 = 12 array elements. We must then skip over the first two elements of row 2 to arrive at
B[ 2, 3 ]. This makes a total of 12 + 2 = 14 elements between the beginning of the array and element
B[ 2, 3 ]. If each element occupies 3 bytes, as on SIC, then B[ 2, 3 ] is located at relative address
14 x 3 = 42 within the array.
Generally, a two-dimensional array declaration can be written as
B : ARRAY [ l1 .. u1, l2 .. u2 ] OF INTEGER
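The general row-major calculation can be sketched as follows; the function name is invented, and the example uses an array with rows of 6 elements indexed from 0 and 1, matching the figures worked above (14 elements skipped, 14 x 3 = 42):

```python
def row_major_offset(s1, s2, l1, l2, u2, w=3):
    """Relative address of B[s1, s2] for B: ARRAY [l1..u1, l2..u2] stored in
    row-major order. u1 is not needed: row-major skips (s1 - l1) whole rows
    of (u2 - l2 + 1) elements, then (s2 - l2) elements within the row."""
    row_len = u2 - l2 + 1
    return w * ((s1 - l1) * row_len + (s2 - l2))

# B: ARRAY [0..3, 1..6] OF INTEGER with 3-byte SIC words:
print(row_major_offset(2, 3, 0, 1, 6))  # (2*6 + 2) * 3 = 14 * 3 = 42
```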
Code Generation for Two Dimensional Array
The symbol-table entry for an array usually specifies the following:
1. The type of the elements in the array
2. The number of dimensions declared
3. The lower and upper limit for each subscript.
This information is sufficient for the compiler to generate the code required for array
references. In some languages, like FORTRAN 90, the values of ROWS and COLUMNS
are not known at compilation time, so the compiler cannot directly generate such code. Instead, the
compiler creates a descriptor, called a dope vector, for the array. The descriptor includes space
for storing the lower and upper bounds for each array subscript. When storage is allocated for
the array, the values of these bounds are computed and stored in the descriptor. The
generated code for an array reference uses the values from the descriptor to calculate
relative addresses as required. The descriptor may also include the number of dimensions of
the array, the type of the array elements and a pointer to the beginning of the array. This
information can be useful if the allocated array is passed as a parameter to another procedure.
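A dope vector holding the fields just described might look like the following minimal sketch. The class and field names are invented for illustration, and the bound values are made up; in practice they would be filled in when storage is allocated.

```python
# A minimal sketch of a dope vector (array descriptor): per-dimension bounds,
# element type, and a pointer to the start of the allocated array.
from dataclasses import dataclass

@dataclass
class DopeVector:
    bounds: list        # one (lower, upper) pair for each subscript
    elem_type: str      # type of the array elements
    base: int           # address of the beginning of the array

    @property
    def ndims(self):
        return len(self.bounds)

    def size_in_elements(self):
        # Product of (upper - lower + 1) over all dimensions.
        n = 1
        for lo, hi in self.bounds:
            n *= hi - lo + 1
        return n

# Filled in at run time, once ROWS and COLUMNS are known (values made up):
dv = DopeVector(bounds=[(1, 4), (1, 5)], elem_type="INTEGER", base=2000)
print(dv.ndims, dv.size_in_elements())  # 2 dimensions, 4 * 5 = 20 elements
```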
Code Generation For Array References
For example, FORTRAN 90 provides dynamic arrays. Using this feature, a two-dimensional array
could be declared as
INTEGER, ALLOCATABLE, ARRAY (:,:) :: MATRIX
This specifies that MATRIX is an array of integers that can be allocated dynamically. The
allocation can be accomplished by a statement like
ALLOCATE (MATRIX(ROWS,COLUMNS))
where the variables ROWS and COLUMNS have previously been assigned values, since their values
are not known at compilation time. In the compilation of other structured variables such as records,
strings and sets, the same kinds of storage allocation are required. The compiler must store
information concerning the structure of the variable, use that information to generate code to access
components of the structure, and construct a descriptor for situations in which the required
information is not known at compilation time.
MACHINE-INDEPENDENT CODE OPTIMIZATION
One important source of code optimization is the elimination of common subexpressions.
These are subexpressions that appear at more than one point in the program and that compute the
same value. Consider
X, Y : ARRAY [ 0 .. 10, 1 .. 10 ] OF INTEGER
....FOR I := 1 TO 10 DO
X[ I, 2 * J - 1 ] := Y[ I, 2 * J ]
The subexpression 2 * J is calculated twice. An optimizing compiler should generate code
so that the multiplication is performed only once and the result is used in both places. Common
subexpressions are usually detected through the analysis of an intermediate form of the program.
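Detection of common subexpressions over quadruples can be sketched as follows. This is a toy illustration, not a full optimizer: it assumes each quadruple is an (op, arg1, arg2, result) tuple and that no operand is redefined within the basic block.

```python
# Toy common-subexpression elimination over a single basic block of quadruples.
def eliminate_common_subexpressions(quads):
    seen = {}     # (op, arg1, arg2) -> result of the first occurrence
    rename = {}   # result of a deleted quadruple -> surviving result
    out = []
    for op, a1, a2, res in quads:
        # Apply any substitutions already decided (e.g. i3 for i10).
        a1, a2 = rename.get(a1, a1), rename.get(a2, a2)
        key = (op, a1, a2)
        if key in seen:
            rename[res] = seen[key]   # duplicate: delete it, remember rename
        else:
            seen[key] = res
            out.append((op, a1, a2, res))
    return out

quads = [("*", "#2", "J", "i1"), ("-", "i1", "#1", "i2"),
         ("*", "#2", "J", "i3")]           # 2 * J computed twice
print(eliminate_common_subexpressions(quads))
```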
CODE OPTIMIZATION BY REDUCTION IN STRENGTH OF OPERATIONS
The operand J is not changed in value between quadruples 5 and 12, and it is not possible to
reach quadruple 12 without passing through quadruple 5 first, because the quadruples are part of the
same basic block. Therefore, quadruples 5 and 12 compute the same value. This means we can
delete quadruple 12 and replace any reference to its result (i10) with a reference to i3, the result of
quadruple 5. This transformation eliminates the duplicate calculation of 2 * J, which we identified
previously as a common subexpression in the source statement.
After the substitution of i3 for i10, quadruples 6 and 13 are the same except for the name of
the result. Hence quadruple 13 can be removed and i4 substituted for i11 wherever it is used.
Similarly, quadruples 10 and 11 can be removed because they are equivalent to quadruples 3 and 4.
STORAGE ALLOCATION
All program-defined variables and temporary variables, including the location used to save
the return address, use a simple type of storage assignment called static allocation.
When procedures are called recursively, static allocation cannot be used. This is explained
with an example. Fig. 38(a) shows the operating system calling the program MAIN. The return
address from register L is stored at a static memory location RETADR within MAIN.
MAIN has called the procedure SUB, and the return address for the call has been stored at a
fixed location within SUB (invocation 2). If SUB now calls itself recursively, a problem occurs:
SUB stores the return address for invocation 3 into RETADR from register L, destroying the return
address for invocation 2. As a result, there is no possibility of ever making a correct return to
MAIN.
There is also no provision for saving register contents or variable values. When the recursive
call is made, invocation 3 of SUB may change variables whose previous values are still needed by
invocation 2 after the return from the recursive call. Hence it is necessary to preserve the previous
values of any variables used by SUB, including parameters, temporaries, return addresses, and
register save areas, when a recursive call is made. This is accomplished with a dynamic storage
allocation technique.
In this technique, each procedure call creates an activation record that contains storage for
all the variables used by the procedure. If the procedure is called recursively, another activation
record is created. Each activation record is associated with a particular invocation of the procedure,
not with the procedure itself. An activation record is not deleted until a return has been made from
the corresponding invocation.
Recursive invocation of a procedure using static storage allocation
Activation records are typically allocated on a stack, with the current record at the top of the stack.
The procedure MAIN has been called; its activation record appears on the stack. The base register B
has been set to indicate the starting address of this current activation record. The first word in an
activation record would normally contain a pointer PREV to the previous record on the stack. Since
this record is the first, the pointer value is null. The second word of the activation record contains a
pointer NEXT to the first unused word of the stack, which will be the starting address for the next
activation record created. The third word contains the return address for this invocation of the
procedure, and the remaining words contain the values of variables used by the procedure.
When a procedure returns to its caller, the current activation record (which corresponds to the most
recent invocation) is deleted. The pointer PREV in the deleted record is used to reestablish the
previous activation record as the current one, and execution continues. This shows the stack as it
would appear after SUB returns from the recursive call: register B has been reset to point to the
activation record for the previous invocation of SUB.
The return address and all the variable values in this activation record are exactly the same as
they were before the recursive call. This technique is often referred to as automatic allocation of
storage to distinguish it from other types of dynamic allocation that are under the control of the
programmer.
When automatic allocation is used, the compiler must generate code for references to
variables using some sort of relative addressing. In our example the compiler assigns to each
variable an address that is relative to the beginning of the activation record, instead of an actual
location within the object program.
The address of the current activation record is, by convention, contained in register B, so a
reference to a variable is translated as an instruction that uses base relative addressing. The
displacement in this instruction is the relative address of the variable within the activation record.
The compiler must also generate additional code to manage the activation records themselves. At the
beginning of each procedure there must be code to create a new activation record, linking it to the
previous one and setting the appropriate pointers. This code is often called a prologue for the
procedure. At the end of the procedure, there must be code to delete the current activation record,
resetting pointers as needed. This code is often called an epilogue.
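The prologue and epilogue behaviour described above can be illustrated with a small simulation. Class and field names are invented, and a real compiler would of course emit machine code for these operations rather than run them directly; the sketch only shows why invocation 2's state survives the recursive call.

```python
# Sketch of automatic storage allocation: each call pushes an activation
# record (prologue) and each return pops it (epilogue). PREV links records
# as in the text; the Python list plays the role of the stack itself.
class ActivationStack:
    def __init__(self):
        self.records = []              # current activation record on top

    def prologue(self, proc, ret_addr, variables):
        record = {"proc": proc, "ret": ret_addr, "vars": dict(variables),
                  "prev": self.records[-1] if self.records else None}
        self.records.append(record)    # create and link a new record
        return record

    def epilogue(self):
        record = self.records.pop()    # delete the current record
        return record["ret"]           # caller resumes at this address

stack = ActivationStack()
stack.prologue("MAIN", ret_addr=0, variables={})
stack.prologue("SUB", ret_addr=100, variables={"n": 2})
stack.prologue("SUB", ret_addr=200, variables={"n": 1})  # recursive call
print(stack.epilogue())                # 200: returns from invocation 3
print(stack.records[-1]["vars"]["n"])  # 2: invocation 2's value is intact
```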
BLOCK-STRUCTURED LANGUAGES
A block is a portion of a program that has the ability to declare its own identifiers. This
definition of a block also fits units such as procedures and functions.
Each procedure corresponds to a block. Note that blocks can be nested within other blocks.
Example: procedures B and D are nested within procedure A, and procedure C is nested within
procedure B. Each block may contain a declaration of variables. A block may also refer to variables
that are defined in any block that contains it, provided the same names are not redefined in the inner
block. Variables cannot be used outside the block in which they are declared.
In compiling a program written in a block-structured language, it is convenient to number
the blocks. As the beginning of each new block is recognized, it is assigned the next block number in
sequence. The compiler can then construct a table that describes the block structure. The block-level
entry gives the nesting depth for each block: the outermost block has level 1, and each inner block
has a level one greater than that of the surrounding block.
A NEW or MALLOC statement would be translated into a request to the operating system for an
area of storage of the required size. Another method is to handle the required allocation through a
run-time support procedure associated with the compiler. With this method, a large block of free
storage called a heap is obtained from the operating system at the beginning of the program.
Allocations of storage from the heap are managed by the run-time procedure. In some systems, it is
not even necessary for the programmer to free storage explicitly. Instead, a run-time garbage
collection procedure scans the pointers in the program and reclaims areas from the heap that are no
longer being used. Dynamic storage allocation, as discussed in this section, provides another
example of delayed binding.
Nesting of blocks in a source program
When a reference to an identifier appears in the source program, the compiler must first
check the symbol table for a definition of that identifier by the current block. If no such definition is
found, the compiler looks for a definition by the block that surrounds the current one, then by the
block that surrounds that, and so on.
If the outermost block is reached without finding a definition of the identifier, then the
reference is an error. The search process just described can easily be implemented within a symbol
table that uses hashed addressing. The hashing function is used to locate one definition of the
identifier. The chain of definitions for that identifier is then searched for the appropriate entry.
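The search just described can be sketched as follows, with one chain of definitions per identifier, each entry tagged with its defining block number. The data values and function name are made up for illustration; a real symbol table would hash the name to reach its chain.

```python
# Scope search over a hashed symbol table: each name maps to a chain of
# (defining-block, definition) entries; `enclosing` maps each block number
# to the block that surrounds it (None for the outermost block).
def lookup(table, name, block, enclosing):
    chain = table.get(name, [])              # all definitions of this name
    b = block
    while b is not None:
        for defn_block, defn in chain:
            if defn_block == b:
                return defn                  # innermost visible definition
        b = enclosing[b]                     # try the surrounding block
    raise NameError(f"{name} is undefined")  # outermost block reached

table = {"X": [(1, "integer, block 1"), (3, "real, block 3")]}
enclosing = {1: None, 2: 1, 3: 2}            # block 3 inside 2 inside 1
print(lookup(table, "X", 3, enclosing))      # block 3's own definition wins
print(lookup(table, "X", 2, enclosing))      # falls back to block 1's
```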
There are other symbol-table organizations that store the definitions of identifiers according
to the nesting of the blocks that define them. This kind of structure can make the search for the
proper definition more efficient. Most block-structured languages make use of automatic storage
allocation. That is, the variables that are defined by a block are stored in an activation record that is
created each time the block is entered.
If a statement refers to a variable that is declared within the current block, this variable is
present in the current activation record, so it can be accessed in the usual way. However, it is also
possible for a statement to refer to a variable that is declared in some surrounding block. In that case,
the most recent activation record for that block must be located to access the variable. One common
method for providing access to variables in surrounding blocks uses a data structure called a display.
Use of display for procedures
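The idea of a display can be sketched as follows. Field names and values are invented; a real display holds the addresses of activation records, indexed by block nesting level, so that a variable of any surrounding block is reached in one step instead of chasing PREV links.

```python
# Sketch of a display: display[level] refers to the most recent activation
# record for the block at that nesting depth.
def access(display, level, var):
    """Fetch a variable from the record for the block at `level`."""
    return display[level]["vars"][var]

# Activation records for blocks at nesting levels 1 and 2 (made-up contents):
rec_main = {"vars": {"A": 10}}
rec_sub = {"vars": {"B": 20}}
display = {1: rec_main, 2: rec_sub}

print(access(display, 2, "B"))  # variable declared in the current block
print(access(display, 1, "A"))  # variable of a surrounding block, one lookup
```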
COMPILER DESIGN OPTIONS
Compiler design options are briefly discussed in this section. Compilers may be divided into
one-pass and multi-pass designs.
DIVISION INTO PASSES
In this design, the parsing process drives the compiler: the lexical scanner is called when
the parser needs another input token, and a code-generation routine is invoked as the parser
recognizes each language construct. The code-optimization techniques discussed earlier cannot be
applied in full to a one-pass compiler without intermediate code generation. A one-pass compiler is,
however, efficient at generating object code.
A one-pass compiler cannot be used to translate all languages. FORTRAN and
Pascal programs have declarations of variables at the beginning of the program, and any
variable that is not declared is assigned characteristics by default.
A one-pass compiler can fix up a forward-reference jump instruction without problems, as in a
one-pass assembler, but it is difficult to do so if the declaration of an identifier appears after the
identifier has been used, as is allowed in some programming languages.
Example: X := Y * Z
If all the variables X, Y and Z are of type INTEGER, the object code for this statement might
consist of a simple integer multiplication followed by storage of the result. If the variables are a
mixture of REAL and INTEGER types, one or more conversion operations will need to be included
in the object code, and floating-point arithmetic instructions may be used. Obviously the compiler
cannot decide what machine instructions to generate for this statement unless information about the
operands is available. The statement may even be illegal for certain combinations of operand types.
Thus a language that allows forward references to data items cannot be compiled in one pass.
Some programming languages require more than two passes.
Example: ALGOL 68 requires at least 3 passes.
There are a number of factors that should be considered in deciding between one pass and multi pass
compiler designs.
(1) One-Pass Compilers:
Speed of compilation is often considered important. Computers running student jobs tend to spend
a large amount of time performing compilations; the resulting object code is usually executed only
once or twice for each compilation, and these test runs are normally very short. In such an
environment, an improvement in the speed of compilation can lead to significant benefits in system
performance and job turnaround time.
(2) Multi-Pass Compilers:
If programs are executed many times for each compilation, or if they process large amounts of
data, then speed of execution becomes more important than speed of compilation. In such a case, we
might prefer a multi-pass compiler design that could incorporate sophisticated code-optimization
techniques.
Multi-pass compilers are also used when the amount of memory, or other system
resources, is severely limited. The requirements of each pass can be kept smaller if the work of
compilation is divided into several passes.
Other factors may also influence the design of the compiler. If a compiler is divided into
several passes, each pass becomes simpler and therefore, easier to understand, read and test.
Different passes can be assigned to different programmers and can be written and tested in parallel,
which shortens the overall time required for compiler construction.
INTERPRETERS
An interpreter processes a source program written in a high-level language. The main
difference between a compiler and an interpreter is that an interpreter executes a version of the
source program directly, instead of translating it into machine code.
An interpreter performs lexical and syntactic analysis functions just like a compiler and then
translates the source program into an internal form. The internal form may, for example, be a
sequence of quadruples.
After translating the source program into an internal form, the interpreter executes the
operations specified by the program. During this phase, an interpreter can be viewed as a set of
subroutines; the internal form of the program drives the execution of these subroutines.
The real advantage of an interpreter over a compiler, however is in the debugging facilities
that can easily be provided. The symbol table, source line number, and other information from the
source program are usually retained by the interpreter. During execution, these can be used to
produce symbolic dumps of data values, traces of program execution related to the source statements
etc.
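The execution phase described above can be sketched as a toy interpreter over quadruples. The operator set, the quadruple layout, and the use of a dictionary as the variable environment are all assumptions made for illustration.

```python
# Toy interpreter executing quadruples (op, arg1, arg2, result) directly.
def interpret(quads, env):
    for op, a1, a2, res in quads:
        v1 = env.get(a1, a1)      # an operand is a variable or a constant
        v2 = env.get(a2, a2)
        if op == "+":
            env[res] = v1 + v2
        elif op == "*":
            env[res] = v1 * v2
        elif op == ":=":
            env[res] = v1         # plain assignment
    return env

# X := Y * Z + 1 translated into quadruples:
quads = [("*", "Y", "Z", "i1"), ("+", "i1", 1, "i2"), (":=", "i2", None, "X")]
print(interpret(quads, {"Y": 6, "Z": 7})["X"])  # 6 * 7 + 1 = 43
```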
Most programming languages can be either compiled or interpreted successfully. However,
some languages are particularly well suited to the use of an interpreter. Compilers usually generate
calls to library routines to perform functions such as I/O and complex conversion operations. In such
cases, an interpreter might be preferred because of its speed of translation: most of the execution
time for the program would be consumed by the standard library routines, and these routines would
be the same regardless of whether a compiler or an interpreter is used.
In some languages the type of a variable can change during the execution of a program.
Dynamic scoping may also be used, in which the variables referred to by a function or subroutine are
determined by the sequence of calls made during execution, not by the nesting of blocks in the
source program. It is difficult to compile such languages efficiently while allowing for dynamic
changes in the types of variables and the scope of names. These features can be more easily handled
by an interpreter that provides delayed binding of symbolic variable names to data types and
locations.
P-CODE COMPILERS
P-Code compilers also called bytecode compilers are very similar in concept to interpreters.
A P-code compiler, intermediate form is the machine language for hypothetical computers, often
called pseudo-machine or P-machine. The source program is compiled, with the resulting object
program being in P-code. This P-code program is then read and executed under the control of a P-
code interpreter.
The main advantage of this approach is portability of software. It is not necessary for the
compiler to generate different code for different computers, because the P-code object program can
be executed on any machine that has a P-code interpreter. Even the compiler itself can be transported
if it is written in the language that it compiles. To accomplish this, the source version of the compiler
is compiled into P-code; this P-code can then be interpreted on another machine. In this way a
P-code compiler can be used without modification on a wide variety of systems if a P-code
interpreter is written for each different machine.
The design of a P-machine and the associated P-code is often related to the requirements of
the language being compiled. For example, the P-code for a Pascal compiler might include single P-
instructions that:
Perform array subscript calculations
Handle the details of procedure entry and exit
Perform elementary operations on sets
Translation and execution using a P-code compiler
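A P-machine can be sketched as a small stack-based interpreter. The instruction names below are invented and far simpler than any real P-code instruction set; the point is only that the same P-code program runs on any machine that provides this interpreter.

```python
# A minimal hypothetical P-machine: a stack-based interpreter for a few
# invented P-code instructions.
def run(pcode):
    stack = []
    for instr in pcode:
        op = instr[0]
        if op == "PUSH":
            stack.append(instr[1])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack.pop()

# Object program for 2 * 3 + 4, runnable wherever this interpreter exists:
program = [("PUSH", 2), ("PUSH", 3), ("MUL",), ("PUSH", 4), ("ADD",)]
print(run(program))  # 10
```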
This simplifies the code-generation process, leading to a smaller and more efficient compiler. The P-
code object program is often much smaller than a corresponding machine-code program, which is
particularly useful on machines with severely limited memory size. However, the interpretive
execution of a P-code program may be much slower than the execution of equivalent machine code.
Many P-code compilers are designed for a single user running on a dedicated micro-computer
system. In that case, the speed of execution may be relatively insignificant, because the limiting
factor in system performance may be the response time and "think time" of the user.
If execution speed is important, some P-code compilers support the use of machine-
language subroutines. By rewriting a small number of commonly used routines in machine language,
rather than P-code, it is often possible to improve the performance. Of course, this approach
sacrifices some of the portability associated with the use of P-code compilers.
COMPILER-COMPILERS
A compiler-compiler is a software tool that can be used to help with the task of compiler
construction. Such tools are also often called compiler generators or translator-writing systems.
The compiler writer provides a description of the language to be translated. This description
may consist of a set of lexical rules for defining tokens and a grammar for the source language.
Some compiler-compilers use this information to generate a scanner and a parser directly. Others
create tables for use by standard table-driven scanning and parsing routines that are supplied by the
compiler-compiler.
Automated compiler construction using a compiler-compiler
Difference between assembler, compiler and interpreter
Assembler:
A computer will not understand any program written in a language other than its machine
language. Programs written in other languages must therefore be translated into the machine
language, and such translation is performed with the help of software. A program which translates an
assembly language program into a machine language program is called an assembler. If an
assembler runs on a computer and produces machine code for that same computer, then it is called a self
assembler or resident assembler. If an assembler runs on one computer and produces machine
code for a different computer, it is called a cross assembler.
Assemblers are further divided into two types: one-pass and two-pass
assemblers. A one-pass assembler assigns memory addresses to the
variables and translates the source code into machine code in a single pass. A two-
pass assembler reads the source code twice: in the first pass, it reads all the
variables and assigns them memory addresses; in the second pass, it reads the source code and
translates it into object code.
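The two passes can be sketched as follows for a made-up machine. The opcode values, the fixed 3-byte instruction format, and the triple-based source representation are all invented for illustration; a real assembler also handles directives, literals and error checking.

```python
# Toy two-pass assembler: pass 1 builds the symbol table, pass 2 translates.
OPCODES = {"LDA": 0x00, "ADD": 0x18, "STA": 0x0C}   # invented opcode values

def assemble(source, start=0x1000):
    """source: list of (label-or-None, mnemonic, operand) triples."""
    symtab, loc = {}, start
    for label, _, _ in source:          # pass 1: assign addresses to labels
        if label:
            symtab[label] = loc
        loc += 3                        # fixed 3-byte instruction format
    return symtab, [(OPCODES[op], symtab.get(arg, arg))
                    for _, op, arg in source]   # pass 2: generate object code

src = [("FIRST", "LDA", "FIRST"),
       (None,    "ADD", "LAST"),    # forward reference, resolved by pass 1
       ("LAST",  "STA", "FIRST")]
symtab, obj = assemble(src)
print(symtab)                       # {'FIRST': 4096, 'LAST': 4102}
print(obj)
```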
Compiler:
A compiler is a program which translates a high-level language program into a machine language
program. A compiler is more intelligent than an assembler: it checks all kinds of limits, ranges,
errors and so on. But its translation takes more time and occupies a larger part of the memory,
because a compiler goes through the entire program and then translates the entire program into
machine code. If a compiler runs on a computer and produces machine code for the same
computer, it is known as a self compiler or resident compiler. On the other hand, if a compiler
runs on one computer and produces machine code for another computer, it is known as a cross
compiler.
Interpreter:
An interpreter is a program which translates statements of a program into machine code one
statement at a time. It reads one statement of the program, translates it and executes it; then it reads
the next statement, translates it and executes it, proceeding in this way until all the statements are
translated and executed. A compiler, on the other hand, goes through the entire program and then
translates the entire program into machine code, and a compiled program typically runs 5 to 25 times
faster than an interpreted one.
With a compiler, the machine code is saved permanently for future use; the machine code
produced by an interpreter is not saved. An interpreter is a small program compared to a compiler.
It occupies less memory space, so it can be used on a smaller system which has limited memory
space.
SUMMARY:
Machine-independent compiler features include methods for handling structured variables such as
arrays, along with the problems involved in compiling a block-structured language and some
possible solutions.
Structured variables are arrays, records, strings and sets.
All program-defined variables and temporary variables, including the location used to save the
return address, use a simple type of storage assignment called static allocation.
When procedures are called recursively, static allocation cannot be used.
A block is a portion of a program that has the ability to declare its own identifiers. This definition
of a block also fits units such as procedures and functions.
A NEW or MALLOC statement would be translated into a request to the operating system for an
area of storage of the required size.
Compilers may be divided into one-pass and multi-pass designs.
An interpreter processes a source program written in a high-level language.
The main difference between compiler and interpreter is that interpreters execute a version of
the source program directly, instead of translating it into machine code.
In P-code compilers (also called bytecode compilers), the intermediate form is the machine
language for a hypothetical computer, often called a pseudo-machine or P-machine.
A compiler-compiler is a software tool that can be used to help with the task of compiler
construction. Such tools are also called compiler generators or translator-writing systems.
UNIT IV
Introduction: Definition of DOS – History of DOS – Definition of Process - Process states - process
states transition – Interrupt processing – interrupt classes - Storage Management Real Storage: Real
storage management strategies – Contiguous versus
Non-contiguous storage allocation – Single User Contiguous Storage allocation- Fixed partition
multiprogramming – Variable partition multiprogramming. Virtual Storage: Virtual storage
management strategies – Page replacement strategies – Working sets – Demand paging – page size.
INTRODUCTION
DEFINITION OF DOS
“An Operating System can be defined as a program, implemented in either software or
firmware, that makes the hardware usable. The Disk Operating System can also be defined as
the software that controls the hardware”.
Hardware provides "raw computing power". The operating system makes this computing power
conveniently available to users, and it manages the hardware carefully to achieve good
performance.
Operating systems are primarily resource managers. The main resource that the OS manages
is computer hardware, in the form of processors, storage, input/output devices, communication
devices, and data. Operating systems perform many functions, such as implementing the user
interface, sharing hardware among users, allowing users to share data among themselves, preventing
users from interfering with one another, scheduling resources among users, facilitating I/O,
recovering from errors, accounting for resource usage, facilitating parallel operations, organizing
data for secure and rapid access, and handling network communications.
The basic functions of an operating system are as follows:
File management
Working with files, such as picking up and preparing to use a tool like a calculator.
Configuration of the working environment.
The operating system translates the computer language (refer to the hardware lesson for a
review) into information. One method of presenting this information is via a graphical user interface
(GUI). Elements of a GUI include such things as windows, menus, buttons, scroll bars, icons and the
desktop. The desktop is the primary GUI generated by the operating system.
BLOCK DIAGRAM OF OPERATING SYSTEM
OPERATING SYSTEM OBJECTIVES AND FUNCTIONS
An operating system is a program that controls the execution of application programs
and acts as an interface between the user of a computer and the computer hardware. An operating
system can be thought of as having three objectives or performing three functions.
1) Convenience:
An operating system makes a computer more convenient to use.
2) Efficiency:
An operating system allows the computer system resources to be used in an efficient manner.
3) Ability to evolve:
An operating system should be constructed in such a way as to permit the effective
development, testing and introduction of new system functions without at the same time interfering
with service.
HISTORY OF DOS
SUBHEADINGS
THE 1940’s AND 1950’s.
THE 1960’s.
THE EMERGENCE OF A NEW FIELD: SOFTWARE ENGINEERING.
THE 1980’s.
THE 1990’s.
UNIX.
The 1940's and the 1950's:
Operating systems have evolved over the last 40 years through a number of distinct phases or
generations. In the 1940's, the earliest electronic digital computers had no operating system.
On machines of the time, programs were entered one bit at a time on rows of mechanical switches.
Later, machine language programs were entered on punched cards, and assembly languages were
developed to speed the programming process.
The General Motors research laboratories implemented the first operating system in the early
1950's for their IBM 701. The systems of the 1950's generally ran only one job at a time and
smoothed the transition between jobs to get maximum utilization of the computer system. These
were called single-stream batch processing systems because programs and data were submitted in
groups or batches.
The 1960’s:
The systems of the 1960's were also batch processing systems, but they were able to take
better advantage of the computer's resources by running several jobs at once. They contained many
peripheral devices such as card readers, card punches, printers, tape drives and disk drives. Any one
job rarely utilized all of a computer's resources effectively. Operating system designers observed that
when one job was waiting for an I/O operation to complete before the job could continue using the
processor, some other job could use the idle processor.
Similarly, when one job was using the processor, other jobs could be using the various input/
output devices. In fact, running a mixture of diverse jobs appeared to be the best way to optimize
computer utilization. So operating system designers developed the concept of multiprogramming, in
which several jobs are in main memory at once; a processor is switched from job to job as needed to
keep several jobs advancing while keeping the peripheral devices in use.
More advanced operating systems were developed to service multiple interactive users at
once. Timesharing systems were developed to multiprogram large numbers of simultaneous
interactive users. Many of the time-sharing systems of the 1960's were multimode systems,
supporting batch processing as well as real-time applications. Real-time systems are characterized by
supplying immediate response.
The key time-sharing development efforts of this period included the CTSS system
developed at MIT, the TSS system developed by IBM, and the Multics system developed at MIT as
the successor to CTSS. Turnaround time, that is, the time between submission of a job and the return
of results, was reduced to minutes or even seconds.
THE EMERGENCE OF A NEW FIELD: SOFTWARE ENGINEERING
In the operating systems developed during the 1960's, endless hours and countless dollars were
spent detecting and removing bugs that should never have entered the systems in the first place.
Much attention was given to these problems of constructing software systems. This spawned the
field of software engineering, which is concerned with developing a disciplined and structured
approach to the construction of reliable, understandable and maintainable software.
The 1980’s:
The 1980's was the decade of the personal computer and the workstation. Individuals could
have their own dedicated computers for performing the bulk of their work, and they used
communication facilities for transmitting data between systems. Computing was distributed to the
sites at which it was needed rather than bringing the data to be processed to some central, large-
scale computer installation. The key was to transfer information between computers in computer
networks. E-mail, file transfer, and remote database access applications, and the client/server model,
became widespread.
The 1990’s and beyond:
In the 1990's, distributed computing came into use, in which computations are parallelized into
sub-computations that can be executed on other processors in multiprocessor computers and in
computer networks. Networks are dynamically configured as new devices and software are
added or removed.
When a new server is added, it makes itself known to the network: the server tells the network
about its capabilities, billing policies, accessibility and so forth. Clients need not know all the details
of the network; instead, they contact locating brokers for the services provided by servers. The
locating brokers know which servers are available, where they are, and how to access them. This
kind of connectivity will be facilitated by open system standards and protocols.
Computing is destined to become very powerful and very portable. In recent years, laptop
computers have been introduced that enable people to carry their computers with them wherever
they go. With the development of OSI communication protocols and the integrated services digital
network (ISDN), people will be able to communicate and transmit data worldwide with high
reliability.
UNIX
The UNIX operating system was originally designed in the late 1960's, and its elegance attracted
researchers in the universities and industry. UNIX is the only operating system that has been
implemented on computers ranging from micros to supercomputers.
DEFINITIONS OF “PROCESS”
The term "Process" was first used by the designers of the Multics system in the 1960s.
Some definitions of process are as follows:
A program in execution.
An asynchronous activity.
The “animated spirit” of a procedure.
The “locus of control” of a procedure in execution.
That which is manifested by the existence of a “process control block” in the operating
system.
That entity to which processors are assigned.
The “dispatchable” unit.
PROCESS STATES
A process goes through a series of discrete process states. Various events can cause a process
to change states.
A process is said to be running (i.e., in the running state) if it currently has the CPU. A
process is said to be ready (i.e., in the ready state) if it could use a CPU if one were available. A
process is said to be blocked (i.e., in the blocked state) if it is waiting for some event to happen (such
as an I/O completion event) before it can proceed. For example, in a single-CPU system, only
one process can run at a time, but several processes may be ready, and several may be blocked. So
we establish a ready list of ready processes and a blocked list of blocked processes. The ready list is
maintained in priority order so that the next process to receive the CPU is the first process on the list.
PROCESS STATE TRANSITIONS(***5m)
SUBHEADINGS
INTRODUCTION.
DIAGRAM OF PROCESS STATE TRANSITIONS.
STATE TRANSITION DEFINITIONS.
THE PROCESS CONTROL BLOCK.
When a job is admitted to the system, a corresponding process is created and normally inserted at
the back of the ready list. The process gradually moves to the head of the ready list as the processes
before it complete their turns at using the CPU. When the process reaches the head of the list, and
when the CPU becomes available, the process is given the CPU and is said to make a state
transition from the ready state to the running state. The assignment of the CPU to the first process on
the ready list is called dispatching, and is performed by a system entity called the dispatcher. We
indicate this transition as follows:
Dispatch (process name): ready --> running.
To prevent any one process from monopolizing the system, the operating system sets a
hardware interrupting clock (or interval timer) to allow a user to run for a specific time interval
or quantum. If the process does not leave the CPU before the time interval expires, the interrupting
clock generates an interrupt, causing the operating system to regain control. The operating system
then makes the previously running process ready, and makes the first process on the ready list
running.
These state transitions are indicated as
Timerunout (processname) : running --> ready
and dispatch (processname) : ready --> running.
If a running process initiates an input/output operation before its quantum expires, the running
process voluntarily leaves the CPU. This state transition is
Block (processname): running --> blocked.
When an input/output operation (or some other event the process is waiting for) completes. The
process makes the transition from the blocked state to the ready state. The transition is
Wakeup (processname): blocked --> ready.
So the possible state transitions can be sequenced as:
Dispatch (processname): ready --> running.
Timerunout (processname): running --> ready.
Block (processname): running --> blocked.
Wakeup (processname): blocked --> ready.
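The four transitions above can be sketched as a small transition table (an illustrative sketch, not from the text; the event and state names follow the transitions listed above):

```python
# Transition table keyed by (event, current state); a missing key is an
# illegal transition (e.g. you cannot "wakeup" a running process).
TRANSITIONS = {
    ("dispatch", "ready"): "running",
    ("timerunout", "running"): "ready",
    ("block", "running"): "blocked",
    ("wakeup", "blocked"): "ready",
}

def transition(state, event):
    """Return the next state, or raise if the event is illegal in this state."""
    try:
        return TRANSITIONS[(event, state)]
    except KeyError:
        raise ValueError(f"illegal transition: {event} in state {state}")

# A process that is dispatched, blocks on I/O, and is later woken up:
state = "ready"
state = transition(state, "dispatch")   # ready --> running
state = transition(state, "block")      # running --> blocked
state = transition(state, "wakeup")     # blocked --> ready
```

The table form makes the legal transitions explicit; any other (event, state) pair is rejected, which mirrors the fact that only these four transitions are defined.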
THE PROCESS CONTROL BLOCK (PCB)
The PCB is a data structure containing certain important information about the process, including:
The current state of the process.
Unique identification of the process.
Pointers to the process’s parent (i.e., the process that created this process).
Pointers to the process’s child processes (i.e., processes created by this process).
The process’s priority.
Pointers to locate the process’s memory.
Pointers to allocated resources.
A register save area.
The processor it is running on (in a multiprocessor system).
The PCB is a central store of information that allows the operating system to locate all key
information about a process. The PCB is the entity that defines a process to the operating system.
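As a hypothetical illustration, the PCB fields listed above might be modeled as a record; the field names here are assumptions made for the sketch, since the exact layout is system-specific:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PCB:
    pid: int                        # unique identification of the process
    state: str = "ready"            # current state of the process
    parent: Optional["PCB"] = None  # pointer to the process's parent
    children: List["PCB"] = field(default_factory=list)  # child processes
    priority: int = 0               # the process's priority
    memory_base: int = 0            # pointer to locate the process's memory
    resources: List[str] = field(default_factory=list)   # allocated resources
    registers: dict = field(default_factory=dict)        # register save area
    cpu: Optional[int] = None       # processor it runs on (multiprocessor)

# A parent process creating a child, linked both ways:
parent = PCB(pid=1, state="running")
child = PCB(pid=2, parent=parent)
parent.children.append(child)
```

The parent/child pointers let the operating system walk the process tree, which is how it can locate all key information about a process from its PCB.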
INTERRUPT PROCESSING (***8m)
SUBHEADINGS
DEFINITION OF INTERRUPTS.
TYPES OF INTERRUPTS.
**HARDWARE INTERRUPTS.
**SOFTWARE INTERRUPTS.
INTERRUPT PROCESS DIAGRAM.
INTERRUPT CLASSES.
**SVC (SUPERVISOR CALL) INTERRUPTS.
**I/O INTERRUPTS.
**EXTERNAL INTERRUPTS.
**RESTART INTERRUPTS.
**PROGRAM CHECK INTERRUPTS.
**MACHINE CHECK INTERRUPTS.
DEFINITION
An interrupt is an event that alters the sequence in which a processor executes instructions.
An interrupt is a dynamic event that needs prompt attention by the CPU. Usually an interrupt only
needs a short period of CPU time to serve it. After that the original process can resume its execution.
TYPES
There are two types of interrupting events:
HARDWARE INTERRUPTS.
SOFTWARE INTERRUPTS.
Hardware interrupts are those issued by I/O device controllers when they need the CPU
to process I/O data.
Software interrupts or traps are raised when the current process executes a special trap
instruction to indicate that something wrong has happened or the process needs special service from
the operating system (like performing some I/O operation).
Each type of I/O device has a special program called an interrupt handler to serve the
interrupt requests from these devices.
For all software traps, there is a special trap handler. Each type of interrupt has an associated
priority level.
A running process would only be interrupted by an interrupt source or trap of higher priority.
When the CPU is executing an interrupt handler, the interrupt handler may be further interrupted by
an interrupt source of even higher priority. The interrupt itself is generated by the hardware of the
computer system.
The main advantage of interrupt concept is that it provides a low-overhead means of gaining
the attention of the CPU. This eliminates the need for the CPU to remain busy polling to check if
the devices require the usage of the CPU.
The disadvantage of interrupt concept is that, it is also possible that the system can become
overloaded. If interrupts arrive quickly, the system may not be able to keep up with the interrupts.
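The priority rule above can be sketched as a small predicate (an illustrative sketch; the convention that larger numbers mean higher priority is an assumption):

```python
def should_preempt(current_priority, incoming_priority):
    """Interrupt the current activity only for a strictly higher-priority
    source; equal or lower priority sources must wait."""
    return incoming_priority > current_priority

# While serving a priority-2 I/O interrupt, a priority-5 machine check
# preempts the handler, but another priority-2 request must wait:
assert should_preempt(2, 5)
assert not should_preempt(2, 2)
```

The strict inequality is the point: it prevents an interrupt source from endlessly preempting its own handler, while still letting genuinely more urgent events through.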
THE INTERRUPT PROCESS
When an interrupt occurs:
The operating system gains control.
The operating system saves the state of the interrupted process. In many systems this
information is stored in the interrupted process’s Process Control Block.
The operating system analyzes the interrupt and passes control to the appropriate
routine to handle the interrupt. Today, in many systems, this is handled automatically by the
hardware.
The interrupt handler routine processes the interrupt.
The state of the interrupted process is restored.
The interrupted process executes.
An interrupt may be initiated by a running process (in which case it is called a trap and is said to be
synchronous with the operation of the process), or it may be caused by some event that may or may
not be related to the running process (in which case it is said to be asynchronous with the operation
of the process).
INTERRUPT CLASSES
There are six interrupt classes. They are
* SVC (Supervisor Call) interrupts.
These are initiated by a running process that executes an SVC instruction. An SVC is a user-
generated request for a particular system service, such as performing input/output, obtaining more
storage, or communicating with the system operator.
* I/O interrupts:
These are initiated by the input/output hardware. They signal to the CPU that the status of a
channel or device has changed. For example, they are caused when an I/O operation completes or
when an I/O error occurs.
* External interrupts:
These are caused by various events including the expiration of a quantum on an interrupting
clock or the receipt of a signal from another processor on a multiprocessor system.
* Restart interrupts:
These occur when the operator presses the restart button or when a restart signal-processor
instruction arrives from another processor on a multiprocessor system.
* Program check interrupts:
These may occur when a program's machine language instructions are executed. The problems
detected include division by zero, arithmetic overflow or underflow, data in the wrong format, an
attempt to execute an invalid operation code, an attempt to reference a memory location that does
not exist, or an attempt to reference a protected resource.
* Machine check interrupts:
These are caused by malfunctioning hardware.
STORAGE MANAGEMENT REAL STORAGE
Storage management strategies determine how a particular storage organization performs
under various policies.
*When do we get a new program to place in the memory?
*Do we get it when the system specifically asks for it, or do we attempt to anticipate the
system's requests?
*Where in main storage do we place the next program to be run?
*Do we place the program as close as possible into available memory slots to minimize
wasted space?
If a new program needs to be placed in main storage and main storage is currently full, which of
the other programs do we displace? Should we replace the oldest programs, or should we replace
those that are least frequently used or least recently used?
REAL STORAGE MANAGEMENT STRATEGIES(*8m)
SUBHEADINGS
INTRODUCTION.
CATEGORIES.
DIAGRAM-HIERARCHICAL STORAGE ORGANIZATION
Storage management strategies are used to obtain the best possible use of the main storage resource.
Storage management strategies are divided into the following categories:
Fetch strategies
Demand fetches strategies
Anticipatory fetch strategies
Placement strategies
Replacement strategies.
Fetch strategies are concerned with when to obtain the next piece of program or data for
transfer to main storage from secondary storage. In demand fetch, the next piece of program
or data is brought into main storage when it is referenced by a running program. Placement
strategies are concerned with determining where in main storage to place an incoming program.
Replacement strategies are concerned with determining which piece of program or data to displace
to make room for incoming programs.
CONTIGUOUS VS NONCONTIGUOUS STORAGE ALLOCATION(*8m)
Memory allocation is the process of reserving a partial or complete portion of computer
memory for the execution of programs and processes. Memory allocation is achieved through a
process known as memory management. Memory allocation is primarily a computer hardware
operation but is managed through operating system and software applications. Once the program has
finished its operation or is idle, the memory is released and allocated to another program or merged
within the primary memory.
Memory allocation has two core types:
Static Memory Allocation: The program is allocated memory at compile time.
Dynamic Memory Allocation: The program is allocated memory at run time.
SINGLE USER CONTIGUOUS STORAGE ALLOCATION
SUBHEADINGS
INTRODUCTION.
DIAGRAM-SINGLE USER CONTIGUOUS STORAGE ALLOCATION.
PROTECTION IN SINGLE USER SYSTEM.
DIAGRAM- STORAGE PROTECTION IN SINGLE USER SYSTEM.
DIAGRAM-TYPICAL OVERLAY STRUCTURE.
SINGLE STREAM BATCH PROCESSING.
The earliest computer systems allowed only a single person at a time to use the machine. All
of the machine's resources were at the user's disposal. The user wrote all the code necessary to
implement a particular application, including the highly detailed machine-level input/output
instructions. Eventually, the code to implement basic I/O functions was consolidated into an
input/output control system (IOCS).
Programs are limited in size to the amount of main storage, but it is possible to run programs
larger than the main storage by using overlays.
If a particular program section is not needed for the duration of the program’s execution, then
another section of the program may be brought in from the secondary storage to occupy the storage
used by the program section that is no longer needed.
SINGLE USER CONTIGUOUS ALLOCATION SYSTEM
PROTECTION IN SINGLE USER SYSTEMS:
In single user contiguous storage allocation systems, the user has complete control over all of
main storage. Storage is divided into a portion holding operating system routines, a portion holding
the user’s program and an unused portion.
Suppose the user destroys the operating system; for example, suppose certain input/output
instructions are accidentally changed. The operating system should be protected from the user.
Protection is implemented by the use of a single boundary register built into the CPU. Each time a
user program refers to a storage address, the boundary register is checked to be certain that the user
is not about to destroy the operating system. The boundary register contains the highest address used
by the operating system. If the user tries to enter the operating system, the instruction is intercepted
and the job terminates with an appropriate error message.
STORAGE PROTECTION WITH SINGLE USER CONTIGUOUS STORAGE ALLOCATION
The user needs to enter the operating system from time to time to obtain services such as
input/output. This problem is solved by giving the user a specific instruction with which to request
services from the operating system (i.e., a supervisor call instruction). A user wanting to read from
tape will issue an instruction asking the operating system to do so on the user's behalf.
Operating system must not be damaged by programs
System cannot function if operating system overwritten
Boundary register
Contains address where program’s memory space begins
Any memory accesses outside boundary are denied
Can only be set by privileged commands
Applications can access OS memory to execute OS procedures
Using system calls, which places the system in executive mode.
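The boundary-register check described above can be sketched as follows; the specific addresses and the executive-mode flag are assumptions for illustration:

```python
class MemoryProtectionError(Exception):
    """Raised when a user-mode access falls inside the OS region."""
    pass

def check_access(address, boundary_register, in_executive_mode=False):
    """Allow user accesses only above the boundary register (which holds the
    highest address used by the OS), unless the system is in executive mode
    (entered via a supervisor call)."""
    if address <= boundary_register and not in_executive_mode:
        raise MemoryProtectionError(f"address {hex(address)} is in the OS region")
    return address

BOUNDARY = 0x4FFF   # hypothetical layout: OS occupies 0x0000..0x4FFF

check_access(0x8000, BOUNDARY)                           # user space: allowed
check_access(0x1000, BOUNDARY, in_executive_mode=True)   # via SVC: allowed
```

A user-mode reference below the boundary raises the protection error, which models the instruction being intercepted and the job terminated.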
TYPICAL OVERLAY STRUCTURE.
SINGLE STREAM BATCH PROCESSING:
Early single-user real storage systems were dedicated to one job for more than the job's
execution time. During job setup and job teardown the computer is idle. Designers realized that if
they could automate job-to-job transition, they could reduce considerably the amount of time
wasted between jobs. In single stream batch processing, jobs are grouped in batches by loading them
consecutively onto tape or disk. A job stream processor reads the job control language statements
and facilitates the setup of the next job. When the current job terminates, the job stream reader
automatically reads in the control language statements for the next job, and facilitates the transition
to the next job.
Early systems required significant setup time
o Wasted time and resources
o Automating setup and teardown improved efficiency
Batch processing
o Job stream processor reads job control languages
Defines each job and how to set it up
FIXED PARTITION MULTIPROGRAMMING-FPM(*8m)
SUBHEADINGS
INTRODUCTION.
DIAGRAM-CPU UTILIZATION ON SINGLE USER SYSTEM.
FPM : ABSOLUTE TRANSLATION AND LOADING.
DIAGRAM- FPM : ABSOLUTE TRANSLATION AND LOADING.
DIAGRAM- FPM : EXAMPLE OF POOR STORAGE UTILIZATION.
FPM : RELOCATABLE TRANSLATION AND LOADING.
DIAGRAM- FPM :RELOCATABLE TRANSLATION AND LOADING.
STORAGE PROTECTION IN MULTIPROGRAMMING SYSTEMS.
DIAGRAM-STORAGE PROTECTION IN MULTIPROGRAMMING SYSTEM.
FRAGMENTATION IN FIXED PARTITION MULTIPROGRAMMING.
In batch processing systems, single-user systems waste a considerable amount of the
computing resource. I/O speeds are extremely slow compared to CPU speed. Multiprogramming is
one solution to such problems: I/O and CPU calculations can occur simultaneously. This
increases CPU utilization and system throughput. It requires more storage than a single-user system.
CPU UTILIZATION ON A SINGLE USER SYSTEM
The program consumes the CPU resource until an input or output is needed
When the input and output request is issued the job often can’t continue until the requested
data is either sent or received.
Input and output speeds are extremely slow compared with CPU’s speeds
To increase the utilization of the CPU multiprogramming systems are implemented in which
several users simultaneously compete for system resources .
An advantage of multiprogramming is that several jobs reside in the computer's main storage
at once. Thus when one job requests input/output, the CPU may be immediately switched to
another job and may do calculations without delay. Thus both input/output and CPU
calculations can occur simultaneously. This greatly increases CPU utilization and system
throughput.
Multiprogramming normally requires considerably more storage than a single-user system,
because multiple users' programs have to be stored in main storage at once.
FPM : ABSOLUTE TRANSLATION AND LOADING
The earliest multiprogramming systems used fixed partition multiprogramming.
The main storage is divided into a number of fixed size partitions.
Each partition could hold a single job.
CPU switches between users to create simultaneity.
Jobs were translated with absolute assemblers & compilers to run only in a specified
partition.
If a job was ready to run and its partition was occupied, then that job had to wait, even if
other partitions were available.
This resulted in waste of the storage resource.
FPM : ABSOLUTE TRANSLATION AND LOADING
EXAMPLE OF POOR STORAGE UTILIZATION
FPM : RELOCATABLE TRANSLATION AND LOADING:
*Relocating compilers, assemblers and loaders are used to produce relocatable programs that can run
in any available partition that is large enough to hold them.
*This scheme eliminates storage waste inherent in multiprogramming with absolute translation and
loading.
*Relocatable translators and loaders are more complex than their absolute counterparts.
FPM : RELOCATABLE TRANSLATION AND LOADING
PROTECTION IN MULTIPROGRAMMING SYSTEMS:
In contiguous allocation multiprogramming systems, protection is implemented with
boundary registers.
With two registers, the low and high boundaries of a user partition can be delineated or the
low boundary (high boundary) and the length of the region can be indicated.
When the user wants any service performed by the operating system, the user can request it
through a supervisor call (SVC) instruction.
This allows the user to cross the boundary of the operating system without compromising
operating system security.
Storage protection in contiguous allocation multiprogramming systems. While the user in partition
2 is active, all storage addresses developed by the running program are checked to be sure they fall
between b and c.
STORAGE PROTECTION IN MULTIPROGRAMMING SYSTEM
FRAGMENTATION IN FIXED PARTITION MULTIPROGRAMMING:
There are two difficulties with the use of equal-size fixed partitions.
A program may be too big to fit into a partition. In this case, the programmer must design the
program with the use of the overlays, so that only a portion of the program need be in main
memory at any one-time.
Main memory use is extremely inefficient. Any program, no matter how small, occupies an
entire partition. In our example, there may be a program that occupies less than 128KB of
memory, yet it takes up a 512KB partition whenever it is swapped in. This phenomenon, in
which there is wasted space internal to a partition because the block of data
loaded is smaller than the partition, is referred to as internal fragmentation.
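The internal-fragmentation arithmetic from the example can be sketched as a one-line calculation (the sizes below follow the 512KB/128KB figures above):

```python
def internal_fragmentation(partition_size_kb, program_size_kb):
    """Wasted space inside an equal-size fixed partition: the difference
    between the partition size and the program loaded into it."""
    if program_size_kb > partition_size_kb:
        raise ValueError("program does not fit; overlays would be needed")
    return partition_size_kb - program_size_kb

# A 128KB program in a 512KB partition wastes 384KB every time it is swapped in:
wasted = internal_fragmentation(512, 128)
```

The error branch mirrors the first difficulty mentioned above: a program larger than the partition cannot be loaded at all without overlays.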
VARIABLE PARTITION MULTIPROGRAMMING(VPM)
SUBHEADINGS
INTRODUCTION.
DIAGRAM-INITIAL PARTITION ASSIGNMENTS IN VPM.
DIAGRAM-STORAGE HOLES IN VPM.
COALESCING HOLES IN VPM.
DIAGRAM- COALESCING HOLES IN VPM.
STORAGE COMPACTION.
DIAGRAM- STORAGE COMPACTION.
STORAGE PLACEMENT STRATEGIES.
**BEST-FIT STRATEGY.
**FIRST-FIT STRATEGY.
**WORST-FIT STRATEGY.
INTRODUCTION
To overcome the problems with fixed partition multiprogramming, a method is used to
allow jobs to occupy as much space as needed.
No fixed boundaries are specified here.
The method of giving jobs as much as storage required is called variable partition
multiprogramming.
There are no assumptions about the size of the job. As jobs arrive, if the scheduling
mechanism decides that a job should proceed, it is given as much storage as required. There is
no waste, and a job's partition is exactly the size of the job.
INITIAL PARTITION ASSIGNMENTS IN VPM
STORAGE HOLES IN VARIABLE PARTITION MULTIPROGRAMMING.
Incoming job storage requirements: User A needs 15K, User B 20K, User C 10K, User D 25K,
User E 14K, User F 32K, User G 11K, User H 18K, User I 9K.
Successive memory snapshots as the first four jobs arrive (the OS occupies the low end of
storage; the remainder is free):
1) OS | USER A 15K | FREE
2) OS | USER A 15K | USER B 20K | FREE
3) OS | USER A 15K | USER B 20K | USER C 10K | FREE
4) OS | USER A 15K | USER B 20K | USER C 10K | USER D 25K | FREE
An example of variable partition multiprogramming is shown using 1MB of main memory. Main
memory is empty except for the operating system. The first three processes are loaded in starting
where the operating system ends, and occupy just enough space for each process. This leaves a
"hole" (i.e., an unused space) at the end of memory that is too small for a fourth process. At some
point, none of the processes in memory is ready. The operating system therefore swaps out process
2, which leaves sufficient room to load a new process, process 4. Because process 4 is smaller than
process 2, another small hole is created.
Then the operating system swaps out process 1, and swaps process 2 back in. As this
example shows, this method starts out well but leads to a lot of small holes in memory. As time goes
on, memory becomes more and more fragmented, and memory use declines. This phenomenon is
called external fragmentation. One technique for overcoming external fragmentation is compaction.
COALESCING HOLES
When a job finishes in a variable partition multiprogramming system, we can check whether the
storage being freed borders on other free storage areas (holes). If it does, then we may record in the
free storage list either
(1) an additional hole, or
(2) a single hole reflecting the merger of the existing hole and the new adjacent hole.
The process of merging adjacent holes to form a single larger hole is called coalescing. By
coalescing we reclaim the largest possible contiguous block of storage.
COALESCING HOLES IN VPM.
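The coalescing step can be sketched over a free-storage list of (start, size) holes; representing holes as address/size pairs is an assumption made for the sketch:

```python
def coalesce(holes, freed_start, freed_size):
    """Insert a freed region into the free-storage list, merging it with any
    holes it borders, and return the coalesced list sorted by address."""
    holes = sorted(holes + [(freed_start, freed_size)])
    merged = [holes[0]]
    for start, size in holes[1:]:
        last_start, last_size = merged[-1]
        if last_start + last_size == start:   # borders the previous hole
            merged[-1] = (last_start, last_size + size)
        else:
            merged.append((start, size))
    return merged

# Holes at 100 (size 50) and 200 (size 30); freeing 150..200 joins all three
# into a single 130-unit hole starting at 100:
assert coalesce([(100, 50), (200, 30)], 150, 50) == [(100, 130)]
```

Sorting first means each new hole only has to be compared with its immediate predecessor, so one linear pass merges every adjacent pair.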
STORAGE COMPACTION
Sometimes when a job requests a certain amount of main storage, no individual hole is large
enough to hold the job, even though the sum of all the holes is larger than the storage needed by the
new job.
Suppose user 6 wants to execute a program that requires 100K of storage. Under contiguous
storage allocation, the program cannot be loaded: 100K of free storage is available, but it is divided
into holes of 20K, 40K and 40K. So user 6's program cannot be stored, and the memory space is
wasted. To avoid this, the technique of storage compaction is used.
Compaction attacks the problem of fragmentation by moving all the allocated blocks to one
end of memory, thus combining all the holes. Aside from the obvious cost of all that copying, there
is an important limitation to compaction: any pointers to a block need to be updated when the block
is moved. Unless it is possible to find all such pointers, compaction is not possible. Pointers can be
stored in the allocated blocks themselves, as well as in other places in the client of the memory
manager.
In some situations, pointers can point not only to the start of blocks but also into their bodies.
For example, if a block contains executable code, a branch instruction might be a pointer to another
location in the same block. Compaction is performed in three phases. First, the new location of each
block is calculated to determine the distance the block will be moved. Then each pointer is updated
by adding to it the amount that the block it is pointing into will be moved. Finally, the data is
actually moved. There are various clever tricks possible to combine these operations.
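The three phases can be sketched as follows; the block representation (dicts with a start address, a size, and an optional pointer into some block) is an assumption made for illustration:

```python
def compact(blocks):
    """Compact blocks toward address 0 in the three phases described above.
    Each block is a dict with 'start', 'size', and 'ptr' (an address pointing
    into some block's body, or None). Blocks are mutated in place."""
    # Phase 1: compute each block's new location and how far it will move.
    next_free, delta = 0, {}
    for b in blocks:
        delta[b['start']] = next_free - b['start']
        next_free += b['size']

    def owner(addr):
        # Find the block whose body contains this address (original layout).
        return next(b for b in blocks
                    if b['start'] <= addr < b['start'] + b['size'])

    # Phase 2: update each pointer by the distance its target block moves.
    for b in blocks:
        if b['ptr'] is not None:
            b['ptr'] += delta[owner(b['ptr'])['start']]
    # Phase 3: move each block (here, just slide its start address).
    for b in blocks:
        b['start'] += delta[b['start']]
    return blocks

blocks = [{'start': 20, 'size': 10, 'ptr': 45},   # points into the second block
          {'start': 40, 'size': 10, 'ptr': None}]
compact(blocks)   # second block moves 40 -> 10, so the pointer 45 becomes 15
```

Note the ordering: pointers must be fixed before the starts are updated, because phase 2 identifies each pointer's target block using the original layout.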
STORAGE COMPACTION IN VPM.
The technique of storage compaction involves moving all occupied areas of storage to one end
or the other of main storage. This leaves a single large hole instead of the numerous small holes
common in variable partition multiprogramming. Now all of the available free storage is
contiguous, so that a waiting job can run if its memory requirement is met by the single hole that
results from compaction.
Drawbacks of compaction are:
It consumes system resources that could otherwise be used productively.
The system must stop everything while it performs the compaction. This can result
in erratic response times for interactive users and could be devastating in real-time
systems.
Compaction involves relocating the jobs that are in storage. This means that
relocation information, ordinarily lost when a program is loaded, must now be
maintained in readily accessible form.
With a normal, rapidly changing job mix, it is necessary to compact frequently.
STORAGE PLACEMENT STRATEGIES(*5m):
Storage placement strategies are used to determine where in the main storage to place
incoming programs and data
Three strategies of storage placement are
1) Best-fit Strategy: An incoming job is placed in the hole in main storage in which it fits most
tightly and leaves the smallest amount of unused space.
Best-fit strategy
Place job in the smallest possible hole in which it will fit
Free storage list (kept in ascending order by hole size)
Fig: First-fit, best-fit and worst-fit memory placement strategies
2) First-fit Strategy: An incoming job is placed in the main storage in the first available hole large
enough to hold it
Fig: Worst-fit Strategy
3) Worst-fit Strategy: Worst fit says to place a program in main storage in the hole in which it
fits worst, i.e., the largest possible hole. The idea is that after placing the program in this
large hole, the remaining hole is often also large and is thus able to hold a relatively large
new program.
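The three placement strategies can be contrasted in a few lines. The `place` helper below is a hypothetical illustration that returns the hole each strategy would choose from a free-storage list:

```python
def place(holes, job):
    """Return the index of the hole each strategy picks, or None if no fit.

    holes -- free-hole sizes in storage order (as first-fit assumes)
    job   -- size of the incoming job
    """
    fits = [(i, size) for i, size in enumerate(holes) if size >= job]
    if not fits:
        return {"first": None, "best": None, "worst": None}
    return {
        "first": fits[0][0],                        # first hole large enough
        "best":  min(fits, key=lambda t: t[1])[0],  # tightest fit
        "worst": max(fits, key=lambda t: t[1])[0],  # largest hole
    }

# A 13K job against holes of 16K, 14K, 5K and 30K:
print(place([16, 14, 5, 30], 13))  # {'first': 0, 'best': 1, 'worst': 3}
```

Best-fit would normally keep the free storage list sorted by hole size so the tightest hole is found first; the linear scan above keeps the sketch short.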
VIRTUAL STORAGE
The term virtual storage is associated with the ability to address a storage space much larger
than that available in the primary storage of a particular computer system.
The Virtual Storage in MVS refers to the use of virtual memory in the operating system.
Virtual storage or memory allows a program to have access to the maximum amount of memory in a
system even though this memory is actually being shared among more than one application program.
The operating system translates the program's virtual address into the real physical memory
address where the data is actually located. The Multiple in MVS indicates that a separate virtual
memory is maintained for each of multiple task partitions.
The two most common methods of implementing virtual storage are paging and
segmentation. Fixed-Size blocks are called pages; variable-size blocks are called segments.
VIRTUAL STORAGE MANAGEMENT STRATEGIES(*5m):
SUBHEADINGS
INTRODUCTION.
FETCH STRATEGY.
**DEMAND FETCH.
**DIAGRAM-DEMAND FETCH.
**ANTICIPATORY FETCH.
PLACEMENT STRATEGY .
REPLACEMENT STRATEGY.
**PRINCIPLE OF OPTIMALITY
**RANDOM PAGE REPLACEMENT
**DIAGRAM-REPLACEMENT STRATEGY.
**FIRST-IN-FIRST-OUT
**LEAST-RECENTLY USED
**LEAST-FREQUENTLY USED
**NOT-USED-RECENTLY
**SECOND CHANCE
**CLOCK
INTRODUCTION:
The term virtual storage is associated with the ability to address a storage
space much larger than that available in the primary storage of a particular computer system.
The two most common methods of implementing virtual storage are paging and
segmentation. Fixed-Size blocks are called pages; variable-size blocks are called segments.
FETCH STRATEGY
Fetch strategies are concerned with when a page or segment should be brought from
secondary to primary storage.
**Demand Fetch Scheme: The demand fetch strategy waits for a page or segment to be
referenced by a running process before bringing it into primary storage.
– When a process first executes, the system loads into main memory the page that
contains its first instruction.
– After that, the system loads a page from secondary storage to main memory only when the process explicitly references that page.
– It requires a process to accumulate its pages one at a time.
DEMAND FETCH
**Anticipatory Fetch Scheme: Anticipatory fetch strategies attempt to determine in advance
which pages or segments will be referenced by a process. The system tries to predict the pages a
process will need and preloads them when memory space is available. It must be carefully designed
so that the overhead incurred by the strategy does not reduce system performance.
PLACEMENT STRATEGIES:
These are concerned with where in primary storage to place an incoming page or segment.
REPLACEMENT STRATEGIES:
These are concerned with deciding which page or segment to displace to make room for an
incoming page or segment when primary storage is already fully committed. In this case
operating system storage management routines must decide which page in primary storage to
displace to make room for an incoming page.
1) principle of optimality
2) Random page replacement
3) First-in-first-out
4) Least-recently used
5) Least-frequently used
6) Not-used-recently
7) Second chance
8) Clock
9) Working set
10) Page fault frequency
**The principle of optimality:
The principle of optimality states that to obtain optimum performance, the page to
replace is the one that will not be used again for the furthest time in the future.
**Random Page Replacement:
It is a low-overhead strategy that does not discriminate against particular processes. All pages
in main storage have an equal likelihood of being selected for replacement, so the strategy could
select any page, including the very next page to be referenced. For this reason it is rarely used.
REPLACEMENT STRATEGY
**FIRST-IN-FIRST-OUT (FIFO) Page Replacement:
When a page needs to be replaced, we choose the one that has been in storage the longest.
First-in-first-out is likely to replace heavily used pages because the reason a page has been in
primary storage for a long time may be that it is in constant use.
**LEAST-RECENTLY-USED (LRU) Page Replacement:
This strategy selects that page for replacement that has not been used for the longest time.
LRU can be implemented with a list structure containing one entry for each occupied page frame.
Each time a page frame is referenced, the entry for that page is placed at the head of the list. Older
entries migrate toward the tail of the list. When a page must be replaced to make room for an
incoming page, the entry at the tail of the list is selected, the corresponding page frame is freed, the
incoming page is placed in that page frame, and the entry for that page frame is placed at the head of
the list because that page is now the one that has been most recently used.
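The list-based LRU mechanism described above can be sketched with an ordered dictionary standing in for the list. This is a sketch of the policy, not any particular system's implementation; the reference string is the classic illustrative example:

```python
from collections import OrderedDict

def lru_faults(refs, frames):
    """Count page faults for an LRU policy over a reference string."""
    mem = OrderedDict()                  # keys ordered least- to most-recent
    faults = 0
    for page in refs:
        if page in mem:
            mem.move_to_end(page)        # referenced: move to head of list
        else:
            faults += 1
            if len(mem) == frames:
                mem.popitem(last=False)  # evict the least-recently-used page
            mem[page] = True             # incoming page is most recent
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(lru_faults(refs, 3))  # 10 faults with 3 page frames
```

Real hardware cannot afford to reorder a list on every reference, which is why approximations such as NUR and second chance (below) exist.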
**LEAST-FREQUENTLY-USED (LFU) Page Replacement:
In this strategy the page to replace is that page that is least frequently used or least intensively
referenced. The wrong page could be selected for replacement. For example, the least frequently
used page could be the page brought into main storage most recently.
**Not- Used- Recently Page Replacement:
It approximates LRU with less overhead. It uses two indicator bits per page:
• referenced bit
• modified bit
The bits are reset periodically. The order for page replacement is
• un-referenced page
• un-modified page
It is supported in hardware on modern systems. Pages not used recently are not likely to be used
in the near future and they may be replaced with incoming pages.
The NUR strategy is implemented with the addition of two hardware bits per page. These are:
a) Referenced bit = 0 if the page has not been referenced
= 1 if the page has been referenced
b) Modified bit = 0 if the page has not been modified
= 1 if the page has been modified
The NUR strategy works as follows. Initially, the referenced bits of all pages are set to 0. As a
reference to a particular page occurs, the referenced bit of that page is set to 1. When a page is to be
replaced we first try to find a page which has not been referenced.
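The preference order implied by the two bits can be sketched as follows. `nur_victim` is a hypothetical helper and the page names are illustrative:

```python
def nur_victim(pages):
    """Pick a replacement victim by Not-Used-Recently class order.

    pages -- dict: page name -> (referenced_bit, modified_bit)
    Preference: un-referenced/un-modified first, then un-referenced but
    modified, then referenced but un-modified, then referenced/modified.
    """
    for cls in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        for name, bits in pages.items():
            if bits == cls:
                return name
    return None                       # no pages at all

pages = {"A": (1, 1), "B": (1, 0), "C": (0, 1), "D": (0, 0)}
print(nur_victim(pages))  # 'D': neither referenced nor modified
```

Evicting an un-modified page is preferred because it need not be written back to secondary storage.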
**MODIFICATIONS TO FIFO; CLOCK PAGE REPLACEMENT AND SECOND
CHANCE PAGE REPLACEMENT:
The second chance variation of FIFO examines the referenced bit of the oldest page; if this
bit is off, the page is immediately selected for replacement. If the referenced bit is on, it is set off and
the page is moved to the tail of the FIFO list and treated essentially as a new arrival; this page
gradually moves to the head of the list from which it will be selected for replacement only if its
referenced bit is still off. This essentially gives the page a second chance to remain in primary
storage if indeed its referenced bit is turned on before the page reaches the head of the list.
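The second-chance scan can be sketched directly from the description above. This is a toy model: the FIFO list is a deque with the oldest page at the left, and the referenced bits live in a plain dict:

```python
from collections import deque

def second_chance(queue, referenced):
    """Select a victim under the second-chance variation of FIFO.

    queue      -- deque of page names, oldest at the left
    referenced -- dict: page -> referenced bit (mutated: bits are cleared)
    """
    while True:
        page = queue.popleft()         # examine the oldest page
        if referenced[page]:
            referenced[page] = 0       # turn the bit off ...
            queue.append(page)         # ... and treat it as a new arrival
        else:
            return page                # oldest page with bit off is evicted

q = deque(["A", "B", "C"])
bits = {"A": 1, "B": 0, "C": 1}
print(second_chance(q, bits))  # 'B': A's bit was on, so A got a second chance
```

The clock variant is the same algorithm with the pages arranged in a circle and a moving pointer instead of a queue.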
LOCALITY
Locality is a property exhibited by running processes, namely that processes tend to favor a
subset of their pages during an execution interval. Temporal locality means that if a process
references a page, it will probably reference that page again soon. Spatial locality means that if a
process references a page, it will probably reference adjacent pages in its virtual address space.
Locality
Two types: spatial locality and temporal locality
Real programs in execution tend to display both types of locality
Spatial Locality:
references to addresses close to the current address (in the virtual address space)
natural occurrence in our programs
e.g. may be using instructions within a few pages and data from a few pages for a relatively
long time
to deal with slowly changing spatial locality, the OS may use prediction (prepaging)
Temporal Locality
next address will be one that has been used recently (use some instructions over and over)
loops have both types of locality
paging takes care of most of the temporal locality problems (pages in 4k blocks)
WORKING SETS:
SUBHEADINGS.
INTRODUCTION.
DIAGRAM-DEFINITION OF A PROCESS'S WORKING SET OF PAGES.
DIAGRAM-WORKING SET SIZE AS A FUNCTION OF WINDOW SIZE.
DIAGRAM-PRIMARY STORAGE ALLOCATION UNDER WS STORAGE.
INTRODUCTION:
Denning developed a view of program paging activity called the working set theory of
program behavior. A working set is a collection of pages a process is actively referencing. To run
a program efficiently, its working set of pages must be maintained in primary storage.
Otherwise excessive paging activity called thrashing might occur as the program repeatedly
requests pages from secondary storage.
DEFINITION OF A PROCESS'S WORKING SET OF PAGES.
WORKING SET SIZE AS A FUNCTION OF WINDOW SIZE.
PRIMARY STORAGE ALLOCATION UNDER WORKING SET STORAGE MANAGEMENT.
[Figures: the pages referenced by a process during the process-time interval from t − W to t form
the working set W(t, W); working set size grows with window size W, saturating at the program size.]
Process time is the time during which a process has the CPU.The variable W is called the
Working set window size. The real working set of a process is the set of pages that must be in
the primary storage for a process to execute efficiently.
DEMAND PAGING(*5m):
A thumb rule is that thrashing can be avoided by giving processes enough page frames to hold half
their virtual space.
A working set storage management policy is to maintain the working sets of active
programs in primary storage. The decision to add a new process to the active set of processes is
based on the availability of sufficient space in the primary storage to accommodate the working set
of pages of the new process. The working set of pages of a process, W(t,w) at time t, is the set of
pages referenced by the process during time interval ( t-w) to t.
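The definition of W(t, w) above can be computed from a reference string in a line. Times here are taken as 1-based reference counts, an assumption of this sketch:

```python
def working_set(refs, t, w):
    """W(t, w): the pages referenced during the interval (t - w, t].

    refs -- reference string; refs[i] is the page referenced at time i + 1
    """
    start = max(0, t - w)
    return set(refs[start:t])

refs = [1, 2, 1, 3, 4, 4, 2]
print(working_set(refs, t=5, w=3))  # pages touched at times 3, 4, 5: {1, 3, 4}
```

A working set policy would admit a new process only if the frames needed for all active processes' working sets fit in primary storage.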
[Figure: the number of primary storage page frames allocated to a process over process time,
showing the first, second, third and fourth working sets and the transitions between working sets.]
SUBHEADINGS
INTRODUCTION.
DIAGRAM-DEMAND PAGING.
DIAGRAM-VIRTUAL MEMORY ADDRESSES.
DIAGRAM-SPACE-TIME PRODUCT UNDER DEMAND PAGING.
In virtual memory systems, demand paging is a type of swapping in which pages of data
are not copied from disk to RAM until they are needed. In contrast, some virtual memory
systems use anticipatory paging, in which the operating system attempts to anticipate which
data will be needed next and copies it to RAM before it is actually required.
As there is much less physical memory than virtual memory the operating system must be
careful that it does not use the physical memory inefficiently. One way to save physical memory is
to only load virtual pages that are currently being used by the executing program. For example, a
database program may be run to query a database. In this case not all the database needs to be loaded
into memory, just those data records that are being examined.
DEMAND PAGING
Demand paging is similar to simple paging, with the main difference that demand paging
uses swapping: pages move in and out of memory only when they are required. When a process
is submitted for execution, it is first stored in secondary memory, i.e., on the hard disk.
Pages are swapped into memory when they are required, and when a page is not being
used it may be temporarily swapped out of memory, i.e., stored on disk and later copied back
into memory.
So demand paging is the concept in which a process is copied into main memory from
secondary storage only when it is needed. Either the entire process is loaded into main
memory, or only part of it; loading only a single part of a process into memory is also called
lazy swapping.
To swap a process between physical memory and disk, a page table must be used. The
page table stores entries containing the page number and the offset, which together indicate the
address where a page is stored, and a special extra bit, known as the flag bit, which indicates
whether the page is present in physical memory.
Each page table entry is marked valid or invalid (v or i alongside the page number),
indicating whether the corresponding page is currently stored in physical memory and can
therefore be accessed or swapped directly. If a requested page is not in physical memory,
its entry is marked invalid.
When a user requests an operation, the operating system performs the following steps:
1) Fetch the instructions from physical memory.
2) Decode the instructions, i.e., determine which operation has to be performed.
3) Perform the requested operation.
4) Store the result, writing it back to physical memory if needed.
Also, if the database query is a search query, it does not make sense to load the code from
the database program that deals with adding new records. This technique of only loading virtual
pages into memory as they are accessed is known as demand paging.
When a process attempts to access a virtual address that is not currently in memory the CPU
cannot find a page table entry for the virtual page referenced. For example, in Figure there is no
entry in Process X's page table for virtual PFN 2 and so if Process X attempts to read from an
address within virtual PFN 2 the CPU cannot translate the address into a physical one. At this point
the CPU cannot cope and needs the operating system to fix things up.
It notifies the operating system that a page fault has occurred, and the operating system makes
the process wait while it fixes things up: it must bring the appropriate page into memory
from the image on disk. Disk access takes a long time, relatively speaking, and so the process must
wait quite a while until the page has been fetched. If there are other processes that could run, the
operating system will select one of them to run.
The fetched page is written into a free physical page frame and an entry for the virtual PFN is
added to the process's page table. The process is then restarted at the point where the memory fault
occurred. This time the virtual memory access is made, the CPU can make the address translation,
and so the process continues to run. This is known as demand paging and occurs both when the
system is busy and when an image is first loaded into memory. This mechanism means that a process
can execute an image that only partially resides in physical memory at any one time.
Demand paging
VIRTUAL MEMORY ADDRESSES
In Demand Paging:
Pages are evicted to disk when memory is full
Pages loaded from disk when referenced again
OS allocates a page frame, reads page from disk
When I/O completes, the OS fills in PTE, marks it valid, and restarts faulting
process.
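The fault-and-restart sequence summarized above can be sketched as a toy access routine. The page-table representation and the `load_from_disk` callback here are hypothetical stand-ins, not any real OS interface:

```python
PAGE_SIZE = 4096

def access(vaddr, page_table, load_from_disk):
    """Demand-paging access: a page is loaded only on its first reference."""
    page, offset = divmod(vaddr, PAGE_SIZE)
    entry = page_table.get(page)
    if entry is None:                     # page fault: no valid PTE
        frame = load_from_disk(page)      # slow disk I/O; the process waits
        entry = {"frame": frame}
        page_table[page] = entry          # fill in the PTE, mark it valid
    return entry["frame"] * PAGE_SIZE + offset   # restart: translation works

loads = []
def fake_disk(page):                      # stand-in for reading the disk image
    loads.append(page)
    return page + 100                     # hypothetical frame assignment

table = {}
access(0x1008, table, fake_disk)          # first touch of page 1: a fault
access(0x1010, table, fake_disk)          # same page again: no fault
print(loads)                              # page 1 was loaded exactly once
```

Eviction when memory is full would hand the victim choice to one of the replacement strategies described earlier.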
SPACE-TIME PRODUCT UNDER DEMAND PAGING.
[Figure: primary storage allocation to a process over time under demand paging; runs of process
execution alternate with page waits, each fetch taking an average time F and each page fault adding
one page frame to the process's allocation.]
No page will be brought from secondary to primary storage until it is explicitly referenced by a
running process. Demand paging guarantees that the only pages brought to main storage are those
actually needed by processes. As each new page is referenced, the process must wait while the new
page is transferred to primary storage.
PAGE SIZE(*5m):
SUBHEADINGS.
DEFINITION.
ADVANTAGES.
DIAGRAM- INTERNAL FRAGMENTATION IN A PAGED SYSTEM.
TABLE- SOME COMMON PAGE SIZES.
DERIVATION.
DEFINITION:
Page size refers to the size of a page, which is a block of stored memory.
Page size affects the amount of memory needed and space used when running programs.
Most operating systems allow for the determination of the page size when a program begins running,
which allows it to calculate the most efficient use of memory while running that program.
The basic method for implementation involves breaking physical memory into fixed-size
blocks called FRAMES and breaking logical memory into blocks of the same size called PAGES.
The basic idea is to allocate physical memory to processes in fixed-size chunks called page
frames. The application is presented with the abstraction of a single linear address space; inside the
machine, the address space of the application is broken up into fixed-size chunks called pages.
Pages and page frames are the same size.
Pages are stored in page frames. When a process generates an address, it is dynamically
translated to the physical page frame which holds the data for that page.
So, a virtual address now consists of two pieces:
**page number
** an offset within that page.
Page sizes are typically powers of 2. This simplifies extraction of page numbers and offsets. To
access a piece of data at a given address, the system automatically does the following:
Extracts page number.
Extracts offset.
Translate page number to physical page frame id.
Accesses data at offset in physical page frame.
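With power-of-two page sizes, the extraction and translation steps above reduce to shifts and masks. The page-table dict below is a hypothetical mapping used only for illustration:

```python
PAGE_SIZE = 4096                           # must be a power of two
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 12 offset bits for 4K pages

def translate(vaddr, page_table):
    """Split a virtual address and map it through a page table (a dict)."""
    page = vaddr >> OFFSET_BITS            # extract the page number
    offset = vaddr & (PAGE_SIZE - 1)       # extract the offset within the page
    frame = page_table[page]               # translate page -> physical frame
    return (frame << OFFSET_BITS) | offset # access data at offset in frame

table = {0: 5, 1: 2}                       # hypothetical page -> frame mapping
print(hex(translate(0x1ABC, table)))       # page 1 maps to frame 2: 0x2abc
```

Because the page size is a power of two, the mask `PAGE_SIZE - 1` isolates exactly the offset bits.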
ADVANTAGES:
A number of issues affect the determination of the optimum page size for a given system:
A small page size causes a larger page table. The waste of storage due to excessively large
tables is called table fragmentation.
A large page size causes large amounts of information that ultimately may not be referenced
to be paged into primary storage.
I/O transfers are more efficient with large pages.
Localities tend to be small.
Internal fragmentation is reduced with small pages.
On balance, most designers feel that these factors point to the need for small pages.
INTERNAL FRAGMENTATION IN A PAGED SYSTEM.
[Figure: a segment spans several full pages plus a final, partially filled last page; the average
waste due to internal fragmentation is half a page per segment.]
The important consideration is the size of pages to use. Many MMUs allow various different page
sizes.
Small size: It has less wasted space due to internal fragmentation. There are fewer unused
pages in memory.
Large page size: It has more efficient disk I/O and smaller page tables.
SOME COMMON PAGE SIZES.
MANUFACTURER   MODEL            PAGE SIZE    UNIT
Honeywell      Multics          1024         36-bit word
IBM            370/168          1024 / 512   32-bit word
DEC            PDP-10, PDP-20   512          36-bit word
DEC            VAX 8800         512          8-bit byte
Intel          80386            4096         8-bit byte
DERIVATION:
If
the average process segment size is s,
the page table entry size is e bytes, and
the page size is p,
then:
Average amount of internal fragmentation per segment is p/2.
Average number of pages per process segment is s/p.
Each page requires e bytes of page table, so each process segment requires a page table
of size se/p bytes.
The total overhead per segment, due to internal fragmentation and page table entries, is
se/p + p/2
To minimise the overhead, differentiate with respect to page size p and equate to 0:
−se/p² + 1/2 = 0
⇒ p = sqrt(2se)
EXAMPLE:
So for example, if the average segment size were 256K, and the page table entry size were 8
bytes, the optimum page size, to minimise overhead due to page table entries and internal
fragmentation, would be sqrt(2 × 256K × 8) = 2048 = 2K.
This calculation ignores the need to keep page sizes large in order to speed up paging
operations; it only considers memory overheads.
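The result of the derivation is easy to check numerically, using the worked example above:

```python
from math import sqrt

def optimal_page_size(s, e):
    """Page size p minimising the overhead se/p + p/2 (memory costs only).

    s -- average process segment size in bytes
    e -- page table entry size in bytes
    """
    return sqrt(2 * s * e)

# The example above: 256K average segments, 8-byte page table entries.
print(optimal_page_size(256 * 1024, 8))  # 2048.0 bytes, i.e. 2K
```

As the text notes, this ignores the I/O-efficiency argument for larger pages; it only balances table fragmentation against internal fragmentation.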
SUMMARY.
DEFINITION OF DOS
“An Operating System can be defined as a program, implemented in either software or firmware,
that makes the hardware usable. The Disk Operating System can also be defined as the software that
controls the hardware”.
BASIC FUNCTIONS OF OS
File management
Working with files, like picking up and preparing to use a tool such as a
calculator.
Configuration of the working environment.
DEFINITIONS OF “PROCESS”
A program in execution.
An asynchronous activity.
The “animated spirit” of a procedure.
The “locus of control” of a procedure in execution.
That entity to which processors are assigned.
The “dispatchable” unit.
PROCESS STATE TRANSITION.
When the process reaches the head of the list, and when the CPU becomes available , the process is
given the CPU and is said to make a state transition from ready state to the running state. The
assignment of the CPU to the first process on the ready list is called dispatching, and is performed
by a system entity called the dispatcher. We indicate this transition as follows
Dispatch (process name): ready --> running.
To prevent any one process from monopolizing the system, the operating system sets a
hardware interrupting clock (or interval timer) to allow a user to run only for a specific time interval
or quantum.
UNIT IV
INTERRUPT
An interrupt is a dynamic event that needs prompt attention by the CPU. Usually an interrupt only
needs a short period of CPU time to serve it. After that the original process can resume its execution.
TYPES
There are two types of interrupting events:
HARDWARE INTERRUPTS.
SOFTWARE INTERRUPTS.
FPM - Fixed Partition Multiprogramming.
VPM - Variable Partition Multiprogramming: the method of giving jobs as much storage as
they require.
STORAGE PLACEMENT STRATEGIES
BEST-FIT.
FIRST-FIT.
WORST-FIT.
WORKING SET
A working set is a collection of pages a process is actively referencing.
PAGE SIZE
Page size refers to the size of a page, which is a block of stored memory.
TABLE FRAGMENTATION
A small page size causes larger page table. The waste of storage due to excessively
large tables is called table fragmentation.
UNIT V
Processor Management Job and Processor Scheduling: Preemptive Vs Non-preemptive scheduling –
Priorities – Deadline scheduling - Device and Information Management Disk Performance
Optimization: Operation of moving head disk storage – Need for disk scheduling – Seek
Optimization .File and Database Systems: File System – Functions – Organization – Allocating and
freeing space – File descriptor – Access control matrix.
JOB AND PROCESSOR SCHEDULING
The assignment of physical processors to processes allows processes to accomplish work.
The problem of determining when processors should be assigned, and to which processes, is
called processor scheduling.
SCHEDULING LEVELS
Three important levels of scheduling are considered.
High-Level Scheduling:
Sometimes called job scheduling, this determines which jobs shall be allowed to compete
actively for the resources of the system. This is sometimes called admission scheduling
because it determines which jobs gain admission to the system.
Intermediate-Level Scheduling:
This determines which processes shall be allowed to compete for the CPU.
The intermediate-level scheduler responds to short-term fluctuations in system load by
temporarily suspending and activating (or resuming) processes to achieve smooth system
operation and to help realize certain system wide performance goals.
Low-Level Scheduling:
This determines which ready process will be assigned the CPU when it next becomes
available, and actually assigns the CPU to this process.
A scheduling discipline is non-preemptive if, once a process has been given the CPU, the
CPU cannot be taken away from that process. A scheduling discipline is preemptive if the CPU can
be taken away.
Preemptive scheduling is useful in systems in which high-priority processes require rapid
attention. In real-time systems and interactive timesharing systems, preemptive scheduling is
important in guaranteeing acceptable response times.
To make preemption effective, many processes must be kept in main storage so that the next
process is normally ready for the CPU when it becomes available. Keeping non-running program in
main storage also involves overhead.
In non-preemptive systems, short jobs are made to wait by longer jobs, but the treatment of
all processes is fairer. Response times are more predictable because incoming high-priority jobs
cannot displace waiting jobs.
In designing a preemptive scheduling mechanism, one must carefully consider the
arbitrariness of virtually any priority scheme.
THE INTERVAL TIMER OR INTERRUPTING CLOCK
The process to which the CPU is currently assigned is said to be running. To prevent users
from monopolizing the system the operating system has mechanisms for taking the CPU away from
the user. The operating system sets an interrupting clock or interval timer to generate an interrupt at
some specific future time. The CPU is then dispatched to the process. The process retains control of
the CPU until it voluntarily releases the CPU, or the clock interrupts or some other interrupt diverts
the attention of the CPU.
If the user is running and the clock interrupts, the interrupt causes the operating system to
run. The operating system then decides which process should get the CPU next. The interrupting
clock helps guarantee reasonable response times to interactive users, prevents the system from
getting hung up on a user in an infinite loop, and allows processes to respond to time-dependent
events. Processes that need to run periodically depend on the interrupting clock.
PRIORITIES
SUBHEADINGS.
INTRODUCTION.
TYPES OF PRIORITIES.
DIAGRAM-DISPATCHING PRIORITIES.
STATIC VS DYNAMIC PRIORITIES.
PURCHASED PRIORITIES
INTRODUCTION:
Priorities may be assigned automatically by the system or they may be assigned externally.
They may be static or they may be dynamic. They may be rationally assigned or arbitrarily assigned
in situations in which a system mechanism needs to distinguish between processes but does not
depend on which is more important. They may be earned or they may be bought.
TYPES OF PRIORITIES:
There are two types of priorities.
Static
Dynamic
DISPATCHING PRIORITIES.
STATIC VS DYNAMIC PRIORITIES
Static priorities do not change. Static priority mechanisms are easy to implement and have
relatively low overhead. They are not responsive to changes in environment, changes that might
make it desirable to adjust a priority.
Dynamic priority mechanisms are responsive to change. The initial priority assigned to a
process may have only a short duration, after which it is adjusted to a more appropriate value.
Dynamic priority schemes are more complex to implement and have greater overhead than static
schemes.
PURCHASED PRIORITIES
An operating system must provide competent and reasonable service to a large community of
users but must also provide for those situations in which a member of the user community needs
special treatment.
A user with a rush job may be willing to pay a premium, ie., purchase priority, for a higher
level of service. This extra charge is merited because resources may need to be withdrawn from
other paying customers. If there were no extra charge, then all users would request the higher level
of service.
DEADLINE SCHEDULING
INTRODUCTION
In deadline scheduling certain jobs are scheduled to be completed within a specific time or
deadline. These jobs may have very high value if delivered on time and may be worthless if
delivered later than the deadline. The user is often willing to pay a premium to have the system
ensure on-time completion. Deadline scheduling is complex for many reasons.
DEADLINE SCHEDULING
Two kinds of deadlines can be specified for each process: a starting deadline, or the latest
instant of time by which execution of the process must start, and a completion deadline, or the time
by which the execution of the process must complete.
SUBHEADINGS.
INTRODUCTION.
DIAGRAM - PROCESSES WITH SHORTER DEADLINE.
DIAGRAM - PROCESSES WITH LONGER DEADLINE.
REASONS –SCHEDULING IS COMPLEX.
DEADLINE ESTIMATION
An in-depth analysis of a real-time application and its response requirements is carried out
during its development. Deadlines for individual processes can be determined by considering
process precedences and working backwards from the response requirement of an application. The
deadline of a process Pi is
Di = Dapplication − Σ k ∈ descendant(i) xk
where Dapplication is the deadline of the application, xk is the service time of process Pk, and
descendant(i) is the set of descendants of Pi in the PPG, i.e., the set of all processes that lie on some
path between Pi and the exit node of the PPG. Thus the deadline of process Pi is such that, if it is
met, all the processes that directly or indirectly depend on Pi can also finish by the overall deadline
of the application. It can be explained as follows:
[Figure: a process precedence graph (PPG) with processes P1–P6 and service times 2, 5, 4, 5, 6
and 3.]
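The deadline formula can be sketched as a small helper. Since the edges of the figure's PPG are not fully recoverable, the graph below is a hypothetical example, not the one in the diagram:

```python
def deadlines(services, descendants, d_app):
    """D_i = D_application - sum of service times of Pi's PPG descendants.

    services    -- dict: process -> service time x_k
    descendants -- dict: process -> set of that process's descendants
    d_app       -- overall application deadline
    """
    return {p: d_app - sum(services[k] for k in descendants[p])
            for p in services}

# Hypothetical two-level PPG: P1 precedes P2 and P3, which precede exit P4.
services = {"P1": 2, "P2": 5, "P3": 4, "P4": 3}
desc = {"P1": {"P2", "P3", "P4"}, "P2": {"P4"}, "P3": {"P4"}, "P4": set()}
print(deadlines(services, desc, d_app=20))
# P1 must finish by 8 to leave time for everything downstream of it.
```

Meeting each Di guarantees, by construction, that every dependent process can still finish by the application deadline.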
PROCESSES T1,T2,T3 WITH SHORTER DEADLINE(MAX=9)
PROCESSES T1,T2,T3 WITH LONGER DEADLINE(MAX=23)
REASONS - SCHEDULING IS COMPLEX.
The user must supply the resource requirements of the job in advance. Such information is
rarely available, and gathering it may generate substantial overhead.
The system must run the deadline job without severely degrading service to other users.
The system must plan its resource requirements through to the deadline, because new jobs
may arrive and place unpredictable demands on the system.
If many deadline jobs are to be active at once, scheduling can become extremely complex.
DEVICE AND INFORMATION MANAGEMENT
DISK PERFORMANCE OPTIMIZATION
In multiprogrammed computing systems, inefficiency is often caused by improper use of
rotational storage devices such as disks and drums.
This is a schematic representation of the side view of a moving-head disk. Data is recorded
on a series of magnetic disks, or platters. These disks are connected by a common spindle that spins at
very high speed. The data is accessed (i.e., either read or written) by a series of read-write heads, one
head per disk surface. A read-write head can access only data immediately adjacent to it.
Therefore, before data can be accessed, the portion of the disk surface from which the data is
to be read (or the portion on which the data is to be written) must rotate until it is immediately below
(or above) the read-write head. The time it takes for data to rotate from its current position to a
position adjacent to the read-write head is called latency time.
Each of the several read-write heads, while fixed in position, sketches out a circular track of
data on a disk surface. All read-write heads are attached to a single boom or moving arm assembly.
The boom may move in or out. When the boom moves the read-write heads to a new position, a
different set of tracks becomes accessible. For a particular position of the boom, the set of tracks
sketched out by all the read-write heads forms a vertical cylinder. The process of moving the boom
to a new cylinder is called a seek operation.
Thus, in
order to access a particular record of data on a moving-head disk, several operations are usually
necessary. First, the boom must be moved to the appropriate cylinder. Then the portion of the disk
on which the data record is stored must rotate until it is immediately under (or over) the read-write
head (i.e., latency time).
Then the record, which may be of arbitrary size, must spin past the read-write head. This
is called transmission time. These operations are tediously slow compared with the high processing
speeds of the central computer system.
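To make the three components (seek, latency, transmission) concrete, here is a small back-of-the-envelope calculation; the spindle speed, seek time, transfer rate and record size are assumed figures, not values from the source:

```python
# Assumed figures: 7200 RPM spindle, 10 ms average seek,
# 1 MB/s transfer rate, 4096-byte record.
rpm = 7200
rotation_ms = 60_000 / rpm                 # one full rotation: ~8.33 ms
avg_latency_ms = rotation_ms / 2           # on average, half a rotation
avg_seek_ms = 10.0                         # boom movement to the cylinder
transmission_ms = 4096 / 1_000_000 * 1000  # record spinning past the head

access_ms = avg_seek_ms + avg_latency_ms + transmission_ms
# roughly 18.26 ms per record: dominated by mechanical motion
```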
NEED FOR DISK SCHEDULING:
SUBHEADINGS
INTRODUCTION.
DIAGRAM-FCFS RANDOM SEEK PATTERN.
DESIRABLE CHARACTERISTICS OF DISK SCHEDULING
POLICIES.
INTRODUCTION:
In multiprogramming computing systems, many processes may be generating requests for
reading and writing disk records. Because these processes sometimes make requests faster than they
can be serviced by the moving-head disks, waiting lines or queues build up for each device. Some
computing systems simply service these requests on a first-come-first-served (FCFS) basis.
Whichever request for service arrives first is serviced first. FCFS is a fair method of allocating
service, but when the request rate becomes heavy, FCFS can result in very long waiting times.
[Diagram: FCFS random seek pattern; the numbers indicate the order in which the requests arrived.]
FCFS exhibits a random seek pattern in which successive requests can cause time consuming
seeks from the innermost to the outermost cylinders. To minimize time spent seeking records, it
seems reasonable to order the request queue in some manner other than FCFS. This process is called
disk scheduling.
Disk scheduling involves a careful examination of pending requests to determine the most
efficient way to service the requests.
A disk scheduler examines the positional relationships among waiting requests. The request
queue is then reordered so that the requests will be serviced with minimum mechanical motion. The
two most common types of scheduling are seek optimization and rotation (or latency) optimization.
DESIRABLE CHARACTERISTICS OF DISK SCHEDULING POLICIES:
Several criteria for categorizing scheduling policies are
1. Throughput
2. Mean response time
3. Variance of response times (i.e., predictability)
A scheduling policy should attempt to maximize throughput, the number of requests serviced
per unit time. A scheduling policy should attempt to minimize the mean response time (average
waiting time plus average service time). Variance is a mathematical measure of how far individual
items tend to deviate from the average of the items. We use variance to indicate predictability: the
smaller the variance, the greater the predictability. We desire a scheduling policy that minimizes
variance.
SEEK OPTIMIZATION
Most popular seek optimization strategies are:
1) FCFS (First-Come-First Served) Scheduling:
In FCFS scheduling, the first request to arrive is the first one serviced. FCFS is fair in
the sense that once a request has arrived, its place in the schedule is fixed. A request cannot be
displaced by the arrival of a higher-priority request.
FCFS will actually do a lengthy seek to service a distant waiting request even though another
request may have just arrived on the same cylinder at which the read-write head is currently
positioned. It ignores the positional relationships among the pending requests in the queue.
FCFS is acceptable when the load on a disk is light. Under heavy loads, however, FCFS tends
to saturate the device, and response times become large.
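A minimal sketch of measuring FCFS head movement; the cylinder queue and starting head position are hypothetical:

```python
def fcfs_seek_distance(start, requests):
    """Total cylinders traversed when servicing requests in arrival order."""
    total, pos = 0, start
    for r in requests:
        total += abs(r - pos)   # seek from current position to the request
        pos = r
    return total

# Hypothetical queue of cylinder numbers; head starts at cylinder 50.
queue = [98, 183, 37, 122, 14, 124, 65, 67]
fcfs_seek_distance(50, queue)
# -> 643 cylinders of head movement
```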
SUBHEADINGS.
**FCFS.
**SSTF.
**DIAGRAM-SSTF.
**SCAN SCHEDULING.
**DIAGRAM-SCAN SCHEDULING.
**N-STEP SCAN SCHEDULING.
**DIAGRAM- N-STEP SCAN SCHEDULING.
**C-SCAN SCHEDULING.
**DIAGRAM- C-SCAN SCHEDULING.
**ESCHENBACH SCHEME.
2) SSTF (Shortest-Seek-Time-First) Scheduling:
In SSTF scheduling, the request that results in the shortest seek distance is serviced next,
even if that request is not the first one in the queue. SSTF is a cylinder-oriented scheme. SSTF seek
patterns tend to be highly localized, with the result that the innermost and outermost tracks can
receive poor service compared with the mid-range tracks.
SSTF results in better throughput rates than FCFS, and mean response times tend to be lower
for moderate loads. One significant drawback is that higher variance occurs on response times
because of the discrimination against the outermost and innermost tracks.
SSTF is useful in batch processing systems where throughput is the major consideration. But
its high variance of response times (i.e., its lack of predictability) makes it unacceptable in
interactive systems.
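SSTF can be sketched as a greedy selection over a hypothetical request queue (a sketch, not a production scheduler):

```python
def sstf_order(start, requests):
    """Always service the pending request closest to the current head position."""
    pending, pos, order, total = list(requests), start, [], 0
    while pending:
        nxt = min(pending, key=lambda r: abs(r - pos))  # shortest seek next
        pending.remove(nxt)
        order.append(nxt)
        total += abs(nxt - pos)
        pos = nxt
    return order, total

order, dist = sstf_order(50, [98, 183, 37, 122, 14, 124, 65, 67])
# order -> [37, 14, 65, 67, 98, 122, 124, 183]; dist -> 205:
# much less head movement than FCFS, but requests are served out of
# arrival order, which is where the high variance comes from.
```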
3) SCAN Scheduling:
Denning developed the SCAN scheduling strategy to overcome the discrimination and high
variance in response times of SSTF. SCAN operates like SSTF except that it chooses the request that
results in the shortest seek distance in a preferred direction. If the preferred direction is currently
outward, then the SCAN strategy chooses the shortest seek distance in the outward direction. SCAN
does not change direction until it reaches the outermost cylinder or until there are no further requests
pending in the preferred direction. It is sometimes called the elevator algorithm because an elevator
normally continues in one direction until there are no more requests pending and then it reverses
direction.
SCAN behaves very much like SSTF in terms of improved throughput and improved mean
response times, but it eliminates much of the discrimination inherent in SSTF schemes and offers
much lower variance.
4) N-STEP SCAN SCHEDULING:
One interesting modification to the basic SCAN strategy is called N-STEP SCAN. In this
strategy, the disk arm moves back and forth as in SCAN except that it services only those requests
waiting when a particular sweep begins. Requests arriving during a sweep are grouped together and
ordered for optimum service during the return sweep. N-STEP SCAN offers good performance in
throughput and mean response time. N-STEP has a lower variance of response times than either
SSTF or conventional SCAN scheduling. N-STEP SCAN avoids the possibility of indefinite
postponement occurring if a large number of requests arrive for the current cylinder. It saves these
requests for servicing on the return sweep.
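The two-queue idea can be sketched as follows; the request lists are hypothetical, and each sweep is shown simply as a sorted pass:

```python
def n_step_scan(initial_queue, arrivals_during_sweep):
    """Service only the requests waiting when the sweep begins; requests
    arriving mid-sweep are grouped and ordered for the return sweep."""
    outward = sorted(initial_queue)                          # current sweep
    returning = sorted(arrivals_during_sweep, reverse=True)  # return sweep
    return outward + returning

n_step_scan([40, 10, 90], [55, 20])
# -> [10, 40, 90, 55, 20]: the mid-sweep arrivals 55 and 20 are deferred,
# so no request on the current cylinder can indefinitely postpone others.
```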
5) C-SCAN SCHEDULING:
Another interesting modification to the basic SCAN strategy is called C-SCAN (for circular
SCAN). In the C-SCAN strategy, the arm moves from the outer cylinder to the inner cylinder,
servicing requests on a shortest-seek basis. When the arm has completed its inward sweep, it jumps
(without servicing requests) to the request nearest the outermost cylinder, and then resumes its
inward sweep, processing requests. Thus C-SCAN completely eliminates the discrimination against
requests for the innermost or outermost cylinders. It has a very small variance in response times. At
low loading, the SCAN policy is best. At medium to heavy loading, C-SCAN yields the best
results.
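C-SCAN's one-way sweep plus jump can be sketched over the same kind of hypothetical queue (direction labeled toward higher cylinder numbers here; the text's outer-to-inner wording is the mirror image):

```python
def c_scan_order(start, requests):
    """One-way sweep: service requests ahead of the head, then jump back
    (without servicing) and sweep the remaining requests the same way."""
    ahead = sorted(r for r in requests if r >= start)
    behind = sorted(r for r in requests if r < start)
    return ahead + behind

c_scan_order(50, [98, 183, 37, 122, 14, 124, 65, 67])
# -> [65, 67, 98, 122, 124, 183, 14, 37]: every request is served in the
# same sweep direction, which is what keeps the variance small.
```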
[Diagram: C-SCAN scheduling]
ESCHENBACH SCHEME
This scheme was originally developed for an airline reservation system for handling
extremely heavy loads. This scheme was one of the first to attempt to optimize not only “seek time”
but also “rotational delays” as well. Still, the C-SCAN strategy with rotational optimization has
proven to be better than the Eschenbach scheme under all loading conditions.
RAM DISKS
A RAM disk is a disk device simulated in conventional random access memory. It
completely eliminates delays suffered in conventional disks because of the mechanical motions
inherent in seeks and in spinning a disk. RAM disks are especially useful in high-performance
applications.
Caching incurs a certain amount of CPU overhead in maintaining the contents of the cache
and in searching for data in the cache before attempting to read the data from disk. If the record
reference patterns are not seen in the cache, then the disk cache hit ratio will be small and the CPU’s
efforts in managing the cache will be wasted, possibly resulting in poor performance.
RAM disks are much faster than conventional disks because they involve no mechanical
motion. They are separate from main memory so they do not occupy space needed by the operating
system or applications. Reference times to individual data items are uniform rather than widely
variable as with conventional disks.
RAM disks are much more expensive than regular disks. Most forms of RAM in use today
are volatile, i.e., they lose their contents when power is turned off or when the power supply is
interrupted. Thus RAM disk users should perform frequent backups to conventional disks. As
memory prices continue decreasing, and as capacities continue increasing it is anticipated that RAM
disks will become increasingly popular.
OPTICAL DISKS
Various recording techniques are used. In one technique, intense laser heat is used to burn
microscopic holes in a metal coating. In another technique, the laser heat causes raised blisters on the
surface. In a third technique, the reflectivity of the surface is altered.
The first optical disks were write-once-read-many (WORM) devices. This is not useful for
applications that require regular updating. Several rewritable optical disk products have appeared on
the market recently. Each person could have a disk with the sum total of human knowledge and this
disk could be updated regularly. Some estimates of capacities are so huge that researchers feel it will
be possible to store 10^21 bits on a single optical disk.
An optical disc is an electronic data storage medium that can be written to and read using a
low-powered laser beam. Originally developed in the late 1960s, the first optical disc, created by
James T. Russell, stored data as micron-wide dots of light and dark. A laser read the dots, and the
data was converted to an electrical signal, and finally to audio or visual output. However, the
technology didn't appear in the marketplace until Philips and Sony came out with the compact disc
(CD) in 1982. Since then, there has been a constant succession of optical disc formats, first in CD
formats, followed by a number of DVD formats.
Optical disc offers a number of advantages over magnetic storage media. An optical disc
holds much more data. The greater control and focus possible with laser beams (in comparison to
tiny magnetic heads) means that more data can be written into a smaller space. Storage capacity
increases with each new generation of optical media. Emerging standards, such as Blu-ray, offer up
to 27 gigabytes (GB) on a single-sided 12-centimeter disc. In comparison, a diskette, for example,
can hold 1.44 megabytes (MB). Optical discs are inexpensive to manufacture and data stored on
them is relatively impervious to most environmental threats, such as power surges, or magnetic
disturbances.
FILE AND DATABASE SYSTEMS.
INTRODUCTION
A file is a named collection of data. It normally resides on a secondary storage device such as a
disk or tape. It may be manipulated as a unit by operations such as
open – prepare a file to be referenced.
close – prevent further reference to a file until it is reopened.
create – build a new file.
destroy – remove a file.
copy – create another version of the file with a new name.
rename – change the name of a file.
list – print or display the contents of a file.
Individual data items within the file may be manipulated by operations like
read – input a data item to a process from a file.
write – output a data item from a process to a file.
update – modify an existing data item in a file.
insert – add a new data item to a file.
delete – remove a data item from a file.
Files may be characterized by
volatility – this refers to the frequency with which additions and deletions are made
to a file.
activity – this refers to the percentage of a file’s records accessed during a given
period of time.
size – this refers to the amount of information stored in the file.
File
A file is a named collection of related information that is recorded on secondary storage such
as magnetic disks, magnetic tapes and optical disks. In general, a file is a sequence of bits, bytes,
lines or records whose meaning is defined by the file’s creator and user.
File Structure
A file structure is a required format that the operating system can recognize and interpret.
A file has a certain defined structure according to its type.
A text file is a sequence of characters organized into lines.
A source file is a sequence of procedures and functions.
An object file is a sequence of bytes organized into blocks that are understandable by the
machine.
When an operating system defines different file structures, it also contains the code to support
these file structures. UNIX and MS-DOS support a minimal number of file structures.
File Type
File type refers to the ability of the operating system to distinguish different types of files, such
as text files, source files and binary files. Many operating systems support many types of files.
Operating systems like MS-DOS and UNIX have the following types of files:
Ordinary files
These are the files that contain user information.
These may have text, databases or executable program.
The user can apply various operations on such files like add, modify, delete or even remove
the entire file.
Directory files
These files contain list of file names and other information related to these files.
Special files:
These files are also known as device files.
These files represent physical devices like disks, terminals, printers, networks, tape drives etc.
These files are of two types
Character special files - data is handled character by character as in case of terminals or
printers.
Block special files - data is handled in blocks as in the case of disks and tapes.
File Access Mechanisms
File access mechanism refers to the manner in which the records of a file may be accessed. There
are several ways to access files
Sequential access
Direct/Random access
Indexed sequential access
Sequential access
Sequential access is that in which the records are accessed in some sequence, i.e., the
information in the file is processed in order, one record after the other. This access method is the
most primitive one. Example: compilers usually access files in this fashion.
Direct/Random access
Random access file organization provides for accessing records directly.
Each record has its own address in the file, with the help of which it can be directly
accessed for reading or writing.
The records need not be in any sequence within the file and they need not be in adjacent
locations on the storage medium.
Indexed sequential access
This mechanism is built on top of sequential access.
An index is created for each file which contains pointers to various blocks.
Index is searched sequentially and its pointer is used to access the file directly.
THE FILE SYSTEM
COMPONENTS:
SUBHEADINGS.
COMPONENTS.
**ACCESS METHODS.
**FILE MANAGEMENT.
**AUXILIARY STORAGE MANAGEMENT.
**FILE INTEGRITY MECHANISMS.
**DIAGRAM-TWO LEVEL HIERARCHICAL FILE
MANAGEMENT SYSTEM.
An important component of an operating system is the file system. File systems generally
contain
Access Methods – these are concerned with the manner in which data stored in files
is accessed.
File Management – This is concerned with providing the mechanisms for files to be
stored, referenced, shared and secured.
Auxiliary storage Management – This is concerned with allocating space for files
on secondary storage devices.
File integrity mechanisms – These are concerned with guaranteeing that the
information in a file is uncorrupted.
The file system is primarily concerned with managing secondary storage space, particularly
disk storage. Let us assume an environment of a large-scale timesharing system supporting
approximately 100 active terminals accessible to a user community of several thousand users. It is
common for user accounts to contain between 10 and 100 files. Thus, with a user community of
several thousand users, a system’s disks might contain 50,000 to 100,000 or more separate files.
These files need to be accessed quickly to keep response times small.
A file system for this type of environment may be organized as follows. A root is used to
indicate where on disk the root directory begins. The root directory points to the various user
directories. A user directory contains an entry for each of a user’s files; each entry points to where
the corresponding file is stored on disk.
File names should be unique within a given user directory. In hierarchically structured file
systems, the system name of a file is usually formed as a pathname from the root directory to the
file. For example, in a two-level file system with users A, B and C, in which A has files PAYROLL
and INVOICES, the pathname for file PAYROLL is A:PAYROLL.
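The two-level lookup can be sketched with nested dictionaries; the disk-block labels are invented placeholders, not real addresses:

```python
# Two-level hierarchy: the root directory points to user directories,
# each of which maps file names to storage locations.
root = {
    "A": {"PAYROLL": "disk-block-120", "INVOICES": "disk-block-304"},
    "B": {},
    "C": {},
}

def resolve(pathname):
    """Resolve a 'USER:FILE' pathname through the directory hierarchy."""
    user, filename = pathname.split(":")
    return root[user][filename]

resolve("A:PAYROLL")
# -> 'disk-block-120'
```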
FILE SYSTEM FUNCTIONS
Some of the functions normally attributed to file systems follow.
1) Users should be able to create, modify and delete files.
2) Users should be able to share each other’s files in a carefully controlled manner in order to
build upon each other’s work.
3) The mechanism for sharing files should provide various types of controlled access such
as read access, write access, execute access or various combinations of these.
4) Users should be able to structure their files in a manner most appropriate for each
application.
5) Users should be able to order the transfer of information between files.
6) Backup and recovery capabilities must be provided to prevent either accidental loss or
malicious destruction of information.
7) Users should be able to refer to their files by symbolic names rather than having to use
physical device names (i.e., device independence).
8) In sensitive environments in which information must be kept secure and private, the file
system may also provide encryption and decryption capabilities.
9) The file system should provide a user-friendly interface. It should give users a logical
view of their data and functions to be performed upon it rather than a physical view. The
user should not have to be concerned with the particular devices on which data is stored,
the form the data takes on those devices, or the physical means of transferring data to and
from these devices.
File-System Structure
Hard disks have two important properties that make them suitable for secondary storage of
files in file systems:
(1) Blocks of data can be rewritten in place, and
(2) They are direct access, allowing any block of data to be accessed with only ( relatively )
minor movements of the disk heads and rotational latency. ( See Chapter 12 )
Disks are usually accessed in physical blocks, rather than a byte at a time. Block sizes may
range from 512 bytes to 4K or larger.
File systems organize storage on disk drives, and can be viewed as a layered design:
o At the lowest layer are the physical devices, consisting of the magnetic media, motors
& controls, and the electronics connected to them and controlling them. Modern disks
put more and more of the electronic controls directly on the disk drive itself, leaving
relatively little work for the disk controller card to perform.
o I/O Control consists of device drivers, special software programs ( often written in
assembly ) which communicate with the devices by reading and writing special codes
directly to and from memory addresses corresponding to the controller card's
registers. Each controller card ( device ) on a system has a different set of addresses
( registers, a.k.a. ports ) that it listens to, and a unique set of command codes and
results codes that it understands.
o The basic file system level works directly with the device drivers in terms of
retrieving and storing raw blocks of data, without any consideration for what is in
each block. Depending on the system, blocks may be referred to with a single block
number, ( e.g. block # 234234 ), or with head-sector-cylinder combinations.
o The file organization module knows about files and their logical blocks, and how
they map to physical blocks on the disk. In addition to translating from logical to
physical blocks, the file organization module also maintains the list of free blocks,
and allocates free blocks to files as needed.
o The logical file system deals with all of the meta data associated with a file ( UID,
GID, mode, dates, etc ), i.e. everything about the file except the data itself. This level
manages the directory structure and the mapping of file names to file control blocks,
FCBs, which contain all of the meta data as well as block number information for
finding the data on the disk.
The layered approach to file systems means that much of the code can be used uniformly for
a wide variety of different file systems, and only certain layers need to be file system
specific. Common file systems in use include the UNIX file system, UFS, the Berkeley Fast
File System, FFS, Windows systems FAT, FAT32, NTFS, CD-ROM systems ISO 9660, and
for Linux the extended file systems ext2 and ext3 ( among 40 others supported. )
File-System Implementation
Overview
File systems store several important data structures on the disk:
o A boot-control block, ( per volume ) a.k.a. the boot block in UNIX or the partition
boot sector in Windows contains information about how to boot the system off of
this disk. This will generally be the first sector of the volume if there is a bootable
system loaded on that volume, or the block will be left vacant otherwise.
o A volume control block, ( per volume ) a.k.a. the superblock in UNIX or the
master file table in NTFS, which contains information such as the partition table,
number of blocks on each filesystem, and pointers to free blocks and free FCB
blocks.
o A directory structure ( per file system ), containing file names and pointers to
corresponding FCBs. UNIX uses inode numbers, and NTFS uses a master file table.
o The File Control Block, FCB, ( per file ) containing details about ownership, size,
permissions, dates, etc. UNIX stores this information in inodes, and NTFS in the
master file table as a relational database structure.
There are also several key data structures stored in memory:
o An in-memory mount table.
o An in-memory directory cache of recently accessed directory information.
o A system-wide open file table, containing a copy of the FCB for every currently
open file in the system, as well as some other related information.
o A per-process open file table, containing a pointer to the system open file table as
well as some other information. ( For example the current file position pointer may be
either here or in the system file table, depending on the implementation and whether
the file is being shared or not. )
The interactions of file system components when files are created and/or used:
o When a new file is created, a new FCB is allocated and filled out with important
information regarding the new file. The appropriate directory is modified with the
new file name and FCB information.
o When a file is accessed during a program, the open( ) system call reads in the FCB
information from disk, and stores it in the system-wide open file table. An entry is
added to the per-process open file table referencing the system-wide table, and an
index into the per-process table is returned by the open( ) system call. UNIX refers to
this index as a file descriptor, and Windows refers to it as a file handle.
o If another process already has a file open when a new request comes in for the same
file, and it is sharable, then a counter in the system-wide table is incremented and the
per-process table is adjusted to point to the existing entry in the system-wide table.
o When a file is closed, the per-process table entry is freed, and the counter in the
system-wide table is decremented. If that counter reaches zero, then the system wide
table is also freed. Any data currently stored in memory cache for this file is written
out to disk if necessary.
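The reference-counting interaction between the two tables can be sketched as follows; the table layout and file name are illustrative stand-ins, not any real OS's data structures:

```python
system_table = {}   # filename -> {"fcb": ..., "count": open count}
per_process = {}    # file descriptor -> filename (one process, for brevity)

def open_file(name):
    # First opener reads the FCB "from disk"; later opens share the entry.
    entry = system_table.setdefault(name, {"fcb": {"name": name}, "count": 0})
    entry["count"] += 1
    fd = max(per_process, default=-1) + 1   # naive descriptor allocation
    per_process[fd] = name
    return fd

def close_file(fd):
    name = per_process.pop(fd)
    system_table[name]["count"] -= 1
    if system_table[name]["count"] == 0:
        del system_table[name]              # last close frees the entry

fd1 = open_file("report.txt")
fd2 = open_file("report.txt")   # shared open: counter becomes 2
close_file(fd1)                 # entry survives: counter back to 1
close_file(fd2)                 # counter reaches 0: entry freed
```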
THE DATA HIERARCHY:
Bits are grouped together in bit patterns to represent all data items. There are 2^n possible
bit patterns for a string of n bits.
The two most popular character sets in use today are ASCII (American Standard Code for
Information Interchange) and EBCDIC (Extended Binary Coded Decimal Interchange Code).
ASCII is popular in personal computers and in data communication systems. EBCDIC is popular for
representing data internally in mainframe computer systems, particularly those of IBM.
A field is a group of characters. A record is a group of fields. A record key is a control field
that uniquely identifies the record. A file is a group of related records. A database is a collection of
files.
BLOCKING AND BUFFERING:
A physical record or block is the unit of information actually read from or written to a
device. A logical record is a collection of data treated as a unit from the user’s standpoint. When
each physical record contains exactly one logical record, the file is said to consist of unblocked
records. When each physical record may contain several logical records, the file is said to consist of
blocked records. In a file with fixed-length records, all records are the same length. In a file with
variable-length records, records may vary in size up to the block size.
Buffering allows computation to proceed in parallel with input/output. Spaces are provided in
primary storage to hold several
physical blocks of a file at once – each of these spaces is called a buffer. The most common scheme
is called double buffering and it operates as follows (for output). There are two buffers. Initially,
records generated by a running process are deposited in the first buffer until it is full. The transfer of
the block in the first buffer to secondary storage is then initiated. While this transfer is in progress,
the process continues generating records that are deposited in the second buffer. When the second
buffer is full, and when the transfer from the first buffer is complete, transfer from the second buffer
is initiated. The process continues generating records that are now deposited in the first buffer. This
alternation between the buffers allows input/output to occur in parallel with a process’s
computations.
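A toy sketch of the double-buffering alternation for output; the buffer size and the list standing in for the completed transfer are assumptions:

```python
BUF_SIZE = 3          # records per buffer (assumed)
buffers = [[], []]
transferred = []      # stands in for blocks written to secondary storage
active = 0            # index of the buffer currently being filled

def deposit(record):
    global active
    buffers[active].append(record)
    if len(buffers[active]) == BUF_SIZE:
        # buffer full: start its "transfer" and switch to the other buffer,
        # so the process keeps generating records during the transfer
        transferred.extend(buffers[active])
        buffers[active] = []
        active = 1 - active

for r in range(7):
    deposit(r)
# transferred -> [0, 1, 2, 3, 4, 5]; record 6 waits in the other buffer
```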
The Bridge Between The Logical and The Physical
Block:
Smallest amount of data that can be read from or written to secondary storage at one time.
Often generalized to mean any chunk of data that can be treated as a unit (for reading, writing,
organizing). We will distinguish between disk blocks (physical) and program defined blocks
(logical).
- can't always ensure that logical and physical blocks match (often don't even want to).
- should make sure they complement each other
- logical blocks should not be split between physical blocks
- it's often more efficient to waste a little physical space in order to achieve a better match
eg. logical blocks = 10 bytes; physical blocks = 32 bytes; so fit 3/p.b. (waste 2 bytes per physical
block)
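The worked example above (10-byte logical blocks in 32-byte physical blocks) in code:

```python
logical_bytes, physical_bytes = 10, 32
per_physical = physical_bytes // logical_bytes   # 3 logical blocks fit
wasted_bytes = physical_bytes - per_physical * logical_bytes  # 2 bytes wasted
# Wasting 2 bytes per physical block avoids splitting a logical block
# across two physical blocks.
```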
Blocking:
The process of grouping several components into one block
Clustering:
Grouping file components according to access behavior
Considerations affecting block size:
1. size of available main memory
2. space reserved for programs (and their internal data space) that use the files
3. size of one component of the block
4. characteristics of the external storage device used
Buffering:
Software interface that reconciles blocked components of the file with the program that
accesses information as single components.
A buffering interface is of one of two types:
o blocking routine
o deblocking routine.
Blocking Routine:
Stores components from the program into a buffer (in main memory)
Deblocking Routine:
Accesses one block from the file (places it in memory) and sends one component at a time to
the program.
Sample Deblocking Process:
1. If buffer not empty, go to step 6
2. CPU issues input request
3. I/O channel signals device controller for device specified in the input request
4. device controller locates requested information and starts reading bytes from the device and
sends them to the buffer in main memory.
5. I/O channel waits until the buffer is full, then signals the CPU that the I/O operation is
complete; the location indicator for the buffer is reset to 1.
6. next component to which the location indicator points is sent to the program
7. increment location indicator
8. CPU continues execution of program
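Steps 1-8 above can be sketched as a small routine; the nested-list "device" and the block contents are stand-ins for the I/O channel and device controller:

```python
def make_deblocker(blocks):
    """Refill the in-memory buffer from the 'device' only when empty
    (steps 1-5), then hand out one component at a time (steps 6-8)."""
    device = iter(blocks)       # stands in for I/O channel + controller
    buffer, pos = [], [0]       # buffer and its location indicator

    def next_component():
        if pos[0] >= len(buffer):       # step 1: is the buffer empty?
            buffer[:] = next(device)    # steps 2-5: read one full block
            pos[0] = 0                  # location indicator reset
        component = buffer[pos[0]]      # step 6: component at indicator
        pos[0] += 1                     # step 7: increment indicator
        return component                # step 8: program continues

    return next_component

read = make_deblocker([[1, 2, 3], [4, 5, 6]])
[read() for _ in range(5)]
# -> [1, 2, 3, 4, 5]
```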
Logical Write:
Writing one component to the block-sized buffer
Physical Write:
Writing one block to the external file
Double Buffering:
Having two buffers so one can be filled while the other is being processed
Processor Bound:
A process where more time is taken to process a block than is taken to read or write the
block. In such a case, the entire process can only be made faster by increasing the efficiency of the
processing part.
Buffers
file manager : confirms file use info; finds physical location of file on disk; makes sure
required sector in buffer;
I/O buffer: holds sectors of data; often doesn't get written back to disk until the buffer is
needed for other uses (that way if more stuff done to the same sector, it doesn't have to be
loaded again)
I/O processor: may be simple chip or complex CPU takes instruction from O/S, but once it
starts it runs independently; it'll tell someone when it's done
Disk controller: I/O processor checks w/ disk controller if it's ready; then asks to position r/w heads;
when ready, I/O processor passes bytes to disk or vice-versa
Buffer management:
bottlenecks - if there is only one buffer and we are alternately reading and writing, the two
operations contend for it; most systems have at least one buffer for each
buffering strategies - the trade-off is management overhead vs. transfer-time savings
avoid being I/O bound by having several buffers so one can be processed while another is
filled (then switch roles - double buffering)
keep a pool of buffers (take one only when it is needed)
move mode: parts of memory are reserved for specific purposes (like system buffers and user
space) - this means data must be moved around, sometimes a great deal
locate mode: allows use of data directly from the I/O buffer, or transfer of data from the device
straight to a user buffer
scatter/gather I/O: moves data into/out of several buffers with a single READ/WRITE; scatter:
move data from one block into several buffers according to a specified organization; gather:
collect several buffers and write them with a single output operation
buffer management can sometimes be controlled through calls to the OS
FILE ORGANIZATION
A file is a collection of records, and a key element in file management is the way in which the records
themselves are organized inside the file, since this heavily affects system performance as far as record
finding and access are concerned. Note carefully that by ``organization'' we refer here to the logical
arrangement of the records in the file (their ordering or, more generally, the presence of ``closeness''
relations between them based on their content), and not to the physical layout of the file as stored on a
storage medium. To prevent confusion, the latter is referred to by the expression ``record blocking'', and
will be treated later on.
Choosing a file organization is a design decision, hence it must be made with the achievement of
good performance in mind, with respect to the most likely usage of the file. The criteria
usually considered important are:
1. Fast access to single record or collection of related records.
2. Easy record adding/update/removal, without disrupting (1).
3. Storage efficiency.
4. Redundancy as a safeguard against data corruption.
SUBHEADINGS.
SCHEMES
**SEQUENTIAL.
**DIRECT.
**INDEXED SEQUENTIAL.
**PARTITIONED.
**DIAGRAM-PARTITIONED DATA SET.
QUEUED AND BASIC ACCESS METHODS.
Needless to say, these requirements conflict with each other in all but the most trivial
situations, and it is the designer's job to find a good compromise among them, yielding an adequate
solution to the problem at hand. For example, ease of adding and updating records is not an issue when
defining the data organization of a CD-ROM product, whereas fast access is, given the huge amount of data
that this medium can store. However, as will become apparent shortly, fast-access techniques are
based on the use of additional information about the records, which in turn competes with the high
volumes of data to be stored.
Logical data organization is indeed the subject of whole shelves of books, in the ``Database'' section
of your library. Here we'll briefly address some of the simpler techniques in use, mainly because of
their relevance to data management from the lower-level (with respect to a database's) point of view
of an OS. Five organization models will be considered:
Pile.
Sequential.
Indexed-sequential.
Indexed.
Hashed.
File organization refers to the manner in which the records of a file are arranged on secondary
storage. The most popular file organization schemes in use today follow.
Sequential – Records are placed in physical order. The “next” record is the one that
physically follows the previous record. This organization is natural for files stored on magnetic tape,
an inherently sequential medium.
Direct – records are directly (randomly) accessed by their physical addresses on a
direct access storage device (DASD).
Indexed sequential – records are arranged in logical sequence according to a key
contained in each record. Indexed sequential records may be accessed sequentially in key order or
they may be accessed directly.
Partitioned – This is essentially a file of sequential subfiles. Each sequential subfile is called a
member. The starting address of each member is stored in the file’s directory.
The term volume is used to refer to the recording medium for each particular auxiliary storage
device. The volume used on a tape drive is a reel of magnetic tape; the volume used on a disk drive
is a disk.
QUEUED AND BASIC ACCESS METHODS:
Operating systems generally provide many access methods. These are sometimes grouped into
two categories, namely queued access methods and basic access methods. The queued methods
provide more powerful capabilities than the basic methods.
Queued access methods are used when the sequence in which records are to be processed can
be anticipated, such as in sequential and indexed sequential accessing. The queued methods perform
anticipatory buffering and scheduling of I/O operations. They try to have the next record available
for processing as soon as the previous record has been processed.
The basic access methods are normally used when the sequence in which records are to be
processed cannot be anticipated, such as in direct accessing, and also in user applications that
control record access without incurring the overhead of the queued methods.
ALLOCATING AND FREEING SPACE
INTRODUCTION:
When files are allocated and freed it is common for the space on disk to become
increasingly fragmented. One technique for alleviating this problem is to perform periodic
compaction or garbage collection. Files may be reorganized to occupy adjacent areas of the disk, and
free areas may be collected into a single block or a group of large blocks.
This garbage collection is often done during the system shut down; some systems
perform compaction dynamically while in operation. A system may choose to reorganize the files of
users not currently logged in, or it may reorganize files that have not been referenced for a long time.
Designing a file system requires knowledge of the user community, including the number of
users, the average number and size of files per user, the average duration of user sessions, the nature
of application to be run on the system, and the like. Users searching a file for information often use
file scan options to locate the next record or the previous record.
In paged systems, the smallest amount of information transferred between secondary and
primary storage is a page, so it makes sense to allocate secondary storage in blocks of the page size
or a multiple of a page size.
Locality tells us that once a process has referred to a data item on a page it is likely to reference
additional data items on that page; it is also likely to reference data items on pages contiguous to that
page in the user’s virtual address space.
SUBHEADINGS.
INTRODUCTION.
TYPES OF ALLOCATION.
**CONTIGUOUS ALLOCATION.
**NON-CONTIGUOUS ALLOCATION.
SECTOR – ORIENTED LINKED ALLOCATION.
BLOCK ALLOCATION.
DIAGRAM-BLOCK CHAINING.
DIAGRAM-INDEX BLOCK CHAINING.
DIAGRAM-BLOCK-ORIENTED FILE MAPPING.
DIAGRAM-PHYSICAL BLOCKS ON SECONDARY
STORAGE
Space Allocation
Files are allocated disk space by the operating system. Operating systems deploy the following three
main ways to allocate disk space to files.
Contiguous Allocation
Linked Allocation
Indexed Allocation
Contiguous Allocation
Each file occupies a contiguous address space on disk.
Disk addresses are assigned in linear order.
Easy to implement.
External fragmentation is a major issue with this type of allocation technique.
Linked Allocation
Each file carries a list of links to disk blocks.
Directory contains link / pointer to first block of a file.
No external fragmentation
Effective for sequential-access files.
Inefficient for direct-access files.
Indexed Allocation
Provides a solution to the problems of contiguous and linked allocation.
An index block is created holding all the pointers for a file.
Each file has its own index block, which stores the addresses of the disk space occupied by the
file.
Directory contains the addresses of index blocks of files.
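The indexed scheme can be sketched as follows; the dictionaries are illustrative stand-ins for the on-disk directory, index blocks, and data blocks:

```python
# Sketch of indexed allocation: the directory maps each file name to an
# index block, which lists the disk blocks the file occupies.
# Structures are illustrative, not a real on-disk format.

disk = {7: "AAA", 2: "BBB", 9: "CCC"}            # block number -> contents
index_blocks = {0: [7, 2, 9]}                    # index block -> file's blocks
directory = {"report.txt": 0}                    # file name -> index block

def read_file(name):
    idx = directory[name]                        # one lookup in the directory
    return "".join(disk[b] for b in index_blocks[idx])

print(read_file("report.txt"))   # AAABBBCCC
```

Direct access falls out naturally: the n-th block of a file is simply `disk[index_blocks[idx][n]]`, with no chain to follow.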
TYPES OF ALLOCATION:
There are two major types of allocation:
Contiguous allocation.
Noncontiguous allocation, which includes:
Sector-oriented linked allocation.
Block allocation.
CONTIGUOUS ALLOCATION
In contiguous allocation, files are assigned to contiguous areas of secondary storage. A user
specifies in advance the size of the area needed to hold the file to be created. If the desired amount
of contiguous space is not available, the file cannot be created.
One advantage of contiguous allocation is that successive logical records are normally
physically adjacent to one another. This speeds access compared to systems in which successive
logical records are dispersed throughout the disk.
The file directories in contiguous allocation systems are relatively straightforward to implement.
For each file it is necessary to retain the address of the start of the file and the file’s length.
Disadvantage of contiguous allocation
When files are deleted, the space they occupied on secondary storage is reclaimed.
This space becomes available for the allocation of new files, but these new files must
fit in the available holes.
Thus contiguous allocation schemes exhibit the same types of fragmentation
problems inherent in variable partition multiprogramming systems – adjacent
secondary storage holes must be coalesced, and periodic compaction may need to be
performed to reclaim storage areas large enough to hold new files.
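A small first-fit sketch makes the fragmentation problem concrete; the hole list and sizes are illustrative:

```python
# Sketch of contiguous allocation from a free-hole list (first fit).
# A new file must fit entirely inside one existing hole, which is why
# deleting files leads to fragmentation. Structures are illustrative.

holes = [(0, 5), (20, 3), (40, 10)]   # (start block, length) of free areas

def allocate(size):
    for i, (start, length) in enumerate(holes):
        if length >= size:
            # shrink (or consume) the hole and hand back the start address
            if length == size:
                del holes[i]
            else:
                holes[i] = (start + size, length - size)
            return start
    return None                        # no single hole big enough: failure

print(allocate(4))   # 0  (fits in the first hole)
print(allocate(8))   # 40 (skips the smaller holes)
print(allocate(5))   # None: 6 blocks are still free, but none contiguous
```

The final call shows the fragmentation problem in miniature: enough total space remains, but no hole is large enough, so compaction (coalescing the holes) would be needed.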
NONCONTIGUOUS ALLOCATION
Files tend to grow or shrink over time so generally we go for dynamic noncontiguous storage
allocation systems instead of contiguous allocation systems.
SECTOR-ORIENTED LINKED ALLOCATION
Files consist of many sectors which may be dispersed throughout the disk. Sectors belonging to a
common file contain pointers to one another, forming a linked list. A free space list contains entries
for all free sectors on the disk. When a file needs to grow, the process requests more sectors from the
free space list. Files that shrink return sectors to the free space list. There is no need for compaction.
The drawback of noncontiguous allocation is that the records of a file may be dispersed
throughout the disk, so retrieval of logically contiguous records can involve lengthy seeks.
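The scheme can be sketched as follows; the sector table and free list are illustrative in-memory stand-ins for the on-disk structures:

```python
# Sketch of sector-oriented linked allocation: each sector stores its data
# plus a pointer to the file's next sector; a free space list supplies
# sectors when a file grows. Layout and names are illustrative.

sectors = {3: ("He", 8), 8: ("ll", 1), 1: ("o!", None)}  # sector -> (data, next)
free_list = [5, 12, 2]

def read_chain(first):
    data, cur = "", first
    while cur is not None:                    # follow pointers sector to sector
        chunk, cur = sectors[cur]
        data += chunk
    return data

def grow(last_sector, new_data):
    new = free_list.pop(0)                    # take a sector from the free list
    sectors[new] = (new_data, None)
    d, _ = sectors[last_sector]
    sectors[last_sector] = (d, new)           # link it onto the end of the file

print(read_chain(3))   # Hello!
grow(1, "!!")
print(read_chain(3))   # Hello!!!
```

Shrinking a file would simply return sectors to `free_list`, so no compaction is ever needed, at the cost of the seeks noted above.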
BLOCK ALLOCATION
One scheme used to manage secondary storage more efficiently and reduce execution time
overhead is called block allocation. This is a mixture of both contiguous allocation and
noncontiguous allocation methods.
In this scheme, instead of allocating individual sectors, blocks of contiguous sectors (sometimes
called extents) are allocated. There are several common ways of implementing block-allocation
systems. These include block chaining, index block chaining, and block-oriented file mapping.
In block chaining, entries in the user directory point to the first block of each file. The fixed-
length blocks comprising a file each contain two portions: a data block and a pointer to the next
block. Locating a particular record requires searching the block chain until the appropriate block is
found, and then searching that block until the appropriate record is found. Insertions and deletions
are straightforward.
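The two-level search (walk the chain, then scan within the block) can be sketched like this, with records modeled as key/value pairs; structures are illustrative:

```python
# Sketch of block chaining: each fixed-length block holds a data portion
# and a pointer to the next block; finding a record walks the chain and
# then scans inside the block. Names are illustrative.

blocks = {
    0: ([("a", 1), ("b", 2)], 4),      # block -> (records, next block)
    4: ([("c", 3), ("d", 4)], None),
}
directory = {"f": 0}                   # file name -> first block of file

def find(fname, key):
    cur = directory[fname]
    while cur is not None:             # search the block chain...
        records, nxt = blocks[cur]
        for k, v in records:           # ...then search within the block
            if k == key:
                return v
        cur = nxt
    return None

print(find("f", "d"))   # 4
```

The cost of a lookup grows with the length of the chain, which is exactly what index block chaining, described next, is meant to reduce.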
With index block chaining, the pointers are placed into separate index blocks.
Each index block contains a fixed number of items. Each entry contains a record identifier and a
pointer to that record. If more than one index block is needed to describe a file, then a series of index
blocks is chained together.
The big advantage of index block chaining over simple block chaining is that searching may
take place in the index blocks themselves. Once the appropriate record is located via the index
blocks, the data block containing that record is read into primary storage. The disadvantage of this
scheme is that insertions can require the complete reconstruction of the index blocks, so some
systems leave a certain portion of the index blocks empty to provide for future insertions.
In block-oriented file mapping, instead of using pointers, the system uses block
numbers. Normally, these are easily converted to actual block addresses because of the geometry of
the disk. A file map contains one entry for each block on the disk. Entries in the user directory
point to the first entry in the file map for each file. Each entry in the file map contains the
block number of the next block in that file. Thus all the blocks in a file may be located by following
the entries in the file map.
The entry in the file map that corresponds to the last entry of a particular file is set to some
sentinel value like ‘Nil’ to indicate that the last block of a file has been reached. Some of the entries
in the file map are set to “Free” to indicate that the block is available for allocation. The system may
either search the file map linearly to locate a free block, or a free block list can be maintained.
An advantage of this scheme is that the physical adjacencies on the disk are
reflected in the file map. Insertions and deletions are straightforward in this scheme.
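Following the scheme described above, a file map with 'NIL' and 'FREE' entries can be sketched as follows (structures are illustrative):

```python
# Sketch of block-oriented file mapping: the file map holds one entry per
# disk block, giving the block number of the file's next block, 'NIL' at
# end of file, or 'FREE' for an unallocated block. Names are illustrative.

file_map = {2: 7, 7: 9, 9: "NIL", 4: "FREE", 5: "FREE"}
directory = {"log": 2}                 # file name -> first file-map entry

def file_blocks(name):
    out, cur = [], directory[name]
    while cur != "NIL":                # follow block numbers through the map
        out.append(cur)
        cur = file_map[cur]
    return out

def find_free():
    # linear search of the map; a free block list could be kept instead
    return next((b for b, e in file_map.items() if e == "FREE"), None)

print(file_blocks("log"))   # [2, 7, 9]
print(find_free())          # 4
```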
INDEX BLOCK CHAINING
BLOCK-ORIENTED FILE MAPPING.
FILE DESCRIPTOR
A file descriptor or file control block is a control block containing information the system needs
to manage a file.
A typical file descriptor might include
1) symbolic file name
2) location of file in secondary storage
3) file organization (Sequential, indexed sequential, etc.)
4) device type
5) access control data
6) type (data file, object program, c source program, etc.)
7) disposition (permanent vs temporary)
8) creation date and time
9) destroy date
10) date and time last modified
11) access activity counts (number of reads, for example)
File descriptors are maintained on secondary storage. They are brought to primary storage when
a file is opened.
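The fields listed above can be gathered into a single structure; the field names and types below are illustrative, not a real system's layout:

```python
# Sketch of a file descriptor (file control block) carrying the kinds of
# fields listed above. All names and types are illustrative.

from dataclasses import dataclass

@dataclass
class FileDescriptor:
    name: str                 # 1) symbolic file name
    location: int             # 2) start address in secondary storage
    organization: str         # 3) 'sequential', 'indexed sequential', ...
    device_type: str          # 4) device type
    access_control: str       # 5) access control data
    file_type: str            # 6) data file, object program, ...
    permanent: bool           # 7) disposition: permanent vs temporary
    created: str              # 8) creation date and time
    destroy_date: str         # 9) destroy date
    modified: str             # 10) date and time last modified
    read_count: int = 0       # 11) access activity count

fd = FileDescriptor("a.dat", 4096, "sequential", "disk", "owner:rw",
                    "data file", True, "2024-01-01", "2025-01-01",
                    "2024-06-01")
print(fd.name, fd.read_count)   # a.dat 0
```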
ACCESS CONTROL MATRIX
One way to control access to files is to create a two-dimensional access control matrix
listing all the users and all the files in the system. The entry Aij is 1 if user i is allowed access to file
j; otherwise Aij = 0. In an installation with a large number of users and a large number of files, this
matrix would be very large and very sparse, since allowing one user access to another user's files is
the exception rather than the rule.
To make a matrix concept useful, it would be necessary to use codes to indicate various
kinds of access such as read only, write only, execute only, read write etc.
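Because the matrix is sparse, it can be stored keeping only the nonzero entries, with access codes in place of 0/1; the users, files, and codes below are illustrative:

```python
# Sketch of an access control matrix stored sparsely (only the nonzero
# entries are kept), with codes for the kind of access instead of 0/1.
# User names, file names, and codes are illustrative.

acm = {("alice", "payroll.dat"): "rw",
       ("bob", "payroll.dat"): "r",
       ("alice", "report.txt"): "rwx"}

def allowed(user, file, mode):
    # an absent entry corresponds to Aij = 0, i.e. no access at all
    return mode in acm.get((user, file), "")

print(allowed("alice", "payroll.dat", "w"))   # True
print(allowed("bob", "payroll.dat", "w"))     # False
print(allowed("carol", "payroll.dat", "r"))   # False (no entry at all)
```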
ACCESS CONTROL BY USER CLASSES
A technique that requires considerably less space is to control file access for various classes of users.
A common classification scheme is
1) Owner – Normally, this is the user who created the file.
2) Specified User - The owner specifies that another individual may use the file.
3) Group or Project – Users are often members of a group working on a particular project. In this
case the various members of the group may all be granted access to each other’s project-related files.
4) Public- Most systems allow a file to be designated as public so that it may be accessed by any
member of the system’s user community. Public access normally allows users to read or execute a
file, but not to write it.
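A check based on these four classes can be sketched as follows; the per-file fields and the rights encoding are illustrative:

```python
# Sketch of access control by user classes: instead of a per-user matrix,
# each file records its owner, specified users, and group, plus the rights
# granted to each class. Names and the rights encoding are illustrative.

file_info = {
    "proj.c": {"owner": "alice",
               "users": {"bob"},                   # specified users
               "group": {"alice", "bob", "eve"},   # project group
               "rights": {"owner": "rw", "user": "rw",
                          "group": "r", "public": ""}},
}

def classify(user, f):
    info = file_info[f]
    if user == info["owner"]:   return "owner"
    if user in info["users"]:   return "user"      # specified user
    if user in info["group"]:   return "group"
    return "public"

def may(user, f, mode):
    return mode in file_info[f]["rights"][classify(user, f)]

print(may("alice", "proj.c", "w"))    # True  (owner)
print(may("eve", "proj.c", "w"))      # False (group members may only read)
print(may("mallory", "proj.c", "r"))  # False (public: no access here)
```

Storage now grows with the number of files times the handful of classes, rather than with users times files as in the full matrix.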