System Software And Operating System
SYSTEM SOFTWARE & OPERATING SYSTEM
UNIT I
Introduction –System Software and machine architecture-Assemblers-Basic assembler functions -
Machine dependent features-program relocation-Machine independent features – literals - symbol
defining statements-expressions-program blocks-control sections and program linking - Assembler
design options-one pass assemblers-multi pass assemblers.
UNIT II
Loader and Linkers: Basic Loader Functions - Machine dependent loader features – relocation –
program – linking - Machine independent loader features - Automatic Library search - Loader
options - Loader design options - linkage editor - dynamic linking - Bootstrap loader.
Text Editors: Overview of editing process - user interface - editor structure
UNIT III
Machine dependent compiler features - Intermediate form of the program-Machine dependent code
optimization-machine independent compiler features-Compiler design options-division into passes-
interpreters - P-code compilers - compiler-compilers.
UNIT IV
Introduction: Definition of DOS – History of DOS – Definition of Process - Process states - process
states transition – Interrupt processing – interrupt classes - Storage Management Real Storage: Real
storage management strategies – Contiguous versus
Non-contiguous storage allocation – Single User Contiguous Storage allocation- Fixed partition
multiprogramming – Variable partition multiprogramming. Virtual Storage: Virtual storage
management strategies – Page replacement strategies – Working sets – Demand paging – page size.
UNIT V
Processor Management Job and Processor Scheduling: Preemptive Vs Non-preemptive scheduling –
Priorities – Deadline scheduling - Device and Information Management Disk Performance
Optimization: Operation of moving head disk storage – Need for disk scheduling – Seek
Optimization .File and Database Systems: File System – Functions – Organization – Allocating and
freeing space – File descriptor – Access control matrix.
UNIT I
Introduction –System Software and machine architecture-Assemblers-Basic assembler functions -
Machine dependent features-program relocation-Machine independent features – literals - symbol
defining statements-expressions-program blocks-control sections and program linking - Assembler
design options-one pass assemblers-multi pass assemblers.
Loader and Linkers: Basic Loader Functions - Machine dependent loader features – relocation –
program – linking - Machine independent loader features - Automatic Library search - Loader
options - Loader design options - linkage editor - dynamic linking - Bootstrap loader.
INTRODUCTION
System Software consists of a variety of programs that support the operation of a computer.
It makes it possible for the user to focus on an application or other problem to be solved,
without needing to know the details of how the machine works internally.
You probably wrote programs in a high-level language like C, C++, or VC++, using a text
editor to create and modify the program.
You translated these programs into machine languages using a compiler.
The resulting machine language program was loaded into memory and prepared for
execution by a loader and linker. A debugger was also used to find errors in the programs.
System software refers to the files and programs that make up your computer's operating
system. System files include libraries of functions, system services, drivers for printers
and other hardware, system preferences, and other configuration files.
The programs that are part of the system software include assemblers, compilers, file
management tools, system utilities, and debuggers.
The system software is installed on your computer when you install your operating
system.
You can update the software by running programs such as "Windows Update" for
Windows or "Software Update" for Mac OS X. Unlike application programs, however,
system software is not meant to be run by the end user.
For example, while you might use your Web browser every day, you probably don't have
much use for an assembler program (unless, of course, you are a computer programmer).
Since system software runs at the most basic level of your computer, it is called "low-level" software.
It generates the user interface and allows the operating system to interact with the
hardware. Fortunately, you don't have to worry about what the system software is doing
since it just runs in the background.
One characteristic in which most system software differs from application software is machine
dependency.
System software – support operation and use of computer.
Application software - solution to a problem.
Application Software
Different types of application software are used by individual users and business
enterprises alike, and both derive many benefits from doing so.
It includes word processing software, database software, multimedia software, editing
software, and many other kinds as well.
All of this software is either sold individually, or packaged together and sold by
business-to-business sellers.
When a whole variety of them are integrated collectively and sold to a business, they can
take the form of enterprise software, educational software, simulation software,
information worker software, etc.
Advantages
If you compare the two, you will find that the pros easily outweigh the cons.
With that in mind, here are some of the most popular and widely accepted benefits.
Note that in this scenario, we are speaking of application software that is designed for a
specific purpose, to be used either by individuals or by businesses.
Their single biggest advantage is that they meet the exact needs of the user. Since such
software is designed with one specific purpose in mind, the user knows exactly which
software to use to accomplish the task.
The threat of viruses invading custom-made applications is very small, since any business
that incorporates it can restrict access and can come up with means to protect their
network as well.
Licensed application software gets regular updates from the developer for security
reasons.
Additionally, the developer also regularly sends personnel to correct any problems that
may arise from time to time.
Disadvantages
As with all such matters, there are certain disadvantages of such software as well.
Though these are not often spoken about or highlighted, the fact is that they do exist
and affect certain users.
People have accepted these shortcomings and still continue to use such software because
its utility and importance are much more profound than its weaknesses.
Developing application software designed to meet specific purposes can prove to be quite
costly for developers.
This can affect their budget and their revenue flow, especially if too much time is spent
developing software that is not generally acceptable.
Some software that is designed specifically for a certain business may not be
compatible with other general software.
This is something that can prove to be a major stumbling block for many corporations.
Developing them is something that takes a lot of time, because it needs constant
communication between the developer and the customer. This delays the entire
production process, which can prove to be harmful in some cases.
Application software that is used commonly by many people, and then shared online,
carries a very real threat of infection by a computer virus or other malicious programs.
So whether you are buying them off the shelf, or whether you are hiring a developer to build
specific software for you, all of these points will seem pertinent to you. Many individuals and
businesses have regularly found the need and the requirement for such software, and the fact
remains that any computing device will be utterly useless without such software running on
it.
Assembler
An assembler translates mnemonic instructions into machine code. The instruction formats,
addressing modes, etc., are of direct concern in assembler design. Similarly, compilers must
generate machine language code, taking into account such hardware characteristics as the number
and type of registers and the machine instructions available.
Operating systems
An operating system is directly concerned with the management of nearly all of the resources
of a computing system.
There are aspects of system software that do not directly depend upon the type of computing
system: the general design and logic of an assembler, the general design and logic of a compiler,
and code optimization techniques that are independent of target machines.
Likewise, the process of linking together independently assembled subprograms does not
usually depend on the computer being used.
Simplified Instructional Computer (SIC) is a hypothetical computer that includes the hardware
features most often found on real machines. There are two versions of SIC, they are, standard model
(SIC), and, extension version (SIC/XE) (extra equipment or extra expensive).
Later, you probably wrote programs in assembler language, using macro instructions to
read and write data. You used an assembler, which included a macro processor, to translate these
programs into machine language.
You controlled all these processes by interacting with the operating system of the computer.
The operating system took care of all the machine-level details for you. You could concentrate on
what you wanted to do, without worrying about how it was accomplished.
You will come to understand the processes that were going on “ behind the scenes” as you
used the computer in previous courses. By understanding the system software, you will gain a deeper
understanding of how computers actually work.
SYSTEM SOFTWARE AND MACHINE ARCHITECTURE
One characteristic in which most system software differs from application software is
machine dependency. An application program is primarily concerned with the solution of some
problem, using the computer as a tool. The focus is on the application, not on the computing system.
System programs, on the other hand, are intended to support the operation and use of the computer
itself, rather than any particular application. For this reason, they are usually related to the
architecture of the machine on which they are to run.
For example, assemblers translate mnemonic instructions into machine code; the instruction
formats, addressing modes, etc., are of direct concern in assembler design. Similarly, compilers must
generate machine language code, taking into account such hardware characteristics as the number
and type of registers and the machine instructions available.
Operating systems are directly concerned with the management of nearly all of the resources
of a computing system. Many other examples of such machine dependencies may be found throughout
this book. On the other hand, there are some aspects of system software that do not directly
depend upon the type of computing system being supported.
For example, the general design and logic of an assembler is basically the same on most
computers. Some of the code optimization techniques used by compilers are independent of the
target machine (although there are also machine-dependent optimizations). Likewise, the process of
linking together independently assembled subprograms does not usually depend on the computer
being used.
Assembler is system software which is used to convert an assembly language program to its
equivalent object code.
The input to the assembler is a source code written in assembly language (using mnemonics)
and the output is the object code. The design of an assembler depends upon the machine architecture
as the language used is mnemonic language.
An application program is primarily concerned with the solution of some problem, using the
computer as a tool. The focus is on the application, not on the computing system. System programs,
on the other hand, are intended to support the operation and use of the computer itself, rather than
any particular application. For this reason, they are usually related to the architecture of the machine
on which they are to run.
For example,
Assemblers translate mnemonic instructions into machine code, the instruction formats,
addressing modes, etc., are of direct concern in assembler design.
Compilers generate machine code, taking into account such hardware characteristics as the
number and type of registers and the machine instructions available.
The operating system is concerned with the management of nearly all resources of a computing
system.
Some system software is machine independent; the process of linking together independently
assembled subprograms, for example, does not usually depend on the computer being used. Other
system software is machine dependent, so we must include real machines and real pieces of
software in our study.
However, most real computers have certain characteristics that are unusual or even unique, and it
can be difficult to distinguish the fundamental features of the software from machine-specific
idiosyncrasies. To avoid this problem, we present the fundamental functions of each piece of
software through discussion of a Simplified Instructional Computer (SIC).
SIC is a hypothetical computer that has been carefully designed to include the hardware
features most often found on real machines, while avoiding unusual or irrelevant complexities.
SIC MACHINE ARCHITECTURE
Memory
Memory consists of 8-bit bytes; any three consecutive bytes form a word (24 bits). All addresses on
SIC are byte addresses; words are addressed by the location of their lowest-numbered byte. There is
a total of 32768 bytes in the computer memory.
Registers
There are five registers, all of which have special uses. Each register is 24 bits in length.
Mnemonic   Number   Special Use
A          0        Accumulator; used for arithmetic operations
X          1        Index register; used for addressing
L          2        Linkage register; the Jump to Subroutine (JSUB) instruction stores the
                    return address in this register
PC         8        Program counter; contains the address of the next instruction to be
                    fetched for execution
SW         9        Status word; contains a variety of information, including a Condition
                    Code (CC)
Data format
Integers are stored as 24-bit binary numbers; 2's complement representation is used for
negative values.
Characters are stored using their 8-bit ASCII codes.
There is no floating-point hardware on SIC.
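The 24-bit two's-complement integer format described above can be sketched in Python as follows (to_sic_int and from_sic_int are illustrative names, not part of any real assembler):

```python
def to_sic_int(n):
    """Encode a Python integer as a 24-bit two's-complement SIC word."""
    if not -(2**23) <= n < 2**23:
        raise ValueError("value out of 24-bit range")
    return n & 0xFFFFFF          # masking yields the two's-complement pattern

def from_sic_int(word):
    """Decode a 24-bit two's-complement word back to a signed integer."""
    return word - 0x1000000 if word & 0x800000 else word
```

For example, -1 is stored as FFFFFF, and the sign is carried entirely by the high-order bit of the word.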
Instruction Format
All machine instructions on SIC have the following 24-bit format:

opcode (8 bits) | x (1 bit) | address (15 bits)

The flag bit x is used to indicate indexed-addressing mode.
Addressing Modes
Only two addressing modes are supported, selected by the flag bit x:
– Direct (x = 0): the operand address is taken directly from the address field.
– Indexed (x = 1): the contents of the index register X are added to the address field to
form the operand address.
Parentheses are used to indicate the contents of a register; e.g., (X) denotes the contents of the
index register X.
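A minimal sketch of the two addressing modes, in Python with hypothetical names; it simply applies the rule that indexed mode adds the contents of register X to the address field:

```python
def target_address(addr_field, x_bit, X=0):
    """Compute the SIC effective (target) address.
    Direct (x=0):  TA = address field.
    Indexed (x=1): TA = address field + (X), where (X) is the contents
    of the index register X; wrap within the 15-bit address space."""
    if x_bit:
        return (addr_field + X) & 0x7FFF
    return addr_field
```

This is exactly what an instruction such as STCH BUFFER,X relies on: the label supplies the address field, and register X supplies the running offset.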
Instruction Set
SIC provides a basic set of instructions that are sufficient for most simple tasks.
Load and store registers (LDA, LDX, STA, STX)
Integer arithmetic (ADD, SUB, MUL, DIV), all involve register A and a word in
memory.
Comparison (COMP), involve register A and a word in memory.
Conditional jump (JLE, JEQ, JGT, etc.)
Subroutine linkage (JSUB, RSUB).
INPUT AND OUTPUT
I/O is performed by transferring one byte at a time to or from the rightmost 8 bits of register A.
Each device has a unique 8-bit ID code.
Test device (TD): tests whether a device is ready to send or receive a byte of data.
Read data (RD): reads a byte from the device into register A.
Write data (WD): writes a byte from register A to the device.
SIC/XE Machine Architecture
Mnemonic Number Special Use
B 3 Used for addressing; known as the base register.
S 4 No special use, general purpose register.
T 5 No special use, general purpose register.
F 6 Floating-point accumulator (this register is 48 bits long instead of 24).
Memory
Two versions: SIC and SIC/XE (extra equipment). A SIC program can be executed on
SIC/XE.
Memory consists of 8-bit bytes; 3 consecutive bytes form a word (24 bits).
In total, there are 2^15 (32768) bytes in the memory.
There are 5 registers. Each is 24 bits in length.
Addressing Modes for SIC and SIC/XE
The Simplified Instructional Computer has three instruction formats, and the Extra Equipment
add-on includes a fourth. The instruction formats provide a model for memory and data
management. Each format has a different representation in memory:
Format 1: Consists of 8 bits of allocated memory to store the instruction.
Format 2: Consists of 16 bits of allocated memory: 8 bits to store the instruction
and two 4-bit segments to store operands.
Format 3: Consists of a 6-bit operation code, 6 bits of flag values, and 12
bits of displacement.
Format 4: Only valid on SIC/XE machines; consists of the same elements as
format 3, but instead of a 12-bit displacement stores a 20-bit address.
Both format 3 and format 4 contain the following six flag bits:
n: Indirect addressing flag
i: Immediate addressing flag
x: Indexed addressing flag
b: Base address-relative flag
p: Program counter-relative flag
e: Format 4 instruction flag
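Assuming the format 3 bit layout above (a 6-bit opcode, then the n, i, x, b, p, e flags, then a 12-bit displacement), the flags could be decoded from a 24-bit instruction word with a sketch like this (illustrative Python, not production code):

```python
def decode_flags(instr3):
    """Extract the n,i,x,b,p,e flags and displacement from a 24-bit
    format 3 instruction word laid out as:
    opcode(6) | n | i | x | b | p | e | disp(12)."""
    return {
        'opcode': (instr3 >> 18) & 0x3F,
        'n': (instr3 >> 17) & 1,   # indirect addressing
        'i': (instr3 >> 16) & 1,   # immediate addressing
        'x': (instr3 >> 15) & 1,   # indexed addressing
        'b': (instr3 >> 14) & 1,   # base-relative
        'p': (instr3 >> 13) & 1,   # PC-relative
        'e': (instr3 >> 12) & 1,   # extended (format 4)
        'disp': instr3 & 0xFFF,
    }
```

A format 4 instruction would be handled the same way, except that e = 1 and the low 20 bits hold an address rather than a 12-bit displacement.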
SIC PROGRAMMING EXAMPLES:
COPY    START   1000
FIRST   STL     RETADR
CLOOP   JSUB    RDREC
        LDA     LENGTH
        COMP    ZERO
        JEQ     ENDFIL
        JSUB    WRREC
        J       CLOOP
ENDFIL  LDA     EOF
        STA     BUFFER
        LDA     THREE
        STA     LENGTH
        JSUB    WRREC
        LDL     RETADR
        RSUB
EOF     BYTE    C'EOF'
THREE   WORD    3
ZERO    WORD    0
RETADR  RESW    1
LENGTH  RESW    1
BUFFER  RESB    4096
.
.       SUBROUTINE TO READ RECORD INTO BUFFER
.
RDREC   LDX     ZERO
        LDA     ZERO
RLOOP   TD      INPUT
        JEQ     RLOOP
        RD      INPUT
        COMP    ZERO
        JEQ     EXIT
        STCH    BUFFER,X
        TIX     MAXLEN
        JLT     RLOOP
EXIT    STX     LENGTH
        RSUB
INPUT   BYTE    X'F1'
MAXLEN  WORD    4096
.
.       SUBROUTINE TO WRITE RECORD FROM BUFFER
.
WRREC   LDX     ZERO
WLOOP   TD      OUTPUT
        JEQ     WLOOP
        LDCH    BUFFER,X
        WD      OUTPUT
        TIX     LENGTH
        JLT     WLOOP
        RSUB
OUTPUT  BYTE    X'05'
        END     FIRST
ASSEMBLERS
The design and implementation of assemblers. There are certain fundamental functions that
any assembler must perform, such as translating mnemonic operation codes to their machine
language equivalents and assigning machine addresses to symbolic labels used by the programmer.
If we consider only these fundamental functions, most assemblers are very much alike. Beyond this
most basic level, however, the features and design of an assembler depend heavily upon the source
language it translates and the machine language it produces.
One aspect of this dependence is, of course, the existence of different machine instruction
formats and codes to accomplish (for example) an ADD operation. As we shall see, there are also
many subtler ways that assemblers depend upon machine architecture. On the other hand, there are
some features of an assembler language (and the corresponding assembler) that have no direct
relation to machine architecture—they are, in a sense, arbitrary decisions made by the designers of
the language. We begin by considering the design of a basic assembler for the standard version of
our Simplified Instructional Computer (SIC).
It introduces the most fundamental operations performed by a typical assembler, and
describes common ways of accomplishing these functions. The algorithms and data structures that
we describe are shared by almost all assemblers. Thus this level of presentation gives us a starting
point from which to approach the study of more advanced assembler features. We can also use this
basic structure as a framework from which to begin the design of an assembler for a completely new
or unfamiliar machine. We examine some typical extensions to the basic assembler structure that
might be dictated by hardware considerations.
An assembler is a program that takes basic computer instructions and converts them into a
pattern of bits that the computer's processor can use to perform its basic operations. Some people call
these instructions assembler language and others use the term assembly language.
The design of an assembler can be seen as performing the following:
Scanning (tokenizing)
Parsing (validating the instructions)
Creating the symbol table
Resolving the forward references
Converting into the machine language
In other words, the design of the assembler must:
Convert mnemonic operation codes to their machine language equivalents
Convert symbolic operands to their equivalent machine addresses
Decide the proper instruction format
Convert the data constants to internal machine representations
Write the object program and the assembly listing. So for the design of the assembler we
need to concentrate on the machine architecture of the SIC/XE machine.
BASIC ASSEMBLER FUNCTIONS
We use variations of this program throughout this chapter to show different assembler
features. The line numbers are for reference only and are not part of the program. These numbers
also help to relate corresponding parts of different versions of the program. The mnemonic
instructions used are those introduced in Appendix A. Indexed addressing is indicated by adding
the modifier, "X" following the operand (see line 160). Lines beginning with "." contain comments
only. In addition to the mnemonic machine instructions, we have used the following assembler
directives:
START   Specify the name and starting address for the program.
END     Indicate the end of the source program and (optionally) specify the first
        executable instruction in the program.
BYTE    Generate a character or hexadecimal constant, occupying as many bytes as needed
        to represent the constant.
WORD    Generate a one-word integer constant.
RESB    Reserve the indicated number of bytes for a data area.
RESW    Reserve the indicated number of words for a data area.
The program contains a main routine that reads records from an input device (identified with
device code F1) and copies them to an output device (code 05). This main routine calls subroutine
RDREC to read a record into a buffer and subroutine WRREC to write the record from the buffer to
the output device.
Line  Source statement
5     COPY    START  1000            COPY FILE FROM INPUT TO OUTPUT
10    FIRST   STL    RETADR          SAVE RETURN ADDRESS
15    CLOOP   JSUB   RDREC           READ INPUT RECORD
20            LDA    LENGTH          TEST FOR EOF (LENGTH = 0)
25            COMP   ZERO
30            JEQ    ENDFIL          EXIT IF EOF FOUND
35            JSUB   WRREC           WRITE OUTPUT RECORD
40            J      CLOOP           LOOP
45    ENDFIL  LDA    EOF             INSERT END OF FILE MARKER
50            STA    BUFFER
55            LDA    THREE           SET LENGTH = 3
60            STA    LENGTH
65            JSUB   WRREC           WRITE EOF
70            LDL    RETADR          GET RETURN ADDRESS
75            RSUB                   RETURN TO CALLER
80    EOF     BYTE   C'EOF'
85    THREE   WORD   3
90    ZERO    WORD   0
95    RETADR  RESW   1
100   LENGTH  RESW   1               LENGTH OF RECORD
105   BUFFER  RESB   4096            4096-BYTE BUFFER AREA
110   .
115   .       SUBROUTINE TO READ RECORD INTO BUFFER
120   .
125   RDREC   LDX    ZERO            CLEAR LOOP COUNTER
130           LDA    ZERO            CLEAR A TO ZERO
135   RLOOP   TD     INPUT           TEST INPUT DEVICE
140           JEQ    RLOOP           LOOP UNTIL READY
145           RD     INPUT           READ CHARACTER INTO REGISTER A
150           COMP   ZERO            TEST FOR END OF RECORD (X'00')
155           JEQ    EXIT            EXIT LOOP IF EOR
160           STCH   BUFFER,X        STORE CHARACTER IN BUFFER
165           TIX    MAXLEN          LOOP UNLESS MAX LENGTH
170           JLT    RLOOP           HAS BEEN REACHED
175   EXIT    STX    LENGTH          SAVE RECORD LENGTH
Example of a SIC assembler language program.
Each subroutine must transfer the record one character at a time because the only I/O
instructions available are RD and WD. The buffer is necessary because the I/O rates for the two
devices, such as a disk and a slow printing terminal, may be very different. (In Chapter 6, we see
how to use channel programs and operating system calls on a SIC/XE system to accomplish the
same functions.) The end of each record is marked with a null character (hexadecimal 00). If a
record is longer than the length of the buffer (4096 bytes), only the first 4096 bytes are copied. (For
simplicity, the program does not deal with error recovery when a record containing 4096 bytes or
more is read.) The end of the file to be copied is indicated by a zero-length record. When the end of
file is detected, the program writes EOF on the output device and terminates by executing an RSUB
instruction. We assume that this program was called by the operating system using a JSUB
instruction; thus, the RSUB will return control to the operating system.
A Simple SIC Assembler
The listing that follows shows the generated object code for each statement. The column headed Loc gives the machine
address (in hexadecimal) for each part of the assembled program. We have assumed that the program
starts at address 1000. (In an actual assembler listing, of course, the comments would be retained;
they have been eliminated here to save space.) The translation of source program to object code
requires us to accomplish the following functions (not necessarily in the order given):
1. Convert mnemonic operation codes to their machine language equivalents—e.g., translate STL to
14 (line 10).
2. Convert symbolic operands to their equivalent machine addresses—e.g., translate RETADR to
1033 (line 10).
3. Build the machine instructions in the proper format.
4. Convert the data constants specified in the source program into their internal machine
representations—e.g., translate EOF to 454F46 (line 80).
5. Write the object program and the assembly listing.
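Function 4 above (converting data constants) can be sketched as follows; the helper names are hypothetical, and the behavior shown is the BYTE/WORD conversion described in the text, e.g. C'EOF' becoming 454F46:

```python
def byte_constant(operand):
    """Translate a BYTE operand into object-code hex.
    C'...' -> the ASCII code of each character; X'...' -> the hex
    digits exactly as written."""
    if operand.startswith("C'") and operand.endswith("'"):
        return ''.join('%02X' % ord(c) for c in operand[2:-1])
    if operand.startswith("X'") and operand.endswith("'"):
        return operand[2:-1].upper()
    raise ValueError("unrecognized BYTE operand")

def word_constant(n):
    """Translate a WORD operand into a 6-hex-digit (3-byte) constant."""
    return '%06X' % (n & 0xFFFFFF)
```

These conversions are context free, which is why function 4 (unlike address translation) needs no second pass.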
All of these functions except number 2 can easily be accomplished by sequential processing of the
source program, one line at a time. The translation of addresses, however, presents a problem.
Consider the statement
10   1000   FIRST   STL   RETADR   141033
Line Loc Source statement Object code
5 1000 COPY START 1000
10 1000 FIRST STL RETADR 141033
15 1003 CLOOP JSUB RDREC 482039
20 1006 LDA LENGTH 001036
25 1009 COMP ZERO 281030
30 100C JEQ ENDFIL 301015
35 100F JSUB WRREC 482061
40 1012 J CLOOP 3C1003
45 1015 ENDFIL LDA EOF 00102A
50 1018 STA BUFFER 0C1039
55 101B LDA THREE 00102D
60 101E STA LENGTH 0C1036
65 1021 JSUB WRREC 482061
70 1024 LDL RETADR 081033
75 1027 RSUB 4C0000
80 102A EOF BYTE C'EOF' 454F46
85 102D THREE WORD 3 000003
90 1030 ZERO WORD 0 000000
95 1033 RETADR RESW 1
100 1036 LENGTH RESW 1
Program with object code
This instruction contains a forward reference, that is, a reference to a label (RETADR) that is
defined later in the program. If we attempt to translate the program line by line, we will be unable to
process this statement because we do not know the address that will be assigned to RETADR.
Because of this, most assemblers make two passes over the source program. The first pass does little
more than scan the source program for label definitions and assign addresses.
The second pass performs most of the actual translation previously described. In addition to
translating the instructions of the source program, the assembler must process statements called
assembler directives (or pseudo-instructions). These statements are not translated into machine
instructions (although they may have an effect on the object program).
Instead, they provide instructions to the assembler itself. Examples of assembler directives
are statements like BYTE and WORD, which direct the assembler to generate constants as part of
the object program, and RESB and RESW, which instruct the assembler to reserve memory locations
without generating data values. The other assembler directives in our sample program are START,
which specifies the starting memory address for the object program, and END, which marks the end
of the program.
Finally, the assembler must write the generated object code onto some output device. This
object program will later be loaded into memory for execution. The simple object program format
we use contains three types of records: Header, Text, and End. The Header record contains the
program name, starting address, and length.
Text records contain the translated (i.e., machine code) instructions and data of the program,
together with an indication of the addresses where these are to be loaded. The End record marks the
end of the object program and specifies the address in the program where execution is to begin. (This
is taken from the operand of the program's END statement. If no operand is specified, the address of
the first executable instruction is used.)
The formats we use for these records are as follows. The details of the formats (column
numbers, etc.) are arbitrary; however, the information contained in these records must be present (in
some form) in the object program.
Header record:
Col. 1 H
Col. 2-7 Program name
Col. 8-13 Starting address of object program (hexadecimal)
Col. 14-19 Length of object program in bytes (hexadecimal)
Text record:
Col. 1 T
Col. 2-7 Starting address for object code in this record (hexadecimal)
Col. 8-9 Length of object code in this record in bytes (hexadecimal)
Col. 10-69 Object code, represented in hexadecimal (2 columns per byte of object code)
End record:
Col. 1 E
Col. 2-7 Address of first executable instruction in object program (hexadecimal)
To avoid confusion, we have used the term column rather than byte to refer to positions within
object program records. This is not meant to imply the use of any particular medium for the object
program.
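Under the column layout just described, the three record types could be emitted with a sketch like this (Python, illustrative function names; the field widths follow the format above):

```python
def header_record(name, start, length):
    """H record: program name in cols 2-7 (space padded), then the
    starting address and program length, each as 6 hex digits."""
    return 'H%-6s%06X%06X' % (name[:6], start, length)

def text_record(start, objcodes):
    """T record: starting address, byte count of the object code in
    this record (2 hex digits), then the object code itself."""
    body = ''.join(objcodes)               # each item is hex, 2 cols/byte
    return 'T%06X%02X%s' % (start, len(body) // 2, body)

def end_record(first_exec):
    """E record: address of the first executable instruction."""
    return 'E%06X' % first_exec
```

For instance, the COPY program starting at 1000 with length 107A would produce the header record HCOPY  00100000107A.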
The symbol ^ is used to separate fields visually. Of course, such symbols are not present in
the actual object program. Note that there is no object code corresponding to addresses 1033-2038.
This storage is simply reserved by the loader for use by the program during execution. (Chapter 3
contains a detailed discussion of the operation of the loader.) We can now give a general description
of the functions of the two passes of our simple assembler.
Pass 1 (define symbols):
1. Assign addresses to all statements in the program.
2. Save the values (addresses) assigned to all labels for use in pass 2.
3. Perform some processing of assembler directives. (This includes processing that affects
address assignment, such as determining the length of data areas defined by BYTE, RESW, etc.)
Pass 2 (assemble instructions and generate object program):
1. Assemble instructions (translating operation codes and looking up addresses).
2. Generate data values defined by BYTE, WORD, etc.
3. Perform processing of assembler directives not done during Pass 1.
4. Write the object program and the assembly listing.
In the next section we discuss these functions in more detail, describe the internal tables required by
the assembler, and give an overall description of the logic flow of each pass.
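The two passes described above can be sketched as a toy two-pass assembler for plain SIC, where every machine instruction is 3 bytes. The OPTAB subset, the tuple representation of statements, and the function names are all simplifying assumptions for illustration, not a real assembler:

```python
# Hypothetical opcode table: mnemonic -> machine code (standard SIC,
# so every instruction occupies 3 bytes).
OPTAB = {'LDA': 0x00, 'STL': 0x14, 'JSUB': 0x48, 'RSUB': 0x4C}

def pass1(lines):
    """Pass 1: assign an address to every statement and define symbols.
    Each line is a (label, opcode, operand) tuple."""
    symtab, locctr, stmts = {}, 0, []
    for label, opcode, operand in lines:
        if opcode == 'START':
            locctr = int(operand, 16)        # operand is a hex address
        if label:
            symtab[label] = locctr
        stmts.append((locctr, label, opcode, operand))
        if opcode in OPTAB or opcode == 'WORD':
            locctr += 3
        elif opcode == 'RESW':
            locctr += 3 * int(operand)
    return symtab, stmts

def pass2(symtab, stmts):
    """Pass 2: translate instructions using OPTAB and SYMTAB."""
    out = []
    for loc, label, opcode, operand in stmts:
        if opcode in OPTAB:
            addr = symtab.get(operand, 0)    # 0 when no operand (RSUB)
            out.append('%02X%04X' % (OPTAB[opcode], addr))
    return out
```

Note how the forward-reference problem disappears: by the time pass 2 runs, every label (such as RETADR) already has an address in SYMTAB.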
ASSEMBLER ALGORITHM AND DATA STRUCTURES
The simple assembler uses two major internal data structures:
Operation Code Table (OPTAB)
Symbol Table (SYMTAB)
Operation Code Table (OPTAB):
OPTAB is used to look up mnemonic operation codes and translate them to their machine
language equivalents.
In more complex assemblers the table contains information about instruction format and
length.
In pass 1, OPTAB is used to look up and validate the operation code in the source
program.
In pass 2, it is used to translate the operation codes to machine language. On the simple SIC
machine this process can be performed either in pass 1 or in pass 2.
But for a machine like SIC/XE that has instructions of different lengths, we must search
OPTAB in the first pass to find the instruction length for incrementing LOCCTR.
In pass 2 we take the information from OPTAB to tell us which instruction format to use in
assembling the instruction, and any peculiarities of the object code instruction.
OPTAB is usually organized as a hash table, with mnemonic operation code as the key. The
hash table organization is particularly appropriate, since it provides fast retrieval with a minimum of
searching. In most cases OPTAB is a static table; that is, entries are not normally added to or
deleted from it. In such cases it is possible to design a special hashing function or other data
structure to give optimum performance for the particular set of keys being stored.
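A Python dict is itself a hash table, so an OPTAB sketch keyed by mnemonic might look like this; the entries shown and the format field are an illustrative subset, used here to compute the instruction length that pass 1 needs:

```python
# OPTAB sketch for SIC/XE: each entry records the machine code and the
# default instruction format, so pass 1 can advance LOCCTR correctly.
OPTAB = {
    'LDA':   {'opcode': 0x00, 'format': 3},
    'COMP':  {'opcode': 0x28, 'format': 3},
    'JSUB':  {'opcode': 0x48, 'format': 3},
    'TIX':   {'opcode': 0x2C, 'format': 3},
    'CLEAR': {'opcode': 0xB4, 'format': 2},   # SIC/XE register instruction
}

def instruction_length(mnemonic, extended=False):
    """Length in bytes for pass 1; a '+' prefix in the source selects
    format 4, which is one byte longer than format 3."""
    entry = OPTAB[mnemonic]            # KeyError here = invalid opcode
    fmt = entry['format']
    return fmt + 1 if (fmt == 3 and extended) else fmt
```

A lookup that raises KeyError doubles as the pass 1 validity check for operation codes.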
Symbol Table (SYMTAB):
The symbol table includes the name and value for each label in the source program, together with
flags to indicate error conditions (e.g., a symbol defined in two different places).
During Pass 1: labels are entered into the symbol table along with their assigned address
values as they are encountered. All symbol address values should be resolved by the end of Pass 1.
During Pass 2: symbols used as operands are looked up in the symbol table to obtain the
address values to be inserted in the assembled instructions.
SYMTAB is usually organized as a hash table for efficiency of insertion and retrieval. Since
entries are rarely deleted, efficiency of deletion is not an important criterion for optimization. Both Pass 1
and Pass 2 require reading the source program. In addition, an intermediate file is created by Pass
1 that contains each source statement together with its assigned address, error indicators, etc. This
file is one of the inputs to Pass 2.
This copy of the source program serves to retain the results of operations performed during Pass 1
(such as scanning the operand field for symbols and addressing flags), so that these need not be
performed again during Pass 2. Similarly, pointers into OPTAB and SYMTAB are retained for each
operation code and symbol used. This avoids the need to repeat many of the table-searching operations.
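A minimal SYMTAB sketch (our own illustration, not the book's code): Pass 1 inserts each label with its LOCCTR value and flags duplicates, while Pass 2 only performs lookups:

```python
class SymTab:
    """Toy symbol table: name -> (address, error_flag)."""
    def __init__(self):
        self.entries = {}

    def define(self, label, locctr):
        # Pass 1: flag the symbol if it is defined in two different places.
        if label in self.entries:
            addr, _ = self.entries[label]
            self.entries[label] = (addr, "duplicate symbol")
            return False
        self.entries[label] = (locctr, None)
        return True

    def lookup(self, label):
        # Pass 2: return the address, or None for an undefined symbol.
        entry = self.entries.get(label)
        return entry[0] if entry else None

symtab = SymTab()
symtab.define("RETADR", 0x1033)
symtab.define("LENGTH", 0x1036)
```

A second definition of RETADR would leave the original address in place but mark the entry with a duplicate-symbol flag, matching the error handling described above.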
LOCCTR:
System Software And Operating System
20
The location counter helps in the assignment of addresses. LOCCTR is initialized to the beginning
address mentioned in the START statement of the program. After each statement is processed, the
length of the assembled instruction is added to LOCCTR so that it points to the next instruction.
Whenever a label is encountered in an instruction, the current LOCCTR value gives the address to be
associated with that label.
The Algorithm for Pass 1:
Begin
    read first input line
    if OPCODE = 'START' then
        begin
            save #[OPERAND] as starting address
            initialize LOCCTR to starting address
            write line to intermediate file
            read next input line
        end {if START}
    else
        initialize LOCCTR to 0
    while OPCODE != 'END' do
        begin
            if this is not a comment line then
                begin
                    if there is a symbol in the LABEL field then
                        begin
                            search SYMTAB for LABEL
                            if found then
                                set error flag (duplicate symbol)
                            else
                                insert (LABEL, LOCCTR) into SYMTAB
                        end {if symbol}
                    search OPTAB for OPCODE
                    if found then
                        add 3 (instruction length) to LOCCTR
                    else if OPCODE = 'WORD' then
                        add 3 to LOCCTR
                    else if OPCODE = 'RESW' then
                        add 3 * #[OPERAND] to LOCCTR
                    else if OPCODE = 'RESB' then
                        add #[OPERAND] to LOCCTR
                    else if OPCODE = 'BYTE' then
                        begin
                            find length of constant in bytes
                            add length to LOCCTR
                        end
                    else
                        set error flag (invalid operation code)
                end {if not a comment}
            write line to intermediate file
            read next input line
        end {while not END}
    write last line to intermediate file
    save (LOCCTR - starting address) as program length
End {Pass 1}
The algorithm scans the first statement; if it is START, it saves the operand field (the address)
as the starting address of the program, initializes LOCCTR to this address, and writes the line
to the intermediate file. If no operand is mentioned, LOCCTR is initialized to zero. If a label is
encountered, the symbol is entered in the symbol table along with its associated address value.
If the symbol already exists in the table, it is a duplicate definition, so an error flag is set.
The algorithm next checks the mnemonic code by searching for it in OPTAB. If found, the length of
the instruction is added to LOCCTR to make it point to the next instruction.
If the opcode is the directive WORD, it adds 3 to LOCCTR. If it is RESW, it adds 3 times the
number of words to LOCCTR. If it is BYTE, it adds the length of the constant in bytes; if RESB,
it adds the number of bytes reserved. When the END directive is reached, the program length is
found by subtracting the starting address from the current LOCCTR. Each processed line is written
to the intermediate file.
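The LOCCTR increments described above can be sketched as a small helper (an illustration with our own naming, using the fixed 3-byte SIC instruction length):

```python
def pass1_increment(opcode, operand):
    """Bytes to add to LOCCTR for one statement (simple SIC rules)."""
    if opcode == "WORD":
        return 3                       # one word constant
    if opcode == "RESW":
        return 3 * int(operand)        # reserve words
    if opcode == "RESB":
        return int(operand)            # reserve bytes
    if opcode == "BYTE":
        # C'...' takes one byte per character; X'...' one byte per hex pair
        body = operand[2:-1]
        return len(body) if operand.startswith("C") else len(body) // 2
    return 3                           # every simple SIC instruction is 3 bytes

locctr = 0x1000
locctr += pass1_increment("STL", "RETADR")   # ordinary instruction: +3
locctr += pass1_increment("RESW", "1")       # reserve one word: +3
```

For example, BYTE C'EOF' advances LOCCTR by 3 bytes and BYTE X'05' by 1 byte, matching the "find length of constant in bytes" step in the algorithm.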
The Algorithm for Pass 2:
Begin
    read first input line (from intermediate file)
    if OPCODE = 'START' then
        begin
            write listing line
            read next input line
        end {if START}
    write Header record to object program
    initialize first Text record
    while OPCODE != 'END' do
        begin
            if this is not a comment line then
                begin
                    search OPTAB for OPCODE
                    if found then
                        begin
                            if there is a symbol in OPERAND field then
                                begin
                                    search SYMTAB for OPERAND
                                    if found then
                                        store symbol value as operand address
                                    else
                                        begin
                                            store 0 as operand address
                                            set error flag (undefined symbol)
                                        end
                                end {if symbol}
                            else
                                store 0 as operand address
                            assemble the object code instruction
                        end {if opcode found}
                    else if OPCODE = 'BYTE' or 'WORD' then
                        convert constant to object code
                    if object code will not fit into the current Text record then
                        begin
                            write Text record to object program
                            initialize new Text record
                        end
                    add object code to Text record
                end {if not a comment}
            write listing line
            read next input line
        end {while not END}
    write last Text record to object program
    write End record to object program
    write last listing line
End {Pass 2}
Here the first input line is read from the intermediate file. If the opcode is START, then this
line is directly written to the list file. A header record is written in the object program which gives
the starting address and the length of the program (which is calculated during pass 1). Then the first
text record is initialized.
Comment lines are ignored. For each instruction, OPTAB is searched for the opcode to find
the object code. If there is a symbol in the operand field, the symbol table is searched to get its
address value, which is combined with the opcode's object code. If the symbol is not found, zero
is stored as the operand address and an error flag is set indicating an undefined symbol. If there
is no symbol in the operand field, zero is stored as the operand address. The object code
instruction is then assembled.
If the opcode is BYTE or WORD, the constant value is converted to its equivalent
object code (for example, the character constant C'EOF' is stored as its hexadecimal equivalent
454F46). If the object code cannot fit into the current text record, the record is written to the
object program and a new text record is started for the remaining object code. Once the whole
program is assembled and the END directive is encountered, the End record is written.
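The Text-record handling described above can be sketched as follows (our own simplified helper; it assumes each record holds at most 30 bytes of object code, the conventional limit):

```python
def make_text_records(start_addr, codes, max_bytes=30):
    """Pack pieces of object code (hex strings) into Text records.

    Each record is 'T' + 6-digit start address + 2-digit length + object code,
    mirroring the fixed-column object program format described in the text.
    """
    records, current, rec_start = [], "", start_addr
    addr = start_addr
    for code in codes:
        nbytes = len(code) // 2
        if len(current) // 2 + nbytes > max_bytes:   # doesn't fit: flush record
            records.append("T%06X%02X%s" % (rec_start, len(current) // 2, current))
            current, rec_start = "", addr
        current += code
        addr += nbytes
    if current:                                      # write the last Text record
        records.append("T%06X%02X%s" % (rec_start, len(current) // 2, current))
    return records

recs = make_text_records(0x1000, ["141033", "482039", "001036"])
```

The three 3-byte instructions fit in one record, giving T00100009141033482039001036; a longer stream is split whenever the 30-byte limit would be exceeded.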
MACHINE-DEPENDENT ASSEMBLER FEATURES
We now consider the design and implementation of an assembler for the more complex XE version of SIC. In
doing so, we examine the effect of the extended hardware on the structure and functions of the
assembler. Many real machines have architectural features that are similar to those we
consider here. Thus our discussion applies in large part to these machines as well as to SIC/XE. The
example program might be rewritten to take advantage of the SIC/XE instruction set. In our assembler language,
indirect addressing is indicated by adding the prefix @ to the operand; immediate operands are denoted
with the prefix # (lines 25, 55, 133).
Instructions that refer to memory are normally assembled using either the program-counter
relative or the base relative mode. The assembler directive BASE (line 13) is used in conjunction
with base relative addressing. (See Section 2.2.1 for a discussion and examples.) If the displacements
required for both program-counter relative and base relative addressing are too large to fit into a
3-byte instruction, the 4-byte extended format (Format 4) must be used.
• Instructions can be:
1. Register-to-register instructions
2. Instructions with one operand in memory and the other in the accumulator (single-operand instructions)
3. Extended-format instructions
• Addressing modes are:
Index addressing (SIC): OPCODE m, X
Indirect addressing: OPCODE @m
PC-relative: OPCODE m
Base-relative: OPCODE m
Immediate addressing: OPCODE #c
1. Translation of instructions involving the register-to-register addressing mode:
During Pass 1 the registers can be entered as part of the symbol table itself, with their
equivalent numeric codes as values. During Pass 2, these values are assembled along with the
mnemonic's object code. If required, a separate table can be created with the register names
and their equivalent numeric values.
2. Translation involving register-memory instructions: The SIC/XE machine has four
instruction formats and five addressing modes. Among the instruction formats, Format 3 and
Format 4 instructions are of the register-memory type: one operand is always in a register and
the other operand is in memory. The addressing mode tells us the way in which the operand is
to be fetched from memory.
There are two relative modes: program-counter relative and base relative. These use the
Format 3 layout, in which the instruction has the opcode followed by a 12-bit displacement
value in the address field; when the displacement is out of range, Format 4 is used instead,
which contains the opcode followed by a 20-bit address value in the address field.
Program-Counter Relative: This mode usually uses the Format 3 instruction layout. The
instruction contains the opcode followed by a 12-bit displacement value, whose range is
-2048 to +2047. This displacement (which must be small enough to fit in the 12-bit field)
is added to the current contents of the program counter to get the target address of the operand
required by the instruction. This is a relative way of calculating the address of the operand:
the displacement of the operand is relative to the current program counter value.
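The displacement computation can be illustrated with a short sketch (our own helper; disp = target address minus PC, where the PC already holds the address of the next instruction):

```python
def pc_relative_disp(target_addr, next_instr_addr):
    """Return the 12-bit PC-relative displacement, or None if out of range."""
    disp = target_addr - next_instr_addr   # PC points past the current instruction
    if -2048 <= disp <= 2047:
        return disp & 0xFFF                # two's-complement, 12 bits
    return None                            # fall back to base-relative or Format 4

# Example: a 3-byte instruction at 0x0000 referring to a symbol at 0x0030;
# the PC holds 0x0003 when the instruction executes.
disp = pc_relative_disp(0x0030, 0x0003)
```

A negative displacement (a backward reference) is stored in two's-complement form, and a target too far away yields None, signalling that another addressing mode is needed.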
Line Source statement
5 COPY START 1000 COPY FILE FROM INPUT TO OUTPUT
10 FIRST STL RETADR SAVE RETURN ADDRESS
15 CLOOP JSUB RDREC READ INPUT RECORD
20 LDA LENGTH TEST FOR EOF (LENGTH = 0)
25 COMP ZERO
30 JEQ ENDFIL EXIT IF EOF FOUND
35 JSUB WRREC WRITE OUTPUT RECORD
40 J CLOOP LOOP
45 ENDFIL LDA EOF INSERT END OF FILE MARKER
50 STA BUFFER
55 LDA THREE SET LENGTH = 3
60 STA LENGTH
65 JSUB WRREC WRITE EOF
70 LDL RETADR GET RETURN ADDRESS
75 RSUB RETURN TO CALLER
80 EOF BYTE C'EOF'
85 THREE WORD 3
90 ZERO WORD 0
95 RETADR RESW 1
100 LENGTH RESW 1 LENGTH OF RECORD
Example of a SIC program
If the displacements required for both program-counter relative and base relative addressing are
too large to fit into a 3-byte instruction, then the 4-byte extended format (Format 4) must be used.
The extended instruction format is specified with the prefix + added to the operation code in
the source statement (see lines 15, 35, 65). It is the programmer's responsibility to specify this form
of addressing when it is required. The main differences between this version of the program and the
version in Fig. 2.1 involve the use of register-to-register instructions wherever possible. In addition,
immediate and indirect addressing have been used as much as possible (for example, lines 25, 55, and
70).
These changes take advantage of the more advanced SIC/XE architecture to improve the
execution speed of the program. Register-to-register instructions are faster than the corresponding
register-to-memory operations because they are shorter, and, more importantly, because they do not
require another memory reference. (Fetching an operand from a register is much faster than
retrieving it from main memory.)
Likewise, when using immediate addressing, the operand is already present as part of the
instruction and need not be fetched from anywhere. The use of indirect addressing often avoids the
need for another instruction (as in the "return" operation on line 70). You may notice that some of
the changes require the addition of other instructions to the program. This still results in an
improvement in execution speed.
The CLEAR instruction is executed only once for each record read, whereas the benefits of COMPR (as
opposed to COMP) are realized for every byte of data transferred. In Section 2.2.1, we examine the
assembly of this SIC/XE program, focusing on the differences in the assembler that are required by
the new addressing modes. These changes are direct consequences of the extended hardware
functions.
There is also an indirect consequence of the change to SIC/XE. The larger main memory of SIC/XE
means that we may have room to load and run several programs at the same time. This kind of
sharing of the machine between programs is called multiprogramming. Such sharing often results in
more productive use of the hardware.
INSTRUCTION FORMATS AND ADDRESSING MODES
In this section we consider the object code generated for each statement in the program, paying
particular attention to the handling of different instruction formats and different addressing modes.
Note that the START statement now specifies a beginning program address of 0. As we discuss in
the next section, this indicates a relocatable program. For the purposes of instruction assembly,
however, the program will be translated exactly as if it were really to be loaded at machine
address 0.
Translation of register-to-register instructions such as CLEAR and COMPR presents no new
problems. The assembler must simply convert the mnemonic operation code to machine language
(using OPTAB) and change each register mnemonic to its numeric equivalent. This translation is
done during Pass 2, at the same point at which the other types of instructions are assembled.
The conversion of register mnemonics to numbers can be done with a separate table;
however, it is often convenient to use the symbol table for this purpose. To do this, SYMTAB would
be preloaded with the register names (A, X, etc.) and their values (0, 1, etc.).
Most of the register-to-memory instructions are assembled using either program-counter
relative or base relative addressing. The assembler must, in either case, calculate a displacement to
be assembled as part of the object instruction.
This is computed so that the correct target address results when the displacement is added to
the contents of the program counter (PC) or the base register (B). Of course, the resulting
displacement must be small enough to fit in the 12-bit field in the instruction. This means that the
displacement must be between 0 and 4095 (for base relative mode) or between -2048 and +2047
(for program-counter relative mode).
If neither program-counter relative nor base relative addressing can be used (because the
displacements are too large), then the 4-byte extended instruction format (Format 4) must be used.
This 4-byte format contains a 20-bit address field, which is large enough to contain the full memory
address. In this case, there is no displacement to be calculated. For example, in the instruction
15 0006 CLOOP +JSUB RDREC 4B101036
The operand address is 1036. This full address is stored in the instruction, with bit e set to 1
to indicate extended instruction format. Note that the programmer must specify the extended format
by using the prefix + (as on line 15). If extended format is not specified, our assembler first attempts
to translate the instruction using program-counter relative addressing. If this is not possible (because
the required displacement is out of range), the assembler then attempts to use base relative
addressing.
Line Loc Source statement Object code
5 1000 COPY START 1000
10 1000 FIRST STL RETADR 141033
15 1003 CLOOP JSUB RDREC 482039
20 1006 LDA LENGTH 001036
25 1009 COMP ZERO 281030
30 100C JEQ ENDFIL 301015
35 100F JSUB WRREC 482061
40 1012 J CLOOP 3C1003
45 1015 ENDFIL LDA EOF 00102A
50 1018 STA BUFFER 0C1039
55 101B LDA THREE 00102D
60 101E STA LENGTH 0C1036
65 1021 JSUB WRREC 482061
70 1024 LDL RETADR 081033
75 1027 RSUB 4C0000
80 102A EOF BYTE C'EOF' 454F46
85 102D THREE WORD 3 000003
90 1030 ZERO WORD 0 000000
95 1033 RETADR RESW 1
100 1036 LENGTH RESW 1
Program with object code
If neither form of relative addressing is applicable and extended format is not specified, then the
instruction cannot be properly assembled. In this case, the assembler must generate an error message.
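The fallback order just described — extended format if requested, otherwise PC-relative, then base-relative, then an error — can be sketched as follows (a hypothetical helper for illustration only):

```python
def choose_addressing(target, pc, base, extended=False):
    """Pick an addressing mode for a Format 3/4 instruction.

    Returns ('format4', address), ('pc', disp), ('base', disp),
    or ('error', None).
    """
    if extended:                       # programmer wrote the '+' prefix
        return ("format4", target)     # 20-bit address, no displacement needed
    disp = target - pc
    if -2048 <= disp <= 2047:          # try PC-relative first
        return ("pc", disp & 0xFFF)
    if base is not None:
        disp = target - base
        if 0 <= disp <= 4095:          # then try base-relative
            return ("base", disp)
    return ("error", None)             # cannot assemble; report an error
```

The 12-bit ranges match the limits stated above: -2048 to +2047 for PC-relative and 0 to 4095 for base-relative.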
PROGRAM RELOCATION
Sometimes it is required to load and run several programs at the same time. The system must
be able to load these programs wherever there is room in memory. Therefore the exact starting
address is not known until load time.
Absolute Program
In this case the address is specified during assembly itself. This is called absolute assembly.
Consider the instruction:
55 101B LDA THREE 00102D
This statement says that register A is loaded with the value stored at location 102D.
Suppose it is decided to load and execute the program at location 2000 instead of location 1000.
Then the value which needs to be loaded into register A is no longer available at address 102D:
that address has changed by the displacement of the program.
Hence we need to make some changes in the address portion of the instruction so that we can
load and execute the program at location 2000. Apart from the instructions whose operand
addresses change as the program load address changes, some parts of the program remain the
same regardless of where the program is loaded.
Since the assembler does not know the actual location where the program will be loaded, it cannot
make the necessary changes in the addresses used in the program. However, the assembler identifies
for the loader those parts of the program which need modification. An object program that has the
information necessary to perform this kind of modification is called a relocatable program.
Relocatable Program
The above diagram shows the concept of relocation. Initially the program is loaded at
location 0000. The instruction JSUB is loaded at location 0006, and its address field contains
01036, the address of the instruction labeled RDREC. The second figure shows the program loaded
at a new location, 5000: the address in the JSUB instruction is modified to the new location 6036.
Likewise, the third figure shows that if the program is relocated to location 7420, the JSUB
instruction would need to be changed to 4B108456, which corresponds to the new address of
RDREC.
The only parts of the program that require modification at load time are those that specify
direct addresses. The rest of the instructions need not be modified: those whose operands are not
memory addresses (immediate addressing) and those that use PC-relative or base-relative
addressing. From the object program alone, however, it is not possible to distinguish addresses
from constants.
The role of relocation, the ability to execute processes independently from their physical
location in memory, is central for memory management: virtually all the techniques in this field
rely on the ability to relocate processes efficiently. The need for relocation is immediately
evident when one considers that in a general-purpose multiprogramming environment a program
cannot know in advance (before execution, i.e. at compile time) what processes will be running
in memory when it is executed, nor how much memory the system has available for it, nor where
it is located. Hence a program must be compiled and linked in such a way that it can later be
loaded starting from an unpredictable address in memory, an address that can even change
during the execution of the process itself, if any swapping occurs.
It's easy to identify the basic requirement for a (binary executable) program to be
relocatable: all the references to memory it makes during its execution must not contain absolute
(i.e. physical) addresses of memory cells, but must be generated relatively, i.e. as a distance,
measured in number of contiguous memory words, from some known point. The memory
references a program can generate are of two kinds: references to instructions and references to
data. The former kind is implied in the execution of program branches or subroutine calls: a
jump machine instruction always involves the loading of the CPU program counter register with
the address of the memory word containing the instruction to jump to. The executable code of a
relocatable program must then contain only relative branch machine instructions, in which the
address to branch to is specified as an increment (or decrement) with respect to the address of the
current instruction (or to the content of a register or memory word). The latter kind comes into
play whenever program variables (including program execution variables, like a subroutine
call stack) are accessed. In this case relocation is made possible by the use of indexed or
increment processor addressing modes, in which the address of a memory word is computed at
reference time as the sum of the content of a register plus an increment or a decrement.
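The core idea — every memory reference generated as a base plus a relative distance — can be shown with a tiny sketch (illustrative names and addresses only):

```python
def effective_address(load_base, relative_offset):
    """A relocatable reference: physical address = load base + relative offset."""
    return load_base + relative_offset

# The same program image, loaded at two different places in memory:
load1, load2 = 0x0000, 0x5000
rdrec_offset = 0x1036                  # RDREC's distance from the program start
addr_at_1 = effective_address(load1, rdrec_offset)
addr_at_2 = effective_address(load2, rdrec_offset)
```

Because only the base changes between loads, the stored offset 1036 resolves to 1036 at load address 0 and to 6036 at load address 5000, just as in the JSUB relocation example above.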
As we'll see later, the memory references of a process in a multitasking environment must
somehow be bounded, so as to protect from unwanted interference memory areas such as the
unwritable parts of the process itself, or the memory areas containing the images of other
processes. This is usually accomplished in hardware by comparing the address of each
memory reference produced by a process with the content of one or more bound registers or
memory words, so that the processor traps an exception to block the process should an illegal
address be generated.
A scheme of the address computation involved in the memory references of a relocatable
program is shown.
The assembler must keep some information to tell the loader which addresses to change. The
object program that contains modification records is called a relocatable program. For an address
label, the address is assigned relative to the start of the program (START 0). The assembler
produces a Modification record to store the starting location and the length of the address field
to be modified. These commands for the loader are also part of the object program. The
Modification record has the following format:
Modification record
Col. 1 M
Col. 2-7 Starting location of the address field to be modified, relative to the beginning of
the program (Hex)
Col. 8-9 Length of the address field to be modified, in half-bytes (Hex)
One modification record is created for each address to be modified. The length is stored in
half-bytes (4 bits). The starting location is the location of the byte containing the leftmost bits of the
address field to be modified. If the field contains an odd number of half-bytes, the starting location
begins in the middle of the first byte.
(The figure referenced here shows the object program with the address fields requiring
modification highlighted.) The modification records at the end of the object program describe the
changes needed if relocation occurs. For example, M00000705 indicates that the address field
beginning at location 0007 must be modified, and that the field is 5 half-bytes long. Similar
modification records are generated for the other instructions that need change.
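Building a Modification record with the column layout given above can be sketched as follows (our own helper name):

```python
def modification_record(field_start, half_bytes):
    """Build 'M' + 6-hex-digit starting location + 2-hex-digit length in half-bytes."""
    return "M%06X%02X" % (field_start, half_bytes)

# The +JSUB at location 0006: its address field begins one byte later,
# at 0007, and occupies 5 half-bytes (the 20-bit address of a Format 4 instruction).
rec = modification_record(0x000007, 5)
```

This reproduces the M00000705 record discussed in the text: column 1 is M, columns 2-7 the relative starting location, columns 8-9 the length in half-bytes.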
MACHINE-INDEPENDENT ASSEMBLER FEATURES
These are the features which do not depend on the architecture of the machine. They are:
Literals
Symbol-Defining Statements
Expressions
Program blocks
Control sections and program linking.
LITERALS
In programming, a literal is a value written exactly as it is meant to be interpreted. In contrast, a
variable is a name that can represent different values during the execution of the program, and a
constant is a name that represents the same value throughout a program. A literal is not a
name; it is the value itself.
A literal can be a number, a character, or a string. For example, in the expression,
x = 3
x is a variable, and 3 is a literal.
A literal is defined with a prefix = followed by a specification of the literal value.
Example:
45 001A ENDFIL LDA =C'EOF' 032010
This statement specifies a 3-byte operand whose value is the character string EOF.
215 1062 WLOOP TD =X'05' E32011
This statement specifies a 1-byte literal with the hexadecimal value 05.
93 LTORG
002D * =C'EOF' 454F46
The object code for the instruction is also shown above. The operand field holds the
relative displacement of the location where the literal value is stored. In the example the value is at
location 002D, and hence the displacement value is 010.
It is important to understand the difference between a constant defined as a literal and a
constant defined as an immediate operand. In the case of literals, the assembler generates the
specified value as a constant at some other memory location. In immediate mode the operand value
is assembled as part of the instruction itself. Example:
55 0020 LDA #03 010003
All the literal operands used in a program are gathered together into one or more literal pools.
Normally the pool is placed at the end of the program. The assembly listing of a program containing
literals usually includes a listing of this literal pool, which shows the assigned addresses and the
generated data values. In some cases the pool is placed at some other location in the object program
by using the assembler directive LTORG. Whenever LTORG is encountered, the assembler creates
a literal pool that contains all the literal operands used since the previous pool (or the beginning
of the program). It is often better to place the literals close to the instructions that use them.
A literal table is created for the literals which are used in the program. The literal table
contains the literal name, operand value and length. The literal table is usually created as a hash table
on the literal name.
IMPLEMENTATION OF LITERALS
During Pass-1:
Each literal encountered is searched for in the literal table. If the literal already exists, no action
is taken; if it is not present, the literal is added to LITTAB, with its address left unassigned until a
literal pool is created. When Pass 1 encounters an LTORG statement or the end of the program, the
assembler makes a scan of the literal table. At this time each literal currently in the table is
assigned an address. As addresses are assigned, the location counter is updated to reflect the
number of bytes occupied by each literal.
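Pass 1's literal handling can be sketched as follows (an illustration only; LITTAB is keyed on the literal name, as described above):

```python
LITTAB = {}          # literal name -> [operand bytes, length, address or None]

def note_literal(name, value_bytes):
    """Pass 1: record a literal the first time it is seen; address unassigned."""
    if name not in LITTAB:
        LITTAB[name] = [value_bytes, len(value_bytes), None]

def pool_literals(locctr):
    """At LTORG or END: assign addresses and advance the location counter."""
    for entry in LITTAB.values():
        if entry[2] is None:
            entry[2] = locctr
            locctr += entry[1]
    return locctr

note_literal("=C'EOF'", b"EOF")
note_literal("=X'05'", b"\x05")
note_literal("=C'EOF'", b"EOF")        # duplicate: no new entry is made
end = pool_literals(0x002D)            # pool placed at 002D, as in the example
```

Pooling at 002D gives =C'EOF' the address 002D (3 bytes) and =X'05' the address 0030 (1 byte), after which LOCCTR has advanced past both literals.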
During Pass-2:
The assembler searches LITTAB for each literal encountered in an instruction and
replaces it with its assigned address, assembling the literal values as if they had been generated
by BYTE or WORD. If a literal represents an address in the program, the assembler must also
generate a Modification record, since the value is affected by relocation. The following figure
shows the difference between the SYMTAB and LITTAB.
SYMBOL-DEFINING STATEMENTS
EQU Statement:
Most assemblers provide an assembler directive that allows the programmer to define
symbols and specify their values. The directive used for this is EQU (Equate). The general form of
the statement is
Symbol EQU value
This statement defines the given symbol (i.e., enters it in the SYMTAB) and assigns to it
the value specified. The value can be a constant or an expression involving constants and any other
symbol which is already defined. One common usage is to define symbolic names that can be used
in place of numeric values to improve readability. For example,
+LDT #4096
This loads register T with the immediate value 4096, but it does not make clear what exactly this
value indicates. If a statement is included as:
MAXLEN EQU 4096 and then
+LDT #MAXLEN
Then it clearly indicates that MAXLEN is some maximum length value. When
the assembler encounters the EQU statement, it enters the symbol MAXLEN along with its value in
the symbol table. While assembling the LDT instruction, the assembler searches SYMTAB for
MAXLEN and uses its value as the operand in the instruction. The object code generated is the
same for both forms, but the second is easier to understand.
If the maximum length is changed from 4096 to 1024, it is difficult to make the change if the
value is mentioned as an immediate operand wherever required in the instructions: we have to
scan the whole program and make changes wherever 4096 is used. If instead we use the symbol
defined by EQU, we need not search the whole program but change only the value of MAXLEN
in the EQU statement (once).
Another common usage of the EQU statement is to define values for the general-purpose
registers. The assembler can use mnemonics for registers, like A for the accumulator, X for the
index register, and so on. But some instructions require numbers in place of names: for example,
RMO 0,1 instead of RMO A,X. The programmer can assign the numerical values to these
registers using the EQU directive.
A EQU 0
X EQU 1 and so on
These statements will cause the symbols A, X, L, ... to be entered into the symbol table with
their respective values, so an instruction such as RMO A,X would then be allowed. As another
usage, a machine may have many general-purpose registers named R1, R2, ..., of which some may
be used as base registers and some as accumulators, and their usage may change from one program
to another. In this case we can define these requirements using EQU statements.
BASE EQU R1
INDEX EQU R2
COUNT EQU R3
One restriction on the usage of EQU is that any symbol occurring on the right-hand side of
the EQU must be predefined. For example, the following sequence is not valid:
BETA EQU ALPHA
ALPHA RESW 1
Here the symbol ALPHA is used to define BETA before ALPHA itself is defined, so the value of ALPHA is not known.
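The EQU rule — the right-hand side must already be defined — can be sketched as follows (an illustration under our own naming; the operand here is either a decimal constant or a previously defined symbol):

```python
def process_equ(symtab, symbol, operand):
    """Define `symbol` as the value of `operand` (a constant or a prior symbol).

    Returns the assigned value, or None if the operand is an undefined
    forward reference (as with BETA EQU ALPHA in the text).
    """
    if operand.isdigit():
        value = int(operand)
    elif operand in symtab:
        value = symtab[operand]
    else:
        return None            # forward reference: flag an error
    symtab[symbol] = value
    return value

symtab = {}
process_equ(symtab, "MAXLEN", "4096")
ok = process_equ(symtab, "LIMIT", "MAXLEN")
bad = process_equ(symtab, "BETA", "ALPHA")   # ALPHA is not yet defined
```

MAXLEN and LIMIT succeed because their operands are resolvable at the point of definition; BETA fails because ALPHA is a forward reference.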
ORG STATEMENT
This directive can be used to indirectly assign values to symbols. The directive is usually
called ORG (for origin). Its general format is:
ORG value
where value is a constant or an expression involving constants and previously defined symbols.
When this statement is encountered during assembly of a program, the assembler resets its location
counter (LOCCTR) to the specified value. Since the values of symbols used as labels are taken from
LOCCTR, the ORG statement will affect the values of all labels defined until the next ORG is
encountered. ORG is used to control the assignment of storage in the object program.
Sometimes altering the location counter values in this way may result in incorrect assembly.
ORG can be useful in label definition. Suppose we need to define a symbol table with the following
structure:
SYMBOL 6 Bytes
VALUE 3 Bytes
FLAG 2 Bytes
The table looks like the one given below.
The SYMBOL field contains a 6-byte user-defined symbol; VALUE is a one-word (3-byte) representation of the
value assigned to the symbol; FLAG is a 2-byte field that specifies the symbol type and other information.
The space for the table can be reserved by the statement:
STAB RESB 1100
If we want to refer to the entries of the table using indexed addressing, place the offset value of the
desired entry from the beginning of the table in the index register. To refer to the fields SYMBOL,
VALUE, and FLAGS individually, we need to assign the values first as shown below:
SYMBOL EQU STAB
VALUE EQU STAB+6
FLAGS EQU STAB+9
To retrieve the VALUE field from the table indicated by register X, we can write a statement:
LDA VALUE, X
The same thing can also be done using ORG statement in the following way:
STAB RESB 1100
ORG STAB
SYMBOL RESB 6
VALUE RESW 1
FLAG RESB 2
ORG STAB+1100
The first statement allocates 1100 bytes of memory for the table and assigns the label STAB. In the second statement
the ORG directive resets the location counter to the value of STAB, so LOCCTR now points
to STAB. The next three lines assign appropriate memory offsets to the symbols SYMBOL, VALUE and
FLAG. The last ORG statement reinitializes LOCCTR to a
new value after skipping the required amount of memory for the table STAB (i.e., STAB+1100).
While using ORG, the symbol occurring in the statement should be predefined, as is required for EQU
statements. For example, consider the sequence of statements below:
ORG ALPHA
BYTE1 RESB 1
BYTE2 RESB 1
BYTE3 RESB 1
ORG
ALPHA RESB 1
The sequence could not be processed because the symbol used to assign the new location counter value is
not yet defined. In the first pass the assembler would not know what value to assign to ALPHA, so the
symbols on the following lines could not be entered into the symbol table either. This is a form
of the forward reference problem.
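The ORG sequence above can be sketched in Python as a toy location counter (the directive handling and helper names here are invented for illustration, not an actual assembler implementation):

```python
def assemble_directives(lines, start=0x1000):
    """Process (label, directive, operand) tuples; return the symbol table."""
    symtab = {}
    locctr = start
    for label, directive, operand in lines:
        if label:
            symtab[label] = locctr          # a label takes the current LOCCTR
        if directive == "RESB":
            locctr += operand               # reserve N bytes
        elif directive == "RESW":
            locctr += 3 * operand           # one SIC word = 3 bytes
        elif directive == "ORG":
            # reset LOCCTR; a symbolic operand must already be defined,
            # mirroring the forward-reference restriction described above
            locctr = symtab[operand] if isinstance(operand, str) else operand
    return symtab

stab_layout = [
    ("STAB",   "RESB", 1100),
    (None,     "ORG",  "STAB"),    # point LOCCTR back at the table start
    ("SYMBOL", "RESB", 6),
    ("VALUE",  "RESW", 1),
    ("FLAG",   "RESB", 2),
]
```

Running this on `stab_layout` gives SYMBOL, VALUE and FLAG the offsets 0, 6 and 9 from STAB, exactly the values the EQU version assigns.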
EXPRESSIONS:
Assembler language statements have used single terms (labels, literals, etc.) as instruction
operands. Most assemblers allow the use of expressions wherever such a single operand is permitted.
Each such expression must, of course, be evaluated by the assembler to produce a single operand
address or value. Assemblers generally allow arithmetic expressions formed according to the normal
rules using the operators +, -, *, and /. Division is usually defined to produce an integer result. Individual
terms in the expression may be constants, user-defined symbols, or special terms. The most common
such special term is the current value of the location counter (often designated by *). This term
represents the value of the next unassigned memory location.
106 BUFEND EQU *
gives BUFEND a value that is the address of the next byte after the buffer area.
Earlier we discussed the problem of program relocation. We saw that some values in the object
program are relative to the beginning of the program, while others are absolute (independent of
program location). Similarly, the values of terms and expressions are either relative or absolute. A
constant is, of course, an absolute term. Labels on instructions and data areas, and references to the
location counter value, are relative terms.
A symbol whose value is given by EQU (or some similar assembler directive) may be either
an absolute term or a relative term depending upon the expression used to define its value.
Expressions are classified as either absolute expressions or relative expressions depending upon the
type of value they produce. An expression that contains only absolute terms is, of course, an absolute
expression. However, absolute expressions may also contain relative terms provided the relative
terms occur in pairs and the terms in each such pair have opposite signs. It is not necessary that the
paired terms be adjacent to each other in the expression; however, all relative terms must be capable
of being paired in this way. None of the relative terms may enter into a multiplication or division
operation.
A relative expression is one in which all of the relative terms except one can be paired as
described above; the remaining unpaired relative term must have a positive sign. As before, no
relative term may enter into a multiplication or division operation. Expressions that do not meet the
conditions given for either absolute or relative expressions should be flagged by the assembler as
errors. Although the rules given above may seem arbitrary, they are actually quite reasonable. The
expressions that are legal under these definitions include exactly those expressions whose value
remains meaningful when the program is relocated.
A relative term or expression represents some value that may be written as S + r, where S is the
starting address of the program and r is the value of the term or expression relative to the starting
address. Thus a relative term usually represents some location within the program. When relative
terms are paired with opposite signs, the dependency on the program starting address is cancelled out;
the result is an absolute value. In the statement
107 MAXLEN EQU BUFEND-BUFFER
Both BUFEND and BUFFER are relative terms, each representing an address within the program.
However, the expression represents an absolute value: the difference between the two addresses,
which is the length of the buffer area in bytes.
Expressions such as BUFEND + BUFFER, 100 - BUFFER, or 3 * BUFFER represent neither
absolute values nor locations within the program. The values of these expressions depend upon the
program starting address in a way that is unrelated to anything within the program itself. Because
such expressions are very unlikely to be of any use, they are considered errors.
To determine the type of an expression, we must keep track of the types of all symbols
defined in the program. For this purpose we need a flag in the symbol table to indicate the type of value
(absolute or relative) in addition to the value itself. Thus for the program, some of the symbol table
entries might be
With this information the assembler can easily determine the type of each expression used as
an operand and generate Modification records in the object program for relative values. Later
we consider programs that consist of several parts that can be relocated independently of each
other. As discussed in that section, our rules for determining the type of an expression must be
modified in such instances.
Assemblers also allow the use of expressions in place of operands in the instruction. Each
such expression must be evaluated to generate a single operand value or address. Assemblers
generally allow arithmetic expressions formed according to the normal rules using the arithmetic operators +, -,
*, /. Division is usually defined to produce an integer result. Individual terms may be constants, user-
defined symbols, or special terms. The only special term used is * (the current value of the location
counter), which indicates the value of the next unassigned memory location. Thus the statement
BUFEND EQU *
assigns a value to BUFEND, which is the address of the next byte following the buffer
area. Some values in the object program are relative to the beginning of the program and some are
absolute (independent of the program location, like constants). Hence, expressions are classified as
either absolute expression or relative expressions depending on the type of value
they produce.
Absolute Expressions: An expression that uses only absolute terms is an absolute expression.
An absolute expression may also contain relative terms, provided the relative terms occur in pairs with
opposite signs in each pair. Example:
MAXLEN EQU BUFEND-BUFFER
In the above instruction the difference in the expression gives a value that does not depend on
the location of the program and hence gives an absolute value, immaterial to the relocation of the
program. The expression can also consist of only absolute terms. Example:
MAXLEN EQU 1000
Relative Expressions:
All the relative terms except one can be paired as described in “absolute”. The remaining
unpaired relative term must have a positive sign.
Example:
STAB EQU OPTAB + (BUFEND - BUFFER)
Handling the type of expressions: to find the type of an expression, we must keep track of the types
of the symbols used. This can be achieved by recording the type in the symbol table against each
symbol, as shown in the table below:
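The pairing rule can be sketched in Python (the symbol kinds "A"/"R" for absolute/relative and all names here are invented for illustration; a real assembler would also reject relative terms in multiplication or division):

```python
def classify(terms, symtab):
    """terms: list of (sign, name-or-int); symtab maps name -> (value, kind).

    Relative terms must cancel in +/- pairs; exactly one unpaired positive
    relative term makes the whole expression relative.
    """
    rel = 0
    for sign, t in terms:
        kind = "A" if isinstance(t, int) else symtab[t][1]
        if kind == "R":
            rel += 1 if sign == "+" else -1
    if rel == 0:
        return "absolute"
    if rel == 1:
        return "relative"
    return "error"          # flagged as an error by the assembler

# Example entries (hypothetical values, all relative symbols):
symtab = {"BUFEND": (0x1063, "R"), "BUFFER": (0x1003, "R"),
          "OPTAB": (0x1100, "R")}
```

With this table, BUFEND-BUFFER classifies as absolute, OPTAB+(BUFEND-BUFFER) as relative, and BUFEND+BUFFER as an error, matching the rules above.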
PROGRAM BLOCKS
Program blocks allow the generated machine instructions and data to appear in the object
program in a different order, by separating blocks for storing code, data, stack, and larger data
blocks.
Assembler Directive USE:
USE [blockname]
At the beginning, statements are assumed to be part of the unnamed (default) block. If no
USE statements are included, the entire program belongs to this single block. Each program block
may actually contain several separate segments of the source program.
The assembler rearranges these segments to gather together the pieces of each block and assigns
addresses, separating the program into blocks in a particular order. The large buffer area is moved to the end
of the object program. Program readability is better if data areas are placed in the source program
close to the statements that reference them.
The assembler directive USE indicates which portions of the source program belong to the
various blocks. At the beginning of the program, statements are assumed to be part of the unnamed
(default) block; if no USE statements are included, the entire program belongs to this single block.
The USE statement on line 92 signals the beginning of the block named CDATA. Source
statements are associated with this block until the USE statement on line 103, which begins the
block named CBLKS. The USE statement may also indicate a continuation of a previously begun
block. Thus the statement on line 123 resumes the default block, and the statement on line 183
resumes the block named CDATA.
As we can see, each program block may actually contain several separate segments of the
source program. The assembler will (logically) rearrange these segments to gather together the
pieces of each block. These blocks will then be assigned addresses in the object program, with the
blocks appearing in the same order in which they were first begun in the source program. The result
is the same as if the programmer had physically rearranged the source statements to group together
all the source lines belonging to each block.
The assembler accomplishes this logical rearrangement of code by maintaining, during Pass
1, a separate location counter for each program block. The location counter for a block is initialized
to 0 when the block is first begun. The current value of this location counter is saved when switching
to another block and the saved value is restored when resuming a previous block. Thus during Pass 1
each label in the program is assigned an address that is relative to the start of the block that contains
it.
When labels are entered into the symbol table, the block name or number is stored along with
the assigned relative address. At the end of Pass 1 the latest value of the location counter for each
block indicates the length of that block. The assembler can then assign to each block a starting
address in the object program (beginning with relative location 0). For code generation during Pass2,
the assembler needs the address for each symbol relative to the start of the object program (not the
start of an individual program block).
Example of program with multiple blocks
This is easily found from the information in SYMTAB. The assembler simply adds the
location of the symbol, relative to the start of its block, to the assigned block starting address. The
figure demonstrates this process applied to our sample program. The column headed Loc/Block shows the
relative address (within a program block) assigned to each source line and a block number indicating
which program block is involved (0 = default block, 1 = CDATA, 2 = CBLKS). This is essentially
the same information that is stored in SYMTAB for each symbol. Notice that the value of the
symbol MAXLEN (line 107) is shown without a block number.
This indicates that MAXLEN is an absolute symbol, whose value is not relative to the start of
any program block. At the end of Pass 1 the assembler constructs a table that contains the starting
addresses and lengths for all blocks. For our sample program, this table looks like
Now consider the instruction
20 0006 LDA LENGTH 032060
SYMTAB shows the value of the operand (the symbol LENGTH) as relative location 0003 within
program block 1 (CDATA). The starting address for CDATA is 0066. Thus the desired target address
for this instruction is 0003 + 0066 = 0069. The instruction is to be assembled using program-counter
relative addressing. When the instruction is executed, the program counter contains the address of
the following instruction (line 25). The address of this instruction is relative location 0009 within
the default block. Since the default block starts at location 0000, this address is simply 0009. Thus
the required displacement is 0069 - 0009 = 0060. The calculation of the other addresses during Pass 2
follows a similar pattern.
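The arithmetic can be checked directly, using the values quoted for the sample program (the CDATA starting address of 0066 comes from the block table, which is assumed here since the figure is not reproduced):

```python
cdata_start = 0x0066     # starting address assigned to block 1 (CDATA)
length_rel  = 0x0003     # address of LENGTH relative to the start of CDATA
target = cdata_start + length_rel   # desired target address

pc = 0x0000 + 0x0009     # default block start + relative address of line 25
disp = target - pc       # program-counter relative displacement
```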
Fig: Program blocks trace through the assembly and loading processes
In the example below three blocks are used:
Default: executable instructions
CDATA: all data areas that are smaller in length
CBLKS: all data areas that consist of larger blocks of memory
Example Code
Arranging code into program blocks:
Pass 1
•A separate location counter for each program block is maintained.
•Save and restore LOCCTR when switching between blocks.
•At the beginning of a block, LOCCTR is set to 0.
•Assign each label an address relative to the start of the block.
•Store the block name or number in the SYMTAB along with the assigned relative address of the label.
•Indicate the block length as the latest value of LOCCTR for each block at the end of Pass 1.
•Assign to each block a starting address in the object program by concatenating the program blocks in a particular order.
Pass 2
•Calculate the address for each symbol relative to the start of the object program by adding
the location of the symbol relative to the start of its block and the starting address of this block.
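The two passes can be sketched in Python; the block lengths used here follow the sample COPY program layout (an assumption about the figure not reproduced in the text), and the helper names are invented:

```python
def assign_block_addresses(block_lengths, order):
    """End of Pass 1: concatenate blocks in order; return each block's start."""
    starts, addr = {}, 0
    for name in order:
        starts[name] = addr
        addr += block_lengths[name]
    return starts

def final_address(rel_addr, block, starts):
    """Pass 2: block starting address + address relative to the block start."""
    return starts[block] + rel_addr

# Block lengths as in the sample program (default, CDATA, CBLKS):
starts = assign_block_addresses(
    {"default": 0x0066, "CDATA": 0x000B, "CBLKS": 0x1000},
    ["default", "CDATA", "CBLKS"])
```

With these lengths, CDATA starts at 0066 and CBLKS at 0071, and LENGTH (relative 0003 in CDATA) gets the final address 0069 used in the displacement calculation above.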
CONTROL SECTIONS AND PROGRAM LINKING
A control section is a part of the program that maintains its identity after assembly; each
control section can be loaded and relocated independently of the others. Different control sections
are most often used for subroutines or other logical subdivisions. The programmer can assemble,
load, and manipulate each of these control sections separately.
Because of this, there should be some means for linking control sections together. For
example, instructions in one control section may refer to the data or instructions of other control
sections. Since control sections are independently loaded and relocated, the assembler is unable to
process these references in the usual way. Such references between different control sections are
called external references.
The assembler generates the information about each of the external references that will allow
the loader to perform the required linking. When a program is written using multiple control
sections, the beginning of each control section is indicated by the assembler directive CSECT:
secname CSECT
The assembler maintains a separate location counter for each control section.
Control sections differ from program blocks in that they are handled separately by the
assembler. Symbols that are defined in one control section may not be used directly in another control
section; they must be identified as external references for the loader to handle. The external
references are indicated by two assembler directives:
EXTDEF (external definition):
The EXTDEF statement in a control section names symbols that are defined in this section but may
be used by other control sections. Control section names do not need to be named in the EXTDEF, as
they are automatically considered external symbols.
EXTREF (external Reference):
It names symbols that are used in this section but are defined in some other control section.
The order in which these symbols are listed is not significant. The assembler must include proper
information about the external references in the object program that will cause the loader to insert
the proper value where they are required.
Handling External Reference
Case 1
15 0003 CLOOP +JSUB RDREC 4B100000
The operand RDREC is an external reference.
The assembler has no idea where RDREC is, so it inserts an address of zero. It can only use extended
format to provide enough room for the actual address (that is, program-counter relative addressing is
invalid for an external reference).
The assembler generates information for each external reference that will allow the loader to
perform the required linking.
Case 2
190 0028 MAXLEN WORD BUFEND-BUFFER 000000
There are two external references in the expression, BUFEND and BUFFER.
The assembler inserts a value of zero and passes information to the loader to:
Add to this data area the address of BUFEND
Subtract from this data area the address of BUFFER
Case 3
On line 107, BUFEND and BUFFER are defined in the same control section and the
expression can be calculated immediately.
107 1000 MAXLEN EQU BUFEND-BUFFER
Object Code for the example program:
The assembler must also include information in the object program that will cause the loader to
insert the proper values where they are required. The assembler maintains two new records in the
object code and a changed version of the modification record.
Define record (EXTDEF)
Col. 1 D
Col. 2-7 Name of external symbol defined in this control section
Col. 8-13 Relative address within this control section (hexadecimal)
Col.14-73 Repeat information in Col. 2-13 for other external symbols
Refer record (EXTREF)
Col. 1 R
Col. 2-7 Name of external symbol referred to in this control section
Col. 8-73 Name of other external reference symbols
Modification record
Col. 1 M
Col. 2-7 Starting address of the field to be modified (hexadecimal)
Col. 8-9 Length of the field to be modified, in half-bytes (hexadecimal)
Col. 10 Modification flag (+ or -)
Col.11-16 External symbol whose value is to be added to or subtracted from the indicated field
A define record gives information about the external symbols that are defined in this control
section, i.e., symbols named by EXTDEF. A refer record lists the symbols that are used as external
references by the control section, i.e., symbols named by EXTREF. The new items in the
modification record specify the modification to be performed: adding or subtracting the value of
some external symbol. The symbol used for modification may be defined either in this control
section or in another section.
The object program is shown below. There is a separate object program for each of the
control sections. In the Define Record and refer record the symbols named in EXTDEF and
EXTREF are included. In the case of Define, the record also indicates the relative address of each
external symbol within the control section. For EXTREF symbols, no address information is
available. These symbols are simply named in the Refer record.
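The record emission can be sketched in Python following the column layouts above (the function names are invented; a real assembler would also split records that exceed the column limits):

```python
def define_record(extdef):
    """extdef: list of (symbol, relative address) pairs named by EXTDEF."""
    # 'D', then each symbol padded to 6 columns and its 6-digit hex address
    return "D" + "".join(f"{sym:<6}{addr:06X}" for sym, addr in extdef)

def refer_record(extref):
    """extref: list of symbols named by EXTREF (no address information)."""
    return "R" + "".join(f"{sym:<6}" for sym in extref)

def modification_record(addr, halfbytes, sign, symbol):
    """M record: field address, length in half-bytes, +/- flag, symbol."""
    return f"M{addr:06X}{halfbytes:02X}{sign}{symbol:<6}"
```

For example, a 5-half-byte field at address 000004 to which the value of RDREC must be added yields the record M00000405+RDREC.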
ASSEMBLER DESIGN OPTIONS
The existence of multiple control sections that can be relocated independently of one another
makes the handling of expressions complicated. It is required that in an expression all the
relative terms be paired (for an absolute expression), or that all except one be paired (for relative
expressions). In a program having multiple control sections we have an extended
restriction: both terms in each pair of an expression must be within the same control
section. If two terms represent relative locations within the same control section, their difference is
an absolute value (regardless of where the control section is located).
Legal: BUFEND-BUFFER (both are in the same control section). If the terms are located in different
control sections, their difference has a value that is unpredictable.
Illegal: RDREC-COPY (the two are in different control sections); it is the difference in the load
addresses of the two control sections. This value depends on the way run-time storage is allocated; it
is unlikely to be of any use.
How to enforce this restriction
When an expression involves external references, the assembler cannot determine whether or
not the expression is legal. The assembler evaluates all of the terms it can, combines these to form an
initial expression value, and generates Modification records. The loader checks the expression for
errors and finishes the evaluation.
One-Pass Assembler
The main problem in designing a one-pass assembler is resolving forward references. Forward
references to data items can be avoided to some extent by defining all the storage reservation
statements at the beginning of the program rather than at the end. Unfortunately, forward references
to labels on instructions (forward jumps) cannot be avoided, so one-pass assemblers typically handle
the problem by prohibiting forward references to data items only. There are two types of one-pass
assemblers: one produces object code directly in memory for immediate execution (load-and-go
assemblers); the other produces the usual kind of object code for later execution.
Load-and-Go Assembler
A load-and-go assembler generates its object code in memory for immediate execution. No
object program is written out, and no loader is needed. It is useful in a system with frequent program
development and testing, where the efficiency of the assembly process is an important consideration.
Forward Reference in One-Pass Assemblers: In a load-and-go assembler, when a forward reference is
encountered, the assembler:
Omits the operand address if the symbol has not yet been defined
Enters this undefined symbol into SYMTAB and indicates that it is undefined
Adds the address of this operand address to a list of forward references associated with the
SYMTAB entry
When the definition for the symbol is encountered, scans the reference list and inserts the
address.
At the end of the program, reports an error if there are still SYMTAB entries indicating
undefined symbols.
For a load-and-go assembler, the assembler searches SYMTAB for the symbol named in the END
statement and jumps to this location to begin execution if there is no error.
After scanning line 40 of the program:
40 2021 J CLOOP 302012
The status is that up to this point the symbol RDREC has been referred to once at location 2013, ENDFIL at
201F and WRREC at location 201C. None of these symbols is defined. The figure shows how
the pending definitions along with their addresses are included in the symbol table.
The status after scanning line 160, which has encountered the definition of RDREC and ENDFIL is
as given below:
If One-Pass needs to generate object code:
If the operand contains an undefined symbol, use 0 as the address and write the Text
record to the object program.
Forward references are entered into lists as in the load-and-go assembler.
When the definition of a symbol is encountered, the assembler generates another Text
record with the correct operand address of each entry in the reference list.
When loaded, the incorrect address 0 will be updated by the later Text record containing
the symbol definition.
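The forward-reference bookkeeping described above can be sketched in Python: undefined symbols carry a list of operand addresses to patch once the definition is found (class and method names here are invented for illustration):

```python
class Symtab:
    def __init__(self):
        self.entries = {}   # name -> {"value": int|None, "fixups": [addr]}

    def reference(self, name, operand_addr):
        """Look up a symbol used as an operand at operand_addr."""
        e = self.entries.setdefault(name, {"value": None, "fixups": []})
        if e["value"] is not None:
            return e["value"]            # already defined: use it directly
        e["fixups"].append(operand_addr) # remember where to patch later
        return 0                         # operand address omitted (0) for now

    def define(self, name, value, memory):
        """Define a symbol; patch every pending reference in memory."""
        e = self.entries.setdefault(name, {"value": None, "fixups": []})
        e["value"] = value
        for addr in e["fixups"]:
            memory[addr] = value
        e["fixups"].clear()

    def undefined(self):
        """Symbols still undefined (an error at the end of the program)."""
        return [n for n, e in self.entries.items() if e["value"] is None]
```

In a load-and-go assembler `memory` is the object code being built in place; an object-code-producing one-pass assembler would instead emit an extra Text record per fixup.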
Object Code Generated by One-Pass Assembler.
Multi-Pass Assembler:
For a two-pass assembler, forward references in symbol definitions are not allowed:
ALPHA EQU BETA
BETA EQU DELTA
DELTA RESW 1
Symbol definition must be completed in Pass 1. Prohibiting forward references in symbol
definitions is not a serious inconvenience; such references tend to create difficulty for a person
reading the program.
Implementation Issues for Modified Two-Pass Assembler:
When forward referencing is encountered in symbol-defining statements:
For a forward reference in a symbol definition, we store in the SYMTAB: the symbol name,
the defining expression, the number of undefined symbols in the defining expression, and the
undefined symbol (marked with a flag *) associated with a list of symbols that depend on this
undefined symbol. When a symbol is defined, we can recursively evaluate the symbol
expressions depending on the newly defined symbol.
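The recursive evaluation can be sketched in Python for the ALPHA/BETA/DELTA example above (the data layout and function names are invented; the value 0x2000 for DELTA is a hypothetical address):

```python
def eval_expr(expr, table):
    """expr: list of (sign, term). Returns (value, first undefined term)."""
    total = 0
    for sign, t in expr:
        if isinstance(t, str) and t not in table:
            return None, t               # still contains an undefined symbol
        v = t if isinstance(t, int) else table[t]
        total += v if sign == "+" else -v
    return total, None

def define(name, value, table, deps):
    """Define a symbol, then recursively evaluate expressions waiting on it."""
    table[name] = value
    for dependent, expr in deps.pop(name, []):
        v, missing = eval_expr(expr, table)
        if missing:                      # re-queue under the missing symbol
            deps.setdefault(missing, []).append((dependent, expr))
        else:
            define(dependent, v, table, deps)

# ALPHA EQU BETA and BETA EQU DELTA are pending until DELTA is defined:
table, deps = {}, {}
deps["BETA"]  = [("ALPHA", [("+", "BETA")])]
deps["DELTA"] = [("BETA",  [("+", "DELTA")])]
define("DELTA", 0x2000, table, deps)     # DELTA RESW 1 finally defines it
```

Defining DELTA triggers the evaluation of BETA, which in turn triggers ALPHA, emptying the dependency lists.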
LOADERS AND LINKERS
Introduction
The source program written in assembly language or high-level language will be converted
to an object program, which is in machine language form, for execution. This output, whether from an
assembler or from a compiler, contains translated instructions and data values from the source
program, and specifies addresses in primary memory where these items are to be loaded for execution.
This contains the following three processes, and they are,
Loading - which allocates memory location and brings the object program into memory for
execution - (Loader)
Linking- which combines two or more separate object programs and supplies the
information needed to allow references between them - (Linker)
Relocation - Which modifies the object program so that it can be loaded at an address
different from the location originally specified - (Linking Loader)
Linker:
High-level languages come with built-in header files or libraries. These libraries are
predefined and contain basic functions which are essential for executing the program. These
functions are linked to the libraries by a program called the linker. If the linker does not find the
library for a function, it informs the compiler, and the compiler generates an error. The compiler
automatically invokes the linker as the last step in compiling a program.
Besides built-in libraries, the linker also links user-defined functions to user-defined
libraries. Usually a longer program is divided into smaller subprograms called modules. These
modules must be combined to execute the program. The process of combining the modules is done
by the linker.
Loader:
A loader is a program that loads the machine code of a program into the system memory.
In computing, a loader is the part of an operating system that is responsible for loading programs.
It is one of the essential stages in the process of starting a program, because it places programs into
memory and prepares them for execution. Loading a program involves reading the contents
of the executable file into memory. Once loading is complete, the operating system starts the program
by passing control to the loaded program code. All operating systems that support program loading
have loaders. In many operating systems the loader is permanently resident in memory.
BASIC LOADER FUNCTIONS
A loader is a system program that performs the loading function. It brings the object program
into memory and starts its execution. The translator may be an assembler or a compiler, which
generates the object program that becomes input to the loader and is later loaded into memory for
execution.
Type of Loaders
The different types of loaders are, absolute loader, bootstrap loader, relocating loader
(relative loader), and, direct linking loader.
ABSOLUTE LOADER
The operation of an absolute loader is very simple. The object code is loaded to specified
locations in the memory. At the end, the loader jumps to the specified address to begin execution of
the loaded program. The advantage of the absolute loader is that it is simple and efficient. The
disadvantages are the need for the programmer to specify the actual load address, and the difficulty
of using subroutine libraries.
Fig : program loaded in memory (memory address, content)
The algorithm for this type of loader is given here. The object program, and the object
program loaded into memory by the absolute loader, are also shown. Each byte of assembled code is
given using its hexadecimal representation in character form, which is easy for human beings to read.
In practice, each byte of object code is stored as a single byte. Most machines store object programs
in a binary form, and we must be sure that our file and device conventions do not cause some of
the program bytes to be interpreted as control characters.
Begin
read Header record
verify program name and length
read first Text record
while record type <> ‘E’ do
begin
{if object code is in character form, convert into internal representation}
move object code to specified location in memory
read next object program record
end
jump to address specified in End record
end
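The algorithm above can be sketched in Python; the record layouts follow the Header/Text/End column formats used elsewhere in these notes, and the sample records in the usage note are illustrative values, not taken from a specific figure:

```python
def absolute_load(records, memory):
    """Load H/T/E object records (hex character form) into memory.

    Returns the address specified in the End record, to which the loader
    would jump to begin execution.
    """
    header = records[0]
    assert header[0] == "H"              # verify program name and length
    for rec in records[1:]:
        if rec[0] == "E":
            return int(rec[1:7], 16)     # jump address from the End record
        if rec[0] == "T":
            addr = int(rec[1:7], 16)     # cols 2-7: starting address
            code = bytes.fromhex(rec[9:])  # convert char form to bytes
            for i, b in enumerate(code):
                memory[addr + i] = b     # move object code into place
```

For example, loading the records `HCOPY  00100000107A`, `T00100009141033482039001036`, `E001000` places nine bytes starting at address 1000 and returns 1000 as the execution address.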
A SIMPLE BOOTSTRAP LOADER
When a computer is first turned on or restarted, a special type of absolute loader, called a
bootstrap loader, is executed. This bootstrap loads the first program to be run by the computer, usually
an operating system. The bootstrap itself begins at address 0. It loads the OS starting at address 0x80.
There is no header record or control information; the object code is consecutive bytes of memory.
The algorithm for the bootstrap loader is as follows
Begin
X ← 0x80 (the address of the next memory location to be loaded)
Loop
A ← GETC (and convert it from the ASCII character
code to the value of the hexadecimal digit)
save the value in the high-order 4 bits of S
A ← GETC
combine the value to form one byte: A ← (A+S)
store the value (in A) at the address in register X
X ← X+1
End
It uses a subroutine GETC, which is:
GETC A ← read one character
if A=0x04 then jump to 0x80
if A<48 then GETC
A ← A-48 (0x30)
if A<10 then return
A ← A-7
return
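The conversion performed by the bootstrap loop and GETC can be sketched in Python (function names are invented; control characters and the 0x04 end-of-input check are omitted for brevity):

```python
def hex_digit(ch):
    """GETC's conversion: one ASCII hex character to its 4-bit value."""
    a = ord(ch)
    a -= 48          # '0'..'9' (ASCII 48..57) -> 0..9, i.e. subtract 0x30
    if a < 10:
        return a
    return a - 7     # 'A'..'F' (ASCII 65..70) -> 10..15

def load_bytes(chars, start=0x80):
    """The bootstrap loop: pair hex characters into bytes at rising addresses."""
    memory = {}
    x = start                               # next location to be loaded
    for i in range(0, len(chars), 2):
        s = hex_digit(chars[i]) << 4        # first digit: high-order 4 bits
        memory[x] = s + hex_digit(chars[i + 1])  # combine into one byte
        x += 1
    return memory
```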
MACHINE-DEPENDENT LOADER FEATURES
The absolute loader is simple and efficient, but the scheme has potential disadvantages. One of
the biggest is that the programmer has to specify the actual starting address from where the
program is to be loaded. This does not create difficulty if only one program is to run, but it does for
several programs. Further, it is difficult to use subroutine libraries efficiently. This requires the design
and implementation of a more complex loader. The loader must provide program relocation and
linking, as well as simple loading functions.
RELOCATION
The concept of program relocation is the execution of the object program using any part of
the available and sufficient memory. The object program is loaded into memory wherever there is
room for it. The actual starting address of the object program is not known until load time.
Relocation provides the efficient sharing of the machine with larger memory and when several
independent programs are to be run together. It also supports the use of subroutine libraries
efficiently. Loaders that allow for program relocation are called relocating loaders or relative
loaders.
Methods for specifying relocation
Use of modification records and use of relocation bits are the two methods available for specifying
relocation. In the first case, a modification record M is used in the object program
to specify any relocation. In the second, each instruction is associated with one
relocation bit, and the relocation bits in a Text record are gathered into bit masks.
System Software And Operating System
51
Modification records are used for complex machines; this scheme is also called Relocation and Linkage Directory (RLD) specification. The format of the Modification (M) record is as follows.
Modification record
col 1:     M
col 2-7:   relocation address
col 8-9:   length (in half-bytes)
col 10:    flag (+/-)
col 11-17: segment name
The relocation-bit method is used for simple machines. A relocation bit of 0 means no modification is necessary; a bit of 1 means modification is needed. The bits are placed in columns 10-12 of the Text (T) record; the format of the Text record with relocation bits is as follows.
Text record
col 1:     T
col 2-7:   starting address
col 8-9:   length (in bytes)
col 10-12: relocation bits
col 13-72: object code
A twelve-bit mask is used in each Text record (columns 10-12 hold the relocation bits). Since each Text record contains fewer than 12 words, unused bits are set to 0, and any value that is to be modified during relocation must coincide with one of these 3-byte (word) segments. For an absolute loader there are no relocation bits, and columns 10-69 contain object code. In an object program relocated by bit mask, FFC (1111 1111 1100) means that all ten words of that record are to be modified, and E00 (1110 0000 0000) means that the first three words are to be modified.
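The bit-mask scheme can be sketched as follows. The function name and list-based representation of the record's words are illustrative assumptions; the mask interpretation (leftmost bit corresponds to the first word, and a set bit means "add the load address") follows the description above.

```python
def relocate_text_record(words, mask_hex, prog_addr):
    """Apply a 12-bit relocation mask to one Text record.

    words     - list of 3-byte word values from the Text record
    mask_hex  - the 3 hex digits from columns 10-12, e.g. "FFC" or "E00"
    prog_addr - actual starting (load) address assigned at load time
    """
    mask = int(mask_hex, 16)
    out = []
    for i, w in enumerate(words):
        bit = (mask >> (11 - i)) & 1                 # leftmost bit = first word
        out.append((w + prog_addr) & 0xFFFFFF if bit else w)
    return out

# E00 = 1110 0000 0000: only the first three words are modified.
relocated = relocate_text_record([0x000000, 0x000001, 0x000002, 0x000003],
                                 "E00", 0x4000)
```

With mask E00, only the first three words get the load address 0x4000 added; the fourth word is left unchanged.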
PROGRAM LINKING
The goal of program linking is to resolve external references (EXTREF) and external definitions (EXTDEF) among different control sections.
EXTDEF (external definition) - the EXTDEF statement in a control section names symbols, called external symbols, that are defined in this (present) control section and may be used by other sections.
ex: EXTDEF BUFFER, BUFFEND, LENGTH
EXTDEF LISTA, ENDA
EXTREF (external reference) - the EXTREF statement names symbols that are used in this (present) control section and defined elsewhere.
ex: EXTREF RDREC, WRREC
EXTREF LISTB, ENDB, LISTC, ENDC
How to implement EXTDEF and EXTREF
The assembler must include information in the object program that will cause the loader to insert the proper values where they are required, in the form of Define (D) records and Refer (R) records.
Define record
The format of the Define record (D) along with examples is as shown here.
Col. 1 D
Col. 2-7 Name of external symbol defined in this control section
Col. 8-13 Relative address within this control section (hexadecimal)
Col.14-73 Repeat information in Col. 2-13 for other external symbols
Example records
D LISTA 000040 ENDA 000054
D LISTB 000060 ENDB 000070
Refer record
The format of the Refer record (R) along with examples is as shown here.
Col. 1 R
Col. 2-7 Name of external symbol referred to in this control section
Col. 8-73 Name of other external reference symbols
Example records
R LISTB ENDB LISTC ENDC
R LISTA ENDA LISTC ENDC
R LISTA ENDA LISTB ENDB
Here are three programs, named PROGA, PROGB and PROGC, which are separately assembled and each of which consists of a single control section. LISTA and ENDA in PROGA, LISTB and ENDB in PROGB, and LISTC and ENDC in PROGC are the external definitions in each control section. Similarly, LISTB, ENDB, LISTC, ENDC in PROGA; LISTA, ENDA, LISTC, ENDC in PROGB; and LISTA, ENDA, LISTB, ENDB in PROGC are the external references. These sample programs are used to illustrate linking and relocation. The listings below give the sample programs and their corresponding object programs; observe that the object programs contain D and R records along with the other records.
0000 PROGA START 0
EXTDEF LISTA, ENDA
EXTREF LISTB, ENDB, LISTC, ENDC
………..
……….
0020 REF1 LDA LISTA 03201D
0023 REF2 +LDT LISTB+4 77100004
0027 REF3 LDX #ENDA-LISTA 050014
. .
0040 LISTA EQU *
0054 ENDA EQU *
0054 REF4 WORD ENDA-LISTA+LISTC 000014
0057 REF5 WORD ENDC-LISTC-10 FFFFF6
005A REF6 WORD ENDC-LISTC+LISTA-1 00003F
005D REF7 WORD ENDA-LISTA-(ENDB-LISTB) 000014
0060 REF8 WORD LISTB-LISTA FFFFC0
END REF1
0000 PROGB START 0
EXTDEF LISTB, ENDB
EXTREF LISTA, ENDA, LISTC, ENDC
………..
……….
0036 REF1 +LDA LISTA 03100000
003A REF2 LDT LISTB+4 772027
003D REF3 +LDX #ENDA-LISTA 05100000
. .
0060 LISTB EQU *
0070 ENDB EQU *
0070 REF4 WORD ENDA-LISTA+LISTC 000000
0073 REF5 WORD ENDC-LISTC-10 FFFFF6
0076 REF6 WORD ENDC-LISTC+LISTA-1 FFFFFF
0079 REF7 WORD ENDA-LISTA-(ENDB-LISTB) FFFFF0
007C REF8 WORD LISTB-LISTA 000060
END
0000 PROGC START 0
EXTDEF LISTC, ENDC
EXTREF LISTA, ENDA, LISTB, ENDB
………..
………..
0018 REF1 +LDA LISTA 03100000
001C REF2 +LDT LISTB+4 77100004
0020 REF3 +LDX #ENDA-LISTA 05100000
. .
0030 LISTC EQU *
0042 ENDC EQU *
0042 REF4 WORD ENDA-LISTA+LISTC 000030
0045 REF5 WORD ENDC-LISTC-10 000008
0048 REF6 WORD ENDC-LISTC+LISTA-1 000011
004B REF7 WORD ENDA-LISTA-(ENDB-LISTB) 000000
004E REF8 WORD LISTB-LISTA 000000
END
H PROGA 000000 000063
D LISTA 000040 ENDA 000054
R LISTB ENDB LISTC ENDC
. .
T 000020 0A 03201D 77100004 050014
. .
T 000054 0F 000014 FFFFF6 00003F 000014 FFFFC0
M 000024 05 +LISTB
M 000054 06 +LISTC
M 000057 06 +ENDC
M 000057 06 -LISTC
M 00005A 06 +ENDC
M 00005A 06 -LISTC
M 00005A 06 +PROGA
M 00005D 06 -ENDB
M 00005D 06 +LISTB
M 000060 06 +LISTB
M 000060 06 -PROGA
E 000020
H PROGB 000000 00007F
D LISTB 000060 ENDB 000070
R LISTA ENDA LISTC ENDC
.
T 000036 0B 03100000 772027 05100000
.
T 000070 0F 000000 FFFFF6 FFFFFF FFFFF0 000060
M 000037 05 +LISTA
M 00003E 06 +ENDA
M 00003E 06 -LISTA
M 000070 06 +ENDA
M 000070 06 -LISTA
M 000070 06 +LISTC
M 000073 06 +ENDC
M 000073 06 -LISTC
M 000076 06 +ENDC
M 000076 06 -LISTC
M 000076 06 +LISTA
M 000079 06 +ENDA
M 000079 06 -LISTA
M 00007C 06 +PROGB
M 00007C 06 -LISTA
E
H PROGC 000000 000051
D LISTC 000030 ENDC 000042
R LISTA ENDA LISTB ENDB
.
T 000018 0C 03100000 77100004 05100000
.
T 000042 0F 000030 000008 000011 000000 000000
M 000019 05 +LISTA
M 00001D 06 +LISTB
M 000021 06 +ENDA
M 000021 06 -LISTA
M 000042 06 +ENDA
M 000042 06 -LISTA
M 000042 06 +PROGC
M 000048 06 +LISTA
M 00004B 06 +ENDA
M 00004B 06 -LISTA
M 00004B 06 -ENDB
M 00004B 06 +LISTB
M 00004E 06 +LISTB
M 00004E 06 -LISTA
E
These three programs, as they might appear in memory after loading and linking: PROGA has been loaded starting at address 4000, with PROGB and PROGC immediately following. For example, the value for REF4 in PROGA is located at address 4054 (the beginning address of PROGA plus 0054, the relative address of REF4 within PROGA). This value is computed as follows.
The initial value from the Text record T0000540F000014FFFFF600003F000014FFFFC0 is
000014. To this is added the address assigned to LISTC, which is 4112 (the beginning address of
PROGC plus 30). The result is 004126. That is REF4 in PROGA is ENDA-LISTA+LISTC=4054-
4040+4112=4126. Similarly the load address for symbols LISTA: PROGA+0040=4040, LISTB:
PROGB+0060=40C3 and LISTC: PROGC+0030=4112
Keeping these details in mind, work through the other references; the values of these references are the same in each of the three programs.
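The arithmetic above can be checked directly. The load addresses are the ones from the example (PROGA at 4000, PROGB at 4063, PROGC at 40E2, all hexadecimal); the variable names are just labels for this check.

```python
# Load addresses from the example, and the symbol addresses derived from them.
PROGA, PROGB, PROGC = 0x4000, 0x4063, 0x40E2
LISTA, ENDA = PROGA + 0x40, PROGA + 0x54
LISTB, ENDB = PROGB + 0x60, PROGB + 0x70
LISTC, ENDC = PROGC + 0x30, PROGC + 0x42

# REF4 = ENDA - LISTA + LISTC: the loader adds LISTC's address (0x4112)
# to the initial value 0x000014 stored in the Text record.
ref4 = ENDA - LISTA + LISTC
assert ref4 == 0x000014 + LISTC   # same result either way: 0x4126
```

The same substitution reproduces the loaded values of REF5 through REF8 in all three programs.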
ALGORITHM AND DATA STRUCTURES FOR A LINKING LOADER
The algorithm for a linking loader is considerably more complicated than the absolute loader program given earlier. The concepts from the program linking section are used in developing the algorithm, and Modification records are used for relocation, so that the linking and relocation functions are performed by the same mechanism. The linking loader uses two-pass logic; ESTAB (the external symbol table) is its main data structure.
Pass 1: Assign addresses to all external symbols
Pass 2: Perform the actual loading, relocation, and linking
ESTAB for the example (the three programs PROGA, PROGB and PROGC) is shown below. Each ESTAB entry has four fields: the name of the control section, a symbol appearing in the control section, the symbol's address, and the length of the control section.
Control section Symbol Address Length
PROGA 4000 63
LISTA 4040
ENDA 4054
PROGB 4063 7F
LISTB 40C3
ENDB 40D3
PROGC 40E2 51
LISTC 4112
ENDC 4124
Program Logic for Pass 1
Pass 1 assigns addresses to all external symbols. The variables and data structures used during Pass 1 are PROGADDR (the program load address, obtained from the OS), CSADDR (control section address), CSLTH (control section length) and ESTAB. Pass 1 processes the Define (D) records. The algorithm for Pass 1 of the linking loader is given below.
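The Pass 1 logic can be sketched in Python as follows. This is a minimal illustration, not the textbook's exact algorithm: the record strings follow the H and D record formats shown earlier, and the dictionary layout of ESTAB is an assumption made for the example.

```python
def pass1(object_programs, progaddr):
    """Build ESTAB, assigning load addresses to all external symbols.

    object_programs - list of record lists, one list per object program
    progaddr        - PROGADDR, the load address obtained from the OS
    """
    estab = {}                  # symbol -> (address, length or None)
    csaddr = progaddr           # CSADDR: address of the current control section
    for records in object_programs:
        cslth = 0
        for rec in records:
            fields = rec.split()
            if fields[0] == "H":            # H  name  start  length
                name, cslth = fields[1], int(fields[3], 16)
                estab[name] = (csaddr, cslth)
            elif fields[0] == "D":          # D  sym addr  sym addr ...
                it = iter(fields[1:])
                for sym, addr in zip(it, it):
                    estab[sym] = (csaddr + int(addr, 16), None)
        csaddr += cslth         # the next control section follows this one

    return estab

progs = [
    ["H PROGA 000000 000063", "D LISTA 000040 ENDA 000054"],
    ["H PROGB 000000 00007F", "D LISTB 000060 ENDB 000070"],
    ["H PROGC 000000 000051", "D LISTC 000030 ENDC 000042"],
]
estab = pass1(progs, 0x4000)
```

With PROGADDR = 4000, this reproduces the ESTAB entries of the example (LISTB at 40C3, LISTC at 4112, and so on).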
Program Logic for Pass 2
Pass 2 of the linking loader performs the actual loading, relocation, and linking. It uses the Modification records, looking up each symbol in ESTAB to obtain its address. Finally, it uses the End record of the main program to obtain the transfer address, the starting address needed for execution of the program. Pass 2 processes the Text and Modification records of the object programs. The algorithm for Pass 2 of the linking loader is given below.
Improving Efficiency
Can the efficiency of the linking loader be improved? Observe that, although the Refer (R) record has been defined, it has not yet been used. Efficiency can be improved by avoiding multiple searches of ESTAB for the same symbol: assign a reference number to each external symbol in the Refer record, and use that reference number, instead of the symbol name, in the Modification records. Reference number 01 is assigned to the control section name, and the other numbers to the external reference symbols.
The object programs for PROGA, PROGB and PROGC, with this modification to the Refer records, are shown below (observe the R records). The symbols and addresses in PROGA, PROGB and PROGC, listed next, are the entries of ESTAB. The main advantage of the reference-number mechanism is that it avoids multiple searches of ESTAB for the same symbol during the loading of a control section.
Ref No. Symbol Address
1 PROGA 4000
2 LISTB 40C3
3 ENDB 40D3
4 LISTC 4112
5 ENDC 4124
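The reference-number mechanism can be sketched as follows: when a control section's Refer record is read, each symbol's ESTAB address is looked up once and cached in a small table indexed by reference number; Modification records then use the number rather than the symbol name. The table layout is an illustrative assumption; the addresses are the ones from the example.

```python
# ESTAB addresses from the example (PROGA loaded at 0x4000).
estab = {"PROGA": 0x4000, "LISTB": 0x40C3, "ENDB": 0x40D3,
         "LISTC": 0x4112, "ENDC": 0x4124}

def build_reftab(csect_name, refer_symbols):
    """01 is the control section itself; 02.. are the EXTREF symbols."""
    reftab = {1: estab[csect_name]}
    for i, sym in enumerate(refer_symbols, start=2):
        reftab[i] = estab[sym]          # one ESTAB search per symbol, total
    return reftab

reftab = build_reftab("PROGA", ["LISTB", "ENDB", "LISTC", "ENDC"])

# A modification record such as "M 000054 06 +04" now needs only a table
# index (04 = LISTC) rather than a fresh ESTAB search:
value = 0x000014 + reftab[4]            # +LISTC applied to REF4
```

Every subsequent Modification record that names reference 04 reuses the cached address, which is the whole point of the scheme.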
MACHINE-INDEPENDENT LOADER FEATURES
Machine-independent loader features are not directly related to the machine architecture and design. Automatic library search and loader options are two such features.
AUTOMATIC LIBRARY SEARCH
This feature allows a programmer to use standard subroutines without explicitly including them in the program to be loaded: the routines are automatically retrieved from a library as they are needed during linking. This allows the programmer to use subroutines from one or more libraries. The subroutines called by the program being loaded are automatically fetched from the library, linked with the main program, and loaded. The loader searches the specified library or libraries for routines that contain the definitions of the symbols still unresolved in the main program.
Ref No. Symbol Address
1 PROGB 4063
2 LISTA 4040
3 ENDA 4054
4 LISTC 4112
5 ENDC 4124
Ref No. Symbol Address
1 PROGC 40E2
2 LISTA 4040
3 ENDA 4054
4 LISTB 40C3
5 ENDB 40D3
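The automatic library search described above can be sketched as follows. The representation of a library as a symbol-to-routine mapping is an assumption for illustration; real libraries keep a directory of the external symbols each member defines.

```python
def library_search(referenced, defined, libraries):
    """Resolve still-undefined external symbols from a list of libraries.

    referenced - set of symbols named in EXTREF statements
    defined    - set of symbols already defined by the loaded programs
    libraries  - list of dicts mapping symbol -> library routine name,
                 searched in order (user libraries before standard ones)
    """
    to_load = []
    for sym in sorted(referenced - defined):       # still-unresolved symbols
        for lib in libraries:
            if sym in lib:
                to_load.append(lib[sym])           # fetch routine from library
                break
        else:
            raise NameError(f"undefined external symbol: {sym}")
    return to_load

stdlib = {"SQRT": "MATHLIB.SQRT", "RDREC": "UTLIB.RDREC"}
mods = library_search({"RDREC", "LENGTH"}, {"LENGTH"}, [stdlib])
```

Only RDREC is unresolved here, so only its defining routine is added to the load.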
LOADER OPTIONS
Loader options allow the user to modify the standard processing. The options may be specified in three different ways: using a command language; as part of the job control language that is processed by the operating system; or using loader control statements in the source program.
Here are some examples of how options can be specified:
INCLUDE program-name (library-name) - read the designated object program from a library.
DELETE csect-name - delete the named control section from the set of programs being loaded.
CHANGE name1, name2 - cause the external symbol name1 to be changed to name2 wherever it appears in the object programs.
LIBRARY MYLIB - search the library MYLIB before the standard libraries.
NOCALL STDDEV, PLOT, CORREL - do not load and link the unneeded routines named.
Here is one more example showing how commands can be specified as part of the object file, with the respective changes carried out by the loader.
LIBRARY UTLIB
INCLUDE READ (UTLIB)
INCLUDE WRITE (UTLIB)
DELETE RDREC, WRREC
CHANGE RDREC, READ
CHANGE WRREC, WRITE
NOCALL SQRT, PLOT
These commands direct the loader to:
use UTLIB (say, a utility library);
include the READ and WRITE control sections from that library;
delete the control sections RDREC and WRREC from the load;
change all external references to the symbol RDREC to the symbol READ, and similarly change references to WRREC to WRITE;
finally, make no calls to the functions SQRT and PLOT, even if they are used in the program.
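A loader's front end for these control statements can be sketched as a small parser. The command names follow the text; everything else (the dictionary of actions, the tuple layouts) is an illustrative assumption about how a loader might record them.

```python
def parse_loader_commands(lines):
    """Parse loader control statements into a table of requested actions."""
    actions = {"library": [], "include": [], "delete": [],
               "change": [], "nocall": []}
    for line in lines:
        cmd, _, rest = line.strip().partition(" ")
        args = [a.strip(" ()") for a in rest.replace(",", " ").split()]
        if cmd == "LIBRARY":
            actions["library"].extend(args)           # search before std libs
        elif cmd == "INCLUDE":
            actions["include"].append(tuple(args))    # (csect, library)
        elif cmd == "DELETE":
            actions["delete"].extend(args)
        elif cmd == "CHANGE":
            actions["change"].append((args[0], args[1]))  # old -> new name
        elif cmd == "NOCALL":
            actions["nocall"].extend(args)
    return actions

acts = parse_loader_commands([
    "LIBRARY UTLIB",
    "INCLUDE READ (UTLIB)",
    "DELETE RDREC, WRREC",
    "CHANGE RDREC, READ",
    "NOCALL SQRT, PLOT",
])
```

The resulting table would then steer library search (LIBRARY, INCLUDE), the set of loaded control sections (DELETE, NOCALL), and symbol renaming (CHANGE) during the loader's passes.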
LOADER DESIGN OPTIONS
There are several alternatives for organizing the loading functions, including relocation and linking. Linking loaders perform all linking and relocation at load time. The alternatives are linkage editors, which perform linking prior to load time, and dynamic linking, in which the linking function is performed at execution time.
LINKING LOADERS
The processing of an object program using a linking loader is as follows: the source program is first assembled or compiled, producing an object program; a linking loader then performs all linking and relocation operations and loads the program into memory for execution.
LINKAGE EDITORS
A linkage editor produces a linked version of the program, often called a load module or an executable image, which is written to a file or library for later execution.
The linked program produced is generally in a form suitable for processing by a relocating loader. Some useful functions of a linkage editor: an absolute object program can be created if the starting address is already known; new versions of a library can be included without changing the source program; and linkage editors can be used to build packages of subroutines or other control sections that are generally used together.
Linkage editors often allow the user to specify that external references are not to be resolved by automatic library search; the linking is then done later by a linking loader. This combination of linkage editor plus linking loader yields savings in space.
Distinguishing a linking loader from a linkage editor:
Linking Loader
1. Performs linking and relocation at load time.
2. Loads the linked program directly into memory.
3. Offers less flexibility and control.
Linkage Editor
1. Performs linking prior to load time.
2. Writes a linked version of the program, which is later executed by a relocating loader.
3. Offers more flexibility and control.
DYNAMIC LINKING
Dynamic linking is the scheme that postpones the linking function until execution time: a subroutine is loaded and linked to the rest of the program when it is first called. This is usually called dynamic linking, dynamic loading, or load on call. Its advantages: it allows several executing programs to share one copy of a subroutine or library; in an object-oriented system, it makes it possible for one object to be shared by several programs; and it provides the ability to load routines only when (and if) they are needed. The actual loading and linking can be accomplished using an operating system service request.
BOOTSTRAP LOADERS
The bootstrap loader loads the first program to be run by the computer, usually an operating system. The bootstrap loader is a small program that runs before any other normal program can run. It is stored in non-volatile storage (normally the computer's ROM) so that it is still available after the computer has been switched off and then on again.
All of these bootstraps (except the paper-tape one) are designed to read 512 bytes into locations 0-776 (octal) and then start program execution at 0. If you are having system problems, it pays to have the bootstrap halt, so that you can check for error conditions in the device registers. Also check location 0, which is normally 240 for DEC bootstraps and 407 or 411 (separate I/D space) for Unix (unless it is a tight load and the out header has been removed). Some DEC bootstraps refuse to load boot blocks that do not begin with 240. If you have doubts, load location 0 with 777 before you start the bootstrap.
Toggle in these programs starting at location 1000. To be safe, load a trap catcher into the following locations; if you get a halt at 6 or 12, check the program or a missing device. If it halts at 26, you have power-supply problems. If the CPU loops on location 0 (which can show as location 2 on CPUs with data displays), then the boot block was not loaded.
Location Contents Comment
=======================================
000000 000777 Loop at location zero if
000002 000000 secondary bootstrap isn't loaded
000004 000006 Bus error
000006 000000
000010 000012 Reserved instruction
000012 000000
000024 000026 Power failure
000026 000000
The bootstrap gives instructions as to where the operating system on a microcomputer is to be found. How is the loader itself loaded into memory? When the computer is started, with no program in memory, a program present in ROM at an absolute address can be executed; this may be the OS itself or a bootstrap loader, which in turn loads the OS and prepares it for execution. The first record (or records) is generally referred to as a bootstrap loader; it causes the OS to be loaded. Such a loader is added to the beginning of all object programs that are to be loaded into an empty and idle system.
A bootstrap loader is a small program which is held in ROM.
The processor executes this code when it gets the reset (or powerup) signal.
The bootstrap loader does a few hardware checks and then causes the processor to load and
execute the code in the boot sector of the start-up hard disc.
Finally the processor will load the main part of the operating system from disk into main
memory.
Alternatively referred to as bootstrapping, a bootloader, or a boot program, a bootstrap loader is a program that resides in the computer's EPROM, ROM, or other non-volatile memory and is automatically executed by the processor when the computer is turned on. The bootstrap loader reads the hard drive's boot sector to continue the process of loading the computer's operating system. The term bootstrap comes from the old phrase "pull yourself up by your bootstraps."
Definition - What does Bootstrap mean?
A bootstrap is the process of starting up a computer. It also refers to the program that initializes
the operating system (OS) during start-up.
The term bootstrap or bootstrapping originated in the early 1950s. It referred to a bootstrap load
button that was used to initiate a hardwired bootstrap program, or smaller program that executed
a larger program such as the OS. The term was said to be derived from the expression “pulling
yourself up by your own bootstraps;” starting small and loading programs one at a time while
each program is “laced” or connected to the next program to be executed in sequence.
Bootstrap
Bootstrapping is the process of loading a set of instructions when a computer is first turned on or booted. During the start-up process, diagnostic tests such as the power-on self-test (POST) are performed; these set or check configurations for devices and implement routine testing for the connection of peripherals, hardware and external memory devices. The bootloader or bootstrap program is then loaded to initialize the OS.
Typical programs that load the OS are:
GNU GRand Unified Bootloader (GRUB): a multiboot loader that allows the user to choose one of several OSs.
NT Loader (NTLDR): a bootloader for Microsoft's Windows NT OS that usually runs from the hard drive.
Linux Loader (LILO): a bootloader for Linux that generally runs from a hard drive or floppy disc.
Network interface controller (NIC): uses a bootloader that supports booting from a network interface, such as Etherboot or the Preboot Execution Environment (PXE).
Prior to bootstrapping, a computer is said to start with a blank main memory and an intact magnetic core memory or kernel. The bootstrap allows the sequence of programs to load in order to initiate the OS. The OS is the main program that manages all programs running on a computer and performs tasks such as controlling peripheral devices like a disc drive, managing directories and files, transmitting output signals to a monitor, and identifying input signals from a keyboard.
Bootstrapping can also refer to preparing early programming environments incrementally to create more complex and user-friendly programming environments. For example, at one time the
programming environment might have consisted of an assembler program and a simple text
editor. Over time, gradual improvements have led to today's sophisticated object-oriented
programming languages and graphical integrated development environments (IDEs).
SUMMARY:
Relocation - which modifies the object program so that it can be loaded at an address
different from the location originally specified - (Linking Loader).
The different types of loaders are, absolute loader, bootstrap loader, relocating loader
(relative loader), and, direct linking loader
Dynamic linking is the scheme that postpones the linking function until execution time: a subroutine is loaded and linked to the rest of the program when it is first called (also called dynamic loading or load on call).
The bootstrap loader loads the first program to be run by the computer, usually it is an
operating system.
A linkage editor produces a linked version of the program, often called a load module or an executable image, which is written to a file or library for later execution.
A linking loader performs all linking and loading operations, and loads the program into
memory for execution.
Simplified Instructional Computer (SIC) is a hypothetical computer that includes the
hardware features most often found on real machines.
The simple assembler uses two major internal data structures:
• Operation Code Table(OPTAB)
• Symbol Table (SYMTAB).
SUMMARY:
System software – support operation and use of computer.
Application software - solution to a problem.
Assembler translates mnemonic instructions into machine code. The instruction formats,
addressing modes etc., are of direct concern in assembler design.
Compilers must generate machine language code, taking into account such hardware
characteristics as the number and type of registers and the machine instructions available.
Location counter helps in the assignment of the addresses.
The address is mentioned during assembling itself. This is called Absolute Assembly.
The actual address of a memory location is also called an absolute address.
These are the features which do not depend on the architecture of the machine. These are:
• Literals
• Symbol-Defining Statements
• Expressions
• Program blocks
• Control sections and program linking
A literal is defined with a prefix = followed by a specification of the literal value.
This directive can be used to indirectly assign values to the symbols. The directive is
usually called ORG (for origin). Its general format is: ORG value
Program blocks allow the generated machine instructions and data to appear in the object program in a different order, by separating blocks for storing code, data, stack, and larger data blocks.
A control section is a part of the program that maintains its identity after assembly; each
control section can be loaded and relocated independently of the others.
Loading - which allocates memory location and brings the object program into memory
for execution - (Loader)
Linking- which combines two or more separate object programs and supplies the
information needed to allow references between them - (Linker)
UNIT II
Macroprocessor: Basic macroprocessor functions - Machine independent macroprocessor features - concatenation of macro parameters - macro processor design options - recursive macro expansion - general purpose macro processor - macro processing within language translators.
Text Editors: Overview of editing process - user interface - editor structure
MACRO PROCESSORS
A macro represents a commonly used group of statements in the source programming language.
A macro instruction (macro) is a notational convenience for the programmer: it allows the programmer to write a shorthand version of a program (modular programming).
The macro processor replaces each macro instruction with the corresponding group of source language statements (expansion).
Normally, it performs no analysis of the text it handles and does not concern itself with the meaning of the involved statements during macro expansion.
The design of a macro processor is generally machine independent.
Two new assembler directives are used in macro definitions:
MACRO: identifies the beginning of a macro definition
MEND: identifies the end of a macro definition
PROTOTYPE FOR THE MACRO
Each parameter begins with &.
name MACRO parameters
:
body
:
MEND
Body: the statements that will be generated as the expansion of the macro.
Macro Definition and Expansion:
The figure shows macro expansion: the left block shows the macro definition and the right block shows the expanded macro, with the macro call replaced by its block of executable instructions.
M1 is a macro with two parameters, D1 and D2. The macro stores the contents of register A in D1 and the contents of register B in D2. Later, M1 is invoked with the parameters DATA1 and DATA2, and a second time with DATA4 and DATA3. Every call of the macro is expanded into the executable statements. The statement M1 DATA1, DATA2 is a macro invocation statement that gives the name of the macro instruction being invoked and the arguments (DATA1 and DATA2) to be used in expanding it. A macro invocation is also referred to as a macro call.
MACRO EXPANSION
The program with macros is supplied to the macro processor. Each macro invocation statement is expanded into the statements that form the body of the macro, with the arguments from the macro invocation substituted for the parameters in the macro prototype. During the expansion, the macro definition statements are deleted, since they are no longer needed.
The arguments and the parameters are associated with one another according to their positions: the first argument in the macro invocation matches the first parameter in the macro prototype, and so on.
After macro processing, the expanded file can become the input to the assembler. The macro invocation statement is treated as a comment, and the statements generated from the expansion are treated exactly as though they had been written directly by the programmer. The difference between macros and subroutines is that the statements from the body of a macro are expanded every time the macro invocation is encountered, whereas the statements of a subroutine appear only once, no matter how many times the subroutine is called. Macro instructions should be written so that the body of the macro contains no labels.
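Positional argument substitution can be sketched in a few lines of Python. The `?1`, `?2` markers anticipate the positional notation used by the macro processor's tables later in this unit; the STA/STB mnemonics and the list-of-strings body are illustrative assumptions standing in for the M1 example.

```python
# Body of the M1 macro with its parameters stored positionally:
M1_BODY = ["STA ?1",      # store register A into the first parameter
           "STB ?2"]      # store register B into the second parameter

def expand(body, args):
    """Substitute invocation arguments for positional parameters."""
    out = []
    for line in body:
        for i, arg in enumerate(args, start=1):
            line = line.replace(f"?{i}", arg)   # positional substitution
        out.append(line)
    return out

expanded = expand(M1_BODY, ["DATA1", "DATA2"])
# ['STA DATA1', 'STB DATA2']
```

Expanding the same body with `["DATA4", "DATA3"]` yields the second invocation's statements, which is exactly why duplicate labels in a macro body would collide.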
Problem of labels in the body of a macro:
If the same macro is expanded multiple times at different places in the program, there will be duplicate labels, which will be treated as errors by the assembler. Solution: do not use labels in the body of the macro; explicitly use PC-relative addressing instead. For example, in the RDBUFF and WRBUFF macros:
JEQ *+11
JLT *-14
This is inconvenient and error-prone.
Program with Macros Expanded
MACRO PROCESSOR ALGORITHM AND DATA STRUCTURES
The design can be done as a two-pass or a one-pass macro processor.
Two-pass macro processor
Pass 1: Process all macro definitions
Pass 2: Expand all macro invocation statements
However, a two-pass design requires all macros to be defined during the first pass, before any macro invocations are expanded, so it cannot handle the case where the body of one macro contains definitions of other macros. A one-pass processor, which requires only that the definition of a macro appear before any statements that invoke it, can handle such nested definitions.
Consider the example of a Macro defining another Macro.
In the example below, the body of the first Macro (MACROS) contains statement that
Define RDBUFF, WRBUFF and other macro instructions for SIC machine.
The body of the second Macro (MACROX) defines these same macros for SIC/XE machine.
A proper invocation would make the same program to perform macro invocation to run on
either SIC or SIC/XE machine.
A program that is to be run on SIC system could invoke MACROS whereas a program to be
run on SIC/XE can invoke MACROX.
However, defining MACROS or MACROX does not define RDBUFF and WRBUFF; these definitions are processed only when an invocation of MACROS or MACROX is expanded.
Example Of The Definition Of Macros Within A Macro Body
One-Pass Macro Processor:
A one-pass macro processor that alternates between macro definition and macro expansion in a recursive way is able to handle nested macro definitions.
Restriction:
The definition of a macro must appear in the source program before any statements that invoke that macro. This restriction does not create any real inconvenience.
The design considered here is for a one-pass macro processor. The data structures required are:
DEFTAB (definition table)
Stores the macro definitions, including the macro prototype and the macro body. Comment lines are omitted, and references to the macro instruction parameters are converted to a positional notation, for efficiency in substituting arguments.
NAMTAB (name table)
Stores macro names and serves as an index to DEFTAB, holding pointers to the beginning and the end of each macro definition in DEFTAB.
ARGTAB (argument table)
Stores the arguments according to their positions in the argument list. As the macro is expanded, arguments from ARGTAB are substituted for the corresponding parameters in the macro body.
The figure shows a portion of the contents of these tables during the processing of the program. The definition of RDBUFF is stored in DEFTAB, with an entry in NAMTAB holding pointers to the beginning and the end of the definition. Arguments referred to by the stored instructions are denoted by positional notation. For example:
TD =X'?1'
This instruction tests the availability of the device whose number is given by the parameter &INDEV; in the stored definition, the parameter is replaced by its positional notation ?1. ARGTAB, as it would appear during expansion, is shown for the RDBUFF invocation:
CLOOP RDBUFF F1, BUFFER, LENGTH
For this invocation of the macro RDBUFF, the first argument is F1 (the input device code), the second is BUFFER (the address where the characters read are to be stored), and the third is LENGTH (the total length of the record to be read). When the positional notation is encountered in a line from DEFTAB, a simple indexing operation supplies the proper argument from ARGTAB.
The algorithm of the macro processor is given below. It has a procedure DEFINE, which enters the macro name in NAMTAB and the macro prototype and body in DEFTAB; EXPAND, which is called to set up the argument values in ARGTAB and expand a macro invocation statement; and GETLINE, which is called to get the next line to be processed, either from DEFTAB or from the input file itself.
When a macro definition is encountered, it is entered in DEFTAB. The normal approach
is to continue entering lines until MEND is encountered. But if a program has a macro defined
within another macro, the very first MEND would wrongly be taken as the end of the outer
macro's definition; the outer macro's definition is only complete at its own MEND.
Therefore the DEFINE procedure keeps a counter
variable LEVEL. Every time a MACRO directive is encountered this counter is incremented by 1. The
System Software And Operating System
70
moment the innermost macro ends, indicated by the directive MEND, the counter is decremented by
one, and each subsequent MEND decrements it further. The last MEND brings the counter back to
zero; so when LEVEL becomes zero, the MEND corresponds to the original MACRO directive.
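The LEVEL bookkeeping can be sketched as follows. This is a hypothetical simplification (real SIC source lines have a label/opcode/operand layout; here a line counts as MACRO or MEND if that token appears anywhere in it):

```python
def define(lines):
    """Enter a macro body into DEFTAB, using a LEVEL counter so that
    nested MACRO/MEND pairs are copied as part of the body.  `lines`
    is an iterator positioned just after the outer MACRO directive."""
    deftab = []
    level = 1                       # we are inside one MACRO already
    for line in lines:
        fields = line.split()
        if "MACRO" in fields:       # a nested definition begins
            level += 1
        elif "MEND" in fields:      # some definition ends
            level -= 1
        deftab.append(line)
        if level == 0:              # the MEND matching the original MACRO
            break
    return deftab

body = define(iter([
    "CLEAR X",
    "RDCHAR MACRO &IN",   # nested macro definition
    "TD =X'?1'",
    "MEND",               # ends RDCHAR: LEVEL back to 1
    "STX ?3",
    "MEND",               # ends the outer macro: LEVEL = 0, stop
    "NEXT SOURCE LINE",   # not consumed
]))
print(len(body))   # 6
```

Only the second MEND terminates the definition; the inner MACRO/MEND pair is stored as ordinary body text.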
Most macro processors allow the definitions of the commonly used instructions to appear in a
standard system library, rather than in the source program. This makes the use of macros convenient;
definitions are retrieved from the library as they are needed during macro processing.
Comparison of Macro Processor Design
One-pass algorithm
Every macro must be defined before it is called
One-pass processor can alternate between macro definition and macro expansion
Nested macro definitions are allowed but nested calls are not allowed.
ALGORITHM FOR A ONE PASS MACRO PROCESSOR
Two-pass algorithm
Pass1: Recognize macro definitions
Pass2: Recognize macro calls.
Nested macro definitions are not allowed.
MACHINE-INDEPENDENT MACRO-PROCESSOR FEATURES
The design of the macro processor does not depend on the architecture of the target machine. We will
study some extended features of this macro processor. These features are:
Concatenation of Macro Parameters
Generation of unique labels
Conditional Macro Expansion
Keyword Macro Parameters
CONCATENATION OF MACRO PARAMETERS
Most macro processors allow parameters to be concatenated with other character strings.
Suppose that a program contains one series of variables named by the symbols XA1, XA2, XA3, ...,
another series named XB1, XB2, XB3, ..., etc. If similar processing is to be performed
on each series of variables, the programmer might write a single macro instruction for it. The parameter
to such a macro instruction could specify the series of variables to be operated on (A, B, etc.). The macro
processor would use this parameter to construct the symbols required in the macro expansion (XA1,
XB1, etc.).
Suppose that the parameter to such a macro instruction is named &ID. The body of the macro
definition might then contain a statement like LDA X&ID1. The character & marks the start of the
macro parameter, but the end of the parameter is not marked: in X&ID1 the macro processor
cannot tell whether the parameter is &ID (followed by the character 1) or &ID1. If the macro
definition contained both &ID and &ID1 as parameters, the situation would be unavoidably
ambiguous. Most macro processors deal with this problem by providing a special concatenation
operator; in the SIC macro language, this operator is the character ->. Thus the statement
LDA X&ID1 can be written as
LDA X&ID -> 1
so that the end of the parameter &ID is marked explicitly.
The invocation statements SUM A and SUM BETA and the corresponding macro expansions are
shown below.
Fig: Concatenation of macro parameters
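This substitution can be sketched in Python. This is a hypothetical helper, not the SIC processor itself; real processors scan tokens rather than doing whole-line text replacement, but the longest-name-first ordering shows how &ID and &ID1 could coexist:

```python
def expand_params(line, params):
    """Substitute &NAME macro parameters (longest name first, so &ID1
    wins over &ID when both exist), then delete the concatenation
    operator -> that marked the end of a parameter."""
    for name in sorted(params, key=len, reverse=True):
        line = line.replace("&" + name, params[name])
    return line.replace("->", "")

# SUM A: the body line LDA X&ID->1 expands to LDA XA1
print(expand_params("LDA X&ID->1", {"ID": "A"}))      # LDA XA1
print(expand_params("LDA X&ID->1", {"ID": "BETA"}))   # LDA XBETA1
```

The operator exists only at macro-processing time; it never reaches the assembler.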
GENERATION OF UNIQUE LABELS
As discussed earlier, it is not possible to use ordinary labels on the instructions in a macro definition,
since every expansion of the macro would repeat the same label, which is not allowed by the assembler.
This in turn forces us to use relative addressing in the jump instructions. Instead we can use the
technique of generating unique labels for every macro invocation and expansion. During macro
expansion each $ will be replaced with $xx, where xx is a two-character alphanumeric counter of
the number of macro instructions expanded.
For example,
XX = AA, AB, AC…
This allows 1296 macro expansions in a single program. The following program shows a macro
definition with labels on its instructions, and the figure shows the macro invocation and
expansion the first time.
Fig: Generation of unique labels within macro expansion
If the macro is invoked second time the labels may be expanded as
$ABLOOP
$ABEXIT.
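The counter scheme can be sketched as a mapping from the expansion number to a two-character alphanumeric suffix (hypothetical helper names; with 26 letters and 10 digits per position, 36 * 36 = 1296 combinations):

```python
import string

ALPHANUM = string.ascii_uppercase + string.digits   # 36 characters

def label_suffix(n):
    """Map expansion number n (0-based) to the two-character counter:
    AA, AB, ..., A9, BA, ...  36 * 36 = 1296 distinct values."""
    return ALPHANUM[n // 36] + ALPHANUM[n % 36]

def expand_labels(line, n):
    """Replace each $ in a macro body line with $ plus the counter."""
    return line.replace("$", "$" + label_suffix(n))

print(expand_labels("JLT $LOOP", 0))   # JLT $AALOOP
print(expand_labels("JLT $LOOP", 1))   # JLT $ABLOOP
```

The first expansion yields $AALOOP/$AAEXIT, the second $ABLOOP/$ABEXIT, matching the example above.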
CONDITIONAL MACRO EXPANSION
Conditional macro expansion allows the sequence of statements generated to depend on the
arguments supplied in the macro invocation. For example:
MACRO &COND
……..
IF (&COND NE “ “)
part I
ELSE
part II
ENDIF
………
MEND
Part I is expanded if the condition is true; otherwise part II is expanded. Comparison operators:
NE, EQ, LE, GT.
Macro-Time Variables:
Fig : use of macro time looping statement
Macro-time variables (often called SET symbols) can be used to store working values
during macro expansion. Any symbol that begins with & and is not a macro instruction
parameter is considered a macro-time variable. All such variables are initialized to 0. The figure
gives the definition of the macro RDBUFF with the parameters &INDEV, &BUFADR, &RECLTH,
&EOR and &MAXLTH. According to the program, if &EOR has any value, then &EORCK is set
to 1 by using the directive SET; otherwise it retains its default value 0.
The programs above show the expansion of macro invocation statements with different
values for the macro-time variables. In figure 4.9(b) the &EOR value is null. When the macro
invocation is processed, the IF statement is evaluated; if it is true, &EORCK is set to 1, otherwise
expansion of the rest of the macro body continues normally.
The macro processor must maintain a symbol table that contains the value of all macro-time
variables used. Entries in this table are modified when SET statements are processed. The table is
used to look up the current value of the macro-time variable whenever it is required. When an IF
statement is encountered during the expansion of a macro, the specified Boolean expression is
evaluated.
If the value of this expression TRUE,
The macro processor continues to process lines from the DEFTAB until it encounters the
ELSE or ENDIF statement.
If an ELSE is found, macro processor skips lines in DEFTAB until the next ENDIF.
Once it reaches ENDIF, it resumes expanding the macro in the usual way.
If the value of the expression is FALSE,
The macro processor skips ahead in DEFTAB until it encounters next ELSE or ENDIF
statement.
The macro processor then resumes normal macro expansion.
The macro-time IF-ELSE-ENDIF structure provides a mechanism for either generating (once) or
skipping selected statements in the macro body. There is another construct, the WHILE statement,
which specifies that the following lines, until the next ENDW statement, are to be generated repeatedly
as long as a particular condition is true. The testing of this condition and the looping are done while
the macro is being expanded. The example below shows the use of the macro-time looping
statement.
WHILE-ENDW structure
When a WHILE statement is encountered during the expansion of a macro, the specified Boolean
expression is evaluated.
TRUE
o The macro processor continues to process lines from DEFTAB until it encounters the next
ENDW statement.
o When ENDW is encountered, the macro processor returns to the preceding WHILE, re-
evaluates the Boolean expression, and takes action based on the new value.
FALSE
The macro processor skips ahead in DEFTAB until it finds the next ENDW statement and
then resumes normal macro expansion.
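The IF-ELSE-ENDIF skipping described above can be sketched as follows. This is a hypothetical simplification: each IF's Boolean expression is assumed to be pre-evaluated to a flag, and a real WHILE/ENDW would additionally rewind the DEFTAB pointer and re-evaluate the condition:

```python
def expand_body(body, conditions):
    """Emit lines of a macro body, honouring IF/ELSE/ENDIF.
    `conditions` is a list of booleans consumed by successive IFs,
    standing in for evaluating the Boolean expression on each IF."""
    out, emit_stack, cond_iter = [], [True], iter(conditions)
    for line in body:
        op = line.split()[0]
        if op == "IF":                    # push: emit only if enclosing
            emit_stack.append(emit_stack[-1] and next(cond_iter))
        elif op == "ELSE":                # flip within the enclosing state
            emit_stack[-1] = (not emit_stack[-1]) and emit_stack[-2]
        elif op == "ENDIF":               # pop back to enclosing state
            emit_stack.pop()
        elif emit_stack[-1]:              # ordinary line: emit or skip
            out.append(line)
    return out

body = ["IF (&EOR NE '')", "part I", "ELSE", "part II", "ENDIF"]
print(expand_body(body, [True]))    # ['part I']
print(expand_body(body, [False]))   # ['part II']
```

Skipped lines are simply not emitted; the selection happens once, at macro-expansion time, exactly as the TRUE/FALSE cases above describe.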
KEYWORD MACRO PARAMETERS
All the macro instruction definitions discussed so far used positional parameters: parameters and
arguments are matched according to their positions in the macro prototype and the macro invocation
statement. The programmer needs to be careful while specifying the arguments; if an argument is to
be omitted, the macro invocation statement must contain a null argument, marked by two
consecutive commas.
Positional parameters are quite suitable when a macro has relatively few parameters. But if a
macro has a large number of parameters, and only a few of the values need to be specified in a
typical invocation, a different form of parameter specification is required (for example, most of the
parameters may have default values, and the invocation may mention only the changes from the
default values).
Ex: XXX MACRO &P1, &P2, …., &P20, ….
XXX A1, A2,,,,,,,,,,…,,A20,…..
Keyword parameters
Each argument value is written with a keyword that names the corresponding parameter.
Arguments may appear in any order.
Null arguments no longer need to be used.
Ex: XXX P1=A1, P2=A2, P20=A20.
It is easier to read and much less error-prone than the positional method.
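Keyword argument matching can be sketched as a dictionary merge over the prototype's defaults (hypothetical parameter names and function name):

```python
def bind_arguments(defaults, invocation_args):
    """Match keyword arguments of the form NAME=VALUE against the
    parameter defaults declared in the macro prototype."""
    argtab = dict(defaults)                 # start from the defaults
    for arg in invocation_args:
        name, _, value = arg.partition("=")
        if name not in argtab:
            raise ValueError("unknown parameter: " + name)
        argtab[name] = value                # override just this one
    return argtab

defaults = {"INDEV": "F1", "BUFADR": "BUFFER", "EOR": "04"}
print(bind_arguments(defaults, ["EOR=05"]))
# {'INDEV': 'F1', 'BUFADR': 'BUFFER', 'EOR': '05'}
```

Because arguments are named, order is irrelevant and omitted parameters silently keep their defaults, which is exactly what makes the keyword form less error-prone.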
Use of keyword parameters in macro instructions
MACRO PROCESSOR DESIGN OPTIONS
RECURSIVE MACRO EXPANSION
We have seen an example of the definition of one macro instruction by another. But we have
not dealt with the invocation of one macro by another. The following example shows the invocation
of one macro by another macro.
Problem of Recursive Expansion
The macro processor design presented earlier cannot handle such recursive macro invocation and
expansion:
The procedure EXPAND would be called recursively, so the invocation arguments in
ARGTAB would be overwritten.
The Boolean variable EXPANDING would be set to FALSE when the "inner" macro
expansion finished; that is, the macro processor would forget that it had been in the middle of
expanding an "outer" macro.
Solutions
Write the macro processor in a programming language that allows recursive calls; local
variables are then retained automatically.
If you are writing in a language without recursion support, use an explicit stack to take care of
pushing and popping local variables and return addresses.
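The first solution comes essentially for free in a language with recursive procedures, because each activation of EXPAND gets its own argument table. A minimal Python sketch (hypothetical table layout; ?n is the positional parameter notation used earlier):

```python
import re

def expand(name, args, namtab, deftab):
    """Expand macro `name`.  Each activation has its own local argtab,
    so a nested invocation cannot clobber the arguments of the
    enclosing expansion."""
    argtab = list(args)                    # local to this activation
    begin, end = namtab[name]
    out = []
    for line in deftab[begin + 1:end]:     # body: skip prototype and MEND
        line = re.sub(r"\?(\d+)", lambda m: argtab[int(m.group(1)) - 1], line)
        fields = line.replace(",", " ").split()
        if fields and fields[0] in namtab:         # nested macro invocation
            out.extend(expand(fields[0], fields[1:], namtab, deftab))
        else:
            out.append(line)
    return out

deftab = [
    "RDBUFF &IN,&BUF",   # 0: prototype
    "CLEAR X",           # 1
    "RDCHAR ?1",         # 2: invokes the RDCHAR macro
    "STCH ?2,X",         # 3
    "MEND",              # 4
    "RDCHAR &IN",        # 5: prototype
    "TD =X'?1'",         # 6
    "RD =X'?1'",         # 7
    "MEND",              # 8
]
namtab = {"RDBUFF": (0, 4), "RDCHAR": (5, 8)}
print(expand("RDBUFF", ["F1", "BUFFER"], namtab, deftab))
# ['CLEAR X', "TD =X'F1'", "RD =X'F1'", 'STCH BUFFER,X']
```

After the inner RDCHAR expansion returns, the outer activation's argtab still holds BUFFER, so the STCH line expands correctly; this is precisely what the shared-ARGTAB design loses.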
The procedure EXPAND would be called when the macro was recognized. The arguments
from the macro invocation would be entered into ARGTAB as follows:
Parameter Value
1 BUFFER
2 LENGTH
3 F1
4 (unused)
The Boolean variable EXPANDING would be set to TRUE, and expansion of the macro
invocation statement would begin. The processing would proceed normally until the statement
invoking RDCHAR was processed. At that point, ARGTAB would look like
Parameter Value
1 F1
2 (Unused)
The expansion of RDCHAR would also proceed normally. At the end of this expansion, however, a
problem would appear. When the end of the definition of RDCHAR was recognized, EXPANDING
would be set to FALSE. Thus the macro processor would "forget" that it had been in the middle of
expanding a macro when it encountered the RDCHAR statement. In addition, the arguments from
the original macro invocation (RDBUFF) would be lost because the values in ARGTAB were
overwritten with the arguments from the invocation of RDCHAR. The cause of these difficulties is
the recursive calls of the procedure EXPAND. When the RDBUFF macro invocation is encountered,
EXPAND is called. Later, it calls PROCESSLINE for line 50, which results in another call to
EXPAND before a return is made from the original call. A similar problem would occur with
PROCESSLINE since this procedure too would be called recursively. For example, there might be
confusion about whether the return from PROCESSLINE should be made to the main (outermost)
loop of the macro processor logic or to the loop within EXPAND. These problems are not difficult to
solve if the macro processor is written in a programming language (such as Pascal or C) that
allows recursive calls. The compiler would ensure that previous values of any variables declared
within a procedure were saved when that procedure was called recursively. It would also take care of
other details involving return from the procedure.
If a programming language that supports recursion is not available, the programmer must take care
of handling such items as return addresses and values of local variables. In such a case,
PROCESSLINE and EXPAND would probably not be procedures at all. Instead, the same logic
would be incorporated into a looping structure, with data values being saved on a stack. The
algorithm for implementing recursive macro calls is the same as the algorithm for a one-pass macro
processor except for the EXPAND and GETLINE procedures, which are as follows.
procedure EXPAND
   begin
      {initially LEVEL = 0 and SP = -1}
      set S(SP + N + 2) = SP
      set SP = SP + N + 2
      set S(SP + 1) = DEFTAB index from NAMTAB
      set up macro call argument list array in
         S(SP + 2) .. S(SP + N + 1), where N = total number of arguments
      while not end of macro definition and LEVEL != 0 do
         begin
            GETLINE
            PROCESSLINE
         end {while}
      set N = SP - S(SP) - 2
      set SP = S(SP)
   end {EXPAND}
procedure GETLINE
   begin
      if SP != -1 then
         begin
            set S(SP + 1) = S(SP + 1) + 1   {advance DEFTAB pointer to next entry}
            get the line from DEFTAB with the pointer S(SP + 1)
            substitute arguments from macro call S(SP + 2) .. S(SP + N + 1)
         end
      else
         read next line from input file
   end {GETLINE}
RECURSIVE MACRO EXPANSION
(Figure: stack trace of the nested expansion. Initially SP = -1. The call RDBUFF
BUFFER,LENGTH,F1 pushes a stack frame containing the saved SP, the DEFTAB index, and the
arguments BUFFER, LENGTH and F1; on exit from each expansion, SP is restored by SP = S(SP).)
Example of nested macro invocation
GENERAL-PURPOSE MACRO PROCESSORS
Macro processors that do not depend on any particular programming language, but can be
used with a variety of different languages.
Pros
Programmers do not need to learn many macro languages.
Although its development costs are somewhat greater than those for a language-specific
macro processor, this expense does not need to be repeated for each language, saving
substantial overall cost.
Cons
A large number of details must be dealt with in a real programming language:
situations in which normal macro parameter substitution should not occur (e.g., comments);
facilities for grouping together terms, expressions, or statements;
tokens (e.g., identifiers, constants, operators, keywords);
the macro language syntax should be consistent with the source programming language.
MACRO PROCESSING WITHIN LANGUAGE TRANSLATORS
The macro processors we have discussed so far are called "preprocessors":
Process macro definitions
Expand macro invocations
Produce an expanded version of the source program, which is then used as input to an
assembler or compiler
You may also combine the macro processing functions with the language translator:
Line-by-line macro processor
Integrated macro processor
Line-by-Line Macro Processor
Used as a sort of input routine for the assembler or compiler
Read source program
Process macro definitions and expand macro invocations
Pass output lines to the assembler or compiler
Benefits
Avoid making an extra pass over the source program.
Data structures required by the macro processor and the language translator can be
combined (e.g., OPTAB and NAMTAB)
Utility subroutines can be used by both macro processor and the language translator.
o Scanning input lines
o Searching tables
o Data format conversion
It is easier to give diagnostic messages related to the source statements
Integrated Macro Processor
An integrated macro processor can potentially make use of any information about the source
program that is extracted by the language translator.
Ex (blanks are not significant in FORTRAN):
DO 100 I = 1,20 is a DO statement, but
DO 100 I = 1 is an assignment statement: it assigns 1 to the variable DO100I.
An integrated macro processor can support macro instructions that depend upon the context
in which they occur.
TEXT EDITORS
Text editors are the primary interface to the computer for all types of "knowledge workers"
as they compose, organize, study and manipulate computer-based information.
OVERVIEW OF THE EDITING PROCESS
An interactive editor is a computer program that allows a user to create and revise a target
document. The term document includes objects such as computer programs, texts, equations, tables,
diagrams, line art and photographs-anything that one might find on a printed page. Text editor is one
in which the primary elements being edited are character strings of the target text.
The document editing process is an interactive user-computer dialogue designed to
accomplish four tasks:
1. Select the part of the target document to be viewed and manipulated
2. Determine how to format this view on-line and how to display it.
3. Specify and execute operations that modify the target document.
4. Update the view appropriately.
Traveling: Selection of the part of the document to be viewed and edited. It involves first
traveling through the document to locate the area of interest, using commands such as "next
screenful", "bottom", and "find pattern". Traveling specifies where the area of interest is.
Filtering - The selection of what is to be viewed and manipulated is controlled by filtering.
Filtering extracts the relevant subset of the target document at the point of interest such as next
Screen full of text or next statement.
Formatting: Formatting determines how the result of filtering will be seen as a visible
representation (the view) on a display screen or other device.
Editing: In the actual editing phase, the target document is created or altered with a set of
operations such as insert, delete, replace, move or copy.
Manuscript oriented editors operate on elements such as single characters, words, lines,
sentences and paragraphs;
Program-oriented editors operate on elements such as identifiers, keywords and statements.
THE USER-INTERFACE OF AN EDITOR
The user of an interactive editor is presented with a conceptual model of the editing system.
This model is an abstract framework of the editor and the world on which its operations are
based.
Line editors simulated the world of the keypunch: they allowed operations on a numbered
sequence of 80-character card-image lines.
Screen editors define a world in which a document is represented as a quarter plane of
text lines, unbounded both down and to the right. The user sees, through a cutout, only a
rectangular subset of this plane on a multi-line display terminal. The cutout can be moved left
or right, and up or down, to display other portions of the document. The user interface is also
concerned with the input devices, the output devices, and the interaction language of the
system.
INPUT DEVICES: The input devices are used to enter elements of text being edited, to enter
commands, and to designate editable elements.
Input devices are categorized as:
1) Text or string devices
2) Button or choice devices
3) Locator devices
4) The data tablet
5) Text devices with arrow (cursor) keys
1) Text or string devices are typically typewriter-like keyboards on which the user presses and
releases keys, sending a unique code for each key. Virtually all computer keyboards are of the
QWERTY type.
2) Button or choice devices generate an interrupt or set a system flag, usually causing an
invocation of an associated application program. Special function keys are also available on the
keyboard. Alternatively, buttons can be simulated in software by displaying text strings or symbols
on the screen; the user chooses a string or symbol instead of pressing a button.
3) Locator devices: They are two-dimensional analog-to-digital converters that position a
cursor symbol on the screen by observing the user’s movement of the device. The most common
such devices are the mouse and the tablet.
4) The data tablet is a flat, rectangular, electromagnetically sensitive panel. Either a
ballpoint-pen-like stylus or a puck (a small device similar to a mouse) is moved over the surface.
The tablet returns to a system program the co-ordinates of the position on the data tablet at which the
stylus or puck is currently located. The program can then map these data-tablet co-ordinates to
screen coordinates and move the cursor to the corresponding screen position.
5) Text devices with arrow (cursor) keys can be used to simulate locator devices. Each of
these keys shows an arrow that points up, down, left or right. Pressing an arrow key typically
generates an appropriate character sequence; the program interprets this sequence and moves the
cursor in the direction of the arrow on the key pressed.
VOICE-INPUT DEVICES, which translate spoken words into their textual equivalents, may prove to
be the text input devices of the future. Voice recognizers are currently available for command input
on some systems.
OUTPUT DEVICES
The output devices let the user view the elements being edited and the result of the editing
operations.
The first output devices were teletypewriters and other character-printing terminals that
generated output on paper.
Next came "glass teletypes" based on cathode ray tube (CRT) technology, which used the
CRT screen essentially to simulate a hard-copy teletypewriter.
Today’s advanced CRT terminals use hardware assistance for such features as moving the
cursor, inserting and deleting characters and lines, and scrolling lines and pages.
Modern professional workstations are based on personal computers with high-resolution
displays; they support multiple proportionally spaced character fonts to produce
realistic facsimiles of hard-copy documents.
INTERACTION LANGUAGE
The interaction language of the text editor is generally one of several common types.
The typing oriented or text command-oriented method
It is the oldest of the major editing interfaces. The user communicates with the editor by typing text
strings both for command names and for operands. These strings are sent to the editor and are
usually echoed to the output device.
Typed specification often requires the user to remember the exact form of all commands, or at least
their abbreviations. If the command language is complex, the user must continually refer to a manual
or an on-line Help function. The typing required can be time consuming for inexperienced users.
Function key interfaces:
Each command is associated with a marked key on the keyboard. This eliminates much typing. E.g.:
Insert key, Shift key, Control key
Disadvantages:
Have too many unique keys
Multiple key stroke commands
Menu oriented interface
A menu is a multiple choice set of text strings or icons which are graphical symbols that represent
objects or operations. The user can perform actions by selecting items for the menus. The editor
prompts the user with a menu. One problem with menu oriented system can arise when there are
many possible actions and several choices are required to complete an action. The display area of the
menu is rather limited
EDITOR STRUCTURE
The Command Language Processor accepts input from the user's input devices, and analyzes
the tokens and syntactic structure of the commands. It functions much like the lexical and syntactic
phases of a compiler. The command language processor may invoke the semantic routines directly.
In a text editor, these semantic routines perform functions such as editing and viewing.
The semantic routines involve traveling, editing, viewing and display functions. Editing
operations are always specified by the user and display operations are specified implicitly by the
other three categories of operations. Traveling and viewing operations may be invoked either
explicitly by the user or implicitly by the editing operations.
Typical editor structure
Editing Component
In editing a document, the start of the area to be edited is determined by the current editing
pointer maintained by the editing component, which is the collection of modules dealing with
editing tasks. The current editing pointer can be set or reset explicitly by the user using travelling
commands, such as next paragraph and next screen, or implicitly as a side effect of the previous
editing operation such as delete paragraph.
Traveling Component
The traveling component of the editor actually performs the setting of the current editing and
viewing pointers, and thus determines the point at which the viewing and /or editing filtering begins.
Viewing Component
The start of the area to be viewed is determined by the current viewing pointer. This pointer
is maintained by the viewing component of the editor, which is a collection of modules responsible
for determining the next view. The current viewing pointer can be set or reset explicitly by the user
or implicitly by the system as a result of a previous editing operation. The viewing component
formulates an ideal view, often expressed in a device-independent intermediate representation. This
view may be a very simple one, consisting of a window's worth of text arranged so that lines are not
broken in the middle of words.
Display Component
It takes the idealized view from the viewing component and maps it to a physical output
device in the most efficient manner. The display component produces a display by mapping the
buffer to a rectangular subset of the screen, usually a window.
Simple relationship between Editing and viewing buffers
Editing Filter
Filtering consists of the selection of contiguous characters beginning at the current point. The
editing filter filters the document to generate a new editing buffer, based on the current editing
pointer as well as on the editing filter parameters.
Editing Buffer
It contains the subset of the document filtered by the editing filter based on the editing
pointer and editing filter parameters.
Viewing Filter
When the display needs to be updated, the viewing component invokes the viewing filter.
This component filters the document to generate a new viewing buffer based on the current viewing
pointer as well as on the viewing filter parameters.
Viewing Buffer
It contains the subset of the document selected by the viewing filter, based on the viewing
pointer and viewing filter parameters. As part of an editing command there is an implicit travel to the
first line of the file. Lines 1 through 50 are then filtered from the document to become the editing
buffer. Successive substitutions take place in this editing buffer without corresponding updates of the
view.
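The relationship between the document, the editing pointer, and the editing buffer can be sketched as follows (hypothetical function and variable names; a line-oriented filter, as in the 50-line example above):

```python
def editing_filter(document, pointer, size):
    """Extract `size` contiguous lines starting at the current editing
    pointer; the result becomes the editing buffer."""
    return document[pointer:pointer + size]

document = ["line %d" % i for i in range(1, 201)]   # a 200-line file
buffer = editing_filter(document, 0, 50)            # lines 1 through 50
buffer[9] = "edited " + buffer[9]                   # edits stay in the buffer
print(len(buffer), buffer[9])                       # 50 edited line 10
```

Edits land in the buffer, not in the view: the display is only refreshed when the viewing component reruns its own filter.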
In Line editors, the viewing buffer may contain the current line; in screen editors, this buffer
may contain rectangular cut out of the quarter-plane of text. This viewing buffer is then passed to the
display component of the editor, which produces a display by mapping the buffer to a rectangular
subset of the screen, usually called a window.
The editing and viewing buffers, while independent, can be related in many ways. In a
simplest case, they are identical: the user edits the material directly on the screen. On the other hand,
the editing and viewing buffers may be completely disjoint.
Windows typically cover the entire screen or rectangular portion of it. Mapping viewing
buffers to windows that cover only part of the screen is especially useful for editors on modern
graphics based workstations. Such systems can support multiple windows, simultaneously showing
different portions of the same file or portions of different files. This approach allows the user to
perform inter-file editing operations much more effectively than with a system having only a single
window.
The mapping of the viewing buffer to a window is accomplished by two components of the system.
(i) First, the viewing component formulates an ideal view, often expressed in a device-
independent intermediate representation. This view may be a very simple one, consisting of a
window's worth of text arranged so that lines are not broken in the middle of words. At the other
extreme, the idealized view may be a facsimile of a page of fully formatted and typeset text with
equations, tables and figures.
(ii) Second, the display component takes this idealized view from the viewing component
and maps it to a physical output device in the most efficient manner possible.
The components of the editor deal with a user document on two levels:
(i) In main memory and
(ii) In the disk file system.
Loading an entire document into main memory may be infeasible. However, if only part of a
document is loaded, and if many user-specified operations require a disk read by the editor to locate
the affected portions, editing might be unacceptably slow. In some systems this problem is solved by
mapping the entire file into virtual memory and letting the operating system perform efficient
demand paging.
An alternative is to provide editor paging routines, which read one or more logical
portions of a document into memory as needed. Such portions are often termed pages, although
there is usually no relationship between these pages and hard-copy document pages or virtual-
memory pages. These pages remain resident in main memory until a user operation requires that
another portion of the document be loaded.
Editors function in three basic types of computing environment:
(i) Time-sharing environment
(ii) Stand-alone environment and
(iii) Distributed environment.
Each type of environment imposes some constraint on the design of an editor.
The Time –Sharing Environment
The time sharing editor must function swiftly within the context of the load on the
computer’s processor, central memory and I/O devices.
The Stand-alone Environment
The editor on a stand-alone system must have access to the functions that time-sharing
editors obtain from the host operating system. These may be provided in part by a small local
operating system, or they may be built into the editor itself if the stand-alone system is dedicated to
editing.
Distributed Environment
The editor operating in a distributed resource-sharing local network must, like a stand-alone
editor, run independently on each user's machine and must, like a time-sharing editor, contend for
shared resources such as files.
SUMMARY:
MACRO: identifies the beginning of a macro definition
MEND: identifies the end of a macro definition
Basic macro processor functions:
Macro Definition and Expansion
Macro Processor Algorithms and Data structures
Two-pass macro processor:
Pass 1: Process all macro definitions
Pass 2: Expand all macro invocation statements
One-Pass Macro Processor:
A one-pass macro processor that alternates between macro definition and macro
expansion in a recursive way is able to handle nested macro definitions.
NAMTAB (Name Table):
Stores macro names
Serves as an index to DEFTAB
Pointers to the beginning and the end of the macro definition (DEFTAB)
ARGTAB (Argument Table):
Stores the arguments according to their positions in the argument list.
As the macro is expanded the arguments from the Argument table are
substituted for the corresponding parameters in the macro body.
Comparison of Macro Processor Design
One-pass algorithm
Every macro must be defined before it is called
One-pass processor can alternate between macro definition and macro expansion
Nested macro definitions are allowed but nested calls are not allowed.
Two-pass algorithm
Pass1: Recognize macro definitions.
Pass2: Recognize macro calls.
Nested macro definitions are not allowed.
General-purpose macro processors
Macro processors that do not depend on any particular programming language, but
can be used with a variety of different languages
Machine-independent macro-processor features:
o Concatenation of Macro Parameters
o Generation of unique labels
o Conditional Macro Expansion
o Keyword Macro Parameters
UNIT III
Machine dependent compiler features - Intermediate form of the program-Machine dependent code
optimization-machine independent compiler features-Compiler design options-division into passes-
interpreters-p –code compilers-compiler-compilers.
MACHINE INDEPENDENT COMPILER FEATURES
Machine-independent compiler features include methods for handling structured variables such
as arrays. The problems involved in compiling a block-structured language, and some possible
solutions, are also discussed.
STRUCTURED VARIABLES
The structured variables discussed here are arrays, records, strings and sets. The primary
consideration is the allocation of storage for such variables, followed by the generation of code to
reference them. The same principles can also be applied to other types of structured variables.
Arrays: Consider Pascal array declarations.
(i) Single-dimension array:
A : ARRAY [ 1 .. 10 ] OF INTEGER
If each integer variable occupies one word of memory, then we require 10 words of
memory to store this array. In general, an array declaration
ARRAY [ l .. u ] OF INTEGER
requires ( u - l + 1 ) words of memory.
(ii) Two-dimension array:
B : ARRAY [ 0 .. 3, 1 .. 3 ] OF INTEGER
In this declaration the total memory required is ( 3 - 0 + 1 ) = 4 rows times ( 3 - 1 + 1 ) = 3 columns,
i.e. 4 x 3 = 12 words of memory.
In general, ARRAY [ l1 .. u1, l2 .. u2 ] OF INTEGER requires ( u1 - l1 + 1 ) * ( u2 - l2 + 1 ) memory
words.
The data can be stored in memory in two different ways: row-major and column-major order.
When all array elements that have the same value of the first subscript are stored in contiguous
locations, this is called row-major order, shown in Fig. 30(a). Another way of looking at this is to
scan the words of the array in sequence and observe the subscript values: in row-major order, the
rightmost subscript varies most rapidly.
Fig. 30(b) shows the column-major way of storing the data in memory. All elements that
have the same value of the second subscript are stored together; this is called column-major order. In
other words, in column-major order the leftmost subscript varies most rapidly.
To refer to an element, we must calculate the address of the referenced element relative to
the base address of the array. The compiler generates code to place this relative address in an index
register; indexed addressing then makes it easy to access the desired array element.
(1) One-Dimensional Array: On a SIC machine, the address of A[6] is calculated as the
starting address of the data plus the size of each element times the number of preceding elements.
Assuming the starting address is 1000H, each element occupies 3 bytes on SIC, and 5 elements
precede A[6], the address of A[6] is 1000 + 3 * 5 = 1015. In general, for A : ARRAY [ l .. u ]
OF INTEGER, if each array element occupies W bytes of storage and the value of the subscript is S,
then the relative address of the referenced element A[S] is given by W * ( S - l ).
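The address calculation above can be sketched as follows (the function name is invented, and the 3-byte default matches the SIC word size assumed in the text):

```python
def relative_address(s, l, w=3):
    """Relative address of A[s] for A: ARRAY [l..u], w bytes per element
    (3 bytes per word on SIC). Implements W * (S - l)."""
    return w * (s - l)

# A: ARRAY [1..10] OF INTEGER starting at address 1000 on SIC:
base = 1000
print(base + relative_address(6, 1))  # address of A[6]: 1000 + 3 * 5 = 1015
```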
The code generation to perform such a calculation is shown in Fig. 31 for
A : ARRAY [ 1 .. 10 ] OF INTEGER
The notation A[i2] in quadruple 3 specifies that the generated machine code should refer to A
using indexed addressing, after having placed the value of i2 in the index register.
(2) Multi-Dimensional Array: For multi-dimensional arrays we assume row-major order. To access
element B[ 2, 3 ] of a matrix B : ARRAY [ 0 .. 3, 1 .. 6 ], we must skip over two complete rows
(rows 0 and 1) before arriving at the beginning of row 2. Each row contains 6 elements, so we skip
6 x 2 = 12 array elements. We must then skip over the first two elements of row 2 to arrive at
B[ 2, 3 ]. This makes a total of 12 + 2 = 14 elements between the beginning of the array and element
B[ 2, 3 ]. If each element occupies 3 bytes, as on SIC, then B[ 2, 3 ] is located at relative address
14 x 3 = 42 within the array.
Generally, a two-dimensional array declaration can be written as
B : ARRAY [ l1 .. u1, l2 .. u2 ] OF INTEGER
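The general row-major calculation can be sketched as follows; the function name is invented, and the example uses an array with rows of 6 elements indexed from 0 and 1, matching the figures worked above (14 elements skipped, 14 x 3 = 42):

```python
def row_major_offset(s1, s2, l1, l2, u2, w=3):
    """Relative address of B[s1, s2] for B: ARRAY [l1..u1, l2..u2] stored in
    row-major order. u1 is not needed: row-major skips (s1 - l1) whole rows
    of (u2 - l2 + 1) elements, then (s2 - l2) elements within the row."""
    row_len = u2 - l2 + 1
    return w * ((s1 - l1) * row_len + (s2 - l2))

# B: ARRAY [0..3, 1..6] OF INTEGER with 3-byte SIC words:
print(row_major_offset(2, 3, 0, 1, 6))  # (2*6 + 2) * 3 = 14 * 3 = 42
```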
Code Generation for Two Dimensional Array
The symbol-table entry for an array usually specifies the following:
1. The type of the elements in the array
2. The number of dimensions declared
3. The lower and upper limit for each subscript.
This information is sufficient for the compiler to generate the code required for array
references. In some languages, like FORTRAN 90, the values of ROWS and COLUMNS
are not known at compilation time, so the compiler cannot directly generate such code. Instead, the
compiler creates a descriptor, called a dope vector, for the array. The descriptor includes space
for storing the lower and upper bounds for each array subscript. When storage is allocated for
the array, the values of these bounds are computed and stored in the descriptor. The
generated code for an array reference uses the values from the descriptor to calculate
relative addresses as required. The descriptor may also include the number of dimensions of
the array, the type of the array elements and a pointer to the beginning of the array. This
information can be useful if the allocated array is passed as a parameter to another procedure.
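A dope vector holding the fields just described might look like the following minimal sketch. The class and field names are invented for illustration, and the bound values are made up; in practice they would be filled in when storage is allocated.

```python
# A minimal sketch of a dope vector (array descriptor): per-dimension bounds,
# element type, and a pointer to the start of the allocated array.
from dataclasses import dataclass

@dataclass
class DopeVector:
    bounds: list        # one (lower, upper) pair for each subscript
    elem_type: str      # type of the array elements
    base: int           # address of the beginning of the array

    @property
    def ndims(self):
        return len(self.bounds)

    def size_in_elements(self):
        # Product of (upper - lower + 1) over all dimensions.
        n = 1
        for lo, hi in self.bounds:
            n *= hi - lo + 1
        return n

# Filled in at run time, once ROWS and COLUMNS are known (values made up):
dv = DopeVector(bounds=[(1, 4), (1, 5)], elem_type="INTEGER", base=2000)
print(dv.ndims, dv.size_in_elements())  # 2 dimensions, 4 * 5 = 20 elements
```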
Code Generation For Array References
For example, FORTRAN 90 provides dynamic arrays. Using this feature, a two-dimensional array
could be declared as
INTEGER, ALLOCATABLE, ARRAY (:,:) :: MATRIX
This specifies that MATRIX is an array of integers that can be allocated dynamically. The
allocation can be accomplished by a statement like
ALLOCATE (MATRIX(ROWS,COLUMNS))
where the variables ROWS and COLUMNS have previously been assigned values, since their values
are not known at compilation time. In the compilation of other structured variables such as records,
strings and sets, the same kinds of storage allocation are required. The compiler must store
information concerning the structure of the variable, use that information to generate code to access
components of the structure, and construct a descriptor for situations in which the required
information is not known at compilation time.
MACHINE-INDEPENDENT CODE OPTIMIZATION
One important source of code optimization is the elimination of common subexpressions.
These are subexpressions that appear at more than one point in the program and that compute the
same value. Consider
X, Y : ARRAY [ 0 .. 10, 1 .. 10 ] OF INTEGER
....FOR I := 1 TO 10 DO
X[ I, 2 * J - 1 ] := Y[ I, 2 * J ]
The subexpression 2 * J is calculated twice. An optimizing compiler should generate code
so that the multiplication is performed only once and the result is used in both places. Common
subexpressions are usually detected through the analysis of an intermediate form of the program.
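Detection of common subexpressions over quadruples can be sketched as follows. This is a toy illustration, not a full optimizer: it assumes each quadruple is an (op, arg1, arg2, result) tuple and that no operand is redefined within the basic block.

```python
# Toy common-subexpression elimination over a single basic block of quadruples.
def eliminate_common_subexpressions(quads):
    seen = {}     # (op, arg1, arg2) -> result of the first occurrence
    rename = {}   # result of a deleted quadruple -> surviving result
    out = []
    for op, a1, a2, res in quads:
        # Apply any substitutions already decided (e.g. i3 for i10).
        a1, a2 = rename.get(a1, a1), rename.get(a2, a2)
        key = (op, a1, a2)
        if key in seen:
            rename[res] = seen[key]   # duplicate: delete it, remember rename
        else:
            seen[key] = res
            out.append((op, a1, a2, res))
    return out

quads = [("*", "#2", "J", "i1"), ("-", "i1", "#1", "i2"),
         ("*", "#2", "J", "i3")]           # 2 * J computed twice
print(eliminate_common_subexpressions(quads))
```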
CODE OPTIMIZATION BY REDUCTION IN STRENGTH OF OPERATIONS
The operand J is not changed in value between quadruples 5 and 12, and it is not possible to
reach quadruple 12 without passing through quadruple 5 first, because the quadruples are part of the
same basic block. Therefore, quadruples 5 and 12 compute the same value. This means we can
delete quadruple 12 and replace any reference to its result (i10) with a reference to i3, the result of
quadruple 5. This transformation eliminates the duplicate calculation of 2 * J, which we identified
previously as a common subexpression in the source statement.
After the substitution of i3 for i10, quadruples 6 and 13 are the same except for the name of
the result. Hence quadruple 13 can be removed and i4 substituted for i11 wherever it is used.
Similarly, quadruples 10 and 11 can be removed because they are equivalent to quadruples 3 and 4.
STORAGE ALLOCATION
All program-defined variables and temporary variables, including the location used to save
the return address, use a simple type of storage assignment called static allocation.
When procedures are called recursively, static allocation cannot be used. This is explained
with an example. Fig. 38(a) shows the operating system calling the program MAIN. The return
address from register L is stored at a static memory location RETADR within MAIN.
MAIN has called the procedure SUB, and the return address for the call has been stored at a
fixed location within SUB (invocation 2). If SUB now calls itself recursively, a problem occurs:
SUB stores the return address for invocation 3 into RETADR from register L, destroying the return
address for invocation 2. As a result, there is no possibility of ever making a correct return to
MAIN.
There is also no provision for saving register contents or variable values. When the recursive
call is made, invocation 3 of SUB may change variables whose previous values are still needed by
invocation 2 after the return from the recursive call. Hence it is necessary to preserve the previous
values of any variables used by SUB, including parameters, temporaries, return addresses, and
register save areas, when a recursive call is made. This is accomplished with a dynamic storage
allocation technique.
In this technique, each procedure call creates an activation record that contains storage for
all the variables used by the procedure. If the procedure is called recursively, another activation
record is created. Each activation record is associated with a particular invocation of the procedure,
not with the procedure itself. An activation record is not deleted until a return has been made from
the corresponding invocation.
Recursive invocation of a procedure using static storage allocation
Activation records are typically allocated on a stack, with the current record at the top of the stack.
The procedure MAIN has been called; its activation record appears on the stack. The base register B
has been set to indicate the starting address of this current activation record. The first word in an
activation record would normally contain a pointer PREV to the previous record on the stack. Since
this record is the first, the pointer value is null. The second word of the activation record contains a
pointer NEXT to the first unused word of the stack, which will be the starting address for the next
activation record created. The third word contains the return address for this invocation of the
procedure, and the remaining words contain the values of variables used by the procedure.
When a procedure returns to its caller, the current activation record (which corresponds to the most
recent invocation) is deleted. The pointer PREV in the deleted record is used to reestablish the
previous activation record as the current one, and execution continues. This shows the stack as it
would appear after SUB returns from the recursive call: register B has been reset to point to the
activation record for the previous invocation of SUB.
The return address and all the variable values in this activation record are exactly the same as
they were before the recursive call. This technique is often referred to as automatic allocation of
storage to distinguish it from other types of dynamic allocation that are under the control of the
programmer.
When automatic allocation is used, the compiler must generate code for references to
variables using some sort of relative addressing. In our example the compiler assigns to each
variable an address that is relative to the beginning of the activation record, instead of an actual
location within the object program.
The address of the current activation record is, by convention, contained in register B, so a
reference to a variable is translated as an instruction that uses base relative addressing. The
displacement in this instruction is the relative address of the variable within the activation record.
The compiler must also generate additional code to manage the activation records themselves. At the
beginning of each procedure there must be code to create a new activation record, linking it to the
previous one and setting the appropriate pointers. This code is often called a prologue for the
procedure. At the end of the procedure, there must be code to delete the current activation record,
resetting pointers as needed. This code is often called an epilogue.
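The prologue and epilogue behaviour described above can be illustrated with a small simulation. Class and field names are invented, and a real compiler would of course emit machine code for these operations rather than run them directly; the sketch only shows why invocation 2's state survives the recursive call.

```python
# Sketch of automatic storage allocation: each call pushes an activation
# record (prologue) and each return pops it (epilogue). PREV links records
# as in the text; the Python list plays the role of the stack itself.
class ActivationStack:
    def __init__(self):
        self.records = []              # current activation record on top

    def prologue(self, proc, ret_addr, variables):
        record = {"proc": proc, "ret": ret_addr, "vars": dict(variables),
                  "prev": self.records[-1] if self.records else None}
        self.records.append(record)    # create and link a new record
        return record

    def epilogue(self):
        record = self.records.pop()    # delete the current record
        return record["ret"]           # caller resumes at this address

stack = ActivationStack()
stack.prologue("MAIN", ret_addr=0, variables={})
stack.prologue("SUB", ret_addr=100, variables={"n": 2})
stack.prologue("SUB", ret_addr=200, variables={"n": 1})  # recursive call
print(stack.epilogue())                # 200: returns from invocation 3
print(stack.records[-1]["vars"]["n"])  # 2: invocation 2's value is intact
```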
BLOCK-STRUCTURED LANGUAGES
A block is a portion of a program that has the ability to declare its own identifiers. This
definition of a block also fits units such as procedures and functions.
Each procedure corresponds to a block. Note that blocks can be nested within other blocks.
Example: procedures B and D are nested within procedure A, and procedure C is nested within
procedure B. Each block may contain a declaration of variables. A block may also refer to variables
that are defined in any block that contains it, provided the same names are not redefined in the inner
block. Variables cannot be used outside the block in which they are declared.
In compiling a program written in a block-structured language, it is convenient to number
the blocks. As the beginning of each new block is recognized, it is assigned the next block number in
sequence. The compiler can then construct a table that describes the block structure. The block-level
entry gives the nesting depth for each block: the outermost block has level 1, and each inner block
has a level one greater than that of the surrounding block.
A NEW or MALLOC statement would be translated into a request to the operating system for an
area of storage of the required size. Another method is to handle the required allocation through a
run-time support procedure associated with the compiler. With this method, a large block of free
storage called a heap is obtained from the operating system at the beginning of the program.
Allocations of storage from the heap are managed by the run-time procedure. In some systems, it is
not even necessary for the programmer to free storage explicitly. Instead, a run-time garbage
collection procedure scans the pointers in the program and reclaims areas from the heap that are no
longer being used. Dynamic storage allocation, as discussed in this section, provides another
example of delayed binding.
Nesting of blocks in a source program
When a reference to an identifier appears in the source program, the compiler must first
check the symbol table for a definition of that identifier by the current block. If no such definition is
found, the compiler looks for a definition by the block that surrounds the current one, then by the
block that surrounds that, and so on.
If the outermost block is reached without finding a definition of the identifier, then the
reference is an error. The search process just described can easily be implemented within a symbol
table that uses hashed addressing. The hashing function is used to locate one definition of the
identifier. The chain of definitions for that identifier is then searched for the appropriate entry.
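The search just described can be sketched as follows, with one chain of definitions per identifier, each entry tagged with its defining block number. The data values and function name are made up for illustration; a real symbol table would hash the name to reach its chain.

```python
# Scope search over a hashed symbol table: each name maps to a chain of
# (defining-block, definition) entries; `enclosing` maps each block number
# to the block that surrounds it (None for the outermost block).
def lookup(table, name, block, enclosing):
    chain = table.get(name, [])              # all definitions of this name
    b = block
    while b is not None:
        for defn_block, defn in chain:
            if defn_block == b:
                return defn                  # innermost visible definition
        b = enclosing[b]                     # try the surrounding block
    raise NameError(f"{name} is undefined")  # outermost block reached

table = {"X": [(1, "integer, block 1"), (3, "real, block 3")]}
enclosing = {1: None, 2: 1, 3: 2}            # block 3 inside 2 inside 1
print(lookup(table, "X", 3, enclosing))      # block 3's own definition wins
print(lookup(table, "X", 2, enclosing))      # falls back to block 1's
```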
There are other symbol-table organizations that store the definitions of identifiers according
to the nesting of the blocks that define them. This kind of structure can make the search for the
proper definition more efficient. Most block-structured languages make use of automatic storage
allocation. That is, the variables that are defined by a block are stored in an activation record that is
created each time the block is entered.
If a statement refers to a variable that is declared within the current block, this variable is
present in the current activation record, so it can be accessed in the usual way. However, it is also
possible for a statement to refer to a variable that is declared in some surrounding block. In that case,
the most recent activation record for that block must be located to access the variable. One common
method for providing access to variables in surrounding blocks uses a data structure called a display.
Use of display for procedures
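The idea of a display can be sketched as follows. Field names and values are invented; a real display holds the addresses of activation records, indexed by block nesting level, so that a variable of any surrounding block is reached in one step instead of chasing PREV links.

```python
# Sketch of a display: display[level] refers to the most recent activation
# record for the block at that nesting depth.
def access(display, level, var):
    """Fetch a variable from the record for the block at `level`."""
    return display[level]["vars"][var]

# Activation records for blocks at nesting levels 1 and 2 (made-up contents):
rec_main = {"vars": {"A": 10}}
rec_sub = {"vars": {"B": 20}}
display = {1: rec_main, 2: rec_sub}

print(access(display, 2, "B"))  # variable declared in the current block
print(access(display, 1, "A"))  # variable of a surrounding block, one lookup
```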
COMPILER DESIGN OPTIONS
Compiler design options are briefly discussed in this section. Compilers may be divided into
one-pass and multi-pass designs.
DIVISION INTO PASSES
In this design, the parsing process drives the compiler: the lexical scanner is called when
the parser needs another input token, and a code-generation routine is invoked as the parser
recognizes each language construct. The code-optimization techniques discussed earlier cannot be
applied in full to a one-pass compiler without intermediate code generation. A one-pass compiler is,
however, efficient at generating object code.
A one-pass compiler cannot be used to translate all languages. FORTRAN and
Pascal programs have declarations of variables at the beginning of the program, and any
variable that is not declared is assigned characteristics by default.
A one-pass compiler can fix up a forward-reference jump instruction without problems, as in a
one-pass assembler, but it is difficult to do so if the declaration of an identifier appears after the
identifier has been used, as is allowed in some programming languages.
Example: X := Y * Z
If all the variables X, Y and Z are of type INTEGER, the object code for this statement might
consist of a simple integer multiplication followed by storage of the result. If the variables are a
mixture of REAL and INTEGER types, one or more conversion operations will need to be included
in the object code, and floating-point arithmetic instructions may be used. Obviously the compiler
cannot decide what machine instructions to generate for this statement unless information about the
operands is available. The statement may even be illegal for certain combinations of operand types.
Thus a language that allows forward references to data items cannot be compiled in one pass.
Some programming languages require more than two passes.
Example: ALGOL 68 requires at least 3 passes.
There are a number of factors that should be considered in deciding between one pass and multi pass
compiler designs.
(1) One-Pass Compilers:
Speed of compilation is often considered important. Computers running student jobs tend to spend
a large amount of time performing compilations; the resulting object code is usually executed only
once or twice for each compilation, and these test runs are normally very short. In such an
environment, an improvement in the speed of compilation can lead to significant benefits in system
performance and job turnaround time.
(2) Multi-Pass Compilers:
If programs are executed many times for each compilation, or if they process large amounts of
data, then speed of execution becomes more important than speed of compilation. In such a case, we
might prefer a multi-pass compiler design that could incorporate sophisticated code-optimization
techniques.
Multi-pass compilers are also used when the amount of memory, or other system
resources, is severely limited. The requirements of each pass can be kept smaller if the work of
compilation is divided into several passes.
Other factors may also influence the design of the compiler. If a compiler is divided into
several passes, each pass becomes simpler and therefore, easier to understand, read and test.
Different passes can be assigned to different programmers and can be written and tested in parallel,
which shortens the overall time required for compiler construction.
INTERPRETERS
An interpreter processes a source program written in a high-level language. The main
difference between a compiler and an interpreter is that an interpreter executes a version of the
source program directly, instead of translating it into machine code.
An interpreter performs lexical and syntactic analysis functions just like a compiler and then
translates the source program into an internal form. The internal form may, for example, be a
sequence of quadruples.
After translating the source program into an internal form, the interpreter executes the
operations specified by the program. During this phase, an interpreter can be viewed as a set of
subroutines; the internal form of the program drives the execution of these subroutines.
The real advantage of an interpreter over a compiler, however is in the debugging facilities
that can easily be provided. The symbol table, source line number, and other information from the
source program are usually retained by the interpreter. During execution, these can be used to
produce symbolic dumps of data values, traces of program execution related to the source statements
etc.
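The execution phase described above can be sketched as a toy interpreter over quadruples. The operator set, the quadruple layout, and the use of a dictionary as the variable environment are all assumptions made for illustration.

```python
# Toy interpreter executing quadruples (op, arg1, arg2, result) directly.
def interpret(quads, env):
    for op, a1, a2, res in quads:
        v1 = env.get(a1, a1)      # an operand is a variable or a constant
        v2 = env.get(a2, a2)
        if op == "+":
            env[res] = v1 + v2
        elif op == "*":
            env[res] = v1 * v2
        elif op == ":=":
            env[res] = v1         # plain assignment
    return env

# X := Y * Z + 1 translated into quadruples:
quads = [("*", "Y", "Z", "i1"), ("+", "i1", 1, "i2"), (":=", "i2", None, "X")]
print(interpret(quads, {"Y": 6, "Z": 7})["X"])  # 6 * 7 + 1 = 43
```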
Most programming languages can be either compiled or interpreted successfully. However,
some languages are particularly well suited to the use of an interpreter. Compilers usually generate
calls to library routines to perform functions such as I/O and complex conversion operations. In such
cases, an interpreter might be preferred because of its speed of translation: most of the execution
time for the program would be consumed by the standard library routines, and these routines would
be the same regardless of whether a compiler or an interpreter is used.
In some languages the type of a variable can change during the execution of a program.
Dynamic scoping may also be used, in which the variables referred to by a function or subroutine are
determined by the sequence of calls made during execution, not by the nesting of blocks in the
source program. It is difficult to compile such languages efficiently while allowing for dynamic
changes in the types of variables and the scope of names. These features can be more easily handled
by an interpreter that provides delayed binding of symbolic variable names to data types and
locations.
P-CODE COMPILERS
P-Code compilers also called bytecode compilers are very similar in concept to interpreters.
A P-code compiler, intermediate form is the machine language for hypothetical computers, often
called pseudo-machine or P-machine. The source program is compiled, with the resulting object
program being in P-code. This P-code program is then read and executed under the control of a P-
code interpreter.
The main advantage of this approach is portability of software. It is not necessary for the
compiler to generate different code for different computers, because the P-code object program can
be executed on any machine that has a P-code interpreter. Even the compiler itself can be transported
if it is written in the language that it compiles. To accomplish this, the source version of the compiler
is compiled into P-code; this P-code can then be interpreted on another machine. In this way a
P-code compiler can be used without modification on a wide variety of systems if a P-code
interpreter is written for each different machine.
The design of a P-machine and the associated P-code is often related to the requirements of
the language being compiled. For example, the P-code for a Pascal compiler might include single P-
instructions that:
Perform array subscript calculations
Handle the details of procedure entry and exit
Perform elementary operations on sets
Translation and execution using a P-code compiler
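A P-machine can be sketched as a small stack-based interpreter. The instruction names below are invented and far simpler than any real P-code instruction set; the point is only that the same P-code program runs on any machine that provides this interpreter.

```python
# A minimal hypothetical P-machine: a stack-based interpreter for a few
# invented P-code instructions.
def run(pcode):
    stack = []
    for instr in pcode:
        op = instr[0]
        if op == "PUSH":
            stack.append(instr[1])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack.pop()

# Object program for 2 * 3 + 4, runnable wherever this interpreter exists:
program = [("PUSH", 2), ("PUSH", 3), ("MUL",), ("PUSH", 4), ("ADD",)]
print(run(program))  # 10
```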
This simplifies the code-generation process, leading to a smaller and more efficient compiler. The P-
code object program is often much smaller than a corresponding machine-code program, which is
particularly useful on machines with severely limited memory size. However, the interpretive
execution of a P-code program may be much slower than the execution of equivalent machine code.
Many P-code compilers are designed for a single user running on a dedicated micro-computer
system. In that case, the speed of execution may be relatively insignificant, because the limiting
factor in system performance may be the response time and "think time" of the user.
If execution speed is important, some P-code compilers support the use of machine-
language subroutines. By rewriting a small number of commonly used routines in machine language,
rather than P-code, it is often possible to improve the performance. Of course, this approach
sacrifices some of the portability associated with the use of P-code compilers.
COMPILER-COMPILERS
A compiler-compiler is a software tool that can be used to help with the task of compiler
construction. Such tools are also often called compiler generators or translator-writing systems.
The compiler writer provides a description of the language to be translated. This description
may consist of a set of lexical rules for defining tokens and a grammar for the source language.
Some compiler-compilers use this information to generate a scanner and a parser directly. Others
create tables for use by standard table-driven scanning and parsing routines that are supplied by the
compiler-compiler.
Automated compiler construction using a compiler-compiler
Difference between assembler, compiler and interpreter
Assembler:
A computer will not understand any program written in a language other than its machine
language. Programs written in other languages must therefore be translated into the machine
language, and such translation is performed with the help of software. A program which translates an
assembly language program into a machine language program is called an assembler. If an
assembler runs on a computer and produces machine code for that same computer, then it is called a self
assembler or resident assembler. If an assembler runs on one computer and produces machine
code for a different computer, it is called a cross assembler.
Assemblers are further divided into two types: one-pass and two-pass
assemblers. A one-pass assembler assigns memory addresses to the
variables and translates the source code into machine code in a single pass. A two-
pass assembler reads the source code twice: in the first pass, it reads all the
variables and assigns them memory addresses; in the second pass, it reads the source code and
translates it into object code.
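The two passes can be sketched as follows for a made-up machine. The opcode values, the fixed 3-byte instruction format, and the triple-based source representation are all invented for illustration; a real assembler also handles directives, literals and error checking.

```python
# Toy two-pass assembler: pass 1 builds the symbol table, pass 2 translates.
OPCODES = {"LDA": 0x00, "ADD": 0x18, "STA": 0x0C}   # invented opcode values

def assemble(source, start=0x1000):
    """source: list of (label-or-None, mnemonic, operand) triples."""
    symtab, loc = {}, start
    for label, _, _ in source:          # pass 1: assign addresses to labels
        if label:
            symtab[label] = loc
        loc += 3                        # fixed 3-byte instruction format
    return symtab, [(OPCODES[op], symtab.get(arg, arg))
                    for _, op, arg in source]   # pass 2: generate object code

src = [("FIRST", "LDA", "FIRST"),
       (None,    "ADD", "LAST"),    # forward reference, resolved by pass 1
       ("LAST",  "STA", "FIRST")]
symtab, obj = assemble(src)
print(symtab)                       # {'FIRST': 4096, 'LAST': 4102}
print(obj)
```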
Compiler:
A compiler is a program which translates a high-level language program into a machine language
program. A compiler is more intelligent than an assembler: it checks all kinds of limits, ranges,
errors and so on. But its translation takes more time and occupies a larger part of the memory,
because a compiler goes through the entire program and then translates the entire program into
machine code. If a compiler runs on a computer and produces machine code for the same
computer, it is known as a self compiler or resident compiler. On the other hand, if a compiler
runs on one computer and produces machine code for another computer, it is known as a cross
compiler.
Interpreter:
An interpreter is a program which translates statements of a program into machine code one
statement at a time. It reads one statement of the program, translates it and executes it; then it reads
the next statement, translates it and executes it, proceeding in this way until all the statements are
translated and executed. A compiler, on the other hand, goes through the entire program and then
translates the entire program into machine code, and a compiled program typically runs 5 to 25 times
faster than an interpreted one.
With a compiler, the machine code is saved permanently for future use; the machine code
produced by an interpreter is not saved. An interpreter is a small program compared to a compiler.
It occupies less memory space, so it can be used on a smaller system which has limited memory
space.
SUMMARY:
Machine-independent compiler features include methods for handling structured variables such as
arrays, along with the problems involved in compiling a block-structured language and some
possible solutions.
Structured variables are arrays, records, strings and sets.
All program-defined variables and temporary variables, including the location used to save the
return address, use a simple type of storage assignment called static allocation.
When procedures are called recursively, static allocation cannot be used.
A block is a portion of a program that has the ability to declare its own identifiers. This definition
of a block also fits units such as procedures and functions.
A NEW or MALLOC statement would be translated into a request to the operating system for an
area of storage of the required size.
Compilers may be divided into one-pass and multi-pass designs.
An interpreter processes a source program written in a high-level language.
The main difference between compiler and interpreter is that interpreters execute a version of
the source program directly, instead of translating it into machine code.
In P-code compilers (also called bytecode compilers), the intermediate form is the machine
language for a hypothetical computer, often called a pseudo-machine or P-machine.
A compiler-compiler is a software tool that can be used to help with the task of compiler
construction. Such tools are also called compiler generators or translator-writing systems.
UNIT IV
Introduction: Definition of DOS – History of DOS – Definition of Process - Process states - process
states transition – Interrupt processing – interrupt classes - Storage Management Real Storage: Real
storage management strategies – Contiguous versus
Non-contiguous storage allocation – Single User Contiguous Storage allocation- Fixed partition
multiprogramming – Variable partition multiprogramming. Virtual Storage: Virtual storage
management strategies – Page replacement strategies – Working sets – Demand paging – page size.
INTRODUCTION
DEFINITION OF DOS
“An Operating System can be defined as a program, implemented in either software or
firmware, that makes the hardware usable. The Disk Operating System can also be defined as
the software that controls the hardware”.
Hardware provides "raw computing power". The operating system makes this computing power
conveniently available to users, and it manages the hardware carefully to achieve good
performance.
Operating systems are primarily resource managers. The main resource that the OS manages
is computer hardware, in the form of processors, storage, input/output devices, communication
devices, and data. Operating systems perform many functions, such as implementing the user
interface, sharing hardware among users, allowing users to share data among themselves, preventing
users from interfering with one another, scheduling resources among users, facilitating I/O,
recovering from errors, accounting for resource usage, facilitating parallel operations, organizing
data for secure and rapid access, and handling network communications.
The basic functions of an operating system are as follows:
File management
Working with files, such as picking up and preparing to use a tool like a calculator.
Configuration of the working environment.
The operating system translates the computer language (refer to the hardware lesson for a
review) into information. One method of presenting this information is via a graphical user interface
(GUI). Elements of a GUI include such things as windows, menus, buttons, scroll bars, icons and the
desktop. The desktop is the primary GUI generated by the operating system.
BLOCK DIAGRAM OF OPERATING SYSTEM
OPERATING SYSTEM OBJECTIVES AND FUNCTIONS
An operating system is a program that controls the execution of application programs
and acts as an interface between the user of a computer and the computer hardware. An operating
system can be thought of as having three objectives or performing three functions.
1) Convenience:
An operating system makes a computer more convenient to use.
2) Efficiency:
An operating system allows the computer system resources to be used in an efficient manner.
3) Ability to evolve:
An operating system should be constructed in such a way as to permit the effective
development, testing and introduction of new system functions without at the same time interfering
with service.
HISTORY OF DOS
SUBHEADINGS
THE 1940’s AND 1950’s.
THE 1960’s.
THE EMERGENCE OF A NEW FIELD: SOFTWARE ENGINEERING.
THE 1980’s.
THE 1990’s.
UNIX.
The 1940's and the 1950's:
Operating systems have evolved over the last 40 years through a number of distinct phases or
generations. In the 1940's, the earliest electronic digital computers had no operating system.
On machines of the time, programs were entered one bit at a time on rows of mechanical switches.
Later, machine language programs were entered on punched cards, and assembly languages were
developed to speed the programming process.
The General Motors research laboratories implemented the first operating system in the early
1950's for their IBM 701. The systems of the 1950's generally ran only one job at a time and
smoothed the transition between jobs to get maximum utilization of the computer system. These
were called single-stream batch processing systems because programs and data were submitted in
groups or batches.
The 1960’s:
The systems of the 1960's were also batch processing systems, but they were able to take
better advantage of the computer's resources by running several jobs at once. They contained many
peripheral devices such as card readers, card punches, printers, tape drives and disk drives. Any one
job rarely utilized all of a computer's resources effectively. Operating system designers observed that
when one job was waiting for an I/O operation to complete before the job could continue using the
processor, some other job could use the idle processor.
Similarly, when one job was using the processor, other jobs could be using the various input/
output devices. In fact, running a mixture of diverse jobs appeared to be the best way to optimize
computer utilization. So operating system designers developed the concept of multiprogramming, in
which several jobs are in main memory at once; a processor is switched from job to job as needed to
keep several jobs advancing while keeping the peripheral devices in use.
More advanced operating systems were developed to service multiple interactive users at
once. Timesharing systems were developed to multiprogram large numbers of simultaneous
interactive users. Many of the time-sharing systems of the 1960's were multimode systems,
supporting batch processing as well as real-time applications. Real-time systems are characterized by
supplying immediate response.
The key time-sharing development efforts of this period included the CTSS system
developed at MIT, the TSS system developed by IBM, and the Multics system developed at MIT as
the successor to CTSS. Turnaround time, that is, the time between submission of a job and the return
of results, was reduced to minutes or even seconds.
THE EMERGENCE OF A NEW FIELD: SOFTWARE ENGINEERING
In the operating systems developed during the 1960's, endless hours and countless dollars were
spent detecting and removing bugs that should never have entered the systems in the first place.
Much attention was given to these problems of constructing software systems. This spawned the
field of software engineering, which is concerned with developing a disciplined and structured
approach to the construction of reliable, understandable and maintainable software.
The 1980’s:
The 1980's was the decade of the personal computer and the workstation. Individuals could
have their own dedicated computers for performing the bulk of their work, and they used
communication facilities for transmitting data between systems. Computing was distributed to the
sites at which it was needed rather than bringing the data to be processed to some central, large-
scale computer installation. The key was to transfer information between computers in computer
networks. E-mail, file transfer, and remote database access applications, and the client/server model,
became widespread.
The 1990’s and beyond:
In the 1990's, distributed computing came into use, in which computations are parallelized into
sub-computations that can be executed on other processors in multiprocessor computers and in
computer networks. Networks are dynamically configured as new devices and software are
added or removed.
When a new server is added, it makes itself known to the network: the server tells the network
about its capabilities, billing policies, accessibility and so forth. Clients need not know all the details
of the network; instead, they contact locating brokers for the services provided by servers. The
locating brokers know which servers are available, where they are, and how to access them. This
kind of connectivity will be facilitated by open system standards and protocols.
Computing is destined to become very powerful and very portable. In recent years, laptop
computers have been introduced that enable people to carry their computers with them wherever
they go. With the development of OSI communication protocols and the integrated services digital
network (ISDN), people will be able to communicate and transmit data worldwide with high
reliability.
UNIX
The UNIX operating system was originally designed in the late 1960's, and its elegance attracted
researchers in the universities and industry. UNIX is the only operating system that has been
implemented on computers ranging from micros to supercomputers.
DEFINITIONS OF “PROCESS”
The term "Process" was first used by the designers of the Multics system in the 1960s.
Some definitions of process are as follows:
A program in execution.
An asynchronous activity.
The “animated spirit” of a procedure.
The “locus of control” of a procedure in execution.
That which is manifested by the existence of a “process control block” in the operating
system.
That entity to which processors are assigned.
The “dispatchable” unit.
PROCESS STATES
A process goes through a series of discrete process states. Various events can cause a process
to change states.
A process is said to be running (i.e., in the running state) if it currently has the CPU. A
process is said to be ready (i.e., in the ready state) if it could use a CPU if one were available. A
process is said to be blocked (i.e., in the blocked state) if it is waiting for some event to happen (such
as an I/O completion event) before it can proceed. For example, in a single-CPU system, only
one process can run at a time, but several processes may be ready, and several may be blocked. So
we establish a ready list of ready processes and a blocked list of blocked processes. The ready list is
maintained in priority order so that the next process to receive the CPU is the first process on the list.
PROCESS STATE TRANSITIONS(***5m)
SUBHEADINGS
INTRODUCTION.
DIAGRAM OF PROCESS STATE TRANSITIONS.
STATE TRANSITION DEFINITIONS.
THE PROCESS CONTROL BLOCK.
When a job is admitted to the system, a corresponding process is created and normally inserted at
the back of the ready list. The process gradually moves to the head of the ready list as the processes
before it complete their turns at using the CPU. When the process reaches the head of the list, and
when the CPU becomes available, the process is given the CPU and is said to make a state
transition from the ready state to the running state. The assignment of the CPU to the first process on
the ready list is called dispatching, and is performed by a system entity called the dispatcher. We
indicate this transition as follows:
Dispatch (process name): ready --> running.
To prevent any one process from monopolizing the system, the operating system sets a
hardware interrupting clock (or interval timer) to allow a user to run for a specific time interval
or quantum. If the process does not leave the CPU before the time interval expires, the interrupting
clock generates an interrupt, causing the operating system to regain control. The operating system
then makes the previously running process ready, and makes the first process on the ready list
running.
These state transitions are indicated as
Timerunout (processname) : running --> ready
and dispatch (processname) : ready --> running.
If a running process initiates an input/output operation before its quantum expires, the running
process voluntarily leaves the CPU. This state transition is
Block (processname): running --> blocked.
When an input/output operation (or some other event the process is waiting for) completes. The
process makes the transition from the blocked state to the ready state. The transition is
Wakeup (processname): blocked --> ready.
So the possible state transitions can be sequenced as:
Dispatch (processname): ready --> running.
Timerunout (processname): running --> ready.
Block (processname): running --> blocked.
Wakeup (processname): blocked --> ready.
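The four transitions above can be sketched as a small transition table (an illustrative sketch, not from the text; the event and state names follow the transitions listed above):

```python
# Transition table keyed by (event, current state); a missing key is an
# illegal transition (e.g. you cannot "wakeup" a running process).
TRANSITIONS = {
    ("dispatch", "ready"): "running",
    ("timerunout", "running"): "ready",
    ("block", "running"): "blocked",
    ("wakeup", "blocked"): "ready",
}

def transition(state, event):
    """Return the next state, or raise if the event is illegal in this state."""
    try:
        return TRANSITIONS[(event, state)]
    except KeyError:
        raise ValueError(f"illegal transition: {event} in state {state}")

# A process that is dispatched, blocks on I/O, and is later woken up:
state = "ready"
state = transition(state, "dispatch")   # ready --> running
state = transition(state, "block")      # running --> blocked
state = transition(state, "wakeup")     # blocked --> ready
```

The table form makes the legal transitions explicit; any other (event, state) pair is rejected, which mirrors the fact that only these four transitions are defined.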
THE PROCESS CONTROL BLOCK (PCB)
The PCB is a data structure containing certain important information about the process, including:
The current state of the process.
Unique identification of the process.
Pointers to the process’s parent (i.e., the process that created this process).
Pointers to the process’s child processes (i.e., processes created by this process).
The process’s priority.
Pointers to locate the process’s memory.
Pointers to allocated resources.
A register save area.
The processor it is running on (in a multiprocessor system).
The PCB is a central store of information that allows the operating system to locate all key
information about a process. The PCB is the entity that defines a process to the operating system.
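As a hypothetical illustration, the PCB fields listed above might be modeled as a record; the field names here are assumptions made for the sketch, since the exact layout is system-specific:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PCB:
    pid: int                        # unique identification of the process
    state: str = "ready"            # current state of the process
    parent: Optional["PCB"] = None  # pointer to the process's parent
    children: List["PCB"] = field(default_factory=list)  # child processes
    priority: int = 0               # the process's priority
    memory_base: int = 0            # pointer to locate the process's memory
    resources: List[str] = field(default_factory=list)   # allocated resources
    registers: dict = field(default_factory=dict)        # register save area
    cpu: Optional[int] = None       # processor it runs on (multiprocessor)

# A parent process creating a child, linked both ways:
parent = PCB(pid=1, state="running")
child = PCB(pid=2, parent=parent)
parent.children.append(child)
```

The parent/child pointers let the operating system walk the process tree, which is how it can locate all key information about a process from its PCB.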
INTERRUPT PROCESSING (***8m)
SUBHEADINGS
DEFINITION OF INTERRUPTS.
TYPES OF INTERRUPTS.
**HARDWARE INTERRUPTS.
**SOFTWARE INTERRUPTS.
INTERRUPT PROCESS DIAGRAM.
INTERRUPT CLASSES.
**SVC (SUPERVISOR CALL) INTERRUPTS.
**I/O INTERRUPTS.
**EXTERNAL INTERRUPTS.
**RESTART INTERRUPTS.
**PROGRAM CHECK INTERRUPTS.
**MACHINE CHECK INTERRUPTS.
DEFINITION
An interrupt is an event that alters the sequence in which a processor executes instructions.
An interrupt is a dynamic event that needs prompt attention by the CPU. Usually an interrupt only
needs a short period of CPU time to serve it. After that the original process can resume its execution.
TYPES
There are two types of interrupting events:
HARDWARE INTERRUPTS.
SOFTWARE INTERRUPTS.
Hardware interrupts are those issued by I/O device controllers when they need the CPU
to process I/O data.
Software interrupts or traps are raised when the current process executes a special trap
instruction to indicate that something wrong has happened or the process needs special service from
the operating system (like performing some I/O operation).
Each type of I/O device has a special program called an interrupt handler to serve the
interrupt requests from these devices.
For all software traps, there is a special trap handler. Each type of interrupt has an associated
priority level.
A running process would only be interrupted by an interrupt source or trap of higher priority.
When the CPU is executing an interrupt handler, the interrupt handler may be further interrupted by
an interrupt source of even higher priority. The interrupt itself is generated by the hardware of the
computer system.
The main advantage of interrupt concept is that it provides a low-overhead means of gaining
the attention of the CPU. This eliminates the need for the CPU to remain busy polling to check if
the devices require the usage of the CPU.
The disadvantage of interrupt concept is that, it is also possible that the system can become
overloaded. If interrupts arrive quickly, the system may not be able to keep up with the interrupts.
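The priority rule above can be sketched as a small predicate (an illustrative sketch; the convention that larger numbers mean higher priority is an assumption):

```python
def should_preempt(current_priority, incoming_priority):
    """Interrupt the current activity only for a strictly higher-priority
    source; equal or lower priority sources must wait."""
    return incoming_priority > current_priority

# While serving a priority-2 I/O interrupt, a priority-5 machine check
# preempts the handler, but another priority-2 request must wait:
assert should_preempt(2, 5)
assert not should_preempt(2, 2)
```

The strict inequality is the point: it prevents an interrupt source from endlessly preempting its own handler, while still letting genuinely more urgent events through.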
THE INTERRUPT PROCESS
When an interrupt occurs:
The operating system gains control.
The operating system saves the state of the interrupted process. In many systems this
information is stored in the interrupted process’s Process Control Block.
The operating system analyzes the interrupt and passes control to the appropriate
routine to handle the interrupt. Today, in many systems, this is handled automatically by the
hardware.
The interrupt handler routine processes the interrupt.
The state of the interrupted process is restored.
The interrupted process executes.
An interrupt may be initiated by a running process (in which case it is called a trap and is said to be
synchronous with the operation of the process), or it may be caused by some event that may or may
not be related to the running process (in which case it is said to be asynchronous with the operation
of the process).
INTERRUPT CLASSES
There are six interrupt classes. They are
* SVC (Supervisor Call) interrupts.
These are initiated by a running process that executes an SVC instruction. An SVC is a user-
generated request for a particular system service, such as performing input/output, obtaining more
storage, or communicating with the system operator.
* I/O interrupts:
These are initiated by the input/output hardware. They signal to the CPU that the status of a
channel or device has changed. For example, they are caused when an I/O operation completes or
when an I/O error occurs.
* External interrupts:
These are caused by various events including the expiration of a quantum on an interrupting
clock or the receipt of a signal from another processor on a multiprocessor system.
* Restart interrupts:
These occur when the operator presses the restart button or when a restart signal-processor
instruction arrives from another processor on a multiprocessor system.
* Program check interrupts:
These may occur when a program's machine language instructions are executed. The problems
detected include division by zero, arithmetic overflow or underflow, data in the wrong format, an
attempt to execute an invalid operation code, an attempt to reference a memory location that does
not exist, or an attempt to reference a protected resource.
* Machine check interrupts:
These are caused by malfunctioning hardware.
STORAGE MANAGEMENT REAL STORAGE
Storage management strategies determine how a particular storage organization performs
under various policies.
*When do we get a new program to place in the memory?
*Do we get it when the system specifically asks for it, or do we attempt to anticipate the
system's requests?
*Where in main storage do we place the next program to be run?
*Do we place the program as close as possible into available memory slots to minimize
wasted space?
If a new program needs to be placed in main storage and main storage is currently full, which of
the other programs do we displace? Should we replace the oldest programs, or should we replace
those that are least frequently used or least recently used?
REAL STORAGE MANAGEMENT STRATEGIES(*8m)
SUBHEADINGS
INTRODUCTION.
CATEGORIES.
DIAGRAM-HIERARCHICAL STORAGE ORGANIZATION
Storage management strategies are used to obtain the best possible use of the main storage resource.
Storage management strategies are divided into the following categories:
Fetch strategies
Demand fetches strategies
Anticipatory fetch strategies
Placement strategies
Replacement strategies.
Fetch strategies are concerned with when to obtain the next piece of program or data for
transfer to main storage from secondary storage. In demand fetch, the next piece of program
or data is brought into main storage when it is referenced by a running program. Placement
strategies are concerned with determining where in main storage to place an incoming program.
Replacement strategies are concerned with determining which piece of program or data to displace
to make room for incoming programs.
CONTIGUOUS VS NONCONTIGUOUS STORAGE ALLOCATION(*8m)
Memory allocation is the process of reserving a partial or complete portion of computer
memory for the execution of programs and processes. Memory allocation is achieved through a
process known as memory management. Memory allocation is primarily a computer hardware
operation but is managed through operating system and software applications. Once the program has
finished its operation or is idle, the memory is released and allocated to another program or merged
within the primary memory.
Memory allocation has two core types:
Static Memory Allocation: The program is allocated memory at compile time.
Dynamic Memory Allocation: The program is allocated memory at run time.
SINGLE USER CONTIGUOUS STORAGE ALLOCATION
SUBHEADINGS
INTRODUCTION.
DIAGRAM-SINGLE USER CONTIGUOUS STORAGE ALLOCATION.
PROTECTION IN SINGLE USER SYSTEM.
DIAGRAM- STORAGE PROTECTION IN SINGLE USER SYSTEM.
DIAGRAM-TYPICAL OVERLAY STRUCTURE.
SINGLE STREAM BATCH PROCESSING.
The earliest computer systems allowed only a single person at a time to use the machine. All
of the machine's resources were at the user's disposal. The user wrote all the code necessary to
implement a particular application, including the highly detailed machine-level input/output
instructions. Eventually, the code to implement basic I/O functions was consolidated into an
input/output control system (IOCS).
Programs are limited in size to the amount of main storage, but it is possible to run programs
larger than the main storage by using overlays.
If a particular program section is not needed for the duration of the program’s execution, then
another section of the program may be brought in from the secondary storage to occupy the storage
used by the program section that is no longer needed.
SINGLE USER CONTIGUOUS ALLOCATION SYSTEM
PROTECTION IN SINGLE USER SYSTEMS:
In single user contiguous storage allocation systems, the user has complete control over all of
main storage. Storage is divided into a portion holding operating system routines, a portion holding
the user’s program and an unused portion.
Suppose the user destroys the operating system; for example, suppose certain input/output
instructions are accidentally changed. The operating system should be protected from the user.
Protection is implemented by the use of a single boundary register built into the CPU. Each time a
user program refers to a storage address, the boundary register is checked to be certain that the user
is not about to destroy the operating system. The boundary register contains the highest address used
by the operating system. If the user tries to enter the operating system, the instruction is intercepted
and the job terminates with an appropriate error message.
STORAGE PROTECTION WITH SINGLE USER CONTIGUOUS STORAGE ALLOCATION
The user needs to enter the operating system from time to time to obtain services such as
input/output. This problem is solved by giving the user a specific instruction with which to request
services from the operating system (i.e., a supervisor call instruction). A user wanting to read from
tape will issue an instruction asking the operating system to do so on the user's behalf.
Operating system must not be damaged by programs
System cannot function if operating system overwritten
Boundary register
Contains address where program’s memory space begins
Any memory accesses outside boundary are denied
Can only be set by privileged commands
Applications can access OS memory to execute OS procedures
Using system calls, which places the system in executive mode.
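The boundary-register check described above can be sketched as follows; the specific addresses and the executive-mode flag are assumptions for illustration:

```python
class MemoryProtectionError(Exception):
    """Raised when a user-mode access falls inside the OS region."""
    pass

def check_access(address, boundary_register, in_executive_mode=False):
    """Allow user accesses only above the boundary register (which holds the
    highest address used by the OS), unless the system is in executive mode
    (entered via a supervisor call)."""
    if address <= boundary_register and not in_executive_mode:
        raise MemoryProtectionError(f"address {hex(address)} is in the OS region")
    return address

BOUNDARY = 0x4FFF   # hypothetical layout: OS occupies 0x0000..0x4FFF

check_access(0x8000, BOUNDARY)                           # user space: allowed
check_access(0x1000, BOUNDARY, in_executive_mode=True)   # via SVC: allowed
```

A user-mode reference below the boundary raises the protection error, which models the instruction being intercepted and the job terminated.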
TYPICAL OVERLAY STRUCTURE.
SINGLE STREAM BATCH PROCESSING:
Early single-user real storage systems were dedicated to one job for more than the job's
execution time. During job setup and job teardown the computer is idle. Designers realized that if
they could automate job-to-job transition, they could reduce considerably the amount of time
wasted between jobs. In single stream batch processing, jobs are grouped in batches by loading them
consecutively onto tape or disk. A job stream processor reads the job control language statements
and facilitates the setup of the next job. When the current job terminates, the job stream reader
automatically reads in the control language statements for the next job, and facilitates the transition
to the next job.
Early systems required significant setup time
o Wasted time and resources
o Automating setup and teardown improved efficiency
Batch processing
o Job stream processor reads job control languages
Defines each job and how to set it up
FIXED PARTITION MULTIPROGRAMMING-FPM(*8m)
SUBHEADINGS
INTRODUCTION.
DIAGRAM-CPU UTILIZATION ON SINGLE USER SYSTEM.
FPM : ABSOLUTE TRANSLATION AND LOADING.
DIAGRAM- FPM : ABSOLUTE TRANSLATION AND LOADING.
DIAGRAM- FPM : EXAMPLE OF POOR STORAGE UTILIZATION.
FPM : RELOCATABLE TRANSLATION AND LOADING.
DIAGRAM- FPM :RELOCATABLE TRANSLATION AND LOADING.
STORAGE PROTECTION IN MULTIPROGRAMMING SYSTEMS.
DIAGRAM-STORAGE PROTECTION IN MULTIPROGRAMMING SYSTEM.
FRAGMENTATION IN FIXED PARTITION MULTIPROGRAMMING.
In batch processing systems, single-user systems waste a considerable amount of the
computing resource. I/O speeds are extremely slow compared to CPU speed. Multiprogramming is
one solution to such problems: I/O and CPU calculations can occur simultaneously. This
increases CPU utilization and system throughput. It requires more storage than a single-user system.
CPU UTILIZATION ON A SINGLE USER SYSTEM
The program consumes the CPU resource until an input or output is needed
When the input and output request is issued the job often can’t continue until the requested
data is either sent or received.
Input and output speeds are extremely slow compared with CPU’s speeds
To increase the utilization of the CPU multiprogramming systems are implemented in which
several users simultaneously compete for system resources .
An advantage of multiprogramming is that several jobs reside in the computer's main storage
at once. Thus when one job requests input/output, the CPU may be immediately switched to
another job and may do calculations without delay. Thus both input/output and CPU
calculations can occur simultaneously. This greatly increases CPU utilization and system
throughput.
Multiprogramming normally requires considerably more storage than a single-user system,
because multiple users' programs have to be stored in main storage at once.
FPM : ABSOLUTE TRANSLATION AND LOADING
The earliest multiprogramming systems used fixed partition multiprogramming.
The main storage is divided into a number of fixed size partitions.
Each partition could hold a single job.
CPU switches between users to create simultaneity.
Jobs were translated with absolute assemblers & compilers to run only in a specified
partition.
If a job was ready to run and its partition was occupied, then that job had to wait, even if
other partitions were available.
This resulted in waste of the storage resource.
FPM : ABSOLUTE TRANSLATION AND LOADING
EXAMPLE OF POOR STORAGE UTILIZATION
FPM : RELOCATABLE TRANSLATION AND LOADING:
*Relocating compilers, assemblers and loaders are used to produce relocatable programs that can run
in any available partition that is large enough to hold them.
*This scheme eliminates storage waste inherent in multiprogramming with absolute translation and
loading.
*Relocatable translators and loaders are more complex than their absolute counterparts.
FPM : RELOCATABLE TRANSLATION AND LOADING
PROTECTION IN MULTIPROGRAMMING SYSTEMS:
In contiguous allocation multiprogramming systems, protection is implemented with
boundary registers.
With two registers, the low and high boundaries of a user partition can be delineated or the
low boundary (high boundary) and the length of the region can be indicated.
When the user wants any service performed by the operating system, the user can request it
through a supervisor call (SVC) instruction.
This allows the user to cross the boundary of the operating system without compromising
operating system security.
Storage protection in contiguous allocation multiprogramming systems. While the user in partition
2 is active, all storage addresses developed by the running program are checked to be sure they fall
between b and c.
STORAGE PROTECTION IN MULTIPROGRAMMING SYSTEM
FRAGMENTATION IN FIXED PARTITION MULTIPROGRAMMING:
There are two difficulties with the use of equal-size fixed partitions.
A program may be too big to fit into a partition. In this case, the programmer must design the
program with the use of the overlays, so that only a portion of the program need be in main
memory at any one-time.
Main memory use is extremely inefficient. Any program, no matter how small, occupies an
entire partition. In our example, there may be a program that occupies less than 128KB of
memory, yet it takes up a 512KB partition whenever it is swapped in. This phenomenon, in
which there is wasted space internal to a partition because the block of data
loaded is smaller than the partition, is referred to as internal fragmentation.
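The internal-fragmentation arithmetic from the example can be sketched as a one-line calculation (the sizes below follow the 512KB/128KB figures above):

```python
def internal_fragmentation(partition_size_kb, program_size_kb):
    """Wasted space inside an equal-size fixed partition: the difference
    between the partition size and the program loaded into it."""
    if program_size_kb > partition_size_kb:
        raise ValueError("program does not fit; overlays would be needed")
    return partition_size_kb - program_size_kb

# A 128KB program in a 512KB partition wastes 384KB every time it is swapped in:
wasted = internal_fragmentation(512, 128)
```

The error branch mirrors the first difficulty mentioned above: a program larger than the partition cannot be loaded at all without overlays.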
VARIABLE PARTITION MULTIPROGRAMMING(VPM)
SUBHEADINGS
INTRODUCTION.
DIAGRAM-INITIAL PARTITION ASSIGNMENTS IN VPM.
DIAGRAM-STORAGE HOLES IN VPM.
COALESCING HOLES IN VPM.
DIAGRAM- COALESCING HOLES IN VPM.
STORAGE COMPACTION.
DIAGRAM- STORAGE COMPACTION.
STORAGE PLACEMENT STRATEGIES.
**BEST-FIT STRATEGY.
**FIRST-FIT STRATEGY.
**WORST-FIT STRATEGY.
INTRODUCTION
To overcome the problems with fixed partition multiprogramming, a method is used to
allow jobs to occupy as much space as needed.
No fixed boundaries are specified here.
The method of giving jobs as much as storage required is called variable partition
multiprogramming.
There are no assumptions about the size of the job. As jobs arrive, if the scheduling
mechanism decides that a job should proceed, it is given as much storage as required. There is
no waste, and a job's partition is exactly the size of the job.
INITIAL PARTITION ASSIGNMENTS IN VPM
STORAGE HOLES IN VARIABLE PARTITION MULTIPROGRAMMING.
Incoming job storage requirements: User A needs 15K, User B 20K, User C 10K, User D 25K,
User E 14K, User F 32K, User G 11K, User H 18K, User I 9K.
Successive memory snapshots as the first four jobs arrive (the OS occupies the low end of
storage; the remainder is free):
1) OS | USER A 15K | FREE
2) OS | USER A 15K | USER B 20K | FREE
3) OS | USER A 15K | USER B 20K | USER C 10K | FREE
4) OS | USER A 15K | USER B 20K | USER C 10K | USER D 25K | FREE
An example of variable partition multiprogramming is shown using 1MB of main memory. Main
memory is empty except for the operating system. The first three processes are loaded in starting
where the operating system ends, and occupy just enough space for each process. This leaves a
"hole" (i.e., an unused space) at the end of memory that is too small for a fourth process. At some
point, none of the processes in memory is ready. The operating system therefore swaps out process
2, which leaves sufficient room to load a new process, process 4. Because process 4 is smaller than
process 2, another small hole is created.
Then the operating system swaps out process 1, and swaps process 2 back in. As this
example shows, this method starts out well but leads to a lot of small holes in memory. As time goes
on, memory becomes more and more fragmented, and memory use declines. This phenomenon is
called external fragmentation. One technique for overcoming external fragmentation is compaction.
COALESCING HOLES
When a job finishes in a variable partition multiprogramming system, we can check whether the
storage being freed borders on other free storage areas (holes). If it does, then we may record in the
free storage list either
(1) an additional hole, or
(2) a single hole reflecting the merger of the existing hole and the new adjacent hole.
The process of merging adjacent holes to form a single larger hole is called coalescing. By
coalescing we reclaim the largest possible contiguous block of storage.
COALESCING HOLES IN VPM.
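The coalescing step can be sketched over a free-storage list of (start, size) holes; representing holes as address/size pairs is an assumption made for the sketch:

```python
def coalesce(holes, freed_start, freed_size):
    """Insert a freed region into the free-storage list, merging it with any
    holes it borders, and return the coalesced list sorted by address."""
    holes = sorted(holes + [(freed_start, freed_size)])
    merged = [holes[0]]
    for start, size in holes[1:]:
        last_start, last_size = merged[-1]
        if last_start + last_size == start:   # borders the previous hole
            merged[-1] = (last_start, last_size + size)
        else:
            merged.append((start, size))
    return merged

# Holes at 100 (size 50) and 200 (size 30); freeing 150..200 joins all three
# into a single 130-unit hole starting at 100:
assert coalesce([(100, 50), (200, 30)], 150, 50) == [(100, 130)]
```

Sorting first means each new hole only has to be compared with its immediate predecessor, so one linear pass merges every adjacent pair.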
STORAGE COMPACTION
Sometimes when a job requests a certain amount of main storage, no individual hole is large
enough to hold the job, even though the sum of all the holes is larger than the storage needed by the
new job.
Suppose user 6 wants to execute a program that requires 100K of storage. Under contiguous
storage allocation, the program cannot be loaded: 100K of free storage is available, but it is divided
into holes of 20K, 40K and 40K. So user 6's program cannot be stored, and the memory space is
wasted. To avoid this, the technique of storage compaction is used.
Compaction attacks the problem of fragmentation by moving all the allocated blocks to one
end of memory, thus combining all the holes. Aside from the obvious cost of all that copying, there
is an important limitation to compaction: any pointers to a block need to be updated when the block
is moved. Unless it is possible to find all such pointers, compaction is not possible. Pointers can be
stored in the allocated blocks themselves, as well as in other places in the client of the memory
manager.
In some situations, pointers can point not only to the start of blocks but also into their bodies.
For example, if a block contains executable code, a branch instruction might be a pointer to another
location in the same block. Compaction is performed in three phases. First, the new location of each
block is calculated to determine the distance the block will be moved. Then each pointer is updated
by adding to it the amount that the block it is pointing into will be moved. Finally, the data is
actually moved. There are various clever tricks possible to combine these operations.
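The three phases can be sketched as follows; the block representation (dicts with a start address, a size, and an optional pointer into some block) is an assumption made for illustration:

```python
def compact(blocks):
    """Compact blocks toward address 0 in the three phases described above.
    Each block is a dict with 'start', 'size', and 'ptr' (an address pointing
    into some block's body, or None). Blocks are mutated in place."""
    # Phase 1: compute each block's new location and how far it will move.
    next_free, delta = 0, {}
    for b in blocks:
        delta[b['start']] = next_free - b['start']
        next_free += b['size']

    def owner(addr):
        # Find the block whose body contains this address (original layout).
        return next(b for b in blocks
                    if b['start'] <= addr < b['start'] + b['size'])

    # Phase 2: update each pointer by the distance its target block moves.
    for b in blocks:
        if b['ptr'] is not None:
            b['ptr'] += delta[owner(b['ptr'])['start']]
    # Phase 3: move each block (here, just slide its start address).
    for b in blocks:
        b['start'] += delta[b['start']]
    return blocks

blocks = [{'start': 20, 'size': 10, 'ptr': 45},   # points into the second block
          {'start': 40, 'size': 10, 'ptr': None}]
compact(blocks)   # second block moves 40 -> 10, so the pointer 45 becomes 15
```

Note the ordering: pointers must be fixed before the starts are updated, because phase 2 identifies each pointer's target block using the original layout.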
STORAGE COMPACTION IN VPM.
The technique of storage compaction involves moving all occupied areas of storage to one end
or the other of main storage. This leaves a single large hole instead of the numerous small holes
common in variable partition multiprogramming. Now all of the available free storage is
contiguous, so that a waiting job can run if its memory requirement is met by the single hole that
results from compaction.
Drawbacks of compaction are:
It consumes system resources that could otherwise be used productively.
The system must stop everything while it performs the compaction. This can result
in erratic response times for interactive users and could be devastating in real-time
systems.
Compaction involves relocating the jobs that are in storage. This means that
relocation information, ordinarily lost when a program is loaded, must now be
maintained in readily accessible form.
With a normal, rapidly changing job mix, it is necessary to compact frequently.
STORAGE PLACEMENT STRATEGIES(*5m):
Storage placement strategies are used to determine where in the main storage to place
incoming programs and data
Three strategies of storage placement are
1) Best-fit Strategy: An incoming job is placed in the hole in main storage in which it fits most
tightly and leaves the smallest amount of unused space.
Best-fit strategy
Place job in the smallest possible hole in which it will fit
Free storage list (kept in ascending order by hole size)
Fig: First-fit, best-fit and worst-fit memory placement strategies
2) First-fit Strategy: An incoming job is placed in the main storage in the first available hole large
enough to hold it
Fig: Worst-fit Strategy
3) Worst-fit Strategy: Worst fit says to place a program in main storage in the hole in which it
fits worst, i.e., the largest possible hole. The idea is that after placing the program in this
large hole, the remaining hole is often also large and is thus able to hold a relatively large
new program.
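The three placement strategies can be contrasted in a few lines. The `place` helper below is a hypothetical illustration that returns the hole each strategy would choose from a free-storage list:

```python
def place(holes, job):
    """Return the index of the hole each strategy picks, or None if no fit.

    holes -- free-hole sizes in storage order (as first-fit assumes)
    job   -- size of the incoming job
    """
    fits = [(i, size) for i, size in enumerate(holes) if size >= job]
    if not fits:
        return {"first": None, "best": None, "worst": None}
    return {
        "first": fits[0][0],                        # first hole large enough
        "best":  min(fits, key=lambda t: t[1])[0],  # tightest fit
        "worst": max(fits, key=lambda t: t[1])[0],  # largest hole
    }

# A 13K job against holes of 16K, 14K, 5K and 30K:
print(place([16, 14, 5, 30], 13))  # {'first': 0, 'best': 1, 'worst': 3}
```

Best-fit would normally keep the free storage list sorted by hole size so the tightest hole is found first; the linear scan above keeps the sketch short.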
VIRTUAL STORAGE
The term virtual storage is associated with the ability to address a storage space much larger
than that available in the primary storage of a particular computer system.
The Virtual Storage in MVS refers to the use of virtual memory in the operating system.
Virtual storage or memory allows a program to have access to the maximum amount of memory in a
system even though this memory is actually being shared among more than one application program.
The operating system translates the program's virtual address into the real physical memory
address where the data is actually located. The Multiple in MVS indicates that a separate virtual
memory is maintained for each of multiple task partitions.
The two most common methods of implementing virtual storage are paging and
segmentation. Fixed-Size blocks are called pages; variable-size blocks are called segments.
VIRTUAL STORAGE MANAGEMENT STRATEGIES(*5m):
SUBHEADINGS
INTRODUCTION.
FETCH STRATEGY.
**DEMAND FETCH.
**DIAGRAM-DEMAND FETCH.
**ANTICIPATORY FETCH.
PLACEMENT STRATEGY .
REPLACEMENT STRATEGY.
**PRINCIPLE OF OPTIMALITY
**RANDOM PAGE REPLACEMENT
**DIAGRAM-REPLACEMENT STRATEGY.
**FIRST-IN-FIRST-OUT
**LEAST-RECENTLY USED
**LEAST-FREQUENTLY USED
**NOT-USED-RECENTLY
**SECOND CHANCE
**CLOCK
INTRODUCTION:
The term virtual storage is associated with the ability to address a storage
space much larger than that available in the primary storage of a particular computer system.
The two most common methods of implementing virtual storage are paging and
segmentation. Fixed-Size blocks are called pages; variable-size blocks are called segments.
FETCH STRATEGY
Fetch strategies are concerned with when a page or segment should be brought from
secondary to primary storage.
**Demand Fetch Scheme: The demand fetch strategy waits for a page or segment to be
referenced by a running process before bringing it into primary storage.
– When a process first executes, the system loads into main memory the page that
contains its first instruction.
– After that, the system loads a page from secondary storage to main memory only when the process explicitly references that page.
– It requires a process to accumulate its pages one at a time.
DEMAND FETCH
**Anticipatory Fetch Scheme: Anticipatory fetch strategies attempt to determine in advance
which pages or segments will be referenced by a process. The system tries to predict the pages a
process will need and preloads them when memory space is available. It must be carefully designed
so that the overhead incurred by the strategy does not reduce system performance.
PLACEMENT STRATEGIES:
These are concerned with where in primary storage to place an incoming page or segment.
REPLACEMENT STRATEGIES:
These are concerned with deciding which page or segment to displace to make room for an
incoming page or segment when primary storage is already fully committed. In this case
operating system storage management routines must decide which page in primary storage to
displace to make room for an incoming page.
1) principle of optimality
2) Random page replacement
3) First-in-first-out
4) Least-recently used
5) Least-frequently used
6) Not-used-recently
7) Second chance
8) Clock
9) Working set
10) Page fault frequency
**The principle of optimality:
The principle of optimality states that to obtain optimum performance, the page to
replace is the one that will not be used again for the furthest time in the future.
**Random Page Replacement:
It is a low-overhead strategy that does not discriminate against particular processes. All pages
in main storage have an equal likelihood of being selected for replacement, so the strategy could
select any page, including the very next page to be referenced. For this reason it is rarely used.
REPLACEMENT STRATEGY
**FIRST-IN-FIRST-OUT (FIFO) Page Replacement:
When a page needs to be replaced, we choose the one that has been in storage the longest.
First-in-first-out is likely to replace heavily used pages because the reason a page has been in
primary storage for a long time may be that it is in constant use.
**LEAST-RECENTLY-USED (LRU) Page Replacement:
This strategy selects that page for replacement that has not been used for the longest time.
LRU can be implemented with a list structure containing one entry for each occupied page frame.
Each time a page frame is referenced, the entry for that page is placed at the head of the list. Older
entries migrate toward the tail of the list. When a page must be replaced to make room for an
incoming page, the entry at the tail of the list is selected, the corresponding page frame is freed, the
incoming page is placed in that page frame, and the entry for that page frame is placed at the head of
the list because that page is now the one that has been most recently used.
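The list-based LRU mechanism described above can be sketched with an ordered dictionary standing in for the list. This is a sketch of the policy, not any particular system's implementation; the reference string is the classic illustrative example:

```python
from collections import OrderedDict

def lru_faults(refs, frames):
    """Count page faults for an LRU policy over a reference string."""
    mem = OrderedDict()                  # keys ordered least- to most-recent
    faults = 0
    for page in refs:
        if page in mem:
            mem.move_to_end(page)        # referenced: move to head of list
        else:
            faults += 1
            if len(mem) == frames:
                mem.popitem(last=False)  # evict the least-recently-used page
            mem[page] = True             # incoming page is most recent
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(lru_faults(refs, 3))  # 10 faults with 3 page frames
```

Real hardware cannot afford to reorder a list on every reference, which is why approximations such as NUR and second chance (below) exist.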
**LEAST-FREQUENTLY-USED (LFU) Page Replacement:
In this strategy the page to replace is that page that is least frequently used or least intensively
referenced. The wrong page could be selected for replacement. For example, the least frequently
used page could be the page brought into main storage most recently.
**Not- Used- Recently Page Replacement:
It approximates LRU with less overhead. It uses two indicator bits per page:
• referenced bit
• modified bit
The bits are reset periodically. The order for page replacement is
• un-referenced page
• un-modified page
It is supported in hardware on modern systems. Pages not used recently are not likely to be used
in the near future and they may be replaced with incoming pages.
The NUR strategy is implemented with the addition of two hardware bits per page. These are:
a) Referenced bit = 0 if the page has not been referenced
= 1 if the page has been referenced
b) Modified bit = 0 if the page has not been modified
= 1 if the page has been modified
The NUR strategy works as follows. Initially, the referenced bits of all pages are set to 0. As a
reference to a particular page occurs, the referenced bit of that page is set to 1. When a page is to be
replaced we first try to find a page which has not been referenced.
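The preference order implied by the two bits can be sketched as follows. `nur_victim` is a hypothetical helper and the page names are illustrative:

```python
def nur_victim(pages):
    """Pick a replacement victim by Not-Used-Recently class order.

    pages -- dict: page name -> (referenced_bit, modified_bit)
    Preference: un-referenced/un-modified first, then un-referenced but
    modified, then referenced but un-modified, then referenced/modified.
    """
    for cls in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        for name, bits in pages.items():
            if bits == cls:
                return name
    return None                       # no pages at all

pages = {"A": (1, 1), "B": (1, 0), "C": (0, 1), "D": (0, 0)}
print(nur_victim(pages))  # 'D': neither referenced nor modified
```

Evicting an un-modified page is preferred because it need not be written back to secondary storage.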
**MODIFICATIONS TO FIFO; CLOCK PAGE REPLACEMENT AND SECOND
CHANCE PAGE REPLACEMENT:
The second chance variation of FIFO examines the referenced bit of the oldest page; if this
bit is off, the page is immediately selected for replacement. If the referenced bit is on, it is set off and
the page is moved to the tail of the FIFO list and treated essentially as a new arrival; this page
gradually moves to the head of the list from which it will be selected for replacement only if its
referenced bit is still off. This essentially gives the page a second chance to remain in primary
storage if indeed its referenced bit is turned on before the page reaches the head of the list.
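The second-chance scan can be sketched directly from the description above. This is a toy model: the FIFO list is a deque with the oldest page at the left, and the referenced bits live in a plain dict:

```python
from collections import deque

def second_chance(queue, referenced):
    """Select a victim under the second-chance variation of FIFO.

    queue      -- deque of page names, oldest at the left
    referenced -- dict: page -> referenced bit (mutated: bits are cleared)
    """
    while True:
        page = queue.popleft()         # examine the oldest page
        if referenced[page]:
            referenced[page] = 0       # turn the bit off ...
            queue.append(page)         # ... and treat it as a new arrival
        else:
            return page                # oldest page with bit off is evicted

q = deque(["A", "B", "C"])
bits = {"A": 1, "B": 0, "C": 1}
print(second_chance(q, bits))  # 'B': A's bit was on, so A got a second chance
```

The clock variant is the same algorithm with the pages arranged in a circle and a moving pointer instead of a queue.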
LOCALITY
Locality is a property exhibited by running processes, namely that processes tend to favor a
subset of their pages during an execution interval. Temporal locality means that if a process
references a page, it will probably reference that page again soon. Spatial locality means that if a
process references a page, it will probably reference adjacent pages in its virtual address space.
Locality
Two types: spatial locality and temporal locality
Real programs in execution tend to display both types of locality
Spatial Locality:
references to addresses close to the current address (in the virtual address space)
natural occurrence in our programs
e.g. may be using instructions within a few pages and data from a few pages for a relatively
long time
to deal with slowly changing spatial locality, the OS may use prediction (prepaging)
Temporal Locality
next address will be one that has been used recently (use some instructions over and over)
loops have both types of locality
paging takes care of most of the temporal locality problems (pages in 4k blocks)
WORKING SETS:
SUBHEADINGS.
INTRODUCTION.
DIAGRAM-DEFINITION OF A PROCESS'S WORKING SET OF PAGES.
DIAGRAM-WORKING SET SIZE AS A FUNCTION OF WINDOW SIZE.
DIAGRAM-PRIMARY STORAGE ALLOCATION UNDER WS STORAGE.
INTRODUCTION:
Denning developed a view of program paging activity called the working set theory of
program behavior. A working set is a collection of pages a process is actively referencing. To run
a program efficiently, its working set of pages must be maintained in primary storage.
Otherwise excessive paging activity called thrashing might occur as the program repeatedly
requests pages from secondary storage.
DEFINITION OF A PROCESS'S WORKING SET OF PAGES.
WORKING SET SIZE AS A FUNCTION OF WINDOW SIZE.
PRIMARY STORAGE ALLOCATION UNDER WORKING SET STORAGE MANAGEMENT.
[Figures: the pages referenced by a process during the process-time interval from t − W to t form
the working set W(t, W); working set size grows with window size W, saturating at the program size.]
Process time is the time during which a process has the CPU.The variable W is called the
Working set window size. The real working set of a process is the set of pages that must be in
the primary storage for a process to execute efficiently.
DEMAND PAGING(*5m):
A thumb rule is that thrashing can be avoided by giving processes enough page frames to hold half
their virtual space.
A working set storage management policy is to maintain the working sets of active
programs in primary storage. The decision to add a new process to the active set of processes is
based on the availability of sufficient space in the primary storage to accommodate the working set
of pages of the new process. The working set of pages of a process, W(t,w) at time t, is the set of
pages referenced by the process during time interval ( t-w) to t.
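The definition of W(t, w) above can be computed from a reference string in a line. Times here are taken as 1-based reference counts, an assumption of this sketch:

```python
def working_set(refs, t, w):
    """W(t, w): the pages referenced during the interval (t - w, t].

    refs -- reference string; refs[i] is the page referenced at time i + 1
    """
    start = max(0, t - w)
    return set(refs[start:t])

refs = [1, 2, 1, 3, 4, 4, 2]
print(working_set(refs, t=5, w=3))  # pages touched at times 3, 4, 5: {1, 3, 4}
```

A working set policy would admit a new process only if the frames needed for all active processes' working sets fit in primary storage.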
[Figure: the number of primary storage page frames allocated to a process over process time,
showing the first, second, third and fourth working sets and the transitions between working sets.]
SUBHEADINGS
INTRODUCTION.
DIAGRAM-DEMAND PAGING.
DIAGRAM-VIRTUAL MEMORY ADDRESSES.
DIAGRAM-SPACE-TIME PRODUCT UNDER DEMAND PAGING.
In virtual memory systems, demand paging is a type of swapping in which pages of data
are not copied from disk to RAM until they are needed. In contrast, some virtual memory
systems use anticipatory paging, in which the operating system attempts to anticipate which
data will be needed next and copies it to RAM before it is actually required.
As there is much less physical memory than virtual memory the operating system must be
careful that it does not use the physical memory inefficiently. One way to save physical memory is
to only load virtual pages that are currently being used by the executing program. For example, a
database program may be run to query a database. In this case not all the database needs to be loaded
into memory, just those data records that are being examined.
DEMAND PAGING
Demand paging is similar to simple paging, with the main difference that demand paging
uses swapping: pages move in and out of memory only when they are required. When a process
is submitted for execution, it is first stored in secondary memory, i.e., on the hard disk.
Pages are swapped into memory when they are required, and when a page is not being
used it may be temporarily swapped out of memory, i.e., stored on disk and later copied back
into memory.
So demand paging is the concept in which a process is copied into main memory from
secondary storage only when it is needed. Either the entire process is loaded into main
memory, or only part of it; loading only a single part of a process into memory is also called
lazy swapping.
To swap a process between physical memory and disk, a page table must be used. The
page table stores entries containing the page number and the offset, which together indicate the
address where a page is stored, and a special extra bit, known as the flag bit, which indicates
whether the page is present in physical memory.
Each page table entry is marked valid or invalid (v or i alongside the page number),
indicating whether the corresponding page is currently stored in physical memory and can
therefore be accessed or swapped directly. If a requested page is not in physical memory,
its entry is marked invalid.
When a user requests an operation, the operating system performs the following steps:
1) Fetch the instructions from physical memory.
2) Decode the instructions, i.e., determine which operation has to be performed.
3) Perform the requested operation.
4) Store the result, writing it back to physical memory if needed.
Also, if the database query is a search query, it does not make sense to load the code from
the database program that deals with adding new records. This technique of only loading virtual
pages into memory as they are accessed is known as demand paging.
When a process attempts to access a virtual address that is not currently in memory the CPU
cannot find a page table entry for the virtual page referenced. For example, in Figure there is no
entry in Process X's page table for virtual PFN 2 and so if Process X attempts to read from an
address within virtual PFN 2 the CPU cannot translate the address into a physical one. At this point
the CPU cannot cope and needs the operating system to fix things up.
It notifies the operating system that a page fault has occurred, and the operating system makes
the process wait while it fixes things up: it must bring the appropriate page into memory
from the image on disk. Disk access takes a long time, relatively speaking, and so the process must
wait quite a while until the page has been fetched. If there are other processes that could run, the
operating system will select one of them to run.
The fetched page is written into a free physical page frame and an entry for the virtual PFN is
added to the process's page table. The process is then restarted at the point where the memory fault
occurred. This time the virtual memory access is made, the CPU can make the address translation,
and so the process continues to run. This is known as demand paging and occurs both when the
system is busy and when an image is first loaded into memory. This mechanism means that a process
can execute an image that only partially resides in physical memory at any one time.
Demand paging
VIRTUAL MEMORY ADDRESSES
In Demand Paging:
Pages are evicted to disk when memory is full
Pages loaded from disk when referenced again
OS allocates a page frame, reads page from disk
When I/O completes, the OS fills in PTE, marks it valid, and restarts faulting
process.
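The fault-and-restart sequence summarized above can be sketched as a toy access routine. The page-table representation and the `load_from_disk` callback here are hypothetical stand-ins, not any real OS interface:

```python
PAGE_SIZE = 4096

def access(vaddr, page_table, load_from_disk):
    """Demand-paging access: a page is loaded only on its first reference."""
    page, offset = divmod(vaddr, PAGE_SIZE)
    entry = page_table.get(page)
    if entry is None:                     # page fault: no valid PTE
        frame = load_from_disk(page)      # slow disk I/O; the process waits
        entry = {"frame": frame}
        page_table[page] = entry          # fill in the PTE, mark it valid
    return entry["frame"] * PAGE_SIZE + offset   # restart: translation works

loads = []
def fake_disk(page):                      # stand-in for reading the disk image
    loads.append(page)
    return page + 100                     # hypothetical frame assignment

table = {}
access(0x1008, table, fake_disk)          # first touch of page 1: a fault
access(0x1010, table, fake_disk)          # same page again: no fault
print(loads)                              # page 1 was loaded exactly once
```

Eviction when memory is full would hand the victim choice to one of the replacement strategies described earlier.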
SPACE-TIME PRODUCT UNDER DEMAND PAGING.
[Figure: primary storage allocation to a process over time under demand paging; runs of process
execution alternate with page waits, each fetch taking an average time F and each page fault adding
one page frame to the process's allocation.]
No page will be brought from secondary to primary storage until it is explicitly referenced by a
running process. Demand paging guarantees that the only pages brought to main storage are those
actually needed by processes. As each new page is referenced, the process must wait while the new
page is transferred to primary storage.
PAGE SIZE(*5m):
SUBHEADINGS.
DEFINITION.
ADVANTAGES.
DIAGRAM- INTERNAL FRAGMENTATION IN A PAGED SYSTEM.
TABLE- SOME COMMON PAGE SIZES.
DERIVATION.
DEFINITION:
Page size refers to the size of a page, which is a block of stored memory.
Page size affects the amount of memory needed and space used when running programs.
Most operating systems allow for the determination of the page size when a program begins running,
which allows it to calculate the most efficient use of memory while running that program.
The basic method for implementation involves breaking physical memory into fixed-size
blocks called FRAMES and breaking logical memory into blocks of the same size called PAGES.
The basic idea is to allocate physical memory to processes in fixed-size chunks called page
frames. The application is presented with the abstraction of a single linear address space; inside the
machine, the address space of the application is broken up into fixed-size chunks called pages.
Pages and page frames are the same size.
Pages are stored in page frames. When a process generates an address, it is dynamically
translated to the physical page frame which holds the data for that page.
So, a virtual address now consists of two pieces:
**page number
** an offset within that page.
Page sizes are typically powers of 2. This simplifies extraction of page numbers and offsets. To
access a piece of data at a given address, the system automatically does the following:
Extracts page number.
Extracts offset.
Translate page number to physical page frame id.
Accesses data at offset in physical page frame.
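With power-of-two page sizes, the extraction and translation steps above reduce to shifts and masks. The page-table dict below is a hypothetical mapping used only for illustration:

```python
PAGE_SIZE = 4096                           # must be a power of two
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 12 offset bits for 4K pages

def translate(vaddr, page_table):
    """Split a virtual address and map it through a page table (a dict)."""
    page = vaddr >> OFFSET_BITS            # extract the page number
    offset = vaddr & (PAGE_SIZE - 1)       # extract the offset within the page
    frame = page_table[page]               # translate page -> physical frame
    return (frame << OFFSET_BITS) | offset # access data at offset in frame

table = {0: 5, 1: 2}                       # hypothetical page -> frame mapping
print(hex(translate(0x1ABC, table)))       # page 1 maps to frame 2: 0x2abc
```

Because the page size is a power of two, the mask `PAGE_SIZE - 1` isolates exactly the offset bits.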
ADVANTAGES:
A number of issues affect the determination of the optimum page size for a given system:
A small page size causes a larger page table. The waste of storage due to excessively large
tables is called table fragmentation.
A large page size causes large amounts of information that ultimately may not be referenced
to be paged into primary storage.
I/O transfers are more efficient with large pages.
Localities tend to be small.
Internal fragmentation is reduced with small pages.
On balance, most designers feel that these factors point to the need for small pages.
INTERNAL FRAGMENTATION IN A PAGED SYSTEM.
[Figure: a segment spans several full pages plus a final, partially filled last page; the average
waste due to internal fragmentation is half a page per segment.]
The important consideration is the size of pages to use. Many MMUs allow various different page
sizes.
Small size: It has less wasted space due to internal fragmentation. There are fewer unused
pages in memory.
Large page size: It has more efficient disk I/O and smaller page tables.
SOME COMMON PAGE SIZES.
MANUFACTURER   MODEL            PAGE SIZE    UNIT
Honeywell      Multics          1024         36-bit word
IBM            370/168          1024 / 512   32-bit word
DEC            PDP-10, PDP-20   512          36-bit word
DEC            VAX 8800         512          8-bit byte
Intel          80386            4096         8-bit byte
DERIVATION:
If
the average process segment size is s,
the page table entry size is e bytes, and
the page size is p,
then:
Average amount of internal fragmentation per segment is p/2.
Average number of pages per process segment is s/p.
Each page requires e bytes of page table, so each process segment requires a page table
of size se/p bytes.
The total overhead per segment, due to internal fragmentation and page table entries, is
se/p + p/2
To minimise the overhead, differentiate with respect to page size p and equate to 0:
−se/p² + 1/2 = 0
⇒ p = sqrt(2se)
EXAMPLE:
So for example, if the average segment size were 256K, and the page table entry size were 8
bytes, the optimum page size, to minimise overhead due to page table entries and internal
fragmentation, would be sqrt(2 × 256K × 8) = 2048 = 2K.
This calculation ignores the need to keep page sizes large in order to speed up paging
operations; it only considers memory overheads.
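The result of the derivation is easy to check numerically, using the worked example above:

```python
from math import sqrt

def optimal_page_size(s, e):
    """Page size p minimising the overhead se/p + p/2 (memory costs only).

    s -- average process segment size in bytes
    e -- page table entry size in bytes
    """
    return sqrt(2 * s * e)

# The example above: 256K average segments, 8-byte page table entries.
print(optimal_page_size(256 * 1024, 8))  # 2048.0 bytes, i.e. 2K
```

As the text notes, this ignores the I/O-efficiency argument for larger pages; it only balances table fragmentation against internal fragmentation.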
SUMMARY.
DEFINITION OF DOS
“An Operating System can be defined as a program, implemented in either software or firmware,
that makes the hardware usable. The Disk Operating System can also be defined as the software that
controls the hardware”.
BASIC FUNCTIONS OF OS
File management
Working with files, like picking up and preparing to use a tool such as a
calculator.
Configuration of the working environment.
DEFINITIONS OF “PROCESS”
A program in execution.
An asynchronous activity.
The “animated spirit” of a procedure.
The “locus of control” of a procedure in execution.
That entity to which processors are assigned.
The “dispatchable” unit.
PROCESS STATE TRANSITION.
When the process reaches the head of the list, and when the CPU becomes available , the process is
given the CPU and is said to make a state transition from ready state to the running state. The
assignment of the CPU to the first process on the ready list is called dispatching, and is performed
by a system entity called the dispatcher. We indicate this transition as follows
Dispatch (process name): ready --> running.
To prevent any one process from monopolizing the system, the operating system sets a
hardware interrupting clock (or interval timer) to allow a user to run only for a specific time interval
or quantum.
UNIT IV
INTERRUPT
An interrupt is a dynamic event that needs prompt attention by the CPU. Usually an interrupt only
needs a short period of CPU time to serve it. After that the original process can resume its execution.
TYPES
There are two types of interrupting events:
HARDWARE INTERRUPTS.
SOFTWARE INTERRUPTS.
FPM - Fixed Partition Multiprogramming.
VPM - Variable Partition Multiprogramming: the method of giving jobs as much storage as
they require.
STORAGE PLACEMENT STRATEGIES
BEST-FIT.
FIRST-FIT.
WORST-FIT.
WORKING SET
A working set is a collection of pages a process is actively referencing.
PAGE SIZE
Page size refers to the size of a page, which is a block of stored memory.
TABLE FRAGMENTATION
A small page size causes larger page table. The waste of storage due to excessively
large tables is called table fragmentation.
UNIT V
Processor Management Job and Processor Scheduling: Preemptive Vs Non-preemptive scheduling –
Priorities – Deadline scheduling - Device and Information Management Disk Performance
Optimization: Operation of moving head disk storage – Need for disk scheduling – Seek
Optimization .File and Database Systems: File System – Functions – Organization – Allocating and
freeing space – File descriptor – Access control matrix.
JOB AND PROCESSOR SCHEDULING
The assignment of physical processors to processes allows processes to accomplish work.
The problem of determining when processors should be assigned, and to which processes, is
called processor scheduling.
SCHEDULING LEVELS
Three important levels of scheduling are considered.
High-Level Scheduling:
Sometimes called job scheduling, this determines which jobs shall be allowed to compete
actively for the resources of the system. This is sometimes called admission scheduling
because it determines which jobs gain admission to the system.
Intermediate-Level Scheduling:
This determines which processes shall be allowed to compete for the CPU.
The intermediate-level scheduler responds to short-term fluctuations in system load by
temporarily suspending and activating (or resuming) processes to achieve smooth system
operation and to help realize certain system wide performance goals.
Low-Level Scheduling:
This determines which ready process will be assigned the CPU when it next becomes
available, and actually assigns the CPU to this process.
A scheduling discipline is non-preemptive if, once a process has been given the CPU, the
CPU cannot be taken away from that process. A scheduling discipline is preemptive if the CPU can
be taken away.
Preemptive scheduling is useful in systems in which high-priority processes require rapid
attention. In real-time systems and interactive timesharing systems, preemptive scheduling is
important in guaranteeing acceptable response times.
To make preemption effective, many processes must be kept in main storage so that the next
process is normally ready for the CPU when it becomes available. Keeping non-running program in
main storage also involves overhead.
In non-preemptive systems, short jobs are made to wait by longer jobs, but the treatment of
all processes is fairer. Response times are more predictable because incoming high-priority jobs
cannot displace waiting jobs.
In designing a preemptive scheduling mechanism, one must carefully consider the
arbitrariness of virtually any priority scheme.
THE INTERVAL TIMER OR INTERRUPTING CLOCK
The process to which the CPU is currently assigned is said to be running. To prevent users
from monopolizing the system the operating system has mechanisms for taking the CPU away from
the user. The operating system sets an interrupting clock or interval timer to generate an interrupt at
some specific future time. The CPU is then dispatched to the process. The process retains control of
the CPU until it voluntarily releases the CPU, or the clock interrupts or some other interrupt diverts
the attention of the CPU.
If the user is running and the clock interrupts, the interrupt causes the operating system to
run. The operating system then decides which process should get the CPU next. The interrupting
clock helps guarantee reasonable response times to interactive users, prevents the system from
getting hung up on a user in an infinite loop, and allows processes to respond to time-dependent
events. Processes that need to run periodically depend on the interrupting clock.
PRIORITIES
SUBHEADINGS.
INTRODUCTION.
TYPES OF PRIORITIES.
DIAGRAM-DISPATCHING PRIORITIES.
STATIC VS DYNAMIC PRIORITIES.
PURCHASED PRIORITIES
INTRODUCTION:
Priorities may be assigned automatically by the system or they may be assigned externally.
They may be static or they may be dynamic. They may be rationally assigned or arbitrarily assigned
in situations in which a system mechanism needs to distinguish between processes but does not
depend on which is more important. They may be earned or they may be bought.
TYPES OF PRIORITIES:
There are two types of priorities.
Static
Dynamic
DISPATCHING PRIORITIES.
STATIC VS DYNAMIC PRIORITIES
Static priorities do not change. Static priority mechanisms are easy to implement and have
relatively low overhead. They are not responsive to changes in environment, changes that might
make it desirable to adjust a priority.
Dynamic priority mechanisms are responsive to change. The initial priority assigned to a
process may have only a short duration, after which it is adjusted to a more appropriate value.
Dynamic priority schemes are more complex to implement and have greater overhead than static
schemes.
PURCHASED PRIORITIES
An operating system must provide competent and reasonable service to a large community of
users but must also provide for those situations in which a member of the user community needs
special treatment.
A user with a rush job may be willing to pay a premium, ie., purchase priority, for a higher
level of service. This extra charge is merited because resources may need to be withdrawn from
other paying customers. If there were no extra charge, then all users would request the higher level
of service.
DEADLINE SCHEDULING
INTRODUCTION
In deadline scheduling certain jobs are scheduled to be completed within a specific time or
deadline. These jobs may have very high value if delivered on time and may be worthless if
delivered later than the deadline. The user is often willing to pay a premium to have the system
ensure on-time completion. Deadline scheduling is complex for many reasons.
DEADLINE SCHEDULING
Two kinds of deadlines can be specified for each process: a starting deadline, or the latest
instant of time by which execution of the process must start, and a completion deadline, or the time
by which the execution of the process must complete.
SUBHEADINGS.
INTRODUCTION.
DIAGRAM - PROCESSES WITH SHORTER DEADLINE.
DIAGRAM - PROCESSES WITH LONGER DEADLINE.
REASONS –SCHEDULING IS COMPLEX.
DEADLINE ESTIMATION
An in-depth analysis of a real-time application and its response requirements is carried out
during its development. Deadlines for individual processes can be determined by considering
process precedences and working backwards from the response requirement of an application. The
deadline of a process Pi is
Di = Dapplication − Σ k ∈ descendant(i) xk
where Dapplication is the deadline of the application, xk is the service time of process Pk, and
descendant(i) is the set of descendants of Pi in the PPG, i.e., the set of all processes that lie on some
path between Pi and the exit node of the PPG. Thus the deadline of process Pi is such that, if it is
met, all the processes that directly or indirectly depend on Pi can also finish by the overall deadline
of the application. It can be explained as follows:
[Figure: a process precedence graph (PPG) with processes P1–P6 and service times 2, 5, 4, 5, 6
and 3.]
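The deadline formula can be sketched as a small helper. Since the edges of the figure's PPG are not fully recoverable, the graph below is a hypothetical example, not the one in the diagram:

```python
def deadlines(services, descendants, d_app):
    """D_i = D_application - sum of service times of Pi's PPG descendants.

    services    -- dict: process -> service time x_k
    descendants -- dict: process -> set of that process's descendants
    d_app       -- overall application deadline
    """
    return {p: d_app - sum(services[k] for k in descendants[p])
            for p in services}

# Hypothetical two-level PPG: P1 precedes P2 and P3, which precede exit P4.
services = {"P1": 2, "P2": 5, "P3": 4, "P4": 3}
desc = {"P1": {"P2", "P3", "P4"}, "P2": {"P4"}, "P3": {"P4"}, "P4": set()}
print(deadlines(services, desc, d_app=20))
# P1 must finish by 8 to leave time for everything downstream of it.
```

Meeting each Di guarantees, by construction, that every dependent process can still finish by the application deadline.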
PROCESSES T1,T2,T3 WITH SHORTER DEADLINE(MAX=9)
PROCESSES T1,T2,T3 WITH LONGER DEADLINE(MAX=23)
REASONS - SCHEDULING IS COMPLEX.
The user must supply the resource requirements of the job in advance. Such information is
rarely available, and gathering it may generate substantial overhead.
The system must run the deadline job without severely degrading service to other users.
The system must plan its resource requirements through to the deadline, because new jobs
may arrive and place unpredictable demands on the system.
If many deadline jobs are to be active at once, scheduling can become extremely complex.
DEVICE AND INFORMATION MANAGEMENT
DISK PERFORMANCE OPTIMIZATION
In multiprogrammed computing systems, inefficiency is often caused by improper use of
rotational storage devices such as disks and drums.
This is a schematic representation of the side view of a moving-head disk. Data is recorded
on a series of magnetic disks, or platters. These disks are connected by a common spindle that spins at
very high speed. The data is accessed (i.e., either read or written) by a series of read-write heads, one
head per disk surface. A read-write head can access only data immediately adjacent to it.
Therefore, before data can be accessed, the portion of the disk surface from which the data is
to be read (or the portion on which the data is to be written) must rotate until it is immediately below
(or above) the read-write head. The time it takes for data to rotate from its current position to a
position adjacent to the read-write head is called latency time.
Each of the several read-write heads, while fixed in position, sketches out a circular track of
data on a disk surface. All read-write heads are attached to a single boom or moving arm assembly.
The boom may move in or out. When the boom moves the read-write heads to a new position, a
different set of tracks becomes accessible. For a particular position of the boom, the set of tracks
sketched out by all the read-write heads forms a vertical cylinder. The process of moving the boom
to a new cylinder is called a seek operation.
Thus, in
order to access a particular record of data on a moving-head disk, several operations are usually
necessary. First, the boom must be moved to the appropriate cylinder. Then the portion of the disk
on which the data record is stored must rotate until it is immediately under (or over) the read-write
head (i.e., latency time).
Then the record, which may be of arbitrary size, must spin past the read-write head. This
is called transmission time. These operations are tediously slow compared with the high processing
speeds of the central computer system.
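To make the three components (seek, latency, transmission) concrete, here is a small back-of-the-envelope calculation; the spindle speed, seek time, transfer rate and record size are assumed figures, not values from the source:

```python
# Assumed figures: 7200 RPM spindle, 10 ms average seek,
# 1 MB/s transfer rate, 4096-byte record.
rpm = 7200
rotation_ms = 60_000 / rpm                 # one full rotation: ~8.33 ms
avg_latency_ms = rotation_ms / 2           # on average, half a rotation
avg_seek_ms = 10.0                         # boom movement to the cylinder
transmission_ms = 4096 / 1_000_000 * 1000  # record spinning past the head

access_ms = avg_seek_ms + avg_latency_ms + transmission_ms
# roughly 18.26 ms per record: dominated by mechanical motion
```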
NEED FOR DISK SCHEDULING:
SUBHEADINGS
INTRODUCTION.
DIAGRAM-FCFS RANDOM SEEK PATTERN.
DESIRABLE CHARACTERISTICS OF DISK SCHEDULING
POLICIES.
INTRODUCTION:
In multiprogramming computing systems, many processes may be generating requests for
reading and writing disk records. Because these processes sometimes make requests faster than they
can be serviced by the moving-head disks, waiting lines or queues build up for each device. Some
computing systems simply service these requests on a first-come-first-served (FCFS) basis.
Whichever request for service arrives first is serviced first. FCFS is a fair method of allocating
service, but when the request rate becomes heavy, FCFS can result in very long waiting times.
[Diagram: FCFS random seek pattern; the numbers indicate the order in which the requests arrived.]
FCFS exhibits a random seek pattern in which successive requests can cause time consuming
seeks from the innermost to the outermost cylinders. To minimize time spent seeking records, it
seems reasonable to order the request queue in some manner other than FCFS. This process is called
disk scheduling.
Disk scheduling involves a careful examination of pending requests to determine the most
efficient way to service the requests.
A disk scheduler examines the positional relationships among waiting requests. The request
queue is then reordered so that the requests will be serviced with minimum mechanical motion. The
two most common types of scheduling are seek optimization and rotation (or latency) optimization.
DESIRABLE CHARACTERISTICS OF DISK SCHEDULING POLICIES:
Several criteria for categorizing scheduling policies are
1. Throughput
2. Mean response time
3. Variance of response times (i.e., predictability)
A scheduling policy should attempt to maximize throughput, the number of requests serviced
per unit time. A scheduling policy should attempt to minimize the mean response time (average
waiting time plus average service time). Variance is a mathematical measure of how far individual
items tend to deviate from the average of the items. We use variance to indicate predictability: the
smaller the variance, the greater the predictability. We desire a scheduling policy that minimizes
variance.
SEEK OPTIMIZATION
Most popular seek optimization strategies are:
1) FCFS (First-Come-First Served) Scheduling:
In FCFS scheduling, the first request to arrive is the first one serviced. FCFS is fair in
the sense that once a request has arrived, its place in the schedule is fixed. A request cannot be
displaced by the arrival of a higher-priority request.
FCFS will actually do a lengthy seek to service a distant waiting request even though another
request may have just arrived on the same cylinder at which the read-write head is currently
positioned. It ignores the positional relationships among the pending requests in the queue.
FCFS is acceptable when the load on a disk is light. Under heavy loads, however, FCFS tends
to saturate the device, and response times become large.
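A minimal sketch of measuring FCFS head movement; the cylinder queue and starting head position are hypothetical:

```python
def fcfs_seek_distance(start, requests):
    """Total cylinders traversed when servicing requests in arrival order."""
    total, pos = 0, start
    for r in requests:
        total += abs(r - pos)   # seek from current position to the request
        pos = r
    return total

# Hypothetical queue of cylinder numbers; head starts at cylinder 50.
queue = [98, 183, 37, 122, 14, 124, 65, 67]
fcfs_seek_distance(50, queue)
# -> 643 cylinders of head movement
```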
SUBHEADINGS.
**FCFS.
**SSTF.
**DIAGRAM-SSTF.
**SCAN SCHEDULING.
**DIAGRAM-SCAN SCHEDULING.
**N-STEP SCAN SCHEDULING.
**DIAGRAM- N-STEP SCAN SCHEDULING.
**C-SCAN SCHEDULING.
**DIAGRAM- C-SCAN SCHEDULING.
**ESCHENBACH SCHEME.
2) SSTF (Shortest-Seek-Time-First) Scheduling:
In SSTF scheduling, the request that results in the shortest seek distance is serviced next,
even if that request is not the first one in the queue. SSTF is a cylinder-oriented scheme. SSTF seek
patterns tend to be highly localized, with the result that the innermost and outermost tracks can
receive poor service compared with the mid-range tracks.
SSTF results in better throughput rates than FCFS, and mean response times tend to be lower
for moderate loads. One significant drawback is that higher variance occurs on response times
because of the discrimination against the outermost and innermost tracks.
SSTF is useful in batch processing systems where throughput is the major consideration. But
its high variance of response times (i.e., its lack of predictability) makes it unacceptable in
interactive systems.
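SSTF can be sketched as a greedy selection over a hypothetical request queue (a sketch, not a production scheduler):

```python
def sstf_order(start, requests):
    """Always service the pending request closest to the current head position."""
    pending, pos, order, total = list(requests), start, [], 0
    while pending:
        nxt = min(pending, key=lambda r: abs(r - pos))  # shortest seek next
        pending.remove(nxt)
        order.append(nxt)
        total += abs(nxt - pos)
        pos = nxt
    return order, total

order, dist = sstf_order(50, [98, 183, 37, 122, 14, 124, 65, 67])
# order -> [37, 14, 65, 67, 98, 122, 124, 183]; dist -> 205:
# much less head movement than FCFS, but requests are served out of
# arrival order, which is where the high variance comes from.
```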
3) SCAN Scheduling:
Denning developed the SCAN scheduling strategy to overcome the discrimination and high
variance in response times of SSTF. SCAN operates like SSTF except that it chooses the request that
results in the shortest seek distance in a preferred direction. If the preferred direction is currently
outward, then the SCAN strategy chooses the shortest seek distance in the outward direction. SCAN
does not change direction until it reaches the outermost cylinder or until there are no further requests
pending in the preferred direction. It is sometimes called the elevator algorithm because an elevator
normally continues in one direction until there are no more requests pending and then it reverses
direction.
SCAN behaves very much like SSTF in terms of improved throughput and improved mean
response times, but it eliminates much of the discrimination inherent in SSTF schemes and offers
much lower variance.
4) N-STEP SCAN SCHEDULING:
One interesting modification to the basic SCAN strategy is called N-STEP SCAN. In this
strategy, the disk arm moves back and forth as in SCAN except that it services only those requests
waiting when a particular sweep begins. Requests arriving during a sweep are grouped together and
ordered for optimum service during the return sweep. N-STEP SCAN offers good performance in
throughput and mean response time. N-STEP has a lower variance of response times than either
SSTF or conventional SCAN scheduling. N-STEP SCAN avoids the possibility of indefinite
postponement occurring if a large number of requests arrive for the current cylinder. It saves these
requests for servicing on the return sweep.
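The two-queue idea can be sketched as follows; the request lists are hypothetical, and each sweep is shown simply as a sorted pass:

```python
def n_step_scan(initial_queue, arrivals_during_sweep):
    """Service only the requests waiting when the sweep begins; requests
    arriving mid-sweep are grouped and ordered for the return sweep."""
    outward = sorted(initial_queue)                          # current sweep
    returning = sorted(arrivals_during_sweep, reverse=True)  # return sweep
    return outward + returning

n_step_scan([40, 10, 90], [55, 20])
# -> [10, 40, 90, 55, 20]: the mid-sweep arrivals 55 and 20 are deferred,
# so no request on the current cylinder can indefinitely postpone others.
```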
5) C-SCAN SCHEDULING:
Another interesting modification to the basic SCAN strategy is called C-SCAN (for circular
SCAN). In the C-SCAN strategy, the arm moves from the outer cylinder to the inner cylinder,
servicing requests on a shortest-seek basis. When the arm has completed its inward sweep, it jumps
(without servicing requests) to the request nearest the outermost cylinder, and then resumes its
inward sweep, processing requests. Thus C-SCAN completely eliminates the discrimination against
requests for the innermost or outermost cylinders. It has a very small variance in response times. At
low loading, the SCAN policy is best. At medium to heavy loading, C-SCAN yields the best
results.
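C-SCAN's one-way sweep plus jump can be sketched over the same kind of hypothetical queue (direction labeled toward higher cylinder numbers here; the text's outer-to-inner wording is the mirror image):

```python
def c_scan_order(start, requests):
    """One-way sweep: service requests ahead of the head, then jump back
    (without servicing) and sweep the remaining requests the same way."""
    ahead = sorted(r for r in requests if r >= start)
    behind = sorted(r for r in requests if r < start)
    return ahead + behind

c_scan_order(50, [98, 183, 37, 122, 14, 124, 65, 67])
# -> [65, 67, 98, 122, 124, 183, 14, 37]: every request is served in the
# same sweep direction, which is what keeps the variance small.
```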
[Diagram: C-SCAN scheduling]
ESCHENBACH SCHEME
This scheme was originally developed for an airline reservation system for handling
extremely heavy loads. This scheme was one of the first to attempt to optimize not only “seek time”
but also “rotational delays” as well. Still, the C-SCAN strategy with rotational optimization has
proven to be better than the Eschenbach scheme under all loading conditions.
RAM DISKS
A RAM disk is a disk device simulated in conventional random access memory. It
completely eliminates delays suffered in conventional disks because of the mechanical motions
inherent in seeks and in spinning a disk. RAM disks are especially useful in high-performance
applications.
Caching incurs a certain amount of CPU overhead in maintaining the contents of the cache
and in searching for data in the cache before attempting to read the data from disk. If the record
reference patterns are not seen in the cache, then the disk cache hit ratio will be small and the CPU’s
efforts in managing the cache will be wasted, possibly resulting in poor performance.
RAM disks are much faster than conventional disks because they involve no mechanical
motion. They are separate from main memory so they do not occupy space needed by the operating
system or applications. Reference times to individual data items are uniform rather than widely
variable as with conventional disks.
RAM disks are much more expensive than regular disks. Most forms of RAM in use today
are volatile, i.e., they lose their contents when power is turned off or when the power supply is
interrupted. Thus RAM disk users should perform frequent backups to conventional disks. As
memory prices continue decreasing, and as capacities continue increasing it is anticipated that RAM
disks will become increasingly popular.
OPTICAL DISKS
Various recording techniques are used. In one technique, intense laser heat is used to burn
microscopic holes in a metal coating. In another technique, the laser heat causes raised blisters on the
surface. In a third technique, the reflectivity of the surface is altered.
The first optical disks were write-once-read-many (WORM) devices. This is not useful for
applications that require regular updating. Several rewritable optical disk products have appeared on
the market recently. Each person could have a disk with the sum total of human knowledge and this
disk could be updated regularly. Some estimates of capacities are so huge that researchers feel it will
be possible to store 10^21 bits on a single optical disk.
An optical disc is an electronic data storage medium that can be written to and read using a
low-powered laser beam. Originally developed in the late 1960s, the first optical disc, created by
James T. Russell, stored data as micron-wide dots of light and dark. A laser read the dots, and the
data was converted to an electrical signal, and finally to audio or visual output. However, the
technology didn't appear in the marketplace until Philips and Sony came out with the compact disc
(CD) in 1982. Since then, there has been a constant succession of optical disc formats, first in CD
formats, followed by a number of DVD formats.
Optical disc offers a number of advantages over magnetic storage media. An optical disc
holds much more data. The greater control and focus possible with laser beams (in comparison to
tiny magnetic heads) means that more data can be written into a smaller space. Storage capacity
increases with each new generation of optical media. Emerging standards, such as Blu-ray, offer up
to 27 gigabytes (GB) on a single-sided 12-centimeter disc. In comparison, a diskette, for example,
can hold 1.44 megabytes (MB). Optical discs are inexpensive to manufacture and data stored on
them is relatively impervious to most environmental threats, such as power surges, or magnetic
disturbances.
FILE AND DATABASE SYSTEMS.
INTRODUCTION
A file is a named collection of data. It normally resides on a secondary storage device such as a
disk or tape. It may be manipulated as a unit by operations such as
open – prepare a file to be referenced.
close – prevent further reference to a file until it is reopened.
create – build a new file.
destroy – remove a file.
copy – create another version of the file with a new name.
rename – change the name of a file.
list – print or display the contents of a file.
Individual data items within the file may be manipulated by operations like
read – input a data item to a process from a file.
write – output a data item from a process to a file.
update – modify an existing data item in a file.
insert – add a new data item to a file.
delete – remove a data item from a file.
Files may be characterized by
volatility – this refers to the frequency with which additions and deletions are made
to a file.
activity – this refers to the percentage of a file’s records accessed during a given
period of time.
size – this refers to the amount of information stored in the file.
File
A file is a named collection of related information that is recorded on secondary storage such
as magnetic disks, magnetic tapes and optical disks. In general, a file is a sequence of bits, bytes,
lines or records whose meaning is defined by the file’s creator and user.
File Structure
A file structure is a required format that the operating system can recognize and interpret.
A file has a certain defined structure according to its type.
A text file is a sequence of characters organized into lines.
A source file is a sequence of procedures and functions.
An object file is a sequence of bytes organized into blocks that are understandable by the
machine.
When an operating system defines different file structures, it also contains the code to support
these file structures. UNIX and MS-DOS support a minimal number of file structures.
File Type
File type refers to the ability of the operating system to distinguish different types of files, such
as text files, source files and binary files. Many operating systems support many types of files.
Operating systems like MS-DOS and UNIX have the following types of files:
Ordinary files
These are the files that contain user information.
These may have text, databases or executable program.
The user can apply various operations on such files like add, modify, delete or even remove
the entire file.
Directory files
These files contain list of file names and other information related to these files.
Special files:
These files are also known as device files.
These files represent physical devices like disks, terminals, printers, networks, tape drives etc.
These files are of two types
Character special files - data is handled character by character as in case of terminals or
printers.
Block special files - data is handled in blocks as in the case of disks and tapes.
File Access Mechanisms
File access mechanism refers to the manner in which the records of a file may be accessed. There
are several ways to access files
Sequential access
Direct/Random access
Indexed sequential access
Sequential access
Sequential access is that in which the records are accessed in some sequence, i.e., the
information in the file is processed in order, one record after the other. This access method is the
most primitive one. Example: compilers usually access files in this fashion.
Direct/Random access
Random access file organization provides for accessing records directly.
Each record has its own address in the file, with the help of which it can be directly
accessed for reading or writing.
The records need not be in any sequence within the file and they need not be in adjacent
locations on the storage medium.
Indexed sequential access
This mechanism is built on top of sequential access.
An index is created for each file which contains pointers to various blocks.
Index is searched sequentially and its pointer is used to access the file directly.
THE FILE SYSTEM
COMPONENTS:
SUBHEADINGS.
COMPONENTS.
**ACCESS METHODS.
**FILE MANAGEMENT.
**AUXILIARY STORAGE MANAGEMENT.
**FILE INTEGRITY MECHANISMS.
**DIAGRAM-TWO LEVEL HIERARCHICAL FILE
MANAGEMENT SYSTEM.
An important component of an operating system is the file system. File systems generally
contain
Access Methods – these are concerned with the manner in which data stored in files
is accessed.
File Management – This is concerned with providing the mechanisms for files to be
stored, referenced, shared and secured.
Auxiliary storage Management – This is concerned with allocating space for files
on secondary storage devices.
File integrity mechanisms – These are concerned with guaranteeing that the
information in a file is uncorrupted.
The file system is primarily concerned with managing secondary storage space, particularly
disk storage. Let us assume an environment of a large-scale timesharing system supporting
approximately 100 active terminals accessible to a user community of several thousand users. It is
common for user accounts to contain between 10 and 100 files. Thus, with a user community of
several thousand users, a system’s disks might contain 50,000 to 100,000 or more separate files.
These files need to be accessed quickly to keep response times small.
A file system for this type of environment may be organized as follows. A root is used to
indicate where on disk the root directory begins. The root directory points to the various user
directories. A user directory contains an entry for each of a user’s files; each entry points to where
the corresponding file is stored on disk.
File names should be unique within a given user directory. In hierarchically structured file
systems, the system name of a file is usually formed as a pathname from the root directory to the
file. For example, in a two-level file system with users A, B and C, in which A has files PAYROLL
and INVOICES, the pathname for file PAYROLL is A:PAYROLL.
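The two-level lookup can be sketched with nested dictionaries; the disk-block labels are invented placeholders, not real addresses:

```python
# Two-level hierarchy: the root directory points to user directories,
# each of which maps file names to storage locations.
root = {
    "A": {"PAYROLL": "disk-block-120", "INVOICES": "disk-block-304"},
    "B": {},
    "C": {},
}

def resolve(pathname):
    """Resolve a 'USER:FILE' pathname through the directory hierarchy."""
    user, filename = pathname.split(":")
    return root[user][filename]

resolve("A:PAYROLL")
# -> 'disk-block-120'
```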
FILE SYSTEM FUNCTIONS
Some of the functions normally attributed to file systems follow.
1) Users should be able to create, modify and delete files.
2) Users should be able to share each other’s files in a carefully controlled manner in order to
build upon each other’s work.
3) The mechanism for sharing files should provide various types of controlled access such
as read access, write access, execute access or various combinations of these.
4) Users should be able to structure their files in a manner most appropriate for each
application.
5) Users should be able to order the transfer of information between files.
6) Backup and recovery capabilities must be provided to prevent either accidental loss or
malicious destruction of information.
7) Users should be able to refer to their files by symbolic names rather than having to use
physical device names (i.e., device independence).
8) In sensitive environments in which information must be kept secure and private, the file
system may also provide encryption and decryption capabilities.
9) The file system should provide a user-friendly interface. It should give users a logical
view of their data and functions to be performed upon it rather than a physical view. The
user should not have to be concerned with the particular devices on which data is stored,
the form the data takes on those devices, or the physical means of transferring data to and
from these devices.
File-System Structure
Hard disks have two important properties that make them suitable for secondary storage of
files in file systems:
(1) Blocks of data can be rewritten in place, and
(2) They are direct access, allowing any block of data to be accessed with only ( relatively )
minor movements of the disk heads and rotational latency. ( See Chapter 12 )
Disks are usually accessed in physical blocks, rather than a byte at a time. Block sizes may
range from 512 bytes to 4K or larger.
File systems organize storage on disk drives, and can be viewed as a layered design:
o At the lowest layer are the physical devices, consisting of the magnetic media, motors
& controls, and the electronics connected to them and controlling them. Modern disks
put more and more of the electronic controls directly on the disk drive itself, leaving
relatively little work for the disk controller card to perform.
o I/O Control consists of device drivers, special software programs ( often written in
assembly ) which communicate with the devices by reading and writing special codes
directly to and from memory addresses corresponding to the controller card's
registers. Each controller card ( device ) on a system has a different set of addresses
( registers, a.k.a. ports ) that it listens to, and a unique set of command codes and
results codes that it understands.
o The basic file system level works directly with the device drivers in terms of
retrieving and storing raw blocks of data, without any consideration for what is in
each block. Depending on the system, blocks may be referred to with a single block
number, ( e.g. block # 234234 ), or with head-sector-cylinder combinations.
o The file organization module knows about files and their logical blocks, and how
they map to physical blocks on the disk. In addition to translating from logical to
physical blocks, the file organization module also maintains the list of free blocks,
and allocates free blocks to files as needed.
o The logical file system deals with all of the meta data associated with a file ( UID,
GID, mode, dates, etc ), i.e. everything about the file except the data itself. This level
manages the directory structure and the mapping of file names to file control blocks,
FCBs, which contain all of the meta data as well as block number information for
finding the data on the disk.
The layered approach to file systems means that much of the code can be used uniformly for
a wide variety of different file systems, and only certain layers need to be file system
specific. Common file systems in use include the UNIX file system, UFS, the Berkeley Fast
File System, FFS, Windows systems FAT, FAT32, NTFS, CD-ROM systems ISO 9660, and
for Linux the extended file systems ext2 and ext3 ( among 40 others supported. )
File-System Implementation
Overview
File systems store several important data structures on the disk:
o A boot-control block, ( per volume ) a.k.a. the boot block in UNIX or the partition
boot sector in Windows contains information about how to boot the system off of
this disk. This will generally be the first sector of the volume if there is a bootable
system loaded on that volume, or the block will be left vacant otherwise.
o A volume control block, ( per volume ) a.k.a. the superblock in UNIX or the
master file table in NTFS, which contains information such as the partition table,
number of blocks on each filesystem, and pointers to free blocks and free FCB
blocks.
o A directory structure ( per file system ), containing file names and pointers to
corresponding FCBs. UNIX uses inode numbers, and NTFS uses a master file table.
o The File Control Block, FCB, ( per file ) containing details about ownership, size,
permissions, dates, etc. UNIX stores this information in inodes, and NTFS in the
master file table as a relational database structure.
There are also several key data structures stored in memory:
o An in-memory mount table.
o An in-memory directory cache of recently accessed directory information.
o A system-wide open file table, containing a copy of the FCB for every currently
open file in the system, as well as some other related information.
o A per-process open file table, containing a pointer to the system open file table as
well as some other information. ( For example the current file position pointer may be
either here or in the system file table, depending on the implementation and whether
the file is being shared or not. )
The interactions of file system components when files are created and/or used:
o When a new file is created, a new FCB is allocated and filled out with important
information regarding the new file. The appropriate directory is modified with the
new file name and FCB information.
o When a file is accessed during a program, the open( ) system call reads in the FCB
information from disk, and stores it in the system-wide open file table. An entry is
added to the per-process open file table referencing the system-wide table, and an
index into the per-process table is returned by the open( ) system call. UNIX refers to
this index as a file descriptor, and Windows refers to it as a file handle.
o If another process already has a file open when a new request comes in for the same
file, and it is sharable, then a counter in the system-wide table is incremented and the
per-process table is adjusted to point to the existing entry in the system-wide table.
o When a file is closed, the per-process table entry is freed, and the counter in the
system-wide table is decremented. If that counter reaches zero, then the system wide
table is also freed. Any data currently stored in memory cache for this file is written
out to disk if necessary.
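The reference-counting interaction between the two tables can be sketched as follows; the table layout and file name are illustrative stand-ins, not any real OS's data structures:

```python
system_table = {}   # filename -> {"fcb": ..., "count": open count}
per_process = {}    # file descriptor -> filename (one process, for brevity)

def open_file(name):
    # First opener reads the FCB "from disk"; later opens share the entry.
    entry = system_table.setdefault(name, {"fcb": {"name": name}, "count": 0})
    entry["count"] += 1
    fd = max(per_process, default=-1) + 1   # naive descriptor allocation
    per_process[fd] = name
    return fd

def close_file(fd):
    name = per_process.pop(fd)
    system_table[name]["count"] -= 1
    if system_table[name]["count"] == 0:
        del system_table[name]              # last close frees the entry

fd1 = open_file("report.txt")
fd2 = open_file("report.txt")   # shared open: counter becomes 2
close_file(fd1)                 # entry survives: counter back to 1
close_file(fd2)                 # counter reaches 0: entry freed
```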
THE DATA HIERARCHY:
Bits are grouped together in bit patterns to represent all data items. There are 2^n possible
bit patterns for a string of n bits.
The two most popular character sets in use today are ASCII (American Standard Code for
Information Interchange) and EBCDIC (Extended Binary Coded Decimal Interchange Code).
ASCII is popular in personal computers and in data communication systems. EBCDIC is popular for
representing data internally in mainframe computer systems, particularly those of IBM.
A field is a group of characters. A record is a group of fields. A record key is a control field
that uniquely identifies the record. A file is a group of related records. A database is a collection of
files.
BLOCKING AND BUFFERING:
A physical record or block is the unit of information actually read from or written to a
device. A logical record is a collection of data treated as a unit from the user’s standpoint. When
each physical record contains exactly one logical record, the file is said to consist of unblocked
records. When each physical record may contain several logical records, the file is said to consist of
blocked records. In a file with fixed-length records, all records are the same length. In a file with
variable-length records, records may vary in size up to the block size.
Buffering allows computation to proceed in parallel with input/output. Spaces are provided in
primary storage to hold several
physical blocks of a file at once – each of these spaces is called a buffer. The most common scheme
is called double buffering and it operates as follows (for output). There are two buffers. Initially,
records generated by a running process are deposited in the first buffer until it is full. The transfer of
the block in the first buffer to secondary storage is then initiated. While this transfer is in progress,
the process continues generating records that are deposited in the second buffer. When the second
buffer is full, and when the transfer from the first buffer is complete, transfer from the second buffer
is initiated. The process continues generating records that are now deposited in the first buffer. This
alternation between the buffers allows input/output to occur in parallel with a process’s
computations.
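A toy sketch of the double-buffering alternation for output; the buffer size and the list standing in for the completed transfer are assumptions:

```python
BUF_SIZE = 3          # records per buffer (assumed)
buffers = [[], []]
transferred = []      # stands in for blocks written to secondary storage
active = 0            # index of the buffer currently being filled

def deposit(record):
    global active
    buffers[active].append(record)
    if len(buffers[active]) == BUF_SIZE:
        # buffer full: start its "transfer" and switch to the other buffer,
        # so the process keeps generating records during the transfer
        transferred.extend(buffers[active])
        buffers[active] = []
        active = 1 - active

for r in range(7):
    deposit(r)
# transferred -> [0, 1, 2, 3, 4, 5]; record 6 waits in the other buffer
```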
The Bridge Between The Logical and The Physical
Block:
Smallest amount of data that can be read from or written to secondary storage at one time.
Often generalized to mean any chunk of data that can be treated as a unit (for reading, writing,
organizing). We will distinguish between disk blocks (physical) and program defined blocks
(logical).
- can't always ensure that logical and physical blocks match (often don't even want to).
- should make sure they complement each other
- logical blocks should not be split between physical blocks
- it's often more efficient to waste a little physical space in order to achieve a better match
eg. logical blocks = 10 bytes; physical blocks = 32 bytes; so fit 3/p.b. (waste 2 bytes per physical
block)
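The worked example above (10-byte logical blocks in 32-byte physical blocks) in code:

```python
logical_bytes, physical_bytes = 10, 32
per_physical = physical_bytes // logical_bytes   # 3 logical blocks fit
wasted_bytes = physical_bytes - per_physical * logical_bytes  # 2 bytes wasted
# Wasting 2 bytes per physical block avoids splitting a logical block
# across two physical blocks.
```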
Blocking:
The process of grouping several components into one block
Clustering:
Grouping file components according to access behavior
Considerations affecting block size:
1. size of available main memory
2. space reserved for programs (and their internal data space) that use the files
3. size of one component of the block
4. characteristics of the external storage device used
Buffering:
Software interface that reconciles blocked components of the file with the program that
accesses information as single components.
A buffering interface is of one of two types:
o blocking routine
o deblocking routine.
Blocking Routine:
Stores components from the program into a buffer (in main memory)
Deblocking Routine:
Accesses one block from the file (places it in memory) and sends one component at a time to
the program.
Sample Deblocking Process:
1. If buffer not empty, go to step 6
2. CPU issues input request
3. I/O channel signals device controller for device specified in the input request
4. device controller locates requested information and starts reading bytes from the device and
sends them to the buffer in main memory.
5. I/O channel waits until the buffer is full, then signals the CPU that the I/O operation is
complete; the location indicator for the buffer is reset to 1.
6. next component to which the location indicator points is sent to the program
7. increment location indicator
8. CPU continues execution of program
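Steps 1-8 above can be sketched as a small routine; the nested-list "device" and the block contents are stand-ins for the I/O channel and device controller:

```python
def make_deblocker(blocks):
    """Refill the in-memory buffer from the 'device' only when empty
    (steps 1-5), then hand out one component at a time (steps 6-8)."""
    device = iter(blocks)       # stands in for I/O channel + controller
    buffer, pos = [], [0]       # buffer and its location indicator

    def next_component():
        if pos[0] >= len(buffer):       # step 1: is the buffer empty?
            buffer[:] = next(device)    # steps 2-5: read one full block
            pos[0] = 0                  # location indicator reset
        component = buffer[pos[0]]      # step 6: component at indicator
        pos[0] += 1                     # step 7: increment indicator
        return component                # step 8: program continues

    return next_component

read = make_deblocker([[1, 2, 3], [4, 5, 6]])
[read() for _ in range(5)]
# -> [1, 2, 3, 4, 5]
```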
Logical Write:
Writing one component to the block-sized buffer
Physical Write:
Writing one block to the external file
Double Buffering:
Having two buffers so one can be filled while the other is being processed
Processor Bound:
A process where more time is taken to process a block than is taken to read or write the
block. In such a case, the entire process can only be made faster by increasing the efficiency of the
processing part.
Buffers
file manager : confirms file use info; finds physical location of file on disk; makes sure
required sector in buffer;
I/O buffer: holds sectors of data; often doesn't get written back to disk until the buffer is
needed for other uses (that way if more stuff done to the same sector, it doesn't have to be
loaded again)
I/O processor: may be simple chip or complex CPU takes instruction from O/S, but once it
starts it runs independently; it'll tell someone when it's done
Disk controller: I/O processor checks w/ disk controller if it's ready; then asks to position r/w heads;
when ready, I/O processor passes bytes to disk or vice-versa
Buffer management:
bottlenecks - if there is only one buffer and we are alternately reading and writing, the two
operations contend for it; most systems have at least one buffer for each
buffering strategies - the trade-off is management overhead vs. transfer-time savings
avoid being I/O bound by having several buffers so one can be processed while another is
filled (then switch roles - double buffering)
keep a pool of buffers (take one only when it is needed)
move mode: parts of memory are reserved for specific purposes (like system buffers and user
space) - this means data must be moved around, sometimes a great deal
locate mode: allows use of data directly from the I/O buffer, or transfer of data from the device
straight to a user buffer
scatter/gather I/O: moves data into/out of several buffers with a single READ/WRITE; scatter:
move data from one block into several buffers according to a specified organization; gather:
collect several buffers and write them with a single output operation
buffer management can sometimes be controlled through calls to the OS
FILE ORGANIZATION
A file is a collection of records, and a key element in file management is the way in which the records
themselves are organized inside the file, since this heavily affects system performance as far as record
finding and access are concerned. Note carefully that by ``organization'' we refer here to the logical
arrangement of the records in the file (their ordering or, more generally, the presence of ``closeness''
relations between them based on their content), and not to the physical layout of the file as stored on a
storage medium. To prevent confusion, the latter is referred to by the expression ``record blocking'', and
will be treated later on.
Choosing a file organization is a design decision, hence it must be made with the achievement of
good performance in mind, with respect to the most likely usage of the file. The criteria
usually considered important are:
1. Fast access to single record or collection of related records.
2. Easy record adding/update/removal, without disrupting (1).
3. Storage efficiency.
4. Redundancy as a safeguard against data corruption.
SUBHEADINGS.
SCHEMES
**SEQUENTIAL.
**DIRECT.
**INDEXED SEQUENTIAL.
**PARTITIONED.
**DIAGRAM-PARTITIONED DATA SET.
QUEUED AND BASIC ACCESS METHODS.
Needless to say, these requirements conflict with each other in all but the most trivial
situations, and it is the designer's job to find a good compromise among them, yielding an adequate
solution to the problem at hand. For example, ease of adding and updating records is not an issue when
defining the data organization of a CD-ROM product, whereas fast access is, given the huge amount of data
that this medium can store. However, as will become apparent shortly, fast-access techniques are
based on the use of additional information about the records, which in turn competes with the high
volumes of data to be stored.
Logical data organization is indeed the subject of whole shelves of books, in the ``Database'' section
of your library. Here we'll briefly address some of the simpler techniques in use, mainly because of
their relevance to data management from the lower-level (with respect to a database's) point of view
of an OS. Five organization models will be considered:
Pile.
Sequential.
Indexed-sequential.
Indexed.
Hashed.
File organization refers to the manner in which the records of a file are arranged on secondary
storage. The most popular file organization schemes in use today follow.
Sequential – Records are placed in physical order. The “next” record is the one that
physically follows the previous record. This organization is natural for files stored on magnetic tape,
an inherently sequential medium.
Direct – records are directly (randomly) accessed by their physical addresses on a
direct access storage device (DASD).
Indexed sequential – records are arranged in logical sequence according to a key
contained in each record. Indexed sequential records may be accessed sequentially in key order or
they may be accessed directly.
Partitioned – This is essentially a file of sequential subfiles. Each sequential subfile is called a
member. The starting address of each member is stored in the file’s directory.
The term volume is used to refer to the recording medium for each particular auxiliary storage
device. The volume used on a tape drive is a reel of magnetic tape; the volume used on a disk drive
is a disk.
QUEUED AND BASIC ACCESS METHODS:
Operating systems generally provide many access methods. These are sometimes grouped into
two categories, namely queued access methods and basic access methods. The queued methods
provide more powerful capabilities than the basic methods.
Queued access methods are used when the sequence in which records are to be processed can
be anticipated, such as in sequential and indexed sequential accessing. The queued methods perform
anticipatory buffering and scheduling of I/O operations. They try to have the next record available
for processing as soon as the previous record has been processed.
The basic access methods are normally used when the sequence in which records are to be
processed cannot be anticipated, such as in direct accessing, and also in user applications that
control record access without incurring the overhead of the queued methods.
ALLOCATING AND FREEING SPACE
INTRODUCTION:
When files are allocated and freed it is common for the space on disk to become
increasingly fragmented. One technique for alleviating this problem is to perform periodic
compaction or garbage collection. Files may be reorganized to occupy adjacent areas of the disk, and
free areas may be collected into a single block or a group of large blocks.
This garbage collection is often done during the system shut down; some systems
perform compaction dynamically while in operation. A system may choose to reorganize the files of
users not currently logged in, or it may reorganize files that have not been referenced for a long time.
Designing a file system requires knowledge of the user community, including the number of
users, the average number and size of files per user, the average duration of user sessions, the nature
of application to be run on the system, and the like. Users searching a file for information often use
file scan options to locate the next record or the previous record.
In paged systems, the smallest amount of information transferred between secondary and
primary storage is a page, so it makes sense to allocate secondary storage in blocks of the page size
or a multiple of a page size.
Locality tells us that once a process has referred to a data item on a page it is likely to reference
additional data items on that page; it is also likely to reference data items on pages contiguous to that
page in the user’s virtual address space.
SUBHEADINGS.
INTRODUCTION.
TYPES OF ALLOCATION.
**CONTIGUOUS ALLOCATION.
**NON-CONTIGUOUS ALLOCATION.
SECTOR – ORIENTED LINKED ALLOCATION.
BLOCK ALLOCATION.
DIAGRAM-BLOCK CHAINING.
DIAGRAM-INDEX BLOCK CHAINING.
DIAGRAM-BLOCK-ORIENTED FILE MAPPING.
DIAGRAM-PHYSICAL BLOCKS ON SECONDARY
STORAGE
Space Allocation
Files are allocated disk space by the operating system. Operating systems deploy the following three
main ways to allocate disk space to files.
Contiguous Allocation
Linked Allocation
Indexed Allocation
Contiguous Allocation
Each file occupies a contiguous address space on disk.
Disk addresses are assigned in linear order.
Easy to implement.
External fragmentation is a major issue with this type of allocation technique.
Linked Allocation
Each file carries a list of links to disk blocks.
Directory contains link / pointer to first block of a file.
No external fragmentation
Effective for sequential-access files.
Inefficient for direct-access files.
Indexed Allocation
Provides a solution to the problems of contiguous and linked allocation.
An index block is created holding all the pointers for a file.
Each file has its own index block, which stores the addresses of the disk space occupied by the
file.
Directory contains the addresses of index blocks of files.
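The indexed scheme can be sketched as follows; the dictionaries are illustrative stand-ins for the on-disk directory, index blocks, and data blocks:

```python
# Sketch of indexed allocation: the directory maps each file name to an
# index block, which lists the disk blocks the file occupies.
# Structures are illustrative, not a real on-disk format.

disk = {7: "AAA", 2: "BBB", 9: "CCC"}            # block number -> contents
index_blocks = {0: [7, 2, 9]}                    # index block -> file's blocks
directory = {"report.txt": 0}                    # file name -> index block

def read_file(name):
    idx = directory[name]                        # one lookup in the directory
    return "".join(disk[b] for b in index_blocks[idx])

print(read_file("report.txt"))   # AAABBBCCC
```

Direct access falls out naturally: the n-th block of a file is simply `disk[index_blocks[idx][n]]`, with no chain to follow.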
TYPES OF ALLOCATION:
There are two major types of allocation:
Contiguous allocation.
Noncontiguous allocation, which includes:
Sector-oriented linked allocation.
Block allocation.
CONTIGUOUS ALLOCATION
In contiguous allocation, files are assigned to contiguous areas of secondary storage. A user
specifies in advance the size of the area needed to hold the file to be created. If the desired amount
of contiguous space is not available, the file cannot be created.
One advantage of contiguous allocation is that successive logical records are normally
physically adjacent to one another. This speeds access compared to systems in which successive
logical records are dispersed throughout the disk.
The file directories in contiguous allocation systems are relatively straightforward to implement.
For each file it is necessary to retain the address of the start of the file and the file’s length.
Disadvantage of contiguous allocation
When files are deleted, the space they occupied on secondary storage is reclaimed.
This space becomes available for the allocation of new files, but these new files must
fit in the available holes.
Thus contiguous allocation schemes exhibit the same types of fragmentation
problems inherent in variable partition multiprogramming systems – adjacent
secondary storage holes must be coalesced, and periodic compaction may need to be
performed to reclaim storage areas large enough to hold new files.
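A small first-fit sketch makes the fragmentation problem concrete; the hole list and sizes are illustrative:

```python
# Sketch of contiguous allocation from a free-hole list (first fit).
# A new file must fit entirely inside one existing hole, which is why
# deleting files leads to fragmentation. Structures are illustrative.

holes = [(0, 5), (20, 3), (40, 10)]   # (start block, length) of free areas

def allocate(size):
    for i, (start, length) in enumerate(holes):
        if length >= size:
            # shrink (or consume) the hole and hand back the start address
            if length == size:
                del holes[i]
            else:
                holes[i] = (start + size, length - size)
            return start
    return None                        # no single hole big enough: failure

print(allocate(4))   # 0  (fits in the first hole)
print(allocate(8))   # 40 (skips the smaller holes)
print(allocate(5))   # None: 6 blocks are still free, but none contiguous
```

The final call shows the fragmentation problem in miniature: enough total space remains, but no hole is large enough, so compaction (coalescing the holes) would be needed.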
NONCONTIGUOUS ALLOCATION
Files tend to grow or shrink over time so generally we go for dynamic noncontiguous storage
allocation systems instead of contiguous allocation systems.
SECTOR-ORIENTED LINKED ALLOCATION
Files consist of many sectors which may be dispersed throughout the disk. Sectors belonging to a
common file contain pointers to one another, forming a linked list. A free space list contains entries
for all free sectors on the disk. When a file needs to grow, the process requests more sectors from the
free space list. Files that shrink return sectors to the free space list. There is no need for compaction.
The drawback of noncontiguous allocation is that the records of a file may be dispersed
throughout the disk, so retrieval of logically contiguous records can involve lengthy seeks.
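The scheme can be sketched as follows; the sector table and free list are illustrative in-memory stand-ins for the on-disk structures:

```python
# Sketch of sector-oriented linked allocation: each sector stores its data
# plus a pointer to the file's next sector; a free space list supplies
# sectors when a file grows. Layout and names are illustrative.

sectors = {3: ("He", 8), 8: ("ll", 1), 1: ("o!", None)}  # sector -> (data, next)
free_list = [5, 12, 2]

def read_chain(first):
    data, cur = "", first
    while cur is not None:                    # follow pointers sector to sector
        chunk, cur = sectors[cur]
        data += chunk
    return data

def grow(last_sector, new_data):
    new = free_list.pop(0)                    # take a sector from the free list
    sectors[new] = (new_data, None)
    d, _ = sectors[last_sector]
    sectors[last_sector] = (d, new)           # link it onto the end of the file

print(read_chain(3))   # Hello!
grow(1, "!!")
print(read_chain(3))   # Hello!!!
```

Shrinking a file would simply return sectors to `free_list`, so no compaction is ever needed, at the cost of the seeks noted above.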
BLOCK ALLOCATION
One scheme used to manage secondary storage more efficiently and reduce execution time
overhead is called block allocation. This is a mixture of both contiguous allocation and
noncontiguous allocation methods.
In this scheme, instead of allocating individual sectors, blocks of contiguous sectors (sometimes
called extents) are allocated. There are several common ways of implementing block-allocation
systems. These include block chaining, index block chaining, and block-oriented file mapping.
In block chaining, entries in the user directory point to the first block of each file. The fixed-
length blocks comprising a file each contain two portions: a data block and a pointer to the next
block. Locating a particular record requires searching the block chain until the appropriate block is
found, and then searching that block until the appropriate record is found. Insertions and deletions
are straightforward.
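The two-level search (walk the chain, then scan within the block) can be sketched like this, with records modeled as key/value pairs; structures are illustrative:

```python
# Sketch of block chaining: each fixed-length block holds a data portion
# and a pointer to the next block; finding a record walks the chain and
# then scans inside the block. Names are illustrative.

blocks = {
    0: ([("a", 1), ("b", 2)], 4),      # block -> (records, next block)
    4: ([("c", 3), ("d", 4)], None),
}
directory = {"f": 0}                   # file name -> first block of file

def find(fname, key):
    cur = directory[fname]
    while cur is not None:             # search the block chain...
        records, nxt = blocks[cur]
        for k, v in records:           # ...then search within the block
            if k == key:
                return v
        cur = nxt
    return None

print(find("f", "d"))   # 4
```

The cost of a lookup grows with the length of the chain, which is exactly what index block chaining, described next, is meant to reduce.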
With index block chaining, the pointers are placed into separate index blocks.
Each index block contains a fixed number of items. Each entry contains a record identifier and a
pointer to that record. If more than one index block is needed to describe a file, then a series of index
blocks is chained together.
The big advantage of index block chaining over simple block chaining is that searching may
take place in the index blocks themselves. Once the appropriate record is located via the index
blocks, the data block containing that record is read into primary storage. The disadvantage of this
scheme is that insertions can require the complete reconstruction of the index blocks, so some
systems leave a certain portion of the index blocks empty to provide for future insertions.
In block-oriented file mapping, instead of using pointers, the system uses block
numbers. Normally, these are easily converted to actual block addresses because of the geometry of
the disk. A file map contains one entry for each block on the disk. Entries in the user directory
point to the first entry in the file map for each file. Each entry in the file map contains the
block number of the next block in that file. Thus all the blocks in a file may be located by following
the entries in the file map.
The entry in the file map that corresponds to the last entry of a particular file is set to some
sentinel value like ‘Nil’ to indicate that the last block of a file has been reached. Some of the entries
in the file map are set to “Free” to indicate that the block is available for allocation. The system may
either search the file map linearly to locate a free block, or a free block list can be maintained.
An advantage of this scheme is that the physical adjacencies on the disk are
reflected in the file map. Insertions and deletions are straightforward in this scheme.
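Following the scheme described above, a file map with 'NIL' and 'FREE' entries can be sketched as follows (structures are illustrative):

```python
# Sketch of block-oriented file mapping: the file map holds one entry per
# disk block, giving the block number of the file's next block, 'NIL' at
# end of file, or 'FREE' for an unallocated block. Names are illustrative.

file_map = {2: 7, 7: 9, 9: "NIL", 4: "FREE", 5: "FREE"}
directory = {"log": 2}                 # file name -> first file-map entry

def file_blocks(name):
    out, cur = [], directory[name]
    while cur != "NIL":                # follow block numbers through the map
        out.append(cur)
        cur = file_map[cur]
    return out

def find_free():
    # linear search of the map; a free block list could be kept instead
    return next((b for b, e in file_map.items() if e == "FREE"), None)

print(file_blocks("log"))   # [2, 7, 9]
print(find_free())          # 4
```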
INDEX BLOCK CHAINING
BLOCK-ORIENTED FILE MAPPING.
FILE DESCRIPTOR
A file descriptor or file control block is a control block containing information the system needs
to manage a file.
A typical file descriptor might include
1) symbolic file name
2) location of file in secondary storage
3) file organization (Sequential, indexed sequential, etc.)
4) device type
5) access control data
6) type (data file, object program, c source program, etc.)
7) disposition (permanent vs temporary)
8) creation date and time
9) destroy date
10) date and time last modified
11) access activity counts (number of reads, for example)
File descriptors are maintained on secondary storage. They are brought to primary storage when
a file is opened.
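The fields listed above can be gathered into a single structure; the field names and types below are illustrative, not a real system's layout:

```python
# Sketch of a file descriptor (file control block) carrying the kinds of
# fields listed above. All names and types are illustrative.

from dataclasses import dataclass

@dataclass
class FileDescriptor:
    name: str                 # 1) symbolic file name
    location: int             # 2) start address in secondary storage
    organization: str         # 3) 'sequential', 'indexed sequential', ...
    device_type: str          # 4) device type
    access_control: str       # 5) access control data
    file_type: str            # 6) data file, object program, ...
    permanent: bool           # 7) disposition: permanent vs temporary
    created: str              # 8) creation date and time
    destroy_date: str         # 9) destroy date
    modified: str             # 10) date and time last modified
    read_count: int = 0       # 11) access activity count

fd = FileDescriptor("a.dat", 4096, "sequential", "disk", "owner:rw",
                    "data file", True, "2024-01-01", "2025-01-01",
                    "2024-06-01")
print(fd.name, fd.read_count)   # a.dat 0
```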
ACCESS CONTROL MATRIX
One way to control access to files is to create a two-dimensional access control matrix
listing all the users and all the files in the system. The entry Aij is 1 if user i is allowed access to file
j; otherwise Aij = 0. In an installation with a large number of users and a large number of files, this
matrix would be very large and very sparse, since allowing one user access to another user's files is
the exception rather than the rule.
To make a matrix concept useful, it would be necessary to use codes to indicate various
kinds of access such as read only, write only, execute only, read write etc.
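Because the matrix is sparse, it can be stored keeping only the nonzero entries, with access codes in place of 0/1; the users, files, and codes below are illustrative:

```python
# Sketch of an access control matrix stored sparsely (only the nonzero
# entries are kept), with codes for the kind of access instead of 0/1.
# User names, file names, and codes are illustrative.

acm = {("alice", "payroll.dat"): "rw",
       ("bob", "payroll.dat"): "r",
       ("alice", "report.txt"): "rwx"}

def allowed(user, file, mode):
    # an absent entry corresponds to Aij = 0, i.e. no access at all
    return mode in acm.get((user, file), "")

print(allowed("alice", "payroll.dat", "w"))   # True
print(allowed("bob", "payroll.dat", "w"))     # False
print(allowed("carol", "payroll.dat", "r"))   # False (no entry at all)
```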
ACCESS CONTROL BY USER CLASSES
A technique that requires considerably less space is to control file access for various classes of users.
A common classification scheme is
1) Owner – Normally, this is the user who created the file.
2) Specified User - The owner specifies that another individual may use the file.
3) Group or Project – Users are often members of a group working on a particular project. In this
case the various members of the group may all be granted access to each other’s project-related files.
4) Public- Most systems allow a file to be designated as public so that it may be accessed by any
member of the system’s user community. Public access normally allows users to read or execute a
file, but not to write it.
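A check based on these four classes can be sketched as follows; the per-file fields and the rights encoding are illustrative:

```python
# Sketch of access control by user classes: instead of a per-user matrix,
# each file records its owner, specified users, and group, plus the rights
# granted to each class. Names and the rights encoding are illustrative.

file_info = {
    "proj.c": {"owner": "alice",
               "users": {"bob"},                   # specified users
               "group": {"alice", "bob", "eve"},   # project group
               "rights": {"owner": "rw", "user": "rw",
                          "group": "r", "public": ""}},
}

def classify(user, f):
    info = file_info[f]
    if user == info["owner"]:   return "owner"
    if user in info["users"]:   return "user"      # specified user
    if user in info["group"]:   return "group"
    return "public"

def may(user, f, mode):
    return mode in file_info[f]["rights"][classify(user, f)]

print(may("alice", "proj.c", "w"))    # True  (owner)
print(may("eve", "proj.c", "w"))      # False (group members may only read)
print(may("mallory", "proj.c", "r"))  # False (public: no access here)
```

Storage now grows with the number of files times the handful of classes, rather than with users times files as in the full matrix.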