
February 2010

Master of Computer Application (MCA) – Semester 3

MC0073 – System Programming

Assignment Set – 1

1. Describe the following with respect to Language Specification:

A) Programming Language Grammars

B) Classification of Grammars

C) Binding and Binding Times

Ans –

A) Programming Language Grammars

The lexical and syntactic features of a programming language are specified by its grammar. This section discusses key concepts and notions from formal language grammars. A language L can be considered to be a collection of valid sentences. Each sentence can be looked upon as a sequence of words, and each word as a sequence of letters or graphic symbols acceptable in L. A language specified in this manner is known as a formal language. A formal language grammar is a set of rules which precisely specify the sentences of L. It is clear that natural languages are not formal languages due to their rich vocabulary. However, PLs (programming languages) are formal languages.

Terminal symbols, alphabet and strings

The alphabet of L, denoted by the Greek symbol Σ, is the collection of symbols in its character set. We will use lower case letters a, b, c, etc. to denote symbols in Σ. A symbol in the alphabet is known as a terminal symbol (T) of L. The alphabet can be represented using the mathematical notation of a set, e.g. Σ = {a, b, …, z, 0, 1, …, 9}

Here the symbols {, ‘,’ and } are part of the notation. We call them metasymbols to differentiate them from terminal symbols. Throughout this discussion we assume that metasymbols are distinct from the terminal symbols. If this is not the case, i.e. if a terminal symbol and a metasymbol are identical, we enclose the terminal symbol in quotes to differentiate it from the metasymbol. For example, the set of punctuation symbols of English can be defined as {:, ;, ‘,’, .}, where ‘,’ denotes the terminal symbol ‘comma’.

A string is a finite sequence of symbols. We will represent strings by Greek symbols α, β, γ, etc. Thus α = axy is a string over Σ. The length of a string is the number of symbols in it. Note that the absence of any symbol is also a string, the null string ε. The concatenation operation combines two strings into a single string. It is used to build larger strings from existing strings. Thus, given two strings α and β, concatenation of α with β yields a string which is formed by putting the sequence of symbols forming α before the sequence of symbols forming β. For example, if α = ab, β = axy, then concatenation of α and β, represented as α.β or simply αβ, gives the string abaxy. The null string can also participate in a concatenation, thus a.ε = ε.a = a.

Nonterminal symbols

A nonterminal symbol (NT) is the name of a syntax category of a language, e.g. noun, verb, etc. An NT is written as a single capital letter, or as a name enclosed between <…>, e.g. A or < Noun >. During grammatical analysis, a nonterminal symbol represents an instance of the category. Thus, < Noun > represents a noun.

Productions

A production, also called a rewriting rule, is a rule of the grammar. A production has the form

A nonterminal symbol ::= String of Ts and NTs

and defines the fact that the NT on the LHS of the production can be rewritten as the string of Ts and NTs appearing on the RHS. When an NT can be written as one of many different strings, the symbol ‘|’ (standing for ‘or’) is used to separate the strings on the RHS, e.g.

< Article > ::= a | an | the

The string on the RHS of a production can be a concatenation of component strings, e.g. the production < Noun Phrase > ::= < Article >< Noun >

expresses the fact that the noun phrase consists of an article followed by a noun.

Each grammar G defines a language LG. G contains an NT called the distinguished symbol or the start NT of G. Unless otherwise specified, we use the symbol S as the distinguished symbol of G. A valid string α of LG is obtained by using the following procedure:

1. Let α= ‘S’.

2. While α is not a string of terminal symbols

(a) Select an NT appearing in α, say X.

(b) Replace X by a string appearing on the RHS of a production of X.

Example

Grammar (1.1) defines a language consisting of noun phrases in English

< Noun Phrase > :: = < Article > < Noun >

< Article > ::= a | an | the

<Noun> ::= boy | apple

< Noun Phrase > is the distinguished symbol of the grammar; ‘the boy’ and ‘an apple’ are some valid strings in the language.
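To make the derivation procedure above concrete, the following small C sketch (added here for illustration; the array names are invented and this is not part of the original text) generates a valid string of grammar (1.1) by starting from the distinguished symbol and replacing every NT with one of its RHS alternatives:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Grammar (1.1): <Noun Phrase> ::= <Article> <Noun>
                  <Article>     ::= a | an | the
                  <Noun>        ::= boy | apple                         */
static const char *articles[] = { "a", "an", "the" };
static const char *nouns[]    = { "boy", "apple" };

int main(void)
{
    srand((unsigned) time(NULL));

    /* Step 1: alpha = <Noun Phrase>; rewrite it as <Article> <Noun>.   */
    /* Step 2: replace each remaining NT by one of its RHS alternatives */
    /*         until only terminal symbols are left.                    */
    const char *article = articles[rand() % 3];
    const char *noun    = nouns[rand() % 2];

    printf("%s %s\n", article, noun);   /* e.g. "the boy" or "an apple" */
    return 0;
}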

Definition (Grammar)

A grammar G of a language LG is a quadruple (Σ, SNT, S, P), where

Σ is the alphabet of LG, i.e. the set of Ts,

SNT is the set of NTs,

S is the distinguished symbol, and

P is the set of productions.

Derivation, reduction and parse trees

A grammar G is used for two purposes: to generate valid strings of LG and to ‘recognize’ valid strings of LG. The derivation operation helps to generate valid strings while the reduction operation helps to recognize valid strings. A parse tree is used to depict the syntactic structure of a valid string as it emerges during a sequence of derivations or reductions.

Derivation

Let production p1 of grammar G be of the form

A ::= α

and let β be a string such that β = γAθ. Then replacement of A by α in string β constitutes a derivation according to production p1. We use the notation N => η to denote direct derivation of η from N and N =>* η to denote transitive derivation of η (i.e. derivation in zero or more steps) from N, respectively. Thus, A => α only if A ::= α is a production of G, and A =>* δ if A => … => δ. We can use this notation to define a valid string according to a grammar G as follows: δ is a valid string according to G only if S =>* δ, where S is the distinguished symbol of G.

Example: Derivation of the string ‘the boy’ according to grammar (1.1) can be depicted as

< Noun Phrase > => < Article > < Noun >

=> the < Noun >

=> the boy

A string α such that S =>* α is a sentential form of LG. The string α is a sentence of LG if it consists of only Ts.

Example: Consider the grammar G

< Sentence >::= < Noun Phrase > < Verb Phrase >

< Noun Phrase >::= < Article >< Noun >

< Verb Phrase >::= <verb> <Noun Phrase>

< Article > ::= a | an | the

< Noun >::= boy | apple

<verb> ::= ate

The following strings are sentential forms of LG:

< Noun Phrase > < Verb Phrase >

the boy < Verb Phrase >

< Noun Phrase > ate < Noun Phrase >

the boy ate < Noun Phrase >

the boy ate an apple

However, only ‘the boy ate an apple’ is a sentence.

Reduction

Example: To determine the validity of the string

the boy ate an apple

according to the grammar, we perform the following reductions:

Step String

the boy ate an apple

1 < Article > boy ate an apple

2 < Article > < Noun > ate an apple

3 < Article > < Noun > < Verb > an apple

4 < Article > < Noun > < Verb > < Article > apple

5 < Article > < Noun > < Verb > < Article > < Noun >

6 < Noun Phrase > < Verb > < Article > < Noun >

7 < Noun Phrase > < Verb > < Noun Phrase >

8 < Noun Phrase > < Verb Phrase >

9 < Sentence >

The string is a sentence of LG since we are able to construct the reduction sequence: the boy ate an apple → < Sentence >.

Parse trees

A sequence of derivations or reductions reveals the syntactic structure of a string with respect to G. We depict the syntactic structure in the form of a parse tree. Derivation according to the production A ::= α gives rise to an elemental parse tree in which A forms the root and the symbols of α form its children.

B) Classification of Grammars

Grammars are classified on the basis of the nature of productions used in them (Chomsky, 1963). Each grammar class has its own characteristics and limitations.

Type – 0 Grammars

These grammars, known as phrase structure grammars, contain productions of the form

α ::= β

where both α and β can be strings of Ts and NTs. Such productions permit arbitrary substitution of strings during derivation or reduction, hence they are not relevant to the specification of programming languages.

Type – 1 grammars

These grammars are known as context sensitive grammars because their productions specify that derivation or reduction of strings can take place only in specific contexts. A Type-1 production has the form

αAβ ::= απβ

Thus, a string π in a sentential form can be replaced by A (or vice versa) only when it is enclosed by the strings α and β. These grammars are also not particularly relevant for PL specification since recognition of PL constructs is not context sensitive in nature.

Type – 2 grammars

These grammars impose no context requirements on derivations or reductions. A typical Type-2 production is of the form

A ::= π

which can be applied independently of its context. These grammars are therefore known as context free grammars (CFG). CFGs are ideally suited for programming language specification.

Type – 3 grammars

Type-3 grammars are characterized by productions of the form

A ::= tB | t   or   A ::= Bt | t

Note that these productions also satisfy the requirements of Type-2 grammars. The specific form of the RHS alternatives—namely a single T or a string containing a single T and a single NT—gives some practical advantages in scanning.

Type-3 grammars are also known as linear grammars or regular grammars. These are further categorized into left-linear and right-linear grammars depending on whether the NT in the RHS alternative appears at the extreme left or extreme right.
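As an added illustration (not from the original text), consider the right-linear grammar <Id> ::= l <Id> | l, where the terminal l stands for any letter; it generates identifiers made of one or more letters. Because every RHS alternative contains at most one NT, and that NT appears at the extreme right, such strings can be recognized in a single left-to-right scan, as the small C sketch below suggests:

#include <ctype.h>
#include <stdio.h>

/* Recognizer for the right-linear (Type-3) grammar
       <Id> ::= letter <Id> | letter
   i.e. one or more letters; a single left-to-right scan suffices.       */
static int is_identifier(const char *s)
{
    if (!isalpha((unsigned char) *s))    /* must start with a letter     */
        return 0;
    while (isalpha((unsigned char) *s))  /* rule <Id> ::= letter <Id>    */
        s++;
    return *s == '\0';                   /* rule <Id> ::= letter ends it */
}

int main(void)
{
    printf("%d\n", is_identifier("boy"));   /* prints 1 */
    printf("%d\n", is_identifier("b0y"));   /* prints 0 */
    return 0;
}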

Operator grammars

Definition (Operator grammar (OG)) An operator grammar is a grammar none of whose productions contain two or more consecutive NTs in any RHS alternative.

Thus, nonterminals occurring in an RHS string are separated by one or more terminal symbols. All terminal symbols occurring in the RHS strings are called operators of the grammar.

C) Binding and Binding Times

Definition: Binding: A binding is the association of an attribute of a program entity with a value.

Binding time is the time at which a binding is performed. Thus, the type attribute of a variable var is bound to a type when its declaration is processed, and the size attribute of that type is bound to a value sometime prior to this binding. We are interested in the following binding times:

1. Language definition time of L

2. Language implementation time of L

3. Compilation time of P

4. Execution init time of proc

5. Execution time of proc.

Here L is a programming language, P is a program written in L, and proc is a procedure in P. Note that language implementation time is the time when a language translator is designed. The preceding list of binding times is not exhaustive; other binding times can be defined, viz. binding at the linking time of P. The language definition of L specifies binding times for the attributes of various entities of programs written in L.

Binding of the keywords of Pascal to their meanings is performed at language definition time. This is how keywords like program, procedure, begin and end get their meanings. These bindings apply to all programs written in Pascal. At language implementation time, the compiler designer performs certain bindings. For example, the size of type ‘integer’ is bound to n bytes, where n is a number determined by the architecture of the target machine. Binding of the type attributes of variables is performed at the compilation time of the program. The memory addresses of the local variables of a procedure proc (say info and p) are bound at every execution init time of proc. The value attributes of variables are bound (possibly more than once) during an execution of proc. The memory address of p↑ is bound when the procedure call new(p) is executed.

Static and dynamic bindings

Definition (Static binding) A static binding is a binding performed before the execution of a program begins.

Definition (Dynamic binding) A dynamic binding is a binding performed after the execution of a program has begun.
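A small C analogue of these definitions (added here for illustration only) shows attributes bound before execution begins and attributes bound while the program runs:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int i = 10;                 /* type of i: bound at compilation time (static binding) */
    static int counter;         /* address of counter: bound before execution begins     */

    int *p = malloc(sizeof *p); /* address of *p: bound when this call executes          */
    if (p == NULL)
        return 1;
    *p = i;                     /* value of *p: bound, and re-bindable, during execution */

    printf("%d %d\n", counter, *p);
    free(p);
    return 0;
}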

2. Define the following:

A) Systems Software B) Application Software

C) System Programming D) Von Neumann Architecture

Ans –

A) System Software

System software is computer software designed to operate the computer hardware and to provide and maintain a platform for running application software. The most important types of system software are:

· The computer BIOS and device firmware, which provide basic functionality to operate and control the hardware connected to or built into the computer.

· The operating system (prominent examples being Microsoft Windows, Mac OS X and Linux), which allows the parts of a computer to work together by performing tasks like transferring data between memory and disks or rendering output onto a display device. It also provides a platform to run high-level system software and application software.

· Utility software, which helps to analyze, configure, optimize and maintain the computer.

In some publications, the term system software is also used to designate software development tools (like a compiler, linker or debugger).

System software is usually not what a user would buy a computer for - instead, it can be seen as the basics of a computer which come built-in or pre-installed. In contrast to system software, software that allows users to do things like create text documents, play games, listen to music, or surf the web is called application software.

B) Application Software

Application software, also known as applications or apps, is computer software designed to help the user to perform singular or multiple related specific tasks. Examples include Enterprise software, Accounting software, Office suites, Graphics software and media players.

Application software is contrasted with system software and middleware, which manage and integrate a computer's capabilities but typically do not directly apply them in the performance of tasks that benefit the user. A simple, if imperfect, analogy in the world of hardware would be the relationship of an electric light bulb (an application) to an electric power generation plant (a system). The power plant merely generates electricity, which is not itself of any real use until harnessed to an application like the electric light that performs a service that benefits the user.

In computer science, an application is a computer program designed to help people perform a certain type of work. An application thus differs from an operating system (which runs a computer), a utility (which performs maintenance or general-purpose chores), and a programming language (with which computer programs are created). Depending on the work for which it was designed, an application can manipulate text, numbers, graphics, or a combination of these elements. Some application packages offer considerable computing power by focusing on a single task, such as word processing; others, called integrated software, offer somewhat less power but include several applications. User-written software tailors systems to meet the user's specific needs. User-written software includes spreadsheet templates, word processor macros, scientific simulations, and graphics and animation scripts. Even email filters are a kind of user software. Users create this software themselves and often overlook how important it is. The delineation between system software such as operating systems and application software is not exact, however, and is occasionally the object of controversy.

C) System Programming

System programming (or systems programming) is the activity of programming system software. The primary distinguishing characteristic of systems programming when compared to application programming is that application programming aims to produce software which provides services to the user (e.g. word processor), whereas systems programming aims to produce software which provides services to the computer hardware (e.g. disk defragmenter). It requires a greater degree of hardware awareness.

In system programming more specifically:

· the programmer will make assumptions about the hardware and other properties of the system that the program runs on, and will often exploit those properties (for example by using an algorithm that is known to be efficient when used with specific hardware);

· usually a low-level programming language or programming language dialect is used that:

  - can operate in resource-constrained environments
  - is very efficient and has little runtime overhead
  - has a small runtime library, or none at all
  - allows for direct and "raw" control over memory access and control flow
  - lets the programmer write parts of the program directly in assembly language;

· debugging can be difficult if it is not possible to run the program in a debugger due to resource constraints. Running the program in a simulated environment can be used to reduce this problem.

Systems programming is sufficiently different from application programming that programmers tend to specialize in one or the other.

In system programming, often limited programming facilities are available. The use of automatic garbage collection is not common and debugging is sometimes hard to do. The runtime library, if available at all, is usually far less powerful, and does less error checking. Because of those limitations, monitoring and logging are often used; operating systems may have extremely elaborate logging subsystems.

Implementing certain parts of an operating system or of networking software requires systems programming, for example implementing paging (virtual memory) or a device driver for an operating system.

D) The Von Neumann Architecture

The von Neumann architecture is a design model for a stored-program digital computer that uses a central processing unit (CPU) and a single separate storage structure ("memory") to hold both instructions and data. It is named after the mathematician and early computer scientist John von Neumann. Such computers implement a universal Turing machine and have a sequential architecture.

A stored-program digital computer is one that keeps its programmed instructions, as well as its data, in read-write, random-access memory (RAM). Stored-program computers were an advancement over the program-controlled computers of the 1940s, such as the Colossus and the ENIAC, which were programmed by setting switches and inserting patch leads to route data and control signals between various functional units. In the vast majority of modern computers, the same memory is used for both data and program instructions. The mechanisms for transferring the data and instructions between the CPU and memory are, however, considerably more complex than the original von Neumann architecture.
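The stored-program idea can be illustrated with a deliberately tiny C sketch (the three-instruction "machine" below is invented purely for this illustration): instructions and data share one memory array, and the CPU loop repeatedly fetches, decodes and executes:

#include <stdio.h>

enum { HALT = 0, LOAD = 1, ADD = 2 };

int main(void)
{
    /* memory[0..4] holds the program, memory[6..7] holds the data       */
    int memory[8] = { LOAD, 6, ADD, 7, HALT, 0, 40, 2 };
    int pc = 0, acc = 0, running = 1;

    while (running) {
        int opcode = memory[pc++];                       /* fetch        */
        switch (opcode) {                                /* decode       */
        case LOAD: acc  = memory[memory[pc++]]; break;   /* execute      */
        case ADD:  acc += memory[memory[pc++]]; break;
        case HALT: running = 0;                 break;
        }
    }
    printf("result = %d\n", acc);   /* prints 42 */
    return 0;
}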

3. Explain the following with respect to the design specifications of an

Assembler:

A) Data Structures B) pass1 & pass2 Assembler flow chart

Ans –

A) Data Structure

The second step in our design procedure is to establish the databases that we have to work with.

Pass 1 Data Structures

1. Input source program

2. A Location Counter (LC), used to keep track of each instruction’s location.

3. A table, the Machine-operation Table (MOT), that indicates the symbolic mnemonic and length (two, four, or six bytes) for each instruction.

4. A table, the Pseudo-Operation Table (POT) that indicates the symbolic mnemonic and action to be taken for each pseudo-op in pass 1.

5. A table, the Symbol Table (ST) that is used to store each label and its corresponding value.

6. A table, the literal table (LT) that is used to store each literal encountered and its corresponding assignment location.

7. A copy of the input to be used by pass 2.

Pass 2 Data Structures

1. Copy of source program input to pass1.

2. Location Counter (LC)

3. A table, the Machine-operation Table (MOT), that indicates for each instruction, symbolic mnemonic, length (two, four, or six bytes), binary machine opcode and format of instruction.

4. A table, the Pseudo-Operation Table (POT), that indicates the symbolic mnemonic and action to be taken for each pseudo-op in pass 2.

5. A table, the Symbol Table (ST), prepared by pass1, containing each label and corresponding value.

6. A Table, the base table (BT), that indicates which registers are currently specified as base registers by USING pseudo-ops and what the specified contents of these registers are.

7. A work space INST that is used to hold each instruction as its various parts are being assembled together.

8. A work space, PRINT LINE, used to produce a printed listing.

9. A work space, PUNCH CARD, used prior to actual outputting for converting the assembled instructions into the format needed by the loader.

10. An output deck of assembled instructions in the format needed by the loader.

Format of Data Structures

The third step in our design procedure is to specify the format and content of each of the data structures. Pass 2 requires a machine operation table (MOT) containing the name, length, binary code and format; pass 1 requires only the name and length. Instead of using two different tables, we construct a single MOT. The machine operation table (MOT) and the pseudo-operation table (POT) are examples of fixed tables. The contents of these tables are not filled in or altered during the assembly process.

The following figure depicts the format of the machine-op table (MOT)

————————————— 6 bytes per entry —————————————

Mnemonic opcode   Binary opcode   Instruction length   Instruction format   Not used here
(4 bytes)         (1 byte)        (2 bits)             (3 bits)             (3 bits)
(characters)      (hexadecimal)   (binary)             (binary)

“Abbb”            5A              10                   001
“AHbb”            4A              10                   001
“ALbb”            5E              10                   001
“ALRb”            1E              01                   000
…….               …….             …….                  …….

‘b’ represents “blank”
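One possible C rendering of this 6-byte entry is sketched below (the field packing, names and lookup routine are assumptions made for illustration, not necessarily the book's exact layout):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct mot_entry {
    char    mnemonic[4];   /* e.g. "Abbb", where 'b' stands for blank              */
    uint8_t opcode;        /* binary opcode, e.g. 0x5A for A (add)                 */
    uint8_t len_fmt;       /* bits 7-6: length in halfwords, bits 5-3: format code */
};

static const struct mot_entry mot[] = {
    { "Abbb", 0x5A, (2u << 6) | (1u << 3) },   /* 4-byte instruction, format 001 */
    { "AHbb", 0x4A, (2u << 6) | (1u << 3) },
    { "ALbb", 0x5E, (2u << 6) | (1u << 3) },
    { "ALRb", 0x1E, (1u << 6) | (0u << 3) },   /* 2-byte instruction, format 000 */
};

/* A real assembler would use binary search; a linear scan keeps the sketch short. */
static const struct mot_entry *mot_lookup(const char *mnemonic)
{
    for (size_t i = 0; i < sizeof mot / sizeof mot[0]; i++)
        if (strncmp(mot[i].mnemonic, mnemonic, 4) == 0)
            return &mot[i];
    return NULL;
}

int main(void)
{
    const struct mot_entry *e = mot_lookup("Abbb");
    if (e != NULL)
        printf("opcode %02X, length %d bytes\n", (unsigned) e->opcode, (e->len_fmt >> 6) * 2);
    return 0;
}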

B) pass1 & pass2 Assembler flow chart

Pass Structure of Assemblers

Here we discuss two pass and single pass assembly schemes in this section:

Two pass translation

Two pass translation of an assembly language program can handle forward references easily. LC processing is performed in the first pass and symbols defined in the program are entered into the symbol table. The second pass synthesizes the target form using the address information found in the symbol table. In effect, the first pass performs analysis of the source program while the second pass performs synthesis of the target program. The first pass constructs an intermediate representation (IR) of the source program for use by the second pass. This representation consists of two main components–data structures, e.g. the symbol table, and a processed form of the source program. The latter component is called intermediate code (IC).

Single pass translation

LC processing and construction of the symbol table proceed as in two pass translation. The problem of forward references is tackled using a process called backpatching. The operand field of an instruction containing a forward reference is left blank initially. The address of the forward referenced symbol is put into this field when its definition is encountered.

Look at the following instructions:

START 101

READ N 101) + 09 0 113

MOVER BREG, ONE 102) + 04 2 115

MOVEM BREG, TERM 103) + 05 2 116

AGAIN MULT BREG, TERM 104) + 03 2 116

MOVER CREG, TERM 105) + 04 3 116

ADD CREG, ONE 106) + 01 3 115

MOVEM CREG, TERM 107) + 05 3 116

COMP CREG, N 108) + 06 3 113

BC LE, AGAIN 109) + 07 2 104

MOVEM BREG, RESULT 110) + 05 2 114

PRINT RESULT 111) + 10 0 114

STOP 112) + 00 0 000

N DS 1 113)

RESULT DS 1 114)

ONE DC ‘1’ 115) + 00 0 001

TERM DS 1 116)

END

In the above program, the instruction corresponding to the statement

MOVER BREG, ONE

can be only partially synthesized since ONE is a forward reference. Hence the instruction opcode and address of BREG will be assembled to reside in location 101. The need for inserting the second operand’s address at a later stage can be indicated by adding an entry to the Table of Incomplete Instructions (TII). This entry is a pair (<instruction address>, <symbol>), e.g. (101, ONE) in this case.

By the time the END statement is processed, the symbol table would contain the addresses of all symbols defined in the source program and TII would contain information describing all forward references. The assembler can now process each entry in TII to complete the concerned instruction. For example, the entry (101, ONE) would be processed by obtaining the address of ONE from symbol table and inserting it in the operand address field of the instruction with assembled address 101. Alternatively, entries in TII can be processed in an incremental manner. Thus, when definition of some symbol symb is encountered, all forward references to symb can be processed.
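A minimal C sketch of this backpatching scheme follows (the structure names, sizes and addresses are invented for illustration):

#include <stdio.h>
#include <string.h>

/* Table of Incomplete Instructions: each entry pairs the address of a
   partially assembled instruction with the forward-referenced symbol.   */
struct tii_entry { int instr_addr; char symbol[8]; };

static struct tii_entry tii[64];
static int tii_count = 0;

static int operand_field[200];   /* operand-address field of each assembled word */

/* Called when an operand is a forward reference: the field stays blank. */
static void note_forward_ref(int instr_addr, const char *symbol)
{
    tii[tii_count].instr_addr = instr_addr;
    strncpy(tii[tii_count].symbol, symbol, sizeof tii[tii_count].symbol - 1);
    tii_count++;
}

/* Called when the symbol's definition (and hence its address) is found. */
static void backpatch(const char *symbol, int symbol_addr)
{
    for (int i = 0; i < tii_count; i++)
        if (strcmp(tii[i].symbol, symbol) == 0)
            operand_field[tii[i].instr_addr] = symbol_addr;
}

int main(void)
{
    note_forward_ref(101, "ONE");   /* MOVER BREG, ONE assembled at location 101      */
    /* ... later, ONE DC '1' is processed and found to reside at location 115 ...     */
    backpatch("ONE", 115);
    printf("operand field at 101 = %d\n", operand_field[101]);   /* prints 115        */
    return 0;
}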

Design of A Two Pass Assembler

Tasks performed by the passes of a two pass assembler are as follows:

Pass I:

1. Separate the symbol, mnemonic opcode and operand fields.

2. Build the symbol table.

3. Perform LC processing.

4. Construct intermediate representation.

Pass II: Synthesize the target program.

Pass I performs analysis of the source program and synthesis of the intermediate representation while Pass II processes the intermediate representation to synthesize the target program. The design details of assembler passes are discussed after introducing advanced assembler directives and their influence on LC processing.
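A skeletal C illustration of Pass I's LC processing and symbol table construction is given below (the helper names and the uniform instruction length of one word are assumptions of this sketch; field separation and MOT/POT lookup are elided):

#include <stdio.h>
#include <string.h>

struct sym { char name[8]; int addr; };
static struct sym symtab[100];
static int nsyms = 0;

/* Process one already-separated statement: record its label (task 2) and
   advance the location counter (task 3).                                  */
static void pass1_line(const char *label, const char *mnemonic, int length, int *lc)
{
    (void) mnemonic;            /* a real Pass I would look this up in the MOT/POT */
    if (label[0] != '\0') {
        strncpy(symtab[nsyms].name, label, sizeof symtab[nsyms].name - 1);
        symtab[nsyms].addr = *lc;
        nsyms++;
    }
    /* task 4 would also emit an intermediate-code record here */
    *lc += length;
}

int main(void)
{
    int lc = 101;                            /* START 101    */
    pass1_line("",      "READ",  1, &lc);    /* location 101 */
    pass1_line("",      "MOVER", 1, &lc);    /* location 102 */
    pass1_line("",      "MOVEM", 1, &lc);    /* location 103 */
    pass1_line("AGAIN", "MULT",  1, &lc);    /* location 104 */
    printf("%s = %d\n", symtab[0].name, symtab[0].addr);   /* AGAIN = 104 */
    return 0;
}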

4. Explain the following with respect to Macros and Macro Processors:

A) Macro Definition and Expansion

B) Conditional Macro Expansion

C) Macro Parameters

Ans –

A) Macro definition and Expansion

Definition: Macro

A macro name is an abbreviation, which stands for some related lines of code. Macros are useful for the following purposes:

· To simplify and reduce the amount of repetitive coding

· To reduce errors caused by repetitive coding

· To make an assembly program more readable.

A macro consists of name, set of formal parameters and body of code. The use of macro name with set of actual parameters is replaced by some code generated by its body. This is called macro expansion.

Macros allow a programmer to define pseudo operations, typically operations that are generally desirable, are not implemented as part of the processor instruction set, and can be implemented as a sequence of instructions. Each use of a macro generates new program instructions; the macro thus has the effect of automating the writing of the program.

Macros can be defined and used in many programming languages, like C and C++. Consider an example macro in C programming. Macros are commonly used in C to define small snippets of code. If the macro has parameters, they are substituted into the macro body during expansion; thus, a C macro can mimic a C function. The usual reason for doing this is to avoid the overhead of a function call in simple cases, where the code is lightweight enough that function call overhead has a significant impact on performance.

For instance,

#define max(a, b) a>b ? a : b

defines the macro max, taking two arguments a and b. This macro may be called like any C function, using identical syntax. Therefore, after preprocessing,

z = max(x, y);

becomes

z = x>y ? x : y;

While this use of macros is very important for C, for instance to define type-safe generic data-types or debugging tools, it is also slow, rather inefficient, and may lead to a number of pitfalls.

C macros are capable of mimicking functions, creating new syntax within some limitations, as well as expanding into arbitrary text (although the C compiler will require that text to be valid C source code, or else comments), but they have some limitations as a programming construct. Macros which mimic functions, for instance, can be called like real functions, but a macro cannot be passed to another function using a function pointer, since the macro itself has no address.
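To make the pitfalls concrete, here is a short illustrative example (BAD_MAX and MAX are names chosen for this sketch, not standard macros) showing the precedence and double-evaluation problems of function-like macros:

#include <stdio.h>

#define BAD_MAX(a, b)  a > b ? a : b               /* no protective parentheses     */
#define MAX(a, b)      ((a) > (b) ? (a) : (b))     /* parenthesized, but still text */

int main(void)
{
    int x = 4, y = 6;

    /* Expands to: 2 + x > y ? x : y, i.e. (2+4 > 6) ? x : y, which yields 6, not 8 */
    printf("%d\n", 2 + BAD_MAX(x, y));

    /* Expands to: 2 + ((x) > (y) ? (x) : (y)), which yields 8 as intended          */
    printf("%d\n", 2 + MAX(x, y));

    /* Double evaluation: i++ appears twice in the expansion, so i ends up as 6     */
    int i = 4;
    int m = MAX(i++, 3);
    printf("%d %d\n", m, i);   /* prints 5 6 */
    return 0;
}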

In programming languages such as C or assembly language, a macro is a name that defines a set of commands which are substituted for the macro name wherever the name appears in a program (a process called macro expansion) when the program is compiled or assembled. Macros are similar to functions in that they can take arguments and in that they are calls to lengthier sets of instructions. Unlike functions, macros are replaced by the actual commands they represent when the program is prepared for execution; function instructions, in contrast, are copied into a program only once.

Macro Expansion.

A macro call leads to macro expansion. During macro expansion, the macro statement is replaced by sequence of assembly statements.

Macro expansion on a source program.

Example

In the program of the above figure, a macro call (INITZ) is shown in the middle; it is expanded wherever it is invoked. Every macro definition begins with the MACRO keyword and ends with ENDM (end macro). Whenever a macro is called, the entire body code is substituted into the program at the point of call, so the result of the expansion is shown on the rightmost side of the figure.

Macro calling in high-level programming languages (C programming):

#define max(a,b) a>b?a:b

int main(void) {

int x, y, z;

x = 4; y = 6;

z = max(x, y); return z; }

The above program was written using C programming statements. It defines the macro max, taking two arguments a and b. This macro may be called like any C function, using identical syntax. Therefore, after preprocessing, the statement

z = max(x, y);

becomes

z = x>y ? x : y;

After macro expansion, the whole code would appear like this.

#define max(a,b) a>b?a:b

int main(void)

{ int x, y, z;

x = 4; y = 6; z = x>y ? x : y; return z; }

B) Conditional Macro Expansion

Conditional macro expansion means that some sections of the program may be optional: they are either included or not in the final program, depending upon specified conditions. A reasonable use of conditional assembly would be to combine two versions of a program: one that prints debugging information during test executions for the developer, and another version for production operation that displays only results of interest for the average user. A program fragment could, for example, assemble the instructions that print the AX register only if Debug is true; note that true is any non-zero value. (A C analogue of this idea is sketched after the #if example below.)

Here is a conditional directive in C programming; the following statement tests the expression BUFSIZE == 1020, where BUFSIZE must be a macro:

#if BUFSIZE == 1020

printf ("Large buffers!n");

#endif /* BUFSIZE is large */

Note: In C programming, macros are defined above main().
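A C analogue of the Debug idea mentioned above might look like the following sketch (the DEBUG macro and the variable ax are invented for this illustration):

#include <stdio.h>

#define DEBUG 1     /* set to 0 for the production build */

int main(void)
{
    int ax = 42;    /* stands in for the AX register */

#if DEBUG
    printf("debug: ax = %d\n", ax);   /* compiled in only when DEBUG is non-zero */
#endif

    return 0;
}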

C) Macro Parameters

Macros may have any number of parameters, as long as they fit on one line. Parameter names are local symbols, which are known within the macro only. Outside the macro they have no meaning!

Syntax:

<macro name> MACRO <parameter 1>, ……, <parameter n>
<body line 1>
<body line 2>
…
<body line m>
ENDM

Valid macro arguments are

1. arbitrary sequences of printable characters, not containing blanks, tabs, commas, or semicolons

2. quoted strings (in single or double quotes)

3. Single printable characters, preceded by ‘!’ as an escape character

4. Character sequences, enclosed in literal brackets < … >, which may be arbitrary sequences of valid macro arguments, blanks, commas and semicolons

5. Arbitrary sequences of valid macro arguments

6. Expressions preceded by a ‘%’ character

During macro expansion, these actual arguments replace the symbols of the corresponding formal parameters, wherever they are recognized in the macro body. The first argument replaces the symbol of the first parameter, the second argument replaces the symbol of the second parameter, and so forth. This is called substitution.

Example 3

MY_SECOND MACRO CONSTANT, REGISTER

MOV A,#CONSTANT

ADD A,REGISTER

ENDM

MY_SECOND 42, R5

After calling the macro MY_SECOND, the body lines

MOV A,#42

ADD A,R5

are inserted into the program, and assembled. The parameter names CONSTANT and REGISTER have been replaced by the macro arguments "42" and "R5". The number of arguments, passed to a macro, can be less (but not greater) than the number of its formal parameters. If an argument is omitted, the corresponding formal parameter is replaced by an empty string. If other arguments than the last ones are to be omitted, they can be represented by commas.

Macro parameters support code reuse, allowing one macro definition to implement multiple algorithms. In the following, the .DIV macro has a single parameter N. When the macro is used in the program, the actual parameter used is substituted for the formal parameter defined in the macro prototype during the macro expansion. Now the same macro, when expanded, can produce code to divide by any unsigned integer.

Fig. 3.0

Example 4

The macro OPTIONAL has eight formal parameters:

OPTIONAL MACRO P1,P2,P3,P4,P5,P6,P7,P8
<macro body>
ENDM

If it is called as follows,

OPTIONAL 1,2,,,5,6

the formal parameters P1, P2, P5 and P6 are replaced by the arguments 1, 2, 5 and 6 during substitution. The parameters P3, P4, P7 and P8 are replaced by a zero length string.

5. Describe the process of Bootstrapping in the context of Linkers

Ans –

In computing, bootstrapping refers to a process where a simple system activates another more complicated system that serves the same purpose. It is a solution to the Chicken-and-egg problem of starting a certain system without the system already functioning. The term is most often applied to the process of starting up a computer, in which a mechanism is needed to execute the software program that is responsible for executing software programs (the operating system).

Bootstrap loading

The discussions of loading up to this point have all presumed that there’s already an operating system or at least a program loader resident in the computer to load the program of interest. The chain of programs being loaded by other programs has to start somewhere, so the obvious question is how is the first program loaded into the computer?

In modern computers, the first program the computer runs after a hardware reset invariably is stored in a ROM known as the bootstrap ROM, as in "pulling one’s self up by the bootstraps." When the CPU is powered on or reset, it sets its registers to a known state. On x86 systems, for example, the reset sequence jumps to the address 16 bytes below the top of the system’s address space. The bootstrap ROM occupies the top 64K of the address space and ROM code then starts up the computer. On IBM-compatible x86 systems, the boot ROM code reads the first block of the floppy disk into memory, or if that fails the first block of the first hard disk, into memory location zero and jumps to location zero. The program in block zero in turn loads a slightly larger operating system boot program from a known place on the disk into memory, and jumps to that program which in turn loads in the operating system and starts it. (There can be even more steps, e.g., a boot manager that decides from which disk partition to read the operating system boot program, but the sequence of increasingly capable loaders remains.)

Why not just load the operating system directly? Because you can’t fit an operating system loader into 512 bytes. The first level loader typically is only able to load a single-segment program from a file with a fixed name in the top-level directory of the boot disk. The operating system loader contains more sophisticated code that can read and interpret a configuration file, uncompress a compressed operating system executable, and address large amounts of memory (on an x86 the loader usually runs in real mode, which means that it’s tricky to address more than 1MB of memory). The full operating system can turn on the virtual memory system, load the drivers it needs, and then proceed to run user-level programs.

Many Unix systems use a similar bootstrap process to get user-mode programs running. The kernel creates a process, then stuffs a tiny little program, only a few dozen bytes long, into that process. The tiny program executes a system call that runs /etc/init, the user mode initialization program that in turn runs configuration files and starts the daemons and login programs that a running system needs.
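A rough C rendering of that "tiny program" is sketched below (illustrative only; a real kernel start-up stub is written quite differently):

#include <unistd.h>

int main(void)
{
    char *argv[] = { "/etc/init", (char *) 0 };
    char *envp[] = { (char *) 0 };

    execve("/etc/init", argv, envp);   /* replace this process image with /etc/init */
    return 1;                          /* reached only if the exec fails            */
}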

None of this matters much to the application level programmer, but it becomes more interesting if you want to write programs that run on the bare hardware of the machine, since then you need to arrange to intercept the bootstrap sequence somewhere and run your program rather than the usual operating system. Some systems make this quite easy (just stick the name of your program in AUTOEXEC.BAT and reboot Windows 95, for example), others make it nearly impossible. It also presents opportunities for customized systems. For example, a single-application system could be built over a Unix kernel by naming the application /etc/init.

Software Bootstrapping & Compiler Bootstrapping

Bootstrapping can also refer to the development of successively more complex, faster programming environments. The simplest environment will be, perhaps, a very basic text editor (e.g. ed) and an assembler program. Using these tools, one can write a more complex text editor, and a simple compiler for a higher-level language and so on, until one can have a graphical IDE and an extremely high-level programming language.

Compiler Bootstrapping

In compiler design, a bootstrap or bootstrapping compiler is a compiler that is written in the target language, or a subset of the language, that it compiles. Examples include gcc, GHC, OCaml, BASIC, PL/I and more recently the Mono C# compiler.

6. Describe the procedure for design of a Linker.

Ans –

Design of a linker

Relocation and linking requirements in segmented addressing

The relocation requirements of a program are influenced by the addressing structure of the computer system on which it is to execute. Use of the segmented addressing structure reduces the relocation requirements of a program.

Implementation Examples: A Linker for MS-DOS

Example: Consider a program written in the assembly language of the Intel 8088. The ASSUME statement declares the segment registers CS and DS to be available for memory addressing. Hence all memory addressing is performed by using suitable displacements from their contents. The translation time address of A is 0196. In statement 16, a reference to A is assembled as a displacement of 196 from the contents of the CS register. This avoids the use of an absolute address, hence the instruction is not address sensitive. Now no relocation is needed if segment SAMPLE is to be loaded with address 2000 by a calling program (or by the OS). The effective operand address would be calculated as <CS> + 0196, which is the correct address 2196. A similar situation exists with the reference to B in statement 17. The reference to B is assembled as a displacement of 0002 from the contents of the DS register. Since the DS register would be loaded with the execution time address of DATA_HERE, the reference to B would be automatically relocated to the correct address.

Though the use of segment registers reduces the relocation requirements, it does not completely eliminate the need for relocation. Consider statement 14:

MOV AX, DATA_HERE

which loads the segment base of DATA_HERE into the AX register preparatory to its transfer into the DS register. Since the assembler knows DATA_HERE to be a segment, it makes provision to load the higher order 16 bits of the address of DATA_HERE into the AX register. However, it does not know the link time address of DATA_HERE, hence it assembles the MOV instruction in the immediate operand format and puts zeroes in the operand field. It also makes an entry for this instruction in RELOCTAB so that the linker would put the appropriate address in the operand field. Inter-segment calls and jumps are handled in a similar way.

Relocation is somewhat more involved in the case of intra-segment jumps assembled in the FAR format. For example, consider the following program:

FAR_LAB EQU THIS FAR ; FAR_LAB is a FAR label

JMP FAR_LAB ; A FAR jump

Here the displacement and the segment base of FAR_LAB are to be put in the JMP instruction itself. The assembler puts the displacement of FAR_LAB in the first two operand bytes of the instruction, and makes a RELOCTAB entry for the third and fourth operand bytes which are to hold the segment base address. A statement like

ADDR_A DW OFFSET A

(which is an ‘address constant’) does not need any relocation since the assembler can itself put the required offset in the bytes. In summary, the only RELOCTAB entries that must exist for a program using segmented memory addressing are for the bytes that contain a segment base address.
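An illustrative C sketch of how a linker might consume such RELOCTAB entries is given below (the structures, the little-endian patching and the sample bytes are assumptions made for this sketch):

#include <stdint.h>
#include <stdio.h>

/* Each entry marks the offset of a 16-bit field that must receive a
   segment base address once the load address is known.                  */
struct reloctab_entry { uint16_t offset; };

static void relocate(uint8_t *code, const struct reloctab_entry *tab,
                     int n, uint16_t segment_base)
{
    for (int i = 0; i < n; i++) {
        /* The MOV AX, DATA_HERE instruction was assembled with zeroes in its
           operand field; store the link-time segment base there.            */
        code[tab[i].offset]     = (uint8_t) (segment_base & 0xFF);
        code[tab[i].offset + 1] = (uint8_t) (segment_base >> 8);
    }
}

int main(void)
{
    uint8_t code[16] = { 0xB8, 0x00, 0x00 };   /* MOV AX, imm16 with blank operand     */
    struct reloctab_entry tab[] = { { 1 } };   /* the operand field starts at offset 1 */

    relocate(code, tab, 1, 0x2000);            /* assumed segment base of DATA_HERE    */
    printf("%02X %02X %02X\n", (unsigned) code[0], (unsigned) code[1], (unsigned) code[2]);   /* B8 00 20 */
    return 0;
}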

For linking, however, both the segment base address and the offset of the external symbol must be computed by the linker. Hence there is no reduction in the linking requirements.