Manual on Compiler Design


Post on 15-Nov-2014






Compiler Design


COURSE CODE AND TITLE: ITED 308 - Compiler Design

COURSE DESCRIPTION: This is a three-unit course. It is designed as an introduction to the design and construction of compilers and interpreters. It will open with a discussion of translators related to compilers, followed by an overview of the phases and data structures involved in compilation. Topics including lexical analysis, parsing, semantic analysis, code generation, and optimization will then be covered in depth, with a series of projects assigned to illustrate practical issues. The performance of the student will be evaluated through quizzes, machine problems, a term project, and modular examinations.

PREREQUISITES:
- CS 19C Numerical Method

OBJECTIVES:

General Objective: To study the theory and techniques of compiler construction. The student completing the course should have gained an understanding of the major ideas and techniques in compiler writing and further developed their programming skills.

Specific Objectives:
- To explain the basic concepts and principles of compilers.
- To discuss the problems and issues in designing and implementing lexical analyzers.
- To listen critically and purposively to the basic concepts of compilers.
- To participate actively in the group term project.
- To construct a program that acts as a recognizer for the set of strings defined by a regular expression or context-free grammar.

COURSE CONTENT:

Module   Title
I.       Introduction to Compilers
II.      Lexical Analysis
III.     The Syntactic Specification of Programming Languages
         Pre-Midterm Examination
IV.      Basic Parsing Techniques
V.       Syntax-Directed Translation
         Midterm Examination
VI.      Symbol Tables
VII.     Run-Time Storage
VIII.    Error-Detection and Recovery
         Pre-Final Examination
IX.      Introduction to Code Optimization
X.       Code Generation
         Final Examination

Saint Paul University San Nicolas Campus





Compiler writing spans programming languages, machine architecture, language theory, algorithms, and software engineering. Fortunately, a few basic compiler-writing techniques can be used to construct translators for a wide variety of languages and machines.

A compiler is a program that reads a program written in one language, the source language, and translates it into an equivalent program in another language, the target language, as illustrated in Figure 1.1. An important part of the translation process is that the compiler reports to its user the presence of errors in the source program.

    source program --> [ compiler ] --> target program
                            |
                            v
                      error messages

Figure 1.1 A Compiler

At first glance, the variety of compilers may appear overwhelming. There are thousands of source languages, ranging from traditional programming languages such as Fortran and Pascal to specialized languages that have arisen in virtually every area of computer application. Target languages are equally varied; a target language may be another programming language, or the machine language of any computer between a microprocessor and a supercomputer.

A compiler typically translates a source program into machine language. An interpreter reads individual statements of a source program and immediately executes the corresponding machine-language segments. Interpretation occurs each time the program is used. Thus, once it has been compiled, a program written in a compiled language will generally run more quickly than a program in an interpreted language.

An interpreter is a computer program that translates commands written in a high-level computer language into machine-language commands that the computer can understand and execute. An interpreter's function is thus similar to that of a compiler, but the two differ in their mode of operation. A compiler translates a complete set of high-level commands, producing a corresponding set of machine-language commands that are then executed, whereas an interpreter translates and executes each command as the user enters it. Interpreted languages, such as the widely used BASIC, are relatively easy to learn, and programs written in them are easy to edit and correct. Compiled programs, on the other hand, operate more rapidly and efficiently.

THE ANALYSIS-SYNTHESIS MODEL OF COMPILATION


There are two parts of compilation: analysis and synthesis. The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program. The synthesis part constructs the desired target program from the intermediate representation.

During analysis, the operations implied by the source program are determined and recorded in a hierarchical structure called a tree, specifically a special kind of tree called a syntax tree, in which each node represents an operation and the children of a node represent the arguments of the operation. An example is shown in Figure 1.2.
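As a minimal sketch of the idea, the syntax tree for position := initial + rate * 60 can be written down as a small data structure and then walked to carry out the operations at its nodes. The class names Leaf and Node, the function evaluate, and the sample values for initial and rate are assumptions made for illustration; they do not come from the manual.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """A leaf holds an identifier name (str) or a numeric constant (int)."""
    value: Union[str, int]


@dataclass
class Node:
    """An interior node holds an operator; its children are the operands."""
    op: str
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]

# Syntax tree for: position := initial + rate * 60
tree = Node(":=", Leaf("position"),
            Node("+", Leaf("initial"),
                 Node("*", Leaf("rate"), Leaf(60))))


def evaluate(t, env):
    """Walk the tree and perform the operation found at each node."""
    if isinstance(t, Leaf):
        # Identifiers are looked up in the environment; constants stand for themselves.
        return env[t.value] if isinstance(t.value, str) else t.value
    if t.op == ":=":
        env[t.left.value] = evaluate(t.right, env)
        return env[t.left.value]
    left, right = evaluate(t.left, env), evaluate(t.right, env)
    if t.op == "+":
        return left + right
    if t.op == "*":
        return left * right
    raise ValueError(f"unknown operator {t.op!r}")


env = {"initial": 10, "rate": 2}   # hypothetical sample values
result = evaluate(tree, env)
print(env["position"])             # 10 + 2 * 60 = 130
```

Because the multiplication node sits below the addition node, rate * 60 is computed first, which is how the shape of the tree records operator precedence.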

          :=
         /  \
 position    +
            / \
     initial   *
              / \
          rate   60

Figure 1.2 Syntax tree for position := initial + rate * 60

Software tools that manipulate source programs:

1. Structure editors take as input a sequence of commands to build a source program. The structure editor not only performs the text-creation and modification functions of an ordinary text editor, but also analyzes the program text, putting an appropriate hierarchical structure on the source program.

2. Pretty printers analyze a program and print it in such a way that the structure of the program becomes clearly visible. For example, comments may appear in a special font, and statements may appear with an amount of indentation proportional to the depth of their nesting in the hierarchical organization of the statements.

3. Static checkers read a program, analyze it, and attempt to discover potential bugs without running the program. For example, a static checker may detect that parts of the source program can never be executed, or that a certain variable might be used before being defined.

4. Interpreters perform the operations implied by the source program. For an assignment statement, for example, an interpreter might build a tree like the one in Figure 1.2 and then carry out the operations at the nodes as it walks the tree. Interpreters are frequently used to execute command languages, since each operator executed in a command language is usually an invocation of a complex routine such as an editor or compiler.

ANALYSIS OF THE SOURCE PROGRAM


Analysis consists of three phases:

1. Linear analysis, in which the stream of characters making up the source program is read from left to right and grouped into tokens, which are sequences of characters having a collective meaning.

2. Hierarchical analysis, in which characters or tokens are grouped hierarchically into nested collections with collective meaning.

3. Semantic analysis, in which certain checks are performed to ensure that the components of a program fit together meaningfully.
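Linear analysis, the first of the three phases listed above, can be sketched as a small scanner that reads characters from left to right and groups them into tokens. This is only an illustration: the token names (ID, NUMBER, ASSIGN, PLUS, TIMES) and the function tokenize are assumptions, and a real lexical analyzer would handle a much larger set of token classes.

```python
import re

# Each token class is described by a regular expression (names are hypothetical).
TOKEN_SPEC = [
    ("NUMBER",   r"\d+"),
    ("ID",       r"[A-Za-z_]\w*"),
    ("ASSIGN",   r":="),
    ("PLUS",     r"\+"),
    ("TIMES",    r"\*"),
    ("SKIP",     r"\s+"),   # blanks between tokens
    ("MISMATCH", r"."),     # any other single character is an error
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})"
                               for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Group the characters of `source` into (kind, text) tokens, left to right."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue                      # blanks are eliminated here
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r}")
        tokens.append((kind, text))
    return tokens


tokens = tokenize("position := initial + rate * 60")
for kind, text in tokens:
    print(kind, text)
```

For the sample statement the scanner yields seven tokens: three identifiers, the assignment symbol, the plus sign, the multiplication sign, and the number 60.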

THE PHASES OF A COMPILER

Conceptually, a compiler operates in phases, each of which transforms the source program from one representation to another. A typical decomposition of a compiler is shown in Figure 1.3.


                      source program
                            |
                            v
                     lexical analyzer
                            |
                            v
                     syntax analyzer
                            |
                            v
 symbol-table       semantic analyzer             error
 manager                    |                     handler
                            v
               intermediate code generator
                            |
                            v
                      code optimizer
                            |
                            v
                      code generator
                            |
                            v
                      target program

 (The symbol-table manager and the error handler interact with every phase.)

Figure 1.3 Phases of a Compiler

LEXICAL ANALYSIS

In a compiler, linear analysis is called lexical analysis or scanning. For example, in lexical analysis the characters in the assignment statement

    position := initial + rate * 60

would be grouped into the following tokens:

1. The identifier position.
2. The assignment symbol :=.
3. The identifier initial.
4. The plus sign.
5. The identifier rate.
6. The multiplication sign.
7. The number 60.

The blanks separating the characters of these tokens would normally be eliminated during lexical analysis.

SYNTAX ANALYSIS

Hierarchical analysis is called parsing or syntax analysis. It involves grouping the tokens of the source program into grammatical phrases that are used by the compiler to synthesize output. Usually, a parse tree such as the one shown in Figure 1.4 represents the grammatical phrases of the source program.
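One common way to carry out this grouping is recursive descent, sketched below for the sample assignment statement. The grammar in the comments is an assumption chosen for illustration, not taken from the manual. It gives * higher precedence than +, and the parser returns a compact nested tuple instead of the full parse tree of Figure 1.4. The token list is written out by hand so that the example is self-contained.

```python
# Assumed grammar (hypothetical):
#   assignment -> ID ":=" expression
#   expression -> term { "+" term }
#   term       -> factor { "*" factor }
#   factor     -> ID | NUMBER

def parse_assignment(tokens):
    """Group (kind, text) tokens into a nested tuple describing the statement."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else "EOF"

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, found {peek()}")
        text = tokens[pos][1]
        pos += 1
        return text

    def factor():
        if peek() == "ID":
            return ("identifier", expect("ID"))
        return ("number", expect("NUMBER"))

    def term():
        node = factor()
        while peek() == "TIMES":
            expect("TIMES")
            node = ("*", node, factor())
        return node

    def expression():
        node = term()
        while peek() == "PLUS":
            expect("PLUS")
            node = ("+", node, term())
        return node

    target = ("identifier", expect("ID"))
    expect("ASSIGN")
    tree = (":=", target, expression())
    if peek() != "EOF":
        raise SyntaxError(f"unexpected token {peek()}")
    return tree


# Tokens for: position := initial + rate * 60
tokens = [("ID", "position"), ("ASSIGN", ":="), ("ID", "initial"),
          ("PLUS", "+"), ("ID", "rate"), ("TIMES", "*"), ("NUMBER", "60")]
tree = parse_assignment(tokens)
print(tree)
```

Each grammar rule becomes one function, so the nesting of function calls mirrors the nesting of grammatical phrases: rate * 60 is grouped inside term before expression combines it with initial.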



               assignment statement
              /         |          \
      identifier        :=        expression
          |                      /     |     \
       position        expression      +      expression
                           |                 /     |     \
                       identifier   expression     *     expression
                           |            |                    |
                        initial     identifier             number
                                        |                    |
                                       rate                  60

Figure 1.4 Parse tree for position := initial + rate * 60

SEMANTIC ANALYSIS

The semantic analysis phase checks the source program for semantic errors and gathers type information for the subsequent code-generation phase. It uses the hierarchical structure determined by the syntax-analysis phase to identify the operators and operands of expressions and statements. An important component of semantic analysis is type checking. Here the compiler checks that each operator has operands that are permitted by the source language specification. For example, many programming language definitions require a compiler to report an error every time a real number is used to index an array. However, the language specification may permit some operand coercions, for example, when a binary arithmetic operator is applied to an integer and real. In this case, the com