compiler design assignment

ACROPOLIS INSTITUTE OF TECHNOLOGY AND RESEARCH

Name: Abhijeet Kumar Pandey
Subject: Compiler Design
Branch: Computer Science & Engg.

Year/Sem:2010/VII

Q1. Explain why we should study compilers. What is a compiler, and what are its various phases? Illustrate with a diagram and with the example a= b*cd.

Ans: Reasons for Studying Compilers:
• An essential programming tool
• Improves software productivity by hiding low-level details
• A tool for designing and evaluating computer architectures
• Inspired RISC, VLIW machines
• Machines' performance measured on compiled code
• Techniques for developing other programming tools (examples: error detection tools)
• Little languages and program translations can be used to solve other problems

Compiler: A compiler is a computer program (or set of programs) that transforms source code written in a programming language (the source language) into another computer language (the target language, often having a binary form known as object code).

Phases of Compiler:

Lexical Analyzer: The lexical analysis stage transforms a sequence of characters to a sequence of lexical elements. These lexical entities correspond principally to integers, floating point numbers, characters, strings of characters and identifiers. The message Illegal character might be generated by this analysis.

Syntax Analysis: The parsing stage constructs a syntax tree and verifies that the sequence of lexical elements is correct with respect to the grammar of the language. The message Syntax error indicates that the phrase analyzed does not follow the grammar of the language.

Compiler Design (CS-701)


Semantic Analysis: The semantic analysis stage traverses the syntax tree, checking another aspect of program correctness. The analysis consists principally of type inference, which, if successful, produces the most general type of an expression or declaration. Type error messages may occur during this phase. This stage also detects whether any members of a sequence are not of type unit. Other warnings may result, including those from pattern matching analysis (e.g., pattern matching is not exhaustive, or part of the pattern matching will not be used).

Code Generation: The generation and optimization of intermediate code do not produce errors or warning messages.

The final step in the compilation process is the generation of a program binary.





Given expression: a = b*c

Lexical Analyzer
    id1 = id2 * id3;

Syntax Analyzer

Semantic Analyzer

Intermediate code generator
    temp1 = id3;
    temp2 = id2;
    temp3 = temp2 * temp1;
    id1 = temp3;

Code optimizer
    temp1 = id2 * id3;
    id1 = temp1;

Code generator
    MOV ID2, AX
    MOV ID3, BX
    MUL AX, BX
    MOV AX, ID1
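The stages shown for a = b*c can be mimicked in a few lines of Python. This is only a minimal sketch under simplifying assumptions: it handles a single statement of the form target = left * right, the function names tokenize, intermediate_code and optimize are hypothetical, and the optimizer does plain copy propagation without renumbering the temporaries (so it keeps temp3 where the hand-written version uses temp1).

```python
import re

def tokenize(source):
    """Lexical analysis: replace identifiers by id1, id2, ... in order of first appearance."""
    symbol_table = {}
    tokens = []
    for lexeme in re.findall(r"[A-Za-z_]\w*|[=*]", source):
        if lexeme in "=*":
            tokens.append(lexeme)
        else:
            index = symbol_table.setdefault(lexeme, len(symbol_table) + 1)
            tokens.append(f"id{index}")
    return tokens, symbol_table

def intermediate_code(tokens):
    """Naive three-address code for tokens of the form: target = left op right."""
    target, _, left, op, right = tokens
    return [
        f"temp1 = {right}",
        f"temp2 = {left}",
        f"temp3 = temp2 {op} temp1",
        f"{target} = temp3",
    ]

def optimize(code):
    """Copy propagation: drop 'tempN = x' copies and use x wherever tempN was read."""
    copies = {}
    result = []
    for line in code:
        dest, expr = line.split(" = ")
        operands = [copies.get(tok, tok) for tok in expr.split()]
        if len(operands) == 1 and dest.startswith("temp"):
            copies[dest] = operands[0]   # remember the copy, emit nothing
        else:
            result.append(f"{dest} = {' '.join(operands)}")
    return result

tokens, table = tokenize("a = b*c")
naive = intermediate_code(tokens)
better = optimize(naive)
```

Here tokens corresponds to id1 = id2 * id3, naive to the four-statement intermediate code, and better to the two-statement optimized code.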

Q2. What is parsing? How many types of parsing techniques are there? Explain with a diagram.

Ans: Parsing is the process of analyzing a text, made of a sequence of tokens (for example, words), to determine its grammatical structure with respect to a given (more or less) formal grammar. Parsing can also be used as a linguistic term, especially in reference to how phrases are divided up in garden path sentences.

The basic connection between a sentence and the grammar it derives from is the parse tree, which describes how the grammar was used to produce the sentence.

There are only two techniques to do parsing. The first method tries to imitate the original production process by rederiving the sentence from the start symbol. This method is called top-down, because the production tree is reconstructed from the top downwards.



The second method tries to roll back the production process and to reduce the sentence back to the start symbol. Quite naturally this technique is called bottom-up.

Top-down parsing: Consider the following grammar for the language a^n b^n c^n, with start symbol S:

S -> aSQ
S -> abc
bQc -> bbcc
cQ -> Qc

and suppose the (input) sentence is aabbcc.

Top-down parsing tends to identify the production rules (and thus to characterize the parse tree) in prefix order.
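To make "rederiving the sentence from the start symbol" concrete, the sketch below searches for a derivation of aabbcc with the four rules of the grammar. The breadth-first search and the names RULES and derive are choices made for this illustration, not a method taken from the text; practical top-down parsers use far more directed strategies.

```python
from collections import deque

# Rewriting rules of the grammar for a^n b^n c^n (left side, right side).
RULES = [("S", "aSQ"), ("S", "abc"), ("bQc", "bbcc"), ("cQ", "Qc")]

def derive(sentence, start="S"):
    """Return the sentential forms leading from start to sentence, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        form = path[-1]
        if form == sentence:
            return path
        for lhs, rhs in RULES:
            pos = form.find(lhs)
            while pos != -1:
                new_form = form[:pos] + rhs + form[pos + len(lhs):]
                # No rule shortens a form, so forms longer than the target are pruned.
                if len(new_form) <= len(sentence) and new_form not in seen:
                    seen.add(new_form)
                    queue.append(path + [new_form])
                pos = form.find(lhs, pos + 1)
    return None

derivation = derive("aabbcc")
```

For aabbcc the search finds the derivation S, aSQ, aabcQ, aabQc, aabbcc; for a string outside the language, such as aabbc, it returns None.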

Bottom-up parsing: A bottom-up parser tries to go backwards, performing the following reverse derivation sequence:

ax → Ax → S

Intuitively, a top-down parser tries to expand nonterminals into right-hand sides and a bottom-up parser tries to replace (reduce) right-hand sides with nonterminals. The first action of the bottom-up parser would be to replace a with A, yielding Ax. Then it would replace Ax with S. Once it arrives at a sentential form with exactly S, it has reached the goal and stops, indicating success.
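The reduction sequence ax → Ax → S can be reproduced with a tiny shift-reduce loop. This sketch assumes the grammar implied by the two reductions in the text (A -> a and S -> A x), and it reduces greedily whenever a right-hand side sits on top of the stack. That greedy choice is enough for this toy grammar but is not how a real LR parser decides; the name shift_reduce is hypothetical.

```python
# Assumed grammar, read off the reductions in the text: A -> a, S -> A x
RULES = [("A", ["a"]), ("S", ["A", "x"])]

def shift_reduce(sentence):
    """Return the sentential forms visited while reducing sentence to S, or None."""
    stack, rest = [], list(sentence)
    forms = ["".join(rest)]
    while True:
        for lhs, rhs in RULES:
            if stack[-len(rhs):] == rhs:
                stack[-len(rhs):] = [lhs]          # reduce: replace the right-hand side
                forms.append("".join(stack + rest))
                break
        else:
            if not rest:                           # nothing to reduce, nothing to shift
                break
            stack.append(rest.pop(0))              # shift the next input symbol
    return forms if stack == ["S"] else None

trace = shift_reduce("ax")
```

The trace for ax is the list of forms ax, Ax, S, matching the sequence given in the text.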

Q3. What do you understand by error recovery and error handling in LL and LR parsing? Also explain its types.

Ans: Error Recovery in Predictive Parsing: An error is detected during predictive parsing when the terminal on top of the stack does not match the next input symbol or when nonterminal A is on top of the stack, a is the next input symbol, and M[A, a] is error (i.e., the parsing-table entry is empty).

Panic Mode: Panic-mode error recovery is based on the idea of skipping symbols on the input until a token in a selected set of synchronizing tokens appears. Its effectiveness depends on the choice of synchronizing set. The sets should be chosen so that the parser recovers quickly from errors that are likely to occur in practice.
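The skipping step of panic mode is simple enough to sketch directly. The function name panic_mode_skip, the token list and the synchronizing set used here are illustrative assumptions; a real parser would derive the set from the grammar (for example from FOLLOW sets).

```python
def panic_mode_skip(tokens, position, sync_tokens):
    """Skip input from position until a synchronizing token or the end of input.

    Returns the new position and the list of skipped tokens.
    """
    skipped = []
    while position < len(tokens) and tokens[position] not in sync_tokens:
        skipped.append(tokens[position])
        position += 1
    return position, skipped

# Error detected at index 2; resume at the next ';' or '}'.
tokens = ["x", "=", "@", "#", ";", "y"]
new_position, skipped = panic_mode_skip(tokens, 2, {";", "}"})
```

After the call, parsing would resume at the synchronizing token ';' and the two offending tokens are reported as skipped.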

Phrase-level Recovery: Phrase-level error recovery is implemented by filling in the blank entries in the predictive parsing table with pointers to error routines. These routines may change, insert, or delete symbols on the input and issue appropriate error messages. They may also pop from the stack. Alteration of stack symbols or the pushing of new symbols onto the stack is questionable for several reasons.



First, the steps carried out by the parser might then not correspond to the derivation of any word in the language at all. Second, we must ensure that there is no possibility of an infinite loop. Checking that any recovery action eventually results in an input symbol being consumed (or the stack being shortened if the end of the input has been reached) is a good way to protect against such loops.

Q4. Define Syntax Directed Definition. Construct a Syntax Directed Definition for infix to postfix translation and also show the annotated parse tree for the expression 9-5+2.

Ans: A syntax directed definition is a generalization of a context free grammar in which each grammar symbol has an associated set of attributes, partitioned into two subsets called the synthesized and inherited attributes of that grammar symbol. An attribute can represent anything we choose: a string, a number, a type, a memory location, or whatever. The value of an attribute at a parse tree node is defined by a semantic rule associated with a production used at that node. The value of a synthesized attribute at a node is computed from the values of attributes at the children of that node in the parse tree; the value of an inherited attribute is computed from the values of attributes at the siblings and parent of that node.





Semantic rules set up dependencies between attributes that will be represented by a graph. From the dependency graph, we derive an evaluation order for the semantic rules. Evaluation of the semantic rules defines the values of the attributes at the nodes in parse tree for the input string. A parse tree showing the values of attributes at each node is called an annotated parse tree. The process of computing the attributes at the nodes is called annotating or decorating the parse tree.

Syntax Directed Definition for infix to postfix:

Productions        Semantic rules
E -> E1 + T        E.t := E1.t || T.t || '+'
E -> E1 - T        E.t := E1.t || T.t || '-'
E -> T             E.t := T.t
T -> 0             T.t := '0'
T -> 1             T.t := '1'
...                ...
T -> 9             T.t := '9'
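The semantic rules can be exercised on the expression 9-5+2 with a short translator. This is a sketch that assumes single-digit operands and the operators + and - only, exactly as in the productions; the function name postfix is hypothetical. Each loop iteration applies one E -> E1 + T or E -> E1 - T step by concatenating the strings as the rules prescribe.

```python
def postfix(expr):
    """Translate single digits joined by + and - into postfix notation."""
    tokens = list(expr.replace(" ", ""))

    def term(tok):
        # T -> 0 | 1 | ... | 9, with T.t := the digit itself
        if not tok.isdigit():
            raise SyntaxError(f"digit expected, found {tok!r}")
        return tok

    if not tokens:
        raise SyntaxError("empty expression")
    t = term(tokens[0])                      # E -> T, E.t := T.t
    i = 1
    while i < len(tokens):
        op = tokens[i]
        if op not in ("+", "-") or i + 1 >= len(tokens):
            raise SyntaxError(f"unexpected input at position {i}")
        t = t + term(tokens[i + 1]) + op     # E.t := E1.t || T.t || op
        i += 2
    return t

result = postfix("9-5+2")
```

The value of E.t at the root for 9-5+2 is 95-2+, with the intermediate value 95- at the node for 9-5.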

Q5. What are S-attributed and L-attributed definitions?

Ans: L-ATTRIBUTED DEFINITIONS: A syntax-directed definition is L-attributed if each inherited attribute of Xj, for j between 1 and n, on the right side of production A → X1 X2 … Xn, depends only on:

1. The attributes (both inherited and synthesized) of the symbols X1, X2, …, Xj−1 (i.e., the symbols to the left of Xj in the production), and

2. The inherited attributes of A.

The syntax-directed definition for type declarations, with the productions D → TL and L → L1,id, is an example of an L-attributed definition, because the inherited attribute L.type depends on T.type, and T is to the left of L in the production D → TL. Similarly, the inherited attribute L1.type depends on the inherited attribute L.type, and L is the parent of L1 in the production L → L1,id.
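The flow of the inherited attribute can be sketched in Python. The productions D → TL and L → L1,id come from the text; the base production L → id, the function name declare and the dictionary used to record types are assumptions added for this illustration.

```python
def declare(type_name, identifiers):
    """D -> T L: pass T.type down to L as the inherited attribute L.type."""
    table = {}

    def visit_list(ids, inherited_type):
        # L -> L1 , id : L1.type := L.type, then record the type of id
        # L -> id      : record the type of id (assumed base production)
        if len(ids) > 1:
            visit_list(ids[:-1], inherited_type)
        table[ids[-1]] = inherited_type

    visit_list(identifiers, type_name)
    return table

types = declare("int", ["a", "b", "c"])
```

Every identifier in the list receives the type that was computed to its left, at T, and handed down through each L node.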

S-ATTRIBUTED DEFINITIONS: A syntax-directed definition that uses synthesized attributes exclusively is said to be an S-attributed definition. A parse tree for an S-attributed definition can always be annotated by evaluating the semantic rules for the attributes at each node bottom up, from the leaves to the root.





Q6. Translate the expression -(a+b)*(c+d)-(a+b+c) into Quadruples, Triples and Indirect Triples.

Ans: Quadruples:

Operator   Arg1   Arg2   Result
+          a      b      T1
uminus     T1            T2
+          c      d      T3
+          T1     c      T4
*          T2     T3     T5
-          T5     T4     T6

Triples:

       Operator   Arg1   Arg2
(0)    +          a      b
(1)    uminus     (0)
(2)    +          c      d
(3)    +          (0)    c
(4)    *          (1)    (2)
(5)    -          (4)    (3)

Indirect Triples:

       Operator   Arg1   Arg2
(14)   +          a      b
(15)   uminus     (14)
(16)   +          c      d
(17)   +          (14)   c
(18)   *          (15)   (16)
(19)   -          (18)   (17)

Pointers to triples:

(0)    (14)
(1)    (15)
(2)    (16)
(3)    (17)
(4)    (18)
(5)    (19)
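Quadruples and triples for this expression can also be generated mechanically. The sketch below assumes the expression is already available as a nested tuple (a hypothetical representation chosen for this example) and reuses the result of a+b instead of recomputing it. Because it walks the tree in post-order, it emits the multiplication before a+b+c, so its fourth and fifth entries appear in the opposite order from the hand-written tables; the computed values are the same.

```python
def translate(expr):
    """expr is a name, ('uminus', operand) or (op, left, right). Returns (quads, triples)."""
    quads, triples, seen = [], [], {}

    def visit(node):
        """Return (quadruple name, triple reference) for node."""
        if isinstance(node, str):               # a leaf such as 'a'
            return node, node
        op, *operands = node
        visited = [visit(child) for child in operands]
        q_args = [q for q, _ in visited] + [""] * (2 - len(visited))
        t_args = [t for _, t in visited] + [""] * (2 - len(visited))
        key = (op, *q_args)
        if key not in seen:                     # reuse a+b rather than recompute it
            seen[key] = (f"T{len(quads) + 1}", f"({len(triples)})")
            quads.append((op, q_args[0], q_args[1], seen[key][0]))
            triples.append((op, t_args[0], t_args[1]))
        return seen[key]

    visit(expr)
    return quads, triples

a_plus_b = ("+", "a", "b")
expression = ("-", ("*", ("uminus", a_plus_b), ("+", "c", "d")), ("+", a_plus_b, "c"))
quads, triples = translate(expression)
```

An indirect-triple listing would add one more list holding the positions of the triples in execution order, which lets statements be reordered without renumbering the references inside the triples.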

Q7. What is an activation record and what are its contents? Explain with a diagram.

Ans: Procedure calls and returns are usually managed by a run-time stack called the control stack. Each live activation has an activation record (sometimes called a frame) on the control stack, with the root of the activation tree at the bottom, and the entire sequence of activation records on the stack corresponding to the path in the activation tree to the activation where control currently resides. The latter activation has its record at the top of the stack. The contents of activation records vary with the language being implemented. Here is a list of the kinds of data that might appear in an activation record:

1. Temporary values, such as those arising from the evaluation of expressions, in cases where those temporaries cannot be held in registers.

2. Local data belonging to the procedure whose activation record this is.

3. A saved machine status, with information about the state of the machine just before the call to the procedure. This information typically includes the return address (value of the program counter, to which the called procedure must return) and the contents of registers that were used by the calling procedure and that must be restored when the return occurs.

4. An "access link" may be needed to locate data needed by the called procedure but found elsewhere, e.g., in another activation record.

5. A control link, pointing to the activation record of the caller.

6. Space for the return value of the called function, if any. Again, not all called procedures return a value, and if one does, we may prefer to place that value in a register for efficiency.

7. The actual parameters used by the calling procedure. Commonly, these values are not placed in the activation record but rather in registers, when possible, for greater efficiency. However, we show a space for them to be
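The seven kinds of data listed above can be modelled as one record type pushed on a control stack. This is a rough sketch, not a description of any particular compiler's frame layout: the class ActivationRecord, its field names and the helpers call and ret are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class ActivationRecord:
    procedure: str
    actual_parameters: list = field(default_factory=list)
    return_value: Any = None
    control_link: Optional["ActivationRecord"] = None   # record of the caller
    access_link: Optional["ActivationRecord"] = None    # record holding non-local data
    saved_machine_status: dict = field(default_factory=dict)
    local_data: dict = field(default_factory=dict)
    temporaries: list = field(default_factory=list)

control_stack = []

def call(procedure, *args, return_address=None):
    """Push a record for a new activation; the current top of stack is the caller."""
    caller = control_stack[-1] if control_stack else None
    record = ActivationRecord(
        procedure,
        list(args),
        control_link=caller,
        saved_machine_status={"return_address": return_address},
    )
    control_stack.append(record)
    return record

def ret(value=None):
    """Pop the record of the finished activation and store its return value."""
    record = control_stack.pop()
    record.return_value = value
    return record

main_record = call("main")
f_record = call("f", 1, 2, return_address=10)
finished = ret(3)
```

After the return, only the record for main remains on the control stack, mirroring the path from the root of the activation tree to the activation in control.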

Q8. Explain Symbol Table Organization and its Data Structure.

Ans: In computer science, a symbol table is a data structure used by a language translator such as a compiler or interpreter, where each identifier in a program's source code is associated with information relating to its declaration or appearance in the source, such as its type, scope level and sometimes its location.

An object file will contain a symbol table of the identifiers it contains that are externally visible. During the linking of different object files, a linker will use these symbol tables to resolve any unresolved references.

A symbol table may only exist during the translation process, or it may be embedded in the output of that process for later exploitation, for example, during an interactive debugging session, or as a resource for formatting a diagnostic report during or after execution of a program.

The various data structures used to implement a symbol table are:

List: The simplest and easiest to implement data structure for a symbol table is a linear list of records. We use a single array, or a collection of several arrays, to store names and their associated information. New names are added to the end of the array. The end of the array is always marked by a pointer known as 'space'.



Self Organizing List: To reduce the time of searching we can add an additional field 'linker' to each record or each array index. When a name is inserted, it is inserted at 'space' and the linkers to the other existing names are updated.

In the above figure, (a) represents the simple list and (b) represents the self-organizing list, in which Id1 is related to Id2 and Id3 is related to Id1.

Hash table: A hash table, or hash map, is a data structure that associates keys with values. 'Open hashing' is a technique that is applied to hash tables. In open hashing there is no limit on the number of entries that can be made in the table. The hash table consists of an array 'HESH' and several buckets attached to the array HESH according to the hash function.
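Open hashing can be sketched as an array of buckets, each bucket being a list that grows as needed. The class name HashSymbolTable, the table size and the use of Python's built-in hash are assumptions made for the example; the buckets list plays the role of the array HESH.

```python
class HashSymbolTable:
    """Symbol table using open hashing: an array of buckets, one list per slot."""

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, name):
        # The hash function selects which bucket a name belongs to.
        return self.buckets[hash(name) % len(self.buckets)]

    def insert(self, name, info):
        bucket = self._bucket(name)
        for entry in bucket:
            if entry[0] == name:        # name already present: update its information
                entry[1] = info
                return
        bucket.append([name, info])     # buckets can grow without limit

    def lookup(self, name):
        for entry_name, info in self._bucket(name):
            if entry_name == name:
                return info
        return None

table = HashSymbolTable(size=2)
for number in range(20):
    table.insert(f"id{number}", {"type": "int", "slot": number})
```

Even with only two slots, all twenty names can be stored and found, because each bucket simply becomes longer; a larger array keeps the buckets short and the searches fast.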

Search Tree: Another approach to organizing the symbol table is to add two link fields, left and right, to each record and to use these fields to form a binary search tree. All names are created as descendants of the root node and always follow the binary search tree property, i.e. namei < name and name < namej. These two conditions state that every name namei smaller than name lies in the left subtree of name, and every name namej larger than name lies in its right subtree. To insert a name, the binary search tree insertion algorithm is always followed.
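A minimal binary search tree symbol table might look like the following. The names Node, insert, lookup and in_order are hypothetical, and names are compared with ordinary string ordering.

```python
class Node:
    def __init__(self, name, info):
        self.name, self.info = name, info
        self.left = self.right = None       # the two link fields

def insert(root, name, info):
    """Insert name below root and return the (possibly new) root."""
    if root is None:
        return Node(name, info)
    if name < root.name:
        root.left = insert(root.left, name, info)
    elif name > root.name:
        root.right = insert(root.right, name, info)
    else:
        root.info = info                    # name already present: update it
    return root

def lookup(root, name):
    """Return the information stored for name, or None if it is absent."""
    while root is not None and root.name != name:
        root = root.left if name < root.name else root.right
    return None if root is None else root.info

def in_order(root):
    """List all names in sorted order."""
    if root is None:
        return []
    return in_order(root.left) + [root.name] + in_order(root.right)

root = None
for name in ["m", "c", "x", "a", "e"]:
    root = insert(root, name, {"type": "int"})
```

Smaller names end up in left subtrees and larger names in right subtrees, so an in-order walk lists the names alphabetically.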





Q9. Describe the Storage Allocation Strategies of Symbol Table.

Ans: The data structure for a particular implementation of a symbol table is sketched in figure 9.1. A separate array 'arr_lexemes' holds the character string forming an identifier. The string is terminated by an end-of-string character, denoted by EOS, that may not appear in identifiers. Each entry in the symbol-table array 'arr_symbol_table' is a record consisting of two fields: "lexeme_pointer", pointing to the beginning of a lexeme, and "token". Additional fields can hold attribute values. In figure 9.1, the 0th entry is left empty, because lookup returns 0 to indicate that there is no entry for a string. The 1st through 7th entries are for 'a', 'plus', 'b', 'and', 'c', 'minus', and 'd', where the 2nd, 4th and 6th entries are for reserved keywords.
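The two arrays described above can be sketched as follows. The array and field names come from the text; the null character used for EOS, the token values "keyword" and "id", and the helper lexeme_at are assumptions made for the example.

```python
EOS = "\0"                    # end-of-string marker, assumed never to occur in identifiers
arr_lexemes = []              # every lexeme, stored character by character
arr_symbol_table = [None]     # entry 0 stays empty so that 0 can mean "no entry"

def lexeme_at(pointer):
    """Read the lexeme that starts at pointer, up to (not including) EOS."""
    end = arr_lexemes.index(EOS, pointer)
    return "".join(arr_lexemes[pointer:end])

def lookup(lexeme):
    """Return the index of lexeme in arr_symbol_table, or 0 if it is absent."""
    for index in range(1, len(arr_symbol_table)):
        if lexeme_at(arr_symbol_table[index]["lexeme_pointer"]) == lexeme:
            return index
    return 0

def insert(lexeme, token):
    """Append lexeme to arr_lexemes and add a record pointing at its first character."""
    arr_symbol_table.append({"lexeme_pointer": len(arr_lexemes), "token": token})
    arr_lexemes.extend(lexeme + EOS)
    return len(arr_symbol_table) - 1

for word in ["a", "plus", "b", "and", "c", "minus", "d"]:
    insert(word, "keyword" if word in ("plus", "and", "minus") else "id")
```

Storing the characters in one shared array means each record needs only a fixed-size pointer, whatever the length of the identifier.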





Q10. What is a DAG? How are basic blocks represented through a DAG? What are its advantages and applications?

Ans: In mathematics and computer science, a directed acyclic graph (commonly abbreviated to DAG) is a directed graph with no directed cycles. That is, it is formed by a collection of vertices and directed edges, each edge connecting one vertex to another, such that there is no way to start at some vertex v and follow a sequence of edges that eventually loops back to v again.

The DAG Representation of Basic Blocks

Directed acyclic graphs (DAGs) give a picture of how the value computed by each statement in the basic block is used in the subsequent statements of the block.

Definition: A DAG for a basic block is a directed acyclic graph with the following labels on nodes:

- Leaves are labeled with unique identifiers, either variable names or constants. From the operator applied to a name we determine whether its l-value or r-value is needed. Leaves represent the initial values of names, so they are subscripted with 0.

- Interior nodes are labeled by an operator symbol.

- Nodes are also (optionally) given a sequence of identifiers for labels.

- Interior nodes represent computed values.

- The identifiers in the sequence attached to a node have that value.

Example of DAG Representation




t1 := 4 * i
t2 := a[t1]
t3 := 4 * i
t4 := b[t3]
t5 := t2 * t4
t6 := prod + t5
prod := t6
t7 := i + 1
i := t7
if i <= 20 goto 1

Three address code
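A DAG for this block can be built by creating each (operator, left, right) node only once. The sketch below is an illustration under stated assumptions: statements are given as hypothetical (dest, op, arg1, arg2) tuples, array indexing is written as the operator [], a copy such as prod := t6 simply attaches the name to an existing node, and the final conditional jump is left out.

```python
def build_dag(block):
    """Return (nodes, current): the node list and the node id holding each name's value."""
    nodes = []      # node i is ('leaf', name) or (op, left_id, right_id)
    index = {}      # node -> id, so identical nodes are shared
    current = {}    # name -> id of the node with its current value

    def node_id(node):
        if node not in index:
            index[node] = len(nodes)
            nodes.append(node)
        return index[node]

    def operand(name):
        # A name not yet assigned in the block refers to its initial value (a leaf).
        return current[name] if name in current else node_id(("leaf", name))

    for dest, op, arg1, arg2 in block:
        if op == ":=":
            current[dest] = operand(arg1)
        else:
            current[dest] = node_id((op, operand(arg1), operand(arg2)))
    return nodes, current

block = [
    ("t1", "*", "4", "i"),
    ("t2", "[]", "a", "t1"),
    ("t3", "*", "4", "i"),
    ("t4", "[]", "b", "t3"),
    ("t5", "*", "t2", "t4"),
    ("t6", "+", "prod", "t5"),
    ("prod", ":=", "t6", None),
    ("t7", "+", "i", "1"),
    ("i", ":=", "t7", None),
]
nodes, current = build_dag(block)
```

Because 4 * i is computed twice from the same operands, t1 and t3 end up attached to one node, which is how the DAG exposes the common subexpression.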




[Figure: DAG for the three-address code. Operator nodes: *, +, [], <=. Leaves: a, b, 4, i0, 1, 20. Node labels: t1, t3; t2; t4; t5; t7, i; prod; (1).]

While reverse engineering an executable, a lot of tools refer to the symbol table to check what addresses have been assigned to global variables and known functions. If the symbol table has been stripped or cleaned out before converting it into an executable, tools will find it hard to find out addresses and understand anything about the program.
