Compilers - Principles, Techniques and Tools


Post on 13-Nov-2014


Category:

Documents


TRANSCRIPT

<p>CHAPTER 1</p>
<p>Introduction to Compiling</p>
<p>The principles and techniques of compiler writing are so pervasive that the ideas found in this book will be used many times in the career of a computer scientist. Compiler writing spans programming languages, machine architecture, language theory, algorithms, and software engineering. Fortunately, a few basic compiler-writing techniques can be used to construct translators for a wide variety of languages and machines. In this chapter, we introduce the subject of compiling by describing the components of a compiler, the environment in which compilers do their job, and some software tools that make it easier to build compilers.</p>
<p>1.1 COMPILERS</p>
<p>Simply stated, a compiler is a program that reads a program written in one language - the source language - and translates it into an equivalent program in another language - the target language (see Fig. 1.1). As an important part of this translation process, the compiler reports to its user the presence of errors in the source program.</p>
<p>[Fig. 1.1. Diagram not recovered; surviving labels: target program, error messages.]</p>
<p>At first glance, the variety of compilers may appear overwhelming. There are thousands of source languages, ranging from traditional programming languages such as Fortran and Pascal to specialized languages that have arisen in virtually every area of computer application. Target languages are equally as varied; a target language may be another programming language, or the machine language of any computer between a microprocessor and a supercomputer. Compilers are sometimes classified as single-pass, multi-pass, load-and-go, debugging, or optimizing, depending on how they have been constructed or on what function they are supposed to perform. Despite this apparent complexity, the basic tasks that any compiler must perform are essentially the same.
By understanding these tasks, we can construct compilers for a wide variety of source languages and target machines using the same basic techniques.</p>
<p>Our knowledge about how to organize and write compilers has increased vastly since the first compilers started to appear in the early 1950's. It is difficult to give an exact date for the first compiler because initially a great deal of experimentation and implementation was done independently by several groups. Much of the early work on compiling dealt with the translation of arithmetic formulas into machine code.</p>
<p>Throughout the 1950's, compilers were considered notoriously difficult programs to write. The first Fortran compiler, for example, took 18 staff-years to implement (Backus et al. [1957]). We have since discovered systematic techniques for handling many of the important tasks that occur during compilation. Good implementation languages, programming environments, and software tools have also been developed. With these advances, a substantial compiler can be implemented even as a student project in a one-semester compiler-design course.</p>
<p>There are two parts to compilation: analysis and synthesis. The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program. The synthesis part constructs the desired target program from the intermediate representation. Of the two parts, synthesis requires the most specialized techniques. We shall consider analysis informally in Section 1.2 and outline the way target code is synthesized in a standard compiler in Section 1.3.</p>
<p>During analysis, the operations implied by the source program are determined and recorded in a hierarchical structure called a tree. Often, a special kind of tree called a syntax tree is used, in which each node represents an operation and the children of a node represent the arguments of the operation. For example,
a syntax tree for an assignment statement is shown in Fig. 1.2.</p>
<p>Fig. 1.2. Syntax tree for position := initial + rate * 60.</p>
<p>Many software tools that manipulate source programs first perform some kind of analysis. Some examples of such tools include:</p>
<p>1. Structure editors. A structure editor takes as input a sequence of commands to build a source program. The structure editor not only performs the text-creation and modification functions of an ordinary text editor, but it also analyzes the program text, putting an appropriate hierarchical structure on the source program. Thus, the structure editor can perform additional tasks that are useful in the preparation of programs. For example, it can check that the input is correctly formed, can supply keywords automatically (e.g., when the user types while, the editor supplies the matching do and reminds the user that a conditional must come between them), and can jump from a begin or left parenthesis to its matching end or right parenthesis. Further, the output of such an editor is often similar to the output of the analysis phase of a compiler.</p>
<p>2. Pretty printers. A pretty printer analyzes a program and prints it in such a way that the structure of the program becomes clearly visible. For example, comments may appear in a special font, and statements may appear with an amount of indentation proportional to the depth of their nesting in the hierarchical organization of the statements.</p>
<p>3. Static checkers. A static checker reads a program, analyzes it, and attempts to discover potential bugs without running the program. The analysis portion is often similar to that found in optimizing compilers of the type discussed in Chapter 10.
For example, a static checker may detect that parts of the source program can never be executed, or that a certain variable might be used before being defined. In addition, it can catch logical errors such as trying to use a real variable as a pointer, employing the type-checking techniques discussed in Chapter 6.</p>
<p>4. Interpreters. Instead of producing a target program as a translation, an interpreter performs the operations implied by the source program. For an assignment statement, for example, an interpreter might build a tree like Fig. 1.2, and then carry out the operations at the nodes as it "walks" the tree. At the root it would discover it had an assignment to perform, so it would call a routine to evaluate the expression on the right, and then store the resulting value in the location associated with the identifier position. At the right child of the root, the routine would discover it had to compute the sum of two expressions. It would call itself recursively to compute the value of the expression rate * 60. It would then add that value to the value of the variable initial.</p>
<p>Interpreters are frequently used to execute command languages, since each operator executed in a command language is usually an invocation of a complex routine such as an editor or compiler. Similarly, some "very high-level" languages, like APL, are normally interpreted because there are many things about the data, such as the size and shape of arrays, that cannot be deduced at compile time.</p>
<p>Traditionally, we think of a compiler as a program that translates a source language like Fortran into the assembly or machine language of some computer. However, there are seemingly unrelated places where compiler technology is regularly used.
The analysis portion in each of the following examples is similar to that of a conventional compiler.</p>
<p>1. Text formatters. A text formatter takes input that is a stream of characters, most of which is text to be typeset, but some of which includes commands to indicate paragraphs, figures, or mathematical structures like subscripts and superscripts. We mention some of the analysis done by text formatters in the next section.</p>
<p>2. Silicon compilers. A silicon compiler has a source language that is similar or identical to a conventional programming language. However, the variables of the language represent not locations in memory but logical signals (0 or 1) or groups of signals in a switching circuit. The output is a circuit design in an appropriate language. See Johnson [1983], Ullman [1984], or Trickey [1985] for a discussion of silicon compilation.</p>
<p>3. Query interpreters. A query interpreter translates a predicate containing relational and boolean operators into commands to search a database for records satisfying that predicate. (See Ullman [1982] or Date [1986].)</p>
<p>The Context of a Compiler</p>
<p>In addition to a compiler, several other programs may be required to create an executable target program. A source program may be divided into modules stored in separate files. The task of collecting the source program is sometimes entrusted to a distinct program, called a preprocessor. The preprocessor may also expand shorthands, called macros, into source language statements.</p>
<p>Figure 1.3 shows a typical "compilation." The target program created by the compiler may require further processing before it can be run. The compiler in Fig. 1.3 creates assembly code that is translated by an assembler into machine code and then linked together with some library routines into the code that actually runs on the machine.</p>
<p>We shall consider the components of a compiler in the next two sections; the remaining programs in Fig.
1.3 are discussed in Section 1.4.</p>
<p>1.2 ANALYSIS OF THE SOURCE PROGRAM</p>
<p>In this section, we introduce analysis and illustrate its use in some text-formatting languages. The subject is treated in more detail in Chapters 2-4 and 6. In compiling, analysis consists of three phases:</p>
<p>1. Linear analysis, in which the stream of characters making up the source program is read from left-to-right and grouped into tokens that are sequences of characters having a collective meaning.</p>
<p>2. Hierarchical analysis, in which characters or tokens are grouped hierarchically into nested collections with collective meaning.</p>
<p>3. Semantic analysis, in which certain checks are performed to ensure that the components of a program fit together meaningfully.</p>
<p>Fig. 1.3. A language-processing system. [Diagram not recovered; surviving labels: library, relocatable object files; absolute machine code.]</p>
<p>Lexical Analysis</p>
<p>In a compiler, linear analysis is called lexical analysis or scanning. For example, in lexical analysis the characters in the assignment statement</p>
<p>position := initial + rate * 60</p>
<p>would be grouped into the following tokens:</p>
<p>1. The identifier position.</p>
<p>2. The assignment symbol :=.</p>
<p>3. The identifier initial.</p>
<p>4. The plus sign.</p>
<p>5. The identifier rate.</p>
<p>6. The multiplication sign.</p>
<p>7. The number 60.</p>
<p>The blanks separating the characters of these tokens would normally be eliminated during lexical analysis.</p>
<p>Syntax Analysis</p>
<p>Hierarchical analysis is called parsing or syntax analysis. It involves grouping the tokens of the source program into grammatical phrases that are used by the compiler to synthesize output. Usually, the grammatical phrases of the source program are represented by a parse tree such as the one shown in Fig. 1.4.</p>
<p>Fig. 1.4.
Parse tree for position := initial + rate * 60.</p>
<p>In the expression initial + rate * 60, the phrase rate * 60 is a logical unit because the usual conventions of arithmetic expressions tell us that multiplication is performed before addition. Because the expression initial + rate is followed by a *, it is not grouped into a single phrase by itself in Fig. 1.4.</p>
<p>The hierarchical structure of a program is usually expressed by recursive rules. For example, we might have the following rules as part of the definition of expressions:</p>
<p>1. Any identifier is an expression.</p>
<p>2. Any number is an expression.</p>
<p>3. If expression1 and expression2 are expressions, then so are</p>
<p>expression1 + expression2</p>
<p>expression1 * expression2</p>
<p>( expression1 )</p>
<p>Rules (1) and (2) are (nonrecursive) basis rules, while (3) defines expressions in terms of operators applied to other expressions. Thus, by rule (1), initial and rate are expressions. By rule (2), 60 is an expression, while by rule (3), we can first infer that rate * 60 is an expression and finally that initial + rate * 60 is an expression.</p>
<p>Similarly, many languages define statements recursively by rules such as:</p>
<p>1. If identifier1 is an identifier, and expression2 is an expression, then</p>
<p>identifier1 := expression2</p>
<p>is a statement.</p>
<p>2. If expression1 is an expression and statement2 is a statement, then</p>
<p>while ( expression1 ) do statement2</p>
<p>if ( expression1 ) then statement2</p>
<p>are statements.</p>
<p>The division between lexical and syntactic analysis is somewhat arbitrary. We usually choose a division that simplifies the overall task of analysis. One factor in determining the division is whether a source language construct is inherently recursive or not. Lexical constructs do not require recursion, while syntactic constructs often do. Context-free grammars are a formalization of recursive rules that can be used to guide syntactic analysis.
They are introduced in Chapter 2 and studied extensively in Chapter 4.</p>
<p>For example, recursion is not required to recognize identifiers, which are typically strings of letters and digits beginning with a letter. We would normally recognize identifiers by a simple scan of the input stream, waiting until a character that was neither a letter nor a digit was found, and then grouping all the letters and digits found up to that point into an identifier token. The characters so grouped are recorded in a table, called a symbol table, and removed from the input so that processing of the next token can begin.</p>
<p>On the other hand, this kind of linear scan is not powerful enough to analyze expressions or statements. For example, we cannot properly match parentheses in expressions, or begin and end in statements, without putting some kind of hierarchical or nesting structure on the input.</p>
<p>Fig. 1.5. Semantic analysis inserts a conversion from integer to real. [Diagram not recovered; panels (a) and (b) show syntax trees for position := initial + rate * 60, with an inttoreal node in (b).]</p>
<p>The parse tree in Fig. 1.4 describes the syntactic structure of the input. A more common internal representation of this syntactic structure is given by the syntax tree in Fig. 1.5(a). A syntax tree is a compressed representation of the parse tree in which the operators appear as the interior nodes, and the operands of an operator are the children of the node for that operator. The construction of trees such as the one in Fig. 1.5(a) is discussed in Section 5.2.</p>
<p>We shall take up in Chapter 2, and in more detail in Chapter 5, the subject of syntax-directed translation, in which the compiler uses the hierarchical structure on the input to help generate the output.</p>
<p>Semantic Analysis</p>
<p>The semantic analysis phase check...</p>
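<p>The left-to-right scan described under lexical analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name tokenize and the token kind names id, num, assign, and op are invented for the example, only the symbols used in the sample statement are handled, and the symbol-table step mentioned in the text is left out.</p>

```python
from typing import List, Tuple

# (kind, text) pairs; the kind names are illustrative, not taken from the book.
Token = Tuple[str, str]


def tokenize(source: str) -> List[Token]:
    """Group characters into tokens with a single left-to-right scan."""
    tokens: List[Token] = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            # Blanks separating tokens are dropped during lexical analysis.
            i += 1
        elif ch.isalpha():
            # Identifier: a letter followed by letters and digits. Scan until
            # a character that is neither a letter nor a digit is found.
            start = i
            while i < len(source) and source[i].isalnum():
                i += 1
            tokens.append(("id", source[start:i]))
        elif ch.isdigit():
            # Number: a run of digits.
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("num", source[start:i]))
        elif source.startswith(":=", i):
            tokens.append(("assign", ":="))
            i += 2
        elif ch in "+*":
            tokens.append(("op", ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens
```

<p>Applied to position := initial + rate * 60, the sketch yields the seven tokens listed in the text, in the same order, with the blanks removed. No recursion is needed, which is the point the text makes about lexical constructs.</p>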

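<p>The recursive expression rules and the tree-walking interpreter described earlier can also be sketched together in Python. Everything named here is an assumption of the sketch rather than the book's method: the Parser class, the tuple encoding of tree nodes, and the split into expression, term, and factor routines, which is one conventional way to make multiplication bind more tightly than addition. The regular expression silently skips characters it does not recognize, which a real scanner would report.</p>

```python
import re

# Identifiers, numbers, ":=", and the operators used in the chapter's example.
# Unrecognized characters are skipped silently (a simplification).
TOKEN_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*|\d+|:=|[+*()]")


class Parser:
    """Builds a syntax tree as nested tuples, e.g. ("+", left, right)."""

    def __init__(self, text):
        self.tokens = TOKEN_RE.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, found {tok!r}")
        self.pos += 1
        return tok

    def statement(self):
        # identifier := expression
        name = self.take()
        if not name[0].isalpha():
            raise SyntaxError(f"expected identifier, found {name!r}")
        self.take(":=")
        tree = (":=", ("id", name), self.expression())
        if self.peek() is not None:
            raise SyntaxError(f"unexpected trailing token {self.peek()!r}")
        return tree

    def expression(self):
        # expression + expression, with + binding more loosely than *
        node = self.term()
        while self.peek() == "+":
            self.take("+")
            node = ("+", node, self.term())
        return node

    def term(self):
        # expression * expression
        node = self.factor()
        while self.peek() == "*":
            self.take("*")
            node = ("*", node, self.factor())
        return node

    def factor(self):
        # Basis rules (identifier, number) plus ( expression )
        tok = self.take()
        if tok == "(":
            node = self.expression()
            self.take(")")
            return node
        if tok[0].isdigit():
            return ("num", int(tok))
        if tok[0].isalpha():
            return ("id", tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(node, env):
    """Walk the tree recursively, as the interpreter in the text does."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "id":
        return env[node[1]]
    if kind == ":=":
        # Evaluate the right side, then store it under the identifier's name.
        value = evaluate(node[2], env)
        env[node[1][1]] = value
        return value
    left = evaluate(node[1], env)
    right = evaluate(node[2], env)
    return left + right if kind == "+" else left * right
```

<p>For position := initial + rate * 60 the parser produces a tree whose root is the assignment, whose right child is the + node, and whose * node sits below it, matching the shape the text describes for Fig. 1.2. Evaluating that tree with initial = 10 and rate = 2 stores 130 in position.</p>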