<ul><li><p>Batch, conversational, and incremental compilers </p><p>by HARRY KATZAN, JR. </p><p>Pratt Institute Brooklyn, New York </p><p>INTRODUCTION </p><p>Compiler-writing techniques have received a great deal of pragmatic and academic attention and are now fairly well-defined.* It was and still is generally felt that the compiler is independent of the operating system in which it resides, if it resides in one at all. The invention of time-sharing systems with conversational capability, however, has required that compiler experts re-evaluate existing concepts to make better use of external facilities. This was done and conversational and incremental compilers have evolved. A generalized and consolidated discussion of these relatively new concepts is the subject of this paper. First, a model of a batch compiler is introduced. The concepts are then modified and extended for a conversational programming environment. Finally, a recent development termed "incremental" compilation, which satisfies the needs of both batch and conversational compiling as well as interactive computing, is presented. First, some introductory material is required. </p><p>Basic concepts </p><p>In the classical data processing environment,** the "compile phase" or "source language processing phase" is of prime importance as are definitions of source program and object program. The latter are redefined in light of the time-sharing or interactive environment. Extraneous items, such as where the object program is stored or whether or not the compiler should produce assembler language coding, are practically ignored. </p><p>* Two books devoted entirely to the subject are worth mentioning: Lee, J. A. N., The Anatomy of a Compiler,1 and Randell, B. and L. J. Russell, Algol 60 Implementation.2 </p><p>** See Lee,1 p. 9. </p><p>The source program is the program as written by the programmer. It is coded in symbolic form and punched on cards or typed in at the terminal. 
The object program is the program after being transformed by the compiler into a machine-oriented form which can be read into the computer and executed with very few (if any) modifications. Also of interest is the information vector which gives initial conditions for compilation and denotes the types of output desired. A sample of specifications which might be found in an information vector follows: (1) location of the source program; (2) name of the program; (3) the extent of compiler processing, i.e., syntax check only, optimize, etc.; (4) computer system parameters; (5) compiler output desired; and (6) disposition of the object module. The form of the source program is sometimes required, although in most cases this information is known implicitly. This pertains to different BCD codes and file types which may range from sequential or indexed files on conventional systems to list-structured files in virtual machines. </p><p>Similarly for output, the user can request a specialized form of object module or none at all, source or object program listing, and cross-reference listings. The object module is known as a Program Module which contains the machine language text and relocation information. Additionally, it may contain an Internal Symbol Dictionary for use during execution-time debugging. The Internal Symbol Dictionary is especially useful in conversational time-sharing systems where execution can be stopped on a conditional basis and the values of internal variables can be displayed or modified. </p><p>Batch compilation </p><p>Batch compilation methods are required, quite naturally, in a batch processing environment. The term "batch processing" stems from the days when the programmer submitted his job to the computer center and subsequently received his results later in time. </p><p>From the collection of the Computer History Museum. </p></li><li><p>Spring Joint Computer Conference, 1969 </p><p>
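As a concrete illustration, the information vector described above can be sketched as a simple record. This is only a sketch; all field names below are hypothetical stand-ins for the six specifications listed in the text, not structures from the paper itself.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an "information vector": the initial conditions
# handed to the compiler before compilation begins.  Field names are
# illustrative only; they mirror the six specifications in the text.
@dataclass
class InformationVector:
    source_location: str                 # (1) location of the source program
    program_name: str                    # (2) name of the program
    processing_extent: str = "full"      # (3) e.g. "syntax_check_only", "optimize"
    system_parameters: dict = field(default_factory=dict)   # (4)
    outputs: tuple = ("object_module", "source_listing")    # (5)
    object_disposition: str = "retain"   # (6) disposition of the object module

iv = InformationVector(source_location="card_reader", program_name="MAIN")
```

A compiler executive could consult such a record to decide, for example, whether to stop after syntax checking or to run the optimization phases.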
A collection of different jobs was accumulated by operations personnel and the batch was then presented to the computer system on an input tape. The important point is that the programmer has no contact with his job between the time it is submitted to operations and when he receives his output. The concept has been extended to cover Multiprogramming Systems, Remote Job Entry (RJE), and the trivial case where no operating system exists and the programmer runs the compiler to completion. </p><p>The generalized environment </p><p>The most significant aspect of the batch processing environment is that the entire source program is available to the compiler initially and that all compiler output can be postponed until a later phase. The compiler writer, therefore, is provided with a liberal amount of flexibility in designing his language processor. For example, specification (i.e., declarative) statements can be recognized and processed in an initial phase and storage allocated immediately. In the same pass, statement labels are recognized and entabled; then in a later phase, validity decisions for statements that use statement labels can be made immediately rather than making a later analysis on the basis of table entries. If desired, source program error diagnostics can be postponed. Moreover, the designer may specify his compiler so that the source program is passed by the compiler or so that the compiler is passed over the source program, which resides semi-permanently in memory. </p><p>This inherent flexibility is not exploited in the compiler model which follows. Instead, an attempt has been made to present the material in a conceptually straightforward manner. </p><p>A generalized batch compiler </p><p>By itself, a model of a generalized batch compiler is of limited interest. The concept is useful, however, for comparison with those designed to operate in time-shared computer systems. 
Therefore, the presentation is pedagogical in nature as compared to one which might present a step by step procedure for building one. </p><p>Processing by the compiler is rather naturally divided into several phases which tend to be more logical than physical. Each phase has one or more specific tasks to perform. In so doing, it operates on tables and lists, possibly modifying them and producing new ones. One phase, of course, works on the source program from the system input device or external storage and another produces the required output. The entire compiler is described therefore by listing the tasks each phase is to perform; ordinarily, the description would also denote which tables and lists each phase uses and what tables and lists it creates or modifies. The specific tables and lists which are required, however, tend to be language dependent and are beyond the scope of this treatment. </p><p>The compiler is composed of five phases and an executive routine, as follows: </p><p>The Compiler Executive (EXEC). The various phases run under the control of a compiler executive routine (EXEC) which is the only communication with the outside world. It establishes initial conditions and calls the different phases as required. It can be assumed that EXEC performs all system input/output services, upon demand from the phase modules. More specifically, the EXEC has five major and distinct functions: </p><p>1. to interface with the compiler's environment; 2. to prepare the source statements for processing by phase one; 3. to control and order the operation of the phases; 4. to prepare edited lines for output; and 5. to provide compiler diagnostic information. </p><p>Phase 1. Phase 1 performs the source program syntactic analysis, error analysis, and translation of the program into a tabular representation. Each variable or constant is given an entry in the symbol table, with formal arguments being flagged as such. 
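Phase 1's translation into tables can be sketched in miniature. The statement format, table layouts, and names below are invented for illustration under the assumption of simple assignment statements; they are not the paper's actual data structures:

```python
import re

# Toy sketch of phase 1: translate simple assignment statements into a
# symbol table (with formal arguments flagged), a Program Reference File
# of statement skeletons, and an Expression Reference File holding
# expressions in an internal notation with pointers into the symbol table.
def phase1(statements, formal_args=()):
    symtab = {}            # name -> {"index": n, "formal": bool}
    prf = []               # Program Reference File: skeletal statements
    erf = []               # Expression Reference File: tokenized expressions

    def sym(name):
        # every variable or constant gets a symbol-table entry
        if name not in symtab:
            symtab[name] = {"index": len(symtab), "formal": name in formal_args}
        return symtab[name]["index"]

    for stmt in statements:
        target, expr = (s.strip() for s in stmt.split("=", 1))
        # identifiers and constants become symbol-table pointers;
        # everything else is kept as an operator token
        tokens = [("sym", sym(t)) if re.fullmatch(r"\w+", t) else ("op", t)
                  for t in re.findall(r"\w+|\S", expr)]
        erf.append(tokens)
        prf.append({"kind": "assign", "target": sym(target),
                    "expr_ptr": len(erf) - 1})   # pointer into the ERF
    return symtab, prf, erf

symtab, prf, erf = phase1(["Y = A + B", "Z = Y * 2"], formal_args=("A",))
```

Later phases would then work entirely from these tables rather than from the source text.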
Initial values and array dimensions are stored in a table of preset data. </p><p>Lastly, information from specification statements is stored in the specification table. The most significant processing, however, occurs with respect to the Program Reference File and the Expression Reference File. </p><p>Each executable statement and statement label is placed in the Program Reference File in skeletal form. In addition to standard Program Reference File entries, the Program Reference File contains pointers to the Expression Reference File for statements involving arithmetic or logical expressions. </p><p>The Expression Reference File stores expressions in an internal notation using pointers to the symbol table when necessary. As with the Expression Reference File, the Program Reference File also contains pointers to the symbol table. </p><p>Phase 2. In general, phase 2 performs analyses that cannot be performed in phase 1. It makes storage assignments in the Program Module for all variables that are not formal parameters. It detects illegal flow in loops and recognizes early exits therefrom. It also determines blocks of a program with no path of control to them; and lastly, it detects statement labels which are referenced but not defined. </p></li><li><p>Phase 3. The object of phase 3 is to perform the global optimizations used during object code generation, which is accomplished in phase 4. </p><p>The first major function of phase 3 is the recognition and processing of common sub-expressions. Phase 3 determines which arithmetic expressions need be computed only once and then saved for later use. In addition, it determines the range of statements over which expressions are not redefined by the definition of one or more of their constituents. 
If the occurrence of an expression in that range is contained in one or more DO* loops which are also entirely contained in that range, Phase 3 determines the outermost such loop outside which such an expression may be computed, and physically moves the expression to the front of that DO loop. Only the evaluation process is removed from the loop; any statement label or replacement operation is retained in its original position. The moved expression is linked to a place reserved for that purpose in the program reference file entries corresponding to the beginning of the respective DO loops. </p><p>The second major function of phase 3 is the recognition and processing of removable statements. A "removable statement" is one whose individual operands do not have "definition points" inside the loop; obviously, the execution of this statement for each iteration would be unnecessary. A definition point is a statement in which the variable has, or may have, a new value stored in it (e.g., appears on the left-hand side of an equal sign). In removing statements, they are usually placed before the DO statement. </p><p>Phase 3 also processes formal parameters and develops the prologue to the program; it optimizes the use of registers; and it merges the Program Reference File and the Expression Reference File to form a Complete Program File in preparation for phase 4. </p><p>Phase 4. Phase 4 performs the code generation function. Its input consists of the symbol table and the Complete Program File and its output is the Code File, which represents completed machine instructions and control information. </p><p>Phase 5. Phase 5 is the output phase and generates the Program Module, the source and object listings, and the cross reference listing. Upon request, an Internal Symbol Dictionary is also included in the Program Module. 
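The "removable statement" test above can be sketched as a small loop-invariant-motion pass. The loop representation here is a hypothetical simplification (each statement is a target variable plus the set of operands it reads), not the paper's Program Reference File:

```python
# Toy sketch of phase 3's "removable statement" processing: a statement
# inside a DO loop is removable when none of its operands has a
# definition point inside the loop.  Removable statements are placed
# before the DO statement; the rest of the body is retained.
def hoist_removable(loop_body):
    # definition points: variables that have (or may have) a new value
    # stored in them somewhere in the loop body
    defined = {target for target, _ in loop_body}
    moved, kept = [], []
    for target, operands in loop_body:
        if set(operands).isdisjoint(defined):
            moved.append((target, operands))   # goes before the DO statement
        else:
            kept.append((target, operands))
    return moved, kept

# DO loop body: each statement is (target, operands-read)
body = [("T", ("A", "B")),   # T = A + B  -> removable (A, B not defined in loop)
        ("X", ("T", "I")),   # X = T * I  -> kept (T and I are defined in loop)
        ("I", ("I",))]       # I = I + 1  -> kept (the loop induction variable)
moved, kept = hoist_removable(body)
```

A production pass would iterate (moving `T = A + B` out may make further statements removable) and would also check the control-flow conditions the paper mentions; a single pass is enough to show the idea.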
</p><p>* Although the DO keyword is a constituent part of several programming languages, it should be interpreted as representing the class of statements from different languages which effectively enable the programmer to write program loops in a straightforward manner. </p><p>Any compiler model of this type is clearly an abstraction; moreover, there is almost as much variation between different compilers for the same programming language as there is between compilers for different languages. The model does serve a useful purpose, which is to present a conceptual foundation from which conversational and incremental compilers can be introduced. </p><p>Conversational compilation </p><p>Compared with the "batch" environment in which the user has no contact with his job once it is submitted, the conversational environment provides the exact opposite. A general-purpose time-sharing system of one kind or another is assumed,* with users having access to the computer system via terminal devices. </p><p>In the batch environment, the user was required to make successive runs on the system to eliminate syntax and setup errors with the intervening time ranging from minutes to days. Excluding execution-time "bugs", it often took weeks to get a program running. In the conversational mode, syntactical and setup errors can be eliminated in one terminal session. Similarly, execution-time debugging is also possible in a time-sharing system, on a dynamic basis. </p><p>Conversational programming places a heavy load on a compiler and an operating system; the magnitude of the load is reflected in the basic additions necessary to support the conversational environment. </p><p>The time-sharing environment </p><p>The time-sharing environment is characterized by versatility. Tasks can exist in the "batch" or "conversational" mode. Furthermore, source program input can reside on the system input device or be pre-stored. 
The time-sharing operating system is able to distinguish between batch and conversational tasks; therefore, batch tasks are recognized as such and processed as in any operating system. The ensuing discussion will concern conversational tasks. It is assumed, also, that the user resides at a terminal and is able to respond to requests by the system. </p></li><li><p>During the compile phase, the source program may be entered on a statement-by-statement basis or be prestored. In either case, the compiler responds immediately to the terminal with local syntactic errors. The user, therefore, is able to make changes to the source program immediately. Changes to the source program other than in response to immediate diagnostics cause a restart of the compilation process. Obviously, the system must keep a fresh copy of the source program for the restart case. To satisfy this need, a copy of the current up-to-date source program is maintained on external storage; if the source was prestored, the original version is updated with change requests; if the source program is not prestored, the compiler saves all source (and changes) as they are entered line-by-line. With the user at a terminal, the compiler is also able to stop midway during compilation (usually after the global statement analysis and before optimization) to inquire whether or not the user wants to continue. Under error conditions, the user may abort the compilation or make changes and restart the compilation... </p><p>* Two typical general-purpose time-sharing systems are TSS/360 and MULTICS. </p></li></ul>
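The statement-by-statement interaction described above can be sketched as a check-as-you-enter loop. The "local syntax check" and diagnostic format below are hypothetical stand-ins for a real analyzer; the sketch only shows the shape of the interaction:

```python
# Toy sketch of conversational compilation: each source line is checked as
# it is entered, local syntactic errors are echoed to the terminal
# immediately, and a fresh up-to-date copy of the source is kept so the
# compilation can be restarted after a change.
def local_syntax_errors(line):
    # stand-in for a real local syntax analysis
    errors = []
    if line.count("(") != line.count(")"):
        errors.append("unbalanced parentheses")
    if "=" not in line:
        errors.append("statement is not an assignment")
    return errors

def conversational_compile(entered_lines):
    saved_source = []      # copy maintained for the restart case
    diagnostics = []       # (line number, message) pairs for the terminal
    for lineno, line in enumerate(entered_lines, start=1):
        for msg in local_syntax_errors(line):
            diagnostics.append((lineno, msg))
        saved_source.append(line)   # save all source as it is entered
    return saved_source, diagnostics

source, diags = conversational_compile(["X = (A + B", "Y = X * 2"])
```

Here the first line draws an immediate diagnostic while the saved copy of the source remains complete, so a corrected line can replace it and compilation can restart.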