Compiler Wikibook

Compiler construction

PDF generated using the open source mwlib toolkit. PDF generated at: Wed, 20 Oct 2010 09:21:48 UTC

Contents

Introduction
  Compiler construction
  Compiler
  Interpreter
  History of compiler writing

Lexical analysis
  Lexical analysis
  Regular expression
  Regular expression examples
  Finite-state machine
  Preprocessor

Syntactic analysis
  Parsing
  Lookahead
  Symbol table
  Abstract syntax
  Abstract syntax tree
  Context-free grammar
  Terminal and nonterminal symbols
  Left recursion
  Backus–Naur Form
  Extended Backus–Naur Form
  TBNF
  Top-down parsing
  Recursive descent parser
  Tail recursive parser
  Parsing expression grammar
  LL parser
  LR parser
  Parsing table
  Simple LR parser
  Canonical LR parser
  GLR parser
  LALR parser
  Recursive ascent parser
  Parser combinator
  Bottom-up parsing
  Chomsky normal form
  CYK algorithm
  Simple precedence grammar
  Simple precedence parser
  Operator-precedence grammar
  Operator-precedence parser
  Shunting-yard algorithm
  Chart parser
  Earley parser
  The lexer hack
  Scannerless parsing

Semantic analysis
  Attribute grammar
  L-attributed grammar
  LR-attributed grammar
  S-attributed grammar
  ECLR-attributed grammar
  Intermediate representation
  Intermediate language
  Control flow graph
  Basic block
  Call graph
  Data-flow analysis
  Use-define chain
  Live variable analysis
  Reaching definition
  Three address code
  Static single assignment form
  Dominator
  C3 linearization
  Intrinsic function
  Aliasing
  Alias analysis
  Array access analysis
  Pointer analysis
  Escape analysis
  Shape analysis
  Loop dependence analysis
  Program slicing

Code optimization
  Compiler optimization
  Peephole optimization
  Copy propagation
  Constant folding
  Sparse conditional constant propagation
  Common subexpression elimination
  Partial redundancy elimination
  Global value numbering
  Strength reduction
  Bounds-checking elimination
  Inline expansion
  Return value optimization
  Dead code
  Dead code elimination
  Unreachable code
  Redundant code
  Jump threading
  Superoptimization
  Loop optimization
  Induction variable
  Loop fission
  Loop fusion
  Loop inversion
  Loop interchange
  Loop-invariant code motion
  Loop nest optimization
  Manifest expression
  Polytope model
  Loop unwinding
  Loop splitting
  Loop tiling
  Loop unswitching
  Interprocedural optimization
  Whole program optimization
  Adaptive optimization
  Lazy evaluation
  Partial evaluation
  Profile-guided optimization
  Automatic parallelization
  Loop scheduling
  Vectorization
  Superword Level Parallelism

Code generation
  Code generation
  Name mangling
  Register allocation
  Chaitin's algorithm
  Rematerialization
  Sethi–Ullman algorithm
  Data structure alignment
  Instruction selection
  Instruction scheduling
  Software pipelining
  Trace scheduling
  Just-in-time compilation
  Bytecode
  Dynamic compilation
  Dynamic recompilation
  Object file
  Code segment
  Data segment
  .bss
  Literal pool
  Overhead code
  Link time
  Relocation
  Library
  Static build
  Architecture Neutral Distribution Format

Development techniques
  Bootstrapping
  Compiler correctness
  Jensen's Device
  Man or boy test
  Cross compiler
  Source-to-source compiler

Tools
  Compiler-compiler
  PQCC
  Compiler Description Language
  Comparison of regular expression engines
  Comparison of parser generators
  Lex
  flex lexical analyser
  Quex
  JLex
  Ragel
  yacc
  Berkeley Yacc
  ANTLR
  GNU bison
  Coco/R
  GOLD
  JavaCC
  JetPAG
  Lemon Parser Generator
  LALR Parser Generator
  ROSE compiler framework
  SableCC
  Scannerless Boolean Parser
  Spirit Parser Framework
  S/SL programming language
  SYNTAX
  Syntax Definition Formalism
  TREE-META
  Frameworks supporting the polyhedral model

Case studies
  GNU Compiler Collection
  Java performance

Literature
  Compilers: Principles, Techniques, and Tools
  Principles of Compiler Design
  The Design of an Optimizing Compiler

References
  Article Sources and Contributors
  Image Sources, Licenses and Contributors

Article Licenses
  License

Introduction

Compiler construction

Compiler construction is an area of computer science that deals with the theory and practice of developing programming languages and their associated compilers. The theoretical portion is primarily concerned with the syntax, grammar, and semantics of programming languages, which gives this particular area of computer science a strong tie with linguistics. Some courses on compiler construction include a simplified grammar of a spoken language that can be used to form valid sentences, providing students with an analogy to help them understand how grammar works for programming languages. The practical portion covers the actual implementation of compilers for languages. Students will typically end up writing the front end of a compiler for a simplistic teaching language, such as Micro.

External links

Compiler Construction at the University of New England [1]

References

[1] http://mcs.une.edu.au/~comp319/

Compiler

A compiler is a computer program (or set of programs) that transforms source code written in a programming language (the source language) into another computer language (the target language, often having a binary form known as object code).
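This source-to-target translation can be sketched in miniature. The toy compiler below is purely illustrative: the expression language, the token names, and the stack-machine instruction set are all invented for this example, not taken from the text. It runs a source string through the classic lexical analysis, parsing, and code generation phases:

```python
import operator
import re

# Lexical analysis: split source text into a stream of (kind, value) tokens.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(.))")

def lex(source):
    tokens = []
    for number, op in TOKEN_RE.findall(source):
        if number:
            tokens.append(("NUM", int(number)))
        elif op in "+-*/()":
            tokens.append((op, op))
        else:
            raise SyntaxError(f"unexpected character {op!r}")
    tokens.append(("EOF", None))
    return tokens

def parse(tokens):
    """Parsing: build an abstract syntax tree with a recursive descent parser."""
    pos = 0
    def peek():
        return tokens[pos][0]
    def eat(kind):
        nonlocal pos
        if tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind}, got {tokens[pos][0]}")
        tok = tokens[pos]
        pos += 1
        return tok
    def expr():                      # expr   := term (('+'|'-') term)*
        node = term()
        while peek() in ("+", "-"):
            node = (eat(peek())[0], node, term())
        return node
    def term():                      # term   := factor (('*'|'/') factor)*
        node = factor()
        while peek() in ("*", "/"):
            node = (eat(peek())[0], node, factor())
        return node
    def factor():                    # factor := NUM | '(' expr ')'
        if peek() == "NUM":
            return ("NUM", eat("NUM")[1])
        eat("(")
        node = expr()
        eat(")")
        return node
    tree = expr()
    eat("EOF")
    return tree

def codegen(node):
    """Code generation: emit instructions for an invented stack machine."""
    if node[0] == "NUM":
        return [("PUSH", node[1])]
    op, left, right = node
    return codegen(left) + codegen(right) + [("OP", op)]

def compile_source(source):
    return codegen(parse(lex(source)))

def run(program):
    """A matching interpreter for the stack machine (integer division)."""
    ops = {"+": operator.add, "-": operator.sub,
           "*": operator.mul, "/": operator.floordiv}
    stack = []
    for kind, arg in program:
        if kind == "PUSH":
            stack.append(arg)
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(ops[arg](a, b))
    return stack[0]
```

For example, `compile_source("(2+3)*4")` yields a list of PUSH/OP instructions that `run` evaluates to 20; a real compiler differs in scale and in emitting actual machine code, not in the shape of this pipeline.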
The most common reason for wanting to transform source code is to create an executable program. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a lower-level language (e.g., assembly language or machine code). If the compiled program can only run on a computer whose CPU or operating system differs from the one on which the compiler runs, the compiler is known as a cross-compiler. A program that translates from a low-level language to a higher-level one is a decompiler. A program that translates between high-level languages is usually called a language translator, source-to-source translator, or language converter. A language rewriter is usually a program that translates the form of expressions without a change of language.

[Figure: A diagram of the operation of a typical multi-language, multi-target compiler.]

A compiler is likely to perform many or all of the following operations: lexical analysis, preprocessing, parsing, semantic analysis (syntax-directed translation), code generation, and code optimization. Program faults caused by incorrect compiler behavior can be very difficult to track down and work around, so compiler implementors invest a lot of time ensuring the correctness of their software. The term compiler-compiler is sometimes used to refer to a parser generator, a tool often used to help create the lexer and parser.

History

Software for early computers was primarily written in assembly language for many years. Higher-level programming languages were not invented until the benefits of being able to reuse software on different kinds of CPUs started to become significantly greater than the cost of writing a compiler. The very limited memory capacity of early computers also created many technical problems when implementing a compiler. Towards the end of the 1950s, machine-independent programming languages were first proposed.
Subsequently, several experimental compilers were developed. The first compiler was written by Grace Hopper, in 1952, for the A-0 programming language. The FORTRAN team led by John Backus at IBM is generally credited as having introduced the first complete compiler in 1957. COBOL was an early language to be compiled on multiple architectures, in 1960.[1]

In many application domains the idea of using a higher-level language quickly caught on. Because of the expanding functionality supported by newer programming languages and the increasing complexity of computer architectures, compilers have become more and more complex.

Early compilers were written in assembly language. The first self-hosting compiler, capable of compiling its own source code in a high-level language, was created for Lisp by Tim Hart and Mike Levin at MIT in 1962.[2] Since the 1970s it has become common practice to implement a compiler in the language it compiles, although both Pascal and C have been popular choices for implementation language. Building a self-hosting compiler is a bootstrapping problem: the first such compiler for a language must be compiled either by a compiler written in a different language or, as in Hart and Levin's Lisp compiler, by running the compiler in an interpreter.

Compilers in education

Compiler construction and compiler optimization are taught at universities and schools as part of the computer science curriculum. Such courses are usually supplemented with the implementation of a compiler for an educational programming language. A well-documented example is Niklaus Wirth's PL/0 compiler, which Wirth used to teach compiler construction in the 1970s.[3] In spite of its simplicity, the PL/0 compiler introduced several influential concepts to the field:
1. Program development by stepwise refinement (also the title of a 1971 paper by Wirth)[4]
2. The use of a recursive descent parser
3. The use of EBNF to specify the syntax of a language
4. A code generator producing portable P-code
5. The use of T-diagrams[5] in the formal description of the bootstrapping problem

Compilation

Compilers enabled the development of programs that are machine-independent. Before the development of FORTRAN (FORmula TRANslator), the first higher-level language, in the 1950s, machine-dependent assembly language was widely used. While assembly language produces more reusable and relocatable programs than machine code on the same architecture, it has to be modified or rewritten if the program is to be executed on a different hardware architecture. With the high-level programming languages that soon followed FORTRAN, such as COBOL, C, and BASIC, programmers could write machine-independent source programs. A compiler translates the high-level source programs into target programs in the machine language of the specific hardware. Once the target program is generated, the user can execute the program.

Structure of compiler

Compilers bridge source programs in high-level languages and the underlying hardware. A compiler must 1) recognize the legitimacy of programs, 2) generate correct and efficient code, 3) manage the run-time organization, and 4) format output according to assembler or linker conventions. A compiler consists of three main parts: the frontend, the middle-end, and the backend.

The frontend checks whether the program is correctly written in terms of the programming language syntax and semantics. Here legal and illegal programs are recognized, and errors, if any, are reported in a useful way. Type checking is also performed by collecting type information. The frontend generates IR (intermediate representation) for the middle-end. This part of the compiler is well understood, and much of its construction has been automated.
Efficient algorithms for these tasks exist, typically running in O(n) or O(n log n) time.

The middle-end is where the optimizations for performance take place. Typical transformations are removal of useless or unreachable code, discovery and propagation of constant values, relocation of a computation to a less frequently executed place (e.g., out of a loop), and specialization of a computation based on its context. The middle-end generates IR for the backend. Most optimization effort is focused on this part.

The backend is responsible for translating the IR into the target assembly code. Target instructions are chosen for each IR instruction, and registers are allocated to variables. The backend also exploits the hardware by figuring out how to keep parallel functional units busy, fill delay slots, and so on. Although many of the underlying optimization problems are NP-hard, heuristic techniques for them are well developed.

Compiler output

One classification of compilers is by the platform on which their generated code executes. This is known as the target platform. A native or hosted compiler is one whose output is intended to run directly on the same type of computer and operating system that the compiler itself runs on. The output of a cross compiler is designed to run on a different platform. Cross compilers are often used when developing software for embedded systems that are not intended to support a software development environment. The output of a compiler that produces code for a virtual machine (VM) may or may not be executed on the same platform as the compiler that produced it. For this reason such compilers are not usually classified as native or cross compilers.

Compiled versus interpreted languages

Higher-level programming languages are generally divided, for convenience, into compiled languages and interpreted languages.
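How soft that division is can be seen in CPython, the reference implementation of Python, a language usually called interpreted: it first compiles source code to bytecode and only then interprets that bytecode on a virtual machine. The built-in `compile` function and the standard `dis` module expose both halves (a small sketch; the exact instruction names printed vary by Python version):

```python
import dis

# An "interpreted" language: CPython first compiles this source to bytecode...
source = "x = 2 + 3"
code = compile(source, filename="<example>", mode="exec")

# ...which is a genuine compiled artifact: instructions for CPython's virtual
# machine. (Recent versions even constant-fold 2 + 3 to 5 at compile time.)
for instr in dis.get_instructions(code):
    print(instr.opname)

# ...and only then interprets that bytecode.
namespace = {}
exec(code, namespace)
print(namespace["x"])  # 5
```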
However, in practice there is rarely anything about a language that requires it to be exclusively compiled or exclusively interpreted, although it is possible to design languages that rely on re-interpretation at run time. The categorization usually reflects the most popular or widespread implementations of a language: for instance, BASIC is sometimes called an interpreted language, and C a compiled one, despite the existence of BASIC compilers and C interpreters. Modern trends toward just-in-time compilation and bytecode interpretation at times blur the traditional categorization of compilers and interpreters.

Some language specifications spell out that implementations must include a compilation facility; for example, Common Lisp. However, there is nothing inherent in the definition of Common Lisp that stops it from being interpreted. Other languages have features that are very easy to implement in an interpreter but make writing a compiler much harder; for example, APL, SNOBOL4, and many scripting languages allow programs to construct arbitrary source code at runtime with regular string operations, and then execute that code by passing it to a special evaluation function. To implement these features in a compiled language, programs must usually be shipped with a runtime library that includes a version of the compiler itself.

Hardware compilation

The output of some compilers may target hardware at a very low level, for example a field-programmable gate array (FPGA) or structured application-specific integrated circuit (ASIC). Such compil...