an overview of a compiler - iit hyderabadramakrishna/compilers-jan... · optimization techniques -...
TRANSCRIPT
An Overview of a Compiler
Y.N. Srikant
Department of Computer Science and AutomationIndian Institute of Science
Bangalore 560 012
NPTEL Course on Principles of Compiler Design
Y.N. Srikant Compiler Overview
Outline of the Lecture
About the courseWhy should we study compiler design?Compiler overview with block diagrams
Y.N. Srikant Compiler Overview
About the Course
A detailed look at the internals of a compilerDoes not assume any background but is intensiveDoing programming assignments and solving theoreticalproblems are both essentialA compiler is an excellent example of theory translated intopractice in a remarkable way
Y.N. Srikant Compiler Overview
Why Should We Study Compiler Design?
Compilers are everywhere!Many applications for compiler technology
Parsers for HTML in web browserInterpreters for javascript/flashMachine code generation for high level languagesSoftware testingProgram optimizationMalicious code detectionDesign of new computer architectures
Compiler-in-the-loop hardware development
Hardware synthesis: VHDL to RTL translationCompiled simulation
Used to simulate designs written in VHDLNo interpretation of design, hence faster
Y.N. Srikant Compiler Overview
About the Complexity of Compiler Technology
A compiler is possibly the most complex system softwareand writing it is a substantial exercise in softwareengineeringThe complexity arises from the fact that it is required tomap a programmer’s requirements (in a HLL program) toarchitectural detailsIt uses algorithms and techniques from a very largenumber of areas in computer scienceTranslates intricate theory into practice - enables toolbuilding
Y.N. Srikant Compiler Overview
About the Nature of Compiler Algorithms
Draws results from mathematical logic, lattice theory, linearalgebra, probability, etc.
type checking, static analysis, dependence analysis andloop parallelization, cache analysis, etc.
Makes practical application ofGreedy algorithms - register allocationHeuristic search - list schedulingGraph algorithms - dead code elimination, registerallocationDynamic programming - instruction selectionOptimization techniques - instruction schedulingFinite automata - lexical analysisPushdown automata - parsingFixed point algorithms - data-flow analysisComplex data structures - symbol tables, parse trees, datadependence graphsComputer architecture - machine code generation
Y.N. Srikant Compiler Overview
Other Uses of Scanning and Parsing Techniques
Assembler implementationOnline text searching (GREP, AWK) and word processingWebsite filteringCommand language interpretersScripting language interpretation (Unix shell, Perl, Python)XML parsing and document tree constructionDatabase query interpreters
Y.N. Srikant Compiler Overview
Other Uses of Program Analysis Techniques
Converting a sequential loop to a parallel loopProgram analysis to determine if programs are data-racefreeProfiling programs to determine busy regionsProgram slicingData-flow analysis approach to software testing
Uncovering errors along all pathsDereferencing null pointersBuffer overflows and memory leaks
Worst Case Execution Time (WCET) estimation andenergy analysis
Y.N. Srikant Compiler Overview
Language Processing System
Y.N. Srikant Compiler Overview
Compiler Overview
Y.N. Srikant Compiler Overview
Compilers and Interpreters
Compilers generate machine code, whereas interpretersinterpret intermediate codeInterpreters are easier to write and can provide better errormessages (symbol table is still available)Interpreters are at least 5 times slower than machine codegenerated by compilersInterpreters also require much more memory than machinecode generated by compilersExamples: Perl, Python, Unix Shell, Java, BASIC, LISP
Y.N. Srikant Compiler Overview
Translation Overview - Lexical Analysis
Y.N. Srikant Compiler Overview
Lexical Analysis
LA can be generated automatically from regular expressionspecifications
LEX and Flex are two such tools
LA is a deterministic finite state automatonWhy is LA separate from parsing?
Simplification of design - software engineering reasonI/O issues are limited LA aloneLA based on finite automata are more efficient to implementthan pushdown automata used for parsing (due to stack)
Y.N. Srikant Compiler Overview
Translation Overview - Syntax Analysis
Y.N. Srikant Compiler Overview
Parsing or Syntax Analysis
Syntax analyzers (parsers) can be generated automaticallyfrom several variants of context-free grammarspecifications
LL(1), and LALR(1) are the most popular onesANTLR (for LL(1)), YACC and Bison (for LALR(1)) are suchtools
Parsers are deterministic push-down automataParsers cannot handle context-sensitive features ofprogramming languages; e.g.,
Variables are declared before useTypes match on both sides of assignmentsParameter types and number match in declaration and use
Y.N. Srikant Compiler Overview
Translation Overview - Semantic Analysis
Y.N. Srikant Compiler Overview
Semantic Analysis
Semantic consistency that cannot be handled at theparsing stage is handled hereType checking of various programming languageconstructs is one of the most important tasksStores type information in the symbol table or the syntaxtree
Types of variables, function parameters, array dimensions,etc.Used not only for semantic validation but also forsubsequent phases of compilation
Static semantics of programming languages can bespecified using attribute grammars
Y.N. Srikant Compiler Overview
Translation Overview - Intermediate Code Generation
Y.N. Srikant Compiler Overview
Intermediate Code Generation
While generating machine code directly from source codeis possible, it entails two problems
With m languages and n target machines, we need to writem × n compilersThe code optimizer which is one of the largest andvery-difficult-to-write components of any compiler cannot bereused
By converting source code to an intermediate code, amachine-independent code optimizer may be writtenIntermediate code must be easy to produce and easy totranslate to machine code
A sort of universal assembly languageShould not contain any machine-specific parameters(registers, addresses, etc.)
Y.N. Srikant Compiler Overview
Different Types of Intermediate Code
The type of intermediate code deployed is based on theapplicationQuadruples, triples, indirect triples, abstract syntax treesare the classical forms used for machine-independentoptimizations and machine code generationStatic Single Assignment form (SSA) is a recent form andenables more effective optimizations
Conditional constant propagation and global valuenumbering are more effective on SSA
Program Dependence Graph (PDG) is useful in automaticparallelization, instruction scheduling, and softwarepipelining
Y.N. Srikant Compiler Overview
Translation Overview - Code Optimization
Y.N. Srikant Compiler Overview
Machine-independent Code Optimization
Intermediate code generation process introduces manyinefficiencies
Extra copies of variables, using variables instead ofconstants, repeated evaluation of expressions, etc.
Code optimization removes such inefficiencies andimproves codeImprovement may be time, space, or power consumptionIt changes the structure of programs, sometimes of beyondrecognition
Inlines functions, unrolls loops, eliminates someprogrammer-defined variables, etc.
Code optimization consists of a bunch of heuristics andpercentage of improvement depends on programs (may bezero also)
Y.N. Srikant Compiler Overview
Examples of Machine-Independant Optimizations
Common sub-expression eliminationCopy propagationLoop invariant code motionPartial redundancy eliminationInduction variable elimination and strength reductionCode opimization needs information about the program
which expressions are being recomputed in a function?which definitions reach a point?
All such information is gathered through data-flow analysis
Y.N. Srikant Compiler Overview
Translation Overview - Code Generation
Y.N. Srikant Compiler Overview
Code Generation
Converts intermediate code to machine codeEach intermediate code instruction may result in manymachine instructions or vice-cersaMust handle all aspects of machine architecture
Registers, pipelining, cache, multiple function units, etc.Generating efficient code is an NP-complete problem
Tree pattern matching-based strategies are among the bestNeeds tree intermediate code
Storage allocation decisions are made hereRegister allocation and assignment are the most importantproblems
Y.N. Srikant Compiler Overview
Machine-Dependent Optimizations
Peephole optimizationsAnalyze sequence of instructions in a small window(peephole) and using preset patterns, replace them with amore efficient sequenceRedundant instruction eliminatione.g., replace the sequence [LD A,R1][ST R1,A] by [LDA,R1]Eliminate “jump to jump” instructionsUse machine idioms (use INC instead of LD and ADD)
Instruction scheduling (reordering) to eliminate pipelineinterlocks and to increase parallelismTrace scheduling to increase the size of basic blocks andincrease parallelismSoftware pipelining to increase parallelism in loops
Y.N. Srikant Compiler Overview