Chapter 1. Overview
J. H. Wang, Sep. 15, 2015
Outline
• History of Compilation
• What Compilers Do
• Interpreters
• Syntax and Semantics
• Organization of a Compiler
• Programming Language and Compiler Design
• Computer Architecture and Compiler Design
• Compiler Design Considerations
• Integrated Development Environments
Language Processors
• Translators
  – Transforming human-oriented programming languages into computer-oriented machine languages
History of Compilation
• Early compilers
  – 1950s: by Grace Hopper
  – Late 1950s: Fortran
• Broad applications
  – Typesetting: TeX, LaTeX
  – Portable document representation: PostScript
  – Symbolic and numeric problem solving: Mathematica
  – VLSI: Verilog, VHDL
What Compilers Do
• Compilers may be distinguished in two ways
  – By the kind of machine code they generate
  – By the format of the target code they generate
Machine Code Generated by Compilers
• Pure machine code
  – Only instructions from a particular instruction set
  – No dependence on any software (libraries, OS)
  – Rare; mostly used in system implementation languages
• Augmented machine code
  – Machine code augmented with OS and runtime language-support routines
    • I/O, storage allocation, mathematical functions
    • Data transfer, procedure call, and dynamic storage instructions
  – The most common case
• Virtual machine code
  – Only virtual instructions, executed by a virtual machine
    • Pascal P-code
    • Java bytecode
  – Gains portability and reduces program size
Bootstrapping
Target Code Formats
• Assembly or other source formats
  – Easy to scrutinize
  – Useful for prototyping programming language designs and for cross-compilation
• Relocatable binary
  – More efficient, and gives more control over the translation process
  – External references, local instruction addresses, and data addresses are not yet bound
    • A linkage step is required
• Absolute binary
  – Faster, but limited ability to interface with other code
  – Useful for exercises and prototyping, where compilation costs far exceed execution costs
Interpreters
• Capabilities of interpreters
  – Programs can be easily modified as execution proceeds
    • Interactive debugging
  – Dynamic object typing can be easily supported
    • E.g., Lisp and Scheme
  – A significant degree of machine independence
• Drawbacks
  – Direct interpretation of source programs can involve significant overhead
Syntax and Semantics
• Syntax: structure
  – E.g., context-free grammars (CFGs)
    • a=b+c is legal, but b+c=a is not
• Semantics: meaning
  – E.g.:
    • a=b+c is illegal if any of the variables are undeclared, or if b or c is of type Boolean
  – Static semantics
  – Runtime semantics
Static Semantics
• A set of rules that specify which syntactically legal programs are actually valid
  – E.g., identifier declaration, type compatibility of operators and operands, proper number of parameters in procedure calls
• Can be specified either formally or informally
  – E.g., attribute grammars
An Example of Attribute Grammars
• Production rule:
  – E -> E + T
• Augmented production rule:
  – E_result -> E_v1 + T_v2
    • if v1.type = numeric and v2.type = numeric
      then result.type <- numeric
      else call ERROR()
  – Verbose and tedious
Runtime Semantics
• To specify what a program computes
  – Can be specified informally
    • E.g., program states
      – a=1: the state component corresponding to a is changed to 1
  – Formal approaches
    • Natural semantics: an operational model
      – Given assertions that hold before evaluation of a construct, we can infer assertions that will hold after the construct's evaluation
    • Axiomatic semantics: relations or predicates that relate program variables
      – E.g., for the assignment var <- exp:
        » a predicate is true after the statement executes iff the predicate obtained by replacing all occurrences of var by exp was true beforehand
      – Good for deriving proofs of program correctness, but difficult to use
    • Denotational semantics: more mathematical in form
      – E.g.: E[[T1 + T2]]m = E[[T1]]m + E[[T2]]m
• A difficulty in semantics: imprecise language specification
  – E.g. (in Java):

      public static int subr(int b) {
          if (b != 0) return b + 100;
      }

      public static int subr(int b) {
          if (b != 0) return b + 100;
          else if (10 * b == 0) return 1;
      }

  – The problem of deciding whether a particular statement in a program is reachable is undecidable
• In practice, a trusted reference compiler can serve as a de facto language definition
  – E.g., Lisp
Organization of a Compiler
[Diagram: a compiler is organized into two parts, analysis and synthesis]
The Structure of a Compiler
• Tasks performed by compilers
  – Analysis of the source program
    • Syntax analysis
    • Semantic analysis
  – Synthesis of a target program that, when executed, will correctly perform the computations described by the source program
    • Code generator
    • Optimizer
The Scanner
• Reading the input text and grouping individual characters into tokens
  – Identifiers
  – Integers
  – Reserved words
  – Delimiters
• What the scanner does
  – It puts the program into a compact and uniform format
  – It eliminates unneeded information
  – It processes compiler control directives
  – It sometimes enters preliminary information into the symbol table
  – It optionally formats and lists the source program
Lexical Analysis (Scanning)[Aho, Lam, Sethi, Ullman]
• Grouping characters into lexemes
• Producing tokens
  – (token-name, attribute-value)
• E.g.:
  – position = initial + rate * 60
  – <id,1> <=> <id,2> <+> <id,3> <*> <60>
• Regular expressions (Chap. 3)
  – An effective and powerful approach to describe tokens
  – A specification for automatic generation of finite automata that recognize regular sets
    • Scanner generator
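As a rough sketch of this phase (the class and its behavior are illustrative, not taken from the text), a hand-written scanner for the running example can group characters into lexemes and emit (token-name, attribute-value) pairs, numbering identifiers in order of first appearance:

```java
import java.util.ArrayList;
import java.util.List;

// A minimal hand-written scanner sketch: groups characters into
// lexemes and emits tokens like <id,1>, <+>, <60>.
public class MiniScanner {
    public static List<String> scan(String src) {
        List<String> tokens = new ArrayList<>();
        List<String> ids = new ArrayList<>();   // identifiers seen so far
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isLetter(c)) {  // identifier lexeme
                int j = i;
                while (j < src.length() && Character.isLetterOrDigit(src.charAt(j))) j++;
                String lexeme = src.substring(i, j);
                if (!ids.contains(lexeme)) ids.add(lexeme);
                tokens.add("<id," + (ids.indexOf(lexeme) + 1) + ">");
                i = j;
            } else if (Character.isDigit(c)) {   // integer literal
                int j = i;
                while (j < src.length() && Character.isDigit(src.charAt(j))) j++;
                tokens.add("<" + src.substring(i, j) + ">");
                i = j;
            } else {                              // single-character operator
                tokens.add("<" + c + ">");
                i++;
            }
        }
        return tokens;
    }
}
```

Scanning "position = initial + rate * 60" with this sketch yields the token stream <id,1> <=> <id,2> <+> <id,3> <*> <60> shown above.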
The Parser
• Reading tokens and grouping them into phrases according to a syntax specification such as a CFG
  – Grammars (Chap. 2 & 4)
  – Parsing (Chap. 5 & 6)
  – Parser generator
• It usually builds an Abstract Syntax Tree (AST) as a concise representation of program structure (Chap. 2 & 7)
Syntax Analysis (Parsing)[Aho, Lam, Sethi, Ullman]
• Creating a tree-like intermediate representation (e.g., a syntax tree) that depicts the grammatical structure of the token stream
  – E.g., for <id,1> <=> <id,2> <+> <id,3> <*> <60>:

                =
               / \
         <id,1>   +
                 / \
           <id,2>   *
                   / \
             <id,3>   60
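A tiny recursive-descent parser can illustrate how such a tree is built (this sketch is mine, not from the text; it handles only + and *, assumes well-formed input, and renders the tree as a parenthesized string rather than as node objects):

```java
// A recursive-descent parser sketch for the expression grammar
//   E -> T { + T }     T -> F { * F }     F -> identifier | number
// The grammar's precedence (binding * tighter than +) is encoded in
// which procedure calls which.
public class ExprParser {
    private final String[] toks;
    private int pos = 0;

    public ExprParser(String[] toks) { this.toks = toks; }

    public String parseE() {                  // E -> T { + T }
        String left = parseT();
        while (pos < toks.length && toks[pos].equals("+")) {
            pos++;
            left = "(+ " + left + " " + parseT() + ")";
        }
        return left;
    }
    private String parseT() {                 // T -> F { * F }
        String left = parseF();
        while (pos < toks.length && toks[pos].equals("*")) {
            pos++;
            left = "(* " + left + " " + parseF() + ")";
        }
        return left;
    }
    private String parseF() {                 // F -> identifier | number
        return toks[pos++];                   // assumes well-formed input
    }
}
```

Parsing the tokens initial + rate * 60 with this sketch yields (+ initial (* rate 60)): the multiplication correctly ends up deeper in the tree than the addition.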
The Type Checker (Semantic Analysis)
• Checking the static semantics of each AST node
  – If the construct is semantically correct, the type checker decorates the AST node by adding type information to it
  – Otherwise, a suitable error message is issued
Semantic Analysis[Aho, Lam, Sethi, Ullman]
• Type checking
• Type conversions or coercions
• E.g.:

                =
               / \
         <id,1>   +
                 / \
           <id,2>   *
                   / \
             <id,3>   int2float
                          |
                          60
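The coercion step can be sketched as follows (class and method names are my own; types are simplified to the strings "int" and "float", and trees to strings): when one operand of an arithmetic operator is int and the other float, the int operand is wrapped in an int2float conversion node.

```java
// A sketch of coercion insertion during semantic analysis.
public class TypeChecker {
    // Wrap expr in an int2float() node unless it already has the target type.
    public static String coerce(String expr, String type, String target) {
        return type.equals(target) ? expr : "int2float(" + expr + ")";
    }

    // Decorate a binary operation: the result type is float if either
    // operand is float, and both operands are coerced to the result type.
    public static String[] binop(String op, String lhs, String lt,
                                 String rhs, String rt) {
        String result = (lt.equals("float") || rt.equals("float")) ? "float" : "int";
        return new String[] {
            op + "(" + coerce(lhs, lt, result) + ", " + coerce(rhs, rt, result) + ")",
            result
        };
    }
}
```

For the example above, combining the float variable rate with the int literal 60 under * produces the decorated node *(rate, int2float(60)) of type float.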
Translator (Program Synthesis)
• Translating AST nodes into Intermediate Representation (IR) code
  – E.g., a while loop -> two subtrees: expression and body
• It's largely dictated by the semantics of the source language
• In simple, nonoptimizing compilers, the translator may generate target code directly
• More elaborate compilers such as GCC may first generate a high-level IR and then translate it into a low-level IR
Intermediate Code Generation
[Aho, Lam, Sethi, Ullman]
• Generating a low-level intermediate representation
  – It should be easy to produce
  – It should be easy to translate into the target machine
  – E.g., three-address code (in Chap. 6):

      t1 = int2float(60)
      t2 = id3 * t1
      t3 = id2 + t2
      id1 = t3
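The essence of generating such code is a postorder walk over the expression tree, emitting one instruction per interior node and inventing a fresh temporary for each intermediate result. A minimal sketch (my own construction, omitting the int2float conversion for brevity):

```java
import java.util.ArrayList;
import java.util.List;

// A three-address-code generation sketch: a postorder walk over a
// small expression tree, inventing temporaries t1, t2, ...
public class TacGen {
    public static class Node {                 // minimal AST node
        public final String label;
        public final Node left, right;
        public Node(String label, Node left, Node right) {
            this.label = label; this.left = left; this.right = right;
        }
        public Node(String label) { this(label, null, null); }
    }

    private final List<String> code = new ArrayList<>();
    private int temp = 0;

    public List<String> generate(Node n) { gen(n); return code; }

    // Returns the name holding n's value: the leaf's own label, or a
    // fresh temporary assigned by an emitted instruction.
    private String gen(Node n) {
        if (n.left == null) return n.label;    // leaf: name or literal
        String l = gen(n.left), r = gen(n.right);
        String t = "t" + (++temp);
        code.add(t + " = " + l + " " + n.label + " " + r);
        return t;
    }
}
```

Walking the tree for id2 + id3 * 60 emits "t1 = id3 * 60" for the inner multiplication first, then "t2 = id2 + t1", mirroring the shape of the example above.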
Symbol Tables
• A mechanism that allows information to be associated with identifiers and shared among compiler phases
  – Identifier declaration
  – Identifier use
  – Type checking
Symbol Table Management[Aho, Lam, Sethi, Ullman]
• To record the variable names and collect information about various attributes of each name
  – Storage, type, scope
  – For procedures: number and types of arguments, method of argument passing, and the type returned

  Name     | Type
  ---------|-----
  position | …
  initial  | …
  rate     | …
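A symbol table is, at its core, a map from names to their recorded attributes. A minimal sketch (the class, its fields, and the 8-byte slot size are illustrative assumptions, not from the text):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A symbol table sketch: maps identifier names to their attributes
// (here just a type and a storage offset).
public class SymbolTable {
    public static class Entry {
        public final String type;
        public final int offset;
        Entry(String type, int offset) { this.type = type; this.offset = offset; }
    }

    private final Map<String, Entry> table = new LinkedHashMap<>();
    private int nextOffset = 0;

    // Called when a declaration is seen; each entry gets the next
    // storage slot (8-byte slots assumed for illustration).
    public void declare(String name, String type) {
        table.put(name, new Entry(type, nextOffset));
        nextOffset += 8;
    }

    // Called at each use; returns null for undeclared identifiers,
    // which lets the type checker report a static-semantic error.
    public Entry lookup(String name) { return table.get(name); }
}
```

Declaring position, initial, and rate in order assigns them offsets 0, 8, and 16, and a later lookup of an undeclared name returns null.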
The Optimizer
• Analyzing and transforming the IR code generated by the translator into functionally equivalent but improved code
  – Complex
  – Optimizations may be performed in stages
• Optimization can also be done after code generation
  – E.g., peephole optimization: a few instructions at a time
    • Multiplications by 1
    • Additions of 0
    • Loading a value into a register when it's already in another register
    • Replacing a sequence of instructions by a single instruction with the same effect
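A peephole pass can be sketched as a single scan that inspects one instruction at a time and rewrites the trivial patterns named above (a sketch of my own over textual three-address code, handling only multiplication by 1 and addition of 0 with the constant on the right):

```java
import java.util.ArrayList;
import java.util.List;

// A peephole-optimization sketch over three-address code strings of
// the form "dst = a op b".
public class Peephole {
    public static List<String> optimize(List<String> code) {
        List<String> out = new ArrayList<>();
        for (String insn : code) {
            String[] p = insn.split(" ");      // ["dst", "=", "a", "op", "b"]
            if (p.length == 5) {
                String dst = p[0], a = p[2], op = p[3], b = p[4];
                if (op.equals("*") && b.equals("1")) {   // multiplication by 1
                    out.add(dst + " = " + a);
                    continue;
                }
                if (op.equals("+") && b.equals("0")) {   // addition of 0
                    out.add(dst + " = " + a);
                    continue;
                }
            }
            out.add(insn);                      // no pattern matched: keep as is
        }
        return out;
    }
}
```

For example, "t1 = id3 * 1" becomes the plain copy "t1 = id3", while instructions that match no pattern pass through unchanged. Real peephole optimizers slide a window over machine instructions and apply many more patterns, but the structure is the same.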
Code Optimization[Aho, Lam, Sethi, Ullman]
• Attempts to improve the intermediate code
  – Better: faster, shorter code, or code that consumes less power
  – E.g.:

      t1 = id3 * 60.0
      id1 = id2 + t1
The Code Generator
• Mapping the IR code generated by the translator into target machine code
  – Machine-dependent, complex
    • Register allocation
    • Code scheduling
• Automatic construction of code generators has been actively studied
  – Matching a low-level IR to target-instruction templates
  – This makes it easy to retarget a compiler to a new target machine
    • E.g., GCC
Code Generation[Aho, Lam, Sethi, Ullman]
• Mapping the intermediate representation of the source program into the target language
  – Machine code: register/memory location assignments
  – E.g.:

      LDF  R2, id3
      MULF R2, R2, #60.0
      LDF  R1, id2
      ADDF R1, R1, R2
      STF  id1, R1
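A deliberately naive sketch of this mapping (my own construction, using the same made-up LDF/MULF/ADDF/STF mnemonics as the example, with no register allocation: every instruction uses R1):

```java
import java.util.ArrayList;
import java.util.List;

// A naive code-generation sketch: each three-address instruction
// "dst = a op b" becomes a load, an arithmetic instruction, and a
// store. Real code generators allocate registers to avoid the
// redundant loads and stores this scheme produces.
public class CodeGen {
    public static List<String> gen(String dst, String a, String op, String b) {
        List<String> out = new ArrayList<>();
        out.add("LDF R1, " + a);                        // load left operand
        out.add(opcode(op) + " R1, R1, " + b);          // compute into R1
        out.add("STF " + dst + ", R1");                 // store the result
        return out;
    }
    private static String opcode(String op) {           // only * and + here
        return op.equals("*") ? "MULF" : "ADDF";
    }
}
```

Translating "id1 = id2 + t1" yields a load of id2, an ADDF, and a store to id1; comparing this output with the hand-written sequence above shows exactly the kind of improvement register allocation buys.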
Phases of a Compiler [Aho, Lam, Sethi, Ullman]
character stream
  ↓ Lexical Analyzer
token stream
  ↓ Syntax Analyzer
syntax tree
  ↓ Semantic Analyzer
syntax tree
  ↓ Intermediate Code Generator
intermediate representation
  ↓ Machine-Independent Code Optimization (optional)
intermediate representation
  ↓ Code Generator
target machine code
  ↓ Machine-Dependent Code Optimization (optional)
target machine code

(The Symbol Table is shared by all phases.)
Compiler Writing Tools
• Compiler generators (compiler compilers)
  – Scanner generator
  – Parser generator
  – Symbol table manager
  – Attribute grammar evaluator
  – Code-generation tools
• Much of the effort in crafting a compiler lies in writing and debugging the semantic phases
  – These are usually hand-coded
Programming Language and Compiler Design
• Many compiler techniques arise from the need to cope with some programming language construct
• The state of the art in compiler design also strongly affects programming language design
• The advantages of a programming language that's easy to compile:
  – It is easier to learn, read, and understand
  – It will have quality compilers on a wide variety of machines
  – Better code will be generated
  – There will be fewer compiler bugs
  – The compiler will be smaller, cheaper, faster, more reliable, and more widely used
  – Better diagnostic messages and program development tools will be available
Computer Architecture and Compiler Design
• Compiler designers are responsible for making computing capability available to programmers
• Problems
  – Instruction sets for some popular architectures are highly nonuniform
  – High-level programming language operations are not always easy to support
  – Essential architectural features such as hardware caches and distributed processors and memory are difficult to present to programmers in an architecture-independent manner
  – Effective use of a large number of processors has always posed challenges to application developers and compiler writers
  – For some programming languages, runtime checks for data and program integrity are dropped in favor of gains in execution speed
Compiler Design Considerations
• Debugging (development) compilers
  – Detail programmer errors
  – E.g., CodeCenter
  – Can often tolerate or repair minor errors (e.g., inserting a missing comma or parenthesis)
• Optimizing compilers (Chap. 13 & 14)
  – Produce efficient target code at the cost of increased compiler complexity and increased compilation times
  – Optimal code, even when theoretically possible, is often infeasible in practice
  – A variety of transformations might interfere with each other
• Retargetable compilers (Chap. 11 & 13)
  – The target architecture can be changed without the machine-independent components having to be rewritten
  – More difficult to write, but development costs can be shared
Integrated Development Environments
• To integrate the program development cycle into a single framework
  – Editing, compilation, testing, debugging
• Immediate feedback on syntax and semantic problems
• Focus on the source program
• Easy access to information about the program
• Many of the techniques of batch compilation can be reformulated into incremental form to support IDEs
  – Parser, type checker, …
• In this book, we concentrate on the translation of C, C++, and Java
End of Chapter 1
• Any Questions or Comments?