abstract - parasol laboratoryrwerger/courses/434/lec2.pdf · abstract syn tax tree so, compilers...

21

Upload: others

Post on 22-Jun-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Abstract - Parasol Laboratoryrwerger/Courses/434/lec2.pdf · Abstract syn tax tree So, compilers often use an abstract syn tax tree.-< id, y > + < n um, 2 > id, x This

Abstract viewsourcecode compiler machinecodeerrorsImplications:� recognize legal (and illegal) programs� generate correct code� manage storage of all variables and code� need format for object (or assembly) codeBig step up from assembler { higher level notationsCPSC 434 Lecture 2, Page 1

Page 2: Abstract - Parasol Laboratoryrwerger/Courses/434/lec2.pdf · Abstract syn tax tree So, compilers often use an abstract syn tax tree.-< id, y > + < n um, 2 > id, x This

Traditional two pass compilersourcecode frontend backend machinecodeerrorsilImplications:� intermediate language (il)� front end maps legal code into il� back end maps il onto target machine� simplify retargeting� allows multiple front ends� multiple passes ) better codeFront end is O(n) or O(n log n)Back end is NP-CompleteCPSC 434 Lecture 2, Page 2

Page 3: Abstract - Parasol Laboratoryrwerger/Courses/434/lec2.pdf · Abstract syn tax tree So, compilers often use an abstract syn tax tree.-< id, y > + < n um, 2 > id, x This

A fallacyFORTRANcode frontendC++code frontendADAcode frontendSmalltalkcode frontendbackend target 1backend target 2backend target 3

Can we build n�m compilers with n+mcomponents?� must encode all the knowledge in each front end� must represent all the features in one il� must handle all the features in each back endLimited success with low-level ilsCPSC 434 Lecture 2, Page 3

Page 4: Abstract - Parasol Laboratoryrwerger/Courses/434/lec2.pdf · Abstract syn tax tree So, compilers often use an abstract syn tax tree.-< id, y > + < n um, 2 > id, x This

Front endsourcecode scanner parser ilerrorstokensResponsibilities:� recognize legal procedure� report errors� produce il� preliminary storage map� shape the code for the back endMuch of front end construction can be automatedCPSC 434 Lecture 2, Page 4

Page 5: Abstract - Parasol Laboratoryrwerger/Courses/434/lec2.pdf · Abstract syn tax tree So, compilers often use an abstract syn tax tree.-< id, y > + < n um, 2 > id, x This

Scannersourcecode scanner parser ilerrorstokensScanner� maps characters into tokens { the basic unit ofsyntaxx = x + y;becomes<id, x> = <id, x> + <id, y> ;� character string for a token is a lexeme� typical tokens: number, id, +, -, *, /, do, end� eliminates white space (tabs, blanks, comments)� a key issue is speed) use specialized recognizer (lex)CPSC 434 Lecture 2, Page 5

Page 6: Abstract - Parasol Laboratoryrwerger/Courses/434/lec2.pdf · Abstract syn tax tree So, compilers often use an abstract syn tax tree.-< id, y > + < n um, 2 > id, x This

Specifying patternsA scanner must recognize various parts of thelanguage's syntax.Some parts are easy:white spacesome combination of <6 b > and tabkeywords and operatorsspeci�ed as literal patterns | do, endcommentsopening and closing delimiters | /* � � � */CPSC 434 Lecture 2, Page 6

Page 7: Abstract - Parasol Laboratoryrwerger/Courses/434/lec2.pdf · Abstract syn tax tree So, compilers often use an abstract syn tax tree.-< id, y > + < n um, 2 > id, x This

Specifying patternsOther parts are much harder:identi�ersalphabetic followed by k alphanumerics( , $, &, : : : )numbersintegers | 0 or digit from 1-9 followed bydigits from 0-9decimals | integer \." digits from 0-9reals | (integer or decimal) \E" (+ or -) digitsfrom 0-9complex | \(" real \," real \)"We need a powerful notation to specify thesepatterns.CPSC 434 Lecture 2, Page 7

Page 8: Abstract - Parasol Laboratoryrwerger/Courses/434/lec2.pdf · Abstract syn tax tree So, compilers often use an abstract syn tax tree.-< id, y > + < n um, 2 > id, x This

De�nitionsOperation De�nitionunion of L and M L [M = fs j s 2 L or s 2Mgwritten L [Mconcatenationof L and M LM = fst j s 2 L and t 2Mgwritten LMKleene closure of L L� = S1i=0 Liwritten L�positive closure of L L+ = S1i=1 Liwritten L+Aho, Sethi, and Ullman, Figure 3.8CPSC 434 Lecture 2, Page 8

Page 9: Abstract - Parasol Laboratoryrwerger/Courses/434/lec2.pdf · Abstract syn tax tree So, compilers often use an abstract syn tax tree.-< id, y > + < n um, 2 > id, x This

Regular expressionsPatterns are often speci�ed as regular languages.Notations used to describe a regular language (or aregular set) include both regular expressions andregular grammars.Regular expressions (over an alphabet �):1. � is a RE denoting the set f�g2. if a 2 �, then a is a RE denoting fag3. if r and s are REs, denoting L(r) and L(s),then:(r) is a RE denoting L(r)(r) j (s) is a RE denoting L(r) SL(s)(r)(s) is a RE denoting L(r)L(s)(r)� is a RE denoting L(r)�If we adopt a precedence for operators, the extraparentheses can go away. We assume closure, thenconcatenation, then alternation as the order ofprecedence.CPSC 434 Lecture 2, Page 9

Page 10: Abstract - Parasol Laboratoryrwerger/Courses/434/lec2.pdf · Abstract syn tax tree So, compilers often use an abstract syn tax tree.-< id, y > + < n um, 2 > id, x This

RE examplesidenti�erletter ! (a j b j c j ::: j z j A j B j C j ::: j Z)digit ! (0 j 1 j 2 j 3 j 4 j 5 j 6 j 7 j 8 j 9)id ! letter (letter j digit)�numbersinteger !(+ j � j �) (0 j (1 j 2 j 3 j ::: j 9) (digit)�)decimal ! integer . (digit)�real ! (integer j decimal) E (+ j �) (digit)+complex ! \(" real \," real \)"Numbers can get much more complicatedMost programming language tokens can bedescribed with regular expressions.We can use regular expressions to automaticallybuild scanners.CPSC 434 Lecture 2, Page 10

Page 11: Abstract - Parasol Laboratoryrwerger/Courses/434/lec2.pdf · Abstract syn tax tree So, compilers often use an abstract syn tax tree.-< id, y > + < n um, 2 > id, x This

Parsersourcecode scanner parser ilerrorstokensParser:� recognize context-free syntax� guide context-sensitive analysis� construct il(s)� produce meaningful error messages� attempt error correctionParser generators mechanize much of the workCPSC 434 Lecture 2, Page 11

Page 12: Abstract - Parasol Laboratoryrwerger/Courses/434/lec2.pdf · Abstract syn tax tree So, compilers often use an abstract syn tax tree.-< id, y > + < n um, 2 > id, x This

GrammarContext-free syntax is speci�ed with a grammar.<sheep noise> ::= baaj baa <sheep noise>This grammar de�nes the set of noises that a sheepmakes under normal circumstances.The format is called Backus-Naur form. (BNF)Formally, a grammar G = (S;N; T; P )S is the start symbolN is a set of non-terminal symbolsT is a set of terminal symbolsP is a set of productions or rewrite rules(P : N ! N [ T )CPSC 434 Lecture 2, Page 12

Page 13: Abstract - Parasol Laboratoryrwerger/Courses/434/lec2.pdf · Abstract syn tax tree So, compilers often use an abstract syn tax tree.-< id, y > + < n um, 2 > id, x This

SubstitutionContext free syntax can be put to better use.1 <goal> ::= <expr>2 <expr> ::= <expr> <op> <term>3 j <term>4 <term> ::= number5 j id6 <op> ::= +7 j -This grammar de�nes simple expressions withaddition and subtraction over the tokens id andnumber.S = <goal>T = number, id, +, -N = <goal>, <expr>, <term>, <op>P = 1, 2, 3, 4, 5, 6, 7CPSC 434 Lecture 2, Page 13

Page 14: Abstract - Parasol Laboratoryrwerger/Courses/434/lec2.pdf · Abstract syn tax tree So, compilers often use an abstract syn tax tree.-< id, y > + < n um, 2 > id, x This

Parse treeGiven a grammar, valid sentences can be derivedby repeated substitution.Prod'n. Result<goal>1 <expr>2 <expr> <op> <term>5 <expr> <op> y7 <expr> - y2 <expr> <op> <term> - y4 <expr> <op> 2 - y6 <expr> + 2 - y3 <term> + 2 - y5 x + 2 - yTo recognize a valid sentence in some cfg, wereverse this process and build up a parse.CPSC 434 Lecture 2, Page 14

Page 15: Abstract - Parasol Laboratoryrwerger/Courses/434/lec2.pdf · Abstract syn tax tree So, compilers often use an abstract syn tax tree.-< id, y > + < n um, 2 > id, x This

Parse treeA parse can be represented by a tree, called a parsetree or a syntax tree. goalexprop- term<id,y>exprop+ term<num,2>exprterm<id,x>Obviously, this contains a lot of unneededinformation.CPSC 434 Lecture 2, Page 15

Page 16: Abstract - Parasol Laboratoryrwerger/Courses/434/lec2.pdf · Abstract syn tax tree So, compilers often use an abstract syn tax tree.-< id, y > + < n um, 2 > id, x This

Abstract syntax treeSo, compilers often use an abstract syntax tree.- <id,y>+ <num,2><id,x>This is much more concise.Abstract syntax trees (ASTs) are often used as an ilbetween front end and back end.CPSC 434 Lecture 2, Page 16

Page 17: Abstract - Parasol Laboratoryrwerger/Courses/434/lec2.pdf · Abstract syn tax tree So, compilers often use an abstract syn tax tree.-< id, y > + < n um, 2 > id, x This

Back endil instructionselection registerallocation machinecodeResponsibilities� translate il into target machine code� choose instructions for each il operation� decide what to keep in registers at each point� ensure conformance with system interfacesAutomation has been less successful hereCPSC 434 Lecture 2, Page 17

Page 18: Abstract - Parasol Laboratoryrwerger/Courses/434/lec2.pdf · Abstract syn tax tree So, compilers often use an abstract syn tax tree.-< id, y > + < n um, 2 > id, x This

Instruction selectionil instructionselection registerallocation machinecodeInstruction Selection� produce compact, fast code� use available addressing modes� pattern matching problem{ ad hoc techniques{ tree pattern matching{ string pattern matching{ dynamic programmingCPSC 434 Lecture 2, Page 18

Page 19: Abstract - Parasol Laboratoryrwerger/Courses/434/lec2.pdf · Abstract syn tax tree So, compilers often use an abstract syn tax tree.-< id, y > + < n um, 2 > id, x This

Register allocationil instructionselection registerallocation machinecodeRegister Allocation� have value in a register when used� limited resources� changes instruction choices� can move loads and stores� optimal allocation is di�cult) NP-complete for 1 or k registersModern allocators often use an analogy to graphcoloringCPSC 434 Lecture 2, Page 19

Page 20: Abstract - Parasol Laboratoryrwerger/Courses/434/lec2.pdf · Abstract syn tax tree So, compilers often use an abstract syn tax tree.-< id, y > + < n um, 2 > id, x This

Traditional three pass compilersourcecode frontend middleend backend machinecodeerrorsCode Improvement� analyzes and changes il� goal is to reduce runtime� must preserve valuesCPSC 434 Lecture 2, Page 20

Page 21: Abstract - Parasol Laboratoryrwerger/Courses/434/lec2.pdf · Abstract syn tax tree So, compilers often use an abstract syn tax tree.-< id, y > + < n um, 2 > id, x This

Optimizer (middle end)il opt1 ... optn ilerrorsModern optimizers are usually built as a set ofpasses.Typical passes� discover & propagate constant values� reduction of operator strength� common subexpression elimination� redundant computation elimination� encode an idiom in some powerful instruction� move computation to less frequently executedplace (e.g., out of loops)CPSC 434 Lecture 2, Page 21