![Page 1: Abstract - Parasol Laboratoryrwerger/Courses/434/lec2.pdf · Abstract syn tax tree So, compilers often use an abstract syn tax tree.-< id, y > + < n um, 2 > id, x This](https://reader033.vdocuments.mx/reader033/viewer/2022060312/5f0b1b537e708231d42ee18b/html5/thumbnails/1.jpg)

Abstract view

source code → compiler → machine code (plus errors)

Implications:

- recognize legal (and illegal) programs
- generate correct code
- manage storage of all variables and code
- need a format for object (or assembly) code

Big step up from assembler: higher-level notations.

CPSC 434 Lecture 2, Page 1

Traditional two-pass compiler

source code → front end → il → back end → machine code (plus errors)

Implications:

- intermediate language (il)
- front end maps legal code into il
- back end maps il onto target machine
- simplifies retargeting
- allows multiple front ends
- multiple passes ⇒ better code

The front end runs in O(n) or O(n log n) time. Key back-end problems, such as optimal register allocation, are NP-complete.

A fallacy

Diagram: front ends for FORTRAN, C++, Ada, and Smalltalk code, each connected to back ends for target 1, target 2, and target 3.

Can we build n×m compilers with n+m components?

- must encode all the knowledge in each front end
- must represent all the features in one il
- must handle all the features in each back end

Limited success with low-level ils.

Front end

source code → scanner → tokens → parser → il (plus errors)

Responsibilities:

- recognize legal procedures
- report errors
- produce il
- preliminary storage map
- shape the code for the back end

Much of front end construction can be automated.

Scanner

source code → scanner → tokens → parser → il (plus errors)

The scanner:

- maps characters into tokens, the basic unit of syntax: `x = x + y;` becomes `<id, x> = <id, x> + <id, y> ;`
- the character string for a token is a lexeme
- typical tokens: number, id, `+`, `-`, `*`, `/`, do, end
- eliminates white space (tabs, blanks, comments)
- a key issue is speed ⇒ use a specialized recognizer (lex)
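
A minimal sketch of the character-to-token mapping, assuming a token set of identifiers, integer numbers, the keywords do and end, and single-character operators. The token kind names (`"id"`, `"num"`, `"kw"`, `"op"`) and the function `scan` are hypothetical. It uses Python's `re` module rather than a lex-generated recognizer, so it illustrates what a scanner produces, not how a fast one is built.

```python
import re

# (token kind, pattern) pairs, tried in this order at each position.
# "kw" precedes "id" so that "do" is a keyword, while "done" is an identifier.
TOKEN_SPEC = [
    ("ws",  r"[ \t\n]+|/\*.*?\*/"),     # blanks, tabs, newlines, /* comments */
    ("num", r"[0-9]+"),
    ("kw",  r"(?:do|end)\b"),
    ("id",  r"[A-Za-z][A-Za-z0-9]*"),
    ("op",  r"[-+*/=;]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC),
                    re.DOTALL)


def scan(text):
    """Map characters to (kind, lexeme) tokens, discarding white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "ws":          # white space and comments are dropped
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

For example, `scan("x = x + y;")` yields `[("id", "x"), ("op", "="), ("id", "x"), ("op", "+"), ("id", "y"), ("op", ";")]`, the same sequence as `<id, x> = <id, x> + <id, y> ;`.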

Specifying patterns

A scanner must recognize various parts of the language's syntax. Some parts are easy:

- white space: some combination of blanks and tabs
- keywords and operators: specified as literal patterns, e.g. do, end
- comments: opening and closing delimiters, e.g. `/* … */`
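
The easy cases need nothing more than literal comparisons. As a sketch (the helper names `skip_blanks_and_comments` and `match_keyword` are hypothetical), a hand-written scanner might skip blanks, tabs, and `/* … */` comments, then test for a keyword as a literal prefix that is not followed by another alphanumeric character:

```python
KEYWORDS = ("do", "end")


def skip_blanks_and_comments(text, pos):
    """Return the first index at or after pos that is not a blank, a tab,
    or part of a /* ... */ comment."""
    while pos < len(text):
        if text[pos] in " \t":
            pos += 1
        elif text.startswith("/*", pos):
            # Search after the opener so "/*/" is not taken as a full comment.
            end = text.find("*/", pos + 2)
            if end == -1:
                raise SyntaxError(f"unterminated comment starting at {pos}")
            pos = end + 2
        else:
            break
    return pos


def match_keyword(text, pos):
    """Return the keyword that starts at pos, or None. A keyword must not be
    followed by a letter or digit, so "done" does not match "do"."""
    for kw in KEYWORDS:
        end = pos + len(kw)
        if text.startswith(kw, pos) and not (end < len(text) and text[end].isalnum()):
            return kw
    return None
```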

Specifying patterns

Other parts are much harder:

- identifiers: alphabetic followed by k alphanumerics (`_`, `$`, `&`, …)
- numbers:
  - integers: 0, or a digit from 1-9 followed by digits from 0-9
  - decimals: integer "." digits from 0-9
  - reals: (integer or decimal) "E" (+ or -) digits from 0-9
  - complex: "(" real "," real ")"

We need a powerful notation to specify these patterns.
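
Hand-coding even part of these rules shows why a systematic notation helps. The sketch below (the function `classify_number` is hypothetical) implements only the integer and decimal rules as an explicit finite automaton; signs, reals, and complex numbers are left out to keep it short.

```python
DIGITS = "0123456789"
ACCEPTING = {"zero": "integer", "int": "integer", "frac": "decimal"}


def classify_number(s):
    """Run a small finite automaton over s.

    Transitions:
        start -> zero on "0"          start -> int on 1-9
        int   -> int  on 0-9          zero, int -> frac on "."
        frac  -> frac on 0-9
    Returns "integer", "decimal", or None if s is rejected."""
    state = "start"
    for ch in s:
        if state == "start" and ch == "0":
            state = "zero"
        elif state == "start" and ch in "123456789":
            state = "int"
        elif state == "int" and ch in DIGITS:
            state = "int"
        elif state in ("zero", "int") and ch == ".":
            state = "frac"
        elif state == "frac" and ch in DIGITS:
            state = "frac"
        else:
            return None          # no transition on ch: reject
    return ACCEPTING.get(state)  # "start" is not accepting, so "" is rejected
```

Note that `"007"` is rejected: after a leading 0 the only legal move is to the decimal point.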
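
Definitions

| Operation | Definition |
|---|---|
| union of $L$ and $M$, written $L \cup M$ | $L \cup M = \{\, s \mid s \in L \text{ or } s \in M \,\}$ |
| concatenation of $L$ and $M$, written $LM$ | $LM = \{\, st \mid s \in L \text{ and } t \in M \,\}$ |
| Kleene closure of $L$, written $L^*$ | $L^* = \bigcup_{i=0}^{\infty} L^i$ |
| positive closure of $L$, written $L^+$ | $L^+ = \bigcup_{i=1}^{\infty} L^i$ |

Aho, Sethi, and Ullman, Figure 3.8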
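
These operations can be tried out on small finite languages represented as Python sets of strings. Because $L^*$ and $L^+$ are usually infinite, the sketch below (all function names are hypothetical) truncates them to strings of length at most `max_len`; the positive closure uses the identity $L^+ = L L^*$.

```python
def union(L, M):
    """L ∪ M = { s | s in L or s in M }"""
    return L | M


def concat(L, M):
    """LM = { st | s in L and t in M }"""
    return {s + t for s in L for t in M}


def kleene(L, max_len):
    """Strings of L* (the union of L^i for i >= 0) of length <= max_len."""
    result = {""}        # L^0 = {ε}
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more element of L, keeping only
        # strings not seen before; stops once nothing new fits in max_len.
        frontier = {s + t for s in frontier for t in L
                    if len(s + t) <= max_len} - result
        result |= frontier
    return result


def positive(L, max_len):
    """Strings of L+ (the union of L^i for i >= 1) of length <= max_len."""
    return {w for w in concat(L, kleene(L, max_len)) if len(w) <= max_len}
```

For instance, `kleene({"a", "b"}, 2)` gives `{"", "a", "b", "aa", "ab", "ba", "bb"}`, and `positive({"a"}, 3)` gives `{"a", "aa", "aaa"}`, which excludes the empty string because $\epsilon \notin \{a\}$.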

Regular expressions

Patterns are often specified as regular languages. Notations used to describe a regular language (or a regular set) include both regular expressions and regular grammars.

Regular expressions (over an alphabet $\Sigma$):

1. $\epsilon$ is a RE denoting the set $\{\epsilon\}$
2. if $a \in \Sigma$, then $a$ is a RE denoting $\{a\}$
3. if $r$ and $s$ are REs, denoting $L(r)$ and $L(s)$, then:
   - $(r)$ is a RE denoting $L(r)$
   - $(r) \mid (s)$ is a RE denoting $L(r) \cup L(s)$
   - $(r)(s)$ is a RE denoting $L(r)L(s)$
   - $(r)^*$ is a RE denoting $L(r)^*$

If we adopt a precedence for operators, the extra parentheses can go away. We assume closure, then concatenation, then alternation as the order of precedence.
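
The inductive definition can also be read operationally. The sketch below is a hypothetical direct interpreter, not how scanner generators work (they compile REs into finite automata): for an RE `r` and a start position `i`, `ends` returns every position `j` such that `text[i:j]` is in $L(r)$, with one case per rule above. REs are written as nested tuples with explicit structure, so no precedence rules are needed.

```python
# One tuple constructor per rule of the definition:
#   ("eps",)       denotes {ε}
#   ("sym", a)     denotes {a}, for a single character a
#   ("alt", r, s)  denotes L(r) ∪ L(s)
#   ("cat", r, s)  denotes L(r)L(s)
#   ("star", r)    denotes L(r)*

def ends(r, text, i):
    """Return every j such that text[i:j] is in L(r)."""
    kind = r[0]
    if kind == "eps":
        return {i}
    if kind == "sym":
        return {i + 1} if text[i:i + 1] == r[1] else set()
    if kind == "alt":
        return ends(r[1], text, i) | ends(r[2], text, i)
    if kind == "cat":
        return {k for j in ends(r[1], text, i) for k in ends(r[2], text, j)}
    if kind == "star":
        # Positions reachable from i by chaining zero or more matches of r;
        # i itself covers the zero-repetition case.
        reached, todo = {i}, [i]
        while todo:
            j = todo.pop()
            for k in ends(r[1], text, j):
                if k not in reached:
                    reached.add(k)
                    todo.append(k)
        return reached
    raise ValueError(f"unknown RE node {r!r}")


def matches(r, text):
    """True if the whole of text is in L(r)."""
    return len(text) in ends(r, text, 0)


# (a | b)* a b b, with the structure spelled out explicitly.
a, b = ("sym", "a"), ("sym", "b")
ABB = ("cat", ("cat", ("cat", ("star", ("alt", a, b)), a), b), b)
```

With this definition `matches(ABB, "babb")` is true and `matches(ABB, "ab")` is false.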

RE examples

Identifiers:

- `letter → (a | b | c | … | z | A | B | C | … | Z)`
- `digit → (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)`
- `id → letter (letter | digit)*`

Numbers:

- `integer → (+ | - | ε) (0 | (1 | 2 | 3 | … | 9) digit*)`
- `decimal → integer . digit*`
- `real → (integer | decimal) E (+ | -) digit+`
- `complex → "(" real "," real ")"`

Numbers can get much more complicated.

Most programming language tokens can be described with regular expressions. We can use regular expressions to automatically build scanners.
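
As a sketch, these definitions translate almost one-for-one into Python's `re` syntax. The constant names and the `classify` helper are hypothetical; `re.fullmatch` is used so that each pattern must cover the entire lexeme.

```python
import re

LETTER = r"[A-Za-z]"
DIGIT = r"[0-9]"
ID = rf"{LETTER}(?:{LETTER}|{DIGIT})*"            # letter (letter | digit)*
INTEGER = rf"[+-]?(?:0|[1-9]{DIGIT}*)"            # (+ | - | ε) (0 | (1..9) digit*)
DECIMAL = rf"{INTEGER}\.{DIGIT}*"                 # integer . digit*
REAL = rf"(?:{INTEGER}|{DECIMAL})E[+-]{DIGIT}+"   # (integer | decimal) E (+ | -) digit+
COMPLEX = rf"\({REAL},{REAL}\)"                   # "(" real "," real ")"

CATEGORIES = [("id", ID), ("integer", INTEGER), ("decimal", DECIMAL),
              ("real", REAL), ("complex", COMPLEX)]


def classify(lexeme):
    """Return the first category whose pattern matches the whole lexeme."""
    for name, pattern in CATEGORIES:
        if re.fullmatch(pattern, lexeme):
            return name
    return None
```

Following the `real` rule as written, the exponent sign is mandatory, so `classify("1.5E+10")` is `"real"` while `classify("1.5E10")` is `None`.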

Parser

source code → scanner → tokens → parser → il (plus errors)

The parser must:

- recognize context-free syntax
- guide context-sensitive analysis
- construct il(s)
- produce meaningful error messages
- attempt error correction

Parser generators mechanize much of the work.

Grammar

Context-free syntax is specified with a grammar:

`<sheep noise> ::= baa | baa <sheep noise>`

This grammar defines the set of noises that a sheep makes under normal circumstances. The format is called Backus-Naur form (BNF).

Formally, a grammar $G = (S, N, T, P)$, where:

- $S$ is the start symbol
- $N$ is a set of non-terminal symbols
- $T$ is a set of terminal symbols
- $P$ is a set of productions or rewrite rules ($P : N \to (N \cup T)^*$)
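
A grammar this small can be recognized by a hand-written recursive-descent function with one case per alternative. In the sketch below, the input is assumed to be a list of words and the function names are hypothetical. Since the language is one or more `baa`s, trying the longer alternative first is safe.

```python
def sheep_noise(words, i=0):
    """Recognize <sheep noise> ::= baa | baa <sheep noise> starting at words[i].
    Returns the index just past the match, or None if there is no match."""
    if i < len(words) and words[i] == "baa":
        # Try baa <sheep noise> first; fall back to a single baa.
        rest = sheep_noise(words, i + 1)
        return rest if rest is not None else i + 1
    return None


def is_sheep_noise(words):
    """True if the whole word list is derivable from <sheep noise>."""
    return sheep_noise(words) == len(words)
```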

Substitution

Context-free syntax can be put to better use:

1. `<goal> ::= <expr>`
2. `<expr> ::= <expr> <op> <term>`
3. `| <term>`
4. `<term> ::= number`
5. `| id`
6. `<op> ::= +`
7. `| -`

This grammar defines simple expressions with addition and subtraction over the tokens id and number.

- $S$ = `<goal>`
- $T$ = {number, id, +, -}
- $N$ = {`<goal>`, `<expr>`, `<term>`, `<op>`}
- $P$ = {1, 2, 3, 4, 5, 6, 7}

Parse tree

Given a grammar, valid sentences can be derived by repeated substitution.

| Prod'n. | Result |
|---|---|
| | `<goal>` |
| 1 | `<expr>` |
| 2 | `<expr> <op> <term>` |
| 5 | `<expr> <op> y` |
| 7 | `<expr> - y` |
| 2 | `<expr> <op> <term> - y` |
| 4 | `<expr> <op> 2 - y` |
| 6 | `<expr> + 2 - y` |
| 3 | `<term> + 2 - y` |
| 5 | `x + 2 - y` |

To recognize a valid sentence in some CFG, we reverse this process and build up a parse.
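
Each step of this derivation rewrites the rightmost non-terminal, so it is a rightmost derivation. The sketch below replays it mechanically from the numbered productions. The function name and the `(production, lexeme)` step format are hypothetical; the lexeme stands in for the token name when productions 4 and 5 introduce `number` or `id`, as the derivation above writes `2`, `x`, and `y` directly.

```python
# The expression grammar, keyed by production number.
PRODUCTIONS = {
    1: ("<goal>", ["<expr>"]),
    2: ("<expr>", ["<expr>", "<op>", "<term>"]),
    3: ("<expr>", ["<term>"]),
    4: ("<term>", ["number"]),
    5: ("<term>", ["id"]),
    6: ("<op>", ["+"]),
    7: ("<op>", ["-"]),
}
NONTERMINALS = {"<goal>", "<expr>", "<term>", "<op>"}


def rightmost_derivation(steps, start="<goal>"):
    """Apply (production number, lexeme or None) steps, each rewriting the
    rightmost non-terminal. Returns every sentential form, starting with
    the start symbol."""
    form = [start]
    history = [" ".join(form)]
    for number, lexeme in steps:
        lhs, rhs = PRODUCTIONS[number]
        pos = max(i for i, sym in enumerate(form) if sym in NONTERMINALS)
        if form[pos] != lhs:
            raise ValueError(f"production {number} cannot rewrite {form[pos]}")
        if lexeme is not None:      # show the lexeme, e.g. "2", instead of "number"
            rhs = [lexeme]
        form = form[:pos] + rhs + form[pos + 1:]
        history.append(" ".join(form))
    return history


STEPS = [(1, None), (2, None), (5, "y"), (7, None), (2, None),
         (4, "2"), (6, None), (3, None), (5, "x")]
```

`rightmost_derivation(STEPS)` returns the ten sentential forms from `<goal>` down to `x + 2 - y`.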
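
Parse tree

A parse can be represented by a tree, called a parse tree or a syntax tree:

- `goal`
  - `expr`
    - `expr`
      - `expr`
        - `term`
          - `<id,x>`
      - `op`
        - `+`
      - `term`
        - `<num,2>`
    - `op`
      - `-`
    - `term`
      - `<id,y>`

Because the tree records every production used in the derivation, it contains a lot of information that later phases do not need.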

Abstract syntax tree

So, compilers often use an abstract syntax tree:

- `-`
  - `+`
    - `<id,x>`
    - `<num,2>`
  - `<id,y>`

This is much more concise. Abstract syntax trees (ASTs) are often used as an il between the front end and the back end.
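
A parser can build this AST directly instead of the full parse tree. The sketch below is a hypothetical hand-written parser for the expression grammar; tokens are assumed to be `(kind, lexeme)` pairs with kinds `"id"`, `"num"`, and `"op"`. Because `<expr> ::= <expr> <op> <term>` is left-recursive, a naive recursive-descent function would never terminate, so the rule is rewritten as a loop that folds to the left and keeps `+` and `-` left-associative.

```python
def parse_expr(tokens):
    """Parse <expr> from a list of (kind, lexeme) tokens and return an AST.

    Leaves are the tokens themselves, e.g. ("id", "x"); interior nodes are
    (op, left, right)."""
    pos = 0

    def term():
        # <term> ::= number | id
        nonlocal pos
        if pos < len(tokens) and tokens[pos][0] in ("id", "num"):
            pos += 1
            return tokens[pos - 1]
        raise SyntaxError(f"expected id or number at token {pos}")

    # <expr> ::= <term> (<op> <term>)*, so x + 2 - y groups as (x + 2) - y.
    tree = term()
    while pos < len(tokens) and tokens[pos] in (("op", "+"), ("op", "-")):
        op = tokens[pos][1]
        pos += 1
        tree = (op, tree, term())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

For the tokens of `x + 2 - y`, the result is `("-", ("+", ("id", "x"), ("num", "2")), ("id", "y"))`, which has the same shape as the AST above.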

Back end

il → instruction selection → register allocation → machine code

Responsibilities:

- translate il into target machine code
- choose instructions for each il operation
- decide what to keep in registers at each point
- ensure conformance with system interfaces

Automation has been less successful here.

Instruction selection

il → instruction selection → register allocation → machine code

Instruction selection:

- produce compact, fast code
- use available addressing modes
- pattern matching problem
  - ad hoc techniques
  - tree pattern matching
  - string pattern matching
  - dynamic programming
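
As an illustration of the ad hoc end of that spectrum, the sketch below walks an AST bottom-up and emits one instruction per node into fresh virtual registers, with a single hand-written pattern: when the right operand is a constant, it uses an immediate-operand form instead of loading the constant first. The mnemonics (`load`, `loadI`, `add`, `addI`, `sub`, `subI`) and the AST tuple format are assumptions for illustration, not a real target's instruction set.

```python
from itertools import count

OPCODES = {"+": "add", "-": "sub"}


def select(tree):
    """Emit instructions for an AST whose leaves are ("id", name) or
    ("num", literal) and whose interior nodes are (op, left, right)."""
    code = []
    fresh = count(1)                    # virtual registers r1, r2, ...

    def walk(node):
        kind = node[0]
        if kind == "id":
            reg = f"r{next(fresh)}"
            code.append(f"load {node[1]} => {reg}")
            return reg
        if kind == "num":
            reg = f"r{next(fresh)}"
            code.append(f"loadI {node[1]} => {reg}")
            return reg
        op, left, right = node
        left_reg = walk(left)
        if right[0] == "num":
            # Pattern (op, subtree, constant): use the immediate form.
            reg = f"r{next(fresh)}"
            code.append(f"{OPCODES[op]}I {left_reg}, {right[1]} => {reg}")
            return reg
        right_reg = walk(right)
        reg = f"r{next(fresh)}"
        code.append(f"{OPCODES[op]} {left_reg}, {right_reg} => {reg}")
        return reg

    walk(tree)
    return code
```

For the AST of `x + 2 - y`, this produces `load x => r1`, `addI r1, 2 => r2`, `load y => r3`, `sub r2, r3 => r4`. Tree pattern matching and dynamic programming generalize this idea to many overlapping patterns with costs.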

Register allocation

il → instruction selection → register allocation → machine code

Register allocation:

- have value in a register when used
- limited resources
- changes instruction choices
- can move loads and stores
- optimal allocation is difficult ⇒ NP-complete for 1 or k registers

Modern allocators often use an analogy to graph coloring.
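
The graph-coloring analogy can be sketched as follows: values that are live at the same time interfere and must get different registers, so assigning k registers is coloring the interference graph with k colors. The function below is a simplified, hypothetical version of the Chaitin-style simplify/select heuristic with Briggs-style optimistic spilling; it assumes the interference graph is already built and does not insert spill code or iterate, as a real allocator would. Being a heuristic, it is not optimal.

```python
def color(graph, k):
    """Assign each node of an interference graph one of k registers (0..k-1).

    graph maps each live range to the set of live ranges it interferes
    with; the relation must be symmetric. Returns (assignment, spilled)."""
    remaining = {n: set(adj) for n, adj in graph.items()}
    stack = []
    # Simplify: remove nodes one at a time, remembering the order.
    while remaining:
        # A node with fewer than k neighbors can always be colored later.
        node = next((n for n in remaining if len(remaining[n]) < k), None)
        if node is None:
            # Every node has >= k neighbors: optimistically remove the most
            # constrained one; it may still find a free color during select.
            node = max(remaining, key=lambda n: len(remaining[n]))
        stack.append(node)
        for m in remaining.pop(node):
            remaining[m].discard(node)
    # Select: color nodes in reverse removal order.
    assignment, spilled = {}, []
    while stack:
        node = stack.pop()
        used = {assignment[m] for m in graph[node] if m in assignment}
        free = [c for c in range(k) if c not in used]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)        # would need loads and stores
    return assignment, spilled
```

A 4-cycle `a-b-c-d-a` with k = 2 shows why the optimistic step helps: every node has two neighbors, yet the graph is 2-colorable, and this sketch colors it without spilling.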

Traditional three-pass compiler

source code → front end → middle end → back end → machine code (plus errors)

Code improvement:

- analyzes and changes il
- goal is to reduce runtime
- must preserve the values the program computes

Optimizer (middle end)

il → opt$_1$ → … → opt$_n$ → il (plus errors)

Modern optimizers are usually built as a set of passes. Typical passes:

- discover & propagate constant values
- reduction of operator strength
- common subexpression elimination
- redundant computation elimination
- encode an idiom in some powerful instruction
- move computation to a less frequently executed place (e.g., out of loops)
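
As a sketch of the first pass in that list, the function below performs constant propagation and folding over a single basic block of three-address code. The IL format (`(dest, op, a, b)` tuples with operands that are variable names or integer constants) and the function name are assumptions for illustration; a real pass would work across the control-flow graph, and it would typically be followed by dead-code elimination to remove copies nobody uses.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def propagate_constants(block):
    """Constant propagation and folding over one basic block.

    block is a list of (dest, op, a, b) instructions, where op is "+", "-",
    "*" or "copy" (b unused), and operands are variable names (str) or
    integer constants. Returns the rewritten list."""
    known = {}   # variable -> constant value at the current instruction
    out = []
    for dest, op, a, b in block:
        # Propagate: replace operands whose values are known constants.
        a = known.get(a, a)
        b = known.get(b, b)
        # Fold: an arithmetic op on two constants becomes a constant copy.
        if op != "copy" and isinstance(a, int) and isinstance(b, int):
            op, a, b = "copy", OPS[op](a, b), None
        # Record or forget dest's constant value after this instruction.
        if op == "copy" and isinstance(a, int):
            known[dest] = a
        else:
            known.pop(dest, None)
        out.append((dest, op, a, b))
    return out
```

For example, the block `a = 4; b = a + 3; c = b * x; d = b - a` becomes `a = 4; b = 7; c = 7 * x; d = 3`.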