COMP 144 Programming Language Concepts, Lecture 3: Lexical Analysis, Spring 2002, Felix Hernandez-Campos

Post on 19-Dec-2015


TRANSCRIPT

<ul><li> Slide 1 </li>
<li> Lecture 3: Lexical Analysis. COMP 144 Programming Language Concepts, Spring 2002. Felix Hernandez-Campos, Jan 14. The University of North Carolina at Chapel Hill. </li>
<li> Slide 2 </li>
<li> Phases of Compilation </li>
<li> Slide 3 </li>
<li> Specification of Programming Languages. PLs require precise definitions (i.e. no ambiguity) of both language form (syntax) and language meaning (semantics). Consequently, PLs are specified using formal notation: formal syntax (tokens and a grammar) and formal semantics. </li>
<li> Slide 4 </li>
<li> Phases of Compilation </li>
<li> Slide 5 </li>
<li> Scanner. Main task: identify tokens, the basic building blocks of programs (e.g. keywords, identifiers, numbers, punctuation marks). Desk calculator language example, one statement per line: "read A", "sum := A + 3.45e-3", "write sum", "write sum / 2". </li>
<li> Slide 6 </li>
<li> Formal definition of tokens. A set of tokens is a set of strings over an alphabet, e.g. {read, write, +, -, *, /, :=, 1, 2, …, 10, …, 3.45e-3, …}. A set of tokens is a regular set that can be defined by comprehension using a regular expression. For every regular set, there is a deterministic finite automaton (DFA) that can recognize it, i.e.
determine whether a string belongs to the set or not. Scanners extract tokens from source code in the same way DFAs determine membership. </li>
<li> Slide 7 </li>
<li> Regular Expressions. A regular expression (RE) is: a single character; the empty string, ε; the concatenation of two regular expressions, written RE1 RE2 (i.e. RE1 followed by RE2); the union of two regular expressions, written RE1 | RE2; or the closure of a regular expression, written RE*. The * is known as the Kleene star; RE* represents the concatenation of 0 or more strings, each matching RE. </li>
<li> Slide 8 </li>
<li> Token Definition Example. Numeric literals in Pascal: definition of the token unsigned_number.
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
unsigned_integer → digit digit*
unsigned_number → unsigned_integer ( ( . unsigned_integer ) | ε ) ( ( e ( + | - | ε ) unsigned_integer ) | ε )
Recursion is not allowed! Notice the use of parentheses to avoid ambiguity. </li>
<li> Slide 9 </li>
<li> Scanning. Pascal scanner pseudo-code. </li>
<li> Slide 10 </li>
<li> DFAs. Scanners are deterministic finite automata (DFAs), with some hacks. </li>
<li> Slide 11 </li>
<li> Difficulties. Keywords and variable names. Look-ahead: Pascal's ranges, [1..10]; FORTRAN's example, where DO 5 I=1,25 =&gt; loop 25 times up to label 5, but DO 5 I=1.25 =&gt; assign 1.25 to DO5I; NASA's Mariner 1 (apocryphal?).
Pragmas: significant comments (compiler options). </li>
<li> Slide 12 </li>
<li> Outline of the Scanner </li>
<li> Slide 13 </li>
<li> Scanner Generators. Scanner generators, e.g. lex and flex: these programs take a table of token definitions (regular expressions) as their input and return a program (i.e. a scanner) that can extract tokens from a stream of characters. </li>
<li> Slide 14 </li>
<li> Table-driven scanner. Lexical errors. </li>
<li> Slide 15 </li>
<li> Scanners and String Processing. Scanning is a common task in programming: string processing, e.g. reading configuration files, processing log files, … StringTokenizer and StreamTokenizer in Java: http://java.sun.com/products/jdk/1.2/docs/api/java/util/StringTokenizer.html and http://java.sun.com/products/jdk/1.2/docs/api/java/io/StreamTokenizer.html. Regular expressions in Python and other scripting languages. </li>
<li> Slide 16 </li>
<li> Reading Assignment. Scott's Chapter 2: Introduction, Section 2.1.1, Section 2.2.1. </li> </ul>