Syntax (Pre Lecture)
Dr. Neil T. Dantam
CSCI-400, Colorado School of Mines
Spring 2020
Dantam (Mines CSCI-400) Syntax (Pre Lecture) Spring 2020 1 / 29
Introduction
- Syntax: what programs we can write / what the language “looks like”
- Semantics: what these programs mean / what the language does (more later in the course)
- Concrete syntax: human-readable
- Abstract syntax: encoded for use by the interpreter/compiler
- Formal language: mathematical basis to represent and analyze syntax
Outcomes
- Know basic definitions of formal language theory
- Design grammars for common programming language constructs
Outline
- Formal Language Theory
- Grammars
  - Definition
  - Grammars for the Functional Programs
  - Ambiguity and Precedence
  - Abstract Syntax
Formal Language Theory
Why formal language?
Overview
- Some program text is “valid”
- And some is “invalid”
- Formal language lets us:
  - Precisely define the program text that is valid/invalid
  - Automatically recognize (parse) program text
  - (Also, profound implications on what computers can do (CSCI-561))
Example
Valid
- if true then false else true
- 1 + 2 * 3
Invalid
- if true else then false true
- 1 + * 3
Sets
Definition (Set)
An unordered collection of objects without repetition.
Notation
- $S = \{s_0, s_1, s_2, \ldots, s_n\}$
- Empty set: $\emptyset$
- Set membership: $x \in S$ (x in S), $x \notin S$ (x not in S)
Example
- $S = \{1, 2, 3\}$
- Common sets:
  - Integers: $\mathbb{Z}$
  - Real numbers: $\mathbb{R}$
  - Real vectors: $\mathbb{R}^n$
  - Booleans: $\mathbb{B}$
- $2 \in \mathbb{Z}$
- $\pi \notin \mathbb{Z}$
Sequences
Definition (Sequence)
An ordered list of objects.
Example
(1, 2, 3, 5, 8, ...)
Definition (Tuple)
A sequence of finite length.
- k-tuple: A tuple of length k
- pair: A 2-tuple
Example
- 3-tuple: (2, 4, 8)
- pair: (a, b)
Strings
Definition (Symbol)
An abstract, primitive, atomic “thing”
Example (Symbols)
0, 1, a, x, foo, bar, +, -, if, match
Definition (Alphabet)
A non-empty, finite set of symbols
Example (Alphabets)
- $\Sigma_B = \{0, 1\}$
- $\Sigma_E = \{a, b, c, d\}$
- $\Sigma_C = \{\text{if}, \text{match}, \text{case}, +, -\}$
Definition (String)
A sequence over some alphabet
Example (Strings)
- $\Gamma_B = (1, 0, 1, 0, 1, 0)$
- $\Gamma_E = (h, e, l, l, o)$
- $\Gamma_C = (3, +, x)$
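The definitions above can be sketched in Python. This is only an illustration, assuming an alphabet is modeled as a frozenset and a string as a tuple of symbols; the helper name `is_string_over` is hypothetical.

```python
# Alphabet: a non-empty, finite set of symbols (modeled here as a frozenset).
SIGMA_B = frozenset({0, 1})

# String: a sequence over an alphabet (modeled here as a tuple).
GAMMA_B = (1, 0, 1, 0, 1, 0)


def is_string_over(alphabet, candidate):
    """Return True if every element of `candidate` is a symbol of `alphabet`."""
    return all(symbol in alphabet for symbol in candidate)


print(is_string_over(SIGMA_B, GAMMA_B))    # True
print(is_string_over(SIGMA_B, (1, 2, 0)))  # False: 2 is not in the alphabet
```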
Formal Languages
Definition (Formal Language)
A language is a set of strings.
Representation
- How would you represent:
  - The language (set) of arithmetic expressions?
  - The language (set) of well-formed XML documents?
  - The language (set) of valid variable names in C?
  - The language (set) of C programs?
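One possible representation for the simplest of these languages is a membership test. The sketch below assumes a simplified rule for C variable names (a letter or underscore, then letters, digits, or underscores) and deliberately ignores reserved keywords; the names `IDENTIFIER` and `in_language` are hypothetical.

```python
import re

# Assumption: an identifier is a letter or underscore followed by letters,
# digits, or underscores. Reserved keywords are deliberately not excluded.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def in_language(text):
    """Membership test: is `text` a string in the identifier language?"""
    return IDENTIFIER.fullmatch(text) is not None


print(in_language("foo_1"))  # True
print(in_language("1foo"))   # False: starts with a digit
```

A regular expression suffices here, but not for arithmetic expressions with nested parentheses or for whole C programs, which is one motivation for the grammars that follow.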
Grammars Definition
Overview of Grammars
Example (Conditional Expression)
if e1 then e2 else e3
A conditional consists of the following sequence:
1. the keyword “if”
2. an expression
3. the keyword “then”
4. an expression
5. the keyword “else”
6. an expression
Overview
- Programs are written as text
- There is a structure to the program
- Grammars represent this structure
Grammar
cond → “if” exp “then” exp “else” exp
Terminal and Nonterminal Symbols
Terminals and Nonterminals
Terminals: The alphabet of the language. Atomic.
Nonterminals: Decompose into multiple terminals and nonterminals. Non-atomic.
Example
Grammar
cond → “if” exp “then” exp “else” exp
exp → “true” | “false”
- Terminals: if, then, else, true, false
- Nonterminals: cond, exp
Phases of Interpretation/Compilation
- Analysis (front-end):
  - Lexing: convert text to terminals
  - Syntax: convert terminals to syntax tree
  - Semantics: check types
- Synthesis (back-end):
  - Compiler: Construct machine code
  - Interpreter: Execute the program
[Figure: text → Lexical Analysis → terminal sequence → Syntax Analysis → abstract syntax tree → Semantic Analysis → annotated syntax tree → backend → machine code]
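The lexing phase can be sketched for the conditional grammar used in these slides. This is a minimal illustration, assuming terminals are separated by whitespace; the names `TERMINALS` and `lex` are hypothetical, and real lexers are usually driven by regular expressions.

```python
# Terminal set of the conditional grammar from the earlier slides.
TERMINALS = {"if", "then", "else", "true", "false"}


def lex(text):
    """Split program text on whitespace and check each word is a terminal."""
    tokens = text.split()
    for token in tokens:
        if token not in TERMINALS:
            raise ValueError(f"unknown terminal: {token!r}")
    return tokens


print(lex("if true then false else true"))
# ['if', 'true', 'then', 'false', 'else', 'true']
```

The resulting terminal sequence is the input to syntax analysis.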
“Compiler Compilers”
- Generate code from a description of a formal language:
  - Lexer: Regular Expressions
  - Parser: Grammar
- Examples:
  - Lexer Generators:
    - Lex / Flex
    - Ragel
  - Parser Generators:
    - YACC / Bison
  - Combined Generators:
    - JavaCC
    - ANTLR
[Figure: Regular Expressions → Lexer Generator → Lexer (Lexical Analysis); Grammar → Parser Generator → Parser (Syntax Analysis)]
Context-Free Grammars
Definition
A context-free grammar $G$ is the tuple $G = (V, T, P, S)$, where:
- $V$ is a finite set of nonterminals
- $T$ is a finite set of terminals
- $P$ is a finite set of productions of the form $A \to X_1 X_2 \ldots X_n$, where $A \in V$ and each $X_i \in V \cup T$
- $S \in V$ is the start symbol
Example
Grammar
cond → “if” exp “then” exp “else” exp
exp → “true” | “false”
Elements
- V = {cond, exp}
- T = {if, then, else, true, false}
- P = {cond → “if” exp “then” exp “else” exp, exp → “true”, exp → “false”}
- S = cond
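The tuple (V, T, P, S) maps directly onto a small data structure. The sketch below is one possible encoding, not a standard API: the `Grammar` class, its field names, and `is_well_formed` are hypothetical, and the disjointness check on V and T is a conventional side condition that the definition above does not state explicitly.

```python
from typing import NamedTuple


class Grammar(NamedTuple):
    nonterminals: frozenset  # V
    terminals: frozenset     # T
    productions: tuple       # P: pairs (head, body), body is a tuple of symbols
    start: str               # S


G = Grammar(
    nonterminals=frozenset({"cond", "exp"}),
    terminals=frozenset({"if", "then", "else", "true", "false"}),
    productions=(
        ("cond", ("if", "exp", "then", "exp", "else", "exp")),
        ("exp", ("true",)),
        ("exp", ("false",)),
    ),
    start="cond",
)


def is_well_formed(g):
    """Check the side conditions of the context-free grammar definition."""
    symbols = g.nonterminals | g.terminals
    return (
        g.start in g.nonterminals
        # Assumption: V and T share no symbols (conventional, not on the slide).
        and g.nonterminals.isdisjoint(g.terminals)
        and all(
            head in g.nonterminals and all(x in symbols for x in body)
            for head, body in g.productions
        )
    )


print(is_well_formed(G))  # True
```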
What’s “context-free?”
$\underbrace{v}_{\text{left-hand side}} \to \underbrace{x_0 \; x_1 \ldots x_n}_{\text{right-hand side}}$
The left-hand side is a single nonterminal with no surrounding symbols.
Nonterminals’ expansion is independent of surrounding context
What’s not “context-free?”
Non-context-free languages
- Most programming language syntax is context-free or nearly so
- C/C++ are almost context-free
- In practice: integrate parsing and lexing to distinguish type and variable names
Counterexample (C/C++)
/* Context:
 * Is x a type or variable?
 */
x * y;        // declaration or
              // multiplication?
f((x) * y);   // multiplication or
              // deref. and cast?
Backus-Naur Form (BNF)
Example
LaTeX
〈cond〉 → “if” 〈exp〉 “then” 〈exp〉 “else” 〈exp〉
〈exp〉 → “true” | “false”
Plain Text
<cond> ::= "if" <exp> "then" <exp> "else" <exp>
<exp> ::= "true" | "false"
Parse Tree
Parse Trees
- Leaves: Terminals
- Nodes: Nonterminals
- Edges: Productions
Text
if true then
false
else
true
Grammar
cond → “if” exp “then” exp “else” exp
exp → “true” | “false”
Parse Tree
cond
  if
  exp
    true
  then
  exp
    false
  else
  exp
    true
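A parse tree like this one can be built by a small recursive-descent parser with one function per nonterminal. This is a minimal sketch under stated assumptions: the input is already a list of terminals, trees are nested `(nonterminal, children)` tuples, and the function names are hypothetical.

```python
def parse_exp(tokens, pos):
    """exp -> "true" | "false". Returns (tree, next position)."""
    if pos < len(tokens) and tokens[pos] in ("true", "false"):
        return ("exp", [tokens[pos]]), pos + 1
    raise SyntaxError(f"expected true or false at token {pos}")


def expect(tokens, pos, keyword):
    """Consume one keyword terminal and return the next position."""
    if pos < len(tokens) and tokens[pos] == keyword:
        return pos + 1
    raise SyntaxError(f"expected {keyword!r} at token {pos}")


def parse_cond(tokens):
    """cond -> "if" exp "then" exp "else" exp. Returns the parse tree."""
    pos = expect(tokens, 0, "if")
    test, pos = parse_exp(tokens, pos)
    pos = expect(tokens, pos, "then")
    then_branch, pos = parse_exp(tokens, pos)
    pos = expect(tokens, pos, "else")
    else_branch, pos = parse_exp(tokens, pos)
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return ("cond", ["if", test, "then", then_branch, "else", else_branch])


tree = parse_cond("if true then false else true".split())
print(tree)
```

Leaves of the returned tuple are terminals and inner nodes are nonterminals, matching the tree above. The invalid text `if true else then false true` raises `SyntaxError` at the third token.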
Grammars Grammars for the Functional Programs
Example: Lambda Calculus Grammar
Grammar
〈exp〉 → 〈sym〉
      | “λ” 〈sym〉 “.” 〈exp〉
      | 〈exp〉 〈exp〉
      | “(” 〈exp〉 “)”
〈sym〉 → “a” | “b” | “c” | ...
Parse Tree
(λa . a) b
〈exp〉
  〈exp〉
    “(”
    〈exp〉
      “λ”
      〈sym〉
        “a”
      “.”
      〈exp〉
        〈sym〉
          “a”
    “)”
  〈exp〉
    〈sym〉
      “b”
Exercise: Lambda Calculus Grammar
Grammar
〈exp〉 → 〈sym〉
      | “λ” 〈sym〉 “.” 〈exp〉
      | 〈exp〉 〈exp〉
      | “(” 〈exp〉 “)”
〈sym〉 → “a” | “b” | “c” | ...
Parse Tree
a λb . b c
〈exp〉
  〈exp〉
    〈sym〉
      “a”
  〈exp〉
    “λ”
    〈sym〉
      “b”
    “.”
    〈exp〉
      〈exp〉
        〈sym〉
          “b”
      〈exp〉
        〈sym〉
          “c”
Exercise: Let Expression
Grammar
〈exp〉 → 〈sym〉
      | “λ” 〈sym〉 “.” 〈exp〉
      | 〈exp〉 〈exp〉
      | “(” 〈exp〉 “)”
      | “let” 〈sym〉 “←” 〈exp〉 “in” 〈exp〉
〈sym〉 → “a” | “b” | “c” | ...
Parse Tree
let x ← f y in (x z)
〈exp〉
  “let”
  〈sym〉
    “x”
  “←”
  〈exp〉
    〈exp〉
      〈sym〉
        “f”
    〈exp〉
      〈sym〉
        “y”
  “in”
  〈exp〉
    “(”
    〈exp〉
      〈exp〉
        〈sym〉
          “x”
      〈exp〉
        〈sym〉
          “z”
    “)”
Grammars Ambiguity and Precedence
Exercise: Arithmetic
Grammar
〈e〉 → 〈e〉 “+” 〈e〉
    | 〈e〉 “∗” 〈e〉
    | “1” | “2” | “3” | ...
Parse Trees
1 + 2 ∗ 3
Tree (a), grouping 1 + (2 ∗ 3):
〈e〉
  〈e〉
    “1”
  “+”
  〈e〉
    〈e〉
      “2”
    “∗”
    〈e〉
      “3”
Tree (b), grouping (1 + 2) ∗ 3:
〈e〉
  〈e〉
    〈e〉
      “1”
    “+”
    〈e〉
      “2”
  “∗”
  〈e〉
    “3”
Ambiguous: multiple valid parse trees
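The ambiguity can be checked mechanically by counting parse trees. The sketch below is an illustration only, assuming whitespace-separated tokens and decimal number terminals; `count_parse_trees` is a hypothetical helper that tries every operator as the root production and multiplies the counts for the two sides.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under e -> e "+" e | e "*" e | number."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from e.
        if hi - lo == 1:
            return 1 if tokens[lo].isdigit() else 0
        total = 0
        # Try every operator position as the root production.
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("+", "*"):
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))


print(count_parse_trees("1 + 2 * 3".split()))  # 2
```

A count above 1 means the string has more than one parse tree under this grammar; a count of 0 means the string is not in the language.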
Handling Precedence: Modify Grammar
Modify Grammar
〈e〉 → 〈t〉 | 〈e〉 “+” 〈t〉
〈t〉 → 〈n〉 | 〈t〉 “∗” 〈n〉
〈n〉 → “1” | “2” | “3” | ...
Parse Tree
1 + 2 ∗ 3
〈e〉
  〈e〉
    〈t〉
      〈n〉
        “1”
  “+”
  〈t〉
    〈t〉
      〈n〉
        “2”
    “∗”
    〈n〉
      “3”
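A hand-written parser for the modified grammar might look like the sketch below. It is an illustration under stated assumptions: tokens are whitespace-separated, the result is a nested tuple `(operator, left, right)` rather than a full parse tree, and the left-recursive rules are replaced by loops, since plain recursive descent cannot follow left recursion directly. The loops produce the same left-leaning grouping.

```python
def parse_e(tokens):
    """Parse tokens with e -> t | e "+" t and t -> n | t "*" n.

    Left recursion is handled with loops. Returns nested tuples
    (operator, left, right), with plain ints at the leaves.
    """
    pos = 0

    def parse_n():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError(f"expected a number at token {pos}")
        value = int(tokens[pos])
        pos += 1
        return value

    def parse_t():
        nonlocal pos
        left = parse_n()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            left = ("*", left, parse_n())
        return left

    left = parse_t()
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        left = ("+", left, parse_t())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at {pos}")
    return left


print(parse_e("1 + 2 * 3".split()))  # ('+', 1, ('*', 2, 3))
```

Because multiplication lives one level deeper in the grammar, only the grouping 1 + (2 ∗ 3) can be produced.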
Handling Precedence: Parser-specific
Bison Grammar
expr: expr '+' expr
    | expr '-' expr
    | expr '*' expr
    | expr '/' expr
    | num;
Bison Precedence
%left '+' '-'
%left '*' '/'
Directs (some) parsing algorithms to resolve ambiguity
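In Bison, operators declared on later lines bind tighter, so `'*'` and `'/'` take precedence over `'+'` and `'-'`. The same declarative idea can be sketched with precedence climbing, which is a different algorithm from the one Bison generates; the table `PRECEDENCE` and the function `parse_expr` are hypothetical names for this illustration.

```python
# Assumed table mirroring the two precedence lines: a higher number binds
# tighter, and every operator is treated as left-associative.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def parse_expr(tokens, pos=0, min_prec=1):
    """Precedence climbing. Returns (tree, next position)."""
    if pos >= len(tokens) or not tokens[pos].isdigit():
        raise SyntaxError(f"expected a number at token {pos}")
    left = int(tokens[pos])
    pos += 1
    while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
        op = tokens[pos]
        # Left associativity: the right operand may only contain operators
        # that bind strictly tighter than `op`.
        right, pos = parse_expr(tokens, pos + 1, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left, pos


tree, _ = parse_expr("1 - 2 - 3 * 4".split())
print(tree)  # ('-', ('-', 1, 2), ('*', 3, 4))
```

The grammar itself stays ambiguous; the table alone decides which grouping is built. A caller should also check that the returned position equals the number of tokens, since this sketch stops at the first token it cannot use.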
Grammars Abstract Syntax
Abstract Syntax
Overview
- Data structure encoding the program for compiler or interpreter
- Ambiguity resolved in the parser, not stored in abstract syntax
- Abstract Syntax Tree (AST): Use algebraic data types
Conditional Type
type Exp ←
  | TrueExp
  | FalseExp
  | CondExp of Exp × Exp × Exp
Example
if true then false else true
CondExp
  TrueExp
  FalseExp
  TrueExp
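In a language without built-in algebraic data types, the conditional type can be approximated with one class per constructor. The Python sketch below is only an approximation; the base class and the field names `test`, `then_branch`, and `else_branch` are assumptions, since the type above names no fields.

```python
from dataclasses import dataclass


class Exp:
    """Base class standing in for the algebraic data type Exp."""


@dataclass(frozen=True)
class TrueExp(Exp):
    pass


@dataclass(frozen=True)
class FalseExp(Exp):
    pass


@dataclass(frozen=True)
class CondExp(Exp):
    test: Exp
    then_branch: Exp
    else_branch: Exp


# AST for: if true then false else true
ast = CondExp(TrueExp(), FalseExp(), TrueExp())
print(ast)
```

The keywords “if”, “then”, and “else” do not appear in the AST; only the three subexpressions are stored.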
Abstract Syntax Tree vs. Parse Tree
Parse Tree
Directly maps to concrete syntax and grammar.
〈e〉
  〈e〉
    〈t〉
      〈n〉
        “1”
  “+”
  〈t〉
    〈t〉
      〈n〉
        “2”
    “∗”
    〈n〉
      “3”
Abstract Syntax Tree
Abstracts away precedence, parentheses, etc.
type Exp ←
  | NumExp of int
  | AddExp of Exp × Exp
  | MulExp of Exp × Exp
AddExp
  NumExp
    1
  MulExp
    NumExp
      2
    NumExp
      3
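Once precedence is resolved, later phases only need the tree shape. The sketch below encodes the arithmetic type as Python dataclasses and walks the AST; the field names and the `evaluate` function are assumptions added for illustration and go slightly beyond syntax into what the program means.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumExp:
    value: int


@dataclass(frozen=True)
class AddExp:
    left: object
    right: object


@dataclass(frozen=True)
class MulExp:
    left: object
    right: object


def evaluate(exp):
    """Walk the AST; grouping comes from the tree shape, not from precedence."""
    if isinstance(exp, NumExp):
        return exp.value
    if isinstance(exp, AddExp):
        return evaluate(exp.left) + evaluate(exp.right)
    if isinstance(exp, MulExp):
        return evaluate(exp.left) * evaluate(exp.right)
    raise TypeError(f"not an Exp: {exp!r}")


# AST for 1 + 2 * 3
ast = AddExp(NumExp(1), MulExp(NumExp(2), NumExp(3)))
print(evaluate(ast))  # 7
```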
Exercise: Lambda Calculus Abstract Syntax
Data Type
type Exp ←
  | SymExp of string
  | LambdaExp of string × Exp
  | CallExp of Exp × Exp
AST
a λb . b c
CallExp
  SymExp
    a
  LambdaExp
    SymExp
      b
    CallExp
      SymExp
        b
      SymExp
        c
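The same AST can be written down as Python dataclasses. This is a sketch with assumed field names; it follows the Data Type in storing the bound symbol of `LambdaExp` as a plain string, and the `show` helper is a hypothetical addition that prints the grouping explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SymExp:
    name: str


@dataclass(frozen=True)
class LambdaExp:
    param: str    # bound symbol, a plain string as in the Data Type
    body: object


@dataclass(frozen=True)
class CallExp:
    function: object
    argument: object


def show(exp):
    """Render an AST back to fully parenthesized concrete syntax."""
    if isinstance(exp, SymExp):
        return exp.name
    if isinstance(exp, LambdaExp):
        return f"(λ{exp.param} . {show(exp.body)})"
    if isinstance(exp, CallExp):
        return f"({show(exp.function)} {show(exp.argument)})"
    raise TypeError(f"not an Exp: {exp!r}")


# AST for: a λb . b c
ast = CallExp(SymExp("a"), LambdaExp("b", CallExp(SymExp("b"), SymExp("c"))))
print(show(ast))  # (a (λb . (b c)))
```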
Summary
- Formal languages: underlying theory for lexical and syntax analysis
- Grammars: representation for programming language syntax
- Abstract Syntax: the important structure of the program
References
- Hennessy. The Semantics of Programming Languages.
  - Ch 1.2 Concrete and Abstract Syntax
- Clarkson. https://www.cs.cornell.edu/courses/cs3110/2019fa/textbook/
  - Ch 10.1 Lexing and Parsing
- Alt. Textbook: Aho, Lam, Sethi, and Ullman. Compilers: Principles, Techniques, & Tools.
  - Ch 4 Syntax Analysis