Syntax (Pre Lecture)
Dr. Neil T. Dantam
CSCI-400, Colorado School of Mines
Spring 2020

Introduction

- Syntax: what programs we can write / what the language “looks like”
- Semantics: what these programs mean / what the language does (more later in the course)
- Concrete syntax: human-readable
- Abstract syntax: encoded for use by interpreter/compiler
- Formal language: mathematical basis to represent and analyze syntax

Outcomes

- Know basic definitions of formal language theory
- Design grammars for common programming language constructs

Outline

- Formal Language Theory
- Grammars
  - Definition
  - Grammars for the Functional Programs
  - Ambiguity and Precedence
  - Abstract Syntax

Formal Language Theory

Why formal language?

Overview

- Some program text is “valid”
- And some is “invalid”
- Formal language lets us:
  - Precisely define the program text that is valid/invalid
  - Automatically recognize (parse) program text
  - (Also, profound implications on what computers can do (CSCI-561))

Valid:

- if true then false else true
- 1 + 2 * 3

Invalid:

- if true else then false true
- 1 + * 3

Definition (Set)

An unordered collection of objects without repetition.

Notation

- S = {s0, s1, s2, ..., sn}
- Empty set: ∅
- Set membership: x ∈ S (x in S), x ∉ S (x not in S)

Example

- S = {1, 2, 3}
- Common sets:
  - Integers: ℤ
  - Real numbers: ℝ
  - Real vectors: ℝ^n
  - Booleans: 𝔹
- 2 ∈ ℤ
- π ∉ ℤ

Definition (Sequence)

An ordered list of objects.

Example

(1, 2, 3, 5, 8, ...)

Definition (Tuple)

A sequence of finite length.

- k-tuple: A tuple of length k
- pair: A 2-tuple

Example

- 3-tuple: (2, 4, 8)
- pair: (a, b)

Definition (Symbol)

An abstract, primitive, atomic “thing”

Example (Symbols)

0, 1, a, x, foo, bar, +, -, if, match

Definition (Alphabet)

A non-empty, finite set of symbols

Example (Alphabets)

- Σ_B = {0, 1}
- Σ_E = {a, b, c, d}
- Σ_C = {if, match, case, +, −}

Definition (String)

A sequence over some alphabet

Example (Strings)

- Γ_B = (1, 0, 1, 0, 1, 0)
- Γ_E = (h, e, l, l, o)
- Γ_C = (3, +, x)
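The definitions of alphabet and string can be sketched in a few lines of code. Python is an assumption here (the slides do not prescribe a language), and the helper name is_string_over is hypothetical. The sketch checks whether a sequence is a string over a given alphabet, that is, whether every element of the sequence is a symbol of that alphabet.

```python
# An alphabet is a non-empty, finite set of symbols.
SIGMA_B = frozenset({"0", "1"})
SIGMA_C = frozenset({"if", "match", "case", "+", "-"})


def is_string_over(string, alphabet):
    """Return True if every symbol of the sequence belongs to the alphabet."""
    return all(symbol in alphabet for symbol in string)


gamma_b = ("1", "0", "1", "0", "1", "0")   # a string over SIGMA_B

print(is_string_over(gamma_b, SIGMA_B))              # True
print(is_string_over(("1", "2"), SIGMA_B))           # False: "2" is not a symbol of SIGMA_B
print(is_string_over(("if", "+", "case"), SIGMA_C))  # True
```

Note that a symbol need not be a single character: "if" and "match" are each one atomic symbol of SIGMA_C.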

Formal Languages

Definition (Formal Language)

A language is a set of strings.


- How would you represent:
  - The language (set) of arithmetic expressions?
  - The language (set) of well-formed XML documents?
  - The language (set) of valid variable names in C?
  - The language (set) of C programs?
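One possible way to approach the variable-name question, sketched in Python: since the set of strings is infinite, the language can be represented by a membership test rather than by listing its elements. The regular expression and the abbreviated keyword list are simplifying assumptions for illustration (for example, they ignore universal character names), and the function name in_language is hypothetical.

```python
import re

# Assumed simplification: a C variable name is a letter or underscore
# followed by any number of letters, digits, or underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Abbreviated, illustrative subset of the C keywords (not the full list).
KEYWORDS = frozenset({"if", "else", "while", "for", "return", "int", "char"})


def in_language(text):
    """Membership test for the (simplified) language of C variable names."""
    return IDENTIFIER.fullmatch(text) is not None and text not in KEYWORDS


print(in_language("count_1"))   # True
print(in_language("1count"))    # False: starts with a digit
print(in_language("while"))     # False: reserved keyword
```

A regular expression suffices for this language; nested constructs such as arithmetic expressions call for the grammars introduced next.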

Grammars Definition

Overview of Grammars

Example (Conditional Expression)

if e1 then e2 else e3

A conditional consists of the following sequence:

1. the keyword “if”

2. an expression

3. the keyword “then”

4. an expression

5. the keyword “else”

6. an expression

Overview

- Programs are written as text
- There is a structure to the program
- Grammars represent this structure

cond → “if” exp “then” exp “else” exp

Terminal and Nonterminal Symbols

Terminals and Nonterminals

- Terminals: The alphabet of the language. Atomic.
- Nonterminals: Decompose into multiple terminals and nonterminals. Non-atomic.

cond → “if” exp “then” exp “else” exp
exp → “true” | “false”

- Terminals: if, then, else, true, false
- Nonterminals: cond, exp
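The difference between terminals and nonterminals shows up directly in a hand-written recognizer. The following Python sketch is one possible encoding, not a prescribed one: each terminal is matched literally against the next token, while each nonterminal becomes a function that follows its production. All function names are hypothetical.

```python
def expect(tokens, pos, terminal):
    """Match one terminal; return the next position, or None on failure."""
    if pos is None or pos >= len(tokens) or tokens[pos] != terminal:
        return None
    return pos + 1


def parse_exp(tokens, pos):
    """Nonterminal exp → "true" | "false"."""
    if pos is None or pos >= len(tokens) or tokens[pos] not in ("true", "false"):
        return None
    return pos + 1


def parse_cond(tokens, pos=0):
    """Nonterminal cond → "if" exp "then" exp "else" exp."""
    pos = expect(tokens, pos, "if")
    pos = parse_exp(tokens, pos)
    pos = expect(tokens, pos, "then")
    pos = parse_exp(tokens, pos)
    pos = expect(tokens, pos, "else")
    return parse_exp(tokens, pos)


def is_cond(tokens):
    """True if the whole token sequence can be derived from cond."""
    return parse_cond(tokens) == len(tokens)


print(is_cond("if true then false else true".split()))   # True
print(is_cond("if true else then false true".split()))   # False
```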

Phases of Interpretation/Compilation

- Analysis (front-end):
  - Lexing: convert text to terminals
  - Syntax: convert terminals to syntax tree
  - Semantics: check types
- Synthesis (back-end):
  - Compiler: Construct machine code
  - Interpreter: Execute the program

[Figure: front-end pipeline. Lexical Analysis → (terminal sequence) → Syntax Analysis → (abstract syntax tree) → Semantic Analysis → (annotated syntax tree)]
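The first phase, lexing, can be sketched with regular expressions. This Python example is a minimal illustration under assumptions: the token classes (word, num, op) and the function name lex are hypothetical and cover only the small examples used in these slides.

```python
import re

# One alternative per token class; whitespace separates tokens but is not one.
TOKEN = re.compile(r"(?P<ws>\s+)|(?P<word>[A-Za-z_]+)|(?P<num>\d+)|(?P<op>[+*()])")


def lex(text):
    """Convert program text into a sequence of (kind, lexeme) terminals."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "ws":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(lex("1 + 2 * 3"))
# [('num', '1'), ('op', '+'), ('num', '2'), ('op', '*'), ('num', '3')]
```

The resulting terminal sequence is what the syntax analysis phase consumes.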


“Compiler Compilers”

- Generate code from description of formal language:
  - Lexer: Regular Expressions
  - Parser: Grammar
- Examples:
  - Lexer Generators:
    - Lex / Flex
    - Ragel
  - Parser Generators:
    - YACC / Bison
  - Combined Generators:
    - JavaCC
    - ANTLR

[Figure labels: Grammar, Parser, Lexical Analysis, Syntax Analysis]

Context-Free Grammars


A context-free grammar G is the tuple G = (V, T, P, S), where:

- V is a finite set of nonterminals
- T is a finite set of terminals
- P is a finite set of productions of the form A → X1 ... Xn, where A ∈ V and each Xi ∈ V ∪ T
- S ∈ V is the start symbol

cond → “if” exp “then” exp “else” exp
exp → “true” | “false”

- V = {cond, exp}
- T = {if, then, else, true, false}
- P = {cond → “if” exp “then” exp “else” exp, exp → “true”, exp → “false”}
- S = cond
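The tuple G = (V, T, P, S) maps naturally onto plain data. The Python sketch below is one possible encoding (storing P as a dictionary from each left-hand side to its list of right-hand sides is an assumption, as are the function names). It checks the side conditions of the definition and then enumerates the language of the example grammar, which is possible only because this particular grammar is not recursive.

```python
from itertools import product

V = {"cond", "exp"}
T = {"if", "then", "else", "true", "false"}
P = {
    "cond": [("if", "exp", "then", "exp", "else", "exp")],
    "exp": [("true",), ("false",)],
}
S = "cond"


def well_formed(V, T, P, S):
    """Check that S ∈ V and every production has the form A → X1 ... Xn."""
    return (
        S in V
        and all(head in V for head in P)
        and all(x in V | T for bodies in P.values() for body in bodies for x in body)
    )


def language(symbol):
    """List every terminal string derivable from symbol (non-recursive grammars only)."""
    if symbol in T:
        return [(symbol,)]
    strings = []
    for body in P[symbol]:
        # Combine every choice for each symbol on the right-hand side.
        for parts in product(*(language(x) for x in body)):
            strings.append(tuple(tok for part in parts for tok in part))
    return strings


print(well_formed(V, T, P, S))   # True
print(len(language(S)))          # 8
```

The eight strings come from the two choices for each of the three exp positions.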

What’s “context-free?”

v → x0 x1 ... xn

- Left-hand side: v (no surrounding symbols)
- Right-hand side: x0 x1 ... xn

Nonterminals’ expansion is independent of surrounding context

What’s not “context-free?”

Non-context-free languages

- Most programming language syntax is context-free or nearly so
- C/C++ are almost context-free
- In practice: integrate parsing and lexing to distinguish type and variable names

Counterexample (C/C++)

/* Context:
 * Is x a type or variable?
 */

x * y;        // declaration or
              // multiplication?

f((x) * y);   // multiplication or
              // deref. and cast?

Backus-Naur Form (BNF)



〈cond〉 → “if” 〈exp〉 “then” 〈exp〉 “else” 〈exp〉
〈exp〉 → “true” | “false”

Plain Text

<cond> ::= "if" <exp> "then" <exp> "else" <exp>

<exp> ::= "true" | "false"

Parse Tree

Parse Trees

- Leaves: Terminals
- Nodes: Nonterminals
- Edges: Productions

cond → “if” exp “then” exp “else” exp
exp → “true” | “false”

Parse Tree

[Figure: parse tree for input beginning “if true then”; root cond with children “if”, exp, “then”, exp, “else”, exp]
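A parse tree for this grammar can be built explicitly. In the Python sketch below, the tree encoding is an assumption: an interior node is a tuple whose first element names the nonterminal, and a leaf is the terminal itself. The function names are hypothetical.

```python
def parse_exp(tokens, pos):
    """exp → "true" | "false"; returns (subtree, next position)."""
    if pos < len(tokens) and tokens[pos] in ("true", "false"):
        return ("exp", tokens[pos]), pos + 1
    raise SyntaxError(f"expected true or false at token {pos}")


def parse_cond(tokens):
    """cond → "if" exp "then" exp "else" exp; returns the parse tree."""
    children = []
    pos = 0
    for keyword in ("if", "then", "else"):
        if pos >= len(tokens) or tokens[pos] != keyword:
            raise SyntaxError(f"expected {keyword!r} at token {pos}")
        children.append(keyword)                    # leaf: terminal
        subtree, pos = parse_exp(tokens, pos + 1)
        children.append(subtree)                    # interior node: nonterminal
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return ("cond", *children)


def leaves(tree):
    """Read the terminals off the tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in leaves(child)]


tokens = "if true then false else true".split()
tree = parse_cond(tokens)
print(tree)
# ('cond', 'if', ('exp', 'true'), 'then', ('exp', 'false'), 'else', ('exp', 'true'))
print(leaves(tree) == tokens)   # True
```

Reading the leaves from left to right recovers the original terminal sequence, which is what makes the tree a parse of that input.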


Grammars Grammars for the Functional Programs

Example: Lambda Calculus Grammar

〈exp〉 → 〈sym〉
      | “λ” 〈sym〉 “.” 〈exp〉
      | 〈exp〉 〈exp〉
      | “(” 〈exp〉 “)”

〈sym〉 → “a” | “b” | “c” | ...

Parse Tree

(λa . a) b

〈exp〉
  〈exp〉
    “(”
    〈exp〉
      “λ”
      〈sym〉: “a”
      “.”
      〈exp〉
        〈sym〉: “a”
    “)”
  〈exp〉
    〈sym〉: “b”

Exercise: Lambda Calculus Grammar


〈exp〉 → 〈sym〉
      | “λ” 〈sym〉 “.” 〈exp〉
      | 〈exp〉 〈exp〉
      | “(” 〈exp〉 “)”

〈sym〉 → “a” | “b” | “c” | ...

Parse Tree

aλb . b c

[Figure: partially drawn parse tree for the expression above; visible nodes include “λ” 〈sym〉 and “.” 〈exp〉]

Exercise: Let Expression


〈exp〉 → 〈sym〉
      | “λ” 〈sym〉 “.” 〈exp〉
      | 〈exp〉 〈exp〉
      | “(” 〈exp〉 “)”

〈sym〉 → “a” | “b” | “c” | ...

Parse Tree

let x ← f y in (x z)

[Figure: partially drawn parse tree for the expression above; visible nodes include “let” 〈sym〉, “←” 〈exp〉, “f”, and “in” 〈exp〉]


Grammars Ambiguity and Precedence

Exercise: Arithmetic


〈e〉 → 〈e〉 “+” 〈e〉
    | 〈e〉 “∗” 〈e〉
    | “1” | “2” | “3” | ...

Parse Tree

1 + 2 ∗ 3

[Figure: two parse trees for 1 + 2 ∗ 3, one with “+” at the root and one with “∗” at the root]

Ambiguous: multiple valid parse trees
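Ambiguity can be made concrete by counting parse trees. The Python sketch below is an illustration, not a parsing algorithm from the slides: for the grammar e → e “+” e | e “∗” e | number, it tries every operator as the root of the tree and multiplies the counts for the two sides. The function name count_parses and the use of the ASCII "*" in place of “∗” are assumptions.

```python
from functools import lru_cache


def count_parses(tokens):
    """Count the parse trees of a token sequence under the ambiguous grammar."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from e.
        if hi - lo == 1:
            return 1 if tokens[lo].isdigit() else 0
        total = 0
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("+", "*"):
                # Treat tokens[mid] as the operator at the root of the tree.
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))


print(count_parses("1 + 2 * 3".split()))   # 2
print(count_parses("1".split()))           # 1
```

A count above one means the grammar alone does not determine the structure of the input.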

Handling Precedence: Modify Grammar

〈e〉 → 〈t〉 | 〈e〉 “+” 〈t〉
〈t〉 → 〈n〉 | 〈t〉 “∗” 〈n〉
〈n〉 → “1” | “2” | “3” | ...

Parse Tree

1 + 2 ∗ 3

[Figure: parse tree for 1 + 2 ∗ 3 under the modified grammar; visible nodes include “+” 〈t〉 and “∗” 〈n〉]
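The layered grammar translates into one function per nonterminal. The Python sketch below rests on two stated assumptions: the left-recursive productions are rewritten as loops (a common adaptation for recursive descent, which cannot follow left recursion directly), and the result is a nested tuple (operator, left, right) rather than a full parse tree. Function names are hypothetical, and the ASCII "*" stands in for “∗”.

```python
def parse_e(tokens, pos=0):
    """e → t | e "+" t, with the left recursion rewritten as a loop."""
    tree, pos = parse_t(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_t(tokens, pos + 1)
        tree = ("+", tree, right)       # groups to the left
    return tree, pos


def parse_t(tokens, pos):
    """t → n | t "*" n, with the left recursion rewritten as a loop."""
    tree, pos = parse_n(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "*":
        right, pos = parse_n(tokens, pos + 1)
        tree = ("*", tree, right)
    return tree, pos


def parse_n(tokens, pos):
    """n → a number token (assumed here to be a string of digits)."""
    if pos < len(tokens) and tokens[pos].isdigit():
        return int(tokens[pos]), pos + 1
    raise SyntaxError(f"expected a number at token {pos}")


def parse(tokens):
    tree, pos = parse_e(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree


print(parse("1 + 2 * 3".split()))   # ('+', 1, ('*', 2, 3))
print(parse("1 * 2 + 3".split()))   # ('+', ('*', 1, 2), 3)
```

Because “∗” is only reachable through t, it always ends up below “+” in the tree, regardless of the order in which the operators appear.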


Handling Precedence: Parser-specific

Bison Grammar

expr: expr '+' expr
    | expr '-' expr
    | expr '*' expr
    | expr '/' expr
    | num;

Bison Precedence

%left '+' '-'
%left '*' '/'

Directs (some) parsing algorithms to resolve ambiguity

Grammars Abstract Syntax

Abstract Syntax

Overview

- Data structure encoding the program for compiler or interpreter
- Ambiguity resolved in the parser, not stored in abstract syntax
- Abstract Syntax Tree (AST): Use algebraic data types

Conditional Type

type Exp ←
  | TrueExp
  | FalseExp
  | CondExp of Exp × Exp × Exp

if true then false else true

[Figure: AST for the expression above, with leaves TrueExp, FalseExp, TrueExp]
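The conditional type can be approximated in Python, which has no built-in algebraic data types, by one frozen dataclass per constructor. This is a sketch under assumptions: the field names (test, then, otherwise) and the helper unparse are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TrueExp:
    pass


@dataclass(frozen=True)
class FalseExp:
    pass


@dataclass(frozen=True)
class CondExp:
    test: "Exp"
    then: "Exp"
    otherwise: "Exp"


Exp = Union[TrueExp, FalseExp, CondExp]


def unparse(exp):
    """Turn abstract syntax back into concrete syntax."""
    if isinstance(exp, TrueExp):
        return "true"
    if isinstance(exp, FalseExp):
        return "false"
    return f"if {unparse(exp.test)} then {unparse(exp.then)} else {unparse(exp.otherwise)}"


ast = CondExp(TrueExp(), FalseExp(), TrueExp())
print(unparse(ast))   # if true then false else true
```

The keywords “if”, “then”, and “else” appear only in the concrete syntax; the AST keeps just the three subexpressions.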

Abstract Syntax Tree vs. Parse Tree

Parse Tree

Directly maps to concrete syntax and grammar.

[Figure: parse tree; visible nodes include “+” 〈t〉 and “∗” 〈n〉]

Abstract Syntax Tree

Abstracts precedence, parentheses, etc.

type Exp ←
  | NumExp of int
  | AddExp of Exp × Exp
  | MulExp of Exp × Exp
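To see what the AST leaves out, the arithmetic type can be sketched as Python dataclasses. The field names and the evaluate helper are assumptions added for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class NumExp:
    value: int


@dataclass(frozen=True)
class AddExp:
    left: "Exp"
    right: "Exp"


@dataclass(frozen=True)
class MulExp:
    left: "Exp"
    right: "Exp"


Exp = Union[NumExp, AddExp, MulExp]


def evaluate(exp):
    """Evaluate by structural recursion over the tree."""
    if isinstance(exp, NumExp):
        return exp.value
    if isinstance(exp, AddExp):
        return evaluate(exp.left) + evaluate(exp.right)
    return evaluate(exp.left) * evaluate(exp.right)


# 1 + 2 * 3 and 1 + (2 * 3) share this one AST: the grouping lives in the
# shape of the tree, so no parentheses or precedence levels are stored.
ast = AddExp(NumExp(1), MulExp(NumExp(2), NumExp(3)))
print(evaluate(ast))   # 7
```

Compared with the parse tree, the intermediate nonterminals 〈t〉 and 〈n〉 have disappeared; only the operations and their operands remain.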









Exercise: Lambda Calculus Abstract Syntax

Data Type

type Exp ←
  | SymExp of string
  | LambdaExp of string × Exp
  | CallExp of Exp × Exp


aλb . b c
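As a warm-up that leaves the exercise term itself open, the data type can be sketched in Python and applied to the earlier example (λa . a) b. The field names and the unparse helper are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class SymExp:
    name: str


@dataclass(frozen=True)
class LambdaExp:
    param: str
    body: "Exp"


@dataclass(frozen=True)
class CallExp:
    function: "Exp"
    argument: "Exp"


Exp = Union[SymExp, LambdaExp, CallExp]


def unparse(exp):
    """Regenerate concrete syntax, inserting parentheses where needed."""
    if isinstance(exp, SymExp):
        return exp.name
    if isinstance(exp, LambdaExp):
        return f"λ{exp.param} . {unparse(exp.body)}"
    fun = unparse(exp.function)
    arg = unparse(exp.argument)
    if isinstance(exp.function, LambdaExp):
        fun = f"({fun})"
    if not isinstance(exp.argument, SymExp):
        arg = f"({arg})"
    return f"{fun} {arg}"


# (λa . a) b: the parentheses of the concrete syntax leave no trace in the AST.
identity_applied = CallExp(LambdaExp("a", SymExp("a")), SymExp("b"))
print(unparse(identity_applied))   # (λa . a) b
```

There is no constructor for the production “(” 〈exp〉 “)”: parentheses only steer the parser and are regenerated on demand.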












Summary

- Formal languages: underlying theory for lexical and syntax analysis
- Grammars: representation for programming language syntax
- Abstract Syntax: the important structure of the program

References

Hennessy. The Semantics of Programming Languages.

- Ch 1.2 Concrete and Abstract Syntax

- Ch 10.1 Lexing and Parsing

Alt. Textbook: Aho, Lam, Sethi, and Ullman. Compilers: Principles, Techniques, & Tools.

- Ch 4 Syntax Analysis
