
Syntax (Pre Lecture)

Dr. Neil T. Dantam

CSCI-400, Colorado School of Mines

Spring 2020


Introduction

- Syntax: what programs we can write / what the language “looks like”
- Semantics: what these programs mean / what the language does (more later in the course)
- Concrete syntax: human-readable
- Abstract syntax: encoded for use by the interpreter/compiler
- Formal language: mathematical basis to represent and analyze syntax

Outcomes
- Know basic definitions of formal language theory
- Design grammars for common programming language constructs


Formal Language Theory

Outline

Formal Language Theory

Grammars
- Definition
- Grammars for the Functional Programs
- Ambiguity and Precedence
- Abstract Syntax


Formal Language Theory

Why formal language?

Overview
- Some program text is “valid”
- And some is “invalid”
- Formal language lets us:
  - Precisely define the program text that is valid/invalid
  - Automatically recognize (parse) program text
- (Also, profound implications on what computers can do (CSCI-561))

Example

Valid
- if true then false else true
- 1 + 2 * 3

Invalid
- if true else then false true
- 1 + * 3


Formal Language Theory

Sets

Definition (Set)

An unordered collection of objects without repetition

Notation
- S = {s₀, s₁, s₂, …, sₙ}
- Empty set: ∅
- Set membership: x ∈ S (x in S); x ∉ S (x not in S)

Example
- S = {1, 2, 3}
- Common sets:
  - Integers: ℤ
  - Real numbers: ℝ
  - Real vectors: ℝⁿ
  - Booleans: 𝔹
- 2 ∈ ℤ
- π ∉ ℤ
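Python's built-in `set` type behaves much like the definition above, so it offers a quick way to experiment. A minimal sketch showing that order and repetition do not affect a set:

```python
# A set is unordered and has no repetition, so these literals denote the same set.
S = {1, 2, 3}
T = {3, 2, 1, 1}

print(S == T)      # True: order and duplicates do not matter
print(len(T))      # 3: the repeated 1 is stored only once
print(2 in S)      # True: set membership, 2 ∈ S
print(4 not in S)  # True: non-membership, 4 ∉ S

empty = set()      # the empty set ∅ (note: {} is an empty dict, not a set)
print(len(empty))  # 0
```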


Formal Language Theory

Sequences

Definition (Sequence)

An ordered list of objects.

Example

(1, 2, 3, 5, 8, …)

Definition (Tuple)

A sequence of finite length.

- k-tuple: a tuple of length k
- pair: a 2-tuple

Example

- 3-tuple: (2, 4, 8)
- pair: (a, b)
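A Python sketch contrasting the two definitions, assuming the example sequence continues by adding the previous two terms: an infinite sequence can only be produced lazily (here with a generator, `example_sequence`, a name chosen for illustration), while a tuple is finite and ordered.

```python
from itertools import islice

def example_sequence():
    """Yield 1, 2, 3, 5, 8, ... forever (each term is the sum of the previous two)."""
    a, b = 1, 2
    while True:
        yield a
        a, b = b, a + b

# A finite prefix of the infinite sequence is a 5-tuple.
prefix = tuple(islice(example_sequence(), 5))
print(prefix)                  # (1, 2, 3, 5, 8)

# Unlike sets, tuples are ordered and allow repetition.
print((2, 4, 8) == (8, 4, 2))  # False
print(len(("a", "a")))         # 2
```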


Formal Language Theory

Strings

Definition (Symbol)

An abstract, primitive, atomic “thing”

Example (Symbols)

0, 1, a, x, foo, bar, +, -, if, match

Definition (Alphabet)

A non-empty, finite set of symbols

Example (Alphabets)

- Σ_B = {0, 1}
- Σ_E = {a, b, c, d, …}
- Σ_C = {if, match, case, +, −, …}

Definition (String)

A sequence over some alphabet

Example (Strings)

- Γ_B = (1, 0, 1, 0, 1, 0)
- Γ_E = (h, e, l, l, o)
- Γ_C = (3, +, x)
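The definitions above can be checked mechanically. This sketch represents an alphabet as a Python `frozenset` of symbols and a string as a tuple of symbols; the helper name `is_string_over` is illustrative, and Σ_E is assumed here to be the 26 lowercase English letters.

```python
from string import ascii_lowercase

def is_string_over(word: tuple, alphabet: frozenset) -> bool:
    """Return True if every symbol in the sequence `word` belongs to `alphabet`."""
    return all(symbol in alphabet for symbol in word)

# Alphabets: finite, non-empty sets of symbols.
sigma_b = frozenset({"0", "1"})
sigma_e = frozenset(ascii_lowercase)  # assumption: lowercase English letters

# Strings: sequences (here, tuples) over some alphabet.
gamma_b = ("1", "0", "1", "0", "1", "0")
gamma_e = ("h", "e", "l", "l", "o")

print(is_string_over(gamma_b, sigma_b))     # True
print(is_string_over(gamma_e, sigma_e))     # True
print(is_string_over(("1", "2"), sigma_b))  # False: "2" is not in Σ_B
```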


Formal Language Theory

Formal Languages

Definition (Formal Language)

A language is a set of strings.

Representation

- How would you represent:
  - The language (set) of arithmetic expressions?
  - The language (set) of well-formed XML documents?
  - The language (set) of valid variable names in C?
  - The language (set) of C programs?
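One possible answer for an infinite language: since its strings cannot be listed, represent it by a membership test. As a sketch, the language of C variable names can be approximated by a regular expression (a letter or underscore, then letters, digits, or underscores). This simplification ignores reserved keywords such as `int` and implementation-specific extended characters, so it accepts a slightly larger set than C allows; the function name is hypothetical.

```python
import re

# Simplified pattern for C identifiers (ignores keywords and extended characters).
C_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def in_c_identifier_language(text: str) -> bool:
    """Membership test: is `text` a string in the (approximate) language?"""
    return C_IDENTIFIER.fullmatch(text) is not None

for candidate in ["x", "_tmp1", "count2", "2count", "my-var", ""]:
    print(repr(candidate), in_c_identifier_language(candidate))
```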


Grammars


Grammars Definition

Overview of Grammars

Example (Conditional Expression)

if e1 then e2 else e3

A conditional consists of the following sequence:

1. the keyword “if”

2. an expression

3. the keyword “then”

4. an expression

5. the keyword “else”

6. an expression

Overview
- Programs are written as text
- There is a structure to the program
- Grammars represent this structure

Grammar

cond → “if” exp “then” exp “else” exp


Grammars Definition

Terminal and Nonterminal Symbols

Terminals and Nonterminals

Terminals: The alphabet of the language. Atomic.

Nonterminals: Decompose into multiple terminals and nonterminals. Non-atomic.

Example

Grammar

cond → “if” exp “then” exp “else” exp

exp → “true” | “false”

- Terminals: if, then, else, true, false
- Nonterminals: cond, exp


Grammars Definition

Phases of Interpretation/Compilation

- Analysis (front-end):
  - Lexing: convert text to terminals
  - Syntax: convert terminals to syntax tree
  - Semantics: check types
- Synthesis (back-end):
  - Compiler: construct machine code
  - Interpreter: execute the program

Figure (pipeline): text → Lexical Analysis → terminal sequence → Syntax Analysis → abstract syntax tree → Semantic Analysis → annotated syntax tree → backend → machine code
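As a rough sketch of the lexing phase only, assuming the conditional language from the earlier slides and whitespace-separated words, the hypothetical `lex` function below turns text into a list of terminals and rejects anything else. Real lexers also record source positions and split symbols that are not separated by spaces.

```python
TERMINALS = {"if", "then", "else", "true", "false"}

def lex(text: str) -> list[str]:
    """Convert program text into a list of terminals, or raise ValueError."""
    tokens = []
    for word in text.split():
        if word not in TERMINALS:
            raise ValueError(f"unexpected text: {word!r}")
        tokens.append(word)
    return tokens

print(lex("if true then false else true"))
# ['if', 'true', 'then', 'false', 'else', 'true']
```

Note that `lex` accepts `if true else then false true`: every word is a terminal, so rejecting the misordered text falls to syntax analysis.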


Grammars Definition

“Compiler Compilers”

- Generate code from a description of a formal language:
  - Lexer: regular expressions
  - Parser: grammar
- Examples:
  - Lexer generators:
    - Lex / Flex
    - Ragel
  - Parser generators:
    - YACC / Bison
  - Combined generators:
    - JavaCC
    - ANTLR

Figure: Regular Expressions → Lexer Generator → Lexer (Lexical Analysis); Grammar → Parser Generator → Parser (Syntax Analysis)
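To make the generator idea concrete, here is a small Python sketch that plays the role of a lexer generator: given a hypothetical token specification (a list of name/regular-expression pairs), the illustrative `make_lexer` combines the patterns into one regular expression with named groups and returns a lexer function. It is far simpler than Lex/Flex: earlier pairs win when several match (Flex instead prefers the longest match), and there is no error recovery.

```python
import re

def make_lexer(spec):
    """Build a lexer from (token_name, regex) pairs; earlier pairs take priority."""
    master = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in spec))

    def lexer(text):
        tokens, pos = [], 0
        while pos < len(text):
            match = master.match(text, pos)
            if match is None:
                raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
            if match.lastgroup != "SKIP":  # drop whitespace
                tokens.append((match.lastgroup, match.group()))
            pos = match.end()
        return tokens

    return lexer

# Hypothetical token specification for arithmetic expressions.
lex_arith = make_lexer([
    ("NUM", r"[0-9]+"),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SKIP", r"\s+"),
])

print(lex_arith("1 + 2 * 3"))
# [('NUM', '1'), ('PLUS', '+'), ('NUM', '2'), ('TIMES', '*'), ('NUM', '3')]
```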


Grammars Definition

Context-Free Grammars

Definition

A context-free grammar G is the tuple G = (V, T, P, S), where:

- V is a finite set of nonterminals
- T is a finite set of terminals
- P is a finite set of productions of the form v → X₁ … Xₙ, where v ∈ V and each Xᵢ ∈ V ∪ T
- S ∈ V is the start symbol

Example

Grammar

cond → “if” exp “then” exp “else” exp

exp → “true” | “false”

Elements

- V = {cond, exp}
- T = {if, then, else, true, false}
- P = {cond → “if” exp “then” exp “else” exp, exp → “true”, exp → “false”}
- S = cond
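The 4-tuple maps directly onto data. In this Python sketch, productions are (left-hand side, right-hand side) pairs, and a small check (the function name `is_context_free_grammar` is hypothetical) verifies the conditions in the definition, plus the usual assumption that V and T are disjoint.

```python
from typing import NamedTuple

class Grammar(NamedTuple):
    V: frozenset  # nonterminals
    T: frozenset  # terminals
    P: tuple      # productions: (lhs, (X1, ..., Xn)) pairs
    S: str        # start symbol

def is_context_free_grammar(g: Grammar) -> bool:
    """Check: V and T disjoint, S ∈ V, each lhs ∈ V, each Xi ∈ V ∪ T."""
    symbols = g.V | g.T
    return (
        g.V.isdisjoint(g.T)
        and g.S in g.V
        and all(lhs in g.V and all(x in symbols for x in rhs) for lhs, rhs in g.P)
    )

cond_grammar = Grammar(
    V=frozenset({"cond", "exp"}),
    T=frozenset({"if", "then", "else", "true", "false"}),
    P=(
        ("cond", ("if", "exp", "then", "exp", "else", "exp")),
        ("exp", ("true",)),
        ("exp", ("false",)),
    ),
    S="cond",
)

print(is_context_free_grammar(cond_grammar))  # True
```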


Grammars Definition

What’s “context-free?”

v → x₀ x₁ … xₙ

- Left-hand side: a single nonterminal v, with no surrounding symbols
- Right-hand side: x₀ x₁ … xₙ

A nonterminal's expansion is independent of its surrounding context.


Grammars Definition

What’s not “context-free?”

Non-context-free languages

- Most programming language syntax is context-free or nearly so
- C/C++ are almost context-free
- In practice: integrate parsing and lexing to distinguish type and variable names

Counterexample (C/C++)

/* Context: Is x a type or a variable? */

x * y;        // declaration or multiplication?

f((x) * y);   // multiplication, or dereference and cast?


Grammars Definition

Backus-Naur Form (BNF)

Example

LaTeX

〈cond〉 → “if” 〈exp〉 “then” 〈exp〉 “else” 〈exp〉
〈exp〉 → “true” | “false”

Plain Text

<cond> ::= "if" <exp> "then" <exp> "else" <exp>

<exp> ::= "true" | "false"
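Because this grammar has no recursion, the language it describes is finite, so every string in it can be listed. The sketch below stores the BNF rules in a Python dictionary (an encoding chosen for illustration) and expands nonterminals exhaustively; the `expand` helper is hypothetical and terminates only for non-recursive grammars.

```python
from itertools import product

# BNF rules: each nonterminal maps to its alternatives; each alternative is a
# sequence of symbols. Symbols that are not keys of GRAMMAR are terminals.
GRAMMAR = {
    "cond": [["if", "exp", "then", "exp", "else", "exp"]],
    "exp": [["true"], ["false"]],
}

def expand(symbol):
    """Return every terminal string (as a tuple) derivable from `symbol`."""
    if symbol not in GRAMMAR:
        return [(symbol,)]
    results = []
    for alternative in GRAMMAR[symbol]:
        # Combine one expansion of each symbol in the alternative, in order.
        for parts in product(*(expand(s) for s in alternative)):
            results.append(tuple(t for part in parts for t in part))
    return results

language = [" ".join(s) for s in expand("cond")]
print(len(language))  # 8
print(language[0])    # if true then true else true
```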


Grammars Definition

Parse Tree

Parse Trees
- Leaves: terminals
- Interior nodes: nonterminals
- Edges: productions (an interior node and its children form one production)

Text

if true then false else true

Grammar

cond → “if” exp “then” exp “else” exp

exp → “true” | “false”

Parse Tree

cond
├── if
├── exp
│   └── true
├── then
├── exp
│   └── false
├── else
└── exp
    └── true
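A parser recovers this tree from the terminal sequence. Below is a minimal recursive-descent sketch in Python with one function per nonterminal; trees are nested tuples `(label, children...)` with terminals as plain strings at the leaves, an encoding chosen for illustration. The names `ParseError`, `expect`, and `parse` are hypothetical.

```python
class ParseError(Exception):
    """Raised when the tokens do not match the grammar."""

def expect(tokens, terminal):
    """Consume one expected terminal from the front of the token list."""
    if not tokens or tokens[0] != terminal:
        raise ParseError(f"expected {terminal!r}, got {tokens[:1]}")
    return tokens[1:]

def exp(tokens):
    # exp → "true" | "false"
    if tokens and tokens[0] in ("true", "false"):
        return ("exp", tokens[0]), tokens[1:]
    raise ParseError(f"expected 'true' or 'false', got {tokens[:1]}")

def cond(tokens):
    # cond → "if" exp "then" exp "else" exp
    tokens = expect(tokens, "if")
    e1, tokens = exp(tokens)
    tokens = expect(tokens, "then")
    e2, tokens = exp(tokens)
    tokens = expect(tokens, "else")
    e3, tokens = exp(tokens)
    return ("cond", "if", e1, "then", e2, "else", e3), tokens

def parse(text):
    """Parse a whole program; every token must be consumed."""
    tree, rest = cond(text.split())
    if rest:
        raise ParseError(f"unexpected trailing tokens: {rest}")
    return tree

print(parse("if true then false else true"))
# ('cond', 'if', ('exp', 'true'), 'then', ('exp', 'false'), 'else', ('exp', 'true'))
```

The invalid text `if true else then false true` raises `ParseError` at `else`, where `then` was expected.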


Grammars Grammars for the Functional Programs

Example: Lambda Calculus Grammar

Grammar

〈exp〉 → 〈sym〉| “λ”〈sym〉“.”〈exp〉| 〈exp〉〈exp〉| “(”〈exp〉“)”

〈sym〉 → “a” | “b” | “c” | . . .

Parse Tree

(λa . a) b

〈exp〉
├── 〈exp〉
│   ├── “(”
│   ├── 〈exp〉
│   │   ├── “λ”
│   │   ├── 〈sym〉
│   │   │   └── “a”
│   │   ├── “.”
│   │   └── 〈exp〉
│   │       └── 〈sym〉
│   │           └── “a”
│   └── “)”
└── 〈exp〉
    └── 〈sym〉
        └── “b”


Grammars Grammars for the Functional Programs

Exercise: Lambda Calculus Grammar

Grammar

〈exp〉 → 〈sym〉| “λ”〈sym〉“.”〈exp〉| 〈exp〉〈exp〉| “(”〈exp〉“)”

〈sym〉 → “a” | “b” | “c” | . . .

Parse Tree

a λb . b c

〈exp〉
├── 〈exp〉
│   └── 〈sym〉
│       └── “a”
└── 〈exp〉
    ├── “λ”
    ├── 〈sym〉
    │   └── “b”
    ├── “.”
    └── 〈exp〉
        ├── 〈exp〉
        │   └── 〈sym〉
        │       └── “b”
        └── 〈exp〉
            └── 〈sym〉
                └── “c”


Grammars Grammars for the Functional Programs

Exercise: Let Expression

Grammar

〈exp〉 → 〈sym〉 | “λ”〈sym〉“.”〈exp〉 | 〈exp〉〈exp〉 | “(”〈exp〉“)”
      | “let”〈sym〉“←”〈exp〉“in”〈exp〉

〈sym〉 → “a” | “b” | “c” | . . .

Parse Tree

let x ← f y in (x z)

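〈exp〉
├── “let”
├── 〈sym〉
│   └── “x”
├── “←”
├── 〈exp〉
│   ├── 〈exp〉
│   │   └── 〈sym〉
│   │       └── “f”
│   └── 〈exp〉
│       └── 〈sym〉
│           └── “y”
├── “in”
└── 〈exp〉
    ├── “(”
    ├── 〈exp〉
    │   ├── 〈exp〉
    │   │   └── 〈sym〉
    │   │       └── “x”
    │   └── 〈exp〉
    │       └── 〈sym〉
    │           └── “z”
    └── “)”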


Grammars Ambiguity and Precedence

Exercise: Arithmetic

Grammar

〈e〉 → 〈e〉“+”〈e〉| 〈e〉“∗”〈e〉| “1” | “2” | “3” | . . .

Parse Tree

1 + 2 ∗ 3

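Parse tree 1:

〈e〉
├── 〈e〉
│   └── “1”
├── “+”
└── 〈e〉
    ├── 〈e〉
    │   └── “2”
    ├── “∗”
    └── 〈e〉
        └── “3”

Parse tree 2:

〈e〉
├── 〈e〉
│   ├── 〈e〉
│   │   └── “1”
│   ├── “+”
│   └── 〈e〉
│       └── “2”
├── “∗”
└── 〈e〉
    └── “3”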

Ambiguous: multiple valid parse trees
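The ambiguity can be confirmed by brute force. This Python sketch (illustrative only; it takes exponential time, and the name `parses` is hypothetical) enumerates every parse of a token list under the ambiguous grammar, trying each operator in turn as the root, and prints each tree fully parenthesized. ASCII `*` stands in for “∗”.

```python
def parses(tokens):
    """All parse trees for e → e "+" e | e "*" e | number, as parenthesized strings."""
    if len(tokens) == 1 and tokens[0].isdigit():
        return [tokens[0]]
    trees = []
    # Try every operator position as the root production e op e.
    for i, tok in enumerate(tokens):
        if tok in ("+", "*"):
            for left in parses(tokens[:i]):
                for right in parses(tokens[i + 1:]):
                    trees.append(f"({left} {tok} {right})")
    return trees

print(parses("1 + 2 * 3".split()))
# ['(1 + (2 * 3))', '((1 + 2) * 3)']
```

The two trees are not merely cosmetic: read as arithmetic, the first groups to 1 + 6 = 7 and the second to 3 ∗ 3 = 9.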


Grammars Ambiguity and Precedence

Handling Precedence: Modify Grammar

Modify Grammar

〈e〉 → 〈t〉 | 〈e〉“+”〈t〉
〈t〉 → 〈n〉 | 〈t〉“∗”〈n〉
〈n〉 → “1” | “2” | “3” | …

Parse Tree

1 + 2 ∗ 3

〈e〉
├── 〈e〉
│   └── 〈t〉
│       └── 〈n〉
│           └── “1”
├── “+”
└── 〈t〉
    ├── 〈t〉
    │   └── 〈n〉
    │       └── “2”
    ├── “∗”
    └── 〈n〉
        └── “3”
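A recursive-descent function for a left-recursive rule such as 〈e〉 → 〈e〉“+”〈t〉 would call itself without consuming input, so a common technique is to turn the left recursion into a loop that groups to the left. A Python sketch of that approach for the modified grammar (function names are hypothetical; ASCII `*` stands in for “∗”), printing each grouping fully parenthesized:

```python
def parse_e(tokens):
    """e → t | e "+" t, implemented as t ("+" t)* with left grouping."""
    tree, tokens = parse_t(tokens)
    while tokens and tokens[0] == "+":
        right, tokens = parse_t(tokens[1:])
        tree = f"({tree} + {right})"
    return tree, tokens

def parse_t(tokens):
    """t → n | t "*" n, implemented as n ("*" n)*."""
    tree, tokens = parse_n(tokens)
    while tokens and tokens[0] == "*":
        right, tokens = parse_n(tokens[1:])
        tree = f"({tree} * {right})"
    return tree, tokens

def parse_n(tokens):
    """n → "1" | "2" | "3" | ... (here: any non-negative integer literal)."""
    if tokens and tokens[0].isdigit():
        return tokens[0], tokens[1:]
    raise ValueError(f"expected a number, got {tokens[:1]}")

def group(text):
    tree, rest = parse_e(text.split())
    if rest:
        raise ValueError(f"unexpected tokens: {rest}")
    return tree

print(group("1 + 2 * 3"))  # (1 + (2 * 3))
print(group("1 * 2 + 3"))  # ((1 * 2) + 3)
print(group("1 + 2 + 3"))  # ((1 + 2) + 3)
```

Because `*` is handled one level deeper in the grammar than `+`, it always ends up lower in the tree, so only one parse exists for each input.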


Grammars Ambiguity and Precedence

Handling Precedence: Parser-Specific

Bison Grammar

expr: expr '+' expr
    | expr '-' expr
    | expr '*' expr
    | expr '/' expr
    | num
    ;

Bison Precedence

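%left '+' '-'
%left '*' '/'

Directs (some) parsing algorithms, such as Bison's LALR(1) parser, to resolve the ambiguity: operators declared on a later line have higher precedence, and %left makes each operator left-associative.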
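A hand-written parser can obtain the same grouping from the ambiguous grammar with precedence climbing, a different technique from Bison's LALR(1) parsing, sketched below in Python. The hypothetical `PRECEDENCE` table mirrors the declarations: `+` and `-` share the lowest level, `*` and `/` bind tighter, and all four are left-associative.

```python
# Precedence levels mirroring the %left lines: later lines bind tighter.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def parse_expr(tokens, min_prec=1):
    """Precedence climbing for left-associative binary operators.

    `tokens` is a list consumed from the front; returns a fully
    parenthesized string showing the grouping."""
    if not tokens:
        raise ValueError("unexpected end of input")
    left = tokens.pop(0)
    if not left.isdigit():
        raise ValueError(f"expected a number, got {left!r}")
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # Left associativity: the right operand may only contain tighter operators.
        right = parse_expr(tokens, PRECEDENCE[op] + 1)
        left = f"({left} {op} {right})"
    return left

print(parse_expr("1 + 2 * 3".split()))  # (1 + (2 * 3))
print(parse_expr("8 - 4 - 2".split()))  # ((8 - 4) - 2)
print(parse_expr("8 / 4 * 2".split()))  # ((8 / 4) * 2)
```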


Grammars Abstract Syntax

Abstract Syntax

Overview
- Data structure encoding the program for the compiler or interpreter
- Ambiguity is resolved in the parser, not stored in the abstract syntax
- Abstract Syntax Tree (AST): use algebraic data types

Conditional Type

type Exp ←
  | TrueExp
  | FalseExp
  | CondExp of Exp × Exp × Exp

Example

if true then false else true

CondExp
├── TrueExp
├── FalseExp
└── TrueExp


Grammars Abstract Syntax

Abstract Syntax Tree vs. Parse Tree

Parse Tree

Directly maps to the concrete syntax and grammar.

〈e〉
├── 〈e〉
│   └── 〈t〉
│       └── 〈n〉
│           └── “1”
├── “+”
└── 〈t〉
    ├── 〈t〉
    │   └── 〈n〉
    │       └── “2”
    ├── “∗”
    └── 〈n〉
        └── “3”

Abstract Syntax Tree

Abstracts away precedence, parentheses, etc.

type Exp ←
  | NumExp of int
  | AddExp of Exp × Exp
  | MulExp of Exp × Exp

AddExp
├── NumExp
│   └── 1
└── MulExp
    ├── NumExp
    │   └── 2
    └── NumExp
        └── 3


Grammars Abstract Syntax

Exercise: Lambda Calculus Abstract Syntax

Data Type

type Exp ←
  | SymExp of string
  | LambdaExp of string × Exp
  | CallExp of Exp × Exp

AST

a λb . b c

CallExp
├── SymExp
│   └── a
└── LambdaExp
    ├── b
    └── CallExp
        ├── SymExp
        │   └── b
        └── SymExp
            └── c


Grammars Abstract Syntax

Summary

- Formal languages: underlying theory for lexical and syntax analysis
- Grammars: representation for programming language syntax
- Abstract syntax: the important structure of the program

