csi 3125, grammars, page 1 language description methods major topics in this part of the course:...

37
CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: – Syntax and semantics – Grammars – Axiomatic semantics (next handout)

Upload: reynard-bishop

Post on 18-Jan-2016

250 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 1

Language description methods

• Major topics in this part of the course:

– Syntax and semantics

– Grammars

– Axiomatic semantics (next handout)

Page 2: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 2

Syntax and semantics

• Points to discuss:

– The form and meaning of programming languages

– Types of processing

– Types of languages

Page 3: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 3

Syntax

• The syntax of a language determines how programs are built from elementary units (keywords, identifiers, numbers, brackets, and so on).

• A syntactically correct program may still not be acceptable, or it may work in a way that we do not want (or do not expect).

• Formal syntax is a system for describing the structure of programs exactly.

– Such systems include grammars, BNF, syntactic diagrams (syntax graphs).

Page 4: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 4

Grammars

• There are infinitely many different programs, but every program is finite and must be recognized in finite time.

• A grammar should allow a finite description of a usually infinite language.

Page 5: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 5

Semantics

• Semantics of a language determines the meaning of elementary units and their combinations:

–how does the meaning of a program derive from the meaning of its fragments?

• The effect of a compound statement (such as a loop, an "if") should depend only on the effect of the elementary statements (such as an assignment).

Page 6: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 6

Methods of semantic description

• Operational semantics

– Simple lower-level operations explain how higher-level statements are performed.

• Denotational semantics

– A program computes a function, a mappingdata results

• Axiomatic semantics

– A program establishes a relationdata results

Page 7: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 7

Lexical analysis

• Lexical analysis pre-processes a file that contains a source program:

– recognize units larger than single characters (keywords, predefined names, identifiers, numbers, brackets, operators, and so on).

– remove white space.

• This helps make translators simpler, by keeping low-level details out.

Page 8: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 8

Syntactic analysis

• Syntactic analysis, based on grammars, can mean two things:

– Recognition (the program is / is not correct)

– Parsing (a representation of the syntactic structure is built for correct programs).

• Syntactic analysis is the essential part of any implementation of a programming language.

• By the way, syntactic generation, also based on grammars, is the flip side of analysis: it runs from a syntactic structure to a source. Important in language technology, not much in programming languages.

Page 9: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 9

What is a language?

• A language is a set of sentences.

• A sentence is a sequence of elementary pieces, built according to certain rules (usually grammar rules).

• In a natural language, sentences have the usual meaning.

• In programming languages, various syntactic unit can be considered as "sentences".

– For example, in a set of all expressions, each valid expression is a sentence.

– In a set of all programs, each valid complete program is a sentence. And so on.

Page 10: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 10

A hierarchy of formal languages

Formal languages are classified on their complexity.

A four-level hierarchy, from the simplest to the most complicated:

regular <

context-free <

context-sensitive <

recursively enumerable.

Grammars too are classified in this way.

Page 11: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 11

• Programming languages usually have:

– context-free syntax,

– context-sensitive semantics.

• Context-freeness (important in syntactic analysis) means that a fragment we are analyzing does not depend on any other fragments of the program.

– for example, an occurrence of a variable is not related to its declaration;

– a message to a method is analyzed separately of the definition of this method.

A hierarchy of formal languages (2)

Page 12: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 12

Formal grammars

• Points to discuss:

– Concepts of formal grammars

– A sample grammar in BNF

– Derivations and parse trees

– Ambiguity in grammars

Page 13: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 13

A formal grammar has four components

• Terminal symbols = language elements (for example, variable names in Java, or English words).

• Non-terminal symbols = auxiliary symbols, denoting classes of constructions (for example: loop_statement, Boolean_expression).

• The goal (start) symbol denotes any sentence.

• Productions = rewriting rules ("this structure has such and such components") used to recognize or generate sentences.

Page 14: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 14

Two ways of rewriting

• From the start symbol, produce more and more specific approximations of a sentence, replacing non-terminals with their definitions;

• Reduce the sentence into more and more general forms, replacing definitions with non-terminals, and reach the goal symbol (the same!).

• Productions are what makes a grammar regular, context-free or context-sensitive.

Page 15: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 15

Example: a grammar of expressions

• Seven terminal symbols:

+ - * ( ) x y

• Four non-terminal symbols:

‹expr› ‹term› ‹factor› ‹var›

These names are what we choose: writing a grammar is not different from writing a program. The names are meant to help us read the grammar.

• Start/goal symbol:

‹expr›

Page 16: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 16

Notation

• Angle brackets distinguish non-terminal from terminal symbols. (It's like distinguishing strings and keywords in Java: "class" and class are different.)

• LHS RHS means:

"the Left-Hand Side consists of things on the Right-Hand Side".

• The bar | separates alternative Right-Hand Sides with the same Left-Hand Side.

Page 17: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 17

Productions of our grammar

‹expr› ‹term› |‹expr› + ‹term› |‹expr› - ‹term›

‹term› ‹factor› |‹term› * ‹factor›

‹factor› ‹var› | ( ‹expr› )

‹var› x | y

Page 18: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 18

Top-down and bottom-up

• For any sentence , productions can be applied in two directions.

• Top-down:

– Derive from the start symbol.

– will then be an example of an expression.

• Bottom-up:

– Fold into the initial symbol.

Page 19: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 19

• Let us take the sequence of terminal symbols:

( x - y ) * x + y

[It is an expression, but we must first show that it is.]

• Consider two derivations (the next two pages).

• On each line, the highlighted part is involved in rewriting into the next line, according to some grammar production.

Derivations

Page 20: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 20

A top-down derivation‹expr› ‹expr› + ‹term› ‹term› + ‹term› ‹term› * ‹factor› + ‹term› ‹factor› * ‹factor› + ‹term› ( ‹expr› ) * ‹factor› + ‹term› ( ‹expr› - ‹term› ) * ‹factor› + ‹term› ( ‹term› - ‹term› ) * ‹factor› + ‹term› ( ‹factor› - ‹term› ) * ‹factor› + ‹term› ( ‹var› - ‹term› ) * ‹factor› + ‹term› ( x - ‹term› ) * ‹factor› + ‹term› ( x - ‹factor› ) * ‹factor› + ‹term› ( x - ‹var› ) * ‹factor› + ‹term› ( x - y ) * ‹factor› + ‹term› ( x - y ) * ‹var› + ‹term› ( x - y ) * x + ‹term› ( x - y ) * x + ‹factor› ( x - y ) * x + ‹var› ( x - y ) * x + y

Page 21: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 21

A bottom-up derivation( x - y ) * x + y ( ‹var› - y ) * x + y ( ‹factor› - y ) * x + y ( ‹term› - y ) * x + y ( ‹expr› - y ) * x + y ( ‹expr› - ‹var› ) * x + y ( ‹expr› - ‹factor› ) * x + y ( ‹expr› - ‹term› ) * x + y ( ‹expr› ) * x + y ‹factor› * x + y ‹term› * x + y ‹term› * ‹var› + y ‹term› * ‹factor› + y ‹term› + y ‹expr› + y ‹expr› + ‹var› ‹expr› + ‹factor› ‹expr› + ‹term› ‹expr›

Page 22: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 22

And both side by side

( x - y ) * x + y ( ‹var› - y ) * x + y ( ‹factor› - y ) * x + y ( ‹term› - y ) * x + y ( ‹expr› - y ) * x + y ( ‹expr› - ‹var› ) * x + y ( ‹expr› - ‹factor› ) * x + y ( ‹expr› - ‹term› ) * x + y ( ‹expr› ) * x + y ‹factor› * x + y ‹term› * x + y ‹term› * ‹var› + y ‹term› * ‹factor› + y ‹term› + y ‹expr› + y ‹expr› + ‹var› ‹expr› + ‹factor› ‹expr› + ‹term› ‹expr›

‹expr› ‹expr› + ‹term› ‹term› + ‹term› ‹term› * ‹factor› + ‹term› ‹factor› * ‹factor› + ‹term› ( ‹expr› ) * ‹factor› + ‹term› ( ‹expr› - ‹term› ) * ‹factor› + ‹term› ( ‹term› - ‹term› ) * ‹factor› + ‹term› ( ‹factor› - ‹term› ) * ‹factor› + ‹term› ( ‹var› - ‹term› ) * ‹factor› + ‹term› ( x - ‹term› ) * ‹factor› + ‹term› ( x - ‹factor› ) * ‹factor› + ‹term› ( x - ‹var› ) * ‹factor› + ‹term› ( x - y ) * ‹factor› + ‹term› ( x - y ) * ‹var› + ‹term› ( x - y ) * x + ‹term› ( x - y ) * x + ‹factor› ( x - y ) * x + ‹var› ( x - y ) * x + y

Page 23: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 23

• In both derivations, guessing is required: which production should we choose to apply next?

• Strategies of choice are at the heart of parsing algorithms. Ideally, we would always guess correctly. Less ideally, we may have to try a production, fail, and return to try another.

• Both processes recognize the given sequence of symbols

( x - y ) * x + y

as an expression that is well-formed according to our grammar.

Is it really so easy?

Page 24: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 24

Note that we do not show in this tree the order in which productions have been applied during derivations.

The results of both derivations can be summarized in the same parse tree (or abstract syntax tree).

Parse trees<expr>

<expr> <term>+

<term>

<term> <factor>*

( )<expr>

<expr> <term>—

<term> <factor>

<factor> <var>

<factor>

<var>

y

x

x

<var>

y

<factor>

<var>

Page 25: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 25

Ambiguity

• A grammar is ambiguous when an expression defined by this grammar has more than one structurally different parse tree.

• For example, here is a grammar of arithmetic expressions:

‹E› ‹E› + ‹E› | ‹E› * ‹E› | ‹N›where ‹N› denotes any unsigned integer.

• The expression 6 * 17 + 23 has two different derivation trees.

Page 26: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 26

Two different parse trees...

<E>

<E> <E>*

<E> <E>+<N>

6 <N>

17

<N>

23

<E>

<E> <E>+

<E> <E>* <N>

23<N>

6

<N>

17

‹E› ‹E› + ‹E› | ‹E› * ‹E› | ‹N› 6 * 17 + 23

Page 27: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 27

... and their meaning...

• These trees represent two different ways of computing the value of the expression!

• Ambiguity should be avoided.

*

+6

17 23

+

* 23

6 17

6 * ( 17 + 23 ) ( 6 * 17 ) + 23

Page 28: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 28

• In our previous example, we should have written the usual two-level definition instead of a definition with + and * at the same level.

An expressions ‹E› is a sum of terms ‹T›.

A term is a product of numbers ‹N›.

‹E› ‹T› | <E> + <T>

‹T› ‹N› | <T> * <N>

... and what to do with ambiguity

Page 29: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 29

A long phrase...

• the dog

• the dog that chased the cat

• the dog that chased the cat that caught the mouse

• the dog that chased the cat that caught the mousethat chewed the shoe

• the dog that chased the cat that caught the mousethat chewed the shoe that squashed the fruit

• the dog that chased the cat that caught the mousethat chewed the shoe that squashed the fruitthat stained the chair

and so on...

Examples

Page 30: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 30

... a grammar of long phrases

<long phrase> the <noun> | the <noun> that <verb> <long phrase><noun> cat | chair | dog | fruit | mouse | shoe | ...<verb> caught | chased | chewed | squashed | stained | ...

Examples

Page 31: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 31

A clause...

the dog that chased the cat

that caught the mouse that chewed the shoe

that squashed the fruit that stained the chair

grabbed the sausage that tempted the wolf

that fought the fox that scared the squirrel

that bit the twig that cracked the nut

that hit the boy that lifted the hat

Examples

Page 32: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 32

... a grammar of clauses

<clause>

<longPhrase> <verb> <longPhrase>

<longPhrase>

the <noun>

the <noun> that <verb>

<longPhrase>

<noun> boy | cat | ...

<verb> bit | caught | ...(Add maybe 1500 rules and you will have

a reasonable grammar of English. )

Examples

Page 33: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 33

Simple lists in Scheme...

A list is either empty:

()

or it is a sequence of elements separated by blank spaces, all enclosed in parentheses:

( element ... element )

Each element is either a list, or an atom. An atom is an identifier made of small letters. We assume that a scanner converts a text on input into a sequence of tokens—atoms and parentheses.

Example: ( ab ( xyz br ) () ( no ) yes )

Examples

Page 34: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 34

... a grammar of lists

<list> () | ( <elements> )

<elements> <element> |

<element> <elements>

<element> <atom> | <list>

<atom> <letter> |

<letter> <atom>

<letter> a | b | c | ... | z

Examples

Page 35: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 35

A flower garden...

We have four kinds of things in our garden:

a wall

a large flower

a small flower

a house

Starting from the left, a garden has a wall, then at least one large flower, another wall, some small flowers (more than we have large ones) and finally a house.

Examples

Page 36: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 36

... and a few examples ...

a garden?

a garden?

a garden?

a garden?

Examples

Page 37: CSI 3125, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next

CSI 3125, Grammars, page 37

... a grammar of gardens

<garden>

<largeWallSmall> <moreSmall> <largeWallSmall>

|

<largeWallSmall> <moreSmall>

| <moreSmall>

Examples