CS 330 Programming Languages 09 / 13 / 2007 Instructor: Michael Eckmann


Michael Eckmann - Skidmore College - CS 330 - Fall 2007

Today’s Topics
• Questions/comments?
• Syntax & Semantics


Now Chapter 3 ...

• Describing a language
– How to give it a clear and precise definition so that implementers (compiler writers) will get it right
– How to describe it to users (programmers) of the language


Syntax and Semantics

• Describing a language involves both its syntax and its semantics.

• Syntax is the form; semantics is the meaning.
– e.g. English language example: "Time flies like an arrow."
– Syntactically correct, but it has 3 different meanings (semantics)

• It is easier to describe syntax formally than to describe semantics formally.


Syntax

• A language is a set of strings (or sentences or statements) of characters from an alphabet.

• Lexemes vs. tokens
– Tokens (more general) are categories of lexemes (more specific).
– e.g. Some tokens might be: identifier, int_literal, plus_op
– e.g. Some lexemes might be: idx, 42, +


Syntax

idx = 42 + count;

Lexemes    Tokens
idx        identifier
=          equal_sign
42         int_literal
+          plus_op
count      identifier
;          semicolon
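To make the lexeme/token split concrete, here is a minimal lexer sketch in Python that turns the statement above into (lexeme, token) pairs. The token names come from the slide; the regular expression chosen for each token, and the function name tokenize, are assumptions made for this illustration.

```python
import re

# Token names are from the slide; the patterns for each one are assumptions.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("equal_sign", r"="),
    ("plus_op", r"\+"),
    ("semicolon", r";"),
    ("skip", r"\s+"),  # whitespace separates lexemes but is not itself a token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (lexeme, token) pairs for the input string."""
    pairs = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            # lastgroup is the name of the alternative that matched, i.e. the token.
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs

for lexeme, token in tokenize("idx = 42 + count;"):
    print(f"{lexeme:6} {token}")
```

Each lexeme is the matched text and each token is the category it falls into, so idx and count are different lexemes but the same token.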


Syntax

• Recognizers and generators are used to define languages.

• Generators generate valid programs in a language.

• Recognizers determine whether or not a program is in the language (i.e., whether it is syntactically valid).

• Generators are studied in Chapter 3 (stuff coming up next in this lecture) and recognizers (parsers) in Chapter 4.

• How many valid programs are there for some particular language, say Java?
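The difference between a generator and a recognizer can be sketched on a tiny made-up language (a hypothetical example, not from the text): a "program" is one or more copies of the statement x;. The generator lists valid programs one after another; the recognizer answers yes or no for a given string.

```python
from itertools import count, islice

# Hypothetical toy language: one or more copies of the statement "x;".

def generate_programs():
    """Generator: yields every valid program, shortest first, without end."""
    for n in count(1):
        yield "x;" * n

def recognize(program):
    """Recognizer: True iff program is in the toy language."""
    if program == "":
        return False
    # Valid iff the string splits exactly into "x;" pieces.
    return len(program) % 2 == 0 and program == "x;" * (len(program) // 2)

print(list(islice(generate_programs(), 3)))  # ['x;', 'x;x;', 'x;x;x;']
print(recognize("x;x;"), recognize("x;x"))  # True False
```

Because generate_programs never runs out, even this toy language has infinitely many valid programs, which hints at the answer to the question above for a real language like Java.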


Syntax

• Context-free grammars (CFGs), developed by Noam Chomsky, are essentially equivalent to Backus-Naur Form (BNF), developed by John Backus and Peter Naur.

• Both are used to describe syntax.

• These are metalanguages (languages used to describe other languages).


Syntax

• A context-free grammar is a four-tuple (T, N, S, P) where
– T is the set of terminal symbols
– N is the set of nonterminal symbols
– S is the start symbol (which is one of the nonterminals)
– P is the set of productions of the form:
• A -> X1 ... Xm where
– A is an element of N
– each Xi is an element of N ∪ T, 1 <= i <= m, m >= 0 (m = 0 gives an empty right-hand side)
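The four-tuple definition can be written down almost literally as data. This Python sketch stores (T, N, S, P) and checks the conditions listed above, using the <ident_list> rules as the example grammar. The class and function names are hypothetical, and the check that T and N are disjoint is a standard assumption that the slide does not state explicitly.

```python
from typing import NamedTuple

class CFG(NamedTuple):
    T: frozenset  # terminal symbols
    N: frozenset  # nonterminal symbols
    S: str        # start symbol
    P: tuple      # productions, each a pair (A, (X1, ..., Xm))

def is_well_formed(g: CFG) -> bool:
    """Check the conditions in the four-tuple definition."""
    if g.T & g.N:         # assumption: no symbol is both terminal and nonterminal
        return False
    if g.S not in g.N:    # S is one of the nonterminals
        return False
    symbols = g.N | g.T
    for lhs, rhs in g.P:
        if lhs not in g.N:                        # A is an element of N
            return False
        if any(x not in symbols for x in rhs):    # each Xi is in N U T; rhs may be empty
            return False
    return True

# <ident_list> -> ident | ident , <ident_list>
ident_list = CFG(
    T=frozenset({"ident", ","}),
    N=frozenset({"<ident_list>"}),
    S="<ident_list>",
    P=(
        ("<ident_list>", ("ident",)),
        ("<ident_list>", ("ident", ",", "<ident_list>")),
    ),
)

# A production whose LHS is a terminal is not allowed.
bad = ident_list._replace(P=ident_list.P + (("ident", ("<ident_list>",)),))

print(is_well_formed(ident_list), is_well_formed(bad))  # True False
```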


Syntax

• How are CFGs used to describe the syntax of a programming language?
– The nonterminals are abstractions
– The terminals are tokens and lexemes
– The productions are used to describe programs, individual statements, expressions, etc.


Syntax

• Example production:

<while_stmt> -> while ( <logic_expr> ) <stmt>

• Everything to the left of the arrow is the left-hand side (LHS); everything to the right is the right-hand side (RHS).

• The LHS must be exactly one nonterminal.

• Multiple RHSs for the same LHS are separated by the | ("or") symbol, e.g.

<compound_stmt> -> <single_stmt> ;
                 | { <stmt_list> }


Syntax

• Recursion is allowed in productions, e.g.

<ident_list> -> ident
              | ident , <ident_list>
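A recursive production maps naturally onto a recursive function. Below is a minimal recognizer sketch for <ident_list>, assuming the input has already been turned into tokens, where every identifier appears as the token "ident". The function names are hypothetical.

```python
def parse_ident_list(tokens, pos=0):
    """Try to match <ident_list> starting at tokens[pos].
    Returns the position just after the match, or None if there is no match."""
    # Both alternatives start with ident.
    if pos >= len(tokens) or tokens[pos] != "ident":
        return None
    pos += 1
    # Second alternative: ident , <ident_list>  (the recursive case)
    if pos < len(tokens) and tokens[pos] == ",":
        return parse_ident_list(tokens, pos + 1)
    # First alternative: just ident  (the base case)
    return pos

def is_ident_list(tokens):
    """Recognizer: True iff the whole token list is an <ident_list>."""
    return parse_ident_list(tokens) == len(tokens)

print(is_ident_list(["ident", ",", "ident", ",", "ident"]))  # True
print(is_ident_list(["ident", ","]))                          # False
```

The recursive call plays the role of <ident_list> on the right-hand side; the base case keeps the recursion from going on forever.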


Syntax

• An example grammar:

<program> -> <stmts>
<stmts> -> <stmt> | <stmt> ; <stmts>
<stmt> -> <var> = <expr>
<var> -> a | b | c | d
<expr> -> <term> + <term>
        | <term> - <term>
<term> -> <var> | const


Syntax

• Derivations are repeated applications of production rules.

• An example derivation:

<program> => <stmts>
          => <stmt>
          => <var> = <expr>
          => a = <expr>
          => a = <term> + <term>
          => a = <var> + <term>
          => a = b + <term>
          => a = b + const


Syntax

• Every string of symbols in a derivation is a sentential form.

• A sentence is a sentential form that contains only terminal symbols.

• A leftmost derivation is one in which, at each step, the leftmost nonterminal in the current sentential form is the one that gets expanded.
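The example grammar and a leftmost derivation can be sketched in Python. Each step finds the leftmost nonterminal and replaces it with one of its right-hand sides; the list of alternative indices that picks the RHS at each step is an encoding made up for this sketch. With the choices below it reproduces the example derivation of a = b + const.

```python
# The example grammar: nonterminal -> list of alternative RHSs.
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>": [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<var>": [["a"], ["b"], ["c"], ["d"]],
    "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["const"]],
}

def is_sentence(form):
    """A sentential form is a sentence if it contains only terminals."""
    return all(symbol not in GRAMMAR for symbol in form)

def leftmost_derivation(choices, start="<program>"):
    """Apply one production per step to the leftmost nonterminal.
    choices[k] is the index of the RHS alternative used at step k.
    Returns every sentential form, starting with [start]."""
    form = [start]
    forms = [form]
    for choice in choices:
        i = next((i for i, s in enumerate(form) if s in GRAMMAR), None)
        if i is None:
            raise ValueError("already a sentence; nothing left to expand")
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        forms.append(form)
    return forms

forms = leftmost_derivation([0, 0, 0, 0, 0, 0, 1, 1])
print(" => ".join(" ".join(f) for f in forms))
```

Every entry of forms is a sentential form; only the last one satisfies is_sentence.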


Syntax / Parse Trees

• A parse tree is a hierarchical representation of a derivation (parse trees also hold some semantic information). The parse tree for a = b + const:

<program>
  <stmts>
    <stmt>
      <var>
        a
      =
      <expr>
        <term>
          <var>
            b
        +
        <term>
          const
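A parse tree is just a tree whose interior nodes are nonterminals and whose leaves are terminals. This Python sketch (the Node class is a hypothetical helper) builds the tree for a = b + const by hand and reads the leaves from left to right, which gives back the derived sentence.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    symbol: str
    children: list = field(default_factory=list)

    def leaves(self):
        """Terminals at the leaves, read left to right (the derived sentence)."""
        if not self.children:
            return [self.symbol]
        return [leaf for child in self.children for leaf in child.leaves()]

    def show(self, depth=0):
        """Print the tree, one symbol per line, indented by depth."""
        print("  " * depth + self.symbol)
        for child in self.children:
            child.show(depth + 1)

tree = Node("<program>", [
    Node("<stmts>", [
        Node("<stmt>", [
            Node("<var>", [Node("a")]),
            Node("="),
            Node("<expr>", [
                Node("<term>", [Node("<var>", [Node("b")])]),
                Node("+"),
                Node("<term>", [Node("const")]),
            ]),
        ]),
    ]),
])

tree.show()
print(" ".join(tree.leaves()))  # a = b + const
```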


An Ambiguous Expression Grammar

<expr> -> <expr> <op> <expr> | const
<op> -> / | -

[Figure: two distinct parse trees for the same sentence const - const / const; one groups it as (const - const) / const, the other as const - (const / const).]
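To see why two parse trees matter, this Python sketch enumerates every parse tree that the ambiguous grammar allows for a sentence, then evaluates each one. The concrete numbers 8, 4 and 2 stand in for the three consts; they, and the function names, are hypothetical choices for illustration.

```python
from fractions import Fraction

def parse_trees(tokens):
    """All parse trees for tokens under <expr> -> <expr> <op> <expr> | const.
    tokens alternates operands and operators, e.g. [8, "-", 4, "/", 2]."""
    if len(tokens) == 1:
        return [tokens[0]]            # <expr> -> const
    trees = []
    # Any operator can be the <op> at the root of the tree.
    for i in range(1, len(tokens), 2):
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((left, tokens[i], right))
    return trees

def evaluate(tree):
    """Evaluate a tree built from nested (left, op, right) tuples."""
    if not isinstance(tree, tuple):
        return Fraction(tree)
    left, op, right = tree
    a, b = evaluate(left), evaluate(right)
    return a - b if op == "-" else a / b

for tree in parse_trees([8, "-", 4, "/", 2]):
    print(tree, "=", evaluate(tree))
```

The same sentence yields two trees with two different values (6 and 2), so a compiler that picked either tree would be giving the code a different meaning.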


This one is now unambiguous

• Ambiguity is bad for compilers, so the language description should be unambiguous.

<expr> -> <expr> - <term> | <term>
<term> -> <term> / const | const

• The compiler examines the parse tree to determine the code to generate. If the same code has two parse trees, its meaning (semantics) is not unique.
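With this grammar every sentence has one parse tree. Because / only appears in <term>, which sits below <expr> in the tree, / binds more tightly than -; and because both rules are left-recursive, both operators group left to right. A direct recursive-descent parser would loop forever on left recursion, so this Python sketch rewrites each left-recursive rule as a loop, which keeps the left-to-right grouping. Tokens are assumed to be numbers standing in for const, plus the strings "-" and "/"; evaluate_expr is a hypothetical name.

```python
from fractions import Fraction

def evaluate_expr(tokens):
    """Evaluate tokens such as [8, "-", 4, "/", 2] using
    <expr> -> <expr> - <term> | <term>
    <term> -> <term> / const | const
    with the left recursion rewritten as loops."""
    pos = 0

    def const():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in ("-", "/"):
            raise SyntaxError("expected const")
        pos += 1
        return Fraction(tokens[pos - 1])

    def term():
        nonlocal pos
        value = const()
        while pos < len(tokens) and tokens[pos] == "/":   # <term> / const
            pos += 1
            value /= const()
        return value

    def expr():
        nonlocal pos
        value = term()
        while pos < len(tokens) and tokens[pos] == "-":   # <expr> - <term>
            pos += 1
            value -= term()
        return value

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result

print(evaluate_expr([8, "-", 4, "/", 2]))  # 6
print(evaluate_expr([8, "-", 4, "-", 2]))  # 2
```

The first result shows / taking precedence over -; the second shows - grouping as (8 - 4) - 2 rather than 8 - (4 - 2).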


Ambiguous?

• Look at the if statement rules below:

<if_stmt> -> if <logic_expr> then <stmt>
           | if <logic_expr> then <stmt> else <stmt>
<stmt> -> <if_stmt> | ...

• Do you think this is ambiguous? That is, can more than one parse tree be generated from the same code?

• if (a==b) then if (c==d) then print_something() else print_something_else()


Ambiguous?

if (a==b) then if (c==d) then print_something() else print_something_else()

can be parsed in two ways:

if (a==b) then
    if (c==d) then
        print_something()
    else
        print_something_else()

or

if (a==b) then
    if (c==d) then
        print_something()
else
    print_something_else()
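One common way to resolve this ambiguity, used for example by C and Java, is to match each else with the nearest unmatched if. This Python sketch is a hand-written recursive-descent parser that does that greedily; it is not the unambiguous grammar from page 131. As a simplifying assumption, each condition and each simple statement is a single token, and parse_stmt is a hypothetical name.

```python
def parse_stmt(tokens, pos=0):
    """Parse one <stmt> starting at tokens[pos]; return (tree, next_pos).
    An if statement becomes ("if", cond, then_part, else_part_or_None)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] != "if":
        return tokens[pos], pos + 1                 # a simple statement
    if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
        raise SyntaxError("expected 'then'")
    cond = tokens[pos + 1]
    then_part, pos = parse_stmt(tokens, pos + 3)
    # Greedy choice: an else seen here is taken by the innermost if
    # still being parsed, i.e. the nearest unmatched if.
    if pos < len(tokens) and tokens[pos] == "else":
        else_part, pos = parse_stmt(tokens, pos + 1)
        return ("if", cond, then_part, else_part), pos
    return ("if", cond, then_part, None), pos

tokens = ["if", "a==b", "then", "if", "c==d", "then",
          "print_something()", "else", "print_something_else()"]
tree, end = parse_stmt(tokens)
print(tree)
```

The result corresponds to the first of the two readings above: the else belongs to if (c==d), and the outer if has no else.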


Ambiguous?

• To see how to make it unambiguous, take a look at page 131 in our text.


Extended BNF

• So far the examples we've seen have used BNF. Common extensions to BNF include:
– Use of square brackets [ ] to enclose optional parts of RHSs.
– Use of braces { } to enclose parts of RHSs that can be repeated indefinitely or left out. That is, the part in the braces may be repeated 0 or more times.
– Use of parentheses ( ) around a group of items of which one is chosen. The items are separated by |.


Extended BNF

• It should be obvious that these new symbols are neither terminal symbols in the language nor nonterminals.

• If the language does require brackets, braces, or parentheses as terminal symbols (as many languages do), they have to be marked in some way, such as by underlining them, to differentiate them from the EBNF symbols.

• What good are these extensions?
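One possible answer, sketched in Python: the recursive BNF rule for identifier lists can be written in EBNF as <ident_list> -> ident { , ident }, which is shorter and says directly "one or more identifiers separated by commas". In a hand-written recognizer the braces turn into a while loop instead of a recursive call. As before, identifiers are assumed to arrive as the token "ident", and the function name is hypothetical.

```python
def is_ident_list(tokens):
    """Recognize <ident_list> -> ident { , ident } (EBNF).
    The { } part becomes a while loop instead of recursion."""
    if not tokens or tokens[0] != "ident":
        return False
    pos = 1
    while pos < len(tokens) and tokens[pos] == ",":       # { , ident }
        if pos + 1 >= len(tokens) or tokens[pos + 1] != "ident":
            return False
        pos += 2
    return pos == len(tokens)

print(is_ident_list(["ident", ",", "ident", ",", "ident"]))  # True
print(is_ident_list(["ident", ","]))                          # False
```

In the same spirit, an optional part in [ ] would become a single if test.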


CFGs and Recognizers

• Given a formal description of a language, a recognizer (syntax analyzer, aka parser) for that language can be constructed algorithmically. Therefore, a program can be written to do this construction; yacc (a parser generator) is an example of one.


A more complex grammar

• Let's take a look at the handout for the mini-pascal language.
• Let's try to determine whether some programs are syntactically correct.


Attribute Grammars

• An attribute grammar is an extension to a CFG.
• There are some rules of programming languages that cannot be specified in BNF (or by a CFG, for that matter).
– e.g. All variables must be declared before they are used.
• Also, there are things that are possible, but just too hairy to specify using CFGs, so attribute grammars are used.

• These kinds of things that cannot be specified using CFGs are termed “static semantics.”

• This is a bit of a misnomer because they are really still syntax rules, not semantics.
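The declare-before-use rule needs information that context-free productions do not carry: whether a use is legal depends on which declarations came earlier. This Python sketch passes that information along explicitly, as a set of declared names threaded from statement to statement, much like an attribute. The statement representation and the function name are hypothetical.

```python
def check_declared_before_use(stmts):
    """Return the names that are used before being declared.
    Each statement is a hypothetical tuple: ("decl", name) or ("use", names)."""
    declared = set()   # carried from one statement to the next, like an attribute
    errors = []
    for kind, arg in stmts:
        if kind == "decl":
            declared.add(arg)
        elif kind == "use":
            errors.extend(name for name in arg if name not in declared)
        else:
            raise ValueError(f"unknown statement kind {kind!r}")
    return errors

program = [("decl", "x"), ("use", ["x"]), ("use", ["x", "y"]), ("decl", "y")]
print(check_declared_before_use(program))  # ['y']
```

The program is syntactically fine statement by statement; only the extra context reveals that y is used too early.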


Attribute Grammars

• An attribute grammar is a CFG (T, N, S, P) with the following additions:
– For each grammar symbol x there is a set A(x) of attribute values
– Each rule has a set of functions that define certain attributes of the nonterminals in the rule
– Each rule has a (possibly empty) set of predicates to check for attribute consistency

• Proposed by Knuth in 1968.


Attribute Grammars

• The example on page 138 shows the use of an attribute grammar to enhance the BNF of an assignment statement with rules that specify which types can be assigned to each other.

• e.g. A float (real) value cannot be assigned to a variable whose type is int, but the opposite is allowed.

• Also, the example shows how one can determine the resulting type of an expression.
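A rough Python sketch of the idea, not the textbook's grammar: the expression's actual type is computed bottom-up from its variables (a semantic function), the target variable's type is passed down as the expected type, and a predicate checks that the two are compatible. It assumes a hypothetical grammar <assign> -> <var> = <expr>, <expr> -> <var> + <var> | <var>, a made-up symbol table, and the rule that int + int gives int while any other combination gives real. The assignment rule follows the slide: int may go into real, but not the reverse.

```python
# Hypothetical grammar for this sketch:
#   <assign> -> <var> = <expr>
#   <expr>   -> <var> + <var> | <var>
SYMBOL_TABLE = {"A": "int", "B": "real", "C": "int"}   # hypothetical declarations

def expr_actual_type(operands):
    """Semantic function for <expr>.actual_type, computed from the <var>s below it.
    Assumption: int + int is int; any other combination is real."""
    types = [SYMBOL_TABLE[v] for v in operands]
    return "int" if all(t == "int" for t in types) else "real"

def assignment_ok(target, operands):
    """Predicate for <assign>: can the expression's type be assigned to the target?"""
    expected = SYMBOL_TABLE[target]      # <expr>.expected_type, passed down from <var>
    actual = expr_actual_type(operands)
    return actual == expected or (actual == "int" and expected == "real")

print(assignment_ok("B", ["A", "C"]))  # True:  int -> real is allowed
print(assignment_ok("A", ["A", "B"]))  # False: real -> int is rejected
```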


Attribute Grammars

• I'm not concerned with us knowing all the ins and outs of attribute grammars; what I feel is important is understanding the general concepts involved and their intended purpose.

• Attribute grammars are generally not used in practice for a few reasons. Can you guess them?


– The size and complexity of the grammar would be high for a typical modern programming language
– The many attributes and rules that need to be added make the formal grammar difficult to read and write
– Evaluating the attribute values during parsing would be costly (the way it is described in the text)

• So, in practice, less formal ways are used to check for “static semantics” at compile time, but the ideas are the same.


Practice Problems

• Before moving on to our discussion of formally describing the semantics of a language, let's take a look at problems 8, 10, and 11 on pages 163-164.