course: ics313 fundamentals of programming languages. instructor: abdul wahid wali [email protected]...

Course: ICS313 Fundamentals of Programming Languages.

Instructor:

Abdul Wahid [email protected]

Lecturer, College of Computer Science and Engineering

University of Hail


Chapter 3: Describing Syntax and Semantics

Objectives:

- Introduction
- The General Problem of Describing Syntax
- Formal Methods of Describing Syntax
- Attribute Grammars
- Describing the Meanings of Programs: Dynamic Semantics

Introduction

Who must use language definitions?

- Language designers
- Implementers
- Programmers (the users of the language)

A formal description of a language is essential for learning, writing, and implementing it, so the description must be both precise and understandable.

A language is a set of sentences (or statements). The sentences are the valid strings of the language: each one is a sequence of characters from the language's alphabet, arranged in a way that is consistent with the grammar of the language.

- Languages are described by their syntax and their semantics.
- Syntax describes what the language looks like; semantics determines, in a formal way, what a particular construct actually does.
- Syntax is much easier to describe than semantics.

What is syntax? The form or structure of the expressions, statements, and program units. The syntax rules of a language specify which strings of characters from the language's alphabet are in the language.

What is semantics? The meaning of the expressions, statements, and program units.

3.2 The General Problem of Describing Syntax: Terminology

For simplicity's sake, formal descriptions of the syntax of programming languages often do not include descriptions of the lowest-level syntactic units. These small units are called lexemes.

A lexeme is the basic unit handled during lexical analysis: a string of characters from the language's alphabet that belongs together (e.g., a numeric literal, an operator, or a special word).

Lexemes are partitioned into groups, such as the names of variables, methods, and classes. Each group is represented by a name, called a token. A token is a category of lexemes, and a lexeme is an instance of a token (e.g., the lexeme index is an instance of the token identifier).

Example: consider the following Java statement:

    index = 2 * count + 17;

The lexemes and tokens of this statement are shown in the following table:

    Lexemes    Tokens
    index      Identifier
    =          Equal_sign
    2          Int_literal
    *          Mult_op
    count      Identifier
    +          Plus_op
    17         Int_literal
    ;          Semicolon
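The split of the example statement into lexemes and tokens can be reproduced with a small lexical analyzer. The following is a minimal Python sketch, not the course's implementation: the regular expressions, the token name Plus_op, and the function name tokenize are assumptions made for illustration.

```python
import re

# Token names follow the lexeme/token table; Plus_op and all the patterns
# below are assumptions of this sketch, not definitions from the slides.
TOKEN_SPEC = [
    ("Int_literal", r"\d+"),
    ("Identifier",  r"[A-Za-z_]\w*"),
    ("Equal_sign",  r"="),
    ("Mult_op",     r"\*"),
    ("Plus_op",     r"\+"),
    ("Semicolon",   r";"),
    ("SKIP",        r"\s+"),   # whitespace separates lexemes but is not a token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (lexeme, token) pairs for the source string."""
    pairs = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            pairs.append((m.group(), m.lastgroup))
        pos = m.end()
    return pairs

for lexeme, token in tokenize("index = 2 * count +17;"):
    print(f"{lexeme:<6} {token}")
```

Each regular expression describes one token (a category), and every string it matches in the input is a lexeme (an instance of that category).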

In general, languages can be formally defined in two distinct ways: A language can be (1) generated or (2) recognized.

3.2.1 Language Recognizers

A recognizer of a language separates the strings that are in the language from those that are not. The lexical analyzer and the parser of a compiler are recognizers for the language the compiler translates: the lexical analyzer recognizes tokens, and the parser recognizes the syntactic structure.

3.2.2 Language Generators

A generator produces the valid sentences of a language. In some cases it is more useful than a recognizer, since we can “watch and learn” from the sentences it generates.

3.3 Formal Methods of Describing Syntax

The formal language used to describe the syntax of programming languages is called a grammar.

3.3.1 BNF (Backus-Naur Form) and Context-Free Grammars

BNF is a widely accepted way to describe the syntax of a programming language.

3.3.1.1 Context-Free Grammars

Regular expressions and context-free grammars are both useful in describing the syntax of a programming language. A regular expression describes how a token is formed from characters of the alphabet, and a context-free grammar describes how tokens are put together to form a valid sentence.

3.3.1.2 Origins of Backus-Naur Form (BNF)

John Backus introduced the notation to describe ALGOL 58, and Peter Naur modified it slightly to describe ALGOL 60. BNF is nearly identical to context-free grammars.

3.3.1.3 BNF Fundamentals

- BNF is a metalanguage, i.e., a language used to describe another language, in this case a programming language.
- A BNF description consists of rules (or productions). A rule has a left-hand side (LHS), which is the abstraction being defined, and a right-hand side (RHS), which is its definition. For example, a Java assignment statement could be defined as follows:

    <assign> → <var> = <expression>

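The LHS is a single non-terminal. The RHS is a sequence of terminals (lexemes and tokens) and non-terminals.

- The rule indicates that wherever the non-terminal on the LHS appears, it can be replaced with the RHS, just like expanding a non-terminal into its children in a tree.

3.3.1.4 Describing Lists

Recursion is used in BNF to describe lists: a rule is recursive if its LHS also appears in its RHS. For example:

    <ident_list> → identifier | identifier , <ident_list>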

3.3.1.5 Grammars and Derivations

- BNF is a generator of the language. The sentences of a language can be generated from the start symbol by applying a series of rules on it. The process of starting from the start symbol to the final sentence is called a derivation.

- Replacing a non-terminal with different RHSs may derive different sentences.
- Each string of symbols in a derivation, including the start symbol and the final sentence, is a sentential form. If the leftmost non-terminal of the sentential form is the one replaced at every step, the derivation is a leftmost derivation. The derivation order has no effect on the set of sentences the grammar generates.

Example 3.1: A grammar for a small language

    <program> → begin <stmt_list> end
    <stmt_list> → <stmt>
                | <stmt> ; <stmt_list>
    <stmt> → <var> = <expression>
    <var> → A | B | C
    <expression> → <var> + <var>
                 | <var> - <var>
                 | <var>

A derivation of a program in this language follows:

    <program> => begin <stmt_list> end
              => begin <stmt> ; <stmt_list> end
              => begin <var> = <expression> ; <stmt_list> end
              => begin A = <expression> ; <stmt_list> end
              => begin A = <var> + <var> ; <stmt_list> end
              => begin A = B + <var> ; <stmt_list> end
              => begin A = B + C ; <stmt_list> end
              => begin A = B + C ; <stmt> end
              => begin A = B + C ; <var> = <expression> end
              => begin A = B + C ; B = <expression> end
              => begin A = B + C ; B = <var> end
              => begin A = B + C ; B = C end
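A leftmost derivation like the one above can be carried out mechanically. The sketch below encodes the grammar of Example 3.1 as a Python dictionary and replaces the leftmost non-terminal at each step. The names GRAMMAR, CHOICES, and leftmost_derivation are hypothetical, and the list of choices (which alternative RHS to use at each step) is supplied by hand to match the derivation shown.

```python
# Grammar of Example 3.1: each non-terminal maps to its alternative RHSs.
GRAMMAR = {
    "<program>":    [["begin", "<stmt_list>", "end"]],
    "<stmt_list>":  [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
    "<stmt>":       [["<var>", "=", "<expression>"]],
    "<var>":        [["A"], ["B"], ["C"]],
    "<expression>": [["<var>", "+", "<var>"], ["<var>", "-", "<var>"], ["<var>"]],
}

def leftmost_derivation(start, choices):
    """Yield each sentential form; choices[i] selects the RHS used at step i."""
    form = [start]
    yield " ".join(form)
    for choice in choices:
        # Locate the leftmost non-terminal in the current sentential form.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        yield " ".join(form)

# One alternative per derivation step (indices into the RHS lists above).
CHOICES = [0, 1, 0, 0, 0, 1, 2, 0, 0, 1, 2, 2]

for step in leftmost_derivation("<program>", CHOICES):
    print("=>", step)
```

Changing an entry in CHOICES selects a different RHS and therefore derives a different sentence, which is the sense in which the grammar acts as a generator.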


Example 3.2: A grammar for simple assignment statements

    <assign> → <id> = <expr>
    <id> → A | B | C
    <expr> → <id> + <expr>
           | <id> * <expr>
           | ( <expr> )
           | <id>

The statement A = B * (A + C) is generated by the leftmost derivation:

    <assign> => <id> = <expr>
             => A = <expr>
             => A = <id> * <expr>
             => A = B * <expr>
             => A = B * ( <expr> )
             => A = B * ( <id> + <expr> )
             => A = B * ( A + <expr> )
             => A = B * ( A + <id> )
             => A = B * ( A + C )

3.3.1.6 Parse Trees

A derivation can be represented by a hierarchical structure called a parse tree. The root of the tree is the start symbol, and applying a rule in the derivation corresponds to expanding a non-terminal in the tree into its children.

A parse tree for the simple statement A = B * (A + C), with children indented under their parent:

    <assign>
        <id>
            A
        =
        <expr>
            <id>
                B
            *
            <expr>
                (
                <expr>
                    <id>
                        A
                    +
                    <expr>
                        <id>
                            C
                )
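The parse tree for A = B * (A + C) can also be built by a program. The following is a sketch of a recursive-descent parser for the grammar of Example 3.2, with one function per non-terminal; the function names and the nested-tuple tree representation are assumptions made for illustration, and the input is expected to be already split into tokens.

```python
def parse_assign(tokens):
    """Parse <assign> -> <id> = <expr> and return a nested-tuple parse tree."""
    tree, rest = _assign(list(tokens))
    if rest:
        raise SyntaxError(f"unexpected tokens: {rest}")
    return tree

def _id(toks):
    # <id> -> A | B | C
    if toks and toks[0] in ("A", "B", "C"):
        return ("<id>", toks[0]), toks[1:]
    raise SyntaxError(f"expected A, B, or C, got {toks[:1]}")

def _assign(toks):
    left, toks = _id(toks)
    if not toks or toks[0] != "=":
        raise SyntaxError("expected '='")
    right, toks = _expr(toks[1:])
    return ("<assign>", left, "=", right), toks

def _expr(toks):
    # <expr> -> ( <expr> )
    if toks and toks[0] == "(":
        inner, toks = _expr(toks[1:])
        if not toks or toks[0] != ")":
            raise SyntaxError("expected ')'")
        return ("<expr>", "(", inner, ")"), toks[1:]
    # <expr> -> <id> + <expr> | <id> * <expr> | <id>
    ident, toks = _id(toks)
    if toks and toks[0] in ("+", "*"):
        op = toks[0]
        right, toks = _expr(toks[1:])
        return ("<expr>", ident, op, right), toks
    return ("<expr>", ident), toks

def show(node, depth=0):
    """Print the tree with one level of indentation per tree level."""
    if isinstance(node, str):
        print("    " * depth + node)
    else:
        print("    " * depth + node[0])
        for child in node[1:]:
            show(child, depth + 1)

tree = parse_assign("A = B * ( A + C )".split())
show(tree)
```

Each call to a parsing function corresponds to expanding one non-terminal, so the nesting of the returned tuples mirrors the nesting of the parse tree.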

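3.3.1.7 Ambiguity

A grammar is ambiguous if it generates a sentence that has two or more distinct parse trees (equivalently, two or more distinct leftmost derivations).

Two distinct parse trees for the same sentence, A = B + A * C, with children indented under their parent. Both trees use the rules <expr> → <expr> + <expr>, <expr> → <expr> * <expr>, and <expr> → <id>.

Tree 1 (the addition is at the root of the expression):

    <assign>
        <id>
            A
        =
        <expr>
            <expr>
                <id>
                    B
            +
            <expr>
                <expr>
                    <id>
                        A
                *
                <expr>
                    <id>
                        C

Tree 2 (the multiplication is at the root of the expression):

    <assign>
        <id>
            A
        =
        <expr>
            <expr>
                <expr>
                    <id>
                        B
                +
                <expr>
                    <id>
                        A
            *
            <expr>
                <id>
                    C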
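Ambiguity can be checked for a particular sentence by counting its parse trees. The sketch below does this for the expression part of the sentence, B + A * C, assuming the ambiguous rules <expr> → <expr> + <expr> | <expr> * <expr> | <id> that the two trees use. The function name count_parse_trees is hypothetical.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees under <expr> -> <expr> + <expr> | <expr> * <expr> | <id>."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of distinct parse trees deriving toks[lo:hi] from <expr>.
        if hi - lo == 1:
            return 1 if toks[lo] in ("A", "B", "C") else 0
        total = 0
        for k in range(lo + 1, hi - 1):
            if toks[k] in ("+", "*"):
                # Treat toks[k] as the operator at the root of this subtree.
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(toks))

print(count_parse_trees("B + A * C".split()))  # prints 2
```

A count greater than 1 for any sentence shows that the grammar is ambiguous; here the two trees correspond to choosing + or * as the operator at the root.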


3.3.1.8 Operator Precedence

Operator precedence can be built into a grammar by arranging the rules so that operators with higher precedence are grouped with their operands first, which places them lower in the parse tree.

Associativity of Operators

Operator associativity can also be indicated by a grammar:

    <expr> → <expr> + <expr> | const    (ambiguous)
    <expr> → <expr> + const | const     (unambiguous)

With the unambiguous, left-recursive rule, const + const + const has a single parse tree, in which the leftmost addition is grouped first:

    <expr>
        <expr>
            <expr>
                const
            +
            const
        +
        const

3.3.2 Extended BNF (EBNF)

- EBNF has the same expressive power as BNF.
- The extensions add optional parts, repetition, and multiple-choice alternatives, very much like regular expressions.
- Optional parts are placed in brackets ([ ]):

    <proc_call> → ident [ ( <expr_list> ) ]

- Alternative parts of RHSs are placed in parentheses and separated by vertical bars:

    <term> → <term> (+ | -) const

- Repetitions (0 or more) are placed inside braces ({ }):

    <ident> → letter {letter | digit}

The same expression grammar in both notations:

BNF

    <expr> → <expr> + <term> | <expr> - <term> | <term>
    <term> → <term> * <factor> | <term> / <factor> | <factor>

EBNF

    <expr> → <term> {(+ | -) <term>}
    <term> → <factor> {(* | /) <factor>}
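The EBNF form maps directly onto code: each rule becomes a function and each pair of braces becomes a loop. The following Python sketch evaluates expressions this way. The slides do not define <factor>, so the rule <factor> → int_literal | ( <expr> ) is an assumption of this example, as are the function names; characters the tokenizer does not recognize are silently skipped.

```python
import re

def evaluate(source):
    """Evaluate an arithmetic expression with one function per EBNF rule."""
    tokens = re.findall(r"\d+|[-+*/()]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():                      # <expr> -> <term> {(+ | -) <term>}
        value = term()
        while peek() in ("+", "-"):  # the braces become a loop
            if advance() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():                      # <term> -> <factor> {(* | /) <factor>}
        value = factor()
        while peek() in ("*", "/"):
            if advance() == "*":
                value *= factor()
            else:
                value /= factor()
        return value

    def factor():                    # assumed: <factor> -> int_literal | ( <expr> )
        tok = advance() if peek() is not None else None
        if tok == "(":
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result

print(evaluate("2 + 3 * 4"))
```

Because term() is called from inside expr(), multiplication and division are grouped before addition and subtraction (precedence), and because each loop folds operands in from the left, 10 - 4 - 3 is computed as (10 - 4) - 3 (left associativity).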


3.3.4 Attribute Grammars

An attribute grammar is an extension of a context-free grammar. It is used to describe more of the structure of a programming language than a context-free grammar can. The extension allows certain language rules, such as type compatibility, to be described.

3.3.4.1 Static Semantics

- Many restrictions in programming languages are difficult or impossible to describe in BNF, but they can be described once attributes are added to the terminals and non-terminals of the grammar.
- These attributes and their values can be computed at compile time, hence the name static semantics.


- Consider the rule that a variable must be declared before it is referenced.
  -- It cannot be specified in a context-free grammar.
  -- It can be checked at compile time.
- Some rules can be specified in the grammar of a language but would unnecessarily complicate it, e.g., the rule in Java that a string literal cannot be assigned to a variable declared to be of type int.


3.3.4.2 Basic Concepts

An attribute grammar is a grammar to which attributes, attribute computation functions, and predicate functions have been added:

- Attributes are associated with grammar symbols. They are similar to variables in the sense that they can have values assigned to them.
- Attribute computation functions (also called semantic functions) are associated with grammar rules. They specify how attribute values are computed.
- Predicate functions are also associated with grammar rules. They state some of the syntax and static semantic rules of the language, and so determine whether a rule may be applied.

3.3.4.3 Attribute Grammars Defined

Definition: An attribute grammar is a context-free grammar (CFG) with the following additions:

- For each grammar symbol X there is a set A(X) of attribute values.
- Each rule has a set of functions that define certain attributes of the non-terminals in the rule.
- Each rule has a set of predicates to check for attribute consistency.

- Associated with each grammar symbol X is a set of attributes A(X).
- A(X) consists of two disjoint sets, S(X) and I(X), called the synthesized and inherited attributes.
- Synthesized attributes pass semantic information up a parse tree.
- Inherited attributes pass semantic information down a parse tree.
- Let X0 → X1 ... Xn be a rule.
- Functions of the form S(X0) = f(A(X1), ..., A(Xn)) define synthesized attributes.
- Functions of the form I(Xj) = f(A(X0), ..., A(Xn)), for 1 <= j <= n, define inherited attributes.

Example: expressions of the form id + id

- The ids can be either int_type or real_type.
- The types of the two ids must be the same.
- The type of the expression must match its expected type.

BNF:

    <expr> → <var> + <var>
    <var> → id

Attributes:

- actual_type: a synthesized attribute associated with the non-terminals <var> and <expr>.
- expected_type: an inherited attribute associated with the non-terminal <expr>.

An attribute grammar for simple assignment statements

1. Syntax rule: <assign> → <var> = <expr>
   Semantic rule: <expr>.expected_type ← <var>.actual_type

2. Syntax rule: <expr> → <var>[2] + <var>[3]
   Semantic rule: <expr>.actual_type ←
       if (<var>[2].actual_type = int) and (<var>[3].actual_type = int)
           then int
           else real
       end if
   Predicate: <expr>.actual_type = <expr>.expected_type

3. Syntax rule: <expr> → <var>
   Semantic rule: <expr>.actual_type ← <var>.actual_type
   Predicate: <expr>.actual_type = <expr>.expected_type

4. Syntax rule: <var> → A | B | C
   Semantic rule: <var>.actual_type ← look-up(<var>.string)

The look-up function looks up a given name in the symbol table and returns the variable's type.
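The four rules can be exercised with a short program. The sketch below is only an illustration under stated assumptions: the symbol table contents (A and B as int, C as real) are invented for the example, and the function check_assign works on variable names directly instead of on a real parse tree.

```python
# Hypothetical symbol table for this sketch; the slides do not give one.
SYMBOL_TABLE = {"A": "int", "B": "int", "C": "real"}

def look_up(name):
    """Rule 4: <var>.actual_type <- look-up(<var>.string)."""
    return SYMBOL_TABLE[name]

def check_assign(target, operands):
    """Decorate '<var> = <var> [+ <var>]' and report whether the predicate holds.

    target   -- the variable name on the left of '='
    operands -- one name (rule 3) or two names (rule 2) on the right
    """
    # Rule 1: the inherited attribute flows down from <var> to <expr>.
    expected_type = look_up(target)
    if len(operands) == 2:
        # Rule 2: the synthesized attribute is computed from the children.
        left, right = (look_up(v) for v in operands)
        actual_type = "int" if left == "int" and right == "int" else "real"
    elif len(operands) == 1:
        # Rule 3: <expr>.actual_type <- <var>.actual_type
        actual_type = look_up(operands[0])
    else:
        raise ValueError("expected one or two operands")
    # Predicate: <expr>.actual_type = <expr>.expected_type
    return actual_type == expected_type

print(check_assign("A", ["A", "B"]))   # A = A + B
print(check_assign("A", ["A", "C"]))   # A = A + C
```

With this symbol table, A = A + B satisfies the predicate, while A = A + C does not, because the expression's actual type is real and the expected type inherited from A is int.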


Describing the Meanings of Programs: Dynamic Semantics

- There is no single widely accepted notation or formalism for describing semantics.
- The dynamic semantics of a program is the meaning of its expressions, statements, and program units, that is, what these constructs do when they are executed.
- Accurately describing semantics is essential so that:
  -- users writing a program can understand precisely how the various language constructs work;
  -- compilers are implemented consistently with one another.
- Three methods are used to describe semantics formally:
  -- Operational semantics
  -- Axiomatic semantics
  -- Denotational semantics

1- Operational Semantics

- Operational semantics describes the meaning of a program by executing its statements on a machine, either simulated or actual. The change in the state of the machine (memory, registers, etc.) defines the meaning of the statement.
- Example: the for loop in C++ or Java:

    for (expr1; expr2; expr3) { stmt }
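One way to make the operational meaning of this loop concrete is to run it on a simulated machine whose state is a dictionary of variables. The sketch below is an illustration, not a definition taken from the slides: run_for and the callables passed to it are hypothetical names, and the lower-level steps in the docstring are the usual test-and-jump reading of a C-style for loop.

```python
def run_for(state, expr1, expr2, expr3, stmt):
    """Execute for (expr1; expr2; expr3) { stmt } as lower-level steps:

            expr1
      loop: if expr2 is false, goto out
            stmt
            expr3
            goto loop
      out:

    Returns the list of machine states observed after each step of interest.
    """
    trace = []
    expr1(state)                   # initialization runs exactly once
    trace.append(dict(state))
    while True:
        if not expr2(state):       # the test comes before every iteration
            break                  # "goto out"
        stmt(state)                # loop body
        expr3(state)               # update step
        trace.append(dict(state))  # record the new machine state
    return trace

# Simulates: for (i = 0; i < 3; i = i + 1) { total = total + i; }
state = {"total": 0}
trace = run_for(
    state,
    lambda s: s.update(i=0),
    lambda s: s["i"] < 3,
    lambda s: s.update(i=s["i"] + 1),
    lambda s: s.update(total=s["total"] + s["i"]),
)
for snapshot in trace:
    print(snapshot)
```

The sequence of recorded states is what operational semantics takes as the meaning of the loop.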


2- Axiomatic Semantics

- Axiomatic semantics was defined in conjunction with the development of a method for proving the correctness of programs. Its original purpose was formal program verification.
- It is based on formal logic (first-order predicate calculus). The logical expressions are called predicates, or assertions.
- Approach: define axioms or inference rules for each statement type in the language, which allow assertions to be transformed into other assertions.
- An assertion before a statement (a precondition) states the relationships and constraints among variables that are true at that point in execution.
- An assertion immediately following a statement (a postcondition) describes the new constraints on those variables after the statement executes.
- Developing an axiomatic description or proof of a given program requires that every statement in the program have both a precondition and a postcondition.

Example: Show that the program segment

    y := 2
    z := x + y

is correct with respect to the initial assertion {P}: x = 1 and the final assertion {Q}: z = 3.

Solution: Suppose that P is true, so that x = 1 when the program begins. Then y is assigned 2, and z is assigned the sum of x and y, which is 3. Hence the program segment is correct:

    x    y    z
    1    2    1 + 2 = 3

Thus {P} statement {Q} is true.
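The claim {P} statement {Q} can also be tested by running the segment on sample states. This is a sketch under stated assumptions: the names segment, P, Q, and triple_holds are hypothetical, and testing a finite set of states is only a sanity check, not the proof that axiomatic semantics calls for.

```python
def segment(state):
    """The program segment y := 2; z := x + y, as a state transformer."""
    new = dict(state)
    new["y"] = 2
    new["z"] = new["x"] + new["y"]
    return new

def P(state):
    """Precondition {P}: x = 1."""
    return state["x"] == 1

def Q(state):
    """Postcondition {Q}: z = 3."""
    return state["z"] == 3

def triple_holds(pre, program, post, initial_states):
    """Test {pre} program {post} on the given states (a test, not a proof)."""
    return all(post(program(s)) for s in initial_states if pre(s))

# Sample initial states; the old values of y and z are arbitrary on purpose.
states = [{"x": x, "y": y, "z": z}
          for x in range(-3, 4) for y in (0, 5) for z in (0, 9)]

print(triple_holds(P, segment, Q, states))  # prints True
```

The initial values of y and z do not matter, because both variables are overwritten before they are read; only x appears in the precondition.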


3- Denotational Semantics

- Based on recursive function theory.
- The most abstract semantics description method; it is the most rigorous widely known method for describing the meaning of programs.
- Originally developed by Scott and Strachey (1970).
- The process of building a denotational specification for a language (not necessarily easy):
  -- Define a mathematical object for each language entity.
  -- Define a function that maps instances of the language entities onto instances of the corresponding mathematical objects.
- The meaning of a language construct is defined only by the values of the program's variables.

The difference between denotational and operational semantics: in operational semantics, the state changes are defined by coded algorithms; in denotational semantics, they are defined by rigorous mathematical functions.

The state of a program is the set of current values of all its variables:

    s = {<i1, v1>, <i2, v2>, ..., <in, vn>}

Let VARMAP be a function that, when given a variable name and a state, returns the current value of the variable:

    VARMAP(ij, s) = vj
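As a small illustration, the state s and the function VARMAP can be written down directly. In this sketch the state is a list of (name, value) pairs; the UNDEF marker for a variable with no value and the lowercase name varmap are assumptions of the example.

```python
UNDEF = object()   # stand-in for "no value"; an assumption of this sketch

def varmap(name, state):
    """Return the value paired with name in state, a list of (name, value) pairs."""
    for ident, value in state:
        if ident == name:
            return value
    return UNDEF

# s = {<i, 0>, <sum, 10>, <flag, undefined>}
s = [("i", 0), ("sum", 10), ("flag", UNDEF)]

print(varmap("sum", s))
```

Here varmap("sum", s) returns 10, the value paired with sum in the state.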


Example: decimal numbers

    <dec_num> → '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
              | <dec_num> ('0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9')

    Mdec('0') = 0, Mdec('1') = 1, ..., Mdec('9') = 9
    Mdec(<dec_num> '0') = 10 * Mdec(<dec_num>)
    Mdec(<dec_num> '1') = 10 * Mdec(<dec_num>) + 1
    ...
    Mdec(<dec_num> '9') = 10 * Mdec(<dec_num>) + 9
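The mapping function Mdec translates almost line for line into a recursive function. The sketch below follows the two kinds of equations above: single digits are the base cases, and a longer numeral is ten times the meaning of its prefix plus the meaning of its last digit. The name m_dec and the error handling are assumptions of this example.

```python
def m_dec(numeral):
    """Map a string of decimal digits to the integer it denotes."""
    if not numeral or not all(c in "0123456789" for c in numeral):
        raise ValueError(f"not a decimal numeral: {numeral!r}")
    if len(numeral) == 1:
        # Base cases: Mdec('0') = 0, ..., Mdec('9') = 9
        return "0123456789".index(numeral)
    # Mdec(<dec_num> 'd') = 10 * Mdec(<dec_num>) + Mdec('d')
    return 10 * m_dec(numeral[:-1]) + m_dec(numeral[-1])

print(m_dec("309"))  # prints 309
```

The recursion in the function mirrors the recursion in the grammar rule for <dec_num>, which is the sense in which denotational semantics is based on recursive function theory.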


Summary

- BNF and context-free grammars are equivalent metalanguages, well suited for describing the syntax of programming languages.
- An attribute grammar is a descriptive formalism that can describe both the syntax and the semantics of a language.
- There are three primary methods of semantics description: operational, axiomatic, and denotational.
