1 / 48

Formal Language Theory and Describing Semantics

886340 Principles of Programming Languages

4

2 / 48

Content

• Formal Language Theory
  – Regular Expressions
  – Finite State Automaton (FA)

• Describing Semantics
  – Operational
  – Denotational
  – Axiomatic

3 / 48

Review

• A grammar can be ambiguous
  – i.e. there is more than one parse tree for the same string of terminals
  – in a PL we want to base meaning on the parse, so an ambiguous parse leads to an ambiguous meaning

• Especially grammars for expressions
  – precedence
  – associativity

4 / 48

Review

• Solution: encode precedence and associativity in the grammar
  – one non-terminal for each level of precedence, e.g. <term> and <factor> for the operators * and /

5 / 48

Review

• Context Free Grammars (CFGs) are used to specify the overall structure of a programming language:

– if/then/else, ...
– brackets: ( ), { }, begin/end, ...

• Regular Grammars (RGs) are used to specify the structure of tokens:

– identifiers, numbers, keywords, ...

6 / 48


• Offers a way to describe computation problems formulated as language recognition problems

– Enables proofs of relative difficulty of certain computational problems

• Provides a mechanism to aid description of programming language constructs

– Regular expressions ~ PL tokens (e.g., keywords) - Finite state automata (FSAs)

– Context-free grammars ~ PL statements

7 / 48


• Recognizers for languages are more complex as the languages themselves become more complex

– Simple constructs correspond to FSAs
    Keywords, numerical constants

– More complex constructs correspond to Push-down Automata
    If statements, looping statements, declarations

– Even more complex constructs correspond to more complex automata
    Type checking of use with declared type

8 / 48

Regular Expressions

• Formalism for describing simple PL constructs
  – reserved words
  – identifiers
  – numbers

• Simplest sort of structure
• Recognized by a finite state automaton
• Defined recursively

9 / 48

Regular Expressions

PL construct                          RE Notation              Language

empty string                          ε                        {ε}
symbol a                              a                        {a}
null symbol                           ∅                        {}

R, S regular exprs                    R | S  (alternation)     L_R ∪ L_S
a, b terminals                        a | b                    {a, b}

R, S regular exprs                    R S  (concatenation)     L_R L_S
a, b terminals                        a b                      {ab}

10 / 48

Regular Expressions

PL construct          RE Notation     Language

R regular expr        R*              {ε} ∪ L_R ∪ L_R L_R ∪ L_R L_R L_R ∪ ...
a terminal            a*              {ε, a, aa, aaa, ...}

R regular expr        R+              L_R ∪ L_R L_R ∪ L_R L_R L_R ∪ ...
a terminal            a+              {a, aa, aaa, ...}

Note: εa = aε = a

Precedence, from high to low: {* +}, concatenation, | (all are left associative operators)

11 / 48

Regular Expressions

1 | 2           {1, 2}
1* | 2          {2, ε, 1, 11, 111, ...}
1 2*            {1, 12, 122, 1222, ...}
1 2* | 0+       {0, 00, 000, ..., 1, 12, 122, ...}
(1 | 2)*        {ε, 1, 2, 12, 11, 21, 22, ...}
(0 | 1)+        {0, 1, 01, 001, 011, 111, ...}

12 / 48

RE’s for PLs

• Let letter stand for a|b|c|...|z and digit stand for 0|1|2|3|4|5|6|7|8|9
  – letter (letter | digit)* is an identifier
  – digit+ is an integer constant
  – digit* . digit+ is a real number

• Which of the following identifiers are described by letter (letter | digit)* ?
    ABC   0C   B%   X1

Louden uses [0-9]

[a-z]([a-z] | [0-9])*

[0-9]+

[0-9]* . [0-9]+
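The three token definitions above can be tried out directly. The following is a minimal sketch using Python's standard `re` module; the names `IDENTIFIER`, `INTEGER`, `REAL` and `matches` are illustrative choices, not part of the slides. It assumes the spaces in the slide notation are only separators, and that `letter` covers lowercase letters only, exactly as defined above.

```python
import re

# Louden-style character classes from the slide, written as Python patterns.
# The dot is escaped because an unescaped "." matches any character in `re`.
IDENTIFIER = re.compile(r"[a-z]([a-z]|[0-9])*")
INTEGER = re.compile(r"[0-9]+")
REAL = re.compile(r"[0-9]*\.[0-9]+")


def matches(pattern, text):
    """Return True when the whole string is described by the pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["x1", "abc", "0c", "b%"]:
        print(candidate, matches(IDENTIFIER, candidate))
```

`fullmatch` is used rather than `search` because an RE describes a token only if it accounts for the entire string, not just a substring of it.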

13 / 48

Examples

• Which of the following are legal real numbers described by digit* . digit+ ?
    .5   1.5   2   4.   6.3   0.2

• Simple PL constructs can be defined as regular expressions
  – Can you define a number in scientific notation as an RE? (e.g., 1.25e+20, .05e-3)
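One possible answer to the scientific-notation question, sketched in Python. It assumes the mantissa follows the real-number form digit* . digit+ and that the exponent is the letter e (either case) with an optional sign; other definitions are equally defensible. `SCIENTIFIC` and `is_scientific` are hypothetical names.

```python
import re

# mantissa: digit* . digit+     exponent: (e|E) (+|-)? digit+
SCIENTIFIC = re.compile(r"[0-9]*\.[0-9]+[eE][+-]?[0-9]+")


def is_scientific(text):
    """Return True when the whole string is a number in this assumed notation."""
    return SCIENTIFIC.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["1.25e+20", ".05e-3", "1.25", "1e5"]:
        print(candidate, is_scientific(candidate))
```

Under these assumptions a string such as `1e5` is rejected because the mantissa has no decimal point; relaxing that would need an alternation with digit+ in the mantissa.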

14 / 48

Finite State Automaton (FA)

• Described by <set of states, labelled transitions, start state, final state(s)>

• Example:

15 / 48


• FA accepts or recognizes an input string if there is a path from its start state to a final state such that the labels on the path are the terminals in that string.

• What strings are recognized?

16 / 48


• Binary numbers containing a pair of adjacent 1’s:

• Recognizes: (0 | 1) * 1 1 (0 | 1) *

17 / 48


• Binary numbers containing a pair of adjacent 1’s:

• Recognizes: (0 | 1)* 1 1 (0 | 1)*

• FA1 and FA2 recognize the same set of strings, i.e., the same language! Therefore, FAs are not unique.
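The automaton diagrams are not reproduced in this text, so the following is only one possible deterministic FA for binary numbers containing a pair of adjacent 1's, written as a transition table in Python. The state names and the `accepts` function are assumptions made for illustration; they are not FA1 or FA2 from the slides.

```python
# States: "start"  = no useful progress yet
#         "seen1"  = the previous symbol was a 1
#         "accept" = a pair of adjacent 1's has been seen (final state)
TRANSITIONS = {
    ("start", "0"): "start",
    ("start", "1"): "seen1",
    ("seen1", "0"): "start",
    ("seen1", "1"): "accept",
    ("accept", "0"): "accept",
    ("accept", "1"): "accept",
}
START_STATE = "start"
FINAL_STATES = {"accept"}


def accepts(text):
    """Follow the labelled transitions; accept if the path ends in a final state."""
    state = START_STATE
    for symbol in text:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # symbol outside the alphabet {0, 1}
            return False
    return state in FINAL_STATES


if __name__ == "__main__":
    for candidate in ["11", "0110", "1010", ""]:
        print(repr(candidate), accepts(candidate))
```

Any other table that accepts exactly the same strings would serve equally well, which is the point made above about FAs not being unique.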

18 / 48


• Exponent in scientific notation:

• Recognizes: E (+ | -) digit + | E digit +

19 / 48


• Binary numbers which begin and end with a 1:

• Recognizes: 1(0|1)*1

20 / 48


• Binary numbers containing at least one digit, in which all the 1’s precede all the 0’s:

• Recognizes: 0+ | 1+0*

21 / 48

Tasks for REs and FAs

• Recognition of a string
  – Is this given string in the language described (recognized) by this given RE (FA)?

• Description of a language
  – Given an RE (FA), what language does it describe (recognize)?

• Codification of a language
  – Given a language, find an RE and an FA that correspond to it

22 / 48


• Recognition of a string
  – Given 10*, which of these strings is described by it: 1, 00, 10, 1000, 01
  – Given the following FA, which of these strings is recognized by it: 1, 00, 10, 1000, 01

23 / 48


• Description of a language
  – What language is described by the following RE: (01)+ | (10)+
  – What language is recognized by the following FA:

24 / 48


• Description of a language
  – What language is recognized by the following FA:

25 / 48

Tasks for REs and FAs

• Codification of a language
  – Complex constants are parenthesized pairs of integers
  – Let digit = 0|1|2|3|4|5|6|7|8|9
  – The RE for complex constants is: ( digit+ , digit+ )
  – The FA for complex constants is:
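Since the FA diagram is missing from this text, here is a hedged sketch of an FA that would correspond to the RE ( digit+ , digit+ ). Every digit is mapped to a single `digit` label so the table stays small. The state names, `symbol_class` and `accepts` are hypothetical, and the sketch assumes no whitespace is allowed inside the constant.

```python
def symbol_class(ch):
    """Map each digit to the label 'digit'; every other character labels itself."""
    return "digit" if ch in "0123456789" else ch


# (state, label) -> next state; any missing entry means the string is rejected.
TRANSITIONS = {
    ("start", "("): "open",
    ("open", "digit"): "first",
    ("first", "digit"): "first",
    ("first", ","): "comma",
    ("comma", "digit"): "second",
    ("second", "digit"): "second",
    ("second", ")"): "done",
}
FINAL_STATES = {"done"}


def accepts(text):
    """Return True when the text is a complex constant such as (12,345)."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, symbol_class(ch)))
        if state is None:
            return False
    return state in FINAL_STATES


if __name__ == "__main__":
    for candidate in ["(12,345)", "(1,2)", "(12,)", "12,345"]:
        print(candidate, accepts(candidate))
```

The self-loops on `first` and `second` are what implement the + in digit+: at least one digit is required to enter each state, and any number may follow.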

26 / 48

Describing Semantics

27 / 48

Copyright © 2012 Addison-Wesley. All rights reserved.

Attribute Grammars

•Attribute grammars (AGs) have additions to CFGs to carry some semantic info on parse tree nodes

•Primary value of AGs:
  –Static semantics specification
  –Compiler design (static semantics checking)



Attribute Grammars : Definition

•Def: An attribute grammar is a context-free grammar G = (S, N, T, P) with the following additions:
  –For each grammar symbol x there is a set A(x) of attribute values
  –Each rule has a set of functions that define certain attributes of the nonterminals in the rule
  –Each rule has a (possibly empty) set of predicates to check for attribute consistency



Attribute Grammars: Definition

•Let X0 → X1 ... Xn be a rule

•Functions of the form S(X0) = f(A(X1), ... , A(Xn)) define synthesized attributes

•Functions of the form I(Xj) = f(A(X0), ... , A(Xn)), for 1 <= j <= n, define inherited attributes

•Initially, there are intrinsic attributes on the leaves



Attribute Grammars: An Example

•Syntax
    <assign> → <var> = <expr>
    <expr> → <var> + <var> | <var>
    <var> → A | B | C

•actual_type: synthesized for <var> and <expr>

•expected_type: inherited for <expr>

31 / 48


32 / 48

• A parse tree of the sentence A = A + B


33 / 48


34 / 48

• The flow of attributes in the tree
• In this example, A is defined as a real and B is defined as an int.


35 / 48

• A fully attributed parse tree
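The attribute rules themselves are not reproduced in this text. The sketch below assumes the usual textbook rules for this example: a variable's actual_type comes from a lookup, an expression's actual_type is int only when both operands are int (otherwise real), expected_type is inherited from the target variable, and the predicate requires the two to be equal. The symbol table entry for C and all function names are hypothetical.

```python
# A is real and B is int as stated on the slide; C's type is a made-up entry.
SYMBOL_TABLE = {"A": "real", "B": "int", "C": "int"}


def var_actual_type(name):
    """Intrinsic attribute on a leaf: look the variable up in the symbol table."""
    return SYMBOL_TABLE[name]


def expr_actual_type(operands):
    """Synthesized attribute: computed from the children of <expr>."""
    types = [var_actual_type(name) for name in operands]
    if len(types) == 1:  # <expr> -> <var>
        return types[0]
    # <expr> -> <var> + <var>: int only if both operands are int
    return "int" if all(t == "int" for t in types) else "real"


def check_assign(target, operands):
    """Attribute <assign> -> <var> = <expr> and evaluate the predicate."""
    expected = var_actual_type(target)   # inherited by <expr> from the target
    actual = expr_actual_type(operands)  # synthesized from the operands
    return {
        "expected_type": expected,
        "actual_type": actual,
        "consistent": expected == actual,
    }


if __name__ == "__main__":
    print(check_assign("A", ["A", "B"]))  # the sentence A = A + B
    print(check_assign("B", ["A", "B"]))  # B = A + B fails the predicate
```

For A = A + B both attributes evaluate to real, so the predicate holds; swapping the target to B makes expected_type int while actual_type stays real.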


36 / 48

Operational Semantics

• Describe the meaning of a program by executing its statements on a machine, either simulated or actual. The change in the state of the machine (memory, registers, etc.) defines the meaning of the statement

• To use operational semantics for a high-level language, a virtual machine is needed




Operational Semantics

•A hardware pure interpreter would be too expensive

•A software pure interpreter also has problems
  –The detailed characteristics of the particular computer would make actions difficult to understand
  –Such a semantic definition would be machine-dependent

•A better alternative: A complete computer simulation

•The process:
  –Build a translator (translates source code to the machine code of an idealized computer)
  –Build a simulator for the idealized computer
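The two-step process can be illustrated with a toy example. The sketch below invents a tiny stack machine as the "idealized computer"; its instruction set (`push`, `load`, `store`, `+`, `*`), the tuple-based expression trees and both function names are assumptions made for illustration only. The meaning of an assignment is then the change it causes in the simulated memory.

```python
def translate_expr(expr):
    """Translator, part 1: turn an expression tree into stack-machine code."""
    kind = expr[0]
    if kind == "num":
        return [("push", expr[1])]
    if kind == "var":
        return [("load", expr[1])]
    operator, left, right = expr  # ("+", left, right) or ("*", left, right)
    return translate_expr(left) + translate_expr(right) + [(operator,)]


def translate_assign(target, expr):
    """Translator, part 2: an assignment evaluates its expression, then stores."""
    return translate_expr(expr) + [("store", target)]


def simulate(program, memory):
    """Simulator: run the instructions and return the new machine state."""
    memory = dict(memory)  # work on a copy so the old state stays visible
    stack = []
    for instruction in program:
        opcode = instruction[0]
        if opcode == "push":
            stack.append(instruction[1])
        elif opcode == "load":
            stack.append(memory[instruction[1]])
        elif opcode == "store":
            memory[instruction[1]] = stack.pop()
        elif opcode in ("+", "*"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if opcode == "+" else left * right)
        else:
            raise ValueError(f"unknown instruction: {instruction!r}")
    return memory


if __name__ == "__main__":
    # a = b + 1, executed in a state where b is 4
    program = translate_assign("a", ("+", ("var", "b"), ("num", 1)))
    print(program)
    print(simulate(program, {"b": 4}))
```

Because the machine is idealized, the definition does not depend on the details of any real computer, which is the advantage claimed above over a pure interpreter.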

38 / 48

Denotational Semantics

• Based on recursive function theory
• The process of building a denotational specification for a language:
  – Define a mathematical object for each language entity
  – Define a function that maps instances of the language entities onto instances of the corresponding mathematical objects

• The meaning of language constructs is defined by only the values of the program's variables


39 / 48

Binary Numbers

• To give the meaning of binary numbers using denotational semantics, we associate each binary number with its actual meaning (a decimal number)

• The semantic function is named Mbin:

Mbin('0') = 0

Mbin('1') = 1

Mbin(<bin_num> '0') = 2 * Mbin(<bin_num>)

Mbin(<bin_num> '1') = 2 * Mbin(<bin_num>) + 1
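The four Mbin equations translate almost line for line into a recursive function. This is a minimal sketch in Python; the name `m_bin` and the choice to raise `ValueError` on strings outside the grammar are assumptions, since the slide defines the function only on valid binary numbers.

```python
def m_bin(bits):
    """Map a binary numeral (a string of 0s and 1s) to the number it denotes."""
    if bits == "0":  # Mbin('0') = 0
        return 0
    if bits == "1":  # Mbin('1') = 1
        return 1
    if len(bits) < 2 or bits[-1] not in "01":
        raise ValueError(f"not a binary numeral: {bits!r}")
    prefix, last = bits[:-1], bits[-1]
    if last == "0":  # Mbin(<bin_num> '0') = 2 * Mbin(<bin_num>)
        return 2 * m_bin(prefix)
    return 2 * m_bin(prefix) + 1  # Mbin(<bin_num> '1') = 2 * Mbin(<bin_num>) + 1


if __name__ == "__main__":
    for numeral in ["0", "1", "110", "1011"]:
        print(numeral, m_bin(numeral))
```

For example, m_bin("110") unfolds as 2 * m_bin("11") = 2 * (2 * m_bin("1") + 1) = 2 * 3 = 6.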

40 / 48

Decimal Numbers

<dec_num> → '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
          | <dec_num> ('0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9')

Mdec('0') = 0, Mdec('1') = 1, …, Mdec('9') = 9

Mdec(<dec_num> '0') = 10 * Mdec(<dec_num>)

Mdec(<dec_num> '1') = 10 * Mdec(<dec_num>) + 1

…

Mdec(<dec_num> '9') = 10 * Mdec(<dec_num>) + 9




Axiomatic Semantics

•Based on formal logic (predicate calculus)

•Original purpose: formal program verification

•Axioms or inference rules are defined for each statement type in the language

•The logic expressions are called assertions



Axiomatic Semantics (continued)

•An assertion before a statement (a precondition) states the relationships and constraints among variables that are true at that point in execution

•An assertion following a statement is a postcondition

•A weakest precondition is the least restrictive precondition that will guarantee the postcondition



Axiomatic Semantics Form

•Pre-, post form: {P} statement {Q}

•Examples
  –a = b + 1 {a > 1}
      One possible precondition: {b > 10}
      Weakest precondition: {b > 0}

  –sum = 2 * x + 1 {sum > 1}
      Weakest precondition: ???

44 / 48

• Examples
  – a = b / 2 - 1 {a < 10}
      Weakest precondition: ??

  – x = x + y - 3 {x > 10}
      Weakest precondition: {y > 13 - x}


45 / 48

• Example
    y = 3 * x + 1;
    x = y + 3;
    {x < 10}

• The precondition for the second assignment statement is y < 7
• The precondition for the first assignment statement is
    3 * x + 1 < 7
    x < 2
• So, {x < 2} is the precondition of the two-statement sequence
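The backward reasoning used here rests on the assignment rule: the weakest precondition of `x = E` with respect to Q is Q with E substituted for x. The sketch below models assertions as Python predicates over a state dictionary and checks the result of the example by brute force over a small range of integers. `wp_assign` and the state representation are illustrative assumptions, and testing sample values is not a proof.

```python
def wp_assign(variable, expression, postcondition):
    """Weakest precondition of `variable = expression` for the postcondition.

    The returned predicate holds in a state exactly when the postcondition
    holds in the state produced by performing the assignment.
    """
    def precondition(state):
        next_state = dict(state)
        next_state[variable] = expression(state)
        return postcondition(next_state)
    return precondition


def post(state):
    """The postcondition {x < 10} of the whole sequence."""
    return state["x"] < 10


# Work backwards through the program, last statement first.
pre_second = wp_assign("x", lambda s: s["y"] + 3, post)             # x = y + 3
pre_first = wp_assign("y", lambda s: 3 * s["x"] + 1, pre_second)    # y = 3 * x + 1

if __name__ == "__main__":
    for x in range(-3, 5):
        # y's initial value is irrelevant because the first statement overwrites it
        print(x, pre_first({"x": x, "y": 0}), x < 2)
```

On every sampled value the computed precondition agrees with x < 2, matching the derivation on the slide.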




Program Proof Process

•The postcondition for the entire program is the desired result

•Work back through the program to the first statement. If the precondition on the first statement is the same as the program specification, the program is correct.



Summary

•BNF and context-free grammars are equivalent meta-languages
  –Well-suited for describing the syntax of programming languages

•An attribute grammar is a descriptive formalism that can describe both the syntax and the semantics of a language

•Three primary methods of semantics description
  –Operational, denotational, axiomatic

48 / 48

Conclusion and Question
