processadors de llenguatge ii - etic upfrramirez/pl2/l01.pdf · practicas/seminarios: 40% ... en...

Processadors de Llenguatge II

Rafael Ramirez [email protected]

55.316

Temas

Sintaxis Semántica Verificación formal Compilación de programas Programación Lógica Programación Funcional Programación Concurrente

Si hay tiempo… Programación Imperativa Programación Orientada a Objectos

Evaluación

Examen teoría: 60% Practicas/seminarios: 40% Practicas = (practicas + examen)/2

Participación

*Se han de aprobar tanto examen (teoria y practicas) como practicas.

Libros

Programming Languages Design and Implementation Terrence Pratt and Marvin Zelkowitz Prentice Hall (2001)

Concepts of Programming Languages Robert Sebesta Addison Wesley (2002)

Modern compiler implementation in ML Andrew W. Appel

Cambridge University Press (1997)

Contacto

Clase Prácticas Email [email protected] Horas oficina: martes 12-2.30pm Anuncios, documentacion, … http://www.dtic.upf.edu/~rramirez/PL2/

Prácticas/seminarios

Seminarios (3): grupos de 2-4

Prácticas (6): grupos de 2


Introducción Pratt 1.2, 1.3.1, 1.3.2

Programming Languages

Main issues in language design, impl. and usage Syntax -- What a program looks like Semantics -- What a program means → programming paradigm Implementation -- How a program executes Verification – is the program 100% correct?


A programming paradigm is a fundamental style of computer programming for solving specific software engineering problems.

Paradigms differ in the concepts and abstractions used to represent the elements of a program (such as objects, functions, variables, constraints...) and the steps that compose a computation (assignation, evaluation,...).

A programming language can support multiple paradigms. For example programs written in C++ can be purely procedural, or purely object-oriented, or contain elements of both paradigms.

P.e., in object-oriented programming, programmers can think of a program as a collection of interacting objects, while in functional programming a program can be thought of as a sequence of stateless function evaluations.


Which language for a specific problem? Pascal, C -- procedural, statement

oriented C++, Java, Smalltalk -- Object oriented ML, Lisp -- Functional Prolog -- Logic-based

Language expressiveness

Existe un programa que puede ser escrito en un lenguaje para el cual no hay realmente programa equivalente en alguno de los otros lenguajes?

e.g. existe un programa que puede ser escrito en Prolog para el cual no haya equivalente en Pascal?

Language expressiveness

Church’s thesis Any computable function can be

computed by a Turing machine.

Why do we need all these different languages?

Language Goals

1950s--1960s Compile programs to execute efficiently. Programmers cheap; Machines expensive. Direct connection between language features and

hardware - goto statements, ...

Today CPU power and memory very cheap; programmers

expensive. Compile programs that are built efficiently Direct connection between language features and design

concepts - encapsulation, inheritance, ...

Attributes of a good language

Clarity, simplicity – a language should provide a clear, simple and unified set of concepts to be used as primitives to develop algorithms.

Orthogonality -every combination of features is meaningful – easier to remember if few exceptions

Naturalness for the application - program structure reflects the logical structure of algorithm

Support for abstraction - program data reflects problem being solved

Attributes of a good language (cont)

Ease of program verification - verifying that program correctly performs its required function

Programming environment - external support for the language

Portability of programs - transportability of the resulting programs from the computer on which they are developed to other computer systems

Cost of use - program execution, program translation, program creation, and program maintenance

Language paradigms Imperative programming Basic concept is machine state Program consists of a sequence of statements:

Stat1; Stat2; ...

Execution of each statement to change state, i.e. change the value of one or more locations in memory.

Functional programming Idea is to find a function that produces the answer, i.e.

what is the function that we must apply to the initial state to get the answer.

Function composition is major operation: f1(f2(f3(X))) Programming consists of building the function that

computes the answer

Language paradigms (cont) Logic programming Based on predicate logic State what the problem is, as opposed to how to solve the problem. Execution is deduction. father(X,Y), father(Y,Z) grandfather(X,Z)

Object-oriented programming Data Objects are built Limited set of functions operate on these data Complex objects are designed as extensions of simpler objects

(inheritance) Syntax: Set of objects (classes) containing data and methods.

Language paradigms (cont)

Concurrent programming Program = set of processes Each process execute sequentially Processes execute concurrently Processes communicate and synchronize their

execution. Involve safety and progress properties.


Language Syntax Pratt 3.1, 3.3.1

Program structure

Language = Syntax + Semantics

The syntax of a language is concerned with the form of a program: how expressions, commands, declarations etc. are put together to result in the final program.

The semantics of a language is concerned with the meaning of a program: how the programs behave when executed on computers.

Program structure

Just like in natural language:

Semantics (Meaning) You have in your mind, to express a conditional, when true do

something, when false, do something else. Syntax (Form/Structure)

C: if (condition) { …part1… } else { …part2… }

Pascal: if condition begin…part1…end else begin…part2…end

LISP: if (condition) (part1) (part2))

Program structure

Syntax What a program looks like Useful notations for describing PL’s syntax:

BNF grammars (context free grammars) Regular grammars

Semantics What the program means Static semantics - Semantics determined at compile time:

var A: integer; Type and storage for A Dynamic semantics - Semantics determined during execution:

X = ``ABC'' X a string; value of X Useful notations:

Informal: English documents (e.g. ref manuals, etc.) Formal: operational semantics, axiomatic semantics, etc.

BNF grammars Definition: (Some preliminaries)

1. An alphabet is a set whose elements are called symbols.

2. A word (or string) w, over an alphabet A is a finite sequence of symbols from the alphabet. i.e. w ∈An where n ∈ N. When n=0, the word is called an empty word/string and is denoted as ε.

3. The set of words of length n over an alphabet A, is denoted as An.

4. The set of words over an alphabet A, is denoted as A* where:

A* = ∪ An n∈N

BNF grammars Definition:

1. A context-free grammar G, is a quadruple (V,Σ,R,S) where:

a. V is an alphabet b. Σ is a set of terminals, where Σ ⊆ V

(V – Σ is known as the set of non-terminals) c. R is a set of production rules such that R ⊆ (V – Σ) x V* d. S is the start symbol, where S ∈ (V – Σ)

2. The language generated by grammar G (denoted L(G)), is the set of words over Σ, which can be produced by R.

BNF grammars Nonterminal: A finite set of symbols:

<sentence> <subject> <predicate> <verb> <article> <noun>

Terminal: A finite set of symbols: the, boy, girl, ran, ate, cake

Start symbol: One of the nonterminals: <sentence>

Backus-Naur form (BNF) grammars

Rules (productions): A finite set of replacement rules: <sentence> ::= <subject> <predicate> <subject> ::= <article> <noun> <predicate>::= <verb> <article> <noun> <verb> ::= ran | ate <article> ::= the <noun> ::= boy | girl | cake

Replacement Operator: Replace any nonterminal by a right hand side value using any rule (written ⇒)

Example BNF sentences <sentence> ⇒ <subject> <predicate> First rule ⇒ <article> <noun> <predicate> Second rule ⇒ the <noun> <predicate> Fifth rule ... ⇒ the boy ate the cake

Also from <sentence> you can derive ⇒ the cake ate the boy Syntax does not imply correct semantics

Note: Rule <A> ::= <B><C> This BNF rule also written with equivalent (CFG) syntax: A → BC

Languages Any string derived from the start symbol is a

sentential form.

A language generated by grammar G (written L(G) ) is the set of all strings over the terminal alphabet (i.e., sentences) derived from start symbol.

That is, a language is the set of sentential forms containing only terminal symbols.

Derivations A derivation is a sequence of sentential forms starting

from start symbol.

Grammar: B → 0B | 1B | 0 | 1 Derivation: B ⇒ 0B ⇒ 01B ⇒ 010 Each derivation step is the application of a production

rule.

Derivations Derivations may not be unique S → SS | (S) | () S ⇒ SS ⇒(S)S ⇒(())S ⇒(())() S ⇒ SS ⇒ S() ⇒(S)() ⇒(())() Different derivations but get the same

parse tree

Parse tree A parse tree is a hierarchical synt. structure Internal node denote non-terminals Leaf nodes denote terminals. Grammar: B → 0B | 1B | 0 | 1 Derivation: B ⇒ 0B ⇒ 01B ⇒ 010 From derivation get parse tree as shown in the right.

Grammar V = {…}

Σ = {…}

R = {…}

S = {…}

String 19a0c6duw

Tree

DERIVATION

Tree

PARSING

Ambiguity But from some grammars you can get 2 different parse trees for the same string: ()()(). Each corresponds to a unique derivation: S ⇒ SS ⇒ SSS ⇒()SS ⇒()()S ⇒()()()

A grammar is ambiguous if some sentence has 2 distinct parse trees. We desire unambiguous grammars to understand semantics.

Why Ambiguity is a problem Most common cause is due to infix binary operations

<expr> ::= <num> | <expr> – <expr> Grammar 1 – 2 – 3 String

<expr>

<expr> – <expr>

<expr> – <expr> <num>

<num> <num>

1 2

3

<expr>

<expr> – <expr>


<num> <num>

2 3

1

Parse

Which One?

Why Ambiguity is a problem It may be true that (1-2)-3 = 1-(2-3) and different parse trees doesn’t

matter. Such cases, are not true in general, and we must assume the more conservative position: different parse trees have different meaning.

<expr> ::= <num> | <expr> – <expr> Grammar 1 – 2 – 3 String

Parse

<expr>

1 2

–

<expr>

3

<expr>


<num> <num>

<expr>

2 3

–

<expr>

1

<expr>


<num> <num>

(1-2)-3 1-(2-3)

Which One? Different Parse Trees,

Different Meaning!

Why Ambiguity is a problem <expr> ::= <num> | <expr> – <expr> ambiguous

To make it unambiguous, I want the ‘–’ to be… Left associative:

<expr> ::= <num> | <expr> – <num>

Right Assocative: <expr> ::= <num> | <num> – <expr>

Why Ambiguity is a problem Several programming languages allow 2 forms of

conditional statements: 1. If boolean_exp then statement1 2. If boolean_exp then statement1 else statement2

If boolean_exp1 then if boolean_exp2 then S1 else S2

Exercise 1 Is the following grammar (of arithmetic expressions) an

ambiguous grammar ? Try to construct a derivation with two different parse trees.

<E> ::= <E> + <E> <E> ::= <E> *<E> <E> ::= ( <E> ) <E> ::= I

Exercise 1 - Answer <E> ::= <E> + <E> <E> ::= <E> *<E> 2 + 3 * 4 <E> ::= i

E E

E

+

I *

I I

2 3 4

E E

I + I I

2 3 4

E + *

To make it unambiguous (Case 1) + to be of a higher precedence than *

<expr> ::= <expr2> | <expr2> * <expr> <expr2> ::= <num> | <num> + <expr2>

1+2*3

<expr>

<expr2> * <expr>

<num> + <expr2>

<num> 1

2

<expr2>

<num>

3 (1+2)*3

Design of Rules From semantics

(what you want it to mean)… eg. “I want to express the idea of addition”

To abstract syntax (what components you need to capture the meaning)… eg. “I need an operator and two operands”

To concrete syntax … <expr> ::= <num> | <expr> + <expr>

To unambiguous concrete syntax. <expr> ::= <num> | <expr> + <num>

Unambiguous concrete syntax essential for a parser (parser takes a string, returns a syntax tree)

Implementation of Rules Divide the BNF rules into 2 groups:

Lexical rules Syntactic rules

Lexical rules are rules which deal with grouping individual symbols into meaningful words.

Syntactic rules are rules, which deal with the structure of the overall sentence.

Example <E> ::= <E> + <E> | <E> - <T> | <T> <T> ::= <T> * <F> | <T> / <F> | <F> <F> ::= <name> | <num> | (<E>) <name> ::= <alpha> | <alpha><alphanum> <alphanum> ::= <alpha><alphanum> | <digit><alphanum> | <alpha> | <digit> <num> ::= <digit> | <digit><num> <alpha> ::= a | b | c | … | A | B | C | … | Z <digit> ::= 0 | 1 | 2 | … | 9

Example <E> ::= <E> + <E> | <E> - <T> | <T> <T> ::= <T> * <F> | <T> / <F> | <F> <F> ::= <name> | <num> | (<E>) <name> ::= <alpha> | <alpha><alphanum> <alphanum> ::= <alpha><alphanum> | <digit><alphanum> | <alpha> | <digit> <num> ::= <digit> | <digit><num> <alpha> ::= a | b | c | … | A | B | C | … | Z <digit> ::= 0 | 1 | 2 | … | 9

Syntactic Rules

Lexical Rules

Implementation of Rules Lexical Rules get implemented as a Lexical Analyzer:

Input : String of Characters Output : List of Tokens

Syntactic Rules get implemented as a Syntactic Analyzer: Input : List of Tokens Output : Parse Tree

Parser = Lexical Analyzer + Syntactic Analyzer. (Very often, ‘parser’ is used in a loose sense to refer to the syntactic analyzer)

Extended BNF This is a shorthand notation for BNF rules. It adds no

power to the syntax,only a shorthand way to write productions:

[ ] – Grouping from which one must be chosen Binary_E T [+|-] T

{ }* - Repetition - 0 or more E T {[+|-] T}*

Extended BNF { }+ - Repetition - 1 or more

Usage similar to { }*

{ }opt - Optional I if E then S | if E then S else S Can be written in EBNF as I if E then S { else S}opt

Extended BNF

Example: Identifier - a letter followed by 0 or more letters or digits:

Extended BNF Regular BNF I → L { L | D }*

L → a | b |... D → 0 | 1 |...

Extended BNF

Example: Identifier - a letter followed by 0 or more letters or digits:

Extended BNF Regular BNF I → L { L | D }* I → L | L M

L → a | b |... M → CM | C D → 0 | 1 |... C → L | D

L → a | b |...

D → 0 | 1 |...

Classes of grammars BNF: Backus-Naur Form - Context free - Type 2 - Already

described

Regular grammars: subclass of BNF - Type 3: BNF rules are restricted: A → t N | t where: N = nonterminal, t = terminal

Examples: Binary numbers: B → 0 B | 1 B | 0 | 1

Identifiers: I → a L | b L | c L |...| z L | a |...| y | z L → 1 L | 2 L |...| 9 L | 0 L | 1 |...| 9 | 0 | a L | b L | c L |...| z L | a

|...| y | z

Other classes of grammars The context free and regular grammars are important for programming language design.

Other classes have theoretical importance, but not in this course:

Context sensitive grammar: Type 1 – Rules: α → β where | α | ≤ | β | i.e., length of α ≤ length of β, α: string of non terminals, β: string of terminals & non terminals

Unrestricted, recursively enumerable: Type 0 - Rules: α → β. No restrictions on α and β.

Parsing BNF and extended BNF are notations for formally describing

program syntax.

Given the BNF grammar for the syntax of a programming language (say Java), how do we determine that a given Java program obeys all the grammar rules.

This is achieved by parsing.

Two main methods of parsing Top-down parsing (recursive-descent). Bottom-up parsing.

Implementation of Rules Grammar

<num> ::= <int> | <real>

<int> ::= <digit> | <digit><int> <real> ::= <int-part> . <fraction> <int-part> ::= <digit> | <int-part> <digit> <fraction> ::= <digit> | <digit> <fraction> <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Is “12.34” accepted by the grammar with <num> as the starting non-terminal?

Top-Down Parse => <num> => <int> (Try first choice) => <digit> (Try first choice) => 1 (Backtrack : “2.34” not parsed)





Top-Down Parse => <num> => <int> (Try first choice) => <digit><int> (Try second choice) => 1<int> => 1<digit> (Try first choice) => 12 (Backtrack: “.34” not parsed)





Top-Down Parse => <num> => <real> (Try second choice) => <int-part>.<fraction> => … => 12.34 (success!)





Bottom-Up Parse => 12.34 => <digit>2.34 => <int>2.34 => <num>2.34 (Backtrack)





Bottom-Up Parse => 12.34 => <digit>2.34 => <digit><digit>.34 => <digit><int>.34 => <int>.34 => <num>.34 (Backtrack)





Bottom-Up Parse => 12.34 => … => <int-part>.34 => … => <int-part>.<fraction> => <real> => <num> (Success!)

Recursive descent parsing overview

A simple parsing algorithm

Shows the relationship between the formal description of a programming language and the ability to generate executable code.

Each non-terminal becomes a procedure. Recursively call the procedure to check different parts. (+) Simple Translation from BNF (–) Not Efficient

Recursive descent parsing Example of Top-down parsing.

<expression> ::= <term>+<term> | <term>-<term>

(Some details are left out; just get the idea) procedure Expression; begin Term; /* Call Term to find first term */ while ((nextchar=`+') or (nextchar=`-')) do begin nextchar:=getchar; /* Skip operator */ Term end end

procedure Term; begin ... end

First recognize a Term; as long as the next symbol is either a + or - , we recognize another Term.

Structure of a compiler

Attribute Grammars

Productions

E → E+T | T

T → T*P | P

P → I | (E)

Attribute Grammars Production Attribute E → E+T value(E1) = value(E2)+value(T) E → T value(E) = value(T) T → T*P value(T1) = value(T2)* value(P) T → P value(T) = value(P) P → I value(P) = value(I) P → (E) value(P) = value(E)

E1 is first E in production and E2 is second E in production

Technique often useful for passing data type information within a parse tree.

Example attributed tree

Use of attribute grammars First need to develop parse tree. Attribute grammars assume

you already have the derivation; it is not a parsing algorithm.

Functions for attributes can be arbitrary. Process to build attributes is mostly manual.

Attribute grammars may be used along with parsing to generate intermediate code.

As the parse tree is built, information from one nonterminal is passed to another nonterminal higher in the tree.

When parse is completed, all attribute values have already been computed.

Summary We need a formalism for describing the set of all

allowed programs in a programming language. BNF and EBNF grammars

Given a program P in a programming language L and the BNF grammar for L, we can find out whether P is a syntactically correct program in language L.

This activity is called parsing. The Recursive Descent Parsing technique is one such

parsing technique.

processadors de llenguatge ii - etic upfrramirez/pl2/l01.pdf · practicas/seminarios: 40% ... en...

Documents