CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann


Page 1: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

CS 330 Programming Languages

09 / 21 / 2006

Instructor: Michael Eckmann

Page 2: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 330 - Fall 2006

Today’s Topics

• Questions / comments?

• Semantics
– Operational (did this last time)
– Axiomatic
– Denotational

• Chapter 4
– Lexical analyzers
– Parsers
• Top down, recursive descent

Page 3: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Axiomatic Semantics


• Based on formal logic (predicate calculus)
– Original purpose was formal program verification --- that is, to prove the correctness of programs
– We define axioms or inference rules for each statement type in the language (to allow transformations of expressions to other expressions)
– The expressions are called assertions or predicates

Page 4: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Axiomatic Semantics


• An assertion before a statement is a precondition and states the relationships and constraints among variables that are true at that point in execution

• An assertion following a statement is a postcondition

• A weakest precondition is the least restrictive precondition that will guarantee the postcondition

Page 5: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Axiomatic Semantics


• Notation for specifying the Axiomatic semantics of a statement: {P} statement {Q}

• {P} is precondition, {Q} is postcondition

• Example: a = b + 1 {a > 1}
– Read this as: a must be greater than one after this statement executes. So,
– One possible precondition: {b > 4}
– What's the weakest precondition here?

Page 6: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Axiomatic Semantics


• For: a = b + 1 {a > 1}
– The weakest precondition is: {b > 0}
– Because a > 1 implies that b + 1 has to be > 1, which implies that b must be > 0.
– In general, the weakest precondition of an assignment x = E with postcondition {Q} is Q with every occurrence of x replaced by E; here that gives {b + 1 > 1}, i.e. {b > 0}.
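The claim can be spot-checked mechanically; a minimal C sketch (a finite check over a range of values, of course, not a proof):

```c
#include <assert.h>

/* Spot-check the triple {b > 0} a = b + 1 {a > 1}:
   for every tested b satisfying the precondition,
   the postcondition must hold after the assignment. */
void check_triple(void) {
    for (int b = 1; b <= 1000; b++) {  /* tested values with b > 0 */
        int a = b + 1;                 /* the statement            */
        assert(a > 1);                 /* the postcondition        */
    }
}
```

Note that b = 0 produces a = 1, violating the postcondition, which is why {b > 0} is the weakest (least restrictive) precondition and not, say, {b >= 0}.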

Page 7: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Axiomatic Semantics


• Jumping ahead to the big picture, we might glean from the example that to prove program correctness, one might take a postcondition for the entire program and work backwards through the program until we reach the beginning, generating a weakest precondition for the entire program. If that precondition is within the program specs, then the program is correct.

• What does that mean?
• What would we do as we went backwards through the program?

Page 8: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Axiomatic Semantics


• When multiple statements or more complex structures are involved, we need what are called inference rules.

• For a sequence of statements S1; S2:
• {P1} S1 {P2}
• {P2} S2 {P3}
• The inference rule is:

    {P1} S1 {P2},  {P2} S2 {P3}
    ---------------------------
         {P1} S1; S2 {P3}

• And it is read like: if {P1} S1 {P2} is true and {P2} S2 {P3} is true, then we infer that {P1} S1; S2 {P3} is true.

Page 9: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Axiomatic Semantics


• It gets more complex to determine the precondition for while loops and other structures that might iterate different numbers of times depending on values of variables.

Page 10: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Axiomatic Semantics


• Developing axioms or inference rules for all of the statements in a language is difficult

• It is a good tool for correctness proofs, but proving that an arbitrary program is correct is undecidable, so obviously there are limits to this (or any) approach that tries to prove correctness.

• It is an excellent framework for reasoning about programs

• It is not very useful for language users and compiler writers

Page 11: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Axiomatic Semantics


• Before moving on to Denotational Semantics let's try problem 20 a) on page 173

Page 12: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Denotational Semantics


• This is the most widely known and rigorous method for describing the semantics (meaning) of programs.

• The book only touches on how it works and we will do the same.

Page 13: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Denotational Semantics


• The process of building a denotational spec for a language (not necessarily easy):
– Define a mathematical object for each language entity
– Define a function that maps instances of the language entities onto instances of the corresponding mathematical objects

Page 14: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Denotational Semantics


• Example:
– The following (next slide) denotational semantics description maps decimal numbers as strings of symbols into numeric values
– Mdec is the semantic function and it is used to map the syntactic objects to the decimal numbers

Page 15: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Denotational Semantics


<dec_num> → '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | <dec_num> ('0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9')

Mdec - a function that maps from the syntactic elements to their meaning

Mdec('0') = 0, Mdec ('1') = 1, …, Mdec ('9') = 9

Mdec (<dec_num> '0') = 10 * Mdec (<dec_num>)

Mdec (<dec_num> '1') = 10 * Mdec (<dec_num>) + 1
…

Mdec (<dec_num> '9') = 10 * Mdec (<dec_num>) + 9
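As a sketch, the semantic function can be transcribed directly into C (the function name mdec and the string-plus-length interface are illustrative choices, not part of the slide's mathematical definition):

```c
#include <assert.h>

/* mdec maps a string of decimal digits (the syntactic object) to the
   numeric value it denotes. The recursive case mirrors
   Mdec(<dec_num> 'd') = 10 * Mdec(<dec_num>) + d. */
long mdec(const char *s, int len) {
    if (len == 1)
        return s[0] - '0';            /* base case: Mdec('d') = d     */
    return 10 * mdec(s, len - 1)      /* meaning of the digit prefix  */
           + (s[len - 1] - '0');      /* plus the last digit          */
}
```

For example, mdec("472", 3) unwinds as 10 * (10 * 4 + 7) + 2, just as the grammar derives "472" by extending "47" with '2'.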

Page 16: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Denotational Semantics


• Going further...
• The state of a program is the values of all its current variables

s = {<i1, v1>, <i2, v2>, …, <in, vn>}
(where the i's are names of variables and the v's are current values of the variables.)
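Such a state can be sketched in C as an array of <identifier, value> pairs (the names binding and lookup_var are illustrative, and values are simplified to int):

```c
#include <assert.h>
#include <string.h>

/* One <identifier, value> pair of a program state */
struct binding { const char *id; int v; };

/* Look up the current value of variable id in state s (n pairs).
   Returns 1 and stores the value in *out if found, else returns 0. */
int lookup_var(const struct binding *s, int n, const char *id, int *out) {
    for (int i = 0; i < n; i++)
        if (strcmp(s[i].id, id) == 0) { *out = s[i].v; return 1; }
    return 0;
}
```

A meaning function for a statement can then be viewed as mapping one such array of bindings to another.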

Page 17: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Denotational Semantics


• There's a good example in the text that contains denotational semantics for a while loop construct.

• M is a “meaning function” and maps statements and the current state to something else

• Let's look at the example on p. 168 on the board.

Page 18: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Denotational Semantics


• Can be used to prove the correctness of programs

• Provides a rigorous way to think about programs

• Can be an aid to language design (if the denotational description starts to become really hairy then maybe you should rethink the design of the language.)

• Has been used in compiler generation systems

Page 19: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Lexical & Syntax Analysis


• The syntax analysis portion of a language processor nearly always consists of two parts:
– A low-level part called a lexical analyzer (mathematically, a finite automaton based on a regular grammar)
– A high-level part called a syntax analyzer, or parser (mathematically, a push-down automaton based on a context-free grammar, or BNF)

• The parser can be based directly on the BNF

• Parsers based on BNF are easy to maintain

Page 20: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Lexical & Syntax Analysis


• Lexical and syntax analysis are separated because of
– Simplicity - less complex approaches can be used for lexical analysis; separating them simplifies the parser
– Efficiency - separation allows optimization of the lexical analyzer
– Portability - parts of the lexical analyzer may not be portable, but the parser always is portable
• because the lexical analyzer reads from files and buffers input, and files and file systems can be platform dependent

Page 21: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Lexical Analysis


• Lexical analysis basically matches patterns and it is a front-end to the parser

• Identifies substrings of the source program that belong together - lexemes
– Lexemes match a character pattern, which is associated with a token
– Recall examples of lexemes and tokens

Page 22: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Lexemes and tokens

For the statement:  idx = 42 + count;

Lexemes    Tokens
idx        identifier
=          equal_sign
42         int_literal
+          plus_op
count      identifier
;          semicolon

Page 23: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Lexical Analysis


• The lexical analyzer is usually a function that is called by the parser when it needs the next token

• Three approaches to building a lexical analyzer:
– Write a formal description of the tokens and use a software tool (e.g. lex) that constructs table-driven lexical analyzers given such a description
– Design a state diagram that describes the tokens and write a program that implements the state diagram (the way we'll focus on)
– Design a state diagram that describes the tokens and hand-construct a table-driven implementation of the state diagram

Page 24: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Lexical Analysis


• Lexical analyzers are finite automata (meaningful to you if you've taken MC 306 Theory of Computation)

• A finite automaton is a 5-tuple (Q, Sigma, q0, delta, A) where

– Q is a finite set of states

– Sigma is a finite set of input symbols

– q0 (the start state) is an element of Q

– delta (the transition function) is a function from QxSigma to Q

– A (the accepting states) is a subset of Q

Page 25: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Lexical Analysis


• The following diagram is a state diagram for a lexical analyzer that recognizes only the names, reserved words, and integer literals in some language.

• The states are represented by ovals

– Q = { Start, id, int }

• The input symbols are specified along the arrows

– Sigma = { Letter, Digit }

• The arrows are the transitions among states

– delta can be easily drawn into a chart with the headings:

From-state, input symbol, To-state

• The start state is Start (that is, q0 = Start)

• The accepting states are drawn as double ovals

– A = {id, int}
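The delta chart described above can be sketched as a table-driven implementation in C (the state and symbol names follow the slide; the assumption that identifiers may contain digits after an initial letter, and that a missing arrow means rejection, are illustrative):

```c
#include <assert.h>
#include <ctype.h>

enum state  { START, ID, INT, REJECT };
enum symbol { LETTER, DIGIT, OTHER };

/* delta as a chart: rows are from-states, columns are input symbols.
   A missing arrow in the diagram becomes REJECT here. */
static const enum state delta[3][2] = {
    /* LETTER  DIGIT */
    {  ID,     INT  },   /* from START */
    {  ID,     ID   },   /* from ID    */
    {  REJECT, INT  }    /* from INT   */
};

/* Run the automaton over a string; the result is accepting
   if we end in ID or INT (the set A above). */
enum state run(const char *lexeme) {
    enum state q = START;
    for (const char *p = lexeme; *p; p++) {
        enum symbol a = isalpha((unsigned char)*p) ? LETTER
                      : isdigit((unsigned char)*p) ? DIGIT : OTHER;
        if (a == OTHER || q == REJECT) return REJECT;
        q = delta[q][a];   /* one step of the transition function */
    }
    return q;
}
```

For example, run("count") ends in ID and run("42") ends in INT, while run("12a") rejects because there is no arrow from int on a Letter.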

Page 26: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Lexical Analysis


Page 27: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Lexical Analysis


• The way it works: we start at the Start state and, depending on what input symbol we get (a Letter or a Digit), we move to the appropriate state, and so on until we hit an accepting state (valid).

• If we get an input symbol while in a state that isn't on an arrow leaving that state, then we end (return).

– However, if we're in the Start state and get something other than a Letter or a Digit as the input symbol, we do not have a name, reserved word, or literal, which is what this finite automaton was designed to detect.

Page 28: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Lexical Analysis


• Implementation:

int lex() {
  getChar();
  switch (charClass) {
    case LETTER:
      addChar();
      getChar();
      while (charClass == LETTER || charClass == DIGIT) {
        addChar();
        getChar();
      }
      return lookup(lexeme);
      break;

Page 29: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Lexical Analysis


…
    case DIGIT:
      addChar();
      getChar();
      while (charClass == DIGIT) {
        addChar();
        getChar();
      }
      return INT_LIT;
      break;
  } /* End of switch */
} /* End of function lex */

Page 30: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Lexical Analysis


• Utility subprograms:
– getChar - gets the next character of input, puts it in nextChar, determines its class and puts the class in charClass
– addChar - puts the character from nextChar into the place the lexeme is being accumulated, lexeme
– lookup - determines whether the string in lexeme is a reserved word (returns a code)
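A minimal, self-contained sketch of these utility subprograms; reading from a string rather than a file, and the token codes and the two reserved words, are assumptions for illustration:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum { LETTER, DIGIT, UNKNOWN, END };          /* character classes  */
enum { ID_CODE = 10, INT_LIT = 11, RESERVED = 12 };  /* token codes  */

static const char *input;     /* source text (a string here, not a file) */
static char nextChar;
static int  charClass;
static char lexeme[100];
static int  lexLen;

/* getChar: read the next input character and classify it */
void getChar(void) {
    nextChar = *input ? *input++ : '\0';
    if (isalpha((unsigned char)nextChar))      charClass = LETTER;
    else if (isdigit((unsigned char)nextChar)) charClass = DIGIT;
    else if (nextChar == '\0')                 charClass = END;
    else                                       charClass = UNKNOWN;
}

/* addChar: append nextChar to the lexeme being accumulated */
void addChar(void) {
    if (lexLen < (int)sizeof lexeme - 1) {
        lexeme[lexLen++] = nextChar;
        lexeme[lexLen] = '\0';
    }
}

/* lookup: reserved word check; everything else is an identifier */
int lookup(const char *s) {
    return (strcmp(s, "while") == 0 || strcmp(s, "if") == 0)
           ? RESERVED : ID_CODE;
}
```

With these in place, the lex function on the previous slides is runnable as written.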

Page 31: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Lexical Analysis


• Lexical analyzers often construct the symbol table for the rest of the compiler to use.

• The symbol table consists of user-defined names.
• The symbol table also contains attributes of the names (e.g. their type and other information) that are put there by some other step, usually after the lexical analyzer creates the table. (Recall the flow chart diagram from chapter 1 that specified all the parts of the compilation/interpretation process.)

Page 32: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Parsers


• Before we get into parsing, we need to remember what leftmost derivations and rightmost derivations (I don't think I defined rightmost derivations yet) are.

• Anyone care to explain?

• Any recollection from chapter 3 what a sentential form is?

• Pop quiz:

• What's the purpose of the lexical analyzer again?

Page 33: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Parsers


• Goals of the parser, given an input program:
– Find all syntax errors; for each, produce an appropriate diagnostic message, and recover quickly --- that is, be able to find as many syntax errors as possible in the same pass through the code (is this easy?)
– Produce the parse tree, or at least a trace of the parse tree, for the program

• Two categories of parsers
– Top down parsers - build the tree from the root down to the leaves
– Bottom up parsers - build the tree from the leaves up to the root

Page 34: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Parsers


• Top down - produce the parse tree, beginning at the root
– Order is that of a leftmost derivation

• Bottom up - produce the parse tree, beginning at the leaves
– Order is that of the reverse of a rightmost derivation

• The parsers we'll be dealing with look only one token ahead in the input
– Can anyone guess what this means?

Page 35: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Top Down Parsers


• OK, as input we have a possible sentence (program) in the language that may or may not be syntactically correct.

• The lexical analyzer is called by the parser to get the next lexeme and associated token code from the input sentence (program.)

Page 36: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Top Down Parsers


• A step in the life of a top down parser...

• Think of this now as: during some arbitrary step in the top down parser we have a sentential form that is part of a leftmost derivation, and the goal is to find the next sentential form in that leftmost derivation.

• e.g. Given a left sentential form xAα
– where x is a string of terminals
– A is a non-terminal
– α is a string of terminals and non-terminals

• Since we want a leftmost derivation, what would we do next?

Page 37: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Top Down Parsers


• We have a left sentential form xAα (some string in the derivation)

• Expand A (the leftmost non-terminal) using a production that has A as its LHS.

• How might we do this?

Page 38: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Top Down Parsers


• We have a left sentential form xAα

• Expand A (the leftmost non-terminal) using a production that has A as its LHS.

• How might we do this?
– We would call the lexical analyzer to get the next lexeme.
– Then determine which production to expand based on the lexeme/token returned.
– For grammars that are “one lookahead”, exactly one RHS of A will have as its first symbol the lexeme/token returned. If there is no RHS of A with its first symbol being the lexeme/token returned --- what do we think occurred?

Page 39: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Top Down Parsers


• Top down parsers can be implemented as recursive descent parsers --- that is, a program based directly on the BNF description of the language.

• Or they can be represented by a parsing table that contains all the information in the BNF rules in table form.

• These top down parsers are called LL algorithms --- L for left-to-right scan of the input and L for leftmost derivation.

Page 40: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Top Down Parsers


• Left recursion is problematic for top-down parsers.

• e.g. <A> -> <A> b

• Each non-terminal is written as a function in the parser. So, a recursive descent parser function for this production would continually call itself first and never get out of the recursion. Similarly for any top-down parser.

• Left recursion is problematic for top-down parsers even if it is indirect.

• e.g. <A> -> <B> b

<B> -> <A> a

• Bottom up parsers do not have this problem with left recursion.
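The usual repair for a top-down parser is to rewrite the left-recursive rule using EBNF repetition. For example, <A> -> <A> b | c generates a c followed by any number of b's, so it is equivalent to <A> -> c {b}, which can be parsed with a loop instead of an initial self-call. A sketch in C, using a hypothetical single-character token stream:

```c
#include <assert.h>

/* Recognize the language of  <A> -> <A> b | c  (i.e., c followed by
   zero or more b's) via the equivalent right-iterative rule
   <A> -> c {b}, avoiding the infinite descent of calling A() first. */
int parseA(const char *tokens) {
    if (*tokens != 'c') return 0;   /* every <A> must start with c */
    tokens++;
    while (*tokens == 'b')          /* the {b} repetition */
        tokens++;
    return *tokens == '\0';         /* accept only if all input used */
}
```

The expr function on the later slides uses exactly this loop-instead-of-left-recursion shape for <expr> → <term> {(+ | -) <term>}.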

Page 41: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Top Down Parsers


• When a nonterminal has more than one RHS, the first terminal symbol that can be generated in a derivation for each of them must be unique to that RHS. If it is not unique, then top-down parsing with one lookahead is impossible for that grammar.

• Because there is only one lookahead symbol, which lex would return to the parser, that one symbol has to uniquely determine which RHS to choose.

• Does that make sense?

Page 42: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Top Down Parsers


• Example:

• Here are two productions in a possible grammar:

• <variable> -> identifier | identifier [ <expression> ]

• Assume we have a left sentential form where the next nonterminal to be expanded is <variable>, and we call out to lex() to get the next token and it returns identifier. Which production will we pick?

Page 43: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Top Down Parsers


• The pairwise disjointness test is used on grammars to determine if they have this problem. It determines the set of first symbols for every RHS of a production; for RHSs with the same LHS, these sets must be pairwise disjoint.

• <A> -> a <B> | b<A>b | c <C> | d

• The 4 RHSs of A here each have as their first symbol {a}, {b}, {c}, {d} respectively. These sets are all disjoint so these productions would pass the test. The ones on the previous slide would not.

• Note the use of curly braces on this slide is NOT the EBNF usage but the standard set usage of braces.

Page 44: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Top Down Parsers


• Note: if the first symbol on the RHS is a nonterminal, we would need to expand it to determine the first symbol.

• e.g.

• <A> -> <B> a | b<A>b | c <C> | d

• <B> -> <C> f

• <C> -> s <F>

• Here, to determine the first symbol of the first RHS of A, we need to follow <B> then <C> to find out that it is s. So, the 4 RHSs of A here each have as their first symbol {s}, {b}, {c}, {d} respectively. These sets are all disjoint so these productions would pass the test.

Page 45: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Recursive descent example


• So called because many of its subroutines/functions (one for each non-terminal) are recursive and descent because it is a top-down parser.

• EBNF is well suited for these parsers because EBNF reduces the number of non-terminals (as compared to plain BNF.)

• A grammar for simple expressions:

  <expr> → <term> {(+ | -) <term>}
  <term> → <factor> {(* | /) <factor>}
  <factor> → id | ( <expr> )

• Anyone remember what the curly braces mean in EBNF?

Page 46: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Recursive descent example


• Assume we have a lexical analyzer named lex, which puts the next token code in nextToken

• The coding process when there is only one RHS:
– For each terminal symbol in the RHS, compare it with the next input token; if they match, continue, else there is an error
– For each nonterminal symbol in the RHS, call its associated parsing subprogram

Page 47: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Recursive descent example


/* Function expr (written in the C language)
   Parses strings in the language generated by the rule:
   <expr> → <term> {(+ | -) <term>} */

void expr() {
  /* Parse the first term */
  term();  /* term() is its own function representing the non-terminal <term> */

  /* As long as the next token is + or -, call lex to get the
     next token, and parse the next term */
  while (nextToken == PLUS_CODE || nextToken == MINUS_CODE) {
    lex();
    term();
  }
}

Page 48: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Recursive descent example


• Note: by convention each function leaves the most recent token in nextToken

• A nonterminal that has more than one RHS requires an initial process to determine which RHS it is to parse
– The correct RHS is chosen on the basis of the next token of input (the lookahead)
– The next token is compared with the first token that can be generated by each RHS until a match is found
– If no match is found, it is a syntax error

• Let's now look at an example of how the parser would handle multiple RHS's.

Page 49: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Recursive descent example


/* Function factor
   Parses strings in the language generated by the rule:
   <factor> -> id | ( <expr> ) */

void factor() {
  if (nextToken == ID_CODE)  /* Determine which RHS */
    /* For the RHS id, just call lex */
    lex();

  /* If the RHS is ( <expr> ) - call lex to pass over the left
     parenthesis, call expr, and check for the right parenthesis */
  else if (nextToken == LEFT_PAREN_CODE) {
    lex();
    expr();
    if (nextToken == RIGHT_PAREN_CODE)
      lex();
    else
      error();
  }  /* End of else if (nextToken == ... */
  else
    error();  /* Neither RHS matches */
}

Page 50: CS 330 Programming Languages 09 / 21 / 2006 Instructor: Michael Eckmann

Recursive descent example


• The term function for the <term> non-terminal will be similar to the expr function for the <expr> non-terminal. See the text for this function.
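A possible shape for term, by analogy with expr: it parses a first factor, then loops while the lookahead is * or /. The sketch below is self-contained for testing, so the token codes and the stub lex and factor are stand-ins (assumptions), not the text's actual support code:

```c
#include <assert.h>

enum { ID_CODE, TIMES_CODE, DIV_CODE, PLUS_CODE, END_CODE };

static const int *stream;   /* token codes for the input (a stand-in) */
static int nextToken;
static int ok = 1;          /* cleared on a syntax error */

static void lex(void) { nextToken = *stream++; }

/* <factor> -> id   (the parenthesized case is omitted in this sketch) */
static void factor(void) {
    if (nextToken == ID_CODE) lex();
    else ok = 0;            /* syntax error */
}

/* <term> -> <factor> {(* | /) <factor>} */
void term(void) {
    /* Parse the first factor */
    factor();
    /* As long as the next token is * or /, get the next token
       and parse the next factor */
    while (nextToken == TIMES_CODE || nextToken == DIV_CODE) {
        lex();
        factor();
    }
}
```

Note the same shape as expr: one call for the leading nonterminal, then a while loop implementing the EBNF repetition.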