
ICE1341 Programming Languages
Spring 2005
Lecture #4

In-Young Ko (iko .AT. icu.ac.kr)

Information and Communications University (ICU)

Spring 2005 2 ICE 1341 – Programming Languages © In-Young Ko, Information and Communications University

Announcements

Send the language-survey information to the TA
Form your project teams by this Thursday, March 10th
  Include 4-5 students in each team
  Mix skill levels
  Mix genders (if it is possible)


Last Lecture

Language evaluation criteria
  Readability
  Writability
  Reliability


This Lecture

Language Syntax and Semantics
Formal Ways to Define Languages
  Chomsky Hierarchy
  Backus-Naur Form (BNF)


What are Syntax and Semantics?

Syntax: the form of expressions, statements, and program units
  e.g., while (<boolean_expr>) <statement>
Semantics: the meaning of those expressions, statements, and program units
  e.g., "When the current value of the Boolean expression is true, the embedded statement is executed."


Describing Syntax

A sentence (statement) is a string of characters over some alphabet
A language is a set of sentences
A lexeme is the lowest-level syntactic unit of a language (e.g., *, sum, begin)
A token is a category of lexemes (e.g., identifier)

* AW Lecture Notes

e.g., index = 2 * count + 17;

Lexemes   Tokens
index     identifier
=         equal_sign
2         int_literal
*         mult_op
count     identifier
+         plus_op
17        int_literal
;         semicolon
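The lexeme/token split above can be sketched as a small scanner. This is only an illustration: the token names come from the table, but the regular expressions chosen for each category and the helper name `tokenize` are assumptions made for this example.

```python
import re

# Token categories from the table, each paired with a regular expression.
# The regexes are assumptions made for this sketch, not part of the slide.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier",  r"[A-Za-z_]\w*"),
    ("equal_sign",  r"="),
    ("mult_op",     r"\*"),
    ("plus_op",     r"\+"),
    ("semicolon",   r";"),
    ("skip",        r"\s+"),  # whitespace separates lexemes and is discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (lexeme, token) pairs for the given statement."""
    pairs = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "skip":
            pairs.append((m.group(), m.lastgroup))
        pos = m.end()
    return pairs

for lexeme, token in tokenize("index = 2 * count + 17;"):
    print(f"{lexeme:<6} {token}")
```

Each lexeme is the matched substring, and its token is the name of the category whose pattern matched.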


Formal Ways to Define Languages

Language Recognizers
  A device that determines whether a given program is in a language
  e.g., a syntax analyzer of a compiler, finite automata
Language Generators
  A device that can be used to generate the sentences of a language
  e.g., regular expressions, context-free grammars

((00)* 1 (11)*)+ 0

[Figure: the transition diagram of a finite automaton M = (Q, Σ, δ, q0, F), with states q0, q1, q2, q3]

001110   Accepted
111110   Accepted
000110   Not accepted


Regular Expressions

Define patterns of strings (languages)
Widely used for text-search applications
  e.g., UNIX grep command, string matching in Perl
Used as the input to lexical analyzer generators, such as Lex or Flex
e.g., Handel, Händel, and Haendel are described by the pattern "H(a|ä|ae)ndel"

http://en.wikipedia.org/wiki/Regular_expression


Regular Expression Syntax

Alternation: |
  e.g., "gray|grey"
Quantification: ?, +, *
  ?: the preceding pattern may be present at most once (e.g., "colou?r")
  +: the preceding pattern must be present at least once (e.g., "goo+gle")
  *: the preceding pattern may be present zero, one, or more times (e.g., "0*42")
Grouping: ( )
  e.g., "gr(a|e)y", "(grand)?father"

http://en.wikipedia.org/wiki/Regular_expression
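The operators above can be tried out directly. The sketch below uses Python's `re` module, whose syntax for `|`, `?`, `+`, `*`, and `( )` agrees with the slide; the sample strings in each row (other than the ones quoted on the slides) are made up for this example, and the helper name `matches` is an assumption.

```python
import re

def matches(pattern, text):
    """True when the whole of text is described by pattern."""
    return re.fullmatch(pattern, text) is not None

# (pattern, strings it should describe, strings it should not describe)
CHECKS = [
    ("gray|grey",      ["gray", "grey"],                ["graey"]),
    ("colou?r",        ["color", "colour"],             ["colouur"]),
    ("goo+gle",        ["google", "gooogle"],           ["gogle"]),
    ("0*42",           ["42", "042", "00042"],          ["420"]),
    ("gr(a|e)y",       ["gray", "grey"],                ["gry"]),
    ("(grand)?father", ["father", "grandfather"],       ["grandgrandfather"]),
    ("H(a|ä|ae)ndel",  ["Handel", "Händel", "Haendel"], ["Hndel"]),
]

for pattern, accepted, rejected in CHECKS:
    assert all(matches(pattern, s) for s in accepted)
    assert not any(matches(pattern, s) for s in rejected)
    print(f"{pattern!r}: ok")
```

`re.fullmatch` is used rather than `re.search` so that each pattern is treated as a description of the entire string, which is the sense in which a regular expression defines a language.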


Regular Expression Examples

(0|10)*1*
  ε, 0, 1, 0001, 1010101, 01111111, …
1?(00*1)*0*
  ε, 0, 1, 001, 0010, 00010, 10010010, …
aaa, aabb, abba, aabb, abbbb
  (aaa|aabb|abba|aabb|abbbb)
  (a|aa)(bb)*a?
  a+b+a?
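The examples above can be checked mechanically. The following sketch, again assuming Python's `re` semantics, reports which of the listed strings each pattern fails to describe; the names `SAMPLES`, `TARGETS`, `CANDIDATES`, and `unmatched` are invented for this example.

```python
import re

# Patterns and sample strings copied from the slide ("" stands for ε).
SAMPLES = {
    r"(0|10)*1*":   ["", "0", "1", "0001", "1010101", "01111111"],
    r"1?(00*1)*0*": ["", "0", "1", "001", "0010", "00010", "10010010"],
}

# The a/b strings and the three expressions listed under them.
TARGETS = ["aaa", "aabb", "abba", "abbbb"]
CANDIDATES = [r"(aaa|aabb|abba|aabb|abbbb)", r"(a|aa)(bb)*a?", r"a+b+a?"]

def unmatched(pattern, strings):
    """Return the strings that the pattern does not describe in full."""
    return [s for s in strings if re.fullmatch(pattern, s) is None]

for pattern, strings in SAMPLES.items():
    print(pattern, "misses", unmatched(pattern, strings))
for pattern in CANDIDATES:
    print(pattern, "misses", unmatched(pattern, TARGETS))
```

Under these semantics the first two candidates cover every listed a/b string, while `a+b+a?` requires at least one `b` and therefore does not describe `aaa`.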


Formal Methods of Describing Syntax – Context-free Grammars

Developed by Noam Chomsky in the mid-1950s
Language generators, meant to describe the syntax of natural languages
Represented by variables (non-terminals) that are described recursively in terms of each other and primitive symbols called terminals
The rules relating the variables are called productions
  e.g., <sentence> → <noun phrase> <verb phrase>
        <noun phrase> → <adjective> <noun phrase>
        <noun phrase> → <noun>
        <noun> → boy
        <adjective> → little
Context-free languages are the theoretical basis for the syntax of most programming languages

* Hopcroft & Ullman Chap 4


Chomsky Hierarchy

Four classes (models) of generative devices (grammars) that define four classes of languages

Regular Grammars (Type 3)
  A → wB (or A → Bw) or A → w, where A and B are variables, and w is a string of terminals (possibly empty)
  Regular languages can be recognized by finite automata
  e.g., 0(10)*
    S → 0A, A → 10A | ε
    or, S → S10 | 0
  [Figure: a finite automaton, with a finite control reading an input tape]

Context-free Grammars (Type 2)
  A → α, where A is a variable and α is a string of variables and terminals
  Context-free languages can be recognized by push-down automata
  e.g., S → 0S0 | 1S1 | c
  [Figure: a push-down automaton, with a finite control, an input tape, and a stack]
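The difference between the two machine models can be sketched in a few lines. The first function follows the regular grammar S → 0A, A → 10A | ε using only a position in the input (no extra memory); the second recognizes the language of S → 0S0 | 1S1 | c and needs a stack to remember the first half. Both function names are hypothetical, and the stack recognizer is one straightforward way to simulate a push-down automaton, not a construction taken from the slide.

```python
def derives_regular(s):
    """Recognize 0(10)* by following S -> 0A, A -> 10A | ε left to right."""
    if not s.startswith("0"):      # S -> 0A
        return False
    rest = s[1:]
    while rest:                    # A -> 10A, repeated until A -> ε applies
        if not rest.startswith("10"):
            return False
        rest = rest[2:]
    return True

def accepts_pda(s):
    """Recognize S -> 0S0 | 1S1 | c with an explicit stack."""
    stack = []
    seen_center = False
    for ch in s:
        if not seen_center:
            if ch == "c":
                seen_center = True       # middle marker: switch to popping
            elif ch in ("0", "1"):
                stack.append(ch)         # remember the first half
            else:
                return False
        else:
            # Each symbol after c must mirror the most recently pushed one.
            if not stack or stack.pop() != ch:
                return False
    return seen_center and not stack

print(derives_regular("01010"), derives_regular("0100"))
print(accepts_pda("01c10"), accepts_pda("01c01"))
```

The stack is what lets the second recognizer match an unbounded number of nested pairs, which a finite control alone cannot do.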


Chomsky Hierarchy (cont'd)

Context-sensitive Grammars (Type 1)
  αAβ → αγβ, where A is a variable, and α, β, and γ are strings of variables and terminals (α and β may be empty, γ ≠ ε)
  "Permit replacement of variable A by string γ in the context of α and β"
  Context-sensitive languages can be recognized by non-deterministic Turing machines whose tape is bounded by the input length (linear bounded automata)

Unrestricted Grammars (Type 0)
  α → β, where α and β are strings of variables and terminals (α ≠ ε)
  Unrestricted languages can be recognized by Turing machines

[Figure: a Turing machine, with a finite control reading an input tape]

Turing Machines
  A simple mathematical model of a computer
  Input tape is infinite to the right
  The n leftmost cells hold the input
  The remaining infinity of cells each hold the blank
  In one move, the machine
  1. Changes state
  2. Prints a symbol on the tape cell being scanned, replacing what was written there
  3. Moves the head left or right one cell

* Hopcroft & Ullman Chap 4
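The three-part move described above (change state, print a symbol, move the head) can be sketched as a small simulator. The example machine accepts strings of the form 0ⁿ1ⁿ with n ≥ 1; its transition table follows a common textbook-style construction and is not taken from the slide, and the names `DELTA`, `BLANK`, `ACCEPT`, and `run` are assumptions made for this example.

```python
BLANK = "B"
ACCEPT = "q4"

# (state, scanned symbol) -> (next state, symbol to print, head direction)
DELTA = {
    ("q0", "0"): ("q1", "X", "R"),   # mark a 0, go look for a matching 1
    ("q0", "Y"): ("q3", "Y", "R"),   # no unmarked 0 left: verify the rest
    ("q1", "0"): ("q1", "0", "R"),
    ("q1", "Y"): ("q1", "Y", "R"),
    ("q1", "1"): ("q2", "Y", "L"),   # mark the matching 1, head back left
    ("q2", "0"): ("q2", "0", "L"),
    ("q2", "Y"): ("q2", "Y", "L"),
    ("q2", "X"): ("q0", "X", "R"),   # back at the last marked 0
    ("q3", "Y"): ("q3", "Y", "R"),
    ("q3", BLANK): (ACCEPT, BLANK, "R"),
}

def run(word, max_steps=10_000):
    """Simulate the machine; return True if it reaches the accepting state."""
    tape = dict(enumerate(word))     # the n leftmost cells hold the input
    state, head = "q0", 0
    for _ in range(max_steps):
        if state == ACCEPT:
            return True
        move = DELTA.get((state, tape.get(head, BLANK)))  # other cells are blank
        if move is None:
            return False             # no applicable move: halt and reject
        state, symbol, direction = move
        tape[head] = symbol          # print, replacing the scanned symbol
        head += 1 if direction == "R" else -1
        if head < 0:
            return False             # the tape has no cells left of cell 0
    raise RuntimeError("step limit reached without halting")

for w in ["01", "0011", "001", "011"]:
    print(w, run(w))
```

The step limit is a practical safeguard for the sketch only; a Turing machine in general need not halt.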


Backus-Naur Form (BNF)

Invented by John Backus to describe Algol 58 (1959)
The most widely used method for describing programming language syntax
Equivalent to context-free grammars
A meta-language to describe other languages

e.g., A small program (Example 3.1)

<program>    → begin <stmt_list> end
<stmt_list>  → <stmt>
             | <stmt> ; <stmt_list>
<stmt>       → <var> = <expression>
<var>        → A | B | C
<expression> → <var> + <var>
             | <var> - <var>
             | <var>

Production (rule)
  LHS (Left-hand side): Abstraction or Non-terminal or Variable
  RHS (Right-hand side): Lexemes and tokens (terminals), and references to other abstractions
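One way to see that the BNF rules define a language is to turn each abstraction into a function. The sketch below is a hand-written recursive-descent recognizer for the small grammar of Example 3.1; it assumes tokens are separated by spaces, and every function name in it is invented for this illustration.

```python
VARS = {"A", "B", "C"}

def recognize(source):
    """Return True if source is a sentence of the Example 3.1 grammar."""
    tokens = source.split()          # assumption: tokens are space-separated
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {peek()!r}")
        pos += 1

    def var():                       # <var> -> A | B | C
        nonlocal pos
        if peek() not in VARS:
            raise SyntaxError(f"expected a variable, found {peek()!r}")
        pos += 1

    def expression():                # <expression> -> <var> + <var> | <var> - <var> | <var>
        var()
        if peek() in ("+", "-"):
            expect(peek())
            var()

    def stmt():                      # <stmt> -> <var> = <expression>
        var()
        expect("=")
        expression()

    def stmt_list():                 # <stmt_list> -> <stmt> | <stmt> ; <stmt_list>
        stmt()
        if peek() == ";":
            expect(";")
            stmt_list()

    def program():                   # <program> -> begin <stmt_list> end
        expect("begin")
        stmt_list()
        expect("end")

    try:
        program()
        return pos == len(tokens)    # reject trailing tokens
    except SyntaxError:
        return False

print(recognize("begin A = B + C ; B = C end"))
print(recognize("begin A = B * C end"))
```

Each function corresponds to one LHS abstraction, and its body mirrors the alternatives on the RHS of that rule.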


Derivations

Repeated application of productions, starting with the start symbol and ending with a sentence (all terminal symbols)

<program> ⇒ begin <stmt_list> end
          ⇒ begin <stmt> ; <stmt_list> end
          ⇒ begin <var> = <expression> ; <stmt_list> end
          ⇒ begin A = <expression> ; <stmt_list> end
          ⇒ begin A = <var> + <var> ; <stmt_list> end
          ⇒ begin A = B + <var> ; <stmt_list> end
          ⇒ begin A = B + C ; <stmt_list> end
          ⇒ begin A = B + C ; <stmt> end
          ⇒ begin A = B + C ; <var> = <expression> end
          ⇒ begin A = B + C ; B = <expression> end
          ⇒ begin A = B + C ; B = <var> end
          ⇒ begin A = B + C ; B = C end

Start Symbol: <program>
Sentential Form: each string in the derivation
Sentence: the last string (terminals only)
Leftmost Derivation: the leftmost non-terminal is replaced at each step
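The derivation above can be replayed step by step. In this sketch the grammar is stored as a dictionary from each non-terminal to its list of alternatives, and a derivation is given as a list of alternative indices, one per step; that encoding and the names `GRAMMAR`, `CHOICES`, and `leftmost_derivation` are assumptions made for the example.

```python
GRAMMAR = {
    "<program>":    [["begin", "<stmt_list>", "end"]],
    "<stmt_list>":  [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
    "<stmt>":       [["<var>", "=", "<expression>"]],
    "<var>":        [["A"], ["B"], ["C"]],
    "<expression>": [["<var>", "+", "<var>"], ["<var>", "-", "<var>"], ["<var>"]],
}

def leftmost_derivation(start, choices):
    """Apply one production per choice, always to the leftmost non-terminal."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Index of the leftmost symbol that is still a non-terminal.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form[i:i + 1] = GRAMMAR[form[i]][choice]
        steps.append(" ".join(form))
    return steps

# Alternative indices that reproduce the derivation of
# "begin A = B + C ; B = C end" shown above.
CHOICES = [0, 1, 0, 0, 0, 1, 2, 0, 0, 1, 2, 2]

for line in leftmost_derivation("<program>", CHOICES):
    print(line)
```

Every printed line is a sentential form; the last one contains only terminals and is therefore a sentence.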


Parse Trees

A parse tree: a hierarchical representation of a derivation
  Internal nodes of a parse tree: non-terminal symbols
  Leaf nodes of a parse tree: terminal symbols
  Each sub-tree of a parse tree: an instance of an abstraction

[Figure: a parse tree rooted at <program>, with internal nodes <stmts>, <stmt>, <var>, <expr>, and <term>, and leaves a, =, b, +, const]
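A parse tree can be represented as a nested data structure. The sketch below uses tuples of the form (label, child, child, ...) for internal nodes and plain strings for leaves. The exact shape of the tree is an assumption reconstructed from the node labels that survive in the transcript, so it may differ in detail from the original figure.

```python
# Assumed tree shape: internal nodes are tuples, leaves are strings.
TREE = ("<program>",
        ("<stmts>",
         ("<stmt>",
          ("<var>", "a"),
          "=",
          ("<expr>",
           ("<term>", ("<var>", "b")),
           "+",
           ("<term>", "const")))))

def leaves(node):
    """Collect the terminal symbols at the leaves, left to right."""
    if isinstance(node, str):
        return [node]
    _label, *children = node
    result = []
    for child in children:
        result.extend(leaves(child))
    return result

def internal_labels(node):
    """Collect the non-terminal labels of all internal nodes, in preorder."""
    if isinstance(node, str):
        return []
    label, *children = node
    result = [label]
    for child in children:
        result.extend(internal_labels(child))
    return result

print(" ".join(leaves(TREE)))
print(internal_labels(TREE))
```

Reading the leaves from left to right gives back the sentence that the tree derives, and each tuple is one instance of the abstraction named by its label.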


Reference

Reference for computation theory
  Introduction to Automata Theory, Languages, and Computation by John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman, 2nd Ed., Addison Wesley, 2003