ICE1341 Programming Languages
Spring 2005
Lecture #4
In-Young Ko (iko .AT. icu.ac.kr)
Information and Communications University (ICU)
Spring 2005 2 ICE 1341 – Programming Languages © In-Young Ko, Information and Communications University

Announcements
- Send the language-survey information to the TA
- Form your project teams by this Thursday, March 10th
  - Include 4-5 students in each team
  - Mix skill levels
  - Mix genders (if possible)

Last Lecture
- Language evaluation criteria
  - Readability
  - Writability
  - Reliability

This Lecture
- Language Syntax and Semantics
- Formal Ways to Define Languages
  - Chomsky Hierarchy
  - Backus-Naur Form (BNF)

What are Syntax and Semantics?
- Syntax: the form of expressions, statements, and program units
  e.g., while (<boolean_expr>) <statement>
- Semantics: the meaning of those expressions, statements, and program units
  e.g., "When the current value of the Boolean expression is true, the embedded statement is executed."

Describing Syntax
- A sentence (statement) is a string of characters over some alphabet
- A language is a set of sentences
- A lexeme is the lowest-level syntactic unit of a language (e.g., *, sum, begin)
- A token is a category of lexemes (e.g., identifier)
* AW Lecture Notes

e.g., index = 2 * count + 17;

Lexemes   Tokens
index     identifier
=         equal_sign
2         int_literal
*         mult_op
count     identifier
+         plus_op
17        int_literal
;         semicolon
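
The lexeme/token table can be reproduced with a small lexical analyzer. The following is a minimal sketch, not part of the original slides: the token names come from the table, while the regular expression chosen for each token category and the function name `tokenize` are assumptions made for illustration.

```python
import re

# Token categories from the table; the pattern for each one is an assumption.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("equal_sign", r"="),
    ("mult_op", r"\*"),
    ("plus_op", r"\+"),
    ("semicolon", r";"),
    ("skip", r"\s+"),  # whitespace separates lexemes and is discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (lexeme, token) pairs; raise ValueError on unknown input."""
    pairs = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs


for lexeme, token in tokenize("index = 2 * count + 17;"):
    print(f"{lexeme:<8}{token}")
```

Each lexeme is the concrete text found in the input; the token is the category it falls into, so `index` and `count` are different lexemes of the same token.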

Formal Ways to Define Languages
- Language Recognizers
  - A device that determines whether a given program is in a language
  - e.g., a syntax analyzer of a compiler, finite automata
- Language Generators
  - A device that can be used to generate the sentences of a language
  - e.g., regular expressions, context-free grammars

Example regular expression: ((00)* 1 (11)*)+ 0

[Figure: the transition diagram of a finite automaton with states q0, q1, q2, q3 and transitions labeled 0 and 1]
A finite automaton is a 5-tuple M = (Q, Σ, δ, q0, F)

Input     Result
001110    Accepted
111110    Accepted
000110    Not accepted
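
A recognizer can be sketched as a table-driven finite automaton. The transition table below was derived by hand for the regular expression ((00)* 1 (11)*)+ 0; it is an assumption for illustration and is not copied from the slide's diagram, whose exact transitions are not recoverable here. A missing table entry means the automaton is stuck and rejects.

```python
# M = (Q, Sigma, delta, q0, F) for ((00)*1(11)*)+0, built by hand (assumption).
DELTA = {
    ("q0", "0"): "q1",  # odd number of 0s so far in the current run
    ("q1", "0"): "q0",  # back to an even number of 0s
    ("q0", "1"): "q2",  # a run of 1s has started
    ("q2", "1"): "q2",  # stay inside the run of 1s
    ("q2", "0"): "q3",  # candidate final 0
    ("q3", "0"): "q0",  # it was the first half of a "00" pair instead
}
START = "q0"
ACCEPTING = {"q3"}


def accepts(string):
    """Run the automaton over the string; reject on a missing transition."""
    state = START
    for symbol in string:
        state = DELTA.get((state, symbol))
        if state is None:
            return False
    return state in ACCEPTING


for s in ["001110", "111110", "000110"]:
    print(s, "Accepted" if accepts(s) else "Not accepted")
```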

Regular Expressions
- Define patterns of strings (languages)
- Widely used for text-search applications
  - e.g., UNIX grep command, string match in Perl
- Used as the input to lexical analyzer generators, such as Lex or Flex
- e.g., Handel, Händel, and Haendel are described by the pattern "H(a|ä|ae)ndel"
http://en.wikipedia.org/wiki/Regular_expression

Regular Expression Syntax
- Alternation: |
  - e.g., "gray|grey"
- Quantification: ?, +, *
  - ?: the preceding pattern may be present at most once (e.g., "colou?r")
  - +: the preceding pattern must be present at least once (e.g., "goo+gle")
  - *: the preceding pattern may be present zero, one, or more times (e.g., "0*42")
- Grouping: ( )
  - e.g., "gr(a|e)y", "(grand)?father"
http://en.wikipedia.org/wiki/Regular_expression
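
The operators can be tried out with Python's `re` module, whose syntax for |, ?, +, * and ( ) agrees with the notation on this slide. This is a small sketch; the test strings other than the slide's own patterns are chosen for illustration.

```python
import re

# Pattern -> (strings that should match, strings that should not)
EXAMPLES = {
    r"gray|grey": (["gray", "grey"], ["griy"]),
    r"colou?r": (["color", "colour"], ["colouur"]),
    r"goo+gle": (["google", "gooogle"], ["gogle"]),
    r"0*42": (["42", "042", "0042"], ["142"]),
    r"gr(a|e)y": (["gray", "grey"], ["gry"]),
    r"(grand)?father": (["father", "grandfather"], ["grandgrandfather"]),
}


def matches(pattern, string):
    """True if the whole string is described by the pattern."""
    return re.fullmatch(pattern, string) is not None


for pattern, (good, bad) in EXAMPLES.items():
    for s in good + bad:
        print(f"{pattern:<18}{s:<18}{matches(pattern, s)}")
```

`re.fullmatch` is used rather than `re.search` so that the pattern must describe the entire string, which is the language-membership reading used in this lecture.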

Regular Expression Examples
- (0|10)*1*
  ε, 0, 1, 0001, 1010101, 01111111, …
- 1?(00*1)*0*
  ε, 0, 1, 001, 0010, 00010, 10010010, …
- aaa, aabb, abba, aabb, abbbb
  (aaa|aabb|abba|aabb|abbbb)
  (a|aa)(bb)*a?
  a+b+a?
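
One way to compare the three candidate patterns for the string list aaa, aabb, abba, aabb, abbbb is to check which of the listed strings each pattern fails to describe. This is a hedged sketch; the helper name `unmatched` is hypothetical.

```python
import re

STRINGS = ["aaa", "aabb", "abba", "aabb", "abbbb"]
CANDIDATES = [
    r"(aaa|aabb|abba|aabb|abbbb)",
    r"(a|aa)(bb)*a?",
    r"a+b+a?",
]


def unmatched(pattern):
    """Return the listed strings that the pattern does not fully match."""
    return [s for s in STRINGS if re.fullmatch(pattern, s) is None]


for pattern in CANDIDATES:
    print(pattern, "misses", unmatched(pattern))
```

The first pattern simply enumerates the strings, the second generalizes them, and the third requires at least one b, so it cannot describe aaa.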

Formal Methods of Describing Syntax: Context-free Grammars
- Developed by Noam Chomsky in the mid-1950s
- Language generators, meant to describe the syntax of natural languages
- Represented by variables (non-terminals) that are described recursively in terms of each other and primitive symbols called terminals
- The rules relating the variables are called productions
  e.g., <sentence> → <noun phrase> <verb phrase>
        <noun phrase> → <adjective> <noun phrase>
        <noun phrase> → <noun>
        <noun> → boy
        <adjective> → little
- Context-free languages are the theoretical basis for the syntax of most programming languages
* Hopcroft & Ullman Chap 4
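
To see a grammar acting as a generator, the noun-phrase productions can be stored as data and expanded recursively. This is a minimal sketch: the slide gives no production for <verb phrase>, so <sentence> is left out, and the depth bound and the function name `generate` are assumptions needed to keep the recursive rule from expanding forever.

```python
# Productions from the slide, keyed by variable (non-terminal).
GRAMMAR = {
    "<noun phrase>": [["<adjective>", "<noun phrase>"], ["<noun>"]],
    "<noun>": [["boy"]],
    "<adjective>": [["little"]],
}


def generate(symbol, depth):
    """Yield word lists derivable from symbol with at most `depth` nested expansions."""
    if symbol not in GRAMMAR:  # a terminal derives only itself
        yield [symbol]
        return
    if depth == 0:  # expansion budget used up
        return
    for production in GRAMMAR[symbol]:
        partials = [[]]
        for part in production:
            partials = [
                prefix + suffix
                for prefix in partials
                for suffix in generate(part, depth - 1)
            ]
        yield from partials


for words in generate("<noun phrase>", 4):
    print(" ".join(words))
```

Raising the depth bound yields longer phrases, which illustrates how a finite set of recursive productions describes an infinite language.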

Chomsky Hierarchy
Four classes (models) of generative devices (grammars) that define four classes of languages

- Regular Grammars (Type 3)
  - A → wB (or A → Bw) or A → w, where A and B are variables, and w is a string of terminals (or empty)
  - Regular languages can be recognized by finite automata
  - e.g., 0(10)*: S → 0A, A → 10A | ε; or, S → S10 | 0
  [Figure: finite automaton, a finite control reading an input tape containing 0 1 0 1 0]
- Context-free Grammars (Type 2)
  - A → α, where A is a variable and α is a string of variables and terminals
  - Context-free languages can be recognized by push-down automata
  - e.g., S → 0S0 | 1S1 | c
  [Figure: push-down automaton, a finite control with an input tape and a stack]
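
The role of the stack in a push-down automaton can be sketched for the language generated by S → 0S0 | 1S1 | c, that is, strings of the form w c reverse(w) with w over {0, 1}. The code below is an illustration of the idea under that reading, not a general push-down automaton simulator; the function name `accepts` is an assumption.

```python
def accepts(string):
    """Recognize w c reverse(w), where w is a string over {0, 1}."""
    stack = []
    symbols = iter(string)

    # Phase 1: push every symbol read before the centre marker c.
    for symbol in symbols:
        if symbol == "c":
            break
        if symbol not in ("0", "1"):
            return False
        stack.append(symbol)
    else:
        return False  # input ended without a centre marker

    # Phase 2: each remaining symbol must equal the symbol popped from the stack.
    for symbol in symbols:
        if not stack or stack.pop() != symbol:
            return False

    return not stack  # accept only if every pushed symbol was matched


for s in ["c", "1c1", "01c10", "01c01", "0c"]:
    print(s, accepts(s))
```

A finite automaton cannot do this because it has no unbounded memory in which to remember w; the stack supplies exactly that.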

Chomsky Hierarchy (cont'd)
- Context-sensitive Grammars (Type 1)
  - αAβ → αγβ, where A is a variable, and α, β and γ are strings of variables and terminals (α and β may be empty, γ ≠ ε)
  - "Permit replacement of variable A by string γ in the context of α and β"
  - Context-sensitive languages can be recognized by linear bounded automata (non-deterministic Turing machines whose tape use is bounded by the input length)
- Unrestricted Grammars (Type 0)
  - α → β, where α and β are strings of variables and terminals (α ≠ ε)
  - Unrestricted languages can be recognized by Turing machines

[Figure: Turing machine, a finite control with a head on an input tape containing 0 1 0 1 0 1 0]

Turing Machines
- A simple mathematical model of a computer
- Input tape is infinite to the right
  - The n leftmost cells hold the input
  - The remaining infinity of cells each hold the blank
- In one move, the machine does the following:
  1. Change state
  2. Print a symbol on the scanned tape cell, replacing what was written there
  3. Move the head left or right one cell
* Hopcroft & Ullman Chap 4
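
The three parts of a Turing machine move (change state, write a symbol, move the head) can be sketched as a small simulator. The transition table below is an illustrative machine chosen for this sketch, accepting strings of the form 0^n 1^n with n ≥ 1; it is not taken from the slides, and the names `DELTA`, `run`, and the step limit are assumptions.

```python
BLANK = "B"
ACCEPT = "q4"

# (state, scanned symbol) -> (next state, symbol to write, head move)
DELTA = {
    ("q0", "0"): ("q1", "X", "R"),  # mark a 0, go look for a matching 1
    ("q0", "Y"): ("q3", "Y", "R"),  # no 0s left, check that only Ys remain
    ("q1", "0"): ("q1", "0", "R"),
    ("q1", "Y"): ("q1", "Y", "R"),
    ("q1", "1"): ("q2", "Y", "L"),  # mark the matching 1, go back left
    ("q2", "0"): ("q2", "0", "L"),
    ("q2", "Y"): ("q2", "Y", "L"),
    ("q2", "X"): ("q0", "X", "R"),  # reached the last marked 0
    ("q3", "Y"): ("q3", "Y", "R"),
    ("q3", BLANK): (ACCEPT, BLANK, "R"),
}


def run(input_string, max_steps=10_000):
    """Simulate the machine; return True if it halts in the accepting state."""
    tape = list(input_string) or [BLANK]  # the n leftmost cells hold the input
    state, head = "q0", 0
    for _ in range(max_steps):
        if state == ACCEPT:
            return True
        move = DELTA.get((state, tape[head]))
        if move is None:
            return False  # no move defined: halt and reject
        state, symbol, direction = move  # 1. change state
        tape[head] = symbol              # 2. print a symbol on the scanned cell
        head += 1 if direction == "R" else -1  # 3. move the head one cell
        if head < 0:
            return False  # the tape has no cells to the left of the first one
        if head == len(tape):
            tape.append(BLANK)  # the rest of the tape is blank
    raise RuntimeError("step limit reached")


for s in ["01", "0011", "001", "10"]:
    print(s, run(s))
```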

Backus-Naur Form (BNF)
- Invented by John Backus to describe Algol 58 (1959)
- The most widely used method for describing programming language syntax
- Equivalent to context-free grammars
- A meta-language to describe other languages

e.g., A small program (Example 3.1)

<program> → begin <stmt_list> end
<stmt_list> → <stmt>
            | <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var> + <var>
             | <var> - <var>
             | <var>

Parts of a production (rule):
- LHS (left-hand side): an abstraction (also called a non-terminal or variable)
- RHS (right-hand side): lexemes and tokens (terminals), and references to other abstractions
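
Each production of Example 3.1 can be turned into one small function, which gives a recursive-descent recognizer. This is a sketch under the assumption that the input has already been split into lexemes separated by whitespace; the function names are hypothetical and mirror the non-terminals.

```python
def parse_program(tokens):
    """Return True if the token list is a <program> of Example 3.1."""
    pos = 0

    def expect(value):
        # Consume one terminal if it is the next token.
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == value:
            pos += 1
            return True
        return False

    def var():
        # <var> -> A | B | C
        return expect("A") or expect("B") or expect("C")

    def expression():
        # <expression> -> <var> + <var> | <var> - <var> | <var>
        if not var():
            return False
        if expect("+") or expect("-"):
            return var()
        return True

    def stmt():
        # <stmt> -> <var> = <expression>
        return var() and expect("=") and expression()

    def stmt_list():
        # <stmt_list> -> <stmt> | <stmt> ; <stmt_list>
        if not stmt():
            return False
        if expect(";"):
            return stmt_list()
        return True

    # <program> -> begin <stmt_list> end
    return expect("begin") and stmt_list() and expect("end") and pos == len(tokens)


print(parse_program("begin A = B + C ; B = C end".split()))
print(parse_program("begin A = B + end".split()))
```

Note that the grammar uses the semicolon as a separator, so a statement list ending in `;` is rejected.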

Derivations
Repeated application of productions, starting with the start symbol and ending with a sentence (all terminal symbols)

<program> ⇒ begin <stmt_list> end
          ⇒ begin <stmt> ; <stmt_list> end
          ⇒ begin <var> = <expression> ; <stmt_list> end
          ⇒ begin A = <expression> ; <stmt_list> end
          ⇒ begin A = <var> + <var> ; <stmt_list> end
          ⇒ begin A = B + <var> ; <stmt_list> end
          ⇒ begin A = B + C ; <stmt_list> end
          ⇒ begin A = B + C ; <stmt> end
          ⇒ begin A = B + C ; <var> = <expression> end
          ⇒ begin A = B + C ; B = <expression> end
          ⇒ begin A = B + C ; B = <var> end
          ⇒ begin A = B + C ; B = C end

- Start symbol: <program>
- Sentential form: each string of symbols in the derivation
- Sentence: the last line, which contains only terminal symbols
- Leftmost derivation: at every step the leftmost non-terminal is the one replaced
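
A leftmost derivation can be replayed mechanically: at each step, find the leftmost non-terminal in the sentential form and replace it with the right-hand side of a chosen production. The sketch below encodes the Example 3.1 grammar as data; the list `CHOICES` (indices of the productions to apply, in order) is an assumption worked out by hand to follow the derivation on this slide.

```python
GRAMMAR = {
    "<program>": [["begin", "<stmt_list>", "end"]],
    "<stmt_list>": [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
    "<stmt>": [["<var>", "=", "<expression>"]],
    "<var>": [["A"], ["B"], ["C"]],
    "<expression>": [["<var>", "+", "<var>"], ["<var>", "-", "<var>"], ["<var>"]],
}

# Production index to apply at each step of the slide's derivation.
CHOICES = [0, 1, 0, 0, 0, 1, 2, 0, 0, 1, 2, 2]


def leftmost_derivation(start, choices):
    """Return every sentential form produced by the given production choices."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Locate the leftmost non-terminal and rewrite it.
        index = next(i for i, symbol in enumerate(form) if symbol in GRAMMAR)
        form[index:index + 1] = GRAMMAR[form[index]][choice]
        steps.append(" ".join(form))
    return steps


for step in leftmost_derivation("<program>", CHOICES):
    print("=>", step)
```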

Parse Trees
- A parse tree: a hierarchical representation of a derivation
- Internal nodes of a parse tree: non-terminal symbols
- Leaf nodes of a parse tree: terminal symbols
- Each subtree of a parse tree: an instance of an abstraction

[Figure: parse tree for a = b + const]
<program>
  <stmts>
    <stmt>
      <var>
        a
      =
      <expr>
        <term>
          <var>
            b
        +
        <term>
          const
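
A parse tree can be represented as nested (label, children) pairs, with plain strings as leaves. The sketch below assumes the figure on this slide is the parse tree of the statement a = b + const; the tuple encoding and the helper names `leaves` and `show` are choices made for illustration.

```python
# Internal nodes are (non-terminal, children); leaves are terminal strings.
TREE = ("<program>", [
    ("<stmts>", [
        ("<stmt>", [
            ("<var>", ["a"]),
            "=",
            ("<expr>", [
                ("<term>", [("<var>", ["b"])]),
                "+",
                ("<term>", ["const"]),
            ]),
        ]),
    ]),
])


def leaves(node):
    """Collect the terminal symbols at the leaves, left to right."""
    if isinstance(node, str):
        return [node]
    _, children = node
    result = []
    for child in children:
        result.extend(leaves(child))
    return result


def show(node, indent=0):
    """Print the tree with one level of indentation per depth."""
    if isinstance(node, str):
        print(" " * indent + node)
        return
    label, children = node
    print(" " * indent + label)
    for child in children:
        show(child, indent + 2)


show(TREE)
print(" ".join(leaves(TREE)))
```

Reading the leaves from left to right gives back the sentence, while each internal node records which abstraction that part of the sentence is an instance of.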

Reference
- Reference for computation theory
  - Introduction to Automata Theory, Languages, and Computation, by John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman, 2nd Ed., Addison Wesley, 2003