Winter 2007 SEG2101 Chapter 7 1
Chapter 7
Introduction to Languages and Compilers
Contents
• Computer architecture
• Compiler
• Grammars
• Formal languages
• Parse trees
• Ambiguity
• Regular expressions
Von Neumann Architecture
Compiler
A compiler is a program that reads a program written in one language – the source language – and translates it into an equivalent program in another language – the target language.
The Compilation process
Grammars
• A grammar is defined as a 4-tuple (Σ, N, P, S): the alphabet Σ, the nonterminals N, the productions P, and a goal symbol S.
• Σ, N, and P are sets; S is a particular element of the set N.
Alphabets and Strings
• Σ is the alphabet, or set of terminals.
• It is a finite set consisting of all the input characters or symbols that can be arranged to form sentences in the language.
• English: the letters A to Z plus, in our definition, punctuation and space symbols.
• Programming language: usually some well-defined computer character set such as ASCII.
Alphabets and Strings (II)
• A compiler is usually defined with two grammars: one for the scanner and one for the parser.
• The alphabet for the scanner grammar is ASCII or some subset of it.
• The alphabet for the parser grammar is the set of tokens generated by the scanner, not ASCII at all.
An Example of Strings
• Σ = {a, b, c, d}
• Possible strings of terminals from Σ include aaa, aabbccdd, d, cba, abab, ccccccccccacccc, and so on.
Formal Languages
• Σ: the alphabet, a finite set consisting of all input characters or symbols.
• Σ*: the closure of the alphabet, the set of all possible strings over Σ, including the empty string ε.
• A (formal) language is some specified subset of Σ*.
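The closure and the idea of a language as a subset of it can be sketched in Python. This is only an illustration: the alphabet is the one from the "An Example of Strings" slide, and the membership rule `in_language` (strings containing no `d`) is a hypothetical language chosen for the example, not one defined in the slides.

```python
from itertools import islice, product

SIGMA = ("a", "b", "c", "d")  # alphabet from the "An Example of Strings" slide

def closure(alphabet):
    """Yield every string over `alphabet`, shortest first, starting with the empty string."""
    length = 0
    while True:
        for symbols in product(alphabet, repeat=length):
            yield "".join(symbols)
        length += 1

def in_language(s):
    """Hypothetical language: strings over SIGMA that contain no 'd'."""
    return all(ch in SIGMA for ch in s) and "d" not in s

# The closure is infinite, so only a finite prefix can be listed.
first_strings = list(islice(closure(SIGMA), 7))
print(first_strings)  # ['', 'a', 'b', 'c', 'd', 'aa', 'ab']
print([s for s in first_strings if in_language(s)])
```

Because the closure is infinite, the generator is lazy and `islice` takes a finite prefix; a language is then just a predicate that selects some of those strings.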
Nonterminals
• The nonterminal set N is a finite set of symbols not in the alphabet.
• A particular nonterminal, the goal symbol S, represents exactly all the strings in the language.
• The goal symbol is also often called the start symbol because we start with it.
• The set of terminals and the set of nonterminals, taken together, are called the vocabulary of the grammar.
Productions
• The productions P of a grammar are a set of rewriting rules, each written as two strings of symbols separated by an arrow.
• The symbols on each side of the arrow may be drawn from both terminals and nonterminals, subject to certain restrictions imposed by the form of the grammar.
An Example Grammar
• G1 = ({a, b, c}, {A, B}, {A → aB, A → bB, A → cB, B → a, B → b, B → c}, A)
• The grammar generates 9 two-letter strings.
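The count of nine strings can be checked mechanically. The sketch below encodes G1 as a 4-tuple in Python and expands the leftmost nonterminal until only terminals remain. The dict-of-lists encoding and the function name `generate` are assumptions made for this example, and the approach only terminates for grammars with a finite language, such as G1.

```python
from collections import deque

# G1 = (Sigma, N, P, S) from the slide; the encoding below is an assumption.
SIGMA = {"a", "b", "c"}
NONTERMINALS = {"A", "B"}
PRODUCTIONS = {
    "A": ["aB", "bB", "cB"],
    "B": ["a", "b", "c"],
}
GOAL = "A"

def generate(productions, nonterminals, goal):
    """Return every terminal string derivable from `goal` (finite languages only)."""
    sentences = set()
    queue = deque([goal])
    while queue:
        form = queue.popleft()
        # Locate the leftmost nonterminal in the current sentential form.
        index = next((i for i, sym in enumerate(form) if sym in nonterminals), None)
        if index is None:
            sentences.add(form)  # only terminals left: this is a sentence
            continue
        for rhs in productions[form[index]]:
            queue.append(form[:index] + rhs + form[index + 1:])
    return sorted(sentences)

sentences = generate(PRODUCTIONS, NONTERMINALS, GOAL)
print(len(sentences), sentences)
# 9 ['aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
```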
Syntax and Semantics
• Syntax: the syntax of a programming language is the form of its expressions, statements, and program units.
• Semantics: the meaning of those expressions, statements, and program units.
• Example of syntax: if (<expr>) <statement>
Sentences, Lexeme, Token
• Sentences: the strings of a language are called sentences or statements.
• Lexeme: the lexemes of a programming language include its identifiers, literals, operators, and special words.
• Token: a token of a language is a category of its lexemes.
Lexeme and Token
Lexeme    Token
index     identifier
=         equal_sign
2         int_literal
*         multi_op
count     identifier
+         plus_op
17        int_literal
;         semicolon

Statement: index = 2 * count + 17;
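A scanner that produces such lexeme/token pairs can be sketched with Python's `re` module. The token names are taken from the table; the regular expressions attached to them, the extra `skip` category for whitespace, and the function name `tokenize` are assumptions made for this example. The statement is written in lowercase here.

```python
import re

# Token categories from the table; the patterns themselves are assumptions.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("equal_sign", r"="),
    ("multi_op", r"\*"),
    ("plus_op", r"\+"),
    ("semicolon", r";"),
    ("skip", r"\s+"),  # whitespace separates lexemes but is not a token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return (lexeme, token) pairs; raise on a character no pattern matches."""
    pairs = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs

pairs = tokenize("index = 2 * count + 17;")
for lexeme, token in pairs:
    print(f"{lexeme:<8}{token}")
```

Each named group corresponds to one token category, so `match.lastgroup` gives the token and `match.group()` gives the lexeme.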
The Role of Grammars
• The grammar of a language defines the correct form for sentences in that language.
• A grammar is a formal language-generation mechanism commonly used to describe the syntax of programming languages.
BNF: Backus-Naur Form
• Backus presented a new formal notation for specifying programming language syntax.
• Naur modified the notation slightly.
• The result is known as Backus-Naur Form, or BNF.
• BNF is a very natural notation for describing syntax.
• The terms BNF and context-free grammar (or simply grammar) are used interchangeably.
BNF
• Metalanguage: a language used to describe another language. BNF is a metalanguage for programming languages.
• Abstraction: the symbol on the left-hand side of the arrow.
• Definition: the text to the right of the arrow.
• Rule (production): the abstraction and its definition taken together.
BNF Description (A Simple C Assignment Statement)
<assign> → <var> = <expression>
• Rule (production): the whole line.
• LHS (left-hand side): the abstraction, <assign>.
• RHS (right-hand side): the definition, <var> = <expression>.
Nonterminal and Terminal
• Nonterminal symbol: the abstraction in a BNF description or grammar
• Terminal symbol: the lexemes and tokens of the rules
• A BNF description or grammar is simply a collection of rules.
• Nonterminals can have two or more distinct definitions.
• Multiple definitions can be written as a single rule, with the different definitions separated by |, meaning logical OR.
<if_stmt> → if <logic_expr> then <stmt>
          | if <logic_expr> then <stmt> else <stmt>
List of Syntactic Elements
• BNF does not include an ellipsis (…) for writing lists.
• Instead, BNF describes lists with recursion.
• A rule is recursive if its LHS appears in its RHS.
• e.g., <ident_list> → identifier | identifier , <ident_list>
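The recursive rule maps directly onto a recursive function. The sketch below recognizes an identifier list given as a Python list of token strings; the function names and the use of `str.isidentifier` as the test for an identifier token are assumptions made for this example.

```python
def parse_ident_list(tokens, pos=0):
    """Recognize <ident_list> -> identifier | identifier , <ident_list>.

    Returns the position just past the list, or raises SyntaxError.
    """
    if pos >= len(tokens) or not tokens[pos].isidentifier():
        raise SyntaxError(f"identifier expected at token {pos}")
    pos += 1
    if pos < len(tokens) and tokens[pos] == ",":
        # Second alternative: identifier , <ident_list> (the recursive case).
        return parse_ident_list(tokens, pos + 1)
    return pos  # First alternative: a single identifier.

def is_ident_list(tokens):
    """True when the whole token list is a valid identifier list."""
    try:
        return parse_ident_list(tokens) == len(tokens)
    except SyntaxError:
        return False

print(is_ident_list(["a", ",", "b", ",", "c"]))  # True
print(is_ident_list(["a", ","]))                 # False
```

The function calls itself exactly where the rule mentions its own LHS, which is how recursion stands in for the missing ellipsis.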
A Grammar
A Derivation of a Program
Another Grammar
A Derivation of a Statement
Parse Tree
Grammars naturally describe the hierarchical syntactic structure of the sentences of the languages they define.
These hierarchical structures are called parse trees.
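One simple way to hold a parse tree in a program is as nested tuples. The sketch below is an illustration only: it uses the string ab derived in grammar G1 from the earlier example slide (A → aB, B → b), and the tuple encoding and function names are assumptions, not part of the slides.

```python
# A parse tree as nested tuples: (nonterminal, child, child, ...).
# Leaves are plain strings (terminals). This tree derives "ab" in G1.
TREE = ("A", "a", ("B", "b"))

def leaves(tree):
    """Concatenate the terminals at the leaves, left to right."""
    if isinstance(tree, str):
        return tree
    _label, *children = tree
    return "".join(leaves(child) for child in children)

def render(tree, depth=0):
    """Return the tree as indented lines, one node per line."""
    indent = "  " * depth
    if isinstance(tree, str):
        return [indent + tree]
    label, *children = tree
    lines = [indent + label]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines

print("\n".join(render(TREE)))
print(leaves(TREE))  # ab
```

Reading the leaves from left to right gives back the sentence, while the interior nodes record which rule produced each part.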
Ambiguous Grammar
• A grammar that generates a sentence for which there are two or more distinct parse trees is said to be ambiguous.
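Ambiguity can be demonstrated by counting parse trees. The sketch below assumes a small expression grammar, E → E + E | E * E | id, which is a common textbook example and is not taken from these slides. It counts the distinct parse trees for a token list by trying every operator as the root of the tree.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees for `tokens` under E -> E + E | E * E | id (assumed grammar)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from E.
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("+", "*"):
                # tokens[mid] is the operator at the root of this tree.
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))

print(count_parse_trees(["id", "+", "id", "*", "id"]))  # 2
```

A result greater than 1 for any sentence shows the assumed grammar is ambiguous: here either + or * can sit at the root of the tree.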
Ambiguity
Regular Expressions
A regular expression is a notation for describing a set of strings.
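As a small illustration, the nine two-letter strings generated by grammar G1 can also be described by a single regular expression. The pattern `[abc][abc]` and the use of Python's `re` module are choices made for this sketch, not something given in the slides.

```python
import re
from itertools import product

# Assumed pattern: one of a, b, c followed by one of a, b, c.
TWO_LETTERS = re.compile(r"[abc][abc]")

# Try every string over {a, b, c} of length 0 to 3 and keep the full matches.
candidates = ["".join(p) for n in range(4) for p in product("abc", repeat=n)]
matches = [s for s in candidates if TWO_LETTERS.fullmatch(s)]

print(len(matches), matches)
# 9 ['aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
```

`fullmatch` is used rather than `search` so that the whole string, not just a part of it, must fit the pattern.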