chapter 7

Winter 2007 SEG2101 Chapter 7 1 Chapter 7 Introduction to Languages and Compiler

Upload: manchu

Post on 05-Feb-2016



Page 1: Chapter 7

Winter 2007 SEG2101 Chapter 7 1

Chapter 7

Introduction to

Languages and Compiler

Contents

• Computer architecture

• Compiler

• Grammars

• Formal languages

• Parse trees

• Ambiguity

• Regular expressions

Von Neumann Architecture

Compiler

A compiler is a program that reads a program written in one language – the source language – and translates it into an equivalent program in another language – the target language.

The Compilation process

Grammars

• A grammar is defined as a 4-tuple: the alphabet Σ, the set of nonterminals N, the set of productions P, and a goal symbol S.

• In (Σ, N, P, S), the components Σ, N, and P are sets, and S is a particular element of the set N.

Alphabets and Strings

• Σ is the alphabet, or set of terminals.

• It is a finite set consisting of all the input characters or symbols that can be arranged to form sentences in the language.

• English: the letters A to Z plus, in our definition, punctuation and space symbols.

• Programming language: usually some well-defined computer character set such as ASCII.

Alphabets and Strings (II)

• A compiler is usually defined with two grammars: one for the scanner and one for the parser.

• The alphabet for the scanner grammar is ASCII or some subset of it.

• The alphabet for the parser grammar is the set of tokens generated by the scanner, not ASCII at all.

An Example of Strings

• Σ = {a, b, c, d}

• Possible strings of terminals from Σ include aaa, aabbccdd, d, cba, abab, ccccccccccacccc, and so on.

Formal Languages

• Σ: the alphabet, a finite set consisting of all input characters or symbols.

• Σ*: the closure of the alphabet, the set of all possible strings over Σ, including the empty string ε.

• A (formal) language is some specified subset of *.
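
The definitions above can be made concrete with a short sketch. It enumerates Σ* for the example alphabet {a, b, c, d}, shortest strings first, and then picks out one language as a subset. The language rule used here (strings that start with a) is a hypothetical choice for illustration and does not come from the slides.

```python
from itertools import islice, product

SIGMA = ("a", "b", "c", "d")  # the example alphabet {a, b, c, d}


def closure(alphabet):
    """Yield every string of Σ*, shortest first, starting with the empty string."""
    length = 0
    while True:
        for symbols in product(alphabet, repeat=length):
            yield "".join(symbols)
        length += 1


def in_language(s):
    """Hypothetical language: the strings over Σ that start with 'a'."""
    return s.startswith("a")


first_strings = list(islice(closure(SIGMA), 8))
print(first_strings)  # ['', 'a', 'b', 'c', 'd', 'aa', 'ab', 'ac']

# A language is a subset of Σ*: filter the enumeration with the membership rule.
print([s for s in islice(closure(SIGMA), 30) if in_language(s)])
```

Σ* is infinite, so the sketch uses a generator and only ever inspects a finite prefix of it.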

Nonterminals

• The nonterminal set N is a finite set of symbols not in the alphabet.

• A particular nonterminal, the goal symbol S, represents exactly all the strings in the language.

• The goal symbol is also often called the start symbol because we start with it.

• The set of terminals and the set of nonterminals, taken together, are called the vocabulary of the grammar.

Productions

• The productions P of a grammar are a set of rewriting rules, each written as two strings of symbols separated by an arrow.

• The symbols on each side of the arrow may be drawn from both terminals and nonterminals, subject to certain restrictions that depend on the form of the grammar.

An Example Grammar

• G1 = ({a, b, c}, {A, B}, {A → aB, A → bB, A → cB, B → a, B → b, B → c}, A)

• The grammar generates 9 two-letter strings: each of a, b, c may be followed by each of a, b, c.
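
As a check on that count, here is a minimal sketch that stores G1 as a 4-tuple and enumerates everything it derives. It reads the productions as A → aB | bB | cB and B → a | b | c. The dictionary layout and the `generate` helper are assumptions made for this illustration, not part of the course material.

```python
from collections import deque

# G1 = (Σ, N, P, S); productions map each nonterminal to its right-hand sides.
SIGMA = {"a", "b", "c"}
NONTERMINALS = {"A", "B"}
PRODUCTIONS = {
    "A": ["aB", "bB", "cB"],
    "B": ["a", "b", "c"],
}
GOAL = "A"


def generate(productions, goal):
    """Return every terminal string derivable from the goal symbol.

    Rewrites the leftmost nonterminal breadth-first. This terminates only
    because G1 generates a finite language.
    """
    sentences = set()
    queue = deque([goal])
    while queue:
        form = queue.popleft()
        # Find the leftmost nonterminal in the current sentential form.
        position = next(
            (i for i, symbol in enumerate(form) if symbol in productions), None
        )
        if position is None:
            sentences.add(form)  # only terminals remain, so this is a sentence
            continue
        for rhs in productions[form[position]]:
            queue.append(form[:position] + rhs + form[position + 1:])
    return sorted(sentences)


strings = generate(PRODUCTIONS, GOAL)
print(len(strings))  # 9
print(strings)       # ['aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
```

For a grammar with recursive rules the same loop would never finish, so a real tool would bound the derivation length.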

Syntax and Semantics

• Syntax: the syntax of a programming language is the form of its expressions, statements, and program units.

• Semantics: the meaning of those expressions, statements, and program units.

• Example: if (<expr>) <statement>

Sentences, Lexeme, Token

• Sentences: the strings of a language are called sentences or statements.

• Lexeme: the lexemes of a programming language include its identifiers, literals, operators, and special words.

• Token: a token of a language is a category of its lexemes.

Lexeme and Token

Statement: index = 2 * count + 17;

Lexemes    Tokens
index      identifier
=          equal_sign
2          int_literal
*          multi_op
count      identifier
+          plus_op
17         int_literal
;          semicolon
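
The lexeme-to-token mapping above is what a scanner computes. The following is a minimal sketch of such a scanner, built on Python's `re` module. The token names are taken from the table, but the regular-expression patterns and the `scan` function are assumptions for illustration, not the scanner used in the course.

```python
import re

# Token names follow the table; the patterns themselves are assumptions.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("equal_sign", r"="),
    ("multi_op", r"\*"),
    ("plus_op", r"\+"),
    ("semicolon", r";"),
    ("skip", r"\s+"),  # whitespace separates lexemes but is not a token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Return (lexeme, token) pairs; raise on a character no pattern matches."""
    pairs = []
    position = 0
    while position < len(source):
        match = MASTER.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character {source[position]!r}")
        if match.lastgroup != "skip":
            pairs.append((match.group(), match.lastgroup))
        position = match.end()
    return pairs


for lexeme, token in scan("index = 2 * count + 17;"):
    print(f"{lexeme:<8}{token}")
```

The parser would then work on the second element of each pair, which is why its alphabet is the set of tokens rather than ASCII.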

The Role of Grammars

• The grammar of a language defines the correct form for sentences in that language.

• Grammars are formal language-generation mechanisms that are commonly used to describe the syntax of programming languages.

BNF: Backus-Naur Form

• Backus presented a new formal notation for specifying programming language syntax.

• Naur modified the notation slightly.

• Known as Backus-Naur Form, or BNF.

• BNF is a very natural notation for describing syntax.

• BNF and context-free grammar (grammar) are used interchangeably.

BNF

• Metalanguage: a language used to describe another language. BNF is a metalanguage for programming languages.

• Abstraction: the symbol on the left-hand side of the arrow.

• Definition: the text to the right of the arrow.

• Rule (production): the abstraction and its definition together are called a rule.

BNF Description (A simple C assignment statement)

<assign> → <var> = <expression>

• Rule (production): the whole line above

• LHS (left-hand side): <assign>, the abstraction

• RHS (right-hand side): <var> = <expression>, the definition

Nonterminal and Terminal

• Nonterminal symbol: the abstraction in a BNF description or grammar

• Terminal symbol: the lexemes and tokens of the rules

• A BNF description or grammar is simply a collection of rules.

• Nonterminals can have two or more distinct definitions.

• Multiple definitions can be written as a single rule, with the different definitions separated by |, meaning logical OR.

<if_stmt> → if <logic_expr> then <stmt>
          | if <logic_expr> then <stmt> else <stmt>

List of Syntactic Elements

• BNF does not use an ellipsis (…) to describe lists.

• BNF uses recursion instead.

• A rule is recursive if its LHS appears in its RHS.

• e.g., <ident_list> → identifier | identifier , <ident_list>
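
A recursive rule maps directly onto a recursive function. The sketch below recognizes the rule <ident_list> → identifier | identifier , <ident_list> over a list of token names. The function names and the choice to represent tokens as plain strings are assumptions made for this illustration.

```python
def parse_ident_list(tokens, position=0):
    """Recognize <ident_list> starting at tokens[position].

    Returns the index just past the list, or raises SyntaxError.
    Tokens are plain strings; "identifier" and "," are the only terminals.
    """
    if position >= len(tokens) or tokens[position] != "identifier":
        raise SyntaxError(f"expected identifier at token {position}")
    position += 1
    if position < len(tokens) and tokens[position] == ",":
        # Second alternative: identifier , <ident_list> (the recursive case).
        return parse_ident_list(tokens, position + 1)
    return position  # first alternative: a single identifier


def is_ident_list(tokens):
    """True if the whole token list is exactly one <ident_list>."""
    try:
        return parse_ident_list(tokens) == len(tokens)
    except SyntaxError:
        return False


print(is_ident_list(["identifier", ",", "identifier"]))  # True
print(is_ident_list(["identifier", ","]))                # False
```

The recursion in the function mirrors the recursion in the rule: each call consumes one identifier and, if a comma follows, hands the rest of the list to another call.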

A Grammar

A Derivation of a Program

Another Grammar

A Derivation of a Statement

Parse Tree

Grammars naturally describe the hierarchical syntactic structure of the sentences of the languages they define.

These hierarchical structures are called parse trees.

Ambiguous Grammar

• A grammar that generates a sentence for which there are two or more distinct parse trees is said to be ambiguous.
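
One way to see this definition at work is to count parse trees. The sketch below assumes a small textbook-style grammar, <expr> → <expr> + <expr> | <expr> * <expr> | id, which is a hypothetical example chosen for illustration and not necessarily the grammar shown on the slides. A sentence with more than one parse tree shows the grammar is ambiguous.

```python
from functools import lru_cache

OPERATORS = {"+", "*"}


def count_parse_trees(tokens):
    """Count distinct parse trees for tokens under the assumed grammar
    <expr> -> <expr> + <expr> | <expr> * <expr> | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(start, end):
        # Number of ways tokens[start:end] can be derived from <expr>.
        if end - start == 1:
            return 1 if tokens[start] == "id" else 0
        total = 0
        for split in range(start + 1, end - 1):
            if tokens[split] in OPERATORS:
                # The operator at `split` is the root of this parse tree.
                total += count(start, split) * count(split + 1, end)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id"]))                      # 1
print(count_parse_trees(["id", "+", "id", "*", "id"]))  # 2
```

The two trees for id + id * id differ in which operator sits at the root, which corresponds to grouping the sentence as (id + id) * id or as id + (id * id).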

Ambiguity

Regular Expressions

A regular expression is a notation for describing a set of strings.
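
As a small illustration, each pattern in the sketch below describes one set of strings, and `fullmatch` tests whether a given string belongs to that set. The two patterns (identifiers and integer literals) and the `classify` helper are assumptions chosen for this example, written in the syntax of Python's `re` module.

```python
import re

# Assumed patterns for two common lexeme categories.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")  # letter or _, then letters, digits, _
INT_LITERAL = re.compile(r"[0-9]+")                 # one or more digits


def classify(text):
    """Name the set of strings that text belongs to, or 'no match'."""
    if IDENTIFIER.fullmatch(text):
        return "identifier"
    if INT_LITERAL.fullmatch(text):
        return "int_literal"
    return "no match"


for text in ["count", "17", "2count", ""]:
    print(repr(text), "->", classify(text))
```

`fullmatch` is used rather than `match` so that the entire string must fit the pattern, not just a prefix of it.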