
Upload: sylvia-allison

Post on 02-Jan-2016


Chapter 5

Context-Free Grammars and Languages

Cathedral of St. Basil the Blessed, Red Square, Moscow, Russia

Outline

►5.0 Introduction
►5.1 Context-Free Grammars (CFG’s)
►5.2 Parse Trees
►5.3 Applications of CFG’s
►5.4 Ambiguity in Grammars and Languages

5.0 Introduction

►Context-free grammars (CFG’s) generate context-free languages (CFL’s).

►CFG’s have played a central role in compiler technology since the 1960’s.

5.0 Introduction

►More recently, CFG’s have been used to describe document formats via document-type definitions (DTD’s), which are used in XML (extensible markup language).

►XML is used mainly for information exchange on the Web.

5.1 CFG’s

►5.1.1 An Informal Example
Palindromes, e.g., otto, madamimadam (“Madam, I’m Adam”), …
We want to design a CFG for the language of binary palindromes
$L_{pal} = \{ w \mid w \in \{0, 1\}^*, w^R = w \}$.

►Examples: 00, 0110, 011110, …

5.1 CFG’s
►5.1.1 An Informal Example

CFG:
►$P \to \epsilon$ (1)
►$P \to 0$ (2)
►$P \to 1$ (3)
►$P \to 0P0$ (4)
►$P \to 1P1$ (5)

Productions (4) and (5) are recursive. $P$ is a variable (nonterminal); 0 and 1 are terminals.

Examples of derivations of strings:
►$P \overset{(4)}{\Rightarrow} 0P0 \overset{(5)}{\Rightarrow} 01P10 \overset{(5)}{\Rightarrow} 011P110 \overset{(1)}{\Rightarrow} 011110$
►$P \overset{(4)}{\Rightarrow} 0P0 \overset{(3)}{\Rightarrow} 010$, …
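The numbered derivations above can be replayed mechanically. The following is a minimal Python sketch, assuming sentential forms are plain strings whose only variable is the character P and that "" stands for ε; the names PRODUCTIONS and derive are illustrative, not from the textbook.

```python
# Productions (1)-(5) of the palindrome grammar, keyed by their slide numbers.
# "" stands for epsilon (illustrative encoding).
PRODUCTIONS = {1: "", 2: "0", 3: "1", 4: "0P0", 5: "1P1"}


def derive(steps):
    """Apply the numbered productions in order, starting from P.

    The grammar has a single variable and every body contains at most one P,
    so each sentential form has at most one place to rewrite.
    Returns all sentential forms, including the start symbol.
    """
    form = "P"
    forms = [form]
    for n in steps:
        if "P" not in form:
            raise ValueError("no variable left to replace")
        form = form.replace("P", PRODUCTIONS[n], 1)
        forms.append(form)
    return forms


print(" => ".join(derive([4, 5, 5, 1])))
# P => 0P0 => 01P10 => 011P110 => 011110
```

Likewise, derive([4, 3]) replays the second derivation and ends in 010.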

5.1 CFG’s

►5.1.2 Definition of CFG’s

A CFG is a 4-tuple $G = (V, T, P, S)$ where
►$V$ is the set of variables (or nonterminals, syntactic categories);
►$T$ is the set of terminals;
►$S$ is the start symbol;

(cont’d on the next page)

5.1 CFG’s

►5.1.2 Definition of CFG’s
A CFG is a 4-tuple $G = (V, T, P, S)$ where (cont’d)

►$P$ is the set of productions (or rules) of the form “head” $\to$ “body”, i.e., $A \to \alpha$ where $A \in V$ and $\alpha \in (V \cup T)^*$; that is, the head is a single variable and the body is a string of zero or more terminals and variables.

5.1 CFG’s

►5.1.2 Definition of CFG’s
For the last example:

►$V = \{P\}$
►$T = \{0, 1\}$
►$P$ = the set of the five productions (1) to (5) below:
$P \to \epsilon$ (1), $P \to 0$ (2), $P \to 1$ (3), $P \to 0P0$ (4), $P \to 1P1$ (5)
►$S = P$ (Note: the symbol $P$ is used with two meanings here, the variable $P$ and the set of productions $P$; do not confuse them!)
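The 4-tuple can be written down directly as data. Below is a minimal Python sketch for the palindrome grammar, assuming "" stands for ε. The dictionary P (productions keyed by head) deliberately shares its name with the variable "P", mirroring the note above. The helper generates is a hypothetical recognizer that works only because every body of this particular grammar has at most one variable, with terminals on both sides of it.

```python
from functools import lru_cache

# The palindrome grammar as a 4-tuple G = (V, T, P, S); "" stands for epsilon.
V = {"P"}
T = {"0", "1"}
P = {"P": ["", "0", "1", "0P0", "1P1"]}  # the production set, keyed by head
S = "P"


@lru_cache(maxsize=None)
def generates(var, w):
    """Return True if var =>* w for a terminal string w.

    Assumption specific to this grammar: each body contains at most one
    variable, surrounded by a nonempty terminal prefix and suffix, so the
    recursive call is always on a strictly shorter string.
    """
    for body in P[var]:
        k = next((i for i, c in enumerate(body) if c in V), None)
        if k is None:  # all-terminal body: it must equal w exactly
            if body == w:
                return True
            continue
        prefix, suffix = body[:k], body[k + 1:]
        if (len(w) >= len(prefix) + len(suffix)
                and w.startswith(prefix) and w.endswith(suffix)
                and generates(body[k], w[len(prefix):len(w) - len(suffix)])):
            return True
    return False
```

For example, generates(S, "011110") is True and generates(S, "01") is False.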

5.1 CFG’s
►5.1.2 Definition of CFG’s

Example 5.3: a CFG for (arithmetic) expressions
►$E \to I$
►$E \to E + E$
►$E \to E * E$
►$E \to (E)$

where $I$ is an identifier describable by the RE $(a + b)(a + b + 0 + 1)^*$, which can be transformed into the productions:

►$I \to a$
►$I \to b$
►$I \to Ia$
►$I \to Ib$
►$I \to I0$
►$I \to I1$

5.1 CFG’s

►5.1.2 Definition of CFG’s
Example 5.3: a CFG for (arithmetic) expressions

1. $E \to I$
2. $E \to E + E$
3. $E \to E * E$
4. $E \to (E)$
5. $I \to a$
6. $I \to b$
7. $I \to Ia$
8. $I \to Ib$
9. $I \to I0$
10. $I \to I1$

The above productions may be rewritten as
►$E \to I \mid E + E \mid E * E \mid (E)$
►$I \to a \mid b \mid Ia \mid Ib \mid I0 \mid I1$
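The claim that the $I$-productions describe the same identifiers as the RE can be spot-checked. Below is a minimal Python sketch. The pattern [ab][ab01]* is our transcription of $(a + b)(a + b + 0 + 1)^*$ into Python's re syntax, and both function names are illustrative.

```python
import re

# Python spelling of the RE (a + b)(a + b + 0 + 1)*.
IDENT = re.compile(r"[ab][ab01]*")


def is_identifier_re(s):
    """Membership test via the regular expression."""
    return IDENT.fullmatch(s) is not None


def is_identifier_grammar(s):
    """Membership test via I -> a | b | Ia | Ib | I0 | I1.

    Reading the left-recursive productions backwards: strip the last
    symbol while it is one of a, b, 0, 1 (I -> Ia | Ib | I0 | I1) and
    accept once a single a or b remains (I -> a | b).
    """
    if s in ("a", "b"):
        return True
    return len(s) > 1 and s[-1] in "ab01" and is_identifier_grammar(s[:-1])
```

Both tests accept b00 and a0, and both reject 0a.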

5.1 CFG’s

►5.1.2 Definition of CFG’s

Example 5.3: a CFG for (arithmetic) expressions
$E \to I \mid E + E \mid E * E \mid (E)$
$I \to a \mid b \mid Ia \mid Ib \mid I0 \mid I1$

►Examples of derivations:
$E \Rightarrow E * E \Rightarrow I * E \Rightarrow a * E \Rightarrow a * (E) \Rightarrow a * (E + E) \Rightarrow a * (I + E) \Rightarrow a * (I0 + E) \Rightarrow \dots \Rightarrow a * (a0 + b1)$

5.1 CFG’s

►5.1.3 Derivations Using a Grammar
We use productions to generate (“infer”) strings in the language described by the grammar. There are two ways to perform such string inference:

►Recursive inference: bottom up (“from body to head”), starting from known strings (often the terminals in productions).
►Derivation: top down (“from head to body”) in concept, as shown by the examples before.

5.1 CFG’s
►5.1.3 Derivations Using a Grammar

Example 5.4: Show the recursive inference of the string $a * (a + b00)$.

| | String inferred | For language of | Production used | String(s) used |
|---|---|---|---|---|
| (i) | $a$ | $I$ | 5: $I \to a$ | |
| (ii) | $b$ | $I$ | 6: $I \to b$ | |
| (iii) | $b0$ | $I$ | 9: $I \to I0$ | (ii) |
| (iv) | $b00$ | $I$ | 9: $I \to I0$ | (iii) |
| (v) | $a$ | $E$ | 1: $E \to I$ | (i) |
| (vi) | $b00$ | $E$ | 1: $E \to I$ | (iv) |
| (vii) | $a + b00$ | $E$ | 2: $E \to E + E$ | (v), (vi) |
| (viii) | $(a + b00)$ | $E$ | 4: $E \to (E)$ | (vii) |
| (ix) | $a * (a + b00)$ | $E$ | 3: $E \to E * E$ | (v), (viii) |
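The table above can be reproduced by inferring facts about shorter substrings first, which is the essence of recursive inference. Below is a minimal Python sketch, assuming the ten productions of Example 5.3 are hard-coded. The function infer and its table layout (substring positions mapped to sets of variables) are illustrative choices, not the textbook's procedure. The sketch runs in roughly cubic time in the length of the input.

```python
def infer(w):
    """Bottom-up recursive inference for the grammar of Example 5.3.

    table[(i, j)] is the set of variables whose language contains w[i:j].
    Substrings are processed shortest first, so every rule only consults
    facts that have already been inferred, as in the rows of Example 5.4.
    """
    n = len(w)
    table = {}
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            s = w[i:j]
            found = set()
            # Productions 5-10: I -> a | b | Ia | Ib | I0 | I1
            if s in ("a", "b") or (length > 1 and s[-1] in "ab01"
                                   and "I" in table[(i, j - 1)]):
                found.add("I")
            # Production 1: E -> I
            if "I" in found:
                found.add("E")
            # Productions 2 and 3: E -> E + E, E -> E * E (both sides nonempty)
            for k in range(i + 1, j - 1):
                if w[k] in "+*" and "E" in table[(i, k)] and "E" in table[(k + 1, j)]:
                    found.add("E")
            # Production 4: E -> (E)
            if length > 2 and s[0] == "(" and s[-1] == ")" and "E" in table[(i + 1, j - 1)]:
                found.add("E")
            table[(i, j)] = found
    return table
```

For instance, infer("a*(a+b00)")[(0, 9)] contains E, which corresponds to row (ix); the entry for positions (5, 8), the substring b00, contains I, which corresponds to row (iv).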

5.1 CFG’s

►5.1.3 Derivations Using a Grammar

Notations of derivations:
►If $\alpha A \beta$ is a string of terminals and variables, and $A \to \gamma$ is a production of $G$, then we write
$\alpha A \beta \underset{G}{\Rightarrow} \alpha \gamma \beta$
to denote a derivation (step).

5.1 CFG’s

►5.1.3 Derivations Using a Grammar

Notations of derivations:
►For zero or more derivation steps, we use the notation
$\alpha \overset{*}{\underset{G}{\Rightarrow}} \beta$
►The label $G$ under the double arrow may be omitted if it is understood which grammar is being used.

5.1 CFG’s
►5.1.3 Derivations Using a Grammar

Example 5.5: Derivation of the string $a * (a + b00)$
$E \Rightarrow E * E \Rightarrow I * E \Rightarrow a * E \Rightarrow a * (E) \Rightarrow a * (E + E) \Rightarrow a * (I + E) \Rightarrow a * (a + E) \Rightarrow a * (a + I) \Rightarrow a * (a + I0) \Rightarrow a * (a + I00) \Rightarrow a * (a + b00)$

5.1 CFG’s
►5.1.4 Leftmost and Rightmost Derivations

Leftmost derivation: replace the leftmost variable at each derivation step (notation $\underset{lm}{\Rightarrow}$; for typing convenience we also write lm).
Rightmost derivation: replace the rightmost variable instead (notation $\underset{rm}{\Rightarrow}$, or rm).

Example 5.6: a continuation of Example 5.5
►Leftmost derivation
$E \underset{lm}{\Rightarrow} E * E \underset{lm}{\Rightarrow} I * E \underset{lm}{\Rightarrow} a * E \underset{lm}{\Rightarrow} a * (E) \underset{lm}{\Rightarrow} a * (E + E) \underset{lm}{\Rightarrow} a * (I + E) \underset{lm}{\Rightarrow} \dots \underset{lm}{\Rightarrow} a * (a + I00) \underset{lm}{\Rightarrow} a * (a + b00)$

5.1 CFG’s

►5.1.4 Leftmost and Rightmost Derivations

Example 5.6: Continuation of Example 5.5
►Rightmost derivation
$E \underset{rm}{\Rightarrow} E * E \underset{rm}{\Rightarrow} E * (E) \underset{rm}{\Rightarrow} E * (E + E) \underset{rm}{\Rightarrow} E * (E + I) \underset{rm}{\Rightarrow} E * (E + I0) \underset{rm}{\Rightarrow} E * (E + I00) \underset{rm}{\Rightarrow} \dots \underset{rm}{\Rightarrow} I * (a + b00) \underset{rm}{\Rightarrow} a * (a + b00)$

Any derivation has an equivalent leftmost derivation and an equivalent rightmost one (proved in the next section).
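The leftmost and rightmost derivations above can be checked by always rewriting the leftmost (or rightmost) variable of each sentential form. Below is a minimal Python sketch, assuming E and I are the only variables and each step is given as a (head, body) pair. LEFTMOST and RIGHTMOST list the production sequences of Example 5.6 with the elided middle steps written out. The names step and derive are illustrative.

```python
VARIABLES = {"E", "I"}


def step(form, head, body, leftmost=True):
    """Rewrite the leftmost (or rightmost) variable of form with head -> body."""
    positions = [k for k, c in enumerate(form) if c in VARIABLES]
    if not positions:
        raise ValueError("form has no variables")
    k = positions[0] if leftmost else positions[-1]
    if form[k] != head:
        raise ValueError(f"{form[k]} at position {k} is not {head}")
    return form[:k] + body + form[k + 1:]


def derive(productions, leftmost=True):
    """Return every sentential form of the derivation that starts at E."""
    form = "E"
    forms = [form]
    for head, body in productions:
        form = step(form, head, body, leftmost)
        forms.append(form)
    return forms


LEFTMOST = [("E", "E*E"), ("E", "I"), ("I", "a"), ("E", "(E)"), ("E", "E+E"),
            ("E", "I"), ("I", "a"), ("E", "I"), ("I", "I0"), ("I", "I0"),
            ("I", "b")]
RIGHTMOST = [("E", "E*E"), ("E", "(E)"), ("E", "E+E"), ("E", "I"), ("I", "I0"),
             ("I", "I0"), ("I", "b"), ("E", "I"), ("I", "a"), ("E", "I"),
             ("I", "a")]

print(" =>lm ".join(derive(LEFTMOST)))
print(" =>rm ".join(derive(RIGHTMOST, leftmost=False)))
```

Both sequences use the same 11 productions, just in a different order, and both end in a*(a+b00). Because step raises an error when the chosen variable does not match the production's head, an out-of-order step is caught immediately.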

5.1 CFG’s

►5.1.5 The Language of a Grammar
The language $L(G)$ of a CFG $G = (V, T, P, S)$ is
$L(G) = \{ w \mid w \in T^*, S \overset{*}{\underset{G}{\Rightarrow}} w \}$.

The language of a CFG is called a context-free language (CFL).

Theorem 5.7 in the textbook shows a typical way to prove (using induction) that a given grammar really generates the desired language, here the set of palindromes (read it by yourself).
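For small lengths, $L(G)$ can be listed by brute force. Below is a minimal Python sketch for the palindrome grammar, assuming "" stands for ε; language_up_to is an illustrative name. It searches leftmost sentential forms breadth first and prunes any form with more than n terminals. This keeps the search finite for this grammar. A grammar such as $B \to BB$, whose bodies can add variables without adding terminals, would need extra pruning.

```python
from collections import deque

PRODS = {"P": ["", "0", "1", "0P0", "1P1"]}  # "" is epsilon


def language_up_to(n, start="P"):
    """All terminal strings w with start =>* w and len(w) <= n.

    Breadth-first search over leftmost sentential forms. No production
    removes terminals, so a form with more than n terminals can never
    yield a string of length <= n and is dropped.
    """
    result = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        k = next((i for i, c in enumerate(form) if c in PRODS), None)
        if k is None:  # no variables left: form is a string of L(G)
            result.add(form)
            continue
        for body in PRODS[form[k]]:
            new = form[:k] + body + form[k + 1:]
            terminals = sum(c not in PRODS for c in new)
            if terminals <= n and new not in seen:
                seen.add(new)
                queue.append(new)
    return result
```

language_up_to(4) returns the 13 binary palindromes of length at most 4, including the empty string.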

5.1 CFG’s

►5.1.6 Sentential Forms
Strings derived from the start symbol are called sentential forms. That is, given a CFG $G = (V, T, P, S)$:
►if $S \overset{*}{\Rightarrow} \alpha$ where $\alpha \in (V \cup T)^*$, then $\alpha$ is a sentential form;
►if $S \overset{*}{\underset{lm}{\Rightarrow}} \alpha$ where $\alpha \in (V \cup T)^*$, then $\alpha$ is a left-sentential form;
►if $S \overset{*}{\underset{rm}{\Rightarrow}} \alpha$ where $\alpha \in (V \cup T)^*$, then $\alpha$ is a right-sentential form.

Example 5.8: easy; read it by yourself.

5.2 Parse Trees

►Advantage of parse trees: in a compiler, the parse tree structure facilitates translation of the source program into executable code by recursive procedures.

Parse trees are closely related to derivations and recursive inferences.

An important application of parse trees is the study of grammatical ambiguity, which makes a grammar unsuitable for a programming language.

5.2 Parse Trees

►A note about the term YACC in the textbook:
The computer program YACC is a parser generator developed by Stephen C. Johnson at AT&T for the Unix operating system.
The name is an acronym for “Yet Another Compiler Compiler.”
It generates a parser (the part of a compiler that tries to make syntactic sense of the source code) based on an analytic grammar written in a notation similar to BNF.
YACC generates the code for the parser in the C programming language.
(Retrieved from Wikipedia, 2007/10/8)

5.2 Parse Trees

►5.2.1 Constructing Parse Trees

Given a grammar $G = (V, T, P, S)$, a parse tree is defined as follows:
►Each interior node is labeled by a variable in $V$.
►Each leaf is labeled by either a variable, a terminal, or $\epsilon$. If $\epsilon$ is the label, it must be the only child of its parent.

(cont’d on the next page)

5.2 Parse Trees

►5.2.1 Constructing Parse Trees

Given a grammar $G = (V, T, P, S)$, a parse tree is defined as follows (cont’d):
►If an interior node is labeled $A$, and its children are labeled $X_1, X_2, \dots, X_k$, respectively, from the left, then $A \to X_1 X_2 \cdots X_k$ is a production in $P$.
►Note that the only time one of the $X$’s can be $\epsilon$ is when $\epsilon$ is the label of the only child and $A \to \epsilon$ is a production of $G$.
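The conditions above translate into a recursive check. Below is a minimal Python sketch with the following assumptions:
- production bodies are lists of symbols, and the empty list is the body of $A \to \epsilon$;
- the Unicode character ε labels ε-leaves;
- Node, is_parse_tree, PAL and tree_0110 are illustrative names.

Terminal leaves are not checked against $T$ separately, because every child label must already come from a production body.

```python
from dataclasses import dataclass, field

EPS = "ε"  # label used for epsilon leaves (illustrative encoding)


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def is_parse_tree(node, variables, productions):
    """Check the parse-tree conditions for the subtree rooted at node."""
    if not node.children:
        return True  # leaf labels are validated by the parent's production
    if node.label not in variables:
        return False  # interior nodes must be labeled by variables
    labels = [child.label for child in node.children]
    if EPS in labels:
        # epsilon must be the only child, and A -> epsilon must be a production
        return (labels == [EPS] and not node.children[0].children
                and [] in productions.get(node.label, []))
    if labels not in productions.get(node.label, []):
        return False  # A -> X1 X2 ... Xk must be a production
    return all(is_parse_tree(child, variables, productions) for child in node.children)


# The palindrome grammar: P -> epsilon | 0 | 1 | 0P0 | 1P1
PAL = {"P": [[], ["0"], ["1"], ["0", "P", "0"], ["1", "P", "1"]]}

tree_0110 = Node("P", [
    Node("0"),
    Node("P", [Node("1"), Node("P", [Node(EPS)]), Node("1")]),
    Node("0"),
])
print(is_parse_tree(tree_0110, {"P"}, PAL))  # True
```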

5.2 Parse Trees

►5.2.1 Constructing Parse Trees
Example 5.10: a parse tree for the derivation $P \overset{*}{\Rightarrow} 0110$ of the palindrome 0110.

[Tree: the root $P$ has children 0, $P$, 0; that middle $P$ has children 1, $P$, 1; the innermost $P$ has the single child $\epsilon$.]

5.2 Parse Trees

►5.2.2 The Yield of a Parse Tree
The yield of a parse tree is the string obtained by concatenating all the leaves from the left. For example, the tree of the last example (root $P$ with children 0, $P$, 0; inner $P$ with children 1, $P$, 1; innermost $P$ with the single child $\epsilon$) has yield $0 1 \epsilon 1 0 = 0110$.

The terminal strings that are yields of parse trees with root $S$ are exactly the strings of $L(G)$, so yields give another way to describe the language of $G$ (provable).
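Computing a yield is a left-to-right traversal of the leaves. Below is a minimal Python sketch with the following assumptions:
- the parse tree uses a small hypothetical Node class (a label plus a list of children);
- ε-leaves are labeled with the Unicode character ε;
- the tree built is that of Example 5.10.

The function is named tree_yield because yield is a Python keyword.

```python
from dataclasses import dataclass, field

EPS = "ε"  # label used for epsilon leaves (illustrative encoding)


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def tree_yield(node):
    """Concatenate leaf labels from left to right; epsilon leaves add nothing.

    Leaves labeled by variables are kept, so the yield of a partial tree
    can be a sentential form rather than a terminal string.
    """
    if not node.children:
        return "" if node.label == EPS else node.label
    return "".join(tree_yield(child) for child in node.children)


tree_0110 = Node("P", [
    Node("0"),
    Node("P", [Node("1"), Node("P", [Node(EPS)]), Node("1")]),
    Node("0"),
])
print(tree_yield(tree_0110))  # 0110
```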

5.2 Parse Trees

►5.2.3 Inference, Derivations, and Parse Trees
Given a grammar $G = (V, T, P, S)$, a variable $A$ and a terminal string $w$, the following facts are all equivalent:
►the recursive inference procedure determines that $w$ is in the language of variable $A$;
►$A \overset{*}{\Rightarrow} w$;
►$A \overset{*}{\underset{lm}{\Rightarrow}} w$;
►$A \overset{*}{\underset{rm}{\Rightarrow}} w$;
►there is a parse tree with root $A$ and yield $w$.

5.2 Parse Trees

►5.2.3 Inference, Derivations, and Parse Trees
The equivalences of the facts on the previous page are proved by the theorems in Sections 5.2.4 to 5.2.6 (read them by yourself), following the diagram below:

[Diagram relating: recursive inference, parse tree, leftmost derivation, rightmost derivation, derivation]

5.3 Applications of CFG’s

►CFG’s were originally conceived by N. Chomsky for describing natural languages, but not all natural languages can be so described.

►Two applications of CFG’s:
Describing programming languages.
►There is a mechanical way of turning a CFG into a parser.
Describing DTD’s in XML:
►DTD: document type definition
►XML: extensible markup language

5.3 Applications of CFG’s

►5.3.1 Parsers
Some components of programming languages are not regular languages (RL’s).

Two examples of non-RL structures:
►Balanced structures, like parentheses “( )”, “begin-end” pairs, “if-else” pairs, …
►Unbalanced structures, like unbalanced “if-else” pairs, …

►E.g., a balanced if-else pair in C is
if (condition) Statement; else Statement;

5.3 Applications of CFG’s

►5.3.1 Parsers
Example 5.19:
►A grammar for balanced parentheses:
$G_{bal} = (\{B\}, \{(, )\}, P, B)$
$P$: $B \to BB \mid (B) \mid \epsilon$
e.g., $x$ = (()), ()(), …
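Membership in $L(G_{bal})$ can be decided straight from the three productions. Below is a minimal, memoized Python sketch; in_bal is an illustrative name. The approach (try every production on every substring) is chosen for clarity rather than speed.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def in_bal(x):
    """True if B =>* x in G_bal, with P: B -> BB | (B) | epsilon."""
    if x == "":
        return True  # B -> epsilon
    if x[0] == "(" and x[-1] == ")" and in_bal(x[1:-1]):
        return True  # B -> (B)
    # B -> BB: only nonempty halves are tried, since an empty half
    # (one B deriving epsilon) would just leave x itself to derive.
    return any(in_bal(x[:k]) and in_bal(x[k:]) for k in range(1, len(x)))
```

For example, in_bal("(())") and in_bal("()()") are True, while in_bal(")(") is False.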

5.3 Applications of CFG’s

►5.3.1 Parsers
Example 5.19 (cont’d):
►A grammar for unbalanced if-else pairs:
$S \to \epsilon \mid SS \mid iS \mid iSeS$
where $i$ = if …, $e$ = else …
The production $S \to iS$ is used to generate an unbalanced “if” (one with no matching “else”).
E.g., the following is a generated segment of a C program:

if (Condition) {
  …
  if (Condition) Statement;
  else Statement;
  …
  if (Condition) Statement;
  else Statement;
  …
}
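The same production-by-production approach works for the if-else grammar, writing if as i and else as e. Below is a minimal, memoized Python sketch; in_S is an illustrative name.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def in_S(x):
    """True if S =>* x for S -> epsilon | SS | iS | iSeS (i = if, e = else)."""
    if x == "":
        return True  # S -> epsilon
    if x[0] == "i":
        if in_S(x[1:]):
            return True  # S -> iS: an if with no matching else
        for k in range(1, len(x)):
            if x[k] == "e" and in_S(x[1:k]) and in_S(x[k + 1:]):
                return True  # S -> iSeS: the first i is matched by this e
    # S -> SS with both parts nonempty
    return any(in_S(x[:k]) and in_S(x[k:]) for k in range(1, len(x)))
```

For this grammar the accepted strings are exactly those in which every prefix contains at least as many i's as e's. This matches the intuition that each else needs an earlier if. For example, iie is accepted but iee is not.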

5.3 Applications of CFG’s

►5.3.2 The YACC Parser-Generator
Input to YACC is a CFG in which each production is additionally associated with an action ({…} below).
Example 5.21: a CFG in the YACC notation, identical to that of Example 5.3 (Fig. 5.2):

$E \to I \mid E + E \mid E * E \mid (E)$
$I \to a \mid b \mid Ia \mid Ib \mid I0 \mid I1$

Exp : Id            {…}
    | Exp '+' Exp   {…}
    | Exp '*' Exp   {…}
    | '(' Exp ')'   {…}
    ;
Id  : 'a'           {…}
    | 'b'           {…}
    | Id 'a'        {…}
    | Id 'b'        {…}
    | Id '0'        {…}
    | Id '1'        {…}
    ;

5.3 Applications of CFG’s

►5.3.3 Markup Languages

The strings in a markup language are documents with “marks,” called tags, which specify the semantics of the strings.

An example is HTML (HyperText Markup Language) for webpage design, which has two functions:
►Creating links between documents
►Describing the formats (“looks”) of documents

5.3 Applications of CFG’s

►5.3.3 Markup Languages
Example 5.22: HTML of a webpage

(a) The text as viewed on the webpage:

The thing I hate:
1. Moldy bread.
2. People who drive too slow in the fast lane.

(b) The HTML source:

<P>The thing I <EM>hate</EM>
<OL>
<LI>Moldy bread.
<LI>People who drive too slow in the fast lane.
</OL>

5.3 Applications of CFG’s

►5.3.3 Markup Languages
Example 5.22: HTML of a webpage (cont’d)
►Meanings of tags:
<P>: paragraph (unmatched single tag)
<EM>…</EM>: emphasized text (matched tag pair)
<LI>: list item (unmatched single tag)
<OL>…</OL>: ordered list (matched tag pair)

5.3 Applications of CFG’s

►5.3.3 Markup Languages
Example 5.22: HTML of a webpage (cont’d)
►Meanings of tags:
<P>: paragraph
<EM>…</EM>: emphasized text
<LI>: list item
<OL>…</OL>: ordered list

<P>The thing I <EM>hate</EM>
<OL>
<LI>Moldy bread.
<LI>People who drive too slow in the fast lane.
</OL>

5.3 Applications of CFG’s

►5.3.3 Markup Languages
Part of an HTML grammar:
1. $\mathit{Doc} \to \epsilon \mid \mathit{Element}\ \mathit{Doc}$
2. $\mathit{Element} \to \mathit{Text} \mid \texttt{<EM>}\ \mathit{Doc}\ \texttt{</EM>} \mid \texttt{<P>}\ \mathit{Doc} \mid \texttt{<OL>}\ \mathit{List}\ \texttt{</OL>} \mid \dots$
3. $\mathit{Text} \to \epsilon \mid \mathit{Char}\ \mathit{Text}$
4. $\mathit{Char} \to a \mid A \mid \dots$
5. $\mathit{List} \to \epsilon \mid \mathit{ListItem}\ \mathit{List}$
6. $\mathit{ListItem} \to \texttt{<LI>}\ \mathit{Doc}$

5.3 Applications of CFG’s

►5.3.4 XML and Document-Type Definitions (DTD’s)
XML (extensible markup language) describes the semantics of the text in a document using DTD’s, while HTML describes the format of a document.
A DTD is itself described in a language that is essentially a CFG mixed with regular expressions.

5.3 Applications of CFG’s

►5.3.4 XML and Document-Type Definitions (DTD’s)
The form of a DTD is
<!DOCTYPE name-of-DTD [
list of element definitions
]>

The form of an element definition is
<!ELEMENT element-name (description of the element)>

Descriptions of elements are essentially regular expressions.

5.3 Applications of CFG’s

►5.3.4 XML and Document-Type Definitions (DTD’s)
The regular expressions may be described as follows:
►An element may appear in another element.
►The special term #PCDATA stands for any text involving no tags.
►The allowed operators are:
a vertical bar | means union;
a comma , denotes concatenation;
3 variants of the closure operator, *, +, and ?, as described before
(*: zero or more occurrences of; +: one or more occurrences of; ?: zero or one occurrence of).

5.3 Applications of CFG’s

►5.3.4 XML & DTD’s
Example 5.23: a DTD for describing personal computers (PC’s)

<!DOCTYPE PcSpecs [
<!ELEMENT PCS (PC*)>
<!ELEMENT PC (MODEL, PRICE, PROCESSOR, RAM, DISK+)>
<!ELEMENT MODEL (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
<!ELEMENT PROCESSOR (MANF, MODEL, SPEED)>
<!ELEMENT MANF (#PCDATA)>
<!ELEMENT MODEL (#PCDATA)>
<!ELEMENT SPEED (#PCDATA)>
<!ELEMENT RAM (#PCDATA)>
<!ELEMENT DISK (HARDDISK | CD | DVD)>
<!ELEMENT HARDDISK (MANF, MODEL, SIZE)>
<!ELEMENT SIZE (#PCDATA)>
<!ELEMENT CD (SPEED)>
<!ELEMENT DVD (SPEED)>
]>
(MANF: manufacturer)

Fig. 5.14 A DTD for a PC

5.3 Applications of CFG’s

►5.3.4 XML & DTD’s
Example 5.23: a DTD for describing personal computers (PC’s) (continued)
►Each element is represented in the document by a tag with the name of the element and a matching tag at the end with an extra slash, just as in HTML.
►For example:
<!ELEMENT MODEL (#PCDATA)> allows, e.g.,
<MODEL>4560</MODEL>
<!ELEMENT PCS (PC*)> allows, e.g.,
<PCS>
<PC>…</PC>
<PC>…</PC>
</PCS>

5.3 Applications of CFG’s

►5.3.4 XML & DTD’s
Example 5.23 (cont’d): a description of two PC’s in a document that uses this DTD

<PCS>
  <PC>
    <MODEL>4560</MODEL>
    <PRICE>$2295</PRICE>
    <PROCESSOR>
      <MANF>Intel</MANF>
      <MODEL>Pentium</MODEL>
      <SPEED>800MHz</SPEED>
    </PROCESSOR>
    <RAM>256</RAM>
    <DISK><HARDDISK>
      <MANF>Maxtor</MANF>
      <MODEL>Diamond</MODEL>
      <SIZE>30.5Gb</SIZE>
    </HARDDISK></DISK>
    <DISK><CD>
      <SPEED>32x</SPEED>
    </CD></DISK>
  </PC>
  <PC>
    …
  </PC>
</PCS>

Fig. 5.15 Part of a document obeying the DTD of Fig. 5.14
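Python's standard xml.etree.ElementTree module parses XML but does not validate it against a DTD, so the following minimal sketch checks one content model by hand. It rests on three assumptions:
- DOC is an abbreviated copy of Fig. 5.15 containing only the first PC;
- the regex PC_MODEL is our hand translation of (MODEL, PRICE, PROCESSOR, RAM, DISK+);
- pc_children_ok is an illustrative name.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the content model (MODEL, PRICE, PROCESSOR, RAM, DISK+)
# into a regex over space-terminated child tag names:
# the DTD comma becomes concatenation and DISK+ becomes (?:DISK )+.
PC_MODEL = re.compile(r"MODEL PRICE PROCESSOR RAM (?:DISK )+")

# Abbreviated copy of Fig. 5.15 (first PC only).
DOC = """<PCS><PC><MODEL>4560</MODEL><PRICE>$2295</PRICE>
<PROCESSOR><MANF>Intel</MANF><MODEL>Pentium</MODEL><SPEED>800MHz</SPEED></PROCESSOR>
<RAM>256</RAM>
<DISK><HARDDISK><MANF>Maxtor</MANF><MODEL>Diamond</MODEL><SIZE>30.5Gb</SIZE></HARDDISK></DISK>
<DISK><CD><SPEED>32x</SPEED></CD></DISK>
</PC></PCS>"""


def pc_children_ok(pc):
    """True if the sequence of child elements of pc matches the PC content model."""
    tags = "".join(child.tag + " " for child in pc)
    return PC_MODEL.fullmatch(tags) is not None


root = ET.fromstring(DOC)
print([pc_children_ok(pc) for pc in root.iter("PC")])  # [True]
```

A PC with no DISK child fails the check, because DISK+ requires at least one disk.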

5.3 Applications of CFG’s
►5.3.4 XML & DTD’s

Example 5.23 (cont’d): converting DTD rules, which include RE’s, into CFG productions
►<!ELEMENT PROCESSOR (MANF, MODEL, SPEED)> becomes
$\mathit{Processor} \to \mathit{Manf}\ \mathit{Model}\ \mathit{Speed}$
(Note: the commas mean concatenation.)
►<!ELEMENT DISK (HARDDISK | CD | DVD)> becomes
$\mathit{Disk} \to \mathit{Harddisk} \mid \mathit{Cd} \mid \mathit{Dvd}$
(Note: the vertical bars mean the same as in production bodies.)
►<!ELEMENT PC (MODEL, PRICE, PROCESSOR, RAM, DISK+)> becomes
$\mathit{Pc} \to \mathit{Model}\ \mathit{Price}\ \mathit{Processor}\ \mathit{Ram}\ \mathit{Disks}$
$\mathit{Disks} \to \mathit{Disk} \mid \mathit{Disk}\ \mathit{Disks}$
(Note: the second rule corresponds to the regular expression DISK+, which means one or more disks.)

5.3 Applications of CFG’s

►5.3.4 XML & DTD’s
General technique for converting DTD rules, which include RE’s, into legal CFG productions:
►Basis: if the body of a production $P$ is a concatenation of elements, then $P$ is already in the legal form for CFG’s.
►Induction: assume $E_1$ and $E_2$ are in legal forms, and let $B$ and $C$ be new variables.
1. $A \to E_1, E_2$: (1) $A \to BC$ (2) $B \to E_1$ (3) $C \to E_2$
2. $A \to E_1 \mid E_2$: (1) $A \to E_1$ (2) $A \to E_2$
3. $A \to (E_1)^*$: (1) $A \to BA$ (2) $A \to \epsilon$ (3) $B \to E_1$
4. $A \to (E_1)^+$: (1) $A \to BA$ (2) $A \to B$ (3) $B \to E_1$
5. $A \to (E_1)?$: (1) $A \to \epsilon$ (2) $A \to E_1$
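Rules 1 to 5 can be applied recursively by a short program. Below is a minimal Python sketch, assuming DTD bodies are given as nested tuples ("seq", ...), ("alt", ...), ("star", E1), ("plus", E1), ("opt", E1), with element names as plain strings. The names to_cfg and V1, V2, … are hypothetical. Two simplifications go beyond the rules as stated:
- seq and alt accept more than two operands;
- a component that is already a single element name is used directly instead of being given a new variable (the same elimination performed by hand in Example 5.24).

```python
def to_cfg(head, expr):
    """Return legal CFG productions (head, body) for the DTD rule head -> expr.

    expr is an element name (str) or a tuple (op, E1, ...) with op one of
    "seq" (comma), "alt" (|), "star" (*), "plus" (+), "opt" (?).
    Bodies are lists of symbols; [] stands for epsilon.
    """
    out = []
    counter = [0]

    def fresh():
        counter[0] += 1
        return f"V{counter[0]}"  # hypothetical names for new variables

    def symbol(e):
        if isinstance(e, str):
            return e  # a single element name needs no new variable
        v = fresh()
        convert(v, e)
        return v

    def convert(a, e):
        if isinstance(e, str):
            out.append((a, [e]))
            return
        op, *args = e
        if op == "seq":  # rule 1 (the basis when all operands are names)
            out.append((a, [symbol(x) for x in args]))
        elif op == "alt":  # rule 2: A -> E1, A -> E2, ...
            for x in args:
                convert(a, x)
        elif op == "star":  # rule 3: A -> BA, A -> epsilon, B -> E1
            b = symbol(args[0])
            out.append((a, [b, a]))
            out.append((a, []))
        elif op == "plus":  # rule 4: A -> BA, A -> B, B -> E1
            b = symbol(args[0])
            out.append((a, [b, a]))
            out.append((a, [b]))
        elif op == "opt":  # rule 5: A -> epsilon, A -> E1
            out.append((a, []))
            convert(a, args[0])
        else:
            raise ValueError(f"unknown operator {op!r}")

    convert(head, expr)
    return out


for h, body in to_cfg("Pc", ("seq", "Model", "Price", "Processor", "Ram", ("plus", "Disk"))):
    print(h, "->", " ".join(body) or "ε")
```

For the PC rule this prints V1 -> Disk V1, V1 -> Disk and Pc -> Model Price Processor Ram V1. That is the final result of Example 5.24 with B renamed V1.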

5.3 Applications of CFG’s
►5.3.4 XML & DTD’s

Example 5.24
►Try to use the above rules to convert the following into legal CFG productions:
<!ELEMENT PC (MODEL, PRICE, PROCESSOR, RAM, DISK+)>   (a)

►Solution:
By rule 1, (a) becomes
$\mathit{Pc} \to AB$
$A \to \mathit{Model}\ \mathit{Price}\ \mathit{Processor}\ \mathit{Ram}$
$B \to \mathit{Disk}^+$ (illegal)   (b)
By rule 4, (b) becomes
$B \to CB \mid C$
$C \to \mathit{Disk}$
By observation, $A$ and $C$ may be eliminated, so the final result is
$\mathit{Pc} \to \mathit{Model}\ \mathit{Price}\ \mathit{Processor}\ \mathit{Ram}\ B$
$B \to \mathit{Disk}\ B \mid \mathit{Disk}$
(compare with the last result!)

5.3 Applications of CFG’s
►5.3.4 XML & DTD’s

Example 5.24a (supplemental)
►Convert the following into legal CFG productions:
<!ELEMENT PCS (PC*)>   (d)

►Solution: by rule 3, $A \to (E_1)^*$ becomes (1) $A \to BA$ (2) $A \to \epsilon$ (3) $B \to E_1$, so (d) becomes
$\mathit{Pcs} \to B\ \mathit{Pcs} \mid \epsilon$
$B \to \mathit{Pc}$

5.4 Ambiguity in Grammars & Langs.

►5.4.1 Ambiguous Grammars
Definition:
A CFG $G = (V, T, P, S)$ is ambiguous if there exists at least one string $w$ in $T^*$ for which there are two different parse trees, each with root labeled $S$ and yield $w$.
If each string has at most one parse tree in $G$, then $G$ is unambiguous.
Ambiguity causes problems when compiling programs.

51

5.4 Ambiguity in Grammars & Langs.5.4 Ambiguity in Grammars & Langs.

► Example 5.25: Given the expression grammar of Example 5.3 (Fig. 5.2),

E → I | E + E | E * E | (E)

I → a | b | Ia | Ib | I0 | I1

consider the following two derivations of the same sentential form E + E * E:

E ⇒ E + E ⇒ E + E * E

E ⇒ E * E ⇒ E + E * E

Their parse trees are distinct: the first groups the form as E + (E * E), the second as (E + E) * E.
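Ambiguity can be confirmed for a particular string by counting its parse trees. The Python sketch below does this by brute-force dynamic programming over substrings. It assumes single-character terminals, no ε-productions, and no cycles of unit productions (all true for the grammar of Fig. 5.2); the dictionary encoding and the name `count_trees` are hypothetical. For a + b * a, an instance of E + E * E, it finds two trees.

```python
from functools import lru_cache

# Expression grammar of Fig. 5.2; each character is a terminal (an assumption).
GRAMMAR = {
    "E": [["I"], ["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"]],
    "I": [["a"], ["b"], ["I", "a"], ["I", "b"], ["I", "0"], ["I", "1"]],
}


def count_trees(grammar, start, w):
    """Number of distinct parse trees with root `start` and yield `w`.

    Assumes no ε-productions and no unit-production cycles, so every symbol
    derives a nonempty string and each recursive call is on a smaller problem.
    """
    @lru_cache(maxsize=None)
    def sym(x, i, j):
        # number of trees for symbol x whose yield is w[i:j]
        if x not in grammar:
            return 1 if w[i:j] == x else 0
        return sum(seq(tuple(body), i, j) for body in grammar[x])

    @lru_cache(maxsize=None)
    def seq(body, i, j):
        # ways for the symbols of body to cover w[i:j] in nonempty pieces
        if len(body) == 1:
            return sym(body[0], i, j)
        total = 0
        for k in range(i + 1, j - len(body) + 2):
            left = sym(body[0], i, k)
            if left:
                total += left * seq(body[1:], k, j)
        return total

    return sym(start, 0, len(w))


print(count_trees(GRAMMAR, "E", "a+b*a"))  # 2
print(count_trees(GRAMMAR, "E", "a+b"))    # 1
```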

► Example 5.25 (continued): the parse trees of the two derivations.

[Figure: two parse trees with yield E + E * E. Left (E ⇒ E + E ⇒ E + E * E): root E with children E, +, E, where the right E is expanded to E * E. Right (E ⇒ E * E ⇒ E + E * E): root E with children E, *, E, where the left E is expanded to E + E.]

Different parse trees!

► Ambiguity in some grammars can be removed by redesigning the grammar. But some CFL’s are “inherently ambiguous,” i.e., every grammar for such a language is ambiguous: it gives some string of the language more than one parse tree.

Note: Two different derivations may have the same parse tree (see Example 5.26). So it is not the multiplicity of derivations that causes ambiguity, but the multiplicity of parse trees.

► Example 5.26: Two different derivations of a + b in the grammar of Fig. 5.2:

E ⇒ E + E ⇒ I + E ⇒ a + E ⇒ a + I ⇒ a + b

E ⇒ E + E ⇒ E + I ⇒ I + I ⇒ I + b ⇒ a + b

Both derivations have the same parse tree. So it is not the multiplicity of derivations that causes ambiguity.
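Going the other way, a parse tree fixes a derivation once the expansion order is fixed: always expanding the leftmost (or always the rightmost) unexpanded variable gives exactly one derivation per tree. The Python sketch below reads both off the parse tree of a + b shared by the two derivations of Example 5.26. The (label, children) tuple representation and the function names are assumptions made for illustration.

```python
def leaf(t):
    """A terminal leaf: a node with no children."""
    return (t, [])


def derivation(tree, leftmost=True):
    """Sentential forms of the leftmost (or rightmost) derivation of `tree`.

    A node is (label, children); interior nodes are variables. Assumes every
    variable node has at least one child (no ε-productions).
    """
    frontier = [tree]  # the current sentential form, as a list of nodes
    forms = []
    while True:
        forms.append(" ".join(label for label, _ in frontier))
        interior = [k for k, (_, kids) in enumerate(frontier) if kids]
        if not interior:
            return forms
        k = interior[0] if leftmost else interior[-1]
        frontier[k:k + 1] = frontier[k][1]  # replace the node by its children


# Parse tree of a + b in the grammar of Fig. 5.2
tree = ("E", [("E", [("I", [leaf("a")])]),
              leaf("+"),
              ("E", [("I", [leaf("b")])])])

print(" ⇒ ".join(derivation(tree)))
print(" ⇒ ".join(derivation(tree, leftmost=False)))
```

The first line printed is E ⇒ E + E ⇒ I + E ⇒ a + E ⇒ a + I ⇒ a + b, the first derivation of Example 5.26; the second is the rightmost derivation E ⇒ E + E ⇒ E + I ⇒ E + b ⇒ I + b ⇒ a + b. The second derivation of Example 5.26 is neither leftmost nor rightmost, yet it describes the same tree.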

►5.4.2 Removing Ambiguity from Grammars

There is no general algorithm that can tell whether a grammar is ambiguous or not. That is, testing a grammar for ambiguity is undecidable.

The ambiguity of an inherently ambiguous language cannot be removed: every grammar for it is ambiguous.

But ambiguity can be eliminated from the grammars of some common programming-language structures, as discussed next.

► An example: removing the ambiguity of the expression grammar of Fig. 5.2.

► Trick: create “terms,” which cannot be broken apart, as the units of the generated expressions (for details, see the textbook). In the grammar below, a factor F cannot be broken apart by an adjacent * or +, and a term T cannot be broken apart by an adjacent +.

► Example 5.27: an unambiguous version of the original expression grammar is as follows (T represents “term,” F represents “factor”):

E → T | E + T
T → F | T * F
F → I | (E)
I → a | b | Ia | Ib | I0 | I1

Original:

E → I | E + E | E * E | (E)
I → a | b | Ia | Ib | I0 | I1
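The layered grammar of Example 5.27 maps directly onto a recursive-descent parser with one function per variable. The Python sketch below is an illustration, not the textbook's construction: the left-recursive productions E → E + T and T → T * F are realized as loops (a standard rewriting, since a recursive-descent function for E cannot begin by calling itself), which keeps + and * left-associative. It returns a nested-tuple syntax tree rather than a full parse tree; the name `parse` and the tuple format are hypothetical.

```python
def parse(s):
    """Parse s with the grammar of Example 5.27 and return a nested-tuple tree:
    E -> T | E + T,  T -> F | T * F,  F -> I | (E),  I -> a | b | Ia | Ib | I0 | I1
    """
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def expect(ch):
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {pos}")
        pos += 1

    def expr():
        # E -> T | E + T: parse one T, then loop over "+ T" (left-associative)
        node = term()
        while peek() == "+":
            expect("+")
            node = ("+", node, term())
        return node

    def term():
        # T -> F | T * F: parse one F, then loop over "* F"
        node = factor()
        while peek() == "*":
            expect("*")
            node = ("*", node, factor())
        return node

    def factor():
        # F -> (E) | I, where an identifier is a or b followed by a, b, 0, 1
        nonlocal pos
        if peek() == "(":
            expect("(")
            node = expr()
            expect(")")
            return node
        if peek() not in ("a", "b"):
            raise SyntaxError(f"identifier expected at position {pos}")
        start = pos
        pos += 1
        while peek() is not None and peek() in "ab01":
            pos += 1
        return s[start:pos]

    tree = expr()
    if pos != len(s):
        raise SyntaxError(f"unexpected {s[pos]!r} at position {pos}")
    return tree


print(parse("a+b*a"))    # ('+', 'a', ('*', 'b', 'a'))
print(parse("(a+b)*a"))  # ('*', ('+', 'a', 'b'), 'a')
```

For a+b*a the * is grouped inside the right operand of +, the only grouping the new grammar allows; parentheses override it, as the second call shows.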

► A check of the parse tree for an expression of the form E + E * E (e.g., a + b * a) under the new grammar finds only the following one; there is no alternative!

[Figure: parse tree with root E and children E, +, T, where the T is expanded to T * F.]

YACC has its own way of removing ambiguity: the grammar writer can declare the precedence and associativity of operators.

►5.4.3 Leftmost Derivations as a Way to Express Ambiguity

In an unambiguous grammar, the leftmost derivation of each string is unique (and so is the rightmost one), as shown by the following theorem.

Theorem 5.29

For each grammar G = (V, T, P, S) and string w in T*, w has two distinct parse trees if and only if w has two distinct leftmost (rightmost) derivations from S.

Proof. See the textbook.
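Theorem 5.29 gives a way to exhibit ambiguity on a specific string: list all of its leftmost derivations and count them. The Python sketch below does an exhaustive search with pruning. It assumes single-character terminals, no ε-productions (so a sentential form longer than w can be discarded), and no cycles of unit productions; the grammar dictionaries and the function name are hypothetical. It examines one string at a time, which is consistent with ambiguity being undecidable in general.

```python
# Expression grammars with single-character terminals (encoding is an assumption).
I_RULES = [["a"], ["b"], ["I", "a"], ["I", "b"], ["I", "0"], ["I", "1"]]
AMBIGUOUS = {  # Fig. 5.2
    "E": [["I"], ["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"]],
    "I": I_RULES,
}
UNAMBIGUOUS = {  # Example 5.27
    "E": [["T"], ["E", "+", "T"]],
    "T": [["F"], ["T", "*", "F"]],
    "F": [["I"], ["(", "E", ")"]],
    "I": I_RULES,
}


def leftmost_derivations(grammar, start, w):
    """All leftmost derivations of w from start, each as a list of sentential forms."""
    results = []

    def search(form, history):
        # index of the leftmost variable, or None if form is all terminals
        k = next((i for i, x in enumerate(form) if x in grammar), None)
        if k is None:
            if "".join(form) == w:
                results.append(history)
            return
        # terminals left of the leftmost variable can no longer change, and
        # (no ε-productions) every symbol yields at least one character
        if not w.startswith("".join(form[:k])) or len(form) > len(w):
            return
        for body in grammar[form[k]]:
            new = form[:k] + body + form[k + 1:]
            search(new, history + ["".join(new)])

    search([start], [start])
    return results


for d in leftmost_derivations(AMBIGUOUS, "E", "a+b*a"):
    print(" ⇒ ".join(d))
print(len(leftmost_derivations(UNAMBIGUOUS, "E", "a+b*a")))  # 1
```

For a+b*a the grammar of Fig. 5.2 yields two leftmost derivations, matching its two parse trees, while the grammar of Example 5.27 yields one.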

►5.4.4 Inherent Ambiguity

A CFL L is said to be inherently ambiguous if all its grammars are ambiguous. An example of an inherently ambiguous language:

$L = \{a^n b^n c^m d^m \mid n \ge 1, m \ge 1\} \cup \{a^n b^m c^m d^n \mid n \ge 1, m \ge 1\}$

Proof. See the textbook.
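Membership in L is easy to test; the difficulty lies in the overlap of its two parts. The Python sketch below (function names hypothetical) tests the two parts separately and shows that strings of the form a^n b^n c^n d^n belong to both. The textbook's proof shows that, in any grammar for L, some such strings must have two parse trees; this code only checks membership and proves nothing about ambiguity.

```python
import re


def _blocks(s):
    """Lengths of the a, b, c, d blocks if s matches a+b+c+d+, else None."""
    m = re.fullmatch(r"(a+)(b+)(c+)(d+)", s)
    return None if m is None else [len(g) for g in m.groups()]


def in_first(s):
    """Is s in {a^n b^n c^m d^m | n >= 1, m >= 1}?"""
    lens = _blocks(s)
    return lens is not None and lens[0] == lens[1] and lens[2] == lens[3]


def in_second(s):
    """Is s in {a^n b^m c^m d^n | n >= 1, m >= 1}?"""
    lens = _blocks(s)
    return lens is not None and lens[0] == lens[3] and lens[1] == lens[2]


def in_L(s):
    return in_first(s) or in_second(s)


for s in ["aabbcd", "abbccd", "aabbccdd"]:
    print(s, in_first(s), in_second(s))
```

The three lines printed are `aabbcd True False`, `abbccd False True`, and `aabbccdd True True`: the last string lies in both parts.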