Chapter 5
Context-Free Grammars and Languages
Cathedral of St. Basil the Blessed, Red Square, Moscow, Russia

Outline
►5.0 Introduction
►5.1 Context-Free Grammars (CFG’s)
►5.2 Parse Trees
►5.3 Applications of CFG’s
►5.4 Ambiguity in Grammars and Languages

5.0 Introduction
►Context-free grammars (CFG’s) generate context-free languages (CFL’s).
►CFG’s have played a central role in compiler technology since the 1960’s.
►More recently, CFG’s have been used to describe document formats via document-type definitions (DTD’s), which are used in XML (extensible markup language).
►XML is used mainly for information exchange on the Web.

5.1 CFG’s
►5.1.1 An Informal Example
Palindromes --- e.g., otto, madamimadam (“Madam, I’m Adam”)…
Goal: design a CFG for the language of binary palindromes
$L_{pal} = \{ w \mid w \in \{0, 1\}^*,\ w^R = w \}$.
►Examples: 00, 0110, 011110, …
CFG:
►$P \to \varepsilon$ (1)
►$P \to 0$ (2)
►$P \to 1$ (3)
►$P \to 0P0$ (4)
►$P \to 1P1$ (5)
Productions (4) and (5) are recursive. $P$ is a variable (nonterminal). 0 and 1 are terminals.
Examples of derivations of strings:
►$P \Rightarrow 0P0 \Rightarrow 01P10 \Rightarrow 011P110 \Rightarrow 011110$ (using productions (4), (5), (5), (1) in turn)
►$P \Rightarrow 0P0 \Rightarrow 010$ (using productions (4), (3)) …
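The five productions can be read directly as a recursive membership test. The sketch below is only an illustration; the function name `derives_palindrome` is an assumption and not part of the slides. Each branch corresponds to one production.

```python
def derives_palindrome(w: str) -> bool:
    """Return True if the variable P derives the string w over {0, 1}."""
    if any(c not in "01" for c in w):
        return False                      # only the terminals 0 and 1 are allowed
    if w == "":
        return True                       # production (1): P -> epsilon
    if w in ("0", "1"):
        return True                       # productions (2) and (3)
    if w[0] == w[-1]:
        return derives_palindrome(w[1:-1])  # productions (4) and (5): P -> 0P0 | 1P1
    return False


if __name__ == "__main__":
    for s in ["011110", "010", "01"]:
        print(s, derives_palindrome(s))
```

The recursion peels one matching symbol off each end, which mirrors applying production (4) or (5) in a derivation.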

►5.1.2 Definition of CFG’s
A CFG is a 4-tuple $G = (V, T, P, S)$ where
►$V$ is the set of variables (also called nonterminals or syntactic categories);
►$T$ is the set of terminals;
►$S$ is the start symbol;
►$P$ is the set of productions (or rules), each of the form “head” $\to$ “body”:
$A \to \alpha$ where $A \in V$ and $\alpha \in (V \cup T)^*$, i.e., the head is a single variable and the body is a string of zero or more terminals and variables.
For the last example (binary palindromes):
►$V = \{P\}$
►$T = \{0, 1\}$
►$P = A$ = the set of the five productions (1)~(5) below
$P \to \varepsilon$ (1)
$P \to 0$ (2)
$P \to 1$ (3)
$P \to 0P0$ (4)
$P \to 1P1$ (5)
►$S = P$
(Note: $P$ is used with two meanings here, the set of productions and the variable $P$; do not confuse them.)
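As a rough illustration of the 4-tuple, the palindrome grammar can be written down as a small data structure. The class name `CFG`, its field names, and the method `is_well_formed` are assumptions made for this sketch. The check that $V$ and $T$ are disjoint is the usual convention rather than something stated on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    variables: frozenset      # V
    terminals: frozenset      # T
    productions: tuple        # P: pairs (head, body), body is a tuple of symbols
    start: str                # S

    def is_well_formed(self) -> bool:
        """Check head in V, body in (V ∪ T)*, S in V, and V, T disjoint."""
        symbols = self.variables | self.terminals
        return (
            self.variables.isdisjoint(self.terminals)
            and self.start in self.variables
            and all(
                head in self.variables and all(s in symbols for s in body)
                for head, body in self.productions
            )
        )


# The palindrome grammar; the empty tuple stands for the body epsilon.
G_PAL = CFG(
    variables=frozenset({"P"}),
    terminals=frozenset({"0", "1"}),
    productions=(
        ("P", ()),
        ("P", ("0",)),
        ("P", ("1",)),
        ("P", ("0", "P", "0")),
        ("P", ("1", "P", "1")),
    ),
    start="P",
)

if __name__ == "__main__":
    print(G_PAL.is_well_formed())
```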
Example 5.3 --- a CFG for (arithmetic) expressions
1. $E \to I$
2. $E \to E + E$
3. $E \to E * E$
4. $E \to (E)$
5. $I \to a$
6. $I \to b$
7. $I \to Ia$
8. $I \to Ib$
9. $I \to I0$
10. $I \to I1$
Here $I$ is an identifier describable by the RE $(a + b)(a + b + 0 + 1)^*$, which is transformed into productions 5~10.
The above productions may be rewritten as
►$E \to I \mid E + E \mid E * E \mid (E)$
►$I \to a \mid b \mid Ia \mid Ib \mid I0 \mid I1$
►Example of a derivation in the grammar of Example 5.3:
$E \Rightarrow E * E \Rightarrow I * E \Rightarrow a * E \Rightarrow a * (E) \Rightarrow a * (E + E) \Rightarrow a * (I + E) \Rightarrow a * (I0 + E) \Rightarrow \dots \Rightarrow a * (a0 + b1)$

►5.1.3 Derivations Using a Grammar
We use productions to generate (“infer”) strings in the language described by the grammar.
Two ways for such string inference:
►Recursive inference --- bottom up (“from body to head”), starting from known strings (often from terminals in productions)
►Derivation --- top down (“from head to body”) in concept, as shown by the earlier examples
Example 5.4 --- recursive inference of the string $a * (a + b00)$

Step    String inferred   For language of   Production used          String(s) used
(i)     a                 I                 5 ($I \to a$)            -
(ii)    b                 I                 6 ($I \to b$)            -
(iii)   b0                I                 9 ($I \to I0$)           (ii)
(iv)    b00               I                 9 ($I \to I0$)           (iii)
(v)     a                 E                 1 ($E \to I$)            (i)
(vi)    b00               E                 1 ($E \to I$)            (iv)
(vii)   a + b00           E                 2 ($E \to E + E$)        (v), (vi)
(viii)  (a + b00)         E                 4 ($E \to (E)$)          (vii)
(ix)    a * (a + b00)     E                 3 ($E \to E * E$)        (v), (viii)
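The bottom-up procedure in the table can be sketched as a fixed-point computation. To keep the sets finite, this sketch only records strings that are substrings of the target. That restriction, and the names `PRODUCTIONS` and `recursive_inference`, are assumptions for illustration. Blanks are omitted from the strings.

```python
from itertools import product

# Productions 1-10 of Example 5.3 as (head, body) pairs.
PRODUCTIONS = [
    ("E", ("I",)),
    ("E", ("E", "+", "E")),
    ("E", ("E", "*", "E")),
    ("E", ("(", "E", ")")),
    ("I", ("a",)),
    ("I", ("b",)),
    ("I", ("I", "a")),
    ("I", ("I", "b")),
    ("I", ("I", "0")),
    ("I", ("I", "1")),
]


def recursive_inference(target: str) -> dict:
    """Infer, from body to head, which substrings of target belong to E and I."""
    substrings = {
        target[i:j]
        for i in range(len(target))
        for j in range(i + 1, len(target) + 1)
    }
    language = {"E": set(), "I": set()}
    changed = True
    while changed:                      # repeat until no new string is inferred
        changed = False
        for head, body in PRODUCTIONS:
            # A variable contributes its known strings, a terminal contributes itself.
            choices = [sorted(language[s]) if s in language else [s] for s in body]
            for parts in product(*choices):
                s = "".join(parts)
                if s in substrings and s not in language[head]:
                    language[head].add(s)
                    changed = True
    return language


if __name__ == "__main__":
    result = recursive_inference("a*(a+b00)")
    print("a*(a+b00)" in result["E"])
```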
Notations of derivations:
►If $\alpha A \beta$ is a string of terminals and variables, and if $A \to \gamma$ is a production, then we write
$\alpha A \beta \underset{G}{\Rightarrow} \alpha \gamma \beta$
to denote one derivation step.
►For zero or more derivation steps, we use the notation
$\alpha \overset{*}{\underset{G}{\Rightarrow}} \beta$
►The label $G$ under the double arrow may be omitted if it is understood which grammar is being used.
Example 5.5 --- derivation of the string $a * (a + b00)$
$E \Rightarrow E * E \Rightarrow I * E \Rightarrow a * E \Rightarrow a * (E) \Rightarrow a * (E + E) \Rightarrow a * (I + E) \Rightarrow a * (a + E) \Rightarrow a * (a + I) \Rightarrow a * (a + I0) \Rightarrow a * (a + I00) \Rightarrow a * (a + b00)$

►5.1.4 Leftmost and Rightmost Derivations
Leftmost derivation --- replace the leftmost variable in each derivation step (notation: $\underset{lm}{\Rightarrow}$, which for typing convenience we also write as $\Rightarrow_{lm}$).
Rightmost derivation --- replace the rightmost variable instead (notation: $\Rightarrow_{rm}$).
Example 5.6 --- a continuation of Example 5.5
►Leftmost derivation
$E \Rightarrow_{lm} E * E \Rightarrow_{lm} I * E \Rightarrow_{lm} a * E \Rightarrow_{lm} a * (E) \Rightarrow_{lm} a * (E + E) \Rightarrow_{lm} a * (I + E) \Rightarrow_{lm} \dots \Rightarrow_{lm} a * (a + I00) \Rightarrow_{lm} a * (a + b00)$
►Rightmost derivation
$E \Rightarrow_{rm} E * E \Rightarrow_{rm} E * (E) \Rightarrow_{rm} E * (E + E) \Rightarrow_{rm} E * (E + I) \Rightarrow_{rm} E * (E + I0) \Rightarrow_{rm} E * (E + I00) \Rightarrow_{rm} \dots \Rightarrow_{rm} I * (a + b00) \Rightarrow_{rm} a * (a + b00)$
Any derivation has an equivalent leftmost derivation and an equivalent rightmost one (proved in the next section).
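A leftmost or rightmost derivation is fully determined by the sequence of production numbers used. The sketch below replays such a sequence for the grammar of Example 5.3. The function `derive` and the two number sequences are illustrative assumptions, and blanks are omitted from the sentential forms.

```python
# Productions of Example 5.3, keyed by their numbers 1-10.
PRODUCTIONS = {
    1: ("E", ["I"]),
    2: ("E", ["E", "+", "E"]),
    3: ("E", ["E", "*", "E"]),
    4: ("E", ["(", "E", ")"]),
    5: ("I", ["a"]),
    6: ("I", ["b"]),
    7: ("I", ["I", "a"]),
    8: ("I", ["I", "b"]),
    9: ("I", ["I", "0"]),
    10: ("I", ["I", "1"]),
}
VARIABLES = {"E", "I"}


def derive(numbers, leftmost=True):
    """Apply the numbered productions to the leftmost (or rightmost) variable."""
    form = ["E"]                                  # start symbol
    steps = ["E"]
    for n in numbers:
        head, body = PRODUCTIONS[n]
        positions = [i for i, s in enumerate(form) if s in VARIABLES]
        if not positions:
            raise ValueError("no variable left to replace")
        i = positions[0] if leftmost else positions[-1]
        if form[i] != head:
            raise ValueError(f"production {n} does not apply to {form[i]}")
        form[i:i + 1] = body                      # replace the variable by the body
        steps.append("".join(form))
    return steps


LEFTMOST = [3, 1, 5, 4, 2, 1, 5, 1, 9, 9, 6]
RIGHTMOST = [3, 4, 2, 1, 9, 9, 6, 1, 5, 1, 5]

if __name__ == "__main__":
    print(" => ".join(derive(LEFTMOST, leftmost=True)))
    print(" => ".join(derive(RIGHTMOST, leftmost=False)))
```

Both sequences use the same productions the same number of times. Only the order of replacement differs.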

►5.1.5 The Language of a Grammar
The language $L(G)$ of a CFG $G = (V, T, P, S)$ is
$L(G) = \{ w \mid w \in T^*,\ S \overset{*}{\underset{G}{\Rightarrow}} w \}$.
The language of a CFG is called a context-free language (CFL).
Theorem 5.7 in the textbook shows a typical way to prove, using induction, that a given grammar really generates the desired language (here, the set of palindromes); read it by yourself.
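The definition of $L(G)$ can be explored by enumerating derivations. The sketch below does a breadth-first search over leftmost derivations and keeps the terminal strings up to a length bound. The function `language_up_to` and the single-character symbol encoding are assumptions for illustration. The search is only guaranteed to stop for grammars such as the palindrome grammar, where sentential forms cannot grow without adding terminals.

```python
from collections import deque


def language_up_to(productions: dict, start: str, max_len: int) -> set:
    """Return all terminal strings w with S =>* w and len(w) <= max_len."""
    variables = set(productions)
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        idx = next((i for i, c in enumerate(form) if c in variables), None)
        if idx is None:                 # no variable left: form is a terminal string
            words.add(form)
            continue
        for body in productions[form[idx]]:
            new = form[:idx] + body + form[idx + 1:]
            terminal_count = sum(c not in variables for c in new)
            if terminal_count <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words


# Palindrome grammar; the empty string stands for the body epsilon.
G_PAL = {"P": ["", "0", "1", "0P0", "1P1"]}

if __name__ == "__main__":
    print(sorted(language_up_to(G_PAL, "P", 3), key=lambda w: (len(w), w)))
```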

►5.1.6 Sentential Forms
Strings derived from the start symbol are called sentential forms.
That is, given a CFG $G = (V, T, P, S)$:
►if $S \overset{*}{\Rightarrow} \alpha$ where $\alpha \in (V \cup T)^*$, then $\alpha$ is a sentential form;
►if $S \overset{*}{\underset{lm}{\Rightarrow}} \alpha$ where $\alpha \in (V \cup T)^*$, then $\alpha$ is a left-sentential form;
►if $S \overset{*}{\underset{rm}{\Rightarrow}} \alpha$ where $\alpha \in (V \cup T)^*$, then $\alpha$ is a right-sentential form.
Example 5.8 --- easy; read by yourself.

5.2 Parse Trees
►Advantage of parse trees --- in a compiler, the parse tree structure facilitates translation of the source program into executable code by means of recursive functions.
►Parse trees are closely related to derivations and recursive inferences.
►An important application of the parse tree is the study of grammatical ambiguity, which makes a grammar unsuitable for a programming language.
►A note about the term YACC in the textbook ---
The computer program YACC is a parser generator developed by Stephen C. Johnson at AT&T for the Unix operating system.
The name is an acronym for “Yet Another Compiler Compiler.”
It generates a parser (the part of a compiler that tries to make syntactic sense of the source code) based on an analytic grammar written in a notation similar to BNF.
YACC generates the code for the parser in the C programming language.
(Retrieved from Wikipedia, 2007/10/8)

►5.2.1 Constructing Parse Trees
Given a grammar $G = (V, T, P, S)$, a parse tree for $G$ is a tree satisfying the following conditions:
►Each interior node is labeled by a variable in $V$.
►Each leaf is labeled by either a variable, a terminal, or $\varepsilon$. If $\varepsilon$ is the label, it must be the only child of its parent.
►If an interior node is labeled $A$, and its children are labeled $X_1, X_2, \dots, X_k$, respectively, from the left, then $A \to X_1 X_2 \cdots X_k$ is a production in $P$.
►Note that the only time one of the $X$’s can be $\varepsilon$ is when $\varepsilon$ is the label of the only child, and $A \to \varepsilon$ is a production of $G$.
Example 5.10 --- a parse tree of the derivation $P \overset{*}{\Rightarrow} 0110$ of the palindrome.

        P
      / | \
     0  P  0
      / | \
     1  P  1
        |
        ε

►5.2.2 The Yield of a Parse Tree
The yield of a parse tree is the string obtained by concatenating all the leaves from the left, like $01\varepsilon10 = 0110$ for the tree of Example 5.10.
The set of yields of the parse trees of a grammar $G$ that have the start symbol at the root and a terminal string as yield is another way to describe the language of $G$ (provable).
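As a small illustration of the definitions in 5.2.1 and 5.2.2, the tree of Example 5.10 can be stored as nested `(label, children)` pairs. The names `TREE`, `tree_yield`, and `is_parse_tree` are assumptions for this sketch. The production set is that of the palindrome grammar.

```python
EPS = "ε"

# Productions of the palindrome grammar as (head, body) pairs.
PRODUCTIONS = {
    ("P", (EPS,)),
    ("P", ("0",)),
    ("P", ("1",)),
    ("P", ("0", "P", "0")),
    ("P", ("1", "P", "1")),
}
SYMBOLS = {"P", "0", "1", EPS}

# Parse tree of Example 5.10: each node is (label, list of children).
TREE = ("P", [
    ("0", []),
    ("P", [("1", []), ("P", [(EPS, [])]), ("1", [])]),
    ("0", []),
])


def tree_yield(node) -> str:
    """Concatenate the leaf labels from the left; ε contributes nothing."""
    label, children = node
    if not children:
        return "" if label == EPS else label
    return "".join(tree_yield(child) for child in children)


def is_parse_tree(node) -> bool:
    """Check that every interior node and its children form a production."""
    label, children = node
    if not children:
        return label in SYMBOLS          # a leaf: variable, terminal, or ε
    body = tuple(child[0] for child in children)
    return (label, body) in PRODUCTIONS and all(is_parse_tree(c) for c in children)


if __name__ == "__main__":
    print(tree_yield(TREE), is_parse_tree(TREE))
```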

►5.2.3 Inference, Derivations, and Parse Trees
Given a grammar $G = (V, T, P, S)$, the following facts are all equivalent:
►the recursive inference procedure determines that terminal string $w$ is in the language of variable $A$;
►$A \overset{*}{\Rightarrow} w$;
►$A \overset{*}{\underset{lm}{\Rightarrow}} w$;
►$A \overset{*}{\underset{rm}{\Rightarrow}} w$;
►there is a parse tree with root $A$ and yield $w$.
The equivalences of these facts are proved, in the way shown in the following diagram, by theorems in Sections 5.2.4~5.2.6 (read by yourself):
[Diagram of implications among the five statements: Parse tree, Leftmost derivation, Rightmost derivation, Derivation, Recursive inference]

5.3 Applications of CFG’s
►CFG’s were originally conceived by N. Chomsky for describing natural languages. But not all natural languages can be so described.
►Two applications of CFG’s ---
Describing programming languages.
►There is a mechanical way for turning a CFG into a parser.
Describing DTD’s in XML ---
►DTD --- document type definition
►XML --- extensible markup language

►5.3.1 Parsers
There are components in programming languages which are not regular languages (RL’s).
Two examples of non-RL structures ---
►Balanced structures like parentheses “( ),” “begin-end” pairs, “if-else” pairs, …
►Unbalanced structures like unbalanced “if else” pairs …
►E.g., a balanced if-else pair in C is
if (condition) Statement; else Statement;
Example 5.19 ---
►A grammar for balanced parentheses
$G_{bal} = (\{B\}, \{(, )\}, P, B)$
$P$: $B \to BB \mid (B) \mid \varepsilon$
e.g., $x$ = (()), ()(), …
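The three productions of $G_{bal}$ translate into a short memoized membership test. This is a sketch for illustration only; the function name `in_gbal` is an assumption. For $B \to BB$ it is enough to try splits into two nonempty parts, since a split with an empty part leaves the string unchanged.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def in_gbal(s: str) -> bool:
    """Return True if B derives s in the grammar B -> BB | (B) | epsilon."""
    if s == "":
        return True                                   # B -> epsilon
    if s[0] == "(" and s[-1] == ")" and in_gbal(s[1:-1]):
        return True                                   # B -> (B)
    # B -> BB: try every split into two nonempty parts
    return any(in_gbal(s[:k]) and in_gbal(s[k:]) for k in range(1, len(s)))


if __name__ == "__main__":
    for s in ["(())", "()()", "(()", ")("]:
        print(s, in_gbal(s))
```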
Example 5.19 (cont’d) ---
►A grammar for unbalanced if-else pairs
$S \to \varepsilon \mid SS \mid iS \mid iSeS$
where $i$ = if …, $e$ = else …
The production $S \to iS$ is used to generate an unbalanced “if” (with no matching “else”).
e.g., the following is a generated segment of a C program
if (Condition) {
    …
    if (Condition) Statement;
    else Statement;
    …
    if (Condition) Statement;
    else Statement;
    …
}

►5.3.2 The YACC Parser-Generator
Input to YACC is a CFG with each production being associated additionally with an action ({…} below).
Example 5.21 --- a CFG in the YACC notation identical to that of Example 5.3 (Fig. 5.2).
The grammar of Example 5.3:
$E \to I \mid E + E \mid E * E \mid (E)$
$I \to a \mid b \mid Ia \mid Ib \mid I0 \mid I1$
The same grammar in the YACC notation:
Exp : Id             {…}
    | Exp '+' Exp    {…}
    | Exp '*' Exp    {…}
    | '(' Exp ')'    {…}
    ;
Id  : 'a'            {…}
    | 'b'            {…}
    | Id 'a'         {…}
    | Id 'b'         {…}
    | Id '0'         {…}
    | Id '1'         {…}
    ;

►5.3.3 Markup Languages
The strings in a markup language are documents with “marks,” called tags, which specify the semantics of the strings.
An example --- HTML (HyperText Markup Language) for webpage design, which has 2 functions:
►Creating links between documents
►Describing formats (“looks”) of documents
Example 5.22 --- HTML of a webpage
(a) The text as viewed on the webpage:
The things I hate:
1. Moldy bread.
2. People who drive too slow in the fast lane.
(b) The HTML source:
<P>The things I <EM>hate</EM>:
<OL>
<LI>Moldy bread.
<LI>People who drive too slow in the fast lane.
</OL>
Example 5.22 --- HTML of a webpage (cont’d)
►Meanings of tags:
<P> --- paragraph (unmatched single tag)
<EM>…</EM> --- emphasizing text … (matched tag pair)
<LI> --- list item (unmatched single tag)
<OL>…</OL> --- ordered list (matched tag pair)
Part of an HTML grammar
1. $\mathit{Doc} \to \varepsilon \mid \mathit{Element}\ \mathit{Doc}$
2. $\mathit{Element} \to \mathit{Text} \mid \texttt{<EM>}\ \mathit{Doc}\ \texttt{</EM>} \mid \texttt{<P>}\ \mathit{Doc} \mid \texttt{<OL>}\ \mathit{List}\ \texttt{</OL>} \mid \dots$
3. $\mathit{Text} \to \varepsilon \mid \mathit{Char}\ \mathit{Text}$
4. $\mathit{Char} \to a \mid A \mid \dots$
5. $\mathit{List} \to \varepsilon \mid \mathit{ListItem}\ \mathit{List}$
6. $\mathit{ListItem} \to \texttt{<LI>}\ \mathit{Doc}$

►5.3.4 XML and Document-Type Definitions (DTD’s)
XML (extensible markup language) describes the semantics of the text in a document using DTD’s, while HTML describes the format of a document.
The DTD is itself described with a language which is essentially in the form of a CFG mixed with regular expressions.
The form of a DTD is
<!DOCTYPE name-of-DTD [
    list of element definitions
]>
The form of an element definition is
<!ELEMENT element-name (description of the element)>
Descriptions of elements are essentially regular expressions.
The regular expressions may be described in the following way:
►An element may appear in another element.
►The special term #PCDATA stands for any text involving no tags.
►The allowed operators are
A vertical bar | means union;
A comma , denotes concatenation;
3 variants of the closure operator --- *, +, ? as described before
(*: zero or more occurrences of; +: one or more occurrences of; ?: zero or one occurrence of)
Example 5.23 --- a DTD for describing personal computers (PC’s)
<!DOCTYPE PcSpecs [
    <!ELEMENT PCS (PC*)>
    <!ELEMENT PC (MODEL, PRICE, PROCESSOR, RAM, DISK+)>
    <!ELEMENT MODEL (#PCDATA)>
    <!ELEMENT PRICE (#PCDATA)>
    <!ELEMENT PROCESSOR (MANF, MODEL, SPEED)>
    <!ELEMENT MANF (#PCDATA)>
    <!ELEMENT MODEL (#PCDATA)>
    <!ELEMENT SPEED (#PCDATA)>
    <!ELEMENT RAM (#PCDATA)>
    <!ELEMENT DISK (HARDDISK | CD | DVD)>
    <!ELEMENT HARDDISK (MANF, MODEL, SIZE)>
    <!ELEMENT SIZE (#PCDATA)>
    <!ELEMENT CD (SPEED)>
    <!ELEMENT DVD (SPEED)>
]>
(MANF: manufacturer)
Fig. 5.14 A DTD for a PC
Example 5.23 --- a DTD for describing personal computers (PC’s) (continued)
►Each element is represented in the document by a tag with the name of the element and a matching tag at the end, with an extra slash, just as in HTML.
►For example, the definition
<!ELEMENT MODEL (#PCDATA)>
allows
<MODEL>4560</MODEL>
and the definition
<!ELEMENT PCS (PC*)>
allows
<PCS>
    <PC> … </PC>
    <PC> … </PC>
</PCS>
Example 5.23 (cont’d) --- a description of two PC’s using the DTD language
<PCS>
    <PC>
        <MODEL>4560</MODEL>
        <PRICE>$2295</PRICE>
        <PROCESSOR>
            <MANF>Intel</MANF>
            <MODEL>Pentium</MODEL>
            <SPEED>800MHz</SPEED>
        </PROCESSOR>
        <RAM>256</RAM>
        <DISK><HARDDISK>
            <MANF>Maxtor</MANF>
            <MODEL>Diamond</MODEL>
            <SIZE>30.5Gb</SIZE>
        </HARDDISK></DISK>
        <DISK><CD>
            <SPEED>32x</SPEED>
        </CD></DISK>
    </PC>
    <PC>
        …
    </PC>
</PCS>
Fig. 5.15 Part of a document obeying the DTD of Fig. 5.14
Example 5.23 (cont’d) --- converting DTD rules which include RE’s into CFG productions
►<!ELEMENT PROCESSOR (MANF, MODEL, SPEED)>
$\mathit{Processor} \to \mathit{Manf}\ \mathit{Model}\ \mathit{Speed}$
(Note: the commas mean concatenation)
►<!ELEMENT DISK (HARDDISK | CD | DVD)>
$\mathit{Disk} \to \mathit{Harddisk} \mid \mathit{Cd} \mid \mathit{Dvd}$
(Note: the vertical bars mean the same as in production bodies)
►<!ELEMENT PC (MODEL, PRICE, PROCESSOR, RAM, DISK+)>
$\mathit{Pc} \to \mathit{Model}\ \mathit{Price}\ \mathit{Processor}\ \mathit{Ram}\ \mathit{Disks}$
$\mathit{Disks} \to \mathit{Disk} \mid \mathit{Disk}\ \mathit{Disks}$
(Note: the 2nd rule corresponds to the regular expression DISK+, which means one or more disks)
General technique for converting DTD rules which include RE’s into legal CFG productions
►Basis: if the body of a production $P$ is a concatenation of elements, then $P$ is in the legal form for CFG’s.
►Induction: assume $E_1$ and $E_2$ are in legal forms.
1. $A \to E_1, E_2$ is replaced by (1) $A \to BC$ (2) $B \to E_1$ (3) $C \to E_2$
2. $A \to E_1 \mid E_2$ is replaced by (1) $A \to E_1$ (2) $A \to E_2$
3. $A \to (E_1)^*$ is replaced by (1) $A \to BA$ (2) $A \to \varepsilon$ (3) $B \to E_1$
4. $A \to (E_1)^+$ is replaced by (1) $A \to BA$ (2) $A \to B$ (3) $B \to E_1$
5. $A \to (E_1)?$ is replaced by (1) $A \to \varepsilon$ (2) $A \to E_1$
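The five rules can be sketched as a recursive function on a small expression tree. The tuple encoding (`"elem"`, `"cat"`, `"alt"`, `"star"`, `"plus"`, `"opt"`), the fresh-variable names `X1, X2, …`, and the helper `share` are all assumptions for this illustration. The helper is a deliberate addition to the slide's rules: a closure that sits directly under `|` or `?` gets its own variable, because rules 3 and 4 recurse on their head.

```python
import itertools


def fresh_names():
    """Yield the hypothetical fresh variable names X1, X2, ..."""
    return (f"X{i}" for i in itertools.count(1))


def convert(head, expr, out, fresh):
    """Append CFG productions (head, body) equivalent to head -> expr."""
    kind = expr[0]
    if kind == "elem":                      # basis: a single element name
        out.append((head, [expr[1]]))
    elif kind == "cat":                     # rule 1: A -> BC, B -> E1, C -> E2
        b, c = next(fresh), next(fresh)
        out.append((head, [b, c]))
        convert(b, expr[1], out, fresh)
        convert(c, expr[2], out, fresh)
    elif kind == "alt":                     # rule 2: A -> E1, A -> E2
        share(head, expr[1], out, fresh)
        share(head, expr[2], out, fresh)
    elif kind in ("star", "plus"):          # rules 3 and 4
        b = next(fresh)
        out.append((head, [b, head]))       # A -> BA
        out.append((head, [] if kind == "star" else [b]))  # A -> epsilon, or A -> B
        convert(b, expr[1], out, fresh)     # B -> E1
    elif kind == "opt":                     # rule 5: A -> epsilon, A -> E1
        out.append((head, []))
        share(head, expr[1], out, fresh)
    else:
        raise ValueError(f"unknown expression kind: {kind}")
    return out


def share(head, expr, out, fresh):
    """Convert expr under a head that also has other alternatives."""
    if expr[0] in ("star", "plus"):
        v = next(fresh)                     # keep the closure's recursion private
        out.append((head, [v]))
        convert(v, expr, out, fresh)
    else:
        convert(head, expr, out, fresh)


if __name__ == "__main__":
    for head, body in convert("Disks", ("plus", ("elem", "Disk")), [], fresh_names()):
        print(head, "->", " ".join(body) if body else "ε")
```

An empty body list stands for $\varepsilon$. Fresh variables that only rename a single element can be eliminated by observation afterwards.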
Example 5.24
►Try to use the above rules to convert the following into legal CFG productions:
<!ELEMENT PC (MODEL, PRICE, PROCESSOR, RAM, DISK+)>   (a)
►Solution:
By rule 1, (a) becomes
$\mathit{Pc} \to AB$
$A \to \mathit{Model}\ \mathit{Price}\ \mathit{Processor}\ \mathit{Ram}$
$B \to \mathit{Disk}^+$ (illegal)   (b)
By rule 4, (b) becomes
$B \to CB \mid C$
$C \to \mathit{Disk}$
By observation, $A$ and $C$ may be eliminated, so the final result is
$\mathit{Pc} \to \mathit{Model}\ \mathit{Price}\ \mathit{Processor}\ \mathit{Ram}\ B$
$B \to \mathit{Disk}\ B \mid \mathit{Disk}$ (compare with the last result!)
Example 5.24a (supplemental)
►Convert the following into legal CFG productions:
<!ELEMENT PCS (PC*)>   (d)
►Solution:
Rule 3 says that $A \to (E_1)^*$ is replaced by (1) $A \to BA$ (2) $A \to \varepsilon$ (3) $B \to E_1$.
By rule 3, (d) becomes
$\mathit{Pcs} \to B\ \mathit{Pcs}$
$\mathit{Pcs} \to \varepsilon$
$B \to \mathit{Pc}$

5.4 Ambiguity in Grammars and Languages
►5.4.1 Ambiguous Grammars
Definition ---
A CFG $G = (V, T, P, S)$ is ambiguous if there exists at least one string $w$ in $T^*$ for which there are two different parse trees, each with root labeled $S$ and yield $w$.
If each string has at most one parse tree in $G$, then $G$ is unambiguous.
Ambiguity causes problems in program compiling.
Example 5.25 --- Given the expression grammar of Example 5.3 (Fig. 5.2) as follows,
$E \to I \mid E + E \mid E * E \mid (E)$
$I \to a \mid b \mid Ia \mid Ib \mid I0 \mid I1$
the parse trees of the following two derivations are distinct and shown below.
(a) $E \Rightarrow E + E \Rightarrow E + E * E$
(b) $E \Rightarrow E * E \Rightarrow E + E * E$

(a)      E                (b)      E
       / | \                     / | \
      E  +  E                   E  *  E
          / | \               / | \
         E  *  E             E  +  E

Different parse trees!
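One way to see the ambiguity concretely is to count the parse trees of a terminal string. The sketch below follows the productions of Example 5.3 one by one. The function names `count_I` and `count_E` are assumptions for illustration, and blanks are omitted from the strings. A count of 2 or more for some string shows that the grammar is ambiguous.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_I(s: str) -> int:
    """Number of parse trees with root I and yield s."""
    if s in ("a", "b"):
        return 1                               # I -> a | b
    if len(s) > 1 and s[-1] in "ab01":
        return count_I(s[:-1])                 # I -> Ia | Ib | I0 | I1
    return 0


@lru_cache(maxsize=None)
def count_E(s: str) -> int:
    """Number of parse trees with root E and yield s."""
    total = count_I(s)                         # E -> I
    for k, c in enumerate(s):
        if c in "+*":                          # E -> E + E | E * E, split at s[k]
            total += count_E(s[:k]) * count_E(s[k + 1:])
    if len(s) >= 2 and s[0] == "(" and s[-1] == ")":
        total += count_E(s[1:-1])              # E -> (E)
    return total


if __name__ == "__main__":
    for s in ["a+b", "a+b*a", "(a+b)*a"]:
        print(s, count_E(s))
```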
53
5.4 Ambiguity in Grammars & Langs.
►5.4.1 Ambiguous Grammars
Ambiguity in certain grammars may be removed by re-designing the grammar.
But some CFL's are "inherently ambiguous," i.e., every grammar for the language gives more than one parse tree to at least one string in the language.
Note: Two different derivations might have the same parse tree (see Example 5.26). So, it is not the multiplicity of derivations that causes ambiguity.
54
5.4 Ambiguity in Grammars & Langs.
►5.4.1 Ambiguous Grammars
Example 5.26:
E ⇒ E + E ⇒ I + E ⇒ a + E ⇒ a + I ⇒ a + b
E ⇒ E + E ⇒ E + I ⇒ I + I ⇒ I + b ⇒ a + b
The two derivations differ only in the order in which variables are replaced, and they have the same parse tree. So, it is not the multiplicity of derivations that causes ambiguity.
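To make the point of Example 5.26 concrete, the sketch below replays both derivations and compares the resulting parse trees. The step encoding (position in the current sentential form, head, body) and the name `tree_from_derivation` are illustrative assumptions, not notation from the slides.

```python
def tree_from_derivation(start, steps):
    """Replay a derivation and return (parse tree, yield).

    Each step is (pos, head, body): the symbol at index pos of the current
    sentential form must be `head`, and it is replaced by the symbols in `body`.
    A tree node is a pair (symbol, tuple_of_children)."""
    root = [start, []]
    frontier = [root]                      # leaves of the tree, left to right
    for pos, head, body in steps:
        node = frontier[pos]
        assert node[0] == head and not node[1], "step does not fit the sentential form"
        node[1] = [[sym, []] for sym in body]
        frontier[pos:pos + 1] = node[1]    # the new children replace the expanded leaf

    def freeze(n):
        return (n[0], tuple(freeze(c) for c in n[1]))

    return freeze(root), "".join(n[0] for n in frontier)


# E => E+E => I+E => a+E => a+I => a+b
first = [(0, "E", "E+E"), (0, "E", "I"), (0, "I", "a"), (2, "E", "I"), (2, "I", "b")]
# E => E+E => E+I => I+I => I+b => a+b
second = [(0, "E", "E+E"), (2, "E", "I"), (0, "E", "I"), (2, "I", "b"), (0, "I", "a")]

tree1, yield1 = tree_from_derivation("E", first)
tree2, yield2 = tree_from_derivation("E", second)

if __name__ == "__main__":
    print(yield1, yield2, tree1 == tree2)
```

Both step lists use the same productions at the same tree nodes, only in a different order, which is why the frozen trees compare equal.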
55
5.4 Ambiguity in Grammars & Langs.
►5.4.2 Removing Ambiguity from Grammars
There is no general algorithm that can tell whether a grammar is ambiguous or not. That is, testing of grammatical ambiguity is undecidable.
Ambiguity in inherently ambiguous languages is also irremovable: every grammar for such a language is ambiguous.
But elimination of ambiguity in some common programming language structures is possible. See the discussion next.
56
5.4 Ambiguity in Grammars & Langs.
►5.4.2 Removing Ambiguity from Grammars
An example: removing ambiguity of the expression grammar of Fig. 5.2.
►Trick: creating "terms," which cannot be broken, as the units of the generated expressions (for details, see the textbook).
►Example 5.27: an unambiguous version of the original expression grammar is as follows (T represents "term"):
E → T | E + T
T → F | T * F
F → I | (E)
I → a | b | Ia | Ib | I0 | I1
Original:
E → I | E + E | E * E | (E)
I → a | b | Ia | Ib | I0 | I1
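The layered grammar of Example 5.27 can be turned into a small parser that returns the single parse tree of an expression. This is a sketch under stated assumptions: the left-recursive productions E → E + T and T → T * F are handled with loops (a standard recursive-descent substitute, not something the slides prescribe), each identifier is kept as one leaf ("I", text) instead of being expanded, and the name `parse` is hypothetical.

```python
def parse(s: str):
    """Parse s with E -> T | E + T,  T -> F | T * F,  F -> I | (E).

    Trees are nested tuples whose first element is the variable name."""
    pos = 0

    def peek() -> str:
        return s[pos] if pos < len(s) else ""

    def expr():
        nonlocal pos
        node = ("E", term())                      # E -> T
        while peek() == "+":
            pos += 1
            node = ("E", node, "+", term())       # E -> E + T (left-associative)
        return node

    def term():
        nonlocal pos
        node = ("T", factor())                    # T -> F
        while peek() == "*":
            pos += 1
            node = ("T", node, "*", factor())     # T -> T * F
        return node

    def factor():
        nonlocal pos
        if peek() == "(":
            pos += 1
            inner = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return ("F", "(", inner, ")")         # F -> (E)
        start = pos
        if peek() not in ("a", "b"):
            raise SyntaxError("expected an identifier")
        pos += 1
        while peek() in ("a", "b", "0", "1"):
            pos += 1
        return ("F", ("I", s[start:pos]))         # F -> I, identifier kept as one leaf

    tree = expr()
    if pos != len(s):
        raise SyntaxError("unexpected input at position %d" % pos)
    return tree


if __name__ == "__main__":
    print(parse("a+b*a"))
```

For `a+b*a` the root uses E → E + T and the right-hand T uses T → T * F, so the multiplication is grouped first; there is no second way to build the tree.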
57
5.4 Ambiguity in Grammars & Langs.
►5.4.2 Removing Ambiguity from Grammars
A check of the parse tree for an expression of the form E + E * E: with the new grammar there is only the following one (its yield is E + T * F); no other alternative!
      E
    / | \
   E  +  T
       / | \
      T  *  F
YACC has its own way of removing ambiguity (declarations of operator precedence and associativity).
58
5.4 Ambiguity in Grammars & Langs.
►5.4.3 Leftmost Derivations as a Way to Express Ambiguity
In an unambiguous grammar, the leftmost derivation of each string is unique (so is the rightmost one), as shown by the following theorem.
Theorem 5.29
For each grammar G = (V, T, P, S) and string w in T*, w has two distinct parse trees if and only if w has two distinct leftmost (rightmost) derivations from S.
Proof. See the textbook.
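Theorem 5.29 suggests a direct way to observe ambiguity: count the leftmost derivations of a string. The sketch below does this for the original expression grammar by exhaustive search. It assumes, for brevity, that identifiers are the single letters a and b; the names `GRAMMAR` and `count_leftmost` are hypothetical. The search terminates because the grammar has no ε-productions, so a sentential form longer than the target can be discarded.

```python
# Original (ambiguous) expression grammar, with I restricted to a | b (assumption).
GRAMMAR = {
    "E": [("I",), ("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")")],
    "I": [("a",), ("b",)],
}


def count_leftmost(target: str, form=("E",)) -> int:
    """Number of distinct leftmost derivations of `target` from `form`."""
    if len(form) > len(target):
        return 0                      # forms never shrink, so this is a dead end
    for i, sym in enumerate(form):
        if sym in GRAMMAR:
            break                     # i is the position of the leftmost variable
        if sym != target[i]:
            return 0                  # terminal prefix already disagrees
    else:
        # No variable left: the form is a terminal string matching a prefix.
        return int(len(form) == len(target))
    return sum(
        count_leftmost(target, form[:i] + body + form[i + 1:])
        for body in GRAMMAR[sym]
    )


if __name__ == "__main__":
    for text in ["a+b", "a+b*a", "(a+b)*a"]:
        print(text, count_leftmost(text))
```

By the theorem, the number printed for each string is also its number of parse trees, so any value above 1 witnesses the ambiguity of the grammar.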
59
5.4 Ambiguity in Grammars & Langs.
►5.4.4 Inherent Ambiguity
A CFL L is said to be inherently ambiguous if all its grammars are ambiguous.
An example of inherently ambiguous languages:
$L = \{a^n b^n c^m d^m \mid n \ge 1, m \ge 1\} \cup \{a^n b^m c^m d^n \mid n \ge 1, m \ge 1\}$
Proof. See the textbook.
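Some intuition for this example, without replacing the proof in the textbook: strings of the form a^n b^n c^n d^n belong to both parts of the union, and these are the strings that end up with two parse trees. The sketch below only tests membership in each part; the function name `parts` is hypothetical.

```python
import re


def parts(s: str):
    """Return (in_first_part, in_second_part) for the language
    {a^n b^n c^m d^m | n, m >= 1}  union  {a^n b^m c^m d^n | n, m >= 1}."""
    m = re.fullmatch(r"(a+)(b+)(c+)(d+)", s)
    if m is None:
        return (False, False)
    a, b, c, d = (len(g) for g in m.groups())
    return (a == b and c == d, a == d and b == c)


if __name__ == "__main__":
    for text in ["abccdd", "abbccd", "aabbccdd", "abcdd"]:
        print(text, parts(text))
```

A string for which both flags are true, such as `aabbccdd`, can be generated through either part of the union, which is where the two distinct parse trees come from.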