Chomsky and Greibach Normal Forms

Teodor Rus
The University of Iowa, Department of Computer Science
Computation Theory
Source: homepage.cs.uiowa.edu/~rus/cfg3.pdf


Simplifying a CFG

• It is often convenient to simplify a CFG;
• One of the simplest and most useful simplified forms of a CFG is called the Chomsky normal form;
• Another normal form, usually used in algebraic specifications, is the Greibach normal form.

Note the difference between grammar cleaning and grammar simplification!

Fact

Normal forms are useful when more advanced topics in computation theory are approached, as we shall see later.

Definition

A context-free grammar G is in Chomsky normal form if every rule is of the form

    A → BC
    A → a

where a is a terminal, A, B, C are nonterminals, and B, C may not be the start variable (the axiom).

Observation

The rule S → ε, where S is the start variable, is not excluded from a CFG in Chomsky normal form.
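
The definition and the observation can be turned into a small checker. This is only a sketch: the grammar representation (a dict from each variable to a set of right-hand sides, each a tuple of symbols, with the empty tuple standing for ε) and the name is_cnf are assumptions made for illustration, not part of the slides. Any symbol that is not a key of the dict is treated as a terminal.

```python
def is_cnf(rules, start):
    """Check the Chomsky normal form conditions for every rule.

    rules: dict mapping each variable to a set of right-hand sides (tuples).
    Assumption: symbols that are not keys of `rules` are terminals.
    """
    for lhs, bodies in rules.items():
        for rhs in bodies:
            if len(rhs) == 0:
                ok = lhs == start                     # only S -> epsilon is allowed
            elif len(rhs) == 1:
                ok = rhs[0] not in rules              # A -> a, a must be a terminal
            elif len(rhs) == 2:
                # A -> BC, both variables, neither one the start variable
                ok = all(s in rules and s != start for s in rhs)
            else:
                ok = False                            # right-hand side too long
            if not ok:
                return False
    return True


good = {"S": {("A", "B"), ()}, "A": {("a",)}, "B": {("b",)}}
bad = {"S": {("A", "S", "A"), ("a", "B")}, "A": {("B",), ("S",)}, "B": {("b",), ()}}
print(is_cnf(good, "S"), is_cnf(bad, "S"))
```

The second grammar fails for several reasons at once: a rule of length 3, a terminal inside a rule of length 2, unit rules, and an ε-rule for a non-start variable.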

Theorem 2.9

Any context-free language is generated by a context-free grammar in Chomsky normal form.

Proof idea:

• Show that any CFG G can be converted into a CFG G′ in Chomsky normal form;
• The conversion procedure has several stages in which the rules that violate the Chomsky normal form conditions are replaced with equivalent rules that satisfy these conditions;
• Order of transformations: (1) add a new start variable, (2) eliminate all ε-rules, (3) eliminate unit rules, (4) convert the other rules;
• Check that the obtained CFG G′ defines the same language as the initial CFG G.

Proof

Let G = (N, Σ, R, S) be the original CFG.

Step 1: add a new start symbol S0 to N, and the rule S0 → S to R.

Note: this change guarantees that the start symbol of G′ does not occur on the rhs of any rule.
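
Step 1 is small enough to sketch directly. The representation (a dict from variables to sets of tuples, the empty tuple standing for ε) and the function name add_start_symbol are illustrative assumptions; the grammar used is G6 from the example in these slides.

```python
def add_start_symbol(rules, start, new_start="S0"):
    """Return (new_rules, new_start) with the extra rule new_start -> start."""
    if new_start in rules:
        raise ValueError(f"{new_start!r} is already a variable")
    new_rules = {lhs: set(bodies) for lhs, bodies in rules.items()}  # copy
    new_rules[new_start] = {(start,)}
    return new_rules, new_start


g6 = {
    "S": {("A", "S", "A"), ("a", "B")},
    "A": {("B",), ("S",)},
    "B": {("b",), ()},
}
g, s0 = add_start_symbol(g6, "S")
# The new start symbol never appears on a right-hand side.
on_rhs = any(s0 in rhs for bodies in g.values() for rhs in bodies)
print(s0, g[s0], on_rhs)
```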

Step 2: eliminate ε-rules

Repeat

1. Eliminate an ε-rule A → ε from R, where A is not the start symbol;
2. For each occurrence of A on the rhs of a rule, add a new rule to R with that occurrence of A deleted;
   Examples: (1) replace B → uAv by B → uAv | uv;
   (2) replace B → uAvAw by B → uAvAw | uvAw | uAvw | uvw.
3. Replace the rule B → A (if it is present) by B → A | ε, unless the rule B → ε has been previously eliminated;

until all ε-rules not involving the start symbol are eliminated.
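
The loop of Step 2 can be sketched as follows. The representation (dict from variables to sets of tuples, the empty tuple standing for ε), the function name eliminate_epsilon and the choice to process variables in alphabetical order are assumptions for illustration. The input is G6 after Step 1, as in the example in these slides.

```python
from itertools import combinations


def eliminate_epsilon(rules, start):
    """Remove every rule A -> epsilon with A != start, following Step 2."""
    rules = {lhs: set(bodies) for lhs, bodies in rules.items()}
    removed = set()  # variables whose epsilon-rule was already eliminated
    while True:
        target = next((v for v in sorted(rules)
                       if v != start and () in rules[v]), None)
        if target is None:
            return rules
        rules[target].discard(())
        removed.add(target)
        for lhs in rules:
            for rhs in list(rules[lhs]):
                spots = [i for i, s in enumerate(rhs) if s == target]
                # Delete every non-empty subset of the occurrences of target.
                for r in range(1, len(spots) + 1):
                    for drop in combinations(spots, r):
                        new = tuple(s for i, s in enumerate(rhs) if i not in drop)
                        if new == () and lhs in removed:
                            continue  # do not re-add an eliminated epsilon-rule
                        rules[lhs].add(new)


g = {
    "S0": {("S",)},
    "S": {("A", "S", "A"), ("a", "B")},
    "A": {("B",), ("S",)},
    "B": {("b",), ()},
}
result = eliminate_epsilon(g, "S0")
for lhs in ["S0", "S", "A", "B"]:
    print(lhs, "->", sorted(result[lhs]))
```

On this input B → ε is removed first (which adds S → a and A → ε), then A → ε, giving the same rule set as the "Removing ε-rules" slide.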

Step 3: remove unit rules

Repeat

1. Remove a unit rule A → B ∈ R;
2. For each rule B → u ∈ R, add the rule A → u to R, unless A → u is a unit rule that was previously removed;

until all unit rules are eliminated.

Note: u is a string of variables and terminals.
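
A sketch of Step 3 under the same assumed representation (dict from variables to sets of tuples; symbols that are not keys are terminals). The name eliminate_units and the order in which unit rules are picked are illustrative choices; the input is the example grammar as it stands after the ε-rules have been removed.

```python
def eliminate_units(rules):
    """Remove unit rules A -> B one at a time, following Step 3."""
    rules = {lhs: set(bodies) for lhs, bodies in rules.items()}
    removed = set()  # unit rules (A, B) that were already removed
    while True:
        unit = next(((a, rhs[0]) for a in sorted(rules)
                     for rhs in sorted(rules[a])
                     if len(rhs) == 1 and rhs[0] in rules), None)
        if unit is None:
            return rules
        a, b = unit
        rules[a].discard((b,))
        removed.add((a, b))
        for u in list(rules[b]):
            is_unit = len(u) == 1 and u[0] in rules
            if is_unit and (a, u[0]) in removed:
                continue  # never re-add a unit rule that was removed before
            rules[a].add(u)


g = {
    "S0": {("S",)},
    "S": {("A", "S", "A"), ("a", "B"), ("a",), ("S", "A"), ("A", "S"), ("S",)},
    "A": {("B",), ("S",)},
    "B": {("b",)},
}
result = eliminate_units(g)
for lhs in ["S0", "S", "A", "B"]:
    print(lhs, "->", sorted(result[lhs]))
```

The unit rules are removed here in a different order than on the slides, but the resulting rule set is the same.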

Step 4: convert all remaining rules

Repeat

1. Replace a rule A → u_1 u_2 ... u_k, k ≥ 3, where each u_i, 1 ≤ i ≤ k, is a variable or a terminal, by

       A → u_1 A_1, A_1 → u_2 A_2, ..., A_{k−2} → u_{k−1} u_k

   where A_1, A_2, ..., A_{k−2} are new variables;
2. If k ≥ 2, replace any terminal u_i with a new variable U_i and add the rule U_i → u_i;

until no rules of the form A → u_1 u_2 ... u_k with k ≥ 3 remain and no rhs of length 2 contains a terminal.
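
Step 4 might be sketched as below. The representation is the same assumed one (dict from variables to sets of tuples). The generated names U_a and X1, X2, ... are hypothetical and assumed not to clash with existing variables. As a small deviation from the step as stated, one new variable is shared by all rules with the same tail, which is what the worked example does with its single variable A1.

```python
def convert_long_rules(rules):
    """Replace terminals in long rules and split rules of length >= 3."""
    variables = set(rules)
    out = {v: set() for v in rules}
    proxy = {}     # terminal -> new variable U_t with the rule U_t -> t
    tail_var = {}  # tuple of variables -> new variable deriving exactly it

    def var_for_terminal(t):
        if t not in proxy:
            proxy[t] = "U_" + t          # hypothetical name, assumed unused
            out[proxy[t]] = {(t,)}
        return proxy[t]

    def var_for_tail(tail):
        if tail not in tail_var:
            name = "X" + str(len(tail_var) + 1)  # hypothetical name
            tail_var[tail] = name
            if len(tail) == 2:
                out[name] = {tail}
            else:
                out[name] = {(tail[0], var_for_tail(tail[1:]))}
        return tail_var[tail]

    for lhs in sorted(rules):
        for rhs in sorted(rules[lhs]):
            if len(rhs) >= 2:   # terminals are replaced only in long rules
                rhs = tuple(s if s in variables else var_for_terminal(s)
                            for s in rhs)
            if len(rhs) > 2:
                rhs = (rhs[0], var_for_tail(rhs[1:]))
            out[lhs].add(rhs)
    return out


body = {("A", "S", "A"), ("a", "B"), ("a",), ("S", "A"), ("A", "S")}
g = {"S0": set(body), "S": set(body), "A": body | {("b",)}, "B": {("b",)}}
cnf = convert_long_rules(g)
for lhs in sorted(cnf):
    print(lhs, "->", sorted(cnf[lhs]))
```

Here X1 plays the role of A1 and U_a the role of U in the final grammar of the example.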

Example CFG conversion

Consider the grammar G6 whose rules are:

    S → ASA | aB
    A → B | S
    B → b | ε

Notation: in the original slides, removed symbols are shown in green and added symbols in red.

After the first step of the transformation we get:

    S0 → S
    S → ASA | aB
    A → B | S
    B → b | ε

Removing ε-rules

Removing B → ε:

    S0 → S
    S → ASA | aB | a
    A → B | S | ε
    B → b

(added: S → a and A → ε; removed: B → ε)

Removing A → ε:

    S0 → S
    S → ASA | aB | a | SA | AS | S
    A → B | S
    B → b

(added: S → SA, S → AS and S → S; removed: A → ε)

Removing unit rules

Removing S → S:

    S0 → S
    S → ASA | aB | a | SA | AS
    A → B | S
    B → b

Removing S0 → S:

    S0 → ASA | aB | a | SA | AS
    S → ASA | aB | a | SA | AS
    A → B | S
    B → b

(added: the S0 rules copied from S; removed: S0 → S)

More unit rules

Removing A → B:

    S0 → ASA | aB | a | SA | AS
    S → ASA | aB | a | SA | AS
    A → S | b
    B → b

(added: A → b; removed: A → B)

Removing A → S:

    S0 → ASA | aB | a | SA | AS
    S → ASA | aB | a | SA | AS
    A → b | ASA | aB | a | SA | AS
    B → b

(added: the A rules copied from S; removed: A → S)

Converting remaining rules

    S0 → A A1 | U B | a | S A | A S
    S → A A1 | U B | a | S A | A S
    A → b | A A1 | U B | a | S A | A S
    A1 → S A
    U → a
    B → b

(A1 and U are new variables.)
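
The last item of the proof idea asks to check that G′ defines the same language as G. A proof is needed for that, but a bounded sanity check is easy to sketch: compute, for both grammars, all strings of length at most n and compare. The function name language_upto, the dict-of-tuples representation (empty tuple for ε) and the bound n = 6 are assumptions for illustration; terminals are assumed to be single characters.

```python
def language_upto(rules, start, n):
    """All strings of length <= n derivable from start (least fixed point)."""
    lang = {v: set() for v in rules}
    changed = True
    while changed:
        changed = False
        for lhs, bodies in rules.items():
            for rhs in bodies:
                words = {""}
                for s in rhs:
                    options = lang[s] if s in rules else {s}
                    words = {w + o for w in words for o in options
                             if len(w) + len(o) <= n}
                if not words <= lang[lhs]:
                    lang[lhs] |= words
                    changed = True
    return lang[start]


g6 = {
    "S": {("A", "S", "A"), ("a", "B")},
    "A": {("B",), ("S",)},
    "B": {("b",), ()},
}
body = {("A", "A1"), ("U", "B"), ("a",), ("S", "A"), ("A", "S")}
cnf = {
    "S0": set(body),
    "S": set(body),
    "A": body | {("b",)},
    "A1": {("S", "A")},
    "U": {("a",)},
    "B": {("b",)},
}
same = language_upto(g6, "S", 6) == language_upto(cnf, "S0", 6)
print(same)
```

Agreement up to a length bound does not prove equivalence; it only catches transcription mistakes in examples like this one.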

Observation

• The conversion procedure produces several variables U_i along with several rules U_i → a;
• Since all these represent the same rule, we may simplify the result by using a single variable U and a single rule U → a.

Greibach Normal Form

A context-free grammar G = (V, Σ, R, S) is in Greibach normal form if each rule r ∈ R has the property: lhs(r) ∈ V, rhs(r) = aα, a ∈ Σ and α ∈ V*.

Note: Greibach normal form provides a justification of the operator prefix notation usually employed in algebra.
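
The definition translates into a short checker. As before, the representation (dict from variables to sets of tuples; symbols that are not keys are terminals) and the name is_gnf are assumptions for illustration.

```python
def is_gnf(rules):
    """Every rhs must be one terminal followed by zero or more variables."""
    for bodies in rules.values():
        for rhs in bodies:
            if not rhs or rhs[0] in rules:
                return False          # empty rhs, or rhs starts with a variable
            if any(s not in rules for s in rhs[1:]):
                return False          # a terminal after the first position
    return True


in_gnf = {"S": {("a", "S", "B"), ("a",)}, "B": {("b",)}}
in_cnf = {"S": {("A", "B")}, "A": {("a",)}, "B": {("b",)}}
print(is_gnf(in_gnf), is_gnf(in_cnf))
```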

Greibach Theorem

Every CFL L where ε ∉ L can be generated by a CFG in Greibach normal form.

Proof idea: Let G = (V, Σ, R, S) be a CFG generating L. Assume that G is in Chomsky normal form.

• Let V = {A_1, A_2, ..., A_m} be an ordering of the nonterminals.
• Construct the Greibach normal form from the Chomsky normal form.

Construction

1. Modify the rules in R so that if A_i → A_j γ ∈ R then j > i.
2. Starting with A_1 and proceeding to A_m, this is done as follows:
   (a) Assume that the productions have been modified so that, for 1 ≤ i < k, A_i → A_j γ ∈ R only if j > i;
   (b) If A_k → A_j γ is a production with j < k, generate a new set of productions by substituting for A_j the rhs of each A_j production;
   (c) Repeating (b) at most k − 1 times, we obtain rules of the form A_k → A_p γ, p ≥ k;
   (d) Eliminate the left-recursive rules A_k → A_k γ by introducing a new variable B_k.

Facts

Let A_m be the highest-order nonterminal generated during the Greibach normal form construction algorithm.

Note: if no left-recursive rules (A_i → A_i α, for some 1 ≤ i ≤ m) are in the grammar, then we have:

1. The leftmost symbol on the rhs of each rule for A_m must be a terminal, since A_m is the highest-order variable.
2. The leftmost symbol on the rhs of any rule for A_{m−1} must be either A_m or a terminal symbol. If it is A_m, we can generate new rules by replacing A_m with the rhs of the A_m rules.
3. Repeating (2) for A_{m−2}, ..., A_2, A_1, we obtain only rules whose rhs start with terminal symbols.
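
The back-substitution described in these facts can be sketched as follows. The input is the intermediate grammar of the example worked later in these slides (ordering done, left recursion on A3 already removed with the new variable B3). The helper names alts, expand_leading and back_substitute are hypothetical, and the sketch assumes there are no ε-rules.

```python
def alts(*alternatives):
    """Helper: alts("A2 A3", "b") == {("A2", "A3"), ("b",)}."""
    return {tuple(a.split()) for a in alternatives}


def expand_leading(rules, var):
    """Replace a leading variable in each rule of var by that variable's rules."""
    for rhs in [r for r in rules[var] if r[0] in rules]:
        rules[var].remove(rhs)
        rules[var] |= {beta + rhs[1:] for beta in rules[rhs[0]]}


def back_substitute(rules, order, extra):
    """Process A_m, ..., A_1, then the variables added for left recursion."""
    rules = {v: set(b) for v, b in rules.items()}
    for var in reversed(order):
        expand_leading(rules, var)
    for var in extra:
        expand_leading(rules, var)
    return rules


ordered = {
    "A1": alts("A2 A3"),
    "A2": alts("A3 A1", "b"),
    "A3": alts("b A3 A2 B3", "a B3", "b A3 A2", "a"),
    "B3": alts("A1 A3 A2", "A1 A3 A2 B3"),
}
gnf = back_substitute(ordered, ["A1", "A2", "A3"], ["B3"])
for var in ["A3", "A2", "A1", "B3"]:
    print(var, len(gnf[var]), "rules")
```

One pass per variable suffices here because, when A_k is processed, every variable that can lead one of its rules has a higher index and has already been rewritten to start with a terminal.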

Left recursion

A CFG containing rules of the form

    A → Aα | β

is called left-recursive in A.

The strings generated by such rules are of the form A ⇒* βα^n, n ≥ 0. If we replace the rules A → Aα | β by

    A → βB | β, B → αB | α

where B is a new variable, then the language generated by A is the same, while no left-recursive A-rules are used in the derivation.

Removing left recursion

Left recursion can be eliminated by the following scheme:

• If A → Aα_1 | Aα_2 | ... | Aα_r are all the left-recursive A-rules, and A → β_1 | β_2 | ... | β_s are all the remaining A-rules, then choose a new nonterminal, say B;
• Add the new B-rules B → α_i | α_i B, 1 ≤ i ≤ r;
• Replace the A-rules by A → β_i | β_i B, 1 ≤ i ≤ s.

This construction preserves the language L.
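
The scheme above can be sketched in a few lines. The function name remove_left_recursion and the dict-of-tuples representation are assumptions for illustration; the sketch also assumes there is no rule A → A (an empty α). The input is the set of A3 rules from the worked example in these slides.

```python
def remove_left_recursion(rules, a, b):
    """Remove immediate left recursion on variable a using new variable b."""
    if b in rules:
        raise ValueError(f"{b!r} is already a variable")
    alphas = {rhs[1:] for rhs in rules[a] if rhs and rhs[0] == a}
    betas = {rhs for rhs in rules[a] if not rhs or rhs[0] != a}
    new = {v: set(bodies) for v, bodies in rules.items()}
    if not alphas:
        return new  # nothing to do: a has no left-recursive rule
    new[a] = betas | {beta + (b,) for beta in betas}        # A -> beta | beta B
    new[b] = alphas | {alpha + (b,) for alpha in alphas}    # B -> alpha | alpha B
    return new


g = {"A3": {("A3", "A1", "A3", "A2"), ("b", "A3", "A2"), ("a",)}}
result = remove_left_recursion(g, "A3", "B3")
for var in sorted(result):
    print(var, "->", sorted(result[var]))
```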

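Conversion algorithm

    for (k = 1; k <= m; k++)
    {
        for (j = 1; j <= k-1; j++)
        {
            for (each rule A_k ---> A_j alpha)
            {
                for all rules A_j ---> beta
                    add A_k ---> beta alpha
                remove A_k ---> A_j alpha
            }
        }
        for (each rule A_k ---> A_k alpha)
        {
            add rules B_k ---> alpha | alpha B_k
            remove A_k ---> A_k alpha
        }
        for (each rule A_k ---> beta, beta does not begin with A_k)
            add rule A_k ---> beta B_k
    }

Note: the last two loops belong to the k loop, not to the j loop, so that left recursion is also removed for A_1. The last loop is needed only when A_k had left-recursive rules; otherwise B_k has no rules and the added rules are useless.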
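
A possible Python rendering of this algorithm is sketched below. The names order_rules and the naming of the new variable (B3 for A3) are assumptions for illustration, as is the dict-of-tuples representation. The sketch assumes a grammar without ε-rules and adds B_k only when A_k actually has left-recursive rules. The input is the grammar of the example in these slides.

```python
def order_rules(rules, order):
    """Make every A_k rule start with a terminal or a higher-order variable."""
    rules = {v: set(b) for v, b in rules.items()}
    for k, ak in enumerate(order):
        for aj in order[:k]:
            # Substitute the rules of A_j into rules A_k -> A_j alpha.
            for rhs in [r for r in rules[ak] if r[0] == aj]:
                rules[ak].remove(rhs)
                rules[ak] |= {beta + rhs[1:] for beta in rules[aj]}
        recursive = {r for r in rules[ak] if r[0] == ak}
        if recursive:
            bk = "B" + ak[1:]   # hypothetical naming: A3 -> B3, assumed unused
            rest = rules[ak] - recursive
            rules[ak] = rest | {beta + (bk,) for beta in rest}
            rules[bk] = ({r[1:] for r in recursive}
                         | {r[1:] + (bk,) for r in recursive})
    return rules


g = {
    "A1": {("A2", "A3")},
    "A2": {("A3", "A1"), ("b",)},
    "A3": {("A1", "A2"), ("a",)},
}
ordered = order_rules(g, ["A1", "A2", "A3"])
for var in sorted(ordered):
    print(var, "->", sorted(ordered[var]))
```

On this input only the A3 rules change, and a single new variable B3 is introduced, in agreement with steps 1 and 2 of the solution.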

More on Greibach NF

See Introduction to Automata Theory, Languages, and Computation, J.E. Hopcroft and J.D. Ullman, Addison-Wesley, 1979, pp. 94–96.

Example

Convert the CFG G = ({A1, A2, A3}, {a, b}, R, A1), where

    R = {A1 → A2 A3, A2 → A3 A1 | b, A3 → A1 A2 | a}

into Greibach normal form.

Solution

1. Ordering the rules: only the A3 rules violate the ordering condition, hence only the A3 rules need to be changed. Following the procedure, we replace the A3 rules by:

       A3 → A3 A1 A3 A2 | b A3 A2 | a

2. Eliminating left recursion, we get:

       A3 → b A3 A2 B3 | a B3 | b A3 A2 | a
       B3 → A1 A3 A2 | A1 A3 A2 B3

3. All A3 rules now start with a terminal. We use them to replace A3 in A2 → A3 A1, and then use the resulting A2 rules to replace A2 in A1 → A2 A3. The only rules that still start with a variable are B3 → A1 A3 A2 | A1 A3 A2 B3.

4. Use the A1 productions to make the B3 rules start with a terminal.

The result

    A3 → b A3 A2 B3 | b A3 A2 | a B3 | a
    A2 → b A3 A2 B3 A1 | b A3 A2 A1 | a B3 A1 | a A1 | b
    A1 → b A3 A2 B3 A1 A3 | b A3 A2 A1 A3 | a B3 A1 A3 | a A1 A3 | b A3
    B3 → b A3 A2 B3 A1 A3 A3 A2 B3 | b A3 A2 B3 A1 A3 A3 A2
    B3 → b A3 A3 A2 B3 | b A3 A3 A2 | b A3 A2 A1 A3 A3 A2 B3 | b A3 A2 A1 A3 A3 A2
    B3 → a B3 A1 A3 A3 A2 B3 | a B3 A1 A3 A3 A2
    B3 → a A1 A3 A3 A2 B3 | a A1 A3 A3 A2

Disclaimer: I copied this result from Hopcroft and Ullman, pages 98–99! Hence, I cannot guarantee its correctness.
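
Given the disclaimer, a bounded sanity check of the result may be useful. The sketch below compares the strings of length at most 8 generated from A1 by the original grammar and by the Greibach normal form grammar. The helper names parse and language_upto and the bound 8 are assumptions for illustration. The second B3 alternative is taken to be b A3 A2 B3 A1 A3 A3 A2, which is what substituting A1 → b A3 A2 B3 A1 A3 into B3 → A1 A3 A2 yields.

```python
def parse(text):
    """Parse lines like 'A -> x y | z' into {variable: set of tuples}."""
    rules = {}
    for line in text.strip().splitlines():
        lhs, rhs = line.split("->")
        rules.setdefault(lhs.strip(), set()).update(
            tuple(alt.split()) for alt in rhs.split("|"))
    return rules


def language_upto(rules, start, n):
    """All strings of length <= n derivable from start (least fixed point)."""
    lang = {v: set() for v in rules}
    changed = True
    while changed:
        changed = False
        for lhs, bodies in rules.items():
            for rhs in bodies:
                words = {""}
                for s in rhs:
                    options = lang[s] if s in rules else {s}
                    words = {w + o for w in words for o in options
                             if len(w) + len(o) <= n}
                if not words <= lang[lhs]:
                    lang[lhs] |= words
                    changed = True
    return lang[start]


original = parse("""
A1 -> A2 A3
A2 -> A3 A1 | b
A3 -> A1 A2 | a
""")
result = parse("""
A3 -> b A3 A2 B3 | b A3 A2 | a B3 | a
A2 -> b A3 A2 B3 A1 | b A3 A2 A1 | a B3 A1 | a A1 | b
A1 -> b A3 A2 B3 A1 A3 | b A3 A2 A1 A3 | a B3 A1 A3 | a A1 A3 | b A3
B3 -> b A3 A2 B3 A1 A3 A3 A2 B3 | b A3 A2 B3 A1 A3 A3 A2
B3 -> b A3 A3 A2 B3 | b A3 A3 A2 | b A3 A2 A1 A3 A3 A2 B3 | b A3 A2 A1 A3 A3 A2
B3 -> a B3 A1 A3 A3 A2 B3 | a B3 A1 A3 A3 A2
B3 -> a A1 A3 A3 A2 B3 | a A1 A3 A3 A2
""")
agree = language_upto(original, "A1", 8) == language_upto(result, "A1", 8)
print(agree)
```

Agreement on short strings is evidence, not a proof, that the two grammars generate the same language.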