grammars a grammar is a 4-tuple g = (v, t, p, s) where 1)v is a set of nonterminal symbols (also...

GrammarsA grammar is a 4-tuple G = (V, T, P, S) where 1) V is a set of nonterminal symbols (also

called variables or syntactic categories)2) T is a finite set of terminal symbols,

disjoint from V 3) P is a finite subset of

(VT)*V(VT)* (VT)*

4) S is a distinguished symbol in V called the start symbol (or sentence)

An element (, ) in P is called a production and is written as .

is referred to as the left hand side of the production and is referred to as the right hand side of the production.

Notice that the left side must have at least one nonterminal

Example: G1 = (V, T, P, S) where V = {A, S} T = {0, 1} P = {S 0A10A 00A1A }

Note: Grammars are generators, while automata are recognizers. A grammar defines a language in a recursive manner.

Definition: A sentential form of a grammar G = (V, T, P, S) is defined recursively as follows:1. S is a sentential form2. If is a sentential form and

is a production in P, then is also a sentential form

A sentential form of G containing no nonterminal symbol is called a sentence generated by G.

The language generated by a grammar G, denoted by L(G), is the set of all sentences generated by G.

Terminology:G (directly derives) is a relation on

(V T)*. If is a production in P, then G

k k-fold product of + transitive closure of * reflexive transitive closure of

If k then there is a sequence of strings 0, i, i+1, k such that

0 = , i-1 i (1 i k), k =

This sequence of strings is called a derivation sequence of length k.

L(G) = { w | w in T* and S * w}

Example: L(G1) = {0n1n | n > 0}

Design of grammars

L = {w | w {0,1}* and w starts and ends with the same symbol}

S → 0A | 1B | A → 0A | 1A | 0B → 0B | 1B | 1

Classification of grammarsDefinition: A grammar G is said to be 1) Right-linear if each production in P is

of the form A xB or A x where A and B are in V and x in T*.

2) Context-free if each production is of the form A , where A is in V and is in (VT)*.

3) Context-sensitive if each production is of the form , where || ||.

4) A grammar with no restrictions as above is called unrestricted.

If a language L is generated by a ‘type x’ grammar, then the language L is said to be ‘type x’ language.

Right-linear grammars generate right-linear languages, context-free grammars (CFGs) generate context-free languages (CFLs), context-sensitive grammars (CSGs) generate context-sensitive languages (CSLs).

The four types are referred to as Chomsky hierarchy.

Equivalence of regular languages and right-linear languages

Lemma 1(1) , (2) { }, (3) {a} for all a in are right-linear languagesProof:In each case we construct a right-linear grammar

generating the language. (1) G = ({S}, , , S} is a right linear

grammar such that L(G) = . (2) G = ({S}, , { S }, S) is a right-linear

grammar such that L(G) = { }. (3) Ga = ({S}, , { S a}, S) is a right-linear

grammar such that L(Ga) = { a }.

Lemma2 If L1 and L2 are right-linear languages, then (1) L1 L2, (2) L1L2, (3) L1

* are right-linear languages.

Proof: Since L1 and L2 are right-linear languages, we can assume that there are right-linear grammars G1 = (N1, , P1, S1) and G2 = (N2, , P2, S2) such that L(G1) = L1 and L(G2) = L2.

We can assume that N1 and N2 are disjoint (otherwise rename the symbols).

Define the following grammars: (1) G3 = (N1 N2 {S3}, , P1 P2 {S3 S1 | S2}, S3)

(2) G4 = (N1 N2, , P4, S1) where P4 is defined as follows:

(i) if A xB is in P1, then A xB is in P4

(ii) if A x is in P1, then A xS2 is in P4

(iii) All productions in P2 are in P4

(3) G5 = (N1 {S5}, , P5, S5) where P5 is defined as follows:

(i) if A xB is in P1, then A xB is in P5

(ii) if A x is in P1, then A xS5 and Ax are in P5

(iii) S5 S1 | are in P5

Claim: (1) L(G3) = L(G1) L(G2), (2) L(G4) = L(G1)L(G2), and

(3) L(G5) = (L(G1))*.

Proof of claim (2): If S1 * w1 in G1 then S1 * w1S2 in G4.

If S2 * x in G2 then S2 x* in G4.

So, L(G1)L(G2) L(G4).

Now suppose S1 * w in G4. Since there are no productions of the form A x in G4 that came out of G1, we can write the derivation in the form

S1 * xS2 * xy where w = xy. By construction, there must be derivations of the form S1 * x and

S2 * y. Hence, L(G4) L(G1)L(G2).

Proofs of claims 1 and 3 are left as exercise.

Theorem : A regular language is a right-linear language

Proof: Follows from Lemma 1 and inductive applications of Lemma2 on the construction of the regular language.

Regular Expression EquationsX = aX + b where a and b are regular expressions

and X is an unknown is a regular expression equation.

It can be verified that a*b is a solution for the equation.

A set of regular expression equations is said to be in standard form over a set of indeterminates = {X1, …, Xn}if for each Xi there is an equation of the form

Xi = ai0 + ai1X1 + … + ainXn where aij are regular expressions over some alphabet disjoint form .

Solving regular expression equations

Example: X1 = 0X2 + 1X1 +

X2 = 0X3 + 1X2

X3 = 0X1 + 1X3

Theorem : The language defined by a right-linear grammar is a regular set

Proof: Construct a set of regular expression equations from the productions. The solution of the start symbol is a regular expression denoting the language defined by the right-linear grammar.

ExampleConstruct a regular expression equivalent to

S 0A | 1S | A 0B | 1A

B 0S | 1B

Definitions:1) A grammar G = (V, T, P, S) is said to be

left-linear if each production is of the form A Bx or A x, where A and B are in V and x is in T*.

2) A grammar G = (V, T, P, S) is said to be regular if every production takes one of the forms A aB or A a or S . If S is a production, then S does not appear in the RHS of any production.

Example:A B | CB 0B | 1B | 011

C 0D | 1C | D 0C | 1D

Let M = (Q, , , q0, F) be a DFA.

Define a grammar G = (Q, , P, q0) where P is defined as follows:P = {B aC | (B,a) = C} {B a | (B,a) is in F}

If q0 is in F add q0 to P.

Then L(M) = L(G)

a, b a

b

ba

A B

C

Let G = (V, T, P, S) be a regular grammarDefine as finite automaton M = (V {f}, T,

, S, {f}), where(q,a) = p if q ap is in P(q,a) = f if q a is in Pif S is a production, (S,) = f Then, L(G) = L(M)

grammars a grammar is a 4-tuple g = (v, t, p, s) where 1)v is a set of nonterminal symbols (also...

Documents