grammars a grammar is a 4-tuple g = (v, t, p, s) where 1)v is a set of nonterminal symbols (also...

24
Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1) V is a set of nonterminal symbols (also called variables or syntactic categories ) 2) T is a finite set of terminal symbols , disjoint from V 3) P is a finite subset of (VT) * V(VT) * (VT) * 4) S is a distinguished symbol in V called the start symbol (or sentence )

Upload: kelly-casey

Post on 06-Jan-2018

214 views

Category:

Documents


0 download

DESCRIPTION

Example: G 1 = (V, T, P, S) where V = {A, S} T = {0, 1} P = {S  0A1 0A  00A1 A   } Note: Grammars are generators, while automata are recognizers. A grammar defines a language in a recursive manner.

TRANSCRIPT

Page 1: Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite

GrammarsA grammar is a 4-tuple G = (V, T, P, S) where 1) V is a set of nonterminal symbols (also

called variables or syntactic categories)2) T is a finite set of terminal symbols,

disjoint from V 3) P is a finite subset of

(VT)*V(VT)* (VT)*

4) S is a distinguished symbol in V called the start symbol (or sentence)

Page 2: Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite

An element (, ) in P is called a production and is written as .

is referred to as the left hand side of the production and is referred to as the right hand side of the production.

Notice that the left side must have at least one nonterminal

Page 3: Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite

Example: G1 = (V, T, P, S) where V = {A, S} T = {0, 1} P = {S 0A10A 00A1A }

Note: Grammars are generators, while automata are recognizers. A grammar defines a language in a recursive manner.

Page 4: Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite

Definition: A sentential form of a grammar G = (V, T, P, S) is defined recursively as follows:1. S is a sentential form2. If is a sentential form and

is a production in P, then is also a sentential form

A sentential form of G containing no nonterminal symbol is called a sentence generated by G.

Page 5: Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite

The language generated by a grammar G, denoted by L(G), is the set of all sentences generated by G.

Terminology:G (directly derives) is a relation on

(V T)*. If is a production in P, then G

k k-fold product of + transitive closure of * reflexive transitive closure of

Page 6: Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite

If k then there is a sequence of strings 0, i, i+1, k such that

0 = , i-1 i (1 i k), k =

This sequence of strings is called a derivation sequence of length k.

L(G) = { w | w in T* and S * w}

Example: L(G1) = {0n1n | n > 0}

Page 7: Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite

Design of grammars

L = {w | w {0,1}* and w starts and ends with the same symbol}

S → 0A | 1B | A → 0A | 1A | 0B → 0B | 1B | 1

Page 8: Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite

Classification of grammarsDefinition: A grammar G is said to be 1) Right-linear if each production in P is

of the form A xB or A x where A and B are in V and x in T*.

2) Context-free if each production is of the form A , where A is in V and is in (VT)*.

3) Context-sensitive if each production is of the form , where || ||.

4) A grammar with no restrictions as above is called unrestricted.

Page 9: Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite

If a language L is generated by a ‘type x’ grammar, then the language L is said to be ‘type x’ language.

Right-linear grammars generate right-linear languages, context-free grammars (CFGs) generate context-free languages (CFLs), context-sensitive grammars (CSGs) generate context-sensitive languages (CSLs).

The four types are referred to as Chomsky hierarchy.

Page 10: Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite

Equivalence of regular languages and right-linear languages

 

Page 11: Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite

Lemma 1(1) , (2) { }, (3) {a} for all a in are right-linear languagesProof:In each case we construct a right-linear grammar

generating the language.  (1)     G = ({S}, , , S} is a right linear

grammar such that L(G) = . (2)     G = ({S}, , { S }, S) is a right-linear

grammar such that L(G) = { }. (3)     Ga = ({S}, , { S a}, S) is a right-linear

grammar such that L(Ga) = { a }.

Page 12: Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite

Lemma2 If L1 and L2 are right-linear languages, then (1) L1 L2, (2) L1L2, (3) L1

* are right-linear languages.

Proof: Since L1 and L2 are right-linear languages, we can assume that there are right-linear grammars G1 = (N1, , P1, S1) and G2 = (N2, , P2, S2) such that L(G1) = L1 and L(G2) = L2.

We can assume that N1 and N2 are disjoint (otherwise rename the symbols).

Define the following grammars: (1)     G3 = (N1 N2 {S3}, , P1 P2 {S3 S1 | S2}, S3)

Page 13: Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite

(2)  G4 = (N1 N2, , P4, S1) where P4 is defined as follows:

(i)  if A xB is in P1, then A xB is in P4

(ii)  if A x is in P1, then A xS2 is in P4

(iii)  All productions in P2 are in P4

(3) G5 = (N1 {S5}, , P5, S5) where P5 is defined as follows:

(i) if A xB is in P1, then A xB is in P5

(ii) if A x is in P1, then A xS5 and Ax are in P5

(iii)  S5 S1 | are in P5

Claim: (1) L(G3) = L(G1) L(G2), (2) L(G4) = L(G1)L(G2), and

(3) L(G5) = (L(G1))*.

Page 14: Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite

Proof of claim (2): If S1 * w1 in G1 then S1 * w1S2 in G4.

If S2 * x in G2 then S2 x* in G4.

So, L(G1)L(G2) L(G4).

Now suppose S1 * w in G4. Since there are no productions of the form A x in G4 that came out of G1, we can write the derivation in the form

S1 * xS2 * xy where w = xy. By construction, there must be derivations of the form S1 * x and

S2 * y. Hence, L(G4) L(G1)L(G2).

 Proofs of claims 1 and 3 are left as exercise.

Page 15: Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite

Theorem : A regular language is a right-linear language

Proof: Follows from Lemma 1 and inductive applications of Lemma2 on the construction of the regular language.

Page 16: Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite

Regular Expression EquationsX = aX + b where a and b are regular expressions

and X is an unknown is a regular expression equation.

It can be verified that a*b is a solution for the equation.

A set of regular expression equations is said to be in standard form over a set of indeterminates = {X1, …, Xn}if for each Xi there is an equation of the form

Xi = ai0 + ai1X1 + … + ainXn where aij are regular expressions over some alphabet disjoint form .

Page 17: Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite

Solving regular expression equations

Example: X1 = 0X2 + 1X1 +

X2 = 0X3 + 1X2

X3 = 0X1 + 1X3

Page 18: Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite

Theorem : The language defined by a right-linear grammar is a regular set

 Proof: Construct a set of regular expression equations from the productions. The solution of the start symbol is a regular expression denoting the language defined by the right-linear grammar.

Page 19: Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite

ExampleConstruct a regular expression equivalent to

S 0A | 1S | A 0B | 1A

B 0S | 1B

Page 20: Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite

Definitions:1) A grammar G = (V, T, P, S) is said to be

left-linear if each production is of the form A Bx or A x, where A and B are in V and x is in T*.

2) A grammar G = (V, T, P, S) is said to be regular if every production takes one of the forms A aB or A a or S . If S is a production, then S does not appear in the RHS of any production.

Page 21: Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite

Example:A B | CB 0B | 1B | 011

C 0D | 1C | D 0C | 1D

Page 22: Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite

Let M = (Q, , , q0, F) be a DFA.

Define a grammar G = (Q, , P, q0) where P is defined as follows:P = {B aC | (B,a) = C} {B a | (B,a) is in F}

If q0 is in F add q0 to P.

Then L(M) = L(G)

Page 23: Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite

a, b a

b

ba

A B

C

Page 24: Grammars A grammar is a 4-tuple G = (V, T, P, S) where 1)V is a set of nonterminal symbols (also called variables or syntactic categories) 2)T is a finite

Let G = (V, T, P, S) be a regular grammarDefine as finite automaton M = (V {f}, T,

, S, {f}), where(q,a) = p if q ap is in P(q,a) = f if q a is in Pif S is a production, (S,) = f Then, L(G) = L(M)