lec05

DFA to Regular Expressions

Proposition: If L is regular then there is a regular expression r such that

L = L(r).

Proof Idea: Let M = (Q,Σ, δ, q1, F ) be a DFA recognizing L, with

Q = {q1, q2, . . . qn}, and F = {qf1, . . . qfk

}

• Construct regular expression r1fisuch that

L(r1fi) = {w : | δ(q1, w) = qfi

}, i.e., r1fidescribes the set of strings

on which M ends up in state qfiwhen started in the initial state q1.

• Then

L = L(M) = L(r1f1) ∪ L(r1f2

) ∪ · · · ∪ L(r1fk)

= L(r1f1+ r1f2

+ · · · + r1fk)

Thus, the desired regular expression is r1f1+ r1f2

+ · · · + r1fk

89

Constructing r1fi

Idea 1 For every i, j, build regular expressions rij describing strings

taking M from state qi to state qj , where qi need not be the initial

state and qj need not be a final state.

Idea 2 Build the expression rij inductively.

• Start with expressions that describe paths from qi to qj that do

not pass through any intermediate states; i.e., these are single

nodes or single edges.

• Inductively build expressions that describe paths that pass

through progressively a larger set of states.

90

Definitions and Notation

Define Rkij to be the set (not regular expression) of strings leading from

qi to qj such that any intermediate state is ≤ k

• Note, the superscript k refers only to the intermediate states; so i

and j could be greater than k.

• R0ij set of strings that go from qi to qj without passing through any

intermediate states; in other words they are ǫ or single edges.

• Rnij is set of all strings going from qi to qj

91

Constructing set Rkij: Base Case

R0ij =

8<: {a | δ(qi, a) = qj} if i 6= j

{a | δ(qi, a) = qj} ∪ {ǫ} if i = j

92

Constructing set Rkij: Inductive Step

k − 1

j

i

k

Assume we have Rk−1

ij

Rkij = R

k−1

ij ∪ Rk−1

ik (Rk−1

kk )∗Rk−1

kj

93

Constructing the Regular Expression

Task: Construct expression rkij such that L(rk

ij) = Rkij .

Base Case

r0ij =

8>><>>: ∅ if R0ij = ∅

a1 + a2 + · · · am if R0ij = {a1, a2, . . . am}

ǫ + a1 + · · · am if R0ij = {ǫ, a1, . . . am}

94

Constructing the Regular Expression: Inductive step

Assume inductively, rk−1

ij is the regular expression for Rk−1

ij

Rkij = R

k−1

ij ∪ Rk−1

ik (Rk−1

kk )∗Rk−1

kj

= L(rk−1

ij ) ∪ L(rk−1

ik )(L(rk−1

kk ))∗L(rk−1

kj )

= L(rk−1

ij + rk−1

ik (rk−1

kk )∗rk−1

kj )

rk−1

ij + rk−1

ik (rk−1

kk )∗rk−1

kj is the Regular Expression for Rkij .

95

Completing the Proof

Proposition: If L is regular then there is a regular expression r such that

L = L(r).

Proof: Let q1 be the initial state, and {qf1, qf2

, . . . qfk} the final states of

M (which recognizes L), then the desired regular expression is

rn1f1

+ rn1f2

+ · · · rn1fk

96

Example

1 10

0q1 q2

r011 = 1 + ǫ r

022 = 1 + ǫ

r012 = 0 r

021 = 0

r112 = r

012 + r

011(r

011)

∗

r012 r

122 = r

022 + r

021(r

011)

∗

r012

= 0 + (1 + ǫ)+0 = (1 + ǫ) + 0(1 + ǫ)∗0

r2

12 = r1

12 + r1

12(r1

22)∗

r1

22

= (0 + (1 + ǫ)+

0) + (0 + (1 + ǫ)+

0)((1 + ǫ) + 0(1 + ǫ)∗

0)+

= (0 + (1 + ǫ)+

0)(1 + ǫ + 0(1 + ǫ)∗

0)∗

= (1 + ǫ)∗0(1 + ǫ + 01∗0)∗

= 1∗

0(1 + 01∗

0)∗

L(M) = L(r2

12)

97

Analysis of the Translation

Size of the constructed regular expression

• Number of regular expressions = O(n3)

• At each step the regular expression may blowup by a factor of 4

• Each regular expression rnij can be of size O(4n)

The above method works for both NFA and DFA

For converting DFA there is slightly more efficient method (see

textbook)

98

Thus far . . .

NFA DFA

ǫ-NFA Regular Exp.

99

Regular Expression Identities

Associativity and Commutativity

L + M = M + L

(L + M) + N = L + (M + N)

(LM)N = L(MN)

Note: LM 6= ML

Distributivity

L(M + N) = LM + LN

(M + N)L = ML + NL

100

More Identities

Identities and Anhilators

∅ + L = L + ∅ = L

ǫL = Lǫ = L

∅L = L∅ = ∅

Idempotent Law

L + L = L

Closure Laws

(L∗)∗ = L∗

(∅)∗ = ǫ

ǫ∗ = ǫ

L+ = LL∗ = L∗L

L∗ = L+ + ǫ

101

Testing Regular Expression Identities

To test if an equation E = F holds [Example: (L + M)∗ = (L∗M∗)∗]

1. Convert E and F into concrete expressions C and D, by replacing

each language variable in E and F by a concrete symbol

[Replacing L by a and M by b, we get C = (a + b)∗ and D = (a∗b∗)∗]

2. L(C) = L(D) iff the equation E = F holds

[L((a + b)∗) = L((a∗b∗)∗) and so (L + M)∗ = (L∗M∗)∗ holds]

Correctness of algorithm follows from Theorems 3.13 and 3.14 in the

book

102

However, caution!!

The algorithm only applies to equations of regular expressions and not to

all equations

• Consider the equation L ∩ M ∩ N = L ∩ M , where ∩ means set

intersection

• The equation is clearly false

• Concretizing we get {a} ∩ {b} ∩ {c} and {a} ∩ {b}, which are clearly

equal!!

103

lec05

Documents