ps02sol

Solution Set 2, CS 475 – Fall 2006

Problem 1 Let INIT(L) = {x | xy ∈ L for some string y}. Let $r$, $s$, $r_I$, $s_I$ be the regular expressions for the languages R, S, INIT(R), and INIT(S), respectively. Using only these regular expressions and the operations +, concatenation, and ∗, give the regular expressions for the following languages:

(a) INIT(R ∪ S)

(b) INIT(RS)

(c) INIT(R∗)

Solution:

(a) Answer: $r_I + s_I$. The prefixes of strings in R ∪ S are exactly the prefixes of strings in R together with the prefixes of strings in S.

(b) Answer: $r_I + r s_I$. A string x is a prefix of a string uv ∈ RS, with u ∈ R and v ∈ S, if and only if either x is a prefix of u or x = ux′ where x′ is a prefix of v. (This assumes S ≠ ∅; if S = ∅ then RS = ∅, so INIT(RS) = ∅ as well.)

(c) Answer: $r^* r_I$. A string x is a prefix of a string in R∗ if x is a prefix of a word in R, or if x consists of a word in R followed by a prefix of a word in R, or if x consists of two words in R followed by a prefix of a word in R, and so on. (This assumes R ≠ ∅; if R = ∅ then R∗ = {ε}, so INIT(R∗) = {ε}.)
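The three answers can be sanity-checked by brute force on small finite languages. The following is a minimal sketch, not part of the original solution: the sample languages `R` and `S`, the helper names, and the length bound `N` are all assumptions made for illustration, and R∗ is truncated to words of bounded length so that the sets stay finite.

```python
def init(lang):
    """All prefixes of all words in a finite language (a set of str)."""
    return {w[:i] for w in lang for i in range(len(w) + 1)}

def concat(A, B):
    """Concatenation of two finite languages."""
    return {u + v for u in A for v in B}

def star_upto(A, max_len):
    """Words of A* of length at most max_len (a finite truncation of A*)."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = {u + v for u in frontier for v in A
                    if v and len(u + v) <= max_len} - result
        result |= frontier
    return result

# Hypothetical sample languages, chosen only for illustration.
R = {"ab", "ba", "abb"}
S = {"b", "aab"}

# (a) INIT(R ∪ S) = INIT(R) ∪ INIT(S)
check_union = init(R | S) == init(R) | init(S)

# (b) INIT(RS) = INIT(R) ∪ R·INIT(S)   (S is nonempty here)
check_concat = init(concat(R, S)) == init(R) | concat(R, init(S))

# (c) INIT(R*) = R*·INIT(R), compared on words of length <= N.
N = 6
M = max(map(len, R))  # a prefix of length <= N lies inside a word of length < N + M
lhs = {x for x in init(star_upto(R, N + M)) if len(x) <= N}
rhs = {x for x in concat(star_upto(R, N), init(R)) if len(x) <= N}
check_star = lhs == rhs
```

A passing check on one pair of languages is evidence, not a proof; the proof is the prefix argument given in each part.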

Problem 2 A regular expression is in disjunctive normal form if it is of the form $(\alpha_1 + \alpha_2 + \cdots + \alpha_n)$ for some n ≥ 1, where none of the $\alpha_i$'s contains an occurrence of +. Show that every regular language is represented by some regular expression in disjunctive normal form. Hint: Prove and use the fact that $\{a, b\}^* = \{a\}^*(\{b\}\{a\}^*)^*$.

Solution: We use structural induction (induction on the height of the expression tree). The base case, when the regular expression is a single symbol of the alphabet (or ε or ∅), is trivial. Suppose we are given a regular expression R. We assume that R is fully parenthesized, i.e., every operator together with its operands is surrounded by a pair of parentheses; e.g., instead of $a^*(ba + d)^*$ we write $((a^*)(((ba) + d)^*))$. Consider the top-level operator of R. If it is +, i.e., $R = (R_1 + R_2)$, then inductively $R_1$ and $R_2$ can be written in disjunctive normal form as $\alpha_1 + \cdots + \alpha_n$ and $\beta_1 + \cdots + \beta_m$ respectively, and therefore R can be written as $\alpha_1 + \cdots + \alpha_n + \beta_1 + \cdots + \beta_m$.

On the other hand, if $R = (R_1 R_2)$, then again we can inductively write $R_1$ and $R_2$ as above, and then $R = (\alpha_1 + \cdots + \alpha_n)(\beta_1 + \cdots + \beta_m) = \alpha_1\beta_1 + \alpha_1\beta_2 + \cdots + \alpha_n\beta_m$.

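The last case is when $R = (R_1^*)$. Inductively, $R_1 = \alpha_1 + \cdots + \alpha_n$ where no $\alpha_i$ contains +. The identity in the hint generalizes to $(\alpha + \beta)^* = \alpha^*(\beta\alpha^*)^*$ for arbitrary regular expressions α and β. Applying it repeatedly, if γ is a +-free expression for $(\alpha_1 + \cdots + \alpha_{n-1})^*$, then $\gamma(\alpha_n\gamma)^*$ is a +-free expression for $(\alpha_1 + \cdots + \alpha_n)^*$. Hence we get a disjunctive normal form (with a single term) for R in this case too.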
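The three inductive cases translate directly into a recursive procedure. The sketch below is only an illustration under assumed conventions: expressions are nested tuples with the hypothetical tags `"sym"`, `"alt"`, `"cat"`, and `"star"`, and the star case folds the terms using $(\alpha + \beta)^* = \alpha^*(\beta\alpha^*)^*$, the generalization of the hint. The helper `words` enumerates a language up to a length bound so that the result can be compared with the input.

```python
def dnf(r):
    """Return a list of +-free terms whose union is equivalent to r."""
    kind = r[0]
    if kind == "sym":
        return [r]
    if kind == "alt":                      # R1 + R2: concatenate the term lists
        return dnf(r[1]) + dnf(r[2])
    if kind == "cat":                      # R1 R2: distribute over the terms
        return [("cat", a, b) for a in dnf(r[1]) for b in dnf(r[2])]
    if kind == "star":                     # R1*: fold with (g + t)* = g*(t g*)*
        terms = dnf(r[1])
        g_star = ("star", terms[0])
        for t in terms[1:]:
            g_star = ("cat", g_star, ("star", ("cat", t, g_star)))
        return [g_star]
    raise ValueError(kind)

def has_plus(r):
    """True if the expression tree contains a union node."""
    return r[0] == "alt" or any(has_plus(c) for c in r[1:] if isinstance(c, tuple))

def words(r, n):
    """All words of length <= n in the language of r."""
    kind = r[0]
    if kind == "sym":
        return {r[1]} if n >= 1 else set()
    if kind == "alt":
        return words(r[1], n) | words(r[2], n)
    if kind == "cat":
        return {u + v for u in words(r[1], n) for v in words(r[2], n)
                if len(u + v) <= n}
    if kind == "star":
        base = words(r[1], n) - {""}
        result, frontier = {""}, {""}
        while frontier:
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= n} - result
            result |= frontier
        return result
    raise ValueError(kind)

A, B = ("sym", "a"), ("sym", "b")
# Hypothetical example: (a + bb)* (a + b)
example = ("cat", ("star", ("alt", A, ("cat", B, B))), ("alt", A, B))
terms = dnf(example)
same_language = set().union(*(words(t, 6) for t in terms)) == words(example, 6)
```

Note that the fold duplicates `g_star` at every step, so the output can be much longer than the input; the claim being illustrated is only that a disjunctive normal form exists.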

Problem 3 The use of ∩ with regular expressions does not allow one to describe new languages (see the future lecture on closure properties). However, it does allow for more compact expressions. Show that the shortest regular expression for the language consisting of the one word $(\ldots((a_0^2 a_1)^2 a_2)^2 \ldots a_n)^2$ over the alphabet $\{a_0, a_1, \ldots, a_n\}$ has length $O(2^n)$, while there is an expression of length $O(n^2)$ using ∩ that describes the same language. Thus, using ∩ can shorten expressions by an exponential amount.

Solution: For the (n + 1)-symbol alphabet $\Sigma = \{a_0, \ldots, a_n\}$, let $L_n$ be the language defined as

$$L_n = \{(\ldots((a_0^2 a_1)^2 a_2)^2 \ldots a_n)^2\}.$$

We first show that any regular expression for $L_n$ is exponentially large. Notice that constant exponentiation (like $a_0^2$) is not valid in standard regular expressions. Also notice that if a regular expression uses + or ∗ in an essential way, its language contains more than one word; any regular expression denoting a single nonempty word can be simplified, without increasing its length, to one that uses only concatenation. Therefore, we may assume that the only operation used in a regular expression for $L_n$ is concatenation. In other words, the shortest regular expression for $L_n$ is the single word of $L_n$ itself.

Define the sequence of words $w_0, \ldots, w_n$ recursively as

$$w_i = \begin{cases} a_0 a_0 & \text{if } i = 0, \\ w_{i-1} a_i w_{i-1} a_i & \text{otherwise.} \end{cases}$$

It is easy to see that $L_n = \{w_n\}$. Let $l_i$ denote the length of the expression $w_i$. The recurrence above gives the following for $l_i$:

$$l_i = \begin{cases} 2 & \text{if } i = 0, \\ 2 l_{i-1} + 2 & \text{otherwise.} \end{cases}$$

Solving this recurrence gives $l_n = 2^{n+2} - 2 = \Theta(2^n)$, which is the length of the shortest regular expression for $L_n$.
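The growth of the word can be checked numerically. The sketch below builds $w_i$ from its recursive definition and compares its length with the recurrence for $l_i$ and with the closed form $2^{i+2} - 2$ obtained by unrolling that recurrence. The encoding of the symbol $a_k$ as the integer `k` and the function names are assumptions made for illustration.

```python
def w(i):
    """The word w_i as a list of symbol indices (k stands for a_k)."""
    if i == 0:
        return [0, 0]
    prev = w(i - 1)
    return prev + [i] + prev + [i]

def length(i):
    """l_i from the recurrence l_0 = 2, l_i = 2 * l_{i-1} + 2."""
    return 2 if i == 0 else 2 * length(i - 1) + 2

# The built word, the recurrence, and the closed form agree for small i.
agree = all(len(w(i)) == length(i) == 2 ** (i + 2) - 2 for i in range(12))
```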

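If we are allowed to use ∩, we can replace the definition of $w_i$ above with the following $v_i$:

$$v_i = \begin{cases} a_0 a_0 & \text{if } i = 0, \\ (v_{i-1} a_i)^* \cap (\{a_0, \ldots, a_{i-1}\}^* a_i \{a_0, \ldots, a_{i-1}\}^* a_i \{a_0, \ldots, a_{i-1}\}^*) & \text{otherwise.} \end{cases}$$

The first operand forces the word to be a repetition of $v_{i-1} a_i$, and the second forces exactly two occurrences of $a_i$, so $L(v_i) = \{w_i\}$.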

If we denote by $l'_i$ the length of the expression $v_i$, the recurrence above gives the following for $l'_i$:

$$l'_i = \begin{cases} 2 & \text{if } i = 0, \\ l'_{i-1} + O(i) & \text{otherwise.} \end{cases}$$

Therefore $l'_n = O(n^2)$. Compared to $l_n = \Theta(2^n)$, this is an exponential improvement.
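The construction with ∩ can be checked on small cases. The sketch below assumes that the second operand of ∩ admits exactly two occurrences of $a_i$, with the starred blocks ranging over $a_0, \ldots, a_{i-1}$ only; this is the reading under which the intersection is a single word. The symbol $a_k$ is encoded as the integer `k`, and `size_v` counts alphabet symbols only (operators and parentheses are ignored), which is one possible convention for the length $l'_i$.

```python
from itertools import product

def w(i):
    """The word w_i as a tuple of indices (k stands for a_k)."""
    return (0, 0) if i == 0 else w(i - 1) + (i,) + w(i - 1) + (i,)

def in_v(i, s):
    """Membership of s in L(v_i), following the recursive definition of v_i."""
    if i == 0:
        return s == (0, 0)
    # Second operand (assumed reading): exactly two a_i, nothing above a_i.
    if s.count(i) != 2 or any(c > i for c in s):
        return False
    # First operand: s is in (v_{i-1} a_i)*, so cut s after every a_i.
    blocks, current = [], []
    for c in s:
        if c == i:
            blocks.append(tuple(current))
            current = []
        else:
            current.append(c)
    if current:  # leftover symbols after the last a_i
        return False
    return all(in_v(i - 1, b) for b in blocks)

def size_v(i):
    """Alphabet symbols in v_i: l'_0 = 2, l'_i = l'_{i-1} + 3i + 3."""
    return 2 if i == 0 else size_v(i - 1) + 3 * i + 3

# Exhaustive check for i = 1 over all words of length <= 8 on {a_0, a_1}.
accepted = [s for k in range(9) for s in product((0, 1), repeat=k) if in_v(1, s)]
```

Under this counting, `size_v(n)` grows quadratically while `len(w(n))` doubles at each step, which is the gap claimed in the problem.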

Problem 4 (For 4 hour graduate students only) In this problem we define a new class of expressions called star free regular expressions over an alphabet Σ, which are defined inductively as follows:

(a) ∅ is a star free expression and the language it denotes is L(∅) = {}

(b) ε is a star free expression and its language is L(ε) = {ε}

(c) For each a ∈ Σ, a is a star free expression and it denotes L(a) = {a}

(d) If r is a star free expression, then r̄ is also a star free expression, where L(r̄) = Σ∗ \ L(r)

(e) If r and s are star free expressions, then r + s is also a star free expression, with L(r + s) = L(r) ∪ L(s).

(f) And finally, if r and s are star free expressions, then rs is a star free expression with L(rs) = L(r)L(s).

So unlike regular expressions, star free expressions have complementation of a language; however, they do not have Kleene closure (hence, “star free”). Nevertheless, it is possible to define languages like Σ∗ using star free expressions: Σ∗ = L(∅̄).

A language L will be called aperiodic if there is an integer n > 0 such that for all x, y, z ∈ Σ∗, $xy^nz \in L$ if and only if $xy^{n+1}z \in L$. Show that if r is any star free expression then L(r) is aperiodic. (In fact, the converse also holds: if L is aperiodic then there is a star free expression r such that L = L(r). You might want to think about how you might prove the converse.)
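To get a feel for the definition, one can search for violating triples (x, y, z) by brute force. The sketch below is an illustration only: the two sample languages (words containing `ab`, and even-length words over {a}), the function names, and the search bound `max_len` are assumptions, and a bounded search that finds nothing is evidence rather than a proof.

```python
from itertools import product

def words(alphabet, max_len):
    """All words over the alphabet of length <= max_len, shortest first."""
    for k in range(max_len + 1):
        for t in product(alphabet, repeat=k):
            yield "".join(t)

def violation(member, n, alphabet="ab", max_len=3):
    """Return some (x, y, z) with x y^n z in L but not x y^(n+1) z, or vice versa."""
    pool = list(words(alphabet, max_len))
    for x in pool:
        for y in pool:
            for z in pool:
                if member(x + y * n + z) != member(x + y * (n + 1) + z):
                    return (x, y, z)
    return None  # no violation within the search bound

def contains_ab(word):
    """Membership in the language of words that contain ab as a factor."""
    return "ab" in word

def even_as(word):
    """Membership in (aa)*: only the letter a, and an even number of them."""
    return set(word) <= {"a"} and len(word) % 2 == 0

no_violation_n2 = violation(contains_ab, 2) is None
violated_n1 = violation(contains_ab, 1)
even_always_violated = all(violation(even_as, n) is not None for n in range(1, 6))
```

For the first language the search finds a violation at n = 1 (for instance y = ba, since ba has no factor ab while baba does) and none at n = 2. For (aa)∗ every n is violated by x = z = ε and y = a, so that language is not aperiodic.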

Solution: We show that the language of every star free expression is aperiodic by induction on the complexity of the expression. The base cases, where the expression is ∅, ε, or a for some a ∈ Σ, hold trivially (n = 2 works for each of them).

Now let e be a star free expression and assume that every expression simpler than e has an aperiodic language. We consider all the possibilities for the principal operator of e. Since e is non-trivial, this operator has to be one of complementation, union, or concatenation.

Case 1: e = r + s for star free expressions r and s. By the induction hypothesis, r and s have constants $n_r$ and $n_s$ satisfying the condition given in the problem. Note that if the condition holds for some n, then it holds for every m ≥ n as well (apply it with $xy^{m-n}$ in place of x). Since L(e) = L(r) ∪ L(s), the constant $n_e = \max\{n_r, n_s\}$ therefore serves as the desired n for e.

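Case 2: e = rs for star free expressions r and s. Let $n_r$ and $n_s$ be as in the previous case and let $n_e = n_r + n_s$. Now let $w = xy^kz \in L(e)$ where $k > n_e = n_r + n_s$, and let $w = w_r w_s$ where $w_r \in L(r)$ and $w_s \in L(s)$. It is easy to observe that either $w_r$ contains at least $n_r$ repetitions of y or $w_s$ contains at least $n_s$ repetitions of y (and of course maybe both). Take as an example the first case (the second case is similar), i.e., $w_r = xy^iy_1$ and $w_s = y_2y^jz$ where $y_1y_2 = y$, $i + j = k - 1 \ge n_r + n_s - 1$, and $i \ge n_r$. Then by the induction hypothesis, $xy^iy_1 \in L(r)$ if and only if $xy^{i+1}y_1 \in L(r)$, and thus $xy^kz = xy^iy_1y_2y^jz \in L(rs)$ implies $xy^{i+1}y_1y_2y^jz = xy^{k+1}z \in L(rs)$. (If the split between $w_r$ and $w_s$ falls inside x or inside z, all copies of y lie in one factor and the same argument applies.) The converse direction is analogous, starting from a split of $xy^{k+1}z$. Hence the condition holds for every $k > n_e$, so $n_e + 1$ serves as the desired constant for e.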

Case 3: e = r̄ for some star free expression r. Notice that even though ordinary regular expressions do not have a complementation operator, the languages they denote are closed under complementation, since these are exactly the regular languages. Thus, the language of every star free expression is regular, while the converse does not hold. Now let $D_r$ be a DFA with state set Q that accepts L(r). For strings x and y, let $q = \hat{\delta}(q_0, xy^n)$ and let $q' = \hat{\delta}(q_0, xy^{n+1})$, where $q_0$ is the start state of $D_r$. Denote by F the set of accept states of $D_r$.

The statement “$xy^nz \in L(r)$ if and only if $xy^{n+1}z \in L(r)$” is equivalent to the following: for every string z ∈ Σ∗, $\hat{\delta}(q, z) \in F$ if and only if $\hat{\delta}(q', z) \in F$. Equivalently, $\hat{\delta}(q, z) \notin F$ if and only if $\hat{\delta}(q', z) \notin F$.

Now, not belonging to F is the same as belonging to Q − F. Thus $\hat{\delta}(q, z) \in Q - F$ if and only if $\hat{\delta}(q', z) \in Q - F$. If we switch the accepting/non-accepting attribute of every state of $D_r$, we get a DFA for the complement of L(r). But the latter statement shows that in the complement DFA, $xy^nz$ is still accepted if and only if $xy^{n+1}z$ is accepted. Thus, L(r̄) is also aperiodic.

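A much simpler argument is the following: Let n be the constant for L(r), and suppose $xy^nz \in L(\bar{r}) = \overline{L(r)}$ but $xy^{n+1}z \notin \overline{L(r)}$. Then $xy^{n+1}z \in L(r)$. By the aperiodicity of L(r), that means $xy^nz \in L(r)$, which is a contradiction. The other direction is symmetric.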
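The DFA view of Case 3 can also be made computational. The sketch below is an illustration, not the method of the solution: it collects the maps Q → Q induced by all words (the transition monoid of the DFA) and looks for an n with $m^n = m^{n+1}$ for every such map m. Such an n satisfies the definition of aperiodicity for the accepted language; the check is exact only under the assumption that the DFA is minimal with all states reachable. The two example DFAs and all names are hypothetical.

```python
def transition_monoid(delta, states, alphabet):
    """All maps Q -> Q induced by words, as tuples indexed like `states`."""
    index = {q: k for k, q in enumerate(states)}
    identity = tuple(range(len(states)))
    letters = [tuple(index[delta[q, a]] for q in states) for a in alphabet]
    seen, frontier = {identity}, [identity]
    while frontier:
        m = frontier.pop()
        for g in letters:
            nxt = tuple(g[m[k]] for k in range(len(m)))  # apply m, then the letter
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

def power(m, n):
    """The n-fold composition of the map m with itself."""
    result = tuple(range(len(m)))
    for _ in range(n):
        result = tuple(m[k] for k in result)
    return result

def aperiodicity_index(monoid, num_states):
    """Smallest n >= 1 with m^n == m^(n+1) for all m, or None if there is none.

    Trying n up to the number of states suffices: a map on k points whose
    powers stabilize at all already satisfies m^k == m^(k+1).
    """
    for n in range(1, num_states + 1):
        if all(power(m, n) == power(m, n + 1) for m in monoid):
            return n
    return None

# Hypothetical DFA for "contains ab": state 2 is the accepting sink.
contains_ab = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1,
               (1, "b"): 2, (2, "a"): 2, (2, "b"): 2}
# Hypothetical DFA for (aa)* over {a}: the letter a swaps the two states.
even_as = {(0, "a"): 1, (1, "a"): 0}

index_ab = aperiodicity_index(transition_monoid(contains_ab, [0, 1, 2], "ab"), 3)
index_even = aperiodicity_index(transition_monoid(even_as, [0, 1], "a"), 2)
```

The check never consults the set F of accept states, which mirrors the argument in Case 3: swapping accepting and non-accepting states leaves the transitions, and hence the constant n, unchanged.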
