coalgebraic characterizations of context-free languages

24
COALGEBRAIC CHARACTERIZATIONS OF CONTEXT-FREE LANGUAGES JOOST WINTER, MARCELLO M. BONSANGUE, AND JAN RUTTEN CWI, Science Park 123, 1098 XG Amsterdam, The Netherlands e-mail address : [email protected] LIACS, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands e-mail address : [email protected] CWI, Science Park 123, 1098 XG Amsterdam, The Netherlands e-mail address : [email protected] Abstract. We provide a coalgebraic account of context-free languages, using functors D := 2 × (-) A for deterministic automata over an alphabet A, in three different but equivalent ways: (i) by viewing context-free grammars as D-coalgebras; (ii) by defining a format for behavioural differential equations (w.r.t. D) for which the unique solutions are precisely the context-free languages; and (iii) as the D-coalgebra of generalized regular expressions in which the Kleene star is replaced by a unique fixed point operator. In all cases, semantics is defined by the unique homomorphism into the final coalgebra of all lan- guages, paving the way for coinductive proofs of context-free language equivalence. This presentation can be seen categorically as an instance of the generalized powerset construc- tion [SBBR10]. Furthermore, we generalize this approach from context-free languages to formal power series in general, by generalizing to functors D := S × (-) A for automata with output values over an arbitrary semiring S. 1. Introduction The set P(A * ) of all formal languages over an alphabet A is a final coalgebra of the functor D(X ) := 2 × X A . Deterministic automata are D-coalgebras and their behaviour, in terms of language acceptance, is given by the final homomorphism into P(A * ). A language is regular if it is in the image of the final homomorphism from a finite D-coalgebra to P(A * ). Or, equivalently by Kleene’s theorem, if it is in the image of the final homomorphism from the set of regular expressions, which constitute a D-coalgebra by means of the so-called Brzozowski derivatives. Thus the coalgebraic picture of regular languages and regular expressions is well- understood (cf. [Rut98] for details). Moreover the picture is so elementary that it has 1998 ACM Subject Classification: F4.2 Grammars and Other Rewriting Systems. Key words and phrases: coalgebra, formal languages, context-free languages, formal power series. Supported by the NWO project CoRE: Coinductive Calculi for Regular Expressions. LOGICAL METHODS IN COMPUTER SCIENCE DOI:10.2168/LMCS-??? c Joost Winter et al. Creative Commons 1

Upload: leidenuni

Post on 22-Nov-2023

0 views

Category:

Documents


0 download

TRANSCRIPT

COALGEBRAIC CHARACTERIZATIONS OF CONTEXT-FREE

LANGUAGES

JOOST WINTER, MARCELLO M. BONSANGUE, AND JAN RUTTEN

CWI, Science Park 123, 1098 XG Amsterdam, The Netherlandse-mail address: [email protected]

LIACS, Niels Bohrweg 1, 2333 CA Leiden, The Netherlandse-mail address: [email protected]

CWI, Science Park 123, 1098 XG Amsterdam, The Netherlandse-mail address: [email protected]

Abstract. We provide a coalgebraic account of context-free languages, using functorsD := 2 × (−)A for deterministic automata over an alphabet A, in three different butequivalent ways: (i) by viewing context-free grammars as D-coalgebras; (ii) by defininga format for behavioural differential equations (w.r.t. D) for which the unique solutionsare precisely the context-free languages; and (iii) as the D-coalgebra of generalized regularexpressions in which the Kleene star is replaced by a unique fixed point operator. In allcases, semantics is defined by the unique homomorphism into the final coalgebra of all lan-guages, paving the way for coinductive proofs of context-free language equivalence. Thispresentation can be seen categorically as an instance of the generalized powerset construc-tion [SBBR10]. Furthermore, we generalize this approach from context-free languages toformal power series in general, by generalizing to functors D := S × (−)A for automatawith output values over an arbitrary semiring S.

1. Introduction

The set P(A∗) of all formal languages over an alphabet A is a final coalgebra of the functorD(X) := 2×XA. Deterministic automata are D-coalgebras and their behaviour, in termsof language acceptance, is given by the final homomorphism into P(A∗). A language isregular if it is in the image of the final homomorphism from a finite D-coalgebra to P(A∗).Or, equivalently by Kleene’s theorem, if it is in the image of the final homomorphism fromthe set of regular expressions, which constitute a D-coalgebra by means of the so-calledBrzozowski derivatives.

Thus the coalgebraic picture of regular languages and regular expressions is well-understood (cf. [Rut98] for details). Moreover the picture is so elementary that it has

1998 ACM Subject Classification: F4.2 Grammars and Other Rewriting Systems.Key words and phrases: coalgebra, formal languages, context-free languages, formal power series.Supported by the NWO project CoRE: Coinductive Calculi for Regular Expressions.

LOGICAL METHODSIN COMPUTER SCIENCE DOI:10.2168/LMCS-???

c© Joost Winter et al.Creative Commons

1

2 JOOST WINTER ET AL.

recently been possible [Sil10] to generalize it to a large class of other systems, includingMealy machines, labelled transition systems, and various probabilistic automata.

In the present paper, we will develop in part a similar coalgebraic picture for context-freelanguages, which form another well-known class, extending regular languages. Our focuswill be on context-free grammars, which constitute one of the common definition schemesfor context-free languages. (Another well-known characterization is through pushdown au-tomata, which will not be treated here.)

Because the set of all languages is a final coalgebra of the functor D(X) = 2×XA, aswas mentioned above, it seems natural to try and use the same functor for a coalgebraictreatment of context-free grammars and languages. We will do so in three different butequivalent ways:

(1) We will, in Sect. 3, view context-free grammars (in a modified Greibach normalform) as D-coalgebras, which we shall call grammar coalgebras; their elements willcorrespond to partial derivations.

(2) Next we will, in Sect. 4, define a format of behavioural differential equations [Rut03]with respect to the functor D, for which the unique solutions are precisely thecontext-free languages.

(3) In Sect. 6, we will define context-free languages and power series by means of gen-eralized regular expressions, in which the Kleene star is replaced by a unique fixedpoint operator, and which are given a D-coalgebra structure using a variation ofBrzozowski derivatives. In all three cases, semantics is defined by the final homo-morphism into P(A∗).

In Sect. 5, we will generalize the picture up to that point to a wider range of systems,namely to Moore automata where the output alphabet possesses a semiring structure. Here,we obtain a coalgebraic characterization of classes of formal power series, which can be calledcontext-free or algebraic.

We show that the above three coalgebraic characterizations are equivalent in the fol-lowing sense: a language is context-free if and only if it is in the image of the final ho-momorphism starting in either a grammar D-coalgebra; or a D-coalgebra correspondingto a (finite) system of behavioural differential equations; or the D-coalgebra of generalizedexpressions with fixed point operator.

The proofs of these equivalences are not trivial, but contain few surprises, consistingof ingredients that are already present at various places in the literature. What we do seeas the contribution of this paper are the three characterizations as such, together with thefact that their equivalence could be established in such an elementary fashion. We expectthat this will lead to various further results, as follows.

Grammar coalgebras establish a direct correspondence between context-free grammarsand context-free languages, by finality, and thus pave the way for coinductive proofs ofcontext-free language equivalence. Furthermore, we will argue that our way of definingcontext-free languages through behavioural differential equations will lead to a generaliza-tion of the very notion of context-freeness to other types of systems, much in the same wayas regular expressions and regular languages were generalized in [Sil10]. A sketch of anelementary but interesting first instance hereof is given at the end of the present paper, inSect. 7, where we will introduce the new notion of context-free streams. Finally, expressionswith a fixed point operator are well-suited for the formulation of algebraic characterizations,which we see as yet another direction for future research.

COALGEBRAIC CHARACTERIZATIONS OF CONTEXT-FREE LANGUAGES 3

1.1. Related work. In contrast to regular languages, equality of context-free languagesis known to be an undecidable property [HMU06]. This may explain why not so muchalgebraic or coalgebraic work has been devoted to study the theory of context-free languages.The first, and only, coalgebraic treatment of context-free languages we are aware of, ispresented in [HJ05]. In this paper context-free languages are described indirectly, as theresult of flattening finite skeletal parsed trees. The authors study context-free grammars ascoalgebras for a functor different form ours, i.e. the functor P((A+ (−))∗).

It should be noted that, in contrast to our approach, even for finitely many non-terminalsymbols, only some coalgebras for this functor are ordinary context-free grammars, as in-finitely many productions must be allowed in order to obtain the set of skeletal parsed treesof finite depth via finality.

Algebraically, the starting point is Kozen’s complete characterization of regular lan-guages in terms of Kleene algebras, idempotent semirings equipped with a star-operationsatisfying some fixed point equations [Koz94]. In [Lei91, EL05], Kleene algebras have beenextended with a least fixed point operator to axiomatize fragments of the theory of context-free languages. We take a similar approach, but coalgebraic in nature and substituting theKleene star with a unique fixed point operator. Whereas [Lei91, EL05] are interested inproviding solutions to systems of equations of the form x = t using least fixed points, welook at systems of behavioural differential equations and give a semantic solution in termsof context-free languages (in Sect. 4) and syntactic solutions in terms of regular expressionswith unique fixed points (in Sect. 6). Regular expressions with the Kleene star replacedby a unique right-recursive fixed point operator have been studied in [SBR10, Sil10] for alarge variety of coalgebras, including the one for deterministic automata. The observationthat context-free languages can be seen as solutions to systems of equations dates backto [GR62].

This article is an extended version of [WBR11], which was submitted to the CALCO’11 conference.

2. Preliminaries

In this section we will recall the main definitions and results regarding algebraic structures,coalgebraic theory, monads and algebras, context-free grammars, as well as the generalizedpowerset construction. A more extensive coalgebraic treatment of languages, automata andregular expressions can be found, for example, in [Rut00, Rut98, Jac06]. Following this,we will give a brief overview of the generalized powerset construction, a full presentation ofwhich is given in [SBBR10].

2.1. Algebraic structures. Recall that a monoid (M, ·) consists of a set M , with a binaryoperation · : M ×M → M , and an identity element 1 ∈ M , such that for all m ∈ M ,1 ·m = m = m · 1, and for all m1,m2,m3 ∈M , (m1 ·m2) ·m3 = m1 · (m2 ·m3). A monoidis called commutative if for all m1 and m2, m1 ·m2 = m2 ·m1. The set of words over analphabet A with the empty word (here denoted as 1) as unit, and concatenation of wordsas multiplication, will, in this article, be the prime example of a monoid. This monoid isthe free monoid over the alphabet A.

A semiring (S, ·,+) consists of a set S with two binary operations + and ·, such that(S,+) is a commutative monoid with identity element 0, and (S, ·) is a monoid with identity

4 JOOST WINTER ET AL.

1, and · distributes over +, i.e. s1 · (s2 + s3) = (s1 · s2) + (s1 · s3) and (s1 + s2) · s3 =(s1 · s3) + (s2 · s3) for all s1, s2, s3 ∈ S, and moreover, for all s ∈ S, 0 · s = s · 0 = 0.

2.2. Notions from coalgebraic theory. A coalgebra for an endofunctor F : C → C ona category C consists of a carrier set C together with a map c : C → FC. The functorF is usually called the type of the coalgebra. In this paper we will be concerned withcoalgebras for structured automata [SBBR10], i.e. of type DF , for functors D : S × (−)A,and endofunctors F : Set→ Set. Here A is a finite set (in this context also called alphabet),S is the carrier of a semiring (S, ·,+), and× is the Cartesian product. When F is the identityfunctor 1Set, DF will be the familiar functor D representing deterministic automata withinputs in A and outputs in S.

In a large part of this article, the semiring used will be the Boolean semiring 2, char-acterizable by the equation 1 + 1 = 1, which can also be viewed as a complete lattice with0 ≤ 1 and join ∨ and meet ∧ as usual. 1

A coalgebra (C, c : C → DFC) can be interpreted as an automaton that for a givenstate t ∈ C returns a pair c(t) = 〈o(t), δ(t)〉, determining the output value o(t) ∈ S of astate t, and offering a structured state δ(t)(a) ∈ FC for each input a ∈ A. When dealingwith the semiring 2, o(t) = 1 is equivalent to the notion that t is a final state. Typicallywe will write ta for δ(t)(a), call o(t) the output of t and ta the a-derivative of t, makingclear the connection between our work and Brzozowski’s notion of derivatives of regularexpressions [Brz64]. When confusion may arise about the coalgebra we are referring to, wewill superscribe o(t) and ta with the coalgebra map c, as in

oc(t) and tca

In the case of D-coalgebras (but not for DF -coalgebras), we can extend the notion ofa-derivative to word derivatives tw, for w ∈ A∗, by setting tλ = t for the empty word λ andtaw = (ta)w for a ∈ A and w ∈ A∗.

For arbitrary F -coalgebras, a homomorphism from a F -coalgebra (C, c) to anotherF -coalgebra (D, d) is a mapping f : C → D, such that d f = Ff c.

When dealing with D-coalgebras, this general condition corresponds to the conditionthat f : C → D is a function preserving outputs and next states, that is, for all t ∈ C:

o(f(t)) = o(t) and f(ta) = f(t)a

A F -coalgebra (Ω, ω) is called final, wheneven, for any F -coalgebra (C, c) there is aunique homomorphism J−K from (C, c) to (D, d). Often, this final homomorphism can beregarded as some kind of interpretation of elements of F -coalgebras in some kind of domain.When we are concerned with different F -coalgebras simultaneously, say (C, c) and (D, d),and their respective mappings to the final coalgebra, we sometimes denote the respectivehomomorphisms with the name of the coalgebra, e.g. J−Kc and J−Kc.

For example, the set P(A∗) of all languages on the alphabet A can be equipped witha 2 × (−)A-coalgebra map by setting for every L ⊆ A∗, o(L) = 1 if and only if λ ∈ L,and La = w ∈ A∗ | aw ∈ L. This coalgebra is final, and the unique homomorphismJ−KC : C → P(A∗) is given by

w ∈ JtKC iff o(tw) = 1, for all w ∈ A∗ .1In this article, the symbol D will either represent the functor 2 × (−)A representing formal languages,

or the more general functor S × (−)A representing formal power series, depending on the section and thetopic involved. We hope that this ambiguity will not create any confusion to the reader.

COALGEBRAIC CHARACTERIZATIONS OF CONTEXT-FREE LANGUAGES 5

Similarly, for every semiring S, the set of all functions in SA∗

can be equipped with aS × (−)A-coalgebra map making it final [Rut00].

In general, a bisimulation between two F -coalgebras (C, c) and (D, d) is a relationR ⊆ C ×D such that there exists a morphism r : R→ FR, making the diagram

C π1

Rπ2- D

FC

c

?Fπ1

FR

r

?

Fπ2- FD

d

?

commute. Here π1 and π2 are the projections from R to C and D, respectively. Wheneverthere exists a bisimulation R such that (s, t) ∈ R, we say that s and t are bisimilar andwrite s ∼ t.

When a final F -coalgebra exists, we have

s ∼ t iff JsKc = JtKdor, in other words, s and t are bisimilar exactly when they are mapped onto the sameelement in the final coalgebra [Rut00] In the case of functor of the type D, these elementswill be formal languages or, in the more general case, formal power series.

In the case of D-coalgebras, this general notion of a bisimulation instantiates to thefollowing definition: a relation R ⊆ C×D is a bisimulation if, whenever (s, t) ∈ R, we haveo(s) = o(t), and (sa, ta) ∈ R for all a ∈ A. For D-coalgebras we call a relation R ⊆ C ×D abisimulation up-to if, whenever (s, t) ∈ R, we have o(s) = o(t), and for all a ∈ A, there ares′ ∈ C and t′ ∈ D such that sa ∼ s′, ta ∼ t′, and (s′, t′) ∈ R. Clearly ∼ is a bisimulationup-to. Conversely:

Lemma 2.1. For every bisimulation up-to R, if (s, t) ∈ R, then s ∼ t.

Proof. We extend the bisimulation up-to R to a relation R′ defined by

R′ = (s, t) | ∃s′, t′(s ∼ s′ ∧ (s′, t′) ∈ R ∧ t′ ∼ tIf (s, t) in R′, then there are s′ and t′ such that s ∼ s′, (s′, t′) ∈ R, and t′ ∼ t, and henceo(s) = o(s′) = o(t′) = o(t).

Furthermore note that sa ∼ s′a, that t′a ∼ ta, and there are s′′ and t′′ such that s′a ∼ s′′,(s′′, t′′) ∈ R, and t′′ ∼ t′a. By transitivity of ∼, we get sa ∼ s′′ and t′′ ∼ ta, so (sa, ta) ∈ R′,proving that R′ is a bisimulation.

Clearly R ⊆ R′, and hence, when (s, t) ∈ R, then s ∼ t.

2.3. Monads and Algebras. Recall that a monad on a category C consists of an endo-functor T on C, together with two natural transformations η : 1C → T , called the unit ofthe monad, and µ : T 2 → T , called the multiplication of the monad, making the following

6 JOOST WINTER ET AL.

diagrams commute:

T 3 Tµ- T 2 T

ηT- T 2

TηT

T 2

µT

?µ- T

µ

?T

µ

?====

====

====

================

In general, given a category C and endofunctor T , a T -algebra consists of an object Xof C, together with a mapping α : TX → X.

Given a monad T over a category C, a algebra for the monad T is a T -algebra α suchthat α ηX = 1X , and α T (α) = α µX . We note that, given any X, µX : T 2X → TX isthe free T -algebra over X, with ηX : X → TX as unit.

For a more comprehensive background view on monads, algebras, and the deeper un-derlying theory, we refer to any textbook on category theory, such as [Awo10].

2.4. The generalized powerset construction. In [SBBR10], a categorical generalizationof the powerset construction, a well-known method to transform nondeterministic finiteautomata into deterministic automata, is presented. The finite powerset functor Pω hereis generalized to an arbitrary monad T , and instead of determinizing nondeterministicautomata into deterministic automata we now transform FT -coalgebras into F -coalgebrasusing a categorically analogous technique. The familiar powerset construction can then beseen as an instance of this general construction, where F is the functor 2×(−)A representingdeterministic automata, and T is the powerset monad Pω.

To start, we present the categorical formulation of the familiar powerset construction,for the powerset monad and deterministic automata. We start by fixing an alphabet A andassume D = 2× (−)A.

Say, we are given a nondeterministic automaton, f : X → DPωX. From this automa-ton, we will construct a new, deterministic, automaton f# : Pω(X) → DPωX, acceptingthe same language.

This can be done by setting, for all Y ∈ Pω(X):

of#

(Y ) =∨y∈Y

of (y) , and Y f#

a =⋃y∈Y

yfa

(Recall the convention of superscribing output values and derivatives with the names ofcoalgebra mappings when we are considering multiple such mappings at the same time.)

This definition provides us with a way to obtain an interpretation of the original non-deterministic automaton by means of the final homomorphism J−K, as in the followingdiagram:

X−- Pω(X) ...........

J−K- P(A∗)

DPω(X)

f

?......................................-

f#

DP(A∗)?

Another way to look at this construction, is to observe that we can assign a Pω-algebrastructure to the set DPωX, such that there is a unique Pω-algebra homomorphism from

COALGEBRAIC CHARACTERIZATIONS OF CONTEXT-FREE LANGUAGES 7

PωX, the free Pω-algebra over X, to DPωX, making the triangle in the above diagramcommute (after applying the forgetful functor, which will be omitted in the remainder ofthis paper).

We can now formulate the general case as follows: let T be a monad on a category Cand let F be an endofunctor C, and consider any coalgebra f : X → FTX. The aim nowis to extend f into some f# : TX → FTX, making the diagram

XηX- TX

FTX

f

?

f#

commute. Here η is the unit of the monad T .This extension can be constructed by assigning an appropriate T -algebra structure

(FTX,m) to FTX. We will now have a unique T -algebra morphism

f# : (TX, µX)→ (FTX, h)

from the free algebra (TX, µX) into (FTX, h), making the above diagram commute.When F has a final coalgebra (Ω, ω), the diagram can be extended as follows:

XηX- TX ...................

J−K- Ω

FTX

f

?.............................................-

f#

ω

?

We will later see that the main constructions in this paper can all be seen as instancesof this generalized powerset construction.

As a final remark for readers with a background in universal coalgebra and distributivelaws, note that if λ : TF → FT is a distributive law then FTX can be given a canonicalT -algebra structure as follows:

TFTXΛTX- FTTX

FµX- FTX .

This algebra structure is consistent with the algebra structure one can give (by finality) tothe final coalgebra (Ω, ω) in the sense that the coalgebra map ........

J−K- is a also a T -algebra

homomorphism. When needed we will prefer to derive such compositional results directlywithout referring to the result in [Bar04], in order to keep this article as elementary andself-contained as possible. For more background information on distributive laws and theirrelation to coalgebra, we refer to [Kli11].

2.5. Context-free languages and grammars. We assume the reader to be familiar withthe standard notions on context-free grammars and languages, and give only the definitionsand results we need in the rest of this paper. For a more comprehensive study of context-freegrammars and languages, see e.g. [HMU06].

A context-free grammar, or CFG, on a finite alphabet A is a pair (X, p), where X isa finite set of symbols we call nonterminals or variables, and p : X → Pω((A + X)∗) is

8 JOOST WINTER ET AL.

a function describing the production rules. We use the standard notation to describe theproduction rules:

x→ t iff t ∈ p(x)

where x ∈ X and t ∈ (A+X)∗. Here + denotes the coproduct (or disjoint union), Pω thefinite power set, and (A+X)∗ is the set of all the strings of finite length over A and X.

Note that the finiteness conditions on both X and the powerset are required, becauseotherwise the set of resulting languages will be the set of all languages. The finitenesscondition on A is essentially not required (barring some rewriting and reformulation), butit is kept here for convenience, as loosening it does not lead to any significant new insights.

In order to define the language associated to a context-free grammar, next we definethe notion of derivation. Given a CFG G = (X, p), for any string s, s′ ∈ (A + X)∗, wewrite s⇒ s′, and say s′ is derivable from s in a single derivation step, whenever s = s1xs2and s′ = s1ts2 for a production rule x → t of G, and s1, s2 ∈ (A + X)∗. We say that s′

is derivable from s in a single leftmost derivation step whenever s1 is a (possibly empty)string of terminals in A∗. As usual,⇒∗ denotes the reflexive and transitive closure of⇒. Ingeneral, if s⇒∗ s′, then s′ is derivable from s using only leftmost derivation steps. Thereforewe can restrict our attention to leftmost derivations only.

For a CFG (X, p) and any variable x ∈ X, called the starting symbol, we define thelanguage L(x) ⊆ A∗ generated by (X, p) from x by

L(x) = w ∈ A∗ |x⇒∗ wA language L ⊆ A∗ is called context-free if there exists a CFG (X, p) and a variable x ∈ X,such that L = L(x).

For our coalgebraic treatment of context-free languages it will be convenient to workwith CFGs with production rules of a specific form. We say that a CFG is in weak Greibachnormal form if all of its production rules are of the form

x→ at or x→ λ

where a ∈ A is an alphabet symbol, and t ∈ (A + X)∗ is a (possibly empty) sequence ofnonterminal and alphabet symbols. The difference between this weak Greibach normal formand the familiar Greibach normal form [Gre65] is that the right hand side of productionrules here consists of strings t over (A+X), rather than simply over (X). Hence, it is clearthat every CFG in Greibach normal form is also in weak Greibach normal form. Conversely,we can easily convert grammars in weak Greibach normal form into the familiar Greibachnormal form by introducing a nonterminal a for each alphabet symbol a, together with asingle rule a→ a for each such nonterminal.

For every terminal symbol x, the language L(x) generated by a CFG (X, p) in weakGreibach normal form is a context-free language. Conversely, every context free language Lcan be generated by a CFG in Greibach normal form from some nonterminal symbol [Gre65].Therefore CFGs in weak Greibach normal form characterize precisely the context-free lan-guages.

3. Context-Free Languages via Grammar Coalgebras

In this section we look at CFGs in weak Greibach normal form (as defined in Section 2.5),and, for each such grammar, we define a corresponding D-coalgebra, in the sense thatthe unique coalgebra homomorphism from the grammar (seen as a D-coalgebra) to thefinal D-coalgebra of all languages, maps nonterminal symbols precisely to the context-free

COALGEBRAIC CHARACTERIZATIONS OF CONTEXT-FREE LANGUAGES 9

language they generate. The key observation is that every CFG (X, p) with productions inweak Greibach normal form can be seen as a coalgebra for the functor DPω((A + (−))∗).More precisely, we represent the production rules by a map p : X → DPω((A+X)∗), withp(x) = 〈o(x), δ(x)〉 defined, for all terminal symbols x ∈ X, by

o(x) = 1 iff x→ λ , and t ∈ xa iff x→ at

(writing as before xa for δ(x)(a)).Consider for example the grammar (in weak Greibach normal form) over the alphabet

A = a, b with nonterminal symbols X = x, y and productions

x → axa

x → ayb

x → aa

y → ayb

y → λ

The language generated from x is L(x) = anbmak |n = m + k, n ≥ 1, while the lan-guage generated from y is L(y) = anbn |n ≥ 0. In coalgebraic form, the above productionsread as follows:

o(x) = 0 xa = xa, yb, a xb = ∅o(y) = 1 ya = yb yb = ∅

The coalgebra associated to each CFG in weak Greibach normal form is not a proper de-terministic automaton (i.e. a D-coalgebra) because its type is of the form DPω((A+(−))∗).However, we can turn it into a deterministic automaton by embedding the nonterminalsymbols X into Pω((A+X)∗) using the assignment ηX : X → Pω((A+X)∗), mapping eachx ∈ X into the singleton set x (in which x is seen as a string). In fact, we extend in acanonical manner each coalgebra p : X → DPω((A + X)∗) for a CFG to what we call agrammar coalgebra p# : Pω((A+X)∗)→ DPω((A+X)∗) as follows: for each finite subsetS ⊆ (A+X)∗ we define its output value and its a-derivative inductively by

S o(S) Sa∅ 0 ∅λ 1 ∅bs 0 if b = a then s else ∅xs o(x) ∧ o(s) ts | t ∈ xa ∪ (if o(x) = 1 then sa else ∅)T ∪ U o(T ) ∨ o(U) Ta ∪ Ua

We now observe that Pω((A + X)∗) is the free idempotent semiring on (A + X), andforms a monad for the algebra of idempotent semirings with constants in A, with − asthe unit of the monad, and

µ : T × U → tu | t ∈ T ∧ u ∈ Uas multiplication. Another important observation is that the category of Pω((A + −)∗)-algebras can also be seen as the category of idempotent semirings with constants in A. Wewill henceforth refer to this category as A-IdSem.

In order to see this construction as an instance of the generalized powerset construction,we need to specify an idempotent semiring structure on the set DPω(A+X)∗, with constants

10 JOOST WINTER ET AL.

in A such that p# is the unique A-Idsem-morphism from Pω((A+X)∗) to DPω((A+X)∗)making the diagram

XηX- Pω((A+X)∗)

DPω((A+X)∗

p

?p#

commute.DPω((A+X)∗) can indeed be regarded as an idempotent semiring in a canonical way,

because 2 is, and algebras are closed under products. However, this canonical algebrastructure is not the one needed here, as a result of the fact that the operation of taking aderivative to an alphabet symbol does not commute with word composition. As a result,we will obtain a unique homomorphism, but it will not coincide with p#. Hence, we willassign another semiring structure to DPω((A + X)∗), which indeed yields p#, as definedabove, as the unique A-Idsem-morphism from Pω((A+X)∗) to DPω((A+X)∗).

Such a semiring structure can indeed be defined on 2× (Pω(A+X)∗)A, as follows:

0 (02, λa.∅)1 (12, λa.∅)a (02, λb.if b = a then ∅ else ε)

(o1, d1) + (o2, d2) ((o1 + o2), λa.(d1(a) ∪ d2(a))(o1, d1) · (o2, d2) ((o1 · o2), λa.(o1 · d2(a) ∪ d1(a) ·

⋃b∈A b · d2(a)))

This semiring structure yields the same p# as before as unique A-IdSem-morphismfrom Pω((A + X)∗), to DPω(A + X)∗. Our earlier inductive construction of p# can nowindeed be seen as an instance of the generalized powerset construction, as in the followingdiagram:

XηX- Pω((A+X)∗) .........

J−K- P(A∗)

DPω((A+X)∗

p

?................................................-

p#

DP(A∗)?

We are now ready to state our main result for this section, namely the correspondencebetween context-free languages and the languages associated by the final homomorphismJ−K above to each nonterminal symbol of a CFG.

Theorem 3.1. Let (X, p) be a context-free grammar in weak Greibach normal form overa finite alphabet A, and S a finite subset of (A + X)∗. For every word w ∈ A∗, we havew ∈ JSKp# if and only if there exists some s ∈ S such that s⇒∗ w.

Proof. In this proof, two facts that are easily verified by induction will be used:

(1) o(Sw) = 1 iff w ∈ JSK for all words w and all S; and(2) 2) for all S ⊆ (A+X)∗, if w ∈ JSK, then there is a s ∈ S such that w ∈ JsK.We proceed by induction on the length of words w. For the empty word λ, it is easy

to see that s ⇒∗ λ if and only if s only consists of nonterminal symbols x that have aproduction rule x→ λ. Conversely, we have λ ∈ JSK iff o(Sλ) = 1, that is, if o(S) = 1: but

COALGEBRAIC CHARACTERIZATIONS OF CONTEXT-FREE LANGUAGES 11

it follows easily from the definition that this is the case iff there is a s ∈ S consisting onlyof nonterminal symbols x that have a production rule x→ λ.

Assume that the inductive hypothesis holds for w, and consider the word aw. Assumes ⇒∗ aw, and consider the first term in the leftmost derivation of aw that is of the format. Thus s⇒∗ at⇒∗ aw. By inspecting the definitions, it is easily seen that if s ∈ S, thent ∈ Sa. Furthermore, it also follows that t ⇒∗ w, and the inductive hypothesis then givesw ∈ JSaK, and hence that o((Sa)w) = 1. But clearly Saw = (Sa)w, so aw ∈ JSK.

For the other direction, assume that aw ∈ JSK. Then there must be some s ∈ S, suchthat aw ∈ JsK, or that o(saw) = o((sa)w) = 1. We now get that w ∈ JsaK, andthe inductive hypothesis now gives some t ∈ sa such that t ⇒∗ w. From inspecting thedefinitions, it is also easy to see that s⇒∗ at. Hence, we get s⇒∗ at⇒∗ aw, which is whatneeded to be shown.

It follows that a language L is context-free iff L = JηX(x)Kp# , for some grammarcoalgebra generated by a CFG (X, p), and some x ∈ X.

4. Context-Free Languages via Equations

We will now look at a characterization of context-free languages in terms of systems ofbehavioural differential equations, analogous to those introduced in [Rut05]. The ideais to define context-free languages by means of equations that involve output values andderivatives for each alphabet symbol, using a simple language with only variables, choice andsequential composition. Each system of behavioural differential equations in our format hasas its unique solution a language, which we prove to be context-free whenever the numberof equations is finite. Conversely, for each context-free language L we will construct a finitesystem of behavioural differential equations with L as unique solution.

To illustrate our approach, consider the example from the previous section. A formaldefinition of this context-free language could be given by the following system of behaviouraldifferential equations:

output value a-derivative b-derivativeo(x) = 0 xa = (x · a) + (y · b) + a xb = 0o(y) = 1 ya = y · b yb = 0

Next we present a syntax describing the format of behavioural differential equations thatwill be considered: let A be a finite set of alphabet symbols, X be a (possibly infinite) set ofvariables, and o(x) |x ∈ X and xa |x ∈ X for each a ∈ A be (syntactic) sets of symbols,representing notational variants of the variables. The variables x ∈ X will play the role ofplaceholders for languages L ⊆ A∗, while their notational variants xa will be placeholdersfor the corresponding language La, for each a ∈ A, and the o(x) will correspond to theinformation whether L contains the empty string λ or not. Further, + will be interpretedas union of languages, and · as language concatenation, while 0, 1 and a will be interpretedas the empty language, the language containing the empty word and the singleton word a,respectively.

We call a behavioural differential equation well-formed if it consists of an equationo(x) = v and an equation xa = t for each a ∈ A, where v ∈ 0, 1 and t is a term defined by

t ::= 0 | 1 |x | a | t+ t | t · t

12 JOOST WINTER ET AL.

where x ∈ X and a ∈ A. (In the remainder of this paper, we will often simply write ‘ab’rather than ‘a · b’.) We let T denote the so called term monad, making T(X) equal to theset of all terms, as defined above, over the set X. A well-formed system of equations for Xconsists of one well-formed equation for each x ∈ X. Equivalently, a well-formed system ofequations over X (for a fixed A) can be seen as a mapping f : X → DT(X) [Bar04] where,writing f(x) = 〈o(x), δ(x)〉, we define o(x) and, for each a ∈ A, xa = δ(x)(a) by the valuesspecified by the system of equations.

Before defining what a solution of a system of equations is, we need to interpret theabove operations on terms as functions on languages.

A solution to such a well-formed system of equations is essentially a mapping of variablesx to languages L(x), such that, for all equations, the interpretations in P(A∗) of the lefthand sides and right hand sides are the same. To make this idea precise, we define

L(xa) = L(x)a = w ∈ A∗ | aw ∈ L(x)and recall that o(L) = 1 if and only if λ ∈ L. Further we need to define, denotationally, theinterpretation of all terms, by inductively extending L : X → P(A∗) to L : T(X) → P(A∗)in the following way inductively:

t L(t)x L(x)0 ∅1 ε

t1 + t2 L(t1) + L(t2)t1 · t2 w1 · w2 |w1 ∈ L(t1) ∧ w2 ∈ L(t2)

Next we show that every well-formed system of equations has a unique solution. To thisend we transform a system of equations f : X → DT(X) into a deterministic automatonf# : T(X)→ DT(X) inductively as follows:

t o(t) tax ∈ X o(x) xa (as specified by f)

0 0 01 1 0

b ∈ A 0 if b = a then 1 else 0u+ v o(u) ∨ o(v) ua + vau · v o(u) ∧ o(v) ua · v + o(u) · va

Again, this definition can be obtained as an application of the result in [SBBR10] sinceDT(X) can easily be equipped with an algebra structure for the term monad T. To be moreprecise, this algebra structure can be specified as follows:

0 (02, λa.∅)1 (12, λa.∅)a (02, λb.if b = a then ∅ else ε)

(o1, d1) + (o2, d2) ((o1 + o2), λa.(d1(a) + d2(a))(o1, d1) · (o2, d2) ((o1 · o2), λa.(o1 · d2(a) + d1(a) ·

∑b∈A b · d2(a)))

COALGEBRAIC CHARACTERIZATIONS OF CONTEXT-FREE LANGUAGES 13

We can thus combine a system of equations f : X → DT(X), its extension f#, and thefinal homomorphism to the coalgebra of all languages in the following diagram:

X ⊂

i- T(X) ............

J−K- P(A∗)

DT(X)

f

?......................................-

f#

DP(A∗)?

Chasing over the diagram above shows that the map J−K i is a solution of our system ofequations. By finality, it follows that it is the unique solution.

Turning back to the system of equations from the beginning of this section, it turns outthat

JxK = anbmak |n = m+ k, n ≥ 1JyK = anbn |n ≥ 0

yielding the same languages as the grammar given in section 3.Next, we will show that languages occur as such solutions if and only if they are context-

free. Before we do this, however, we introduce the notion of term equivalence:

Definition 4.1. We say that two terms t, u ∈ T(X) are equivalent, denoted by t ≡ u,when for every well-formed system of equations (X, f), t ∼ u with respect to the coalgebra(T(X), f#) generated from (X, f).

One can easily show that the relation ≡ is a congruence with respect to the sum + andmultiplication · of terms in T(X), for any set X. Furthermore, T(X) modulo ≡ forms anidempotent semiring:

Proposition 4.2. For any set X and terms t, u, v ∈ T(X), the following hold:

t+ u ≡ u+ t u ≡ v ⇒ t[u/x] ≡ t[v/x]t+ t ≡ t 0 · t ≡ 0 ≡ t · 0

0 + t ≡ t ≡ t+ 0 1 · t ≡ t ≡ t · 1t+ (u+ v) ≡ (t+ u) + v t · (u · v) ≡ (t · u) · v(t+ u) · v ≡ t · v + u · v t · (u+ v) ≡ t · u+ t · v

Proof. As two useful illustrations of coinductive proofs, we will show the proofs that t · (u+v) ≡ t · u+ t · v and t · (u · v) ≡ (t · u) · v:

To see that t · (u+ v) ≡ t · u+ t · v, consider the relation

R = (t · (u+ v), t · u+ t · v) ∪ ((t · (u+ v)) + w, (t · u+ t · v) + w)We will show that this is a bisimulation up-to. Assume r ∈ R. If r is of the form (t·(u+v), t·u+t·v), we have o(t·(u+v)) = o(t)∧(o(u)∨o(v)) = (o(t)∧o(u))∨(o(t)∧o(v)) = o(t·u+t·v).Furthermore, whenever o(t) = 0, we have:

(t · (u+ v))a = ta · (u+ v)

(t · u+ t · v)a = ta · u+ ta · vand clearly (ta · (u+ v), ta · u+ ta · v) ∈ R. When o(t) = 1, we have:

(t · (u+ v))a = ta · (u+ v) + (u+ v)a

= ta · (u+ v) + (ua + va)

14 JOOST WINTER ET AL.

and

(t · u+ t · v)a = (t · u)a + (t · v)a

= (ta · u+ ua) + (ta · v + va)

But clearly, we have, by earlier laws

(ta · u+ ua) + (ta · v + va) ∼ (ta · u+ ta · v) + (ua + va).

As we know that (ta · (u + v) + (ua + va), (ta · u + ta · v) + (ua + va)) ∈ R, we know thatthe bisimulation up-to condition is fulfilled. The case where r is of the form ((t · (u+ v)) +w, (t · u+ t · v) + w) goes very similarly.

To see that t · (u · v) ≡ (t · u) · v, consider the relation

R = t · (u · v), (t · u) · v ∪ t · (u · v) + w, (t · u) · v + wAgain, we will show that this is a bisimulation up-to. Again, we will treat the case wherewe have a r ∈ R of the form (t · (u · v), (t · u) · v) – the other case goes very similarly, andnow we distinguish three cases.

If o(t) = 0, we will necessarily also have o(t · u) = 0, and (t · (u · v))a = ta · (u · v), aswell as ((t · u) · v)a = (t · u)a · v = (ta · u) · v. Clearly (ta · (u · v), (ta · u) · v) ∈ R.

If o(t) = 1 and o(u) = 0, we have

(t · (u · v))a = ta · (u · v) + (u · v)a

= ta · (u · v) + ua · vOn the other side,

((t · u) · v)a = (t · u)a · v= (ta · u+ ua) · v

However, we already know that (ta · u + ua) · v ∼ (ta · u) · v + ua · v, and now because(ta · (u · v) + ua · v, (ta · u) · v + ua · v) ∈ R the bisimulation up-to condition is satisfied.

Finally, if o(t) = o(u) = 1, we have

(t · (u · v))a = ta · (u · v) + (u · v)a

= ta · (u · v) + (ua · v + va)

and

((t · u) · v)a = (t · u)a · v + va

= (ta · u+ ua) · v + va

But again,(ta · u+ ua) · v + va ∼ ((ta · u) · v + ua · v) + va

and((ta · u) · v + ua · v) + va ∼ (ta · u) · v + (ua · v + va)

As (ta · (u · v) + (ua · v+ va), (ta · u) · v+ (ua · v+ va)) ∈ R, this completes the proof that Ris a bisimulation up-to.

COALGEBRAIC CHARACTERIZATIONS OF CONTEXT-FREE LANGUAGES 15

Equivalence between terms in T(X) can be extended to bisimilarity between differentsystems of equations on the same set of variables. This result will be convenient whenproving that every context-free language is the solution of a well-formed system of equations.

Proposition 4.3. If (X, f) and (X, g) are two systems of equations such that, for every

x ∈ X, and every symbol a, of (x) = og(x), and xfa ≡ xga, then the identity relation on T(X)is a bisimulation up-to between the generated coalgebras (T(X), f#) and (T(X), g#).

Proof. Let t be a term.

First, we have to show that of#

(x) = og#

(x). When t is a variable, an alphabet symbol,0 or 1, this is trivial, and when t is a compound term, this is proven by induction.

Secondly, we have to show that for all alphabet symbols a, there are terms t′, t′′, such

that tf#

a ∼ t′, tg#

a ∼ t′′, and (t′, t′′) ∈ R (or, in other words, t′ = t′′). This, too, will beproven by induction, and is trivial for the base cases where t is an alphabet symbol, 0, or 1.

When t is a variable x, we have xf#

a = xfa ∼ xga = xg#

a . Taking t′ = t′′ = xf#

a thensuffices.

When t is of the form u + v or of the form u · v, we will make use of the inductiveassumption that the bisimilarity condition holds for u and v. But then it is easy to see thatit holds for t too, making use of the fact that ∼ is a congruence with respect to + and ·.

We can now establish the first main result of this section, relating states of a grammarcoalgebra to terms in T(X).

Theorem 4.4. Let A and X be finite sets. For every context-free grammar (X, p) in weakGreibach normal form and finite subset S of (A + X)∗ there exists a well-formed systemof equations f : X → DT(X) and a term t ∈ T(X) such that S ∼ t with respect to thegenerated coalgebras p# and f#, respectively.

Proof. The system of equations f can be constructed from the grammar p as follows: o(x) =1 if λ ∈ p(x), and 0 otherwise; and xa =

∑s | as ∈ p(x), where s is the obvious translation

of s ∈ (A+X)∗ into a term of T(X). The proof now proceeds by showing that (S, t) |S ⊆(A+X)∗, t =

∑s | s ∈ S is a bisimulation up-to with respect to the generated coalgebras.

Because for every context-free language L there exists a CFG (X, p) and x ∈ X suchthat L = JηX(x)Kp# , it follows that every context-free language is the solution of a wellformed system of equations.

It remains to be shown that every solution of a system of equations is a context-freelanguage. Our approach will be to construct, for every system of equations (X, f), a context-free grammar (X, p), such that x ∼ ηX(x) for all x with respect to the generated coalgebrasf# and p#. To this end, we first transform our system of equations so that terms at theright hand side of all equations are in disjunctive normal form. This is possible because ofthe laws proven in Proposition 4.2 and because ≡ is a congruence.

We say that a term t ∈ T(X) is conjunctive when either t = 1; or when t = a · u,where a ∈ A and u a conjunctive term; or when t = x · u, where x is a variable, and uis a conjunctive term. We say that a term t ∈ T(X) is in disjunctive normal form, wheneither t = 0; or when t = u+ v, where u is conjunctive, and v is in disjunctive normal form.Using Proposition 4.2, it is easy to see that for every term t, there is an equivalent (andthus bisimilar for all (T(X), f#)-coalgebras) term t′ in disjunctive normal form.

16 JOOST WINTER ET AL.

This implies that for every system of equations we can construct a new system ofequations with the same variables and having the same solution but such that all terms onthe right-hand side of equations specifying the derivatives are in disjunctive normal form.We call a system of equations with the latter property a system of equations in disjunctivenormal form.

We are finally ready for the other main result of this section, stating that every solutionof a system of equations is a context-free language. Combined with the first main theoremfrom this section, we thus obtain the result that context-free languages are precisely thesolutions of well-formed systems of equations.

Theorem 4.5. For a finite set X, if (X, f) is a system of equations in disjunctive normalform, there exists a context-free grammar (X, p), such that x ∼ ηX(x) with respect to thegenerated coalgebras (T(X), f#) and (Pω((A+X)∗), p#), respectively.

Proof. Given a system of equations (X, f), we construct the grammar (X, p), such that

p(x) =⋃at | t is a disjunct of xa ∪ λ | if o(x) = 1

It is easy to see that f is a translation of p in the sense of Theorem 4.4. Hence, it followsthat x ∼ x = ηX(x) with respect to f# and p#.

5. Generalizing the Notion of Context-Freeness

In this section, we will present a more general picture of context-freeness, that coincideswith a number of existing notions, such as those of algebraic power series, automatic se-quences, and weighted context-free grammars and languages. We generalize from the Booleansemiring 2 to an arbitrary semiring S and to coalgebras for functors of the type S×−A, rep-resenting formal power series. With this small modification, we can now describe systemsof equations and (weighted) grammar coalgebras in much the same way as before. Mostof the earlier theorems and proofs in this paper can be translated easily to the generalizedcase where we deal with arbitrary semirings.

As a first example of a context-free power series (over the semiring of the naturalnumbers and over a singleton alphabet), taken from [Rut02], consider the stream definedby the following system, consisting of just a single equation:

o(x) = 1 x′ = x · xIts solution is the stream of Catalan numbers

1, 1, 2, 5, 14, 42, 132, 429, 1430, . . .

The nth element of this stream counts the number of well-bracketed words consisting of npairs of opening and closing brackets.

Another example of a context-free stream, using the finite field F2 (characterized by1 + 1 = 0) as underlying semiring, is defined by the following system of equations (wherenow X = x, y, z, w):

x(0) = 0 x′ = yy(0) = 1 y′ = zz(0) = 1 z′ = z + y + x · x · ww(0) = 1 w′ = y · w

COALGEBRAIC CHARACTERIZATIONS OF CONTEXT-FREE LANGUAGES 17

(here we write x(0) etc. instead of o(x)). One can show that it generates the stream

0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, . . .

which is the so-called Thue-Morse sequence. (This stream can also be characterized as themorphic sequence generated by the morphism with p(0) = 01 and p(1) = 10.)

Having seen the above examples of context-free power series, let us consider the followinggeneralized definition of systems of equations: let A be a finite set of alphabet symbols, Xbe a (possibly infinite) set of variables, and S be the semiring we are concerned with.

We call a behavioural differential equation for context-free languages well-formed if itconsists of an equation o(x) = v and an equation xa = t for each a ∈ A, where v ∈ 0, 1and t is a term defined by

t ::= s |x | a | t+ t | t · twhere x ∈ X, a ∈ A, and s ∈ S. T(X) again denotes the set of all terms over the set X. Awell-formed system of equations for X consists of one well-formed equation for each x ∈ X,and can again be seen as a mapping f : X → DT(X).

We can again transform systems of equations f : X → DT(X) into a deterministicautomaton f# : T(X)→ DT(X) inductively:

t o(t) tax ∈ X o(x) xa (as specified by f)s ∈ S s 0b ∈ A 0 if b = a then 1 else 0u+ v o(u) + o(v) ua + vau · v o(u) · o(v) ua · v + o(u) · va

Once again, we can assign an algebra structure for the term monad T to the set DT(X),and combine a system of equations f : X → DT(X), its extension f#, and the finalhomomorphism to the coalgebra of all languages in the following diagram:

X ⊂

i- T(X) ..............

J−K- SA

DT(X)

f

?......................................-

f#

D(SA∗)

?

For a more in-depth perspective on these context-free power series, and their relationto automatic sequences and (weighted) languages see [BRW12].

6. Context-Free Expressions

In this section, we will introduce context-free expressions as an extension of regular ex-pressions, where the Kleene star is replaced by a (unique) fixed point operator µ. We dothis directly in the generalized form for arbitrary semirings. We then will define a notionof Brzozowski-like derivatives for these expressions, and prove that the languages charac-terizable by such expressions are precisely the context-free languages. In contrast to thecoalgebraic approaches from the earlier sections, this formalism gives us a single coalgebraof which the elements correspond exactly to the context-free languages.

Our usage of fixed point expressions with a coinductive semantics has a very similarflavour to that in [Sil10], in which fixed point expressions are used as a characterization of

18 JOOST WINTER ET AL.

regular expressions over a variety of functors. The additional expressive power obtained bythe context-free expressions presented here is due to an explicit inclusion of a concatenationoperator.2 This provides an additional perspective on the treatment given here, in which‘context-freeness’ is obtained by the addition of a new operator to a calculus of regularexpressions3, and may pave the way for an investigation of (1) extending this approachto other coinductively defined operators, and (2) extending this approach to a generalizednotion of context-freeness for other functors.

We define the set of terms t (henceforth to be called context-free expressions) andguarded terms g over a semiring S, an alphabet A and a set of variables X as follows:

t ::= s ∈ S |x ∈ X | a ∈ A | t+ t | t · t |µx.gg ::= a · t (a ∈ A) | s ∈ S | g + g

For all closed terms t, we can describe the behaviour by defining the output value o(t),and the derivative ta for each alphabet symbol a. We can do this by extending the earlierinductive scheme defining the output values and derivatives of expressions with the newµ-operator, which performs the role of an unfolding operator here:

t o(t) tas ∈ S s 0b ∈ A 0 if b = a then 1 else 0u+ v o(u) + o(v) ua + vau · v o(u) · o(v) ua · v + o(u) · vaµx.u o(u[µx.u/x]) (u[µx.u/x])a

Here t[u/x], as usual, denotes the term obtained from t by replacing all free occurrencesof x by u. Because of the guardedness conditions of terms occurring directly inside the µoperator, it is easy to see that the above definition is well-defined.

Note furthermore that we have just defined a D-coalgebra with the set of all closedcontext-free expressions as objects, and the behaviour defined above as its transition func-tion.

Again, the behavioural equivalence relation ∼ is a congruence with respect to the sum+ and multiplication · of context-free expressions. Furthermore, the set of context-freeexpressions modulo ∼ forms an idempotent semiring:

Proposition 6.1. For all context-free expressions t, u, v, the following hold:

t+ u ∼ u+ t u ∼ v ⇒ t[u/x] ∼ t[v/x]t+ t ∼ t 0 · t ∼ 0 ∼ t · 0

0 + t ∼ t ∼ t+ 0 1 · t ∼ t ∼ t · 1t+ (u+ v) ∼ (t+ u) + v t · (u · v) ∼ (t · u) · v(t+ u) · v ∼ t · v + u · v t · (u+ v) ∼ t · u+ t · vt[u/x] ∼ u⇒ µx.t ∼ u µx.t ∼ µy.(t[y/x]) if y is not free in t

µx.t ∼ t[µx.t/x]

2In [Sil10], a translation from the familiar format of regular expressions (with concatenation) into µ-styleexpressions is given by means of substitution. However, this translation does not work for expressions of thetype x · t.

3Although this calculus does not explicitly contain the Kleene star, it can easily be expressed by meansof the equality t∗ = µx.((t · x+ 1)

COALGEBRAIC CHARACTERIZATIONS OF CONTEXT-FREE LANGUAGES 19

These laws can be seen as a partial (sound but not complete) axiomatization of be-havioural equivalence between context-free expressions. Note that, because language equiv-alence of context-free languages is not decidable, there cannot be any complete finitaryaxiomatization of behavioural equivalence [EL05].

As an illustration of context-free expressions over the Boolean semiring 2, it is easy tosee that the expression µx.(axb+ 1) will be mapped onto the language anbn. As anotherexample, consider the expression

µx.(axa+ aa+ aµy.(ayb+ 1)b).

In the next subsection, it will become clear that this expression corresponds to the language

anbmak |n = m+ k, n ≥ 1from the earlier examples.

6.1. From Systems of Equations to Context-Free Expressions. Assume we have acoalgebra generated by a system of equations f : X → DT(X), and a term t ∈ T(X). FromSect. 4, we know that this term is mapped by the final homomorphism to a context-freelanguage. In this section, we will look for a context-free expression corresponding to thisterm, in the sense that it is mapped onto the same language, using a process of repeatedsubstitution.

To start with, given a system of equations (X, f), we will associate with every variablex the µ-expression

sx := µx.(∑a∈A

a · xa + o(x))4

and call it the corresponding or associated µ-expression. As before, this notation strictlyspeaking does not denote a single expression, but rather a set of expressions which, byassociativity of addition, are all bisimilar.

We now go on by defining the notions of single syntactic substitutions and chains ofsyntactic substitutions: these definitions can be seen as a formalization of the correspondingnotions in [Mil10], [SBR10], and [Sil10]. Given an association of terms tx to variables x, aterm t′ is a single syntactic substitution of t, if t′ is obtained by replacing (syntactically) asingle occurrence of a single variable x the term tx. A chain of syntactic substitutions is alist of terms t1, . . . , tn such that, for each 1 ≤ i < n, ti+1 is a single syntactic substitutionof ti.

We are especially interested in chains of syntactic substitutions, where the resultingterm does not contain any free variables, or only a limited set of free variables. We callsuch terms closures and pseudoclosures of the original term:

Definition 6.2. We say a term t′ is a Z-pseudoclosure of t for a set Z ⊆ X of variables,with respect to an association of terms tx to variables x, if there exists a chain of syntacticsubstitutions t1, . . . , tn such that t1 = t, tn = t′ and t′ only contains free variables from Z.We call a ∅-pseudoclosure simply a closure.

4With the definition as written here, it is required that the alphabet A is finite, although this requirementcan easily be bypassed as long as there are only finitely many a ∈ A with xa 6= 0, by setting sx :=µx.(

∑a∈A | xa 6=0 a · xa + o(x)). However, with a finite set X of nonterminals and for each x ∈ X finitely

many a ∈ A such that xa 6= 0, the alphabet will always be effectively finite in the sense that only finitelymany a ∈ A are actually used in the system of equations.

20 JOOST WINTER ET AL.

As a continuation of our running example, recall the system of equations correspondingto the language anbmak |n = m+ k, n ≥ 1. From this system of equations, we obtain thefollowing assignment of µ-expressions to variables:

sx = µx.(axa+ ayb+ aa+ 0) sy = µy.(ayb+ 1)

From x, we obtain µx.(axa+ayb+aa+ 0) by means of a single syntactic substitutions, andanother single syntactic substitution then gives us µx.(axa+aµy.(ayb+ 1)b+aa+ 0). Thisexpression does not contain any free variables anymore, and therefore is a closure of x.

Some general laws about closures and pseudoclosures are easily established:

Proposition 6.3.

(1) If u′ is a W -pseudoclosure of u and v′ a W -pseudoclosure of v, then u′ + v′ isa W -pseudoclosure of u + v, u′ · v′ is a W -pseudoclosure of u · v, and µx.u′ is aW − x-pseudoclosure of µx.u.

(2) If t = u+ v, and t′ is a W -pseudoclosure of t, then t′ is of the form u′+ v′, where u′

is a W -pseudoclosure of u, and v′ is a W -pseudoclosure of v′. The same fact holdsif we replace + by ·.

Using the previous proposition, we can establish that, for every term t, a closure t′

exists. It should be noted, though, that this t′ generally is not unique: for a term t, ingeneral, many closures exist.

Proposition 6.4. Given a term t ∈ T(X) (that is, a µ-free term), a set of variables Z ⊆ X,and an assignment of a term tx to each variable x ∈ X, there exists a Z-pseudoclosure t′ oft with respect to this assignment.

Proof. By (reverse) induction on the size of Z. If Z = X, the result is trivial, because everyterm is its own X-pseudoclosure.

Now we assume the theorem holds for W ⊆ X, and need to prove that, for any x ∈ X,the theorem also holds for W − x. We do this by induction on (µ-free) terms.

(1) For terms s ∈ S and a ∈ A, the result is trivial as these terms do not contain anyfree variables.

(2) For terms t = u + v or t = u · v, the result follows from Proposition 6.3 and theinductive hypothesis.

(3) For the variable x, we know that there must be a W -pseudoclosure u of tx. Thenµx.u is a (W − x)-pseudoclosure of x.

(4) For variables y 6= x, assume that u is a W -pseudoclosure of ty, and v is a W -pseudoclosure of tx. Then u[µx.v/x] is a W−x-pseudoclosure of ty (it is easy to seethat u[µx.v/x] can be obtained from u by a chain of single syntactic substitutions),and hence µy.u[µx.v/x] is a W − x-pseudoclosure of y.

With the next proposition we construct a bisimulation up-to between a coalgebra gener-ated by a system of equations and the coalgebra of closed context-free expressions, relatingevery term t to every t′ such that t′ is a closure of t.

Proposition 6.5. Given a system of equations (X, f) (yielding a corresponding expressionsx for each variable x ∈ X), a coalgebra generated by it (T(X), f#), and an assignment ofterms ux to variables x ∈ X such that ux ≡ sx, the relation

R = (t, t′) | t ∈ T(X) and t′ is a closure of t (w.r.t. the assignment ux)

COALGEBRAIC CHARACTERIZATIONS OF CONTEXT-FREE LANGUAGES 21

is a bisimulation up-to between (T(X), f#) and the coalgebra of closed context-free expres-sions.

Proof. Say (t, t′) ∈ R. It suffices to show that o(t) = o(t′), and that for each alphabetsymbol a, there is a sa, such that t′a ∼ sa, and (ta, sa) ∈ R. We will do both by induction.

Showing that o(t) = o(t′) is immediate when t = 0, t = 1, or t = a, because in thesecases we have t = t′. When t = x, it must be the case that t′ is obtainable by a chain of singlesyntactic substitutions from µx.(

∑a∈A a ·xa + o(x)). But, as every chain of single syntactic

substitutions from µx.(∑

a∈A a ·xa+o(x)) will be of the form µx.(∑

a∈A a ·s(a)+o(x)), ando(a · s(a)) = 0 for all s(a), it follows easily that o(t′) = o(x). When t = u+v, it follows thatt′ must be of the form u′+v′, where u′ is a closure of u, and v′ is a closure of v. We then geto(t) = o(u) ∨ o(v) = o(u′) ∨ o(v′) = o(t′), using the inductive assumption that o(u) = o(u′)and o(v) = o(v′). The case where t = u · v goes analogously.

Showing that (ta, t′a) ∈ R, again, is immediate when t = 0, t = 1, or t = a. When

t = x, the first single syntactic substitution to obtain t′ must, again, be replacing x byµx.(

∑a∈A a ·xa+o(x)). As a result, t′ must be of the shape µx.(

∑a∈A a ·s(a)+o(x)), where

each s(a) is a x-pseudoclosure of xa. It follows that ua := s(a)[µx.(∑

a∈A a·s(a)+o(x))/x]is a closure of xa. As (b · tb)a = 0 when a 6= b, as (a · ta)a = ta, and as 1a = 0a = 0, it followsthat ua ∼ t′a: as ua is a closure of ta, the condition holds. The cases where t = u + v andt = u · v are immediate from the inductive hypothesis and Proposition 6.3.

Returning to our example, this proposition directly establishes that the expressionµx.(axa+aµy.(ayb+1)b+aa+0), which is clearly bisimilar to µx.(axa+aµy.(ayb+1)b+aa),corresponds to the language

anbmak |n = m+ k, n ≥ 1 .The two previous propositions directly imply that, for any term in a coalgebra generated

by a system of equations (and, hence, for every context-free language), we can use anyclosure of it as a bisimilar context-free expression. Hence, and because every term has aclosure, for every context-free language we can find a context-free expression that is mappedto it by the final homomorphism:

Theorem 6.6. Let L be a context-free power series over some semiring. There exists acontext-free expression t over the same semiring such that JtK = L.

6.2. From Context-Free Expressions to Systems of Equations. Going in the otherdirection, the recipe is as follows: given a context-free expression t′ in which every variable isbound by a µ-operator just once, we ‘deconstruct’ this expression into a system of equations,and a term t, of which t′ is a closure. Then Proposition 6.5 directly gives us the result, thatthere is a system of equations (X, f), a variable x ∈ X, such that t′ ∼ t with respect tothe coalgebra of closed context-free expressions and the coalgebra (T(X), f#) generated by(X, f). Hence, the final homomorphism maps t to a context-free language.

By applying a process of α-renaming, we can obtain an expression t′ from any expressiont such that, in t′, no variable is bound twice, or, in other words, such that there are no twodistinct subexpressions of t′ that bind the same variable. It is easy to see that the resultingterm t′ will always be bisimilar to t.

Now we are able to move on to the main proposition of this section, in which for everycontext-free expression t′ a system of equations is constructed, such that in the coalgebragenerated by it, there is a term t with t ∼ t′.

22 JOOST WINTER ET AL.

Proposition 6.7. Given a closed context-free expression t′, such that no two distinct sub-expressions of t′ are µ-expressions binding the same variable, there exists a system of equa-tions (X, f) generating an assignment of expressions sx to variables x ∈ X, and a term t,such that (with respect to (X, f)) t′ is a closure of t, based on an assignment of expressionstx to variables x ∈ X with tx ≡ sx.

Proof. In this proof, we define the µ-pruning of a term as the µ-free term obtained byreplacing the outermost µ-expressions in it by the variables bound by these expressions.

For every variable x, let tx be equal to the µ-pruning of ux, where µx.ux is the uniquesub-expression of t binding x. Let s be equal to the µ-pruning of t.

To see that there exists a system of equations (X, f) such that sx ≡ tx for all x ∈ X,we note that for each tx there exists an sx with sx ≡ tx of the form

sx =∑a∈A

a · s(a, x) + o (o ∈ S)

(The reason why this is so, roughly is the following: because ux occurs directly within thescope of a µ-operator, it has to be a guarded term. We can easily show, by induction onguarded terms g, that the µ-pruning of each such term has a normal form of this type.)

However we can easily construct a system of equations (X, f) that yields back these sx,by setting, for each x ∈ X, xa = s(a, x), and o(x) = o. The following lemma proves that,with respect to this system of equations, t is indeed a closure of s:

Lemma 6.8. Given the above association of terms tx to each variable x, if t is the µ-pruning of a term t′ (that is an ingredient of the original term in consideration) with onlyfree variables in W ⊆ X, then t′ is a W -pseudoclosure of t.

Proof. By induction on terms.For terms t of the form s ∈ S or a ∈ A, if t is a µ-pruning of t′, then t = t′, and hence

t′ is a closure (and, hence, a W -pseudoclosure for any W ⊆ X) of t as 0, 1, and a are closedterms.

For terms t′ of the form u′+v′, if t is a µ-pruning of t′, then t is of the form u+v, whereu is a µ-pruning of u′ and v is a µ-pruning of v′. But then, by the inductive hypothesis, u′

is a W -pseudoclosure of u and v′ is a W -pseudoclosure of v, and then, by Proposition 6.3,t′ is a W -pseudoclosure of t. The case where t = u · v goes analogously.

For terms t′ of the form x ∈ W , x is its own µ-pruning, and clearly x is a W -pseudoclosure of itself, as x ∈W .

For terms t′ of the form µx.u, if t is a µ-pruning of t′, then t = x. A single syntacticsubstitution yields µx.tx from x, where tx is the µ-pruning of the unique sub-expressionbinding x, that is u. As u can contain free variables from W as well as x itself, the inductiveassumption gives us that u is a W ∪x-pseudoclosure of tx, and now Proposition 6.3 givesthe result that t′ is a (W ∪ x)− x-pseudoclosure (and, hence, also a W -pseudoclosure)of t.

Now having proved that t is a closure of s, the proof of the proposition is complete,too.

As every context-free expression is bisimilar to a term in a coalgebra generated by somesystem of equations, it follows directly that the final homomorphism maps every context-freeexpression to a context-free language:

Theorem 6.9. For every context-free expression t, JtK is a context-free language.

COALGEBRAIC CHARACTERIZATIONS OF CONTEXT-FREE LANGUAGES 23

7. Discussion

Our coalgebraic account of context-free languages in terms of grammars, systems of be-havioural equations, and context-free expressions can be taken as a starting point for ageneralization in at least two different and orthogonal directions. In one direction (in whichthe first steps have already been taken, see Section 5 and [BRW12]), we can generalize fromlanguages towards power series in arbitrary semirings. Here, a good deal of work remainsto be done in order to see how existing results from the literature on formal power seriesand languages can be cast into the coalgebraic framework, and to find uses of coalgebraicand coinductive techniques in this area.

In another direction, we can consider other languages of expressions for the functor D

to obtain different classes of languages. As an interesting example of this type, one couldconsider systems of behavioural differential equations for which the term at the right ofeach equation stems from the language of expressions

t ::= 0 | 1 |x ∈ X | t+ t | a · tThe semantic solution of such a system is given by regular languages, while the syntacticone is given by a language of expressions as studied in [SBR10]. The corresponding notionof grammars for regular languages is then given by considering productions of the formp : X → DPω(A∗ × X), i.e. right-linear grammars [HMU06]. When we generalize thislanguage of expressions to

t ::= s ∈ S |x ∈ X | t+ t | a · t (7.1)

over an arbitrary semiring S, we obtain the rational power series over S. In both cases, itis also possible to give a corresponding notion of µ-expressions.

It might also be worthwhile look at systems of equations in which other coinductivelyspecified operators than sum and concatenation (or, in the general case, the convolutionproduct) are involved: we can for example investigate systems of equations where, on theright-hand side, the operator for language intersection (or, more general, the Hadamardproduct) is allowed. Because the class of context-free languages is not closed under inter-section, it is directly clear that adding this operator to the language of terms will yield alarger class of languages.

Further research directions include a coalgebraic characterization of context-free lan-guages in terms of pushdown automata [HMU06], and the study of coinductive decisionprocedures for bisimilarity of subclasses of the context-free languages of which equivalenceis decidable, such as deterministic pushdown automata [Sti01] and visibly pushdown au-tomata [AM04].

Acknowledgements

We would like to thank Alexandra Silva, for valuable suggestions and discussions. Further-more, we would like to thank the anonymous referees of the earlier CALCO paper for theircorrections and suggestions for improvements.

24 JOOST WINTER ET AL.

References

[AM04] Rajeev Alur and P. Madhusudan. Visibly pushdown languages. In Laszlo Babai, editor, STOC,pages 202–211. ACM, 2004.

[Awo10] Steve Awodey. Category Theory. Oxford University Press, 2010.[Bar04] Falk Bartels. On Generalized Coinduction and Probabilistic Specification Formats. PhD thesis,

Vrije Universiteit Amsterdam, 2004.[BRW12] Marcello M. Bonsangue, Jan J. M. M. Rutten, and Joost Winter. Coalgebraic characterizations

of context-free power series. In CMCS, 2012.[Brz64] Janusz A. Brzozowski. Derivatives of regular expressions. Journal of the ACM, 11:481–494, 1964.

[EL05] Zoltan Esik and Hans Leiß. Algebraically complete semirings and Greibach normal form. Annalsof Pure and Applied Logic, 133(1-3):173–203, 2005.

[GR62] Seymour Ginsburg and H. Gordon Rice. Two families of languages related to ALGOL. Journalof the ACM, 9:350–371, July 1962.

[Gre65] Sheila A. Greibach. A new normal-form theorem for context-free, phrase structure grammars.Journal of the Association for Computing Machinery, 12:42–52, 1965.

[HJ05] Ichiro Hasuo and Bart Jacobs. Context-free languages via coalgebraic trace semantics. In Jose LuizFiadeiro, Neil Harman, Markus Roggenbach, and Jan J. M. M. Rutten, editors, CALCO, volume3629 of Lecture Notes in Computer Science, pages 213–231. Springer, 2005.

[HMU06] John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman. Introduction to automata theory,languages, and computation (3rd ed.). Addison-Wesley, 2006.

[Jac06] Bart Jacobs. A bialgebraic review of deterministic automata, regular expressions and languages.In Essays Dedicated to Joseph A. Goguen, pages 375–404, 2006.

[Kli11] Bartek Klin. Bialgebras for structural operational semantics: An introduction. Theoretical Com-puter Science, 412(38):5043–5069, 2011.

[Koz94] Dexter Kozen. A completeness theorem for Kleene algebras and the algebra of regular events.Information and Computation, 110(2):366–390, 1994.

[Lei91] Hans Leiß. Towards Kleene algebra with recursion. In Egon Borger, Gerhard Jager, Hans KleineBuning, and Michael M. Richter, editors, CSL, volume 626 of Lecture Notes in Computer Science,pages 242–256. Springer, 1991.

[Mil10] Stefan Milius. A sound and complete calculus for finite stream circuits. In LICS, pages 421–430.IEEE Computer Society, 2010.

[Rut98] Jan J. M. M. Rutten. Automata and coinduction (an exercise in coalgebra). In Davide Sangiorgiand Robert de Simone, editors, CONCUR, volume 1466 of Lecture Notes in Computer Science,pages 194–218. Springer, 1998.

[Rut00] Jan J. M. M. Rutten. Universal coalgebra: a theory of systems. Theoretical Computer Science,249(1):3–80, 2000.

[Rut02] Jan J. M. M. Rutten. Coinductive counting: bisimulation in enumerative combinatorics. Electr.Notes Theor. Comput. Sci., 65(1):286–304, 2002.

[Rut03] Jan J. M. M. Rutten. Behavioural differential equations: a coinductive calculus of streams, au-tomata, and power series. Theoretical Computer Science, 308(1-3):1–53, 2003.

[Rut05] Jan J. M. M. Rutten. A coinductive calculus of streams. Mathematical Structures in ComputerScience, 15(1):93–147, 2005.

[SBBR10] Alexandra Silva, Filippo Bonchi, Marcello M. Bonsangue, and Jan J. M. M. Rutten. Generaliz-ing the powerset construction, coalgebraically. In Kamal Lodaya and Meena Mahajan, editors,FSTTCS, volume 8 of LIPIcs, pages 272–283. Schloss Dagstuhl – Leibniz-Zentrum fur Informatik,2010.

[SBR10] Alexandra Silva, Marcello M. Bonsangue, and Jan J. M. M. Rutten. Non-deterministic Kleenecoalgebras. Logical Methods in Computer Science, 6(3), 2010.

[Sil10] Alexandra Silva. Kleene Coalgebra. PhD thesis, Radboud Universiteit Nijmegen, 2010.[Sti01] Colin Stirling. Decidability of DPDA equivalence. Theoretical Computer Science, 255(1-2):1–31,

2001.[WBR11] Joost Winter, Marcello M. Bonsangue, and Jan J. M. M. Rutten. Context-free languages, coalge-

braically. In Andrea Corradini, Bartek Klin, and Corina Cırstea, editors, CALCO, volume 6859of Lecture Notes in Computer Science, pages 359–376. Springer, 2011.