Bruno Courcelle and Joost Engelfriet Graph Structure and Monadic Second-Order Logic, a Language Theoretic Approach April 2011 to be published by Cambridge University Press


  • Bruno Courcelle and Joost Engelfriet

    Graph Structure and Monadic Second-Order Logic, a Language Theoretic Approach

    April 2011

    to be published by

    Cambridge University Press

  • To Dominique and Louco

  • Contents

    Introduction

    1 Overview
      1.1 Context-free grammars
        1.1.1 Context-free word grammars
        1.1.2 Cographs
        1.1.3 Series-parallel graphs
        1.1.4 The general setting
        1.1.5 Derivation trees
      1.2 Inductive sets of properties and recognizability
        1.2.1 Properties of the words of a context-free language
        1.2.2 Some properties of series-parallel graphs
        1.2.3 Inductive sets of properties
        1.2.4 Recognizability
        1.2.5 From inductive sets to automata
      1.3 Monadic second-order logic
        1.3.1 Monadic second-order graph properties
        1.3.2 Monadic second-order logic and recognizability
      1.4 Two graph algebras
        1.4.1 The algebra of simple graphs with ports
        1.4.2 The algebra of graphs with sources
        1.4.3 A weak Recognizability Theorem
      1.5 Fixed-parameter tractability
      1.6 Decidability of monadic second-order logic
      1.7 Graph transductions
        1.7.1 Examples of monadic second-order transductions
        1.7.2 The main properties of monadic second-order transductions
        1.7.3 The Equationality Theorem
      1.8 Monadic second-order logic with edge set quantifications
        1.8.1 Expressing graph properties with edge set quantifications
        1.8.2 Monadic second-order transductions over incidence graphs
      1.9 Relational structures
        1.9.1 Relational signatures and structures
        1.9.2 Betweenness and cyclic ordering
        1.9.3 Relational databases
      1.10 References

    2 Graph algebras and widths of graphs
      2.1 Algebras and terms
      2.2 Graphs
      2.3 The HR algebra of graphs with sources
        2.3.1 The HR graph operations
        2.3.2 Construction of the s-graph defined by a term
        2.3.3 Algebraic properties and derived operations defined by contexts
      2.4 Tree-decompositions
        2.4.1 Tree- and path-decompositions
        2.4.2 Some properties of tree-decompositions
        2.4.3 Transformations of tree- and path-decompositions
        2.4.4 Tree-decompositions and chordal graphs
        2.4.5 A syntax for tree-decompositions
      2.5 The VR algebra of simple graphs with ports
        2.5.1 The VR graph operations
        2.5.2 Construction of the p-graph defined by a term
        2.5.3 Algebraic properties and derived operations defined by contexts
        2.5.4 Properties of clique-width
        2.5.5 Comparisons between tree-width and clique-width
        2.5.6 Variations on FVR
      2.6 Many-sorted graph algebras
        2.6.1 Many-sorted algebras
        2.6.2 The many-sorted HR algebra
        2.6.3 The many-sorted VR algebra
      2.7 References

    3 Equational and recognizable sets in many-sorted algebras
      3.1 The equational sets of an algebra
        3.1.1 Powerset algebras
        3.1.2 Equation systems and equational sets
        3.1.3 Context-free languages
        3.1.4 Equational sets of terms
        3.1.5 Homomorphic images of equational sets
        3.1.6 Equational sets of commutative words
      3.2 Transformations of equation systems
        3.2.1 Unfolding
        3.2.2 Simplifications of equation systems
        3.2.3 Using derived operations
        3.2.4 Closure properties of the class of equational sets
        3.2.5 Concluding remarks on equational sets
      3.3 Intermezzo on automata
        3.3.1 Automata on terms
        3.3.2 Pumping arguments
      3.4 The recognizable sets of an algebra
        3.4.1 Definitions and examples
        3.4.2 Recognizable sets of terms
        3.4.3 Recognizability and congruences
        3.4.4 Effective recognizability
        3.4.5 Inductive predicates
        3.4.6 Closure properties
        3.4.7 The Filtering Theorem
        3.4.8 Recognizable sets of commutative words
        3.4.9 Decidability questions
        3.4.10 Concluding remarks on recognizability
      3.5 References

    4 Equational and recognizable sets of graphs
      4.1 HR-equational sets of graphs
        4.1.1 HR equation systems
        4.1.2 HR-equational sets and tree-width
        4.1.3 Type analysis of HR equation systems
        4.1.4 Sizes of graphs and the finiteness problem
        4.1.5 Hyperedge replacement
      4.2 HR-recognizable sets of graphs
        4.2.1 Definitions and first examples
        4.2.2 A simpler HR-recognizability criterium
        4.2.3 Uncountably many HR-recognizable sets
        4.2.4 HR-recognizability and bounded tree-width
      4.3 VR-equational sets of simple graphs
        4.3.1 VR equation systems
        4.3.2 VR-equational sets and clique-width
        4.3.3 Type analysis of VR equation systems
        4.3.4 Comparison with the HR-equational sets
      4.4 VR-recognizable sets of graphs
        4.4.1 Definitions and examples
        4.4.2 A simpler VR-recognizability criterium
        4.4.3 Comparison with the HR-recognizable sets
      4.5 HR- and VR-, equational and recognizable sets
      4.6 References

    5 Monadic second-order logic
      5.1 Relational structures and logical languages
        5.1.1 Relational structures
        5.1.2 First-order logic
        5.1.3 Second-order logic
        5.1.4 Monadic second-order logic
        5.1.5 Logical definitions of properties of relational structures
        5.1.6 Decidability questions
      5.2 Graph properties expressible in monadic second-order logic
        5.2.1 Substitutions and relativization
        5.2.2 Transitive closure and path properties
        5.2.3 A worked example: the definition of square grids
        5.2.4 Monadic second-order definability of regular languages
        5.2.5 Edge set quantifications
        5.2.6 Cardinality predicates
        5.2.7 Expressive power of monadic second-order languages
      5.3 Monadic second-order logic and recognizability
        5.3.1 The Splitting Theorem for unions of disjoint concrete structures
        5.3.2 Quantifier-free transformations of structures
        5.3.3 Backwards translation with respect to QF operations
        5.3.4 The VR and HR graph operations
        5.3.5 The Splitting Theorem for derived operations
        5.3.6 Computing bounded theories
        5.3.7 The many-sorted algebra of relational structures
        5.3.8 The Recognizability Theorem for STRpres
        5.3.9 Recognizable languages and recognizable sets of graphs
        5.3.10 Handling general quantifier-free operations
      5.4 Decidable monadic second-order theories
      5.5 Logical characterization of recognizability
      5.6 Equivalences of logical formulas
        5.6.1 Boolean formulas
        5.6.2 Monadic second-order formulas
        5.6.3 Numbers of formulas and recognizability indices
      5.7 References

    6 Algorithmic Applications
      6.1 Fixed-parameter tractable algorithms for model-checking
      6.2 Decomposition and parsing algorithms
        6.2.1 Constructing tree-decompositions
        6.2.2 Parsing with respect to HR equation systems
        6.2.3 Graphs of bounded clique-width
      6.3 Monadic second-order formulas compiled into finite automata
        6.3.1 Automata
        6.3.2 Normalizing monadic second-order formulas
        6.3.3 Monadic second-order formulas on terms
        6.3.4 Monadic second-order properties of graphs of bounded clique-width
        6.3.5 Monadic second-order properties of graphs of bounded tree-width
      6.4 Other monadic second-order problems solved with automata
        6.4.1 Property-checking problems
        6.4.2 Listing and selection problems for monadic second-order queries
        6.4.3 Monadic second-order counting and optimizing functions
        6.4.4 Other algorithmic applications
        6.4.5 Optimality results
        6.4.6 Comparing some proofs of the Recognizability Theorem
      6.5 References

    7 Monadic second-order transductions
      7.1 Definitions and basic properties
        7.1.1 Transductions of relational structures
        7.1.2 Monadic second-order transductions producing structures with constants
        7.1.3 Transductions of words, terms and graphs
        7.1.4 The fundamental property of monadic second-order transductions
        7.1.5 Constructions of monadic second-order transductions
        7.1.6 Some particular monadic second-order transductions
        7.1.7 Comparing sets of structures via monadic second-order transductions
        7.1.8 Evaluation of monadic second-order transductions
      7.2 The Equationality Theorem for the VR algebra
      7.3 Graph transductions using incidence graphs
      7.4 The Equationality Theorem for the HR algebra
      7.5 Decidability of monadic second-order satisfiability problems
      7.6 Questions about logical characterizations of recognizability
      7.7 References

    8 Transductions of terms and words
      8.1 Terminology
        8.1.1 Terms and words
        8.1.2 Automata
        8.1.3 Logic
      8.2 Tree-walking transducers
      8.3 The basic characterization
      8.4 From jumping to walking
      8.5 From global to local tests
      8.6 Multi bottom-up tree-to-word transducers
      8.7 Attribute grammars and macro tree transducers
      8.8 Nondeterminism
      8.9 VR-equational sets of terms and words
      8.10 References

    9 Relational structures
      9.1 Two types of ternary relational structures related to ordered sets
        9.1.1 Betweenness
        9.1.2 Cyclic ordering
      9.2 Relational structures of bounded tree-width
      9.3 Terms denoting relational structures
        9.3.1 Monadic second-order model-checking problems
        9.3.2 From terms to relational structures by MS-transductions
        9.3.3 Width notions for relational structures
        9.3.4 A powerful subsignature of FQF and another width for relational structures
      9.4 Sparse relational structures
        9.4.1 Edge set quantifications in uniformly k-sparse graphs
        9.4.2 Uniformly k-sparse relational structures
        9.4.3 Consequences
      9.5 References

    Conclusion and open problems

    Index of notation

    Index of definitions

    Bibliography

  • Introduction

    This book contributes to several fields of Fundamental Computer Science. It extends to finite graphs several central concepts and results of Formal Language Theory and it establishes their relationship to results about Fixed-Parameter Tractability. These developments and results have applications in Structural Graph Theory. They make essential use of logic for expressing graph problems in a formal way and for specifying graph classes and graph transformations. We need some short historical accounts in order to describe these contributions.

    Formal Language Theory

    This theory has been developed with different motivations. Linguistics and compilation were among the first, around 1960. In view of the applications to these fields, different types of grammars, automata and transducers have been defined to specify formal languages, i.e., sets of words, and transformations of words called transductions, in finitary ways. The formalization of the semantics of sequential and parallel programming languages, which uses respectively program schemes and traces¹, the modelling of biological development and yet other applications have motivated the study of new objects, in particular of sets of terms.² These objects and their specifying devices have since been investigated from a mathematical point of view, independently of immediate applications. However, all these investigations have been guided by three main types of questions: comparison of descriptive power, closure properties (with effective constructions in case of positive answers) and decidability problems.

    ¹ Traces are equivalence classes of words for congruences generated by commutations of letters; see the book [*DiekRoz]. For program schemes, see [*Cou90a]. The list of references is divided into two parts. The first part lists books, book chapters and survey articles: the * in, e.g., [*DiekRoz] indicates a reference of this kind. The second part lists research articles and dissertations.

    ² In Semantics, one is also interested in infinite words, traces and terms. In this book these will not be considered.

    A context-free grammar generates words, hence specifies a formal language. However, each generated word has a derivation tree that represents its structure relative to the considered grammar. Such a tree, which can also be viewed as a term, is usually the support of further computation, typically a translation into a word of another language (this is the case in linguistics and in compilation). Hence, even for its initial applications, Formal Language Theory has had to deal with trees as well as with words. In Semantics, terms are even more important than words. Thus, sets of terms, usually called tree languages³, and transductions of terms, called tree transductions, have become central notions in Formal Language Theory.

    Together with context-free grammars, finite (also called finite-state) automata are among the basic notions of Language Theory, in particular for their applications to lexical analysis and pattern matching. They were also used early (around 1960) for building algorithms to check the validity of certain logical formulas, especially those of Monadic Second-Order Logic, in certain relational structures. On the other hand, Monadic Second-Order Logic can be used to specify and to classify sets of words and terms.⁴ There are deep relationships between monadic second-order formulas and finite automata that recognize words and terms (see [*Tho97a]). The fundamental result is that every language that is specified by a sentence of Monadic Second-Order Logic (expressing a property of words) can be recognized by a finite automaton, and vice versa. Moreover, the finite automaton can be constructed effectively from the sentence. This means that Monadic Second-Order Logic can be viewed as a high-level specification language that can be "compiled" into "machine code": a finite automaton that recognizes the words that satisfy the specification. The same result holds for terms, with respect to finite automata on trees. As a consequence of this fundamental relationship, Monadic Second-Order Logic is now one of the basic tools used in Formal Language Theory and its applications, in addition to context-free grammars, finite automata and finite transducers (which are finite automata with output).
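
    The correspondence between sentences and automata can be illustrated on a toy case. The sketch below is not the general construction: it takes one first-order (hence monadic second-order) property of words over {a, b}, namely "no position labelled a is immediately followed by a position labelled a", evaluates it directly on the word, and compares the result with a hand-built three-state automaton. The chosen property and the names satisfies, DELTA, dfa_accepts and agree_up_to are assumptions made for illustration only.

```python
from itertools import product


def satisfies(word: str) -> bool:
    """Evaluate the sentence directly: no two consecutive positions carry 'a'."""
    return not any(
        word[x] == "a" and word[x + 1] == "a" for x in range(len(word) - 1)
    )


# Hand-built deterministic automaton for the same property.
# State 0: the last letter read was not 'a' (or nothing was read yet).
# State 1: the last letter read was 'a'.
# State 2: two consecutive 'a' were seen (rejecting sink).
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 2, (1, "b"): 0,
    (2, "a"): 2, (2, "b"): 2,
}
ACCEPTING = {0, 1}


def dfa_accepts(word: str) -> bool:
    """Run the automaton from state 0 and test whether it ends in an accepting state."""
    state = 0
    for letter in word:
        state = DELTA[(state, letter)]
    return state in ACCEPTING


def agree_up_to(max_len: int) -> bool:
    """Check that the sentence and the automaton agree on all words up to max_len."""
    return all(
        satisfies("".join(w)) == dfa_accepts("".join(w))
        for n in range(max_len + 1)
        for w in product("ab", repeat=n)
    )
```

    The automaton reads each letter once, so it decides the property in time linear in the length of the word, whereas the sentence only states what is to be checked.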

    The extension of the basic concepts of Formal Language Theory to graphs is a natural step because graphs generalize trees. However, graphs have been present from the beginning in several of its fields. In compilation, one uses attribute grammars, which are context-free grammars equipped with semantic rules ([*AhoLSU, *Cre]). These rules associate graphs (called dependency graphs) with derivation trees. An attribute grammar is actually the paradigmatic example of a context-free graph grammar (based on hyperedge replacement rewriting rules, [*DreKH]). In the semantics of parallelism, traces are canonically represented by graphs, and an important concern is to specify them by finite automata ([*DiekRoz]).

    ³ In addition to being words, terms have canonical representations as labelled, rooted and ordered trees. They are thus called trees but this terminology is inadequate.

    ⁴ This logical language and the related one called μ-calculus ([*ArnNiw]) are also convenient for expressing properties of programs.

    One starting point of the research presented in this book has been the development of a robust theory of context-free graph grammars, of recognizability of graphs (to be short, an algebraic formulation of finite automata) and of graph transductions. In order to use the theory of context-free grammars and recognizability in arbitrary algebras initiated by Mezei and Wright in [MezWri], we choose appropriate (and natural) operations on graphs. Thus, graphs become the values of terms that are built with these (infinitely many) operations. Roughly speaking, a context-free graph grammar is a finite set of rules of the form $A_0 \to f(A_1, \ldots, A_n)$, $n \geq 0$, where each $A_i$ is a nonterminal of the grammar and f is one of the chosen graph operations. The rule means that if the graphs $G_1, \ldots, G_n$ are generated by respectively $A_1, \ldots, A_n$, then $A_0$ can generate the graph $f(G_1, \ldots, G_n)$. Such grammars have useful applications to Graph Theory: they can be used to describe many graph classes in uniform ways and to prove by inductive arguments certain properties of their graphs. Still roughly speaking, a set of graphs is recognizable if there is a finite automaton that recognizes all the terms that evaluate to a graph in the set. Thus, the automaton does not work directly on the given graph, but rather on any term that represents that graph. In a similar way one can define graph transductions through the use of tree transducers. Note that, to describe a set of graphs or a graph transduction in a finitary way, one can necessarily use only finitely many graph operations. As we will see, that is a rather severe, but natural restriction.
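
    The idea that an automaton works on terms rather than on graphs can be made concrete with a small sketch. It assumes two binary graph operations often used to build cographs, disjoint union and complete join, together with a constant for a single vertex; these operations, the chosen property ("the graph is complete") and all names in the code are illustrative assumptions, not definitions taken from the text above. The function evaluate computes the graph denoted by a term, while run is a two-state bottom-up automaton that reads only the term.

```python
from itertools import product

# A term is "v" (one isolated vertex) or a tuple (op, left, right), where op is
# "union" (disjoint union) or "join" (disjoint union plus all edges between
# the two sides). These operation names are illustrative assumptions.


def evaluate(term):
    """Return (n, edges): the graph denoted by the term, on vertices 0..n-1."""
    if term == "v":
        return 1, set()
    op, left, right = term
    n1, e1 = evaluate(left)
    n2, e2 = evaluate(right)
    # Shift the vertices of the right-hand graph so the two vertex sets are disjoint.
    edges = set(e1) | {(u + n1, v + n1) for (u, v) in e2}
    if op == "join":
        edges |= {(u, v + n1) for u in range(n1) for v in range(n2)}
    return n1 + n2, edges


def is_complete(n, edges):
    """Direct test on the graph: every pair of distinct vertices is adjacent."""
    return len(edges) == n * (n - 1) // 2


# Transition table of a two-state bottom-up automaton on terms.
# State "K": the subterm denotes a complete graph; state "N": it does not.
DELTA = {(op, p, q): "N" for op, p, q in product(("union", "join"), "KN", "KN")}
DELTA[("join", "K", "K")] = "K"


def run(term):
    """Compute the state reached at the root of the term, bottom-up."""
    if term == "v":
        return "K"
    op, left, right = term
    return DELTA[(op, run(left), run(right))]


def accepts(term):
    """Accept exactly the terms whose root state is 'K'."""
    return run(term) == "K"
```

    For instance, accepts(("join", "v", ("join", "v", "v"))) holds, since that term denotes a triangle. The automaton never builds the graph: it visits each node of the term once, which is the sense in which recognizability is defined through terms.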

    Our main goal will be to show that the fundamental use of Monadic Second-Order Logic as a high-level specification language carries over to graphs, not only for the specification of recognizable sets of graphs, but also for context-free sets of graphs and for certain types of graph transductions. This gives a new dimension to the above-mentioned fundamental result for words and terms, because the properties of graphs that can be specified in Monadic Second-Order Logic are more varied and useful than those of words and terms.

    We will specify a set of graphs by a monadic second-order sentence, and a graph transduction by a tuple of monadic second-order formulas that define an interpretation of the output graph in the input graph. From such a specification we will show how one can construct a finite automaton on terms, or a tree transducer in the second case, that is related to the specification as explained above. Note that the logic acts directly on the graphs, whereas the automata and transducers work on the terms that denote these graphs. Thus, Monadic Second-Order Logic can be viewed as playing the role of finite automata on graphs and finite transducers of graphs in our Formal Language Theory for Graphs.

    Graph algorithms

    The above-mentioned developments have important applications for the construction of polynomial-time algorithms on graphs. In his 16th NP-completeness column, published in 1985 [John], Johnson reviews a number of NP-complete graph problems that become polynomial-time solvable if their inputs are restricted to particular classes of graphs such as those of trees, of series-parallel graphs, of planar graphs, to name a few. For many of these classes, in particular for trees, almost trees (with parameter k), partial k-trees, series-parallel graphs, outerplanar graphs and cographs, the efficient algorithms take advantage of certain hierarchical structures of the input graphs. Because of these structures, these graphs are somehow close to trees⁵. The notion of a partial k-tree has emerged as a powerful one subsuming many other types of tree-like graphs. (The cographs have a canonical hierarchical structure but they are not included in the class of partial k-trees for any fixed k.) Many articles have produced polynomial-time algorithms for NP-complete problems restricted to partial k-trees. In 1994, Hedetniemi compiled a list of 238 references [*Hed] on partial k-trees and algorithms concerning them. The notion of a partial k-tree has also been used with a different terminology (tree-width, tree-decomposition) by Robertson and Seymour in their study of the structure of graph classes that exclude fixed graphs as minors. They formulate this notion in terms of particular decompositions of graphs, called tree-decompositions, that are at the basis of the construction of polynomial-time algorithms. Each tree-decomposition has a width, and a graph is a partial k-tree if and only if it has tree-width at most k, which means that it has a tree-decomposition of width at most k.

    The recent theory of Fixed-Parameter Tractability (the founding book by Downey and Fellows [*DowFel] was published in 1999) now gives a conceptual framework to most of these results. The notion of a fixed-parameter tractable algorithm specifies how the multiplicative constant factor of the time-complexity of a polynomial-time algorithm depends on certain parts of the data. It happens that for most of the graph algorithms based on tree-decompositions, the exponent of the polynomial is 1: these algorithms are linear-time in the size of the input graphs, with multiplicative constant factors that depend exponentially (or more) on the widths of the input tree-decompositions.
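
    As a reading aid, the shape of such a running time can be written in symbols. The notation is assumed here and is not taken from the text: n is the size of the input graph, k the width of the given tree-decomposition, T the running time, f a function of k alone and c a constant independent of k.

```latex
T(n, k) \;\le\; f(k) \cdot n^{c},
\qquad \text{with } c = 1 \text{ for most algorithms based on tree-decompositions.}
```

    In this reading, the "multiplicative constant factor" of the paragraph above is f(k): it is constant once k is fixed, and it grows exponentially or faster in k.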

    The explanation for this fact is one of the main goals of this book. We will show that, for a certain natural choice of graph operations, tree-decompositions correspond to terms, and tree-decompositions of width at most k correspond to terms that are built from a finite subset of those operations. A general algorithmic result that encompasses many of the above-mentioned results follows from the fundamental relationship between monadic second-order logic and finite automata discussed before: if the considered problem is specified by a monadic second-order sentence (and this is the case for many NP-complete graph problems not using numerical values in their inputs), then a finite automaton on the terms that encode the tree-decompositions of width at most k can be constructed (for each k) to give the answer to the considered question (for example, "Is the given graph 3-colorable?") where the input graph is given by a tree-decomposition (or a term encoding it). The linearity result follows because finite automata can be implemented so as to work in linear time (and because a tree-decomposition of a graph can be found in linear time).

    ⁵ These classes can actually be generated by certain context-free graph grammars and the corresponding hierarchical structures of the generated graphs are represented by their derivation trees. There is thus a close relationship between the algorithmic issues and the extensions of language theoretic concepts discussed above.

    We will extend the case of tree-width bounded graphs (already discussed in [*DowFel]) to another type of graph decomposition, based on another natural choice of graph operations. This leads to the notion of clique-width of a graph. Clique-width is more powerful than tree-width in the sense that every set of graphs of bounded tree-width has bounded clique-width but not vice versa, an example being the set of cographs. On the other hand, in the above general result, the monadic second-order sentences must be restricted to use quantifications on sets of vertices (instead of both vertices and edges), so fewer graph problems can be specified. The algorithms are cubic-time instead of linear-time because, for these graph operations, cubic time is needed to find a term for a given graph.

    The theory set out in the nine chapters of this book has arisen from the confluence of the two main research directions presented above. The remainder of this introduction will present in a more detailed way, but still informally, the main concepts and results.

    The role of logic

    We will study and compare finitary descriptions of sets of finite graphs by using concepts from Logic, Universal Algebra and Formal Language Theory. We first explain the role of Logic. A graph⁶ can be considered as a logical structure (also called relational structure) whose domain (also called its universe) consists of the vertices, and that is equipped with a binary relation that represents adjacency. Graph properties can thus be expressed by logical formulas of different languages and classified accordingly.

    First-order formulas are rather weak in this respect because they can only express local properties such as that a graph has maximum degree or diameter bounded by a fixed integer. Most properties of interest in Graph Theory can be expressed by second-order formulas: these formulas can use quantifications on relations of arbitrary arity. Unfortunately, little can be obtained from the expression of a graph property in second-order logic. Our favourite logical language will be its restriction called Monadic Second-Order Logic. Its formulas are the second-order formulas that only use quantifications on unary relations, i.e., on sets. They can express many useful graph properties like connectivity, p-colorability (for fixed p) and minor inclusion, whence planarity. Such properties are said to be monadic second-order expressible, and the corresponding sets of graphs are monadic second-order definable.
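
    As an illustration of monadic second-order expressibility, 3-colorability can be written as a sentence that quantifies over three sets of vertices. The notation is an assumption made for this sketch: edg(x, y) stands for the adjacency relation, and the big conjunction abbreviates three explicit conjuncts.

```latex
\exists X_1\, \exists X_2\, \exists X_3\, \Bigl[
  \forall x\, (x \in X_1 \lor x \in X_2 \lor x \in X_3)
  \;\land\;
  \forall x\, \forall y\, \Bigl( \mathit{edg}(x, y) \rightarrow
     \bigwedge_{i=1}^{3} \lnot (x \in X_i \land y \in X_i) \Bigr)
\Bigr]
```

    The sets are not required to be disjoint: if a vertex lies in several of them, any one can be chosen as its color, since no set contains both ends of an edge. Only set variables and vertex variables are quantified, which is what makes the sentence monadic.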

    ⁶ In order to simplify the discussion, we only discuss simple graphs, i.e., graphs without parallel edges.

    These logical expressions have interesting algorithmic consequences as explained above, but only for graphs that are somehow tree-like (because 3-colorability is NP-complete and expressible by a monadic second-order sentence). Monadic second-order sentences are also used in Formal Language Theory to specify languages, i.e., sets of words or terms. The fundamental result establishes that monadic second-order sentences and finite automata have the same descriptive power. But monadic second-order formulas are even more important for specifying sets of graphs than for specifying languages because there is no convenient notion of graph automaton. They replace finite automata, not only for specifying sets of graphs, but also for specifying graph transformations. Such transformations, called monadic second-order transductions, generalize the transductions of terms and words defined by finite automata with output called finite transducers.⁷ Independently of these language theoretic applications, monadic second-order transductions are technically useful for constructing monadic second-order formulas because the inverse image of a monadic second-order definable set of relational structures under a monadic second-order transduction is monadic second-order definable.

    However, monadic second-order logic alone yields no interesting results. In order to be useful for the construction of algorithms, the expression of a graph property by a monadic second-order sentence must be coupled with constraints on the graphs of interest such as having bounded tree-width or bounded clique-width. The language theoretical issues to be discussed below will also combine monadic second-order sentences and the very same constraints. Hence, we will study certain hierarchical graph decompositions, like tree-decompositions, that fit with monadic second-order logic.

    Graph algebras

    Graph decompositions will be formalized algebraically by terms written with appropriate graph operations. Hence, we will use concepts from Universal Algebra in addition to ones from Logic.

    For treating graphs as algebraic objects, i.e., as elements of appropriate algebras (words and traces are elements of monoids), we will define graph operations that generalize the concatenation of words. We will consider two natural ways to concatenate two graphs. One way is to glue them together, by identifying some of their vertices. The other way is to bridge them (or rather, bridge the gap between them), by adding edges between their vertices. Clearly, to obtain single-valued operations, we have to specify which vertices must be glued or bridged. By means of labels attached to vertices, we will specify that vertices with the same label must be identified, or that edges must be created between all vertices with certain labels. Hence, we will define concatenation operations on labelled graphs. To allow the flexible use of vertex labels, we also define (unary) operations that modify these labels. Terms written with these operations evaluate to finite (labelled) graphs. The value G of a term t = f(t1, t2) is a certain combination, specified by f, of the values of its subterms t1 and t2. These values are, roughly speaking, subgraphs of G (only roughly because the labels of the vertices of the graphs defined by t1 and t2 may differ from their labels in the resulting graph G). The same holds for all subterms of t, hence, t represents a hierarchical decomposition of G.
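    The two ways of concatenating labelled graphs can be sketched as follows. This is a minimal illustration, not the formal definition given later in the book: the function names (vertex, disjoint_union, bridge, relabel, glue) and the representation of a graph as a pair (labels, edges) are assumptions made for the example, and glue assumes that each label occurs at most once in each argument.

```python
from itertools import count

_fresh = count()  # supplies vertex names that are never reused

def vertex(label):
    """One-vertex graph; a graph is a pair (labels, edges)."""
    return ({next(_fresh): label}, set())

def disjoint_union(g, h):
    (lg, eg), (lh, eh) = g, h
    return ({**lg, **lh}, eg | eh)

def bridge(g, a, b):
    """Add an edge between every a-labelled and every b-labelled vertex."""
    labels, edges = g
    new = {frozenset((u, v))
           for u, lu in labels.items() if lu == a
           for v, lv in labels.items() if lv == b and u != v}
    return (labels, edges | new)

def relabel(g, a, b):
    """Unary operation: every label a becomes b."""
    labels, edges = g
    return ({v: (b if l == a else l) for v, l in labels.items()}, edges)

def glue(g, h):
    """Disjoint union, then identify vertices carrying the same label.
    Assumes each non-None label occurs at most once in g and in h."""
    (lg, eg), (lh, eh) = g, h
    by_label = {l: v for v, l in lg.items() if l is not None}
    rename = {v: by_label.get(l, v) for v, l in lh.items()}
    labels = dict(lg)
    for v, l in lh.items():
        labels.setdefault(rename[v], l)
    edges = eg | {frozenset(rename[x] for x in e) for e in eh}
    return (labels, edges)

# Bridging: two vertices labelled 1 and one labelled 2 give a 3-vertex path.
star = bridge(disjoint_union(disjoint_union(vertex(1), vertex(1)), vertex(2)), 1, 2)

# Gluing: edges 1-2 and 2-3 share the vertex labelled 2.
edge12 = bridge(disjoint_union(vertex(1), vertex(2)), 1, 2)
edge23 = bridge(disjoint_union(vertex(2), vertex(3)), 2, 3)
path = glue(edge12, edge23)
```

    Both star and path are values of terms over these operations, and the nesting of the function calls is the hierarchical decomposition that the term represents.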

    [7] In particular, the rational transductions that are transductions of words defined either by finite(-state) transducers or, algebraically, in terms of homomorphisms and regular languages.

    Based on the idea of gluing graphs (and using the numbers 1, ..., k + 1 as labels), we will define, for each k, a finite set of graph operations, F^HR_[k+1], that generates exactly the graphs of tree-width at most k. Hence, these operations formalize algebraically an existing combinatorial notion. They yield a graph algebra (that generalizes the monoid of words) having countably many operations. We will call it the HR algebra for reasons explained below. Another countable family of graph operations, also indexed by positive integers and based on the idea of bridging graphs, will yield a different graph algebra, called the VR algebra, and a graph complexity measure called clique-width. By definition, a graph has clique-width at most k if it is generated by the analogous finite set of graph operations F^VR_[k]. As observed before, clique-width is more powerful than tree-width in the sense that every set of graphs of bounded tree-width has bounded clique-width but not vice-versa. Many definitions and results will be similar for these two graph algebras. We will explain below why both algebras are interesting.

    The introduction of graph operations is essential for our project of extending to graphs the basic concepts of Formal Language Theory in a clean way. We will use for that the algebraic notions of an equational set and of a recognizable set. An equational set is a component of the least solution of an equation system written with set union and the operations of the considered algebra. Equation systems formalize context-free grammars that generate elements of the algebra: if such a context-free grammar has, e.g., three rules A → f(B,C), A → g(A) and A → a for the nonterminal A, where B and C are two other nonterminals, f and g are operations of the algebra and a is a constant of the algebra, then the corresponding equation system has the equation A = f(B,C) ∪ g(A) ∪ {a} (where A, B and C now stand for sets of elements of the algebra). The context-free languages are actually the equational sets of the monoids of words over their terminal alphabets (due to the least fixed-point characterization of context-free grammars of [GinRic] and [ChoSch]). A recognizable set is a set saturated by a congruence having finitely many classes. The regular languages are thus the recognizable sets of the monoids of words. When all elements of the algebra can be denoted by a term (which is the case for the HR and VR algebras), a set is recognizable if and only if there exists a finite automaton on terms that recognizes all the terms that evaluate to an element of the set.
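    The least-solution reading of such a system can be made concrete by iterating the equations from empty sets. The sketch below works in the monoid of words under assumptions that are not in the text: f is taken to be concatenation, g prefixes the letter d, the constant a is the one-letter word a, and the two other nonterminals are given the hypothetical equations B = {b} and C = {c}. Since the least solution is infinite here, the iteration is truncated at a length bound.

```python
def least_solution(max_len):
    """Iterate A = f(B,C) ∪ g(A) ∪ {a}, B = {b}, C = {c} from empty sets,
    keeping only words of length at most max_len, until nothing changes."""
    A, B, C = set(), set(), set()
    while True:
        new_B = {"b"}
        new_C = {"c"}
        new_A = ({x + y for x in B for y in C}   # f(B, C): concatenation
                 | {"d" + x for x in A}          # g(A): prefix with d
                 | {"a"})                        # the constant a
        new_A = {w for w in new_A if len(w) <= max_len}
        if (new_A, new_B, new_C) == (A, B, C):
            return A
        A, B, C = new_A, new_B, new_C

words = least_solution(3)
```

    Truncation is harmless in this example because both operations only lengthen words, so every word of the least solution within the bound is still produced.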

    The chart of Figure 1 shows some relationships between the above defined notions. An arrow means: used for a definition or a construction.

    Two graph algebras

    Since we will define two graph algebras, we will obtain two types of equational sets, called the HR- and the VR-equational sets. For each k, the set of graphs of tree-width at most k is HR-equational (because it is generated by the finite set of operations F^HR_[k+1]), and similarly, the set of graphs of clique-width at most k is VR-equational. There are also two types of recognizable sets of graphs, the HR- and the VR-recognizable sets. Every HR-equational set is VR-equational and every VR-recognizable set is HR-recognizable, but not vice-versa. The class of HR-equational sets is incomparable with the class of HR-recognizable sets, and similarly for the VR algebra.

    Figure 1: The main notions.

    These facts show some important differences with the case of words. For words, we have a unique algebraic structure based on a single operation, and the class of recognizable sets (the regular languages) is properly included in that of equational sets (the context-free languages). But graphs are intrinsically more complicated than words: this explains why we need countably many operations and not just one. We will explain next why we have two algebras and two (robust) classes of equational sets that both generalize the class of context-free languages.

    The two graph algebras have been defined initially in such a way that their equational sets coincide with existing context-free sets of graphs: the HR-equational sets are actually (but not by definition) those generated by certain context-free graph grammars based on a rewriting mechanism called Hyperedge Replacement (that uses gluing of graphs) and we call the corresponding algebra the HR algebra to refer to this fact; the other algebra, called the VR algebra, has been designed similarly so that its equational sets are those generated by the context-free graph grammars based on Vertex Replacement (that uses bridging of graphs); see [*DreKH] and [*EngRoz] respectively for these two types of graph grammars.

    Many properties of the equational and recognizable sets of graphs of both kinds are just particular instances of those of the equational and recognizable sets in arbitrary algebras. By using this algebraic approach, we generalize the context-free languages without having to define a graph rewriting mechanism and check that such rewriting is actually context-free (the general notion of context-free rewriting is defined in [Cou87]). Similarly, we generalize the regular languages without having to define any notion of graph automaton and to look for closure properties of the class of sets of graphs that are recognized by such automata.

    Monadic Second-Order Logic and the VR graph algebra

    We first discuss the equational sets and the recognizable sets of the VR algebra, and their relationships with monadic second-order logic.

    Two main results of this book are the Recognizability Theorem and the Equationality Theorem. They relate two ways of handling graphs: the logical way by which graphs are characterized in terms of what they are made of and contain (vertices, edges, paths, minors, subgraphs with particular properties) and the algebraic way by which sets of graphs are characterized more globally by means of equation systems and congruences. In the latter approach, graphs are treated as elements of algebras and related with other elements that are not necessarily among their subgraphs.

    The Recognizability Theorem says that if a set of graphs is monadic second-order definable, then it is VR-recognizable. The Equationality Theorem says that a set of graphs is VR-equational if and only if it is the image of the set of finite trees under a monadic second-order transduction[8]. Here are some consequences of these two results.

    The Recognizability Theorem entails that if a graph G is defined by a term t written with operations of the VR algebra belonging to any fixed finite set F, then one can check in time O(|t|), whether or not G satisfies a fixed monadic second-order property. This fact, based on a compilation of monadic second-order formulas into finite automata over F, is one of the keys[9] to the construction of fixed-parameter tractable algorithms for the verification of monadic second-order properties of graphs of bounded clique-width (whence also of graphs of bounded tree-width since bounded tree-width implies bounded clique-width). Another consequence is the Filtering Theorem saying that the graphs of a VR-equational set that satisfy a fixed monadic second-order property (for example planarity) form a VR-equational set. This is based on a Filtering Theorem that holds in all algebras and says that the intersection of an equational set and a recognizable one is equational (generalizing the corresponding fact for context-free and regular languages). Since the emptiness of an equational set is decidable, we get as another corollary that the monadic second-order satisfiability problem is decidable for each VR-equational set L. This means that one can decide whether or not a given monadic second-order sentence is satisfied by some graph in L.
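    The linear-time check can be pictured as a deterministic bottom-up automaton that computes one state per node of t. The sketch below is a toy instance under assumptions not taken from the text: the hypothetical signature consists of a constant v (a single vertex) and two binary operations, union (disjoint union) and join (disjoint union plus all edges between the two parts), and the property checked is that the graph is complete, which is first-order and hence monadic second-order expressible. It is not the general compilation of formulas into automata.

```python
CLIQUE, OTHER = "clique", "other"  # the two states of the automaton

def delta(op, left_state, right_state):
    """Transition function for the binary operations of the toy signature."""
    if op == "join":
        # A join is complete iff both parts are complete.
        return CLIQUE if left_state == right_state == CLIQUE else OTHER
    if op == "union":
        # Two nonempty parts with no edge between them are never complete.
        return OTHER
    raise ValueError(f"unknown operation: {op}")

def run(term):
    """Evaluate the automaton bottom-up; each node of the term is visited once."""
    if term == "v":
        return CLIQUE
    op, left, right = term
    return delta(op, run(left), run(right))

triangle = ("join", "v", ("join", "v", "v"))   # the complete graph K3
path3 = ("join", "v", ("union", "v", "v"))     # a path with 3 vertices
```

    Each node costs one table lookup, so the running time is proportional to the size of the term; the graph itself is never built.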

    The Equationality Theorem entails that the class of VR-equational sets of graphs is preserved under monadic second-order transductions, because the class of monadic second-order transductions is closed under composition. This corollary strengthens the Filtering Theorem. It is similar to the fact that the image of a context-free language under a rational transduction is context-free.

    [8] This means, informally, that it is the set of graphs defined inside finite trees by a fixed finite tuple of monadic second-order formulas. These transductions are based on, and extend, the model-theoretical notion of interpretation.

    [9] The other one is a polynomial-time algorithm that finds a term evaluating to a given graph G if there exists one.

    Monadic second-order logic and the HR graph algebra

    The Recognizability and the Equationality Theorems have versions relative to the HR algebra. For describing them, we must go back to the initial definition of monadic second-order formulas (MS formulas in the sequel) interpreted in graphs: they only use quantifications on vertices and sets of vertices. This is due to the chosen representation of a graph by a relational structure whose domain is its set of vertices. However, we can also express logically the properties of a graph G via its incidence graph Inc(G). The vertices of this (bipartite) graph are the vertices and the edges of G, and its adjacency relation links a vertex and the edges incident with it. Thus, monadic second-order formulas to be interpreted in Inc(G) (MS2 formulas in the sequel) can also use quantifications on edges and sets of edges. A graph property is MS2-expressible if it is expressible by an MS2 formula, and the corresponding set of graphs is MS2-definable. The notation MS2 refers to this extension of the initially defined language (referred to by MS in the sequel). It is strictly more expressive. For example, the existence of a perfect matching is MS2-expressible but not MS-expressible. However, MS2 formulas are not more expressive than MS formulas for properties of words, of trees and of certain types of graphs such as planar graphs and, for each k, of graphs of degree at most k. These facts show the existence of deep links between structural graph properties (such as planarity) and the expressive power of MS2 versus MS sentences.
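    The passage from G to Inc(G) can be sketched in a few lines. The representation (vertices as strings, edges as two-element frozensets) and the function names are assumptions made for this example. The second function checks the condition that an MS2 sentence for perfect matchings asserts about a set of edges, which is why edge set quantification is needed to state it.

```python
def incidence_graph(vertices, edges):
    """Return (nodes, inc) for Inc(G): the nodes are the vertices and the
    edges of G, and inc links each edge to the vertices incident with it."""
    nodes = set(vertices) | set(edges)
    inc = {(e, v) for e in edges for v in e}
    return nodes, inc

def is_perfect_matching(vertices, matching):
    """True iff every vertex lies on exactly one edge of `matching`
    (the caller is expected to pass a subset of the edges of G)."""
    return all(sum(1 for e in matching if v in e) == 1 for v in vertices)

# A path a - b - c - d.
V = {"a", "b", "c", "d"}
ab, bc, cd = frozenset("ab"), frozenset("bc"), frozenset("cd")
E = {ab, bc, cd}
nodes, inc = incidence_graph(V, E)
```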

    The Recognizability Theorem for the HR algebra says that every MS2-definable set of graphs is HR-recognizable, and the Equationality Theorem says that a set of graphs is HR-equational if and only if the set of its incidence graphs is the image of the set of finite trees under a monadic second-order transduction. We obtain an algorithmic consequence similar to the one we have discussed for MS-expressible problems and the VR algebra: if a graph is defined by a term t over F^HR_[k] for some fixed k, then one can check in time O(|t|) whether or not it satisfies a fixed MS2 property. Since there exists a polynomial-time algorithm that decomposes appropriately the input graphs, we obtain, for each MS2 property, a fixed-parameter tractable verification algorithm, tree-width being the parameter. The algorithm for MS properties applies to larger classes of graphs, because bounded tree-width implies bounded clique-width, but to fewer properties than this one, because not every MS2-expressible property is MS-expressible. The notions of tree-width and clique-width are thus both useful, for solving different problems. We also have a Filtering Theorem for the HR-equational sets and MS2-expressible properties, whence the decidability of the MS2-satisfiability problem for each HR-equational set. The Equationality Theorem for the HR algebra entails that the class of HR-equational sets of graphs is preserved under the monadic second-order transductions that transform incidence graphs.

    A graph is uniformly k-sparse if its number of edges is at most k times its number of vertices, and the same holds for all its subgraphs. Another main result of this book is the Sparseness Theorem: MS2 formulas are not more expressive than MS formulas for properties of uniformly k-sparse graphs, for each fixed k. The above-mentioned types of graphs are uniformly k-sparse for some k.
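    The definition of uniform k-sparseness can be checked directly, if inefficiently, by enumerating vertex subsets. The sketch below is only meant to make the definition concrete: the function name is hypothetical and the running time is exponential. It tests induced subgraphs only, which suffices because, among the subgraphs on a given vertex set, the induced one has the most edges.

```python
from itertools import combinations

def is_uniformly_k_sparse(vertices, edges, k):
    """Brute-force check that every subgraph has at most k times as many
    edges as vertices (edges are pairs of vertices)."""
    vs = list(vertices)
    for size in range(1, len(vs) + 1):
        for subset in combinations(vs, size):
            s = set(subset)
            # Number of edges of the subgraph induced by s.
            m = sum(1 for e in edges if set(e) <= s)
            if m > k * len(s):
                return False
    return True

# The complete graph K4 has 6 edges on 4 vertices.
K4_vertices = [1, 2, 3, 4]
K4_edges = list(combinations(K4_vertices, 2))
```

    On this example, K4 fails the test for k = 1 (6 edges exceed 4 vertices) and passes it for k = 2.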

    Logical and Language Theoretical issues

    Tree-width and clique-width are closely related to the decidability of monadic second-order satisfiability problems for particular sets of graphs. The satisfiability problem of MS2 sentences for the set of graphs of tree-width at most some fixed k is decidable (because it is decidable for each HR-equational set), and the same holds for MS sentences and the set of graphs of clique-width at most k. Some converse results also hold: bounded tree-width is a necessary (but not sufficient) condition for a set of graphs to have a decidable MS2-satisfiability problem, and a similar result holds for clique-width and MS sentences. Their proofs use monadic second-order transductions and deep results of Graph Theory.

    The Recognizability and the Equationality Theorems contribute to establishing the foundations of a sound and robust extension of the theory of formal languages to the description of sets of finite graphs. In this extension, monadic second-order logic plays a major role. From the above informal (and simplified) statements, this extension may seem to be straightforward. However, graphs are intrinsically more complex than words and terms, and some results do not extend as one could expect or hope. We give two examples. First, the set of all graphs is not equational in either of the two graph algebras, whereas the set of all words over a finite alphabet is (almost trivially) context-free. Second, there are uncountably many VR- and HR-recognizable sets of graphs, and this fact prevents any exact characterization of these sets in terms of graph automata or logical formulas. Such a characterization would generalize nicely the classical characterization of the recognizable (i.e., the regular) languages in terms of finite automata and monadic second-order sentences, but it cannot exist. These examples are related to the fact that the sets of operations of the two graph algebras are infinite, and that this infiniteness is somehow unavoidable[10].

    Graph structure

    [10] One can generate all finite graphs by a finite number of graph operations, but the Recognizability Theorem fails for the corresponding algebra. So this algebra is useless for our purposes.

    Graph structure is a flexible concept covering many different cases. Graph decompositions form an important type of structuring. We have already discussed those that yield the notions of tree-width and clique-width in connection with algorithmic applications. There exist other types of graph decomposition that are useful for algorithmic purposes or for proving results. Examples are the modular decomposition defined by Gallai [Gal], the decomposition in 3-connected components defined by Tutte [*Tut] and the clique-sum decomposition used by Robertson and Seymour [*Gro, RobSey03]. The existence of an embedding in a fixed surface, or of a homomorphism into a fixed graph (a proper vertex coloring with p colors of a loop-free graph can be defined as a homomorphism of this graph into the complete graph Kp) is a type of structure. Finally, the non-existence in a graph of particular induced subgraphs, minors or vertex-minors is also an important type of structural property. (See [*Die] for minors and [Oum05] for vertex-minors). There exist nontrivial relations between these different notions. Here are some examples: the graphs without a fixed planar graph as a minor have tree-width bounded by a value computable from this graph and those embeddable in a fixed surface are characterized by finitely many excluded minors [*Die]; forbidding certain induced subgraphs implies bounded clique-width [BraDLM, BraELL].

    Monadic second-order sentences can express many such structural properties. The expression of p-colorability (for fixed p) is immediate. It is easy to construct a monadic second-order sentence expressing that a given graph has no minor or no induced subgraph isomorphic to a fixed graph. Hence the sets of planar graphs and of graphs of tree-width at most k (for each k) are MS-definable because each of them is characterized by finitely many excluded minors. A set of graphs defined by finitely many excluded induced subgraphs (this is the case of cographs) or by an infinite but MS-definable set of minimal excluded induced subgraphs is also MS-definable. The latter observation applies to comparability graphs ([Cou06a, Gal]) and to perfect graphs ([*ChuRST, ChuRST]). Their definitions are not directly expressible by monadic second-order sentences, and finding the minimal excluded induced subgraphs requires difficult proofs.
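    For p = 3, the sentence for p-colorability can be written as follows. This is a sketch for loop-free graphs, assuming a binary relation symbol edg for adjacency (the symbol name is an assumption made here). The three sets need only cover the vertices: a vertex lying in several of them can keep any one of these colors.

```latex
\exists X_1\, \exists X_2\, \exists X_3 \,\Bigl[
  \forall x\, (x \in X_1 \vee x \in X_2 \vee x \in X_3)
  \;\wedge\;
  \forall x\, \forall y\, \Bigl( \mathit{edg}(x,y) \Rightarrow
     \bigwedge_{i=1}^{3} \neg\, (x \in X_i \wedge y \in X_i) \Bigr)
\Bigr]
```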

    In many situations concerning graph structure, we need more than a yes or no answer. For example, that a graph does not contain K5 or K3,3 as a minor implies that it is planar, but this negative fact, when it is valid, does not help to find a planar embedding. In other words, we are not only interested in checking that a given graph has some structure, e.g., has a planar embedding or a tree-decomposition of width bounded by a fixed integer, but we are also interested in having a monadic second-order transduction that constructs from the given graph some planar embedding or some tree-decomposition. Such transductions may be difficult to construct. Some constructions are given in [Cou96a, Cou99, Cou00, Cou06b, Cou08a] and challenging questions remain open in this area.

    To conclude with this aspect, we can state that many constructions of monadic second-order formulas and transductions use in an essential way results of Graph Theory, and even very deep ones in some cases. Conversely, the methods developed in this book bring new results in Graph Theory apart from algorithmic applications. For example, the infinite set of minimal excluded induced subgraphs that characterizes the comparability graphs has a certain regularity that we can formalize by observing that this set is VR-equational. Such applications deserve further study.

    The main contributions of this book

    Let us summarize the main ideas and results to be developed in this book. We define two graph algebras called the HR algebra and the VR algebra, from which we get two classes of equational and two classes of recognizable sets of graphs. The terms of these algebras denote graphs and formalize certain hierarchical decompositions from which we get the graph complexity measures called tree-width and clique-width.

    Monadic second-order logic in its two variants denoted by MS and MS2 can be used to express formally graph properties and thus to specify sets of graphs. The Recognizability Theorem says that every MS-definable set of graphs is recognizable in the VR algebra and that every MS2-definable set is recognizable in the HR algebra. We obtain from it fixed-parameter tractable algorithms for checking MS and MS2 properties with, respectively, clique-width and tree-width as parameters. It entails that the corresponding monadic second-order satisfiability problems are decidable for the equational sets of the two algebras. The Sparseness Theorem says that MS and MS2 logic have the same power for defining sets of uniformly k-sparse graphs.

    Graph transformations called monadic second-order transductions can be specified by MS or by MS2 formulas. The Equationality Theorem says that they generate from the set of trees, respectively, the equational sets of the VR and of the HR algebra. This shows the robustness of this theory that combines algebraic and logical notions. Its main definitions and results are actually formulated for relational structures which generalize graphs and incidence graphs, but several problems are open regarding this extension.

    We will only consider finite graphs and finite relational structures. Another book would be necessary to cover the rich existing theory of countable structures that is important in Program Semantics.

    Summary

    The letters GT, UA, LT, L and A indicate that a chapter deals mainly with Graph Theory, Universal Algebra, (Formal) Language Theory, Logic and Algorithmic applications respectively.

    Chapter 1 is an overview which presents the main definitions and results in an informal way, with the help of examples.

    Chapter 2 (GT, UA) defines two families of graph operations and the associated graph complexity measures of tree-width and clique-width. The two corresponding graph algebras, called the HR algebra and the VR algebra, are first defined as single-sorted algebras and, later on, they are refined into many-sorted algebras.

    Chapter 3 (UA, LT) defines and studies the equational and recognizable sets of many-sorted algebras in general. Its main result is the (algebraic version of the) Filtering Theorem.

    Chapter 4 (GT, LT) applies the definitions and results of Chapter 3 to the graph algebras defined in Chapter 2 and establishes results which do not follow only from the general algebraic definitions.

    Chapter 5 (L, UA, GT) introduces monadic second-order logic and develops tools for expressing graph properties by monadic second-order formulas. Definitions and proofs are given for relational structures. In particular, the Recognizability Theorem is proved for a many-sorted algebra of finite relational structures. The particular cases of this theorem for the HR and the VR graph algebras follow as immediate corollaries.

    Chapter 6 (L, LT, A) is devoted to algorithmic applications. It reviews the parsing algorithms that construct the necessary expressions of the input graphs by terms over the operations of the HR and of the VR algebra. It develops in detail the compilation of monadic second-order formulas into finite automata intended to run on the terms resulting from the parsing step. This construction is hopefully more usable than the one of Chapter 5. It yields alternative proofs of weaker versions of the Recognizability Theorem.

    Chapter 7 (L, LT, GT) defines monadic second-order transductions and establishes their main properties: closure under composition and preservation of monadic second-order definability under inverse monadic second-order transductions: we call this latter result the Backwards Translation Theorem. The Equationality Theorem characterizes the VR- and the HR-equational sets as the images of the set of trees (equivalently of terms over any rich enough functional signature) under monadic second-order transductions of appropriate types. Four types of transductions come from the two possible representations of a graph by a relational structure for the input and the output. The Equationality Theorems characterize bounded clique-width and bounded tree-width in a way that does not depend on the graph operations chosen in Chapter 2. Hence, VR- and HR-equationality as well as the properties of having bounded clique-width and tree-width are robust in the sense that they are stable under the monadic second-order transductions that are respectively specified by MS and by MS2 formulas.

    Chapter 8 (L, LT) shows that the classical automata-theoretic characterization (recalled in Chapter 5) of the monadic second-order definable sets of terms (hence, also of words) extends to monadic second-order transductions. More precisely, these transductions are characterized in terms of two-way finite-state transducers on words, and of tree-walking transducers on terms, where "tree" refers to the representation of terms by labelled ordered trees. Characterizations of the VR-equational (equivalently of the HR-equational) languages of words and terms are obtained. Every (functional) monadic second-order transduction of graphs of bounded tree-width can be realized by a tree-walking transducer on the level of terms (of the HR algebra), i.e., it can be realized by parsing the input graph, applying the tree transducer to the resulting input term, and evaluating the output term of the transducer to produce the output graph. The same result holds for clique-width and the VR algebra. This can be viewed as a generalization of the Recognizability Theorem to graph transductions.

    Chapter 9 (L, GT, UA, A) extends to finite relational structures the definitions and results of Chapters 2 to 7. It contains in particular an extension to relational structures of the Equationality Theorem. Although many results extend easily from graphs to relational structures, some seemingly difficult questions remain open. Additionally, this chapter proves the Sparseness Theorem which establishes that, for expressing properties of uniformly k-sparse graphs (or relational structures[11]) by monadic second-order formulas, quantifications over sets of edges (or sets of tuples, respectively) bring no additional power: every MS2 formula can be translated into an equivalent MS formula.

    Chapter 10 reviews some open problems and some results not presented in the previous chapters.

    The bibliography is organized in two parts: the first part (with reference labels starting with *) lists books, book chapters and survey articles. The second one lists research articles and dissertations.

    All necessary definitions will be given, but the reader is expected to be familiar with the basic notions of Logic (mainly first-order logic), of Universal Algebra (algebras, congruences), of Formal Language Theory (context-free grammars, finite automata), and of Graph Theory (basic notions). Chapters 2 to 9 present detailed proofs of results that have been published in articles. It was not an easy task to elaborate consistent definitions and notations for many different notions from various fields, namely Language Theory, Universal Algebra, Logic and Graph Theory. By giving precise definitions and carefully written proofs, our first aim is to give a robust foundation to the field described in this introduction. Our second aim is that these definitions and proofs can be adapted to related notions, implemented and improved by researchers without too much effort.

    Acknowledgements

    We thank M. Nivat for writing a preface. The LaTeX typing of the first drafts of most chapters has been done by graduate and doctoral students of Bordeaux 1 University: M. Kante, R. Chen, R. Synave, R. Li and S. Abbas. We thank them warmly for their care and patience. We also thank A. Arnold, A. Blumensath, S. Djelloul, I. Durand and S. Oum who have read and commented on some chapters.

    Without being a member of the Institut Universitaire de France (IUF), B. Courcelle could not have started the project of writing this book. He thanks M. Nivat and W. Thomas who presented his application to IUF, and all those who supported it by writing recommendation letters. He dedicates his work to the memory of Philippe Flajolet (1948-2011), a friend of forty years with whom he began his career as a researcher at INRIA in 1973.

    [11] Relational structures are used to prove the case of the theorem that concerns graphs. A relational structure is uniformly k-sparse if its number of tuples is at most k times the cardinality of its domain, and the same holds for all its substructures.

    Chapter 1

    Overview

    This chapter presents the main definitions and results of this book and their significance, with the help of a few basic examples. It is written so as to be readable independently of the others. Definitions are sometimes given informally, with simplified notation, and most proofs are omitted. All definitions will be repeated with the necessary technical details in the subsequent chapters.

    In Section 1.1, we present the notion of equational set of an algebra by using as examples a context-free language, the set of cographs and the set of series-parallel graphs. We also introduce our algebraic definition of derivation trees.

    In Section 1.2, we introduce the notion of recognizability in a concrete way, in terms of properties that can be proved or refuted, for every element of the considered algebra, by an induction on any term that defines this element. We formulate a concrete version of the Filtering Theorem saying that the intersection of an equational set and a recognizable one is equational. It follows that one can decide if a property belonging to a finite inductive set of properties is valid for every element of a given equational set. We explain the relationship between recognizability and finite automata on terms.

    In Section 1.3, we show with several key examples how monadic second-order sentences can express graph properties. We recall the fundamental equivalence of monadic second-order sentences and finite automata for words and terms.

    In Section 1.4, we introduce two graph algebras. They are called the VR and the HR algebra because their equational sets are those that are generated by the context-free Vertex Replacement and Hyperedge Replacement graph grammars respectively. The cographs and the series-parallel graphs are respectively our current examples of a VR- and an HR-equational set. We state (a weak version of) the Recognizability Theorem which says, in short, that monadic second-order definability implies recognizability. From it we obtain a logical version of the Filtering Theorem where the recognizable sets are defined by monadic second-order sentences.

    In Section 1.5, we review the basic definitions of Fixed-Parameter Tractability and we state the algorithmic consequences of the (weak) Recognizability


  • 18 CHAPTER 1. OVERVIEW

    Theorem. This theorem actually has two versions, relative to the two graph algebras defined in Section 1.4, and yields two Fixed-Parameter Tractability Theorems.

    In Section 1.6, we describe the consequences of the Recognizability and Filtering Theorems for the problem of deciding whether a given monadic second-order sentence is satisfied by some graph of tree-width at most a given k or, more generally, by some graph of an equational set.

    In Section 1.7, we introduce the notion of a monadic second-order transduction by means of examples that have some graph theoretic content, and we state the Equationality Theorem for the VR algebra. It gives a characterization of the VR-equational sets, and in particular of the sets of graphs of bounded clique-width, that is formulated in purely logical terms.

    In Section 1.8, we consider monadic second-order formulas interpreted in incidence graphs (as opposed to in graphs directly). These formulas can use edge set quantifications. We compare the corresponding four types of monadic second-order transduction and we state the Equationality Theorem for the HR algebra: it is based on monadic second-order transductions that transform incidence graphs.

    In Section 1.9, we define relational structures and we extend to them (easily) some results relative to graphs represented by their incidence graphs. We introduce betweenness and cyclic ordering as examples of combinatorial notions that are based on linear orderings but are defined in a natural way as ternary relations.

    1.1 Context-free grammars

    By starting from the standard notion of a context-free grammar, we introduce the notion of an equational set and we define two equational sets of graphs. We define the equational sets of a (one-sorted) algebra and the corresponding sets of derivation trees.

    1.1.1 Context-free word grammars

    By using context-free grammars, one can specify certain formal languages, namely the context-free languages, in a finitary way. Context-free grammars are usually defined as rewriting systems satisfying particular properties, conveyed by the term "context-free" and axiomatized in [Cou87]. However, the Least Fixed-Point Characterization of context-free languages due to Ginsburg and Rice [GinRic] and to Chomsky and Schützenberger [ChoSch] is formulated in terms of systems of recursive equations written with the operations of union and concatenation over languages. This algebraic view has been developed by Mezei and Wright [MezWri] and has many advantages. First, it is more synthetic in that it deals with languages rather than with words produced individually by derivation sequences. Second, it puts the study of context-free languages in the more general

  • 1.1. CONTEXT-FREE GRAMMARS 19

    framework of recursive definitions handled as least solutions of systems of equations, and last but not least, it is applicable to any algebra. This latter aspect is especially important for the extension to graphs.

    We recall how context-free languages can be characterized as the components of the least solutions of certain systems of equations in languages. A context-free grammar G is a finite set of rewriting rules defined with two alphabets, a terminal alphabet A and a nonterminal alphabet N. For every S in N, the context-free language over A generated by G from S is denoted by L(G,S).

    Example 1.1 We consider for example the context-free grammar G with terminal alphabet A = {a, b, c}, nonterminal alphabet N = {S, T} and rules named respectively p, q, . . . , w (where $\varepsilon$ denotes the empty word):

    $$\begin{array}{l}
    p : S \to aST \\
    q : S \to SS \\
    r : S \to a \\
    s : T \to bTST \\
    u : T \to a \\
    v : T \to c \\
    w : T \to \varepsilon
    \end{array}$$

    It defines two languages L(G,S) and L(G,T) over A, i.e., sets of words in $A^*$. These languages satisfy the equations of the following system $S_G$:

    $$S_G \; \left\{ \begin{array}{l}
    K = aKL \cup KK \cup \{a\} \\
    L = bLKL \cup \{a, c, \varepsilon\}
    \end{array} \right.$$

    with K = L(G,S) and L = L(G,T). The pair (L(G,S), L(G,T)) is thus a solution of $S_G$. However, it is not the only one. The pair of languages $(A^*, bA^* \cup \{a, c, \varepsilon\})$ is another solution as one checks easily.$^1$ The Least Fixed-Point Characterization of context-free languages establishes that the pair (L(G,S), L(G,T)) is the least solution of $S_G$ for component-wise inclusion.
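    The least solution of the system of Example 1.1 can be approximated by iterating its right-hand sides from the pair of empty languages. The sketch below is only an illustration: it assumes that every language is truncated to words of length at most `MAX_LEN` (a hypothetical bound, chosen so that the iteration terminates), which is harmless here because concatenation never shortens words. The names `cat`, `step` and `least_solution` are ours, not the book's.

```python
from itertools import product

MAX_LEN = 5  # assumption: only words of length <= MAX_LEN are kept


def cat(*langs):
    """Concatenate languages (sets of words), discarding words that are too long."""
    result = {""}
    for lang in langs:
        result = {u + v for u in result for v in lang if len(u + v) <= MAX_LEN}
    return result


def step(K, L):
    """Apply once the right-hand sides K = aKL + KK + {a}, L = bLKL + {a, c, eps}."""
    new_K = cat({"a"}, K, L) | cat(K, K) | {"a"}
    new_L = cat({"b"}, L, K, L) | {"a", "c", ""}
    return new_K, new_L


def least_solution():
    """Iterate from the empty languages until nothing changes."""
    K, L = set(), set()
    while True:
        new_K, new_L = step(K, L)
        if (new_K, new_L) == (K, L):
            return K, L
        K, L = new_K, new_L


K, L = least_solution()

# The other solution mentioned in the text, truncated in the same way:
# all words over {a, b, c}, and b followed by any word, plus a, c and the empty word.
ALL_WORDS = {"".join(w) for n in range(MAX_LEN + 1) for w in product("abc", repeat=n)}
OTHER_L = {"b" + w for w in ALL_WORDS if len(w) < MAX_LEN} | {"a", "c", ""}
```

    Under this truncation, `step(ALL_WORDS, OTHER_L)` returns the same pair, so it is a solution too, while the computed `K` is a proper subset of `ALL_WORDS`.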

    1.1.2 Cographs

    We give two examples of similar definitions of sets of graphs. We first consider as ground set the set $\mathcal{G}^u$ of undirected simple graphs.$^2$ Two isomorphic graphs are considered as the same object. We will use $\oplus$ to denote the disjoint union of two graphs G and H. This means that $G \oplus H$ is the union of G and of a copy of H disjoint from G (hence $G \oplus G \neq G$). We will also use the complete join, $G \otimes H$, defined as $G \oplus H$ augmented with undirected edges linking every vertex of G and every vertex of H. We let 1 denote any graph with one vertex and no edges. Note that both $\oplus$ and $\otimes$ are commutative and associative operations.

    $^1$ Since $L(G,S) \subseteq A^+ = AA^*$, the pair $(A^*, bA^* \cup \{a, c, \varepsilon\})$ is a solution of $S_G$ that differs from (L(G,S), L(G,T)).

    $^2$ In this book, all graphs are finite. A graph is simple if it has no two parallel edges, i.e., no two edges with the same ends, and the same directions in the case of directed graphs. Parallel edges are also called multiple edges. An edge with equal ends is a loop. The superscript u in $\mathcal{G}^u$ refers to undirected graphs.

    Figure 1.1: A cograph.

    The set of cographs C can be defined as the least set of graphs satisfying the equation

    $$C = (C \oplus C) \cup (C \otimes C) \cup \{1\}. \tag{1.1}$$

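    This set (it is a proper subset of $\mathcal{G}^u$) actually has alternative characterizations (see Section 1.3.1 below). From this equation, one can derive definitions of certain subsets of C. Consider for example the following system of two equations:

    $$\left\{ \begin{array}{l}
    C_0 = (C_0 \oplus C_0) \cup (C_1 \oplus C_1) \cup (C_0 \otimes C_0) \cup (C_1 \otimes C_1) \\
    C_1 = (C_0 \oplus C_1) \cup (C_0 \otimes C_1) \cup \{1\}.
    \end{array} \right. \tag{1.2}$$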

    Its least solution in $\mathcal{P}(\mathcal{G}^u) \times \mathcal{P}(\mathcal{G}^u)$ is the pair of sets $(C_0, C_1)$ where $C_0$ (resp. $C_1$) is the set of cographs having an even (resp. an odd) number of vertices.$^3$ We will give general and effective methods for deriving from an equation or a system of equations that defines a set L, an equation or a system of equations defining $\{x \in L \mid P(x)\}$ where P is a property of the objects under consideration. This is possible if P has an appropriate inductive behaviour relative to the operations with which the given equation or system of equations is written.

    From the definition of cographs as elements of the least subset C of $\mathcal{G}^u$ satisfying (1.1), it follows that each of them is denoted by a term, more formally, is the value of a term in an algebra of graphs. Examples of terms denoting cographs are

    1, 1 1, (1 1) 1, (1 1) (1 1).

    The cograph of Figure 1.1 is the value of the term $t = (1 \otimes 1 \otimes 1) \otimes (1 \oplus (1 \otimes 1))$. Since $\otimes$ is associative, we have written t by omitting some parentheses as usual, for readability. These terms belong to the set $T(\{\oplus, \otimes, 1\})$ of all terms written with the constant 1 and the two binary operations $\oplus$ and $\otimes$. Equation (1.1) can also be solved with ground set $T(\{\oplus, \otimes, 1\})$. For this interpretation of (1.1) the unknown C denotes subsets of $T(\{\oplus, \otimes, 1\})$. Clearly, the set of terms $T(\{\oplus, \otimes, 1\})$ itself is the least (in fact, the only) solution of (1.1) in $\mathcal{P}(T(\{\oplus, \otimes, 1\}))$.

    $^3$ We denote by $\mathcal{P}(X)$ the powerset of a set X, i.e., its set of subsets.

    A similar fact holds for System (1.2). Its least solution in $\mathcal{P}(T(\{\oplus, \otimes, 1\})) \times \mathcal{P}(T(\{\oplus, \otimes, 1\}))$ is a pair of sets $(T_0, T_1)$ where $T_0, T_1 \subseteq T(\{\oplus, \otimes, 1\})$ and for each i = 0, 1, the set $C_i$ is the set of cographs which are the values of the terms in $T_i$.
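    The pair of term sets defined by System (1.2) can be explored on small terms. The following sketch is an illustration under stated assumptions: terms are nested tuples, `"+"` and `"*"` are our stand-ins for disjoint union and complete join, and only terms with at most `MAX_LEAVES` occurrences of the constant are generated (a hypothetical bound that makes the iteration finite).

```python
MAX_LEAVES = 4  # assumption: bound on the number of occurrences of the constant "1"


def leaves(t):
    """Number of occurrences of the constant in a term."""
    return 1 if t == "1" else leaves(t[1]) + leaves(t[2])


def combine(op, X, Y):
    """All terms op(x, y) with x in X, y in Y that respect the size bound."""
    return {(op, x, y) for x in X for y in Y if leaves(x) + leaves(y) <= MAX_LEAVES}


def step(T0, T1):
    """Apply once the right-hand sides of System (1.2), read on terms."""
    new_T0, new_T1 = set(), {"1"}
    for op in ("+", "*"):
        new_T0 |= combine(op, T0, T0) | combine(op, T1, T1)
        new_T1 |= combine(op, T0, T1)
    return new_T0, new_T1


def least_term_solution():
    T0, T1 = set(), set()
    while True:
        new = step(T0, T1)
        if new == (T0, T1):
            return T0, T1
        T0, T1 = new


T0, T1 = least_term_solution()
```

    Every term in `T0` has an even number of leaves and every term in `T1` an odd number. Note that `T1` does not contain every term with an odd number of leaves: with the equations as written, a term of the form `(op, odd, even)` is not generated, although by commutativity its value is also the value of a generated term.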

    This example shows that a grammar, i.e., a system of equations like (1.2), specifies not only a tuple of sets of objects, here graphs, but also denotations by terms of the specified objects. These objects can be words, terms, trees, graphs, as we will see. Each term is a formalization of the structure of the object it denotes, as specified by the grammar; it provides a hierarchical decomposition of that object. In many cases, an object can be denoted by several terms that are correct with respect to the grammar. In such a case, we say that the grammar is ambiguous. The grammars (1.1) and (1.2) are ambiguous: since $\oplus$ and $\otimes$ are commutative and associative, most cographs are denoted by more than one term.

    As an example of structure, consider again the term $(1 \otimes 1 \otimes 1) \otimes (1 \oplus (1 \otimes 1))$ that denotes the cograph of Figure 1.1. It provides a decomposition of that cograph, because the subterm $1 \otimes 1 \otimes 1$ denotes the triangle at the left of Figure 1.1, whereas the subterm $1 \oplus (1 \otimes 1)$ denotes the three vertices at the right of Figure 1.1 together with the edge between two of them.
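    The two cograph operations are easy to realize on concrete graphs. The sketch below is a minimal illustration, assuming a graph is represented as a pair (number of vertices, set of edges) with vertices numbered from 0; this concrete representation and the names `ONE`, `disjoint_union` and `complete_join` are our choices. Graphs are compared here only through their numbers of vertices and edges, not up to isomorphism.

```python
ONE = (1, frozenset())  # a graph with one vertex and no edges


def disjoint_union(g, h):
    """Union of g and a copy of h whose vertices are renumbered after those of g."""
    n, edges_g = g
    m, edges_h = h
    shifted = {frozenset(v + n for v in edge) for edge in edges_h}
    return (n + m, frozenset(edges_g | shifted))


def complete_join(g, h):
    """Disjoint union plus an edge between every vertex of g and every vertex of h."""
    n, _ = g
    m, _ = h
    total, edges = disjoint_union(g, h)
    cross = {frozenset((u, n + v)) for u in range(n) for v in range(m)}
    return (total, frozenset(edges | cross))


# The two subterms discussed in the text.
triangle = complete_join(complete_join(ONE, ONE), ONE)
three_vertices_one_edge = disjoint_union(ONE, complete_join(ONE, ONE))
```

    Here `triangle` has 3 vertices and 3 edges, and `three_vertices_one_edge` has 3 vertices and a single edge, as described above.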

    1.1.3 Series-parallel graphs

    The ground set of graphs is here the set $\mathcal{J}_2^d$ of directed graphs G equipped with two distinct distinguished vertices marked 1 and 2 called its sources, denoted respectively by $\mathrm{src}_G(1)$ and $\mathrm{src}_G(2)$. These graphs may have multiple edges.$^4$ Let e be a constant denoting the graph with two vertices and only one edge from source 1 to source 2. The operations are the parallel-composition, denoted by $\parallel$, and the series-composition, denoted by $\bullet$. For G and H in $\mathcal{J}_2^d$, the graph $G \parallel H$ is the union of G and an isomorphic copy $H'$ of H such that $\mathrm{src}_G(1) = \mathrm{src}_{H'}(1)$, $\mathrm{src}_G(2) = \mathrm{src}_{H'}(2)$, and G and $H'$ have nothing else in common. We define $\mathrm{src}_{G \parallel H}(1) := \mathrm{src}_G(1)$ and $\mathrm{src}_{G \parallel H}(2) := \mathrm{src}_G(2)$. Note that $G \parallel G$ has twice as many edges as G, hence $G \neq G \parallel G$ in general.

    Series-composition is defined similarly. For $G, H \in \mathcal{J}_2^d$, we let $G \bullet H$ be the union of G and an isomorphic copy $H'$ of H such that $\mathrm{src}_G(2) = \mathrm{src}_{H'}(1)$ and G and $H'$ have nothing else in common. We let $\mathrm{src}_{G \bullet H}(1) := \mathrm{src}_G(1)$ and $\mathrm{src}_{G \bullet H}(2) := \mathrm{src}_{H'}(2)$. These operations are illustrated in Figure 1.2.
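    The two compositions can be sketched on a concrete representation. This is an illustration only: we assume a graph with two sources is stored as a named tuple holding its number of vertices, a tuple of directed edges (repetitions allowed, so that multiple edges are kept) and its two sources. The names `SPGraph`, `E`, `parallel` and `series` are ours.

```python
from collections import namedtuple

# n: number of vertices (numbered 0..n-1); edges: tuple of (tail, head) pairs.
SPGraph = namedtuple("SPGraph", "n edges src1 src2")

E = SPGraph(2, ((0, 1),), 0, 1)  # one edge from source 1 to source 2


def _glue(g, h, identify):
    """Union of g with a copy of h; `identify` maps some vertices of h to vertices of g."""
    rename = dict(identify)
    next_id = g.n
    for v in range(h.n):
        if v not in rename:          # every other vertex of h gets a fresh number
            rename[v] = next_id
            next_id += 1
    edges = g.edges + tuple((rename[u], rename[v]) for u, v in h.edges)
    return next_id, edges, rename


def parallel(g, h):
    """Parallel-composition: both sources of the copy of h are fused with those of g."""
    n, edges, _ = _glue(g, h, {h.src1: g.src1, h.src2: g.src2})
    return SPGraph(n, edges, g.src1, g.src2)


def series(g, h):
    """Series-composition: source 1 of the copy of h is fused with source 2 of g."""
    n, edges, rename = _glue(g, h, {h.src1: g.src2})
    return SPGraph(n, edges, g.src1, rename[h.src2])


two_parallel_then_one = series(parallel(E, E), E)   # 3 edges, 3 vertices
path_of_length_two = series(E, E)                   # 2 edges, 3 vertices
```

    With this representation `parallel(g, g)` always has twice as many edges as `g`, which matches the remark that parallel-composition is not idempotent.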

    $^4$ The letter J in the notations $\mathcal{J}_2^d$ and, below, in $\mathbb{J}_2^d$, $\mathbb{JS}$ and the related notions refers to graphs that can have multiple edges. By contrast, the letter G used in the notations $\mathcal{G}^u$ and, below, $\mathbb{G}^u$, $\mathbb{GP}$ etc. refers to simple graphs. The subscript 2 refers to the two sources, and the superscript d to directed graphs.


    Figure 1.2: Series- and parallel-compositions.

    The set of series-parallel graphs$^5$ is defined by the equation:

    $$S = (S \parallel S) \cup (S \bullet S) \cup \{e\} \tag{1.3}$$

    where by "defined" we mean that S is the least subset of $\mathcal{J}_2^d$ satisfying (1.3). As for cographs, this definition gives a notation of series-parallel graphs by terms. The set of terms is here $T(\{\parallel, \bullet, e\})$. Examples of terms are:

    e, e e, (e e) (e e), ((e e) e) (e e).

    The graph denoted by the last of these terms is shown in Figure 1.3. Note that the subterm $(e \parallel e) \bullet e$ denotes the three edges at the left of Figure 1.3, with their three incident vertices, whereas the subterm $e \bullet e$ denotes the two edges at the right, with their three incident vertices.

    1.1.4 The general setting

    Let F be a (functional) signature, that is, a set of function symbols such that each symbol f is given with a nonnegative integer intended to be the number of arguments of the corresponding function. This number is called its arity and is denoted by $\rho(f)$. A function symbol of arity 0 is also called a constant symbol.

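    An F-algebra $\mathbb{M}$ is a set M equipped with total functions $f_{\mathbb{M}} : M^{\rho(f)} \to M$ for all f in F. We write it $\mathbb{M} = \langle M, (f_{\mathbb{M}})_{f \in F} \rangle$. We call M the domain and $f_{\mathbb{M}}$ an operation of $\mathbb{M}$; if f has arity 0, then $f_{\mathbb{M}}$ is also called a constant of $\mathbb{M}$. The F-algebra $\mathbb{M}$ is finite if M is finite.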

    Let $X = \{x_1, \dots, x_n\}$ be a set of unknowns (or variables), intended to denote subsets of M. A polynomial is an expression of the form $p = m_1 \cup \cdots \cup m_k$ where each $m_i$ is a monomial, i.e., a term written with the symbols of $F \cup X$ and well formed with respect to arities (the unknowns are of arity 0).

    $^5$ The term series-parallel is also used for partial orders ([*Mohr]) and, in a wider sense, for undirected graphs without $K_4$ as a minor ([*Die]). Our series-parallel graphs are called two-terminal series-parallel digraphs in [*Mohr].

    Figure 1.3: A series-parallel graph.

    For each n-tuple $(L_1, \dots, L_n)$ of subsets of M and each monomial m, the set $m(L_1, \dots, L_n)$ is a subset of M. This subset is defined by taking $x_i = L_i$ and by interpreting each function symbol f as $f_{\mathbb{M}}$, where, for all $A_1, \dots, A_{\rho(f)} \subseteq M$:

    $$f_{\mathbb{M}}(A_1, \dots, A_{\rho(f)}) := \{ f_{\mathbb{M}}(a_1, \dots, a_{\rho(f)}) \mid a_i \in A_i \}.$$

    Hence $f_{\mathbb{M}}$ also denotes the extension to sets of the function $f_{\mathbb{M}} : M^{\rho(f)} \to M$.

    For a polynomial $p = m_1 \cup \cdots \cup m_k$ we define:

    $$p(L_1, \dots, L_n) := m_1(L_1, \dots, L_n) \cup \cdots \cup m_k(L_1, \dots, L_n).$$

    A system of polynomial equations (an equation system for short) is a system of the form:

    $$S = \langle x_1 = p_1, \dots, x_n = p_n \rangle \tag{1.4}$$

    where $p_1, \dots, p_n$ are polynomials.

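    Example 1.2 In the particular case of the grammar G considered in Example 1.1, we let $F = \{\cdot, \varepsilon, a, b, c\}$ and $\mathbb{M} = \langle A^*, \cdot, \varepsilon, a, b, c \rangle$, where A = {a, b, c} and $\cdot$ denotes concatenation; the equation system $S_G$ can be written formally as follows:

    $$\left\{ \begin{array}{l}
    x_1 = a \cdot (x_1 \cdot x_2) \;\cup\; x_1 \cdot x_1 \;\cup\; a \\
    x_2 = b \cdot ((x_2 \cdot x_1) \cdot x_2) \;\cup\; a \;\cup\; c \;\cup\; \varepsilon,
    \end{array} \right.$$

    where the associativity of concatenation is not taken for granted any more. Note that for the constant symbol a of F we have $a_{\mathbb{M}} = a$ and also, according to the above extension, $a_{\mathbb{M}} = \{a\}$; similarly, the constant symbol $\varepsilon$ denotes both the empty word and the language $\{\varepsilon\}$.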

    Going back to the general case, a solution of a system S as in (1.4) is an n-tuple $(L_1, \dots, L_n)$ in $\mathcal{P}(M)^n$ that satisfies the equations of S, which means that $L_i = p_i(L_1, \dots, L_n)$ for every i = 1, . . . , n. Solutions are compared by component-wise inclusion and every system has a least solution. The components of the least solutions of such systems are called the equational sets of the F-algebra $\mathbb{M}$. We will denote by $\mathrm{Equat}(\mathbb{M})$ the family of equational sets of $\mathbb{M}$.
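    When the domain M is finite, the least solution of an equation system can be computed by iterating the polynomials from the tuple of empty sets. The sketch below is an illustration under assumptions of ours: an unknown is a string, any other monomial is a tuple (symbol, subterm, ...), a polynomial is a list of monomials, and an algebra is a dictionary from symbols to Python functions. It is applied to System (1.2) in a two-element algebra where both binary symbols are interpreted as addition modulo 2 and the constant as 1, which records the parity of the number of vertices.

```python
from itertools import product


def eval_monomial(m, algebra, env):
    """Set denoted by monomial m when each unknown x denotes the set env[x]."""
    if isinstance(m, str):                      # an unknown
        return env[m]
    symbol, *args = m
    arg_sets = [eval_monomial(a, algebra, env) for a in args]
    # Extension of the operation to sets: apply it to every choice of arguments.
    return {algebra[symbol](*values) for values in product(*arg_sets)}


def least_solution(system, algebra):
    """Iterate from empty sets; terminates when the domain is finite."""
    env = {x: set() for x in system}
    while True:
        new_env = {
            x: set().union(*(eval_monomial(m, algebra, env) for m in poly))
            for x, poly in system.items()
        }
        if new_env == env:
            return env
        env = new_env


# System (1.2), with "+" for disjoint union and "*" for complete join.
SYSTEM_1_2 = {
    "C0": [("+", "C0", "C0"), ("+", "C1", "C1"), ("*", "C0", "C0"), ("*", "C1", "C1")],
    "C1": [("+", "C0", "C1"), ("*", "C0", "C1"), ("1",)],
}

# Hypothetical parity algebra with domain {0, 1}.
PARITY = {
    "+": lambda x, y: (x + y) % 2,
    "*": lambda x, y: (x + y) % 2,
    "1": lambda: 1,
}

solution = least_solution(SYSTEM_1_2, PARITY)
```

    In this algebra the least solution is `{"C0": {0}, "C1": {1}}`, in agreement with the description of the two components as the cographs with an even and an odd number of vertices.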

    For a signature F, we denote by T(F) the set of terms written with the symbols of F and well formed with respect to arities. The usual notation for terms is with the function symbols in leftmost position, their arguments are between parentheses and separated by commas. In this notation, the term denoting the graph of Figure 1.3 is written $\parallel(\bullet(\parallel(e, e), e), \bullet(e, e))$.$^6$ As is well known, terms can be represented by certain labelled, directed and rooted trees. This representation is the reason that terms are usually called trees in Formal Language Theory.

    $^6$ For associative binary operations the more readable infix notation will be used, although it is ambiguous as already observed. The infix notation of this term is $((e \parallel e) \bullet e) \parallel (e \bullet e)$.

    The set T(F) is turned into an F-algebra, denoted by $\mathbb{T}(F)$, by defining the operation $f_{\mathbb{T}(F)}$ by

    $$f_{\mathbb{T}(F)}(t_1, \dots, t_{\rho(f)}) := f(t_1, \dots, t_{\rho(f)}).$$

    This operation performs no computation; it combines its arguments, which are terms, into a larger term.

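    For every F-algebra $\mathbb{M}$, a term $t \in T(F)$ has a value $t_{\mathbb{M}}$ in M that is formally defined as follows:

    $t_{\mathbb{M}} := f_{\mathbb{M}}$ if t = f and f has arity 0 (it is a constant symbol),

    $t_{\mathbb{M}} := f_{\mathbb{M}}((t_1)_{\mathbb{M}}, \dots, (t_{\rho(f)})_{\mathbb{M}})$ if $t = f(t_1, \dots, t_{\rho(f)})$.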

    Since every term can be written in a unique way as f or $f(t_1, \dots, t_{\rho(f)})$ for terms $t_1, \dots, t_{\rho(f)}$, the value $t_{\mathbb{M}}$ of t is well defined. The mapping $t \mapsto t_{\mathbb{M}}$, also denoted by $\mathrm{val}_{\mathbb{M}}$, is the unique F-algebra homomorphism from $\mathbb{T}(F)$ into $\mathbb{M}$.$^7$ An F-algebra $\mathbb{M}$ is generated by F if every element of M is the value of some term in T(F).
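    The inductive definition of the value of a term translates directly into a recursive function. The sketch below is an illustration with assumptions of ours: a term is a tuple (symbol, subterm, ...), a constant is a one-element tuple, and an algebra is a dictionary from symbols to Python functions. The symbol names `"cat"` and `"eps"` are our stand-ins for concatenation and the empty word.

```python
def value(t, algebra):
    """Value of the term t in the given algebra, by induction on t."""
    symbol, *args = t
    return algebra[symbol](*(value(a, algebra) for a in args))


# The algebra of words over {a, b, c} with concatenation, as in Example 1.2.
WORDS = {
    "cat": lambda x, y: x + y,
    "eps": lambda: "",
    "a": lambda: "a",
    "b": lambda: "b",
    "c": lambda: "c",
}

# The term algebra over the same symbols: each operation only builds a larger term.
TERMS = {f: (lambda *ts, f=f: (f, *ts)) for f in WORDS}

# One possible term whose value is the word baaac.
a, b, c, eps = ("a",), ("b",), ("c",), ("eps",)
t = ("cat", b, ("cat", ("cat", a, ("cat", a, ("cat", a, eps))), c))
```

    Evaluating `t` in `WORDS` gives the word `baaac`, while evaluating it in `TERMS` returns `t` itself, since the operations of the term algebra perform no computation.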

    An equation system S of the form (1.4) has a least solution in $\mathcal{P}(T(F))^n$ that is an n-tuple $(T_1, \dots, T_n)$ of subsets of T(F). The least solution $(L_1, \dots, L_n)$ of S in $\mathcal{P}(M)^n$ is also characterized by $L_i = \{t_{\mathbb{M}} \mid t \in T_i\}$, for each i = 1, . . . , n. This is an immediate consequence of a result of [MezWri] saying that the least fixed-point operator commutes with homomorphisms. A term t in $T_i$ represents the structure of the element $t_{\mathbb{M}}$ of $L_i$ as specified by the system S.

    It follows in particular that for each i, $L_i = \emptyset$ if and only if $T_i = \emptyset$. Hence the least solutions of a system S in all algebras have the same empty components. The emptiness of each set $T_i$ can be decided by the algorithm that decides the emptiness of a context-free language. Each set $T_i$ is actually a context-free language over the alphabet consisting of F, parentheses and comma.
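    The emptiness test can be sketched as the usual marking procedure for context-free grammars, applied to the unknowns of a system. This is an illustration under assumptions of ours: an unknown is a string, any other monomial is a tuple (symbol, subterm, ...), and a system maps each unknown to its list of monomials. The example system and its symbols `f`, `g`, `h`, `a` are hypothetical.

```python
def unknowns_of(m):
    """Set of unknowns occurring in the monomial m."""
    if isinstance(m, str):
        return {m}
    return set().union(*(unknowns_of(arg) for arg in m[1:]))


def nonempty_components(system):
    """Unknowns whose component of the least solution is nonempty."""
    nonempty = set()
    changed = True
    while changed:
        changed = False
        for x, poly in system.items():
            # x is nonempty as soon as one of its monomials only uses nonempty unknowns.
            if x not in nonempty and any(unknowns_of(m) <= nonempty for m in poly):
                nonempty.add(x)
                changed = True
    return nonempty


EXAMPLE = {
    "x1": [("f", "x1", "x2"), ("a",)],
    "x2": [("g", "x2")],
    "x3": [("h", "x1")],
}
```

    For `EXAMPLE` the result is `{"x1", "x3"}`. The answer does not depend on the algebra in which the system is solved, because all operations are total.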

    We will use these definitions for algebras of graphs $\mathbb{M}$ in the following way: M will be a class of graphs like $\mathcal{G}^u$ or $\mathcal{J}_2^d$ in the examples of cographs and series-parallel graphs, F will be a set of total functions $f_{\mathbb{M}} : M^{\rho(f)} \to M$ that will be used to construct graphs. These functions, called the operations of $\mathbb{M}$, generalize the concatenation of words. The constants will be basic graphs. For each such graph algebra $\mathbb{M}$, the class of equational sets $\mathrm{Equat}(\mathbb{M})$ generalizes the class of context-free languages since they are characterized as the components of the least solutions of equation systems as recalled in Section 1.1.1. There is thus no unique notion of a context-free set of graphs because this notion depends on the considered algebra.

    However, even in the case of languages, several algebras can be considered, because one can enrich the monoid structure of $A^*$ by new operations. This increases the class of equational sets, hence defines richer notions of context-free languages, if we take this term in the algebraic sense. The squaring function that associates with a word u the word uu, can be such an operation. Another one is the shift that associates with a word au the word ua, where a is a letter. The corresponding classes of equational sets have not received specific attention.

    $^7$ In general, a homomorphism from $\mathbb{N}$ to $\mathbb{M}$, where $\mathbb{N} = \langle N, (f_{\mathbb{N}})_{f \in F} \rangle$ is another F-algebra, is a mapping $h : N \to M$ such that for every $f \in F$ and all $n_1, \dots, n_{\rho(f)} \in N$, we have $h(f_{\mathbb{N}}(n_1, \dots, n_{\rho(f)})) = f_{\mathbb{M}}(h(n_1), \dots, h(n_{\rho(f)}))$.

    In the case of graphs, we will show that there are only two robust classes of equational sets, where "robust" means that they are closed under certain graph transformations definable by formulas of monadic second-order logic. These transformations, called monadic second-order transductions, play the role of rational transductions in the theory of formal languages.

    Each of the two classes of context-free sets of graphs is the class of images of the set of finite binary trees under monadic second-order transductions of appropriate forms. Somewhat similarly, the class of context-free languages is the class of images of the language defined by the equation $L = aLbLc \cup \{d\}$, under all rational transductions. This language encodes binary trees. Hence trees play a major role in all three cases.

    1.1.5 Derivation trees

    Context-free grammars specify languages. However the real importance of the notion of a context-free grammar is that, when a word is recognized as well-formed, the grammar specifies one or several parse trees for this word. These trees are obtained as results of the syntactic analysis (or parsing) of the considered word. They represent the syntactical structures of the considered word as generated by the grammar. In compiling applications, grammars are constructed so as to be unambiguous, and each recognized word has a unique parse tree. This tree is the support of further computation, in particular type checking and translation into intermediate code.

    Similarly, an equation system specifies a set of objects and, as we have seen, it additionally specifies terms that denote those objects and represent their structure. Let S be an equation system and $\mathbb{M} = \langle M, (f_{\mathbb{M}})_{f \in F} \rangle$ an algebra, and consider an algorithm that, for each element m of M, computes a term t that denotes m as specified by S (if such a term exists). Due to the similarity with context-free grammars, we will say that this is a parsing algorithm for S.

    However, if we view a context-free grammar G such as the one of Example 1.1 as an equation system $S = S_G$ over the signature $F = \{\cdot, \varepsilon, a, b, c\}$, as indicated in Example 1.2, then the terms in T(F) specified by the system $S_G$ are not the parse trees of G (because they do not show which rules of G are applied). Nevertheless, it is possible to view G as an equation system $S'$ in a different way, such that the terms of $S'$ do correspond to the parse trees of G, or rather a variant of parse trees called derivation trees. Let us illustrate this for the context-free grammar G of Example 1.1.

    Example 1.3 We consider again the grammar G of Example 1.1. Its rules are named by symbols p, q, . . . , w, that we will consider as function symbols with arities defined by $\rho$ such that $\rho(s) = 3$, $\rho(p) = \rho(q) = 2$, $\rho(r) = \rho(u) = \rho(v) = \rho(w) = 0$; they form a signature P. The arity of a rule is the number of occurrences of nonterminals in the righthand side of the rule.

    Figure 1.4: The parse tree of D and the term d.

    Consider for example the word baaac generated from nonterminal T by the derivation sequence D:

    $$T \Rightarrow bTST \Rightarrow bTaSTT \Rightarrow baaSTT \Rightarrow baaaTT \Rightarrow baaaT \Rightarrow baaac$$

    where the rules s, p, u, r, w, v are successively applied. (The arrow $\Rightarrow$ denotes the one-step derivation relation of G.) Assuming that rule w is applied to the leftmost T in baaaTT, we associate with D the term d = s(u, p(r, w), v) of T(P). This term contains more information than the sequence (s, p, u, r, w, v); from it one can find all derivation sequences of the word baaac that are equivalent to D by permutations of steps. In particular the leftmost derivation sequence uses successively rules s, u, p, r, w, v, and the rightmost one uses rules s, v, p, w, r, u. Figure 1.4 shows the parse tree of D and the corresponding term d.
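    The leftmost and rightmost rule sequences can be read off the term d by two traversals. The sketch below assumes, as an illustration, that a term is a tuple (rule name, subterm, ...) whose subterms are listed in the order of the nonterminal occurrences of the rule; the function names are ours.

```python
d = ("s", ("u",), ("p", ("r",), ("w",)), ("v",))


def leftmost(t):
    """Rules in the order used by the leftmost derivation: root first, children left to right."""
    rule, *children = t
    sequence = [rule]
    for child in children:
        sequence += leftmost(child)
    return sequence


def rightmost(t):
    """Rules in the order used by the rightmost derivation: root first, children right to left."""
    rule, *children = t
    sequence = [rule]
    for child in reversed(children):
        sequence += rightmost(child)
    return sequence
```

    For the term d these functions return the sequences s, u, p, r, w, v and s, v, p, w, r, u given in the text.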

    Terms like d will be called derivation trees. We keep the name parse tree for the trees like the one of Figure 1.4 (left part) that are used in the theory of parsing. (Good textbooks exposing this theory are the Dragon Book by Aho et al. [*AhoLSU] and the book by Crespi-Reghizzi [*Cre].)

    The equation system $S_G$ of Example 1.1 can be rewritten into the following system:

    $$S'_G \; \left\{ \begin{array}{l}
    K = p(K, L) \cup q(K, K) \cup r \\
    L = s(L, K, L) \cup u \cup v \cup w.
    \end{array} \right.$$

    Instead of solving this system for the F-algebra $\mathbb{M} = \langle \{a, b, c\}^*, \cdot, \varepsilon, a, b, c \rangle$ (which is the algebra for $S_G$ in Example 1.2), we solve it for the P-algebra $\mathbb{M}'$ with the same domain $\{a, b, c\}^*$ but with the following interpretation of the symbols of P. If we interpret the symbols p, q, s by the following operations on $A^* = \{a, b, c\}^*$ (where x, y, z denote words in $A^*$):

    $p_{\mathbb{M}'}(x, y) := axy$,

    $q_{\mathbb{M}'}(x, y) := xy$,

    $s_{\mathbb{M}'}(x, y, z) := bxyz$,

    and the constant symbols r, u, v, w by the following words:

    $r_{\mathbb{M}'} := a$,

    $u_{\mathbb{M}'} := a$,

    $v_{\mathbb{M}'} := c$,

    $w_{\mathbb{M}'} := \varepsilon$,

    then $S'_G$ is just an alternative writing of $S_G$, and its least solution for the algebra $\mathbb{M}'$ is also (L(G,S), L(G,T)). But the system $S'_G$ also has a least solution $(K', L')$ in $\mathcal{P}(T(P)) \times \mathcal{P}(T(P))$, and the derivation tree d is an element of $L'$. More generally we define the sets of derivation trees of G respectively associated with S and T as the sets of terms $K'$ and $L'$. For the above interpretation of the symbols of P, we can evaluate every term t of T(P) into a word $t_{\mathbb{M}'}$ in $A^*$. In particular $d_{\mathbb{M}'} = baaac$. Clearly, $L(G,S) = \{t_{\mathbb{M}'} \mid t \in K'\}$ and $L(G,T) = \{t_{\mathbb{M}'} \mid t \in L'\}$. Thus, since a parsing algorithm for $S'_G$ produces derivation trees of G, it corresponds to a classical parsing algorithm of the context-free grammar G.
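    The evaluation of a derivation tree into a word can be sketched by interpreting each rule name as the word operation given above. The representation is an assumption of ours: a derivation tree is a tuple (rule name, subtree, ...), and `RULES` and `word_of` are illustrative names.

```python
# Interpretation of the rule names as operations on words, following the text.
RULES = {
    "p": lambda x, y: "a" + x + y,
    "q": lambda x, y: x + y,
    "s": lambda x, y, z: "b" + x + y + z,
    "r": lambda: "a",
    "u": lambda: "a",
    "v": lambda: "c",
    "w": lambda: "",   # the empty word
}


def word_of(tree):
    """Word obtained by evaluating a derivation tree bottom-up."""
    rule, *children = tree
    return RULES[rule](*(word_of(child) for child in children))


d = ("s", ("u",), ("p", ("r",), ("w",)), ("v",))
```

    Evaluating `d` yields the word `baaac`, and the tree `("q", ("r",), ("r",))` yields `aa`, a word generated from S.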

    The system $S'_G$ and the derivation trees of G represent the abstract syntax of the grammar G whereas the P-algebra $\mathbb{M}'$ represents its concrete syntax. It should be clear that the construction of $S'_G$ and $\mathbb{M}'$ can be realized for every context-free grammar G. It should be noted, however, that the signature P and the algebra $\mathbb{M}'$ both depend on G.

    A term that is associated with the word baaac according to the equation system $S_G$ in Example 1.2, is $b \cdot ((a \cdot (a \cdot (a \cdot \varepsilon))) \cdot c)$. That term can be obtained from derivation tree d by (re)interpreting the symbols p, q, s as the following operations on terms in $T(\{\cdot, \varepsilon, a, b, c\})$: $p(x, y) := a \cdot (x \cdot y)$, $q(x, y) := x \cdot y$, and $s(x, y, z) := b \cdot ((x \cdot y) \cdot z)$. Thus, a parsing algorithm for $S'_G$ (producing derivation trees) can easily be transformed into one for $S_G$ (producing terms).
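    The reinterpretation of the rule names as operations on terms can be sketched in the same style. In this illustration a term over the word signature is a tuple, `"."` stands for the concatenation symbol and `"eps"` for the empty-word symbol; these encodings, like the names `TERM_RULES`, `to_term` and `show`, are our assumptions.

```python
def cat(x, y):
    """Build the term x . y without computing anything."""
    return (".", x, y)


A, B, C, EPS = ("a",), ("b",), ("c",), ("eps",)

# Each rule name is interpreted as an operation on terms, following the text.
TERM_RULES = {
    "p": lambda x, y: cat(A, cat(x, y)),
    "q": lambda x, y: cat(x, y),
    "s": lambda x, y, z: cat(B, cat(cat(x, y), z)),
    "r": lambda: A,
    "u": lambda: A,
    "v": lambda: C,
    "w": lambda: EPS,
}


def to_term(tree):
    """Translate a derivation tree into a term over the word signature."""
    rule, *children = tree
    return TERM_RULES[rule](*(to_term(child) for child in children))


def show(term):
    """Fully parenthesized infix rendering of a term."""
    if term[0] == ".":
        return "(" + show(term[1]) + "." + show(term[2]) + ")"
    return term[0]


d = ("s", ("u",), ("p", ("r",), ("w",)), ("v",))
```

    With these definitions `show(to_term(d))` is `(b.((a.(a.(a.eps))).c))`, which is the term given above up to the outermost pair of parentheses.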

    In fact, derivation trees can be defined for the elements of general equational sets. The transformation of $S_G$ into $S'_G$ can be generalized into the transformation of an arbitrary equation system $S = \langle x_1 = p_1, \dots, x_n = p_n \rangle$ into a system $S' = \langle x_1 = p'_1, \dots, x_n = p'_n \rangle$ such that each polynomial $p'_i$ is a union of monomials of the form $r(x_{i_1}, \dots, x_{i_{\rho(r)}})$ corresponding one-to-one to the monomials of $p_i$, where r belongs to a signature P associated with S. If $m \in T(F \cup \{x_1, \dots, x_n\})$ is the monomial of $p_i$ to which $r(x_{i_1}, \dots, x_{i_{\rho(r)}})$ corresponds, then $x_{i_1}, \dots, x_{i_{\rho(r)}}$ is the sequence of unknowns that occur in m. Furthermore, we impose that each r has a unique occurrence in $S'$. The least solution of $S'$ in $\mathcal{P}(T(P))^n$ defines the n-tuple of sets of derivation trees of S.
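    This transformation can be sketched as follows. The code is an illustration under assumptions of ours: an unknown is a string, any other monomial is a tuple (symbol, subterm, ...), a system maps each unknown to its list of monomials, and the fresh rule names `r1`, `r2`, ... are hypothetical. It is applied to the system of Example 1.2, with `"."` for concatenation and `"eps"` for the empty word.

```python
def unknown_occurrences(m):
    """Sequence of unknowns occurring in the monomial m, from left to right, with repetitions."""
    if isinstance(m, str):
        return [m]
    occurrences = []
    for arg in m[1:]:
        occurrences += unknown_occurrences(arg)
    return occurrences


def to_derivation_system(system):
    """Replace every monomial by a fresh rule name applied to its unknowns."""
    new_system, monomial_of = {}, {}
    counter = 0
    for x, poly in system.items():
        new_poly = []
        for m in poly:
            counter += 1
            rule = f"r{counter}"          # each rule name occurs exactly once
            monomial_of[rule] = m
            new_poly.append((rule, *unknown_occurrences(m)))
        new_system[x] = new_poly
    return new_system, monomial_of


a, b, c, eps = ("a",), ("b",), ("c",), ("eps",)
S_G = {
    "x1": [(".", a, (".", "x1", "x2")), (".", "x1", "x1"), a],
    "x2": [(".", b, (".", (".", "x2", "x1"), "x2")), a, c, eps],
}

new_system, monomial_of = to_derivation_system(S_G)
```

    The resulting system has rules of arities 2, 2, 0 for `x1` and 3, 0, 0, 0 for `x2`, which matches the arities of p, q, r and s, u, v, w in Example 1.3.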


    Let F be the signature over which S is written, and M the F -algebra for