
Data Structures and Algorithms for Disjoint Set Union Problems

ZVI GALIL

Department of Computer Science, Columbia University, New York, NY 10027 and Department of Computer Science, Tel-Aviv University, Tel-Aviv, Israel

GIUSEPPE F. ITALIANO

Department of Computer Science, Columbia University, New York, NY 10027 and Dipartimento di Informatica e Sistemistica, Università di Roma “La Sapienza”, Rome, Italy

This paper surveys algorithmic techniques and data structures that have been proposed to solve the set union problem and its variants. The discovery of these data structures required a new set of algorithmic tools that have proved useful in other areas. Special attention is devoted to recent extensions of the original set union problem, and an attempt is made to provide a unifying theoretical framework for this growing body of algorithms.

Categories and Subject Descriptors: E.1 [Data]: Data Structures—trees; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems—computations on discrete structures; G.2.1 [Discrete Mathematics]: Combinatorics—combinatorial algorithms; G.2.2 [Discrete Mathematics]: Graph Theory—graph algorithms

General Terms: Algorithms, Theory

Additional Key Words and Phrases: equivalence algorithm, partition, set union, time complexity

INTRODUCTION

The set union problem has been widely studied during the past decades. The problem consists of maintaining a collection of disjoint sets under the operation of union. More precisely, the problem is to perform a sequence of operations of the following two kinds on disjoint sets:

union(A, B): Combine the two sets A and B into a new set named A.

find(x): Return the name of the unique set containing the element x.

The operations are presented on line; namely, each operation must be performed before the next one is known. Initially, the collection consists of n singleton sets {1}, {2}, ..., {n}. The name of set {i} is i. The input to the set union problem is therefore the initial collection of disjoint sets and the sequence of operations to be processed on line; the output of the problem consists of the answers to find operations. Figure 1 shows examples of union and find operations.

The set union problem has many applications in a wide range of areas. In the following we mention some of them, but the list is by no means exhaustive. Main applications of the set union problem include handling COMMON and EQUIVALENCE statements in FORTRAN compilers [Arden et al. 1961; Galler and Fischer 1964], implementing property grammars [Stearns and Lewis

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1991 ACM 0360-0300/91/0900-0319 $01.50

ACM Computing Surveys, Vol. 23, No. 3, September 1991

320 • Z. Galil and G. F. Italiano

CONTENTS

INTRODUCTION
  Preliminaries
  Models of Computation
  Organization of the Paper
1. SET UNION PROBLEM
  1.1 Worst-Case Complexity
  1.2 Single-Operation Worst-Case Time Complexity
  1.3 Average Case Complexity
  1.4 Special Linear Cases
  1.5 Applications of Set Union
2. SET UNION PROBLEM ON INTERVALS
  2.1 Interval Union-Split-Find
  2.2 Interval Union-Find
  2.3 Interval Split-Find
  2.4 Applications of Set Union on Intervals
3. SET UNION PROBLEM WITH DEUNIONS
  3.1 Algorithms for Set Union with Deunions
  3.2 Applications of Set Union with Deunions
4. EXTENSIONS OF THE SET UNION PROBLEM WITH DEUNIONS
  4.1 Set Union with Arbitrary Deunions
  4.2 Set Union with Dynamic Weighted Backtracking
  4.3 Set Union with Unlimited Backtracking
5. PERSISTENT DATA STRUCTURES FOR SET UNION
6. CONCLUSIONS AND OPEN PROBLEMS
ACKNOWLEDGEMENTS
REFERENCES

1969; Stearns and Rosenkrantz 1969], computational geometry problems [Imai and Asano 1987; Mehlhorn 1984c; Mehlhorn and Näher 1990], testing equivalence of finite state machines [Aho et al. 1974; Hopcroft and Karp 1971], computing the longest common subsequence of two sequences [Aho et al. 1983; Hunt and Szymanski 1977], performing unification in logic programming and theorem proving [Huet 1976; Vitter and Simons 1986], and several combinatorial problems such as finding minimum spanning trees [Aho et al. 1974; Kerschenbaum and van Slyke 1972], solving dynamic edge- and vertex-connectivity problems [Westbrook and Tarjan 1989b], computing least common ancestors in trees [Aho et al. 1973], solving off-line minimum problems [Gabow and Tarjan 1985; Hopcroft and Ullman 1973], finding dominators in

{1} {2} {3} {4} {5} {6}

(a) The initial collection of disjoint sets. find(i) returns i, 1 ≤ i ≤ 6.

{1,3} {2,5} {4} {6}

(b) The disjoint sets of (a) after performing union(1, 3) and union(5, 2). find(1) and find(3) return 1, find(2) and find(5) return 5, find(4) returns 4, and find(6) returns 6.

{1,3,4} {2,5} {6}

(c) The disjoint sets of (b) after performing union(4, 1). find(1), find(3), and find(4) return 4, find(2) and find(5) return 5, and find(6) returns 6.

{1,3,4,2,5} {6}

(d) The disjoint sets of (c) after performing union(4, 5). find(1), find(2), find(3), find(4), and find(5) return 4, and find(6) returns 6.

Figure 1. Examples of union and find operations.

graphs [Tarjan 1974], and checking flow graph reducibility [Tarjan 1973].

Recently, many variants of this problem have been introduced in which the possibility of backtracking over the sequences of unions was taken into account [Apostolico et al. 1989; Gambosi et al. 1988, 1989, 1991; Mannila and Ukkonen 1986a, 1988; Westbrook and Tarjan 1989a]. This was motivated by problems arising in logic programming interpreter memory management [Hogger 1984; Mannila and Ukkonen 1986b, 1986c; Warren and Pereira 1977], in incremental execution of logic programs [Mannila and Ukkonen 1988], and in implementation of search heuristics for resolution [Ibaraki 1978; Pearl 1984].

In this paper we survey the most efficient algorithms known for set union problems and some of their variants. We present the algorithms and discuss some of their applications.

Preliminaries

Although the problems discussed in this paper have wide practical applications, we consider the analysis of the algorithms for their solution from the perspective of theoretical computer science. We concentrate on the ways in which resource usage (computing time, memory space) grows with increasing problem size. Therefore, we compare the merit of different algorithms based on formulas for their resource usage rather than on experimental information. Further, we ignore, for the most part, the constant factors in these formulas, giving only order of magnitude evaluations. That is, we concentrate on asymptotic growth rates rather than on exact mathematical expressions.

We use the notation O(f(n)) (capital oh notation) to describe the set of functions that grow asymptotically no faster than f(n). The notation Ω(f(n)) describes the set of functions that grow asymptotically at least as fast as f(n). Functions that grow asymptotically at the same rate as f(n) are expressed by Θ(f(n)). Formally, the definitions for O, Ω, and Θ are as follows [Knuth 1976]:

g(n) ∈ O(f(n)) if there exist constants n₀ > 0 and c > 0 such that g(n) ≤ c·f(n) for all n ≥ n₀;

g(n) ∈ Ω(f(n)) if there exist constants n₀ > 0 and c > 0 such that g(n) ≥ c·f(n) for all n ≥ n₀;

g(n) ∈ Θ(f(n)) if there exist constants n₀ > 0, c₁ > 0, and c₂ > 0 such that c₁·f(n) ≤ g(n) ≤ c₂·f(n) for all n ≥ n₀.

Sometimes it is useful to say that one function has strictly smaller asymptotic growth than another. The lowercase oh notation is defined as follows:

g(n) ∈ o(f(n)) if and only if lim_{n→∞} g(n)/f(n) = 0.

That is, o(f(n)) is the set of functions with order strictly smaller than f(n).

We consider the analysis of algorithms from two different points of view. In one we analyze the time complexity of a sequence of operations; in the other we analyze the time complexity of a single operation. When we consider a sequence of operations, we estimate either the worst case of the algorithm (i.e., we consider those input instances of that size on which the algorithm takes the most time) or the average case of the algorithm (i.e., we average the execution times on instances of a certain size). When we consider a single operation, we estimate the amortized complexity [Tarjan 1985] (namely, we average the running time per operation over a worst-case sequence of operations) or the worst-case complexity. In the case of single-operation complexity (either amortized or worst case), we sometimes give different bounds for the different operations (in our case unions and finds). Note that unless we give different bounds for the different operations, the worst-case complexity of a sequence of operations determines the amortized complexity of single operations and vice versa.

Models of Computation

Different models of computation have been developed for analyzing algorithms that solve set union problems. The main model of computation considered is the pointer machine [Ben-Amram and Galil 1988; Knuth 1968; Kolmogorov 1953; Schönhage 1980; Tarjan 1979a]. Its storage consists of an unbounded collection of registers (or records) connected by pointers. Each register can contain an arbitrary amount of additional information, and no arithmetic is allowed to compute the address of a register. In this model two classes of algorithms were defined: separable pointer algorithms [Tarjan 1979a] and nonseparable pointer algorithms [Mehlhorn et al. 1988].

Separable pointer algorithms run on a pointer machine and satisfy the separability assumption as defined in Tarjan [1979a] (see below). A separable pointer algorithm makes use of a linked data structure, namely a collection of records and pointers that can be thought of as a directed graph: Each record is represented by a node, and each pointer is represented by an edge of the graph. The algorithm solves the set union problem according to the following rules [Blum 1986; Tarjan 1979a]:

(i) The operations must be performed on line; that is, each operation must be executed before the next one is known.

(ii) Each element of each set is a node of the data structure. There can also be additional (working) nodes.

(iii) (Separability). After each operation, the data structure can be partitioned into disjoint subgraphs such that each subgraph corresponds to exactly one current set. The name of the set occurs in exactly one node in the subgraph. No edge leads from one subgraph to another.

(iv) To perform find(x), the algorithm obtains the node u corresponding to element x and follows paths starting from u until it reaches the node that contains the name of the corresponding set.

(v) During any operation, the algorithm may insert or delete any number of edges. The only restriction is that rule (iii) must hold after each operation.

The class of nonseparable pointer algorithms [Mehlhorn et al. 1988] does not require the separability assumption. The only requirement is that the number of edges leaving each node must be bounded by some constant c > 0. More formally, rule (iii) is replaced by the following rule, while the other four rules are left unchanged:

(iii) There exists a constant c > 0 such that there are at most c edges leaving a node.

Often these two classes of pointer algorithms admit quite different upper and lower bounds for the same problems.

A second model of computation considered in this paper is the random access machine, whose memory consists of an unbounded sequence of registers, each of which is capable of holding an arbitrary integer. The main difference with pointer machines is that in random access machines the use of address arithmetic techniques is permitted. To make the various lower and upper bounds meaningful, it is usually assumed that the size of a register is bounded by O(log n)¹ bits. A formal definition of random access machines can be found in Aho et al. [1974, pp. 5–14].

A third model of computation, known as the cell probe model of computation, was introduced by Yao [1981]. In this model, the cost of computation is measured by the total number of memory accesses to a random access memory with ⌈log n⌉ bits cell size. All other computations are considered to be free.

Organization of the Paper

The remainder of the paper consists of six sections. Section 1 surveys the most efficient algorithms known for solving the set union problem. Section 2 deals with the set union problem on adjacent intervals. Section 3 presents data structures that allow us to undo the last union performed. This result has been recently generalized in several directions, which are dealt with in Section 4. Section 5 shows some of the techniques used to obtain persistent data structures, as defined in Driscoll et al. [1989], for the set union problem. Finally, Section 6 lists some open problems and presents concluding remarks.

1. SET UNION PROBLEM

The set union problem consists of performing a sequence of union and find operations, starting from a collection of n singleton sets {1}, {2}, ..., {n}. The initial name of set {i} is i. The number of unions in any sequence of operations is bounded above by n – 1. Due to the definition of the union and find operations, there are two invariants that hold at any time. First, the sets are always disjoint and define a partition of the set {1, 2, ..., n}. Second, the name of each set corresponds to one of the items contained in the set itself. The set union

¹Throughout this paper all logarithms are assumed to be to the base 2 unless explicitly specified otherwise.


problem requires developing data structures and algorithms that are efficient for both union and find operations since the applications use both operations.

A different version of this problem considers the following operation instead of union:

unite(A, B): Combine the two sets A and B into a new set, whose name is either A or B.

Unite allows the name of the new set to be arbitrarily chosen. This is not a significant restriction in many applications, where one is mostly concerned with testing whether two elements belong to the same set, no matter what the name of the set may be. Some extensions of the set union problem have quite different time bounds depending on whether unions or unites are considered. In the following, we will deal with unions unless otherwise specified.

1.1 Worst-Case Complexity

In this section, we describe algorithms for the set union problem [Tarjan 1975; Tarjan and van Leeuwen 1984], giving the optimal worst-case time complexity for a sequence of operations (and thus the amortized time complexity per operation). For the sake of completeness, we first survey some basic algorithms that have been proposed in the literature [Aho et al. 1974; Fischer 1972; Galler and Fischer 1964]. These are the quick-find, the weighted quick-find, the quick-union, and the weighted quick-union algorithms.

The quick-find algorithms allow one to perform find operations quickly; the quick-union algorithms allow one to perform union operations quickly. Their weighted counterparts speed these computations up by introducing some weighting rules.

Most of these algorithms represent sets as rooted trees, following a technique introduced by Galler and Fischer [1964]. Each tree corresponds to a set. Nodes of the tree correspond to elements of the corresponding set. The name of the set is


stored in the root of the tree. Each tree node has a pointer to its parent. We refer to p(x) as the parent of node x.

The quick-find algorithm can be described as follows. Each set is represented by a tree of height 1. Elements of the set are the leaves of the tree. The root of the tree is a special node that contains the name of the set. Initially, singleton set {i}, 1 ≤ i ≤ n, is represented by a tree of height 1 composed of one leaf and one root. To perform a union(A, B), all the leaves of the tree corresponding to B are made children of the root of the tree corresponding to A. The old root of B is deleted. This maintains the invariant that each tree is of height 1 and can be performed in O(|B|) time, where |B| denotes the total number of elements in set B. Since a set can have as many as O(n) elements, this gives an O(n) time complexity in the worst case for each union. To perform a find(x), return the name stored in the parent of x. Since all trees are maintained of height 1, the parent of x is a tree root. Consequently, a find requires O(1) time. The same algorithm can be described by using an array instead of trees [Aho et al. 1974].
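The array formulation just mentioned can be sketched as follows. This is an illustrative reconstruction, not the paper's own code: the class and method names are ours, and a per-set member list is kept so that a union can relabel set B in O(|B|) time.

```python
class QuickFind:
    """Hedged sketch of the quick-find algorithm (names are ours)."""

    def __init__(self, n):
        # Element i starts in singleton set {i}, whose name is i.
        self.name = list(range(n + 1))            # name[0] is unused
        self.members = {i: [i] for i in range(1, n + 1)}

    def find(self, x):
        # O(1): the set name is stored directly with the element.
        return self.name[x]

    def union(self, a, b):
        # O(|B|): every element of set B is relabeled with name A,
        # and B's member list is merged into A's.
        for x in self.members[b]:
            self.name[x] = a
        self.members[a].extend(self.members.pop(b))


# Replaying the sequence of Figure 1:
uf = QuickFind(6)
uf.union(1, 3)
uf.union(5, 2)
uf.union(4, 1)
uf.union(4, 5)
print([uf.find(i) for i in range(1, 7)])   # [4, 4, 4, 4, 4, 6], as in Figure 1(d)
```

The final list of names matches Figure 1(d): elements 1 through 5 are in the set named 4, and element 6 remains in {6}.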

A more efficient variant, attributed to McIlroy and Morris by Aho et al. [1974] and known as weighted quick-find, uses the freedom implicit in each union operation according to the following weighting rule.

Union by size: Make the children of the root of the smaller tree point to the root of the larger, arbitrarily breaking a tie. This requires that the size of each tree be maintained throughout any sequence of operations.
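A minimal sketch of this rule, in our own (hypothetical) naming: since the rule always relabels the smaller set, the merged set here keeps the larger set's name, i.e. this realizes the unite-style operation in which the resulting name may be either argument.

```python
class WeightedQuickFind:
    """Sketch of weighted quick-find with union by size (names are ours).

    The merged set keeps the name of the larger operand, so this is the
    unite-style variant in which the result's name may be either A or B.
    """

    def __init__(self, n):
        self.name = list(range(n + 1))
        self.members = {i: [i] for i in range(1, n + 1)}

    def find(self, x):
        return self.name[x]

    def unite(self, a, b):
        # Union by size: always relabel the smaller set, so any single
        # element is relabeled O(log n) times over the whole sequence.
        if len(self.members[a]) < len(self.members[b]):
            a, b = b, a                   # a now names the larger set
        for x in self.members[b]:
            self.name[x] = a
        self.members[a].extend(self.members.pop(b))
```

Because each relabeled element lands in a set at least twice as large as its old one, the total relabeling cost over n – 1 unites is O(n log n), in line with the lemma that follows.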

Although this rule does not improve the worst-case time complexity of each operation, it improves to O(log n) the amortized bound of unions [Aho et al. 1974], as the following lemma shows.

Lemma 1.1.1

The weighted quick-find algorithm is able to perform each find operation in O(1) time. The total time required to perform n – 1 union operations is O(n log n).

Proof. Each find operation is implemented as in the quick-find algorithm and therefore requires O(1) time. Denote by N_i the total number of times a node i is moved from one set to another because of a union operation. The total time required to perform n – 1 union operations is

O(n + Σ_{i=1}^{n} N_i).

If the union by size rule is used, each time node i is moved from a set S₁ to another set S₂ because of a union(S₂, S₁), the size of the new set S₂ after this operation is at least twice the size of the set S₁ in which i was before. As a consequence, N_i ≤ log n for i = 1, 2, ..., n. Therefore, the total time needed to perform n – 1 unions is

O(n + Σ_{i=1}^{n} log n) = O(n log n). □

The quick-union algorithm [Galler and Fischer 1964] can be described as follows. Again, each set is represented by a tree. There are, however, two main differences with the data structure used by the quick-find algorithm. The first is that the height of a tree can be greater than 1. The second is that each node of each tree corresponds to an element of a set, and therefore there is no need for special nodes. Once again, the root of each tree contains the name of the corresponding set. A union(A, B) is performed by making the tree root of set B a child of the tree root of set A. A find(x) is performed by starting from the node x and following the pointer to the parent until the tree root is reached. The name of the set stored in the tree root is then returned. As a result, the quick-union algorithm is able to support each union in O(1) time and each find in O(n) time.

The time bound can be improved by using the freedom implicit in each union operation according to one of the following two union rules. This gives rise to two weighted quick-union algorithms.

Union by size: Make the root of the smaller tree point to the root of the larger, arbitrarily breaking a tie. This requires maintaining the number of descendants of each node, in the following referred to as the size of a node, throughout all the sequence of operations.

Union by rank [Tarjan and van Leeuwen 1984]: Make the root of the shallower tree point to the root of the other, arbitrarily breaking a tie. This requires maintaining the height of the subtree rooted at each node, in the following referred to as the rank of a node, throughout all the sequences of operations.

After a union(A, B), the name of the new tree root is set to A.

Lemma 1.1.2

If either union rule is used, any tree of height h must contain at least 2^h elements.

Proof. We prove only the case where the union by size rule is used. The case of union by rank can be proved in a similar fashion. We proceed by induction on the number of union operations. Before the first union, the lemma is clearly true since each set is of height 0 and contains at least 2⁰ = 1 element. Assume that the lemma is true before a union(A, B). Assume without loss of generality that |A| ≥ |B|, so the root of B will be made a child of the root of A. Denote by h(A) and h(B) the heights of A and B before the union operation and by h(A ∪ B) the height of the combined tree A ∪ B. Clearly, h(A ∪ B) = max{h(A), h(B) + 1}. We have

|A ∪ B| ≥ max{|A|, 2|B|} ≥ max{2^h(A), 2^(h(B)+1)} = 2^max{h(A), h(B)+1} = 2^h(A ∪ B). □

As a consequence of the above lemma, the height of the trees achieved with either the union by size or the union by rank rule is never more than ⌊log n⌋. Hence, with either rule each union can be performed in O(1) time and each find in O(log n) time.

Figure 2. Path compaction techniques. (a) The tree before performing a find(x) operation; (b) path compression; (c) path splitting; (d) path halving.

A better amortized bound can be obtained if one of the following compaction rules is applied to the find path (see Figure 2).

Path compression [Hopcroft and Ullman 1973]: Make every encountered node point to the tree root.

Path splitting [van der Weide 1980; van Leeuwen and van der Weide 1977]: Make every encountered node (except the last and the next to last) point to its grandparent.

Path halving [van der Weide 1980; van Leeuwen and van der Weide 1977]: Make every other encountered node (except the last and the next to last) point to its grandparent.

Combining the two choices of a union rule and the three choices of a compaction rule, six possible algorithms are obtained. They all have an O(α(m + n, n)) amortized time complexity, where α is a very slowly growing function, a functional inverse of Ackermann’s [1928] function.

Theorem 1.1.1

[Tarjan and van Leeuwen 1984] The algorithms with either union rule combined with either compaction rule run in O(n + mα(m + n, n)) time on a sequence of at most n – 1 unions and m finds.

No better amortized bound is possible for separable and nonseparable pointer algorithms or in the cell probe model of computation.

Theorem 1.1.2

[Banachowski 1980; Tarjan 1979a; Tarjan and van Leeuwen 1984] Any separable pointer algorithm requires Ω(n + mα(m + n, n)) worst-case time for processing a sequence of n – 1 unions and m finds.

Theorem 1.1.3

[Fredman 1989; Fredman and Saks 1989] The set union problem requires Ω(n + mα(m + n, n)) worst-case time in the cell probe model of computation.

Theorem 1.1.4

[La Poutré 1990b] Any nonseparable pointer algorithm requires Ω(n + mα(m + n, n)) worst-case time for processing a sequence of n – 1 unions and m finds.

Although the six algorithms given above are all asymptotically optimal, there are certain differences of practical importance among the various union and compaction rules. First, union by rank seems preferable to union by size since it requires less storage. Indeed, it needs only O(log log n) bits per node to store a rank in the range [0, ⌈log n⌉], whereas union by size needs O(log n) bits per node to store a size in the range [1, n]. Second, path halving seems to be preferable among the three compaction rules. Path compression requires two passes over the find path, whereas path halving and path splitting can be implemented in only one pass. Furthermore, path halving has the advantage over path splitting that it requires only about half as many pointer updates along the find path. Indeed, path halving needs to update every other node instead of every node in the find path.

An algorithm that performs find operations more efficiently has been recently proposed by La Poutré [1990a]. Following a technique first introduced by Gabow [1985], he showed how to support the ith find operation in O(α(i, n)) time in the worst case, while still solving the set union problem in a total of O(n + mα(m + n, n)) worst-case time.

We conclude this section by mentioning that the trade-off between union and find operations has been recently studied by Ben-Amram and Galil [1990]. They gave an algorithm that, for any integer k > 0, supports unions in O(α(k, n)) amortized time and finds in O(k) amortized time, where α(k, n) is a row inverse of Ackermann’s function [Tarjan 1983].

1.2 Single-Operation Worst-Case Time Complexity

The algorithms that use any union and any compaction rule still have single-operation worst-case time complexity O(log n) [Tarjan and van Leeuwen 1984] since the trees created by any of the union rules can have a height as large as O(log n). Blum [1986] proposed a data structure for the set union problem that supports each union and find in O(log n/log log n) time in the worst case.

The data structure used to establish the upper bound is called a k-UF tree. For any k ≥ 2, a k-UF tree is a rooted tree such that

(i) the root has at least two children,
(ii) each internal node has at least k children,
(iii) all leaves are at the same level.

As a consequence of this definition, the height of a k-UF tree with n leaves is not greater than ⌈log_k n⌉. We refer to the root of a k-UF tree as fat if it has more than k children and as slim otherwise. A k-UF tree is said to be fat if its root is fat; otherwise it is referred to as slim.

Disjoint sets can be represented by k-UF trees as follows. The elements of the set are stored in the leaves, and the name of the set is stored in the root. Furthermore, the root also contains the height of the tree and a bit specifying whether it is fat or slim.


A find(x) is performed as described in the previous section by starting from the leaf containing x and returning the name stored in the root. This can be accomplished in O(log_k n) worst-case time.

A union(A, B) is performed by first accessing the roots r_A and r_B of the corresponding k-UF trees T_A and T_B. Blum assumed that his algorithm obtained r_A and r_B in constant time before performing a union(A, B). If this is not the case, r_A and r_B can be obtained by means of two finds [i.e., find(A) and find(B)], due to the property that the name of each set corresponds to one of the items contained in the set itself.

We now show how to unite the two k-UF trees T_A and T_B. Assume without loss of generality that height(T_B) ≤ height(T_A). Let v be the node on the path from the leftmost leaf of T_A to r_A with the same height as T_B. Clearly, v can be located by following the leftmost path starting from the root r_A for exactly height(T_A) – height(T_B) steps. When implanting T_B into T_A, only three cases are possible, which gives rise to three different types of unions.

Type 1: Root r_B is fat (i.e., it has more than k children), and v is not the root of T_A. Then r_B is made a sibling of v.

Type 2: Root r_B is fat, and v is fat and equal to r_A (the root of T_A). A new (slim) root r is created, and both r_A and r_B are made children of r.

Type 3: This deals with the remaining cases, that is, either root r_B is slim or v is equal to r_A and slim. If root r_B is slim, all the children of r_B are made the rightmost children of v. Otherwise, v is equal to r_A and is slim. In this case, all the children of v = r_A are made the rightmost children of r_B.

Theorem 1.2.1

[Blum 1986] k-UF trees can support each union and find in O(log n/log log n) time in the worst case. Their space complexity is O(n).

Proof. Each find can be performed in O(log_k n) time. Each union(A, B) can require at most O(log_k n) time to locate the nodes r_A, r_B, and v as defined above. Both type 1 and type 2 unions can be performed in constant time, whereas type 3 unions require at most O(k) time, due to the definition of a slim root. Choosing k = ⌈log n/log log n⌉ yields the claimed time bound. The space complexity is derived easily from the fact that a k-UF tree with l leaves has at most 2l – 1 nodes. Hence the forest of k-UF trees that stores the disjoint sets requires at most a total of O(n) space. □

Blum also showed that this bound is tight for the class of separable pointer algorithms.

Theorem 1.2.2

[Blum 1986] Every separable pointer algorithm for the disjoint set union problem has single-operation worst-case time complexity at least Ω(log n/log log n).

Recently, Fredman and Saks [1989] showed that the same lower bound holds in the cell probe model of computation.

Theorem 1.2.3

[Fredman and Saks 1989] Any algorithm for the set union problem requires Ω(log n/log log n) single-operation worst-case time in the cell probe model of computation.

1.3 Average Case Complexity

The average running time of the quick-union, quick-find, and weighted quick-find algorithms described in Section 1.1 has been investigated [Bollobás and Simon 1985; Doyle and Rivest 1976; Knuth and Schönhage 1978; Yao 1976] under different assumptions on the distribution of the input sequence.

In the rest of this section, we assume O(n) union and find instructions are being performed. This is not a significant restriction for the asymptotic time complexity, as shown, for instance, by Hart and Sharir [1986]. We denote by E[t_n^(QF)], E[t_n^(WQF)], and E[t_n^(QU)] the average running times of the quick-find, weighted quick-find, and quick-union algorithms, respectively, when they perform O(n) set union operations on n elements. Three models of random input sequences have been considered in the literature: the random graph model, the random spanning tree model, and the random components model.

ACM Computing Surveys, Vol. 23, No. 3, September 1991

328 ● Z. Galil and G. F. Italiano

1.3.1 Random Graph Model

The random graph model was proposed by Yao [1976], based upon the random graph model of Erdős and Rényi [1960]. Each of the n(n − 1)/2 undirected edges between two vertices of an n-vertex graph is chosen independently according to a Poisson process. When an edge (x, y) is chosen, a union(x, y) is executed if x and y are in different sets.

Theorem 1.3.1

[Bollobás and Simon 1985; Knuth and Schönhage 1978] The average running time of the quick-find and weighted quick-find algorithms in the random graph model is

E[t_n^(QF)] = n^2/8 + o(n(log n)^2);

E[t_n^(WQF)] = cn + o(n/log n),

where c = 2.0847….

1.3.2 Random Spanning Tree Model

Each sequence of union operations corresponds to a “union tree,” in which the edge (x, y) means that the set containing x is merged into the set containing y. In the random spanning tree model, all possible union trees are equally likely; there are n^(n−2) possible unoriented trees and (n − 1)! ways of choosing edges in each tree.

Theorem 1.3.2

[Knuth and Schönhage 1978; Yao 1976] The average running time of the quick-find and weighted quick-find algorithms in the random spanning tree model is

E[t_n^(QF)] = √(π/8) n^(3/2) + O(n log n);

E[t_n^(WQF)] = (1/π) n log n + O(n).

1.3.3 Random Components Model

In the simplest model, it is assumed that at any given time each pair of sets is equally likely to be merged by a union operation. This is also the least realistic model. Indeed, this assumption does not apply when one is interested in joining sets containing two elements chosen independently with uniform probability.

Theorem 1.3.3

[Doyle and Rivest 1976] The average running time of the quick-union algorithm in the random components model is

E[t_n^(QU)] = O(n).

The reader is referred to the original papers [Bollobás and Simon 1985; Doyle and Rivest 1976; Knuth and Schönhage 1978; Yao 1976] for details concerning the analysis of the expected behavior of these algorithms. Besides the significance of the three random input models chosen, these results underline that weighted quick-find algorithms are much faster than quick-find algorithms not only in the worst case but also on average.

1.4 Special Linear Cases

The six algorithms, using either union rule and either compaction rule as described in Section 1.1, run in O(n + mα(m, n)) time on a sequence of at most n − 1 union and m find operations. No better amortized bound is possible for separable and nonseparable pointer algorithms or in the cell probe model of computation. As a consequence, to get a better bound, we must consider a special case of set union. Gabow and Tarjan [1985] used this idea to devise a random access machine algorithm that runs in linear time for a special case in which the structure of the unions is known in advance. Interestingly, Tarjan's lower bound for separable pointer algorithms also applies to this restricted problem. This result is of theoretical interest and is significant in many applications, such as scheduling problems, the off-line min problem, finding maximum matchings in graphs, VLSI channel routing, finding nearest common ancestors in trees, and flow graph reducibility [Gabow and Tarjan 1985].

The problem can be formalized as follows. We are given a tree T containing n nodes that correspond to the initial n singleton sets. Denoting by parent(u) the parent of the node u in T, we have to perform a sequence of union and find operations such that each union can be only of the form union(parent(u), u). For this reason, T is called the static union tree, and the problem will be referred to as static tree set union. Also, the case in which the union tree can grow dynamically by means of new node insertions (referred to as incremental tree set union) can be solved in linear time. We first briefly sketch the solution of the static tree set union problem.

Gabow and Tarjan's static tree algorithm partitions the nodes of T into suitably chosen small sets, called microsets. Each microset contains fewer than b nodes [where b is such that b = Ω(log log n) and b = O(log n/log log n)], and there are at most O(n/b) microsets. To each microset S a node r ∉ S is associated, referred to as the root of S, such that S ∪ {r} induces a subtree of T with root r.

The roots of the microsets are maintained as a collection of disjoint sets, called macrosets. Macrosets facilitate access to and manipulation of microsets.

The basic ideas underlying the algorithm are the following. First, a priori knowledge about the static union tree can be used to precompute the answers to the operations performed within microsets by table lookup. Second, any one of the six optimal algorithms described in Section 1.1 can be used to maintain the macrosets. By combining these two techniques, a linear-time algorithm for this special case of the set union problem can be obtained.

Theorem 1.4.1

[Gabow and Tarjan 1985] If knowledge about the union tree is available in advance, each union and find operation can be supported in O(1) amortized time. The total space required is O(n).

The same algorithm given for static tree set union can be extended to the incremental tree set union problem. For this problem, the union tree is not known in advance but is allowed to grow only one node at a time during the sequence of union and find operations performed. This problem has applications in several algorithms for finding maximum matchings in general graphs. The incremental tree set union algorithm is similar to the algorithm for static tree set union. The only difference is in the construction of microsets, which now might change over time because of new node insertions into the union tree T. The basic idea is to start with only one microset, the root of T. When a new node w is inserted into T, say as a child of v, w is put in the same microset as v. If the size of this microset does not exceed b, nothing needs to be done. Otherwise, the microset is split into two new microsets. It can be proved that this split can be performed in O(b) time and that the total number of such splits is at most O(n/b). Therefore, the incremental tree set union problem can also be solved in linear time.
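The split-on-overflow bookkeeping can be sketched as follows (a counting illustration only, with hypothetical helper names: the real algorithm also keeps every microset a subtree of T and answers finds inside a microset by table lookup, and its splits respect the tree structure; the naive halving below performs at most 2n/b splits rather than the tighter O(n/b) bound):

```python
def build_incremental_microsets(insertions, b):
    """Track microsets as nodes are inserted one at a time.

    insertions: list of (node, parent) pairs, root first with parent None.
    Returns (microset_of, number_of_splits).
    """
    microset_of = {}
    members = {}                      # microset id -> list of member nodes
    next_id = 0
    splits = 0
    for node, parent in insertions:
        if parent is None:            # the root starts the first microset
            microset_of[node] = next_id
            members[next_id] = [node]
            next_id += 1
            continue
        m = microset_of[parent]       # the new node joins its parent's microset
        members[m].append(node)
        microset_of[node] = m
        if len(members[m]) >= b:      # overfull: split into two microsets
            splits += 1
            half = members[m][len(members[m]) // 2:]
            members[m] = members[m][: len(members[m]) // 2]
            members[next_id] = half
            for v in half:
                microset_of[v] = next_id
            next_id += 1
    return microset_of, splits
```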

Theorem 1.4.2

[Gabow and Tarjan 1985] The algorithm for incremental tree set union runs in a total of O(m + n) time and requires O(n) preprocessing time and space.

Loebl and Nešetřil [1988] presented a linear-time algorithm for another special case of the set union problem. They considered sequences of unions and finds with a constraint on the subsequence of finds. Namely, the finds are listed in a postorder fashion, where a postorder is a linear ordering of the leaves induced by a drawing of the tree in the plane. In this framework, they proved that such sequences of union and find operations can be performed in linear time, thus achieving O(1) amortized time per operation. A slightly more general class of input sequences, denoted local postorder, was proved not to be linear, but its rate of growth is not provable in the theory of finite sets. A preliminary version of these results was reported in Loebl and Nešetřil [1988].

1.5 Applications of Set Union

In this section, we list a few applications of the set union problem, coming from areas such as combinatorial optimization, graph algorithms, and theorem proving. We describe how to take advantage of set union algorithms for finding minimum spanning trees [Aho et al. 1974; Kerschenbaum and van Slyke 1972], for maintaining on-line the connected components of undirected graphs [Even and Shiloach 1981], and for performing unification of first-order terms [Aït-Kaci 1986; Aït-Kaci and Nasr 1986; Huet 1976; Vitter and Simons 1986].

A spanning tree S = (X, T) of a graph G = (V, E) is a tree that has the same vertex set as G (i.e., X = V) and that contains only edges of G (i.e., T ⊆ E). If each edge (i, j) of G is assigned a cost c(i, j), then the cost of a spanning tree is defined as

c(S) = Σ_{(i,j) ∈ T} c(i, j).

A minimum spanning tree of G is a spanning tree of minimum cost. This problem arises naturally in many applications, including communication networks and distributed computation. Aho et al. [1974] give the following algorithm to compute the minimum spanning tree of a graph, which was first developed by Kruskal [1956]:


begin
    S ← ∅;
    for each vertex v ∈ V do
        initialize a singleton set {v};
    while there is more than one set left do begin
        choose the cheapest edge (u, v) ∈ E;
        delete (u, v) from E;
        A ← FIND(u);
        B ← FIND(v);
        if A ≠ B then begin
            UNION(A, B);
            add (u, v) to S
        end;
    end;
end

The algorithm starts with a spanning forest of n singleton trees and examines all the edges of the graph in order of increasing cost. When an edge (u, v) is examined, the algorithm checks whether (u, v) connects two trees of the spanning forest. This is done by checking whether vertices u and v are in different trees of the spanning forest. If so, the trees are combined and (u, v) is inserted into the spanning tree. As shown in the above pseudocode, this algorithm can be efficiently implemented if the trees of the spanning forest are represented as disjoint sets subject to union and find operations.
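A compact rendering of this scheme (a sketch: the path-halving find and union-by-size rule form one of the optimal combinations of Section 1.1, and the edges are presorted rather than drawn one at a time from a priority queue):

```python
class DisjointSets:
    """Disjoint sets with union by size and path halving."""
    def __init__(self, elements):
        self.parent = {e: e for e in elements}
        self.size = {e: 1 for e in elements}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):            # a, b are roots of distinct trees
        if self.size[a] < self.size[b]:
            a, b = b, a               # link the smaller tree below the larger
        self.parent[b] = a
        self.size[a] += self.size[b]

def kruskal(vertices, edges):
    """edges: iterable of (cost, u, v) triples; returns the MST edge list."""
    forest = DisjointSets(vertices)
    tree = []
    for cost, u, v in sorted(edges):  # examine edges by increasing cost
        a, b = forest.find(u), forest.find(v)
        if a != b:                    # (u, v) joins two different trees
            forest.union(a, b)
            tree.append((u, v))
    return tree
```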

There are numerous spanning tree problems that also benefit from the ideas behind the set union algorithms. Among them are testing whether a given spanning tree is minimum [Tarjan 1979b], performing sensitivity analysis on minimum spanning trees [Tarjan 1982], and finding edge replacements in minimum spanning trees [Tarjan 1979b].

Another application of set union algorithms is the on-line maintenance of the connected components of an undirected graph. Namely, we are given an undirected graph in which edges can be inserted one at a time and for which connectivity questions, such as “Are vertices u and v in the same connected component?”, must be answered at any time in an on-line fashion. As noted by Even and Shiloach [1981], this problem can be solved by maintaining the connected components of the graph as disjoint sets subject to union and find operations. Checking whether two vertices u and v are in the same connected component can be performed by testing whether find(u) = find(v). The insertion of a new edge (x, y) can be accomplished by first checking whether x and y are already in the same connected component by means of two find operations. If so, the representation of the disjoint sets need not be changed. Otherwise, x and y are in two different connected components and therefore in two different sets, say A and B. As a result of the insertion of edge (x, y), A and B are combined through a union operation. By using the set union algorithms with any union rule and path compaction, the total time required to maintain an n-vertex graph under m edge insertions and k connectivity questions is O(n + kα(k, n)).
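The on-line scheme just described might be sketched as follows (illustrative only: the union rule is plain linking here, so the bounds quoted above, which assume a weighted union rule, do not apply verbatim):

```python
def connected_components_online(n, operations):
    """Maintain connected components of an n-vertex graph on line.

    operations: sequence of ('insert', x, y) or ('query', u, v) triples;
    each query yields whether u and v are currently connected.
    """
    parent = list(range(n))

    def find(x):                       # find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    answers = []
    for op, u, v in operations:
        a, b = find(u), find(v)
        if op == 'query':
            answers.append(a == b)     # same component iff same root
        elif a != b:                   # a new edge joins two components
            parent[b] = a              # (a weighted union rule is omitted)
    return answers
```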

A third application of set union algorithms is the unification problem. Informally, the unification problem can be stated as follows: “Given two different descriptions, can an object fitting both descriptions be found?” More formally, a description is an expression in logic composed of function symbols, variables, and constants. Finding values of the variables that make two expressions equal is an important process in logic and deduction called unification. The reader is referred to the excellent survey of Knight [1989] for a more precise definition of the unification problem and its significance in different interdisciplinary areas.

Many algorithms proposed to solve the unification problem exploit the efficiency of set union algorithms [Aït-Kaci 1986; Huet 1976; Vitter and Simons 1986]. The main idea behind those algorithms is to use a graph representation for the terms to be unified, where function symbols, variables, and constants are the vertices of the graph. Initially, each node is in a singleton class. Throughout the execution of the algorithm, equivalence classes of vertices are maintained. If two vertices u and v belong to the same equivalence class, they have been unified by the algorithm. As the unification proceeds, different equivalence classes are combined. If at the end of the algorithm a class contains different function symbols, the two original terms were not unifiable. For more details see Huet [1976] and Vitter and Simons [1986]. We note that these algorithms resemble another algorithm (again based upon set union data structures) required to solve equivalence of finite state machines [Hopcroft and Karp 1971] (see Aho et al. [1974, pp. 143-145]).

Another interesting application of set union algorithms is the depth determination problem. In this problem, one is interested in maintaining information about the depth of vertices in a forest of trees subject to update operations, such as linking two trees by inserting a new edge. Details can be found in Aho et al. [1974, pp. 141-142].

2. SET UNION PROBLEM ON INTERVALS

In this section, we describe efficient solutions to the set union problem on intervals, which can be defined as follows. Informally, we would like to maintain a partition of the list {1, 2, …, n} into adjacent intervals. A union operation joins two adjacent intervals, a find returns the name of the interval containing x, and a split divides the interval containing x (at x itself). More formally, at any time we maintain a collection of disjoint sets A_i with the following properties. The A_i's, 1 ≤ i ≤ k, are disjoint sets whose members are ordered by the relation ≤ and such that

∪_{i=1}^{k} A_i = {1, 2, …, n}.

Furthermore, every item in A_i is less than or equal to all the items in A_{i+1}, for i = 1, 2, …, k − 1. In other words, the intervals A_i partition the interval [1, n]. Set A_i is said to be adjacent to sets A_{i−1} and A_{i+1}. The set union problem on intervals consists of performing a sequence of the following operations:


Figure 3. Union and split operations.

union(S_1, S_2, S): Given the adjacent sets S_1 and S_2, combine them into a new set S = S_1 ∪ S_2.

find(x): Given the item x, return the name of the set containing x.

split(S, S_1, S_2, x): Partition S into two sets S_1 = {a ∈ S | a < x} and S_2 = {a ∈ S | a ≥ x}. Figure 3 shows examples of union and split operations.

Adopting the same terminology used in Mehlhorn et al. [1988], we will refer to the set union problem on intervals as the interval union-split-find problem. After discussing this problem, we consider two particular cases: the interval union-find problem and the interval split-find problem, where only union-find and split-find operations are allowed, respectively.

The interval union-split-find problem and its subproblems have applications in a wide range of areas, including problems in computational geometry such as dynamic segment intersection [Imai and Asano 1987; Mehlhorn 1984c; Mehlhorn and Näher 1990], shortest path problems [Ahuja et al. 1990; Mehlhorn 1984b], and the longest common subsequence problem [Aho et al. 1983; Hunt and Szymanski 1977].

2.1 Interval Union-Split-Find

In this section, we will describe optimal separable and nonseparable pointer algorithms for the interval union-split-find problem. The best separable algorithm for this problem runs in O(log n) worst-case time for each operation, whereas nonseparable pointer algorithms require only O(log log n) worst-case time for each operation. In both cases, no better bound is possible.


The upper bound for separable pointer algorithms can be easily obtained by means of balanced trees [Adelson-Velskii and Landis 1962; Aho et al. 1974; Mehlhorn 1984a; Nievergelt and Reingold 1973], whereas the following lower bound holds.

Theorem 2.1.1

[Mehlhorn et al. 1988] For any separable pointer algorithm, both the worst-case per operation time complexity of the interval split-find problem and the amortized time complexity of the interval union-split-find problem are Ω(log n).

Turning to nonseparable pointer algorithms, the upper bound can be found in Karlsson [1984], Mehlhorn and Näher [1990], van Emde Boas [1977], and van Emde Boas et al. [1977]. In particular, van Emde Boas et al. [1977] introduced a priority queue that supports, among other operations, insert, delete, and successor on a set whose elements belong to a fixed universe S = {1, 2, …, n}. The time required by each of those operations is O(log log n). Originally the space was O(n log log n), but it was later improved to O(n). It can be shown [Mehlhorn et al. 1988] that the above operations correspond to union, split, and find, respectively, and therefore the following theorem holds.

Theorem 2.1.2

[van Emde Boas 1977] There exists a data structure supporting each union, find, and split in O(log log n) worst-case time. The space required is O(n).

No better bound can be achieved for nonseparable pointer algorithms.

Theorem 2.1.3

[Mehlhorn et al. 1988] For any nonseparable pointer algorithm, both the worst-case per operation time complexity of the interval split-find problem and the amortized time complexity of the interval union-split-find problem are Ω(log log n).

Notice that this implies that, for the interval union-split-find problem, the separability assumption causes an exponential loss of efficiency.
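The correspondence between successor searches and interval union-split-find can be illustrated with a sorted list of right interval endpoints (a sketch only: Python's bisect gives O(n) worst-case insertion and deletion in a list, whereas a van Emde Boas tree achieves the O(log log n) bound quoted above; naming each interval by its right endpoint stands in for explicit set names):

```python
import bisect

class IntervalUnionSplitFind:
    """Partition of {1, ..., n} into adjacent intervals.

    Each interval is named by its right endpoint, kept in a sorted list;
    find is a successor search, split inserts a boundary, and union of
    two adjacent intervals deletes one boundary.
    """
    def __init__(self, n):
        self.ends = [n]                # a single interval [1, n]

    def find(self, x):
        i = bisect.bisect_left(self.ends, x)
        return self.ends[i]            # right endpoint of x's interval

    def split(self, x):
        # the interval containing x becomes [.., x-1] and [x, ..]
        bisect.insort(self.ends, x - 1)

    def union(self, s1, s2):
        # s1 < s2 are adjacent interval names: delete the boundary s1;
        # the merged interval is automatically named s2
        self.ends.remove(s1)
```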

2.2 Interval Union-Find

The interval union-find problem is a restriction of the set union problem described in Section 1, where only adjacent intervals are allowed to be joined. Hence, both the O(α(m + n, n)) amortized bound given in Theorem 1.1.1 and the O(log n/log log n) single-operation worst-case bound given in Theorem 1.2.1 still hold.

Whereas Tarjan's proof of the Ω(α(m + n, n)) amortized lower bound also holds for the interval union-find problem, Blum's proof does not seem to be easily adaptable to the new problem. Hence, it remains an open problem whether a bound better than O(log n/log log n) is possible for the single-operation worst-case time complexity of separable pointer algorithms.

It is also open whether less than O(log log n) worst-case time per operation can be achieved for nonseparable pointer algorithms. Gabow and Tarjan [1985] used the data structure described in Section 1.4 to obtain O(1) amortized time on a random access machine.

2.3 Interval Split-Find

According to Theorems 2.1.1, 2.1.2, and 2.1.3, the two algorithms given for the more general interval union-split-find problem are still optimal for the single-operation worst-case time complexity of the interval split-find problem. As a result, each split and find operation can be supported in Θ(log n) and in Θ(log log n) time, respectively, in the separable and nonseparable pointer machine models.

The amortized complexity of this problem can be reduced to O(log* n), where log* n is the iterated logarithm function,² as shown by Hopcroft and Ullman [1973]. Their algorithm is based upon an extension of an idea by Stearns and

² log* n = min{i | log^(i) n ≤ 1}, where log^(i) n = log log^(i−1) n for i > 0 and log^(0) n = n. Informally, it is the number of times the logarithm must be taken to obtain a number that is at most 1.


Rosenkrantz [1969]. The basic data structure is a tree in which each node at level i, i ≥ 1, has at most 2^F(i−1) children, where F(i) = F(i−1)·2^F(i−1) for i > 1, and F(0) = 1. A node is said to be complete either if it is at level 0, or if it is at level i ≥ 1 and has 2^F(i−1) children, all of which are complete. The invariant maintained for the data structure is that no node has more than two incomplete children. Moreover, the incomplete children (if any) will be leftmost and rightmost. As in the usual tree data structures for set union, the name of a set is stored in the tree root.

Initially, such a tree with n leaves is created. Its height is O(log* n), and therefore a find(x) will require O(log* n) time to return the name of the set. To perform a split(x), we start at the leaf corresponding to x and traverse the path to the root to partition the tree into two trees. It is possible to show that, using this data structure, the amortized cost of a split is O(log* n) [Hopcroft and Ullman 1973].
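The growth function F governing this structure can be tabulated directly (assuming, as a convention of ours, that the recurrence also applies at i = 1; the point is that the number of levels needed to accommodate n leaves grows like log* n):

```python
def F(i):
    """Growth function of the Hopcroft-Ullman tree: F(0) = 1,
    F(i) = F(i-1) * 2**F(i-1)."""
    return 1 if i == 0 else F(i - 1) * 2 ** F(i - 1)

def levels_needed(n):
    """Smallest height L such that a complete L-level tree has >= n leaves."""
    leaves, level = 1, 0
    while leaves < n:
        level += 1
        leaves *= 2 ** F(level - 1)    # each level-i node has 2**F(i-1) children
    return level
```

Already F(3) = 2048, so a tree of height 4 accommodates far more than 10^9 leaves, which is the O(log* n) height in concrete terms.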

This bound can be further improved to O(α(m, n)), as shown by Gabow [1985]. The algorithm used to establish this upper bound relies on a sophisticated partition of the items contained in each set.

Theorem 2.3.1

[Gabow 1985] There exists a data structure supporting a sequence of m find and split operations in O(mα(m, n)) worst-case time. The space required is O(n).

Very recently, La Poutré [1990b] proved that this bound is tight for (both separable and nonseparable) pointer algorithms.

Theorem 2.3.2

[La Poutré 1990b] Any pointer algorithm requires Ω(n + mα(m, n)) time to perform n − 1 split and m find operations.

Using the power of a random access machine, Gabow and Tarjan [1985] were able to achieve O(1) amortized time for the interval split-find problem. This bound is obtained by using a slight variant of the data structure sketched in Section 1.4.

2.4 Applications of Set Union on Intervals

The problem of determining the longest common subsequence of two input sequences is an application of set union on intervals. This problem has many applications, including sequence comparison in molecular biology and the widely used diff file comparison program [Aho et al. 1983]. The problem can be defined as follows. A subsequence of a sequence x is obtained by removing elements from x. The elements removed need not be contiguous. The longest common subsequence of two sequences x and y is a longest sequence that is a subsequence of both x and y.

Hunt and Szymanski [1977] devised a solution to this problem based upon set union on intervals. Denote by x = x_1, x_2, …, x_m and y = y_1, y_2, …, y_n the two sequences whose longest common subsequence we would like to compute. Without loss of generality, assume m ≤ n (otherwise exchange the roles of the two sequences in what follows). For each symbol a of the alphabet over which the two sequences are defined, compute OCCURRENCES(a) = {i | y_i = a}. This gives all the positions of the sequence y at which symbol a appears.

For the sake of simplicity, we only mention how to find the length of the longest common subsequence of x and y. The longest common subsequence itself can be found with some additional bookkeeping. The algorithm considers each symbol x_j, 1 ≤ j ≤ m, and computes, for 0 ≤ i ≤ n, the length of the longest common subsequence of the two prefixes x_1, x_2, …, x_j and y_1, y_2, …, y_i. To accomplish this task efficiently, for a fixed j we define sets A_k of indexes as follows. A_k consists of all the integers i such that the longest common subsequence of x_1, x_2, …, x_j and y_1, y_2, …, y_i has length k. Notice that the sets A_k partition {1, 2, …, n} into adjacent intervals, since each A_k contains consecutive integers and items in A_{k+1} are larger than those in A_k for any k. Assume we have already computed the sets A_k up to position j − 1 of the string x. We show how to update them for position j. For each r in OCCURRENCES(x_j), we consider whether we can add the match between x_j and y_r to the longest common subsequence of x_1, x_2, …, x_{j−1} and y_1, y_2, …, y_r. The crucial point is that if both r − 1 and r are in A_k, then all the indexes s ≥ r belong to A_{k+1} when x_j is considered. The following pseudocode describes this algorithm. The reader is referred to Hunt and Szymanski [1977] for details.

begin
    initialize A_0 = {0, 1, …, n};
    for i ← 1 to n do
        A_i ← ∅;
    for j ← 1 to m do
        for r ∈ OCCURRENCES(x_j) do begin
            k ← FIND(r);
            if k = FIND(r − 1) then begin
                SPLIT(A_k, A_k, A′_k, r);
                UNION(A′_k, A_{k+1}, A_{k+1})
            end;
        end;
    return(FIND(n))
end
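This scheme can be exercised with a self-contained sketch in which an array of class indexes plays the role of FIND, and a while loop performs the SPLIT/UNION pair naively in O(n) time per match (an illustration only; a split-find structure gives the efficient bounds). One implementation detail we assume: matches are scanned in decreasing position, so that a single x_j is never paired with two positions of y in the same pass.

```python
from collections import defaultdict

def lcs_length(x, y):
    """Length of the LCS of x and y via the interval-partition scheme.

    cls[i] is the index k of the interval A_k containing position i;
    the while loop moves the tail of A_k starting at r into A_{k+1}.
    """
    n = len(y)
    occurrences = defaultdict(list)
    for i in range(1, n + 1):          # 1-based positions of each symbol in y
        occurrences[y[i - 1]].append(i)
    cls = [0] * (n + 1)                # initially A_0 = {0, 1, ..., n}
    for symbol in x:
        for r in reversed(occurrences.get(symbol, [])):
            k = cls[r]
            if cls[r - 1] == k:        # r-1 and r lie in the same interval A_k
                s = r
                while s <= n and cls[s] == k:
                    cls[s] = k + 1     # tail of A_k joins A_{k+1}
                    s += 1
    return cls[n]                      # FIND(n)
```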

We conclude this section by mentioning that Hunt and Szymanski's algorithm has recently been improved by Apostolico and Guerra [1987] and by Eppstein et al. [1990]. Both algorithms use similar paradigms but more sophisticated data structures.

3. SET UNION PROBLEM WITH DEUNIONS

Mannila and Ukkonen [1986a] defined a generalization of the set union problem, referred to as set union with deunions. In addition to union and find, the following operation is allowed:

deunion: Undo the most recently performed union operation not yet undone.

Motivations for studying this problem arise in logic programming interpreter memory management without function symbols [Hogger 1984; Mannila and Ukkonen 1986b, 1986c; Warren and Pereira 1977]. In Prolog, for example, variables of clauses correspond to the elements of the sets, unifications correspond to unions, and backtracking corresponds to deunions [Mannila and Ukkonen 1986b].

3.1 Algorithms for Set Union with Deunions

Recently, the amortized complexity of set union with deunions was characterized by Westbrook and Tarjan [1989a], who derived a Θ(log n/log log n) upper and lower bound. The upper bound is obtained by extending the path compaction techniques described in the previous sections to deal with deunions. The lower bound holds for separable pointer algorithms. The same upper and lower bounds also hold for the single-operation worst-case time complexity of the problem.

We now describe Θ(log n/log log n) amortized algorithms for the set union problem with deunions [Mannila and Ukkonen 1987; Westbrook and Tarjan 1989a]. They all use one of the union rules combined with either path splitting or path halving. Path compression with any one of the union rules leads to an O(log n) amortized algorithm, as can be seen by first performing n − 1 unions that build a binomial tree (as defined, for instance, in Tarjan and van Leeuwen [1984]) of depth O(log n) and then repeatedly carrying out a find on the deepest leaf, a deunion, and a redo of that union.

In the following, a union operation not yet undone will be referred to as live, and otherwise as dead. To handle deunions, a union stack is maintained, which contains the roots made nonroots by live unions. Furthermore, for each node x a node stack P(x) is maintained, which contains the pointers leaving x created either by unions or by finds. During a path compaction caused by a find, the old pointer leaving x is left in P(x) and each newly created pointer (x, y) is pushed onto P(x). The bottommost pointer on these stacks is created by a union and will be referred to as a union pointer. The other pointers are created by the path compaction performed during find operations and are called find pointers. Each of these pointers is associated with a unique union operation, the one whose undoing would invalidate the pointer. A pointer is said to be live if the associated union operation is live, and dead otherwise.

Unions are performed as in the set union problem, except that for each union a new item is pushed onto the union stack, containing the tree root made a nonroot and some bookkeeping information about the set name and either the size or the rank. To perform a deunion, the top element is popped from the union stack and the pointer leaving that node is deleted. The extra information stored in the union stack is used to maintain set names and either sizes or ranks.
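A minimal sketch of the union stack (a simplification: no path compaction is performed, so the per-node pointer stacks P(x) are unnecessary and each find simply climbs to the root; set names are likewise omitted):

```python
class UnionFindDeunion:
    """Union by size with a union stack recording undoable links."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.trail = []                 # roots made nonroots by live unions

    def find(self, x):
        while self.parent[x] != x:      # no compaction: nothing to undo later
            x = self.parent[x]
        return x

    def union(self, x, y):
        a, b = self.find(x), self.find(y)
        if a == b:
            return
        if self.size[a] < self.size[b]:
            a, b = b, a
        self.parent[b] = a              # b becomes a nonroot: record it
        self.size[a] += self.size[b]
        self.trail.append(b)

    def deunion(self):
        if self.trail:
            b = self.trail.pop()        # most recent live union
            a = self.parent[b]
            self.parent[b] = b          # detach: b is a root again
            self.size[a] -= self.size[b]
```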

There are actually two versions of these algorithms, depending on when dead pointers are removed from the data structure. Eager algorithms pop pointers from the node stacks as soon as they become dead (i.e., after a deunion operation). On the other hand, lazy algorithms remove dead pointers in a lazy fashion while performing subsequent union and find operations. Combined with the allowed union and compaction rules, this gives a total of eight algorithms. They all have the same time and space complexity, as the following theorem shows.

Theorem 3.1.1

Either union by size or union by rank, in combination with either path splitting or path halving, gives both eager and lazy algorithms that run in O(log n/log log n) amortized time per operation. The space required by all these algorithms is O(n).

Proof. The time bounds for the eager and lazy algorithms follow from Theorems 1 and 2 in Westbrook and Tarjan [1989a]. The space bound for the eager algorithms is O(n). The space complexity of the lazy algorithms can be shown to be O(n) by following the stamping technique introduced by Gambosi et al. [1989b], which establishes that the lazy algorithms require no more space than their eager counterparts. ❑

This bound is tight for separable pointer algorithms.

Theorem 3.1.2

[Westbrook and Tarjan 1989a] Every separable pointer algorithm for the set union problem with deunions requires at least Ω(log n/log log n) amortized time per operation.

As for the single-operation worst-case time complexity of set union with deunions, an extension of Blum's data structure described in Section 1.2 can also support deunions. As a result, the augmented data structure supports each union, find, and deunion in O(log n/log log n) time in the worst case, with O(n) space usage.

3.2 Applications of Set Union with Deunions

The main application of set union with deunions arises in logic programming interpreter memory management without function symbols [Mannila and Ukkonen 1986b]. Indeed, the most popular logic programming language, Prolog, uses unification and backtracking as crucial operations [Warren and Pereira 1977]. We now consider the following example, borrowed from Clocksin and Mellish [1981], to show how unification and backtracking work in Prolog.

Consider a database consisting of thefollowing assertions

likes(mary, food)

likes(mary, wine)

likes(john, wine)

likes(john, mary)

which represent the facts that Mary likes food, that Mary and John like wine, and that John likes Mary. The question, “Is there anything John and Mary both like?”, is phrased in Prolog as follows:

?- likes(mary, X), likes(john, X).

Prolog answers this question by first attempting to unify the first term of the query with some assertion in the database. The first matching fact found is likes(mary, food). As a result, the terms likes(mary, food) and likes(mary, X) are unified, and Prolog instantiates X to food everywhere X appears in the query. Now the database is searched for the second term in the query, which is likes(john, food) because of the above substitution. But this term fails to unify with any other term in the database.

Then Prolog backtracks; that is, it “undoes” the last unification performed: it undoes the unification of likes(mary, food) with likes(mary, X). As a result, the variable X becomes uninstantiated again. Then Prolog tries to reunify the first term of the query with another term in the database. The next matching fact is likes(mary, wine), and therefore the variable X is instantiated to wine everywhere X appears. As before, Prolog now tries to unify the second term, searching this time for likes(john, wine). This can be unified with the third assertion in the database, and Prolog notifies the user with the answer

X = wine.

Consequently, the execution of a Prolog program without function symbols can be seen as a sequence of unifications and deunifications. Therefore, data structures for the set union problem with deunions can be effectively used for this problem. More details on this application can be found in Mannila and Ukkonen [1986b].
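The correspondence can be made concrete with a small sketch. The following Python class is our own illustration (all identifiers are invented): it maintains disjoint sets with union by rank and an undo trail, so that deunion reverses the most recent union exactly as Prolog's backtracking requires. Path compression is deliberately omitted, since compressed paths are hard to undo; this simple policy is not one of the optimal structures surveyed here.

```python
# Illustrative union-find with deunion; union by rank only, no path
# compression, so every union changes O(1) cells and can be undone.
class UnionFindWithDeunion:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.trail = []              # stack of live unions, for undoing

    def find(self, x):
        while self.parent[x] != x:   # no path compression
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            self.trail.append(None)  # record a no-op union
            return
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx          # rx becomes the new root
        bumped = self.rank[rx] == self.rank[ry]
        self.parent[ry] = rx
        if bumped:
            self.rank[rx] += 1
        self.trail.append((ry, rx, bumped))

    def deunion(self):
        """Undo the most recent union (one Prolog backtracking step)."""
        rec = self.trail.pop()
        if rec is None:
            return
        ry, rx, bumped = rec
        self.parent[ry] = ry
        if bumped:
            self.rank[rx] -= 1
```

In the example above, unifying X with food corresponds to a union of the equivalence classes of X and food; Prolog's backtracking step is exactly a deunion.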

4. EXTENSIONS OF THE SET UNION PROBLEM WITH DEUNIONS

Recently, other variants of the set union problem with deunions have been considered, such as set union with arbitrary deunions [Mannila and Ukkonen 1988], set union with dynamic weighted backtracking [Gambosi et al. 1991], and set union with unlimited backtracking [Apostolico et al. 1989]. In the following sections, we discuss these problems and give the best algorithms known for their solution.

4.1 Set Union with Arbitrary Deunions

Mannila and Ukkonen [1988] introduced a variant of the set union problem called the set union problem with arbitrary deunions. This problem consists of maintaining a collection of disjoint sets under an intermixed sequence of the following operations:

union(x, y, A): Combine the sets containing elements x and y into a new set named A.

find(x): Output the name of the set that currently contains element x.

deunion(i): Undo the ith union so far performed.

After a deunion(i), the names of the sets are as if the ith union had never occurred.

Motivations for studying this problem arise in the incremental execution of logic programs [Mannila and Ukkonen 1988]. The following lower bound can be obtained by reduction to the interval union-find-split problem [Mehlhorn et al. 1988], as characterized in Theorems 2.1.1 and 2.1.3.

Theorem 4.1.1

[Mannila and Ukkonen 1988] The amortized complexity of the set union problem with arbitrary deunions is Ω(log n) for separable pointer algorithms and Ω(log log n) for nonseparable pointer algorithms.

There is actually an O(log n) time algorithm matching this lower bound, as the following theorem states.

Theorem 4.1.2

[Galil and Italiano 1991] There exists a data structure that supports each union, find, and deunion(i) in O(log n) time and O(n) space.

4.2 Set Union with Dynamic Weighted Backtracking

Gambosi et al. [1991] considered a further extension of the set union problem with deunions by assigning weights to each union and by allowing backtracking either to the union of maximal weight or to a generic union so far performed. They refer to this problem as the set union problem with dynamic weighted backtracking. The problem consists of supporting the following operations.

union(A, B, w): Combine sets A and B into a new set named A and assign weight w to the operation.

find(x): Return the name of the set containing element x.

increase_weight(i, Δ): Increase by Δ the weight of the ith union performed, Δ > 0.

decrease_weight(i, Δ): Decrease by Δ the weight of the ith union performed, Δ > 0.

backweight: Undo all the unions performed after the one with maximal weight.

backtrack(i): Undo all the unions performed after the ith one.
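To pin down the intended semantics of these operations, here is a naive Python sketch of our own (not the efficient data structure of Gambosi et al.; all identifiers are invented, and the operations here may cost far more than the bounds stated in this section). It keeps a trail of live weighted unions, with union by rank and no path compression so each union can be undone.

```python
# Naive semantic sketch of set union with dynamic weighted backtracking.
# Unions are assumed to merge two distinct sets, as in the problem statement.
class WeightedBacktrackUF:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.name = list(range(n))   # set name, kept at the root
        self.trail = []  # live unions: [child, root, old_name, bumped, weight]

    def _root(self, x):
        while self.parent[x] != x:   # no path compression: undoable
            x = self.parent[x]
        return x

    def find(self, x):
        return self.name[self._root(x)]

    def union(self, a, b, w):
        """Unite the sets of a and b with weight w; keep a's set name."""
        ra, rb = self._root(a), self._root(b)
        if ra == rb:
            return
        keep = self.name[ra]         # name of a's set survives
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra          # ra becomes the new root
        bumped = self.rank[ra] == self.rank[rb]
        self.parent[rb] = ra
        if bumped:
            self.rank[ra] += 1
        self.trail.append([rb, ra, self.name[ra], bumped, w])
        self.name[ra] = keep

    def _undo_last(self):
        rb, ra, old_name, bumped, _ = self.trail.pop()
        self.parent[rb] = rb
        self.name[ra] = old_name
        if bumped:
            self.rank[ra] -= 1

    def increase_weight(self, i, d):
        self.trail[i - 1][4] += d    # ith live union, d > 0

    def decrease_weight(self, i, d):
        self.trail[i - 1][4] -= d    # d > 0

    def backweight(self):
        """Undo every union after the one of maximal weight."""
        if self.trail:
            top = max(range(len(self.trail)), key=lambda j: self.trail[j][4])
            self.backtrack(top + 1)

    def backtrack(self, i):
        """Undo all unions after the ith one."""
        while len(self.trail) > i:
            self._undo_last()
```

For example, after unions of weights 5, 9, and 2, a backweight undoes only the third union, returning to the state right after the most promising (weight-9) unification.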

Motivations for the study of the set union problem with dynamic weighted backtracking arise in the implementation of search heuristics in the framework of Prolog environment design [Hogger 1984; Warren and Pereira 1977]. In such a context, a sequence of union operations models a sequence of unifications between terms [Mannila and Ukkonen 1986b], whereas the weight associated with a union is used to evaluate the goodness of the state resulting from the unification to which the union is associated. Thus, backtracking corresponds to returning to the most promising state examined so far in the case of a failure of the current path of search. Furthermore, the repertoire of operations is enhanced by allowing the weight associated with each union already performed to be updated (both increased and decreased).

This operation adds more heuristic power to the algorithms for logic programming interpreter memory management and therefore improves the practical performance of the previously known "blind" uniform-cost algorithms. Backtracking to the state just before any union is performed is supported by the backtrack(i) operation. This makes it possible to implement several hybrid strategies based on best-first search combined with backtracking [Ibaraki 1978; Pearl 1984].

Theorem 4.2.1

[Gambosi et al. 1991] It is possible to perform each union, find, increase_weight, decrease_weight, backweight, and backtrack in O(log n) time. The space required is O(n). This bound is the best possible for any nonseparable pointer algorithm.

Better amortized bounds can be achieved. Indeed, it is possible to perform each backweight, backtrack, and increase_weight in O(1) amortized time, and decrease_weight and union in O(log n) amortized time. Finds can be supported in O(log n / max{1, log(γ log n)}) amortized time, where γ is the ratio of the number of finds to the number of unions and backtracks in the sequence. This can be obtained by using the data structure described in Westbrook and Tarjan [1989a] in combination with Fibonacci heaps [Fredman and Tarjan 1987]. The space required is O(n).

4.3 Set Union with Unlimited Backtracking

A further generalization of the set union problem with deunions was considered in Apostolico et al. [1989]. This generalization is called the set union problem with unlimited backtracking, since the limitation that at most one union could be undone per operation is removed.

As before, we call a union live if it has not yet been undone and dead otherwise. In the set union problem with unlimited backtracking, deunions are replaced by the following more general operation:

backtrack(i): Undo the last i live unions performed, for any integer i > 0.

Note that this problem lies between set union with deunions and set union with weighted backtracking. In fact, as previously noted, it is more general than the set union problem with deunions, since a deunion can be implemented as backtrack(1). On the other hand, it is a particular case of set union with weighted backtracking, where only unweighted union, find, and backtrack operations are considered. As a consequence, its time complexity should be between O(log n/log log n) and O(log n). Apostolico et al. [1989] showed that the time complexity of this problem is O(log n/log log n) for separable pointer algorithms when unites instead of unions are performed (i.e., when the name of the new set can be arbitrarily chosen).

There is a strict relationship between backtracks and deunions. We already noted that a backtrack(1) is simply a deunion operation. Furthermore, a backtrack(i) can be implemented by performing exactly i deunions. Hence, a sequence of m1 unions, m2 finds, and m3 backtracks can be carried out by simply performing at most m1 deunions instead of the backtracks. Applying Westbrook and Tarjan's algorithms to the sequence of union, find, and deunion operations, a total of O((m1 + m2) log n/log log n) worst-case running time will result. As a consequence, the set union problem with unlimited backtracking can be solved in O(log n/log log n) amortized time per operation. Since backtracks contain deunions as a particular case, this bound is tight for the class of separable pointer algorithms.
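The reduction of backtrack(i) to i deunions can be sketched with a generic undo log (a minimal illustration of our own; all names are invented): each union pushes the list of memory-cell changes it made, and a backtrack simply pops and reverses i of these lists, most recent first.

```python
# Generic undo log: one list of (array, index, old_value) records per
# live union; backtrack(i) is literally i deunions.
class UndoLog:
    def __init__(self):
        self.records = []            # one change list per live union

    def begin_union(self):
        self.records.append([])      # start recording a new union

    def note(self, arr, idx):
        # call before overwriting arr[idx] during a union
        self.records[-1].append((arr, idx, arr[idx]))

    def deunion(self):
        # reverse the most recent union's changes, latest first
        for arr, idx, old in reversed(self.records.pop()):
            arr[idx] = old

    def backtrack(self, i):
        # undo the last i live unions
        for _ in range(min(i, len(self.records))):
            self.deunion()
```

A union-find structure would call note() on each parent (or rank) cell it is about to overwrite; the log then restores those cells verbatim on backtracking.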

Using either Westbrook and Tarjan's algorithms or Blum's augmented data structure, each backtrack(i) can require O(i log n/log log n) time in the worst case. Also note that the worst-case time complexity of backtrack(i) is at least Ω(i) as long as one insists on deleting pointers as soon as they are invalidated by backtracking (as in the eager methods described in Section 3.1). This is so because in this case at least one pointer must be removed for each erased union. This is clearly undesirable, since i can be as large as n - 1. To overcome this difficulty, dead pointers must be removed in a lazy fashion.

The following theorem holds for the set union problem with unlimited backtracking when union operations are taken into account.

Theorem 4.3.1

It is possible to perform each union, find, and backtrack(i) in O(log n) time in the worst case. This bound is tight for nonseparable pointer algorithms.

Proof. The upper bound is a straightforward consequence of Theorem 4.2.1, since unlimited backtracking is a particular case of weighted backtracking. Furthermore, the proof of the lower bound for set union with weighted backtracking can be adapted to the new problem. ❑

If unite operations (instead of unions) are performed, the upper bound reduces to O(log n/log log n). No better bound is possible for separable pointer algorithms.

Theorem 4.3.2

[Apostolico et al. 1989] There exists a data structure that supports each unite and find operation in O(log n/log log n) time and each backtrack in O(1) time, and requires O(n) space.

No better bound is possible for any separable pointer algorithm or in the cell probe model of computation.

Theorem 4.3.3

For any n, any separable pointer algorithm for set union with unlimited backtracking has single-operation time complexity at least Ω(log n/log log n) in the worst case. The same lower bound holds also in the cell probe model of computation.

Proof. It is a trivial extension of Theorems 1.2.2 and 1.2.3, which state that it suffices to consider only unite and find operations. ❑

It is somewhat surprising that the two versions of the set union problem with unlimited backtracking have such different time complexities and that the version with unites can be solved more efficiently than the version with unions. We recall here that after a unite(A, B), the name of the newly created set is either A or B. This is not a significant restriction in many applications, where one is mostly concerned with testing whether two elements belong to the same equivalence class, no matter what the name of the class is. The lower bound of Ω(log n) is a consequence of Theorem 2 in Mannila and Ukkonen [1988], which depends heavily on the fact that a union cannot arbitrarily choose a new name. The crucial idea behind the proof of that theorem is that at some point we may have to choose among Θ(n) different names of a set containing any given element in order to output a correct answer. But if a new name can be arbitrarily chosen after performing a union, the inherent complexity of the set union problem with unlimited backtracking reduces to Θ(log n/log log n). Hence, the constraint on the choice of a new name is responsible for the gap between Θ(log n/log log n) and Θ(log n).

5. PERSISTENT DATA STRUCTURES FOR SET UNION

In this section we describe persistent data structures for the set union problem [Driscoll et al. 1989; Overmars 1983]. Following Driscoll et al. [1989], we define a data structure to be ephemeral when an update destroys the previous version. A partially persistent data structure supports access to multiple versions, but only the most recent version can be modified. A data structure is said to be fully persistent if every version can be both accessed and modified.

The fully persistent disjoint set union problem can be defined as follows. Throughout any sequence of operations, multiple versions of a set union data structure are maintained (i.e., multiple versions of the partition are maintained). Union operations are updates, whereas find operations are accesses. If the jth union operation applies to version u, u < j, the result of the update is a new version j. The operations on the fully persistent data structure can be defined as follows (we use uppercase initials to distinguish them from the corresponding operations on the ephemeral data structure).

Union(x, y, u): Denote by X and Y the two sets in version u containing, respectively, x and y. If X = Y, do nothing. Otherwise, create a new version in which X and Y are combined into a new set. The new set gets the same name as X.

Find(x, u): Return the name of the set containing element x in version u.

Initially the partition consists of n singleton sets {1}, {2}, ..., {n}, and the name of set {i} is i. This is version 0 of the set union data structure. The restricted case in which union operations are allowed to modify only the most recent version defines the partially persistent disjoint set union problem.
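These definitions can be illustrated by a naive sketch of the partially persistent case (our own Python illustration, not an optimal structure; all identifiers are invented). Because no path compression is used, each element acquires at most one parent pointer over the whole sequence, which can be timestamped with the version that created it; set names are kept as version-stamped histories at the roots, and a Find at an old version simply ignores edges and names that are too new.

```python
import bisect

# Naive partially persistent union-find: unions update only the latest
# version, finds may query any version.  Union by rank, no path
# compression, so parent edges are written once and merely timestamped.
class PartiallyPersistentUF:
    def __init__(self, n):
        self.parent = [None] * n                   # (version, parent) or None
        self.rank = [0] * n
        self.names = [[(0, i)] for i in range(n)]  # (version, name) history
        self.version = 0                           # latest version number

    def _root(self, x, v):
        # climb only parent edges created at or before version v
        while self.parent[x] is not None and self.parent[x][0] <= v:
            x = self.parent[x][1]
        return x

    def Find(self, x, v):
        r = self._root(x, v)
        hist = self.names[r]
        j = bisect.bisect_right(hist, (v, float("inf"))) - 1
        return hist[j][1]                          # latest name at version <= v

    def Union(self, x, y):
        """Unite x's and y's sets in the latest version; returns new version."""
        v = self.version
        rx, ry = self._root(x, v), self._root(y, v)
        if rx == ry:
            return v                               # X = Y: do nothing
        self.version = v + 1
        keep = self.Find(x, v)                     # new set keeps x's set name
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx                        # rx becomes the new root
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        self.parent[ry] = (self.version, rx)       # timestamped, written once
        self.names[rx].append((self.version, keep))
        return self.version
```

With union by rank, every _root walk follows O(log n) edges; the sketch only illustrates the versioned semantics, not the bounds of the theorems below.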

We first consider the partially persistent disjoint set union problem. As in the case of the set union problem with unlimited backtracking, we have two versions of this problem, depending on whether union or unite operations are performed. The time complexities of the two versions are quite different, as shown in the following two theorems.

Theorem 5.1

[Mannila and Ukkonen 1988] There exists a data structure that solves the partially persistent disjoint set union problem in O(log n) worst-case time per operation with O(n) space. No better bound is possible for pointer-based algorithms.

The amortized time of a union can be further reduced to O(1) by using the data structures introduced in Brown and Tarjan [1980] and Huddleston and Mehlhorn [1982]. Different data structures can also be used to establish the previous upper bound, as shown for instance in Gaibisso et al. [1990] and Mannila and Ukkonen [1988]. Furthermore, if we perform unites instead of unions, a better algorithm can be found.

Theorem 5.2

[Apostolico et al. 1989] If unites instead of unions are performed, the partially persistent disjoint set union problem can be solved in O(log n/log log n) time per operation with O(n) space. No better bound is possible for separable pointer algorithms and in the cell probe model of computation.

As in the case of the set union problem with unlimited backtracking, the constraint on the choice of a new name is responsible for the gap between Θ(log n/log log n) and Θ(log n).

Turning to the fully persistent disjoint set union problem, Italiano and Sarnak [1991] gave an efficient data structure for this problem that achieves the following bounds.

Theorem 5.3

[Italiano and Sarnak 1991] The fully persistent disjoint set union problem can be solved in O(log n) worst-case time per operation and in O(1) amortized space per update. No better pointer-based algorithm is possible.

6. CONCLUSIONS AND OPEN PROBLEMS

In this paper we described the most efficient known algorithms for solving the set union problem and some of its variants. Most of the algorithms we described are optimal with respect to a certain model of computation (e.g., pointer machines with or without the separability assumption, random access machines). There are still several intriguing problems in all the models of computation we have considered. Fredman and Saks' lower bound analysis [Fredman 1989; Fredman and Saks 1989] settled both the amortized time complexity and the single-operation worst-case time complexity of the set union problem in the cell probe model of computation. La Poutré [1990b] settled the amortized time complexity of the set union problem for nonseparable pointer algorithms. It is still open, however, whether in these models the amortized and the single-operation worst-case complexity of the set union problems with deunions or backtracking can be improved. Furthermore, there are no lower bounds for some of the set union problems on intervals. In the pointer machine model with the separability assumption, there is no lower bound for either the amortized or the worst-case complexity of interval union-find. In the realm of nonseparable pointer algorithms, it remains open whether the O(log n/log log n) worst-case bound [Blum 1986] for interval union-find is tight. This problem requires Θ(1) amortized time on a random access machine, as shown by Gabow and Tarjan [1985].

ACKNOWLEDGMENTS

We would like to thank Alberto Apostolico, Giorgio Ausiello, Hal Gabow, Lane Hemachandra, Heikki Mannila, Kurt Mehlhorn, Neil Sarnak, Bob Tarjan, Esko Ukkonen, Jan van Leeuwen, and Henryk Wozniakowski, who read earlier versions of this paper and supplied many helpful comments. We are also grateful to Salvatore March and to the anonymous referees for their valuable suggestions.

This research was supported in part by NSF Grant CCR-88-14977, by the ESPRIT II Basic Research Actions Program of the European Communities under contract 3075 (Project ALCOM), and by the Italian MURST Project, Algoritmi e Strutture di Calcolo. The second author was supported in part by an IBM Graduate Fellowship.

REFERENCES

ACKERMANN, W. 1928. Zum Hilbertschen Aufbau der reellen Zahlen. Math. Ann. 99, 118-133.

ADELSON-VELSKII, G. M., AND LANDIS, Y. M. 1962. An algorithm for the organization of the information. Soviet Math. Dokl. 3, 1259-1262.

AHO, A. V., HOPCROFT, J. E., AND ULLMAN, J. D. 1973. On computing least common ancestors in trees. In Proceedings of the 5th Annual ACM Symposium on Theory of Computing. ACM Press, New York, NY, pp. 253-265.


AHO, A. V., HOPCROFT, J. E., AND ULLMAN, J. D. 1974. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Mass.

AHO, A. V., HOPCROFT, J. E., AND ULLMAN, J. D. 1983. Data Structures and Algorithms. Addison-Wesley, Reading, Mass.

AHUJA, R. K., MEHLHORN, K., ORLIN, J. B., AND TARJAN, R. E. 1990. Faster algorithms for the shortest path problem. J. ACM 37, 213-223.

AÏT-KACI, H. 1986. An algebraic semantics approach to the effective resolution of type equations. Theor. Comput. Sci. 45, 293-351.

AÏT-KACI, H., AND NASR, R. 1986. LOGIN: A logic programming language with built-in inheritance. J. Logic Program. 3.

APOSTOLICO, A., AND GUERRA, C. 1987. The longest common subsequence problem revisited. Algorithmica 2, 315-336.

APOSTOLICO, A., GAMBOSI, G., ITALIANO, G. F., AND TALAMO, M. 1989. The set union problem with unlimited backtracking. Tech. Rep. CS-TR-908, Dept. of Computer Science, Purdue University.

ARDEN, B. W., GALLER, B. A., AND GRAHAM, R. M. 1961. An algorithm for equivalence declarations. Commun. ACM 4, 310-314.

BANACHOWSKI, L. 1980. A complement of Tarjan's result about the lower bound on the complexity of the set union problem. Inf. Process. Lett. 11, 59-65.

BEN-AMRAM, A. M., AND GALIL, Z. 1988. On pointers versus addresses. In Proceedings of the 29th Annual Symposium on Foundations of Computer Science. IEEE Computer Society, pp. 532-538. To appear in J. ACM.

BEN-AMRAM, A. M., AND GALIL, Z. 1990. On a tradeoff in data structures. Manuscript.

BLUM, N. 1986. On the single operation worst-case time complexity of the disjoint set union problem. SIAM J. Comput. 15, 1021-1024.

BOLLOBÁS, B., AND SIMON, I. 1985. On the expected behaviour of disjoint set union problems. In Proceedings of the 17th Annual ACM Symposium on Theory of Computing. ACM Press, New York, NY, pp. 224-231.

BROWN, M. R., AND TARJAN, R. E. 1980. Design and analysis of a data structure for representing sorted lists. SIAM J. Comput. 9, 594-614.

CLOCKSIN, W. F., AND MELLISH, C. S. 1981. Programming in Prolog. Springer-Verlag, Berlin.

DOYLE, J., AND RIVEST, R. 1976. Linear expected time of a simple union-find algorithm. Inf. Process. Lett. 5, 146-148.

DRISCOLL, J. R., SARNAK, N., SLEATOR, D. D., AND TARJAN, R. E. 1989. Making data structures persistent. J. Comput. Syst. Sci. 38, 86-124.

EPPSTEIN, D., GALIL, Z., GIANCARLO, R., AND ITALIANO, G. F. 1990. Sparse dynamic programming. In Proceedings of the 1st Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA, pp. 513-522. To appear in J. ACM.

ERDŐS, P., AND RÉNYI, A. 1960. On the evolution of random graphs. Publ. Math. Inst. Hungar. Acad. Sci. 5, 17-61.

EVEN, S., AND SHILOACH, Y. 1981. An on-line edge deletion problem. J. ACM 28, 1-4.

FISCHER, M. J. 1972. Efficiency of equivalence algorithms. In Complexity of Computer Computations, R. E. Miller and J. W. Thatcher, Eds. Plenum Press, New York, pp. 153-168.

FREDMAN, M. L. 1989. On the cell probe complexity of the set union problem. Tech. Memorandum TM-ARH-013-570, Bell Communications Research.

FREDMAN, M. L., AND SAKS, M. E. 1989. The cell probe complexity of dynamic data structures. In Proceedings of the 21st Annual ACM Symposium on Theory of Computing. ACM Press, New York, NY, pp. 345-354.

FREDMAN, M. L., AND TARJAN, R. E. 1987. Fibonacci heaps and their uses in improved network optimization algorithms. J. ACM 34, 596-615.

GABOW, H. N. 1985. A scaling algorithm for weighted matching on general graphs. In Proceedings of the 26th Annual Symposium on Foundations of Computer Science. IEEE Computer Society, pp. 90-100.

GABOW, H. N., AND TARJAN, R. E. 1985. A linear time algorithm for a special case of disjoint set union. J. Comput. Syst. Sci. 30, 209-221.

GAIBISSO, C., GAMBOSI, G., AND TALAMO, M. 1987. A partially persistent data structure for the set union problem. RAIRO Theoretical Informatics and Applications 24, 189-202.

GALIL, Z., AND ITALIANO, G. F. 1989. A note on set union with arbitrary deunions. Inf. Process. Lett. 37, 331-335.

GALLER, B. A., AND FISCHER, M. J. 1964. An improved equivalence algorithm. Commun. ACM 7, 301-303.

GAMBOSI, G., ITALIANO, G. F., AND TALAMO, M. 1988. Getting back to the past in the union-find problem. In Proceedings of the 5th Symposium on Theoretical Aspects of Computer Science (STACS 1988). Lecture Notes in Computer Science 294. Springer-Verlag, Berlin, pp. 8-17.

GAMBOSI, G., ITALIANO, G. F., AND TALAMO, M. 1989. Worst-case analysis of the set union problem with extended backtracking. Theor. Comput. Sci. 68, 57-70.

GAMBOSI, G., ITALIANO, G. F., AND TALAMO, M. 1991. The set union problem with dynamic weighted backtracking. BIT. To be published.

HART, S., AND SHARIR, M. 1986. Nonlinearity of Davenport-Schinzel sequences and of generalized path compression schemes. Combinatorica 6, 151-177.

HOGGER, C. J. 1984. Introduction to Logic Programming. Academic Press.

HOPCROFT, J. E., AND KARP, R. M. 1971. An algorithm for testing the equivalence of finite automata. Tech. Rep. TR-71-114, Dept. of Computer Science, Cornell University, Ithaca, N.Y.

HOPCROFT, J. E., AND ULLMAN, J. D. 1973. Set merging algorithms. SIAM J. Comput. 2, 294-303.

HUDDLESTON, S., AND MEHLHORN, K. 1982. A new data structure for representing sorted lists. Acta Inf. 17, 157-184.

HUET, G. 1976. Résolution d'équations dans les langages d'ordre 1, 2, ..., ω. Ph.D. dissertation, Univ. de Paris VII, France.

HUNT, J. W., AND SZYMANSKI, T. G. 1977. A fast algorithm for computing longest common subsequences. Commun. ACM 20, 350-353.

IBARAKI, T. 1978. m-depth search in branch-and-bound algorithms. Int. J. Comput. Inf. Sci. 7, 313-373.

IMAI, T., AND ASANO, T. 1987. Dynamic segment intersection with applications. J. Algorithms 8, 1-18.

ITALIANO, G. F., AND SARNAK, N. 1991. Fully persistent data structures for disjoint set union problems. In Proceedings of the Workshop on Algorithms and Data Structures (Ottawa, Canada). To appear.

KARLSSON, R. G. 1984. Algorithms in a restricted universe. Tech. Rep. CS-84-50, Dept. of Computer Science, University of Waterloo.

KERSCHENBAUM, A., AND VAN SLYKE, R. 1972. Computing minimum spanning trees efficiently. In Proceedings of the 25th Annual Conference of the ACM. ACM Press, New York, NY, pp. 518-527.

KNIGHT, K. 1989. Unification: A multidisciplinary survey. ACM Comput. Surv. 21, 93-124.

KNUTH, D. E. 1968. The Art of Computer Programming. Vol. 1, Fundamental Algorithms. Addison-Wesley, Reading, Mass.

KNUTH, D. E. 1976. Big omicron and big omega and big theta. SIGACT News 8, 18-24.

KNUTH, D. E., AND SCHÖNHAGE, A. 1978. The expected linearity of a simple equivalence algorithm. Theor. Comput. Sci. 6, 281-315.

KOLMOGOROV, A. N. 1953. On the notion of algorithm. Uspehi Mat. Nauk 8, 175-176.

KRUSKAL, J. B. 1956. On the shortest spanning subtree of a graph and the traveling salesman problem. Proc. Am. Math. Soc. 7, 48-50.

LA POUTRÉ, J. A. 1990a. New techniques for the union-find problem. In Proceedings of the 1st Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA, pp. 54-63.

LA POUTRÉ, J. A. 1990b. Lower bounds for the union-find and the split-find problem on pointer machines. In Proceedings of the 22nd Annual ACM Symposium on Theory of Computing. ACM Press, New York, NY, pp. 34-44.

LOEBL, M., AND NEŠETŘIL, J. 1988. Linearity and unprovability of set union problem strategies. In Proceedings of the 20th Annual ACM Symposium on Theory of Computing. ACM Press, New York, NY, pp. 360-366.

MANNILA, H., AND UKKONEN, E. 1986a. The set union problem with backtracking. In Proceedings of the 13th International Colloquium on Automata, Languages and Programming (ICALP 86). Lecture Notes in Computer Science 226. Springer-Verlag, Berlin, pp. 236-243.

MANNILA, H., AND UKKONEN, E. 1986b. On the complexity of unification sequences. In Proceedings of the 3rd International Conference on Logic Programming. Lecture Notes in Computer Science 225. Springer-Verlag, Berlin, pp. 122-133.

MANNILA, H., AND UKKONEN, E. 1986c. Timestamped term representation for implementing Prolog. In Proceedings of the 3rd IEEE Conference on Logic Programming. The MIT Press, Cambridge, MA, pp. 159-167.

MANNILA, H., AND UKKONEN, E. 1987. Space-time optimal algorithms for the set union problem with backtracking. Tech. Rep. C-1987-80, Dept. of Computer Science, University of Helsinki, Helsinki, Finland.

MANNILA, H., AND UKKONEN, E. 1988. Time parameter and arbitrary deunions in the set union problem. In Proceedings of the 1st Scandinavian Workshop on Algorithm Theory (SWAT 88). Lecture Notes in Computer Science 318. Springer-Verlag, Berlin, pp. 34-42.

MEHLHORN, K. 1984a. Data Structures and Algorithms. Vol. 1, Sorting and Searching. Springer-Verlag, Berlin.

MEHLHORN, K. 1984b. Data Structures and Algorithms. Vol. 2, Graph Algorithms and NP-Completeness. Springer-Verlag, Berlin.

MEHLHORN, K. 1984c. Data Structures and Algorithms. Vol. 3, Multidimensional Searching and Computational Geometry. Springer-Verlag, Berlin.

MEHLHORN, K., AND NÄHER, S. 1990. Dynamic fractional cascading. Algorithmica 5, 215-241.

MEHLHORN, K., NÄHER, S., AND ALT, H. 1988. A lower bound for the complexity of the union-split-find problem. SIAM J. Comput. 17, 1093-1102.

NIEVERGELT, J., AND REINGOLD, E. M. 1973. Binary search trees of bounded balance. SIAM J. Comput. 2, 33-43.

OVERMARS, M. H. 1983. The Design of Dynamic Data Structures. Lecture Notes in Computer Science 156. Springer-Verlag, Berlin.

PEARL, J. 1984. Heuristics. Addison-Wesley, Reading, Mass.

SCHÖNHAGE, A. 1980. Storage modification machines. SIAM J. Comput. 9, 490-508.

STEARNS, R. E., AND LEWIS, P. M. 1969. Property grammars and table machines. Inf. Control 14, 524-549.

STEARNS, R. E., AND ROSENKRANTZ, D. J. 1969. Table machine simulation. In Conference Record of the IEEE 10th Annual Symposium on Switching and Automata Theory. IEEE Computer Society, pp. 118-128.

TARJAN, R. E. 1973. Testing flow graph reducibility. In Proceedings of the 5th Annual ACM Symposium on Theory of Computing. ACM Press, New York, NY, pp. 96-107.

TARJAN, R. E. 1974. Finding dominators in directed graphs. SIAM J. Comput. 3, 62-89.

TARJAN, R. E. 1975. Efficiency of a good but not linear set union algorithm. J. ACM 22, 215-225.

TARJAN, R. E. 1979a. A class of algorithms which require nonlinear time to maintain disjoint sets. J. Comput. Syst. Sci. 18, 110-127.

TARJAN, R. E. 1979b. Applications of path compression on balanced trees. J. ACM 26, 690-715.

TARJAN, R. E. 1982. Sensitivity analysis of minimum spanning trees and shortest path trees. Inf. Process. Lett. 14, 30-33.

TARJAN, R. E. 1983. Data Structures and Network Algorithms. SIAM, Philadelphia, Penn.

TARJAN, R. E. 1985. Amortized computational complexity. SIAM J. Algebraic Discrete Methods 6, 306-318.

TARJAN, R. E., AND VAN LEEUWEN, J. 1984. Worst-case analysis of set union algorithms. J. ACM 31, 245-281.

VAN DER WEIDE, T. 1980. Data Structures: An Axiomatic Approach and the Use of Binomial Trees in Developing and Analyzing Algorithms. Mathematisch Centrum, Amsterdam, The Netherlands.

VAN EMDE BOAS, P. 1977. Preserving order in a forest in less than logarithmic time and linear space. Inf. Process. Lett. 6, 80-82.

VAN EMDE BOAS, P., KAAS, R., AND ZIJLSTRA, E. 1977. Design and implementation of an efficient priority queue. Math. Syst. Theory 10, 99-127.

VAN LEEUWEN, J., AND VAN DER WEIDE, T. 1977. Alternative path compression techniques. Tech. Rep. RUU-CS-77-3, Dept. of Computer Science, University of Utrecht, Utrecht, The Netherlands.

VITTER, J. S., AND SIMONS, R. A. 1986. New classes for parallel complexity: A study of unification and other complete problems for P. IEEE Trans. Comput. C-35.

WARREN, D. H. D., AND PEREIRA, L. M. 1977. Prolog: The language and its implementation compared with LISP. ACM SIGPLAN Notices 12, 109-115.

WESTBROOK, J. 1989. Algorithms and data structures for dynamic graph problems. Ph.D. dissertation. Also Tech. Rep. CS-TR-229-89, Dept. of Computer Science, Princeton University, Princeton, N.J.

WESTBROOK, J., AND TARJAN, R. E. 1989a. Amortized analysis of algorithms for set union with backtracking. SIAM J. Comput. 18, 1-11.

WESTBROOK, J., AND TARJAN, R. E. 1989b. Maintaining bridge-connected and biconnected components on-line. Tech. Rep. CS-TR-228-89, Dept. of Computer Science, Princeton University, Princeton, N.J.

YAO, A. C. 1976. On the average behavior of set merging algorithms. In Proceedings of the 8th Annual ACM Symposium on Theory of Computing. ACM Press, New York, NY, pp. 192-195.

YAO, A. C. 1981. Should tables be sorted? J. ACM 28, 615-628.

Received November 1988; final revision accepted September 1990

ACM Computing Surveys, Vol 23, No. 3, September 1991