  • Università degli Studi di Roma “La Sapienza”

    Dottorato di Ricerca in Informatica

    XIII Ciclo – 2002– XIII-02-1

    Hierarchical Decompositions for Visualizing

    Large Graphs

    Irene Finocchi

    Thesis Committee

    Prof. Rossella Petreschi (Advisor)
    Prof. Giuseppe Di Battista
    Prof. Stefano Levialdi

    Reviewers

    Prof. Narsingh Deo
    Prof. Bernard Moret

  • Author’s address:

    Irene Finocchi
    Dipartimento di Informatica
    Università degli Studi di Roma “La Sapienza”
    Via Salaria 113, I-00198 Roma, Italy
    e-mail: [email protected]

    www: http://www.dsi.uniroma1.it/~finocchi

    Abstract

    In this thesis we study algorithmic problems related to the visualization of large graphs. We devise techniques based on hierarchical decompositions, imposed either on the graph or on its drawing. The first approach leads to considering clustered representations of graphs, where some parts are visualized in detail and others are collapsed into single vertices or filled-in regions: this makes it possible both to maintain context and to reduce the amount of displayed information by hiding irrelevant details and uninteresting parts of the structure. The second approach leads to studying hierarchical drawings, where vertices are constrained to lie on a set of parallel lines and edges are represented as polygonal chains. In this case the graph is drawn entirely, though its visualization may not fit on the screen: the assumption behind the use of the layered convention is that drawings exceeding only the screen height facilitate tracing paths in the graph, and thus are easier to explore than drawings exceeding both height and width.

    Original contributions of this dissertation include algorithms for computing hierarchical decompositions of graphs and for reducing the visual complexity of layered drawings. We analyze our algorithms both theoretically and experimentally.

    In more detail, Chapters 3 to 5 are concerned with clustered representations. Chapter 3 tackles the problem of characterizing hierarchical decompositions that preserve the structure and the relevant properties of the original graph, introducing the new notion of P-validity. Chapter 4 presents efficient structure-preserving algorithms for tree clustering; our algorithms also guarantee decomposition properties, such as balanced cluster size and limited number of clusters, that are especially useful in graph visualization applications. Chapter 5 is related to the computation of graph decompositions tailored to special-purpose exploration tasks, such as finding dense subgraphs and large cuts: it describes two partitioning algorithms amenable to externalization and to parallelization, respectively, and thus able to manage graphs that could not even fit in main memory. The remaining part of the thesis focuses on layered drawings. Chapter 6 addresses the crossing minimization problem, pointing out a strong relation with the computation of feedback arc sets in directed graphs and proposing a new strategy that experimentally outperforms all the algorithms from the literature, both on random and on real test sets. Finally, Chapter 7 studies the possibility of imposing constraints on the drawing: it proves that the problem of deciding if a set of constraints is satisfiable is NP-complete and suggests approximation strategies for its optimization version.

    Acknowledgments

    I first of all wish to thank my advisor and co-author Rossella Petreschi for introducing me to algorithmic research and for being always present during these years. I have enjoyed Rossella’s encouragement and support since I was an undergraduate student.

    I am sincerely grateful to Pino Di Battista and to Stefano Levialdi for serving, together with Rossella, on my thesis committee and for being constantly interested in my research activities. I also thank Pino for having sparked my interest in graph drawing and Stefano for organizing stimulating conferences on visual interfaces that I have had the pleasure to attend.

    I am indebted to Narsingh Deo and Bernard Moret, who kindly agreed to serve as external reviewers of this dissertation: their constructive comments allowed me to improve significantly the presentation of these results.

    I owe a lot to Pino Italiano, for his kindness and for giving me the possibility to spend a period at the AT&T Shannon Research Laboratories, and to James Abello, who hosted me during that period. Working at the Labs has been an incredibly challenging experience. I am also grateful to Giorgio Ausiello for his constant support and attention to my research activities.

    Many people at the Department of Computer Science of the University of Rome “La Sapienza” gave me friendly assistance throughout these years. In particular, I thank Rossella, Giancarlo Bongiovanni, and Francesco Malvestuto for coordinating the PhD program, and Corrado Böhm for always attending my talks, promoting interesting discussions, and suggesting unconventional research directions to me.

    I would like to thank all my co-authors for their expertise and enthusiasm for our common research projects. Along the way I had substantial help from many other friends, roommates, and colleagues, too many to list here. I am grateful to all of them.

    I am glad to cite the project “Algorithms for large data sets: science and engineering” of the Italian Ministry for Scientific Research, which allowed me to travel and to attend many conferences.

    A special mention goes to my parents Marina and Agostino and to my grandfather Mario: they have been a solid support during these years and have lovingly shared with me successes and disappointments.

    My human and professional development would certainly not have been possible without the help and love of my husband Camil: I thank Camil for being always part of what I do.

    The following concertos have been a magical source of inspiration during the preparation of this thesis:

    Concerto for Violoncello and Orchestra no. 6 in D major, G 479 (Luigi Boccherini)

    Guitar Concerto in A major, Op. 30 (Mauro Giuliani)

    Violin Concerto no. 0 in E minor (Niccolò Paganini)

    Concerto no. 11 Op. 3 in D minor, RV 565 (Antonio Vivaldi)

  • Contents

    1 Introduction
        1.1 The screen bottleneck
        1.2 Original contributions of the thesis

    2 Preliminaries
        2.1 Graph terminology
        2.2 Approximation
        2.3 Hierarchical decompositions of graphs
            2.3.1 Definitions and basic properties
            2.3.2 Design criteria
        2.4 Hierarchical drawings
            2.4.1 Layout aesthetic criteria
            2.4.2 The hierarchical approach

    3 Structure-preserving hierarchical decompositions
        3.1 The concept of P-validity
        3.2 A structure theorem for valid hierarchy trees
        3.3 P-validity testing
        3.4 Concluding remarks

    4 Tree decompositions via vertex deletion
        4.1 On t-dividers and their properties
            4.1.1 A characterization of the set of t-dividers
            4.1.2 Finding t-dividers
        4.2 Naïve decomposition approaches
        4.3 A near optimal decomposition algorithm
        4.4 Experimental setup
            4.4.1 Two well-known clustering algorithms
            4.4.2 Instance generators and performance indicators
        4.5 Experimental results
        4.6 Concluding remarks

    5 Exploring large graphs via graph decompositions
        5.1 Discovering dense subgraphs
            5.1.1 The basic partitioning step
            5.1.2 Constructing hierarchy trees
        5.2 Data structure for maintaining graph sketches
            5.2.1 Navigation operations
        5.3 Sketching dense subgraphs
        5.4 Discovering large cuts
            5.4.1 A partitioning algorithm based on max cut
            5.4.2 Data structure for maintaining sketches
        5.5 Sketching large cuts
        5.6 Concluding remarks

    6 Minimizing crossings in hierarchical drawings
        6.1 Computing feedback arc sets in directed graphs
        6.2 The penalty approach to crossing minimization
        6.3 Removing cycles for minimizing crossings
            6.3.1 An AP-reduction from CP to FAS
            6.3.2 The relaxed penalty minimization scheme
            6.3.3 The approximation algorithm for CP
        6.4 Experimental setup
            6.4.1 Algorithms under evaluation
            6.4.2 Performance indicators
            6.4.3 Test sets
        6.5 Experimental results
            6.5.1 Randomly generated test sets
            6.5.2 Warfield instances and solution structure
            6.5.3 Real test sets
        6.6 Concluding remarks

    7 Constrained hierarchical drawings
        7.1 Hierarchical realizability
            7.1.1 NP-completeness
            7.1.2 Approximability
        7.2 One-sided bipartite realizability
            7.2.1 NP-completeness
            7.2.2 Approximability and polynomiality
        7.3 Constrained crossing minimization
        7.4 Concluding remarks

    8 Conclusions and further directions

  • Chapter 1

    Introduction

    The representation and exploration of complex information spaces is a key component of support tools for many applications. In fact, the ubiquity of computer technologies makes larger and larger amounts of data available to non-specialist users, and visualization appears to be an effective means of understanding and exploring the relations among them. Examples of applications that typically deal with large data sets and benefit from their visualization are related to the analysis of the World-Wide Web, the Internet structure, telephone call graphs, co-authorship and citation networks of scientists, semantic networks, knowledge bases, and large software engineering projects.

    It is well known that graphs are well suited for modeling structured information from a variety of domains and that graph-based visualizations usually convey information about the structure and the properties of a graph more quickly and effectively than textual descriptions and other displays based on graphical attributes such as size, color, shape, or proximity. In recent years this has led to the design of several algorithms and techniques for computing readable layouts of graphs and networks. Many drawing conventions have been considered in the literature and algorithms aimed at optimizing different aesthetic criteria (e.g., number of edge crossings, area of the drawing, symmetry) have been devised [35]. However, in contrast with the increasing size of graphs managed by real applications, standard visualization algorithms do not generally scale up well and the majority of the results known in the graph-drawing literature are able to deal only with graphs of small or medium size. This is due not only to efficiency reasons, but mostly to the finite resolution of the screen, which represents a physical limitation on the size of the graphs that can be conveniently displayed. In order to cope with the cluttering effects due to the visualization of huge quantities of data, it is therefore imperative to use information hiding techniques and decompositions of the visual space that reflect some structural view of the data.

    This thesis is concerned with the study of algorithmic problems related to the use of hierarchical decompositions for visualizing large graphs. In particular, we consider two kinds of decompositions, which can be imposed either on the graph or on its drawing. The first approach leads to studying clustered representations of graphs, where some parts of the graph are visualized in detail and others are collapsed into single vertices or filled-in regions: this makes it possible both to maintain context during the exploration, and to reduce the amount of displayed information by hiding irrelevant details and uninteresting parts of the structure. The second approach leads to considering hierarchical drawings, where vertices are constrained to lie on a set of parallel lines and edges are represented as polygonal chains. In this case the graph is displayed entirely and the assumption behind the use of this drawing convention is that, when a graph is too large to fit on a screen, drawings exceeding the screen height are easier to navigate than drawings exceeding both height and width. Thanks to the possibility of exploring a graph by moving only in one dimension, width-restricted hierarchical drawings allow the user to easily trace paths in the graph and are therefore preferred to other drawing paradigms (e.g., orthogonal or straight-line) in the case of large graphs.

    Original contributions of this dissertation include algorithms for computing hierarchical decompositions of graphs and for reducing the visual complexity of hierarchical drawings. We analyze most of our algorithms both theoretically and experimentally.

    In the remainder of this chapter we first discuss the main issues arising in the visualization of large graphs (Section 1.1) and then survey the main results of this thesis (Section 1.2).

    1.1 The screen bottleneck

    In this section we consider the crucial problems related to the visualization of large graphs and we examine the solutions proposed in the literature.

    When dealing with large graphs, the running time and the memory requirements of the algorithms for producing graph layouts are not the only critical measures. Probably the most fundamental issue is the so-called screen bottleneck, which is a consequence of the fact that the amount of information that can be displayed at once is ultimately limited by the resolution of the display device (e.g., the number of available pixels of the screen). In addition, from a cognitive point of view it is often unnecessary to display a large graph entirely, because of the limited speed at which the information can be digested by a user. In other words, even though a display device with high resolution might alleviate the screen bottleneck, it does not help the user’s visual processing abstraction, unless the display metaphor incorporates some global data set semantics.

    In the rest of this section we describe mechanisms to alleviate the screen bottleneck. We present different approaches that have been considered in the literature, based on clustering, navigation, and fisheye views, respectively. Mixed approaches have also been proposed, and thus any strict classification attempt might be too restrictive. We start by briefly discussing how classical graph drawing criteria relate to the problem of visualizing large graphs.

    Drawing conventions and aesthetic criteria. Standard graph drawing algorithms attempt to produce layouts of graphs according to specific drawing conventions and optimizing specific aesthetic criteria. This is still important when the graph to be visualized is large, but is not sufficient. For instance, it is worth trying to minimize the area of the drawing, though many kinds of drawings suffer from intrinsic geometric lower bounds [35]. In addition, drawing conventions that are very effective when small graphs are involved tend to be less useful in the case of large graphs. For instance, orthogonal drawings appear not to properly convey the global view of the graph in both two and three dimensions and their use for visualizing large graphs is discouraged [102].

    The clustering approach. It is well known that visualization should support concentration on local details while providing an overview of important aspects of the context [113]. Thus, a natural approach to visualizing large graphs consists of integrating portions of the visualization at different levels of detail. The clustering approach uses recursion combined with graph decomposition in order to build a sequence of contractions of the original graph G: informally speaking, a contraction is a high-level representation of the graph and is obtained by grouping together suitably chosen sets of related vertices to form clusters and by computing induced edges between clusters. Clusters can then be represented as single vertices or filled-in regions, thus obtaining a high-level visualization of the graph while reducing the amount of displayed information by hiding irrelevant details and uninteresting parts of the structure. Visualizing graph contractions is also useful in contexts where the user’s time is at a premium: in this case it is appropriate to give a contracted visualization so as to save reading time. The sequence of contractions defines a hierarchical decomposition of the original graph G represented by means of a data structure usually called hierarchy tree [18, 38, 53]. The user can therefore explore the graph by performing expansions and shrinks of clusters on the hierarchy tree.
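To make the informal notion of contraction concrete, the following Python sketch groups vertices into clusters and computes the induced edges between clusters. It is only an illustration under stated assumptions: the names `contract` and `cluster_of` are hypothetical, and recording the multiplicity of each induced edge is a choice made here, not something the text prescribes.

```python
from collections import defaultdict

def contract(edges, cluster_of):
    """Contract a graph: one vertex per cluster, one induced edge per pair of
    distinct clusters joined by at least one original edge.
    Returns {(cluster_a, cluster_b): number of original edges between them}."""
    induced = defaultdict(int)
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv:                      # edges inside a cluster are hidden
            key = (cu, cv) if cu <= cv else (cv, cu)
            induced[key] += 1             # multiplicity is an illustrative extra
    return dict(induced)

# Path a-b-c-d-e with (hypothetical) clusters A = {a, b}, C = {c}, D = {d, e}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
cluster_of = {"a": "A", "b": "A", "c": "C", "d": "D", "e": "D"}
contracted = contract(edges, cluster_of)
```

In this sketch, expanding or shrinking a cluster amounts to changing `cluster_of` and contracting again; a hierarchy tree would avoid that recomputation, which is the data-structure problem the text refers to.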

    Hierarchical decompositions should fulfill several requirements that are important for their effective deployment in graph visualization. For instance, it is important that any contraction of a graph G in the decomposition reflects the topology and the properties of G, so as not to mislead the viewer. We will cover this and related aspects in Section 2.3. Many previous works have pointed out the utility of hierarchical decompositions for drawing large graphs provided such requirements are satisfied (see, e.g., [18, 38, 40, 51, 53]). The problem of designing data structures to efficiently support expansions and shrinks of clusters is addressed in [18]. Algorithms for computing multilevel visualizations and orthogonal drawings of clustered graphs are discussed in [40, 41]. The use of binary space partitions to produce graph clusters is addressed in [39]. The problem of computing planarity-preserving clustering and embedding for large planar graphs is studied in [38]. Other works have highlighted the necessity of constructing recursive graph decompositions by means of suitable partitioning algorithms when they are not pre-established. One of the topics of this thesis is the design of algorithms for computing decompositions that can be effectively used in graph visualization applications.

    The navigation approach. Navigation strategies can also be exploited to alleviate the screen bottleneck [29, 73, 102]. With this approach, during the navigation the user is allowed to see only the content of a limited size window that he/she can move for exploring the graph. Hence, the graph is displayed as a sequence of drawings determined by the navigation path followed by the user. Since consecutive drawings in the sequence can have a large overlap, it is essential to devise visualization strategies that preserve the mental map of the user [95]. Smooth transitions are particularly useful to this end. We remark that the visualization window can be either geometric or topological.

    Using geometric visualization windows corresponds to the simplest navigation strategy, where the window is moved over a pre-computed drawing. For this approach to be effective, the graph must be completely known in advance and the drawing convention must be suitably chosen. As already observed, orthogonal drawings are not well suited to be explored in this sense [102], while hierarchical drawings can be effectively used for navigating in large directed graphs provided their width is kept bounded [17]: in fact, the user is forced to move along a preferred exploration direction and it is easier to explore a graph navigating in one dimension rather than in two dimensions.

    Topological windows, on the other hand, are defined in terms of the structure of the graph. For example, at each step of the navigation a topological window may display the subgraph induced by the vertices that have a constant topological distance from a given vertex, considered as the center of the window. The use of topological windows has been explored in [29], where a framework that exploits the knowledge of future moves of the user in order to preserve the mental map during the navigation has been proposed.
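A topological window of this kind can be sketched with a bounded breadth-first search. The Python fragment below is a minimal illustration, not the framework of [29]; it assumes that "constant topological distance" means hop distance at most a fixed radius, and the function name and adjacency-list input are hypothetical.

```python
from collections import deque

def topological_window(adj, center, radius):
    """Return (vertices, edges) of the subgraph induced by all vertices whose
    hop distance from `center` is at most `radius` (assumed interpretation)."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue                      # do not grow past the window radius
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    inside = set(dist)
    # Induced edges: both endpoints must lie inside the window.
    edges = {frozenset((u, w)) for u in inside for w in adj[u] if w in inside}
    return inside, edges

# Small undirected example graph given as adjacency lists
adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
window_vertices, window_edges = topological_window(adj, center=1, radius=1)
```

Moving the window corresponds to calling the function again with a new center; consecutive windows typically overlap, which is why preserving the mental map matters.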

    The fisheye approach. Finally, methods for stretching and distorting spaces for the effective visualization of data have also been considered. In particular, fisheye views use the same idea as fisheye lenses in photography, i.e., to magnify an image near the lens’ focus much more than at its periphery. Fisheye techniques were originally applied for navigating in tree structures and in source code texts [58], but were later applied to geographic data [109] (actually, graphs embedded in the plane): the focused region is enlarged, while other regions are reduced in size. Multiple foci and different kinds of distortions can also be supported [109].
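The magnification idea can be illustrated on a single coordinate. The sketch below uses the rational distortion function g(t) = (d + 1)t / (dt + 1), which is commonly used for graphical fisheye views; whether it coincides with the exact functions of [58] or [109] is an assumption, and the function name and parameters are hypothetical.

```python
def fisheye(x, focus, lo, hi, d):
    """Distort coordinate x in [lo, hi] around `focus`.
    d = 0 leaves x unchanged; larger d magnifies the area near the focus."""
    if x == focus:
        return x
    boundary = hi if x > focus else lo
    span = boundary - focus                # signed distance focus -> boundary
    t = (x - focus) / span                 # normalized distance in [0, 1]
    g = (d + 1) * t / (d * t + 1)          # assumed distortion function
    return focus + g * span

# With focus 0 on [0, 1] and d = 3, a point at 0.25 moves outward to 4/7,
# so the neighbourhood of the focus occupies more of the screen.
moved = fisheye(0.25, focus=0.0, lo=0.0, hi=1.0, d=3)
```

For a drawing in the plane, the same mapping would be applied to each coordinate (or to the polar distance from the focus); boundary points stay fixed, so the drawing keeps its overall extent.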

    Other distorting methods employ the characteristics of 3D graphics and of hyperbolic spaces for visualizing and navigating large trees. Cone trees [107] and the hyperbolic browser [86] offer prime examples of these kinds of visualizations. Though these methods appear to be very effective, it is not clear how they can be adapted to deal with general graphs.

    1.2 Original contributions of the thesis

    This dissertation consists of two main parts. The first part covers Chapters 3 to 5 and is related to hierarchical decompositions of graphs. The second part covers Chapters 6 and 7 and presents algorithms for reducing the visual complexity of hierarchical drawings. Chapter 2 provides basic notation, definitions, and properties that will be useful later on for describing the original contributions of the thesis. We now survey the main results.

    Structure-preserving hierarchical decompositions

    Designing decompositions that preserve the relevant features and properties of graphs, so that the viewer can grasp them even when observing contractions, is an important aspect for their effective deployment for visualizing large graphs. Unfortunately, most partitioning algorithms do not address this issue.

    Obtained results. We introduce the concept of P-validity of hierarchical decompositions with respect to a given property P: this notion reflects the similarity between the topological structure of the original graph and that of its contractions on the hierarchy tree. We present conditions on the structure of the clusters that are necessary and sufficient to guarantee the P-validity of the entire decomposition when the original graph is a tree and property P is acyclicity. Although our structure theorem is interesting in its own right, it also has an algorithmic implication: given a hierarchy tree associated with a tree T, it is possible to check in polynomial time that each contraction of T obtained from the hierarchy tree is valid, despite the fact that the number of different contractions may be exponential.
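To see why contractions of a tree need not be acyclic, consider the brute-force check below for a single contraction. It is a sketch under stated assumptions only: the formal definition of P-validity and the structure theorem belong to Chapter 3 and are not reproduced here, and the function name and input format are hypothetical. The point of the theorem is precisely to avoid running such a check on each of the possibly exponentially many contractions.

```python
def contraction_is_acyclic(tree_edges, cluster_of):
    """Check whether contracting the given clusters of a tree yields an
    acyclic simple graph, using union-find on the induced edges."""
    induced = {frozenset((cluster_of[u], cluster_of[v]))
               for u, v in tree_edges if cluster_of[u] != cluster_of[v]}
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for edge in induced:
        a, b = tuple(edge)
        ra, rb = find(a), find(b)
        if ra == rb:
            return False                   # this induced edge closes a cycle
        parent[ra] = rb
    return True

path = [("a", "b"), ("b", "c"), ("c", "d")]
# Cluster X = {a, d} is disconnected in the path: the contraction is a triangle.
bad = contraction_is_acyclic(path, {"a": "X", "b": "Y", "c": "Z", "d": "X"})
# Clusters {a, b} and {c, d} are connected: the contraction is a single edge.
good = contraction_is_acyclic(path, {"a": "X", "b": "X", "c": "Z", "d": "Z"})
```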

    These results are presented in Chapter 3 and appeared in the Proceedings of the 7th Annual International Computing and Combinatorics Conference (COCOON’01) [54].

    P-valid tree decompositions

    Tree decompositions, apart from being useful subroutines for partitioning generic graphs, find application in disparate areas, ranging from parallel and distributed computations to external searching and operating systems. Many tree partitioning algorithms have therefore been designed, but none of them considers the problem of computing valid hierarchical decompositions with structural requirements useful in graph visualization applications.

    Obtained results. Based on the design criteria described in Section 2.3 and on the concept of P-validity introduced in Chapter 3, we study the problem of finding tree decompositions that: 1) are valid; 2) have bounded degree; 3) have logarithmic depth; and 4) are balanced.

    We first show that naïve approaches can easily guarantee either logarithmic depth or bounded degree for the hierarchy tree, but not both, except for special classes of trees. We then present an algorithm that overcomes these drawbacks and, if n is the number of vertices of the original tree, computes in O(n log n) worst-case running time hierarchy trees that meet all the aforementioned requirements. (We remark that Ω(n) is a trivial lower bound on the construction of any hierarchy tree.) Our algorithm hinges upon the new concept of t-divider, which generalizes concepts well known in the graph-theory literature, and exploits a reduction to a classical scheduling problem. Finally, we experimentally investigate the effectiveness of our algorithms by comparing them against well-known clustering procedures.

    These results are presented in Chapter 4 and appeared in the Proceedings of the 3rd Workshop on Algorithm Engineering and Experimentations (ALENEX’01) [53] and in the Proceedings of the 1st Cologne-Twente Workshop on Graphs and Combinatorial Optimization (CTW’01) [55].

    Graph decompositions tailored to exploration tasks

    When visualizations are used for special-purpose tasks, decomposition design criteria more specific than P-validity can be considered. For example, in order to use sketches of large graphs to guide the search for subgraphs with certain interesting properties, it is convenient to exploit decompositions that are tailored to the specific problem of interest and to use different decompositions for different goal-driven navigations.

    Obtained results. We consider algorithmic problems related to the computation of hierarchical graph decompositions suited for specific exploration tasks. We exemplify the idea of visualization-guided subgraph discovery by addressing two common problems: finding large cuts and locating dense subgraphs. Since ideally one would be interested in finding maximum size cuts and maximum size cliques, respectively, but both problems are NP-hard, we suggest heuristics or approximation algorithms aimed at finding good solutions. Our algorithms, besides being fast and simple, have other appealing features: they are both combinatorial in nature and amenable to externalization and to parallelization, respectively. This can be very important when dealing with massive data sets that, for instance, could not even fit in main memory. We then show how the partitioning algorithms can be turned into effective graph sketches, i.e., into zoomable visualizations that offer simple overall views of the graph structure tailored to the particular problem of interest.

    In more detail, with respect to the maximum cut problem, we present a bipartitioning algorithm that works on 3-regular graphs and runs in O(log n) worst-case running time on a CRCW P-RAM model with O(n) processors. (We remark that very simple methods to transform generic graphs into 3-regular graphs are known.) Our algorithm approximates the size of a maximum cut by a factor 4/3, thus improving the best known parallel approximation ratio (i.e., 2). With respect to the dense subgraph problem, we present a partitioning algorithm that is based on breadth-first search, thus providing a 2-approximation of the diameter of the graph and inducing edge partitions such that the vertices of any clique can belong to at most two clusters. The algorithm has been incorporated into MGV, the Massive Graph Visualizer developed at the AT&T research laboratories and used to navigate graphs with vertex set sizes ranging from 100 to 250 million vertices.
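The two properties claimed for the breadth-first partitioning follow from a standard fact: every edge joins vertices in the same or in consecutive BFS layers. The Python sketch below illustrates this on a toy graph. It is not the thesis algorithm of Chapter 5; in particular, treating each BFS layer as a cluster is an assumption made for illustration, and the function name is hypothetical.

```python
def bfs_layers(adj, source):
    """Partition the vertices reachable from `source` into BFS layers.
    Returns (layers, layer_of)."""
    layer_of = {source: 0}
    layers = [[source]]
    while True:
        nxt = []
        for u in layers[-1]:
            for w in adj[u]:
                if w not in layer_of:
                    layer_of[w] = len(layers)   # index of the layer being built
                    nxt.append(w)
        if not nxt:
            return layers, layer_of
        layers.append(nxt)

# Triangle 1-2-3 with a tail 3-4-5
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}
layers, layer_of = bfs_layers(adj, source=1)
depth = len(layers) - 1   # eccentricity of the source
```

Since adjacent vertices are at most one layer apart, the pairwise-adjacent vertices of a clique fit into two consecutive layers. For a connected graph, the depth e of the BFS satisfies e ≤ diameter ≤ 2e, which is the 2-approximation mentioned in the text.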

    These results are presented in Chapter 5 and appeared in the Proceedings of the 7th IEEE Symposium on Information Visualization (INFOVIS’01) [1] and in the international journal Parallel Algorithms and Applications [20]. Extended abstracts also appeared in the Proceedings of the 5th Asian Computing Science Conference (ASIAN’99) [22] and in the Proceedings of the Argentinian Workshop on Theoretical Computer Science (WAIT’99) [21].

    Crossing minimization in hierarchical drawings

    Empirical investigations have shown that the most important aesthetic criterion for the readability of graph layouts is by far the number of edge crossings. Despite the fact that crossing minimization in hierarchical drawings is NP-hard, due to the importance of this aesthetic criterion many heuristics and approximation algorithms have been designed. Unfortunately, experimental studies show a considerable gap between the theoretical results and the real behavior of the algorithms, even on simplified versions of the problem: algorithms achieving constant approximation ratios are often outperformed by algorithms with worse theoretical performance or even by simple heuristics.

    Obtained results. We consider the one-sided crossing minimization problem, i.e., the problem of minimizing crossings in bipartite drawings when the permutation of vertices on one side of the bipartition is fixed. According to well-known methods, this is an important subroutine for crossing minimization in hierarchical drawings.

    We point out a strong relation between one-sided crossing minimization and the computation of feedback arc sets in a penalty digraph, i.e., the problem of computing a minimum cost set of arcs whose removal makes the penalty digraph acyclic. We propose a new crossing minimization strategy that hinges upon an approximation preserving reduction to the feedback arc set problem and uses as a subroutine a simple approximation algorithm for this problem: the algorithm has performance ratio bounded by the length of a longest simple cycle and is likely to work well on penalty digraphs, which are usually very dense and contain many short cycles.
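The relation between crossings and the penalty digraph can be illustrated as follows. The sketch assumes the definition commonly associated with the penalty approach: for two free-layer vertices u and v, c_uv is the number of crossings between their edges when u is placed to the left of v, and the digraph has an arc (u, v) of weight c_vu − c_uv whenever c_uv < c_vu. The precise definition used in Chapter 6 may differ; all function names are hypothetical.

```python
from itertools import combinations

def crossing_number(nbrs_u, nbrs_v, pos):
    """Crossings between edges of u and edges of v when u is left of v."""
    return sum(1 for a in nbrs_u for b in nbrs_v if pos[a] > pos[b])

def penalty_digraph(free_nbrs, fixed_order):
    """Arc (u, v) with weight c_vu - c_uv whenever placing u left of v is
    strictly cheaper (assumed definition of the penalty digraph)."""
    pos = {x: i for i, x in enumerate(fixed_order)}
    arcs = {}
    for u, v in combinations(free_nbrs, 2):
        c_uv = crossing_number(free_nbrs[u], free_nbrs[v], pos)
        c_vu = crossing_number(free_nbrs[v], free_nbrs[u], pos)
        if c_uv < c_vu:
            arcs[(u, v)] = c_vu - c_uv
        elif c_vu < c_uv:
            arcs[(v, u)] = c_uv - c_vu
    return arcs

def count_crossings(order, free_nbrs, fixed_order):
    """Total crossings when the free layer is arranged as `order`."""
    pos = {x: i for i, x in enumerate(fixed_order)}
    return sum(crossing_number(free_nbrs[u], free_nbrs[v], pos)
               for u, v in combinations(order, 2))

fixed_order = ["a", "b", "c"]                       # fixed layer, left to right
free_nbrs = {"x": ["c"], "y": ["a"], "z": ["b"]}    # neighbours of free vertices
arcs = penalty_digraph(free_nbrs, fixed_order)
```

Here the penalty digraph is acyclic, so a topological order of the free layer (y, z, x) violates no arc and produces no crossings. When the digraph contains cycles, every ordering must violate at least one arc per cycle, and the weight of the violated arcs forms a feedback arc set; this is the intuition behind the reduction.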

    We also perform an experimental analysis of the performance of our algorithm with respect to previous ones on randomly generated bipartite graphs, on pathological instances for CP, and on real test sets. Performance indicators include number of edge crossings and running time, as well as structural measures of the problem instances. The computational experiments clearly separate the behavior of the algorithms, highlighting tradeoffs between quality of the solutions and running time and showing the effectiveness of our strategy for crossing reduction, apart from higher running times on dense instances with unbalanced layers.

    These results are presented in Chapter 6 and part of them appeared in the Proceedings of the 2nd International Workshop on Algorithm Engineering and Experiments (ALENEX’00) [31]. The feedback arc set algorithm is presented in [30]. Paper [31] has also been invited to the special issue of the ACM Journal on Experimental Algorithmics devoted to selected papers from ALENEX’00.

    Constrained hierarchical drawings

    A useful addition to optimizing common aesthetic criteria in visualizing large graphs is the possibility of imposing constraints on the drawing: constraints allow the user to customize the layout of specific subgraphs or subdrawings according to his/her actual needs, thus making concentration on local details easier.

    Obtained results. We study the realizability problem of hierarchical drawings, i.e., the problem of deciding whether there exists a hierarchical drawing of a graph that satisfies a given set of non-crossing constraints between pairs of edges. We first investigate the computational complexity of some variants of hierarchical realizability, proving their NP-completeness. We then describe approximation strategies for the optimization versions of these problems, which require finding a maximum realizable subset of constraints. Similarly to the crossing minimization problem, we exploit a strong relation between our problem and the problem of computing maximum acyclic subgraphs of directed graphs: in particular, we prove an approximation-preserving reduction that maintains the same approximation ratio and implies that some useful variants of hierarchical realizability can be approximated with performance ratio < 2. We finally show how algorithms for the variants that we consider can be used as subroutines to extend an existing algorithm for computing hierarchical drawings in order to support both non-crossing constraints and constraints concerning the relative positions of vertices within each layer.

    These results are presented in Chapter 7 and appeared in the Proceedings of the 7th Annual International Computing and Combinatorics Conference (COCOON’01) [52].


  • Chapter 2

    Preliminaries

    In this chapter we give preliminary definitions and notation used throughout the thesis. Besides recalling some standard graph terminology (Section 2.1) and definitions in approximation theory (Section 2.2), the rest of the chapter focuses on providing the background on hierarchical decompositions of graphs and hierarchical drawings, respectively. Section 2.3 discusses the definition of hierarchy tree and the related concepts of covering and contraction of a graph on a hierarchy tree, pointing out some relevant properties. Section 2.4 introduces the hierarchical drawing convention and describes a well-known algorithm for computing such drawings. In the course of the presentation we also highlight design and aesthetic criteria that should be fulfilled by hierarchy trees and hierarchical drawings, respectively, for their effective deployment in graph visualization applications.

    2.1 Graph terminology

    We use standard graph terminology (see, e.g., [68]) and we limit ourselves to recalling here the definitions and concepts that are most important for the analysis of the algorithms presented in this thesis. We will usually be concerned with unweighted undirected graphs; when direction of edges or edge weights must be considered, we will state it explicitly. Let G(V,E) be an n-vertex undirected graph with vertex set V and edge set $E \subseteq \binom{V}{2}$. If $E = \binom{V}{2}$ or E = ∅, graph G is called clique or independent set, respectively. The density of G is the ratio between |E| and the upper bound $\binom{n}{2}$ on the number of edges.

    Definition 2.1 [Distance, girth] Let G(V,E) be an undirected graph and let u, v ∈ V . The distance d(u, v) is the length of a shortest path joining u and v in G, if any; otherwise d(u, v) = +∞. The girth of G is the length of a shortest cycle in G, if any.


    Figure 2.1: Eccentricities of vertices of a graph G. The centers of G are black. (In the figure, each vertex is labeled with its eccentricity; Radius(G) = 3 and Diameter(G) = 5.)

    Definition 2.2 [Radius, diameter, center] Let G(V,E) be an undirected connected graph. The eccentricity e(v) of a vertex v ∈ V is the maximum length of a shortest path departing from v, i.e., $e(v) = \max_{u \in V} d(v, u)$.

    Radius and diameter of G are the minimum and maximum eccentricity of its vertices: $\mathrm{radius}(G) = \min_{u \in V} e(u)$ and $\mathrm{diameter}(G) = \max_{u \in V} e(u)$. A vertex v ∈ V is a center of G if and only if e(v) = radius(G).
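Definition 2.2 can be illustrated by a minimal Python sketch that computes eccentricities with one breadth-first search per vertex. The adjacency-dictionary representation, the function names, and the five-vertex path used as an example are illustrative assumptions, not part of the thesis.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return a dict mapping every vertex reachable from source to d(source, v)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def eccentricities(adj):
    """Eccentricity e(v) of each vertex of a connected undirected graph."""
    return {v: max(bfs_distances(adj, v).values()) for v in adj}


def radius_diameter_centers(adj):
    """Radius, diameter, and sorted list of centers, as in Definition 2.2."""
    ecc = eccentricities(adj)
    radius = min(ecc.values())
    diameter = max(ecc.values())
    centers = sorted(v for v, e in ecc.items() if e == radius)
    return radius, diameter, centers


# Hypothetical example: the path 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(radius_diameter_centers(path))
```

This approach takes time proportional to the number of vertices times the size of the graph; it is meant only to make the definitions concrete.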

    Figure 2.1 shows a graph, reporting the eccentricity of each vertex and highlighting the set of centers. Given a tree T rooted at vertex r, depth(T ) = e(r) and, for each vertex v, level(v) = d(v, r). We also denote with size(T ) the number of vertices of T , with leaves(T ) the set of its leaves, and with children(v) the set of children of a vertex v.

    Definition 2.3 [Centroid] A centroid of an n-vertex tree T is a vertex whose removal disconnects T into subtrees of size $\leq \lfloor n/2 \rfloor$.

    Every tree has one or two adjacent centers and one or two adjacent centroids [68]. Both centers and centroids can be easily computed in linear time.

    In the case of directed graphs we speak of arcs instead of edges to denote ordered pairs of vertices.

    Definition 2.4 [Feedback arc set] A feedback arc set of a directed graph G is a set of arcs whose removal makes G acyclic.
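As a small illustration of Definition 2.4, the following Python sketch checks whether a candidate set of arcs is a feedback arc set by removing it and testing acyclicity with a topological sort. The function names and the three-vertex example are assumptions made for illustration; the sketch only verifies a given set and does not attempt to find a minimum one.

```python
from collections import deque


def is_acyclic(vertices, arcs):
    """Return True if the digraph (vertices, arcs) has no directed cycle."""
    indegree = {v: 0 for v in vertices}
    successors = {v: [] for v in vertices}
    for u, v in arcs:
        successors[u].append(v)
        indegree[v] += 1
    # Repeatedly peel off vertices with no incoming arcs (topological sort).
    queue = deque(v for v in indegree if indegree[v] == 0)
    peeled = 0
    while queue:
        u = queue.popleft()
        peeled += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    # Every vertex is peeled exactly when there is no directed cycle.
    return peeled == len(indegree)


def is_feedback_arc_set(vertices, arcs, candidate):
    """Return True if removing the arcs in candidate makes the digraph acyclic."""
    removed = set(candidate)
    remaining = [arc for arc in arcs if arc not in removed]
    return is_acyclic(vertices, remaining)


# Hypothetical example: a directed triangle.
triangle_vertices = [1, 2, 3]
triangle_arcs = [(1, 2), (2, 3), (3, 1)]
print(is_feedback_arc_set(triangle_vertices, triangle_arcs, [(3, 1)]))
```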

    2.2 Approximation

    The theory of approximation algorithms has developed in response to the impossibility of efficiently finding optimal solutions for many important optimization problems. In particular, it is well-known that when a problem is NP-hard no polynomial time algorithm for it can exist unless P = NP (we refer the reader to [61] for basic definitions on computational complexity). In this scenario, the idea behind approximation algorithms can be summarized as follows: the optimality of the solution can be sacrificed in favor of finding feasible solutions that are “provably good” and can be computed efficiently.


    Clearly, it is desirable to design algorithms that return solutions as close to the optimum as possible. Garey, Graham, and Ullman [60] and later Johnson [75] formalized the concept of approximation algorithm and performance ratio:

    Definition 2.5 A polynomial-time algorithm A for a minimization (maximization) problem P is a δ-approximation algorithm, δ > 1, if for every instance of P it delivers a solution that is at most δ (at least 1/δ) times the optimum.

    δ is referred to as the approximation or performance ratio of the algorithm. Naturally, the closer it is to 1, the better. Note that δ is not necessarily a constant value: for instance, it may be a function of the input size. When a problem admits an approximation algorithm with constant performance ratio, it is said to be in APX . For the purposes of this thesis, in addition to Definition 2.5, we just need to recall the definition of approximation-preserving reduction (in short, AP-reduction). We refer the interested reader to [7] for a detailed discussion of this topic.

    Definition 2.6 [AP-reduction] Given two optimization problems P1 and P2, we say that P1 is AP-reducible to P2 if two functions f and g and a constant α ≥ 1 exist such that:

    1. for any instance x of P1 and for any δ > 1, f(x, δ) is an instance of P2;

    2. for any instance x of P1 and for any δ > 1, if there exists a feasible solution of x with respect to P1, then there exists a feasible solution of f(x, δ) with respect to P2;

    3. for any instance x of P1, for any δ > 1, and for any solution y of f(x, δ) with respect to P2, g(x, y, δ) is a feasible solution of x with respect to P1;

    4. f and g are computable in polynomial time for any fixed δ;

    5. for any instance x of P1, for any δ > 1, and for any δ-approximate solution y of f(x, δ) with respect to P2, g(x, y, δ) is a (1 + α(δ − 1))-approximate solution of x with respect to P1.

    Roughly speaking, Definition 2.6 allows us to compare the approximability properties of any two optimization problems. In particular, it states that any approximation algorithm for problem P2 can be used to obtain an approximation algorithm for problem P1 and establishes a linear relation between the performance ratios of the two algorithms. Note that, if P1 is AP-reducible to P2 and P2 ∈ APX , then P1 ∈ APX , as well.
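The linear relation in condition 5 of Definition 2.6 can be made concrete with a short worked instance. The values α = 2 and δ = 3/2 below are hypothetical and chosen only for illustration.

```latex
% Condition 5 with the hypothetical values \alpha = 2 and \delta = 3/2:
\[
  1 + \alpha(\delta - 1) \;=\; 1 + 2\left(\tfrac{3}{2} - 1\right) \;=\; 2 .
\]
% With \alpha = 1 the ratio is carried over unchanged:
\[
  1 + 1 \cdot (\delta - 1) \;=\; \delta .
\]
```

Under these assumed values, a 3/2-approximate solution for P2 yields a 2-approximate solution for P1, whereas a reduction with α = 1 maintains the same performance ratio.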


    2.3 Hierarchical decompositions of graphs

    Many naturally occurring graphs have associated semantics that induce a recursive partition of their vertex set. A direct example comes from the telephone call network, in which vertices correspond to telephone numbers and edges to calls placed: the hierarchy on the vertex set consists of the subdivision of each number in country code, area code, exchanges, and so on. The graph of the WWW traffic has a similar structure: here the recursive partition is based on domains and subdomains in the IP address space.

    In the rest of this section we give preliminary definitions and lemmas related to hierarchical decompositions and we discuss design criteria that should be satisfied for their effective deployment for the visualization of large graphs.

    2.3.1 Definitions and basic properties

    Inclusion relations between vertex sets in a hierarchical decomposition of a graph can be represented by means of a data structure known as hierarchy tree (or inclusion tree). In this section we recall the definition of hierarchy tree, we discuss the concepts of covering and of contraction of a graph on a hierarchy tree, and we state some basic properties.

    Definition 2.7 [Hierarchy tree] A hierarchy tree HT (N,A) associated to a graph G(V,E) is a rooted tree such that leaves(HT ) = V .

    For clarity, we refer to the elements of N as nodes of HT and to those of G as vertices of G. The nodes of HT are also called clusters: a cluster c represents a set Vc of vertices of G, namely, the vertices that are the leaves of the subtree rooted at c. For any c ∈ N , we denote with S(c) the subgraph of G induced by Vc. We say that the vertices in Vc are covered by c and we refer to their number as cardinality of c. For brevity, we write u ≺ c to indicate that a vertex u ∈ V is covered by a cluster c ∈ N . A singleton cluster covers a unique vertex, i.e., has cardinality 1.

    Two clusters c and c′ which are neither coincident nor ancestors of each other are connected by a link if there exists at least one edge e = (u, v) ∈ E such that u ≺ c and v ≺ c′ in HT ; if more than one edge of this kind exists, we consider only a single link. In some cases it is useful to assign links a weight as follows: w(c, c′) = |{(u, v) : u ≺ c and v ≺ c′}|.

    Definition 2.8 [Covering and contraction] Let HT (N,A) be a hierarchy tree associated to a graph G(V,E). A set C ⊆ N is a covering of G on HT if and only if ∀v ∈ V there exists a unique c ∈ C such that v ≺ c. A contraction of graph G on HT is a graph W (C,L) such that:

    • C is a covering of G on HT ;

    • L = {(c, c′) | c, c′ ∈ C, c ≠ c′, ∃(u, v) ∈ E : u ≺ c and v ≺ c′}.

    Figure 2.2: (a) Graph G; (b) hierarchy tree of G and covering C = {3, d, b, e, f} on it; (c) contraction of G induced by C; (d) contraction obtained after expanding cluster e.

    Trivial coverings consist of the root of HT and of the whole set of its leaves; for brevity, we denote the contractions of G corresponding to such coverings with Wr and Wl, respectively. Figure 2.2a and Figure 2.2b show a 12-vertex graph and a possible hierarchy tree of it, respectively. The internal nodes of the hierarchy tree are squared and, for clarity, no link is shown. A covering consisting of clusters {3, d, b, e, f} is highlighted on the hierarchy tree and the corresponding contraction of the graph is depicted in Figure 2.2c. From now on, when there is no ambiguity we will use the term view to indicate the visualization of a contraction.
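Definition 2.8 can be sketched in Python as follows. The representation of the hierarchy tree as a dictionary from each internal node to its list of children, the function names, and the four-vertex example are illustrative assumptions; they do not reproduce the graph of Figure 2.2.

```python
def covered_vertices(children, c):
    """Set V_c of the leaves of the subtree rooted at cluster c."""
    kids = children.get(c, [])
    if not kids:
        return {c}  # a leaf of HT is a vertex of G
    covered = set()
    for child in kids:
        covered |= covered_vertices(children, child)
    return covered


def is_covering(children, root, clusters):
    """True if every vertex of G is covered by exactly one cluster."""
    seen = []
    for c in clusters:
        seen.extend(covered_vertices(children, c))
    no_overlap = len(seen) == len(set(seen))
    return no_overlap and set(seen) == covered_vertices(children, root)


def contraction(children, root, clusters, edges):
    """Link set L of the contraction W(C, L) induced by the covering C."""
    if not is_covering(children, root, clusters):
        raise ValueError("the given clusters are not a covering")
    owner = {v: c for c in clusters for v in covered_vertices(children, c)}
    links = set()
    for u, v in edges:
        if owner[u] != owner[v]:
            # Several edges between the same two clusters yield a single link.
            links.add(frozenset((owner[u], owner[v])))
    return links


# Hypothetical example: a 4-cycle with clusters a = {1, 2} and b = {3, 4}.
tree = {"r": ["a", "b"], "a": [1, 2], "b": [3, 4]}
cycle_edges = [(1, 2), (2, 3), (3, 4), (1, 4)]
print(contraction(tree, "r", ["a", "b"], cycle_edges))
```

Expanding cluster b in this example corresponds to replacing the covering `["a", "b"]` with `["a", 3, 4]`.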

    Nodes and contractions in a hierarchy tree. W.l.o.g. we assume that the vertices covered by a cluster c in a hierarchy tree are a proper subset of the vertices covered by the parent of c. This means that in the hierarchy tree there is no node with a unique child and leads to the following lemma:

    Lemma 2.1 Let HT (N,A) be a hierarchy tree associated to a t-vertex graph G and let n be the number of its nodes. If no internal node of HT has a unique child, then n ≤ 2t − 1.

    Proof. The proof is by induction on t. The base step (t = 1) is trivial. Let us now suppose by inductive hypothesis that any hierarchy tree with fewer than t leaves satisfies the thesis. Let v be an internal node of HT whose d children are all leaves. Starting from HT , we build a new tree HT ′ by removing from HT all the children of v, so that v becomes a leaf. Let n′ = n − d be the number of nodes of HT ′ and t′ = t − d + 1 the number of its leaves. As d ≥ 2, we have t′ < t, so HT ′ satisfies the inductive hypothesis and n′ ≤ 2t′ − 1. Hence n = n′ + d ≤ 2t′ − 1 + d = 2(t − d + 1) − 1 + d = 2t − d + 1 ≤ 2t − 1, since d ≥ 2. □


    Lemma 2.1 implies that adding a hierarchy tree on top of a graph requires only space linear in the number of vertices of the graph itself, while providing rich and structured information about it. This makes hierarchy trees well suited for supporting visualization operations on large graphs: besides providing high-level representations of a graph, hierarchy trees support its exploration by performing expand and contract operations on the clusters visualized at any instant of time. A sequence of expand and contract operations, that we dub navigation of the hierarchy tree, corresponds to a sequence of transformations of a contraction into another one. Figure 2.2d shows how the contraction in Figure 2.2c changes after expanding cluster e. The number of contractions in a hierarchy tree can be exponential, as proved by the following lemma:

    Lemma 2.2 Hierarchy trees exist with n vertices and $\Omega(2^n)$ contractions.

    Proof. Let HT (N,A) be a complete binary hierarchy tree with n nodes associated to a graph G. Let w(n) be the number of contractions of G on HT . A contraction can be obtained by combining a contraction on the left subtree with a contraction on the right subtree or by simply considering the root. Hence, $w(n) = w(n/2)^2 + 1$. By means of simple calculations it can be proved that $w(n) \geq w(3)^{2^{\log_2(n/3)}} = 2^{n/3}$. □
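The counting argument in the proof can be checked on small instances with the following Python sketch: a covering of the subtree rooted at a cluster is either the cluster itself or a combination of one covering per child subtree. The heap-style numbering of the nodes and the function names are assumptions made for illustration.

```python
def count_coverings(children, c):
    """Number of coverings of the subtree of the hierarchy tree rooted at c."""
    kids = children.get(c, [])
    if not kids:
        return 1  # a leaf can only be covered by itself
    combinations = 1
    for child in kids:
        combinations *= count_coverings(children, child)
    # Either combine coverings of the child subtrees, or take c alone.
    return combinations + 1


def complete_binary_tree(depth):
    """Children map of a complete binary tree with nodes numbered 1, 2, 3, ..."""
    last_internal = 2 ** depth - 1
    return {i: [2 * i, 2 * i + 1] for i in range(1, last_internal + 1)}


for depth in range(1, 5):
    nodes = 2 ** (depth + 1) - 1
    print(nodes, count_coverings(complete_binary_tree(depth), 1))
```

For trees with 3, 7, 15, and 31 nodes the sketch reports 2, 5, 26, and 677 coverings, respectively, which shows how quickly the number of contractions grows even on small hierarchy trees.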

    2.3.2 Design criteria

    In this section we point out some design criteria that should be fulfilled by hierarchy trees so that graph visualization applications can substantially benefit from their usage. In particular, we distinguish between general clustering criteria, application-specific clustering criteria, and requirements on the structure of the hierarchy tree.

    General clustering criteria. Navigating a hierarchy tree should help the viewer get insight into the relationships between vertices of the graph by supporting concentration on local details while providing an overview of important aspects of the context. The effectiveness of hierarchical decompositions in visualizing large graphs strongly depends on a “good” recursive clustering of the graph itself. For example, if clusters are generated without taking into account the topology of the graph, the viewer may derive no benefit from the clustering structure. Different optimization criteria can be considered when building clusters, related, for instance, to the density of clusters or to their diameter. It is in general well accepted that good decompositions should exhibit a strong relationship between vertices in the same cluster and a low coupling between clusters. In other words, only vertices in the same “locality” of the graph should be grouped together to form a cluster, though there is no precise definition of locality, but only an intuition of the properties desirable for it: common clustering approaches rely on graph theoretical properties, such as connectivity, density, degree, cuts, cores, and so on (see, for instance, [10, 38, 66, 69, 98, 114, 118]).

    Besides the locality requirement, it is also natural to ask a graph decomposition to preserve the relevant features and properties of the graph, so that the viewer can grasp them even when observing a contraction of the graph. Since most existing graph drawing algorithms are targeted to special classes of graphs (see [35] for examples related to trees, series-parallel graphs, planar graphs), maintaining the graph theoretical properties of G at any level of abstraction also offers another advantage: it makes it possible to use the same drawing algorithm to visualize any contraction. Using the same algorithm in turn makes it easier to preserve the mental map of the viewer when passing from one contraction to another, since different algorithms may produce very different drawings even of the same graph, according to the aesthetic criteria that they manage to optimize (see [35] for examples). In Chapter 3 we will focus on this topic by introducing the concept of structure-preserving hierarchical decompositions and by addressing the problem of characterizing decompositions that preserve the properties of the original graph.

    Application-specific clustering criteria. More specific clustering criteria can be considered when the visualization is tailored to special-purpose tasks. An example comes from the use of graph visualizations to guide the search for subgraphs that exhibit certain interesting properties. The possibility of integrating human and computer to solve optimization problems has been successfully explored for problems such as interactive network partitioning [88] and capacitated vehicle routing with time windows [6]. The main reason for involving people in the optimization process lies in combining the processing speed of computers with the human ability in visual perception and abstraction. The user is typically presented with a “sketch” of a large graph and is responsible for guiding an interactive solution refinement process for the problem at hand. Defining suitable sketches and exploration operators is fundamental to the success of the human-guided search process. In our scenario, sketches are obtained from a suitable recursive decomposition of the graph and the search and focus operators available to the user consist of expansion and shrink of clusters and links. In this case, it is convenient to exploit decompositions that are tailored to the specific problem of interest and to use different decompositions for different goal-driven navigations. In Chapter 5 we will exemplify this idea by addressing the problems of searching for large cuts and for dense subgraphs, respectively.

    Structural requirements. As suggested in previous works [38, 39, 53, 95], the viewer’s mental map should be preserved as much as possible during the navigation: in particular, the contraction obtained after an expansion/shrink, as well as its drawing, should not differ too much from the previous one. If the hierarchy tree is generated using a locality-preserving clustering algorithm, it turns out that the larger the number of contractions that can be generated from the hierarchy tree, the smoother the transformation of a contraction into the next one can be: a high number of contractions forces the viewer to perform many consecutive navigation steps, thus avoiding drastic changes in the visualization. Under these conditions the viewer has better possibilities of interaction with the visualization facility and his/her mental map is better preserved. To this end, the following structural properties appear to be particularly relevant:

    • Limited degree: the expansion of a cluster should not result in the creation of a high number of new nodes in the view. Indeed, suddenly adding many new clusters could imply drastic changes in the drawing.

    • Small depth: traversing the hierarchy tree should not require too much time, but an excessively small depth (e.g., constant) is indicative of big differences between consecutive views. A logarithmic depth appears to be a reasonable choice.

    • Balancing: nodes on the same level of the hierarchy tree should have similar cardinalities. In this way, any navigation from the root to a leaf takes approximately the same time, independently of the followed path.

    We remark that these structural properties deserve special attention even in other application settings. For instance, in distributed computations, where clusters correspond to processors and vertices to tasks to be performed, having a bounded number of clusters of almost equal size enhances locality, decreases communication, and guarantees better load balancing. In Chapter 4 we will address the problem of designing tree decomposition algorithms that optimize degree, depth, and balancing measures on the hierarchy tree, while preserving the tree-structure of any contraction.

    2.4 Hierarchical drawings

    Hierarchical drawings of graphs are polyline layered drawings, where vertices are constrained to lie on a set of parallel lines, called layers, and edges are represented as polygonal chains. An example is given in Figure 2.3. Edges can span more than two layers: we call the number of layers traversed by an edge its edge span (for instance, edge (12, 2) has span 3). A proper layered graph has no edge with span greater than 1. Observe also the existence of edge bends and of edge crossings: as an example, edge (10, 3) has a bend on layer L2 and edges (11, 7) and (10, 8) cross. Height and width of a hierarchical drawing are the number of layers and the maximum number of vertices in the same layer, respectively. For brevity, we call bipartite straight-line drawing a hierarchical drawing with height 2.

    Figure 2.3: Hierarchical drawing of a directed graph with height = width = 4. (Layers in the figure: L0 = {1, 2}, L1 = {3, 4, 5, 6}, L2 = {7, 8, 9}, L3 = {10, 11, 12}.)

    It is typical to use hierarchical drawings for visualizing large directed graphs arising in a variety of fields, such as procedure call dependencies and class hierarchies in software engineering, PERT diagrams in project planning, is-a relationships in knowledge representation, and several structures from economic and social sciences and from graphic user interfaces. Compared to other drawing conventions (e.g., orthogonal or straight-line representations), they offer a major advantage: since edges usually point in the same direction (e.g., towards the top of the display device as in Figure 2.3), the user can more easily trace paths in the graph by exploring it in only one direction (e.g., vertically), instead of two. For this reason width-restricted layered drawings are considered to be better suited to visualizing large graphs than other drawing paradigms [17].

    2.4.1 Layout aesthetic criteria

    It is well-known that the viewer can effectively benefit from graph drawings only if they are clear and readable; for this reason many different aesthetic criteria have been proposed and studied in the graph drawing field. Satisfying these criteria gives rise to optimization problems that are typically NP-hard. In the following we summarize the aesthetic criteria that appear to have the greatest impact on the readability of layouts of directed graphs. We remark that evaluating the quality of visualizations necessarily requires human participation; our observations are in fact based on empirical studies of human understanding of graphs drawn using various layout aesthetics, such as [106].

    We already observed that edges pointing in the same direction represent a valuable feature of the drawing. Clearly, a layered drawing of this kind can exist if and only if the graph contains no directed cycle; if this is not the case, the number of edges “against the flow” should be minimized. The width of the drawing should also be kept bounded, so that any layer can completely fit in the display device with its vertices arranged, e.g., horizontally. The width should however be traded off against the average or maximum edge span, because long edges make the exploration of the graph more difficult. Last but not least, a small number of edge crossings has proven to be one of the fundamental aesthetic criteria that increase the understandability of the drawing.

    2.4.2 The hierarchical approach

    The optimization of the aesthetic criteria discussed in Section 2.4.1 was the guideline followed in the design of a well-known method for drawing directed graphs, usually referred to as the hierarchical approach. This method was first presented in 1981 by Sugiyama, Tagawa, and Toda [115] and has been subsequently refined in many other works and implemented in several software systems [45, 59, 103]. The hierarchical approach is applied to directed graphs, regardless of their graph-theoretical properties, and consists of four main steps:

    1. Cycle removal. If the digraph is not acyclic, temporarily reverse the direction of some edges in order to eliminate all the directed cycles.

    2. Layering. Assign the vertices of the digraph to layers. Dummy vertices are added for arcs spanning more than two levels.

    3. Crossing reduction. Order the vertices within each layer so as to minimize the number of edge crossings.

    4. Coordinate assignment. Choose vertical and horizontal coordinates for vertices and edge bends. At this point the direction of the reversed arcs is also restored.

    In the following we briefly discuss the complexity of each step and we review the algorithmic techniques most frequently adopted for finding a solution.

    It is easy to prove [35] that obtaining an acyclic digraph by reversing a minimal set of edges is equivalent to obtaining an acyclic digraph by deleting edges, i.e., to finding a feedback arc set of the digraph. Since this problem is NP-complete [61], effective heuristics or approximation algorithms are needed. The best known approximation algorithm [48, 112] achieves performance ratio O(log n log log n), where n is the number of vertices of the digraph. Since it requires solving a linear program, this algorithm is computationally too demanding, and a simple greedy approach that runs in linear time and is guaranteed to remove at most m/2 − n/6 edges, m being the number of arcs of the digraph, is often preferred [43].

    As observed before, having width-restricted hierarchical drawings is crucial for effectively exploring large graphs. The problem of finding a layering with minimum width, subject to having minimum height, is NP-complete, but a very effective solution strategy is known in the literature. The algorithm, borrowed from the theory of multiprocessor scheduling [25], takes as input the maximum allowed width ω and returns a layering with width at most ω that approximates the optimal height by a factor 2 − 2/ω [85]. At this point it is worth observing that the typical definition of width ignores the space needed for edge bends, which should instead be considered as dummy vertices: some recent results on width-restricted layering of acyclic digraphs with consideration of dummy vertices are reported in [17].
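The effect of dummy vertices on the width can be illustrated by a minimal Python sketch that turns a layering into a proper one, replacing every arc of span greater than 1 with a chain of span-1 arcs. The dictionary-based layering, the `("dummy", k)` naming of the added vertices, and the small example are illustrative assumptions; arcs are assumed to join vertices on different layers.

```python
def make_proper(layer, arcs):
    """Split each arc of span > 1 into span-1 arcs through dummy vertices."""
    new_layer = dict(layer)
    new_arcs = []
    counter = 0
    for u, v in arcs:
        start, end = layer[u], layer[v]
        step = 1 if end > start else -1
        previous = u
        # One dummy vertex on every layer strictly between the two endpoints.
        for intermediate in range(start + step, end, step):
            dummy = ("dummy", counter)
            counter += 1
            new_layer[dummy] = intermediate
            new_arcs.append((previous, dummy))
            previous = dummy
        new_arcs.append((previous, v))
    return new_layer, new_arcs


def width(layer):
    """Maximum number of vertices assigned to the same layer."""
    counts = {}
    for index in layer.values():
        counts[index] = counts.get(index, 0) + 1
    return max(counts.values())


# Hypothetical example: arc (a, c) has span 3 and needs two dummy vertices.
layering = {"a": 0, "b": 1, "c": 3}
arcs = [("a", "b"), ("a", "c")]
proper_layering, proper_arcs = make_proper(layering, arcs)
print(width(layering), width(proper_layering))
```

In this example the width grows from 1 to 2 once dummy vertices are counted, which is the effect that the usual definition of width ignores.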

    The number of crossings is usually reduced by employing a layer-by-layer sweep heuristic [35]. Let h be the number of layers after step 2. First, an ordering of vertices on layer L1 is chosen; then, for i = 2 to h, the vertices of layer Li are ordered keeping the ordering of Li−1 fixed: the ordering of Li must be chosen so as to minimize the number of edge crossings. In this way, the crossing minimization step is reduced to solving the following problem: given a two-layered graph and a permutation of the vertices on a layer, find a permutation of the vertices on the other layer that minimizes the number of edge crossings. We will refer to this problem as the one-sided crossing minimization problem. We defer a detailed overview of the literature on this problem to Chapter 6, where new results are also presented.
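The one-sided crossing minimization problem can be made concrete with a short Python sketch that counts the crossings of a two-layer straight-line drawing and, for tiny instances only, finds an optimal ordering of the free layer by exhaustive search. The function names and the example are assumptions made for illustration; the exhaustive search takes exponential time and is not one of the algorithms discussed in Chapter 6.

```python
from itertools import combinations, permutations


def count_crossings(fixed_order, free_order, edges):
    """Crossings of a two-layer drawing; each edge is (fixed vertex, free vertex)."""
    fixed_pos = {v: i for i, v in enumerate(fixed_order)}
    free_pos = {v: i for i, v in enumerate(free_order)}
    crossings = 0
    for (a, b), (c, d) in combinations(edges, 2):
        # Two edges cross when their endpoints appear in opposite relative
        # order on the two layers; edges sharing an endpoint never cross.
        if (fixed_pos[a] - fixed_pos[c]) * (free_pos[b] - free_pos[d]) < 0:
            crossings += 1
    return crossings


def best_free_order(fixed_order, free_vertices, edges):
    """Exhaustive search over the permutations of the free layer."""
    return min(
        permutations(free_vertices),
        key=lambda order: count_crossings(fixed_order, order, edges),
    )


# Hypothetical example with three fixed vertices and two free vertices.
fixed = [1, 2, 3]
free = ["x", "y"]
two_layer_edges = [(1, "y"), (2, "x"), (3, "x")]
print(count_crossings(fixed, free, two_layer_edges))
print(best_free_order(fixed, free, two_layer_edges))
```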

    The coordinate assignment step, finally, chooses an x-coordinate for each vertex without perturbing the ordering established in the crossing reduction step. (Note that y-coordinates are given by the layering.) The aim here is to minimize edge bends for edges with span greater than 1 and to draw edges as straight as possible. The problem is usually formulated as a quadratic assignment problem (see [35] and the references therein), but unfortunately its solution requires considerable computational resources.

    A useful addition to optimizing common aesthetic criteria in visualizing large graphs is the possibility of imposing constraints on the drawing: constraints allow the user to customize the layout of specific subgraphs or subdrawings according to his/her actual needs, thus making concentration on local details easier, even if the appearance of the entire drawing may deteriorate. Typical kinds of constraints require placing a vertex at the center or on the boundary of the drawing, representing a given path vertically or horizontally aligned, or forcing some pairs of edges not to cross. In spite of quite extensive research on constraint satisfaction in the orthogonal and straight-line drawing paradigms, many aspects of constrained hierarchical drawings have not been explored so far. Some authors have studied methods to force vertices on the same layer to stay close to each other in the crossing reduction step or vertices on different layers to be vertically aligned during the x-coordinate assignment step (see [35] and the references therein). In Chapter 7 we consider other kinds of constraints, concerned with the relative positions of vertices on the same layer and with forbidden edge crossings, and we show how they can be supported by the hierarchical approach.


  • Chapter 3

    Structure-preserving hierarchical decompositions

    As discussed in Chapter 2, hierarchical decompositions should fulfill several requirements that are important for their effective deployment in graph visualization. In particular, clusters should be generated taking into account the topology of the graph and the decomposition should preserve the relevant features and properties of the graph, so that the viewer can grasp them even when observing a high-level representation. This chapter is concerned with the problem of characterizing structure-preserving hierarchical decompositions and addresses the following question:

    Given a hierarchy tree of a graph G and a graph property P satisfied by G, does any contraction of G on the hierarchy tree satisfy property P?

    Related work. Though the question above is fairly natural in our context, surprisingly, it seems that it has not been addressed in the literature so far. Some works extend typical graph theory concepts to clustered graphs, but they mostly focus on the special case of planar graphs. In particular, Feng et al. [51] define the class of compound planar graphs. Differently from our approach, in their drawings the graph is entirely represented and the clustering structure is shown on top of its visualization, with clusters drawn as simple regions of the plane (an example is provided by Figure 3.1a). A graph is dubbed c-planar if and only if it has a drawing with no edge crossings or edge-region crossings under this drawing convention. A characterization of c-planar clustered graphs and an efficient algorithm for c-planarity testing are presented in [51]. Partitioning algorithms aimed at satisfying the c-planarity conditions and algorithms for drawing c-planar clustered graphs have also been designed [38, 41, 50]. Unfortunately, displaying the graph and its clustering structure entirely can generate very cluttered drawings and is not well suited to the visualization of large graphs.



    Our results. In this chapter we formalize the concept of structure-preserving graph decompositions. In Section 3.1 we introduce the general notion of P-validity of hierarchical decompositions of graphs with respect to a given property P: this notion reflects the similarity between the topological structure of the original graph and of its contractions on the hierarchy tree and can be used to measure the quality of different hierarchical decompositions of the same graph. In Section 3.2 we address the problem of characterizing P-valid hierarchy trees: namely, we present conditions on the structure of the clusters that are necessary and sufficient to guarantee the P-validity of the decomposition when the original graph is a tree and property P is acyclicity. Our structure theorem is interesting in its own right; moreover, in Section 3.3 we show that it has an algorithmic implication: given a hierarchy tree HT associated to an n-vertex tree, it is possible to check the validity of HT in polynomial time (in particular, in time O(n^2)), despite the fact that there exist hierarchy trees containing Ω(2^n) distinct contractions (see Lemma 2.2).

    3.1 The concept of P-validity

    In this section we formalize the concept of structure-preserving graph partitions by introducing the concept of P-validity of contractions and hierarchy trees with respect to a graph property P. We start with a motivating example. In Figure 3.1 two different hierarchical decompositions associated to an 8-vertex chain are considered (Figure 3.1a and Figure 3.1d). The corresponding hierarchy trees HT1 (Figure 3.1b) and HT2 (Figure 3.1e) are both complete binary trees of height 3 and differ only in the permutation of their leaves. Figure 3.1c and Figure 3.1f report three contractions related to coverings {E, F}, {E, C, D}, and {A, B, C, D} in the two decompositions, respectively. Observe that all the contractions built from HT1 maintain the structural property of the original graph of being a chain, while the contractions obtained from HT2 lose this property by introducing cycles, to the point of even becoming a clique. The notion of P-validity aims at characterizing the “semantic” differences of hierarchy trees with “syntactically” similar or even identical structure.

    Definition 3.1 [P-validity of contractions] Let HT(N, A) be a hierarchy tree associated to a graph G(V, E) and let P be a property satisfied by G. A contraction of G on HT is P-valid if and only if it satisfies property P.

    As Definition 3.1 is parametric in property P, different notions of validity may be given for different classes of graphs. For example, we may require a contraction of a bipartite graph to be bipartite or a contraction of a planar graph to be planar: in these cases, P is the bipartiteness and the planarity property, respectively. Even more sophisticated definitions for P can be considered: see, for instance, the c-planarity property [51].


    Figure 3.1: Hierarchical decompositions (a, d), hierarchy trees (b, e), and different contractions (c, f) of an 8-vertex chain.

    The generalization of the concept of P-validity from a single contraction to the whole hierarchy tree is straightforward.

    Definition 3.2 [P-validity of hierarchy trees] Let HT(N, A) be a hierarchy tree associated to a graph G(V, E) and let P be a property satisfied by G. HT is P-valid if and only if all the contractions of G obtained from HT are P-valid.

    In the rest of this chapter we study the case where the clustered graph is a tree and property P is acyclicity (as we will see in Section 3.2, connectivity is trivially guaranteed). For brevity, we speak of valid contractions and hierarchy trees under these hypotheses. It is worth observing that there exist non-valid hierarchy trees. In particular, it is precisely the notion of validity that makes the difference between the hierarchy trees in Figure 3.1: the upper hierarchy tree is valid, while, among the 26 possible contractions obtained from the lower one, only 3 are valid (i.e., W_r, W_l, and the contraction induced by covering {E, F}). Note that W_r and W_l are always valid for any hierarchy tree.
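    The count of 26 contractions and the effect of the leaf permutation can be reproduced by brute force. The following is a minimal sketch, assuming that a covering of a hierarchy tree node is either the node itself or a union of coverings of its children (the definition is given in Chapter 2 and is only paraphrased here), and that two clusters are linked whenever some tree edge joins vertices they cover. The helper names and the second leaf order are illustrative assumptions; the leaf order is not the one drawn in Figure 3.1.

```python
from itertools import product

def binary_hierarchy(leaf_order):
    """Complete binary hierarchy tree of height 3 over 8 leaves (hypothetical helper)."""
    l = list(leaf_order)
    return {"G": ["E", "F"], "E": ["A", "B"], "F": ["C", "D"],
            "A": l[0:2], "B": l[2:4], "C": l[4:6], "D": l[6:8]}

def coverings(node, children):
    """Yield each covering below `node`: the node itself, or one covering per child."""
    yield (node,)
    kids = children.get(node, [])
    if kids:
        for combo in product(*(list(coverings(k, children)) for k in kids)):
            yield tuple(c for part in combo for c in part)

def covered_vertices(node, children):
    """Set of graph vertices (leaves of the hierarchy tree) covered by `node`."""
    kids = children.get(node, [])
    if not kids:
        return {node}
    return set().union(*(covered_vertices(k, children) for k in kids))

def is_valid(covering, children, tree_edges):
    """A contraction of a tree is connected, so it is acyclic iff #links == #clusters - 1."""
    cluster_of = {v: c for c in covering for v in covered_vertices(c, children)}
    links = {frozenset((cluster_of[u], cluster_of[v]))
             for u, v in tree_edges if cluster_of[u] != cluster_of[v]}
    return len(links) == len(covering) - 1

def count_valid(leaf_order, tree_edges):
    """Return (number of coverings, number of valid contractions)."""
    ht = binary_hierarchy(leaf_order)
    all_coverings = list(coverings("G", ht))
    return len(all_coverings), sum(is_valid(c, ht, tree_edges) for c in all_coverings)

chain = [(i, i + 1) for i in range(1, 8)]           # the 8-vertex chain 1-2-...-8
print(count_valid([1, 2, 3, 4, 5, 6, 7, 8], chain))  # contiguous clusters: (26, 26)
print(count_valid([1, 4, 2, 3, 5, 6, 7, 8], chain))  # cluster {1, 4} breaks validity
```

    With contiguous clusters every contraction is again a chain, whereas grouping vertices 1 and 4 already produces a cyclic contraction once the remaining vertices are shown as singletons.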

    3.2 A structure theorem for valid hierarchy trees

    In this section we present a structural characterization of valid hierarchy trees, i.e., we give necessary and sufficient conditions for a hierarchy tree to be valid. We start with preliminary definitions and lemmas useful for proving our structure theorem. Lemma 3.1 studies the connectivity of contractions obtained from hierarchy trees of connected graphs.

    Figure 3.2: Vertices, edges, clusters, and links involved in cycle C in the proof of Lemma 3.2.

    Lemma 3.1 Each contraction obtained from a hierarchy tree associated to a connected graph is connected.

    Proof. The existence of a disconnected contraction would imply the non-connectivity of the graph. □

    Let HT(N, A) be a hierarchy tree associated to a free tree T(V, E). For each u, v ∈ V, let π(u, v) be the path joining u and v in T and let δ(u, v) be the length of this path. The next definition introduces the concept of broken pair of a cluster: as we will see, broken pairs play a key role in the characterization of valid hierarchy trees. We recall that S(c) denotes the subgraph of T induced by the vertices covered by cluster c.

    Definition 3.3 [Broken pair] Let c be a node of a hierarchy tree HT associated to a free tree T. Let u and v be two vertices of T covered by c.

    • u, v are a broken pair of cluster c if and only if they are neither coincident nor connected in S(c).

    • A broken pair u, v is a minimum-distance broken pair of c if and only if w ⊀ c for every w ∈ π(u, v) such that w ≠ u, v.
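    Definition 3.3 can be checked directly on small examples. The sketch below is only an illustration under stated assumptions: the tree is given as an adjacency dictionary, a cluster is represented by the set of vertices it covers, and the function names are hypothetical. It relies on the fact that paths in a tree are unique, so u and v are connected in S(c) exactly when every vertex of π(u, v) is covered by c.

```python
from collections import deque

def tree_path(adj, u, v):
    """Return the unique path from u to v in a tree given as an adjacency dict."""
    parent = {u: None}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            break
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    path = [v]
    while path[-1] != u:
        path.append(parent[path[-1]])
    return path[::-1]

def is_broken_pair(adj, covered, u, v):
    """u, v (both covered) are broken iff distinct and not connected in S(c)."""
    if u == v:
        return False
    return any(w not in covered for w in tree_path(adj, u, v))

def is_min_distance_broken_pair(adj, covered, u, v):
    """Broken pair such that no interior vertex of the path is covered."""
    if u == v:
        return False
    interior = tree_path(adj, u, v)[1:-1]
    return len(interior) > 0 and all(w not in covered for w in interior)

# 8-vertex chain 1-2-...-8 and a cluster covering vertices {1, 3, 5}.
adj = {i: [j for j in (i - 1, i + 1) if 1 <= j <= 8] for i in range(1, 9)}
covered = {1, 3, 5}
print(is_broken_pair(adj, covered, 1, 5))               # broken: 2 and 4 are not covered
print(is_min_distance_broken_pair(adj, covered, 1, 5))  # not minimum-distance: 3 is covered
print(is_min_distance_broken_pair(adj, covered, 1, 3))  # minimum-distance, at distance 2
```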

    Lemma 3.2 Let W be a contraction of tree T on a hierarchy tree HT. If W is not acyclic, then each cycle involves at least one cluster containing a broken pair.

    Proof. Let C = (c_1, . . . , c_h) be a cycle in W. Each cluster c_i ∈ C is an endpoint of two links, (c_{i−1}, c_i) and (c_i, c_{i+1}) (to simplify the notation, indices are taken cyclically, i.e., we assume h + 1 = 1). Let (r_{i−1}, l_i) and (r_i, l_{i+1}) be the tree edges which induce such links. For each cluster c_i we therefore identify two vertices covered by it, named l_i and r_i (see Figure 3.2). If no cluster of C involves a broken pair, for each i ∈ [1, h] vertices l_i and r_i are either coincident or connected in S(c_i). This implies the existence of a cycle in tree T, which is a contradiction. □


    Figure 3.3: Cycle that proves the necessary condition of Theorem 3.1.

    Theorem 3.1 [Validity structure theorem] Let T(V, E) be a free tree and let HT(N, A) be a hierarchy tree associated to T. HT is valid if and only if δ(u, v) = 2 for each minimum-distance broken pair u, v of HT.

    Proof. We first prove the necessary condition:

    ∃ u, v in HT : u, v is a minimum-distance broken pair and δ(u, v) > 2 ⇒ HT is not valid

    Let c be a node of HT such that u, v is a minimum-distance broken pair in c. Let us consider the contraction W(C, L) where C = {c} ∪ {{x} : x ∈ V and x ⊀ c}, i.e., covering C consists only of singleton clusters, except for cluster c. Let z and w be the neighbors of u and v, respectively, in π(u, v). Since u, v is a minimum-distance broken pair of c, z, w, and all the other vertices in π(u, v) are not covered by c. Moreover, δ(u, v) > 2 implies δ(z, w) > 0, i.e., z ≠ w. Hence, as shown in Figure 3.3, contraction W contains the cycle (c, {z}, . . . , {w}, c), which proves that HT is not valid.

    We now focus on the sufficient condition, proving that a contradiction can be derived if we assume that HT is not valid while satisfying the property on the minimum-distance broken pairs formulated in the statement of the theorem. Namely, our hypotheses are that each minimum-distance broken pair has distance 2 in T and that HT contains a non-valid contraction W, i.e., a contraction that is not a tree. Since W is connected by Lemma 3.1, it must contain a cycle, say C.

    The general idea of the proof is to convert W into another contraction W′, still existing on the hierarchy tree, such that: (a) the number of singletons of W′ is strictly greater than the number of singletons of W; (b) the cycle C in W is changed into a cycle C′ in W′. The sequence of manipulations that we perform has finite length, since the number of singletons is clearly bounded above by |V|. We can therefore prove that at some finite step during this process we find a contradiction due to Lemma 3.2, since we obtain a cycle involving no broken pair.

    Let C = (c_1, . . . , c_h) be a simple cycle of length h in W and let clusters, links, vertices, and edges be defined as in the proof of Lemma 3.2 (see also Figure 3.2). By Lemma 3.2 a broken pair l_i, r_i must exist in C, though l_i, r_i is not necessarily a minimum-distance broken pair.

    Figure 3.4: Broken pair l_i, r_i contained in cycle C in the proof of the sufficient condition of Theorem 3.1.

    Let us consider the path from l_i to r_i in T, which is unique and not completely contained in S(c_i). On this path we can uniquely identify a set of k vertices u_j, for 1 ≤ j ≤ k, such that u_j ≺ c_i but its successor on π(l_i, r_i) is not covered by c_i. Analogously, we can uniquely identify a set of k vertices v_j, for 1 ≤ j ≤ k, such that v_j ≺ c_i but its predecessor on π(l_i, r_i) is not covered by c_i. Observe that it could be u_j = v_j for some j, but vertices with different indexes always belong to different connected components of S(c_i), which we call T_1, . . . , T_k. (The configuration is illustrated by Figure 3.4.)

    It is clear that v_1 = l_i and u_k = r_i. Furthermore, the pairs u_j, v_{j+1}, for 1 ≤ j ≤ k − 1, are minimum-distance broken pairs of cluster c_i and by hypothesis δ(u_j, v_{j+1}) = 2. Let us call z_j the unique vertex of T in the path between u_j and v_{j+1} and let c′_j be the cluster of W that covers z_j.

    We change contraction W into W′ by expanding cluster c_i at the singleton level, i.e., by substituting c_i with the set of singleton clusters corresponding to the vertices in S(c_i). In the following we prove that we are able to exhibit in W′ a simple cycle C′.

    Let us first consider the cycle Ĉ shown in Figure 3.5. Ĉ clearly exists in W′, being obtained from C by unrolling π(l_i, r_i) (compare vertices and clusters involved in Figure 3.4 and in Figure 3.5, respectively). However, Ĉ is not necessarily simple. W.l.o.g. we can assume that c′_j ≠ c′_s ∀ j, s ∈ [1, k − 1], j ≠ s. We can always reduce to this situation as follows: while Ĉ contains a pair j, s such that 1 ≤ j < s ≤ k − 1 and c′_j = c′_s, substitute the path (c′_j, . . . , c′_s) with the path (c′_j). After this operation all the c′_j are distinct.

    If ∀ j ∈ [1, k − 1] c′_j ∉ C, then C′ = Ĉ is a simple cycle. Otherwise ∃ j ∈ [1, k − 1] such that c′_j ∈ C and two cases may happen:


    Figure 3.5: Cycle Ĉ in the proof of the sufficient condition of Theorem 3.1.

    • c′_1 = c_{i−1}: change Ĉ by replacing the non-simple path (c_{i−1}, p_1, c′_1, p_2) with the simple path (c_{i−1}, p_2). If k − 1 = 1 or ∀ j ∈ [2, k − 1] c′_j ∉ C, then the modified Ĉ is a simple cycle and we choose C′ = Ĉ. Otherwise, let s be the smallest value in [2, k − 1] such that c′_s ∈ C, i.e., c′_s = c_t for some t ∈ [1, h], t ≠ i. Moving clockwise on Ĉ, we find the simple cycle C′ = (c_{i−1}, p_2, c′_2, . . . , p_s, c′_s = c_t ∼ c_{i−1}), where ∼ indicates a subpath of p. (See Figure 3.5.)

    • c′_1 ≠ c_{i−1}: let s be the smallest value in [1, k − 1] such that c′_s ∈ C, i.e., c′_s = c_t for some t ∈ [1, h], t ≠ i. If s = 1 then C′ = (c_{i−1}, p_1, c′_1 = c_t ∼ c_{i−1}) is a simple cycle in W′. Note that the length of the path (c_t ∼ c_{i−1}) is greater than or equal to 1, because c_t = c′_1 ≠ c_{i−1}. If s > 1 then let C′ = (c_{i−1}, p_1, c′_1, p_2, c′_2, . . . , p_s, c′_s = c_t ∼ c_{i−1}). In this case it could be c_t = c_{i−1}; however, the fact that the path from c_{i−1} to c′_s has length ≥ 3 guarantees C′ to be a cycle of W′. By construction C′ is simple.

    In any case we are therefore able to find a contraction W′ containing more singletons than W and to exhibit a simple cycle C′ in W′. Iterating this reasoning, we obtain either a cycle with no broken pairs, which is a contradiction due to Lemma 3.2, or a cyclic contraction containing only singleton clusters, i.e., W_l. This is also a contradiction because W_l must be acyclic, being equal to T. □

    3.3 P-validity testing

    Roughly speaking, the number of valid contractions in a hierarchy tree measures how much the hierarchy tree reflects the topology of the graph and can be used to rank different hierarchy trees associated to the same graph. It is therefore important to be able to check efficiently the validity of contractions and of hierarchy trees. In this section we present efficient testing algorithms.


    procedure ContractionValidity (v : vertex of tree T)
     1. begin
     2.   visited[v] ← true
     3.   for each u ∈ Adj[v] : visited[u] = false
     4.   begin
     5.     if parent[cluster[u]] = undefined
     6.     then parent[cluster[u]] ← cluster[v]
     7.     else begin
     8.       if cluster[u] ≠ cluster[v] and
     9.          parent[cluster[u]] ≠ cluster[v] and
    10.          parent[cluster[v]] ≠ cluster[u]
    11.       then return failure
    12.     end
    13.     ContractionValidity(u)
    14.   end
    15. end

    Figure 3.6: Contraction computation and validity checking.

    Testing the validity of contractions. This is a very simple task: Figure 3.6 reports the pseudocode of an algorithm for computing the contraction associated to a covering while testing its validity. Given a tree T, a related hierarchy tree HT, and a covering C on it, the algorithm visits T trying to build the parent vector of the contraction W(C, L) with vertex set C and link set L. A contradiction may happen if one tries to assign two different parents to the same cluster. At the beginning, Procedure ContractionValidity is called on a vertex of T whose cluster has its parent defined to be NULL. For each vertex v of the tree, cluster[v] contains the node of C that covers v and parent[cluster[v]] represents a link of L. The correctness of the algorithm is guaranteed by the following lemma:

    Lemma 3.3 Algorithm ContractionValidity fails if and only if contraction W(C, L) associated to covering C is not valid. In case of validity of W, the algorithm correctly returns the set of links L.

    Proof. We first remark that the algorithm is a visit of the input tree T and that the visit always moves from a vertex v of T to a neighbor of v; then, at any instant, the contraction induced by the visited clusters is connected. If two adjacent vertices are in the same cluster, they can never raise a failure. Then, the only way to introduce a cycle in W, i.e., to make W not valid, is to discover adjacency between clusters which are not related to each other by a parent relationship (see lines 8-10 in Figure 3.6). The content of the parent vector of a valid contraction is guaranteed to be correct by the visit of T. □
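    For readers who prefer executable code, the procedure of Figure 3.6 can be transcribed as follows. This is a sketch rather than the thesis implementation: it assumes the tree is an adjacency dictionary and the covering is given as a map from vertices to cluster identifiers, it replaces recursion with an explicit stack (the visiting order differs, but each tree edge is still examined once, from the endpoint visited first), and it returns None in place of failure. The function name and the example clusterings are hypothetical.

```python
def contraction_validity(adj, cluster, root):
    """Return the parent map of the contraction, or None if it contains a cycle.

    adj:     adjacency dict of the tree T
    cluster: maps each vertex of T to the covering node that covers it
    root:    vertex where the visit starts; its cluster gets parent None
    """
    parent = {cluster[root]: None}
    visited = {root}
    stack = [root]
    while stack:
        v = stack.pop()
        for u in adj[v]:
            if u in visited:
                continue
            visited.add(u)
            cu, cv = cluster[u], cluster[v]
            if cu not in parent:
                # First time cluster cu is reached: record the link (cv, cu).
                parent[cu] = cv
            elif cu != cv and parent[cu] != cv and parent[cv] != cu:
                # Adjacent clusters not related by a parent link close a cycle.
                return None
            stack.append(u)
    return parent

# 8-vertex chain with contiguous clusters, then with vertices 1 and 4 grouped.
adj = {i: [j for j in (i - 1, i + 1) if 1 <= j <= 8] for i in range(1, 9)}
contiguous = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C", 6: "C", 7: "D", 8: "D"}
broken = {1: "A", 4: "A", 2: 2, 3: 3, 5: 5, 6: 6, 7: 7, 8: 8}
print(contraction_validity(adj, contiguous, 1))
print(contraction_validity(adj, broken, 1))
```

    In the second call the clusters A, {2}, and {3} form a triangle in the contraction, so the visit reports failure when it reaches vertex 4.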


    Figure 3.7: Checking the validity of hierarchy trees: (a) labeled tree T, with black vertices having label in and grey vertices having label out; (b) tree T after pruning.

    Procedure ContractionValidity clearly requires time O(t), where t is the number of vertices in tree T. However, it cannot be directly used to check the validity of the hierarchy tree HT as Definition 3.1 would suggest, because the number of contractions may be exponential (see Lemma 2.2). In the following we suggest a polynomial algorithm for solving this problem.

    Testing the validity of hierarchy trees. Theorem 3.1 is the backbone of our polynomial time algorithm for checking the validity of hierarchy trees. Let T(V, E) be a t-vertex free tree and let HT(N, A) be a hierarchy tree associated to T. If we root T at any vertex, for any v ∈ V let T(v) denote the subtree of T rooted at vertex v. For each cluster c of the hierarchy tree, the algorithm works as follows:

    1. Root tree T at a vertex r such that r ≺ c.

    2. Label the vertices of T as in or out vertices, according to whether they are covered by cluster c or not.

    3. Prune T: ∀ v ∈ V remove the subtree T(v) if and only if it contains only out-labeled vertices.

    4. If the pruned tree contains two adjacent out-labeled vertices return failure, otherwise return success.
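    Steps 1-4 can be sketched in a few lines. The code below is an illustration under stated assumptions, not the thesis implementation: the tree is an adjacency dictionary, each cluster is passed as the set of vertices it covers, and the names cluster_ok and hierarchy_valid are hypothetical. Pruning is implicit: a vertex survives exactly when its subtree contains an in-labeled vertex.

```python
def cluster_ok(adj, covered):
    """Run steps 1-4 for one cluster; `covered` is the set of vertices it covers."""
    r = next(iter(covered))              # step 1: root T at a covered vertex
    parent = {r: None}
    order = [r]
    i = 0
    while i < len(order):                # BFS, so parents precede children in `order`
        v = order[i]
        i += 1
        for u in adj[v]:
            if u not in parent:
                parent[u] = v
                order.append(u)
    # Step 2: the in-labeled vertices are those in `covered`.
    # Step 3: T(v) survives pruning iff it contains an in-labeled vertex.
    keeps = {v: v in covered for v in order}
    for v in reversed(order):
        if keeps[v] and parent[v] is not None:
            keeps[parent[v]] = True
    # Step 4: fail on two adjacent out-labeled vertices in the pruned tree.
    for v in order:
        p = parent[v]
        if p is not None and keeps[v] and v not in covered and p not in covered:
            return False
    return True

def hierarchy_valid(adj, clusters):
    """Repeat the test on every cluster of the hierarchy tree."""
    return all(cluster_ok(adj, c) for c in clusters)

adj = {i: [j for j in (i - 1, i + 1) if 1 <= j <= 8] for i in range(1, 9)}
print(cluster_ok(adj, {1, 3, 5}))   # out-labeled vertices 2 and 4 are not adjacent
print(cluster_ok(adj, {1, 4}))      # out-labeled vertices 2 and 3 are adjacent
```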

    It is easy to see that steps 1-4 can be implemented in time O(t). Since they are repeated on each cluster of HT and the number of such clusters is O(t) (see Lemma 2.1), the total time required by the algorithm is O(t^2). The correctness can be established as follows. If the pruned tree contains no adjacent out-labeled vertices, then it is easy to see that each minimum-distance broken pair is separated by exactly one out-labeled vertex, therefore satisfying the condition stated in Theorem 3.1. Vice versa, let us assume that the pruned tree contains two adjacent out-labeled vertices, say v_1 and v_2. W.l.o.g. let v_1 be the vertex with smaller distance from the root r. By construction, vertex v_2 has not been removed because T(v_2) contains at least one in-labeled vertex; let u be an in-labeled vertex of T(v_2) with smallest depth. r, u is clearly a minimum-distance broken pair, but its distance is greater than 2, as the path joining r and u in T contains both v_1 and v_2 (see also Figure 3.7). HT is therefore not valid due to Theorem 3.1. This implies the following theorem:

    Theorem 3.2 Let T(V, E) be a t-vertex free tree and let HT(N, A) be a hierarchy tree associated to T. The validity of HT w.r.t. T can be checked in O(t^2) time.

    3.4 Concluding remarks

    In this chapter we have introduced the concept of P-validity of contractions and hierarchy trees. This concept reflects the similarity between the topological structure of the original graph and of its contractions on the hierarchy tree. The P-validity can be considered as a measure of the quality of different decompositions of the same graph. Moreover, when the hierarchy tree is not given along with the graph itself, but is generated by a partitioning algorithm, the number of valid contractions can be used for comparing the quality of the decompositions returned by different algorithms.

    We have studied the P-validity of hierarchical decompositions when the clustered graph is a tree and property P is acyclicity. Under these hypotheses, we have presented a structural characterization of the P-validity, i.e., conditions on the structure of the clusters that are necessary and sufficient to guarantee the P-validity of the whole hierarchy tree. Since these conditions can be checked in linear time for each cluster, our structure theorem implies that the validity of a hierarchy tree associated to an n-vertex tree can be verified in time O(n^2), in spite of the fact that the number of contractions that can be derived from it may be exponential.

  • Chapter 4

    Tree decompositions via

    vertex deletion

    In this chapter we address the problem of computing hierarchical decompositions of trees. Based on the design criteria described in Section 2.3 and on the concept of P-validity introduced in Chapter 3, our goal is to find decompositions that: 1) are valid; 2) have bounded degree; 3) have logarithmic depth; and 4) are balanced.

    Related work. Tree partitioning is a much studied topic in many areas. We first recall that tree clustering procedures are useful subroutines for partitioning generic graphs: they can be applied, for example, to the block-cut-vertex tree of a graph in order to obtain a rough partition of the vertices. In addition, tree-like structures frequently arise in many practical problems (e.g., evolutionary trees and parse trees). Last but not least, there are many application settings where recursive tree decompositions have proven to be effective: a limited list of examples includes parallel and distributed computations, operating systems, external searching, allocation of service centers (such as police stations) in rural area maps, and dynamic graph algorithms; we refer the interested reader to [11, 36, 57, 67] for further details. For these reasons, a lot of research has been devoted since the 80's to designing specific tree partitioning algorithms tailored to a variety of applications and with different optimization criteria in mind (see, e.g., [11, 12, 36, 67, 90, 104, 110] and the references therein). Independently of the optimized objective function, we can roughly distinguish two main approaches to tree partitioning, according to whether clusters are obtained by vertex or by edge deletion.

    A well-known technique based on edge deletion, the shifting technique, has been presented in [104] and applied later on to many optimization problems on trees [3, 11, 105]: a partition of a tree is identified by associating cuts to its edges. Cuts are assigned via a sequence of shifts, i.e., basic operations that move a cut from an edge to an adjacent one; different shifting rules make it possible to optimize different functions. Other edge deletion algorithms for tree partitioning are described in [12, 57, 67]. In particular, [67] suggests algorithms for partitioning an n-vertex tree into g clusters such that the size of each cluster is in the range [(1 − α/2) n/g, (1 + α) n/g], where parameter α ∈ [0, 1] is given as input.

    At first sight, removing vertices may appear less flexible than removing edges: based on the degree of the deleted vertex, the tree may be disconnected into several subtrees of very different sizes and optimizing both cluster sizes and number of subtrees may be more difficult. However, an accurate choice of the vertex to be removed (e.g., choosing a centroid or a center of the tree) makes it possible to guarantee upper/lower bounds on the size or on the diameter of each cluster. We refer to the tree decomposition used in [84] for an example of application of this approach.
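    As an illustration of how the choice of the deleted vertex bounds cluster sizes, the sketch below locates a centroid. It assumes the standard definition, which is not spelled out in the text above: a centroid is a vertex whose removal leaves connected components of at most n/2 vertices each, and every tree has at least one. The function name and the adjacency-dictionary representation are assumptions made for the example.

```python
def centroid(adj):
    """Return a vertex whose removal leaves components of at most n/2 vertices."""
    n = len(adj)
    root = next(iter(adj))
    parent = {root: None}
    order = [root]
    i = 0
    while i < len(order):                # BFS, so parents precede children in `order`
        v = order[i]
        i += 1
        for u in adj[v]:
            if u not in parent:
                parent[u] = v
                order.append(u)
    size = {v: 1 for v in order}         # subtree sizes, accumulated bottom-up
    for v in reversed(order):
        if parent[v] is not None:
            size[parent[v]] += size[v]
    for v in order:
        # Components left by deleting v: one per child subtree, plus the rest of the tree.
        parts = [n - size[v]] + [size[u] for u in adj[v] if parent[u] == v]
        if 2 * max(parts) <= n:
            return v

chain = {i: [j for j in (i - 1, i + 1) if 1 <= j <= 8] for i in range(1, 9)}
star = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}
print(centroid(chain), centroid(star))
```

    Deleting the returned vertex splits the tree into subtrees of at most half the original size each, which is the kind of upper bound on cluster size mentioned above.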

    It is worth remarking that most of the aforementioned algorithms are not usually employed in a hierarchical fashion; hence, whenever applied recursively, they may not be good at optimizing properties of the hierarchy tree such as balancing, depth, or degree. Moreover, the problem of building valid hierarchy trees is not considered at all: actually, it is not difficult to prove that most of the partitioning algorithms that find disconnected clusters do not guarantee the validity property (see, e.g., the algorithms in [67]).

    Our results. We present efficient algorithms for computing hierarchical treedecom