Visual Analysis of Hierarchical Data Using 2.5D Drawing with Minimum Occlusion*

Kazuya Haraguchi† Seok-Hee Hong‡ Hiroshi Nagamochi§

Abstract

In this paper, we consider 2.5D drawing of a pair of trees which are connected by some edges, representing relationships between nodes, as an attempt to develop a tool for analyzing pairwise hierarchical data. We consider two ways of drawing such a graph, called parallel and perpendicular drawings, where the graph appears as a bipartite graph viewed from two orthogonal angles X and Y. We define the occlusion of a drawing as the sum of the edge crossings that can be seen in the two angles, and propose algorithms to minimize the occlusion based on the fundamental one-sided crossing minimization problem. We also give some visualization examples of our method using phylogenetic trees and a mushroom database.

1 Introduction

Background. Much existing data in biology and taxonomy is represented as a hierarchical structure; e.g., phylogenetic trees (also called evolutionary trees) in phylogeny, file systems of computers, and so on [13, 26]. To gain insight or hidden knowledge from such data, it must be visualized effectively. In this paper, we aim to analyze data with hierarchical structure by visualizing it with graph drawing techniques.

Graph drawing has been extensively studied over the last twenty years due to its popular application for visualization in VLSI layout, computer networks, software engineering, social networks and bioinformatics. As a result, many algorithms and methods are available (see [4, 28, 30, 32]).

To draw graphs automatically and nicely, we need to define aesthetic criteria for 2D drawings and occlusion for 3D drawings in a mathematical way. In 2D graph drawing, such criteria have been well studied, and criteria such as the number of edge crossings and the number of symmetries play a key role in designing 2D drawing algorithms. As noted in much of the literature (e.g., [19]), the number of edge crossings is a significant criterion for readability. Recently, algorithms for reducing edge crossings in graph drawings have also been used in data analysis problems such as the rank aggregation problem [7].

However, in many methods for 3D drawing, edges can be drawn without creating a crossing in 3D, and it remains open to introduce a good mathematical definition for measuring the occlusion of 3D drawings of graphs.

Recently, a number of researchers have pointed out that full use of 3D layout may not be helpful. Instead, they suggested a 2.5D representation [36]. Further, a series of 2.5D graph drawing algorithms have been developed for various graph models [1, 5, 8, 12, 14, 17, 21, 22, 23, 24, 25].

The rich literature on 2.5D graph drawing methods motivates us to formally address the occlusion problem in 2.5D graph layout. Occlusion can be an important aesthetic criterion and evaluation criterion for good 3D and 2.5D graph drawing algorithms.

*Technical report 2009-010, April 6, 2009.
†Department of Information Technology and Electronics, Faculty of Science and Engineering, Ishinomaki Senshu University, Japan ([email protected])
‡School of Information Technologies, University of Sydney, Australia ([email protected])
§Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Japan ([email protected])

Figure 1: (a) parallel drawing and (b) perpendicular drawing

Figure 2: Appearance of G in parallel drawing of Figure 1

However, defining a good model and concepts for occlusion in 3D graph drawing is a rather difficult task, and measuring occlusion can be more difficult still. Preliminary research can be found in [37], a study on finding the best viewpoint of a 3D graph drawing.

In this paper, we first propose how to display a pair of hierarchical data sets as a 2.5D drawing, introducing a new mathematical measure of the occlusion of 2.5D drawings based on edge crossings in bipartite graphs, and then present algorithms for the problem of finding a 2.5D drawing that minimizes the occlusion.

Mathematical Framework. We consider two sets of hierarchical data that have some relationships to each other. Let us denote by $G = (T_A, T_B, E)$ a graph representing such a structure, where the subgraphs $T_A = (V_A, E_A)$ and $T_B = (V_B, E_B)$ denote undirected rooted trees, and $E$ denotes a set of edges, each connecting a node in $V_A$ and one in $V_B$. We consider drawing $G$ in 3D so that $T_A$ and $T_B$ are drawn in parallel planes. This type of graph drawing is called 2.5D drawing since two dimensions are used for the trees and the last one is used for a different purpose [13].

For the trees $T_A$ and $T_B$ on the planes, we consider two ways of drawing: parallel drawing and perpendicular drawing. In parallel (resp., perpendicular) drawing, the directions of the two trees from the root to the leaves are parallel (resp., perpendicular). An illustration of a parallel (resp., perpendicular) drawing is shown in Figure 1, where the meaning of the slate-gray arrows will be explained later.

Data analysts may like to inspect relationships between the two hierarchical data sets, and thus the edges in $E$ should be drawn as readably as possible. In this paper, we define the occlusion of a 2.5D drawing of $G$ based on the edge crossings of $E$.

For this, we consider viewing $G$ from two orthogonal angles, i.e., angles X and Y, as indicated by the slate-gray arrows in Figure 1. How $G$ appears in parallel (resp., perpendicular) drawing from both angles is illustrated in Figure 2 (resp., 3). For example, in parallel drawing, we see the top of $T_A$ and $T_B$ from angle Y. For each drawing and angle, the viewed parts of the trees are summarized in Table 1. In all cases, $T_A$ and $T_B$ appear as lined sequences of nodes, and thus $G$ appears as a bipartite graph.

Figure 3: Appearance of G in perpendicular drawing of Figure 1

Table 1: The viewed parts of TA and TB and the corresponding figures

                 angle X                          angle Y
parallel         (side, side), Figure 2 (left)    (top, top), Figure 2 (right)
perpendicular    (side, top), Figure 3 (left)     (top, side), Figure 3 (right)

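Then we define the occlusion $\mathrm{occ}_{\mathrm{para}}(\Delta)$ of a parallel drawing $\Delta$ (resp., $\mathrm{occ}_{\mathrm{perp}}(\Delta)$ of a perpendicular drawing $\Delta$) as:

$$
\begin{aligned}
\mathrm{occ}_{\mathrm{para}}(\Delta) &= \mathrm{cross}_{\mathrm{para},X}(\Delta) + \mathrm{cross}_{\mathrm{para},Y}(\Delta), \\
\mathrm{occ}_{\mathrm{perp}}(\Delta) &= \mathrm{cross}_{\mathrm{perp},X}(\Delta) + \mathrm{cross}_{\mathrm{perp},Y}(\Delta),
\end{aligned}
\tag{1}
$$

where $\mathrm{cross}_{\mathrm{para},X}(\Delta)$ and $\mathrm{cross}_{\mathrm{para},Y}(\Delta)$ (resp., $\mathrm{cross}_{\mathrm{perp},X}(\Delta)$ and $\mathrm{cross}_{\mathrm{perp},Y}(\Delta)$) denote the numbers of edge crossings when $G$ is viewed from angles X and Y. Since $G$ appears as a bipartite graph from each angle, we utilize conventional methods for edge crossing minimization in order to decrease cross (and thus occ).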

Composition of the Paper. In Section 2, we give a survey of and motivation for 2.5D graph drawing with some historical notes. In Section 3, we review the one-sided crossing minimization problem, which is a fundamental graph drawing problem and forms the basis of our problem. We describe algorithms to minimize the proposed occlusion (1) in Section 4 and give some visualization examples in Section 5. Finally, we give concluding remarks in Section 6.

2 2.5D Graph Drawings

Affordable high-quality 3D graphics in every PC has motivated a great deal of research in 3D graph drawing over the last ten to fifteen years. The proceedings of the annual Graph Drawing conferences document these developments. Three-dimensional graph drawings with a variety of aesthetics and edge representations have been extensively studied (see [6, 9, 11, 16, 21, 31]). Examples include algorithms for 3D orthogonal drawing with a limited number of bends, 3D straight-line grid drawing algorithms with good resolution (volume), and 3D graph drawing algorithms that maximize symmetry.

Laboratory experiments have shown that 3D graph visualizations can be up to three times more readable than 2D [35]. However, the availability of the third dimension has made little impact on the graph visualization industry; currently no major graph visualization provider uses 3D. Even though the 3D algorithms of the past 10 years are theoretically significant, none of them has been adopted by the commercial graph drawing software providers. Thus achieving good 3D visualization remains a challenging problem.

Recently, a number of researchers have pointed out that full use of 3D layout may not be helpful. Instead, they suggested a 2.5D representation. For example, Ware [36] advocates a “2½ design attitude,” using 3D depth selectively and paying special attention to 2D layout. He indicates that this may provide the best match with the limited 3D capabilities of the human visual system.

Further, a series of 2.5D graph drawing algorithms have been developed for various graph models (see [1, 5, 8, 12, 14, 17, 21, 22, 23, 24, 25]). For example, the PolyPlane methods draw trees in 2.5D using a 2D plane for each subtree [22]. Another approach uses two-and-a-half-dimensional methods to visualize (related) networks in parallel planes [1, 5, 8, 12].

A general framework which uses a divide-and-conquer approach for 2.5D graph drawing was presented in [23]. More specifically, the framework divides the graph into a set of subgraphs, and then draws each subgraph in a plane (bounded planar region) using 2D drawing algorithms. Finally, a 2.5D drawing of the whole graph is constructed by combining the 2D drawings, satisfying chosen optimization criteria.

Specific algorithms are designed as instantiations of the framework: see [1] for 2.5D visualization methods for scale-free networks, [24, 25] for directed (or hierarchical) graphs, [20] for clustered graphs, [17] for temporal email networks, and [14] for the visual comparison of network centralities. These methods are implemented in GEOMI, a visual analysis tool for large and complex networks [2].

As to 2.5D drawing of a pair of trees, Dwyer and Schreiber [13] proposed an approach to minimize the edge crossings between two undirected rooted trees. Developed from observations on phylogenetic trees, their method is limited to binary trees, to angle X in parallel drawing, and to the case where the edges in $E$ connect only leaves, whereas our algorithms have no such restrictions.

3 One-sided Crossing Minimization Problem

The one-sided crossing minimization problem (1CM) on a bipartite graph appears in many situations in graph drawing, and thus is one of the most fundamental problems. Let us denote by $(V, W, L)$ a bipartite graph, where $V$ and $W$ denote sets of nodes and $L$ denotes a set of edges between nodes in $V$ and $W$. An ordering $\Pi$ of the nodes in $V$ is defined as a sequence of the nodes: $\Pi = (v_1, \ldots, v_{|V|})$, where $v_k \in V$, $k = 1, \ldots, |V|$. We denote by $\Pi(v)$ the index of $v \in V$ in the ordering $\Pi$.

For two edges $e = (v, w), e' = (u, r) \in L$, where $v, u \in V$ and $w, r \in W$, let us define a 0-1 variable $x_{e,e'}$ as follows:

$$
x_{e,e'} =
\begin{cases}
1 & \text{if } \Pi(v) < \Pi(u) \text{ and } \Pi'(w) > \Pi'(r), \\
0 & \text{otherwise.}
\end{cases}
$$

Let us denote by $\chi(V, W, L, \Pi, \Pi')$ the number of edge crossings in the bipartite graph $(V, W, L)$ with respect to orderings $\Pi$ of $V$ and $\Pi'$ of $W$, which is computed by:

$$
\chi(V, W, L, \Pi, \Pi') = \sum_{e, e' \in L} \left( x_{e,e'} + x_{e',e} \right). \tag{2}
$$

1CM is then described as follows.

Problem 1CM$(V, W, L, \Pi)$

Input: A bipartite graph $(V, W, L)$ and an ordering $\Pi$ of the nodes in $V$.

Output: An ordering $\Pi'$ of the nodes in $W$ that minimizes $\chi(V, W, L, \Pi, \Pi')$ in (2).
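The crossing count that 1CM minimizes can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `count_crossings` and the dictionary-based orderings are assumptions, and each unordered pair of edges is counted once, so a crossing pair contributes exactly one to the total.

```python
from itertools import combinations


def count_crossings(edges, pos_v, pos_w):
    """Count edge crossings of a two-layer drawing.

    edges: list of (v, w) pairs; a multiedge is a repeated pair
    pos_v, pos_w: dicts mapping each node to its index in its ordering
    """
    crossings = 0
    for (v, w), (u, r) in combinations(edges, 2):
        # Two edges cross when their endpoints appear in opposite relative
        # order on the two sides. A shared endpoint gives a zero product,
        # so edges meeting at a node are not counted as crossing.
        if (pos_v[v] - pos_v[u]) * (pos_w[w] - pos_w[r]) < 0:
            crossings += 1
    return crossings


# Toy instance (hypothetical data, not from the paper)
edges = [("v1", "w2"), ("v2", "w1"), ("v2", "w3"), ("v3", "w2")]
pos_v = {"v1": 1, "v2": 2, "v3": 3}
pos_w = {"w1": 1, "w2": 2, "w3": 3}
print(count_crossings(edges, pos_v, pos_w))
```

The quadratic loop over edge pairs mirrors the definition directly; faster counting schemes exist but are not needed to follow the text.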


Unfortunately, 1CM is known to be NP-hard [15], and various heuristic approaches have been proposed so far. For a node $w$, we denote by $N(w)$ the set of its neighbors. The barycenter $\beta_\Pi(w)$ of a node $w \in W$ with respect to an ordering $\Pi$ of $V$ is then defined as:

$$
\beta_\Pi(w) =
\begin{cases}
\dfrac{1}{|N(w)|} \displaystyle\sum_{v \in N(w)} \Pi(v) & \text{if } N(w) \neq \emptyset, \\
0 & \text{otherwise.}
\end{cases}
$$

The barycenter heuristic orders the nodes in $W$ in increasing order of barycenter, and is an $O(n^{0.5})$-approximation algorithm [33]. Other proposed approaches include the median heuristic (which is a 3-approximation algorithm) [15], the stochastic heuristic [10], and so on. The current best approximation bound is 1.4664 [29]. Among these heuristics, the barycenter heuristic is reported to perform best in an empirical study [27].
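As a minimal sketch of the barycenter heuristic described above (the function name `barycenter_order` and the data layout are assumptions made for illustration), the nodes of $W$ are simply sorted by the mean index of their neighbors, with isolated nodes receiving barycenter 0 as in the definition:

```python
def barycenter_order(w_nodes, edges, pos_v):
    """Order w_nodes by increasing barycenter with respect to pos_v.

    w_nodes: current ordering of W (a list)
    edges: list of (v, w) pairs
    pos_v: dict mapping each node of V to its index in the fixed ordering
    """
    neighbors = {w: [] for w in w_nodes}
    for v, w in edges:
        neighbors[w].append(v)

    def barycenter(w):
        ns = neighbors[w]
        # Isolated nodes get barycenter 0, following the definition
        return sum(pos_v[v] for v in ns) / len(ns) if ns else 0.0

    # sorted() is stable, so nodes with equal barycenters keep
    # their current relative order
    return sorted(w_nodes, key=barycenter)


# Toy instance (hypothetical data)
pos_v = {"v1": 1, "v2": 2, "v3": 3}
edges = [("v3", "w1"), ("v1", "w2"), ("v2", "w2"), ("v2", "w3"), ("v3", "w3")]
new_order = barycenter_order(["w1", "w2", "w3"], edges, pos_v)
print(new_order)
```

Here the barycenters are 3 for w1, 1.5 for w2 and 2.5 for w3. How ties are broken is not specified in the text; keeping the current order is one simple choice.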

The two-sided crossing minimization problem (2CM) is a natural extension of 1CM, in which one must compute orderings of both $V$ and $W$ that minimize edge crossings. 2CM is also an NP-hard problem [18], and the most common approach is to apply an exact or a heuristic 1CM algorithm iteratively until a local optimum is obtained (i.e., we first fix the ordering of $V$ and apply a 1CM algorithm to determine the ordering of $W$, then apply the algorithm again to determine the ordering of $V$ with the determined ordering of $W$ fixed, and so on). It is reported that the iteration using the barycenter heuristic performs best among various heuristics for 1CM [27].
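The alternating scheme can be sketched as follows. This is a hedged illustration rather than a reference implementation: the helper names are hypothetical, the barycenter heuristic stands in for the 1CM solver, and the stopping rule (stop as soon as a full round fails to reduce the crossing count, with a cap on the number of rounds) is an assumption added here to guarantee termination.

```python
from itertools import combinations


def crossings(edges, order_v, order_w):
    """Number of crossing edge pairs for the given orderings."""
    pos_v = {v: i for i, v in enumerate(order_v)}
    pos_w = {w: i for i, w in enumerate(order_w)}
    return sum(
        1
        for (v, w), (u, r) in combinations(edges, 2)
        if (pos_v[v] - pos_v[u]) * (pos_w[w] - pos_w[r]) < 0
    )


def barycenter_sweep(free, fixed, edges, free_is_w):
    """Reorder the `free` side by barycenter while `fixed` stays put."""
    pos = {x: i for i, x in enumerate(fixed)}
    nbrs = {x: [] for x in free}
    for v, w in edges:
        if free_is_w:
            nbrs[w].append(pos[v])
        else:
            nbrs[v].append(pos[w])
    return sorted(free, key=lambda x: sum(nbrs[x]) / len(nbrs[x]) if nbrs[x] else 0.0)


def two_sided(order_v, order_w, edges, max_rounds=100):
    """Alternate 1CM sweeps on W and V until no round improves the count."""
    best = crossings(edges, order_v, order_w)
    for _ in range(max_rounds):
        new_w = barycenter_sweep(order_w, order_v, edges, free_is_w=True)
        new_v = barycenter_sweep(order_v, new_w, edges, free_is_w=False)
        cost = crossings(edges, new_v, new_w)
        if cost >= best:  # no improvement: treat as a local optimum
            break
        order_v, order_w, best = new_v, new_w, cost
    return order_v, order_w, best


# Toy instance (hypothetical data): three mutually crossing edges
result = two_sided(["a", "b", "c"], ["x", "y", "z"],
                   [("a", "z"), ("b", "y"), ("c", "x")])
print(result)
```

Accepting a round only when it strictly reduces the crossing count keeps the best orderings seen so far, at the cost of possibly stopping earlier than a scheme that tolerates plateaus.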

4 Algorithms

Again let us take a graph $G = (T_A, T_B, E)$ consisting of subgraphs $T_A = (V_A, E_A)$ and $T_B = (V_B, E_B)$ and a set $E$ of edges, where each edge in $E$ connects a node in $V_A$ and one in $V_B$, and $E$ is allowed to contain multiedges.

For each way of drawing $G$ (i.e., parallel and perpendicular drawings), since edge crossings in angles X and Y do not affect each other, they can be computed independently. In this section, after introducing notation and terminology, we describe the procedure to minimize edge crossings for each combination of a drawing way and a viewing angle.

4.1 Preliminary

Let us denote by $r_A \in V_A$ (resp., $r_B \in V_B$) the root of the tree $T_A$ (resp., $T_B$). For a node $v \in V_A \cup V_B$, we denote by $D(v)$ (resp., $C(v)$) the set of its descendants (resp., its children) in the tree to which it belongs. (If $v$ is a leaf, then we define $D(v) = C(v) = \emptyset$.) We denote $D^{+}(v) = D(v) \cup \{v\}$ and $C^{+}(v) = C(v) \cup \{v\}$. We define an edge set $E_v = \{(v, w) \in E\}$.

For each $v$, we consider two subgraphs of the tree to which it belongs. One is the subtree rooted at $v$, and the other is the parent-children tree (pc-tree for short) rooted at $v$. The pc-tree is the subgraph induced by $v$ and its children; i.e., it consists of the node set $C^{+}(v)$ and the edge set $\{(v, u) : u \in C(v)\}$. Then we define two edge sets $E_v^{\mathrm{sub}}$ and $E_v^{\mathrm{pc}}$ as follows:

$$
\begin{aligned}
E_v^{\mathrm{sub}} &= \{(u, w) \in E : u \in D^{+}(v)\}, \\
E_v^{\mathrm{pc}} &= E_v \cup \{e_q = (u, w) : u \in C(v),\ q \in D^{+}(u),\ (q, w) \in E\}.
\end{aligned}
$$
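To make the two edge sets concrete, the following sketch computes them for a small tree. The representation is an assumption chosen for illustration: the tree is a dictionary from each inner node to its list of children, and $E$ is a list of (node of this tree, node of the other tree) pairs. The helper names are hypothetical.

```python
def descendants_plus(children, v):
    """Return v together with all of its descendants."""
    out, stack = [], [v]
    while stack:
        x = stack.pop()
        out.append(x)
        stack.extend(children.get(x, []))
    return out


def e_sub(children, edges, v):
    """Edges of E that leave the subtree rooted at v."""
    inside = set(descendants_plus(children, v))
    return [(u, w) for (u, w) in edges if u in inside]


def e_pc(children, edges, v):
    """Edges of v itself, plus every edge leaving the subtree of a child u,
    re-attached to u. Duplicates are kept, so multiedges can arise."""
    result = [(u, w) for (u, w) in edges if u == v]
    for u in children.get(v, []):
        inside = set(descendants_plus(children, u))
        result += [(u, w) for (q, w) in edges if q in inside]
    return result


# Toy instance (hypothetical data)
children = {"a1": ["a2", "a3"], "a2": ["a4", "a5"]}
edges = [("a1", "b1"), ("a4", "b2"), ("a5", "b1"), ("a3", "b3"), ("a2", "b2")]
print(e_sub(children, edges, "a2"))
print(e_pc(children, edges, "a1"))
```

In this toy instance the edges (a4, b2) and (a2, b2) both collapse onto (a2, b2) in the pc-tree of a1, which shows how the pc-tree edge set can contain multiedges even when $E$ does not.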

We note that both edge sets may contain multiedges.

We denote by $\Pi_v$ (resp., $\pi_v$) an ordering of the nodes in $D^{+}(v)$ (resp., $C^{+}(v)$). Let us call $\Pi_v$ (resp., $\pi_v$) the family ordering (resp., pc-ordering) of $v$. If $v = r_A$ or $r_B$, then we denote $\Pi_A = \Pi_{r_A}$ or $\Pi_B = \Pi_{r_B}$ for convenience, and in this case, we call them the family ordering of the tree $T_A$ or $T_B$.

Once pc-orderings are fixed for all inner nodes in the tree $T = T_A$ or $T_B$, the family ordering $\Pi = \Pi_A$ or $\Pi_B$ can be determined uniquely by traversing the tree from its root $r = r_A$ or $r_B$ and following the pc-orderings, i.e., by denoting $\pi_r = (v_1, \ldots, v_{k-1}, r, v_{k+1}, \ldots, v_{|C^{+}(r)|})$ (and thus $\pi_r(r) = k$), it becomes:

$$
\Pi = (\Pi_{v_1}, \ldots, \Pi_{v_{k-1}}, r, \Pi_{v_{k+1}}, \ldots, \Pi_{v_{|C^{+}(r)|}}),
$$

where $\Pi_{v_1}, \ldots, \Pi_{v_{k-1}}, \Pi_{v_{k+1}}, \ldots, \Pi_{v_{|C^{+}(r)|}}$ are determined recursively.

Figure 4: (a) edge crossings within subtrees rooted at the children of $a_1$ (circle) (b) edge crossings between subtrees (triangle) and those between $E_{a_1}$ and subtrees (square)

Let us take $T_A = (V_A, E_A)$ in Figure 1 for example, where $V_A = \{a_1, \ldots, a_8\}$ and $a_1$ is the root. Then $C(a_1) = \{a_2, a_3\}$, $C(a_2) = \{a_4, a_5\}$, and so on. As in the figure (i.e., from left to right), the pc-orderings $\pi_{a_1}$ and $\pi_{a_2}$ may be taken as:

$$
\pi_{a_1} = (a_2, a_1, a_3), \qquad \pi_{a_2} = (a_4, a_5, a_2).
$$

Then the family ordering $\Pi_A$ is determined as:

$$
\Pi_A = (\Pi_{a_2}, a_1, \Pi_{a_3}) = (a_4, a_5, a_2, a_1, a_6, a_3, a_7, a_8).
$$
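The recursive expansion of pc-orderings into a family ordering can be sketched as below. The function name and the dictionary representation are assumptions for illustration. The pc-orderings of $a_1$ and $a_2$ are taken from the example above; the pc-ordering of $a_3$ is not stated in the text, so the value used here is an assumption chosen only to be consistent with the resulting $\Pi_A$.

```python
def family_ordering(pc_order, v):
    """Expand the pc-ordering of v recursively into its family ordering.

    pc_order maps each inner node to its pc-ordering, i.e. a list that
    contains the node itself and its children in left-to-right order.
    """
    if v not in pc_order:          # a leaf: its family ordering is itself
        return [v]
    result = []
    for u in pc_order[v]:
        if u == v:
            result.append(v)       # the node keeps its own slot
        else:
            result.extend(family_ordering(pc_order, u))
    return result


pc_order = {
    "a1": ["a2", "a1", "a3"],
    "a2": ["a4", "a5", "a2"],
    "a3": ["a6", "a3", "a7", "a8"],   # assumed; not given in the text
}
ordering = family_ordering(pc_order, "a1")
print(ordering)
# ['a4', 'a5', 'a2', 'a1', 'a6', 'a3', 'a7', 'a8']
```

Because the family ordering is derived entirely from the pc-orderings, changing one pc-ordering only permutes whole blocks of the family ordering, which is what makes the later decomposition into per-node subproblems possible.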

Family ordering and pc-ordering described above have nice properties as follows:

Property 1 It is easy to draw a tree as a plane graph such that, when it is viewed from the top, the nodes appear to be aligned in a line in the order of the family ordering of the tree. (See $T_A$ or $T_B$ in Figure 1 for example.)

Property 2 Property 1 is preserved under any change of pc-orderings in the tree, which leads to a change of the family ordering of the tree.

In the rest of this paper, we draw a tree by utilizing its family ordering and pc-orderings.

4.2 Parallel Drawing

Angle X (left of Figure 2). Let us denote by $d_A$ (resp., $d_B$) the height of the tree $T_A$ (resp., $T_B$). For $d = 0, 1, \ldots, d_A$ (resp., $d_B$), let us denote by $p_A^d$ (resp., $p_B^d$) the node generated by contracting the nodes in $V_A$ (resp., $V_B$) at depth $d$. Let us denote $P_A = (p_A^0, p_A^1, \ldots, p_A^{d_A})$ and $P_B = (p_B^0, p_B^1, \ldots, p_B^{d_B})$.

When a tree is viewed from the side, the nodes appear as if those at the same depth were contracted, i.e., as a lined sequence $(p_A^0, p_A^1, \ldots, p_A^{d_A})$ or $(p_B^0, p_B^1, \ldots, p_B^{d_B})$.

Since our assumption on drawing trees does not allow changing the orderings of $P_A$ and $P_B$, $\mathrm{cross}_{\mathrm{para},Y}$ becomes a constant. Then minimization of $\mathrm{occ}_{\mathrm{para}}$ is reduced to that of $\mathrm{cross}_{\mathrm{para},X}$, as shown next.

Angle Y (right of Figure 2). We consider minimizing the edge crossings $\mathrm{cross}_{\mathrm{para},X}$ in the bipartite graph that appears by changing the family orderings $\Pi_A$ and $\Pi_B$. This problem becomes a 2CM, and we utilize the conventional approach of iterative application of a 1CM algorithm, where we determine the family orderings by manipulating pc-orderings.

Then how do we manipulate pc-orderings? Suppose we determine the family ordering $\Pi_A$ for a fixed $\Pi_B$. What we would like to minimize is the number of edge crossings $\chi(V_A, V_B, E, \Pi_A, \Pi_B)$, which is decomposed as follows:

$$
\chi(V_A, V_B, E, \Pi_A, \Pi_B) = \sum_{v \in C(r_A)} \chi(D^{+}(v), V_B, E_v^{\mathrm{sub}}, \Pi_v, \Pi_B) + \chi(C^{+}(r_A), V_B, E_{r_A}^{\mathrm{pc}}, \pi_{r_A}, \Pi_B). \tag{3}
$$

The first term on the right-hand side of (3) represents the sum of edge crossings within the subtrees rooted at the children of $r_A$, and the second term represents the sum of edge crossings between subtrees and those between $E_{r_A}$ and subtrees. (Figure 4 illustrates these statements.) Since the two terms are determined independently, if the family ordering $\Pi_A$ attains an optimum, then the second term needs to be minimized. The first term is decomposed recursively, and thus if $\Pi_A$ is an optimum family ordering, then all pc-orderings also need to attain optimums. Since the converse is clearly true, the following proposition holds.

Proposition 1 The family ordering $\Pi_A$ attains an optimum if and only if the pc-ordering $\pi_v$ of each inner node $v \in V_A$ minimizes $\chi(C^{+}(v), V_B, E_v^{\mathrm{pc}}, \pi_v, \Pi_B)$.

Thus all we need to do is to optimize the pc-orderings for all inner nodes in $V_A$. Our iterative procedure to minimize $\mathrm{cross}_{\mathrm{para},X}$ is described as follows.

Procedure PARALLEL-X

Input: A graph $G = (T_A, T_B, E)$ consisting of trees $T_A = (V_A, E_A)$ and $T_B = (V_B, E_B)$ and a set $E$ of edges such that $E \subseteq \{(v, w) : v \in V_A, w \in V_B\}$, and pc-orderings of all inner nodes in $V_A \cup V_B$.

Output: Family orderings $\Pi_A$ and $\Pi_B$.

Step 1: For each inner node $v \in V_A$, determine the pc-ordering $\pi_v$ by solving 1CM$(V_B, C^{+}(v), E_v^{\mathrm{pc}}, \Pi_B)$.

Step 2: For each inner node $w \in V_B$, determine the pc-ordering $\pi_w$ by solving 1CM$(V_A, C^{+}(w), E_w^{\mathrm{pc}}, \Pi_A)$.

Step 3: If no pc-ordering has changed since the last iteration, then output $\Pi_A$ and $\Pi_B$ and halt. Otherwise, return to Step 1.

In the visualization examples in the next section, we use the barycenter heuristic as the solver of 1CM.
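The three steps of the procedure can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: trees are given only through their pc-orderings (a dictionary from each inner node to a list containing the node and its children), the barycenter heuristic is used as the 1CM solver, ties keep their current order, and the `max_rounds` cap is an added safeguard since the text does not discuss termination. All function names are hypothetical.

```python
def family_ordering(pc, v):
    """Expand pc-orderings recursively into the family ordering of v."""
    if v not in pc:                      # leaf
        return [v]
    out = []
    for u in pc[v]:
        out += [v] if u == v else family_ordering(pc, u)
    return out


def pc_edges(pc, v, edges):
    """Edges of v itself plus, for each child u, every edge leaving the
    subtree of u, re-attached to u. `edges` holds (own side, other side)."""
    out = []
    for u in pc[v]:
        members = {v} if u == v else set(family_ordering(pc, u))
        out += [(u, w) for (q, w) in edges if q in members]
    return out


def barycenter_pc(pc, v, edges, other_pos):
    """Reorder the pc-tree of v by barycenter w.r.t. the other tree."""
    nbrs = {u: [] for u in pc[v]}
    for u, w in pc_edges(pc, v, edges):
        nbrs[u].append(other_pos[w])
    return sorted(pc[v], key=lambda u: sum(nbrs[u]) / len(nbrs[u]) if nbrs[u] else 0.0)


def sweep(pc_own, pc_other, root_other, edges):
    """Step 1 or Step 2: update every pc-ordering on one side."""
    pos = {w: i + 1 for i, w in enumerate(family_ordering(pc_other, root_other))}
    changed = False
    for v in list(pc_own):
        new = barycenter_pc(pc_own, v, edges, pos)
        changed = changed or new != pc_own[v]
        pc_own[v] = new
    return changed


def parallel_x(pc_a, root_a, pc_b, root_b, edges, max_rounds=50):
    """Iterate Steps 1 and 2 until no pc-ordering changes (Step 3)."""
    edges_ba = [(w, v) for (v, w) in edges]
    for _ in range(max_rounds):          # cap is an added safeguard
        changed_a = sweep(pc_a, pc_b, root_b, edges)       # Step 1
        changed_b = sweep(pc_b, pc_a, root_a, edges_ba)    # Step 2
        if not (changed_a or changed_b):                   # Step 3
            break
    return family_ordering(pc_a, root_a), family_ordering(pc_b, root_b)


# Toy instance (hypothetical data): two small trees with crossing edges
pc_a = {"a1": ["a2", "a1", "a3"]}
pc_b = {"b1": ["b2", "b1", "b3"]}
order_a, order_b = parallel_x(pc_a, "a1", pc_b, "b1",
                              [("a2", "b3"), ("a3", "b2")])
print(order_a, order_b)
```

Note that nodes without edges receive barycenter 0 and therefore drift to the front of their pc-ordering, as happens to the two roots in this toy instance; an implementation may prefer a different rule for such nodes.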

4.3 Perpendicular Drawing

Angle X (left of Figure 3). Since the tree $T_A$ is viewed from the side and $T_B$ is viewed from the top, we determine only the family ordering of $T_B$ by manipulating pc-orderings. For a node $w \in V_B$, let us define an edge set $E_w^{\mathrm{depth}}$ as follows:

$$
E_w^{\mathrm{depth}} = \bigcup_{d=0}^{d_A} \{(p_A^d, u) : u \in C^{+}(w),\ (v, u) \in E,\ \delta(v) = d\},
$$

where $\delta(v)$ denotes the depth of a node $v \in V_A$ in the tree $T_A$. Then the number of edge crossings $\mathrm{cross}_{\mathrm{perp},X}$ in the bipartite graph that appears is minimized by determining the pc-ordering of each inner node $w \in V_B$ by solving 1CM$(P_A, C^{+}(w), E_w^{\mathrm{depth}}, (p_A^0, p_A^1, \ldots, p_A^{d_A}))$.

Angle Y (right of Figure 3). By exchanging $T_A$ and $T_B$, the discussion on angle X holds true in this case. Hence $\mathrm{cross}_{\mathrm{perp},Y}$ is minimized by solving 1CM$(P_B, C^{+}(v), E_v^{\mathrm{depth}}, (p_B^0, p_B^1, \ldots, p_B^{d_B}))$ for each inner node $v \in V_A$.
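A small sketch may help to see what the depth-based edge set looks like in practice. It follows the formula as written, so only edges whose endpoint in $T_B$ lies in the pc-tree of $w$ are used. The representation (a children dictionary for $T_A$, integer depths standing in for the contracted nodes $p_A^d$) and the function names are assumptions, and the barycenter heuristic is again used as a stand-in 1CM solver.

```python
def depths(children, root):
    """Depth of every node in the tree, with the root at depth 0."""
    depth, stack = {root: 0}, [root]
    while stack:
        x = stack.pop()
        for c in children.get(x, []):
            depth[c] = depth[x] + 1
            stack.append(c)
    return depth


def e_depth(children_a, root_a, pc_nodes_w, edges):
    """Replace each edge (v, u), u in the pc-tree of w, by (depth of v, u).

    The integer d plays the role of the contracted node at depth d.
    """
    depth = depths(children_a, root_a)
    members = set(pc_nodes_w)
    return [(depth[v], u) for (v, u) in edges if u in members]


def order_by_depth(pc_nodes_w, depth_edges):
    """Barycenter ordering of the pc-tree nodes against the depth sequence."""
    nbrs = {u: [] for u in pc_nodes_w}
    for d, u in depth_edges:
        nbrs[u].append(d)
    return sorted(pc_nodes_w, key=lambda u: sum(nbrs[u]) / len(nbrs[u]) if nbrs[u] else 0.0)


# Toy instance (hypothetical data)
children_a = {"a1": ["a2", "a3"], "a2": ["a4"]}
edges = [("a4", "b2"), ("a1", "b3"), ("a2", "b1"), ("a3", "b4")]
pc_nodes = ["b2", "b1", "b3"]            # pc-tree of b1 with children b2, b3
depth_edges = e_depth(children_a, "a1", pc_nodes, edges)
print(depth_edges)
print(order_by_depth(pc_nodes, depth_edges))
```

Because the side-view sequence of depths is fixed, each pc-ordering is obtained from a single 1CM instance and no alternation between the two trees is needed for this angle.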


Figure 5: Pairwise phylogenetic trees of proteins from InfoVis 2003 Contest (parallel drawing)

5 Visualization Examples

The visualization algorithms described above can be applied to data analysis for various purposes. In this section, we show two application examples. The first one is about phylogenetic trees [26], and the second one is about the Mushroom Database from the UCI Machine Learning Repository [3]. To draw pairwise trees on a computer, we wrote the source code in the Python programming language and used the VPython package, which enables us to draw 3D graphics and to view the drawn objects from any angle.

5.1 Phylogenetic Trees

Summary of Data. A phylogenetic tree (a.k.a. evolutionary tree) represents an evolutionary process of organization. In a phylogenetic tree, an inner node represents an evolutionary branch and a leaf represents an existing species. Here we use the pairwise phylogenetic trees of two proteins (named ABC and IM) belonging to the same family, which was a topic of the InfoVis 2003 Contest [26]. Both trees are binary trees and have about 60 leaves. Two leaves from different trees are connected by an edge if the corresponding species have a similarity, as decided by specialists.

Observation. Figure 5 shows the pairwise phylogenetic trees, which are drawn by parallel drawing and whose nodes are ordered by our algorithm. We see that some pairs of subtrees (e.g., those indicated by (a) and (b)) have fewer edge crossings compared with other parts of the graph. Such a pair of subtrees may mean that the corresponding species of the two proteins have evolved in a similar way. Biologists are interested in seeing such structure, and our algorithm can help to find it.

Also, the depth of a node has the following meaning: a long link leading to a split (i.e., an inner node) indicates that a large number of evolutionary changes occurred before the split and therefore that there is a high level of confidence that the grouping is valid. Figure 6 shows enlargements of parts (a) and (b) of Figure 5. We see that all leaves of (a) are deep, and we may be able to draw a confident conclusion that these species from the two proteins have evolved similarly. On the other hand, connected leaves of (b) have different depths and are at a lower level. Hence we may not be confident in the assertion that they have evolved in a similar way, although the subtrees appear to be alike in Figure 5.

Figure 6: Enlargements of parts (a) and (b) of Figure 5

Figure 7: Pairwise phylogenetic trees of proteins from InfoVis 2003 Contest (perpendicular drawing)

Table 2: Attributes, values, and their meanings

attribute      value (meaning)
cap-shape      b (bell), c (conical), x (convex), f (flat), k (knobbed), s (sunken)
cap-surface    f (fibrous), g (grooves), y (scaly), s (smooth)
cap-color      n (brown), b (buff), c (cinnamon), g (gray), r (green), p (pink), u (purple), e (red), w (white), y (yellow)
odor           a (almond), l (anise), c (creosote), y (fishy), f (foul), m (musty), n (none), p (pungent), s (spicy)
population     a (abundant), c (clustered), n (numerous), s (scattered), v (several), y (solitary)
habitat        g (grasses), l (leaves), m (meadows), p (paths), u (urban), w (waste), d (woods)

We also show the phylogenetic trees drawn by perpendicular drawing in Figure 7. Unfortunately, biologists were not very pleased with this figure¹, and we need to seek an application area for this way of drawing.

5.2 Mushroom Database

Summary of Data. The Mushroom Database is given in tabular form, i.e., rows and columns correspond to examples and attributes, respectively. The database consists of 8124 examples. Each example corresponds to a mushroom, where 4208 mushrooms (51.8%) belong to the class “edible” and the remaining 3916 (48.2%) to the class “poisonous.” An example has values for 22 categorical attributes; e.g., cap-color (i.e., color of the cap), population, odor, and so on.

Visualization Settings. We inspect the relationships between attribute values of mushrooms. For this, we select two disjoint sets of interesting attributes to construct a pair of trees. In this paper, we take the attributes related to the cap (i.e., cap-shape, cap-surface, cap-color) as one set and the attributes odor, population, and habitat as the other set. In Table 2, we show the possible values of these attributes and their meanings.

For each set of attributes, we construct a tree of height 2, where each node at depth 1 corresponds to one attribute and each of its children at depth 2 corresponds to an attribute value. For each mushroom and each pair of attributes in different trees, we add an edge between the two nodes of the attribute values taken by the mushroom; the edges added in this way form the set $E$. There may then exist multiedges connecting the same pair of nodes, and we visualize the multiplicity of a multiedge by thickness. Thus thicker edges are of particular interest since they represent major combinations of attribute values.
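The edge construction just described can be sketched as follows. The attribute names come from Table 2, but the function name, the record format (one dictionary per mushroom) and the two toy records are assumptions made purely for illustration; they are not rows of the actual database.

```python
from collections import Counter
from itertools import product

CAP_ATTRS = ["cap-shape", "cap-surface", "cap-color"]      # first tree
OTHER_ATTRS = ["odor", "population", "habitat"]            # second tree


def build_multiedges(records):
    """Count, for every pair of attribute values taken from different
    trees, how many records take both values. The count is the
    multiplicity of the multiedge, which can be drawn as thickness."""
    weights = Counter()
    for rec in records:
        for a, b in product(CAP_ATTRS, OTHER_ATTRS):
            weights[((a, rec[a]), (b, rec[b]))] += 1
    return weights


# Two made-up records, for illustration only
records = [
    {"cap-shape": "x", "cap-surface": "s", "cap-color": "n",
     "odor": "f", "population": "v", "habitat": "d"},
    {"cap-shape": "x", "cap-surface": "y", "cap-color": "n",
     "odor": "f", "population": "s", "habitat": "d"},
]
weights = build_multiedges(records)
print(weights[(("cap-shape", "x"), ("odor", "f"))])
```

Each record contributes 3 × 3 = 9 edges, one per pair of attributes from different trees, so the thickest edges correspond to the most frequent pairs of attribute values.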

Observation. First we construct the graph only from poisonous mushrooms. We show the graph in Figures 8 and 9, where Figure 8 gives an overview of the whole graph and Figure 9 is its enlargement.

Many things can be understood from the figures at a glance. For example, we see the distribution of attribute values (e.g., “foul” is the most frequent value of the attribute odor), major combinations of two attribute values (e.g., “convex” of cap-shape and “several” of population), which may be important to mushroom researchers, and so on. Major combinations of attribute values are called frequent sets in the literature, and many fast algorithms have been developed for enumerating such frequent sets [34]. Our method visualizes such frequent sets (of size at most two) effectively.

¹We learned this through personal communication with F. Schreiber, an author of papers such as [8, 13, 14].

Next we construct the graph from both edible and poisonous mushrooms. We show the graph in Figure 10, where the distribution of classes is represented by edge color, i.e., white (resp., green) represents poisonous (resp., edible) mushrooms. By taking the class into account, we observe more interesting information such as:

• Most of the mushrooms taking “foul” (resp., “none”) for the attribute odor are poisonous (resp., edible).

• Many mushrooms take “several” for the attribute population, but we cannot tell with high accuracy whether they are poisonous or edible (since the edge colors are impure).

The viewing angle should be chosen according to the purpose of observation. We show the graph viewed from the opposite side in Figure 11. From this angle, it is easier to observe information about the attributes related to the cap.

In this way, our visualization method provides us with interesting information. Furthermore, as shown, even when tree structures are not given explicitly, the method is effective once a pair of trees is constructed and the relationships between them are defined in some proper way.

5.3 Other Applications

Our method may also work on PoS (point-of-sale) data collected at places such as supermarkets. In this case, two trees may be constructed as follows. One tree is for products. Distributed products usually have their own categories (e.g., food, stationery, and plants may be big categories, and fish, meat, vegetables, and fruit may be subcategories of food), and we use this hierarchical information as a tree. The other tree is for customers. The nodes at depth 1 may represent gender, those at depth 2 may represent age, and so on. For each sales record, we add an edge between the associated product and customer. The number of sales can be represented by thickness, and the time at which a sale is made may be represented by color.

6 Conclusion

In this paper, we aimed to analyze data with hierarchical structure by visualizing it with graph drawing techniques. Defining (or even measuring) the occlusion of a 3D drawing is usually a difficult task. We therefore considered 2.5D drawing of a pair of trees having some relationships to each other as such hierarchical data, and defined the occlusion based on edge crossings in bipartite graphs for two ways of drawing (i.e., parallel and perpendicular drawings). After a survey on 2.5D drawing in Section 2 and a review of 1CM in Section 3, we proposed algorithms to obtain a 2.5D drawing that minimizes the proposed occlusion in Section 4 and gave visualization examples in Section 5, which show the usefulness of our strategy of visualizing pairwise hierarchical data.

Our future work includes the development of algorithms for alternative ways of drawing trees (i.e., besides parallel and perpendicular drawings) and for more than two trees. An alternative definition of the occlusion should also be discussed.

References

[1] A. Ahmed, T. Dwyer, S. Hong, C. Murray, L. Song and Y. Wu, “Visualisation and Analysis of Large and Complex Scale-free Networks,” In Proceedings of EuroVis, Leeds, UK, pp. 239-246, 2005.

[2] A. Ahmed, T. Dwyer, M. Forster, X. Fu, J. Ho, S. Hong, D. Koschutzki, C. Murray, N. Nikolov, A. Tarassov, R. Taib and K. Xu, “GEOMI: GEometry for Maximum Insight,” In Proceedings of Graph Drawing (GD 2005), Limerick, Ireland, pp. 468-479, 2006.

[3] A. Asuncion and D. J. Newman, UCI Machine Learning Repository [http://www.ics.uci.edu/mlearn/MLRepository.html], Irvine, CA: University of California, Department of Information and Computer Science, 2007.

[4] G. Di Battista, P. Eades, R. Tamassia and I. G. Tollis, Graph Drawing: Algorithms for the Visualization of Graphs, Prentice Hall, 1999.

[5] M. Baur, U. Brandes, M. Gaertler and D. Wagner, “Drawing the AS Graph in 2.5 Dimensions,” In Proceedings of Graph Drawing 2004 (GD 2004), New York, USA, pp. 43-48, 2004.

[6] T. Biedl, T. Thiele and D. R. Wood, “Three-Dimensional Orthogonal Graph Drawing with Optimal Volume,” In Proceedings of Graph Drawing 2000 (GD 2000), Virginia, USA, pp. 284-295, 2001.

[7] T. Biedl, F. J. Brandenburg, and X. Deng, “Crossing and Permutations,” In Proceedings of Graph Drawing 2005 (GD 2005), Limerick, Ireland, pp. 1-12, 2006.

[8] U. Brandes, T. Dwyer and F. Schreiber, “Visualizing Related Metabolic Pathways in Two and a Half Dimensions,” In Proceedings of Graph Drawing 2003 (GD 2003), Perugia, Italy, pp. 111-122, 2003.

[9] R. Cohen, P. Eades, T. Lin and F. Ruskey, “Three Dimensional Graph Drawing,” Algorithmica, Vol. 17, No. 2, pp. 199-208, 1996.

[10] S. Dresbach, “A New Heuristic Layout Algorithm for DAGS,” Operations Research Proceedings, Springer Verlag, Berlin, pp. 121-126, 1994.

[11] V. Dujmovic, P. Morin and D. R. Wood, “Path-Width and Three-Dimensional Straight-Line Grid Drawing of Graphs,” In Proceedings of Graph Drawing 2002 (GD 2002), California, USA, pp. 42-53, 2002.

[12] T. Dwyer, Two-and-a-half Dimensional Visualisation of Relational Networks, PhD Thesis, The University of Sydney, 2004.

[13] T. Dwyer and F. Schreiber, “Optimal Leaf Ordering for Two and a Half Dimensional Phylogenetic Tree Visualisation,” In Proceedings of Australasian Symposium on Information Visualisation (InVis.au 2004), Christchurch, New Zealand, pp. 109-115, 2004.

[14] T. Dwyer, S. Hong, D. Koschutzki, F. Schreiber and K. Xu, “Visual Comparison of Network Centralities,” In Proceedings of APVIS 2006, to appear.

[15] P. Eades and N. C. Wormald, “Edge Crossings in Drawings of Bipartite Graphs,” Algorithmica, Vol. 11, pp. 379-403, 1994.

[16] P. Eades, A. Symvonis and S. Whitesides, “Three Dimensional Orthogonal Graph Drawing Algorithms,” Discrete Applied Mathematics, Vol. 103, pp. 55-87, 2000.

[17] X. Fu, S. Hong, R. Shen, Y. Wu and K. Xu, “Visualisation and Analysis of Temporal Email Networks,” In Proceedings of APVIS 2007, to appear.

[18] M. R. Garey and D. S. Johnson, “Crossing number is NP-complete,” SIAM Journal on Algebraic and Discrete Methods, Vol. 4, pp. 312-316, 1983.

[19] W. Huang, S. Hong, and P. Eades, “Layout Effects on Sociogram Perception,” In Proceedings of Graph Drawing 2005 (GD 2005), Limerick, Ireland, pp. 262-273, 2006.

[20] J. Ho and S. Hong, “Drawing Clustered Graphs in Three Dimensions,” In Proceedings of Graph Drawing 2005 (GD 2005), Limerick, Ireland, pp. 492-502, 2006.

[21] S. Hong, “Drawing Graphs Symmetrically in Three Dimensions,” In Proceedings of Graph Drawing 2001 (GD 2001), Vienna, Austria, pp. 189-204, 2002.

[22] S. Hong and T. Murtagh, “Visualization of Large and Complex Networks Using PolyPlane,” In Proceedings of Graph Drawing 2004 (GD 2004), New York, USA, pp. 471-482, 2004.

[23] S. Hong, “MultiPlane: a New Framework for Drawing Graphs in Three Dimensions,” NICTA Technical Report, 2005.

[24] S. Hong, N. S. Nikolov, “Layered Drawings of Directed Graphs in Three Dimensions,” In Proceedings of Asia Pacific Symposium on Information Visualisation (APVIS 2005), CRPIT 45, pp. 65-70, 2005.

[25] S. Hong, N. S. Nikolov, “Hierarchical Layouts of Directed Graphs in Three Dimensions,” In Proceedings of Graph Drawing 2005 (GD 2005), Limerick, Ireland, pp. 251-261, 2006.

[26] InfoVis 2003 Contest: Visualization and Pair Wise Comparison of Trees [http://www.cs.umd.edu/hcil/iv03contest/], IEEE Symposium on Information Visualization (INFOVIS 2003), 2003.

[27] M. Junger and P. Mutzel, “2-Layer Straightline Crossing Minimization: Performance of Exact and Heuristic Algorithms,” Journal of Graph Algorithms and Applications, Vol. 1, No. 1, pp. 1-25, 1997.

[28] M. Kaufmann and D. Wagner, “Drawing Graphs: Methods and Models,” Lecture Notes in Computer Science Tutorial 2025, Springer, 2001.

[29] H. Nagamochi, “An Improved Bound on The One-sided Minimum Crossing number in Two-layered Drawings,” Discrete and Computational Geometry, Vol. 33, No. 4, pp. 569-591, 2005.

[30] T. Nishizeki and M. S. Rahman, Planar Graph Drawing, World Scientific, 2004.

[31] J. Pach, T. Thiele and G. Toth, “Three-dimensional Grid Drawings of Graphs,” In Proceedings of Graph Drawing 1997 (GD ’97), Rome, Italy, pp. 47-51, 1998.

[32] K. Sugiyama, Graph Drawing and Applications for Software and Knowledge Engineers, World Scientific, 2002.

[33] K. Sugiyama, S. Tagawa, and M. Toda, “Methods for Visual Understanding of Hierarchical System Structures,” IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-11, No. 2, pp. 109-125, 1981.

[34] T. Uno, M. Kiyomi, H. Arimura, “LCM ver 2: Efficient Mining Algorithms for Frequent/Closed/Maximal Itemsets,” In Proceedings of International Conference on Data Mining, Frequent Itemset Mining Implementations, 2004.

[35] C. Ware and G. Franck, “Viewing a Graph in a Virtual Reality Display is Three Times as Good as a 2-D Diagram,” In Proceedings of IEEE Conference on Visual Languages, pp. 182-183, 1994.

[36] C. Ware, “Designing with a 2 1/2D Attitude,” Information Design Journal, Vol. 10, No. 3, pp. 171-182, 2001.

[37] R. Webber, Finding the Best View-Point in 3D Graph Drawing, PhD Thesis, University of Newcastle, 1999.

Figure 8: Overview of graph for poisonous mushrooms

Figure 9: Enlargement of Figure 8

Figure 10: Enlarged graph for both poisonous and edible mushrooms

Figure 11: Enlarged graph for both poisonous and edible mushrooms from the opposite side to Figure 10
