Visualizing Book Similarity as Topographic Map

Martin Gronemann, Michael Jünger

University of Cologne, Germany

ABSTRACT

The visualization of clustered graphs is an essential tool for the analysis of networks, in which clustering techniques like community detection can reveal various structural properties. We give a brief description of how we draw clustered graphs as topographic maps by combining a tree map approach with the topographic map metaphor. The proposed method is then applied to a similarity network of bestseller books from the Amazon website.

1 INTRODUCTION

Visualizing information using a landscape metaphor has a long tradition in information visualization. This popularity is mainly due to the fact that most people, including non-technical people, have a natural understanding of landscapes and are able to read a map. Recently, Fabrikant et al. [3] examined the effect of using the landscape metaphor in information visualization. While this metaphor is very popular for visualizing multidimensional data, for example in Themescape, it is rarely used to draw graphs. Cortese et al. [1] use the topographic map metaphor for visualizing hierarchical networks in a radial style. In order to draw clustered graphs, Gansner et al. [4] proposed GMap, where disjoint clusters are represented as a political map.

Unlike GMap, we visualize a complete hierarchy of clusters. The basic idea is to generate a landscape where the nodes in different subtrees of the cluster hierarchy are separated by water, a valley, or a rift. With increasing distance from the root cluster, nodes will be located in the lowlands, the highlands, and ultimately, on mountain peaks. Given this basic idea, we present a new method for creating such drawings by combining a tree map approach with the topographic map metaphor.

In general, a tree map based approach suffers from the problem of placing entities close together that are not necessarily close in terms of graph-theoretic distance. We counteract this in two ways: First, the algorithm is modified such that it reduces the edge length, so that adjacent nodes/clusters are placed closer together while maintaining the nested structure. Second, color-encoded elevation levels are used to visually separate different subtrees.

The advantage of a tree map approach compared to a force-directed approach is that the containment property prevents fragmentation, i.e., clusters that do not form a compact area. Compact cluster regions are a requirement for a consistent map layout.

2 BRIEF DESCRIPTION OF THE METHOD

In the following, we give a brief description of the techniques used for the topographic map displayed on the poster. As a first step, the network has been clustered using the popular edge betweenness clustering method of Girvan and Newman [5]. We compute the layout of the nodes by first generating a tree map for the cluster hierarchy and then placing the nodes in the centroid of the corresponding cell. As a second step, a 2.5D triangle mesh based on Delaunay triangulation is generated describing the elevation model. In the last step, the edges are routed along the terrain features.
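The node placement step can be illustrated with a minimal sketch. It assumes that "centroid" means the area centroid of the cell polygon and that a cell is given as an ordered list of vertices; the function name `polygon_centroid` is hypothetical and not taken from the authors' implementation.

```python
def polygon_centroid(vertices):
    """Area centroid of a simple polygon given as [(x, y), ...] in boundary order.

    Assumption: cells are simple (here convex) polygons; this is the standard
    shoelace-based centroid formula, not necessarily the authors' code.
    """
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # C = (1 / (6 * A)) * sum(...), and 6 * A == 3 * area2
    return cx / (3.0 * area2), cy / (3.0 * area2)


# A 2 x 2 square cell: the node is placed at its center.
print(polygon_centroid([(0, 0), (2, 0), (2, 2), (0, 2)]))  # (1.0, 1.0)
```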

Edge-aware Fat Polygon Partitioning. The layout is computed by the fat polygon partitioning of de Berg et al. [2], a tree map approach that is applied to the input hierarchy. It constructs a nested structure of convex polygons by recursively partitioning convex polygons using a cutting line with a specific direction. The work of de Berg et al. [2] provides a set of rules for choosing the cut direction to obtain polygons with a good aspect ratio.

Since we do not only want to display the hierarchy, the edges of the underlying graph have to be taken into account. The idea is to keep the containment property of the tree map as a hard constraint for the layout, while the edges form soft constraints: we modified the partition method to choose a partition that tries to minimize the total edge length. When partitioning a polygon, we enumerate all allowed cut directions, which depend on the shape of the polygon and the area required for the sub-polygons, and choose one that minimizes the distance to highly connected external clusters based on the edges. The idea is illustrated in Figure 1, where two subdivisions of a convex polygon are given that result in different edge lengths. For more details, see [7].

Figure 1: Two possible partitions of a convex polygon. The first choice (a) minimizes edge length, while the second option (b) results in longer total edge length.
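The selection among candidate cuts can be sketched as follows. This is a deliberately simplified illustration, not the authors' algorithm: it assumes axis-aligned rectangles instead of general convex polygons, only four candidate cuts, and a cost equal to the weighted distance from each child's center to the already placed external clusters it is connected to. All names (`candidate_splits`, `split_cost`, `best_split`) are hypothetical.

```python
import math


def candidate_splits(rect, frac_a):
    """Yield (label, rect_a, rect_b) for axis-aligned cuts of rect = (x, y, w, h).

    Child A receives the fraction frac_a of the parent's area.
    """
    x, y, w, h = rect
    wa, ha = w * frac_a, h * frac_a
    yield "vertical, A left", (x, y, wa, h), (x + wa, y, w - wa, h)
    yield "vertical, A right", (x + w - wa, y, wa, h), (x, y, w - wa, h)
    yield "horizontal, A bottom", (x, y, w, ha), (x, y + ha, w, h - ha)
    yield "horizontal, A top", (x, y + h - ha, w, ha), (x, y, w, h - ha)


def center(rect):
    x, y, w, h = rect
    return (x + w / 2.0, y + h / 2.0)


def split_cost(rect_a, rect_b, links_a, links_b):
    """Sum of weight * distance from each child's center to its external neighbors.

    links_* is a list of ((px, py), weight) pairs: the position of an external
    cluster and the number of edges leading to it (assumed representation).
    """
    cost = 0.0
    for (cx, cy), links in ((center(rect_a), links_a), (center(rect_b), links_b)):
        for (px, py), weight in links:
            cost += weight * math.hypot(cx - px, cy - py)
    return cost


def best_split(rect, frac_a, links_a, links_b):
    """Pick the candidate cut with the smallest estimated edge length."""
    return min(candidate_splits(rect, frac_a),
               key=lambda s: split_cost(s[1], s[2], links_a, links_b))


# Child A is linked to a cluster on the right, child B to one on the left,
# so A should end up in the right half of the parent cell.
label, rect_a, rect_b = best_split((0, 0, 4, 2), 0.5,
                                   links_a=[((10, 1), 1.0)],
                                   links_b=[((-6, 1), 1.0)])
print(label)  # vertical, A right
```

The containment constraint stays hard in this sketch because every candidate keeps both children inside the parent cell; only the choice among valid candidates is driven by the edges.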

However, at a specific point during the subdivision process, only limited information about the final layout is available, because the polygons of highly connected external clusters may not have been computed yet. In order to base our decision on a layout that is as accurate as possible, we partition the polygons with the largest area first, which results in a more balanced refinement of the area.
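The largest-area-first processing order can be sketched with a priority queue. The sketch assumes the hierarchy is given as a dictionary from cluster to child clusters and that each cluster's polygon area is known in advance (for example, proportional to its number of leaves); `partition_order` is a hypothetical helper that only reports the order and does not perform the geometric partition.

```python
import heapq
import itertools


def partition_order(root, children, area):
    """Return the order in which clusters are partitioned, largest area first."""
    counter = itertools.count()  # tie-breaker: equal areas are served first-in, first-out
    heap = [(-area[root], next(counter), root)]
    order = []
    while heap:
        _, _, cluster = heapq.heappop(heap)
        kids = children.get(cluster, [])
        if not kids:
            continue  # leaf cells are not partitioned further
        order.append(cluster)
        for kid in kids:
            # heapq is a min-heap, so areas are negated to pop the largest first
            heapq.heappush(heap, (-area[kid], next(counter), kid))
    return order


children = {"root": ["A", "B"], "A": ["a1", "a2"], "B": ["b1", "b2"], "a1": ["x", "y"]}
area = {"root": 10, "A": 7, "B": 3, "a1": 5, "a2": 2, "b1": 2, "b2": 1, "x": 3, "y": 2}
print(partition_order("root", children, area))  # ['root', 'A', 'a1', 'B']
```

Note that this order differs from a breadth-first traversal, which would partition B before a1 even though the polygon of a1 is larger.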

Triangle Mesh. After computing the tree map, graph nodes are placed in the centroid of their corresponding polygon. This layout is then triangulated with the initial boundary polygon of the tree map as a constraint to provide a consistent boundary. The resulting triangulation is then refined by subdividing each triangle. The inserted vertices correspond to cluster nodes in the input hierarchy, from which the elevation levels are derived. The intuition can be described by considering the hierarchy tree drawn bottom-up, i.e., the root is located at the bottom and the leaves form peaks at the top. This elevation model is then mapped to the newly inserted vertices of the mesh. More details on the mapping between triangle vertices and the cluster tree can be found in [6]. This triangle mesh is then used to draw the color-encoded elevation map and as a basis for the edge routing.
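The bottom-up intuition for the elevation model can be sketched by assigning each node of the cluster tree its depth below the root. This is an assumption made for illustration; the actual mapping between mesh vertices and the cluster tree is described in [6] and may differ. The function name `elevation_levels` is hypothetical.

```python
def elevation_levels(children, root):
    """Map every node of the cluster tree to its depth below the root.

    Assumption: elevation equals tree depth, so the root lies at level 0
    (sea level) and deeper clusters and leaves lie higher.
    """
    levels = {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            levels[child] = levels[node] + 1
            stack.append(child)
    return levels


tree = {"root": ["fiction", "business"], "fiction": ["classics", "1984"], "classics": ["Emma"]}
levels = elevation_levels(tree, "root")
print(levels["root"], levels["fiction"], levels["1984"], levels["Emma"])  # 0 1 2 3
```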


Edge Routing. For the layout of the edges we follow standard practice by constructing a routing network and computing a shortest path between the two endpoints of an edge. A slightly modified version of the triangle mesh described earlier is used as the routing network. This enables us to make the edge routing aware of the cluster hierarchy by combining the Euclidean distance and elevation in the distance function used for the shortest path algorithm.
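A hierarchy-aware routing step can be sketched with Dijkstra's algorithm on the routing network. The text does not state how Euclidean distance and elevation are combined, so the cost below (planar length plus a weighted absolute elevation difference) and the parameter `elevation_weight` are assumptions chosen for illustration; `route_edge` is a hypothetical name.

```python
import heapq
import math


def route_edge(positions, elevation, neighbors, source, target, elevation_weight=1.0):
    """Shortest path where each step costs planar length + weighted elevation change."""

    def cost(u, v):
        (ux, uy), (vx, vy) = positions[u], positions[v]
        planar = math.hypot(ux - vx, uy - vy)
        climb = abs(elevation[u] - elevation[v])
        return planar + elevation_weight * climb

    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, math.inf):
            continue  # stale queue entry
        for v in neighbors[u]:
            nd = d + cost(u, v)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None  # target is unreachable in the routing network
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


positions = {"s": (0, 0), "valley": (1, 0), "ridge": (1, 1), "t": (2, 0)}
elevation = {"s": 2, "valley": 0, "ridge": 2, "t": 2}
neighbors = {"s": ["valley", "ridge"], "valley": ["s", "t"],
             "ridge": ["s", "t"], "t": ["valley", "ridge"]}

# With the elevation term, the edge stays on the high ground of its cluster.
print(route_edge(positions, elevation, neighbors, "s", "t"))  # ['s', 'ridge', 't']
# Without it, the geometrically shorter path through the valley wins.
print(route_edge(positions, elevation, neighbors, "s", "t", elevation_weight=0.0))  # ['s', 'valley', 't']
```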

3 DATASET

The dataset for the network has been taken from [8] and is available as part of the Stanford Large Network Dataset Collection¹. It contains a total of 548,552 products with various attributes such as sales rank, categories, and reviews.

Figure 2: A subset of bestseller books on the Amazon website in summer 2006 [8]. The map shows the titles of 1930 books from various areas and 2913 links that correspond to the similarity.

The set of books shown in Figure 2 is obtained by filtering the complete list of products based on two conditions. First, the product is classified as a book. Second, it is one of the 5000 best-selling products. The edge set is then obtained from the similarity information provided by the dataset. In order to reduce the number of duplicates, books are identified by title and not by identification code. The generated graph contains 574 isolated nodes that have been removed, resulting in a total of 1930 books and 2913 undirected links. The runtime of the tree map algorithm is about 1.27 seconds, while the edge router takes 0.17 seconds².
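The filtering steps can be sketched as follows. The sketch assumes the product records have already been parsed into dictionaries; the key names (`asin`, `title`, `group`, `salesrank`, `similar`) and the convention that a smaller sales rank means more sales are assumptions, and `build_book_graph` is a hypothetical helper rather than the authors' preprocessing code.

```python
def build_book_graph(products, top_n=5000):
    """Build an undirected title-level similarity graph of top-selling books."""
    # Condition 2: keep the top_n best-selling products (assumed: lowest sales rank).
    top = sorted(products, key=lambda p: p["salesrank"])[:top_n]
    # Condition 1: keep only products classified as books.
    books = [p for p in top if p["group"] == "Book"]

    # Identify books by title, so different editions collapse into one node.
    title_of = {p["asin"]: p["title"] for p in books}

    edges = set()
    for p in books:
        for other in p["similar"]:
            if other in title_of and title_of[other] != p["title"]:
                # frozenset makes the link undirected and removes duplicates
                edges.add(frozenset((p["title"], title_of[other])))

    # Only titles with at least one link survive, which drops isolated nodes.
    nodes = {title for edge in edges for title in edge}
    return nodes, edges


products = [
    {"asin": "b1", "title": "Dune", "group": "Book", "salesrank": 1, "similar": ["b2", "d1"]},
    {"asin": "b2", "title": "Emma", "group": "Book", "salesrank": 2, "similar": ["b1"]},
    {"asin": "d1", "title": "Some DVD", "group": "DVD", "salesrank": 3, "similar": []},
    {"asin": "b3", "title": "Dune", "group": "Book", "salesrank": 4, "similar": ["b2"]},
    {"asin": "b4", "title": "Ulysses", "group": "Book", "salesrank": 5, "similar": []},
]
nodes, edges = build_book_graph(products, top_n=5)
print(sorted(nodes), len(edges))  # ['Dune', 'Emma'] 1
```

In this toy input the two editions of "Dune" collapse into one node, the DVD is filtered out, and the isolated title "Ulysses" is removed.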

Figure 2 shows the final map of the book network. The islands correspond to various subjects. The largest island in the northeast contains books from general fiction such as 1984, Lord of the Flies, and Animal Farm. Below the fiction books, the classic titles can be found, see Figure 4. Figure 3 shows titles related to management and financial subjects.

4 CONCLUSION AND DISCUSSION

Our poster visualizes the similarities and a cluster hierarchy of book titles based on the data provided by Amazon. The combination of the topographic map metaphor, a tree map approach based on convex polygons, and a heuristic for reducing edge lengths is used to overcome the drawbacks of a pure tree map approach.

The technique produces appealing maps for medium-sized instances that are easily readable due to the color-encoded elevation levels. While the tree map approach scales well for larger instances, the drawing of edges has limitations in both runtime and display capabilities. Compared to, e.g., small world graphs arising in social network analysis, the book similarity graph is rather sparse.

¹ http://snap.stanford.edu/
² Machine with Core i7 2.7GHz and 8 GB RAM

Figure 3: A detailed view on the “corporate island”.

Figure 4: Classic books on the northeastern island.

REFERENCES

[1] P. F. Cortese, G. D. Battista, A. Moneta, M. Patrignani, and M. Pizzonia. Topographic visualization of prefix propagation in the internet. IEEE Trans. Vis. Comput. Graph., 12(5):725–732, 2006.

[2] M. de Berg, K. Onak, and A. Sidiropoulos. Fat polygonal partitions with applications to visualization and embeddings. CoRR, abs/1009.1866.

[3] S. I. Fabrikant, D. R. Montello, and D. M. Mark. The natural landscape metaphor in information visualization: The role of commonsense geomorphology. J. Am. Soc. Inf. Sci. Technol., 61(2):253–270, Feb. 2010.

[4] E. R. Gansner, Y. Hu, and S. Kobourov. Visualizing graphs and clusters as maps. IEEE Computer Graphics and Applications, 30:54–66, 2010.

[5] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99:7821–7826, 2002.

[6] M. Gronemann and M. Jünger. Drawing clustered graphs as topographic maps. In Proceedings of the 20th International Symposium on Graph Drawing, 2012. To appear.

[7] M. Gronemann, M. Jünger, N. Kriege, and P. Mutzel. MolMap: Visualizing molecule libraries as topographic maps. Technical report, 2012.

[8] J. Leskovec, L. A. Adamic, and B. A. Huberman. The dynamics of viral marketing. ACM Trans. Web, 1(1), 2007.