

I/O Optimal Isosurface Extraction*

Yi-Jen Chiang†   Cláudio T. Silva‡

March 1997; revised October 1997

Conference version appeared in Proc. IEEE Visualization '97

Abstract

In this paper we give I/O-optimal techniques for the extraction of isosurfaces from volumetric data, by an application of the I/O-optimal interval tree of Arge and Vitter. The main idea is to preprocess the dataset once and for all to build an efficient search structure in disk, and then each time we want to extract an isosurface, we perform an output-sensitive query on the search structure to retrieve only those active cells that are intersected by the isosurface. During the query operation, only two blocks of main memory space are needed, and only those active cells are brought into the main memory, plus some negligible overhead of disk accesses. This implies that we can efficiently visualize very large datasets on workstations with just enough main memory to hold the isosurfaces themselves. We also give simple preprocessing algorithms for building the static I/O interval tree that achieve the optimal I/O cost. The implementation of our techniques is delicate but not complicated. We give the first implementation of the I/O-optimal interval tree, and also implement our methods as an I/O filter for Vtk's isosurface extraction for the case of unstructured grids. We show that, in practice, our algorithms improve the performance of isosurface extraction by speeding up the active-cell searching process so that it is no longer a bottleneck. Moreover, this search time is independent of the main memory available. The practical efficiency of our techniques reflects their theoretical optimality.

Keywords: Isosurface Extraction, Marching Cubes, Interval Tree, Scientific Visualization.

1 Introduction

Isosurface extraction represents one of the most effective and powerful techniques for the investigation of volume datasets. It has been used extensively, particularly in visualization [19, 20], simplification [14], and implicit modeling [24], just to name a few. Isosurfaces also play an important role in other areas of science such as biology and medicine, since they can be used to study and perform detailed measurements of properties of the datasets. In fact, nearly all visualization packages include an isosurface extraction component. Its widespread use makes efficient isosurface extraction a very important problem.

The problem of isosurface extraction can be stated as follows. A scalar volume dataset consists of tuples $(x, F(x))$, where $x$ is a 3D point and $F$ is a scalar function defined over 3D points. Given an isovalue (a scalar value) $q$, the extraction of the isosurface of $q$ is to compute and display the isosurface $C(q) = \{x \mid F(x) = q\}$. The computational process of isosurface extraction can be viewed as consisting of two phases. First, in the search phase, one finds all cells of the dataset that are intersected by the isosurface; such cells are called active

*The conference version of this paper appeared in Proc. IEEE Visualization '97, Oct. 1997.

†Department of Applied Mathematics and Statistics, SUNY Stony Brook, NY 11794-3600; [email protected]. Supported in part by NSF Grant DMS-9312098.

‡Department of Applied Mathematics and Statistics, SUNY Stony Brook, NY 11794-3600; [email protected]. Partially supported by Sandia National Labs and the Dept. of Energy Mathematics, Information, and Computer Science Office, and by the National Science Foundation (NSF), grant CDA-9626370.


cells. Next, in the generation phase, depending on the type of cells, one can apply an algorithm to actually generate the isosurface from those active cells (Marching Cubes [19] is one such algorithm for hexahedral cells). Let $N$ be the total number of cells in the dataset, and $K$ the number of active cells. It is estimated that the average value of $K$ is $O(N^{2/3})$ [15]; an exhaustive scanning of all cells in the search phase is therefore inefficient, spending a large portion of time traversing cells that are not active. Much research effort has thus focused on developing output-sensitive algorithms to speed up the search phase by avoiding such exhaustive scanning.

Most algorithms developed so far (except for the inefficient exhaustive scanning), however, require the time and main memory space to read and keep the entire dataset in the main memory, plus some additional preprocessing time and main memory space if some structures are built to speed up the search phase. Unfortunately, for really large volume datasets, these methods often suffer the problem of not having enough main memory, which can cause a major slow-down of the algorithms due to a large number of page faults. On the other hand, when visualizing isosurfaces, as opposed to volume rendering, only a small portion ($K = O(N^{2/3})$ active cells) of the dataset is ever needed. This seems to indicate that it is not necessary to use time and main memory to load and store the whole volume. If users never had to load a dataset completely but rather only had to store the triangles that defined the isosurface, effective visualization could be performed on low to middle range workstations, or even on PCs in some cases.

In this paper, we present I/O-efficient techniques to resolve the memory issue and to speed up the search phase of isosurface extraction, by using the I/O-optimal interval tree of [5]. The main idea is to preprocess the dataset once and for all to build an efficient search structure in disk, and then each time we want to extract an isosurface, we perform an output-sensitive query on the search structure to report all active cells. The query operation needs only two blocks of main memory space, and only $O(\log_B N + K/B)$ disk reads, where $B$ is the number of cells that can fit into a disk block (and thus $\lceil K/B \rceil$ disk reads are necessary to report all $K$ active cells). The technique hence acts as an I/O filter, performing only those disk accesses to the dataset that are needed, plus a negligible overhead. The search structure is in fact I/O-optimal in space, query, and preprocessing.

Previous Related Work

As mentioned before, isosurface extraction has been the focus of much research. Here we briefly review the results that focus on speeding up the search phase of isosurface extraction by acting as a filter, avoiding traversals of cells that are not active. (Please see [18] for an excellent and thorough review.)

In Marching Cubes [19], all cells in the volume dataset are searched for isosurface intersection. Essentially, each time a user runs Marching Cubes, $O(N)$ time is needed. Concerning the main memory issue, this technique does not require the entire dataset to fit into the main memory, but $\lceil N/B \rceil$ disk reads are necessary. Wilhelms and Van Gelder [36] propose a method of using an octree to optimize isosurface extraction. This algorithm has a worst-case time of $O(K + K \log(N/K))$ (this analysis is presented by Livnat et al. [18]) for isosurface queries, once the octree has been built.

Itoh and Koyamada [15] propose a method based on identifying a collection of seed cells from which isosurfaces can be propagated by performing local search. Basically, once the seed cells have been identified, they claim to have a nearly $O(N^{2/3})$ expected performance. (Livnat et al. [18] estimate the worst-case running time to be $O(N)$, and the memory overhead to be quite high.) More recently, Bajaj et al. [7] propose another contour propagation scheme, with expected performance of $O(K)$. Livnat et al. [18] propose NOISE, an $O(\sqrt{N} + K)$-time algorithm. Shen et al. [28, 29] also propose nearly optimal isosurface extraction methods.

The first optimal isosurface extraction algorithm was given by Cignoni et al. [11], based on the use of an interval tree to store the intervals induced by the dataset cells. After an $O(N \log N)$-time preprocessing, queries can be performed in optimal $O(\log N + K)$ time. This achieves tight theoretical bounds.


All the techniques mentioned above are main-memory algorithms, requiring the entire dataset to fit in the main memory (except for Marching Cubes). There is also a class of external-memory algorithms (not particularly for isosurface extraction). For large-scale applications in which the problem is too large to fit in the main memory, Input/Output (I/O) communication between fast main memory and slower external memory (disk) becomes a major bottleneck. Algorithms specifically designed to reduce the I/O bottleneck are called external-memory algorithms. In recent years various researchers have been investigating external-memory algorithms for graphs [10] and for computational geometry [2, 4, 5, 8, 13, 16, 23, 31, 34], in addition to early work on sorting and scientific computing [1, 21, 35]. Although most of the results are theoretical, the experiments of Chiang [9] and of Vengroff and Vitter [33] on some of these techniques show that they result in significant improvements over traditional algorithms in practice. Also, Teller et al. [32] describe a system to compute radiosity solutions for polygonal environments larger than main memory, and Funkhouser et al. [12] present prefetching techniques for interactive walk-throughs in architectural virtual environments whose models are larger than main memory.

Our Results

We give I/O-optimal techniques for isosurface extraction, by using the external-memory interval tree of Arge and Vitter [5], together with the isosurface engine from Vtk [25, 26].

Following the ideas of Cignoni et al. [11], we produce for each cell an interval $[\min, \max]$, where $\min$ and $\max$ are the minimum and maximum values among the scalar values of the vertices of the cell. Then, given an isovalue $q$, the active cells are exactly those cells whose intervals contain $q$ (i.e., $\min \le q \le \max$). This reduces the searching of active cells to the following problem, called stabbing queries: given a set of intervals and a query point $q$ in 1D, find all intervals containing $q$. We then use the external-memory interval tree of [5] to solve the stabbing queries in an I/O-optimal way. In [5], both static and dynamic versions of the interval tree are given, with the main focus on the dynamic version (which supports insertions and deletions of intervals), and no preprocessing algorithm is given for the static version¹. In our application, we only need the simpler static version. We give simple preprocessing algorithms for the static version² achieving optimal $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os, where $M$ is the number of cells fitting in the main memory and one I/O operation reads or writes one disk block. With this search structure, we can find the $K$ active cells for a given isovalue $q$ using optimal $O(\log_B N + K/B)$ I/O operations. The search structure itself occupies optimal $O(N/B)$ blocks of disk space.
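The reduction above is easy to state in code. The following minimal in-memory sketch (the cell layout is a hypothetical one: each cell is just the list of its vertex scalar values) builds the per-cell interval and answers a stabbing query naively; the point of the paper is to answer the same query I/O-optimally when the intervals do not fit in main memory.

```python
# Naive, in-memory illustration of the active-cell / stabbing-query
# reduction.  Hypothetical layout: each cell is a list of its vertex
# scalar values.

def cell_interval(scalars):
    """Interval [min, max] spanned by a cell's vertex scalar values."""
    return (min(scalars), max(scalars))

def active_cells(cells, q):
    """A cell is active iff its interval contains the isovalue q."""
    return [i for i, c in enumerate(cells)
            if cell_interval(c)[0] <= q <= cell_interval(c)[1]]

cells = [[0.1, 0.4, 0.3, 0.2],   # interval [0.1, 0.4]
         [0.5, 0.9, 0.7, 0.6],   # interval [0.5, 0.9]
         [0.3, 0.8, 0.4, 0.5]]   # interval [0.3, 0.8]
print(active_cells(cells, 0.45))  # -> [2]: only cell 2 contains 0.45
```

This linear scan is exactly the exhaustive search the interval tree replaces: the tree answers the same membership question while touching only $O(\log_B N + K/B)$ disk blocks.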

The implementation of the techniques is delicate but not complicated. We give the first implementation of the I/O-optimal interval tree, and also implement our methods as an I/O filter for Vtk's isosurface extraction routine for the case of unstructured grids. Our experiments show that the new techniques result in a significant performance improvement over Vtk's original implementation. In fact the search phase, originally the bottleneck of isosurface extraction, now needs less time than the generation phase, i.e., the search phase is not a bottleneck any more! The practical efficiency of the techniques reflects their theoretical optimality.

The advantages of our techniques can be summarized as follows.

• Our query algorithm improves the performance of isosurface extraction by speeding up the search phase so that it is no longer a bottleneck.

• Our query algorithm is output-sensitive, and yet by building the search structure in disk, our preprocessing is performed once and for all, as opposed to other output-sensitive techniques which require

¹As later pointed out by Arge and Vitter [6], an I/O-optimal preprocessing algorithm using a buffer technique was known to them and maybe also to other specialists.

²The scan and distribute algorithm and the interval-to-node assignment algorithm; see Section 2.3.


preprocessing each time the process starts to run. Our search structure can be duplicated by just copying files, without ever running the preprocessing again.

• Our query algorithm only needs two blocks of main memory, and only brings to main memory those $K$ active cells, where $K$ is usually $O(N^{2/3})$. Other techniques either have to visit the whole dataset during queries, or have to use a large amount of main memory to keep the entire dataset plus some additional search structure.

• Our preprocessing algorithms need only a fixed amount of main memory space, which can be parameterized. For datasets much larger than can fit into main memory, the running time of the implemented algorithm is the same as performing a few external sortings, and is linear in the size of the datasets. Thus the preprocessing method is both efficient and predictable.

Our techniques have a wide range of applications. In addition to improving the performance of isosurface extraction, there are some other implications:

• Datasets can be visualized very efficiently on workstations with just enough main memory to hold the isosurfaces themselves.

• Datasets can be kept in remote file servers. Only the necessary parts are fetched during visualization; thus even at ethernet speeds, interactive isosurface extraction can be achieved.

Organization of the Paper

The rest of the paper is organized as follows. In Section 2, we present the technical details of our methods, which are based on an application of the I/O-optimal interval tree of [5]. Next we present in Section 3 the implementation details of our I/O filter for Vtk's isosurface extraction routine. In Section 4, we present the overall experimental performance of our methods, followed by conclusions and future work in Section 5.

2 Searching Active Cells with I/O Optimal Interval Tree

As described in the previous section, we produce an interval $I = [\min, \max]$ for each cell of the dataset, and use the I/O-optimal interval tree of [5] to find the active cells of an isosurface by performing stabbing queries.

Since pointer references are very inefficient for disk accesses, we store the direct cell information together with the corresponding interval whenever that interval has to be stored in the interval tree. Thus each record of an interval includes the cell ID, the 3D coordinates and the scalar values of the vertices of the cell, and the left and right endpoints of the interval.

If the input dataset is given in a format providing direct cell information, then we can build the interval tree directly. Unfortunately, the datasets are often given in a format that contains indices to vertices. Thus we have to de-reference the indices before actually building the interval tree. We call this de-referencing process normalization. Using the technique of [10], we can efficiently perform normalization as follows. We make one file (the vertex file) containing the direct information of the vertices (3D coordinates and scalar values), and another file (the cell file) of cell records with vertex indices. In the first pass, we externally sort the cell file by the indices (pointers) to the first vertex, so that the first group in the file contains cells whose first vertices are vertex 1, the second group contains cells whose first vertices are vertex 2, and so on. Then by scanning through the vertex file and the cell file simultaneously, we fill in the direct information of the first vertex of each cell in the cell file. In the next pass, we sort the cell file by the indices to the second vertices, and fill in the direct information of the second vertex of each cell in the same way. By repeating the process


for each vertex of the cells, we obtain the direct information for each cell; this completes the normalization process.
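As a rough illustration, the normalization passes can be sketched as follows, with Python lists standing in for the vertex and cell files, and an in-memory sort standing in for external sorting; what matters for the I/O argument is the access pattern (sort the cell file by the $j$-th vertex index, then merge-scan it against the vertex file, one sequential pass per vertex slot).

```python
# Sketch of normalization (index de-referencing), assuming the files
# fit in lists here; a real implementation would use external sorting.

def normalize(vertex_file, cell_file):
    """vertex_file: vertex records ordered by vertex index.
    cell_file: tuples of vertex indices, one tuple per cell.
    Returns, per cell, the list of direct vertex records."""
    nv_per_cell = len(cell_file[0])
    direct = [[None] * nv_per_cell for _ in cell_file]
    for j in range(nv_per_cell):
        # pass j: sort the cell file by the index of the j-th vertex ...
        order = sorted(range(len(cell_file)), key=lambda c: cell_file[c][j])
        # ... then scan the vertex file and the sorted cell file together
        v = 0
        for c in order:
            while v < cell_file[c][j]:   # advance sequentially in vertex file
                v += 1
            direct[c][j] = vertex_file[v]
    return direct

verts = [((0, 0, 0), 1.0), ((1, 0, 0), 2.0), ((0, 1, 0), 3.0)]
cells = [(0, 1, 2), (2, 1, 0)]
print(normalize(verts, cells)[1][0])  # -> ((0, 1, 0), 3.0)
```

Because each pass is one sort plus one simultaneous scan of the two files, the whole normalization costs a constant number of external sorts per vertex slot.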

For the rest of this section, we describe the I/O-optimal interval tree, including the data structure, the query algorithm, and the preprocessing algorithms. Each I/O operation reads or writes one disk block.

2.1 Data Structure

We first describe the data structure of our interval tree; the description is a re-presentation of the static I/O-optimal interval tree given in [5]. (For readers not acquainted with interval trees, see [22, pages 360–361].) Each node of our tree is one block in disk, capable of holding $O(B)$ items. The branching factor, $b$, is the maximum number of children an internal node can have. We let $b = \Theta(\sqrt{B})$; the reason will be clear later. Let $S$ be the set of all $N$ intervals, and $E$ be the set of $2N$ endpoints of the intervals in $S$. We denote by $n = \lceil |E|/B \rceil$ the number of blocks in $E$. First, we sort $E$ from left to right in the order of increasing values, assuming that all endpoints have distinct values (we use cell ID to break ties). Set $E$ is now fixed and will be used to define slab boundaries for each internal node of the tree. The interval tree on $E$ and $S$ is defined recursively as follows. The root $u$ is associated with the entire range of $E$ and with all intervals $S$. If $S$ has no more than $B$ intervals, then node $u$ is a leaf storing all intervals of $S$. Otherwise $u$ is an internal node. We then evenly divide $E$ into $b$ slabs $E_0, E_1, \ldots, E_{b-1}$, each containing the same number $\lceil n/b \rceil$ of blocks of endpoints in $E$. The $b-1$ slab boundaries are the first endpoints of slabs $E_1, \ldots, E_{b-1}$. We store the endpoint values of the slab boundaries in node $u$ as keys. We use these keys to consider each interval $I \in S$ (see Fig. 1). If both endpoints of $I$ lie inside the same slab, say the $i$-th slab $E_i$, then $I$ belongs to the $i$-th child $u_i$ of node $u$, and is put into the interval set $S_i \subseteq S$ (e.g., in Fig. 1, interval $I_0$ is put in $S_0$). Otherwise (the two endpoints of $I$ belong to different slabs, i.e., $I$ crosses one or more slab boundaries), $I$ belongs to node $u$. The intervals belonging to node $u$ will be stored in the secondary lists of $u$ pointed to by pointers in $u$.
We adopt the convention that if an endpoint of $I$ is exactly the slab boundary separating slabs $E_{i-1}$ and $E_i$, that endpoint is considered as lying in slab $E_i$; this is consistent with our choice of slab boundaries. We associate each child $u_i$ of node $u$ with the range of slab $E_i$ and with the interval set $S_i$, and define the subtree rooted at $u_i$ recursively as the interval tree on range $E_i$ and intervals $S_i$. Notice that slab $E_i$ is pre-defined when $E$ is given ($\lceil n/b \rceil$ blocks of endpoints in the first level, $\lceil n/b^2 \rceil$ blocks of endpoints in the next level, and so on), but set $S_i$ has to be decided by scanning through the intervals in $S$ and distributing them appropriately according to the slab boundaries. Observe that $S_i$ may be empty, in which case child $u_i$ of $u$ is null (it is also possible that all children of $u$ are null).
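The assignment rule above can be sketched in a few lines. In this illustration the slab boundaries are plain numbers rather than (value, cell ID) pairs; `bisect_right` implements the convention that an endpoint equal to a boundary lies in the slab to the boundary's right.

```python
# Sketch of the interval-to-node assignment rule: an interval either
# falls entirely inside one slab (and goes to that child's set S_i),
# or crosses a slab boundary (and belongs to the current node u).

from bisect import bisect_right

def slab_of(boundaries, x):
    """Index i of the slab E_i containing endpoint x; an endpoint equal
    to a boundary is considered to lie in the slab starting there."""
    return bisect_right(boundaries, x)

def assign(boundaries, interval):
    lo, hi = interval
    i, j = slab_of(boundaries, lo), slab_of(boundaries, hi)
    if i == j:
        return ("child", i)       # both endpoints in slab E_i -> set S_i
    return ("node", (i, j))       # crosses a boundary -> stays at node u

bnds = [10, 20, 30, 40]           # b = 5 slabs E_0..E_4
print(assign(bnds, (12, 18)))     # -> ('child', 1)
print(assign(bnds, (12, 35)))     # -> ('node', (1, 3))
print(assign(bnds, (20, 25)))     # -> ('child', 2): boundary endpoint in E_2
```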

For each internal node $u$, we use three kinds of secondary structures to store the intervals belonging to $u$: the left, right and multi lists, described as follows.

• There are $b$ left lists, each corresponding to a slab of $u$. For each $i$, the $i$-th left list stores the intervals

belonging to $u$ whose left endpoints lie in the $i$-th slab $E_i$. Each list is sorted by increasing left endpoint values of the intervals (see Fig. 1).

• There are also $b$ right lists, each corresponding to a slab of $u$. For each $i$, the $i$-th right list stores the intervals belonging to $u$ whose right endpoints lie in the $i$-th slab $E_i$. Each list is sorted by decreasing right endpoint values of the intervals (see Fig. 1).

• There are $(b-1)(b-2)/2$ multi lists, each corresponding to a multi-slab of $u$. A multi-slab $[i, j]$, $0 \le i \le j \le b-1$, is defined to be the union of slabs $E_i, \ldots, E_j$. The multi list for multi-slab $[i, j]$ stores all intervals of $u$ that completely span $E_i \cup \cdots \cup E_j$, i.e., all intervals of $u$ whose left endpoints lie in slab $E_{i-1}$ and whose right endpoints lie in slab $E_{j+1}$. Since the multi lists $[0, k]$ for any $k$ and the multi lists $[\ell, b-1]$ for any $\ell$ are always empty by the definition, we only care about multi-slabs



$[1,1], \ldots, [1, b-2], [2,2], \ldots, [2, b-2], \ldots, [i,i], \ldots, [i, b-2], \ldots, [b-2, b-2]$. Thus there are $(b-1)(b-2)/2$ such multi-slabs and the associated multi lists (see Fig. 1).

Figure 1: A schematic example of the I/O-optimal interval tree for branching factor $b = 5$. Note that this is not a complete example and some intervals are not shown (in a complete example, each slab boundary is an interval endpoint, and each slab $E_i$ has the same number of blocks of endpoints). Consider only the intervals shown here and node $u$. The interval sets for its children are: $S_0 = \{I_0\}$, and $S_1 = S_2 = S_3 = S_4 = \emptyset$. Its left lists are: $left(0) = \{I_2, I_7\}$, $left(1) = \{I_1, I_3\}$, $left(2) = \{I_4, I_5, I_6\}$, and $left(3) = left(4) = \emptyset$ (the intervals in each list are sorted in the order as they appear). Its right lists are: $right(0) = right(1) = \emptyset$, $right(2) = \{I_1, I_2, I_3\}$, $right(3) = \{I_5, I_7\}$, and $right(4) = \{I_4, I_6\}$ (again the intervals in each list are sorted in the order as they appear). Its multi lists are: $multi([1,1]) = \{I_2\}$, $multi([1,2]) = \{I_7\}$, $multi([1,3]) = multi([2,2]) = multi([2,3]) = \emptyset$, and $multi([3,3]) = \{I_4, I_6\}$.

For each left, right, or multi list, we store the entire list in consecutive blocks in disk, and in node $u$ we store a pointer to the starting position of the list in disk. Observe that there are $b$ left and $b$ right lists, and $O(b^2) = O(B)$ multi lists. Thus we need to keep $O(B)$ items of information in node $u$, which is one disk block capable of holding $O(B)$ items. This explains why the branching factor $b$ is taken as $\Theta(\sqrt{B})$.

Now let us analyze some properties of the interval tree. First, the height of the tree is $O(\log_b N) = O(\log_B N)$, because each time we go down one level of the tree, the range of slab $E$ associated with a node is reduced by a factor of $b$. Secondly, each interval belongs to exactly one node, and is stored at most three times: if it belongs to a leaf node, then it is stored only once; if it belongs to an internal node, then it is stored once in some left list, once in some right list, and possibly one more time in some multi list. Therefore we need roughly $O(N/B)$ disk blocks to store the entire data structure.

In theory, however, we may need more blocks. The problem is caused by the multi lists: in the worst case, a multi list may have only very few ($\ll B$) intervals in it, but still requires one disk block for storage. The same situation may occur also for left and right lists, but since each internal node has $b$ left/right lists and the same number of children, these underflow blocks can be charged to the children nodes. But since there are $O(b^2)$ multi lists for an internal node, this charging argument does not work for the multi lists. In [5], the problem is solved by using the corner structure of [16]. The corner structure is an I/O-optimal stabbing-query


data structure for a subset of intervals that can entirely fit into the main memory during preprocessing (but we do not want to read all of them from disk during queries). Given a set of $t \le B^2$ intervals (where the main memory is assumed to be able to hold $B^2$ intervals), we can use optimal $O(t/B)$ I/O operations to build a corner structure occupying optimal $O(t/B)$ blocks of disk space, such that a stabbing query reporting $k$ intervals can be performed in optimal $O(k/B + 1)$ I/O operations [16]. Although the corner structure is very elegant, it is more complex; we thus only treat it as a black box. More details can be found in [16].

The usage of the corner structure in the interval tree is as follows. For each of the $O(b^2)$ multi lists of an internal node, if there are at least $B/2$ intervals, we directly store the list in disk as before; otherwise, the ($< B/2$) intervals are maintained in a corner structure associated with the internal node. Since there are $O(b^2) = O(B)$ multi lists, and only those with fewer than $B/2$ intervals are maintained in the corner structure, the number of intervals in the corner structure of an internal node is less than $B^2/2$, satisfying the restriction of the corner structure. In summary, the height of the interval tree is $O(\log_B N)$, and using the corner structure, the space needed by the interval tree is $O(N/B)$ blocks in disk, which is worst-case optimal.

2.2 Query Algorithm

We now review the query algorithm given in [5], which is very simple. Given query value $q$, we perform the following recursive process starting from the root of the interval tree. For the current node $u$ that we want to visit, we read it from disk (recall that $u$ is a disk block). If $u$ is a leaf, we just check the $O(B)$ intervals stored in $u$, report those intervals containing $q$, and stop. If $u$ is an internal node, we perform a binary search for $q$ on the keys (slab boundaries) stored in $u$ to identify the slab $E_i$ containing $q$. Now we want to report all intervals belonging to $u$ that contain $q$. We check the $i$-th left list $left(i)$ of $u$, scan through the list from the beginning and report intervals, until we reach some interval whose left endpoint value is larger than $q$. Recall that each left list is sorted by increasing left endpoint values of the intervals. Similarly, we check the right list $right(i)$ of $u$, scan through the list from the beginning and report intervals, until we reach some interval whose right endpoint value is smaller than $q$. Again, recall that each right list is sorted by decreasing right endpoint values of the intervals. Also, all intervals that span slab $E_i$ must contain $q$ and thus must be reported. We carry out this task by reporting all intervals in multi lists $multi([\ell, r])$, where $1 \le \ell \le i$ and $i \le r \le b-2$. Finally, we visit the $i$-th child $u_i$ of node $u$, and recursively perform the same process on $u_i$ (if child $u_i$ is null then we stop). An example is given in Fig. 2.
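The per-node work can be sketched as follows. The dict layout is a hypothetical in-memory stand-in for one disk block (the keys) plus its secondary lists; in the real structure each list scan costs one block read per $B$ reported intervals.

```python
# Sketch of the query at one internal node: find the slab E_i holding
# q, scan left(i) while left endpoints <= q, scan right(i) while right
# endpoints >= q, and report every multi list whose multi-slab spans E_i.

from bisect import bisect_right

def query_node(node, q):
    i = bisect_right(node["keys"], q)          # slab E_i containing q
    out = []
    for iv in node["left"][i]:                 # sorted by increasing left
        if iv[0] > q:
            break
        out.append(iv)
    for iv in node["right"][i]:                # sorted by decreasing right
        if iv[1] < q:
            break
        out.append(iv)
    for (l, r), lst in node["multi"].items():  # multi-slabs spanning E_i
        if l <= i <= r:
            out.extend(lst)
    return out, i                              # then recurse on child u_i

node = {
    "keys": [10, 20, 30, 40],                      # b = 5 slabs E_0..E_4
    "left":  [[], [(12, 25)], [(22, 45)], [], []],
    "right": [[], [], [(12, 25)], [], [(22, 45)]],
    "multi": {(3, 3): [(22, 45)]},                 # (22, 45) fully spans E_3
}
print(query_node(node, 24)[0])  # -> [(22, 45), (12, 25)]: both contain 24
```

Every interval reported at this node is found by a sequential scan that stops at the first non-match, which is what makes the cost output-sensitive.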

Since the height of the tree is $O(\log_B N)$, we only visit $O(\log_B N)$ nodes of the tree. We also visit the left, right, and multi lists for reporting intervals. Let $K$ be the total number of intervals reported. Then the query algorithm reads roughly $O(\log_B N + K/B)$ disk blocks. Now let us discuss the theoretically worst-case situations concerning underflow blocks in the lists. An underflow block in the left or right list is fine: since we only visit one left list and one right list per internal node, we can charge the I/O cost to that internal node. But this charging argument does not work for the multi lists, since we may visit many multi lists for an internal node. This problem is again solved in [5] by using the corner structure of [16] mentioned at the end of Section 2.1: underflow multi lists of an internal node $u$ are not accessed, but are collectively taken care of by performing a query on the corner structure of $u$. Thus the query algorithm achieves $O(\log_B N + K/B)$ I/O operations, which is worst-case optimal.

2.3 Preprocessing Algorithms

We first describe a new preprocessing algorithm achieving nearly optimal $O(\frac{N}{B}\log_B N)$ I/O operations. It is based on a paradigm we call scan and distribute, inspired by the distribution sweep I/O technique [9, 13]. Later in this section we discuss how to make the algorithm I/O-optimal, as well as other preprocessing methods that are I/O-optimal.



Figure 2: Performing a query in the example of Fig. 1 (considering only the intervals shown). First we visit the root $u$. Query point $q$ lies in slab $E_2$. We check $left(2)$ and report nothing. We then check $right(2)$ and report $I_1$ and $I_2$. We also report all intervals in $multi([1,2])$, $multi([1,3])$, $multi([2,2])$, and $multi([2,3])$, which reports $I_7$. Finally, we want to recurse on node $u_2$, but it is null, so we stop.

Scan and Distribute Preprocessing Algorithm

The basic scan and distribute algorithm follows the definition of the interval tree given in Section 2.1. We start by duplicating each interval record, one with the left endpoint as the key and the other with the right endpoint as the key. We then sort (using external sorting) all these endpoints by the keys from left to right in increasing order (breaking ties by cell IDs). This gives the sorted set of endpoints $E$. This set is used to decide the range and the slab boundaries of the current node throughout the process. Again, we use $n$ to denote the number of blocks in $E$. The set $S$ of intervals sorted by increasing left endpoint values is initially created by copying and filtering out the right endpoints from $E$.

Now we use a recursive procedure to build the tree, as follows. The root $u$ of the tree is associated with the entire range of $E$ and the entire interval set $S$. If $S$ has no more than $B$ intervals, then we make node $u$ a leaf, store those intervals in $u$, and stop. Otherwise, we make $u$ an internal node. We extract the $b-1$ endpoints from $E$ as slab boundaries which evenly divide $E$ into $b$ slabs $E_0, \ldots, E_{b-1}$ of $\lceil n/b \rceil$ blocks each. We then scan through the set $S$; for each interval $I$ being examined, we perform a binary search for each of the two endpoints on the slab boundaries, and decide whether $I$ lies entirely inside some slab $E_i$ (in which case we have to put $I$ into set $S_i$), or crosses some slab boundary (in which case $I$ belongs to node $u$ and we want to put $I$ into the appropriate left, right and/or multi lists). We use a temporary list for each $S_i$, and similarly for each list $left(i)$, $right(i)$, and $multi([\ell, r])$ of node $u$. Each temporary list is kept in consecutive blocks in disk, and is also associated with one buffer, of one disk block in size, in the main memory.

Notice that we only need O(B) blocks in the main memory for the buffers. In our actual implementation, we only need 4b buffers instead; see Section 3. Each time we want to put the current interval I into some set S_i or some left/right/multi list, we just insert I into the buffer of the corresponding temporary list. When the buffer is full, we write the buffer out to the corresponding temporary list in disk, and the buffer is again available for use.
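A one-block output buffer of this kind might look as follows. This is a sketch under assumed names; the real implementation writes fixed-size interval records into a per-list region of consecutive blocks in a scratch file (Section 3), and here records are simply byte strings.

```python
BLOCK = 4096                              # disk block size in bytes
B = 53                                    # intervals per block (see Section 3)

class BufferedList:
    """One temporary list: a one-block in-memory buffer plus a region of
    consecutive blocks in a scratch file (hypothetical layout)."""

    def __init__(self, f, start_block):
        self.f = f                        # scratch file, opened for writing
        self.next_block = start_block     # next block in this list's region
        self.buf = []                     # in-memory buffer of records

    def append(self, record):             # record: bytes for one interval
        self.buf.append(record)
        if len(self.buf) == B:            # buffer full: one block write
            self.flush()

    def flush(self):
        if not self.buf:
            return
        self.f.seek(self.next_block * BLOCK)
        # pad to a whole block, since disk space is an integral number of blocks
        self.f.write(b"".join(self.buf).ljust(BLOCK, b"\0"))
        self.next_block += 1
        self.buf = []
```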

After scanning through the entire interval set S, each interval in S is distributed to its appropriate temporary list. Observe that originally S is sorted by increasing left endpoint values of the intervals; thus after this scan and distribute process is done, each temporary list is automatically sorted by increasing left endpoint values as well. We then sort (using external sorting) each right list by decreasing right endpoint values of the intervals, so that all temporary lists are in the desired sorted orders. After this, we copy each temporary list back to its corresponding list in disk, and set up appropriate information (e.g., the number of intervals and a pointer to the starting position in disk of each list, etc.) in node u. Finally, we write node u to the disk, and recursively perform the same procedure on each child u_i (of node u) with the range of slab E_i and interval set S_i (if S_i is empty then node u_i is null).

Let us analyze the complexity of the above algorithm. The initial external sorting on the endpoint set E takes O((N/B) log_{M/B} (N/B)) I/O operations [1]. Recall that M is the number of cells (intervals) that can fit into the main memory. During tree construction, we perform O(N/B) I/O operations for the scan and distribute process at each level of the tree, and thus O((N/B) log_B N) I/O's in total. We also externally sort all right lists during tree construction. Since each interval belongs to exactly one right list, this takes O(Σ_i (N_i/B) log_{M/B} (N_i/B)) = O((N/B) log_{M/B} (N/B)) I/O operations. Noting that usually M ≥ B², the total I/O cost of the basic scan and distribute algorithm is O((N/B) log_{M/B} (N/B)) + O((N/B) log_B N) = O((N/B) log_B N).

Since the tree construction implies a sorting of the intervals, the optimal I/O bound for preprocessing is the external sorting bound Θ((N/B) log_{M/B} (N/B)) [1]. As pointed out by Arge [3], the bound O((N/B) log_B N), achieved by the basic scan and distribute algorithm just described, can also be achieved by using a buffer technique with a buffer tree [2]. Moreover, both our scan and distribute method and the buffer technique can be made I/O-optimal, by basically building a √(M/B)-fan-out tree and converting it into a √B-fan-out tree during tree construction [3]. For example, in our scan and distribute method, we can use the same procedure above to build a √(M/B)-fan-out tree (in which the range of each node is partitioned into √(M/B) slabs). Notice that we now need O(M/B) blocks in main memory for the buffers in building this √(M/B)-fan-out tree, as opposed to using O(B) blocks for the buffers in building a √B-fan-out tree as described before. The total number of I/O's per level is still O(N/B), and thus the overall number of I/O's is the optimal O((N/B) log_{M/B} (N/B)), since there are O(log_{√(M/B)} (N/B)) levels. Each time we build a node u for such a tree, we actually construct a √B-fan-out subtree of height log_{√B} √(M/B) (recall that node u has √(M/B) slabs), and re-assign the intervals of node u to new nodes and new left, right and multi lists accordingly. The overall height of the final tree is thus O([log_{√(M/B)} (N/B)] · [log_{√B} √(M/B)]) = O(log_B N), as desired.

Interval-to-node Assignment Preprocessing Algorithm

Independently, we also devised a completely different method based on an interval-to-node assignment scheme, achieving the same optimal I/O bound. This time we construct the √B-fan-out tree directly. After sorting the 2N endpoints into set E in increasing endpoint values as before, each node of the tree is associated with a certain fixed range of the endpoints in E. In particular, each leaf corresponds to a basic range of B endpoints. Thus we can first build the entire "skeleton" of the tree (i.e., tree nodes and the slab boundaries of each node), and then assign the intervals one by one to the appropriate nodes. As we assign interval I to node u, we produce an item (I, node ID of u); at completion, we externally sort all such items by node ID's so that all intervals assigned to the same node are put together. Finally, we put the intervals assigned to each node u into the appropriate left, right, and multi lists of u; this completes the construction of the tree.

Now consider how to assign intervals to nodes. A naive way is to scan and assign all intervals one by one as follows. For each current interval I, we decide the destination node of I by searching along a root-to-leaf path. Starting from the root, we read the current node u from disk, and perform binary searches for the two endpoints of I on the slab boundaries of u. If the two endpoints lie in different slabs, then I belongs to u and we stop; otherwise the two endpoints lie in the same slab, say slab j of node u, and we recursively perform the same process on the j-th child of node u. Observe that this naive approach is very undesirable, because for each interval I we need to read O(log_B N) nodes from a root-to-leaf path, resulting in a total of O(N log_B N) I/O's.

Our method follows the above naive approach, with one major difference: instead of reading the nodes in the root-to-leaf search path from disk, we generate the node ID's and the slab boundaries of such nodes by in-core computation. This enables us to avoid the O(log_B N) disk reads for each interval, and thus by scanning all N intervals with O(N/B) I/O's, we can assign all intervals to their destination nodes. Notice that the I/O cost is reduced from O(N log_B N) to O(N/B).

The node ID's are easily computed if we use a breadth-first numbering of the nodes, say, giving 0 to the root, 1, ..., b to the children of the root from left to right, and so on. As for the slab boundaries of each node, the key idea is to use the basic range indices associated with the slab boundaries, rather than the actual endpoint values of the slab boundaries. Accordingly, we also need to convert the actual values of the two endpoints of each interval I into the indices of the two basic ranges containing the two endpoints of I. Then, during the root-to-leaf search for the current interval I, when we want to locate the slab(s) of the current node u containing the two endpoints of I, the binary searches, for the two endpoints of I on the slab boundaries of u, are performed in the "metric space" of basic range indices.

We can calculate the basic range indices of the slab boundaries along any root-to-leaf path of nodes as follows. The root corresponds to the entire range of ⌈2N/B⌉ basic ranges, and thus its slab boundary separating slab i and slab i+1 is the one separating basic range ⌈⌈2N/B⌉/b⌉·(i+1)−1 and basic range ⌈⌈2N/B⌉/b⌉·(i+1). Similarly, the basic range indices of the slab boundaries of each node in any root-to-leaf path can be computed, using the fact that each time we go down one level, the range of a slab is reduced by a factor of b.
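Putting the breadth-first node IDs and the slab-width arithmetic together, the in-core root-to-leaf search might look like the sketch below. The function name assign and the exact rounding of slab widths are our assumptions; the paper specifies only the idea (compute IDs and boundary indices arithmetically, with no disk reads).

```python
import math

def assign(e_left, e_right, R, b):
    """Return the BFS node ID that an interval belongs to, where e_left and
    e_right are the basic-range indices of its two endpoints, R is the total
    number of basic ranges (ceil(2N/B)), and b is the fan-out.
    BFS numbering: root is 0, the i-th child of node u is u*b + 1 + i."""
    node, lo, hi = 0, 0, R                 # root covers basic ranges [0, R)
    while True:
        w = math.ceil((hi - lo) / b)       # basic ranges per slab at this node
        i = (e_left - lo) // w             # slab of the left endpoint
        j = (e_right - lo) // w            # slab of the right endpoint
        if i != j or hi - lo <= 1:
            return node                    # endpoints straddle a boundary: stop
        node = node * b + 1 + i            # BFS ID of the i-th child
        lo, hi = lo + i * w, min(lo + (i + 1) * w, hi)
```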

Now we show how to obtain the indices of the basic ranges containing the two endpoints of each interval I. In the sorted set E of 2N endpoints, we mark the first B endpoints as lying in basic range 0, the next B endpoints as in basic range 1, and so on. Now each endpoint item has its interval ID and its basic range index; we externally sort all these items by interval ID's so that the two endpoint items of the same interval are put together. This enables us to know the indices of the two basic ranges containing the two endpoints of each interval.
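As a small in-memory illustration of this step (toy values; the record layout and the function name endpoint_ranges are ours, and an ordinary sort stands in for the external sort by interval ID):

```python
B = 4  # endpoints per basic range (B = 53 in the actual implementation)

def endpoint_ranges(intervals):
    """Map each interval ID to the basic-range indices of its two endpoints."""
    # E: (endpoint value, interval ID) pairs, sorted by value (ties by ID)
    E = sorted((v, iid) for iid, (lo, hi) in enumerate(intervals)
                        for v in (lo, hi))
    # the basic-range index of an endpoint is simply its rank divided by B
    items = [(iid, rank // B) for rank, (v, iid) in enumerate(E)]
    items.sort()                          # stands in for the external sort by ID
    # the two items of the same interval are now adjacent: pair them up
    return {items[k][0]: (items[k][1], items[k + 1][1])
            for k in range(0, len(items), 2)}
```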

Putting together all the steps described above, we can easily see that the total number of I/O's in the entire preprocessing is bounded by the external sorting bound, which is optimal.

In summary, the scan and distribute method, the buffer technique, and the interval-to-node assignment scheme can all achieve the optimal I/O bound O((N/B) log_{M/B} (N/B)) for preprocessing.

Remark. Consider the I/O bound O((N/B) log_B N) and the optimal one O((N/B) log_{M/B} (N/B)). Although the former is not optimal, it is actually nearly optimal: in practice, log_B(·) is almost the same as log_{M/B}(·); in particular, the log_B(·) term corresponds to the tree height, which is at most 4 in all our experiments. Therefore changing the tree fan-out (and thus the tree height) in scan and distribute would not be necessary, and we only implement the basic scan and distribute algorithm as our preprocessing code. As for the basic buffer technique (i.e., without changing the tree fan-out), the disk scratch space needed during preprocessing might potentially be less, and yet the two-phase buffer-emptying operation in the buffer tree [2] is more complicated than scan and distribute and might potentially slow down the algorithm. Similarly, the interval-to-node assignment scheme might potentially use less disk scratch space, and yet the extra sortings and the index computations might potentially result in more running time. We leave the question of how well the basic buffer technique and the interval-to-node assignment scheme perform in practice compared to the basic scan and distribute algorithm as an interesting open question.


3 Implementation

We implemented our methods as an I/O filter for Vtk's isosurface extraction for the case of unstructured grids. First, ioBuild is used to preprocess the dataset to build the interval tree, and then ioQuery is used to report the active cells of a given isosurface. We first describe how to implement the interval tree, and then describe the interface with Vtk.

3.1 Implementing the External Memory Interval Tree

Data Structure

We describe the organization of the data structure. We use files dataset.intTree, dataset.left, dataset.right, and dataset.multi to hold the interval tree nodes, all left lists, all right lists, and all multi lists, respectively. Every time we create a new tree node, we allocate the next available block from file dataset.intTree and store the node there. (The root of the tree always starts from position 0 of file dataset.intTree.) Similarly, every time we create a new left (resp. right/multi) list, we allocate the next available consecutive blocks just enough to hold the list, from file dataset.left (resp. dataset.right/dataset.multi) and store the list there. Notice that we always allocate disk space of size an integral number of blocks. Each block in files dataset.left, dataset.right and dataset.multi stores up to B intervals. Each interval contains a cell ID, four vertices of the cell (x-, y-, z- values and the scalar value of each vertex), and the left and right endpoints of the interval associated with the cell, which are the min and max values of the four scalar values. In our case with one disk block being 4,096 bytes, a block can store up to 53 intervals, i.e., B = 53.
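The stated B = 53 is consistent with assuming 4-byte integers and floats for every field, which we can check with a back-of-the-envelope calculation (the field sizes are our assumption; the paper only gives the resulting B):

```python
INT, FLOAT = 4, 4                          # assumed field sizes in bytes

# one interval record: cell ID + 4 vertices of 4 floats (x, y, z, scalar)
# + the two interval endpoints (min/max of the four scalar values)
record = INT + 4 * 4 * FLOAT + 2 * FLOAT   # = 76 bytes

B = 4096 // record                         # intervals per 4,096-byte block
```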

Now we describe the layout of the interval tree nodes, each of size one disk block. Each node has a flag to indicate whether it is a leaf or an internal node.

– If it is a leaf, the rest of the node contains the following:

(1) the number (≤ B) of intervals stored in the node;

(2) the actual intervals stored.

– If it is an internal node, the rest of the node contains the following:

(1) the number (≤ b − 1) of keys (i.e., slab boundaries) stored;

(2) the actual slab boundaries stored;

(3) b pointers to the starting positions of the children nodes in file dataset.intTree (−1 if that child is null);

(4) information about its b left lists;

(5) information about its b right lists;

(6) information about its (b − 1)(b − 2)/2 multi lists.

The information about each left list includes the following: (a) a pointer to the starting position of the list in file dataset.left, (b) the number of intervals in the list, and (c) the minimum left endpoint value among all intervals of the list (which is just the left endpoint value of the first interval in the list, according to the sorted order of the list). Item (c) is used to speed up the query: when we need to check the query point q against this left list, if q is smaller than the value stored here, then no interval in the list will contain q and thus we can avoid reading any block of the list. The information about each right list is similar. The information about each multi list is also similar, but does not contain item (c), since during queries a multi list is reported as a whole with no checking necessary (recall from Section 2.2). From the size of each data type of each field and the fact that a node holds at most 4,096 bytes, we can compute the branching factor b of the tree, which in our case is 29 (observe that b > √B here).
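Although the paper does not list the individual field sizes, assuming 4 bytes for every field in the internal-node layout above reproduces b = 29; the accounting below is our assumption, shown only as a plausibility check.

```python
F = 4                                      # assumed size of every field, in bytes

def node_size(b):
    """Bytes needed by an internal node with fan-out b (assumed layout)."""
    return (2 * F                          # leaf/internal flag + key count
            + (b - 1) * F                  # slab boundaries (keys)
            + b * F                        # child pointers
            + 2 * b * 3 * F                # left/right list infos: pointer,
                                           #   count, min (resp. max) endpoint
            + (b - 1) * (b - 2) // 2 * 2 * F)  # multi list infos: pointer, count

# largest fan-out whose node still fits in one 4,096-byte block
b = max(k for k in range(2, 200) if node_size(k) <= 4096)
```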

For simplicity of the coding, we currently do not implement the corner structure.

Preprocessing

External sorting is used extensively during preprocessing, both in normalization (see the beginning of Section 2) and in tree construction (see Section 2.3). We use external merge sort for this operation, which is I/O-optimal. For the interested reader, see Aggarwal and Vitter [1] for a description of the algorithm under the I/O model we are using, and Knuth [17, Section 5.4] for an excellent reference for full details. In our implementation, since the value k in the k-way merging procedure is fairly large (k = ⌊M/B⌋), we use a simple "selection tree", implemented as a binary tree in the main memory, to select the next smallest element from the k sorted lists currently being merged. Also, in the selection tree, we operate on the pointers to the elements, rather than the elements themselves, to avoid moving large records around in the main memory. Knuth [17, Section 5.4] discusses several methods for achieving optimal merging schemes, even taking into account the tape and disk delays.
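The k-way merge step can be sketched as follows. We use a binary heap in place of the hand-built selection tree; it plays the same role, in that only small (key, run index) entries are compared and moved, never whole records. The in-memory lists stand in for the sorted runs on disk.

```python
import heapq

def kway_merge(runs):
    """Merge k sorted runs into one sorted sequence."""
    # seed the heap with the first element of each non-empty run
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        key, i, pos = heapq.heappop(heap)      # next smallest over all runs
        out.append(key)
        if pos + 1 < len(runs[i]):             # refill from the same run
            heapq.heappush(heap, (runs[i][pos + 1], i, pos + 1))
    return out
```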

For constructing the interval tree, as remarked at the end of Section 2.3, we only implement the basic scan and distribute algorithm as our tree-construction code, which is nearly I/O-optimal. There are some interesting issues involved in this implementation. Recall from Section 2.3 that during the scan and distribute process for the current node u (associated with endpoint set E and interval set S), we use a temporary list for each of the b left lists, b right lists, (b − 1)(b − 2)/2 multi lists, and b interval sets S_i. A first attempt would be to use a file for each temporary list. This would require us to open 3b + (b − 1)(b − 2)/2 files at the same time, since no temporary list is completed until one pass of the scan and distribute process is done. Unfortunately, there is a hard limit imposed by the operating system on the number of files a process can open simultaneously (given by the system parameter OPEN_MAX; older versions of Unix allowed up to 20 open files, and this was increased to 64 by many systems).

Our solution to this problem is to use a scratch file as a collection of the temporary lists of the same type. For example, we use file dataset.left_temp to collect all temporary lists for the b left lists. Observe that all intervals belonging to left(i) (and thus belonging to the temporary list of left(i)) must have their left endpoints lying in slab E_i, but there are no more than ⌈n/b⌉ blocks of endpoints in slab E_i, where n is the number of blocks in E. Therefore, each temporary list has at most ⌈n/b⌉ blocks of intervals, and thus we let the i-th temporary list start from block i·⌈n/b⌉ of file dataset.left_temp, for i = 0, ..., b − 1. Notice that the size of file dataset.left_temp is no more than the size of E. After the construction of all temporary left lists is complete, we copy them to the file dataset.left, and the scratch file dataset.left_temp is again available for use in the next recursion.

We handle the temporary lists for the right lists in the same way, except that before copying to file dataset.right, each temporary right list has to be sorted first (in the order of decreasing right endpoint values). In the same way, each interval set S_i for child u_i of node u has at most ⌈n/b⌉ blocks. We scan the entire file for S (file dataset.intvls) and distribute the intervals to the appropriate temporary lists for each S_i (i.e., appropriate portions of the scratch file holding a collection of all temporary lists for the S_i's), and then copy the temporary lists back to the corresponding portions of file dataset.intvls. Now each set S_i is just the portion of file dataset.intvls starting from block i·⌈n/b⌉ with no more than ⌈n/b⌉ blocks. Each set S_i, together with slab E_i, is then used as input for the next level of recursion.


Finally, consider the construction of the multi lists for the current node u. By the same argument, each list has at most ⌈n/b⌉ blocks. Unfortunately, there are (b − 1)(b − 2)/2 such lists. If we collected all temporary lists into a single scratch file by the above method, then this scratch file would have size Θ(nb) blocks, which is definitely undesirable. To solve the problem, observe that multi([i, j]) consists of all intervals with left endpoints in the same slab E_{i−1}. Therefore, we construct all multi lists multi([i, j]) for a fixed i from the left list left(i − 1) (again by a scan and distribute process), and repeat the process b − 2 times for all possible values of i. Then during each iteration, there are at most b multi lists being constructed, and thus our scratch file only needs n blocks of space. The number of buffers in main memory needed for constructing the multi lists is also reduced from (b − 1)(b − 2)/2 to b.

In summary, to construct the left, right, and multi lists and the interval sets S_i for the children, we use four scratch files, each of size n blocks, and also 4b blocks of main memory as buffers.

Handling Degeneracies

The issue of degenerate cases arises when the endpoint values of the intervals are not distinct. As described before, we use cell ID's to break ties. We also adopt the convention that if an endpoint is exactly the slab boundary separating slabs E_{i−1} and E_i, then this endpoint is considered as lying in slab E_i; this is consistent with our choice of slab boundaries (see Section 2.1). In the internal nodes of the interval tree, we only store the endpoint values as slab boundaries (keys), without storing the corresponding cell ID's. During query operations, if the query value q has the same value as some slab boundary, we consider all slabs that can possibly contain the value of q, and perform the query operation on all such slabs accordingly. This ensures that all answers are correctly reported. Notice that if several slab boundaries have the same value as that of q, we only need to go to the two children respectively to the left of the leftmost such slab boundary and to the right of the rightmost such boundary. This is because all other children in between can only contain intervals whose two endpoint values are the same (same as q); the corresponding cells of such intervals are thus lying in the interior of the isosurface and therefore are not interesting cells to be reported.

3.2 Interfacing with Vtk

Extracting isosurfaces with ioBuild and ioQuery is relatively simple. Recall that ioBuild is used to preprocess the dataset (see Fig. 3), and then ioQuery is used for reporting the active cells of a given isosurface (see Fig. 4).

The input to ioBuild is a Toff file³, which unfortunately contains indices to vertices. Therefore, the first part of ioBuild is to normalize the Toff file, de-referencing these indices as described in the beginning of Section 2, before the actual construction of the interval tree can begin. Since there are four vertices in each cell (tetrahedron), four passes over the input are necessary. If the files are already given in de-referenced form, the first part of ioBuild would not be necessary.

A full isosurface extraction pipeline should include several steps in addition to finding active cells. In particular, (1) intersection points and triangles have to be computed; (2) triangles need to be decimated [27]; and (3) triangle strips have to be generated. Steps (1)–(3) can be carried out by the existing code in Vtk [26], which makes it a perfect match for a proof-of-concept implementation of our I/O techniques. Our current code only implements the actual triangle generation. Using Vtk's simple pipeline scheme, it is a simple programming exercise to further process the triangulation, decimate it, and create the strips.

The ioQuery code is implemented by linking our I/O querying code with Vtk's isosurface generation, as shown in Fig. 4. Given an isovalue, (1) all the active cells are collected from disk; (2) a vtkUnstructuredGrid

³A Toff file is analogous to the Geomview "off" file. It has the number of vertices and tetrahedra, followed by a list of the vertices and a list of the tetrahedra, each of which is specified using the vertex locations in the file as an index. See [30].


Figure 3: Preprocessing phase. Given "dataset", which is an input volumetric dataset containing tetrahedral cells, ioBuild creates the four files of the interval tree (dataset.intTree, dataset.left, dataset.right, dataset.multi) to be used later during isosurface queries.

Figure 4: Isosurface extraction phase. Given the four files of the interval tree and an isovalue, ioQuery filters the dataset and passes to Vtk only those active cells of the isosurface. Several Vtk methods are used to generate the isosurface, in particular, vtkUnstructuredGrid, vtkContourFilter, and vtkPolyMapper.

is generated; (3) the isosurface is extracted with vtkContourFilter; and (4) the isosurface is saved in a file with vtkPolyMapper. At this point, memory is deallocated. If multiple isosurfaces are needed, this process is repeated. Note that this approach requires double buffering of the active cells during the creation of the vtkUnstructuredGrid data structure. A more sophisticated implementation would be to incorporate the functionality of ioQuery inside the Vtk data structures and make the methods I/O aware. This should be possible due to Vtk's pipeline evaluation scheme (see Chapter 4 of [26]).

4 Experimental Results

In this section we present experimental results of actively using our I/O filtering techniques on real datasets. We have run our experiments on the four different datasets shown in Table 1.⁴ All of these datasets are tetrahedralized versions of well-known datasets. Our primary interest in this initial implementation was to quantify the I/O overhead, if any, both in terms of memory and time, and to compare with Vtk's native isosurface implementation.

Our benchmark machine was an off-the-shelf PC: a Pentium Pro, 200MHz with 128M of RAM, and two EIDE Western Digital 2.5Gb hard disks (5400 RPM, 128Kb cache, 12ms seek time). Each disk block size is 4,096 bytes. We ran Linux (kernel 2.0.27) on this machine. One interesting property of Linux is that it allows, during booting, the specification of the exact amount of main memory to use. This allows us to fake for the isosurface code a given amount of main memory to use (after this memory is completely used, the system will start to use disk swap space and have page faults).

⁴Our beta code is currently only available upon request. We plan to release it to the public on our web page after some minor improvements.


Name                 # of Cells   Original Size   Size after Normalization

Blunt Fin              187K          5.8M            12M
Combustion Chamber     215K          6.8M            13M
Liquid Oxygen Post     513K         16.4M            33M
Delta Wing           1,005K         33.8M            64M

Table 1: A list of the datasets used for testing. After normalization, each cell is 64 bytes long (3D coordinates and scalar fields are represented as floats).

Preprocessing with ioBuild

Using ioBuild is a fully automatic process. The only argument is the name of the input file containing tetrahedral cells. During its execution, ioBuild creates and writes multiple files, but in the end only the four files shown in Fig. 3 are kept.

In analyzing the behavior of external memory algorithms, it is very important to take into account the amount of main memory used by the algorithm. In general, the more memory it is given, the fewer I/O operations it needs. For instance, when sorting a file that can be kept entirely in main memory, we just need to touch the file twice, once for reading, and once for writing. As the available main memory gets smaller, we need to perform more I/O operations. The scalability of an external memory algorithm is best seen when the main memory is smaller than the dataset. In order to simulate this, we only allow ioBuild to allocate 1,024 blocks (4 Mb) of RAM. This is a parameter in the program.

The first time we ran ioBuild, it just seemed to be too fast for the number of I/O reads and writes ioBuild was issuing. It turned out that the operating system was able to optimize (by caching) a lot of those I/O requests, and the CPU was running at nearly 95% usage. In order to avoid these side effects of the operating system, we lowered the amount of main memory of our system by starting Linux with a "linux mem=16M" command line at kernel boot time. Basically, about 14M of main memory can actually be used by applications, and during the time of our benchmarks, our system was fully functional (i.e., it was not in single user mode; all system-related daemons were active). That is, we ran our experiments in a "normal" environment on datasets with size equal to or larger than the main memory. (Remember that only 4M of RAM were actually used by ioBuild.) The actual CPU usage percentage during this preprocessing was in the range of low teens.

In Table 2 we show all relevant experimental data obtained from running ioBuild with a 16M/4M configuration. Recall from Section 3.2 that the first part of ioBuild is to normalize (i.e., de-reference) the input Toff file (which in turn amounts to a few external sorting operations), and the second part is to actually construct the interval tree. The basic operations of ioBuild are hence external sorting and scanning (the scan and distribute process described in Section 2.3). It should be clear from Table 2 that the overall running time of ioBuild is the same as performing external sorting a few times, and is linear in the size of the datasets. Therefore ioBuild is both efficient and predictable: for a given system configuration, one can estimate (or extrapolate) the overall running time based on a linear behavior. This preprocessing can be made much faster by using faster disks, and optimized to use more memory. For instance, on an SGI Power Challenge, it only takes 10 minutes to preprocess the Delta Wing dataset.

The interval tree data structure requires more disk space than the normalized tetrahedron file, which is also larger than the original Toff file. The average increase in disk space is about 7.7 times, and the maximum increase is 8.5 times (see Table 2). During the preprocessing time, scratch files used for computation are necessary. In general, one needs about 16 times the amount of disk space of the original Toff file to generate


                         Blunt   Chamber      Post     Delta

intTree                   303K      147K      238K      750K
left                     15.6M     17.3M     41.0M     80.5M
right                    15.6M     17.3M     41.0M     80.5M
multi                    17.6M     20.0M     34.6M     80.6M
Total Size               49.2M     54.7M    116.8M    242.3M
Original Size             5.8M      6.8M     16.4M     33.8M
Ratio of Size Increase     8.5       8.0       7.1       7.2

Normalization             348s      465s      920s     1798s
Tree Construction         361s      391s      928s     1982s
Total Time                709s      856s     1848s     3780s

Page Ins                  103K      115K      270K      570K
Page Outs                 109K      123K      286K      605K

Table 2: Statistics for running ioBuild on a machine with 16M of main memory and 4M of buffer memory, for the datasets in Table 1. The first four values are the sizes of the four files kept after the preprocessing. "Total Size" is the total amount of disk space used after preprocessing. "Normalization" is the time used to convert the input Toff file to a normalized (i.e., de-referenced) file. "Tree Construction" is the actual time used to create the interval tree data files from the normalized file. "Total Time" is the overall running time of the whole preprocessing. "Page Ins" and "Page Outs" are the numbers of disk block reads and writes requested to the operating system.

the interval tree. Notice, however, that this extra disk space for scratch files is needed once and for all, since after running ioBuild once to build the interval tree files, these tree files can be duplicated by just copying, without ever running ioBuild again.

Because it might be necessary to use as much as 16 times the original disk space for preprocessing a given dataset, we believe that in large production environments, large scratch areas should be available for preprocessing. We see this as a minor cost for the overall savings in both time and space later on. One should also note that disk prices are on the order of 35-40 times lower than main memory prices. So the overall cost of a four- to eight-fold increase in disk space overhead is negligible when compared to a twofold increase in main memory costs. (In [29], a twofold main memory overhead is reported for improved, although not optimal, isosurface generation times.) Again, we would like to stress the overall speed advantage in isosurface extraction time our technique provides, since it only performs the preprocessing once.

Isosurface Extraction with ioQuery

The true performance of an isosurface extraction technique is the actual time it takes to generate a given isosurface. As explained before, our code is coupled with Vtk (see Fig. 4). Basically, during isosurface extraction, we find the active cells, and use Vtk's isosurface capabilities to actually generate the isosurface. In the following, we use ioQuery to denote the entire isosurface extraction code.

We ran three batteries of tests, each with a different amount of core memory (128M, 64M, and 32M). Each test consists of calculating 10 isosurfaces with isovalues in the range of the scalar values of each dataset, using our code and also the Vtk-only isosurface code (denoted by vtkiso). We did not run X-windows during the isosurface extraction time, and the output of vtkPolyMapper was saved in a file instead. Some representative isosurfaces are shown in Fig. 8.


We found that our approach has several advantages. Initially we thought that our method would only be useful for really large datasets, but our experiments show that even for smaller datasets the I/O querying time is negligible when compared to the overall isosurface extraction time.

Another advantage is that, for large isosurfaces, the memory requirement of the triangle mesh is very large. If the entire volume dataset is kept in the main memory, then even if the dataset itself can fit, adding the triangle mesh might make the main memory not large enough and cause the system to swap. The fact that ioQuery only uses two blocks of main memory during active-cell searching makes our approach very favorable: the main memory space can be saved for the triangle mesh of the isosurface, so that system swapping is much less likely to occur.

We summarize in Table 3 the total running times for the extraction of the 10 isosurfaces using ioQuery and vtkiso with different amounts of main memory. It should be clear that for larger datasets (e.g., Post and Delta) ioQuery is much faster, and the margins increase significantly as the main memory size goes down.

Figures 5 to 7 summarize detailed benchmarks, with the left columns presenting the performance of running ioQuery, and the right columns the performance of vtkiso. For each isosurface calculated using ioQuery, we break the time into four categories:

(1) I/O time (the bottommost part, shown in red) – This is the time to identify and bring in from disk the active cells of the isosurface in question.

(2) Copy Time (the second part from bottom, shown in yellow) – In order to use Vtk's isosurface capabilities, we need to generate a vtkUnstructuredGrid object that contains the active cells we just obtained. We refer to the time for this process as "Copy Time".

(3) Isosurface (the third part from bottom, shown in blue) – This is the time for Vtk's isosurface code to actually generate the isosurface from the active cells.

(4) File Output (the topmost part, shown in green) – In the end we write to disk a file containing the actual isosurface in Vtk format.

As for the performance of vtkiso, only items (3) and (4) are shown in Figs. 5–7, and there are two additional costs not shown: the reading of the dataset, and the generation of the vtkUnstructuredGrid. These two operations are performed only once when vtkiso starts to run (i.e., at the beginning of each batch of the 10 isosurface extractions), and thus it is not possible to spread the costs over each individual isosurface extraction. Therefore we do not show them in Figs. 5–7, but only show the sum of the two costs as vtkiso I/O in Table 3. It is very interesting to see (from Table 3) that the vtkiso I/O entries for Post in 64M and 32M, and for Delta in all three memory sizes, are all much larger than the corresponding entries of ioQuery. This means that before vtkiso finishes reading the dataset and generating a vtkUnstructuredGrid, ioQuery has already finished calculating all 10 isosurfaces!

Regarding Figs. 5–7, we make the following observations:

• For ioQuery, in general the I/O time is only a small fraction of the overall isosurface extraction time (see the left columns of Figs. 5–7). In particular, most of the time (1) is smaller than (3), especially as the datasets get larger. This means that the active-cell searching process is no longer a bottleneck; the effect is even more significant for larger datasets. One can see the output-sensitive behavior by noting that when small isosurfaces (or no isosurfaces) are generated, ioQuery takes negligible time.

• For ioQuery, the I/O times almost do not change with the amount of main memory (see the left columns of Figs. 5–7). This shows that our technique for finding active cells needs only a very small amount of main memory, and its performance is independent of the size of the main memory available.


                      Blunt   Chamber     Post    Delta

ioQuery – 128M           7s       16s      18s      31s
vtkiso – 128M           15s       22s      44s     182s
vtkiso I/O – 128M        3s        2s      12s      40s

ioQuery – 64M            9s       17s      21s      31s
vtkiso – 64M            18s       27s     112s    3122s
vtkiso I/O – 64M         5s        6s      67s     224s

ioQuery – 32M           10s       19s      22s      32s
vtkiso – 32M            21s       54s    1563s    3188s
vtkiso I/O – 32M         8s       28s     123s     249s

Table 3: Overall running times for the extraction of the 10 isosurfaces using ioQuery and vtkiso with different amounts of main memory. These include all the time to read the datasets and write the isosurfaces to files. vtkiso I/O is the fraction of the vtkiso time spent reading the dataset and generating a vtkUnstructuredGrid.

• For ioQuery, the overall running times almost do not change with the amount of main memory (see the left columns of Figs. 5–7). We expect that the overall running times will increase only when Vtk needs more main memory than available to store the actual polygons generated (and thus causes page faults). The fact that our technique needs only a very small amount of main memory to find active cells enables us to visualize much larger isosurfaces very efficiently.

• Even without taking into account the two costs of vtkiso (reading the dataset and generating a vtkUnstructuredGrid), comparing the vtkiso and ioQuery times shown in Figs. 5–7 (a comparison that is thus unfair to ioQuery), ioQuery is already much faster for large datasets. For example, for the 8th isosurface of Delta Wing using 128M of RAM, ioQuery takes 11 seconds while vtkiso takes about 34 seconds. For small datasets, the overhead of ioQuery is negligible. But the advantages really start to show as the main memory size goes down. For instance, with 64M, for the 8th isosurface of Delta Wing, ioQuery takes the same 11 seconds while vtkiso takes 300 seconds!

5 Conclusions and Future Work

We have presented an I/O-optimal technique for isosurface extraction from volumetric data. Our method is to preprocess the dataset to build an I/O-optimal interval tree on disk, and then to extract isosurfaces by querying the interval tree for active cells and generating the isosurfaces from those cells. The method is I/O-optimal in space, query time, and preprocessing.
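At the heart of the query phase is interval stabbing: each cell is stored as the interval [min, max] of its scalar values, and the active cells for isovalue q are exactly the intervals containing q. As a rough in-memory illustration only — not the external-memory Arge–Vitter structure the paper actually uses — a centered interval tree answers such queries as follows:

```python
def build_interval_tree(intervals):
    """Recursive centered interval tree over (lo, hi) pairs: an in-memory
    simplification of the stabbing-query idea (NOT the external-memory
    Arge-Vitter structure). Each node stores a split value and the
    intervals that straddle it."""
    if not intervals:
        return None
    endpoints = sorted(p for lo, hi in intervals for p in (lo, hi))
    mid = endpoints[len(endpoints) // 2]          # median endpoint as split value
    left = [iv for iv in intervals if iv[1] < mid]
    right = [iv for iv in intervals if iv[0] > mid]
    center = [iv for iv in intervals if iv[0] <= mid <= iv[1]]
    return {"mid": mid, "center": center,
            "left": build_interval_tree(left),
            "right": build_interval_tree(right)}

def stab(node, q):
    """Report all stored intervals [lo, hi] with lo <= q <= hi; for isosurface
    extraction these are exactly the active cells for isovalue q."""
    out = []
    while node is not None:
        out.extend(iv for iv in node["center"] if iv[0] <= q <= iv[1])
        node = node["left"] if q < node["mid"] else node["right"]
    return out
```

A query walks a single root-to-leaf path, scanning only the straddling intervals at each node; the external-memory version achieves the analogous property in terms of disk blocks.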

We also established the practical efficiency of the technique through experimental results, based on our implementation of an I/O filter for Vtk's isosurface extraction for the case of unstructured grids. We show that the time for searching active cells is less than the time for generating the isosurfaces from active cells, and this search time is independent of the main memory available. Also, with the interval tree built on disk, there is no need to load the entire dataset into main memory. In addition, the preprocessing is performed once and for all; its running time is the same as running external sorting a few times, and is linear in the size of the datasets.

All these features make our technique especially suitable for use with very large datasets, or when there is only a small amount of main memory. In fact, even for smaller datasets with enough main memory, our


method introduces only a negligible overhead. At this point, the major obstacle to production use of our technique is the disk overhead involved

in both preprocessing and dataset storage. This will be the main area of our future research. As can be seen from Table 2, the actual size of the interval tree (i.e., the file dataset.intTree) is not the main storage overhead. In fact, most of the overhead comes from our desire to achieve the optimal I/O bound for querying active cells, since searching active cells was originally the bottleneck in the entire process of isosurface extraction. Now that we have successfully eliminated this bottleneck, with the I/O querying time being less than the CPU time for the generation phase, we are currently considering techniques that reduce the disk storage overhead at the cost of increasing the I/O querying time, as long as the I/O querying still takes no more time than the generation phase. Observe that under this theme, if we overlap the I/O querying with the CPU time, then there would be no cost for I/O.
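The overlap idea in the last sentence can be illustrated with a simple two-stage pipeline: a reader thread prefetches the next batch of active cells while the main thread generates geometry for the current batch. The sketch below is hypothetical (the batch iterable and the generate callable are stand-ins, not part of the paper's implementation); when I/O per batch is no slower than generation, the I/O time hides entirely behind the CPU work.

```python
import threading
import queue

def pipelined_extract(fetch_batches, generate):
    """Overlap disk I/O with CPU work: a reader thread prefetches the next
    batch of active cells while the main thread runs the (CPU-bound)
    surface-generation phase on the current batch."""
    buf = queue.Queue(maxsize=1)            # one batch in flight: double buffering
    SENTINEL = object()

    def reader():
        for batch in fetch_batches:         # stand-in for the disk query per isovalue
            buf.put(batch)                  # blocks while the consumer is behind
        buf.put(SENTINEL)

    threading.Thread(target=reader, daemon=True).start()
    results = []
    while (batch := buf.get()) is not SENTINEL:
        results.append(generate(batch))     # I/O for the next batch overlaps this
    return results
```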

Our long-term goal is to develop a disk indexing scheme that will provide a canonical representation format for general (even time-varying) regular and irregular volumetric datasets. Our goal would be to limit the average size overhead of the indices to no more than 10% of the original dataset size. In order to reach this goal, we plan to combine several techniques, such as compressing the index structures and exploiting the overall coherence in the datasets (e.g., in the case of time-varying datasets). Also, it is conceivable that less preprocessing time and less space could be achieved at the cost of more query time. It would be nice to design a disk indexing scheme that exhibits a smooth trade-off among these costs, and that in addition allows the user to specify the desired trade-off in his/her own application. Another interesting problem comes from the fact that handling really large scientific datasets (terabytes of data) also requires some form of tape-based indexing.

We believe that our results in this paper provide evidence to support a broad area for future research: interactive out-of-core scientific visualization. Previous work, for the most part, seemed to indicate that out-of-core algorithms would not be useful in interactive applications. In fact, it was thought that work such as the geometric prefetching of Funkhouser et al. [12] would not be possible for scientific visualization because scientific datasets lack the sort of coherence that architectural models have. Our work shows that with a delicate indexing of the data, it is possible to retrieve the relevant data from disk so fast that all of the I/O time can be disregarded, and the interactivity of the original application is not compromised by the I/O operations.

Acknowledgements

We thank Pat Crossno, Lichan Hong, Joseph Mitchell and Dino Pavlakos for useful criticism and help in this work. We thank Lars Arge and Jeff Vitter for useful comments about the presentation of their I/O-optimal interval tree, and Lars Arge for useful discussions on the preprocessing of the tree. The first author would also like to thank Roberto Tamassia for his constant support and encouragement. We thank the Computational Geometry Laboratory (J. Mitchell, Director) and the Center for Visual Computing (A. Kaufman, Director) for use of their computing resources. The Blunt Fin, the Liquid Oxygen Post, and the Delta Wing datasets are courtesy of NASA. The Combustion Chamber dataset is from Vtk [26]. We thank W. Schroeder, K. Martin, and B. Lorensen for Vtk; the Geometry Center of the University of Minnesota for Geomview; and Linus Torvalds and the Linux community for Linux. Without these powerful tools, it would have been much harder to perform this work.


References

[1] A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9):1116–1127, 1988.

[2] L. Arge. The buffer tree: A new technique for optimal I/O-algorithms. In Proc. Workshop on Algorithms and Data Structures, pages 334–345, 1995.

[3] L. Arge. Personal communication, April, 1997.

[4] L. Arge, D. E. Vengroff, and J. S. Vitter. External-memory algorithms for processing line segments in geographic information systems. In Proc. European Symp. Algorithms, pages 295–310, 1995.

[5] L. Arge and J. S. Vitter. Optimal interval management in external memory. In Proc. IEEE Foundations of Comp. Sci., pages 560–569, 1996.

[6] L. Arge and J. S. Vitter. Personal communication, October, 1997.

[7] C. L. Bajaj, V. Pascucci, and D. R. Schikore. Fast isocontouring for improved interactivity. In 1996 Volume Visualization Symposium, pages 39–46, October 1996.

[8] P. Callahan, M. T. Goodrich, and K. Ramaiyer. Topology B-trees and their applications. In Proc. Workshop on Algorithms and Data Structures, pages 381–392, 1995.

[9] Y.-J. Chiang. Experiments on the practical I/O efficiency of geometric algorithms: Distribution sweep vs. plane sweep. Comput. Geom.: Theory and Appl. (to appear). Extended abstract appeared in Proc. Workshop on Algorithms and Data Structures, pages 346–357, 1995.

[10] Y.-J. Chiang, M. T. Goodrich, E. F. Grove, R. Tamassia, D. E. Vengroff, and J. S. Vitter. External-memory graph algorithms. In Proc. ACM-SIAM Symp. on Discrete Algorithms, pages 139–149, 1995.

[11] P. Cignoni, C. Montani, E. Puppo, and R. Scopigno. Optimal isosurface extraction from irregular volume data. In 1996 Volume Visualization Symposium, pages 31–38, October 1996.

[12] T. A. Funkhouser, S. Teller, C. H. Sequin, and D. Khorramabadi. Database management for models larger than main memory. In Interactive Walkthrough of Large Geometric Databases, Course Notes 32, SIGGRAPH ’95, pages E15–E60, 1995. Appeared as “The UC Berkeley System for Interactive Visualization of Large Architectural Models”, in Presence: Teleoperators and Virtual Environments, Vol. 5, No. 1, Winter 1996.

[13] M. T. Goodrich, J.-J. Tsay, D. E. Vengroff, and J. S. Vitter. External-memory computational geometry. In IEEE Foundations of Comp. Sci., pages 714–723, 1993.

[14] T. He, L. Hong, A. Varshney, and S. Wang. Controlled topology simplification. IEEE Transactions on Visualization and Computer Graphics, 2(2):171–184, June 1996.

[15] T. Itoh and K. Koyamada. Automatic isosurface propagation using an extrema graph and sorted boundary cell lists. IEEE Transactions on Visualization and Computer Graphics, 1(4):319–327, December 1995.

[16] P. C. Kanellakis, S. Ramaswamy, D. E. Vengroff, and J. S. Vitter. Indexing for data models with constraints and classes. In Proc. ACM Symp. on Principles of Database Sys., pages 233–243, 1993.


[17] D. Knuth. The Art of Computer Programming Vol. 3: Sorting and Searching. Addison-Wesley, 1973.

[18] Y. Livnat, H.-W. Shen, and C. R. Johnson. A near optimal isosurface extraction algorithm using span space. IEEE Transactions on Visualization and Computer Graphics, 2(1):73–84, March 1996.

[19] W. E. Lorensen and H. E. Cline. Marching cubes: A high resolution 3D surface construction algorithm. In Maureen C. Stone, editor, Computer Graphics (SIGGRAPH ’87 Proceedings), volume 21, pages 163–169, July 1987.

[20] G. M. Nielson and B. Hamann. The asymptotic decider: Resolving the ambiguity in marching cubes. In Visualization ’91, pages 83–91, 1991.

[21] M. H. Nodine and J. S. Vitter. Paradigms for optimal sorting with multiple disks. In Proc. of the 26th Hawaii Int. Conf. on Systems Sciences, January 1993.

[22] F. P. Preparata and M. I. Shamos. Computational Geometry: An Introduction. Springer-Verlag, New York, NY, 1985.

[23] S. Ramaswamy and S. Subramanian. Path caching: A technique for optimal external searching. In Proc. ACM Symp. on Principles of Database Sys., pages 25–35, 1994.

[24] W. Schroeder, W. Lorensen, and C. Linthicum. Implicit modeling of swept surfaces and volumes. In IEEE Visualization ’94, pages 40–45, October 1994.

[25] W. Schroeder, K. Martin, and W. Lorensen. The design and implementation of an object-oriented toolkit for 3D graphics and visualization. In IEEE Visualization ’96, October 1996.

[26] W. Schroeder, K. Martin, and W. Lorensen. The Visualization Toolkit. Prentice-Hall, 1996.

[27] W. Schroeder, J. Zarge, and W. Lorensen. Decimation of triangle meshes. In Edwin E. Catmull, editor, Computer Graphics (SIGGRAPH ’92 Proceedings), volume 26, pages 65–70, July 1992.

[28] H.-W. Shen, C. D. Hansen, Y. Livnat, and C. R. Johnson. Isosurfacing in span space with utmost efficiency (ISSUE). In IEEE Visualization ’96, October 1996.

[29] H.-W. Shen and C. R. Johnson. Sweeping simplices: A fast iso-surface extraction algorithm for unstructured grids. In IEEE Visualization ’95, pages 143–150, October 1995.

[30] C. T. Silva, J. S. B. Mitchell, and A. E. Kaufman. Fast rendering of irregular grids. In 1996 Volume Visualization Symposium, pages 15–22, October 1996.

[31] S. Subramanian and S. Ramaswamy. The P-range tree: A new data structure for range searching in secondary memory. In Proc. ACM-SIAM Symp. on Discrete Algorithms, pages 378–387, 1995.

[32] S. Teller, C. Fowler, T. Funkhouser, and P. Hanrahan. Partitioning and ordering large radiosity computations. In Andrew Glassner, editor, Proceedings of SIGGRAPH ’94 (Orlando, Florida, July 24–29, 1994), Computer Graphics Proceedings, Annual Conference Series, pages 443–450. ACM SIGGRAPH, ACM Press, July 1994. ISBN 0-89791-667-0.

[33] D. E. Vengroff and J. S. Vitter. I/O-efficient scientific computation using TPIE. In Proc. IEEE Symp. on Parallel and Distributed Computing, 1995.


[34] D. E. Vengroff and J. S. Vitter. Efficient 3-d range searching in external memory. In Proc. Annu. ACM Sympos. Theory Comput., pages 192–201, 1996.

[35] J. S. Vitter and E. A. M. Shriver. Algorithms for parallel memory I: Two-level memories. Algorithmica, 12(2-3):110–147, 1994.

[36] J. Wilhelms and A. Van Gelder. Octrees for faster isosurface generation. In Computer Graphics (San Diego Workshop on Volume Visualization), volume 24 (5), pages 57–62, November 1990.


Figure 5: Running times for extracting isosurfaces using ioQuery (left column) and vtkiso (right column) with 128M of main memory. Note that two costs of vtkiso are not shown.

Figure 6: Running times for extracting isosurfaces using ioQuery (left column) and vtkiso (right column) with 64M of main memory. Note that two costs of vtkiso are not shown.

Figure 7: Running times for extracting isosurfaces using ioQuery (left column) and vtkiso (right column) with 32M of main memory. Note that two costs of vtkiso are not shown.

Figure 8: Typical isosurfaces are shown. The upper two are for the Combustion Chamber dataset; the bottom ones are for the Blunt Fin dataset.