modeling & analyzing massive terrain data sets

67
Modeling & Analyzing Modeling & Analyzing Massive Terrain Data Massive Terrain Data Sets Sets Pankaj K. Agarwal Pankaj K. Agarwal Duke University Duke University

Upload: kali

Post on 29-Jan-2016

52 views

Category:

Documents


0 download

DESCRIPTION

Modeling & Analyzing Massive Terrain Data Sets. Pankaj K. Agarwal Duke University. STREAM Project. S calable T echniques for hi- R esolution E levation data A nalysis & M odeling In collaboration with: Lars Arge, Helena Mitasova Students: Andrew Danner,Thomas Molhave,Ke Yi. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Modeling & Analyzing Massive Terrain Data Sets

Modeling & Analyzing Massive Modeling & Analyzing Massive Terrain Data SetsTerrain Data Sets

Pankaj K. AgarwalPankaj K. Agarwal

Duke UniversityDuke University

Page 2: Modeling & Analyzing Massive Terrain Data Sets

STREAM ProjectSTREAM Project

Scalable Techniques for hi-Resolution Elevation data Analysis & Modeling

In collaboration with: Lars Arge, Helena Mitasova

Students: Andrew Danner,Thomas Molhave,Ke Yi

Page 3: Modeling & Analyzing Massive Terrain Data Sets

Diverse elevation point data: density, Diverse elevation point data: density, distribution, accuracydistribution, accuracy

1974

20011999

1995Lidar 0.15m v. accuracy; altitude 700m and 2300m

Photogrammetry 0.76m v. accuracy (5ft contours)

2004

1998

RTK-GPS 0.10m v. accuracy

Page 4: Modeling & Analyzing Massive Terrain Data Sets

Constructing Digital Elevation ModelsConstructing Digital Elevation Models

Grid, TIN, Contour DEMs

Page 5: Modeling & Analyzing Massive Terrain Data Sets

Natural feature extractionNatural feature extraction

Extracted foredunes 1999-2004 usingprofile curvature and elevation threshold

MaMaps of topo parameters: computed simultaneously with interpolation using partial derivatives of the RST function, terrain features: combined thresholds of topo parameters

Page 6: Modeling & Analyzing Massive Terrain Data Sets

Tracking evolving featuresTracking evolving featuresslip faces: slope > 30deg 1995 new slip face in 1999 2001

how to automatically track features that change some of their properties over time?

dune crests: profile curvature threshold

Page 7: Modeling & Analyzing Massive Terrain Data Sets

Hydrology AnalysisHydrology Analysis

• Flow Analysis• Watershed

hierarchy

Page 8: Modeling & Analyzing Massive Terrain Data Sets

Challenge: Massive Data SetsChallenge: Massive Data Sets

• LIDAR– NC Coastline: 200 million points – over 7 GB– Neuse River basin (NC): 500 million points – over

17 GB

• Output is also large– 10ft grid: 10GB– 5ft grid: 40GB

Page 9: Modeling & Analyzing Massive Terrain Data Sets

Approximation AlgorithmsApproximation Algorithms

• Exact computation expensiveMany practical problems are intractable

Multiple & often conflicting optimization criteria

Suffices to find a near-optimal solution

• Tunable Approximation algorithms– Tradeoff between accuracy and efficiency– User specifies tolerance

Page 10: Modeling & Analyzing Massive Terrain Data Sets

I/O-BottleneckI/O-Bottleneck• Data resides in secondary memory• Disk access is 106 times slower than main memory

access– Maximize useful data transferred with each access– Amortize large access time by transferring large

contiguous blocks of data

• I/O – not CPU – is often the bottleneck for processing massive data sets

track

magnetic surface

read/write armread/write head

Page 11: Modeling & Analyzing Massive Terrain Data Sets

• Traditional algorithms optimize CPU computation– Not sensitive to penalty of disk access

• I/O model– Memory is finite

– Data is transferred in blocks

– Complexity measured in disk blocks transferred

I/O-efficient Algorithms [AV88]I/O-efficient Algorithms [AV88]

Disk

RAM

CPU

M

B

B ~ 2K-8K

Page 12: Modeling & Analyzing Massive Terrain Data Sets

Elevation Data to Watershed Hierarchies: Elevation Data to Watershed Hierarchies: A Pipeline ApproachA Pipeline Approach

Page 13: Modeling & Analyzing Massive Terrain Data Sets

• Given a point cloud sampled from a bivariate function z=F(x,y) (elevation data)

• Goal: construct a grid DEM for z• DEM Applications

– Hydrology

– Ecology

– Viewsheds …• Many algorithms only work on grid data

Point Cloud to Grid DEMPoint Cloud to Grid DEM

Page 14: Modeling & Analyzing Massive Terrain Data Sets

Sub-meter resolution: improved structures Sub-meter resolution: improved structures representationrepresentation

2004 lidar-based 30cm resolution DSM

1999 lidar 2m resolution DSM

Binning (within cell average) sufficient for ~1-3m DEMs but interpolation producesimproved geometry representation and allows us to create higher resolution DEMs/DSMs

2004 lidar: 0.5m resolution binning

interpolated DSM

Page 15: Modeling & Analyzing Massive Terrain Data Sets

General Approach: InterpolationGeneral Approach: Interpolation

• Kriging, IDW, Splines, …

Interpolate

O(N3)z=F(x,y)

• Modern data sets can have millions of points

• Direct interpolation does not scale

N Points

Page 16: Modeling & Analyzing Massive Terrain Data Sets

Improving Interpolation with Quad TreesImproving Interpolation with Quad Trees

• Segment initial point set using a quad tree– Each quad tree leaf has a <K points (10-100)

• Interpolate each segment separately• N/K segments

– O(K3) interpolation per segment

– O(K2N) total running time

– Linear in N

• Boundary smoothness?– Use points from nearby

segments

Page 17: Modeling & Analyzing Massive Terrain Data Sets

Overview of the AlgorithmOverview of the Algorithm

• Construct quad-tree on disk using bulk loading [AAPV 01]

• For each quad-tree node, find points in neighboring segments

• Arrange points on disk for sequential interpolation– Use the algorithm by Mitasova, Mitas, Harmon

– Can use any other interpolation method

Page 18: Modeling & Analyzing Massive Terrain Data Sets

Building an External Quad TreeBuilding an External Quad Tree• Build top few levels in memory• Buffer points in lower levels• Write buffers to disk when full

K=2

Block I/O

Page 19: Modeling & Analyzing Massive Terrain Data Sets

Building a TINBuilding a TIN

TIN: Triangulated Irregular Network

Constrained DelaunayTriangulation

Developed an I/O-efficient algorithm

Builds TIN for NeuseRiver Basin (14GB) in7.5 hours

Page 20: Modeling & Analyzing Massive Terrain Data Sets

Future Work: Hierarchical DEMsFuture Work: Hierarchical DEMs

• Considerable work in graphics and GIS communities on multiresolution hierarchies for terrains– Focuses on visualization– Most algorithms perform random memory access

• Out-of-core simplification– I/O-efficient algorithms for terrain simplification (edge

contraction method)

• Feature preserving simplification– Preserves main river networks– Preserves visibility for most of the points

Page 21: Modeling & Analyzing Massive Terrain Data Sets

Coping with Noisy DataCoping with Noisy Data

• Nearly all flow models assume [O’Callaghan and Mark 84, de Berg et al. 96]

– water flows downwards– stops at a minimum

Page 22: Modeling & Analyzing Massive Terrain Data Sets

Denoising via FloodingDenoising via Flooding [Jenson and Domingue 88, Arge et al. 03][Jenson and Domingue 88, Arge et al. 03]

• Pour water uniformly on terrain• Water level rises until steady-state

Page 23: Modeling & Analyzing Massive Terrain Data Sets

River Network After FloodingRiver Network After Flooding

Sort(N) I/Os

After flooding

Page 24: Modeling & Analyzing Massive Terrain Data Sets

Denoising via FloodingDenoising via Flooding [Jenson and Domingue 88, Arge et al. 03][Jenson and Domingue 88, Arge et al. 03]

• Pour water uniformly on terrain• Water level rises until steady-state

• Use topological persistence (Edelsbrunner et al.) as importance measure to measure the importance of a minimum

• Flood low-persistence minima, preserve high-persistence minima

Page 25: Modeling & Analyzing Massive Terrain Data Sets

Topological Persistence Topological Persistence [Edelsbrunner et al. 02][Edelsbrunner et al. 02]

Page 26: Modeling & Analyzing Massive Terrain Data Sets

The Union-Find ProblemThe Union-Find Problem

• A universe of N elements: x1, x2, …, xN

• Initially N singleton sets: {x1}, {x2 }, …, {xN}

• Each set has a representative

• Maintain the partition under– Union(xi, xj) : Joins the sets containing xi and xj

– Find(xi) : Returns the representative of the set containing xi

Page 27: Modeling & Analyzing Massive Terrain Data Sets

Formulated as Batched Union-FindFormulated as Batched Union-Find• On a terrain represented as a TIN

• When reach– A minimum or maximum: do nothing– A regular point u: Issue union(u,v) for a lower neighbor v– A saddle u: let v and w be nodes from u’s two connected pieces

in its lower link Issue: find(v), find(w), union(u,v), union(u,w)

• Sequence of union/find operations can be computed in advance

lower link

Page 28: Modeling & Analyzing Massive Terrain Data Sets

Classical Union-Find AlgorithmClassical Union-Find Algorithm

d

b j a

e g

h

f l

n

m

i

s r cz k

p

representatives

d

b j a

e g

h

f l

n

m

Union(d, h) :

link-by-rank

d

b j a

e g

h

f l n

Find(n) :

path compression

m

Page 29: Modeling & Analyzing Massive Terrain Data Sets

ComplexityComplexity

• O(N α(N)) for a sequence of N union and find operations [Tarjan 75]– α(•) : Inverse Ackermann function (very slow!)

– Optimal in the worst case [Tarjan79, Fredman and Saks 89]

• Batched (Off-line) version– Entire sequence known in advance

– Can be improved to linear on RAM [Gabow and Tarjan 85]

– Not possible on a pointer machine [Tarjan79]

Page 30: Modeling & Analyzing Massive Terrain Data Sets

Our ResultsOur Results

• An I/O-efficient algorithm for the batched union-find problem using O(sort(N)) = O(N/B logM/B(N/B)) I/Os– Same as sorting– optimal in the worst case

• A simpler algorithm using O(sort(N) log(N/M)) I/Os– Implemented

• Apply to persistence computation, flooding– O(sort(N)) time

• Other applications

Page 31: Modeling & Analyzing Massive Terrain Data Sets

I/O-Efficient Batched Union-FindI/O-Efficient Batched Union-Find

• Two-stage algorithm– Convert to interval union-find

• Compute an order on the elements s.t. each union joins two adjacent sets

– Solve batched interval union-find

• Each stage formulated as a geometric problem and solved using sort(N) I/Os

Page 32: Modeling & Analyzing Massive Terrain Data Sets

Experimental ResultsExperimental Results

Traditional algorithm good as long as data fits in memory

Page 33: Modeling & Analyzing Massive Terrain Data Sets

Experimental Results:Neuse River BasinExperimental Results:Neuse River Basin

On the entire data set: EM takes 5 hours, IM fails

0.5 billion points (14GB raw data), sampled to obtain smaller data sets

Page 34: Modeling & Analyzing Massive Terrain Data Sets

Contour TreesContour Trees

Page 35: Modeling & Analyzing Massive Terrain Data Sets

Previous ResultsPrevious Results

• Directly maintain contours– O(N log N) time [van Kreveld et al. 97]

– Needs union-split-find for circular lists– Do not extend to higher dimensions

• Two sweeps by maintaining components, then merge– O(N log N) time [Carr et al. 03]

– Extend to arbitrary dimensions

Page 36: Modeling & Analyzing Massive Terrain Data Sets

Join Tree and Split TreeJoin Tree and Split Tree [Carr et al. 03][Carr et al. 03]

9

8

76

5

4

32

1

Join tree

9

8

76

5

4

32

1

Split tree

Qualified nodes

9

8

76

5

4

3

1

Join tree

9

8

76

5

4

3

1

Split tree

Page 37: Modeling & Analyzing Massive Terrain Data Sets

Final Contour TreeFinal Contour Tree

9

8

76

5

4

32

1

Join tree

9

8

76

5

4

32

1

Split tree

9

8

76

5

4

32

1

Contour tree

Hard to BATCH!

Page 38: Modeling & Analyzing Massive Terrain Data Sets

Another CharacterizationAnother CharacterizationLet w be the highest node that is a descendant of v in join treeand ancestor of u in split tree, (u, w) is a contour tree edge

9

8

76

5

4

32

1

Join tree

9

8

76

5

4

32

1

Split tree

9

8

76

5

4

32

1

Contour tree

u

vw

u

vw

u

vw

Now can BATCH!

Page 39: Modeling & Analyzing Massive Terrain Data Sets

Experimental ResultsExperimental Results

On the Neuse River Basin data set

Page 40: Modeling & Analyzing Massive Terrain Data Sets

Coping with Noise: Future WorkCoping with Noise: Future Work

• Constructing hydrologically conditioned DEMs– Formulate as an optimization problem

– Using persistent homology

– Integrating topology and geometry

– Handling flat areas

• Scalability remains a big issue• Handling anthropogenic features

Page 41: Modeling & Analyzing Massive Terrain Data Sets

Back to Pipeline: Flow ModelingBack to Pipeline: Flow Modeling

Page 42: Modeling & Analyzing Massive Terrain Data Sets

From Elevation to River NetworksFrom Elevation to River Networks

• Where does water go?• From higher elevation to lower

elevation• Single flow directions forms a tree

Sea

80

90

100

95

85

110

Page 43: Modeling & Analyzing Massive Terrain Data Sets

Drainage AreaDrainage Area

• How much area is upstream of each node?

• Each node has initial drainage area (1)

• Drainage area of internal nodes depends on drainage area of children

1

1

1

1

1

1

1

1 1

1

3

3

2

2

2

3

2

5

7

17

25

14

11

9

7

Page 44: Modeling & Analyzing Massive Terrain Data Sets

Drainage Area example (Panama)Drainage Area example (Panama)

Page 45: Modeling & Analyzing Massive Terrain Data Sets

IFSARE and SRTM based stream analysisIFSARE and SRTM based stream analysisStream and watershed boundary extraction for Panama for on-going water quality research to provide more detailed and up-to-date data.

SRTM - 7400x3600 DSM at 90m resolution for entire Panama, IFSARE - 10800x11300 DEM at 10m resolution for the Panama canal sectionUsing r.terraflow - officially released in GRASS6.2 in 2006 - streams can be extracted in 3-4 hours, tile-based approach takes days SRTM DEM for entire Panama with main rivers extracted from DEM in 3 hr

Page 46: Modeling & Analyzing Massive Terrain Data Sets

Hierarchical Watershed DecompositionHierarchical Watershed Decomposition

Page 47: Modeling & Analyzing Massive Terrain Data Sets

Watershed HierarchiesWatershed Hierarchies• Decompose a river network into a hierarchy of hydrological

units• All water in HU flows to a common outlet• Hierarchy provides tunable level of detail• Method used: Pfafstetter [VV99]• Want a solution scalable to large modern hi-res terrains

Page 48: Modeling & Analyzing Massive Terrain Data Sets

PfafstetterPfafstetter• Find main river

• Find four largest tributaries

• Label basins/interbasins

• Recurse until single path

1

1

1

1

1

1

1

1

1

1

2

2

2

3

2

7

17

25

14119

7

3

3

5

8

6

4

2

7

53

1

9

Page 49: Modeling & Analyzing Massive Terrain Data Sets

226

25 24

23

21

2227

771

75

7374

72

RecurseRecurse

Page 50: Modeling & Analyzing Massive Terrain Data Sets

Example Watershed BoundariesExample Watershed Boundaries

1

2

3

4

5

6

7

8

9

91

92

93

94

95

9697

98

99

991

992993

994

995

996

997

998999

Page 51: Modeling & Analyzing Massive Terrain Data Sets

A Complete PipelineA Complete Pipeline

Page 52: Modeling & Analyzing Massive Terrain Data Sets

ImplementationImplementation

• TPIE: C++ primitives for I/O-efficient algorithms• GRASS: Open Source GIS• Interpolation: Regularized spline with tension (in

GRASS)• Data:

– North Carolina LIDAR

• Neuse river basin: 400 million points (NC Floodmaps)

• Outer banks coastal data : 128 million points (NOAA CSC)

– USGS 30m NED

Page 53: Modeling & Analyzing Massive Terrain Data Sets

Grid Construction ResultsGrid Construction Results

Resolution (ft) 40 20 10

Grid cells x106 221 885 3542

Points x106 205 340 415

Total time 12h32m 14h46m 26h52m

Time spent(%)      

Build quad tree 8.9 7.1 5.7

Find neighbors 31.6 32.4 29.1

Interpolate 58.8 58.5 59.3

Write output 0.7 2 5.9

• Neuse• Point Thinning • Adjusted

interpolation parameters

Page 54: Modeling & Analyzing Massive Terrain Data Sets

Sample Watershed ResultsSample Watershed Results

size (MB) 150 713 5819

size (mln cells) 30.8 147 396.5

total time 10m29s 58m10s 187m43s

Time spent… %

importing data 8 7 16

sorting by flow 16 15 13

building river list 31 35 29

sorting river list 19 20 19

computing labels 7 6 6

sort by grid order 14 13 12

exporting data 5 4 5

Page 55: Modeling & Analyzing Massive Terrain Data Sets

Future WorkFuture Work

•Terrain Modeling

•Terrain Analysis

Page 56: Modeling & Analyzing Massive Terrain Data Sets

Multivalent terrain modelsMultivalent terrain models

- multiple surfaces- hybrid raster and 3D vector for structured environments (urban)- voxel for unstructured environments (forested areas)

Multiple surface representation:DSM and DEM

Page 57: Modeling & Analyzing Massive Terrain Data Sets

Terrain analysis on DEMs with structuresTerrain analysis on DEMs with structuresInvestiagte multivalent elevation field representations that preserve road and stream connectivity

Page 58: Modeling & Analyzing Massive Terrain Data Sets

Dynamic DataDynamic Data

• Continuous data streams from various sensors– Maintain smart summaries of data

– Communication, energy efficient algorithms for sensor networks

• Coping with moving/deformable objects– Tracking evolving terrain features

– Kinetic algorithms that update the structure locally

Page 59: Modeling & Analyzing Massive Terrain Data Sets

Terrain AnalysisTerrain Analysis• Flow modeling

– Erosion analysis– Soil moisture modeling (collaboration with

ecologists)

• Tracking evolving features• Visibility based navigation

– Finding paths from which most of the terrain is visible

– Computing well concealed paths– Covering terrains from a few points

Page 60: Modeling & Analyzing Massive Terrain Data Sets

AcknowledgmentsAcknowledgments

• Collaborators– Lars Arge, Helena Mitasova– Andrew Danner, Thomas Molhave, Ke Yi

• Sponsored by– Army Research Office W911NF-04-1-0278

Page 61: Modeling & Analyzing Massive Terrain Data Sets

Relevant PublicationsRelevant Publications• Danner, Yi, Molhave, A., Arge, Mitasova, TerraStream: From

Elevation Data to Watershed Hierarchies, submitted for publication, 2007.

• A., Arge, Danner, From point cloud to grid DEM: A scalable approach. In Proc. 12th SSDH, 771–788, 2006.

• A., Arge, Yi, I/O-efficient construction of constrained Delaunay triangulations. In Proc. ESA, 355–366, 2005.

• A., Arge, Yi. I/O-efficient batched union-find and its applications to terrain analysis. In Proc. 22nd SoCG, 2006.

• Arge, Danner, Haverkort, Zeh. I/O-efficient hierarchical watershed decomposition of grid terrain models. In Proc. 12th SSDH, 825–844, 2006.

• Danner. I/O Efficient Algorithms and Applications in Geographic Information Systems. PhD thesis, Duke University, 2006.

Page 62: Modeling & Analyzing Massive Terrain Data Sets

Thanks!Thanks!

Page 63: Modeling & Analyzing Massive Terrain Data Sets

Future Directions – Noise RemovalFuture Directions – Noise Removal

• Bridge detection/removal

• Other flow routing methods

• Flow routing on flat surfaces

• Comparing flow networks

Page 64: Modeling & Analyzing Massive Terrain Data Sets

Mapping LIDAR point densityMapping LIDAR point density

no. of points/2m grid cell1996 0.21997 0.9 ATM II NASA1998 0.41999 1.42001 0.2 NCflood Leica2003 2.0 EARL NASA2004 15.0 SHOALS USACE2005 6.0 SHOALS USACE

pt / 2m grid cell

2004: SHOALS2001: NC Flood

1998 2004

point density at the beginning of the project current industry standard

Page 65: Modeling & Analyzing Massive Terrain Data Sets

Building an External Quad TreeBuilding an External Quad Tree

• After first scan of data:– Write top of tree to disk– Flush all buffers

• Recurse on buffers

• levels in one scan

• I/Os total

Page 66: Modeling & Analyzing Massive Terrain Data Sets

• Identifying minima likely due to noise• Don’t want to remove real features

– Persistent homology [ELZ 02]

– Computed in Sort(N) I/Os

Coping with Noisy DataCoping with Noisy Data

Page 67: Modeling & Analyzing Massive Terrain Data Sets

InterpolationInterpolation• Scan through list of points with neighbors• Interpolate each leaf separately – Use any

interpolation method– Evaluate at grid cells

– Write grid cell values as (i,j,z) as they are computed

• Sort grid cells by raster order

Interpolate Evaluate

(i, j, z)

Sort