
Page 1

2011 Workshop on High Performance and Distributed Geographic Information Systems (HPDGIS’11)

19th ACM SIGSPATIAL GIS: Chicago, IL Nov 1—4, 2011

Speeding Up Large-Scale Geospatial Polygon Rasterization on GPGPUs

Jianting Zhang
Department of Computer Science, the City College of New York
[email protected]

Page 2

Outline

Introduction and Motivations
Background and Related Works
The Serial Scan-Line Fill Algorithm
Preprocessing Polygon Collections
Efficient Polygon Rasterization on GPGPUs
Experiments and Results
Conclusion and Future Work

Page 3

Introduction: Personal HPC-G

“Despite all these initiatives the impact of parallel GIS research has remained slight…”

“…fundamental problem remains the fact that creating parallel GIS operations is non-trivial and there is a lack of parallel GIS algorithms, application libraries and toolkits.”

A. Clematis, M. Mineter, and R. Marciano. High performance computing with geographical data. Parallel Computing, 29(10):1275–1279, 2003

Marrying GPGPU with GIS – The next generation High-Performance GIS in a Personal Computing Environment (Zhang 2010, HPDGIS)

• Every personal computer is now a parallel machine: CMPs and GPUs

• Multi-core CPUs have become mainstream; as core counts grow, they take on more GPU-like features

• NVIDIA alone shipped almost 220 million CUDA-capable GPUs from 2006 to 2010 (CACM 2010/11)

Page 4

Introduction – Personal HPC-G

Chip-Multiprocessors (CMP): http://en.wikipedia.org/wiki/Multi-core_processor
Cores per chip: dual-core, quad-core, six-core, 8/10/12
Chips per node: 1 -> 2/4/8
Intel MIC (32 cores); UIUC Rigel design (1024 cores)

Massively parallel GPGPU computing: hundreds of GPU cores in a single GPU card
Nvidia GTX480 (03/2010): 480 cores, 1.4 GHz, 1.5 GB, 177.4 GB/s memory bandwidth, 1.35 TFlops
Nvidia GTX590 (03/2011): 1024 cores, 1.2 GHz, 3 GB, 327.74 GB/s memory bandwidth, 2.49 TFlops

Parallel hardware is more affordable than ever before…

Page 5

Introduction – Personal HPC-G

Geospatial data volumes never stop growing

Satellite: e.g., from GOES to GOES-R (2016), http://www.goes-r.gov/downloads/GOES-R-Tri.pdf
Spectral (3X) * spatial (4X) * temporal (5X) = 60X
Derived thematic data products (vector): http://www.goes-r.gov/products/baseline.html, http://www.goes-r.gov/products/option2.html

Species distributions and movement data
E.g., 300+ million occurrence records (GBIF)
E.g., 717,057 polygons and 78,929,697 vertices for the distribution data of 4,148 bird species (NatureServe)
Animals can move across space and time

Event locations, trajectories and O-D data
E.g., taxi trip records (traces or O-D locations): 0.5 million per day in NYC and 1.2 million in Beijing
From O-D pairs to shortest paths to flow patterns

[Figure: related publications per data type – COM.GEO'10, SSDBM'10, ACMGIS'10, ACMGIS'11; ACMGIS'08, ACMGIS'09, GeoInformatics'09, HPDGIS'11; COM.GEO'10, HPDGIS'10, ???]

Page 6

Motivations


GPU-based parallel algorithm design to efficiently manage large-scale species distribution data (overlapped polygons)

Part 1: Extended quadtree to represent overlapped polygons (GeoInformatics’09 and ACMGIS’09)

Part 2: Efficient conversion of real-world geospatial polygons to quadtrees. Step 1: from polygons to scan-line segments; Step 2: from scan-line segments to quadtrees

Part 3: Query-driven visual exploration (ACMGIS’08 and ACMGIS’09)

Page 7

Background and Related Works

Polygon rasterization on GPUs

State of the art: OpenGL GL_POLYGON
Problems:
• Fixed-function, proprietary, black-box
• Does not support complex (e.g., concave) polygons – results may be incorrect (although acceptable for display purposes)
• GL_POLYGON is much slower than GL_TRIANGLES
• Requires a hardware context to read back rasterization results
• Accuracy is limited by screen resolution
• Difficult for GIS developers to implement using graphics languages

GPGPU comes to the rescue:
• Able to use the GPU's parallel computing power
• Using C/C++ is more intuitive
• Directly generating spatial data structures can be more efficient (than using rasterized images to construct quadtrees)
• More client/server computing friendly

No previous work on polygon rasterization on GPGPUs for geospatial applications.

Page 8

Background and Related Works

Spatial data structures on GPUs for computer graphics applications
• KD-tree (Zhou et al. 2008, Hou et al. 2011), Octree (Zhou et al. 2011)
• They are designed to efficiently render triangles, not to query polygons

Software rasterization of triangles
• (Laine and Karras 2011), (Pantaleoni 2011), (Schwarz and Seidel 2011)
• Results are encouraging when compared to hardware rasterization (2-8x gap)
• Again, they are designed for rasterizing/rendering triangles, not for querying polygons

Page 9

Background and Related Works

Geospatial data processing on GPUs

Pre-GPGPU:
• Using graphics data structures and primitives for spatial selection and spatial join queries (Sun et al. 2003)
• Difficult and unintuitive

Post-GPGPU:
• Spatial similarity join (Lieberman et al. 2008)
• Density-based spatial clustering (Bohm et al. 2009)
• Min-Max quadtree for large-scale raster data (Zhang et al. 2010)
• Decoding quadtree-encoded bitplane bitmaps of large-scale raster data (Zhang et al. 2011)

Page 10

The Serial Scan-Line Fill Algorithm

For each scan line y from ymin to ymax

1. Compute the intersection points with all edges

2. Sort the intersection points and form the scan line segments

3. (Fill the raster cells in the scan line segments)

End

Intersection point between scan line y = y' and edge ((x1, y1), (x2, y2)): x' = x1 + (y' - y1) / (y2 - y1) * (x2 - x1)

GDAL/GRASS codebases
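As a concrete reference point, here is a minimal serial sketch of the scan-line fill loop above in C++ (the point/span types and the even-odd pairing are illustrative; this is not the GDAL/GRASS code itself):

    #include <vector>
    #include <algorithm>

    struct Pt   { double x, y; };            // polygon vertex
    struct Span { int y; double x0, x1; };   // one interior run on scan line y

    // Scan-line fill of a simple closed ring: for each scan line, intersect it with
    // every edge, sort the intersection x values, and pair them up (even-odd rule).
    std::vector<Span> scanLineFill(const std::vector<Pt>& ring, int ymin, int ymax)
    {
        std::vector<Span> spans;
        for (int y = ymin; y <= ymax; ++y) {
            std::vector<double> xs;
            for (size_t i = 0; i < ring.size(); ++i) {
                const Pt& a = ring[i];
                const Pt& b = ring[(i + 1) % ring.size()];
                // half-open test so a vertex lying exactly on the scan line counts once
                if ((a.y <= y && b.y > y) || (b.y <= y && a.y > y))
                    xs.push_back(a.x + (y - a.y) / (b.y - a.y) * (b.x - a.x));
            }
            std::sort(xs.begin(), xs.end());
            for (size_t k = 0; k + 1 < xs.size(); k += 2)   // consecutive pairs bound the interior
                spans.push_back({y, xs[k], xs[k + 1]});
        }
        return spans;
    }

The GPGPU design on the following slides parallelizes exactly this loop nest: the edges tested within a scan line and the scan lines themselves.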

Page 11

Polygon Rasterization on GPGPUs - Challenges

•Unique hardware characteristics (e.g., Nvidia Tesla C2050)
  • Large number of threads (1024 per SM, 14 SMs)
  • Limited shared memory: 48 KB per SM (shared by 1024 threads)
  • Limited registers: 32768 per SM, i.e., 32 per thread
  • Explicit shared memory management is needed to fully utilize the memory hierarchy

•Parallelizing the scan-line fill algorithm
  • Mimicking the CPU algorithm (assigning one polygon to one thread) will NOT work:
    • Uncoalesced accesses to global memory are extremely inefficient
    • Insufficient registers and shared memory
  • How to assign computing blocks and threads to scan lines and polygon edges?

Page 12

Polygon Rasterization on GPGPUs – Design

[Figure: GPU memory hierarchy – SM1, SM2, …, SMn with per-SM L1 caches, a shared L2 cache, and GPU global memory]

•The GPU's 14 SMs are divided into 14*4 computing blocks (i.e., 4 computing blocks per SM)

•A computing block has 256 threads and processes one polygon

•All threads in a computing block loop through scan lines cooperatively
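A rough host-side sketch of that mapping (the kernel name, argument layout and per-polygon ymin/ymax arrays are assumptions for illustration; the kernel body is sketched under the next slide):

    // Hypothetical per-polygon kernel: block b rasterizes polygon b with 256 cooperating threads.
    __global__ void rasterizePolygons(const double* vx, const double* vy,
                                      const int* polyStart,
                                      const int* polyYmin, const int* polyYmax);

    void launchRasterizer(const double* d_vx, const double* d_vy, const int* d_polyStart,
                          const int* d_polyYmin, const int* d_polyYmax, int numPolygons)
    {
        dim3 block(256);             // one "computing block" = 256 threads
        dim3 grid(numPolygons);      // one computing block per polygon
        rasterizePolygons<<<grid, block>>>(d_vx, d_vy, d_polyStart, d_polyYmin, d_polyYmax);
    }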

Page 13

Polygon Rasterization on GPGPUs – Design

[Figure: per-polygon rasterization – the polygon's edge X/Y coordinates (vertices 1–6) are loaded from global memory into shared memory; for each scan line y from ymin to ymax, threads mark the intersecting edges (X) and sort the intersections to form scan-line segments]

X/Y coordinates in shared memory are re-used (ymax-ymin-1) times
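A hedged sketch of what such a kernel body could look like (simplified: fixed-size shared arrays, no output compaction, and all names are illustrative rather than the paper's code; the signature matches the launch sketch above):

    #define MAX_PT 1024          // per-block vertex capacity, bounded by shared memory
    #define NO_HIT 1.0e30f       // sentinel that sorts after any real intersection

    __global__ void rasterizePolygons(const double* vx, const double* vy,
                                      const int* polyStart,
                                      const int* polyYmin, const int* polyYmax)
    {
        __shared__ double sx[MAX_PT], sy[MAX_PT];   // polygon vertices, loaded once
        __shared__ float  sHit[MAX_PT];             // per-edge intersection x (or NO_HIT)

        int p     = blockIdx.x;                     // this block's polygon
        int first = polyStart[p];
        int n     = polyStart[p + 1] - first;       // number of vertices/edges (n <= MAX_PT assumed)

        // load the polygon's X/Y coordinates into shared memory once ...
        for (int i = threadIdx.x; i < n; i += blockDim.x) {
            sx[i] = vx[first + i];
            sy[i] = vy[first + i];
        }
        __syncthreads();

        // ... and reuse them for every scan line between ymin and ymax
        for (int y = polyYmin[p]; y <= polyYmax[p]; ++y) {
            for (int i = threadIdx.x; i < n; i += blockDim.x) {
                int j    = (i + 1 == n) ? 0 : i + 1;     // edge (i, j); the ring is closed
                bool hit = (sy[i] <= y && sy[j] > y) || (sy[j] <= y && sy[i] > y);
                sHit[i]  = hit ? (float)(sx[i] + (y - sy[i]) / (sy[j] - sy[i]) * (sx[j] - sx[i]))
                               : NO_HIT;
            }
            __syncthreads();
            // next: compact and sort sHit in shared memory and write the resulting
            // (scan line, segment) pairs to global memory; this is where the
            // scan-based sorting on the next slide comes in
            __syncthreads();
        }
    }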

Page 14

Polygon Rasterization on GPGPUs – Sorting

__device__ inline ushort scan4(ushort num)
{
    // exclusive prefix sum in shared memory; the lower Tn entries are zero-padded
    // so that ptr[idx - offset] never reads out of bounds
    // (SYNC presumably abbreviates __syncthreads(); Tn = threads per computing block)
    __shared__ ushort ptr[2 * MAX_PT];
    ushort val = num;
    uint idx = threadIdx.x;
    ptr[idx] = 0;
    idx += Tn;
    ptr[idx] = num;
    SYNC
    val += ptr[idx - 1]; SYNC ptr[idx] = val; SYNC
    val += ptr[idx - 2]; SYNC ptr[idx] = val; SYNC
    val += ptr[idx - 4]; SYNC ptr[idx] = val; SYNC
    …   // offsets keep doubling up to Tn/2
    val = ptr[idx - 1];   // left neighbor's inclusive sum = this thread's exclusive sum
    return val;
}

[Figure: step-by-step example of the scan (Steps 0 to 3) over eight 0/1 flags, ending with the result of the exclusive scan]

•GPGPUs are extremely good at sorting

•Sorting in shared memory is extremely fast

Benefits:

•Only true intersection results are written back to global memory

•Saves GPU memory footprint and I/O costs
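One way to realize this benefit is stream compaction with the exclusive scan above: each thread contributes a 0/1 flag for its edge, the scan turns the flags into dense output offsets, and only the true intersections are written out. A hedged device-side sketch (compactHits is hypothetical; scan4 and Tn are used as defined in the code above):

    // hasHit: 1 if this thread's edge intersects the current scan line, else 0
    // xHit:   the intersection x coordinate when hasHit == 1
    __device__ void compactHits(ushort hasHit, float xHit, float* outX, ushort* outCount)
    {
        ushort pos = scan4(hasHit);          // exclusive prefix sum of the 0/1 flags
        if (hasHit)
            outX[pos] = xHit;                // dense slot for this thread's true intersection
        if (threadIdx.x == Tn - 1)           // the last thread knows the total number of hits
            *outCount = pos + hasHit;
        __syncthreads();
    }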

Page 15

Experiments and Results

Data: NatureServe Western Hemisphere bird species distributions
http://www.natureserve.org/getData/birdMaps.jsp

4,148 birds: http://geoteci.engr.ccny.cuny.edu/geoteci/SPTestMap.html

717,057 polygons, 1,199,799 rings, 78,929,697 vertices (1.3 GB in shapefiles)

Total number of scan-line/polygon-edge intersections: 200+ billion

Page 16

Experiments and Results

Group #            1       2       3       4       5
Min # vertices     32      64      128     256     512
Max # vertices     64      128     256     512     1024
# Threads          64      128     256     512     1024
# Polygons         46509   23880   9666    5076    3146
CPU time (ms)      526     995     1803    4490    9387
GPU time (ms)      88      49      88      224     528
Speedup            6.0X    20.1X   20.5X   20.0X   17.8X

Page 17

Discussions - handling large polygons

•The current implementation cannot process polygons with more than a few thousand vertices

•8n bytes for x coordinates

•8n bytes for y coordinates

•4n bytes for x coordinates of the intersections

•~100 extra bytes

•(20n + 100) < 48K → n ~ 2000 (using a whole SM as a computing block; a quick arithmetic check is sketched after this list)

•We have limited the number of points to the number of threads (1024); having one thread process several vertices is not scalable

•We need a better way to handle scalability
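For reference, a quick host-side check of the shared memory budget above (the ~100-byte overhead is the slide's own estimate; the exact cap depends on how much of the 48 KB a computing block can actually claim):

    #include <cstdio>

    int main()
    {
        const int sharedBytes = 48 * 1024;   // shared memory per SM (Tesla C2050)
        const int perVertex   = 8 + 8 + 4;   // x (double) + y (double) + intersection x (float)
        const int overhead    = 100;         // misc. per-block bookkeeping (estimate)
        int maxVertices = (sharedBytes - overhead) / perVertex;
        std::printf("max vertices per computing block: ~%d\n", maxVertices);   // ~2450, i.e. on the order of n ~ 2000
        return 0;
    }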

Page 18

Discussions - handling large polygons

Proposed Solution: chunking edge list, computing separately and then assembling

[Figure: handling a large polygon by chunking – the edge X/Y coordinates (vertices 1–6) in global memory are split into chunks that fit in shared memory; intersections such as (x1,y1), (x2,y2), (x3,y1), (x4,y2) are computed per chunk and then assembled, with sorting done in a separate kernel]
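A hedged sketch of how the chunked pipeline could be organized (all names are hypothetical; the slides only outline the idea): edges are processed in shared-memory-sized chunks, partial intersection lists are appended to global memory, and a separate sort (here Thrust's sort_by_key) does the assembling.

    #include <algorithm>
    #include <cuda_runtime.h>
    #include <thrust/device_ptr.h>
    #include <thrust/sort.h>

    // Hypothetical kernel: intersects one chunk of edges (small enough for shared memory)
    // with all scan lines of the polygon and appends (scanLine, x) pairs to the global
    // buffers through an atomic counter.  Sketch only; not the paper's implementation.
    __global__ void intersectChunk(const double* vx, const double* vy,
                                   int firstEdge, int numEdges, int ymin, int ymax,
                                   int* outScanLine, float* outX, int* outCount);

    void rasterizeLargePolygon(const double* d_vx, const double* d_vy, int numEdges,
                               int ymin, int ymax,
                               int* d_scanLine, float* d_x, int* d_count)
    {
        const int CHUNK = 1024;              // edges per chunk, sized to fit shared memory
        // Chunking + computing: partial intersection lists accumulate in global memory.
        for (int first = 0; first < numEdges; first += CHUNK) {
            int n = std::min(CHUNK, numEdges - first);
            intersectChunk<<<1, 256>>>(d_vx, d_vy, first, n, ymin, ymax,
                                       d_scanLine, d_x, d_count);
        }
        // Assembling: sort the pairs by scan line in a separate kernel; a secondary sort
        // of x within each scan line (or a composite key) then yields the segments.
        int total = 0;
        cudaMemcpy(&total, d_count, sizeof(int), cudaMemcpyDeviceToHost);
        thrust::sort_by_key(thrust::device_ptr<int>(d_scanLine),
                            thrust::device_ptr<int>(d_scanLine) + total,
                            thrust::device_ptr<float>(d_x));
    }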

Page 19

Summary and Conclusion

Introduced a GPGPU-accelerated software rasterization framework to rasterize and index large-scale geospatial polygons

Provided a GPGPU-based design and implementation for computing intersection points

Achieved about 20X speedup for groups of polygons with 64 to 1,024 vertices on the Western Hemisphere bird species distribution data, which has about 3/4 million polygons and more than 78 million vertices

Discussed extending the current implementation to support polygons with arbitrarily large numbers of vertices by extensively using efficient sorting

The work reported here is preliminary; several important components for realizing a dynamically integrated vector-raster data model for high-performance geospatial analysis on GPGPUs are still under development.

Page 20

Future Work

Extend our current implementation to support large polygons with arbitrary numbers of vertices

Implement the quadtree construction (Step 2) based on the GPGPU-computed scan-line segments (CPU/GPU)

Perform a comprehensive performance comparison with commercial spatial database indexing

Integrate with front-end modules in spatial databases (e.g., query parser and optimizer)

Page 21

Q&A

[email protected]
