Interactive Ray Tracing with CUDA

David Luebke and Steven Parker, NVIDIA Research

Ray Tracing & Rasterization

Rasterization

For each triangle:
• Find the pixels it covers
• For each pixel: compare to closest triangle so far

Requires Z-buffer: track distance per pixel

Ray tracing

For each pixel:
• Find the triangles that might be closest
• For each triangle: compute distance to pixel

Requires spatial index: a spatially sorted arrangement of triangles

When all triangles/pixels have been processed, we know the closest triangle at all pixels
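The two loop orders above can be sketched side by side. This is a minimal CPU illustration, not code from the talk: the `Rect` type (a screen-aligned rectangle at constant depth) is a hypothetical stand-in for a triangle, and `rasterize` and `rayCast` are names invented here. The ray casting version tests every primitive per pixel; a spatial index would shrink that inner loop.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical stand-in for a triangle: a screen-aligned rectangle
// covering pixels [x0, x1) x [y0, y1) at one constant depth.
struct Rect { int x0, y0, x1, y1; float depth; int id; };

// Rasterization order: for each primitive, for each pixel it covers.
// The Z-buffer remembers the closest depth seen so far at every pixel.
std::vector<int> rasterize(const std::vector<Rect>& prims, int w, int h) {
    assert(w > 0 && h > 0);
    std::vector<float> zbuf(w * h, std::numeric_limits<float>::infinity());
    std::vector<int> ids(w * h, -1);  // -1 means background
    for (const Rect& p : prims)
        for (int y = std::max(p.y0, 0); y < std::min(p.y1, h); ++y)
            for (int x = std::max(p.x0, 0); x < std::min(p.x1, w); ++x)
                if (p.depth < zbuf[y * w + x]) {
                    zbuf[y * w + x] = p.depth;
                    ids[y * w + x] = p.id;
                }
    return ids;
}

// Ray casting order: for each pixel, for each candidate primitive.
// Brute force here; a spatial index would limit the inner loop to the
// primitives that might be closest.
std::vector<int> rayCast(const std::vector<Rect>& prims, int w, int h) {
    assert(w > 0 && h > 0);
    std::vector<int> ids(w * h, -1);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float closest = std::numeric_limits<float>::infinity();
            for (const Rect& p : prims)
                if (x >= p.x0 && x < p.x1 && y >= p.y0 && y < p.y1 &&
                    p.depth < closest) {
                    closest = p.depth;
                    ids[y * w + x] = p.id;
                }
        }
    return ids;
}
```

Both functions return the same per-pixel primitive IDs for the same input; only the loop nesting, and therefore the auxiliary structure each one needs, differs.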

Myths of Ray Tracing & Rasterization

Ray tracing is clean, rasterization is ugly

Both are ugly

Ray tracing is sublinear, rasterization linear in primitives

Rasterization uses culling techniques

Ray tracing is linear, rasterization sublinear in pixels

Ray tracing uses packets & frustum tracing

Ray Tracing vs. Rasterization

Rasterization is fast but needs cleverness to support complex visual effects

Ray tracing supports complex visual effects but needs cleverness to be fast

Copyright NVIDIA 2008 6

Why Rasterization?

Fast & Efficient

Ubiquitous – part of workflow, pipeline

Great for displacement-mapped geometry

Developers know how to make beautiful pictures...

Images from Battlefield: Bad Company, EA Digital Illusions CE AB

Images from Crysis, Crytek GmbH

Why ray tracing?

Ray tracing unifies rendering of visual phenomena: fewer algorithms with fewer interactions between algorithms

Easier to combine advanced visual effects robustly:
• soft shadows
• subsurface scattering
• indirect illumination
• transparency
• reflective & glossy surfaces
• depth of field
• …

Ray Tracing vs. Rasterization

Rasterization is fast but needs cleverness to support complex visual effects

Ray tracing supports complex visual effects but needs cleverness to be fast

Use both!

Ray tracing (Appel 1968, Whitted 1980)

Distributed Ray Tracing (Cook, 1984)

Path Tracing (Kajiya, 1986)

Ray Tracing Regimes

Figure: ray tracing regimes arranged by computational power, with Interactive and Real-time labeled

Industrial strength ray tracing

mental images is the market leader for ray tracing software

Applicable in numerous markets: automotive, design, architecture, film

Importance

Interactive Ray Tracing

GPUs Are Fast & Getting Faster

Figure: peak GFLOP/s of NVIDIA GPUs vs. Intel CPUs, Sep-02 to Feb-08 (vertical axis 0 to 1000)

Why GPU Ray Tracing?

Abundant parallelism, massive computational power

GPUs excel at shading

Opportunity for hybrid algorithms

GPU Ray Tracing

Purcell et al., Ray Tracing on Programmable Graphics Hardware, SIGGRAPH 2002

Purcell et al., Photon Mapping on Programmable Graphics Hardware, Graphics Hardware 2004

Popov et al., Stackless KD-Tree Traversal for High Performance GPU Ray Tracing, Computer Graphics Forum, Oct 2007

Popov et al., Realtime Ray Tracing on GPU with BVH-based Packet Traversal, Symposium on Interactive Ray Tracing 2007

GPU Ray Tracing

Horn et al., Interactive k-D Tree GPU Raytracing, ACM SIGGRAPH Symposium on Interactive 3D Graphics 2007

Figure: bar chart (axis 0 to 18) with labels K-D Restart, GPU Improvement, Looping, Short-Stack

GPU Ray Tracing

Zhou et al., Real-Time KD-Tree Construction on Graphics Hardware, Microsoft Research Asia Tech Report 2008-52

Volume Ray Casting

Ray marching for isosurfaces + direct volume rendering

Electron density of virus from cryoelectroscopy

Vital to change isosurface interactively

Great match for CUDA

Volume Ray Casting With CUDA, Marsalek & Slusallek 2008
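Ray marching for an isosurface can be sketched as follows. This is a minimal CPU illustration under stated assumptions, not the Marsalek & Slusallek implementation: `density` is a made-up analytic field standing in for sampled volume data, and `marchIsosurface`, `Vec3`, the fixed step size, and the linear refinement are choices made for this example. It covers only the isosurface case, not direct volume rendering.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Hypothetical density field: 1 at the origin, falling off linearly with
// distance. A real renderer would sample a 3D texture here instead.
float density(Vec3 p) {
    return 1.0f - std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z);
}

// Marches along o + t*d in fixed steps and returns the parameter t of the
// first crossing of the iso value, or -1 if no crossing is found before the
// previous sample passes tMax. The crossing is refined by linear
// interpolation between the two samples that bracket it.
float marchIsosurface(Vec3 o, Vec3 d, float iso, float tMax, float step) {
    assert(step > 0.0f);
    float tPrev = 0.0f;
    float fPrev = density(o) - iso;
    for (int i = 1; tPrev < tMax; ++i) {
        float t = i * step;
        Vec3 p{o.x + t * d.x, o.y + t * d.y, o.z + t * d.z};
        float f = density(p) - iso;
        if ((fPrev < 0.0f) != (f < 0.0f)) {
            // Signs differ, so fPrev - f is nonzero.
            return tPrev + (t - tPrev) * fPrev / (fPrev - f);
        }
        tPrev = t;
        fPrev = f;
    }
    return -1.0f;
}
```

Because `iso` is just a parameter of the march, changing the isosurface needs no preprocessing of the volume, which is what makes interactive adjustment practical. Each ray is independent, so one ray per thread is a natural mapping.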

City demo

• Real system
• NVSG-driven animation and interaction
• Programmable shading
• Modeled in Maya, imported through COLLADA
• Fully ray traced

• 2 million polygons
• Bump-mapping
• Movable light source
• 5 bounce reflection/refraction
• Adaptive antialiasing

System Diagram – ray tracing

Texture/Vertex buffer setup (OpenGL) → Ray tracing (CUDA) → Image display/postprocessing (OpenGL)

Programmable ray tracing system: Build, Traversal, Ray generation, Material shading, Miss shader, Light shader

Key Parallel Abstractions in CUDA

0. Zillions of lightweight threads: simple decomposition model

1. Hierarchy of concurrent threads: simple execution model

2. Lightweight synchronization primitives: simple synchronization model

3. Shared memory model for cooperating threads: simple communication model

Hierarchy of concurrent threads

Parallel kernels composed of many threads: all threads execute the same sequential program

Threads are grouped into thread blocks: threads in the same block can cooperate

Threads/blocks have unique IDs

Diagram: thread t; block b containing threads t0, t1, …, tB; kernel foo() containing many blocks
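The thread hierarchy can be illustrated with a serial CPU emulation. This is only a sketch: in CUDA the hardware supplies `blockIdx`, `blockDim`, and `threadIdx` as built-in variables and runs the threads in parallel, whereas here they are ordinary function parameters and the "launch" is a pair of nested loops. The names `shadePixel` and `launch` are invented for the example.

```cpp
#include <cassert>
#include <vector>

// The "kernel": the one sequential program that every thread executes.
// Each thread derives a unique global ID from its block and thread IDs.
void shadePixel(int blockIdx, int threadIdx, int blockDim,
                std::vector<int>& image) {
    int pixel = blockIdx * blockDim + threadIdx;
    if (pixel < static_cast<int>(image.size())) {
        // Stand-in for tracing one ray: record which block did the work.
        image[pixel] = blockIdx;
    }
}

// Serial emulation of launching gridDim blocks of blockDim threads each.
// On a GPU these iterations would run concurrently.
void launch(int gridDim, int blockDim, std::vector<int>& image) {
    assert(gridDim > 0 && blockDim > 0);
    for (int b = 0; b < gridDim; ++b)
        for (int t = 0; t < blockDim; ++t)
            shadePixel(b, t, blockDim, image);
}
```

Launching 4 blocks of 8 threads over a 30-pixel image assigns pixels 0 to 7 to block 0, 8 to 15 to block 1, and so on; the bounds check makes the two surplus threads in the last block do nothing.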

Big Picture

1. Big strategic optimization: minimize per-thread state

2. Otherwise, take simplest option
• Clever optimizations usually violate rule 1

3. Lots of opportunity for further research
• Coalescing work for increased coherence (work queues): data coherence, execution coherence
• Ray space hierarchies
• Radical departures from traditional methods (see RT08)

GTX 280 supports up to 30,720 concurrent threads!

Details – Algorithmic

Top-level BVH + subtrees (BVH or k-d tree)
• Supports rigid motion, instancing
• Rebuild/refit easy to add

Traversal + intersection + shading “megakernel”
• while – while vs. if – if

Highly variable thread lifetimes!
• Software load-balancing

Details - Implementation

Triangle & hierarchy data through texture cache

Ray tree recursion
• Stack in local memory to store shader live variables

Short Stack

Goal: minimize state per thread

Strategy: replace traversal stack with short stack

Horn et al., Interactive k-D Tree GPU Raytracing, I3D 2007

Slides courtesy Daniel Horn

KD-Tree

Figure: example k-d tree. Split planes X, Y, and Z partition the scene into leaf cells A, B, C, and D. In the tree, X is the root, Y and Z are its children, and A, B, C, D are the leaves. A ray crosses the scene between tmin and tmax.

KD-Tree Traversal

Figure: stack-based traversal of the example k-d tree (root X, inner nodes Y and Z, leaves A to D). Stack: Z

KD-Restart

Standard traversal
• Omit stack operations
• Proceed to 1st leaf

If no intersection
• Advance (tmin, tmax)
• Restart from root

Proceed to next leaf
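The restart steps can be sketched on the CPU as follows. This is a minimal illustration, not Horn et al.'s code: the 2D tree, its split positions, and the names `Node`, `Ray`, `exampleTree`, and `kdRestart` are assumptions made for the example. Leaves hold no triangles here; the function simply returns the order in which leaves are reached, which is where a tracer would run its intersection tests. It assumes `tmin >= 0` and nonzero ray direction components.

```cpp
#include <cassert>
#include <string>
#include <vector>

// 2D k-d tree node. axis: 0 = x, 1 = y, -1 = leaf.
// left is the child below the split plane, right the child above it.
struct Node { int axis; float split; int left, right; char name; };
struct Ray { float o[2]; float d[2]; };

// Tree shaped like the slides' example: root X, inner nodes Y and Z,
// leaves A to D. The split positions are made up for this sketch.
std::vector<Node> exampleTree() {
    return {
        {0, 2.0f, 1, 2, 'X'},     // node 0: plane x = 2
        {1, 2.0f, 3, 4, 'Y'},     // node 1: plane y = 2, x < 2 side
        {1, 2.0f, 5, 6, 'Z'},     // node 2: plane y = 2, x > 2 side
        {-1, 0.0f, -1, -1, 'A'}, {-1, 0.0f, -1, -1, 'B'},
        {-1, 0.0f, -1, -1, 'C'}, {-1, 0.0f, -1, -1, 'D'},
    };
}

// Returns the names of the leaves reached, in order, along the ray
// segment [tmin, sceneMax], using restarts instead of a stack.
std::string kdRestart(const std::vector<Node>& tree, const Ray& r,
                      float tmin, float sceneMax) {
    assert(r.d[0] != 0.0f && r.d[1] != 0.0f);
    std::string visited;
    while (tmin < sceneMax) {
        int node = 0;              // restart from the root
        float tmax = sceneMax;
        while (tree[node].axis >= 0) {
            const Node& n = tree[node];
            float ts = (n.split - r.o[n.axis]) / r.d[n.axis];
            bool lowFirst = r.o[n.axis] < n.split ||
                (r.o[n.axis] == n.split && r.d[n.axis] <= 0.0f);
            int nearChild = lowFirst ? n.left : n.right;
            int farChild = lowFirst ? n.right : n.left;
            if (ts >= tmax || ts <= 0.0f) node = nearChild;
            else if (ts <= tmin) node = farChild;
            else { node = nearChild; tmax = ts; }  // a stack would push farChild
        }
        visited += tree[node].name;
        // A tracer would intersect the leaf's triangles here and return
        // on a hit inside [tmin, tmax]. With no hit, advance the interval.
        tmin = tmax;
    }
    return visited;
}
```

For a ray starting at (0, 0.5) with direction (1, 0.5) and the segment [0, 4], this sketch reaches leaves A, C, D in that order, descending from the root three times. The per-thread state is just `node`, `tmin`, and `tmax`, at the cost of repeated descents.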

KD-Restart with short stack (size 1)

Figure: the example k-d tree (root X, inner nodes Y and Z, leaves A to D) traversed with a one-entry stack. Stack: A
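A short stack combined with restart can be sketched as below. This is an illustrative CPU version under assumptions, not the authors' implementation: the stack has a fixed `capacity`, a push onto a full stack discards the oldest entry, and an empty stack before the ray has left the scene triggers a restart from the root. The 2D tree and the names `Node`, `Ray`, `Entry`, `Result`, `exampleTree`, and `traverseShortStack` are invented for the example; it assumes `tmin >= 0` and nonzero ray direction components.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// 2D k-d tree node. axis: 0 = x, 1 = y, -1 = leaf.
struct Node { int axis; float split; int left, right; char name; };
struct Ray { float o[2]; float d[2]; };
struct Entry { int node; float tmax; };            // deferred far child
struct Result { std::string order; int restarts; };

// Root X, inner nodes Y and Z, leaves A to D; split positions are made up.
std::vector<Node> exampleTree() {
    return {
        {0, 2.0f, 1, 2, 'X'}, {1, 2.0f, 3, 4, 'Y'}, {1, 2.0f, 5, 6, 'Z'},
        {-1, 0.0f, -1, -1, 'A'}, {-1, 0.0f, -1, -1, 'B'},
        {-1, 0.0f, -1, -1, 'C'}, {-1, 0.0f, -1, -1, 'D'},
    };
}

// Visits leaves along [tmin, sceneMax]. capacity 0 behaves like pure
// KD-restart; a large capacity behaves like standard stack traversal.
Result traverseShortStack(const std::vector<Node>& tree, const Ray& r,
                          float tmin, float sceneMax, std::size_t capacity) {
    assert(r.d[0] != 0.0f && r.d[1] != 0.0f);
    Result res{"", 0};
    std::deque<Entry> stack;
    bool firstDescent = true;
    while (tmin < sceneMax) {
        int node = 0;
        float tmax = sceneMax;
        if (!stack.empty()) {                      // resume from the stack
            node = stack.back().node;
            tmax = stack.back().tmax;
            stack.pop_back();
        } else if (!firstDescent) {
            ++res.restarts;                        // stack ran dry: restart at root
        }
        firstDescent = false;
        while (tree[node].axis >= 0) {
            const Node& n = tree[node];
            float ts = (n.split - r.o[n.axis]) / r.d[n.axis];
            bool lowFirst = r.o[n.axis] < n.split ||
                (r.o[n.axis] == n.split && r.d[n.axis] <= 0.0f);
            int nearChild = lowFirst ? n.left : n.right;
            int farChild = lowFirst ? n.right : n.left;
            if (ts >= tmax || ts <= 0.0f) node = nearChild;
            else if (ts <= tmin) node = farChild;
            else {
                if (capacity > 0) {
                    if (stack.size() == capacity) stack.pop_front();  // drop oldest
                    stack.push_back({farChild, tmax});
                }
                node = nearChild;
                tmax = ts;
            }
        }
        res.order += tree[node].name;              // intersection tests would go here
        tmin = tmax;                               // no hit: advance the interval
    }
    return res;
}
```

For a ray from (0, 1.5) with direction (1, 0.5) over [0, 4], the leaf order is A, B, D for every capacity, while the restart count falls from 2 (capacity 0) to 1 (capacity 1) to 0 (capacity 2). That is the trade the slides describe: a small amount of per-thread state removes most of the repeated descents.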

Short Stack Cache

Even better:

Each thread stores full stack in memory (non-blocking writes)

Cache top of stack locally (registers or shared memory)

Enables BVHs as well as k-d trees

5-10% faster in our current implementation

System Diagram – Hybrid

Multi-pass rasterization (OpenGL) → Ray tracing (CUDA) → Composite, shade, display (OpenGL)

Labels on the diagram's arrows: FBO, …; Δ IDs, …

Programmable ray tracing system: Build, Traversal, Ray generation, Material shading, Miss shader, Light shader, …

Hybrid Rendering – Primary Rays

Hybrid Rendering – “God Rays”

Wyman & Ramsey, RT08

Creative Commons Image: Mila Zinkova

Indirect Illumination != Ray Tracing

Images: no indirect lighting vs. with indirect lighting

Laine et al., Incremental Instant Radiosity for Real-Time Indirect Illumination, Eurographics Symposium on Rendering 2007

Solve the Right Problems!

Tracing eye rays is uninteresting: rasterization wins, use it

Scenes change dynamically at run time: can’t lovingly craft all spatial indices in off-line process

Complex shaders & texturing are mandatory: a big weakness of CPU software tracers to date

Need to provide a complete solution: construction, shading, application integration, hardware

Summary

CUDA makes GPU ray tracing fast and practical

A powerful tool in the interactive graphics toolbox

Hybrid algorithms are the future

Leverage the power of rasterization with the flexibility of CUDA

Together they provide tremendous scope for innovation

Thank You!