Practical Parallel Processing for Today’s Rendering Challenges: SIGGRAPH 2001 Course 40

Post on 13-Jan-2016

DESCRIPTION

Practical Parallel Processing for Today’s Rendering Challenges. SIGGRAPH 2001 Course 40, Los Angeles, CA. Speakers: Alan Chalmers, University of Bristol; Tim Davis, Clemson University; Erik Reinhard, University of Utah; Toshi Kato, SquareUSA.

TRANSCRIPT

Practical Parallel Processing for Today’s Rendering Challenges -- 1

Practical Parallel Processing for Today’s Rendering Challenges

SIGGRAPH 2001 Course 40, Los Angeles, CA

Practical Parallel Processing for Today’s Rendering Challenges -- 2

Speakers

Alan Chalmers, University of Bristol

Tim Davis, Clemson University

Erik Reinhard, University of Utah

Toshi Kato, SquareUSA

Practical Parallel Processing for Today’s Rendering Challenges -- 3

Schedule

Introduction

Parallel / Distributed Rendering Issues

Classification of Parallel Rendering Systems

Practical Applications

Summary / Discussion

Practical Parallel Processing for Today’s Rendering Challenges -- 4

Schedule

Introduction (Davis)

Parallel / Distributed Rendering Issues

Classification of Parallel Rendering Systems

Practical Applications

Summary / Discussion

Practical Parallel Processing for Today’s Rendering Challenges -- 5

The Need for Speed

Graphics rendering is time-consuming

• large amount of data in a single image

• animations much worse

Demand continues to rise for high-quality graphics

Practical Parallel Processing for Today’s Rendering Challenges -- 6

Rendering and Parallel Processing

A holy union

Many graphics rendering tasks can be performed in parallel

Often “embarrassingly parallel”

Practical Parallel Processing for Today’s Rendering Challenges -- 7

3-D Graphics Boards

Getting better

Perform “tricks” with texture mapping

Steve Jobs’ remark on constant frame rendering time

Practical Parallel Processing for Today’s Rendering Challenges -- 8

Parallel / Distributed Rendering

Fundamental Issues

• Task Management

Task subdivision, Migration, Load balancing

• Data Management

Data distributed across system

• Communication

Practical Parallel Processing for Today’s Rendering Challenges -- 9

Schedule

Introduction

Parallel / Distributed Rendering Issues (Chalmers)

Classification of Parallel Rendering Systems

Practical Applications

Summary / Discussion

Practical Parallel Processing for Today’s Rendering Challenges -- 10

Introduction

“Parallel processing is like a dog’s walking on its hind legs. It is not done well, but you are surprised to find it done at all”

[Steve Fiddes (apologies to Samuel Johnson)]

• Co-operation

• Dependencies

• Scalability

• Control

Practical Parallel Processing for Today’s Rendering Challenges -- 11

Co-operation

Solution of a single problem

• One person takes a certain time to solve the problem

• Divide problem into a number of sub-problems

• Each sub-problem solved by a single worker

• Reduced problem solution time

BUT

• co-operation overheads

Practical Parallel Processing for Today’s Rendering Challenges -- 12

Working Together

Overheads

• access to pool

• collision avoidance

Practical Parallel Processing for Today’s Rendering Challenges -- 13

Dependencies

Divide a problem into a number of distinct stages

• Parallel solution of one stage before next can start

• May be too severe: no parallel solution

each sub-problem dependent on previous stage

• Dependency-free problems

order of task completion unimportant

BUT co-operation still required

Practical Parallel Processing for Today’s Rendering Challenges -- 14

Building with Blocks

[Figure panels: Strictly sequential, Dependency-free]

Practical Parallel Processing for Today’s Rendering Challenges -- 15

Scalability

Upper bound on the number of workers

• Additional workers will NOT improve solution time

• Shows how suitable a problem is for parallel processing

• Given problem: finite number of sub-problems

more workers than tasks

• Upper bound may be (a lot) less than number of tasks

bottlenecks

Practical Parallel Processing for Today’s Rendering Challenges -- 16

Bottleneck at Doorway

More workers may result in LONGER solution time

Practical Parallel Processing for Today’s Rendering Challenges -- 17

Control

Required by all parallel implementations

• What constitutes a task

• When has the problem been solved

• How to deal with multiple stages

• Forms of control

centralised

distributed

Practical Parallel Processing for Today’s Rendering Challenges -- 18

Control Required

[Figure panels: Sequential, Parallel]

Practical Parallel Processing for Today’s Rendering Challenges -- 19

Inherent Difficulties

Failure to successfully complete

• Sequential solution

deficiencies in algorithm or data

• Parallel solution

deficiencies in algorithm or data

deadlock

data consistency

Practical Parallel Processing for Today’s Rendering Challenges -- 20

Novel Difficulties

Factors arising from implementation

• Deadlock

processor waiting indefinitely for an event

• Data consistency

data is distributed amongst processors

• Communication overheads

latency in message transfer

Practical Parallel Processing for Today’s Rendering Challenges -- 21

Evaluating Parallel Implementations

Realisation penalties

• Algorithmic penalty

nature of the algorithm chosen

• Implementation penalty

need to communicate

concurrent computation & communication activities

idle time

Practical Parallel Processing for Today’s Rendering Challenges -- 22

Solution Times

Practical Parallel Processing for Today’s Rendering Challenges -- 23

Task Management

Providing tasks to the processors

• Problem decomposition

algorithmic decomposition

domain decomposition

• Definition of a task

• Computational Model

Practical Parallel Processing for Today’s Rendering Challenges -- 24

Problem Decomposition

Exploit parallelism

• Inherent in algorithm

algorithmic decomposition

parallelising compilers

• Applying same algorithm to different data items

domain decomposition

need for explicit system software support

Practical Parallel Processing for Today’s Rendering Challenges -- 25

Abstract Definition of a Task

• Principal Data Item (PDI) - application of algorithm

• Additional Data Items (ADIs) - needed to complete computation

Practical Parallel Processing for Today’s Rendering Challenges -- 26

Computational Models

Determines the manner in which tasks are allocated to PEs

• Maximise PE computation time

• Minimise idle time

load balancing

• Evenly allocate tasks amongst the processors

Practical Parallel Processing for Today’s Rendering Challenges -- 27

Data Driven Models

All PDIs allocated to specific PEs before computation starts

Each PE knows a priori which PDIs it is responsible for

Balanced (geometric decomposition)

• evenly allocate tasks amongst the processors

• if the number of PDIs is not an exact multiple of the number of PEs, then some PEs do one extra task

$$\text{portion at each PE} = \frac{\text{number of PDIs}}{\text{number of PEs}}$$
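
As a rough illustration of the balanced data driven model, the following Python sketch splits the PDIs into contiguous portions before any computation starts. The function name `balanced_allocation` and the use of integer PDI indices are assumptions made for this example; they do not come from the course material.

```python
def balanced_allocation(num_pdis, num_pes):
    """Return one range of PDI indices per PE, fixed before computation starts."""
    base, extra = divmod(num_pdis, num_pes)
    allocation = []
    start = 0
    for pe in range(num_pes):
        # When PDIs are not an exact multiple of PEs, the first `extra`
        # PEs each take one additional task.
        count = base + (1 if pe < extra else 0)
        allocation.append(range(start, start + count))
        start += count
    return allocation
```

With 24 PDIs and 3 PEs every PE receives 8 PDIs; with 10 PDIs and 3 PEs the first PE does one extra task.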

Practical Parallel Processing for Today’s Rendering Challenges -- 28

Balanced Data Driven

$$\text{solution time} = \text{initial distribution} + \frac{24}{3} + \text{result collation}$$

Practical Parallel Processing for Today’s Rendering Challenges -- 29

Demand Driven Model

Task computation time unknown

• Work is allocated dynamically as PEs become idle

PEs no longer bound to particular PDIs

• PEs explicitly demand new tasks

• Task supplier process must satisfy these demands

Practical Parallel Processing for Today’s Rendering Challenges -- 30

Dynamic Allocation of Tasks

$$\text{solution time} = 2 \times \text{total comms time} + \frac{\text{total comp time for all PDIs}}{\text{number of PEs}}$$
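
The slide's estimate can be tried out numerically. The sketch below reads it as solution time = 2 x total comms time + (total comp time for all PDIs) / (number of PEs); the function names and the sample numbers are assumptions chosen only for illustration.

```python
def demand_driven_time(total_comp_time, total_comms_time, num_pes):
    """Estimated solution time for dynamic (demand driven) task allocation."""
    # The communication term is not divided by the number of PEs,
    # so it limits the achievable speedup.
    return 2 * total_comms_time + total_comp_time / num_pes


def speedup(total_comp_time, total_comms_time, num_pes):
    """Ratio of the one-PE estimate to the estimate for num_pes PEs."""
    one_pe = demand_driven_time(total_comp_time, total_comms_time, 1)
    return one_pe / demand_driven_time(total_comp_time, total_comms_time, num_pes)
```

For example, with 240 time units of computation and 5 units of communication, 24 PEs give an estimated speedup of 12.5 rather than 24, because the communication term stays constant.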

Practical Parallel Processing for Today’s Rendering Challenges -- 31

Task Supplier Process

Simple demand driven task supplier

PROCESS Task_Supplier()
Begin
  remaining_tasks := total_number_of_tasks

  (* initialise all processors with one task *)
  FOR p = 1 TO number_of_PEs
    SEND task TO PE[p]
    remaining_tasks := remaining_tasks - 1

  WHILE results_outstanding DO
    RECEIVE result FROM PE[i]
    IF remaining_tasks > 0 THEN
      SEND task TO PE[i]
      remaining_tasks := remaining_tasks - 1
    ENDIF

End (* Task_Supplier *)
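
The task supplier pseudocode can be explored with a small deterministic simulation. This is only a sketch under stated assumptions: task costs are known to the simulator (a real supplier would not know them), communication time is ignored, and the name `demand_driven_schedule` is invented for this example.

```python
import heapq


def demand_driven_schedule(task_costs, num_pes):
    """Simulate a demand driven task supplier.

    Returns (finish_time, tasks_per_pe). Communication cost is ignored.
    """
    remaining = list(enumerate(task_costs))  # tasks still held by the supplier
    remaining.reverse()                      # pop() then hands them out in order
    tasks_per_pe = [[] for _ in range(num_pes)]
    busy = []  # min-heap of (time at which the PE becomes idle, PE index)

    # Initialise all processors with one task.
    for pe in range(num_pes):
        if remaining:
            task, cost = remaining.pop()
            tasks_per_pe[pe].append(task)
            heapq.heappush(busy, (cost, pe))

    finish_time = 0
    # Each pop models RECEIVE result FROM PE[i]: the earliest idle PE reports in.
    while busy:
        now, pe = heapq.heappop(busy)
        finish_time = max(finish_time, now)
        if remaining:  # IF remaining_tasks > 0 THEN SEND task TO PE[i]
            task, cost = remaining.pop()
            tasks_per_pe[pe].append(task)
            heapq.heappush(busy, (now + cost, pe))
    return finish_time, tasks_per_pe
```

With costs `[5, 1, 1, 1, 1, 1]` on two PEs, the first PE stays busy with the expensive task while the second works through the five cheap ones, and both finish at time 5.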

Practical Parallel Processing for Today’s Rendering Challenges -- 32

Load Balancing

All PEs should complete at the same time

• Some PEs busy with complex tasks

• Other PEs available for easier tasks

• Computation effort of each task unknown

hot spot at end of processing: unbalanced solution

• Any knowledge about hot spots should be used

Practical Parallel Processing for Today’s Rendering Challenges -- 33

Task Definition & Granularity

Computational elements

• Atomic element (ray-object intersection)

sequential problem’s lowest computational element

• Task (trace complete path of one ray)

parallel problem’s smallest computational element

• Task granularity

number of atomic units in one task

Practical Parallel Processing for Today’s Rendering Challenges -- 34

Task Packet

Unit of task distribution

• Informs a PE of which task(s) to perform

• Task packet may include

indication of which task(s) to compute

data items (the PDI and (possibly) ADIs)

• Task packet for ray tracer: one or more rays to be traced

Practical Parallel Processing for Today’s Rendering Challenges -- 35

Algorithmic Dependencies

Algorithm adopted for parallelisation:

• May specify order of task completion

• Dependencies MUST be preserved

• Algorithmic dependencies introduce:

synchronisation points: distinct problem stages

data dependencies: careful data management

Practical Parallel Processing for Today’s Rendering Challenges -- 36

Distributed Task Management

Centralised task supply

• All requests for new tasks to System Controller

bottleneck

• Significant delay in fetching new tasks

Distributed task supply

• task requests handled remotely from System Controller

• spread of communication load across system

• reduced time to satisfy task request

Practical Parallel Processing for Today’s Rendering Challenges -- 37

Preferred Bias Allocation

Combining Data driven & Demand driven

• Balanced data driven

tasks allocated in a predetermined manner

• Demand driven

tasks allocated dynamically on demand

• Preferred Bias: Regions are purely conceptual

enables the exploitation of any coherence

Practical Parallel Processing for Today’s Rendering Challenges -- 38

Conceptual Regions

• task allocation no longer arbitrary

Practical Parallel Processing for Today’s Rendering Challenges -- 39

Data Management

Providing data to the processors

• World model

• Virtual shared memory

• Data manager process

local data cache

requesting & locating data

• Consistency

Practical Parallel Processing for Today’s Rendering Challenges -- 40

Remote Data Fetches

Advanced data management

• Minimising communication latencies

Prefetching

Multi-threading

Profiling

• Multi-stage problems

Practical Parallel Processing for Today’s Rendering Challenges -- 41

Data Requirements

Requirements may be large

• Fit in the local memory of each processor

world model

• Too large for each local memory

distributed data

provide virtual world model/virtual shared memory

Practical Parallel Processing for Today’s Rendering Challenges -- 42

Virtual Shared Memory (VSM)

Providing a conceptual single memory space

• Memory is in fact distributed

• Request is the same for both local & remote data

• Speed of access may be (very) different

Level of VSM support, from higher level to lower level:

System Software    Provided by DM process
Compiler           HPF, ORCA
Operating System   Coherent Paging
Hardware           DDM, DASH, KSR-1

Practical Parallel Processing for Today’s Rendering Challenges -- 43

Consistency

Read/write can result in inconsistencies

• Distributed memory

multiple copies of the same data item

• Updating such a data item

update all copies of this data item

invalidate all other copies of this data item

Practical Parallel Processing for Today’s Rendering Challenges -- 44

Minimising Impact of Remote Data

Failure to find a data item locally: remote fetch

• Time to find data item can be significant

• Processor idle during this time

• Latency difficult to predict

eg depends on current message densities

• Data management must minimise this idle time

Practical Parallel Processing for Today’s Rendering Challenges -- 45

Data Management Techniques

Hiding the Latency

• Overlapping the communication with computation

prefetching

multi-threading

Minimising the Latency

• Reducing the time of a remote fetch

profiling

caching

Practical Parallel Processing for Today’s Rendering Challenges -- 46

Prefetching

Exploiting knowledge of data requests

• A priori knowledge of data requirements

nature of the problem

choice of computational model

• DM can prefetch them (up to some specified horizon)

available locally when required

overlapping communication with computation
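
A minimal sketch of a data manager (DM) that prefetches up to a fixed horizon is given below. It is an illustration only: the class and function names are hypothetical, the list of future requests is assumed to be known a priori, and the prefetch is performed synchronously, so the code models which requests would stall the PE rather than real overlap of communication and computation.

```python
class DataManager:
    """Local cache in front of a (hypothetical) remote fetch function."""

    def __init__(self, fetch_remote, horizon):
        self.fetch_remote = fetch_remote  # callable: item id -> data
        self.horizon = horizon            # how many future requests to prefetch
        self.cache = {}
        self.stalls = 0                   # requests the PE had to wait for

    def prefetch(self, upcoming):
        # Fetch the next `horizon` items so they are available locally
        # when required.
        for item in upcoming[: self.horizon]:
            if item not in self.cache:
                self.cache[item] = self.fetch_remote(item)

    def get(self, item):
        if item not in self.cache:
            self.stalls += 1  # local miss: the task stalls on a remote fetch
            self.cache[item] = self.fetch_remote(item)
        return self.cache[item]


def count_stalls(requests, horizon):
    """Run a request sequence through the DM and count stalled requests."""
    dm = DataManager(lambda item: f"data-{item}", horizon)
    for i, item in enumerate(requests):
        dm.get(item)
        dm.prefetch(requests[i + 1:])
    return dm.stalls
```

With four distinct requests, a horizon of 0 stalls on every request, while a horizon of 1 stalls only on the first.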

Practical Parallel Processing for Today’s Rendering Challenges -- 47

Multi-Threading

Keeping PE busy with useful computation

• Remote data fetch: current task stalled

• Start another task (Processor kept busy)

separate threads of computation (BSP)

• Disadvantages: Overheads

Context switches between threads

Increased message densities

Reduced local cache for each thread

Practical Parallel Processing for Today’s Rendering Challenges -- 48

Results for Multi-Threading

• More than the optimal number of threads reduces performance

• “Cache 22” situation

less local cache → more data misses → more threads

Practical Parallel Processing for Today’s Rendering Challenges -- 49

Profiling

Reducing the remote fetch time

• At the end of computation all data requests are known

if known then can be prefetched

• Monitor data requests for each task

build up a “picture” of possible requirements

• Exploit spatial coherence (with preferred bias allocation)

prefetch those data items likely to be required

Practical Parallel Processing for Today’s Rendering Challenges -- 50

Spatial Coherence

Practical Parallel Processing for Today’s Rendering Challenges -- 51

Schedule

Introduction

Parallel / Distributed Rendering Issues

Classification of Parallel Rendering Systems (Davis)

Practical Applications

Summary / Discussion

Practical Parallel Processing for Today’s Rendering Challenges -- 52

Classification of Parallel Rendering Systems

Parallel rendering performed in many ways

Classification by

• task subdivision

polygon rendering

ray tracing

• hardware

parallel hardware

distributed computing

Practical Parallel Processing for Today’s Rendering Challenges -- 53

Classification by Task Subdivision

Original rendering task broken into smaller pieces to be processed in parallel

Depends on type of rendering

Goals

• maximize parallelism

• minimize overhead, including communication

Practical Parallel Processing for Today’s Rendering Challenges -- 54

Task Subdivision in Polygon Rendering

Rendering many primitives

Polygon rendering pipeline

• geometry processing (transformation, clipping, lighting)

• rasterization (scan conversion, visibility, shading)

Practical Parallel Processing for Today’s Rendering Challenges -- 55

Polygon Rendering Pipeline

[Figure: graphics database traversal → parallel geometry processing units (G) → parallel rasterization units (R) → display]

Practical Parallel Processing for Today’s Rendering Challenges -- 56

Primitive Processing and Sorting

View processing of primitives as sorting problem

• primitives can fall anywhere on or off the screen

Sorting can be done in either software or hardware, but mostly done in hardware

Practical Parallel Processing for Today’s Rendering Challenges -- 57

Primitive Processing and Sorting

Sorting can occur at various places in the rendering pipeline

• during geometry processing (sort-first)

• between geometry processing and rasterization (sort-middle)

• during rasterization (sort-last)

Practical Parallel Processing for Today’s Rendering Challenges -- 58

Sort-first

[Figure: graphics database (arbitrarily partitioned) → redistribute “raw” primitives (pre-transform) → geometry processing (G) → rasterization (R) → display]

Practical Parallel Processing for Today’s Rendering Challenges -- 59

Sort-first Method

Each processor (renderer) assigned a portion of the screen

Primitives arbitrarily assigned to processors

Processors perform enough calculations to send primitives to correct renderers

Processors then perform geometry processing and rasterization for their primitives in parallel
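
The redistribution step of sort-first can be sketched as bucketing primitives by the screen regions their bounding boxes overlap. This is a simplified illustration: it assumes a regular grid of screen tiles (one per renderer) and that a screen-space bounding box is already available for each primitive; the function names are hypothetical.

```python
def tiles_for_bbox(bbox, tile_w, tile_h, tiles_x, tiles_y):
    """Return the (column, row) tiles overlapped by a screen-space bounding box."""
    x0, y0, x1, y1 = bbox
    tx0 = max(0, int(x0 // tile_w))
    ty0 = max(0, int(y0 // tile_h))
    tx1 = min(tiles_x - 1, int(x1 // tile_w))
    ty1 = min(tiles_y - 1, int(y1 // tile_h))
    # Off-screen boxes produce empty ranges and therefore no tiles.
    return [(tx, ty) for ty in range(ty0, ty1 + 1) for tx in range(tx0, tx1 + 1)]


def sort_first(primitives, screen_w, screen_h, tiles_x, tiles_y):
    """Map each tile (renderer) to the names of the primitives it must process."""
    tile_w = screen_w / tiles_x
    tile_h = screen_h / tiles_y
    buckets = {}
    for name, bbox in primitives.items():
        for tile in tiles_for_bbox(bbox, tile_w, tile_h, tiles_x, tiles_y):
            # A primitive overlapping several tiles is sent to each of them,
            # which duplicates effort.
            buckets.setdefault(tile, []).append(name)
    return buckets
```

A primitive that straddles a tile boundary appears in more than one bucket, and an off-screen primitive appears in none.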

Practical Parallel Processing for Today’s Rendering Challenges -- 60

Screen Subdivision

Practical Parallel Processing for Today’s Rendering Challenges -- 61

Sort-first Discussion

+ Communication costs can be kept low

- Duplication of effort if primitives fall into more than one screen area

- Load imbalance if primitives concentrated

- Very few, if any, sort-first renderers built

Practical Parallel Processing for Today’s Rendering Challenges -- 62

Sort-middle

[Figure: graphics database (arbitrarily partitioned) → geometry processing (G) → redistribute screen-space primitives → rasterization (R) → display]

Practical Parallel Processing for Today’s Rendering Challenges -- 63

Sort-middle Method

Primitives arbitrarily assigned to renderers

Each renderer performs geometry processing on its primitives

Primitives then redistributed to rasterizers according to screen region

Practical Parallel Processing for Today’s Rendering Challenges -- 64

Sort-middle Discussion

+ Natural breaking point in graphics pipeline

- Load imbalance if primitives concentrated in particular screen regions

+ Several successful hardware implementations

• Pixel-Planes 5

• SGI RealityEngine

Practical Parallel Processing for Today’s Rendering Challenges -- 65

Sort-last

[Figure: graphics database (arbitrarily partitioned) → geometry processing (G) → rasterization (R) → redistribute pixels, samples, or fragments (compositing) → display]

Practical Parallel Processing for Today’s Rendering Challenges -- 66

Sort-last Method

Primitives arbitrarily distributed to renderers

Each renderer computes pixel values for its primitives

Pixel values are then sent to processors according to screen location

Rasterizers perform visibility and compositing
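
The final visibility and compositing step of sort-last can be illustrated with a depth comparison per pixel. The sketch assumes each renderer delivers a partial image as a dictionary from pixel coordinates to a `(depth, color)` pair, with smaller depth meaning closer to the viewer; this representation and the name `composite` are assumptions for the example.

```python
def composite(partial_images):
    """Merge partial images by keeping the nearest fragment at every pixel."""
    final = {}
    for image in partial_images:
        for pixel, (depth, color) in image.items():
            # Visibility test: keep the fragment with the smallest depth.
            if pixel not in final or depth < final[pixel][0]:
                final[pixel] = (depth, color)
    return final
```

Every renderer may contribute a fragment for every pixel, which is why pixel traffic can be high in this scheme.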

Practical Parallel Processing for Today’s Rendering Challenges -- 67

Sort-last Discussion

+ Less prone to load imbalance

- Pixel traffic can be high

+ Some working systems

• Denali

Practical Parallel Processing for Today’s Rendering Challenges -- 68

Task Subdivision in Ray Tracing

Ray tracing often prohibitively expensive on single processor

Prime candidate for parallelization

• each pixel can be rendered independently

Processing easily subdivided

• image space subdivision

• object space subdivision

• object subdivision

Practical Parallel Processing for Today’s Rendering Challenges -- 69

Image Space Subdivision

Practical Parallel Processing for Today’s Rendering Challenges -- 70

Image Space Subdivision Discussion

+ Straightforward

+ High parallelism possible

- Entire scene database must reside on each processor

• need adequate storage

+ Low processor communication

Practical Parallel Processing for Today’s Rendering Challenges -- 71

Image Space Subdivision Discussion

- Load imbalance possible

• screen space may be further subdivided

+ Used in many parallel ray tracers

• works better with MIMD machines

• distributed computing environments
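
Image space subdivision can be sketched as cutting the image into tiles and rendering each tile independently. In the example below, `shade` is a hypothetical stand-in for tracing the ray through a pixel, and Python threads merely stand in for the processors; the point is only that tiles need no communication with each other once every worker has the whole scene.

```python
from concurrent.futures import ThreadPoolExecutor


def image_tiles(width, height, tile):
    """Split the image into (x0, y0, x1, y1) tiles of at most tile x tile pixels."""
    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in range(0, height, tile)
            for x in range(0, width, tile)]


def shade(x, y):
    """Placeholder for tracing the ray through pixel (x, y)."""
    return (x * 31 + y * 17) % 256


def render_tile(tile):
    x0, y0, x1, y1 = tile
    # Each pixel is computed independently of every other pixel.
    return {(x, y): shade(x, y) for y in range(y0, y1) for x in range(x0, x1)}


def render(width, height, tile, workers=4):
    """Render all tiles with a pool of workers and merge the results."""
    image = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(render_tile, image_tiles(width, height, tile)):
            image.update(part)
    return image
```

Using smaller tiles than there are workers' shares gives the further subdivision mentioned above, at the cost of more task hand-outs.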

Practical Parallel Processing for Today’s Rendering Challenges -- 72

Object Space Subdivision

3-D object space divided into voxels

Each voxel assigned to a processor

Rays are passed from processor to processor as voxel space is traversed

Practical Parallel Processing for Today’s Rendering Challenges -- 73

Object Space Subdivision Discussion

+ Each processor needs only scene information associated with its voxel(s)

- Rays must be tracked through voxel space

+ Load balance good

- Communication can be high

+ Some successful systems

Practical Parallel Processing for Today’s Rendering Challenges -- 74

Object Partitioning

Each object in the scene is assigned to a processor

Rays passed as messages between processors

Processors check for intersection

Practical Parallel Processing for Today’s Rendering Challenges -- 75

Object Partitioning Discussion

+ Load balancing good

- Communication high due to ray message traffic

- Fewer implementations

Practical Parallel Processing for Today’s Rendering Challenges -- 76

Schedule

Introduction

Parallel / Distributed Rendering Issues

Classification of Parallel Rendering Systems

Practical Applications

• Rendering at Clemson / Distributed Computing and Spatial/Temporal Coherence (Davis)

• Interactive Ray Tracing

• Parallel Rendering and the Quest for Realism: The Kilauea Massively Parallel Ray Tracer

Summary / Discussion

Practical Parallel Processing for Today’s Rendering Challenges -- 77

Practical Experiences at Clemson

Problems with Rendering

Current Resources

Deciding on a Solution

A New Render Farm

Practical Parallel Processing for Today’s Rendering Challenges -- 78

A Demand for Rendering

Computer Animation course

3 SIGGRAPH animation submissions

• render over semester break

Practical Parallel Processing for Today’s Rendering Challenges -- 79

Current Resources

dedicated lab

• 8 SGI O2s (R12000, 384 MB)

general-purpose lab

• 4 SGI O2s

shared lab

• dual-pipe Onyx2 (8 R12000, 8 GB)

• 10 SGI O2s (R12000, 256 MB)

offices

• 5 SGI O2s

Practical Parallel Processing for Today’s Rendering Challenges -- 80

Resource Problems

Rendering prohibits interactive sessions

Little organized control over resources

• users must be self-monitoring

m renders on n machines: 1 render on n/m machines

Disk space

Cross-platform distributed rendering to PCs problematic

• security (rsh)

• distributed rendering software

• directory paths

Practical Parallel Processing for Today’s Rendering Challenges -- 81

Short-term Solutions

Distributed rendering restricted to late night

Resources partitioned

Practical Parallel Processing for Today’s Rendering Challenges -- 82

Problems with Maya

video

Traditional distributed computing problems

• dropped frames

• incomplete frames

• tools developed

Practical Parallel Processing for Today’s Rendering Challenges -- 83

Problems with Maya

Tools (DropCheck)

Practical Parallel Processing for Today’s Rendering Challenges -- 84

Problems with Maya

Tools (Load Scan)

Practical Parallel Processing for Today’s Rendering Challenges -- 85

Problems with Maya

Animation inconsistencies

• next slide

Some frames would not render

Particle system inconsistencies

Practical Parallel Processing for Today’s Rendering Challenges -- 86

Problems with Maya

Practical Parallel Processing for Today’s Rendering Challenges -- 87

Rendering Tips

Layering

Practical Parallel Processing for Today’s Rendering Challenges -- 88

Rendering Tips

Layering

Practical Parallel Processing for Today’s Rendering Challenges -- 89

Deciding on a Solution - RenderDrive

RenderDrive by ART (Advanced Rendering Technology)

• network appliance for ray tracing

• 16-48 specialized processors

• claims speedups of 15-40 over Pentium III

• 768MB to 1.5GB memory

• 4GB hard disk cache

Practical Parallel Processing for Today’s Rendering Challenges -- 90

Deciding on a Solution - RenderDrive

• plug-in interface to Maya

• Renderman ray tracer

• $15K - $25K

Practical Parallel Processing for Today’s Rendering Challenges -- 91

Deciding on a Solution - PCs

Network of PCs as a render farm

10 PCs each with 1.4GHz, 1GB memory, and 40GB hard drive

Maya will run under Windows 2000 or Linux (Maya 4.0)

Distributed rendering software not included for Windows 2000

Practical Parallel Processing for Today’s Rendering Challenges -- 92

Deciding on a Solution - PCs Win

RenderDrive had some unusual anomalies

Interactive capabilities

Scan-line or ray tracing

Distributed rendering software may be included

Problems with security still exist

• shared file system

Practical Parallel Processing for Today’s Rendering Challenges -- 93

Schedule

Introduction

Parallel / Distributed Rendering Issues

Classification of Parallel Rendering Systems

Practical Applications

• Rendering at Clemson / Distributed Computing and Spatial/Temporal Coherence (Davis)

• Interactive Ray Tracing

• Parallel Rendering and the Quest for Realism: The Kilauea Massively Parallel Ray Tracer

Summary / Discussion

Practical Parallel Processing for Today’s Rendering Challenges -- 94

Agenda

Background

Temporal Depth-Buffer

Frame Coherence Algorithm

Parallel Frame Coherence Algorithm

Practical Parallel Processing for Today’s Rendering Challenges -- 95

Background - Ray Tracing

Closest to physical model of light

High cost in terms of time / complexity

Practical Parallel Processing for Today’s Rendering Challenges -- 96

Background - Frame Coherence

Frame coherence

• those pixels that do not change from one frame to the next

• derived from object and temporal coherence

We should not have to re-compute those pixels whose values will not change

• writing pixels to frame files

Practical Parallel Processing for Today’s Rendering Challenges -- 97

Background - Test Animation

Glass Bounce (60 frames at 320x240; 5 obj)

Practical Parallel Processing for Today’s Rendering Challenges -- 98

Background - Frame Coherence

Practical Parallel Processing for Today’s Rendering Challenges -- 99

Previous Work

Frame coherence

• moving camera/static world [Hubschman and Zucker 81]

• estimated frames [Badt 88]

• stereoscopic pairs [Adelson and Hodges 93/95]

• 4D bounding volumes [Glassner 88]

• voxels and ray tracking [Jevans 92]

• incremental ray tracing [Murakami 90]

Practical Parallel Processing for Today’s Rendering Challenges -- 100

Previous Work (cont.)

Distributed computing

• Alias and 3D Studio

• most major productions starting with Toy Story [Henne 96]

Practical Parallel Processing for Today’s Rendering Challenges -- 101

Goals

Render exactly the same set of frames in much less time

Work in conjunction with other optimization techniques

Run on a variety of platforms

Extend a currently popular ray tracer (POV-Ray) to allow for general use

Practical Parallel Processing for Today’s Rendering Challenges -- 102

Temporal Depth-Buffer

Similar to traditional z-buffer

For each pixel, store a temporal depth in frame units

[Figure: a moving object shown at its positions in frames 1, 2, and 3, with the resulting temporal depth-buffer:]

5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 3 3 3 5 5 5
5 5 5 5 5 5 5 3 3 3 5 5 5
5 5 5 5 5 2 2 2 3 3 5 5 5
5 5 5 5 5 2 2 2 3 3 5 5 5
5 5 5 5 2 2 2 2 5 5 5 5 5
5 5 5 5 2 2 2 2 5 5 5 5 5
5 5 5 5 2 2 2 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5

Practical Parallel Processing for Today’s Rendering Challenges -- 103

Frame Coherence Algorithm

Practical Parallel Processing for Today’s Rendering Challenges -- 104

Frame Coherence Algorithm

Practical Parallel Processing for Today’s Rendering Challenges -- 105

Frame Coherence Algorithm

Identify volume within 3D object space where movement occurs

Divide volume uniformly into voxels

For each voxel, create a list of frame numbers in which changing objects inhabit this voxel

Practical Parallel Processing for Today’s Rendering Challenges -- 106

Frame Coherence Algorithm

In each frame, track rays through voxels for each pixel

From the voxels traversed, find the one with the lowest frame number

Record that number in the temporal depth-buffer

Practical Parallel Processing for Today’s Rendering Challenges -- 107

Frame Coherence Algorithm

for each frame of the animation
  for each pixel that needs to be computed for this frame
    trace the rays for this pixel
    for each voxel that any of these rays intersect
      get the next frame number to compute
    set the t-buffer entry to the lowest frame number found
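
The loop above can be sketched in Python under simplifying assumptions. Here `voxel_frames` maps a voxel to the sorted list of frame numbers in which its contents change, and `voxels_hit(pixel, frame)` is a hypothetical callback returning the voxels traversed by that pixel's rays; neither name comes from the course, and actual ray tracing is replaced by recording which pixels would be traced.

```python
import bisect


def next_change(frames, current):
    """First frame number in the sorted list `frames` that is > current, or None."""
    i = bisect.bisect_right(frames, current)
    return frames[i] if i < len(frames) else None


def render_animation(num_frames, pixels, voxel_frames, voxels_hit):
    """Return the (frame, pixel) pairs that actually need to be ray traced."""
    t_buffer = {p: 1 for p in pixels}  # every pixel is computed in frame 1
    traced = []
    for frame in range(1, num_frames + 1):
        for p in pixels:
            if t_buffer[p] > frame:
                continue  # pixel is unchanged: reuse the previous frame's value
            traced.append((frame, p))  # stands in for tracing the rays
            candidates = [next_change(voxel_frames.get(v, []), frame)
                          for v in voxels_hit(p, frame)]
            candidates = [c for c in candidates if c is not None]
            # Lowest frame number found; if no voxel changes again, the
            # pixel is never recomputed (entry set past the last frame).
            t_buffer[p] = min(candidates, default=num_frames + 1)
    return traced
```

In a five-frame run where one pixel sees a voxel that changes in frames 2 and 3 and another pixel sees only static voxels, the first pixel is traced in frames 1, 2, and 3 and the second only in frame 1.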

Practical Parallel Processing for Today’s Rendering Challenges -- 108

Frame Coherence Algorithm

[Figure: a moving object shown at its positions in frames 1, 2, and 3, with the resulting temporal depth-buffer:]

5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 3 3 3 5 5 5
5 5 5 5 5 5 5 3 3 3 5 5 5
5 5 5 5 5 2 2 2 3 3 5 5 5
5 5 5 5 5 2 2 2 3 3 5 5 5
5 5 5 5 2 2 2 2 5 5 5 5 5
5 5 5 5 2 2 2 2 5 5 5 5 5
5 5 5 5 2 2 2 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5

Practical Parallel Processing for Today’s Rendering Challenges -- 109

Voxel Volume

Uniform voxel spatial subdivision

Voxel can be non-cubical

Ways to determine voxel volume

• user-supplied

• pre-processing phase

active voxel marking

in distributed environment, done by master or slave or both

Practical Parallel Processing for Today’s Rendering Challenges -- 110

Frame Coherence Example

Practical Parallel Processing for Today’s Rendering Challenges -- 111

Test Animation

Pool Shark (620 frames at 640x480; 174 obj)

Practical Parallel Processing for Today’s Rendering Challenges -- 112

Test Animations - Problem

Bounding box problem

Practical Parallel Processing for Today’s Rendering Challenges -- 113

Results

Measure                        Standard algorithm   Frame coherence algorithm   Ratio of frame coherence to standard   Speedup
total number of rays           47,841,269           13,259,380                  0.27                                   --
total parse time               0:48                 1:30                        1.88                                   --
first frame rendering time     6:34                 8:49                        1.34                                   0.75
average frame rendering time   7:15                 3:05                        0.43                                   2.33
total frame rendering time     5:26:55              2:19:51                     0.43                                   2.33

Practical Parallel Processing for Today’s Rendering Challenges -- 114

Frame Coherence Discussion

Localized movement can have global effects

Performance depends on both the number and complexity of recomputed pixels

Issues

• overhead

• antialiasing

• motion blur

Practical Parallel Processing for Today’s Rendering Challenges -- 115

Temporal Depth-Buffer Discussion

Uses less memory than other methods

Simple

Can be used with other algorithms

Practical Parallel Processing for Today’s Rendering Challenges -- 116

Parallel Frame Coherence Algorithm

Distributed computing environment

1-8 Sun Sparc Ultra 5 processors running at 270 MHz

Coarse-grain parallelism

Load balancing

• divide work among processors

• keep data together for frame coherence

Practical Parallel Processing for Today’s Rendering Challenges -- 117

Load Balancing

Image space subdivision

• each processor computes a subregion for the entire length of the run

Recursively subdivide subsequences to keep processors busy

Practical Parallel Processing for Today’s Rendering Challenges -- 118

Screen Subdivision

Practical Parallel Processing for Today’s Rendering Challenges -- 119

Load Balancing

Coarse bin packing: find block with smallest number of computed frames

Keep statistics on average first frame time and average coherent frame time

Find a hole in the sequence

Leave some free frames before new start

Practical Parallel Processing for Today’s Rendering Challenges -- 120

Load Balancing Example

first frame time = 30, avg frame time = 15

current processor speed = 4, new processor speed = 3

[Figure: frame sequence 10 ... 20 with markers for hole start, tmp start, start frame, and hole end]

$$\text{tmp start} = \text{hole start} + \frac{\text{first frame time}}{\text{avg frame time}} \cdot \frac{\text{current processor speed}}{\text{new processor speed}} = 11 + \frac{30}{15} \cdot \frac{4}{3} = 11 + \frac{8}{3} \approx 11 + 3 = 14$$

$$\text{start frame} = \text{tmp start} + \frac{\text{hole end} - \text{tmp start} + 1}{2} \cdot \frac{\text{current processor speed}}{\text{new processor speed}} = 14 + \frac{19 - 14 + 1}{2} \cdot \frac{4}{3} = 14 + \frac{6}{2} \cdot \frac{4}{3} = 14 + 4 = 18$$
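The arithmetic of this example can be sketched in Python. This is a reading of the slide's worked numbers, not a confirmed implementation: it assumes the hole runs from frame 11 to frame 19, that the first-frame cost is expressed in units of average frames and scaled by the ratio of current to new processor speed, and that results are rounded to the nearest whole frame. The function names are hypothetical.

```python
from fractions import Fraction


def tmp_start(hole_start, first_frame_time, avg_frame_time, current_speed, new_speed):
    """Skip the frames the current processor is expected to finish while the
    new processor is still busy with its (expensive) first frame."""
    offset = Fraction(first_frame_time, avg_frame_time) * Fraction(current_speed, new_speed)
    return hole_start + round(offset)


def start_frame(tmp, hole_end, current_speed, new_speed):
    """Split the remaining hole, leaving the faster current processor more frames."""
    remaining = hole_end - tmp + 1
    offset = Fraction(remaining, 2) * Fraction(current_speed, new_speed)
    return tmp + round(offset)


if __name__ == "__main__":
    t = tmp_start(hole_start=11, first_frame_time=30, avg_frame_time=15,
                  current_speed=4, new_speed=3)
    print(t, start_frame(t, hole_end=19, current_speed=4, new_speed=3))
```

Exact fractions are used so that the rounding step is not affected by floating-point error.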

Practical Parallel Processing for Today’s Rendering Challenges -- 121

Results - Parallel Frame Coherence

metric | standard algorithm | parallel with 8 machines | speedup | parallel frame coherence with 8 machines | speedup
total number of rays | 47,841,269 | 49,161,582 | 1.03 | 18,299,347 | 0.38
total parse time | 0:48 | -- | -- | -- | --
first frame rendering time | 6:34 | -- | -- | -- | --
average frame rendering time | 7:15 | 1:05 | 6.7 | :34 | 12.9
total frame rendering time | 5:26:55 | 49:49 | 6.6 | 25:47 | 12.9

Practical Parallel Processing for Today’s Rendering Challenges -- 122

Results

metric | standard algorithm | frame coherence algorithm | ratio of frame coherence to standard | speedup
total number of rays | 15,731,252 | 6,386,883 | 0.41 | --
total parse time | 0:11 | 0:19 | 1.73 | --
first frame rendering time | 2:39 | 3:19 | 1.25 | 0.80
average frame rendering time | 2:42 | 1:39 | 0.61 | 1.64
total frame rendering time | 2:42:26 | 1:39:02 | 0.61 | 1.64

number of processors | total number of rays | ratio to single processor | average frame rendering time | total rendering time | speedup
1 | 15,731,252 | 1.00 | 2:42 | 2:42:26 | 1.00
2 | 5,890,290 | 0.37 | :38 | 38:25 | 4.23
4 | 5,913,926 | 0.38 | :22 | 22:12 | 7.31
8 | 6,063,338 | 0.39 | :16 | 16:28 | 9.86
12 | 6,086,781 | 0.39 | :12 | 11:37 | 13.98
16 | 6,323,673 | 0.40 | :11 | 10:50 | 14.99

Practical Parallel Processing for Today’s Rendering Challenges -- 123

Another Test Animation

Soda Worship (60 frames at 160x120; 839 obj)

Practical Parallel Processing for Today’s Rendering Challenges -- 124

Another Test Animation

Practical Parallel Processing for Today’s Rendering Challenges -- 125

Results

metric | standard algorithm | frame coherence algorithm | ratio of frame coherence to standard | speedup
total number of rays | 44,454,548 | 19,944,939 | 0.45 | --
total parse time | 3:06 | 3:47 | 1.04 | --
first frame rendering time | 27:54 | 29:14 | 1.07 | 0.94
average frame rendering time | 28:07 | 15:07 | 0.54 | 1.86
total frame rendering time | 28:10:10 | 15:11:27 | 0.54 | 1.85

number of processors | total number of rays | ratio to single processor | average frame rendering time | total rendering time | speedup
1 | 44,454,548 | 1.00 | 28:10 | 28:10:10 | 1.00
2 | 22,163,526 | 0.50 | 15:11 | 11:48:11 | 2.39
4 | 22,286,422 | 0.50 | 7:45 | 4:27:26 | 6.32
8 | 22,409,023 | 0.50 | 3:58 | 2:16:34 | 12.38
12 | 23,125,140 | 0.52 | 2:38 | 1:31:05 | 18.56
16 | 23,180,741 | 0.52 | 2:02 | 1:12:15 | 23.39

Practical Parallel Processing for Today’s Rendering Challenges -- 126

Results Discussion

Good speedup

Multiplicative speedup with both (frame coherence and distributed computing)

Speedup limitations

• voxel approximation

• writing pixels to frame files (communication)
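A short worked example of what "multiplicative" means here, using the Pool Shark figures reported earlier in this section (frame coherence alone: 2.33; 8 machines alone: 6.6; both together: 12.9). The helper names are hypothetical, and the comparison is only a sketch of how the ideal product relates to the measured value.

```python
def ideal_combined_speedup(coherence_speedup, parallel_speedup):
    """If the two techniques were fully independent, their speedups would multiply."""
    return coherence_speedup * parallel_speedup


def fraction_of_ideal(measured, coherence_speedup, parallel_speedup):
    """How much of the ideal product the measured combined speedup reaches."""
    return measured / ideal_combined_speedup(coherence_speedup, parallel_speedup)


if __name__ == "__main__":
    ideal = ideal_combined_speedup(2.33, 6.6)
    print(f"ideal product: {ideal:.1f}")
    print(f"measured 12.9 is {fraction_of_ideal(12.9, 2.33, 6.6):.0%} of the ideal")
```

The gap between the product and the measured value is consistent with the limitations listed on the slide (voxel approximation and the cost of writing pixels to frame files).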

Practical Parallel Processing for Today’s Rendering Challenges -- 127

Conclusions

Frame coherence algorithm combined with distributed computing provides good speedup

Algorithm scales well

Techniques are useful and accessible to a wide variety of users

Benefits depend on inherent properties of the animation

Practical Parallel Processing for Today’s Rendering Challenges -- 128

Shameless Advertisement

Masters of Fine Arts in Computing (MFAC)

• special effects and animation courses

• two year program

Clemson Computer Animation Festival in Fall 2002

Practical Parallel Processing for Today’s Rendering Challenges -- 129

Schedule

Introduction

Parallel / Distributed Rendering Issues

Classification of Parallel Rendering Systems

Practical Applications

• Rendering at Clemson / Distributed Computing and Spatial/Temporal Coherence

• Interactive Ray Tracing (Reinhard)

• Parallel Rendering and the Quest for Realism: The Kilauea Massively Parallel Ray Tracer

Summary / Discussion

Practical Parallel Processing for Today’s Rendering Challenges -- 130

Overview

Introduction

Interactive ray tracer

Animation and interactive ray tracing

Sample re-use techniques

Introduction

Practical Parallel Processing for Today’s Rendering Challenges -- 132

Interactive Ray Tracing

Renders effects not available using other rendering algorithms

Feasible on high-end supercomputers provided suitable hardware is chosen

Scales sub-linearly in scene complexity

Scales almost linearly in number of processors

Practical Parallel Processing for Today’s Rendering Challenges -- 133

Hardware Choices

Shared memory vs. distributed memory

Latency and throughput for pixel communication

Choice: shared memory

• This section of the course focuses on SGI Origin series supercomputers

Practical Parallel Processing for Today’s Rendering Challenges -- 134

Shared Memory

Shared address space

Physically distributed memory

ccNUMA architecture

Practical Parallel Processing for Today’s Rendering Challenges -- 135

SGI Origin 2000 Architecture

Practical Parallel Processing for Today’s Rendering Challenges -- 136

Implications

ccNUMA machines are easy to program, but it is more difficult to generate efficient code

Memory mapping and processor placement may be important for certain applications

Topic returns later in this course

Practical Parallel Processing for Today’s Rendering Challenges -- 137

Overview

Introduction

Interactive ray tracer

Animation and interactive ray tracing

Sample re-use techniques

Interactive Ray Tracing

Practical Parallel Processing for Today’s Rendering Challenges -- 139

Basic Algorithm

Master-slave configuration

Master (display thread) displays results and farms out ray tasks

Slaves produce new rays

Task size reduced towards end of each frame

• Load balancing

• Cache coherence
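One way to picture "task size reduced towards end of each frame" is a schedule in which each task takes a fixed fraction of the pixels still unassigned. The slide does not give the actual rule, so the sketch below is an assumption in the spirit of guided self-scheduling; `task_sizes`, `min_size`, and `divisor` are illustrative names.

```python
def task_sizes(total_pixels, workers, min_size=8, divisor=2):
    """Return a list of task sizes that shrink as the frame nears completion.

    Large early tasks keep scheduling overhead low and favour cache coherence;
    small late tasks let all workers finish at nearly the same time.
    """
    sizes = []
    remaining = total_pixels
    while remaining > 0:
        size = max(min_size, remaining // (divisor * workers))
        size = min(size, remaining)  # never hand out more than is left
        sizes.append(size)
        remaining -= size
    return sizes


if __name__ == "__main__":
    sizes = task_sizes(total_pixels=640 * 480, workers=8)
    print(sizes[:5], "...", sizes[-5:])
```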

Practical Parallel Processing for Today’s Rendering Challenges -- 140

Tracing a Single Ray

Use spatial subdivisions for ray acceleration (assumed familiar)

Use grid or bounding volume hierarchy

Could be optimized further, but good results have been obtained with these acceleration structures

Efficiency mainly due to low level optimization

Practical Parallel Processing for Today’s Rendering Challenges -- 141

Low Level Optimization

Ray tracing in general:

• Ray coherence: neighboring rays tend to intersect the same objects

• Cache coherence: objects previously intersected are likely to still reside in cache for current ray

• Memory access patterns are important (next slide)

Practical Parallel Processing for Today’s Rendering Challenges -- 142

Memory Access

On SGI Origin series computers:

• Memory allocated for a specific process may be located elsewhere in the machine, so reading memory may be expensive

• Processes may migrate to other processors when executing a system call, so the whole cache becomes invalidated; previously local memory may now be remote and more expensive to access

Practical Parallel Processing for Today’s Rendering Challenges -- 143

Memory Access (2)

Pin down processes to processors

Allocate memory close to where the processes run that will use this memory

Use sysmp and sproc for processor placement

Use mmap or dplace for memory placement

Practical Parallel Processing for Today’s Rendering Challenges -- 144

Further Low Level Optimizations

Know the architecture you work on (Appendix III.A in the course notes)

Use profiling to find expensive bits of code and cache misses (Appendix III.B in the course notes)

Use padding to fit important data structures on a single cache line

Practical Parallel Processing for Today’s Rendering Challenges -- 145

Frameless Rendering

Display pixel as soon as it is computed

No concept of frames

• Perceptually preferable

• Equivalent of a full frame takes longer to compute

• Less efficient exploitation of cache coherence

• This alternative will return later in this course

Practical Parallel Processing for Today’s Rendering Challenges -- 146

Overview

Introduction

Interactive ray tracer

Animation and interactive ray tracing

Sample re-use techniques

Practical Parallel Processing for Today’s Rendering Challenges -- 147

Animation and Interactive Ray Tracing

Practical Parallel Processing for Today’s Rendering Challenges -- 148

Why Animation?

Once interactive rendering is feasible, walk-through is not enough

Desire to manipulate the scene interactively

Render preprogrammed animation paths

Practical Parallel Processing for Today’s Rendering Challenges -- 149

Issues to Be Addressed

What stops us from animating objects?

• Answer: spatial subdivisions

• Acceleration structures normally built during pre-processing

• They assume objects are stationary

Practical Parallel Processing for Today’s Rendering Challenges -- 150

Possible Solutions

Target applications that require a small number of objects to be manipulated/animated

• Render these objects separately

Traversal cost will be linear in the number of animated objects

Only feasible for extremely small number of objects

Practical Parallel Processing for Today’s Rendering Challenges -- 151

Possible Solutions (2)

Target small number of manipulated or animated objects

• Modify existing spatial subdivisions

For each frame delete object from data structure

Update object’s coordinates

Re-insert object into data structure

• This is our preferred approach

Practical Parallel Processing for Today’s Rendering Challenges -- 152

Spatial Subdivision

Requirements

• Basic operations such as insertion and deletion of objects should be rapid

• Should cope with user manipulation that causes the extent of the scene to grow

Practical Parallel Processing for Today’s Rendering Challenges -- 153

Subdivisions Investigated

Regular grid

Hierarchical grid

• Borrows from octree spatial subdivision

• In our case this is a full tree: all leaf nodes are at the same depth

Both acceleration structures are investigated in the next few slides

Practical Parallel Processing for Today’s Rendering Challenges -- 154

Regular Grid Data Structure

We assume familiarity with spatial subdivisions!

Practical Parallel Processing for Today’s Rendering Challenges -- 155

Object Insertion Into Grid

Compute bounding box of object

Compute overlap of bounding box with grid voxels

Object is inserted into overlapping voxels

Object deletion works similarly
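The insertion and deletion steps can be sketched as follows. This is a minimal illustration, not the course's implementation: it assumes axis-aligned bounding boxes, cubical voxels, and a dictionary keyed by integer voxel index; the class name `Grid` and its methods are hypothetical. Moving an object is expressed as delete, update, re-insert.

```python
import math


class Grid:
    """Uniform grid storing, per voxel, the ids of the objects overlapping it."""

    def __init__(self, origin, voxel_size):
        self.origin = origin
        self.voxel_size = voxel_size
        self.voxels = {}  # (ix, iy, iz) -> set of object ids

    def _overlapped(self, bbox_min, bbox_max):
        """Yield the indices of all voxels overlapped by the bounding box."""
        lo = [math.floor((bbox_min[i] - self.origin[i]) / self.voxel_size) for i in range(3)]
        hi = [math.floor((bbox_max[i] - self.origin[i]) / self.voxel_size) for i in range(3)]
        for x in range(lo[0], hi[0] + 1):
            for y in range(lo[1], hi[1] + 1):
                for z in range(lo[2], hi[2] + 1):
                    yield (x, y, z)

    def insert(self, obj_id, bbox_min, bbox_max):
        for key in self._overlapped(bbox_min, bbox_max):
            self.voxels.setdefault(key, set()).add(obj_id)

    def delete(self, obj_id, bbox_min, bbox_max):
        for key in self._overlapped(bbox_min, bbox_max):
            cell = self.voxels.get(key)
            if cell is not None:
                cell.discard(obj_id)
                if not cell:
                    del self.voxels[key]  # drop empty voxels

    def move(self, obj_id, old_bbox, new_bbox):
        """Delete from the old voxels, then re-insert at the new position."""
        self.delete(obj_id, *old_bbox)
        self.insert(obj_id, *new_bbox)


if __name__ == "__main__":
    grid = Grid(origin=(0.0, 0.0, 0.0), voxel_size=1.0)
    old = ((0.2, 0.2, 0.2), (1.5, 0.8, 0.8))
    grid.insert("sphere", *old)
    grid.move("sphere", old, ((3.1, 0.1, 0.1), (3.4, 0.4, 0.4)))
    print(grid.voxels)
```

The cost of both operations is proportional to the number of voxels the bounding box overlaps, so large objects are more expensive to update than small ones.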

Practical Parallel Processing for Today’s Rendering Challenges -- 156

Extensions to Regular Grid

Dealing with expanding scenes requires

• Modifications to object insertion/deletion

• Ray traversal

Practical Parallel Processing for Today’s Rendering Challenges -- 157

Extensions to Regular Grid (2)

Practical Parallel Processing for Today’s Rendering Challenges -- 158

Features of New Grid Data Structure

We call this an ‘Interactive Grid’

• Straightforward object insertion/deletion

• Deals with expanding scenes

• Insertion cost depends on relative object size

• Traversal cost somewhat higher than for regular grid

Practical Parallel Processing for Today’s Rendering Challenges -- 159

Hierarchical Grid

Objectives

• Reduce insertion/deletion cost for larger objects

• Retain advantages of interactive grid

Practical Parallel Processing for Today’s Rendering Challenges -- 160

Hierarchical Grid (2)

Practical Parallel Processing for Today’s Rendering Challenges -- 161

Hierarchical Grid (3)

Build full octree with all leaf nodes at the same level

• Allow objects to reside in leaf nodes as well as in nodes higher up in the hierarchy

• Each object can be inserted into one or more voxels of at most one level in the hierarchy

• Small objects reside in leaf nodes, large objects reside elsewhere in the hierarchy
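A small sketch of how a level might be chosen for an object. The slides state only that small objects go into leaf nodes and large objects higher up; the concrete rule used here (the deepest level whose voxels are at least as large as the object) is an assumption, and `choose_level` is a hypothetical name.

```python
def choose_level(object_extent, scene_extent, depth):
    """Pick one level of a full octree for an object.

    Level 0 is the root (one voxel spanning the scene); level `depth` holds the
    leaves. Voxels at level l have edge length scene_extent / 2**l. The object
    is placed at the deepest level whose voxels are still at least as large as
    the object, so it overlaps only a few voxels of that level.
    """
    level = 0
    while level < depth and scene_extent / 2 ** (level + 1) >= object_extent:
        level += 1
    return level


if __name__ == "__main__":
    for extent in (0.5, 3.0, 20.0):
        print(extent, "->", choose_level(extent, scene_extent=16.0, depth=4))
```

With this rule the number of voxels touched per insertion or deletion stays small regardless of object size, which matches the stated objective of reducing update cost for larger objects.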

Practical Parallel Processing for Today’s Rendering Challenges -- 162

Hierarchical Grid (4)

Features:

• Deals with expanding scenes similar to interactive grid

• Reduced insertion/deletion cost

• Traversal cost somewhat higher than interactive grid

Practical Parallel Processing for Today’s Rendering Challenges -- 163

Test Scenes

Practical Parallel Processing for Today’s Rendering Challenges -- 164

Video

Practical Parallel Processing for Today’s Rendering Challenges -- 165

Measurements

We measure

• Traversal cost of

Interactive grid

Hierarchical grid

Regular grid

• Object update rates of

Interactive grid

Hierarchical grid

Practical Parallel Processing for Today’s Rendering Challenges -- 166

Framerate vs. Grid Size (Sphereflake)

Practical Parallel Processing for Today’s Rendering Challenges -- 167

Framerate vs. Grid Size (Triangles)

Practical Parallel Processing for Today’s Rendering Challenges -- 168

Framerate Over Time (Sphereflake)

Practical Parallel Processing for Today’s Rendering Challenges -- 169

Framerate Over Time (Triangles)

Practical Parallel Processing for Today’s Rendering Challenges -- 170

Conclusions

Interactive manipulation of ray traced scenes is both desirable and feasible using these modifications to grid and hierarchical grids

Slight impact on traversal cost (More results available in course notes)

Practical Parallel Processing for Today’s Rendering Challenges -- 171

Overview

Introduction

Interactive ray tracer

Animation and interactive ray tracing

Sample re-use techniques

Sample Re-use Techniques

Practical Parallel Processing for Today’s Rendering Challenges -- 173

Brute Force Ray Tracing

Enables interactive ray tracing

Does not allow large image sizes

Does not scale to scenes with high depth complexity

Practical Parallel Processing for Today’s Rendering Challenges -- 174

Solution

Exploit temporal coherence

Re-use results from previous frames

Practical Parallel Processing for Today’s Rendering Challenges -- 175

Practical Solutions

Tapestry (Simmons et al. 2000)

• Focuses on complex lighting simulation

Render cache (Walter et al. 1999)

• Addresses scene complexity issues

• Explained next

Parallel render cache (Reinhard et al. 2000)

• Builds on Walter’s render cache

• Explained next

Practical Parallel Processing for Today’s Rendering Challenges -- 176

Render Cache Algorithm

Basic setup

• One front-end for:

Displaying pixels

Managing previous results

• Parallel back-end for:

Producing new pixels

Practical Parallel Processing for Today’s Rendering Challenges -- 177

Render Cache Front-end

Frame based rendering

For each frame do:

• Project existing points

• Smooth image and display

• Select new rays using heuristics

• Request samples from back-end

• Insert new points into point cloud
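The "project existing points" step can be illustrated with a small sketch. It assumes a pinhole camera at position `eye` looking down the negative z axis, a focal length given in pixels, and a cache of 3D points with colors; these choices and the names `reproject` and `missing_pixels` are illustrative and not taken from the render cache itself. Pixels that receive no point are the natural candidates for new samples.

```python
import math


def reproject(points, eye, focal, width, height):
    """Project cached 3D points into the current view, keeping the nearest per pixel.

    points: iterable of ((x, y, z), color)
    Returns a dict mapping (px, py) -> (depth, color).
    """
    image = {}
    for (x, y, z), color in points:
        cx, cy, cz = x - eye[0], y - eye[1], z - eye[2]
        depth = -cz  # camera looks down -z
        if depth <= 0:
            continue  # point is behind the camera
        px = math.floor(width / 2 + focal * cx / depth)
        py = math.floor(height / 2 - focal * cy / depth)
        if 0 <= px < width and 0 <= py < height:
            if (px, py) not in image or depth < image[(px, py)][0]:
                image[(px, py)] = (depth, color)
    return image


def missing_pixels(image, width, height):
    """Pixels with no reprojected point; new rays would be requested for these."""
    return [(x, y) for y in range(height) for x in range(width) if (x, y) not in image]


if __name__ == "__main__":
    cache = [((0.0, 0.0, -2.0), "red"), ((0.0, 0.0, -5.0), "blue")]
    img = reproject(cache, eye=(0.5, 0.0, 0.0), focal=4, width=4, height=4)
    print(img)
    print(len(missing_pixels(img, 4, 4)), "pixels need new samples")
```

The real front-end also smooths the image and uses heuristics to prioritize which rays to request; neither is modeled here.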

Practical Parallel Processing for Today’s Rendering Challenges -- 178

Render Cache

Practical Parallel Processing for Today’s Rendering Challenges -- 179

Render Cache (2)

Point reprojection is relatively cheap

Smooth camera movement for small images

Does not scale to large images or large numbers of renderers: front-end becomes bottleneck

Practical Parallel Processing for Today’s Rendering Challenges -- 180

Parallel Render Cache

Aim: remove front-end bottleneck

• Distribute point reprojection functionality

• Integrate point reprojection with renderers

• Front-end only displays results

Practical Parallel Processing for Today’s Rendering Challenges -- 181

Parallel Render Cache (2)

Practical Parallel Processing for Today’s Rendering Challenges -- 182

Parallel Render Cache (3)

Features:

• Scalable behavior for scene complexity

• Scalable in number of processors

• Allows larger images to be rendered

• Retains artifacts from render cache

• Introduces new artifacts

Practical Parallel Processing for Today’s Rendering Challenges -- 183

Artifacts

Render cache artifacts at tile boundaries

Image deteriorates during camera movement

These artifacts are deemed more acceptable than loss of smooth camera movement!

Practical Parallel Processing for Today’s Rendering Challenges -- 184

Video

Practical Parallel Processing for Today’s Rendering Challenges -- 185

Test Scenes

Practical Parallel Processing for Today’s Rendering Challenges -- 186

Results

Sub-parts of algorithm measured individually

• Measure time per call to subroutine

• Sum over all processors and all invocations

• Afterwards divide by number of processors and number of invocations

• Results are measured in events per second per processor
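The measurement procedure can be read as follows: add up the time of every call on every processor, divide by the total number of calls to get the mean time per call, and invert that to obtain events per second per processor. This interpretation, and the function name `events_per_second_per_processor`, are assumptions made for the sketch below.

```python
def events_per_second_per_processor(timings):
    """timings: dict mapping processor id -> list of per-call durations in seconds."""
    total_time = sum(sum(calls) for calls in timings.values())
    total_calls = sum(len(calls) for calls in timings.values())
    if total_calls == 0 or total_time == 0:
        raise ValueError("no timed calls recorded")
    mean_time_per_call = total_time / total_calls
    return 1.0 / mean_time_per_call


if __name__ == "__main__":
    timings = {0: [0.001, 0.003], 1: [0.002, 0.002]}
    print(events_per_second_per_processor(timings))
```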

Practical Parallel Processing for Today’s Rendering Challenges -- 187

Scalability (Teapot Model)

Practical Parallel Processing for Today’s Rendering Challenges -- 188

Scalability (Room Model)

Practical Parallel Processing for Today’s Rendering Challenges -- 189

Samples Per Second

Practical Parallel Processing for Today’s Rendering Challenges -- 190

Reprojections Per Second

Practical Parallel Processing for Today’s Rendering Challenges -- 191

Conclusions

Exploitation of temporal coherence gives significantly smoother results than available with brute force ray tracing alone

This is at the cost of some artifacts which require further investigation

(More results available in course notes)

Practical Parallel Processing for Today’s Rendering Challenges -- 192

Acknowledgements

Thanks to:

• Steven Parker for writing the interactive ray tracer in the first place

• Brian Smits, Peter Shirley and Charles Hansen for involvement in the animation and parallel point reprojection projects

• Bruce Walter and George Drettakis for the render cache source code

Practical Parallel Processing for Today’s Rendering Challenges -- 193

Schedule

Introduction

Parallel / Distributed Rendering Issues

Classification of Parallel Rendering Systems

Practical Applications

• Rendering at Clemson / Distributed Computing and Spatial/Temporal Coherence

• Interactive Ray Tracing

• Parallel Rendering and the Quest for Realism: The Kilauea Massively Parallel Ray Tracer (Kato)

Summary / Discussion

Practical Parallel Processing for Today’s Rendering Challenges -- 194

Outline

What is Kilauea?

Parallel ray tracing & photon mapping

Kilauea architecture

Shading logic

Rendering results

Practical Parallel Processing for Today’s Rendering Challenges -- 196

Objective

Global illumination

Extremely complex scenes

Practical Parallel Processing for Today’s Rendering Challenges -- 197

Parallel Processing

Hardware

• Multi-CPU machine

• Linux PC cluster

Software

• Threading (Pthread)

• Message passing (MPI)

Practical Parallel Processing for Today’s Rendering Challenges -- 198

Our Render Farm

Practical Parallel Processing for Today’s Rendering Challenges -- 199

Global Illumination

Photon map

Practical Parallel Processing for Today’s Rendering Challenges -- 200

Ray Tracing Renderer

[Diagram labels: Read Scene, Ray Tracing, Shading, Output; "Machine : A B C" (three times)]

Practical Parallel Processing for Today’s Rendering Challenges -- 201

Ray Tracing Renderer

[Diagram labels: Read Scene, Ray Tracing, Shading, Output; "Machine : A B C", "Machine : D E F", "Machine : G H I"]

Practical Parallel Processing for Today’s Rendering Challenges -- 203

Outline

What is Kilauea?

Parallel ray tracing & photon mapping

Kilauea architecture

Shading logic

Rendering results

Practical Parallel Processing for Today’s Rendering Challenges -- 204

Parallel Ray Tracing

Simple case

Complex case

Practical Parallel Processing for Today’s Rendering Challenges -- 206

Accel Grid

Hierarchical uniform grid

[Diagram label: Scene data]

Practical Parallel Processing for Today’s Rendering Challenges -- 207

Simple Case (scene distribution)

[Diagram labels: Scene Data, copy, Machine A, Machine B]

Practical Parallel Processing for Today’s Rendering Challenges -- 208

Simple Case (ray tracing)

[Diagram labels: Machine A, Machine B, Screen]

Practical Parallel Processing for Today’s Rendering Challenges -- 209

Parallel Ray Tracing

Simple case

Complex case

Practical Parallel Processing for Today’s Rendering Challenges -- 210

Complex Case (scene distribution)

[Diagram labels: Scene Data, Random, Machine A, Machine B]

Practical Parallel Processing for Today’s Rendering Challenges -- 211

Complex Case (accel grid construction)

Independent construction

Aligned by table

[Diagram labels: Machine A, Machine B]

Practical Parallel Processing for Today’s Rendering Challenges -- 212

Complex Case (ray tracing)

[Diagram labels: Machine A, Machine B, Screen, Compare Results]
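A minimal sketch of the "compare results" idea for the complex case: each machine holds only part of the scene, traces the same ray against its own part, and the nearest of the returned hits wins. The scene here consists of spheres purely for brevity; the function names and the two-machine split are assumptions for illustration, not Kilauea's interfaces.

```python
import math


def hit_sphere(origin, direction, center, radius):
    """Nearest positive hit distance along a unit-length direction, or None.

    Only the closer root is considered, so rays starting inside a sphere miss.
    """
    oc = [origin[i] - center[i] for i in range(3)]
    b = sum(oc[i] * direction[i] for i in range(3))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None
    t = -b - math.sqrt(disc)
    return t if t > 0 else None


def trace_local(origin, direction, spheres):
    """Trace against one machine's share of the scene; return (t, name) or None."""
    best = None
    for name, center, radius in spheres:
        t = hit_sphere(origin, direction, center, radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, name)
    return best


def compare_results(results):
    """Keep the nearest hit among the per-machine results."""
    hits = [r for r in results if r is not None]
    return min(hits, default=None)


if __name__ == "__main__":
    machine_a = [("far", (0.0, 0.0, -10.0), 1.0)]
    machine_b = [("near", (0.0, 0.0, -4.0), 1.0)]
    ray = ((0.0, 0.0, 0.0), (0.0, 0.0, -1.0))
    results = [trace_local(*ray, scene) for scene in (machine_a, machine_b)]
    print(compare_results(results))
```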

Practical Parallel Processing for Today’s Rendering Challenges -- 213

Outline

What is Kilauea?

Parallel ray tracing & photon mapping

Kilauea architecture

Shading logic

Rendering results

Practical Parallel Processing for Today’s Rendering Challenges -- 214

Parallel Photon Mapping

Photon trace

Photon lookup

Practical Parallel Processing for Today’s Rendering Challenges -- 216

Photon Tracing (simple case)

[Diagram labels: Photon Map, Store, Store]

Practical Parallel Processing for Today’s Rendering Challenges -- 217

Photon Tracing (complex case)

[Diagram labels: Machine A, Machine B, Photon Map A, Photon Map B, Randomly store]

Practical Parallel Processing for Today’s Rendering Challenges -- 218

Parallel Photon Mapping

Photon trace

Photon lookup

Practical Parallel Processing for Today’s Rendering Challenges -- 219

Photon Lookup (simple case)

[Diagram labels: Machine A, Machine B; on each machine: Photon Map, Lookup request, Irradiance value]

Practical Parallel Processing for Today’s Rendering Challenges -- 220

Photon Lookup (complex case)

[Diagram labels: Machine A, Machine B, Photon Map A, Photon Map B, Lookup request, Irradiance calculation, Irradiance value, copy]

Practical Parallel Processing for Today’s Rendering Challenges -- 221

Outline

What is Kilauea?

Parallel ray tracing & photon mapping

Kilauea architecture

Shading logic

Rendering results

Practical Parallel Processing for Today’s Rendering Challenges -- 222

Task

Mtask, Wtask, Btask, Stask, Rtask

Atask, Etask, Ltask, Ptask, Otask

Practical Parallel Processing for Today’s Rendering Challenges -- 223

Task Assignment

[Diagram labels: Machine A, Machine B, Machine C, each holding several Task boxes]

Practical Parallel Processing for Today’s Rendering Challenges -- 224

Roles of Tasks

[Diagram labels: pixel, T, S, R, A, A, Compare]

Practical Parallel Processing for Today’s Rendering Challenges -- 225

Task Configuration

[Diagram labels: Machine A, Machine B; tasks A, A, T S R]

Practical Parallel Processing for Today’s Rendering Challenges -- 226

Task Configuration

[Diagram labels: Machine A, Machine B; tasks A, A, T S R, T S R]

Practical Parallel Processing for Today’s Rendering Challenges -- 227

Task Configuration

[Diagram labels: Machine A through Machine F in three pairs (A/B, C/D, E/F); each pair shows tasks A, A, T S R, T S R]

Practical Parallel Processing for Today’s Rendering Challenges -- 228

Task Interaction

[Diagram labels: Machine A, Machine B; tasks A, A, T S R, T S R; pixel, pixel]

Practical Parallel Processing for Today’s Rendering Challenges -- 232

Task Interaction

[Diagram labels: Machine A, Machine B; tasks A, A, T S R, T S R; pixel, Compare (twice)]

Practical Parallel Processing for Today’s Rendering Challenges -- 234

Task Interaction (simple case)

[Diagram labels: Machine A, Machine B; tasks A, A, T S R, T S R]

Practical Parallel Processing for Today’s Rendering Challenges -- 235

Roles of Tasks (photon map)

[Diagram labels: T, S, R, A, A, L, P, P, Lookup, Photon Map A, Photon Map B]

Practical Parallel Processing for Today’s Rendering Challenges -- 236

Task Configuration (photon map)

[Diagram labels: Machine A, Machine B; tasks A, A, L, P, P, T S R]

Practical Parallel Processing for Today’s Rendering Challenges -- 237

Task Configuration (photon map)

[Diagram labels: Machine A, Machine B; tasks T S R, A, A, L, P, P, L, T S R]

Practical Parallel Processing for Today’s Rendering Challenges -- 238

Task Interaction (photon map)

[Diagram labels: Machine A, Machine B; tasks T S, L, P, P, L, T S]

Practical Parallel Processing for Today’s Rendering Challenges -- 239

Task Interaction (photon map)

[Diagram labels: Machine A, Machine B; tasks T S, L, P, P, L, T S; photon, photon]

Practical Parallel Processing for Today’s Rendering Challenges -- 241

Task Interaction (photon map)

[Diagram labels: Machine A, Machine B; tasks T S, L, P, P, L, T S; photon, Lookup (twice)]

Practical Parallel Processing for Today’s Rendering Challenges -- 242

Task Configuration (simple photon)

[Diagram labels: Machine A, Machine B; tasks T S R, A, A, L, P, P, L, T S R]

Practical Parallel Processing for Today’s Rendering Challenges -- 243

Task Priority

[Diagram labels: pixel, Compare, photon, Lookup; tasks T, S, R, L, A, P; Priority axis from Low to High]

Practical Parallel Processing for Today’s Rendering Challenges -- 244

Outline

What is Kilauea?

Parallel ray tracing & photon mapping

Kilauea architecture

Shading logic

Rendering results

Practical Parallel Processing for Today’s Rendering Challenges -- 245

Parallel Shading Problem

[Diagram labels: N, I, P, Reflection]

Cp = Cs + Cr

Practical Parallel Processing for Today’s Rendering Challenges -- 246

Parallel Shading Problem

[Diagram labels: N, I, P, Reflection, Machine A, Machine B]

Cp = Cs + Cr

Practical Parallel Processing for Today’s Rendering Challenges -- 247

Parallel Shading Problem (solution)

[Diagram labels: A, B, C, D, E]

Practical Parallel Processing for Today’s Rendering Challenges -- 248

Parallel Shading Problem (solution)

[Diagram labels: A, B, C, D, E]

A : C = Cs + Cr

B : C = Cs + Cr

Practical Parallel Processing for Today’s Rendering Challenges -- 255

Decomposing Shading Computation

shading calculation

Practical Parallel Processing for Today’s Rendering Challenges -- 256

Decomposing Shading Computation

[Diagram labels: shading calculation, funcA, funcB, outside task]

Practical Parallel Processing for Today’s Rendering Challenges -- 257

Decomposing Shading Computation

[Diagram labels: shading calculation, funcA, funcB, outside task; SPOT, SPOT, outside task]

Practical Parallel Processing for Today’s Rendering Challenges -- 258

SPOT

Method + Data

[Diagram label: data slot]

Practical Parallel Processing for Today’s Rendering Challenges -- 259

SPOT Condition

Practical Parallel Processing for Today’s Rendering Challenges -- 260

Parallel Shading Solution using SPOT

C = Cs + Cr

[Diagram labels: Machine A, Machine B, Outside Task, SPOT A, SPOT B, Cs, Cr, Reflection Ray]
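One way to read "SPOT = method + data" is as a small dataflow node: it owns named data slots and runs its method once every slot has been filled, whether the value was computed locally or arrived from another machine. The slides do not define SPOT's interface, so the class below is an interpretation for illustration only; `Spot`, `fill`, and the slot names `cs` and `cr` are hypothetical.

```python
class Spot:
    """A method bundled with data slots; the method fires when all slots are filled."""

    def __init__(self, slot_names, method):
        self.slots = {name: None for name in slot_names}
        self.filled = set()
        self.method = method
        self.result = None

    def fill(self, name, value):
        """Store a value in a slot; return the result once it is available, else None."""
        if name not in self.slots:
            raise KeyError(name)
        self.slots[name] = value
        self.filled.add(name)
        if len(self.filled) == len(self.slots):
            self.result = self.method(**self.slots)
        return self.result


def add_colors(cs, cr):
    """C = Cs + Cr, component-wise."""
    return tuple(a + b for a, b in zip(cs, cr))


if __name__ == "__main__":
    spot = Spot(["cs", "cr"], add_colors)
    # Cs is available locally; Cr arrives later from the machine that
    # traced the reflection ray.
    print(spot.fill("cs", (0.25, 0.5, 0.0)))
    print(spot.fill("cr", (0.25, 0.25, 0.5)))
```

Because the node holds its partial state, the machine that issued the reflection ray does not have to block while another machine traces it.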

Practical Parallel Processing for Today’s Rendering Challenges -- 261

Parallel Shading Solution using SPOT

[Diagram labels: SPOTs distributed across Machine A, Machine B, and an Outside Task; A, B, C]

Practical Parallel Processing for Today’s Rendering Challenges -- 262

Shader SPOT Network Example

Practical Parallel Processing for Today’s Rendering Challenges -- 263

Outline

What is Kilauea?
Parallel ray tracing & photon mapping
Kilauea architecture
Shading logic
Rendering results

Practical Parallel Processing for Today’s Rendering Challenges -- 264

Rendering Results

Test machine specification
• 1 GHz Dual Pentium III
• 512 Mbyte memory
• 100BaseT Ethernet
• 18 machines connected via 100BaseT switch

Practical Parallel Processing for Today’s Rendering Challenges -- 265

Quatro

700,223 triangles, 1 area point & sky light, 1280 x 692
18 machines : 7 min 19 sec

Practical Parallel Processing for Today’s Rendering Challenges -- 266

Quatro : single Atask test

[Chart: Speedup vs. number of machines; series: raytrace, linear, all]

[Chart: Rendering time, execution time (h:m:s) vs. number of machines; series: all, raytrace]
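The two charts relate through the usual definitions: speedup on n machines is the one-machine time divided by the n-machine time, and efficiency is speedup divided by n (the "linear" series corresponds to speedup equal to n). The sketch below shows the arithmetic only. The timings in `example_times` are hypothetical placeholders, not values read from the charts, and the function names are assumptions.

```python
def parse_hms(text):
    """Convert an 'h:mm:ss' string, as on the chart axis, into seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def speedup_table(times_by_machines):
    """Return {n: (speedup, efficiency)} relative to the 1-machine time."""
    base = times_by_machines[1]
    return {
        n: (base / t, base / t / n)
        for n, t in sorted(times_by_machines.items())
    }


# Hypothetical timings for illustration only (not taken from the slides).
example_times = {
    1: parse_hms("1:00:00"),
    2: parse_hms("0:30:00"),
    4: parse_hms("0:16:40"),
}

for n, (speedup, efficiency) in speedup_table(example_times).items():
    print(f"{n:2d} machines: speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

Plotting measured speedup against the linear line, as the slide does, makes any communication or load-balancing overhead visible as the gap between the two curves.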

Practical Parallel Processing for Today’s Rendering Challenges -- 267

Jeep

715,059 triangles, 1 directional & sky light, 1280 x 692
18 machines : 8 min 27 sec

Practical Parallel Processing for Today’s Rendering Challenges -- 268

Jeep4

2,859,636 triangles, 1 directional & sky light, 1280 x 692
18 machines : 12 min 38 sec
2 Atasks x 1

Practical Parallel Processing for Today’s Rendering Challenges -- 269

Jeep4 : 2 Atasks test

1 Atask group = 2 machines

[Chart: Speedup vs. number of Atask groups (1 to 9); series: raytrace, linear, all]

[Chart: Rendering time vs. number of Atask groups (1 to 9); series: all, raytrace]

Practical Parallel Processing for Today’s Rendering Challenges -- 270

Jeep8

5,719,072 triangles, 1 directional & sky light, 1280 x 692
16 machines : 18 min 43 sec
4 Atasks x 4
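The Jeep, Jeep4, and Jeep8 results suggest a simple sizing rule: a scene too large for one machine is split across the Atasks of a group, and the group is then replicated across the remaining machines. The sketch below works through that arithmetic. It is an assumption-laden illustration: the function name is invented, and the capacity of 1,500,000 triangles per Atask is a made-up figure chosen only so the example reproduces the group sizes on the slides (1, 2, and 4 Atasks).

```python
import math


def plan_atask_groups(triangles, machines, triangles_per_atask):
    """Return (atasks_per_group, groups, machines_used) for a scene.

    triangles_per_atask is an assumed per-machine capacity, not a
    figure given in the course notes.
    """
    atasks_per_group = math.ceil(triangles / triangles_per_atask)
    groups = machines // atasks_per_group        # whole groups only
    return atasks_per_group, groups, groups * atasks_per_group


ASSUMED_CAPACITY = 1_500_000  # hypothetical triangles per Atask

for name, triangles in [("Jeep", 715_059), ("Jeep4", 2_859_636), ("Jeep8", 5_719_072)]:
    per_group, groups, used = plan_atask_groups(triangles, 18, ASSUMED_CAPACITY)
    print(f"{name}: {per_group} Atask(s) per group, {groups} group(s), {used} machines used")
```

Under this assumed capacity, the 18-machine cluster holds nine two-machine groups for Jeep4, matching the chart's axis, and four four-machine groups for Jeep8, which accounts for that test using 16 machines rather than 18.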

Practical Parallel Processing for Today’s Rendering Challenges -- 271

Escape POD

468,321 triangles, 1 directional & sky light, 1280 x 692
18 machines : 14 min 55 sec

Practical Parallel Processing for Today’s Rendering Challenges -- 272

ansGun

20,279 triangles, 1 spot & sky light, 1280 x 960
18 machines : 16 min 38 sec

Practical Parallel Processing for Today’s Rendering Challenges -- 273

SCN101

787,255 triangles, 1 area light, 1280 x 692
18 machines : 9 min 10 sec

Practical Parallel Processing for Today’s Rendering Challenges -- 274

Video

Practical Parallel Processing for Today’s Rendering Challenges -- 275

Conclusion / Future Work

We achieved:
• Close to linear parallel performance
• Highly extensible architecture

We will achieve even more:
• Speed
• Stability
• Usability (user interface)
• Etc.

Practical Parallel Processing for Today’s Rendering Challenges -- 276

Additional Information

Kilauea live rendering demo
• BOOTH #1927 SquareUSA

http://www.squareusa.com/kilauea/

Practical Parallel Processing for Today’s Rendering Challenges -- 277

Schedule

Introduction
Parallel / Distributed Rendering Issues
Classification of Parallel Rendering Systems
Practical Applications
Summary / Discussion (Chalmers)

Practical Parallel Processing for Today’s Rendering Challenges -- 278

Summary

Practical Parallel Processing for Today’s Rendering Challenges -- 279

Contact Information

Alan Chalmers: alan@cs.bris.ac.uk
Tim Davis: tadavis@cs.clemson.edu
Toshi Kato: http://www.squareusa.com/kilauea/
Erik Reinhard: reinhard@cs.utah.edu

Slides: http://www.cs.clemson.edu/~tadavis
