porting bullet to opencl - khronos group

35
© Copyright Khronos Group, 2010 - Page 1 Porting Bullet to OpenCL Erwin Coumans Simulation Team Lead Sony Computer Entertainment US R&D

Upload: others

Post on 11-Feb-2022

10 views

Category:

Documents


0 download

TRANSCRIPT

© Copyright Khronos Group, 2010 - Page 1

Porting Bullet to OpenCL

Erwin CoumansSimulation Team Lead

Sony Computer Entertainment US R&D

Summary

• Introduction to Bullet Physics SDK

• Leverage the OpenCL SDKs from AMD, NVidia and Apple

• Implement and debug kernels on CPU first

• Compress data before sending it over the PCIe bus

- Only exchange the differences

• Change data structures from Array Of Structures to Structures of Arrays

• Choose appropriate algorithm

- Tree traversal from recursive to stackless to history traversal

• Use CMake for cross platform project files for MSVC, Xcode, Makefiles

- FindOpenCL.cmake

© Copyright Erwin Coumans, 2010 - Page 3

Some games using Bullet Physics

© Copyright Erwin Coumans, 2010 - Page 4

Some Movies using Bullet Physics

© Copyright Erwin Coumans, 2010 - Page 5

Authoring Tools

• Maya Dynamica Plugin

• Cinema 4D 11.5

• Blender

• Houdini

• Lightwave CORE

• TBA: Standalone Physics Editor

© Copyright Erwin Coumans, 2010 - Page 6

Summary

• Introduction to Bullet Physics SDK

• Leverage the OpenCL SDKs from AMD, NVidia and Apple

• Implement and debug kernels on CPU first

• Compress data before sending it over the PCIe bus

- Only exchange the differences

• Change data structures from Array Of Structures to Structures of Arrays

• Choose appropriate algorithm

- Tree traversal from recursive to stackless to history traversal

• Use CMake for cross platform project files for MSVC, Xcode, Makefiles

© Copyright Erwin Coumans, 2010 - Page 7

Leveraging the AMD, Apple & NVIDIA SDK

• Radix sort, bitonic sort

• Prefix scan, compaction

• Examples how to use fast shared memory

• Uniform Grid example in Particle Demo

© Copyright Erwin Coumans, 2010 - Page 8

Neighbor Search

• Calculate grid index of particle center

• Parallel Radix or Bitonic Sorted Hash Array

• Search 27 neighboring cells

- Can be reduced to 14 because of symmetry

• Interaction happens during search

- No need to store neighbor information

• Jacobi iteration: independent interactions

© Copyright Erwin Coumans, 2010 - Page 9

Interacting Particle Pairs

Array Index

Sorted Cell ID Particle ID

0 4,D

1 4,F

2 6,B

3 6,C

4 6,E

5 9,A

0 1 2 3

12 13 14 15

5 7

8 10 11

B

C E

D

F

A

Interacting Particle Pairs

D,F

B,C

B,E

C,E

A,D

A,F

A,B

A,C

A,E

© Copyright Erwin Coumans, 2010 - Page 10

Summary

• Introduction to Bullet Physics SDK

• Leverage the OpenCL SDKs from AMD, NVidia and Apple

• Implement and debug kernels on CPU first

• Compress data before sending it over the PCIe bus

- Only exchange the differences

• Change data structures from Array Of Structures to Structures of Arrays

• Choose appropriate algorithm

- Tree traversal from recursive to stackless to history traversal

• Use CMake for cross platform project files for MSVC, Xcode, Makefiles

Implement and debug on CPU

• Keep a reference CPU implementation and transfer memory to CPU and back

- All stages in the physics pipeline can be hot-swapped between CPU and GPU

• We implemented MiniCL, a non-compliant OpenCL replacement

- OpenCL kernels compiled and linked as regular C

- Use regular MSVC debugger, breakpoints etc.

- Multi-threaded or sequential for easier debugging

© Copyright Erwin Coumans, 2010 - Page 12

Summary

• Introduction to Bullet Physics SDK

• Leverage the OpenCL SDKs from AMD, NVidia and Apple

• Implement and debug kernels on CPU first

• Compress data before sending it over the PCIe bus

- Only exchange the differences

• Change data structures from Array Of Structures to Structures of Arrays

• Choose appropriate algorithm

- Tree traversal from recursive to stackless to history traversal

• Use CMake for cross platform project files for MSVC, Xcode, Makefiles

© Copyright Erwin Coumans, 2010 - Page 13

Compress data before sending

• Using the GPU Uniform Grid as part of the Bullet CPU pipeline

• Available through btOpenCLBroadphase

• Reduce bandwidth and avoid sending all pairs

• Bullet requires persistent contact pairs

- to store cached solver information (warm-starting)

• Pre-allocate pairs for each object

© Copyright Erwin Coumans, 2010 - Page 14

Persistent Pairs

Before After Difference

0 1 2 3

12 13 14 15

5 7

8 10 11

B

C E

D

F

A

Particle Pairs Before

After Differences

D,F D,F A,B removed

B,C B,C B,C removed

B,E B,E C,F added

C,E C,E C,D added

A,D A,D

A,F A,F

A,B A,C

A,C A,E

A,E C,F

C,D

0

1

2 3

12 13 14 15

5 7

8 10 11

B

C E

D

F

A

© Copyright Erwin Coumans, 2010 - Page 15

Summary

• Introduction to Bullet Physics SDK

• Leverage the OpenCL SDKs from AMD, NVidia and Apple

• Implement and debug kernels on CPU first

• Change data structures from Array Of Structures to Structures of Arrays

• Choose appropriate algorithm

- Tree traversal from recursive to stackless to history traversal

• Compress data before sending it over the PCIe bus

- Only exchange the differences

• Use CMake for cross platform project files for MSVC, Xcode, Makefiles

© Copyright Erwin Coumans, 2010 - Page 16

From particles to rigid bodies

Particles Rigid Bodies

World Transform Position Position and Orientation

Neighbor Search Uniform Grid Dynamic BVH tree

Compute Contacts Sphere-Sphere Generic Convex Closest Points, GJK

Static Geometry Planes Concave Triangle Mesh

Solving method Jacobi Projected Gauss Seidel

© Copyright Erwin Coumans, 2010 - Page 17

Dynamic BVH Trees

0 1 2 3

12 13 14 15

5 7

8 10 11

B

C E

D

F

A

B

C E

D

F

ADF

BC E A

© Copyright Erwin Coumans, 2010 - Page 18

Dynamic BVH tree acceleration structure

• Broadphase n-body neighbor search

• Ray and convex sweep test

• Concave triangle meshes

• Cloth and Soft body

• Compound collision shapes

• Occlusion culling and view frustum culling

© Copyright Erwin Coumans, 2010 - Page 19

Dynamic BVH tree (instead of uniform grid)

• Keep two dynamic trees, one for moving

objects, other for objects (sleeping/static)

• Find neighbor pairs:

- Overlap M versus M and Overlap M versus S

DF

BC E A

QP

UR S T

S: Non-moving DBVT M: Moving DBVT

© Copyright Erwin Coumans, 2010 - Page 20

Some Optimizations

• Objects can move from one tree to the other

• Incrementally update, re-balance tree

• Tree creation and update is hard to parallelize

- Research and development in progress

• Tree traversal can be parallelized on GPU

- Idea proposed by Takahiro Harada at GDC 2009

© Copyright Erwin Coumans, 2010 - Page 21

Parallel GPU Tree Traversal using History Flags

• Alternative to recursive or stackless traversal

00

00

00

00

© Copyright Erwin Coumans, 2010 - Page 22

Parallel GPU Tree Traversal using History Flags

• 2 bits at each level indicating visited children

10

00

00

00

© Copyright Erwin Coumans, 2010 - Page 23

Parallel GPU Tree Traversal using History Flags

• Set bit when descending into a child branch

10

10

00

00

© Copyright Erwin Coumans, 2010 - Page 24

Parallel GPU Tree Traversal using History Flags

• Reset bits when ascending up the tree

10

10

00

00

© Copyright Erwin Coumans, 2010 - Page 25

Parallel GPU Tree Traversal using History Flags

• Requires only twice the tree depth bits

10

11

00

00

© Copyright Erwin Coumans, 2010 - Page 26

Parallel GPU Tree Traversal using History Flags

• When both bits are set, ascend to parent

10

11

00

00

© Copyright Erwin Coumans, 2010 - Page 27

Parallel GPU Tree Traversal using History Flags

• When both bits are set, ascend to parent

10

00

00

00

© Copyright Erwin Coumans, 2010 - Page 28

History tree traversaldo{

if(Intersect(n->volume,volume)){

if(n->isinternal()) {

if (!historyFlags[curDepth].m_visitedLeftChild){

historyFlags[curDepth].m_visitedLeftChild = 1;

n = n->childs[0];curDepth++;

continue;}

if (!historyFlags[curDepth].m_visitedRightChild){

historyFlags[curDepth].m_visitedRightChild = 1;

n = n->childs[1];curDepth++;

continue;}

}

else

policy.Process(n);

}

n = n->parent;

historyFlags[curDepth].m_visitedLeftChild = 0;

historyFlags[curDepth].m_visitedRightChild = 0;

curDepth--;

} while (curDepth);

© Copyright Erwin Coumans, 2010 - Page 29

Voxelizing objects

© Copyright Erwin Coumans, 2010 - Page 30

General convex collision detection on GPU

• Algorithms are branchy (GJK, EPA)

• Parts of the computation can be accelerated using GPU hardware

- Replace support map by cube map lookup

© Copyright Erwin Coumans, 2010 - Page 31

Parallelizing Constraint Solver

• Projected Gauss Seidel iterations are not embarrassingly parallel

A

B D

1 4

© Copyright Erwin Coumans, 2010 - Page 32

Reordering constraint batches

A B C D

1 1

2 2

3 3

4 4

A

B D

1 4

A B C D

1 1 3 3

4 2 2 4

© Copyright Erwin Coumans, 2010 - Page 33

OpenCL kernel Setup Batches__kernel void kSetupBatches(...)

{

int index = get_global_id(0);

int currPair = index;

int objIdA = pPairIds[currPair * 2].x;

int objIdB = pPairIds[currPair * 2].y;

int batchId = pPairIds[currPair * 2 + 1].x;

int localWorkSz = get_local_size(0);

int localIdx = get_local_id(0);

for(int i = 0; i < localWorkSz; i++){

if((i==localIdx)&&(batchId < 0)&&(pObjUsed[objIdA]<0)&&(pObjUsed[objIdB]<0)){

if(pObjUsed[objIdA] == -1)

pObjUsed[objIdA] = index;

if(pObjUsed[objIdB] == -1)

pObjUsed[objIdB] = index;

}

barrier(CLK_GLOBAL_MEM_FENCE);

}

}

© Copyright Erwin Coumans, 2010 - Page 34

Summary

• Implement and debug kernels on CPU first

• Leverage the OpenCL SDKs from AMD, NVidia and Apple

• Change data structures from Array Of Structures to Structures of Arrays

• Choose appropriate algorithm

- Tree traversal from recursive to stackless to history traversal

• Compress data before sending it over the PCIe bus

- Only exchange the differences

• Use CMake for cross platform project files for MSVC, Xcode, Makefiles

© Copyright Erwin Coumans, 2010 - Page 35

Thanks!

• Available in SVN branches/OpenCL

- http:\\bullet.googlecode.com

• Visit the Physics Simulation Forum at

- http:\\bulletphysics.org

• For technical details see also Takahiro Harada’s GDC talk

• Email: [email protected]