© 2017 Autodesk
VR Rendering Improvements Featuring Autodesk VRED
GPU Technology Conference 2017
Michael Nikelsky, Sr. Principal Engineer, Autodesk
Ingo Esser, Sr. Engineer, Developer Technology, NVIDIA
AGENDA
NVIDIA VRWorks at a glance
Autodesk VRED VR Rendering Improvements
NVIDIA VRWORKS: Comprehensive SDK for VR Developers
GRAPHICS, HEADSET, AUDIO, TOUCH & PHYSICS, PROFESSIONAL, VIDEO
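GRAPHICS PIPELINE: VR Workloads
[Figure: pipeline stages Preprocessing, Geometric Pipeline, Rasterization, Fragment Shader, Postprocessing]
Traditional display: 1920 × 1080 at 60 Hz, N vertices per frame, 124M pix/s
VR headset: two 1512 × 1680 views at 90 Hz, 2N vertices per frame, 457M pix/s
VR needs ~3.6x the pixel throughput and 3x the vertex throughput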
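The workload figures follow from simple arithmetic. This is a minimal sketch. It assumes a 1920 × 1080 monitor at 60 Hz versus two 1512 × 1680 eye buffers at 90 Hz, as in the figure. `pixelRate` and `vertexRateFactor` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Pixels shaded per second: width * height per view, times views, times refresh rate.
std::uint64_t pixelRate(std::uint64_t width, std::uint64_t height,
                        std::uint64_t views, std::uint64_t hz) {
    assert(width > 0 && height > 0 && views > 0 && hz > 0);
    return width * height * views * hz;
}

// Ratio of vertex throughput when every view re-submits the same N vertices:
// (viewsB * hzB) / (viewsA * hzA); N cancels out.
double vertexRateFactor(double viewsA, double hzA, double viewsB, double hzB) {
    assert(viewsA > 0.0 && hzA > 0.0);
    return (viewsB * hzB) / (viewsA * hzA);
}
```

Here `pixelRate(1512, 1680, 2, 90) / pixelRate(1920, 1080, 1, 60)` is about 3.67, which the slide quotes as ~3.6x. `vertexRateFactor(1, 60, 2, 90)` returns 3.0, assuming each eye re-submits all N vertices. That duplicated vertex work is the cost Single Pass Stereo removes.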
SINGLE PASS STEREO: Traditional Rendering
Render eyes separately
Doubles CPU and GPU load
SINGLE PASS STEREO: Using SPS to improve rendering performance
Single Pass Stereo uses the Simultaneous Multi-Projection architecture
Draw geometry only once
Vertex/geometry stage runs once and outputs two positions, for the left and right views
Only rasterization is performed per view
SINGLE PASS STEREO: OpenGL
In OpenGL via GL_NV_stereo_view_rendering
Create a texture array for rendering the left and right eyes simultaneously
No other changes needed; the shaders perform SPS
SINGLE PASS STEREO: Vertex Shader
Calculate the projection-space position:
proj_pos = proj * view * model * inPosition;
Output both positions via different built-in variables; only the x component may differ:
gl_Position = proj_pos + vec4(offset, 0, 0, 0);
gl_SecondaryPositionNV = proj_pos - vec4(offset, 0, 0, 0);
Use the declaration and value of gl_Layer to route output to layers 0 and 1 of the texture array:
layout(secondary_view_offset = 1) out highp int gl_Layer;
gl_Layer = 0;
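A single x offset is enough for the following reason. With a standard perspective matrix, shifting the eye sideways by t in view space changes clip-space x by P[0][0]·t and leaves y, z and w untouched.

This is a minimal sketch under these assumptions:
- Both eyes share one projection matrix and differ only by a sideways translation of ±ipd/2.
- gl_Position holds the left eye.
- All names are hypothetical.

When the eyes use different projections (for example mirrored asymmetric frusta), the offset is no longer a per-draw constant.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;  // row-major: m[row][col]

Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r[i] += m[i][j] * v[j];
    return r;
}

// OpenGL-style perspective projection (same layout as gluPerspective).
Mat4 perspective(float fovyRad, float aspect, float zNear, float zFar) {
    assert(zFar > zNear && zNear > 0.0f);
    const float f = 1.0f / std::tan(fovyRad * 0.5f);
    Mat4 p{};
    p[0][0] = f / aspect;
    p[1][1] = f;
    p[2][2] = (zFar + zNear) / (zNear - zFar);
    p[2][3] = 2.0f * zFar * zNear / (zNear - zFar);
    p[3][2] = -1.0f;
    return p;
}

// Clip-space offset for an eye displaced by ipd/2 along view-space x:
// only P[0][0] * (ipd / 2) survives, because P[1][0], P[2][0] and P[3][0] are zero.
float stereoClipOffset(const Mat4& proj, float ipd) {
    return proj[0][0] * (ipd * 0.5f);
}

// What the left eye would see with its own view matrix: the eye sits at -ipd/2,
// so the view-space x of every vertex grows by +ipd/2.
Vec4 leftEyeClip(const Mat4& proj, Vec4 centerViewPos, float ipd) {
    centerViewPos[0] += ipd * 0.5f;
    return mul(proj, centerViewPos);
}
```

In a shader, offset would then be a uniform equal to `stereoClipOffset(proj, ipd)`, with proj_pos computed from the center-eye view matrix.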
GRAPHICS PIPELINE: Single Pass Stereo Performance Results
Single Pass Stereo brings benefits in geometry-bound scenarios
Heavy fragment shaders will reduce scaling
[Figure: pipeline stages Preprocessing, Geometric Pipeline, Rasterization, Fragment Shader, Postprocessing, with SPS marked]
HMD OPTICS: Countering Lens Distortion
[Figure: Displayed Image, Optics, User's View]
HMD RENDERING: Oversampling near the borders
[Figure: Rendered Image vs. Displayed Image]
LENS MATCHED SHADING: Four Viewports
[Figure: Original Image vs. LMS Image]
LENS MATCHED SHADING: OpenGL
In OpenGL via the GL_NV_clip_space_w_scaling extension
Set up four viewports, each rendering at full resolution
Set the scissors to one quadrant each:
glScissorArrayv(0, 4, scissors);
Set the W scaling parameters for each viewport i:
glViewportPositionWScaleNV(i, Wx, Wy);
[Figure: Viewport 0 and Scissor 0]
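The per-quadrant setup can be sketched on the CPU side. This is a minimal sketch under these assumptions:
- Quadrants are numbered 0 = lower-left, 1 = lower-right, 2 = upper-left, 3 = upper-right in window coordinates, so quadrant 0 starts at (0, 0) as in the figure.
- Each coefficient's sign is chosen so that w' = w + wx·x + wy·y grows towards the outer edges of its quadrant.
- `LmsSetup` and `makeLmsSetup` are hypothetical names.

The arrays hold what glScissorArrayv and glViewportPositionWScaleNV would receive.

```cpp
#include <array>
#include <cassert>

// Hypothetical per-viewport LMS setup: one scissor rectangle and one pair of
// signed W-scale coefficients for each of the four quadrants.
struct LmsSetup {
    std::array<int, 16> scissors;   // x, y, width, height per viewport (glScissorArrayv layout)
    std::array<float, 4> wx, wy;    // per-viewport coefficients for glViewportPositionWScaleNV
};

LmsSetup makeLmsSetup(int width, int height, float Wx, float Wy) {
    assert(width > 0 && height > 0 && Wx >= 0.0f && Wy >= 0.0f);
    LmsSetup s{};
    const int halfW = width / 2, halfH = height / 2;
    for (int q = 0; q < 4; ++q) {
        const bool right = (q & 1) != 0;  // quadrants 1 and 3
        const bool top = (q & 2) != 0;    // quadrants 2 and 3
        s.scissors[4 * q + 0] = right ? halfW : 0;
        s.scissors[4 * q + 1] = top ? halfH : 0;
        s.scissors[4 * q + 2] = right ? width - halfW : halfW;   // also covers odd widths
        s.scissors[4 * q + 3] = top ? height - halfH : halfH;
        // Left/bottom quadrants see negative x/y, so a negative coefficient makes w grow outwards.
        s.wx[q] = right ? Wx : -Wx;
        s.wy[q] = top ? Wy : -Wy;
    }
    return s;
}
```

Each viewport i would then receive `glViewportPositionWScaleNV(i, s.wx[i], s.wy[i])` after `glEnable(GL_VIEWPORT_POSITION_W_SCALE_NV)`. The scissors would be set with `glScissorArrayv(0, 4, s.scissors.data())`, with the scissor test enabled.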
LENS MATCHED SHADING: Shaders
gl_ViewportMask[0] controls the broadcasting of vertices and primitives to the viewports
Inefficient: set the mask in the vertex shader, sending every primitive to all four viewports
gl_ViewportMask[0] = 15;
More efficient: filter in a pass-through geometry shader
Determine the quadrant(s) for each primitive
Set the corresponding bit(s) in gl_ViewportMask[0]
[Figure: Viewport 0 and Scissor 0]
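The quadrant test a pass-through geometry shader performs can be prototyped on the CPU. This is a minimal sketch under these assumptions:
- Quadrants use a hypothetical bit numbering: bit 0 = lower-left, bit 1 = lower-right, bit 2 = upper-left, bit 3 = upper-right.
- The input is clip space before w scaling. Testing against x = 0 and y = 0 is safe because w scaling divides by a positive w' and never moves those axes.
- Triangles with a vertex at or behind the eye (w ≤ 0) are conservatively sent to all four viewports.
- `quadrantMask` is a hypothetical name.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;  // clip-space x, y, z, w

// Returns a gl_ViewportMask[0]-style bit mask of the quadrants a triangle may touch.
// Uses the NDC bounding box, so it can set extra bits but never misses a quadrant.
unsigned quadrantMask(const std::array<Vec4, 3>& tri) {
    float minX = 1e30f, maxX = -1e30f, minY = 1e30f, maxY = -1e30f;
    for (const Vec4& v : tri) {
        if (v[3] <= 0.0f) return 0xFu;  // at or behind the eye: be conservative
        const float x = v[0] / v[3], y = v[1] / v[3];
        minX = std::min(minX, x); maxX = std::max(maxX, x);
        minY = std::min(minY, y); maxY = std::max(maxY, y);
    }
    const bool left = minX < 0.0f, right = maxX >= 0.0f;
    const bool bottom = minY < 0.0f, top = maxY >= 0.0f;
    unsigned mask = 0;
    if (left && bottom)  mask |= 1u << 0;
    if (right && bottom) mask |= 1u << 1;
    if (left && top)     mask |= 1u << 2;
    if (right && top)    mask |= 1u << 3;
    assert(mask != 0 && "a triangle in front of the eye maps to at least one quadrant");
    return mask;
}
```

In GLSL the same logic would write the result to gl_ViewportMask[0]. The scissors then clip each copy to its quadrant.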
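LENS MATCHED SHADING: Scaling and Unscaling
The HMD runtime cannot consume w-warped images yet, so the image must be unscaled before it is submitted.
With the viewport's coefficients $w_x, w_y$ (so that $w' = w + w_x x + w_y y$), a position $P$ in the unwarped image and the corresponding position $P'$ in the w-scaled image, both in normalized device coordinates, are related by:
$$\mathit{unscale} = \frac{1}{1 + w_x P_x + w_y P_y}, \qquad P' = \mathit{unscale} \cdot P$$
$$\mathit{scale} = \frac{1}{1 - w_x P'_x - w_y P'_y}, \qquad P = \mathit{scale} \cdot P'$$
In the unscale pass, each pixel $P$ of the output image fetches the LMS image at $P' = \mathit{unscale} \cdot P$.
[Figure: quadrant 0, from (0, 0) to (width/2, height/2), with P and P' related through scale and unscale]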
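Both mappings can be checked numerically. The forward mapping is P' = P / (1 + wx·Px + wy·Py), and its inverse is P = P' / (1 − wx·P'x − wy·P'y). The check applies w scaling to a clip-space point and compares the result with the forward mapping, then recovers the original position.

This is a minimal sketch. It assumes w' = w + wx·x + wy·y as defined by GL_NV_clip_space_w_scaling, with positions in normalized device coordinates. The function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec2 = std::array<float, 2>;

// Forward mapping: where an NDC position P of the unwarped image lands after
// clip-space w scaling, i.e. x / (w + wx*x + wy*y) expressed in NDC terms.
Vec2 toScaled(Vec2 p, float wx, float wy) {
    const float d = 1.0f + wx * p[0] + wy * p[1];
    assert(d > 0.0f);
    return {p[0] / d, p[1] / d};
}

// Inverse mapping: recover the unwarped NDC position from a w-scaled one.
Vec2 toUnscaled(Vec2 q, float wx, float wy) {
    const float d = 1.0f - wx * q[0] - wy * q[1];
    assert(d > 0.0f);
    return {q[0] / d, q[1] / d};
}

// Reference path: apply w' = w + wx*x + wy*y to a clip-space point,
// then perform the perspective divide.
Vec2 scaledFromClip(float x, float y, float w, float wx, float wy) {
    const float wPrime = w + wx * x + wy * y;
    assert(wPrime > 0.0f);
    return {x / wPrime, y / wPrime};
}
```

With wx = wy = −2 in the lower-left quadrant, the corner (−1, −1) maps to (−0.2, −0.2). The periphery is therefore heavily compressed in the rendered image.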
LENS MATCHED SHADING: Extreme example, Wx = 2.0, Wy = 2.0
GRAPHICS PIPELINE: Lens Matched Shading Results
LMS can improve the performance of the rasterization and fragment stages
Trade-off between quality and performance
[Figure: pipeline stages Preprocessing, Geometric Pipeline, Rasterization, Fragment Shader, Postprocessing, with SPS and LMS marked]
HMD RENDERING: VR SLI functionality
VR SLI HMD rendering
Prepare scene
Upload left view data to GPU0
Upload right view data to GPU1
Render scene on both GPUs
Transfer texture
Submit to HMD
Separate data upload
GL allocations & uploads are broadcast
GL render calls are broadcast
Efficient texture copies
VR SLI
Command & data broadcast
BufferSubData to a specific GPU
CopyImageSubData & CopyBufferSubData
GPU-GPU framebuffer blit
Global barrier & directed sync functions
GPU masks
Per-GPU sample locations
Per-GPU queries
Updates between the NVX and NV extensions
VR SLI: Broadcast allocations & uploads
[Figure: geometry, parameters and textures on both GPUs; left view data on GPU 0, right view data on GPU 1; each GPU holds tex0 and tex1]
VR SLI: Broadcast allocations & uploads
for (auto i = 0; i < 2; ++i) {
    sceneData.viewMatrix = view[i];                 // per-eye data
    sceneData.viewProjMatrix = proj[i] * view[i];
    glMulticastBufferSubDataNV(
        1 << i,                                     // GPU mask: GPU i only
        sceneUbo,                                   // same UBO on every GPU
        0, sizeof(SceneData), &sceneData);          // different data per GPU
}
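The GPU-mask semantics of this upload loop can be modelled without a GL context. This is a minimal CPU-side simulation that assumes a hypothetical per-GPU copy of the scene UBO. `multicastUpload` only mirrors the masking behaviour of glMulticastBufferSubDataNV; it is not the driver's implementation.

```cpp
#include <array>
#include <cassert>
#include <cstring>

constexpr int kGpuCount = 2;

struct SceneData { float viewMatrix[16]; };  // hypothetical, trimmed to one matrix

// One copy of the "same" UBO per GPU, as a multi-GPU context would hold it.
std::array<SceneData, kGpuCount> sceneUbo{};

// Writes data to every GPU whose bit is set in gpuMask (bit i = GPU i).
void multicastUpload(unsigned gpuMask, const SceneData& data) {
    assert(gpuMask != 0 && gpuMask < (1u << kGpuCount));
    for (int gpu = 0; gpu < kGpuCount; ++gpu)
        if (gpuMask & (1u << gpu))
            std::memcpy(&sceneUbo[gpu], &data, sizeof(SceneData));
}

// Mirrors the per-eye loop: eye i's data goes to GPU i only.
void uploadPerEye(const std::array<SceneData, kGpuCount>& perEye) {
    for (int i = 0; i < kGpuCount; ++i)
        multicastUpload(1u << i, perEye[i]);
}
```

A mask of `0x3` broadcasts the same data to both GPUs, which is how view-independent data would be uploaded.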
VR SLI: Broadcast render commands
Application sends draw commands only once
Commands are broadcast to both GPUs
[Figure: render step, each GPU holding tex0 and tex1]
VR SLI: Broadcast render commands
glBindFramebuffer(..., renderFBO);
glFramebufferTexture2D(..., tex0, 0);   // tex0 on both GPUs
render();                               // render on both GPUs
[Figure: GPU 0 renders L and GPU 1 renders R, each into its tex0]
VR SLI: Texture transfer
Copy function allows direct copy between GPUs
Avoids a CPU copy; transfers directly via PCIe
glMulticastWaitSyncNV(0, GPUMASK_1);        // GPU 1 waits for GPU 0 (target is ready)
glMulticastCopyImageSubDataNV(1, 1<<0,      // copy tex0 @ GPU 1 to tex1 @ GPU 0
    tex0, ..., tex1, ..., width, height, 1);
glMulticastWaitSyncNV(1, GPUMASK_0);        // GPU 0 waits for GPU 1 (copy is done)
[Figure: GPU 0 holds L in tex0 and receives R in tex1; GPU 1 holds R in tex0]
GRAPHICS PIPELINE: VR SLI Results
VR SLI covers a wide variety of workloads
Perfect load balancing between left/right eye and two GPUs
Copy overhead and view-independent workloads limit scaling
Some pre- and postprocessing can be distributed
[Figure: pipeline stages Preprocessing, Geometric Pipeline, Rasterization, Fragment Shader, Postprocessing, with SPS, LMS and VR SLI marked]
TRY IT OUT!
NVIDIA VRWorks SDK provides OpenGL, Direct3D & Vulkan samples
developer.nvidia.com/vrworks
Extensions
www.khronos.org/registry/OpenGL/extensions/NV/NV_stereo_view_rendering.txt
www.khronos.org/registry/OpenGL/extensions/NV/NV_clip_space_w_scaling.txt
www.khronos.org/registry/OpenGL/extensions/NV/NV_gpu_multicast.txt
Safe harbor statement
We may make statements regarding planned or future development efforts for our existing or new products and services. These statements are not intended to be a promise or guarantee of future availability of products, services or features but merely reflect our current plans and are based on factors currently known to us. These planned and future development efforts may change without notice. Purchasing decisions should not be made based upon reliance on these statements.
These statements are being made as of May 9, 2017, and we assume no obligation to update these forward-looking statements to reflect events that occur or circumstances that exist or change after the date on which they were made. If this presentation is reviewed after May 9, 2017, these statements may no longer contain current or accurate information.
Autodesk VRED Professional
▪ Visualization and virtual prototyping tool
▪ Focus on Automotive
▪ High Quality OpenGL and raytracing rendering
▪ VR support
▪ Powerwalls, Cave
▪ Oculus Rift
▪ HTC Vive
Image courtesy of Porsche AG
Requirements
▪ Engineering Datasets
▪ 30-70M triangles inside the view frustum
▪ 3-5k meshes
▪ 10-20k scenegraph nodes
▪ 100-300 materials
▪ Realistic appearance
▪ Measured materials
▪ No data reduction possible
Image courtesy of Porsche AG
Single Pass Stereo
▪ Render to layered texture
▪ Use the latest drivers
▪ Don't write to individual layers
▪ Adjust frustum culling to account for both eyes
▪ Set up uniform buffers with matrices for both eyes
▪ Declare layout(secondary_view_offset = 1) out int gl_Layer
▪ Use gl_Layer to access the correct matrices for shading
▪ Write gl_SecondaryPositionNV in the vertex or geometry shader
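With Single Pass Stereo, one draw feeds both views, so frustum culling may only drop an object that lies outside both eye frusta. This is a minimal sketch. It assumes bounding spheres and frustum planes stored as inward-facing, normalized (n, d) pairs. The types and names are hypothetical, and this is not VRED's implementation.

```cpp
#include <array>
#include <cassert>

struct Plane { float nx, ny, nz, d; };  // normalized; inside when nx*x + ny*y + nz*z + d >= 0
using Frustum = std::array<Plane, 6>;
struct Sphere { float x, y, z, r; };

bool insideFrustum(const Frustum& f, const Sphere& s) {
    assert(s.r >= 0.0f);
    for (const Plane& p : f) {
        const float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (dist < -s.r) return false;  // entirely on the outside of this plane
    }
    return true;
}

// One SPS draw feeds both eyes, so keep the object if either eye can see it.
bool visibleForStereo(const Frustum& left, const Frustum& right, const Sphere& s) {
    return insideFrustum(left, s) || insideFrustum(right, s);
}
```

A cheaper alternative is a single frustum that encloses both eyes. Testing both frusta is tighter but costs twice the plane tests.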
Lens Matched Shading
▪ Not yet available in VRED
▪ Divide the view into 4 quadrants
▪ Set lens coefficients for each quadrant
▪ Set up scissor masks for each viewport
▪ Render to all viewports
▪ Unproject the distortion
Lens Matched Shading
▪ Need to avoid rendering outside the visible area
▪ glWindowRectanglesEXT
▪ Hidden Area Mesh
▪ Need to calculate which viewports a triangle touches
▪ Use a pass-through geometry shader for best performance
▪ Requires a different shader for each geometry type
Datasets used for testing
▪ Small Dataset
▪ ~5.5M triangles, ~900 meshes, 2.5k nodes
▪ Medium Dataset
▪ ~34M triangles, ~3k meshes, 19k nodes
▪ Large Dataset
▪ ~63M triangles, ~5k meshes, 17k nodes
▪ Measurements done using
▪ 2 Quadro P6000
▪ 4x multisampling + pixel filter
▪ HTC Vive
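Results
Frame time in milliseconds (lower is better)
▪ Small Dataset: Baseline 10.0, Single Pass Stereo 9.5, Lens Matched Shading 8.1, LMS + SPS 7.8
▪ Medium Dataset: Baseline 16.6, Single Pass Stereo 15.3, Lens Matched Shading 15.0, LMS + SPS 15.0
▪ Large Dataset: Baseline 22.5, Single Pass Stereo 17.1, Lens Matched Shading 29.0, LMS + SPS 23.0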
Occlusion Culling
▪ Shader-based occlusion culling
▪ https://github.com/nvpro-samples/gl_occlusion_culling
▪ Algorithm
▪ Render all geometries visible in the previous frame
▪ Disable Color and Depth writes
▪ Rasterize bounding boxes of all geometries
▪ Record visible bounding boxes
▪ Read back results
▪ Render remaining visible geometries
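The control flow of this two-phase algorithm can be illustrated with a toy software depth buffer. This is a minimal sketch. It assumes each object is a screen-space rectangle at a constant depth that doubles as its own bounding box. The nvpro-samples implementation runs these steps on the GPU with real geometry, and all names here are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cfloat>
#include <cstddef>
#include <vector>

constexpr int kW = 8, kH = 8;

// Toy object: screen rectangle [x0, x1) x [y0, y1) at one depth; it is its own bounding box.
struct Object { int x0, y0, x1, y1; float depth; };

using DepthBuffer = std::array<std::array<float, kW>, kH>;

void drawDepth(DepthBuffer& db, const Object& o) {
    for (int y = o.y0; y < o.y1; ++y)
        for (int x = o.x0; x < o.x1; ++x)
            if (o.depth < db[y][x]) db[y][x] = o.depth;
}

// Bounding-box test with color and depth writes disabled: does any pixel pass?
bool boxPasses(const DepthBuffer& db, const Object& o) {
    for (int y = o.y0; y < o.y1; ++y)
        for (int x = o.x0; x < o.x1; ++x)
            if (o.depth <= db[y][x]) return true;
    return false;
}

// One frame. 'visible' holds last frame's result on entry and this frame's on exit.
// Returns how many objects were drawn in the second pass.
int cullFrame(const std::vector<Object>& objects, std::vector<bool>& visible) {
    assert(objects.size() == visible.size());
    DepthBuffer db;
    for (auto& row : db) row.fill(FLT_MAX);
    // 1. Render everything that was visible in the previous frame.
    for (std::size_t i = 0; i < objects.size(); ++i)
        if (visible[i]) drawDepth(db, objects[i]);
    // 2.-4. Test all bounding boxes against that depth and record the visible ones.
    std::vector<bool> result(objects.size());
    for (std::size_t i = 0; i < objects.size(); ++i)
        result[i] = boxPasses(db, objects[i]);
    // 5. Read back: in this toy the results already live in CPU memory.
    // 6. Render objects that are visible now but were not drawn in step 1.
    int drawnLate = 0;
    for (std::size_t i = 0; i < objects.size(); ++i)
        if (result[i] && !visible[i]) { drawDepth(db, objects[i]); ++drawnLate; }
    visible = result;
    return drawnLate;
}
```

In this sketch a full-screen occluder hides a smaller object behind it from the second frame on. Moving the occluder further away makes that object reappear in step 6 of the next frame.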
Why the readback?
▪ Original algorithm relies on bindless buffers and textures
▪ Requires custom memory management
▪ Few buffers shared by many objects
▪ Out-of-memory scenarios are difficult to handle
▪ CPU does not know what is visible
▪ Requires binding all shaders and geometries twice
▪ Binding costs can eliminate performance gains
▪ Sorted rendering is difficult
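Occlusion Culling results
Frame time in milliseconds (lower is better)
▪ Small Dataset: Baseline 10.0, Occlusion Culling 7.0, LMS + SPS + Occlusion Culling 5.5
▪ Medium Dataset: Baseline 16.6, Occlusion Culling 17.9, LMS + SPS + Occlusion Culling 14.5
▪ Large Dataset: Baseline 22.5, Occlusion Culling 17.1, LMS + SPS + Occlusion Culling 13.3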
VR SLI Rendering
▪ For details see the GTC 2016 talk “Integrating VR SLI into Autodesk VRED”
▪ Use one GPU per eye
▪ Bind the render surface
▪ Set up the camera buffer for both eyes
▪ Render the scene
▪ Copy the render surface from GPU1 to GPU0
▪ Submit the render surfaces to the HMD
▪ New NV_gpu_multicast extension allows more flexibility
▪ Occlusion Culling
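VR SLI results
Frame time in milliseconds (lower is better)
▪ Small Dataset: Baseline 10.0, SLI 5.8, SLI + Culling 6.0, SLI + LMS + Culling 5.6
▪ Medium Dataset: Baseline 16.6, SLI 8.9, SLI + Culling 5.8, SLI + LMS + Culling 5.7
▪ Large Dataset: Baseline 22.5, SLI 11.2, SLI + Culling 5.6, SLI + LMS + Culling 6.0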
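The measured frame times can be related to the 90 Hz refresh rate of the HTC Vive, whose per-frame budget is 1000 ms / 90 ≈ 11.1 ms. This is a small helper sketch. The frame times used with it come from the VR SLI chart, and the helper names are hypothetical.

```cpp
#include <cassert>

// Frame-time budget for a display refresh rate, in milliseconds.
double frameBudgetMs(double refreshHz) {
    assert(refreshHz > 0.0);
    return 1000.0 / refreshHz;
}

// Speedup of an optimized configuration relative to the baseline frame time.
double speedup(double baselineMs, double optimizedMs) {
    assert(optimizedMs > 0.0);
    return baselineMs / optimizedMs;
}

// True if a frame time fits within one refresh interval.
bool fitsBudget(double frameMs, double refreshHz) {
    return frameMs <= frameBudgetMs(refreshHz);
}
```

For the large dataset, SLI alone (11.2 ms) just misses the ~11.1 ms budget, with a speedup of about 2.0x. SLI + Culling (5.6 ms) is roughly 4x faster than the 22.5 ms baseline.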
Conclusion and final thoughts
▪ Using extensions can greatly improve performance
▪ Not every extension always works
▪ Test out different options
▪ SLI is still the best option
▪ Asynchronous Reprojection/Timewarp helps a lot
Autodesk and the Autodesk logo are registered trademarks or trademarks of Autodesk, Inc., and/or its subsidiaries and/or affiliates in the USA and/or other countries. All other brand names, product names, or trademarks belong to their respective holders.
Autodesk reserves the right to alter product and services offerings, and specifications and pricing at any time without notice, and is not responsible for typographical or graphical errors that may appear in this document.
© 2017 Autodesk. All rights reserved.