gdc : mar16.pdf
© Copyright Khronos Group 2016 - Page 142
Swapchains Unchained! (What you need to know about Vulkan WSI)
Alon Or-bach, Chair, Vulkan Window System Integration Sub-Group – March 2016
Intro to Vulkan Window System Integration
• Explicit control for acquisition and presentation of images
- Designed to fit the Vulkan API and today’s compositing window systems
• Not all extensions are supported by every platform
- You MUST check and enable the extensions your app/engine uses!!!
• Today’s presentation should help you get presentation working
- Learn how to present through a swapchain
- Overview of Vulkan objects used by the WSI extensions
WSI Jargon Buster
• Platform: Our terminology for an OS / window system, e.g. Android, Windows, Wayland, X11 via XCB
• Presentation Engine: The platform’s compositor or display engine
• Application: Your app or game engine
How many WSI extensions are there?
• Two cross-platform instance extensions
- VK_KHR_surface
- VK_KHR_display
• Six (platform) instance extensions
- VK_KHR_android_surface
- VK_KHR_mir_surface
- VK_KHR_wayland_surface
- VK_KHR_win32_surface
- VK_KHR_xcb_surface
- VK_KHR_xlib_surface
• Two cross-platform device extensions
- VK_KHR_swapchain
- VK_KHR_display_swapchain
Vulkan Surfaces
• VkSurfaceKHR
- Vulkan’s way to encapsulate a native window / surface
• Platform-independent surface queries
- Find out crucial information about your surface’s properties
  - e.g., if presentation is supported by a particular queue on a particular device
- Some platforms provide additional queries
• An implementation may support multiple platforms
- e.g., both xlib and xcb
[Diagram: Physical Devices A, B and C with their queue families (A: 0-2; B and C: 0-1), Platforms X and Y, and a surface from Platform X]
Vulkan Swapchains: VK_KHR_swapchain
• Array of presentable images associated with a surface
- Application requests a minimum number of presentable images
- Implementation creates at least that number
- Implementation may have a limit
• Upfront allocation of presentable images
- No allocation hitching at crucial moment
- Pre-record fixed content command buffers
• Present mode determines behavior
- FIFO support mandatory
- Platforms can offer mailbox, immediate, FIFO relaxed
const VkSwapchainCreateInfoKHR createInfo =
{
    VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR,  // sType
    NULL,                                         // pNext
    0,                                            // flags
    mySurface,                                    // surface
    desiredNumberOfPresentableImages,             // minImageCount
    surfaceFormat,                                // imageFormat
    surfaceColorSpace,                            // imageColorSpace
    myExtent,                                     // imageExtent
    1,                                            // imageArrayLayers
    VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT,          // imageUsage
    VK_SHARING_MODE_EXCLUSIVE,                    // imageSharingMode
    0,                                            // queueFamilyIndexCount
    NULL,                                         // pQueueFamilyIndices
    surfaceProperties.currentTransform,           // preTransform
    VK_COMPOSITE_ALPHA_INHERIT_BIT_KHR,           // compositeAlpha
    swapchainPresentMode,                         // presentMode
    VK_TRUE,                                      // clipped
    VK_NULL_HANDLE                                // oldSwapchain
};
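The slide says the application requests a minimum image count and that the implementation may have a limit. A minimal sketch of how desiredNumberOfPresentableImages might be clamped before it goes into the create info is shown here. The helper name is hypothetical, and the two bounds are assumed to come from the minImageCount and maxImageCount fields of VkSurfaceCapabilitiesKHR, where a maximum of 0 means no upper limit is reported.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the Vulkan API.
 * min_supported and max_supported stand in for
 * VkSurfaceCapabilitiesKHR::minImageCount and ::maxImageCount.
 * A max_supported of 0 means the implementation reports no upper limit. */
uint32_t choose_min_image_count(uint32_t desired,
                                uint32_t min_supported,
                                uint32_t max_supported)
{
    uint32_t count = desired;

    /* Never ask for fewer images than the surface requires. */
    if (count < min_supported)
        count = min_supported;

    /* Respect the upper limit only when one is reported. */
    if (max_supported != 0 && count > max_supported)
        count = max_supported;

    return count;
}
```

The implementation may still create more images than requested, so the actual count should be read back after swapchain creation rather than assumed.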
Vulkan Swapchains: They’re good!
• Application knows which image within a swapchain it is presenting
- Content of image preserved between presents
• Application is responsible for explicitly recreating swapchains - no surprises
- Platform informs app if current swapchain is:
  - Suboptimal: e.g. after window resize, swapchain still usable for present via image scaling
  - Surface Lost: swapchain no longer usable for present
- Application is responsible for creating a new swapchain
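One way to act on these notifications is a small decision function called after each acquire or present. The sketch below uses stand-in enums so it compiles without the Vulkan headers; in a real application the inputs would be the VkResult values VK_SUCCESS, VK_SUBOPTIMAL_KHR, VK_ERROR_OUT_OF_DATE_KHR and VK_ERROR_SURFACE_LOST_KHR. The out-of-date case is not on the slide and is included here as an assumption about what a complete handler needs. All type and function names are hypothetical.

```c
/* Stand-ins for the VkResult values named in the lead-in. */
typedef enum {
    PRESENT_OK,
    PRESENT_SUBOPTIMAL,    /* still presentable, e.g. via image scaling */
    PRESENT_OUT_OF_DATE,   /* swapchain no longer matches the surface */
    PRESENT_SURFACE_LOST,  /* surface itself is gone */
    PRESENT_OTHER_ERROR
} PresentResult;

typedef enum {
    ACTION_CONTINUE,
    ACTION_RECREATE_SWAPCHAIN,
    ACTION_RECREATE_SURFACE_AND_SWAPCHAIN,
    ACTION_FAIL
} FrameAction;

/* Decide what the frame loop should do next.
 * recreate_when_suboptimal is an application policy: a suboptimal
 * swapchain may keep being used, at some cost in quality or speed. */
FrameAction next_action(PresentResult result, int recreate_when_suboptimal)
{
    switch (result) {
    case PRESENT_OK:
        return ACTION_CONTINUE;
    case PRESENT_SUBOPTIMAL:
        return recreate_when_suboptimal ? ACTION_RECREATE_SWAPCHAIN
                                        : ACTION_CONTINUE;
    case PRESENT_OUT_OF_DATE:
        return ACTION_RECREATE_SWAPCHAIN;
    case PRESENT_SURFACE_LOST:
        return ACTION_RECREATE_SURFACE_AND_SWAPCHAIN;
    default:
        return ACTION_FAIL;
    }
}
```

Keeping the policy in one function makes the "no surprises" point concrete: the swapchain is only ever replaced where the application decides to.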
Vulkan Swapchains: They’re jolly good!
• Presenting and acquiring are separate operations
- No need to submit a new image to acquire another one, unless presentation engine cannot release it
• Application must only modify presentable images it has acquired
• Presentation engine must only display presentable images that have been presented!
Stalls in frame loop are very bad!
Steps to set up your presentable images
1 – Create a native window/surface (Platform-specific APIs)
2 – Create a Vulkan surface (VK_KHR_<platform>_surface)
3 – Query information about your surface (VK_KHR_surface)
4 – Create a Vulkan swapchain (VK_KHR_swapchain)
5 – Get your presentable images (VK_KHR_swapchain)
Vulkan Frame Loop – as easy as 1-2-3! (VK_KHR_swapchain)
0 – Create your swapchain
1 – Acquire the next presentable image
2 – Submit command buffer(s) for that image
3 – Present the image
Legend: Setup; Steady-state; Response to suboptimal / surface_lost
Vulkan Displays: VK_KHR_display
• Vulkan’s way to discover display devices (screens, panels) outside a window system
- Reminder: Not supported on all platforms
• Defines VkDisplayKHR and VkDisplayModeKHR objects
- Represent the display devices and the modes they support connected to a VkPhysicalDevice
- Determine if a display supports multiple planes that are blended together
• Enables creation of a VkSurfaceKHR to represent a display plane
[Diagram: a Physical Device with Display 0 (Planes 0-2, Display Modes 0-1) and Display 1 (Display Modes 0-1), and a Surface]
VK_KHR_display_swapchain
• Extends the information provided at vkQueuePresentKHR
- What region to present from the swapchain image
- What region to present to on the display
- Whether the display should persist the image
• Adds ability to create a shared swapchain
- Swapchain that takes multiple VkSwapchainCreateInfoKHR structs
- Allows multiple displays to be presented to simultaneously
- No guarantee that presents are atomic ...presently!
Any questions?
[email protected] @alonorbach (disclaimers apply!)
LunarG® SDK for Vulkan®
Karen Ghavam, CEO
Karl Schultz, Principal Engineer
Jon Ashburn, Principal Engineer
Enter the Raffle for your prize!
Congratulations!
You are the recipient of the Vulkan Programming Guide, courtesy of LunarG!
Is your OpenGL Programming Guide getting lonely? Well, it will soon have a companion. In August 2016, when the Vulkan Programming Guide becomes available, LunarG will ship it directly to you!
In the meantime, visit LunarXchange (Vulkan.lunarg.com) for the LunarG SDK for Vulkan, and accept this book bag, anxiously awaiting its Vulkan Programming Guide.
LunarG SDK
• Loader Binary
• Validation Layer Libraries
• Vulkan trace and replay tools
- vktrace
- vkreplay
• SPIR-V Tools
- GLSL Validator
- SPIR-V Disassembler and Assembler
- SPIR-V Remapper
• RenderDoc*
• Sample Programs
*For a detailed demonstration of RenderDoc don’t miss: Practical Development for Vulkan (presented by Valve Software). Thursday, 12:45 – 1:45, Room 3009, West Hall
Download the LunarG SDK for Vulkan at LunarXchange: vulkan.lunarg.com
Version 1.0.5.0 now available!
The Power of a Layered Ecosystem
[Diagram, development path: Vulkan application, Loader, Validation layer, Debug layer, Other layers, Installable Client Driver]
[Diagram, production path: Vulkan application, Loader, Installable Client Driver]
Layers: Fully Integrated Programmatic Approach
• Application supplies list of layers
• Layers report “results” as messages
• Application handles messages in callback
[Diagram: Vulkan application, Debug Report Callback, Loader, Layer, Installable Client Driver]
Layers: Externally Activated “Ad-hoc” Approach
• User sets environment variables: VK_INSTANCE_LAYERS=“layer name”
• Layers report “results” as messages
• Default Debug Report writes to output stream
[Diagram: Vulkan application, Debug Report Callback, Loader, Layer, Layer Settings File, Installable Client Driver]
Demo We’ll Be Using
“Hologram” by Chia-I Wu (olv)
• Well-written Vulkan demo
• Simulation of 5000 moving objects
• Demonstrates multi-threaded command buffer recording
• Can be found in: https://github.com/LunarG/VulkanSamples
Demo!
Watch the demo for a minute or so
A Few Hologram Internals – Object Data
5000 ShaderParamBlocks
struct ShaderParamBlock {
    float light_pos[4];
    float light_color[4];
    float model[4 * 4];
    float view_projection[4 * 4];
};
One ShaderParamBlock per Object
Two Frames of Object Data
For Each Frame and For Each Object:
• Modify ShaderParamBlock
• BindDescriptorSet
Modify Demo
Let’s add code to modulate the transparency of each object, independently, as a function of time. To do this, we need to:
1. Add a parameter to the ShaderParamBlock: “per-object” alpha
2. Modify the shader program to apply the per-object alpha
3. Modify the Simulation to change the transparency of each object over time
Start with Step 1!
struct ShaderParamBlock {
    float light_pos[4];
    float light_color[4];
    float model[4 * 4];
    float view_projection[4 * 4];
    float alpha;
};
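The transcript does not record what the re-run showed, so the following is only a sketch of one consequence of Step 1 that is easy to check: the block grows from 40 to 41 floats, which changes the per-object stride in any buffer that packs one block per object. The helper below is hypothetical, and the 256-byte alignment in the usage note is an assumed value for the device's minUniformBufferOffsetAlignment limit.

```c
#include <stddef.h>

/* The block from the slide, after adding the per-object alpha. */
struct ShaderParamBlock {
    float light_pos[4];
    float light_color[4];
    float model[4 * 4];
    float view_projection[4 * 4];
    float alpha;
};

/* Hypothetical helper: round size up to a power-of-two alignment,
 * such as the device's minUniformBufferOffsetAlignment limit. */
size_t aligned_block_stride(size_t size, size_t alignment)
{
    return (size + alignment - 1) & ~(alignment - 1);
}
```

With 4-byte floats, sizeof(struct ShaderParamBlock) is 164 bytes, and aligned_block_stride(164, 256) gives a 256-byte stride per object. Any code that still assumes the old 160-byte size when sizing buffers or computing offsets is a candidate for the kind of mistake the validation layers are meant to catch.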
Let’s See What Happens
Change the code and re-run demo
More Information
• Layer Documentation
- LunarXchange website (https://vulkan.lunarg.com/app/docs/latest/layers)
- More details on validation and other layers
• Screenshot Layer
- Good for showing someone else what is wrong
- Can also be used for before/after image-compare testing
• Vktrace/Vkreplay
- Useful for sending someone a trace file in lieu of setting up a reproduction scenario
A next gen Engine design on a next gen API
Dan Baker
Graphics Architect, Oxide Games
Nitrous design philosophies
• Job based threading
• Message based systems
• Redundant, shallow state design
• Always evaluate – opposite of Lazy Evaluation
• Efficient memory streaming
• Asynchronous systems
Data driven design
Unit AI System
MessageQueue
Physics Queue
FOW queue
Minimap queue
Message Dispatcher
Relating to Graphics Stack
• Collection of messages and systems extends into graphics
• Dozens of independent systems can operate in parallel
• Big systems internally parallelize (e.g. particles, unit rendering)
A modern API
• Concept of message based, asynchronous design well matched
Exposure of asynchronous nature of a GPU is the key design difference of Vulkan over OpenGL/D3D11
A contract between App and API
• Application will not make conflicting calls on the same objects (e.g. writing one object while another is reading it)
• Driver will generally not lock or serialize any API call
– Context information is embedded on the object being operated on
– With exception to occasional CPU side memory allocation (but should be rare occurrence on create calls)
Application runs parallel to GPU
[Diagram labels: Application, GPU, Even Command Buffers, Odd Command Buffers, a Delete Queue for each set, Flush Queue]
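The diagram pairs each set of command buffers with its own delete queue. One common reading of that arrangement, offered here as an assumption and not as Nitrous's actual code, is deferred destruction: an object retired while recording a frame is only destroyed once the GPU can no longer be using it. All names below are hypothetical, and the fence wait is left to the caller.

```c
#include <stddef.h>

#define FRAMES_IN_FLIGHT 2   /* even and odd frames, as in the diagram */
#define MAX_PENDING      64  /* arbitrary capacity for this sketch */

typedef void (*DestroyFn)(void *object);

typedef struct {
    void     *objects[MAX_PENDING];
    DestroyFn destroy[MAX_PENDING];
    size_t    count;
} DeleteQueue;

typedef struct {
    DeleteQueue queues[FRAMES_IN_FLIGHT];
    unsigned    frame_index;  /* frame currently being recorded on the CPU */
} FrameContext;

/* Queue an object for destruction once the GPU has finished the frame
 * that is currently being recorded. Returns 0 on success, -1 if full. */
int defer_delete(FrameContext *ctx, void *object, DestroyFn destroy)
{
    DeleteQueue *q = &ctx->queues[ctx->frame_index % FRAMES_IN_FLIGHT];
    if (q->count == MAX_PENDING)
        return -1;
    q->objects[q->count] = object;
    q->destroy[q->count] = destroy;
    q->count++;
    return 0;
}

/* Advance to the next frame. The caller must already have waited on the
 * fence of the frame that last used this slot, so everything in the
 * slot's delete queue is safe to destroy now. */
void begin_next_frame(FrameContext *ctx)
{
    ctx->frame_index++;
    DeleteQueue *q = &ctx->queues[ctx->frame_index % FRAMES_IN_FLIGHT];
    for (size_t i = 0; i < q->count; i++)
        q->destroy[i](q->objects[i]);
    q->count = 0;
}

/* Example destroy callback for demonstration: counts invocations. */
void count_destroy(void *object)
{
    ++*(int *)object;
}
```

With two frames in flight, an object queued during frame 0 is destroyed at the start of frame 2, when the even slot comes around again.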
Review
• When we say Vulkan is free threaded, we mean
– Most API function calls are operators. They write only to the data passed into them as output, and only read the data passed in as input
– API function calls are transparent for thread safety: valid to call so long as there are no read/write or write/write hazards. It is the app’s responsibility to manage them
– GPU/CPU hazard is explicitly exposed. GPUs are read operators on data, therefore read/write hazards between CPU/GPU must also be managed by the application
– In general, API function calls will not have locks in them
  • With exception to calls which must allocate some types of memory
Old way
[Timeline diagram: per-core job schedule for the current frame on Cores 1-6. Labels: Sim Job, AI Job, Physics Job, Game Job, Graphics (Opaque, in driver), Dead time, “??? GPU Fence, or CPU wait ???”]
Old Way
Driver related cores. Missing time due to thread accounting and system level synchronization primitives
Lots of unused CPU space! Engine is just waiting for driver to be done
Powerful New model
[Timeline diagram: per-core job schedule for the current frame and the next frame on Cores 1-6. Labels: Sim Job, AI Job, Game Job, Vulkan CMD Job, Vk present Job, GPU Fence End of Frame]
New way
Vulkan simulation using a modified Mantle build to simulate infinitely fast GPU
Difficult part of Vulkan
• Need to have a strategy for rendering up front, not lazy eval
• Before you can set up a shader, you need to understand bindings; before bindings, you need to understand descriptors
– Probably need to know these even before a descriptor is created
• The more you can know about a render job at compile time, the easier Vulkan will be
Setting up the Engine
• Pipelines created up front, combination(s) specified in shader language
• No concept of individual shader stages – Vertex/Fragment considered one block
• 64 MB temp buffer created for each frame
– Shader constants
– No buffers are updated directly
– Any updates are dumped into staging buffer and copied
– When 64 MB is exceeded, slow allocation path is used, typically only at initialization
• Internal command format that can be built in parallel
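The per-frame temp buffer described above behaves like a linear (bump) allocator with a fallback. A minimal sketch follows, under the assumption that sub-allocations are plain offsets into one staging buffer; the type and function names are hypothetical, and the slow path itself is left to the caller.

```c
#include <stddef.h>

#define FRAME_BUFFER_SIZE  (64u * 1024u * 1024u)  /* 64 MB, as on the slide */
#define FRAME_ALLOC_FAILED ((size_t)-1)

typedef struct {
    size_t capacity;               /* bytes in the per-frame staging buffer */
    size_t offset;                 /* next free byte */
    size_t slow_path_allocations;  /* how often the fallback was needed */
} FrameAllocator;

/* Sub-allocate size bytes at a power-of-two alignment.
 * Returns the offset into the staging buffer, or FRAME_ALLOC_FAILED when
 * the request does not fit and the caller must use its slow path. */
size_t frame_alloc(FrameAllocator *a, size_t size, size_t alignment)
{
    size_t start = (a->offset + alignment - 1) & ~(alignment - 1);
    if (start > a->capacity || size > a->capacity - start) {
        a->slow_path_allocations++;
        return FRAME_ALLOC_FAILED;
    }
    a->offset = start + size;
    return start;
}

/* Call once the GPU has finished the frame that used this buffer. */
void frame_reset(FrameAllocator *a)
{
    a->offset = 0;
}
```

A frame would start from `FrameAllocator a = { FRAME_BUFFER_SIZE, 0, 0 };`. Because allocation is a single add and compare with no shared bookkeeping beyond the offset, one allocator per recording thread keeps it lock-free.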
Shader Combos
• Large, monolithic blocks with much state folded in
– Shaders
– Alpha state
– MSAA state
– Depth State
• Managing combinatorics is major challenge
Shader Combos
• Very unlikely that hardware actually needs to create unique pipeline object
– The problem is that each hardware has a different state that might require a new shader
• Vulkan has bulk shader create
– Give a bunch of shader combinations at once to driver
– Most likely driver only has to create a few actual shaders
• Nitrous does group creates – 20-40 combinations of a pipeline that might get used. A little bit of pruning for shader author
Pipeline serialization
• Major problem with D3D12
• Serialization context is passed into shader create
– Needed because most pipelines are not unique
• Driver will use this as a database to store compiled pipeline objects
• Can serialize the whole database
Texture Sets
• Nitrous eliminates individual shader bindings
• Textures must be part of groups
• Maps to a descriptor set
Bind Vector
[Diagram labels: Batch, Shader Set, Primitive (vertices), and two bind vectors, each made of Texture Sets and Constant Sets]
Bind Vector
[Diagram: a bind vector made of Texture Sets and Constant Sets]
• Becomes a Layout in Vulkan
• Layouts are specified during the shader creation stage
• Nitrous uses only 1 master layout
• Most engines will use multiple
• Switching layouts has cost
• Can easily sort out redundant changes, only call bind descriptor when something needs changing
Managing hazards
• The trickiest part of Vulkan
• Must manage any time a resource will be used differently
– Cache Flush
– Operator barrier
– Decompression
• USE THE VALIDATOR
– Could get correct results on current hardware only to see problems on future hardware
– No different than multi-threaded coding
• Consider having engine layer automatically partially calculate barriers
– Good design should do a good job
– Nitrous is 100% explicit right now, but will likely switch to a partially automatic system
General performance
• Shader recompiling won’t happen automatically
– Constant folding
– But no frame stutters due to recompiles
• Memory barriers can introduce stalls
– Need to plan out
• Changing pipelines, layouts frequently
Threading/Command buffers
• Best idea is to have many command buffers, but 1 allocator per thread per frame queued
• Command buffer allocation can cause memory bloat
• Nitrous sorts command buffers from estimated size, largest first, down to smallest
Questions
twitter: dankbaker, oxidegames
Performance Lessons from Porting Source 2 to Vulkan
Dan Ginsburg
Overview
Dota 2 Vulkan Performance Results
Performance Lessons Learned
Source 2 Overview
OpenGL, Direct3D 9, Direct3D 11, Vulkan
Windows, Linux, Mac
Dota 2 Reborn
Dota 2 Performance Results - Disclaimer
Not an ideal showcase for Vulkan
Source 2 renderer is multithreaded, but…
Dota 2 is only ~1500 draw calls per frame
Allows DX/GL a frame of latency to avoid being
renderthread bound
Does not (yet!) take advantage of:
Baking descriptors
Command buffer resubmission
Still very pleased with results!
Dota 2 Vulkan Performance – DX9 Latency
Frame Start Frame End Present Issued
DX9 Latency: 3.8ms
Dota 2 Vulkan Performance – Vulkan Latency
Frame Start Frame End Present Issued
Vulkan Latency: 0.4ms (!)
Dota 2 Vulkan – Latency Reduction
Renderthread no longer a bottleneck
Reduces “wallclock” time of frame
Time from end of frame to present reduced by 3.4ms
Really important for:
Latency sensitive games (eSports)
VR
Dota 2 Vulkan - Framerate
Two timedemos:
Typical Dota 2 Match
High Drawcall Battle Scene
Test system:
NVIDIA TITAN X 356.45
i7-3770k @ 3.50GHz
Test settings:
Resolution: 640x480 (CPU Perf)
Highest Rendering Quality
Vulkan/GL/DX9/DX11
Dota 2 Timedemo – Typical Dota 2 Match
FPS (NVIDIA TITAN X, i7 3770k, 640x480, 356.45 - HQ)
Vulkan: 182.95
OpenGL: 170.55
DX9: 188.5
DX11: 128.1
Dota 2 Timedemo – Battle Scene
Dota 2 – High Drawcall Timedemo
FPS (NVIDIA TITAN X, i7 3770k, 640x480, 356.45 - HQ)
Vulkan: 85.3
OpenGL: 75.15
DX9: 75.65
DX11: 67.5
Dota 2 Vulkan Performance - Overall
Significant latency reduction
Improved framerate in heavy scenes
Only going to get better…
Overview
Dota 2 Vulkan Performance Results
Performance Lessons Learned
Command Buffer Recycling
Command Buffer Batching
Redundant Call Filtering
Updating Descriptors
Pipeline Cache Usage
Command Buffer Recycling Overview
At least one VkCommandPool per thread
Recycling options:
vkResetCommandPool – resets all command buffers in
pool
vkResetCommandBuffer – reset single command buffer
Reset can either recycle or release resources
Command Buffer Recycling
Source 2 recycles individual command buffers after completion
vkBeginCommandBuffer costly
Using VK_COMMAND_BUFFER_RESET_RELEASE_RESOURCES_BIT
Driver reallocates resources
Done to reduce memory footprint, but came at perf cost
Fast Command Buffer Recycling
vkCreateCommandPool
Use VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT
vkResetCommandBuffer( pCmdBuffer, 0 )
flags == 0, keeps resources for reuse
Downside: memory growth
Source 2 strategy for handling memory growth:
Destroy command buffers no longer needed
Heuristic to destroy command buffers
Command Buffer Batching
vkQueueSubmit implies a flush
Also has CPU costs – memory residency
Important to batch submits
Command Buffer Batching
Batched submit: ~0.7ms / frame
Unbatched submits: ~4.5ms / frame
Source 2 Command Buffer Batching
Gather command buffers on renderthread
Up to a threshold, needed during load time
Wait for present request
Issue single submit with all batched command buffers
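The batching steps above can be sketched as a small gatherer on the render thread. This is an illustration under assumptions, not Source 2's code: the threshold value is arbitrary, CommandBuffer stands in for VkCommandBuffer, and flush_batch only counts where a single vkQueueSubmit covering all pending buffers would be issued.

```c
#include <stddef.h>

#define BATCH_THRESHOLD 4  /* arbitrary; the slide does not give a number */

typedef struct { int id; } CommandBuffer;  /* stand-in for VkCommandBuffer */

typedef struct {
    CommandBuffer pending[BATCH_THRESHOLD];
    size_t        pending_count;
    size_t        submit_calls;       /* how many submits were issued */
    size_t        buffers_submitted;  /* total command buffers handed over */
} SubmitBatcher;

/* Stand-in for one queue submit covering every pending command buffer. */
void flush_batch(SubmitBatcher *b)
{
    if (b->pending_count == 0)
        return;
    b->submit_calls++;
    b->buffers_submitted += b->pending_count;
    b->pending_count = 0;
}

/* Gather a finished command buffer; flush early only when the threshold
 * is reached, which mostly matters during loading. */
void batch_command_buffer(SubmitBatcher *b, CommandBuffer cb)
{
    b->pending[b->pending_count++] = cb;
    if (b->pending_count == BATCH_THRESHOLD)
        flush_batch(b);
}

/* A present request submits whatever has been gathered so far. */
void on_present_request(SubmitBatcher *b)
{
    flush_batch(b);
}
```

In steady state a frame that gathers fewer buffers than the threshold results in exactly one submit, issued when the present request arrives.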
Redundant Call Filtering
Your job now!
Vulkan drivers may not (should not!) filter calls
If we don’t do it, we will force IHVs to
Driver-side filtering penalizes the good apps to cover for the bad
Examples from Source 2:
vkCmdBindIndexBuffer
vkCmdBindVertexBuffers
vkCmdBindPipeline
Dynamic render state
vkCmdSet*
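Application-side filtering usually amounts to shadowing the last bound state per command buffer and skipping calls that would not change it. The sketch below is hypothetical throughout: Handle stands in for Vulkan handles such as VkPipeline or VkBuffer, and the calls_issued counter marks where vkCmdBindPipeline or vkCmdBindIndexBuffer would actually be recorded.

```c
#include <stdint.h>

typedef uint64_t Handle;  /* stand-in for VkPipeline, VkBuffer, ... */

typedef struct {
    Handle   bound_pipeline;       /* 0 means nothing bound yet */
    Handle   bound_index_buffer;
    uint64_t index_buffer_offset;
    unsigned calls_issued;         /* calls that would reach the driver */
    unsigned calls_filtered;       /* redundant calls dropped here */
} CommandBufferState;

/* Call whenever recording starts on a command buffer: bound state does
 * not carry over from one command buffer to the next. */
void state_reset(CommandBufferState *s)
{
    *s = (CommandBufferState){0};
}

void bind_pipeline(CommandBufferState *s, Handle pipeline)
{
    if (s->bound_pipeline == pipeline) {
        s->calls_filtered++;
        return;
    }
    s->bound_pipeline = pipeline;
    s->calls_issued++;  /* vkCmdBindPipeline would be recorded here */
}

void bind_index_buffer(CommandBufferState *s, Handle buffer, uint64_t offset)
{
    if (s->bound_index_buffer == buffer && s->index_buffer_offset == offset) {
        s->calls_filtered++;
        return;
    }
    s->bound_index_buffer = buffer;
    s->index_buffer_offset = offset;
    s->calls_issued++;  /* vkCmdBindIndexBuffer would be recorded here */
}
```

The same pattern extends to vertex buffers and the vkCmdSet* dynamic state calls listed above; the cache must be kept per command buffer so that multi-threaded recording stays independent.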
Updating Descriptors
vkUpdateDescriptorSets #1 hotspot
vkCmdBindDescriptorSets #2 hotspot
Source 2 approach:
Single pipeline layout shared across all pipelines
Descriptor sets will have unused entries
Update/bind descriptor set per draw
Not efficient!
![Page 82: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/82.jpg)
Updating Descriptors – The Right Way
In shaders, organize descriptor sets by update frequency
Bake descriptor sets up front
Use compatible pipeline layouts to simplify descriptor allocation
![Page 83: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/83.jpg)
Updating Descriptors – The Right Way
In shaders, organize descriptor sets by update frequency
Bake descriptor sets up front
Use compatible pipeline layouts to simplify descriptor allocation
…we plan to do this in the future. Will help perf a lot.
![Page 84: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/84.jpg)
Pipeline Creation
vkCreateShaderModule is relatively fast
Loads in the SPIR-V, no heavy compilation
~0.01ms in Dota 2
vkCreateGraphicsPipelines is expensive
Driver performs shader compile here
0.2 – 152ms in Dota 2 before cache is warmed
![Page 85: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/85.jpg)
Vulkan Pipeline Cache
Serialize compiled pipelines to disk
Preload to remove first-time stutters
Header contains VendorID/DeviceID/UUID
Otherwise opaque format
Avoid unnecessary shader compiles
Driver de-duplicates
Only driver knows when recompile is needed based on state
Pipeline cache should contain only unique pipelines
Allows compilation on multiple threads
Merge later using vkMergePipelineCaches
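Before preloading a cache blob from disk, an application can check the header fields mentioned above against the current device. The sketch below assumes the version-one header layout as I understand it from the Vulkan specification: 32-bit header length, 32-bit header version (1), 32-bit vendorID, 32-bit deviceID, then a 16-byte pipeline cache UUID, with integers stored least significant byte first. `DeviceIdentity` and `cacheBlobMatchesDevice` are hypothetical names; in real code the identity would come from `VkPhysicalDeviceProperties`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for the matching fields of VkPhysicalDeviceProperties.
struct DeviceIdentity {
    std::uint32_t vendorID;
    std::uint32_t deviceID;
    std::array<std::uint8_t, 16> pipelineCacheUUID;
};

// Reads a 32-bit little-endian integer from a byte pointer.
static std::uint32_t readLE32(const std::uint8_t* p) {
    assert(p != nullptr);
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Returns true when the blob's header matches the given device.
bool cacheBlobMatchesDevice(const std::vector<std::uint8_t>& blob,
                            const DeviceIdentity& dev) {
    constexpr std::size_t kHeaderSize = 32;  // 4 + 4 + 4 + 4 + 16 bytes
    if (blob.size() < kHeaderSize) return false;
    const std::uint8_t* p = blob.data();
    if (readLE32(p) < kHeaderSize) return false;          // header length
    if (readLE32(p + 4) != 1u) return false;              // header version one
    if (readLE32(p + 8) != dev.vendorID) return false;
    if (readLE32(p + 12) != dev.deviceID) return false;
    return std::memcmp(p + 16, dev.pipelineCacheUUID.data(), 16) == 0;
}
```

A blob that fails the check would simply be discarded and the cache created empty; everything after the header stays opaque to the application.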
![Page 86: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/86.jpg)
Summary
Dota 2 Vulkan Performance Results
Reduced latency
Improved framerate in expensive scenes
Performance Lessons Learned
Command Buffer Recycling
Command Buffer Batching
Redundant Call Filtering
Updating Descriptors
Pipeline Cache Usage
![Page 87: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/87.jpg)
Questions?
![Page 88: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/88.jpg)
Vulkan Does Retro
A Vulkan Use-Case Study with RetroArch and libretro
Hans-Kristian Arntzen – GDC 2016
![Page 89: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/89.jpg)
Background
• Me
  • Multimedia programming since 2009
  • Co-founder of RetroArch project in 2010-2011
  • Working at ARM hacking on the Mali GPUs since 2014
  • Contributed Vulkan backend on launch day
• RetroArch / libretro
  • Multi-platform system optimized for enjoying retro content
  • Plugin abstraction to support many different systems
  • Strong focus on portability and performance
![Page 90: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/90.jpg)
Problem
• Retro content usually needs to render on CPU
  • Emulators of classic consoles are a prime example
• Get software rendered images to screen fast and reliably
  • Blazing fast texture uploads are part of the equation
[Diagram: CPU, GPU magic]
![Page 91: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/91.jpg)
Streaming with Vulkan
• Vulkan exposes VK_IMAGE_TILING_LINEAR
  • Finally! For some reason, never added to OpenGL
• GPUs can sample from these textures
  • At least on the Vulkan drivers I have tested ...
  • No reason to copy from linear to optimal layout (used once!)
• Vulkan supports persistently mapped memory
  • Finally, we GLES folks can do it right
• Combine these into a dream scenario
  • Persistently map a ring buffer of linear textures
  • Let the libretro core render directly into HOST_VISIBLE memory or use pure memcpy()
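The ring-buffer idea can be sketched without any Vulkan calls. Here each `MappedFrame` stands in for one persistently mapped linear image, with a `std::vector` playing the role of the mapped pointer; `FrameRing` and `copyFrame` are hypothetical names, not RetroArch code. The one detail worth showing is that a linear image has its own row pitch (in Vulkan it is reported by `vkGetImageSubresourceLayout`), so the memcpy() goes row by row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// One ring slot: stands in for a persistently mapped LINEAR image.
struct MappedFrame {
    std::vector<std::uint8_t> memory;  // stand-in for the mapped pointer
    std::size_t rowPitch;              // bytes between rows, may exceed width * bpp
};

class FrameRing {
public:
    FrameRing(std::size_t slots, std::size_t height, std::size_t rowPitch) {
        assert(slots > 0);
        for (std::size_t i = 0; i < slots; ++i) {
            frames_.push_back(
                MappedFrame{std::vector<std::uint8_t>(height * rowPitch, 0), rowPitch});
        }
    }

    // Hands out the next slot. The caller must already know (for example
    // via a fence) that the GPU has finished reading this slot.
    MappedFrame& next() {
        MappedFrame& frame = frames_[index_];
        index_ = (index_ + 1) % frames_.size();
        return frame;
    }

    std::size_t nextIndex() const { return index_; }

private:
    std::vector<MappedFrame> frames_;
    std::size_t index_ = 0;
};

// Copies a software-rendered image into a slot, honoring both pitches.
void copyFrame(MappedFrame& dst, const std::uint8_t* src, std::size_t srcPitch,
               std::size_t rowBytes, std::size_t height) {
    assert(rowBytes <= dst.rowPitch && rowBytes <= srcPitch);
    assert(height * dst.rowPitch <= dst.memory.size());
    for (std::size_t y = 0; y < height; ++y) {
        std::memcpy(dst.memory.data() + y * dst.rowPitch, src + y * srcPitch, rowBytes);
    }
}
```

If the core can be told the destination pitch, it can render straight into `memory` and the copy disappears entirely.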
![Page 92: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/92.jpg)
Caveats
• Vulkan doesn’t require support for sampling linear textures
  • Might need fallback
• Linear textures might not be DEVICE_LOCAL
  • Mostly a desktop thing
  • Might need same fallback as before ...
• Memory might not be cached
  • Fallback to copy if we want to blend on the surface
• Simple, vendor-neutral fallbacks
  • If we hit either case, copy linear texture to DEVICE_LOCAL
  • Might as well copy to OPTIMAL tiling layout
  • vkCmdCopyImage (or vkCmdCopyBufferToImage)
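The two GPU-side fallback conditions can be folded into one decision, sketched below. The bit constants mirror the values in the Vulkan headers so the sketch compiles on its own; `chooseUploadPath` and `UploadPath` are hypothetical names. In real code the first argument would be `VkFormatProperties::linearTilingFeatures` and the second the `propertyFlags` of the chosen memory type. The uncached-memory case is left out here because it concerns CPU-side blending rather than the GPU path.

```cpp
#include <cassert>
#include <cstdint>

// Values mirror the Vulkan headers (VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT,
// VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT).
constexpr std::uint32_t kFormatFeatureSampledImage = 0x1;
constexpr std::uint32_t kMemoryDeviceLocal = 0x1;
constexpr std::uint32_t kMemoryHostVisible = 0x2;

enum class UploadPath {
    SampleLinearDirectly,      // the "dream scenario"
    CopyToOptimalDeviceLocal   // staging copy into an OPTIMAL image
};

// Picks the upload path for a mapped linear image.
UploadPath chooseUploadPath(std::uint32_t linearTilingFeatures,
                            std::uint32_t memoryFlags) {
    // The linear image is written by the CPU, so it must be mappable.
    assert((memoryFlags & kMemoryHostVisible) != 0);

    const bool canSample = (linearTilingFeatures & kFormatFeatureSampledImage) != 0;
    const bool deviceLocal = (memoryFlags & kMemoryDeviceLocal) != 0;

    if (canSample && deviceLocal) {
        return UploadPath::SampleLinearDirectly;
    }
    // Either caveat applies: copy once per frame with vkCmdCopyImage.
    return UploadPath::CopyToOptimalDeviceLocal;
}
```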
![Page 93: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/93.jpg)
The various ways to copy ...
• Ring-buffered textures with glTexSubImage appear to be best
  • We already did the hard part for the driver
  • Texture is not in use by GPU, should allow optimal path
  • Only way in pure GLES2
• Classic async PBO uploads have extra overhead on all drivers
  • After all, have to copy to PBO, then copy to texture
  • Doesn’t accomplish anything over plain SubImage in our case
• AZDO-style PBO seems interesting ... but
  • Observed bizarre 10x performance dips in TexSubImage
  • So much for that ...
• On Raspberry Pi 1, things got weirder ...
  • Optimal path was uploading to OpenVG texture
  • Share image with GLES via EGL ...
![Page 94: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/94.jpg)
Benchmark
• NES video from Nestopia libretro core
  • 256x240 resolution @ 32 bpp
• Ran through RetroArch’s Vulkan and GL backends
• Measurements
  • Time to copy the frame from CPU memory to the texture
  • Time spent overall to submit frame
  • Measured on Linux
![Page 95: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/95.jpg)
OpenGL results
• Sure, we’re measuring in microseconds
  • We can do so much better!
• (*) GL calls were blocking mid-frame
  • Probably rate-limiting waiting for older frames

| CPU (clock in GHz) | GPU | Copy OpenGL (µs) | Frame OpenGL (µs) |
|---|---|---|---|
| i5-5257U @ 2.70 | Intel HD 6100 (Mesa) | 130 | N/A (*) |
| i7 920 @ 2.66 | nVidia GTX 760 | 272 | 302 |
| Cortex-A17 @ 1.8 | Mali T-764 | 585 | 806 |
![Page 96: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/96.jpg)
Vulkan delivers!
• Copy time essentially a memcpy() benchmark
• Overall frame times way better than the GL texture upload!
• Great uplifts across the board
• Still room for improvement

| CPU (clock in GHz) | GPU | Copy Vulkan (µs) | Frame Vulkan (µs) | Copy uplift |
|---|---|---|---|---|
| i5-5257U @ 2.70 | Intel HD 6100 (Mesa) | 27 | 122 | 352 % |
| i7 920 @ 2.66 | nVidia GTX 760 | 46 | 69 | 491 % |
| Cortex-A17 @ 1.8 | Mali T-764 | 80 | 215 | 631 % |
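The slides do not say how "copy uplift" is computed. Assuming it means (GL copy time / Vulkan copy time - 1) × 100, the small helper below reproduces the GTX 760 and Mali rows from the two tables (272 vs 46 µs and 585 vs 80 µs). With the same formula, the Intel copy times as listed (130 vs 27 µs) come out at roughly 381 % rather than the 352 % shown. `upliftPercent` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cmath>

// Percentage by which the new time beats the old one.
// Assumption: uplift = (old / new - 1) * 100; the slide gives no formula.
double upliftPercent(double oldMicros, double newMicros) {
    assert(oldMicros > 0.0 && newMicros > 0.0);
    return (oldMicros / newMicros - 1.0) * 100.0;
}
```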
![Page 97: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/97.jpg)
Conclusion
• Even humble 2D applications can gain from Vulkan
• Not reserved for the highest-end engine developers
• Vulkan provides a far more direct and simple path to perf
• Fast paths are more obvious than before
• Going from good to great is much simpler in Vulkan
![Page 98: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/98.jpg)
THANKS!
@themaister
github.com/Themaister
github.com/libretro/RetroArch
![Page 99: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/99.jpg)
Porting Cinder to Vulkan
Hai Nguyen, Google
![Page 100: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/100.jpg)
GFXBench 5 - Aztec Ruins
Benchmarking Vulkan
Gergely Juhasz, Lead Gfx Engineer @Kishonti
![Page 101: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/101.jpg)
GFXBench 5 in a nutshell
• Concept
  • Working title: Aztec Ruins
• Entirely new rendering engine
  • In-house render API for Vulkan, Metal, DX12
  • Also on OpenGL 4.3+, ES 3.2, DX11 for comparison
  • Algorithmic and workload parity across different backends
• High-end graphics features
  • Real time dynamic GI
  • Complex shading and advanced post-effects
• State
  • Near to Beta
  • Gold version expected by Q3
![Page 102: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/102.jpg)
Actual engine footage
![Page 103: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/103.jpg)
Render pipeline – Direct lights
![Page 104: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/104.jpg)
Render pipeline – Dynamic shadows
![Page 105: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/105.jpg)
Render pipeline – Global illumination
![Page 106: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/106.jpg)
Render pipeline – Post-process
![Page 107: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/107.jpg)
Global illumination
• Probes capture the lighting conditions
• SH is generated for every probe
• Final scene is shaded by deferred irradiance lights
• Fits well with Vulkan’s subpass concept
![Page 108: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/108.jpg)
Subpass 1 – Geometry
![Page 109: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/109.jpg)
Subpass 2 – Lighting
![Page 110: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/110.jpg)
Final step – Post effects
![Page 111: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/111.jpg)
Multi-threaded command recording 1
[Diagram: a render job consists of render targets, render states and drawcalls; dependency graph over render jobs A, B, C, D, E, F]
Pipeline consists of several render jobs
![Page 112: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/112.jpg)
Multi-threaded command recording 2
[Diagram: four command buffers, the main thread and the command queue]
Main rendering thread submits the command buffers according to the dependency graph
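One way to derive a valid submission order from such a dependency graph is a topological sort. The sketch below uses Kahn's algorithm; it is an illustration only, not GFXBench code, and the edges used in the usage note are made up because the slide's actual edges did not survive extraction. `submissionOrder` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Returns job indices so that every job appears after its dependencies.
// Each edge is (before, after). An empty result signals a cycle.
std::vector<std::size_t> submissionOrder(
    std::size_t jobCount,
    const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
    std::vector<std::vector<std::size_t>> successors(jobCount);
    std::vector<std::size_t> pendingDeps(jobCount, 0);
    for (const auto& e : edges) {
        assert(e.first < jobCount && e.second < jobCount);
        successors[e.first].push_back(e.second);
        ++pendingDeps[e.second];
    }

    // Jobs without unmet dependencies can be submitted right away.
    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < jobCount; ++i) {
        if (pendingDeps[i] == 0) ready.push(i);
    }

    std::vector<std::size_t> order;
    while (!ready.empty()) {
        const std::size_t job = ready.front();
        ready.pop();
        order.push_back(job);
        for (std::size_t next : successors[job]) {
            if (--pendingDeps[next] == 0) ready.push(next);
        }
    }

    if (order.size() != jobCount) order.clear();  // cycle detected
    return order;
}
```

With jobs A to F numbered 0 to 5 and hypothetical edges A→D, A→E, B→E, B→C, D→F, E→F, C→F, the worker threads can record all six command buffers in parallel while the main thread submits them in the returned order.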
![Page 113: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/113.jpg)
Future development plans
• Planned rendering features
  • Indirect specular highlights and shadows by GI
  • Deferred decals
  • Animated vegetation
  • Compute based motion blur
  • Atmospheric effects, particles
  • VR
![Page 114: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/114.jpg)
![Page 115: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/115.jpg)
![Page 116: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/116.jpg)
Comparing Vulkan to OpenGL (ES)
Barthold Lichtenbelt
March 16, 2016
![Page 117: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/117.jpg)
Beneficial Vulkan Scenarios
[Flowchart from "start" to "Vulkan friendly"; every box is connected by a "yes" edge]
• Is your graphics work CPU bound?
• Can your graphics creation be parallelized?
• Your graphics platform is fixed
• You’ll do what it takes to squeeze out max perf.
• You put a premium on avoiding hitches
• You can manage your graphics resource allocations
![Page 118: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/118.jpg)
Unlikely to Benefit
Scenarios to reconsider coding to Vulkan
1. Need for compatibility with pre-Vulkan platforms
2. Heavily GPU-bound application
3. Heavily CPU-bound application due to non-graphics work
4. Single-threaded application, unlikely to change
5. App can target a middleware engine, avoiding 3D graphics API dependencies
  • Consider using an engine targeting Vulkan, instead of coding Vulkan yourself
![Page 119: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/119.jpg)
Comparing OpenGL, AZDO, and Vulkan

| Issue | Naïve GL | AZDO | Vulkan |
|---|---|---|---|
| Deterministic state validation/pre-compilation | no | no | yes |
| Improved single thread performance | no | yes | yes |
| Multi-threaded work creation | no | partial | yes |
| Multi-threaded work submission (to driver) | no | no | yes |
| GPU based work creation | no | partial | partial (through MDI) |
| Ability to re-use created work | no | partial | yes |
| Multi-threaded resource updates | no | yes | yes |
| Learning curve | low | high | significant |
| Effort | low | high | significant |
![Page 120: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/120.jpg)
Fish demo
• Vulkan and OpenGL ES 3.1
• Can change
  - # of schools of fish
  - # of fish per school
  - # of fish per drawcall
• Worker threads create command buffers in Vulkan mode
• Reports
  - Drawcalls/sec
  - FPS
  - CPU time per thread
  - GPU time
• Android and Windows
• Source code will be available soon
![Page 121: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/121.jpg)
200K Fishies, 100 fish per draw call
[Bar chart: drawcalls / sec (axis 0 to 200,000), OpenGL ES vs. Vulkan, on Geforce GTX 980, SHIELD Android TV and SHIELD Tablet K1; speedup labels on the chart: 7x, 1.5x, 1.2x]
![Page 122: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/122.jpg)
200K Fishies, 1 fish per draw call
[Bar chart: drawcalls / sec (axis 0 to 18,000,000), OpenGL ES vs. Vulkan, on Geforce GTX 980, SHIELD Android TV and SHIELD Tablet K1; speedup labels on the chart: 6x, 5x, 19x]
![Page 123: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/123.jpg)
FISH DEMO
![Page 124: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/124.jpg)
Porting Cinder to Vulkan
Learning to Follow Rules
Hai Nguyen
Creative Technology Lead
Art Copy & Code Project
![Page 125: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/125.jpg)
Vulkan: Lots of rules and no mercy.
~Joseph Campbell (paraphrased)
![Page 126: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/126.jpg)
Introducing Cinder
● What’s creative coding?
  ○ Programming with aesthetic intent
● What platforms does Cinder run on?
  ○ Android, Linux, Windows, iOS and OS X
● Open source under Simplified BSD
C++ Creative Coding Framework | https://libcinder.org
![Page 127: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/127.jpg)
Cinder: Who/What/Where?
● Who is Cinder’s target audience?
  ○ Creative coders
● What is Cinder used for?
  ○ Apps: mobile to desktop to Times Square
● Where has Cinder been used?
Audience and Projects
![Page 128: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/128.jpg)
Grove | Simon Geilfus
Planetary | BLOOM.io
SCAD Museum | Pentagram
IBM THINK | Mirada
Samsung CenterStage | TBG
Dia Lights | Kollision
Audi Urban Future | Kollision
Androidify | Red Paper Heart
Taxi, Taxi! | Robert Hodgin
Porting Cinder to Vulkan: Projects That Use Cinder
![Page 129: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/129.jpg)
Porting Cinder to Vulkan
● Vulkanizing Cinder
● Crossing Vendor Implementations
● Speed Bumps
The Road To Glory
![Page 130: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/130.jpg)
Vulkanizing Cinder
● Added RendererVk to Cinder
  ○ Cinder rendering architecture is modular
● Wrapped Vulkan in C++
  ○ Created idiomatic layer for expression
● Created high level graphics classes
  ○ Textures, vertex buffers, render targets, etc
Getting to the First Triangle
![Page 131: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/131.jpg)
Vulkanizing Cinder
● Initial port on Windows: ~3wks
  ○ Included updating GLSL to Vulkan convention
● Android and Linux port: ~3hrs (each)
  ○ Added platform WSI calls
  ○ Added platform swapchain creation
● Everything else stayed the same
  ○ Including GLSL shader code used in demos and tests
Going Cross Platform
![Page 132: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/132.jpg)
Crossing Vendor Implementations
● Vendor implementations follow the spec
  ○ Conformance tested
● Slightly different behaviors
  ○ Image layout transitions in render passes
● Varying GPU limits/features
  ○ Found in VkPhysicalDeviceLimits
Implementation Details Will Vary
![Page 133: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/133.jpg)
Speed Bump: Image Layout Transitions
● Initial platform allowed image layouts to be VK_IMAGE_LAYOUT_GENERAL
  ○ Made it easy to get up and going
● Seemed to work on other GPUs - until one didn’t
  ○ Why? Vendor had stricter adherence to spec
● Checked spec and added logic for transitions
  ○ Had to rework a good bit of code
Dad Said Yes But Mom Said No
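The "logic for transitions" is not shown on the slides. One plausible shape for it, sketched here under that assumption, is a per-image tracker that remembers the current layout and reports when a barrier is needed. `LayoutTracker`, `Layout` and `Transition` are hypothetical names, and the enum is a stand-in for a few `VkImageLayout` values.

```cpp
#include <cassert>

// Stand-in for a handful of VkImageLayout values.
enum class Layout { Undefined, General, ColorAttachment, ShaderReadOnly, TransferDst };

struct Transition {
    Layout from;
    Layout to;
};

// Remembers the layout of one image and reports required transitions.
class LayoutTracker {
public:
    // Returns true and fills `out` when the image must change layout.
    bool require(Layout wanted, Transition& out) {
        // An image can start in UNDEFINED but is never transitioned to it.
        assert(wanted != Layout::Undefined);
        if (wanted == current_) return false;
        out = Transition{current_, wanted};
        current_ = wanted;
        return true;
    }

    Layout current() const { return current_; }

private:
    Layout current_ = Layout::Undefined;
};
```

A returned transition would map to a `VkImageMemoryBarrier` with `oldLayout` and `newLayout` set accordingly. The tracker only covers layouts; barriers may still be needed for memory dependencies even when the layout stays the same.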
![Page 134: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/134.jpg)
Whooops...
![Page 135: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/135.jpg)
YAY!
![Page 136: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/136.jpg)
Speed Bump: Not Paying Attention to Limits
● Not adhering to limits often results in crashes
● Mishandled vkCmdBindDescriptorSets
  ○ Exceeded maxBoundDescriptorSets
● Tried to multithread on device with 1 queue
  ○ Failed to check queue family’s queue count
VkPhysicalDeviceLimits / VkQueueFamilyProperties
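Both mistakes can be caught with cheap checks before the offending call. The sketch below uses a stand-in struct for the two fields named above (`VkPhysicalDeviceLimits::maxBoundDescriptorSets` and `VkQueueFamilyProperties::queueCount`); `DeviceCaps` and both helper functions are hypothetical names, not Cinder code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Stand-in for the two queried values the slide refers to.
struct DeviceCaps {
    std::uint32_t maxBoundDescriptorSets;  // from VkPhysicalDeviceLimits
    std::uint32_t queueCount;              // from VkQueueFamilyProperties
};

// True when binding `setCount` sets starting at `firstSet` stays in range.
// Written without firstSet + setCount to avoid unsigned overflow.
bool descriptorSetRangeFits(const DeviceCaps& caps, std::uint32_t firstSet,
                            std::uint32_t setCount) {
    return firstSet <= caps.maxBoundDescriptorSets &&
           setCount <= caps.maxBoundDescriptorSets - firstSet;
}

// Number of queues the app may actually use from this family.
std::uint32_t usableQueueCount(const DeviceCaps& caps, std::uint32_t wanted) {
    assert(caps.queueCount > 0);
    return std::min(wanted, caps.queueCount);
}
```

On a device that reports a single queue, `usableQueueCount` returns 1, and worker threads then have to funnel their submissions through that one queue with external synchronization.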
![Page 137: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/137.jpg)
No More Black Box / Fewer Black Screens
● Vulkan Specification
  ○ Clear about requirements and expectations (mostly)
● Check Device Limits / Features at Run Time
  ○ Easy to query in Vulkan
● Validation Layers Are Your Friends
  ○ Turn on at day 1 - leave on until shipped
Help Vulkan Help You
![Page 138: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/138.jpg)
Antoine Labour
E. Greg Daniel
Jesse Hall
Shannon Woods
Daniel Koch
Jeff Bolz
Mathias Heyer
Piers Daniell
Tristan Lorach
John McDonald
Dominik Witczak
Special Thanks
![Page 139: GDC : Mar16.pdf](https://reader036.vdocuments.mx/reader036/viewer/2022081800/586a4cde1a28ab767d8be819/html5/thumbnails/139.jpg)
Thank You!
Hai Nguyen
https://libcinder.org