Keynote, Mantle for Developers, by Johan Andersson, Technical Director, DICE/Electronic Arts, at the AMD Developer Summit (APU13), Nov. 11-13, 2013.
Simplify advanced development
Improve performance
Enable developers to innovate
Challenge the status quo
Mantle?
Traditional Model: Black Box
‒ Middle-ground abstraction – compromise between performance & “usability”
‒ Hidden resource memory & state
‒ Resource CPU access tied to device context
‒ Driver analyzes & synchronizes implicitly
Explicit Model: Mantle
‒ Thin low-level abstraction to expose how hardware works
‒ App explicit memory management
‒ Resources are globally accessible
‒ App explicit resource state transitions
Control
New model
Tell when render target will be used as a texture ‒ And many more resource state transitions
Don’t destroy resources that GPU is using ‒ Keep track with fences or frames
Manual dynamic resource renaming ‒ No DISCARD for driver resource renaming
Resource memory tiling
Powerful validation layer will help!
App responsibility Control
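The "keep track with fences or frames" responsibility can be sketched as a deferred-release queue. This is a minimal Python model, not Mantle code: `DeferredReleaseQueue`, `retire` and `on_gpu_frame_completed` are hypothetical names, GPU progress is simulated by passing in the last frame index the GPU has finished, and resources are assumed to be retired in non-decreasing frame order.

```python
from collections import deque


class DeferredReleaseQueue:
    """Holds resources until the GPU has finished the frame that last used them."""

    def __init__(self):
        # Entries are (last_used_frame, resource), appended in frame order (assumption).
        self._pending = deque()
        self.destroyed = []  # stands in for the real destroy/free call

    def retire(self, resource, last_used_frame):
        """The app no longer needs the resource, but the GPU may still be reading it."""
        self._pending.append((last_used_frame, resource))

    def on_gpu_frame_completed(self, completed_frame):
        """Call when a fence signals that `completed_frame` has finished on the GPU."""
        while self._pending and self._pending[0][0] <= completed_frame:
            _, resource = self._pending.popleft()
            self.destroyed.append(resource)


queue = DeferredReleaseQueue()
queue.retire("shadow_map_old", last_used_frame=10)
queue.retire("staging_buffer", last_used_frame=11)

queue.on_gpu_frame_completed(9)   # GPU is still behind: nothing is freed
queue.on_gpu_frame_completed(10)  # frame 10 is done: the first resource is freed
```

The same bookkeeping could plausibly back manual resource renaming: instead of destroying a retired buffer, the app would return it to a pool once its frame has completed.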
App high-level decisions & optimizations ‒ Has full scene information ‒ Easier to optimize performance & memory
Flexible & efficient memory management ‒ Linear frame allocators ‒ Memory pools ‒ Pinned memory
Reduced development time ‒ For advanced game engines & apps ‒ Easier to get to target performance & robustness
Explicit control enables Control
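A linear frame allocator, one of the memory strategies listed above, can be sketched in a few lines. This is an illustrative Python model under stated assumptions: `LinearFrameAllocator` is a hypothetical name, offsets stand in for GPU addresses inside one pre-allocated block, and the 256-byte default alignment is an arbitrary example value rather than a Mantle requirement.

```python
class LinearFrameAllocator:
    """Bump allocator over one fixed memory block, reset once per frame."""

    def __init__(self, size_bytes):
        self.size_bytes = size_bytes
        self.offset = 0

    def allocate(self, size_bytes, alignment=256):
        """Return the byte offset of a new sub-allocation, or raise if the block is full."""
        # Round the current offset up to the next multiple of `alignment`.
        start = (self.offset + alignment - 1) // alignment * alignment
        end = start + size_bytes
        if end > self.size_bytes:
            raise MemoryError("frame allocator exhausted")
        self.offset = end
        return start

    def reset(self):
        """Release everything at once; only safe once the GPU is done with the frame."""
        self.offset = 0


frame_memory = LinearFrameAllocator(size_bytes=1024 * 1024)
constants_offset = frame_memory.allocate(64)    # offset 0
vertices_offset = frame_memory.allocate(1000)   # offset 256
indices_offset = frame_memory.allocate(300)     # offset 1280
```

Allocation is a single add and compare, and freeing is one assignment per frame, which is why this pattern suits per-frame dynamic data. An engine would typically keep one allocator per in-flight frame so that a reset never touches memory the GPU is still reading.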
Light-weight driver ‒ Easier to develop & maintain ‒ Reduced CPU draw call overhead
Transient resources ‒ Alias render targets within frame ‒ Major memory savings ‒ No need to pre-allocate everything
Explicit control enables Control
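Aliasing render targets within a frame can be illustrated with a small lifetime-based allocator. The sketch below is an assumption-laden Python model, not the Frostbite implementation: the resource names, sizes and pass indices are made up, and a greedy first-fit strategy is used only because it is short.

```python
def alias_transient_resources(resources):
    """Assign each resource to a memory slot; slots are reused once their user is done.

    `resources` is a list of (name, size_bytes, first_pass, last_pass).
    Returns (assignment, slot_sizes), where assignment maps name -> slot index.
    """
    slots = []  # each slot: {"size": bytes, "busy_until": last pass that uses it}
    assignment = {}
    for name, size, first_pass, last_pass in sorted(resources, key=lambda r: r[2]):
        for index, slot in enumerate(slots):
            # Reuse a slot if its previous user has finished and it is large enough.
            if slot["busy_until"] < first_pass and slot["size"] >= size:
                slot["busy_until"] = last_pass
                assignment[name] = index
                break
        else:
            # No free slot fits: back this resource with new memory.
            slots.append({"size": size, "busy_until": last_pass})
            assignment[name] = len(slots) - 1
    return assignment, [slot["size"] for slot in slots]


MB = 1024 * 1024
frame_resources = [
    ("gbuffer_albedo", 8 * MB, 0, 2),
    ("gbuffer_normals", 8 * MB, 0, 2),
    ("light_accumulation", 8 * MB, 2, 4),
    ("bloom_half_res", 2 * MB, 3, 5),
    ("tonemap_output", 8 * MB, 5, 6),
]
assignment, slot_sizes = alias_transient_resources(frame_resources)
naive_bytes = sum(size for _, size, _, _ in frame_resources)   # 34 MB without aliasing
aliased_bytes = sum(slot_sizes)                                # 24 MB with aliasing
```

In this example five render targets fit into three slots. On real hardware the app would also have to issue the appropriate resource state transitions whenever a slot changes owner.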
Table with resource references to bind to graphics or compute pipeline
Replaces traditional resource stage binding ‒ Major performance & flexibility advantage ‒ Closer to how the hardware works
App managed - lots of strategies possible! ‒ Tiny vs huge sets ‒ Single vs multiple ‒ Static vs semi-static vs dynamic
Example 1: Single simple dynamic descriptor set ‒ Bind everything you need for a single draw call ‒ Close to DX/GL model but share between stages
Descriptor sets CPU perf
[Diagram: a single dynamic descriptor set with the slots VertexBuffer (VS), Texture0 (VS+PS), Constants (VS), Texture1 (PS), Texture2 (PS), Sampler0 (VS+PS). Legend of slot types: Link, Sampler, Image, Memory.]
Example 2: Reuse static set with nesting ‒ Reduce update time & memory usage
Descriptor sets CPU perf
[Diagram: a dynamic descriptor set that links to a static descriptor set. Slots shown: Constants (VS), Link, Texture3 (PS), Texture4 (PS), Sampler0 (VS+PS), Texture2 (PS), Texture1 (PS), Sampler1 (PS), VertexBuffer (VS), Texture0 (VS+PS). Legend of slot types: Link, Sampler, Image, Memory.]
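The nesting idea from Example 2 can be modelled as a table whose slots hold either a resource or a link to another table. This Python sketch is illustrative only: `DescriptorSet`, `set_resource`, `set_link` and `resolve` are hypothetical names rather than Mantle entry points, and the split of slots between the static and dynamic set is an assumption.

```python
class DescriptorSet:
    """A table of slots; each slot holds a resource name or a link to another set."""

    def __init__(self, name, slot_count):
        self.name = name
        self.slots = [None] * slot_count
        self.writes = 0  # counts slot updates, as a stand-in for CPU cost

    def set_resource(self, slot, resource):
        self.slots[slot] = ("resource", resource)
        self.writes += 1

    def set_link(self, slot, other_set):
        self.slots[slot] = ("link", other_set)
        self.writes += 1

    def resolve(self):
        """Flatten the set, following links, into the list of visible resources."""
        resources = []
        for entry in self.slots:
            if entry is None:
                continue
            kind, value = entry
            resources.extend(value.resolve() if kind == "link" else [value])
        return resources


# Built once at load time and shared by every draw that uses this material.
material_set = DescriptorSet("static_material", 3)
material_set.set_resource(0, "Texture1")
material_set.set_resource(1, "Texture2")
material_set.set_resource(2, "Sampler0")

# Rebuilt per draw: only the per-draw data plus one link is written.
draw_set = DescriptorSet("dynamic_per_draw", 3)
draw_set.set_resource(0, "VertexBuffer")
draw_set.set_resource(1, "Constants")
draw_set.set_link(2, material_set)
```

The per-draw set costs three slot writes yet exposes five resources, which is the update-time and memory saving the slide refers to.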
CPU perf
Shader stages & select graphics state combined into single object ‒ No runtime compilation or patching needed! ‒ Significantly less runtime overhead to use
Supports parallel building & caching ‒ Fast loading times
Usage & management up to the app ‒ Static vs dynamic creation ‒ Number of pipelines ‒ State usage
Monolithic pipelines
[Diagram: one pipeline state object covering the stages IA, VS, HS, Tessellator, DS, GS, RS, PS, DB, CB]
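One way an app might manage monolithic pipelines is a cache keyed by the shader stages plus the state baked into the pipeline, built in parallel at load time. The following Python sketch rests on assumptions: `compile_pipeline` is a placeholder for the expensive driver-side work, and the shader and state names are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def pipeline_key(shaders, state):
    """Hashable key: shader stages plus the state baked into the pipeline."""
    return (tuple(sorted(shaders.items())), tuple(sorted(state.items())))


def compile_pipeline(shaders, state):
    """Placeholder for the expensive compile and link step (hypothetical)."""
    return {"shaders": dict(shaders), "state": dict(state)}


class PipelineCache:
    def __init__(self):
        self._pipelines = {}

    def prebuild(self, descriptions, workers=4):
        """Compile many pipelines in parallel, for example during level load."""
        with ThreadPoolExecutor(max_workers=workers) as pool:
            built = pool.map(lambda d: compile_pipeline(*d), descriptions)
            for (shaders, state), pipeline in zip(descriptions, built):
                self._pipelines[pipeline_key(shaders, state)] = pipeline

    def get(self, shaders, state):
        """Draw-time lookup: a dictionary hit, with no compilation or patching."""
        return self._pipelines[pipeline_key(shaders, state)]

    def __len__(self):
        return len(self._pipelines)


opaque = ({"vs": "mesh.vs", "ps": "opaque.ps"}, {"blend": "off", "depth_write": True})
glass = ({"vs": "mesh.vs", "ps": "glass.ps"}, {"blend": "alpha", "depth_write": False})

cache = PipelineCache()
cache.prebuild([opaque, glass])
pipeline = cache.get(*glass)
```

Because the key includes the state, two materials that share shaders but differ in blend mode get separate pipeline objects; that is the trade-off behind the "number of pipelines" decision left to the app.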
Issue pipelined graphics & compute commands into a command buffer ‒ Bind graphics state, descriptor sets, pipeline ‒ Draw calls ‒ Render targets ‒ Clears ‒ Memory transfers ‒ NOT: resource mapping
Fully independent objects ‒ Create multiple every frame ‒ Or pre-build up front and reuse
Command buffers CPU perf
Automatically extracts parallelism out of most apps
Doesn’t scale beyond 2-3 cores
Additional latency
Driver thread is often the bottleneck – can collide with app threads
[Diagram: timelines for CPU 0, CPU 1 and CPU 2 showing Game, Render and Driver work]
DX/GL parallelism CPU perf
App can go fully wide with its rendering – minimal latency
Close to linear scaling with CPU cores
No driver threads – no overhead – no contention
Frostbite’s approach on all consoles – and on PC with Mantle!
[Diagram: timelines for CPU 0 to CPU 4 showing Game work and Render work spread across all cores]
Parallel dispatch with Mantle CPU perf
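The structure of parallel dispatch, where each worker records its own independent command buffer and the buffers are submitted in a fixed order, can be sketched as follows. This is a Python model under assumptions: the command tuples and function names are hypothetical, and CPython threads will not show the near-linear scaling described above; only the shape of the work split is the point.

```python
from concurrent.futures import ThreadPoolExecutor


def record_command_buffer(draw_calls):
    """Record one independent command buffer; touches no shared state."""
    commands = []
    for mesh in draw_calls:
        commands.append(("bind_pipeline", mesh["pipeline"]))
        commands.append(("draw", mesh["name"]))
    return commands


def record_frame(draw_calls, worker_count):
    """Split the draw list into contiguous chunks and record them in parallel."""
    chunk_size = max(1, -(-len(draw_calls) // worker_count))  # ceiling division
    chunks = [draw_calls[i:i + chunk_size] for i in range(0, len(draw_calls), chunk_size)]
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        # map() returns results in chunk order, so submission order is deterministic.
        return list(pool.map(record_command_buffer, chunks))


draws = [{"name": f"mesh{i}", "pipeline": "opaque"} for i in range(10)]
command_buffers = record_frame(draws, worker_count=4)

# Submitting the buffers back to back preserves the original draw order.
submitted = [cmd for buffer in command_buffers for cmd in buffer]
```

No lock is needed because each command buffer is a fully independent object, which matches the "no driver threads, no contention" point above.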
GPU perf
Thanks to improved CPU performance – CPU will rarely be a bottleneck for the GPU ‒ CPU could help GPU more:
‒ Less brute force rendering ‒ Improve culling
Shader pipeline object – driver optimizations ‒ Can optimize with pipeline state knowledge ‒ Can optimize across all shader stages
Resource states ‒ Gives driver a lot more knowledge & flexibility ‒ Apps can avoid expensive/redundant transitions, such as surface decompression
Expose existing GPU functionality ‒ Quad & Rect-lists ‒ HW-specific MSAA & depth data access ‒ Programmable sample patterns ‒ And more..
GPU optimizations
Modern GPUs are heterogeneous machines with multiple engines ‒ Graphics pipeline ‒ Compute pipeline(s) ‒ DMA transfer ‒ Video encode/decode ‒ More…
Mantle exposes queues for the engines + synchronization primitives
Queues GPU perf
[Diagram: the GPU exposed through Graphics, Compute, DMA and further queues]
Async DMA transfers ‒ Copy resources in parallel with graphics or compute
Queue use cases GPU perf
[Diagram: Graphics and DMA queue timelines with the blocks Render, Other render, Use copy and Copy]
Async compute together with graphics ‒ ALU heavy compute work at the same time as memory/ROP bound work to utilize idle units
Queue use cases GPU perf
[Diagram: Graphics and Compute queue timelines with the blocks GBuffer, Shadowmap 0, Shadowmap 1, Final lighting and Non-shadowed lighting]
Multiple compute kernels collaborating ‒ Can be faster than über-kernel ‒ Example: Compute geometry backend & compute rasterizer
Queue use cases GPU perf
[Diagram: Compute 0, Compute 1 and Graphics queue timelines with the blocks Compute Geometry, Compute Rasterizer and Ordinary Rendering]
Compute as frontend for graphics pipeline ‒ Compute runs asynchronously ahead and prepares & optimizes geometry for graphics pipeline
Queue use cases GPU perf
Game engines will build large GPU job graphs ‒ Move away from single sequential submission ‒ Just as we already have done on CPU
[Diagram: Compute and Graphics queue timelines with the blocks Draw0, Draw1, Draw2, Process0 and Process1]
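A GPU job graph spread over several queues can be sketched with a topological sort. The Python example below is a simplification under assumptions: jobs are grouped into global "waves" for readability, whereas a real engine would more likely submit per queue and synchronize with semaphores or events; the job names `light_culling` and `texture_upload` are invented, and `schedule_job_graph` is a hypothetical helper.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def schedule_job_graph(jobs):
    """Group jobs into waves; jobs within one wave do not depend on each other.

    `jobs` maps job name -> (queue name, list of job names it waits for).
    Returns a list of waves, each a dict of queue name -> sorted job names.
    """
    sorter = TopologicalSorter({name: deps for name, (_, deps) in jobs.items()})
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorter.get_ready()
        wave = {}
        for name in sorted(ready):
            wave.setdefault(jobs[name][0], []).append(name)
        waves.append(wave)
        sorter.done(*ready)
    return waves


frame_jobs = {
    "gbuffer": ("graphics", []),
    "shadowmap0": ("graphics", []),
    "texture_upload": ("dma", []),
    "light_culling": ("compute", ["gbuffer"]),
    "final_lighting": ("graphics", ["gbuffer", "shadowmap0", "light_culling"]),
}
waves = schedule_job_graph(frame_jobs)
```

In the first wave the DMA upload overlaps with graphics work, and the compute job starts as soon as its only dependency has finished, which mirrors the queue use cases listed earlier.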
Programmability
Explicit control of GPU queues and synchronization, finally! ‒ Implement your own Alternate-Frame-Rendering ‒ Or something more exotic..
Use case: Workstation rendering with 4-8 GPUs ‒ Super high-quality rendering & simulation ‒ Load balance graphics & compute job graphs across GPUs ‒ 20-40 TFlops in a single machine!
Use case: Low-latency rendering ‒ Important for VR and competitive games ‒ Latency optimized GPU job graph scheduling ‒ VR: Simultaneously drive 2 GPUs (1 per eye)
Explicit Multi-GPU
Programmability
Command buffer predication & flow control ‒ GPU affecting/skipping submitted commands ‒ Go beyond DrawIndirect / DispatchIndirect ‒ Advanced variable workloads ‒ Advanced culling optimizations
Write occlusion query results into GPU buffer ‒ No CPU roundtrip needed ‒ Can drive predicated rendering ‒ Or use results directly in shaders (lens flares)
New mechanisms
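Predication driven by occlusion query results can be modelled as a command buffer whose draws are skipped at execution time based on values in a GPU-visible buffer. This Python sketch is a simulation under assumptions: the command dictionaries, the `predicate` field and `execute_command_buffer` are hypothetical, and a plain list stands in for the GPU buffer that the queries write into.

```python
def execute_command_buffer(commands, gpu_buffer):
    """Simulate GPU execution: a predicated command is skipped if its slot is zero."""
    executed = []
    for command in commands:
        predicate_slot = command.get("predicate")
        if predicate_slot is not None and gpu_buffer[predicate_slot] == 0:
            continue  # skipped on the GPU, with no CPU roundtrip
        executed.append(command["name"])
    return executed


# Recorded once; the predicate slots refer to occlusion query results.
commands = [
    {"name": "draw_terrain"},
    {"name": "draw_building", "predicate": 0},
    {"name": "draw_lens_flare", "predicate": 1},
]

# Visible sample counts written by earlier occlusion queries (simulated values).
query_results = [0, 128]
first_run = execute_command_buffer(commands, query_results)

# The same command buffer behaves differently once the query results change.
query_results[0] = 57
second_run = execute_command_buffer(commands, query_results)
```

The decision is taken when the commands execute rather than when they are recorded, so the CPU never has to read the query results back.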
Programmability
Mantle supports bindless resources ‒ Shaders can select resources to use instead of static binding from CPU ‒ Extension of the descriptor set support
Key component that will open up a lot of opportunities!
Examples ‒ Performance optimizations – less data to update ‒ Logic & data structures that live fully on the GPU
‒ Scene culling & rendering ‒ Material representations
‒ Deferred shading ‒ Raytracing
Bindless resources
Mantle gives us strong benefits on Windows today ‒ Console-like performance & programmability on both Windows 7 and Windows 8 ‒ For us, well worth the dev time!
DX & GL are the industry standards ‒ Needed for platforms that do not support Mantle ‒ Needed by devs who do not want/need more control ‒ Need fallback paths for DX/GL, but should not be limited by them
Mantle and PlayStation 4 will drive our future Frostbite designs & optimizations ‒ PS4 graphics API has great programmability & performance as well ‒ Share concepts, methods & optimization strategies
Today Platforms
Want to see Mantle on Linux and Mac! ‒ Would enable support for our full engine & rendering ‒ Significantly easier to do efficient renderer with Mantle than with OpenGL
Use cases: ‒ Workstations ‒ R&D
‒ Not limited by WDDM ‒ Games
‒ Mantle + SteamOS = powerful combination!
Linux & Mac Platforms
Mobile architectures are getting closer in capabilities to desktop GPUs
Want graphics API that allows apps to fully utilize the hardware ‒ Power efficient ‒ High performance ‒ Programmable
Major opportunity with Mantle – leapfrog GL4, DX11 ‒ For mobile SoC vendors ‒ For Google and Apple
Mobile Platforms
Mantle is designed to be a thin hardware abstraction ‒ Not tied to AMD’s GCN architecture ‒ Forward compatible ‒ Extensions for architecture- and platform-specific functionality
Mantle would be a much more efficient graphics API for other vendors as well ‒ Most Mantle functionality can be supported on today’s modern GPUs
Want to see future version of Mantle supported on all platforms and on all modern GPUs! ‒ Become an active industry standard with IHVs and ISVs collaborating ‒ Enable us developers to innovate with great performance & programmability everywhere
Multi-vendor? Platforms
Mantle support is in development ‒ Core renderer (closer to PS4 than DX11) ‒ Implement all rendering techniques used in BF4 (many!) ‒ CPU optimizations (parallel dispatch, descriptor sets) ‒ GPU optimizations (minimize transitions, MSAA) ‒ R&D for advanced GPU optimizations ‒ Memory management ‒ Multi-GPU support ‒ ~2 months of work
Update targeting late December
Battlefield 4 Frostbite
Very different rendering compared to BF4
Frostbite Mantle renderer will work out of the box
Focus on APU performance
Plants vs Zombies: Garden Warfare Frostbite
All Frostbite games designed with Mantle ‒ 15 games in development across all of EA
Advanced Mantle rendering & use cases ‒ Lots of exciting R&D opportunities!
Want multi-vendor & multi-platform support!
Future Frostbite