Inside Xbox One
Martin Fuller, Xbox Advanced Technology Group
AMD AND MICROSOFT GAME DEVELOPER DAY - June 2 2014, STOCKHOLM
• This is a non-NDA event
• That means there is a limit to how much I can say, so go easy!
NDA
AMD Jaguar (x64): 8 cores arranged in 2 clusters of 4 cores each
1.75 GHz
Dual issue
Out of order execution
Speculative execution
Store-to-load forwarding
SSE4.2 and AVX
(Dot product!)
16 x 256-bit wide floating point registers
Hardware pre-fetch
CPU
8 GiB of DDR3 at 68 GiB/s
Low latency
Not enough bandwidth to touch all of memory every frame; think of RAM as a super-fast cache
48-bit virtual address space
256 terabytes (2^48 bytes)
Tricky to fragment!
Synced between CPU and GPU
4 MiB of L2 cache
2 MiB per cluster
MOESI protocol for cache coherency
16-way set associative
Per core, up to eight cache requests in flight at once
Memory
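The "not enough bandwidth to touch all of memory" point can be checked with a little arithmetic. The sketch below uses the 68 GiB/s and 8 GiB figures from the slide; the function names are made up for illustration, and the frame rates passed in are assumptions, not figures from the talk.

```cpp
#include <cassert>
#include <cstdint>

// Figures from the slide.
constexpr double kDramBandwidthGiBps = 68.0;  // peak DDR3 bandwidth
constexpr double kDramSizeGiB = 8.0;          // total DRAM

// Upper bound on DRAM traffic (reads plus writes) available in one frame.
inline double dramBudgetPerFrameGiB(double framesPerSecond) {
    assert(framesPerSecond > 0.0);
    return kDramBandwidthGiBps / framesPerSecond;
}

// Fraction of total DRAM that could be touched once per frame at peak rate.
inline double fractionOfDramTouchedPerFrame(double framesPerSecond) {
    return dramBudgetPerFrameGiB(framesPerSecond) / kDramSizeGiB;
}

// A 48-bit virtual address space spans 2^48 bytes, which is 256 TiB.
constexpr std::uint64_t kVirtualAddressSpaceBytes = std::uint64_t{1} << 48;
constexpr std::uint64_t kTiB = std::uint64_t{1} << 40;
static_assert(kVirtualAddressSpaceBytes / kTiB == 256, "48 bits address 256 TiB");
```

At an assumed 60 frames per second, dramBudgetPerFrameGiB(60.0) is roughly 1.13 GiB, so at best about 14% of the 8 GiB can be read or written in a single frame. At 30 frames per second the budget is roughly 2.27 GiB.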
1. Store-to-load forwarding saves the dreaded LHS stall
But not spilling out registers is even better
2. The branch predictor is not a crystal ball
Branchless tricks learnt in the Xbox 360 era can still apply
3. Hardware data pre-fetch is awesome
Only works with arrays
4. Avoid aliasing loads/stores on 2 KiB alignments
This causes a false positive that delays load execution
5. Go wide with SSE and leverage all cores
No-brainer
CPU – Recommendations
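As an illustration of the kind of branchless trick point 2 refers to, here is a minimal sketch of a mask-based select and a clamp built from it. The function names are hypothetical, and this is not code from the talk. Whether it beats a well-predicted branch, or what the compiler would have generated anyway, needs measuring on the target.

```cpp
#include <cassert>
#include <cstdint>

// Returns a when condition is true and b otherwise, with no data-dependent branch.
inline std::uint32_t selectBranchless(bool condition, std::uint32_t a, std::uint32_t b) {
    // mask is all ones when condition is true, all zeros otherwise.
    const std::uint32_t mask = 0u - static_cast<std::uint32_t>(condition);
    return (a & mask) | (b & ~mask);
}

// Clamps value into [lo, hi] using two selects instead of two branches.
inline std::uint32_t clampBranchless(std::uint32_t value, std::uint32_t lo, std::uint32_t hi) {
    assert(lo <= hi);
    const std::uint32_t raised = selectBranchless(value < lo, lo, value);
    return selectBranchless(raised > hi, hi, raised);
}
```

For example, clampBranchless(5, 10, 20) yields 10 and clampBranchless(25, 10, 20) yields 20, while values already inside the range pass through unchanged.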
AMD GCN, 768 SPUs
• 853 MHz
• 32 MiB of ESRAM at 109 GiB/s
• 4 Move Engines
• 3 hardware display planes
Resolution independent
Frame rate independent
• Exact sRGB this time!
(oh, and it's free)
• Hardware video encode and decode
• HDMI 1.4a in and out
GPU
More than just DMA copy
Memory set
Texture swizzle
JPEG decompress
LZ compress and decompress
Move Engines
32 MiB of general-purpose RAM
Not like EDRAM on Xbox 360
109 GiB/s
Sometimes faster in practice!
Zero contention
Not shared with the CPU, SRAs or video out
ESRAM makes everything better
Render targets
Textures
Geometry
Compute tasks
ESRAM
ESRAM can handle concurrent reads and writes:
Increasing effective bandwidth above 109 GiB/s
Operations that can take advantage of this:
1. Read-modify-write operations
   1. Depth buffer / HTILE update
   2. Alpha blending
2. Oh, and concurrently DMA'ing resources in/out of ESRAM while also rendering
How much effective bandwidth can titles achieve?
1. The current record holder achieved 141 GiB/s from ESRAM (this is a post-processing pass in a real title)
2. Of course all titles combine ESRAM's >= 109 GiB/s with DRAM's 68 GiB/s
ESRAM – Sometimes faster in practice?
1. Statically allocate a small number of render targets in ESRAM
2. Alias the same memory for re-use later
3. Partial residency
Put the top strip of render targets (sky) in DRAM, the rest in ESRAM
4. Asynchronously DMA resources in/out of ESRAM
Launch titles were at stages 1-2
The 2nd wave of titles is now starting to tackle stages 3 and/or 4
The 3rd+ wave will get really good at this!
ESRAM – The Four Stages of Adoption
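To see why only "a small number" of render targets fit, and what partial residency buys, here is a rough sizing sketch. The 32 MiB figure is from the slides. The 1920x1080 resolution, 4 bytes per pixel, the 270-row strip and all function names are assumptions for illustration. Real surfaces also carry alignment and tiling padding, which this ignores.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kMiB = 1024ull * 1024ull;
constexpr std::uint64_t kEsramBytes = 32 * kMiB;  // ESRAM size from the slides

// Unpadded size of a render target; real allocations are larger.
inline std::uint64_t renderTargetBytes(std::uint32_t width, std::uint32_t height,
                                       std::uint32_t bytesPerPixel) {
    assert(width > 0 && height > 0 && bytesPerPixel > 0);
    return static_cast<std::uint64_t>(width) * height * bytesPerPixel;
}

// How many targets of the given size fit in ESRAM at the same time.
inline std::uint64_t targetsThatFit(std::uint64_t targetBytes) {
    assert(targetBytes > 0);
    return kEsramBytes / targetBytes;
}

// Partial residency: ESRAM bytes needed when the top dramRows rows live in DRAM.
inline std::uint64_t esramBytesWithTopStripInDram(std::uint32_t width, std::uint32_t height,
                                                  std::uint32_t bytesPerPixel,
                                                  std::uint32_t dramRows) {
    assert(dramRows < height);
    return renderTargetBytes(width, height - dramRows, bytesPerPixel);
}
```

Under these assumptions a full 1080p target is 8,294,400 bytes, so four fit in ESRAM. Moving the top 270 rows to DRAM shrinks the ESRAM part to 6,220,800 bytes, and five fit.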
It’s like the 8-bit days all over again!
(Sort of)
Plan the asynchronous moves
Move resources in/out asynchronously while also rendering
New memory map at each stage of the render pipeline
Don’t forget: swizzle textures on DMA
ESRAM – Memory Maps!
1. Are you bandwidth limited?
2. Have you maxed out the fixed function hardware?
3. Do you have spare compute resource?
Then use async compute!
Titles have barely scratched the surface yet:
Watch this space!
Maxing out the GPU
1. Use ESRAM
First for depth / stencil
Then colour targets
Then everything else
2. Sort by state / shader / use hardware instancing
(Batch batch batch!)
3. Always swizzle textures
4. Be wary of using too many general-purpose registers
Keep an eye on occupancy in PIX; we normally recommend >= 4
5. Avoid reading DRAM via the CPU-coherent bus
6. There is no hardware integer divide
The usual GPU recommendations
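On point 6, one common way to sidestep integer division is to keep divisors at powers of two, so that divide and modulo become a shift and a mask. The sketch below is written in C++ rather than shader code, only because the same integer identities carry over. The names are hypothetical, and the power-of-two width is an assumption, not advice from the talk.

```cpp
#include <cassert>
#include <cstdint>

inline bool isPowerOfTwo(std::uint32_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// log2 of a power of two; in a shader this would normally be a known constant.
inline std::uint32_t log2PowerOfTwo(std::uint32_t n) {
    assert(isPowerOfTwo(n));
    std::uint32_t shift = 0;
    while ((n >> shift) != 1) {
        ++shift;
    }
    return shift;
}

struct TileCoord {
    std::uint32_t x;
    std::uint32_t y;
};

// Converts a linear index into 2D coordinates without '/' or '%'.
inline TileCoord indexToCoord(std::uint32_t index, std::uint32_t widthPow2) {
    assert(isPowerOfTwo(widthPow2));
    return TileCoord{index & (widthPow2 - 1),               // index % width
                     index >> log2PowerOfTwo(widthPow2)};   // index / width
}
```

With a width of 32, index 70 maps to x = 6 and y = 2, matching 70 = 2 * 32 + 6.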
DX11 was designed for the desktop (a long time ago, 2008!)
Abstracts a variety of different GPU architectures
Manages VRAM residency for you
Oversubscribing VRAM is a serious performance pitfall
Handles hazards
Developers can handle these at a higher level => less cost
Xbox One will run vanilla DX11 PC code
Easy port
Extensions available for low level access
Graphics API
DX11.X
Some DX12 features available right now on Xbox:
Turn off hazard tracking
Simple fence API
Deferred contexts re-implemented
New resource descriptor model
Draw bundles
(Xbox specific, not the DX12 API)
Graphics API
The CPU cannot saturate DRAM bandwidth on its own; the GPU can!
Significant performance degradation from DRAM contention
Fancy CPU features don’t help if memory starved
10. Use ESRAM as much as possible
20. Leave DRAM for the CPU and DMA
30. goto 10;
DRAM - Contention
1. Hardware data cache pre-fetch units are awesome
Manual pre-fetch is near pointless once hardware pre-fetch is spinning
Wasting bandwidth if only operating on small arrays
2. Write-combined memory pages and SSE streaming store instructions bypass the cache
No load - halves the bandwidth consumed by the CPU
3. Pack your data!
Expanding / compressing data is cheap (CPU & GPU)
F16C (half <-> float) CPU instructions
Store-to-load forwarding avoids LHS stalls
4. Swizzle your textures
Move engines can swizzle on copy
DRAM – Love your bandwidth
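As a small illustration of "pack your data", the sketch below stores a float in the range [-1, 1] as a 16-bit signed normalised integer, halving its footprint. Note that this is a different packing from the F16C half-float conversion named on the slide; it is used here only because it is portable C++. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Packs a float in [-1, 1] into 16 bits (signed normalised).
inline std::int16_t packSnorm16(float value) {
    assert(value >= -1.0f && value <= 1.0f);
    return static_cast<std::int16_t>(std::lround(value * 32767.0f));
}

// Expands the packed value back to a float in [-1, 1].
inline float unpackSnorm16(std::int16_t packed) {
    return static_cast<float>(packed) / 32767.0f;
}
```

The round trip is lossy: the worst-case error is about half a step, or roughly 1.5e-5, which is usually acceptable for data such as normals but should be checked against the precision each data set actually needs.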
Custom audio hardware
Very fast
Lots of features
Kinda cool!
Nuff said
Audio
1. ERA: Exclusive Resource Allocation
Only one active at a time
Custom OS
(Games!)
2. SRA: Shared Resource Allocation
Win8 core
(Apps)
3. Hypervisor
SRA and ERA use different virtual address spaces
3x Operating Systems
ERA can be in one of several states
1. Full screen
Full resources (even with snapped app up)
2. Constrained (Windowed)
Slightly less CPU and GPU resource
No input
Same amount of memory
3. Suspended
Zero CPU and GPU resource
No input
Same amount of memory
Limited time to save after receiving a suspend message
PLM (Program Lifetime Management)
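The three ERA states above can be written down as a small lookup table. The sketch below is illustrative only: the type and function names are invented and are not the platform's PLM API. That a full-screen title receives input is inferred from the "No input" notes on the other two states.

```cpp
#include <cassert>

// Hypothetical names; not the actual Xbox One PLM API.
enum class EraState { FullScreen, Constrained, Suspended };
enum class ComputeShare { Full, Reduced, None };

struct EraResources {
    ComputeShare cpuAndGpu;  // CPU and GPU resource available to the title
    bool receivesInput;      // whether the title gets controller input
    bool keepsMemory;        // memory allocation is unchanged in every state
};

inline EraResources resourcesFor(EraState state) {
    switch (state) {
        case EraState::FullScreen:  return {ComputeShare::Full, true, true};
        case EraState::Constrained: return {ComputeShare::Reduced, false, true};
        case EraState::Suspended:   return {ComputeShare::None, false, true};
    }
    assert(false && "unknown ERA state");
    return {ComputeShare::None, false, true};
}

// The title has limited time to save once it is told it will be suspended.
inline bool shouldSaveOnTransition(EraState from, EraState to) {
    return from != EraState::Suspended && to == EraState::Suspended;
}
```

A title could consult such a table to decide, for example, to pause gameplay when constrained (no input arrives) and to write its quick save as soon as a transition to the suspended state is announced.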
Hardware:
Higher resolution colour and depth
Better ranges
New – infrared!
Microphone array
No tilt motor
Software:
Improved skeletal tracking
Improved biometrics
Kinect 2.0
6x Blu-ray = ~26 MiB/s
To install a 50 GiB Blu-ray at ~26 MiB/s = ~33 minutes
Too long to wait… bored now…
The game must start after an initial payload has been installed.
While running, the title can hint as to what to install next.
No direct access to Blu-ray.
Could be a digital download
It’s obvious but I’ll say it anyway – compress your assets!
Streaming install
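The ~33 minute figure follows directly from the numbers on the slide, and the same arithmetic shows why a small initial payload matters. A minimal sketch, with a hypothetical function name:

```cpp
#include <cassert>

// Minutes needed to install sizeGiB of data at rateMiBps (MiB per second).
inline double installMinutes(double sizeGiB, double rateMiBps) {
    assert(sizeGiB >= 0.0 && rateMiBps > 0.0);
    const double sizeMiB = sizeGiB * 1024.0;
    const double seconds = sizeMiB / rateMiBps;
    return seconds / 60.0;
}
```

installMinutes(50.0, 26.0) comes to about 32.8 minutes, matching the slide. For comparison, an initial payload of 4 GiB (an assumed size, not a figure from the talk) would take about 2.6 minutes at the same rate.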
Cloud compute:
• Developer’s code is hosted and executed in Windows Azure
• Game code execution automatically scales based upon usage
Live services:
• Stats, analytics, matchmaking & storage.
Secure!
The Cloud
1. Is your code 64-bit compliant?
2. Can you scale to 6 cores?
3. Adopt new DX11.X API extensions
Manage your own resource hazards
4. Make sure you use ESRAM effectively
5. Package content for streaming install
Game design considerations
6. Quick save on ERA termination
7. Kinect, SmartGlass
8. Cloud services
Challenges
(That I’m allowed to answer)
Thank You! – Questions?
© 2014 Microsoft
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing
market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation.