nitrous and mantle: combining efficient engine design with a modern api dan baker, partner, oxide...

38
NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

Upload: karson-culling

Post on 31-Mar-2015

223 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

NITROUS AND MANTLE:Combining efficient engine design with a modern API

Dan Baker, Partner, Oxide Games

Page 2: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

2| Nitrous and Mantle | 19 March 2014

PRE-REQUISITE MOTIVATIONAL SLIDE

MODERN APIS ARE STARTING TO FEEL RATHER DATED

BUT HOW MUCH BETTER CAN WE BE?

Page 3: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

3| Nitrous and Mantle | 19 March 2014

PRE-REQUISITE MOTIVATIONAL SLIDE

TURNS OUT… A WHOLE LOT FASTER

Page 4: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

4| Nitrous and Mantle | 19 March 2014

HONEY, DOES THIS DRESS MAKE ME LOOK FAT?

=

Page 5: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

5| Nitrous and Mantle | 19 March 2014

STATE OF THE ART TODAY: WHAT’S GOING ON?

Lots of little things add up2 major problems require rearchitecture

–Functional threading model throws a wrench into task based systems

–Implicit Hazard tracking and synchronizationAPI tries to hide the async nature of GPU

Lots of little things, memory model, binding model, etcAnalysis of features like instancing indicate that it is unreliable and tends to speed up only the fastest frames, correlation between batches and driver perf is casual

Can’t RETRO fit old APIS

Page 6: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

6| Nitrous and Mantle | 19 March 2014

DIVING INTO NITROUS

Nitrous = Oxide’s custom engineSpecifically designed for high throughputCore neutral. Main thread acts only as lightweight sequencerAll work divided up into small jobs, which are in the microsecond rangeCan produce lots of jobs, 10,000+ range per frame

Page 7: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

7| Nitrous and Mantle | 19 March 2014

STAR SWARM

Nitrous Engine demo

Free to download, experiment

Proof of concept for modern API design

Represents 2 AI opponents, thus application CPU load is realistic

10,000 units possible

100,000+ batches possible

Page 8: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

8| Nitrous and Mantle | 19 March 2014

SECRETS BEHIND STAR SWARM

Much of what is required for high performance isn’t specific to Mantle

Star Swarm originally not based on MantleIf engine is structured in certain ways, Mantle support is straight-forward and intuitive. Maybe even fun.

Work done to restructure engine will have benefits outside of Mantle support

Page 9: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

9| Nitrous and Mantle | 19 March 2014

ADDING NITROUS TO THE ENGINE

Rendering broken into jobs which generate autonomous command buffers

CPU to GPU data streamlined – constants, texture updates go into GPU frame memory

Shader bindings standardized

Shaders, state, bundled into blocks

Resources grouped into sets

Graphics commands streamlined, restricted bind points

Stateless command format

Expensive state transitioned rarely

Much attention paid to cache usage, lockless data structures

All hazards detangled, all buffers considered non persistent

Page 10: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

10| Nitrous and Mantle | 19 March 2014

MULTI-CORE CPU BASICS

Be Wary, There Is A Lot Of Very Bad Advice In The Wild

Spawning threads to handle tasks

Relying OS preemptive scheduler, heavy weight OS synchronization primitives

Functional threading in general

Your Survival Guide

OK: Multi-thread read of same location

OK: Multi-thread write to different locations

OK: Multi-thread write to same location in ‘stamp’ mode

CAUTION: Atomic instructions

STOP: Multi-thread read/write to same location

STOP: Multi-thread write to same CACHE line

Page 11: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

11| Nitrous and Mantle | 19 March 2014

NITROUS AND MANTLE

Nitrous is NOT built around MantleReverse is more true, Mantle adapts well to Nitrous internal conceptsThe concepts are what make engine fastResults are astounding, driver time reduced up to 50xMantle is the harbinger of future API design, Not just in Graphics

Page 12: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

12| Nitrous and Mantle | 19 March 2014

TASK BASED SYSTEM

Idea is that work load is a constructed graph of much smaller nuggets

Many advantages

– Scales well, 32+ cores

– Easy to balance workload

– More power efficient – more slower cores just as good

Already seeing CPUs dynamically slowing clock speed

– If enough similar work items queued, can execute same code on cores

Cache hit rate much higher

– End up generating a larger number of command buffers to prevent thread serialization

Page 13: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

13| Nitrous and Mantle | 19 March 2014

HOW NITROUS GENERATES COMMANDS

Core 0 Core 1 Core 2

Cmd Buffer 0

Cmd Buffer 1

Cmd Buffer 2

Cmd Buffer 3

Frame Data

Cmd Buffer 2

Cmd Buffer 0

Cmd Buffer 1

Cmd Buffer 3

Page 14: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

14| Nitrous and Mantle | 19 March 2014

NITROUS COMMAND FORMATS

In reality, diagram is over simplified

Nitrous has it’s own internal command format

– Small, efficient commands

– Stateless, each command contains references to all needed state

– Inheritance unneeded

– Separates internal graphics system from any particular API

Being Stateless, can be generated completely out of order

Entire Frame is queued up in internal command format

Frame is translated to GPU commands via Mantle

– Nitrous Command buffers are translated into Mantle Command Buffers at one section

Get’s more optimal use out of instruction cache and data cache

Page 15: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

15| Nitrous and Mantle | 19 March 2014

BUILDING AROUND ASYNCRONISITY: HOW NITROUS THINKS OF A FRAME

Entire app should be exposed to concept of asyncronisity

The concept of a frame:

– A set of commands which will be executed on the GPU

– A set of data which will be read by the GPU

– This concept is fundamental in Nitrous, regardless of API

Frame

CMD CMD CMD CMD

Frame Data

Persistent

Textures Big Transfer Buffers

Resource Sets

Page 16: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

16| Nitrous and Mantle | 19 March 2014

CREATING A FRAME, USING FRAME DATA

Create 2 copies of our frame data

One will be read by GPU, while other is being written to by the CPU

Must use fence to make sure CPU doesn’t get ahead

More complex situations could be explored

Frame data includes

– Constant Data

– Small texture updates

Even Frame

Odd Frame

GPU

CPU

Page 17: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

17| Nitrous and Mantle | 19 March 2014

STARTING OUR FRAME

g_fTotalFrameTime = System::Time::TimeAsSeconds( System::Time::PeekTime()) - g_StartTotalFrameTime ;g_uFrameBuffer = (g_uFrame % TransferMemory::BUFFERS);

if(g_bProtectMemory) ProtectMemory(g_CmdTransferMemory.PerFrameMemory[g_uFrameBuffer].pData, g_CmdTransferMemory.Capacity, PO_READWRITE);

gr = grWaitForFences(g_Device, 1, &g_FrameFences[g_uFrameBuffer], true, MAX_FRAME_WAIT_TIME_SECONDS);if(gr == GR_TIMEOUT) // Throw the frame out return false;

g_StartTotalFrameTime = System::Time::TimeAsSeconds( System::Time::PeekTime());

//Reset our allocation and map the current GPU memoryHeapAndChunk HeapChunk = g_GPUTransferMemory.PerFrameMemory[g_uFrameBuffer].Memory;MarkMemoryChunk(HeapChunk);GR_GPU_MEMORY DynamicMemory = g_MemoryHeap.MemoryChunk[HeapChunk.Heap][HeapChunk.SubChunk].Memory; gr = grMapMemory(DynamicMemory, 0, (GR_VOID**) &g_GPUTransferMemory.PerFrameMemory[g_uFrameBuffer].pData); g_CmdTransferMemory.PerFrameMemory[g_uFrameBuffer ].Offset = 0;g_GPUTransferMemory.PerFrameMemory[g_uFrameBuffer ].Offset = 0;

g_GPUTransferMemory.pCurrentMantleMemoryBase = g_GPUTransferMemory.PerFrameMemory[g_uFrameBuffer].pData;g_GPUTransferMemory.pCurrentMantleMemoryEnd = g_GPUTransferMemory.PerFrameMemory[g_uFrameBuffer].pData + g_GPUTransferMemory.Capacity;g_GPUTransferMemory.CurrentMantleMemory = DynamicMemory;g_GPUTransferMemory.TempHeapMemory = 0;

return true;

Page 18: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

18| Nitrous and Mantle | 19 March 2014

HINT

Use memory heap that has highest cpuWritePerfRatingIn Debug, rather then copying directly to GPU memory, allocate CPU memory

–Or use pinned Mantle memoryThen, use OS call Virtual Protect with PAGE_NOACCESS for any data that effects the frame, while the frame is being accessed by GPU, or could be being translated by the CPU

If any part of system inadvertently writes to the memory, will throw exception

Page 19: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

19| Nitrous and Mantle | 19 March 2014

SOME EXTRA STUFF WE WILL NEED

Because we track hazards, we will want a few more buffersA delete queue – objects are not deleted, but placed in the delete queue

–One queue per frame, once that frame is complete, items will be deleted

A state transition queue

–Used only when a resource is created, to transition it to the desired initial state

An Unordered Command Queue

–Gets flushed before main frames command queue

–Useful for preparing resources for first time use (e.g. initialization)

Page 20: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

20| Nitrous and Mantle | 19 March 2014

INTERNAL COMMAND FORMAT

Nitrous has it’s own internal command format

Persistent state:

– Resource Sets

– Shader Blocks

– Various pipeline state

Frame State, primary construct is a batch set

– Contains primitives, batches and shader sets

– Batches which reference

Primitives

Shader Sets

– Constant references are made into our frame memory

Each one of these has a different, natural change frequency

Page 21: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

21| Nitrous and Mantle | 19 March 2014

NITROUS MEMORY POOLS

Resources used together, created together

Multiple resource sets are often pooled

Simplifies memory management, less then 1000 total allocations

Orange Team Unit’s Memory

FIGHTER 1 CAR. REAR

CAR. FOR CARRIER MAIN

(0) Albiedo(1) Material Mask(2) Ambient Occlusion(3) Normal Map(4) Weathering Map

(0) Albiedo(1) Material Mask(2) Ambient Occlusion(3) Normal Map(4) Weathering Map

(0) Albiedo(1) Material Mask(2) Ambient Occlusion(3) Normal Map(4) Weathering Map

(0) Albiedo(1) Material Mask(2) Ambient Occlusion(3) Normal Map(4) Weathering Map

Page 22: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

22| Nitrous and Mantle | 19 March 2014

NITROUS MEMORY POOLS

GPU resource allocation a little tricky – we don’t know ahead of time how big something might be

2 step process, first calculate size of resource, then allocate pool based on that size

Does not map 1:1 to Mantle memory allocations

Instead, Pool is created with default page size

When a new resource is added, either it places inside current allocation, or if resource is bigger then the page size, creates a new allocation that fits the resource

A memory pool in Nitrous = a list of allocations in Mantle

If able to size ahead of time, only 1 allocation

Unit Textures

Diffuse

Specular

Mask

AO

Normal

Mantle Alloc

Mantle Alloc

Mantle Alloc

Page 23: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

23| Nitrous and Mantle | 19 March 2014

CREATING A RESOURCE

//Setup our resource decriptionsResourceDesc RedLUTDesc, BlueLUTDesc, GreenLUTDesc;RedLUTDesc.Init(OX_FORMAT_R32_FLOAT, 1, 1, RT_TEXTURE_1D, v3ui(1024,1,1), pfRedData, 0);BlueLUTDesc.Init(OX_FORMAT_R32_FLOAT, 1, 1, RT_TEXTURE_1D, v3ui(1024,1,1), pfBlueData, 0);GreenLUTDesc.Init(OX_FORMAT_R32_FLOAT, 1, 1, RT_TEXTURE_1D, v3ui(1024,1,1), pfGreenData, 0);

//Calculate Sizeuint64 uSize = 0;uSize += GetResourceMemorySize(RedLUTDesc, Graphics::RF_SHADERRESOURCE, Graphics::HT_GPU);uSize += GetResourceMemorySize(BlueLUTDesc, Graphics::RF_SHADERRESOURCE, Graphics::HT_GPU);uSize += GetResourceMemorySize(GreenLUTDesc, Graphics::RF_SHADERRESOURCE, Graphics::HT_GPU);

//Create a GPU heap, sized to what we wantg_GPUMemory = AllocateHeap(uSize, Graphics::HT_GPU);

//Create the ResourcesColorLuts.RedLUT.Resource = CreateResource(RedLUTDesc, RF_SHADERRESOURCE, RSTATE_DEFAULT, g_GPUMemory);ColorLuts.BlueLUT.Resource = CreateResource(BlueLUTDesc, RF_SHADERRESOURCE, RSTATE_DEFAULT, g_GPUMemory);ColorLuts.GreenLUT.Resource = CreateResource(GreenLUTDesc, RF_SHADERRESOURCE, RSTATE_DEFAULT, g_GPUMemory);

Page 24: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

24| Nitrous and Mantle | 19 March 2014

SOME EXTRA MANAGEMENT REQUIRED

Creating a Resource slightly more involved

When a resource creation call occurs, check to see if we are a GPU heap

If so, no way to directly map memory and upload resource so

– 1) Allocate, or recycle a CPU visible heap object

– 2) Create Resource and map into this heap

– 3) Create Resource on the GPU in the specified heap, (it will be uninitialized)

– 4) Issue a copy command in our Unordered Command Queue

– 5) Place temp resource in a deletion queue

For any resource, we allow a default state to be specified

– At beginning of frame, before we execute main comands, issue any state transition queues to place resources from default state into desired state

Page 25: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

25| Nitrous and Mantle | 19 March 2014

RESOURCE SETS

In real world, textures are grouped

Nitrous has 5 bind points

– 2 for batch

– 2 for shader

– 1 for primitive

VB is just a resource set

Nitrous does not allow binding of individual textures

Clearly, maps 1:1 to a descriptor

Space Fighter 1

(0) Albiedo

(1) Material Mask

(2) Ambient Occlusion

(3) Normal Map

(4) Weathering Map

Page 26: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

26| Nitrous and Mantle | 19 March 2014

VERTEX BUFFERS

Nitrous does not use Vertex Buffers

Instead, Resource Set acts as VB, but with more programmatic control

Vastly simplifies engine side management

– VBs can be saved as DDS files

– Do not require a huge amount of loading code for slightly different Vertex Formats

– Can fold Displacement maps and other geometry modifiers into Primitive Resource Set

Not seen strong evidence on any hardware that this causes a performance issue

Page 27: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

27| Nitrous and Mantle | 19 March 2014

CONSTANT BUFFERS

Nitrous does not have concept of constant buffers

Instead, all constant data is thrown out every frame

– When we render an object, CPU will generate the constants needed for that frame

– Grab a piece of the Frame Memory and write to it

Constant bindings are just references into our frame memory

But… be careful! CPU is writing straight to GPU memory. Do NOT read it back!

Evidence suggests no performance advantage of persisting constants across frames, regenerating every frame is ample fast. 100k+ batches not a problem

Page 28: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

28| Nitrous and Mantle | 19 March 2014

A BATCH IN NITROUS CONSISTS OF 4 PARTS

Batch Set

Prim 0 Prim 1 Prim 2

Shader 0 Shader 1

Batch 0

Batch 1

Batch 2

Batch 3

Batch 4

Primitive

IB

Resources

Tri info

Shader

Resources (2)

Constants (2)

Shader Block

Batch

Primitive

Shader

Resources (2)

Constants (2)Batch Set

Batches

Primitives

Shaders

RTs

Blend State

Page 29: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

29| Nitrous and Mantle | 19 March 2014

DESCRIPTOR TABLE LAYOUT FOR NITROUS

Descriptor 0

*Batch Resource Set 0

*Batch Resource Set 1

Batch Constants 1

Batch Constants 2

*Shader Resource Set 0

*Shader Resource Set 1

Shader Constants 0

Shader Constants 1

*UAV

*Samplers (only 1 global bank)

Descriptor 1

*Primitive VB

Dynamic Const

Batch Constants 0

Page 30: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

30| Nitrous and Mantle | 19 March 2014

DESCRIPTOR BINDING STRATEGY

Remember: Descriptors are just structures on GPU memory, so need to double buffer as well

Create 1 giant descriptor table, start update at beginning of frame

Recognize that we have a resource bind vector of only 9 items

Each bind vector can be built into a descriptor table, but don’t need unique one

Check to see if this bind vector has been built before(During this frame), e.g. resident in a small cache, if so, just reference it

If not, build a new descriptor table, and place in cache

Dynamic constants, batch constant 0, uses grCmdBindDynamicMemoryView

– Usually, this will change every call (e.g. some part of the batch is changing or else it’s the same batch)

Using grCmdBindDynamicMemoryView, for 100k batches, about 5-10k descriptors actually need to get built per frame

Page 31: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

31| Nitrous and Mantle | 19 March 2014

TRACKING RESOURCE USAGE

Apps responsibility to track what resources get used

Simple strategy: Stamp a frame number on each memory pool anytime it is bound

Traverse the complete resource list, anything which matches current frame must be resident

Quick as long as we keep # of heaps reasonable

Important: Frame # should be padded into a cache line to avoid serialization

Heap description Last Frame Used

UI Textures intro 2401

UI Textures in Game 17204

Orange Faction Units 17204

Purple Faction Units 17204

Weapon effects 16392

Post Process RTs 17204

Terrain Heightmap 17204

Page 32: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

32| Nitrous and Mantle | 19 March 2014

DEALING WITH STATE TRANSITIONS

Most important, difficult part of Mantle

Must understand anytime a resource is getting used in a different way,

Read After Write

Write After Write

Page 33: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

33| Nitrous and Mantle | 19 March 2014

SHADER BLOCKS

Shader Blocks

– Group of shaders with identical resources

– Key point : all shader stages grouped together

– All resources are bound to all stages

– For mantle, need add some extra data

Can we blend?

What back buffer formats might be used?

What z buffer formats might be used?

– Create a matrix of pipeline objects based on specified modes

The right pipeline objet is selected based on current RT state

RTs and blendstate already chunked, no extra state changes introduced

ShaderGroup SimpleShader{ ResourceSetPrimitive = VertexData; ConstantSetDynamic[0] = DynamicData; ResourceSetBatch[1] = UserTS; ConstantSetShader[0] = Globals;

RenderTargetFormats = R16G16B16A16_FLOAT, R11G11B10_FLOAT; BlendStates = BlendOff; DepthTargetFormats = D32_FLOAT;

Methods {main: CodeBlocks = SimpleShaders; VertexShader = SimpleVSShader; PixelShader = SimplePSShader;zprime: CodeBlocks = SimpleShaders; VertexShader = SimpleVSShader; PixelShader = BlankSimplePSShader;

}}

Page 34: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

34| Nitrous and Mantle | 19 March 2014

CREATING SHADER BLOCKS IN MANTLE

Translate HLSL Byte code to Mantle IC

– All done at compile time, have a Mantle speific executable

Creating a Mapping Table

– Batch has 5 bind points

– Shader has 4 bind points

– Batch Set has 1 bind point

– Primitive has 1 bind point

– Global Samplers have 1 bind point

Set up our IC so all pipeline objects use exactly the same top level desciptor

Page 35: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

35| Nitrous and Mantle | 19 March 2014

WHAT ABOUT THAT PRESENT?

Unlike other APIS, we do not need, or should, block on the present on the main thread

Instead we spawn a job, which we block against on the next present

Void PresentJob(){…result = grQueueSubmit(g_UniversalQueue, g_cCommandBuffers, g_CommandBuffers,

cMemRefs, MemRef, g_FrameFences[g_uSubmittingFrameBuffer]);

uint32 PresentFlags = 0; if(g_bVSync) PresentFlags = GR_WSI_WIN_PRESENT_FLIP_DONOTWAIT;

// instruct the GPU to present the backbuffer in the applications windowGR_WSI_WIN_PRESENT_INFO presentInfo ={ g_hWnd, g_MasterResourceList.Images[DR_BACKBUFFER], GR_WSI_WIN_PRESENT_MODE_BLT, 0, PresentFlags};

result = grWsiWinQueuePresent(g_UniversalQueue, &presentInfo);

SignalProcessAndPresentDone(pInfo);}}

Page 36: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

36| Nitrous and Mantle | 19 March 2014

WHAT OUR FRAME SUBMISSION LOOKS LIKE

1) Block on last frames present’s job (e.g. NOT the fence, the actual job we spawned)

2) Process and pending resource transitions from newly created resources

3) Generate all pending unordered commands, by generating into 1 or more cmd buffers

4) Send signals to the issuers of unordered commands, to notify them the commands are submiitted

5) Begin translation of Nitrous cmds into Mantle cmds – usually 100-500 jobs across all cores

6) Flush the deletion queues for this frame (likely a few frames old at this point)

7) Any item in our master deletion queue, add to the now empty deletion queue for this frame

8) Handle memory readbacks

9) Spawn Present job

Page 37: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

37| Nitrous and Mantle | 19 March 2014

FUTURE WORK

Now have explicit control over Multi GPU

Can write better MGPU solutions, like split screen which will not increase latency

– We just got rid of a bunch of latency, don’t want to add it back!

Asymetric GPU use situations are doable – e.g. using integrated graphics in tandem with Discrete GPU

Page 38: NITROUS AND MANTLE: Combining efficient engine design with a modern API Dan Baker, Partner, Oxide Games

38| Nitrous and Mantle | 19 March 2014

RESULTS

Star Swarm surprised both Oxide and AMD

– We were not expecting to see cases where application was 300-400% faster, still room for optimizations

– Right now, we are clearly GPU bound, will release an update soon that increases CPU utilization a little bit to optimize GPU, expecting 10-20% more performance out of Mantle on high end GPUs

Driver overhead very consistent, well correlated to number of calls made

About 2 man months of work

– For an Alpha API, likely 1 month if final version

Especially telling on slower CPUs, surprising number of cases with high end GPUS with old CPUs

Try for yourself: Star Swarm is free to download on Steam!