var/fence: using nv_vertex_array_range and nv_fence cass everitt

VAR/Fence: Using NV_vertex_array_range and NV_fence

Cass Everitt

http://www.opengl.org/

Overview

• What is NV_vertex_array_range?
  • Fast variation of vertex arrays
  • GPU pulls data asynchronously

• What is NV_fence?
  • Better synchronization

• How to use VAR/fence together

• Performance hints

What is NV_vertex_array_range (a.k.a. VAR)?

• Standard vertex arrays really only reduce function call overhead

• Other optimizations are difficult because:
  • The coherency model is too strict
    • The driver must copy array data before glEnd() returns
    • The same holds for glDrawElements() and glDrawArrays()
  • The memory range is unbounded
    • Any client memory can be used

What is NV_vertex_array_range (a.k.a. VAR)? (2)

• Compiled vertex arrays (EXT_compiled_vertex_array) improve this somewhat
  • They relax coherency requirements
    • Lock/Unlock semantics
    • More room to optimize
  • They usually require lots of redundant copying
    • The app could do better memory management
  • They introduce index bounds
    • But not explicit memory bounds
  • For multipass rendering, the driver:
    • Can re-use transformed vertices (software T&L)
    • Can put data in AGP/video memory (hardware T&L)


• VAR allows the GPU to pull vertex data directly (via DMA)
  • The coherency model is completely relaxed
    • No constraints on when data must be fetched
    • This introduces synchronization issues
  • VAR memory must be specially allocated
    • Requires AGP or video memory
    • Special wgl and glX entry points (wglAllocateMemoryNV(), glXAllocateMemoryNV())
  • The VAR memory range must be contiguous
    • This version of “lock” (glVertexArrayRangeNV()) takes a pointer and a size
    • Typically, a single large arena is allocated

• Greater memory-management burden on the application, but with a potentially big payoff


• VAR gives the application total control!
  • The app decides how best to use limited resources

• Avoids continually copying static arrays (e.g. skin texture coords)

• The power of display lists – with mutability!

What VAR (alone) does not do well…

• VAR adds the ability to have really fast vertex arrays, but
  • VAR memory is in limited supply, and
  • VAR does not provide an efficient way to re-use memory
    • Calling glFinish() or glFlushVertexArrayRangeNV() is too heavy-handed

What is NV_fence?

• NV_fence provides fine-grained synchronization
  • A “fence” is a probe that can be placed into the OpenGL command stream (glSetFenceNV())
  • Each fence has a condition that can be tested (glTestFenceNV())
    • GL_ALL_COMPLETED_NV is currently the only condition
  • glFinishFenceNV() allows an app to wait until a specific fence’s condition is satisfied
  • Very fine application-level control
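The fence semantics above can be modeled without a GL context. The sketch below is a hypothetical simulation, not driver behavior: commands are numbered as they are issued, a simulated GPU retires them in order, and the sim_* functions only mirror glSetFenceNV(fence, GL_ALL_COMPLETED_NV), glTestFenceNV() and glFinishFenceNV(). It shows why a fence is finer-grained than glFinish(): waiting on a fence requires only the work issued before it to complete.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simulated GPU: commands retire in issue order. */
typedef struct {
    unsigned issued;     /* commands submitted so far */
    unsigned completed;  /* commands the simulated GPU has finished */
} sim_gpu;

typedef struct {
    unsigned marker;     /* value of 'issued' when the fence was set */
} sim_fence;

/* Submit one command (e.g. a draw call). */
void sim_submit(sim_gpu *g) { g->issued++; }

/* Let the GPU retire up to n pending commands. */
void sim_gpu_run(sim_gpu *g, unsigned n)
{
    unsigned pending = g->issued - g->completed;
    g->completed += (n < pending) ? n : pending;
}

/* Mirrors glSetFenceNV(fence, GL_ALL_COMPLETED_NV): satisfied once every
 * command issued before the fence has completed. */
void sim_set_fence(const sim_gpu *g, sim_fence *f) { f->marker = g->issued; }

/* Mirrors glTestFenceNV(): a non-blocking poll. */
bool sim_test_fence(const sim_gpu *g, const sim_fence *f)
{
    return g->completed >= f->marker;
}

/* Mirrors glFinishFenceNV(): "blocks" by running the GPU until the fence
 * is satisfied, leaving later commands in flight. */
void sim_finish_fence(sim_gpu *g, const sim_fence *f)
{
    assert(f->marker <= g->issued);
    while (!sim_test_fence(g, f))
        sim_gpu_run(g, 1);
}

/* Mirrors glFinish(): waits for everything, including work after the fence. */
void sim_finish(sim_gpu *g) { g->completed = g->issued; }
```

In this model, calling sim_finish_fence() after two draws and a fence, followed by a third draw, retires only the first two; sim_finish() would also stall on the third.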

How to use VAR/fence together

• The combination of VAR and fences is very powerful
  • VAR gives the best possible T&L performance
  • With fences, apps can achieve very efficient pipelining
• In memory-limited situations, VAR memory must be reused
  • Fences can be placed in the command stream to determine when memory can be reclaimed
• Different strategies can be used for dynamic and static data
  • The app chooses the management mechanism

How to use VAR/fence together (2)

• The learning_VAR demo uses multiple buffers (within a single arena) to achieve high T&L throughput on completely dynamic geometry!
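The multi-buffer idea can be sketched as a ring of chunks inside one arena, each guarded by the fence set after its last draw. This is a hypothetical illustration under simplifying assumptions, not the learning_VAR source: the arena is a plain array, the GPU is simulated by two counters, NUM_CHUNKS and CHUNK_BYTES are arbitrary, and comments mark where glSetFenceNV() and glFinishFenceNV() would go. The CPU blocks only when it wraps around to a chunk the GPU may still be reading.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NUM_CHUNKS  4    /* hypothetical number of buffers in the arena */
#define CHUNK_BYTES 256  /* hypothetical size of each buffer */

/* Simulated GPU: commands are numbered; completed <= issued. */
typedef struct {
    unsigned issued;
    unsigned completed;
} sim_gpu;

typedef struct {
    unsigned char arena[NUM_CHUNKS * CHUNK_BYTES]; /* stand-in for the VAR arena */
    unsigned fence[NUM_CHUNKS];  /* command number each chunk's fence waits for */
    int next;                    /* index of the chunk to fill next */
    unsigned waits;              /* times the CPU had to block */
} var_ring;

void ring_init(var_ring *r) { memset(r, 0, sizeof *r); }

static bool fence_done(const sim_gpu *g, unsigned marker)
{
    return g->completed >= marker;  /* plays the role of glTestFenceNV() */
}

/* Returns the next chunk to write, blocking only if the GPU may still be
 * reading it from NUM_CHUNKS frames ago. */
unsigned char *ring_acquire(var_ring *r, sim_gpu *g)
{
    int i = r->next;
    if (!fence_done(g, r->fence[i])) {
        r->waits++;
        /* Real code: glFinishFenceNV(fence[i]). Simulated: the GPU catches up
         * exactly to this fence, leaving later work in flight. */
        g->completed = r->fence[i];
    }
    assert(g->completed <= g->issued);
    return r->arena + (size_t)i * CHUNK_BYTES;
}

/* After writing the chunk: issue the draw, fence it, and advance. */
void ring_submit(var_ring *r, sim_gpu *g)
{
    g->issued++;                    /* stand-in for glDrawElements() */
    r->fence[r->next] = g->issued;  /* glSetFenceNV(fence[i], GL_ALL_COMPLETED_NV) */
    r->next = (r->next + 1) % NUM_CHUNKS;
}
```

If the GPU keeps up to within a frame or two, ring_acquire() never waits; a stalled GPU only blocks the CPU after all NUM_CHUNKS buffers are in flight.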

Performance hints

• If T&L is not a bottleneck, VAR won’t help
  • VAR helps get geometry to the GPU
  • If your app is fill-bound, consider:
    • Adding more geometry – it probably won’t cost anything
    • Using multi-texture and NV_register_combiners to reduce your memory bandwidth requirements

• Dynamic geometry requires CPU tuning work
  • Float-to-int casts can be a huge bottleneck
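One common workaround for slow float-to-int casts, offered as an illustration rather than something the slides prescribe, is the “magic number” trick. On x87-era compilers a C cast typically had to switch the FPU into truncation mode around each conversion; adding 1.5 * 2^23 instead lets ordinary float addition do the rounding. The sketch assumes IEEE-754 single-precision arithmetic, the default round-to-nearest mode, and |f| < 2^22; unlike a cast, it rounds to nearest instead of truncating. fast_ftoi is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(int32_t), "expects a 32-bit float");

/* Rounds f to the nearest integer without a C cast.
 * Adding 1.5 * 2^23 moves the value into [2^23, 2^24), where adjacent floats
 * are exactly 1 apart, so the addition itself rounds to an integer and that
 * integer ends up in the low mantissa bits.
 * Assumes the addition is done in IEEE-754 single precision and |f| < 2^22. */
static int32_t fast_ftoi(float f)
{
    const float magic = 12582912.0f;     /* 1.5 * 2^23, bit pattern 0x4B400000 */
    float biased = f + magic;
    int32_t bits;
    memcpy(&bits, &biased, sizeof bits); /* bit-cast without aliasing violations */
    return bits - 0x4B400000;
}
```

For example, fast_ftoi(3.7f) yields 4 where (int)3.7f yields 3, so code that relies on truncation needs adjusting.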

Performance hints (2)

• Effective memory management and synchronization are key
  • Avoid redundant copies
    • Decide this on a per-array basis, not per object
  • Be clever in your use of memory
  • Use fences to keep both the CPU and GPU working
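The per-array advice can be illustrated with a skinned mesh whose positions change every frame while its texture coordinates never do. The sketch below is hypothetical (var_mesh and its helpers are invented for illustration, and a plain array stands in for VAR memory): the static array is copied into VAR memory once, only the dynamic array is rewritten per frame, and a byte counter makes the saving visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NVERTS 4  /* hypothetical vertex count */

/* Hypothetical per-array layout: each array has its own region in VAR memory. */
typedef struct {
    float *positions;     /* dynamic: 3 floats per vertex, rewritten each frame */
    float *texcoords;     /* static: 2 floats per vertex, written once */
    size_t bytes_copied;  /* total bytes written into VAR memory */
} var_mesh;

/* One-time copy of the static array. */
void mesh_upload_static(var_mesh *m, const float *tc)
{
    size_t n = NVERTS * 2 * sizeof(float);
    assert(m->texcoords != NULL && tc != NULL);
    memcpy(m->texcoords, tc, n);
    m->bytes_copied += n;
}

/* Per-frame update touches only the dynamic array. */
void mesh_update_dynamic(var_mesh *m, const float *pos)
{
    size_t n = NVERTS * 3 * sizeof(float);
    assert(m->positions != NULL && pos != NULL);
    memcpy(m->positions, pos, n);
    m->bytes_copied += n;
}
```

Treating the whole object as one unit would instead recopy the texture coordinates every frame, even though they never change.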

Questions, comments, feedback?

• Cass Everitt, [email protected]
• www.nvidia.com/developer
• We try to answer questions at OpenGL.org’s “advanced” OpenGL forum

http://www.nvidia.com/developer