VAR/fence: Using NV_vertex_array_range and NV_fence
Cass Everitt
Overview
• What is NV_vertex_array_range?
  • Fast variation of vertex arrays
  • GPU pulls data asynchronously
• What is NV_fence?
  • Better synchronization
• How to use VAR/fence together
• Performance hints
What is NV_vertex_array_range (a.k.a. VAR)?
• Standard vertex arrays really only reduce function call overhead
• Other optimizations difficult because
  • Coherency model too strict
    • Driver must copy array data before glEnd() returns
    • Same for glDrawElements() and glDrawArrays()
  • Memory range unbounded
    • Any client memory can be used
What is NV_vertex_array_range (a.k.a. VAR)? (2)
• Compiled vertex arrays improve this somewhat
  • Relax coherency requirements
    • Lock/Unlock semantics
    • More room to optimize
  • Usually require lots of redundant copying
    • App could do better memory management
  • Introduce index bounds
    • But not explicit memory bounds
  • For multipass rendering
    • Can re-use transformed vertices (software T&L)
    • Can put data in AGP/video memory (hardware T&L)
What is NV_vertex_array_range (a.k.a. VAR)? (3)
• VAR allows the GPU to pull vertex data (via DMA)
  • Coherency model completely relaxed
    • No constraints on when data must be fetched
    • Introduces synchronization issues
  • VAR memory must be specially allocated
    • Requires AGP or video memory
    • Special wgl and glx entry points
  • VAR memory range must be contiguous
    • This version of “lock” takes a pointer and a size
    • Typically a single, large arena is allocated
• Greater application burden of memory management, but with potentially big payoff
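The "single, large arena" idea can be sketched as a simple bump allocator. This is only an illustration, not code from the talk: the `VarArena` type and the `arena_*` helpers are hypothetical names, and `malloc()` stands in for the special allocation call so the sketch runs without a GL context. In a real program the block would presumably come from `wglAllocateMemoryNV()` or `glXAllocateMemoryNV()` and be registered with `glVertexArrayRangeNV()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical arena: one contiguous block that the application carves up. */
typedef struct {
    unsigned char *base;  /* start of the contiguous range */
    size_t         size;  /* total bytes in the range */
    size_t         used;  /* bytes handed out so far */
} VarArena;

/* ASSUMPTION: malloc() is a stand-in. Real VAR memory would come from
   wglAllocateMemoryNV()/glXAllocateMemoryNV(), and the whole range would be
   handed to the driver once with glVertexArrayRangeNV(size, base). */
static int arena_init(VarArena *a, size_t size)
{
    a->base = (unsigned char *)malloc(size);
    a->size = size;
    a->used = 0;
    return a->base != NULL;
}

/* Hand out `bytes` from the arena at an offset rounded up to `align`
   (a power of two). Returns NULL when the arena is exhausted. */
static void *arena_alloc(VarArena *a, size_t bytes, size_t align)
{
    size_t start;
    assert(align != 0 && (align & (align - 1)) == 0);
    start = (a->used + (align - 1)) & ~(align - 1);
    if (start > a->size || bytes > a->size - start)
        return NULL;
    a->used = start + bytes;
    return a->base + start;
}

/* Forget all sub-allocations. Only safe once the GPU is done reading them. */
static void arena_reset(VarArena *a) { a->used = 0; }

static void arena_free(VarArena *a)
{
    free(a->base);
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}
```

Static arrays would be allocated once and left in place; `arena_reset()` hints at the synchronization problem, since the application must know the GPU has finished pulling the data before the memory is overwritten.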
What is NV_vertex_array_range (a.k.a. VAR)? (4)
• VAR gives the application total control!
  • App decides how best to use limited resources
• Avoids continually copying static arrays (e.g. skin texture coords)
• The power of display lists – with mutability!
What VAR (alone) does not do well…
• VAR adds the ability to have really fast vertex arrays, but
  • VAR memory is in limited supply, and
  • VAR does not provide an efficient way to re-use memory
    • Calling glFinish() or glFlushVertexArrayRangeNV() is too heavy-handed
What is NV_fence?
• NV_fence provides fine-grained synchronization
  • A “fence” is a probe that can be placed into the OpenGL command stream
  • Each fence has a condition that can be tested
    • GL_ALL_COMPLETED_NV is currently the only condition
  • glFinishFenceNV() allows an app to wait until a specific fence’s condition is satisfied
• Very fine application-level control
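To make the fence semantics concrete, here is a toy model in plain C. It is not the driver and not code from the talk: `MockStream`, `MockFence`, and the `mock_*` functions are invented for illustration, and they only imitate what `glSetFenceNV()`, `glTestFenceNV()`, and `glFinishFenceNV()` are described as doing. The point it tries to show is that finishing a fence waits only for the commands issued before the fence, not for everything in the stream.

```c
#include <assert.h>

/* Toy command stream (hypothetical): counts commands instead of storing them. */
typedef struct {
    unsigned submitted;  /* commands the app has issued so far */
    unsigned completed;  /* commands the pretend GPU has finished */
} MockStream;

/* Toy fence: remembers how many commands preceded it in the stream. */
typedef struct {
    unsigned marker;
} MockFence;

/* Stand-in for issuing any GL command, e.g. a draw call. */
static void mock_submit(MockStream *s) { s->submitted++; }

/* The pretend GPU retires one pending command, if there is one. */
static void mock_gpu_step(MockStream *s)
{
    if (s->completed < s->submitted)
        s->completed++;
}

/* Imitates glSetFenceNV(fence, GL_ALL_COMPLETED_NV): drop a probe here. */
static void mock_set_fence(const MockStream *s, MockFence *f)
{
    f->marker = s->submitted;
}

/* Imitates glTestFenceNV(): non-blocking check of the fence condition. */
static int mock_test_fence(const MockStream *s, const MockFence *f)
{
    return s->completed >= f->marker;
}

/* Imitates glFinishFenceNV(): wait until this one fence is satisfied. */
static void mock_finish_fence(MockStream *s, const MockFence *f)
{
    while (!mock_test_fence(s, f))
        mock_gpu_step(s);
}
```

In this model, a fence set after two commands is satisfied as soon as those two complete, even if a third command issued later is still pending. A `glFinish()`-style wait would have drained all three.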
How to use VAR/fence together
• The combination of VAR and fences is very powerful
  • VAR gives the best possible T&L performance
  • With fences, apps can achieve very efficient pipelining
• In memory-limited situations, VAR memory must be reused
  • Fences can be placed in the command stream to determine when memory can be reclaimed
  • Different strategies can be used for dynamic/static data
  • App chooses management mechanism
How to use VAR/fence together (2)
• The learning_VAR demo uses multiple buffers (within a single arena) to achieve high T&L throughput on completely dynamic geometry!
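One way to picture the multiple-buffer scheme is a ring of equal-sized buffers inside the arena, each guarded by its own fence. The sketch below is an assumption about the general pattern, not the learning_VAR source: `Ring`, `Slot`, `MockGpu`, and the `ring_*` functions are hypothetical, and the GPU is reduced to a pair of counters so the code runs without a GL context. Comments mark where the real NV_fence calls would presumably go.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BUFFERS 4

/* Pretend GPU (hypothetical): draws submitted vs. draws completed. */
typedef struct { unsigned submitted, completed; } MockGpu;

typedef struct {
    size_t   offset;     /* byte offset of this buffer inside the arena */
    unsigned fence;      /* draw count that must complete before reuse */
    int      in_flight;  /* nonzero once a draw has referenced the buffer */
} Slot;

typedef struct {
    Slot     slots[NUM_BUFFERS];
    unsigned next;       /* slot the CPU will fill next */
    unsigned stalls;     /* how often the CPU had to wait for the GPU */
} Ring;

static void ring_init(Ring *r, size_t buffer_bytes)
{
    unsigned i;
    for (i = 0; i < NUM_BUFFERS; ++i) {
        r->slots[i].offset = i * buffer_bytes;
        r->slots[i].fence = 0;
        r->slots[i].in_flight = 0;
    }
    r->next = 0;
    r->stalls = 0;
}

/* Get the next buffer for the CPU to write. If the GPU may still be reading
   it, wait on that buffer's fence only (real code: glFinishFenceNV). */
static Slot *ring_acquire(Ring *r, MockGpu *gpu)
{
    Slot *s = &r->slots[r->next];
    if (s->in_flight && gpu->completed < s->fence) {
        r->stalls++;
        gpu->completed = s->fence;  /* stand-in for blocking on the fence */
    }
    s->in_flight = 0;
    r->next = (r->next + 1) % NUM_BUFFERS;
    return s;
}

/* Issue a draw that reads the buffer, then place a fence right behind it
   (real code: glDrawElements followed by glSetFenceNV). */
static void ring_submit(Slot *s, MockGpu *gpu)
{
    gpu->submitted++;
    s->fence = gpu->submitted;
    s->in_flight = 1;
}
```

With enough buffers the CPU fills one while the GPU pulls from the others, and `stalls` stays at zero. A stall only occurs when the CPU laps the GPU and wraps around to a buffer whose fence has not yet been reached.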
Performance hints
• If T&L is not a bottleneck, VAR won’t help
  • VAR helps get geometry to the GPU
  • If your app is fill-bound, consider
    • Adding more geometry – it probably won’t cost anything
    • Using multi-texture and NV_register_combiners to reduce your memory bandwidth requirements
• Dynamic geometry requires CPU tuning work
  • Float-to-int casts can be a huge bottleneck
Performance hints (2)
• Effective memory management and synchronization is key
  • Avoid redundant copies
    • Do this on a per-array basis, not per-object
  • Be clever in your use of memory
    • Use fences to keep both CPU and GPU working
Questions, comments, feedback?
• Cass Everitt, [email protected]
• www.nvidia.com/developer
• We try to answer questions at OpenGL.org’s “advanced” OpenGL forum