VAR/Fence: Using NV_vertex_array_range and NV_fence

Cass Everitt



Overview

• What is NV_vertex_array_range?
  • Fast variation of vertex arrays
  • GPU pulls data asynchronously
• What is NV_fence?
  • Better synchronization

• How to use VAR/fence together

• Performance hints


What is NV_vertex_array_range (a.k.a. VAR)?

• Standard vertex arrays really only reduce function call overhead
• Other optimizations difficult because
  • Coherency model too strict
    • Driver must copy array data before glEnd() returns
    • Same for glDrawElements() and glDrawArrays()
  • Memory range unbounded
    • Any client memory can be used


What is NV_vertex_array_range (a.k.a. VAR)? (2)

• Compiled vertex arrays improve this somewhat
  • Relaxes coherency requirements
    • Lock/Unlock semantics
    • More room to optimize
  • Usually requires lots of redundant copying
    • App could do better memory management
  • Introduces index bounds
    • But not explicit memory bounds
  • For multipass rendering
    • Can re-use transformed vertices (software T&L)
    • Can put data in AGP/video memory (hardware T&L)


What is NV_vertex_array_range (a.k.a. VAR)? (3)

• VAR allows the GPU to pull vertex data (via DMA)
  • Coherency model completely relaxed
    • No constraints on when data must be fetched
    • Introduces synchronization issues
  • VAR memory must be specially allocated
    • Requires AGP or video memory
    • Special wgl and glx entry points
  • VAR memory range must be contiguous
    • This version of “lock” takes a pointer and a size
    • Typically a single, large arena is allocated
  • Greater application burden of memory management, but with potentially big payoff
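The "single, large arena" idea can be sketched without any OpenGL calls. What follows is a minimal sketch in C, not code from the talk: `Arena`, `arena_init`, `arena_alloc`, `arena_reset` and `arena_destroy` are hypothetical names, and `malloc` is only a stand-in for the special wgl/glx allocation entry point so that the example runs anywhere. In a real application the arena's base pointer and size would presumably be the pair handed to the VAR “lock”.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One contiguous arena described by a base pointer and a size. */
typedef struct {
    unsigned char *base;  /* start of the arena */
    size_t         size;  /* total bytes in the arena */
    size_t         used;  /* bytes handed out so far */
} Arena;

/* ASSUMPTION: malloc stands in for the extension's special allocator.
   Returns nonzero on success. */
static int arena_init(Arena *a, size_t size)
{
    a->base = malloc(size);
    a->size = size;
    a->used = 0;
    return a->base != NULL;
}

/* Carve `bytes` out of the arena. The offset from the arena base is
   rounded up to `align`, which must be a power of two.
   Returns NULL when the request does not fit. */
static void *arena_alloc(Arena *a, size_t bytes, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->size || bytes > a->size - start)
        return NULL;
    a->used = start + bytes;
    return a->base + start;
}

/* Forget every allocation at once; the memory itself is kept. */
static void arena_reset(Arena *a)
{
    a->used = 0;
}

/* Release the backing memory. */
static void arena_destroy(Arena *a)
{
    free(a->base);
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}
```

Each vertex, normal or texture-coordinate array would be carved out with `arena_alloc`, so everything the GPU pulls sits inside one contiguous range. Note that `arena_reset` is only safe once the GPU has finished reading the old contents, which is the synchronization issue the slide mentions.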


What is NV_vertex_array_range (a.k.a. VAR)? (4)

• VAR gives the application total control!
  • App decides how best to use limited resources
  • Avoids continually copying static arrays (e.g. skin texture coords)
  • The power of display lists – with mutability!


What VAR (alone) does not do well…

• VAR adds the ability to have really fast vertex arrays, but
  • VAR memory is in limited supply, and
  • VAR does not provide an efficient way to re-use memory
  • Calling glFinish() or glFlushVertexArrayRangeNV() is too heavy-handed


What is NV_fence?

• NV_fence provides fine-grained synchronization
  • A “fence” is a probe that can be placed into the OpenGL command stream
  • Each fence has a condition that can be tested
    • GL_ALL_COMPLETED_NV is currently the only condition
  • glFinishFenceNV() allows an app to wait until a specific fence’s condition is satisfied
  • Very fine application-level control
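The fence semantics above can be illustrated with a toy model. This is a sketch in C that simulates a command stream with two counters; it makes no OpenGL calls, and every `Sim*` type and `sim_*` function is a hypothetical name invented for the illustration. The model assumes commands retire strictly in order, so a fence is satisfied once everything submitted before it has completed, which is how an "all completed" condition would behave.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy command stream: the CPU appends commands, the simulated GPU
   retires them in submission order. */
typedef struct {
    unsigned long submitted;  /* commands issued by the CPU so far */
    unsigned long completed;  /* commands the simulated GPU has finished */
} SimStream;

/* A fence remembers how many commands were submitted before it. */
typedef struct {
    unsigned long marker;
    bool          set;
} SimFence;

/* CPU side: append n commands to the stream. */
static void sim_submit(SimStream *s, unsigned long n)
{
    s->submitted += n;
}

/* GPU side: retire up to n pending commands. */
static void sim_gpu_advance(SimStream *s, unsigned long n)
{
    unsigned long pending = s->submitted - s->completed;
    s->completed += (n < pending) ? n : pending;
}

/* Place a fence after everything submitted so far. */
static void sim_set_fence(const SimStream *s, SimFence *f)
{
    f->marker = s->submitted;
    f->set = true;
}

/* Non-blocking query: has everything before the fence completed? */
static bool sim_test_fence(const SimStream *s, const SimFence *f)
{
    assert(f->set);
    return s->completed >= f->marker;
}

/* Blocking wait. A real wait would stall the CPU; the simulation
   just advances the GPU exactly as far as the fence requires. */
static void sim_finish_fence(SimStream *s, const SimFence *f)
{
    assert(f->set);
    if (s->completed < f->marker)
        s->completed = f->marker;
}
```

The point of the model is that `sim_finish_fence` waits only for work submitted before the fence. Commands submitted after it may still be pending when the wait returns, which is what makes a fence lighter than a full glFinish().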


How to use VAR/fence together

• The combination of VAR and fences is very powerful
  • VAR gives the best possible T&L performance
  • With fences, apps can achieve very efficient pipelining
  • In memory-limited situations, VAR memory must be reused
  • Fences can be placed in the command stream to determine when memory can be reclaimed
  • Different strategies can be used for dynamic/static data
  • App chooses management mechanism


How to use VAR/fence together (2)

• The learning_VAR demo uses multiple buffers (within a single arena) to achieve high T&L throughput on completely dynamic geometry!
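One way to picture "multiple buffers within a single arena" is a ring of equal-sized slots, each guarded by a fence. The sketch below is a simulation in C and is not taken from the learning_VAR demo: `Ring`, `Slot`, `NUM_BUFFERS` and the `ring_*` functions are hypothetical names, the slot count of four is an arbitrary assumption, and GPU progress is modelled by a counter rather than by real fence calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_BUFFERS 4  /* hypothetical slot count */

typedef struct {
    size_t        offset;     /* byte offset of the slot within the arena */
    unsigned long fence;      /* draw number that must finish before reuse */
    bool          in_flight;  /* a draw reading this slot has been issued */
} Slot;

typedef struct {
    Slot          slots[NUM_BUFFERS];
    unsigned      next;           /* slot the CPU fills next */
    unsigned long issued;         /* draws issued so far */
    unsigned long gpu_completed;  /* simulated GPU progress */
    unsigned long stalls;         /* times the CPU had to wait */
} Ring;

/* Split an arena of arena_bytes into NUM_BUFFERS equal slots. */
static void ring_init(Ring *r, size_t arena_bytes)
{
    size_t slot_bytes = arena_bytes / NUM_BUFFERS;
    for (unsigned i = 0; i < NUM_BUFFERS; i++) {
        r->slots[i].offset = i * slot_bytes;
        r->slots[i].fence = 0;
        r->slots[i].in_flight = false;
    }
    r->next = 0;
    r->issued = 0;
    r->gpu_completed = 0;
    r->stalls = 0;
}

/* Get the next slot for writing. The CPU waits only if the GPU has
   not yet passed the fence that guards this particular slot. */
static size_t ring_acquire(Ring *r)
{
    Slot *s = &r->slots[r->next];
    if (s->in_flight && r->gpu_completed < s->fence) {
        r->gpu_completed = s->fence;  /* stands in for a blocking fence wait */
        r->stalls++;
    }
    s->in_flight = false;
    return s->offset;
}

/* Issue the draw that reads the current slot, record a fence behind
   it, and move on to the following slot. */
static void ring_submit(Ring *r)
{
    Slot *s = &r->slots[r->next];
    r->issued++;
    s->fence = r->issued;
    s->in_flight = true;
    r->next = (r->next + 1) % NUM_BUFFERS;
}

/* Simulated GPU retires up to n pending draws. */
static void ring_gpu_advance(Ring *r, unsigned long n)
{
    unsigned long pending = r->issued - r->gpu_completed;
    r->gpu_completed += (n < pending) ? n : pending;
}
```

With several slots the CPU can fill one buffer while the GPU is still reading earlier ones, and it stalls only when it laps the GPU. The right slot count and size are application choices that this sketch does not try to settle.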


Performance hints

• If T&L is not a bottleneck, VAR won’t help
  • VAR helps get geometry to the GPU
  • If your app is fill-bound, consider
    • Adding more geometry – it probably won’t cost anything
    • Using multi-texture and NV_register_combiners to reduce your memory bandwidth requirements
• Dynamic geometry requires CPU tuning work
  • Float-to-int casts can be a huge bottleneck
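The slides do not say how to avoid the float-to-int cost. One well-known workaround from that era, shown here as a sketch in C and not as the author's method, is the "magic number" trick: adding 1.5 × 2^23 to a float forces the integer part into the low mantissa bits, where it can be read back without a conversion instruction. `fast_round_to_int` is a hypothetical name. The sketch assumes IEEE 754 single precision, the default round-to-nearest mode, and inputs with magnitude below 2^22. It rounds to nearest (ties to even), whereas a C cast truncates, so the two are not interchangeable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round x to the nearest integer without a float-to-int cast.
   ASSUMPTIONS: IEEE 754 binary32 floats, default rounding mode,
   and |x| < 2^22. */
static int32_t fast_round_to_int(float x)
{
    assert(x > -4194304.0f && x < 4194304.0f);  /* |x| < 2^22 */

    /* 12582912.0f is 1.5 * 2^23. In [2^23, 2^24) adjacent floats are
       exactly 1 apart, so the sum is rounded to a whole number.
       volatile forces the sum to be stored as a genuine float. */
    volatile float biased = x + 12582912.0f;
    float stored = biased;

    /* Reinterpret the float's bits as an integer. */
    uint32_t bits;
    memcpy(&bits, &stored, sizeof bits);

    /* 0x4B400000 is the bit pattern of 12582912.0f; subtracting it
       leaves the rounded value of x. */
    return (int32_t)bits - 0x4B400000;
}
```

Whether this actually beats a plain cast depends on the CPU and compiler; on hardware with a fast truncating conversion instruction the cast may already be cheap, so it is worth profiling before adopting anything like this.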


Performance hints (2)

• Effective memory management and synchronization is key
  • Avoid redundant copies
    • Do this on a per-array basis, not per-object
  • Be clever in your use of memory
  • Use fences to keep both CPU and GPU working


Questions, comments, feedback?

• Cass Everitt, [email protected]
• www.nvidia.com/developer
• We try to answer questions at OpenGL.org’s “advanced” OpenGL forum