Stencil Routed A-Buffer

Kevin Myers and Louis Bavoil

NVIDIA

Our Cool Thing

What is it?

• A-Buffer

– Simply a list of fragments per-pixel

• “The A-buffer, an antialiased hidden surface method” [Carpenter 84]

• Related Work

– Depth Peeling [Mammen 89] [Everitt 01]

– k-Buffer [Bavoil et al. 07]

Why do I need this?

• Often want more than nearest

– Alpha blending

– Volume rendering

– Collision detection

– Refraction and caustics

– Global illumination

Why is it hard?

• GPUs are optimized to capture the nearest layer

– Z buffering and early z test

– Fine for most real-time lighting models

– Wasteful if not rendering front to back

Things that don’t work

• Blending: can’t just turn off z-buffering

– Most blend operations are non-commutative

• MRT (multiple render targets)

– Can’t direct output to a chosen target

• Reading what you’re writing

– Hazardous

• “Multi-Layer Depth Peeling via Fragment Sort” [Liu et al. 06]

• k-Buffer [Bavoil et al. 07]
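The point about non-commutative blending can be illustrated with a minimal sketch. It assumes a single color channel and the usual "over" operator; the `Layer`, `over`, and `composite` names are hypothetical and not from the slides.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical single-channel layer: a color and an opacity.
struct Layer {
    float color;
    float alpha;
};

// Standard "over" blend: src composited on top of whatever is in dst.
float over(const Layer& src, float dst) {
    return src.alpha * src.color + (1.0f - src.alpha) * dst;
}

// Draw `first`, then `second` on top of it, over a background value.
float composite(const Layer& first, const Layer& second, float background) {
    return over(second, over(first, background));
}
```

Swapping the draw order of a half-transparent white layer and a half-transparent black layer gives two different results, which is why fragments have to be captured and sorted rather than blended in arrival order.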

A-Buffer

• “A list of fragments per-pixel”

– Anything on the GPU that resembles this?

• MSAA

– “A list of samples per-pixel”

– Samples store coverage

MSAA in review

• Multisampled Antialiasing

– Fragments are rasterized at a higher res

• 8xMSAA == 8 x aliased resolution

– Pixel shader is run once per-pixel

– Frame buffer storage is at sample resolution

Say What?

• MSAA samples == A-Buffer pixels??

• MSAA sample patterns don’t help

• Need all MSAA samples at pixel center

Line up your Sub-samples

• Turn off multisampling

– Still render to an MSAA buffer

– Pixel shader output bloats to all sub-samples

– Set BOOL D3D10_RASTERIZER_DESC::MultisampleEnable to FALSE

• Now writing 8 samples per pixel

– All have the same value!!

Bloating Your Pixel

• Applause?

• Meets the definition

– “List of fragments per-pixel”

• Not exactly what we want

– Each item contains same value

– Next fragment will clobber the entire list

– Need to update one entry in the list

• Once and only once

Stencil Routing

[Figure: a 2x2 tile of per-pixel stencil values, shown at four successive steps]

1 2    2 3    3 4    4 5
3 4    4 5    5 6    6 7

Stencil always increments

Stencil passes when 4

Stencil Routing

• First introduced by [Purcell et al. 03]

– Did not work for general rasterization

• Tile aligned points

– Fat point is spread across four pixels

• Four pixels get same value

• Stencil allows one pixel to update

Stencil Routing and MSAA

• Stencil always operates at sample res

– Regardless of MultisampleEnable state

– DX10 Spec

• Use sub-samples to route

– Allows any pixel shader output to be routed

• Arbitrary primitives

Stencil Routing and MSAA

[Figure: stencil values of the 8 sub-samples of one pixel, shown at four successive steps]

5 8 2 3 7 4 6 9

4 7 1 2 6 3 5 8

3 6 0 1 5 2 4 7

2 5 0 0 4 1 3 6

A Stencil Test That Works

• StencilFunc

– D3D10_COMPARISON_EQUAL

• StencilRef

– 2

• More on this later

• StencilPassOp and StencilFailOp

– D3D10_STENCIL_OP_DECR_SAT
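The stencil state above can be simulated on the CPU to see how it routes each fragment to exactly one sub-sample. This is a sketch under stated assumptions, not D3D10 code: it assumes 8 sub-samples, sample i starting at stencil 2 + i, multisampling turned off so every sample is offered the same shader output, and no depth test. `Pixel`, `makePixel`, and `writeFragment` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kSamples = 8;       // 8xMSAA: eight sub-samples per pixel
constexpr std::uint8_t kRef = 2;  // StencilRef from the slide

struct Pixel {
    std::array<std::uint8_t, kSamples> stencil{};
    std::array<float, kSamples> color{};
};

// Assumed initial state: sample i starts at kRef + i.
Pixel makePixel() {
    Pixel p;
    for (int i = 0; i < kSamples; ++i)
        p.stencil[i] = static_cast<std::uint8_t>(kRef + i);
    return p;
}

// One fragment arrives. The same value is offered to every sub-sample;
// only the sample whose stencil EQUALs kRef stores it. Pass and fail
// both apply DECR_SAT, so every sample steps toward zero.
void writeFragment(Pixel& p, float value) {
    for (int i = 0; i < kSamples; ++i) {
        if (p.stencil[i] == kRef) p.color[i] = value;  // stencil test passes
        if (p.stencil[i] > 0) --p.stencil[i];          // decrement, saturating at 0
    }
}
```

Calling `writeFragment` three times on a fresh pixel fills samples 0, 1, and 2 in arrival order and leaves sample 3 with stencil 2, ready for the next fragment.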

Initializing Stencil

• Clear stencil buffer to pass value ( 2 )

– Initializes sample 0 to 2

• Use SampleMask to selectively update

– Stencil set to replace with reference value
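The initialization steps can be sketched as a CPU-side simulation. It assumes 8 sub-samples and that sample i should end up at 2 + i, which matches the values 2 through 9 shown in the figures; `clearStencil`, `replaceMasked`, and `initStencil` are hypothetical stand-ins for the clear and for the masked REPLACE passes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kSamples = 8;
using Stencil = std::array<std::uint8_t, kSamples>;

// Clear: every sub-sample of the pixel gets the same value.
void clearStencil(Stencil& s, std::uint8_t value) { s.fill(value); }

// Stand-in for one masked pass: the REPLACE stencil op writes `ref`
// only to samples whose bit is set in `sampleMask`.
void replaceMasked(Stencil& s, std::uint32_t sampleMask, std::uint8_t ref) {
    for (int i = 0; i < kSamples; ++i)
        if (sampleMask & (1u << i)) s[i] = ref;
}

Stencil initStencil() {
    Stencil s{};
    clearStencil(s, 2);  // sample 0 is already correct after the clear
    for (int i = 1; i < kSamples; ++i)  // one masked pass per remaining sample
        replaceMasked(s, 1u << i, static_cast<std::uint8_t>(2 + i));
    return s;
}
```

In this sketch the clear plus seven masked passes are enough for an 8-sample buffer; the order of samples inside the pixel does not matter as long as each one gets a distinct value.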

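[Figure: stencil values of the 8 sub-samples of one pixel during initialization. After the clear: 2 2 2 2 2 2 2 2. After masked updates: 2 2 2 3 2 2 2 2, then 2 2 2 3 2 4 2 2, and finally 5 8 2 3 7 4 6 9.]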

Why start at 2?

• When all sub-samples are written

– Most stencil values will be 0

• Except the last one written

– Last sample written has stencil == 1

• When overflow occurs

– All stencil values will be 0

Occlusion Query Test

[Figure: per-sample stencil values for a grid of pixels after rendering. Example pixels: 0 2 0 0 0 0 0 3, 0 3 0 0 2 0 1 4, 0 0 0 0 0 0 0 1, and 0 0 0 0 0 0 0 0.]

Pixel did not overflow

Pixel overflowed

Handling Overflow

• Set sample mask to last sample updated

• Draw full screen quad

– Issue an occlusion query

– Set stencil to pass if stencil == 0

• Check occlusion query

– Sample pass count == overflow count
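The overflow check can be sketched as a CPU-side simulation. It assumes 8 sub-samples with sample i starting at 2 + i and DECR_SAT applied for every fragment; `stencilAfter` and `countOverflowed` are hypothetical names, with `countOverflowed` standing in for the occlusion query restricted to the last sample.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kSamples = 8;
using Stencil = std::array<std::uint8_t, kSamples>;

// Stencil state of one pixel after `fragments` fragments, assuming sample i
// started at 2 + i and every fragment decremented every sample (saturating).
Stencil stencilAfter(int fragments) {
    Stencil s{};
    for (int i = 0; i < kSamples; ++i) {
        int v = 2 + i - fragments;
        s[i] = static_cast<std::uint8_t>(v > 0 ? v : 0);
    }
    return s;
}

// Stand-in for the occlusion query: the sample mask selects only the last
// sample, and the stencil test passes where that sample's stencil == 0.
int countOverflowed(const std::vector<Stencil>& pixels) {
    int passed = 0;
    for (const Stencil& s : pixels)
        if (s[kSamples - 1] == 0) ++passed;
    return passed;
}
```

A pixel that received exactly 8 fragments leaves the last sample at 1 and is not counted; a ninth fragment pushes it to 0, so the pass count equals the number of overflowed pixels.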

Handling Overflow

• Occlusion query

– Good

• Very fast

• Allows for dynamic A-Buffer sizing

– Bad

• Requires some CPU intervention

– Ideally A-Buffer size is fixed

Demo

Demo Time!

Secrets of the Dragon

• Single A-Buffer

– RG32F

• R is packed color

• G is depth

– Saves on texture loads

• Post process sort

– 8 fragment per-pixel bitonic sort

• Additional fragments, insertion sort
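The 8-fragment post-process sort can be sketched on the CPU as a bitonic sorting network. The slides do not give the shader code or the exact color packing, so the `Fragment` layout, the RGBA8 packing in `packRGBA8`, and the function names are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>

struct Fragment {
    std::uint32_t packedColor;  // assumed RGBA8 packing; layout not given in the slides
    float depth;
};

// Hypothetical packing helper: red in the low byte, alpha in the high byte.
std::uint32_t packRGBA8(std::uint8_t r, std::uint8_t g, std::uint8_t b, std::uint8_t a) {
    return static_cast<std::uint32_t>(r) |
           (static_cast<std::uint32_t>(g) << 8) |
           (static_cast<std::uint32_t>(b) << 16) |
           (static_cast<std::uint32_t>(a) << 24);
}

// Bitonic sorting network for exactly 8 fragments, nearest depth first.
// The compare-and-swap pattern is fixed and independent of the data,
// which is what makes this kind of sort convenient in a shader.
void bitonicSort8(std::array<Fragment, 8>& f) {
    const unsigned n = 8;
    for (unsigned k = 2; k <= n; k *= 2) {          // size of bitonic sequences
        for (unsigned j = k / 2; j > 0; j /= 2) {   // compare distance
            for (unsigned i = 0; i < n; ++i) {
                unsigned partner = i ^ j;
                if (partner <= i) continue;          // handle each pair once
                bool ascending = (i & k) == 0;
                bool outOfOrder = ascending ? f[i].depth > f[partner].depth
                                            : f[i].depth < f[partner].depth;
                if (outOfOrder) std::swap(f[i], f[partner]);
            }
        }
    }
}
```

After sorting, the fragments can be composited in depth order; fragments beyond the first 8 would be merged with an insertion sort, as the slide notes.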

8800 GTX Performance

8 Layers      Depth Peeling    8xABuffer    ABuffer Speedup
640x480       30.9             164          5.3
800x600       30.4             139          4.6
1024x768      29.5             110          3.7
1280x960      28.1             81.4         2.9
1600x1200     26.2             54.9         2.1

16 Layers     Depth Peeling    8xABuffer    ABuffer Speedup
640x480       15.5             76.7         4.9
800x600       15.3             63.0         4.1
1024x768      14.7             48.0         3.3
1280x960      14.1             34.6         2.5
1600x1200     13.3             23.1         1.7

Alpha Blended Stanford Dragon

Limits…DOH!

• 254 layers of depth max

– 8-bit stencil ( 255 – 1 for overflow bit )

– If you do this call us cause that’s crazy

• Fragments at same depth

– Must be handled in post-process

• MSAA

Summary

• Stencil Routed A-Buffer

– Ideally suited for complex geometries

• Much faster than depth peeling

• A-buffer can be dynamically resized

– Use an occlusion query

– Best to pre-determine size

Future Work

• Render target arrays

– Each target has its own stencil buffer

– Target replaces sub-sample

• Or augments sub-sample

– #arrays * MSAA level in one “CPU pass”

• With DX10, saturates 254 layers

– Use instancing for additional “GPU passes”

Thanks for all the fish

• Claudio Silva, Steven Callahan, Joao Comba, Aaron Lefohn, Cass Everitt, Peach Myers

The last slide…

• ?
