Stencil Routed A-Buffer. Kevin Myers and Louis Bavoil, NVIDIA.

Post on 23-Feb-2016

TRANSCRIPT

Page 1: Stencil Routed A-Buffer

Stencil Routed A-Buffer

Kevin Myers and Louis Bavoil

NVIDIA

Page 2: Stencil Routed A-Buffer

Our Cool Thing

Page 3: Stencil Routed A-Buffer

What is it?

• A-Buffer

– Simply a list of fragments per-pixel

• “The A-buffer, an antialiased hidden surface method” [Carpenter 84]

• Related Work

– Depth Peeling [Mammen 89] [Everitt 01]

– k-Buffer [Bavoil et al. 07]

Page 4: Stencil Routed A-Buffer

Why do I need this?

• Often want more than the nearest layer

– Alpha blending

– Volume rendering

– Collision detection

– Refraction and caustics

– Global illumination

Page 5: Stencil Routed A-Buffer

Why is it hard?

• GPUs are optimized to capture the nearest layer

– Z-buffering and early z test

– Fine for most real-time lighting models

– Wasteful if not rendering front to back

Page 6: Stencil Routed A-Buffer

Things that don’t work

• Blending: can’t just turn off z-buffering

– Most blend operations are non-commutative

• MRT (multiple render targets)

– Can’t direct output

• Reading what you’re writing

– Hazardous

• “Multi-Layer Depth Peeling via Fragment Sort” [Liu et al. 06]

• k-Buffer [Bavoil et al. 07]

Page 7: Stencil Routed A-Buffer

A-Buffer

• “A list of fragments per-pixel”

– Anything on the GPU that resembles this?

• MSAA

– “A list of samples per-pixel”

– Samples store coverage

Page 8: Stencil Routed A-Buffer

MSAA in review

• Multisample antialiasing (MSAA)

– Fragments are rasterized at a higher resolution

• 8xMSAA == 8 x aliased resolution

– Pixel shader is run once per-pixel

– Frame buffer storage is at sample resolution

Page 9: Stencil Routed A-Buffer

Say What?

• MSAA samples == A-Buffer pixels??

• MSAA sample patterns don’t help

• Need all MSAA samples at pixel center

Page 10: Stencil Routed A-Buffer

Line up your Sub-samples

• Turn off multisampling

– Still render to an MSAA buffer

– Pixel shader output bloats to all sub-samples

– Set BOOL D3D10_RASTERIZER_DESC::MultisampleEnable to FALSE

• Now writing 8 samples per pixel

– All have the same value!!

Page 11: Stencil Routed A-Buffer

Bloating Your Pixel

• Applause?

• Meets the definition

– “List of fragments per-pixel”

• Not exactly what we want

– Each item contains the same value

– Next fragment will clobber the entire list

– Need to update one entry in the list

• Once and only once

Page 12: Stencil Routed A-Buffer

Stencil Routing

[Figure: stencil values of four pixels over four successive writes]

1 2    2 3    3 4    4 5
3 4    4 5    5 6    6 7

Stencil always increments

Stencil passes when 4

Page 13: Stencil Routed A-Buffer

Stencil Routing

• First introduced by Purcell et al. 2003

– Did not work for general rasterization

• Tile-aligned points

– Fat point is spread across four pixels

• Four pixels get the same value

• Stencil allows one pixel to update
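The routing idea on this slide and the previous one can be sketched on the CPU. This is only an illustration, not the authors' code: the `Tile` struct and `drawFatPoint` helper are hypothetical names, and the starting stencil values 1 to 4 and the pass value 4 are read off the figure on the previous slide.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kTilePixels = 4;          // one tile covered by a single fat point
constexpr std::uint8_t kPassValue = 4;  // the stencil test passes when the value is 4

// Hypothetical stand-in for the stencil and color buffers of one tile.
struct Tile {
    std::array<std::uint8_t, kTilePixels> stencil{1, 2, 3, 4};
    std::array<int, kTilePixels> color{-1, -1, -1, -1};  // -1 marks an empty slot
};

// Draw one fat point that covers every pixel of the tile with the same value.
// Returns the index of the pixel that stored the value, or -1 if none did.
inline int drawFatPoint(Tile& tile, int value) {
    int written = -1;
    for (int i = 0; i < kTilePixels; ++i) {
        if (tile.stencil[i] == kPassValue) {  // only one pixel can match
            tile.color[i] = value;
            written = i;
        }
        ++tile.stencil[i];  // the stencil always increments, pass or fail
    }
    return written;
}
```

Each call stores its value in exactly one pixel: the pixel that started at 4 takes the first point, the one that started at 3 takes the second, and so on. A fifth point finds no pixel at the pass value and is dropped.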

Page 14: Stencil Routed A-Buffer

Stencil Routing and MSAA

• Stencil always operates at sample resolution

– Regardless of MultisampleEnable state

– DX10 Spec

• Use sub-samples to route

– Allows any pixel shader output to be routed

• Arbitrary primitives

Page 15: Stencil Routed A-Buffer

Stencil Routing and MSAA

[Figure: stencil values stored in the sub-samples of each pixel]

Page 16: Stencil Routed A-Buffer

A Stencil Test That Works

• StencilFunc

– D3D10_COMPARISON_EQUAL

• StencilRef

– 2

• More on this later

• StencilPassOp and StencilFailOp

– D3D10_STENCIL_OP_DECR_SAT
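A CPU-side sketch of what this stencil state does to one pixel may help. It is a simulation under stated assumptions, not Direct3D code: `Pixel`, `makePixel`, `decrSat` and `routeFragment` are hypothetical names, the target is assumed to be 8xMSAA, and sample k is assumed to start at 2 + k.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kSamples = 8;       // assumed 8xMSAA render target
constexpr std::uint8_t kRef = 2;  // StencilRef from the slide

// Hypothetical stand-in for the sub-samples of one pixel.
struct Pixel {
    std::array<std::uint8_t, kSamples> stencil{};
    std::array<float, kSamples> value{};
};

// Assumption: sample k starts at kRef + k, so sample 0 passes first.
inline Pixel makePixel() {
    Pixel px;
    for (int k = 0; k < kSamples; ++k) {
        px.stencil[k] = static_cast<std::uint8_t>(kRef + k);
    }
    return px;
}

// Mirrors D3D10_STENCIL_OP_DECR_SAT: decrement and clamp at zero.
inline std::uint8_t decrSat(std::uint8_t s) {
    if (s == 0) return 0;
    return static_cast<std::uint8_t>(s - 1);
}

// One fragment arrives with multisampling disabled, so every sample sees it.
// Returns the sample that stored the fragment, or -1 on overflow.
inline int routeFragment(Pixel& px, float fragment) {
    int stored = -1;
    for (int k = 0; k < kSamples; ++k) {
        if (px.stencil[k] == kRef) {  // D3D10_COMPARISON_EQUAL against StencilRef
            px.value[k] = fragment;
            stored = k;
        }
        px.stencil[k] = decrSat(px.stencil[k]);  // same op on pass and on fail
    }
    return stored;
}
```

Calling `routeFragment` eight times on a fresh pixel fills samples 0 through 7 in arrival order; a ninth call returns -1 because no sample still holds the reference value.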

Page 17: Stencil Routed A-Buffer

Initializing Stencil

• Clear stencil buffer to pass value (2)

– Initializes sample 0 to 2

• Use SampleMask to selectively update

– Stencil set to replace with reference value

[Figure: sequence of sub-sample stencil values, from all 2 after the clear to the values 2 through 9 after the masked updates]
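The initialization steps above can be sketched as a small CPU simulation. The helpers `clearStencil`, `replaceMasked` and `initRouting` are hypothetical stand-ins for a stencil clear and for masked full-screen passes, and the choice of pass value + k for sample k is an assumption taken from the values 2 through 9 in the figure.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kSamples = 8;  // assumed 8xMSAA
using StencilSamples = std::array<std::uint8_t, kSamples>;

// Stand-in for a stencil clear: every sample receives the same value.
inline void clearStencil(StencilSamples& s, std::uint8_t value) {
    s.fill(value);
}

// Stand-in for a pass whose stencil op is "replace with reference value".
// Only samples whose bit is set in sampleMask are updated.
inline void replaceMasked(StencilSamples& s, std::uint32_t sampleMask, std::uint8_t ref) {
    for (int k = 0; k < kSamples; ++k) {
        if (sampleMask & (1u << k)) {
            s[k] = ref;
        }
    }
}

// Clear to the pass value, then give each remaining sample its own start value.
inline StencilSamples initRouting(std::uint8_t passValue) {
    StencilSamples s{};
    clearStencil(s, passValue);  // sample 0 is already correct after the clear
    for (int k = 1; k < kSamples; ++k) {
        // Assumption: sample k starts at passValue + k.
        replaceMasked(s, 1u << k, static_cast<std::uint8_t>(passValue + k));
    }
    return s;
}
```

With a pass value of 2 this yields 2, 3, ..., 9 across the eight samples, at the cost of one masked pass per sample after the clear.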

Page 18: Stencil Routed A-Buffer

Why start at 2?

• When all sub-samples are written

– Most stencil values will be 0

• Except the last one written

– Last sample written stencil == 1

• When overflow occurs

– All stencil values will be 0
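A short worked check of this argument, written as a sketch. It assumes that sample k starts at start + k and that every fragment applies a saturating decrement to every sample; `stencilAfter` and `lastSampleIsZero` are hypothetical helper names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

constexpr int kSamples = 8;  // assumed 8xMSAA

// Stencil values after `fragments` saturating decrements,
// when sample k starts at start + k (an assumption of this sketch).
inline std::array<int, kSamples> stencilAfter(int start, int fragments) {
    std::array<int, kSamples> s{};
    for (int k = 0; k < kSamples; ++k) {
        s[k] = std::max(0, start + k - fragments);
    }
    return s;
}

// True when the last sample has reached zero.
inline bool lastSampleIsZero(int start, int fragments) {
    return stencilAfter(start, fragments)[kSamples - 1] == 0;
}
```

Starting at 2, a pixel that received exactly 8 fragments ends with 1 in its last sample, while a pixel that received 9 ends with all zeros, so the two cases can be told apart. Starting at 1 instead, both cases end with all zeros and a full pixel looks the same as an overflowed one.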

Page 19: Stencil Routed A-Buffer

Occlusion Query Test

[Figure: sub-sample stencil values for two example pixels, labeled “Pixel did not overflow” and “Pixel overflowed”]

Page 20: Stencil Routed A-Buffer

Handling Overflow

• Set sample mask to last sample updated

• Draw full screen quad

– Issue an occlusion query

– Set stencil to pass if stencil == 0

• Check occlusion query

– Sample pass count == overflow count
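The overflow check above can be mimicked on the CPU as follows. This is a sketch, not the Direct3D query API: `countOverflowedPixels` is a hypothetical stand-in for the masked full-screen quad plus the occlusion query, and the stencil buffer is modeled as a plain vector of 8-sample pixels.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kSamples = 8;  // assumed 8xMSAA
using StencilSamples = std::array<std::uint8_t, kSamples>;

// The sample mask selects only the last sample, the stencil test passes where
// stencil == 0, and the query counts the samples that pass.
inline std::size_t countOverflowedPixels(const std::vector<StencilSamples>& pixels) {
    std::size_t passed = 0;
    for (const StencilSamples& s : pixels) {
        if (s[kSamples - 1] == 0) {
            ++passed;
        }
    }
    return passed;
}
```

A non-zero count tells the application that at least one pixel received more fragments than the buffer could hold, which is the signal for growing the A-Buffer.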

Page 21: Stencil Routed A-Buffer

Handling Overflow

• Occlusion query

– Good

• Very fast

• Allows for dynamic A-Buffer sizing

– Bad

• Requires some CPU intervention

– Ideally A-Buffer size is fixed

Page 22: Stencil Routed A-Buffer

Demo

Demo Time!

Page 23: Stencil Routed A-Buffer

Secrets of the Dragon

• Single A-Buffer

– RG32F

• R is packed color

• G is depth

– Saves on texture loads

• Post process sort

– 8 fragment per-pixel bitonic sort

• Additional fragments, insertion sort
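For reference, an 8-element bitonic sort by depth might look like the sketch below. It runs on the CPU rather than in a pixel shader, the `Fragment` layout is an assumption (the slide does not say how color is packed into R), and nearest-first ordering with smaller depth meaning nearer is also assumed.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical CPU-side view of one RG32F A-Buffer entry.
struct Fragment {
    std::uint32_t packedColor;  // assumed: RGBA8 packed into 32 bits (the R channel)
    float depth;                // the G channel
};

constexpr int kCount = 8;  // one fragment per sub-sample of an 8xMSAA pixel

// Sorts the fragments in place by increasing depth using a bitonic network.
inline void bitonicSortByDepth(std::array<Fragment, kCount>& f) {
    for (int k = 2; k <= kCount; k *= 2) {      // size of the bitonic sequences
        for (int j = k / 2; j > 0; j /= 2) {    // compare distance within a stage
            for (int i = 0; i < kCount; ++i) {
                const int partner = i ^ j;
                if (partner <= i) continue;     // handle each pair once
                const bool ascending = (i & k) == 0;
                const bool outOfOrder = ascending
                    ? f[i].depth > f[partner].depth
                    : f[i].depth < f[partner].depth;
                if (outOfOrder) std::swap(f[i], f[partner]);
            }
        }
    }
}
```

The compare-and-swap pattern is fixed and does not depend on the data, which is what makes a sorting network attractive in a shader; fragments at equal depth are not reordered relative to any meaningful key and still need separate handling.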

Page 24: Stencil Routed A-Buffer

8800 GTX Performance

8 Layers     Depth Peeling   8xABuffer   ABuffer Speedup
640x480      30.9            164         5.3
800x600      30.4            139         4.6
1024x768     29.5            110         3.7
1280x960     28.1            81.4        2.9
1600x1200    26.2            54.9        2.1

16 Layers    Depth Peeling   8xABuffer   ABuffer Speedup
640x480      15.5            76.7        4.9
800x600      15.3            63.0        4.1
1024x768     14.7            48.0        3.3
1280x960     14.1            34.6        2.5
1600x1200    13.3            23.1        1.7

Alpha Blended Stanford Dragon

Page 25: Stencil Routed A-Buffer

Limits…DOH!

• 254 layers of depth max

– 8-bit stencil (255 - 1 for overflow bit)

– If you do this, call us, because that’s crazy

• Fragments at same depth

– Must be handled in post-process

• MSAA
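The 254-layer figure can be checked with a line of arithmetic. The sketch assumes the scheme from the earlier slides, where layer k starts at StencilRef + k with StencilRef = 2; `maxLayers` is a hypothetical helper, and the result concerns the stencil range only, not how many samples a single target provides.

```cpp
#include <cassert>

constexpr int kStencilMax = 255;  // largest value an 8-bit stencil can hold
constexpr int kRef = 2;           // StencilRef used for routing

// Layer k starts at ref + k, and that start value must fit in the stencil,
// so k can run from 0 to stencilMax - ref.
constexpr int maxLayers(int stencilMax, int ref) {
    return stencilMax - ref + 1;
}

static_assert(maxLayers(kStencilMax, kRef) == 254,
              "an 8-bit stencil with ref 2 routes at most 254 layers");
```

Starting at 2 rather than 1 reserves one value so that overflow can be detected, which accounts for the "255 - 1" on the slide.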

Page 26: Stencil Routed A-Buffer

Summary

• Stencil Routed A-Buffer

– Ideally suited for complex geometries

• Much faster than depth peeling

• A-buffer can be dynamically resized

– Use an occlusion query

– Best to pre-determine size

Page 27: Stencil Routed A-Buffer

Future Work

• Render target arrays

– Each target has its own stencil buffer

– Target replaces sub-sample

• Or augments sub-sample

– #arrays * MSAA level in one “CPU pass”

• With DX10 saturates 254 layers

– Use instancing for additional “GPU passes”

Page 28: Stencil Routed A-Buffer

Thanks for all the fish

• Claudio Silva, Steven Callahan, Joao Comba, Aaron Lefohn, Cass Everitt, Peach Myers