solving some common problems in a modern deferred rendering engine

Slide 1

1Solving Some Common Problemsin a Modern Deferred Rendering Engine

Jose Luis Sanchez BonetTomasz Stachowiak/* @h3r2tic */

Good news everyone! I'm Tom, this is Jose, and we're going to talk about deferred rendering. The focus is on current generation consoles, but the presented techniques can be used on just about any platform, so we hope anyone can benefit from them.2Deferred rendering pros and consPros ( some )Very scalableNo shader permutation explosionG-Buffer useful in other techniquesSSAO, SRAA, decals, Cons ( some )Difficult to use multiple shading modelsDoes not handle translucent geometrySome variants do, but may be impracticalDeferred rendering has been very popular lately due to its scalability, and because it plays nicely with other techniques, which can reuse the G-Buffer. At the same time, it doesnt come without downsides. We are going to cover two of them in this presentation, and propose the custom solutions we've developed for our upcoming console title. The two problems are: handling many shading models, and rendering translucent geometry. I'm going to cover the former in the first half of the presentation, and then Jose will talk about translucency.3

Reflectance models

In graphics rendering, we use simple mathematical formulas, to approximate the look of some classes of surfaces. The most commonly used model, or Bidirectional Reflectance Distribution Function, is Blinn-Phong, which works reasonably well as an approximation of some dielectrics. It is used due to its simplicity, but for the same reason, it cannot reproduce the look of many surfaces accurately. You might want to render your plastics with Blinn, skin with Eric Penner's pre-integrated model, hair with Kajiya-Kay or Marschner, brushed metal with anisotropic Ward, and so on. The visual properties of these surfaces are vastly different, and can not be covered with just a single, simple mathematical model.4BRDFs vs. renderingForward renderingMaterial shader directly evaluates the BRDFTrivialDeferred renderingLight shaders decoupled from materialsNo obvious solution

BRDF???So how do we render with multiple shading models? If you use forward rendering, this is trivial. Because the BRDF is combined with the material in the same shader, it just works.However, in deferred rendering, we need to evaluate the reflectance model in the light shader, and these don't bear any connection to material shaders that the BRDFs are associated with.5BRDFs vs. deferred branching?Read shading model ID in the lighting shader, branchMight be the way to go on next-genExpensive on current consolesTax for branches never takenDont want to pay it for every light

Three different BRDFs, only one used( branch always yields the first one )Platform1 BRDF2 BRDFs3 BRDFs3601.85 ms2.1 ms( + 0.25 ms )2.35 ms( + 0.5 ms )PS31.9 ms2.48 ms( + 0.58 ms )2.8 ms( + 0.9 ms )One approach would be to branch in the light shader. That is, the solid pass emits an identifier of the BRDF into the G-Buffer. The light shader reads it and branches upon its value.This solution might be viable on next-gen hardware, but it doesn't fare quite well on current consoles. In a small test case we did with a single full-screen light, branching brough the rendering cost from 1.85 to 2.1 milliseconds for just a single extra shading model. This is the tax you pay for not even taking the branch. That is, our test case is synthetic, and only the first BRDF is ever used. And it gets much worse on the PS3, which doesn't even have control flow instructions.

6BRDFs vs. deferred LUTs?Pre-calculate BRDF look-up tablesMight be shippable enoughSee: S.T.A.L.K.E.R.Limited control over parametersRoughnessAnisotropy, etc.BRDFs highly dimensionalIsotropic with roughness control 3D LUTOne could also tabulate the BRDF data, and sample it using a combination of an ID, as well as some geometric parameters, such as N dot L and N dot H. One such approach has been used successfully in the game S.T.A.L.K.E.R., so it might be enough for your title as well. The trouble is, BRDFs are highly dimensional functions, so tabulation might be difficult; for example, the data for an isotropic BRDF parameterized by surface roughness, is already at least a 3-dimensional function. /* See Michael Ashikhmins "Distribution-Based BRDFs. */

7BRDFs vs. deferred our approachOne default BRDFOthers a relatively rare caseShading model ID in stencilMulti-pass light renderingMask out parts of the scene in each pass

We decided to use a single reflectance model for most of our scene geometry, and then special-case rendering in rare instances, such as skin and hair.

The core of the idea is pretty simple: when rendering the solid pass, we store the ID of the shading model in the stencil buffer. Then in the lighting pass, we draw light geometry once for each BRDF, using the ID as a mask.

8Multi-pass tax avoidanceFor each lightFind all affected BRDFsRender the light volume once for each modelAnalogous to multi-pass forward rendering!Store bounding volumes of objects with non-standard BRDFsIntersect with light volumesImplemented like this, the idea would be inefficient. We would be multiplying the number of draw calls and shader switches by the number of supported BRDFs. However, when rendering a light, we can detect which BRDFs it can potentially use, and skip any extra processing. If you think of it, this is a very similar idea to multi-pass forward rendering.

Here's a scene with two objects, both of which use different shading models. We have two lights influencing them. The light on the left is interesting, in that it will only affect just one object, hence only one BRDF. Therefore it doesn't need to run the multi-BRDF code path at all.To accomplish this optimization, we store the bounding boxes of all objects which use non-standard shading models. During light rendering, we intersect light volumes with these bounds, and conservatively find a list of all BRDFs which a light may potentially touch.

/* We could in theory detect which BRDFs a light may affect and only use dynamic branching there, but then we either always pay a high cost, or we would need to create lots of shader permutations, for example shading model A and B, A with B and C, A with C, B with C, et cetera. For this reason we are just going to use multi-pass rendering. */9Making it practicalNeeds to work with depth culling of lightsHierarchical stencil on 360 and PS3

Now, there are two more bits to the algorithm, needed to make it practical. Firstly, it needs to work with the commonly used stencil and depth-based light culling trick. Secondly, it must play well with the hierarchical stencil buffer.

Let's start with a quick reminder of depth culling for lights. Consider a surface rendered into the G-buffer, and three lights. The left one is completely in front of the surface, so cannot influence it. The right one is behind the surface, so cannot influence it either. Only the middle one contributes to lighting, because its volume intersects the surface in the G-buffer.

10Depth culling of lightsAssuming viewer is outside the light volumeStart with stencil = 0Render back faces of light volumeIncrement stencil; no color outputRender front facesOnly where stencil = 0; write colorRender back facesClear stencil; no color output

So how do we accomplish that using stencil testing? Let's consider the case when the viewer is outside of the light's volume. The stencil is initially clear.11Depth culling of lightsAssuming viewer is outside the light volumeStart with stencil = 0Render back faces of light volumeIncrement stencil; no color outputRender front facesOnly where stencil = 0; write colorRender back facesClear stencil; no color output

We start by writing the value of one into the stencil by using the back faces of the light volume. This will result in the stencil being set where the light is completely in front of the surface. Therefore we only want to render where the stencil is zero 12Depth culling of lightsAssuming viewer is outside the light volumeStart with stencil = 0Render back faces of light volumeIncrement stencil; no color outputRender front facesOnly where stencil = 0; write colorRender back facesClear stencil; no color output

and we do so using the front faces with stencil testing enabled.

Note that this is a vanilla version of the algorithm, and you may be using an optimized one.13Culling with BRDFsPack the culling bit and BRDF togetherUse masks to read/affect required partsAssuming 8 supported BRDFs:UnusedBRDF IDCulling bit76543210culling_mask = 0x01brdf_mask = 0x0Ebrdf_shift = 1Extending this idea to selectively rendering multiple shading models, we need to pack both the culling bit and the shading model identifier in the stencil buffer.

Because stencil testing supports read and write masks, we can act upon and affect portions of the stencil value.

Heres a sample layout assuming a maximum of eight supported BRDFs. Note that the BRDF bits can be placed at any offset in the byte.14Light volume rendering passesRender back facesIncrement stencil; no color outputFor each BRDF:Render front faceshi-stencil ref = brdf_id

solving some common problems in a modern deferred rendering engine

Documents

deferred rendering pros

ms ps31

used model

graphics rendering

impracticaldeferred

lighting shader

deferred branching

shading model id