Optimized Effects for Mobile Devices Ed Plowman, Director of Performance Analysis, ARM Stacy Smith, Senior Software Engineer, ARM

Post on 13-Jul-2020


Page 1: Optimized Effects for Mobile Devices · Nizar Romdhane, Director of Ecosystem, ARM . ARM #1124 in-booth Educational Theater Over 30 talks from ARM and partners such as EpicGames,

Optimized Effects for Mobile Devices

Ed Plowman, Director of Performance Analysis, ARM

Stacy Smith, Senior Software Engineer, ARM

Grab Your Crystal Balls

Top 3 questions I get asked:

Q. What does the future of mobile content look like?

A. That depends on how much GPU capability you have.

Q. How much performance will content developers need?

A. As much as you can give them!

Q. When will mobile reach console quality?

Well, let's take a look at that and see if we can answer the others along the way…

Mobile GPU Compute Year On Year

How long before Desktop GPU compute is in Mobile?

[Chart: mobile GPU compute in GFLOPS/sec (axis 0 to 6000) by year, 2006 to 2018, with PS3 and Xbox 360 marked for reference.]

Mobile GPU BW Growth Year on Year

How long before Desktop GPU Bandwidth is seen in Mobile?

[Chart: bandwidth in Gigabytes/sec (axis 0 to 400) by year, 2006 to 2018. Labels: PS3, Xbox 360, State of the Art, Mobile.]

Why is BW Not Progressing as Fast?

Simple… Power!

I did the graph, but desktop GPU power is too horrific!

Desktop = 170 Watts to >300 Watts… that’s just the GPU!

Console = 80-100 Watts (CPU/GPU/WiFi/Network)

Mobile Platform = 3 - 7 Watts (CPU/GPU/Modem/WiFi)!

How to Get 100 Watts of Work from 3 Watts?

“I believe the sign of maturity is accepting deferred gratification.” - Peggy Cahn

Five main suppliers of GPU tech in mobile

Three are deferred renderers

And those three make up >90% of the volume

This is not a coincidence!

Deferred rendering is the most efficient GPU tech for Mobile

Efficiency of BW, HW resource and Power

Getting the most from it requires slightly different thinking…

Thinking in a Deferred World…

Minimize draw calls and state changes

Draw calls/API calls are not free; use them wisely

Grouping draw calls with like state = good

But… don’t go crazy

Large object batches with high potential occlusion can be costly

Remember those vertices still need processing

Draw Target Bind/unbind on each draw call = bad

Seen (disappointingly) in a lot of commercial engines

Can cause flush and reload cycles of tile/cache memory

Bind it once, issue all draw calls, unbind it…

Hint: Take a look at the use of glDiscardFramebufferEXT()

Tells the driver that the contents of a render attachment are no longer needed
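The grouping advice on this slide can be sketched on the CPU side as follows. This is only an illustration under stated assumptions, not an engine design: `DrawItem`, `byState`, `countStateChanges` and `groupByState` are hypothetical names, and `stateKey` is assumed to combine whatever program, texture and blend settings the renderer tracks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record for one queued draw. stateKey is assumed to encode
// the shader program, textures and blend state that the draw needs.
struct DrawItem {
    std::uint32_t stateKey;
    int meshId;
};

// Ordering used to bring draws with like state next to each other.
inline bool byState(const DrawItem& a, const DrawItem& b) {
    return a.stateKey < b.stateKey;
}

// Counts how often state would have to be (re)bound if the queue were
// submitted in its current order. The first draw always binds once.
inline int countStateChanges(const std::vector<DrawItem>& queue) {
    int changes = 0;
    for (std::size_t i = 0; i < queue.size(); ++i) {
        if (i == 0 || queue[i].stateKey != queue[i - 1].stateKey) {
            ++changes;
        }
    }
    return changes;
}

// Groups draws with like state. stable_sort keeps the original submission
// order inside each group.
inline void groupByState(std::vector<DrawItem>& queue) {
    std::stable_sort(queue.begin(), queue.end(), byState);
    assert(std::is_sorted(queue.begin(), queue.end(), byState));
}
```

A queue that alternates between two states needs one state change per draw before grouping and only two afterwards. The warning about very large batches still applies: grouping reduces API overhead, but every vertex in the group is still processed.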

Use Vertex Buffer Objects

Client side vertex buffers use copy on write (CoW) on each Draw Call

VBOs don’t, so they provide a considerable performance increase

Avoid dynamic VBO or IBO updates using glBufferSubData()

Multiple Render Targets (New for OpenGL® ES 3.0)

Very efficient on deferred GPU

Make sure the total bits per fragment across all targets fits “in tile” for maximum performance

Different criteria for each GPU provider
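A quick way to check the bits-per-fragment point is to add up the attachment formats before committing to a render target layout. The sketch below is illustrative: `Attachment`, `totalBitsPerFragment` and `fitsInTile` are hypothetical helpers, and the tile budget is a caller-supplied assumption because, as the slide says, the limit differs per GPU provider.

```cpp
#include <cassert>
#include <vector>

// One colour attachment and the bits it stores per fragment,
// for example RGBA8 = 32 and RGBA16F = 64.
struct Attachment {
    const char* name;
    int bitsPerFragment;
};

// Sums the bits each fragment needs across all render targets.
inline int totalBitsPerFragment(const std::vector<Attachment>& targets) {
    int total = 0;
    for (const Attachment& t : targets) {
        assert(t.bitsPerFragment > 0);
        total += t.bitsPerFragment;
    }
    return total;
}

// True when the combined targets fit the per-fragment tile budget.
// tileBudgetBits is an assumed, vendor-specific figure, not a quoted one.
inline bool fitsInTile(const std::vector<Attachment>& targets, int tileBudgetBits) {
    return totalBitsPerFragment(targets) <= tileBudgetBits;
}
```

With an assumed budget of 128 bits, two 32-bit targets plus one 64-bit target fit exactly, and adding a fourth target does not.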

Avoiding Blocking Behaviours

Deferred GPUs use a pipeline

glReadPixels(), glCopyTexImage2D(), glTexSubImage2D() = bad…

If you must use glReadPixels(), use PBOs

Use an FBO instead of glCopyTexImage2D()

Also Occlusion Query (OpenGL ES 3.0) - Results delayed by 1-2 frames

Busy waiting on an occlusion query is a bad idea!

Build Command    Vertex Shading    Fragment Shading
Frame N          [-----]           [-----]
Frame N+1        Frame N           [-----]
Frame N+2        Frame N+1         Frame N
Frame N+1        [-----]           [-----]
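One way to avoid busy waiting is to consume each occlusion result a fixed number of frames after it was issued and to treat objects as visible until a result exists. The sketch below simulates that bookkeeping on the CPU only. `DelayedQueryRing` and its members are hypothetical, and the `gpuResult` argument stands in for what a real query object would eventually report.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One in-flight query. issuedFrame is -1 until the slot has been used.
struct QuerySlot {
    int issuedFrame;
    bool visible;
};

// Ring of query slots that hands back the result issued `latency` frames
// ago, so the caller never stalls waiting on the newest query.
class DelayedQueryRing {
public:
    explicit DelayedQueryRing(int latencyFrames)
        : latency_(latencyFrames),
          slots_(static_cast<std::size_t>(latencyFrames) + 1) {
        assert(latencyFrames >= 1);
        for (QuerySlot& s : slots_) {
            s.issuedFrame = -1;
            s.visible = true;
        }
    }

    // Records the query issued this frame. gpuResult is a stand-in for the
    // value the GPU would report once the frame has drained the pipeline.
    void issue(int frame, bool gpuResult) {
        assert(frame >= 0);
        QuerySlot& s = slots_[index(frame)];
        s.issuedFrame = frame;
        s.visible = gpuResult;
    }

    // Returns the result from `latency` frames ago. Falls back to "visible"
    // when no result that old exists yet, which is the conservative choice.
    bool lastKnownVisible(int frame) const {
        const int wanted = frame - latency_;
        if (wanted < 0) {
            return true;
        }
        const QuerySlot& s = slots_[index(wanted)];
        return s.issuedFrame == wanted ? s.visible : true;
    }

private:
    std::size_t index(int frame) const {
        return static_cast<std::size_t>(frame) % slots_.size();
    }

    int latency_;
    std::vector<QuerySlot> slots_;
};
```

The cost of this design is that visibility decisions lag by the chosen latency, which matches the 1-2 frame delay noted on the slide.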

Make Every Access Count

Think about “cacheability” of data

De-interleave vertex data

Think about representation

Do you really need FP32 per component for texture coordinates that address a 512x512 texture?
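To put a number on the representation question, the sketch below stores a texture coordinate as a 16-bit unsigned normalized value instead of FP32 and reports the worst-case rounding error in texels. The helper names are hypothetical, and the choice of 16-bit normalized storage is an assumption made for illustration; the slide does not prescribe a format.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Packs a texture coordinate in [0, 1] into 16 bits (unsigned normalized).
inline std::uint16_t packUnorm16(float v) {
    assert(v >= 0.0f && v <= 1.0f);
    return static_cast<std::uint16_t>(std::lround(v * 65535.0f));
}

// Recovers the approximate coordinate from its 16-bit form.
inline float unpackUnorm16(std::uint16_t q) {
    return static_cast<float>(q) / 65535.0f;
}

// Worst-case rounding error of the 16-bit form, expressed in texels of a
// texture that is textureSize texels wide.
inline float worstCaseErrorInTexels(int textureSize) {
    assert(textureSize > 0);
    return 0.5f / 65535.0f * static_cast<float>(textureSize);
}
```

For a 512-texel dimension the worst-case error is well under a hundredth of a texel, while a two-component coordinate shrinks from 8 bytes to 4. In OpenGL ES such data can be supplied as `GL_UNSIGNED_SHORT` with the normalized flag set in `glVertexAttribPointer`.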

[Diagram: one vertex = X Y Z W | RGB | TexCord.
Interleaved layout: cache line 1 = Vert 0, Vert 1; cache line 2 = Vert 2, Vert 3.
De-interleaved layout: cache line 1 = Vert 0 to Vert 3 (XYZW only); cache line 2 = RGB values only; cache line 3 = TexCord values only.]
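The de-interleaved layout from the diagram can be produced by a small conditioning step such as the one below. It is a sketch with hypothetical types (`InterleavedVertex`, `VertexStreams`); the attribute sizes follow the X Y Z W, RGB, TexCord breakdown on the slide, and the exact component types are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One vertex as stored in an interleaved buffer.
struct InterleavedVertex {
    float position[4];      // X Y Z W
    std::uint8_t color[3];  // RGB
    float texCoord[2];      // texture coordinate
};

// The same data split into one tightly packed stream per attribute.
struct VertexStreams {
    std::vector<float> positions;
    std::vector<std::uint8_t> colors;
    std::vector<float> texCoords;
};

// Copies each attribute into its own stream, so a pass that reads only
// positions touches only position data.
inline VertexStreams deinterleave(const std::vector<InterleavedVertex>& vertices) {
    VertexStreams out;
    out.positions.reserve(vertices.size() * 4);
    out.colors.reserve(vertices.size() * 3);
    out.texCoords.reserve(vertices.size() * 2);
    for (const InterleavedVertex& v : vertices) {
        out.positions.insert(out.positions.end(), v.position, v.position + 4);
        out.colors.insert(out.colors.end(), v.color, v.color + 3);
        out.texCoords.insert(out.texCoords.end(), v.texCoord, v.texCoord + 2);
    }
    assert(out.positions.size() == vertices.size() * 4);
    return out;
}
```

Each stream can then be uploaded to its own buffer (or its own region of one buffer) so that cache lines fetched for one attribute are not diluted by the others.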

Compress, Compress, Compress!

[Chart: PSNR (dB), axis 25 to 55, against compression rate (bpp): 8, 5.12, 3.56, 2, 1.28, 0.89.]

ASTC = Adaptive Scalable Texture Compression

New texture compression standard developed by ARM, adopted by Khronos

KHR_texture_compression_astc_ldr for OpenGL ES and OpenGL

Increased quality and fidelity at low bit-rates

Expansive range of input formats offers complete flexibility

Choice of base format, 2D and 3D plus addition of HDR formats
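The bit rates quoted on this slide follow directly from the ASTC block layout: every block occupies 128 bits, so the rate is 128 divided by the number of texels in the block footprint. The helper names below are hypothetical; the 128-bit block size is part of the ASTC format.

```cpp
#include <cassert>

// Every ASTC block occupies 128 bits regardless of its footprint.
constexpr int kAstcBlockBits = 128;

// Bits per pixel for a 2D block footprint, e.g. 4x4 -> 8 bpp.
inline double astcBitsPerPixel(int blockWidth, int blockHeight) {
    assert(blockWidth > 0 && blockHeight > 0);
    return static_cast<double>(kAstcBlockBits) / (blockWidth * blockHeight);
}

// Compressed size in bytes of one mip level. Partial blocks at the right
// and bottom edges still cost a whole block.
inline long astcCompressedBytes(int width, int height, int blockWidth, int blockHeight) {
    assert(width > 0 && height > 0 && blockWidth > 0 && blockHeight > 0);
    const long blocksX = (width + blockWidth - 1) / blockWidth;
    const long blocksY = (height + blockHeight - 1) / blockHeight;
    return blocksX * blocksY * (kAstcBlockBits / 8);
}
```

Footprints of 4x4, 5x5, 6x6, 8x8, 10x10 and 12x12 give 8, 5.12, 3.56, 2, 1.28 and 0.89 bpp respectively, which are the rates on the chart axis.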

Compression in the Pre-ASTC World

[Chart: input color formats against compressed bits/pixel (1 to 8), showing which pre-ASTC formats cover which cases.]

Input color format    Input bits/pixel
L                     8
LA                    16
X+Y                   16
HDR L                 16
RGB                   24
XY+Z                  24
RGBA                  32
RGB+A                 32
HDR X+Y               32
HDR RGB               48
HDR XY+Z              48
HDR RGBA              64
HDR RGB+A             64

Labels on the chart: ETC, BC5; ETC, BC4; BC7; PVRTC (twice); ETC, BC1; BC6; ETC, BC2; BC3, BC7; All Major Players.

ASTC Choices

[Chart: input color formats against compressed bits/pixel (1 to 8), labelled “All ASTC”.]

Input color format    Input bits/pixel
L                     8
LA                    16
X+Y                   16
HDR L                 16
RGB                   24
XY+Z                  24
RGBA                  32
RGB+A                 32
HDR X+Y               32
HDR RGB               48
HDR XY+Z              48
HDR RGBA              64
HDR RGB+A             64

Look it up or Calculate it?

You would be surprised what you can get done in a cycle…

precision mediump float;

varying vec4 detailtc_envtc, bumptrans;
uniform sampler2D dettex, envtex, colormap;
uniform float color_param, bumpstrength;

void main()
{
    vec4 bt = bumptrans;
    vec2 bt_crossmul = bt.xy * bt.wz;
    // Diffuse falloff term, clamped at zero
    float diffuse = max(0.0, bt_crossmul.x - bt_crossmul.y);
    // Detail texture fetch; its channels feed the lookups below
    vec4 bump_cr = texture2D(dettex, detailtc_envtc.xy);
    // Bump offsets scaled by the adjustable strength
    vec4 tbump = bt * bumpstrength * bump_cr.xyxy;
    vec2 envtc = tbump.xy + tbump.zw + detailtc_envtc.zw;
    // Paletted color lookup; color_param is the second lookup coordinate
    vec4 col = texture2D(colormap, vec2(bump_cr.z, color_param));
    vec4 env = texture2D(envtex, envtc);
    // Environment contribution is weighted by bump_cr.w
    gl_FragColor = col * diffuse + env * bump_cr.w;
}

Shader Features:

• Paletted color mapping

• Environment mapping

• Bump mapping

• Variable reflectance mapping

• Diffuse falloff of the texture color

• Adjustable bump map strength

• Adjustable color table

Mali-T600 series = 3 Cycles

GDC 2012 Demo: Timbuktu

Asset Conditioning

Cross platform - desktop & mobile

Desktop build - caching

Mobile build - loads caches

Asset pipeline - utility functions

Batching

Deferred immediate mode rendering

glDrawElements and glDrawArrays have an overhead

Fewer draw calls, less overhead.

DrawCall class stitches multiple objects into one draw

Macro functions in shaders make batching as simple as:

vec4 pos = transforms[getInstance()] * getPosition();

uniform mat4 transforms[4];
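The stitching step can be pictured as appending each object's vertices to one shared buffer while tagging every vertex with the index of the transform it should use. The sketch below is an assumption about how such a builder might work, not the demo's actual DrawCall class: `Mesh`, `Batch` and `addToBatch` are hypothetical, and the per-vertex instance id is assumed to be what a `getInstance()` style macro would read in the shader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Source geometry for one object: xyz positions and 16-bit indices.
struct Mesh {
    std::vector<float> positions;
    std::vector<std::uint16_t> indices;
};

// Several objects stitched together so they can go out in one draw call.
struct Batch {
    std::vector<float> positions;
    std::vector<float> instanceIds;       // one per vertex, selects a transform
    std::vector<std::uint16_t> indices;
    int instanceCount;
};

// Appends one object to the batch, rebasing its indices past the vertices
// already present and tagging its vertices with the next instance id.
inline void addToBatch(Batch& batch, const Mesh& mesh) {
    assert(mesh.positions.size() % 3 == 0);
    const std::size_t baseVertex = batch.positions.size() / 3;
    const std::size_t vertexCount = mesh.positions.size() / 3;
    assert(baseVertex + vertexCount <= 65536);  // indices must fit in 16 bits

    batch.positions.insert(batch.positions.end(),
                           mesh.positions.begin(), mesh.positions.end());
    batch.instanceIds.insert(batch.instanceIds.end(), vertexCount,
                             static_cast<float>(batch.instanceCount));
    for (std::uint16_t i : mesh.indices) {
        batch.indices.push_back(static_cast<std::uint16_t>(baseVertex + i));
    }
    ++batch.instanceCount;
}
```

With a four-entry transform array in the shader, a batch built this way would hold at most four objects per draw.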

Object Instancing

Multiple geometries, or single object instances:

for (int i = 0; i < 50; i++)
    drawbuilder.addGeometry(geo1);
drawbuilder.Build();

Can implement LOD switching, when objects are sorted front to back and correctly culled.

Seen in TrueForce
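A minimal sketch of the front-to-back sort with LOD selection might look like this. The names and the distance thresholds are hypothetical; the slide does not say how the demo chooses its LOD levels.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One object instance: its distance from the camera and the LOD chosen for it.
struct Instance {
    float distance;
    int lod;
};

// Returns 0 for the most detailed model and a higher number for each
// threshold the instance lies beyond. lodDistances must be ascending.
inline int selectLod(float distance, const std::vector<float>& lodDistances) {
    assert(std::is_sorted(lodDistances.begin(), lodDistances.end()));
    int lod = 0;
    for (float limit : lodDistances) {
        if (distance >= limit) {
            ++lod;
        }
    }
    return lod;
}

// Sorts instances nearest first, then assigns each one its LOD.
inline void sortFrontToBackWithLod(std::vector<Instance>& instances,
                                   const std::vector<float>& lodDistances) {
    std::sort(instances.begin(), instances.end(),
              [](const Instance& a, const Instance& b) {
                  return a.distance < b.distance;
              });
    for (Instance& inst : instances) {
        inst.lod = selectLod(inst.distance, lodDistances);
    }
}
```

Because the list ends up ordered by distance, instances sharing a LOD are contiguous and can be added to the same batch.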

Special Effects

Special Effects: Bloom

When considering uses of greater colour resolution, the first thought was HDR and Bloom.

But how to do bloom without the HDR images?

Render to low res FBO mapped to texture

Value filter and blur in 1st post-processing pass, onto second FBO texture

Sample vertical blur in second pass, then apply to full resolution frame buffer
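The two passes can be tried out on the CPU with a tiny brightness image. This is a sketch under assumptions: the 3-tap kernel weights, the clamp-to-edge addressing and all names (`Image`, `brightPass`, `blur3`, `bloomPasses`) are illustrative, and the first-pass blur is assumed to be horizontal because the second is described as vertical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Small single-channel image standing in for the low resolution FBO texture.
struct Image {
    int width;
    int height;
    std::vector<float> pixels;  // one brightness value per pixel, row-major

    // Reads a pixel with clamp-to-edge addressing.
    float at(int x, int y) const {
        x = x < 0 ? 0 : (x >= width ? width - 1 : x);
        y = y < 0 ? 0 : (y >= height ? height - 1 : y);
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Value filter: keeps only pixels brighter than the threshold.
inline Image brightPass(const Image& src, float threshold) {
    Image out = src;
    for (float& p : out.pixels) {
        if (p <= threshold) {
            p = 0.0f;
        }
    }
    return out;
}

// 3-tap blur along one axis: (dx, dy) = (1, 0) is horizontal, (0, 1) vertical.
inline Image blur3(const Image& src, int dx, int dy) {
    assert(src.pixels.size() == static_cast<std::size_t>(src.width) * src.height);
    Image out = src;
    for (int y = 0; y < src.height; ++y) {
        for (int x = 0; x < src.width; ++x) {
            out.pixels[static_cast<std::size_t>(y) * src.width + x] =
                0.25f * src.at(x - dx, y - dy) +
                0.5f * src.at(x, y) +
                0.25f * src.at(x + dx, y + dy);
        }
    }
    return out;
}

// Pass 1: value filter plus horizontal blur. Pass 2: vertical blur.
inline Image bloomPasses(const Image& lowRes, float threshold) {
    const Image pass1 = blur3(brightPass(lowRes, threshold), 1, 0);
    return blur3(pass1, 0, 1);
}
```

The result would then be added onto the full resolution frame. Splitting the blur into two one-dimensional passes keeps the sample count per pixel proportional to the kernel width rather than its square.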

Special Effects: Depth of Field

16bit depth buffers as textures opened the possibility of a variable blur for depth of field

But how to do it without 16bit textures?

[Diagram labels: Additive, Mix, Bloom]
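However the depth value reaches the shader, the variable blur itself can be thought of as a per-pixel mix between the sharp frame and a blurred copy, driven by distance from the focal plane. The sketch below is an assumption about that blend, not the demo's code: `blurAmount`, `depthOfField`, `focusDepth` and `focusRange` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// 0 at the focal plane, rising linearly to 1 once the pixel is focusRange
// or more away from it.
inline float blurAmount(float depth, float focusDepth, float focusRange) {
    assert(focusRange > 0.0f);
    return std::min(1.0f, std::fabs(depth - focusDepth) / focusRange);
}

// Equivalent of GLSL mix(sharp, blurred, amount) for one channel.
inline float depthOfField(float sharp, float blurred, float depth,
                          float focusDepth, float focusRange) {
    const float amount = blurAmount(depth, focusDepth, focusRange);
    return sharp + (blurred - sharp) * amount;
}
```

A pixel on the focal plane keeps the sharp value, and a pixel halfway out of the focus range gets an even blend of the two.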

Special Effects: Terrain Mapping

Uniform Buffers and Vertex IDs can be used to implement tessellated mesh subdivision

But how can this be approximated without the buffers or IDs?

SIGGRAPH 2012 Demo: Timbuktu 2

Timbuktu 2: Extended features

OpenGL® ES 3.0!

3D textures

Shadow comparison

16 bit depth textures

HDR lighting

3D Textures

Give more definition to deformed track

3D Textures mipmap in all 3 dimensions

Instead used 2D Texture arrays

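[Diagram: the third dimension is emulated with texture2DArray. Sample the layers at floor(z) and ceil(z) to get t1 and t2, then blend with mix(t1, t2, frac), where frac = fract(z).]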
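The floor/ceil/fract blend can be checked with a CPU stand-in in which each array layer has already been sampled at the same (u, v). `sampleLayers` is a hypothetical helper, and clamping z to the valid layer range is an added assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// layerSamples[i] is the value fetched from array layer i at a fixed (u, v).
// z is a continuous layer coordinate; the result blends the two nearest layers.
inline float sampleLayers(const std::vector<float>& layerSamples, float z) {
    assert(!layerSamples.empty());
    const float maxLayer = static_cast<float>(layerSamples.size() - 1);
    z = std::max(0.0f, std::min(z, maxLayer));

    const std::size_t lower = static_cast<std::size_t>(std::floor(z));
    const std::size_t upper = static_cast<std::size_t>(std::ceil(z));
    const float frac = z - std::floor(z);  // same as GLSL fract(z) for z >= 0

    // Same as GLSL mix(t1, t2, frac).
    return layerSamples[lower] + (layerSamples[upper] - layerSamples[lower]) * frac;
}
```

Unlike a true 3D texture, this blends only between whole layers, so each layer keeps its own 2D mip chain.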

Shadow Mapping

Depth rendered to FBO from viewpoint of light

Projected onto scene to compare to distance

Doing this in the shader yields some interesting results

Shadow Comparison Texture

[Diagram: OpenGL ES 2.0 Texture Compare shows one Interpolate stage and one Compare stage. OpenGL ES 3.0 Shadow Texture Compare shows four Compare stages feeding one Interpolate stage.]
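The practical difference between the two orderings can be seen with four neighbouring depth texels on the CPU. This is a sketch with hypothetical names; it assumes bilinear weights `fx` and `fy` and a "lit when the reference depth is not behind the stored depth" comparison.

```cpp
#include <cassert>

// Four neighbouring shadow map texels (stored depths as seen from the light).
struct DepthQuad {
    float t00, t10, t01, t11;
};

// Standard bilinear blend of four values.
inline float bilinear(float a00, float a10, float a01, float a11, float fx, float fy) {
    const float top = a00 + (a10 - a00) * fx;
    const float bottom = a01 + (a11 - a01) * fx;
    return top + (bottom - top) * fy;
}

// Depths are filtered first, then compared once: the result is 0 or 1.
inline float interpolateThenCompare(const DepthQuad& q, float fx, float fy, float ref) {
    assert(fx >= 0.0f && fx <= 1.0f && fy >= 0.0f && fy <= 1.0f);
    const float depth = bilinear(q.t00, q.t10, q.t01, q.t11, fx, fy);
    return ref <= depth ? 1.0f : 0.0f;
}

// Each texel is compared first, then the 0/1 results are filtered:
// the result is a fractional amount of light along shadow edges.
inline float compareThenInterpolate(const DepthQuad& q, float fx, float fy, float ref) {
    assert(fx >= 0.0f && fx <= 1.0f && fy >= 0.0f && fy <= 1.0f);
    const float l00 = ref <= q.t00 ? 1.0f : 0.0f;
    const float l10 = ref <= q.t10 ? 1.0f : 0.0f;
    const float l01 = ref <= q.t01 ? 1.0f : 0.0f;
    const float l11 = ref <= q.t11 ? 1.0f : 0.0f;
    return bilinear(l00, l10, l01, l11, fx, fy);
}
```

On an edge where two texels are in shadow and two are lit, the first ordering snaps to fully lit or fully shadowed, while the second returns a partial value that softens the edge.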

16 Bit Depth Buffers

Also used for:

Soft Particles

Better fidelity of DOF
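For soft particles, the usual idea is to fade a particle out as it approaches the scene geometry behind it, using the depth texture to measure the gap. The sketch below shows that fade factor only; `softParticleFade` and `fadeDistance` are hypothetical, and the linear ramp is an assumption rather than the demo's exact formula.

```cpp
#include <algorithm>
#include <cassert>

// Returns 0 where the particle touches the scene and 1 once it is at least
// fadeDistance in front of it. Depths are assumed to be in the same linear units.
inline float softParticleFade(float sceneDepth, float particleDepth, float fadeDistance) {
    assert(fadeDistance > 0.0f);
    const float gap = (sceneDepth - particleDepth) / fadeDistance;
    return std::max(0.0f, std::min(gap, 1.0f));
}
```

Multiplying the particle's alpha by this factor removes the hard line where a billboard intersects the ground or a wall.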

Particle Lighting

Displacement Mapping:

Texture offset to X and Y coords

Distortion strength increases over time

HDR Lighting

RGBA 10 10 10 2 format used

Everything gets normalised

Bright spots need to stand out

Make everything else darker!

Set exposure in post processing
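The exposure step and the 10 10 10 2 storage can be sketched together. Everything here is illustrative: the helper names are hypothetical, the exposure is a plain scale-and-clamp rather than the demo's actual tone mapping, and the bit layout assumed is the one used by `GL_UNSIGNED_INT_2_10_10_10_REV` (red in the lowest 10 bits, alpha in the top 2).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Converts a value in [0, 1] to an unsigned normalized integer in [0, maxValue].
inline std::uint32_t toUnorm(float v, std::uint32_t maxValue) {
    const float clamped = std::max(0.0f, std::min(v, 1.0f));
    return static_cast<std::uint32_t>(std::lround(clamped * static_cast<float>(maxValue)));
}

// Packs a colour into 10 bits each for R, G, B and 2 bits for A.
inline std::uint32_t packRgb10A2(float r, float g, float b, float a) {
    return toUnorm(r, 1023u) |
           (toUnorm(g, 1023u) << 10) |
           (toUnorm(b, 1023u) << 20) |
           (toUnorm(a, 3u) << 30);
}

// Scales a lighting value by the exposure and clamps it to the displayable
// range. An exposure below 1 darkens the frame so bright spots stand out.
inline float applyExposure(float hdrValue, float exposure) {
    assert(exposure > 0.0f);
    return std::min(hdrValue * exposure, 1.0f);
}
```

With 10 bits per channel there are 1024 levels instead of 256, which leaves more headroom for darkening the bulk of the scene without visible banding.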

Main conference ARM Sponsored Sessions

Thursday, March 28th

10:00 – 11:00am

Room 3022

Optimized Effects for Mobile Devices Stacy Smith, Senior Software Engineer, ARM

Ed Plowman, Director of Performance Analysis, ARM

1:00 – 2:00pm

Room 3016

The Future of Mobile Gaming

PANEL SESSION Moderator: Jason Della Rocca, Co-founder of Execution Labs

Baudouin Corman, Vice President of Publishing, Americas for Gameloft

David Helgason, CEO, Unity

Dr Chris Doran, Founder & COO, Geomerics

Niccolo De Masi, CEO, Glu

Jasper Smith, CEO, PlayJam

Nizar Romdhane, Director of Ecosystem, ARM

ARM #1124 in-booth Educational Theater

Over 30 talks from ARM and partners such as EpicGames, Havok, PlayJam, Geomerics, Softkinetics, Metaio, Marmalade

20 minute length talks with Q&A at the end

An Android tablet prize draw at each session

Summary and videos of all Educational Theater Talks at

http://malideveloper.arm.com/gdc2013

malideveloper.arm.com

Thank you

Any questions?