Post on 17-Jun-2020

Page 1: Tips and lessons - WordPress.com...Tips and lessons - Optimizing CPU usage - Optimizing GPU usage - Optimizing Memory usage - Optimizing Battery usage - Optimizing all things Today

T&L: Tips and lessons for profiling and optimizing mobile games

Hello everybody, and welcome to this talk called “Tips and lessons for profiling and optimizing mobile games”.

Tips and lessons

- Optimizing CPU usage

- Optimizing GPU usage

- Optimizing Memory usage

- Optimizing Battery usage

- Optimizing all things

Today I’m going to talk about some of the things we did to optimize CPU usage, GPU usage, memory and battery and, if we have time, some things related to data, code and workflow in general.

Who am I: Mihai Sebea

- Studio Technical Coordinator @ Gameloft Barcelona
- 14+ years in games
- Worked on:
  - Nova 1 & 2
  - Asphalt 8
  - Asphalt 9
  - Despicable Me: Minion Rush

Twitter: @mihai_sebea

But first, for those who don’t know me: my name is Mihai Sebea and I’m Studio Technical Coordinator at Gameloft Barcelona. I’ve been making games for as long as I can remember, and professionally for more than 14 years. I started off by porting Java games to various platforms. I worked on Nova 1 & 2, Despicable Me: Minion Rush, The Adventures of Tintin, Asphalt 8 and, recently, the critically acclaimed Asphalt 9.

Performance

- Make your program correct and stable
- Make your program fast
- Add more features
- Design your systems
- Convince managers that performance is a feature

Before we head into the technical details of profiling and optimizing we first need to have a small chat about Performance.

We often hear the phrase “premature optimization is the root of all evil”, and I have a feeling that some people understand it as “we should not optimize code unless it’s really, REALLY slow”.

I would interpret it more along the lines of “do NOT sacrifice everything in the name of performance” (and by everything I mean readability, scalability, maintainability and the other -ilities that might be important for your project).

We could also read it as “do not optimize your code until it’s correct and stable”.

Considering this, the best approach is to think about performance as a feature. Start off small. Make your application work and not crash, first and foremost.

There is no point in optimizing something if you don’t have the correct results or if the application crashes.

After each round of adding more features and bugs (I meant bug fixes), schedule a round of profiling and optimizing; otherwise your application will tend to become slower and bigger and to use more memory.

Each time you do one of these rounds you will get a better understanding of how your systems work, how data is passed from one part to another and how it’s transformed. Try to schedule some time here to redesign your systems and your APIs to allow for better performance.

Managers might not be open to the idea of spending time rewriting code just to get the same results, so gather some data on how the redesign will help in the future and what it means if a certain issue is left unresolved in the long run!

For example, we noticed for some games that for every 100MB we saved from the application’s size on disk, we got 100K more users installing the app.

We also saw that each update where we added a new level was adding about 200MB of data.

It was quite obvious that this would not scale very well.

This gave us the time to look into how we could optimize package size and how we could better design our levels so they use less disk space.

Don’t get discouraged if you don’t succeed on the first try. Gather more data, find a better time to expose the issue and the solution.

First rule of profiling?

- Establish a baseline
- Gather data
- Make a hypothesis
- Test the hypothesis
- Measure results
- Rinse and repeat

What things can we profile?

CPU, GPU, Memory, Workflow

So you got some time for performance improvements. Where do we start?

Establish a baseline so you can MEASURE whether your changes go in the right direction. A baseline means making sure you have the exact same conditions every time. For example, always profile the same level, or the same spot where the issue happens. For mobile devices, make sure you profile on the same physical device. (Be careful: some manufacturers sell devices under the same name with different CPUs and GPUs, and therefore very different performance characteristics.) Make sure your device is not throttled.

- You can add a visual indicator in the game to know when throttling happens.
- Submerge the device in a glass of water if it gets too hot.
  - I’m kidding... don’t do that unless the device is water resistant.
- Try profiling in airplane mode so other apps won’t interfere with profiling.
- Note that these create ideal conditions, but you will have to be prepared for far less than ideal ones.
  - Some platforms support putting your device in a permanently throttled state, so you can profile in the worst conditions too and make sure your application handles them properly.

After you establish a baseline, start gathering data. Based on the data gathered, ALWAYS BASED ON the data, make a hypothesis. Do NOT, I repeat, do NOT make a hypothesis without data! Test that hypothesis by making changes, and measure the results of those changes. Continue until a certain target is reached (usually 33ms per frame, i.e. 30fps).

Let’s see next what can stop you from reaching that target. Usually you are limited by either the CPU or the GPU.

As a side note, always use milliseconds to describe the performance of your system. It’s much clearer to say “this system takes 6ms to do its job” than “this system costs 2fps”, since you don’t know how many fps you had to begin with.
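To make the last point concrete, here is a small C++ sketch (the helper names are ours, not from the talk) that converts frame rates to frame times. The same 2fps drop costs about 0.57ms at 60fps but about 2.38ms at 30fps, which is why a millisecond figure is unambiguous while an fps delta is not. Milliseconds also add up across systems, which fps values do not.

```cpp
#include <cassert>

// Frame time in milliseconds for a given frame rate (30fps -> ~33.3ms).
constexpr double frameMs(double fps) {
    return 1000.0 / fps;
}

// Extra milliseconds per frame when the frame rate drops from `fromFps` to `toFps`.
// The result depends on the starting frame rate, not just on the fps difference.
double costOfFpsDropMs(double fromFps, double toFps) {
    assert(fromFps > 0.0 && toFps > 0.0);
    return frameMs(toFps) - frameMs(fromFps);
}

// True if a set of per-system timings (in ms) fits in the frame budget for `targetFps`.
bool fitsBudget(const double* systemMs, int count, double targetFps) {
    double total = 0.0;
    for (int i = 0; i < count; ++i) total += systemMs[i];
    return total <= frameMs(targetFps);
}
```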

CPU Profiling

- ALWAYS profile in Release with all optimizations enabled
- Read your compiler’s documentation
- Experiment with different optimization flags (-Os / -O3 / -Oz)
- Use the profiler
- Check all the threads
- Double check your thread priority

Now, before we start with CPU profiling, make sure that you:

- Compiled the executable with the best flags for your release configuration.
- Read the compiler’s documentation to see which flags it supports and what they do.
  - When you upgrade your compiler, check it again, as things might have changed.
- Experiment with different flags.
  - For Asphalt 9 we noticed that optimizing for size was better than optimizing for speed, as it resulted in a smaller but faster binary.
- Use link-time optimization if possible. It usually results in a smaller, faster binary.

First lesson: always use a profiler.

- Don’t look at a frame-time graph; it will not give you any information except that at some point you had a problem.
- Don’t stare at the code until it confesses; it usually doesn’t.

Second lesson:

- If you see some functions randomly taking more time than they should, check what the other threads are doing. You usually have far more threads than CPU cores, so some threads might be preempted so that other threads get a chance to be scheduled on a given core.
- Make sure threads that are not critical have a lower priority so they don’t preempt your main thread.
- If you support multiple platforms and have threads with lower or higher priority, double check each platform’s documentation on what a thread priority value represents, as it might mean different things on different platforms.
- On Asphalt 9 we had exactly such a bug, because we assumed that iOS thread priority values are the same as on Linux.
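One way to avoid baking platform-specific numbers into the code is to express priorities in a normalized range and map them onto whatever range the platform reports at runtime. The C++ sketch below assumes a POSIX threads platform (iOS, macOS, Linux and Android all provide one); `mapPriority` and `setCurrentThreadPriority` are hypothetical helpers, not engine code. It illustrates the trap rather than solving it everywhere: on Linux the default `SCHED_OTHER` policy reports a priority range of 0 to 0, so the value that actually matters there is the nice value, while Apple platforms report a wider range for the same policy.

```cpp
#include <cassert>
#include <cmath>
#include <pthread.h>
#include <sched.h>

// Map a normalized priority in [0, 1] onto an integer range [lo, hi].
int mapPriority(double normalized, int lo, int hi) {
    assert(lo <= hi);
    if (normalized < 0.0) normalized = 0.0;
    if (normalized > 1.0) normalized = 1.0;
    return lo + static_cast<int>(std::lround(normalized * (hi - lo)));
}

// Apply a normalized priority to the calling thread under its current policy,
// using the range this platform reports for that policy.
// Returns 0 on success, -1 if the range query fails, or the pthreads error code.
int setCurrentThreadPriority(double normalized) {
    int policy = 0;
    sched_param param{};
    int rc = pthread_getschedparam(pthread_self(), &policy, &param);
    if (rc != 0) return rc;
    const int lo = sched_get_priority_min(policy);
    const int hi = sched_get_priority_max(policy);
    if (lo == -1 || hi == -1) return -1;
    param.sched_priority = mapPriority(normalized, lo, hi);
    return pthread_setschedparam(pthread_self(), policy, &param);
}
```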

CPU Profiling: 2 types of profilers

- Statistical profilers (sampling)
  - Very Sleepy, Visual Studio, Orbit Profiler, Instruments
  - Don’t require any changes
  - Useful when you have low fps
  - Might be limited to some platforms
  - Might not catch spikes
- Instrumentation
  - Remotery, MicroProfile, Tracy
  - Can catch spikes
  - Should work on any platform
  - You need to add the blocks manually
  - Adding blocks means recompiling
  - Adds more overhead for displaying the events or sending them over the network

So let’s talk about profilers. There are 2 types: sampling profilers and intrusive profilers (instrumentation).

- A sampling profiler stops the application at regular intervals and records the stack trace to show you where your application spends most of its time.
  - There are quite a few: Very Sleepy, Visual Studio, Orbit Profiler.
  - This type of profiler has the advantage that it doesn’t require any code changes.
  - The disadvantages are that it might not work on all platforms and might not catch occasional spikes.
- Instrumentation requires you to manually add blocks to the functions that you want to profile.
  - The advantages are that it should work on any platform and that it will catch frame spikes.
  - The disadvantages are that you need to add the blocks manually, every change means you need to recompile and deploy, and it adds more overhead, because every block has a cost, plus the cost of processing all the information and displaying it or sending it over the network.

Live Demo

I have a small demo here with a small procedural animation, because we are all programmers, right? :) You can see that from time to time, randomly, the animation stutters. Your mission, should you choose to accept it, is to find out why. Let’s look at it first with a sampling profiler. I will be using Very Sleepy, which is a great little tool. Let’s profile this for a few seconds... We have a lot of function calls here; some of them are interesting and some are not, but we cannot know for sure which call is causing the problem.

And here is where an intrusive profiler becomes useful.

I’m using Tracy here, as it’s very fast, has little overhead and a really nice UI… and you can easily see the problematic code. You might be tempted to say that no programmer would add random sleeps to their code, and you are right: no one would do this intentionally. But we had this exact problem a few weeks ago, when we refactored some code where a thread was supposed to start, then sleep for a configured amount of time before actually doing its job. During the refactor we inadvertently made the main thread sleep for a while. Thanks to instrumentation it was easy to spot and fix the problem.

Now this all seems easy, right? But we actually tried about 5 profilers before we settled on this one. Each one had its share of issues: too much overhead, a buggy UI, or both :) What helped a lot was having some defines that are agnostic to the profiling library, keeping the library itself as an implementation detail. This way we were able to swap out and test multiple profilers and choose the one we liked the most. A few things to keep in mind:

- You NEED to disable ALL profiling code in the retail version!
- Intrusive profilers do have an overhead, so try to keep a few bigger scopes and add more small ones only when needed.
  - If a profiling scope has an overhead of, say, 2µs (which seems very, very little, right?) and you have 5000 of them in a frame, you get 10ms of overhead per frame, which is a third of your frame time.
- Profilers will allocate more memory and probably use a few threads, so keep that in mind as well, as we will talk about memory later on.
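The library-agnostic defines can be sketched roughly as follows, assuming C++17. `PROFILE_SCOPE`, `ProfileScope` and `RETAIL_BUILD` are hypothetical names chosen for this example; the backend here only records durations in memory, and in a real project it is the single place that would forward to Tracy (for instance through its `ZoneScopedN` macro), Remotery or MicroProfile. Defining `RETAIL_BUILD` compiles every scope down to nothing.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical in-memory backend. This is the only code that would change when
// swapping profilers. Not thread-safe: a real backend keeps per-thread buffers.
struct ProfileEvent {
    const char* name;
    int64_t durationNs;
};

inline std::vector<ProfileEvent>& profileEvents() {
    static std::vector<ProfileEvent> events;
    return events;
}

// RAII scope: measures from construction to destruction.
class ProfileScope {
public:
    explicit ProfileScope(const char* name)
        : name_(name), start_(std::chrono::steady_clock::now()) {
        assert(name != nullptr);
    }
    ~ProfileScope() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        profileEvents().push_back(
            {name_, static_cast<int64_t>(
                        std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count())});
    }
    ProfileScope(const ProfileScope&) = delete;
    ProfileScope& operator=(const ProfileScope&) = delete;

private:
    const char* name_;
    std::chrono::steady_clock::time_point start_;
};

// Library-agnostic macro used by game code; __LINE__ makes each variable name unique.
#define PROFILE_CONCAT_IMPL(a, b) a##b
#define PROFILE_CONCAT(a, b) PROFILE_CONCAT_IMPL(a, b)

#if defined(RETAIL_BUILD)
// Retail builds: every scope compiles to nothing.
#define PROFILE_SCOPE(name) ((void)0)
#else
#define PROFILE_SCOPE(name) ProfileScope PROFILE_CONCAT(profileScope_, __LINE__)(name)
#endif
```

Game code then only ever writes something like `PROFILE_SCOPE("Physics");` at the top of a block, so switching profilers touches `ProfileScope` and nothing else.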

CPU optimizations: Update and render in parallel

The next tip would be “parallelize as much as you can”. Usually, if your game is single-threaded, your thread’s time is split between updating the game objects and issuing draw calls. A somewhat easy solution, if your game or rendering engine allows it, is to do the rendering and the update in parallel on 2 different threads (today’s mobile devices usually have at least 2 cores).

Obviously this introduces a frame of delay, but in practice this might not be a huge issue.
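A minimal sketch of such a split, assuming C++17 and `std::thread`: the game thread records draw commands into a command buffer and publishes it, and a render thread takes the previous frame’s buffer and issues it. The handoff between the two is where the extra latency comes from. `DrawCmd`, `FrameHandoff` and `startRenderThread` are hypothetical names, and a real engine would typically recycle buffers instead of allocating a new one each frame.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical draw command recorded during the game thread's "render" phase.
struct DrawCmd {
    uint32_t meshId;
    uint32_t materialId;
};

using CommandBuffer = std::vector<DrawCmd>;

// One-slot handoff: the game thread can record the next frame while the render
// thread issues the current one; at most one finished frame waits in the slot.
class FrameHandoff {
public:
    // Game thread: publish a finished frame. Blocks only while the previous
    // frame has not been picked up yet.
    void submit(CommandBuffer cmds) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !pending_.has_value() || closed_; });
        if (closed_) return;
        pending_ = std::move(cmds);
        cv_.notify_all();
    }

    // Render thread: take the next frame, or std::nullopt once closed and drained.
    std::optional<CommandBuffer> acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return pending_.has_value() || closed_; });
        std::optional<CommandBuffer> frame = std::move(pending_);
        pending_.reset();
        cv_.notify_all();
        return frame;
    }

    // Stop the render loop after any pending frame has been consumed.
    void close() {
        std::lock_guard<std::mutex> lock(mutex_);
        closed_ = true;
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::optional<CommandBuffer> pending_;
    bool closed_ = false;
};

// Render thread loop: `issue` stands in for the code that turns commands into API calls.
std::thread startRenderThread(FrameHandoff& handoff,
                              std::function<void(const CommandBuffer&)> issue) {
    assert(issue);
    return std::thread([&handoff, issue = std::move(issue)] {
        while (std::optional<CommandBuffer> frame = handoff.acquire()) {
            issue(*frame);
        }
    });
}
```

Keeping all threading inside the handoff means game code stays single-threaded; it only ever calls `submit` once per frame.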

We did this for Asphalt 9, as you can see in this image. The game code is mostly single-threaded, so game programmers don’t need to worry about threading. The game thread is split into 2 phases: an “update” phase, where game objects are updated, and a “render” phase, where draw calls are prepared and added to what the game calls a command buffer. On the next frame this command buffer is given to the render thread, which issues the draw call commands. This allowed us to have a lot more draw calls than we could on Asphalt 8, especially on lower-end platforms, without introducing threading issues on the game side.

You can also see here that physics runs on a different thread as well, and the game has a small phase at the beginning where it has to wait for the results of the physics thread. In practice it rarely has to wait.

If on some devices the physics thread takes too much time because of the preemption we mentioned earlier, we have the option to run the physics simulation on the update thread.

You can also notice another thread that seems to do a very long operation. That is the texture streaming thread, which was set to a low priority, so all operations that run on it take much longer than expected, as the OS schedules it less often. The trick here is that we always keep a small texture (I think 64x64) loaded in memory, and if the streaming thread has not finished loading the whole texture, we display the small one until the full texture is ready. In practice this is not very noticeable, and it helps the game run as smoothly as possible.
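The fallback-texture idea can be sketched like this. It assumes the low-priority streaming thread ends up with a ready-to-use texture handle (how and where the actual GPU upload happens is API-specific and left out); `TextureHandle` and `StreamedTexture` are hypothetical. The render side never waits: it uses the full-resolution handle once the streamer has published it, and the small resident texture until then.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical GPU texture handle; 0 means "no texture".
using TextureHandle = uint32_t;

// A texture with a small always-resident fallback (e.g. 64x64) that is replaced
// by the full-resolution version once the streaming thread is done with it.
class StreamedTexture {
public:
    explicit StreamedTexture(TextureHandle fallback) : fallback_(fallback) {
        assert(fallback != 0);
    }

    // Streaming thread (low priority): publish the full texture when it is ready.
    void publishFull(TextureHandle full) {
        full_.store(full, std::memory_order_release);
    }

    // Render side, every frame: never blocks on the streaming thread.
    TextureHandle current() const {
        const TextureHandle full = full_.load(std::memory_order_acquire);
        return full != 0 ? full : fallback_;
    }

    bool isFullyLoaded() const { return full_.load(std::memory_order_acquire) != 0; }

private:
    const TextureHandle fallback_;
    std::atomic<TextureHandle> full_{0};
};
```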

CPU optimizations

- Cache as much as you can*
  - Be aware of the memory that you are using
- Precompute as much as you can
- Load files async
  - Don’t stall the CPU waiting for disk I/O
- Use instancing
- Use impostors (billboards)
- Use batching (manual or automatic)
- Use NEON / SIMD instructions
- Use binary shaders (if you can)
  - Bake them offline if your platform supports it
  - Bake them on loading (async) while your application starts
  - Cache them to disk
- Move stuff to the GPU (skinning, particles, draw indirect)

There are many other things…

- It usually helps to cache as much as you can instead of recomputing it.
  - Be aware that caching means you have the same data in 2 places, so if the data needs to be updated you will need to update it in 2 places.
- Precompute data offline if you can; if you cannot, try to cache it to disk the first time you compute it.
- If you need to load files from disk, try to load them asynchronously, as disk access is somewhat slow on certain platforms.
- If you have enough memory, try to keep as much as you can in memory so you don’t have to read from disk.
- Use instancing if you have the same mesh multiple times in your scene.
  - This will save you draw calls.
- Use impostors if some meshes change very little over time.
  - This will give you much cheaper draw calls, as you only need to draw a quad with a texture on it.
- Use some form of draw call batching, either automatic if the rendering pipeline supports it, or with a manual system.
  - Again, this will save you draw calls, as these are usually the biggest issue on the CPU (on certain graphics APIs; I’m looking at you, OpenGL).
- Use NEON or SIMD instructions for matrix multiplications, particle simulation or anything that can be computed wide.
- Use binary shaders if you can; otherwise shader compilation can be pretty slow for complex shaders.

  - Bake them offline if your platform supports it and if you know all your shader permutations.
  - If you cannot bake them offline, you can bake them once when your application starts and cache them to disk for the next run.
  - Better yet, bake just your menu shaders first and keep compiling the rest of the shaders in the background while the player goes through the menus.
  - If the action phase starts too soon and not all shaders are ready, keep displaying the loading screen until all of them are compiled, so you don’t get freezes from shaders being compiled right when they are needed for rendering.
- Last but not least, some things could be moved to the GPU, like skinning, particle systems or, even better, the draw calls themselves, with whatever cool new graphics APIs your platform supports.
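The “compile once, cache to disk for the next run” idea can be sketched as follows, assuming C++17 `<filesystem>`. The cache key should contain everything that invalidates a binary: the shader source, the permutation defines and the GPU/driver identity, since a stored program binary may be rejected after a driver update. The `compile` callback stands for whatever produces the blob on your API, for example `glGetProgramBinary` on OpenGL ES 3.0 (Vulkan offers `VkPipelineCache` for a similar purpose). `ShaderBinaryCache` and `fnv1a64` are hypothetical helpers, and hash collisions are ignored for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <functional>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

using Blob = std::vector<uint8_t>;

// 64-bit FNV-1a hash, used here only to derive a cache file name from the key.
uint64_t fnv1a64(const std::string& s) {
    uint64_t h = 1469598103934665603ULL;
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

class ShaderBinaryCache {
public:
    explicit ShaderBinaryCache(std::filesystem::path dir) : dir_(std::move(dir)) {
        assert(!dir_.empty());
        std::filesystem::create_directories(dir_);
    }

    // Return the cached binary for `key`, or compile it and store it for next time.
    // `key` should combine source, permutation defines and driver/GPU identity.
    Blob loadOrCompile(const std::string& key, const std::function<Blob()>& compile) {
        const std::filesystem::path file = dir_ / (std::to_string(fnv1a64(key)) + ".bin");
        if (std::ifstream in{file, std::ios::binary}) {
            Blob cached((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
            if (!cached.empty()) return cached;
        }
        Blob fresh = compile();
        std::ofstream out(file, std::ios::binary | std::ios::trunc);
        out.write(reinterpret_cast<const char*>(fresh.data()),
                  static_cast<std::streamsize>(fresh.size()));
        return fresh;
    }

private:
    std::filesystem::path dir_;
};
```

If the driver refuses a cached blob at load time, the safe fallback is to compile from source again and overwrite the cache entry.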

GPU Profiling

- RenderDoc
- Microsoft PIX
- NVIDIA Nsight
- Xcode Instruments
  - Game Performance Template
- Adreno GPU Profiler
- Mali Graphics Debugger
- Android Studio’s GPU Debugger

OK... we have now reached the point where you are no longer bottlenecked by the CPU: you probably moved skinning and other things to the GPU, and most likely you are now bottlenecked by the GPU. There are a lot of tools available for different platforms. RenderDoc is probably one of the best debugging tools out there. The great thing about it is that it’s open source, so if it crashes you can always debug it and fix it. And I recommend you submit merge requests with your fixes so we can all benefit. Xcode’s Instruments is pretty cool. I usually used tiler/fragment/device utilization to figure out which part needs to be optimized. The latest version of Xcode has even more powerful tools. When you capture a frame, it can show you your post-FX chain and how the different passes depend on each other, it warns you if something is not optimal, and it can show you memory usage, the performance of your shaders and much more. Beyond this, the Game Performance template can show you even more interesting things, like when a frame is scheduled and why.

It’s also worth mentioning the Adreno GPU Profiler, the Mali Graphics Debugger and Android Studio’s GPU Debugger.

Live Demo

- Pause the app
- Modify shaders
- Check cycle count

There are a lot of things these profiling tools can do and a lot of information that they can show you but they don’t work on all platforms and sometimes there are some things that they can’t do without the help of the game.

To help us with debugging and profiling, we wrote a small tool called JetDebugger (Jet being our rendering engine).

Let’s push our luck and try another demo.

Let’s start our previous demo and JetDebugger. We connect through a TCP/IP socket, so this works on an actual device as well. One problem we had was a bug that happened at a certain camera angle, where a certain expensive shader was visible. So we added a button in JetDebugger to “pause” the application: basically, call Update with a time of 0. Now we can “pause” the application in the spot that was problematic. This is also useful when profiling, as it reduces the noise from the update part of the game and allows us to focus on the rendering.
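Pausing by feeding a zero delta to Update is easy to centralize in one object. The sketch below is a hypothetical `DebugClock` (not JetDebugger’s actual code) that a remote tool could drive; the same object also covers the single-frame step and the time-scale slider described next.

```cpp
#include <cassert>

// Hypothetical debug clock driven by a remote tool: it converts the real frame
// delta into the delta passed to the game's Update.
class DebugClock {
public:
    // "Pause" = Update keeps running, but with a time of 0.
    void setPaused(bool paused) { paused_ = paused; }

    // Time-scale slider: below 1 slows the game down, above 1 speeds it up.
    void setTimeScale(double scale) {
        assert(scale >= 0.0);
        scale_ = scale;
    }

    // While paused, let exactly one frame advance.
    void requestStep() { stepRequested_ = true; }

    // Delta (in seconds) to hand to Update this frame.
    double gameDelta(double realDelta) {
        if (paused_ && !stepRequested_) return 0.0;
        stepRequested_ = false;
        return realDelta * scale_;
    }

private:
    bool paused_ = false;
    bool stepRequested_ = false;
    double scale_ = 1.0;
};
```

Because rendering still runs every frame while the delta is 0, a GPU capture of the paused scene is not disturbed by gameplay updates.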

There is also a button to play a single frame, so you can debug things that happen very quickly. For cases where we want to slow things down or speed them up, there is a slider that lets us decrease or increase the time scale. Another useful feature is the ability to modify your shaders on the fly, on the device.

We have a list of shaders, and each shader has a list of mutations. Mutations are pairs of shaders and specific features that can be toggled on and off in each shader.

There are 2 vertical panels. On the left we have the source code of the shaders. It’s written in a custom language from which we can generate GLSL, HLSL or Metal source code, and we can add more platforms if needed. You can modify the shader in real time and check on the device whether this has an impact on performance. It’s also useful when you want to try out new things or just fix bugs. On the right we have 2 panels for the fragment and vertex shaders. Another useful feature is the ability to run an offline shader compiler and get information about the cycle count of each shader. This gives us a good approximation of which part needs to be optimized for a given device.

Now, these changes are done in memory, so the tip here is: DO NOT forget to save your changes to the actual shader file on your computer and commit them to source control. :)

GPU Optimizations

- Fragment shader limited
  - Downscale your effects
- Vertex shader limited
  - Optimize shaders
  - Use LODs
- Bandwidth limited
  - Be aware of your render targets’ load / store settings
  - Compress your textures

With the GPU you are either fragment shader limited, vertex shader limited or bandwidth limited. With today’s 4K resolutions it’s quite easy to be FS limited:

- Try to drop the resolution of the 3D scene to something less than native (70-80%), but keep the UI at 100% so it looks crisp!
  - You can drop it even more if you have some kind of AA post effect.
- Avoid large transparent objects, as blending can be expensive.
- Use lower precision where possible.
- Move normalizations to the vertex shader where possible, as you usually have far fewer vertices than pixels.
- Compress / downsize your textures so you are not limited by texture bandwidth.
- Use mipmaps.
- Disable trilinear or anisotropic filtering where it’s not needed.

VS limited:

- Use lower precision where possible.
- Use LODs for your meshes where possible.
- Optimize your vertex format to be as small as possible.
- Use 16-bit indices where possible.

Bandwidth limited:

- Check whether you are storing the contents of render targets that are not reused later on and could be discarded.
- Keep the render target format as small as possible (don’t use floating point if you don’t have HDR assets).

- Use a packed depth/stencil buffer if supported.
- Use memoryless render targets if possible and if your device supports them.
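Since fragment shading cost roughly follows the number of shaded pixels, the saving from the resolution drop is quadratic in the scale factor: rendering the 3D scene at 75% of native width and height shades only about 56% of the pixels. A small C++ sketch of that arithmetic (`Resolution`, `sceneResolution` and `pixelFraction` are hypothetical helpers; the UI would still be rendered at native resolution):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Resolution {
    uint32_t width;
    uint32_t height;
};

// Render-target size for the 3D scene at `scale` of native (e.g. 0.7-0.8).
// The UI is still drawn at native resolution so text stays crisp.
Resolution sceneResolution(Resolution native, double scale) {
    assert(scale > 0.0 && scale <= 1.0);
    return {static_cast<uint32_t>(std::lround(native.width * scale)),
            static_cast<uint32_t>(std::lround(native.height * scale))};
}

// Fraction of native pixels actually shaded; it falls with the square of the scale.
double pixelFraction(Resolution native, Resolution scaled) {
    return (static_cast<double>(scaled.width) * scaled.height) /
           (static_cast<double>(native.width) * native.height);
}
```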

Scaling Out

- Disable effects
- Disable particles
- Decrease shadow quality
- Increase culling ratios
- Decrease sound quality
- Decrease texture LOD
- Decrease reflection quality
- Decrease UI render size

At this point you did everything you could to optimize for both the CPU and the GPU, but for some devices it will not be enough. So it’s time to start cutting stuff:

- Scale down post effects.
- Scale down particles.
- Decrease shadow quality.
- Increase culling ratios, so objects are culled earlier and not rendered.
- Increase the texture LOD, so you use smaller textures.
- Decrease reflection quality.
  - If you have some fancy screen-space reflections that are too expensive, add another system where maybe just some small meshes are rendered.
- If it’s still not enough, decrease the UI render size.
- Cut out anything that is not essential for gameplay.

The best thing to do is NOT to hardcode these options, but to keep them in a file that the game can read, so you can quickly toggle them and check whether they help performance.

Even better is to store this file on a server and have the game check for an updated version, so you can change some options while the game is live without compiling a new version.

It’s also useful to store in this file options that change how your rendering works around specific driver bugs.
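A minimal sketch of such a file and its parser, assuming a simple `key=value` text format with `#` comments; the format and the keys used here (`shadow_quality`, `particles`, `ui_scale_percent`) are made up for illustration. A file downloaded from the server could be parsed the same way, with its values written over the shipped defaults.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Trim spaces, tabs and carriage returns from both ends.
std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t\r");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t\r");
    return s.substr(first, last - first + 1);
}

// Parse "key=value" lines; '#' starts a comment, malformed lines are skipped.
std::map<std::string, int> parseQualityConfig(const std::string& text) {
    std::map<std::string, int> settings;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        const auto comment = line.find('#');
        if (comment != std::string::npos) line.erase(comment);
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;
        const std::string key = trim(line.substr(0, eq));
        const std::string value = trim(line.substr(eq + 1));
        if (key.empty()) continue;
        try {
            settings[key] = std::stoi(value);
        } catch (const std::logic_error&) {
            // std::invalid_argument / std::out_of_range: ignore the bad entry.
        }
    }
    return settings;
}

// Look up a setting, falling back to the built-in default.
int settingOr(const std::map<std::string, int>& settings, const std::string& key, int fallback) {
    const auto it = settings.find(key);
    return it == settings.end() ? fallback : it->second;
}
```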

Memory Profiling

- Heap
- VM
- Textures
- Buffers

OK... so at this point our application is running smoothly, CPU usage is pretty low, the GPU doesn’t heat up, and all is great, except that from time to time the OS kills our application because it’s using too much memory.

Things that use memory include, but are not limited to :)

- Heap allocations
- Virtual memory
- Textures
- Buffers

Heap

- No leaks
- Be aware of the padding of your structs
- Avoid std::unordered_map
  - It can have a lot of overhead
- In general, avoid small structs with lots of containers inside
- Use one linear array with all your data
  - Works best if you try to limit it to POD types
  - And reinterpret_cast the memory buffer as your struct
- Be careful when serializing data on 32-bit vs 64-bit architectures

First, make sure you don’t have any leaks :) Xcode’s Leaks instrument is by far the best tool for finding memory leaks. It can even find circular references between shared pointers. It’s quite slow on older devices, but it’s definitely worth the pain to find those nasty leaks.

Be careful with the padding of your structs or classes, as it might not be easy to spot how much overhead it adds. Be aware of the overhead of your containers:

- Avoid std::unordered_map, as it can have significant overhead.
- Avoid having lots of structs with small containers like vector or string inside, as an empty vector has an overhead of 24 bytes on 64-bit architectures.
- If you have systems with lots of small objects, pack them into one big linear array, if possible.
  - This works best if you limit your data to POD types.
  - You can reinterpret_cast part of the memory buffer so you can still work with structs in your code, as that keeps your code readable.
  - Just be careful: the memory needs to be aligned to 4 bytes on ARMv7 platforms, otherwise your application will crash.

As a note here, try to use only sized types (uint32_t) in structs that you serialize to disk or pass over the network, to avoid problems between different architectures and to make it easier to calculate the size of your structs (and of the padding).
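The padding and alignment points can be checked in code. In this C++ sketch (struct and function names are hypothetical), ordering fields largest-first shrinks a record from 12 to 8 bytes on common 32- and 64-bit ABIs, and `isAlignedFor` is the check you would want before a `reinterpret_cast` on ARMv7. Note that `readRecord` swaps in `std::memcpy` instead of the `reinterpret_cast` described above: it works at any alignment and avoids C++ aliasing pitfalls, and compilers typically turn a small fixed-size copy like this into plain loads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// 12 bytes on common ABIs: 3 bytes of padding after `flags` so `meshId` is
// 4-byte aligned, plus 3 bytes of tail padding after `lod`.
struct PaddedRecord {
    uint8_t flags;
    uint32_t meshId;
    uint8_t lod;
};

// Same fields, largest first: 4 + 1 + 1 + 2 bytes of tail padding = 8 bytes.
struct PackedRecord {
    uint32_t meshId;
    uint8_t flags;
    uint8_t lod;
};

static_assert(sizeof(PackedRecord) == 8, "unexpected layout for a serialized record");

// True if `p` may be accessed directly as a T (e.g. via reinterpret_cast).
template <typename T>
bool isAlignedFor(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Read the i-th record from a buffer of `count` tightly packed records.
PackedRecord readRecord(const uint8_t* buffer, std::size_t count, std::size_t i) {
    assert(i < count);
    PackedRecord r;
    std::memcpy(&r, buffer + i * sizeof(PackedRecord), sizeof(PackedRecord));
    return r;
}
```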

Virtual Memory

- Memory map large(ish) files
- Don’t map large files if you are going to use the whole contents
- Use a thread pool instead of a large number of threads

Virtual memory is a nice little trick the OS plays to let the application allocate more memory than the system physically has. This works pretty well in practice as long as you don’t allocate too much.

If you have big files from which you only need small parts (fonts, for example), it helps to memory map the file instead of keeping it all in memory. Memory mapping a file means the system pages in the parts you are using, but it’s free to drop the pages that are not in use if the system is low on memory.

But if you have large files that you are going to consume all at once (like textures you upload), it seems to be faster to copy them to memory and upload them than to give a memory-mapped pointer to the texture upload API. On iOS, the system takes into account the number of pages in use when deciding to notify apps to free some memory, or even to close them if no more memory is available.

This means that memory mapping helps a lot but there are some things you need to keep in mind:

- There are already a lot of things that are memory mapped: the executable, system fonts, some assets that the OS uses.
- We also noticed, through empirical tests, that there is a limit to the virtual memory we could allocate: it seems to be around 700-800MB on 32-bit devices and around 2.5GB on arm64 devices.
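For the “map it and only touch what you need” case, a read-only mapping on a POSIX system (iOS, Android and Linux all provide `mmap`) can look roughly like this. `MappedFile` is a hypothetical RAII wrapper, not a library class. Pages are only brought in when the returned bytes are actually read, and because read-only file-backed pages are clean, the OS can drop them under memory pressure and read them from the file again later.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only memory mapping of a whole file; unmapped on destruction.
class MappedFile {
public:
    explicit MappedFile(const char* path) {
        assert(path != nullptr);
        const int fd = ::open(path, O_RDONLY);
        if (fd < 0) return;
        struct stat st;
        if (::fstat(fd, &st) == 0 && st.st_size > 0) {
            void* p = ::mmap(nullptr, static_cast<std::size_t>(st.st_size), PROT_READ,
                             MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                data_ = static_cast<const uint8_t*>(p);
                size_ = static_cast<std::size_t>(st.st_size);
            }
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }

    ~MappedFile() {
        if (data_ != nullptr) ::munmap(const_cast<uint8_t*>(data_), size_);
    }

    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    bool valid() const { return data_ != nullptr; }
    const uint8_t* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    const uint8_t* data_ = nullptr;
    std::size_t size_ = 0;
};
```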

Also, on iOS each thread uses 512KB of stack space that is memory mapped, so if you have a large number of threads this will add up. Prefer a thread pool where you can execute tasks over a lot of threads waiting around for small things to execute. If a thread pool is not possible, try to decrease the size of the stack, as 512KB is quite a lot of memory.
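When a dedicated thread is unavoidable, the stack size can be set explicitly at creation time through POSIX thread attributes. A minimal sketch; `startThreadWithStack` is a hypothetical helper, and the right size depends on what the thread actually does. `pthread_attr_setstacksize` rejects values below `PTHREAD_STACK_MIN`, and some platforms also require the size to be a multiple of the page size.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>

// Start a POSIX thread with an explicit stack size instead of the platform default.
// Returns 0 on success, or the error code reported by pthreads (e.g. EINVAL if
// the size is too small or otherwise not acceptable on this platform).
int startThreadWithStack(pthread_t* thread, std::size_t stackBytes,
                         void* (*entry)(void*), void* arg) {
    assert(thread != nullptr && entry != nullptr);
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_attr_setstacksize(&attr, stackBytes);
    if (rc == 0) rc = pthread_create(thread, &attr, entry, arg);
    pthread_attr_destroy(&attr);  // safe once pthread_create has returned
    return rc;
}
```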

Textures
- Use the best compression
- Avoid uncompressed textures
- Skip the first mipmap

- On iOS, use PVRTC 2bpp where possible.
- As an anecdote, for Asphalt 8 we had a lot of discussions with the artists about the compression artefacts of PVRTC 2bpp versus PVRTC 4bpp.
- When comparing two textures with a tool, the artefacts were quite visible.
- So we made two builds, one with PVRTC 2bpp and one with PVRTC 4bpp, and gave them to the artists without telling them which was which.
- Aside from the lead artist, who noticed some small artefacts on the skybox texture, no one could tell there was any difference.
- So we changed the compression back for the skybox and shipped a version that was 300MB smaller :)
- Obviously it probably helped that Asphalt is a high-speed racing game with a lot of motion blur, so this might not work for every game, but I think it’s worth a try.

- A major benefit was that bandwidth and memory usage decreased a lot, as a PVRTC 2bpp texture is half the size of a PVRTC 4bpp texture.

- Try to avoid uncompressed textures, as they are really huge both in memory and on disk.

- If the alpha channel doesn’t compress well, you can try keeping alpha as a separate single-channel uncompressed texture and the RGB part as a different (compressed) texture, and recombine them in the shader.

- This might be smaller than one uncompressed RGBA texture.

- Ideally you should have different data for different devices. If you don’t, on devices with a lower resolution and less memory you can load the texture starting from mipmap 1 (mipmap 0 being the highest resolution, since we start counting from 0, right?). This saves a lot of memory and also loading time, as you won’t have to read and upload a big chunk of the file.
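To put numbers on the texture advice, here is a small size calculator. It assumes PVRTC1 and the image-size formulas from the `GL_IMG_texture_compression_pvrtc` OpenGL ES extension (4bpp levels padded to at least 8x8 texels, 2bpp levels to at least 16x8). The function names are ours, for illustration only.

```cpp
#include <algorithm>
#include <cstdint>

// Bytes in one PVRTC1 mip level, following the extension's size formulas
// (assumed): 4bpp pads to >= 8x8 texels, 2bpp pads to >= 16x8 texels.
std::uint64_t pvrtcLevelSize(std::uint32_t w, std::uint32_t h, int bpp) {
    if (bpp == 4)
        return (std::uint64_t{std::max<std::uint32_t>(w, 8)} *
                std::max<std::uint32_t>(h, 8) * 4 + 7) / 8;
    // Any other value is treated as 2bpp.
    return (std::uint64_t{std::max<std::uint32_t>(w, 16)} *
            std::max<std::uint32_t>(h, 8) * 2 + 7) / 8;
}

// Total bytes of the mip chain, counting only levels >= firstMip
// (0 = full resolution). firstMip = 1 skips the largest level.
std::uint64_t pvrtcChainSize(std::uint32_t w, std::uint32_t h, int bpp, int firstMip) {
    std::uint64_t total = 0;
    for (int level = 0;; ++level) {
        if (level >= firstMip) total += pvrtcLevelSize(w, h, bpp);
        if (w == 1 && h == 1) break;
        w = std::max<std::uint32_t>(w / 2, 1);
        h = std::max<std::uint32_t>(h / 2, 1);
    }
    return total;
}

// Uncompressed RGBA8 for comparison: 4 bytes per texel.
std::uint64_t rgba8LevelSize(std::uint32_t w, std::uint32_t h) {
    return std::uint64_t{w} * h * 4;
}
```

For a 1024x1024 texture, one 4bpp level is 512KB and the 2bpp version 256KB. The full 4bpp mip chain is 699,136 bytes, and starting from mip 1 removes 524,288 of them, roughly three quarters. Likewise, the split-alpha trick stores RGB at 4bpp plus an 8-bit alpha plane, 1.5 bytes per texel instead of the 4 bytes of uncompressed RGBA8.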


Buffers
- Allocate large chunks
- Use ranges from a chunk
- Don’t overwrite memory
- Don’t delete buffers that are used by the GPU

On iOS, buffers will take at least a page, so 4KB on armv7 and 16KB on arm64. So DO NOT allocate a lot of small buffers, as this has a lot of overhead (which will not be shown by profiling tools). Allocate big buffers (1MB, 2MB, 4MB) and use small slots from them for each vertex, index or constant buffer as needed. Also set the correct attributes for buffers that will be uploaded only once (static) and for data that will be uploaded often (dynamic). Given this complexity, the best thing to do is to have one or more types of allocators that manage the buffers and the slots.
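A minimal sketch of such an allocator, assuming a linear (bump) strategy where slots are handed out from big chunks and never freed individually, which suits static geometry loaded per level. `ChunkAllocator` and `BufferSlot` are hypothetical; the alignment is a parameter because the required offset alignment differs per API and buffer type, and the real buffer-creation call is only a comment.

```cpp
#include <cstddef>
#include <optional>

// A range (slot) inside one of the large GPU buffers.
struct BufferSlot {
    std::size_t chunkIndex;  // which big buffer
    std::size_t offset;      // byte offset inside that buffer
    std::size_t size;
};

// Hypothetical linear sub-allocator over fixed-size chunks.
class ChunkAllocator {
public:
    ChunkAllocator(std::size_t chunkSize, std::size_t alignment)
        : chunkSize_(chunkSize), alignment_(alignment) {}

    std::optional<BufferSlot> allocate(std::size_t size) {
        if (size == 0 || size > chunkSize_) return std::nullopt;
        std::size_t aligned = alignUp(cursor_);
        if (chunks_ == 0 || aligned + size > chunkSize_) {
            ++chunks_;  // real code: create one new GPU buffer of chunkSize_ bytes here
            aligned = 0;
        }
        cursor_ = aligned + size;
        return BufferSlot{chunks_ - 1, aligned, size};
    }
    std::size_t chunkCount() const { return chunks_; }

private:
    std::size_t alignUp(std::size_t v) const {
        return (v + alignment_ - 1) / alignment_ * alignment_;
    }
    std::size_t chunkSize_;
    std::size_t alignment_;
    std::size_t chunks_ = 0;   // number of big buffers created so far
    std::size_t cursor_ = 0;   // first free byte in the current chunk
};
```

Using the page sizes above: 4,096 separate 256-byte constant buffers on arm64 would each take a 16KB page, 64MB in total, whereas the same 4,096 slots fit exactly in one 1MB chunk at 256-byte alignment.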

Also keep in mind that some APIs will internally copy your buffers; newer APIs will not, and you are responsible for not overwriting them or deleting them while they are still in use by the GPU.
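For newer APIs that don’t copy your data, one common pattern (sketched here, not necessarily what the talk’s engine does) is to defer destruction until the GPU reports it has finished the frame that last used the resource. `DeferredReleaseQueue` is hypothetical; it assumes resources are released in non-decreasing frame order and that you get the completed-frame index from a fence or a command-buffer completion handler.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical deferred-release queue keyed by GPU frame index.
class DeferredReleaseQueue {
public:
    // lastUsedFrame: the CPU frame in which the GPU may still read the resource.
    // Assumes calls arrive with non-decreasing lastUsedFrame (FIFO order).
    void release(std::uint64_t lastUsedFrame, std::function<void()> destroy) {
        pending_.emplace_back(lastUsedFrame, std::move(destroy));
    }
    // Call once per frame with the newest frame the GPU has fully completed.
    void collect(std::uint64_t gpuCompletedFrame) {
        while (!pending_.empty() && pending_.front().first <= gpuCompletedFrame) {
            pending_.front().second();  // now safe to delete or reuse the buffer
            pending_.pop_front();
        }
    }
    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::deque<std::pair<std::uint64_t, std::function<void()>>> pending_;
};
```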


Optimizing battery usage
- Use less CPU
- Use less GPU
- Optimize memory transfers

At this point, if we have optimized CPU, GPU and bandwidth, there is not much more we can do, since these are the primary sources of battery consumption. But, related to CPU:

- Ideally you should let the OS schedule which core your threads run on, but you could also experiment with the affinity of your threads.

- On devices with a big.LITTLE architecture, you could pin your threads to the little cores to save battery (provided, of course, that your application still runs optimally).

- If you run at 60fps in the action phase of your game, you could try running at 30fps in the menus, so the phone has time to cool down and doesn’t throttle, and also doesn’t drain the battery.
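A minimal sketch of capping menus at 30fps with a sleep-based limiter in standard C++. `GameState`, `FrameLimiter` and the two rates are illustrative assumptions. On a real device you would rather ask the platform for the lower rate, for example `preferredFramesPerSecond` on iOS’s `CADisplayLink` or `MTKView`, so that presentation itself is paced at that rate.

```cpp
#include <chrono>
#include <thread>

// Hypothetical game states: menus at 30fps, gameplay at 60fps.
enum class GameState { Menu, Race };

class FrameLimiter {
public:
    using Clock = std::chrono::steady_clock;

    // Call at the end of each frame; sleeps until the next frame deadline.
    void endFrame(GameState state) {
        const int fps = (state == GameState::Menu) ? 30 : 60;
        const auto frameTime = std::chrono::duration_cast<Clock::duration>(
            std::chrono::duration<double>(1.0 / fps));
        nextDeadline_ += frameTime;
        const auto now = Clock::now();
        if (nextDeadline_ < now) {
            // Running late: resynchronize instead of rushing extra frames.
            nextDeadline_ = now;
        } else {
            // The CPU sleeps here instead of spinning, which is what saves power.
            std::this_thread::sleep_until(nextDeadline_);
        }
    }

private:
    Clock::time_point nextDeadline_ = Clock::now();
};
```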

Use less GPU
- We already discussed various optimizations for the GPU, so anything done there should help with battery as well.
- If possible, you could look into automatically dimming the screen brightness to save a bit more battery.

Optimize memory transfers
- Upload only what is needed, and only what changes.
- Don’t upload the camera position for each mesh; use a separate constant buffer for camera/scene values.
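A small sketch of splitting constants by update frequency, with hypothetical struct and field names: camera and scene values go in one per-frame buffer, each mesh keeps only its own transform, and a dirty flag skips meshes that did not change. The actual upload is replaced by a byte counter so the effect is visible, and each mesh is assumed to keep its own persistent slot on the GPU.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Illustrative layouts (names are assumptions).
struct PerFrameConstants {            // uploaded once per frame
    std::array<float, 16> viewProj;
    std::array<float, 4>  cameraPosition;
};
struct PerObjectConstants {           // uploaded only when the object changes
    std::array<float, 16> world;
};

struct Mesh {
    PerObjectConstants constants{};
    bool dirty = true;                // set whenever `constants` changes
};

// Stand-in for the real upload path: only counts bytes sent to the GPU.
struct UploadStats { std::size_t bytes = 0; };

void uploadFrame(const PerFrameConstants& frame, std::vector<Mesh>& meshes,
                 UploadStats& stats) {
    stats.bytes += sizeof(frame);     // one upload shared by every draw
    for (Mesh& m : meshes) {
        if (!m.dirty) continue;       // unchanged: keep last frame's data
        stats.bytes += sizeof(m.constants);
        m.dirty = false;
    }
}
```

If the camera position lived in every mesh’s buffer instead, any camera movement would dirty all the meshes every frame.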


Workflow - Data
- Profile your pipeline
- Hot reload assets
- How long does it take to bake all your data?
- Check your dependencies
- Building only what has changed?
- Cache the results
- Reuse caches between machines
- Distribute the baking

At this point your application works perfectly, with no CPU, GPU or battery problems, so there is nothing left to do, right? Well, I believe there are a few more things you can do, and I believe they are quite important and often overlooked. The bigger your team is, the more important it is to profile your pipeline. So ask yourself a few questions:

- How long does a file take from when it’s saved to disk to when it reaches your tools, or even better, when it reaches the game?

- Can your tools or game hot reload assets? How much time would that save, and how long would it take you to implement?

- How long does it take to build data?
- Do you properly check the dependencies between assets?
- Are you building assets incrementally, as in JUST those that changed?
- Are you caching the results of the bake?
- Are the results of one user’s bake available to other users?
- Are you distributing the baking process between multiple computers?

If the answer to all of these questions is yes, then congratulations. Great job.

- We managed to drop the bake time for all textures (when compressed with PVRTC at best quality) from 50h on a single computer to a couple of hours using a bunch of computers (usually build servers).

- We can add more computers if we need to ship a version in a hurry.
- This has proven very useful when doing certain compression tests with different formats or options.
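A sketch of the caching questions above, under stated assumptions: a bake result is keyed by a hash of the source bytes, the bake settings and the keys of its dependencies, so touching a dependency or a compression option invalidates it. We use 64-bit FNV-1a because, unlike `std::hash`, its output is fully specified and therefore identical on every machine, which is what makes a shared cache possible. `bakeKey` and `BakeCache` are hypothetical; a real cache would be a network store.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// 64-bit FNV-1a, chainable through `h`.
std::uint64_t fnv1a(const std::string& bytes, std::uint64_t h = 14695981039346656037ull) {
    for (unsigned char c : bytes) {
        h ^= c;
        h *= 1099511628211ull;
    }
    return h;
}

// Key = source + settings + dependency keys. Lengths are mixed in so that
// ("ab", "c") and ("a", "bc") do not hash identically.
std::uint64_t bakeKey(const std::string& source, const std::string& settings,
                      const std::vector<std::uint64_t>& dependencyKeys) {
    std::uint64_t h = fnv1a(std::to_string(source.size()) + ':' + source);
    h = fnv1a(std::to_string(settings.size()) + ':' + settings, h);
    for (std::uint64_t dep : dependencyKeys) h = fnv1a(std::to_string(dep) + ';', h);
    return h;
}

// Hypothetical in-memory cache; `bake` stands in for the expensive compressor.
class BakeCache {
public:
    std::string getOrBake(std::uint64_t key, const std::function<std::string()>& bake) {
        auto it = results_.find(key);
        if (it != results_.end()) return it->second;  // hit: skip the bake
        ++bakes_;
        return results_.emplace(key, bake()).first->second;
    }
    int bakeCount() const { return bakes_; }

private:
    std::unordered_map<std::uint64_t, std::string> results_;
    int bakes_ = 0;
};
```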


Workflow - Code
- Profile code compilation
- Distribute compilation
- Use compile units

OK, at this point our data pipeline works perfectly and it’s very fast. What’s left? The last step is to compile the code and prepare the version. So let’s ask a few more questions:

- How long does it take to compile the code?
- Can you distribute (or cache) compilation over a network, with tools such as FASTBuild, ccache or IncrediBuild?
- Can you improve your compile times?

- First, check your headers. There are tools that can do this for you: Header Hero is a good example, include-what-you-use is another.

- Do not put implementations in headers unless you can prove they are performance critical.

- Explicitly instantiate common template types so they are not instantiated in all your translation units.

- For template classes, you can keep the class definition in the header file and the member implementations in .inl files.

- You can then manually include the .inl file only where you need to instantiate the template for a certain type.

- This saves a lot in object file sizes and linking time.
- Another useful technique to improve compile times is to use compile units (also known as unity builds).

- If you don’t know about this: some .cpp files are created (hopefully automatically) that include all the other .cpp files.

- This saves a lot of compile time, but usually requires more memory, as the compiler has to process a lot more code at once.

- A nice advantage is that the compiler and linker see more code in the same translation unit, which can make the code smaller and faster.
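A single-file sketch of the template techniques above; in a real project the three marked sections would be a header, an `.inl` and a `.cpp`. `extern template` (C++11) stops every including translation unit from instantiating the common types, and the explicit instantiation definitions compile them exactly once. `SmallVector` is a made-up example type.

```cpp
#include <cstddef>
#include <vector>

// ---- Would live in "SmallVector.h": class definition, no member bodies.
template <typename T>
class SmallVector {
public:
    void push(const T& v);
    std::size_t size() const;
private:
    std::vector<T> items_;
};

// Explicit instantiation declarations: files including the header must NOT
// instantiate these common types themselves.
extern template class SmallVector<int>;
extern template class SmallVector<float>;

// ---- Would live in "SmallVector.inl": member implementations, included only
// by SmallVector.cpp or by a file that needs some unusual T.
template <typename T>
void SmallVector<T>::push(const T& v) { items_.push_back(v); }

template <typename T>
std::size_t SmallVector<T>::size() const { return items_.size(); }

// ---- Would live in "SmallVector.cpp": the one place where the common types
// are compiled (explicit instantiation definitions).
template class SmallVector<int>;
template class SmallVector<float>;
```

A compile unit (unity) file, by contrast, needs no code changes: it is just a generated .cpp containing `#include "a.cpp"`, `#include "b.cpp"`, and so on.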


Workflow - Package
- One step build
- Test each commit
- Automate everything

And finally, now that the data is baked and the code compiled, the final step is to package it all together and ship it!

- At this point you should have set up a build server that does the data baking and code compilation for each commit, so you can ensure a stable version.

- Ideally you should even run some unit tests to make sure that your application doesn’t crash when running.

- Make sure that you automate as many as possible of the steps you need to do in order to configure your version.

- This will help a lot in avoiding problems that might arise if steps need to be done manually


Conclusions, Conclusions, Conclusions

- Profile
- Optimize
- Repeat

As for conclusions:

Profile as much as you can.
Optimize as much as time allows. Ideally there should be someone in the team allocated to this full time, as you have seen there are a lot of things to do.
Do research and try the tools that are available.
Don’t be afraid of writing your own tools if it will help in the long run.
Do go to conferences and watch presentations; there might be a lot of valuable information out there.
Don’t forget to have fun, and don’t stress!


Thanks and Acknowledgements
Thanks Everyone!

I’d like to take a moment to thank you for coming to see this presentation, and to thank everyone at Gameloft Barcelona for their contribution! Thanks to our Technical Director Catalin Vasile for inspiring and helping us push things way further than we ever imagined. And last but not least, thanks to our Studio Manager Sasha for supporting us and for understanding the value of a custom engine and tools.


We are Hiring!
http://www.gameloft.com/corporate/jobs/

And we are working on even cooler stuff for the future, so join us :) We are hiring.


Questions?

If you have any questions?