

Gameloft and Intel: Working together to bring High Quality graphics to x86 Android*

By Steve Hughes

Introduction

Most people, including gamers, pigeonhole computing devices as either desktop or mobile, and they expect high-end effects in their desktop apps and simpler, streamlined effects on their mobile devices. They usually accept the gulf between the two and don’t complain. However, when I started looking at the 4th generation Intel® Atom™ processor (codenamed Bay Trail) late last year, I realized that it is no lazy piece of hardware. In fact, I saw the potential to add some significant desktop-style effects to the right game and, with a bit of work, produce a real showpiece app to demonstrate its capabilities. After a quick look around, I decided to work with Gameloft on its racing title GT Racing 2 (GTR2). I already knew the team at Gameloft, and they’ve always been eager to go the extra mile to optimize performance and make their games stand out.

In this article I describe the effects Gameloft implemented in GTR2 and focus on how we fit those effects into the 30 frames per second (FPS) budget we had set ourselves. We were also limited by time, since we wanted to show off what we’d achieved at GDC 2014 in San Francisco, and the end of 2013 was already fast approaching.

The effects

The exact effects used in the game were chosen by Gameloft, as they knew which effects they most wanted to include. This is only fair: they know their engine, and we needed to get the effects in quickly so we could spend time on optimization. Figures 1 and 2 show before and after images, clearly showing how much we were able to enhance the image with the extra CPU and GPU time available on the x86 device.

Figure 1. The normal appearance of GTR2 on existing ARM* devices. Great models, but a bit of a letdown with normal lighting.

Figure 2. GTR2 on Intel® Atom™ processor-based tablets showing enhanced visuals from the bloom and light shaft effects.

Light Shafts

To achieve the light shafts effect, the sun is rendered to a second render target, and a radial blur is applied in the direction away from the sun’s position. This is carried out on a low-resolution render target over several passes, and the results look like this:

Figure 3. Initial render of the sun to a low-resolution render target. The sun is occluded by opaque scene objects, which produces the shape seen here.

Figure 4. Next, a set of radial blur passes is applied to produce the image on the right. The originally partially obscured sun becomes a glow that loosely represents direct sunlight scattering off airborne particles.


Figure 5. The blurred image is then added back into the original frame, producing the final effect seen here. This effect is applied in real time during the race.
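As a rough illustration of the light-shaft passes, here is a minimal CPU-side sketch in Python. It is not Gameloft’s shader code: it assumes a single-channel, low-resolution occlusion image in which unoccluded sun pixels are bright and everything else is zero, a sun position given in pixel coordinates, and hypothetical parameters `samples`, `decay` and `passes`. Each output pixel averages decaying taps along the line toward the sun, which smears the bright pixels outward along rays away from it; the result is then added back onto the scene.

```python
def radial_blur(img, sun_x, sun_y, samples=8, decay=0.9):
    """One radial blur pass over a single-channel image (list of rows of floats).

    Each output pixel averages `samples` taps taken on the line from that pixel
    toward the sun, with geometrically decaying weights, so bright sun pixels
    are smeared outward along rays pointing away from the sun.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Per-tap step from this pixel toward the sun's position.
            dx = (sun_x - x) / samples
            dy = (sun_y - y) / samples
            px, py = float(x), float(y)
            total, weight, weight_sum = 0.0, 1.0, 0.0
            for _ in range(samples):
                sx, sy = int(round(px)), int(round(py))  # nearest-texel lookup
                if 0 <= sx < w and 0 <= sy < h:
                    total += img[sy][sx] * weight
                weight_sum += weight
                weight *= decay
                px += dx
                py += dy
            out[y][x] = total / weight_sum
    return out


def light_shafts(occlusion, sun_x, sun_y, passes=3):
    """Apply several radial blur passes, as done on the low-resolution target."""
    result = occlusion
    for _ in range(passes):
        result = radial_blur(result, sun_x, sun_y)
    return result


def add_back(scene, shafts, intensity=1.0):
    """Additively combine the blurred shafts with the scene, clamped to 1.0."""
    return [[min(1.0, s + intensity * shaft) for s, shaft in zip(s_row, sh_row)]
            for s_row, sh_row in zip(scene, shafts)]
```

On a GPU the same loop would live in a fragment shader using filtered texture fetches; nearest-texel lookups are used here only to keep the sketch dependency-free.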

Bloom

Bloom is a fairly stock effect, but it is easy to get wrong. The object of bloom is to simulate bright light in a scene saturating the image and leaking out into the area around it.

Bloom is implemented in three stages:

1: The original scene is filtered to remove any dark pixels, leaving only the bright pixels in the scene. This image is written to another render target (Figure 6).

Figure 6. Light pixels are extracted from the original image.


2: The new render target containing the bright pixels is then blurred. This simulates the bright pixels bleeding into the surrounding dark pixels (Figure 7).

Figure 7. Light pixel image is then blurred.

3: Finally, the blurred bright-pixel render target is added to the original scene to produce the bloom effect (Figure 8).


Figure 8. The blurred bright-pixel image is added to the original scene, with some scaling, to produce the final bloom image.
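The three bloom stages can be sketched on the CPU as follows. This is an illustrative Python model, not the game’s implementation: images are single-channel lists of floats in [0, 1], and the threshold, blur radius and composite scale are hypothetical values. A simple box blur stands in for whatever blur kernel the game actually uses.

```python
def bright_pass(img, threshold=0.7):
    """Stage 1: keep only pixels brighter than the threshold; dark pixels become 0."""
    return [[p if p > threshold else 0.0 for p in row] for row in img]


def box_blur(img, radius=1):
    """Stage 2: average each pixel with its neighbours (a stand-in blur kernel)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def composite(scene, bloom, scale=0.5):
    """Stage 3: add the scaled, blurred bright pixels back onto the scene, clamped to 1.0."""
    return [[min(1.0, s + scale * b) for s, b in zip(s_row, b_row)]
            for s_row, b_row in zip(scene, bloom)]


scene = [[0.0] * 5 for _ in range(5)]
scene[2][2] = 1.0   # one bright pixel
scene[0][4] = 0.5   # a mid-grey pixel that the bright pass should discard
final = composite(scene, box_blur(bright_pass(scene)))
```

Because the bright pass runs first, only the bright pixel contributes to the glow; the mid-grey pixel keeps its original value and does not bloom.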

Depth of field

To achieve depth of field, we start with the game scene and apply a horizontal and then a vertical Gaussian blur pass.
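A horizontal pass followed by a vertical pass works because a 2D Gaussian is separable: the 2D kernel is the product of two 1D kernels, so a blur with N taps per axis costs 2N taps per pixel instead of N². The following minimal Python sketch shows the idea on a single-channel image; the radius, sigma and clamp-to-edge addressing are assumptions, not details from GTR2.

```python
import math


def gaussian_kernel(radius, sigma):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(line, kernel):
    """Convolve one row or column with the kernel, clamping reads to the edges."""
    r = len(kernel) // 2
    n = len(line)
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - r, 0), n - 1)  # clamp-to-edge addressing
            acc += line[j] * weight
        out.append(acc)
    return out


def gaussian_blur(img, radius=2, sigma=1.0):
    """Separable Gaussian blur: horizontal pass, then vertical pass."""
    kernel = gaussian_kernel(radius, sigma)
    horizontal = [blur_1d(row, kernel) for row in img]
    columns = [list(col) for col in zip(*horizontal)]        # transpose
    vertical = [blur_1d(col, kernel) for col in columns]     # vertical pass on columns
    return [list(row) for row in zip(*vertical)]             # transpose back
```

The intermediate result of the horizontal pass is what a GPU implementation would write to a temporary render target before the vertical pass reads it.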

Figure 9. The original game scene


After the two passes, the whole image is blurred (Figure 10). We now have a blurred image and the original sharp image, along with the depth buffer from the original render pass.

The next step is to select a depth value to act as the focal point, such as the center of the car. For each pixel on the screen, we blend the blurred image and the sharp image based on the difference between the depth of the current pixel and the focal depth. Pixels farther away in depth from the focal point get a greater contribution from the blurred image, while pixels with a depth close to the focal point get a greater contribution from the sharp image.
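The per-pixel blend can be expressed as a linear interpolation driven by distance from the focal depth. This Python sketch assumes depth values have already been converted to linear view-space distance (a raw depth buffer is nonlinear and would need linearizing first) and introduces a hypothetical `focal_range` parameter controlling how quickly pixels become fully blurred; the game’s actual falloff curve is not described.

```python
def dof_blend(sharp, blurred, depth, focal_depth, focal_range):
    """Blend sharp and blurred images per pixel by distance from the focal depth."""
    out = []
    for s_row, b_row, d_row in zip(sharp, blurred, depth):
        row = []
        for s, b, d in zip(s_row, b_row, d_row):
            # 0 at the focal depth, rising linearly to 1 at focal_range away (clamped).
            t = min(abs(d - focal_depth) / focal_range, 1.0)
            row.append(s * (1.0 - t) + b * t)  # equivalent to GLSL mix(s, b, t)
        out.append(row)
    return out


# Three pixels: in focus, half blurred, fully blurred.
result = dof_blend([[1.0, 1.0, 1.0]], [[0.0, 0.0, 0.0]], [[5.0, 6.0, 10.0]],
                   focal_depth=5.0, focal_range=2.0)
```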

Figure 10. Blurred, out-of-focus copy of the game scene.

Figure 11. Depth of field in action on Bay Trail. We restricted this effect to the menu and other non-gameplay screens, because accurate distance vision is important in racing.

The net result (Figure 11) is a fairly good approximation of the depth of field you would get from a camera.


Heat Haze

Figure 12. Heat haze was reserved for the start grid, where it gave a realistic heat feel to the cars before the race start.

Heat haze effects try to simulate the air shimmer you see rising from heated objects in sunlight (Figure 12). The effect is created by applying an animated distortion to the original color buffer. To confine the effect to the region around the car, it is masked by an alpha-channel image (Figure 13).
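One simple way to model the masked, animated distortion is to offset each pixel’s lookup by a time-varying wave scaled by the mask. In this Python sketch the sine wave, its `amplitude`, `frequency` and `speed`, and the horizontal-only offset are all assumptions (production heat haze often uses a scrolling noise texture instead); the mask is a single-channel image in [0, 1] where 0 leaves the pixel untouched.

```python
import math


def heat_haze(color, mask, time, amplitude=1.5, frequency=0.8, speed=4.0):
    """Distort the color buffer horizontally with an animated wave, scaled by the mask."""
    h, w = len(color), len(color[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        # The wave depends on the row and on time, so the shimmer animates each frame.
        wave = math.sin(y * frequency + time * speed)
        for x in range(w):
            offset = amplitude * wave * mask[y][x]            # mask 0 = no distortion
            sx = min(max(int(round(x + offset)), 0), w - 1)   # clamp lookup to the edge
            out[y][x] = color[y][sx]
    return out


color = [[float(x) for x in range(10)] for _ in range(4)]
still = heat_haze(color, [[0.0] * 10 for _ in range(4)], time=0.0)
wavy = heat_haze(color, [[1.0] * 10 for _ in range(4)], time=0.0)
```

With a zero mask the output equals the input, which is how the effect stays confined to the masked region around the cars.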

Figure 13. Heat haze mask generated from the camera viewpoint.

The effect was confined to the starting grid because accurate distance vision is essential to successful racing.


Getting started on Optimization

Developers often view game optimization as a path of diminishing returns. By that I mean that a lot of work generally goes into optimizing a game to an average frame time of 33ms for 30FPS, but there is generally no point optimizing a game past 30FPS, because that is the rate at which it will be expected to run. However, on mobile devices this is not necessarily true. In our case, we had about 12ms worth of effects to add, which would have increased the frame time to 45ms (nearer 22FPS). This meant we had to remove 12ms from an already optimized game to achieve our final target frame rate with all the effects turned on.
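The frame budget arithmetic can be checked in a few lines of Python. The 12ms figure comes from the text; the helper names are purely illustrative.

```python
def frame_ms(target_fps):
    """Frame time in milliseconds for a given frame rate."""
    return 1000.0 / target_fps


def fps(frame_time_ms):
    """Frame rate for a given frame time in milliseconds."""
    return 1000.0 / frame_time_ms


budget = frame_ms(30)          # about 33.3ms per frame at 30FPS
with_effects = budget + 12.0   # game already at budget, plus about 12ms of new effects
print(f"budget {budget:.1f} ms, with effects {with_effects:.1f} ms -> {fps(with_effects):.1f} FPS")
# budget 33.3 ms, with effects 45.3 ms -> 22.1 FPS
```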

The place to start in any optimization process is to determine whether the game is GPU or CPU bound, that is, whether GPU or CPU code needs optimizing to improve frame time. Using the System Analyzer in Graphics Performance Analyzers (GPA), we captured data for the following graph:

Figure 14. GPU Busy hovers around 90-100% for most of the race, while the CPU averages around 25%. It’s fairly clear that the app is GPU bound, which is reasonable for a racing game.

It’s pretty easy to make this graph. Simply add the metrics you want to System Analyzer, then click the “CSV” button to dump them to a CSV file. You can then load the file into Excel* or other graphing software.
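Once the metrics are in a CSV file, a quick script can stand in for the spreadsheet when all you want is the averages. This Python sketch is hypothetical: the column names `GPU Busy` and `CPU Load` and the 85% threshold are assumptions, and the real headers depend on which metrics you added in System Analyzer. Treat the classification as a rough heuristic; a single saturated CPU thread can still limit the frame rate while average CPU utilization looks low.

```python
import csv
import io


def average_columns(csv_text, columns):
    """Average the named numeric columns of a CSV export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    sums = {c: 0.0 for c in columns}
    rows = 0
    for row in reader:
        for c in columns:
            sums[c] += float(row[c])
        rows += 1
    if rows == 0:
        raise ValueError("CSV contains no data rows")
    return {c: sums[c] / rows for c in columns}


def classify(gpu_busy, cpu_load, busy_threshold=85.0):
    """Rough heuristic: whichever unit is near saturation is the likely bottleneck."""
    if gpu_busy >= busy_threshold and cpu_load < busy_threshold:
        return "GPU bound"
    if cpu_load >= busy_threshold and gpu_busy < busy_threshold:
        return "CPU bound"
    return "inconclusive"


# Hypothetical export; real column names depend on the metrics you selected.
sample = """Time,GPU Busy,CPU Load
0.0,95,24
0.5,92,26
"""
avg = average_columns(sample, ["GPU Busy", "CPU Load"])
print(classify(avg["GPU Busy"], avg["CPU Load"]))  # GPU bound
```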

A lot of developers don’t know that GPA works great on x86 mobile devices. It’s a great set of tools and well worth looking at.

Drilling down on a frame

We used System Analyzer to capture a number of frames from the game before the effects were added, then opened them in Frame Analyzer to see what low-hanging fruit we could find. Figure 15 shows one of these frames, which I used a lot in the early stages:


Figure 15. The frame is split into two halves. Some big GPU events occur in the second half.

First and most obvious are the two calls to glClear() in the second half of the frame, highlighted in purple in the frame graph. This is an issue I often find in engines: render targets tend to be cleared first even though they are going to be fully written later. Removing these calls was an easy fix that gave us about 5ms, getting us well on the way.

The big blue bar between the two glClear() calls is an interesting event. We had been experimenting with the screen sizes of earlier development kits that Gameloft received, and with very large screens (2560x1900) it was more efficient for them to render to a lower-resolution back buffer and then upsample to the full-size screen. The event in question is the upsample from the back buffer to the screen. This is a huge event and needed some scrutiny. What I found was that, for most of this erg, the EUs (execution units) on the GPU were stalled waiting for the texture sampler. This actually made sense, because the fragment shader was very simple and the texture being copied was huge (more than 8 MB), so naturally the shader would spend a lot of time waiting for the data it needed. This led me to think that we could probably render to a full-size render target and get rid of the upsample. The net result was not a performance improvement, because the time we gained was used up rendering to the larger target. What we did gain was a fair bit of visual improvement.

The last thing of note in this frame was the four big ergs labeled A, B, C, and D. You may have noticed that my approach here is to look at all the big ergs and see what can be done to remove or reduce them. That’s the best way to get started with Graphics Performance Analyzers. In the case of these four ergs, we could do very little: they are the four cars in view in the frame. This is a racing game, so it is only right to devote a fair amount of rendering time to the cars.
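The “look at the big ergs first” approach amounts to sorting ergs by GPU time and working down the list. This Python sketch assumes per-erg durations have already been exported as hypothetical `(label, milliseconds)` pairs; it returns the largest ergs that together cover a chosen share of the frame’s GPU time.

```python
def biggest_ergs(ergs, share=0.8):
    """Return the largest ergs that together account for `share` of total GPU time."""
    total = sum(duration for _, duration in ergs)
    picked, covered = [], 0.0
    for label, duration in sorted(ergs, key=lambda e: e[1], reverse=True):
        if covered >= share * total:
            break
        picked.append((label, duration))
        covered += duration
    return picked


# Hypothetical per-erg GPU times in milliseconds.
frame = [("car A", 4.0), ("car B", 3.0), ("upsample", 5.0), ("ui", 0.5), ("misc", 0.5)]
top = biggest_ergs(frame)
```

Here three ergs cover more than 80% of the frame, so they are the ones worth scrutinizing first.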

Platform Analyzer Investigation

One place we looked for performance was Platform Analyzer, a relatively new tool in GPA. With Platform Analyzer you can look at the CPU and GPU holistically and see how the queues are managed on the GPU (Figure 16). I was startled to see that we had a driver problem that was hurting us:


Figure 16. From Platform Analyzer. Horizontal scale is time; the stacked chart at the top shows queue depth on the GPU.

At first it looks as though the GPU queues are always full and everything is fine. However, looking closely at the marked points, we noticed that about every 10 frames an event occurred that stalled the whole process and drained the queue, grinding almost to a halt before starting up again. We spent some time looking for a periodic draw call with some kind of dependency, but it was hard to know what to look for.
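Spotting a stall that recurs every 10 or so frames is easier with numbers than by eye. As a hypothetical aid, assuming you have per-frame times (for example, from a CSV export), this Python sketch flags frames much slower than the median and reports the gaps between them; a constant gap suggests a periodic cause. The 1.5× factor is an arbitrary assumption.

```python
from statistics import median


def find_stalls(frame_times_ms, factor=1.5):
    """Indices of frames that took more than `factor` times the median frame time."""
    typical = median(frame_times_ms)
    return [i for i, t in enumerate(frame_times_ms) if t > factor * typical]


def stall_intervals(indices):
    """Gaps, in frames, between consecutive stalls."""
    return [b - a for a, b in zip(indices, indices[1:])]


times = [33.0] * 30
for i in (5, 15, 25):
    times[i] = 60.0  # simulated periodic stall
stalls = find_stalls(times)
gaps = stall_intervals(stalls)
```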

This one turned out to be an Intel graphics driver issue. As often happens with prerelease hardware, the drivers were still being worked on. The stall had been fixed a few weeks earlier, but we hadn’t updated the driver because we were otherwise happy with the one we had. We’re not sure of the actual improvement in frames per second, but we did get a much smoother frame rate as a result of the driver fix.

Drilling down into the effects

At this point we had gained about 5ms of frame time and improved the visual quality. We still needed to find about another 7ms, so we decided to look at the effects themselves. We weren’t going to skimp on visual quality, but since the effects were all new, we thought there might be some performance gains to find.


Figure 17. Frame Analyzer capture with bloom and light-shaft effects added. Note that the glClear() calls are gone; predictably, a lot more time is spent in the second half of the frame, where all the post-processing for the effects takes place.

Looking at this frame (Figure 17), we were drawn to the ergs labeled B and C, which turned out to be the blur and bright passes for the bloom effect. These were consuming 3-4ms each, which seemed a little high. After investigating, Gameloft made some significant changes to the effect that resulted in a large performance increase.

Firstly, they found that the blur stages were being executed on a full-screen texture. This was reduced to a quarter of the screen size, and as a result the blur almost dropped off the Frame Analyzer display altogether.

Secondly, the bright pass render target was full HD. Gameloft found that it could safely be reduced to about half the screen size without visible changes, gaining another significant performance increase.
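To see why shrinking the render targets pays off so well, it helps to count pixels. The article does not say whether “a quarter of the screen size” means a quarter of the pixels or a quarter of each dimension, so this Python sketch shows both; it uses 1920x1080 because the bright pass target was full HD, and the blur radius of 4 is hypothetical. Texture fetches are a reasonable proxy for sampler load, which the upsample investigation showed can dominate simple full-screen passes.

```python
def pixels(width, height, axis_scale=1.0):
    """Pixel count of a render target scaled by `axis_scale` on both axes."""
    return int(width * axis_scale) * int(height * axis_scale)


def separable_blur_fetches(width, height, axis_scale, radius):
    """Texture fetches for a horizontal plus vertical blur with 2*radius+1 taps per pass."""
    taps_per_pixel = 2 * (2 * radius + 1)
    return pixels(width, height, axis_scale) * taps_per_pixel


full = separable_blur_fetches(1920, 1080, 1.0, radius=4)
for scale in (0.5, 0.25):
    reduced = separable_blur_fetches(1920, 1080, scale, radius=4)
    print(f"axis scale {scale}: {full / reduced:.0f}x fewer fetches")
```

Halving each axis already removes three quarters of the work; quartering each axis removes fifteen sixteenths of it.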

After the bloom render targets had been optimized and we had gained some performance, we started to look more closely at the bloom itself. The general consensus was that the bloom looked a little washed out (Figure 18), so after verifying that the blur and bright pass textures looked OK, we took a look at the shaders.

The math in the bloom shader looked a little complex compared to a typical bloom shader, such as this fragment:

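// blur: sampler2D holding the blurred bright pass; threshold and bloomFactor assumed to be uniforms
lowp vec4 bloom = texture2D(blur, vCoord0) * 1.5 - threshold;
gl_FragColor = bloom * bloomFactor;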

As an experiment, I used a little-known feature of GPA Frame Analyzer that lets you modify shaders in a captured frame and recompile them to see the difference in appearance, performance, and so on. It didn’t take long to write a shader that did a simple bloom within the confines of the frame (you could change the source, but at the time you couldn’t touch inputs or constants in GPA).

The new shader ran a tiny bit faster than the original, but the significant contribution from the shader changes was visual quality. As a result, a new shader was created for the bloom pass, which made the effect significantly better. Compare Figure 18 with Figure 19 to see the difference.


Figure 18. Bloom effect showing the “washed out feel” of the shadows and the rocks on the left.

Figure 19. New bloom, which looks almost HDR compared to the old one.


Conclusions

The aim of this project was to take a game already optimized to 30FPS and optimize it further, freeing enough milliseconds per frame to make room for about 12ms of effects. We managed to pull about 7ms from the game itself and save another 5ms from the effects themselves and as a result of driver fixes. We proved that modern mobile devices like Bay Trail are capable of executing effects that were previously reserved for consoles and desktop GPUs. None of what we did would have been possible without GPA and a great working relationship with Gameloft.

About Gameloft

A leading global publisher of digital and social games, Gameloft® has established itself as one of the top innovators in its field since 2000. Gameloft creates games for all digital platforms, including feature phones, smartphones, tablets, set-top boxes and connected TVs. Gameloft operates its own established franchises such as Asphalt®, Real Football®, Modern Combat and Order & Chaos®, and also partners with major rights holders including Marvel®, Hasbro®, FOX®, Mattel® and Disney®. Gameloft is present on all continents, distributes its games in over 120 countries and employs over 5,200 developers.

For more information, consult http://www.gameloft.com.

About the Author

Steve is a senior application engineer at Intel, providing technical support to game developers in the areas of 3D graphics enabling and multi-threading solutions on PC and mobile devices. Steve has 14 years of experience as a programmer in the gaming industry where he worked on 11 titles, went through 2 bankruptcies, and generally had a good time. Steve is a keen gamer, writes and plays music, and isn’t a writer!


Notices

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Any software source code reprinted in this document is furnished under a software license and may only be used or copied in accordance with the terms of that license.

Intel, the Intel logo, and Intel Atom are trademarks of Intel Corporation in the U.S. and/or other countries.

Copyright © 2014 Intel Corporation. All rights reserved.

*Other names and brands may be claimed as the property of others.