cranking floating point performance up to 11

162
Cranking Floating Point Performance Up To 11 Noel Llopis Snappy Touch http://twitter.com/snappytouch [email protected] http://gamesfromwithin.com

Upload: john-wilker

Post on 18-May-2015

1.560 views

Category:

Technology


1 download

DESCRIPTION

The iPhone has a surprisingly powerful engine under that shiny hood when it comes to floating-point computations. This is something that surprises a lot of programmers because by default, things can slow down a lot whenever any floating point numbers are involved. This session will explain the secrets to unlocking maximum performance for floating point calculations, from the mysteries of Thumb mode, to harnessing the full power of the forgotten vector floating point unit. Stay away from this session if he thought of reading or even (gasp!) writing assembly code scares you.

TRANSCRIPT

Page 1: Cranking Floating Point Performance Up To 11

Cranking Floating Point Performance Up To 11 

Noel LlopisSnappy Touch

http://twitter.com/[email protected]

http://gamesfromwithin.com

Page 2: Cranking Floating Point Performance Up To 11
Page 3: Cranking Floating Point Performance Up To 11
Page 4: Cranking Floating Point Performance Up To 11
Page 5: Cranking Floating Point Performance Up To 11
Page 6: Cranking Floating Point Performance Up To 11
Page 7: Cranking Floating Point Performance Up To 11

Floating Point Performance

Page 8: Cranking Floating Point Performance Up To 11

Floating point numbers

Page 9: Cranking Floating Point Performance Up To 11

Floating point numbers

• Representation of rational numbers

Page 10: Cranking Floating Point Performance Up To 11

Floating point numbers

• Representation of rational numbers

• 1.2345, -0.8374, 2.0000, 14388439.34, etc

Page 11: Cranking Floating Point Performance Up To 11

Floating point numbers

• Representation of rational numbers

• 1.2345, -0.8374, 2.0000, 14388439.34, etc

• Following IEEE 754 format

Page 12: Cranking Floating Point Performance Up To 11

Floating point numbers

• Representation of rational numbers

• 1.2345, -0.8374, 2.0000, 14388439.34, etc

• Following IEEE 754 format

• Single precision: 32 bits

Page 13: Cranking Floating Point Performance Up To 11

Floating point numbers

• Representation of rational numbers

• 1.2345, -0.8374, 2.0000, 14388439.34, etc

• Following IEEE 754 format

• Single precision: 32 bits

• Double precision: 64 bits

Page 14: Cranking Floating Point Performance Up To 11

Floating point numbers

Page 15: Cranking Floating Point Performance Up To 11

Floating point numbers

Page 16: Cranking Floating Point Performance Up To 11

Why floating point performance?

Page 17: Cranking Floating Point Performance Up To 11

Why floating point performance?

• Most games use floating point numbers for most of their calculations

Page 18: Cranking Floating Point Performance Up To 11

Why floating point performance?

• Most games use floating point numbers for most of their calculations

• Positions, velocities, physics, etc, etc.

Page 19: Cranking Floating Point Performance Up To 11

Why floating point performance?

• Most games use floating point numbers for most of their calculations

• Positions, velocities, physics, etc, etc.

• Maybe not so much for regular apps

Page 20: Cranking Floating Point Performance Up To 11

CPU

Page 21: Cranking Floating Point Performance Up To 11

CPU

• 32-bit RISC ARM 11

Page 22: Cranking Floating Point Performance Up To 11

CPU

• 32-bit RISC ARM 11

• 400-535Mhz

Page 23: Cranking Floating Point Performance Up To 11

CPU

• 32-bit RISC ARM 11

• 400-535Mhz

• iPhone 2G/3G and iPod Touch 1st and 2nd gen

Page 24: Cranking Floating Point Performance Up To 11

CPU (iPhone 3GS)

Page 25: Cranking Floating Point Performance Up To 11

CPU (iPhone 3GS)

• Cortex-A8 600MHz

Page 26: Cranking Floating Point Performance Up To 11

CPU (iPhone 3GS)

• Cortex-A8 600MHz

• More advanced architecture

Page 27: Cranking Floating Point Performance Up To 11

CPU

Page 28: Cranking Floating Point Performance Up To 11

CPU

• No floating point support in the ARM CPU!!!

Page 29: Cranking Floating Point Performance Up To 11

How about integer math?

Page 30: Cranking Floating Point Performance Up To 11

How about integer math?

• No need to do any floating point operations

Page 31: Cranking Floating Point Performance Up To 11

How about integer math?

• No need to do any floating point operations

• Fully supported in the ARM processor

Page 32: Cranking Floating Point Performance Up To 11

How about integer math?

• No need to do any floating point operations

• Fully supported in the ARM processor

• But...

Page 33: Cranking Floating Point Performance Up To 11

Integer Divide

Page 34: Cranking Floating Point Performance Up To 11

Integer Divide

Page 35: Cranking Floating Point Performance Up To 11

Integer Divide

There is no integer divide

Page 36: Cranking Floating Point Performance Up To 11

Fixed-point arithmetic

Page 37: Cranking Floating Point Performance Up To 11

Fixed-point arithmetic

• Sometimes integer arithmetic doesn’t cut it

Page 38: Cranking Floating Point Performance Up To 11

Fixed-point arithmetic

• Sometimes integer arithmetic doesn’t cut it

• You need to represent rational numbers

Page 39: Cranking Floating Point Performance Up To 11

Fixed-point arithmetic

• Sometimes integer arithmetic doesn’t cut it

• You need to represent rational numbers

• Can use a fixed-point library.

Page 40: Cranking Floating Point Performance Up To 11

Fixed-point arithmetic

• Sometimes integer arithmetic doesn’t cut it

• You need to represent rational numbers

• Can use a fixed-point library.

• Performs rational arithmetic with integer values at a reduced range/resolution.

Page 41: Cranking Floating Point Performance Up To 11

Fixed-point arithmetic

• Sometimes integer arithmetic doesn’t cut it

• You need to represent rational numbers

• Can use a fixed-point library.

• Performs rational arithmetic with integer values at a reduced range/resolution.

• Not so great...

Page 42: Cranking Floating Point Performance Up To 11

Floating point support

Page 43: Cranking Floating Point Performance Up To 11

Floating point support

• There’s a floating point unit

Page 44: Cranking Floating Point Performance Up To 11

Floating point support

• There’s a floating point unit

• Compiled C/C++/ObjC code uses the VFP unit for any floating point operations.

Page 45: Cranking Floating Point Performance Up To 11

Sample program

Page 46: Cranking Floating Point Performance Up To 11

Sample program struct Particle { float x, y, z; float vx, vy, vz; };

Page 47: Cranking Floating Point Performance Up To 11

Sample program struct Particle { float x, y, z; float vx, vy, vz; };

for (int i=0; i<MaxParticles; ++i){ Particle& p = s_particles[i]; p.x += p.vx*dt; p.y += p.vy*dt; p.z += p.vz*dt; p.vx *= drag; p.vy *= drag; p.vz *= drag;}

Page 48: Cranking Floating Point Performance Up To 11

Sample program struct Particle { float x, y, z; float vx, vy, vz; };

for (int i=0; i<MaxParticles; ++i){ Particle& p = s_particles[i]; p.x += p.vx*dt; p.y += p.vy*dt; p.z += p.vz*dt; p.vx *= drag; p.vy *= drag; p.vz *= drag;}

• 7.2 seconds on an iPod Touch 2nd gen

Page 49: Cranking Floating Point Performance Up To 11

Floating point support

Page 50: Cranking Floating Point Performance Up To 11

Floating point support

Trust no one!

Page 51: Cranking Floating Point Performance Up To 11

Floating point support

Trust no one!When in doubt, check the

assembly generated

Page 52: Cranking Floating Point Performance Up To 11

Floating point support

Page 53: Cranking Floating Point Performance Up To 11

Thumb Mode

Page 54: Cranking Floating Point Performance Up To 11

Thumb Mode

Page 55: Cranking Floating Point Performance Up To 11

Thumb Mode• CPU has a special thumb

mode.

Page 56: Cranking Floating Point Performance Up To 11

Thumb Mode• CPU has a special thumb

mode.

• Less memory, maybe better performance.

Page 57: Cranking Floating Point Performance Up To 11

Thumb Mode• CPU has a special thumb

mode.

• Less memory, maybe better performance.

• No floating point support.

Page 58: Cranking Floating Point Performance Up To 11

Thumb Mode• CPU has a special thumb

mode.

• Less memory, maybe better performance.

• No floating point support.

• Every time there’s an fp operation, it switches out of Thumb, does the fp operation, and switches back on.

Page 59: Cranking Floating Point Performance Up To 11

Thumb Mode

Page 60: Cranking Floating Point Performance Up To 11

Thumb Mode

• It’s on by default!

Page 61: Cranking Floating Point Performance Up To 11

Thumb Mode

• It’s on by default!

• Potentially HUGE wins turning it off.

Page 62: Cranking Floating Point Performance Up To 11

Thumb Mode

• It’s on by default!

• Potentially HUGE wins turning it off.

Page 63: Cranking Floating Point Performance Up To 11

Thumb Mode

Page 64: Cranking Floating Point Performance Up To 11

Thumb Mode

• Turning off Thumb mode increased performance in Flower Garden by over 2x

Page 65: Cranking Floating Point Performance Up To 11

Thumb Mode

• Turning off Thumb mode increased performance in Flower Garden by over 2x

• Heavy usage of floating point operations though

Page 66: Cranking Floating Point Performance Up To 11

Thumb Mode

• Turning off Thumb mode increased performance in Flower Garden by over 2x

• Heavy usage of floating point operations though

• Most games will probably benefit from turning it off (especially 3D games)

Page 67: Cranking Floating Point Performance Up To 11
Page 68: Cranking Floating Point Performance Up To 11

2.6 seconds!

Page 69: Cranking Floating Point Performance Up To 11

ARM assemblyDISCLAIMER:

Page 70: Cranking Floating Point Performance Up To 11

ARM assembly

I’m not an ARM assembly expert!!!DISCLAIMER:

Page 71: Cranking Floating Point Performance Up To 11

ARM assembly

I’m not an ARM assembly expert!!!DISCLAIMER:

Page 72: Cranking Floating Point Performance Up To 11

ARM assembly

I’m not an ARM assembly expert!!!DISCLAIMER:

Page 73: Cranking Floating Point Performance Up To 11

ARM assembly

I’m not an ARM assembly expert!!!DISCLAIMER:

Z80!!!

Page 74: Cranking Floating Point Performance Up To 11

ARM assembly

Page 75: Cranking Floating Point Performance Up To 11

ARM assembly

• Hit the docs

Page 76: Cranking Floating Point Performance Up To 11

ARM assembly

• Hit the docs

• References included in your USB card

Page 77: Cranking Floating Point Performance Up To 11

ARM assembly

• Hit the docs

• References included in your USB card

• Or download them from the ARM site

Page 78: Cranking Floating Point Performance Up To 11

ARM assembly

• Hit the docs

• References included in your USB card

• Or download them from the ARM site

• http://bit.ly/arminfo

Page 79: Cranking Floating Point Performance Up To 11

ARM assembly

Page 80: Cranking Floating Point Performance Up To 11

ARM assembly

• Reading assembly is a very important skill for high-performance programming

Page 81: Cranking Floating Point Performance Up To 11

ARM assembly

• Reading assembly is a very important skill for high-performance programming

• Writing is more specialized. Most people don’t need to.

Page 82: Cranking Floating Point Performance Up To 11

VFP unit

Page 83: Cranking Floating Point Performance Up To 11

VFP unitA0

Page 84: Cranking Floating Point Performance Up To 11

VFP unitA0

+

Page 85: Cranking Floating Point Performance Up To 11

VFP unitA0

B0+

Page 86: Cranking Floating Point Performance Up To 11

VFP unitA0

B0+

=

Page 87: Cranking Floating Point Performance Up To 11

VFP unitA0

B0+

C0=

Page 88: Cranking Floating Point Performance Up To 11

VFP unitA0

B0+

C0=

A1

B1+

C1=

Page 89: Cranking Floating Point Performance Up To 11

VFP unitA0

B0+

C0=

A1

B1+

C1=

A2

B2+

C2=

Page 90: Cranking Floating Point Performance Up To 11

VFP unitA0

B0+

C0=

A1

B1+

C1=

A2

B2+

C2=

A3

B3+

C3=

Page 91: Cranking Floating Point Performance Up To 11

VFP unit

Page 92: Cranking Floating Point Performance Up To 11

VFP unitA0 A1 A2 A3

Page 93: Cranking Floating Point Performance Up To 11

VFP unit

+A0 A1 A2 A3

Page 94: Cranking Floating Point Performance Up To 11

VFP unit

+A0 A1 A2 A3

B0 B1 B2 B3

Page 95: Cranking Floating Point Performance Up To 11

VFP unit

+

=

A0 A1 A2 A3

B0 B1 B2 B3

Page 96: Cranking Floating Point Performance Up To 11

VFP unit

+

=

A0 A1 A2 A3

B0 B1 B2 B3

C0 C1 C2 C3

Page 97: Cranking Floating Point Performance Up To 11

VFP unit

+

=

A0 A1 A2 A3

B0 B1 B2 B3

C0 C1 C2 C3

Sweet! How do we use the vfp?

Page 98: Cranking Floating Point Performance Up To 11

"fldmias %2, {s8-s23} \n\t" "fldmias %1!, {s0-s3} \n\t" "fmuls s24, s8, s0 \n\t" "fmacs s24, s12, s1 \n\t"

"fldmias %1!, {s4-s7} \n\t"

"fmacs s24, s16, s2 \n\t" "fmacs s24, s20, s3 \n\t" "fstmias %0!, {s24-s27} \n\t"

Like this!

Page 99: Cranking Floating Point Performance Up To 11

Writing vfp assembly

Page 100: Cranking Floating Point Performance Up To 11

Writing vfp assembly

• There are two parts to it

Page 101: Cranking Floating Point Performance Up To 11

Writing vfp assembly

• There are two parts to it

• How to write any assembly in gcc

Page 102: Cranking Floating Point Performance Up To 11

Writing vfp assembly

• There are two parts to it

• How to write any assembly in gcc

• Learning ARM and VPM assembly

Page 103: Cranking Floating Point Performance Up To 11

vfpmath library

Page 104: Cranking Floating Point Performance Up To 11

vfpmath library

• Already done a lot of work for you

Page 105: Cranking Floating Point Performance Up To 11

vfpmath library

• Already done a lot of work for you

• http://code.google.com/p/vfpmathlibrary

Page 106: Cranking Floating Point Performance Up To 11

vfpmath library

• Already done a lot of work for you

• http://code.google.com/p/vfpmathlibrary

• Vector/matrix math

Page 107: Cranking Floating Point Performance Up To 11

vfpmath library

• Already done a lot of work for you

• http://code.google.com/p/vfpmathlibrary

• Vector/matrix math

• Might not be exactly what you need, but it’s a great starting point

Page 108: Cranking Floating Point Performance Up To 11

Assembly in gcc

• Only use it when targeting the device

Page 109: Cranking Floating Point Performance Up To 11

Assembly in gcc

• Only use it when targeting the device

#include <TargetConditionals.h>#if (TARGET_IPHONE_SIMULATOR == 0) && (TARGET_OS_IPHONE == 1) #define USE_VFP#endif

Page 110: Cranking Floating Point Performance Up To 11

Assembly in gcc

• The basics

asm (“cmp r2, r1”);

Page 112: Cranking Floating Point Performance Up To 11

Assembly in gcc

• Multiple lines

asm ( “mov r0, #1000\n\t” “cmp r2, r1\n\t”);

Page 113: Cranking Floating Point Performance Up To 11

Assembly in gcc• Accessing C variables

asm (//assembly code : // output operands : // input operands : // clobbered registers);

Page 114: Cranking Floating Point Performance Up To 11

Assembly in gcc• Accessing C variables

asm (//assembly code : // output operands : // input operands : // clobbered registers);

int src = 19; int dest = 0; asm volatile ( "add %0, %1, #42" : "=r" (dest) : "r" (src) : );

Page 115: Cranking Floating Point Performance Up To 11

Assembly in gcc• Accessing C variables

asm (//assembly code : // output operands : // input operands : // clobbered registers);

int src = 19; int dest = 0; asm volatile ( "add %0, %1, #42" : "=r" (dest) : "r" (src) : );

%0, %1, etc are the variables in order

Page 116: Cranking Floating Point Performance Up To 11

Assembly in gcc

Page 117: Cranking Floating Point Performance Up To 11

Assembly in gcc int src = 19; int dest = 0; asm volatile ( "add r10, %1, #42\n\t" "add %0, r10, #33\n\t" : "=r" (dest) : "r" (src) : "r10" );

Page 118: Cranking Floating Point Performance Up To 11

Assembly in gcc int src = 19; int dest = 0; asm volatile ( "add r10, %1, #42\n\t" "add %0, r10, #33\n\t" : "=r" (dest) : "r" (src) : "r10" );

Clobber register list are registers used by

the asm block

Page 119: Cranking Floating Point Performance Up To 11

Assembly in gcc int src = 19; int dest = 0; asm volatile ( "add r10, %1, #42\n\t" "add %0, r10, #33\n\t" : "=r" (dest) : "r" (src) : "r10" );

Clobber register list are registers used by

the asm block

volatile prevents “optimizations”

Page 120: Cranking Floating Point Performance Up To 11

VFP asmFour banks of 8 32-bit registers each

Page 121: Cranking Floating Point Performance Up To 11

VFP asmFour banks of 8 32-bit registers each

#define VFP_VECTOR_LENGTH(VEC_LENGTH) "fmrx r0, fpscr \n\t" \ "bic r0, r0, #0x00370000 \n\t" \ "orr r0, r0, #0x000" #VEC_LENGTH "0000 \n\t" \ "fmxr fpscr, r0 \n\t"

Page 122: Cranking Floating Point Performance Up To 11

VFP asm

Page 123: Cranking Floating Point Performance Up To 11

VFP asm

Page 124: Cranking Floating Point Performance Up To 11

VFP asmfor (int i=0; i<MaxParticles; ++i){ Particle& p = s_particles[i]; p.x += p.vx*dt; p.y += p.vy*dt; p.z += p.vz*dt; p.vx *= drag; p.vy *= drag; p.vz *= drag;}

Page 125: Cranking Floating Point Performance Up To 11

VFP asm for (int i=0; i<MaxParticles; ++i) { Particle* p = &s_particles[i]; asm volatile ( "fldmias %0, {s0-s5} \n\t" "fldmias %1, {s6-s8} \n\t" "fldmias %2, {s9-s11} \n\t" "fmacs s0, s3, s6 \n\t" "fmuls s3, s3, s9 \n\t" "fstmias %0, {s0-s5} \n\t" : "=r" (p) : "r" (p), "r" (dtArray), "r" (dragArray) : ); }

for (int i=0; i<MaxParticles; ++i){ Particle& p = s_particles[i]; p.x += p.vx*dt; p.y += p.vy*dt; p.z += p.vz*dt; p.vx *= drag; p.vy *= drag; p.vz *= drag;}

Page 126: Cranking Floating Point Performance Up To 11

VFP asm for (int i=0; i<MaxParticles; ++i) { Particle* p = &s_particles[i]; asm volatile ( "fldmias %0, {s0-s5} \n\t" "fldmias %1, {s6-s8} \n\t" "fldmias %2, {s9-s11} \n\t" "fmacs s0, s3, s6 \n\t" "fmuls s3, s3, s9 \n\t" "fstmias %0, {s0-s5} \n\t" : "=r" (p) : "r" (p), "r" (dtArray), "r" (dragArray) : ); }

for (int i=0; i<MaxParticles; ++i){ Particle& p = s_particles[i]; p.x += p.vx*dt; p.y += p.vy*dt; p.z += p.vz*dt; p.vx *= drag; p.vy *= drag; p.vz *= drag;}

Was: 2.6 seconds

Page 127: Cranking Floating Point Performance Up To 11

VFP asm for (int i=0; i<MaxParticles; ++i) { Particle* p = &s_particles[i]; asm volatile ( "fldmias %0, {s0-s5} \n\t" "fldmias %1, {s6-s8} \n\t" "fldmias %2, {s9-s11} \n\t" "fmacs s0, s3, s6 \n\t" "fmuls s3, s3, s9 \n\t" "fstmias %0, {s0-s5} \n\t" : "=r" (p) : "r" (p), "r" (dtArray), "r" (dragArray) : ); }

for (int i=0; i<MaxParticles; ++i){ Particle& p = s_particles[i]; p.x += p.vx*dt; p.y += p.vy*dt; p.z += p.vz*dt; p.vx *= drag; p.vy *= drag; p.vz *= drag;}

Was: 2.6 secondsNow: 1.4 seconds!!

Page 128: Cranking Floating Point Performance Up To 11

VFP asmLet’s do 6 operations at once!

struct Particle2 { float x0, y0, z0; float x1, y1, z1; float vx0, vy0, vz0; float vx1, vy1, vz1; };

Page 129: Cranking Floating Point Performance Up To 11

VFP asm for (int i=0; i<iterations; ++i) { Particle2* p = &s_particles2[i]; asm volatile ( "fldmias %0, {s0-s11} \n\t" "fldmias %1, {s12-s17} \n\t" "fldmias %2, {s18-s23} \n\t" "fmacs s0, s6, s12 \n\t" "fmuls s6, s6, s18 \n\t" "fstmias %0, {s0-s11} \n\t" : "=r" (p) : "r" (p), "r" (dtArray), "r" (dragArray) : ); }

Page 130: Cranking Floating Point Performance Up To 11

VFP asm for (int i=0; i<iterations; ++i) { Particle2* p = &s_particles2[i]; asm volatile ( "fldmias %0, {s0-s11} \n\t" "fldmias %1, {s12-s17} \n\t" "fldmias %2, {s18-s23} \n\t" "fmacs s0, s6, s12 \n\t" "fmuls s6, s6, s18 \n\t" "fstmias %0, {s0-s11} \n\t" : "=r" (p) : "r" (p), "r" (dtArray), "r" (dragArray) : ); } Was: 1.4 seconds

Page 131: Cranking Floating Point Performance Up To 11

VFP asm for (int i=0; i<iterations; ++i) { Particle2* p = &s_particles2[i]; asm volatile ( "fldmias %0, {s0-s11} \n\t" "fldmias %1, {s12-s17} \n\t" "fldmias %2, {s18-s23} \n\t" "fmacs s0, s6, s12 \n\t" "fmuls s6, s6, s18 \n\t" "fstmias %0, {s0-s11} \n\t" : "=r" (p) : "r" (p), "r" (dtArray), "r" (dragArray) : ); } Was: 1.4 seconds

Now: 1.2 seconds

Page 132: Cranking Floating Point Performance Up To 11

VFP asmWhat’s the loop/cache overhead?

for (int i=0; i<MaxParticles; ++i) { Particle* p = &s_particles[i]; p->x = p->vx; p->y = p->vy; p->z = p->vz; }

Page 133: Cranking Floating Point Performance Up To 11

VFP asmWhat’s the loop/cache overhead?

for (int i=0; i<MaxParticles; ++i) { Particle* p = &s_particles[i]; p->x = p->vx; p->y = p->vy; p->z = p->vz; }

Was: 1.2 seconds

Page 134: Cranking Floating Point Performance Up To 11

VFP asmWhat’s the loop/cache overhead?

for (int i=0; i<MaxParticles; ++i) { Particle* p = &s_particles[i]; p->x = p->vx; p->y = p->vy; p->z = p->vz; }

Was: 1.2 secondsNow: 1.2 seconds!!!!

Page 135: Cranking Floating Point Performance Up To 11
Page 136: Cranking Floating Point Performance Up To 11

Matrix multiply

Page 137: Cranking Floating Point Performance Up To 11

Matrix multiplyStraight from vfpmathlib

Page 138: Cranking Floating Point Performance Up To 11

Matrix multiply

Touch: 0.037919 s

Straight from vfpmathlib

Page 139: Cranking Floating Point Performance Up To 11

Matrix multiply

Touch: 0.037919 sNormal: 0.096855 s

Straight from vfpmathlib

Page 140: Cranking Floating Point Performance Up To 11

Matrix multiply

Touch: 0.037919 sNormal: 0.096855 sVFP: 0.042216 s

Straight from vfpmathlib

Page 141: Cranking Floating Point Performance Up To 11

Matrix multiply

Touch: 0.037919 sNormal: 0.096855 sVFP: 0.042216 s

About 2x faster!

Straight from vfpmathlib

Page 142: Cranking Floating Point Performance Up To 11

Good use of vfp

Page 143: Cranking Floating Point Performance Up To 11

Good use of vfp

• Matrix operations

Page 144: Cranking Floating Point Performance Up To 11

Good use of vfp

• Matrix operations

• Particle systems

Page 145: Cranking Floating Point Performance Up To 11

Good use of vfp

• Matrix operations

• Particle systems

• Skinning

Page 146: Cranking Floating Point Performance Up To 11

Good use of vfp

• Matrix operations

• Particle systems

• Skinning

• Physics

Page 147: Cranking Floating Point Performance Up To 11

Good use of vfp

• Matrix operations

• Particle systems

• Skinning

• Physics

• Procedural content generation

Page 148: Cranking Floating Point Performance Up To 11

Good use of vfp

• Matrix operations

• Particle systems

• Skinning

• Physics

• Procedural content generation

• ....

Page 149: Cranking Floating Point Performance Up To 11

What about the 3GS?

Page 150: Cranking Floating Point Performance Up To 11

What about the 3GS?

3G 3GS

Thumb

Normal

VFP1

VFP2

Touch

7.2 8.0

2.6 2.6

1.4 1.30

1.2 0.64

1.2 0.18

Page 151: Cranking Floating Point Performance Up To 11

What about the 3GS?

3G 3GS

Thumb

Normal

VFP1

VFP2

Touch

7.2 8.0

2.6 2.6

1.4 1.30

1.2 0.64

1.2 0.18

Page 152: Cranking Floating Point Performance Up To 11

What about the 3GS?

3G 3GS

Thumb

Normal

VFP1

VFP2

Touch

7.2 8.0

2.6 2.6

1.4 1.30

1.2 0.64

1.2 0.18

Page 153: Cranking Floating Point Performance Up To 11

What about the 3GS?

3G 3GS

Thumb

Normal

VFP1

VFP2

Touch

7.2 8.0

2.6 2.6

1.4 1.30

1.2 0.64

1.2 0.18

Page 154: Cranking Floating Point Performance Up To 11

What about the 3GS?

3G 3GS

Thumb

Normal

VFP1

VFP2

Touch

7.2 8.0

2.6 2.6

1.4 1.30

1.2 0.64

1.2 0.18

Page 155: Cranking Floating Point Performance Up To 11

What about the 3GS?

3G 3GS

Thumb

Normal

VFP1

VFP2

Touch

7.2 8.0

2.6 2.6

1.4 1.30

1.2 0.64

1.2 0.18

Page 156: Cranking Floating Point Performance Up To 11

What about the 3GS?

3G 3GS

Thumb

Normal

VFP1

VFP2

Touch

7.2 8.0

2.6 2.6

1.4 1.30

1.2 0.64

1.2 0.18

Page 157: Cranking Floating Point Performance Up To 11

More 3GS: NEON

Page 158: Cranking Floating Point Performance Up To 11

More 3GS: NEON

• SIMD coprocessor

Page 159: Cranking Floating Point Performance Up To 11

More 3GS: NEON

• SIMD coprocessor

• Floating point and integer

Page 160: Cranking Floating Point Performance Up To 11

More 3GS: NEON

• SIMD coprocessor

• Floating point and integer

• Huge potential

Page 161: Cranking Floating Point Performance Up To 11

More 3GS: NEON

• SIMD coprocessor

• Floating point and integer

• Huge potential

• Very little documentation right now :-(

Page 162: Cranking Floating Point Performance Up To 11

Thank you!

Noel LlopisSnappy Touch

http://twitter.com/[email protected]

http://gamesfromwithin.com