dco1 performance measurement and improvement lecture 7

Post on 18-Dec-2015


DCO 1

Performance Measurement and Improvement

Lecture 7

2

Practical Hints

This lecture gives practical methods for improving program performance:

Hidden Trouble

Fast Allocation and Free

3

Hidden Trouble

First, look at memory allocation with malloc(). It turns up in three places:

malloc in printf

Malloc

Strings

4

malloc in printf

In the usual implementation, printf() causes malloc() to be called.

This can add an unexpected cost.

String manipulation is expensive in general, whether it is formatting text as in printf(), reading ASCII text and converting it to numbers, or performing string comparisons.

When no formatting is needed, it is better to use puts() than printf().

5

Malloc

malloc() (or new in C++) is expensive.

A common solution is to use static or local variables to avoid allocating memory on the heap.

Another solution is to keep a list of objects that need to be allocated often. Then allocation is just a matter of removing an object from the list, and freeing simply inserts the object back on the list.

Where you can, it is better not to use malloc().
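The free-list idea can be sketched as follows. This is only an illustration: the node type and the function names node_alloc() and node_release() are made up for this example, and the sketch is not thread-safe.

```c
#include <stdlib.h>

/* Hypothetical object type, used only for illustration. */
typedef struct node {
    struct node *next;   /* links recycled objects together on the free list */
    int payload;
} node;

static node *free_list = NULL;   /* head of the list of recycled objects */

/* Take an object from the free list; fall back to malloc() when it is empty. */
node *node_alloc(void)
{
    node *n = free_list;
    if (n != NULL) {
        free_list = n->next;     /* unlink the first recycled object */
        return n;
    }
    return malloc(sizeof(node));
}

/* "Free" an object by pushing it onto the list for later reuse. */
void node_release(node *n)
{
    n->next = free_list;
    free_list = n;
}
```

After the first few allocations, most calls to node_alloc() cost only a pointer update instead of a trip through the general-purpose allocator.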

6

Strings

Microsoft Foundation Class (MFC) CString class allocates dynamic memory. This is great if you want to avoid managing memory yourself and you want to avoid nasty bugs due to writing data beyond the end of allocated string memory. On the other hand, if you find that memory allocation is taking significant time in an inner loop, you might want to consider allocating a fixed-length character array as local or static data.
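A fixed-length character array in plain C might look like the sketch below. The size NAME_MAX_LEN and the helper copy_name() are assumptions for illustration, not part of MFC; the point is that no heap allocation happens and an over-long string is truncated instead of overrunning the buffer.

```c
#include <string.h>

#define NAME_MAX_LEN 64   /* assumed buffer size, including the '\0' */

/* Copy src into a fixed-size buffer, truncating if necessary.
   Returns 1 if the whole string fitted, 0 if it was cut short. */
int copy_name(char dst[NAME_MAX_LEN], const char *src)
{
    size_t len = strlen(src);
    int fits = len < NAME_MAX_LEN;

    if (!fits)
        len = NAME_MAX_LEN - 1;   /* leave room for the terminator */
    memcpy(dst, src, len);
    dst[len] = '\0';
    return fits;
}
```

Inside an inner loop the buffer would be declared as a local, for example char name[NAME_MAX_LEN];, so each pass reuses the same stack storage.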

7

Fast Allocation and Free

One way to obtain faster performance is to take a large block of memory from the heap and allocate smaller chunks from it while computing some result.

After the result is obtained, the entire block is freed.

This is fast for two reasons:

Memory is allocated simply by incrementing the "free" pointer by the number of bytes you need to allocate.

There is no need to free each allocated object; you free all objects at once by freeing the entire pool.
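A minimal sketch of such a pool is shown here. The type pool, the functions pool_init(), pool_alloc() and pool_destroy(), and the 16-byte alignment are all assumptions chosen for this example.

```c
#include <stddef.h>
#include <stdlib.h>

#define POOL_ALIGN 16   /* assumed alignment, a power of two */

/* Hypothetical pool: one big block plus the offset of the next free byte. */
typedef struct {
    unsigned char *base;  /* start of the block obtained from malloc() */
    size_t size;          /* total bytes in the block */
    size_t used;          /* the "free" pointer, kept as an offset */
} pool;

/* Grab the whole block up front. Returns 1 on success, 0 on failure. */
int pool_init(pool *p, size_t size)
{
    p->base = malloc(size);
    p->size = size;
    p->used = 0;
    return p->base != NULL;
}

/* Allocate by bumping the offset; sizes are rounded up to keep alignment. */
void *pool_alloc(pool *p, size_t n)
{
    size_t rounded = (n + POOL_ALIGN - 1) & ~(size_t)(POOL_ALIGN - 1);
    void *chunk;

    if (rounded < n || rounded > p->size - p->used)
        return NULL;              /* overflow, or not enough room left */
    chunk = p->base + p->used;
    p->used += rounded;
    return chunk;
}

/* Free every chunk at once by freeing the block itself. */
void pool_destroy(pool *p)
{
    free(p->base);
    p->base = NULL;
    p->size = p->used = 0;
}
```

The trade-off is that individual chunks cannot be returned early; the scheme suits work where everything allocated for one result dies together.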

8

Coding for Speed

The material that follows is mainly from this web site: http://www.abarnett.demon.co.uk/tutorial.html

There are many ways to speed up the operation:

Array Indices

Aliases

Registers

Integers

Loop Jamming

Dynamic Loop Unrolling

Faster for() loops

Switch

Pointers

Early loop breaking

Misc

Using array indices

9

Array Indices

An example using switch and if-else:

    switch ( queue ) {
    case 0 :
        letter = 'W';
        break;
    case 1 :
        letter = 'S';
        break;
    case 2 :
        letter = 'U';
        break;
    }

or maybe

    if ( queue == 0 )
        letter = 'W';
    else if ( queue == 1 )
        letter = 'S';
    else
        letter = 'U';

10

Array Indices

A quicker method is to simply use the value as an index into a character array, e.g.

    static char *classes = "WSU";
    letter = classes[queue];

In this case, classes[0] is 'W', classes[1] is 'S' and classes[2] is 'U'.
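The table lookup is only safe when the index is in range, which switch and if-else handle implicitly. A sketch with a range check is shown below; the function name queue_letter() and the '?' fallback are assumptions for illustration.

```c
/* Lookup table: index 0 -> 'W', 1 -> 'S', 2 -> 'U'. */
static const char classes[] = "WSU";

/* Return the letter for a queue number, or '?' (an assumed fallback)
   when the value is outside the table. */
char queue_letter(unsigned int queue)
{
    if (queue >= sizeof classes - 1)   /* -1 skips the terminating '\0' */
        return '?';
    return classes[queue];
}
```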

11

Aliases (1)

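    void func1( int *data )
    {
        int i;
        for(i=0; i<10; i++)
        {
            somefunc2( *data, i);
        }
    }

Not very good: somefunc2() might change the value that data points to, so the compiler may have to reload *data from memory on every iteration.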

12

Aliases (2): better to change to this

    void func1( int *data )
    {
        int i;
        int localdata;
        localdata = *data;
        for(i=0; i<10; i++)
        {
            somefunc2( localdata, i);
        }
    }

Better way: the value is read once into a local variable, which the compiler can keep in a register.

13

Registers

The "register" declaration asks the compiler to keep a variable in a CPU register, e.g.

    register float  val;
    register double dval;
    register int    ival;

This may be faster, but it is only a hint: the compiler is good at register allocation and usually makes this choice itself.

14

Integers

Use unsigned ints instead of ints if you know the value will never be negative.

"unsigned int a;" is better than "int a;".

Some processors can handle unsigned integer arithmetic considerably faster than signed, e.g. use

    unsigned int i;

instead of

    int i;

Integer arithmetic is faster than floating-point arithmetic.

15

Loop Jamming

Never use two loops where one is enough:

    for(i=0; i<100; i++) {
        stuff();
    }
    for(i=0; i<100; i++) {
        morestuff();
    }

It is better to combine them.

16

Loop Jamming

It would be better to do:

    for(i=0; i<100; i++) {
        stuff();
        morestuff();
    }

17

Example – three loops (0.36ms)

18

Example – one loop (0.31ms)

19

Loop Unrolling and Dynamic Loop Unrolling

    for(i=0; i<3; i++) {
        something(i);
    }

is less efficient than

    something(0);
    something(1);
    something(2);

because the loop has to check and increment the value of i on every pass.
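When the iteration count is not known until run time, the loop can still be partly unrolled. The sketch below handles four elements per pass and finishes the leftovers in a short clean-up loop; the function name sum_unrolled() and the unroll factor of four are assumptions for illustration.

```c
#include <stddef.h>

/* Sum n integers, handling four elements per pass. */
long sum_unrolled(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;

    /* Main body: one loop test and one increment per four elements. */
    for (; i + 4 <= n; i += 4) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }

    /* Clean-up loop for the 0 to 3 leftover elements. */
    for (; i < n; i++)
        total += a[i];

    return total;
}
```

Optimising compilers often perform this transformation themselves, so it is worth timing both versions before keeping the longer one.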

20

Example – two for loops (0.96ms)

21

Example – one for loop (0.52ms)

22

Faster for loop

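Ordinarily, you would code a simple for() loop like this:

    for( i=0; i<10; i++ ) { ... }

i loops through the values 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.

If you don't care about the order of the loop counter, you can do this instead:

    for( i=10; i--; ) { ... }

Here the test reads i as 10, 9, 8, ... and stops when it reads 0, so inside the body i runs 9, 8, 7, ..., 0.

Decrementing can be faster because the loop test is a comparison against zero.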

23

Faster for loop

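The syntax is a little strange, but is perfectly legal. The same number of iterations could also be gained by coding:

    for(i=10; i; i--) { ... }

or (to expand it further)

    for(i=10; i!=0; i--) { ... }

Note that in these two forms i runs 10, 9, ..., 1 inside the body, not 9, 8, ..., 0.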
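The counter values seen by each loop form can be checked with a small sketch. The three function names are made up for this example; each loop runs ten times and returns the sum of the values of i seen inside the body.

```c
/* Counting up: the body sees 0, 1, ..., 9. */
int sum_count_up(void)
{
    int i, sum = 0;
    for (i = 0; i < 10; i++)
        sum += i;
    return sum;
}

/* Post-decrement in the test: the body sees 9, 8, ..., 0. */
int sum_post_decrement(void)
{
    int i, sum = 0;
    for (i = 10; i--; )
        sum += i;
    return sum;
}

/* Test against zero, decrement afterwards: the body sees 10, 9, ..., 1. */
int sum_test_nonzero(void)
{
    int i, sum = 0;
    for (i = 10; i != 0; i--)
        sum += i;
    return sum;
}
```

The first two return 45 and the third returns 55, so the forms are interchangeable only when the body does not depend on the exact value of i.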

24

Example – int and increment (1.51ms)

25

Example – unsigned int, decrement (1.29ms)

26

Use switch() instead of if...else...

For large decisions involving if...else...else..., like this:

    if( val == 1)
        dostuff1();
    else if (val == 2)
        dostuff2();
    else if (val == 3)
        dostuff3();

it may be faster to use a switch:

    switch( val )
    {
        case 1: dostuff1(); break;
        case 2: dostuff2(); break;
        case 3: dostuff3(); break;
    }

It is better to change to switch/case.

27

Pointers

Whenever possible, pass structures by reference (i.e. pass a pointer to the structure), so that the structure is not copied on each call:

    void print_data( const bigstruct *data_pointer )
    {
        ...printf contents of structure...
    }
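The difference between the two calling styles can be sketched as follows. The layout of bigstruct and both function names are made up for illustration; the slide does not say what the structure contains.

```c
/* Hypothetical large structure; the fields are invented for this example. */
typedef struct {
    int id;
    double samples[1024];
} bigstruct;

/* By value: the whole structure (typically over 8 KB here) is copied
   for every call. */
double first_sample_by_value(bigstruct data)
{
    return data.samples[0];
}

/* By reference: only a pointer is passed; const documents that the
   function does not modify the caller's structure. */
double first_sample_by_pointer(const bigstruct *data_pointer)
{
    return data_pointer->samples[0];
}
```

Both functions return the same value; only the cost of the call differs.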

28

Early loop breaking

This loop searches a list of 10000 numbers to see if there is a -99 in it.

    found = FALSE;
    for(i=0; i<10000; i++)
    {
        if( list[i] == -99 )
        {
            found = TRUE;
        }
    }
    if( found ) printf("Yes, there is a -99. Hooray!\n");

This works, but it always searches the whole list, even after the value has been found.

29

Early loop breaking

A better way is to abort the search as soon as the value is found.

    found = FALSE;
    for(i=0; i<10000; i++)
    {
        if( list[i] == -99 )
        {
            found = TRUE;
            break;
        }
    }
    if( found ) printf("Yes, there is a -99. Hooray!\n");

30

Suggestion (1)

Avoid using ++ and -- etc. within loop expressions, e.g. while(n--){}, as this can sometimes be harder to optimise.

Minimize the use of global variables.

Declare anything within a file (external to functions) as static, unless it is intended to be global.

Use word-size variables if you can, as the machine can work with these better (instead of char, short, double, bitfields etc.).

31

Suggestion (2)

Don't use recursion. Recursion can be very elegant and neat, but creates many more function calls, which can become a large overhead.

Avoid the sqrt() square root function in loops: calculating square roots is very CPU intensive.

Single-dimension arrays are faster than multi-dimensional arrays. (a[16] is better than a[4][4].)

Compilers can often optimise a whole file. Avoid splitting off closely related functions into separate files; the compiler will do better if it can see both of them together (it might be able to inline the code, for example).

32

Example - without recursion

33

Example - with recursion (366 ms), I already reduced the number of recursions

34

Suggestion (3)

Single precision maths may be faster than double precision; there is often a compiler switch for this. (float is better than double unless you really need the extra precision.)

Floating-point multiplication is often faster than division: use val * 0.5 instead of val / 2.0.

Addition can be quicker than multiplication: use val + val + val instead of val * 3.

puts() is quicker than printf(), although less flexible.
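The multiply-instead-of-divide hint can be sketched like this. The function names halve_all() and scale_all() are assumptions for illustration. Multiplying by 0.5 gives exactly the same result as dividing by 2.0; for a general divisor, multiplying by a precomputed reciprocal may differ from true division in the last bit.

```c
/* Halve every element with one multiplication each.
   0.5 is exactly representable, so v[i] * 0.5 equals v[i] / 2.0. */
void halve_all(double *v, int n)
{
    int i;
    for (i = 0; i < n; i++)
        v[i] *= 0.5;
}

/* For a divisor known only at run time: divide once outside the loop,
   then multiply inside it. */
void scale_all(double *v, int n, double divisor)
{
    double inv = 1.0 / divisor;   /* the only division */
    int i;
    for (i = 0; i < n; i++)
        v[i] *= inv;
}
```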

35

Example - float (4 bytes) and double (8 bytes)

36

Suggestion (4)

Use #defined macros instead of commonly used tiny functions. Sometimes the bulk of CPU usage can be tracked down to a small external function being called thousands of times in a tight loop. Replacing it with a macro to perform the same job will remove the overhead of all those function calls, and allow the compiler to be more aggressive in its optimisation.

Binary/unformatted file access is faster than formatted access, as the machine does not have to convert between human-readable ASCII and machine-readable binary. If you don't actually need to read the data in a file yourself, consider making it a binary file.
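A tiny function and its macro replacement might look like the sketch below. The names square_fn(), SQUARE() and sum_of_squares() are made up for this example.

```c
/* Function version: each call may cost a call and return unless the
   compiler manages to inline it. */
int square_fn(int x)
{
    return x * x;
}

/* Macro version: expanded in place by the preprocessor.
   The argument and the whole expression are parenthesised, and the
   argument must not have side effects: SQUARE(i++) would increment twice. */
#define SQUARE(x) ((x) * (x))

/* Tight loop using the macro, so no function call happens per element. */
long sum_of_squares(const int *a, int n)
{
    long total = 0;
    int i;
    for (i = 0; i < n; i++)
        total += SQUARE(a[i]);
    return total;
}
```

Since C99, a static inline function is an alternative that keeps type checking and evaluates its argument only once.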

37

Summary

It is better to write a simple but fast program.

There are many ways to speed up the operation in programming.
