embedded systems programming

Embedded Systems Programming

Writing Optimised C code for ARM

Why write optimised C code?

• For embedded systems, code size and/or execution speed are of key importance

• The compiler optimisation phase can only do so much: it must generate code that is correct for every input the C source allows

• In order to write optimal C code you need to know details of the underlying hardware and the compiler

What compilers can’t do

void memclr(char *data, int N)
{
    for (; N > 0; N--)
    {
        *data = 0;
        data++;
    }
}

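• Is N == 0 on the first iteration?
  – If the test were moved to the bottom of the loop, N would be decremented from 0 to -1: dangerous!

• Is the data array 4-byte aligned?
  – Then it could be cleared using int (word) stores

• Is N a multiple of 4?
  – Then it could be cleared in blocks of 4 words at a time

• The compiler cannot know any of this, so it has to be conservative!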
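As a sketch of what the programmer can do with knowledge the compiler lacks: if the caller can guarantee that the buffer is word-aligned and that its length is a non-zero multiple of 4 words, the clear can use 32-bit stores, four per iteration, with the count tested only at the bottom of the loop. The name memclr_words and its preconditions are assumptions made for illustration, not part of the original code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical word-wise variant of memclr.
 * Assumed preconditions (the compiler cannot know these, the caller can):
 *   - data points to 4-byte aligned storage (hence uint32_t *),
 *   - nwords is a non-zero multiple of 4.
 * The preconditions are checked with assert in non-NDEBUG builds only. */
void memclr_words(uint32_t *data, unsigned int nwords)
{
    assert(nwords != 0 && nwords % 4 == 0);

    unsigned int blocks = nwords / 4;   /* four words cleared per pass */
    do {
        data[0] = 0;
        data[1] = 0;
        data[2] = 0;
        data[3] = 0;
        data += 4;
    } while (--blocks != 0);            /* blocks >= 1, so testing at the bottom is safe */
}
```

A caller would write, for example, `uint32_t buf[64]; memclr_words(buf, 64);`. Passing a length of 0, or one that is not a multiple of 4, breaks the contract, which is exactly the case the compiler must allow for in the original memclr.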

An example Program

• The program might seem fine, even resource friendly:

• Using a char for the loop counter seems to save space

• for loops seem to make good assembler

• Let's look at the assembler code

/* program showing inefficient
 * variable and loop
 * usage  craig Nov 04 */

int checksum_1(int *data)
{
    char i;
    int sum = 0;

    for (i = 0; i < 64; i++)
        sum += data[i];
    return sum;
}

    .text
    .align  2
    .global checksum_1
    .type   checksum_1,function
checksum_1:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 1, current_function_anonymous_args = 0
    mov     ip, sp
    stmfd   sp!, {fp, ip, lr, pc}
    sub     fp, ip, #4
    mov     r1, r0
    mov     r0, #0                  @ sum = 0
    mov     r2, r0                  @ i = 0
.L6:
    ldr     r3, [r1, r2, asl #2]    @ data[i]
    add     r0, r0, r3              @ sum += data[i]
    add     r3, r2, #1              @ i++
    and     r2, r3, #255
    cmp     r2, #63                 @ i < 64
    bls     .L6
    ldmea   fp, {fp, sp, pc}
.Lfe1:
    .size   checksum_1,.Lfe1-checksum_1

What is wrong?

• Because i is a char (unsigned by default on ARM), the compiler has to truncate it to 8 bits after every increment, using:
  – and r2, r3, #255

• The loop variable requires a register and initialisation

• If the loop is executed often, the test and branch on every iteration are quite an overhead

Variable sizes

• In general the compiler keeps local variables in 32-bit registers, but has to truncate or sign-extend them whenever they are used as 8- or 16-bit values

• If you can, use unsigned ints; if you can't, cast explicitly

• Signed shorts can be quite a problem for compilers, since every use may need extra shifts to sign-extend the value
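A minimal sketch of the first piece of advice applied to checksum_1: with an unsigned int counter the compiler can keep i in a full 32-bit register, so the `and r2, r3, #255` step in the checksum_1 listing should no longer be needed (the exact output depends on compiler version and flags). The second function sketches the explicit cast: when 8-bit wrap-around really is wanted, the arithmetic is done in unsigned int and narrowed once, where it is needed. Both function names are hypothetical.

```c
#include <stdint.h>

/* checksum_1 with an unsigned int counter instead of a char:
 * no truncation to 8 bits is required after i++. */
int checksum_1u(int *data)
{
    unsigned int i;
    int sum = 0;

    for (i = 0; i < 64; i++)
        sum += data[i];
    return sum;
}

/* Hypothetical helper: an 8-bit sequence number that must wrap at 256.
 * seq + step is evaluated as unsigned int; one explicit cast narrows it. */
uint8_t next_seq(uint8_t seq, unsigned int step)
{
    return (uint8_t)(seq + step);
}
```

For example, `next_seq(250, 10)` wraps to 4, and the narrowing happens in a single, visible place rather than after every operation.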

Watch your shorts!

• The C code below turns into some rather nasty assembler

• The GNU C compiler is very cautious when confronted with short variables: it sign-extends each short argument, and the result, with a pair of shifts by 16 bits

short add(short a, short b)
{
    return a + (b >> 1);
}

Becomes ….

    mov     ip, sp
    stmfd   sp!, {fp, ip, lr, pc}
    sub     fp, ip, #4
    mov     r1, r1, asl #16
    mov     r0, r0, asl #16
    mov     r0, r0, asr #16
    add     r0, r0, r1, asr #17
    mov     r0, r0, asl #16
    mov     r0, r0, asr #16
    ldmea   fp, {fp, sp, pc}
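Following the advice on variable sizes, here is a sketch of the same function with int arguments and result. With no short types involved there is nothing to sign-extend, so the asl #16 / asr #16 pairs should disappear (again, this depends on the compiler). If a caller really needs a short, it can narrow once, explicitly, where the value is stored, e.g. `short r = (short)add_int(x, y);`. The name add_int is hypothetical. Right-shifting a negative int is implementation-defined in C; GCC documents it as sign-extending, which matches the asr in the listing.

```c
/* int version of add(): arguments and result use the full
 * 32-bit register width, so no sign-extension is required.
 * Note: b >> 1 for negative b is implementation-defined in C;
 * GCC performs an arithmetic (sign-extending) shift. */
int add_int(int a, int b)
{
    return a + (b >> 1);
}
```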

Loops #1

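• As well as being a char, the loop counter in checksum_1 is used as an index, which is redundant: the data pointer itself can be advanced instead

• Terminate loops by counting down to 0: a count passed in as a parameter can be decremented in place, which saves a register and an initialisation, and the decrement (subs) sets the flags for the branch, so no separate compare is needed

• Use do...while instead of for loops when the loop is known to run at least once, so no test is needed before the first iteration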

Efficient loop C

/* Program to show efficient use of
 * variables and loops */
int checksum_2(int *data)
{
    int sum = 0, i = 64;

    do {
        sum += *(data++);
    } while (--i != 0);
    return sum;
}

Efficient loop assembler

checksum_2:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 1, current_function_anonymous_args = 0
    mov     ip, sp
    stmfd   sp!, {fp, ip, lr, pc}
    sub     fp, ip, #4
    mov     r1, r0
    mov     r0, #0                  @ sum = 0
    mov     r2, #64                 @ i = 64
.L6:
    ldr     r3, [r1], #4            @ *(data++)
    add     r0, r0, r3              @ sum += *(data++)
    subs    r2, r2, #1              @ --i
    bne     .L6
    ldmea   fp, {fp, sp, pc}
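The count-down do...while form relies on the loop running at least once, which is exactly the N == 0 question the compiler could not answer for memclr. A sketch for a variable-length checksum, assuming the caller may legitimately pass N == 0: one test before the loop handles that case, and the loop itself keeps the cheap decrement-and-branch at the bottom. The name checksum_n is hypothetical.

```c
/* Hypothetical variable-length checksum using a count-down do...while.
 * The single up-front test guards against n == 0: without it, --n would
 * wrap to UINT_MAX and the loop would run far past the end of the array. */
int checksum_n(const int *data, unsigned int n)
{
    int sum = 0;

    if (n == 0)
        return 0;
    do {
        sum += *(data++);
    } while (--n != 0);
    return sum;
}
```

The guard costs one compare and branch per call rather than per iteration, so the loop body stays as short as in checksum_2.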

Loop unrolling

• If a loop is repeated often, the test and branch on every iteration can be quite an overhead

• If the loop count is a multiple of 4 and the loop is executed often, the loop can be unrolled so that each iteration handles four elements

• This increases code size but is faster, since the loop overhead is paid once per four elements

• Loop counts that are not multiples of 4 can also be handled, but less efficiently

An unrolled loop

/* Program to show efficient use of
 * variables and loops & loop unrolling */
int checksum_2(int *data)
{
    int sum = 0, i = 64;

    do {
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        i -= 4;
    } while (i != 0);
    return sum;
}

checksum_2:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 1, current_function_anonymous_args = 0
    mov     ip, sp
    stmfd   sp!, {fp, ip, lr, pc}
    sub     fp, ip, #4
    mov     r2, r0
    mov     r0, #0
    mov     r1, #64
.L6:
    ldr     r3, [r2], #4
    add     r0, r0, r3
    ldr     r3, [r2], #4
    add     r0, r0, r3
    ldr     r3, [r2], #4
    add     r0, r0, r3
    ldr     r3, [r2], #4
    add     r0, r0, r3
    subs    r1, r1, #4
    bne     .L6
    ldmea   fp, {fp, sp, pc}
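Counting the instructions executed inside the loop of the rolled checksum_2 listing (under "Efficient loop assembler") and of the unrolled listing above shows where the saving comes from. This is a count of executed instructions only, not cycles: load latency and branch cost vary between ARM cores.

```latex
$$
\begin{aligned}
\text{rolled:}\quad & 64 \times (1_{\texttt{ldr}} + 1_{\texttt{add}} + 1_{\texttt{subs}} + 1_{\texttt{bne}}) = 256\\
\text{unrolled:}\quad & 16 \times (4_{\texttt{ldr}} + 4_{\texttt{add}} + 1_{\texttt{subs}} + 1_{\texttt{bne}}) = 160\\
\text{loop control only:}\quad & 64 \times 2 = 128 \quad\text{vs.}\quad 16 \times 2 = 32
\end{aligned}
$$
```

The useful work (64 loads and 64 adds) is identical in both versions; unrolling only cuts the subs/bne overhead by a factor of four, at the cost of six extra instructions of code.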

Loop unrolling when N is not a multiple of 4

/* Program to show use of
 * loop unrolling */
int checksum_2(int *data, unsigned int N)
{
    int sum = 0;
    unsigned int i;

    /* N/4 unrolled passes of four elements each */
    for (i = N/4; i != 0; i--) {
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
    }
    /* the remaining 0 to 3 elements, one at a time */
    for (i = N&3; i != 0; i--)
        sum += *(data++);
    return sum;
}
