Embedded Systems Programming
Writing Optimised C code for ARM
Why write optimised C code?
• For embedded systems, size and/or speed are of key importance
• The compiler optimisation phase can only do so much
• In order to write optimal C code you need to know details of the underlying hardware and the compiler
What compilers can’t do
void memclr(char *data, int N)
{
    for (; N > 0; N--)
    {
        *data = 0;
        data++;
    }
}
• Is N == 0 on the first loop?
  – Computing 0 - 1 is dangerous!
• Is the data array 4-byte aligned?
  – If so, it could store using int
• Is N a multiple of 4?
  – If so, it could clear blocks of 4 at a time
• Compilers have to be conservative!
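When the programmer knows what the compiler cannot, that knowledge can be written into the code. The following is a minimal sketch, assuming the caller guarantees a 4-byte-aligned buffer and a non-zero length expressed in 32-bit words. The name memclr_words and its word-count parameter are hypothetical and do not come from the slides.

```c
#include <stdint.h>

/* Hypothetical variant of memclr: the caller promises that data is
 * 4-byte aligned and that n_words (the length in 32-bit words) is
 * not zero, so the loop can store whole words and count down. */
void memclr_words(uint32_t *data, unsigned int n_words)
{
    do {
        *data++ = 0;            /* one 32-bit store clears 4 bytes */
    } while (--n_words != 0);
}
```

Because the guarantees live in the function's contract rather than in anything the compiler can prove, calling it with n_words == 0 would wrap the counter and overrun the buffer.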
An example Program
• The program might seem fine – even resource friendly
• Using a char saves space
• for loops make good assembler
• Let’s look at the assembler code
/* Program showing inefficient
 * variable and loop usage
 * craig Nov 04 */
int checksum_1(int *data)
{
    char i;
    int sum = 0;

    for (i = 0; i < 64; i++)
        sum += data[i];
    return sum;
}
        .text
        .align  2
        .global checksum_1
        .type   checksum_1,function
checksum_1:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 1, current_function_anonymous_args = 0
        mov     ip, sp
        stmfd   sp!, {fp, ip, lr, pc}
        sub     fp, ip, #4
        mov     r1, r0
        mov     r0, #0                  @ sum = 0
        mov     r2, r0                  @ i = 0
.L6:
        ldr     r3, [r1, r2, asl #2]    @ data[i]
        add     r0, r0, r3              @ sum += data[i]
        add     r3, r2, #1              @ i++
        and     r2, r3, #255            @ mask i back to 8 bits
        cmp     r2, #63                 @ i < 64
        bls     .L6
        ldmea   fp, {fp, sp, pc}
.Lfe1:
        .size   checksum_1,.Lfe1-checksum_1
What is wrong?
• The use of char means that the compiler has to mask the counter down to 8 bits on every iteration, using:
  – and r2, r3, #255
• The loop variable requires a register and initialisation
• If the loop is executed often then the test and branch are quite an overhead
Variable sizes
• In general the compiler will use 32-bit registers for local variables but will have to cast them when they are used as 8-bit or 16-bit values
• If you can, use unsigned ints; if you can’t, cast explicitly
• Using signed shorts can be quite a problem for compilers
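As a minimal sketch of the advice above, here is the 64-element checksum loop with an unsigned int counter in place of a char. The function name checksum_uint is an assumption made for illustration. With a counter that already fills a 32-bit register, the compiler should no longer need to mask it to 8 bits after each increment.

```c
/* Assumed variant of the checksum loop (not from the slides): the
 * counter is an unsigned int, so it matches the register width and
 * needs no narrowing after the increment. */
int checksum_uint(const int *data)
{
    unsigned int i;
    int sum = 0;

    for (i = 0; i < 64; i++)
        sum += data[i];
    return sum;
}
```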
Watch your shorts!
short add(short a, short b)
{
    return a + (b >> 1);
}

Becomes ….

        mov     ip, sp
        stmfd   sp!, {fp, ip, lr, pc}
        sub     fp, ip, #4
        mov     r1, r1, asl #16
        mov     r0, r0, asl #16
        mov     r0, r0, asr #16
        add     r0, r0, r1, asr #17
        mov     r0, r0, asl #16
        mov     r0, r0, asr #16
        ldmea   fp, {fp, sp, pc}

• The above C code turns into the rather nasty assembler shown: each asl #16 / asr #16 pair sign-extends a 16-bit value within a 32-bit register
• The GNU C compiler is very cautious when confronted with short variables
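One way to sidestep the sign-extension work in the short version of add is to do the arithmetic in int and narrow only where a short is really needed. This is a sketch under that assumption. The names add_int and add_short are hypothetical, and how many shift pairs the compiler actually removes depends on the compiler and its settings.

```c
/* Assumed alternative to add(): arguments and result are int, so the
 * values already fill a 32-bit register. */
int add_int(int a, int b)
{
    return a + (b >> 1);
}

/* Narrow once, explicitly, only where a short is really required. */
short add_short(short a, short b)
{
    return (short)add_int(a, b);
}
```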
Loops #1
• As well as being a char, the loop counter could be redundant as an array index
• Terminate loops by counting down to 0; this reduces register usage and needs no separate compare against the limit
• Use do..while instead of for loops when the body is known to run at least once
Efficient loop C
/* Program to show efficient use of
 * variables and loops */
int checksum_2(int *data)
{
    int sum = 0, i = 64;

    do {
        sum += *(data++);
    } while ( --i != 0 );
    return sum;
}
Efficient loop assembler
checksum_2:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 1, current_function_anonymous_args = 0
        mov     ip, sp
        stmfd   sp!, {fp, ip, lr, pc}
        sub     fp, ip, #4
        mov     r1, r0
        mov     r0, #0          @ sum = 0
        mov     r2, #64         @ i = 64
.L6:
        ldr     r3, [r1], #4    @ *(data++)
        add     r0, r0, r3      @ sum += *(data++)
        subs    r2, r2, #1      @ --i
        bne     .L6
        ldmea   fp, {fp, sp, pc}
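The same count-down pattern can take the element count as a parameter. The following is a minimal sketch, assuming the caller guarantees a non-zero count, since a do..while body always runs at least once. The name checksum_n is hypothetical.

```c
/* Hypothetical generalisation of checksum_2: the caller passes the
 * element count n and guarantees n != 0. */
int checksum_n(const int *data, unsigned int n)
{
    int sum = 0;

    do {
        sum += *data++;         /* pointer walks the array; no index */
    } while (--n != 0);         /* decrement doubles as the loop test */
    return sum;
}
```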
Loop unrolling
• If a loop is going to be repeated often then the test and branch can be quite an overhead
• If the loop count is a multiple of 4 and the loop is executed often then the loop can be unrolled
• This increases code size but runs faster
• Sizes that are not multiples of 4 can be done but are less efficient.
An unrolled loop
/* Program to show efficient use of
 * variables and loops & loop unrolling */
int checksum_2(int *data)
{
    int sum = 0, i = 64;

    do {
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        i -= 4;
    } while ( i != 0 );
    return sum;
}
checksum_2:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 1, current_function_anonymous_args = 0
        mov     ip, sp
        stmfd   sp!, {fp, ip, lr, pc}
        sub     fp, ip, #4
        mov     r2, r0
        mov     r0, #0
        mov     r1, #64
.L6:
        ldr     r3, [r2], #4
        add     r0, r0, r3
        ldr     r3, [r2], #4
        add     r0, r0, r3
        ldr     r3, [r2], #4
        add     r0, r0, r3
        ldr     r3, [r2], #4
        add     r0, r0, r3
        subs    r1, r1, #4
        bne     .L6
        ldmea   fp, {fp, sp, pc}
Loop unrolling when N is not a multiple of 4
/* Program to show use of
 * loop unrolling */
int checksum_2(int *data, unsigned int N)
{
    int sum = 0;
    unsigned int i;

    /* Unrolled part: N/4 passes of four elements each */
    for ( i = N/4; i != 0; i--)
    {
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
    }
    /* Remainder: the last N&3 elements, one at a time */
    for ( i = N&3; i != 0; i--)
        sum += *(data++);
    return sum;
}
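An unrolled loop with a remainder pass is easy to get subtly wrong, so it helps to compare it against a plain loop for every small length. The sketch below does that. The names checksum_ref, checksum_unrolled and unrolled_matches_ref are assumptions made for illustration, and the unrolled function simply follows the structure of the N-parameter version from the slides.

```c
/* Straightforward reference loop (assumed helper for comparison). */
int checksum_ref(const int *data, unsigned int n)
{
    int sum = 0;
    unsigned int i;

    for (i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

/* Unrolled by four, then a remainder loop for the last n & 3 items. */
int checksum_unrolled(const int *data, unsigned int n)
{
    int sum = 0;
    unsigned int i;

    for (i = n / 4; i != 0; i--) {
        sum += *data++;
        sum += *data++;
        sum += *data++;
        sum += *data++;
    }
    for (i = n & 3; i != 0; i--)
        sum += *data++;
    return sum;
}

/* Returns 1 if both versions agree for every length 0..max_n. */
int unrolled_matches_ref(const int *data, unsigned int max_n)
{
    unsigned int n;

    for (n = 0; n <= max_n; n++)
        if (checksum_unrolled(data, n) != checksum_ref(data, n))
            return 0;
    return 1;
}
```

Checking every length from 0 up to a little past 4 exercises each possible remainder (0, 1, 2 and 3) as well as the case where the unrolled loop does not run at all.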