GCC Compiler as a Performance Testing Tool for C Programs
Symbiosis International University Symbiosis Institute of Computer Studies and Research
SICSR
Verification and Validation Techniques:
By: Daniel Ilunga Musenge
Performance Testing with GCC Compiler
Academic year: 2014 - 2015
Verification and Validation Techniques GCC Compiler as a Performance Testing tool
Abstract
This paper introduces the GCC compiler as a performance testing tool. GCC is not normally regarded as a performance testing tool, but in this paper we discuss various ways in which it can be used for that purpose, based on the objectives of performance testing.
In performance testing, we do not focus on the bugs in the code under test; the purpose is to remove the bottlenecks in the code by taking into consideration parameters such as CPU time, memory usage, speed, stability and scalability.
To achieve this, we use the GCC optimization options. The different levels of optimization are discussed, together with a comparison between manual testing and automatic testing (using Unix/Linux commands).
1. GCC: Short for GNU Compiler Collection, GCC is a collection of compilers for programming languages including C, C++, Objective-C, Fortran, Java and Ada. At the time of writing, the current release of GCC is version 4.9.2.
2. Performance Testing:
• Software performance testing is a means of quality assurance (QA). It involves testing software applications to ensure that they will perform well under their expected workload.
• The features and functionality supported by a software system are not the only concern; a software application's performance, such as its response time, also matters. The goal of performance testing is not to find bugs but to eliminate performance bottlenecks. Bottlenecks are obstructions in a system that degrade overall system performance. Bottlenecking occurs when either coding errors or hardware issues cause a decrease in throughput under certain loads, and it is often caused by a single faulty section of code. The key to fixing a bottleneck is to find the section of code that is causing the slowdown and fix it there. Performance bottlenecks are usually eliminated either by fixing poorly running processes or by adding hardware.
• The focus of performance testing is checking a software program's:
– Speed - whether the application/software responds quickly.
– Scalability - the maximum user load the software application can handle. A software product suffers from poor scalability when it cannot handle the expected number of users or when it does not accommodate a wide enough range of users. Load testing should be done to make certain the application can handle the anticipated number of users.
– Stability - whether the application is stable under varying loads as well as under its expected workload.
• Some common performance bottlenecks are:
– CPU utilization
– Memory utilization
– Operating system limitations, etc.
→ GCC as a Performance Testing tool: GCC is not usually described as a testing tool. In this article, we use GCC for performance testing in particular; it can also support other testing techniques, such as black-box testing, which we do not demonstrate here. To begin, here is a quick review of how to compile and run a C program with GCC. In your terminal:
dimsconsultants$ gcc test.c -o test
This line tells the GCC compiler to compile the C source file test.c and, through the -o option, to write the resulting executable to a file named test (on Unix-like systems no .exe extension is added).
As mentioned above, the goal of performance testing is not to find bugs but to eliminate performance bottlenecks. We shall look at the following parameters for performance testing:
• Processor usage - the amount of time the processor spends executing non-idle threads.
• Memory use - the amount of physical memory available to processes on a computer.
• Response time - the time a system or functional unit takes to react to a given input. In other words, the response time is the total amount of time it takes to respond to a request for service. That service can be anything from a memory fetch, to a disk I/O, to a complex database query, to loading a full web page.
• Throughput - the rate at which a computer or network receives requests, e.g. requests per second.
• Top waits - monitored to determine which wait times can be cut down, for example how fast data is retrieved from memory.
• Thread counts - an application's health can be measured by the number of threads that are running and currently active.
• Garbage collection - returning unused memory back to the system. Garbage collection needs to be monitored for efficiency.
1. To check the speed of your C program, we use two techniques:
a) Manual time calculation, as in the examples below.
Test programs:
a) test01.c program
#include <stdio.h>
#include <time.h>

// function declaration
void sum(int no1, int no2);

int main(void) {
    clock_t start = clock();   // starting time (CPU time used so far)
    int num1 = 10;
    int num2 = 10;
    sum(num1, num2);
    clock_t end = clock();     // end time
    // convert to double before dividing, otherwise integer division truncates the result to 0
    double elapsed = (double)(end - start) / CLOCKS_PER_SEC;
    printf("\nElapse time: %.8f\n", elapsed);
    return 0;
}

void sum(int no1, int no2) {
    int num1 = no1;
    int num2 = no2;
    printf("%d", num1 + num2);
}

b) test02.c program
#include <stdio.h>
#include <time.h>

// function declaration
void sum(short int no1, short int no2);

int main(void) {
    clock_t start = clock();   // starting time
    short int num1 = 10, num2 = 10;
    sum(num1, num2);
    clock_t end = clock();     // end time
    double elapsed = (double)(end - start) / CLOCKS_PER_SEC;
    printf("\nElapse time: %.8f\n", elapsed);
    return 0;
}

void sum(short int no1, short int no2) {
    printf("%d", no1 + no2);   // short operands are promoted to int before the addition
}
c) test03.c program
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char *argv[]) {
    FILE *fp;
    int ch;          // int, not char, so that EOF can be told apart from a valid character
    int count = 0;

    printf("\n Reading a file...\n");
    clock_t start = clock(), end;
    fp = fopen("objective.txt", "r");
    if (fp == NULL) {
        printf("\n File not found! fp=NULL\n\n");
    } else {
        // read one character before the loop, so an empty file never enters the loop
        ch = fgetc(fp);
        while (ch != EOF) {
            // putchar(ch);
            ch = fgetc(fp);
            ++count;
        }
        printf("\n the no of char is = %d\n\n", count);
        fclose(fp);  // close the file only if it was actually opened
    }
    printf("Calculating...\n");
    end = clock() - start;
    printf("\n\nElapsed time: %.8f\n\n", ((double)end) / CLOCKS_PER_SEC);
    return 0;
}
d) test04.c program
#include <stdio.h>
#include <stdlib.h>

#define SIZE 30

void swap(int *x, int *y);
void selection_sort(int *a, const int n);
void display(const int a[], const int size);

int main(void) {
    int a[SIZE] = {8,5,2,3,1,6,9,4,0,7,8,5,2,3,12,36,9,44,102,71,8,45,22,3,11,66,9,4,0,7};
    printf("The array before sorting:\n");
    display(a, SIZE);
    selection_sort(a, SIZE);
    printf("The array after sorting:\n");
    display(a, SIZE);
    return 0;
}

/* swap two integers */
void swap(int *x, int *y) {
    int temp = *x;
    *x = *y;
    *y = temp;
}

/* perform selection sort */
void selection_sort(int *a, const int size) {
    int i, j, min;
    for (i = 0; i < size - 1; i++) {
        min = i;
        for (j = i + 1; j < size; j++) {
            if (a[j] < a[min]) {
                min = j;
            }
        }
        swap(&a[i], &a[min]);
    }
}

/* display array content */
void display(const int a[], const int size) {
    int i;
    for (i = 0; i < size; i++)
        printf("%d ", a[i]);
    printf("\n");
}
e) test05.c program
/**
 * Test C program
 * Author: Daniel Ilunga
 * Edited: 02/02/2015
 * Version: 1.0
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>   /* toupper(); used here in place of the custom "myctype.h" header */
#include <math.h>    /* sqrt(); on Linux, link with -lm */
#include <time.h>

void twinprime(void);
int prime(int number);
int frequency_of_primes(int n);
void str_comb(void);
void russianMethod(void);
void reverseArray(void);
void toupperCase1(void);

int main(void) {
    clock_t start = clock(), end;
    twinprime();      // twin prime
    str_comb();       // string combinations
    russianMethod();  // Russian method of multiplication
    reverseArray();   // reverse array function
    toupperCase1();   // toupperCase1() function
    printf("Calculating...\n");
    long int fq = frequency_of_primes(499999);
    printf("\nThe number of primes lower than 500,000 is: %ld\n", fq);
    end = clock() - start;
    printf("\n\nElapsed time: %.8f\n\n", ((double)end) / CLOCKS_PER_SEC);
    return 0;
}

/* two primes are twin primes when they differ by exactly 2 */
void twinprime(void) {
    int no1 = 11, no2 = 13;
    printf("\n ***** Twin Prime number Program ****** \n");
    if (abs(no1 - no2) == 2 && prime(no1) && prime(no2))
        printf("\n\nTwin prime!!\n\n");
    else
        printf("\n\nNot twin prime!!\n\n");
}

/* returns 1 if number is prime, 0 otherwise (trial division) */
int prime(int number) {
    int cnt;
    if (number < 2)
        return 0;
    for (cnt = 2; cnt <= number / 2; cnt++) {
        if (number % cnt == 0)
            return 0;   // found a divisor, so not prime
    }
    return 1;
}

/* counts the primes in 2..n by trial division up to sqrt(i) */
int frequency_of_primes(int n) {
    int i, j;
    int freq = n - 1;   // start by assuming every number in 2..n is prime
    printf("\n\n**** Count of all Prime Numbers < 500000 ****\n\n");
    for (i = 2; i <= n; ++i)
        for (j = (int)sqrt(i); j > 1; --j)
            if (i % j == 0) {
                --freq;     // i has a divisor, so it is not prime
                break;
            }
    return freq;
}

/* prints the arrangements produced by repeatedly swapping adjacent characters */
void str_comb(void) {
    int cnt = 4;
    char str[] = {'a', 'b', 'c', 'd'};
    char temp;
    printf("\n ***** String Combination Program ****** \n\n");
    for (int i = 0; i < cnt; ++i)
        printf("%c ", str[i]);
    printf("\n");
    for (int i = 0; i < cnt; i++) {
        for (int j = 0; j < cnt - 1; ++j) {
            temp = str[j];            // swap str[j] and str[j + 1]
            str[j] = str[j + 1];
            str[j + 1] = temp;
            for (int d = 0; d < cnt; ++d)
                printf("%c ", str[d]);
            printf("\n");
        }
    }
}

/* Russian method: halve num1, double num2, add the rows where the halved value is odd */
void russianMethod(void) {
    int num1 = 20, num2 = 13, num3;
    int j, count = 0, result = 0;
    printf("\n ***** Russian Method of Multiplication Program ****** \n\n");
    num3 = num1;                  // keep the initial value of num1
    while (num1 > 0) {            // number of halving steps = size of the arrays
        num1 = num1 / 2;
        ++count;
    }
    int arrdiv[count];            // halved values (C99 variable-length array)
    int arrmul[count];            // doubled values
    arrdiv[0] = num3;
    arrmul[0] = num2;
    for (j = 1; j < count; j++) {
        num3 = num3 / 2;
        num2 = num2 * 2;
        arrdiv[j] = num3;
        arrmul[j] = num2;
    }
    for (j = 0; j < count; j++) {
        if (arrdiv[j] % 2 != 0)
            result = result + arrmul[j];
    }
    printf("\n Using - Russian method of multiplication the result is = %d\n\n", result);
}

/* reads a word and prints it in upper case */
void toupperCase1(void) {
    int EXIT_SIZE = 15;           // size of the buffer
    int i, count;
    char *ptr;
    printf("\n\n**** ToUpperCase ****\n\n");
    ptr = malloc(EXIT_SIZE);
    if (ptr == NULL)
        return;
    printf("Enter your string = ");
    if (scanf("%14s", ptr) != 1) {   // at most EXIT_SIZE - 1 characters plus '\0'
        free(ptr);
        return;
    }
    count = (int)strlen(ptr);
    for (i = 0; i < count; ++i)
        ptr[i] = (char)toupper((unsigned char)ptr[i]);   // non-letters are left unchanged
    printf("\n%s\n\n", ptr);
    printf("The length of the string is = %d\n\n", count);
    free(ptr);
}

/* reads a word and prints it reversed */
void reverseArray(void) {
    int EXIT_SIZE = 20;           // buffer size, read from the user
    int i, count;
    char fmt[32];
    char *ptr, *rev;
    printf("\n\n**** Reverse array ****\n\n");
    printf("\n Enter the size of character array = ");
    if (scanf("%d", &EXIT_SIZE) != 1 || EXIT_SIZE < 2)
        return;
    ptr = malloc((size_t)EXIT_SIZE);   // input buffer
    rev = malloc((size_t)EXIT_SIZE);   // reversed copy
    if (ptr == NULL || rev == NULL) {
        free(ptr);
        free(rev);
        return;
    }
    snprintf(fmt, sizeof fmt, "%%%ds", EXIT_SIZE - 1);   // e.g. "%19s": never overflow the buffer
    printf("Enter your string = ");
    if (scanf(fmt, ptr) == 1) {
        count = (int)strlen(ptr);
        for (i = 0; i < count; ++i)
            rev[i] = ptr[count - 1 - i];   // copy the characters in reverse order
        rev[count] = '\0';
        printf("%s", rev);
        printf("\n\nThe length of the array is %d\n\n", count);
    }
    free(ptr);
    free(rev);
}

→ Compiling and running the program directly prints its output together with the elapsed time calculated manually inside test.c. The value depends on the operating system on which the program runs as well as on the machine characteristics.
dimsconsultants$ gcc test.c -Wall -o dims
dimsconsultants$ ./dims
Output: Elapse time: 0.00000000
- This manually calculated time is measured with clock(), which returns the processor time used by the program, so it corresponds to the CPU time (user + sys) reported by the time command rather than to the real (wall-clock) time.
- The -Wall option enables warnings for many common mistakes (despite its name, not all of them). To suppress all warnings, use -w instead. Remember that both options are case sensitive.
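The manual technique above measures CPU time only. To see CPU time and wall-clock time side by side inside one program, a minimal sketch is shown below. It assumes a POSIX system that provides clock_gettime() with CLOCK_MONOTONIC (Linux, and macOS 10.12 or later); the names time_call, wall_now and busy_work are hypothetical, chosen for illustration.

```c
/* _POSIX_C_SOURCE exposes clock_gettime() when compiling in strict ISO C modes */
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* One measurement, in seconds. */
struct timing {
    double cpu_seconds;   /* processor time used, comparable to user + sys */
    double wall_seconds;  /* elapsed time, comparable to real */
};

/* Current wall-clock time in seconds, from a clock that never jumps backwards. */
static double wall_now(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Runs fn() once and records both the CPU time and the wall-clock time it took. */
struct timing time_call(void (*fn)(void)) {
    struct timing t;
    clock_t c0 = clock();
    double w0 = wall_now();
    fn();
    /* convert to double before dividing, otherwise the result is truncated */
    t.cpu_seconds = (double)(clock() - c0) / CLOCKS_PER_SEC;
    t.wall_seconds = wall_now() - w0;
    return t;
}

/* Hypothetical CPU-bound workload; volatile stops the compiler removing the loop. */
void busy_work(void) {
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < 50000000UL; i++)
        sum += i;
}
```

A main() could call struct timing t = time_call(busy_work); and print both fields. For a purely CPU-bound function the two values are close; for a function that waits for keyboard input, as test05.c does, only wall_seconds grows.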
b) Automatic time calculation using the time command:
dimsconsultants$ gcc test.c -Wall -o dims
dimsconsultants$ time ./dims     # reports the time in three components
Output:
-> real 0m0.005s
-> user 0m0.001s
-> sys 0m0.002s
Note: the above program was executed on a machine with the following characteristics:
- Processor: 2.3 GHz Intel Core i7
- Memory: 16 GB 1600 MHz DDR3
This also means that the output will vary depending on the system on which the program is tested or executed. For instance, on a multi-processor machine, a multi-threaded process might have an elapsed (real) time lower than its total CPU time, because threads or processes may run in parallel. In addition, the time statistics come from different sources, so the recorded time of every short-running task can be subject to rounding errors. The meaning of the above output is explained below.
-> Real, User and Sys process time statistics: one of these is not like the others. Real refers to the actual elapsed time; User and Sys refer to CPU time used only by the process.
• Real is wall-clock time: the time from start to finish of the call. This is all elapsed time, including time slices used by other processes and time the process spends blocked (for example, while waiting for I/O to complete).
• User is the amount of CPU time spent in user-mode code (outside the kernel) within the process. This is only the actual CPU time used in executing the process. Other processes and time the process spends blocked do not count towards this figure.
• Sys is the amount of CPU time spent in the kernel within the process. This means CPU time spent executing system calls within the kernel, as opposed to library code, which still runs in user space. Like 'user', this is only CPU time used by the process. See below for a brief description of kernel mode (also known as 'supervisor' mode) and the system call mechanism.
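The same user/sys split can also be read from inside a running program. The sketch below assumes a POSIX system and uses getrusage() with RUSAGE_SELF, which reports the CPU time of the calling process (all of its threads); the helper names tv_seconds and process_cpu_times are hypothetical.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Converts a struct timeval (seconds + microseconds) to seconds. */
static double tv_seconds(struct timeval tv) {
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Stores in *user and *sys the CPU time this process has used so far,
   split the same way as the user and sys lines printed by time.
   Returns 0 on success, -1 if getrusage() fails. */
int process_cpu_times(double *user, double *sys) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    *user = tv_seconds(ru.ru_utime);   /* time spent executing in user mode */
    *sys = tv_seconds(ru.ru_stime);    /* time spent in the kernel on behalf of the process */
    return 0;
}
```

Calling process_cpu_times() at the end of main() and printing the two values should give figures close to the user and sys lines that time prints for the same run. Passing RUSAGE_CHILDREN instead of RUSAGE_SELF reports the CPU time of terminated child processes that have been waited for.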
To find out how much CPU time the process has used for its execution, we sum the user time and the system time (User + Sys). Remember also that this total is accumulated across all CPUs, so if the process has multiple threads, it can exceed the time reported by Real. In the output, the figures include the User and Sys time of all the threads of the process and of its child processes.
→ More about 'sys'
Normally, there are many things that the programmer's code cannot do in user mode, such as allocating memory from the operating system or requesting I/O. Everything within an operating system is under the supervision of the kernel. Operations such as malloc, calloc and fread/fwrite may invoke kernel functions, and that time is counted under 'sys'. However, this does not mean that every call to malloc is counted in 'sys' time. The call to malloc does some processing of its own (still counted in 'user' time) and then, somewhere along the way, may call a kernel function (counted in 'sys' time). After returning from the kernel call, there is some more time in 'user' mode, and then malloc returns to your code. When the switches into the kernel happen, and how much of the associated time is counted in kernel mode (sys time), depends on the implementation of the library. Also, other seemingly innocent functions may use malloc and the like in the background, which again adds some 'sys' time.
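To observe the user/sys distinction described above, the two hypothetical functions below do the same number of iterations in different ways: one only computes, the other makes one write() system call per iteration to /dev/null. This is a sketch assuming a POSIX system; the exact split between user and sys time depends on the operating system.

```c
#include <fcntl.h>
#include <unistd.h>

/* Pure computation: runs entirely in user mode, so it shows up as 'user' time. */
unsigned long user_mode_work(unsigned long n) {
    unsigned long acc = 0;
    for (unsigned long i = 0; i < n; i++)
        acc = acc * 31u + i;   /* arbitrary arithmetic that needs no help from the kernel */
    return acc;
}

/* One write() system call per byte to /dev/null: most of the time is spent
   inside the kernel, so it shows up as 'sys' time.
   Returns the number of bytes written, or -1 if /dev/null cannot be opened. */
long sys_mode_work(long n) {
    int fd = open("/dev/null", O_WRONLY);
    if (fd < 0)
        return -1;
    long written = 0;
    char c = 'x';
    for (long i = 0; i < n; i++) {
        if (write(fd, &c, 1) == 1)
            written++;
    }
    close(fd);
    return written;
}
```

Calling user_mode_work(100000000) from main() (printing its result, so the compiler cannot discard the loop) and timing the program with time should show mostly user time, while sys_mode_work(1000000) should show mostly sys time.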
How do I improve the performance of my C code? Two techniques are discussed here:
• Enable a higher level of compiler optimization (-O in gcc); how this can be done is discussed in more detail below.
• Turn off the debug option if it is enabled (-g in gcc). By default, gcc does not generate debugging information; -g has to be requested explicitly. GCC is designed so that -g does not change the generated machine code, so removing it mainly makes the output file smaller rather than faster. The -pg option, in contrast, adds profiling code for use with gprof: it helps to locate a bottleneck, but it adds a small run-time overhead instead of reducing the execution time.
→ Brief introduction to the different levels of optimization: There are seven -O settings, or levels of optimization: -O0, -O1, -O2, -O3, -Os, -Og and -Ofast. Use only one of them at a time (if several are given, the last one takes effect). Some users boast about even better performance obtained by using -O4, -O9 and so on, but in reality -O levels higher than 3 behave exactly like -O3: the compiler accepts CFLAGS such as -O4, but it only performs the -O3 optimizations, nothing more. With the exception of -O0, each -O setting activates several additional flags, so be sure to read the GCC manual's chapter on optimization options to learn which flags are activated at each -O level, together with an explanation of what they do.
Let us examine each optimization level:
-O0: This level (the letter "O" followed by a zero) turns off optimization entirely and is the default if no -O level is specified in CFLAGS or CXXFLAGS. This reduces compilation time and can improve debugging information, but some applications will not work properly without optimization enabled. This option is not recommended except for debugging purposes.
-O1: The most basic optimization level. The compiler tries to produce faster, smaller code without taking much compilation time. It is basic, but it should get the job done.
-O2: A step up from -O1 and the recommended level of optimization unless the system has special needs. -O2 activates a few more flags in addition to the ones activated by -O1. With -O2, the compiler attempts to increase code performance without compromising on size and without taking too much compilation time.
-O3: The highest level of optimization. It enables optimizations that are expensive in terms of compile time and memory usage. Compiling with -O3 is not a guaranteed way to improve performance; in many cases it can even slow a system down because of larger binaries and increased memory usage. -O3 is also known to break several packages, so it is not recommended for general use.
-Os: Optimizes code for size. It activates all -O2 options that do not increase the size of the generated code. It can be useful for machines that have extremely limited disk storage space and/or CPUs with small cache sizes.
-Og: Introduced in GCC 4.8, this general optimization level addresses the need for fast compilation and a superior debugging experience while providing a reasonable level of run-time performance. The overall development experience should be better than with the default level, -O0. Note that -Og does not imply -g; it simply disables optimizations that may interfere with debugging.
-Ofast: New in GCC 4.7, it consists of -O3 plus -ffast-math, -fno-protect-parens and -fstack-arrays (the last two only affect Fortran). This option breaks strict standards compliance and is not recommended for general use.
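A quick way to check which kind of optimization a file was actually compiled with is to test GCC's predefined macros: __OPTIMIZE__ is defined in all optimizing compilations, __OPTIMIZE_SIZE__ is additionally defined with -Os, and __FAST_MATH__ is defined when -ffast-math is in effect (as it is with -Ofast). The function name optimization_summary below is hypothetical; this is a sketch, not one of the original test programs.

```c
/* Describes the optimization settings this file was compiled with,
   based on GCC's predefined macros (checked from most to least specific). */
const char *optimization_summary(void) {
#if defined(__FAST_MATH__)
    return "optimizing with fast math (e.g. -Ofast)";
#elif defined(__OPTIMIZE_SIZE__)
    return "optimizing for size (-Os)";
#elif defined(__OPTIMIZE__)
    return "optimizing (-O1, -O2, -O3 or -Og)";
#else
    return "not optimizing (-O0 or no -O option)";
#endif
}
```

Printing optimization_summary() from main() and compiling the same file with -O0, -O2, -Os and -Ofast should give four different answers.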
As previously mentioned, -O2 is the recommended optimization level. If compilation of a package fails while not using -O2, try rebuilding with that option.
-pipe: A common flag is -pipe. This flag has no effect on the generated code, but it makes the compilation process faster. It tells the compiler to use pipes instead of temporary files between the different stages of compilation, which uses more memory. On systems with low memory, GCC might get killed; in those cases, do not use this flag.
Now, let us say we compile the program without any optimization option:
dimsconsultants$ gcc test.c -o dims
→ Use the time command to see how long it takes to run:
dimsconsultants$ time ./dims
Remember that the above commands were tested on a machine with the following characteristics:
- Processor: 2.3 GHz Intel Core i7
- Memory: 16 GB 1600 MHz DDR3
Therefore, the output may differ from one machine to another, but the concept remains the same.
Table 1: Speed Test – before removing the bottlenecks
Program    User        Real        CPU (User + Sys)   Sys
test01.c   0m0.001s    0m0.005s    0m0.003s           0m0.002s
test02.c   0m0.001s    0m0.005s    0m0.003s           0m0.002s
test03.c   0m0.005s    0m0.009s    0m0.007s           0m0.002s
test04.c   0m0.001s    0m0.005s    0m0.003s           0m0.002s
test05.c   0m0.535s    0m9.702s    0m0.537s           0m0.002s
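Note that test05.c waits for keyboard input (scanf), so its Real time also includes the time spent typing and varies considerably from run to run.
Remember that the goal of performance testing is not to find bugs but to eliminate performance bottlenecks. The natural question, then, is: how do we eliminate the performance bottlenecks?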
We use the optimization levels to eliminate the bottlenecks in our programs without making any change to the code itself, letting the compiler turn it into better-performing code:
dimsconsultants$ gcc test.c -g -o dims
dimsconsultants$ gcc test.c -o dims -O1      # first level of optimization
→ Use the time command to see how long it takes to run:
dimsconsultants$ time ./dims
Table 2: Speed Test – after removing the bottlenecks (-O1 level)
Program    User        Real        CPU (User + Sys)   Sys
test01.c   0m0.001s    0m0.004s    0m0.003s           0m0.002s
test02.c   0m0.001s    0m0.005s    0m0.003s           0m0.002s
test03.c   0m0.005s    0m0.008s    0m0.007s           0m0.002s
test04.c   0m0.001s    0m0.004s    0m0.003s           0m0.002s
test05.c   0m0.465s    0m6.600s    0m0.468s           0m0.003s
Table 3: Speed Test – before and after removing the bottlenecks
           Before removing bottlenecks                After removing bottlenecks
Program    User      Real      CPU       Sys          User      Real      CPU       Sys
test01.c   0m0.001s  0m0.005s  0m0.003s  0m0.002s     0m0.001s  0m0.004s  0m0.003s  0m0.002s
test02.c   0m0.001s  0m0.005s  0m0.003s  0m0.002s     0m0.001s  0m0.004s  0m0.003s  0m0.002s
test03.c   0m0.005s  0m0.009s  0m0.007s  0m0.002s     0m0.005s  0m0.008s  0m0.007s  0m0.002s
test04.c   0m0.001s  0m0.005s  0m0.003s  0m0.002s     0m0.001s  0m0.004s  0m0.003s  0m0.002s
test05.c   0m0.535s  0m9.702s  0m0.537s  0m0.002s     0m0.465s  0m6.600s  0m0.468s  0m0.003s
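The improvement in Table 3 can be expressed as a percentage. For test05.c, the CPU time (User + Sys) drops from 0.537 s to 0.468 s, i.e. (0.537 - 0.468) / 0.537 × 100 ≈ 12.8%. For test01.c to test04.c the times are at the millisecond resolution of the time command, so a change of 0.001 s is not meaningful. A small helper for this arithmetic is sketched below; the name percent_reduction is hypothetical.

```c
/* Percentage reduction from a 'before' time to an 'after' time, in percent.
   Positive means the program got faster; negative means it got slower. */
double percent_reduction(double before, double after) {
    if (before <= 0.0)
        return 0.0;   /* avoid dividing by zero for runs too short to measure */
    return (before - after) / before * 100.0;
}
```

For example, percent_reduction(0.535, 0.465) gives the reduction in user time for test05.c, a little over 13%.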
→ Comparison of different levels of optimization (test05.c):
Level     User        Real        CPU (User + Sys)   Sys
-O0       0m0.559s    0m4.879s    0m0.540s           0m0.003s
-O1       0m0.465s    0m6.600s    0m0.468s           0m0.003s
-O2       0m0.460s    0m5.299s    0m0.462s           0m0.002s
-O3       0m0.467s    0m6.179s    0m0.470s           0m0.003s
-O4       0m0.467s    0m5.066s    0m0.470s           0m0.003s
-Os       0m0.458s    0m5.129s    0m0.460s           0m0.002s
-Ofast    0m0.449s    0m6.301s    0m0.451s           0m0.002s
Other options
-g        0m0.548s    0m5.615s    0m0.550s           0m0.002s
-pg       0m0.528s    0m6.179s    0m0.531s           0m0.003s
-pipe     0m0.543s    0m4.812s    0m0.545s           0m0.002s
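The differences between levels in this table come from a single run each and are small compared with run-to-run variation (for example, Real is lower at -O0 than at -O1). A common way to make such comparisons more reliable is to time each build several times and compare the median. The sketch below assumes the timings have already been collected into an array; median_time and cmp_double are hypothetical names.

```c
#include <stdlib.h>

/* qsort() comparison function for doubles, ascending order. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n timing samples (n must be greater than 0).
   Sorts the array in place; for an even n, averages the two middle values. */
double median_time(double *samples, size_t n) {
    qsort(samples, n, sizeof samples[0], cmp_double);
    if (n % 2 == 1)
        return samples[n / 2];
    return (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
}
```

The median is less sensitive than the mean to a single slow run, for instance one disturbed by another process on the machine.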
→ Things to remember:
➢ -O1: The purpose of the first level of optimization is to produce an optimized image in a short amount of time. These optimizations typically do not require significant amounts of compile time to complete.
dimsconsultants$ gcc -O1 -o test test.c
An individual optimization can also be requested by name with the -f prefix, for example:
dimsconsultants$ gcc -fdefer-pop -o test test.c
Note, however, that according to the GCC manual most optimizations are completely disabled at -O0, or when no -O level is set, even if individual optimization flags are specified.
We could also enable level 1 optimization and then disable a particular optimization using the -fno- prefix, like this:
dimsconsultants$ gcc -O1 -fno-defer-pop -o test test.c
This command enables the first level of optimization and then specifically disables the defer-pop optimization.
➢ -O2: The -O2 level includes all of the -O1 optimizations, plus a large number of others.
dimsconsultants$ gcc -O2 -o test test.c
➢ -Os: The special optimization level -Os (size) enables all -O2 optimizations that do not increase code size; it puts the emphasis on size over speed. This includes all second-level optimizations except the alignment optimizations.
dimsconsultants$ gcc -Os -o test test.c
➢ -O3: The third and highest level enables even more optimizations, as described in the table below, putting the emphasis on speed over size. This includes the optimizations enabled at -O2.
dimsconsultants$ gcc -O3 -o test test.c
Although -O3 and -Ofast can produce fast code, it is recommended to use the second level of optimization, -O2, to increase the performance of your code.
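Because test01.c to test04.c finish in about a millisecond, the differences between -O1, -O2 and -O3 are hard to see in them. A longer CPU-bound function gives clearer numbers when comparing levels. The function below is a hypothetical example, not one of the test programs: it counts Collatz steps, and since every step depends on the previous one, the compiler should not be able to skip the work at any optimization level.

```c
/* Total number of Collatz steps needed to reach 1 for every start value 1..limit.
   Each step depends on the previous value, so the loop cannot be computed away. */
unsigned long long collatz_total(unsigned int limit) {
    unsigned long long total = 0;
    for (unsigned long long start = 1; start <= limit; start++) {
        unsigned long long x = start;
        while (x != 1) {
            x = (x % 2 == 0) ? x / 2 : 3 * x + 1;
            total++;
        }
    }
    return total;
}
```

Calling collatz_total(3000000) from main(), printing the result, and building the file once with each of -O0, -O2 and -O3 before running time ./dims should show the effect of the optimization level more clearly than the short test programs do.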
Table 4: Memory usage before removal of bottlenecks
dimsconsultants$ gcc test01.c -c -o dims
dimsconsultants$ size dims
Program    TEXT   DATA   OBJC   Others   Dec    Hex
test01.c   312    0      0      64       376    178
test02.c   326    0      0      64       390    186
test03.c   577    0      0      32       609    261
test04.c   975    0      0      128      1103   44F
test05.c   3559   0      0      256      3815   EE7
Table 5: Memory usage after removal of bottlenecks
dimsconsultants$ gcc test01.c -c -o dims -Os
dimsconsultants$ size dims
Program    TEXT   DATA   OBJC   Others   Dec    Hex
test01.c   231    0      0      62       295    127
test02.c   238    0      0      64       302    12E
test03.c   448    0      0      32       480    1E0
test04.c   734    0      0      128      862    35E
test05.c   2071   0      0      256      2327   917
           Memory usage – before                      Memory usage – after
Program    TEXT  DATA  OBJC  Others  Dec   Hex        TEXT  DATA  OBJC  Others  Dec   Hex
test01.c   312   0     0     64      376   178        231   0     0     62      295   127
test02.c   326   0     0     64      390   186        238   0     0     64      302   12E
test03.c   577   0     0     32      609   261        448   0     0     32      480   1E0
test04.c   975   0     0     128     1103  44F        734   0     0     128     862   35E
test05.c   3559  0     0     256     3815  EE7        2071  0     0     256     2327  917
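The columns reported by size correspond to sections of the object file: TEXT holds the machine code (and, typically, read-only constants), while DATA holds initialized, writable global and static variables. The test programs keep their data in local variables and string literals, which would explain why DATA is 0 in every row. The hypothetical file below places one variable in each kind of section, so compiling it with -c and running size on the result should show a non-zero DATA value. Zero-initialized globals such as scratch go to bss, which takes no space in the file (GNU size on Linux reports it in a separate bss column).

```c
/* Hypothetical example: one object in each kind of section reported by size. */
static const char greeting[] = "hello";   /* read-only constant: usually counted as text */
static int counters[256] = {1};           /* initialized and writable: data */
static int scratch[4096];                 /* zero-initialized: bss */

/* The function body itself is machine code: text.
   i is expected to be non-negative. */
int touch_sections(int i) {
    scratch[i % 4096] = counters[i % 256] + greeting[0];
    return scratch[i % 4096];
}
```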
Table 6: Memory usage in a more human-readable form
dimsconsultants$ gcc test01.c -c -o dims
dimsconsultants$ ls -hl dims     # size in KB
dimsconsultants$ ls -l dims      # size in bytes
Program    Before removing bottlenecks
test01.c   1192B | 1.2KB
test02.c   1208B | 1.2KB
test03.c   1656B | 1.6KB
test04.c   2368B | 2.3KB
test05.c   6320B | 6.2KB
Table 7: Memory usage in a more human-readable form – optimized
dimsconsultants$ gcc test01.c -c -o dims -Os
dimsconsultants$ ls -hl dims     # size in KB
dimsconsultants$ ls -l dims      # size in bytes
Program    After removing bottlenecks
test01.c   1120B | 1.1KB
test02.c   1128B | 1.1KB
test03.c   1644B | 1.6KB
test04.c   2124B | 2.1KB
test05.c   4784B | 4.7KB
Table 8:
Program    Memory usage – before    Memory usage – after
test01.c   1192B | 1.2KB            1120B | 1.1KB
test02.c   1208B | 1.2KB            1128B | 1.1KB
test03.c   1656B | 1.6KB            1644B | 1.6KB
test04.c   2368B | 2.3KB            2124B | 2.1KB
test05.c   6320B | 6.2KB            4784B | 4.7KB
-> The above commands use the -c option, which stops the process after compilation and produces an object file, so they show how much the compiled code shrinks. These figures are the sizes of the code and static data in the file, not the memory the program uses while it runs. To build and measure the complete executable with an optimization level, leave out the -c option:
dimsconsultants$ gcc test01.c -o dims -Os
dimsconsultants$ size ./dims      # or: ls -lh dims
Another convenient option is -save-temps, which keeps all the intermediate files produced during compilation (preprocessed source, assembly and object file) so that each of them can be examined.
Note: There is quite a lot more to explore with the time command, such as the time taken by the gcc command itself to compile, and how much memory the program under test takes at every compilation stage.
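To observe the memory a program uses while it runs (the "Memory use" parameter listed earlier), one option is the ru_maxrss field of getrusage(), the peak resident set size of the process; it is reported in kilobytes on Linux but in bytes on macOS. From the command line, /usr/bin/time -l (macOS) or GNU /usr/bin/time -v (Linux) print the same figure. The sketch below assumes a POSIX system; peak_rss and use_memory are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

/* Peak resident set size of this process so far, as reported by getrusage().
   Units: kilobytes on Linux, bytes on macOS. Returns -1 on error. */
long peak_rss(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_maxrss;
}

/* Allocates mb megabytes and writes to every byte, so that the pages are
   actually backed by physical memory. The caller must free() the result. */
char *use_memory(size_t mb) {
    size_t bytes = mb * 1024 * 1024;
    char *p = malloc(bytes);
    if (p != NULL)
        memset(p, 1, bytes);   /* touching the pages makes them resident */
    return p;
}
```

Printing peak_rss() before and after a call such as use_memory(32) should show the peak growing by roughly 32 MB (expressed in the platform's unit), whereas size and ls report the same numbers however much memory the program allocates at run time.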
➢ Conclusion:
We can therefore say that, though it is not known as a performance testing tool, the GCC compiler has many abilities that we should look at and exploit. With the help of various Unix/Linux commands, we were able to measure parameters such as CPU time, speed and memory usage.
To achieve this, we used the GCC optimization options, and the different levels of optimization were of great help. However, one should choose which level to use based on an understanding of what each optimization level does.
➢ References:
1. GCC Documentation: Options That Control Optimization, https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html