Lifting variability from C to mbeddr C
DESCRIPTION
Information about variability is expressed in C through preprocessor directives, which interact in multiple ways with proper C code and lead to systems that are difficult to understand and analyze. Lifting the variability information into a DSL that explicitly captures the features, the relations among them, and their relations to the code would substantially improve today’s state of practice. In this paper we present a study that we performed on 5 large projects (including the Linux kernel), totaling almost 30M lines of code, on extracting variability information from C files. Our main result is that, by using simple heuristics, it is possible to interpret a large portion of the variability information present in large systems. Furthermore, we show how we extracted variability information from ChibiOS, a real-time OS available on 14 different core architectures, and how we lifted that information into mbeddr, a DSL-based technology stack for embedded programming with explicit support for variability.
TRANSCRIPT
Contribution to Mbeddr
C
Extracting variability from C and lifting it to mbeddr
Federico Tomassetti, Daniel Ratiu
1. Variability in C
2. Variability in mbeddr
3. Analysis
4. Results
5. Case study
C
The C preprocessor is evil
• It lets you obfuscate everything, even keywords
• Everything is in global scope
• What a module does depends on the context where it is included
• It operates at token level
• It makes the code very difficult to analyze
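To make the token-level point concrete, here is a minimal sketch. The macro and function names are invented for this illustration and do not come from the talk. The preprocessor substitutes tokens without any knowledge of C operator precedence, so the two macros below give different results for the same argument.

```c
/* Hypothetical macros, for illustration only. */
#define SQUARE_UNSAFE(x) x * x          /* pure token substitution */
#define SQUARE_SAFE(x)   ((x) * (x))    /* parentheses restore the intended grouping */

/* Expands to 1 + 2 * 1 + 2, which C evaluates as 1 + (2 * 1) + 2. */
int unsafe_result(void) { return SQUARE_UNSAFE(1 + 2); }

/* Expands to ((1 + 2) * (1 + 2)). */
int safe_result(void) { return SQUARE_SAFE(1 + 2); }
```

Here unsafe_result() yields 5 while safe_result() yields 9, although both macros look like "square of x" at the call site.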
The C preprocessor is evil
What a module does depends on the context where it is included

#define A
#include "foo.h"

#define B 50
#include "foo.h"

// foo.h
#ifdef A
struct SomeStruct { … };
#else
int b = B;
void foo();
#endif

What foo.h declares depends on where it is included
mbeddr is an extensible variant of C built on top of a projectional editor.
Existing extensions include:
• interfaces with pre- and postconditions
• components
• state machines
• physical units
• requirements tracing
• product line variability
mbeddr introduces higher-level abstractions
• Constants with scope
• Feature models
• Configuration models
• Isolated modules
Analysis
We analyzed:
• Linux
• Apache OpenOffice
• Quake
• VLC
• Mozilla
In total, about 73K files and 30M LOC.
We analyzed these projects to understand how variability is used in C and what we can do to lift it to mbeddr.
Identify relevant statements
#define, #undef
#ifdef, #ifndef, #if, #elif, #else, #endif
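As a rough sketch of this first step, the function below sorts a single source line into the two directive groups listed above. It is not the tool used in the study: the enum and function names are hypothetical, and the sketch ignores line continuations, comments, and directives that only appear inside string literals.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Kinds of preprocessor lines relevant to variability (illustrative names). */
typedef enum {
    DIR_NONE,        /* not a directive, or one we ignore (#include, #pragma, ...) */
    DIR_DEFINITION,  /* #define, #undef */
    DIR_CONDITIONAL  /* #ifdef, #ifndef, #if, #elif, #else, #endif */
} DirectiveKind;

/* True if the len characters starting at name spell exactly candidate. */
static int name_is(const char *name, size_t len, const char *candidate) {
    return strlen(candidate) == len && strncmp(name, candidate, len) == 0;
}

DirectiveKind classify_line(const char *line) {
    static const char *definitions[] = { "define", "undef" };
    static const char *conditionals[] = { "ifdef", "ifndef", "if", "elif", "else", "endif" };
    const char *p = line;
    const char *name;
    size_t len, i;

    while (isspace((unsigned char)*p)) p++;   /* skip leading whitespace */
    if (*p != '#') return DIR_NONE;
    p++;
    while (isspace((unsigned char)*p)) p++;   /* "#  define" is legal C */
    name = p;
    while (isalpha((unsigned char)*p)) p++;   /* read the directive name */
    len = (size_t)(p - name);

    for (i = 0; i < sizeof definitions / sizeof definitions[0]; i++)
        if (name_is(name, len, definitions[i])) return DIR_DEFINITION;
    for (i = 0; i < sizeof conditionals / sizeof conditionals[0]; i++)
        if (name_is(name, len, conditionals[i])) return DIR_CONDITIONAL;
    return DIR_NONE;
}
```

For example, classify_line("#define A 5") returns DIR_DEFINITION and classify_line("#include \"foo.h\"") returns DIR_NONE.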
Configuration processing
Presence conditions
Parsing PC expressions
#if A>B && !(C||D)
#elif D!=10
#ifndef C
185K expressions parsed, 3 errors
Extra: parsing define expressions
#define A 5
#define B do {} while(1);
#define C 3 +
82-95% of define values are valid expressions
Exclude non-variability usages
Double inclusion guard
#ifndef FOO_H
#define FOO_H
…
#endif

Override guard
#ifndef A
#define A 5
#endif
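The two guard patterns can be put side by side in one compilable sketch. The header body (the Foo type) and the helper function are invented for illustration. The point is that FOO_H only protects against double inclusion and carries no configuration information, while the override guard merely gives A a default value that a build could override, for example with a -DA=... compiler flag.

```c
/* Double inclusion guard: the header body (inlined here for brevity) is
   processed only once. FOO_H is not a configuration symbol. */
#ifndef FOO_H
#define FOO_H
typedef struct { int id; } Foo;   /* hypothetical header content */
#endif

/* Override guard: A gets a default only if nothing defined it before. */
#ifndef A
#define A 5
#endif

int value_of_a(void) { return A; }
```

With no earlier definition of A, value_of_a() returns the default 5.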
VPs combination

#ifdef A
foo1();
#elif B
foo2();
#if B>A
foo3();
#else
foo4();
#endif
#endif

VP1 {
  then_block: { foo1(); }
  elif_block: {
    foo2();
    VP2 {
      then_block: { foo3(); }
      else_block: { foo4(); }
    }
  }
}

Resulting presence conditions:

// A
foo1();
// !A && B
foo2();
// !A && B && B>A
foo3();
// !A && B && !(B>A)
foo4();
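The combined presence conditions can be checked on one concrete configuration. The sketch below assumes a hypothetical configuration in which A is left undefined and B is set to 2. The recording array and the fooN bodies are invented so that the selected variant can be observed at run time.

```c
/* Hypothetical configuration: A is left undefined, B is set to 2. */
#define B 2

int calls[4];        /* records which fooN() calls were compiled in and run */
int n_calls = 0;

void foo1(void) { calls[n_calls++] = 1; }
void foo2(void) { calls[n_calls++] = 2; }
void foo3(void) { calls[n_calls++] = 3; }
void foo4(void) { calls[n_calls++] = 4; }

int run_variant(void) {
    n_calls = 0;
#ifdef A
    foo1();          /* A */
#elif B
    foo2();          /* !A && B */
#if B > A            /* an undefined A evaluates as 0 inside #if */
    foo3();          /* !A && B && B>A */
#else
    foo4();          /* !A && B && !(B>A) */
#endif
#endif
    return n_calls;
}
```

In this configuration only the blocks guarded by !A && B and by !A && B && B>A survive preprocessing, so run_variant() records foo2 followed by foo3.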
RQ1) Which are the typical building blocks in presence conditions?
This is important for understanding which kinds of expressions we need to support in the higher-level configuration language.

Kind of expressions      Presence conditions containing them
Identifier references    85-98 %
Logical operators        21-66 %
Number literals          6-16 %
Comparison operators     0-6 %
Others                   < 2 %
RQ2) Which changes (re-#defines and #undefs) are applied to a defined symbol?
Depending on how defined symbols change, defines can (or cannot) be lifted as constant configuration values.
We want constants, to avoid this situation:
#define A 1
#if A>1
foo1();
#endif
#define A 2
#if A>1
foo2();
#endif
Same condition, but one block is included and the other is not.
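A compilable variant of this situation is sketched below. Two things are assumptions of the sketch and not part of the slide: the flag variable used to observe which block was compiled in, and the #undef before the second #define, which is added so that the redefinition does not trigger a compiler diagnostic.

```c
int included_blocks = 0;     /* bit 0: first block compiled in, bit 1: second block */

void probe(void) {
    included_blocks = 0;
#define A 1
#if A > 1
    included_blocks |= 1;    /* excluded: A is 1 at this point */
#endif

#undef A                     /* added for this sketch, see the note above */
#define A 2
#if A > 1
    included_blocks |= 2;    /* included: A is 2 at this point */
#endif
}
```

After probe() runs, included_blocks is 2: the textually identical condition A > 1 selected only the second block, which is why such a symbol cannot be treated as a constant.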
Definitions under different conditions
#if VERS <= 2
#define A 1
#elif VERS == 3
#define A 2
#else
#define A 3
#endif
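One way to read this case: A is defined several times, but under mutually exclusive conditions, so in any single configuration it still has exactly one value. The sketch below fixes VERS to 3 as a hypothetical configuration choice and adds an illustrative accessor function.

```c
/* Hypothetical configuration value; in a real build it would come from a
   configuration header or a -DVERS=... compiler flag. */
#define VERS 3

#if VERS <= 2
#define A 1
#elif VERS == 3
#define A 2
#else
#define A 3
#endif

/* Within one configuration A has a single value, so it behaves like a
   constant derived from VERS. */
int value_of_a(void) { return A; }
```

With VERS set to 3, value_of_a() returns 2.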
Cases                                     Range
Single definition                         69-90 %
Multiple definitions to the same value    2-24 %
Definitions under different conditions    2-9 %
Total                                     95-99 %
RQ3) Are #error and #warning used in practice?
If they are, it could be possible to extract feature model constraints from them.
They are present in 4 out of 5 projects, but they represent between 0 and 0.26% of the preprocessor statements.
Linux contains more than 800 #error/#warning directives, Mozilla more than 700.
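To illustrate how an #error can encode a constraint between features, here is a minimal sketch. The feature names USE_THREADS and USE_MUTEXES are hypothetical and are not taken from any of the analyzed projects. The directive expresses "USE_MUTEXES requires USE_THREADS", which is the kind of relation a feature model would state explicitly.

```c
/* Hypothetical feature switches, for illustration only. */
#define USE_THREADS 1
#define USE_MUTEXES 1

/* Constraint: USE_MUTEXES requires USE_THREADS. An invalid configuration
   stops the build instead of producing a broken variant. */
#if USE_MUTEXES && !USE_THREADS
#error "USE_MUTEXES requires USE_THREADS"
#endif

/* This code is reached only if the configuration passed the check. */
int configuration_is_valid(void) { return 1; }
```

Setting USE_THREADS to 0 while keeping USE_MUTEXES at 1 would make the compilation fail with the given message.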
Results
RQ1) Which are the typical building blocks in presence conditions?
Identifiers, integers, logical and comparison operations
RQ2) Which changes (re-#defines and #undefs) are applied to a defined symbol?
More than 90% of symbols behave like constants
RQ3) Are #error and #warning used in practice?
Depends on the project
ChibiOS
ChibiOS is a real-time OS supporting 14 core architectures, different compilers and platforms.
OS Kernel module
41 files
246 presence conditions
233 definitions
54 symbols in presence conditions
2 symbols used in definitions of PC symbols
53 symbols not defined in the module (features)
3 defined in the module (derived features)
Demos/ARMCM3-STM32F103ZG-FATFS module
Definitions for 31 of the 53 features
28 defined to TRUE/FALSE
1 has no value
1 has value 0
1 has value 20
Questions?