compiler ggcc
DESCRIPTION
The presentation will start by summarizing some results of the Eureka/ITEA project GGCC (Global GNU Compiler Collection), where Julio collaborated in the design of an open platform for coding rule validation. The presentation then elaborates on the different connections between formal techniques, in a broad sense, and open source software development. Finally, I will discuss how these examples lead naturally to the emergent concept of semantic forge.
TRANSCRIPT
The Eureka/ITEA Global GCC Project
Julio Marino (joint work with Guillem Marpons and others)
Babel Research Group — Universidad Politecnica de Madrid
FOSSA09, Grenoble
Marino et al. (UPM) Global GCC FOSSA, November 2009 1 / 30
Overview
1 Project Overview
2 Coding Rule Validation
    Structural Rule Validation
    Domain-specific language: CRISP
3 The need for static analysis
4 Lessons learned
5 The way ahead
Context: The Global GCC Project (2006–2008)
ITEA-labeled consortium of industrial / research partners
  - Industrial: Mandriva, Bertin, Telefonica I+D, small/medium-sized companies
  - Research labs: INRIA, CEA-LIST, UPM
Goal: make the GNU Compiler Collection (GCC) more attractive to the (European) software industry by transferring academic results in three areas:
  - Project-wide static analysis
  - Global optimization
  - Minimise programming hazards by means of coding rules
Global GCC knowledge base: integrates heterogeneous information provided by the different components of GGCC
http://www.ggcc.info
Coding Rules
Definition
Coding Rules constrain admissible constructs of a language to help produce more reliable and maintainable code.
Standard coding rule sets do exist, e.g.:
High-Integrity C++ (HICPP): general C++ applications
MISRA-C (C language): automotive industry / embedded systems
Many organisations need to write their own rule sets or adapt existing ones.
Coding Rules: Some Actual Examples
“Do not call the malloc() function” (MISRA-C 20.4)
“Do not use the ‘inline’ keyword for member functions” (HICPP 3.1.7)
“Expressions that are effectively Boolean should not be used as operands to operators other than (&&, || and !)” (MISRA-C 12.6)
“If a virtual function in a base class is not overridden in any derived class, then make it non virtual” (HICPP 3.3.6)
“All automatic variables shall have been assigned a value before being used” (MISRA-C 9.1)
“Behaviour should be implemented by only one member function in a class” (HICPP 3.1.9)
Rule Conformance Checking
Problems with Current Approaches
Rules are specified in natural language:
  - Ambiguity
  - Automatic checking hindered
Closed tools
Lack of extensibility
Proposed Solution
Define a logic-based language that allows for precisely specifying rule sets such as MISRA-C or HICPP
Use logic programming to get an automatic rule conformance checking procedure
Integrate information provided by different program analyses
Other Tools
Proprietary tools:
Compilers: IAR Systems (C)
QA: Parasoft, Klocwork, Coverity, Semmle Code (Java)
Free software:
Checkstyle (Java)
Gendarme (ECMA CIL, Mono and .Net)
Drawbacks:
Lack of appropriate extensibility mechanisms
Ambiguity in natural language
Interoperability is difficult
Motivation: C++ “Strange” Behavior
class A {
public:
    A();
    virtual void func();
};

class B : public A {
public:
    B() : A() {}
    virtual void func();
};

A::A() {
    func();
}

B *d = new B();
// A::func or B::func?
Coding Rule:
“Do not invoke virtual methods of the declared class in a constructor or destructor.”
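To make the question on the slide concrete, here is a minimal self-contained sketch. The function bodies, the `call_log` helper and `func_called_during_construction` are illustrative additions, not part of the slides. While the constructor of `A` runs, the dynamic type of the object is still `A`, so the virtual call resolves to `A::func` even though a `B` is being built.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records which override ran; a hypothetical helper, not part of the slide.
inline std::vector<std::string>& call_log() {
    static std::vector<std::string> log;
    return log;
}

class A {
public:
    A() { func(); }              // virtual call made while only the A part exists
    virtual ~A() = default;
    virtual void func() { call_log().push_back("A::func"); }
};

class B : public A {
public:
    B() : A() {}
    void func() override { call_log().push_back("B::func"); }
};

// Builds a B and reports which function the call inside A::A() reached.
inline std::string func_called_during_construction() {
    call_log().clear();
    B b;
    assert(!call_log().empty()); // the constructor always logs one call
    return call_log().front();
}
```

Once construction has finished, `b.func()` dispatches to `B::func` as usual; the surprise is limited to calls made from the constructor or destructor, which is exactly what the coding rule forbids.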
C++ “strange” behavior (2)
class Base {};

class Derived : public Base {
public:
    ~Derived() {}
};

void foo() {
    Derived* d = new Derived;
    delete d;  // correctly calls derived destructor
}

void boo() {
    Derived* d = new Derived;
    Base* b = d;
    delete b;  // problem! does not call derived destructor!
}
Rule HICPP 3.3.2
“Write a ‘virtual’ destructor for base classes.”
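A minimal sketch of the fix the rule asks for. The destructor counter and the `delete_through_base` function are illustrative additions. In the slide's version, deleting a `Derived` through a `Base*` is undefined behaviour in C++ because `Base` has no virtual destructor; declaring one makes the deletion well defined and runs `~Derived` first.

```cpp
#include <cassert>

// Counts how many times ~Derived ran; illustrative instrumentation only.
inline int& derived_dtor_count() {
    static int count = 0;
    return count;
}

class Base {
public:
    virtual ~Base() = default;   // the fix asked for by HICPP 3.3.2
};

class Derived : public Base {
public:
    ~Derived() override { ++derived_dtor_count(); }
};

// Deletes a Derived through a Base pointer and returns how many times
// the Derived destructor ran.
inline int delete_through_base() {
    derived_dtor_count() = 0;
    Base* b = new Derived;
    delete b;                    // dispatches to ~Derived, then ~Base
    assert(derived_dtor_count() == 1);
    return derived_dtor_count();
}
```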
Example: Rule Formalisation
Rule HICPP 3.3.15
“Ensure base classes common to more than one derived class are virtual”
violate_hicpp_3_3_15(a, b, c, d) ←
    b ≠ c ∧
    direct_base_of(a, b) ∧
    direct_base_of(a, c) ∧
    base_of(b, d) ∧ base_of(c, d) ∧
    ¬virtual_base_of(a, c)
Rules are specified in an enriched LP-language with: disequality, quantifiers, constructive negation and sorts.
Example: Extraction of Program Information and Search of Violations
Rule HICPP 3.3.15 in Prolog
violate_hicpp_3_3_15(A, B, C, D) :-
    class(B), class(C),
    B \= C,
    class(D), class(A),
    direct_base_of(A, B),
    direct_base_of(A, C),
    base_of(B, D),
    base_of(C, D),
    \+ virtual_base_of(A, C).
class('::Animal'). class('::WingedAnimal').
class('::Mammal'). class('::Bat').
direct_base_of('::Animal', '::Mammal').
direct_base_of('::Animal', '::WingedAnimal').
direct_base_of('::Mammal', '::Bat').
direct_base_of('::WingedAnimal', '::Bat').
virtual_base_of('::Animal', '::Mammal').
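The search the Prolog engine performs over these facts can be sketched in C++ as a plain enumeration. This is only an illustration of the query's meaning, not the project's implementation. It assumes that `base_of` is the transitive closure of `direct_base_of` (the slides do not define it), and it omits the `class/1` guards because every name in the fact base is a class. The names `Facts`, `Violation`, `violations_3_3_15` and `slide_example` are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

using Pair = std::pair<std::string, std::string>;
using Facts = std::set<Pair>;   // (base, derived) pairs, as in the slide's facts

// Assumption: base_of(X, Y) holds when X is a direct or indirect base of Y.
inline Facts base_of(const Facts& direct) {
    Facts closure = direct;
    bool changed = true;
    while (changed) {
        changed = false;
        const Facts snapshot = closure;   // iterate over a copy while inserting
        for (const Pair& xy : snapshot)
            for (const Pair& yz : snapshot)
                if (xy.second == yz.first &&
                    closure.emplace(xy.first, yz.second).second)
                    changed = true;
    }
    return closure;
}

using Violation = std::tuple<std::string, std::string, std::string, std::string>;

// Enumerates every (A, B, C, D) satisfying the body of violate_hicpp_3_3_15.
inline std::vector<Violation> violations_3_3_15(const Facts& direct,
                                                const Facts& virt) {
    const Facts base = base_of(direct);
    assert(base.size() >= direct.size());   // the closure only ever adds pairs
    std::vector<Violation> found;
    for (const Pair& ab : direct)
        for (const Pair& ac : direct) {
            // Same A, and B \= C.
            if (ab.first != ac.first || ab.second == ac.second) continue;
            if (virt.count(ac) != 0) continue;   // \+ virtual_base_of(A, C)
            for (const Pair& bd : base)
                if (bd.first == ab.second &&
                    base.count(Pair{ac.second, bd.second}) != 0)
                    found.emplace_back(ab.first, ab.second, ac.second, bd.second);
        }
    return found;
}

// The fact base from the slide.
inline std::vector<Violation> slide_example() {
    Facts direct;
    direct.emplace("::Animal", "::Mammal");
    direct.emplace("::Animal", "::WingedAnimal");
    direct.emplace("::Mammal", "::Bat");
    direct.emplace("::WingedAnimal", "::Bat");
    Facts virt;
    virt.emplace("::Animal", "::Mammal");
    return violations_3_3_15(direct, virt);
}
```

On the slide's facts this reports the single tuple with A = `::Animal`, B = `::Mammal`, C = `::WingedAnimal`, D = `::Bat`: `::Animal` is a virtual base of `::Mammal` but not of `::WingedAnimal`.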
Proposed Approach
1 Formalize rules in a logic-based specification language that is executable: CRISP
2 Use GCC ?? for gathering necessary program information
Our Rule Checking Procedure
[Diagram: coding rules (in English) are formalized in CRISP for C++ and translated by the coding rule compiler into Prolog; the C++ project source files go through g++' (project build), which yields project facts in Prolog; the Ciao Prolog engine takes both and produces the rule violations report.]
1 Coding rule(s) written once in the logic-based formalism
2 Extract program information (+ analysis information if available) using GCC, and store it
3 Search (using a Prolog engine) for a counterexample
CRISP Building Blocks 1: Sorts
Variable, DataMember, LocalVariable
Function, MemberFunction, Constructor
Type, PointerType, Record
Scope, Namespace, Record, CompoundStatement
Operator
ArgumentTypeInFunctionType
ClassMember
Thing
CRISP Building Blocks 2: (Binary) Relations
Function calls Function
Record hasImmediateBase Record
Variable hasType NonFunctionType
Function hasType FunctionType
Thing isDefinedIn Scope
Scope isNestedIn Scope
Record hasMember MemberFunction
Record hasMember DataMember
Record hasBase Record
Record isPrivateBaseOf Record
Record isVirtualBaseOf Record
PointerType hasPointedType Type
FunctionType hasReturnType Type
Record hasFriend Record
Record hasFriend MemberFunction
ClassMember hasVisibility Visibility
Example of Rule Formalization
Rule HICPP 3.3.13:
“Do not invoke virtual methods of the declared class in a constructor or destructor.”
rule HICPP 3.3.13
violated by Caller : MemberFunction; Callee : VirtualFunction
when exists R : Record such that
(
R hasMember Caller
and R hasMember Callee
and
(
Caller is Constructor
or Caller is Destructor
)
and Caller calls+ Callee
)
.
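The `calls+` in the rule denotes the transitive closure of `calls`: the callee may be reached through any chain of one or more calls. A minimal C++ sketch of how such a check could be evaluated over extracted facts follows. The `CallGraph` and `RecordFacts` types and the function names are hypothetical and do not describe how the CRISP compiler or the Prolog engine work internally.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Name = std::string;
using CallGraph = std::map<Name, std::set<Name>>;   // caller -> direct callees

// Functions reachable from `start` through one or more calls (the calls+ relation).
inline std::set<Name> calls_plus(const CallGraph& calls, const Name& start) {
    std::set<Name> reached;
    std::vector<Name> pending{start};
    while (!pending.empty()) {
        const Name current = pending.back();
        pending.pop_back();
        const auto it = calls.find(current);
        if (it == calls.end()) continue;   // no outgoing calls recorded
        for (const Name& callee : it->second)
            if (reached.insert(callee).second) pending.push_back(callee);
    }
    return reached;
}

// Hypothetical, hand-written description of one record R.
struct RecordFacts {
    std::set<Name> members;            // R hasMember ...
    std::set<Name> ctors_and_dtors;    // members that are Constructor or Destructor
    std::set<Name> virtual_functions;  // members that are VirtualFunction
};

// All (Caller, Callee) pairs that violate the rule for record `r`.
inline std::vector<std::pair<Name, Name>> violations_3_3_13(
        const RecordFacts& r, const CallGraph& calls) {
    std::vector<std::pair<Name, Name>> found;
    for (const Name& caller : r.ctors_and_dtors) {
        // Precondition: constructors and destructors are listed as members.
        assert(r.members.count(caller) == 1);
        for (const Name& callee : calls_plus(calls, caller))
            if (r.members.count(callee) != 0 &&
                r.virtual_functions.count(callee) != 0)
                found.emplace_back(caller, callee);
    }
    return found;
}
```

Because the closure is used, a constructor that calls a non-virtual helper which in turn calls a virtual member of the same record is reported as well, not only direct calls.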
Formalization of Rule HICPP 3.3.2
Rule HICPP 3.3.2:
“Write a ‘virtual’ destructor for base classes.”
rule HICPP 3.3.2
violated by C : Record
when exists C’ such that C’ hasBase C
and not exists VD : Destructor such that
(
    VD isDefinedIn C
    and VD is VirtualFunction
)
.
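Read operationally, the rule flags every record that is a base of some other record and has no virtual destructor defined in it. A small C++ sketch of that reading, over a hypothetical hand-written fact base (the `ProgramFacts` type and its field names are assumptions made for this illustration):

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Name = std::string;

// Hypothetical fact base; each set mirrors one CRISP relation.
struct ProgramFacts {
    std::set<Name> records;
    std::set<std::pair<Name, Name>> has_base;   // (derived, base): derived hasBase base
    std::set<Name> records_with_virtual_dtor;   // a virtual Destructor isDefinedIn the record
};

// Records C such that some C' hasBase C and no virtual destructor is defined in C.
inline std::vector<Name> violations_3_3_2(const ProgramFacts& facts) {
    std::vector<Name> found;
    for (const Name& c : facts.records) {
        bool is_base = false;
        for (const auto& edge : facts.has_base) {
            // Every hasBase edge is expected to relate two known records.
            assert(facts.records.count(edge.first) != 0 &&
                   facts.records.count(edge.second) != 0);
            if (edge.second == c) { is_base = true; break; }
        }
        if (is_base && facts.records_with_virtual_dtor.count(c) == 0)
            found.push_back(c);
    }
    return found;
}
```

The negated existential in the rule ("not exists VD ...") becomes a simple absence test here because the fact base is finite and closed; handling negation over possibly infinite domains is the harder problem the later slides mention.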
Auxiliary Sorts and Relations
relation F : Function overloads F’ : Function
when exists S : Scope ; N : String such that
(
F isDefinedIn S
and F’ isDefinedIn S
and F hasUnqualifiedName N
and F’ hasUnqualifiedName N
and F \= F’
)
.
sort M : ClassMember is PrivateClassMember
when exists V : Visibility such that
(
M hasVisibility V and V is ‘private’
)
.
Experimental Results
PROJECT   KLOC  LOAD TIME  # VIOLATIONS (CHECKING TIME)
                           3.3.1      3.3.2      3.3.11      3.3.15
Bacula    20    0.24       0 (0.0)    3 (0.0)    0 (0.0)     0 (0.0)
CLAM      46    1.62       1 (0.0)    15 (0.5)   115 (0.1)   0 (0.2)
Firebird  439   2.61       16 (0.0)   60 (1.0)   115 (0.2)   0 (0.3)
IT++      39    0.42       0 (0.0)    6 (0.0)    12 (0.0)    0 (0.0)
OGRE      209   3.05       0 (0.0)    15 (0.9)   79 (0.2)    0 (0.3)
Orca      89    1.17       1 (0.0)    12 (0.4)   0 (0.1)     0 (0.2)
Qt        595   10.42      15 (0.0)   75 (10.5)  1155 (1.3)  4 (1.2)
All times expressed in seconds.
Work in Progress
1 Implement / Enrich the CRISP Language
2 Implement more rules with information given by other tools
3 Open our abstract representation of programs to external tools
Implement / enrich the CRISP language
Quantification and true negation needed
  - Both performed over certain domains (sorts)
  - Infinite domains may appear with templates / generics
  - We have an implementation of constructive intensional negation
Goals automatically reordered
Extend CRISP to other languages: Java, Ada, C, Fortran, . . .
Integration of Information from External Analyzers
[Diagram: coding rules (in English) are formalized in CRISP for C++ and processed by the coding rule compiler; the C++ project source files go through g++' (project build). The extracted facts populate a knowledge base about the compiled program, which also receives the translated output of an external analyzer; the Ciao Prolog engine queries the knowledge base to produce the rule violations report.]
Example of New Relation that Needs Specific Analysis
relation F : MemberFunction maySelfCall G : MemberFunction
when (
    exists C : Record ; L : ProgramLocation such that
    (
        C hasMember F
        and C hasMember G
        and F \= G
        and F hasProgramLocation L
        and G calledOn L
        and L mayAlias ’this’
    )
)
or F mustSelfCall G
.
Example of Rule that Needs Specific Analysis (1)
Rule HICPP 3.4.2:
“Do not return non-const handles to class data from const member functions”
rule HICPP 3.4.2
violated by F : ConstMemberFunction
when exists C : Record;
            L : ProgramLocation;
            A : PrivateDataMember;
            P : PointerType
such that
(
    A hasType P
    and not P is ConstType
    and C hasMember A
    and C hasMember F
    and F returns L
    and L mayAlias A
)
.
Example of Rule that Needs Specific Analyses (2)
Rule HICPP 3.2.5:
“Ensure destructors release all objects owned by the object”
rule HICPP 3.2.5
violated by D : Destructor
when exists C : Record; A : DataMember; F : MemberFunction;
L : ProgramLocation such that
(
C hasMember D
and C hasMember A
and not D releases A
and L isFreshLocationIn F
and A mayPointTo L
and not exists G : MemberFunction such that
(
C hasMember G
and not A mustBeLinkedFromHeapIn G
)
)
.
New Relations
ProgramLocation mayPointTo AbstractMemoryLocation
ProgramLocation mustPointTo AbstractMemoryLocation
ProgramLocation mayAlias ProgramLocation
ProgramLocation mustAlias ProgramLocation
Lessons learned: go out & meet people
Industrial projects are different, but there is a whole world of problems to solve out there.
Take advantage of European instruments to get in contact with the industry; our overall impression of ITEA is quite positive.
Do not try to include your own research agenda in the proposal, that will not work!
...but it can work in the opposite direction:
  - DESAF10S (2010–2012), Spanish Ministry of Science and Innovation
  - PROMETIDOS (2010–2013), Madrid Regional Government/European Social Fund
  - A PhD on its way!
Lessons learned: be open, in several ways if possible
Adding the open source label to your project proposal may be beneficial, but try to avoid the obvious, naive arguments.
Global GCC exemplified the benefits of openness in several aspects:
The GCC suite itself, as a vehicle for efficient transfer of advanced compilation techniques to the European industry, alleviating its dependency on external proprietary solutions.
Our proposal for an extensible platform for coding rule specification and validation is itself open source in the sense that specs are code that can be shared and enhanced by a new market of potential users.
This is only possible thanks to a variety of existing static analysers and tools (e.g. CIAO) from academia already distributed under open source licenses.
Lessons learned: keep your ears open for unexpected applications
Coding rules for COBOL and beyond. . .
Tools for semi-automatic refactoring
Better source code searches at Google
SAFE-GCC: NXP, Trimedia. . .
Lessons learned: some negative bits...
The GNU Compiler Collection itself may be a problem, sometimes, due to an obsolete architecture
Issues with copyright transfer to the FSF
Multiplicity of languages has been a problem as well (i.e. multiple front-ends)
Do not try to solve all the problems of our planet... Get focused!
Read the small print: national issues concerning European projects, etc.
The way ahead: current state of affairs
Preliminary conclusions:
Clean (declarative) semantics given to potentially ambiguous coding rules by means of (extended) logic programming
A number of rules implemented using plain Prolog
Rule violations found in highly regarded C++ projects!
Checker: little resource (memory and time) consumption
Future work:
Complete definition of a highly expressive language aimed at specifying rules and translation scheme into efficient Prolog
Connect the framework with other parts of the GGCC project
Improve performance of overall checking procedure
http://www.ggcc.info
The way ahead: a research agenda
Focus on tools
Do not miss reliability of open software as a real issue!
Bring semantics to open source software development
  - type systems
  - description logics (ontologies, etc.)
  - static program analysis (abstract interpretation, model checking, etc.)
  - programming language design (DSLs, concurrency...)
The future is... SF
  - searching sources based on types (Foogle)
  - ontology powered semantic desktops (Nepomuk)
  - coherent management of packages (Mancoosi)
  - automatic discovery and composition of sw (AMOS, EZweb)
  - safe composition of components
  - etc.