The Eureka/ITEA Global GCC Project. Julio Mariño (joint work with Guillem Marpons and others), Babel Research Group, Universidad Politécnica de Madrid. FOSSA09, Grenoble, November 2009.

DESCRIPTION

The presentation starts by summarizing some results of the Eureka/ITEA project GGCC (Global GNU Compiler Collection), where Julio collaborated in the design of an open platform for coding rule validation. It then elaborates on the different connections between formal techniques, in a broad sense, and open source software development. Finally, it discusses how these examples lead naturally to the emergent concept of the semantic forge.

TRANSCRIPT

Page 1: Compiler Ggcc

The Eureka/ITEA Global GCC Project

Julio Mariño (joint work with Guillem Marpons and others)

Babel Research Group, Universidad Politécnica de Madrid

FOSSA09, Grenoble

Mariño et al. (UPM) Global GCC FOSSA, November 2009 1 / 30

Page 2: Compiler Ggcc

Overview

1 Project Overview

2 Coding Rule Validation
   Structural Rule Validation
   Domain-specific language: CRISP

3 The need for static analysis

4 Lessons learned

5 The way ahead

Page 3: Compiler Ggcc

Context: The Global GCC Project (2006–2008)

ITEA-labeled consortium of industrial / research partners
- Industrial: Mandriva, Bertin, Telefónica I+D, small/medium-sized companies
- Research labs: INRIA, CEA-LIST, UPM

Goal: make the GNU Compiler Collection (GCC) more attractive to the (European) software industry by transferring academic results in three areas:
- Project-wide static analysis
- Global optimization
- Minimise programming hazards by means of coding rules

Global GCC knowledge base: integrates heterogeneous information provided by the different components of GGCC

http://www.ggcc.info

Page 4: Compiler Ggcc

Coding Rules

Definition

Coding Rules constrain admissible constructs of a language to help produce more reliable and maintainable code.

Standard coding rule sets do exist, e.g.:

High-Integrity C++ (HICPP): general C++ applications

MISRA-C (C language): automotive industry / embedded systems

Many organisations need to write their own rule sets or adapt existing ones.

Page 5: Compiler Ggcc

Coding Rules: Some Actual Examples

“Do not call the malloc() function” (MISRA-C 20.4)

“Do not use the ‘inline’ keyword for member functions” (HICPP 3.1.7)

“Expressions that are effectively Boolean should not be used as operands to operators other than (&&, || and !)” (MISRA-C 12.6)

“If a virtual function in a base class is not overridden in any derived class, then make it non virtual” (HICPP 3.3.6)

“All automatic variables shall have been assigned a value before being used” (MISRA-C 9.1)

“Behaviour should be implemented by only one member function in a class” (HICPP 3.1.9)

Page 10: Compiler Ggcc

Rule Conformance Checking

Problems with Current Approaches

Rules are specified in natural language:
- Ambiguity
- Automatic checking hindered

Closed tools

Lack of extensibility

Proposed Solution

Define a logic-based language that allows for precisely specifying rule sets such as MISRA-C or HICPP

Use logic programming to get an automatic rule conformance checking procedure

Integrate information provided by different program analyses

Page 12: Compiler Ggcc

Other Tools

Proprietary tools:

Compilers: IAR Systems (C)

QA: Parasoft, Klocwork, Coverity, Semmle Code (Java)

Free software:

Checkstyle (Java)

Gendarme (ECMA CIL, Mono and .Net)

Drawbacks:

Lack of appropriate extensibility mechanisms

Ambiguity in natural language

Interoperability is difficult

Page 14: Compiler Ggcc

Motivation: C++ “Strange” Behavior

class A {
public:
    A();
    virtual void func();
};

class B : public A {
public:
    B() : A() {}
    virtual void func();
};

A::A() {
    func();
}

B *d = new B();

// A::func or B::func?

Coding Rule:

“Do not invoke virtual methods of the declared class in a constructor or destructor.”
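The question on the slide can be answered with a small experiment. The following is a minimal sketch, not code from the talk: the `call_log` vector and the helper `dispatched_during_construction` are hypothetical names introduced only to record which override runs. While `A::A()` executes, the object's dynamic type is still `A`, so the call resolves to `A::func`.

```cpp
#include <string>
#include <vector>

// Hypothetical log that records which implementation of func() ran.
static std::vector<std::string> call_log;

class A {
public:
    A() { func(); }   // virtual call from a constructor
    virtual ~A() {}
    virtual void func() { call_log.push_back("A::func"); }
};

class B : public A {
public:
    B() : A() {}
    void func() override { call_log.push_back("B::func"); }
};

// Constructs a B and returns the name of the function that A::A() reached.
std::string dispatched_during_construction() {
    call_log.clear();
    B b;
    return call_log.empty() ? std::string() : call_log.front();
}
```

Calling `dispatched_during_construction()` yields `"A::func"`: the override in `B` is never reached from the base constructor, which is the surprise the coding rule guards against.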

Page 16: Compiler Ggcc

C++ “strange” behavior (2)

class Base {};

class Derived : public Base {
public:
    ~Derived() {}
};

void foo() {
    Derived* d = new Derived;
    delete d;   // correctly calls derived destructor
}

void boo() {
    Derived* d = new Derived;
    Base* b = d;
    delete b;   // problem! does not call derived destructor!
}

}

Rule HICPP 3.3.2

“Write a ‘virtual’ destructor for base classes.”
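A sketch of the repaired hierarchy may help here. It is an illustration under assumptions, not code from the talk: the `destroyed` log and the helper `delete_through_base_pointer` are hypothetical. Once `Base` declares a virtual destructor, deleting through a `Base*` runs `~Derived()` first and `~Base()` afterwards.

```cpp
#include <string>
#include <vector>

// Hypothetical log of destructor invocations, in order.
static std::vector<std::string> destroyed;

class Base {
public:
    virtual ~Base() { destroyed.push_back("Base"); }   // 'virtual' is the fix
};

class Derived : public Base {
public:
    ~Derived() override { destroyed.push_back("Derived"); }
};

// Deletes a Derived object through a Base pointer and reports what ran.
std::vector<std::string> delete_through_base_pointer() {
    destroyed.clear();
    Base* b = new Derived;
    delete b;   // well-defined now: ~Derived() runs, then ~Base()
    return destroyed;
}
```

Without the `virtual` keyword the same `delete b` has undefined behaviour in C++, which is why the sketch only demonstrates the compliant version.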

Page 17: Compiler Ggcc

Example: Rule Formalisation

Rule HICPP 3.3.15

“Ensure base classes common to more than one derived class are virtual”

$$
\mathit{violate\_hicpp\_3\_3\_15}(a, b, c, d) \leftarrow b \neq c \,\wedge\, \mathit{direct\_base\_of}(a, b) \,\wedge\, \mathit{direct\_base\_of}(a, c) \,\wedge\, \mathit{base\_of}(b, d) \,\wedge\, \mathit{base\_of}(c, d) \,\wedge\, \neg\,\mathit{virtual\_base\_of}(a, c)
$$

Rules are specified in an enriched LP-language with: disequality, quantifiers, constructive negation and sorts.

Page 18: Compiler Ggcc

Example: Extraction of Program Information and Search of Violations

Rule HICPP 3.3.15 in Prolog

violate_hicpp_3_3_15(A, B, C, D) :-
    class(B), class(C),
    B \= C,
    class(D), class(A),
    direct_base_of(A, B),
    direct_base_of(A, C),
    base_of(B, D),
    base_of(C, D),
    \+ virtual_base_of(A, C).

class('::Animal').    class('::WingedAnimal').
class('::Mammal').    class('::Bat').
direct_base_of('::Animal', '::Mammal').
direct_base_of('::Animal', '::WingedAnimal').
direct_base_of('::Mammal', '::Bat').
direct_base_of('::WingedAnimal', '::Bat').
virtual_base_of('::Animal', '::Mammal').
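To make the search for violations concrete, here is a rough C++ re-creation of the same query over the same facts. It is only a sketch and not part of the GGCC tooling: every type and function name is hypothetical, the `class/1` facts are omitted, and `base_of` is assumed to be the transitive closure of `direct_base_of`, since the slide lists no `base_of` facts.

```cpp
#include <array>
#include <set>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> Fact;   // (base, derived)
typedef std::set<Fact> Facts;
typedef std::array<std::string, 4> Violation;       // (A, B, C, D)

// Assumption: base_of is the transitive closure of direct_base_of.
Facts transitive_closure(const Facts& direct) {
    Facts closure = direct;
    bool changed = true;
    while (changed) {
        changed = false;
        const Facts snapshot = closure;
        for (const Fact& x : snapshot)
            for (const Fact& y : snapshot)
                if (x.second == y.first &&
                    closure.insert(Fact(x.first, y.second)).second)
                    changed = true;
    }
    return closure;
}

// Enumerates all (A, B, C, D) that satisfy the body of the Prolog rule.
std::vector<Violation> violations_3_3_15(const Facts& direct_base_of,
                                         const Facts& virtual_base_of) {
    const Facts base_of = transitive_closure(direct_base_of);
    std::vector<Violation> found;
    for (const Fact& ab : direct_base_of)
        for (const Fact& ac : direct_base_of) {
            if (ab.first != ac.first || ab.second == ac.second) continue;
            if (virtual_base_of.count(ac)) continue;   // \+ virtual_base_of(A, C)
            for (const Fact& bd : base_of)
                if (bd.first == ab.second &&
                    base_of.count(Fact(ac.second, bd.second))) {
                    Violation v = {{ab.first, ab.second, ac.second, bd.second}};
                    found.push_back(v);
                }
        }
    return found;
}

// The Animal / Mammal / WingedAnimal / Bat facts from the slide.
std::vector<Violation> bat_example() {
    Facts direct;
    direct.insert(Fact("::Animal", "::Mammal"));
    direct.insert(Fact("::Animal", "::WingedAnimal"));
    direct.insert(Fact("::Mammal", "::Bat"));
    direct.insert(Fact("::WingedAnimal", "::Bat"));
    Facts virt;
    virt.insert(Fact("::Animal", "::Mammal"));
    return violations_3_3_15(direct, virt);
}
```

With these facts the sketch reports a single violation, (`::Animal`, `::Mammal`, `::WingedAnimal`, `::Bat`): `::Animal` is a virtual base of `::Mammal` but not of `::WingedAnimal`. A Prolog engine performs the same enumeration by backtracking, without the hand-written loops.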

Page 19: Compiler Ggcc

Proposed Approach

1 Formalize rules in a logic-based specification language that is executable: CRISP

2 Use GCC for gathering necessary program information

Page 20: Compiler Ggcc

Our Rule Checking Procedure

[Diagram: rule checking workflow. Coding rules (in English) are formalized in CRISP (C++) and translated by the coding rule compiler into coding rules compiled into Prolog. The C++ project source files go through g++' (project build), which produces project facts in Prolog. The Ciao Prolog engine takes both and produces the rule violations report.]

1 Coding rule(s) written once in the logic-based formalism

2 Extract program information (+ analysis information if available) using GCC, and store it

3 Search (using a Prolog engine) for a counterexample

Page 24: Compiler Ggcc

CRISP Building Blocks 1: Sorts

Variable, DataMember, LocalVariable

Function, MemberFunction, Constructor

Type, PointerType, Record

Scope, Namespace, Record, CompoundStatement

Operator

ArgumentTypeInFunctionType

ClassMember

Thing

Page 25: Compiler Ggcc

CRISP Building Blocks 2: (Binary) Relations

Function calls Function
Record hasImmediateBase Record
Variable hasType NonFunctionType
Function hasType FunctionType
Thing isDefinedIn Scope
Scope isNestedIn Scope
Record hasMember MemberFunction
Record hasMember DataMember
Record hasBase Record
Record isPrivateBaseOf Record
Record isVirtualBaseOf Record
PointerType hasPointedType Type
FunctionType hasReturnType Type
Record hasFriend Record
Record hasFriend MemberFunction
ClassMember hasVisibility Visibility

Page 27: Compiler Ggcc

Example of Rule Formalization

Rule HICPP 3.3.13:

“Do not invoke virtual methods of the declared class in a constructor or destructor.”

rule HICPP 3.3.13

violated by Caller : MemberFunction; Callee : VirtualFunction

when exists R : Record such that

(

R hasMember Caller

and R hasMember Callee

and

(

Caller is Constructor

or Caller is Destructor

)

and Caller calls+ Callee

)

.
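The `calls+` in the rule above reads as the transitive closure of `calls`, so indirect calls from a constructor are also caught. A minimal C++ sketch of that check follows; it is an illustration under assumptions, not the CRISP implementation, and the `RecordFacts` structure, the `CallGraph` alias and all function names are hypothetical.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical fact base for a single record R.
struct RecordFacts {
    std::set<std::string> members;            // R hasMember F
    std::set<std::string> ctors_and_dtors;    // F is Constructor or Destructor
    std::set<std::string> virtual_functions;  // F is VirtualFunction
};

typedef std::map<std::string, std::set<std::string> > CallGraph;  // F calls G

// Everything reachable from 'start' through one or more calls (calls+).
std::set<std::string> calls_plus(const CallGraph& calls, const std::string& start) {
    std::set<std::string> reached;
    std::vector<std::string> work(1, start);
    while (!work.empty()) {
        const std::string f = work.back();
        work.pop_back();
        CallGraph::const_iterator it = calls.find(f);
        if (it == calls.end()) continue;
        for (const std::string& g : it->second)
            if (reached.insert(g).second) work.push_back(g);
    }
    return reached;
}

// Pairs (Caller, Callee) that violate the rule within record r.
std::vector<std::pair<std::string, std::string> >
violations_3_3_13(const RecordFacts& r, const CallGraph& calls) {
    std::vector<std::pair<std::string, std::string> > found;
    for (const std::string& caller : r.ctors_and_dtors) {
        if (!r.members.count(caller)) continue;
        for (const std::string& callee : calls_plus(calls, caller))
            if (r.members.count(callee) && r.virtual_functions.count(callee))
                found.push_back(std::make_pair(caller, callee));
    }
    return found;
}
```

For instance, if a constructor `A::A` calls a helper `A::init` that in turn calls the virtual `A::func`, the sketch reports the pair (`A::A`, `A::func`) even though the constructor never calls `A::func` directly.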

Page 28: Compiler Ggcc

Formalization of Rule HICPP 3.3.2

Rule HICPP 3.3.2:

“Write a ‘virtual’ destructor for base classes.”

rule HICPP 3.3.2
violated by C : Record
when exists C’ such that C’ hasBase C
and not exists VD : Destructor such that
(
    VD isDefinedIn C
    and VD is VirtualFunction
)
.

Page 29: Compiler Ggcc

Auxiliary Sorts and Relations

relation F : Function overloads F’ : Function

when exists S : Scope ; N : String such that

(

F isDefinedIn S

and F’ isDefinedIn S

and F hasUnqualifiedName N

and F’ hasUnqualifiedName N

and F \= F’

)

.

sort M : ClassMember is PrivateClassMember

when exists V : Visibility such that

(

M hasVisibility V and V is ‘private’

)

.

Page 30: Compiler Ggcc

Experimental Results

PROJECT    KLOC   LOAD TIME   # VIOLATIONS (CHECKING TIME) PER HICPP RULE
                              3.3.1       3.3.2        3.3.11       3.3.15
Bacula       20      0.24     0 (0.0)     3 (0.0)      0 (0.0)      0 (0.0)
CLAM         46      1.62     1 (0.0)     15 (0.5)     115 (0.1)    0 (0.2)
Firebird    439      2.61     16 (0.0)    60 (1.0)     115 (0.2)    0 (0.3)
IT++         39      0.42     0 (0.0)     6 (0.0)      12 (0.0)     0 (0.0)
OGRE        209      3.05     0 (0.0)     15 (0.9)     79 (0.2)     0 (0.3)
Orca         89      1.17     1 (0.0)     12 (0.4)     0 (0.1)      0 (0.2)
Qt          595     10.42     15 (0.0)    75 (10.5)    1155 (1.3)   4 (1.2)

All times expressed in seconds.

Page 31: Compiler Ggcc

Work in Progress

1 Implement / Enrich the CRISP Language

2 Implement more rules with information given by other tools

3 Open our abstract representation of programs to external tools

Page 32: Compiler Ggcc

Implement / enrich the CRISP language

Quantification and true negation needed
- Both performed over certain domains (sorts)
- Infinite domains may appear with templates / generics
- We have an implementation of constructive intensional negation

Goals automatically reordered

Extend CRISP to other languages: Java, Ada, C, Fortran, . . .

Page 34: Compiler Ggcc

Integration of Information from External Analyzers

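[Diagram: the rule checking workflow extended with external analyzers. Coding rules (in English) are formalized in CRISP (C++) and processed by the coding rule compiler; the C++ project source files go through g++' (project build). Both feed a knowledge base about the compiled program, which also receives the output of an external analyzer through a translation step. The Ciao Prolog engine works on the knowledge base and produces the rule violations report.]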

Page 35: Compiler Ggcc

Example of New Relation that Needs Specific Analysis

relation F : MemberFunction maySelfCall G : MemberFunction
when (
    exists C : Record ; L : ProgramLocation such that
    (
        C hasMember F
        and C hasMember G
        and F \= G
        and F hasProgramLocation L
        and G calledOn L
        and L mayAlias 'this'
    )
)
or F mustSelfCall G
.

Page 36: Compiler Ggcc

Example of Rule that Needs Specific Analysis (1)

Rule HICPP 3.4.2:

“Do not return non-const handles to class data from const member functions”

rule HICPP 3.4.2
violated by F : ConstMemberFunction
when exists C : Record;
            L : ProgramLocation;
            A : PrivateDataMember;
            P : PointerType
such that
(
    A hasType P
    and not P is ConstType
    and C hasMember A
    and C hasMember F
    and F returns L
    and L mayAlias A
)
.
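A small C++ example shows the kind of code this rule is meant to flag. It is a sketch written for illustration and does not come from the talk; the `Buffer` class and the helper `mutate_const_buffer` are hypothetical. The private member `data_` plays the role of `A`, and `raw()` plays the role of the const member function `F` whose return value aliases it.

```cpp
class Buffer {
public:
    Buffer() : data_(new int[4]()) {}
    ~Buffer() { delete[] data_; }
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;

    // Violates HICPP 3.4.2: a const member function returns a non-const
    // handle that aliases private class data.
    int* raw() const { return data_; }

    // Compliant alternative: callers cannot write through this handle.
    const int* view() const { return data_; }

    int first() const { return data_[0]; }

private:
    int* data_;   // private data member of non-const pointer type
};

// Changes the contents of a const Buffer through the leaked handle.
int mutate_const_buffer() {
    const Buffer b;
    b.raw()[0] = 42;   // compiles even though b is const
    return b.first();
}
```

`mutate_const_buffer()` returns 42, so the state observed through a const object has changed. Deciding that the returned location may alias the data member is what requires the alias analysis referred to on this slide.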

Page 37: Compiler Ggcc

Example of Rule that Needs Specific Analyses (2)

Rule HICPP 3.2.5:

“Ensure destructors release all objects owned by the object”

rule HICPP 3.2.5

violated by D : Destructor

when exists C : Record; A : DataMember; F : MemberFunction;

L : ProgramLocation such that

(

C hasMember D

and C hasMember A

and not D releases A

and L isFreshLocationIn F

and A mayPointTo L

and not exists G : MemberFunction such that

(

C hasMember G

and not A mustBeLinkedFromHeapIn G

)

)

.
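For orientation, the following C++ sketch shows a class that complies with the rule. It is illustrative only and makes assumptions not found in the slides: the `Resource` and `Owner` types, the `live_resources` counter and the helper `live_after_scope` are all hypothetical. The data member `res_` points to a freshly allocated location, and the destructor releases it.

```cpp
// Hypothetical counter of Resource objects that are currently alive.
static int live_resources = 0;

struct Resource {
    Resource() { ++live_resources; }
    ~Resource() { --live_resources; }
};

class Owner {
public:
    Owner() : res_(new Resource) {}   // res_ points to a fresh heap location
    ~Owner() { delete res_; }         // the destructor releases what it owns
    Owner(const Owner&) = delete;
    Owner& operator=(const Owner&) = delete;

private:
    Resource* res_;
};

// Creates and destroys an Owner, then reports how many Resources remain.
int live_after_scope() {
    {
        Owner o;
    }
    return live_resources;
}
```

`live_after_scope()` returns 0 here. Dropping the `delete` from `~Owner()` would leave the count at 1, which is the situation the rule describes; detecting it statically depends on points-to information such as `mayPointTo`.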

Page 38: Compiler Ggcc

New Relations

ProgramLocation mayPointTo AbstractMemoryLocation
ProgramLocation mustPointTo AbstractMemoryLocation
ProgramLocation mayAlias ProgramLocation
ProgramLocation mustAlias ProgramLocation

Page 39: Compiler Ggcc

Lessons learned: go out & meet people

Industrial projects are different, but there is a whole world of problems to solve out there.

Take advantage of European instruments to get in contact with industry; our overall impression of ITEA is quite positive.

Do not try to include your own research agenda in the proposal, that will not work! ... but it can work in the opposite direction:
- DESAF10S (2010–2012), Spanish Ministry of Science and Innovation
- PROMETIDOS (2010–2013), Madrid Regional Government / European Social Fund
- A PhD on its way!

Page 40: Compiler Ggcc

Lessons learned: be open, in several ways if possible

Adding the open source label to your project proposal may be beneficial, but try to avoid the obvious, naive arguments.

Global GCC exemplified the benefits of openness in several aspects:

The GCC suite itself, as a vehicle for efficient transfer of advanced compilation techniques to European industry, reducing its dependence on external proprietary solutions.

Our proposal for an extensible platform for coding rule specification and validation is itself open source, in the sense that specs are code that can be shared and enhanced by a new market of potential users.

This is only possible thanks to a variety of existing static analysers and tools (e.g. Ciao) from academia already distributed under open source licenses.

Page 41: Compiler Ggcc

Lessons learned: keep your ears open for unexpected applications

Coding rules for COBOL and beyond. . .

Tools for semi-automatic refactoring

Better source code searches at Google

SAFE-GCC: NXP, Trimedia. . .

Page 42: Compiler Ggcc

Lessons learned: some negative bits...

The GNU Compiler Collection itself may sometimes be a problem, due to an obsolete architecture

Issues with copyright transfer to the FSF

Multiplicity of languages has been a problem as well (i.e. multiple front-ends)

Do not try to solve all the problems of our planet... Get focused!

Read the small print: national issues concerning European projects, etc.

Page 43: Compiler Ggcc

The way ahead: current state of affairs

Preliminary conclusions:

Clean (declarative) semantics given to potentially ambiguous coding rules by means of (extended) logic programming

A number of rules implemented using plain Prolog

Rule violations found in highly regarded C++ projects!

Checker: little resource (memory and time) consumption

Future work:

Complete the definition of a highly expressive language aimed at specifying rules, together with a translation scheme into efficient Prolog

Connect the framework with other parts of the GGCC project

Improve performance of overall checking procedure

http://www.ggcc.info

Page 44: Compiler Ggcc

The way ahead: a research agenda

Focus on tools

Do not miss reliability of open software as a real issue!

Bring semantics to open source software development
- type systems
- description logics (ontologies, etc.)
- static program analysis (abstract interpretation, model checking, etc.)
- programming language design (DSLs, concurrency...)

The future is... SF
- searching sources based on types (Foogle)
- ontology powered semantic desktops (Nepomuk)
- coherent management of packages (Mancoosi)
- automatic discovery and composition of sw (AMOS, EZweb)
- safe composition of components
- etc.
