The Pivot: Static Analysis of C++ Applications
Bjarne Stroustrup
Texas A&M University
http://www.research.att.com/~bs
2
Overview
• Static analysis of C++
  – What would be useful
  – Why it is hard
  – C++0x
• The Pivot
  – Context
  – Aims
  – Organization
  – Basic representations
• High-level program representation for HPC
  – Concept-based checking and transformation
3
What would be useful?
• Direct representation of high-level ideas in code
  – E.g. no side effects, idempotent operation, always gives the same answer for the same element, no security violation, no memory leak, no race condition, no deadlock, being sorted, being band-diagonal, parallel application …
• Use of such direct representation
  – For providing guarantees
  – For information
  – For optimization
  – For program transformation
4
It’s hard
• C++
  – Is large
  – Is extremely flexible and general
  – Is quite irregular
  – Has its type-unsafe C subset
• High-level ideas tend to be represented as templated classes and functions
  – Generic programming, template meta-programming, generative programming
  – We have little experience with tools representing and manipulating templates
  – Such templates tend to be provided as part of domain-specific libraries
5
Bell Labs proverbs
• Library design is language design
• Language design is library design
But the devil is in the details
6
C++0x
• 1998: ISO C++ standard
• 2009 (estimated): ISO C++ standard
  – Better libraries and better support for library building
    • Hash maps, regular expressions, file system, …
• Threads and memory model
• Concepts
  – A type system for types, integers, and operations
• Auto, template aliases, general initializer lists, …
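Several of the language features listed above can be tried in a C++11 compiler. The following is a minimal sketch, not taken from the talk; the names `Vec`, `sum`, and `demo_cpp11` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <initializer_list>
#include <vector>

// Template alias: Vec<T> is another name for std::vector<T>.
template<class T>
using Vec = std::vector<T>;

// A function taking a general initializer list.
inline int sum(std::initializer_list<int> xs)
{
    int total = 0;
    for (auto x : xs) total += x;   // auto deduces int
    return total;
}

// Hypothetical demo: combines the alias, auto, and initializer lists.
inline int demo_cpp11()
{
    Vec<int> v = { 2, 3, 5 };                 // list initialization
    auto n = static_cast<int>(v.size());      // n is an int holding 3
    assert(n == 3);
    return n + sum({ 1, 2, 3 });              // 3 + 6
}
```

Calling `demo_cpp11()` adds the element count of `v` to the sum of the braced list.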
7
Concept: trivial example
// Caveat: likely C++0x
template<ForwardIterator Iter, ValueType Val>
    where Assignable<Iter::value_type,Val>
Iter find(Iter first, Iter last, Val v);

template<Container Cont, ValueType Val>
    where Assignable<Cont::value_type,Val>
Cont::iterator find(Cont& c, Val v);

vector<int> v = { 2, 3, 5, 8, 13, 21, 34 };
auto p1 = find(v.begin(), v.end(), 42);
auto p2 = find(v,42.3);
auto p3 = find(7,42);   // error: 7 is not a Container
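The `where` syntax above is a pre-standard proposal, so it is not expected to compile as written. As a rough approximation, the same pair of overloads can be sketched in C++11, using `decltype` in the return type in place of the `Container` concept. The namespace `sketch` and the function `demo_find` are assumptions made for this illustration, and the `Assignable` requirement is not modelled.

```cpp
#include <cassert>
#include <vector>

namespace sketch {

// Iterator-range overload: linear search in [first, last).
template<class Iter, class Val>
Iter find(Iter first, Iter last, const Val& v)
{
    for (; first != last; ++first)
        if (*first == v) return first;
    return last;
}

// Container overload: only viable when c.begin() is a valid expression,
// which stands in (very loosely) for the Container concept.
template<class Cont, class Val>
auto find(Cont& c, const Val& v) -> decltype(c.begin())
{
    return sketch::find(c.begin(), c.end(), v);
}

} // namespace sketch

// Hypothetical demo mirroring the calls on the slide.
inline bool demo_find()
{
    std::vector<int> v = { 2, 3, 5, 8, 13, 21, 34 };
    auto p1 = sketch::find(v.begin(), v.end(), 42);   // not found
    auto p2 = sketch::find(v, 42.3);                  // not found
    auto p3 = sketch::find(v, 8);                     // found
    // sketch::find(7, 42);   // does not compile: int has no begin()
    return p1 == v.end() && p2 == v.end() && p3 != v.end() && *p3 == 8;
}
```

The calls are qualified with `sketch::` so that argument-dependent lookup does not also pull in `std::find`. Unlike a concept, the `decltype` trick produces a "no matching function" error rather than a message naming the violated requirement.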
8
Concepts
• Can express many high-level abstractions
  – A type system for sets of types, integers, and operations
• We have experimental implementations of concepts
• A concept is a handle to which we can attach
  – some “standard semantics” within the language
  – essentially arbitrary semantics outside the language using tools
• Until we get concepts, we can “fake them” with static analysis and transformation tools
9
Context for the Pivot
• Semantically Enhanced Library (Language), or SELL
  – Enhanced notation through libraries
  – Restrict semantics through tools
    • And take advantage of that semantics
[Diagram: C++, Domain-Specific Library, Semantic Restrictions]
10
Context for the Pivot
• Provide the advantages of specialized languages
  – Without introducing new “special purpose” languages
  – Without supporting special-purpose language tool chains
  – Avoiding the 99.?% language death rate
• Provide general support for the SELL idea
  – Not just a specialized tool per application/library
  – The Pivot fits here
11
Example SELL: Safe C++
• Add
  – Range-checked std::vector
    • iterators
  – Resource handles
  – Any (if needed) (a type-safe union type)
• Subtract
  – Arrays
  – Pointers
  – New/delete
  – Unions
  – Excessively complex/obscure code
    • Uses of undefined constructs not caught by compilers (e.g. a[++i] = i)
• Transforms
  – Pointers into iterators and resource handles (if porting)
  – New/delete into resource handle uses
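The "Add" side of this list can be sketched with ordinary library code. The following assumes C++14 and uses hypothetical names (`checked_vector`, `Widget`, `make_widget`, `out_of_range_is_caught`); it is not the Safe C++ library itself. The "Subtract" side is not shown, since it would be enforced by a tool rather than by the library.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical range-checked vector: operator[] checks bounds like at().
template<class T>
class checked_vector {
    std::vector<T> elems_;
public:
    checked_vector(std::initializer_list<T> init) : elems_(init) {}
    std::size_t size() const { return elems_.size(); }
    T& operator[](std::size_t i) { return elems_.at(i); }              // throws std::out_of_range
    const T& operator[](std::size_t i) const { return elems_.at(i); }
};

struct Widget {
    int id;
    explicit Widget(int i) : id(i) {}
};

// Resource handle in place of explicit new/delete in user code.
inline std::unique_ptr<Widget> make_widget(int id)
{
    return std::make_unique<Widget>(id);
}

// Hypothetical demo: an out-of-range access is reported, not undefined.
inline bool out_of_range_is_caught()
{
    checked_vector<int> v = { 1, 2, 3 };
    assert(v[2] == 3);
    try {
        v[3] = 0;                       // one past the end
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

User code written against `checked_vector` and `make_widget` needs no arrays, raw pointers, or `delete`, which is what makes the remaining restrictions practical to check.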
13
Aims
• To allow fully general analysis of C++ source code
  – “What a human can do”
  – Foci
    • Templates (e.g. specialization)
    • C++0x features (e.g. concepts, generalized initializers)
    • Distributed programming
    • Embedded systems
  – Limitation: we work after macro expansion
• To allow transformation of C++ code
  – i.e. production of new code from old source
• Non-aim: handling other languages
  – e.g. Fortran, Java
  – but C and C++ dialects are relatively easy
14
Related work
• Lots
  – 20+ tools for analyzing C++
• But
  – Most are specialized
    • E.g. alias analysis, flow analysis, numeric optimizations
  – Most are attached to a single compiler/parser
  – None handles all of C++
    • E.g. C + classes, or C++ but not the standard libraries (which require full handling of templates)
    • Hardly any two tools handle the same subset
    • None handles the key C++0x features (e.g. concepts)
  – Some are proprietary
  – No serious interoperability
16
The Pivot
[Diagram: Pivot architecture. Labels: C++ source, Compiler, IPR, XPR, Tool 1, Tool 2, Tool 3, Tool 4, Object code, IDL, “information”, Specialized representation (e.g. flow graph)]
17
Why? The Original Project
• Communication with remote mobile device
  – Calling interface
    • CORBA, DCOM, Java RMI, …, homebrew interface
  – Transport
    • TCP/IP, XML, …, homebrew protocol
• Big, Ugly, Slow, Proprietary, …
  – Why can’t I just write ISO Standard C++?
18
The Original Project: Distributed programs in ISO C++
• “as similar as possible to non-distributed programming, but no more similar”
// use local object:
X x;
A a;
std::string s("abc");
// …
x.f(a, s);   // a function call

// use remote object:
proxy<X> x;  // remote at "my_host"
x.connect("my_host");
A a;
std::string s("abc");
// …
x.f(a, s);   // a message send
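To make the idea of "a call becomes a message send" concrete, here is a minimal in-process sketch. It is an assumption-laden stand-in, not the project's implementation: the `proxy` below keeps the target object locally and only logs the "messages", and it uses an explicit `send` member (a hypothetical name) because plain `x.f(a, s)` syntax on a proxy would require generated code. The definitions of `A` and `X` are also invented for the example.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct A { int n = 1; };

struct X {
    int f(const A& a, const std::string& s)
    {
        return a.n + static_cast<int>(s.size());
    }
};

// Hypothetical proxy: forwards calls to a target after "connecting".
template<class T>
class proxy {
    std::string host_;
    T target_;                        // stand-in for the object on the remote host
    std::vector<std::string> sent_;   // log of messages "sent"
public:
    void connect(const std::string& host) { host_ = host; }

    // Record the message, then invoke the member function on the target.
    template<class R, class... Params, class... Args>
    R send(const std::string& name, R (T::*mf)(Params...), Args&&... args)
    {
        if (host_.empty()) throw std::logic_error("proxy not connected");
        sent_.push_back(host_ + "/" + name);
        return (target_.*mf)(std::forward<Args>(args)...);
    }

    const std::vector<std::string>& sent() const { return sent_; }
};

// Hypothetical demo mirroring the remote-object snippet.
inline int demo_remote_call()
{
    proxy<X> x;
    x.connect("my_host");
    A a;
    std::string s("abc");
    int r = x.send("f", &X::f, a, s);   // a message send
    assert(x.sent().size() == 1);
    return r;
}
```

A real transport would serialize the arguments and the member name instead of calling `target_` directly; generating that marshalling code from the class definition is the kind of task a source-level tool is suited to.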
19
IPR high-level principles
• Complete: Direct representation of C++
  – Built-in types, classes, templates, expressions, statements, translation units …
  – Can represent erroneous and incomplete C++ programs
• Regular
  – The structure contains all of C++ but doesn’t mimic irregularities
• Programming effort proportional to complexity of task
  – IPR is not just a data structure
• Extensible
  – Node types
  – Information associated with a node
  – Operations
• No integration with compilers
20
IPR design choices
• Type safe
• IPR (not its users) handles memory management
• Minimal (run-time and space)
  – Minimal number of nodes (unification)
  – Minimal number of checked indirections (usually, virtual function calls)
• Expression-based regular superset of C++
  – E.g. statements, declarations are expressions too
  – C++0x features (most important: concepts, i.e. types have types)
• Interfaces:
  – Purely functional, abstract classes, for most users
    • No mutation operation on abstract classes
    • Users don't get pointers directly
  – Mutating (operates on concrete classes)
    • Users get to use pointers for in-place transformation
  – Traversals (and queries)
    • Several, most not in “the Pivot core”
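The split between a purely functional abstract interface, concrete classes, and separate traversals can be illustrated with a small visitor-based sketch. All names here (`iface`, `impl`, `Expr`, `Literal`, `Plus`, `Printer`, `demo_print`) are hypothetical and far simpler than the real IPR node hierarchy.

```cpp
#include <cassert>
#include <string>

namespace iface {   // abstract, read-only view: const members, references only

struct Visitor;

struct Expr {
    virtual void accept(Visitor&) const = 0;
    virtual ~Expr() {}
};

struct Literal : Expr {
    virtual int value() const = 0;
    void accept(Visitor& v) const override;
};

struct Plus : Expr {
    virtual const Expr& first() const = 0;
    virtual const Expr& second() const = 0;
    void accept(Visitor& v) const override;
};

struct Visitor {
    virtual void visit(const Literal&) = 0;
    virtual void visit(const Plus&) = 0;
    virtual ~Visitor() {}
};

inline void Literal::accept(Visitor& v) const { v.visit(*this); }
inline void Plus::accept(Visitor& v) const { v.visit(*this); }

} // namespace iface

namespace impl {    // concrete classes: where the data actually lives

struct Literal : iface::Literal {
    explicit Literal(int v) : val_(v) {}
    int value() const override { return val_; }
private:
    int val_;
};

struct Plus : iface::Plus {
    Plus(const iface::Expr& l, const iface::Expr& r) : lhs_(l), rhs_(r) {}
    const iface::Expr& first() const override { return lhs_; }
    const iface::Expr& second() const override { return rhs_; }
private:
    const iface::Expr& lhs_;
    const iface::Expr& rhs_;
};

} // namespace impl

// A traversal written only against the abstract interface.
struct Printer : iface::Visitor {
    std::string out;
    void visit(const iface::Literal& l) override { out += std::to_string(l.value()); }
    void visit(const iface::Plus& p) override
    {
        out += "(";
        p.first().accept(*this);
        out += " + ";
        p.second().accept(*this);
        out += ")";
    }
};

// Hypothetical demo: build 1 + 2 and print it through the interface.
inline std::string demo_print()
{
    impl::Literal one(1), two(2);
    impl::Plus sum(one, two);
    Printer p;
    sum.accept(p);
    assert(!p.out.empty());
    return p.out;
}
```

Because `Printer` sees only `const` references to abstract classes, it cannot mutate the tree; a transformation tool would work on the `impl` classes instead.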
21
IPR is minimal
• Necessary for dealing with real-world code
  – Multi-million line programs are not uncommon
• Given the constraint of completeness
  – C++ is complex
    • especially when we use the advanced template features essential for high-performance work
• Unified representation
  – E.g., there is only one int and only one 1
  – Type comparison becomes pointer comparison
• Indirections are minimized
  – An indirection (only) when there is a choice of different types of information
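Unification can be sketched as a table that hands out exactly one node per distinct type, so that comparing two types reduces to comparing addresses. The `Type` and `TypeTable` classes below are hypothetical and only illustrate the technique (often called interning or hash-consing); they are not the IPR data structures.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical type node: a named type, or a pointer to another type.
struct Type {
    std::string name;
    const Type* pointee;   // null for named types
};

// Owns every node and returns the same node for the same request.
class TypeTable {
    std::map<std::string, std::unique_ptr<Type>> named_;
    std::map<const Type*, std::unique_ptr<Type>> pointers_;
public:
    const Type& named(const std::string& n)
    {
        auto& slot = named_[n];
        if (!slot) slot.reset(new Type{ n, nullptr });
        return *slot;
    }

    const Type& pointer_to(const Type& t)
    {
        auto& slot = pointers_[&t];   // keyed by the unique pointee node
        if (!slot) slot.reset(new Type{ "*" + t.name, &t });
        return *slot;
    }
};

// Hypothetical demo: equal types share a node, different types do not.
inline bool demo_unified()
{
    TypeTable types;
    const Type& a = types.pointer_to(types.named("int"));
    const Type& b = types.pointer_to(types.named("int"));
    const Type& c = types.pointer_to(types.named("char"));
    assert(a.pointee == &types.named("int"));
    return &a == &b && &a != &c;      // type comparison is pointer comparison
}
```

The table also owns the nodes, which matches the design choice that the representation, not its users, handles memory management.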
26
XPR (eXternal Program Representation)
• Can be thought of as a specialized portable object database
  – Easy/fast to parse
  – Easy/fast to write
• Compact
  – About as compact as C++ source code
• Robust
  – Read/write without using a symbol table
• LR(1), strictly prefix declaration syntax
• Human readable
• Human writeable
• Can represent almost all of C++ directly
  – No preprocessor directives
  – No multiple declarators in a declaration
  – No <, >, >>, or << in template arguments, except in parentheses
27
XPR
i : int                          // int i;
C : class {                      // class C {
    m : const int                //     const int m;
    mm : *const int              //     const int* mm;
    f : (:int,:*char) double     //     double f(int,char*);
    f : (z:complex) C            //     C f(complex z);
}                                // };
vector : <T> class {             // template<class T> class vector {
    p : *T                       //     T* p;
    sz : int                     //     int sz;
}                                // };
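The strictly prefix notation shown above is straightforward to generate from a tree of type nodes. The sketch below covers only named types, `const`, and pointers; `Ty`, `to_xpr`, `declare`, and `demo_xpr` are hypothetical names, and this is not the Pivot's XPR writer.

```cpp
#include <cassert>
#include <string>

// Hypothetical mini type model: a named type, or const/pointer applied to another type.
struct Ty {
    enum Kind { Named, Const, Pointer } kind;
    std::string name;    // used by Named
    const Ty* inner;     // used by Const and Pointer
};

// Print a type in prefix form: the modifier comes first, then what it modifies.
inline std::string to_xpr(const Ty& t)
{
    switch (t.kind) {
    case Ty::Named:   return t.name;
    case Ty::Const:   return "const " + to_xpr(*t.inner);
    case Ty::Pointer: return "*" + to_xpr(*t.inner);
    }
    return std::string();   // unreachable for valid kinds
}

// Print a declaration as "name : type".
inline std::string declare(const std::string& name, const Ty& t)
{
    return name + " : " + to_xpr(t);
}

// Hypothetical demo: the C++ declaration `const int* mm;`.
inline std::string demo_xpr()
{
    const Ty int_ty    = { Ty::Named, "int", nullptr };
    const Ty const_int = { Ty::Const, "", &int_ty };
    const Ty ptr       = { Ty::Pointer, "", &const_int };
    assert(to_xpr(const_int) == "const int");
    return declare("mm", ptr);
}
```

Reading the output left to right gives "pointer to const int", which is why the prefix form can be parsed without consulting a symbol table.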
Extremely simple SELL example
template <Parallelizable T>
void f(const T& v)
{
    double d = v[2];     // OK
    double* p = &v[2];   // not OK
}
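In a SELL the restriction above would be enforced by a tool attached to the `Parallelizable` concept. As a loose in-language approximation only, a view type can return elements by value so that taking an element's address fails to compile. `parallel_view`, `third_element`, and `demo_view` are hypothetical names, and this sketch is not how the Pivot performs the check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical read-only view: operator[] returns a copy, never a reference.
class parallel_view {
    const std::vector<double>& data_;
public:
    explicit parallel_view(const std::vector<double>& d) : data_(d) {}
    double operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return data_.size(); }
};

template<class T>
double third_element(const T& v)
{
    assert(v.size() > 2);
    double d = v[2];          // OK: reads a value
    // double* p = &v[2];     // rejected for parallel_view: v[2] is a temporary
    return d;
}

// Hypothetical demo: read one element through the view.
inline double demo_view()
{
    std::vector<double> xs = { 1.5, 2.5, 3.5 };
    return third_element(parallel_view(xs));
}
```

The type-system trick catches only this one pattern; a tool working on a full program representation could also reject aliasing introduced indirectly, which is the motivation for semantic restrictions outside the language.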
29
Current and future work
• Complete infrastructure
  – Complete EDG and GCC interfaces
  – Represent headers (modularity) directly
  – Complete type representation in XPR
• Initial applications
  – Style analysis
    • including type safety and security
  – Analysis and transformation of STAPL programs
• Build alliances