postdoc symposium - a logic meta-programming foundation for example-driven pattern detection in...

Coen De Roover, Software Languages Lab, Vrije Universiteit Brussel

A Logic Meta-Programming Foundation for Example-Driven Pattern Detection in Object-Oriented Programs

Promotors: Wolfgang De Meuter

Johan Brichau

Post-doctoral Symposium Track, International Conference on Software Maintenance, 29/09/2011, Colonial Williamsburg, VA (USA)

General-Purpose Pattern Detection Tools

identify code of which the user specified the characteristics

e.g. structural

class Ouch {
  int hashCode() { return ...; }
}

e.g. control flow, data flow

scanner = new Scanner();
...
scanner.close();
...
scanner.next();

2

Let me explain the title first. Given its length, this will take a slide or two. First off, what do I consider a “general-purpose pattern detection tool”? Well, that’s any tool that identifies code of which the user has specified the characteristics. Those characteristics can be related to the structure of a program, but also its control flow and data flow. The slide illustrates the difference. Consider possible violations of the invariant that equal objects have equal hash codes. Those are characterized structurally by a method hashCode(), without a corresponding equals() method. Reads from a closed scanner, on the other hand, are characterized by control flow characteristics (next is invoked after close), and data flow characteristics (both on same scanner). Now, if such a tool existed, it could be used for a variety of purposes. For instance, to check whether the protocol of an in-house API is used correctly. Or to check whether an application-specific bug you just discovered, isn’t more widespread. Or even to check whether someone is instantiating a class for which a factory method exists. In short, it could be used to detect a lot of interesting application-specific patterns ... not just design patterns :)
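The structural characteristic mentioned above (a hashCode() method without a corresponding equals() method) can be sketched in a few lines of Java. This is only an illustration under stated assumptions: it inspects compiled classes through reflection, whereas the tool in this talk works on source code, and the class names HashCodeWithoutEquals and Fine, as well as the constant bodies of the methods, are hypothetical.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not the tool presented in the talk.
class HashCodeWithoutEquals {

    // Mirrors the slide's Ouch class; the constant stands in for the elided body.
    static class Ouch {
        @Override public int hashCode() { return 42; }
    }

    // Counterpart that declares both methods and should not be reported.
    static class Fine {
        @Override public int hashCode() { return 7; }
        @Override public boolean equals(Object other) { return other instanceof Fine; }
    }

    // True if the class itself declares a method with this name and these parameter types.
    static boolean declares(Class<?> type, String name, Class<?>... parameterTypes) {
        for (Method m : type.getDeclaredMethods()) {
            if (m.getName().equals(name) && Arrays.equals(m.getParameterTypes(), parameterTypes)) {
                return true;
            }
        }
        return false;
    }

    // Names of the candidate classes that declare hashCode() but not equals(Object).
    static List<String> suspects(Class<?>... candidates) {
        List<String> result = new ArrayList<>();
        for (Class<?> c : candidates) {
            if (declares(c, "hashCode") && !declares(c, "equals", Object.class)) {
                result.add(c.getSimpleName());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(suspects(Ouch.class, Fine.class)); // prints [Ouch]
    }
}
```

The closed-scanner example cannot be checked this way: it needs control flow and data flow information, which is exactly why a single specification language for both kinds of characteristics is attractive.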

Example-driven Detection Tool

user

class ?name extends Component {
  public void acceptVisitor(?type ?v) {
    System.out.println(?string);
    ?v.?visitMethod(this);
  }
}

public class OnlyLoggingLeaf extends Component {
  public void acceptVisitor(ComponentVisitor v) {
    System.out.println("Only logging.");
  }
}

public class SillyLeaf extends OnlyLoggingLeaf {
  public void acceptVisitor(ComponentVisitor v) {
    super.acceptVisitor(v);
    ComponentVisitor temp = v;
    temp.visitSuperLogLeaf(this);
  }
}

0.6483

Of course, if such a general-purpose pattern detection tool existed, you would still have to tell it somehow what patterns to look for. Wouldn’t it be great if you could just give the tool an example implementation of the pattern you are looking for, and have the tool return all variants of this example in your code? In a nutshell, that’s what I proposed in my dissertation: an example-driven approach to pattern detection. So, how does it work in practice? Imagine that you want to check whether all subclasses of a Component class have a method acceptVisitor that logs something before double dispatching to its parameter. Then you give the tool an example implementation as shown on the slide. It looks like regular Java code with meta-variables. For instance, the one in green stands for both the parameter of the method and the receiver of the double dispatch. Each result reported by the tool consists of bindings for the meta-variables. One of those results is shown at the bottom of the slide. As you can see, this particular result is a variant of the implementation we gave to the tool. Here, the acceptVisitor() method performs the required logging through a super call instead of directly. It also doesn’t dispatch to its parameter, but to a temporary variable that aliases the parameter. The more of these variants the tool finds, the better. However, not all variants of the implementation are equal. That’s why an example-driven pattern detection tool ranks the variants it finds based on their similarity to the given example.
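To make "bindings for the meta-variables" concrete, here is a minimal sketch of the strictest possible matching: a template and a code fragment are compared token by token, and every occurrence of a meta-variable must bind to the same token. The class TemplateMatcher and its methods are hypothetical, and matching whitespace-separated tokens is a simplification; the tool in this talk matches against program representations and consults static analyses.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration of strict template matching with meta-variables.
class TemplateMatcher {

    // Splits a fragment into whitespace-separated tokens.
    static List<String> tokens(String fragment) {
        return Arrays.asList(fragment.trim().split("\\s+"));
    }

    // Returns the meta-variable bindings, or empty if the fragment does not match.
    static Optional<Map<String, String>> match(List<String> template, List<String> code) {
        if (template.size() != code.size()) {
            return Optional.empty();
        }
        Map<String, String> bindings = new LinkedHashMap<>();
        for (int i = 0; i < template.size(); i++) {
            String t = template.get(i);
            String c = code.get(i);
            if (t.startsWith("?")) {
                // A meta-variable must bind to the same token at every occurrence.
                String previous = bindings.putIfAbsent(t, c);
                if (previous != null && !previous.equals(c)) {
                    return Optional.empty();
                }
            } else if (!t.equals(c)) {
                return Optional.empty();
            }
        }
        return Optional.of(bindings);
    }

    public static void main(String[] args) {
        List<String> template = tokens("?type ?v ) { ?v . ?visitMethod ( this ) ;");
        // Direct dispatch to the parameter: matches and yields bindings.
        System.out.println(match(template, tokens("ComponentVisitor v ) { v . visitLeaf ( this ) ;")));
        // Dispatch through an alias: ?v would have to bind to both v and temp.
        System.out.println(match(template, tokens("ComponentVisitor v ) { temp . visitLeaf ( this ) ;")));
    }
}
```

The second call fails because strict matching cannot see that temp aliases v. Recalling that kind of variant is what the more lenient matching strategies and the unification extensions discussed later are for.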

Motivation

uniform language for specifying behavioral and structural characteristics

existing specification languages are too specialized

e.g. temporal logic formulas over a control flow graph vs. constraint satisfaction problem over AST nodes

familiar to developers

often communicate using example snippets and diagrams

recalls implicit implementation variants through static analyses

relieves developers from having to enumerate each variant

shields developers from intricate analysis results

facilitates assessing reported variants

e.g. present in one, all, or some program executions

4

So, why did I investigate such an example-driven approach to pattern detection? First of all, code templates provide a uniform language for specifying the behavioral and structural characteristics of a pattern. Existing tools, in contrast, are tailored to one kind of characteristic. For instance, tools specialized in control flow characteristics might have you express them using temporal logic formulas over a control flow graph. Tools specialized in structural characteristics, on the other hand, might have you express a constraint satisfaction problem over AST nodes. Clearly, the differences between such highly specialized languages make it difficult to specify heterogeneously characterized patterns. Second, code templates align well with the way developers tend to communicate: through example snippets and diagrams. Third, having the tool recall implicit variants of the exemplified characteristics relieves users from having to enumerate all of them in a specification. It also shields users from the intricate program analyses that are needed to recall these variants. Fourth, not all variants of behavioral characteristics are created equal. Some show up in only one, in some, or in all program executions. Ranking these variants should facilitate assessing them.

In the dissertation ...

5

Now, motivating all of this took a lot longer in the actual dissertation. There, I discussed the dimensions in the design of a pattern detection tool, surveyed the existing tools on these dimensions, concluded that there was a need for a general-purpose tool, motivated a set of desiderata for such a tool, and evaluated the existing tools on these desiderata. I then used this evaluation to motivate the cornerstones of my approach. Today, I’ll briefly discuss 3 of them: logic meta-programming, example-driven matching of code templates and domain-specific unification.

Founding Cornerstone: LMP

specify characteristics through logic queries

leave operational search to logic evaluator

quantify over reified program representation

AST, CFG, PTA

✘ exposes users to details of representation + reification

✓ expressive, declarative, ...

6

Logic Meta Programming is the founding cornerstone of my approach. It advocates specifying a pattern’s characteristics through logic queries, and leaving the operational search for the pattern’s instances to the logic evaluator. Which is, of course, a good thing. You can read the query on the slide as “give me a class that declares a method named ‘foo’ or one that is named ‘bar’”. LMP has already been around for a decade or two. It has been used to detect patterns in ASTs, CFGs, and even the results of a points-to analysis (PTA). It is very expressive, but it exposes users to the details of such a representation and the way it was converted into a logic format.
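The query described above is not reproduced in these notes, so the following is only a rough Java analogue of its idea: the program is reified as a collection of facts, and the query states what is wanted rather than how to search for it. The fact type MethodDeclaration and the class DeclaresQuery are hypothetical and do not correspond to the logic predicates of the actual tool.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical illustration of querying a reified program representation.
class DeclaresQuery {

    // One fact per method declaration: "class className declares method methodName".
    static final class MethodDeclaration {
        final String className;
        final String methodName;

        MethodDeclaration(String className, String methodName) {
            this.className = className;
            this.methodName = methodName;
        }
    }

    // "Give me a class that declares a method with one of these names."
    static Set<String> classesDeclaringAny(List<MethodDeclaration> facts, String... names) {
        Set<String> wanted = new HashSet<>(Arrays.asList(names));
        return facts.stream()
                .filter(fact -> wanted.contains(fact.methodName))
                .map(fact -> fact.className)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        List<MethodDeclaration> facts = Arrays.asList(
                new MethodDeclaration("A", "foo"),
                new MethodDeclaration("B", "baz"),
                new MethodDeclaration("C", "bar"));
        System.out.println(classesDeclaringAny(facts, "foo", "bar")); // prints [A, C]
    }
}
```

Even in this small form the drawback from the slide is visible: whoever writes the query has to know how the program was reified into facts.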

Cornerstone: Example-Driven Specification

exemplify characteristics through code templates embedded in logic queries

matched according to multiple strategies

vary in leniency from AST-based to flow-based

recall implicit variants of structural and control flow characteristics

{
  System.out.println("Hello");
  ComponentVisitor temp = v;
  temp.visitSuperLogLeaf(this);
}

✘

{
  super.acceptVisitor(v);
  x.doSomething();
  v.visitSuperLogLeaf(this);
}

✓

if jtClassDeclaration(?class) {

},?method methodDeclarationHasName: ?visitMethod

7

So, LMP is expressive, but difficult to use. The second cornerstone of my approach therefore advocates exemplifying pattern characteristics through code templates instead. By embedding templates in logic queries, they can be combined through logic connectives and multiple occurrences of the same meta-variable. On the slide, you can see a logic query that consists of two conditions. The first condition corresponds to our example implementation of the Component subclass. The second condition is not a template, but an ordinary logic condition. They are connected through the occurrences of the purple meta-variable. As a result, we will also find the method invoked by the visitXXX message. Now, code templates are nothing new in pattern detection tools. What is new, however, is that we match these in an example-driven manner according to multiple strategies. These strategies vary in leniency from strict AST-based matching (the predominant one among pattern detection tools) to very lenient flow-based matching. The idea is that these strategies recall variants of structural and control flow characteristics that are implied by the semantics of the programming language. For instance, even an indirect subclass of Component with the acceptVisitor method on the left would be recognized as a variant of our implementation. It corresponds to the example, except that it contains additional instructions and performs the exemplified logging through a super call. However, to recall the pattern instance on the right, our tool needs to be able to recognize implicit variants of data flow characteristics. This brings us to the last cornerstone I will discuss.

Cornerstone: Domain-Specific Unification

extensions ensure that implicit implementation variants unify

class ?name extends Component {
  public void acceptVisitor(?type ?v) {
    System.out.println(?string);
    ?v.?visitMethod(this);
  }
}

class MustAlias extends Component {
  public void acceptVisitor(ComponentVisitor v) {
    System.out.println("Hello");
    ComponentVisitor temp = v;
    temp.visitSuperLogLeaf(this);
  }
}

AST node | AST node | identical | 1

Qualified Type | Simple Type | denote same or co-variant return types | 1

Expression | Expression | in must-alias or may-alias relation | 0.9 or 0.5

Message Name | Method Name | message may invoke method according to dynamic or static receiver type | 0.5 or 0.4

. . .

consults static analyses

likelihood of resulting in false positive, propagated by fuzzy logic cornerstone

8

The domain-specific unification cornerstone consists of domain-specific extensions to the regular unification procedure we know from Prolog. It ensures that implicit implementation variants unify. In the code on the right, the first occurrence of the green variable v is bound to a parameter. The second occurrence of v is bound to a temporary variable. The domain-specific unification allows this because v and temp happen to evaluate to the same value at run-time. To determine this, it consults static analyses. The table on the slide lists some other unification extensions. The one in the second row unifies a qualified type with a simple type if both denote the same type or are co-variant return types. To this end, it consults a semantic analysis. The name of a message and the name of a method also unify if the message may invoke the method according to the static or the dynamic type of the receiver. As an extension may succeed where the plain unification procedure fails, it might result in false positives. We therefore associate unification degrees with each extension. They are shown in the last column. For instance, two expressions unify with a degree of 0.9 if they alias in every program execution, but with a degree of 0.5 if they alias only in some. All of these degrees are combined and used to compute the ranking of a detected result.
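The unification degrees for expressions can be sketched as follows. The degrees 1, 0.9 and 0.5 are the ones given on the slide. Everything else is an assumption made for illustration: the class ExpressionUnification, the representation of alias facts as a map, and in particular the use of the minimum to combine degrees, since the notes only say that the degrees "are combined".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of unification extended with alias information.
class ExpressionUnification {

    enum Alias { MUST, MAY }

    // Order-independent key for a pair of expressions.
    static String key(String x, String y) {
        return x.compareTo(y) <= 0 ? x + "~" + y : y + "~" + x;
    }

    // Degree with which two expressions unify; 0.0 means unification fails.
    static double unify(String x, String y, Map<String, Alias> aliasFacts) {
        if (x.equals(y)) {
            return 1.0; // identical, as under plain unification
        }
        Alias relation = aliasFacts.get(key(x, y));
        if (relation == Alias.MUST) {
            return 0.9; // alias in every program execution
        }
        if (relation == Alias.MAY) {
            return 0.5; // alias in some program executions
        }
        return 0.0;
    }

    // ASSUMPTION: degrees are combined with the minimum; the talk does not name the operator.
    static double combine(double... degrees) {
        double result = 1.0;
        for (double degree : degrees) {
            result = Math.min(result, degree);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Alias> aliasFacts = new HashMap<>();
        aliasFacts.put(key("v", "temp"), Alias.MUST); // stands in for a static analysis result
        System.out.println(unify("v", "temp", aliasFacts)); // prints 0.9
        System.out.println(unify("v", "other", aliasFacts)); // prints 0.0
    }
}
```

With such degrees, the MustAlias class on the slide would still be reported, only ranked slightly below an implementation that dispatches to the parameter directly.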

In Practice: Detecting Lapsed Observers

9

On to some practice. The paper discusses how to detect possible lapsed observers in an example-driven manner. Those are observers that are added to a subject, but never removed.

?x unifies with ?y

binding for ?x | binding for ?y | conditions | degree

an ASTNode | an ASTNode | unify under general-purpose procedure | 1

an ASTNode | a compound term | each argument t_i of the compound term f(t_1, ..., t_n) and each corresponding child c_i of the AST node unifies with degree [symbol illegible]_i and functor f unifies with the name of the node’s class | [operator illegible] over i = 1..n of those degrees

a Type | a Type | according to the semantic analysis, denote same type or are co-variant return types | 1

a method invocation Name | a method declaration Name | invocation may invoke declaration according to the static type of the receiver or the dynamic type of the objects it may evaluate to | 1/4 or 1/2

a class declaration Name | an instance creation expression Name | expression instantiates declared class | 1

an Expression | an Expression | according to an intra-procedural must-alias analysis or according to an inter-procedural may-alias analysis, must or may evaluate to the same object at run-time | 9/10 or 1/2

an Expression | a variable declaration Name | expression references the variable according to a semantic analysis | 9/10

class Point implements ChangeSubject {
  private HashSet observers;
  public void addObserver(utils.ChangeObserver o) {
    observers.add(o);
  }
  public void removeObserver(ChangeObserver o) {
    this.observers.remove(o);
  }
  public void notifyObservers() {
    for (Iterator e = observers.iterator(); e.hasNext();) {
      ((ChangeObserver) e.next()).refresh(this);
    }
  }
}
class Screen implements ChangeObserver {
  public void refresh(ChangeSubject s) { ... }
}
class Main {
  public static void main(String[] args) {
    Point p = new Point(5, 5);
    Screen s1 = new Screen("s1");
    Screen s2 = new Screen("s2");
    p.addObserver(s1);
    p.addObserver(s2);
    ...
    p.removeObserver(s2);
  }
}

 1 if jtClassDeclaration(?subjectClass){
 2      class ?subjectName {
 3        ?mod1List ?t1 ?observers = ?init;
 4        public ?t2 ?addObserver(?observerType ?observer) {
 5          ?observers.?add(?observer);
 6        }
 7        public ?t3 ?removeObserver(?observerType ?otherObserver) {
 8          ?observers.?remove(?otherObserver);
 9        }
10        ?mod2List ?t4 ?notifyObservers(?param1List) {
11          ?observers;
12          ?observer.?update(?argList);
13        }
14      }
15    },
16    jtClassDeclaration(?observerClass){
17      class ?observerName {
18        ?mod3List ?t5 ?update(?argList) {}
19      }
20    },
21    jtExpression(?register){ ?subject.?addObserver(?lapsed) },
22    not(jtExpression(?unregister){ ?subject.?removeObserver(?lapsed) }),
23    jtExpression(?alloc){ ?lapsed := new ?observerName(?argList) }

Fig. 2. Domain-specific extensions of the unification procedure illustrated on the detection of lapsed listeners in the Observer design pattern.

?subjectClass and ?observerClass). Lines 21–23 exemplify the lapsed listener pitfall at the instance-level: as instances of the participating classes that exhibit the characteristics of the pitfall. They identify ?lapsed objects that are added to a ?subject (line 21), but never removed from it (line 22). The final condition term is optional. It identifies the expression that instantiated the lapsed object. To this end, it uses the non-native operator := which unifies the logic variable on its left-hand side with the AST node that matches the code on its right-hand side. As a result, ?alloc will be bound to new Screen("s1") for the depicted program.

Note that the depicted specification only detects possible lapsed listeners. It does not identify the point in the program’s execution after which an observer is no longer needed, nor does it specify that the ?unregister expression should be executed after the ?register expression. It can therefore only be used to issue warnings. Actual lapsed listeners could be identified through a subsequent dynamic analysis.

IV. EVALUATION

In the dissertation [8], we formulated several well-motivated desiderata for each dimension in the design of a pattern detection tool: its specification language, its detection mechanism and its program representation. When fulfilled, these result in a general-purpose tool that can be applied to detect structural and behavioral pattern characteristics using descriptive specifications in a uniform language. Through running examples, we motivated each individual cornerstone of our approach using the desiderata it helps to fulfill. Next, we evaluated our approach as a whole on these desiderata by detecting patterns that are representative for the intended use of a general-purpose tool: design patterns, µ-patterns and bug patterns.

Overall, the patterns were straightforward to specify in an example-driven manner. Even though the patterns are diverse and heterogeneously characterized, their specifications are descriptive. The majority consists of exclusively template terms with little non-native syntax. We had to resort to higher-order logic predicates only for cardinality constraints such as “for-all” and “as-many-as”. We were able to detect most of the pattern instances with few false positives. Recalling the missing instances would require exemplifying additional prototypical implementations of the pattern, of which the detection mechanism recognizes implicit implementation variants. Many false positives could be eliminated by adding logic conditions to the specification that implement heuristics.

V. FUTURE WORK

There are still many open questions related to our approach. For instance, the ranking of the detection results was intended to reflect their projected likelihood of being a false positive. Large corpus-based studies are needed to test these projections against reality. Using UML-like diagrams instead of code templates to exemplify a pattern would be a natural extension of our approach. Interestingly, logic variables could serve as links between a class diagram and a sequence diagram in such specifications. We have already started to explore this

Example-driven Specification

subject class

observer class

lapsed observer instance

add method

update message

update method

instance creation

add message

Rest assured, there is some reason to this madness. I’ve simply highlighted all occurrences of a variable in the same color. The specification for the lapsed observer consists of 3 parts. The first is a template that exemplifies the prototypical implementation of the subject class. Among other things, it has a method addObserver (in orange) for registering an observer with the subject. It takes the observer as its parameter (in blue) and adds the observer to the purple field. It also has a notifyObservers method that notifies observers of state changes. Note that it sends a message (in yellow) to one of the previously added observers (in blue). The second part exemplifies an observer class. It is exemplified as a class in which the method invoked by the yellow update message resides. The lapsed observer instance is found as the gray argument to an addObserver invocation, which is never used as an argument to any removeObserver invocation. The last line finds the expression that created this observer instance.
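The instance-level part of this specification (added to a subject, but never removed from it) boils down to a set difference, which the following sketch spells out. It is an illustration under strong assumptions: the class LapsedObservers and its Call facts are hypothetical, invocations are given as pre-extracted facts, variable names stand in for the objects they refer to, and no alias analysis is consulted. Like the specification itself, it ignores the order of the invocations and can therefore only flag possible lapsed observers.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the "registered but never unregistered" condition.
class LapsedObservers {

    // One fact per invocation: receiver.method(argument).
    static final class Call {
        final String receiver;
        final String method;
        final String argument;

        Call(String receiver, String method, String argument) {
            this.receiver = receiver;
            this.method = method;
            this.argument = argument;
        }
    }

    // Subject/observer pairs that were registered but never unregistered.
    static Set<String> lapsed(List<Call> calls) {
        Set<String> registered = new LinkedHashSet<>();
        Set<String> unregistered = new HashSet<>();
        for (Call call : calls) {
            String pair = call.receiver + "/" + call.argument;
            if (call.method.equals("addObserver")) {
                registered.add(pair);
            } else if (call.method.equals("removeObserver")) {
                unregistered.add(pair);
            }
        }
        registered.removeAll(unregistered);
        return registered;
    }

    public static void main(String[] args) {
        // The invocations from the main method of the example program.
        List<Call> calls = Arrays.asList(
                new Call("p", "addObserver", "s1"),
                new Call("p", "addObserver", "s2"),
                new Call("p", "removeObserver", "s2"));
        System.out.println(lapsed(calls)); // prints [p/s1]
    }
}
```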



Example Instance

[Slide figure: excerpt of Fig. 2, annotated with the unification extensions involved. Labels: qualified type, simple type, expression, parameter name, method name, message name, class name]

11

Here’s an example of a lapsed listener, together with the unification extensions that were required to find it. The paper has all the details.

Evaluation

result in general-purpose detection tool

for structural and behavioral characteristics

using descriptive specifications in a uniform language

motivated each cornerstone through desiderata it helps to fulfill

using running examples for each kind of characteristic

approach as a whole on desiderata

using design patterns, µ-patterns, bug patterns

✓ descriptive specifications

✓ most instances recalled with few false positives

✘ cardinality constraints difficult to exemplify

12

Now, how do you evaluate such a thing? I needed to evaluate the approach as a whole on the desiderata for a general-purpose pattern detection tool, but I also had to motivate its individual cornerstones. I therefore enabled the cornerstones one by one in my tool to demonstrate what desiderata they help to fulfill. The approach as a whole was evaluated by detecting instances of representative design patterns, micro-patterns and bug patterns. And of course, it worked well. A lot of specifications consisted solely of Java code with meta-variables. Only cardinality constraints such as “at least as many as” are easier to express using plain LMP.

Future Work

what to rank pattern detection results on?

similarity + imprecisions in analyses vs. severity for bug patterns?

need a corpus of programs in which pattern instances have been documented

pattern specification formalisms that are even easier to use

generalizing pattern instances into example-driven specifications

search space exploration backed by program analyses

in general, make sure that our tools become part of every developer’s toolbox

e.g. example-driven program transformation

e.g. example-driven history querying

but ... maybe ease-of-use is not the only adoption hurdle

13

What do I consider future work? First of all, I’m currently ranking results based on their similarity to the given example and on the imprecisions in the analyses that were needed to find each result. That seems ok for several patterns, except perhaps for bug patterns. Specialized bug detection tools rank bugs based on their severity. So there is room for future work here, although yesterday’s keynote speaker seems to disagree :) In any case, to evaluate our tools, we need a corpus of programs in which pattern instances have been documented. That’s a tremendous task, but someone has to do it. Perhaps we can do it collaboratively using a social website. There is also room for other specification formalisms that are even easier to use. Currently, I’m interested in search-based techniques from artificial intelligence to automatically generalize snippets of code into an example-driven specification. In general, I believe a lot of work still needs to be done to ensure our tools become part of every developer’s toolbox. I’m thinking of specifying program transformations in an example-driven way or even querying the history of a program in an example-driven way. But maybe ease-of-use is not the only hurdle to the adoption of our tools. Empirical studies are needed to determine what is keeping our tools from the toolboxes.

Lessons for Doctoral Students

proponent of artifact-driven research

stand on the shoulders of giants ... or reinvent the wheel?

specification of artifact for reproducibility (in my case: meta-interpreters)

SOUL, JDT, SOOT: thanks!

share with others, gain momentum

but, often takes implementing an algorithm to understand its details

be wary of analysis paralysis

trust your advisors when they say you have enough material ;)

anonymous: “getting a PhD is akin to getting a driver’s license for doing research”

14

highly subjective

Since this is the post-doctoral symposium, here are some of the lessons I learned that could be of use to doctoral students. Warning: these are highly subjective and personal.

[email protected]/SOUL/
