V. Winter, J. Guerrero, A. James, C. Reinke LINKING SYNTACTIC AND SEMANTIC MODELS OF JAVA SOURCE CODE WITHIN A PROGRAM TRANSFORMATION SYSTEM


Post on 23-Dec-2015



V. Winter, J. Guerrero, A. James, C. Reinke

LINKING SYNTACTIC AND SEMANTIC MODELS OF JAVA SOURCE CODE WITHIN A PROGRAM TRANSFORMATION SYSTEM


OUTLINE

• Introduction

• Motivation: The need for static analysis

• Why transformation systems are interesting in this setting

• Creating a rule in PMD

• Creating a rule in Sextant

• GPS-Traverse

• Overview

• Example: Constructing a call-graph

• Technical details of GPS-Traverse


SOURCE-CODE ANALYSIS

• Is heavily employed across the public and private sectors, including:

• the top 5 commercial banks

• 5 of the top 7 computer software companies

• 3 of the top 5 commercial aerospace and defense industry leaders

• the 3 largest armed services of the US

• 3 of the leading 4 accounting firms

• 2 of the top 3 insurance companies


SOURCE-CODE ANALYSIS

• It has been argued that source-code analysis can play an important role in software assurance within an Agile development process.

• The FDA is recommending (and may eventually mandate) the use of static-analysis tools for the development of medical device software.

• GrammaTech’s CodeSonar is a static-analysis tool that the FDA is currently using to investigate failures in recalled medical devices.


STATIC-ANALYSIS TOOLS

• Are frequently rule-based

• Utilize a variety of software models (e.g., AST, call-graph, control-flow graph)

• In an OO implementation, involve traversals of object-structures using the visitor pattern.

• Make use of pattern recognition (e.g., matching).

• May transform source-code (e.g., inserting markers/annotations to control analysis)

• Query software models

• Aggregate information


CREATING A RULE IN PMD

Avoid using while-loops without curly braces


CREATING A RULE IN PMD

• Step 1: Figure out what to look for. In this case we want to capture the convention that while-loops must use braces.

• Construct a compilation unit containing an instance of the syntactic property you want to detect.

class Example {
    void bar() {
        while (baz)
            buz.doSomething();
    }
}


AST GENERATION

• PMD uses JavaCC to generate an AST (Abstract Syntax Tree) corresponding to the source code.

CompilationUnit
 TypeDeclaration
  ClassDeclaration:(package private)
   UnmodifiedClassDeclaration(Example)
    ClassBody
     ClassBodyDeclaration
      MethodDeclaration:(package private)
       ResultType
       MethodDeclarator(bar)
        FormalParameters
       Block
        BlockStatement
         Statement
          WhileStatement
           Expression
            PrimaryExpression
             PrimaryPrefix
              Name:baz
           Statement
            StatementExpression:null
             PrimaryExpression
              PrimaryPrefix
               Name:buz.doSomething
              PrimarySuffix
               Arguments


PATTERN SELECTION

• Select and generalize the smallest portion of the AST containing the pattern in which you are interested. Make sure you discriminate good patterns from bad patterns (e.g., blocks versus no blocks). Consult the Java grammar as needed.

CompilationUnit
 TypeDeclaration
  ClassDeclaration:(package private)
   UnmodifiedClassDeclaration(Example)
    ClassBody
     ClassBodyDeclaration
      MethodDeclaration:(package private)
       ResultType
       MethodDeclarator(bar)
        FormalParameters
       Block
        BlockStatement
         Statement
          WhileStatement
           Expression
            PrimaryExpression
             PrimaryPrefix
              Name:baz
           Statement
            StatementExpression:null
             PrimaryExpression
              PrimaryPrefix
               Name:buz.doSomething
              PrimarySuffix
               Arguments


CREATE RULE

public class WhileLoopsMustUseBracesRule extends AbstractRule {
    public Object visit(ASTWhileStatement node, Object data) {
        SimpleNode firstStmt = (SimpleNode) node.jjtGetChild(1);
        if (!hasBlockAsFirstChild(firstStmt)) {
            addViolation(data, node);
        }
        return super.visit(node, data);
    }
}


CREATE PATTERN MATCHER

// pattern matcher
private boolean hasBlockAsFirstChild(SimpleNode node) {
    return (node.jjtGetNumChildren() != 0 && (node.jjtGetChild(0) instanceof ASTBlock));
}
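The same idea, a visitor-style walk plus a small structural test on the loop body, can be tried without PMD. The following is a minimal sketch on a hypothetical miniature AST: the types `Node`, `Block`, `ExprStatement`, and `While` and the method `findUnbracedWhiles` are invented for illustration and are not part of the PMD API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BraceCheckSketch {
    // Hypothetical miniature AST; these types are illustrative, not PMD classes.
    interface Node {}

    static final class Block implements Node {
        final List<Node> statements;
        Block(Node... statements) { this.statements = Arrays.asList(statements); }
    }

    static final class ExprStatement implements Node {
        final String text;
        ExprStatement(String text) { this.text = text; }
    }

    static final class While implements Node {
        final String condition;
        final Node body;
        While(String condition, Node body) { this.condition = condition; this.body = body; }
    }

    // Returns every while-loop under root whose body is not a Block.
    static List<While> findUnbracedWhiles(Node root) {
        List<While> violations = new ArrayList<>();
        collect(root, violations);
        return violations;
    }

    private static void collect(Node node, List<While> out) {
        if (node instanceof While) {
            While loop = (While) node;
            if (!(loop.body instanceof Block)) {
                out.add(loop); // the pattern of interest: a loop body without braces
            }
            collect(loop.body, out); // loops may be nested inside the body
        } else if (node instanceof Block) {
            for (Node child : ((Block) node).statements) {
                collect(child, out);
            }
        }
    }

    public static void main(String[] args) {
        Node method = new Block(
            new While("baz", new ExprStatement("buz.doSomething()")),
            new While("ok", new Block(new ExprStatement("x++"))));
        for (While loop : findUnbracedWhiles(method)) {
            System.out.println("while (" + loop.condition + ") lacks braces");
        }
    }
}
```

Only the first loop is reported, because the body of the second loop is a `Block`.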


ADD RULE TO RULESET

• Add the newly created rule to the PMD ruleset

<?xml version="1.0"?>
<ruleset name="My custom rules"
    xmlns="http://pmd.sf.net/ruleset/1.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://pmd.sf.net/ruleset/1.0.0 http://pmd.sf.net/ruleset_xml_schema.xsd"
    xsi:noNamespaceSchemaLocation="http://pmd.sf.net/ruleset_xml_schema.xsd">
  <rule name="WhileLoopsMustUseBracesRule"
        message="Avoid using 'while' statements without curly braces"
        class="WhileLoopsMustUseBracesRule">
    <description>
      Avoid using 'while' statements without using curly braces
    </description>
    <priority>3</priority>
    <example>
<![CDATA[
    public void doSomething() {
        while (true)
            x++;
    }
]]>
    </example>
  </rule>
</ruleset>


IN SEXTANT

Avoid using while-loops without curly braces


CREATE BASIC RULE PATTERN

strategy WhileLoopsMustUseBracesRule:

    Statement[:] while ( <Expression>_1 ) <Statement>_1 [:]
    →
    Statement[:] while ( <Expression>_1 ) <Statement>_1 [:]


ADD SPECIFIC PATTERN CONSTRAINT

strategy WhileLoopsMustUseBracesRule:

    Statement[:] while ( <Expression>_1 ) <Statement>_1 [:]
    →
    Statement[:] while ( <Expression>_1 ) <Statement>_1 [:]
    if { not(<Statement>_1 = Statement[:] <Block>_1 [:]) }


ADD METRIC/ACTION

strategy WhileLoopsMustUseBracesRule:

    Statement[:] while ( <Expression>_1 ) <Statement>_1 [:]
    →
    Statement[:] while ( <Expression>_1 ) <Statement>_1 [:]
    if {
        not(<Statement>_1 = Statement[:] <Block>_1 [:])
        andalso sml.addViolation(<Statement>_1)
    }


OBSERVATIONS

• Primitive operations in transformation systems include:

• Parsing

• Matching

• Traversal

• The software models that transformation systems typically operate on are terms – either concrete or abstract syntax trees.

• This makes the foundational framework of transformation systems well-suited for rule-based source-code analysis systems, especially systems whose rules have syntax-based specifications.


SEMANTIC RULES

Use equals() instead of == to compare objects


JAVA’S INTEGER CACHE

• Some rules require semantic analysis

• The implementation of such rules requires the ability to query semantic models (i.e., software models other than an AST)


package p1;

public class A {
    static void myEq(Integer x, Integer y) {
        System.out.println(x == y);
    }

    public static void main(String[] args) {
        myEq(100, 100);
        myEq(200, 200);
    }
}
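The slide's example can be extended to contrast reference comparison with value comparison. This is a small sketch; the class and method names (`IntegerCacheDemo`, `sameReference`, `sameValue`) are made up for illustration. The Java Language Specification guarantees that boxing an `int` in the range -128 to 127 yields the same cached object, so `==` holds for 100. For 200 the result of `==` depends on the JVM and its settings and is typically `false`, which is why `equals()` is the safer comparison.

```java
public class IntegerCacheDemo {
    // Reference comparison: true only when both arguments are the same object.
    static boolean sameReference(Integer x, Integer y) {
        return x == y;
    }

    // Value comparison: true whenever the wrapped int values are equal.
    static boolean sameValue(Integer x, Integer y) {
        return x.equals(y);
    }

    public static void main(String[] args) {
        System.out.println(sameReference(100, 100)); // both arguments come from the boxing cache
        System.out.println(sameReference(200, 200)); // outside the guaranteed cache range
        System.out.println(sameValue(200, 200));     // compares values, independent of caching
    }
}
```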


GPS-TRAVERSE

Linking Syntactic and Semantic Models within a Transformation System


GPS-TRAVERSE

• GPS-Traverse

• enables contextual information to be transparently tracked during transformation.

• is a collection of transformations whose purpose is to associate terms with the contexts in which they are defined

• This association is based on:

• Structural properties

• Nested classes

• Local classes

• Anonymous classes

• Frame variables currently in scope

• Generic variables currently in scope


NESTED CLASSES

package p1;

class B1 {
    class B2 {
        class B3 {
            int x;
        }
    }
}


FIELDS VERSUS LOCAL VARIABLES

class B {
    int x = 1;
    void f() {
        ... x ...      // refers to the field x
        int x = 2;
        ... x ...      // refers to the local variable x
    }
}


GENERIC TYPES VERSUS STANDARD TYPES

class C<T> {
    class T {
        T T; // field T of type <T>
    }
}


IN SUMMARY…

• GPS-Traverse: term → context

• In turn, a tuple of the form (term, context) provides the basis for a variety of semantic analysis functions

• A particularly useful such analysis function is called resolution


RESOLUTION

• Resolution is a semantic analysis function that operates on terms denoting references

• The resolution function used by Java is highly complex and involves:

• Static evaluation

• Type analysis

• Overloading, overriding, shadowing

• Generic analysis

• Local analysis

• Visibility – public, protected, package private, private

• Subtyping

• Imports: single-type, on-demand, and static


USES OF RESOLUTION

• Resolution is a prerequisite for a variety of software-based analysis and manipulation activities such as:

• Bootstrapping semantic models

• Software metrics

• API usage analysis

• Refactoring

• Slicing

• Migration – a well-formed complement of slicing

• Join point recognition

• Resolution-informed transformation is well-suited for many of these activities

• And finally, resolution-informed transformation can also play a key role in the construction of semantic models of software such as the call graph of a software system


EXAMPLE: CALL GRAPH

package p1;

public class A extends C {
    class innerA extends B1 {
        void g(byte b) {
            f(b + 0);
            f(0);
        }
    }
}

class B1 extends B2 {
    private void f(int x) { }
}

class B2 {
    void f(long x) { }
    void f(short x) { }
}

class C {
    void f(int x) { }
}
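To see which declaration each call in this example resolves to, the methods can be made to report themselves. The sketch below is an adaptation and not the slide's code: the classes are nested inside a hypothetical wrapper `ResolutionDemo` so the block is self-contained, the methods return a label instead of `void`, and `g` is split into two hypothetical methods. On my reading of the Java Language Specification (sections 15.12.1 and 15.12.2), the private `B1.f(int)` is not inherited by `InnerA`, the inherited `f` members make `InnerA` the class to search (so `C.f(int)` in the enclosing class is never considered), and an `int` argument is applicable only to `f(long)`.

```java
public class ResolutionDemo {
    static class B2 {
        String f(long x)  { return "B2.f(long)"; }
        String f(short x) { return "B2.f(short)"; }
    }

    static class B1 extends B2 {
        // Private, so it is not inherited by subclasses of B1.
        private String f(int x) { return "B1.f(int)"; }
    }

    static class C {
        String f(int x) { return "C.f(int)"; }
    }

    static class A extends C {
        class InnerA extends B1 {
            String callWithSum(byte b) { return f(b + 0); } // argument has type int
            String callWithLiteral()   { return f(0); }     // argument has type int
        }
    }

    public static void main(String[] args) {
        A.InnerA inner = new A().new InnerA();
        System.out.println(inner.callWithSum((byte) 1));
        System.out.println(inner.callWithLiteral());
    }
}
```

Both calls report `B2.f(long)`, which illustrates why a call graph cannot be built from syntax alone: the textually nearest `f(int)` declarations are not the ones invoked.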


TECHNICAL DETAILS

Bascinet, the TL System, and Sextant


BASCINET

• A NetBeans-based IDE supporting the development of TL programs

• Syntax-directed editors for TL, ML, and EBNF files

• Code-folding for both TL and ML

• Hyperlinks from MLton compiler output to ML source code

• Integrated with third-party visualization tools such as Cytoscape, GraphViz, and TreeMap

• Solves some key system-level problems:

• Discrete concurrent (forgetful) application of a transformation to a file hierarchy

{ transformation } x {file1, file2, …}

• Continuous sequential (stateful) application of a transformation to a file hierarchy

state1 = transformation( state0, file1)

state2 = transformation( state1, file2)
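The two application modes correspond roughly to a map and a fold over the file list. The following is a sketch under that reading only; `ApplicationModes`, `applyIndependently`, and `applySequentially` are hypothetical names, files are represented by plain strings, and nothing here reflects how Bascinet is actually implemented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ApplicationModes {
    // Forgetful application: each file is processed on its own, so the work can run concurrently.
    static <R> List<R> applyIndependently(Function<String, R> transformation, List<String> files) {
        return files.parallelStream().map(transformation).collect(Collectors.toList());
    }

    // Stateful application: the state produced for one file is the input state for the next.
    static <S> S applySequentially(BiFunction<S, String, S> transformation, S state0, List<String> files) {
        S state = state0;
        for (String file : files) {
            state = transformation.apply(state, file);
        }
        return state;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("A.java", "B1.java", "C.java");
        // One independent result per file (here, the length of each name).
        System.out.println(applyIndependently(String::length, files));
        // A single accumulated result (here, the number of files seen).
        System.out.println(applySequentially((count, file) -> count + 1, 0, files));
    }
}
```

A metric that is computed per file fits the first mode; a model that spans files, such as a call graph, needs the second.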


THE TL SYSTEM

• Input: GLR Parser

• Output: Abstract Prettyprinter

• TL – A language for specifying higher-order transformation

• First-order matching on concrete syntax trees

• First-order and higher-order generic traversals

• Standard combinators plus special-purpose combinators

• Modular

• Partially type-checked

• ML – A functional programming language tightly integrated with TL

• Computation is expressed in terms of modules written in TL and ML.


TL

• The terms being manipulated are concrete syntax trees

• The computational unit is the conditional rewrite rule:

term_lhs → term_rhs if { condition }

• Rules (also called strategies) can be bound to identifiers:

r: term_lhs → term_rhs if { condition }

• Strategies can be constructed by composing rules using a variety of combinators:

r1 <+ r2

r1 <; r2

• Strategies can be applied to terms using traversals and iterators:

TDL myStrategy myTerm
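The two combinators can be modelled outside TL to make their behavior concrete. The sketch below rests on an assumption: it treats `<+` as left-biased choice and `<;` as left-to-right sequential composition, with a non-matching rule reported as failure (`Optional.empty()`), in the style of strategic-programming languages. TL's own treatment of a rule that does not apply may differ, and `Strategy`, `choice`, and `sequence` are hypothetical names.

```java
import java.util.Optional;
import java.util.function.Function;

public class CombinatorSketch {
    // A strategy either rewrites its input or reports failure with Optional.empty().
    interface Strategy<T> extends Function<T, Optional<T>> {}

    // Models r1 <+ r2 (assumed semantics): try r1, and fall back to r2 only if r1 fails.
    static <T> Strategy<T> choice(Strategy<T> r1, Strategy<T> r2) {
        return t -> {
            Optional<T> first = r1.apply(t);
            return first.isPresent() ? first : r2.apply(t);
        };
    }

    // Models r1 <; r2 (assumed semantics): apply r1, then apply r2 to the result of r1.
    static <T> Strategy<T> sequence(Strategy<T> r1, Strategy<T> r2) {
        return t -> r1.apply(t).flatMap(r2);
    }

    // Two toy conditional rewrite rules over integers, standing in for rules over terms.
    static final Strategy<Integer> HALVE_IF_EVEN =
        n -> n % 2 == 0 ? Optional.of(n / 2) : Optional.empty();
    static final Strategy<Integer> DECREMENT_IF_POSITIVE =
        n -> n > 0 ? Optional.of(n - 1) : Optional.empty();

    public static void main(String[] args) {
        Strategy<Integer> either = choice(HALVE_IF_EVEN, DECREMENT_IF_POSITIVE);
        Strategy<Integer> both = sequence(HALVE_IF_EVEN, DECREMENT_IF_POSITIVE);
        System.out.println(either.apply(6)); // first rule applies
        System.out.println(either.apply(7)); // first rule fails, second applies
        System.out.println(both.apply(6));   // halve, then decrement
        System.out.println(both.apply(7));   // first rule fails, so the sequence fails
    }
}
```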


import_closed GPS.Locator

module CyclomaticComplexity

    strategy initialize: ...

    strategy outputResults: ...

    strategy collectMetrics:
        TDL ( GPS.Locator.enter <; ccAnalysis <; GPS.Locator.exit )

    strategy ccAnalysis: MethodCC <+ ConstructorCC
    strategy MethodCC: ...
    strategy ConstructorCC: ...

end // module


GPS-TRAVERSE

• Transformationally maintains a semantic model which can be queried in a variety of ways:

• getContextKey

• getEnclosingContextKey

• currentContextType

• enclosingContextType

• withinContextType

• inMethod

• isGeneric

• isLocalGeneric

• isVar
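One way to picture such a model is a stack of scopes that is pushed on entry to a declaration and popped on exit, with queries answered from the stack. The sketch below is a guess at the general idea and not GPS-Traverse itself: `ContextTrackerSketch`, its methods, and the dotted key format are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ContextTrackerSketch {
    // Scopes from outermost (first) to innermost (last).
    private final Deque<String> scopes = new ArrayDeque<>();

    // Called when the traversal enters a package, class, or method.
    void enter(String scopeName) { scopes.addLast(scopeName); }

    // Called when the traversal leaves the innermost scope.
    void exit() { scopes.removeLast(); }

    // Hypothetical key format: scope names joined by dots.
    String contextKey() { return String.join(".", scopes); }

    // Key of the scope that encloses the current one (empty at the outermost level).
    String enclosingContextKey() {
        List<String> names = new ArrayList<>(scopes);
        return String.join(".", names.subList(0, Math.max(0, names.size() - 1)));
    }

    public static void main(String[] args) {
        ContextTrackerSketch tracker = new ContextTrackerSketch();
        for (String scope : new String[] {"p1", "B1", "B2", "B3"}) {
            tracker.enter(scope);
        }
        System.out.println(tracker.contextKey());          // context of field x in the nested-class example
        System.out.println(tracker.enclosingContextKey());
        tracker.exit();
        System.out.println(tracker.contextKey());
    }
}
```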


strategy CallGraph:
    <SelectorOptExpression>_methodCall → <SelectorOptExpression>_methodCall
    if {
        isMethodCall <SelectorOptExpression>_methodCall
        andalso sml.GPS_inMethod()
        andalso <key>_methodContext = sml.GPS_getContextKey()

        // semantic query
        andalso <key>_calledMethod = sml.resolve( <key>_methodContext, <SelectorOptExpression>_methodCall )
        andalso sml.outputPP( <key>_methodContext )
        andalso sml.output(" calls ")
        andalso sml.outputPP( <key>_calledMethod )
    }

strategy isMethodCall:
    // basic call
    SelectorOptExpression[:] <TypeArgsOpt>_1 <Id>_1 <Arguments>_1 [:] → SelectorOptExpression[:] <TypeArgsOpt>_1 <Id>_1 <Arguments>_1 [:]
    <+
    // embedded call
    ...


Questions?

THE END