Program Analysis, Representation, and Transformation
Apr 19, 2023 COSC6431 2
Program Analysis
• Extracting information, in order to present abstractions of, or answer questions about, a software system
• Static Analysis: Examines the source code
• Dynamic Analysis: Examines the system as it is executing
What are we looking for?
• Depends on our goals and the system
– In almost any language, we can find out information about variable usage
– In an OO environment, we can find out which classes use other classes, which are a base of an inheritance structure, etc.
– We can also find potential blocks of code that can never be executed in running the program (dead code)
– Typically, the information extracted is in terms of entities and relationships
Entities
• Entities are individuals that live in the system, and attributes associated with them.
Some examples:
– Classes, along with information about their superclass, their scope, and ‘where’ in the code they exist.
– Methods/functions and what their return type or parameter list is, etc.
– Variables and what their types are, and whether or not they are static, etc.
Relationships
• Relationships are interactions between the entities in the system.
Relationships include:
– Classes inheriting from one another.
– Methods in one class calling the methods of another class, and methods within the same class calling one another.
– One variable referencing another variable.
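As a rough illustration of extracting entities and relationships, the sketch below runs Python's standard `ast` module over a small made-up source string. The subject program, the function name `extract_facts`, and the tuple layout of the facts are assumptions chosen for this example; a real extractor would handle many more cases.

```python
import ast

# Hypothetical subject program; the class names echo the RSF example in these slides.
SOURCE = """
class SHAPE:
    def area(self):
        return 0

class TRIANGLE(SHAPE):
    def describe(self):
        return self.area()
"""

def extract_facts(source):
    """Return (entities, relationships) found in the given Python source."""
    entities, relationships = [], []
    tree = ast.parse(source)
    for cls in [n for n in ast.walk(tree) if isinstance(n, ast.ClassDef)]:
        entities.append(("class", cls.name, cls.lineno))
        for base in cls.bases:
            if isinstance(base, ast.Name):
                relationships.append(("inherit", cls.name, base.id))
        for item in cls.body:
            if isinstance(item, ast.FunctionDef):
                method = f"{cls.name}.{item.name}"
                entities.append(("method", method, item.lineno))
                # Record method calls of the form something.name(...)
                for call in [n for n in ast.walk(item) if isinstance(n, ast.Call)]:
                    if isinstance(call.func, ast.Attribute):
                        relationships.append(("call", method, call.func.attr))
    return entities, relationships

entities, relationships = extract_facts(SOURCE)
for fact in entities + relationships:
    print(*fact)
```

Entities carry attributes (here only the kind and the line number), while relationships are plain triples, which is the shape a fact base typically stores.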
Information format
• Many different formats in use
• Simple but effective: RSF
inherit TRIANGLE SHAPE
• TA is an extension of RSF that includes a schema
$INSTANCE SHAPE Class
• GXL is an XML-like extension of TA
– Blow-up factor of 10 or more makes it rather cumbersome
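To show why a triple format such as RSF is easy to work with, here is a minimal sketch that loads "relation source target" lines into a dictionary and answers one query. The extra facts (`SQUARE`, the `call` triple) and the helper name `load_rsf` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical RSF fact base: one "relation source target" triple per line.
RSF_TEXT = """\
inherit TRIANGLE SHAPE
inherit SQUARE SHAPE
call TRIANGLE SHAPE
"""

def load_rsf(text):
    """Group RSF triples by relation name."""
    facts = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:              # skip blank or malformed lines
            relation, source, target = parts
            facts[relation].append((source, target))
    return facts

facts = load_rsf(RSF_TEXT)
# Query: which classes inherit from SHAPE?
subclasses = sorted(src for src, dst in facts["inherit"] if dst == "SHAPE")
print(subclasses)
```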
Static Analysis
• Involves parsing the source code
• Usually creates an Abstract Syntax Tree
• Borrows heavily from compiler technology but stops before code generation
• Requires a grammar for the programming language
• Can be very difficult to get right
CppETS
• CppETS is a benchmark for C++ extractors
• It consists of a collection of C++ programs that pose various problems commonly found in parsing and reverse engineering
• Static analysis research tools typically get about 60% of the problems right
Example program
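#include <iostream>

class Hello {
public:
    Hello();
    ~Hello();
};

// Constructor: runs when an instance is created.
Hello::Hello() { std::cout << "Hello, world.\n"; }

// Destructor: runs when the instance goes out of scope.
Hello::~Hello() { std::cout << "Goodbye, cruel world.\n"; }

int main() {
    Hello h;
    return 0;
}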
Example Q&A
• How many member methods are in the Hello class? Answer: Two, the constructor (Hello::Hello()) and destructor (Hello::~Hello()).
• Where are these member methods used? Answer: The constructor is called implicitly when an instance of the class is created. The destructor is called implicitly when the execution leaves the scope of the instance.
Static analysis in IDEs
• High-level languages lend themselves better to static analysis needs
– EiffelStudio automatically creates BON diagrams of the static structure of Eiffel systems
– Rational Rose does the same with UML and Java
• Unfortunately, most legacy systems are not written in either of these languages
Static analysis pipeline
[Diagram: Source code → Parser → Abstract Syntax Tree → Fact extractor → Fact base → Clustering algorithm, Metrics tool, Visualizer]
Dynamic Analysis
• Provides information about the run-time behaviour of software systems, e.g.
– Component interactions
– Event traces
– Concurrent behaviour
– Code coverage
– Memory management
• Can be done with a profiler or a debugger
Instrumentation
• Augments the subject program with code that transmits events to a monitoring application, or writes relevant information to an output file
• A profiler can be used to examine the output file and extract relevant facts from it
• Instrumentation affects the execution speed and storage space requirements of the system
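One lightweight way to picture instrumentation is a wrapper that records an event on every call and return. The sketch below is only an analogy in Python: the `instrument` decorator, the `EVENTS` list (standing in for the output file or monitoring application), and the two sample functions are all assumptions made for this example.

```python
import functools

EVENTS = []  # stands in for the output file or monitoring application

def instrument(func):
    """Wrap func so that each call and return is recorded as an event."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        EVENTS.append(("enter", func.__name__))
        try:
            return func(*args, **kwargs)
        finally:
            EVENTS.append(("exit", func.__name__))
    return wrapper

@instrument
def square(x):
    return x * x

@instrument
def sum_of_squares(a, b):
    return square(a) + square(b)

result = sum_of_squares(3, 4)
for event in EVENTS:
    print(*event)
```

Every recorded event costs time and memory, which is the overhead the last bullet refers to.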
Instrumentation process
[Diagram: Source code + Annotation script → Annotator → Annotated program → Compiler → Instrumented executable]
Dynamic analysis pipeline
[Diagram: Instrumented executable → CPU → Dynamic analysis data → Profiler → Fact base → Clustering algorithm, Metrics tool, Visualizer]
Non-instrumented approach
• One can also use debugger log files to obtain dynamic information
• Disadvantage: Limited amount of information provided
• Advantage: Less intrusive approach, more accurate performance measurements
Dynamic analysis issues
• Ensuring good code coverage is a key concern
• A comprehensive test suite is required to ensure that all paths in the code will be exercised
• Results may not generalize to future executions
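The coverage concern can be made concrete with a small sketch built on Python's standard `sys.settrace` hook. The function `classify` and the helper `lines_executed` are hypothetical names; the point is only that a single test input leaves one branch unobserved.

```python
import sys

def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"

def lines_executed(func, *args):
    """Run func(*args) and return the set of line offsets executed inside it."""
    executed = set()
    code = func.__code__

    def tracer(frame, event, arg):
        # Record only 'line' events that belong to the function under test.
        if frame.f_code is code and event == "line":
            executed.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return executed

only_positive = lines_executed(classify, 5)
both_inputs = only_positive | lines_executed(classify, -5)
print(sorted(only_positive), sorted(both_inputs))
```

With only the input `5`, the line returning "negative" never runs, so any fact derived from that trace says nothing about the negative branch.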
Static vs. Dynamic
Static analysis:
• Reasons over all possible behaviours (general results)
• Conservative and sound
• Challenge: Choose good abstractions
Dynamic analysis:
• Observes a small number of behaviours (specific results)
• Precise and fast
• Challenge: Select representative test cases
Program Representation
• Fundamental issue in re-engineering
– Provides means to generate abstractions
– Provides input to a computational model for analyzing and reasoning about programs
– Provides means for translation and normalization of programs
Key questions
• What are the strengths and weaknesses of various representations of programs?
• What levels of abstraction are useful?
Representation schemes
• Chosen based on objectives and tasks to be performed. Popular ones are:
– Abstract syntax trees
– Control Flow Graphs
– Data Flow Graphs
– Structure Charts
– Module Dependency Graphs
Abstract Syntax Trees
• A translation of the source text in terms of operands and operators
• Omits superficial details, such as comments, whitespace
• All necessary information to generate further abstractions is maintained
AST production
• Four necessary elements to produce an AST:
– Lexical analyzer (turn input strings into tokens)
– Grammar (turn tokens into a parse tree)
– Domain Model (defines the nodes and arcs allowable in the AST)
– Linker (annotates the AST with global information, e.g. data types, scoping etc.)
AST example
• Input string: 1 + /* two */ 2
• Parse Tree: [diagram: a + node over the tokens int 1 and int 2]
• AST (without global info): [diagram: an Add node with arcs arg1 → 1 and arg2 → 2]
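The same input string can be pushed through a toy lexer and parser to see the comment disappear and the Add node emerge. This is a minimal sketch, not a general parser: the grammar covers only integers joined by +, the `Num` and `Add` classes play the role of the domain model, and the linker step is left out entirely.

```python
import re
from dataclasses import dataclass

@dataclass
class Num:          # AST leaf: integer literal
    value: int

@dataclass
class Add:          # AST node with arcs arg1 and arg2
    arg1: object
    arg2: object

TOKEN = re.compile(r"\s*(?:(/\*.*?\*/)|(\d+)|(\+))", re.DOTALL)

def tokenize(text):
    """Lexical analysis: produce tokens, discarding comments and whitespace."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        comment, number, plus = match.groups()
        if number:
            tokens.append(("int", int(number)))
        elif plus:
            tokens.append(("+", plus))
        pos = match.end()            # comments match but are not kept
    return tokens

def parse(tokens):
    """Grammar: expr -> int ('+' int)*, built as left-associative Add nodes."""
    kind, value = tokens[0]
    if kind != "int":
        raise SyntaxError("expected a number")
    tree = Num(value)
    for i in range(1, len(tokens), 2):
        if tokens[i][0] != "+" or i + 1 >= len(tokens) or tokens[i + 1][0] != "int":
            raise SyntaxError("expected '+ number'")
        tree = Add(tree, Num(tokens[i + 1][1]))
    return tree

ast_root = parse(tokenize("1 + /* two */ 2"))
print(ast_root)
```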
Control Flow Graphs
• Offer a way to eliminate variations in control statements by providing a normalized view of the possible flow of execution of a program
• To produce a CFG:
– AST of the program
– Decomposition of the program into basic blocks
– Basic semantics on the control statements of the language
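The decomposition into basic blocks can be sketched with the usual "leader" rule. The example below assumes the program has already been lowered from its AST to a list of labelled instructions; the instruction format, labels, and function names are invented for illustration.

```python
# Hypothetical lowered program: (label, instruction, jump_target) triples.
PROGRAM = [
    ("L0", "i = 0", None),
    ("L1", "if i >= n goto L5", "L5"),
    ("L2", "total = total + i", None),
    ("L3", "i = i + 1", None),
    ("L4", "goto L1", "L1"),
    ("L5", "return total", None),
]

def basic_blocks(program):
    """Split the program into basic blocks using the 'leader' rule."""
    labels = [label for label, _, _ in program]
    leaders = {labels[0]}                       # first instruction
    for index, (_, _, target) in enumerate(program):
        if target is not None:
            leaders.add(target)                 # target of a jump
            if index + 1 < len(program):
                leaders.add(labels[index + 1])  # instruction after a jump
    blocks = []
    for label, _, _ in program:
        if label in leaders:
            blocks.append([])
        blocks[-1].append(label)
    return blocks

def successors(program, blocks):
    """Derive CFG edges between blocks, keyed by each block's first label."""
    by_label = {label: (text, target) for label, text, target in program}
    owner = {label: block[0] for block in blocks for label in block}
    edges = {}
    for i, block in enumerate(blocks):
        text, target = by_label[block[-1]]
        out = [owner[target]] if target else []
        falls_through = not text.startswith(("goto", "return"))
        if falls_through and i + 1 < len(blocks):
            out.append(blocks[i + 1][0])
        edges[block[0]] = out
    return edges

blocks = basic_blocks(PROGRAM)
edges = successors(PROGRAM, blocks)
print(blocks)
print(edges)
```

The "basic semantics on the control statements" show up in `successors`: a conditional jump has two successors, an unconditional jump one, and a return none.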
Data Flow Graphs
• Focus mostly on the exchange of information between program components, i.e. basic blocks, functions, modules
• To produce a DFG:
– AST of the program
– Decomposition of the program into basic blocks (or a more coarse-grained level)
– Annotations on uses and definitions of variables
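The "uses and definitions" annotations can be illustrated on straight-line code with Python's `ast` module. This sketch treats each top-level statement as its own unit and links every use to the most recent earlier definition; branches and loops, which a real data flow analysis must handle, are deliberately ignored, and all names are assumptions for the example.

```python
import ast

SOURCE = """\
a = 1
b = a + 2
c = a * b
"""

def defs_and_uses(source):
    """For each top-level statement, list the variables it defines and uses."""
    result = []
    for stmt in ast.parse(source).body:
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        defs = sorted({n.id for n in names if isinstance(n.ctx, ast.Store)})
        uses = sorted({n.id for n in names if isinstance(n.ctx, ast.Load)})
        result.append((stmt.lineno, defs, uses))
    return result

def flow_edges(annotated):
    """Link each use to the most recent earlier definition (straight-line code only)."""
    last_def, edges = {}, []
    for line, defs, uses in annotated:
        for name in uses:
            if name in last_def:
                edges.append((last_def[name], line, name))
        for name in defs:
            last_def[name] = line
    return edges

annotated = defs_and_uses(SOURCE)
edges = flow_edges(annotated)
print(annotated)
print(edges)
```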
Structure charts
• Represent data and control information in a concise and compact form
• To produce a structure chart:
– The CFG of the program
– The DFG of the program
Module Dependency Graphs
• The most common way to represent data coupling and data dependencies between program and system entities
• To produce an MDG:
– Structure chart of the program
– Information on parameter passing between procedures and functions
– Containment information
Program Transformation
• A program is a structured object with semantics
• Structure allows us to transform a program
• Semantics allow us to compare programs and decide on the validity of transformations
Program Transformation
• The act of changing one program into another (from a source language to a target language)
• Used in many areas of software engineering:
– Compiler construction
– Software visualization
– Documentation generation
– Automatic software renovation
Application examples
• Converting to a new language dialect
• Migrating from a procedural language to an object-oriented one, e.g. C to C++
• Adding code comments
• Requirement upgrading, e.g. using 4 digits for years instead of 2 (Y2K)
• Structural improvements, e.g. changing GOTOs to control structures
• Pretty printing
Simple program transformation
• Modify all arithmetic expressions to reduce the number of parentheses using the formula: (a+b)*c = a*c + b*c
x := (2+5)*3
becomes
x := 2*3 + 5*3
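The distribution rule can be tried out on Python source using the standard `ast` module (`ast.unparse` needs Python 3.9 or later). This is a sketch under simplifying assumptions: Python writes assignment as `=` rather than `:=`, the class name `Distribute` is invented, only the (a+b)*c shape is handled, and copying `c` is safe only when it has no side effects.

```python
import ast
import copy

class Distribute(ast.NodeTransformer):
    """Rewrite (a + b) * c into a*c + b*c in a single bottom-up pass."""
    def visit_BinOp(self, node):
        self.generic_visit(node)                 # transform children first
        if (isinstance(node.op, ast.Mult)
                and isinstance(node.left, ast.BinOp)
                and isinstance(node.left.op, ast.Add)):
            a, b, c = node.left.left, node.left.right, node.right
            return ast.BinOp(
                left=ast.BinOp(left=a, op=ast.Mult(), right=c),
                op=ast.Add(),
                right=ast.BinOp(left=b, op=ast.Mult(), right=copy.deepcopy(c)),
            )
        return node

def distribute(source):
    """Parse the source, apply the rewrite, and print it back as text."""
    tree = Distribute().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))

print(distribute("x = (2 + 5) * 3"))
```

The unparser inserts parentheses only where precedence requires them, so the parentheses around the sum disappear once the product has been distributed.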
Two types of transformations
• Translation
– Source and target language are different
– Semantics remain the same
• Rephrasing
– Source and target language are the same
– Goal is to improve some aspect of the program such as its understandability or performance
– Semantics might change
Translation
• Program synthesis
– Lowers the level of abstraction, e.g. compilation
• Program migration
– Transform to a different language
• Reverse Engineering
– Raises the level of abstraction, e.g. create architectural descriptions from the source code
• Program Analysis
– Reduces the program to one aspect, e.g. control flow
Rephrasing
• Program normalization
– Decreases syntactic complexity (desugaring), e.g. algebraic simplification of expressions
• Program optimization
– Improves performance, e.g. inlining, common-subexpression and dead code elimination
Rephrasing
• Program refactoring
– Improves the design by restructuring while preserving the functionality
• Program obfuscation
– Deliberately makes the program harder to understand
• Software renovation
– Fixes bugs such as Y2K
Transformation tools
• There are many transformation tools
• Program-Transformation.org lists 90 of them
• Most are based on term rewriting
• Other solutions use functional programming, lambda calculus, etc.
Term rewriting
• The process of simplifying symbolic expressions (terms) by means of a Rewrite System, i.e. a set of Rewrite Rules.
• A Rewrite Rule is of the form
lhs → rhs
where lhs and rhs are term patterns
Example Rewrite System
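0 + x → x
s(x) + y → s(x + y)
(x + y) + z → x + (y + z)
Under these rewrite rules, the term
((s(s(a)) + s(b)) + c)
will be rewritten as
s(s(a + s(b + c)))
The s wrapped around b stays inside, because no rule moves an s out of the right-hand operand of +.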
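The three rules are small enough to run by hand or in a few lines of code. The sketch below encodes terms as nested tuples (an assumption of this example, as are the function names) and applies the rules innermost-first until none matches. With only these three rules it reaches s(s(a + s(b + c))); moving the remaining s outward would need an extra rule such as x + s(y) → s(x + y), which is not part of this system.

```python
def rewrite_step(term):
    """Apply one rule at the root of term, or return None if no rule matches."""
    if isinstance(term, tuple) and term[0] == "+":
        _, left, right = term
        if left == "0":                                   # 0 + x -> x
            return right
        if isinstance(left, tuple) and left[0] == "s":    # s(x) + y -> s(x + y)
            return ("s", ("+", left[1], right))
        if isinstance(left, tuple) and left[0] == "+":    # (x + y) + z -> x + (y + z)
            return ("+", left[1], ("+", left[2], right))
    return None

def normalize(term):
    """Rewrite innermost-first until no rule applies anywhere in the term."""
    if isinstance(term, tuple):
        term = (term[0],) + tuple(normalize(arg) for arg in term[1:])
    reduced = rewrite_step(term)
    return term if reduced is None else normalize(reduced)

def show(term):
    """Render a tuple-encoded term in the notation used on the slide."""
    if isinstance(term, str):
        return term
    if term[0] == "s":
        return f"s({show(term[1])})"
    left, right = show(term[1]), show(term[2])
    if isinstance(term[1], tuple) and term[1][0] == "+":
        left = f"({left})"
    if isinstance(term[2], tuple) and term[2][0] == "+":
        right = f"({right})"
    return f"{left} + {right}"

start = ("+", ("+", ("s", ("s", "a")), ("s", "b")), "c")
print(show(start), "=>", show(normalize(start)))
```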
TXL
• A generalized source-to-source translation system
• Uses a context-free grammar to describe the structures to be transformed
• Rule specification uses a by-example style
• Has been used to process billions of lines of code for Y2K purposes
TXL programs
• TXL programs consist of two parts:
– Grammar for the input language
– Transformation Rules
• Let’s look at some examples…
Calculator.Txl - Grammar
% Part I. Syntax specification
define program
    [expression]
end define

define expression
    [term]
  | [expression] [addop] [term]
end define

define term
    [primary]
  | [term] [mulop] [primary]
end define

define primary
    [number]
  | ( [expression] )
end define

define addop
    '+
  | '-
end define

define mulop
    '*
  | '/
end define
Calculator.Txl - Rules
% Part 2. Transformation rules
rule main
    replace [expression]
        E [expression]
    construct NewE [expression]
        E [resolveAddition] [resolveSubtraction] [resolveMultiplication]
          [resolveDivision] [resolveParentheses]
    where not
        NewE [= E]
    by
        NewE
end rule

rule resolveAddition
    replace [expression]
        N1 [number] + N2 [number]
    by
        N1 [+ N2]
end rule

rule resolveSubtraction …
rule resolveMultiplication …
rule resolveDivision …

rule resolveParentheses
    replace [primary]
        ( N [number] )
    by
        N
end rule
DotProduct.Txl
% Form the dot product of two vectors,
% e.g., (1 2 3).(3 2 1) => 10
define program
    ( [repeat number] ) . ( [repeat number] )
  | [number]
end define

rule main
    replace [program]
        ( V1 [repeat number] ) . ( V2 [repeat number] )
    construct Zero [number]
        0
    by
        Zero [addDotProduct V1 V2]
end rule

rule addDotProduct V1 [repeat number] V2 [repeat number]
    deconstruct V1
        First1 [number] Rest1 [repeat number]
    deconstruct V2
        First2 [number] Rest2 [repeat number]
    construct ProductOfFirsts [number]
        First1 [* First2]
    replace [number]
        N [number]
    by
        N [+ ProductOfFirsts] [addDotProduct Rest1 Rest2]
end rule
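For readers new to TXL, the recursion in addDotProduct may be easier to follow in a conventional language. The Python function below is only an analogy with invented names: it peels off the first element of each vector, adds their product to the running total, and recurses on the rest, stopping when either vector is empty (the point where a deconstruct would fail).

```python
def add_dot_product(total, v1, v2):
    """Mirror addDotProduct: peel off the first element of each vector."""
    if not v1 or not v2:              # a deconstruct fails: nothing left to add
        return total
    product_of_firsts = v1[0] * v2[0]
    return add_dot_product(total + product_of_firsts, v1[1:], v2[1:])

result = add_dot_product(0, [1, 2, 3], [3, 2, 1])
print(result)
```

This reproduces the example in the comment: 1*3 + 2*2 + 3*1 = 10.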
Sort.Txl
% Sort.Txl - simple numeric bubble sort
define program
    [repeat number]
end define

rule main
    replace [repeat number]
        N1 [number] N2 [number] Rest [repeat number]
    where
        N1 [> N2]
    by
        N2 N1 Rest
end rule
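The idea behind the sorting rule, swapping an adjacent out-of-order pair and repeating until no such pair remains, can be sketched in Python. This mimics the effect of the rule only; it makes no claim about the order in which TXL itself searches for matches, and the function name is invented.

```python
def sort_by_rewriting(numbers):
    """Repeatedly swap the first adjacent out-of-order pair until none is left."""
    numbers = list(numbers)
    changed = True
    while changed:
        changed = False
        for i in range(len(numbers) - 1):
            if numbers[i] > numbers[i + 1]:          # where N1 [> N2]
                numbers[i], numbers[i + 1] = numbers[i + 1], numbers[i]
                changed = True
                break                                 # restart the search
    return numbers

print(sort_by_rewriting([3, 1, 2]))
```

Each swap removes exactly one inversion, so the loop is guaranteed to stop, and it stops only when the sequence is sorted.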
Other TXL constructs
compounds
    -> :=
end compounds

keys
    var procedure exists inout out
end keys

function isAnAssignmentTo X [id]
    match [statement]
        X := Y [expression]
end function