Connecting Task to Source
Gail C. Murphy
Department of Computer Science, University of British Columbia
Includes joint work with: Elisa Baniassad, University of British Columbia; David Notkin, University of Washington; Kevin Sullivan, University of Virginia
© G.C. Murphy 2
Once Upon a Time...
[Figure: a browser design with Browser, Network, Parser, and HTML Parser components is later extended with a VRML Parser and a File System. A "?" links the changed design to four stacks of source, each labelled "500 pages".]
Change is inevitable...
Overview of Talk
[Figure: the browser design and the four 500-page stacks of source, connected by the task.]
• A Typical Estimation Task
    • Software Reflexion Model Technique
• A Typical Reengineering Task
    • Conceptual Modules Technique
• Partial and Approximate Techniques
• Summary
A Typical Estimation Scenario
• You are asked to provide, within five days, an estimate of the effort required to modify an implementation of a Unix operating system to page over a distributed network
[Figure: the NetBSD kernel source code tree, with top-level directories include, share, regress, lib, and sys; the figure notes 1947 files and 860 files.]
Software Visualization
[Figure: "calls" extracted from the NetBSD source and displayed using Field [Reiss95].]
Reverse Engineering
[Figure: "calls" extracted from the NetBSD source with Field and displayed with Rigi [Müller89].]
Boxology
• Model of a Unix virtual memory subsystem drawn by a domain expert
[Figure: modules for Hardware Translation, VM Policy, User, Virtual Address Maint., Kernel Fault Handler, Memory, Pager, and File System, connected by call arcs.]
Software Reflexion Model
[Figure: the reflexion model computed for the virtual memory model. Arcs among Memory, Hardware Translation, User, Kernel Fault Handler, Pager, Virtual Address Maint., VM Policy, and File System are classified as convergences, divergences, or absences, and each arc is labelled with the number of source-model calls it summarizes.]
Software Reflexion Model Technique
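[Figure: the technique in four steps: (1) state a high-level model; (2) use an extraction tool to extract a source model from the system artifacts; (3) state a mapping between the two; (4) the RM tools compute a reflexion model for the engineer to investigate.]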
1. State a High-Level Model
• Syntactic
• Multiple relations
• “everyone has one or more”
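[Figure: the virtual memory high-level model with two kinds of arcs, call and data access, among Hardware Translation, VM Policy, User, Virtual Address Maint., Kernel Fault Handler, Memory, Pager, and File System.]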
2. Extract a Source Model
• Use existing tools (e.g., cflow, Field, etc.)
• Lightweight lexical source model extractor (Murphy/Notkin)
• May contain multiple relations
[Figure: the extraction tool reads source files such as arp.c, vfsop.c, net.c, and rrip.c and emits "calls" tuples over functions such as arpwhohas, vget_internal, sendrecv, and rip_analyze, etc.]
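A lexical extractor works on raw text, so it can run on code that a compiler-based tool would reject, at the cost of precision. The following is a minimal Java sketch under stated assumptions: the input is C, and LexicalCallExtractor is a hypothetical class, not the Murphy/Notkin extractor. It treats an identifier followed by "(" at brace depth 0 as the start of a function definition and the same pattern inside a body as a call; comments, strings, macros, and calls through function pointers are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical, deliberately naive lexical "calls" extractor for C source.
 * It tracks brace depth and identifiers followed by '('; it does not
 * understand comments, strings, macros, or function pointers.
 */
class LexicalCallExtractor {
    // Either a brace or an identifier followed by '(' (group 1 = identifier)
    private static final Pattern TOKEN =
            Pattern.compile("\\{|\\}|\\b([A-Za-z_]\\w*)\\s*\\(");
    // Keywords that look like calls to a purely lexical scanner
    private static final Set<String> KEYWORDS =
            Set.of("if", "while", "for", "switch", "return", "sizeof");

    /** Returns one "caller calls callee" tuple per call site, in source order. */
    static List<String> extractCalls(String source) {
        List<String> tuples = new ArrayList<>();
        String current = null; // function whose body we are inside
        int depth = 0;         // brace nesting depth
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            String token = m.group();
            if (token.equals("{")) {
                depth++;
            } else if (token.equals("}")) {
                depth--;
            } else {
                String name = m.group(1);
                if (KEYWORDS.contains(name)) {
                    continue;
                }
                if (depth == 0) {
                    current = name; // "name(" at top level starts a definition
                } else if (current != null) {
                    tuples.add(current + " calls " + name);
                }
            }
        }
        return tuples;
    }

    public static void main(String[] args) {
        String src = "int helper(int x) { return x + 1; }\n"
                + "void run(void) {\n  if (ready()) helper(helper(2));\n}\n";
        extractCalls(src).forEach(System.out::println);
    }
}
```

For the two-function input in main, the sketch emits "run calls ready" and then "run calls helper" twice. Repeated tuples are kept deliberately, because an arc in a reflexion model summarizes every source-model tuple mapped onto it.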
3. State a Declarative Mapping
• Name source model entities using:
    physical and logical software structure
    regular expressions
• Many-to-many mapping
Source Model Entities      High-Level Model Entities
file=pager.c               Pager
file=vm_map.*              VirtualAddressMaint.
dir=vm func=active         VMPolicy
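The map can be read as a list of entries, each pairing one or more conditions on a source entity with a high-level entity. A minimal Java sketch follows, assuming that each condition's regular expression must match the whole name, that dir= refers to the immediately enclosing directory, and that the conditions in one entry must all hold; DeclarativeMapping and its methods are hypothetical names. Because an entity is mapped to every entry it satisfies, the mapping is many-to-many.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

/**
 * Hypothetical declarative map: each entry holds conditions on a source
 * entity (file, dir, func) and the high-level entity it maps to.
 */
class DeclarativeMapping {
    private static final class Entry {
        final Map<String, Pattern> conditions = new LinkedHashMap<>();
        final String target;

        Entry(String spec, String target) {
            // spec is a space-separated list, e.g. "dir=vm func=active"
            for (String condition : spec.trim().split("\\s+")) {
                String[] kv = condition.split("=", 2);
                conditions.put(kv[0], Pattern.compile(kv[1]));
            }
            this.target = target;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(String spec, String target) {
        entries.add(new Entry(spec, target));
    }

    /** Returns every high-level entity whose entry fully matches a function in a file path. */
    Set<String> map(String path, String func) {
        int slash = path.lastIndexOf('/');
        String dirPath = slash < 0 ? "" : path.substring(0, slash);
        Map<String, String> names = new HashMap<>();
        names.put("file", path.substring(slash + 1));
        names.put("dir", dirPath.substring(dirPath.lastIndexOf('/') + 1));
        names.put("func", func);

        Set<String> targets = new LinkedHashSet<>();
        for (Entry e : entries) {
            boolean allMatch = true;
            for (Map.Entry<String, Pattern> c : e.conditions.entrySet()) {
                String name = names.get(c.getKey());
                if (name == null || !c.getValue().matcher(name).matches()) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                targets.add(e.target);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        DeclarativeMapping m = new DeclarativeMapping();
        m.add("file=pager.c", "Pager");
        m.add("file=vm_map.*", "VirtualAddressMaint.");
        m.add("dir=vm func=active", "VMPolicy");
        System.out.println(m.map("sys/vm/vm_map.c", "active"));
    }
}
```

With the three entries above, a function named active in sys/vm/vm_map.c maps to both VirtualAddressMaint. and VMPolicy.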
4. Investigate a Reflexion Model
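[Figure: part of the reflexion model, showing arcs among Kernel Fault Handler, Pager, and File System marked as convergences, divergences, or absences. Selecting an arc lists its arc values, the source-model tuples it summarizes, e.g., "lfs_truncate calls vnode_pager_setsize", etc.]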
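The computation behind an investigation can be sketched as follows: each source-model tuple is lifted through the map to high-level entities, and each lifted arc is compared with the high-level model. This Java sketch is a simplification under assumptions, not the RM tools: ReflexionModel is a hypothetical class, arcs are plain "A->B" strings, unmapped source entities are ignored, and the tuples in main are illustrative rather than extracted from NetBSD.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Sketch: lift source-model edges through a map and compare with a high-level model. */
class ReflexionModel {
    final Map<String, Integer> convergences = new TreeMap<>(); // expected and found
    final Map<String, Integer> divergences = new TreeMap<>();  // found but not expected
    final Set<String> absences = new TreeSet<>();              // expected but not found

    /**
     * @param highLevel arcs of the high-level model, written "A->B"
     * @param source    source-model edges as {from, to} pairs, e.g. {caller, callee}
     * @param mapping   source entity -> high-level entities (many-to-many)
     */
    ReflexionModel(Set<String> highLevel, List<String[]> source,
                   Map<String, Set<String>> mapping) {
        for (String[] edge : source) {
            // Entities the map does not mention are simply ignored in this sketch
            for (String a : mapping.getOrDefault(edge[0], Set.of())) {
                for (String b : mapping.getOrDefault(edge[1], Set.of())) {
                    String arc = a + "->" + b;
                    Map<String, Integer> bucket =
                            highLevel.contains(arc) ? convergences : divergences;
                    bucket.merge(arc, 1, Integer::sum); // count supporting tuples
                }
            }
        }
        for (String arc : highLevel) {
            if (!convergences.containsKey(arc)) {
                absences.add(arc);
            }
        }
    }

    public static void main(String[] args) {
        // Illustrative inputs only
        Set<String> model = Set.of("KernelFaultHandler->Pager", "Pager->FileSystem", "VMPolicy->Pager");
        Map<String, Set<String>> map = Map.of(
                "vm_fault", Set.of("KernelFaultHandler"),
                "vnode_pager_getpages", Set.of("Pager"),
                "vnode_pager_setsize", Set.of("Pager"),
                "ffs_read", Set.of("FileSystem"),
                "lfs_truncate", Set.of("FileSystem"));
        List<String[]> calls = List.of(
                new String[] {"vm_fault", "vnode_pager_getpages"},
                new String[] {"vm_fault", "vnode_pager_getpages"},
                new String[] {"vnode_pager_getpages", "ffs_read"},
                new String[] {"lfs_truncate", "vnode_pager_setsize"});
        ReflexionModel rm = new ReflexionModel(model, calls, map);
        System.out.println("convergences: " + rm.convergences);
        System.out.println("divergences:  " + rm.divergences);
        System.out.println("absences:     " + rm.absences);
    }
}
```

With these inputs, KernelFaultHandler->Pager converges with two supporting tuples, FileSystem->Pager diverges because of lfs_truncate calls vnode_pager_setsize, and VMPolicy->Pager, which no tuple supports, is an absence.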
Iteration
• Want to investigate the data relationships?
    augment the source model
    update the mapping:
        var=queue.*active   VMPolicy
    recompute...
[Figure: the high-level model again, with both call and data access arcs.]
Refined Reflexion Model
[Figure: the refined reflexion model, recomputed with call and data access relations; arcs are again classified as convergences, divergences, or absences and labelled with counts.]
Experience
compiler               4,000 lines of C++; 4,000 lines of Ada
restructuring tool     30,000 lines of CLOS
SPIN                   65,000 lines of Modula-3
NetBSD                 250,000 lines of C
industrial audibles    6,000 lines of C++
Excel                  >1,000,000 lines of C
Excel: Experimental Reengineering
• A Microsoft engineer computed Reflexion Models several times a day for four weeks:
    120,000 calls and global variable references
    map file with over 1000 entries
    high-level model with 15 entities and 96 interactions
    4 minutes to compute on a 486
• Some lessons learned:
    map files evolved to be larger than expected
    scale places pressure on managing the information
Other Features...
• Family of reflexion model systems
• Parameterized by structural descriptions
• Incremental computation algorithms
• Typed model
• Tagging and annotations to manage investigation
• Used for a variety of tasks
Overview of Talk
• Typical Estimation Task / Reflexion Models
• Typical Reengineering Task / Conceptual Modules
• Partial and Approximate Techniques
• Summary
A Typical Reengineering Scenario
main() {
  for ( i=0; i< nfiles; i++ ) {
    char buf[8192]; FILE *fp; int cc;
    fp=fopen( files[i], "r" );
    tmp=tempname();
    ofp=xtmpfopen(tmp);
    ...
[Figure: a target design with Sort, Input Pipe, and Output Pipe components; question marks link it to sort.c (1700 lines) and 29 other C files.]
Reengineering Scenario...
Procedure main:
main() {
  for ( i=0; i< nfiles; i++ ) {
    char buf[8192]; FILE *fp; int cc;
    fp=fopen( files[i], "r" );
    tmp=tempname();
    ofp=xtmpfopen(tmp);
    ...
  sort( files, nfiles, ofp );
  ...
Procedure sort:
sort( char **files, int nfiles, FILE *ofp ) {
  fp=xfopen( files, "r" );
  while ( fillbuf( &buf, fp ) ) {
    findlines( &buf, &lines );
    ...
    if ( feof( fp ) && !nfiles ... ) tfp=ofp; else ++n_temp_files;
    ...
[Figure label: Input Pipe]
Program Database
• Identify variables of interest
• For each variable
    where is the variable declared?
    where is the variable referenced?
• Collate results
• Repeat
[Chart: number of lines returned for sort, on a scale of 0 to 140.]
Using Field's xrefdb, 126 lines were returned, of which 30% were relevant.
Slicer
• Compute backward slices on variables in pre-identified lines of code
Slices computed with Unravel were > 750 nodes in size and included irrelevant procedures.
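The size of these slices follows from what a backward slice is: every statement the criterion depends on, directly or transitively. The Java sketch below assumes the hard part, a dependence graph with data and control dependences, has already been built, which is what a slicer such as Unravel derives from the source; BackwardSlice and the statement numbers are hypothetical. Slicing is then reverse reachability, so a single dependence on a widely used variable can pull in large parts of the program.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: backward slice as reverse reachability over a given dependence graph. */
class BackwardSlice {
    /**
     * @param dependsOn statement -> statements it directly depends on
     * @param criterion statements whose values we want to explain
     * @return every statement the criterion transitively depends on, including itself
     */
    static Set<Integer> slice(Map<Integer, List<Integer>> dependsOn, Set<Integer> criterion) {
        Set<Integer> inSlice = new TreeSet<>(criterion);
        Deque<Integer> work = new ArrayDeque<>(criterion);
        while (!work.isEmpty()) {
            int stmt = work.pop();
            for (int dep : dependsOn.getOrDefault(stmt, List.of())) {
                if (inSlice.add(dep)) { // true only the first time a statement is reached
                    work.push(dep);
                }
            }
        }
        return inSlice;
    }

    public static void main(String[] args) {
        // Hypothetical dependences: 5 <- 4 <- {2, 3}, 3 <- 1; statement 6 is not reached
        Map<Integer, List<Integer>> deps = Map.of(
                5, List.of(4),
                4, List.of(2, 3),
                3, List.of(1),
                6, List.of(2));
        System.out.println(slice(deps, Set.of(5))); // [1, 2, 3, 4, 5]
    }
}
```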
Type Inferencer
• Determine constraints on the representation of values
• Can be used to identify abstract data types, detect abstraction violations, find unused variables, and determine where there are possible references to a value
• The Lackwit [O’Callahan & Jackson 97] tool produces graphs summarizing how values are transmitted through a program
Lackwit graphs are often large and are reported in terms of the existing structure.
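The core idea of representation-type inference can be sketched with unification: whenever a value may flow between two variables, they must share a representation type. The Java sketch below uses union-find and is a deliberate, monomorphic simplification; it is not Lackwit's algorithm, which is polymorphic and tracks the structure of types, and RepresentationTypes and the variable names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Monomorphic simplification of representation-type inference using
 * union-find. Not Lackwit's algorithm; it only shows the unification idea.
 */
class RepresentationTypes {
    private final Map<String, String> parent = new HashMap<>();

    /** Returns the representative of v's class, compressing the path. */
    private String find(String v) {
        parent.putIfAbsent(v, v);
        String root = v;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        while (!v.equals(root)) {
            String next = parent.get(v);
            parent.put(v, root);
            v = next;
        }
        return root;
    }

    /** Records that a value may flow between a and b, e.g. from "a = b". */
    void flow(String a, String b) {
        parent.put(find(a), find(b));
    }

    /** Every variable that must share a representation type with v. */
    Set<String> sameRepresentation(String v) {
        String root = find(v);
        Set<String> result = new TreeSet<>();
        for (String other : List.copyOf(parent.keySet())) {
            if (find(other).equals(root)) {
                result.add(other);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical flows loosely modelled on the sort example
        RepresentationTypes t = new RepresentationTypes();
        t.flow("tfp", "ofp");
        t.flow("ofp", "xtmpfopen.result");
        t.flow("fp", "xfopen.result");
        System.out.println(t.sameRepresentation("tfp")); // [ofp, tfp, xtmpfopen.result]
        System.out.println(t.sameRepresentation("fp"));  // [fp, xfopen.result]
    }
}
```

In this toy run, tfp, ofp, and the result of xtmpfopen share one representation while fp does not, even though all of them would be declared FILE *.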
Software Reflexion Model
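[Figure: a reflexion model for sort with Filter, Input Pipe, and Output Pipe entities; arcs are labelled with counts and distinguish calls, data, and "no type" relations.]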
• Difficult to ascertain interface of the module
• No support for querying the source model
• Syntactic comparison
Conceptual Module Technique
[Figure: a source model extraction tool produces the source model; in the Conceptual Module Tool the engineer forms a conceptual module (CM), which yields an interface analysis, and issues queries, which yield query results.]
Forming a Conceptual Module
• Map lines of code to a logical module
• Two ways to map the code:
    by specifying line numbers (individual, ranges, etc.)
    by specifying pieces of existing logical structure (i.e., variables or procedures)
• Each module has a name
• Formation can be iterative
For sort, we ended up including about 24 lines in the input pipe conceptual module.
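A conceptual module can be modelled as a named set of line numbers built up from either form of specification. The Java sketch below is hypothetical: ConceptualModule and its methods are not the Conceptual Module tool's interface, and the procedure-span and variable-occurrence tables stand in for the source model (procedure start relation, uses and defs). The line numbers in main are illustrative, not taken from sort.c.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical conceptual module: a named set of source line numbers. */
class ConceptualModule {
    final String name;
    private final SortedSet<Integer> lines = new TreeSet<>();

    ConceptualModule(String name) {
        this.name = name;
    }

    /** Map an individual line. */
    ConceptualModule addLine(int line) {
        lines.add(line);
        return this;
    }

    /** Map an inclusive range of lines. */
    ConceptualModule addRange(int first, int last) {
        for (int l = first; l <= last; l++) {
            lines.add(l);
        }
        return this;
    }

    /** Map a whole procedure, given a table of {first, last} line spans. */
    ConceptualModule addProcedure(Map<String, int[]> procedures, String proc) {
        int[] span = procedures.get(proc);
        if (span == null) {
            throw new IllegalArgumentException("unknown procedure: " + proc);
        }
        return addRange(span[0], span[1]);
    }

    /** Map every line on which a variable is used or defined. */
    ConceptualModule addVariable(Map<String, Set<Integer>> occurrences, String var) {
        lines.addAll(occurrences.getOrDefault(var, Set.of()));
        return this;
    }

    /** Formation is iterative, so lines can also be unmapped again. */
    ConceptualModule removeRange(int first, int last) {
        lines.subSet(first, last + 1).clear();
        return this;
    }

    SortedSet<Integer> lines() {
        return Collections.unmodifiableSortedSet(lines);
    }

    public static void main(String[] args) {
        ConceptualModule inputPipe = new ConceptualModule("InputPipe")
                .addProcedure(Map.of("fillbuf", new int[] {240, 262}), "fillbuf")
                .addVariable(Map.of("fp", Set.of(300, 305)), "fp")
                .removeRange(250, 255);
        System.out.println(inputPipe.name + " " + inputPipe.lines());
    }
}
```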
Interface Analysis
• Local (interface) analysis is used to summarize how the module interacts directly with the existing code
Input Variables: sortalloc, main.ofp, main.minus, etc.
Output Variables: main.mergeonly, sort.ofp, sortalloc, etc.
Local Variables: main.files, main.nfiles, sort.files
Control Transfers: xmalloc at sort.c 1796, fillbuf at sort.c 248, etc.
Interface Analysis...
• Interface analysis is straightforward. One twist is that the analysis is set up to be tolerant of the form of the source model.
• Source model consists of:
    variable dependence relation (may be either use-def pairs or uses & defs)
    control transfer relation
    procedure start relation
Two-phase analysis for local variables:
    1. Use-def pairs: if all uses and defs of a variable are in the module, it is a local variable.
    2. Uses & defs: consider input/output; promote to local if all uses and defs are in the module.
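The second phase, for a source model that records only uses and defs, can be sketched as follows. The Java sketch assumes that a variable is an input when it is used in the module and defined outside it, and an output when it is defined in the module and used outside it; InterfaceAnalysis is a hypothetical name, control transfers are omitted, and a variable may be both input and output, as sortalloc is in the interface analysis for sort. The line numbers in main are illustrative.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch of the "uses & defs" phase of interface analysis. */
class InterfaceAnalysis {
    enum Role { INPUT, OUTPUT, LOCAL }

    /** Classifies each variable the module touches; module is a set of line numbers. */
    static Map<String, Set<Role>> classify(Set<Integer> module,
                                           Map<String, Set<Integer>> uses,
                                           Map<String, Set<Integer>> defs) {
        Set<String> vars = new TreeSet<>(uses.keySet());
        vars.addAll(defs.keySet());
        Map<String, Set<Role>> roles = new TreeMap<>();
        for (String v : vars) {
            Set<Integer> u = uses.getOrDefault(v, Set.of());
            Set<Integer> d = defs.getOrDefault(v, Set.of());
            boolean usedIn = u.stream().anyMatch(module::contains);
            boolean definedIn = d.stream().anyMatch(module::contains);
            if (!usedIn && !definedIn) {
                continue; // the module never touches this variable
            }
            boolean usedOut = !module.containsAll(u);
            boolean definedOut = !module.containsAll(d);
            Set<Role> r = EnumSet.noneOf(Role.class);
            if (!usedOut && !definedOut) {
                r.add(Role.LOCAL); // promote: every use and def lies in the module
            } else {
                if (usedIn && definedOut) {
                    r.add(Role.INPUT); // a value defined outside may be used inside
                }
                if (definedIn && usedOut) {
                    r.add(Role.OUTPUT); // a value defined inside may be used outside
                }
            }
            roles.put(v, r);
        }
        return roles;
    }

    public static void main(String[] args) {
        Set<Integer> module = Set.of(10, 11, 12);
        Map<String, Set<Integer>> uses = Map.of(
                "ofp", Set.of(11), "mergeonly", Set.of(30),
                "sortalloc", Set.of(12, 40), "files", Set.of(11));
        Map<String, Set<Integer>> defs = Map.of(
                "ofp", Set.of(5), "mergeonly", Set.of(12),
                "sortalloc", Set.of(2, 11), "files", Set.of(10));
        System.out.println(classify(module, uses, defs));
    }
}
```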
Querying about Conceptual Modules
• Once one or more conceptual modules are formed, the re-engineer typically needs to perform queries:
    How do the Conceptual Modules relate to each other?
    How do the Conceptual Modules relate to the existing source?
• The tool provides both pre-coded queries and a programmable interface through which a user can code queries.
Conceptual Module Relationships
[Figure: four relationships between conceptual modules A and B: direct def-use (a definition in A reaches a use in B), indirect def-use, overlap, and contains.]
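Three of the four relationships can be sketched directly over line sets. The Java sketch assumes the source model supplies use-def pairs as {definition line, use line}; ModuleRelations is a hypothetical class, and indirect def-use, which would follow chains of such pairs through other code, is left out.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of conceptual-module relationships over line sets. */
class ModuleRelations {
    /** A and B share at least one line. */
    static boolean overlap(Set<Integer> a, Set<Integer> b) {
        for (int line : a) {
            if (b.contains(line)) {
                return true;
            }
        }
        return false;
    }

    /** Every line of B is also in A. */
    static boolean contains(Set<Integer> a, Set<Integer> b) {
        return a.containsAll(b);
    }

    /** A definition in A reaches a use in B through a single use-def pair. */
    static boolean directDefUse(Set<Integer> a, Set<Integer> b, List<int[]> useDefPairs) {
        for (int[] pair : useDefPairs) {
            if (a.contains(pair[0]) && b.contains(pair[1])) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<Integer> a = Set.of(10, 11, 12, 13, 14);
        Set<Integer> b = Set.of(12, 13);
        Set<Integer> c = Set.of(20, 21);
        List<int[]> pairs = List.of(new int[] {11, 20}, new int[] {20, 30});
        System.out.println("A overlaps B: " + overlap(a, b));
        System.out.println("A contains B: " + contains(a, b));
        System.out.println("A def-use C:  " + directDefUse(a, c, pairs));
    }
}
```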
Programmable Interface
// Find the definition points common to all conceptual modules.
// Get the use-def chains for all input and local variables of the first module.
Module first = (Module) Module.ModuleTable.elementAt(0);
SET common = DefUse.GetFullUseDefChain(first);
for (int i = 1; i < Module.ModuleTable.size(); i++) {
    // Get the use-def chains for the next module
    Module current = (Module) Module.ModuleTable.elementAt(i);
    SET curr_chain = DefUse.GetFullUseDefChain(current);
    // Intersect the chains to determine common definition points
    common = DefUse.INTERSECTION(common, curr_chain);
}
common.print();
Experience
Task                   System          Size (loc)
Extract components     GNU sort        5,100
Restructure            adventure       8,000
C to C++               GNU plot        26,000
                       (compression)   10,000
Restructure (subset)   CUDD            47,000

Source models: SUIF = tools built on SUIF, which provide use-def pairs (four studies); xrefdb = Field's xrefdb, which provides uses & defs (one study).
Query Context and Form
• Two parts to expressing context:
    identify the region of source over which to query
    restrict the region for which results are reported
• The Conceptual Module identifies the region, and interface analysis summarizes local results
• Form includes both input and output:
    some tasks require queries over grouped items
    report results in terms of the task
• Can use the Conceptual Module structure to query against source; results are reported in terms of the target structure
Partial and Approximate Techniques
• Each of these characteristics can be an effective way to attack scale.
• These characteristics can be combined to provide software engineers with a “smoother” means of managing source investigations.
The bottom line for most development is that time is money.
[Chart: % of program considered versus time, for approximate and conservative techniques.]
Summary
Software Reflexion Model
“Definitely confirmed suspicions about the structure of Excel. Further, it allowed me to pinpoint the deviations. ... very easy to ignore stuff that is not interesting and thereby focus on the part of Excel I want to know more about.”
Microsoft Engineer
Conceptual Module
“not only did the tool verify the independent nature of the ZDD functionality and allow me to rip out all that code, but, the process of using your tool forced me to analyze and understand the code in a way that I had not been doing and that ultimately it very quickly gave me the perspective I needed.”
Yvonne Coady
Summary...
• demonstrated benefits of task-aware program understanding techniques
    current techniques are structurally task-aware
• demonstrated a role for approximate information
    the reflexion model technique makes the engineer responsible
    the conceptual module technique takes on some of the responsibility
• the goal is to get to “what-if” tools that would allow engineers to leverage, cost-effectively, connections between design and source