Connecting Task to Source

Gail C. Murphy

Department of Computer Science
University of British Columbia

Includes joint work with: Elisa Baniassad, University of British Columbia; David Notkin, University of Washington; Kevin Sullivan, University of Virginia

© G.C. Murphy 2

Once Upon a Time...

[Figure: a browser design with Browser, Network, Parser, and HTML Parser components grows to include a VRML Parser and a File System. Beside it, the source code is shown as four 500-page listings, marked "?".]

Change is inevitable...


Overview of Talk

[Figure: the browser design (Browser, Network, Parser, HTML Parser, VRML Parser, File System) and its source, shown as four 500-page listings, with "Task" labels between them.]

• A Typical Estimation Task: Software Reflexion Model Technique
• A Typical Reengineering Task: Conceptual Modules Technique
• Partial and Approximate Techniques
• Summary


A Typical Estimation Scenario

• You are asked to provide, within five days, an estimate of the effort required to modify an implementation of a Unix operating system to page over a distributed network

[Figure: the NetBSD kernel source code tree, with directories include, share, regress, lib, and sys; counts of 1947 files and 860 files are shown.]


Software Visualization

"Calls" extracted from NetBSD source and displayed using Field [Reiss95]


Reverse Engineering

"Calls" extracted from NetBSD source with Field and displayed with Rigi [Müller89]


Boxology

• Model of a Unix virtual memory subsystem drawn by a domain expert

[Figure: boxes for Hardware Translation, VM Policy, User, Virtual Address Maint., Kernel Fault Handler, Memory, Pager, and File System connected by arcs; legend: Module, Call.]


Software Reflexion Model

[Figure: reflexion model of the virtual memory subsystem over Memory, Hardware Translation, User, Kernel Fault Handler, Pager, Virtual Address Maint., VM Policy, and File System. Each arc is classified as a convergence, divergence, or absence and labeled with a count of source-level interactions; absences are labeled 0.]


Software Reflexion Model Technique

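[Diagram: (1) the engineer states a high-level model; (2) an extraction tool derives a source model from the system artifacts; (3) the engineer states a mapping between the two; (4) the RM tools compute a reflexion model for the engineer to investigate.]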
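The comparison performed in step 4 can be sketched in a few lines. This is not the RM tools' implementation: it assumes the high-level model, the source model, and the mapping have already been reduced to in-memory edge sets and a lookup table, and every name in it (`Edge`, `ReflexionModel`, `Reflexion.compute`) is hypothetical. Each source edge is lifted to every pair of high-level entities its endpoints map to; an arc that is both expected and found is a convergence, one found but not expected is a divergence, and one expected but never found is an absence. The count kept per arc plays the role of the numbers labeling the arcs.

```java
import java.util.*;

/** A typed interaction between two entities, e.g. ("lfs_truncate", "vnode_pager_setsize", "calls"). */
record Edge(String from, String to, String rel) {}

/** Arcs classified by comparing the high-level model with the lifted source model. */
record ReflexionModel(Map<Edge, Integer> convergences,
                      Map<Edge, Integer> divergences,
                      Set<Edge> absences) {}

class Reflexion {
    /**
     * @param hlModel interactions the engineer expects between high-level entities
     * @param source  interactions extracted from the source
     * @param mapping source entity -> high-level entities (many-to-many allowed)
     */
    static ReflexionModel compute(Set<Edge> hlModel, List<Edge> source,
                                  Map<String, Set<String>> mapping) {
        // Lift each source edge onto high-level arcs, counting the edges per arc.
        Map<Edge, Integer> counts = new HashMap<>();
        for (Edge e : source) {
            for (String hf : mapping.getOrDefault(e.from(), Set.of())) {
                for (String ht : mapping.getOrDefault(e.to(), Set.of())) {
                    if (hf.equals(ht)) continue; // simplification: skip edges internal to one entity
                    counts.merge(new Edge(hf, ht, e.rel()), 1, Integer::sum);
                }
            }
        }
        Map<Edge, Integer> convergences = new HashMap<>();
        Map<Edge, Integer> divergences = new HashMap<>();
        // Expected and found: convergence. Found but not expected: divergence.
        counts.forEach((arc, n) -> (hlModel.contains(arc) ? convergences : divergences).put(arc, n));
        // Expected but never found: absence.
        Set<Edge> absences = new HashSet<>(hlModel);
        absences.removeAll(counts.keySet());
        return new ReflexionModel(convergences, divergences, absences);
    }

    public static void main(String[] args) {
        Set<Edge> expected = Set.of(
            new Edge("KernelFaultHandler", "Pager", "calls"),
            new Edge("Pager", "FileSystem", "calls"));
        // vm_fault and vnode_pager_getpage are illustrative names, not taken from the slides.
        List<Edge> extracted = List.of(
            new Edge("vm_fault", "vnode_pager_getpage", "calls"),
            new Edge("lfs_truncate", "vnode_pager_setsize", "calls"));
        Map<String, Set<String>> mapping = Map.of(
            "vm_fault", Set.of("KernelFaultHandler"),
            "vnode_pager_getpage", Set.of("Pager"),
            "vnode_pager_setsize", Set.of("Pager"),
            "lfs_truncate", Set.of("FileSystem"));
        ReflexionModel rm = compute(expected, extracted, mapping);
        System.out.println("convergences: " + rm.convergences());
        System.out.println("divergences:  " + rm.divergences());
        System.out.println("absences:     " + rm.absences());
    }
}
```

With this sample data, KernelFaultHandler to Pager is a convergence, FileSystem to Pager is a divergence (because of the lfs_truncate call), and Pager to FileSystem is an absence. Typing each edge with a relation name lets the same comparison handle several relations, such as calls and data accesses, at once.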


1. State a High-Level Model

• Syntactic

• Multiple relations

• “everyone has one or more”

[Figure: the virtual memory high-level model, with Call and Data Access relations among Hardware Translation, VM Policy, User, Virtual Address Maint., Kernel Fault Handler, Memory, Pager, and File System.]


2. Extract a Source Model

• Use existing tools (e.g., cflow, Field, etc.)
• Lightweight lexical source model extractor (Murphy/Notkin)
• May contain multiple relations

[Figure: an extraction tool reads files such as arp.c, vfsop.c, net.c, and rrip.c and produces "calls" relations among functions such as arpwhohas, vget_internal, sendrecv, and rip_analyze, etc.]
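A lexical extractor does not need to parse the language. The sketch below is not the Murphy/Notkin extractor; it assumes C source, uses hypothetical names throughout, and illustrates the idea with two regular-expression heuristics: an identifier followed by `(` at column 0 on a line without `;` starts a function definition, and any other identifier followed by `(` inside that body is recorded as a call. Because it is lexical, it will also report function-like macros and miss calls made through function pointers.

```java
import java.util.*;
import java.util.regex.*;

class LexicalExtractor {
    // An identifier immediately followed (modulo spaces) by '('.
    private static final Pattern CALL = Pattern.compile("\\b([A-Za-z_]\\w*)\\s*\\(");
    // Words that look like calls lexically but are not.
    private static final Set<String> NOT_CALLS =
        Set.of("if", "while", "for", "switch", "return", "sizeof");

    /** Returns "caller calls callee" facts, in source order, for the text of one C file. */
    static List<String> extractCalls(String cText) {
        List<String> facts = new ArrayList<>();
        String current = null; // function whose body we are currently in, if any
        for (String line : cText.split("\n")) {
            if (line.startsWith("}")) { current = null; continue; } // heuristic: body ends at column 0
            Matcher m = CALL.matcher(line);
            boolean atColumnZero = !line.isEmpty() && !Character.isWhitespace(line.charAt(0));
            // Heuristic: "name(" at column 0 on a line without ';' starts a definition.
            if (atColumnZero && !line.contains(";") && m.find()) {
                current = m.group(1);
                continue;
            }
            if (current == null) continue;
            while (m.find()) {
                String callee = m.group(1);
                if (!NOT_CALLS.contains(callee)) facts.add(current + " calls " + callee);
            }
        }
        return facts;
    }

    public static void main(String[] args) {
        // Made-up input in K&R style; arprequest is an illustrative name.
        String src = String.join("\n",
            "int",
            "arpwhohas(ac, addr)",
            "{",
            "\tarprequest(ac, addr);",
            "\tif (ac != NULL) sendrecv();",
            "}");
        extractCalls(src).forEach(System.out::println);
    }
}
```

For the made-up input in `main`, it prints `arpwhohas calls arprequest` and `arpwhohas calls sendrecv`, facts with the same shape as the "calls" relation in the figure.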


3. State a Declarative Mapping

• Name source model entities using physical and logical software structure and regular expressions

• Many-to-many mapping

Source Model Entities | High-Level Model Entities
file=pager.c | Pager
file=vm_map.* | VirtualAddressMaint.
dir=vm func=active | VMPolicy
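A mapping like this one can be interpreted with a small amount of code. In the sketch below, each entry's left-hand side is a list of `field=regex` conditions that must all hold, and an entity goes to every entry that matches it, which gives the many-to-many behavior. The field names (`dir`, `file`, `func`), the unanchored matching, and the class names are assumptions for illustration, not the syntax or semantics of the original tools.

```java
import java.util.*;
import java.util.regex.Pattern;

/** One mapping entry: every field=regex condition must hold for the entry to apply. */
record MapEntry(Map<String, Pattern> conditions, String target) {
    /** Parses a left-hand side such as "dir=vm func=active". */
    static MapEntry parse(String spec, String target) {
        Map<String, Pattern> conds = new LinkedHashMap<>();
        for (String part : spec.trim().split("\\s+")) {
            String[] kv = part.split("=", 2);
            conds.put(kv[0], Pattern.compile(kv[1]));
        }
        return new MapEntry(conds, target);
    }

    /** Assumption: unanchored matching, so "pager.c" also matches "vnode_pager.c". */
    boolean matches(Map<String, String> entity) {
        for (Map.Entry<String, Pattern> c : conditions.entrySet()) {
            String value = entity.get(c.getKey());
            if (value == null || !c.getValue().matcher(value).find()) return false;
        }
        return true;
    }
}

class DeclarativeMapping {
    /** Many-to-many: an entity goes to every high-level entity whose entry matches it. */
    static Set<String> map(List<MapEntry> entries, Map<String, String> entity) {
        Set<String> targets = new LinkedHashSet<>();
        for (MapEntry e : entries) {
            if (e.matches(entity)) targets.add(e.target());
        }
        return targets;
    }

    public static void main(String[] args) {
        List<MapEntry> entries = List.of(
            MapEntry.parse("file=pager.c", "Pager"),
            MapEntry.parse("file=vm_map.*", "VirtualAddressMaint."),
            MapEntry.parse("dir=vm func=active", "VMPolicy"));
        // Hypothetical source entities described by their physical and logical structure.
        System.out.println(map(entries, Map.of("dir", "vm", "file", "vnode_pager.c", "func", "vnode_pager_setsize")));
        System.out.println(map(entries, Map.of("dir", "vm", "file", "vm_map.c", "func", "vm_map_lookup")));
    }
}
```

Note that the `.` in `pager.c` is a regular-expression wildcard; escaping it as `pager\.c` would be stricter. An unmapped entity simply yields an empty set and drops out of the reflexion model.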


4. Investigate a Reflexion Model

[Figure: part of the reflexion model showing Kernel Fault Handler, Pager, and FileSystem with labeled arcs; legend: Convergence, Divergence, Absence.]

Arc Values:
lfs_truncate calls vnode_pager_setsize
etc.
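Investigating an arc means looking at the source-level interactions it summarizes. The hypothetical sketch below keeps, for each high-level arc, the list of calls that mapped onto it: the arc's label is the length of that list, and the list itself is the "arc values" view, such as lfs_truncate calls vnode_pager_setsize. Apart from that pair, the function names are made up, and the mapping is one-to-one for brevity.

```java
import java.util.*;

class ArcValues {
    record Call(String caller, String callee) {}
    record Arc(String from, String to) {}

    /**
     * Groups source-level calls under the high-level arc they map to.
     * For brevity the mapping here is one-to-one; unmapped calls are dropped.
     */
    static Map<Arc, List<Call>> group(List<Call> calls, Map<String, String> mapping) {
        Map<Arc, List<Call>> byArc = new LinkedHashMap<>();
        for (Call c : calls) {
            String from = mapping.get(c.caller());
            String to = mapping.get(c.callee());
            if (from == null || to == null) continue;
            byArc.computeIfAbsent(new Arc(from, to), k -> new ArrayList<>()).add(c);
        }
        return byArc;
    }

    public static void main(String[] args) {
        // lfs_write, vm_fault, and vnode_pager_getpage are illustrative names.
        List<Call> calls = List.of(
            new Call("lfs_truncate", "vnode_pager_setsize"),
            new Call("lfs_write", "vnode_pager_setsize"),
            new Call("vm_fault", "vnode_pager_getpage"));
        Map<String, String> mapping = Map.of(
            "lfs_truncate", "FileSystem", "lfs_write", "FileSystem",
            "vnode_pager_setsize", "Pager", "vnode_pager_getpage", "Pager",
            "vm_fault", "KernelFaultHandler");
        // The arc's label is the list size; the list is what the engineer reads.
        group(calls, mapping).forEach((arc, list) ->
            System.out.println(arc.from() + " -> " + arc.to() + " (" + list.size() + "): " + list));
    }
}
```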


Iteration

• Want to investigate the data relationships?

augment the source model
update the mapping:
var=queue.*active → VMPolicy
recompute...

[Figure: the high-level model with Call and Data Access relations among Hardware Translation, VM Policy, User, Virtual Address Maint., Kernel Fault Handler, Memory, Pager, and File System.]


Refined Reflexion Model

[Figure: the refined reflexion model after adding data relationships: the same high-level entities, with arcs classified as convergences, divergences, or absences and labeled with counts of source-level interactions.]


Experience

• compiler: 4,000 lines of C++; 4,000 lines of Ada
• restructuring tool: 30,000 lines of CLOS
• SPIN: 65,000 lines of Modula-3
• NetBSD: 250,000 lines of C
• industrial audibles: 6,000 lines of C++
• Excel: >1,000,000 lines of C


Excel: Experimental Reengineering

• A Microsoft engineer computed Reflexion Models several times a day for four weeks:
120,000 calls and global variable references
a map file with over 1000 entries
a high-level model with 15 entities and 96 interactions
4 minutes to compute on a 486

• Some lessons learned:
map files evolved to be larger than expected
scale places pressure on managing the information


Other Features...

• Family of reflexion model systems
• Parameterized by structural descriptions
• Incremental computation algorithms
• Typed model
• Tagging and annotations to manage investigation
• Used for a variety of tasks


A Typical Reengineering Scenario

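main() {
    for ( i=0; i< nfiles; i++ ) {
        char buf[8192]; FILE *fp; int cc;
        fp=fopen( files[i], "r" ); tmp=tempname(); ofp=xtmpfopen(tmp);

[Figure: a desired structure for Sort, with InputPipe and OutputPipe components, set beside the existing source, sort.c (1700 lines) and 29 other C files; the correspondence between them is marked with question marks.]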


Reengineering Scenario...

Procedure main:
main() {
    for ( i=0; i< nfiles; i++ ) {
        char buf[8192]; FILE *fp; int cc;
        fp=fopen( files[i], "r" );
        tmp=tempname();
        ofp=xtmpfopen(tmp);

    sort( files, nfiles, ofp );

Procedure sort:
sort( char **files, int nfiles, FILE *ofp ) {
    fp=xfopen( files, "r" );
    while ( fillbuf( &buf, fp ) ) {
        findlines( &buf, &lines );
        if ( feof(fp) && !nfiles ... ) tfp=ofp; else ++n_temp_files;

[Annotation: Input Pipe]


Program Database

• Identify variables of interest
• For each variable:
where is the variable declared?
where is the variable referenced?
• Collate results
• Repeat

[Chart: query results for sort, on a scale from 0 to 140.]

Using Field's xrefdb, 126 lines were returned, of which 30% were relevant.
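The program-database workflow amounts to repeated declaration and reference queries whose results are merged. The sketch below assumes an in-memory cross-reference table; the `Ref` records, the query methods, and the file and line numbers in the example are hypothetical, and this is not Field's xrefdb interface. It also suggests why answers grow quickly: every reference to a variable of interest is returned, whether or not it bears on the task.

```java
import java.util.*;

class XrefSketch {
    enum Kind { DECL, REF }
    record Ref(String var, String file, int line, Kind kind) {}
    record Loc(String file, int line) {}

    /** "Where is the variable declared?" (DECL) or "Where is it referenced?" (REF) */
    static List<Ref> query(List<Ref> db, String var, Kind kind) {
        List<Ref> hits = new ArrayList<>();
        for (Ref r : db) {
            if (r.var().equals(var) && r.kind() == kind) hits.add(r);
        }
        return hits;
    }

    /** Collates declarations and references of all variables of interest into distinct, ordered lines. */
    static SortedSet<Loc> collate(List<Ref> db, Set<String> vars) {
        Comparator<Loc> byPosition = Comparator.comparing(Loc::file).thenComparingInt(Loc::line);
        SortedSet<Loc> lines = new TreeSet<>(byPosition);
        for (Ref r : db) {
            if (vars.contains(r.var())) lines.add(new Loc(r.file(), r.line()));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical entries; a real database would be built by a cross-referencer.
        List<Ref> db = List.of(
            new Ref("fp", "sort.c", 12, Kind.DECL), new Ref("fp", "sort.c", 99, Kind.REF),
            new Ref("ofp", "sort.c", 13, Kind.DECL), new Ref("ofp", "sort.c", 140, Kind.REF));
        System.out.println(query(db, "fp", Kind.REF));
        // Repeat: widen the set of variables of interest and collate again.
        System.out.println(collate(db, Set.of("fp")));
        System.out.println(collate(db, Set.of("fp", "ofp")));
    }
}
```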


Slicer

[Figure: the same sort.c excerpt as on the Reengineering Scenario slide.]

• Compute backward slices on variables in pre-identified lines of code

Slices computed with Unravel were > 750 nodes in size and included irrelevant procedures.
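A backward slice collects every statement that the slicing criterion transitively depends on. The sketch below assumes the dependence graph (data and control dependences between numbered statements) has already been computed, which is the hard part a slicer such as Unravel performs; it is not Unravel's algorithm, and the names and statement numbers are hypothetical. The transitive closure also suggests why slices grow large: any statement feeding a widely shared value pulls its own dependences into the slice.

```java
import java.util.*;

class BackwardSlice {
    /**
     * deps.get(s) = statements that s directly depends on (data or control).
     * Returns every statement the criterion transitively depends on, including itself.
     */
    static Set<Integer> slice(Map<Integer, List<Integer>> deps, int criterion) {
        Set<Integer> seen = new TreeSet<>();
        Deque<Integer> work = new ArrayDeque<>(List.of(criterion));
        while (!work.isEmpty()) {
            int s = work.pop();
            if (!seen.add(s)) continue; // already in the slice; also guards against cycles
            for (int d : deps.getOrDefault(s, List.of())) work.push(d);
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical dependences: statement -> statements it depends on.
        Map<Integer, List<Integer>> deps = Map.of(
            5, List.of(3, 4), 4, List.of(1), 3, List.of(2), 6, List.of(1));
        System.out.println(slice(deps, 5)); // statement 6 is not in the slice
    }
}
```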


Type Inferencer

• Determine constraints on the representation of values
• Can be used to identify abstract data types, detect abstraction violations, find unused variables, and determine where there are possible references to a value

• The Lackwit [O’Callahan & Jackson 97] tool produces graphs summarizing how values are transmitted through a program

Lackwit graphs are often large and are reported in terms of the existing structure.
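Lackwit's analysis is considerably richer than this, but its core intuition can be sketched with union-find: if a value may flow between two variables, through assignment or parameter passing, the two must share a representation, so their classes are merged. This is a deliberately simplified, hypothetical sketch, not Lackwit's algorithm. It only groups variables. The resulting classes are what one would inspect as candidate abstract data types, and a class that merges variables a designer meant to keep apart would hint at an abstraction violation.

```java
import java.util.*;

class ValueFlowUnifier {
    private final Map<String, String> parent = new HashMap<>();

    /** Root of v's representation class, with path compression. */
    private String find(String v) {
        parent.putIfAbsent(v, v);
        String p = parent.get(v);
        if (!p.equals(v)) {
            p = find(p);
            parent.put(v, p);
        }
        return p;
    }

    /** Records that a value may flow between a and b (e.g. "a = b", or b passed as parameter a). */
    void flow(String a, String b) { parent.put(find(a), find(b)); }

    boolean sameRepresentation(String a, String b) { return find(a).equals(find(b)); }

    /** Groups every variable seen so far by its representation class. */
    Collection<Set<String>> classes() {
        Map<String, Set<String>> groups = new HashMap<>();
        for (String v : new ArrayList<>(parent.keySet())) {
            groups.computeIfAbsent(find(v), k -> new TreeSet<>()).add(v);
        }
        return groups.values();
    }

    public static void main(String[] args) {
        ValueFlowUnifier u = new ValueFlowUnifier();
        u.flow("fp", "ofp");  // hypothetical statement: fp = ofp;
        u.flow("ofp", "tfp"); // hypothetical statement: ofp = tfp;
        u.flow("cc", "n");    // hypothetical statement: cc = n;
        System.out.println(u.classes());
    }
}
```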


Software Reflexion Model

[Figure: reflexion model for sort over Filter, InputPipe, and OutputPipe, with arcs labeled 5, 2, 180, 0, and 4; legend: Calls, No type, Data.]

• Difficult to ascertain interface of the module

• No support for querying the source model

• Syntactic comparison


Conceptual Module Technique

[Diagram: a Source Model Extraction Tool produces the source model used by the Conceptual Module Tool, in which the engineer can Form a CM, run Interface Analysis, and Query, receiving Query Results.]


Forming a Conceptual Module

• Map lines of code to a logical module
• Two ways to map the code:
by specifying line numbers (individual, ranges, etc.)
by specifying pieces of existing logical structure (i.e., variables or procedures)
• Each module has a name
• Formation can be iterative

[Figure: the same sort.c excerpt as on the Reengineering Scenario slide.]

For sort, we ended up including about 24 lines in the input pipe conceptual module.
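Both ways of forming a module reduce to building a named set of lines. The sketch below assumes a single source file and a source model that maps each procedure or variable to the lines it occupies; the map passed to `addElement`, the line numbers, and all class and method names are hypothetical. Lines can be added individually, by range, or by element, and removed again as the module is refined.

```java
import java.util.*;

class ConceptualModule {
    final String name;
    final SortedSet<Integer> lines = new TreeSet<>(); // line numbers in one file, for simplicity

    ConceptualModule(String name) { this.name = name; }

    /** Map an individual line. */
    ConceptualModule addLine(int line) { lines.add(line); return this; }

    /** Map an inclusive range of lines. */
    ConceptualModule addRange(int first, int last) {
        for (int l = first; l <= last; l++) lines.add(l);
        return this;
    }

    /** Map every line of an existing element (a procedure's extent, a variable's occurrences, ...). */
    ConceptualModule addElement(Map<String, List<Integer>> sourceModel, String element) {
        lines.addAll(sourceModel.getOrDefault(element, List.of()));
        return this;
    }

    /** Formation is iterative, so lines can also be taken back out. */
    ConceptualModule removeRange(int first, int last) {
        lines.subSet(first, last + 1).clear();
        return this;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> procedureExtents = Map.of("fillbuf", List.of(300, 301, 302)); // hypothetical
        ConceptualModule inputPipe = new ConceptualModule("InputPipe")
            .addRange(10, 14).addLine(20).addElement(procedureExtents, "fillbuf")
            .removeRange(12, 13);
        System.out.println(inputPipe.name + " " + inputPipe.lines);
    }
}
```

Run as written, it prints `InputPipe [10, 11, 14, 20, 300, 301, 302]`.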


Interface Analysis

• Local (interface) analysis is used to summarize how the module interacts directly with the existing code

[Figure: the same sort.c excerpt as on the Reengineering Scenario slide.]

Input Variables: sortalloc, main.ofp, main.minus, etc.

Output Variables: main.mergeonly, sort.ofp, sortalloc, etc.

Local Variables: main.files, main.nfiles, sort.files

Control Transfers: xmalloc at sort.c 1796, fillbuf at sort.c 248, etc.


Interface Analysis...

• Interface analysis is straightforward. One twist is that the analysis is set up to be tolerant of the form of the source model.

• The source model consists of: a variable dependence relation, a control transfer relation, and a procedure start relation. The variable dependence relation may be either use-def pairs or uses & defs.

Two-phase analysis for local variables:
1. Use-def pairs: all uses & defs in the module imply a local variable.
2. Uses & defs: consider input/output; promote to local if all uses and defs are in the module.
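The two forms of variable dependence relation lead to two classification rules, sketched below under stated assumptions: the module is a set of line numbers in one file, and the record types, method names, and line numbers are hypothetical. With use-def pairs, a pair whose definition lies outside the module and whose use lies inside makes the variable an input, the reverse makes it an output, and a variable is local only if all of its pairs lie inside. With unpaired uses and defs, the analysis cannot tell which definition reaches which use, so it classifies conservatively and promotes a variable to local only when all of its uses and defs are in the module.

```java
import java.util.*;

class InterfaceAnalysis {
    record UseDef(String var, int defLine, int useLine) {}   // a paired definition and use
    record Occurrence(String var, int line, boolean isDef) {} // an unpaired use or def
    record Interface(Set<String> inputs, Set<String> outputs, Set<String> locals) {}

    /** Source model with use-def pairs: local only if every pair stays inside the module. */
    static Interface fromPairs(List<UseDef> pairs, Set<Integer> module) {
        Set<String> in = new TreeSet<>(), out = new TreeSet<>();
        Set<String> inside = new TreeSet<>(), notInside = new TreeSet<>();
        for (UseDef p : pairs) {
            boolean defIn = module.contains(p.defLine()), useIn = module.contains(p.useLine());
            if (!defIn && useIn) in.add(p.var());  // a value flows into the module
            if (defIn && !useIn) out.add(p.var()); // a value flows out of the module
            if (defIn && useIn) inside.add(p.var()); else notInside.add(p.var());
        }
        Set<String> locals = new TreeSet<>(inside);
        locals.removeAll(notInside);
        return new Interface(in, out, locals);
    }

    /** Source model with only uses and defs: classify input/output, then promote to local. */
    static Interface fromOccurrences(List<Occurrence> occs, Set<Integer> module) {
        Map<String, List<Occurrence>> byVar = new TreeMap<>();
        for (Occurrence o : occs) byVar.computeIfAbsent(o.var(), k -> new ArrayList<>()).add(o);
        Set<String> in = new TreeSet<>(), out = new TreeSet<>(), locals = new TreeSet<>();
        for (Map.Entry<String, List<Occurrence>> e : byVar.entrySet()) {
            boolean useIn = false, defIn = false, useOut = false, defOut = false;
            for (Occurrence o : e.getValue()) {
                boolean inMod = module.contains(o.line());
                if (o.isDef()) { if (inMod) defIn = true; else defOut = true; }
                else { if (inMod) useIn = true; else useOut = true; }
            }
            String v = e.getKey();
            if ((useIn || defIn) && !useOut && !defOut) { locals.add(v); continue; } // promote to local
            // Without pairs we cannot tell which def reaches which use, so this is conservative.
            if (useIn && defOut) in.add(v);
            if (defIn && useOut) out.add(v);
        }
        return new Interface(in, out, locals);
    }

    public static void main(String[] args) {
        Set<Integer> module = new HashSet<>();
        for (int line = 10; line <= 20; line++) module.add(line); // hypothetical module extent
        System.out.println(fromPairs(List.of(
            new UseDef("ofp", 5, 12), new UseDef("mergeonly", 15, 30), new UseDef("files", 11, 13)), module));
    }
}
```

In the example, ofp comes out as an input, mergeonly as an output, and files as a local, the same kinds of classification shown in the interface summary above, though the line numbers here are invented.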


Querying about Conceptual Modules

• Once one or more conceptual modules are formed, the reengineer typically needs to perform queries:
How do the Conceptual Modules relate to each other?
How do the Conceptual Modules relate to the existing source?

• The tool provides both pre-coded queries and a programmable interface through which a user can code queries.


Conceptual Module Relationships

[Figure: relationships between two conceptual modules A and B: direct def/use, indirect def/use, overlap, and contains.]
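These relationships can be phrased as set and graph queries over modules represented as sets of line numbers. Overlap and containment are plain set operations; direct def/use asks whether some definition in A has a use in B. The figure does not define indirect def/use, so the version below is one plausible reading and an assumption: a value defined in A reaches B only through intermediate statements, treating a statement that uses a value as passing it on. All names and line numbers are hypothetical.

```java
import java.util.*;

class ModuleRelations {
    record UseDef(int defLine, int useLine) {}

    static boolean overlap(Set<Integer> a, Set<Integer> b) { return !Collections.disjoint(a, b); }

    static boolean contains(Set<Integer> a, Set<Integer> b) { return a.containsAll(b); }

    /** Direct def/use: some value defined in A is used in B. */
    static boolean directDefUse(List<UseDef> pairs, Set<Integer> a, Set<Integer> b) {
        for (UseDef p : pairs) {
            if (a.contains(p.defLine()) && b.contains(p.useLine())) return true;
        }
        return false;
    }

    /** Indirect def/use (assumed reading): A reaches B only via statements outside A. */
    static boolean indirectDefUse(List<UseDef> pairs, Set<Integer> a, Set<Integer> b) {
        if (directDefUse(pairs, a, b)) return false;
        Map<Integer, List<Integer>> flow = new HashMap<>();
        for (UseDef p : pairs) flow.computeIfAbsent(p.defLine(), k -> new ArrayList<>()).add(p.useLine());
        Deque<Integer> work = new ArrayDeque<>(a);
        Set<Integer> seen = new HashSet<>(a);
        while (!work.isEmpty()) {
            for (int next : flow.getOrDefault(work.pop(), List.of())) {
                if (b.contains(next)) return true;
                if (seen.add(next)) work.push(next);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<Integer> a = Set.of(1, 2), b = Set.of(10), c = Set.of(5, 6);
        List<UseDef> pairs = List.of(new UseDef(2, 5), new UseDef(5, 10)); // hypothetical
        System.out.println("A-C direct: " + directDefUse(pairs, a, c));
        System.out.println("A-B indirect: " + indirectDefUse(pairs, a, b));
        System.out.println("A overlaps C: " + overlap(a, c));
    }
}
```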


Programmable Interface

// Find the definition points common to all conceptual modules.
// Get the use-def chains for all input and local variables of the first module.
Module first = (Module)Module.ModuleTable.elementAt(0);
SET common = DefUse.GetFullUseDefChain(first);

for (int i = 1; i < Module.ModuleTable.size(); i++) {
    // Get the use-def chains for the next module
    Module current = (Module)Module.ModuleTable.elementAt(i);
    SET curr_chain = DefUse.GetFullUseDefChain(current);
    // Intersect the chains to determine common definition points
    common = DefUse.INTERSECTION(common, curr_chain);
}

common.print();


Experience

Task | System | Size (loc)
Extract components | GNU sort | 5100
Restructure | adventure | 8000
C to C++ | GNU plot | 26,000
 | (compression) | 10,000
Restructure (subset) | CUDD | 47,000

Source models came from SUIF for four of the systems and from xrefdb for one.
SUIF = tools built on SUIF; provides use-def pairs
xrefdb = Field’s xrefdb; provides uses & defs


Query Context and Form

• Two parts to expressing context: identify the region of source over which to query; restrict the region for which results are reported

• The Conceptual Module identifies the region, and interface analysis summarizes local results

• Form includes both input and output: some tasks require queries over grouped items; report results in terms of the task

• Can use Conceptual Module structure to query against the source; results are reported in terms of the target structure


Partial and Approximate Techniques

• Each of these characteristics can be an effective way to attack scale.

• These characteristics can be combined to provide software engineers with a “smoother” means of managing source investigations.

The bottom line for most development efforts is that time is money.

[Chart: % of program considered versus time, for approximate and conservative techniques.]


Summary

Software Reflexion Model

“Definitely confirmed suspicions about the structure of Excel. Further, it allowed me to pinpoint the deviations. ... very easy to ignore stuff that is not interesting and thereby focus on the part of Excel I want to know more about.”

Microsoft Engineer

Conceptual Module

“not only did the tool verify the independent nature of the ZDD functionality and allow me to rip out all that code, but, the process of using your tool forced me to analyze and understand the code in a way that I had not been doing and that ultimately it very quickly gave me the perspective I needed.”

Yvonne Coady



Summary...

• demonstrated benefits of task-aware program understanding techniques: current techniques are structurally task-aware

• demonstrated a role for approximate information: the reflexion model technique makes the engineer responsible; conceptual modules take on some of the responsibility

• the goal is to get to “what-if” tools that would allow engineers to leverage, cost-effectively, connections between design and source