software archaeology

DESCRIPTION

As Java deployments have become more complex, it has become harder and harder to get good insight into the structure and execution flow of the application. This is particularly true where middleware, third-party components, or legacy software with little or no design documentation are involved. This session introduces you to software archaeology: how to build an understanding of the layout and execution of your Java application from the deployed application itself. Presented at JavaOne 2012.

TRANSCRIPT

Page 1: Software Archaeology

1

© 2012 IBM Corporation

Software Archaeology: Rediscovering Your Architecture

Chris Bailey – IBM Java Service Architect

2nd October 2012

Page 2: Software Archaeology

2 © 2012 IBM Corporation – Rediscovering Your Architecture: With Software Archaeology

Important Disclaimers

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.

WHILST EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.

ALL PERFORMANCE DATA INCLUDED IN THIS PRESENTATION HAVE BEEN GATHERED IN A CONTROLLED ENVIRONMENT. YOUR OWN TEST RESULTS MAY VARY BASED ON HARDWARE, SOFTWARE OR INFRASTRUCTURE DIFFERENCES.

ALL DATA INCLUDED IN THIS PRESENTATION ARE MEANT TO BE USED ONLY AS A GUIDE.

IN ADDITION, THE INFORMATION CONTAINED IN THIS PRESENTATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM, WITHOUT NOTICE.

IBM AND ITS AFFILIATED COMPANIES SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.

NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:

- CREATING ANY WARRANTY OR REPRESENTATION FROM IBM, ITS AFFILIATED COMPANIES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS

Page 3: Software Archaeology

Introduction to the speaker

■ 12 years' experience developing and deploying Java SDKs

■ Recent work focus:
– Java applications in the cloud
– Java usability and quality
– Debugging tools and capabilities
– Requirements gathering
– Highly resilient and scalable deployments

■ My contact information:
– [email protected]
– http://www.linkedin.com/profile/view?id=3100666

Page 4: Software Archaeology

Goals of the Talk

■ Introduce the concept of Software Archaeology and of closing the software lifecycle

■ Discuss some of the available methodologies

■ Show how to rediscover your architecture with a sample application

Page 5: Software Archaeology

Agenda

■ The Software Lifecycle

■ Software Archaeology

■ An Introduction to UML

■ Digging into applications

Page 6: Software Archaeology

The Software Lifecycle

“A software development process, also known as a software development life-cycle (SDLC), is a structure imposed on the development of a software product. Similar terms include software life cycle and software process. It is often considered a subset of systems development life cycle”- Wikipedia

■ Should be a closed-loop, iterative process

■ Design should relate to Requirements

■ Implementation should relate to Design

■ Even as the application evolves

Page 7: Software Archaeology

The Waterfall Model

■ It is not uncommon for applications to follow something closer to the waterfall model

■ Software lifecycle becomes linear...

■ Changes needed between Requirements and Design, between Design and Implementation, and as a result of maintenance become lost

■ The result is that deployments no longer reflect their original designs

Page 8: Software Archaeology

Legacy Systems and Code

■ Old applications or code that have been inherited

■ Typically in maintenance, but still providing a valuable function

■ However, it often presents challenges:
– Maintenance cost is usually high
• Outside of normal vendor support
– Little or no requirements and design documentation
• Potentially even without source code for some parts!

Page 9: Software Archaeology

Software Archaeology

“..the study of poorly documented or undocumented legacy software implementations, as part of software maintenance... includes the reverse engineering of software modules, and the application of a variety of tools and processes for extracting and understanding program structure and recovering design information... may reveal dysfunctional team processes which have produced poorly designed or even unused software modules.”- Wikipedia

■ Software archaeology makes it possible to reconstruct designs from legacy systems and code

Page 10: Software Archaeology

An Introduction to Unified Modelling Language (UML)

“...a standardized general-purpose modeling language in the field of object-oriented software engineering. The standard is managed, and was created, by the Object Management Group. ...offers a standard way to visualize a system's architectural blueprints”- Wikipedia

■ A unified standard modelling notation

■ Allows the creation of structure and design plans

■ UML provides two categories of diagram:
– Behaviour Diagrams: show the dynamic behaviour between the elements of the system
– Structural Diagrams: show the static structure of the system being modelled

Page 11: Software Archaeology

UML Behaviour Diagrams

■ Use-case Diagram
– Used to visualize the functional requirements of a system
– Shows the relationship of "actors" to essential processes
– Often used to show the relationship between use cases

Page 12: Software Archaeology

UML Behaviour Diagrams

■ Activity Diagram
– Shows flow of control during an activity
– Best used to model higher-level processes
– "Less technical" than sequence diagrams

Page 13: Software Archaeology

UML Behaviour Diagrams

■ Statechart Diagram
– Shows the different states that a class can be in
– Shows the transitions from state to state
– Usually only covers "interesting" classes
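A statechart maps fairly directly onto code when a class keeps its state in a field and only allows certain transitions. The sketch below is a hypothetical illustration (the `ScanState` enum, its three states and its transitions are assumptions, not taken from any application in this talk) of how the states and arrows of a statechart might appear in Java:

```java
// Hypothetical example: the states and transitions a statechart would show
// for an object that is created, scans its input once, and then completes.
enum ScanState {
    CREATED, SCANNING, COMPLETE;

    // Each state lists the states it may legally move to (the statechart arrows).
    boolean canMoveTo(ScanState next) {
        switch (this) {
            case CREATED:  return next == SCANNING;
            case SCANNING: return next == COMPLETE;
            default:       return false; // COMPLETE is a final state
        }
    }

    // Perform a transition, rejecting any arrow the statechart does not contain.
    ScanState moveTo(ScanState next) {
        if (!canMoveTo(next)) {
            throw new IllegalStateException(this + " -> " + next + " is not a valid transition");
        }
        return next;
    }
}
```

Reading code like this back into a statechart is the archaeology direction: each enum constant becomes a state, and each `return next == ...` becomes an arrow.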

Page 14: Software Archaeology

UML Behaviour Diagrams

■ Sequence Diagram
– Shows the interactions between objects in the sequential order that those interactions occur
– Usually applied to code objects, but can also be applied to business objects
– Can be used for communication between teams or organisations

Page 15: Software Archaeology

UML Structural Diagrams

■ Deployment Diagram
– Shows how a system will be physically deployed in the hardware environment
– Shows where the different entities of the system will physically run
– Shows how they will communicate with each other
– Models the physical deployment

Page 16: Software Archaeology

UML Structural Diagrams

■ Component Diagram
– Shows the dependencies that the software has on other software components
– Usually shown at a high level with large-grain components
– Or at the component package level

Page 17: Software Archaeology

UML Structural Diagrams

■ Class Diagrams
– Show the static structure of the system, generally implementation classes
– Show how the different entities relate to each other

■ Object Diagrams
– Show a complete or partial view of the structure at a specific time
– Focus on a set of object instances and attributes, and the links between the instances
– More concrete than class diagrams
– Often used to provide examples, or act as test cases for the class diagrams

Page 18: Software Archaeology

Software Archaeology: Methodologies

■ Various methods exist for Software Archaeology, including:

– Static analysis
• Analysis of source code by tooling or by hand
– Trace-based analysis
• Injecting trace into the running application
– Debugger-based analysis
• Stepping through code under a debugger
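As an illustration of trace-based analysis using only standard Java APIs (a swapped-in technique, not the JVM trace facility used later in this talk), the sketch below wraps an object in a `java.lang.reflect.Proxy` and records method entry (`>`) and exit (`<`) in call order. `TracingProxy` is a hypothetical name, and only calls made through the proxied interface are recorded; calls the target makes internally are not seen.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;

// Hypothetical helper: wraps a target object so that every call made through
// the given interface is logged on entry (">") and exit ("<").
final class TracingProxy {
    private TracingProxy() {}

    static <T> T wrap(Class<T> iface, T target, List<String> log) {
        Object proxy = Proxy.newProxyInstance(
                TracingProxy.class.getClassLoader(),
                new Class<?>[] { iface },
                (p, method, args) -> {
                    String name = iface.getSimpleName() + "." + method.getName();
                    log.add("> " + name);
                    try {
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // rethrow the target's own exception
                    } finally {
                        log.add("< " + name); // exit is logged even on exceptions
                    }
                });
        return iface.cast(proxy);
    }
}
```

For example, wrapping a `String` as a `CharSequence` and calling `length()` then `charAt(0)` records four events in order: entry and exit for each call.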

Page 19: Software Archaeology

Digging into a Simple Application

■ A simple application: “JavaGrep”

■ Usage: JavaGrep <pattern> <list of files>

■ Where:
– Pattern is a search term
– List of files is one or more files to search

Page 20: Software Archaeology

Digging into a Simple Application

■ JavaGrep.java:
– Compile regexp
– Add files to fileList
– For each file:
• Create FileScanner
• scan()
• Get matching lines
• Print matching lines
– Print summary

Page 21: Software Archaeology

Digging into a Simple Application

■ FileScanner.java:
– Create LineNumberReader
– For each line:
• Get line
• Check for match
• Add to matchLines

Page 22: Software Archaeology

Digging into a Simple Application

Run regexp matcher against line

Get ArrayList of matching lines

Get scanned line count

Get matched line count
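The flow on these three slides can be sketched in Java. This is a hypothetical reconstruction, not the original source: the class, method and field names come from the method trace and system dumps shown later in the talk (`FileScanner(String, Pattern)`, `scan()`, `scanLine(String)`, `getMatchLines()`, `getMatchCount()`, `getLineCount()`, `matchString`, `fileList`, `_pattern`, `_matchLines` and so on), while the method bodies, the output format, and the wrapping of the search term in `\s` (inferred from the dump showing `"for"` compiled as `\sfor\s`) are assumptions. The calls in `main()` are arranged to happen in the same order as in the method trace.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.util.ArrayList;
import java.util.regex.Pattern;

// Hypothetical reconstruction of the JavaGrep sample. Names come from the
// trace and system dumps in this talk; the method bodies are assumptions.
class JavaGrep {
    static String matchString;
    static ArrayList<String> fileList = new ArrayList<>();
    static Pattern _pattern;

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: JavaGrep <pattern> <list of files>");
            return;
        }
        // Compile regexp (assumption: the term must be surrounded by whitespace,
        // which is how "for" appears as \sfor\s in the system dump)
        matchString = args[0];
        _pattern = Pattern.compile("\\s" + matchString + "\\s");

        // Add files to fileList
        for (int i = 1; i < args.length; i++) {
            fileList.add(args[i]);
        }

        int totalMatches = 0;
        int totalLines = 0;
        for (String file : fileList) {
            // Create FileScanner, scan(), then get and print matching lines
            FileScanner scanner = new FileScanner(file, _pattern);
            scanner.scan();
            for (String line : scanner.getMatchLines()) {
                System.out.println(file + ": " + line);
            }
            System.out.println(file + ": " + scanner.getMatchCount() + " matching lines");
            totalMatches += scanner.getMatchCount();
            totalLines += scanner.getLineCount();
        }
        // Print summary
        System.out.println(totalMatches + " matching lines in " + totalLines + " lines scanned");
    }
}

class FileScanner {
    private final String _fileName;
    private final Pattern _pattern;
    private LineNumberReader _fReader;
    private final ArrayList<String> _matchLines = new ArrayList<>();
    private int _matchCount;
    private int _lineCount;

    FileScanner(String fileName, Pattern pattern) {
        _fileName = fileName;
        _pattern = pattern;
    }

    // Create LineNumberReader, then check each line for a match
    int scan() throws IOException {
        _fReader = new LineNumberReader(new FileReader(_fileName));
        try {
            String line;
            while ((line = _fReader.readLine()) != null) {
                _lineCount++;
                scanLine(line);
            }
        } finally {
            _fReader.close();
        }
        return _matchCount;
    }

    // Run regexp matcher against line; add matching lines to _matchLines
    boolean scanLine(String line) {
        if (_pattern.matcher(line).find()) {
            _matchLines.add(line);
            _matchCount++;
            return true;
        }
        return false;
    }

    ArrayList<String> getMatchLines() { return _matchLines; }
    int getMatchCount() { return _matchCount; }
    int getLineCount() { return _lineCount; }
}
```

Run as, for example, `java JavaGrep for ../src/JavaGrep.java`, this sketch would print each line containing a whitespace-delimited `for`, followed by per-file and total counts.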

Page 23: Software Archaeology

Understanding the code behaviour

■ Dynamic behaviour at code level is the method call graph
– Interaction between classes and objects
– Sequence of methods called and returned

■ Best fit in UML is the sequence diagram

Page 24: Software Archaeology

Generating a sequence diagram

■ Static Analysis
– Limited tooling exists
– Really needs to be done manually

■ Runtime Analysis
– Activate tracing mechanism(s)
– Run the application:
• Code coverage testing
• QA or Performance Testing
• Production!

Page 25: Software Archaeology

Generated Sequence Diagram: Static Analysis

■ Static Analysis Results:
– Requires reading the code!
– Accurate diagram
– Limited to available source
• JavaGrep
• FileScanner
– Possible to include other calls
• Limited to a single extra call

Page 26: Software Archaeology

Runtime Analysis: Profiling Output

■ Profiling output:
– Full graph of all running methods
– Access to call frequency/cost information
– Requires methods to be run...
– No call ordering or count data

Page 27: Software Archaeology

Runtime Analysis: Profiling Output

■ Digging into FileScanner.scan():
– FileScanner.scanLine()
– LineNumberReader.readLine()
– StringBuilder.append()
– ArrayList.<init>()
– StringBuilder.append()
– ArrayList.add()
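Profilers that build this kind of graph often work by periodically sampling thread stacks and counting caller-to-callee edges. The sketch below is an assumed, simplified version of that approach (`CallGraphSampler` is a hypothetical name, and this is not how any particular profiler is implemented): each sample is a `StackTraceElement[]`, such as one returned by `Thread.getStackTrace()`, and every adjacent pair of frames becomes a weighted edge. It also shows why such output carries frequency and cost information but no call ordering or exact call counts: a sample records only what was on the stack at that instant.

```java
import java.util.Map;
import java.util.TreeMap;

// Assumed, simplified sampling-profiler aggregation: turns stack samples into
// caller -> callee edges, weighted by how many samples contained that edge.
class CallGraphSampler {
    final Map<String, Integer> edgeCounts = new TreeMap<>();

    // Record one stack sample; element 0 is the innermost (currently running) frame.
    void addSample(StackTraceElement[] stack) {
        for (int i = stack.length - 1; i > 0; i--) {
            String caller = frameName(stack[i]);
            String callee = frameName(stack[i - 1]);
            edgeCounts.merge(caller + " -> " + callee, 1, Integer::sum);
        }
    }

    // Take a live sample of a thread's current stack.
    void sample(Thread thread) {
        addSample(thread.getStackTrace());
    }

    private static String frameName(StackTraceElement frame) {
        return frame.getClassName() + "." + frame.getMethodName();
    }
}
```

Two samples taken while `scan()` is running, one of them inside `scanLine()`, would give `JavaGrep.main -> FileScanner.scan` a weight of 2 and `FileScanner.scan -> FileScanner.scanLine` a weight of 1, with nothing recording which call came first.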

Page 28: Software Archaeology

Generated Sequence Diagram: Runtime Analysis

■ Profiling Analysis:
– Scope to cover all code
– No ordering information
– No call count information

Page 29: Software Archaeology

Runtime Analysis: Limited tracing to add order and count information

14:38:09.260*0x6db300 mt.3 > JavaGrep.<clinit>()V Bytecode static method
14:38:09.260 0x6db300 mt.9 < JavaGrep.<clinit>()V
14:38:09.260 0x6db300 mt.3 > JavaGrep.main([Ljava/lang/String;)V
14:38:09.281 0x6db300 mt.0 > FileScanner.<init>(Ljava/lang/String;Ljava/util/regex/Pattern;)V
14:38:09.283 0x6db300 mt.6 < FileScanner.<init>(Ljava/lang/String;Ljava/util/regex/Pattern;)V
14:38:09.283 0x6db300 mt.0 > FileScanner.scan()I
14:38:09.292 0x6db300 mt.0 > FileScanner.scanLine(Ljava/lang/String;)
............ ........ .... .....
14:38:09.296 0x6db300 mt.6 < FileScanner.scanLine(Ljava/lang/String;)Z
14:38:09.296 0x6db300 mt.6 < FileScanner.scan()I
14:38:09.296 0x6db300 mt.0 > FileScanner.getMatchLines()Ljava/util/ArrayList;
14:38:09.296 0x6db300 mt.6 < FileScanner.getMatchLines()Ljava/util/ArrayList;
14:38:09.296 0x6db300 mt.0 > FileScanner.getMatchCount()I
14:38:09.296 0x6db300 mt.6 < FileScanner.getMatchCount()I
14:38:09.297 0x6db300 mt.0 > FileScanner.getMatchCount()I
14:38:09.297 0x6db300 mt.6 < FileScanner.getMatchCount()I
14:38:09.297 0x6db300 mt.0 > FileScanner.getLineCount()I
14:38:09.297 0x6db300 mt.6 < FileScanner.getLineCount()I
14:38:09.297 0x6db300 mt.9 < JavaGrep.main([Ljava/lang/String;)V

■ Order is:

scan() -> getMatchLines() -> getMatchCount() -> getMatchCount() -> getLineCount()
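Recovering order and count from a trace like this is mechanical. The sketch below is an assumption about how it could be automated, not a description of any IBM tool: it reads trace lines in which `>` marks a method entry and `<` a method exit (as in the output above), tracks nesting depth, and records each call in order together with per-method call counts. `TraceSequence`, `Call` and `callsAtDepth` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: rebuilds call order, nesting depth and call counts
// from entry (">") / exit ("<") method-trace lines.
class TraceSequence {
    static final class Call {
        final int depth;
        final String method;
        Call(int depth, String method) { this.depth = depth; this.method = method; }
    }

    final List<Call> calls = new ArrayList<>();                // in call order
    final Map<String, Integer> counts = new LinkedHashMap<>(); // entries per method

    TraceSequence(List<String> traceLines) {
        int depth = 0;
        for (String line : traceLines) {
            String[] tokens = line.trim().split("\\s+");
            for (int i = 0; i + 1 < tokens.length; i++) {
                if (tokens[i].equals(">")) {           // method entry
                    String method = stripSignature(tokens[i + 1]);
                    calls.add(new Call(depth, method));
                    counts.merge(method, 1, Integer::sum);
                    depth++;
                    break;
                }
                if (tokens[i].equals("<")) {           // method exit
                    depth--;
                    break;
                }
            }
        }
    }

    // "FileScanner.scan()I" -> "FileScanner.scan"
    static String stripSignature(String token) {
        int paren = token.indexOf('(');
        return paren < 0 ? token : token.substring(0, paren);
    }

    // Methods entered at the given nesting depth, in the order they were called.
    List<String> callsAtDepth(int depth) {
        List<String> result = new ArrayList<>();
        for (Call call : calls) {
            if (call.depth == depth) {
                result.add(call.method);
            }
        }
        return result;
    }
}
```

For the trace above, assuming the elided lines are balanced entry/exit pairs inside `scan()`, `callsAtDepth(1)` lists the calls made from `main()`. Dropping `FileScanner.<init>` from that list gives the order shown on the slide, and `counts` records `getMatchCount()` as called twice.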

Page 30: Software Archaeology

Generated Sequence Diagram: Runtime Analysis

■ Runtime Analysis:
– Scope to cover all code
– Ordering information included
– Call count information included
– Only covers executed code*

Page 31: Software Archaeology

Understanding the structure

■ Structure at code level is the object graph
– Interaction between classes and objects
– References between objects on the Java heap

■ Best fit in UML is the class and/or object diagram

Page 32: Software Archaeology

Generating a Class or Object Diagram

■ Static Analysis
– Provide tool with access to necessary source
– Run analysis!

■ Runtime Analysis
– Run the application:
• Code coverage testing
• QA or Performance Testing
• Production!
– Generate a system or heap dump
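The talk works from IBM system dumps. On HotSpot-based JVMs (for example OpenJDK), a comparable heap dump can be requested from inside the application through the `com.sun.management.HotSpotDiagnosticMXBean` platform bean. This is a swapped-in, JVM-specific alternative rather than the method used in the talk, and the `HeapDumper` class name is hypothetical. The resulting `.hprof` file can then be opened in a heap analysis tool to browse objects and the references between them.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper for HotSpot-based JVMs: writes a heap dump of the
// running JVM. It does not produce the IBM system dumps used in the talk.
final class HeapDumper {
    private HeapDumper() {}

    // outputFile must not already exist; recent JDKs also require a ".hprof" suffix.
    // live == true restricts the dump to objects that are still reachable.
    static void dump(String outputFile, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(outputFile, live);
    }
}
```

Calling `HeapDumper.dump("app.hprof", true)` at the point of interest (for example at the end of a QA run) captures the object graph at that moment, which is the runtime input needed for a class or object diagram.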

Page 33: Software Archaeology

Generated Class Diagram: Static Analysis

■ Static Analysis Results:
– Accurate diagram
– Limited to available source

Page 34: Software Archaeology

Runtime Analysis: System Dump Information

■ JavaGrep:
– Static attributes:
• matchString java.lang.String “for”
• fileList java.util.ArrayList size is 1, one entry: “..\src\JavaGrep”
• _pattern java.util.regex.Pattern “\sfor\s” compiled = true

Page 35: Software Archaeology

Runtime Analysis: System Dump Information

■ fileList java.util.ArrayList:
– Instance attributes:
• elementData java.lang.Object[10] single entry of “..\src\JavaGrep.java”
• size int 1
• modCount int 1

Page 36: Software Archaeology

Runtime Analysis: System Dump Information

■ FileScanner:
– Object instance attributes:
• _fileName java.lang.String “..\src\JavaGrep.java”
• _fReader java.io.LineNumberReader object @ 0x228165e0
• _pattern java.util.regex.Pattern “\sfor\s” compiled = true
• _matchLines java.util.ArrayList object @ 0x22864410
• _matchCount int 4
• _lineCount int 41
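The attribute listings above come from a system dump viewed in a dump analysis tool. For application classes, a similar view of a single live object can be approximated from inside the JVM with standard reflection. The sketch below is an assumption for illustration (`FieldDumper` is a hypothetical name, and this is not how dump tooling works): it lists each instance field with its type and value, in the same spirit as the FileScanner listing. On recent JDKs, `trySetAccessible()` may return false for private fields of JDK classes such as `ArrayList`, so those are reported as inaccessible instead of failing.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists an object's instance fields as "name type value",
// similar in spirit to the attribute listings produced by dump analysis tools.
final class FieldDumper {
    private FieldDumper() {}

    static List<String> dump(Object obj) {
        List<String> lines = new ArrayList<>();
        // Walk up the class hierarchy so inherited fields are included.
        for (Class<?> c = obj.getClass(); c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers())) {
                    continue; // instance attributes only
                }
                String value = "<inaccessible>"; // e.g. JDK internals on recent JVMs
                if (f.trySetAccessible()) {
                    try {
                        value = String.valueOf(f.get(obj));
                    } catch (IllegalAccessException e) {
                        // leave value as "<inaccessible>"
                    }
                }
                lines.add(f.getName() + " " + f.getType().getName() + " " + value);
            }
        }
        return lines;
    }
}
```

Unlike a system dump, this only shows one object at a time and only while the application is running, so it is a complement to dump analysis rather than a replacement.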

Page 37: Software Archaeology

Generated Class Diagram: Runtime Analysis

■ Runtime Analysis Results:
– The same accurate diagram
– Methods could be added from the sequence diagram
– Scope to add further classes/objects

Page 38: Software Archaeology

Digging into Something More Complex...

■ Apache Tomcat 7.0.27

■ org.apache.catalina.startup.Bootstrap.main():
– init()
– load()
– start()

■ Uses reflection to call:
– org.apache.catalina.startup.Catalina.*
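Calls made through reflection are one reason static analysis struggles with code like this: the call target is only a string until run time, so a tool reading the calling class sees no direct reference to the target class. A minimal sketch of the pattern, assuming a hypothetical `ServerLifecycle` class standing in for `Catalina` (this is not Tomcat's code, and `ReflectiveLauncher` is also a hypothetical name):

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a class that is only ever called reflectively.
class ServerLifecycle {
    static final List<String> calls = new ArrayList<>();

    public ServerLifecycle() {}
    public void init()  { calls.add("init"); }
    public void load()  { calls.add("load"); }
    public void start() { calls.add("start"); }
}

final class ReflectiveLauncher {
    private ReflectiveLauncher() {}

    // The class and method names are plain strings, so a static call graph of
    // this method shows only calls into java.lang.reflect, not into the target.
    static Object launch(String className, String... methodNames) throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(className);
        Object instance = clazz.getConstructor().newInstance();
        for (String name : methodNames) {
            Method method = clazz.getMethod(name);
            method.invoke(instance);
        }
        return instance;
    }
}
```

A runtime trace of `launch(...)` would show the `init()`, `load()` and `start()` calls on the target, which is why runtime analysis recovers the sequence that static analysis misses here.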

Page 39: Software Archaeology

Digging into Something More Complex...

Page 40: Software Archaeology

Digging into Something More Complex...

■ Digging into “server”:

Page 41: Software Archaeology

Summary

■ It's possible to use static and/or runtime tooling to understand an application

■ Allows you to close the development lifecycle
– Even for legacy and 3rd party code

■ Allows you to migrate legacy systems where the original requirements are lost
– The definition of the existing system becomes the source of requirements for the new implementation

■ Allows you to debug problems
– The difference between design and implementation may well be your issue

Page 42: Software Archaeology

References

■ Get Products and Technologies:
– IBM Monitoring and Diagnostic Tools for Java:
• https://www.ibm.com/developerworks/java/jdk/tools/

■ Learn:
– Health Center InfoCenter:
• http://publib.boulder.ibm.com/infocenter/hctool/v1r0/index.jsp

■ Discuss:
– IBM on Troubleshooting Java Applications Blog:
• https://www.ibm.com/developerworks/mydeveloperworks/blogs/troubleshootingjava/
– Health Center Forum:
• http://www.ibm.com/developerworks/forums/forum.jspa?forumID=1461
– IBM Java Runtimes and SDKs Forum:
• http://www.ibm.com/developerworks/forums/forum.jspa?forumID=367&start=0

Page 43: Software Archaeology

Copyright and Trademarks

© IBM Corporation 2012. All Rights Reserved.

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., and registered in many jurisdictions worldwide.

Other product and service names might be trademarks of IBM or other companies.

A current list of IBM trademarks is available on the Web – see the IBM “Copyright and trademark information” page at URL: www.ibm.com/legal/copytrade.shtml