Improving Software Maintenance using Unsupervised Machine Learning techniques

Post on 05-Dec-2014

DESCRIPTION

"Improving Software Maintenance using Unsupervised Machine Learning techniques": Ph.D. defence presentation. Unsupervised machine learning techniques are applied to different software maintenance problems, such as software modularisation and clone detection.

TRANSCRIPT

<ul><li> 1. Ph.D. Programme in Computational and Computer Sciences (Dottorato in Scienze Computazionali e Informatiche), XXV Cycle. Ph.D. Candidate: Valerio Maggio. Thesis Advisors: Dr. Sergio Di Martino, Dr. Anna Corazza. June 5th, 2013. UNSUPERVISED MACHINE LEARNING FOR SOFTWARE MAINTENANCE </li>
<li> 2. THESIS GOAL: UNSUPERVISED MACHINE LEARNING FOR SOFTWARE MAINTENANCE. Define and experimentally evaluate solutions (techniques and prototype tools) for automatic software analysis to support software maintenance activities. These solutions exploit (unsupervised) machine learning techniques to mine information from the source code. </li>
<li> 3. SOFTWARE MAINTENANCE: A software system must be continually adapted during its overall life cycle or it progressively becomes less satisfactory (cit. Lehman's First Law of Software Evolution). There exist different types of Software Maintenance. </li>
<li> 4. SOFTWARE MAINTENANCE: A software system must be continually adapted during its overall life cycle or it progressively becomes less satisfactory (cit. Lehman's First Law of Software Evolution). Software Maintenance represents the most expensive, time-consuming and challenging phase of the whole development process. </li>
<li> 5. SOFTWARE MAINTENANCE: A software system must be continually adapted during its overall life cycle or it progressively becomes less satisfactory (cit. Lehman's First Law of Software Evolution). Software Maintenance represents the most expensive, time-consuming and challenging phase of the whole development process. Software Maintenance can account for up to 85-90% of the total software costs. </li>
<li> 6. SOFTWARE MAINTENANCE ISSUES: "Software Maintenance is about change!" (cit. S. Jarzabek). Change Analysis </li>
<li> 7. SOFTWARE MAINTENANCE ISSUES: "Software Maintenance is about change!" (cit. S. Jarzabek). Change Analysis. Program Comprehension: the documentation is usually scarce or not up to date! </li>
<li> 8. 
SOFTWARE MAINTENANCE ISSUES: "Software Maintenance is about change!" (cit. S. Jarzabek). Change Analysis. Program Comprehension: the documentation is usually scarce or not up to date! The source code (usually) represents the most reliable source of information about the system. </li>
<li> 9. SOFTWARE MAINTENANCE ISSUES: "Software Maintenance is about change!" (cit. S. Jarzabek). Change Analysis. Program Comprehension: the documentation is usually scarce or not up to date! The source code (usually) represents the most reliable source of information about the system. Reverse Engineering </li>
<li> 10. REVERSE ENGINEERING: Definition of tools and techniques to support maintenance activities. Goal: to aid the comprehension of the system. Goal: build higher-level software models in an automatic fashion, gathering information from the source code or any other document. </li>
<li> 11. REVERSE ENGINEERING: Definition of tools and techniques to support maintenance activities. Goal: to aid the comprehension of the system. Goal: build higher-level software models in an automatic fashion, gathering information from the source code or any other document. STATIC ANALYSIS, DYNAMIC ANALYSIS </li>
<li> 13. MACHINE LEARNING: Provides computationally effective solutions to analyze large data sets. </li>
<li> 14. MACHINE LEARNING: Provides computationally effective solutions to analyze large data sets. Provides solutions that can be tailored to different tasks/domains. </li>
<li> 15. 
MACHINE LEARNING: Provides computationally effective solutions to analyze large data sets. Provides solutions that can be tailored to different tasks/domains. Requires considerable effort in: the definition of the relevant information best suited for the specific task/domain. </li>
<li> 16. MACHINE LEARNING: Provides computationally effective solutions to analyze large data sets. Provides solutions that can be tailored to different tasks/domains. Requires considerable effort in: the definition of the relevant information best suited for the specific task/domain; the application of the learning algorithms to the considered data. </li>
<li> 17. UNSUPERVISED LEARNING. Supervised Learning: learn from labelled samples. Unsupervised Learning: learn (directly) from the data. Learn by examples. </li>
<li> 19. UNSUPERVISED LEARNING. Supervised Learning: learn from labelled samples. Unsupervised Learning: learn (directly) from the data. Learn by examples. (+) No cost of labeling samples. (-) Trade-off imposed on the quality of the data. </li>
<li> 20. THESIS CONTRIBUTIONS </li>
<li> 21. THESIS CONTRIBUTIONS: Contributions to three relevant and related open issues in Software Maintenance. </li>
<li> 22. THESIS CONTRIBUTIONS: Contributions to three relevant and related open issues in Software Maintenance: (1) SOFTWARE RE-MODULARIZATION, (2) SOURCE CODE NORMALIZATION, (3) CLONE DETECTION. </li>
<li> 23. ISSUE #1: SOFTWARE RE-MODULARIZATION </li>
<li> 24. 
SOFTWARE RE-MODULARIZATION: PROBLEM STATEMENT. Re-modularization provides a way to support software maintainers by automatically grouping together (clustering) related software classes. Software Classes: UI Process Components, UI Components, Data Access Components, Data Helpers / Utilities, Security, Operational Management, Communications, Business Components, Application Facade, Business Workflows, Messages Interfaces, Service Interfaces. </li>
<li> 25. SOFTWARE RE-MODULARIZATION: PROBLEM STATEMENT. Re-modularization provides a way to support software maintainers by automatically grouping together (clustering) related software classes. Clusters of Software Classes: External Systems, Service Consumers. Services: Service Interfaces, Messages Interfaces. Cross Cutting: Security, Operational Management, Communications. Data: Data Access Components, Data Helpers / Utilities. Presentation: UI Components, UI Process Components. Business: Application Facade, Business Workflows, Business Components. </li>
<li> 26. SOURCE CODE LEXICAL INFORMATION </li>
<li> 28. SOURCE CODE LEXICAL INFORMATION (IDEA): Exploit the lexical information gathered from the source code to produce clusters of classes that are lexically related. </li>
<li> 29. SOURCE CODE LEXICAL INFORMATION (IDEA): Exploit the lexical information gathered from the source code to produce clusters of classes that are lexically related. State of the Art: Information Retrieval (IR) based approaches. </li>
<li> 30. IR INDEXING </li>
<li> 31. IR INDEXING. Implicit assumption: the same words are used whenever a particular concept occurs. </li>
<li> 32. IR INDEXING. Implicit assumption: the same words are used whenever a particular concept occurs. 1. Tokenization: Draws, the, are, NullHandle, box, r, Rectangle, g, Graphics, box, displayBox, ... 2. Normalization: draw, the, are, null, handl, box, r, rectangl, g, graphic, box, display, box, ... </li>
<li> 33. SOURCE CODE ZONES: LEXICAL INFORMATION </li>
<li> 34. 
SOURCE CODE ZONES: LEXICAL INFORMATION. Class Names </li>
<li> 35. SOURCE CODE ZONES: LEXICAL INFORMATION. Class Names, Attribute Names </li>
<li> 36. SOURCE CODE ZONES: LEXICAL INFORMATION. Class Names, Attribute Names, Method Names </li>
<li> 37. SOURCE CODE ZONES: LEXICAL INFORMATION. Class Names, Attribute Names, Method Names, Parameter Names </li>
<li> 38. SOURCE CODE ZONES: LEXICAL INFORMATION. Class Names, Attribute Names, Method Names, Parameter Names, Comments </li>
<li> 39. SOURCE CODE ZONES: LEXICAL INFORMATION. Class Names, Attribute Names, Method Names, Parameter Names, Comments, Source Code </li>
<li> 40. ZONE INDEXING </li>
<li> 41. ZONE INDEXING. RQ1: Do terms in different Zones provide different contributions? </li>
<li> 42. ZONE INDEXING. RQ1: Do terms in different Zones provide different contributions? 1. Tokenization: Draws, the, are, NullHandle, box, r, draw, g, Graphics, color, displayBox, ... 2. Normalization: draw, the, are, null, handl, box, r, draw, g, graphic, color, display, box, ... </li>
<li> 43. SOURCE CODE LEXICON: JFreeChart, JEdit, JUnit, Xerces </li>
<li> 44. SOURCE CODE LEXICON: JFreeChart. Good lexicon in every Zone! </li>
<li> 45. SOURCE CODE LEXICON: JEdit. Very good in Comments. Very poor in Method and Parameter names. </li>
<li> 46. SOURCE CODE LEXICON: JUnit. Good in Class and Attribute names. No Comments at all. </li>
<li> 47. SOURCE CODE LEXICON: Xerces. Poor in Method, Parameter names and Co...</li></ul>
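
The tokenization and normalization steps named on the IR indexing and zone indexing slides, together with the idea of weighting terms by the zone they come from, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the zone names and the values in `ZONE_WEIGHTS` are hypothetical placeholders, `crude_stem` is a toy plural stripper standing in for a real stemmer (so it will not reproduce stems such as "handl" or "rectangl" from the slides), and cosine similarity is used only as one plausible way to compare two classes lexically before clustering.

```python
import math
import re
from collections import Counter

# Hypothetical zone weights, chosen only for illustration.
# Whether zones should be weighted differently is exactly what RQ1 asks.
ZONE_WEIGHTS = {
    "class_name": 3.0,
    "attribute_names": 2.0,
    "method_names": 2.0,
    "parameter_names": 1.0,
    "comments": 1.0,
    "source_code": 0.5,
}


def tokenize(text):
    """Step 1 (tokenization): extract alphabetic tokens from raw text."""
    return re.findall(r"[A-Za-z]+", text)


def split_camel_case(token):
    """Split compound identifiers, e.g. 'displayBox' -> ['display', 'Box']."""
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", token)


def crude_stem(word):
    """Toy stand-in for a real stemmer: strips a trailing plural 's' only."""
    if len(word) > 4 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize(tokens):
    """Step 2 (normalization): split identifiers, lowercase, crude stemming."""
    terms = []
    for token in tokens:
        for part in split_camel_case(token):
            terms.append(crude_stem(part.lower()))
    return terms


def zone_vector(zones, weights=ZONE_WEIGHTS):
    """Build a weighted term vector from a {zone name: text} mapping."""
    vector = Counter()
    for zone, text in zones.items():
        weight = weights.get(zone, 1.0)  # unknown zones get a neutral weight
        for term in normalize(tokenize(text)):
            vector[term] += weight
    return vector


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    # Two made-up classes described by the text found in each zone.
    handle = zone_vector({"class_name": "NullHandle",
                          "comments": "Draws the handle box"})
    figure = zone_vector({"class_name": "RectangleFigure",
                          "method_names": "displayBox drawFrame"})
    print(sorted(handle.items()))
    print(round(cosine(handle, figure), 3))
```

A pairwise similarity of this kind could then be handed to any clustering algorithm to group lexically related classes; the choice of algorithm and of the actual zone weights is left open here.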