semiautomatic reverse engineering tool on oracle forms...

Semiautomatic Reverse Engineering Tool on OracleForms Information Systems

Oscar Javier Chaparro ArenasCode: 02300403

Universidad Nacional de ColombiaFacultad de Ingenierıa

Departamento de Ingenierıa de Sistemas e IndustrialBogota, D.C.

December 2012

Semiautomatic Reverse Engineering Tool on OracleForms Information Systems

Oscar Javier Chaparro ArenasCode: 02300403

A thesis submitted in partial fulfillment of the

requirements for the degree ofMaster in Engineering - Systems and Computing

AdvisorJairo Aponte, Ph.D

Research lineSoftware Engineering

Research groupColSWE: Software Engineering Research Group

Universidad Nacional de ColombiaFacultad de Ingenierıa

Departamento de Ingenierıa de Sistemas e IndustrialBogota, D.C.

December 2012

3

Title in English

Semiautomatic Reverse Engineering Tool on Oracle Forms Information Systems

Tıtulo en espanol

Herramienta de ingenierıa inversa semiautomatica sobre sistemas de informacion legadosen Oracle Forms.

Abstract: Legacy information systems are systems that have had a long evolution, longerthan the typical turnaround time of the developers in the company. They are essential tothe business and encode large amounts of essential information related to the businessprocesses. However, continuous changes in the domain of some systems, the lack of strictmaintenance processes, and the turnaround of developers inevitably leads to gradual lossof knowledge about the system and its domain, most often corroborated by the fact thatexternal documentation is rarely updated in synch with the code and other artifacts. SIFI,a financial legacy information system owned by the software development company ITConsultores S.A.S., presents such a maintenance issues: high coupling, architecture decay,no formal documentation and loss of domain and implementation knowledge, making itsevolution very difficult. Within this thesis, a reverse engineering tool for Oracle Forms andPL/SQL information systems was built, aiming at supporting the maintenance processon SIFI. The tool is able to extract and visualize structural and behavioral informationabout the system, and implements the approach we proposed for automatically extractingstructural business rules from legacy databases. The effectiveness of the tool was assessedunder the understanding and maintenance of SIFI, through a survey. The results showthat tool is very useful, as it improves the productivity of developers to complete theirtasks and the maintenance process of SIFI is now easier for them. In addition, theimplemented business rule extraction approach was assessed though a study with 4 ITCemployees. The results show that the recovery technique is practical, while there isroom for improvement, and it will be used as basis for the recovery of additional knowledge.

Resumen: Los sistemas legados son sistemas que han tenido una larga evolucion,mas larga que el tiempo tıpico de los desarrolladores en una empresa. Estos sistemasson esenciales para el negocio y contienen grandes cantidades de informacion sobre losprocesos de negocio. Sin embargo, los cambios continuos en el dominio de algunossistemas, la falta de procesos estrictos de mantenimiento, y los cambios de desarrolladoresconducen de manera inevitable a perdidas graduales de conocimiento del sistema y sudominio, lo cual es corroborado por el hecho de que la documentacion externa es rara vezactualizada, de acuerdo con el codigo y otros artefactos. SIFI, un sistema de informacionlegado desarrollado y mantenido por la empresa de desarrollo de software IT ConsultoresS.A.S, presenta tales problemas de mantenimiento: alto acoplamiento, decaimiento dela arquitectura, sin documentacion formal y con perdida de conocimiento acerca de sudominio e implementacion, lo cual hace que su evolucion sea difıcil. En esta tesis seconstruyo una herramienta de ingenierıa inversa para sistemas de informacion en OracleForms y PL/SQL, con el objetivo de apoyar el proceso de mantenimiento de SIFI. Laherramienta es capaz de extraer y visualizar informacion estructural y comportamentaldel sistema, e implementa la tecnica que hemos propuesto para extraer automaticamentereglas de negocio estructurales de bases de datos legado. A traves de una encuesta seevaluo la efectividad de la herramienta considerando el mantenimiento y entendimiento

de SIFI. Los resultados muestran que la herramienta es muy util porque mejora laproductividad de los desarrolladores en completar sus tareas y ahora el proceso demantenimiento de SIFI es menos complicado. Asimismo, la tecnica de extraccion dereglas de negocio fue evaluada a traves de un estudio con 4 colaboradores de ITC. Losresultados muestran que la tecnica es practica, habiendo posibilidad de mejora, y serausada como base para recuperar informacion adicional.

Keywords: Reverse Engineering, Legacy Information Systems, Business Rule Extrac-tion, Software Maintenance, Software Understanding

Palabras clave: Ingenierıa Inversa, Sistemas de Informacion Legados, Extraccion deReglas de Negocio, Mantenimiento de Software, Entendimiento de software

Acceptation Note

Thesis Work

Approved

“Laureate mention”

JuryAndrian Marcus, Ph.D.

JuryMassimiliano Di Penta, Ph.D.

AdvisorJairo Aponte, Ph.D.

Bogota, 10/12/2012

Dedication

This Thesis is dedicated to my parents, professors, fellows and other people who helpedme in different ways to complete this project.

Acknowledgments

For me, this project was an enriching experience from every perspective. From theprofessional point of view I acquired new skills in the development of a project and aproduct, from a research standpoint, I was able to address a problem in a systematic andrigorous way; and from the personal standpoint, this project included a high learningfactor regarding the relationship with fellows, colleagues and other people involved.

First of all, I want to thank my parents for helping me to undertake this masterproject and to take me to new and deep horizons in my life. They will be in my heartforever. I also wish to thank my fellows and professors from the university, includingmy advisor and friends from the research group, for their ideas, advices, time andunconditional support in the development of this thesis. They are the best people I’veever met.

Additionally, my sincere thanks to all the people of ITC, including managers, de-velopers and functional people. This thesis was to support the company but I think theirhelp was much more than mine. I would like to give special thanks to those in ITC whobelieve applied research as a means of solving practical problems, and for those who havea vision of creating a better world, from a small dimension such as computing.

I want to thank the National University of Colombia and all its people, for con-tributing to a better country, through the support to research projects. I am happy tobelong to this university and much more to contribute to the development of computing,specifically to software engineering, even if it is a minimal contribution. Finally, there toomuch work to be done in this field. Every day I will be more thirsty for solving researchand engineering problems.

Thank you all.

Oscar Chaparro

Bogota, November 2012

Contents

Contents I

List of Tables IV

List of Figures V

1. Introduction 1

1.1 Background and justification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Problem definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2. Reverse engineering and Business Rules Extraction 5

2.1 Reverse Engineering concepts and relationships . . . . . . . . . . . . . . . . . . . . . 6

2.1.1 Reverse Engineering and Software Comprehension . . . . . . . . . . . . . 6

2.1.2 Reverse Engineering and Software Maintenance . . . . . . . . . . . . . . . 6

2.1.3 Reverse Engineering concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.2 Techniques in Reverse Engineering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.2.1 Standard techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.2.2 Specialized techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.2.2.1 Programming plans matching . . . . . . . . . . . . . . . . . . . . . 10

2.2.2.2 Execution traces analysis . . . . . . . . . . . . . . . . . . . . . . . . 10

2.2.2.3 Module extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.2.2.4 Text processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.3 Business Rules Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.3.1 Manual BRE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.3.2 Heuristic BRE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.3.3 Dynamic BRE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

I

CONTENTS II

2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3. Automatic Extraction of Structural Business Rules from LegacyDatabases 16

3.1 Business Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.1.1 Definition of Business Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.1.2 Structure of Business Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.1.3 Types of Business Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.1.4 What are not Business Rules? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.1.5 The Relation between Business Rules and Software Information Sys-tems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.2 Extraction of Structural Business Rules . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.2.1 DB - BR Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.2.2 BR Format and Schema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.2.3 BRE Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.2.4 PP Module Specific Heuristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

4. ReTool: An Oracle Forms Reverse Engineering Tool 29

4.1 Oracle Forms Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

4.2 ReTool’s Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4.3 ReTool’s Components and features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.3.1 Extractors and Analyzers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.3.2 Repository . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4.3.3 Visualizers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

4.3.4 Utilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

4.4 Further details about ReTool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

4.5 Limitations and future enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

5. Evaluation and Discussion 55

5.1 Evaluation of ReTool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

5.1.1 Purpose of the survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

5.1.2 Subjects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

5.1.3 Survey description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

5.1.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

CONTENTS III

5.1.5 Analysis and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

5.2 Evaluation of the BRE technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

5.2.1 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

5.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

6. Conclusions 68

6.1 How does ReTool support the maintenance and understanding of SIFI? . . . 68

6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

6.3 Recommendations about the maintenance of SIFI . . . . . . . . . . . . . . . . . . . 70

ReTool Evaluation survey 71

Bibliography 74

List of Tables

3.1 Summary of the heuristics used in the BRE algorithm. . . . . . . . . . . . . . . . 27

5.1 Examples of the extracted BR components from the PP module. . . . . . . . . 64

5.2 Statistics of the extracted BRs and components from the PP module. . . . . 65

5.3 Results of the BR evaluation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

5.4 Quantitative information about the misunderstood/incorrect rules. . . . . . . 66

IV

List of Figures

2.1 Reverse Engineering process according to [36]. . . . . . . . . . . . . . . . . . . . . . 7

2.2 Modularization process based on clustering/searching algorithms [38]. . . . . 12

2.3 Business rules knowledge through abstraction levels. . . . . . . . . . . . . . . . . . 13

3.1 Business rules structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.2 Element of guidance metamodel [44]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.3 Relationship between information systems and business rules. . . . . . . . . . . 20

3.4 DB - BR component mappings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3.5 Implemented BR database model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4.1 Example of a form module of an Oracle Forms application. . . . . . . . . . . . . 30

4.2 General architecture of an Oracle Forms application [3]. . . . . . . . . . . . . . . 31

4.3 General architecture of ReTool. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.4 Command-line options of the program DB Analyzer. . . . . . . . . . . . . . . . . . 33

4.5 Comparison of the structure of the form CUSTOMERS and output of theForms Parser. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4.6 Comparison of the elements of the item COMMENTS and the generatedXML file by the Forms Parser. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.7 Command-line options of the program Forms Analyzer. . . . . . . . . . . . . . . . 36

4.8 Command-line options of the program Dependencies Analyzer. . . . . . . . . . 36

4.9 Example of the repository metamodel, which represents the database tablesand their relational components, of the target system . . . . . . . . . . . . . . . . 37

4.10 Graphical User Interface of ReTool Web. . . . . . . . . . . . . . . . . . . . . . . . . . 38

4.11 Graphical elements of a box. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

4.12 Pop-up dialog displayed for a table column FOND DESCRI. . . . . . . . . . . . 40

4.13 Icons of every type of object in ReTool Web. . . . . . . . . . . . . . . . . . . . . . . 41

4.14 ReTool Web’s login page. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

V

LIST OF FIGURES VI

4.15 Searching page. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

4.16 Searching results page. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

4.17 Object References page. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

4.18 Table References page. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

4.19 Form References page. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

4.20 Table Hierarchy page. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4.21 Dependencies Tree page. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.22 Call Sequences page. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.23 Package Information page. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

4.24 Tables-Form page. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

4.25 Forms-Table page. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

4.26 SQL Inserts Generation page. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

4.27 Reverse Engineering Processes page. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

5.1 Distribution of the survey respondents by working group. . . . . . . . . . . . . . 56

5.2 Distribution of the survey respondents by working time. . . . . . . . . . . . . . . 57

5.3 Level of feature usage of the tool. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

5.4 Level of tool usage on development tasks. . . . . . . . . . . . . . . . . . . . . . . . . . 58

5.5 Degree of tool support in reducing the time to complete tasks. . . . . . . . . . . 59

5.6 Degree of tool support in reducing the effort to complete tasks. . . . . . . . . . 59

5.7 Degree of tool support in increasing the precision to complete tasks andartifacts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

5.8 Rating of some quality attributes of the tool. . . . . . . . . . . . . . . . . . . . . . . 60

5.9 Status of the received suggestions from the respondents. . . . . . . . . . . . . . . 60

5.10 Specific categorization of the non provided suggestions. . . . . . . . . . . . . . . . 61

5.11 Categorization of the non provided suggestions. . . . . . . . . . . . . . . . . . . . . 61

CHAPTER 1

Introduction

1.1 Background and justification

Software evolution is the changing process of software through time. This process includesthe conception phase, the development and maintenance stages, and the phase-out of thesoftware project and product. Software evolution is an inevitable process because of theenvironmental changes, the new emerging concepts and business rules, and the evolu-tion of technology regarding hardware, software, programming and software engineeringparadigms [9]. From this perspective and depending on the evolutionary strategies appliedto software [9], every software system will eventually be legacy in the future.

Legacy software systems are those that have had a long evolution. These systemswere developed with outdated technologies and paradigms [27][42] and have a large sizein terms of information processing capacity, functionality provided, program components,lines of code (LOC) and other size metrics. They are often difficult to modify and havelow integration level with other systems [42]. On the other hand, they are essential forbusiness [9][27] and encode large amounts of essential information related to the businessprocesses.

The characteristics mentioned above have implications on the evolution of those soft-ware systems. Continuous changes in the system’s domain, the lack of strict maintenanceprocesses, and the turnaround of developers inevitably leads to gradual loss of knowledgeabout the system (its structure and behavior) and its domain, most often corroboratedby the fact that external documentation is rarely updated in synch with the code andother artifacts. Consequently, one of the most important problems in the evolution oflegacy systems is the lack of knowledge about them [27]. The most up to date source ofinformation for recovering this knowledge is the structure, the behavior and the data ofthe information system.

These problems have been addressed by several software reverse engineering techniquesand methods [13][27]; Canfora Harman et al. [13] evidence the work and the success in thearea. Reverse engineering is defined by [14] as the analysis of a system in order to identifyits components, relationships and behavior. The field aims at creating system represen-tations in other forms or at higher abstraction level independent of the implementation.The main goal of the field [13][47] is supporting software comprehension/understanding,

1

CHAPTER 1. INTRODUCTION 2

software maintenance and software re-engineering. More specifically [14][47], reverse engi-neering is employed to reduce software complexity, to generate alternative software viewsfrom different architectural perspectives, to retrieve “hidden” information as a consequenceof system long evolution, to detect side effects or non-planned design branches, to synthe-size software through the generation of high level abstractions, to facilitate software reuse,to assess correctness and reliability of software systems, and to locate, trace and evaluatedefects or changes quickly.

Additionally, software reverse engineering is a process that consists of four steps [47]:data extraction from software artifacts, data processing and analysis, knowledge persis-tence in a repository and presentation of knowledge. The first step is the acquisition ofraw data from several sources of information of the software system. Subsequently, in thesecond step, these data are subjected to cleaning, filtering, transforming and structural,semantic and behavioral analyses. The resulting information of this analysis is stored ina repository or a knowledge database; and finally, this information is presented to theuser in a graphical or textual way through different software views. In the first step, themain source of information is the source code, however, there are other sources that arecomplementary to this type of information. Specifically, there are three types [13]: staticsources, such as the source code or the system documentation (in general, any other in-formation that is not produced by the system execution), dynamic sources, which provideinformation resulting from the system execution such as execution traces; and historicalsources or information of the system evolution, such as version control repositories.

Reverse engineering is a process of knowledge extraction, where such knowledge isuseful and necessary for other processes. Indeed, its importance lies in the role it has inother domains where this method is necessary. Examples include software maintenanceand migration. Software maintenance is costly because of the essential properties of thesoftware; reverse engineering is required for maintaining legacy systems since they are largeand complex, and difficult to modify. In software migration, reverse engineering is the firststep and is used to extract representations of a system at different levels of abstraction andindependent of the implementation. The second step is re-engineering, in which designtransformations are made, leading to new representations and a new implementation ofthe system. Sometimes there are merely programming language transformations.

This thesis describes the application of reverse engineering on SIFI (SIstema FiduciarioIntegrado1), which is an information system implemented in PL/SQL and Oracle Formstechnology, and has a life cycle of over fifteen years. The information system is developedin Oracle Forms, an outdated (obsolete) technology that has several limitations from theevolutionary point of view. Oracle Forms is an enterprise application technology thatprovides a two-layer distributed client-server architecture [55]. The business logic in thistechnology is implemented on both client and server through the PL/SQL language, alanguage extension to SQL that provides procedural programming to the Oracle databasetechnology. This type of two-layer architectures has some advantages over one-layer ones[55], for example, there is a load distribution among clients, the job of the server is executedunder only one context and the performance of the system is often high because of thehigh coupling possibility of the system. However, these advantages have some limitationsfrom the evolutionary point of view [2][55]:

1In English, Fiduciary/Trust Integrated System


1. Scalability issues due to the centralized nature of the server and decentralizationof the clients. A modification in the system often implies a large update in all theclients.

2. Strong coupling since the business logic can be integrated (mixed) with the view (thegraphical user interface) and data management layers, and therefore, the businesslogic is difficult to isolate and change.

3. Strong dependence on the tools’ vendor, which is risky if the vendor stops providingsupport. Also, these low-layer architectures have very poor portability.

Furthermore, SIFI has a large complexity and size, and therefore, its maintenance,understanding and evolution are relatively difficult. SIFI is a financial information systemthat manages the data and the operations related to investments funds, financial invest-ments, trusts, accounting, budget, treasury, accounts payable, billing and commissionsportfolio. The system is maintained by a Colombian software development company (i.e.,IT Consultores S.A.S. or just ITC) and has been deployed for more than fifteen years,operating in several trust companies (banking companies). It is currently deployed andused in nine large Colombian trust companies that represent 60% of this market in thecountry. The system is developed using Oracle Forms, has 1.400 form modules, 3.200database tables, 3.500 stored procedures, 700 database views and around 800 KLOC inthe database.

1.2 Problem definition

SIFI has some maintenance issues that make it difficult to change and evolve. Some ofthese are:

• SIFI is implemented in Oracle Forms 6i, an obsolete technology that has severalmaintenance restrictions (described briefly above).

• The system has high coupling between structural objects, such as database tables,and behavioral objects, such as database packages and procedures.

• The architecture of the system has decayed over time, mainly because of a uncon-trolled maintenance process.

• The system has no formal documentation about technical and business/domain as-pects of the system. There are no traceability links between artifacts either.

• The knowledge about the system has remained mainly within two groups of people:the technical group, who knows SIFI’s architecture, and the business group, whoknows the business processes supported by SIFI. The problem is that when peopleleave the company, a loss of knowledge about the system and the business occurs.

Then, the problem addressed by this thesis is synthesized in two questions:

• How can we support SIFI’s maintenance and understanding processes?

• How can we extract business information from SIFI, such as business rules?


In this sense, a reverse engineering process is required to tackle these problems, insuch a way that structural, behavioral and business information of the system is extractedfrom the source code and other sources of information. The resulting information andknowledge of this process will be oriented to support the evolving tasks on the system.

The main contribution of this thesis is to create a reverse engineering tool for OracleForms and PL/SQL information systems. Specifically, reverse engineering techniques areimplemented for the extraction of system structural and behavioral information, and aBusiness Rules Extraction technique is proposed and developed. The effectiveness ofthe tool to support the understanding and maintenance of SIFI was confirmed by theevaluation carried out. Moreover, this work resulted in a list of recommendations abouthow the maintenance process on SIFI should be addressed by the company.

The reason for creating a tool is that the company is interested in obtaining all thepossible knowledge from this reverse engineering process, considering the implications ofperforming this process and the fact that the company is planning to migrate SIFI to aweb technology.

1.3 Thesis Organization

This document is structured as follows:

• Chapter 2 presents the concepts, methods, and motivation for performing reverseengineering and business rules extraction.

• Chapter 3 describes the proposed method for extracting business rules from legacydatabases.

• The reverse engineering tool we created is described in detail in Chapter 4.

• Chapter 5 discusses the results of the tool evaluation. In the same way, the evaluationresults of the business rule extraction technique are described thoroughly.

• Finally, this document presents the conclusions, future work and recommendationsabout the maintenance process on SIFI in Chapter 6.

CHAPTER 2

Reverse engineering and Business Rules

Extraction

Software Reverse Engineering is a field that consists of techniques, tools and processesto get knowledge from software systems already built. The main goal of this disciplineis to retrieve information about how a software system is composed and how it behavesaccording to the relations of its components [14]. Reverse Engineering aims at creatingrepresentations of software in other forms or in more abstract representations, independentfrom the implementation, e.g., visual metaphors or UML diagrams. Techniques in the fieldare employed for many purposes [14]: to reduce system complexity, to generate alternativeviews of software from several architectural perspectives, to retrieve “hidden” informationas a consequence of a long evolution of systems, to detect side effects or non-planneddesign branches [47], to synthesize software, to facilitate reuse of code and components, toassess the correctness and reliability of systems, and to locate and track defects or changesfaster.

Reverse Engineering (RE) arises as an important area in software engineering, sinceit becomes necessary in Software Evolution. Software Evolution is an inevitable process,basically because most factors related to software (and technology) change [9]: businessfactors such as business concepts, paradigms, processes, etc., and technology factors suchas hardware, software, software engineering paradigms, etc. RE is especially important inregard to legacy systems since they suffer degradation and have a long operational life,producing a loss of knowledge about how they are built. Generally, the changes in thiskind of software actually happen but are not well-documented. Instead, knowledge aboutchanges is kept by people, but people are volatile: people leave projects and companies,and people forget easily. Therefore, knowledge needs to be extracted directly from software(source code) and its behavior (run-time information), which are both the main sourcesof information for RE. In this sense, the general problem is how to extract information orknowledge from software artifacts.

Reverse Engineering can be considered as a more general field compared to SoftwareComprehension, as the first one involves more field actions and not only comprehension ofcode (e.g., RE is also used for fault localization), although sometimes the comprehensionprocess is a secondary result of it. The philosophy about RE is the application of techniquesto know how software works, how it is designed and how this design allows software to

5

CHAPTER 2. REVERSE ENGINEERING AND BUSINESS RULES EXTRACTION 6

behave the way it does. Consequently, in this process it is perfectly natural that thecomprehension part appears as an indirect consequence or as a motivation.

Since Reverse Engineering is vital for software development processes, this chapterpresents an overview of some techniques in the area. The chapter is organized as follows: insection 2.1, the needs, benefits and purposes of RE are reviewed, in the context of SoftwareUnderstanding and Maintenance. Section 2.2 presents a review of some techniques in thefield.

2.1 Reverse Engineering concepts and relationships

The term Reverse Engineering has been used to refer to methods and tools related to un-derstanding (or comprehending) and maintaining software systems. RE techniques havebeen used to perform systems examination, so in Software Understanding and Mainte-nance, RE has been a useful medium to support these processes.

2.1.1 Reverse Engineering and Software Comprehension

Software Comprehension is the process performed by an engineer or a developer to un-derstand how a software system works internally. The understanding process involves thecomprehension of the structure, the behavior and the context of operation of a program.Along with these attributes, the explanation of problem domain relationships is required[15]. Understanding is one of the most important problems in Software Evolution; it is saidthat between fifty and ninety percent of the effort in maintenance stages is devoted to thistask [40]. Commonly, system documentation is out of date, incorrect or even inexistent,which increases the difficulty of understanding what the system does, how it works andwhy it is coded that way [8].

Several comprehension models and cognitive theories are reviewed by Storey in [49].It could be said that the main models, or at least the most common, are top-down andbottom-up. Top-down comprehension strategy is basically the mapping between previoussystem/domain knowledge and the code, through formulation, verification and rejectionof hypotheses. In terms of the implementation of a RE tool, this process typically includesrule matching to detect how code chunks achieve sub-goals within a specific feature or plan[43]. When performing automatic RE to legacy software, bottom-up approach is commonlyused because top-down approach requires detailed knowledge about the “goals the programis supposed to achieve” [11]. In bottom-up understanding, software instructions are takento infer logic and semantic groups, categories and goals. The automation of this approachis very complex [11][43] and is not supposed to be solved by a single technique, becauseof the semantic gap between code and domain knowledge of a system. Additionally,developers actually need a variety of functionalities and information that one techniqueor implementation may not achieve or provide.

2.1.2 Reverse Engineering and Software Maintenance

Software Maintenance is usually defined as the process made on software after its deliveryto production environment [12]. Common activities in maintenance are correction of de-fects, performance improvement and software adaptation due to changes in requirements


or business rules. Software Maintenance is divided into 4 categories [50]: Corrective main-tenance, Adaptive maintenance, Performance enhancement and Perfective maintenance.

Reverse Engineering comprises the first step in Software Maintenance: the examina-tion activity. Changes in software are executed later; therefore, RE does not involve thechanging process [12].

2.1.3 Reverse Engineering concepts

Software Reverse Engineering was defined by Chikofsky et al. [14] as the process ofanalyzing a system to:

• Identify the system’s components and their inter-relationships, and

• Create representations of the system in another form or at a higher level of abstrac-tion.

A discussion of this definition is developed in [51]. In this work, authors state that thisdefinition does not fit to all techniques in the field; for example, program slicing doesnot recover system’s components and relationships. This definition does not specify whatkinds of representations are considered and the context in which the process is executed, sothe role of automation and the knowledge acquisition process are not clear. In this sense,the authors propose a more complete definition: “Reverse Engineering includes everymethod aimed at recovering knowledge about an existing software system in support tothe execution of a software engineering task”.

The process of Reverse Engineering is divided into four phases [36]: ContextParsing, Component Analyzing, Design Recovering and Design Reconstructing. Figure2.1 shows the whole process.

Figure 2.1. Reverse Engineering process according to [36].

Asif in [4] presents the elements involved in the Reverse Engineering process:


• Extraction at different levels of abstraction,

• Abstraction for scaling through more abstract representations,

• Presentation for supporting other process such as maintenance, and

• User specification allowing the user to manage the process, the mappings for trans-formation of representations, and software artifacts.

Khan et al. [47] defines the general process of Reverse Engineering, which consists in fourphases: data extraction from software artifacts, data processing and analysis, knowledgepersistence in a repository, and presentation of knowledge. The authors also present somebenefits and applications of Reverse Engineering. RE is used:

• To ensure system consistency and completeness with specification.

• To support verification and validation phases.

• To assess the correctness and reliability of the system in the development phase,before it is delivered.

• To trace down software defects.

• To evaluate the impact of a change in the software (for estimating and controllingthe maintenance process).

• To facilitate the understanding by allowing the user to navigate through system ina graphical way.

• To speed up the Software Maintenance and Understanding. A requirement of a REtool is the fast and correct generation of cross-reference information and differentrepresentations of software.

• To measure re-usability through pattern identification.

RE embraces a broad range of techniques, from simple ones, such as call graphs extraction,to more elaborated ones, such as architecture recovery. The trends go towards moresophisticated and automatic methods. The next section presents some techniques in thefield.

2.2 Techniques in Reverse Engineering

Methods or techniques (used indistinctly) in Reverse Engineering are automatic solutionsto one or more problems in the field [51]. This section provides a partial revision oftechniques, divided into two categories: standard and specialized techniques.

2.2.1 Standard techniques

Standard techniques include mostly basic descriptive techniques, such as dependency mod-els or structural diagrams. The objective of standard techniques is to obtain system struc-ture and dependencies at different levels of abstraction, by applying basic source code


analysis and Abstract Syntax Tree (AST) 1 processing. According to [47], a RE tooltypically provides the following views:

• Module charts: they present relationships between system components. A moduleis a group of software units based on a criterion. Theoretically, modules must havea well-defined function or purpose.

• Structure charts: in a general sense, they present software objects categorizedand linked by some kind of relationship, generally, method calls. According to [57]2,these diagrams also show data interfaces between modules. Examples of structurecharts are entity-relationship models and class diagrams. Actually, module chartsare structure charts.

• Call graphs: they describe calls and dependencies between objects at differentlevels of granularity. These diagrams are created by analyzing AST from code. Thelevel of granularity is set (for instance functions, methods, classes or variables) andthen, the AST is traversed to find the usage of objects. Call graphs are importantfor change propagation analysis.

• Control-flow diagrams: at low/medium levels of granularity they present thesystems execution flow, in which control structures (e.g., IF or FOR / WHILE)guide the flow. Another diagram of this type is the Control Structure Diagram(CSD) [20], in which source code is enriched through several graphical constructscalled “CSD program components/units”, to improve its comprehensibility.

• Data-flow diagrams: they are graphs that show the flow of data (in parametersand variables) through features, modules, or functional processes [57].

• Global and local data structures, and parameter lists: these allow going toa fine-grained level of software.

The automatic extraction of these diagrams involves several operational tasks on codeand its AST. For example, in data flow diagrams, the transformation and the storingof data must be obtained; in this case every parameter needs to be tracked between andinside methods or functions to know what operations include them and how they are used.Before this process, modules need to be determined. In addition, some of these diagramsare the source of information for techniques such as dependencies analysis of code anddata, and the evaluation of the changes impact in code.

2.2.2 Specialized techniques

Other techniques, called specialized techniques in this chapter, include operations on soft-ware artifacts that are intended not only to describe software but to extract knowledgefrom it. Programming plans matching using artificial intelligence techniques [11], exe-cution traces processing [22][28], module extraction [37][38], natural text processing [42],pattern extraction [24], and Business rules extraction [41][42], are some examples. Someof them are described briefly in this section.

1An AST is a tree representation of the syntactic structure of source code. For example, AST View isan Eclipse plugin for visualizing AST for java programs: http://www.eclipse.org/jdt/ui/astview/.

2See chapter 15 of [57]: Additional Modeling Tools.

http://www.eclipse.org/jdt/ui/astview/


2.2.2.1 Programming plans matching

One problem in Reverse Engineering is how to find the semantics (the meaning of) of code.In the automation of this process the analysis of code identifiers and concept extractionare almost required. However, another approach is making matches between chunks ofcode and programming plans stored in a repository [11][43]. A programming plan isa ”design element in terms of common implementation patterns” [43]. A plan can beconsidered as a programming pattern or template, and can be generic or domain-specific.Some examples of plans are READ-PROCESS-LOOP and READ-EMPLOYEE-INFO-CALCULATE-SALARY; the former is a generic plan which means “reading input valuesand perform some actions to each value” at implementation level. The latter is a specificdomain plan that is a specialization of the first one because at implementation level isalmost the same as the first plan, but has a more semantic stereotype. This means thatthe repository has a set of generic and specialized plans organized in a hierarchy. In [43],the authors state that the recognition of programming plans against information of theAST of code is better, in terms of searching cost, if the plan library is highly organized,each plan has indexing and specialization/implication links to other plans3.

However, the matching task is a NP problem, so the process is computationally ex-pensive [11]. The work presented in [11] addresses this problem by applying artificialintelligence techniques. The approach is a two-step process. The first step is a GeneticAlgorithm execution to make an initial filtering of the plan library based on “relaxed”matching between code chunks [10] and programming plans stored in the library (therepository). The second step uses Fuzzy Logic to perform a deeper matching. The outputof the whole approach is a ranked set of programming plans according to a similarity valuewith a chunk of code.

In summary, the objective to be achieved by these works is to find programmingplans similar to a portion of code. Programming plans are stereotypical patterns of code(generic or domain-specific patterns), therefore it is possible to assign high level conceptsto programs, once the matching process has been performed.

2.2.2.2 Execution traces analysis

As a complement to Static Analysis in RE, processing of run-time information is commonlyused in what is called Dynamic Analysis. Maybe the most common source is the systemexecution trace.

Execution traces analysis refers to execution trace processing to find patterns in tracesthat have a specific function. The advantage of traces is that they show the portions ofsoftware that are being executed in a specific execution scenario. In this way, the searchspace is smaller than the one in static analysis because the executed portions of code arethe only ones considered.

Two problems are detected in dynamic processing: first, knowledge of the systemis required to perform this analysis, and second, this dynamic analysis produces hugeamounts of information (long traces). The former problem refers to the fact that it isnot possible to capture the (infinite) entire execution domain of a system. If there is noknowledge about how the system works, using and executing all its functionalities is not

3According to Quilici [11], a plan consists of inputs, components, constraints, indexes and implications.


possible. If it is not necessary to know about the entire system, and the knowledge aboutthe use of specific functionality actually exists, this would not be a problem. The latterproblem depends on how the software is built at low level, how the code is instrumented4

and how much information the user needs. Other problems related to dynamic analysisare low performance, high storage requirement and cognitive load in humans [15].

Object-oriented software has been the most common object of study in traces analysis.For example, in [22], the problem of identifying clusters of classes is addressed basedon a technique that reduces the “noise” in execution traces through the detection ofwhat the authors call “temporally omnipresent elements”, which represent execution units(groups of statements) distributed in the trace. In this sense, noise represents informationthat is not specific to a behavior of interest. For this, samples of a trace are taken andthe distribution of each element along the samples is calculated through a measure oftemporal occurrence. To cluster elements, dynamic correlation is used. Two elements aredynamically correlated if they appear in the same samples, so the measure of correlationis based on the number of samples in which they occur. The clustering part takes allelements whose correlation is higher than a fixed threshold, thus grouping elements incomponents.

In summary, the noise of traces is removed and then clustering is applied to the filteredtraces. That work presents an industrial experiment of the approach. The system ofstudy was a two-tier client-server application: the client was a Visual Basic 6 applicationof 240 KLOC and the server was comprised of 90 KLOC of Oracle PL/SQL (ProceduralLanguage/Structured Query Language) code. The client code was instrumented sinceno trace generation environment was found, and in the case of the server, Oracle tracingfunctions were used. The system was executed over a use-case, producing a trace of 26.000calls.

Similarly, the authors in [28] define “utility element” as any element in a programthat is accessed from multiple places within a scope of the program. The objective of theauthors is to remove these utility elements or classes from the analysis. This is achievedby calculating the proportion of classes that call each other (fan-in analysis) iterativelyby reducing the analysis scope or by applying the technique on a set of packages. Theutility-hood metric, U , of the class C is defined as

U = |IN | /(|S| − 1) (2.1)

where S is a set of classes considered in the analysis and IN is a subset of classesthat use C. Besides this metric, the standard score (z-score) was considered to determinepossible utility classes: classes with large and positive z-score values are possible utilities.Once the filtering is performed, the depiction of components is done by a tool that gener-ates Use Case Maps 5. In this latter step, calls between classes and conditions of execution(control-flow statements) are considered.

As it is noticed, the main challenge in execution trace analysis is how to reduce andprocess the trace, so the final result is a good abstraction of what a system does under aspecific execution scenario. The main problems to be addressed are the definition of theinformation that the traces should have, the metrics and procedures that should be used

4Code instrumentation refers to the use of software tools or additional portions of code in the system,through which execution traces and behavior information of the system are gathered.

5For more information about this type of models refer to www.usecasemaps.org

www.usecasemaps.org


for filtering, and the way to analyze and represent the reduced traces so that they canexpress knowledge about the system.

Other dynamic analysis works employ web mining [58], association rules and clustering[35][34][45], and reduction techniques [16]6.

2.2.2.3 Module extraction

Hill Climbing and Simulated Annealing are used to form clusters of components based onfan-in/out analysis. The approach starts by building a Module Dependency Graph, thenrandom partitions (clusters) of the graph are formed as the initial clustering configuration,and later the partitions are rearranged iteratively by changing one component from onecluster to another. The objective is to find the optimal configuration based on the conceptsof low coupling and high cohesion, which is achieved by considering fan-in/out information.This was accomplished by maximizing the objective function, which the authors calledModularization Quality (MQ). The general process is shown in Figure 2.2.

Figure 2.2. Modularization process based on clustering/searching algorithms [38].

2.2.2.4 Text processing

Text processing refers to the processing of source code as text. This implies the processingof morphological, syntactic, semantic and lexical elements of code. Text processing takesadvantage of identifiers and how statements are organized to extract semantic informationof artifacts: classes, methods, packages, etc. The requirement is that code is well written,i.e., the identifiers express semantic information of the business, and follow some generalparameters about how they are defined.

In [42], the key-phrase extraction algorithm KEA7 is used to translate business rulesinto specific domain business terms, from documentation. For this, they connect docu-

6For more information about dynamic analysis techniques see [15].7For more information about the KEA algorithm see http://www.nzdl.org/Kea (November, 2012)

http://www.nzdl.org/Kea


ments to business rules. The key point is that documents contain technical description ofvariables, so it is possible to establish a direct mapping between rules and documents.

2.3 Business Rules Extraction

Business Rules Extraction (BRE) is a reverse engineering process for recovering BusinessRules (BR) from software information systems (Figure 2.3). In general, BR are constraintsthat define the structure of a business (i.e., structural business rules) and guide the waya business operates (i.e., operative business rules) [44][52]. BRE is important in softwareknowledge acquisition because:

• It is a means for software re-documentation [1] and functionality-code tracing [7].

• It builds BR-Code mappings that can support the understanding of the system [48].

• It supports the validation process for checking that the system fulfills its specification[7], i.e., that all the business rules are actually implemented [23].

• It is used in software re-engineering and migration [23].

In this section, the Business Rule Extraction related work is reviewed. This revision is theresearch background of the proposed approach for extracting structural BRs, described inthe next chapter.

Figure 2.3. Business rules knowledge through abstraction levels.

We overview the BRE work in three categories: manual, heuristic, and dynamic. Theformer refers to conduct manual examination of source code for extracting BR, and theothers about performing automatic extraction of rules. Heuristic techniques have focusedon static processing of source code, while dynamic techniques emphasize the processing ofdynamic artifacts, such as, execution traces. In general, automatic BRE has only focusedon the examination of source code and on the identification of language structures orprogram sequences that could lead to business rules.


2.3.1 Manual BRE

Earls et al. [23] present a method for manual BRE on legacy code. The authors mentiontwo BRE approaches: program-centric and data-centric. The method they propose isprogram-centric and focuses on locating and classifying error-processing sections in sourcecode and the conditions that lead to those sections. The most important conclusion of thiswork is that manual BRE requires too much time, especially in large information systems,but is more accurate compared with existing and proposed automatic tools and methods.

In [54] the authors present an analysis on the impact of human factors in the BRE oflegacy systems and survey additional related work to manual BRE.

2.3.2 Heuristic BRE

Heuristic BRE consists on automatic extraction techniques that take advantage of somecommon elements of the source code that could compose BRs. Elements considered inheuristic BRE are, for example, identifier names, exception raising/handling statements,and control flow structures. One method generally used in heuristic BRE is programslicing [56].

The work presented in [30] and [54] focus on identifying business/domain variablesand the application of generalized program slicing for those variables. There are someimportant facts to consider from these works. First, the input and output variables areimportant because they belong to data flow interfaces in programs; second, control state-ments are essential for BRE since they guide the program’s execution flow; and third, theBR representation defines mostly the BR components.

Shekar et al. [46] focus on enterprise knowledge discovery from legacy systems. The au-thors consider more “semantic” source code elements for performing knowledge discovery.Examples of the elements are: output messages, semantic relationships between variablesand table columns, variable usages, assignment and control statements, and variables usedin database queries.

In [7], the authors suggest to use error messages, program comments, functions, andcontrol flow structures. The authors present three steps to extract BRs, based on SBVRstandard: extraction of business vocabulary, creation of rules using vocabulary, and “ac-tivity interleaving”.

The authors in [41] perform BRE on COBOL legacy systems. They focus on simpleCOBOL statements that carry business meaning, such as, calculations and branchingstatements. In addition, they use the identifiers and conditions to extract the meaningand context of BR. The format used for BRs is [conditions] [actions], in which actions arecompleted if conditions are satisfied.

In [48], the following four program elements that compose business rules are proposed:results, arguments, assignments, and conditions. Using program slicing, the authors trackall the assignments of data results from calculations and capture the conditions that triggerthe assignments. In this work, meaningful names of variables is mandatory.

The authors in [53] propose the use of information-flow relations between input/outputvariables, statements and expressions to identify domain variables. The approach theypropose is mostly useful on procedural code.


While all these techniques relate to our BRE approach, in as much as they are static,automated techniques, none of them has the same input and output format as our imple-mentation, hence we could not use any of them in our work.

2.3.3 Dynamic BRE

The work in [19] and [31] are examples of dynamic BRE. They present process miningapproaches, which are intended to recover decision rules and control flow of systems fromlogs and execution traces. According to the authors, the main advantage of analyzing dy-namic artifacts is that it is possible to detect participants, responsibilities, and concurrentactivities in processes [31]. However, logs and other sources of information should complywith certain characteristics that make possible to perform process mining.

2.4 Summary

In this chapter we focused on the theoretical background and related work concerning re-verse engineering and business rules extraction. In the first instance, the RE concepts werecovered, including its definition, its process, and its relationship of software maintenanceand understanding. Later the chapter presents a partial revision of standard and spe-cialized RE techniques. Some of the standard techniques are implemented in the createdreverse engineering tool we created. Chapter exposes the details about this. Finally, theBRE related work is reviewed as the research background for the proposed BRE technique,which is defined and described in detail in the next Chapter.

CHAPTER 3

Automatic Extraction of Structural Business

Rules from Legacy Databases

As described before, ITC has no formal current technical and domain documentation inSIFI, including of the BRs involved in SIFI’s business processes. The knowledge about thesystem has remained mainly within two groups of people: the technical group, who knowsSIFI’s architecture, and the business group, who knows the business processes supportedby SIFI. The problem is that when people leave the company, a loss of knowledge aboutthe system and the business occurs. To mitigate this problem the company has started areverse engineering process in which BRE is one of the most relevant steps. This Chapterreports the first steps of this process: an approach for extracting structural BRs fromlegacy databases.

In order to define the BRE approach, we performed a revision of the BR concepts,their characteristics and categorization, based on the Semantics of Business Vocabularyand Business Rules (SBVR) standard [5][26]. Then, we analyzed the structural database(DB) components of SIFI and the BR concepts to define a mapping among them, usingbasic heuristics. Finally, we defined a BR format, following the SBVR methodology forthe extraction of rules. Our technique extracted 870 BRs from the SIFI databases. Fouremployees of ITC analyzed 300 of the rules and found that 29% of recovered rules arecorrect structural business rules, 36% correspond to implementation rules, and 35% areincomplete or incorrect rules. We further analyzed with the four evaluators the incom-plete/incorrect rules to identify ways to improve them. The recovery technique proves tobe practical, while there is room for improvement, and it will be used as basis for therecovery of additional knowledge.

The chapter presents first background information on BRs, followed by the definition ofour approach for BRE. The evaluation process and the results are presented and discussedin Chapter 5.

16

CHAPTER 3. AUTOMATIC EXTRACTIONOF STRUCTURAL BUSINESS RULES FROM LEGACYDATABASES17

3.1 Business Rules

Business rule is a common term in business and software design. Intuitively, we considerBR as what the business and software does or operates. Although this conception is notwrong, it is quite inaccurate.

3.1.1 Definition of Business Rules

A rule is an explicit regulation or principle that governs the conduct or procedures withina particular area of activity, defining what is allowed. A business rule is a rule underbusiness jurisdiction, that is, a rule that is enacted, revised and discontinued by thebusiness [26][44]. For example, the “law” of gravity can affect a particular business, butit is not a business rule because the business cannot govern it; instead, the business maycreate rules for adaptation or compliance.

Business rules guide the behavior or action of a particular business and serve as crite-rion for making decisions, as they are used for judging or evaluating a behavior or action.They shape the business (i.e., the business structure) and constraint processes (i.e., thebehavior of the business), to get the best for the business as a whole [29][44]. This meansthat business rules must be defined and managed in an appropriate way to guide thebusiness to an optimal state.

3.1.2 Structure of Business Rules

OMG’s Semantics of Business Vocabulary and Business Rules (SBVR) standard [5][26]defines the key concepts of BRs:

• Business vocabulary: is the common vocabulary of a business, built on concepts,terms, and fact types.

• Business rules: they are statements/sentences based on fact types that guide thestructure or operation of a business.

• Semantic formulation: is a way of structuring the meaning of rules through severallogical formulations [26], e.g., logical operators (and, or, if-then, etc.), quantificationstates (each, at least, at most, etc.), and modal formulations (It is obligatory or It isnecessary).

• Notation: is the language used to write and express BRs. The SBVR standarduses three reference notations, namely: SBVR Structured English, RuleSpeak, andObject-Role Modeling.

Business rules are composed of a structured business vocabulary [44], which provides themwith meaning and consistency. Structured business vocabulary comprises the followingelements [29][44]:

• Noun concepts: they are represented by terms of the business. They are elemental,often countable and non-procedural. For instance, Currency, Country, Customer,Operational Cost, etc.


• Instances: they are “examples” of noun concepts. Instances are always from thereal world, not in a model. For example, Euro, United States, etc.

• Fact types: they are connections of concepts made by verbs or verb phrases. Theygive structure to the business vocabulary, recognize known facts and organize knowl-edge about results of processes. Common shapes or classes of fact types are cate-gorizations (e.g., Current Account is a category of Bank Account), properties (e.g.,Bank Account has Balance), compositions (e.g., Bank is composed of Customers andFunds) and classifications (e.g., Euro is classified as Currency). Fact types also canbe categorized by arity, which is the number of noun concepts in the fact type.

The structured business vocabulary can be represented in many forms. Ontologies orUML class diagrams are examples of such forms; they can be used for representing andstructuring the vocabulary and the knowledge about a business domain.

Through semantic formulations, business rules add a sense of obligation or necessityand remove degrees of freedom to the structured business vocabulary [44] (see Figure3.1). For example, for the fact type Bank Account has Account ID, a business rule couldbe A Bank Account must have only one Account ID. In this case, the business rule hasquantifiers (A, only one) and an operative modal keyword (must) that restricts the facttype. Another form to express the same business rule is: It is necessary that a BankAccount has exactly one Account ID. In this case, the rule has the structural prefix modalkeyword it is necessary that and the quantifiers a and exactly one.

Figure 3.1. Business rules structure.

Although business rules can be expressed in several forms, it is required to followonly one specific notation, such as SBVR Structured English or RuleSpeak. Even so,and regardless the way business rules are expressed, they must be declarative and non-procedural, i.e., they must define the case/state of knowledge without describing when,where, or how the case/state is achieved [29][39][44].

3.1.3 Types of Business Rules

There are two types of business rules [5][6][44]:

• Behavioral or operative rules: they govern the behavior or business operationsin a suitable and optimal fashion and, therefore, they are important for modeling


business processes. These rules always carry the sense of obligation or prohibition,can be violated directly [44], and not all are automatable. Examples of this kind orrules are the following:

– Not automatable: “A Customer Service Operator must contact a Customer atleast every month”.

– Automatable: “A Loan below $1000 must be accepted without a Credit Check”.

• Definitional or structural rules: they structure and organize basic businessknowledge. They carry the sense of necessity or impossibility and cannot be vi-olated directly. Unlike behavioral rules, not all definitional rules are business rules(e.g., law of gravity or rules of math); however, all of them are automatable. Whenevaluating structural rules, usually there are two possibilities: classifications (classmembership) and computations (results). Some examples are:

– Classification: “A Customer is always a Premium Customer if the Customerhas a Balance of more than $1,000,000”.

– Computation: “The Total Balance of a Bank Account is always computed asthe Sum of Transaction Values”.

3.1.4 What are not Business Rules?

Business rules, business policies, and advices are elements of guidance (guidelines) (seeFigure 3.2). Then, what is the difference between them? why are all of them not consideredas business rules? First of all, business rules must be practicable elements of guidance.Although business policies are guidelines, they are not practicable, so they are not businessrules [44]. For example, the sentence “Compliance to the Customer is our Priority” is nota BR. Instead, business policies are reduced to practical guidelines, i.e., to business rulesor advices. Advices and business rules are practicable guidelines because they are builtupon fact types, but advices do not remove any degree of freedom from fact types [44]. Inother words, they do not set any obligation or prohibition on business conduct, and anynecessity or impossibility for knowledge about business operations [44]. The following isan example of an advice: “A Customer Loan may be handled by a Personal Assistant”.

In the same way, the following elements are not business rules:

• Events: they express actions performed by an actor in a specific moment in time. Abusiness rule can be analyzed to find events where it needs to be evaluated.

• CRUD1 operations: they are events that always result in data rather than a businessrule.

Exceptions are not BRs per se, but violations to rules. However, they are used to formulateand organize logical and coherent BRs. In a business rule context there should not beexceptions; instead, well stated business rules.

1Create, Retrieve, Update and Delete.


Figure 3.2. Element of guidance metamodel [44].

3.1.5 The Relation between Business Rules and Software InformationSystems

Business rules capture the decision logic needed for activities in business processes and pro-duce multiple events in processes [44]. Typically, software systems model and implementbusiness process, so we can say that business logic and rules are in source code (see Figure3.3), at least implicitly. For example, the system messages, when an operative exceptionhas occurred, have implicit BRs, if they provide the users with guiding messages.

Figure 3.3. Relationship between information systems and business rules.

Business rules are key components within the structure of business processes and soft-ware models [6][44]. They evaluate facts in a process and can control the basis for changesin the flows of processes in which decisions appear. At business level, business rules enablethe business to make consistent operative decisions, to coordinate processes, and to applyspecialized know-how in the context of some product/service [44]. At software level, busi-ness rules guide flows in procedures and constraints the facts allowed. In any case, if abusiness rule changes, the business processes and the software modules associated to that


rule need to be changed [23], in order to restore the compliance with the business rules[7].

3.2 Extraction of Structural Business Rules

Business rules rely on fact types and, in turn, fact types on business vocabulary. For theautomatic extraction of structural BRs from database components we followed the samereasoning. First, we extracted business concepts; then, we created fact types by relatingthe concepts through verb phrases and; finally, we generated sentences and business rulesby adding quantifiers, modal and logical keywords to fact types. In this process, we triedto map every structural database component, including tables, columns, constraints andcomments, to the business rules structures, i.e., concepts, verb phrases, fact types andrules. We defined a set of heuristics for BRE, which is used in an algorithm, and also abusiness rule DB model, which represents the format of the extracted rules.

For defining and refining the mappings and heuristics, we analyzed the database ofSIFI. Because of the large size of the system, we decided to start the work on the Pro-gramming and Payment (PP) module, which is one of the largest modules of the systemand it is used by almost all of the other SIFI modules. The module handles common tasksin many processes of a trust company: trust tax payments, contracts and invoice pay-ments, investment discharges, and in general, payments to trust suppliers. We analyzedthe portions of the SIFI database that correspond to the PP module.

3.2.1 DB - BR Mappings

We identified a set of database components to extract BRs from, and defined DB-BRmappings using basic heuristics (see Fig. 3.4). For this, we analyzed the DB core of thePP module, composed of 25 tables and considered standard DB elements, so that theheuristics would work for other systems. The following structural database componentswere considered2:

Tables: they often model noun concepts from the business domain. A table representsa unique concept, which is extracted from its comments and/or its name. For instance,the table called FD TMOVI stores the movements of trust payments, so the concept ofthe table is Trust Payment Movement. Nevertheless, there are tables that only relatetwo or more tables (many-to-many relationships), resulting in those that represent verbphrases, instead of concepts. For example, the table called FD TMVCR represents therelationship Rejection Cause per Trust Payment Movement. This table only relates thetables FD TCSRZ (Movement Rejection Cause) and FD TMOVI, thus representing arelationship. In this case, the table expresses the has relationship and the fact type TrustPayment Movement has Rejection Cause.

Table columns (attributes): they often model noun concepts and fact types. Atable column also has a unique concept, and its relationship with the table that belongto, represent a fact type. The comments of columns are used for extracting concepts andverb phrases. Initially, the fact types that can be created with column concepts are thoseof class property. For example, the column MOVI ESTADO, which belongs to the table

2All the examples in this and the next sections are from the PP module. The original data is in Spanishand we translate here some of the examples.


FD TMOVI, has the concept State (i.e., Estado in Spanish), so the property fact typeextracted from the column and the table is Trust Payment Movement has State.

Foreign keys/constraints: these constraints represent relationships between tableconcepts, i.e., fact types. The verb phrases of these relationships are extracted from thecomments of the tables/columns that the constraints reference. For example, the tableFD TMOVI has a foreign key that references the table called FD TOPER (which has theconcept Trust Movement Operation); therefore, the fact type that represents the foreignkey is Trust Payment Movement generates Trust Movement Operation. In this case, theverb generates is extracted from the comments of the columns involved in the constraint.

Primary and unique keys/constraints: these constraints are used only to ensureuniqueness and identification of data. Therefore they do not map to any concept of BR.

Check constraints: they are conditions that always should be satisfied when addingand updating data in tables. This means they cannot be violated and, therefore, theyoften encode structural BR. We found three types of check constraints in the PP module:non null constraints, check list constraints, and others.Non null constraints verify that columns always have non null values. A non null constraintoperates only on one column of a table. For example, the table FD TMOVI has a con-straint that checks that column MOVI ESTADO has non null values (MOVI ESTADOIS NOT NULL). A BR that can be created, from the constraint, is A Trust PaymentMovement always has a State, which relies on the fact type Trust Payment Movementhas State (created from the concepts of the table and column, and their relationship).Check list constraints verify that column values always are in a finite list of values. Wefound three possibilities regarding the meaning of values: values meaning classes of con-cepts, states of concepts, or operative parameters. The meaning of the values is automati-cally determined using the comments of the columns involved in the constraint. For this,it is expected that the meaning of each constraint value is explicitly defined in the com-ments.Regarding the classes of concepts, we used the heuristic that classes are often repre-sented by nouns. In this case, the nouns corresponding to each value are used to createcategorization fact types. For example, the table called GE TFORMULA (Formula ofMovement Concept) has a column called FORM CLASE (Formula Class) and a checklist constraint with the code

FORM CLASE IN ( ’CD’ , ’SD ’ )

The value ’CD’ corresponds to the concept Discount Formula and the value ’SD’ to theconcept Non-discount Formula. Both concepts are extracted from the nouns in the com-ments and represent categories of the column concept. Then, the created categorizationfact types are: Discount Formula is a category of Formula Class, and Non-discountFormula is a category of Formula Class.Regarding the values that encode states of concepts, the heuristic used was: wordscorresponding to each value are verbs in past participle or, in some cases, adjectives. Forinstance, the column MOVI ESTADO (table FD TMOVI ) is used in the check constraint’scode

MOVI ESTADO IN ( ’A ’ , ’ I ’ , ’T ’ , ’P ’ , ’X ’ )

where ’A’ is authorized, ’I’ is inserted, and ’X’ is canceled, etc. For this type of checklist constraints, the values are used to build unary fact types, such as: Trust Payment


Movement is authorized or Trust Payment Movement is canceled .Regarding values that mean operative parameters, we found that in the PP modulethey are used for guiding the operations and processes in the system. In other words, theydo not represent structural knowledge. For example, the column FORM IMP MUNIC(table GE TFORMULA), with the concept City Tax, is used in the check constraint’scode

FORM IMP MUNIC IN ( ’S ’ , ’N ’ )

where there are two values: ’S’ (Yes) and ’N’ (No). The column comment indicates anoperative condition that means if the formula should or should not consider City Taxes.Other check constraints are those that involve any other logical conditions that alwaysshould be satisfied. For example, the column called MOVI VLRMOV (table FD TMOVI ),which is the Movement Value, is used in the check constraint’s code

MOVI VLRMOV >= 0

which means that the value cannot be negative.

Table/column comments: comments are descriptions in natural language abouttables and columns in the database. They are used to extract noun concepts and facttypes related to all the other DB components. When comments are not available in thedatabase, concepts and fact types must be manually assigned or extracted from othersources, such as labels in the presentation layer of the information system.In the case of SIFI, we were able to automatically identify the concepts of tables andcolumns from their comments (in most cases), using a Part-Of-Speech (POS) tagger - weused the TreeTagger3, which works for Spanish. By analyzing the DB comments of the PPmodule, we also realized that comments often contain more than one noun or composednouns; thus, based on some informal tests we made about the completeness of the conceptregarding the number of nouns, we decided to join up to three nouns to create a concept.Additionally, for creating fact types, on each column comment we expect to find verbs thatassociate concepts. When the column comments only contain nouns, the extracted verb isthe one used for property fact types, i.e., the verb has. When the comments contain verbs,they are identified with the POS tagger and used to create fact types. For example, forthe fact type Trust Payment Movement generates Trust Movement Operation, the verbgenerates is extracted from the comment of the column MOVI OPER (table FD TMOVI ).Finally, the POS tagger was used to identify nouns, verbs in past participle and adjectives,for establishing the meaning of values in check list constraints.

3.2.2 BR Format and Schema

Based on SBVR Structured English and RuleSpeak, we defined a BR format, which wasimplemented in a database schema. The schema represents the basis for the development ofan automated General Rulebook System (GRBS) [44], that allows to store and track all theknowledge around the BRs, including concepts, fact types, rules, and their relationships.The general goal of a GRBS is to provide means for the smart governance and a corporatememory though traceability [44]. Our schema has the same long term goal.

The schema we defined is composed of eight database tables (see Fig. 3.5). Sevenof them model the BR concepts and the other one, the bs t object concept table, models

3TreeTagger can be found at http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/.

http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/


Figure 3.4. DB - BR component mappings.

the links between the DB components and noun concepts. As a result, the BR schemanot only models and formats BRs but also provides support for tracking the relationshipsbetween DB and BR components.

According to the schema, concepts and verbs are terms used to compose fact types. Inturn, fact types are complemented with quantifiers and keywords to compose sentences;and finally, the combination of sentences and keywords, compose BRs. Examples of quan-tifiers are the following: every, at least, maximum, exactly, more than one, and between.Keywords are logical (if, only if, and, or, etc.), operative (must, can) or structural (always,never).

A sentence is composed of two quantifiers, one per fact type concept, and a keyword.For instance, the sentence A Trust Payment Movement always has a State has the quanti-fier “a” for both concepts and the structural keyword “always”. In turn, a BR is a sentenceor a composition of two sentences joined together with another keyword. For example, aBR composed of two sentences is: A Trust Payment Movement only is authorized if theMovement has no Rejection Causes.

There are two important design elements of the schema: a fact type can be composedby one or two concepts, having unary and binary fact types only; and in the same way, abusiness rule is exclusively composed of one or two sentences. These design decisions werebased on the BR reduction principle expressed in [44]. The principle aims at producingeasily-understood and granular rules that can be independently managed, re-used, andmodified.


Figure 3.5. Implemented BR database model.

3.2.3 BRE Algorithm

The BRE algorithm has two parts: (1) extraction and processing of concepts and facttypes; and (2) creation of the business rules. The input of the algorithm is a list of tablesof the target system (in this case SIFI), which are processed one by one, and the outputis a set of persisted concepts, fact types, and rules, following the schema defined above.From each table, the comments and the structural elements mentioned above are used.

The first part of the algorithm creates the structural vocabulary, necessary for gener-ating BRs. The following steps are followed in the first part:

• For each table extract the comments. From the comments extract the nouns. Createthe noun concept by using up to the first three nouns encountered in the comment. Ifthere are no nouns or no comments, then use the table name as noun concept. Insertthe noun concept in the database (unless already exists). The relationship betweenthe noun concept and the table is inserted (for DB-BR component tracking).


• For each column that does not correspond to the foreign or primary keys extract thecomments. If the comment contains the word “if ”, then do not process it, else extractthe nouns. Create the noun concept by using up to the first three nouns encounteredin the comment. If there are no nouns or no comments, then use the column nameas noun concept. Insert the noun concept in the database (unless already exists).The relationship between the noun concept and the column is inserted.

• For each check constraint in a column. Extract the nouns, the past participle verbs,and the adjectives from the comment of the column.If the check constraint is a list constraint (the PL/SQL condition of the constraint isparsed and verified), then extract the values of the list. If these values only include“Yes/No”, then exclude the constraint, because they do not represent structuralconstraints (instead, they are used for guiding procedures).If the number of nouns is greater than 2 and the number of past participle verbs isless than 2, then the values represent classes of the column concept, so insert eachvalue as a new concept and create a category fact type between this new conceptand the column concept. Else, the values represent states of the column concept, soinsert each value as a verb phrase, thus creating unary fact types.

• Obtain the foreign keys for each table and its parent tables. For each foreign key,obtain the columns of the table involved in the foreign key. Identify the verbsfrom the comments of the columns. If there are no verbs then use the verb phrase“has/belongs to” as the verb for the fact type created between the table concept andthe concepts of its parent tables. Else concatenate the verbs as the verb phrase ofthe fact type.Repeat the procedure with the children tables.

The second part of the algorithm generates the BRs:

• For each table, its parents and its children tables, do the following: for each columnthat only have a non null constraints obtain the fact types related with the column(see above). For each fact type, create and insert the sentence and the business ruleby adding the keywords “each”, “always”, and/or “one” to the fact type (follow-ing the schema defined). For example, for the fact type Deposit has TransactionValue, the keywords are placed always in this way: Every Deposit always has oneTransaction Value.

The summary of the heuristics used in the algorithm is presented in Table 3.1.

3.2.4 PP Module Specific Heuristics

The above algorithm evolved in several iterations, where the results were investigated man-ually by the developer of the algorithm (not an expert in the domain, yet knowledgeable)and new heuristics were introduced based on these results. Most of these heuristics wereintroduced to make the approach more conservative. Despite that these heuristics couldaffect the generalization of the approach, we find them useful for increasing the precision.

For example, three conditions were necessary to detect and avoid columns that repre-sent operative conditions and non-structural rules: (1) the presence of parametric valuesin the check list constraints; (2) the presence of the word “if” in the column comments;


Table 3.1. Summary of the heuristics used in the BRE algorithm.

Concepts are created from the first three nouns in thecomments of tables/columns.

Concepts are created with the name of the table/column ifit was not possible to identify the nouns in the comments.

Unary fact types are created from check list constraintswhich values encode state of concepts, if the column

comments contain more past participle verbs than nouns.

Category fact types are created from check list constraintswhich values encode classes of concepts, if the column

comments contain more nouns than past participle verbs.

Property fact types created from columns that do notbelong to FKs or PKs, with the verb “has”.

Property fact types are created with all the verbs extractedfrom the comments of the columns that belong to FKs.

Property fact types are created with the verb phrase“has/belong to” from the columns involved in FK, if it was

not possible to identify the verbs in the comments.

Business rules created are from property fact types ofcolumns involved in non null constraints (those having the

verb “has”), by adding the keywords “each”, “always”,and “one”.

and (3) the acceptance of null values in the columns. This allowed us to filter out extractedBRs from 1,010 to 870.

Another heuristic addresses the presence of the word “which” in the column comments.We noticed that columns having this word represent elements of business operations. Inother words, they must be used to extract operative BRs (non-structural BRs). The samecase happened with columns that had values meaning states; generally, they representcompleted actions and, therefore, they can be used to create operative BR. In any case,those concepts and fact types were extracted, and we expect to use them for proposing amethod for extracting operative rules as future work.

Apart from that, we found some problems related with the design of SIFI. An issueis that some categories between concepts that were detected manually from foreign keys,could not be extracted automatically. For example, the table called FD TFIDE, whichrepresents the concept Trust, has a foreign key to the table GE TCIAS, which representsthe concept Company. This foreign key represents a category of Company, resulting in thecategorization fact type Trust is a category of Company. We did not find an automaticway for extracting these kind of fact types. Apart from that, we found other cases in whichtables have the unique role of relating tables (many-to-many relationships), resulting incomposition fact types. At this stage we did not extract those fact types.

Another fact we noticed was that some presumed nullable columns may not be nullablein reality. In this case, we assumed the constraints that check the column values are inhigher layers of the information system, i.e., in presentation or application logic layers.Extracting BR relevant information from those layers is subject of future work.


3.3 Summary

In this Chapter we presented the BRE approach for legacy databases. We performed arevision of the BR concepts based on the SBVR standard, defined DB-BR componentmappings, heuristics, and a business rule format, concluding with the design of a BREalgorithm, which was implemented in the reverse engineering tool, described in detail inthe next Chapter.

CHAPTER 4

ReTool: An Oracle Forms Reverse Engineering

Tool

ReTool is a static Reverse Engineering tool for Oracle Forms information systems. Thegeneral purposes of the tool are basically two:

• To synthesize all the technical and domain knowledge around the target systems1,and

• To support the maintenance and understanding processes in Oracle Forms systems.

ReTool also has a specific purpose, which consists on applying these general purposes toSIFI, the main information system that ITC develops and maintains, in order to tacklethe maintenance issues of the system.

The general features of the tool are:

• Extraction, analysis and persistence of the structure and dependencies of forms anddatabase objects.

• Simple and advanced searching of objects.

• Graphical browsing of dependencies between objects: object calls, queries and tablereferences, and sequences of calls.

• Visualization of the structure and components of objects (forms, tables, packages,etc.)

• Source code visualization of executable PL/SQL objects.

• Data extraction for the target system database, through the generation of SQLINSERT statements.

• Data export of the processed information to Excel files.

1The target system is the software system subjected to reverse engineering (e.g., SIFI).

29

CHAPTER 4. RETOOL: AN ORACLE FORMS REVERSE ENGINEERING TOOL 30

The development of the tool was led by the author of this thesis, iteratively andincrementally, receiving the feedback of the ITC’s developers. The final product is amodular and useful tool, which effectively has supported them in their tasks since earlierversions of the tool.

This chapter describes ReTool in detail and is organized as follows: section 4.1 describesthe Oracle Forms Technology (version 6i), emphasizing on the problems of this technologyfrom the maintenance point of view, sections 4.2 and 4.3 present the tool in detail, includingits architecture, its components and features. Some technical and design aspects of thetool are described in Section 4.4. Finally, the limitations and some general improvementsto the tool are covered in Section 4.5.

4.1 Oracle Forms Technology

Oracle Forms version 6i is an enterprise application technology that provides a two-layerdistributed client-server architecture [55]. The user interaction with the system is achievedthrough several form modules and the input and processed data is stored in an OracleDatabase. Basically, a form module has several visual components (Figure 4.1 shows anexample of a form module): input text fields, checkboxes, buttons, radio buttons, titles,labels and so on.

Figure 4.1. Example of a form module of an Oracle Forms application.

Each visual element has execution units that are triggered when events occur, forexample, when the user passes the mouse over an element. These execution units arecalled triggers (or form triggers) and their behavior is analogous to the database triggers,those that are executed when there is an insert, update or delete on a database table.Figure 4.2 presents the general architecture of Oracle Forms. An Oracle Forms applicationis composed of one or more forms (form modules), which are the graphical user interfaceof the application. Each form module contains one or more data blocks (or just blocks),and each one includes one or more items. Blocks are created to perform CRUD operationson data sources, often tables, while items are used to represent the columns of sources


(tables). Items and blocks have events that are handled by triggers which communicatewith other execution objects: program units in the form, procedures in libraries, andpackages and procedures of the database. Please refer to [3], for more information on theelements that compose a form module.

Figure 4.2. General architecture of an Oracle Forms application [3].

The business logic in Oracle Forms technology is implemented on both client and serverthrough the PL/SQL language. This type of two-layer architectures has some advantagesover one-layer ones [55], for example, there is a load distribution among clients, the job ofthe server is executed under only one context and the performance of the system is oftenhigh because of the high coupling possibility of the system. However, these advantageshave some limitations from the evolutionary point of view [2][55]:

1. Scalability issues due to the centralized nature of the server and decentralizationof the clients. A modification in the system often implies a large update in all theclients.

2. Strong coupling since the business logic can be integrated (mixed) with the view (thegraphical user interface) and data management layers, and therefore, the businesslogic is difficult to isolate and change.

3. Strong dependence on the tools’ vendor, which is risky if the vendor stops providingsupport. Also, these low-layer architectures have very poor portability.

Oracle Forms version 6i has been de-supported by Oracle since 2005. New versions ofthis technology include Web deployment, JEE and Web services integration2.

4.2 ReTool’s Architecture

ReTool is organized in several macro components that are common in a reverse engineer-ing tool [32]: extractors, analyzers, a repository and visualizers (Figure 4.3). The sources

2Source: http://www.orafaq.com/wiki/Oracle_Forms

http://www.orafaq.com/wiki/Oracle_Forms


of information of the tool are the form modules and the database objects of the targetsystem. First, the sources are processed by the extractors (Forms Parser, Forms and DBAnalyzers), then, their output data is stored in the repository (Metadata database), later,the analyzers process and synthesize the extracted data into information in the reposi-tory (Forms and DB Analyzers), and finally, this information is displayed in a visualizer(Information Displayer and Web Visualizer). The previous is the general process that adeveloper should follow to reverse engineer an Oracle Forms system with ReTool.

Figure 4.3. General architecture of ReTool.

4.3 ReTool’s Components and features

This section describes each component of ReTool regarding its inputs/outputs, featuresand quality attributes, and the way it should be used.

4.3.1 Extractors and Analyzers

ReTool has three extractors and analyzers:


DB Analyzer: this java command-line program processes the target system database,extracting all its structural objects, including tables, views, constraints, indexes and se-quences, and executable objects such as PL/SQL executable packages, procedures, func-tions and triggers. This component also processes the calls between executable objects,at procedure level and also, the queries and tables that are referenced in them. All thisinformation is stored in the ReTool repository. Figure 4.4 shows the command-line optionsof this program.

Figure 4.4. Command-line options of the program DB Analyzer.

Forms Parser: written in C/C++, this command-line program processes the OracleForm Source Code files (FMB files) of the form modules of the target system. It extractsthe structure of a form and creates a hierarchical structure of folders and files with theproperties of every element of a form, including data blocks, items, canvases, programunits, triggers, alerts, list of values (LOVs), record groups and windows. Figure 4.5shows, on the right, the structure of the form CUSTOMERS (visualized by the OracleForm Builder, an Oracle development tool) and, on the left, the folder-files hierarchy thatthe Forms Parser creates when executed. Each element of the form has a XML file withall the properties of the element. Figure 4.6 shows an example of a generated XML file ofthe item COMMENTS. The parsing process from FMB files is possible using the OracleForms API [18].

Both analyzers, the DB and Forms Analyzers, use ANTLR Parser Generator v3 toparse and create the AST of procedures, functions and packages. A PL/SQL grammarwas created based on the contribution by Patrick Higging3.

Forms Analyzer: written in Java, this command-line program processes all the fold-ers and files generated by the Forms Parser. This program processes all the extractedcomponents of a form, the calls of executable objects (program units and form triggers)

3http://www.antlr.org/grammar/1279318813752/PLSQL.g

http://www.antlr.org/grammar/1279318813752/PLSQL.g


(a) Structure of the form. (b) Folder-files hierarchy generated by theForms Parser.

Figure 4.5. Comparison of the structure of the form CUSTOMERS and output of the FormsParser.

and the queries and tables referenced by these objects. In the same way, the program isable to extract the referenced tables and views in data blocks of a form. Figure 4.7 showsthe command-line options of the program.

Dependencies Analyzer: written in Java, this command-line program processes allthe calls among the executable objects and the tables referenced by those objects, tobuild a dependencies tree. In this way, it is possible to know all the paths (sequencesof calls/dependencies) among objects and ’directly or indirectly’ all the tables that arereferenced by a form and its components. The complete hierarchy of tables are alsoprocessed by this program and stored in the repository. Figure 4.8 shows the command-line options of the program.

4.3.2 Repository

The ReTool’s repository is a metadata database, designed for the Oracle RDBMS, thatstores every Oracle Forms/Database object and each of their attributes, for example, theirname, the target system they belongs to, the source code in the cases of executable objects,


(a) Properties of the element. (b) Generated XML file with information of the element.

Figure 4.6. Comparison of the elements of the item COMMENTS and the generated XML fileby the Forms Parser.

the description or comment and their properties, in the case of Oracle Forms objects. Thedependencies between objects are also stored in this database. The repository is dividedin four parts:

• One portion stores the target system database objects and the dependencies betweenthese objects. All these repository objects of this portion has the prefix “db ”, asseen in Figure 4.9.

• Other portion includes the tables and objects related with target system forms. Allthese objects have the prefix “mm ”.

• The tables and objects related with the parsing processes on the target systemcompose other repository portion. All these objects have the prefix “pr ”.

• Finally, the rest includes the tables and objects related with the Business ProcessExtraction approach detailed in Chapter 3. All these objects have the prefix “bs ”.

The database also has constraints that ensure the consistency and integrity of data andindexes that speed up the information retrieval of the visualizers. In addition, there areseveral database views that are useful for the visualizers as they condense and summarizeinformation. Since it is expensive to build complete sequences of calls between two objects,as the amount of data could be very large, ReTool’s repository implements Materializedviews [25] with indexes that speed up the queries. The dependencies tree generated bythe Dependencies Analyzer is stored in two special tables and the materialized views arecreated referencing those tables. There are also tables and materialized views for thecomplete table hierarchy. The drawback with the materialized views approach is thatthey require to be refreshed in order to include new data in the repository. But the mainadvantages of the approach are the speeding of information retrieval and the possibilityto know all the sequences of calls between objects.


Figure 4.7. Command-line options of the program Forms Analyzer.

Figure 4.8. Command-line options of the program Dependencies Analyzer.


Figure 4.9. Example of the repository metamodel, which represents the database tables and theirrelational components, of the target system


4.3.3 Visualizers

The visualizers in ReTool display the information stored in the repository. There are twoindependent visualizers:

Information Displayer: written in Java, this command-line program receives a textstring and the output is the information of the structure of the forms that contain theinput text in their names. The program has several arguments that display or omit certaininformation; for example, there is an option for hiding the tables that program units anddata blocks reference.

Web Visualizer: written in Java, this web application is the main visualizer andmaybe the most important component of ReTool. Referred here as ReTool Web, it displaysalmost all the information stored in the repository under a uniform scheme of visualization.Most of the web pages of ReTool Web look like the one shown in Figure 4.10, in whichthere are several boxes that contain information, each box displaying the result of a fixedor dynamic query to the repository.

Figure 4.10. Graphical User Interface of ReTool Web.

As shown in Figure 4.11, a box of a web page is composed of some elements:

• A toolbar which contains, from left to right, a blue icon, the title of the box, thenumber of displayed objects versus the total of objects of the box and the graphicalelement [ + ].

– The blue icon, if the mouse passes over it, displays a pop-up dialog box withsome options that apply for the entire box. For example, in Figure 4.11, there


is a button in the pop-up box that exports all the data of the box to an Excelfile (bottom-left corner image).

– The element [ + ] increases the size of the box to a fixed value, if the userrequires to visualize more data in the box. The graphical element [ - ] allowsto restore the box size.

• The content of the box include zero or more objects of the target system. Eachone has an icon on the left, depending on the object type, its name, its description(under the name), and an icon on the right with several options that can be doneon the object. The information presented in a box is retrieved in an on-demand anda paginated way. Each time the user scrolls down the box, the next page of theinformation is retrieved.

When the user clicks on the toolbar of a box, the content of the box is collapsed. If itis clicked again, then the box returns to its default size.

Figure 4.11. Graphical elements of a box.

Each option for the pop-up dialog of an object (bottom-right corner Figure 4.11) hasa letter with a specific behavior on the object, if clicked:

• I: for all types of objects (forms, tables, packages, etc.), this option displays a pop-upbox with specific information of the current object. For example, for table columns(Figure 4.12), the information displayed (from top to bottom) is the internal ID ofthe object, the object’s name, its type, the type of data of the column, if it is nullableor not, and the table that it belongs to.

• R: for executable objects (except packages), tables and forms, this option redirectsthe user to the web page where the dependencies tree of the current object is beingdisplayed.

• In: for tables, this option redirects the user to the page that generates the SQLINSERT scripts of the data stored in the target system database, based on a queryon the current table.

• J: for tables, this option redirects the user to the table hierarchy page of the currenttable.


• T: for forms, this option redirects the user to the page where the tables referencedby the current form are being displayed.

• F: for tables, this option redirects the user to the page where the forms that referencethe current table are being displayed.

• P: for all the components of a form, including data blocks, items, program units,and so on, this option displays the properties of the current component. This optionis only displayed in the web page of the current form’s structure.

• C: this option displays the source code of an executable object.

• IO: this option applies for executable objects (except packages), tables and forms.It adds the current object to the session, as a source or target object for the callsequences feature.

Figure 4.12. Pop-up dialog displayed for a table column FOND DESCRI.

The icons of the objects, according to the type of object, are shown in figure 4.13. Thefirst three icons are the most used in the application, since the executable objects, formsand tables are the main and common objects in a Oracle Forms application (at least inSIFI). In the next paragraphs, every web page of ReTool Web is deeply described.

The first web page of ReTool Web is the login page (Figure 4.14). ReTool Web hasa common authentication mechanism, in which a username and a password should beprovided to access the application and the information it provides. Currently, there isonly one user of the application: retool; more users will be provided and different accesspermissions to the application pages. Additionally, the login page has a selector of thetarget system in which the user is interested.

When the user has logged in, the searching page, shown in Figure 4.15, is presented tothe user. All the pages of the visualizer have a header and body; the header has a targetsystem selector and a menu bar with the following elements:

• The home and log out buttons.

• The menus and options for navigating to every page of the application.

• A simple searching input for quick searching. This allows the user to perform asimple search (on the objects name) in any page of the application.


Figure 4.13. Icons of every type of object in ReTool Web.

Figure 4.14. ReTool Web’s login page.


In the center of the searching page, there is a searching input with the search andprevious results buttons; on the right side of the page, there is a searching/filtering panelwith several options such as, searching in the source code, in the comments of tables, orin the selected types of objects. Also, it is possible to search for objects that belong toother objects; for example, it is possible to search for columns of a set of tables by theirnames, or items that belong to a form specified by the user.

Figure 4.15. Searching page.

When the user performs a search, by entering a text string and pressing the searchingbutton or the enter key, the results are shown as in Figure 4.16, where each type of objectis located in a box. On the top-left of the results page, there is a button that takes theuser to the searching form for performing another search.

When the user clicks on the name of an executable object, in the results page or anyother page, the application redirects the user to the Object References page, show inFigure 4.17. This page has a special distribution:

• In the center of the page the Object Of Interest (OFI) is located.

• On the left column of the page, the boxes show the forms that reference the OFIand the tables that the OFI retrieves data from. In other words, the left columnand the objects located there express some kind of “input” to the OFI.

• On the right column of the page, the tables that the OFI inserts, updates or deletesare shown. In this case, the column expresses some kind of “output” or result fromthe OFI.

• In the box above the OFI, the executable objects that call the OFI are displayed.

• Finally, the box below the OFI shows the executable objects that are called by theOFI.


Figure 4.16. Searching results page.


In this page, if the user clicks another executable object, the boxes are updated with theinformation of the new current OFI.

Figure 4.17. Object References page.

When the user clicks on the name of a table, the application redirects the user to theTable References page shown in Figure 4.18, which is very similar to the Object Referencespage. The Table Of Interest (TOI) is located in the center, the top-left box shows theexecutable objects that perform any CRUD operation on the table, and the bottom-leftbox displays the table constraints. On the right, the table’s columns and triggers areshown, while the box above the TOI contains the parent tables of the TOI and the boxbelow the children tables 4.

Now, when a form is clicked, the application redirects the user to the Form Referencespage (Figure 4.19). The structure of the Form Of Interest (FOI) is shown on the left box.When the user expands a node of the tree, the objects are displayed in the table and alsoin the top-center box. The DB executable objects that the form references directly orindirectly are displayed on the bottom-center box. Finally, the properties of the currentform object are shown in the right box. The current form object could be any object ofthe FOI, for example a specific data block or item that belongs to the FOI. For this, theusers have to click the “proper” link of the object on the structure tree, or the P buttonof the object in the top-center box.

The Table Hierarchy page (Figure 4.20) presents the tree of all children tables of theTOI in the left box, and the tree of all the TOI’s parent tables in the right box. To access

4A parent of a table is another table that has an integrity constraint (a foreign key) to the former one.The children table depends on the parent regarding the data it contains.


Figure 4.18. Table References page.

Figure 4.19. Form References page.


this page, the users need to click the J option of a table, in the pop-up dialog, as explainedabove.

Figure 4.20. Table Hierarchy page.

One of the most important pages in ReTool Web is the Dependencies Tree page (Figure4.21). Given a form, executable object or table, the direct or indirect forward dependenciesare shown in the left box, and the reverse dependencies in the right box. In this way, itis possible to know sequences of calls and references between objects. For example, it ispossible to navigate the references of a form and the references of those references untilthe referenced tables. The navigation in the opposite direction is also possible.

ReTool Web is also able to retrieve and display the sequences of calls between a sourceobject and a target object (Figure 4.22). For this, in every page of the application, theuser can select the source and target objects (only forms, executable objects or tables)with the IO option in the pop-up dialog of the objects. The selected objects are shownin the top-left corner of every page and they can be changed for other objects. When thesource and target are already selected, the application enables the button that redirectsto the Call Sequences page, above the source/target dialog. In this page, the source andthe target are located in opposite directions, on the left and on the right, respectively. Inthe middle, all the sequences between them are displayed. Each sequence is in a frame inthe middle box.

The application can also display information about a package (Figure 4.23). ThePackage Information page has three boxes: the left box displays the forms that referencethe package, the middle box the executable objects (procedures and functions) that belongto the package, and the right box the tables that are referenced by the package. This pagecan be accessed by clicking a package.

The Tables-Form page (Figure 4.24) displays the tables that a form retrieves data (top-left box), updates (bottom-left box), inserts (top-right box), and deletes from(bottom-rightbox). The box below the FOI displays the tables that are data sources of the FOI’s datablocks. This page is accessed by clicking the T option of a form in its pop-up dialog.


Figure 4.21. Dependencies Tree page.

Figure 4.22. Call Sequences page.


Figure 4.23. Package Information page.

Figure 4.24. Tables-Form page.


In the same way, the Forms-Table page (Figure 4.25) displays the forms that insert(top-left box), delete (bottom-left box), retrieve data from (top-right box), and update(bottom-right box) the TOI. The box below the TOI displays the forms containing datablocks with the TOI as data source. This page is accessed by clicking the F option of atable in its pop-up dialog.

Figure 4.25. Forms-Table page.

In addition, ReTool is able to generate SQL INSERT statements of the data stored inthe target system database (Figure 4.26). The SQL Inserts Generation page presents theTOI’s columns on the left to the user, each one with an input box, and the user, afterentering a searching value on one column, presses one of the three buttons (above thecolumns) for retrieving the data related with the TOI:

• If the user presses the left button, “parent inserts”, the application retrieves thedata of the TOI and all its parent tables, the parents of its parents, and so on, andgenerates a script with the SQL INSERT statements of all the data, preserving theintegrity order in which the statements should be executed. In this way, it is possibleto use the generated script to create the data in an empty database.

• If the user presses the middle button “all inserts”, the application retrieves the dataof the TOI, its parent and children tables. The application considers the cases whenone child table has one or more parent tables different than the current table, so it ispossible to retrieve the entire database (all the data). Thus, this feature should beused carefully, because retrieving all the data could take too much time and couldalso increase the load of the target database.

• If the users press the right button “all + triggers”, then the application does thesame as the middle button, but at the beginning of the script, the statements thatdisable the triggers of the tables involved are placed and at the end of the script, thetriggers are enabled again with the proper statements. This prevents some errorswhen the SQL INSERTs are executed.


This feature is useful when a developer wants to copy data from one database to anotherfor performing testing in an isolated way. This page can be accessed by clicking the Inoption of a table in its pop-up dialog.

Figure 4.26. SQL Inserts Generation page.

Finally, ReTool Web provides a web page that shows the information of the reverseengineering/parsing processes performed on the target system (Figure 4.27). On the leftside of the page, all the processes are shown in decreasing order in time, the first processthat appears is the last process performed on the system. On the right side, the general andthe detailed information of the current selected process (by the user) is shown. The generalinformation displayed includes the process state (in progress, successfully finished, andfinished with errors), the type of processing (DB, Forms, Dependencies), the date and timewhen it started, the ending date-time, the spent time and the number of errors/warnings.The detailed information includes, for every type of object, the current number of objectsprocessed versus the total number of objects to process, the progress of the processing, itsstate and the number of errors/warnings. This page is accessed only in the Administrationmenu in the menu bar.

In short, ReTool Web is a simple and versatile visualizer, which includes several pagesthat describe the structure of objects such as forms, tables and packages. One of the maingeneral features of the tool is the object dependencies processing of executable objects,forms and tables, which can be browsed in several ways: in general, different web pagespresent this information, and in particular, the Dependencies Tree and the Call Sequencespages allow to explore all the direct and indirect dependencies of objects.

4.3.4 Utilities

Additional to the core components, ReTool has several command-line programs that com-plete the component portfolio of the tool:


Figure 4.27. Reverse Engineering Processes page.


• Business Rules Analyzer: performs the extraction of concepts, fact types, andbusiness rules of the target system. This program is the implementation of the BREapproach presented in Chapter 3.

• Forms Comparator: compares two forms (generally two versions of the sameform) using the folder-files hierarchies generated by the Forms Parser, and displaysthe differences between them regarding data blocks and items. Specifically, thisprogram displays if the items where added, removed or visibly changed (for example,if their visibility was changed to hidden).

• Form Tables Displayer: this program displays the tables that a list of formsreference to, including the CRUD operation on them.

• Lines Counter: counts the total number of LOC of all executable objects of atarget system.

• Paths Displayer: displays the dependency tree of a list of executable objects. Thisprogram does not require refreshing the materialized views.

4.4 Further details about ReTool

ReTool supports the processing of multiple target systems. The extractors, analyzers andvisualizers have several options to change from one system to another. In the case ofthe DB, Forms and Dependencies Analyzers the command-line argument -dbs is used forselecting the system, while in ReTool Web, the selector in the login page and the headerof every page changes the current system. In addition, the tool supports incrementalprocessing of objects, which means that the tool is able to process the selected type ofobjects instead of processing all types (all the objects). In the same way, it is possible toprocess and update specific objects of the target system. For example, the analyzers allowto process Table X, Y and Z of a determined target system, through the command-linearguments of the programs.

All the extractors and analyzers are multi-threading, which reduces the processingtime of the systems. For an information system as SIFI, the average processing timeof the database is about 30 minutes, for the forms is 3 hours, and for the dependenciestree 2.5 hours. In total, the average processing time for SIFI is 6 hours. The time forrefreshing the materialized views is about 1-2 hours, depending on the amount of data inthe repository and the server load. Apart from this, the analyzers stores data about theprocessing, including the state, the type of processing, the date and time when it started,the ending date-time, the spent time and the number of errors/warnings. In this way, it ispossible to know, for every type of object, the current number of objects processed versusthe total number of objects to process, the progress of the processing, its state and thenumber of errors/warnings. By the way, ReTool is able to process the objects of the targetsystem regardless an error in specific objects; in other words, the tool is fault-tolerant.

Finally, ReTool is designed and implemented in C/C++, Java, several web technologiessuch as HTML, XML, XSP and JavaScript and with ITC specific technologies. TheReTool’s repository is designed for the Oracle 10g RDBMS, and ReTool Web is supportedfor Apache Tomcat 7.0.22 or later. The main operating system supported for ReTool isWindows XP or later, but it can also work for Linux and other operating systems that


run the Oracle JVM. The unique running restriction is that the Forms Parser only worksin a Windows machine with Oracle Forms 6i installed.

4.5 Limitations and future enhancements

The ReTool’s features and benefits has been described previously. However, ReTool havesome limitations that will be addressed in the future. The following are some limitationsand enhancements to be done to analyzers and extractors:

• Currently, ReTool detects wrong procedure/function calls when parsing the sourcecode. For example, the usage of cursors with parameters are taken as functioncalls. Although, the false positives are validated and not stored in the repository,the performance could be improved and the amount of process warnings could bereduced.

• The target system processing time is high, mainly when refreshing the materializedviews. We need to consider other alternatives for these views.

• Oracle Reports5 is the complement technology of Oracle Forms. SIFI has manyreports implemented on this technology but ReTool does not support it. In thesame way, object libraries are not processed by the tool. We will move forward toperform reverse engineering on these components.

• In ReTool, the target systems need to be updated periodically in order to keep therepository with the system last changes. Currently, this updating process is manual:a user needs to check in the Version Control System (VCS) of the target system toknow the changes performed on the objects and used them as an input to ReTool.We will develop an automatic way of doing this, through the integration betweenReTool and VCS.

• The ReTool’s command-line tools are not easy to use compared to graphical appli-cations such as ReTool Web. In some cases, when there are many changes in thetarget system and it should be updated in ReTool, the usage of these tools could bevery difficult. Then, we will implement a GUI for the extractors and analyzers.

ReTool Web also needs to be enhanced in some aspects:

• The user may want to customize the visualization of the system; for example, changethe layout of pages or the size of boxes or the objects color. User customization is afeature for future work.

• Currently, the tool supports a single language: Spanish. We will move forward toimplement Internationalization.

• The tool does not allow the edition of Form and DB objects, including their sourcecode and properties. We will implement this feature in the tool.

5http://en.wikipedia.org/wiki/Oracle_Reports

http://en.wikipedia.org/wiki/Oracle_Reports


Finally, ReTool should offer, in the future, integration or interoperability with othersystems through exchange formats [32] or any other mechanism, an easy installation,dynamic analysis to the call sequences feature, as complement to the static analysis, anda metrics module.

4.6 Summary

ReTool is a reverse engineering tool for supporting the maintenance and understandingof Oracle Forms information systems. This Chapter described the tool in detail, includ-ing its architecture, components, features, technical design aspects, limitations and futureimprovements. The next Chapter discusses the results of the qualitative evaluation per-formed with a group of ITC employees.

CHAPTER 5

Evaluation and Discussion

This chapter presents the evaluation of ReTool and the structural Business Rules Extrac-tion technique.

5.1 Evaluation of ReTool

We performed a qualitative evaluation of ReTool, through a survey, since we were inter-ested in how the tool has been useful for the people in the company (ITC).

5.1.1 Purpose of the survey

The survey was conducted with the goal of getting feedback from the people that haveused the tool, taking into account some aspects such as:

• The usage of several tool features.

• The everyday tasks that the tool has supported.

• The level of gain in time, effort and precision in the understanding and maintenancetasks on SIFI.

• The users’ perception about the visual and information quality.

This feedback allowed us to establish how the tool has performed in the maintenanceand understanding of SIFI, and also how the quality of the tool is regarding the usagefrom the users. Also, we were able to detect some improvement aspects that constitutethe basis for additional future work.

5.1.2 Subjects

The target people were developers and maintainers of SIFI and SGF1. Forty-two people,including managers, functional and technical people constituted this group. The people

1SGF (Sistema de Gestion Financiera in Spanish) is another Oracle Forms information system ownedby ITC. The survey also included this system.

55

CHAPTER 5. EVALUATION AND DISCUSSION 56

ranged from junior to senior employees (developers and functional people) and from newto old employees in the company. However, we selected a subset of all the potentialrespondents of the survey, and chose the people that use the tool the most. In total therewere 29 respondents, representing the 69% of the target group.

5.1.3 Survey description

The survey consisted in 11 questions that can be divided into five categories (the completequestionnaire can be found in the Appendix):

1. Questions to categorize the respondents (2 questions).

2. Questions intended to know the tool usage and the tasks that are supported by thetool (2 questions).

3. Questions that evaluate the level of improvement in time, effort and precision in theuser tasks (3 questions).

4. Questions that evaluate some quality attributes of the tool (1 question).

5. Questions intended to know some tool improvements that users considered importantto make in the future (3 questions).

5.1.4 Results

The survey was answered by 29 people, 79,3% (23 people) of them corresponds to theSoftware Factory Group, the people that perform adaptive and perfective maintenanceon SIFI; 17,2% (5 people) belong to the Support Center Group, the people that performcorrective maintenance; and 3,4% (1 person) to the Functional Group, which is the groupthat know the business domain of SIFI/SGF. The distribution of the respondents is shownin Figure 5.1.

Figure 5.1. Distribution of the survey respondents by working group.

We also categorize the respondents by their working time in the company (Figure 5.2).It was found that 44,8% of the people has been working at ITC for more than 2 years,34,5% have been working between 6 and 12 months, 13,8% between 1 and 2 years, and


the 6,9%, the new people in the company, have been working from less than 6 months.People with more working time in the company are senior developers, while the rest arejunior (basically, the novices) or between junior and senior developers.

Figure 5.2. Distribution of the survey respondents by working time.

Regarding the usage of the tool general features (Figure 5.3), we found that the featuremost used is the DB/Forms object searching with 96,6% of usage; in other words, 96,6% ofthe respondents use this feature. The browsing of object dependencies is used in a 93,1%,the visualization of the objects structure in a 65,5%, the source code visualization in a41,4%, the extraction of data from a database in a 20,7%, and finally, the informationexport to Excel files is the least used, with a usage of 3,4%.

Figure 5.3. Level of feature usage of the tool.

Figure 5.4 shows the tasks in which the tool has been used and the degree of usage ofthe tool in each task. The tool has been mostly used in the measurement of changes impactwith 86,2% of the respondents who use the tool for supporting this task. The technicalunderstanding of the system (SIFI/SGF) is next with 62,1% of support. Bug detection is


supported in a 37,9%, followed by bug fixing and planning of maintenance tasks, both witha 34,5% of support. The detection of useless database/Oracle forms objects is next withthe 27,6%. The detection of inconsistencies between the design and the implementation isanother task for which ReTool has been useful with 17,2%, and finally, the least supportedtask is the functional understanding of the system with a 6,9%.

Figure 5.4. Level of tool usage on development tasks.

People were also asked about their effectiveness for completing the tasks and artifactsby using the tool, in terms of time (Figure 5.5), effort (Figure 5.6), and precision (Figure5.7). Only one person could not answer the questions related with this criterion, becausehe started to use the tool when he started to maintain SIFI, so he did not have comparisoncriteria.

We found that 14,3% of the respondents said that the degree of tool support in reducingthe time to complete tasks is Very much, and 71,4% said that this support is Much. Incontrast, 14,3% said that there was little tool support, and nobody said (0%) that the toolwas useless at all. Regarding the level of tool support in reducing the effort for completingtasks, 10,7% and 75% of the respondents said that this support has been Very much andMuch, respectively; and 14,4% of them said that the tool was little useful in this aspect.Again, no one perceives ReTool as a useless tool. Finally, regarding the precision of tasksand artifacts, 10,7% of the people rated the tool support as Very much, 64,3% as Much,25% as Little, and 0% as Nothing.

We also asked the people to rate some quality attributes of the tool in a 1-4 scale,having 1 as the worst score and 4 as the best one. Figure 5.8 shows the results, having thebest scored attribute: “Ways of searching the information” with a score of 3.45, and theworst, but well, scored attributes: “Completeness about the displayed information” and“Level of detail about the displayed information” with a score of 3. Among them are, indecreasing order, “Speed in the information displaying” with 3.29, “Intuitive informationvisualization” with 3.28, “Ways of browsing the DB/Forms objects” with 3.24, “Ways ofvisualize the information” with 3.21, and “Tool usability” with 3.07.

Finally, the survey respondents gave us several comments and suggestions about im-provements of the tool regarding the information it provides or could provide, and theway of visualizing such information. We received 39 suggestions that were classified intoseveral categories. However, some suggestions were not considered in this categorization


Figure 5.5. Degree of tool support in reducing the time to complete tasks.

Figure 5.6. Degree of tool support in reducing the effort to complete tasks.

Figure 5.7. Degree of tool support in increasing the precision to complete tasks and artifacts.


Figure 5.8. Rating of some quality attributes of the tool.

(Figure 5.9) because they were beyond the scope of the tool (1 suggestion) so they wereNot accepted, some of them were not clear so we did Not understand them (3 suggestions),and some of them contained comments about features and attributes that were alreadyProvided by the tool (7 suggestions). In this way, we classified 28 suggestions in thefollowing categories (Figure 5.10):

1. Eleven of them (39,3%) were about improvements in the tool usability,

2. Nine (32,1%) about the requirement for additional information that the tool shouldprovide,

3. Three (10,7%) about providing more semantic information,

4. Two (7,1%) about additional filters,

5. and there was one suggestion about code instrumentation, one about improving thespeed of searching and one about improving the searching feature (3,6% each).

Figure 5.9. Status of the received suggestions from the respondents.


Figure 5.10. Specific categorization of the non provided suggestions.

The Usability and Filtering categories correspond to “Visualization suggestions”, Ad-ditional and Semantic Info. categories to “Information suggestions”, and the others to“Feature suggestions”. In this way, we received 13 “Information suggestions”, 13 “Visu-alization suggestions” and 2 “Feature Suggestions” (Figure 5.11).

Figure 5.11. Categorization of the non provided suggestions.

5.1.5 Analysis and Discussion

Most of the tool users are technical, i.e., developers and maintainers. There are fewfunctional people using the tool, and we asked only one functional employee to answerthe survey. We think this tool is intended mainly for technical people because all theinformation that tool displays is about technical aspects of the target systems. However, wefound that people want more semantic and functional information, which was manifested insome suggestions. This can also be reflected on the low tool usage for the task “Functionalunderstating of the system” (Figure 5.4). In any case, junior and senior developers usethe tool as well as new and old employees2.

2By the way, all senior developers have a long working time in the company, while the junior ones ashort time.


Regarding the usage of the tool’s general features (Figure 5.3), the most important fea-tures have a high level of usage as expected: “Object Searching”, “Dependencies Browsing”and “Structure visualization”. However, we expected more usage of “Source code visual-ization”. In the suggestions, we received an improvement request about this feature, andwe think the main reason for this low usage is that the code cannot be edited in the tool,and the visualization is limited to highlighting and line numbering. On the other hand, weexpected low usage of Extraction of data from a DB because it is a very specific featureand is not necessary for all the people in the company. The usage of Data export to Excelfiles is also low because it is a new feature, so most of the people have not used it yet. Ingeneral, there is a high level of usage of the tool.

The tasks that the tool supported the most were “Measurement of the changes impact”and “Technical understanding of the system” (Figure 5.4). This was expected because ofthe large size and the high coupling of SIFI and SGF, and the fact that there is no doc-umentation about the systems architecture and domain, so these tasks are very commonin the company. As stated before, the “Functional Understanding of the system” is theless supported but required task; the rest of the tasks have been supported by the tool insome cases.

Regarding the effectiveness of tasks, people perceive ReTool as a very useful tool forsupporting their tasks (Figures 5.5, 5.6 and 5.7): 85,7% assigned a positive usefulnessin time and effort reduction versus 14,3% of negative usefulness, and 75% of positiveusefulness for increased precision versus 25% of negative precision. One of the reasonsfor the negative perception of usefulness is the usage level of the tool by some people.The tool usage varies because some people have roles in the development group that onlyrequire the tool in specific scenarios, and in some cases, there are people that have startedusing the tool.

In general, the perceived tool quality attributes by the users is high. The tool hasseveral ways of searching the information, the speed of the information displaying is high,the information visualization is intuitive, it has several ways of visualizing and browsingthe system objects, the tool is easy to use, it presents enough detail about the information,and the completeness about the information is high. However, the users suggested severalspecific improvements about some attributes and features:

• The tool should allow manual or automatic semantic clustering of objects, and ex-traction of high level processes.

• In several cases, the granularity should be more specific. ReTool granularity is atprocedure level.

• The tool should calculate quality metrics of the target systems.

• The tool should include processing and displaying of additional DB and Forms ob-jects, such as sequences or pop-up menus.

• The information visualization should be more integrated and synchronized, and alsomore specific through filters.

• The tool should be more intuitive regarding the searching and filtering feature.

Other suggestions were more specific regarding the target system. For example, one usersaid: ”The tool should display the version number of the objects”. In SIFI/SGF, each


object has a number that corresponds to the version, but this specific implementationdecision on these systems. ReTool could work for other Oracle Forms systems differentfrom SIFI and SGF.

In conclusion, ReTool seems to be very useful in the understanding and maintenanceof Oracle Forms applications. It improves the productivity of developers to complete theirtasks through several functionalities such as object searching, visualization and browsingof object dependencies and structure, visualization of source code, data exporting from adatabase to INSERT scripts, and export of the displayed information to Excel files. Thetool has been used in the company for more than one year and it has been successful.On the other hand, the tool needs to be improved in order to cover some aspects andrequirements that the developers and the functional people request.

5.2 Evaluation of the BRE technique

ReTool implemented the algorithm and DB-BR mappings and heuristics, presented inChapter 3, and it is able to extract:

• Concepts from tables and columns, using the comments in the DB.

• Verbs and binary property fact types from the table/column comments and concepts,and from the tables hierarchy (foreign keys).

• Categorization and unary fact types from check list constraints.

• Structural BR from the extracted fact types, using the non null constraints.

The input of the tool is a list of DB tables and the output is a set of extracted andstored concepts, verbs, fact types, sentences and business rules. The tool processed theDB components of the input tables and the tables related to them through the foreignkeys (parent and children tables) and created the output in about 6 minutes of executiontime. Table 5.1 presents some examples of the resulting components. All the extractedBRs and components can be downloaded from http://www.itc.com.co/WCRE12/3.

Table 5.2 summarizes the results of the BRE tool in the PP module. The 25 coretables of the module were the input to the tool, resulting in 155 processed tables and3,447 columns, for a total of 3,602 processed objects. The number of extracted conceptsfrom tables and columns was 2,508, of which 2,142 correspond to general concepts, i.e.,concepts of tables and columns, and 366 correspond to categorical concepts, i.e., conceptsextracted from columns involved in check list constraints, which values represent classesof concepts. In addition, the tool was able to extract 413 verb phrases and a total of 4,005fact types using the extracted concepts, verbs, and table constraints. 488 unary fact typeswere extracted (from check list constraints, whose values represent states of concepts), andalso 366 categorization fact types. The number of extracted binary fact types is 3,517;114 of those were extracted from foreign keys having verbs in the column comments, and177 are with no verbs in comments. At the end, 870 sentences and structural BRs weregenerated using non null constraints and the extracted fact model. 22% of fact types wereused to create the structural BRs.

3The dataset is available only in Spanish.

http://www.itc.com.co/WCRE12/


Table 5.1. Examples of the extracted BR components from the PP module.

BR component Examples

ConceptTrust Payment Movement,

Operation Check

Unary Fact TypeTreasury Movement Type is

tax-exempt

Binary Fact TypeExpenditure per Contribution

generates TreasuryMovement

Property Fact TypeCredit Note has Specific

Voucher Type

Categorization Fact TypeFixed Investment

Depreciation is a category ofInvestment Depreciation

Structural Business RuleEvery Deposit always has a

Transaction Value

5.2.1 Evaluation

Once we refined the algorithm and obtained these results, we conducted a study with fourof the ITC employees to evaluate the precision of the results, i.e., we wanted to identifythe false positives. At this stage, we cannot assess the recall of the tool, as we do notknow exactly how many BRs are encoded in the system and in the database, hence wecannot estimate how many we are missing. Since this work is in an industrial context,usability is critical. Hence, we wanted to make sure that the precision is not too low andthe future users will not have to investigate too many false positives, as that would proveto be a deterrent in using this tool. This is why we introduced some of the conservativeheuristics described in Chapter 3. After discussing with the maintainers and managers ofthe system, we decided that the lowest acceptable precision would be around 20-25%. Inother words, the people interested in the knowledge acquisition were not willing to lookat more than 3-4 wrong rules in order to get a good one (in average).

We selected a subset of the rules extracted by the tool, and asked four employees of thecompany to analyze them. The four subjects know well the functionality of the softwaresystem, the module where the business rules were extracted from, and have extensiveexperience in the fiduciary business. The objects of the study were 300 rules randomlyselected from the set of 870 rules extracted by the tool (34%).

Specifically, each expert told us if the rules extracted actually embody business knowl-edge or just system implementation details, or if they do not make sense and why. Sinceall the employees have high level of knowledge of the system and the fiduciary business,each rule was evaluated by only one human expert. Given the limited time resources, wedecided to ensure higher coverage of the results, rather than ensure redundancy to avoidevaluation mistakes. Thus, each subject judged 75 rules by answering some questionsabout them. Given a business rule R, the following questions were asked to each subject:

• Does the rule R make sense?


Table 5.2. Statistics of the extracted BRs and components from the PP module.

Statistic Value

# of processed objects(tables/columns)

3,602

# of processed tables 155

# of processed columns 3,447

# of concepts 2,508

# of general concepts(tables/columns)

2,142

# of categorical concepts 366

# of verbs 413

# of fact types (FT) 4,005

# of unary FT 488

# of binary FT 3,517

# of property FT 3,151

# of categorization FT 366

# of FT from FK (verbs) 114

# of FT from FK (no verbs) 177

# of sentences 870

# structural BR 870

% of BR from # of FT 22

• If yes, is R a rule of the fiduciary business (domain rule) or a rule of the softwaresystem (implementation rule)?

• If not, please specify what you do not understand about the business rule R.

Of the 300 extracted rules, 195 (65%) of the rules in the sample were deemed correct by thesubjects, while 105 (35%) were considered incorrect (see Table 5.4a). Within the correctrules, 87 (29%) were judged as business rules, and 108 (36%) as implementation rules (seeTable 5.4b). Overall, the subject were satisfied with the precision of the tool. In average,one in three rules was a good business rule and one more was a correct implementationrule, which the subjects also deemed useful (although our focus was strictly on the BRs).We focused our attention on the false positives (i.e., the 105 rules deemed incorrect).

Table 5.3. Results of the BR evaluation.

Total Correct Rules 195 (65%)

Total IncorrectRules

105 (35%)

(a) Number of correct and incorrect rules.

Correct Business Rules 87 (29%)

Correct ImplementationRules

108 (36%)

(b) Number of correct business and implementationrules.

As mentioned, we asked the subject to point out (where possible) or speculate what iswrong with the rules. Table 5.4 summarizes the obtained information. 57 of these ruleswere incorrect because their concepts missed some terms essential for their understanding,or had unnecessary terms that make them difficult to understand. We also found that 7rules made no sense because the same concept was repeated twice in the rules. In addition,


44 rules contained unclear concepts. We found that this lack of clarity comes from thecomments of tables and columns, from where these ill-defined concepts were extracted bythe tool. Among these incorrect rules, we found that 17 rules contain operative conceptsthat should be used for operative rules instead of structural business rules. In these cases,although the concepts were understood, the rules were incorrect because the keywords thatrestricted the fact types are wrong. 2 of the incorrect rules correspond to real businessrules (if fixed) and 13 to implementation rules (if fixed). An incorrect rule may be classifiedin several categories of Table 5.4.

Table 5.4. Quantitative information about the misunderstood/incorrect rules.

Rules Num.

Miss/add Terms 57 (54%)

No-sense Rules 7 (7%)

Wrong Comments 44 (42%)

Operative Concepts 17 (16%)

Implementation Rules 13 (12%)

Business Rules 2 (2%)

In summary, we detected three elements that negatively affect the precision of the BREapproach:

1. Poor quality in some table/column comments for the extraction of verbs, concepts,categorization, and unary fact types. In the case of categorization and unary fact types,the existence of table/column comments is mandatory. If comments do not exist in theDB, the mapping of all BR would be manual or would be extracted from other sourcesof information. We found comments with no natural language, such as, “—” or “****”;comments that specify examples of the concepts instead of the concepts; comments withincomplete descriptions that miss important terms describing the concepts; commentswith unnecessary and confusing information; comments with erroneous descriptions of thetables and columns; and comments with spelling errors. Our tool could be also used topoint out places where the comments need to be updated.

2. The precision of the POS tagger, which leads to missing or wrong detected nounsand verbs in comments. In the same way, this leads to incomplete extraction of concepts,verbs and fact types. We plan to try out other POS taggers in the future. In some cases,the POS tagger was misled due to misspellings. We plan to run a spell checker to clean thecomments in the future. A more complex natural language processing of the commentsmay be also needed.

3. Decayed architecture of the PP module as a result of the long evolution of thesystem. This is reflected in the number of nullable columns and the presence of parametricand characteristic columns4 in the same table. In some cases, this leads to the extractionof false structural BRs. In a few cases the tool creates non sense rules where a conceptis associated with itself, such as the rule: “Every Deposit always has a Deposit”. Thishappens with columns having a non null constraint and a primary key constraint. Wewill modify our algorithm to exclude these cases. Additionally, columns having a nonnull constraint and a check list constraint that verify ’S’ (Yes) and ’N’ (No) values werenot excluded. As explained in section 3.2.1, they are used for guiding the operations

4Characteristic columns are those representing basic information of the table’s entity.


and processes in the system, instead of structuring and organizing knowledge about thebusiness.

5.3 Summary

A survey was conducted with a group of ITC employees to know how the ReTool hasbeen useful for them. The results show that tool is very useful in the understanding andmaintenance of SIFI and SGF, as it improves the productivity of developers to completetheir tasks and the maintenance process of the systems now is easier. On the other,the proposed BRE approach was assessed. We conducted a study with four of the ITCemployees to evaluate the precision of the results, i.e., the correct BRs, extracted by thetool. We found that 29% of recovered rules are correct structural business rules, 36%correspond to implementation rules, and 35% are incomplete or incorrect rules. Theresults show that the recovery technique is practical, while there is room for improvement,and it will be used as basis for the recovery of additional knowledge.

6

Conclusions

This thesis presents an Oracle Forms reverse engineering tool, called ReTool. The toolaims at supporting the understanding and the maintenance of legacy information sys-tems, by providing automatic processing, searching and visualization of the system objectdependencies and structure. Moreover, the tool provides several features and attributesthat make it a versatile and useful tool. The evaluation shows the potential of the toolregarding the successful use case in the industry. The tool was applied to SIFI, a largeand complex financial legacy system, in which the tool supported several maintenance andunderstanding problems of the system, that developers address every day.

Furthermore, we presented an approach for automatically extracting structural busi-ness rules from legacy databases and its application on a specific legacy system. Specifi-cally, we performed a revision of the BR concepts based on the SBVR standard, proposedan approach that considers DB-BR component mappings to extract structural BRs fromdatabases, applied the approach in an industrial legacy information system, and performeda preliminary assessment of the extracted components and rules. The results showed that29% of the extracted rules are correct business rules and 36% are correct implementationrules. The tool was deemed useful and usable by the evaluators. The evaluation studyrevealed some of the causes for the false positives, which we plan to address in futurework (at least in part). While some aspects of our BRE algorithm are generic, most ofthe heuristics it uses are specific to the systems we worked on. The generalization ofour solution is beyond the scope of this work, yet we believe that our experience can beadapted to other existing systems.

6.1 How does ReTool support the maintenance and under-standing of SIFI?

ReTool implements several variants of reverse engineering standard techniques (see Sec-tion 2.2.1). ReTool creates structure trees and calls/dependencies graphs that are modeledby the ReTool’s repository metamodel. The visualization of this information is achievedthrough a web application, in which there are boxes that contain different informationthat result of specific queries, useful for the developers in their daily tasks. The boxeslayout in several pages organize and present the information to the user, in an on-demandand scalable way. The object searching feature allows the user to perform quick searchesand also advanced searching. This feature also allows the user to locate objects of interest,and obtain related information with specific objects or concepts. The page navigation and

68

CHAPTER 6. CONCLUSIONS 69

browsing of object dependencies and structure, allows the user to inquire about depen-dencies and change effects on objects and also to understand how the target system workstechnically. Additionally, the distributed character of the visualizer encourages the usersto test and use it, as it needs to be installed only in one machine.

Regarding the specific problems of SIFI, ReTool provides several ways to support itsmaintenance and understanding from developers perspective. All these advantages areevidenced by the ReTool evaluation in Chapter 5:

• ReTool processes the dependencies of objects automatically in SIFI. Before ReTool,this job was manual and tedious, because the users had limited tools for this: thetext searching option of development environments and tools such as the OracleForm Builder 1.

• ReTool allows the search of Forms and DB objects faster than other tools (for ex-ample, the Oracle Form Builder or PL/SQL Developer2) and in an integrated way.This means that users only need one tool for searching objects. Before ReTool, usersneeded at least two tools for searching: one for Oracle Forms and one for the OracleDatabase.

• ReTool presents the information about SIFI in an organized, condensed and graphicalway. This allows the users to have different focus points, regarding the informationthey need, to get synthesized information, to perform a structured maintenance/un-derstanding process and to obtain a visual representation of the software system.The information visualization of SIFI is very important as it addresses the invisibil-ity and complexity of the system [21][33].

• The BRE approach, implemented in ReTool, is a starting point for re-documentingthe business rules that implement SIFI.

The cost of all the ReTool advantages is the installation/configuration of its compo-nents and the processing time of the target system.

6.2 Future Work

As future work we will continue developing ReTool, regarding all the future enhancementsthat were presented in Section 4.5.

Additionally, we plan to implement and refine all the proposed DB-BR mappings. Therefinement will include:

• The analysis of temporary tables and check constraints with any type of conditions.

• The detection of roles in fact types [44], verb phrases from tables that only relateother tables (many-to-many relationship), and concept synonyms.

• The extension of the BR format/scheme, in order to represent several type of rulesin other forms, e.g., decision tables as a way for representing parallel business rules[44].

1See http://www.oracle.com/technetwork/documentation/6i-forms-084462.html and [17] to get in-formation about this tool

2http://www.allroundautomations.com/plsqldev.html

http://www.oracle.com/technetwork/documentation/6i-forms-084462.html

http://www.allroundautomations.com/plsqldev.html

CHAPTER 6. CONCLUSIONS 70

We also plan to perform a formal qualitative/quantitative evaluation of the approachin terms of precision and recall, and finally, we will move forward to the extraction ofoperative business rules.

6.3 Recommendations about the maintenance of SIFI

This thesis concludes by proposing some recommendations about how the maintenanceprocess on SIFI should be addressed by the company, based on the experience of theauthor and the application of reverse engineering on it using ReTool:

• We encourage the company to perform perfective maintenance on SIFI, in an auto-matic way if possible. Of course, this is not an easy task, but, at least, it shouldbe applied to those Oracle Forms/Database objects that are critical due to theircoupling.

• There should be a detailed strategy for this, including specific procedures and aspectssuch as software metrics and manual or automatic testing, that assures an effectiverefactoring. Reverse engineering and software maintenance tools such as Retool, aredesigned for making this task easier.

• In this sense, the adoption of useful maintenance tools is essential. Tools for conceptlocation, impact analysis, automatic testing, metrics, and so on, need to be includedin the SIFI maintenance process. We know that there are few tools for OracleForms applications, but it is possible to build them. ReTool is one example and oneimportant step for trying to improve the process on this technology.

• We also suggest the company to define and adopt development and maintenancestandards about SIFI in its specific context. Detailed procedures, methodologies,roles, best practices and tools should be clearly defined and enacted.

APPENDIX

ReTool Evaluation survey

ReTool is an application that supports the process of understanding and maintenanceof SIFI/SGF. This survey aims to evaluate the usefulness of the tool, from the use andperception of people working in ITC. The results will improve the tool, in order to supporteven more the understanding and maintenance activities in SIFI/SGF. Please answer thefollowing questions.

1. What is your working group in ITC?

m Functional Group (Investments Funds, Financial Investments, etc.)

m Technical Group of Investments Funds

m Technical Group of Financial Investments

m Financial Technical Group

m Technical Group of Trusts

m Transversal Technical Group

m Web Technical Group

m Support Center Group

m Other:

2. How long ago do you work in ITC?

m Less than 6 months

m Between 6 months and 1 year

m Between 1 and 2 years

m Over 2 years

3. What ReTool features have you used?

r Forms and DB Object Searching

r Object Dependencies Browsing

r Data extraction from the target database, through the generation of INSERTstatements

71

RETOOL EVALUATION SURVEY 72

r Object (forms, tables, packages, etc.) Structure Visualization

r Source code Visualization

r Others:

–

–

4. In your daily work in ITC, the tool has helped you to (check all that apply):

r Understand how SIFI/SGF works technically

r Understand how SIFI/SGF works functionally

r Detect SIFI/SGF defects

r Fix SIFI/SGF defects

r Detect implemented vs. designed inconsistencies

r Detect useless or unused objects, or objects that should be removed

r Measure the impact of changes of SIFI/SGF

r Plan maintenance activities in SIFI/SGF

r Others:

–

–

5. What is the degree of tool support in reducing the time to complete your tasks?Choose one of the following options.

m Very much

m Much

m Little

m Nothing

6. What is the degree of tool support in reducing the effort to complete your tasks?Choose one of the following options.

m Very much

m Much

m Little

m Nothing

7. What is the degree of tool support in enhancing the precision to complete your tasksand artifacts? Choose one of the following options.

m Very much

m Much

m Little

m Nothing

8. Rate the following attributes of the tool from 1 to 4, being 4 the highest score and1 the lowest

RETOOL EVALUATION SURVEY 73

• Intuitive information visualization

• Tool Usability

• Ways of visualize the information

• Ways of searching the information

• Ways of browsing the DB/Forms objects

• Level of detail about the displayed information

• Completeness about the displayed information

• Speed in the information displaying

9. What improvements do you think should be done to the tool, regarding the displayedinformation?

10. What improvements do you think should be done to the tool, regarding the way theinformation is displayed?

11. Write down the features, modules or pages of the tool in which you require person-alized training.

Bibliography

[1] S. Ali, B. Soh, and J. Lai, Rule extraction methodology by using XML for businessrules documentation, Industrial Informatics, 2005. INDIN ’05. 2005 3rd IEEE Inter-national Conference on, August 2005, pp. 357 – 361.

[2] P. Alvarez and J. A. Banares, Sistemas de Informacion Distribuidos, conceptos yestandares de arquitecturas orientadas a servicios Web., 2006.

[3] L. Andrade, J. Gouveia, M. Antunes, M. El-Ramly, and G. Koutsoukos, Forms2Net -Migrating Oracle Forms to microsoft .NET, Generative and Transformational Tech-niques in Software Engineering (R. Lammel, J. Saraiva, and J. Visser, eds.), LectureNotes in Computer Science, vol. 4143, Springer Berlin Heidelberg, 2006, pp. 261–277.

[4] N. Asif, Software reverse engineering process: Factors, elements and features, Inter-national Journal of Library and Information Science 2 (2010), no. 7, 124–136.

[5] I.S. Bajwa, B. Bordbar, and M. Lee, SBVR vs OCL: A comparative analysis of stan-dards, Multitopic Conference (INMIC), 2011 IEEE 14th International, IEEE, Decem-ber 2011, pp. 261 –266.

[6] I.S. Bajwa, M. Lee, and Bordbar B., SBVR Business Rules Generation from NaturalLanguage Specification, AAAI 2011 Spring Symposium, March 2011, pp. 2–8.

[7] I. Baxter and S. Hendryx, A Standards-Based Approach to Extracting Business Rules,OMG’s Architecture Driven Modernization Workshop, 2005.

[8] I.D. Baxter and M. Mehlich, Reverse engineering is reverse forward engineering,Reverse Engineering, 1997. Proceedings of the Fourth Working Conference on, IEEE,1997, pp. 104–113.

[9] K.H. Bennett and V.T. Rajlich, Software maintenance and evolution: a roadmap,ICSE ’00: Proceedings of the Conference on The Future of Software Engineering,ACM, 2000, pp. 73–87.

[10] I. Burnstein and K. Roberson, Automated chunking to support program comprehen-sion, Program Comprehension, 1997. IWPC ’97. Proceedings., Fifth IternationalWorkshop on, IEEE, 1997, pp. 40–49.

[11] I. Burnstein, R. Saner, and Y. Limpiyakorn, Using an artificial intelligence approachto build an automated program understanding/fault localization tool, Tools with Arti-ficial Intelligence, 1999. Proceedings. 11th IEEE International Conference on, IEEE,1999, pp. 69 –76.

74

BIBLIOGRAPHY 75

[12] G. Canfora and A. Cimitile, Software Maintenance, vol. 2, ch. 2, pp. 15–20, WorldScientific Pub. Co, 2002.

[13] G. Canfora and M. Di Penta, New Frontiers of Reverse Engineering, FOSE ’07: 2007Future of Software Engineering, IEEE Computer Society, 2007, pp. 326–341.

[14] E. Chikofsky and J. Cross II, Reverse Engineering and Design Recovery: A Taxon-omy, Software, IEEE 7 (1990), no. 1, 13–17.

[15] B. Cornelissen, Evaluating Dynamic Analysis Techniques for Program comprehension,Ph.D. thesis, Delft University of Technology, 2009.

[16] B. Cornelissen, L. Moonen, and A. Zaidman, An Assessment Methodology for TraceReduction Techniques, Proceedings of the 24th International Conference on SoftwareMaintenance, IEEE Computer Society, September 2008, pp. 107–116.

[17] Oracle Corporation, Oracle Forms Developer, Form Builder Reference, Volume 1,Tech. report, A73074-01, 2000.

[18] , Using the Oracle Forms Application Programming Interface (API), Tech.report, EIT-DE-WP-56, 2000.

[19] R. Crerie, F. Baiao, and F. Santoro, Identificacao de regras de negocio utilizandomineracao de processos, Companion Proceedings of the XIV Brazilian Symposium onMultimedia and the Web, ACM, 2008, pp. 241–246.

[20] J.H. Cross, T.D. Hendrix, and S. Maghsoodloo, The control structure diagram: Anoverview and initial evaluation, Empirical Software Engineering 3 (1998), no. 2, 131–158.

[21] S. Diehl, Software Visualization: Visualizing the Structure, Behaviour, and Evolutionof Software, Springer, 2007.

[22] P. Dugerdil, Using trace sampling techniques to identify dynamic clusters of classes,Proceedings of the 2007 conference of the center for advanced studies on Collaborativeresearch, ACM, 2007, pp. 306–314.

[23] A.B. Earls, S.M. Embury, and N.H. Turner, A method for the manual extraction ofbusiness rules from legacy source code, BT Technology Journal 20 (2002), 127–145.

[24] H. Gall, R. Klosch, and R Mittermeir, Abstract Pattern-Driven Reverse Engineering,Development and Evolution of Software Architectures for Product Families, SecondInternational ESPRIT ARES Workshop, 1995, pp. 334–341.

[25] S. Ghandeharizadeh and J. Yap, Materialized Views and Key-Value Pairs in a cacheAugmented SQL System: Similarities and Differences, Tech. report, USC DatabaseLaboratory, 2002.

[26] Object Management Group, Semantics of Business Vocabulary and Business Rules(SBVR), v1.0, 2008.

[27] J.L. Hainaut, A. Cleve, J. Henrard, and J.M. Hick, Migration of Legacy InformationSystems, Software Evolution, Springer Berlin Heidelberg, 2008, pp. 105–138.

BIBLIOGRAPHY 76

[28] A. Hamou-Lhadj, E. Braun, D. Amyot, and T. Lethbridge, Recovering Behavioral De-sign Models from Execution Traces, Software Maintenance and Reengineering, 2005.CSMR 2005. Ninth European Conference on, IEEE, 2005, pp. 112 – 121.

[29] D. Hay and K. Healy, Defining Business Rules - What Are They Really?, July 2000.

[30] H. Huang, W.T. Tsai, S. Bhattacharya, X.P. Chen, Y. Wang, and J. Sun, Businessrule extraction from legacy code, Computer Software and Applications Conference,1996. COMPSAC ’96., Proceedings of 20th International, August 1996, pp. 162 –167.

[31] A.C. Kalsing, G.S. do Nascimento, C. Iochpe, and L.H. Thom, An IncrementalProcess Mining Approach to Extract Knowledge from Legacy Systems, EnterpriseDistributed Object Computing Conference (EDOC), 2010 14th IEEE International,IEEE, October 2010, pp. 79 –88.

[32] H. M. Kienle and H. A. Muller, The Tools Perspective on Software Reverse Engi-neering: Requirements, construction, and Evaluation, Advances in Computers (M.V.Zelkowitz, ed.), Advances in Computers, vol. 79, Elsevier, 2010, pp. 189 – 290.

[33] R. Koschke, Software visualization in software maintenance, reverse engineering, andre-engineering: a research survey, Journal of Software Maintenance and Evolution:Research and Practice 15 (2003), no. 2, 87–109.

[34] D. Lo, Mining specifications in diversified formats from execution traces, SoftwareMaintenance, 2008. ICSM 2008. IEEE International Conference on, IEE, 2008,pp. 420–423.

[35] D. Lo, S.C. Khoo, and C. Liu, Mining past-time temporal rules from execution traces,Proceedings of the 2008 international workshop on dynamic analysis, ACM, 2008,pp. 50–56.

[36] C. Lu, W. Chu, C. Chang, Y. Chung, X. Liu, and Yang H., Reverse Engineering, vol.Vol. 2, ch. 18, pp. 447–466, World Scientific Pub. Co, 2002.

[37] S. Mancoridis, B.S. Mitchell, Y. Chen, and E.R. Gansner, Bunch: a clustering tool forthe recovery and maintenance of software system structures, Software Maintenance,1999. (ICSM ’99) Proceedings. IEEE International Conference on, IEEE, 1999, pp. 50–59.

[38] B.S. Mitchell and S. Mancoridis, On the automatic modularization of software systemsusing the Bunch tool, Software Engineering, IEEE Transactions on 32 (2006), no. 3,193 – 208.

[39] T. Morgan, Business Rules and Information Systems: Aligning IT with BusinessGoals, Addison-Wesley Professional, 2002.

[40] H.A. Muller, S.R. Tilley, and K. Wong, Understanding software systems using reverseengineering technology perspectives from the Rigi project, Proceedings of the 1993conference of the Centre for Advanced Studies on Collaborative research: softwareengineering - Volume 1, 1993, pp. 217–226.

[41] E. Putrycz and A. Kark, Recovering Business Rules from Legacy Source code forSystem Modernization, Advances in Rule Interchange and Applications (A. Paschkeand Y. Biletskiy, eds.), Lecture Notes in Computer Science, vol. 4824, Springer Berlin/ Heidelberg, 2007, pp. 107–118.

BIBLIOGRAPHY 77

[42] , Connecting Legacy code, Business Rules and Documentation, Rule Repre-sentation, Interchange and Reasoning on the Web (N. Bassiliades, G. Governatori,and A. Paschke, eds.), Lecture Notes in Computer Science, vol. 5321, Springer Berlin/ Heidelberg, 2008, pp. 17–30.

[43] A. Quilici, A memory-based approach to recognizing programming plans, Commun.ACM 37 (1994), 84–93.

[44] R.G. Ross, Business Rule Concepts: Getting to the Point of Knowledge, third ed.,Business Rule Solutions Inc, 2009.

[45] H. Safyallah and K. Sartipi, Dynamic Analysis of Software Systems using ExecutionPattern Mining, Program Comprehension, 2006. ICPC 2006. 14th IEEE InternationalConference on, 2006, pp. 84–88.

[46] S. Shekar, J. Hammer, M. Schmalz, and O. Topsakal, Knowledge Extraction in theSEEK Project Part II: Extracting Meaning from Legacy Application code throughPattern Matching, Tech. report, University of Florida, 2003.

[47] T. Skramstad and M.K. Khan, Assessment of reverse engineering tools: A MECCAapproach, Assessment of Quality Software Development Tools, 1992., Proceedings ofthe Second Symposium on, IEEE, May 1992, pp. 120 –126.

[48] H.M. Sneed and K. Erdos, Extracting business rules from source code, Program Com-prehension, 1996, Proceedings., Fourth Workshop on, March 1996, pp. 240 –247.

[49] M.A. Storey, Theories, tools and research methods in program comprehension: past,present and future, Software Quality Journal 14 (2006), no. 3, 187–208.

[50] E. Burton Swanson, The dimensions of maintenance, Proceedings of the 2nd inter-national conference on Software engineering, IEEE Computer Society Press, 1976,pp. 492–497.

[51] P. Tonella, M. Torchiano, B. Du Bois, and T. Systa, Empirical studies in reverseengineering: state of the art and future trends, Empirical Software Engineering 12(2007), no. 5, 551–571.

[52] W. M. Ulrich, Knowledge Mining: Business Rule Extraction & Reuse, System Trans-formation Portal, February 2009.

[53] X. Wang, J. Sun, X. Yang, Z. He, and S. Maddineni, Application of information-flowrelations algorithm on extracting business rules from legacy code, Intelligent Controland Automation, 2004. WCICA 2004. Fifth World Congress on, vol. 4, IEEE, 2004,pp. 3055 – 3058.

[54] X. Wang, J. Sun, X. Yang, Z. He, and S Maddineni, Business rules extraction fromlarge legacy systems, Software Maintenance and Reengineering, 2004. CSMR 2004.Proceedings. Eighth European Conference on, March 2004, pp. 249 – 258.

[55] I. Warren and J. Ransom, Renaissance: a method to support software system evolu-tion, Computer Software and Applications Conference, 2002. COMPSAC 2002. Pro-ceedings. 26th Annual International, IEEE, 2002, pp. 415–420.

[56] B. Xu, J. Qian, X. Zhang, Z. Wu, and L. Chen, A brief survey of program slicing,ACM SIGSOFT Software Engineering Notes 30 (2005), no. 2, 1–36.

BIBLIOGRAPHY 78

[57] E. Yourdon, Structured Analysis Wiki, http://yourdon.com/strucanalysis, February2011.

[58] A. Zaidman, T. Calders, S. Demeyer, and J. Paredaens, Applying Webmining Tech-niques to Execution Traces to Support the Program comprehension Process, SoftwareMaintenance and Reengineering, 2005. CSMR 2005. Ninth European Conference on,IEEE, 2005, pp. 134–142.

semiautomatic reverse engineering tool on oracle forms...

Documents