[ieee 2008 23rd ieee/acm international conference on automated software engineering - l'aquila,...

A generic approach for class model normalization ∗

Jean-Rémy Falleri Marianne Huchard Clémentine Nebut

LIRMM, CNRS and Université Montpellier 2, 161, rue Ada, 34392 Montpellier cedex 5, France

E-mail: {falleri, huchard, nebut}@lirmm.fr

Abstract

Designing and maintaining a huge class model is a very complex task. When an object-oriented software system or model grows, duplicated elements start to appear, decreasing readability and maintainability. In this paper, we present an approach, implemented in a tool and validated by a case study, that helps software architects design and improve their class models by discarding redundancy and adding relevant abstractions. Since class models can be expressed in many different languages, this approach has been made generic, i.e. capable of dealing with any language described by a meta-model.

1. Introduction

Well-designed class models make software easier to understand, maintain and reuse. Unfortunately, when the software reaches a large size, it is almost impossible for a software architect to know every detail of the architecture. Consequently, element duplications are unintentionally introduced, and removing them can hardly be achieved manually. For example, in the Apache Common Collections software, which contains 544 attributes, 373 duplications of attribute names have been found.

Solutions to eliminate redundancies have been proposed in the literature on class model design. In [6], five different normal forms are defined, which guarantee that a given attribute or method appears exactly once in the whole class model and that inheritance links correspond to inclusion or refinement of attribute and method sets. These forms differ in the number of multiple inheritance links introduced.

The contribution presented here is a generic approach and tool that use Model-Driven Engineering (MDE) and Relational Concept Analysis (RCA) to perform class model normalization. It has been applied to real-world class models coming from different languages (Java and Ecore).

∗France Télécom R&D has supported this work (CPRE 5326)

RCA is used as a theoretical framework for eliminating duplicated elements and highlighting abstractions, while MDE gives us the keys to define a generic tool able to deal with many different input/output modeling and programming languages through declarative parametrization only.

2. Background

Formal and Relational Analysis. Formal Concept Analysis (FCA) [5, 6] is a clustering method that classifies a set of entities described by binary characteristics. For example (Fig. 1), entities are classes and characteristics are their attributes. A concept is composed of a maximal set of entities (extent) and the maximal set of characteristics they share (intent). In our small example, ({Student, Teacher}, {name}) and ({Student}, {name, number}) are two concepts. Concepts are partially ordered in a lattice using extent inclusion: ({Student, Teacher}, {name}) is a super-concept of ({Student}, {name, number}). On the left of Figure 2, the concept lattice is presented in a simplified form (top-to-bottom inherited characteristics and bottom-to-top inherited entities have been removed).

Figure 1. First step of FCA on a UML model

Figure 2. Second step of FCA on a UML model
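The definitions of extent, intent and concept above can be illustrated with a minimal Python sketch. It is not the tool described in this paper: the binary context is an assumption reconstructed from the two concepts quoted in the text (Figure 1 is not reproduced here), the function names are hypothetical, and the enumeration is a brute-force closure of every subset of entities, which is only reasonable for very small contexts.

```python
from itertools import combinations

# Assumed context, reconstructed from the two concepts quoted in the text.
CONTEXT = {
    "Student": {"name", "number"},
    "Teacher": {"name"},
}

def intent(entities, context):
    """Characteristics shared by every entity in `entities`."""
    shared = set().union(*context.values())
    for entity in entities:
        shared &= context[entity]
    return shared

def extent(attrs, context):
    """Entities that own every characteristic in `attrs`."""
    return {entity for entity, owned in context.items() if attrs <= owned}

def concepts(context):
    """Enumerate all formal concepts by closing every subset of entities."""
    found = set()
    names = sorted(context)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            shared = intent(set(subset), context)
            found.add((frozenset(extent(shared, context)), frozenset(shared)))
    return found

def is_super_concept(a, b):
    """`a` is a super-concept of `b` when b's extent is strictly included in a's."""
    return b[0] < a[0]

ALL_CONCEPTS = concepts(CONTEXT)
```

On this assumed context, the sketch yields the two concepts given in the text, ordered by extent inclusion.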

978-1-4244-2188-6/08/$25.00 ©2008 IEEE

Formal Concept Analysis is effective at distributing attributes in a class hierarchy, but it cannot deal with relational descriptions. As an example, let us consider the class model on the left of Figure 3. Applying FCA to this model leads to the model shown on the right of Figure 3. The resulting model, even though it is in normal form, can still be improved: a new attribute with type Person can be introduced in the class Person in order to generalize the friends and colleagues properties.

Figure 3. Limitations of FCA on a UML model

Relational Concept Analysis [2, 7, 1, 4] is an extension of FCA that aims at mining abstractions in data where entities are described both by characteristics and by links. In RCA, instead of a single binary relation, there is a set of relations: some of them (formal contexts) describe several kinds of entities by characteristics, while others (relational contexts) describe links between entities, possibly of several kinds. In Figure 4 there are two kinds of entities (properties described by names, classes with no description) and two relations (one associating properties with their types, the other associating classes with their properties).

Figure 4. RCA contexts for a UML model

An iterative lattice construction is then applied: several lattices are built, and the concepts discovered at one step are injected as new entities into the relational contexts for the next step. This iterative construction stops when, for each kind of entity, the lattices built at two successive steps are isomorphic. The class model in Figure 5 has been produced from the lattices deduced from the tables of Figure 4.
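The iteration can be sketched in Python under several stated assumptions. This is a simplified illustration, not the RCA implementation used in this work: concepts are identified by their extents only, links are scaled with an existential rule (an entity receives the relational attribute (r, C) when at least one of its r-targets belongs to the extent of concept C), and the isomorphism test is approximated by checking that the set of extents no longer changes. The sample data and all names are hypothetical and are not taken from Figures 3 to 5.

```python
from itertools import combinations

def concept_extents(context):
    """Extents of all formal concepts of `context` (entity -> set of attributes)."""
    all_attrs = set().union(*context.values())
    names = sorted(context)
    extents = set()
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            shared = set(all_attrs)
            for entity in subset:
                shared &= context[entity]
            extents.add(frozenset(e for e in names if shared <= context[e]))
    return extents

def rca(formal, relations):
    """formal: kind -> {entity: set of base attributes}.
    relations: name -> (source kind, target kind, {source entity: set of targets})."""
    extents = {kind: concept_extents(ctx) for kind, ctx in formal.items()}
    while True:
        # Start again from the base attributes, then add relational attributes
        # derived from the concepts found at the previous step.
        scaled = {k: {e: set(a) for e, a in ctx.items()} for k, ctx in formal.items()}
        for name, (src, tgt, links) in relations.items():
            for entity in scaled[src]:
                for target_extent in extents[tgt]:
                    if links.get(entity, set()) & target_extent:
                        scaled[src][entity].add((name, target_extent))
        new_extents = {kind: concept_extents(ctx) for kind, ctx in scaled.items()}
        if new_extents == extents:  # simplified stop criterion
            return extents
        extents = new_extents

# Hypothetical data: properties described by names, classes with no description.
FORMAL = {
    "class": {"Student": set(), "Teacher": set(), "Person": set()},
    "property": {"friends": {"friends"}, "colleagues": {"colleagues"}},
}
RELATIONS = {
    "ownedAttribute": ("class", "property",
                       {"Student": {"friends"}, "Teacher": {"colleagues"}}),
    "type": ("property", "class",
             {"friends": {"Person"}, "colleagues": {"Person"}}),
}
result = rca(FORMAL, RELATIONS)
```

With this data, the class extents returned include {Student, Teacher}, a grouping that the class context alone cannot reveal because classes carry no characteristics of their own; it only appears once the property concepts are injected through the ownedAttribute relation.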

Figure 5. RCA result on a UML model

Model Driven Engineering. At the beginning of the RCA process, class models have to be encoded into formal and relational contexts. At the end, lattices have to be converted back to the initial class model format. As a consequence, building an RCA-based tool able to deal with a large range of input data formats requires developing a great number of encoders and decoders. Such an architecture requires too much coding effort, and the encoding/decoding logic is hidden in the code. When we want to deal with a new class model language or language version, or when we want to test new RCA configurations, code has to be written or changed, which requires a competent developer. The solution we propose to tackle this issue is based on Model-Driven Engineering [8], a recent software development paradigm centered on models and model transformations. In an MDE-based development, every produced or used artifact (including code) is a model, whose structure is defined by a meta-model (a model is said to conform to a meta-model). In practice, to handle two models that conform to two different meta-models (for example, to transform a UML model into a Relational Database model), a program has to be written that deals with both meta-models. For that purpose, MDE assumes the existence of a unique meta-metamodel, which defines how a meta-model is structured. Two meta-metamodels are mainly used: EMOF [9] (defined by the OMG) and Ecore [3] (defined by Eclipse).

3. Generic class model normalization

In this section, we describe our approach, summarized in Figure 6, which integrates RCA and MDE to perform class model normalization. Three successive model transformations are defined: the encoding step transforms the input class model (which can be Java code, a UML class model, ...) into the tables (Relational Context Family or RCF model) representing this class model; the RCA step applies the RCA process to the RCF to build the concept lattice family (CLF); the decoding step transforms the CLF model into a class model conforming to the input meta-model.

Our objective is to make the first (encoding) and the third (decoding) transformations generic. By generic, we mean that they are written independently of the metamodel to which the input/output models conform. The solution we chose is simply to ask the user of the tool to define a configuration file declaring which elements of the input meta-model she wants to consider in the RCA process.

Figure 6. Process overview

Figure 7. A sample model

Figure 8. The Relational Context Family meta-model

We consider as an example a simple UML model (Figure 7). Let us suppose that we want to apply the same RCA configuration as in Figure 4 to this sample UML model. To do that, we want to create two formal contexts, one describing the classes and one describing the properties. In order to merge properties, the name of the properties has to be used as an attribute in the property formal context. Two relational contexts are also required: one describing the ownedAttribute relation between classes and properties, and one describing the type relation between properties and classes. In order to give that kind of information to the encoding and decoding transformations, we have introduced a configuration meta-model, shown in Figure 9.

Figure 9. The encoding/decoding configuration metamodel

The encoding transformation uses two models to fulfill its goal: a class model (UML, Java, ...) and a configuration model conforming to the configuration meta-model previously shown. To remain in the MDE paradigm, we created a meta-model for the RCF that is produced by this transformation (Figure 8). The transformation works as follows. First, a formal context is created for each FormalContextCreation element in the configuration model. The entities of this formal context are the elements of the class model that conform to the meta-class defined in the metaClass attribute of the FormalContextCreation element. The attributes of this formal context are created according to the values of the metaAttributes attribute of the FormalContextCreation element.

Figure 10 shows, in a textual format, the configuration model used to encode the sample UML model. According to this configuration model, two formal contexts are created: one for the classes (MetaClass Class) and one for the properties (MetaClass Property). No attributes are created in the class formal context. The value of the name attribute of the properties is used in the properties formal context. Figure 11 shows the two formal contexts Kproperty and Kclass created using both the sample UML model and the sample configuration model.

RCA Config. for UML Class Models:

Formal Context Creations:
- MetaClass Class: metaAttributes = [],
    metaSpecializationLink = "generalization.general"
- MetaClass Property: metaAttributes = ["name"],
    metaSpecializationLink = "redefinedProperty"

Relational Context Creations:
- MetaReference ownedAttribute: source = Class,
    target = Property
- MetaReference type: source = Property,
    target = Class

Figure 10. UML configuration model
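The creation of formal contexts from such a configuration can be sketched as follows. This is an illustrative sketch only, not the model transformation implemented in the tool: the input model is a hypothetical list of dictionaries standing in for a real UML or Ecore model, conformance to a meta-class is reduced to an exact name match, and metaSpecializationLink is ignored because its use is not detailed in this section. Each formal attribute is assumed to be a (meta-attribute, value) pair.

```python
# Hypothetical in-memory stand-in for a class model (not an EMF/UML API):
# each element records its meta-class and its attribute values.
MODEL = [
    {"id": "Student", "metaclass": "Class"},
    {"id": "Teacher", "metaclass": "Class"},
    {"id": "p1", "metaclass": "Property", "name": "name"},
    {"id": "p2", "metaclass": "Property", "name": "name"},
    {"id": "p3", "metaclass": "Property", "name": "number"},
]

# Formal-context part of the configuration of Figure 10.
FORMAL_CONTEXT_CREATIONS = [
    {"metaClass": "Class", "metaAttributes": []},
    {"metaClass": "Property", "metaAttributes": ["name"]},
]

def encode_formal_contexts(model, creations):
    """Build one formal context (entity id -> set of attributes) per creation."""
    contexts = {}
    for creation in creations:
        rows = {}
        for element in model:
            # Simplification: exact meta-class match, no subtyping.
            if element["metaclass"] != creation["metaClass"]:
                continue
            rows[element["id"]] = {
                (attr, element[attr])
                for attr in creation["metaAttributes"]
                if attr in element
            }
        contexts[creation["metaClass"]] = rows
    return contexts

CONTEXTS = encode_formal_contexts(MODEL, FORMAL_CONTEXT_CREATIONS)
```

In this sketch the class context has entities but no attributes, while the two properties named "name" share the same formal attribute, which is what allows them to be merged later.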


Figure 11. The generated UML contexts

After having created the formal contexts, the encoding transformation creates the relational contexts. One relational context is created for each RelationalContextCreation element of the configuration model. The source and target attributes of the RelationalContextCreation element define which entities are involved in this relational context. The source entities are the entities of the FormalContextCreation defined as source of the RelationalContextCreation element, and likewise for the target entities. Then, for each source entity, the encoding transformation searches the input class model for relations with the target entities whose type is the one defined in the metaReference attribute of the RelationalContextCreation element. Those relations are reported in the relational context. In the UML configuration model of Figure 10, we can see that two relational contexts are created (they are shown in Figure 11). RownedAttribute stems from the MetaReference ownedAttribute in the configuration model. It links the classes and the properties: a pair is added to the relation each time a class owns an attribute. Rtype stems from the MetaReference type in the configuration model. It links the properties and the classes: a pair is added to the relation each time a property is typed by a class. The decoding transformation is defined using the same principles.
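The creation of relational contexts described above can be sketched in the same spirit. Again, this is a hedged illustration rather than the actual transformation: the model is a hypothetical dictionary in which each element stores its references as lists of element ids, the sample links are invented for the example, and the function name is an assumption.

```python
# Hypothetical in-memory model: element id -> meta-class and outgoing references.
MODEL = {
    "Student": {"metaclass": "Class", "ownedAttribute": ["p1", "p2"]},
    "Teacher": {"metaclass": "Class", "ownedAttribute": ["p3"]},
    "p1": {"metaclass": "Property", "type": []},
    "p2": {"metaclass": "Property", "type": ["Teacher"]},
    "p3": {"metaclass": "Property", "type": ["Student"]},
}

# Relational-context part of the configuration of Figure 10.
RELATIONAL_CONTEXT_CREATIONS = [
    {"metaReference": "ownedAttribute", "source": "Class", "target": "Property"},
    {"metaReference": "type", "source": "Property", "target": "Class"},
]

def encode_relational_contexts(model, creations):
    """Build one relation (set of (source, target) pairs) per creation."""
    contexts = {}
    for creation in creations:
        sources = [i for i, e in model.items()
                   if e["metaclass"] == creation["source"]]
        targets = {i for i, e in model.items()
                   if e["metaclass"] == creation["target"]}
        pairs = set()
        for source in sources:
            # Follow the configured reference and keep links to target entities.
            for referenced in model[source].get(creation["metaReference"], []):
                if referenced in targets:
                    pairs.add((source, referenced))
        contexts[creation["metaReference"]] = pairs
    return contexts

RELATIONS = encode_relational_contexts(MODEL, RELATIONAL_CONTEXT_CREATIONS)
```

Because the function only reads the reference name and the source and target meta-classes from the configuration, handling another modeling language would, in this sketch, amount to supplying a different configuration rather than changing the code.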

4. Conclusion

In this paper, we have presented a theory and a tool for normalizing class models based on different metamodels. The normalization process is based on Relational Concept Analysis. A case study has been conducted to demonstrate that the RCA process can be adapted simply by modifying the configuration model of the underlying model transformations. This case study has been conducted on two Ecore models, two Java programs and five UML models. A quantitative analysis of the obtained results has been performed using dedicated metrics. For instance, in Apache Common Collections, initially composed of 250 classes, RCA discovers 34 new classes and introduces 9 new attributes; in the UML2 metamodel (written in Ecore), initially composed of 246 classes and 615 properties, RCA finds 1534 new classes, 2 new attributes and 996 new references. The experiments conducted with the tool confirmed our intuition that some RCA configurations discover many abstractions: a small number of very relevant ones (which cannot be found with simpler configurations or with FCA) and a large number of uninteresting ones that need to be eliminated by further filtering. Future work will try to resolve this issue along several research directions, including the use of natural language processing.

References

[1] G. Arévalo, J.-R. Falleri, M. Huchard, and C. Nebut. Building abstractions in class models: Formal concept analysis in a model-driven approach. In O. Nierstrasz, J. Whittle, D. Harel, and G. Reggio, editors, MoDELS, volume 4199 of Lecture Notes in Computer Science, pages 513–527. Springer, 2006.

[2] M. Dao, M. Huchard, M. R. Hacene, C. Roume, and P. Valtchev. Improving Generalization Level in UML Models Iterative Cross Generalization in Practice. In K. E. Wolff, H. D. Pfeiffer, and H. S. Delugach, editors, ICCS, volume 3127 of Lecture Notes in Computer Science, pages 346–360. Springer, 2004.

[3] Eclipse. The Eclipse Modeling Framework. http://www.eclipse.org/emf, 2005.

[4] J.-R. Falleri, M. Huchard, C. Nebut, and G. Arévalo. A Model Driven Engineering approach for making generic FCA/RCA tools. In J. Diatta, P. Eklund, and M. Liquière, editors, Proceedings of the Fifth International Conference on Concept Lattices and Their Applications (CLA’07), 2007.

[5] B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer-Verlag New York, Inc. Secaucus, NJ, USA, 1997.

[6] R. Godin and P. Valtchev. Formal concept analysis-based class hierarchy design in object-oriented software development. In Formal Concept Analysis, volume 3626 of Lecture Notes in Computer Science, pages 304–323. Springer, 2005.

[7] M. Huchard, M. R. Hacene, C. Roume, and P. Valtchev. Relational concept discovery in structured datasets. Ann. Math. Artif. Intell., 49(1-4):39–76, 2007.

[8] S. Kent. Model Driven Engineering. In M. J. Butler, L. Petre, and K. Sere, editors, IFM, volume 2335 of Lecture Notes in Computer Science, pages 286–298. Springer, 2002.

[9] OMG. MOF 2.0 core specification. http://www.omg.org/cgi-bin/doc?ptc/2004-10-15, 2004.
