
Revealing Mistakes in Concern Mapping Tasks: An Experimental Evaluation

Camila Nunes 1, Alessandro Garcia 1, Eduardo Figueiredo 2, Carlos Lucena 1

1 Opus Research Group, Software Engineering Lab, Informatics Department - PUC-Rio, Rio de Janeiro, Brazil

2 Computer Science Department, Federal University of Minas Gerais, Belo Horizonte, Brazil

{cnunes, afgarcia, lucena}@inf.puc-rio.br, [email protected]

Abstract: Concern mapping is the activity of assigning a stakeholder’s concern to its corresponding elements in the source code. This activity is essential to guide software maintainers in several tasks, such as understanding and restructuring the implementation of existing concerns. Even though different techniques are emerging to facilitate the concern mapping process, it is still manual and error-prone according to recent studies. Existing work does not provide any guidance to developers on how to review and correct concern mappings. In this context, this paper presents the characterization and classification of eight concern mapping mistakes commonly made by developers. These mistakes were found to be associated with various properties of concerns and modules in the source code. The mistake categories were derived from actual mappings of 10 concerns in 12 versions of industry systems. To further evaluate to what extent these mistakes also occur in wider contexts, we ran two experiments where 26 subjects mapped 10 concerns in two systems. Our experimental results confirmed that these mapping mistakes often occur when developers need to interact with the source code.

Keywords: Concern Mapping, Mapping Mistakes, Experimental Evaluation.

I. INTRODUCTION

Understanding and restructuring how concerns are realized in the source code are often required in software maintenance activities. For this reason, developers have to gather full knowledge about all the implementation elements that realize one or more concerns [1, 2]. Otherwise, the developers can incorrectly implement a change in non-related modules or miss relevant modifications. A concern is defined as any consideration made when producing the source code, which can impact on the software design and maintenance [1]. According to this definition, concerns can be, for instance, design patterns, programming idioms, functional concerns, and non-functional concerns. As a result, a key task is to explicitly map the implementation elements related to a specific concern, the so-called concern mapping [1-5]. The goal is to maintain correct and complete mappings according to the current version of the system. However, this is not trivial to achieve as each concern is usually scattered and tangled across the modular decomposition of a system [1-3].

Many techniques and tools were proposed with the goal of supporting concern mapping processes [1, 2, 4]. They are usually classified into three categories: static [1, 6], dynamic [7-9], and hybrid approaches [2, 10, 11]. Even though these techniques and tools have facilitated the process of concern mapping, it is still manual and error-prone to a large extent [1, 12-14]. Even when it is partially supported by a tool, developers still need to verify if the initial assignments are correct and complete. The mapping developers or reviewers need to check for each concern if: (i) elements in the module implementations were missed (i.e., false negatives), and (ii) all the mapped elements are correct (i.e., false positives). These activities are essential to guarantee the maximum mapping precision before the actual software change is carried out. However, software engineers are not equipped with any kind of guidance to promote or review the correctness and completeness of their concern mappings. As a result, this activity is often cumbersome and performed in an ad-hoc fashion.
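The two checks listed above can be made concrete by comparing a candidate mapping of one concern against a reference mapping. The following is a minimal sketch under stated assumptions: the class `MappingCheck`, its method names, and the element identifiers in `main` are hypothetical and are not part of ConcernMapper or any other tool cited in this paper.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not an API of any cited tool.
final class MappingCheck {

    // Elements that the reference assigns to the concern but the candidate missed.
    static Set<String> falseNegatives(Set<String> candidate, Set<String> reference) {
        Set<String> missed = new TreeSet<>(reference);
        missed.removeAll(candidate);
        return missed;
    }

    // Elements that the candidate mapped although the reference does not contain them.
    static Set<String> falsePositives(Set<String> candidate, Set<String> reference) {
        Set<String> wrong = new TreeSet<>(candidate);
        wrong.removeAll(reference);
        return wrong;
    }

    // Fraction of mapped elements that are correct; an empty candidate yields 1.0 by convention.
    static double precision(Set<String> candidate, Set<String> reference) {
        if (candidate.isEmpty()) {
            return 1.0;
        }
        int hits = candidate.size() - falsePositives(candidate, reference).size();
        return (double) hits / candidate.size();
    }

    public static void main(String[] args) {
        // Element names below are invented for the example.
        Set<String> reference = new TreeSet<>(Arrays.asList("LogService", "LogWriter.write", "LogConfig"));
        Set<String> candidate = new TreeSet<>(Arrays.asList("LogService", "LogWriter.write", "UserService"));
        System.out.println("false negatives: " + falseNegatives(candidate, reference));
        System.out.println("false positives: " + falsePositives(candidate, reference));
        System.out.println("precision: " + precision(candidate, reference));
    }
}
```

Such a comparison only reports that elements are missing or wrongly mapped; it says nothing about why, which is the question the mistake categories address.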

In fact, the number of mapping mistakes tends to be high according to recent studies [12-14]. Revelle et al. provided some guidelines to help programmers identify concerns in the source code [13]. However, they did not try to characterize and classify actual recurring mistakes made by different developers. For instance, they did not investigate which properties of concerns and module structures tend to lead to such mistakes. There is also no work that analyzes which mistake categories mostly influenced inaccurate concern mappings. This limitation is also becoming increasingly relevant given the growing number of empirical studies that rely on the assumption of reliable concern mappings [12-19].

In this context, this paper presents a study where we identified, characterized, and classified recurring concern mapping mistakes. These mistakes were found to be often associated with certain properties of concerns and modules in the source code. For instance, a highly tangled concern [15, 16] is interleaved with other concerns at the level of methods or modules, and overly communicative concerns [16] arise when there are many dependencies among classes that implement different concerns. As a consequence, these intricate concern realizations tend to harm the correct identification of elements in the source code. Therefore, our first research stage comprised the classification of recurring mapping mistakes detected in an exploratory analysis of 12 versions of an industry system. Two experienced developers were responsible for mapping 10 concerns in the source code of these versions using the ConcernMapper tool [20] (Section III).

The second research stage focused on evaluating to what extent these mistakes may occur in other contexts. We ran two controlled experiments where 26 subjects had to identify 10 concerns in the source code. The goal of these experiments was to verify the occurrence rate of the mistake categories. In this second stage, we used two additional software systems [21, 22] different from the one used in the first stage. These systems were chosen because we needed reliable and complete reference mappings, which were performed independently by the original developers. A static non-automatic approach was followed by the experiment subjects, as the goal was to identify technique-agnostic mistakes made by developers. The use of dynamic and hybrid techniques was not the main target of our evaluation, because our goal was to detect the highest possible number of mistakes when developers interact with the source code. In fact, interaction with source code is also required in concern mappings partially derived with hybrid and dynamic approaches. In addition, certain techniques have limitations that would hinder our study. For instance, dynamic techniques are not effective in identifying overlapping concerns [2, 10].

2011 15th European Conference on Software Maintenance and Reengineering. 1534-5351/11 $26.00 © 2011 IEEE. DOI 10.1109/CSMR.2011.16

The main contributions of this paper are threefold. First, it classifies and documents concern and module characteristics often associated with mapping mistakes. We give concrete examples and discuss potential reasons why such mistake categories might occur in concern mappings. Second, we identify and discuss potential relationships among the mapping mistakes. Finally, we study the representativeness and frequency of the mistake categories through two controlled experiments (Section V). We also compare our work with previous studies (Section II), identify the threats to validity of our research (Section VI), and present our final remarks and future work (Section VII).

II. RELATED WORK

This section discusses how our research fills the gap left by existing work in the literature. The key related work is classified into two categories: concern mapping studies and techniques for supporting concern mappings.

A. Concern Mapping Studies

A few recent studies were conducted to better understand the concern mapping problem [12-14]. For instance, Robillard et al. [12] conducted an empirical study to assess the overall accuracy of concern mapping in four systems. The subjects of this study used a tool [20] to perform the concern mappings. The authors selected 16 concerns in the target systems; for each concern they asked three subjects to produce concern mappings. However, they did not identify, characterize, or classify the recurring types of mapping mistakes. Revelle et al. [13] identified which factors influenced consistent concern identification in two case studies. Two developers were instructed to identify concerns in the source code. The authors also provided some guidelines to help programmers identify concerns in the source code. However, the influential factors identified by the authors are very general, and they do not explicitly reveal the actual mapping mistakes made by developers. In addition, neither Robillard’s nor Revelle’s study considers different types of (non-)crosscutting concerns.

Figueiredo et al. [14] conducted an experimental study to investigate the accuracy of manual concern mappings. The authors evaluated the impact of the mappings on the precision of metrics to quantify crosscutting concern properties. For this, they measured the amount of hits, false positives and false negatives for each concern mapping. This work is different from ours because the authors have not identified or categorized the mapping mistakes made by developers. As a consequence, developers are still left without any guidance on how to correct and extend their concern representation.

B. Techniques Supporting Concern Mapping

Some tools and techniques support developers in manually performing concern representations, such as ConcernMapper, ConcernTagger, FEAT, Concern Manipulation Environment, and intentional views [1, 20, 23]. There are several semi-automatic techniques for supporting concern (or feature) location [1-3] and concern mining [24-27]. For instance, FLAT3 [24] combines dynamic traces with information retrieval to identify user-visible concerns. FLAT3 is based on several existing tools, such as the Lucene library, MUTT, ConcernTagger, and ConcernMapper. There are also many concern location techniques that are classified as static [1, 6], dynamic [7-9], or hybrid approaches [2, 10, 11]. The static approaches consist of static analysis techniques based on dependency graphs. They always require interaction between developers and the tool to manually assign each concern to implementation elements [1, 6]. The dynamic approaches try to identify parts of the source code that implement user-level features through program executions [5, 7-9]. To be applied, these techniques rely on the use, extension and tuning of pre-existing test suites. This often implies that a test coverage analysis is required. However, dynamic techniques are usually not accurate, and most of the tools only provide coverage at the level of methods and branches.

Other authors [2, 10, 11] propose hybrid techniques that blend characteristics of both static and dynamic analyses. The evaluation of these techniques focuses mainly on hits, false positives and false negatives. The rate of mistakes is often very high [13, 14]. However, they do not reveal the nature of mistakes made manually by developers when either completing or correcting concern mappings. Our work is different because we are interested in characterizing and classifying mapping mistakes. This classification is also useful for enhancing concern mining techniques. For example, the use of semi-automatic techniques does not capture multi-partition concerns or overly communicative concerns (Section IV).

III. STUDY SETTINGS

This section describes and justifies the industry systems (Section A) used for identifying the recurring types of mapping mistakes. Section B describes the study procedures.

A. The Target Industry Systems

A framework and three instantiated applications were used to reveal the mapping mistakes (Section IV). Each application contains a copy of the framework code. All applications belong to the logistics domain and have the goal of managing oil control aspects. These systems were selected for various reasons. First and foremost, the developers had the goal of re-engineering the framework design. To this end, they performed full mappings of several concerns in the code of the framework and its application instances. Second, they are part of a program family, which contains multiple releases and has evolved since 2006. Third, the framework (51 KLOC) and applications (120 KLOC) have a significant size and complex modules. Fourth, they were modified over the years by different developers. To respect copyright constraints, the fictitious name Oil Control (OC) framework is used in this paper to refer to all three applications and the framework. The OC framework is also used as the running example in Section IV. In Section V, we will evaluate the representativeness of the mapping mistakes in the context of two independent experiments.

B. Study Procedures

This section describes the preparatory steps followed to detect and classify concern mapping mistakes made by developers, and to explain the reasons behind them. Basically, the study procedures were divided into three stages. First, the developers selected the versions and concerns to be mapped in each system. A set of twelve versions of the OC framework was chosen; they correspond to four versions of each application. Multiple versions were used for concern mappings as the developers needed to find out how concern realizations evolved over time. This was interesting for our analysis because we could observe the nature of the concern mapping mistakes across versions. The developers had to produce mappings with the maximum precision (100%) for two reasons: (i) reliable concern-driven measures had to be produced in order to help detect features that should be refactored, and (ii) in certain cases, the entire concern code had to be found and removed.

A set of (non-)crosscutting concerns was selected. The following crosscutting concerns of the OC framework were chosen: Transaction, Exception Handling, and Logger. We also selected concerns of the logistic application domain: Product, Report, Notification, Route, Export, Scenarium, and Blend. These concerns were chosen because they are representative and contain relevant variabilities in the application domain. They are briefly described in Table I. Descriptions of these concerns were given to the subjects.

Second, in the reengineering of the OC framework, two subjects were responsible for mapping the concerns by using a tool called ConcernMapper [20]. These subjects are experienced developers with several programming skills, including extensive knowledge of object-oriented programming and the Java language. The ConcernMapper tool [20] was used because its plug-in is well documented and integrates easily with Eclipse. Finally, we analyzed all the concern mappings produced by each subject in order to verify the differences among them and thus observe and correlate the mistake manifestations. All the other developers were involved in review meetings in order to evaluate the mapping accuracy. They were involved in: (i) the validation of the false positives and false negatives found, and (ii) the discussion about the characterization of the common mapping mistakes found (Section IV).

The evaluation of the concern mapping accuracy in the OC framework is representative in terms of recurring software reengineering scenarios. In fact, it contains typical challenging characteristics to be faced during the process of concern mappings. This framework contains little documentation of its architecture, use cases, and detailed design. Thus, the base documentation consists of the existing comments in the source code. In addition, it contains several types of (non-)crosscutting concerns of different sizes and scopes.

TABLE I. CONCERNS ANALYZED IN THE OC FRAMEWORK

Concern | Description
Logger | It saves information about program execution and/or errors.
Product | It represents a product and its characteristics in the logistic domain.
Report | It represents the report exhibition, exportation and printing.
Notification | It defines a system notification to users (e.g., email).
Route | It represents a route of products between two points in the logistics context.
Export | It represents the generation of reports for different files formats (e.g., Excel).
Scenarium | It represents exportation and importation properties of the products.
Blend | It identifies a blend of products, a composition of products.
Exception Handling | It is the policy to handle exceptional conditions (i.e., the strategy within try-catch blocks).
Persistence/Transaction | It is responsible for storing and recovering data from the database and ensuring ACID properties.

IV. MISTAKES ON CONCERN MAPPINGS

The mistakes were revealed by observing the mappings for the OC framework (Section III.A). The goal was to identify, characterize and categorize recurring mapping mistakes in order to help developers avoid them. A set of eight mistakes was characterized and classified in two broad categories: (i) Concern Characteristics, types of mistakes that were found to be related to particular properties of how concerns are realized in the source code (Section A), and (ii) Module Characteristics, types of mistakes that are related to properties of the modularity units affected by the mappings, such as classes, super-classes, and methods (Section B).

For each category, we identified a set of mistakes and described the reasons for their occurrences. Additionally, we use an example to illustrate and explain each mistake subcategory. Subjects involved in the OC framework study mapped the same set of concerns in twelve versions of the target systems (Section III.A). The categories of mistakes are related to missing or incorrect code elements in the mappings. Missing elements are pieces of modularity units that were omitted; that is, subjects did not map either coarse-grained or fine-grained elements responsible for realizing the concern. Incorrect elements are those that, although mapped, do not contribute to implementing the concern. The implementation elements correspond to classes, methods, and attributes.

A. Concern Characteristics

This category is mainly related to concern characteristics in terms of the concern implementation. The sub-categories refer to: (i) dependencies among the implementation elements that contribute to implementing the same concern; (ii) interaction among concerns, which is based on how concern realizations share implementation elements (e.g., methods); and (iii) the existence of modularity anomalies, such as bad smells [16, 28], associated with a concern implementation (e.g., the Feature Envy smell [16, 28]).

Multi-Partition Concern. The multi-partition concern is a specific case of a concern implementation scattered over several modules (classes or methods). A scattered concern has multiple partitions when: (i) a sub-set of the modules that implement this concern (i.e., a given partition of the concern) contain explicit references among them, and (ii) one or more of the other modules (i.e., forming another partition of the same concern) do not contain explicit references to the sub-set in (i). The lack of explicit references between concern partitions was the reason why elements of a partition were not included in the concern mapping made by the subjects. Figure 1 illustrates a case of multi-partition concern, which is represented by a disconnected graph. According to this figure, classes in the A.1 sub-graph contain explicit references to each other. On the other hand, classes in the A.2 sub-graph, which implement the same concern, are not explicitly connected to classes in the A.1 sub-graph. This mistake was made by two subjects, who did not map the modules without explicit references (A.2). The reason is that they started the mapping process by browsing the code of classes that contain explicit references to each other in a specific concern partition (A.1). As a result, the other classes responsible for implementing the concern were not mapped. This mistake happened, for instance, with the Scenarium concern in the OC framework. This concern is highly scattered and is realized by several classes that do not have explicit references among them.

Figure 1. Multi-Partition Concern (diagram: the classes of Concern A form two disconnected sub-graphs, A.1 and A.2)
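The disconnected-graph view of a multi-partition concern can be checked mechanically once a set of classes for the concern and their explicit references are available. The sketch below is only an illustration under that assumption: `PartitionFinder` and the class names in `main` are hypothetical, references are treated as undirected, and the search is a plain breadth-first traversal rather than a technique taken from this study.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: groups the classes of one concern into partitions,
// i.e. connected components of the explicit-reference graph.
final class PartitionFinder {

    static List<Set<String>> partitions(Set<String> classes, Map<String, Set<String>> references) {
        // Build an undirected adjacency map restricted to the classes of the concern.
        Map<String, Set<String>> adjacency = new HashMap<>();
        for (String c : classes) {
            adjacency.put(c, new TreeSet<>());
        }
        for (Map.Entry<String, Set<String>> entry : references.entrySet()) {
            if (!classes.contains(entry.getKey())) {
                continue;
            }
            for (String target : entry.getValue()) {
                if (classes.contains(target)) {
                    adjacency.get(entry.getKey()).add(target);
                    adjacency.get(target).add(entry.getKey());
                }
            }
        }
        // Breadth-first search from every class that has not been reached yet.
        List<Set<String>> result = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String start : new TreeSet<>(classes)) {
            if (!seen.add(start)) {
                continue;
            }
            Set<String> component = new TreeSet<>();
            Deque<String> queue = new ArrayDeque<>();
            queue.add(start);
            while (!queue.isEmpty()) {
                String current = queue.poll();
                component.add(current);
                for (String next : adjacency.get(current)) {
                    if (seen.add(next)) {
                        queue.add(next);
                    }
                }
            }
            result.add(component);
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented class names standing in for the A.1 and A.2 sub-graphs.
        Set<String> concernA = new TreeSet<>(Arrays.asList("A1a", "A1b", "A2a", "A2b"));
        Map<String, Set<String>> refs = new HashMap<>();
        refs.put("A1a", new TreeSet<>(Arrays.asList("A1b")));
        refs.put("A2a", new TreeSet<>(Arrays.asList("A2b")));
        System.out.println(partitions(concernA, refs));
    }
}
```

More than one partition in the result signals that browsing references from a single starting class cannot reach the whole concern, which is the situation in which the subjects missed the A.2 classes.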

Overly Communicative Concerns. This mistake arises when there are two sets of interconnected classes implementing the concerns A and B, respectively. These concerns communicate, and for this reason there are dependencies among the classes that implement the concerns A and B. Figure 2 illustrates how this mistake is characterized. We observed that this scenario occurred in the following situation. The subject started to map the classes that implement the concern A. After that, s/he mapped a set of classes that implement the concern B as being part of the concern A (or vice-versa). The reason for this mistake is that most of the classes of the concern A communicate with classes that implement the concern B, the so-called overly communicative concerns. For instance, this scenario occurred with the Scenarium and Notification concerns in the OC framework (Table I). There are several classes related to the Notification concern that communicate with classes of the Scenarium concern, such as the CommonScenariumServiceDB class in Figure 3. Therefore, subjects that started to map the Scenarium concern also mapped classes related to the Notification concern (lines 03-06) as being part of the Scenarium concern. This means that strong dependencies between two concerns in the source code tend to cause misunderstandings in concern mappings.

Figure 2. Communicative Concerns (diagram: the classes of Concern A and the classes of Concern B connected by dependencies)

01    public class CommonScenariumServiceDB {
02      public void sendNotification(ScenariumInfo) {
03 N      notif = serviceNotificationData.
04 N        buildApprovalNotification(..);
05 N      NotificationService.getInstance().
06 N        notifyToAllUsers(..); }

Label: N – Notification

Figure 3. Example of Communicative Concerns in the OC framework
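One way to spot candidate pairs of overly communicative concerns during a review is to count the dependencies that cross concern boundaries. The sketch below is an assumption-laden illustration: `CrossConcernDependencies` is a hypothetical helper, the dependency list is supplied by hand rather than extracted from code, and the concern assigned to each class in `main` is chosen only for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts dependencies between classes mapped to different concerns.
final class CrossConcernDependencies {

    // concernOf: class name -> concern it is mapped to.
    // dependencies: pairs {source class, target class}.
    static Map<String, Integer> count(Map<String, String> concernOf, List<String[]> dependencies) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] dependency : dependencies) {
            String from = concernOf.get(dependency[0]);
            String to = concernOf.get(dependency[1]);
            // Skip unmapped classes and dependencies that stay inside one concern.
            if (from == null || to == null || from.equals(to)) {
                continue;
            }
            // Use an order-independent key so that A->B and B->A are counted together.
            String key = from.compareTo(to) < 0 ? from + "<->" + to : to + "<->" + from;
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Class names come from Figure 3; the concern assignments are assumed for illustration.
        Map<String, String> concernOf = new HashMap<>();
        concernOf.put("CommonScenariumServiceDB", "Scenarium");
        concernOf.put("ScenariumInfo", "Scenarium");
        concernOf.put("NotificationService", "Notification");

        List<String[]> dependencies = new ArrayList<>();
        dependencies.add(new String[] {"CommonScenariumServiceDB", "NotificationService"});
        dependencies.add(new String[] {"CommonScenariumServiceDB", "ScenariumInfo"});

        System.out.println(count(concernOf, dependencies));
    }
}
```

A pair of concerns with a high count is a reasonable place to double-check that classes of one concern were not absorbed into the mapping of the other.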

Concern-Sensitive Bad Smells. This mistake is related to the existence of a bad smell associated with the implementation of a specific concern in the source code [28, 29]. Bad smells are symptoms in the source code that may be indicative of software quality problems [28]. Although the traditional definitions of bad smells are not directly based on the notion of separation of concerns, some works have associated their occurrences with a poor modularization of concerns [14-16]. Several kinds of bad smells [28] usually fall in this case, such as Feature Envy, God Class, and God Method. For example, Feature Envy is related to a method that implements a different concern from the main concern associated with the class implementation. God Class is defined as a class that implements more than one concern at the same time, whereas God Method refers to a method that implements more than one concern. Two specific types of mistakes associated with the occurrence of bad smells were observed in the OC framework (Concern Interlacing and Concern Overlapping) and are discussed below.

Concern Interlacing. Concern interlacing occurs when two or more concerns partially affect one (or more) module(s) in common [15, 22]. The interlacing can be classified into two categories: module-level and method-level interlacing. We have observed that the higher the tangling among concerns in a module, the more difficult the concern mapping is. For instance, there were cases where lines of code of specific methods were not mapped to a specific concern. Basically, the main factor associated with this mistake is the existence of blocks of code inside the class that do not implement the main purpose of a method. This means that concern interlacing is often associated with the Feature Envy [28] smell. As discussed in Section IV.C, this mistake category can overlap with the previous mistake category. Figure 4 illustrates a slice of code that implements the following concerns: Exception Handling, Logger, and Notification. It is possible to see the interlacing among these three concerns. In fact, this interlacing hampered the understanding and performance of the subjects during the mapping process. However, concern interlacing is not necessarily associated with bad OO programming practices; even by using known OO techniques, it is not possible to completely separate all tangled concerns of interest [22].

01 EH  try {
02       if (!isLoggedUser) {
03 N       notifyNewUserLogged(user);
04       }
05 EH  } catch (InvalidLoginException) {
06 L     theLogger.log(...);
07 EH  } catch (RequestInternalException) {
08 L     theLogger.log(login, requestException);
09 N     mailLoginErrorToSupport(..);
10     }

Labels: EH - Exception Handling, N - Notification, L - Logger

Figure 4. Example of Interlacing among Concerns.
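The degree of interlacing in a slice such as Figure 4 can be summarized from its per-line concern labels. The sketch below is illustrative only: `InterlacingProfile` is a hypothetical helper, and the "switches" count is an ad hoc indicator introduced for this example, not a metric defined in this paper. The label array in `main` transcribes the labels of Figure 4, with an empty string for unlabelled lines.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: summarizes how concerns alternate inside one code slice.
final class InterlacingProfile {

    // Number of labelled lines per concern; unlabelled lines are ignored.
    static Map<String, Integer> linesPerConcern(String[] labels) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String label : labels) {
            if (label != null && !label.isEmpty()) {
                counts.merge(label, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Number of times the concern changes between consecutive labelled lines.
    static int switches(String[] labels) {
        int count = 0;
        String previous = null;
        for (String label : labels) {
            if (label == null || label.isEmpty()) {
                continue;
            }
            if (previous != null && !previous.equals(label)) {
                count++;
            }
            previous = label;
        }
        return count;
    }

    public static void main(String[] args) {
        // Labels of lines 01-10 in Figure 4.
        String[] figure4 = {"EH", "", "N", "", "EH", "L", "EH", "L", "N", ""};
        System.out.println("lines per concern: " + linesPerConcern(figure4));
        System.out.println("concern switches: " + switches(figure4));
    }
}
```

A slice in which several concerns each own a few lines and the label changes on almost every labelled line is exactly the kind of code where the subjects left lines unmapped.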

Concern Overlapping. Concern overlapping occurs when two concerns entirely share one or more code elements (methods, attributes, or classes). In these cases, the subject makes the following mapping mistake: s/he tends to map the implementation elements to one concern and not to map the same elements to other concerns that rely on the same code fragment. We mainly noticed that this mistake is often related to instances of God Class, God Method, and Feature Envy [28]. Figure 5 illustrates this mapping mistake. According to this figure, we have a God Method that is responsible for starting a set of services. Several concerns overlap in this God Method, such as Logger (line 04), Base Data Access (line 03), Transaction (lines 06-07), and instrumentation code (lines 05-08). We have observed two different problems: (i) different concerns being implemented in the same method, and (ii) instrumentation code only used to test the system and activate the transaction. As a result, this method was not mapped to all the concerns that it implements, such as Logger and Transaction. Therefore, this element was omitted in the concern mapping.

01 private void createLogisticServices() {
02   loginService = LoginService.getInstance();
03   userService = UserServiceDB.getInstance();
04   logService = LogService.getInstance();
05   if (isInTestMode()) {
06     transactionService =
07       TransactionService.getInstance();
08   }
09 }

Figure 5. Example of God Method

Code Clones. This mistake stems from the existence of cloned code in the source code. Code clones are similar pieces of code implementing the same concern in different modules [30]. As a result, the existence of code clones hampered the subjects in mapping all occurrences of similar code to a specific concern. In addition, if the same code clone is related to more than one concern, the subject tends not to map the same method to different concerns. Hence, a clone detection tool [30] is an important aid for detecting code clones during the mapping.

B. Module Characteristics

This category is associated with the modularity properties of the implementation elements. Basically, these properties are associated with entire classes and methods, attributes, interfaces, and super-classes.

Dedicated Implementation Elements. This mistake is related to the absence or incorrect mapping of an entire modularity unit. Dedicated implementation elements are defined as classes and methods totally responsible for implementing a concern. Intuitively, these elements are expected to be easier to map to the concern because their entire structure contributes to the concern implementation. However, this mistake was common in the OC framework, and the reasons can be associated with a series of module-specific properties. Some of the reasons found were: (i) names of classes, methods, and attributes that do not directly reflect the concern functionality, i.e., no naming pattern for variables and methods was followed during the system evolution; (ii) absence of detailed comments in the module code to help understand its purpose; and (iii) classes and methods that are misplaced in packages and classes, respectively.

    public class FolderAdminFrame extends GenericFrame {
        private ScenariumInfoTree infoTree;
        public FolderAdminFrame getFrame( GenericFrame, ScenariumInfoTree) {
            frame = GenericFrame.getGenericWindow(..);
            if (frame == null) {
                frame = new FolderAdminFrame(..);
            }
            return new FolderAdminFrame(...);
        }
    }

Figure 6. Example of Scenarium with Permission Concerns

Examples of concerns omitted in the OC framework were Scenarium and Notification. For instance, there is a Permission concern in the OC framework, which is responsible for defining permissions for several types of concerns, such as Scenarium and Product. Two possible steps followed by subjects to map the Permission concern are: (i) they identify the dedicated classes related to the Permission concern, and (ii) they select the exact occurrences of methods that implement the permission of the Scenarium. Figure 6 shows the FolderAdminFrame class, which is responsible for allowing access to the administrator tree and, more specifically, to the folders of the Scenarium concern, for example through the getFrame() method. This method is part of the Permission concern but it was not mapped by two subjects. The reason for this mistake is that subjects did not verify which methods implement the permission of the Scenarium concern. For this reason, this entire method was incorrectly mapped to the Permission concern. On the other hand, for the case where implementation elements were incorrectly mapped, we noticed that this mistake is also related to the existence of well-known bad smells: Feature Envy, God Method, and God Class [28]. In both cases, developers tend to incorrectly map parts of the class structure to the concern.

Attribute Mapping. This mistake is more specific and is related to the lack of class attributes in concern mappings [26]. We observed that this was a common mapping mistake, as the subjects rarely mapped attributes to a concern. The reason we found was that, in general, the subjects tend to focus mainly on the behavior implemented by classes and methods. They do not inspect the class data in order to map the attributes of each concern. Considering attributes in concern mappings is especially important, as they might indicate other types of implementation elements that take part in a concern implementation. Figure 7 illustrates an example where the two subjects only mapped the pieces of source code inside the checkUnsavedScenarium() method. They did not map the infos and selectedInfo attributes related to the Scenarium concern.

    public class MainDesktop {
        private Map<Code<ScenariumInfo>, ScenariumInfo> infos;
        private ScenariumInfo selectedInfo;
        private boolean checkUnsavedScenarium() {
            user = Client.getInstance().getUser();
            context = ServerMonitor.getInstance().
                getLoginContext();
            for (ScenariumInfo info:infos.values()) {
                ...
            }
            return false;
    }

Figure 7. Example of Scenarium Concern

Interfaces and Super-Classes. This was a common mistake, as the subjects map the main class but do not map its super-classes and implemented interfaces that should also be mapped. We believe the subjects focused on classes visible on their screen. They did not take into consideration super-classes and interfaces related to the application domain, which require navigating through the program's hierarchical structure. A similar case is the incorrect mapping of super-classes and interfaces. In this case, super-classes and interfaces were part of a framework or API realizing a different concern. However, the subjects incorrectly mapped them as being part of the concern under analysis. Figure 8 illustrates these two cases in the OC framework. In the first case, the NotificationServiceInterface interface was not mapped. It should be included in the mapping, as it belongs to the Notification concern and is responsible for the services of that concern. In the second case, the super-class was incorrectly mapped in the OC framework because the class AbstractAction is related to an API for manipulating users’ requests.

    // Omitted class for the Notification concern
    public class NotificationService implements
            NotificationServiceInterface {
        ...
    }

    // Incorrect mapping of the Scenarium concern
    public abstract class ScenariumAction
            extends AbstractAction {
        ...
    }

Figure 8. Code Fragment with Super Classes
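The omission case described above suggests a simple review aid: for every class already mapped to a concern, list the super-classes and interfaces it inherits from, so that the developer can decide which of them belong to the concern. The following is a minimal sketch of that idea using Java reflection. The class name HierarchyReview and the nested stand-in types are hypothetical; they only mimic the shape of the first case in Figure 8 and are not taken from the OC framework.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

final class HierarchyReview {

    // Hypothetical stand-ins that mimic the first case in Figure 8.
    interface NotificationServiceInterface { }
    static class NotificationService implements NotificationServiceInterface { }

    /**
     * Collects every super-class and interface reachable from a mapped class,
     * excluding java.lang.Object. The result is a list of candidates to review,
     * not a decision about which of them belong to the concern.
     */
    static Set<Class<?>> candidateSupertypes(Class<?> mapped) {
        Set<Class<?>> found = new LinkedHashSet<>();
        Deque<Class<?>> pending = new ArrayDeque<>();
        pending.push(mapped);
        while (!pending.isEmpty()) {
            Class<?> current = pending.pop();
            Class<?> parent = current.getSuperclass(); // null for interfaces and Object
            if (parent != null && parent != Object.class && found.add(parent)) {
                pending.push(parent);
            }
            for (Class<?> implemented : current.getInterfaces()) {
                if (found.add(implemented)) {
                    pending.push(implemented);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        for (Class<?> type : candidateSupertypes(NotificationService.class)) {
            System.out.println("Review for mapping: " + type.getSimpleName());
        }
    }
}
```

Such a listing only addresses the omission case. For the second case in Figure 8, where a framework or API super-class is wrongly included, the developer still has to judge whether each listed type realizes the concern under analysis.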

C. Correlating the Mapping Mistakes

The previous sections presented eight mapping mistakes grouped in two categories. These mistakes are not fully independent, and the occurrence of a mistake can (in)directly imply another one (and vice-versa). This section discusses the potential relationships between the mistakes observed in our target cases (Section III.A). Documenting such relationships also helps developers to understand and identify alternative reasons for a particular mapping imperfection. Figure 9 provides an overview of the mistake relationships, which are represented by “can be related” and “can influence” arrows connecting two mistakes. The term “can be related” represents the case where a mistake can be seen, and consequently quantified, from different perspectives depending on its granularity and specificity. The term “can influence” means that the existence of a mistake can affect the emergence of another one.

[Figure 9: diagram connecting the eight mapping mistakes (Dedicated Implementation Element, Attribute Mapping, Interfaces and Super-Classes, Code Clone, Multi-Partition Concern, Overly Communicative Concern, Concern Overlapping, Concern Interlacing) and Bad Smells through “can be related” and “can influence” arrows.]

Figure 9. Mapping Mistakes Relationships

The mistake classified as Dedicated Implementation Elements is the one with the most associations with other mistake categories (Figure 9). It can be related to the following mistakes: Concern Overlapping, Interfaces and Super-Classes, Multi-Partition Concern, Code Clone, and Overly Communicative Concern. For instance, Figure 8 shows an example where the interface NotificationServiceInterface was missed (false negative) in a concern mapping. We can see this mistake from two different perspectives: (1) the entire NotificationServiceInterface interface, i.e., the dedicated implementation element, was not mapped; or (2) the interface and super-class were not mapped. Hence, the mistake can be quantified as Dedicated Implementation Element and/or Interfaces and Super-Classes. The other relationships follow the same reasoning, and they all need to be analyzed case by case. This explanation also applies to the relationship between the following mistakes: Interfaces and Super-Classes and Overly Communicative Concern.

Another interesting relationship that we observed in our analyses is between Concern Interlacing and Attribute Mapping. This is due to the cascade effect that the former mistake can generate; for example, according to Figure 7, there is an interlace involving the Scenarium concern and other concerns related to the application core. This concern interlace might have affected the manifestation of the mapping mistake, which in turn caused the attributes to be omitted from the mapping. Analogous reasoning applies to the relationship between the Multi-Partition Concern and Code Clone mistakes.

V. EXPERIMENTAL EVALUATION

This section presents the evaluation of the mappings through two controlled experiments. It shows and discusses the main results in terms of how frequently the mistakes of each category presented in the previous section occur.

A. Experimental Procedures

We used two software systems in these experiments in order to verify the occurrence of mapping mistakes. These mappings were initially produced in our previous study [14] with a different purpose: to analyze the impact of concern mappings on crosscutting concern measures. The chosen systems were: (i) a typical Web-based system called Health Watcher and (ii) a software product line called MobileMedia. A set of (non-)crosscutting concerns was selected for each system. We selected five concerns from Health Watcher – Concurrency, Distribution, Exception Handling, Persistence, and View – and four concerns from MobileMedia – Exception Handling, Security, Sorting, and Favourites. Descriptions of these concerns are provided in Table II. Descriptions of the Exception Handling and Persistence concerns are provided in Table I. We chose these concerns because (i) they are representative in these software systems, and (ii) they are different in terms of functionality and granularity.

TABLE II. CONCERNS ANALYZED IN HEALTH WATCHER AND MOBILEMEDIA

System | Concern | Description
Health Watcher | Concurrency | It provides a control for avoiding inconsistent information stores in the system’s database.
Health Watcher | Distribution | It is responsible for externalizing the system services at the server side and supporting their distribution to the clients.
Health Watcher | View | It is responsible for processing the web requests submitted by the system users.
MobileMedia | Security | MobileMedia implements this concern to improve the user’s privacy. So, accesses to albums require authentication (i.e., login and password).
MobileMedia | Sorting | Provides a service for sorting media by the number of accesses.
MobileMedia | Favourite | Provides services to set favourite media and visualize them.

It is important to highlight that the OC framework does not contain detailed and extensive documentation; only the source code with comments is available (Section II.A). On the other hand, both Health Watcher and MobileMedia contain detailed documentation, such as the application domain, descriptions of the concerns, and the main classes involved in the system architecture [21, 22]. These systems were chosen as they were developed by different designers, which complicated the concern mapping process even further. In addition, this diversity was also important to analyze how representative the mistakes are from the perspective of different developers.

In the first experiment, 13 undergraduate Computer Science students in their final year of study mapped concerns onto Health Watcher. In the second experiment, 13 post-graduate (Master and PhD) students mapped concerns onto MobileMedia. All these students claimed to have knowledge of Java and object-oriented programming, Web technologies, database systems, UML, and Software Engineering. Before starting the experiment, it was explained to the subjects how the mapping activities should be done. We focused on manual mappings in order to maintain the goal of the study (Section III.B). Subjects involved in the study received the source code of four classes of each system: Health Watcher and MobileMedia. These classes were selected because they include all the selected concerns. Additionally, they are relevant classes because they belong to different layers in the system architecture.

The detection and quantification of the mapping mistakes were performed manually. First, we separated the mappings performed by each subject for each concern. Second, we analyzed each mapping and associated its errors with the types of mistake that we classified (Section IV). During our analyses, we counted only one type of mistake per wrongly mapped code fragment. We selected the mistake category that, according to the developer, was the main cause of the mapping mistake. It is also important to highlight that other reasons for the mistakes may exist, and this may be further explored in future work. However, we focused our analysis of the mappings on the perspective of our mistake classification derived from the OC framework (Section IV).
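The counting rule above (one main category per wrongly mapped fragment, then a total per category and the number of subjects who made each mistake) can be sketched as follows. This is only an illustration of the bookkeeping, since the analysis in the study was manual. The class MistakeTally, the Observation type, and the sample data in main are hypothetical and are not experimental results.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class MistakeTally {

    // The eight mistake categories described in Section IV.
    enum Category {
        DEDICATED_ELEMENT, CONCERN_OVERLAPPING, CONCERN_INTERLACING, ATTRIBUTE_MAPPING,
        CODE_CLONE, INTERFACES_AND_SUPER_CLASSES, MULTI_PARTITION, OVERLY_COMMUNICATIVE
    }

    /** One wrongly mapped code fragment, labeled with exactly one main category. */
    static final class Observation {
        final String subject;
        final String concern;
        final Category category;

        Observation(String subject, String concern, Category category) {
            this.subject = subject;
            this.concern = concern;
            this.category = category;
        }
    }

    /** Number of wrongly mapped fragments per category. */
    static Map<Category, Integer> occurrences(List<Observation> observations) {
        Map<Category, Integer> counts = new EnumMap<>(Category.class);
        for (Observation o : observations) {
            counts.merge(o.category, 1, Integer::sum);
        }
        return counts;
    }

    /** Distinct subjects who made each kind of mistake at least once. */
    static Map<Category, Set<String>> subjects(List<Observation> observations) {
        Map<Category, Set<String>> result = new EnumMap<>(Category.class);
        for (Observation o : observations) {
            result.computeIfAbsent(o.category, k -> new HashSet<>()).add(o.subject);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<Observation> sample = Arrays.asList(
            new Observation("S1", "Sorting", Category.MULTI_PARTITION),
            new Observation("S1", "Favourite", Category.MULTI_PARTITION),
            new Observation("S2", "Security", Category.ATTRIBUTE_MAPPING));
        System.out.println(occurrences(sample));
        System.out.println(subjects(sample));
    }
}
```

Keeping occurrences and distinct subjects as separate tallies mirrors the two quantities reported per mistake type: a subject who repeats a mistake in several fragments raises the occurrence count but is counted once as a subject.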

B. Quantifying the Mapping Mistakes

Figure 10 presents the occurrence rate of the mistakes in both Health Watcher and MobileMedia considering the 26 subjects involved in the experiments. This figure is organized in terms of the types of mistakes, the total for each type of mistake, and the total number of subjects that made it in both systems considering all the concerns. For the set of selected classes in Health Watcher, the following mistakes were not detected: Concern Overlapping, Code Clone, and Multi-Partition Concern. On the other hand, the following mistakes were not observed in MobileMedia: Code Clone, Overly Communicative Concern, and Interfaces and Super-Classes. Code Clone was the only mistake not observed in either system considering the set of selected classes in the experiments. The strategy that we followed to detect it was to search for code blocks with a similarity degree equal to or larger than 90% among the classes implementing the same concern. We realized that such code clones (and the respective mapping mistakes) could not be found in the target applications used in the study because they were frameworks; it was visible that the goal of their implementation was to minimize the occurrence of redundant code.
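The 90% similarity search mentioned above can be illustrated with a small sketch. The text does not state which similarity measure was used, so the measure below (Jaccard similarity over whitespace-normalized lines) is an assumption made only for illustration, as are the class and method names.

```java
import java.util.HashSet;
import java.util.Set;

final class CloneCheck {

    // Threshold taken from the text: similarity equal to or larger than 90%.
    static final double THRESHOLD = 0.90;

    /** Splits a code block into trimmed, whitespace-normalized, non-empty lines. */
    static Set<String> normalizedLines(String block) {
        Set<String> lines = new HashSet<>();
        for (String line : block.split("\n")) {
            String normalized = line.trim().replaceAll("\\s+", " ");
            if (!normalized.isEmpty()) {
                lines.add(normalized);
            }
        }
        return lines;
    }

    /** Assumed measure: shared normalized lines divided by all distinct lines. */
    static double similarity(String first, String second) {
        Set<String> a = normalizedLines(first);
        Set<String> b = normalizedLines(second);
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) common.size() / union.size();
    }

    static boolean isCloneCandidate(String first, String second) {
        return similarity(first, second) >= THRESHOLD;
    }

    public static void main(String[] args) {
        String left = "  x = 1;\n  y = 2;";
        String right = "x = 1;\ny   =   2;";
        System.out.println(isCloneCandidate(left, right));
    }
}
```

A line-based measure like this one ignores renamed identifiers and reordered statements; a dedicated clone detection tool, as suggested in Section IV, would handle those cases.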


According to these results, we observed that the subjects tend to make the same mistakes. For example, the mistake Dedicated Implementation Elements was evident for almost all subjects considering the following three concerns: Distribution, View, and Sorting. Generally, these mistakes tend to occur when there are few methods (1-2) inside a class. However, this mistake happened even in a class totally responsible for implementing the View concern (the ServletInsertEmployee class). We observed that the subjects mapped only some lines of code of this class to the concern. Subjects should instead have mapped it completely.

[Figure 10: horizontal bar chart. Axis: occurrence rate of the mistakes (0 to 140). Bars: Dedicated Element, Concern Overlapping, Concern Interlacing, Attribute Mapping, Clone Code, Interfaces and Super-Classes, Multi-Partition Concern, Overly Communicative Concern. Bar annotations give the number of subjects: 20, 5, 23, 15, 13, 10, 13.]

Figure 10. Occurrence Rate of the Mistakes in Health Watcher and MobileMedia

The Concern Interlacing mistake was mainly related to specific crosscutting concerns, namely: Exception Handling, Security, and Concurrency. The reason is that crosscutting concerns are tangled with other concerns in object-oriented systems [14-16, 22]. The mistake Multi-Partition Concern was more evident in fine-grained concerns, such as Favorites and Sorting. Similarly to crosscutting concerns, the implementation of these fine-grained concerns tends to be scattered over several methods. The mistakes Attribute Mapping and Interfaces and Super-Classes tend to uniformly occur with different kinds of concerns. We discuss below some mistakes grouped by concern.

The Distribution concern is related to the following mistakes: Dedicated Implementation Element, Overly Communicative Concern, and Interfaces and Super-Classes. Figure 11 illustrates an example of the mistake Interfaces and Super-Classes, where none of the subjects mapped the IFacade interface in the Health Watcher system. In addition, they also did not map the entire rmiFacadeExceptionHandling() method (lines 05-07), which led to the mistake Dedicated Implementation Element. Regarding the mistake Overly Communicative Concern, 13 subjects made it because there is a large interaction between classes of the Distribution concern and classes that access the database repository. As a consequence, they mapped the classes that access the database as being part of the Distribution concern. The same reasoning behind this mistake applies to the Persistence concern.

The mapping mistakes associated with the Security concern were Attribute Mapping and Concern Interlacing. Most of the subjects did not map the passwd attribute, which contributes to implementing this concern. In addition, the mapping of this concern includes lines of code tangled with other concerns. As a consequence, many subjects did not map all the lines of code. These mistakes are related and, in fact, one can influence the other (Figure 9). However, this is not always true and, for this reason, they were documented separately.

    01 public class HealthWatcherFacade
    02     extends java.rmi.server.UnicastRemoteObject
    03     implements IFacade {
    04     ...
    05     private void rmiFacadeExceptionHandling(..){
    06         ...
    07     }
    08 }

Figure 11. Piece of Code of the Distribution concern

    01 public boolean handleCommand(Command) {
    02     ...
    03     showMediaList(...);
    04     ...
    05     showMediaList(...);
    06 }
    07 public void showMediaList(...) {
    08     ...
    09     if (sort) {
    10         bubbleSort(medias);
    11     }
    12     ...
    13 }
    14 private void exchange(MediaData, int, int) {
    15     ...
    16 }
    17 public void bubbleSort(MediaData) {
    18     ...
    19 }

Figure 12. Piece of code of the Sorting Concern

The Sorting concern is implemented by the MediaListController and MediaData classes. This concern is scattered over four methods of the MediaListController class, as illustrated in Figure 12. Two of these methods – exchange() and bubbleSort() – are totally dedicated to implementing the Sorting concern. Two mistakes were recurrently detected in the mapping of this concern. First, most of the subjects (S2, S3, S5, S7, S12, S13) did not map at least one of these methods (Dedicated Element) or they incorrectly mapped other methods not related to this concern. Second, subjects only mapped part of the concern, as characterized by the Multi-Partition Concern mistake. In fact, most of the subjects did not map the method calls that contribute to implementing this concern (lines 03, 05, 07, 09-11). The Multi-Partition Concern mistake also occurred for the Favourite concern.

    01 try {
    02     password = getCurrentScreen();
    03     getAlbumData().createNewAlbum(..)
    04     getAlbumData().addPassword(..);
    05 } catch () {
    06     ...
    07 }

Figure 13. Piece of Code of the Exception Handling Concern

The mapping mistake mostly associated with the Exception Handling concern was Concern Interlacing. Figure 13 illustrates this mistake in the MobileMedia system. The main issue with the mapping of this concern is that most of the subjects mapped the entire code inside try-catch statements. That is, subjects tend to associate the entire try block with the Exception Handling concern. As illustrated in Figure 13, the code that actually belongs to this concern does not include lines 02 to 04. Basically, this mistake is due to the interlacing of the Exception Handling and Security concerns.

C. Non-Categorized Mistakes

During our analyses, we observed that some mistakes were not related to our proposed classification. For this reason, these mistakes were not considered in our count.

Non-related Lines of Code. We observed that some subjects mapped pieces of code totally unrelated to any of the selected concerns during the mapping process. These mapped lines of code were difficult to associate with the categories of mistakes that we previously identified. In general, these lines did not have any relationship with the other lines of code related to the concern that the subjects mapped. We characterized these lines of code as isolated pieces of code in the mapping. For this reason, it is complicated to associate them with a documented mistake. Therefore, it is worth investigating whether this case should be treated as a separate mistake category. One possibility is that subjects merely forgot to eliminate these “dead concern assignments” after they found the correct ones.

Categories of Multi-Partition Concern. We observed that the occurrence of Multi-Partition Concern mistakes might be classified in terms of impact. This means to calculate the number of different lines of code mapped. This is because the subjects tend to map different pieces of the code due to the scattering of the concern over several modules. This way, these different pieces of code could determine which degree of impact this mistake has. In our analyses, we only count the existence of Multi-Partition Concern mistake without establishing levels of impact for it.

In this sense, the classification of mistakes presented in this study is a first step for further improvements and, consequently, other types of mistakes can be added. For this reason, it is important to highlight that it is necessary to analyze other types of mistakes when considering other systems.

VI. THREATS TO VALIDITY

This section discusses the threats to validity according to the classification proposed by [31].

Conclusion Validity. We identified two possible threats in this category: (i) reliability of mappings: subjective decisions were made during the mapping process of each concern; and (ii) heterogeneity of subjects: two subjects were involved in the mapping of the OC framework and 26 subjects were involved in the mapping of Health Watcher and MobileMedia. To reduce risk (i) for all the applications, the subjects received instructions before starting the concern mapping. In the OC framework, the subjects studied the system before mapping, and there were meetings with the development team in order to obtain the needed knowledge. In the controlled experiments, the subjects received instructions and explanations about the system and concerns before starting the mapping. We tried to reduce risk (ii) by involving subjects with similar knowledge. In the OC framework, two experienced developers were responsible for this activity, whereas in Health Watcher and MobileMedia groups of 13 undergraduate and 13 post-graduate Computer Science students were selected, respectively (Section V.A).

Construct Validity. We identified the following risks: (i) mapping mistakes treated in an inadequate way: specific mistakes that should be treated in a different way might have biased the results; (ii) interaction of the subjects with the system: the subjects were aware of the proposed study. To reduce risk (i), we defined procedures to be followed during the concern mapping process and the quantification of the mistakes. Our quantification of the mistakes was performed manually, analyzing them from the perspective of each concern. However, other deductions could be performed as future work, mainly based on the relationships between the mapping mistakes. The mistakes that were not related to our categories were not considered in our analysis (Section V.C). Regarding risk (ii), the subjects were prepared to accomplish this task through the instructions they received before the mapping. These subjects were selected for their knowledge of the relevant topics addressed in the experiments. In addition, they took part in these experiments voluntarily.

Internal and External Validity. We only identified one possible risk for internal validity: the complexity of the concerns; this complexity might have led one subject to make more mistakes than others. However, this threat was minimized because all the subjects mapped the same set of concerns using the same versions of the system. Threats to external validity are conditions that limit the generalization of the results. The first identified risk was the selected applications. In order to minimize this threat, we chose an industry application from the logistics domain (OC framework), a typical Web-based system (Health Watcher) [15, 21], and a software product line (MobileMedia) [22]. All the applications are representative of different domains and they have a significant size. The OC framework is an industry case study that has been developed since 2006. Health Watcher and MobileMedia were extensively used and evaluated in previous research work [15, 17, 21-22]. In addition, these applications contain many types of (non-)crosscutting concerns with different degrees of complexity. This way, they enabled us to observe the differences among the mappings. Therefore, this study represents a first step towards a more complete classification. It is important to mention that we only evaluated a part of these systems in the controlled experiments (four classes of each one). As a consequence, we did not observe, for example, Code Clone mistakes in these classes.

VII. FINAL REMARK AND FUTURE WORK

Developers and maintainers constantly need to understand, restructure, and extend concerns during software maintenance activities. The activity of concern mapping is important to give developers full knowledge about all the implementation elements associated with a given concern. Even though there are many tools and techniques that facilitate the concern mapping process, developers still need to verify whether their mappings are correct and complete. However, the literature offers no characterization and classification of which mapping mistakes are most frequent.

To fill this gap, this paper presented and discussed a series of recurring mistakes made by developers when mapping a set of (non-)crosscutting concerns. Initially, we used an evolving industry system as an exploratory study in order to observe the mistakes. In this study we classified the mistakes into two categories, depending on whether a concern property or a module property was the most influential factor for the mistake. For each category, we described a set of mistakes and explained how they occur. As a second step, we verified the frequency of the mistake categories through two controlled experiments. These experiments were important to verify whether the mistakes categorized in this study are in fact representative and relevant. They involved 26 subjects and two different systems with various concerns. Our analyses demonstrated the occurrence of many of the mapping mistakes documented in this study.

As ongoing work, we are defining and formalizing a set of heuristic rules to help developers identify opportunities to improve their concern mappings. For example, one heuristic being defined detects omitted concern attributes. The basic idea of this heuristic is to first identify the methods already mapped to a given concern and then, based on the attribute access graphs, detect which class attributes are used by the mapped methods [32]. In addition, we intend to run other experiments in order to verify the accuracy of our heuristics. As future work, we intend to interview the subjects who will participate in these experiments with the goal of: (i) finding out the real causes of their mistakes, and (ii) verifying the coverage of our mistake taxonomy. Additionally, we intend to better explore the information about the evolution of concern mappings. The goal is to verify the representativeness of our mistakes as well as to capture other kinds of mistakes.
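The omitted-attribute heuristic outlined above can be illustrated with a minimal sketch. It is not the formalization given in [32]; it only assumes that the attribute access graph is available as a dictionary from method names to the attributes each method reads or writes. The function name `suggest_omitted_attributes` and all class, method, and attribute names in the example are hypothetical.

```python
from typing import Dict, Set


def suggest_omitted_attributes(
    mapped_methods: Set[str],
    mapped_attributes: Set[str],
    attribute_access: Dict[str, Set[str]],
) -> Set[str]:
    """Return attributes used by mapped methods but not yet mapped to the concern.

    attribute_access is an assumed, simplified attribute access graph:
    each method name maps to the set of attributes it accesses.
    """
    accessed: Set[str] = set()
    for method in mapped_methods:
        # Methods absent from the graph access no attributes.
        accessed |= attribute_access.get(method, set())
    # Attributes already mapped to the concern are not reported again.
    return accessed - mapped_attributes


# Hypothetical data: a class whose methods were partially mapped to one concern.
attribute_access = {
    "MediaController.showImage": {"MediaController.currentImage", "MediaController.screen"},
    "MediaController.addLabel": {"MediaController.label"},
    "MediaController.sortAlbum": {"MediaController.albumList"},
}
mapped_methods = {"MediaController.showImage", "MediaController.addLabel"}
mapped_attributes = {"MediaController.currentImage"}

candidates = suggest_omitted_attributes(mapped_methods, mapped_attributes, attribute_access)
print(sorted(candidates))
# prints ['MediaController.label', 'MediaController.screen']
```

The result is a set of candidates rather than a corrected mapping: an attribute accessed by a mapped method may also serve other concerns, so the developer still decides whether each suggestion belongs to the concern.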

ACKNOWLEDGMENT

This work has received full or partial funding from the following agencies and projects: for Camila - CNPq (141242/2009-4); for Alessandro - FAPERJ (distinguished scientist grant E-26/102.211/2009), CNPq (productivity grant 305526/2009-0) and PUC-Rio (productivity grant); for all the authors - Universal Project grants (483882/2009-7 and 483699/2009-8) and Marinha do Brasil (MDArte and MDEvol projects); for Eduardo - FAPEMIG grant APQ-02932-10.

REFERENCES

[1] Robillard, M. P. and Murphy, G. C. “Representing Concerns in Source Code”. ACM TOSEM 16, 1 (Feb. 2007).

[2] Eisenbarth, T., Koschke, R., and Simon, D. “Locating Features in Source Code”. IEEE TSE, v. 29, n. 3, pp. 210-224, 2003.

[3] Antoniol, G. and Gueheneuc, Y. “Feature Identification: A Novel Approach and a Case Study,” In ICSM, pp. 357-366, 2005.

[4] Biggerstaff, T. J., Mitbander, B. G., and Webster, D. E. “Program Understanding and the Concept Assignment Problem”. Commun. ACM 37, 5, pp. 72-82, 1994.

[5] Wilde, N. and Scully, M. “Software Reconnaissance: Mapping Program Features to Code”. Software Maintenance: Research and Practice, 7(1), pp. 49-62, 1995.

[6] Chen, K. and Rajlich, V. “Case Study of Feature Location Using Dependence Graph”. In IWPC, pp. 241-247, 2000.

[7] Wong, W. E., et al. “Locating Program Features using Execution Slices”. In ASSET. IEEE Computer Society, pp. 194-203, 1999.

[8] Eisenberg, A. and de Volder, K. “Dynamic Feature Traces: Finding Features in Unfamiliar Code”. In ICSM, pp. 337-346, 2005.

[9] Koschke, R. and Quante, J. “On Dynamic Feature Location”. In ASE, NY, pp. 86-95, 2005.

[10] Rohatgi, A., Hamou-Lhadj, A., and Rilling, J. “An Approach for Mapping Features to Code Based on Static and Dynamic Analysis”. In ICPC, v. 0, pp. 236-241, 2008.

[11] Poshyvanyk, D., et al. “Feature Location Using Probabilistic Ranking of Methods Based on Execution Scenarios and Information Retrieval”. IEEE Trans. Softw. Eng. 33, 6, pp. 420-432, 2007.

[12] Robillard, M. P., et al. “An Empirical Study of the Concept Assignment Problem,” Technical Report SOCS-TR-2007.3, School of Computer Science, McGill University, June 2007.

[13] Revelle, M., Broadbent, T., and Coppit, D. “Understanding Concerns in Software: Insights Gained from Two Case Studies”. In IWPC, pp. 23-32, 2005.

[14] Figueiredo, E. et al. “On the Impact of Crosscutting Concern Projection on Code Measurement”. In AOSD, Brazil, 2011 (to appear).

[15] Greenwood, P., et al. “On the Impact of Aspectual Decompositions on Design Stability: An Empirical Study”. In ECOOP, pp. 176-200, 2007.

[16] Figueiredo, E., et al. “Crosscutting Patterns and Design Stability: An Exploratory Analysis”. In ICPC, pp. 138-147, 2009.

[17] Ferrari, F., et al. “An Exploratory Study of Fault-Proneness in Evolving Aspect-Oriented Programs”. In ICSE, pp. 65-74, 2010.

[18] Garcia, A., et al. “Modularizing Design Patterns with Aspects: a Quantitative Study”. In AOSD, pp. 3-14, 2005.

[19] Figueiredo, E., et al. “On the Maintainability of Aspect-Oriented Software: A Concern-Oriented Measurement Framework”. In CSMR, pp. 183-192, 2008.

[20] Robillard, M. P. and Weigand-Warr, F. “ConcernMapper: Simple View-based Separation of Scattered Concerns”. In OOPSLA Workshop on Eclipse Technology Exchange, pp. 65-69, 2005.

[21] Soares, S., et al. “Implementing Distribution and Persistence Aspects with AspectJ”. In OOPSLA, pp. 174-190, 2002.

[22] Figueiredo, E., et al. “Evolving Software Product Lines with Aspects: An Empirical Study on Design Stability”. In ICSE, pp. 261-270, 2008.

[23] Mens, K., et al. “Co-Evolving Code and Design with Intensional Views”. Comput. Lang. Syst. Struct. 32, 2-3, pp. 140-156, Jul 2006.

[24] Savage, T., Revelle, M., and Poshyvanyk, D. “FLAT3: Feature Location and Textual Tracing Tool”. In ICSE – Vol 2. pp. 255-258, 2010.

[25] Tourwe, T. and Mens, K. “Mining Aspectual Views using Formal Concept Analysis”. In SCAM, Washington, DC, pp. 97-106, 2004.

[26] Adams, B., et al. “Identifying Crosscutting Concerns Using Historical Code Changes”. In ICSE – Vol. 1, pp. 305-314, 2010.

[27] Kellens, A., Mens, K., and Tonella, P. “A Survey of Automated Code-Level Aspect Mining Techniques”. In Transactions on Aspect Oriented Software Development, pp. 145-164, 2007.

[28] Fowler, M., et al. Refactoring: Improving the Design of Existing Code, Addison-Wesley Professional, 1999.

[29] Carneiro, G., et al. “Identifying Code Smells with Multiple Concern Views”. In SBES. Bahia, Brazil, pp. 128-137, 2010.

[30] Kim, M., et al. “An Empirical Study of Code Clone Genealogies”. SIGSOFT Softw. Eng. Notes 30, 5, pp. 187-196, 2005.

[31] Wohlin, C., et al. Experimentation in Software Engineering – An Introduction. Kluwer Academic Publishers, 2000.

[32] Nunes, C. “On the Proactive Identification of Mistakes on Concern Mapping Tasks”. In AOSD - Student Research Competition, 2011 (to appear).
