
D.-S. Huang, K. Li, and G.W. Irwin (Eds.): ICIC 2006, LNAI 4114, pp. 959 – 964, 2006. © Springer-Verlag Berlin Heidelberg 2006

Software Metrics Data Clustering for Quality Prediction

Bingbing Yang, Xin Zheng, and Ping Guo*

Image Processing and Pattern Recognition Laboratory, Beijing Normal University, 100875, China

[email protected], [email protected]

Abstract. Software metrics are collected at various phases of the software development process. These metrics encode information about the software and can be used to predict software quality early in the software life cycle. Intelligent computing techniques such as data mining can be applied to the study of software quality by analyzing software metrics. Clustering analysis, a data mining technique, is adopted here to build software quality prediction models in the early period of software testing. In this paper, three clustering methods, k-means, fuzzy c-means, and the Gaussian mixture model, are investigated for the analysis of two real-world software metric datasets. The experimental results show that the best method for predicting software quality depends on the particular dataset, and that clustering analysis has advantages in software quality prediction since it can be applied when little prior knowledge is available.

1 Introduction

Software reliability engineering is one of the most important aspects of software quality [1]. Recent studies show that software metrics can be used to predict software module fault-proneness [2]. A software module has a series of metrics, some of which are related to the module's fault-proneness. Fenton and Neil [2] gave an introduction to software metrics and outlined future directions for the field. Much research on software quality prediction exploiting the relationship between software metrics and module fault-proneness has been done in recent decades [3, 4, 7].

Several techniques have been proposed to classify modules in order to identify fault-prone ones. From a machine learning point of view, these techniques fall into two categories: supervised learning and unsupervised learning. Supervised learning methods rely on prior knowledge: enough labeled samples must be provided to train the prediction model. Unsupervised learning methods, in contrast, do not need prior knowledge. Clustering analysis, which groups software modules by their metrics, is one of the unsupervised learning methods.

In this paper, three clustering methods, k-means, fuzzy c-means, and the Gaussian mixture model with the EM algorithm, are used to predict the quality of two software projects. The purpose is to perform a comparative study of these clustering methods on new datasets in order to determine the most suitable technique for building the prediction model. In Section 2, the datasets are introduced and the chosen metrics are described. In Section 3, the three clustering methods are reviewed. In Section 4, the empirical study results are presented and analyzed. Finally, discussion and conclusions are given in Section 5.

* Corresponding author.

2 Datasets and Metrics Description

In this section, we introduce the metrics datasets extracted from two real-world software projects. The metrics are extracted with the Krakatau Professional tool, which can extract more than twenty metrics at the project, file, and function level. In the experiments, we choose a subset of common metrics from those extracted.

2.1 Celestial Spectrum Analysis System

The Celestial Spectrum Analysis System (CSAS) is a subsystem of the Large Sky Area Multi-Object Spectroscopic Telescope (LAMOST), a national key project of China. CSAS was developed in standard C by more than 20 undergraduate and graduate students from two colleges, and it is well tested with respect to the functions of the project. Each function is considered one module; CSAS comprises 70 modules, each of which has a risk level obtained by software testing. We select 12 metrics from the software source code for analysis.

2.2 Redundant Strapped-Down Inertial Measurement Unit Project

The Redundant Strapped-Down Inertial Measurement Unit (RSDIMU) project involved more than one hundred students. RSDIMU has 34 versions, each developed independently in standard C. The reliability properties of the software were analyzed by Cai and Lyu [8]; the details of the project and its development procedures are discussed in [5]. In this paper, we use all the modules of every version, 223 in total, each of which has a recorded number of faults. Unlike CSAS, RSDIMU's modules are based on files: each file is counted as one module. We select 11 metrics from the source code for analysis.

3 Modeling Methodology

In this section, we review the three modeling methods mentioned above. The methods chosen for comparison are classical clustering methods.

3.1 K-Means Clustering

K-means is one of the classical clustering methods and has been widely used in many fields. K-means partitions the objects into k clusters so that objects within the same cluster are highly similar. The squared-error criterion to be minimized is defined as

E = \sum_{i=1}^{k} \sum_{p \in C_i} \| p - m_i \|^2        (1)


where E is the sum of the squared error over all objects in the database, p is a point in space representing a given object, and m_i is the mean of cluster C_i. The algorithm attempts to determine k partitions that minimize the squared-error function.
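As a concrete illustration (not the authors' code), the criterion in Eq. (1) can be minimized by the standard alternating procedure. The NumPy sketch below uses a toy two-group "metric" matrix and a deterministic farthest-point initialization; both are our own assumptions, since the paper does not specify the seeding or the data layout.

```python
import numpy as np

def kmeans(X, k, iters=100):
    """Minimal k-means sketch: alternately assign each object to the
    nearest cluster mean and recompute the means, which monotonically
    decreases the squared error E of Eq. (1)."""
    # deterministic farthest-point initialization (an assumption; the
    # paper does not say how the means are seeded)
    means = [X[0]]
    for _ in range(1, k):
        d2 = np.min(((X[:, None, :] - np.array(means)[None]) ** 2).sum(2), axis=1)
        means.append(X[d2.argmax()])
    means = np.array(means, dtype=float)
    for _ in range(iters):
        # squared distance of every object p to every cluster mean m_i
        d2 = ((X[:, None, :] - means[None]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_means = np.array([X[labels == i].mean(axis=0)
                              if np.any(labels == i) else means[i]
                              for i in range(k)])
        if np.allclose(new_means, means):
            break
        means = new_means
    E = ((X - means[labels]) ** 2).sum()  # value of Eq. (1)
    return labels, means, E

# toy "software metric" vectors: two well-separated module groups
X = np.vstack([np.zeros((5, 2)), np.full((5, 2), 10.0)])
labels, means, E = kmeans(X, k=2)
```

On this toy data the two groups are recovered exactly and E reaches zero; on real metric data the result depends on initialization and scaling.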

3.2 Fuzzy C-Means Clustering

Fuzzy clustering differs from other clustering methods. K-means, for example, produces a hard partition: with traditional clustering methods an object either is a member of one particular subset or it is not. Fuzzy clustering instead permits an object to belong partly to more than one subset, to a degree indicated by its membership value in each cluster, so long as the object's membership values sum to 1.

Fuzzy c-means clustering is an iterative algorithm that clusters by minimizing the cost function

L(f) = \sum_{x \in X} \sum_{k \in K} \left( f(x)(k) \right)^{m} d^{2}(x, k)        (2)

where f is a fuzzy partition, f(x)(k) is the membership of pattern x in cluster k, m is the fuzzifier exponent, and d(x, k) is the distance between pattern x and the prototype (centroid) of the kth cluster. A more detailed introduction can be found in reference [6].
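To make the updates concrete, the sketch below implements the standard alternating membership and prototype updates that minimize Eq. (2), with Euclidean distance and fuzzifier m = 2. The toy data and the deterministic farthest-point seeding are our own assumptions, not the paper's setup.

```python
import numpy as np

def fuzzy_cmeans(X, k, m=2.0, iters=300, eps=1e-9):
    """Fuzzy c-means sketch: alternate the standard membership and
    centroid updates that minimize the cost L(f) of Eq. (2)."""
    # deterministic farthest-point prototypes to seed the iteration
    C = [X[0]]
    for _ in range(1, k):
        d2 = np.min(((X[:, None, :] - np.array(C)[None]) ** 2).sum(2), axis=1)
        C.append(X[d2.argmax()])
    C = np.array(C, dtype=float)
    U = None
    for _ in range(iters):
        d2 = ((X[:, None, :] - C[None]) ** 2).sum(axis=2) + eps
        inv = d2 ** (-1.0 / (m - 1.0))          # standard FCM membership update
        U_new = inv / inv.sum(axis=1, keepdims=True)
        if U is not None and np.allclose(U_new, U, atol=1e-12):
            break
        U = U_new
        W = U ** m                               # fuzzified memberships f(x)(k)^m
        C = (W.T @ X) / W.sum(axis=0)[:, None]   # weighted prototypes (centroids)
    return U, C

# toy data: two well-separated groups of "module metric" vectors
X = np.vstack([np.zeros((5, 2)), np.full((5, 2), 10.0)])
U, C = fuzzy_cmeans(X, k=2)
```

Each row of U sums to 1, as the membership constraint above requires; on this toy data the memberships are nearly crisp, while on overlapping real data they stay genuinely fuzzy.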

3.3 Gaussian Mixture Model with EM Algorithm

The Gaussian mixture model is based on the Gaussian probability distribution. Assuming there are k partitions in a data set, there will be k Gaussian components forming a mixture with joint probability density:

p(x, \Theta) = \sum_{i=1}^{k} \alpha_i \, G(x, m_i, \Sigma_i), \quad \alpha_i \ge 0, \quad \sum_{i=1}^{k} \alpha_i = 1        (3)

G(x, m_i, \Sigma_i) = \frac{1}{(2\pi)^{d/2} \, |\Sigma_i|^{1/2}} \exp\!\left[ -\frac{1}{2} (x - m_i)^{T} \Sigma_i^{-1} (x - m_i) \right]        (4)

Here G is the multivariate Gaussian density function, x denotes a random vector (which collects a variety of software metrics), d is the dimension of x, and the parameter set Θ = {α_i, m_i, Σ_i} holds the finite mixture model parameters. α_i is the mixing weight, m_i is the mean vector, and Σ_i is the covariance matrix of the ith component [7].

To estimate the mixture model parameters, we use maximum likelihood (ML) learning with the EM algorithm, which is described in detail in [7].
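As a sketch of this estimation (our own simplified implementation, restricted to spherical covariances Σ_i = σ_i²I rather than the full matrices of Eq. (4)), EM alternates computing posterior responsibilities (E-step) and re-estimating α_i, m_i, and Σ_i (M-step):

```python
import numpy as np

def gmm_em(X, k, iters=100):
    """EM sketch for a Gaussian mixture with spherical covariances,
    a simplified form of Eqs. (3)-(4)."""
    n, d = X.shape
    # deterministic farthest-point initialization of the means (an assumption)
    means = [X[0]]
    for _ in range(1, k):
        d2 = np.min(((X[:, None, :] - np.array(means)[None]) ** 2).sum(2), axis=1)
        means.append(X[d2.argmax()])
    means = np.array(means, dtype=float)
    alphas = np.full(k, 1.0 / k)
    var = np.full(k, X.var() + 1e-6)
    for _ in range(iters):
        # E-step: responsibilities r[j, i] = P(component i | x_j)
        d2 = ((X[:, None, :] - means[None]) ** 2).sum(axis=2)
        logp = np.log(alphas) - 0.5 * d2 / var - 0.5 * d * np.log(2 * np.pi * var)
        logp -= logp.max(axis=1, keepdims=True)   # stabilize the softmax
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate alpha_i, m_i, var_i from the responsibilities
        Nk = r.sum(axis=0)
        alphas = Nk / n
        means = (r.T @ X) / Nk[:, None]
        d2 = ((X[:, None, :] - means[None]) ** 2).sum(axis=2)
        var = (r * d2).sum(axis=0) / (Nk * d) + 1e-6
    return alphas, means, r

# toy data: two tight Gaussian blobs of "module metric" vectors
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(10, 0.1, (20, 2))])
alphas, means, r = gmm_em(X, k=2)
```

Assigning each module to the component with the highest responsibility turns the fitted mixture into a clustering, which is how the model is used for prediction here.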

4 Experiments and Analysis

A clustering method can group the software modules into any number of clusters. In the experiments, we set the number of clusters to 2 or 3 in order to compare the methods' performance. This choice is meaningful because if the number of module groups is greater than 3, the result gives the testing manager little useful information for planning the tests.


In practice, not only the overall classification accuracy is important, but also the purity of the fault-prone cluster. If the purity of the fault-prone cluster is low, many fault-prone modules are not predicted accurately; they may then not be tested thoroughly in the testing phase, and the quality of the software product will suffer. Therefore, in our experiments we also treat the purity of the fault-prone cluster as an important criterion.
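The evaluation we assume here (the paper does not spell out the computation) can be sketched as: map each predicted cluster to its majority true risk class, then report overall accuracy and the purity of each cluster.

```python
import numpy as np

def evaluate(pred, truth, k):
    """Map each predicted cluster to its majority true class, then
    compute overall accuracy and per-cluster purity (assumed scheme)."""
    mapping, purity = {}, {}
    for c in range(k):
        members = truth[pred == c]
        if len(members) == 0:
            mapping[c], purity[c] = None, 0.0
            continue
        classes, counts = np.unique(members, return_counts=True)
        mapping[c] = classes[counts.argmax()]           # majority class
        purity[c] = counts.max() / len(members)         # cluster purity
    mapped = np.array([mapping[c] for c in pred])
    accuracy = (mapped == truth).mean()
    return accuracy, purity

# toy example: 2 predicted clusters vs. a binary fault-prone label
pred = np.array([0, 0, 0, 0, 1, 1, 1, 1])
truth = np.array([0, 0, 0, 1, 1, 1, 1, 0])
acc, purity = evaluate(pred, truth, k=2)
```

In this toy example each cluster contains one mislabeled module, so both purities and the overall accuracy are 75%.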

4.1 Experiment with CSAS

Every CSAS module has been tested and has a risk level; the lower the risk level, the less fault-prone the module. With 2 target clusters, the non-fault-prone cluster contains modules with risk level 0 and the fault-prone cluster those with risk level greater than 0. With 3 target clusters, the non-fault-prone cluster contains modules with risk level 0, the medium-risk cluster those with risk level between 0 and 1, and the fault-prone cluster those with risk level greater than 1.
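The risk-level thresholds above can be written as a small labeling helper. This is a sketch; the handling of a risk level of exactly 1 is our own assumption, since the text only says the medium cluster lies between 0 and 1.

```python
def risk_class(level, n_clusters):
    """Map a module's tested risk level to a target class index:
    0 = non-fault-prone, highest index = fault-prone."""
    if n_clusters == 2:
        return 0 if level == 0 else 1
    if level == 0:                       # non-fault-prone
        return 0
    return 1 if level <= 1 else 2        # medium-risk vs. fault-prone (assumed boundary)

# example: three target clusters
classes = [risk_class(v, 3) for v in (0, 0.5, 2)]
```

The analogous binning on fault counts is used for RSDIMU in Section 4.2.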

Table 1 shows the results for the 2-cluster case, and Table 2 shows the results for the 3-cluster case.

Table 1. Results for CSAS with 2 target clusters. Rows list the methods; columns give the overall accuracy and the purity of each target cluster.

Method          Overall   High-risk   Low-risk
K-means         68.60%    76.80%      64.30%
Fuzzy c-means   61.40%    72.40%      8.30%
Mixture model   55.70%    69.20%      38.70%

Table 2. Results for CSAS with 3 target clusters. Rows list the methods; columns give the overall accuracy and the purity of each target cluster.

Method          Overall   High-risk   Medium-risk   Low-risk
K-means         75.70%    91.70%      46.20%        66.70%
Fuzzy c-means   58.60%    46.70%      33.30%        80.00%
Mixture model   57.10%    70.20%      33.30%        30.00%

From these results, we see that k-means performs best of the three methods. In the 2-cluster case, k-means gives the highest overall accuracy (68.60%) and fault-prone purity (76.80%). In the 3-cluster case, k-means again gives the highest overall accuracy (75.70%) and fault-prone purity (91.70%). We also see that, except for fuzzy c-means, the 3-cluster results are better than the 2-cluster results, and even for fuzzy c-means the 3-cluster accuracy is only slightly lower. Overall, 3 clusters work better than 2 on this dataset.

4.2 Experiment with RSDIMU

RSDIMU's modules differ from CSAS's: they are based on files rather than functions, so the metrics used in this experiment are not the same as for CSAS. Another difference is that RSDIMU's modules do not have a risk level but instead a number of faults. With 2 target clusters, the non-fault-prone cluster contains modules with 0 faults and the fault-prone cluster those with more than 0 faults. With 3 target clusters, the non-fault-prone cluster contains modules with 0 faults, the medium-risk cluster those with 1 to 2 faults, and the fault-prone cluster those with more than 2 faults.

Table 3 shows the results for the 2-cluster case, and Table 4 shows the results for the 3-cluster case.

Table 3. Results for RSDIMU with 2 target clusters. Rows list the methods; columns give the overall accuracy and the purity of each target cluster.

Method          Overall   High-risk   Low-risk
K-means         55.60%    48.40%      64.90%
Fuzzy c-means   55.60%    48.40%      64.90%
Mixture model   72.20%    81.90%      64.10%

Table 4. Results for RSDIMU with 3 target clusters. Rows list the methods; columns give the overall accuracy and the purity of each target cluster.

Method          Overall   High-risk   Medium-risk   Low-risk
K-means         52.00%    58.70%      19.60%        63.90%
Fuzzy c-means   50.20%    66.70%      21.60%        64.00%
Mixture model   55.30%    78.70%      52.90%        35.00%

Here the results are quite the opposite of those for CSAS: the mixture model with EM performs best of the three methods. In the 2-cluster case, it gives the highest overall accuracy (72.20%) and fault-prone purity (81.90%); in the 3-cluster case, it again gives the highest overall accuracy (55.30%) and fault-prone purity (78.70%). We also see that for all three clustering methods the 2-cluster results are better than the 3-cluster results, so 2 clusters work better on this dataset.

5 Conclusions and Future Work

Analyzing the experimental results, we conclude that a clustering method that does well on one dataset may not also do well on another. Moreover, the number of target clusters affects the prediction precision. We also find that clustering analysis is an effective method for predicting software quality in the early stages of software development.

A proper clustering method can effectively help a software manager predict software quality, and it does not need any software fault measurements, which would otherwise have to be prepared before the test phase. However, how to select the proper method and the number of target clusters remain open challenges. In future work, we will focus on choosing the method automatically with new algorithms; a possible solution is to analyze the distribution of modules in combination with cluster number selection criteria [9] to determine the suitable number of target clusters.


References

1. Lyu, M.R.: Handbook of Software Reliability Engineering. IEEE Computer Society Press, McGraw-Hill (1996)

2. Fenton, N.E., Neil, M.: Software metrics: successes, failures and new directions. The Journal of Systems and Software 47 (1999) 149-157

3. Gyimothy, T., Ferenc, R., Siket, I.: Empirical Validation of Object-Oriented Metrics on Open Source Software for Fault Prediction. IEEE Transactions on Software Engineering, Vol. 31, No. 10 (2005)

4. Lanning, D.L., Khoshgoftaar, T., Pandya, A.S.: A comparative study of pattern recognition techniques for quality evaluation of telecommunications software. IEEE J. Selected Areas in Communication, Vol. 12, No. 2 (1994) 279–291

5. Lyu, M.R., Huang, Z., Sze, K.S., Cai, X.: An empirical study on testing and fault tolerance for software reliability engineering. In Proceedings 14th IEEE International Symposium on Software Reliability Engineering (ISSRE’2003). Denver, Colorado (2003) 119-130

6. Dick, S., Meeks, A., Last, M., Bunke, H., Kandel, A.: Data mining in software metrics databases. Fuzzy Sets and Systems 145 (2004) 81-110

7. Guo, P., Lyu, M.R.: Software Quality Prediction Using Mixture Models with EM Algorithm. Proceedings of the First Asia-Pacific Conference on Quality Software (APAQS 2000), ed. by TSE & CHEN. Hong Kong. (2000) 69-78

8. Cai, X., Lyu, M.R.: An Empirical Study on Reliability Modeling for Diverse Software Systems. Proceedings of the 15th International Symposium on Software Reliability Engineering (2004)

9. Guo, P., Chen, C.L.P., Lyu, M.R.: Cluster Number Selection for a Small Set of Samples Using the Bayesian Ying-Yang Model. IEEE Transactions on Neural Networks, Vol. 13, No. 3 (2002) 757-763