ON PREDICTING SOFTWARE DEVELOPMENT EFFORT USING MACHINE LEARNING TECHNIQUES AND LOCAL DATA.pdf

Download ON PREDICTING SOFTWARE DEVELOPMENT EFFORT USING MACHINE LEARNING TECHNIQUES AND LOCAL DATA.pdf

Post on 14-Apr-2018

214 views

Category:

Documents

0 download

Embed Size (px)

TRANSCRIPT

<ul><li><p>7/29/2019 ON PREDICTING SOFTWARE DEVELOPMENT EFFORT USING MACHINE LEARNING TECHNIQUES AND LOCAL DATA.pdf</p><p> 1/14</p><p>On Predicting Software Development Effort using Machine Learning Techniques and Local Data</p><p>Vol. 2, No. 2, July-December 2010 12</p><p>* Institute of Information Technology in Management, Faculty of Economics and Management, University of Szczecin, PolanE-mail: lukrad@uoo.univ.szczecin.pl, woodieh@gmail.com</p><p>INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND COMPUT</p><p> International Science Press, ISSN: 2229-7</p><p>ON PREDICTING SOFTWARE DEVELOPMENT EFFORT USING</p><p>MACHINE LEARNING TECHNIQUES AND LOCAL DATA</p><p>Lukasz Radlinski and Wladyslaw Hoffmann</p><p>This paper analyses the accuracy of predictions for software development effort using various machine learning techniques. Tmain aim is to investigate the stability of these predictions by analyzing if particular techniques achieve a similar level accuracy for different datasets. Two key assumptions are that (1) predictions are performed using local empirical data and very little expert input is required. The study involves using 23 machine learning techniques with four publicly availabdatasets: COCOMO, Desharnais, Maxwell and QQDefects. The results show that the accuracy of predictions for each techniqvaries depending on the dataset used. With feature selection most techniques provide higher predictive accuracy and this accurais more stable across different datasets. The highest positive impact of feature selection on the accuracy has been observed for tK* technique, which has generated the most accurate predictions across all datasets.</p><p>Keywords: effort prediction, local data, process factors, machine learning, prediction accuracy</p><p>1. INTRODUCTION</p><p>Software development effort can be predicted usingvarious approaches. Some of them require largedataset of past projects while others require stronginput from domain expert. Some approaches areeasy to use while others are difficult to follow andtime consuming. Jrgensen and Shepperd [10]performed a systematic review of softwaredevelopment effort estimation studies. They haveprovided and analyzed an extensive list of relevantpublications in their paper. They have alsoperformed various classifications for this subject.</p><p>In this study, we focus on the subfield of thispopular research field by attempting to answer thefollowing main question: Is it possible to easilypredict software development effort from localdata?</p><p>There are two key constraints limiting the scope</p><p>of this study: easy prediction and using local data.By easy prediction we mean using such procedure,which can be followed even without havingbackground in quantitative statistical techniques.Some of such techniques can often be properlyapplied only after performing various checksrelated to input data, for example about the</p><p>normality of distributions or linear relationshibetween variables to name just a couple of thsimplest. The procedures of using such techniqumay be complex and time-consuming. Thus, in thstudy we examine a performance of a set of machinlearning techniques that can be applied easily the empirical input data. Naturally, they alsrequire some basic data preparation but this step</p><p>very simple.The second constraint is to use local data. B</p><p>this, we mean data from a single software companor a software development division of a largorganization. Various researchers have analyzeeffort predictions from local and cross-compandata. In [12, 16] the authors compare the accuracof predictions from local and cross-company datincluding results achieved in earlier work by othauthors. This analysis revealed that in somexperiments similar accuracy have been achieve</p><p>while in other experiments predictions based olocal data have been more accurate than based ocross-company data. When gathering croscompany data there are typically more peopinvolved than when gathering local datFurthermore, for cross-company data there islarger risk in different understanding the data b</p>mailto:lukrad@uoo.univ.szczecin.plmailto:woodieh@gmail.comINTERNATIONALmailto:woodieh@gmail.comINTERNATIONALmailto:woodieh@gmail.comINTERNATIONALmailto:lukrad@uoo.univ.szczecin.pl</li><li><p>7/29/2019 ON PREDICTING SOFTWARE DEVELOPMENT EFFORT USING MACHINE LEARNING TECHNIQUES AND LOCAL DATA.pdf</p><p> 2/14</p><p>Lukasz Radlinski and Wladyslaw Hoffmann</p><p>124 International Journal of Software Engineering and Computi</p><p>people who gather them. For example, theassessment very high for team experience ishighly subjective not only for different individualsbut also from a perspective of specific company, itsenvironment and culture. In addition, there arevarious ways of calculating project effort depending</p><p>on the types of activities included, people involvedin development, and many others. For thesereasons, to reduce the difficulties associated withusing cross-company data, we have decided to useonly local data of software projects.</p><p>To support answering the main researchquestion of this paper we analyze the following setof more detailed questions:</p><p> What is the accuracy of predictionsobtained using various machine learningtechniques?</p><p> What are the differences betweenpredictions using these techniques?</p><p> What is the impact of feature selection onachieved accuracy of predictions?</p><p> How stable are predictions from particulartechnique across various datasets?</p><p>This paper is organized as follows: Section 2provides information on the procedure, techniquesand datasets used in this experiment. Main pointsof the data preparation stage have beensummarized in Section 3. Section 4 discusses</p><p>achieved results. An overview of the threats tovalidity has been included in Section 5. We sum upthe whole study in Section 6.</p><p>2. RESEARCH APPROACH</p><p>2.1.Dataset Used</p><p>In this study we have used four publicly availabledatasets: COCOMO [2, 3], Desharnais [5], Maxwell</p><p>Table 1Summary of Datasets Used</p><p>Datasets Number of Project Effort Number of numeric Number of ordinal Number of nomin</p><p>cases size predictors predictors predict</p><p>COCOMO 93 KLOC person- 2 15[2, 3] months</p><p>Desharnais 81 function person- 7 0[5] points hours</p><p>Maxwell 62 function person- 3 15[15] points hours</p><p>QQDefects 29 KLOC person- 1 27[6, 7] hours</p><p>[15], and QQDefects [6, 7]. They have beesummarized in Table 1. All datasets are availabin the PROMISE repository [4]. Most of thedatasets, i.e. COCOMO, Desharnais, and Maxwehave been widely used in earlier experimentSelected results from these studies have bee</p><p>discussed in Section 4.3.QQDefects has been originally used in [6, 7] </p><p>predict the number of defects. In this study we hanot used a variable number of defects and attemto use the dataset for effort prediction. However,slightly modified version of this dataset has beeused. This includes an additional predictor projetype and additional cases for which the data hnot been previously published. It should be notethat QQDefects is the only dataset where thnumber of variables (31) exceeds the number </p><p>cases (29).Tables 25 list the variables in particular datas</p><p>used in this study. Most original datasets contaother variables. However, since they are nrelevant to this study, we have removed them annot listed in these tables. Detailed descriptions datasets and variables are available in relevaoriginal studies.</p><p>We can observe that there are high differencbetween these datasets, related to the number cases and number of predictors of specific type</p><p>Additionally, projects in different datasets atypically described from different point of view. Fexample, Maxwell dataset contains data on thdevelopment process but is more focused on thnature of projects. On the other hand, in QQDefecdataset almost all variables describe thdevelopment process. Furthermore, very fevariables have been included in all datasets, </p></li><li><p>7/29/2019 ON PREDICTING SOFTWARE DEVELOPMENT EFFORT USING MACHINE LEARNING TECHNIQUES AND LOCAL DATA.pdf</p><p> 3/14</p><p>On Predicting Software Development Effort using Machine Learning Techniques and Local Data</p><p>Vol. 2, No. 2, July-December 2010 12</p><p>Table 2List of Variables in COCOMO Dataset</p><p>Symbol Name Type Symbol Name Type</p><p>projectname name of project nominal acap analysts capability ordina</p><p>cat category of application nominal aexp application experience ordina</p><p>forg flight or ground system nominal pcap programmers capability ordina</p><p>center which NASA center nominal vexp virtual machine experience ordina</p><p>mode semidetached, embedded, organic nominal lexp language experience ordina</p><p>rely required software reliability ordinal modp modern programing ordinapractices</p><p>data data base size ordinal tool use of software tools ordina</p><p>cplx process complexity ordinal sced schedule constraint ordina</p><p>time time constraint for cpu ordinal year year of development numer</p><p>stor main memory constraint ordinal kloc number of KLOC numer</p><p>virt machine volatility ordinal effort development effort numer</p><p>turn turnaround time ordinal</p><p>Table 3List of Variables in Desharnais Dataset</p><p>Symbol Name Type Symbol Name Type</p><p>TeamExp team experience numeric Entities number of entities numer</p><p>Language programming language nominal FPNon function points non- numerAdjust adjusted</p><p>ManagerExp manager experience numeric Adjustment function point complexity numeradjustment factor</p><p>YearEnd year project ended numeric Effort actual effort numer</p><p>Transactions number of transactions numeric</p><p>Table 4List of Variables in Maxwell Dataset</p><p>Symbol Name Type Symbol Name Type</p><p>App application type nominal Complex software logical complexity ordina</p><p>Har hardware platform nominal Reqvol requirements volatility ordina</p><p>Dba type of database nominal Qualreq quality requirements ordina</p><p>Ifc type of user interface nominal Effreq efficiency requirements ordina</p><p>Source where developed nominal Instreq installation requirements ordina</p><p>Telonuse Telon use nominal Stanskil staff analysis skills ordina</p><p>Nlan number of languages numeric Stappknow staff application knowledge ordina</p><p>Custpart customer participation ordinal Sttoolskil staff tool skills ordina</p><p>Devenv development environment ordinal Stteamskil staff team skills ordinaadequacy</p><p>Staffav staff availability ordinal Size application size numer</p><p>Standards standards use ordinal Time start year numer</p><p>Methods methods use ordinal Effort effort numer</p><p>Tools tools use ordinal</p></li><li><p>7/29/2019 ON PREDICTING SOFTWARE DEVELOPMENT EFFORT USING MACHINE LEARNING TECHNIQUES AND LOCAL DATA.pdf</p><p> 4/14</p><p>Lukasz Radlinski and Wladyslaw Hoffmann</p><p>126 International Journal of Software Engineering and Computi</p><p>Table 5List of Variables in QQDefects Dataset</p><p>Symbol Name Type Symbol Name Type</p><p>specexp Relevant experience of ordinal stexpi Staff experience - independent ordinaspec &amp; doc staff test</p><p>qudoc Quality of documentation ordinal qutestc Quality of documented test ordina</p><p>inspected casesregsprev Regularity of spec &amp; ordinal devtrq Dev. staff training quality ordina</p><p>doc reviews</p><p>stproc Standard procedures followed ordinal coman Configuration management ordina</p><p>reveff Review process effectiveness ordinal pplan Project planning ordina</p><p>specdef Spec defects discovered in review ordinal scco Scale of distributed ordinacommunication</p><p>reqst Requirements stability ordinal stai Stakeholder involvement ordina</p><p>compl Complexity of new functionality ordinal cusi Customer involvement ordina</p><p>scfunc Scale of new functionality ordinal vman Vendor management ordinaimplemented</p><p>numio Total no. of inputs and outputs ordinal vend Vendors nominadevexp Relevant development staff ordinal icomm Internal communication / ordina</p><p>experience interaction</p><p>procap Programmer capability ordinal pmat Process maturity ordina</p><p>defpro Defined processes followed ordinal ptype Project type nomina</p><p>devstmo Development staff motivation ordinal kloc project size in KLoC numer</p><p>tprodef Testing process well defined ordinal effort effort numer</p><p>stexpu Staff experience - unit test ordinal</p><p>generally: more than a one dataset. Even in such</p><p>rare cases, some of these variables, for exampleproject size and effort, are typically expressed ondifferent scales. These differences may cause thatmachine learning techniques may achieve differentlevels of accuracy depending on the dataset, i.e. aparticular technique may perform better in onedataset but worse in another. We examine this whendiscussing the results in Section 4.2.</p><p>2.2.Research Procedure</p><p>As indicated in the Section 1, the main assumption</p><p>for this study is related with applying a researchprocedure that is simple and easy to follow. Thus,this procedure consists only of very few basic steps.</p><p>In the first step an expert has to review thevariables in each dataset and keep only these whichmay be potential predictors for effort, i.e. which aretypically known at the stage of the project wheneffort needs to be predicted. Thus, variables suchas duration and number of defects have been</p><p>removed because their values are not known </p><p>advance.The second step covers data transformation </p><p>new scales, i.e. discretisation. This step is necessabecause most techniques selected for use in thexperiment work only with categorized data. Theare various methods that perform this tasautomatically, for example to achieve equafrequency or equal-width intervals. However, sucalgorithms often define intervals with their startinand ending values that are not friendly, i.e. nrounded to any significant number. Thus, in th</p><p>study an expert has performed a discretization fall numeric variables with the aim to definintervals with meaningful values and similfrequencies.</p><p>Some machine learning techniques cannot bused with datasets that contain missing data. In ostudy two datasets, Desharnais and QQDefectcontain missing data. In such cases such missinvalues are often filled by mean value for numer</p></li><li><p>7/29/2019 ON PREDICTING SOFTWARE DEVELOPMENT EFFORT USING MACHINE LEARNING TECHNIQUES AND LOCAL DATA.pdf</p><p> 5/14</p><p>On Predicting Software Development Effort using Machine Learning Techniques and Local Data</p><p>Vol. 2, No. 2, July-December 2010 12</p><p>variables and the mode for nominal variables.However, to eliminate the bias related with suchapproach, we have decided to use only thosetechniques that can work with missing data.</p><p>In the main step, each selected technique hasbeen used to generate predictions for each dataset.In this analysis, we have used 23 techniques, listedin Table 6, implemented in a popular Weka tool [8].We have applied 10-fold cross-validation togenerate the predictions. Each dataset has beenrandomly divided into ten subsets with similarnumber of cases. Then, subset number 1 has beenused as a model/technique testing sample allremaining nine subsets have been used to build apredictive model and/or generate predictions foreach case in the testing sample. This procedure hasbeen repeated for each subset.</p><p>Table 6Machine Learning Techniques used in this Study</p><p>Group Symbol Technique</p><p>bayes AODE Averaged One-DependenceEstimators</p><p>BN Bayesian Net</p><p>NB Nave Bayes</p><p>functions MLP Multilayer Perceptron</p><p>RBFN Radial Basis Function Network</p><p>SMO Se...</p></li></ul>