

SOFTWARE PROCESS IMPROVEMENT AND PRACTICE
Softw. Process Improve. Pract. 2006; 11: 561–572

Published online 27 July 2006 in Wiley InterScience (www.interscience.wiley.com) DOI: 10.1002/spip.298

Using Data Envelopment Analysis in Software Development Productivity Measurement

Research Section

Mette Asmild 1, Joseph C. Paradi 2,*,† and Atin Kulkarni 2

1 Nottingham University Business School, Jubilee Campus, Wollaton Road, Nottingham NG8 1BB, UK
2 The Centre for Management of Technology and Entrepreneurship, University of Toronto, Canada

* Correspondence to: Joseph C. Paradi, The Centre for Management of Technology and Entrepreneurship, Department of Chemical Engineering and Applied Chemistry, University of Toronto, 200 College Street, Toronto, Ontario, M5S 3E5, Canada. † E-mail: [email protected]

The ever-increasing size and complexity of software systems make the cost of developing and maintaining software important. Unfortunately, the process of software production has not been particularly well understood. This article helps clarify the relationship between postimplementation function points (FP) and the corresponding development effort for software development projects in a large Canadian bank. Knowledge of this relationship enables evaluations of the productivity of completed projects and, in particular, provides a predictive tool for future projects.

The empirical analysis employs a combination of traditional regression models and Data Envelopment Analysis (DEA). The regression analyses show a log-linear relationship between project size and development effort, which is subsequently used in the DEA models. The DEA models identify best performers and use these as benchmarks, but are not limited to the constant returns to scale assumption of the regression analyses and are capable of including the delivery time as a nondiscretionary input. Finally, by including data from the International Software Benchmarking Standards Group (ISBSG) repository in the DEA models, the bank's projects are benchmarked not only against its own best performers but also against what is globally feasible. Copyright 2006 John Wiley & Sons, Ltd.

KEY WORDS: software development; productivity; Data Envelopment Analysis (DEA); function points; development effort; bank

1. INTRODUCTION

Half a century into the computer era, it is still the software that plays the dominant role in the success of most, if not all, computer applications. Unfortunately, while productivity measurement techniques in more traditional manufacturing are well developed and generally accepted, this is not the case when it comes to measuring software development productivity. Many attribute this to the nature of the software production process and the programming staff, who see themselves more as artistes than as scientists or technicians. Two fundamental problems in information systems management are accurate measurement of software development project outcomes and prediction of development costs. In the absence of accurate process and product measurement metrics and project cost estimation models, it is difficult to plan and manage projects.

In an environment with globalizing economies and increasing competition, organizations are faced with the task of developing quality software systems driven by time-to-market considerations but still at the lowest possible cost. A key element for improving anything, and software development performance is no exception, is to measure it. Recognizing this need, the systems development department of a large Canadian bank wanted to establish a formal and comprehensive performance analysis program. The part of this effort we contributed to focused on the measurement of software output in terms of Function Points (FP) and on establishing the relationship between this measure and the required work effort (WE) – using both well-known statistical methods and the lesser known nonparametric approach called Data Envelopment Analysis (DEA). This article empirically evaluates the claim that there exists a relationship between the postimplementation FPs and the development effort in the case of a large dataset from this bank. It is postulated that once the characteristics of this relationship are known, we can evaluate the programmer productivity observed in completed projects as well as use the methodology for predictive purposes when estimating costs and deliveries for future development projects.

The research reported here overcomes some of the limitations of earlier work in the area by using sufficiently large datasets, explicitly including the time factor in the analysis, and considering log-linear models that appear to be appropriate for this type of analysis. This article is also novel in that it uses standard statistical techniques to analyze the functional relationship between the variables, specifically the apparent log-linear relationship, but then utilizes this knowledge to develop DEA models. The DEA approach has the advantage of considering the frontier rather than the average relationship, which is the appropriate benchmark for performance evaluation. DEA also enables the usage of variable returns to scale (VRS) models and the incorporation of nondiscretionary inputs.

The rest of this article is structured as follows: Section 2 discusses and defines how to measure the size of a software system and Section 3 describes the DEA methodology, which we propose as an alternative way of analyzing this problem. Section 4 presents the empirical data available for the analysis, which consist of historical data from a number of software development projects that have been completed in one bank, as well as data from the International Software Benchmarking Standards Group (ISBSG) repository, which are used mainly for comparative purposes. Section 5 reviews the typical regression models used for this type of analysis and shows the corresponding results. Section 6 presents the DEA results and finally, Section 7 discusses the findings and presents the concluding remarks.

2. MEASURING SOFTWARE

Two major aspects of software project management are planning and control. A key requirement for both is the capability to accurately and reliably measure the 'size' of the software being delivered (Kemerer 1993). From the project planning and control point of view, the size and complexity of the software to be developed are viewed to be of particular importance. Size, in general, refers to the notion of volume, although there is no clear definition. Size is assumed to be a major driver of the software development effort. Complexity refers to the logical or cognitive complexity of a program or module, not the time complexity, and is especially considered to have a bearing on the testing effort (Pressman 1992).

Although the need for measuring the important attributes of the software creation process has long been argued, there is a lack of acceptable metrics or measuring techniques. One of the earliest measures of software size is the Source Lines Of Code (SLOC). SLOC has been heavily criticized, both by researchers and practitioners. After all, programmers can write code that is more voluminous (i.e. has more SLOC) if they perceive that they are being measured on this metric. Moreover, the programming language used has a major impact on this count since some are very 'verbose', such as COBOL, and others very terse, such as APL. However, it still remains one of the popular size metrics in use in industry, mainly because it is easy to count and there are few other credible measures to replace it.

An attempt at establishing the relationship between software size (in SLOC) and development effort was given by Putnam and Myers (1992) who, on a purely empirical basis, developed the Software Equation, which relates productivity to size, time, and effort in the following way:

Product = Productivity Parameter × (Effort/B)^(1/3) × Time^(4/3)    (1)

where Product is the size of the project (measured in SLOC), the Productivity Parameter is a number determined empirically for each organization, Effort is measured in man-years of work, B is a skills factor that is a function of size, and Time is the elapsed calendar schedule in years.
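To make the arithmetic concrete, the following toy calculation applies Equation (1). Every value below, including the productivity parameter and the skills factor B, is a hypothetical illustration rather than a figure from Putnam and Myers.

```python
# Toy application of the Software Equation (1); all parameter values
# are hypothetical and chosen only to illustrate the arithmetic.
productivity_parameter = 4000.0  # organization-specific, fitted empirically
effort = 20.0                    # development effort in man-years
B = 0.39                         # skills factor (in general a function of size)
time = 1.5                       # elapsed calendar schedule in years

# Product = Productivity Parameter x (Effort/B)^(1/3) x Time^(4/3)
product_sloc = (productivity_parameter
                * (effort / B) ** (1.0 / 3.0)
                * time ** (4.0 / 3.0))
print(f"{product_sloc:,.0f} SLOC")  # roughly 25,500 SLOC for these inputs
```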

One of the major uses of, or motivations for, measurements is that they can be used in the estimation or prediction of a project before extensive resources are spent on it. Still, many large firms spend upward of 40% of their budget for a project before a final go or no-go decision is made. In the case of software engineering, the quantities of interest to be predicted are, e.g. WE, staffing levels, timeframes, quality, and productivity. The very nature of prediction demands that the independent variables be available early in the project life cycle. However, the attributes of an entity cannot be measured unless it exists, as the process of measurement requires real entities. The reconciliation of these contradictory facts is that, for the purpose of the prediction, it is not the value of the actual measure of the attribute of interest (e.g. SLOC) that we can use but only an estimate of this value (expected SLOC). Hence, we are replacing one difficult problem with another, perhaps an even more difficult one.

Almost all metrics proposed fail in this respect in the sense that their accurate estimates cannot be available early enough in the development cycle to be of use in the prediction system (Jones 1991). Another major shortcoming of some of these metrics, especially a size metric such as SLOC, is their technology and language dependency. This makes it extremely difficult to compare software size as well as other derived measures, such as productivity, across different platforms/languages.

It is clear that the desirable properties of metrics to be used for planning and control include the following:

• Availability early in the project life cycle
• Language and technology independence
• Accuracy, reproducibility, and reliability

Alan Albrecht of IBM set out to tackle these problems and came up with the Function Point metric in 1979 (Albrecht 1979). FPs are a complexity-weighted summation of user-identified Inputs, Outputs, Inquiries, Internal Files and External Files (Reference Files or Interfaces) of a system, adjusted for a variety of technology factors (Abran and Robillard 1996). These five basic components are known as Function Types. The five Function Types are sometimes aggregated into two Functionality Types: Transaction FPs (Inputs, Outputs, Inquiries) and Data FPs (Internal Files, External Files). A toy example of such a count is sketched below.
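As a rough illustration of how the five Function Types aggregate into an unweighted count, the sketch below applies the standard IFPUG 'average complexity' weights (4, 5, 4, 10 and 7); the component counts are invented for the example, and a real count would classify and weight each component by its assessed complexity.

```python
# Hypothetical unadjusted function point count using the standard IFPUG
# average-complexity weights; the component counts are invented.
weights = {"inputs": 4, "outputs": 5, "inquiries": 4,
           "internal_files": 10, "external_files": 7}
counts = {"inputs": 12, "outputs": 8, "inquiries": 5,
          "internal_files": 6, "external_files": 3}

unadjusted_fp = sum(weights[k] * counts[k] for k in weights)
print(unadjusted_fp)  # 48 + 40 + 20 + 60 + 21 = 189
```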

Since FPs take into account only end-user identified or approved Function Types, they are said to take an external view of the software system and hence are language and technology independent. The concept is that, from an end-user point of view, it does not matter what language the system is written in or what technical environment it operates in, as long as they get what they require from it. However, this technology independence argument is only partially valid when software development concepts such as graphical user interfaces and object-oriented development are considered (Banker et al. 1992).

Requirements analysis is a process that captures the user requirements, and it can therefore be argued that what the users want should be known at the end of this phase. Since FP, as per the argument above, is a measure of the functionality requested by the users, it should be available at the end of the requirements analysis. In the project life cycle, this phase is considered to be approximately at the 5% (of the total costs) mark. Hence, the FP estimate could be available at a rather early stage in the project development.

The FPs may be seen as satisfying the first two requirements of a software metric as discussed earlier, at least to a certain degree. However, they do not appear to satisfy the last set of desirable properties. The main reason is that the process of counting FPs calls for subjective decisions on the part of the analyst and, predictably, the count can vary from analyst to analyst. Several ISO committees have addressed this problem and standards like ISO 14143 parts 1–5 have been developed. This set of five standards defines the general rules for size measurement and considers concepts, conformity, and measurements, and offers a reference model and the method for the determination of the functional domains for use with functional size measurements. Other standards of relevance here are ISO 19761, ISO 20926, ISO 20968 and ISO 24570.


These efforts are aimed at making FP counting consistent between analysts and addressing the criticisms leveled at the FP concept by those who lament the inconsistencies between various approaches.

Notwithstanding the introduction of such standards, some may still argue that many aspects of the process of counting FPs are empirical and practice-based and have no scientific or measurement theory foundation (SEI 2005)1 (Jones 2005)2 (Legaspi 2004)3.

Indeed, FPs are dimensionless numbers that have no physical meaning. As a result, it is not clear what attribute of the software, size or complexity, FPs represent. In the literature, FP is often referred to as a size metric (Longsteet 2005)4 (SEI 2005)5 but sometimes as a complexity metric as well (SEI 2005)6 (Boehm 1997)7.

Despite these and other criticisms, FP has become one of the more widely used software metrics. Specifically, in the Systems Development Department of the bank that provided one of the data sets used in these analyses, FP is the preferred metric for quantifying software development. In order to understand and improve their software system development performance, the nature and strength of the relationship between FPs and development effort was carefully examined to understand the underlying processes.

Although there have been some studies published concerning these issues, many have been based on inadequate data and may not be statistically significant (e.g. Albrecht and Gaffney 1983, Kemerer 1987, Matson et al. 1994, Abran and Robillard 1996).

1 ‘There are continuing concerns about the reliability and consistency of function point counts, such as: whether two trained human counters will produce the same result for the same system’.
2 ‘The software industry lacks standard metric and measurement practices. Almost every major software metric has multiple definitions and ambiguous counting rules’.
3 ‘Experience and insight of the estimator is very important. The combined experiences and perspectives of my boss and myself allowed us to assign the right functional points and weights’.
4 ‘Function points can be used to size software applications accurately. Sizing is an important component in determining productivity (outputs/inputs)’.
5 ‘FPA has become generally accepted as an effective way to estimate a software project’s size (and in part, duration)’.
6 ‘Early and easy function points. Adjusts for problem and data complexity with two questions that yield a somewhat subjective complexity measurement’.
7 ‘The value adjustment factor gives insight into the complexity of the overall system’.

Since there does not appear to be a strong scientific basis for FPs, the evidence in favor of or against the claim that they are a fair representation of the WE expended in developing software and the functionality delivered to the user must come from more empirical studies. As noted by Kitchenham and Mendes (2004), many studies on software development productivity are still based on simple ratios between product size and effort (e.g. Arnold and Pedross 1998, MacCormack et al. 2003). While this eases the analyses and the explanations of the approaches, it also means that factors like development time are not explicitly incorporated into the analysis. DEA, however, has the advantage of being able to simultaneously consider multiple inputs and multiple outputs. Furthermore, DEA can model VRS, whereas a simple ratio approach as well as the standard linear regression method more or less explicitly assume a constant linear relationship between the inputs and outputs. This (constant) linearity is typically not fulfilled for software projects, where effort is assumed to increase exponentially with size, as evidenced by, e.g. the Cocomo software cost estimation model (Boehm et al. 1996).

Several studies in the literature use DEA to measure software productivity. Banker et al. (1991) and Banker and Slaughter (1997) analyze software maintenance productivity, and Mahmood et al. (1996) analyze a combination of development and maintenance using SLOC and an additive constant returns to scale (CRS) model. The study by Parkan et al. (1997) is based on a very small sample, and therefore implicitly uses a DEA window analysis. Furthermore, while the model is claimed to be VRS, all data are normalized with the output, implicitly assuming CRS anyway. Banker and Kemerer (1989) and Banker et al. (1994), directly addressing the issue of returns to scale, show that there are both economies and diseconomies of scale in software development, and thus VRS models should be used. The appropriateness of VRS models is also supported by Myrtveit and Stensrud (1999) and Stensrud and Myrtveit (2004). The first of these studies uses very simple DEA model formulations to analyze 30 COTS software projects. The second study analyzes 30 ERP projects considering a combination of technical infrastructure and application output variables, but without accounting for development time and the possibility of nonlinear relationships. The study by Paradi et al. (1997) is limited by the size of the data sets (11 and 15 DMUs, respectively), which does not enable very detailed models. Furthermore, in their work the time factor is included in a somewhat problematic way, as a translated output. Finally, only a few of the earlier studies consider the log-linear relationship apparent from this work; thus there appears to be ample opportunity for adding to the existing body of knowledge.

When discussing software development productivity, one should also mention nonmetric oriented methods toward improving the software process, like the Personal Software Process (PSP) (Humphrey 1995). The Software Engineering Institute developed the PSP methodology on the basis of the principles of their Capability Maturity Model (CMM). PSP is applied by taking the software engineers out of their work environment and putting them through a series of rigorous training courses that take them through the PSP0 to PSP3 levels. Early implementations of PSP reported measurable improvements in estimation accuracy, bug counts and productivity (Ferguson et al. 1997, Hayes and Over 1997). It has also been suggested that there may be intangible benefits, such as increased quality of work life and improvements in organizational communications and learning, arising from the use of CMM-type approaches (Hyde and Wilson 2004). Other studies, however, have found that PSP has data quality problems; they have shown that the improvements are smaller than PSP proponents usually assume, and argue that PSP training alone does not automatically result in a realization of all the potential benefits claimed (Johnson and Disney 1998, Prechelt and Unger 2000). It would therefore be very interesting to include information on PSP in analyses of the type presented here, especially to see if, or to what extent, productivity improvements following PSP implementation are evident when using our proposed model specifications and framework. Unfortunately, such data were not available to us for the bank analyzed, so this issue will have to be referred to future research.

3. DATA ENVELOPMENT ANALYSIS

DEA was first introduced by Charnes et al. (1978), extending the idea of Farrell (1957) of estimating technical efficiency with respect to a production frontier.

DEA is a fractional linear programming–based approach that can handle multiple inputs and multiple outputs simultaneously without relying on a priori assumptions about their functional relationship – it is nonparametric. Furthermore, the variables can be measured in different units. Finally, DEA identifies best performers and uses them as benchmarks, thereby measuring the productivity of all observations relative to the efficient frontier made up of the best performers.

DEA will be briefly introduced here; for a more thorough treatment see, e.g. Cooper et al. (2000). Formally, consider n production entities or Decision Making Units (DMUs), which are to be evaluated, that all use m different inputs to produce s different outputs. Let X_j = (x_1j, . . ., x_mj)^T be the input consumption vector for DMU_j, and Y_j = (y_1j, . . ., y_sj)^T the corresponding output production vector. The DEA input efficiency score θ′ for the unit under evaluation, DMU′ with data (X′, Y′), under a VRS assumption is given by

θ′ = min_{θ, λ} θ

s.t.
Σ_{j=1..n} λ_j X_j ≤ θ X′
Σ_{j=1..n} λ_j Y_j ≥ Y′
Σ_{j=1..n} λ_j = 1
λ_j ≥ 0, j = 1, . . ., n        (2)

The input efficiency scores, which by definition take values between 0 and 1, indicate the extent to which all the inputs could be reduced without changing the output production. The DEA methodology in two dimensions is illustrated in Figure 1, where the left frame shows the one input, one output case and the right frame the situation with two inputs and a fixed output.

In both frames in Figure 1, the observations (DMUs) A, B, and C are all fully efficient and make up the best practice frontier that envelops the production possibility set. The observations D and E are in the interior of the possibility set and are therefore inefficient. In the left frame, the horizontal distance to the frontier represents their input inefficiency, and their projections onto the frontier (D′ and E′) constitute their benchmarks.


Figure 1. Illustration of input-oriented DEA with variable returns to scale

In the right frame, the input efficiency is the distance to the frontier in the direction of the origin, representing proportional reductions of both inputs simultaneously.
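To make model (2) concrete, the sketch below solves the input-oriented VRS envelopment problem as a linear program with SciPy. The function name and the toy data are our own inventions; the article does not prescribe a particular solver.

```python
# Minimal input-oriented VRS DEA (model (2)) solved as a linear program.
# Decision variables are ordered [theta, lambda_1, ..., lambda_n].
import numpy as np
from scipy.optimize import linprog

def dea_input_efficiency(X, Y, j0):
    """Input efficiency score of DMU j0; X is (m, n) inputs and
    Y is (s, n) outputs, with one column per DMU."""
    m, n = X.shape
    s = Y.shape[0]
    c = np.r_[1.0, np.zeros(n)]                    # minimize theta
    # sum_j lambda_j X_j <= theta * X_j0  (inputs)
    A_in = np.hstack([-X[:, [j0]], X])
    # sum_j lambda_j Y_j >= Y_j0  (outputs), rewritten as <=
    A_out = np.hstack([np.zeros((s, 1)), -Y])
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.r_[np.zeros(m), -Y[:, j0]]
    A_eq = np.r_[0.0, np.ones(n)].reshape(1, -1)   # VRS: sum lambda = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (n + 1))
    return res.fun                                  # theta in (0, 1]

# Toy data: one input and one output for five DMUs.
X = np.array([[4.0, 5.0, 6.0, 5.5, 7.0]])
Y = np.array([[3.0, 4.2, 4.8, 3.5, 5.0]])
print([round(dea_input_efficiency(X, Y, j), 2) for j in range(5)])
```

Dropping the convexity constraint Σλ = 1 yields the corresponding CRS model.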

The standard DEA model shown above can be modified in numerous ways (Charnes et al. 1994, Cooper et al. 2000). Of particular interest for the current application is the concept of nondiscretionary inputs. It is well known that, with regard to software production, the relationship between WE and output size also depends on the time available for the project, i.e. the shorter the time, the larger the absolute effort required to produce the same output (Putnam and Myers 1992).

Whereas this aspect is difficult to include in regression models, it can easily be incorporated into DEA models. Since a quick delivery is preferred, the project time naturally belongs to the input side, since DEA favors small values of inputs. The time, however, cannot reasonably be assumed to be under the managers' direct control and, furthermore, it is not necessarily preferred to reduce the project time (as long as the project is finished within the available or required time). Rather, given the time available for the project and the required output, the most relevant consideration is a reduction of the WE. This situation can be analyzed using an input-oriented DEA model with time as a nondiscretionary input.

Dividing the input matrix X into the matrix X^D of discretionary inputs (which are to be reduced) and the matrix X^ND of nondiscretionary inputs (which are kept constant), the standard DEA model (2) is modified to

θ′ = min_{θ, λ} θ

s.t.
Σ_{j=1..n} λ_j X^D_j ≤ θ X^D′
Σ_{j=1..n} λ_j X^ND_j ≤ X^ND′
Σ_{j=1..n} λ_j Y_j ≥ Y′
Σ_{j=1..n} λ_j = 1
λ_j ≥ 0, j = 1, . . ., n        (3)
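Under the same conventions as the earlier sketch, model (3) only changes which constraint rows are scaled by θ; a hypothetical adaptation:

```python
# Model (3): theta scales only the discretionary inputs XD; the
# nondiscretionary inputs XND (e.g. delivery time) are held fixed.
import numpy as np
from scipy.optimize import linprog

def dea_nondiscretionary(XD, XND, Y, j0):
    """Input efficiency of DMU j0 with XND kept constant."""
    n = XD.shape[1]
    A_ub = np.vstack([
        np.hstack([-XD[:, [j0]], XD]),                  # sum l XD_j  <= theta XD_j0
        np.hstack([np.zeros((XND.shape[0], 1)), XND]),  # sum l XND_j <= XND_j0
        np.hstack([np.zeros((Y.shape[0], 1)), -Y]),     # sum l Y_j   >= Y_j0
    ])
    b_ub = np.r_[np.zeros(XD.shape[0]), XND[:, j0], -Y[:, j0]]
    res = linprog(np.r_[1.0, np.zeros(n)], A_ub=A_ub, b_ub=b_ub,
                  A_eq=np.r_[0.0, np.ones(n)].reshape(1, -1), b_eq=[1.0],
                  bounds=[(0, None)] * (n + 1))
    return res.fun
```

With WE as the discretionary input and delivery time as the nondiscretionary one, this is the formulation applied to the ISBSG data in Section 6.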

4. DATA

The first data sets used in the analysis contain historical data from a number of software development projects that the bank has already completed. The largest data set consists of manual counts of unweighted FP and the corresponding WE (measured in hours)8 for 144 different non-PC type development projects. For 43 of these projects (subset B), more detailed data are available, with the FP count disaggregated into two Functionality Types (Transactions, Data), and further into five Function Types (Inputs, Outputs, Inquiries, Internal Files, External Files).

Descriptive statistics of the full data set A and the subset B are given in Table 1.

8 Calculated by multiplying the recorded 'work days' by 7.5.


Table 1. Descriptive statistics of data sets A & B

                     All A (144)       Subset B (43)
                     WE       FP       WE      FP     Transactions  Data   Inputs  Outputs  Inquiries  Internal Files  External Files
Mean                 5265     592      2880    277    191           86     59      114      17         56              31
Minimum              293      20       525     27     13            0      0       0        0          0               0
Maximum              114,023  9147     12,233  1062   745           402    370     507      296        234             285
Standard deviation   12,653   1201     2790    267    197           93     82      120      48         66              53

All of the data referred to above are from the same cultural environment, i.e. a single, large Canadian bank's systems development group housed in the same building complex. Therefore, analysis done within this data set indicates average or best performance within the bank without reference to what might be globally feasible. We have therefore obtained data from the ISBSG repository as well. This data set contains detailed information on 2027 completed software projects. After sorting and filtering the data set, 158 observations remained that have complete information about the required elements (Function Types, time) and are highly comparable in the sense that they are all new developments, the FP are counted according to the same principles, and the data are of ISBSG rating code A, i.e. satisfy all criteria for seemingly sound data. Descriptive statistics for this data set (C) are given in Table 2 below.

Table 2. Descriptive statistics of data set C

                     WE      FP     Transactions  Data   Inputs  Outputs  Inquiries  Internal Files  External Files
Mean                 6907    725    527           199    254     164      109        165             34
Minimum              136     41     11            0      0       0        0          0               0
Maximum              59,809  4943   3495          1448   2221    1337     952        1208            437
Standard deviation   10,032  887    669           254    366     234      156        231             63

5. REGRESSION MODELS AND RESULTS

Two different models have been widely used to describe the relationship between the development effort (WE) and the size of the developed software system measured in FP. The simple linear, first-order model is of the form

WE = α + β × FP + ε (4)

Alternatively, exponential models have been suggested by, e.g. Banker and Kemerer (1989) (and are also evident in the Cocomo model of Boehm et al. 1996), which can be rewritten as log-linear models of the form:

ln(WE) = α + β × ln(FP) + ε    (5)

In both models, the coefficients α and β are estimated using least squares regression. The error terms, ε, are assumed to be independent and identically distributed. The validity of this assumption is examined using the residual and normal probability plots. The R2 indicates the percentage of the total sum of the squared differences between the actual WE values and the average of the estimated WE values that is explained by the model, and is thus a measure of the fit of the model.
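For concreteness, here is a minimal sketch of estimating model (5) by least squares; the project data are invented, and a real analysis would use the full data set and inspect the residual and normal probability plots as described above.

```python
# Least-squares fit of the log-linear model (5) on invented data.
import numpy as np

fp = np.array([120.0, 450.0, 890.0, 210.0, 1500.0, 60.0])        # function points
we = np.array([1800.0, 5200.0, 9800.0, 2600.0, 15000.0, 900.0])  # work effort, hours

X = np.column_stack([np.ones_like(fp), np.log(fp)])  # columns: 1, ln(FP)
coef, *_ = np.linalg.lstsq(X, np.log(we), rcond=None)
alpha, beta = coef
resid = np.log(we) - X @ coef
r2 = 1.0 - resid.var() / np.log(we).var()
print(f"ln(WE) = {alpha:.2f} + {beta:.2f} x ln(FP),  R2 = {r2:.0%}")
```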

The residual and normal probability plots from the simple linear regression (4) for data set A reveal that the underlying assumption of a linear relationship between WE and FP is not fulfilled. The same conclusion can be drawn from data set C. We therefore consider exponential models (5) instead. For the total data set A, the assumptions now appear to be fulfilled, and we can conclude that the exponential model is appropriate for the total data set. This model furthermore has a reasonable R2 of 64%. We do, however, observe some outliers in the data set: one observation is somewhat smaller than the rest, and three are clearly larger. One may consider removing such observations, which may be influential outliers, and thereby limit the validity of the model to a smaller, but better described, range of FP values. In this particular model, however, this does not seem to improve the prediction, and the best model estimated for data set A (144 observations) is9 as follows:

ln(WE) = 1.95 + 0.70 × ln(FP), R2 = 64%    (6)
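For illustration, applying (6) to a hypothetical 500 FP project gives ln(WE) = 1.95 + 0.70 × ln(500) ≈ 6.30, i.e. a predicted work effort of roughly e^6.30 ≈ 545 hours.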

This is a rather impressive result, considering the fact that only one variable (FP) is used to describe the development effort required for an undertaking as complex as software system development. Meanwhile, similar studies in the literature using unadjusted FP, such as Kemerer (1987) or Jeffrey et al. (1993), have explained only 36–54% of the observed variance in the required WE.

In order to further validate the estimated model, we acquired a small data set from another Canadian bank. This validation data set contains 15 observations and can be compared to the estimated model (6) above. It is noted that the observed ranges of FPs are similar for the two data sets, so the testing data are within the estimated model's range of validity. It is also important to bear in mind that this is an out-of-sample data set from a somewhat different cultural environment. In spite of these factors, the estimated model still describes the observed validation data reasonably well, with an average error of 25%. The model is fairly symmetric in the sense that it overestimates the required effort in slightly more than half of the cases. There does not seem to be any unreasonable bias, so the model is deemed appropriate for this data set too, thereby validating the model's applicability. That the model tends to overestimate the required WE more than it underestimates it is in line with the general thinking that the top programmers tend to produce, by far, the largest part of the output. Therefore, the projects employing these people have the potential of using much less input (WE) than estimated. Furthermore, one expects tighter control and action taken when a project is not performing well, and therefore projects are generally less likely to considerably exceed the estimated workdays.

Next, we decompose the FP measure into the two Functionality Types, Transaction FPs and Data FPs, and investigate whether this decomposition improves the predictive ability of the model. This model formulation requires that we use the data subset B instead. Again it is clear that the linear models are unsuitable, whereas the log-linear models are appropriate. A summary of these models is given in Table 3. For comparative purposes, the one-variable (FP) model estimated on this data set is also shown. For all models, the number of observations is 43, as there are no apparent outliers.

9 The Bank divides the projects into different types (Stand-alone, Integrated and Other) and performing the regression within each of these categories enables even better model fits (R2's of 59–77%) for all but the integrated projects. Including categorical variables to distinguish between and identify the characteristics of the different types of projects is an interesting issue for future research.

Table 3. Summary of functionality type models (data set B)

Model                                                    R2
ln(WE) = 2.31 + 0.64 × ln(FP)                            0.60
ln(WE) = 2.96 + 0.47 × ln(Trans) + 0.13 × ln(Data)a      0.63
ln(WE) = 2.50 + 0.61 × ln(Trans)                         0.62

a Not significant.

From the results in Table 3, it can be concluded that the Functionality Type models are slightly better predictors than the basic FP model and that, in particular, the Transaction FPs are the main determinant of the required WE.

Regression models with the five Function Types as explanatory variables might add more detail and accuracy to the analysis. They are, however, not possible in this case, as the logarithmic transformation requires that all five Function Types be strictly greater than zero, which is the case for only a few observations, such that the degrees of freedom become problematic.

A characteristic of regression models is that they describe the average relationship or tendency of the observed data. Regression models are therefore appropriate for predictive purposes where the expected value of WE, for example, is of interest. For this use, the models in Table 3 are suitable. For post-development analysis of project productivity, it is not the average relationship but rather the best performers that are of interest for benchmarking. DEA can identify best performers as well as benchmark other observations against them.

6. DEA RESULTS

Since the project time is only available in the ISBSG data set C, we start by using this data set to develop the appropriate DEA models.

The simplest possible model uses WE as input and the total number of FP as output in an input-oriented model. The average efficiency score of 22% does, however, indicate that the model is misspecified, since it is not realistic to believe that the programming teams on average should be able to reduce their WE by 78% while still producing the same output. Since all regression analyses have shown that the average input–output relationship is, in fact, log-linear, it seems reasonable to assume that this is also the case for the frontier relationship. Using Ln(WE) and Ln(FP) gives far more credible results, with an average efficiency score of 76%. We therefore chose to use the logarithmically transformed data in the subsequent analysis. Following the same steps as in the development of the regression models in Section 5, we also ran DEA analyses with the Functionality Types and the Function Types as outputs, respectively. The results of these analyses are summarized in Table 4. We note that the average efficiency score increases when the number of outputs is increased, but this is a methodological result caused by the dimensionality of the problem.

Table 4. Summary of DEA results (data set C)

Inputs                      Outputs                                                             Average efficiency (%)   Number of efficient DMUs
WE                          FPs                                                                 22.3                     6
Ln(WE)                      Ln(FP)                                                              75.6                     6
Ln(WE)                      Ln(Transactions), Ln(Data)                                          76.8                     7
Ln(WE)                      Ln(Inputs), Ln(Outputs), Ln(Inq), Ln(Int. files), Ln(Ext. files)    82.7                     17
Ln(WE), time (non-disc.)    Ln(FP)                                                              76.1                     10
Ln(WE), time (non-disc.)    Ln(Transactions), Ln(Data)                                          76.6                     16
Ln(WE), time (non-disc.)    Ln(Inputs), Ln(Outputs), Ln(Inq), Ln(Int. files), Ln(Ext. files)    85.2                     27

Since the input requirement necessary to produce a given output increases when the delivery time is shortened, we formulate corresponding DEA models where the delivery time is included as a nondiscretionary input, as described in Section 3. This means that while the model is input-oriented, only the discretionary input (WE) is reduced, while the nondiscretionary input (time) is kept fixed, cf. Equation (3). The results from these DEA models are also shown in Table 4. While this approach is very promising, we cannot, unfortunately, develop this idea further in the present study, since the data set from the bank does not contain information about elapsed time. We recommend this approach for future studies.

So far, we have only investigated the ISBSG data set C. Analyzing the bank's data using DEA models can be done in different ways. If data set B10 is analyzed independently, the results will be relative to the best performance within the bank, but without reference to what might be globally obtainable. That this might not be the correct approach is evident from the fact that the average efficiency scores for the bank's projects are significantly higher when only compared to the bank's own projects rather than the whole ISBSG data set. One might reasonably assume that software development projects undertaken in different institutions or even different industries or countries are in fact comparable. Therefore, we chose to pool the two data sets and benchmark the bank's projects not only against themselves but also against projects completed elsewhere, as is the case with the ISBSG data. These results are shown in Table 5.

10 Note that data set A does not enable the breakdown into Functionality or Function Types, so we only consider data set B here.

Table 5. Summary of DEA results (data sets B and C combined)

Inputs    Outputs                                                              Average efficiency bank (%)   Total number of efficient DMUs   Efficient DMUs from bank
Ln(WE)    Ln(FP)                                                               72.1                          4                                1
Ln(WE)    Ln(Transactions), Ln(Data)                                           74.6                          5                                1
Ln(WE)    Ln(Inputs), Ln(Outputs), Ln(Inq), Ln(Int. files), Ln(Ext. files)     84.1                          15                               3

The results from each of the models presented in Table 5 are summarized into average efficiency and the number of efficient DMUs. However, the individual efficiency scores are of considerable importance to managers. The scores directly measure the overuse of inputs compared to the best performers and thereby indicate the improvement potential. The average improvement potential in terms of the reduction of WE is in the 16–28% range, depending on the choice of model. Realizing even some of this potential means considerable cost savings for the bank. It is also worth noting that some projects have fairly low efficiency scores, with the minimum scores in the different models being 53–60%. A targeted effort to identify and improve these clearly under-performing projects is likely to be particularly cost effective. In addition, we recommend that data on project delivery time be collected and included as a nondiscretionary input in future studies, which should eliminate a potential bias against projects with a short time span.

Several models are presented in Table 5, and there is no clear way of choosing between them. In practical applications, it is often a matter of the preferences of managers and analysts, as well as the selection of models that appear fair and equitable to those being measured. When more variables are included in the analysis, the comparisons and targets are likely to be more relevant, but the added level of detail comes at the cost of identifying less inefficiency.

7. CONCLUSION

In this article, we have empirically evaluated the average relationship between postimplementation FPs and the corresponding development effort in a fairly large data set of software development projects that have been completed in a large Canadian bank. Knowledge of this relationship enables evaluations of the productivity of completed projects and, in particular, provides a predictive tool for future projects. The size of the data set facilitates the development of proper statistical models, and inspection of the residual and normal probability plots showed that all the exponential models developed are sound and well behaved and satisfy the required assumptions, whereas the linear models are inappropriate in all cases.

The models using only FPs as explanatory variables have a rather impressive R2 of 64%. The simple FP (one-variable) model was validated on an out-of-sample data set, indicating that this model is fairly accurate in describing the relationship between system size and required effort, even for projects from another bank and therefore from a different cultural environment.

Decomposing the explanatory FPs into the two Functionality Types, Transaction FPs and Data FPs, reveals that it is the Transaction FPs that are the main determinants of the required effort, as the Data FPs turn out to be insignificant. These models have satisfactory R2's of 60–63%, but are not convincingly better than the simple FP models. When further decomposing the explanatory variable into the five Function Types, the necessary log-linear transformations are hindered by the large number of 0's in the data set. However, this is an area that needs further examination, as it promises to shed light on the importance of the individual components in FPs and hence would allow a better definition of what one should count in the FP formula.

While regression models are good at describing average relationships and therefore useful for predictive purposes, it is the best performers rather than the average that are relevant for benchmarking and productivity analysis. In this article, we have illustrated a novel approach to software development efficiency analysis through the use of DEA. Initial regression analyses indicate that the input–output relationship is log-linear rather than linear. This cannot be tested directly in DEA, and this application is thus an interesting example of combining the advantages of DEA and regression analysis. DEA also enables the inclusion of the delivery time as a nondiscretionary input, which seems to be a promising approach. By including data from the ISBSG repository in the DEA models, the bank's projects are benchmarked not only against the best performers from the bank itself, but also against what is globally feasible, which is the ideal situation if one assumes that software development projects are comparable across institutions, industries, and countries. This observation appears to be supported by the large-scale outsourcing of software efforts to India and elsewhere; obviously, the outsourcing firms subscribe to this viewpoint. DEA has thus been shown to be a useful and versatile approach to software development productivity analysis.


REFERENCES

Abran A, Robillard PN. 1996. Function point analysis: An empirical study of its measurement processes. IEEE Transactions on Software Engineering 22(12): 895–910.

Albrecht AJ. 1979. Measuring application development productivity. In IBM Applications Development Joint SHARE/GUIDE Symposium, Monterey, CA, 83–92.

Albrecht AJ, Gaffney JE Jr. 1983. Software function, source lines of code, and development effort prediction: A software science validation. IEEE Transactions on Software Engineering 9(6): 639–647.

Arnold M, Pedross P. 1998. Software size measurement and productivity rating in a large-scale software development department. In Proceedings of the 20th International Conference on Software Engineering (ICSE '98), Kyoto, Japan, 490–493.

Banker RD, Kemerer CF. 1989. Scale economies in new software development. IEEE Transactions on Software Engineering 15(10): 1199–1205.

Banker RD, Slaughter SA. 1997. A field study of scale economies in software maintenance. Management Science 43(12): 1709–1725.

Banker RD, Chang H, Kemerer CF. 1994. Evidence on economies of scale in software development. Information and Software Technology 36(5): 275–282.

Banker RD, Datar SM, Kemerer CF. 1991. A model to evaluate variables impacting the productivity of software maintenance projects. Management Science 37(1): 1–18.

Banker RD, Kaufmann RJ, Kumar R. 1992. An empirical test of object-based output measurement metrics in a Computer Aided Software Engineering (CASE) environment. Journal of Management Information Systems 8(3): 127–150.

Boehm R. 1997. What do you do with the counts? Available at: http://ourworld.compuserve.com/homepages/softcomp/fpfaq.htm//#WhatDoYouDoWithTheCounts.

Boehm B, Clark B, Horowitz E, Westland C, Madachy R, Selby R. 1996. The Cocomo 2.0 software cost estimation model – a status report. American Programmer, July, 2–17.

Charnes A, Cooper WW, Rhodes E. 1978. Measuring the efficiency of decision making units. European Journal of Operational Research 2(6): 429–444.

Charnes A, Cooper WW, Lewin AY, Seiford LM (eds). 1994. Data Envelopment Analysis: Theory, Methodology and Applications. Kluwer Academic Publishers: Norwell, MA.

Cooper WW, Seiford LM, Tone K. 2000. Data Envelopment Analysis. Kluwer Academic Publishers: Boston, Dordrecht, London.

Farrell MJ. 1957. The measurement of productive efficiency. Journal of the Royal Statistical Society 120(3): 253–281.

Ferguson P, Humphrey WS, Khajenoori S, Macke S, Matvya A. 1997. Introducing the personal software process: Three industry case studies. IEEE Computer 30(5): 24–31.

Hayes W, Over JW. 1997. The Personal Software Process (PSP): An empirical study of the impact of PSP on individual engineers. Technical Report CMU/SEI-97-TR-001, Software Engineering Institute, Carnegie Mellon University: Pittsburgh, PA.

Humphrey WS. 1995. A Discipline for Software Engineering, SEI Series in Software Engineering. Addison-Wesley: Reading, MA.

Hyde K, Wilson D. 2004. Intangible benefits of CMM-based software process improvement. Software Process: Improvement and Practice 9(4): 217–228.

Jeffrey DR, Low GC, Barnes M. 1993. A comparison of function point counting techniques. IEEE Transactions on Software Engineering 19(5): 529–533.

Johnson PM, Disney AM. 1998. The personal software process: A cautionary case study. IEEE Software 13: 3.

Jones TC. 1991. Applied Software Measurement: Assuring Productivity and Quality. McGraw Hill: New York.

Jones C. 2005. Strengths and Weaknesses of Software Metrics. Available at: http://www.spr.com/catalog/.

Kemerer CF. 1987. An empirical validation of software cost estimation models. Communications of the ACM 30(5): 416–429.

Kemerer CF. 1993. Reliability of function points measurement: A field experiment. Communications of the ACM 36(2): 85–97.

Kitchenham B, Mendes E. 2004. Software productivity measurement using multiple size measures. IEEE Transactions on Software Engineering 30(12): 1023–1035.

Legaspi CM. 2004. Software metrics: Lines of Code vs Function Points. Available at: http://www.theserverside.com/tss.

Longsteet D. 2005. Fundamentals of FPA. Available at: http://www.ifpug.com/fpafund.htm.

MacCormack A, Kemerer C, Cusumano M, Crandall B. 2003. Trade-offs between productivity and quality in selecting software development practices. IEEE Software: 78–79.

Mahmood MA, Pettingell KJ, Shaskevich AI. 1996. Measuring productivity of software projects: A data envelopment analysis approach. Decision Sciences 27(1): 57–80.

Matson JE, Barrett BE, Mellichamp JM. 1994. Software development cost estimation using function points. IEEE Transactions on Software Engineering 20(4): 275–287.

Myrtveit I, Stensrud E. 1999. Benchmarking COTS projects using data envelopment analysis. In Proceedings of the 6th International Symposium on Software Metrics, Boca Raton, FL.

Stensrud E, Myrtveit I. 2004. Identifying high performance ERP projects. IEEE Transactions on Software Engineering 29(5): 398–416.

Paradi JC, Reese D, Rosen D. 1997. Application of DEA to measure the efficiency of software production at two large Canadian banks. Annals of Operations Research 73: 91–115.

Parkan C, Lam K, Hang G. 1997. Operational competitiveness analysis on software development. Journal of the Operational Research Society 48(9): 892–905.

Pressman RS. 1992. Software Engineering: A Practitioner's Approach, 3rd edn. McGraw Hill International: New York.

Prechelt L, Unger B. 2000. An experiment measuring the effects of Personal Software Process (PSP) training. IEEE Transactions on Software Engineering 27(5): 465–472.

Putnam LH, Myers W. 1992. Measures for Excellence: Reliable Software on Time, Within Budget. Yourdon Press Computing Series: Englewood Cliffs, NJ.

Software Engineering Institute (SEI). 2005. Article on the web, last modified 14 December 2005. URL: http://www.sei.cmu.edu/str/descriptions/fpabody.html.
