Size Doesn’t Matter? On the Value of Software Size Features for Effort Estimation
Ekrem Kocaguneli, Tim Menzies: WVU, USA; Jairus Hihn: JPL, USA; Byeong Ho Kang: UTAS, Australia


DESCRIPTION

Ekrem Kocaguneli, Tim Menzies: WVU, USA; Jairus Hihn: JPL, USA; Byeong Ho Kang: UTAS, Australia. PROMISE'12, Lund, Sweden.

TRANSCRIPT

Page 1: Size Doesn’t Matter? On the Value of Software Size Features for Effort Estimation

Size Doesn’t Matter?

On the Value of Software Size Features for Effort Estimation

Ekrem Kocaguneli, Tim Menzies: WVU, USA; Jairus Hihn: JPL, USA; Byeong Ho Kang: UTAS, Australia

Page 2: Size Doesn’t Matter? On the Value of Software Size Features for Effort Estimation


Sound bites

Size matters!

But a lack of size features can be tolerated
• Caveat: need to first prune irrelevancies

Page 3: Size Doesn’t Matter? On the Value of Software Size Features for Effort Estimation

Role of Size Features in SEE (Software Effort Estimation)

Size features are at the heart of some of the most widely used SEE methods

COCOMO is based on LOC

Function points (FP) are based on logical transactions

Various others exist, such as number of requirements, number of modules, number of web pages, and so on…

Page 4: Size Doesn’t Matter? On the Value of Software Size Features for Effort Estimation

Role of Size Features in SEE (cntd.)

Size features have their advantages and disadvantages

LOC counting can be automated and LOC is a good measure a posteriori, but it is difficult to estimate early on

FP provides a size metric based on early design information; hence it is more accurate a priori

FP counting cannot be automated and is subjective… even though training reduces the estimate variation

Page 5: Size Doesn’t Matter? On the Value of Software Size Features for Effort Estimation

Objections to Size Features

Although particular size features may have advantages in certain scenarios, there is strong opposition…

“Measuring software productivity by lines of code is like measuring progress on an airplane by how much it weighs.” Bill Gates

“This (referring to LOC) is a very costly measuring unit because it encourages the writing of insipid code, but today I am less interested in how foolish a unit it is from even a pure business point of view.” E. W. Dijkstra

So we ask: under what conditions are size features actually a “must”, and can we compensate for their absence?

Page 6: Size Doesn’t Matter? On the Value of Software Size Features for Effort Estimation

So let’s check…

If we throw away size attributes, what happens?

Page 7: Size Doesn’t Matter? On the Value of Software Size Features for Effort Estimation

If we remove “size”, what happens?

Datasets (13): Cocomo81, Cocomo81o, Cocomo81e, Cocomo81s, Nasa93, Nasa93c1, Nasa93c2, Nasa93c5, Sdr, Desharnais, DesharnaisL1, DesharnaisL2, DesharnaisL3

Error Measures (7): MAR, MMRE, MdMRE, Pred(25), MMER, MBRE, MIBRE (definitions sketched below)

Methods: CART, 1NN

Compare standard successful methods run on reduced and full data sets, using 7 error measures and 13 data sets…

Full data sets include size features; reduced data sets lack size features
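The seven error measures above have standard definitions in the effort estimation literature. A minimal illustrative sketch of how they are commonly computed (Python; my own illustration, not code from the paper):

import numpy as np

def error_measures(actual, predicted):
    """Common SEE error measures for vectors of actual vs. predicted efforts.
    A minimal sketch using the usual textbook definitions."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    ar = np.abs(actual - predicted)              # absolute residuals
    mre = ar / actual                            # magnitude of relative error
    mer = ar / predicted                         # error relative to the estimate
    bre = ar / np.minimum(actual, predicted)     # balanced relative error
    ibre = ar / np.maximum(actual, predicted)    # inverted balanced relative error
    return {
        "MAR": ar.mean(),                        # mean absolute residual
        "MMRE": mre.mean(),                      # mean MRE
        "MdMRE": np.median(mre),                 # median MRE
        "Pred(25)": np.mean(mre <= 0.25),        # fraction of estimates within 25%
        "MMER": mer.mean(),
        "MBRE": bre.mean(),
        "MIBRE": ibre.mean(),
    }

Lower is better for all of these except Pred(25), where higher is better.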

Page 8: Size Doesn’t Matter? On the Value of Software Size Features for Effort Estimation

Evaluation (cntd.)

Datasets (13): Cocomo81, Cocomo81o, Cocomo81e, Cocomo81s, Nasa93, Nasa93c1, Nasa93c2, Nasa93c5, Sdr, Desharnais, DesharnaisL1, DesharnaisL2, DesharnaisL3

Error Measures (7): MAR, MMRE, MdMRE, Pred(25), MMER, MBRE, MIBRE

Methods: pop1NN, CART, 1NN

Using 7 error measures

Compare pop1NN against CART & 1NN

On multiple data sets collected via COCOMO, COCOMO II and FP

Mann-Whitney test at the 95% confidence level

Why CART? Dejaeger et al., TSE 2012
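A hedged sketch of how a single win/tie/loss comparison can be decided with a Mann-Whitney test at the 95% level; the paper's exact ranking and counting procedure may differ, and the function and variable names here are assumptions:

from statistics import median
from scipy.stats import mannwhitneyu

def win_tie_loss(errors_a, errors_b, alpha=0.05):
    """Win/tie/loss check between two methods' error values on one data set.
    If the two-sided Mann-Whitney test finds no difference, call it a tie;
    otherwise the method with the lower median error wins."""
    _, p = mannwhitneyu(errors_a, errors_b, alternative="two-sided")
    if p >= alpha:
        return "tie"
    return "a wins" if median(errors_a) < median(errors_b) else "b wins"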

Page 9: Size Doesn’t Matter? On the Value of Software Size Features for Effort Estimation

Results (full data has “size”, reduced data does not)

CART on reduced-dataset vs. CART on full-dataset

Last column shows total loss count of CART run on reduced dataset (i.e. no size features)

In 7 of 13 tests, taking out size makes CART perform worse

Page 10: Size Doesn’t Matter? On the Value of Software Size Features for Effort Estimation

Results (full data has “size”, reduced data does not)

Total loss counts of CART and 1NN run on reduced data vs. their variants run on full data…

Standard methods are better off with the size attributes of the data sets… i.e., they cannot compensate well for the lack of size attributes

(copied from last slide)

Page 11: Size Doesn’t Matter? On the Value of Software Size Features for Effort Estimation

New idea

If we prune data irrelevancies, can we survive losing size attributes?

Page 12: Size Doesn’t Matter? On the Value of Software Size Features for Effort Estimation

Instance selection

• Chang (1974)
  – Most of the instances are uninformative.
  – Reduced data sets of size 514, 150, and 66 to 34, 14, and 6 prototypes.
• Li et al. (2009): a genetic algorithm for instance selection
• Turhan et al. (2009): instance selection as a filter for cross-company defect data
  – See also Kocaguneli et al. (2011)
• Kocaguneli et al. (2011): variance-based selection
  – Dendrogram of clusters: prune sub-trees with large variances
• Keung et al. (2011): Analogy-X, an instance selection method for analogy-based estimation
• New idea, pop1NN: a very simple instance selector

Page 13: Size Doesn’t Matter? On the Value of Software Size Features for Effort Estimation

pop1NN: the urchin shape

We propose that a “popularity”-based method can compensate for the lack of size features

The “popularity” of an instance is the number of times it is the nearest-neighbor of other instances

A sea urchin is a good visual analogy for SEE data… popular central instances are the closest neighbors of the scattered instances around them…
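Since popularity is simply a reverse-nearest-neighbor count, it can be read off a pairwise distance matrix. A minimal sketch, assuming Euclidean distance over the (size-free) feature columns; the paper's distance and pre-processing may differ:

import numpy as np

def popularity(X):
    """Popularity of each row of X: how many other rows have it as their
    single nearest neighbor (Euclidean distance). Illustrative sketch only."""
    X = np.asarray(X, float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)           # an instance is not its own neighbor
    nearest = dist.argmin(axis=1)            # index of each row's nearest neighbor
    return np.bincount(nearest, minlength=len(X))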

Page 14: Size Doesn’t Matter? On the Value of Software Size Features for Effort Estimation

Formally, this is rNN

• rNN = Reverse Nearest Neighbor
  – E.g., how many residential areas would find a new store as their nearest choice.
  – E.g., to predict the popularity of a new cell phone plan, determine how many customer profiles have the plan as their best match against the existing plans in the market.
• Can be computed efficiently (rNN chaining); see Lopez-Sastre et al., “Fast Reciprocal Nearest Neighbors Clustering”, Signal Processing, 2012, Vol. 92, pages 270-275.
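For the store example above, the rNN count of a candidate can be computed directly; a small illustrative sketch (the names and the Euclidean distance are assumptions, not from the paper):

import numpy as np

def rnn_count(existing, new_point, clients):
    """How many clients would have `new_point` as their nearest choice among
    the existing options plus the new one. Illustrative sketch only."""
    options = np.vstack([np.asarray(existing, float), np.asarray(new_point, float)])
    d = np.linalg.norm(np.asarray(clients, float)[:, None] - options[None, :], axis=-1)
    return int(np.sum(d.argmin(axis=1) == len(options) - 1))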

Page 15: Size Doesn’t Matter? On the Value of Software Size Features for Effort Estimation

So let’s check…

If we (1) throw away size attributes and (2) prune irrelevant rows,

then what happens?

Page 16: Size Doesn’t Matter? On the Value of Software Size Features for Effort Estimation

Details: pop1NN (cntd.)

1. Calculate distances between every pair of training instances
2. Convert the distances of Step 1 into an ordering of neighbors
3. Mark closest neighbors and calculate popularity
4. Order training instances in decreasing popularity
5. Decide which instances to select
   • via experiments with nearest neighbor on a hold-out set
6. Return estimates for the test instances

pop1NN is a 6-step procedure…
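Read literally, the six steps can be sketched as below. The hold-out experiment in Step 5 is simplified to a plain error scan, so the split, the distance, and the stopping rule are assumptions rather than the paper's exact procedure:

import numpy as np

def pop1nn_estimate(train_X, train_y, test_X, holdout_frac=0.33, seed=0):
    """Sketch of the six pop1NN steps listed above; details may differ from the paper."""
    train_X, train_y, test_X = (np.asarray(a, float) for a in (train_X, train_y, test_X))
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(train_X))
    n_hold = max(1, int(holdout_frac * len(train_X)))
    hold, pool = idx[:n_hold], idx[n_hold:]            # hold-out set vs. selection pool

    # Steps 1-2: pairwise distances within the pool, giving each instance's neighbor ordering
    Xp = train_X[pool]
    d = np.linalg.norm(Xp[:, None] - Xp[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    # Step 3: popularity = how often an instance is another instance's nearest neighbor
    pop = np.bincount(d.argmin(axis=1), minlength=len(pool))
    # Step 4: order the pool by decreasing popularity
    order = pool[np.argsort(-pop)]

    def one_nn(queries, refs):                         # 1NN estimates from a reference subset
        dq = np.linalg.norm(queries[:, None] - train_X[refs][None, :], axis=-1)
        return train_y[refs][dq.argmin(axis=1)]

    # Step 5: keep the k most popular instances that minimize hold-out error
    errs = [np.mean(np.abs(one_nn(train_X[hold], order[:k]) - train_y[hold]))
            for k in range(1, len(order) + 1)]
    selected = order[:int(np.argmin(errs)) + 1]
    # Step 6: return estimates for the test instances
    return one_nn(test_X, selected)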

Page 17: Size Doesn’t Matter? On the Value of Software Size Features for Effort Estimation

Results (reduced data)

Loss values of pop1NN (on reduced data) vs. CART and 1NN (on full data)

pop1NN loses 2 out of 13 data sets against CART

pop1NN loses 4 out of 13 data sets against 1NN

Page 18: Size Doesn’t Matter? On the Value of Software Size Features for Effort Estimation

Discussion

Page 19: Size Doesn’t Matter? On the Value of Software Size Features for Effort Estimation

Conclusions

Successful methods (1NN & CART) cannot compensate for the lack of size attributes very well
• A lack of size features decreases their performance on the majority of the data sets

When 1NN is augmented with a popularity-based pre-processor (pop1NN)
• The lack of size features can be tolerated in most of the data sets
• Caveat: irrelevant instances must be pruned first

Size features are essential for standard learners
• Practitioners with enough resources to correctly collect size features should do so
• Lacking such resources, pop1NN-like methods can compensate for the missing size features

Page 20: Size Doesn’t Matter? On the Value of Software Size Features for Effort Estimation


Future Work

• pop1NN as a feature selector? (see the sketch after this list)
  – Lipowezky (1998): feature and case selection are similar tasks; both remove cells in the hypercube of all instances times all features.
  – So it should be possible to convert a case selection mechanism into a feature selector:
    • Transpose the data
    • Nearby columns are correlated
    • Keep columns that are near no other column

• Active learning
  – pop1NN does not use dependent variable information
  – It can identify the popular instances of a data set and guide expert reflection on which instances to collect dependent variable information for
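A rough sketch of the transposition idea in the first bullet: compute popularity over the columns of the transposed data and keep only the columns that are no other column's nearest neighbor. This is speculative future work, and every detail below (z-scoring, Euclidean distance, names) is an assumption:

import numpy as np

def unpopular_features(X):
    """Future-work sketch: treat features as rows (transpose), find each feature's
    nearest other feature, and keep the features that are never chosen as a nearest
    neighbor, i.e. columns that are near no other column. Illustrative only."""
    F = np.asarray(X, float).T                          # rows are now features
    # z-score each feature so 'nearness' reflects similarity in shape, not scale
    F = (F - F.mean(axis=1, keepdims=True)) / (F.std(axis=1, keepdims=True) + 1e-12)
    d = np.linalg.norm(F[:, None] - F[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    col_popularity = np.bincount(d.argmin(axis=1), minlength=len(F))
    return np.where(col_popularity == 0)[0]             # indices of columns to keep

A call such as data[:, unpopular_features(data)] would then yield the reduced feature set.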

Page 21: Size Doesn’t Matter? On the Value of Software Size Features for Effort Estimation

Questions? Comments?
