Using Machine Learning to Predict Project Effort: Empirical Case Studies in Data-starved Domains

Gary D. Boetticher, Department of Software Engineering, University of Houston - Clear Lake


Post on 03-Jan-2016


Page 1: Using Machine Learning to Predict Project Effort: Empirical Case Studies in Data-starved Domains

Using Machine Learning to Predict Project Effort: Empirical Case Studies in Data-starved Domains

Gary D. Boetticher

Department of Software Engineering

University of Houston - Clear Lake

Page 2

What Customers Want

Page 3

What Requirements Tell Us

Page 4

Standish Group [Standish94]

• Projects exceeded planned budget by 90%

• Projects exceeded schedule by 222%

• More than 50% of the projects delivered less than 50% of their requirements

Page 5

Underlying Problems

85% of organizations are at CMM Level 1 or 2 [CMU CMM95, Curtis93]

Scarcity of data

Page 6

Consequences

Early life-cycle estimates can be off by a factor of 4 [Boehm81, Heemstra92]

Page 7

Related Research: Economic Models

Columns: Early in Lifecycle, Late in Lifecycle

Top-Down: COCOMO II, COCOMO II

Bottom-Up: Function Points

Page 8

Why are Machine Learning algorithms not used more often for estimating early in the life cycle?

Page 9

Related Research - 2

Columns: Early in Lifecycle, Late in Lifecycle

Bayesian: Chulani

CBR: Delany Basio, Finnie, Kadoda, Mukhopadhyay, Prietula

GA: Cordero

Neural Network: Boetticher, Srinivasan, Samson, Wittig

Neurofuzzy: Hodgkinson

OSR: Briand

Page 10

Goal

Apply Machine Learning (Neural Network) early in the software lifecycle against Empirical Data

Page 11

Neural Network

Page 12

Data

• B2B Electronic Commerce Data
  – Delphi-based
  – 104 Vectors

• Fleet Management Software
  – Delphi-based
  – 433 Vectors

Page 13

Experiment 1: Product-Based Fleet to B2B

Training Data

Vector  SLOC  Effort
1       26    1
:       :     :
434     4398  245

Test Data

Vector  SLOC  Effort
1       15    1
:       :     :
104     2796  160

Page 14

Experiment 1: Product Results

Experiment  Actual Correct  % Correct pred(25)
1           11 out of 104   11%
2           10 out of 104   10%
3           11 out of 104   11%
4           7 out of 104    7%
5           12 out of 104   12%
6           2 out of 104    2%
7           8 out of 104    8%
8           10 out of 104   10%
9           14 out of 104   13%
10          10 out of 104   10%
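The pred(25) column can be read as the share of test vectors whose predicted effort falls within 25% of the actual effort. The sketch below assumes that common definition (the slides do not spell it out); the function name `pred` and the four-value example are hypothetical, while the correct counts are copied from the table above.

```python
def pred(actual, predicted, threshold=0.25):
    """Fraction of predictions whose relative error is within `threshold`."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    hits = sum(
        1 for a, p in zip(actual, predicted)
        if abs(p - a) <= threshold * abs(a)
    )
    return hits / len(actual)


# Hypothetical effort values, for illustration only (not from the slides).
example_actual = [100, 100, 100, 100]
example_predicted = [110, 130, 75, 200]
example_score = pred(example_actual, example_predicted)  # 2 of 4 within 25%

# Correct counts for the ten runs, copied from the Experiment 1 table.
correct_counts = [11, 10, 11, 7, 12, 2, 8, 10, 14, 10]
average_pred25 = sum(correct_counts) / (len(correct_counts) * 104)
print(f"{average_pred25:.1%}")  # prints 9.1%
```

Averaged over the ten runs, this comes to roughly 9%, in line with the neural network figure reported for the Fleet to B2B product-based experiment on the Results slide.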

Page 15

Experiment 2: Project-Based Results Fleet to B2B

Project Development Effort

Experiment Number  Actual  Calculated  Project Accuracy
1                  2083    1958        -6%
2                  2083    1962        -6%
3                  2083    1998        -4%
4                  2083    2238        7%
5                  2083    2110        1%
6                  2083    3412        64%
7                  2083    2555        23%
8                  2083    2104        1%
9                  2083    2083        0%
10                 2083    1777        -15%
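As a check on the table, the sketch below assumes that Project Accuracy is the signed relative error, (calculated - actual) / actual, which reproduces the percentages shown. The slides do not state the formula, so treat `project_accuracy` as a hypothetical helper. It also counts how many runs land within 25% of the actual project effort.

```python
ACTUAL_EFFORT = 2083  # actual project development effort, from the table

# Calculated project effort for the ten runs, from the table.
calculated = [1958, 1962, 1998, 2238, 2110, 3412, 2555, 2104, 2083, 1777]


def project_accuracy(actual, calc):
    """Signed relative error of a calculated project effort (assumed formula)."""
    return (calc - actual) / actual


accuracies = [project_accuracy(ACTUAL_EFFORT, c) for c in calculated]
rounded_percent = [round(100 * e) for e in accuracies]

# Share of runs whose project-level estimate is within 25% of the actual.
within_25 = sum(1 for e in accuracies if abs(e) <= 0.25)
project_pred25 = within_25 / len(calculated)

print(rounded_percent)  # [-6, -6, -4, 7, 1, 64, 23, 1, 0, -15]
print(project_pred25)   # 0.9
```

Nine of the ten runs fall within 25%, which matches the 90% reported for the Fleet to B2B project-based experiment on the Results slide.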

Page 16

Experiment 3: Product-Based B2B to Fleet

Training Data

Vector  SLOC  Effort
1       26    1
:       :     :
104     2796  160

Test Data

Vector  SLOC  Effort
1       15    1
:       :     :
434     4398  245

Page 17

Extrapolation issue

The largest SLOC value in the Fleet data divided by the largest SLOC value in the B2B data:

4398 / 2796 = 1.57
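The slide gives only the ratio. One plausible reading, which the Experiment 4 figures appear to support, is that each raw prediction is multiplied by this ratio to compensate for the network having been trained on smaller B2B programs. The sketch below applies that assumption to the raw project efforts from Experiment 4 and compares the result with the scaled values reported there; the variable names are illustrative.

```python
MAX_SLOC_FLEET = 4398  # largest SLOC in the Fleet (test) data
MAX_SLOC_B2B = 2796    # largest SLOC in the B2B (training) data

scale = MAX_SLOC_FLEET / MAX_SLOC_B2B  # about 1.57

# Raw and scaled calculated project efforts, copied from the Experiment 4 table.
raw_effort = [9464, 8787, 9066, 9809, 9281, 8753, 8640, 10855, 8915, 9299]
reported_scaled = [14887, 13821, 14261, 15429, 14599,
                   13768, 13591, 17074, 14022, 14627]

# Assumption: scaling is a plain multiplication of the raw prediction.
scaled_effort = [round(r * scale) for r in raw_effort]

# Largest gap between the recomputed and the reported scaled values.
max_gap = max(abs(s - t) for s, t in zip(scaled_effort, reported_scaled))
print(round(scale, 2), max_gap)
```

Under this assumption the recomputed values agree with the reported ones to within rounding, which suggests (but does not prove) that a simple multiplicative factor was used.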

Page 18

Experiment 3: Product Results

Actual Correct   % Correct pred(25)   Actual Correct   % Correct pred(25)
(raw scores,     (raw scores)         (scaled,         (scaled)
out of 434)                           out of 434)
130              30%                  142              33%
133              31%                  96               22%
78               18%                  179              41%
118              27%                  172              40%
132              30%                  136              31%
130              30%                  117              27%
134              31%                  68               16%
146              34%                  241              56%
130              30%                  117              27%
106              24%                  118              43%

Page 19

Experiment 4: Project-Based Results B2B to Fleet

Calc. Proj. Dev. Effort   Project Accuracy   Calc. Proj. Dev. Effort   Project Accuracy
(Raw Score,               (Raw Score)        (Scaled,                  (Scaled)
out of 15949)                                out of 15949)
9464                      -41%               14887                     -7%
8787                      -45%               13821                     -13%
9066                      -43%               14261                     -11%
9809                      -38%               15429                     -3%
9281                      -42%               14599                     -8%
8753                      -45%               13768                     -14%
8640                      -46%               13591                     -15%
10855                     -32%               17074                     7%
8915                      -44%               14022                     -12%
9299                      -42%               14627                     -8%

Page 20

Results

Experiment                      Neural Network Average Accuracy (Pred 25)   Linear Regression (Pred 25)
Fleet to B2B, Product           9%                                          16%
Fleet to B2B, Project           90%                                         0%
B2B to Fleet, Product (Scaled)  34%                                         29%
B2B to Fleet, Project (Scaled)  100%                                        100%

Page 21

Conclusions

• The bottom-up approach produced very good results on a project basis

• Results were comparable between the neural network (NN) and the statistical approach (linear regression)

• Scaling helped

• The estimation approach is suitable for prototype/iterative development

Page 22

Future Directions

• Explore an extrapolation function

• Apply other ML algorithms

• Collect additional metrics

• Integrate with COCOMO II

• Conduct more experiments (additional data)