Using Machine Learning to Predict Project Effort: Empirical Case Studies in
Data-starved Domains
Gary D. Boetticher
Department of Software Engineering
University of Houston - Clear Lake
Standish Group [Standish94]
• Exceeded planned budget by 90%
• Exceeded planned schedule by 222%
• More than 50% of the projects had less than 50% requirements
Related Research: Economic Models
Early in Lifecycle | Late in Lifecycle
Top-Down: COCOMO II | COCOMO II
Bottom-Up: Function Points
Related Research - 2
Early in Lifecycle | Late in Lifecycle
Bayesian: Chulani
CBR: Delany Basio, Finnie, Kadoda, Mukhopadhyay, Prietula
GA: Cordero
Neural Network: Boetticher, Srinivasan, Samson, Wittig
Neurofuzzy: Hodgkinson
OSR: Briand
Data
• B2B Electronic Commerce Data
  – Delphi-based
  – 104 Vectors
• Fleet Management Software
  – Delphi-based
  – 433 Vectors
Experiment 1: Product-Based Fleet to B2B
Training Data (Fleet)
Vector  SLOC  Effort
1       26    1
:       :     :
434     4398  245
Test Data (B2B)
Vector  SLOC  Effort
1       15    1
:       :     :
104     2796  160
Experiment 1: Product Results
Experiment  Actual Correct  % Correct pred(25)
1           11 out of 104   11%
2           10 out of 104   10%
3           11 out of 104   11%
4           7 out of 104    7%
5           12 out of 104   12%
6           2 out of 104    2%
7           8 out of 104    8%
8           10 out of 104   10%
9           14 out of 104   13%
10          10 out of 104   10%
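The pred(25) column counts an estimate as correct when it falls within 25% of the actual effort. The sketch below shows one common way to compute that count and percentage. The function name `pred` and the four sample values are hypothetical and are not taken from the study's data.

```python
def pred(actuals, estimates, threshold=0.25):
    """Return (hits, fraction) of estimates within `threshold` of the actual value.

    Assumes the usual definition: an estimate is a hit when its
    relative error |estimate - actual| / actual is at most the threshold.
    """
    if len(actuals) != len(estimates):
        raise ValueError("actuals and estimates must have the same length")
    hits = sum(
        1
        for a, e in zip(actuals, estimates)
        if a != 0 and abs(e - a) / a <= threshold
    )
    return hits, hits / len(actuals)


# Hypothetical effort values, for illustration only.
actual = [10, 20, 40, 80]
estimated = [12, 26, 41, 50]

hits, fraction = pred(actual, estimated)
print(f"{hits} out of {len(actual)} ({fraction:.0%})")
```

With these sample values, the first and third estimates are within 25% of the actual effort and the other two are not.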
Experiment 2: Project-Based Results Fleet to B2B
Project Development Effort
Experiment Number  Actual  Calculated  Project Accuracy
1                  2083    1958        -6%
2                  2083    1962        -6%
3                  2083    1998        -4%
4                  2083    2238        7%
5                  2083    2110        1%
6                  2083    3412        64%
7                  2083    2555        23%
8                  2083    2104        1%
9                  2083    2083        0%
10                 2083    1777        -15%
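The Project Accuracy figures are consistent with a signed relative error, (calculated - actual) / actual, applied to the project totals. The sketch below recomputes that column from the Actual and Calculated values on this slide and counts how many runs land within 25% of the actual effort. The helper name `project_accuracy` is an assumption made for illustration.

```python
ACTUAL_EFFORT = 2083  # actual project development effort from the slide

# Calculated project effort for the ten experiments, from the slide.
calculated = [1958, 1962, 1998, 2238, 2110, 3412, 2555, 2104, 2083, 1777]


def project_accuracy(actual, calculated_total):
    """Signed relative error of the calculated project effort."""
    return (calculated_total - actual) / actual


errors = [project_accuracy(ACTUAL_EFFORT, c) for c in calculated]
percentages = [round(err * 100) for err in errors]

# Project-level pred(25): runs whose total is within 25% of the actual.
within_25 = sum(1 for err in errors if abs(err) <= 0.25)

print(percentages)
print(f"{within_25} of {len(calculated)} runs within 25%")
```

Only experiment 6 (64%) misses the 25% band, so 9 of the 10 runs qualify.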
Experiment 3: Product-Based B2B to Fleet
Training Data (B2B)
Vector  SLOC  Effort
1       26    1
:       :     :
104     2796  160
Test Data (Fleet)
Vector  SLOC  Effort
1       15    1
:       :     :
434     4398  245
Experiment 3: Product Results
Actual Correct, raw  % Correct pred(25),  Actual Correct, scaled  % Correct pred(25),
(out of 434)         raw                  (out of 434)            scaled
130                  30%                  142                     33%
133                  31%                  96                      22%
78                   18%                  179                     41%
118                  27%                  172                     40%
132                  30%                  136                     31%
130                  30%                  117                     27%
134                  31%                  68                      16%
146                  34%                  241                     56%
130                  30%                  117                     27%
106                  24%                  118                     43%
Experiment 4: Project-Based Results B2B to Fleet
Calc. Proj. Dev. Effort,  Project Accuracy,  Calc. Proj. Dev. Effort,  Project Accuracy,
raw (out of 15949)        raw                scaled (out of 15949)     scaled
9464                      -41%               14887                     -7%
8787                      -45%               13821                     -13%
9066                      -43%               14261                     -11%
9809                      -38%               15429                     -3%
9281                      -42%               14599                     -8%
8753                      -45%               13768                     -14%
8640                      -46%               13591                     -15%
10855                     -32%               17074                     7%
8915                      -44%               14022                     -12%
9299                      -42%               14627                     -8%
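The slides do not state how the scaled estimates were derived. As a check on the table only, the sketch below uses the raw and scaled figures from this slide to compute the scaled-to-raw ratio for each run and the number of runs within 25% of the actual effort of 15949. The variable names are illustrative, and a roughly constant ratio would be a reading of the data rather than a description of the author's method.

```python
ACTUAL_EFFORT = 15949  # actual Fleet project effort from the slide

# Calculated project effort per run, copied from the slide.
raw = [9464, 8787, 9066, 9809, 9281, 8753, 8640, 10855, 8915, 9299]
scaled = [14887, 13821, 14261, 15429, 14599, 13768, 13591, 17074, 14022, 14627]

# Ratio of scaled to raw estimate for each run.
ratios = [s / r for r, s in zip(raw, scaled)]
spread = max(ratios) - min(ratios)


def hits_within(estimates, actual, threshold=0.25):
    """Count estimates whose relative error is at most `threshold`."""
    return sum(1 for e in estimates if abs(e - actual) / actual <= threshold)


raw_hits = hits_within(raw, ACTUAL_EFFORT)
scaled_hits = hits_within(scaled, ACTUAL_EFFORT)

print(f"ratio range: {min(ratios):.4f} to {max(ratios):.4f}")
print(f"raw within 25%: {raw_hits}, scaled within 25%: {scaled_hits}")
```

In this table the ratio stays close to 1.573 for every run, which suggests a single multiplicative adjustment. None of the raw totals falls within 25% of the actual effort, while all ten scaled totals do.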
Results
Experiment                      Neural Network Average Accuracy  Linear Regression
                                (Pred 25)                        (Pred 25)
Fleet to B2B, Product           9%                               16%
Fleet to B2B, Project           90%                              0%
B2B to Fleet, Product (Scaled)  34%                              29%
B2B to Fleet, Project (Scaled)  100%                             100%
Conclusions
• Bottom-up approach produced very good results on a project basis
• Results were comparable between the neural network and the statistical (linear regression) approach
• Scaling helped
• Estimation Approach is suitable for Prototype/Iterative Development