Everett Fire Department Data Analysis 2009 & 2012
DESCRIPTION
This is the Design of Experiments portion of the Everett Fire Department Data Analysis project that I completed. It was part of a 35-page statistical final report that included an ANOVA (Analysis of Variance) and a graphical analysis of the response times for the Everett City Fire Department.
TRANSCRIPT
City of Everett Fire Department Data Analysis (2009 & 2012)
Phil Engel / … / …
Math 370: Statistics for Engineers
December 11th, 2015
General Statistics for Response Times:
Statistic                     2009 (h:mm:ss)  2012 (h:mm:ss)
Min. Response Time            0:00:00         0:00:00
Max. Response Time            0:29:55         0:29:57
Mean (Average) Response Time  0:05:43         0:06:07
Median                        0:05:12         0:05:30
Standard Deviation            0:03:10         0:03:19
Variance                      0:10:00         0:10:59
Table of Contents
Abstract
  Project Inception
Design of Experiments
  Analysis of DOE
  Regression Analysis: Response Time vs. TOD and Year
  Interactions of Factors
  Extra: Response Times vs. Population Growth
  Conclusions of DOE
Appendix A
  Glossary
Appendix B
  Years 2008-2013 Average Response Times
Appendix C
  Anova Analysis Figures
Appendix D
  DOE Analysis Graphs and Data
Works Cited
Abstract:
The purpose of this document is to provide a statistical analysis of the Everett Fire Department (EFD) response time data. The project began by parsing the EFD response time data from the years 2007-2013. After sorting the data, it was determined that the data for 2006 and 2007 were unreliable, so those years were excluded from the project. After determining the appropriate variance to ensure randomization in all of the samples, the final analysis was performed on the years 2009 and 2012, the months of April and December, the times of day (morning, midday, and evening), and the station response times inside and outside of the EFD fire zones. The tests performed were an ANOVA (analysis of variance), a DOE (Design of Experiments), an analysis of the station response times from inside and outside of the zones, and a graphical interpretation of subsidiary data. The analysis was performed using the statistical software Minitab.
Project Inception
After the data were sorted, several questions needed answers. The first was: what are the independent and dependent variables? The dependent variable was clearly the response time. Through an informal review of the data and discussion with the Everett Fire Department, the independent variables were chosen to be Year, Month, and Time of Day. An additional statistical analysis was performed on individual fire stations and the zones they were in.
Design of Experiments
After an ANOVA was performed to determine the best years, months, and times of day to use, based on the variances and the high and low values of the average response times, a Design of Experiments (DOE) test was performed to find which factors were significant and what kind of effect they had on the response time data. A DOE can help determine which factors most significantly affect the response time and whether there are any direct relationships or interactions between the chosen factors.
Year, Month, and Time of Day were chosen as the factors affecting the overall response time of the Everett Fire Department. The DOE was broken down as follows:
Factors      (+)      (-)
Year         2012     2009
Month        April    December
Time of Day  Morning  Afternoon
After the factors and the levels of each factor were determined, the experiment could be performed. A ½ fractional factorial design was used because it minimized the number of data points required. Figure 1 shows the (randomized) treatment combinations of the factorial design and the corresponding response times.
Figure 1 – Treatment combinations for the DOE experiment
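The treatment combinations of a two-level design like this one can also be enumerated in code. The following is a minimal Python sketch, not the procedure used in the report (the report used Minitab). The defining relation I = ABC used to pick the half fraction, the function names, and the random seed are all assumptions made for illustration; the report does not state which fraction was run.

```python
from itertools import product
import random

# Factor levels from the DOE table: coded +1 is the (+) level, -1 the (-) level.
FACTORS = {
    "Year": {+1: "2012", -1: "2009"},
    "Month": {+1: "April", -1: "December"},
    "TOD": {+1: "Morning", -1: "Afternoon"},
}


def full_factorial(names):
    """Return all 2^k coded treatment combinations as dicts of +1/-1."""
    return [dict(zip(names, levels))
            for levels in product((-1, +1), repeat=len(names))]


def half_fraction(runs, sign=+1):
    """Keep the runs whose coded levels multiply to `sign`.

    For three factors this is the half fraction with defining
    relation I = +ABC (sign=+1) or I = -ABC (sign=-1). Assumed choice.
    """
    kept = []
    for run in runs:
        level_product = 1
        for level in run.values():
            level_product *= level
        if level_product == sign:
            kept.append(run)
    return kept


names = list(FACTORS)
full = full_factorial(names)
half = half_fraction(full)

# Randomize the run order; the seed is fixed only to make the sketch repeatable.
rng = random.Random(370)
run_order = half[:]
rng.shuffle(run_order)

for run in run_order:
    print({name: FACTORS[name][level] for name, level in run.items()})
```

With three factors, the full design has eight treatment combinations and the half fraction keeps four of them.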
Analysis of DOE
The main effects of two example factors, A and B, can be calculated from the treatment-combination totals with the following equations, where n is the number of replicates:
\[ A = \frac{1}{2n}\left[a + ab - b - (1)\right] \]
\[ B = \frac{1}{2n}\left[b + ab - a - (1)\right] \]
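As a small worked example of the main-effect equations, the Python sketch below evaluates them for a two-factor design. The function name and the treatment totals are hypothetical and chosen only for illustration; they are not values from the EFD data.

```python
def main_effects(one, a, b, ab, n):
    """Main effects of A and B in a 2x2 factorial design.

    one, a, b, ab are the response totals for the treatment
    combinations (1), a, b and ab; n is the number of replicates.
    """
    effect_a = (a + ab - b - one) / (2 * n)
    effect_b = (b + ab - a - one) / (2 * n)
    return effect_a, effect_b


# Hypothetical totals, one replicate per treatment combination.
effect_a, effect_b = main_effects(one=10.0, a=14.0, b=12.0, ab=16.0, n=1)
print(effect_a, effect_b)
```

Each effect is the average response at the factor's high level minus the average response at its low level.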
The DOE results were as follows:
Coded Coefficients
Term             Effect    Coef      SE Coef  T-Value  P-Value  VIF
Constant                    5.987    *        *        *
Year              0.4752    0.2376   *        *        *        1.00
Month            -0.03325  -0.01662  *        *        *        1.00
TOD               0.3498    0.1749   *        *        *        1.00
Year*Month        0.08325   0.04163  *        *        *        1.00
Year*TOD         -0.11675  -0.05837  *        *        *        1.00
Month*TOD        -0.10825  -0.05413  *        *        *        1.00
Year*Month*TOD   -0.12475  -0.06237  *        *        *        1.00
From these results, it can be concluded that Year and TOD (Time of Day) had the numerically largest effects in the factorial analysis of the response times. This can also be seen in the Pareto chart, which compares the main effects with the effects of the different factor combinations.
None of the combinations between the Year, Month, or TOD had a large effect on the response time data.
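The ranking of the effects can be reproduced from the Coded Coefficients output. The sketch below is a hedged illustration in Python: the effect values are copied from the table, while the variable names are assumptions. It relies on the standard relationship for two-level coded designs that each coefficient is half of the corresponding effect.

```python
# Effects copied from the Coded Coefficients output.
effects = {
    "Year": 0.4752,
    "Month": -0.03325,
    "TOD": 0.3498,
    "Year*Month": 0.08325,
    "Year*TOD": -0.11675,
    "Month*TOD": -0.10825,
    "Year*Month*TOD": -0.12475,
}

# In coded (+1/-1) units, the regression coefficient is effect / 2.
coefs = {term: effect / 2 for term, effect in effects.items()}

# Rank terms by absolute effect, largest first (the ordering a Pareto chart shows).
ranked = sorted(effects, key=lambda term: abs(effects[term]), reverse=True)

for term in ranked:
    print(f"{term:<16}{effects[term]:>10.5f}{coefs[term]:>10.5f}")
```

The first two terms in the ranking are Year and TOD, which matches the conclusion drawn from the table.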
Since both the graphical and numerical results indicated that Year and TOD had the largest effects on the data, a regression analysis was performed to check whether these two factors have a statistically significant effect on the response times. The regression analysis output is as follows:
Regression Analysis: Response Time vs. TOD and Year
Analysis of Variance
Source         DF  Seq SS   Contribution  Adj SS   Adj MS   F-Value  P-Value
Regression      2  0.69638   87.67%       0.69638  0.34819  17.78    0.005
  TOD           1  0.24465   30.80%       0.24465  0.24465  12.50    0.017
  Year          1  0.45173   56.87%       0.45173  0.45173  23.07    0.005
Error           5  0.09789   12.33%       0.09789  0.01958
  Lack-of-Fit   1  0.02726    3.43%       0.02726  0.02726   1.54    0.282
  Pure Error    4  0.07063    8.89%       0.07063  0.01766
Total           7  0.79427  100.00%
Model Summary
S         R-sq    R-sq(adj)  PRESS     R-sq(pred)
0.139925  87.67%  82.74%     0.250610  68.45%
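The S, R-sq, and R-sq(adj) values in the Model Summary follow from the sums of squares and degrees of freedom in the Analysis of Variance output. The Python sketch below recomputes them with the standard formulas; the variable names are assumptions, and small differences from the Minitab figures are expected because the inputs are rounded.

```python
import math

# Values copied from the regression Analysis of Variance output.
ss_error = 0.09789
ss_total = 0.79427
df_error = 5
df_total = 7

# R-sq: share of the total variation explained by the model.
r_sq = 1 - ss_error / ss_total

# R-sq(adj): the same ratio using mean squares, which penalizes extra terms.
r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)

# S: standard error of the regression, the square root of the error mean square.
s = math.sqrt(ss_error / df_error)

print(f"R-sq = {r_sq:.4f}, R-sq(adj) = {r_sq_adj:.4f}, S = {s:.6f}")
```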
Coefficients
Term      Coef    SE Coef  95% CI            T-Value  P-Value  VIF
Constant  5.9874  0.0495   (5.8602, 6.1145)  121.03   0.000
TOD       0.1749  0.0495   (0.0477, 0.3020)    3.53   0.017    1.00
Year      0.2376  0.0495   (0.1105, 0.3648)    4.80   0.005    1.00
Regression Equation
Response Time = 5.9874 + 0.1749 TOD + 0.2376 Year
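To show how the regression equation is read, the Python sketch below evaluates it at the coded factor levels. It assumes that TOD and Year enter the equation in the same coded units as the DOE table (+1 for Morning and 2012, -1 for Afternoon and 2009) and that the response is in minutes; the function name is hypothetical.

```python
def predict_response_time(tod, year):
    """Predicted mean response time (minutes) from the fitted equation.

    tod and year are coded levels: +1 or -1 (assumed coding, see lead-in).
    """
    return 5.9874 + 0.1749 * tod + 0.2376 * year


# Coded levels assumed from the DOE factor table.
TOD_LEVELS = {"Morning": +1, "Afternoon": -1}
YEAR_LEVELS = {"2012": +1, "2009": -1}

for year_label, year in YEAR_LEVELS.items():
    for tod_label, tod in TOD_LEVELS.items():
        minutes = predict_response_time(tod, year)
        print(f"{year_label} {tod_label:<10}{minutes:.4f} min")
```

Under these assumptions, the predicted time ranges from about 5.57 minutes (2009, Afternoon) to about 6.40 minutes (2012, Morning).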
As seen in the results, the T-Values for TOD and Year are large and both P-Values are less than the selected alpha of 0.05, so both factors have a statistically significant effect on the response time at the 95% confidence level.
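The T-Values and 95% confidence intervals in the Coefficients output can be checked by hand. The Python sketch below uses the standard formulas t = Coef / SE Coef and Coef ± t_crit × SE Coef. The critical value 2.5706 is the two-sided 95% point of Student's t distribution with 5 degrees of freedom (the error DF of this regression); it is hard-coded here as an assumption to keep the sketch free of external libraries.

```python
# Two-sided 95% critical value of Student's t with 5 error degrees of freedom.
T_CRIT_95_DF5 = 2.5706

# (Coef, SE Coef) pairs copied from the Coefficients output.
terms = {
    "TOD": (0.1749, 0.0495),
    "Year": (0.2376, 0.0495),
}


def t_and_ci(coef, se, t_crit=T_CRIT_95_DF5):
    """Return the t statistic and the confidence interval for one coefficient."""
    t_value = coef / se
    half_width = t_crit * se
    return t_value, (coef - half_width, coef + half_width)


results = {term: t_and_ci(coef, se) for term, (coef, se) in terms.items()}

for term, (t_value, (low, high)) in results.items():
    # An interval that excludes zero corresponds to P-Value < 0.05.
    print(f"{term:<6}t = {t_value:.2f}, 95% CI = ({low:.4f}, {high:.4f})")
```

Both intervals exclude zero, which is the same conclusion as the P-Values below 0.05.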
Interactions of Factors
A secondary analysis of the chosen factors is to examine the interaction plots between the three factors. These plots show whether any of the factors interact with each other and, if so, what conclusion can be drawn. Here is the matrix plot of interactions between the factors Year, Month, and Time of Day.
As observed, there is a direct interaction between Month and Year. In practical terms, this could mean that if successive years show predictable data, the months within those years will also be normal and predictable. This could be beneficial for the Everett Fire Department if they are interested in scheduling firefighters in accordance with historically low or high months in order to maximize their response efficiency.
Extra: Response Times vs. Population Growth
It was interesting to note that the average response times for each time of day were generally consistent from year to year, apart from the outlier in the afternoon of 2012. However, this outlier could have been due to a data entry error.
Another interesting observation is that when the total population of Everett per year is graphed against the response time per year, the response times appear to trail population growth in a roughly cubic pattern, which is most visible about 1-2 years after a large jump in population.
[Figure: Comparison of Response Time with Time of Day. X-axis: year (2007-2014). Y-axis: Time (mins), 4.00-7.00. Series: 4pm-12am (Afternoon), 12am-8am (Morning), 8am-4pm (Midday).]
[Figure: Response Time and Population Growth. X-axis: year (2007-2014). Y-axes: Time (mins), 5.50-6.20, and Population, 94000-106000. Series: Average Response Time Per Year, Population Per Year.]
[Figure: Response Times per Month per Year. X-axis: Month in Numbers (0-12). Y-axis: Response Time, 5.00-6.60. Series: 2008, 2009, 2010, 2011, 2012, 2013.]
The response times per month per year showed no significant observable patterns. This appears to be the most scattered data displayed over the years analyzed (2008-2013).
Conclusions of DOE
From the DOE test, it can be concluded that Year and Month interact strongly with each other. This means that if a year's average response times are following a predictable pattern, the months in that year will also be predictable based on the historically high and low months. In addition, Time of Day and Year are the most significant factors for response times. This means that the Everett Fire Department should assign firefighters based on the year-to-date average response time and focus on scheduling efficiently around specific times of day, rather than using an arbitrary shift that crosses over the times analyzed (12am-12pm, 12pm-12am).
Appendix A
Glossary
ANOVA – Analysis of variance. A procedure that uses the F statistic to test for statistically significant differences among the mean values of two or more random samples from a population of data.
DOE – Design of Experiments. A procedure for eliminating insignificant test variables in order to reveal and properly analyze the significant variables and their interactions.
F value – In an ANOVA, the obtained value of F provides a test of the statistical significance of the observed differences among the means of two or more random samples.
P value – The probability of obtaining a result at least as extreme as the one observed, assuming the hypothesis being tested (the null hypothesis) is true.
Regression – In a graph of data points plotted by their horizontal and vertical values, the regression is a line that predicts the vertical value for a given horizontal value.
Pairwise – Coupled together with another value.
Factor – A variable chosen for the DOE to be performed on.
EFD – Everett Fire Department
Appendix B
Years 2008-2013 Average Response Times
Year  Overall Mean  12am-8am (Morning)  8am-4pm (Midday)  4pm-12am (Afternoon)
      [h:mm:ss]     [in Mins.]          [in Mins.]        [in Mins.]
2008  0:05:48       6.483               5.550             5.700
2009  0:05:43       6.483               5.400             5.633
2010  0:05:45       6.467               5.467             5.683
2011  0:05:51       6.550               5.733             5.667
2012  0:06:07       6.717               5.917             6.817
2013  0:05:58       6.667               5.767             5.817
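The time-of-day columns of this table are the inputs to the one-way ANOVA on time of day. The following Python sketch recomputes the F statistic from them by hand, using the textbook sums-of-squares formulas rather than Minitab; the function and variable names are assumptions. Because the tabulated values are rounded to three decimals, the result agrees only approximately with the Minitab F-Value of 15.63.

```python
# Yearly average response times in minutes (2008-2013), copied from the table.
groups = {
    "12am-8am": [6.483, 6.483, 6.467, 6.550, 6.717, 6.667],
    "8am-4pm": [5.550, 5.400, 5.467, 5.733, 5.917, 5.767],
    "4pm-12am": [5.700, 5.633, 5.683, 5.667, 6.817, 5.817],
}


def one_way_anova(samples):
    """Return (F, df_between, df_within) for a dict of sample lists."""
    all_values = [x for values in samples.values() for x in values]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0
    ss_within = 0.0
    for values in samples.values():
        mean = sum(values) / len(values)
        # Between-group variation: group mean against the grand mean.
        ss_between += len(values) * (mean - grand_mean) ** 2
        # Within-group variation: observations against their own group mean.
        ss_within += sum((x - mean) ** 2 for x in values)

    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


f_value, df_between, df_within = one_way_anova(groups)
print(f"F({df_between}, {df_within}) = {f_value:.2f}")
```

A large F with 2 and 15 degrees of freedom indicates that at least one time-of-day mean differs from the others, which is what the Tukey comparisons in Appendix C then examine pair by pair.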
Appendix C
Anova Analysis Figures
One-way ANOVA: 2008-2013
Method
Null hypothesis         All means are equal
Alternative hypothesis  At least one mean is different
Significance level      α = 0.05
Equal variances were assumed for the analysis.
Factor Information
Factor  Levels  Values
Factor       6  2013, 2012, 2011, 2010, 2009, 2008
Analysis of Variance of the Years
Source  DF  Adj SS  Adj MS   F-Value  P-Value
Factor   5  0.9299  0.18599  5.44     0.000
Error   66  2.2567  0.03419
Total   71  3.1867
Model Summary
S         R-sq    R-sq(adj)  R-sq(pred)
0.184912  29.18%  23.82%     15.72%
Means
Factor   N  Mean    StDev   95% CI
2013    12  5.9597  0.1298  (5.8531, 6.0663)
2012    12  6.0306  0.2989  (5.9240, 6.1371)
2011    12  5.8583  0.0815  (5.7518, 5.9649)
2010    12  5.7472  0.1468  (5.6406, 5.8538)
2009    12  5.7111  0.0854  (5.6045, 5.8177)
2008    12  5.7958  0.2520  (5.6893, 5.9024)
Pooled StDev = 0.184912
One-way ANOVA: 4pm-12am, 12am-8am, 8am-4pm
Method
Null hypothesis         All means are equal
Alternative hypothesis  At least one mean is different
Significance level      α = 0.05
Equal variances were assumed for the analysis.
Factor Information
Factor  Levels  Values
Factor       3  4pm-12am, 12am-8am, 8am-4pm
Analysis of Variance Time of Day
Source  DF  Adj SS  Adj MS   F-Value  P-Value
Factor   2  2.734   1.36724  15.63    0.000
Error   15  1.312   0.08747
Total   17  4.047
Model Summary
S         R-sq    R-sq(adj)  R-sq(pred)
0.295757  67.58%  63.25%     53.31%
Means
Factor    N  Mean    StDev   95% CI
4pm-12am  6  5.886   0.460   (5.629, 6.143)
12am-8am  6  6.5611  0.1063  (6.3038, 6.8185)
8am-4pm   6  5.6389  0.1985  (5.3815, 5.8962)
Pooled StDev = 0.295757
One-way ANOVA: Months
* NOTE * Cannot draw the interval plot for the Tukey procedure. Interval plots for comparisons are illegible with more than 45 intervals.
Method
Null hypothesis         All means are equal
Alternative hypothesis  At least one mean is different
Significance level      α = 0.05
Equal variances were assumed for the analysis.
Factor Information
Factor  Levels  Values
Factor      12  Jan, Feb, March, April, May, June, July, Aug, sept, Oct, Nov, Dec
Analysis of Variance
Source  DF  Adj SS  Adj MS   F-Value  P-Value
Factor  11  0.2913  0.02648  0.55     0.862
Error   60  2.8954  0.04826
Total   71  3.1867
Model Summary
S         R-sq   R-sq(adj)  R-sq(pred)
0.219673  9.14%  0.00%      0.00%
Means
Factor  N  Mean    StDev   95% CI
Jan     6  5.9889  0.2128  (5.8095, 6.1683)
Feb     6  5.8222  0.0841  (5.6428, 6.0016)
March   6  5.7361  0.1665  (5.5567, 5.9155)
April   6  5.867   0.246   (5.687, 6.046)
May     6  5.7861  0.2141  (5.6067, 5.9655)
June    6  5.8167  0.1517  (5.6373, 5.9961)
July    6  5.8278  0.2059  (5.6484, 6.0072)
Aug     6  5.8333  0.1680  (5.6539, 6.0127)
sept    6  5.8250  0.2046  (5.6456, 6.0044)
Oct     6  5.8806  0.2115  (5.7012, 6.0599)
Nov     6  5.9222  0.1277  (5.7428, 6.1016)
Dec     6  5.900   0.443   (5.721, 6.079)
Pooled StDev = 0.219673
Tukey Pairwise Comparisons
Grouping Information Using the Tukey Method and 95% Confidence
Factor  N  Mean    Grouping
Jan     6  5.9889  A
Nov     6  5.9222  A
Dec     6  5.900   A
Oct     6  5.8806  A
April   6  5.867   A
Aug     6  5.8333  A
July    6  5.8278  A
sept    6  5.8250  A
Feb     6  5.8222  A
June    6  5.8167  A
May     6  5.7861  A
March   6  5.7361  A
Means that do not share a letter are significantly different.
Tukey Simultaneous Tests for Differences of Means
Difference of   Difference  SE of                                  Adjusted
Levels          of Means    Difference  95% CI           T-Value   P-Value
Feb - Jan       -0.167      0.127       (-0.598, 0.265)  -1.31     0.974
March - Jan     -0.253      0.127       (-0.684, 0.179)  -1.99     0.696
April - Jan     -0.122      0.127       (-0.554, 0.309)  -0.96     0.998
May - Jan       -0.203      0.127       (-0.634, 0.229)  -1.60     0.903
June - Jan      -0.172      0.127       (-0.604, 0.259)  -1.36     0.967
July - Jan      -0.161      0.127       (-0.592, 0.270)  -1.27     0.980
Aug - Jan       -0.156      0.127       (-0.587, 0.276)  -1.23     0.985
sept - Jan      -0.164      0.127       (-0.595, 0.267)  -1.29     0.977
Oct - Jan       -0.108      0.127       (-0.540, 0.323)  -0.85     0.999
Nov - Jan       -0.067      0.127       (-0.498, 0.365)  -0.53     1.000
Dec - Jan       -0.089      0.127       (-0.520, 0.342)  -0.70     1.000
March - Feb     -0.086      0.127       (-0.517, 0.345)  -0.68     1.000
April - Feb      0.044      0.127       (-0.387, 0.476)   0.35     1.000
May - Feb       -0.036      0.127       (-0.467, 0.395)  -0.28     1.000
June - Feb      -0.006      0.127       (-0.437, 0.426)  -0.04     1.000
July - Feb       0.006      0.127       (-0.426, 0.437)   0.04     1.000
Aug - Feb        0.011      0.127       (-0.420, 0.442)   0.09     1.000
sept - Feb       0.003      0.127       (-0.429, 0.434)   0.02     1.000
Oct - Feb        0.058      0.127       (-0.373, 0.490)   0.46     1.000
Nov - Feb        0.100      0.127       (-0.331, 0.531)   0.79     1.000
Dec - Feb        0.078      0.127       (-0.354, 0.509)   0.61     1.000
April - March    0.131      0.127       (-0.301, 0.562)   1.03     0.996
May - March      0.050      0.127       (-0.381, 0.481)   0.39     1.000
June - March     0.081      0.127       (-0.351, 0.512)   0.64     1.000
July - March     0.092      0.127       (-0.340, 0.523)   0.72     1.000
Aug - March      0.097      0.127       (-0.334, 0.529)   0.77     1.000
sept - March     0.089      0.127       (-0.342, 0.520)   0.70     1.000
Oct - March      0.144      0.127       (-0.287, 0.576)   1.14     0.991
Nov - March      0.186      0.127       (-0.245, 0.617)   1.47     0.944
Dec - March      0.164      0.127       (-0.267, 0.595)   1.29     0.977
May - April     -0.081      0.127       (-0.512, 0.351)  -0.64     1.000
June - April    -0.050      0.127       (-0.481, 0.381)  -0.39     1.000
July - April    -0.039      0.127       (-0.470, 0.392)  -0.31     1.000
Aug - April     -0.033      0.127       (-0.465, 0.398)  -0.26     1.000
sept - April    -0.042      0.127       (-0.473, 0.390)  -0.33     1.000
Oct - April      0.014      0.127       (-0.417, 0.445)   0.11     1.000
Nov - April      0.056      0.127       (-0.376, 0.487)   0.44     1.000
Dec - April      0.033      0.127       (-0.398, 0.465)   0.26     1.000
June - May       0.031      0.127       (-0.401, 0.462)   0.24     1.000
July - May       0.042      0.127       (-0.390, 0.473)   0.33     1.000
Aug - May        0.047      0.127       (-0.384, 0.479)   0.37     1.000
sept - May       0.039      0.127       (-0.392, 0.470)   0.31     1.000
Oct - May        0.094      0.127       (-0.337, 0.526)   0.74     1.000
Nov - May        0.136      0.127       (-0.295, 0.567)   1.07     0.995
Dec - May        0.114      0.127       (-0.317, 0.545)   0.90     0.999
July - June      0.011      0.127       (-0.420, 0.442)   0.09     1.000
Aug - June       0.017      0.127       (-0.415, 0.448)   0.13     1.000
sept - June      0.008      0.127       (-0.423, 0.440)   0.07     1.000
Oct - June       0.064      0.127       (-0.367, 0.495)   0.50     1.000
Nov - June       0.106      0.127       (-0.326, 0.537)   0.83     0.999
Dec - June       0.083      0.127       (-0.348, 0.515)   0.66     1.000
Aug - July       0.006      0.127       (-0.426, 0.437)   0.04     1.000
sept - July     -0.003      0.127       (-0.434, 0.429)  -0.02     1.000
Oct - July       0.053      0.127       (-0.379, 0.484)   0.42     1.000
Nov - July       0.094      0.127       (-0.337, 0.526)   0.74     1.000
Dec - July       0.072      0.127       (-0.359, 0.504)   0.57     1.000
sept - Aug      -0.008      0.127       (-0.440, 0.423)  -0.07     1.000
Oct - Aug        0.047      0.127       (-0.384, 0.479)   0.37     1.000
Nov - Aug        0.089      0.127       (-0.342, 0.520)   0.70     1.000
Dec - Aug        0.067      0.127       (-0.365, 0.498)   0.53     1.000
Oct - sept       0.056      0.127       (-0.376, 0.487)   0.44     1.000
Nov - sept       0.097      0.127       (-0.334, 0.529)   0.77     1.000
Dec - sept       0.075      0.127       (-0.356, 0.506)   0.59     1.000
Nov - Oct        0.042      0.127       (-0.390, 0.473)   0.33     1.000
Dec - Oct        0.019      0.127       (-0.412, 0.451)   0.15     1.000
Dec - Nov       -0.022      0.127       (-0.454, 0.409)  -0.18     1.000
Individual confidence level = 99.88%
Tukey Pairwise Comparisons
Grouping Information Using the Tukey Method and 95% Confidence
Factor    N  Mean    Grouping
12am-8am  6  6.5611  A
4pm-12am  6  5.886     B
8am-4pm   6  5.6389    B
Means that do not share a letter are significantly different.
Tukey Simultaneous Tests for Differences of Means
                      Difference  SE of                                   Adjusted
Difference of Levels  of Means    Difference  95% CI            T-Value   P-Value
12am-8am - 4pm-12am    0.675      0.171       (0.232, 1.118)     3.95     0.003
8am-4pm - 4pm-12am    -0.247      0.171       (-0.690, 0.196)   -1.45     0.343
8am-4pm - 12am-8am    -0.922      0.171       (-1.365, -0.479)  -5.40     0.000
Individual confidence level = 97.97%
One-way ANOVA: 2008-2013
Tukey Pairwise Comparisons
Grouping Information Using the Tukey Method and 95% Confidence
Factor   N  Mean    Grouping
2012    12  6.0306  A
2013    12  5.9597  A B
2011    12  5.8583  A B C
2008    12  5.7958    B C
2010    12  5.7472    B C
2009    12  5.7111      C
Means that do not share a letter are significantly different.
Tukey Simultaneous Tests for Differences of Means
Difference   Difference  SE of                                     Adjusted
of Levels    of Means    Difference  95% CI              T-Value   P-Value
2012 - 2013   0.0708     0.0755      (-0.1507, 0.2924)    0.94     0.935
2011 - 2013  -0.1014     0.0755      (-0.3229, 0.1201)   -1.34     0.760
2010 - 2013  -0.2125     0.0755      (-0.4340, 0.0090)   -2.81     0.068
2009 - 2013  -0.2486     0.0755      (-0.4701, -0.0271)  -3.29     0.019
2008 - 2013  -0.1639     0.0755      (-0.3854, 0.0576)   -2.17     0.265
2011 - 2012  -0.1722     0.0755      (-0.3937, 0.0493)   -2.28     0.216
2010 - 2012  -0.2833     0.0755      (-0.5049, -0.0618)  -3.75     0.005
2009 - 2012  -0.3194     0.0755      (-0.5410, -0.0979)  -4.23     0.001
2008 - 2012  -0.2347     0.0755      (-0.4562, -0.0132)  -3.11     0.032
2010 - 2011  -0.1111     0.0755      (-0.3326, 0.1104)   -1.47     0.683
2009 - 2011  -0.1472     0.0755      (-0.3687, 0.0743)   -1.95     0.382
2008 - 2011  -0.0625     0.0755      (-0.2840, 0.1590)   -0.83     0.961
2009 - 2010  -0.0361     0.0755      (-0.2576, 0.1854)   -0.48     0.997
2008 - 2010   0.0486     0.0755      (-0.1729, 0.2701)    0.64     0.987
2008 - 2009   0.0847     0.0755      (-0.1368, 0.3062)    1.12     0.870
Individual confidence level = 99.54%
Appendix D
DOE Analysis Graphs and Data
Description: Normal probability plot of the effects Year, Month, and Time of Day.
Description: Four-way analysis of the residuals vs. fits and normal graph for the regression analysis with TOD and Year as factors.
Works Cited
Montgomery, Douglas C., George C. Runger, and Norma Faris Hubele. Engineering Statistics, 5th Ed.: Student Solutions Manual. Hoboken, NJ: Wiley, 2011. Print.
https://www.google.com/publicdata/explore?ds=kf7tgg1uo9ude_#!ctype=l&strail=false&bcs=d&nselm=h&met_y=population&scale_y=lin&ind_y=false&rdim=country&idim=place:5322640&ifdim=country&hl=en_US&dl=en_US&ind=false
http://www.theurbanist.org/2015/01/05/how-fast-can-everett-really-grow/