page 1 4/14/16
April 14, 2016
From: YLBG Consulting
To: Dr. Brakes, Human Resource Manager, StatCon Enterprize
Re: Follow-Up Analysis
PROJECT TITLE: Automobile Emission Analysis
1.0 PROJECT DESCRIPTION
StatCon Enterprize aims to ensure vehicles are held to the highest safety standards. Safety standards are tested using trained drivers to run experiments, such as air bag testing and obstacle courses. StatCon Enterprize conducted a designed experiment to test the installation of a device into cars and its effect on hydrocarbon, carbon monoxide, and carbon dioxide emissions.
The experiment was conducted on 14 separate days in the time frame of December to June. A Bentley was used for all experiments and was tested under 4 conditions: moderate engine load at 55 mph, high engine load at 35 mph, low engine load at 55 mph, and a hill climb at 145 mph. YLBG Consulting is assisting StatCon Enterprize with the analysis stage of the study. The objective is to perform a thorough analysis of the data collected by StatCon Enterprize. The analysis will help StatCon assess the efficiency of the device being studied.
1.1 RESEARCH QUESTIONS
Question 1: After being installed, does the device reduce hydrocarbon and carbon monoxide emissions while increasing carbon dioxide emissions?
Question 2: Does the device work better under one of the conditions?
1.2 STATISTICAL QUESTIONS
Question 1: Is there enough experimental data to draw a statistical conclusion?
Question 2: Do the results show statistically significant evidence?
An Equal Opportunity University
1.3 VARIABLES OF INTEREST
The responses are hydrocarbon (ppm), carbon monoxide (pct of volume), and carbon dioxide (pct of volume) emissions. The explanatory variable “condition” was modeled as a fixed effect and coded as a categorical variable representing conditions 1, 2, 3, and 4. “Mod” represents test-driving the car with a moderate engine load at 55 mph, “high” refers to a high engine load at 35 mph, “low” refers to a low engine load at 55 mph, and “hill” refers to a hill climb at 145 mph. The explanatory variable “device” is binary, taking the value 0 if the device has not been installed and 1 if the device has been installed. To adjust for random effects and to address the possible violation of the independence assumption, a new variable “group” was created to represent the combination of the 3 repetitions for each condition and day.
2.0 EXPLORATORY DATA ANALYSIS (EDA)
The data were first checked for missing values, and we found that 9 observations are missing, which is about 5% of the total observations. With so small a proportion, it would be fine to simply delete the missing values when doing our data analysis. Then, we obtained boxplots (see figures in Appendix B) to check for outliers. After grouping the emissions of hydrocarbon, carbon monoxide, and carbon dioxide by the device variable, we found no outliers for carbon dioxide emission. For hydrocarbon, we found 4 outliers with the device. For carbon monoxide emission, there are 10 outliers without the device.
3.0 STATISTICAL ANALYSIS
Model 1:
First, we fitted a multilevel model to our data. The variables “device” and “condition” (coded as categorical) are fixed effects. The variable “group” is treated as a random effect; that is, we included a random intercept in our model.
Model Assumptions:
1. Linearity Assumption: The linearity assumption cannot be checked for our results because the model contains only categorical and binary predictors.
2. Normality Assumption:
We use the QQ plot and histogram to check the assumption of normality. In the QQ plot, points falling along a straight line suggest that the data are normally distributed.
3. Constant Variance Assumption:
We use the residual vs. predicted plot to check the assumption of constant variance. If the errors have constant variance, the residuals are scattered randomly around zero. If the spread of the residuals increases or decreases with the fitted values in a pattern, the errors may not have constant variance.
4. Independence Assumption:
The main concern with this assumption is that there might be intraclass correlation in the design of the experiment because, for a given day and a given condition, three observations are recorded. Observations in the same group might therefore be correlated. To check the independence assumption, we calculate the intraclass correlation coefficient for the variable “group”, which is the ratio of the variance of the blocking variable “group” to the total variance.
When carbon monoxide is the response, intraclass corr. coeff. = 2301/(2301+1491) = 0.61
When hydrocarbon is the response, intraclass corr. coeff. = 8.414/(8.414+3.413) = 0.71
When carbon dioxide is the response, intraclass corr. coeff. = 0.32/(0.32+0.04) = 0.89
These results (Appendix A) show that the independence assumption is violated. To fix this, we need to use the “group” variable and model it as a random effect.
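The intraclass correlation arithmetic above can be reproduced with a short sketch. This is an illustration in Python rather than the SAS used for the analysis; the function name `intraclass_correlation` is hypothetical, and the variance components are simply the values quoted in the text.

```python
def intraclass_correlation(group_variance: float, residual_variance: float) -> float:
    """Share of the total variance attributable to the grouping (blocking) variable."""
    total = group_variance + residual_variance
    if total <= 0:
        raise ValueError("total variance must be positive")
    return group_variance / total


# (group variance, residual variance) pairs as quoted in the report text.
components = {
    "carbon monoxide": (2301.0, 1491.0),
    "hydrocarbon": (8.414, 3.413),
    "carbon dioxide": (0.32, 0.04),
}

icc = {name: intraclass_correlation(g, r) for name, (g, r) in components.items()}

for name, value in icc.items():
    print(f"{name}: ICC = {value:.2f}")
```

A coefficient near 0 would mean repetitions within a group behave like independent observations; values this far from 0 are what motivate the random effect for “group”.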
In the residual plot for hydrocarbon, the residuals showed heteroskedasticity. This means the constant variance assumption is violated. To fix this issue, the hydrocarbon response needs to be transformed. The histogram for hydrocarbon was also right-skewed, which further suggests the need for a transformation.
For carbon dioxide, the histogram showed the data to be heavy-tailed. Additionally, the QQ plot was not linear, indicating a departure from normality. This also suggests a transformation for the carbon dioxide model.
Model 2 (with transformation):
After transformation, we found that the histogram of the square root transformation of hydrocarbon follows a normal distribution. In the QQ plot, almost all the points fall on the 45-degree line. The constant variance assumption is satisfied. The same is true for the carbon monoxide model, which we transformed using the natural log function.
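The two transformations described above can be illustrated with a short sketch. This is Python rather than the SAS used for the analysis, and the helper names `sqrt_transform` and `log_transform` are hypothetical. The sample values are the three day 1 low-load readings from the data in Appendix D. The sketch assumes that a reading of zero, which has no logarithm, is treated as missing; the SAS LOG function likewise returns a missing value for a zero argument.

```python
import math
from typing import List, Optional


def sqrt_transform(x: Optional[float]) -> Optional[float]:
    """Square root of a reading; a missing reading stays missing."""
    return None if x is None else math.sqrt(x)


def log_transform(x: Optional[float]) -> Optional[float]:
    """Natural log of a reading; missing or non-positive readings become missing."""
    if x is None or x <= 0:
        return None
    return math.log(x)


# Day 1, low engine load, repetitions 1 to 3 (values taken from Appendix D).
hc: List[Optional[float]] = [2.12, 1.41, 0.88]  # hydrocarbon, ppm
co: List[Optional[float]] = [0.02, 0.0, 0.0]    # carbon monoxide, pct of volume

sqrt_hc = [sqrt_transform(v) for v in hc]
log_co = [log_transform(v) for v in co]

print("sqrt(hc):", sqrt_hc)
print("log(co): ", log_co)
```

Under this assumption, the two zero carbon monoxide readings drop out of the log-scale model in addition to the observations that were already missing.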
Interpretation: There is enough statistical evidence to draw conclusions from this data set. For carbon monoxide, condition was statistically significant (p<0.0001). Hill climb is significantly different from moderate, low, and high. Device and group are not significant predictors. For carbon dioxide, condition was statistically significant (p=0.0110). The emission for the low condition was significantly different from that for the high condition. For hydrocarbon, condition and device were significant (p<0.0001 and p=0.0054, respectively). The emission for the hill climb condition was significantly different from low, moderate, and high. The variable “group” was not significant in any of the three models. This is what we want because it means that the repetitions for each condition were not significantly different across days.
4.0 RECOMMENDATIONS
Question 1: From the model outputs (Appendix C), the device installation is significant for hydrocarbon emission (Table 5). The device significantly decreases hydrocarbon emission, by 0.3986 on average on the square-root scale on which the hydrocarbon model was fitted (Table 7). There is no conclusive evidence that installing the device reduces carbon monoxide emission or increases carbon dioxide emission.
Question 2: For hydrocarbon emission, the device performed better under the low, moderate, and high conditions. The emission estimates under these three conditions were significantly lower than the estimate for the hill climb condition (Table 6).
5.0 RESOURCES
Resources regarding SAS, the statistical software used for analysis, can be found at support.sas.com. These include tutorials, troubleshooting, and software purchasing information. Additionally, for more information on mixed models, see https://onlinecourses.science.psu.edu/stat502/node/162.
6.0 CONSIDERATIONS
Limitations: The emission results may not be applicable to all vehicle manufacturers and models. The device was tested only on a Bentley, and other car models may not produce the same results.
Concerns: There could be other covariates affecting the response. For example, the condition of the car may affect the response. Older car models may not have the same emissions as newer models. Additionally, the way in which the emissions were recorded or the different drivers enrolled in the study could introduce a source of error.
Technical comments: In the carbon dioxide model, the normality assumption is still not satisfied because the histogram takes on a bimodal shape.
7.0 ACKNOWLEDGEMENT OF WORK
In publications and presentations, please include an acknowledgment to the Statistical Consulting Center for statistical assistance. For example, we suggest wording such as “Statistical analysis of data was conducted by Greg Gibble, Chenyue (Sibyl) Lei, Minsuk Yeom, and Julie Brustle from YLBG Consulting at the Penn State Statistical Consulting Center.” Thank you to Dr. Brakes and StatCon Enterprize for the opportunity to work on this project. We have enjoyed working with your team and hope to work with you again in the future. Detailed information including the program used to run all analysis can be found in Appendix D.
Appendix A: Model Assumptions
Figure 1: Residual outputs with carbon monoxide as the response.
Figure 2: Residual outputs with the square root of hydrocarbon as the response.
Figure 3: Residual outputs with carbon dioxide as the response.
Figure 4: variance component for independence assumption in HC
Figure 5: variance component for independence assumption in CO
Figure 6: variance component for independence assumption in CO2
Appendix B: Box Plot Graphs
Figure 7: Boxplot of hydrocarbon emission. Each of the 4 conditions is shown, with (1) and without (0) the device.
Figure 8: Boxplot of carbon monoxide emission. Each of the 4 conditions is shown, with (1) and without (0) the device.
Figure 9: Boxplot of carbon dioxide emission. Each of the 4 conditions is shown, with (1) and without (0) the device.
Appendix C: Model Outputs
Table 1: Carbon Monoxide Emission ANOVA
Table 2: CO pairwise difference between means for condition
Table 3: Carbon Dioxide Emission ANOVA
Table 4: CO2 pairwise difference between means for condition
Table 5: Hydrocarbon Emission ANOVA
Table 6: HC pairwise difference between means for condition
Table 7: HC pairwise difference between means for device
Appendix D: Code (SAS)
Here is the SAS code used to do the analysis on your project for the device prototype VI. Along with this part, a SAS link will also be sent for ease of access to the analysis.

/* This is the SAS code used by YLBG Consulting to analyze the data from StatCon Enterprize on the device prototype VI.
In this code there are many different sections which provide different outputs that will be explained at each part.
Most Recent Edit: April 14th, 2016 */

/* This section uploads the data with an extra variable that we created ourselves, called group, which takes all the replications from one day and condition and combines them into one unit.
The values are read with list input, so missing measurements are coded as a period (.). */
data car;
  input day Device con $ rep hc co2 co group;
  datalines;
1 0 Mod 1 3.69 15.22 0.06 1
1 0 Mod 2 1.62 15.28 0.05 1
1 0 Mod 3 3.58 15.26 0.06 1
1 0 High 1 1.38 15.22 0.05 2
1 0 High 2 1.23 14.84 0.05 2
1 0 High 3 2.85 15.37 0.06 2
1 0 Low 1 2.12 14.68 0.02 3
1 0 Low 2 1.41 14.68 0 3
1 0 Low 3 0.88 14.75 0 3
1 0 Hill 1 8 14.74 0.03 4
1 0 Hill 2 9.67 14.76 0.06 4
1 0 Hill 3 12.69 14.74 0.05 4
2 0 Mod 1 5.23 14.92 0.05 5
2 0 Mod 2 6 14.95 0.06 5
2 0 Mod 3 3.77 14.94 0.04 5
2 0 High 1 8.15 14.95 0.07 6
2 0 High 2 15.31 14.98 0.07 6
2 0 High 3 5.31 15.18 0.05 6
2 0 Low 1 1.65 15.6 0.04 7
2 0 Low 2 4.31 15.39 0.03 7
2 0 Low 3 . . . 7
2 0 Hill 1 15.15 15.02 0.1 8
2 0 Hill 2 15.54 15.12 0.08 8
2 0 Hill 3 12.31 15.11 0.09 8
3 0 Mod 1 4.23 14.73 0.06 9
3 0 Mod 2 0.71 14.86 0.05 9
3 0 Mod 3 0.85 14.72 0.05 9
3 0 High 1 4.54 14.68 0.04 10
3 0 High 2 5.23 14.71 0.02 10
3 0 High 3 3.92 14.63 0.02 10
3 0 Low 1 0.15 15.96 0.06 11
3 0 Low 2 0.9 14.22 0.01 11
3 0 Low 3 4.5 14.42 0.02 11
3 0 Hill 1 15.57 14.46 0.06 12
3 0 Hill 2 13.31 14.51 0.08 12
3 0 Hill 3 14.69 14.6 0.09 12
4 0 Mod 1 2.15 14.25 0.05 13
4 0 Mod 2 4 13.8 0.05 13
4 0 Mod 3 5.15 13.62 0.06 13
4 0 High 1 4.9 13.61 0.04 14
4 0 High 2 6.55 13.53 0.03 14
4 0 High 3 . . . 14
4 0 Low 1 0 13.96 0.02 15
4 0 Low 2 3.67 13.89 0.04 15
4 0 Low 3 0.07 13.98 0.02 15
4 0 Hill 1 8.46 13.88 0.08 16
4 0 Hill 2 9.69 13.23 0.12 16
4 0 Hill 3 1.62 13.69 0.08 16
5 0 Mod 1 10 13.91 0.05 17
5 0 Mod 2 10.77 13.93 0.05 17
5 0 Mod 3 8.58 14.02 0.06 17
5 0 High 1 8.9 14.01 0.08 18
5 0 High 2 8.3 14.14 0.06 18
5 0 High 3 4.54 14.01 0.04 18
5 0 Low 1 0.31 14.36 0.03 19
5 0 Low 2 2.31 14.18 0.03 19
5 0 Low 3 1 14.33 0.04 19
5 0 Hill 1 12.69 14.33 0.11 20
5 0 Hill 2 16.23 14.42 0.12 20
5 0 Hill 3 14.46 14.54 0.11 20
6 0 Mod 1 7.54 14.9 0.05 21
6 0 Mod 2 6.69 14.93 0.05 21
6 0 Mod 3 3.92 14.95 0.04 21
6 0 High 1 6.92 15.14 0.04 22
6 0 High 2 6.54 15.11 0.03 22
6 0 High 3 4.69 15 0.03 22
6 0 Low 1 8.69 15.14 0.05 23
6 0 Low 2 2.85 15.71 0.05 23
6 0 Low 3 1.77 15.87 0.05 23
6 0 Hill 1 14.31 15.89 0.11 24
6 0 Hill 2 12.38 15.96 0.11 24
6 0 Hill 3 11.38 16 0.11 24
7 1 Mod 1 . . . 25
7 1 Mod 2 . . . 25
7 1 Mod 3 . . . 25
7 1 High 1 11.23 15.34 0.08 26
7 1 High 2 8.46 15.3 0.04 26
7 1 High 3 9.77 15.24 0.04 26
7 1 Low 1 6.77 15.49 0.06 27
7 1 Low 2 5.38 15.68 0.05 27
7 1 Low 3 4.85 15.7 0.05 27
7 1 Hill 1 17.85 15.6 0.09 28
7 1 Hill 2 15.62 15.61 0.09 28
7 1 Hill 3 16.77 15.6 0.1 28
8 1 Mod 1 9.69 14.9 0.08 29
8 1 Mod 2 5.38 14.83 0.05 29
8 1 Mod 3 5 14.85 0.07 29
8 1 High 1 2.15 15 0.05 30
8 1 High 2 1.15 15 0.04 30
8 1 High 3 1.15 14.99 0.04 30
8 1 Low 1 3.31 15.18 0.04 31
8 1 Low 2 2.77 15.28 0.05 31
8 1 Low 3 3 15.29 0.07 31
8 1 Hill 1 9.38 15.27 0.11 32
8 1 Hill 2 7.08 15.34 0.11 32
8 1 Hill 3 9.31 15.35 0.12 32
9 1 Mod 1 5 14.6 0.05 33
9 1 Mod 2 3.46 14.6 0.05 33
9 1 Mod 3 2.92 14.6 0.05 33
9 1 High 1 5.31 14.69 0.02 34
9 1 High 2 5.08 14.68 0.01 34
9 1 High 3 4.46 14.69 0.02 34
9 1 Low 1 2.08 14.77 0.02 35
9 1 Low 2 1.36 14.9 0.02 35
9 1 Low 3 0.08 15.04 0.02 35
9 1 Hill 1 7 15.08 0.11 36
9 1 Hill 2 4.08 15.05 0.09 36
9 1 Hill 3 2.69 15.06 0.08 36
10 1 Mod 1 2.77 15.6 0.07 37
10 1 Mod 2 0 15.55 0.02 37
10 1 Mod 3 6.15 15 0.06 37
10 1 High 1 0.92 14.94 0.02 38
10 1 High 2 0.38 14.98 0.02 38
10 1 High 3 0.62 14.99 0.03 38
10 1 Low 1 4 15.14 0.06 39
10 1 Low 2 3.15 15.31 0.03 39
10 1 Low 3 1.62 15.43 0.02 39
10 1 Hill 1 3.46 15.45 0.07 40
10 1 Hill 2 4.62 15.52 0.07 40
10 1 Hill 3 4.31 15.52 0.09 40
11 1 Mod 1 3.23 14.16 0.02 41
11 1 Mod 2 3.83 14.23 0.05 41
11 1 Mod 3 . . . 41
11 1 High 1 5.31 14.21 0.04 42
11 1 High 2 10.23 14.18 0.02 42
11 1 High 3 3 14.29 0.03 42
11 1 Low 1 0.46 14.54 0.07 43
11 1 Low 2 5 14.31 0.08 43
11 1 Low 3 2.15 14.43 0.04 43
11 1 Hill 1 3.54 14.47 0.07 44
11 1 Hill 2 3.08 14.57 0.1 44
11 1 Hill 3 2.92 14.58 0.09 44
12 1 Mod 1 3.15 14.18 0.03 45
12 1 Mod 2 1.38 14.18 0.03 45
12 1 Mod 3 0.46 14.21 0.03 45
12 1 High 1 2.31 14.16 0.03 46
12 1 High 2 1 14.11 0.03 46
12 1 High 3 1.46 14.2 0.03 46
12 1 Low 1 3.69 14.56 0.06 47
12 1 Low 2 3.08 14.73 0.07 47
12 1 Low 3 1.77 14.85 0.08 47
12 1 Hill 1 1.38 14.86 0.07 48
12 1 Hill 2 4.85 14.09 0.06 48
12 1 Hill 3 4.15 14.11 0.07 48
13 1 Mod 1 . . . 49
13 1 Mod 2 . . . 49
13 1 Mod 3 . . . 49
13 1 High 1 0.92 14.18 0.03 50
13 1 High 2 0.36 14.19 0.03 50
13 1 High 3 0.15 14.17 0.03 50
13 1 Low 1 5.92 14.72 0.06 51
13 1 Low 2 5.38 15.19 0.07 51
13 1 Low 3 5.71 15.4 0.09 51
13 1 Hill 1 3.92 15.48 0.08 52
13 1 Hill 2 6.15 15.56 0.11 52
13 1 Hill 3 4.54 15.62 0.11 52
14 1 Mod 1 0.08 14.11 0.04 53
14 1 Mod 2 1.77 13.95 0.04 53
14 1 Mod 3 1.46 14.11 0.04 53
14 1 High 1 2.62 13.48 0.03 54
14 1 High 2 1.69 13.55 0.02 54
14 1 High 3 3.92 13.53 0.03 54
14 1 Low 1 2.23 14.45 0.03 55
14 1 Low 2 3.46 14.69 0.06 55
14 1 Low 3 7 15.08 0.05 55
14 1 Hill 1 5.54 13.99 0.06 56
14 1 Hill 2 7.62 14.05 0.07 56
14 1 Hill 3 8.15 14.17 0.06 56
;
run;

/*
This section of code is used to transform some of the outputs of carbon monoxide and hydrocarbon.
This was done to make the residuals more normal than they previously were.
*/
data transform;
  set work.car;
  sqrtco = sqrt(co);
  logco = log(co);
  sqrthc = sqrt(hc);
  loghc = log(hc);
run;

/* The next few sections are testing and analyzing the data using ANOVA models and tables.
The first part of each section (proc mixed) is the ANOVA models and graphs themselves.
The second part of each section (proc plm) further tests the levels of significant variables to see how each different level changes the models.
Each section has a title corresponding to which variable was being tested. */
proc mixed data=transform method=type3 plots=residualpanel;
  class day device con rep;
  model logco = device con;
  random group;
  title 'Carbon Monoxide';
  store outFull;
run;

proc plm restore=outFull;
  lsmeans con / adjust=Tukey plot=meanplot cl lines;
  title 'Carbon Monoxide';
run;

proc mixed data=car method=type3 plots=residualpanel;
  class day device con rep;
  model co2 = device con;
  random group;
  title 'Carbon Dioxide';
  store outFull2;
run;

proc plm restore=outFull2;
  lsmeans con / adjust=Tukey plot=meanplot cl lines;
  title 'Carbon Dioxide';
run;

proc mixed data=transform method=type3 plots=residualpanel;
  class day device con rep;
  model sqrthc = device con;
  random group;
  title 'Hydrocarbon';
  store outFull3;
run;

proc plm restore=outFull3;
  lsmeans con device / adjust=Tukey plot=meanplot cl lines;
  title 'Hydrocarbon';
run;

/* For any further questions about the code or the results, please do not hesitate to ask.
An email that we can be reached at is [email protected]
Thank you again for choosing YLBG Consulting for your statistical analysis problems. We look forward to working with you again in the future.
Sincerely,
YLBG Consulting */