
Page 1

4/14/16

April 14, 2016

From: YLBG Consulting
To: Dr. Brakes - Human Resource Manager, StatCon Enterprize
Re: Follow-Up Analysis

PROJECT TITLE: Automobile Emission Analysis

1.0 - PROJECT DESCRIPTION

StatCon Enterprize aims to ensure vehicles are held to the highest safety standards. Safety standards are tested using trained drivers to run experiments, such as air bag testing and obstacle courses. StatCon Enterprize conducted a designed experiment to test the installation of a device into cars and its effect on hydrocarbon, carbon monoxide, and carbon dioxide emissions.

The experiment was conducted on 14 separate days between December and June. A Bentley was used for all experiments and was tested under 4 conditions: moderate engine load at 55 mph, high engine load at 35 mph, low engine load at 55 mph, and a hill climb at 145 mph.

YLBG Consulting is assisting StatCon Enterprize with the analysis stage of the study. The objective is to perform a thorough analysis of the data collected and supplied by StatCon Enterprize. The analysis is intended to help StatCon assess the efficiency of the device being studied.

1.1 - RESEARCH QUESTIONS

Question 1: After being installed, does the device reduce hydrocarbon and carbon monoxide emissions while increasing carbon dioxide emission?

Question 2: Does the device work better under one of the conditions?

1.2 - STATISTICAL QUESTIONS

Question 1: Is there enough experimental data to draw a statistical conclusion?

Question 2: Do the results show statistically significant evidence?

An Equal Opportunity University


1.3 - VARIABLES OF INTEREST

The responses are hydrocarbon (ppm), carbon monoxide (pct of volume), and carbon dioxide (pct of volume) emissions. The explanatory variable “condition” was modeled as a fixed effect and coded as a categorical variable representing conditions 1, 2, 3, and 4. “Mod” represents test-driving the car with a moderate engine load at 55 mph, “high” refers to a high engine load at 35 mph, “low” refers to a low engine load at 55 mph, and “hill” refers to a hill climb at 145 mph. The explanatory variable “device” is binary, taking the value 0 if the device has not been installed and 1 if it has been installed. To account for random effects and to address the possible violation of the independence assumption, a new variable “group” was created to identify the 3 repetitions recorded for each combination of condition and day.

2.0 - EXPLORATORY DATA ANALYSIS (EDA)

The data were first checked for missing values, and we found that 9 observations are missing, which is about 5% of the total observations. Because this share is small, it is reasonable to simply delete the missing values for our data analysis. Then we obtained boxplots (see figures in Appendix B) to check for outliers. After grouping the hydrocarbon, carbon monoxide, and carbon dioxide emissions by the variable “device”, we found no outliers for carbon dioxide. For hydrocarbon, we found 4 outliers with the device installed. For carbon monoxide, there are 10 outliers without the device.

3.0 - STATISTICAL ANALYSIS

Model 1:

First, we fitted a multilevel model to the data. The variables “Device” and “Condition” (coded as categorical) are fixed effects. The variable “group” is treated as a random effect, which gives the model a random intercept.

Model Assumptions:

1. Linearity Assumption: The linearity assumption cannot be checked for our model because all of its predictors are categorical or binary.


2. Normality Assumption:

We use the Q-Q plot and histogram to check the assumption of normality. In the Q-Q plot, points that fall along a straight line suggest that the data are normally distributed.

3. Constant Variance Assumption:

We use the residual vs. predicted plot to check the assumption of constant variance. If the errors have constant variance, the residuals are scattered randomly around zero. If the residuals increase or decrease with the fitted values in a pattern, the errors may not have constant variance.

4. Independence Assumption: The main concern is that there might be intra-class correlation in the design of the experiment because, for a given day and a given condition, three observations are recorded. Observations in the same group might therefore be correlated. To check the independence assumption, we calculate the intra-class correlation coefficient for the variable “Group”, which is the ratio of the variance of the blocking variable “group” to the total variance.

When carbon monoxide is the response, intra-class corr. coeff. = 2301/(2301+1491) = 0.61
When hydrocarbon is the response, intra-class corr. coeff. = 8.414/(8.414+3.413) = 0.71
When carbon dioxide is the response, intra-class corr. coeff. = 0.32/(0.32+0.04) = 0.89

These results (Appendix A) show that the independence assumption is violated. To address this, we need to use the “Group” variable and model it as a random effect.
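The intra-class correlations quoted for the independence check can be re-derived from the variance components as a quick arithmetic check. The sketch below is written in Python purely for illustration (the analysis itself was run in SAS); the function name `icc` and the dictionary layout are illustrative assumptions, and the variance components are the values quoted in the text.

```python
def icc(group_var, residual_var):
    """Intra-class correlation: share of total variance due to the grouping factor."""
    total = group_var + residual_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return group_var / total

# (group variance, residual variance) as quoted in the report text
components = {
    "carbon monoxide": (2301, 1491),
    "hydrocarbon": (8.414, 3.413),
    "carbon dioxide": (0.32, 0.04),
}

# Round to two decimals, the precision used in the report
icc_by_response = {name: round(icc(g, r), 2) for name, (g, r) in components.items()}

for name, value in icc_by_response.items():
    print(f"{name}: ICC = {value:.2f}")
```

A value close to 1 means most of the variation lies between groups rather than within them, which is why the three repetitions within a group cannot be treated as independent observations.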

In the residual plot for hydrocarbon, the residuals showed heteroskedasticity, which means the constant variance assumption is violated. The histogram for hydrocarbon was also right-skewed. Both findings suggest that the hydrocarbon response needs to be transformed.

For carbon dioxide, the histogram showed the data to be heavy-tailed. Additionally, the Q-Q plot was not linear, meaning the data are skewed. This also suggests a transformation for the carbon dioxide model.

Model 2 (with transformation):

After transformation, we found that the histogram of the square-root-transformed hydrocarbon is approximately normal. In the Q-Q plot, almost all the points fall on the 45-degree line, and the constant variance assumption is satisfied. The same is true for the carbon monoxide model, which we transformed using the natural log function.
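As a rough illustration of the two transformations, the sketch below (Python, for illustration only; the analysis itself was run in SAS) applies a square root to hydrocarbon readings and a natural log to carbon monoxide readings. The helper names are hypothetical. One point worth flagging: the natural log is undefined at zero and the data listing in Appendix D includes carbon monoxide readings of 0, so the sketch returns `None` for those values, mirroring the missing value that SAS generates for LOG(0).

```python
import math

def sqrt_transform(x):
    """Square-root transform; a missing value (None) stays missing."""
    if x is None:
        return None
    return math.sqrt(x)

def log_transform(x):
    """Natural-log transform; missing or non-positive input gives None."""
    if x is None or x <= 0:
        return None
    return math.log(x)

# A few (hc, co) pairs taken from the data listing in Appendix D,
# plus one fully missing observation
readings = [(3.69, 0.06), (1.41, 0.0), (8.0, 0.03), (None, None)]

transformed = [(sqrt_transform(hc), log_transform(co)) for hc, co in readings]

for (hc, co), (sqrt_hc, log_co) in zip(readings, transformed):
    print(f"hc={hc} co={co} -> sqrt(hc)={sqrt_hc} log(co)={log_co}")
```

Observations whose transformed response is missing drop out of the fitted model, so zero readings effectively reduce the sample size for a log-transformed response.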


Interpretation:

There is enough statistical evidence to draw conclusions from this data set.

For carbon monoxide, condition was statistically significant (p<0.0001). Hill climb is significantly different from moderate, low, and high. Device and group are not significant predictors.

For carbon dioxide, condition was statistically significant (p=0.0110). The emission for the low condition was significantly different from the high condition.

For hydrocarbon, condition and device were significant (p<0.0001 and p=0.0054, respectively). The emission for the hill climb condition was significantly different from low, moderate, and high.

The variable group was not significant in any of the three models. This is the desired outcome because it means that the repetitions for each condition were not significantly different across days.

4.0 - RECOMMENDATIONS

Question 1: From the model outputs (Appendix C), the device installation is significant for hydrocarbon emission (Table 5). The device significantly decreases hydrocarbon emission, by an estimated 0.3986 on average (Table 7). Because the hydrocarbon model is fitted to the square root of hydrocarbon (Appendix D), this estimate is on the square-root scale rather than in ppm. There is no conclusive evidence that installing the device reduces carbon monoxide emission or increases carbon dioxide emission.

Question 2: For hydrocarbon emission, the device performed better under the low, moderate, and high conditions. The estimated emissions under these three conditions differed significantly from the estimate for the hill climb condition (Table 6).

5.0 - RESOURCES

Resources regarding SAS, the statistical software used for the analysis, can be found at support.sas.com. These include tutorials, troubleshooting, and software purchasing information. For more information on mixed models, see https://onlinecourses.science.psu.edu/stat502/node/162.
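A note on reading the device estimate in Question 1: the hydrocarbon model in Appendix D is fitted to the square root of hydrocarbon (`model sqrthc = device con`), so, assuming Table 7 reports the estimate on that scale, the corresponding reduction in ppm depends on the baseline emission level. The sketch below (Python, for illustration only) converts a square-root-scale drop into ppm. The function name and the baseline values are hypothetical round numbers chosen for illustration, not estimates from the report; only 0.3986 comes from the text.

```python
def ppm_reduction(baseline_ppm, sqrt_scale_drop):
    """Reduction in ppm implied by a drop on the square-root scale."""
    if baseline_ppm < 0:
        raise ValueError("baseline must be non-negative")
    root = baseline_ppm ** 0.5
    # The square-root-scale value cannot fall below zero
    new_root = max(root - sqrt_scale_drop, 0.0)
    return baseline_ppm - new_root ** 2

DROP = 0.3986  # device estimate quoted in the text (Table 7)

for baseline in (1.0, 4.0, 9.0):  # hypothetical baselines in ppm
    reduction = ppm_reduction(baseline, DROP)
    print(f"baseline {baseline:.0f} ppm -> reduction of about {reduction:.2f} ppm")
```

The implied ppm reduction grows with the baseline, so a single ppm figure cannot summarize the device effect without stating the emission level it refers to.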

6.0 - CONSIDERATIONS

Limitations: The emission results may not be applicable to all vehicle manufacturers and models. The only car the device was tested on was a Bentley, and other car models may not produce the same results.

Concerns: There could be other covariates affecting the response. For example, the condition of the car may affect the response. Older car models may not have the same emissions as newer models. Additionally, the way in which the emissions were recorded or the different drivers enrolled in the study could be a source of error.

Technical comments: In the carbon dioxide model, the normality assumption is still not satisfied because the histogram takes on a bimodal shape.

7.0 - ACKNOWLEDGEMENT OF WORK

In publications and presentations, please include an acknowledgment to the Statistical Consulting Center for statistical assistance. For example, we suggest wording such as “Statistical analysis of data was conducted by Greg Gibble, Chenyue (Sibyl) Lei, Minsuk Yeom, and Julie Brustle from YLBG Consulting at the Penn State Statistical Consulting Center.”

Thank you to Dr. Brakes and StatCon Enterprize for the opportunity to work on this project. We have enjoyed working with your team and hope to work with you again in the future. Detailed information, including the program used to run all analyses, can be found in Appendix D.

Appendix A: Model Assumptions


Figure 1: Residual outputs with carbon monoxide as the response.


Figure 2: Residual outputs with the square root of hydrocarbon as the response.


Figure 3: Residual outputs with carbon dioxide as the response.


Figure 4: variance component for independence assumption in HC


Figure 5: variance component for independence assumption in CO


Figure 6: variance component for independence assumption in CO2

Appendix B: Box Plot Graphs


Figure 7: Boxplot of hydrocarbon emission. Each of the 4 conditions is shown, with (1) and without (0) the device.

Figure 8: Boxplot of carbon monoxide emission. Each of the 4 conditions is shown, with (1) and without (0) the device.


Figure 9: Boxplot of carbon dioxide emission. Each of the 4 conditions is shown, with (1) and without (0) the device.

Appendix C: Model Outputs

Table 1: Carbon Monoxide Emission ANOVA


Table 2: CO pairwise difference between means for condition

Table 3: Carbon Dioxide Emission ANOVA


Table 4: CO2 pairwise difference between means for condition

Table 5: Hydrocarbon Emission ANOVA


Table 6: HC pairwise difference between means for condition

Table 7: HC pairwise difference between means for device

Appendix D: Code (SAS)

Here is the SAS code used to do the analysis on your project for the device prototype VI. Along with this part, a SAS link will also be sent for ease of access to the analysis.

/*
This is the SAS code used by YLBG Consulting to analyze the data from
StatCon Enterprize on the device prototype VI. The code has several
sections that produce different outputs; each is explained where it appears.
Most Recent Edit: April 14th, 2016
*/

/*
This section reads in the data, together with an extra variable that we
created ourselves, called group, which combines all the replications for
one condition on one day into a single unit.
The values are read with list input (blank-separated); a period (.)
marks a missing measurement.
*/
data car;
    input day Device con $ rep hc co2 co group;
    datalines;
1 0 Mod 1 3.69 15.22 0.06 1
1 0 Mod 2 1.62 15.28 0.05 1
1 0 Mod 3 3.58 15.26 0.06 1
1 0 High 1 1.38 15.22 0.05 2
1 0 High 2 1.23 14.84 0.05 2
1 0 High 3 2.85 15.37 0.06 2
1 0 Low 1 2.12 14.68 0.02 3
1 0 Low 2 1.41 14.68 0 3
1 0 Low 3 0.88 14.75 0 3
1 0 Hill 1 8 14.74 0.03 4
1 0 Hill 2 9.67 14.76 0.06 4
1 0 Hill 3 12.69 14.74 0.05 4
2 0 Mod 1 5.23 14.92 0.05 5
2 0 Mod 2 6 14.95 0.06 5
2 0 Mod 3 3.77 14.94 0.04 5
2 0 High 1 8.15 14.95 0.07 6
2 0 High 2 15.31 14.98 0.07 6
2 0 High 3 5.31 15.18 0.05 6
2 0 Low 1 1.65 15.6 0.04 7
2 0 Low 2 4.31 15.39 0.03 7
2 0 Low 3 . . . 7
2 0 Hill 1 15.15 15.02 0.1 8
2 0 Hill 2 15.54 15.12 0.08 8
2 0 Hill 3 12.31 15.11 0.09 8
3 0 Mod 1 4.23 14.73 0.06 9
3 0 Mod 2 0.71 14.86 0.05 9
3 0 Mod 3 0.85 14.72 0.05 9
3 0 High 1 4.54 14.68 0.04 10
3 0 High 2 5.23 14.71 0.02 10
3 0 High 3 3.92 14.63 0.02 10
3 0 Low 1 0.15 15.96 0.06 11
3 0 Low 2 0.9 14.22 0.01 11
3 0 Low 3 4.5 14.42 0.02 11
3 0 Hill 1 15.57 14.46 0.06 12
3 0 Hill 2 13.31 14.51 0.08 12
3 0 Hill 3 14.69 14.6 0.09 12
4 0 Mod 1 2.15 14.25 0.05 13
4 0 Mod 2 4 13.8 0.05 13
4 0 Mod 3 5.15 13.62 0.06 13
4 0 High 1 4.9 13.61 0.04 14
4 0 High 2 6.55 13.53 0.03 14
4 0 High 3 . . . 14
4 0 Low 1 0 13.96 0.02 15
4 0 Low 2 3.67 13.89 0.04 15
4 0 Low 3 0.07 13.98 0.02 15
4 0 Hill 1 8.46 13.88 0.08 16
4 0 Hill 2 9.69 13.23 0.12 16
4 0 Hill 3 1.62 13.69 0.08 16
5 0 Mod 1 10 13.91 0.05 17
5 0 Mod 2 10.77 13.93 0.05 17
5 0 Mod 3 8.58 14.02 0.06 17
5 0 High 1 8.9 14.01 0.08 18
5 0 High 2 8.3 14.14 0.06 18
5 0 High 3 4.54 14.01 0.04 18
5 0 Low 1 0.31 14.36 0.03 19
5 0 Low 2 2.31 14.18 0.03 19
5 0 Low 3 1 14.33 0.04 19
5 0 Hill 1 12.69 14.33 0.11 20
5 0 Hill 2 16.23 14.42 0.12 20
5 0 Hill 3 14.46 14.54 0.11 20
6 0 Mod 1 7.54 14.9 0.05 21
6 0 Mod 2 6.69 14.93 0.05 21
6 0 Mod 3 3.92 14.95 0.04 21
6 0 High 1 6.92 15.14 0.04 22
6 0 High 2 6.54 15.11 0.03 22
6 0 High 3 4.69 15 0.03 22
6 0 Low 1 8.69 15.14 0.05 23
6 0 Low 2 2.85 15.71 0.05 23
6 0 Low 3 1.77 15.87 0.05 23
6 0 Hill 1 14.31 15.89 0.11 24
6 0 Hill 2 12.38 15.96 0.11 24
6 0 Hill 3 11.38 16 0.11 24
7 1 Mod 1 . . . 25
7 1 Mod 2 . . . 25
7 1 Mod 3 . . . 25
7 1 High 1 11.23 15.34 0.08 26
7 1 High 2 8.46 15.3 0.04 26
7 1 High 3 9.77 15.24 0.04 26
7 1 Low 1 6.77 15.49 0.06 27
7 1 Low 2 5.38 15.68 0.05 27
7 1 Low 3 4.85 15.7 0.05 27
7 1 Hill 1 17.85 15.6 0.09 28
7 1 Hill 2 15.62 15.61 0.09 28
7 1 Hill 3 16.77 15.6 0.1 28
8 1 Mod 1 9.69 14.9 0.08 29
8 1 Mod 2 5.38 14.83 0.05 29
8 1 Mod 3 5 14.85 0.07 29
8 1 High 1 2.15 15 0.05 30
8 1 High 2 1.15 15 0.04 30
8 1 High 3 1.15 14.99 0.04 30
8 1 Low 1 3.31 15.18 0.04 31
8 1 Low 2 2.77 15.28 0.05 31
8 1 Low 3 3 15.29 0.07 31
8 1 Hill 1 9.38 15.27 0.11 32
8 1 Hill 2 7.08 15.34 0.11 32
8 1 Hill 3 9.31 15.35 0.12 32
9 1 Mod 1 5 14.6 0.05 33
9 1 Mod 2 3.46 14.6 0.05 33
9 1 Mod 3 2.92 14.6 0.05 33
9 1 High 1 5.31 14.69 0.02 34
9 1 High 2 5.08 14.68 0.01 34
9 1 High 3 4.46 14.69 0.02 34
9 1 Low 1 2.08 14.77 0.02 35
9 1 Low 2 1.36 14.9 0.02 35
9 1 Low 3 0.08 15.04 0.02 35
9 1 Hill 1 7 15.08 0.11 36
9 1 Hill 2 4.08 15.05 0.09 36
9 1 Hill 3 2.69 15.06 0.08 36
10 1 Mod 1 2.77 15.6 0.07 37
10 1 Mod 2 0 15.55 0.02 37
10 1 Mod 3 6.15 15 0.06 37
10 1 High 1 0.92 14.94 0.02 38
10 1 High 2 0.38 14.98 0.02 38
10 1 High 3 0.62 14.99 0.03 38
10 1 Low 1 4 15.14 0.06 39
10 1 Low 2 3.15 15.31 0.03 39
10 1 Low 3 1.62 15.43 0.02 39
10 1 Hill 1 3.46 15.45 0.07 40
10 1 Hill 2 4.62 15.52 0.07 40
10 1 Hill 3 4.31 15.52 0.09 40
11 1 Mod 1 3.23 14.16 0.02 41
11 1 Mod 2 3.83 14.23 0.05 41
11 1 Mod 3 . . . 41
11 1 High 1 5.31 14.21 0.04 42
11 1 High 2 10.23 14.18 0.02 42
11 1 High 3 3 14.29 0.03 42
11 1 Low 1 0.46 14.54 0.07 43
11 1 Low 2 5 14.31 0.08 43
11 1 Low 3 2.15 14.43 0.04 43
11 1 Hill 1 3.54 14.47 0.07 44
11 1 Hill 2 3.08 14.57 0.1 44
11 1 Hill 3 2.92 14.58 0.09 44
12 1 Mod 1 3.15 14.18 0.03 45
12 1 Mod 2 1.38 14.18 0.03 45
12 1 Mod 3 0.46 14.21 0.03 45
12 1 High 1 2.31 14.16 0.03 46
12 1 High 2 1 14.11 0.03 46
12 1 High 3 1.46 14.2 0.03 46
12 1 Low 1 3.69 14.56 0.06 47
12 1 Low 2 3.08 14.73 0.07 47
12 1 Low 3 1.77 14.85 0.08 47
12 1 Hill 1 1.38 14.86 0.07 48
12 1 Hill 2 4.85 14.09 0.06 48
12 1 Hill 3 4.15 14.11 0.07 48
13 1 Mod 1 . . . 49
13 1 Mod 2 . . . 49
13 1 Mod 3 . . . 49
13 1 High 1 0.92 14.18 0.03 50
13 1 High 2 0.36 14.19 0.03 50
13 1 High 3 0.15 14.17 0.03 50
13 1 Low 1 5.92 14.72 0.06 51
13 1 Low 2 5.38 15.19 0.07 51
13 1 Low 3 5.71 15.4 0.09 51
13 1 Hill 1 3.92 15.48 0.08 52
13 1 Hill 2 6.15 15.56 0.11 52
13 1 Hill 3 4.54 15.62 0.11 52
14 1 Mod 1 0.08 14.11 0.04 53
14 1 Mod 2 1.77 13.95 0.04 53
14 1 Mod 3 1.46 14.11 0.04 53
14 1 High 1 2.62 13.48 0.03 54
14 1 High 2 1.69 13.55 0.02 54
14 1 High 3 3.92 13.53 0.03 54
14 1 Low 1 2.23 14.45 0.03 55
14 1 Low 2 3.46 14.69 0.06 55
14 1 Low 3 7 15.08 0.05 55
14 1 Hill 1 5.54 13.99 0.06 56
14 1 Hill 2 7.62 14.05 0.07 56
14 1 Hill 3 8.15 14.17 0.06 56
;
run;

/*


This section of code transforms the carbon monoxide and hydrocarbon
responses. This was done to make the residuals closer to normal than
they previously were. LOG is undefined at zero, so readings of 0 are
left as missing on the log scale.
*/
data transform;
    set work.car;
    sqrtco = sqrt(co);
    if co > 0 then logco = log(co);
    sqrthc = sqrt(hc);
    if hc > 0 then loghc = log(hc);
run;

/*
The next few sections test and analyze the data using ANOVA models and
tables. The first part of each section (proc mixed) produces the ANOVA
models and graphs themselves. The second part of each section (proc plm)
further tests the levels of the significant variables to see how each
level changes the model. Each section has a title corresponding to the
response being tested.
Note: group is not listed in the CLASS statement, so SAS treats it as a
numeric variable in the RANDOM statement.
*/
proc mixed data=transform method=type3 plots=residualpanel;
    class day device con rep;
    model logco = device con;
    random group;
    title 'Carbon Monoxide';
    store outFull;
run;

proc plm restore=outFull;
    lsmeans con / adjust=Tukey plot=meanplot cl lines;
    title 'Carbon Monoxide';
run;

proc mixed data=car method=type3 plots=residualpanel;
    class day device con rep;
    model co2 = device con;
    random group;
    title 'Carbon Dioxide';
    store outFull2;
run;

proc plm restore=outFull2;
    lsmeans con / adjust=Tukey plot=meanplot cl lines;
    title 'Carbon Dioxide';
run;

proc mixed data=transform method=type3 plots=residualpanel;
    class day device con rep;
    model sqrthc = device con;
    random group;
    title 'Hydrocarbon';
    store outFull3;
run;

proc plm restore=outFull3;
    lsmeans con device / adjust=Tukey plot=meanplot cl lines;
    title 'Hydrocarbon';
run;

/*
For any further questions about the code or the results, please do not
hesitate to ask. An email that we can be reached at is [email protected]
Thank you again for choosing YLBG Consulting for your statistical
analysis problems. We look forward to working with you again in the future.
Sincerely,
YLBG Consulting
*/
