
Satellite Applications Motivated by the Development of a Silver-Zinc Battery

Battery Performance Analysis Using Linear Regression

By: Leslie Gillespie-Marthaler

EMSE 271 December 18, 2009

SAMPLE REPORT


Introduction: Satellite manufacturers recently proposed replacing existing battery technology with silver-zinc technology. Because satellite applications require reliable, long-lasting batteries, the manufacturing association requested the following analysis:

1. Develop a linear regression model for the battery performance data, using the Log of (Cycles to Failure) as the dependent variable; the model should be based on the best available predictors to characterize the behavior of the battery throughout its lifecycle;

2. Perform diagnostic analysis of the fitted model; and

3. Forecast the Cycles to Failure with a 95% confidence interval, using the model with the following values of the independent variables: X1 = 1.5, X2 = 4.5, X3 = 50, X4 = 25, X5 = 2.

The table below shows the original battery performance data supplied by the manufacturing association.

The dependent variable is:
- Cycles to Failure (Y)
- The Log of (Cycles to Failure), represented as Log(Y)

The independent variables are:
- Charge Rate (X1)
- Discharge Rate (X2)
- Depth of Discharge (X3)
- Temperature (X4)
- End of Charge (X5)

Table 1: Original Performance Data

Columns: Y = Cycles to Failure; Log(Y) = Log Cycles to Failure; X1 = Charge Rate (Amps); X2 = Discharge Rate (Amps); X3 = Depth of Discharge (% of rated ampere-hours); X4 = Temperature (Celsius); X5 = End of Charge (Volts)

Data      Y   Log(Y)      X1      X2        X3       X4      X5
   1    101    2.004   0.375   3.130    60.000   40.000   2.000
   2    141    2.149   1.000   3.130    76.800   30.000   1.990
   3     96    1.982   1.000   3.130    60.000   20.000   2.000
   4    125    2.097   1.000   3.130    60.000   20.000   1.980
   5     43    1.633   1.625   3.130    43.200   10.000   2.010
   6     16    1.204   1.625   3.130    60.000   20.000   2.000
   7    188    2.274   1.625   3.130    60.000   20.000   2.020
   8     10    1.000   0.375   5.000    76.800   10.000   2.010
   9      3    0.477   1.000   5.000    43.200   10.000   1.990
  10    386    2.587   1.000   5.000    43.200   30.000   2.010
  11     45    1.653   1.000   5.000   100.000   20.000   2.000
  12      2    0.301   1.625   5.000    76.800   10.000   1.990
  13     76    1.881   0.375   1.250    76.800   10.000   2.010
  14     78    1.892   1.000   1.250    43.200   10.000   1.990
  15    160    2.204   1.000   1.250    76.800   30.000   2.000
  16      3    0.477   1.000   1.250    60.000    0.000   2.000
  17    216    2.334   1.625   1.250    43.200   30.000   1.990
  18     73    1.863   1.625   1.250    60.000   20.000   2.000
  19    314    2.497   0.375   3.130    76.800   30.000   1.990
  20    170    2.230   0.375   3.130    60.000   20.000   2.000


When initially analyzing the performance data, the following observations were made concerning the Dependent Variable (Y) and its relationship with the Independent Variables (X1-5):

- There is large variability in the original cycles to failure (Y) data. The histogram of Y is right-skewed: most observations fall at low values, with a long tail toward high values (the mean of 112.3 is well above the median of 87). This could be problematic for the regression analysis.

- The standard deviation (104.7) is also very large, nearly equal to the mean (112.3).

These observations are displayed in the histogram and probability plot generated by Minitab below:

Figure 1: Histogram of Cycles to Failure (Y), with fitted normal curve (Mean 112.3, StDev 104.7, N 20)

Figure 2: Normal Probability Plot of Cycles to Failure (Y), with 95% CI (Mean 112.3, StDev 104.7, N 20, AD 0.668, P-Value 0.069)
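The summary statistics shown in Figures 1 and 2 can be rechecked directly from Table 1. The short Python sketch below is not part of the original Minitab analysis; it uses only the standard library, `cycles` is simply the Y column of Table 1, and the moment-based skewness estimator is our own choice of measure.

```python
import statistics

# Y column (Cycles to Failure) from Table 1, observations 1-20
cycles = [101, 141, 96, 125, 43, 16, 188, 10, 3, 386,
          45, 2, 76, 78, 160, 3, 216, 73, 314, 170]

def moment_skewness(xs):
    """Fisher-Pearson moment coefficient of skewness (g1); > 0 means a long right tail."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

mean_y = statistics.mean(cycles)      # compare with Minitab "Mean 112.3"
sd_y = statistics.stdev(cycles)       # sample standard deviation, compare with "StDev 104.7"
median_y = statistics.median(cycles)  # average of the 10th and 11th ordered values
skew_y = moment_skewness(cycles)

print(f"mean={mean_y:.1f} sd={sd_y:.1f} median={median_y} skew={skew_y:.2f}")
```

A mean well above the median, together with a positive skewness coefficient, is the numerical signature of the long right tail visible in Figure 1.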


We would prefer a distribution closer to normal for the dependent variable. Taking the log of Y compresses the long right tail of the original data, so we compared the original dependent variable (Y) with Log(Y). The following observations were made when analyzing Log(Y):

- The standard deviation of Log(Y) (0.6875) is much smaller than that of Y (104.7), although the two are measured on different scales. However, the Anderson-Darling P-value has decreased from 0.069 to 0.007.

- In general, we would prefer a larger P-value, since a small P-value is evidence against normality of the distribution.

- At this point, the normality test does not show that Log(Y) is closer to normal than Y.
- For the purposes of this project (and to meet the client's request), we will choose Log(Cycles to Failure) as the dependent variable for the regression model. Choosing Log(Y) allows for clear interpretation: a constant change in Log(Y) corresponds to a constant percentage change in Y.

These observations are displayed in the histogram and probability plot generated by Minitab below:

Figure 3: Histogram of Log Cycles to Failure (Log(Y)), with fitted normal curve (Mean 1.737, StDev 0.6875, N 20)


Figure 4: Normal Probability Plot of Log Cycles to Failure (Log(Y)), with 95% CI (Mean 1.737, StDev 0.6875, N 20, AD 1.046, P-Value 0.007)
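The log transform and its percentage interpretation can be sketched in a few lines of Python. The sketch assumes base-10 logs, which matches the Log(Y) column of Table 1 (for example, log10(101) = 2.004); `pct_change_per_unit` is a hypothetical helper name, and the coefficient 0.0497 is the temperature coefficient from the initial regression equation reported later in this report, used here only for illustration.

```python
import math
import statistics

# Y column (Cycles to Failure) from Table 1
cycles = [101, 141, 96, 125, 43, 16, 188, 10, 3, 386,
          45, 2, 76, 78, 160, 3, 216, 73, 314, 170]
log_cycles = [math.log10(y) for y in cycles]  # base-10 log, as in Table 1's Log(Y)

def pct_change_per_unit(b):
    """Percent change in Y per one-unit increase in a predictor whose coefficient
    in a log10(Y) model is b: the model multiplies Y by 10**b."""
    return (10 ** b - 1) * 100

temp_effect = pct_change_per_unit(0.0497)  # temperature coefficient, initial model

print(f"mean={statistics.mean(log_cycles):.3f} sd={statistics.stdev(log_cycles):.4f}")
print(f"+1 degree C -> {temp_effect:.1f}% more cycles to failure")
```

In a log10 model, a coefficient b multiplies Y by 10^b per unit increase in the predictor, so the temperature coefficient corresponds to roughly 12% more cycles per additional degree Celsius, holding the other variables fixed.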

Correlation Analysis: In order to determine the best predictors for the regression model, we completed a correlation analysis of the dependent variable Log(Y) and the independent variables (X1-5). The figure below displays the correlation strengths between the dependent and independent variables.

Figure 5: Correlation between Log(Y) and X1-5

Correlation matrix (lower triangle). Log(Y) = Log Cycles to Failure, X1 = Charge Rate, X2 = Discharge Rate, X3 = Depth of Discharge, X4 = Temperature, X5 = End of Charge.

              Log(Y)        X1         X2         X3         X4        X5
Log(Y)    1
X1       -0.175377126   1
X2       -0.291453599  -0.08686    1
X3       -0.068901748  -0.31402    0.191942   1
X4        0.718930287  -0.13537   -0.00283    0.066934   1
X5        0.101140168   0.007163   0.064439   0.019973  -0.11434   1

The threshold chosen to indicate significant correlation is an absolute correlation of 0.19. Against Log(Y), X2 (-0.291) and X4 (0.719) exceed this threshold; X3 (-0.069) does not, although its correlation with X2 (0.192) does. Based on these findings, we keep the following independent variables as the best predictors for the regression model: (X2) Discharge Rate, (X3) Depth of Discharge, and (X4) Temperature.

Initial Regression Analysis: Based on this decision, we fit a regression of Log(Y) on X2, X3, and X4. The results of the initial regression analysis are displayed below.


Figure 6: Initial Regression Analysis for Log(Y) and X2, X3, X4

Regression Statistics
Multiple R            0.778
R Square              0.605
Adjusted R Square     0.530
Standard Error        0.471
Observations          20

ANOVA
              df      SS      MS       F   Significance F
Regression     3   5.429   1.810   8.154   0.002
Residual      16   3.551   0.222
Total         19   8.980

                          Coefficients  Standard Error  t Stat  P-value  Lower 95%  Upper 95%  VIF (from Minitab)
Intercept                        1.352           0.510   2.651    0.017      0.271      2.434
(X2) Discharge rate             -0.134           0.077  -1.730    0.103     -0.298      0.030  1.039
(X3) Depth of discharge         -0.003           0.007  -0.399    0.695     -0.018      0.012  1.043
(X4) Temperature                 0.050           0.011   4.584    0.000      0.027      0.073  1.005

Notes: the F-value is moderately high; the P-value (Significance F) is moderately low.

The P-value for depth of discharge (0.695) is high, which indicates that we may want to discard this variable.

The P-value for discharge rate (0.103) is also high, which indicates that we may want to discard this variable as well.

The regression equation is:

Log Cycles to Failure = 1.35 - 0.134 Discharge Rate - 0.00285 Depth of Discharge + 0.0497 Temperature
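For readers without Minitab or Excel, the initial fit can be approximated with a small pure-Python least-squares sketch. It solves the normal equations (X^T X)b = X^T y by Gauss-Jordan elimination; the helper names `solve` and `ols` are hypothetical, and Log(Y) is recomputed as log10 of Table 1's Y values, so results may differ from Figure 6 in the last reported digit.

```python
import math

cycles = [101, 141, 96, 125, 43, 16, 188, 10, 3, 386,
          45, 2, 76, 78, 160, 3, 216, 73, 314, 170]
x2 = [3.13] * 7 + [5.0] * 5 + [1.25] * 6 + [3.13] * 2           # Discharge Rate
x3 = [60, 76.8, 60, 60, 43.2, 60, 60, 76.8, 43.2, 43.2,
      100, 76.8, 76.8, 43.2, 76.8, 60, 43.2, 60, 76.8, 60]      # Depth of Discharge
x4 = [40, 30, 20, 20, 10, 20, 20, 10, 10, 30,
      20, 10, 10, 10, 30, 0, 30, 20, 30, 20]                    # Temperature
y = [math.log10(c) for c in cycles]                              # Log(Y)

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(columns, y):
    """Least-squares fit with an intercept via the normal equations; returns (b, R^2)."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    return beta, 1 - sse / sst

beta, r2 = ols([x2, x3, x4], y)
print("b =", [round(b, 5) for b in beta], "R^2 =", round(r2, 3))
```

The normal-equations approach is adequate for a problem this small; for larger or poorly conditioned designs, a QR-based solver is the safer choice.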


The initial regression results above lead to the following observations:

- The variance inflation factors (VIF) obtained from Minitab for the independent variables are all close to 1 (1.005 to 1.043), so there is little to no collinearity among the independent variables, and the coefficient estimates are considered stable.

- R-Squared = 60.5%, which is moderately high. Ultimately, we would like a higher R-Squared value, indicating better "goodness of fit" for the model.

- The Durbin-Watson statistic = 2.02425, indicating little to no autocorrelation among observations.

- The F-statistic (8.154) is moderately high, but not very high. Ultimately, we would prefer a higher F-value.

- The overall P-value (Significance F = 0.002) is low, but not extremely low. Ultimately, we would prefer a P-value closer to zero.

- Among the individual P-values for the independent variables, X2 (0.103) and X3 (0.695) are high. In particular, the P-value for X3 is very high, which indicates that we may want to consider discarding X3 from the model.

- The residual analysis appears to support the assumption of normality for the residuals.
- The normal probability plot of the residuals shows some deviation from normality. However, these deviations do not invalidate the assumption of normality for the residuals.
- The P-value for the residuals probability plot is high (0.243), so the normality test does not reject normality of the residuals.
- There is one influential observation (outlier) identified in the probability plot of the residuals. This observation may require review or possible removal.
- There is no apparent heteroscedasticity in the plot of residuals versus fitted values for Log(Y), so there is evidence to support constant variance of the residuals.

The following figures support the observations listed above:

Figure 7: Residual Plots for Log Cycles to Failure (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order)


Figure 8: Probability Plot of Residuals (RESI5), Normal with 95% CI (Mean -4.996E-16, effectively zero; StDev 0.471; N 20; AD 0.654; P-Value 0.243)

Figure 9: Plot of Residuals versus Fitted Values for Log(Y) (Residuals Versus Fitted Values: RESI5 versus FITS5)
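The Durbin-Watson statistic quoted in the observations above follows the standard definition sketched below in Python. The residuals in the usage lines are toy values chosen to show two extremes, not residuals from this report; applying the function to the model's residuals in observation order should reproduce the kind of statistic Minitab reports.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences of the
    residuals divided by their sum of squares. Values near 2 suggest little
    first-order autocorrelation; values toward 0 or 4 suggest positive or
    negative autocorrelation, respectively."""
    num = sum((e1 - e0) ** 2 for e0, e1 in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den

print(durbin_watson([0.5, -0.5, 0.5, -0.5]))  # alternating signs -> 3.0
print(durbin_watson([0.3, 0.3, 0.3, 0.3]))    # identical residuals -> 0.0
```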


Diagnostic Analysis: Analysis of the initial regression model indicates that the model described by the following regression equation is reasonable:

Log Cycles to Failure = 1.35 - 0.134 Discharge Rate - 0.00285 Depth of Discharge + 0.0497 Temperature

The analysis of the residuals versus fitted values indicates that the majority of the values fall within expected thresholds. Only observation 1 looks somewhat suspicious, but this is not enough to warrant invalidation of the model.

To determine where we may want to focus in order to improve the existing model, we plot the dependent variable against each of the independent variables. The graph below suggests an interaction between temperature and discharge rate, which suggests that an additional independent variable may be needed to better express the relationship between the dependent and independent variables.

Figure 10: Plot of Log(Y) Versus Independent Variables

Based on the interaction effect between Temperature and Discharge Rate, we decided to add another independent variable (Xnew): Temp*Discharge Rate.

The following table displays the new independent variable along with the remaining independent variables in the regression model. We refer to the resulting model as the adjusted model.


Table 2: Adjusted Model Variables

Columns: Log(Y) = Log Cycles to Failure; X2 = Discharge Rate (Amps); X3 = Depth of Discharge (% of rated ampere-hours); X4 = Temperature (Celsius); X(new) = new independent variable Temp*Discharge Rate

Log(Y)      X2        X3       X4    X(new)
 2.004   3.130    60.000   40.000   125.200
 2.149   3.130    76.800   30.000    93.900
 1.982   3.130    60.000   20.000    62.600
 2.097   3.130    60.000   20.000    62.600
 1.633   3.130    43.200   10.000    31.300
 1.204   3.130    60.000   20.000    62.600
 2.274   3.130    60.000   20.000    62.600
 1.000   5.000    76.800   10.000    50.000
 0.477   5.000    43.200   10.000    50.000
 2.587   5.000    43.200   30.000   150.000
 1.653   5.000   100.000   20.000   100.000
 0.301   5.000    76.800   10.000    50.000
 1.881   1.250    76.800   10.000    12.500
 1.892   1.250    43.200   10.000    12.500
 2.204   1.250    76.800   30.000    37.500
 0.477   1.250    60.000    0.000     0.000
 2.334   1.250    43.200   30.000    37.500
 1.863   1.250    60.000   20.000    25.000
 2.497   3.130    76.800   30.000    93.900
 2.230   3.130    60.000   20.000    62.600
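The X(new) column in Table 2 is the row-by-row product of Temperature and Discharge Rate. A minimal Python sketch of that construction follows; `interaction` is a hypothetical helper name, and the input lists are the X4 and X2 columns of Table 2.

```python
# X2 (Discharge Rate) and X4 (Temperature) columns from Table 2
discharge_rate = [3.13] * 7 + [5.0] * 5 + [1.25] * 6 + [3.13] * 2
temperature = [40, 30, 20, 20, 10, 20, 20, 10, 10, 30,
               20, 10, 10, 10, 30, 0, 30, 20, 30, 20]

def interaction(a, b):
    """Element-wise product of two predictor columns (an interaction term)."""
    return [ai * bi for ai, bi in zip(a, b)]

x_new1 = interaction(temperature, discharge_rate)
print([round(v, 3) for v in x_new1])
```

The main effects X2 and X4 stay in the model alongside the product term, so the interaction column captures only how the effect of one variable changes with the level of the other.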

Adjusted Regression Analysis: With the new independent variable added, we analyze the regression results to determine whether it improves the model fit.

The figure below provides the results from the adjusted regression analysis:


Figure 11: Adjusted Regression Analysis for Log(Y) and X2, X3, X4, and Xnew1

Regression Statistics
Multiple R                 0.868
R Square                   0.754   (original model: 0.605)
Adjusted R Square          0.683
Standard Error             0.396
Observations               19
Durbin-Watson statistic    1.91397

ANOVA
              df      SS      MS        F   Significance F
Regression     4   6.710   1.677   10.701   0.00035
Residual      14   2.195   0.157
Total         18   8.905

                          Coefficients  Standard Error  t Stat  P-value  Lower 95%  Upper 95%
Intercept                        1.766           0.509   3.467    0.004      0.674      2.858
(X2) Discharge rate             -0.346           0.130  -2.655    0.019     -0.626     -0.067
(X3) Depth of discharge         -0.003           0.006  -0.577    0.573     -0.016      0.009
(X4) Temperature                 0.026           0.021   1.268    0.226     -0.018      0.071
Temp*DischRate                   0.013           0.007   1.928    0.074     -0.001      0.028

Notes: Adjusted R Square is higher than for the original model; the P-value (Significance F) is lower; the P-values for X3 and X4 are still high.

The adjusted regression equation is:

Log Cycles to Failure = 1.77 - 0.305 Discharge Rate - 0.00224 Depth of Discharge + 0.0213 Temperature + 0.0104 Temp*Discharge Rate


The adjusted regression results above lead to the following observations:

- The R-Squared value has increased from 60.5% to 75.4%, indicating better fit.
- The Adjusted R-Squared value has increased from 53% to 68.3%, indicating better fit.
- The overall P-value (Significance F) has decreased from 0.002 to 0.00035, indicating a more significant overall regression.
- The F-value has increased from 8.154 to 10.701, also indicating a more significant overall regression.
- The Durbin-Watson statistic = 1.91397, which is still close enough to 2 to indicate little to no autocorrelation.
- Among the individual P-values for the independent variables, X3 (0.573) and X4 (0.226) are high. In particular, the P-value for X3 is very high, which indicates that we may want to consider discarding X3 from the model.

- The residual analysis appears to support the assumption of normality for the residuals.
- The normal probability plot of the residuals shows some deviation from normality. However, these deviations do not invalidate the assumption of normality for the residuals.
- The P-value for the residuals probability plot (0.136) is lower than for the original model but still high, so the normality test does not reject normality of the residuals.
- There is one influential observation (outlier) identified in the probability plot of the residuals.
- There is no apparent heteroscedasticity in the plot of residuals versus fitted values for Log(Y).

The following figures support the observations listed above:

Figure 12: Adjusted Residual Plots for Log(Y) (Residual Plots for Log Cycles to Failure; panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order)


Figure 13: Adjusted Probability Plot of Residuals (RESI6), Normal with 95% CI (Mean -9.104E-16, effectively zero; StDev 0.471; N 20; AD 0.813; P-Value 0.136)

Figure 14: Adjusted Plot of Residuals versus Fitted Values for Log(Y) (Scatterplot of RESI6 vs FITS6)


Overall, the addition of the new independent variable (Xnew1) Temp*Discharge Rate results in a better model fit.
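The fit statistics compared above can all be recomputed from the ANOVA sums of squares and degrees of freedom. The Python sketch below (the function name `fit_stats` is hypothetical) uses the values reported in Figures 6 and 11. Adjusted R-Squared divides each sum of squares by its degrees of freedom, which penalizes predictors that do not pull their weight.

```python
def fit_stats(ss_reg, ss_res, df_reg, df_res):
    """Return (R^2, adjusted R^2, overall F) from ANOVA sums of squares and df."""
    ss_tot = ss_reg + ss_res
    df_tot = df_reg + df_res
    r2 = ss_reg / ss_tot
    adj_r2 = 1 - (ss_res / df_res) / (ss_tot / df_tot)
    f = (ss_reg / df_reg) / (ss_res / df_res)
    return r2, adj_r2, f

initial = fit_stats(5.429, 3.551, 3, 16)    # Figure 6 (X2, X3, X4)
adjusted = fit_stats(6.710, 2.195, 4, 14)   # Figure 11 (adds Temp*Discharge Rate)

for name, (r2, adj, f) in [("initial", initial), ("adjusted", adjusted)]:
    print(f"{name}: R2={r2:.3f} adjR2={adj:.3f} F={f:.3f}")
```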

Test for Adding a Second New Independent Variable: We then ask whether adding another independent variable would further improve the regression model. We can test this by adding another new independent variable and comparing the regression results with the previous results. In particular, we look at the R-Squared and Adjusted R-Squared values to determine whether an additional variable improves the model.

Thus, we test the addition of (Xnew2) Discharge Rate*Depth of Discharge. The addition of this new independent variable is displayed in the table below:

Table 3: Test for Addition of Second New Independent Variable

Columns: Log(Y) = Log Cycles to Failure; X2 = Discharge Rate (Amps); X3 = Depth of Discharge (% of rated ampere-hours); X4 = Temperature (Celsius); X(new1) = Temp*Discharge Rate; X(new2) = Discharge Rate*Depth of Discharge

Log(Y)      X2        X3       X4   X(new1)   X(new2)
 2.004   3.130    60.000   40.000   125.200   187.800
 2.149   3.130    76.800   30.000    93.900   240.384
 1.982   3.130    60.000   20.000    62.600   187.800
 2.097   3.130    60.000   20.000    62.600   187.800
 1.633   3.130    43.200   10.000    31.300   135.216
 1.204   3.130    60.000   20.000    62.600   187.800
 2.274   3.130    60.000   20.000    62.600   187.800
 1.000   5.000    76.800   10.000    50.000   384.000
 0.477   5.000    43.200   10.000    50.000   216.000
 2.587   5.000    43.200   30.000   150.000   216.000
 1.653   5.000   100.000   20.000   100.000   500.000
 0.301   5.000    76.800   10.000    50.000   384.000
 1.881   1.250    76.800   10.000    12.500    96.000
 1.892   1.250    43.200   10.000    12.500    54.000
 2.204   1.250    76.800   30.000    37.500    96.000
 0.477   1.250    60.000    0.000     0.000    75.000
 2.334   1.250    43.200   30.000    37.500    54.000
 1.863   1.250    60.000   20.000    25.000    75.000
 2.497   3.130    76.800   30.000    93.900   240.384
 2.230   3.130    60.000   20.000    62.600   187.800

The results from the regression analysis with the second new independent variable are displayed in the following figure.


Figure 15: Adjusted Regression Analysis for Log(Y) and X2, X3, X4, Xnew1 and Xnew2

Regression Statistics
Multiple R            0.804
R Square              0.646
Adjusted R Square     0.520
Standard Error        0.477
Observations          20

ANOVA
              df      SS      MS       F   Significance F
Regression     5   5.801   1.160   5.109   0.007
Residual      14   3.179   0.227
Total         19   8.980

             Coefficients  Standard Error  t Stat  P-value  Lower 95%  Upper 95%
Intercept           1.928           1.215   1.586    0.135     -0.678      4.534
X2                 -0.350           0.340  -1.029    0.321     -1.078      0.379
X3                 -0.005           0.018  -0.257    0.801     -0.044      0.035
X4                  0.021           0.025   0.847    0.411     -0.032      0.075
X(new1)             0.011           0.008   1.280    0.221     -0.007      0.028
X(new2)             0.001           0.005   0.147    0.885     -0.009      0.010


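We can quickly see that adding a second new independent variable does not improve the regression model. The following observations substantiate this conclusion:

- Both the R-Squared (from 0.754 to 0.646) and Adjusted R-Squared (from 0.683 to 0.520) values decrease, indicating that we should not add another variable. Note that R-Squared cannot decrease when a variable is added to a model fit to the same observations; the decrease here arises because this model uses all 20 observations, while the adjusted model output reports 19.

- Further supporting this decision, the F-value has decreased (from 10.701 to 5.109) and the P-value has increased (from 0.00035 to 0.007).

We will not add the second independent variable.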

Test for Removing an Independent Variable: We now consider the effect that removing an independent variable would have on the regression model. In general, we want a model that explains as much of the variance as possible with the fewest independent variables. Would removing an existing variable improve the regression model?

Specifically, we look at removing X3 (Depth of Discharge) because it had a very high P-value compared with the other independent variables in both the initial model (0.695) and the adjusted model (0.573). The table below displays the dependent and independent variables when X3 is removed.

Table 4: Test for Removal of Independent Variable X3

Columns: Log(Y) = Log Cycles to Failure; X2 = Discharge Rate (Amps); X4 = Temperature (Celsius); X(new1) = Temp*Discharge Rate

Log(Y)      X2       X4   X(new1)
 2.004   3.130   40.000   125.200
 2.149   3.130   30.000    93.900
 1.982   3.130   20.000    62.600
 2.097   3.130   20.000    62.600
 1.633   3.130   10.000    31.300
 1.204   3.130   20.000    62.600
 2.274   3.130   20.000    62.600
 1.000   5.000   10.000    50.000
 0.477   5.000   10.000    50.000
 2.587   5.000   30.000   150.000
 1.653   5.000   20.000   100.000
 0.301   5.000   10.000    50.000
 1.881   1.250   10.000    12.500
 1.892   1.250   10.000    12.500
 2.204   1.250   30.000    37.500
 0.477   1.250    0.000     0.000
 2.334   1.250   30.000    37.500
 1.863   1.250   20.000    25.000
 2.497   3.130   30.000    93.900
 2.230   3.130   20.000    62.600

The results from the regression analysis with X3 removed are displayed in the following figure.


Figure 16: Adjusted Regression Analysis for Log(Y) and X2, X4, Xnew1 (X3 Removed)

Regression Statistics
Multiple R            0.802
R Square              0.643
Adjusted R Square     0.576
Standard Error        0.448
Observations          20

ANOVA
              df      SS      MS       F   Significance F
Regression     3   5.774   1.925   9.607   0.001
Residual      16   3.205   0.200
Total         19   8.980

             Coefficients  Standard Error  t Stat  P-value  Lower 95%  Upper 95%
Intercept           1.659           0.459   3.616    0.002      0.686      2.631
X2                 -0.312           0.145  -2.162    0.046     -0.619     -0.006
X4                  0.021           0.023   0.883    0.390     -0.029      0.070
X(new1)             0.011           0.008   1.379    0.187     -0.006      0.027


Again, we can quickly determine that removing X3 does not result in a better model fit. The following observations substantiate this conclusion:

- The R2 value decreases (as it must whenever a variable is removed), and the adjusted R2 also decreases, from 0.683 to 0.576, indicating that we should not remove the variable.

- Further supporting this decision, the F-value has decreased (from 10.701 to 9.607) and the P-value has increased (from 0.00035 to 0.001).

Best Regression Model Fit: We can also test whether the adjusted model fits better than the original model by comparing the two directly.

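Test for model improvement (partial F-test), where f denotes the adjusted (full) model and r the original (reduced) model:

- $R^2_f = 0.754$ (adjusted model), $R^2_r = 0.605$ (original model)
- $df_f = 14$ (error df, adjusted model), $df_r = 16$ (error df, original model)

$$F = \frac{(R^2_f - R^2_r)/(df_r - df_f)}{(1 - R^2_f)/df_f} = \frac{(0.754 - 0.605)/2}{(1 - 0.754)/14} = 4.2398$$

$$F_{0.05;\,2,\,14} = 3.739, \qquad F > F_{0.05;\,2,\,14}$$

Since F exceeds the 5% critical value (P-value approximately 0.036), the improvement of the adjusted model over the original model is statistically significant at the 5% level. Note that $df_r - df_f = 2$ here because the adjusted model output reports 19 observations; fit to the same 20 observations, the two models would differ by a single degree of freedom.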
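The partial F-test above can be checked with a short Python sketch. The function names are hypothetical. Because the numerator has exactly 2 degrees of freedom, the F distribution's upper tail has the closed form P(F > x) = (1 + 2x/df2)^(-df2/2), which can be inverted for the critical value without a statistics library; for other numerator df, a library routine such as SciPy's `scipy.stats.f.ppf` would be needed.

```python
def partial_f(r2_full, r2_reduced, df_full, df_reduced):
    """Partial F statistic comparing nested models via R^2 and error df."""
    return ((r2_full - r2_reduced) / (df_reduced - df_full)) / ((1 - r2_full) / df_full)

def f2_upper_tail(x, df2):
    """P(F > x) for an F(2, df2) variable, using the closed form (1 + 2x/df2)**(-df2/2)."""
    return (1 + 2 * x / df2) ** (-df2 / 2)

def f2_critical(alpha, df2):
    """Upper-alpha critical value of F(2, df2), from inverting the tail formula."""
    return df2 / 2 * (alpha ** (-2 / df2) - 1)

f_stat = partial_f(0.754, 0.605, 14, 16)
crit = f2_critical(0.05, 14)
p_value = f2_upper_tail(f_stat, 14)
print(f"F={f_stat:.4f} F_crit={crit:.3f} p={p_value:.4f}")
```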

At this point, we conclude that the best-fit model is the one represented by the following equation:

Log Cycles to Failure = 1.77 - 0.305 Discharge Rate - 0.00224 Depth of Discharge + 0.0213 Temperature + 0.0104 Temp*Discharge Rate

with dependent variable Log(Y)

and independent variables:
- Discharge Rate (X2)
- Depth of Discharge (X3)
- Temperature (X4)
- Temperature*Discharge Rate (X(new1))

Forecasting Dependent Variable Values: The provided values for each of the independent variables are:

- Charge Rate (X1) = 1.5
- Discharge Rate (X2) = 4.5
- Depth of Discharge (X3) = 50
- Temperature (X4) = 25
- End of Charge (X5) = 2
- Temperature*Discharge Rate (Xnew1) = 25*4.5 = 112.5

$x_0^T = (1,\ 4.5,\ 50,\ 25,\ 112.5)$, $\hat{b}^T = (1.77,\ -0.305,\ -0.00224,\ 0.0213,\ 0.0104)$, $\hat{y} = x_0^T\hat{b} = 1.988$, $s = 0.396$


Adjusted model:

$$\log_{10}(\text{Cycles to Failure}) = 1.77 - 0.305(4.5) - 0.00224(50) + 0.0213(25) + 0.0104(112.5) = 1.988$$

$$\text{Cycles to Failure} = 10^{1.988} = 97.27$$

In addition, we can look at what the original model, a regression of Log(Y) on all five original independent variables (X1-X5), would have forecast and compare this with the output from the adjusted model:

$$\log_{10}(\text{Cycles to Failure}) = -27.7 - 0.199\,\text{Charge Rate} - 0.142\,\text{Discharge Rate} - 0.00483\,\text{Depth of Discharge} + 0.0503\,\text{Temperature} + 14.7\,\text{End of Charge}$$

Original model:

$$\log_{10}(\text{Cycles to Failure}) = 1.7785, \qquad \text{Cycles to Failure} = 10^{1.779} = 60.12$$
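Both point forecasts can be reproduced in a few lines of Python, as a sketch using the equation coefficients exactly as printed above. Without intermediate rounding, the original model gives about 60.05 cycles; the report's 60.12 comes from rounding the log value to 1.779 before back-transforming.

```python
# Forecast inputs from the report
x1, x2, x3, x4, x5 = 1.5, 4.5, 50, 25, 2
x_new1 = x4 * x2  # Temperature*Discharge Rate = 112.5

# Adjusted model (equation as printed in the report)
log_adj = 1.77 - 0.305 * x2 - 0.00224 * x3 + 0.0213 * x4 + 0.0104 * x_new1
# Original model with all five independent variables
log_orig = -27.7 - 0.199 * x1 - 0.142 * x2 - 0.00483 * x3 + 0.0503 * x4 + 14.7 * x5

cycles_adj = 10 ** log_adj    # back-transform from log10
cycles_orig = 10 ** log_orig

print(f"adjusted: log={log_adj:.4f} cycles={cycles_adj:.2f}")
print(f"original: log={log_orig:.4f} cycles={cycles_orig:.2f}")
```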

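We must now determine the 95% prediction interval for the estimate from the adjusted model, which requires $X^TX$ and its inverse:

$$X^TX = \begin{pmatrix} 20.000 & 60.670 & 1256.800 & 390.000 & 1182.300 \\ 60.670 & 222.547 & 3892.784 & 1182.300 & 4213.599 \\ 1256.800 & 3892.784 & 83520.640 & 24704.000 & 75887.200 \\ 390.000 & 1182.300 & 24704.000 & 9500.000 & 28215.000 \\ 1182.300 & 4213.599 & 75887.200 & 28215.000 & 97632.950 \end{pmatrix}$$

$$(X^TX)^{-1} = \begin{pmatrix} 1.655 & -0.249 & -0.012 & -0.042 & 0.012 \\ -0.249 & 0.107 & -0.001 & 0.013 & -0.005 \\ -0.012 & -0.001 & 0.000 & 0.000 & 0.000 \\ -0.042 & 0.013 & 0.000 & 0.003 & -0.001 \\ 0.012 & -0.005 & 0.000 & -0.001 & 0.000 \end{pmatrix}$$

(entries of the inverse rounded to three decimals)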


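$$x_0^T (X^TX)^{-1} x_0 = 0.243711$$

$$\hat{\sigma}_{\hat{y}} = s\sqrt{1 + x_0^T (X^TX)^{-1} x_0} = 0.396\sqrt{1.243711} = 0.441626$$

95% Prediction Interval: $x_0^T\hat{b} = 1.988$, error df = 14, $t_{14,\,0.975} = 2.144787$

- Log Cycles to Failure: LB = 1.04991, UB = 2.93609
- Cycles to Failure: LB = 11.22018, UB = 862.9785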
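The prediction-interval arithmetic can be sketched in Python from the rounded values stated above, using the standard formula y_hat ± t·s·sqrt(1 + x0^T (X^T X)^{-1} x0). This sketch gives log bounds of roughly 1.041 and 2.935. The reported bounds (1.04991 and 2.93609) are centered on about 1.993 rather than 1.988, which suggests unrounded coefficients were used at that step; that is our inference, not something stated in the report.

```python
import math

s = 0.396            # standard error of the adjusted regression
h0 = 0.243711        # x0^T (X^T X)^{-1} x0, as reported
y_hat = 1.988        # forecast of log10(cycles to failure)
t_crit = 2.144787    # t(0.975, 14), as reported

se_pred = s * math.sqrt(1 + h0)          # "sigma hat y" in the report
log_lb = y_hat - t_crit * se_pred
log_ub = y_hat + t_crit * se_pred

print(f"se_pred={se_pred:.6f} log bounds=({log_lb:.4f}, {log_ub:.4f})")
print(f"cycles bounds=({10 ** log_lb:.1f}, {10 ** log_ub:.1f})")
```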

Log Cycles to Failure (1.988) lies within the 95% prediction interval, as does Cycles to Failure (97.27).

Conclusions and Recommendation: The prediction interval for cycles to failure is very large. This is consistent with actual battery performance in space environments, which shows large deviations due to changes in temperature. Using the adjusted model, the forecast number of cycles to failure is 97.27, higher than the 60.12 cycles to failure predicted by the original model. Assuming that the adjusted model is a better fit and therefore provides a better estimate of the dependent variable, relying on the original model could have resulted in higher costs or lower sales, based on the assumption that the battery would fail in fewer cycles. The adjusted model indicates that the battery lifecycle is longer than would otherwise be expected. Satellite purchasers would likely prefer battery technology that provides a longer life and fewer replacements or upgrades. Therefore, the adjusted model is the recommended model for the manufacturing association.
