Polynomial Regression and Transformations STA 671 Summer 2008

Polynomial Regression and Transformations

STA 671

Summer 2008

Review

The estimated residuals e1,…,en provide the best method for checking the assumptions.

Remember that the model assumes the errors εi ~ N(0,σ). The estimated residuals should look approximately like that.

In a residual plot, you are looking for outliers, curvature, or changing variance.

In this lecture we will discuss two separate methods: polynomial regression and transformations. Both are possible remedies for curvature, and transformations have the added benefit that they sometimes address changing variance as well.

Recall the Hooker data. There appears to be a small amount of curvature.

This curvature is seen more clearly in the residual plot.

Polynomial regression – one method for dealing with curvature.

To account for curvature, we can perform “polynomial regression”, which consists of fitting a polynomial (typically a quadratic or cubic) instead of a line.

Recall the linear model was $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$. The quadratic model is $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i$. The cubic model is $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + \varepsilon_i$.
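
Because the quadratic and cubic models are still linear in the coefficients β0, β1, β2, β3, they can be fit by ordinary least squares once columns for X² (and X³) are added to the data. A minimal sketch, assuming Python with NumPy; `fit_polynomial` is a hypothetical helper, not the software that produced the output in these slides:

```python
import numpy as np


def fit_polynomial(x, y, degree):
    """Least-squares fit of Y = b0 + b1*X + ... + b_degree*X**degree.

    Returns the estimated coefficients [b0, b1, ..., b_degree].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with columns 1, X, X^2, ..., X^degree
    X = np.vander(x, degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Usage: made-up data generated exactly from a quadratic
x = np.arange(10.0)
y = 3.0 - 2.0 * x + 0.5 * x**2
print(fit_polynomial(x, y, 2))  # approximately [3.0, -2.0, 0.5]
```

With `degree=1` the same helper gives the ordinary least squares line.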

The higher the order of the polynomial, the more curvature it can account for.

Quadratic model accounts for the curvature

The fitted quadratic equation is $\widehat{\text{Pressure}} = 88.017 - 1.1295\,\text{Temp} + 0.00403\,\text{Temp}^2$.

If the quadratic model is better than the linear model, what about a cubic?

A cubic model produces no visual improvement

The fitted cubic equation is $\widehat{\text{Pressure}} = 124.14 - 1.6854\,\text{Temp} + 0.00688\,\text{Temp}^2 - 0.00000486\,\text{Temp}^3$.

Which to choose, quadratic or cubic?

In general, choose the LOWEST order polynomial possible (i.e., prefer linear to quadratic, quadratic to cubic, etc.).

This follows from 1) “Occam’s razor”, meaning that simpler models are preferred, and 2) the fact that the higher the order, the more parameters there are to estimate. Statistically, it’s easier to estimate a few parameters than many.

P-values for selecting order

The regression output provides a formal method for selecting the order of the polynomial. This method typically agrees with looking at the residual plot.

The regression output provides p-values for each term in the regression.

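The p-value for the highest order term is the ONLY one that is used. The p-values of the lower-order terms are not interpreted, and those terms stay in the model as long as a higher power of X does.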
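
One reason the lower-order p-values are not interpreted on their own: over a narrow range of X far from zero, X, X², and X³ are almost perfectly correlated, so their separate estimates have very large standard errors (this helps explain why every term in the cubic fit of the boiling point data has a large p-value). A quick check, assuming NumPy and an illustrative grid of temperatures from 180 to 212 (an assumption, not the actual data):

```python
import numpy as np

# Illustrative temperature grid (an assumption, not the actual Hooker data)
temp = np.linspace(180.0, 212.0, 33)

# Correlations between successive powers of temperature
r_12 = np.corrcoef(temp, temp**2)[0, 1]
r_23 = np.corrcoef(temp**2, temp**3)[0, 1]
print(f"corr(T, T^2) = {r_12:.5f}, corr(T^2, T^3) = {r_23:.5f}")
```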

Using p-values to select order

Begin by fitting the cubic model. If the cubic term is significant, use the cubic model (you can consider higher-order models, but we do not in STA671).

If the cubic term is NOT significant, remove it and RERUN the model (p-values change depending on what terms are in the model), then look to see if the quadratic term is significant.

If the quadratic term is not significant, remove it and RERUN the model, resulting in a linear regression.

If none of these models produce a reasonable residual plot, you may need another method.
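
The steps above can be sketched as follows, assuming NumPy. The hypothetical helpers `highest_term_t` and `select_order` refit the model at every step and test only the highest-order term. For simplicity the sketch compares |t| with a rough critical value of 2 (about the 5% level for moderate sample sizes) instead of computing an exact p-value; in practice, use the t quantile for your residual degrees of freedom:

```python
import numpy as np


def highest_term_t(x, y, degree):
    """Fit a polynomial of the given degree by least squares and return the
    t statistic (estimate / standard error) of its highest-order term."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.vander(x, degree + 1, increasing=True)  # columns 1, X, ..., X^degree
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, p = X.shape
    mse = resid @ resid / (n - p)          # residual mean square
    cov = mse * np.linalg.inv(X.T @ X)     # estimated covariance of the estimates
    return coef[-1] / np.sqrt(cov[-1, -1])


def select_order(x, y, t_crit=2.0):
    """Cubic -> quadratic -> linear, refitting at every step and testing only
    the highest-order term. t_crit=2.0 is a rough stand-in for the 5% two-sided
    critical value (an assumption; look up the exact value for your df)."""
    for degree in (3, 2):
        if abs(highest_term_t(x, y, degree)) > t_crit:
            return degree
    return 1  # neither the cubic nor the quadratic term is significant


# Usage: a quadratic trend plus small made-up alternating noise
x = np.arange(20.0)
noise = 0.1 * (-1.0) ** np.arange(20)
print(select_order(x, 1 + 0.5 * x + 0.2 * x**2 + noise))
```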

For the boiling point data

We first run the cubic model and obtain the following p-values:

Parameter Estimates
Variable       Label                     DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  Type I SS
Intercept      Intercept                 1   124.13563           384.83452       0.32     0.7495    12434
Temperature                              1   -1.68544            5.92071         -0.28    0.7781    444.16724
Temperature_2  2nd power of TEMPERATURE  1   0.00688             0.03032         0.23     0.8222    2.98566
Temperature_3  3rd power of TEMPERATURE  1   -0.00000486         0.00005171      -0.09    0.9259    0.00022757

The p-value for the cubic term (0.9259) is not significant, so remove the cubic term and RERUN the model. (Do NOT also remove the quadratic term based on its p-value above; p-values change once the cubic term is dropped.)

Quadratic model for boiling point data

The quadratic model produces the following p-values

The quadratic term is significant AND we observe a reasonable residual plot, so we stop here. This is our final model.

Parameter Estimates
Variable       Label                     DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  Type I SS
Intercept      Intercept                 1   88.01662            13.93063        6.32     <.0001    12434
Temperature                              1   -1.12954            0.14336         -7.88    <.0001    444.16724
Temperature_2  2nd power of TEMPERATURE  1   0.00403             0.00036820      10.95    <.0001    2.98566
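
Keeping full-precision estimates matters when the model is used for prediction: Temp² is about 40,000 for temperatures near 200, so rounding the quadratic coefficient from 0.00403 to 0.004 changes a prediction by more than 1. A small arithmetic check in Python; the temperature 200 is an illustrative value assumed to lie within the range of the data:

```python
# Full-precision estimates from the quadratic fit of the boiling point data
B0, B1, B2 = 88.01662, -1.12954, 0.00403


def predict_pressure(temp, b0=B0, b1=B1, b2=B2):
    """Fitted quadratic: b0 + b1*Temp + b2*Temp^2."""
    return b0 + b1 * temp + b2 * temp**2


temp = 200.0  # illustrative temperature, assumed to lie within the data's range
full = predict_pressure(temp)
rounded = predict_pressure(temp, 88.017, -1.1295, 0.004)  # rounded coefficients
print(round(full, 3), round(rounded, 3))  # 23.309 22.117
```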

What if no polynomial model produces a reasonable residual plot?

If none of our polynomial models produces a reasonable residual plot, we need another method.

Another method to try is to transform the response variable.

Transformations, like polynomial regression, can handle curvature, and in addition transformations have the potential to handle changing spread as well.

Example - the ethanol data

The data come from an engine exhaust study. NOx is a measure of the exhaust from the engine, while E is a measure of the fuel/air mixture (high values are almost all fuel, low values are almost all air).

A cubic model does not fit the data. A quadratic or linear model would do even worse.

A cubic fit to the ethanol data

[Figure: Scatterplot; Residual plot.] The residual plot shows clear curvature.

Transformations

Instead of fitting Y as the response variable, we fit a function of Y as the response variable.

Thus, instead of Yi = β0 + β1 Xi + εi, you can fit log(Yi) = β0 + β1 Xi + εi, or sqrt(Yi) = β0 + β1 Xi + εi, or cbrt(Yi) = β0 + β1 Xi + εi (cube root), etc.

This greatly expands the set of models you can fit.

You can transform the X variable as well, but in the interest of time we do not discuss that in detail in STA671.

Transformations allow different error structures.

A quadratic regression looks like $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i$. At any particular X, the variance is the same.

Taking the square root transformation, $\sqrt{Y_i} = \beta_0 + \beta_1 X_i + \varepsilon_i$, means that

$$Y_i = (\beta_0 + \beta_1 X_i + \varepsilon_i)^2 = \beta_0^2 + \beta_1^2 X_i^2 + \varepsilon_i^2 + 2\beta_0\beta_1 X_i + 2\beta_0\varepsilon_i + 2\beta_1 X_i\varepsilon_i.$$

There is a quadratic relationship between X and Y. Note the product term $2\beta_1 X_i\varepsilon_i$: because the error is multiplied by $X_i$, the variance of $Y_i$ can change with $X_i$. Thus, in addition to handling curvature, transformations allow you to address changing variance.
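
The derivation can be pushed one step further. With $\mu_i = \beta_0 + \beta_1 X_i$ and $\varepsilon_i \sim N(0, \sigma)$, we have $\operatorname{Cov}(\varepsilon_i, \varepsilon_i^2) = 0$ and $\operatorname{Var}(\varepsilon_i^2) = 2\sigma^4$, so $\operatorname{Var}(Y_i) = 4\mu_i^2\sigma^2 + 2\sigma^4$, which grows as $|\mu_i|$ grows. A short Python/NumPy check of that formula by simulation; the parameter values are illustrative assumptions, not estimates from any data set:

```python
import numpy as np


def var_y_sqrt_model(x, beta0, beta1, sigma):
    """Var(Y | X = x) when sqrt(Y) = beta0 + beta1*x + eps, eps ~ N(0, sigma).

    Y = mu^2 + 2*mu*eps + eps^2 with mu = beta0 + beta1*x; since
    Cov(eps, eps^2) = 0 and Var(eps^2) = 2*sigma^4,
    Var(Y) = 4*mu^2*sigma^2 + 2*sigma^4.
    """
    mu = beta0 + beta1 * x
    return 4 * mu**2 * sigma**2 + 2 * sigma**4


# Illustrative parameter values (assumptions, not estimates from any data set)
beta0, beta1, sigma = 2.0, 0.5, 0.3
rng = np.random.default_rng(0)

for x in (0.0, 10.0):
    eps = rng.normal(0.0, sigma, size=200_000)
    y = (beta0 + beta1 * x + eps) ** 2
    # Formula vs. simulated variance of Y at this x; both grow with x here
    print(x, var_y_sqrt_model(x, beta0, beta1, sigma), y.var())
```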

Prototypical Data requiring transformation

After square root transformation

Which transformation?

There are no hard-and-fast rules for which transformation to try and no guaranteed method for finding a good one (with some data, you may never find a great fit).

Usually you have to proceed by trial and error. Remember that you can combine polynomial regression with a transformation; for example, you can fit a cubic model in X to predict log(Y).

Some “typical” transformations

If you have area data, a square root transform is often useful (converts area to something proportional to the radius or length).

Similarly with volume, a cube root transformation may be appropriate.

With financial data (incomes, etc.), a log transform may be appropriate. Logs change percentage increases into constant increases: if a unit increase in X results in a 10% increase in Y, it also results in a 0.0953 increase in log(Y), since ln(1.10) ≈ 0.0953.
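
The arithmetic behind the 0.0953 figure, as a quick Python check (natural logarithms assumed; with base-10 logs the constant would be log10(1.10) ≈ 0.0414). The incomes are made-up values:

```python
import math

# A 10% increase multiplies Y by 1.10; on the natural-log scale this is the
# constant shift log(1.10), whatever the starting value of Y.
shift = math.log(1.10)
print(round(shift, 4))  # 0.0953

# Made-up incomes: the log-scale change is the same for each one
for income in (20_000.0, 50_000.0, 120_000.0):
    print(income, math.log(1.10 * income) - math.log(income))
```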

A general strategy

Fit the raw data (X and Y) with a least squares line. See if you get a good residual plot. If so, stop and be happy.

If not, try a polynomial regression (quadratic or cubic). If one of these fits, stop and be happy (remember, fit the smallest model possible).

If a polynomial regression does not work, try transforming Y with log, sqrt, and cube root (i.e., perform three more regressions), fitting a cubic polynomial regression to each. Choose the transformation that provides the best residual plot.

If none of those work, then regression might not be effective (there are more advanced techniques) or you may have to start transforming X as well. This becomes true trial and error. Consult your friendly local statistician.
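
The transformation step of this strategy can be sketched as below, assuming NumPy and a strictly positive response (needed for the log). `cubic_residuals_by_transform` is a hypothetical helper: it only produces fitted values and residuals for each candidate response, and the choice of transformation is still made by looking at the residual plots:

```python
import numpy as np

# Candidate transformations of the response (log requires Y > 0)
TRANSFORMS = {
    "none": lambda y: y,
    "log": np.log,
    "sqrt": np.sqrt,
    "cbrt": np.cbrt,
}


def cubic_residuals_by_transform(x, y):
    """Fit a cubic in X to each transformed response by least squares.

    Returns {name: (fitted_values, residuals)} for residual plotting; the
    choice of transformation is still made by looking at the plots.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.vander(x, 4, increasing=True)  # columns 1, X, X^2, X^3
    results = {}
    for name, transform in TRANSFORMS.items():
        z = transform(y)
        coef, *_ = np.linalg.lstsq(X, z, rcond=None)
        fitted = X @ coef
        results[name] = (fitted, z - fitted)
    return results


# Usage: made-up data that is exactly linear on the log scale
x = np.linspace(0.0, 5.0, 30)
y = np.exp(0.5 + 0.6 * x)
results = cubic_residuals_by_transform(x, y)
for name, (fitted, resid) in results.items():
    print(name, np.abs(resid).max())
```

With real data, you would plot each `resid` against its `fitted` values and compare the four plots.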

Back to the ethanol data.

We can see from the scatterplot that E and NOX are not linearly related.

We tried a cubic regression and that didn’t work. Now off to the transformations. We fit cubic regressions with log(Y), sqrt(Y), and cbrt(Y) as the response variables.

We may be able to get satisfactory results with something less than cubic, but if the cubic doesn’t work, the lower-order models won’t either (they are special cases of the cubic), so we start with cubic models.
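
Why the cubic is a safe starting point: the linear and quadratic models are special cases of the cubic (set β3 = 0, or β2 = β3 = 0), so the least squares cubic fit always has a residual sum of squares no larger than the lower-order fits. A quick numerical illustration with made-up data, assuming NumPy; `sse` is a hypothetical helper:

```python
import numpy as np


def sse(x, y, degree):
    """Residual sum of squares of the least-squares polynomial of a given degree."""
    X = np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


# Made-up curved data with random noise
rng = np.random.default_rng(1)
x = np.linspace(0.5, 1.2, 40)
y = np.exp(-((x - 0.9) ** 2) / 0.01) + rng.normal(0.0, 0.05, x.size)

sse1, sse2, sse3 = (sse(x, y, d) for d in (1, 2, 3))
# Nested models: least squares guarantees sse1 >= sse2 >= sse3
print(sse1, sse2, sse3)
```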

Square root transformation. Still clear curvature.

[Figure: Scatterplot; Residual plot.]

Cube root transformation. Improved, but still some curvature.

Log transformation. Still some lack of fit, but best of the bunch.

The log transform is not perfect, but it is the best we can do for now (I encourage you to play with the data on your own).

After we have chosen the log transformation on the basis of the best residual plot (and decided it is “ok”, if certainly not a great residual plot), we look at the p-value for the cubic term to see if we can remove it. We can (p = 0.5054).

Parameter Estimates
Variable   Label           DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  Type I SS
Intercept  Intercept       1   -11.32827           2.48931         -4.55    <.0001    19.62487
E                          1   25.55212            8.69871         2.94     0.0043    0.31320
E_2        2nd power of E  1   -10.74539           9.86507         -1.09    0.2792    34.56387
E_3        3rd power of E  1   -2.43334            3.63758         -0.67    0.5054    0.02403

Quadratic model for log(NOX)

The quadratic model produces almost identical scatter and residual plots. The quadratic term is significant, so this is our final model.

Parameter Estimates
Variable   Label           DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  Type I SS
Intercept  Intercept       1   -12.95199           0.55039         -23.53   <.0001    19.62487
E                          1   31.31051            1.24771         25.09    <.0001    0.31320
E_2        2nd power of E  1   -17.32873           0.68084         -25.45   <.0001    34.56387
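
Using the final model on the original NOX scale requires undoing the log. A small Python sketch with the estimates above, assuming the log is the natural log; exponentiating the fitted log(NOX) estimates a typical (median) NOX value under normal errors on the log scale, not the mean. Because the E² coefficient is negative, the fitted curve also has a peak, at E = -b1/(2·b2); like any prediction, this is only meaningful within the range of E in the data:

```python
import math

# Estimates from the final quadratic model for log(NOX)
b0, b1, b2 = -12.95199, 31.31051, -17.32873


def predicted_log_nox(e):
    """Fitted value of log(NOX) at fuel/air measure e."""
    return b0 + b1 * e + b2 * e**2


def predicted_nox(e):
    """Back-transform to the NOX scale (assumes the natural log was used)."""
    return math.exp(predicted_log_nox(e))


# The quadratic b0 + b1*E + b2*E^2 has zero slope where b1 + 2*b2*E = 0
e_peak = -b1 / (2 * b2)
print(round(e_peak, 3))  # 0.903
print(round(predicted_nox(e_peak), 2))
```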

Extras

There are more advanced ways of dealing with polynomial regression and transformations, which we do not address in STA671.

Polynomial regression can be extended to handle more general curved models, such as splines (piecewise polynomials with desirable smoothness properties).

A transformation can be selected automatically using a Box-Cox transformation, which estimates the appropriate exponent for transforming your data (at the cost of some interpretability).

Consult your friendly local statistician.
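
For the curious, the idea behind Box-Cox can be sketched in a few lines: for each exponent λ on a grid, transform Y to $(Y^\lambda - 1)/\lambda$ (log Y when λ = 0), fit the regression by least squares, and compute the profile log-likelihood $-\tfrac{n}{2}\log(\text{SSE}_\lambda/n) + (\lambda - 1)\sum \log Y_i$; the λ with the largest value is chosen. This is a bare-bones illustration assuming NumPy and a positive response; the function names, the grid, and the data are hypothetical:

```python
import numpy as np


def boxcox(y, lam):
    """Box-Cox transform: (y**lam - 1) / lam, or log(y) when lam == 0 (y > 0)."""
    return np.log(y) if lam == 0 else (y**lam - 1) / lam


def boxcox_profile(x, y, lams, degree=1):
    """Profile log-likelihood of lambda for a polynomial regression in x.

    For each lambda: transform Y, fit by least squares, and evaluate
    -n/2 * log(SSE/n) + (lambda - 1) * sum(log y). The last term is the
    Jacobian of the transformation, which makes different lambdas comparable.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.vander(x, degree + 1, increasing=True)
    n, sum_log_y = y.size, np.log(y).sum()
    loglik = []
    for lam in lams:
        z = boxcox(y, lam)
        coef, *_ = np.linalg.lstsq(X, z, rcond=None)
        resid = z - X @ coef
        loglik.append(-n / 2 * np.log(resid @ resid / n) + (lam - 1) * sum_log_y)
    return np.array(loglik)


# Usage: made-up data whose log is linear in x (plus small alternating noise)
lams = [k / 10 for k in range(-20, 21)]  # grid from -2 to 2, includes exactly 0
x = np.arange(30.0)
y = np.exp(1.0 + 0.1 * x + 0.02 * (-1.0) ** np.arange(30))
best = lams[int(np.argmax(boxcox_profile(x, y, lams)))]
print("best lambda on the grid:", best)
```

Here the chosen λ should land at or near 0, i.e., the log transformation.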