Quadratic Regression
©2005 Dr. B. C. Paul
Fitting Second Order Effects
Can also use least square error formulation to fit an equation of the form
$Y = B_0 + B_1 X + B_2 X^2$
Math is more difficult, but since you don’t have to do it, you may not care.
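For anyone who does care about the math, here is a minimal sketch of a least-squares quadratic fit in Python with NumPy. The function name `fit_quadratic` and the sample data are hypothetical and chosen for illustration only; they are not the slides' data set, and SPSS does the equivalent work internally.

```python
import numpy as np

def fit_quadratic(x, y):
    """Return (b0, b1, b2) minimizing squared error for y = b0 + b1*x + b2*x^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with one column each for the constant, X, and X squared
    X = np.column_stack([np.ones_like(x), x, x**2])
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

# Hypothetical, noise-free data generated from a known quadratic
x = np.arange(10.0)
y = 2.0 + 3.0 * x + 0.5 * x**2
b0, b1, b2 = fit_quadratic(x, y)
print(b0, b1, b2)
```

Because the sample data contain no noise, the fit should recover the generating coefficients to within rounding error.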
Fitting the Model With SPSS
We will re-use our data set where we saw a clear quadratic effect in the trend.
Click on Analyze to pull down the menu.
Highlight Regression to bring up the side menu.
Highlight and click Curve Estimation.
Setting for a Quadratic Model
Set your Dependent and Independent variables as before.
Check off that you want a Quadratic model.
Note that you have options to fit logarithmic, inverse, cubic, power of your choice, exponential, and a number of other models. The computer will fit any of the models by least squares.
Other Options to Check
I can have the model include a constant or not.
I want it to plot my model.
I also want to see an ANOVA on my model and the constants in the regression equation.
Click Ok when all is set.
Here Come Results
Tells me it fit a quadratic model for Dependent, using independent as the controlling variable, and that I had 29 data cases.
Analyzing the Fit of the Quadratic Equation.
R squared value is 1, which pretty much means that the quadratic model is a perfect fit.
The regression mean square is 6 orders of magnitude greater than the mean square error, and the F test blows the null hypothesis off the map.
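The quantities in that ANOVA table can be sketched in Python as follows. This is an illustration under stated assumptions, not the SPSS output: the function name `quadratic_anova` is hypothetical, and the 29 data points are synthetic values generated from an arbitrary quadratic with a little noise.

```python
import numpy as np

def quadratic_anova(x, y):
    """R squared, mean squares, and F statistic for a quadratic least-squares fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    X = np.column_stack([np.ones_like(x), x, x**2])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ b
    ss_total = np.sum((y - y.mean()) ** 2)   # total variability about the mean
    ss_error = np.sum((y - fitted) ** 2)     # variability the model leaves unexplained
    ss_regression = ss_total - ss_error      # variability the model explains
    ms_regression = ss_regression / 2        # 2 df: the X and X squared terms
    ms_error = ss_error / (n - 3)            # n minus the 3 fitted coefficients
    return {
        "r_squared": ss_regression / ss_total,
        "ms_regression": ms_regression,
        "ms_error": ms_error,
        "f": ms_regression / ms_error,
    }

# Synthetic data (assumed for illustration): strong quadratic trend, small noise
rng = np.random.default_rng(0)
x = np.arange(1.0, 30.0)  # 29 cases
y = 1.0 + 4.0 * x + 0.07 * x**2 + rng.normal(0.0, 0.1, size=x.size)
result = quadratic_anova(x, y)
print(result)
```

A very large F means the regression mean square dwarfs the mean square error, which is the situation the slide describes.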
Checking the Significance and Value of the Coefficients
B0 = 1.163, B1 = 4.061 (i.e. 4.061*X), B2 = 0.068 (i.e. 0.068*X^2), so $Y = 1.163 + 4.061 X + 0.068 X^2$
How Significant are the Values
T tests are used to measure the certainty that our coefficient values are not 0. As can be seen, none of them have any noteworthy chance of being a fluke.
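A minimal sketch of how such t statistics are formed: each coefficient is divided by its standard error, which comes from the mean square error and the inverse of X'X. The function name `coefficient_t_tests` and the synthetic data are assumptions for illustration. The critical value 2.056 is the two-sided 5% value for 26 degrees of freedom, taken from a standard t table.

```python
import numpy as np

T_CRIT = 2.056  # two-sided 5% critical value of t with 26 degrees of freedom

def coefficient_t_tests(x, y):
    """Return coefficients, standard errors, and t statistics for a quadratic fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x, x**2])
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ b
    mse = residuals @ residuals / (n - p)  # mean square error
    # Standard errors: square roots of the diagonal of mse * (X'X)^-1
    se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))
    return b, se, b / se

# Synthetic data (assumed for illustration): 29 cases, strong trend, small noise
rng = np.random.default_rng(0)
x = np.arange(1.0, 30.0)
y = 1.0 + 4.0 * x + 0.07 * x**2 + rng.normal(0.0, 0.1, size=x.size)

b, se, t = coefficient_t_tests(x, y)
for name, coef, t_value in zip(["B0", "B1", "B2"], b, t):
    verdict = "significant" if abs(t_value) > T_CRIT else "not significant"
    print(f"{name} = {coef:.3f}, t = {t_value:.1f}, {verdict} at the 5% level")
```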
Here is the Fit of the Model to the Data
As can be seen, the model fits the points, including the slight curvature that reflects the quadratic effect.
Let’s see if there is a Quadratic Effect of Distance on our MPG
We remember that there is some definite scatter in our MPG data. The linear regression on distance alone explained about 37% of the total variability.
Of course, unlike our last data set, where we could see the curve effect in the residuals, the residuals were fairly scattered for our MPG plot.
Looking at Results
The R^2 value is up to 40% of variability from about 37% - that’s improvement, but not a lot.
The regression itself is significant at the 99.9% level.
We have something going on down here.
Significance of Coefficients
The constant is significant.
Significance of our distance and distance squared terms is somewhat lacking
At a significance value of 9.9%, some may not be sure the distance coefficient is not zero.
At a significance value of 48.7%, most people would have a lot of doubt about the quadratic term, distance squared.
What Happened?
We already ran a linear regression and know we have a significant linear effect.
Now we run a quadratic regression and it’s telling us it’s not sure about the linear effect.
Significance is measured by how much variation is explained by a term relative to the mean square error.
As new terms enter the equation, the amount of variability explained by a single term normally drops.
The prediction accuracy is now being shared.
It does make a difference what else is in the model.
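This sharing effect can be reproduced on synthetic data. The sketch below is an illustration, not the MPG data: it generates a purely linear trend with heavy scatter, then compares the t statistic of the X term when it is alone in the model against its t statistic once X squared is added. The helper name `t_statistics` and all data values are hypothetical.

```python
import numpy as np

def t_statistics(X, y):
    """Least-squares coefficients divided by their standard errors."""
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ b
    mse = residuals @ residuals / (n - p)
    se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))
    return b / se

# Synthetic data (not the MPG data): linear trend plus heavy scatter, 29 cases
rng = np.random.default_rng(1)
x = np.arange(1.0, 30.0)
y = 5.0 + 0.5 * x + rng.normal(0.0, 3.0, size=x.size)

ones = np.ones_like(x)
t_linear_model = t_statistics(np.column_stack([ones, x]), y)
t_quadratic_model = t_statistics(np.column_stack([ones, x, x**2]), y)

print("t for X, linear model:     ", t_linear_model[1])
print("t for X, quadratic model:  ", t_quadratic_model[1])
print("t for X^2, quadratic model:", t_quadratic_model[2])
```

X and X squared are strongly correlated over this range, so once both are in the model neither can claim the explained variability for itself, and the t statistic for X falls even though the underlying linear trend has not changed.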
Checking Out the Plot
The curved regression line does not appear to be a bad fit to the data. In fact the data seems to have a bend.
But the significance of the square term is just over 50%, which is not mathematically convincing.
Why are We Being Told Something that Looks Wrong?
Whether a term is significant depends on what else is in the equation.
In this case the linear effect would seem stronger than the quadratic effect, so we might expect more weight to go to the other variable.
It also depends on what else is not explained.
60% of the variability in this data is not explained by the regression on distance only.
Sometimes the clarity with which we can see a trend depends on the amount of confusion coming from other sources.
Everything else other than distance that might influence gas mileage is being called random.
We Know More Than We Are Telling
We would logically guess that gas mileage is influenced by more than distance driven.
When we leave other sources of prediction unaccounted for, we expand what we are saying is random.
Not using what we know can cause us to lose a lot of power in our models.
Problem is our model can only handle one independent variable.
Maybe we need a “Bigger Box”.