Prediction APS 425 - Advanced Managerial Data Analysis
(c) Prof. G. William Schwert, 2001-2015 1
APS 425 Fall 2015
Simple Linear Regression:
Prediction
Instructor: G. William Schwert, 585-275-2470
Ciba-Geigy Ritalin Experiment
• Ritalin is tested to see if it helps with Central Auditory Processing Disorder (CAPD)
– Similar symptoms to ADD/ADHD
• Experiment:
– “Randomly” select 64 children
– All receive auditory test
– 32 (control group) receive no drug (or placebo?)
– 32 (treatment group) receive varying doses of Ritalin
– All children are tested a second time
• $\text{DOSAGE}_i$ = amount of Ritalin received by child $i$
– Measured as mg of Ritalin per kg of body weight
• $\text{IMPROVE}_i$ = child’s 2nd test score – 1st test score
• Dataset A425_ritalin.wf1 also contains:
– AGE of child in months
– Gender (FEMALE = 1 for girls)
Predictions
• Predictive model:
$$\widehat{\text{IMPROVE}}_i = 0.226 + 12.18\,\text{DOSAGE}_i$$
= the estimate of $E[\text{IMPROVE}_i \mid \text{DOSAGE}_i, b_0, b_1]$
• What is your estimate of the average IMPROVE score for all children who receive a dosage of 0.35 mg/kg?
$$\widehat{\text{IMPROVE}}_i = 0.226 + 12.18 \times 0.35 = 4.488$$
• This question asks about the average or expected value for all children who get a DOSAGE of 0.35 mg/kg.
• A given child has been administered a DOSAGE of 0.35 mg/kg. What value do you predict for the child’s IMPROVE score?
$$\widehat{\text{IMPROVE}}_i = 0.226 + 12.18 \times 0.35 = 4.488$$
• This question asks you to predict the value for an individual child who gets a DOSAGE of 0.35 mg/kg.
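Both questions lead to the same point estimate: the fitted line evaluated at the chosen dosage. A minimal Python sketch, assuming the rounded coefficients shown on the slide (the name `predict_improve` is illustrative and not part of the course materials). With rounded coefficients the result is 4.489, which differs from the slide's 4.488 in the last digit, presumably because the slide uses unrounded estimates.

```python
# Rounded coefficient estimates taken from the slide's fitted model.
B0 = 0.226   # intercept
B1 = 12.18   # slope on DOSAGE (mg/kg)


def predict_improve(dosage: float) -> float:
    """Point prediction of IMPROVE for a given DOSAGE in mg/kg."""
    return B0 + B1 * dosage


if __name__ == "__main__":
    print(round(predict_improve(0.35), 3))
```

The same number answers both the "average child" and the "individual child" question; only the uncertainty attached to it differs.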
• The prediction of the value for an individual child equals the estimate of the expected value for the population of all children with DOSAGE = 0.35 mg/kg (see the previous two slides)
• However, the standard errors are different!
• Let’s derive them next
Std Error of Predictions
• Standard error for predicting an individual value
– The linear model is:
$$Y_i = \beta_0 + \beta_1 X_i + e_i$$
– Our prediction is:
$$\hat{Y}_i = b_0 + b_1 X_i$$
– Three sources of error:
  • Error in estimating $\beta_0$
  • Error in estimating $\beta_1$
  • Error in estimating $e_i$
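The way these three sources combine can be sketched as follows. The assumptions here are the standard ones and are not stated on the slide: the errors are uncorrelated with constant variance $\sigma^2$, and the new error $e_0$ is uncorrelated with the sample used to estimate $b_0$ and $b_1$. $X_0$ denotes the value at which we predict, and $x_i = X_i - \bar{X}$.

```latex
\begin{aligned}
\hat{Y}_0 &= b_0 + b_1 X_0 = \bar{Y} + b_1 (X_0 - \bar{X}) \\
\operatorname{Var}(\hat{Y}_0) &= \operatorname{Var}(\bar{Y}) + (X_0 - \bar{X})^2 \operatorname{Var}(b_1)
  = \frac{\sigma^2}{n} + \frac{\sigma^2 (X_0 - \bar{X})^2}{\sum x_i^2} \\
\operatorname{Var}(Y_0 - \hat{Y}_0) &= \operatorname{Var}(e_0) + \operatorname{Var}(\hat{Y}_0)
  = \sigma^2 + \frac{\sigma^2}{n} + \frac{\sigma^2 (X_0 - \bar{X})^2}{\sum x_i^2}
\end{aligned}
```

The second line uses the fact that $\bar{Y}$ and $b_1$ are uncorrelated under these assumptions. Replacing $\sigma^2$ with its estimate $s^2$ and taking square roots gives the estimated standard errors.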
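• Standard error of $\hat{Y}_i$:
$$\text{SDP} = \left[\, s^2 + \frac{s^2}{n} + \frac{s^2 (X_i - \bar{X})^2}{\sum x_i^2} \,\right]^{1/2}$$
(the three terms are, in order: error term uncertainty, intercept uncertainty, slope uncertainty)
where $s^2 = \sum \hat{e}_i^2 / (n-2)$ is the residual variance, $x_i = X_i - \bar{X}$,
and $s^2 / \sum x_i^2$ is the variance of $b_1$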
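As a rough check of the formula, here is a small Python sketch that adds up the three variance terms. It assumes the rounded summary values reported elsewhere in these slides for the Ritalin regression (s = 11.951, n = 64, mean DOSAGE = 0.257, SE(b1) = 5.723); the function name `sdp` and its argument names are illustrative.

```python
import math


def sdp(s: float, n: int, x: float, x_bar: float, se_b1: float) -> float:
    """Standard error for predicting an individual value at x.

    s      : standard error of the regression (square root of s^2)
    n      : number of observations used to fit the line
    x_bar  : sample mean of the predictor
    se_b1  : standard error of the slope, sqrt(s^2 / sum of squared deviations)
    """
    error_term = s ** 2                      # uncertainty from the error term
    intercept = s ** 2 / n                   # uncertainty from the intercept
    slope = se_b1 ** 2 * (x - x_bar) ** 2    # uncertainty from the slope
    return math.sqrt(error_term + intercept + slope)


if __name__ == "__main__":
    # Rounded values reported on the slides for the Ritalin regression.
    print(round(sdp(s=11.951, n=64, x=0.35, x_bar=0.257, se_b1=5.723), 3))
```

With these inputs the result is about 12.056. The error-term variance dominates, so SDP is only slightly larger than the standard error of the regression itself.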
Prediction Intervals
• A 100(1–α)% prediction interval for $Y_i$ is:
$$[\hat{Y}_i - t_{\alpha/2}\,\text{SDP},\ \hat{Y}_i + t_{\alpha/2}\,\text{SDP}]$$
where $\Pr\{t_{n-2} > t_{\alpha/2}\} = \alpha/2$
• In the Ritalin case with one child getting a DOSAGE of 0.35 mg/kg, we have SDP = 12.056, so a 95% prediction interval for $\text{IMPROVE}_i$ (with n – 2 = 62 degrees of freedom, t ≈ 2.00) is
[4.488 – 2.00(12.056), 4.488 + 2.00(12.056)]
≈ [–19.61, 28.59]
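The interval arithmetic can be sketched in a few lines of Python. The function name `prediction_interval` is illustrative, and the critical value is passed in by hand rather than computed. The 97.5th percentile of the t distribution with 62 degrees of freedom is approximately 1.999, which the slide rounds to 2.00; using 1.999 reproduces the slide's endpoints, while 2.00 gives endpoints that differ in the second decimal.

```python
def prediction_interval(point: float, std_err: float, t_crit: float = 2.00):
    """Return (lower, upper) = point -/+ t_crit * std_err."""
    half_width = t_crit * std_err
    return point - half_width, point + half_width


if __name__ == "__main__":
    # Values from the slide: point prediction 4.488 and SDP 12.056.
    for t_crit in (2.00, 1.999):
        lower, upper = prediction_interval(4.488, 12.056, t_crit)
        print(f"t = {t_crit}: [{lower:.2f}, {upper:.2f}]")
```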
Using EViews to Get Prediction Intervals
• Redefine the workfile range to include a 65th observation so that you can generate an “out-of-sample” prediction
• Double-click dosage to open the spreadsheet
• Then click Edit+/-
• Move to observation 65 (which will say “NA”)
• Type .35 into the workspace bar to enter this value for the 65th observation
• Using the regression equation predicting IMPROVE as a function of DOSAGE, click FORECAST
• Then specify the forecast sample as 65 65
– This uses the value of DOSAGE = 0.35 you just entered
• The forecast output shows the point estimate together with its upper and lower limits
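For readers without EViews, the same out-of-sample forecast can be reproduced by hand. The sketch below is plain Python, not EViews code: it fits the line by least squares and then forecasts at a new predictor value that was not in the sample, which is what adding observation 65 accomplishes. The function name `fit_and_forecast` is hypothetical, and the data in the usage example are made-up toy numbers, not the Ritalin data.

```python
import math


def fit_and_forecast(x, y, x_new):
    """Fit y = b0 + b1*x by least squares and forecast at x_new.

    Returns (forecast, sdp), where sdp is the standard error for
    predicting an individual value at x_new.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual variance with n - 2 degrees of freedom.
    s2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    forecast = b0 + b1 * x_new
    sdp = math.sqrt(s2 + s2 / n + s2 * (x_new - x_bar) ** 2 / sxx)
    return forecast, sdp


if __name__ == "__main__":
    # Made-up toy numbers for illustration only; NOT the Ritalin data.
    dosage = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
    improve = [1.0, 0.5, 3.0, 4.5, 4.0, 7.0]
    forecast, sdp = fit_and_forecast(dosage, improve, x_new=0.35)
    t_crit = 2.776  # approximate 97.5th percentile of t with 4 degrees of freedom
    print("upper limit:   ", forecast + t_crit * sdp)
    print("point estimate:", forecast)
    print("lower limit:   ", forecast - t_crit * sdp)
```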
[Figure: Prediction Intervals for Improvement from DOSAGE. Horizontal axis: DOSAGE (0 to 0.8). Series: Upper 95% PI, Predicted IMPROVE, Lower 95% PI. The point DOSAGE = 0.35 is marked.]
Std Error of Prediction for the Average
• Standard error for predicting the expected or average value
– A group of children have been administered a DOSAGE of 0.35 mg/kg. What value do you predict for the average IMPROVE score of these children on the test?
$$\widehat{\text{IMPROVE}}_i = 0.226 + 12.18 \times 0.35 = 4.488$$
• Now let’s consider the standard error
– The linear model is: $Y_i = \beta_0 + \beta_1 X_i + e_i$
– We are predicting: $E[Y_0] = \beta_0 + \beta_1 X_0$
– Our prediction is: $\hat{Y}_i = b_0 + b_1 X_i$
– Two sources of uncertainty:
  • Error in estimating $\beta_0$
  • Error in estimating $\beta_1$
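• Standard error of $E(Y_i)$:
$$\text{SEP} = \left[\, \frac{s^2}{n} + \frac{s^2 (X_i - \bar{X})^2}{\sum x_i^2} \,\right]^{1/2}$$
(the two terms are, in order: intercept uncertainty, slope uncertainty)
• A 100(1–α)% confidence interval for $E(Y_i)$ is:
$$[\hat{Y}_i - t_{\alpha/2}\,\text{SEP},\ \hat{Y}_i + t_{\alpha/2}\,\text{SEP}]$$
where $\Pr\{t_{n-2} > t_{\alpha/2}\} = \alpha/2$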
• In the Ritalin case, for children getting a DOSAGE of 0.35 mg/kg, we have SEP = 1.585, so a 95% confidence interval for
$E(\text{IMPROVE}_i \mid \text{DOSAGE} = 0.35, b_0, b_1)$ is
[4.488 – 2.00(1.585), 4.488 + 2.00(1.585)]
≈ [1.319, 7.657]
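A short Python sketch of the same calculation for the average, assuming the rounded summary values that appear in these slides (s = 11.951, n = 64, mean DOSAGE = 0.257, SE(b1) = 5.723). The names `sep` and `mean_confidence_interval` are illustrative. Because the inputs are rounded, the computed SEP comes out near 1.586 rather than exactly 1.585.

```python
import math


def sep(s: float, n: int, x: float, x_bar: float, se_b1: float) -> float:
    """Standard error of the estimated mean response at x."""
    return math.sqrt(s ** 2 / n + se_b1 ** 2 * (x - x_bar) ** 2)


def mean_confidence_interval(point: float, std_err: float, t_crit: float = 2.00):
    """Return (lower, upper) for the expected value at the chosen x."""
    return point - t_crit * std_err, point + t_crit * std_err


if __name__ == "__main__":
    se = sep(s=11.951, n=64, x=0.35, x_bar=0.257, se_b1=5.723)
    print("SEP:", round(se, 3))
    print("95% CI:", mean_confidence_interval(4.488, se))
```

Unlike the individual prediction, there is no error-term variance here, which is why the interval is so much tighter.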
[Figure: Prediction Intervals for Expected Improvement from DOSAGE. Horizontal axis: DOSAGE (0 to 0.8). Series: Upper 95% PI, Predicted IMPROVE, Lower 95% PI.]
Note: the intervals for expected improvement are much narrower, and the curvature is more apparent
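Predictions and EViews for DOSAGE = 0.35
$$(\text{SEP})^2 + (\text{SE of Regression})^2 = (\text{SDP})^2$$
$$[1.585^2 + 11.951^2]^{1/2} = 12.056$$
Note that SEP and SDP depend on $X_i = 0.35$
$$\begin{aligned}
(\text{SEP})^2 &= s^2 / n + (\text{SE}(b_1))^2 (X_i - \bar{X})^2 \\
&= 11.951^2 / 64 + (5.723)^2 (0.35 - 0.257)^2 \\
&= 2.513 \;\Rightarrow\; \text{SEP} = 1.585
\end{aligned}$$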
Predictions and EViews
• The range of DOSAGE values in the data is 0 to 0.71
• Given this sample, the predictive model is:
$$\widehat{\text{IMPROVE}}_i = 0.226 + 12.18 \times 0.35 = 4.488$$
• We have no support from the data on whether this relation extends outside of the sample range (e.g., to dosages > 0.71 mg/kg)
• Predicting outside the sample range is called extrapolation
• Extrapolation is ill-advised and subject to much criticism
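One simple way to guard against accidental extrapolation is to check each requested dosage against the sample range before predicting. The sketch below is in Python and assumes the range of 0 to 0.71 mg/kg and the rounded coefficients given in these slides; the function names are hypothetical.

```python
SAMPLE_MIN = 0.0    # smallest DOSAGE in the sample (mg/kg)
SAMPLE_MAX = 0.71   # largest DOSAGE in the sample (mg/kg)


def is_extrapolation(dosage: float) -> bool:
    """True when the dosage lies outside the range observed in the sample."""
    return not (SAMPLE_MIN <= dosage <= SAMPLE_MAX)


def guarded_prediction(dosage: float) -> float:
    """Predict IMPROVE, refusing to extrapolate beyond the sample range."""
    if is_extrapolation(dosage):
        raise ValueError(
            f"DOSAGE {dosage} is outside the sample range "
            f"[{SAMPLE_MIN}, {SAMPLE_MAX}]"
        )
    return 0.226 + 12.18 * dosage


if __name__ == "__main__":
    print(guarded_prediction(0.35))   # inside the sample range
    print(is_extrapolation(0.9))      # outside the sample range
```

Raising an error is a deliberately strict choice; a warning would also work if out-of-range predictions are sometimes wanted with a caveat.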
Predictions of IMPROVE from EViews
Generate predicted values, “predict”, and the standard deviation of the prediction, “sdp”
Prediction Interval for IMPROVE from EViews
Generate upper and lower limits for the 95% prediction interval, “predup” and “preddown”
Create a “group” of “dosage”, “predict”, “predup”, and “preddown”
Graph as an XY line, with one X against all Y’s
Make sure that the predictor variable, DOSAGE, is the first one in the set
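The series behind this graph can be sketched outside EViews as well. The Python function below builds rows of dosage, predicted value, and lower and upper 95% limits over a grid of dosages. It assumes the rounded summary values reported in these slides and a critical value of 2.00; the function name `interval_bands` is illustrative, and the column names mirror the EViews series names used above.

```python
import math


def interval_bands(dosages, b0=0.226, b1=12.18, s=11.951, n=64,
                   x_bar=0.257, se_b1=5.723, t_crit=2.00):
    """Return rows of (dosage, predict, preddown, predup) for each dosage."""
    rows = []
    for x in dosages:
        predict = b0 + b1 * x
        sdp = math.sqrt(s ** 2 + s ** 2 / n + se_b1 ** 2 * (x - x_bar) ** 2)
        rows.append((x, predict, predict - t_crit * sdp, predict + t_crit * sdp))
    return rows


if __name__ == "__main__":
    grid = [i / 10 for i in range(8)]   # 0.0, 0.1, ..., 0.7
    print("dosage  predict preddown    predup")
    for x, predict, preddown, predup in interval_bands(grid):
        print(f"{x:6.1f} {predict:8.3f} {preddown:8.3f} {predup:8.3f}")
```

Plotting the last three columns against the first, with any charting tool, gives the band; it is narrowest at the sample mean of DOSAGE and widens slowly away from it.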
Note that the 95% prediction interval for IMPROVE covers a wide range of outcomes for individual children
There is no assurance that any one child will improve if given Ritalin
Standard Error of the Regression Line for IMPROVE from EViews
We will calculate the standard error of the regression line (SEP)
First, save the standard error of the regression, SER
Next, derive SEP from SDP and SER: $\text{SEP} = (\text{SDP}^2 - \text{SER}^2)^{1/2}$
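This step relies on the identity (SEP)² + (SER)² = (SDP)². A minimal Python sketch for a single observation follows; the name `sep_from_sdp` is illustrative, and in EViews the same calculation would be applied to whole series. Note that subtracting two nearly equal squares amplifies rounding: with the three-decimal values from the slides (SDP = 12.056, SER = 11.951) the result is about 1.588, not exactly the 1.585 obtained from unrounded values.

```python
import math


def sep_from_sdp(sdp: float, ser: float) -> float:
    """Recover SEP from SDP and SER using SDP^2 = SEP^2 + SER^2."""
    diff = sdp ** 2 - ser ** 2
    if diff < 0:
        raise ValueError("SDP must be at least as large as SER")
    return math.sqrt(diff)


if __name__ == "__main__":
    print(round(sep_from_sdp(sdp=12.056, ser=11.951), 3))
```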
Standard Errors Around the Regression Line for IMPROVE
Generate upper and lower limits for the 95% confidence band around the regression line, “regrup” and “regrdown”
Create a “group” of “dosage”, “predict”, “regrup”, and “regrdown”
Graph as a simple scatter
Make sure that the predictor variable, DOSAGE, is the first one in the set
Note that the 95% confidence interval for the regression line is much narrower than the prediction interval for an individual child
For the dosages used in this experiment, the entire interval covers positive improvement
Links
Ritalin Data: http://schwert.ssb.rochester.edu/a425/a425_ritalin.wf1
Return to APS 425 Home Page: http://schwert.ssb.rochester.edu/a425/a425main.htm