
Prediction APS 425 - Advanced Managerial Data Analysis

(c) Prof. G. William Schwert, 2001-2015

APS 425 Fall 2015

Simple Linear Regression:

Prediction

Instructor: G. William Schwert, 585-275-2470

[email protected]

Ciba-Geigy Ritalin Experiment

• Ritalin is tested to see if it helps with Central Auditory Processing Disorder (CAPD)

– Similar symptoms to ADD/ADHD

• Experiment:

– “Randomly” select 64 children

– All receive auditory test

– 32 (control group) receive no drug (or placebo?)

– 32 (treatment group) receive varying doses of Ritalin

– All children are tested a second time



Ciba-Geigy Ritalin Experiment

• DOSAGEi = amount of Ritalin received by child i

– Measured as mg of Ritalin per kg of body weight

• IMPROVEi = child’s 2nd test score – 1st test score

– Dataset A425_ritalin.wf1 also contains:

• AGE of child in months

• Gender (FEMALE = 1, for girls)

Predictions



• Predictive model:

$\widehat{\text{IMPROVE}}_i = 0.226 + 12.18\,\text{DOSAGE}_i$ = the estimate of $E[\text{IMPROVE}_i \mid \text{DOSAGE}_i, b_0, b_1]$

• What is your estimate of the average IMPROVE score for all children who receive a dosage of 0.35 mg/kg?

$\widehat{\text{IMPROVE}}_i = 0.226 + 12.18 \times 0.35 = 4.488$

• This question asks about the average or expected value for all children who get a DOSAGE of 0.35 mg/kg.
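The point prediction is a single line of arithmetic. Here is a minimal Python sketch, assuming the rounded coefficients shown on the slide; the function name is hypothetical. With these rounded coefficients the result is about 4.489, so the slide's 4.488 presumably comes from unrounded estimates.

```python
# Coefficients as printed on the slide (rounded to the digits shown there).
B0 = 0.226   # intercept estimate b0
B1 = 12.18   # slope estimate b1


def predict_improve(dosage_mg_per_kg: float) -> float:
    """Fitted IMPROVE for a given DOSAGE (hypothetical helper name)."""
    return B0 + B1 * dosage_mg_per_kg


# The same number answers both questions: the average for all children
# at this dosage and the prediction for one particular child.
print(round(predict_improve(0.35), 3))
```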

Predictions

• A given child has been administered a DOSAGE of 0.35 mg/kg. What value do you predict for the child’s IMPROVE score?

$\widehat{\text{IMPROVE}}_i = 0.226 + 12.18 \times 0.35 = 4.488$

• This question asks you to predict the value for an individual child who gets a DOSAGE of 0.35 mg/kg.

Predictions



• The predicted value for an individual child equals the estimated expected value for the population of all children with DOSAGE = 0.35 mg/kg (see the previous two slides)

• However, standard errors are different!

• Let’s derive them next

Predictions

Std Error of Predictions

• Standard error for predicting an individual value

– The linear model is:

$Y_i = \beta_0 + \beta_1 X_i + e_i$

– Our prediction is:

$\hat{Y}_i = b_0 + b_1 X_i$

– Three sources of error:

• Error in estimating $\beta_0$

• Error in estimating $\beta_1$

• Error in estimating $e_i$



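• Standard error of $\hat{Y}_i$:

$SDP = \left[\, s^2 + \dfrac{s^2}{n} + \dfrac{s^2 (X_i - \bar{X})^2}{\sum x_i^2} \,\right]^{1/2}$

The three terms are, in order: error term uncertainty, intercept uncertainty, and slope uncertainty

where $s^2 = \sum e_i^2 / (n-2)$ is the residual variance and $x_i = X_i - \bar{X}$

and $s^2 / \sum x_i^2$ is the variance of $b_1$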
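A short sketch of where the three terms come from. It assumes the usual OLS conditions (errors with mean zero, constant variance $\sigma^2$, uncorrelated across observations) and writes $X_0$ for the predictor value of the new observation; that notation, and $\sigma^2$ for the quantity estimated by $s^2$, are assumptions made here for the derivation.

```latex
\begin{aligned}
\hat{Y}_0 &= b_0 + b_1 X_0 = \bar{Y} + b_1 (X_0 - \bar{X}) \\[4pt]
\operatorname{Var}(\hat{Y}_0)
  &= \frac{\sigma^2}{n} + \frac{\sigma^2 (X_0 - \bar{X})^2}{\sum x_i^2}
  \qquad \text{since } \operatorname{Cov}(\bar{Y}, b_1) = 0 \\[4pt]
\operatorname{Var}(Y_0 - \hat{Y}_0)
  &= \sigma^2 + \frac{\sigma^2}{n} + \frac{\sigma^2 (X_0 - \bar{X})^2}{\sum x_i^2}
  \qquad \text{since } e_0 \text{ is uncorrelated with } \hat{Y}_0
\end{aligned}
```

Replacing $\sigma^2$ with $s^2$ and taking the square root of the last line gives SDP.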

Std Error of Predictions

Prediction Intervals

• A 100(1–α)% confidence interval for $Y_i$ is:

$[\hat{Y}_i - t_{\alpha/2}\,SDP,\ \hat{Y}_i + t_{\alpha/2}\,SDP]$

where $\Pr\{t_{n-2} > t_{\alpha/2}\} = \alpha/2$

• In the Ritalin case with one child getting a DOSAGE of 0.35 mg/kg, we have SDP = 12.056, so a 95% confidence interval for $\text{IMPROVE}_i$ is

[4.488 – 2.00(12.056), 4.488 + 2.00(12.056)]

= [– 19.61, 28.59]
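The interval arithmetic can be checked with a few lines of Python. This is a minimal sketch with a hypothetical helper name; it uses the rounded critical value 2.00 from the slide. The endpoints it produces differ from the slide's in the second decimal, presumably because the slide's figures were computed with the unrounded critical value for 62 degrees of freedom, which is slightly below 2.00.

```python
from typing import Tuple


def interval(point: float, std_error: float, t_crit: float) -> Tuple[float, float]:
    """Symmetric interval: point estimate plus or minus t_crit standard errors."""
    half_width = t_crit * std_error
    return point - half_width, point + half_width


# Values taken from the slide: point prediction, SDP, and t rounded to 2.00.
lower, upper = interval(point=4.488, std_error=12.056, t_crit=2.00)
print(f"[{lower:.2f}, {upper:.2f}]")
```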



Using Eviews to Get Prediction Intervals

• Redefine the workfile range so that you can generate an “out-of-sample” prediction


• Double-click dosage to open the spreadsheet

• Then click Edit+/-

• Move to observation 65 (which will say “NA”)

• Type .35 into the workspace bar to enter this value for the 65th observation




• Using the regression equation predicting IMPROVE as a function of DOSAGE, click FORECAST

• Then specify the forecast sample as 65 65, the observation that holds the value of DOSAGE = 0.35 you just entered
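Outside Eviews, the same out-of-sample forecast can be sketched in plain Python: fit the line by least squares, then evaluate the point prediction and SDP at a new X value. The function names and the toy data below are made up for illustration; they are not the Ritalin dataset and will not reproduce the slide's numbers.

```python
import math


def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x; returns the pieces needed for forecasting."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)            # sum of squared deviations of X
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)                                   # residual variance
    return {"n": n, "x_bar": x_bar, "sxx": sxx, "b0": b0, "b1": b1, "s2": s2}


def forecast(fit, x0):
    """Point prediction and standard error (SDP) for one new observation at x0."""
    point = fit["b0"] + fit["b1"] * x0
    sep_sq = fit["s2"] / fit["n"] + fit["s2"] * (x0 - fit["x_bar"]) ** 2 / fit["sxx"]
    sdp = math.sqrt(fit["s2"] + sep_sq)
    return point, sdp


# Toy data, invented purely to exercise the functions.
toy_x = [0.0, 1.0, 2.0, 3.0]
toy_y = [1.0, 3.0, 2.0, 5.0]
toy_fit = fit_simple_ols(toy_x, toy_y)
toy_point, toy_sdp = forecast(toy_fit, 1.5)
print(toy_point, toy_sdp)
```

Appending observation 65 in Eviews plays the same role as passing `x0` here: it supplies a predictor value with no observed outcome.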

[Screenshot: Eviews forecast output, annotated with the upper limit, point estimate, and lower limit]

[Figure: Prediction Intervals for Improvement from DOSAGE. Horizontal axis: DOSAGE (0 to 0.8); vertical axis: -30 to 40. Series: Upper 95% PI, Predicted IMPROVE, Lower 95% PI. DOSAGE = 0.35 is marked.]



Std Error of Prediction for the Average

• Standard error for predicting the expected or average value

– A group of children has been administered a DOSAGE of 0.35 mg/kg. What value do you predict for the average IMPROVE score of these children on the test?

$\widehat{\text{IMPROVE}}_i = 0.226 + 12.18 \times 0.35 = 4.488$


• Now let’s consider the standard error

– The linear model is: $Y_i = \beta_0 + \beta_1 X_i + e_i$

– We are predicting: $E[Y_0] = \beta_0 + \beta_1 X_0$

– Our prediction is: $\hat{Y}_i = b_0 + b_1 X_i$

– Two sources of uncertainty:

• Error in estimating $\beta_0$

• Error in estimating $\beta_1$



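• Standard error of $E(Y_i)$:

$SEP = \left[\, \dfrac{s^2}{n} + \dfrac{s^2 (X_i - \bar{X})^2}{\sum x_i^2} \,\right]^{1/2}$

The two terms are, in order: intercept uncertainty and slope uncertainty

• A 100(1–α)% confidence interval for $E(Y_i)$ is:

$[\hat{Y}_i - t_{\alpha/2}\,SEP,\ \hat{Y}_i + t_{\alpha/2}\,SEP]$

where $\Pr\{t_{n-2} > t_{\alpha/2}\} = \alpha/2$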


• In the Ritalin case, for children getting a DOSAGE of 0.35 mg/kg, we have SEP = 1.585, so a 95% confidence interval for $E(\text{IMPROVE}_i \mid \text{DOSAGE} = 0.35, b_0, b_1)$ is

[4.488 – 2.00(1.585), 4.488 + 2.00(1.585)]

= [1.319, 7.657]




[Figure: Prediction Intervals for Expected Improvement from DOSAGE. Horizontal axis: DOSAGE (0 to 0.8); vertical axis: -10 to 20. Series: Upper 95% PI, Predicted IMPROVE, Lower 95% PI.]

Note: the intervals for expected improvement are much narrower, and their curvature is more apparent

Predictions and Eviews for DOSAGE = 0.35

$(SEP)^2 + (\text{SE of Regression})^2 = (SDP)^2$

$[1.585^2 + 11.951^2]^{1/2} = 12.056$

Note that SEP and SDP depend on $X_i = 0.35$

$(SEP)^2 = s^2 / n + (SE(b_1))^2 (X_i - \bar{X})^2$

$= 11.951^2 / 64 + (5.723)^2 (0.35 - 0.257)^2$

$= 2.513 \Rightarrow SEP = 1.585$
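The arithmetic on this slide can be reproduced in a few lines. This is a sketch using only the rounded summary values printed above, so the results agree with the slide to about two decimal places rather than exactly.

```python
import math

# Summary values as printed on the slide (all rounded).
S = 11.951      # standard error of the regression
N = 64          # number of children
SE_B1 = 5.723   # standard error of the slope b1
X_BAR = 0.257   # sample mean of DOSAGE
X0 = 0.35       # dosage at which we predict

# Variance of the estimated regression line at X0: intercept part plus slope part.
sep_sq = S ** 2 / N + SE_B1 ** 2 * (X0 - X_BAR) ** 2
sep = math.sqrt(sep_sq)

# Adding the error-term variance gives the variance for an individual prediction.
sdp = math.sqrt(sep_sq + S ** 2)

print(round(sep, 2), round(sdp, 2))
```

Because $s^2$ dominates the sum, SDP is only slightly larger than the standard error of the regression itself, while SEP is far smaller.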



Predictions and Eviews

• The range of DOSAGE values in the data is 0 to 0.71

• Given this sample, the predictive model is:

$\widehat{\text{IMPROVE}}_i = 0.226 + 12.18 \times 0.35 = 4.488$

• We have no support from the data for whether this relation extends outside of the sample range (e.g., to dosages > 0.71 mg/kg)

• Predicting outside the sample range is called extrapolation

• Extrapolation is ill-advised and subject to much criticism
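One way to make the extrapolation caution operational is to have the prediction code warn whenever a requested dosage falls outside the sampled range. This is a sketch only; the function name is hypothetical, and the bounds are the 0 to 0.71 range quoted above.

```python
import warnings

B0, B1 = 0.226, 12.18                # rounded coefficients from the slides
DOSAGE_MIN, DOSAGE_MAX = 0.0, 0.71   # range of DOSAGE observed in the sample


def predict_with_range_check(dosage: float) -> float:
    """Return the fitted IMPROVE, warning if the dosage lies outside the sample range."""
    if not (DOSAGE_MIN <= dosage <= DOSAGE_MAX):
        warnings.warn(
            f"DOSAGE {dosage} is outside the sample range "
            f"[{DOSAGE_MIN}, {DOSAGE_MAX}]; this prediction is an extrapolation.",
            UserWarning,
        )
    return B0 + B1 * dosage


inside = predict_with_range_check(0.35)   # within the data, no warning
```

The function still returns a number for out-of-range dosages; the warning only flags that the data give no support for it.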

Predictions of IMPROVE from Eviews

Generate predicted values, “predict”, and the standard deviation of the prediction, “sdp”



Prediction Interval for IMPROVE from Eviews

Generate upper and lower limits for 95% prediction interval, “predup” and “preddown”

Create a “group” of “dosage”, “predict”, “predup”, and “preddown”
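The same four series can be sketched in Python for a grid of dosages. The series names follow the Eviews names above; the function name is hypothetical, the summary values are the rounded ones from the slides, and 2.00 stands in for the t critical value.

```python
import math

B0, B1 = 0.226, 12.18                              # rounded coefficients
S, N, SE_B1, X_BAR = 11.951, 64, 5.723, 0.257      # rounded summary values
T_CRIT = 2.00                                      # approximate 95% critical value


def prediction_band(dosage: float):
    """Return (predict, preddown, predup) for an individual child at this dosage."""
    predict = B0 + B1 * dosage
    sdp = math.sqrt(S ** 2 + S ** 2 / N + SE_B1 ** 2 * (dosage - X_BAR) ** 2)
    return predict, predict - T_CRIT * sdp, predict + T_CRIT * sdp


# One row per dosage, ready to plot with dosage on the horizontal axis.
for step in range(8):
    dosage = step / 10
    predict, preddown, predup = prediction_band(dosage)
    print(f"{dosage:.1f}  {preddown:8.2f}  {predict:8.2f}  {predup:8.2f}")
```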




Graph: XY line, one X against all Y’s

Make sure that the predictor variable, DOSAGE, is the first one in the set


Note that the 95% prediction interval for IMPROVE covers a wide range of outcomes for individual students

No assurance that any one child will improve if given Ritalin




Standard Error of the Regression Line for IMPROVE from Eviews

We will calculate the standard error of the regression line (SEP)

First, save the standard error of the regression, SER

Next, derive SEP from SDP and SER

Standard Errors Around the Regression Line for IMPROVE

Generate upper and lower limits for the 95% confidence interval around the regression line, “regrup” and “regrdown”
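Since the forecast step reports SDP rather than SEP, SEP has to be backed out from SDP and SER. Here is a minimal sketch of that step at DOSAGE = 0.35, with a hypothetical function name and the rounded values from the slides. Because SDP and SER are close in size, the subtraction is sensitive to rounding, so the result matches the slide's 1.585 only to about two decimals.

```python
import math


def sep_from_sdp(sdp: float, ser: float) -> float:
    """Back out SEP from SDP and SER using SDP^2 = SEP^2 + SER^2."""
    # max() guards against a tiny negative difference caused by rounding.
    return math.sqrt(max(sdp ** 2 - ser ** 2, 0.0))


PREDICT = 4.488   # fitted IMPROVE at DOSAGE = 0.35, from the slides
T_CRIT = 2.00     # approximate 95% critical value

sep = sep_from_sdp(sdp=12.056, ser=11.951)
regrdown = PREDICT - T_CRIT * sep
regrup = PREDICT + T_CRIT * sep
print(round(sep, 2), round(regrdown, 2), round(regrup, 2))
```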



Create a “group” of “dosage”, “predict”, “regrup”, and “regrdown”


Graph simple scatter

Make sure that the predictor variable, DOSAGE, is the first one in the set




Note that the 95% confidence interval for the regression line is much narrower

For the dosages used in this experiment the entire interval covers positive improvement


Links

Ritalin Data: http://schwert.ssb.rochester.edu/a425/a425_ritalin.wf1

Return to APS 425 Home Page: http://schwert.ssb.rochester.edu/a425/a425main.htm
