BCOR 1020 Business Statistics
Lecture 25 – April 22, 2008
Overview
Chapter 12 – Linear Regression
– Ordinary Least Squares Formulas
– Tests for Significance
– Analysis of Variance: Overall Fit
– Confidence and Prediction Intervals for Y
– Example(s)
Chapter 12 – Ordinary Least Squares Formulas
• The ordinary least squares method (OLS) estimates the slope and intercept of the regression line so that the residuals are as small as possible.
• Recall that the residuals are the differences between observed y-values and the fitted y-values on the line…
• The sum of the residuals = 0 for any line that passes through the point of means (x̄, ȳ), so this sum cannot tell us which line is best…
• So, we consider the sum of the squared residuals (the SSE)…
Slope and Intercept:
$$e_i = y_i - \hat{y}_i, \qquad SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
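To see why the plain sum of residuals is useless for choosing a line while the SSE is not, here is a small sketch on hypothetical toy data (not from the text). Every candidate line is forced through (x̄, ȳ), so each has residuals summing to zero, yet their SSE values differ.

```python
# Hypothetical toy data (not from the text), used only to illustrate residuals.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

def residuals(b0, b1, xs, ys):
    """Return e_i = y_i - yhat_i for the line yhat = b0 + b1 * x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def sse(b0, b1, xs, ys):
    """Sum of squared residuals for the line yhat = b0 + b1 * x."""
    return sum(e ** 2 for e in residuals(b0, b1, xs, ys))

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# Any slope works: choosing b0 = y_bar - b1 * x_bar puts the line through the means.
for b1 in (0.0, 0.6, 2.0):
    b0 = y_bar - b1 * x_bar
    total = sum(residuals(b0, b1, xs, ys))
    print(f"b1={b1:.1f}  sum(e)={total:+.6f}  SSE={sse(b0, b1, xs, ys):.2f}")
```

For these toy data the residual sums are all zero (up to floating-point noise), while the SSE is 6.00, 2.40 and 22.00 for slopes 0.0, 0.6 and 2.0; the smallest SSE belongs to the OLS slope of 0.6.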
Chapter 12 – Ordinary Least Squares Formulas
• To find our OLS estimators, we need to find the values of b0 and b1 that minimize the SSE…
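• The OLS estimator for the slope is:
$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \quad \text{or} \quad b_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}$$
• The OLS estimator for the intercept is:
$$b_0 = \bar{y} - b_1 \bar{x}$$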
These are computed by the regression function on your computer or calculator.
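As a sketch of what the regression function computes, the two slope formulas and the intercept formula can be coded directly. The function names and the toy data below are hypothetical, chosen only for illustration.

```python
def ols_fit(xs, ys):
    """Return (b0, b1) using the deviation form of the OLS formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar   # the OLS line passes through (x_bar, y_bar)
    return b0, b1

def ols_slope_computational(xs, ys):
    """Equivalent 'computational' form of the slope (the second formula)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    num = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    den = sum(x * x for x in xs) - n * x_bar ** 2
    return num / den

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical toy data
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = ols_fit(xs, ys)
print(round(b0, 4), round(b1, 4), round(ols_slope_computational(xs, ys), 4))
```

Both slope forms give 0.6 here and the intercept is 2.2; the deviation form is usually preferred in software because it is less prone to round-off when the x-values are large.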
Chapter 12 – Ordinary Least Squares Formulas
Example (Regression Output):
• We will consider the dataset “ShipCost” from your text (12.19 on p.438), which considers the relationship between Number of Orders (X) and Shipping Costs (Y).
• Using MegaStat we can generate a regression output (in handout)…
• Demonstration in Excel…
[Figure: scatter plot of Ship Cost (Y) against Orders (X) with fitted trend line y = 4.9322x - 31.19, R² = 0.6717]
Chapter 12 – Ordinary Least Squares Formulas
Example (Regression Output):

Regression Analysis
  r²           0.672      n           12
  r            0.820      k           1
  Std. Error   599.029    Dep. Var.   Ship Cost (Y)

Regression output                                                       confidence interval
  variables    coefficients   std. error   t (df=10)   p-value   95% lower     95% upper
  Intercept    -31.1895       1,059.8678   -0.029      .9771     -2,392.7222   2,330.3432
  Orders (X)     4.9322           1.0905    4.523      .0011          2.5024       7.3619

ANOVA table
  Source       SS                df    MS               F       p-value
  Regression    7,340,819.5514    1    7,340,819.5514   20.46   .0011
  Residual      3,588,357.1152   10      358,835.7115
  Total        10,929,176.6667   11
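The summary statistics at the top of this output can be recovered from the ANOVA table. A minimal sketch, using only numbers copied from the output above and assuming r takes the sign of the slope (as it does in a bivariate regression):

```python
import math

# Values copied from the ShipCost ANOVA and coefficient tables above.
ssr = 7_340_819.5514    # regression (explained) sum of squares
sse = 3_588_357.1152    # residual (unexplained) sum of squares
sst = 10_929_176.6667   # total sum of squares
n, k = 12, 1            # observations, predictors
b1 = 4.9322             # slope; r has the same sign as b1

r_squared = ssr / sst
r = math.copysign(math.sqrt(r_squared), b1)
std_error = math.sqrt(sse / (n - k - 1))   # n - k - 1 = n - 2 in a bivariate regression

print(f"r^2 = {r_squared:.3f}, r = {r:.3f}, Std. Error = {std_error:.3f}")
# prints: r^2 = 0.672, r = 0.820, Std. Error = 599.029
```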
Chapter 12 – Ordinary Least Squares Formulas
• We want to explain the total variation in Y around its mean (SST, the total sum of squares):
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
• The regression sum of squares (SSR) is the explained variation in Y:
$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$
Assessing Fit:
Chapter 12 – Ordinary Least Squares Formulas
Assessing Fit:
• The error sum of squares (SSE) is the unexplained variation in Y:
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
• If the fit is good, SSE will be relatively small compared to SST.
• A perfect fit is indicated by SSE = 0.
• The magnitude of SSE depends on n and on the units of measurement.
Chapter 12 – Ordinary Least Squares Formulas
Coefficient of Determination:
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \qquad 0 \le R^2 \le 1$$
• Often expressed as a percent, an R² = 1 (i.e., 100%) indicates perfect fit.
• In a bivariate regression, R² = r².
• R² is a measure of relative fit based on a comparison of SSR and SST.
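A quick numerical check that R² = SSR/SST = 1 − SSE/SST and that, in a bivariate regression, R² equals the squared correlation r². The data are hypothetical toy values, not from the text.

```python
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical toy data
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# OLS fit
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in xs]

sst = syy                                              # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))   # unexplained variation

r = sxy / math.sqrt(sxx * syy)   # sample correlation coefficient
print(ssr / sst, 1 - sse / sst, r ** 2)   # all three equal 0.6 up to rounding
```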
Clickers
Suppose you have found the regression model for a given set of bivariate data. If the correlation is r = -0.72, what is the coefficient of determination?
(A) -0.5184
(B) 0.5184
(C) 0.7200
(D) 0.8485
(E) -0.8485
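Worked arithmetic for this clicker, using R² = r² for a bivariate regression:

$$R^2 = r^2 = (-0.72)^2 = 0.5184$$

Squaring removes the sign of r, which is why R² can never be negative; choices (A) and (E) are ruled out immediately.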
Chapter 12 – Test for Significance
• The standard error (syx) is an overall measure of model fit.
Standard Error of Regression:
$$s_{yx} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2}}$$
• If the fitted model’s predictions are perfect (SSE = 0), then syx = 0. Thus, a small syx indicates a better fit.
• Used to construct confidence intervals.
• Magnitude of syx depends on the units of measurement of Y and on data magnitude.
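As a check, apply $s_{yx} = \sqrt{SSE/(n-2)}$ to the ShipCost output, which reports SSE = 3,588,357.1152 with n = 12:

$$s_{yx} = \sqrt{\frac{3{,}588{,}357.1152}{12 - 2}} = \sqrt{358{,}835.7115} \approx 599.029$$

This matches the Std. Error of 599.029 printed in the regression output; note that 358,835.7115 is also the Residual MS in the ANOVA table.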
Chapter 12 – Test for Significance
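Confidence Intervals for Slope and Intercept:
• Standard error of the slope:
$$s_{b_1} = \frac{s_{yx}}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$
• Standard error of the intercept:
$$s_{b_0} = s_{yx} \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$
• Confidence interval for the true slope:
$$b_1 - t_{\alpha/2,n-2}\, s_{b_1} \le \beta_1 \le b_1 + t_{\alpha/2,n-2}\, s_{b_1}$$
• Confidence interval for the true intercept:
$$b_0 - t_{\alpha/2,n-2}\, s_{b_0} \le \beta_0 \le b_0 + t_{\alpha/2,n-2}\, s_{b_0}$$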
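The standard errors and confidence intervals for the slope and intercept can be sketched as below. The function name and toy data are hypothetical, and the critical value $t_{\alpha/2,n-2}$ is passed in (looked up from a t table such as Appendix D) so the sketch needs no statistics library.

```python
import math

def slope_intercept_intervals(xs, ys, t_crit):
    """
    Return OLS estimates, standard errors, and confidence intervals
    b1 +/- t_crit * s_b1 and b0 +/- t_crit * s_b0, where t_crit = t_{alpha/2, n-2}.
    """
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(sse / (n - 2))                      # standard error of regression
    s_b1 = s_yx / math.sqrt(sxx)                         # standard error of the slope
    s_b0 = s_yx * math.sqrt(1 / n + x_bar ** 2 / sxx)    # standard error of the intercept
    return {
        "b1": b1, "s_b1": s_b1, "ci_b1": (b1 - t_crit * s_b1, b1 + t_crit * s_b1),
        "b0": b0, "s_b0": s_b0, "ci_b0": (b0 - t_crit * s_b0, b0 + t_crit * s_b0),
    }

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical toy data
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
# n = 5, so d.f. = 3; t_{.025,3} = 3.182 gives a 95% interval.
result = slope_intercept_intervals(xs, ys, t_crit=3.182)
print(result["ci_b1"], result["ci_b0"])
```

With only five toy points the 95% interval for the slope runs from about -0.30 to 1.50 and so contains 0, a preview of the significance test that follows.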
Chapter 12 – Test for Significance
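• If β1 = 0, then X cannot influence Y and the regression model collapses to a constant β0 plus random error.
• The hypotheses to be tested are:
$$H_0\!: \beta_0 = 0 \;\text{ vs. }\; H_1\!: \beta_0 \ne 0 \qquad\qquad H_0\!: \beta_1 = 0 \;\text{ vs. }\; H_1\!: \beta_1 \ne 0$$
• These are tested in the standard regression output in any statistics package like MegaStat.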
Hypothesis Tests:
Chapter 12 – Test for Significance
• A t test is used with ν = n – 2 degrees of freedom. The test statistics for the slope and intercept are:
$$t_{calc} = \frac{b_1 - 0}{s_{b_1}}, \qquad t_{calc} = \frac{b_0 - 0}{s_{b_0}}$$
Hypothesis Tests:
• $t_{\alpha/2,n-2}$ is obtained from Appendix D or Excel for a given α.
• Reject H0 if $|t_{calc}| > t_{\alpha/2,n-2}$ or if p-value < α.
• The p-value is provided in the regression output.
Chapter 12 – Test for Significance
Example (Regression Output):
• Let’s revisit the regression output from the dataset “ShipCost” from your text (12.19 on p.438), which considers the relationship between Number of Orders (X) and Shipping Costs (Y).
• Go through tests for significance on β0 and β1.
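The two tests can be reproduced from the coefficients and standard errors printed in the ShipCost output. This sketch assumes α = 0.05 (two-tailed) with d.f. = 10; Appendix D gives $t_{.025,10}$ = 2.228, and a few extra digits are used so the intervals line up with MegaStat's.

```python
# Coefficients and standard errors as printed in the ShipCost regression output.
estimates = {
    "Intercept": (-31.1895, 1059.8678),
    "Orders (X)": (4.9322, 1.0905),
}
# t_{.025,10}: Appendix D lists 2.228; extra digits bring the CI closer to the output.
t_crit = 2.228139

results = {}
for name, (b, s_b) in estimates.items():
    t_calc = (b - 0) / s_b                      # test statistic for H0: beta = 0
    ci = (b - t_crit * s_b, b + t_crit * s_b)   # 95% confidence interval
    reject = abs(t_calc) > t_crit               # two-tailed decision at alpha = 0.05
    results[name] = (t_calc, ci, reject)
    print(f"{name:<11} t = {t_calc:6.3f}  95% CI = ({ci[0]:,.4f}, {ci[1]:,.4f})  reject H0: {reject}")
```

The slope's t of about 4.523 exceeds 2.228, so β1 = 0 is rejected, while the intercept's t of about -0.029 is nowhere near significant; the intervals agree with the output up to rounding of the printed coefficients. The printed p-values (.0011 and .9771) lead to the same decisions.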
Chapter 12 – Analysis of Variance
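• To explain the variation in the dependent variable around its mean, use the formula
$$(y_i - \bar{y}) = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)$$
Decomposition of Variance:
• This same decomposition for the sums of squares is
$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
• The decomposition of variance is written as
SST (total variation around the mean) = SSR (variation explained by the regression) + SSE (unexplained or error variation)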
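The sums-of-squares identity relies on the line being the least-squares line. This sketch (hypothetical toy data) checks SST = SSR + SSE for the OLS fit and shows it failing for another line that also passes through (x̄, ȳ).

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical toy data
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

def sums_of_squares(b0, b1):
    """Return (SST, SSR, SSE) for the line yhat = b0 + b1 * x."""
    y_hat = [b0 + b1 * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))
    return sst, ssr, sse

# OLS slope and intercept
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

ols = sums_of_squares(b0, b1)
other = sums_of_squares(y_bar - 2.0 * x_bar, 2.0)   # through the means, but not OLS

print("OLS:   SST = %.2f, SSR + SSE = %.2f" % (ols[0], ols[1] + ols[2]))
print("Other: SST = %.2f, SSR + SSE = %.2f" % (other[0], other[1] + other[2]))
```

For the OLS line both sides equal 6.00; for the steeper line SSR + SSE is 62.00, because the cross-product term only vanishes for the least-squares fit. The ShipCost ANOVA table satisfies the identity too: 7,340,819.5514 + 3,588,357.1152 = 10,929,176.6666, which matches SST = 10,929,176.6667 up to rounding.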
Chapter 12 – Analysis of Variance
• For a bivariate regression, the F statistic is
$$F = \frac{MSR}{MSE} = \frac{SSR/1}{SSE/(n-2)}$$
F Statistic for Overall Fit:
• For a given sample size, a larger F statistic indicates a better fit.
• Reject H0 (in a bivariate regression, H0: β1 = 0) if F > F1,n-2 from Appendix F for a given significance level α, or if p-value < α.
Chapter 12 – Analysis of Variance
Example (Regression Output):
• Let’s revisit the regression output from the dataset “ShipCost” from your text (12.19 on p.438), which considers the relationship between Number of Orders (X) and Shipping Costs (Y).
• Go through the Analysis of Variance (ANOVA) to assess overall fit.
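The ShipCost ANOVA can be checked with a short sketch that uses only the sums of squares and degrees of freedom from the output. The critical value F1,10 = 4.96 (α = 0.05) is taken from an F table and is an input here, not something computed.

```python
# Sums of squares and degrees of freedom from the ShipCost ANOVA table.
ssr, df_reg = 7_340_819.5514, 1
sse, df_res = 3_588_357.1152, 10

msr = ssr / df_reg        # mean square due to regression
mse = sse / df_res        # mean square error (residual MS)
f_stat = msr / mse

t_slope = 4.9322 / 1.0905   # t statistic for the slope from the coefficient table
f_crit = 4.96               # F_{1,10} at alpha = 0.05, from an F table

print(f"F = {f_stat:.2f}, t^2 = {t_slope ** 2:.2f}, reject H0: {f_stat > f_crit}")
```

F comes out to about 20.46, matching the output, and it equals the square of the slope's t statistic (up to rounding of the printed coefficients). In a bivariate regression the F test and the two-tailed t test on β1 are the same test.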
Chapter 12 – Example
Example (Exam Scores):
• We will consider the dataset “ExamScores” from your text (Table 12.3 on p.434), which considers the relationship between Study Hours (X) and Exam Scores (Y).
• Generate MegaStat regression output.
• Output on Overhead…
Clickers
If a randomly selected student had studied 12 hours for this exam, what score would this model predict (to the nearest %)?
(A) 51%
(B) 61%
(C) 73%
(D) 82%
Clickers
Find the p-value for the hypothesis test
H0: β1 = 0
H1: β1 ≠ 0
(A) 0.0012
(B) 0.0520
(C) 0.3940
(D) 1.9641
Clickers
Recall from Tuesday’s lecture, the critical value for testing whether the correlation is significant is given by
$$r_\alpha = \frac{t_{\alpha/2,n-2}}{\sqrt{t_{\alpha/2,n-2}^2 + n - 2}}$$
Compute the critical value and determine whether the correlation is significant using α = 10%.
(A) Yes, r is significant.
(B) No, r is not significant.
Clickers – Work…
Since n = 10 and α = 10%, $t_{\alpha/2,n-2} = t_{.05,8} = 1.860$. From the output, r = 0.628.
$$r_\alpha = \frac{t_{\alpha/2,n-2}}{\sqrt{t_{\alpha/2,n-2}^2 + n - 2}} = \frac{1.860}{\sqrt{1.860^2 + 10 - 2}} = 0.549$$
Since |r| = 0.628 > r_α = 0.549, we can reject H0: ρ = 0 in favor of H1: ρ ≠ 0.
Or, using
$$T^* = r\sqrt{\frac{n-2}{1-r^2}} = 0.628\sqrt{\frac{10-2}{1-0.628^2}} = 2.282$$
Since |T*| > $t_{\alpha/2,n-2} = t_{.05,8} = 1.860$, we reach the same conclusion. The correlation is significant.
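Both routes in the worked clicker can be reproduced in a few lines. The inputs n = 10, r = 0.628 and $t_{.05,8}$ = 1.860 come from the slide; the critical-value formula $r_\alpha = t/\sqrt{t^2 + n - 2}$ and the statistic $T^* = r\sqrt{(n-2)/(1-r^2)}$ are the two used above.

```python
import math

n, r = 10, 0.628
t_crit = 1.860   # t_{.05,8}: alpha = 10%, two-tailed, d.f. = n - 2 = 8

# Route 1: compare |r| with the critical correlation r_alpha
r_crit = t_crit / math.sqrt(t_crit ** 2 + n - 2)

# Route 2: convert r to a t statistic and compare with t_crit
t_star = r * math.sqrt((n - 2) / (1 - r ** 2))

print(f"r_crit = {r_crit:.3f}, T* = {t_star:.3f}")   # r_crit = 0.549, T* = 2.282
print("significant" if abs(r) > r_crit else "not significant")
```

The two routes always agree, since |r| > r_α is algebraically equivalent to |T*| > $t_{\alpha/2,n-2}$.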
Chapter 12 – Confidence & Prediction Intervals for Y
• The regression line is an estimate of the conditional mean of Y.
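• An interval estimate is used to show a range of likely values around the point estimate ŷ.
• Confidence interval for the conditional mean of Y:
$$\hat{y}_i \pm t_{\alpha/2,n-2}\, s_{yx} \sqrt{\frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$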
How to Construct an Interval Estimate for Y
Chapter 12 – Confidence & Prediction Intervals for Y
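How to Construct an Interval Estimate for Y
• Prediction interval for individual values of Y is
$$\hat{y}_i \pm t_{\alpha/2,n-2}\, s_{yx} \sqrt{1 + \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$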
• Prediction intervals are wider than confidence intervals because individual Y values vary more than the mean of Y.
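A sketch of both interval estimates at a chosen x, on hypothetical toy data. The only difference between them is the extra "1 +" under the square root, which accounts for the scatter of an individual Y around its conditional mean. The critical value $t_{.025,3}$ = 3.182 is supplied from a t table.

```python
import math

def interval_estimates_for_y(xs, ys, x_new, t_crit):
    """
    Return (y_hat, CI for the conditional mean of Y, PI for an individual Y)
    at x = x_new, where t_crit = t_{alpha/2, n-2}.
    """
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    s_yx = math.sqrt(sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))

    y_hat = b0 + b1 * x_new
    spread = 1 / n + (x_new - x_bar) ** 2 / sxx       # grows as x_new moves away from x_bar
    ci_half = t_crit * s_yx * math.sqrt(spread)       # conditional mean of Y
    pi_half = t_crit * s_yx * math.sqrt(1 + spread)   # individual Y: extra "1 +" term
    return y_hat, (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical toy data
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat, ci, pi = interval_estimates_for_y(xs, ys, x_new=4.0, t_crit=3.182)   # t_{.025,3}
print(y_hat, ci, pi)
```

Both intervals are centered on the same ŷ (4.6 here), and the prediction interval is always the wider of the two.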
Chapter 12 – Confidence & Prediction Intervals for Y
MegaStat’s Confidence and Prediction Intervals: