(classes 1 & 2) 2 var regression-for upload
TRANSCRIPT
-
7/29/2019 (Classes 1 & 2) 2 Var Regression-For Upload
The Two-Variable Regression Model
Reminder: open OLS B-hat formulas example-sport.xls
-
Slide #2
Intentionally left blank
-
Slide #3
Do Large Market NBA Teams Make Higher Profits?
See regression output.
Profit Market Size Wins
33.4 3 57
22.0 3 63
16.0 3 46
8.7 2 42
5.4 2 44
4.7 2 55
-1.5 1 35
-2.1 1 13
-4.0 1 28
NOTE: NBA market size = 3 for large, 2 for medium, 1 for small
NOTE: Don't use this approach for measuring market size.
Use a better measure like population.
Profit is in millions of $.
-
Slide #4
II. Regression
To study the influence of advertising on profits, the Celtics compiled the data in Table 1. This sample is for each of the last five years. Ad expenditures are in $100,000s and profits are in millions of dollars.
Table 1.
Year 1 2 3 4 5
Ad Expenditures (x) 2 3 4.5 5.5 7
Profit (y) 3 6 8 10 11
-
Slide #5
Regression (cont.)
Ad Expenditures (x) 2 3 4.5 5.5 7
Profit (y) 3 6 8 10 11
The Celtics need answers to these questions:
1. Does advertising increase profit?
2. How much does another $100,000 spent on advertising increase profit?
-
Slide #6
Regression (cont.)
3. What will our profit be if we spend $800,000 on advertising?
4. How much will we need to spend on ads to generate $12,000,000 in profit?
-
Slide #7
Regression (cont.)
Surprisingly, one statistical decision-making tool can provide answers to all of these questions, and more.
The tool is called regression analysis.
-
Slide #8
Regression (cont.)
A. Regression analysis is a statistical technique
B. Attempts to "explain" movements in one variable, the dependent variable . . .
-
Slide #9
Regression (cont.)
C. as a function of movements in a set of other variables, the independent variables . . .
D. through the quantification of one or more equations.
-
Slide #10
Regression (cont.)
E. Two-variable model
1. Simplest of regression models
y = α + βx + ε
2. Model is used to describe behavior of variables; often an equation
3. Will cover
a) Estimating it
b) Testing hypotheses
-
Slide #11
Who Uses This
Northwestern Memorial Hospital, which has the largest birthing facility in the Midwest, uses a simple regression model to forecast delivery volume based on previous delivery volumes. (Source: Jerry Lassa, Northwestern Memorial Hospital, Chicago, IL.)
-
Slide #12
Who Uses This
IRI, the largest market research firm in the United States, uses simple regression on adjusted weekly sales data to determine baseline sales when there is no special promotion. (Source: Doug Honnold, IRI, Inc., Chicago, IL.)
-
Slide #13
Regression (cont.)
To use regression analysis for answering those questions, the analyst needs to find the line which best fits the data.
That is, she needs to find the line that best represents the average relationship between x & y in this data:
y = α + βx + ε
-
Slide #14
Regression (cont.)
The line which best represents the average relationship between x & y in this data can be written as
y = α + βx + ε
where α is the intercept of the line & β is its slope.
The α, β are called regression coefficients. Also, α & β are unknown parameters.
-
Slide #15
Ad Expenditures (x) 2 3 4.5 5.5 7
Profit (y) 3 6 8 10 11
[Scatter plot: Profit (y) against Ad expenditures (x)]
Goal: find the line which best fits the data; that is, find the line which best represents the average relationship between x & y in this data sample.
-
Slide #16
Ad Expenditures (x) 2 3 4.5 5.5 7
Profit (y) 3 6 8 10 11
The actual line can be written as
y = a + bx
where a is the intercept
of the line & b is its slope.
One possible set of values for
a and b gives
y = 0.65 + 1.58x.
[Scatter plot with fitted line: Profit (y) against Ad expenditures (x)]
-
Slide #17
[Scatter plot with fitted line: Profit (y) against Ad expenditures (x); the intercept (0.65) is marked]
The actual line can be written as
y = a + bx
where a is the intercept of the line & b is its slope.
One possible set of values for a and b gives
y = 0.65 + 1.58x.
Ad Expenditures (x) 2 3 4.5 5.5 7
Profit (y) 3 6 8 10 11
-
Slide #18
[Scatter plot with fitted line: Profit (y) against Ad expenditures (x); the intercept (0.65) and the slope (1.58) are marked]
The actual line can be written as
y = a + bx
where a is the intercept of the line & b is its slope.
One possible set of values for a and b gives
y = 0.65 + 1.58x.
Ad Expenditures (x) 2 3 4.5 5.5 7
Profit (y) 3 6 8 10 11
-
Slide #19
III. Error term
y = α + βx + ε
A. Error term needed in model for combination of four reasons
1. variables omitted from model
2. captures effects of nonlinearities in model
3. errors in measuring Y
4. random effects.
-
Slide #20
IV. Background
A. Variances & Standard Deviation
1. There are many terms to be learned
a) SAMPLE VARIANCE OF X (OR OF MANY Xs) = SXX
b) **VARIANCE OF THE ERROR TERM = σ²
c) ESTIMATOR OF σ² = s² = sum(ε-hat_t²)/(N-K)
d) STANDARD ERROR OF THE RESIDUALS = s = square root of s²
** most important
-
Slide #21
Background (cont.)
e) VARIANCE OF β-hat = σ²(β-hat) = σ²/SXX
f) ESTIMATOR OF σ²(β-hat) = s²(β-hat) = s²/SXX
g) **STANDARD ERROR OF β-hat = s(β-hat) = square root of s²(β-hat)
** most important
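The chain of definitions in items c) through g) can be traced numerically. The sketch below is only an illustration: it uses the Celtics data from Table 1, takes K = 2 (one intercept and one slope), and assumes SXX means the sum of squared deviations of x from its mean, which is the usual textbook reading. The slope and intercept formulas it relies on appear later in these slides.

```python
import math

x = [2, 3, 4.5, 5.5, 7]      # ad expenditures, in $100,000s
y = [3, 6, 8, 10, 11]        # profit, in $ millions
n, k = len(x), 2             # K = 2 estimated coefficients (assumption)

x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)    # SXX, assumed to be a sum of squares
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar

residuals = [yi - (alpha_hat + beta_hat * xi) for xi, yi in zip(x, y)]
s2 = sum(e ** 2 for e in residuals) / (n - k)   # c) estimator of the error variance
s = math.sqrt(s2)                               # d) standard error of the residuals
se_beta = math.sqrt(s2 / s_xx)                  # g) standard error of beta-hat

print(round(s2, 3), round(s, 3), round(se_beta, 3))   # 0.675 0.822 0.207
```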
-
Slide #22
V. Assumptions of Model
A. Why Bother?
1. First objective: obtain best estimates of parameters
2. Think of these assumptions as conditions that should be satisfied for obtaining best estimates
3. y = α + βx + ε
-
Slide #23
Assumptions (cont.)
B. Therefore, impose assumptions
1. Assumption #1 (linear regression model)
a) The regression model is linear in the unknown coefficients
b) ALSO we assume that Xt and Yt are related in a linear way: y = α + βx + ε
c) This:
(1) might be true
(2) might be good approximation if Xt & Yt are not linearly related
-
Slide #24
Assumptions (cont.)
2. Assumption #2 (errors average to zero)
a) Each error term ε_t is a random variable with E(ε_t) = 0.
b) Means: regression line passes through middle of data (SEE NEXT SLIDE)
3. Assumption #3 (values of the Xs vary)
a) Not all of the values of each Xt are the same
b) Means: if an X does not vary, it cannot explain variation in Y
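Assumption #3 also has a mechanical side: the least squares slope (introduced later in these slides) divides by the sum of squared deviations of x, which is zero when x never changes. The sketch below illustrates this with a hypothetical case in which the Celtics spend the same amount on ads every year; the function name `slope` is an illustrative choice.

```python
def slope(x, y):
    """OLS slope; raises ValueError when x has no variation."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x does not vary, so the slope is undefined")
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / s_xx

profit = [3, 6, 8, 10, 11]
try:
    slope([4, 4, 4, 4, 4], profit)   # hypothetical: identical ad spend each year
except ValueError as err:
    print(err)
```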
-
Slide #25
[Scatter plot of y against X with the regression line drawn through the middle of the points]
Assumption #2
Means: regression line passes through middle of data
line that best represents the average relationship between x & y
-
Slide #26
VI. The Method Of Least Squares
y = α + βx + ε
A. The analyst can never know the true values of α & β in the actual line above
B. She can, however, calculate her best guesses (called estimates) of the true values of α & β using statistical software, spreadsheets, or some calculators
-
Slide #27
Least Squares Formulas for y = α + βx + ε
$$\hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = 1.58 \quad \text{(Celtics)}$$
-
Slide #28
Least Squares Formulas for y = α + βx + ε
$$\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x} = 0.65 \quad \text{(Celtics)}$$
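As a quick check of the two least squares formulas, here is a minimal Python sketch that applies them to the Celtics data in Table 1. The variable names are illustrative and are not taken from the course's Excel file.

```python
# Celtics data from Table 1
x = [2, 3, 4.5, 5.5, 7]      # ad expenditures, in $100,000s
y = [3, 6, 8, 10, 11]        # profit, in $ millions

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: sum of cross-deviations over sum of squared x-deviations
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta_hat = s_xy / s_xx

# Intercept: the fitted line passes through (x_bar, y_bar)
alpha_hat = y_bar - beta_hat * x_bar

print(round(alpha_hat, 2), round(beta_hat, 2))   # 0.65 1.58
```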
-
Slide #29
Example Using Data
Table 1.
Ad Expenditures (x) 2 3 4.5 5.5 7
Profit (y) 3 6 8 10 11
See Excel file that calculates the B-hat value:
OLS B-hat formulas example-sport.xls
-
Slide #30
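Least Squares Formulas for y = α + β1x1 + β2x2 + ε
Use x1 & x̄1 for x1. Use x2 & x̄2 for x2.
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_{1i} - \bar{x}_1)(y_i - \bar{y})}{\sum_{i=1}^{n} (x_{1i} - \bar{x}_1)^2} \qquad \hat{\beta}_2 = \frac{\sum_{i=1}^{n} (x_{2i} - \bar{x}_2)(y_i - \bar{y})}{\sum_{i=1}^{n} (x_{2i} - \bar{x}_2)^2}$$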
-
Slide #31
Least Squares (cont.)
C. These estimates of α and β, called α-hat and β-hat, are called estimated regression coefficients
D. By the way, these estimated regression coefficients are numbers.
-
Slide #32
Least Squares (cont.)
E. Substituting the estimated regression coefficients into α + βx gives α-hat + (β-hat) x.
F. Whenever you substitute a given value of x into α-hat + (β-hat) x, you will get:
G. y-hat = α-hat + (β-hat) x where y-hat is the predicted value of y or predicted y.
-
Slide #33
Least Squares (cont.)
F. Whenever you substitute a given value of x into α-hat + (β-hat) x, you will get:
G. y-hat = α-hat + (β-hat) x where y-hat is the predicted value of y or predicted y.
EXAMPLE:
substitute x = 2 into y-hat = .65 + 1.58x
y-hat = .65 + 1.58(2) = 3.81
WHAT-IF SCENARIO
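A what-if calculation like this is easy to script. The sketch below wraps the estimated line in two small helper functions; `predict_profit` and `ad_spend_for` are illustrative names, and the rounded coefficients 0.65 and 1.58 are taken from the slides. Note that x = 8 and a profit of 12 lie outside the sampled range of the data, so those two results are extrapolations.

```python
ALPHA_HAT = 0.65   # estimated intercept from the slides
BETA_HAT = 1.58    # estimated slope from the slides

def predict_profit(ad_spend):
    """Predicted profit ($ millions) for ad spending in $100,000s."""
    return ALPHA_HAT + BETA_HAT * ad_spend

def ad_spend_for(target_profit):
    """Ad spending ($100,000s) at which the line predicts target_profit."""
    return (target_profit - ALPHA_HAT) / BETA_HAT

print(round(predict_profit(2), 2))   # 3.81
print(round(predict_profit(8), 2))   # 13.29
print(round(ad_spend_for(12), 2))    # 7.18
```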
-
Slide #34
Least Squares (cont.)
H. The equation y-hat = α-hat + (β-hat) x is the estimated regression line.
I. So, each y-hat value comes from the estimated regression line.
-
Slide #35
Least Squares (cont.)
[Scatter plot of y against x with the estimated regression line]
y-hat = a-hat + (b-hat)x is the estimated regression line; one possible set of values is y-hat = 0.65 + 1.58x
-
Slide #36
Least Squares Review
y = a + bx is actual regression line
a and b are (unknown) regression coefficients
if b > 0, then as x rises, y also rises
if b < 0, then as x rises, y falls
possible estimated values are a-hat = 0.65 and b-hat = 1.58: the values 0.65 and 1.58 are estimated regression coefficients.
y-hat = 0.65 + 1.58x is the estimated regression line
If b = 0, then as x rises, y . . . ?
-
Slide #37
Least Squares (cont.)
J. For each actual value of x, there are usually differences between each y (actual y) & y-hat (predicted y).
Example
when x = 2, y = 3 (both data)
when x = 2 put into y-hat = .65 + 1.58x,
y-hat = 3.81
(y - y-hat ) = 3 - 3.81 = -0.81
-
Slide #38
Least Squares (cont.)
y - y-hat = 3 - 3.81 = -0.81
K. The value (y - y-hat) is the deviation (error) caused by calculating y from the estimated regression line.
L. The researcher's goal is to find values for α-hat & β-hat so that sum of (y - y-hat)² is as small as possible across the entire sample of x & y values.
-
7/29/2019 (Classes 1 & 2) 2 Var Regression-For Upload
39/99
Slide #39
Least Squares (cont.)
Ad Expenditures (x) 2 3 4.5 5.5 7
Profit (y) 3 6 8 10 11
y-hat (= .65 + 1.58x) 3.81 5.39 7.76 9.34 11.71
error (y - y-hat) -0.81 0.61 0.24 0.66 -0.71
SEE NEXT SLIDES
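The predicted values and errors can be recomputed directly from the estimated line. The following sketch uses the rounded coefficients 0.65 and 1.58 from the slides and also adds up the squared errors, the quantity that least squares makes as small as possible.

```python
x = [2, 3, 4.5, 5.5, 7]      # ad expenditures, in $100,000s
y = [3, 6, 8, 10, 11]        # profit, in $ millions

y_hat = [0.65 + 1.58 * xi for xi in x]            # predicted y for each x
errors = [yi - yh for yi, yh in zip(y, y_hat)]    # y minus y-hat
sse = sum(e ** 2 for e in errors)                 # sum of squared errors

print([round(v, 2) for v in y_hat])    # [3.81, 5.39, 7.76, 9.34, 11.71]
print([round(e, 2) for e in errors])   # [-0.81, 0.61, 0.24, 0.66, -0.71]
print(round(sse, 4))                   # 2.0255
```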
-
Slide #40
Y - Y-hat (error)
Y-hat = 0.65 + 1.58 X
When x = 3, actual y = 6 (from data)
[Scatter plot of y against x with the estimated regression line; the data point x = 3, y = 6 is marked]
-
Slide #41
Y - Y-hat (error)
Y-hat = 0.65 + 1.58 X
When x = 3, predicted y = 5.39 (from line)
[Scatter plot of y against x with the estimated regression line; at x = 3 the actual y = 6 and the predicted y-hat = 5.39 are marked]
-
Slide #42
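Y - Y-hat (error)
(y - y-hat) is the deviation (error) caused by estimating y from the estimated regression line
[Scatter plot of y against x with the estimated regression line; at x = 3 the actual y = 6 and the predicted y-hat = 5.39 are marked]
(y - y-hat) = 6 - 5.39 = .61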
-
Slide #43
Least Squares (cont.)
M. The method of least squares (OLS) gives the line of best fit (it fits the sample data best) by finding values of α-hat & β-hat which minimize
N. sum of (y - y-hat)²
O. Also called Error Sum of Squares or
P. ESS or SSE
-
Slide #44
Least Squares (cont.)
The aim of OLS is to pick values for α-hat & β-hat so that the sum of all (y - y-hat)² is as small as possible across entire sample of x & y values
[Scatter plot of y against x]
-
Slide #45
Least Squares (cont.)
Q. Software contains formulas that calculate values for α-hat & β-hat from the values of the sample's data.
-
Slide #46
Example #1
A. In the case of profit (y) and expenditures (x) mentioned above, the estimated least squares (OLS) regression line is y-hat = 0.65 + 1.58x.
B. The 1.58 estimate for β: (positive or negative?) relationship between profit and expenditures (sign on 1.58)
C. β is marginal effect of x on y
-
Slide #47
Example #1 (cont.)
y-hat = 0.65 + 1.58x
D. β is always in y's units
E. For each extra $1.00 it spends on ads, Celtics are getting $???? of profit. (notes)
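The per-dollar figure comes down to a unit conversion, since x is measured in $100,000s and y in $ millions. A minimal sketch of the arithmetic, using the slides' rounded slope of 1.58:

```python
BETA_HAT = 1.58   # $ millions of extra profit per extra $100,000 of ads

profit_dollars = BETA_HAT * 1_000_000   # extra profit, converted to dollars
ad_dollars = 100_000                    # extra ad spending, in dollars

per_dollar = profit_dollars / ad_dollars
print(round(per_dollar, 2))   # 15.8
```

Read this as an estimate of the average relationship in this five-year sample, not a guaranteed return on each dollar.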
-
Slide #48
Example #2
A. If you estimated a house price (y) and size (x) model, the estimated least squares (OLS) regression line is y-hat = 52.351 + 0.139x.
B. OR PRICE = 52.351 + 0.139 SQFT
C. y = Price ($1000s)
D. x = Size (sq. feet)
-
Slide #49
Example #2 (cont.)
PRICE = 52.351 + 0.139 SQFT
E. If size increases by 1 unit (1 sq. ft.), price . . . (direction?) (notes)
-
Slide #50
Example #2 (cont.)
PRICE = 52.351 + 0.139 SQFT
G. If size increases by 1 unit (1 sq. ft.), price rises . . . (magnitude?) (notes)
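One way to work out the magnitude is to compare two predictions that differ by a single square foot. In the sketch below the 2,000 sq. ft. house is a hypothetical size chosen for illustration, and `predicted_price` is an illustrative name; the result is converted from $1000s to dollars.

```python
def predicted_price(sqft):
    """Predicted price in $1000s for a house of the given size (sq. ft.)."""
    return 52.351 + 0.139 * sqft

base = predicted_price(2000)      # hypothetical 2,000 sq. ft. house
bigger = predicted_price(2001)    # the same house with one more sq. ft.

print(round(base, 3))                     # 330.351
print(round((bigger - base) * 1000, 2))   # 139.0
```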
-
Slide #51
Exercise
You will next do an exercise to help consolidate many of the concepts you have seen in this introduction to regression.
Answer the questions on the next slide.
Form groups and work on the questions for 5 minutes.
I will then lead a discussion of your answers.
-
Slide #52
1. Does advertising increase profit?
2. How much does another $100,000 increase profit?
3. What will our profit be if we spend $800,000 on advertising?
4. How much will we need to spend on ads to generate $12,000,000 in profit?
Ad Expenditures (x) 2 3 4.5 5.5 7
Profit (y) 3 6 8 10 11
After estimating the regression line y = a + bx, the computer prints these results: a-hat = 0.65, b-hat = 1.58. This means that y-hat = 0.65 + 1.58x.
-
Slide #53
Intentionally left blank
-
Slide #54
Intentionally left blank
-
Slide #55
Intentionally left blank
-
Slide #56
Intentionally left blank
-
Slide #57
Intentionally left blank
-
Slide #58
Intentionally left blank
-
Slide #59
Intentionally left blank
-
Slide #60
Intentionally left blank
-
Slide #61
Look at regression #1 on your handout
and tell me the answer.
Do Large Market NBA Teams Make
Higher Profits?
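The handout's regression #1 is not reproduced in this transcript, so its printed numbers may differ from what follows. As a rough stand-in, the sketch below fits a two-variable regression of profit on market size using the nine teams listed on Slide #3. Keep in mind the slides' own caution that a 1/2/3 code is a crude measure of market size.

```python
# NBA data from Slide #3
profit = [33.4, 22.0, 16.0, 8.7, 5.4, 4.7, -1.5, -2.1, -4.0]   # $ millions
market = [3, 3, 3, 2, 2, 2, 1, 1, 1]     # 3 = large, 2 = medium, 1 = small

n = len(profit)
m_bar = sum(market) / n
p_bar = sum(profit) / n

s_xy = sum((m - m_bar) * (p - p_bar) for m, p in zip(market, profit))
s_xx = sum((m - m_bar) ** 2 for m in market)
slope = s_xy / s_xx
intercept = p_bar - slope * m_bar

print(round(slope, 2), round(intercept, 2))   # 13.17 -17.16
```

A positive slope here says that, in this small sample, moving up one market-size category goes with roughly $13 million more profit on average.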
-
Slide #62
Dummy Variables
A. New type of variable
1. Usually use quantitative variables; continuous
2. Sometimes variables take small number of values; discrete
a) Market size
b) Gender
c) Season
d) Marital status (married vs. not), etc.
-
Slide #63
Dummy Variables (cont.)
B. Will use Qualitative (or Dummy) Variables
1. Create a special variable that takes a value of
a) 1 if the unit of observation falls into one category
b) 0 if the unit falls into the other category
-
Slide #64
Dummy Variables (cont.)
C. Dummy variable as IV (independent variable)
PRICE = 52.351 + 0.139 SQFT + 18.52 POOL
POOL = 1 if house has a pool
POOL = 0 if no pool
More later
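To see what the POOL coefficient does, compare two otherwise identical houses. The sketch below uses a hypothetical 2,000 sq. ft. house and an illustrative function name, `predicted_price`; prices are in $1000s as in the earlier house example.

```python
def predicted_price(sqft, pool):
    """Predicted price in $1000s; pool is 1 if the house has a pool, else 0."""
    return 52.351 + 0.139 * sqft + 18.52 * pool

with_pool = predicted_price(2000, pool=1)      # hypothetical 2,000 sq. ft. house
without_pool = predicted_price(2000, pool=0)

print(round(without_pool, 3))                # 330.351
print(round(with_pool, 3))                   # 348.871
print(round(with_pool - without_pool, 2))    # 18.52
```

The dummy shifts the whole line up by its coefficient: the gap between the two predictions is 18.52 (in $1000s) at every house size.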
-
Slide #65
Dummy Variables (cont.)
D. Dummy variable as DV (dependent variable)
BUY = 3.15 + 10.19 INCOME - 1.5 PRICE
BUY = 1 if buy the house
BUY = 0 if don't buy house
More later
-
Slide #66
VII. Regression Hypotheses Tests
Look at regression #1 on your handout
Questions:
1. Is there a relationship among DV & IVs? Profit and market size
2. How well does my model fit the data? How well does market size explain profit?
3. Which IVs affect DV? Does market size influence profit?
Note: same as #1 when have only one IV
-
Slide #67
VIII. Testing Entire Regression (F test)
A. Always first test
B. Tests entire model
C. If model fails this test
1.
2. Model no good
3. Back to drawing board
-
Slide #68
F test (cont.)
D. Step #1
1. HO: no relationship between y & xs
2. OR HO: β1 = β2 = . . . = βk = 0 (β = 0)
3. HA: is relationship between y & xs
4. OR HA: at least 1 of βs ≠ 0 (β ≠ 0)
E. Step #2
1. Software prints test statistic
y = α + β1x1 + β2x2 + β3x3 + ε
-
Slide #69
F test (cont.)
F. Step #3
1. Software prints a p-value
2. p-value is probability of making a Type I error if HO is rejected (the chance of a test statistic at least this extreme when HO is true)
G. Step #4
1. Reject HO if p-value ≤ 5% (or 1%)
2. Do not reject HO if p-value > 5% (or 1%)
-
Slide #70
F test (cont.)
H. Rule
1. Large F-statistics are better
I. Logic
1. F-statistic is ratio
F = explained variance / unexplained variance
-
Slide #71
F test (cont.)
F = explained variance / unexplained variance
3. If numerator = 0 (small)
a) F = 0 (small)
b) Model terrible
4. If numerator large vs. denominator
a) F large
b) Model good
-
Slide #72
F test (cont.)
Example
PROFIT = -20.8 + .185 WINS + 11.2 MKTSIZE
F = 21.10 (0.002)
Do MKTSIZE & WINS jointly influence
PROFIT?
See regression output for F statistic.
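The printed F statistic can be sanity-checked against the R² of 87.6% that the slides report for this same equation. The sketch below uses the standard identity F = (R²/k) / ((1 - R²)/(n - k - 1)), assuming n = 9 teams and k = 2 independent variables. For the p-value it uses a closed form that holds only when the numerator has 2 degrees of freedom, so it is not a general-purpose routine.

```python
r2 = 0.876     # R-squared reported for this equation in the slides
n, k = 9, 2    # assumed: 9 teams, 2 independent variables (WINS, MKTSIZE)
df2 = n - k - 1

# F implied by R-squared: explained share per regressor over unexplained share per d.f.
f_from_r2 = (r2 / k) / ((1 - r2) / df2)

# Upper-tail probability of F(2, df2); this closed form is valid only for 2 numerator d.f.
f_reported = 21.10
p_value = (1 + 2 * f_reported / df2) ** (-df2 / 2)

print(round(f_from_r2, 2))   # 21.19
print(round(p_value, 3))     # 0.002
```

The small gap between 21.19 and the printed 21.10 is what one would expect from R² being rounded to three digits.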
-
Slide #73
IX. Testing Regression: Goodness of Fit
A. This is about measuring how well the estimated regression line fits the data.
B. The OLS estimated regression line fits the data better than other lines, but how well does it fit?
-
Slide #74
Goodness of Fit (cont.)
C. To measure how well the regression line fits the data (its goodness of fit), use the calculated value called R².
D. R² tells the percentage of the variation among y-values in your sample data that is explained by variation of the x-values that are in your regression.
-
Slide #75
Goodness of Fit (cont.)
E. R² = explained variation / total variation
= RSS / TSS
= 1 - (ESS / TSS)
(recall: TSS = RSS + ESS, where RSS is the regression (explained) sum of squares and ESS is the error sum of squares)
F. Use this second, after F test
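Both forms of the R² formula can be checked on the Celtics data. The sketch below follows the slides' naming, with ESS as the error sum of squares and RSS as the regression (explained) sum of squares; other textbooks swap these two abbreviations, so the labels are worth double-checking against any software output.

```python
x = [2, 3, 4.5, 5.5, 7]      # ad expenditures, in $100,000s
y = [3, 6, 8, 10, 11]        # profit, in $ millions
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

beta_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))
alpha_hat = y_bar - beta_hat * x_bar
y_hat = [alpha_hat + beta_hat * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
ess = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # error (unexplained)
rss = sum((yh - y_bar) ** 2 for yh in y_hat)             # regression (explained)

print(round(rss / tss, 4), round(1 - ess / tss, 4))      # 0.9508 0.9508
```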
-
Slide #76
Goodness of Fit (cont.)
R² = explained variation / total variation
Questions about R²
1. What is max R² percentage?
2. What is min R² percentage?
3. The closer R² is to ??, the better the fit of the estimated regression line to the data
4. The closer R² is to ??, the worse the fit of the estimated regression line to the data
-
Slide #77
Questions about R² (cont.)
R² = explained variation / total variation
5. Which is better: R² = .25 or R² = .89?
6. Why?
-
Slide #78
Characteristics of R²
A. 0 ≤ R² ≤ 1.00
(same as 0% ≤ R² ≤ 100%)
B. By the way, R² = 1.00 is perfect correlation,
C. either positive or negative
D. You can't tell from the R² alone.
-
Slide #79
Characteristics of R² Review
E. 0 ≤ R² ≤ 1.00
(same as 0% ≤ R² ≤ 100%)
F. The closer R² is to 1, the better the fit of the estimated regression line to the data
G. The closer R² is to 0, the worse the fit of the estimated regression line to data
-
Slide #80
Goodness of Fit (cont.)
G. A high R² does not mean that the xs cause y. It means that xs and y are highly correlated.
H. Example:
if R² = 0.89, means 89% of variation of y is explained by the regression line (see next two slides)
-
Slide #81
[Two scatter plots of y with fitted regression lines]
Better fit of regression line to data: R² = 0.89
Worse fit of regression line to data: R² = 0.25
-
Slide #82
[Two scatter plots of y, labeled A and B]
Which one: R² = 0.89?
Which one: R² = 0.10?
-
Slide #83
Example
PROFIT = -20.8 + .185 WINS + 11.2 MKTSIZE
with R² = 87.6%
What % of (variation or variance?) in PROFIT is explained by MKTSIZE & WINS?
How good is this estimated regression line?
See regression output for R2.
-
Slide #84
X. Testing Regression: t-tests
A. After R² (third)
B. Tests individual IVs
C. Questions
1. Does x2 affect y?
2. Does x3 affect y?
3. Etc.
NOTE: t-tests apply to ALL IVs incl. dummy variables
-
Slide #85
t-tests (cont.)
D. If any xk does not affect y
1. Do NOT automatically drop xk
2. more later
-
Slide #86
t-tests (cont.)
E. Step #1
1. HO: βk = 0
2. Means no relationship between y & xk
3. HA: βk ≠ 0
4. Means is relationship between y & xk
y = α + β1x1 + β2x2 + β3x3 + ε
-
Slide #87
t-tests (cont.)
F. Step #2
1. Software prints test statistic
G. Step #3
1. Software prints a p-value
2. p-value is probability of making a Type I error if HO is rejected (the chance of a test statistic at least this extreme when HO is true)
H. Step #4
1. Reject HO if p-value ≤ 5% (or 1%)
2. Do not reject HO if p-value > 5% (or 1%)
-
Slide #88
t-tests (cont.)
I. Formula
1. t(n-K) = [(βk-hat) - (βk hypothesized)] / s(βk-hat)
2. since usually HO: βk = 0
3. t(n-K) = (βk-hat) / s(βk-hat)
J. Rules
1. Large t statistics (in absolute value) are better
2. Small p-values are better
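As an illustration of the formula, the sketch below computes the t statistic for the slope in the Celtics example under HO: β = 0. It assumes K = 2 estimated coefficients, so the statistic has n - K = 3 degrees of freedom; looking up the matching p-value is left to software or a t table.

```python
import math

x = [2, 3, 4.5, 5.5, 7]      # ad expenditures, in $100,000s
y = [3, 6, 8, 10, 11]        # profit, in $ millions
n, k = len(x), 2             # K = 2 estimated coefficients (assumption)

x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
alpha_hat = y_bar - beta_hat * x_bar

# Standard error of beta-hat from the residual variance
s2 = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y)) / (n - k)
se_beta = math.sqrt(s2 / s_xx)

hypothesized = 0                          # HO: beta = 0
t_stat = (beta_hat - hypothesized) / se_beta
print(round(t_stat, 2))                   # 7.62
```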
-
Slide #89
t-tests (cont.)
K. Limitations of t-test
1. Does not prove theoretical validity
a) Only shows statistically significant correlation
b) y-hat = 10.9 + 3.2 x
(19.5) (13.9)
(.0001) (.0001)
c) appears that x causes y
-
Slide #90
t-tests (cont.)
d) actually
(1) y = Britain's overall business activity
(2) x = sunspot activity
e) actual regression from 19th century!
2. Does not prove causality
a) Only shows statistically significant correlation
b) See above
c) You impose causality by choices of DV & IVs
-
Slide #91
t-tests (cont.)
3. Does not show which x has biggest impact on y
a) Common mistake
(1) "Biggest t-statistic means that x has biggest impact on y": FALSE
b) size of t shows prob. of type I error
(1) larger t value: less chance of type I error
(2) larger t value: more confidence relationship exists
-
Slide #92
t-tests (cont.)
Suppose
PROFIT = -20.8 + .185 WINS + 11.2 MKTSIZE
(-4.02) (1.41) (4.51)
(0.007) (0.207) (0.004)
Is α = 0 or α ≠ 0?
Is β1 = 0 or β1 ≠ 0?
Is β2 = 0 or β2 ≠ 0?
See regression output for t-statistics.
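The Step #4 decision rule can be applied mechanically to these numbers. The sketch below assumes the first row of parentheses under the equation holds t-statistics and the second row holds p-values, and it uses the 5% level; the dictionary layout and names are illustrative only.

```python
ALPHA_LEVEL = 0.05   # 5% significance level; use 0.01 for the 1% rule

# (coefficient, t-statistic, p-value), read from the equation and the rows beneath it
results = {
    "intercept": (-20.8, -4.02, 0.007),
    "WINS": (0.185, 1.41, 0.207),
    "MKTSIZE": (11.2, 4.51, 0.004),
}

decisions = {}
for name, (coef, t_stat, p_value) in results.items():
    # Reject HO (coefficient = 0) when the p-value is at or below the chosen level
    decisions[name] = "reject HO" if p_value <= ALPHA_LEVEL else "do not reject HO"
    print(f"{name}: t = {t_stat}, p = {p_value} -> {decisions[name]}")
```

Under these assumptions only WINS fails to clear the 5% bar, which matches its small t-statistic of 1.41.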
-
Slide #93
Evaluate the Model
When you are asked to evaluate the model:
F statistic
R²
Each t-statistic
Overall evaluation
Weak, fair, . . ., Yippee!
See the handout titled Evaluate the Model
-
Slide #94
XI. Testing Regression: Akaike Information Criterion (AIC)
Number printed by many statistical applications along with rest of regression output
Simple rule: the lower the value of AIC, the better the model
-
Slide #95
XII. Testing Regression
The most difficult way to test regression model is by applying . . .
COMMON SENSE!!
Remember: GNP = 10.9 + 3.2 sunspots
Does that make sense?
-
Slide #96
XIII. Nonlinear Relationships
A. Introduction
1. So far: y = α + βx + ε
2. Linear relationship y & x
3. Many nonlinear relationships in world
4. Can model those as well
5. More later
[Scatter plot]
-
Slide #97
Profit Market Size Wins
33.4 3 25
22.0 3 63
16.0 3 55
8.7 2 42
5.4 2 57
4.7 2 46
-1.5 1 39
-2.1 1 13
-4.0 1 28
NOTE: NBA market size = 3 for large, 2 for medium, 1 for small
NOTE: Don't use this approach for measuring market size.
Use a better measure like population.
Do More Wins Generate Higher Profits in the NBA?
See regression output and EVALUATE THIS MODEL.
-
Slide #98
QB Rating Handout
Evaluate this regression output
Tom Brady, New England Patriots
MVP, Super Bowl XXXVIII
-
Exercise
Two-variable Regression Exercise #1