(classes 1 & 2) 2 var regression-for upload

Upload: noor-afzal

Post on 03-Apr-2018

  • 7/29/2019 (Classes 1 & 2) 2 Var Regression-For Upload

    The Two-Variable Regression Model

    Reminder: open OLS B-hat formulas example-sport.xls

    Slide #2

    Intentionally left blank

    Slide #3

    Do Large Market NBA Teams Make Higher Profits?

    See regression output.

    Profit   Market Size   Wins
     33.4         3         57
     22.0         3         63
     16.0         3         46
      8.7         2         42
      5.4         2         44
      4.7         2         55
     -1.5         1         35
     -2.1         1         13
     -4.0         1         28

    NOTE: NBA market size = 3 for large, 2 for medium, 1 for small

    NOTE: Don't use this approach for measuring market size.

    Use a better measure like population.

    Profit is in millions of $.

    Slide #4

    II. Regression

    To study the influence of advertising on profits, the Celtics compiled the data in Table 1. This sample is for each of the last five years. Ad expenditures are in $100,000s and profits are in millions of dollars.

    Table 1.

    Year                  1   2   3     4     5
    Ad Expenditures (x)   2   3   4.5   5.5   7
    Profit (y)            3   6   8     10    11

    Slide #5

    Regression (cont.)

    Ad Expenditures (x) 2 3 4.5 5.5 7

    Profit (y) 3 6 8 10 11

    The Celtics need answers to these questions

    1. Does advertising increase profit?

    2. How much does another $100,000 spent on advertising increase profit?

    Slide #6

    Regression (cont.)

    3. What will our profit be if we spend $800,000 on advertising?

    4. How much will we need to spend on ads to generate $12,000,000 in profit?

    Slide #7

    Regression (cont.)

    Surprisingly, one statistical decision-making tool can provide answers to all of these questions, and more. The tool is called regression analysis.

    Slide #8

    Regression (cont.)

    A. Regression analysis is a statistical technique

    B. Attempts to "explain" movements in one variable, the dependent variable . . .

    Slide #9

    Regression (cont.)

    C. as a function of movements in a set of other variables, the independent variables . . .

    D. through the quantification of one or more equations.

    Slide #10

    Regression (cont.)

    E. Two-variable model

    1. Simplest of regression models

    y = α + βx + ε

    2. Model is used to describe behavior of variables; often an equation

    3. Will cover
    a) Estimating it
    b) Testing hypotheses

    Slide #11

    Who Uses This

    Northwestern Memorial Hospital, which has the largest birthing facility in the Midwest, uses a simple regression model to forecast delivery volume based on previous delivery volumes. (Source: Jerry Lassa, Northwestern Memorial Hospital, Chicago, IL.)

    Slide #12

    Who Uses This

    IRI, the largest market research firm in the United States, uses simple regression on adjusted weekly sales data to determine baseline sales when there is no special promotion. (Source: Doug Honnold, IRI, Inc., Chicago, IL.)

    Slide #13

    Regression (cont.)

    To use regression analysis for answering those questions, the analyst needs to find the line which best fits the data. That is, she needs to find the line that best represents the average relationship between x & y in this data:

    y = α + βx + ε

    Slide #14

    Regression (cont.)

    The line which best represents the average relationship between x & y in this data can be written as

    y = α + βx + ε

    where α is the intercept of the line & β is its slope.

    The α, β are called regression coefficients. Also, α & β are unknown parameters.

    Slide #15

    Ad Expenditures (x) 2 3 4.5 5.5 7

    Profit (y) 3 6 8 10 11

    [Scatter plot of the five data points: Profit (y) on the vertical axis, Ad expenditures (x) on the horizontal axis]

    Goal: find the line which best fits the data; that is, find the line which best represents the average relationship between x & y in this data sample.

    Slide #16

    Ad Expenditures (x) 2 3 4.5 5.5 7

    Profit (y) 3 6 8 10 11

    The actual line can be written as y = a + bx, where a is the intercept of the line & b is its slope.

    One possible set of values for a and b gives y = 0.65 + 1.58x.

    [Scatter plot: Profit (y) against Ad expenditures (x)]

    Slide #17

    The actual line can be written as y = a + bx, where a is the intercept of the line & b is its slope.

    One possible set of values for a and b gives y = 0.65 + 1.58x.

    [Scatter plot: Profit (y) against Ad expenditures (x), with the fitted line and its intercept (0.65) labeled]

    Ad Expenditures (x) 2 3 4.5 5.5 7

    Profit (y) 3 6 8 10 11

    Slide #18

    The actual line can be written as y = a + bx, where a is the intercept of the line & b is its slope.

    One possible set of values for a and b gives y = 0.65 + 1.58x.

    [Scatter plot: Profit (y) against Ad expenditures (x), with the fitted line, its intercept (0.65), and its slope (1.58) labeled]

    Ad Expenditures (x) 2 3 4.5 5.5 7

    Profit (y) 3 6 8 10 11

    Slide #19

    III. Error term

    y = α + βx + ε

    A. Error term needed in model for combination of four reasons
    1. variables omitted from model
    2. captures effects of nonlinearities in model
    3. errors in measuring Y
    4. random effects.

    Slide #20

    IV. Background

    A. Variances & Standard Deviation
    1. There are many terms to be learned
    a) SAMPLE VARIANCE OF X (OR OF MANY Xs) = SXX
    b) **VARIANCE OF THE ERROR TERM = σ²
    c) ESTIMATOR OF σ² = s² = sum of (ε-hat t)² / (N-K)
    d) STANDARD ERROR OF THE RESIDUALS = s = square root of s²

    ** most important

    Slide #21

    Background (cont.)

    e) VARIANCE OF β-hat = σ²(β-hat) = σ²/SXX

    f) ESTIMATOR OF σ²(β-hat) = s²(β-hat) = s²/SXX

    g) **STANDARD ERROR OF β-hat = s(β-hat) = square root of s²(β-hat)

    ** most important
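As a rough illustration of terms (a) through (g), the sketch below computes them for the Celtics data in Table 1. It assumes SXX means the sum of squared deviations of x about its mean (the quantity for which σ²/SXX is the variance of β-hat) and K = 2 estimated coefficients; the variable names are illustrative only.

```python
import math

# Celtics data (Table 1): ad expenditures in $100,000s, profit in $ millions
x = [2, 3, 4.5, 5.5, 7]
y = [3, 6, 8, 10, 11]
N, K = len(x), 2  # assumption: K counts the intercept and the slope

x_bar = sum(x) / N
y_bar = sum(y) / N

# Assumption: SXX is the sum of squared deviations of x about its mean
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

beta_hat = sxy / sxx
alpha_hat = y_bar - beta_hat * x_bar

# Residuals and the estimator of the error variance: s^2 = sum(e_t^2) / (N - K)
residuals = [yi - (alpha_hat + beta_hat * xi) for xi, yi in zip(x, y)]
s2 = sum(e ** 2 for e in residuals) / (N - K)
s = math.sqrt(s2)              # standard error of the residuals

s2_beta = s2 / sxx             # estimated variance of beta-hat
se_beta = math.sqrt(s2_beta)   # standard error of beta-hat

print(f"s^2 = {s2:.3f}, s = {s:.3f}, se(beta-hat) = {se_beta:.3f}")
```

The standard error of β-hat is the quantity that the t-tests later in the deck divide by.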


    Slide #22

    V. Assumptions of Model

    A. Why Bother?

    1. First objective: obtain best estimates of parameters

    2. Think of these assumptions as conditions that should be satisfied for obtaining best estimates

    3. y = α + βx + ε

    Slide #23

    Assumptions (cont.)

    B. Therefore, impose assumptions
    1. Assumption #1 (linear regression model)
    a) The regression model is linear in the unknown coefficients
    b) ALSO we assume that Xt and Yt are related in a linear way: y = α + βx + ε
    c) This:
    (1) might be true
    (2) might be good approximation if Xt and Yt are not linearly related

    Slide #24

    Assumptions (cont.)

    2. Assumption #2 (errors average to zero)
    a) Each error term εt is a random variable with E(εt) = 0.
    b) Means: regression line passes through middle of data (SEE NEXT SLIDE)

    3. Assumption #3 (values of the Xs vary)
    a) Not all of the values of each Xt are the same
    b) Means: if an X does not vary, it cannot explain variation in Y

    Slide #25

    Assumption #2

    Means: regression line passes through middle of data

    [Scatter plot of y against X with a line through the middle of the points: the line that best represents the average relationship between x & y]

    Slide #26

    VI. The Method Of Least Squares

    y = α + βx + ε

    A. The analyst can never know the true values of α & β in the actual line above

    B. She can, however, calculate her best guesses (called estimates) of the true values of α & β using statistical software, spreadsheets, or some calculators

    Slide #27

    Least Squares Formulas for y = α + βx + ε

    $$\hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = 1.58 \quad \text{(Celtics)}$$

    Slide #28

    Least Squares Formulas for y = α + βx + ε

    $$\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x} = 0.65 \quad \text{(Celtics)}$$

    Slide #29

    Example Using Data

    Table 1.

    Ad Expenditures (x) 2 3 4.5 5.5 7

    Profit(y) 3 6 8 10 11

    See Excel file that calculates the B-hat value:

    OLS B-hat formulas example-sport.xls
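The same calculation can be sketched in a few lines of Python. This is only an illustration of the least squares formulas applied to Table 1, not a copy of the spreadsheet; the function name `ols_two_variable` is made up for this example.

```python
def ols_two_variable(x, y):
    """Return (alpha_hat, beta_hat) for y = alpha + beta*x using the least squares formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    beta_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
                / sum((xi - x_bar) ** 2 for xi in x))
    # Intercept: the fitted line passes through the point of means (x_bar, y_bar)
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


# Table 1: ad expenditures ($100,000s) and profit ($ millions)
ad_expenditures = [2, 3, 4.5, 5.5, 7]
profit = [3, 6, 8, 10, 11]

alpha_hat, beta_hat = ols_two_variable(ad_expenditures, profit)
print(round(alpha_hat, 2), round(beta_hat, 2))  # 0.65 1.58
```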

    Slide #30

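    Least Squares Formulas for y = α + β1x1 + β2x2 + ε

    Use the x1 values and their mean for β1-hat; use the x2 values and their mean for β2-hat.

    $$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_{1i} - \bar{x}_1)(y_i - \bar{y})}{\sum_{i=1}^{n} (x_{1i} - \bar{x}_1)^2}$$

    $$\hat{\beta}_2 = \frac{\sum_{i=1}^{n} (x_{2i} - \bar{x}_2)(y_i - \bar{y})}{\sum_{i=1}^{n} (x_{2i} - \bar{x}_2)^2}$$

    NOTE: these one-regressor-at-a-time formulas match the multiple-regression OLS estimates when x1 and x2 are uncorrelated in the sample; otherwise they generally differ.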

    Slide #31

    Least Squares (cont.)

    C. These estimates of α and β, called α-hat and β-hat, are called estimated regression coefficients

    D. By the way, these estimated regression coefficients are numbers.

    Slide #32

    Least Squares (cont.)

    E. Substituting the estimated regression coefficients into α + βx gives α-hat + (β-hat) x.

    F. Whenever you substitute a given value of x into α-hat + (β-hat) x, you will get:

    G. y-hat = α-hat + (β-hat) x where y-hat is the predicted value of y or predicted y.

    Slide #33

    Least Squares (cont.)

    F. Whenever you substitute a given value of x into α-hat + (β-hat) x, you will get:

    G. y-hat = α-hat + (β-hat) x where y-hat is the predicted value of y or predicted y.

    EXAMPLE:

    substitute x = 2 into y-hat = .65 + 1.58x

    y-hat = .65 + 1.58(2) = 3.81

    WHAT-IF SCENARIO

    Slide #34

    Least Squares (cont.)

    H. The equation

    y-hat = α-hat + (β-hat) x is the estimated regression line.

    I. So, each y-hat value comes from the estimated regression line.

    Slide #35

    Least Squares (cont.)

    [Scatter plot of the data with the fitted line drawn through it; y on the vertical axis, x on the horizontal axis]

    y-hat = a-hat + (b-hat)x is estimated regression line; one possible set of values is y-hat = 0.65 + 1.58x

    Slide #36

    Least Squares Review

    y = a + bx is actual regression line

    a and b are (unknown) regression coefficients

    if b > 0, then as x rises, y also rises

    if b < 0, then as x rises, y falls

    possible estimated values are a-hat = 0.65 and b-hat = 1.58: the values 0.65 and 1.58 are estimated regression coefficients.

    0.65 + 1.58x is the estimated regression line

    If b = 0, then as x rises, y . . . ?

    Slide #37

    Least Squares (cont.)

    J. For each actual value of x, there are usually differences between each y (actual y) & y-hat (predicted y).

    Example

    when x = 2, y = 3 (both data)

    when x = 2 put into y-hat = .65 + 1.58x,

    y-hat = 3.81

    (y - y-hat ) = 3 - 3.81 = -0.81

    Slide #38

    Least Squares (cont.)

    y - y-hat = 3 - 3.81 = -0.81

    K. The value (y - y-hat) is the deviation (error) caused by calculating y from the estimated regression line.

    L. The researcher's goal is to find values for α-hat & β-hat so that sum of (y - y-hat)² is as small as possible across the entire sample of x & y values.

    Slide #39

    Least Squares (cont.)

    Expenditures (x)         2       3      4.5    5.5     7
    Profit (y)               3       6      8      10      11
    y-hat                    3.81    5.39   7.76   9.34    11.71
    (y-hat = .65 + 1.58x)
    error [(y) - y-hat]     -0.81    0.61   0.24   0.66   -0.71
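The predicted values and errors in this table can be reproduced with a short script. This is a minimal sketch that uses the rounded coefficients 0.65 and 1.58 exactly as the slide does; the variable names are illustrative.

```python
# Table 1 data
x = [2, 3, 4.5, 5.5, 7]
y = [3, 6, 8, 10, 11]

# Rounded estimated coefficients taken from the slides
a_hat, b_hat = 0.65, 1.58

# Predicted profit from the estimated regression line, and the error y - y-hat
y_hat = [round(a_hat + b_hat * xi, 2) for xi in x]
errors = [round(yi - yh, 2) for yi, yh in zip(y, y_hat)]

for xi, yi, yh, e in zip(x, y, y_hat, errors):
    print(f"x = {xi:>4}  y = {yi:>2}  y-hat = {yh:>5.2f}  error = {e:>5.2f}")
```

The errors come out close to zero in total, which is what a line through the middle of the data should give.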

    SEE NEXT SLIDES


    Slide #40

    Y-hat - Y (error)

    Y-hat = 0.65 + 1.58 X

    When x = 3, actual y = 6 (from data)

    [Scatter plot of y against x, with the data point at x = 3, y = 6 marked]

    Slide #41

    Y-hat - Y (error)

    Y-hat = 0.65 + 1.58 X

    When x = 3, predicted y = 5.39 (from line)

    [Scatter plot of y against x, with the data point (x = 3, y = 6) and the point on the line (x = 3, y-hat = 5.39) marked]

    Slide #42

    Y-hat - Y (error)

    (y - y-hat) is the deviation (error) caused by estimating y from the estimated regression line

    [Scatter plot of y against x, marking x = 3, y = 6 and y-hat = 5.39]

    (y - y-hat) = 6 - 5.39 = .61

    Slide #43

    Least Squares (cont.)

    M. The method of least squares (OLS) gives the line of best fit (it fits the sample data best) by finding values of α-hat & β-hat which minimize

    N. sum of (y - y-hat)²

    O. Also called Error Sum of Squares or

    P. ESS or SSE

    Slide #44

    Least Squares (cont.)

    The aim of OLS is to pick values for α-hat & β-hat so that the sum of all (y - y-hat)² is as small as possible across entire sample of x & y values

    [Scatter plot of the data with the fitted line; y on the vertical axis, x on the horizontal axis]

    Slide #45

    Least Squares (cont.)

    Q. Software contains formulas that calculate values for α-hat & β-hat from the values of the sample's data.
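To see what "as small as possible" means in practice, the sketch below compares the error sum of squares at the least squares estimates with the value for a few other candidate lines on the Celtics data. The candidate lines are arbitrary choices made for illustration.

```python
x = [2, 3, 4.5, 5.5, 7]
y = [3, 6, 8, 10, 11]


def error_sum_of_squares(a, b):
    """Sum of (y - y-hat)^2 for the line y-hat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Least squares estimates from the formulas
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b_ols = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
a_ols = y_bar - b_ols * x_bar

ess_ols = error_sum_of_squares(a_ols, b_ols)
print(f"OLS line:  a = {a_ols:.2f}, b = {b_ols:.2f}, ESS = {ess_ols:.3f}")

# Arbitrary alternative lines (illustrative only): each has a larger ESS
candidates = [(0.0, 1.7), (1.0, 1.5), (0.65, 1.7), (0.3, 1.58)]
for a, b in candidates:
    print(f"Other line: a = {a:.2f}, b = {b:.2f}, ESS = {error_sum_of_squares(a, b):.3f}")
```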

    Slide #46

    Example #1

    A. In the case of profit (y) and expenditures (x) mentioned above, the estimated least squares (OLS) regression line is y-hat = 0.65 + 1.58x.

    B. The 1.58 estimate for β: (positive or negative?) relationship between profit and expenditures (sign on 1.58)

    C. β is marginal effect of x on y

    Slide #47

    Example #1 (cont.)

    y-hat = 0.65 + 1.58x

    D. β always in y's units (per one-unit change in x)

    E. For each extra $1.00 it spends on ads, Celtics are getting $???? of profit. (notes)

    Slide #48

    Example #2

    A. If you estimated a house price (y) and size (x) model, the estimated least squares (OLS) regression line is y-hat = 52.351 + 0.139x.

    B. OR PRICE = 52.351 + 0.139 SQFT

    C. y = Price ($1000s)

    D. x = Size (sq. feet)

    Slide #49

    Example #2 (cont.)

    PRICE = 52.351 + 0.139 SQFT

    E. If size increases by 1 unit (1 sq. ft.), price . . . (direction?) (notes)

    Slide #50

    Example #2 (cont.)

    PRICE = 52.351 + 0.139 SQFT

    G. If size increases by 1 unit (1 sq. ft.), price rises . . . (magnitude?) (notes)
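One way to check the direction and magnitude is to evaluate the estimated line at two sizes that differ by one square foot. The sketch below does that; the 2,000 sq. ft. house is a hypothetical example, and the function name is made up for illustration.

```python
def predicted_price(sqft):
    """Predicted price in $1000s from PRICE = 52.351 + 0.139 SQFT."""
    return 52.351 + 0.139 * sqft


base_size = 2000  # hypothetical house size in sq. feet
change_in_thousands = predicted_price(base_size + 1) - predicted_price(base_size)

# PRICE is measured in $1000s, so convert the change to dollars
change_in_dollars = change_in_thousands * 1000
print(f"Predicted price at {base_size} sq. ft.: ${predicted_price(base_size) * 1000:,.0f}")
print(f"Change for one extra sq. ft.: ${change_in_dollars:,.0f}")
```

Because the line is straight, the change is the same whatever starting size is chosen: it equals the slope, converted from y's units ($1000s) into dollars.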

    Slide #51

    Exercise

    You will next do an exercise to help consolidate many of the concepts you have seen in this introduction to regression.

    Answer the questions on the next slide.

    Form groups and work on the questions for 5 minutes.

    I will then lead a discussion of your answers.


    Slide #52

    1. Does advertising increase profit?

    2. How much does another $100,000 increase profit?

    3. What will our profit be if we spend $800,000 on advertising?

    4. How much will we need to spend on ads to generate $12,000,000 in profit?

    Ad Expenditures (x) 2 3 4.5 5.5 7

    Profit (y) 3 6 8 10 11

    After estimating the regression line y = a + bx, the computer prints these results: a-hat = 0.65, b-hat = 1.58. This means that y-hat = 0.65 + 1.58x.
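For checking your group's answers afterwards, here is a minimal sketch of the arithmetic behind questions 2 to 4, using the rounded estimates printed above. Remember the units: x is in $100,000s and y is in millions of dollars. Note also that $800,000 lies outside the sampled range of ad expenditures, so question 3 is an extrapolation.

```python
a_hat, b_hat = 0.65, 1.58  # rounded estimates from the slide

# Question 2: effect of one more unit of x ($100,000) on profit, in $ millions
extra_profit_millions = b_hat

# Question 3: predicted profit when ad spending is $800,000, i.e. x = 8
profit_at_800k = a_hat + b_hat * 8

# Question 4: solve 12 = a_hat + b_hat * x for x, then convert to dollars
x_needed = (12 - a_hat) / b_hat
ad_dollars_needed = x_needed * 100_000

print(f"Q2: another $100,000 adds about ${extra_profit_millions:.2f} million of profit")
print(f"Q3: predicted profit is about ${profit_at_800k:.2f} million")
print(f"Q4: need x = {x_needed:.2f}, about ${ad_dollars_needed:,.0f} of ads")
```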


    Slide #53

    Intentionally left blank


    Slide #54

    Intentionally left blank


    Slide #55

    Intentionally left blank


    Slide #56

    Intentionally left blank


    Slide #57

    Intentionally left blank


    Slide #58

    Intentionally left blank


    Slide #59

    Intentionally left blank


    Slide #60

    Intentionally left blank


    Slide #61

    Look at regression #1 on your handout

    and tell me the answer.

    Do Large Market NBA Teams Make

    Higher Profits?

    Slide #62

    Dummy Variables

    A. New type of variable

    1. Usually use quantitative variables; continuous

    2. Sometimes variables take small number of values; discrete
    a) Market size
    b) Gender
    c) Season
    d) Marital status (married vs. not), etc.

    Slide #63

    Dummy Variables (cont.)

    B. Will use Qualitative (or Dummy) Variables
    1. Create a special variable that takes a value of
    a) 1 if the unit of observation falls into one category
    b) 0 if the unit falls into the other category

    Slide #64

    Dummy Variables (cont.)

    C. Dummy variable as IV

    PRICE = 52.351 + 0.139 SQFT + 18.52 POOL

    POOL = 1 if house has a pool

    POOL = 0 if no pool

    More later
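As a small preview of how the dummy works, the sketch below evaluates the estimated equation for two otherwise identical houses, one with a pool and one without. The 2,000 sq. ft. size is a hypothetical value chosen for illustration, and PRICE is assumed to be in $1000s as in Example #2.

```python
def predicted_price(sqft, pool):
    """PRICE = 52.351 + 0.139 SQFT + 18.52 POOL, where POOL is 1 (pool) or 0 (no pool)."""
    if pool not in (0, 1):
        raise ValueError("POOL is a dummy variable and must be 0 or 1")
    return 52.351 + 0.139 * sqft + 18.52 * pool


size = 2000  # hypothetical house size in sq. feet
with_pool = predicted_price(size, pool=1)
without_pool = predicted_price(size, pool=0)

# The gap between the two predictions equals the coefficient on POOL
print(f"With pool:    {with_pool:.3f}")
print(f"Without pool: {without_pool:.3f}")
print(f"Difference:   {with_pool - without_pool:.2f}")
```

The difference does not depend on the size chosen: the dummy shifts the whole line up by its coefficient.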

    Slide #65

    Dummy Variables (cont.)

    D. Dummy variable as DV

    BUY= 3.15 + 10.19 INCOME - 1.5 PRICE

    BUY = 1 if buy the house

    BUY = 0 if don't buy house

    More later


    Slide #66

    VII. Regression Hypotheses Tests

    Look at regression #1 on your handout

    Questions:

    1. Is there relationship among DV & IVs? Profit and market size

    2. How well does my model fit the data? How well does market size explain profit?

    3. Which IVs affect DV? Does market size influence profit?

    Note: same as #1 when have only one IV


    Slide #67

    VIII. Testing Entire Regression (F test)

    A. Always first test

    B. Tests entire model

    C. If model fails this test
    1.
    2. Model no good
    3. Back to drawing board

    Slide #68

    F test (cont.)

    D. Step #1

    1. HO: no relationship between y & xs

    2. OR HO: β1 = β2 = . . . = βk = 0 (β = 0)

    3. HA: is relationship between y & xs

    4. OR HA: at least 1 of βs ≠ 0 (β ≠ 0)

    E. Step #2
    1. Software prints test statistic

    y = α + β1x1 + β2x2 + β3x3 + ε

    Slide #69

    F test (cont.)

    F. Step #3

    1. Software prints a p-value

    2. p-value is probability of making a Type I error (rejecting a true HO)

    G. Step #4

    1. Reject HO if p-value ≤ 5% (or 1%)

    2. Do not reject HO if p-value > 5% (or 1%)

    Slide #70

    F test (cont.)

    H. Rule

    1. Large F-statistics are better

    I. Logic
    1. F-statistic is ratio

    explained variance / unexplained variance

    Slide #71

    F test (cont.)

    explained variance / unexplained variance

    3. If numerator = 0 (small)
    a) F = 0 (small)
    b) Model terrible

    4. If numerator large vs. denominator
    a) F large
    b) Model good
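The ratio can be sketched numerically on the Celtics data. The slides do not spell out the degrees of freedom, so this example assumes the usual form F = [RSS/(K-1)] / [ESS/(N-K)], where RSS is the regression (explained) sum of squares, ESS is the error sum of squares, and K = 2 coefficients.

```python
x = [2, 3, 4.5, 5.5, 7]
y = [3, 6, 8, 10, 11]
N, K = len(x), 2  # assumption: K counts the intercept and the slope

x_bar, y_bar = sum(x) / N, sum(y) / N
b_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
a_hat = y_bar - b_hat * x_bar
y_hat = [a_hat + b_hat * xi for xi in x]

# Explained and unexplained sums of squares
rss = sum((yh - y_bar) ** 2 for yh in y_hat)          # regression sum of squares
ess = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error sum of squares

# Assumed textbook form: each sum of squares divided by its degrees of freedom
explained_variance = rss / (K - 1)
unexplained_variance = ess / (N - K)
f_statistic = explained_variance / unexplained_variance

print(f"F = {f_statistic:.2f}")
```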

    Slide #72

    F test (cont.)

    Example

    PROFIT = -20.8 + .185 WINS + 11.2 MKTSIZE

    F = 21.10 (0.002)

    Do MKTSIZE & WINS jointly influence

    PROFIT?

    See regression output for F statistic.


    Slide #73

    IX. Testing Regression: Goodness of Fit

    A. This is about measuring how well the estimated regression line fits the data.

    B. The OLS estimated regression line fits the data better than other lines, but how well does it fit?

    Slide #74

    Goodness of Fit (cont.)

    C. To measure how well the regression line fits the data (its goodness of fit), use the calculated value called R².

    D. R² tells the percentage of the variation among y-values in your sample data that is explained by variation of the x-values that are in your regression.

    Slide #75

    Goodness of Fit (cont.)

    E. R² = explained variation / total variation
    = RSS / TSS
    = 1 - (ESS / TSS)

    (recall: TSS = RSS + ESS)

    F. Use this second, after F test
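The decomposition TSS = RSS + ESS and the two ways of writing R² can be verified on the Celtics data. This is a minimal sketch; as on the slide, RSS here means the regression (explained) sum of squares and ESS the error sum of squares.

```python
x = [2, 3, 4.5, 5.5, 7]
y = [3, 6, 8, 10, 11]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
b_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
a_hat = y_bar - b_hat * x_bar
y_hat = [a_hat + b_hat * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)               # total variation in y
rss = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
ess = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation

r_squared = rss / tss
r_squared_alt = 1 - ess / tss  # same number, written the second way

print(f"TSS = {tss:.3f}, RSS = {rss:.3f}, ESS = {ess:.3f}")
print(f"R^2 = {r_squared:.3f}")
```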

    Slide #76

    Goodness of Fit (cont.)

    R² = explained variation / total variation

    Questions about R²

    1. What is max R² percentage?

    2. What is min R² percentage?

    3. The closer R² is to ??, the better the fit of the estimated regression line to the data

    4. The closer R² is to ??, the worse the fit of the estimated regression line to the data

    Slide #77

    Questions about R² (cont.)

    R² = explained variation / total variation

    5. Which is better: R² = .25 or R² = .89?

    6. Why?

    Slide #78

    Characteristics of R²

    A. 0 ≤ R² ≤ 1.00

    (same as 0% ≤ R² ≤ 100%)

    B. By the way, R² = 1.00 is perfect correlation,

    C. either positive or negative

    D. You can't tell from the R² alone.

    Slide #79

    Characteristics of R² Review

    E. 0 ≤ R² ≤ 1.00

    (same as 0% ≤ R² ≤ 100%)

    F. The closer R² is to 1, the better the fit of the estimated regression line to the data

    G. The closer R² is to 0, the worse the fit of the estimated regression line to data

    Slide #80

    Goodness of Fit (cont.)

    G. A high R² does not mean that the xs cause y. It means that xs and y are highly correlated.

    H. Example:

    if R² = 0.89, means 89% of variation of y is explained by the regression line (see next two slides)

    Slide #81

    [Two scatter plots of y against x, each with a fitted line]

    Better fit of regression line to data: R² = 0.89

    Worse fit of regression line to data: R² = 0.25

    Slide #82

    [Two scatter plots of y against x, labeled A and B]

    Which one has R² = 0.89? Which one has R² = 0.10?

    Slide #83

    Example

    PROFIT = -20.8 + .185 WINS + 11.2 MKTSIZE

    with R² = 87.6%

    What % of (variation or variance?) in PROFIT is explained by MKTSIZE & WINS?

    How good is this estimated regression line?

    See regression output for R².

    Slide #84

    X. Testing Regression: t-tests

    A. After R² (third)

    B. Tests individual IVs

    C. Questions
    1. Does x2 affect y?
    2. Does x3 affect y?
    3. Etc.

    NOTE: t-tests apply to ALL IVs incl. dummy variables

    Slide #85

    t-tests (cont.)

    D. If any xk does not affect y

    1. Do NOT automatically drop xk

    2. more later


    Slide #86

    t-tests (cont.)

    E. Step #1

    1. HO: βk = 0
    2. Means no relationship between y & xk
    3. HA: βk ≠ 0
    4. Means is relationship between y & xk

    y = α + β1x1 + β2x2 + β3x3 + ε

    Slide #87

    t-tests (cont.)

    F. Step #2
    1. Software prints test statistic

    G. Step #3
    1. Software prints a p-value
    2. p-value is probability of making a Type I error

    H. Step #4
    1. Reject HO if p-value ≤ 5% (or 1%)
    2. Do not reject HO if p-value > 5% (or 1%)

    Slide #88

    t-tests (cont.)

    I. Formula

    1. t(n-K) = [(βk-hat) - (βk hypothesized)] / s(βk-hat)
    2. since usually HO: βk = 0
    3. t(n-K) = (βk-hat) / s(βk-hat)

    J. Rules

    1. Large t statistics (in absolute value) are better
    2. Small p-values are better
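The formula can be tried on the Celtics slope. This is a sketch under two assumptions not stated on this slide: K = 2 estimated coefficients, and s(β-hat) = square root of s²/SXX with SXX taken as the sum of squared deviations of x.

```python
import math

x = [2, 3, 4.5, 5.5, 7]
y = [3, 6, 8, 10, 11]
N, K = len(x), 2  # assumption: intercept and slope

x_bar, y_bar = sum(x) / N, sum(y) / N
sxx = sum((xi - x_bar) ** 2 for xi in x)  # assumed meaning of SXX
beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
alpha_hat = y_bar - beta_hat * x_bar

# Standard error of beta-hat: sqrt(s^2 / SXX), with s^2 = ESS / (N - K)
ess = sum((yi - (alpha_hat + beta_hat * xi)) ** 2 for xi, yi in zip(x, y))
se_beta = math.sqrt((ess / (N - K)) / sxx)

# t statistic for HO: beta = 0, with N - K degrees of freedom
beta_hypothesized = 0.0
t_statistic = (beta_hat - beta_hypothesized) / se_beta

print(f"t({N - K}) = {t_statistic:.2f}")
```

With a single IV, the square of this t statistic equals the F statistic for the whole regression, which is why the two tests agree in the two-variable model.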


    Slide #89

    t-tests (cont.)

    K. Limitations of t-test

    1. Does not prove theoretical validity
    a) Only shows statistically significant correlation
    b) y-hat = 10.9 + 3.2 x
               (19.5)  (13.9)
               (.0001) (.0001)
    c) appears that x causes y

    Slide #90

    t-tests (cont.)

    d) actually
    (1) y = Britain's overall business activity
    (2) x = sunspot activity
    e) actual regression from 19th century!

    2. Does not prove causality
    a) Only shows statistically significant correlation
    b) See above
    c) You impose causality by choices of DV & IVs

    Slide #91

    t-tests (cont.)

    3. Does not show which x has biggest impact on y
    a) Common mistake
    (1) "Biggest t-statistic means that x has biggest impact on y": FALSE
    b) size of t shows prob. of type I error
    (1) larger t value: less chance of type I error
    (2) larger t value: more confidence relationship exists

    Slide #92

    t-tests (cont.)

    Suppose

    PROFIT = -20.8 + .185 WINS + 11.2 MKTSIZE
             (-4.02)  (1.41)      (4.51)
             (0.007)  (0.207)     (0.004)

    Is α = 0 or α ≠ 0?

    Is β1 = 0 or β1 ≠ 0?

    Is β2 = 0 or β2 ≠ 0?

    See regression output for t-statistics.
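The Step #4 rule can be applied mechanically to this output. The sketch below assumes the second row of numbers in parentheses are the p-values that go with the t-statistics in the first row, and uses the 5% level; the dictionary layout is just for illustration.

```python
# Coefficient -> (t-statistic, p-value), read from the slide.
# Assumption: the second parenthesized row holds the p-values.
results = {
    "intercept": (-4.02, 0.007),
    "WINS": (1.41, 0.207),
    "MKTSIZE": (4.51, 0.004),
}

significance_level = 0.05  # the 5% level from Step #4

decisions = {}
for name, (t_stat, p_value) in results.items():
    # Reject HO (coefficient = 0) when the p-value is at or below the chosen level
    decisions[name] = "reject HO" if p_value <= significance_level else "do not reject HO"
    print(f"{name:<10} t = {t_stat:>5.2f}  p = {p_value:.3f}  ->  {decisions[name]}")
```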


    Slide #93

    Evaluate the Model

    When you are asked to evaluate the model:

    F statistic

    R²

    Each t-statistic

    Overall evaluation: Weak, fair, . . ., Yippee!

    See the handout titled Evaluate the Model

    Slide #94

    XI. Testing Regression: Akaike Information Criteria

    Number printed by many statistical applications along with rest of regression output

    Simple rule: the lower the value of AIC, the better the model

    Slide #95

    XII. Testing Regression

    The most difficult way to test regression model is by applying . . .

    COMMON SENSE!!

    Remember: GNP = 10.9 + 3.2 sunspots

    Does that make sense?

    Slide #96

    XIII. Nonlinear Relationships

    A. Introduction

    1. So far: y = α + βx + ε

    2. Linear relationship y & x

    3. Many nonlinear relationships in world

    4. Can model those as well

    5. More later

    [Scatter plot of data points]

    Slide #97

    Profit   Market Size   Wins
     33.4         3         25
     22.0         3         63
     16.0         3         55
      8.7         2         42
      5.4         2         57
      4.7         2         46
     -1.5         1         39
     -2.1         1         13
     -4.0         1         28

    NOTE: NBA market size = 3 for large, 2 = medium, 1 = small

    NOTE: don't use this approach for measuring market size.

    Use a better measure like population.

    Do More Wins Generate Higher Profits in the NBA?

    See regression output and EVALUATE THIS MODEL.


    Slide #98

    QB Rating Handout

    Evaluate this regression output

    Tom Brady, New England Patriots

    MVP, Super Bowl XXXVIII

    Exercise

    Two-variable Regression Exercise #1