Multiple Linear Regression I


Post on 02-Jun-2018


  • 8/11/2019 Multiple Linear Regression I


EC114 Introduction to Quantitative Economics

17. Multiple Linear Regression I

    Marcus Chambers

Department of Economics, University of Essex

    28 February/01 March 2012


    Outline

    1 Introduction

    2 Ordinary least squares with multiple regressors

    3 The Classical Multiple Regression Model

Reference: R. L. Thomas, Using Statistics in Economics, McGraw-Hill, 2005, sections 13.1 and 13.2.


Introduction

So far we have been concerned with regression models involving a single explanatory variable, $X$, of the form

$$Y_i = \alpha + \beta X_i + \epsilon_i, \quad i = 1, \ldots, n,$$

where $\alpha$ and $\beta$ are the unknown population regression parameters and $\epsilon_i$ denotes a random disturbance.

We have also considered the set of Classical assumptions on $X$ and $\epsilon$ that imply that the ordinary least squares (OLS) estimators of $\alpha$ and $\beta$, denoted $a$ and $b$, have good sampling properties.

In particular, the OLS estimators are best linear unbiased estimators (BLUE) and, under normality, efficient.


In addition to being unbiased, the OLS estimators have the smallest variance among linear unbiased estimators, and if we assume normality they have the smallest variance among all unbiased estimators.

The OLS estimators therefore provide a good basis for making inferences about $\alpha$ and $\beta$.

For example, we can use the results that

$$\frac{a - \alpha}{s_a} \sim t_{n-2} \quad \text{and} \quad \frac{b - \beta}{s_b} \sim t_{n-2},$$

where $s_a$ and $s_b$ are the estimated standard errors of $a$ and $b$, respectively, to conduct hypothesis tests using the $t$-distribution.


However, many relationships that we study in Economics involve more than the two variables $Y$ and $X$.

For example, the demand for a good ($Q^d$) may depend not only on its own price ($P_1$) but also on consumers' income ($M$) and the prices of other goods, both substitutes and complements ($P_2, P_3, \ldots$), e.g.

$$Q^d = f(P_1, M, P_2, P_3, \ldots).$$

We therefore need to extend our regression model to include additional explanatory variables (regressors) while, at the same time, keeping the desirable properties of the OLS estimators in the two-variable model.

Fortunately it is possible to apply OLS to regressions with multiple explanatory variables, and the optimality properties carry over under suitable assumptions.


Ordinary least squares with multiple regressors

We begin by assuming that a linear relationship exists between a dependent variable $Y$ and $k - 1$ explanatory variables, $X_2, X_3, \ldots, X_k$:

$$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \ldots + \beta_k X_k + \epsilon,$$

where $\epsilon$ is a random disturbance and the $\beta_j$ ($j = 1, \ldots, k$) are constants.

Note that it is common to denote the first explanatory variable by $X_2$ rather than $X_1$.

In fact, it is convenient to interpret the intercept $\beta_1$ as the coefficient on a variable $X_1$ that always takes the value 1.
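In software, the "variable that always takes the value 1" is literally a column of ones in the data matrix. The NumPy sketch below, with invented values for $X_2$ and $X_3$, builds that matrix: each row is one observation and the first column plays the role of $X_1$.

```python
import numpy as np

# Hypothetical sample of n = 5 observations on two regressors (illustrative values only)
x2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x3 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

# X1 equals 1 for every observation; its coefficient is the intercept beta_1
x1 = np.ones_like(x2)

# Each row is one observation (X1i, X2i, X3i); each column is one variable
X = np.column_stack([x1, x2, x3])
print(X.shape)  # (5, 3): n rows, k columns
```

With this convention every coefficient, including the intercept, is treated the same way, which is what makes the compact matrix treatment of OLS possible.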


Assuming that $E(\epsilon) = 0$ and taking the $X$ values as given, we obtain

$$E(Y) = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \ldots + \beta_k X_k;$$

this is the population regression equation.

Each coefficient $\beta_j$ represents the effect on $E(Y)$ of a unit change in $X_j$, holding all other $X$ variables constant.

For example, $\beta_2$ measures the change in $E(Y)$ when $X_2$ changes by one unit; it is the partial derivative $\partial E(Y)/\partial X_2$.

The $\beta_j$ coefficients are population parameters; their values are unknown, and we aim to estimate them from a sample of observations on $Y$ and the $X$s.
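The "holding all other $X$ variables constant" interpretation can be checked numerically. The sketch below uses invented parameter values (not estimates from any data) and a hypothetical helper `expected_y`: raising $X_2$ by one unit with $X_3$ fixed changes $E(Y)$ by exactly $\beta_2$, whatever value $X_3$ is held at.

```python
# Hypothetical population parameters (made up for illustration, not estimates)
beta1, beta2, beta3 = 0.5, 0.2, -0.01

def expected_y(x2, x3):
    """Population regression equation E(Y) = beta1 + beta2*X2 + beta3*X3."""
    return beta1 + beta2 * x2 + beta3 * x3

# Raise X2 by one unit while holding X3 fixed at an arbitrary value
x2, x3 = 10.0, 5.0
change = expected_y(x2 + 1.0, x3) - expected_y(x2, x3)
print(round(change, 10))  # 0.2, i.e. beta2
```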


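We shall use the following notation:

$Y_i$: observation $i$ on the dependent variable $Y$;

$X_{ji}$: observation $i$ on explanatory variable $X_j$;

$\epsilon_i$: the (unobserved) value of $\epsilon$ for observation $i$.

For example, observation 6 consists of $Y_6, X_{26}, X_{36}, \ldots, X_{k6}$; these values are related by

$$Y_6 = \beta_1 + \beta_2 X_{26} + \beta_3 X_{36} + \ldots + \beta_k X_{k6} + \epsilon_6.$$

For a general observation $i$ we have

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \ldots + \beta_k X_{ki} + \epsilon_i, \quad i = 1, \ldots, n.$$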


Suppose we estimate $\beta_1, \ldots, \beta_k$ using $b_1, \ldots, b_k$; for the time being we shall not specify how the estimates are obtained.

The sample regression equation corresponding to $b_1, \ldots, b_k$ is

$$\hat{Y}_i = b_1 + b_2 X_{2i} + b_3 X_{3i} + \ldots + b_k X_{ki}, \quad i = 1, \ldots, n;$$

the $\hat{Y}_i$ ($i = 1, \ldots, n$) are the fitted (or predicted) values of $Y$.

The difference between $Y$ and $\hat{Y}$ is, as before, called a residual, and is denoted

$$e_i = Y_i - \hat{Y}_i, \quad i = 1, \ldots, n.$$

We can also write

$$Y_i = b_1 + b_2 X_{2i} + b_3 X_{3i} + \ldots + b_k X_{ki} + e_i, \quad i = 1, \ldots, n.$$
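Given any candidate values $b_1, \ldots, b_k$, the fitted values and residuals are simple to compute. The NumPy sketch below uses a small invented data set and arbitrary coefficients, purely to show the bookkeeping; nothing here is an estimate.

```python
import numpy as np

# Hypothetical data: n = 4 observations, k = 3 (intercept plus X2 and X3)
y = np.array([3.1, 4.0, 6.2, 6.9])
X = np.column_stack([np.ones(4), [1.0, 2.0, 3.0, 4.0], [0.5, 0.1, 0.9, 0.4]])

# Arbitrary candidate estimates b1, b2, b3; how they are chosen is left open, as in the text
b = np.array([1.0, 1.5, 0.2])

y_hat = X @ b   # fitted values: Ŷ_i = b1 + b2*X2i + b3*X3i
e = y - y_hat   # residuals:     e_i = Y_i - Ŷ_i

# By construction Y_i = b1 + b2*X2i + b3*X3i + e_i holds for every observation
print(np.allclose(y, y_hat + e))  # True
```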


How do we choose $b_1, \ldots, b_k$?

The method of ordinary least squares (OLS) chooses the estimates so as to minimise the sum of squared residuals, $S = \sum e_i^2$.

We can express $e_i$ explicitly in terms of $b_1, \ldots, b_k$:

$$e_i = Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i} - \ldots - b_k X_{ki}, \quad i = 1, \ldots, n.$$

It follows that the objective function is

$$S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i} - \ldots - b_k X_{ki}\right)^2.$$
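The objective function can be coded directly. This sketch (invented data; `sum_squared_residuals` is a hypothetical helper name) evaluates $S$ at two arbitrary candidate coefficient vectors. OLS would pick the vector that makes $S$ as small as possible.

```python
import numpy as np

def sum_squared_residuals(b, y, X):
    """OLS objective S(b) = sum_i (Y_i - b1 - b2*X2i - ... - bk*Xki)^2."""
    e = y - X @ b
    return float(e @ e)

# Hypothetical data, used only to evaluate S at candidate coefficient vectors
y = np.array([3.1, 4.0, 6.2, 6.9])
X = np.column_stack([np.ones(4), [1.0, 2.0, 3.0, 4.0], [0.5, 0.1, 0.9, 0.4]])

print(sum_squared_residuals(np.array([1.0, 1.5, 0.2]), y, X))
print(sum_squared_residuals(np.zeros(3), y, X))  # with all b_j = 0, S is just the sum of Y_i^2
```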


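In order to minimise $S$ with respect to $b_1, \ldots, b_k$ we must:

(i) partially differentiate $S$ with respect to each $b_j$;

(ii) set the $k$ partial derivatives equal to zero and solve for $b_1, \ldots, b_k$.

In step (i) we obtain

$$\frac{\partial S}{\partial b_1}, \ \frac{\partial S}{\partial b_2}, \ \ldots, \ \frac{\partial S}{\partial b_k}.$$

In step (ii) we equate these to zero and solve the following $k$ equations jointly:

$$\frac{\partial S}{\partial b_1} = 0, \quad \frac{\partial S}{\partial b_2} = 0, \quad \ldots, \quad \frac{\partial S}{\partial b_k} = 0.$$

As $k$ gets larger this becomes more and more difficult!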
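For reference, carrying out step (i) explicitly gives a standard result (stated here as a sketch, with $X_{1i} = 1$ for the intercept): by the chain rule each partial derivative is $-2$ times a weighted sum of the residuals, so step (ii) reduces to the $k$ equations on the second line.

```latex
\begin{aligned}
\frac{\partial S}{\partial b_j}
  &= -2\sum_{i=1}^{n} X_{ji}\,\bigl(Y_i - b_1 - b_2 X_{2i} - \cdots - b_k X_{ki}\bigr)
   = -2\sum_{i=1}^{n} X_{ji}\, e_i, \qquad j = 1, \ldots, k,\\[4pt]
\frac{\partial S}{\partial b_j} = 0
  &\iff \sum_{i=1}^{n} X_{ji}\, e_i = 0, \qquad j = 1, \ldots, k .
\end{aligned}
```

For $j = 1$ this says the residuals sum to zero. Collecting all $k$ equations in matrix notation gives the well-known "normal equations" $X'X\,b = X'Y$, where $X$ is the $n \times k$ matrix of regressor values whose first column is all ones.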


For an arbitrary value of $k$ it is possible to write the solution compactly in terms of matrices and vectors. In practice we rely on computer software to compute OLS estimates based on such a representation of the solution.
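The matrix form is not written out here; the standard textbook solution is $b = (X'X)^{-1}X'Y$, with $X$ the $n \times k$ matrix whose first column is ones. The NumPy sketch below applies it to simulated data (all values invented), cross-checks it against NumPy's general least-squares routine `np.linalg.lstsq`, and confirms that the residuals satisfy the first-order conditions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated sample (all values invented): n = 30 observations, k = 3 coefficients
n = 30
x2 = rng.uniform(0, 10, n)
x3 = rng.uniform(0, 20, n)
X = np.column_stack([np.ones(n), x2, x3])   # first column plays the role of X1 = 1
beta_true = np.array([1.0, 0.5, -0.2])
y = X @ beta_true + rng.normal(0.0, 1.0, n)

# OLS in matrix form: solve (X'X) b = X'y instead of forming the inverse explicitly
b = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with NumPy's general least-squares routine
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# First-order conditions: sum_i X_ji * e_i = 0 for every j (up to rounding error)
e = y - X @ b
print("b =", b)
print("agrees with lstsq:", np.allclose(b, b_lstsq))
print("max |X'e|:", np.abs(X.T @ e).max())
```

Solving the linear system rather than computing $(X'X)^{-1}$ explicitly is the usual numerically safer choice; statistical packages such as Stata use their own (typically more elaborate) routines.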

Example. We return to the money demand example first encountered in Lecture 11.

Our two-variable regression of money stock ($Y$) on GDP ($X_2$) yielded

$$\hat{Y} = 0.0212 + 0.1749 X_2,$$

based on our sample of 30 countries in 1985.

Suppose we also add the rate of interest variable, $X_3$, to the regression; we obtain the following output in Stata:


. regress m g ir

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  2,    27) =   47.03
       Model |  20.5135791     2  10.2567896           Prob > F      =  0.0000
    Residual |  5.88865732    27  .218098419           R-squared     =  0.7770
-------------+------------------------------           Adj R-squared =  0.7604
       Total |  26.4022364    29  .910421946           Root MSE      =  .46701

------------------------------------------------------------------------------
           m |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           g |    .172615   .0183198     9.42   0.000     .1350258    .2102042
          ir |  -.0006758   .0008844    -0.76   0.451    -.0024904    .0011388
       _cons |   .0569582    .125639     0.45   0.654    -.2008317    .3147481
------------------------------------------------------------------------------

The regression results, including standard errors in parentheses, can be represented as:

$$\hat{Y} = \underset{(0.1256)}{0.0570} + \underset{(0.0183)}{0.1726}\,X_2 - \underset{(0.000884)}{0.000676}\,X_3,$$

with $R^2 = 0.777$.
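The summary statistics in the top-right of the Stata table follow from the sums of squares on its left. The short Python check below uses only numbers printed in the `regress m g ir` output and the standard definitions of $R^2$, adjusted $R^2$, Root MSE and the $F$ statistic (these formulas are not derived in this lecture).

```python
import math

# Sums of squares and degrees of freedom copied from the `regress m g ir` output
ss_model, df_model = 20.5135791, 2
ss_resid, df_resid = 5.88865732, 27
ss_total, df_total = 26.4022364, 29

ms_model = ss_model / df_model   # Model MS
ms_resid = ss_resid / df_resid   # Residual MS

r_squared = ss_model / ss_total                        # R-squared
adj_r_squared = 1 - ms_resid / (ss_total / df_total)   # Adj R-squared
root_mse = math.sqrt(ms_resid)                         # Root MSE
f_stat = ms_model / ms_resid                           # F(2, 27)

print(f"R2 = {r_squared:.4f}, adj R2 = {adj_r_squared:.4f}, "
      f"Root MSE = {root_mse:.5f}, F = {f_stat:.2f}")
```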


The magnitudes of the estimated coefficients differ substantially, with the coefficient on $X_3$ appearing to be very small.

But this reflects the units in which $X_3$ is measured: an interest rate of 16%, for example, is recorded as 16 rather than 0.16.

If we had used the latter units of measurement (i.e. divided all observations on $X_3$ by 100), then the estimated coefficient would have been 100 times larger.
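A quick simulation makes the units point concrete. The NumPy sketch below (invented data, loosely styled on the money demand example; `ols` is a hypothetical helper) fits the same regression twice, once with the interest rate in percentage points and once divided by 100. Only the interest-rate coefficient changes, and it is multiplied by exactly 100.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data (invented): the interest rate is stored in percentage points, so 16% is 16
n = 30
gdp = rng.uniform(1, 10, n)
rate_pct = rng.uniform(2, 30, n)
y = 0.05 + 0.17 * gdp - 0.0007 * rate_pct + rng.normal(0.0, 0.5, n)

def ols(X, y):
    """OLS coefficient vector via NumPy's least-squares solver."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
b_pct = ols(np.column_stack([ones, gdp, rate_pct]), y)          # 16% recorded as 16
b_frac = ols(np.column_stack([ones, gdp, rate_pct / 100]), y)   # 16% recorded as 0.16

print("interest-rate coefficient, percent units: ", b_pct[2])
print("interest-rate coefficient, fraction units:", b_frac[2])
print("other coefficients unchanged:", np.allclose(b_pct[:2], b_frac[:2]))
```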

Remember that the statistical significance of a variable is tested using a $t$-test and is not judged by the magnitude of the estimated coefficient!

If we add another regressor, $X_4$ (the rate of price inflation), we obtain:


. regress m g ir pi

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  3,    26) =   30.70
       Model |  20.5893701     3  6.86312337           Prob > F      =  0.0000
    Residual |  5.81286631    26  .223571781           R-squared     =  0.7798
-------------+------------------------------           Adj R-squared =  0.7544
       Total |  26.4022364    29  .910421946           Root MSE      =  .47283

------------------------------------------------------------------------------
           m |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           g |   .1703745   .0189433     8.99   0.000     .1314361    .2093129
          ir |  -.0001693   .0012483    -0.14   0.893    -.0027353    .0023967
          pi |   -.002197   .0037733    -0.58   0.565    -.0099531    .0055592
       _cons |   .0893538   .1388419     0.64   0.525    -.1960399    .3747475
------------------------------------------------------------------------------

These regression results, including standard errors in parentheses, can be represented as:

$$\hat{Y} = \underset{(0.1388)}{0.0894} + \underset{(0.0189)}{0.1704}\,X_2 - \underset{(0.001248)}{0.000169}\,X_3 - \underset{(0.0038)}{0.0022}\,X_4,$$

with $R^2 = 0.7798$.


We could also carry out the estimations using logarithms of the variables; for example

. regress lm lg lir

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  2,    27) =  175.53
       Model |  59.8192409     2  29.9096204           Prob > F      =  0.0000
    Residual |  4.60058503    27  .170392038           R-squared     =  0.9286
-------------+------------------------------           Adj R-squared =  0.9233
       Total |  64.4198259    29  2.22137331           Root MSE      =  .41279

------------------------------------------------------------------------------
          lm |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          lg |   1.026927   .0570772    17.99   0.000     .9098146     1.14404
         lir |  -.2486999   .0671987    -3.70   0.001    -.3865802   -.1108195
       _cons |  -1.248211   .1991953    -6.27   0.000    -1.656926   -.8394964
------------------------------------------------------------------------------


The logarithmic results can be represented as

$$\widehat{\ln(Y)} = \underset{(0.1992)}{-1.2482} + \underset{(0.0571)}{1.0269}\,\ln(X_2) - \underset{(0.0672)}{0.2487}\,\ln(X_3),$$

with $R^2 = 0.9286$, where figures in parentheses are standard errors.

The estimated coefficients now have the interpretation of elasticities.

For example, the income elasticity of the demand for money is estimated to be 1.0269, while the interest rate elasticity of money demand is estimated as $-0.2487$.
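To see what an elasticity means numerically, the sketch below takes the two estimated elasticities and computes the percentage change in money demand implied by a 1% rise in GDP, or a 1% relative rise in the interest rate (e.g. from 10% to 10.1%), holding the other regressor fixed. For small changes the answer is close to the elasticity itself; the exact power-law form used in `implied_pct_change` (a hypothetical helper) follows from the log-log specification.

```python
# Estimated elasticities from the logarithmic regression (lg and lir coefficients, rounded)
income_elasticity = 1.0269
interest_elasticity = -0.2487

def implied_pct_change(elasticity, pct_change_x):
    """% change in Y when X changes by pct_change_x %, other regressors held fixed.

    In ln(Y) = c + elasticity * ln(X) + ..., multiplying X by (1 + p/100)
    multiplies Y by (1 + p/100) ** elasticity.
    """
    return 100 * ((1 + pct_change_x / 100) ** elasticity - 1)

print(f"1% rise in GDP:           {implied_pct_change(income_elasticity, 1.0):.4f}% change in money")
print(f"1% rise in interest rate: {implied_pct_change(interest_elasticity, 1.0):.4f}% change in money")
```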

However, in order to conduct formal hypothesis tests, we need to know the sampling properties of the OLS estimators, and to do that we need to make some assumptions...


The Classical Multiple Regression Model


Just as in the two-variable regression model, the OLS estimators in the multiple regression model are subject to sampling variability.

The properties of the OLS estimators and their distributions depend on the conditions under which they are obtained, i.e. the assumptions made.

We have already studied the assumptions of the two-variable Classical model, and the Classical multiple regression model is a straightforward extension of the two-variable case.

The assumptions we need to make concern the explanatory variables $X_2, \ldots, X_k$ and the error term $\epsilon$.

As before, we shall focus on the small-sample properties of the estimators and ignore large-sample ($n \to \infty$) properties.


The assumptions concerning the regressors are as follows:

Assumptions concerning the explanatory variables

IA (non-random): $X_2, \ldots, X_k$ are non-stochastic;

IB (fixed): the values of $X_2, \ldots, X_k$ are fixed in repeated samples;

ID (no collinearity): there exist no exact linear relationships between the sample values of any two or more of the explanatory variables.

Note that Assumption IC, used in Thomas, is a large-sample assumption and has been omitted here.


Assumptions IA (non-random) and IB (fixed) are identical to those in the two-variable model but are now applied to all regressors.

They mean that $X_2, \ldots, X_k$ are not random variables and that the same values would appear in each sample if it were possible to conduct repeated sampling.

The new assumption is ID (no collinearity), which has no equivalent in the two-variable model.

It is included in order to rule out the possibility of what is called perfect multicollinearity, which we will study in more detail in Lecture 18.

For now, simply note that the assumption rules out the possibility that, for example, $X_{3i} = 5 + 2X_{2i}$ for all $i$.

If Assumption ID is violated then all estimation methods, including OLS, are infeasible: the coefficients cannot be estimated uniquely.
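A minimal NumPy sketch of what Assumption ID rules out, using invented regressor values that satisfy the example $X_{3i} = 5 + 2X_{2i}$ exactly. Together with the column of ones, the regressor matrix loses full column rank, so (by a standard linear-algebra result) $X'X$ cannot be inverted and the OLS first-order conditions have no unique solution. The rank tolerance `tol=1e-8` is an illustrative choice.

```python
import numpy as np

# Hypothetical regressor values chosen so that X3 = 5 + 2*X2 exactly (violates Assumption ID)
x2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x3 = 5 + 2 * x2
X = np.column_stack([np.ones_like(x2), x2, x3])   # columns: X1 (= 1), X2, X3

# The exact relationship means 5*X1 + 2*X2 - X3 = 0 for every observation
print(X @ np.array([5.0, 2.0, -1.0]))

# So the k = 3 columns span only a 2-dimensional space and X'X is singular
print(np.linalg.matrix_rank(X, tol=1e-8), "of", X.shape[1])

# For contrast, a third regressor with no exact linear link to X2 gives full column rank
x3_ok = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
X_ok = np.column_stack([np.ones_like(x2), x2, x3_ok])
print(np.linalg.matrix_rank(X_ok, tol=1e-8), "of", X_ok.shape[1])
```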


The assumptions concerning $\epsilon$ are the same as in the two-variable model.

For completeness they are repeated below:

Assumptions concerning the disturbances

IIA (zero mean): $E(\epsilon_i) = 0$ for all $i$;

IIB (constant variance): $V(\epsilon_i) = \sigma^2 =$ constant for all $i$;

IIC (zero covariance): $\text{Cov}(\epsilon_i, \epsilon_j) = 0$ for all $i \neq j$;

IID (normality): each $\epsilon_i$ is normally distributed.

These assumptions govern the properties of the random part of the model.

Given that $X_2, \ldots, X_k$ are fixed, they therefore govern the variation in $Y$ in repeated samples.


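Assumption IIA (zero mean) implies that the average effect of $\epsilon$ in repeated samples is zero, and the value of $Y$, on average, is:

$$\begin{aligned} E(Y_i) &= E(\beta_1 + \beta_2 X_{2i} + \ldots + \beta_k X_{ki} + \epsilon_i) \\ &= \beta_1 + \beta_2 X_{2i} + \ldots + \beta_k X_{ki} + E(\epsilon_i) \\ &= \beta_1 + \beta_2 X_{2i} + \ldots + \beta_k X_{ki}, \quad i = 1, \ldots, n, \end{aligned}$$

because $E(\epsilon_i) = 0$ under IIA.

Note that $E(Y_i)$ is not the same for each $i$ but depends on $X_{2i}, \ldots, X_{ki}$, which are not constant throughout the sample (if they were constant they would violate Assumption ID).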


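Recall that combining IIA (zero mean), IIB (constant variance) and IID (normality) gives

$$\epsilon_i \sim N(0, \sigma^2), \quad i = 1, \ldots, n.$$

Note that

$$Y_i - E(Y_i) = Y_i - \beta_1 - \beta_2 X_{2i} - \ldots - \beta_k X_{ki} = \epsilon_i;$$

this implies that

$$V(Y_i) = E\left[(Y_i - E(Y_i))^2\right] = E(\epsilon_i^2) = V(\epsilon_i) = \sigma^2,$$

which in turn implies that

$$Y_i \sim N\left(\beta_1 + \beta_2 X_{2i} + \ldots + \beta_k X_{ki},\ \sigma^2\right), \quad i = 1, \ldots, n.$$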


The implications of the Assumptions for the OLS estimators can be summarised as follows:

Property        Assumptions
Linearity       IA, IB, ID
Unbiasedness    IA, IB, ID, IIA
BLUness         IA, IB, ID, IIA, IIB, IIC
Efficiency      IA, IB, ID, IIA, IIB, IIC, IID
Normality       IA, IB, ID, IIA, IIB, IIC, IID

These are the same as in the two-variable model, except that we now require Assumption ID (no collinearity) in all cases.
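Unbiasedness is a statement about repeated samples with the regressors held fixed. The Monte Carlo sketch below (NumPy, invented parameter values) mimics that thought experiment: $X$ is generated once and reused (IA, IB), has full column rank (ID), and fresh independent normal disturbances with zero mean and constant variance (IIA to IID) are drawn for each of 5,000 samples. Averaging the OLS estimates across samples should give values close to the true $\beta$.

```python
import numpy as np

rng = np.random.default_rng(42)

# Fixed regressors: drawn once, then held fixed across repeated samples
n, reps = 30, 5000
x2 = rng.uniform(0, 10, n)
x3 = rng.uniform(0, 20, n)
X = np.column_stack([np.ones(n), x2, x3])

beta = np.array([1.0, 0.5, -0.2])   # invented population parameters
sigma = 2.0                         # invented error standard deviation

estimates = np.empty((reps, 3))
for r in range(reps):
    # Fresh disturbances each sample: zero mean, constant variance, independent, normal
    eps = rng.normal(0.0, sigma, n)
    y = X @ beta + eps
    estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0]

# Average OLS estimate across repeated samples, to compare with beta
print("mean of b:", estimates.mean(axis=0))
print("beta:     ", beta)
```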


The unbiasedness and normality properties imply that

$$b_j \sim N\left(\beta_j, \sigma^2_{b_j}\right), \quad j = 1, \ldots, k,$$

which can be used as a basis for inference.

For $k > 2$ the variances $\sigma^2_{b_j}$ are complicated functions of the regressors, but all are proportional to $\sigma^2 = V(\epsilon)$.

In order to conduct inference we therefore need to estimate $\sigma^2$.

A generalisation of the estimator in the two-variable model is used for this, and is given by

$$s^2 = \frac{\sum e_i^2}{n - k};$$

it is an unbiased estimator, i.e. $E(s^2) = \sigma^2$.
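As a check on the formula, the residual sum of squares reported for `regress m g ir pi` ($n = 30$ countries, $k = 4$ coefficients including the intercept) can be plugged in. The result should match the Residual MS entry of that Stata output, and its square root should match Root MSE.

```python
import math

rss = 5.81286631   # Residual SS from the `regress m g ir pi` output
n, k = 30, 4       # 30 countries; intercept plus GDP, interest rate and inflation

s2 = rss / (n - k)   # unbiased estimator of sigma^2 (Stata's Residual MS)
s = math.sqrt(s2)    # Stata's Root MSE

print(f"s^2 = {s2:.9f}, s = {s:.5f}")
```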


Note that the denominator of $s^2$ involves $n - k$.

This is because we have had to estimate $k$ parameters ($\beta_1, \ldots, \beta_k$) in order to compute the residuals $e_1, \ldots, e_n$, and have therefore lost $k$ degrees of freedom.

If we use $s^2$ in the (complicated) formulae for the estimator variances we obtain the estimated variances $s^2_{b_j}$ ($j = 1, \ldots, k$).

It follows that, for inference, we then use Student's $t$-distribution instead of the normal distribution:

$$\frac{b_j - \beta_j}{s_{b_j}} \sim t_{n-k}.$$
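The variance formulae are described above only as complicated. In matrix notation the standard textbook result, stated here without derivation, is $s^2_{b_j} = s^2\,[(X'X)^{-1}]_{jj}$, the $j$th diagonal element of $s^2 (X'X)^{-1}$. The NumPy sketch below applies it to simulated data (all values invented) and forms the $t$-ratios $b_j / s_{b_j}$.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated data (invented values); X has a leading column of ones
n = 30
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 20, n)])
y = X @ np.array([1.0, 0.5, -0.2]) + rng.normal(0.0, 2.0, n)
k = X.shape[1]

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                   # OLS estimates
e = y - X @ b                           # residuals
s2 = (e @ e) / (n - k)                  # s^2 = sum of squared residuals / (n - k)
se = np.sqrt(s2 * np.diag(XtX_inv))     # estimated standard errors s_{b_j}
t_ratios = b / se                       # each ~ t_{n-k} when the true beta_j is 0

for j in range(k):
    print(f"b{j + 1} = {b[j]: .4f}   s_b{j + 1} = {se[j]:.4f}   t = {t_ratios[j]: .2f}")
```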


So, to test the significance of a regressor $X_j$, i.e. to test

$$H_0: \beta_j = 0 \quad \text{against} \quad H_A: \beta_j \neq 0,$$

we can use the test statistic

$$TS = \frac{b_j}{s_{b_j}} \sim t_{n-k} \text{ under } H_0.$$

Let $t_{0.025}$ denote the 5% critical value from the $t_{n-k}$ distribution that puts 2.5% of the distribution into each tail.

As before, the decision rule is: if $|TS| > t_{0.025}$ reject $H_0$; if $|TS| < t_{0.025}$ do not reject $H_0$.
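The decision rule can be written as a small Python function (`t_test_zero` is a hypothetical name). It takes the critical value as an input rather than computing it, since $t_{0.025}$ is obtained from tables here; the usage lines plug in the `g` and `ir` rows of the `regress m g ir pi` output, with 2.056 as the $t_{26}$ critical value quoted in the worked example.

```python
def t_test_zero(b_j, s_bj, t_crit):
    """Two-tail 5% test of H0: beta_j = 0 using TS = b_j / s_bj.

    t_crit is t_0.025 from the t_{n-k} distribution, looked up from tables.
    Returns the test statistic and True if H0 is rejected.
    """
    ts = b_j / s_bj
    return ts, abs(ts) > t_crit

# Coefficients and standard errors for g and ir from the regression of m on g, ir and pi
for name, b_j, s_bj in [("g", 0.1703745, 0.0189433), ("ir", -0.0001693, 0.0012483)]:
    ts, reject = t_test_zero(b_j, s_bj, t_crit=2.056)
    print(f"{name}: TS = {ts:.2f}, reject H0: {reject}")
```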


We can also use the $t$-distribution to form confidence intervals (CIs) for the unknown population parameters $\beta_1, \ldots, \beta_k$.

With $t_{0.025}$ as defined above, a 95% CI for $\beta_j$ is of the form

$$b_j \pm t_{0.025}\, s_{b_j} \quad \text{or} \quad \left[b_j - t_{0.025}\, s_{b_j},\ b_j + t_{0.025}\, s_{b_j}\right],$$

i.e. we are 95% confident that $\beta_j$ lies in this interval.
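The interval is a one-line computation. In the sketch below, `confidence_interval_95` is a hypothetical helper, and the inputs are the `pi` (inflation) row of the `regress m g ir pi` output together with the tabulated $t_{26}$ value 2.056. Small differences from Stata's printed interval in the later decimal places most likely reflect Stata's use of an unrounded critical value.

```python
def confidence_interval_95(b_j, s_bj, t_crit):
    """95% CI b_j ± t_0.025 * s_bj, where t_crit is the tabulated t_{n-k} value."""
    half_width = t_crit * s_bj
    return b_j - half_width, b_j + half_width

# Inflation coefficient and its standard error; t_0.025 with 26 degrees of freedom is 2.056
lower, upper = confidence_interval_95(-0.002197, 0.0037733, t_crit=2.056)
print(f"[{lower:.5f}, {upper:.5f}]")
```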


Example. Let's return to the money demand data, where we estimated the model

$$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \epsilon,$$

where $Y$ denotes the money stock, $X_2$ is GDP, $X_3$ is the interest rate and $X_4$ is the rate of price inflation.

Let's test the hypotheses $\beta_2 = 0$ and $\beta_3 = 0$, and find a 95% confidence interval for $\beta_4$.

The regression output is as follows:


. regress m g ir pi

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  3,    26) =   30.70
       Model |  20.5893701     3  6.86312337           Prob > F      =  0.0000
    Residual |  5.81286631    26  .223571781           R-squared     =  0.7798
-------------+------------------------------           Adj R-squared =  0.7544
       Total |  26.4022364    29  .910421946           Root MSE      =  .47283

------------------------------------------------------------------------------
           m |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           g |   .1703745   .0189433     8.99   0.000     .1314361    .2093129
          ir |  -.0001693   .0012483    -0.14   0.893    -.0027353    .0023967
          pi |   -.002197   .0037733    -0.58   0.565    -.0099531    .0055592
       _cons |   .0893538   .1388419     0.64   0.525    -.1960399    .3747475
------------------------------------------------------------------------------

Note that $t$-ratios for testing $\beta_j = 0$ are given in the output above, as are 95% CIs, but we shall go through the calculations nonetheless!


To test $H_0: \beta_2 = 0$ against $H_A: \beta_2 \neq 0$ we use

$$TS = \frac{b_2}{s_{b_2}} = \frac{0.1703745}{0.0189433} = 8.99 \sim t_{26} \text{ under } H_0.$$

The 5% critical value for a two-tail test from the $t_{26}$ distribution is 2.056.

As $|TS| = 8.99 > 2.056$ we reject $H_0$ in favour of $H_A$, i.e. there is evidence that $\beta_2 \neq 0$ and hence that GDP is a significant determinant of the money stock.


Repeating the process for $\beta_3$ we obtain

$$TS = \frac{-0.0001693}{0.001248} = -0.14.$$

Here $|TS| = 0.14 < 2.056$ and hence we do not reject $H_0: \beta_3 = 0$, i.e. there is no evidence that the interest rate is a significant determinant of money.

A 95% CI for $\beta_4$ is obtained as

$$b_4 \pm t_{0.025}\, s_{b_4} = -0.002197 \pm (2.056 \times 0.003773),$$

which gives $-0.002197 \pm 0.007757$, or $[-0.00995,\ 0.00556]$.


    Summary

    the Classical multiple linear regression model

    Next week:

the problem of multicollinearity; making inferences
