Applied Statistics (uniroma1.it, 2019-11-11)
Multiple regression OLS estimation A real example Fitted values and residuals R2 Properties OLS estimator
Applied Statistics
Lecturer: Cristina Mollica
Outline of the lecture
Introduction
Multiple linear regression model
Ordinary least squares estimation
Examples
Fitted values and residuals
Coefficient of determination R2
Properties of the OLS estimator
Introduction
Regression models are used to describe how one or perhaps a few response variables depend on other explanatory variables.

The idea of regression is at the core of much statistical modelling, because the question "what happens to y when x varies?" is central to many investigations.

The main goal is to gain an understanding of the relation between them and to make predictions of the response from the knowledge of the explanatory variables.

There is usually a single response, treated as random, and a set of explanatory variables, which can be both quantitative and categorical variables and are treated as non-stochastic.

We are going to generalize the theory of the simple linear regression model to the case where a single response depends linearly on K ≥ 2 covariates (multiple regression model).
Multiple regression
The multiple linear regression model assumes that the r.v. Yi satisfies

Yi = β0 + β1 xi1 + · · · + βK xiK + εi = xi^t β + εi,   i = 1, …, n

where:

xi^t = (1, xi1, …, xiK) is the 1 × (K + 1) vector of covariates associated with the i-th observation (known constants)

β = (β0, β1, …, βK)^t is a (K + 1) × 1 vector of unknown regression parameters

xi^t β is the deterministic component of the model, referred to as the linear predictor, that is, a linear combination (in the parameters) of the covariates

εi is the random (unobserved) error term perturbing the linear relation and inducing the discrepancy between Yi and xi^t β
Matrix notation
Econometricians make frequent use of the matrix notation:

Y = (Y1, …, Yi, …, Yn)^t    β = (β0, …, βk, …, βK)^t    ε = (ε1, …, εi, …, εn)^t

    | 1  x11  …  x1K |
    | …  …    …  …   |
X = | 1  xi1  …  xiK |
    | …  …    …  …   |
    | 1  xn1  …  xnK |
Matrix notation
Using the matrix notation, where:
Y is the n × 1 vector of the response observations
X is the n × (K + 1) design matrix with the covariate values
β is the (K + 1) × 1 vector of regression coefficients
ε is the n × 1 vector of unknown error terms
the regression equation can be concisely written as
Y = Xβ + ε
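As a quick illustration of this matrix form (a sketch of ours, not from the slides; NumPy and all variable names are our own choices), the model Y = Xβ + ε can be simulated directly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 50, 2                      # n observations, K covariates

# Design matrix X: a leading column of ones plus K covariate columns
X = np.column_stack([np.ones(n), rng.normal(size=(n, K))])  # n x (K+1)

beta = np.array([1.0, 2.0, -0.5])        # true (K+1)-vector of coefficients
eps = rng.normal(scale=0.3, size=n)      # error terms: mean 0, constant variance

Y = X @ beta + eps                        # the regression equation Y = X beta + eps
print(X.shape, Y.shape)                   # (50, 3) (50,)
```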
The Gauss-Markov assumptions
For a linear regression model

Y = Xβ + ε

the Gauss-Markov assumptions concern the errors εi and their relation with the xi:

E[εi] = 0
V(εi) = σ2, not depending on i (homoscedasticity)
COV(εi, εj) = 0 for all i ≠ j (uncorrelated errors)
ε1, …, εn and x1, …, xn are independent

The error terms are uncorrelated drawings from a distribution with mean 0 and constant variance σ2. Using the matrix notation,

E[ε] = 0 and V(ε) = σ2 In
Example: simple regression
For the straight-line regression model yi = β0 + β1 xi + εi for i = 1, …, n, the matrix form of the model is

| y1 |   | 1  x1 |            | ε1 |
| y2 | = | 1  x2 |  ( β0 )  + | ε2 |
| …  |   | …  …  |  ( β1 )    | …  |
| yn |   | 1  xn |            | εn |

so X is an n × 2 matrix and β a 2 × 1 vector of parameters.
Example: two groups comparison
Suppose that the response variable y has been observed on two groups of observations of size n1 and n2. Let y1i for i = 1, …, n1 be the observations of the first group and let y2i for i = 1, …, n2 be the observations of the second group.

Let β0 and β0 + β1 be the means of the variable y in the two groups. Hence

y1i = β0 + ε1i,   i = 1, …, n1
y2i = β0 + β1 + ε2i,   i = 1, …, n2
Example: two groups comparison
We can write the model for the two groups comparison in matrix notation y = Xβ + ε, where

y = (y11, …, y1n1, y21, …, y2n2)^t    ε = (ε11, …, ε1n1, ε21, …, ε2n2)^t

    | 1  0 |
    | …  … |
    | 1  0 |
X = | 1  1 |          β = ( β0 )
    | …  … |              ( β1 )
    | 1  1 |

with the first n1 rows of X equal to (1, 0) and the last n2 rows equal to (1, 1).
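The dummy-coded design matrix above can be built mechanically; a minimal sketch (our own, assuming NumPy, with arbitrary example group sizes):

```python
import numpy as np

n1, n2 = 4, 3   # group sizes (arbitrary example values)

# First n1 rows are (1, 0) for group 1, last n2 rows are (1, 1) for group 2
X = np.column_stack([np.ones(n1 + n2),
                     np.repeat([0.0, 1.0], [n1, n2])])
print(X)
```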
Example: polynomial regression
Suppose that the response is a polynomial function of a single covariate,

yi = β0 + β1 xi + · · · + βK xi^K + εi

This is a useful tool to fit, for example, a quadratic or cubic trend to the data, in which case we would have K = 2 or K = 3 respectively. Then

| y1 |   | 1  x1  x1^2  …  x1^K |  | β0 |   | ε1 |
| y2 | = | 1  x2  x2^2  …  x2^K |  | β1 | + | ε2 |
| …  |   | …  …   …     …  …    |  | …  |   | …  |
| yn |   | 1  xn  xn^2  …  xn^K |  | βK |   | εn |

Note that the model y = Xβ + ε is linear in the parameters β. Polynomial regression can be written as a linear model because of its linearity not in x, but in β.
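Assuming NumPy (the slides do not prescribe any software), the polynomial design matrix can be produced with `np.vander`; a sketch of ours:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
K = 2   # quadratic trend

# increasing=True orders the columns as 1, x, x^2, ..., x^K,
# matching the layout of the design matrix above
X = np.vander(x, N=K + 1, increasing=True)
print(X)
```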
Ordinary least squares
The OLS estimate of β is obtained as the solution of the following optimization problem

β̂ = argmin_{β ∈ R^(K+1)} SS(β)

where the objective function to be minimized is the sum of squares

SS(β) = Σ_{i=1}^n (yi − xi^t β)^2 = (y − Xβ)^t (y − Xβ)
Ordinary least squares
By differentiating SS(β) with respect to β and setting the derivatives equal to zero, one has to solve the system

∂SS(β)/∂β0 = −2 Σ_{i=1}^n (yi − xi^t β) = 0
…
∂SS(β)/∂βk = −2 Σ_{i=1}^n (yi − xi^t β) xik = 0
…
∂SS(β)/∂βK = −2 Σ_{i=1}^n (yi − xi^t β) xiK = 0

With the matrix notation, this corresponds to

∂SS(β)/∂β = −2 (y − Xβ)^t X = (0, …, 0)
Ordinary least squares
In matrix form these amount to the equations

(y − Xβ)^t X = (0, …, 0)

that is

X^t (y − Xβ) = (0, …, 0)^t

which imply that the estimate satisfies

X^t y = X^t X β̂

Provided the (K + 1) × (K + 1) matrix X^t X is of full rank,

β̂ = (X^t X)^{-1} X^t y = (Σ_{i=1}^n xi xi^t)^{-1} Σ_{i=1}^n xi yi

is the system solution.
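Numerically, the normal equations are usually solved directly rather than by forming the inverse; a sketch of ours with NumPy and simulated data (all names and values are our own):

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K))])
y = X @ np.array([0.5, 1.0, -2.0, 3.0]) + rng.normal(scale=0.1, size=n)

# Solve X^t X beta = X^t y instead of computing (X^t X)^{-1} explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq minimizes the same sum of squares and should agree
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # True
```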
Ordinary least squares
Moreover, the (r, s) element of the matrix of second derivatives of SS(β) is

∂^2 SS(β) / ∂βr ∂βs = 2 Σ_{i=1}^n xir xis.

Hence, the matrix of second derivatives of SS(β) is 2 X^t X, which is a positive semidefinite matrix.

Thus, β̂ = (X^t X)^{-1} X^t y is the value that minimizes SS(β). The minimum value of the objective function

SS(β̂) = Σ_{i=1}^n (yi − xi^t β̂)^2 = (y − X β̂)^t (y − X β̂) = (y − ŷ)^t (y − ŷ)

is called the residual sum of squares.
Examples 1: simple regression
In simple cases it is possible to have analytical expressions for the least squares estimates. For example, in the straight-line regression model

yi = β0 + β1 xi + εi,   i = 1, …, n

the X matrix of the representation y = Xβ + ε is

    | 1  x1 |
X = | 1  x2 |
    | …  …  |
    | 1  xn |

Then, we have that

X^t X = | n             Σ_{i=1}^n xi   |        X^t y = | Σ_{i=1}^n yi    |
        | Σ_{i=1}^n xi  Σ_{i=1}^n xi^2 |                | Σ_{i=1}^n xi yi |
Examples 1: simple regression
Moreover (all sums running over i = 1, …, n)

(X^t X)^{-1} = [1 / (n Σ xi^2 − (Σ xi)^2)]  | Σ xi^2   −Σ xi |
                                            | −Σ xi     n    |

so that

β̂ = (X^t X)^{-1} X^t y
   = [1 / (n Σ xi^2 − (Σ xi)^2)]  | Σ yi Σ xi^2 − Σ xi Σ xi yi |
                                  | n Σ xi yi − Σ xi Σ yi      |

Now let sxy = (1/n) Σ (xi − x̄)(yi − ȳ), sy^2 = (1/n) Σ (yi − ȳ)^2 and sx^2 = (1/n) Σ (xi − x̄)^2. After some algebra we have

β̂ = | ȳ − x̄ sxy / sx^2 |
    | sxy / sx^2        |
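The closed-form expressions can be checked against the matrix formula; a small self-contained check of ours, assuming NumPy and simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=30)
y = 1.5 + 0.8 * x + rng.normal(scale=0.2, size=30)

# Closed-form slope and intercept via the 1/n moments used above
sxy = np.mean((x - x.mean()) * (y - y.mean()))
sx2 = np.mean((x - x.mean()) ** 2)
b1 = sxy / sx2
b0 = y.mean() - x.mean() * b1

# Same numbers from beta = (X^t X)^{-1} X^t y
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(beta, [b0, b1]))  # True
```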
Examples 2: binary covariate
For the two groups comparison

y1i = β0 + ε1i,   i = 1, …, n1
y2i = β0 + β1 + ε2i,   i = 1, …, n2

we have

y = (y11, …, y1n1, y21, …, y2n2)^t

    | 1  0 |
    | …  … |
    | 1  0 |
X = | 1  1 |
    | …  … |
    | 1  1 |

(first n1 rows (1, 0), last n2 rows (1, 1)).
Examples 2: binary covariate
To obtain the least squares estimates observe that

X^t X = | n1 + n2  n2 |
        | n2       n2 |

(X^t X)^{-1} = [1 / (n1 n2)] | n2    −n2     | = | 1/n1    −1/n1       |
                             | −n2   n1 + n2 |   | −1/n1   1/n1 + 1/n2 |

X^t y = | Σ_{i=1}^{n1} y1i + Σ_{i=1}^{n2} y2i | = | n1 ȳ1 + n2 ȳ2 |
        | Σ_{i=1}^{n2} y2i                    |   | n2 ȳ2         |

hence

β̂ = (X^t X)^{-1} X^t y = | ȳ1      |
                          | ȳ2 − ȳ1 |
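One can verify numerically that the OLS fit with a single dummy covariate reproduces the group means; a toy check of ours, with made-up data and NumPy:

```python
import numpy as np

y1 = np.array([2.0, 3.0, 4.0])   # group 1 (n1 = 3), mean 3
y2 = np.array([5.0, 7.0])        # group 2 (n2 = 2), mean 6
y = np.concatenate([y1, y2])

X = np.column_stack([np.ones(5), np.repeat([0.0, 1.0], [3, 2])])
beta = np.linalg.solve(X.T @ X, X.T @ y)

print(beta)  # [3. 3.]: the group-1 mean and the difference of the group means
```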
Examples 2: categorical covariate
The two groups comparison can be extended to G ≥ 2 groups

y1i = β0 + ε1i,   i = 1, …, n1
y2i = β0 + β1 + ε2i,   i = 1, …, n2
…
yGi = β0 + βG−1 + εGi,   i = 1, …, nG

Let yg = (yg1, …, ygng)^t for g = 1, …, G, so that y = (y1^t, …, yG^t)^t and

    | 1n1  0n1  …  0n1 |
X = | 1n2  1n2  …  0n2 |
    | …    …    …  …   |
    | 1nG  0nG  …  1nG |

where 1ng and 0ng denote vectors of ng ones and zeros. Then

β̂ = (X^t X)^{-1} X^t y = | ȳ1      |
                          | ȳ2 − ȳ1 |
                          | …       |
                          | ȳG − ȳ1 |
Example 2: Wages data
Consider the wages data. We can extend our regression model with additional explanatory variables, such as the years of schooling (schooli) and the experience in years (experi). The model is

Yi = β0 + β1 malei + β2 schooli + β3 experi + εi

The model is now interpreted as describing the conditional expected wage of an individual given his or her gender, years of schooling and experience.
Example 2: Wages data
The model is estimated as follows:

ŷi = −3.38 + 1.34 malei + 0.64 schooli + 0.12 experi

The coefficient of male measures the expected wage difference between males and females with the same schooling and experience: if we compare an arbitrary male and female with the same schooling and the same experience, the expected wage differential is 1.34.
Example 2: Wages data
The coefficient of school measures the expected wage difference between two individuals with the same experience and the same gender, where one has one additional year of schooling. And what about the parameter of experience?

In general, the coefficients in a multiple regression model can only be interpreted under a ceteris paribus condition, which says that the other variables included in the model are held constant.
Fitted values
The fitted or predicted values for y are given by

ŷi = xi^t β̂,   i = 1, …, n.

In vector terms,

ŷ = X β̂ = X (X^t X)^{-1} X^t y = H y

In linear algebra, the matrix H = X (X^t X)^{-1} X^t is known as a projection matrix: it orthogonally projects the vector y onto the space spanned by the columns of X. The matrix H is more frequently known as the hat matrix, because it transforms y into ŷ (it "puts a hat" on y).
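A small numerical check of these facts (a sketch of ours, assuming NumPy; forming H explicitly is fine for illustration but avoided in practice for large n):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
y_hat = H @ y                          # fitted values y_hat = H y

# H is symmetric and idempotent, and H X = X (columns of X are left unchanged)
print(np.allclose(H, H.T), np.allclose(H @ H, H), np.allclose(H @ X, X))
```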
Residuals
The unobservable error εi = yi − xi^t β is estimated by the i-th residual

ei = yi − xi^t β̂

In vector terms,

e = y − X β̂ = y − H y = (I − H) y = M y

where I is the n × n identity matrix and M = I − H is called the residual matrix.
Hat and residual matrices
The projection matrix is symmetric and idempotent, that is

H^t = H and HH = H

The same properties hold for the matrix M.

Moreover, the vector of residuals e = My and the vector of fitted values ŷ = Hy are orthogonal. In fact,

ŷ^t e = (Hy)^t (My) = y^t H^t (I − H) y = y^t (H − H) y = 0
Deviance decomposition and R2
Finally, note that by the orthogonality of ŷ and e, we have

Σ_{i=1}^n yi^2 = y^t y = (e + ŷ)^t (e + ŷ) = ŷ^t ŷ + e^t e = Σ_{i=1}^n ŷi^2 + Σ_{i=1}^n ei^2

The overall sum of squares of the data equals the sum of squares of the fitted model plus the residual sum of squares. Thanks to the fact that yi and ŷi have the same mean ȳ, one can write

Σ_{i=1}^n yi^2 − n ȳ^2 = Σ_{i=1}^n ŷi^2 − n ȳ^2 + Σ_{i=1}^n ei^2

n (Σ_{i=1}^n yi^2 / n − ȳ^2) = n (Σ_{i=1}^n ŷi^2 / n − ȳ^2) + Σ_{i=1}^n ei^2

Σ_{i=1}^n (yi − ȳ)^2 = Σ_{i=1}^n (ŷi − ȳ)^2 + Σ_{i=1}^n ei^2

known as the deviance decomposition.
Deviance decomposition and R2
The deviance decomposition is briefly indicated as

TSS = ESS + RSS

implying that the total deviance TSS of the response variable y is equal to the deviance of the predicted values ŷ, called the explained deviance ESS of the estimated multiple regression model, plus the deviance of the residuals, that is, the residual sum of squares RSS resulting from the OLS minimization.

This decomposition is exploited to derive the coefficient of determination R2 to assess the goodness-of-fit of the estimated model:

R2 = ESS / TSS = 1 − RSS / TSS ∈ [0, 1]

It quantifies the portion of the original variability recovered by the fitted model.
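The decomposition and the two equivalent forms of R2 can be verified numerically; a simulated sketch of ours, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

beta = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta
e = y - y_hat

TSS = np.sum((y - y.mean()) ** 2)       # total deviance
ESS = np.sum((y_hat - y.mean()) ** 2)   # explained deviance
RSS = np.sum(e ** 2)                    # residual sum of squares

R2 = ESS / TSS
print(np.isclose(TSS, ESS + RSS), np.isclose(R2, 1 - RSS / TSS))  # True True
```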
Properties of the OLS estimator
When the Gauss-Markov conditions hold:

1. The OLS estimator β̂ is unbiased: E(β̂) = β. In fact

β̂ = (X^t X)^{-1} X^t y = (X^t X)^{-1} X^t [Xβ + ε] = β + (X^t X)^{-1} X^t ε

Hence,

E(β̂) = E(β + (X^t X)^{-1} X^t ε) = β + (X^t X)^{-1} X^t E(ε) = β.

2. The variance of β̂ is V(β̂) = σ^2 (Σ_{i=1}^n xi xi^t)^{-1} = σ^2 (X^t X)^{-1}:

V(β̂) = V(β + (X^t X)^{-1} X^t ε) = V((X^t X)^{-1} X^t ε)
      = (X^t X)^{-1} X^t σ^2 I ((X^t X)^{-1} X^t)^t = σ^2 (X^t X)^{-1}

3. Gauss-Markov theorem: the OLS estimator is the best linear unbiased estimator (BLUE) of β.
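Unbiasedness can be illustrated by simulation: averaging the OLS estimate over many error draws for a fixed design should recover β. A sketch of ours with arbitrary example values:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 1))])  # fixed design
beta = np.array([1.0, 2.0])
sigma = 0.5

# Re-draw the error vector many times and average the resulting OLS estimates
estimates = [np.linalg.solve(X.T @ X,
                             X.T @ (X @ beta + rng.normal(scale=sigma, size=n)))
             for _ in range(5000)]
mean_est = np.mean(estimates, axis=0)

print(np.allclose(mean_est, beta, atol=0.05))  # True: the average is close to beta
```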
Gauss-Markov theorem
Let β̃ = Ay be a linear estimator of β, where A is a (K + 1) × n matrix. Assume that β̃ is unbiased, that is,

E(β̃) = E(Ay) = AXβ = β for every β

hence AX = I. Then, under the Gauss-Markov assumptions,

V(β̃) − V(β̂)

is a positive semidefinite matrix.
Gauss-Markov theorem
In fact,

V(β̃) − V(β̂) = A σ^2 I A^t − σ^2 (X^t X)^{-1}
             = σ^2 {A A^t − A X (X^t X)^{-1} X^t A^t}
             = σ^2 A (I − H) A^t = σ^2 A (I − H)(I − H)^t A^t

which is positive semidefinite. This result shows that β̂ has the smallest variance in finite samples among all linear unbiased estimators of β, provided that the Gauss-Markov assumptions hold. Of course, nonlinear estimators may have smaller variance.
Properties of the OLS estimator
To estimate the variance of β̂, we need to replace the unknown error variance σ^2 with an estimate. An unbiased estimator of σ^2 is

s^2 = [1 / (n − (K + 1))] Σ_{i=1}^n ei^2 = e^t e / (n − (K + 1))

where n is the number of observations and (K + 1) the number of regressors in the model (including the intercept).

The naive estimate of σ^2, given by

σ̂^2 = (1/n) Σ_{i=1}^n ei^2 = e^t e / n = (y − ŷ)^t (y − ŷ) / n = (1/n) Σ_{i=1}^n (yi − ŷi)^2

is, therefore, a biased estimate of σ^2.
Properties of the OLS estimator
Hence the variance-covariance matrix of β̂ can be estimated as

V̂(β̂) = s^2 (X^t X)^{-1}

We define the standard error of β̂ as the quantity

SE(β̂) = sqrt(V̂(β̂))

The standard error of β̂ is a measure of the precision of the estimator. We will define

SE(β̂k) = s √ckk

where ckk is the (k, k) element of (Σ_{i=1}^n xi xi^t)^{-1} = (X^t X)^{-1}.