Download - 3.5. MULTIPLE LINEAR REGRESSION
Chapter 3, section 3.5. Multiple linear regression 80
Guy Mélard, 1997, 1999 ISRO, U.L.B.
3.5. MULTIPLE LINEAR REGRESSION
THE PROBLEM
NUMERICALLY STABLE ALGORITHMS
THE MODIFIED GRAM-SCHMIDT METHOD
HOUSEHOLDER TRANSFORMATION
SINGULAR VALUE DECOMPOSITION
GENERALIZED INVERSES
CHOLESKY FACTORISATION
USE OF MATLAB
References
Chapter 3, section 3.5. Multiple linear regression 81
Guy Mélard, 1997, 1999 ISRO, U.L.B.
THE PROBLEMLet n observations of a dependent variable y and explanatory variables x1, ..., xk stored in avector Y and a matrix X:
Yy
yn
=
1
!
Xx x
x x
k
n kn
=
11 1
1
"! !"
The model is described byy x x ek k= + + +β β1 1 … ,
The βj are the regression coefficients, which should be estimate from the data,and e, is the error term.The least squares estimator is the solution of the system of normal equations, in matrix form
$ ( ) .β = −X X X YT T1 (1)
Chapter 3, section 3.5. Multiple linear regression 82
Guy Mélard, 1997, 1999 ISRO, U.L.B.
From the numerical point of view, there is a double problem:1° the use of XTX and XTY in (1) corresponds implicitly to the "calculator" algorithm (seesection 2.1) to calculate the variances and covariances but we saw that that algorithm is notnumerically stable.2° inversion of the matrix XTX can raise numerical problems if the matrix is "almostnoninvertible " (case of quasi-colinearity).
Notice. A simple way to improve the precision is to center the variables before carrying outthe calculations, provided that the initial model has a constant.
Chapter 3, section 3.5. Multiple linear regression 83
Guy Mélard, 1997, 1999 ISRO, U.L.B.
Example. Let us suppose calculations with a precision ε such as 1 + ε2 ≈ 1 (for example, insingle precision, with ε = 10-7) and the matrix
X =
1 10
00 0
εε
so that
X XT =+
+
≈
1 11 1
1 11 1
2
2
εε
is a non invertible matrix.Numerically stable algorithms for multiple linear multiple regression1° don't calculate XTX ;2° a fortiori, don't invert XTX.
Chapter 3, section 3.5. Multiple linear regression 84
Guy Mélard, 1997, 1999 ISRO, U.L.B.
NUMERICALLY STABLE ALGORITHMSThey are based on orthogonal transformations and a factorisation of matrix X :1° the modified Gram-Schmidt method, an alternative of a well-known method in linearalgebra;2° the Householder transformation3° the singular value decompositionThe Givens method can also be used, particularly when the data are read in one by one.We add the Cholesky factorisation, which is interesting for other reasons.One can find all these algorithms in the LINPACK library.For recent references, see Gentle (1998).
Remark. The least squares method can be introduced without using the normal equations, forexample Thisted [1988, p. 63].
Chapter 3, section 3.5. Multiple linear regression 85
Guy Mélard, 1997, 1999 ISRO, U.L.B.
THE MODIFIED GRAM-SCHMIDT METHOD
Description
Example.
Indirect solution of the normal equations
Implementation
Chapter 3, section 3.5. Multiple linear regression 86
Guy Mélard, 1997, 1999 ISRO, U.L.B.
DescriptionPrinciple: build an orthonormalised basis in a metric space in the n-dimensional spacegenerated by the columns of XClassical Gram-Schmidt method (Chambers, 1977):1° as first base vector take the first column of X, duly normalised2° as second base vector take a vector orthogonal with the first base vector, in the spacegenerated by the first two columns: it is obtained by taking the residual of the orthogonalprojection of the second column on the first, followed by a normalisation3° and so on, the i-th vector being built in the space of first i columns, orthogonal with (i - 1)first base vectors and being normalised.The i-th column of X can be written as a linear combination of the elements of thatorthonormalised base using the first (i - 1) elements only.The factorisation of X has the following form:
X Q Rn k n n n k
=× × ×
.
whereQ is an orthogonal matrix, such that QT = Q-1,
RR
=
1
0, where R1 is a upper triangular matrix.
Chapter 3, section 3.5. Multiple linear regression 87
Guy Mélard, 1997, 1999 ISRO, U.L.B.
Example.
X =
1 10
00 0
εε
First
x1
1
00
=
ε⇒ x1
21 1= + ≈ε
⇒ q1
1
00
=
ε
⇒x1 = q1.
Chapter 3, section 3.5. Multiple linear regression 88
Guy Mélard, 1997, 1999 ISRO, U.L.B.
Denoting the scalar product by <.,.>:
x2
10
0
=
ε⇒ [ ]x q2 1 1 0 0
1
00
1, =
=εε
w x x q q2 2 2 1 1
10
0
1
00
0
0
= − =
−
=−
,ε
ε εε
⇒ w222 2= =ε ε
qw
22
2
01 2
1 20
= =−
ε/
/⇒ x q q2 1 22= + ε
hence the factorisation
X =
=
• •− • •
• •• •
⋅
1 10
00 0
1 01 2
0 1 20 0
1 10 20 00 0
εε
ε ε//
.
Chapter 3, section 3.5. Multiple linear regression 89
Guy Mélard, 1997, 1999 ISRO, U.L.B.
Indirect solution of the normal equationsA solution is obtained by solving the triangular system
R Q YT$ .β =Indeed, left multiply by RT:
R R R Q YT T T$ ,β =hence$ ( ) ( ) ( ) ( ) .β = = = =− − − − −R R R Q Y R Q QR R Q Y R Q QR R Q Y X X X YT T T T T T T T T T T T1 1 1 1 1
Example.
X =
=
• •− • •
• •• •
⋅
1 10
00 0
1 01 2
0 1 20 0
1 10 20 00 0
εε
ε ε//
Chapter 3, section 3.5. Multiple linear regression 90
Guy Mélard, 1997, 1999 ISRO, U.L.B.
Consequently,
Q Y
yyyy
y yy yT =
−• • • •• • • •
⋅
=
+− +
••
1 0 00 1 2 1 2 0 1 2 1 2
1
2
3
4
1 2
2 3
ε ε/ / / /
hence
R
y yy y$ .
$$
/ /β
ε ββ
ε
=
=
+− +
••
1 10 20 00 0
1 2 1 21
2
1 2
2 3
and finally$ $
$β β ε
εβ1 2 1 2
2 2 32 12
12
+ = +
= − +
y y
y y.
Chapter 3, section 3.5. Multiple linear regression 91
Guy Mélard, 1997, 1999 ISRO, U.L.B.
This is an easy to solve a triangular systemThe solution is obtained without loss of precision
Remark. 1.There is an inconvenience: the whole matrix X should be in memory, as well as Q(more precisely k rows). However, it is possible to add an explanatory variable without doingall the calculations.
2. A problem is the lack of orthogonality of Q when X is badly conditioned. It should beorthogonalised. This is the reason for the modification in the modified Gram-Schmidtmethod.
Chapter 3, section 3.5. Multiple linear regression 92
Guy Mélard, 1997, 1999 ISRO, U.L.B.
ImplementationHere is MGSREG (from TSE, see Mélard et Pasteels [1994]) which implements the modifiedGram-Schmidt method for multiple linear regression.
Mgsreg.htm
Chapter 3, section 3.5. Multiple linear regression 93
Guy Mélard, 1997, 1999 ISRO, U.L.B.
HOUSEHOLDER TRANSFORMATION
2-dimensional case
k dimensional case
Householder transformation of a matrix
Example.
Numerical aspects
Application to regression
Chapter 3, section 3.5. Multiple linear regression 94
Guy Mélard, 1997, 1999 ISRO, U.L.B.
The Householder transformation is a reflection, i. e. a symmetry with respect to a hyperplan,sending a vector in a lower-dimensional space.
2-dimensional caseLet e1, e2 an orthonormal basis. We want a reflection H which maps vector x on Hx on thefirst axis, e. g. Hx = x e1. It suffices to take vector u = x - x e1, and perform a symmetrywith respect to the straight line orthogonal to u. Hence Hx = x - u. Remark that the solution isnot unique. More precisely, there is a second solution for which Hx = - x e1, defined by u =x + x e1. An arbitrary vector, x’, will not be mapped on the first axis and Hx’ ≠ x’ - u.Geometrically, it can be seen that Hx’ = x’ - 2 (<x’, u>/ u 2) u.
Chapter 3, section 3.5. Multiple linear regression 95
Guy Mélard, 1997, 1999 ISRO, U.L.B.
k dimensional caseConsider x = (x1 ... xk)T in an orthonormalised basis. We are looking for transformation Hwhich1° preserves the first j - 1 components of x;2° put zeros in the last n - j components;3° preserves the norm.More precisely
H x
x
xx
x
x
Hx
x
xs
j
j
j
n
j
j: ,=
→ = ±
−
+
−
1
1
1
1
1
0
0
!
!
!
!
where s x xj j n= + +2 2" .
Chapter 3, section 3.5. Multiple linear regression 96
Guy Mélard, 1997, 1999 ISRO, U.L.B.
The reflection is a symmetry with respect to the hyperplan which is orthogonal to vector
u x Hx
x
xx
x
x
x
xs x s
x
x
j
j
j
n
j
j j j
j
n
= − =
− ±
=
−
+
−
+
1
1
1
1
1
10
0
0
0!
!
!
!
!
∓
!
.
More generally, if x’ is an arbitrary vector, the orthogonal projection of x’ onto u should bedetermined, and
Hx’ = x’ - 2 (<x’, u>/ u 2) u = x’ - 2 u xu u
T
T
′ u = I uuu u
xT
T−
′2 ,
because uT x’ u = u uT x’.
Chapter 3, section 3.5. Multiple linear regression 97
Guy Mélard, 1997, 1999 ISRO, U.L.B.
That expression defines H as a matrix: uu
uuIH T
T
2−=
Remark that H = HT.The reflection is an isometry since
HH HH I uuu u
I uuu u
I uu uuu
uuu
ITT
T
T
T
T T T
= = −
−
= + −
=2 2 4 44 2 ,
noting that scalar uT u equals u 2. Consequently H = HT = H-1 is a symmetric matrix.
Remark. For x’ = u + Hx, we have uT x’ = ∓ " ∓s x x x s x sj j j n j j j+ + + = +2 2 2 .
But u x s x x s x sj j j n j j j2 2
12 2 22= + + + = ++( ) ( )∓ " ∓ .
Hence uT x’/ u 2 = 1/2 and, after simplification Hx’ = x’ - u.
Remark. Householder reflection is also used to transform a symmetric matrix in tridiagonalform. This is the starting point of an algorithm for computing eigenvalues and eigenvectors(Press et al., 1993).
Chapter 3, section 3.5. Multiple linear regression 98
Guy Mélard, 1997, 1999 ISRO, U.L.B.
Householder transformation of a matrixConsider a matrix X, with dimensions n×k, composed of columns [X1...Xk]We apply successively k reflections Hj, where the j-th reflection puts zeros in the last n - jcomponents.As the following reflections Hj + 1,..., Hk, will preserve the first j, ..., k - 1 components, zeroswill be preserved.Consequently
R*=déf H H H Xk k − =
∗ ∗ ∗∗ ∗
∗
1 1
00
0
0 0 0
...…
! ! !
.
Note that the product H H Hk k −1 1... = Q*T is an orthogonal matrix and a factorisation X =Q*R*, has been obtained which is similar but not identical to the QR factorisation obtained inthe modified Gram-Schmidt method.
Chapter 3, section 3.5. Multiple linear regression 99
Guy Mélard, 1997, 1999 ISRO, U.L.B.
Example.
[ ]X X X
x xx xx xx x
= ==
=
1 2
11 12
21 22
31 32
41 42
1 10
00 0
εε
s12 21 1= + ≈ε
⇒u
x sxxx
u1
11 1
21
31
41
12 2
2
00
4 4=
=
⇒ = + ≈
∓ε
ε.
H X
s
1 1
1
000
1000
=
±
=
−
.
[ ]u XT1 2 2 0 0
10
0
2=
=εε
.
H X X u u XT1 2 2 1 1 1
22 24
1
0
2 2= − =
−−
⇒ = + ≈ε
εε .
H X X u1 2 2 12 24
10
0
2
00
1
0
= − =
−
=
−−
εε ε
ε
Chapter 3, section 3.5. Multiple linear regression 100
Guy Mélard, 1997, 1999 ISRO, U.L.B.
H2H1X1 = H1X1 =
−
1000
.
H2H1X2.= ?
s22 2 2 22= + ≈ ⇒ε ε ε u2
02
0
=− +
ε εε
.
H2H1X2.=
−−
12
00
ε.
⇒R* = H2H1X .=
− −−
1 10 20 00 0
ε
The coefficients of R* are those of R, in absolute value.
Chapter 3, section 3.5. Multiple linear regression 101
Guy Mélard, 1997, 1999 ISRO, U.L.B.
Numerical aspects1. Multiplicity of the solutions are due to the choice of signs ± at each reflection. In theexample, the first sign must be - to avoid division by 0. It is generally recommended, toimprove numerical stability, to select for sj the sign of xj. In the example, the second time, weshould have taken the upper sign, producing the following changes:H2H1X2.= ?
s22 2 2 22= + ≈ ⇒ε ε ε u2
02
0
=− −
ε εε
.
H2H1X2.=
−
1200
ε.
⇒R* = H2H1X .=
− −
1 10 20 00 0
ε
Chapter 3, section 3.5. Multiple linear regression 102
Guy Mélard, 1997, 1999 ISRO, U.L.B.
2. If sj = 0, then u j = 0 and a division by 0 will occur. This is a case of multicolinearity. Inthat case, perform a permutation of rows and columns in order to have rows of zeros at thepositions r + 1 to k of R*.3. Numerical stability can be bad in certain circumstances. In order to improve it, apermutation of rows can be done to improve conditioning. In column j, the pivot isdetermined, i. e. the maximum among rows j to n. If it is located on row l > j, let us permuterows l and j. In practice, it is not necessary to perform physically the permutations of rowsand columns, just use pointers.
Chapter 3, section 3.5. Multiple linear regression 103
Guy Mélard, 1997, 1999 ISRO, U.L.B.
Application to regressionIt is better to perform Householder reflections to matrix [X1 ... Xk Y] obtained by juxtaposingcolumn Y.Regression with a constant consists, as we know, to consider a column of 1 in X .Suppose that it is the first column.Since s1
2 = n, and we need to preserve the norm of X1, we have
H X
n
1 10
0
=
!
u
n
1
11
1
=
−
!
Chapter 3, section 3.5. Multiple linear regression 104
Guy Mélard, 1997, 1999 ISRO, U.L.B.
Effect of H1 on the other columns, for example on Y?Given that u n n1
2 2 1= −( ) and u Y y nyTi1 1= −∑ , we have
( )H Y Y
y nyn n
n y ni
i
112
2 1
11
1
= −−
−
−
= ∗
∗
∑∑
( )
/
! !
where we know that the sum of squares of the other elements is equal to
yyn
nsii
y2
2
2∑ ∑−
= .
Chapter 3, section 3.5. Multiple linear regression 105
Guy Mélard, 1997, 1999 ISRO, U.L.B.
SINGULAR VALUE DECOMPOSITION
Definition
Properties
Algorithm
Use in multiple regression
Chapter 3, section 3.5. Multiple linear regression 106
Guy Mélard, 1997, 1999 ISRO, U.L.B.
DefinitionThe SVD concerns an arbitrary matrix X with dimensions n × k. Suppose that n > k.There exists always two orthogonal matrices U, n × n, et V, k × k, and a matrix D, n × k, DT =(D*T O) were D* is a k × k diagonal matrix, with diagonal elements di ≥ 0, such that
U XV DDO
T = =
∗
.
The factorisation can be written X = U D VT.
Chapter 3, section 3.5. Multiple linear regression 107
Guy Mélard, 1997, 1999 ISRO, U.L.B.
PropertiesIn the special case where n = k, and matrix X is square, A, the singular values are theeigenvalues.The non-zero singular values of X are the square roots of the non zero eigenvalues of thesymmetric matrix A = XT X.For a symmetric positive definite matrix A, the di are the eigenvalues of A, U = V, and thecolumns of V are the eigenvectors of A. Then: A = V D VT.The ratio max{di}/min{di} is a measure of the degree of conditioning of the matrix.The singular value decomposition is very important from the theoretical point of view butalso from the numerical point of view.It allows to detect the rank deficiency of a matrix with a higher precision (QR factorisationcan also be used but it is less accurate).The SVD also allows to determine the null space of a matrix.Suppose that the n × k, n>k, matrix X with rank r, is such that the singular values ared d d dr r k1 1 0≥ ≥ ≥ = = =+… … .The null space of X, which is the set of vectors x such that X x = 0, is then the subspacespanned by the column vectors v vr k+1, ,… of V.The space spanned by the column vectors of X is identical to the one spanned by the first rcolumns of U: u ur1, ,… .That factorisation has another advantage: it leads to the definition of a function of a matrix.For example, the square root of a symmetric positive definite matrix A = V D V T is defined byA1/2 = V D1/2 V T, where D1/2 is the diagonal matrix composed of the square roots of theelements of D, the eigenvalues of A.Similarly, eA can be defined. Note also A2 = V D V T V D V T = V D2 V T or A-1 = V D-1 V T(supposing strictly positive di's) are compatible with the usual definition.
Chapter 3, section 3.5. Multiple linear regression 108
Guy Mélard, 1997, 1999 ISRO, U.L.B.
AlgorithmThe algorithm of Golub-Reinsch is too complex to be given here. See Golub and Van Loan[1983]. It makes use of Householder reflections to transform the matrix in a tridiagonalmatrix. It is included in library EISPACK.
Use in multiple regression
We have XTX = V D UT U D VT = V D2 VT. Furthermore the di2 are eigenvalues of XTX with
corresponding eigenvector vi, i-th column of V. Supposing d d d dr r k1 1 0≥ ≥ ≥ = = =+… … , it canbe shown that
$ .β ==∑
u yd viT
ii
i
r
1
Remark. In the case of multicolinearity, that approach can eb used to determine a generalisedinverse.
Chapter 3, section 3.5. Multiple linear regression 109
Guy Mélard, 1997, 1999 ISRO, U.L.B.
GENERALIZED INVERSES
Definition
Application to regression
DefinitionLet a n×k matrix A,. Define the (g1-) inverse of A, every matrix A- such that AA-A = A.Matrix A- is not unique.
Remark. There are other definitions of a pseudo-inverse A+ of a matrix A. Consider the fourproperties:
1. AA+A = A
2. A+AA+ = A+
3. (AA+)T = AA+
4. (A+A)T = A+A
A gi-inverse fulfils properties 1 to i.
A g4-inverse is also called a Moore-Penrose inverse. Such a matrix is unique.
Chapter 3, section 3.5. Multiple linear regression 110
Guy Mélard, 1997, 1999 ISRO, U.L.B.
Block inverseSuppose a matrix of rank r.By means of a permutation of rows and columns, it can be put under the following form
AA AA A
=
11 12
21 22
so that A11 is invertible. Let us show that matrix
AA−
−
=
111 0
0 0is a generalised inverse. Indeed, since A and A11 are both rank r matrices, there exists a matrixK such that [A21 A22] = K [A11 A12] and K = A21 A11
1− , hence A22 = A21 A111− A12. Now
AA AA AA A
AA
IA A
A AA A
A AA A A A A
A−−
− −=
=
=
=
=11 12
21 22
111
21 111
11 12
21 22
11 12
21 21 111
12 22
00 0
00
.
Chapter 3, section 3.5. Multiple linear regression 111
Guy Mélard, 1997, 1999 ISRO, U.L.B.
Application to regressionIn the case where X is not of maximum rank, there is a phenomenon of multicolinearity.The least squares method has then an infinite number of solutions.One of these solutions can be written in terms of a generalised inverse: $ ( ) .β = −X X X YT T
This is not an estimator of β.However, an estimator of E(y) = x β is determined uniquely by prediction x $β , in spite ofmulticolinearity.More generally an estimable function is a function of the parameters which can be uniquelyestimated.
Chapter 3, section 3.5. Multiple linear regression 112
Guy Mélard, 1997, 1999 ISRO, U.L.B.
Example. Suppose that y = m + α1 x1 + α2 x2 + e, and X has the form:
X =
1 1 01 1 01 1 01 0 11 0 11 0 1
,
so that
X XT =
6 3 33 3 03 0 3
.
Chapter 3, section 3.5. Multiple linear regression 113
Guy Mélard, 1997, 1999 ISRO, U.L.B.
Let us compute ( )X XT − . The matrix has rank 2. Since
3 00 3
13
0
0 13
1
=
−
, we have
(X X)T − =
0 0 0
0 13
0
0 0 13
hence
$β =
+ ++ ++ +
= + +
+ +
0 0 0
013 0
0 013
01313
1 6
1 2 3
4 5 6
1 2 3
4 5 6
. ( )
( )
.y yy y yy y y
y y y
y y y
"
Note that the first and second elements of $β estimate uniquely m + α1 and m + α2,respectively, hence α1 - α2 is estimated by ( ) /y y y y y y1 2 3 4 5 6 3+ + − − − , in a unique way.These are three estimable functions.It is well known that omitting the column of 1's in matrix X, or imposing constraint α1 + α2,to suppress multicolinearity.
Chapter 3, section 3.5. Multiple linear regression 114
Guy Mélard, 1997, 1999 ISRO, U.L.B.
CHOLESKY FACTORISATION
Definition
Algorithm
square root free variant
DefinitionLet us factorise a symmetric definite positive matrix A. It is such that the quadratic formvT A v > 0 for all vector v of ℜ k such that v ≠ 0.Such a matrix has all its eigenvalues strictly positive.A is factorised as a product of a lower triangular matrix L, by its transposed, a uppertriangular matrix: A = LU = LLT.A square root free Cholesky factorisation is: A = L*D*U* = L*DL*T, where matrix D* isdiagonal and the main diagonal of L* is composed of 1.We can then factorise L as L = L*D and define D* = DDT = D2.
Chapter 3, section 3.5. Multiple linear regression 115
Guy Mélard, 1997, 1999 ISRO, U.L.B.
AlgorithmLet us start with a 3×3 example:
Aa a a
a aa
ll ll l l
l l ll l
l= •
• •
=
11 12 13
22 23
33
11
21 22
31 32 33
11 21 31
22 32
33
0 00 0
0 0. .
The following relations are used, in the given order:a l11 11
2= l a11 11=a l l12 11 21= l a l21 12 11= /
a l l22 212
222= + l a l22 22 21
2= −a l l13 11 31= l a l31 13 11= /
a l l l l23 21 31 22 32= + l a l ll32
23 21 31
22
=−
a l l l33 312
322
332= + + l a l l33 33 31
2322= − −
which can be summarised by
la l l
lk i
l a l
ik
ki kj ijj
k
kk
ii ii ijj
i
=−
<
= −
=
−
=
−
∑
∑
1
1
2 2
1
1
,
Chapter 3, section 3.5. Multiple linear regression 116
Guy Mélard, 1997, 1999 ISRO, U.L.B.
Remark. In order to determine a generalised inverse of A, the Cholesky factorisation can beused by controlling the denominator of the ratio. There is numerical instability if
d a l aii ii ijj
i
ii= − <=
−
∑ 2
1
1
η ,
where η depends on the machine precision (for example 10-5, in simple precision). Then lii isreplaced by 0. For an implementation, see Healy [1968a, 1968b].
Remark. The square root free variant is as follows:
Aa a a
a aa
ll l
dd
d
l ll= •
• •
=
11 12 13
22 23
33
21
31 32
11
22
33
21 31
32
1 0 01 0
1
0 00 00 0
10 10 0 1
*
* *
*
*
*
* *
*. . .
using the relations:a d11 11= * d a11 11
* =a d l12 11 21= * * l a d21 12 11
* */=a d l d22 11 21 22= +* *2 * d a d l22 22 11 21
* * *2= −a d l13 11 31= * * l a d31 13 11
* */=
a d l l d l l23 11 21 31 22 21 32= +* * * * * * l a d l ld l32
23 11 21 31
22 21
** * *
* *=−
a d l d l d33 11 31 22 32 33= + +* *2 * *2 * d a d l d l33 33 11 31 22 32* * *2 * *2= − −
which can be summarised by:
la d l l
dk i
d a d l
ik
ki ii kj ijj
k
ii
ii ii ii ijj
i
*
* * *
*
* * *2
,=−
<
= −
=
−
=
−
∑
∑
1
1
1
1
Chapter 3, section 3.5. Multiple linear regression 117
Guy Mélard, 1997, 1999 ISRO, U.L.B.
Use in multiple regressionWe have a symmetric definite positive matrix A = XTX.Suppose factorisation XTX = LU.The normal equations are written as L U X YT( $ )β = which is a triangular system in U $β ,
which can be solved immediately by recurrence: U L X YT$β = −1 .
The system in $β is also in triangular formRemark. The algorithm is numerically less stable than modified Gram-Schmidt orHouseholder because we need XTX.
Chapter 3, section 3.5. Multiple linear regression 118
Guy Mélard, 1997, 1999 ISRO, U.L.B.
UTILISATION OF MATLABTo solve the system of equations A*x = b, we write
x =A\bIf matrix A is n x k, avec n > k, the system is impossible and MATLAB computes the leastsquares solutionTo obtain the usual statistics (R2, standard errors, etc.), it is better to perform a QRfactorisation (using a Householder transformation) as follows (excerpt from REGRESS.M, inthe Statistics Toolbox of MATLAB):
Regress.htm
Chapter 3, section 3.5. Multiple linear regression 119
Guy Mélard, 1997, 1999 ISRO, U.L.B.
Functions of MATLAB which correspond to algorithms considered in this section.
rank(X) rank of Xnorm(X) ornorm(X, 2) largest singular value of Xnorm(X, 1) largest sum of columns of Xnorm(X, inf) largest sum of rows of Xnorm(X, 'fro') Fröbenius norm[U, S, V] = svd(X) singular value decomposition of X, where U and V
are orthogonal matrices and S is diagonal, such thatX = U S VT
R = chol(A) Cholesky factorisation of matrix A (hermitian orsymmetric) positive definite; R is upper triangularsuch that A = RT R,
[R, p] = chol(A) factorisation such that A(1:p-1,1:p-1) = RT RY = pinv(X) (g4-) pseudo-inverse or Moore-Penrose inverseL = eig(X) vector of eigenvectors of X[V, L] = eig(X) matrix V of eigenvectors of X and vector L of eigenvalues
Remark. In both cases eig(X, Y) gives the solutions ofX v = Y v L (where L is a matrix)
null(X) null space of columns of Xorth(X)
ReferencesJ. M. CHAMBERS, «Computational Methods for Data Analysis», Wiley, New York, 1977.J. E. GENTLE, «Numerical Linear Algebra for Applications in Statistics», Springer-Verlag,1998.G. H. GOLUB and C. F. VAN LOAN, «Matrix Computations», John Hopkins UniversityPress, Baltimore, 1983.P. GRIFFITHS and I. D. HILL (editors), «Applied Statistics Algorithms», Ellis Horwood,Chichester, 1985.N. J. HIGHAM, «Accuracy and stability of numerical algorithms», Society for Industrial andApplied Mathematics, Philadelphia, 1996.M. J. R. HEALY, Algorithm AS6, Triangular decomposition of a symmetric matrix, J. Roy.Statist. Soc. Ser. C Applied Statistics, 1968a. Reprinted in GRIFFITH and HILL, pp. 43-45.M. J. R. HEALY, Algorithm AS7, Inversion of a positive semi-definite symmetric matrix, J.Roy. Statist. Soc. Ser. C Applied Statistics, 1968b. Reprinted in GRIFFITH and HILL, pp.46-48.W. J. KENNEDY, Jr. and J. E. GENTLE, «Statistical Computing», Marcel Dekker, NewYork, 1980.G. MELARD et J.-M. PASTEELS, Manuel d’utilisateur de Time Series Expert (TSE version2.2), Institut de Statistique, Université Libre de Bruxelles, Bruxelles.W. H. PRESS, B. P. FLANNERY, S. A. TEUKOLSKY, and W. T. VETTERLING,«Numerical Recipes in FORTRAN 77: The Art of Scientific Computing», CambridgeUniversity Press, Cambridge, 1986.R. A. THISTED, «Elements of Statistical Computing: Numerical Computation», Chapmanand Hall, New York, 1988.