# Multiple Linear Regression

A regression model that involves more than one regressor variable is called a multiple regression model. Fitting and analyzing these models is discussed in this chapter. The results are extensions of those in Chapter 2 for simple linear regression.

3.1 MULTIPLE REGRESSION MODELS

Suppose that the yield in pounds of conversion in a chemical process depends on temperature and the catalyst concentration. A multiple regression model that might describe this relationship is

y = β₀ + β₁x₁ + β₂x₂ + ε   (3.1)

where y denotes the yield, x₁ denotes the temperature, and x₂ denotes the catalyst concentration. This is a multiple linear regression model with two regressor variables. The term linear is used because Eq. (3.1) is a linear function of the unknown parameters β₀, β₁, and β₂.

The regression model in Eq. (3.1) describes a plane in the three-dimensional space of y, x₁, and x₂. Figure 3.1a shows this regression plane for the model

E(y) = 50 + 10x₁ + 7x₂

where we have assumed that the expected value of the error term ε in Eq. (3.1) is zero. The parameter β₀ is the intercept of the regression plane. If the range of the data includes x₁ = x₂ = 0, then β₀ is the mean of y when x₁ = x₂ = 0. Otherwise β₀ has no physical interpretation. The parameter β₁ indicates the expected change in response (y) per unit change in x₁ when x₂ is held constant. Similarly β₂ measures the expected change in y per unit change in x₂ when x₁ is held constant. Figure 3.1b shows a contour plot of the regression model, that is, lines of

Introduction to Linear Regression Analysis, Fourth Edition. Douglas C. Montgomery, Elizabeth A. Peck, and Geoffrey Vining. Copyright © 2006 John Wiley & Sons, Inc.


Figure 3.1 (a) The regression plane for the model E(y) = 50 + 10x₁ + 7x₂. (b) The contour plot.


constant expected response E(y) as a function of x₁ and x₂. Notice that the contour lines in this plot are parallel straight lines.

In general, the response y may be related to k regressor or predictor variables. The model

y = β₀ + β₁x₁ + β₂x₂ + ⋯ + βₖxₖ + ε   (3.2)

is called a multiple linear regression model with k regressors. The parameters βⱼ, j = 0, 1, …, k, are called the regression coefficients. This model describes a hyperplane in the k-dimensional space of the regressor variables xⱼ. The parameter βⱼ represents the expected change in the response y per unit change in xⱼ when all of the remaining regressor variables xᵢ (i ≠ j) are held constant. For this reason the parameters βⱼ, j = 1, 2, …, k, are often called partial regression coefficients.

Multiple linear regression models are often used as empirical models or approximating functions. That is, the true functional relationship between y and x₁, x₂, …, xₖ is unknown, but over certain ranges of the regressor variables the linear regression model is an adequate approximation to the true unknown function.

Models that are more complex in structure than Eq. (3.2) may often still be analyzed by multiple linear regression techniques. For example, consider the cubic polynomial model

y = β₀ + β₁x + β₂x² + β₃x³ + ε   (3.3)

If we let x₁ = x, x₂ = x², and x₃ = x³, then Eq. (3.3) can be written as

y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + ε   (3.4)

which is a multiple linear regression model with three regressor variables. Polynomial models will be discussed in more detail in Chapter 7.

Models that include interaction effects may also be analyzed by multiple linear regression methods. For example, suppose that the model is

y = β₀ + β₁x₁ + β₂x₂ + β₁₂x₁x₂ + ε   (3.5)


Figure 3.2 (a) Three-dimensional plot of the regression model E(y) = 50 + 10x₁ + 7x₂ + 5x₁x₂. (b) The contour plot.

If we let x₃ = x₁x₂ and β₃ = β₁₂, then Eq. (3.5) can be written as

y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + ε   (3.6)

which is a linear regression model. Figure 3.2a shows the three-dimensional plot of the regression model

E(y) = 50 + 10x₁ + 7x₂ + 5x₁x₂

and Figure 3.2b the corresponding two-dimensional contour plot. Notice that, although this model is a linear regression model, the shape of the surface that is generated by the model is not linear. In general, any regression model that is linear in the parameters (the β's) is a linear regression model, regardless of the shape of the surface that it generates.

Figure 3.2 provides a nice graphical interpretation of an interaction. Generally, interaction implies that the effect produced by changing one variable (x₁, say) depends on the level of the other variable (x₂). For example, Figure 3.2 shows that changing x₁ from 2 to 8 produces a much smaller change in E(y) when x₂ = 2 than when x₂ = 10. Interaction effects occur frequently in the study and analysis of real-world systems, and regression methods are one of the techniques that we can use to describe them.
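The dependence described above can be verified directly from the model of Figure 3.2: the change in E(y) as x₁ goes from 2 to 8 works out to 60 + 30x₂, so it grows with x₂. A quick numerical check:

```python
def ey(x1, x2):
    # Interaction model from Figure 3.2: E(y) = 50 + 10x1 + 7x2 + 5x1x2
    return 50 + 10 * x1 + 7 * x2 + 5 * x1 * x2

# Change in E(y) when x1 goes from 2 to 8, at two levels of x2:
delta_at_2 = ey(8, 2) - ey(2, 2)      # 60 + 30*2  = 120
delta_at_10 = ey(8, 10) - ey(2, 10)   # 60 + 30*10 = 360

print(delta_at_2, delta_at_10)  # 120 360
```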

As a final example, consider the second-order model with interaction

y = β₀ + β₁x₁ + β₂x₂ + β₁₁x₁² + β₂₂x₂² + β₁₂x₁x₂ + ε   (3.7)


If we let x₃ = x₁², x₄ = x₂², x₅ = x₁x₂, β₃ = β₁₁, β₄ = β₂₂, and β₅ = β₁₂, then Eq. (3.7) can be written as a multiple linear regression model as follows:

y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + β₅x₅ + ε

Figure 3.3 shows the three-dimensional plot and the corresponding contour plot for the model E(y) = 800 + 10x₁ + 7x₂ − 8.5x₁² − 5x₂² + 4x₁x₂.

Figure 3.3 (a) Three-dimensional plot of the regression model E(y) = 800 + 10x₁ + 7x₂ − 8.5x₁² − 5x₂² + 4x₁x₂. (b) The contour plot.

These plots indicate that the expected change in y when x₁ is changed by one unit (say) is a function of both x₁ and x₂. The quadratic and interaction terms in this model produce a mound-shaped function. Depending on the values of the regression coefficients, the second-order model with interaction is capable of assuming a wide variety of shapes; thus, it is a very flexible regression model.

In most real-world problems, the values of the parameters (the regression coefficients βⱼ) and the error variance σ² will not be known, and they must be estimated from sample data. The fitted regression equation or model is typically used in prediction of future observations of the response variable y or for estimating the mean response at particular levels of the regressor variables.

3.2 ESTIMATION OF THE MODEL PARAMETERS

3.2.1 Least-Squares Estimation of the Regression Coefficients

The method of least squares can be used to estimate the regression coefficients in Eq. (3.2). Suppose that n > k observations are available, and let yᵢ denote the ith observed response and xᵢⱼ denote the ith observation or level of regressor xⱼ. The data will appear as in Table 3.1. We assume that the error term ε in the model has E(ε) = 0 and Var(ε) = σ², and that the errors are uncorrelated.

Throughout this chapter we will assume that the regressor variables x₁, x₂, …, xₖ are fixed (i.e., mathematical or nonrandom) variables, measured without error. However, just as was discussed in Section 2.11 for the simple linear

TABLE 3.1 Data for Multiple Linear Regression

Observation, i    Response, y    Regressors x₁, x₂, …, xₖ
1                 y₁             x₁₁, x₁₂, …, x₁ₖ
2                 y₂             x₂₁, x₂₂, …, x₂ₖ
⋮                 ⋮              ⋮
n                 yₙ             xₙ₁, xₙ₂, …, xₙₖ


regression model, all of our results are still valid for the case where the regressors are random variables. This is certainly important, because when regression data arise from an observational study, some or most of the regressors will be random variables. When the data result from a designed experiment, it is more likely that the x's will be fixed variables. When the x's are random variables, it is only necessary that the observations on each regressor be independent and that the distribution not depend on the regression coefficients (the β's) or on σ². When testing hypotheses or constructing CIs, we will have to assume that the conditional distribution of y given x₁, x₂, …, xₖ is normal with mean β₀ + β₁x₁ + ⋯ + βₖxₖ and constant variance σ². We may write the sample regression model corresponding to Eq. (3.2) as


yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + ⋯ + βₖxᵢₖ + εᵢ
   = β₀ + Σⱼ₌₁ᵏ βⱼxᵢⱼ + εᵢ,   i = 1, 2, …, n   (3.8)

The least-squares function is

S(β₀, β₁, …, βₖ) = Σᵢ₌₁ⁿ εᵢ² = Σᵢ₌₁ⁿ (yᵢ − β₀ − Σⱼ₌₁ᵏ βⱼxᵢⱼ)²   (3.9)

The function S must be minimized with respect to β₀, β₁, …, βₖ. The least-squares estimators of β₀, β₁, …, βₖ must satisfy

∂S/∂β₀ |β̂₀,β̂₁,…,β̂ₖ = −2 Σᵢ₌₁ⁿ (yᵢ − β̂₀ − Σⱼ₌₁ᵏ β̂ⱼxᵢⱼ) = 0   (3.10a)

and

∂S/∂βⱼ |β̂₀,β̂₁,…,β̂ₖ = −2 Σᵢ₌₁ⁿ (yᵢ − β̂₀ − Σⱼ₌₁ᵏ β̂ⱼxᵢⱼ)xᵢⱼ = 0,   j = 1, 2, …, k   (3.10b)

Simplifying Eq. (3.10), we obtain the least-squares normal equations (all sums run over i = 1, …, n):

nβ̂₀ + β̂₁ Σxᵢ₁ + β̂₂ Σxᵢ₂ + ⋯ + β̂ₖ Σxᵢₖ = Σyᵢ
β̂₀ Σxᵢ₁ + β̂₁ Σxᵢ₁² + β̂₂ Σxᵢ₁xᵢ₂ + ⋯ + β̂ₖ Σxᵢ₁xᵢₖ = Σxᵢ₁yᵢ
  ⋮
β̂₀ Σxᵢₖ + β̂₁ Σxᵢₖxᵢ₁ + β̂₂ Σxᵢₖxᵢ₂ + ⋯ + β̂ₖ Σxᵢₖ² = Σxᵢₖyᵢ   (3.11)

Note that there are p = k + 1 normal equations, one for each of the unknown regression coefficients. The solution to the normal equations will be the least-squares estimators β̂₀, β̂₁, …, β̂ₖ.

It is more convenient to deal with multiple regression models if they are expressed in matrix notation. This allows a very compact display of the model, data, and results. In matrix notation, the model given by Eq. (3.8) is

y = Xβ + ε

where y = [y₁, y₂, …, yₙ]′, β = [β₀, β₁, …, βₖ]′, ε = [ε₁, ε₂, …, εₙ]′, and X is the matrix whose ith row is [1, xᵢ₁, xᵢ₂, …, xᵢₖ].

In general, y is an n × 1 vector of the observations, X is an n × p matrix of the levels of the regressor variables, β is a p × 1 vector of the regression coefficients, and ε is an n × 1 vector of random errors.

We wish to find the vector of least-squares estimators, β̂, that minimizes

S(β) = Σᵢ₌₁ⁿ εᵢ² = ε′ε = (y − Xβ)′(y − Xβ)

Note that S(β) may be expressed as

S(β) = y′y − β′X′y − y′Xβ + β′X′Xβ = y′y − 2β′X′y + β′X′Xβ

since β′X′y is a 1 × 1 matrix, or a scalar, and its transpose (β′X′y)′ = y′Xβ is the same scalar. The least-squares estimators must satisfy

∂S/∂β |β̂ = −2X′y + 2X′Xβ̂ = 0

which simplifies to

X′Xβ̂ = X′y   (3.12)

Equations (3.12) are the least-squares normal equations. They are the matrix analogue of the scalar presentation in (3.11).


To solve the normal equations, multiply both sides of (3.12) by the inverse of X′X. Thus, the least-squares estimator of β is

β̂ = (X′X)⁻¹X′y   (3.13)

provided that the inverse matrix (X′X)⁻¹ exists. The (X′X)⁻¹ matrix will always exist if the regressors are linearly independent, that is, if no column of the X matrix is a linear combination of the other columns.

It is easy to see that the matrix form of the normal equations (3.12) is identical to the scalar form (3.11). Writing out (3.12) in detail, we obtain


[ n       Σxᵢ₁       Σxᵢ₂      ⋯  Σxᵢₖ     ] [ β̂₀ ]   [ Σyᵢ     ]
[ Σxᵢ₁   Σxᵢ₁²     Σxᵢ₁xᵢ₂   ⋯  Σxᵢ₁xᵢₖ  ] [ β̂₁ ] = [ Σxᵢ₁yᵢ  ]
[  ⋮       ⋮          ⋮             ⋮      ] [ ⋮  ]   [  ⋮      ]
[ Σxᵢₖ   Σxᵢₖxᵢ₁   Σxᵢₖxᵢ₂   ⋯  Σxᵢₖ²    ] [ β̂ₖ ]   [ Σxᵢₖyᵢ  ]

If the indicated matrix multiplication is performed, the scalar form of the normal equations (3.11) is obtained. In this display we see that X′X is a p × p symmetric matrix and X′y is a p × 1 column vector. Note the special structure of the X′X matrix. The diagonal elements of X′X are the sums of squares of the elements in the columns of X, and the off-diagonal elements are the sums of cross products of the elements in the columns of X. Furthermore, note that the elements of X′y are the sums of cross products of the columns of X and the observations yᵢ.

The fitted regression model corresponding to the levels of the regressor variables x′ = [1, x₁, x₂, …, xₖ] is

ŷ = x′β̂ = β̂₀ + Σⱼ₌₁ᵏ β̂ⱼxⱼ

The vector of fitted values ŷᵢ corresponding to the observed values yᵢ is

ŷ = Xβ̂ = X(X′X)⁻¹X′y = Hy   (3.14)

The n × n matrix H = X(X′X)⁻¹X′ is usually called the hat matrix. It maps the vector of observed values into a vector of fitted values. The hat matrix and its properties play a central role in regression analysis.

The difference between the observed value yᵢ and the corresponding fitted value ŷᵢ is the residual eᵢ = yᵢ − ŷᵢ. The n residuals may be conveniently written in matrix notation as

e = y − ŷ   (3.15a)


There are several other ways to express the vector of residuals e that will prove useful, including

e = y − Xβ̂ = y − Hy = (I − H)y   (3.15b)


Example 3.1 The Delivery Time Data

A soft drink bottler is analyzing the vending machine service routes in his distribution system. He is interested in predicting the amount of time required by the route driver to service the vending machines in an outlet. This service activity includes stocking the machine with beverage products and minor maintenance or housekeeping. The industrial engineer responsible for the study has suggested that the two most important variables affecting the delivery time (y) are the number of cases of product stocked (x1) and the distance walked by the route driver (x2). The engineer has collected 25 observations on delivery time, which are shown in Table 3.2. (Note that this is an expansion of the data set used in Example 2.9.) We will fit the multiple linear regression model

y = β₀ + β₁x₁ + β₂x₂ + ε

to the delivery time data in Table 3.2.

TABLE 3.2 Delivery Time Data for Example 3.1

Observation    Delivery Time, y (min)    Number of Cases, x₁    Distance, x₂ (ft)
 1             16.68                      7                      560
 2             11.50                      3                      220
 3             12.03                      3                      340
 4             14.88                      4                       80
 5             13.75                      6                      150
 6             18.11                      7                      330
 7              8.00                      2                      110
 8             17.83                      7                      210
 9             79.24                     30                     1460
10             21.50                      5                      605
11             40.33                     16                      688
12             21.00                     10                      215
13             13.50                      4                      255
14             19.75                      6                      462
15             24.00                      9                      448
16             29.00                     10                      776
17             15.35                      6                      200
18             19.00                      7                      132
19              9.50                      3                       36
20             35.10                     17                      770
21             17.90                     10                      140
22             52.32                     26                      810
23             18.75                      9                      450
24             19.83                      8                      635
25             10.75                      4                      150

Figure 3.4 Scatterplot matrix for the delivery time data from Example 3.1.

Figure 3.5 Three-dimensional scatterplot of the delivery time data from Example 3.1.

Graphics can be very useful in fitting multiple regression models. Figure 3.4 is a scatterplot matrix of the delivery time data. This is just a two-dimensional array of two-dimensional plots, where (except for the diagonal) each frame contains a scatter diagram. Thus, each plot is an attempt to shed light on the relationship between a pair of variables. This is often a better summary of the relationships than a numerical summary (such as displaying the correlation coefficients between each pair of variables) because it gives a sense of linearity or nonlinearity of the relationship and some awareness of how the individual data points are arranged over the region.


When there are only two regressors, sometimes a three-dimensional scatter diagram is useful in visualizing the relationship between the response and the regressors. Figure 3.5 presents this plot for the delivery time data. By spinning these plots, some software packages permit different views of the point cloud. This view provides an indication that a multiple linear regression model may provide a reasonable fit to the data.

To fit the multiple regression model we first form the X matrix and y vector:

X = [ [1, 7, 560], [1, 3, 220], [1, 3, 340], [1, 4, 80], …, [1, 4, 150] ],   y = [16.68, 11.50, 12.03, …, 10.75]′

(the rows of X are [1, xᵢ₁, xᵢ₂] for the 25 observations in Table 3.2). The X′X matrix is

X′X = [ 25        219        10,232
        219       3,055      133,899
        10,232    133,899    6,725,688 ]

and the X′y vector is

X′y = [ 559.60, 7,375.44, 337,072.00 ]′

The least-squares estimator of β is

β̂ = (X′X)⁻¹X′y

  = [ 25        219        10,232    ]⁻¹ [ 559.60     ]   [ 2.34123 ]
    [ 219       3,055      133,899   ]   [ 7,375.44   ] = [ 1.61591 ]
    [ 10,232    133,899    6,725,688 ]   [ 337,072.00 ]   [ 0.01438 ]

TABLE 3.3 Observations, Fitted Values, and Residuals for Example 3.1

(Table 3.3 lists, for each of the 25 observations, the observed time yᵢ, the fitted value ŷᵢ, and the residual eᵢ = yᵢ − ŷᵢ; for example, for observation 1, y₁ = 16.68, ŷ₁ = 21.7081, and e₁ = −5.0281.)


TABLE 3.4 MINITAB Output for the Soft Drink Delivery Time Data

Regression Analysis: Time versus Cases, Distance

The regression equation is
Time = 2.34 + 1.62 Cases + 0.0144 Distance

Predictor    Coef        SE Coef     T       P
Constant     2.341       1.097       2.13    0.044
Cases        1.6159      0.1707      9.46    0.000
Distance     0.014385    0.003613    3.98    0.001

S = 3.25947    R-Sq = 96.0%    R-Sq(adj) = 95.6%

Analysis of Variance

Source            DF    SS        MS
Regression         2    5550.8    2775.4
Residual Error    22     233.7      10.6
Total             24    5784.5

Source      DF    Seq SS
Cases        1    5382.4
Distance     1     168.4

The least-squares fit (with the regression coefficients reported to five decimals) is

ŷ = 2.34123 + 1.61591x₁ + 0.01438x₂

Table 3.3 shows the observations yᵢ along with the corresponding fitted values ŷᵢ and the residuals eᵢ from this model.

Computer Output Table 3.4 presents a portion of the MINITAB output for the soft drink delivery time data in Example 3.1. While the output format differs from one computer program to another, this display contains the information typically generated. Most of the output in Table 3.4 is a straightforward extension to the multiple regression case of the computer output for simple linear regression. In the next few sections we will provide explanations of this output information.
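The fit above can be reproduced in a few lines of numpy (a sketch, not MINITAB's code; the array names are ours and the data are from Table 3.2):

```python
import numpy as np

# Delivery time data from Table 3.2
time = np.array([16.68, 11.50, 12.03, 14.88, 13.75, 18.11, 8.00, 17.83,
                 79.24, 21.50, 40.33, 21.00, 13.50, 19.75, 24.00, 29.00,
                 15.35, 19.00, 9.50, 35.10, 17.90, 52.32, 18.75, 19.83, 10.75])
cases = np.array([7, 3, 3, 4, 6, 7, 2, 7, 30, 5, 16, 10, 4, 6, 9, 10,
                  6, 7, 3, 17, 10, 26, 9, 8, 4])
dist = np.array([560, 220, 340, 80, 150, 330, 110, 210, 1460, 605, 688,
                 215, 255, 462, 448, 776, 200, 132, 36, 770, 140, 810,
                 450, 635, 150])

X = np.column_stack([np.ones(25), cases, dist])
beta_hat, *_ = np.linalg.lstsq(X, time, rcond=None)
print(beta_hat)  # approximately [2.34123, 1.61591, 0.01438]
```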

3.2.2 A Geometrical Interpretation of Least Squares

An intuitive geometrical interpretation of least squares is sometimes helpful. We may think of the vector of observations y′ = [y₁, y₂, …, yₙ] as defining a vector from the origin to the point A in Figure 3.6. Note that y₁, y₂, …, yₙ form the coordinates of an n-dimensional sample space. The sample space in Figure 3.6 is three-dimensional.

The X matrix consists of p (n × 1) column vectors, for example, 1 (a column vector of 1's), x₁, x₂, …, xₖ. Each of these columns defines a vector from the origin in the sample space. These p vectors form a p-dimensional subspace called the


Figure 3.6 A geometrical interpretation of least squares.

estimation space. The estimation space for p = 2 is shown in Figure 3.6. We may represent any point in this subspace by a linear combination of the vectors 1, x₁, …, xₖ. Thus, any point in the estimation space is of the form Xβ. Let the vector Xβ determine the point B in Figure 3.6. The squared distance from B to A is just

S(β) = (y − Xβ)′(y − Xβ)

Therefore, minimizing the squared distance of point A defined by the observation vector y to the estimation space requires finding the point in the estimation space that is closest to A. The squared distance will be a minimum when the point in the estimation space is the foot of the line from A normal (or perpendicular) to the estimation space. This is point C in Figure 3.6. This point is defined by the vector ŷ = Xβ̂. Therefore, since y − ŷ = y − Xβ̂ is perpendicular to the estimation space, we may write

X′(y − Xβ̂) = 0   or   X′Xβ̂ = X′y

which we recognize as the least-squares normal equations.

3.2.3 Properties of the Least-Squares Estimators

The statistical properties of the least-squares estimator β̂ may be easily demonstrated. Consider first bias:

E(β̂) = E[(X′X)⁻¹X′y] = E[(X′X)⁻¹X′(Xβ + ε)]
     = E[(X′X)⁻¹X′Xβ + (X′X)⁻¹X′ε] = β

since E(ε) = 0 and (X′X)⁻¹X′X = I. Thus, β̂ is an unbiased estimator of β.


The variance properties of β̂ are expressed by the covariance matrix

Cov(β̂) = E{[β̂ − E(β̂)][β̂ − E(β̂)]′}

which is a p × p symmetric matrix whose jth diagonal element is the variance of β̂ⱼ and whose (ij)th off-diagonal element is the covariance between β̂ᵢ and β̂ⱼ.

The covariance matrix of β̂ is found by applying a variance operator to β̂:

Cov(β̂) = Var(β̂) = Var[(X′X)⁻¹X′y]

Now (X′X)⁻¹X′ is a matrix of constants, and the variance of y is σ²I, so

Var(β̂) = (X′X)⁻¹X′ Var(y) [(X′X)⁻¹X′]′ = σ²(X′X)⁻¹X′X(X′X)⁻¹ = σ²(X′X)⁻¹

Therefore, if we let C = (X′X)⁻¹, the variance of β̂ⱼ is σ²Cⱼⱼ and the covariance between β̂ᵢ and β̂ⱼ is σ²Cᵢⱼ.

Appendix C.4 establishes that the least-squares estimator β̂ is the best linear unbiased estimator of β (the Gauss-Markov theorem). If we further assume that the errors εᵢ are normally distributed, then as we will see in Section 3.2.6, β̂ is also the maximum-likelihood estimator of β. The maximum-likelihood estimator is the minimum variance unbiased estimator of β.
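Both properties, E(β̂) = β and Cov(β̂) = σ²(X′X)⁻¹, can be illustrated by simulation: hold X fixed, generate many realizations of y = Xβ + ε, and compare the empirical mean and covariance of β̂ with the theoretical values. A sketch with arbitrary β and σ (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps, sigma = 20, 5000, 1.0
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
beta = np.array([1.0, 2.0, -0.5])
XtX_inv = np.linalg.inv(X.T @ X)

# Simulate reps response vectors y = X beta + eps with X held fixed,
# and compute beta_hat = (X'X)^{-1} X'y for each one.
Y = (X @ beta)[:, None] + sigma * rng.normal(size=(n, reps))
B = XtX_inv @ X.T @ Y   # 3 x reps matrix of least-squares estimates

print(B.mean(axis=1))   # close to beta: unbiasedness
print(np.cov(B))        # close to sigma^2 (X'X)^{-1}
```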

3.2.4 Estimation of σ²

As in simple linear regression, we may develop an estimator of σ² from the residual sum of squares

SS_Res = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² = Σᵢ₌₁ⁿ eᵢ² = e′e

Substituting e = y − Xβ̂, we have

SS_Res = (y − Xβ̂)′(y − Xβ̂) = y′y − β̂′X′y − y′Xβ̂ + β̂′X′Xβ̂ = y′y − 2β̂′X′y + β̂′X′Xβ̂

Since X′Xβ̂ = X′y, this last equation becomes

SS_Res = y′y − β̂′X′y   (3.16)

Appendix C.3 shows that the residual sum of squares has n − p degrees of freedom associated with it since p parameters are estimated in the regression model. The residual mean square is

MS_Res = SS_Res/(n − p)   (3.17)

Appendix C.3 also shows that E(MS_Res) = σ², so an unbiased estimator of σ² is given by

σ̂² = MS_Res   (3.18)

As noted in the simple linear regression case, this estimator of σ² is model dependent.

Example 3.2 The Delivery Time Data

We will estimate the error variance σ² for the multiple regression model fit to the soft drink delivery time data in Example 3.1. Since

y′y = Σᵢ₌₁²⁵ yᵢ² = 18,310.6290   and   β̂′X′y = 18,076.9030

the residual sum of squares is

SS_Res = y′y − β̂′X′y = 18,310.6290 − 18,076.9030 = 233.7260

Therefore, the estimate of σ² is the residual mean square

σ̂² = SS_Res/(n − p) = 233.7260/(25 − 3) = 10.6239

The MINITAB output in Table 3.4 reports the residual mean square as 10.6.

The model-dependent nature of this estimate of σ² may be easily demonstrated. Figure 2.13 displays the computer output from a least-squares fit to the delivery time data using only one regressor, cases (x₁). The residual mean square for this model is 17.5, which is considerably larger than the result obtained above for the two-regressor model. Which estimate is "correct"? Both estimates are in a sense correct, but they depend heavily on the choice of model. Perhaps a better question is which model is correct? Since σ² is the variance of the errors (the unexplained noise about the regression line), we would usually prefer a model with a small residual mean square to a model with a large one.

3.2.5 Inadequacy of Scatter Diagrams in Multiple Regression

We saw in Chapter 2 that the scatter diagram is an important tool in analyzing the relationship between y and x in simple linear regression. We also saw in Example 3.1 that a matrix of scatterplots was useful in visualizing the relationship between

Figure 3.7 A matrix of scatterplots.

y and two regressors. It is tempting to conclude that this is a general concept; that is, examining scatter diagrams of y versus x₁, y versus x₂, …, y versus xₖ is always useful in assessing the relationships between y and each of the regressors x₁, x₂, …, xₖ. Unfortunately, this is not true in general.

Following Daniel and Wood [1980], we illustrate the inadequacy of scatter diagrams for a problem with two regressors. Consider the data shown in Figure 3.7. These data were generated from the equation

y = 8 − 5x₁ + 12x₂

and are:

 y    x₁   x₂
10     2    1
17     3    2
48     4    5
27     1    2
55     5    6
26     6    4
 9     7    3
16     8    4

The matrix of scatterplots is shown in Figure 3.7. The y-versus-x₁ plot does not exhibit any apparent relationship between the two variables. The y-versus-x₂ plot indicates that a linear relationship exists, with a slope of approximately 8. Note that both scatter diagrams convey erroneous information. Since in this data set there are two pairs of points that have the same x₂ values (x₂ = 2 and x₂ = 4), we could measure the x₁ effect at fixed x₂ from both pairs. This gives β̂₁ = (17 − 27)/(3 − 1) = −5 for x₂ = 2 and β̂₁ = (26 − 16)/(6 − 8) = −5 for x₂ = 4, the correct results. Knowing β̂₁, we could now estimate the x₂ effect. This procedure is not generally useful, however, because many data sets do not have duplicate points.

This example illustrates that constructing scatter diagrams of y versus xⱼ (j = 1, 2, …, k) can be misleading, even in the case of only two regressors operating in a perfectly additive fashion with no noise. A more realistic regression situation with several regressors and error in the y's would confuse the situation even further. If there is only one (or a few) dominant regressor, or if the regressors operate nearly independently, the matrix of scatterplots is most useful. However, when several important regressors are themselves interrelated, then these scatter diagrams can be very misleading. Analytical methods for sorting out the relationships between several regressors and a response are discussed in Chapter 9.

3.2.6 Maximum-Likelihood Estimation

As in the simple linear regression case, we can show that the maximum-likelihood estimators for the model parameters in multiple linear regression when the model errors are normally and independently distributed are also least-squares estimators. The model is

y = Xβ + ε

and the errors are normally and independently distributed with constant variance σ², or ε is distributed as N(0, σ²I). The normal density function for the errors is

f(εᵢ) = (1/(σ√(2π))) exp(−εᵢ²/(2σ²))

The likelihood function is the joint density of ε₁, ε₂, …, εₙ, or ∏ᵢ₌₁ⁿ f(εᵢ). Therefore, the likelihood function is

L(ε, β, σ²) = ∏ᵢ₌₁ⁿ f(εᵢ) = (1/((2π)ⁿ′²σⁿ)) exp(−ε′ε/(2σ²))

Since we can write ε = y − Xβ, the likelihood function becomes

L(y, X, β, σ²) = (1/((2π)ⁿ′²σⁿ)) exp(−(y − Xβ)′(y − Xβ)/(2σ²))

As in the simple linear regression case, it is convenient to work with the log of the likelihood,

ln L(y, X, β, σ²) = −(n/2) ln(2π) − n ln(σ) − (1/(2σ²))(y − Xβ)′(y − Xβ)

It is clear that for a fixed value of σ the log-likelihood is maximized when the term

(y − Xβ)′(y − Xβ)

is minimized. Therefore, the maximum-likelihood estimator of β under normal errors is equivalent to the least-squares estimator β̂ = (X′X)⁻¹X′y. The maximum-likelihood estimator of σ² is

σ̃² = (y − Xβ̂)′(y − Xβ̂)/n


These are multiple linear regression generalizations of the results given for simple linear regression in Section 2.10. The statistical properties of the maximum-likelihood estimators are summarized in Section 2.10.

3.3 HYPOTHESIS TESTING IN MULTIPLE LINEAR REGRESSION

Once we have estimated the parameters in the model, we face two immediate questions:

1. What is the overall adequacy of the model? 2. Which specific regressors seem important?

Several hypothesis testing procedures prove useful for addressing these questions. The formal tests require that our random errors be independent and follow a normal distribution with mean E(εᵢ) = 0 and variance Var(εᵢ) = σ².

3.3.1 Test for Significance of Regression

The test for significance of regression is a test to determine if there is a linear relationship between the response y and any of the regressor variables x₁, x₂, …, xₖ. This procedure is often thought of as an overall or global test of model adequacy. The appropriate hypotheses are

H₀: β₁ = β₂ = ⋯ = βₖ = 0
H₁: βⱼ ≠ 0 for at least one j

Rejection of this null hypothesis implies that at least one of the regressors x₁, x₂, …, xₖ contributes significantly to the model.

The test procedure is a generalization of the analysis of variance used in simple linear regression. The total sum of squares SS_T is partitioned into a sum of squares due to regression, SS_R, and a residual sum of squares, SS_Res. Thus,

SS_T = SS_R + SS_Res

Appendix C.3 shows that if the null hypothesis is true, then SS_R/σ² follows a χ²ₖ distribution, which has the same number of degrees of freedom as the number of regressor variables in the model. Appendix C.3 also shows that SS_Res/σ² follows a χ²ₙ₋ₖ₋₁ distribution and that SS_Res and SS_R are independent. By the definition of an F statistic given in Appendix C.1,

F₀ = (SS_R/k)/(SS_Res/(n − k − 1)) = MS_R/MS_Res

follows the F distribution with k and n − k − 1 degrees of freedom. Appendix C.3 shows that

E(MS_Res) = σ²   and   E(MS_R) = σ² + β*′X_c′X_cβ*/k

where β* = (β₁, β₂, …, βₖ)′ and X_c is the centered model matrix. These expected mean squares indicate that if the observed value of F₀ is large, it is likely that at least one βⱼ ≠ 0, and we should reject H₀.
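For the delivery time data of Example 3.1, this works out as follows (a sketch computing F₀ from the raw Table 3.2 data; with F₀ near 261 on 2 and 22 degrees of freedom, H₀: β₁ = β₂ = 0 is rejected at any reasonable significance level):

```python
import numpy as np

# Delivery time data from Table 3.2
time = np.array([16.68, 11.50, 12.03, 14.88, 13.75, 18.11, 8.00, 17.83,
                 79.24, 21.50, 40.33, 21.00, 13.50, 19.75, 24.00, 29.00,
                 15.35, 19.00, 9.50, 35.10, 17.90, 52.32, 18.75, 19.83, 10.75])
cases = np.array([7, 3, 3, 4, 6, 7, 2, 7, 30, 5, 16, 10, 4, 6, 9, 10,
                  6, 7, 3, 17, 10, 26, 9, 8, 4])
dist = np.array([560, 220, 340, 80, 150, 330, 110, 210, 1460, 605, 688,
                 215, 255, 462, 448, 776, 200, 132, 36, 770, 140, 810,
                 450, 635, 150])

n, k = 25, 2
X = np.column_stack([np.ones(n), cases, dist])
beta, *_ = np.linalg.lstsq(X, time, rcond=None)

e = time - X @ beta
ss_res = e @ e                             # residual sum of squares (about 233.7)
ss_t = np.sum((time - time.mean())**2)     # total sum of squares (about 5784.5)
ss_r = ss_t - ss_res                       # regression sum of squares (about 5550.8)

f0 = (ss_r / k) / (ss_res / (n - k - 1))   # MS_R / MS_Res
print(round(f0, 2))  # approximately 261.24
```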

-:gression model that involves more than one regressor variable is called a : -rltiple regression model. Fitting and analyzing these models is discussed in this

.rter. The results are extensions of those in Chapter 2 for simple linear - :css ion .

1 I \IULTIPLE REGRESSION MODELS

.-rose that the yield in pounds of conversion in a chemical process depends on " rcrature and the catalyst concentration. A multiple regression model that *:t describe this relationship is

l : F o - t F f i r + F z x r * e ( 3 . 1 )

-:e ,1, denotes the yield, x, denotes the temperature, and x2 denotes the ,.rst concentration. This is a multiple linear regression model with two regres-

- '. ariables. The term linear is used because Eq. (3.1) is a linear function of the ..:r()wn parameters Be, Br, and B,r. . ne regression model in Eq. (3.1) describes a plane in the three-dimensional .-: of l, xt, and xr. Figure3.la shows this regression plane for the model

E ( y ) : 5 0 * 1 0 x , t 7 x ,

where we have assumed that the expected value of the error term ε in Eq. (3.1) is zero. The parameter β0 is the intercept of the regression plane. If the range of the data includes x1 = x2 = 0, then β0 is the mean of y when x1 = x2 = 0. Otherwise β0 has no physical interpretation. The parameter β1 indicates the expected change in response (y) per unit change in x1 when x2 is held constant. Similarly β2 measures the expected change in y per unit change in x2 when x1 is held constant. Figure 3.1b shows a contour plot of the regression model, that is, lines of

Introduction to Linear Regression Analysis, Fourth Edition. Douglas C. Montgomery, Elizabeth A. Peck, and Geoffrey Vining. Copyright © 2006 John Wiley & Sons, Inc.


Figure 3.1 (a) The regression plane for the model E(y) = 50 + 10x1 + 7x2. (b) The contour plot.


constant expected response E(y) as a function of x1 and x2. Notice that the contour lines in this plot are parallel straight lines.

In general, the response y may be related to k regressor or predictor variables. The model

y = β0 + β1 x1 + β2 x2 + ⋯ + βk xk + ε   (3.2)

is called a multiple linear regression model with k regressors. The parameters βj, j = 0, 1, …, k, are called the regression coefficients. This model describes a hyperplane in the k-dimensional space of the regressor variables xj. The parameter βj represents the expected change in the response y per unit change in xj when all of the remaining regressor variables xi (i ≠ j) are held constant. For this reason the parameters βj, j = 1, 2, …, k, are often called partial regression coefficients.

Multiple linear regression models are often used as empirical models or approximating functions. That is, the true functional relationship between y and x1, x2, …, xk is unknown, but over certain ranges of the regressor variables the linear regression model is an adequate approximation to the true unknown function.

Models that are more complex in structure than Eq. (3.2) may often still be analyzed by multiple linear regression techniques. For example, consider the cubic polynomial model

y = β0 + β1 x + β2 x² + β3 x³ + ε   (3.3)

If we let x1 = x, x2 = x², and x3 = x³, then Eq. (3.3) can be written as

y = β0 + β1 x1 + β2 x2 + β3 x3 + ε   (3.4)

which is a multiple linear regression model with three regressor variables. Polynomial models will be discussed in more detail in Chapter 7.

Models that include interaction effects may also be analyzed by multiple linear regression methods. For example, suppose that the model is

y = β0 + β1 x1 + β2 x2 + β12 x1 x2 + ε   (3.5)


Figure 3.2 (a) Three-dimensional plot of the regression model E(y) = 50 + 10x1 + 7x2 + 5x1x2. (b) The contour plot.

If we let x3 = x1x2 and β3 = β12, then Eq. (3.5) can be written as

y = β0 + β1 x1 + β2 x2 + β3 x3 + ε   (3.6)

which is a linear regression model. Figure 3.2a shows the three-dimensional plot of the regression model

E(y) = 50 + 10x1 + 7x2 + 5x1x2

and Figure 3.2b the corresponding two-dimensional contour plot. Notice that, although this model is a linear regression model, the shape of the surface that is generated by the model is not linear. In general, any regression model that is linear in the parameters (the β's) is a linear regression model, regardless of the shape of the surface that it generates.
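The point that a model with an x1x2 term is still linear in the β's can be checked numerically. The sketch below (NumPy, with a hypothetical grid of sample points, not data from the book) renames x3 = x1x2 and recovers the coefficients of E(y) = 50 + 10x1 + 7x2 + 5x1x2 by ordinary least squares:

```python
import numpy as np

# Hypothetical grid of (x1, x2) values; y generated noise-free from the
# interaction model of Figure 3.2: E(y) = 50 + 10*x1 + 7*x2 + 5*x1*x2.
x1, x2 = (g.ravel() for g in np.meshgrid(np.arange(11.0), np.arange(11.0)))
y = 50 + 10 * x1 + 7 * x2 + 5 * x1 * x2

# Renaming x3 = x1*x2 makes the model linear in the parameters, so
# ordinary least squares on the augmented design matrix recovers them.
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta, 6))
```

The fit is exact here because the data contain no noise; the curved surface comes entirely from the product column, not from any nonlinearity in the parameters.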

Figure 3.2 provides a nice graphical interpretation of an interaction. Generally, interaction implies that the effect produced by changing one variable (x1, say) depends on the level of the other variable (x2). For example, Figure 3.2 shows that changing x1 from 2 to 8 produces a much smaller change in E(y) when x2 = 2 than when x2 = 10. Interaction effects occur frequently in the study and analysis of real-world systems, and regression methods are one of the techniques that we can use to describe them.

As a final example, consider the second-order model with interaction

y = β0 + β1 x1 + β2 x2 + β11 x1² + β22 x2² + β12 x1 x2 + ε   (3.7)

If we let x3 = x1², x4 = x2², x5 = x1x2, β3 = β11, β4 = β22, and β5 = β12, then Eq. (3.7) can be written as a multiple linear regression model as follows:

y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4 + β5 x5 + ε

Figure 3.3 shows the three-dimensional plot and the corresponding contour plot for this model.

Figure 3.3 (a) Three-dimensional plot of the regression model E(y) = 800 + 10x1 + 7x2 − 8.5x1² − 5x2² + 4x1x2. (b) The contour plot.

These plots indicate that the expected change in y when x1 is changed by one unit (say) is a function of both x1 and x2. The quadratic and interaction terms in this model produce a mound-shaped function. Depending on the values of the regression coefficients, the second-order model with interaction is capable of assuming a wide variety of shapes; thus, it is a very flexible regression model.

In most real-world problems, the values of the parameters (the regression coefficients βj) and the error variance σ² will not be known, and they must be estimated from sample data. The fitted regression equation or model is typically used in prediction of future observations of the response variable y or for estimating the mean response at particular levels of the x's.

3.2 ESTIMATION OF THE MODEL PARAMETERS

3.2.1 Least-Squares Estimation of the Regression Coefficients

The method of least squares can be used to estimate the regression coefficients in Eq. (3.2). Suppose that n > k observations are available, and let yi denote the ith observed response and xij denote the ith observation or level of regressor xj. The data will appear as in Table 3.1. We assume that the error term ε in the model has E(ε) = 0 and Var(ε) = σ², and that the errors are uncorrelated.

Throughout this chapter we will assume that the regressor variables x1, x2, …, xk are fixed (i.e., mathematical or nonrandom) variables, measured without error. However, just as was discussed in Section 2.11 for the simple linear

TABLE 3.1 Data for Multiple Linear Regression

Observation, i   Response, y   Regressors: x1, x2, …, xk
1                y1            x11, x12, …, x1k
2                y2            x21, x22, …, x2k
⋮                ⋮             ⋮
n                yn            xn1, xn2, …, xnk


regression model, all of our results are still valid for the case where the regressors are random variables. This is certainly important, because when regression data arise from an observational study, some or most of the regressors will be random variables. When the data result from a designed experiment, it is more likely that the x's will be fixed variables. When the x's are random variables, it is only necessary that the observations on each regressor be independent and that the distribution not depend on the regression coefficients (the β's) or on σ². When testing hypotheses or constructing CIs, we will have to assume that the conditional distribution of y given x1, x2, …, xk is normal with mean β0 + β1x1 + β2x2 + ⋯ + βkxk and variance σ². We may write the sample regression model corresponding to Eq. (3.2) as


yi = β0 + β1 xi1 + β2 xi2 + ⋯ + βk xik + εi

   = β0 + Σ_{j=1}^{k} βj xij + εi,   i = 1, 2, …, n   (3.8)

The least-squares function is

S(β0, β1, …, βk) = Σ_{i=1}^{n} εi² = Σ_{i=1}^{n} ( yi − β0 − Σ_{j=1}^{k} βj xij )²   (3.9)

The function S must be minimized with respect to β0, β1, …, βk. The least-squares estimators of β0, β1, …, βk must satisfy

∂S/∂β0 |_{β̂0, β̂1, …, β̂k} = −2 Σ_{i=1}^{n} ( yi − β̂0 − Σ_{j=1}^{k} β̂j xij ) = 0   (3.10a)

and

∂S/∂βj |_{β̂0, β̂1, …, β̂k} = −2 Σ_{i=1}^{n} ( yi − β̂0 − Σ_{j=1}^{k} β̂j xij ) xij = 0,   j = 1, 2, …, k   (3.10b)

Simplifying Eq. (3.10), we obtain the least-squares normal equations

n β̂0 + β̂1 Σ xi1 + β̂2 Σ xi2 + ⋯ + β̂k Σ xik = Σ yi
β̂0 Σ xi1 + β̂1 Σ xi1² + β̂2 Σ xi1 xi2 + ⋯ + β̂k Σ xi1 xik = Σ xi1 yi
  ⋮
β̂0 Σ xik + β̂1 Σ xik xi1 + β̂2 Σ xik xi2 + ⋯ + β̂k Σ xik² = Σ xik yi   (3.11)

(all sums running from i = 1 to n). Note that there are p = k + 1 normal equations, one for each of the unknown


regression coefficients. The solution to the normal equations will be the least-squares estimators β̂0, β̂1, …, β̂k.

It is more convenient to deal with multiple regression models if they are expressed in matrix notation. This allows a very compact display of the model, data, and results. In matrix notation, the model given by Eq. (3.8) is

y = Xβ + ε

where

y = [y1, y2, …, yn]′,   β = [β0, β1, …, βk]′,   ε = [ε1, ε2, …, εn]′

and

X = | 1  x11  x12  ⋯  x1k |
    | 1  x21  x22  ⋯  x2k |
    | ⋮   ⋮    ⋮        ⋮  |
    | 1  xn1  xn2  ⋯  xnk |

In general, y is an n × 1 vector of the observations, X is an n × p matrix of the levels of the regressor variables, β is a p × 1 vector of the regression coefficients, and ε is an n × 1 vector of random errors.

We wish to find the vector of least-squares estimators, β̂, that minimizes

S(β) = Σ_{i=1}^{n} εi² = ε′ε = (y − Xβ)′(y − Xβ)

Note that S(β) may be expressed as

S(β) = y′y − β′X′y − y′Xβ + β′X′Xβ
     = y′y − 2β′X′y + β′X′Xβ

since β′X′y is a 1 × 1 matrix, or a scalar, and its transpose (β′X′y)′ = y′Xβ is the same scalar. The least-squares estimators must satisfy


∂S/∂β |_{β̂} = −2X′y + 2X′Xβ̂ = 0

which simplifies to

X′Xβ̂ = X′y   (3.12)

Equations (3.12) are the least-squares normal equations. They are the matrix analogue of the scalar presentation in (3.11).


To solve the normal equations, multiply both sides of (3.12) by the inverse of X′X. Thus, the least-squares estimator of β is

β̂ = (X′X)⁻¹X′y   (3.13)

provided that the inverse matrix (X′X)⁻¹ exists. The (X′X)⁻¹ matrix will always exist if the regressors are linearly independent, that is, if no column of the X matrix is a linear combination of the other columns.

It is easy to see that the matrix form of the normal equations (3.12) is identical to the scalar form (3.11). Writing out (3.12) in detail, we obtain


| n       Σxi1      Σxi2     ⋯  Σxik    | | β̂0 |   | Σyi    |
| Σxi1    Σxi1²     Σxi1xi2  ⋯  Σxi1xik | | β̂1 |   | Σxi1yi |
| ⋮        ⋮         ⋮            ⋮      | | ⋮  | = | ⋮      |
| Σxik    Σxikxi1   Σxikxi2  ⋯  Σxik²   | | β̂k |   | Σxikyi |

(all sums running from i = 1 to n)

If the indicated matrix multiplication is performed, the scalar form of the normal equations (3.11) is obtained. In this display we see that X′X is a p × p symmetric matrix and X′y is a p × 1 column vector. Note the special structure of the X′X matrix. The diagonal elements of X′X are the sums of squares of the elements in the columns of X, and the off-diagonal elements are the sums of cross products of the elements in the columns of X. Furthermore, note that the elements of X′y are the sums of cross products of the columns of X and the observations yi.
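The normal equations can be formed and solved in a few lines. The sketch below uses simulated data (hypothetical values, not from the book) and solves the linear system X′Xβ̂ = X′y directly, which is numerically preferable to forming the explicit inverse in Eq. (3.13):

```python
import numpy as np

# Simulated data: n observations, k regressors plus an intercept column.
rng = np.random.default_rng(0)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # p = k + 1 columns
beta_true = np.array([1.0, 2.0, -3.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Least-squares estimator: solve the normal equations of Eq. (3.12).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Same answer as the explicit-inverse form (X'X)^{-1} X'y of Eq. (3.13).
assert np.allclose(beta_hat, np.linalg.inv(X.T @ X) @ X.T @ y)
print(beta_hat)
```

With the small error variance used here, β̂ lands close to the coefficients that generated the data.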

The fitted regression model corresponding to the levels of the regressor variables x′ = [1, x1, x2, …, xk] is

ŷ = x′β̂ = β̂0 + Σ_{j=1}^{k} β̂j xj

The vector of fitted values ŷi corresponding to the observed values yi is

ŷ = Xβ̂ = X(X′X)⁻¹X′y = Hy   (3.14)

The n × n matrix H = X(X′X)⁻¹X′ is usually called the hat matrix. It maps the vector of observed values into a vector of fitted values. The hat matrix and its properties play a central role in regression analysis.

The difference between the observed value yi and the corresponding fitted value ŷi is the residual ei = yi − ŷi. The n residuals may be conveniently written in matrix notation as

e = y − ŷ   (3.15a)


There are several other ways to express the vector of residuals e that will prove useful, including

e = y − Xβ̂ = y − Hy = (I − H)y   (3.15b)
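The hat-matrix identities are easy to verify numerically. A sketch with simulated data (hypothetical values): it forms H explicitly — fine for illustration, though wasteful for large n — and checks that H is symmetric and idempotent and that the residuals are orthogonal to the columns of X:

```python
import numpy as np

# Simulated data for checking y_hat = H y (Eq. 3.14) and e = (I - H) y (Eq. 3.15b).
rng = np.random.default_rng(1)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([2.0, 1.5, -0.5]) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
y_hat = H @ y                          # fitted values
e = (np.eye(n) - H) @ y                # residuals

# H is symmetric and idempotent, and the residuals are orthogonal to
# every column of X (hence also to the fitted values).
assert np.allclose(H, H.T) and np.allclose(H @ H, H)
assert np.allclose(X.T @ e, 0) and np.isclose(y_hat @ e, 0)
```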


Example 3.1 The Delivery Time Data

A soft drink bottler is analyzing the vending machine service routes in his distribution system. He is interested in predicting the amount of time required by the route driver to service the vending machines in an outlet. This service activity includes stocking the machine with beverage products and minor maintenance or housekeeping. The industrial engineer responsible for the study has suggested that the two most important variables affecting the delivery time (y) are the number of cases of product stocked (x1) and the distance walked by the route driver (x2). The engineer has collected 25 observations on delivery time, which are shown in Table 3.2. (Note that this is an expansion of the data set used in Example 2.9.) We will fit the multiple linear regression model

y = β0 + β1 x1 + β2 x2 + ε

to the delivery time data in Table 3.2.

TABLE 3.2 Delivery Time Data for Example 3.1

Observation   Delivery Time, y (min)   Number of Cases, x1   Distance, x2 (ft)
 1            16.68                     7                     560
 2            11.50                     3                     220
 3            12.03                     3                     340
 4            14.88                     4                      80
 5            13.75                     6                     150
 6            18.11                     7                     330
 7             8.00                     2                     110
 8            17.83                     7                     210
 9            79.24                    30                    1460
10            21.50                     5                     605
11            40.33                    16                     688
12            21.00                    10                     215
13            13.50                     4                     255
14            19.75                     6                     462
15            24.00                     9                     448
16            29.00                    10                     776
17            15.35                     6                     200
18            19.00                     7                     132
19             9.50                     3                      36
20            35.10                    17                     770
21            17.90                    10                     140
22            52.32                    26                     810
23            18.75                     9                     450
24            19.83                     8                     635
25            10.75                     4                     150


Figure 3.4 Scatterplot matrix for the delivery time data from Example 3.1.

Figure 3.5 Three-dimensional scatterplot of the delivery time data from Example 3.1.

Graphics can be very useful in fitting multiple regression models. Figure 3.4 is a scatterplot matrix of the delivery time data. This is just a two-dimensional array of two-dimensional plots, where (except for the diagonal) each frame contains a scatter diagram. Thus, each plot is an attempt to shed light on the relationship between a pair of variables. This is often a better summary of the relationships than a numerical summary (such as displaying the correlation coefficients between each pair of variables) because it gives a sense of linearity or nonlinearity of the relationship and some awareness of how the individual data points are arranged over the region.


When there are only two regressors, sometimes a three-dimensional scatter diagram is useful in visualizing the relationship between the response and the regressors. Figure 3.5 presents this plot for the delivery time data. By spinning these plots, some software packages permit different views of the point cloud. This view provides an indication that a multiple linear regression model may provide a reasonable fit to the data.

To fit the multiple regression model we first form the X matrix and y vector:

X = | 1   7   560 |        y = | 16.68 |
    | 1   3   220 |            | 11.50 |
    | 1   3   340 |            | 12.03 |
    | ⋮   ⋮    ⋮  |            |   ⋮   |
    | 1   4   150 |            | 10.75 |

The X′X matrix is

X′X = | 25      219      10,232    |
      | 219     3,055    133,899   |
      | 10,232  133,899  6,725,688 |

and the X′y vector is

X′y = | 559.60     |
      | 7,375.44   |
      | 337,072.00 |


The least-squares estimator of β is

β̂ = (X′X)⁻¹X′y

or

β̂ = | 25      219      10,232    |⁻¹ | 559.60     |   | 2.34123 |
    | 219     3,055    133,899   |   | 7,375.44   | = | 1.61591 |
    | 10,232  133,899  6,725,688 |   | 337,072.00 |   | 0.01438 |

TABLE 3.3 Observations, Fitted Values, and Residuals for Example 3.1

Observation Number   yi      ŷi        ei
 1                   16.68   21.7081   −5.0281
 2                   11.50   10.3536    1.1464
 3                   12.03   12.0798   −0.0498
 4                   14.88    9.9556    4.9244
 5                   13.75   14.1944   −0.4444
 6                   18.11   18.3996   −0.2896
 7                    8.00    7.1554    0.8446
 8                   17.83   16.6734    1.1566
 9                   79.24   71.8203    7.4197
10                   21.50   19.1236    2.3764
11                   40.33   38.0925    2.2375
12                   21.00   21.5930   −0.5930
13                   13.50   12.4730    1.0270
14                   19.75   18.6825    1.0675
15                   24.00   23.3288    0.6712
16                   29.00   29.6629   −0.6629
17                   15.35   14.9136    0.4364
18                   19.00   15.5514    3.4486
19                    9.50    7.7068    1.7932
20                   35.10   40.8880   −5.7880
21                   17.90   20.5142   −2.6142
22                   52.32   56.0065   −3.6865
23                   18.75   23.3576   −4.6076
24                   19.83   24.4028   −4.5728
25                   10.75   10.9626   −0.2126


TABLE 3.4 MINITAB Output for the Soft Drink Delivery Time Data

Regression Analysis: Time versus Cases, Distance

The regression equation is
Time = 2.34 + 1.62 Cases + 0.0144 Distance

Predictor    Coef       SE Coef    T      P
Constant     2.341      1.097      2.13   0.044
Cases        1.6159     0.1707     9.46   0.000
Distance     0.014385   0.003613   3.98   0.001

S = 3.25947    R-Sq = 96.0%    R-Sq(adj) = 95.6%

Analysis of Variance

Source           DF    SS       MS       F        P
Regression        2    5550.8   2775.4   261.24   0.000
Residual Error   22    233.7    10.6
Total            24    5784.5

Source     DF   Seq SS
Cases       1   5382.4
Distance    1   168.4

The least-squares fit (with the regression coefficients reported to five decimals) is

ŷ = 2.34123 + 1.61591x1 + 0.01438x2

Table 3.3 shows the observations yi along with the corresponding fitted values ŷi and the residuals ei from this model.
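The fitted equation is used for prediction by plugging in regressor values. A sketch, with a hypothetical new outlet (not one of the 25 observations in Table 3.2) as input:

```python
# Fitted model of Example 3.1, coefficients to five decimals.
b0, b1, b2 = 2.34123, 1.61591, 0.01438

def predict_time(cases, distance_ft):
    """Predicted delivery time (min) for a given load and walking distance."""
    return b0 + b1 * cases + b2 * distance_ft

# Hypothetical new outlet: 8 cases stocked, 275 ft walked.
print(round(predict_time(8, 275), 2))  # -> 19.22
```

Such a prediction is trustworthy only for (x1, x2) combinations inside the region covered by the data used to fit the model.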

Computer Output   Table 3.4 presents a portion of the MINITAB output for the soft drink delivery time data in Example 3.1. While the output format differs from one computer program to another, this display contains the information typically generated. Most of the output in Table 3.4 is a straightforward extension to the multiple regression case of the computer output for simple linear regression. In the next few sections we will provide explanations of this output information.

3.2.2 A Geometrical Interpretation of Least Squares

An intuitive geometrical interpretation of least squares is sometimes helpful. We may think of the vector of observations y′ = [y1, y2, …, yn] as defining a vector from the origin to the point A in Figure 3.6. Note that y1, y2, …, yn form the coordinates of an n-dimensional sample space. The sample space in Figure 3.6 is three-dimensional.

The X matrix consists of p (n × 1) column vectors, for example, 1 (a column vector of 1's), x1, x2, …, xk. Each of these columns defines a vector from the origin in the sample space. These p vectors form a p-dimensional subspace called the estimation space.

Figure 3.6 A geometrical interpretation of least squares.

The estimation space for p = 2 is shown in Figure 3.6. We may represent any point in this subspace by a linear combination of the vectors 1, x1, …, xk. Thus, any point in the estimation space is of the form Xβ. Let the vector Xβ determine the point B in Figure 3.6. The squared distance from B to A is just

S(β) = (y − Xβ)′(y − Xβ)

Therefore, minimizing the squared distance of point A defined by the observation vector y to the estimation space requires finding the point in the estimation space that is closest to A. The squared distance will be a minimum when the point in the estimation space is the foot of the line from A normal (or perpendicular) to the estimation space. This is point C in Figure 3.6. This point is defined by the vector ŷ = Xβ̂. Therefore, since y − ŷ = y − Xβ̂ is perpendicular to the estimation space, we may write

X′(y − Xβ̂) = 0   or   X′Xβ̂ = X′y

which we recognize as the least-squares normal equations.

3.2.3 Properties of the Least-Squares Estimators

The statistical properties of the least-squares estimator β̂ may be easily demonstrated. Consider first bias:

E(β̂) = E[(X′X)⁻¹X′y] = E[(X′X)⁻¹X′(Xβ + ε)]
     = E[(X′X)⁻¹X′Xβ + (X′X)⁻¹X′ε] = β

since E(ε) = 0 and (X′X)⁻¹X′X = I. Thus, β̂ is an unbiased estimator of β.

The variance property of β̂ is expressed by the covariance matrix

Cov(β̂) = E{[β̂ − E(β̂)][β̂ − E(β̂)]′}

which is a p × p symmetric matrix whose jth diagonal element is the variance of β̂j and whose (ij)th off-diagonal element is the covariance between β̂i and β̂j.

The covariance matrix of β̂ is found by applying a variance operator to β̂:

Cov(β̂) = Var(β̂) = Var[(X′X)⁻¹X′y]

Now (X′X)⁻¹X′ is a matrix of constants, and the variance of y is σ²I, so

Var(β̂) = Var[(X′X)⁻¹X′y] = (X′X)⁻¹X′ Var(y) [(X′X)⁻¹X′]′
        = σ²(X′X)⁻¹X′X(X′X)⁻¹ = σ²(X′X)⁻¹

Therefore, if we let C = (X′X)⁻¹, the variance of β̂j is σ²Cjj and the covariance between β̂i and β̂j is σ²Cij.
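Replacing σ² by its estimate MS_Res gives the estimated standard errors se(β̂j) = sqrt(σ̂² Cjj). A sketch with the delivery time data reproduces the "SE Coef" column of the MINITAB output in Table 3.4:

```python
import numpy as np

# Delivery time data of Table 3.2 (cases x1, distance x2 in ft, time y in min).
x1 = [7, 3, 3, 4, 6, 7, 2, 7, 30, 5, 16, 10, 4, 6, 9, 10, 6, 7, 3, 17, 10, 26, 9, 8, 4]
x2 = [560, 220, 340, 80, 150, 330, 110, 210, 1460, 605, 688, 215, 255, 462,
      448, 776, 200, 132, 36, 770, 140, 810, 450, 635, 150]
y = np.array([16.68, 11.50, 12.03, 14.88, 13.75, 18.11, 8.00, 17.83, 79.24,
              21.50, 40.33, 21.00, 13.50, 19.75, 24.00, 29.00, 15.35, 19.00,
              9.50, 35.10, 17.90, 52.32, 18.75, 19.83, 10.75])
X = np.column_stack([np.ones(25), x1, x2])

C = np.linalg.inv(X.T @ X)               # C = (X'X)^{-1}
beta_hat = C @ X.T @ y
e = y - X @ beta_hat
sigma2_hat = e @ e / (25 - 3)            # MS_Res, the estimate of sigma^2
se = np.sqrt(sigma2_hat * np.diag(C))    # se(beta_j) = sqrt(sigma^2 * C_jj)
print(np.round(se, 3))  # -> [1.097 0.171 0.004]
```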

Appendix C.4 establishes that the least-squares estimator β̂ is the best linear unbiased estimator of β (the Gauss-Markov theorem). If we further assume that the errors εi are normally distributed, then as we will see in Section 3.2.6, β̂ is also the maximum-likelihood estimator of β. The maximum-likelihood estimator is the minimum variance unbiased estimator of β.

3.2.4 Estimation of σ²

As in simple linear regression, we may develop an estimator of σ² from the residual sum of squares

SS_Res = Σ_{i=1}^{n} (yi − ŷi)² = Σ_{i=1}^{n} ei² = e′e

Substituting e = y − Xβ̂, we have

SS_Res = (y − Xβ̂)′(y − Xβ̂)
       = y′y − β̂′X′y − y′Xβ̂ + β̂′X′Xβ̂
       = y′y − 2β̂′X′y + β̂′X′Xβ̂

Since X′Xβ̂ = X′y, this last equation becomes

SS_Res = y′y − β̂′X′y   (3.16)

Appendix C.3 shows that the residual sum of squares has n − p degrees of freedom associated with it since p parameters are estimated in the regression model. The residual mean square is

MS_Res = SS_Res / (n − p)   (3.17)



Appendix C.3 also shows that E(SS_Res) = σ²(n − p), so an unbiased estimator of σ² is given by

σ̂² = SS_Res / (n − p) = MS_Res   (3.18)

As noted in the simple linear regression case, this estimator of σ² is model dependent.

Example 3.2 The Delivery Time Data

We will estimate the error variance σ² for the multiple regression model fit to the soft drink delivery time data in Example 3.1. Since

y′y = Σ_{i=1}^{25} yi² = 18,310.6290

and

β̂′X′y = [2.34123  1.61591  0.01438] | 559.60     | = 18,076.9030
                                    | 7,375.44   |
                                    | 337,072.00 |

the residual sum of squares is

SS_Res = y′y − β̂′X′y = 18,310.6290 − 18,076.9030 = 233.7260

Therefore, the estimate of σ² is the residual mean square

σ̂² = SS_Res / (n − p) = 233.7260 / (25 − 3) = 10.6239

The MINITAB output in Table 3.4 reports the residual mean square as 10.6.

The model-dependent nature of this estimate of σ² may be easily demonstrated. Figure 2.13 displays the computer output from a least-squares fit to the delivery time data using only one regressor, cases (x1). The residual mean square for this model is 17.5, which is considerably larger than the result obtained above for the two-regressor model. Which estimate is "correct"? Both estimates are in a sense correct, but they depend heavily on the choice of model. Perhaps a better question is which model is correct? Since σ² is the variance of the errors (the unexplained noise about the regression line), we would usually prefer a model with a small residual mean square to a model with a large one.

3.2.5 Inadequacy of Scatter Diagrams in Multiple Regression

We saw in Chapter 2 that the scatter diagram is an important tool in analyzing the relationship between y and x in simple linear regression. We also saw in Example 3.1 that a matrix of scatterplots was useful in visualizing the relationship between

Figure 3.7 A matrix of scatterplots.

The data plotted in Figure 3.7 are:

y    10   17   48   27   55   26    9   16
x1    2    3    4    1    5    6    7    8
x2    1    2    5    2    6    4    3    4

y and two regressors. It is tempting to conclude that this is a general concept; that is, examining scatter diagrams of y versus x1, y versus x2, …, y versus xk is always useful in assessing the relationships between y and each of the regressors x1, x2, …, xk. Unfortunately, this is not true in general.

Following Daniel and Wood [1980], we illustrate the inadequacy of scatter diagrams for a problem with two regressors. Consider the data shown in Figure 3.7. These data were generated from the equation

y = 8 − 5x1 + 12x2

The matrix of scatterplots is shown in Figure 3.7. The y-versus-x1 plot does not exhibit any apparent relationship between the two variables. The y-versus-x2 plot indicates that a linear relationship exists, with a slope of approximately 8. Note that both scatter diagrams convey erroneous information. Since in this data set there are two pairs of points that have the same x2 values (x2 = 2 and x2 = 4), we could measure the x1 effect at fixed x2 from both pairs. This gives β̂1 = (17 − 27)/(3 − 1) = −5 for x2 = 2 and β̂1 = (26 − 16)/(6 − 8) = −5 for x2 = 4, the correct results. Knowing β̂1, we could now estimate the x2 effect. This procedure is not generally useful, however, because many data sets do not have duplicate points.

This example illustrates that constructing scatter diagrams of y versus xj (j = 1, 2, …, k) can be misleading, even in the case of only two regressors operating in a perfectly additive fashion with no noise. A more realistic regression situation with several regressors and error in the y's would confuse the situation even further. If there is only one (or a few) dominant regressor, or if the regressors operate


nearly independently, the matrix of scatterplots is most useful. However, if several important regressors are themselves interrelated, then these scatter diagrams can be very misleading. Analytical methods for sorting out the relationships between several regressors and a response are discussed in Chapter 9.

3.2.6 Maximum-Likelihood Estimation

As in the simple linear regression case, we can show that the maximum-likelihood estimators for the model parameters in multiple linear regression when the model errors are normally and independently distributed are also least-squares estimators. The model is

y = Xβ + ε

and the errors are normally and independently distributed with constant variance σ², or ε is distributed as N(0, σ²I). The normal density function for the errors is

f(εi) = (1 / (σ√(2π))) exp( −εi² / (2σ²) )

The likelihood function is the joint density of ε1, ε2, …, εn, or ∏_{i=1}^{n} f(εi). Therefore, the likelihood function is

L(ε, β, σ²) = ∏_{i=1}^{n} f(εi) = (1 / ((2π)^{n/2} σⁿ)) exp( −(1/(2σ²)) ε′ε )

Now since we can write ε = y − Xβ, the likelihood function becomes

L(y, X, β, σ²) = (1 / ((2π)^{n/2} σⁿ)) exp( −(1/(2σ²)) (y − Xβ)′(y − Xβ) )

As in the simple linear regression case, it is convenient to work with the log of the likelihood,

ln L(y, X, β, σ²) = −(n/2) ln(2π) − n ln(σ) − (1/(2σ²)) (y − Xβ)′(y − Xβ)

It is clear that for a fixed value of σ the log-likelihood is maximized when the term

(y − Xβ)′(y − Xβ)

is minimized. Therefore, the maximum-likelihood estimator of β under normal errors is equivalent to the least-squares estimator β̂ = (X′X)⁻¹X′y. The maximum-likelihood estimator of σ² is

σ̂² = (y − Xβ̂)′(y − Xβ̂) / n


These are multiple linear regression generalizations of the results given for simple linear regression in Section 2.10. The statistical properties of the maximum-likelihood estimators are summarized in Section 2.10.
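The only practical difference from the least-squares results is the divisor in σ̂²: the maximum-likelihood estimator divides SS_Res by n, the unbiased estimator of Eq. (3.18) by n − p. With the Example 3.2 numbers (SS_Res = 233.7260, n = 25, p = 3), a sketch:

```python
# Quantities from Example 3.2 (delivery time data): SS_Res, n, and p = k + 1.
ss_res, n, p = 233.7260, 25, 3

sigma2_ml = ss_res / n              # maximum-likelihood estimator: divide by n
sigma2_unbiased = ss_res / (n - p)  # unbiased estimator MS_Res, Eq. (3.18)

print(round(sigma2_ml, 4))        # -> 9.349
print(round(sigma2_unbiased, 4))  # -> 10.6239
```

The maximum-likelihood estimator is biased downward by the factor (n − p)/n, a difference that vanishes as n grows relative to p.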

3.3 HYPOTHESIS TESTING IN MULTIPLE LINEAR REGRESSION

Once we have estimated the parameters in the model, we face two immediate questions:

1. What is the overall adequacy of the model?
2. Which specific regressors seem important?

Several hypothesis testing procedures prove useful for addressing these questions. The formal tests require that our random errors be independent and follow a normal distribution with mean E(εi) = 0 and variance Var(εi) = σ².

3.3.1 Test for Significance of Regression

The test for significance of regression is a test to determine if there is a linear relationship between the response y and any of the regressor variables x1, x2, …, xk. This procedure is often thought of as an overall or global test of model adequacy. The appropriate hypotheses are

H0: β1 = β2 = ⋯ = βk = 0
H1: βj ≠ 0 for at least one j

Rejection of this null hypothesis implies that at least one of the regressors x1, x2, …, xk contributes significantly to the model.

The test procedure is a generalization of the analysis of variance used in simple linear regression. The total sum of squares SS_T is partitioned into a sum of squares due to regression, SS_R, and a residual sum of squares, SS_Res. Thus,

SS_T = SS_R + SS_Res

Appendix C.3 shows that if the null hypothesis is true, then SS_R/σ² follows a χ²_k distribution, which has the same number of degrees of freedom as the number of regressor variables in the model. Appendix C.3 also shows that SS_Res/σ² follows a χ²_{n−k−1} distribution and that SS_Res and SS_R are independent. By the definition of an F statistic given in Appendix C.1,

F0 = (SS_R / k) / (SS_Res / (n − k − 1)) = MS_R / MS_Res

follows the F_{k, n−k−1} distribution. Appendix C.3 shows that

E(MS_Res) = σ²

E(MS_R) = σ² + (β*′ X_c′ X_c β*) / k

where β* = (β1, β2, …, βk)′ and X_c is the centered model matrix. These expected mean squares indicate that if the observed value of F0 is large, then it is likely that at least one βj ≠ 0.
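For the delivery time data the test statistic can be computed directly from the ANOVA quantities of Table 3.4. A sketch in pure Python, using the book's sums of squares (the critical-value comparison is approximate):

```python
# ANOVA quantities for the delivery time data (Table 3.4): k = 2 regressors,
# n = 25 observations, SS_Res from Example 3.2.
ss_r, k = 5550.8, 2
ss_res, n = 233.7260, 25

ms_r = ss_r / k
ms_res = ss_res / (n - k - 1)
f0 = ms_r / ms_res
print(round(f0, 2))  # -> 261.24

# F0 far exceeds any plausible critical value (F_{0.05, 2, 22} is roughly 3.4),
# so H0: beta1 = beta2 = 0 is rejected: delivery time is linearly related to
# cases and/or distance.
assert f0 > 3.44
```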