Transcript

CHAPTER 3

Multiple Linear Regression

A regression model that involves more than one regressor variable is called a multiple regression model. Fitting and analyzing these models is discussed in this chapter. The results are extensions of those in Chapter 2 for simple linear regression.

3.1 MULTIPLE REGRESSION MODELS

Suppose that the yield in pounds of conversion in a chemical process depends on the temperature and the catalyst concentration. A multiple regression model that might describe this relationship is

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon \qquad (3.1)$$

where $y$ denotes the yield, $x_1$ denotes the temperature, and $x_2$ denotes the catalyst concentration. This is a multiple linear regression model with two regressor

variables. The term linear is used because Eq. (3.1) is a linear function of the unknown parameters $\beta_0$, $\beta_1$, and $\beta_2$.

The regression model in Eq. (3.1) describes a plane in the three-dimensional space of $y$, $x_1$, and $x_2$. Figure 3.1a shows this regression plane for the model

$$E(y) = 50 + 10x_1 + 7x_2$$

where we have assumed that the expected value of the error term $\varepsilon$ in Eq. (3.1) is zero. The parameter $\beta_0$ is the intercept of the regression plane. If the range of the data includes $x_1 = x_2 = 0$, then $\beta_0$ is the mean of $y$ when $x_1 = x_2 = 0$. Otherwise $\beta_0$ has no physical interpretation. The parameter $\beta_1$ indicates the expected change in response ($y$) per unit change in $x_1$ when $x_2$ is held constant. Similarly $\beta_2$ measures the expected change in $y$ per unit change in $x_2$ when $x_1$ is held constant. Figure 3.1b shows a contour plot of the regression model, that is, lines of

Introduction to Linear Regression Analysis, Fourth Edition. Douglas C. Montgomery, Elizabeth A. Peck, and Geoffrey Vining. Copyright © 2006 John Wiley & Sons, Inc.


Figure 3.1 (a) The regression plane for the model $E(y) = 50 + 10x_1 + 7x_2$. (b) The contour plot.


constant expected response $E(y)$ as a function of $x_1$ and $x_2$. Notice that the contour lines in this plot are parallel straight lines.

In general, the response $y$ may be related to $k$ regressor or predictor variables. The model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon \qquad (3.2)$$

is called a multiple linear regression model with $k$ regressors. The parameters $\beta_j$, $j = 0, 1, \ldots, k$, are called the regression coefficients. This model describes a hyperplane in the $k$-dimensional space of the regressor variables $x_j$. The parameter $\beta_j$ represents the expected change in the response $y$ per unit change in $x_j$ when all of the remaining regressor variables $x_i$ ($i \neq j$) are held constant. For this reason the parameters $\beta_j$, $j = 1, 2, \ldots, k$, are often called partial regression coefficients.

Multiple linear regression models are often used as empirical models or approximating functions. That is, the true functional relationship between $y$ and $x_1, x_2, \ldots, x_k$ is unknown, but over certain ranges of the regressor variables the linear regression model is an adequate approximation to the true unknown function.

Models that are more complex in structure than Eq. (3.2) may often still be analyzed by multiple linear regression techniques. For example, consider the cubic polynomial model

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \varepsilon \qquad (3.3)$$

If we let $x_1 = x$, $x_2 = x^2$, and $x_3 = x^3$, then Eq. (3.3) can be written as

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon \qquad (3.4)$$

which is a multiple linear regression model with three regressor variables. Polynomial models will be discussed in more detail in Chapter 7.

Models that include interaction effects may also be analyzed by multiple linear regression methods. For example, suppose that the model is

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \varepsilon \qquad (3.5)$$

Figure 3.2 (a) Three-dimensional plot of the regression model $E(y) = 50 + 10x_1 + 7x_2 + 5x_1x_2$. (b) The contour plot.

If we let $x_3 = x_1x_2$ and $\beta_3 = \beta_{12}$, then Eq. (3.5) can be written as

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon \qquad (3.6)$$

which is a linear regression model.

Figure 3.2a shows the three-dimensional plot of the regression model

$$E(y) = 50 + 10x_1 + 7x_2 + 5x_1x_2$$

and Figure 3.2b the corresponding two-dimensional contour plot. Notice that although this model is a linear regression model, the shape of the surface that is generated by the model is not linear. In general, any regression model that is linear in the parameters (the $\beta$'s) is a linear regression model, regardless of the shape of the surface that it generates.

Figure 3.2 provides a nice graphical interpretation of an interaction. Generally, interaction implies that the effect produced by changing one variable ($x_1$, say) depends on the level of the other variable ($x_2$). For example, Figure 3.2 shows that changing $x_1$ from 2 to 8 produces a much smaller change in $E(y)$ when $x_2 = 2$ than when $x_2 = 10$. Interaction effects occur frequently in the study and analysis of real-world systems, and regression methods are one of the techniques that we can use to describe them.

As a final example, consider the second-order model with interaction

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \varepsilon \qquad (3.7)$$

If we let $x_3 = x_1^2$, $x_4 = x_2^2$, $x_5 = x_1x_2$, $\beta_3 = \beta_{11}$, $\beta_4 = \beta_{22}$, and $\beta_5 = \beta_{12}$, then Eq. (3.7) can be written as a multiple linear regression model as follows:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \varepsilon$$
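This substitution is easy to carry out numerically: the squared and cross-product terms are simply appended as extra columns of the design matrix, and ordinary least squares applies unchanged because the model remains linear in the $\beta$'s. The sketch below (not from the text; the sample points and noise level are made up) simulates data from the second-order surface used in Figure 3.3 and recovers its coefficients this way.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up sample points for two regressors.
x1 = rng.uniform(0, 10, 30)
x2 = rng.uniform(0, 10, 30)
# Simulate y from the surface of Figure 3.3 plus noise (assumed sigma = 5).
y = 800 + 10*x1 + 7*x2 - 8.5*x1**2 - 5*x2**2 + 4*x1*x2 + rng.normal(0, 5, 30)

# Eq. (3.7) becomes linear in the betas once the columns
# x3 = x1^2, x4 = x2^2, and x5 = x1*x2 are appended to the design matrix.
X = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

# Ordinary least squares on the augmented matrix; the estimates should be
# close to (800, 10, 7, -8.5, -5, 4).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 2))
```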

Figure 3.3 shows the three-dimensional plot and the corresponding contour plot for

$$E(y) = 800 + 10x_1 + 7x_2 - 8.5x_1^2 - 5x_2^2 + 4x_1x_2$$

Figure 3.3 (a) Three-dimensional plot of the regression model $E(y) = 800 + 10x_1 + 7x_2 - 8.5x_1^2 - 5x_2^2 + 4x_1x_2$. (b) The contour plot.

These plots indicate that the expected change in $y$ when $x_1$ is changed by one unit (say) is a function of both $x_1$ and $x_2$. The quadratic and interaction terms in this model produce a mound-shaped function. Depending on the values of the regression coefficients, the second-order model with interaction is capable of assuming a wide variety of shapes; thus, it is a very flexible regression model.

In most real-world problems, the values of the parameters (the regression coefficients $\beta_j$) and the error variance $\sigma^2$ will not be known, and they must be estimated from sample data. The fitted regression equation or model is typically used in prediction of future observations of the response variable $y$ or for estimating the mean response at particular levels of the $x$'s.

3.2 ESTIMATION OF THE MODEL PARAMETERS

3.2.1 Least-Squares Estimation of the Regression Coefficients

The method of least squares can be used to estimate the regression coefficients in Eq. (3.2). Suppose that $n > k$ observations are available, and let $y_i$ denote the $i$th observed response and $x_{ij}$ denote the $i$th observation or level of regressor $x_j$. The data will appear as in Table 3.1. We assume that the error term $\varepsilon$ in the model has $E(\varepsilon) = 0$, $\mathrm{Var}(\varepsilon) = \sigma^2$, and that the errors are uncorrelated.

Throughout this chapter we will assume that the regressor variables $x_1, x_2, \ldots, x_k$ are fixed (i.e., mathematical or nonrandom) variables, measured without error. However, just as was discussed in Section 2.11 for the simple linear

TABLE 3.1 Data for Multiple Linear Regression

Observation, i    Response, y    x_1      x_2      ...    x_k
1                 y_1            x_11     x_12     ...    x_1k
2                 y_2            x_21     x_22     ...    x_2k
...               ...            ...      ...             ...
n                 y_n            x_n1     x_n2     ...    x_nk


regression model, all of our results are still valid for the case where the regressors are random variables. This is certainly important, because when regression data arise from an observational study, some or most of the regressors will be random variables. When the data result from a designed experiment, it is more likely that the $x$'s will be fixed variables. When the $x$'s are random variables, it is only necessary that the observations on each regressor be independent and that the distribution not depend on the regression coefficients (the $\beta$'s) or on $\sigma^2$. When testing hypotheses or constructing CIs, we will have to assume that the conditional distribution of $y$ given $x_1, x_2, \ldots, x_k$ be normal with mean $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$ and variance $\sigma^2$. We may write the sample regression model corresponding to Eq. (3.2) as

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \varepsilon_i, \qquad i = 1, 2, \ldots, n \qquad (3.8)$$

The least-squares function is

$$S(\beta_0, \beta_1, \ldots, \beta_k) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij} \right)^2 \qquad (3.9)$$

The function $S$ must be minimized with respect to $\beta_0, \beta_1, \ldots, \beta_k$. The least-squares estimators of $\beta_0, \beta_1, \ldots, \beta_k$ must satisfy

$$\left. \frac{\partial S}{\partial \beta_0} \right|_{\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k} = -2 \sum_{i=1}^{n} \left( y_i - \hat\beta_0 - \sum_{j=1}^{k} \hat\beta_j x_{ij} \right) = 0 \qquad (3.10a)$$

and

$$\left. \frac{\partial S}{\partial \beta_j} \right|_{\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k} = -2 \sum_{i=1}^{n} \left( y_i - \hat\beta_0 - \sum_{j=1}^{k} \hat\beta_j x_{ij} \right) x_{ij} = 0, \qquad j = 1, 2, \ldots, k \qquad (3.10b)$$

Simplifying Eq. (3.10), we obtain the least-squares normal equations

$$\begin{aligned}
n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_{i1} + \hat\beta_2 \sum_{i=1}^{n} x_{i2} + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{ik} &= \sum_{i=1}^{n} y_i \\
\hat\beta_0 \sum_{i=1}^{n} x_{i1} + \hat\beta_1 \sum_{i=1}^{n} x_{i1}^2 + \hat\beta_2 \sum_{i=1}^{n} x_{i1}x_{i2} + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{i1}x_{ik} &= \sum_{i=1}^{n} x_{i1}y_i \\
&\ \ \vdots \\
\hat\beta_0 \sum_{i=1}^{n} x_{ik} + \hat\beta_1 \sum_{i=1}^{n} x_{ik}x_{i1} + \hat\beta_2 \sum_{i=1}^{n} x_{ik}x_{i2} + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{ik}^2 &= \sum_{i=1}^{n} x_{ik}y_i
\end{aligned} \qquad (3.11)$$

Note that there are $p = k + 1$ normal equations, one for each of the unknown

regression coefficients. The solution to the normal equations will be the least-squares estimators $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$.

It is more convenient to deal with multiple regression models if they are expressed in matrix notation. This allows a very compact display of the model, data, and results. In matrix notation, the model given by Eq. (3.8) is

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

where

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad
\mathbf{X} = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}, \qquad
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \qquad
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$

In general, $\mathbf{y}$ is an $n \times 1$ vector of the observations, $\mathbf{X}$ is an $n \times p$ matrix of the levels of the regressor variables, $\boldsymbol{\beta}$ is a $p \times 1$ vector of the regression coefficients, and $\boldsymbol{\varepsilon}$ is an $n \times 1$ vector of random errors.

We wish to find the vector of least-squares estimators, $\hat{\boldsymbol{\beta}}$, that minimizes

$$S(\boldsymbol{\beta}) = \sum_{i=1}^{n} \varepsilon_i^2 = \boldsymbol{\varepsilon}'\boldsymbol{\varepsilon} = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$$

Note that $S(\boldsymbol{\beta})$ may be expressed as

$$S(\boldsymbol{\beta}) = \mathbf{y}'\mathbf{y} - \boldsymbol{\beta}'\mathbf{X}'\mathbf{y} - \mathbf{y}'\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \mathbf{y}'\mathbf{y} - 2\boldsymbol{\beta}'\mathbf{X}'\mathbf{y} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}$$

since $\boldsymbol{\beta}'\mathbf{X}'\mathbf{y}$ is a $1 \times 1$ matrix, or a scalar, and its transpose $(\boldsymbol{\beta}'\mathbf{X}'\mathbf{y})' = \mathbf{y}'\mathbf{X}\boldsymbol{\beta}$ is the same scalar. The least-squares estimators must satisfy

$$\left. \frac{\partial S}{\partial \boldsymbol{\beta}} \right|_{\hat{\boldsymbol{\beta}}} = -2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{0}$$

which simplifies to

$$\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y} \qquad (3.12)$$

Equations (3.12) are the least-squares normal equations. They are the matrix analogue of the scalar presentation in (3.11).

To solve the normal equations, multiply both sides of (3.12) by the inverse of $\mathbf{X}'\mathbf{X}$. Thus, the least-squares estimator of $\boldsymbol{\beta}$ is

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} \qquad (3.13)$$

provided that the inverse matrix $(\mathbf{X}'\mathbf{X})^{-1}$ exists. The $(\mathbf{X}'\mathbf{X})^{-1}$ matrix will always exist if the regressors are linearly independent, that is, if no column of the $\mathbf{X}$ matrix is a linear combination of the other columns.

It is easy to see that the matrix form of the normal equations (3.12) is identical to the scalar form (3.11). Writing out (3.12) in detail, we obtain

$$\begin{bmatrix}
n & \sum_{i=1}^{n} x_{i1} & \sum_{i=1}^{n} x_{i2} & \cdots & \sum_{i=1}^{n} x_{ik} \\
\sum_{i=1}^{n} x_{i1} & \sum_{i=1}^{n} x_{i1}^2 & \sum_{i=1}^{n} x_{i1}x_{i2} & \cdots & \sum_{i=1}^{n} x_{i1}x_{ik} \\
\vdots & \vdots & \vdots & & \vdots \\
\sum_{i=1}^{n} x_{ik} & \sum_{i=1}^{n} x_{ik}x_{i1} & \sum_{i=1}^{n} x_{ik}x_{i2} & \cdots & \sum_{i=1}^{n} x_{ik}^2
\end{bmatrix}
\begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_k \end{bmatrix}
=
\begin{bmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_{i1}y_i \\ \vdots \\ \sum_{i=1}^{n} x_{ik}y_i \end{bmatrix}$$

If the indicated matrix multiplication is performed, the scalar form of the normal equations (3.11) is obtained. In this display we see that $\mathbf{X}'\mathbf{X}$ is a $p \times p$ symmetric matrix and $\mathbf{X}'\mathbf{y}$ is a $p \times 1$ column vector. Note the special structure of the $\mathbf{X}'\mathbf{X}$ matrix. The diagonal elements of $\mathbf{X}'\mathbf{X}$ are the sums of squares of the elements in the columns of $\mathbf{X}$, and the off-diagonal elements are the sums of cross products of the elements in the columns of $\mathbf{X}$. Furthermore, note that the elements of $\mathbf{X}'\mathbf{y}$ are the sums of cross products of the columns of $\mathbf{X}$ and the observations $y_i$.

The fitted regression model corresponding to the levels of the regressor variables $\mathbf{x}' = [1, x_1, x_2, \ldots, x_k]$ is

$$\hat{y} = \mathbf{x}'\hat{\boldsymbol{\beta}} = \hat\beta_0 + \sum_{j=1}^{k} \hat\beta_j x_j$$

The vector of fitted values $\hat{y}_i$ corresponding to the observed values $y_i$ is

$$\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \mathbf{H}\mathbf{y} \qquad (3.14)$$

The $n \times n$ matrix $\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ is usually called the hat matrix. It maps the vector of observed values into a vector of fitted values. The hat matrix and its properties play a central role in regression analysis.

The difference between the observed value $y_i$ and the corresponding fitted value $\hat{y}_i$ is the residual $e_i = y_i - \hat{y}_i$. The $n$ residuals may be conveniently written in matrix notation as

$$\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} \qquad (3.15a)$$


There are several other ways to express the vector of residuals $\mathbf{e}$ that will prove useful, including

$$\mathbf{e} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{y} - \mathbf{H}\mathbf{y} = (\mathbf{I} - \mathbf{H})\mathbf{y} \qquad (3.15b)$$
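As a concrete illustration of Eqs. (3.13) through (3.15b), the following minimal sketch (not from the text; it borrows the first six rows of the delivery time data in Table 3.2, Example 3.1 below) computes $\hat{\boldsymbol{\beta}}$, the hat matrix, the fitted values, and the residuals with NumPy, and checks two standard properties of $\mathbf{H}$: it is symmetric and idempotent, and the residuals are orthogonal to the columns of $\mathbf{X}$.

```python
import numpy as np

# First six observations of Table 3.2: intercept, cases, distance.
X = np.array([[1, 7, 560],
              [1, 3, 220],
              [1, 3, 340],
              [1, 4,  80],
              [1, 6, 150],
              [1, 7, 330]], dtype=float)
y = np.array([16.68, 11.50, 12.03, 14.88, 13.75, 18.11])

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y          # Eq. (3.13)
H = X @ XtX_inv @ X.T                 # hat matrix, Eq. (3.14)
y_hat = H @ y                         # fitted values
e = (np.eye(len(y)) - H) @ y          # residuals, Eq. (3.15b)

# H is symmetric and idempotent; the residuals are orthogonal to X.
assert np.allclose(H, H.T) and np.allclose(H @ H, H)
assert np.allclose(X.T @ e, 0)
```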


Example 3.1 The Delivery Time Data

A soft drink bottler is analyzing the vending machine service routes in his distribution system. He is interested in predicting the amount of time required by the route driver to service the vending machines in an outlet. This service activity includes stocking the machine with beverage products and minor maintenance or housekeeping. The industrial engineer responsible for the study has suggested that the two most important variables affecting the delivery time ($y$) are the number of cases of product stocked ($x_1$) and the distance walked by the route driver ($x_2$). The engineer has collected 25 observations on delivery time, which are shown in Table 3.2. (Note that this is an expansion of the data set used in Example 2.9.) We will fit the multiple linear regression model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$

to the delivery time data in Table 3.2.

TABLE 3.2 Delivery Time Data for Example 3.1

Observation   Delivery Time,   Number of     Distance,
Number        y (min)          Cases, x_1    x_2 (ft)
 1            16.68             7              560
 2            11.50             3              220
 3            12.03             3              340
 4            14.88             4               80
 5            13.75             6              150
 6            18.11             7              330
 7             8.00             2              110
 8            17.83             7              210
 9            79.24            30             1460
10            21.50             5              605
11            40.33            16              688
12            21.00            10              215
13            13.50             4              255
14            19.75             6              462
15            24.00             9              448
16            29.00            10              776
17            15.35             6              200
18            19.00             7              132
19             9.50             3               36
20            35.10            17              770
21            17.90            10              140
22            52.32            26              810
23            18.75             9              450
24            19.83             8              635
25            10.75             4              150

Figure 3.4 Scatterplot matrix for the delivery time data from Example 3.1.

Figure 3.5 Three-dimensional scatterplot of the delivery time data from Example 3.1.

Graphics can be very useful in fitting multiple regression models. Figure 3.4 is a scatterplot matrix of the delivery time data. This is just a two-dimensional array of two-dimensional plots, where (except for the diagonal) each frame contains a scatter diagram. Thus, each plot is an attempt to shed light on the relationship between a pair of variables. This is often a better summary of the relationships than a numerical summary (such as displaying the correlation coefficients between each pair of variables) because it gives a sense of linearity or nonlinearity of the relationship and some awareness of how the individual data points are arranged over the region.



When there are only two regressors, sometimes a three-dimensional scatter diagram is useful in visualizing the relationship between the response and the regressors. Figure 3.5 presents this plot for the delivery time data. By spinning these plots, some software packages permit different views of the point cloud. This view provides an indication that a multiple linear regression model may provide a reasonable fit to the data.

To fit the multiple regression model we first form the $\mathbf{X}$ matrix and $\mathbf{y}$ vector:

$$\mathbf{X} = \begin{bmatrix}
1 & 7 & 560 \\
1 & 3 & 220 \\
1 & 3 & 340 \\
\vdots & \vdots & \vdots \\
1 & 4 & 150
\end{bmatrix}, \qquad
\mathbf{y} = \begin{bmatrix} 16.68 \\ 11.50 \\ 12.03 \\ \vdots \\ 10.75 \end{bmatrix}$$

with one row per observation in Table 3.2. The $\mathbf{X}'\mathbf{X}$ matrix is

$$\mathbf{X}'\mathbf{X} = \begin{bmatrix}
25 & 219 & 10{,}232 \\
219 & 3{,}055 & 133{,}899 \\
10{,}232 & 133{,}899 & 6{,}725{,}688
\end{bmatrix}$$

and the $\mathbf{X}'\mathbf{y}$ vector is

$$\mathbf{X}'\mathbf{y} = \begin{bmatrix} 559.60 \\ 7{,}375.44 \\ 337{,}072.00 \end{bmatrix}$$

The least-squares estimator of $\boldsymbol{\beta}$ is

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

or

$$\hat{\boldsymbol{\beta}} = \begin{bmatrix}
25 & 219 & 10{,}232 \\
219 & 3{,}055 & 133{,}899 \\
10{,}232 & 133{,}899 & 6{,}725{,}688
\end{bmatrix}^{-1}
\begin{bmatrix} 559.60 \\ 7{,}375.44 \\ 337{,}072.00 \end{bmatrix}$$

$$= \begin{bmatrix}
0.11321518 & -0.00444859 & -0.00008367 \\
-0.00444859 & 0.00274378 & -0.00004786 \\
-0.00008367 & -0.00004786 & 0.00000123
\end{bmatrix}
\begin{bmatrix} 559.60 \\ 7{,}375.44 \\ 337{,}072.00 \end{bmatrix}
= \begin{bmatrix} 2.34123115 \\ 1.61590721 \\ 0.01438483 \end{bmatrix}$$

TABLE 3.3 Observations, Fitted Values, and Residuals for Example 3.1

Observation    y_i       yhat_i     e_i
 1            16.68     21.7081    -5.0281
 2            11.50     10.3536     1.1464
 3            12.03     12.0798    -0.0498
 4            14.88      9.9556     4.9244
 5            13.75     14.1944    -0.4444
 6            18.11     18.3996    -0.2896
 7             8.00      7.1554     0.8446
 8            17.83     16.6734     1.1566
 9            79.24     71.8203     7.4197
10            21.50     19.1236     2.3764
11            40.33     38.0925     2.2375
12            21.00     21.5930    -0.5930
13            13.50     12.4730     1.0270
14            19.75     18.6825     1.0675
15            24.00     23.3288     0.6712
16            29.00     29.6629    -0.6629
17            15.35     14.9136     0.4364
18            19.00     15.5514     3.4486
19             9.50      7.7068     1.7932
20            35.10     40.8880    -5.7880
21            17.90     20.5142    -2.6142
22            52.32     56.0065    -3.6865
23            18.75     23.3576    -4.6076
24            19.83     24.4028    -4.5728
25            10.75     10.9626    -0.2126

TABLE 3.4 MINITAB Output for Soft Drink Time Data

Regression Analysis: Time versus Cases, Distance

The regression equation is
Time = 2.34 + 1.62 Cases + 0.0144 Distance

Predictor    Coef       SE Coef    T       P
Constant     2.341      1.097      2.13    0.044
Cases        1.6159     0.1707     9.46    0.000
Distance     0.014385   0.003613   3.98    0.001

S = 3.25947    R-Sq = 96.0%    R-Sq(adj) = 95.6%

Analysis of Variance

Source          DF    SS        MS       F        P
Regression       2    5550.8    2775.4   261.24   0.000
Residual Error  22     233.7      10.6
Total           24    5784.5

Source     DF    Seq SS
Cases       1    5382.4
Distance    1     168.4

The least-squares fit (with the regression coefficients reported to five decimals) is

$$\hat{y} = 2.34123 + 1.61591x_1 + 0.01438x_2$$
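For illustration (this worked prediction is not in the text; the outlet values are hypothetical), a route requiring $x_1 = 8$ cases and $x_2 = 275$ ft of walking would have predicted delivery time $\hat{y} = 2.34123 + 1.61591(8) + 0.01438(275) \approx 19.22$ minutes.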

Table 3.3 shows the observations $y_i$ along with the corresponding fitted values $\hat{y}_i$ and the residuals $e_i$ from this model.

Computer Output

Table 3.4 presents a portion of the MINITAB output for the soft drink delivery time data in Example 3.1. While the output format differs from one computer program to another, this display contains the information typically generated. Most of the output in Table 3.4 is a straightforward extension to the multiple regression case of the computer output for simple linear regression. In the next few sections we will provide explanations of this output information.

3.2.2 A Geometrical Interpretation of Least Squares

An intuitive geometrical interpretation of least squares is sometimes helpful. We may think of the vector of observations $\mathbf{y}' = [y_1, y_2, \ldots, y_n]$ as defining a vector from the origin to the point $A$ in Figure 3.6. Note that $y_1, y_2, \ldots, y_n$ form the coordinates of an $n$-dimensional sample space. The sample space in Figure 3.6 is three-dimensional.

The $\mathbf{X}$ matrix consists of $p$ ($n \times 1$) column vectors, for example, $\mathbf{1}$ (a column vector of 1's), $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k$. Each of these columns defines a vector from the origin in the sample space. These $p$ vectors form a $p$-dimensional subspace called the


estimation space. The estimation space for $p = 2$ is shown in Figure 3.6. We may represent any point in this subspace by a linear combination of the vectors $\mathbf{1}, \mathbf{x}_1, \ldots, \mathbf{x}_k$. Thus, any point in the estimation space is of the form $\mathbf{X}\boldsymbol{\beta}$. Let the vector $\mathbf{X}\boldsymbol{\beta}$ determine the point $B$ in Figure 3.6. The squared distance from $B$ to $A$ is just

$$S(\boldsymbol{\beta}) = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$$

Therefore, minimizing the squared distance of point $A$ defined by the observation vector $\mathbf{y}$ to the estimation space requires finding the point in the estimation space that is closest to $A$. The squared distance will be a minimum when the point in the estimation space is the foot of the line from $A$ normal (or perpendicular) to the estimation space. This is point $C$ in Figure 3.6. This point is defined by the vector $\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}}$. Therefore, since $\mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}$ is perpendicular to the estimation space, we may write

$$\mathbf{X}'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{0} \qquad \text{or} \qquad \mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$$

which we recognize as the least-squares normal equations.

Figure 3.6 A geometrical interpretation of least squares.

3.2.3 Properties of the Least-Squares Estimators

The statistical properties of the least-squares estimator $\hat{\boldsymbol{\beta}}$ may be easily demonstrated. Consider first bias:

$$E(\hat{\boldsymbol{\beta}}) = E\left[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\right] = E\left[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon})\right] = E\left[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}\right] = \boldsymbol{\beta}$$

since $E(\boldsymbol{\varepsilon}) = \mathbf{0}$ and $(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X} = \mathbf{I}$. Thus, $\hat{\boldsymbol{\beta}}$ is an unbiased estimator of $\boldsymbol{\beta}$.


The variance property of $\hat{\boldsymbol{\beta}}$ is expressed by the covariance matrix

$$\mathrm{Cov}(\hat{\boldsymbol{\beta}}) = E\left\{ \left[\hat{\boldsymbol{\beta}} - E(\hat{\boldsymbol{\beta}})\right]\left[\hat{\boldsymbol{\beta}} - E(\hat{\boldsymbol{\beta}})\right]' \right\}$$

which is a $p \times p$ symmetric matrix whose $j$th diagonal element is the variance of $\hat\beta_j$ and whose $(i, j)$th off-diagonal element is the covariance between $\hat\beta_i$ and $\hat\beta_j$. The covariance matrix of $\hat{\boldsymbol{\beta}}$ is found by applying a variance operator to $\hat{\boldsymbol{\beta}}$:

$$\mathrm{Cov}(\hat{\boldsymbol{\beta}}) = \mathrm{Var}(\hat{\boldsymbol{\beta}}) = \mathrm{Var}\left[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\right]$$

Now $(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ is a matrix of constants, and the variance of $\mathbf{y}$ is $\sigma^2\mathbf{I}$, so

$$\mathrm{Var}(\hat{\boldsymbol{\beta}}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,\mathrm{Var}(\mathbf{y})\left[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right]' = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$$

Therefore, if we let $\mathbf{C} = (\mathbf{X}'\mathbf{X})^{-1}$, the variance of $\hat\beta_j$ is $\sigma^2 C_{jj}$ and the covariance between $\hat\beta_i$ and $\hat\beta_j$ is $\sigma^2 C_{ij}$.

Appendix C.4 establishes that the least-squares estimator $\hat{\boldsymbol{\beta}}$ is the best linear unbiased estimator of $\boldsymbol{\beta}$ (the Gauss-Markov theorem). If we further assume that the errors $\varepsilon_i$ are normally distributed, then as we will see in Section 3.2.6, $\hat{\boldsymbol{\beta}}$ is also the maximum-likelihood estimator of $\boldsymbol{\beta}$. The maximum-likelihood estimator is the minimum variance unbiased estimator of $\boldsymbol{\beta}$.
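Both the unbiasedness and the covariance result are easy to check by simulation. The sketch below (not from the text; the design matrix, coefficients, and error variance are made-up values) draws repeated samples from a known model and compares the empirical mean and covariance of $\hat{\boldsymbol{\beta}}$ with $\boldsymbol{\beta}$ and $\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(42)

# Fixed design with two regressors (illustrative values).
n = 50
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
beta = np.array([2.0, 1.5, -0.8])   # assumed true coefficients
sigma = 2.0                          # assumed true error std deviation

# Repeatedly simulate y = X beta + eps and re-estimate beta.
estimates = []
for _ in range(5000):
    y = X @ beta + rng.normal(0.0, sigma, n)
    estimates.append(np.linalg.solve(X.T @ X, X.T @ y))
estimates = np.array(estimates)

print(estimates.mean(axis=0))              # approx beta (unbiasedness)
print(np.cov(estimates.T))                 # approx sigma^2 (X'X)^-1
print(sigma**2 * np.linalg.inv(X.T @ X))   # theoretical covariance
```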

3.2.4 Estimation of σ²

As in simple linear regression, we may develop an estimator of $\sigma^2$ from the residual sum of squares

$$SS_{\mathrm{Res}} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2 = \mathbf{e}'\mathbf{e}$$

Substituting $\mathbf{e} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}$, we have

$$SS_{\mathrm{Res}} = (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{y}'\mathbf{y} - 2\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} + \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}}$$

Since $\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$, this last equation becomes

$$SS_{\mathrm{Res}} = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} \qquad (3.16)$$

Appendix C.3 shows that the residual sum of squares has $n - p$ degrees of freedom associated with it since $p$ parameters are estimated in the regression model. The residual mean square is

$$MS_{\mathrm{Res}} = \frac{SS_{\mathrm{Res}}}{n - p} \qquad (3.17)$$


Appendix C.3 also shows that the expected value of $MS_{\mathrm{Res}}$ is $\sigma^2$, so an unbiased estimator of $\sigma^2$ is given by

$$\hat{\sigma}^2 = MS_{\mathrm{Res}} \qquad (3.18)$$

As noted in the simple linear regression case, this estimator of $\sigma^2$ is model dependent.

Example 3.2 The Delivery Time Data

We will estimate the error variance $\sigma^2$ for the multiple regression model fit to the soft drink delivery time data in Example 3.1. Since

$$\mathbf{y}'\mathbf{y} = \sum_{i=1}^{25} y_i^2 = 18{,}310.6290$$

and

$$\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} = \begin{bmatrix} 2.34123115 & 1.61590721 & 0.01438483 \end{bmatrix} \begin{bmatrix} 559.60 \\ 7{,}375.44 \\ 337{,}072.00 \end{bmatrix} = 18{,}076.9030$$

the residual sum of squares is

$$SS_{\mathrm{Res}} = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} = 18{,}310.6290 - 18{,}076.9030 = 233.7260$$

Therefore, the estimate of $\sigma^2$ is the residual mean square

$$\hat{\sigma}^2 = \frac{SS_{\mathrm{Res}}}{n - p} = \frac{233.7260}{25 - 3} = 10.6239$$

The MINITAB output in Table 3.4 reports the residual mean square as 10.6.

The model-dependent nature of this estimate $\hat{\sigma}^2$ may be easily demonstrated. Figure 2.13 displays the computer output from a least-squares fit to the delivery time data using only one regressor, cases ($x_1$). The residual mean square for this model is 17.5, which is considerably larger than the result obtained above for the two-regressor model. Which estimate is "correct"? Both estimates are in a sense correct, but they depend heavily on the choice of model. Perhaps a better question is which model is correct? Since $\sigma^2$ is the variance of the errors (the unexplained noise about the regression line), we would usually prefer a model with a small residual mean square to a model with a large one.

fture 2.13 displays the computer output from a least-squares fit to the deliverybe data using only one regressor, cases (xr). The residual mean square for thisDdel is 17.5, which is considerably larger than the result obtained above for thebregressor model. Which estimate is "correct"? Both estimates are in a senseGrect, but they depend heavily on the choice of model. Perhaps a better questionI shich model is correct? Since o2 is the variance of the errors (the unexplainedfise about the regression line), we would usually prefer a model with a smallrrilual mean square to a model with a large one.

12.5 Inadequacy of Scatter Diagrams in Multiple Regression

Je saw in Chapter 2 that the scatter diagram is an important tool in analyzing thedationship between y and r in simple linear regression. We also saw in Examplet,l that a matrix of scatterplots was useful in visualizing the relationship between

Figure 3.7 A matrix of scatterplots.

y and two regressors. It is tempting to conclude that this is a general concept; that is, examining scatter diagrams of $y$ versus $x_1$, $y$ versus $x_2, \ldots, y$ versus $x_k$ is always useful in assessing the relationships between $y$ and each of the regressors $x_1, x_2, \ldots, x_k$. Unfortunately, this is not true in general.

Following Daniel and Wood [1980], we illustrate the inadequacy of scatter diagrams for a problem with two regressors. Consider the data shown in Figure 3.7, namely

x_1:   2    3    4    1    5    6    7    8
x_2:   1    2    5    2    6    4    3    4
y:    10   17   48   27   55   26    9   16

These data were generated from the equation

$$y = 8 - 5x_1 + 12x_2$$

The matrix of scatterplots is shown in Figure 3.7. The $y$-versus-$x_1$ plot does not exhibit any apparent relationship between the two variables. The $y$-versus-$x_2$ plot indicates that a linear relationship exists, with a slope of approximately 8. Note that both scatter diagrams convey erroneous information. Since in this data set there are two pairs of points that have the same $x_2$ value ($x_2 = 2$ and $x_2 = 4$), we could measure the $x_1$ effect at fixed $x_2$ from both pairs. This gives $\hat\beta_1 = (17 - 27)/(3 - 1) = -5$ for $x_2 = 2$ and $\hat\beta_1 = (26 - 16)/(6 - 8) = -5$ for $x_2 = 4$, the correct results. Knowing $\hat\beta_1$, we could now estimate the $x_2$ effect. This procedure is not generally useful, however, because many data sets do not have duplicate points.
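A quick numerical check (a sketch, not from the text) makes the point: simple regressions of $y$ on $x_1$ alone and on $x_2$ alone give misleading slopes, while the two-regressor fit recovers the generating coefficients exactly.

```python
import numpy as np

x1 = np.array([2.0, 3.0, 4.0, 1.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([1.0, 2.0, 5.0, 2.0, 6.0, 4.0, 3.0, 4.0])
y = 8 - 5 * x1 + 12 * x2   # noise-free data from the text

ones = np.ones_like(x1)
# Misleading one-regressor fits:
print(np.linalg.lstsq(np.column_stack([ones, x1]), y, rcond=None)[0])  # slope ~ -0.57
print(np.linalg.lstsq(np.column_stack([ones, x2]), y, rcond=None)[0])  # slope ~ 8
# The joint fit recovers (8, -5, 12) exactly:
print(np.linalg.lstsq(np.column_stack([ones, x1, x2]), y, rcond=None)[0])
```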

This example illustrates that constructing scatter diagrams of $y$ versus $x_j$ ($j = 1, 2, \ldots, k$) can be misleading, even in the case of only two regressors operating in a perfectly additive fashion with no noise. A more realistic regression situation with several regressors and error in the $y$'s would confuse the situation even further. If there is only one (or a few) dominant regressor, or if the regressors operate


nearly independently, the matrix of scatterplots is most useful. However, if several important regressors are themselves interrelated, then these scatter diagrams can be very misleading. Analytical methods for sorting out the relationships between several regressors and a response are discussed in Chapter 9.

3.2.6 Maximum-Likelihood Estimation

As in the simple linear regression case, we can show that the maximum-likelihood estimators for the model parameters in multiple linear regression when the model errors are normally and independently distributed are also least-squares estimators. The model is

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

and the errors are normally and independently distributed with constant variance $\sigma^2$, or $\boldsymbol{\varepsilon}$ is distributed as $N(\mathbf{0}, \sigma^2\mathbf{I})$. The normal density function for the errors is

$$f(\varepsilon_i) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2\sigma^2}\varepsilon_i^2\right)$$

The likelihood function is the joint density of $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$, or $\prod_{i=1}^{n} f(\varepsilon_i)$. Therefore, the likelihood function is

$$L(\boldsymbol{\varepsilon}, \boldsymbol{\beta}, \sigma^2) = \prod_{i=1}^{n} f(\varepsilon_i) = \frac{1}{(2\pi)^{n/2}\sigma^n} \exp\left(-\frac{1}{2\sigma^2}\boldsymbol{\varepsilon}'\boldsymbol{\varepsilon}\right)$$

Now since we can write $\boldsymbol{\varepsilon} = \mathbf{y} - \mathbf{X}\boldsymbol{\beta}$, the likelihood function becomes

$$L(\mathbf{y}, \mathbf{X}, \boldsymbol{\beta}, \sigma^2) = \frac{1}{(2\pi)^{n/2}\sigma^n} \exp\left(-\frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\right)$$

As in the simple linear regression case, it is convenient to work with the log of the likelihood,

$$\ln L(\mathbf{y}, \mathbf{X}, \boldsymbol{\beta}, \sigma^2) = -\frac{n}{2}\ln(2\pi) - n\ln(\sigma) - \frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$$

It is clear that for a fixed value of $\sigma$ the log-likelihood is maximized when the term

$$(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$$

is minimized. Therefore, the maximum-likelihood estimator of $\boldsymbol{\beta}$ under normal errors is equivalent to the least-squares estimator $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$. The maximum-likelihood estimator of $\sigma^2$ is

$$\hat{\sigma}^2 = \frac{(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})}{n}$$


These are multiple linear regression generalizations of the results given for simple linear regression in Section 2.10. The statistical properties of the maximum-likelihood estimators are summarized in Section 2.10.
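Note the different divisors: the maximum-likelihood estimator of $\sigma^2$ divides $SS_{\mathrm{Res}}$ by $n$, while the unbiased estimator of Eq. (3.18) divides by $n - p$. A small sketch of the contrast (not from the text; the simulated data values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 25, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 30, n), rng.uniform(0, 1500, n)])
y = X @ np.array([2.3, 1.6, 0.014]) + rng.normal(0, 3.0, n)  # assumed model

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
ss_res = (y - X @ beta_hat) @ (y - X @ beta_hat)

sigma2_mle = ss_res / n             # maximum-likelihood estimator (biased)
sigma2_unbiased = ss_res / (n - p)  # MS_Res, Eq. (3.18)
print(sigma2_mle, sigma2_unbiased)
```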

3.3 HYPOTHESIS TESTING IN MULTIPLE LINEAR REGRESSION

Once we have estimated the parameters in the model, we face two immediate questions:

1. What is the overall adequacy of the model?
2. Which specific regressors seem important?

Several hypothesis testing procedures prove useful for addressing these questions. The formal tests require that our random errors be independent and follow a normal distribution with mean $E(\varepsilon_i) = 0$ and variance $\mathrm{Var}(\varepsilon_i) = \sigma^2$.

3.3.1 Test for Significance of Regression

The test for significance of regression is a test to determine if there is a linear relationship between the response $y$ and any of the regressor variables $x_1, x_2, \ldots, x_k$. This procedure is often thought of as an overall or global test of model adequacy. The appropriate hypotheses are

$$H_0\colon \beta_1 = \beta_2 = \cdots = \beta_k = 0 \qquad H_1\colon \beta_j \neq 0 \text{ for at least one } j$$

Rejection of this null hypothesis implies that at least one of the regressors $x_1, x_2, \ldots, x_k$ contributes significantly to the model.

The test procedure is a generalization of the analysis of variance used in simple linear regression. The total sum of squares $SS_T$ is partitioned into a sum of squares due to regression, $SS_R$, and a residual sum of squares, $SS_{\mathrm{Res}}$. Thus,

$$SS_T = SS_R + SS_{\mathrm{Res}}$$

Appendix C.3 shows that if the null hypothesis is true, then $SS_R/\sigma^2$ follows a $\chi^2_k$ distribution, which has the same number of degrees of freedom as the number of regressor variables in the model. Appendix C.3 also shows that $SS_{\mathrm{Res}}/\sigma^2 \sim \chi^2_{n-k-1}$ and that $SS_{\mathrm{Res}}$ and $SS_R$ are independent. By the definition of an $F$ statistic given in Appendix C.1,

$$F_0 = \frac{SS_R/k}{SS_{\mathrm{Res}}/(n - k - 1)} = \frac{MS_R}{MS_{\mathrm{Res}}}$$

follows the $F_{k,\,n-k-1}$ distribution. Appendix C.3 shows that

$$E(MS_{\mathrm{Res}}) = \sigma^2 \qquad \text{and} \qquad E(MS_R) = \sigma^2 + \frac{\boldsymbol{\beta}^{*\prime}\,\mathbf{X}_c'\mathbf{X}_c\,\boldsymbol{\beta}^*}{k}$$

where $\boldsymbol{\beta}^* = (\beta_1, \beta_2, \ldots, \beta_k)'$ and $\mathbf{X}_c$ is the centered model matrix.
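To make the test concrete, a sketch (not from the text) computes $F_0$ and its $P$ value from the ANOVA quantities reported in Table 3.4 for the delivery time data. Since $F_0 \approx 261$ far exceeds any reasonable critical value of $F_{2,22}$, we reject $H_0$ and conclude that delivery time depends on at least one of cases or distance.

```python
from scipy import stats

# ANOVA quantities from Table 3.4 (delivery time data): k = 2, n = 25.
ss_r, ss_res, k, n = 5550.8, 233.7, 2, 25

ms_r = ss_r / k
ms_res = ss_res / (n - k - 1)
f0 = ms_r / ms_res                      # test statistic, about 261
p_value = stats.f.sf(f0, k, n - k - 1)  # upper-tail probability
print(f0, p_value)
```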


Top Related