
Sociology 740 John Fox

Lecture Notes

3. Linear Least-Squares Regression

Copyright © 2014 by John Fox


1. Goals:

◮ To review/introduce the calculation and interpretation of the least-squares regression coefficients in simple and multiple regression.

◮ To review/introduce the calculation and interpretation of the regression standard error and the simple and multiple correlation coefficients.

◮ To introduce and criticize the use of standardized regression coefficients.

◮ Time and interest permitting: To introduce matrix arithmetic and least-squares regression in matrix form.

2. Introduction

◮ Despite its limitations, linear least squares lies at the very heart of applied statistics:

• Some data are adequately summarized by linear least-squares regression.

• The effective application of linear regression is expanded by data transformations and diagnostics.

• The general linear model — an extension of least-squares linear regression — is able to accommodate a very broad class of specifications.

• Linear least-squares provides a computational basis for a variety of generalizations (such as generalized linear models).

◮ This lecture describes the mechanics and descriptive interpretation of linear least-squares regression.

3. Simple Regression

3.1 Least-Squares Fit

◮ Figure 1 shows Davis's data on the measured and reported weight in kilograms of 101 women who were engaged in regular exercise.

• The relationship between measured and reported weight appears to be linear, so it is reasonable to fit a line to the plot.

◮ Denoting measured weight by $Y$ and reported weight by $X$, a line relating the two variables has the equation $\hat{Y} = A + BX$.

• No line can pass perfectly through all of the data points. A residual, $E$, reflects this fact.

• The regression equation for the $i$th of the $n = 101$ observations is
$$Y_i = A + BX_i + E_i = \hat{Y}_i + E_i$$

[Scatterplot omitted: Measured Weight (kg) versus Reported Weight (kg).]

Figure 1. A least-squares line fit to Davis's data on reported and measured weight. (The broken line is the line $Y = X$.) Some points are over-plotted.

• The residual
$$E_i = Y_i - \hat{Y}_i = Y_i - (A + BX_i)$$
is the signed vertical distance between the point and the line, as shown in Figure 2.

◮ A line that fits the data well makes the residuals small.

• Simply requiring that the sum of residuals, $\sum_{i=1}^{n} E_i$, be small is futile, since large negative residuals can offset large positive ones.

• Indeed, any line through the point of means $(\bar{X}, \bar{Y})$ has $\sum E_i = 0$.

◮ Two possibilities immediately present themselves:

• Find $A$ and $B$ to minimize the absolute residuals, $\sum |E_i|$, which leads to least-absolute-values (LAV) regression.

• Find $A$ and $B$ to minimize the squared residuals, $\sum E_i^2$, which leads to least-squares (LS) regression. (A numerical contrast of the two criteria appears in the sketch below.)
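To make the contrast between the two criteria concrete, here is a minimal Python sketch (an illustration added to these notes, using invented data, not Davis's): the least-squares fit has a closed form, while the LAV fit is found by direct numerical minimization via scipy.

```python
import numpy as np
from scipy.optimize import minimize

# Invented illustrative data; the last point is an outlier.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 20.0])

# Least squares: closed-form coefficients.
B_ls = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
A_ls = y.mean() - B_ls * x.mean()

# Least absolute values: minimize sum |E_i| numerically.
def sum_abs_resid(coef):
    A, B = coef
    return np.sum(np.abs(y - (A + B * x)))

res = minimize(sum_abs_resid, x0=[A_ls, B_ls], method="Nelder-Mead")
A_lav, B_lav = res.x

print(f"LS : A = {A_ls:.2f}, B = {B_ls:.2f}")    # pulled toward the outlier
print(f"LAV: A = {A_lav:.2f}, B = {B_lav:.2f}")  # more resistant to it
```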

[Diagram omitted: the line $\hat{Y} = A + BX$ in the $(X, Y)$ plane, showing the point $(X_i, Y_i)$, its fitted value $\hat{Y}_i$, and the residual $E_i$.]

Figure 2. The residual $E_i$ is the signed vertical distance between the point and the line.

◮ In least-squares regression, we seek the values of $A$ and $B$ that minimize:
$$S(A, B) = \sum_{i=1}^{n} E_i^2 = \sum (Y_i - A - BX_i)^2$$

• For those with calculus:

– The most direct approach is to take the partial derivatives of the sum-of-squares function with respect to the coefficients:
$$\frac{\partial S(A, B)}{\partial A} = \sum (-1)(2)(Y_i - A - BX_i)$$
$$\frac{\partial S(A, B)}{\partial B} = \sum (-X_i)(2)(Y_i - A - BX_i)$$

• Setting these partial derivatives to zero yields simultaneous linear equations for $A$ and $B$, the normal equations for simple regression:
$$An + B\sum X_i = \sum Y_i$$
$$A\sum X_i + B\sum X_i^2 = \sum X_i Y_i$$

• Solving the normal equations produces the least-squares coefficients:
$$A = \bar{Y} - B\bar{X}$$
$$B = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$

– The formula for $A$ implies that the least-squares line passes through the point-of-means of the two variables. The least-squares residuals therefore sum to zero.

– The second normal equation implies that $\sum X_i E_i = 0$; similarly, $\sum \hat{Y}_i E_i = 0$. These properties imply that the residuals are uncorrelated with both the $X$'s and the $\hat{Y}$'s.

◮ For Davis's data on measured weight ($Y$) and reported weight ($X$):
$$n = 101$$
$$\bar{Y} = \frac{5780}{101} = 57.23 \qquad \bar{X} = \frac{5731}{101} = 56.74$$
$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = 4435 \qquad \sum (X_i - \bar{X})^2 = 4539$$
$$B = \frac{4435}{4539} = 0.9771$$
$$A = 57.23 - 0.9771 \times 56.74 = 1.789$$

• The least-squares regression equation is (the sketch below reproduces this arithmetic)
$$\widehat{\text{Measured Weight}} = 1.79 + 0.977 \times \text{Reported Weight}$$
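As a check on the arithmetic, the following short Python sketch (not part of the original notes) computes $B$ and $A$ from the summary statistics printed above:

```python
# Summary statistics for Davis's data, as given above.
n = 101
sum_y, sum_x = 5780, 5731   # totals of measured and reported weight
Sxy = 4435                  # sum of (X - Xbar)(Y - Ybar)
Sxx = 4539                  # sum of (X - Xbar)^2

ybar, xbar = sum_y / n, sum_x / n
B = Sxy / Sxx               # slope: approximately 0.977
A = ybar - B * xbar         # intercept: approximately 1.79
print(f"A = {A:.3f}, B = {B:.4f}")
```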

◮ Interpretation of the least-squares coefficients:

• $B = 0.977$: A one-kilogram increase in reported weight is associated on average with just under a one-kilogram increase in measured weight.

– Since the data are not longitudinal, the phrase "a unit increase" here implies not a literal change over time, but rather a static comparison between two individuals who differ by one kilogram in their reported weights.

• Ordinarily, we may interpret the intercept $A$ as the fitted value associated with $X = 0$, but it is impossible for an individual to have a reported weight equal to zero.

– The intercept is usually of little direct interest, since the fitted value above $X = 0$ is rarely important.

– Here, however, if individuals' reports are unbiased predictions of their actual weights, then we should have $\hat{Y} = X$ — i.e., $A = 0$ and $B = 1$. The intercept $A = 1.79$ is close to zero, and the slope $B = 0.977$ is close to one.

3.2 Simple Correlation

◮ It is of interest to determine how closely the line fits the scatter of points.

◮ The standard deviation of the residuals, $S_E$, called the standard error of the regression, provides one index of fit.

• Because of estimation considerations, the variance of the residuals is defined using $n - 2$ degrees of freedom:
$$S_E^2 = \frac{\sum E_i^2}{n - 2}$$

• The standard error is therefore
$$S_E = \sqrt{\frac{\sum E_i^2}{n - 2}}$$

• Since it is measured in the units of the response variable, the standard error represents a type of 'average' residual.

• For Davis's regression of measured on reported weight, the sum of squared residuals is $\sum E_i^2 = 418.9$, and the standard error is
$$S_E = \sqrt{\frac{418.9}{101 - 2}} = 2.05 \text{ kg}$$

• I believe that social scientists overemphasize correlation and pay insufficient attention to the standard error of the regression.
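The standard-error computation is a one-liner; this added Python sketch reproduces it from the quantities given above:

```python
import math

RSS = 418.9  # sum of squared residuals for Davis's regression (from the text)
n = 101
S_E = math.sqrt(RSS / (n - 2))
print(f"{S_E:.3f} kg")  # prints 2.057 kg, the value reported above as 2.05 kg
```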

◮ The correlation coefficient provides a relative measure of fit: To what degree do our predictions of $Y$ improve when we base that prediction on the linear relationship between $Y$ and $X$?

• A relative index of fit requires a baseline — how well can $Y$ be predicted if $X$ is disregarded?

– To disregard the explanatory variable is implicitly to fit the equation
$$Y_i = A' + E_i'$$

– We can find the best-fitting constant $A'$ by least-squares, minimizing
$$S(A') = \sum E_i'^2 = \sum (Y_i - A')^2$$

– The value of $A'$ that minimizes this sum of squares is the response-variable mean, $\bar{Y}$.

• The residuals $E_i = Y_i - \hat{Y}_i$ from the linear regression of $Y$ on $X$ will generally be smaller than the residuals $E_i' = Y_i - \bar{Y}$, and it is necessarily the case that
$$\sum (Y_i - \hat{Y}_i)^2 \le \sum (Y_i - \bar{Y})^2$$

– This inequality holds because the 'null model,' $Y_i = A' + E_i'$, is a special case of the more general linear-regression 'model,' $Y_i = A + BX_i + E_i$, setting $B = 0$.

• We call
$$\sum E_i'^2 = \sum (Y_i - \bar{Y})^2$$
the total sum of squares for $Y$, abbreviated TSS, while
$$\sum E_i^2 = \sum (Y_i - \hat{Y}_i)^2$$
is called the residual sum of squares, and is abbreviated RSS.

• The difference between the two, termed the regression sum of squares,
$$\text{RegSS} \equiv \text{TSS} - \text{RSS}$$
gives the reduction in squared error due to the linear regression.

• The ratio of RegSS to TSS, the proportional reduction in squared error, defines the square of the correlation coefficient:
$$r^2 \equiv \frac{\text{RegSS}}{\text{TSS}}$$

• To find the correlation coefficient $r$ we take the positive square root of $r^2$ when the simple-regression slope $B$ is positive, and the negative square root when $B$ is negative.

• If there is a perfect positive linear relationship between $Y$ and $X$, then $r = 1$.

• A perfect negative linear relationship corresponds to $r = -1$.

• If there is no linear relationship between $Y$ and $X$, then RSS = TSS, RegSS = 0, and $r = 0$.

• Between these extremes, $r$ gives the direction of the linear relationship between the two variables, and $r^2$ may be interpreted as the proportion of the total variation of $Y$ that is 'captured' by its linear regression on $X$.

• Figure 3 depicts several different levels of correlation.

◮ The decomposition of total variation into 'explained' and 'unexplained' components, paralleling the decomposition of each observation into a fitted value and a residual, is typical of linear models.

• The decomposition is called the analysis of variance for the regression (verified numerically in the sketch following Figure 3):
$$\text{TSS} = \text{RegSS} + \text{RSS}$$

• The regression sum of squares can also be directly calculated as
$$\text{RegSS} = \sum (\hat{Y}_i - \bar{Y})^2$$

◮ It is also possible to define $r$ by analogy with the correlation between two random variables.

[Six scatterplot panels omitted: (a) $r = 0$; (b) $r = 0$; (c) $r = 0.2$; (d) $r = 0.5$; (e) $r = 0.8$; (f) $r = 1$.]

Figure 3. Scatterplots showing different correlation coefficients $r$. Panel (b) reminds us that $r$ measures linear relationship.
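The sum-of-squares decomposition and the sign rule for $r$ are easy to verify numerically; this Python sketch (added here, with invented data) does so:

```python
import numpy as np

# Invented illustrative data.
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 6.0])

B = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
A = y.mean() - B * x.mean()
yhat = A + B * x

TSS = np.sum((y - y.mean())**2)
RSS = np.sum((y - yhat)**2)
RegSS = np.sum((yhat - y.mean())**2)

print(np.isclose(TSS, RegSS + RSS))    # True: TSS = RegSS + RSS
r = np.sign(B) * np.sqrt(RegSS / TSS)  # r takes the sign of the slope B
print(r)
```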

• First defining the sample covariance between $X$ and $Y$,
$$S_{XY} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

• we may then write
$$r = \frac{S_{XY}}{S_X S_Y} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$$
where $S_X$ and $S_Y$ are, respectively, the sample standard deviations of $X$ and $Y$.

◮ Some comparisons between $B$ and $r$:

• The correlation coefficient $r$ is symmetric in $X$ and $Y$, while the least-squares slope $B$ is not.

• The slope coefficient $B$ is measured in the units of the response variable per unit of the explanatory variable. For example, if dollars of income are regressed on years of education, then the units of $B$ are dollars/year. The correlation coefficient $r$, however, is unitless.

• A change in scale of $X$ or $Y$ produces a compensating change in $B$, but does not affect $r$. If, for example, income is measured in thousands of dollars rather than in dollars, the units of the slope become $1000s/year, and the value of the slope decreases by a factor of 1000, but $r$ remains the same.

◮ For Davis's regression of measured on reported weight,
$$\text{TSS} = 4753.8 \qquad \text{RSS} = 418.87 \qquad \text{RegSS} = 4334.9$$
$$r^2 = \frac{4334.9}{4753.8} = .91188$$

• Since $B$ is positive, $r = +\sqrt{.91188} = .9549$.

• The linear regression of measured on reported weight captures 91 percent of the variation in measured weight.

• Equivalently,
$$S_{XY} = \frac{4435.9}{101 - 1} = 44.359$$
$$S_X^2 = \frac{4539.3}{101 - 1} = 45.393$$
$$S_Y^2 = \frac{4753.8}{101 - 1} = 47.538$$
$$r = \frac{44.359}{\sqrt{45.393 \times 47.538}} = .9549$$

4. Multiple Regression

4.1 Two Explanatory Variables

◮ The linear multiple-regression equation
$$\hat{Y} = A + B_1 X_1 + B_2 X_2$$
for two explanatory variables, $X_1$ and $X_2$, describes a plane in the three-dimensional $\{X_1, X_2, Y\}$ space, as shown in Figure 4.

• The residual is the signed vertical distance from the point to the plane:
$$E_i = Y_i - \hat{Y}_i = Y_i - (A + B_1 X_{i1} + B_2 X_{i2})$$

• To make the plane come as close as possible to the points in the aggregate, we want the values of $A$, $B_1$, and $B_2$ that minimize the sum of squared residuals:
$$S(A, B_1, B_2) = \sum E_i^2 = \sum (Y_i - A - B_1 X_{i1} - B_2 X_{i2})^2$$

[Diagram omitted: the plane $\hat{Y} = A + B_1 X_1 + B_2 X_2$ in $\{X_1, X_2, Y\}$ space, with intercept $A$, slopes $B_1$ and $B_2$, the observed point $(X_{i1}, X_{i2}, Y_i)$, the fitted point $(X_{i1}, X_{i2}, \hat{Y}_i)$, and the residual $E_i$.]

Figure 4. The multiple-regression plane.

• Differentiating the sum-of-squares function with respect to the regression coefficients, setting the partial derivatives to zero, and rearranging terms produces the normal equations,
$$An + B_1 \sum X_{i1} + B_2 \sum X_{i2} = \sum Y_i$$
$$A \sum X_{i1} + B_1 \sum X_{i1}^2 + B_2 \sum X_{i1} X_{i2} = \sum X_{i1} Y_i$$
$$A \sum X_{i2} + B_1 \sum X_{i1} X_{i2} + B_2 \sum X_{i2}^2 = \sum X_{i2} Y_i$$

• This is a system of three linear equations in three unknowns, so it usually provides a unique solution for the least-squares regression coefficients $A$, $B_1$, and $B_2$.

– Dropping the subscript $i$ for observations, and using asterisks to denote variables in mean-deviation form (e.g., $Y^* = Y - \bar{Y}$),
$$A = \bar{Y} - B_1 \bar{X}_1 - B_2 \bar{X}_2$$
$$B_1 = \frac{\sum X_1^* Y^* \sum X_2^{*2} - \sum X_2^* Y^* \sum X_1^* X_2^*}{\sum X_1^{*2} \sum X_2^{*2} - \left(\sum X_1^* X_2^*\right)^2}$$
$$B_2 = \frac{\sum X_2^* Y^* \sum X_1^{*2} - \sum X_1^* Y^* \sum X_1^* X_2^*}{\sum X_1^{*2} \sum X_2^{*2} - \left(\sum X_1^* X_2^*\right)^2}$$

– The least-squares coefficients are uniquely defined as long as
$$\sum X_1^{*2} \sum X_2^{*2} \ne \left(\sum X_1^* X_2^*\right)^2$$
that is, unless $X_1$ and $X_2$ are perfectly correlated or unless one of the explanatory variables is invariant.

– If $X_1$ and $X_2$ are perfectly correlated, then they are said to be collinear.

◮ An illustration, using Duncan's occupational prestige data and regressing the prestige of occupations ($Y$) on their educational and income levels ($X_1$ and $X_2$, respectively):
$$n = 45$$
$$\bar{Y} = \frac{2146}{45} = 47.69 \qquad \bar{X}_1 = \frac{2365}{45} = 52.56 \qquad \bar{X}_2 = \frac{1884}{45} = 41.87$$
$$\sum X_1^{*2} = 38{,}971 \qquad \sum X_2^{*2} = 26{,}271 \qquad \sum X_1^* X_2^* = 23{,}182$$
$$\sum X_1^* Y^* = 35{,}152 \qquad \sum X_2^* Y^* = 28{,}383$$

• Substituting these results into the equations for the least-squares coefficients produces $A = -6.070$, $B_1 = 0.5459$, and $B_2 = 0.5987$. (The sketch below reproduces this calculation.)

• The fitted least-squares regression equation is
$$\widehat{\text{Prestige}} = -6.07 + 0.546 \times \text{Education} + 0.599 \times \text{Income}$$
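The following Python sketch (added here for illustration) plugs the mean-deviation sums above into the two-predictor formulas:

```python
# Mean-deviation sums for Duncan's data, as given above.
n = 45
ybar, x1bar, x2bar = 2146 / 45, 2365 / 45, 1884 / 45
S11, S22, S12 = 38_971, 26_271, 23_182  # sums of squares/products of X1*, X2*
S1y, S2y = 35_152, 28_383               # cross-products with Y*

denom = S11 * S22 - S12**2
B1 = (S1y * S22 - S2y * S12) / denom    # approximately 0.5459
B2 = (S2y * S11 - S1y * S12) / denom    # approximately 0.5987
A = ybar - B1 * x1bar - B2 * x2bar      # approximately -6.07
print(f"A = {A:.3f}, B1 = {B1:.4f}, B2 = {B2:.4f}")
```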

◮ A central difference in interpretation between simple and multiple regression: The slope coefficients for the explanatory variables in the multiple regression are partial coefficients, while the slope coefficient in simple regression gives the marginal relationship between the response variable and a single explanatory variable.

• That is, each slope in multiple regression represents the 'effect' on the response variable of a one-unit increment in the corresponding explanatory variable, holding constant the value of the other explanatory variable.

• The simple-regression slope effectively ignores the other explanatory variable.

• This interpretation of the multiple-regression slope is apparent in the figure showing the multiple-regression plane. Because the regression plane is flat, its slope in the direction of $X_1$, holding $X_2$ constant, does not depend upon the specific value at which $X_2$ is fixed.

• Algebraically, fix $X_2$ to the specific value $x_2$ and see how $\hat{Y}$ changes as $X_1$ is increased by 1, from some specific value $x_1$ to $x_1 + 1$:
$$[A + B_1(x_1 + 1) + B_2 x_2] - (A + B_1 x_1 + B_2 x_2) = B_1$$

• A similar result holds for $B_2$.

◮ For Duncan's regression:

• A unit increase in education, holding income constant, is associated on average with an increase of 0.55 units in prestige.

• A unit increase in income, holding education constant, is associated on average with an increase of 0.60 units in prestige.

• The regression intercept, $A = -6.1$, has the following literal interpretation: The fitted value of prestige is $-6.1$ for a hypothetical occupation with education and income levels both equal to zero. No occupations have levels of zero for both income and education, however, and the response variable cannot take on negative values.

4.2 Several Explanatory Variables

◮ For the general case of $k$ explanatory variables, the multiple-regression equation is
$$Y_i = A + B_1 X_{i1} + B_2 X_{i2} + \cdots + B_k X_{ik} + E_i = \hat{Y}_i + E_i$$

• It is not possible to visualize the point cloud of the data directly when $k > 2$, but it is simple to find the values of $A$ and the $B$'s that minimize
$$S(A, B_1, B_2, \ldots, B_k) = \sum_{i=1}^{n} \left[ Y_i - (A + B_1 X_{i1} + B_2 X_{i2} + \cdots + B_k X_{ik}) \right]^2$$

• Minimization of the sum-of-squares function produces the normal equations for general multiple regression:
$$An + B_1 \sum X_1 + B_2 \sum X_2 + \cdots + B_k \sum X_k = \sum Y$$
$$A \sum X_1 + B_1 \sum X_1^2 + B_2 \sum X_1 X_2 + \cdots + B_k \sum X_1 X_k = \sum X_1 Y$$
$$A \sum X_2 + B_1 \sum X_1 X_2 + B_2 \sum X_2^2 + \cdots + B_k \sum X_2 X_k = \sum X_2 Y$$
$$\vdots$$
$$A \sum X_k + B_1 \sum X_1 X_k + B_2 \sum X_2 X_k + \cdots + B_k \sum X_k^2 = \sum X_k Y$$

• Because the normal equations are linear, and because there are as many equations as unknown regression coefficients ($k + 1$), there is usually a unique solution for the coefficients $A, B_1, B_2, \ldots, B_k$.

• Only when one explanatory variable is a perfect linear function of others, or when one or more explanatory variables are invariant, will the normal equations not have a unique solution.

• Dividing the first normal equation through by $n$ reveals that the least-squares surface passes through the point of means $(\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k, \bar{Y})$.

◮ To illustrate the solution of the normal equations, let us return to the Canadian occupational prestige data, regressing the prestige of the occupations on average education, average income, and the percent of women in each occupation.

• The various sums, sums of squares, and sums of products that are required are given in the following table (sums of squares on the diagonal, sums of cross-products below it, and column sums in the last row):

Variable          Prestige     Education          Income   Percent Women
Prestige          253,618.
Education          55,326.       12,513.
Income         37,748,108.    8,121,410.  6,534,383,460.
Percent Women     131,909.       32,281.     14,093,097.        187,312.
Sum                  4777.         1095.        693,386.          2956.

• Substituting these values into the normal equations and solving for the regression coefficients produces (see the sketch below)
$$A = -6.7943 \qquad B_1 = 4.1866 \qquad B_2 = 0.0013136 \qquad B_3 = -0.0089052$$
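Assuming, as the entries suggest, that the table gives raw sums of squares and cross-products, the normal equations can be assembled and solved directly in Python with $n = 102$ occupations (a sketch added to these notes):

```python
import numpy as np

n = 102  # occupations in the Canadian prestige data
# Raw sums and sums of squares/products from the table
# (1 = education, 2 = income, 3 = percent women; y = prestige).
sy, s1, s2, s3 = 4777.0, 1095.0, 693_386.0, 2956.0
s11, s22, s33 = 12_513.0, 6_534_383_460.0, 187_312.0
s12, s13, s23 = 8_121_410.0, 32_281.0, 14_093_097.0
s1y, s2y, s3y = 55_326.0, 37_748_108.0, 131_909.0

XtX = np.array([[n,  s1,  s2,  s3],
                [s1, s11, s12, s13],
                [s2, s12, s22, s23],
                [s3, s13, s23, s33]])
Xty = np.array([sy, s1y, s2y, s3y])

A, B1, B2, B3 = np.linalg.solve(XtX, Xty)
print(A, B1, B2, B3)  # should be near -6.794, 4.187, 0.001314, -0.008905
```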

• The fitted regression equation is
$$\widehat{\text{Prestige}} = -6.794 + 4.187 \times \text{Education} + 0.001314 \times \text{Income} - 0.008905 \times \text{Percent Women}$$

• In interpreting the regression coefficients, we need to keep in mind the units of each variable:

• Prestige scores are arbitrarily scaled, and range from a minimum of 14.8 to a maximum of 87.2 for these 102 occupations; the hinge-spread of prestige is 24.4 points.

• Education is measured in years, and hence the impact of education on prestige is considerable — a little more than four points, on average, for each year of education, holding income and gender composition constant.

• Despite the small absolute size of its coefficient, the partial effect of income is also substantial — about 0.001 points on average for an additional dollar of income, or one point for each $1,000.

• The impact of gender composition, holding education and income constant, is very small — an average decline of about 0.01 points for each one-percent increase in the percentage of women in an occupation.

4.3 Multiple Correlation

◮ As in simple regression, the standard error in multiple regression measures the 'average' size of the residuals.

• As before, we divide by degrees of freedom, here $n - (k + 1) = n - k - 1$, to calculate the variance of the residuals; thus, the standard error is
$$S_E = \sqrt{\frac{\sum E_i^2}{n - k - 1}}$$

• Heuristically, we 'lose' $k + 1$ degrees of freedom by calculating the $k + 1$ regression coefficients, $A, B_1, \ldots, B_k$.

• For Duncan's regression of occupational prestige on the income and educational levels of occupations, the standard error is
$$S_E = \sqrt{\frac{7506.7}{45 - 2 - 1}} = 13.37$$

– The response variable here is the percentage of raters classifying the occupation as good or excellent in prestige; an average error of 13 is substantial.

◮ The sums of squares in multiple regression are defined as in simple regression:
$$\text{TSS} = \sum (Y_i - \bar{Y})^2$$
$$\text{RegSS} = \sum (\hat{Y}_i - \bar{Y})^2$$
$$\text{RSS} = \sum (Y_i - \hat{Y}_i)^2 = \sum E_i^2$$

• The fitted values $\hat{Y}_i$ and residuals $E_i$ now come from the multiple-regression equation.

• We also have a similar decomposition of variation:
$$\text{TSS} = \text{RegSS} + \text{RSS}$$

• The least-squares residuals are uncorrelated with the fitted values and with each of the $X$'s.

◮ The squared multiple correlation $R^2$ represents the proportion of variation in the response variable captured by the regression:
$$R^2 \equiv \frac{\text{RegSS}}{\text{TSS}}$$

• The multiple correlation coefficient $R$ is the positive square root of $R^2$, and is interpretable as the simple correlation between the fitted and observed $Y$-values.

◮ For Duncan's regression,
$$\text{TSS} = 43{,}688 \qquad \text{RegSS} = 36{,}181 \qquad \text{RSS} = 7506.7$$

• The squared multiple correlation is
$$R^2 = \frac{36{,}181}{43{,}688} = .8282$$
indicating that more than 80 percent of the variation in prestige among the 45 occupations is accounted for by its linear regression on the income and educational levels of the occupations.

4.4 Standardized Regression Coefficients

◮ Social researchers often wish to compare the coefficients of different explanatory variables in a regression analysis.

• When the explanatory variables are commensurable, comparison is straightforward.

• Standardized regression coefficients permit a limited assessment of the relative effects of incommensurable explanatory variables.

◮ Imagine that the annual dollar income of wage workers is regressed on their years of education, years of labor-force experience, and some other explanatory variables, producing the fitted regression equation
$$\widehat{\text{Income}} = A + B_1 \times \text{Education} + B_2 \times \text{Experience} + \cdots$$

• Since education and experience are measured in years, the coefficients $B_1$ and $B_2$ are both expressed in dollars/year, and can be directly compared.

◮ More commonly, explanatory variables are measured in different units.

• In the Canadian occupational prestige regression, for example, the coefficient for education is expressed in points (of prestige) per year; the coefficient for income is expressed in points per dollar; and the coefficient of gender composition in points per percent-women.

– The income coefficient (0.001314) is much smaller than the education coefficient (4.187) not because income is a much less important determinant of prestige, but because the unit of income (the dollar) is small, while the unit of education (the year) is relatively large.

– If we were to re-express income in $1000s, then we would multiply the income coefficient by 1000.

◮ Standardized regression coefficients rescale the $B$'s according to a measure of explanatory-variable spread.

• We may, for example, multiply each regression coefficient by the hinge-spread of the corresponding explanatory variable. For the Canadian prestige data:
$$\text{Education:} \quad 4.28 \times 4.187 = 17.92$$
$$\text{Income:} \quad 4131 \times 0.001314 = 5.4281$$
$$\text{Gender:} \quad 48.68 \times (-0.008905) = -0.4335$$

• For other data, where the variation in education and income may be different, the relative impact of the two variables may also differ, even if the regression coefficients are unchanged.

• The following observation should give you pause: If two explanatory variables are commensurable, and if their hinge-spreads differ, then performing this calculation is, in effect, to adopt a rubber ruler.

◮ It is much more common to standardize regression coefficients using the standard deviations of the explanatory variables rather than their hinge-spreads.

• The usual practice standardizes the response variable as well, but this is inessential:
$$Y_i = A + B_1 X_{i1} + \cdots + B_k X_{ik} + E_i$$
$$\bar{Y} = A + B_1 \bar{X}_1 + \cdots + B_k \bar{X}_k$$
$$Y_i - \bar{Y} = B_1 (X_{i1} - \bar{X}_1) + \cdots + B_k (X_{ik} - \bar{X}_k) + E_i$$
$$\frac{Y_i - \bar{Y}}{S_Y} = \left( B_1 \frac{S_1}{S_Y} \right) \frac{X_{i1} - \bar{X}_1}{S_1} + \cdots + \left( B_k \frac{S_k}{S_Y} \right) \frac{X_{ik} - \bar{X}_k}{S_k} + \frac{E_i}{S_Y}$$
$$Z_{iY} = B_1^* Z_{i1} + \cdots + B_k^* Z_{ik} + E_i^*$$

– $Z_{iY} = (Y_i - \bar{Y})/S_Y$ is the standardized response variable, linearly transformed to a mean of zero and a standard deviation of one.

– $Z_{i1}, \ldots, Z_{ik}$ are the explanatory variables, similarly standardized.

– $E_i^* = E_i/S_Y$ is the transformed residual which, note, does not have a standard deviation of one.

– $B_j^* = B_j (S_j/S_Y)$ is the standardized partial regression coefficient for the $j$th explanatory variable.

– The standardized coefficient is interpretable as the average change in $Y$, in standard-deviation units, for a one standard-deviation increase in $X_j$, holding constant the other explanatory variables.

◮ For the Canadian prestige regression (reproduced in the sketch below),
$$\text{Education:} \quad 4.187 \times 2.728 / 17.20 = 0.6639$$
$$\text{Income:} \quad 0.001314 \times 4246 / 17.20 = 0.3242$$
$$\text{Gender:} \quad -0.008905 \times 31.72 / 17.20 = -0.01642$$

• Because both income and gender composition have substantially non-normal distributions, however, the use of standard deviations here is difficult to justify.

◮ A common misuse of standardized coefficients is to employ them to make comparisons of the effects of the same explanatory variable in two or more samples drawn from populations with different spreads.
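A minimal Python sketch of the standardization (added here; the coefficients and standard deviations are the values printed above):

```python
# Unstandardized coefficients and standard deviations from the text.
B = {"education": 4.187, "income": 0.001314, "gender": -0.008905}
S = {"education": 2.728, "income": 4246.0, "gender": 31.72}
S_y = 17.20  # standard deviation of prestige

# B*_j = B_j * (S_j / S_y)
B_star = {name: b * S[name] / S_y for name, b in B.items()}
print(B_star)  # roughly {'education': 0.664, 'income': 0.324, 'gender': -0.0164}
```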

5. Least-Squares Regression Using Matrices (time permitting)

5.1 Basic Definitions

◮ A matrix is a rectangular table of numbers or of numerical variables; for example
$$\mathbf{X}_{(4 \times 3)} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 0 & 0 & 10 \end{bmatrix}$$
or, more generally,
$$\mathbf{A}_{(m \times n)} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$

• The first matrix is of order 4 by 3; the second of order $m$ by $n$.

◮ A one-column matrix is called a column vector,
$$\mathbf{a}_{(n \times 1)} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}$$

◮ Likewise, a matrix with just one row is called a row vector,
$$\mathbf{b}' = [b_1 \; b_2 \; \cdots \; b_n]$$

◮ The transpose of a matrix reverses rows and columns:
$$\mathbf{X}_{(4 \times 3)} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 0 & 0 & 10 \end{bmatrix} \qquad \mathbf{X}'_{(3 \times 4)} = \begin{bmatrix} 1 & 4 & 7 & 0 \\ 2 & 5 & 8 & 0 \\ 3 & 6 & 9 & 10 \end{bmatrix}$$

◮ A square matrix has equal numbers of rows and columns:
$$\mathbf{B}_{(3 \times 3)} = \begin{bmatrix} 5 & 1 & 3 \\ 2 & 2 & 6 \\ 7 & 3 & 4 \end{bmatrix}$$

• The diagonal of a (generic) square matrix $\mathbf{A}$ of order $n$ consists of the entries $a_{11}, a_{22}, \ldots, a_{nn}$.

• For example, the diagonal of $\mathbf{B}$ consists of the entries 5, 2, and 4.

◮ A symmetric square matrix is equal to its transpose; thus $\mathbf{B}$ is not symmetric, but the following matrix is:
$$\mathbf{C} = \begin{bmatrix} 5 & 1 & 3 \\ 1 & 2 & 6 \\ 3 & 6 & 4 \end{bmatrix}$$

5.2 Matrix Arithmetic

◮ To be added or subtracted, matrices must be of the same order; matrices are added, subtracted, and negated element-wise; for
$$\mathbf{A}_{(2 \times 3)} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \quad \text{and} \quad \mathbf{B}_{(2 \times 3)} = \begin{bmatrix} -5 & 1 & 2 \\ 3 & 0 & -4 \end{bmatrix}$$
we have
$$\mathbf{C}_{(2 \times 3)} = \mathbf{A} + \mathbf{B} = \begin{bmatrix} -4 & 3 & 5 \\ 7 & 5 & 2 \end{bmatrix}$$
$$\mathbf{D}_{(2 \times 3)} = \mathbf{A} - \mathbf{B} = \begin{bmatrix} 6 & 1 & 1 \\ 1 & 5 & 10 \end{bmatrix}$$
$$\mathbf{E}_{(2 \times 3)} = -\mathbf{A} = \begin{bmatrix} -1 & -2 & -3 \\ -4 & -5 & -6 \end{bmatrix}$$

◮ The inner product of two vectors (each with $n$ entries), say $\mathbf{a}'_{(1 \times n)}$ and $\mathbf{b}_{(n \times 1)}$, denoted $\mathbf{a}' \cdot \mathbf{b}$, is a number formed by multiplying corresponding entries of the vectors and summing the resulting products:
$$\mathbf{a}' \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i$$

• For example,
$$[2 \; 0 \; 1 \; 3] \cdot \begin{bmatrix} -1 \\ 6 \\ 0 \\ 9 \end{bmatrix} = 2(-1) + 0(6) + 1(0) + 3(9) = 25$$

◮ Two matrices $\mathbf{A}$ and $\mathbf{B}$ are conformable for multiplication in the order given (i.e., $\mathbf{AB}$) if the number of columns of the left-hand factor ($\mathbf{A}$) is equal to the number of rows of the right-hand factor ($\mathbf{B}$).

• Thus $\mathbf{A}$ and $\mathbf{B}$ are conformable for multiplication if $\mathbf{A}$ is of order $(m \times n)$ and $\mathbf{B}$ is of order $(n \times p)$.
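Returning to the inner-product example above, numpy computes it directly (an added check, not part of the notes):

```python
import numpy as np

a = np.array([2, 0, 1, 3])
b = np.array([-1, 6, 0, 9])
print(a @ b)  # 2(-1) + 0(6) + 1(0) + 3(9) = 25
```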

• For example,
$$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}_{(2 \times 3)} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}_{(3 \times 3)}$$
are conformable for multiplication, but
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}_{(3 \times 3)} \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}_{(2 \times 3)}$$
are not.

◮ Let $\mathbf{C} = \mathbf{AB}$ be the matrix product; and let $\mathbf{a}_i'$ be the $i$th row of $\mathbf{A}$ and $\mathbf{b}_j$ be the $j$th column of $\mathbf{B}$.

• Then $\mathbf{C}$ is a matrix of order $(m \times p)$ in which
$$c_{ij} = \mathbf{a}_i' \cdot \mathbf{b}_j = \sum_{k=1}^{n} a_{ik} b_{kj}$$

• Some examples (verified in the sketch below):
$$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}_{(2 \times 3)} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}_{(3 \times 3)} = \begin{bmatrix} 1(1) + 2(0) + 3(0) & 1(0) + 2(1) + 3(0) & 1(0) + 2(0) + 3(1) \\ 4(1) + 5(0) + 6(0) & 4(0) + 5(1) + 6(0) & 4(0) + 5(0) + 6(1) \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$$
$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 0 & 3 \\ 2 & 1 \end{bmatrix} = \begin{bmatrix} 4 & 5 \\ 8 & 13 \end{bmatrix}$$
$$\begin{bmatrix} 0 & 3 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 9 & 12 \\ 5 & 8 \end{bmatrix}$$
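These products are easy to check in Python with numpy (a verification added here, not part of the notes):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 3], [2, 1]])

print(A @ B)  # [[ 4  5]
              #  [ 8 13]]
print(B @ A)  # [[ 9 12]
              #  [ 5  8]]  -- AB != BA: matrix multiplication is not commutative
```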

◮ A matrix that has ones down the main diagonal and zeroes elsewhere is called an identity matrix, and is symbolized by $\mathbf{I}$; the identity matrix of order 3 is symbolized by $\mathbf{I}_3$.

• As in the previous example, multiplying a matrix by an identity matrix recovers the original matrix.

◮ In scalar algebra, division is essential to the solution of simple equations.

• For example,
$$6x = 12$$
$$x = \frac{12}{6} = 2$$
or, equivalently,
$$\frac{1}{6} \times 6x = \frac{1}{6} \times 12$$
$$x = 2$$
where $\frac{1}{6} = 6^{-1}$ is the scalar inverse of 6.

◮ In matrix algebra, there is no direct analog of division, but most square matrices have a matrix inverse.

• The inverse of a square matrix $\mathbf{A}$ is a square matrix of the same order, written $\mathbf{A}^{-1}$, with the property that $\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$.

• If a square matrix has an inverse, then the matrix is termed nonsingular; a square matrix without an inverse is termed singular.

• For example, the inverse of the nonsingular matrix
$$\begin{bmatrix} 2 & 5 \\ 1 & 3 \end{bmatrix}$$
is the matrix
$$\begin{bmatrix} 3 & -5 \\ -1 & 2 \end{bmatrix}$$
as we can verify:
$$\begin{bmatrix} 2 & 5 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 3 & -5 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \checkmark$$
$$\begin{bmatrix} 3 & -5 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} 2 & 5 \\ 1 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \checkmark$$

◮ In scalar algebra, only the number 0 has no inverse, but there are singular nonzero matrices.

• Let us hypothesize that $\mathbf{B}$ is the inverse of the matrix
$$\mathbf{A} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$$

• But
$$\mathbf{AB} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} b_{11} & b_{12} \\ 0 & 0 \end{bmatrix} \ne \mathbf{I}_2$$

◮ Matrix inverses can be used to solve systems of linear simultaneous equations.

• For example:
$$2x_1 + 5x_2 = 4$$
$$x_1 + 3x_2 = 5$$

• Writing these equations as a matrix equation,
$$\begin{bmatrix} 2 & 5 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \end{bmatrix}$$
$$\mathbf{A}_{(2 \times 2)} \, \mathbf{x}_{(2 \times 1)} = \mathbf{b}_{(2 \times 1)}$$

• Then
$$\mathbf{A}^{-1}\mathbf{A}\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$$
$$\mathbf{I}\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$$
$$\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$$

• For the example,
$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 & 5 \\ 1 & 3 \end{bmatrix}^{-1} \begin{bmatrix} 4 \\ 5 \end{bmatrix} = \begin{bmatrix} 3 & -5 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} 4 \\ 5 \end{bmatrix} = \begin{bmatrix} -13 \\ 6 \end{bmatrix}$$
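In Python, the same solution is obtained with numpy (an added illustration):

```python
import numpy as np

A = np.array([[2.0, 5.0], [1.0, 3.0]])
b = np.array([4.0, 5.0])

print(np.linalg.inv(A) @ b)   # [-13.   6.] -- via the explicit inverse
print(np.linalg.solve(A, b))  # [-13.   6.] -- preferred: solves without inverting
```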

5.3 Least-Squares Regression

◮ In scalar (i.e., single-number) form, multiple linear regression is written as
$$Y_i = A + B_1 X_{i1} + B_2 X_{i2} + \cdots + B_k X_{ik} + E_i$$

• There are $n$ such equations, one for each observation $i = 1, 2, \ldots, n$.

◮ We can write these equations as a single matrix equation:
$$\mathbf{y}_{(n \times 1)} = \mathbf{X}_{(n \times (k+1))} \, \mathbf{b}_{((k+1) \times 1)} + \mathbf{e}_{(n \times 1)}$$

• $\mathbf{y} = (Y_1, Y_2, \ldots, Y_n)'$ is a vector of observations on the response variable.

• The matrix
$$\mathbf{X} = \begin{bmatrix} 1 & X_{11} & \cdots & X_{1k} \\ 1 & X_{21} & \cdots & X_{2k} \\ \vdots & \vdots & & \vdots \\ 1 & X_{n1} & \cdots & X_{nk} \end{bmatrix}$$
called the model (or design) matrix, contains the values of the explanatory variables, with an initial column of ones for the regression constant (called the constant regressor).

• $\mathbf{b} = (A, B_1, \ldots, B_k)'$ contains the regression coefficients.

• $\mathbf{e} = (E_1, E_2, \ldots, E_n)'$ is a vector of residuals.

◮ The residual sum of squares is $S(\mathbf{b}) = \mathbf{e}'\mathbf{e}$.

◮ Minimizing the residual sum of squares leads to the least-squares normal equations in matrix form:
$$\underset{((k+1) \times (k+1))}{\mathbf{X}'\mathbf{X}} \; \underset{((k+1) \times 1)}{\mathbf{b}} = \underset{((k+1) \times 1)}{\mathbf{X}'\mathbf{y}}$$

• This is a system of $k + 1$ linear equations in the $k + 1$ unknown regression coefficients $\mathbf{b}$.

• The coefficient matrix for the system of equations,
$$\mathbf{X}'\mathbf{X} = \begin{bmatrix} n & \sum X_1 & \sum X_2 & \cdots & \sum X_k \\ \sum X_1 & \sum X_1^2 & \sum X_1 X_2 & \cdots & \sum X_1 X_k \\ \sum X_2 & \sum X_1 X_2 & \sum X_2^2 & \cdots & \sum X_2 X_k \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sum X_k & \sum X_1 X_k & \sum X_2 X_k & \cdots & \sum X_k^2 \end{bmatrix}$$
contains sums of squares and cross-products among the columns of the model matrix.

• The right-hand-side vector,
$$\mathbf{X}'\mathbf{y} = \left[ \sum Y, \; \sum X_1 Y, \; \sum X_2 Y, \; \ldots, \; \sum X_k Y \right]'$$
contains sums of cross-products between each column of the model matrix and the vector of responses.

• The sums of squares and products $\mathbf{X}'\mathbf{X}$ and $\mathbf{X}'\mathbf{y}$ can be calculated directly from the data.

◮ $\mathbf{X}'\mathbf{X}$ is nonsingular if no explanatory variable is a perfect linear function of the others.

• Then the solution of the normal equations is
$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$
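A compact Python rendering of this formula (a sketch added here, with invented data); in practice one would prefer numpy's least-squares routine to forming the inverse explicitly:

```python
import numpy as np

# Invented illustrative data: n = 5 observations, k = 2 explanatory variables.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y  = np.array([3.1, 3.9, 7.2, 7.8, 10.9])

# Model matrix with the constant regressor as the first column.
X = np.column_stack([np.ones_like(X1), X1, X2])

b = np.linalg.inv(X.T @ X) @ (X.T @ y)  # b = (X'X)^{-1} X'y
print(b)                                # (A, B1, B2)

# Equivalent and numerically preferable:
b_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(b, b_ls))             # True
```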

6. Summary

◮ In simple linear regression, the least-squares coefficients are given by
$$B = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} \qquad A = \bar{Y} - B\bar{X}$$

◮ The least-squares coefficients in multiple linear regression are found by solving the normal equations for the intercept $A$ and the slope coefficients $B_1, B_2, \ldots, B_k$.

◮ The least-squares residuals, $E_i$, are uncorrelated with the fitted values, $\hat{Y}_i$, and with the explanatory variables, $X_1, \ldots, X_k$.

◮ The linear regression decomposes the variation in $Y$ into 'explained' and 'unexplained' components: TSS = RegSS + RSS.

◮ The standard error of the regression,
$$S_E = \sqrt{\frac{\sum E_i^2}{n - k - 1}}$$
gives the 'average' size of the regression residuals.

◮ The squared multiple correlation,
$$R^2 = \frac{\text{RegSS}}{\text{TSS}}$$
indicates the proportion of the variation in $Y$ that is captured by its linear regression on the $X$'s.

◮ By rescaling regression coefficients in relation to a measure of variation — e.g., the hinge-spread or standard deviation — standardized regression coefficients permit a limited comparison of the relative impact of incommensurable explanatory variables.

◮ The least-squares regression coefficients can be calculated in matrix form as
$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$