Weighted Least Squares 2



Weighted Least-Squares Regression

A technique for correcting the problem of heteroskedasticity by log-likelihood estimation of a weight that adjusts the errors of prediction

    Weighted Least-Squares Regression: Charles M. Friel Ph.D., Criminal Justice Center, Sam Houston State University


    Key Concepts

    *****

    Weighted Least-Squares Regression

    OLS

Parameter estimates that are:

    Unbiased

    Efficient

    BLUE

    Theoretical Sampling distribution of b

    Standard error of b

Relationship between the standard error of b and:

The variance of X

The residual sum of squares

The sample size

    Gauss-Markov Theorem

    Assumptions about the errors (e) in regression analysis and the

    consequences of their violation:

    e is uncorrelated with X

    e has the same variance across all levels of X

    The values of e are independent of each other

e is normally distributed

The concepts of homoskedasticity and heteroskedasticity of the error distributions

    The concept of autocorrelation or serial correlation

    Spurious relationships

    Collinear relationships

    Intervening relationships

    Techniques for identifying heteroskedasticity

    Graphic

    Statistical

White's Test for heteroskedasticity

Residualizing a variable

    Techniques for identifying WLS weights

    Theory, the literature, or prior experience

Regression of e² on X and transformation


Log-likelihood estimation of w_i

SPSS weight estimation procedure

SPSS WLS>> procedure


    Overview

    Theoretical sampling distribution of b

    Assumptions about errors in regression

    Identifying heteroskedasticity

    The concept of weighted least-squares

    regression

    Methods for estimating weights

Regressing e_i² on X

    Log-likelihood estimation of weights

    Using WLS>> command in SPSS


    References

White, Halbert (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48: 817-838.

Graybill, Franklin A. and Iyer, Hariharan K. (1994). Regression Analysis: Concepts and Applications. Duxbury Press, 571-592.

Freund, Rudolf J. and Wilson, William J. (1998). Regression Analysis: Statistical Modeling of a Response Variable. Academic Press, 378-382.

McClendon, McKee J. (1994). Multiple Regression and Causal Analysis. F. E. Peacock Publishers, Inc., 138-146, 174-181, 189-197.


    Violation of OLS Regression Assumptions

Y = a + b_1X_1 + b_2X_2 + … + b_kX_k

    OLS regression makes various assumptions about

    the errors that result from a regression model.

If these assumptions are met, one can assume that the estimates of the regression constant (a) and the regression coefficients (b_k) are:

Unbiased: Replications of the study will yield values of a and b_k which will be distributed on either side of their respective parameters α and β_k

Efficient: The standard errors of a and b_k will neither over- nor underestimate their associated theoretical standard errors


    Violation of one or more of these assumptions may

    lead to biased and/or inefficient estimates.


Theoretical Sampling Distribution of b

Population model: Y = α + βX

Theoretical sampling distribution of b: over repeated samples, the values of b are centered on β, with 68.26% falling within one theoretical standard error of β.

Theoretical standard error of b:

σ_b = σ_e / (S_X · √n), where σ_e = √( Σ(Y − Ŷ)² / N )


and S_X = √( Σ(X − X̄)² / N )


    The Theoretical Standard Error of b

σ_b = σ_e / (S_X · √n)

The standard error (σ_b) is directly related to the standard deviation of the errors produced by the model (σ_e)

The greater the errors produced by the model, the greater the standard error of b

The standard error (σ_b) is inversely related to the standard deviation of the predictor variable (S_X)

As the variability of X increases, the standard error of b decreases

The standard error (σ_b) is inversely related to


the sample size (n)

As the sample size increases, the standard error of b decreases

Estimation of the Theoretical Standard Error of b

The theoretical standard error of b (σ_b) is usually estimated from a single sample, vis-à-vis a sampling distribution of b:

SE_b = S_e / √(TSS_X)

S_e = √( RSS / (n − k) )


TSS_X = Σ(X − X̄)²
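As an added illustration (not part of the original slides), here is a minimal Python sketch of these estimates, assuming numpy is available; the data are simulated.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 70
    x = rng.uniform(0, 12, n)                    # predictor
    y = 2.0 + 0.6 * x + rng.normal(0, 4, n)      # response with random error

    b, a = np.polyfit(x, y, 1)                   # OLS slope and constant
    resid = y - (a + b * x)
    rss = np.sum(resid ** 2)                     # residual sum of squares
    s_e = np.sqrt(rss / (n - 2))                 # S_e = sqrt(RSS / (n - k)), k = 2 parameters
    tss_x = np.sum((x - x.mean()) ** 2)          # TSS_X = sum of (X - Xbar)^2
    se_b = s_e / np.sqrt(tss_x)                  # SE_b = S_e / sqrt(TSS_X)
    print(b, se_b)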


    Gauss-Markov Theorem

b is an unbiased estimate of β. On repeated estimates, the distribution of b will be centered around β.

The sampling distribution of b will be normal if the samples are large and a sufficient number of samples are taken.

OLS provides the best linear unbiased estimate of β (BLUE)

Best means:

OLS provides the most unbiased and efficient estimate of β.

Efficiency refers to the size of the standard error of b (σ_b); neither too large nor too small.
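To make the sampling-distribution idea concrete, this small simulation (an added sketch, assuming numpy; β = 0.6 is an arbitrary illustrative value) draws repeated samples and shows that the b estimates center on β.

    import numpy as np

    rng = np.random.default_rng(1)
    alpha, beta, sigma, n = 2.0, 0.6, 4.0, 70
    slopes = []
    for _ in range(2000):                        # 2000 replications of the study
        x = rng.uniform(0, 12, n)
        y = alpha + beta * x + rng.normal(0, sigma, n)
        b, a = np.polyfit(x, y, 1)               # OLS estimate of beta for this sample
        slopes.append(b)

    slopes = np.asarray(slopes)
    print("mean of b:", slopes.mean())           # close to beta = 0.6 (unbiased)
    print("SD of b  :", slopes.std())            # empirical standard error of b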


The Four Assumptions About Regression Error

e = (Y − Ŷ)

e = prediction error

e is uncorrelated with X, the independence assumption.

e has the same variance (S_e²) across the different levels of X, i.e. the variance of e is homoskedastic rather than heteroskedastic.

The values of e are independent of each other, i.e. not autocorrelated or serially correlated.


    e is normally distributed.


    The Problem of the Correlation of e & X

    Y = a + bX

Spurious relationship: e and X may be correlated because Z is a common cause of X and Y. In this case b is a biased estimate of β.

[Path diagram of a spurious relationship: Z is a common cause of both X and Y]

Collinear relationship: If X_2 is correlated with X_1 and Y but is not the cause of either, b_1 will be a biased estimate of β_1.

[Path diagram of a collinear relationship: X_2 is correlated with X_1, which causes Y]



Correlation of e with X (cont'd)

Intervening relationship: X_2 intervenes in the relationship between X_1 and Y. In this case b_1 will not be a biased estimate of β_1, but it will reflect both the direct and indirect effects of X_1 on Y.

[Path diagram of an intervening relationship: X_1 → X_2 → Y]


    Homoskedasticity of Errors (e) Over Levels

    of X

The dotted lines represent the pattern of the dispersion of the residuals.

[Four residual plots, each centered on 0: Homoskedastic; Heteroskedastic (+), R_X·Se² > 0.0; Heteroskedastic (−); Heteroskedastic]


Consequences of Heteroskedasticity

b will be an unbiased estimate of β, but SE_b will be inefficient, too large or too small.

SE_b = √( Σ(Y − Ŷ)² / (n − k) ) / √(TSS_X)

If SE_b is overestimated (R_X·Se² ≠ 0.0),


b will not be an efficient estimate of β, and a Type II error may occur, since

t = b / SE_b


If SE_b is underestimated, a Type I error may occur.


    The Distribution of the Errors

    OLS regression assumes that the errors of

    prediction are normally distributed.

    This can be tested by saving the errors and

    Plotting

    A histogram or

    A normal probability plot

    Histogram of errors


[Histogram of errors: Std. Dev = 4.04, Mean = 22.9, N = 70.00]

[Scatterplot: errors as a function of predictions]
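A brief sketch of how the histogram and normal probability plot mentioned above could be produced outside SPSS (an added illustration; assumes matplotlib and scipy, and uses placeholder errors):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    rng = np.random.default_rng(2)
    errors = rng.normal(0, 4, 70)                   # stand-in for the saved errors (e = Y - Yhat)

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
    ax1.hist(errors, bins=10)                       # histogram of errors
    ax1.set_title("Histogram of errors")
    stats.probplot(errors, dist="norm", plot=ax2)   # normal probability plot
    plt.show()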


Distribution of errors (cont'd)

    Normal probability plot of errors

    If the errors are non-normally distributed

    b may still be unbiased and efficient if

    The homoskedasticity and independence

    assumptions are met and the sample is large


[Normal probability plot of errors: observed cumulative probability against expected]


    If the sample is small, the use of the t distribution

    in determining the significance of b and its

    confidence interval will be biased.


    Summary of Assumptions and

    The Consequences of Their Violation

Assumption violation → Consequences

Errors correlated with X:
  Spurious relationship → b is a biased estimate of β
  Collinear relationship → b is a biased estimate of β
  Intervening relationship → b is an unbiased estimate of β but reflects both direct and indirect effects

Heteroskedasticity (R_X·Se² ≠ 0.0) → b is unbiased but not efficient; SE_b is too small or too large; a Type I or II error may result

Autocorrelated errors → b is unbiased but not efficient; SE_b is too small or too large; a Type I or II error may result

Errors non-normally distributed → b may be unbiased if the homoskedasticity and independence assumptions are met and N is large; if N is small, the t distribution may be biased


    Heteroskedastic Errors and

    Weighted Least-Squares Regression

    If the errors are heteroskedastically distributed

The SE_b may be inefficient, i.e. either too small or too large, which may lead to a Type I or II error

    Ways to detect heteroskedasticity

    Scatterplot of X against Y (prior to analysis)

    Scatterplot of predictions against residuals, either

    unstandardized or standardized

    Scatterplot of X against residuals

    Scatterplot of X against the absolute value

    of the residuals (e)


Scatterplot of X against the squared residuals (e²)

White's Test for heteroskedasticity


    Example

    Scatterplot of X Against Y

    Sentence length (Y) as a function of

    drug dependency (X)

    Heteroskedasticity

    As drug score increases, the variability in

    sentence increases


    Example

    Scatterplot of Predictions Against Residuals

    Sentence length (Y) as a function of

    drug dependency (X)

    Heteroskedasticity

    As predicted sentence becomes longer,


    variability in residuals becomes greater.


White's Test for Heteroskedasticity (cont.)

H₀: the residuals are homoskedastic
n = number of cases
df = number of independent variables

Example

The auxiliary regression of the squared residuals on dr_score: R² = 0.06517

χ² = n·R² = (70)(0.06517) = 4.56, df = 1, p < 0.05


Reject the H₀ that the residuals are homoskedastic.
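The n·R² calculation behind this test can be sketched as follows (an added illustration, assuming numpy, statsmodels, and scipy; note that the full White test also adds squared and cross-product terms to the auxiliary regression, while this sketch mirrors the single-predictor version used above):

    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    def n_r2_test(x, y):
        X = sm.add_constant(x)
        resid = sm.OLS(y, X).fit().resid         # residuals of the main model
        aux = sm.OLS(resid ** 2, X).fit()        # auxiliary regression of e^2 on X
        chi2 = len(y) * aux.rsquared             # chi-square = n * R^2
        df = X.shape[1] - 1                      # number of predictors (here 1)
        return chi2, df, stats.chi2.sf(chi2, df)

    rng = np.random.default_rng(3)
    x = rng.uniform(1, 12, 70)
    y = 2 + 0.6 * x + rng.normal(0, 0.5 * x)     # error variance grows with x
    print(n_r2_test(x, y))                       # small p-value => reject homoskedasticity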


    How Does One Correct the Problem of

    Heteroskedastic Errors?

    Solution: Weighted Least-Squares Regression (WLS

    Regression)

    The logic of WLS Regression

Find a weight (w_i)

That can be used to modify the influence of large errors on the estimation of the best-fit values of

The regression constant (a)

The regression coefficients (b_k)

OLS is designed to minimize Σ(Y − Ŷ)²

In WLS, values of a and b_k are estimated which minimize RSS = Σ w_i (Y − Ŷ)²


This process has the effect of minimizing the influence of a case with a large error on the estimation of a and b_k

And maximizing the influence of a case with a small error on the estimation of a and b_k
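A compact sketch of this weighted criterion (an added example, assuming statsmodels; the weights 1/x² are illustrative only): WLS chooses a and b to minimize Σ w_i (Y − Ŷ)².

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    x = rng.uniform(1, 12, 70)
    y = 2 + 0.6 * x + rng.normal(0, 0.5 * x)     # larger errors at larger x

    X = sm.add_constant(x)
    w = 1.0 / x ** 2                             # down-weight the cases with large error variance
    ols = sm.OLS(y, X).fit()                     # minimizes sum of (Y - Yhat)^2
    wls = sm.WLS(y, X, weights=w).fit()          # minimizes sum of w_i * (Y - Yhat)^2
    print(ols.params, ols.bse)
    print(wls.params, wls.bse)                   # typically a smaller standard error for b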


    Techniques for Estimating a Suitable

Value of a Weight w_i

    From theory, the literature, or experience gained in

    prior research.

    Rarely will this approach prove successful,

    except by trial and error

Estimate w_i by regressing e² on the offending independent variable X and transforming the values of X and Y.

This is called residualizing the variable X.

Use log-likelihood estimation to determine a suitable value of w_i

This can be done in SPSS using the regression weight estimation procedure coupled with the WLS>> procedure.


In the following case study both the residualizing and SPSS WLS>> procedures will be demonstrated.


    An Example

    *****

The Relationship Between Drug Dependency & Length of Sentence

    The model

    Sentence = a + b (drug_score)

    The results

    Sentence = 1.97 + 0.6438 (drug_score)

    For this model to be BLUE, the residuals must be

    homoskedastic.

Q: Are the residuals homoskedastic?


    An Example (cont.)

    Scatterplot of the residuals

    Notice how the residuals become larger the

    greater the degree of drug dependency. These

    are heteroskedastic residuals.

[Scatterplot of heteroskedastic residuals: standardized residuals against standardized predicted values]


    Solving the Problem of Heteroskedasticity

    Solution

    Residualize the offending variable X

    Steps in the process

    1. Plot X against Y to determine the presence of

    heteroskedasticity

2. Estimate the following regression equation and save the residuals (e = Y − Ŷ). In SPSS the residuals appear as res_1:


    Y = a + bX


    Solving the Problem of Heteroskedasticity (cont.)

3. Square the residuals

e² = (res_1)² = residsq

4. Regress residsq on X and save the predicted residsq; in SPSS this is called pre_2

residsq = a + bX

5. Transform X and Y, and compute a weight w_i called wtsqroot

wtX = X / √(pre_2)


wtY = Y / √(pre_2)

wtsqroot = 1 / √(pre_2)

Solving the Problem of Heteroskedasticity (cont.)

6. Estimate the following weighted regression equation through the origin, i.e. with a regression constant equal to 0.0 (a code sketch of Steps 2-6 follows below)

wtY = a(wtsqroot) + b(wtX)
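Here is a minimal sketch of Steps 2-6 in Python (an added illustration, assuming numpy and statsmodels; the variable names mirror the SPSS names used in this module):

    import numpy as np
    import statsmodels.api as sm

    def residualize_wls(x, y):
        X = sm.add_constant(x)
        res_1 = sm.OLS(y, X).fit().resid               # Step 2: residuals of Y on X
        residsq = res_1 ** 2                           # Step 3: squared residuals
        pre_2 = sm.OLS(residsq, X).fit().fittedvalues  # Step 4: predicted residsq
        root = np.sqrt(np.abs(pre_2))                  # Step 5: square root of |pre_2|
        wt_y, wt_x, wtsqroot = y / root, x / root, 1.0 / root
        design = np.column_stack([wtsqroot, wt_x])     # columns play the roles of a and b
        fit = sm.OLS(wt_y, design).fit()               # Step 6: through the origin (no constant added)
        return fit.params                              # weighted estimates of a and b

    rng = np.random.default_rng(5)
    x = rng.uniform(1, 12, 70)
    y = 2 + 0.6 * x + rng.normal(0, 0.5 * x)
    print(residualize_wls(x, y))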


    Step 1

    Sentence length as a function of

    drug dependency

SPSS scatterplot of sentence as a function of dr_score. This can only be done when there are two or fewer IVs.

[Scatterplot: sentence as a function of drug dependency]


Heteroskedastic: the variability in sentence length increases as the degree of drug dependency increases.


    Step 2

    Regress sentence length on drug

    dependency, save the residuals (res_1) and

    the predictions (pre_1)

    sentence = 1.975 + 0.644 dr_score

R² = 0.12 (F = 9.24, p = 0.003)

    SPSS results for Step 2

[SPSS output for Step 2: Variables Entered/Removed, Model Summary, ANOVA, and Coefficients tables for the regression of SENTENCE on DR_SCORE]


Step 2 (cont.)

[SPSS output: Casewise Diagnostics and Residuals Statistics tables for the Step 2 regression]


    N.B. The residuals are heteroskedastic. Compare

    this scatterplot with the scatterplot of sentence as a

function of dr_score. Notice that the patterns are the

    same.

[Scatterplot of heteroskedastic residuals: standardized residuals against standardized predicted values]


    Step 3

    Calculate the squared residuals

    In SPSS, the unstandardized residuals are saved as

    res_1.

Step 3 involves squaring the residuals by use of the data transformation procedure in SPSS.

squared residual = (res_1)² = residsq

    The SPSS syntax for this transformation is as

    follows:

    Residsq = res_1**2


    The steps used in this transformation process are

    described in the case study associated with this

    module.

    Step 3 (cont.)

    SPSS results for Step 3

    pre_1 res_1 residsq

    7.76901 -6.76901 45.82

    8.41282 -7.41282 54.95

    8.41282 -7.41282 54.95

    6.48139 -5.48139 30.05

    7.76901 -5.76901 33.28

    7.12520 -5.12520 26.27

    7.76901 -5.76901 33.28

    8.41282 -6.41282 41.12

5.83758 -2.83758 8.05

7.76901 -4.76901 22.74


    2.61852 -.61852 .38

    2.61852 .38148 .15

3.26233 1.73767 3.02

5.19377 1.80623 3.26

    8.41282 -.41282 .17

    7.76901 1.23099 1.52

    7.12520 2.87480 8.26

    6.48139 5.51861 30.46

    7.12520 6.87480 47.26

    7.12520 7.87480 62.01


    Step 4

Regress the squared residuals on the independent variable dr_score and save the predictions as pre_2

residsq = -7.587 + 4.6685 dr_score

R² = 0.065 (F = 4.74, p = 0.0329)

This process is called residualizing a variable.

By OLS definition, the squared residuals (residsq) represent the variance in Y that is unrelated to X.

    Therefore, there should be no significant relationship

    between X and residsq.

    If there is, one or more OLS regression

    assumptions have been violated.

    In this case, the violated assumption is the

    homoskedasticity of the residuals.


    Step 4 (cont.)

    SPSS results for Step 4

[SPSS output for Step 4: Variables Entered/Removed, Model Summary, ANOVA, and Coefficients tables for the regression of RESIDSQ on DR_SCORE]


    Step 5

Compute the absolute value of pre_2 (abspre_2) and three new variables: wtsent, wtdrug, and the weight wtsqroot.

wtsent = sentence / √(abspre_2)

wtdrug = dr_score / √(abspre_2)

wtsqroot = 1 / √(abspre_2)

pre_2 from the previous step is the information in the squared residuals (residsq) that is related to the IV dr_score.

Dividing sentence and dr_score by the square root of abspre_2 reduces the influence of extreme values on the estimation of a and b.

    Finally a third transformation is performed by

    creating the variable wtsqroot.


    This will serve as a weighting factor.


    Step 5 (cont.)

    SPSS results for Step 5

    abspre_2 wtsent wtdr_sco wtsqroot

34.43 .17 1.53 .17

39.10 .16 1.60 .16

39.10 .16 1.60 .16

    25.09 .20 1.40 .20

    34.43 .34 1.53 .17

    29.76 .37 1.47 .18

    34.43 .34 1.53 .17

    39.10 .32 1.60 .16

    20.42 .66 1.33 .22

    34.43 .51 1.53 .17

    20.42 .66 1.33 .22

    39.10 1.28 1.60 .16

    34.43 1.53 1.53 .17

    29.76 1.83 1.47 .18

    25.09 2.40 1.40 .20

    29.76 2.57 1.47 .18

    29.76 2.75 1.47 .18


    Step 6

    Compute the WLS regression

    wtsent = a(wtsqroot) + b (wtdrug)

    Results:

wtsent = 1.159 (wtsqroot) + 0.7833 (wtdrug)

R² = 0.674 (F = 70.29, p = 0.0001)


    Step 6 (cont.)

Model Summary

[SPSS Model Summary table for the weighted regression of WTSENT on WTSQROOT and WTDR_SCO. Note from the output: for regression through the origin (the no-intercept model), R Square measures the proportion of the variability in the dependent variable about the origin explained by regression; this CANNOT be compared to R Square for models which include an intercept.]


    Step 6 (cont.)

[SPSS output: Casewise Diagnostics table for WTSENT]


    N.B. The heteroskedasticity has been reduced.

[Scatterplot: residuals of wtsent regressed on wtdrug, plotted against wtdrug]


    Comparison of the OLS v Residualized

    Regression Models

Comparing the scatterplots, notice the substantial reduction of heteroskedasticity.

Statistical results

Method         a      b      SEa    SEb    R²      p
OLS            1.975  0.644  1.425  0.212  0.1196  0.0034
Residualized   1.159  0.783  0.605  0.139  0.6740  0.0001

The residualized model is more efficient; its SEs are smaller.

Comparison of 95% confidence intervals

Method         95% Confidence Interval   Difference
OLS            0.221 to 1.066            0.845
Residualized   0.504 to 1.062            0.558


    N.B. The width of the residualized 95% confidence interval

    is less than that of the OLS interval.
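These intervals follow from b ± t·SE_b with df = n − k = 68; a quick check of the arithmetic (an added sketch, assuming scipy):

    from scipy import stats

    t_crit = stats.t.ppf(0.975, df=68)               # two-sided 95% critical value
    for name, b, se in [("OLS", 0.644, 0.212), ("Residualized", 0.783, 0.139)]:
        lo, hi = b - t_crit * se, b + t_crit * se
        print(name, round(lo, 3), "to", round(hi, 3), "width", round(hi - lo, 3))
    # reproduces roughly 0.22 to 1.07 (width ~0.85) and 0.51 to 1.06 (width ~0.56)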


    An Alternative Procedure for Correcting

    Heteroskedasticity

Log-Likelihood Estimation of w_i

If it can be assumed that the variance in the DV is proportional to the IV,

log-likelihood estimation can be used to estimate w_i.

In this case it is assumed that

S_Y² ∝ X^w or S_Y² ∝ (1 / X^w)

(∝ is read "proportional to")

In log-likelihood estimation of w_i, the question is:


What power of X, i.e. w_i, is most likely to have produced the proportional relationship between S_Y² and X?


SPSS Weight Estimation and WLS>> Regression Procedures

This procedure begins by using log-likelihood estimation to iteratively determine a weight w_i

To be used in estimating the values of the regression constant (a) and the regression coefficient (b)

Such that the RSS is minimized.

RSS = Σ [ (1 / X^w_i) (Y − Ŷ)² ]

This may solve the heteroskedasticity problem if:

S_Y² ∝ X^w or S_Y² ∝ (1 / X^w)
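The idea behind the iterative search can be sketched as follows (an added illustration, assuming numpy and assuming Var(e_i) = σ²·X_i^w; the grid of candidate powers mirrors the SPSS output shown later, and the exact SPSS likelihood may differ by constants):

    import numpy as np

    def profile_loglik(x, y, power):
        """Profile log-likelihood for the model Var(e_i) = sigma^2 * x_i**power."""
        v = x ** power
        root = np.sqrt(v)
        A = np.column_stack([1.0 / root, x / root])      # weighted design (columns for a and b)
        coef, *_ = np.linalg.lstsq(A, y / root, rcond=None)
        a, b = coef
        resid = y - (a + b * x)
        n = len(y)
        sigma2 = np.mean(resid ** 2 / v)                 # ML estimate of sigma^2
        return (-0.5 * n * np.log(2 * np.pi * sigma2)
                - 0.5 * power * np.sum(np.log(x)) - 0.5 * n)

    rng = np.random.default_rng(6)
    x = rng.uniform(1, 12, 70)
    y = 1 + 0.8 * x + rng.normal(0, 0.3 * x ** 0.9)      # error variance grows roughly as x^1.8

    powers = np.arange(-3.0, 3.01, 0.2)                  # grid of candidate powers
    best = max(powers, key=lambda p: profile_loglik(x, y, p))
    print("power maximizing the log-likelihood:", round(best, 1))   # typically near 1.8 here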


    Step 1

Estimation of the weight w_i using the SPSS weight estimation procedure

The result

The most likely weight = 1.8

The variance in sentence is estimated to be

S_Y² ∝ (dr_score)^1.8

    Regression equation

    sentence = 0.94 + 0.83 (dr_score)


For a drug score of 6, the prediction would be

sentence = 0.94 + 0.83 (6) = 5.92 years

Step 1 (cont.)

Examination of weights for individual subjects

Subject   dr_score   Weight
Jones     10         1/(10)^1.8 = 0.01585
Smith     1          1/(1)^1.8 = 1.00
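In code these per-case weights are simply 1 / dr_score**1.8 (a trivial check of the arithmetic above):

    for name, dr_score in [("Jones", 10), ("Smith", 1)]:
        print(name, round(1 / dr_score ** 1.8, 5))   # 0.01585 for Jones, 1.0 for Smith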


    Step 1 (cont.)

    SPSS results

Weighted Least Squares, MODEL: MOD_1

Source variable.. DR_SCORE   Dependent variable.. SENTENCE

[SPSS weight estimation output: the log-likelihood function is evaluated for POWER values from -3.000 to 3.000 in steps of 0.200]

The value of POWER maximizing the log-likelihood function = 1.800

Log-likelihood estimated weight w_i = 1.8


    Step 1 (cont.)

    Estimation of the weighted regression model

[SPSS weighted regression output: source variable DR_SCORE (POWER value = 1.800), dependent variable SENTENCE; Multiple R, R Square, ANOVA, and Variables in the Equation tables. A new variable WGT_1, "Weight for SENTENCE from WLS, MOD_1 DR_SCORE** -1.800", is created.]

    Weighted equation

    Sentence = 0.9399 + 0.8285 (dr_score)

    Unweighted equation


    Sentence = 1.97 + 0.6438 (dr_score)


    Step 2

    Plot the relationship between dr_score

and the weight w_i

The heteroskedasticity problem

    Recall the previous scatterplot: as the value of drug scores

    increases, the variance in sentences increases as well.

    The log-likelihood estimated weight is such that

    As the value of drug score increases, the weight adjusted

    drug score (wgt_1) decreases.

[Scatterplot of weight-adjusted dr_score (= 1 / dr_score^1.8) against dr_score]


The weight of 1.8 reduces the effect of large errors on the RSS, providing a more efficient estimate of SE_b.


    Step 3

The SPSS WLS>> in linear regression

If an appropriate weight w_i is already known by another means

The WLS>> procedure in SPSS linear regression can be used instead of the SPSS weight estimation procedure

The procedure

Simply specify the regression model

Enter the known weight variable under the WLS>> command and estimate the model

In this case, the weight variable wgt_1 from Step 2 will be used


    Step 3 (cont.)

The results of the WLS>> analysis using the weight variable wgt_1 (w_i = 1.8), with regression through the origin

R² = 0.668, F = 139.14, p = 0.0001

sentence = 1.036 (dr_score)

SE_b = 0.087

N.B. Since this model does not include a constant (a), the R² and the other statistical results cannot be compared with the associated values of a model that does use a constant (a).
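If the weight variable is already available, the same kind of fit can be sketched outside SPSS with any WLS routine (an added illustration, assuming statsmodels; the simulated dr_score and sentence stand in for the case-study data):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    dr_score = rng.uniform(1, 12, 70)
    sentence = 1 + 0.8 * dr_score + rng.normal(0, 0.3 * dr_score ** 0.9)

    wgt_1 = 1.0 / dr_score ** 1.8                        # known weight variable (w_i = 1.8)
    # regression through the origin: no constant is added to the design
    fit = sm.WLS(sentence, dr_score, weights=wgt_1).fit()
    print(fit.params, fit.bse)                           # slope and its standard error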


    Step 3 (cont)

    SPSS results

[SPSS output: Variables Entered/Removed table for the weighted regression of SENTENCE on DR_SCORE]


    Step 3 (cont)

[SPSS output: Coefficients table for the weighted regression of SENTENCE on DR_SCORE]


    Step 3 (cont)

Saved predicted and residual values, and the weighted values of dr_score (i.e. wgt_1)

wgt_1 = 1 / (dr_score)^1.8

[Listing of wgt_1, pre_1, and res_1 for the first cases]


    Step 4

    Variable transformations and

    plot of the residuals

Unfortunately, the weighted residuals and predictions produced by the SPSS weight estimation and WLS>> procedures

Cannot be directly graphed from the saved residuals and predictions

The residuals and the predictions must first be transformed as follows:

Transformed residual = (res_1) (wt)^0.5

Transformed prediction = (pre_1) (wt)^0.5
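In code form, the transformation is just a square-root weighting of the saved variables (an added sketch; res_1, pre_1, and wgt_1 are the saved SPSS columns this module refers to):

    import numpy as np

    def transform_for_plot(res_1, pre_1, wgt_1):
        """Return the transformed residuals and predictions for the weighted residual plot."""
        transres = res_1 * np.sqrt(wgt_1)        # transformed residual = res_1 * wgt_1**0.5
        transpre = pre_1 * np.sqrt(wgt_1)        # transformed prediction = pre_1 * wgt_1**0.5
        return transres, transpre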


    Step 4 (cont.)

    SPSS results

Compare the degree of heteroskedasticity in this scatterplot with the plot of the residuals from the unweighted regression model.

[Scatterplot of the transformed residuals against the transformed weighted predictions]


    Notice the substantial change in the degree of

    heteroskedasticity.


    Step 4 (cont.)

    Transformed variables transres and transpre

    transres = res_1*sqrt(wgt_1)

    transpre = pre_1*sqrt(wgt_1)

transres transpre

[Listing of transres and transpre for the first cases]


Comparison of Results:
OLS, Residualized, and Log-Likelihood Models

Method          a      b      SEa    SEb
OLS             1.975  0.644  1.425  0.212
Residualized    1.159  0.783  0.605  0.139
Log-Likelihood  0.940  0.828  0.394  0.121

N.B. The standard errors of the residualized and log-likelihood models are lower than those of the OLS model. The log-likelihood model produces smaller standard errors than the residualized model.
