Weighted Least Squares 2



Weighted Least-Squares Regression

A technique for correcting the problem of heteroskedasticity by log-likelihood estimation of a weight that adjusts the errors of prediction

    Weighted Least-Squares Regression: Charles M. Friel Ph.D., Criminal Justice Center, Sam Houston State University


    Key Concepts

    *****

    Weighted Least-Squares Regression

    OLS

Parameter estimates that are:

    Unbiased

    Efficient

    BLUE

    Theoretical Sampling distribution of b

    Standard error of b

Relationship between the standard error of b and:

The variance of X

The residual sum of squares

The sample size

    Gauss-Markov Theorem

    Assumptions about the errors (e) in regression analysis and the

    consequences of their violation:

    e is uncorrelated with X

    e has the same variance across all levels of X

    The values of e are independent of each other

e is normally distributed

The concepts of homoskedasticity and heteroskedasticity of the error distributions

    The concept of autocorrelation or serial correlation

    Spurious relationships

    Collinear relationships

    Intervening relationships

    Techniques for identifying heteroskedasticity

    Graphic

    Statistical

White's Test for heteroskedasticity

Residualizing a variable

    Techniques for identifying WLS weights

    Theory, the literature, or prior experience

Regression of e² on X and transformation


Log-likelihood estimation of w_i

SPSS weight estimation procedure

SPSS WLS>> procedure


    Overview

    Theoretical sampling distribution of b

    Assumptions about errors in regression

    Identifying heteroskedasticity

    The concept of weighted least-squares

    regression

    Methods for estimating weights

Regressing e_i² on X

    Log-likelihood estimation of weights

    Using WLS>> command in SPSS


    References

White, Halbert (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48: 817-838.

Graybill, Franklin A. and Iyer, Hariharan K. (1994). Regression Analysis: Concepts and Applications. Duxbury Press, 571-592.

Freund, Rudolf J. and Wilson, William J. (1998). Regression Analysis: Statistical Modeling of a Response Variable. Academic Press, 378-382.

McClendon, McKee J. (1994). Multiple Regression and Causal Analysis. F. E. Peacock Publishers, Inc., 138-146, 174-181, 189-197.


    Violation of OLS Regression Assumptions

Y = a + b_1X_1 + b_2X_2 + … + b_kX_k

    OLS regression makes various assumptions about

    the errors that result from a regression model.

If these assumptions are met, one can assume that the estimates of the regression constant (a) and the regression coefficients (b_k) are:

Unbiased: Replications of the study will yield values of a and b_k which will be distributed on either side of their respective parameters α and β_k

Efficient: The standard errors of a and b_k will neither over- nor underestimate their associated theoretical standard errors


    Violation of one or more of these assumptions may

    lead to biased and/or inefficient estimates.


Theoretical Sampling Distribution of b

Population model: Y = α + βX

Theoretical sampling distribution of b: over repeated samples, the values of b are centered on β, with 68.26% falling within one theoretical standard error of β.

Theoretical standard error of b:

σ_b = σ_e / (S_X · √n), where σ_e = √( Σ(Y − Ŷ)² / N )


and S_X = √( Σ(X − X̄)² / N )


    The Theoretical Standard Error of b

σ_b = σ_e / (S_X · √n)

The standard error (σ_b) is directly related to the standard deviation of the errors produced by the model (σ_e)

The greater the errors produced by the model, the greater the standard error of b

The standard error (σ_b) is inversely related to the standard deviation of the predictor variable (S_X)

As the variability of X increases, the standard error of b decreases

The standard error (σ_b) is inversely related to


the sample size (n)

As the sample size increases, the standard error of b decreases

Estimation of the Theoretical Standard Error of b

The theoretical standard error of b (σ_b) is usually estimated from a single sample, vis-à-vis a sampling distribution of b:

SE_b = S_e / √(TSS_X)

S_e = √( RSS / (n − k) )


TSS_X = Σ(X − X̄)²
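As an added illustration (not part of the original slides), here is a minimal Python sketch of these estimates, assuming numpy is available; the data are simulated.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 70
    x = rng.uniform(0, 12, n)                    # predictor
    y = 2.0 + 0.6 * x + rng.normal(0, 4, n)      # response with random error

    b, a = np.polyfit(x, y, 1)                   # OLS slope and constant
    resid = y - (a + b * x)
    rss = np.sum(resid ** 2)                     # residual sum of squares
    s_e = np.sqrt(rss / (n - 2))                 # S_e = sqrt(RSS / (n - k)), k = 2 parameters
    tss_x = np.sum((x - x.mean()) ** 2)          # TSS_X = sum of (X - Xbar)^2
    se_b = s_e / np.sqrt(tss_x)                  # SE_b = S_e / sqrt(TSS_X)
    print(b, se_b)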


    Gauss-Markov Theorem

b is an unbiased estimate of β. On repeated estimates, the distribution of b will be centered around β.

The sampling distribution of b will be normal if the samples are large and a sufficient number of samples are taken.

OLS provides the best linear unbiased estimate of β (BLUE)

Best means:

OLS provides the most unbiased and efficient estimate of β.

Efficiency refers to the size of the standard error of b (σ_b); neither too large nor too small.
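To make the sampling-distribution idea concrete, this small simulation (an added sketch, assuming numpy; β = 0.6 is an arbitrary illustrative value) draws repeated samples and shows that the b estimates center on β.

    import numpy as np

    rng = np.random.default_rng(1)
    alpha, beta, sigma, n = 2.0, 0.6, 4.0, 70
    slopes = []
    for _ in range(2000):                        # 2000 replications of the study
        x = rng.uniform(0, 12, n)
        y = alpha + beta * x + rng.normal(0, sigma, n)
        b, a = np.polyfit(x, y, 1)               # OLS estimate of beta for this sample
        slopes.append(b)

    slopes = np.asarray(slopes)
    print("mean of b:", slopes.mean())           # close to beta = 0.6 (unbiased)
    print("SD of b  :", slopes.std())            # empirical standard error of b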


The Four Assumptions About Regression Error

e = (Y − Ŷ)

e = prediction error

e is uncorrelated with X, the independence assumption.

e has the same variance (S_e²) across the different levels of X, i.e. the variance of e is homoskedastic rather than heteroskedastic.

The values of e are independent of each other, i.e. not autocorrelated or serially correlated.


    e is normally distributed.


    The Problem of the Correlation of e & X

    Y = a + bX

Spurious relationship: e and X may be correlated because Z is a common cause of X and Y. In this case b is a biased estimate of β.

[Path diagram of a spurious relationship: Z is a common cause of both X and Y]

Collinear relationship: If X_2 is correlated with X_1 and Y but is not the cause of either, b_1 will be a biased estimate of β_1.

[Path diagram of a collinear relationship: X_2 is correlated with X_1, which causes Y]



Correlation of e with X (cont'd)

Intervening relationship: X_2 intervenes in the relationship between X_1 and Y. In this case b_1 will not be a biased estimate of β_1, but it will reflect both the direct and indirect effects of X_1 on Y.

[Path diagram of an intervening relationship: X_1 → X_2 → Y]


    Homoskedasticity of Errors (e) Over Levels

    of X

The dotted lines represent the pattern of the dispersion of the residuals.

[Four residual plots, each centered on 0: Homoskedastic; Heteroskedastic (+), R_X·Se² > 0.0; Heteroskedastic (−); Heteroskedastic]


Consequences of Heteroskedasticity

b will be an unbiased estimate of β, but SE_b will be inefficient, too large or too small.

SE_b = √( Σ(Y − Ŷ)² / (n − k) ) / √(TSS_X)

If SE_b is overestimated (R_X·Se² ≠ 0.0),


b will not be an efficient estimate of β, and a Type II error may occur, since

t = b / SE_b


If SE_b is underestimated, a Type I error may occur.


    The Distribution of the Errors

    OLS regression assumes that the errors of

    prediction are normally distributed.

    This can be tested by saving the errors and

    Plotting

    A histogram or

    A normal probability plot

    Histogram of errors


[Histogram of errors: Std. Dev = 4.04, Mean = 22.9, N = 70.00]

[Scatterplot: errors as a function of predictions]
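A brief sketch of how the histogram and normal probability plot mentioned above could be produced outside SPSS (an added illustration; assumes matplotlib and scipy, and uses placeholder errors):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    rng = np.random.default_rng(2)
    errors = rng.normal(0, 4, 70)                   # stand-in for the saved errors (e = Y - Yhat)

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
    ax1.hist(errors, bins=10)                       # histogram of errors
    ax1.set_title("Histogram of errors")
    stats.probplot(errors, dist="norm", plot=ax2)   # normal probability plot
    plt.show()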


Distribution of errors (cont'd)

    Normal probability plot of errors

    If the errors are non-normally distributed

    b may still be unbiased and efficient if

    The homoskedasticity and independence

    assumptions are met and the sample is large


[Normal probability plot of errors: observed cumulative probability against expected]


    If the sample is small, the use of the t distribution

    in determining the significance of b and its

    confidence interval will be biased.


    Summary of Assumptions and

    The Consequences of Their Violation

Assumption violation → Consequences

Errors correlated with X:
  Spurious relationship → b is a biased estimate of β
  Collinear relationship → b is a biased estimate of β
  Intervening relationship → b is an unbiased estimate of β but reflects both direct and indirect effects

Heteroskedasticity (R_X·Se² ≠ 0.0) → b is unbiased but not efficient; SE_b is too small or too large; a Type I or II error may result

Autocorrelated errors → b is unbiased but not efficient; SE_b is too small or too large; a Type I or II error may result

Errors non-normally distributed → b may be unbiased if the homoskedasticity and independence assumptions are met and N is large; if N is small, the t distribution may be biased


    Heteroskedastic Errors and

    Weighted Least-Squares Regression

    If the errors are heteroskedastically distributed

The SE_b may be inefficient, i.e. either too small or too large, which may lead to a Type I or II error

    Ways to detect heteroskedasticity

    Scatterplot of X against Y (prior to analysis)

    Scatterplot of predictions against residuals, either

    unstandardized or standardized

    Scatterplot of X against residuals

    Scatterplot of X against the absolute value

    of the residuals (e)


Scatterplot of X against the squared residuals (e²)

White's Test for heteroskedasticity


    Example

    Scatterplot of X Against Y

    Sentence length (Y) as a function of

    drug dependency (X)

    Heteroskedasticity

    As drug score increases, the variability in

    sentence increases


    Example

    Scatterplot of Predictions Against Residuals

    Sentence length (Y) as a function of

    drug dependency (X)

    Heteroskedasticity

    As predicted sentence becomes longer,


    variability in residuals becomes greater.


White's Test for Heteroskedasticity (cont.)

H₀: the residuals are homoskedastic
n = number of cases
df = number of independent variables

Example

The auxiliary regression of the squared residuals on dr_score: R² = 0.06517

χ² = n·R² = (70)(0.06517) = 4.56, df = 1, p < 0.05


Reject the H₀ that the residuals are homoskedastic.
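The n·R² calculation behind this test can be sketched as follows (an added illustration, assuming numpy, statsmodels, and scipy; note that the full White test also adds squared and cross-product terms to the auxiliary regression, while this sketch mirrors the single-predictor version used above):

    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    def n_r2_test(x, y):
        X = sm.add_constant(x)
        resid = sm.OLS(y, X).fit().resid         # residuals of the main model
        aux = sm.OLS(resid ** 2, X).fit()        # auxiliary regression of e^2 on X
        chi2 = len(y) * aux.rsquared             # chi-square = n * R^2
        df = X.shape[1] - 1                      # number of predictors (here 1)
        return chi2, df, stats.chi2.sf(chi2, df)

    rng = np.random.default_rng(3)
    x = rng.uniform(1, 12, 70)
    y = 2 + 0.6 * x + rng.normal(0, 0.5 * x)     # error variance grows with x
    print(n_r2_test(x, y))                       # small p-value => reject homoskedasticity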


    How Does One Correct the Problem of

    Heteroskedastic Errors?

    Solution: Weighted Least-Squares Regression (WLS

    Regression)

    The logic of WLS Regression

Find a weight (w_i)

That can be used to modify the influence of large errors on the estimation of the best-fit values of

The regression constant (a)

The regression coefficients (b_k)

OLS is designed to minimize Σ(Y − Ŷ)²

In WLS, values of a and b_k are estimated which minimize RSS = Σ w_i (Y − Ŷ)²


This process has the effect of minimizing the influence of a case with a large error on the estimation of a and b_k

And maximizing the influence of a case with a small error on the estimation of a and b_k
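A compact sketch of this weighted criterion (an added example, assuming statsmodels; the weights 1/x² are illustrative only): WLS chooses a and b to minimize Σ w_i (Y − Ŷ)².

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    x = rng.uniform(1, 12, 70)
    y = 2 + 0.6 * x + rng.normal(0, 0.5 * x)     # larger errors at larger x

    X = sm.add_constant(x)
    w = 1.0 / x ** 2                             # down-weight the cases with large error variance
    ols = sm.OLS(y, X).fit()                     # minimizes sum of (Y - Yhat)^2
    wls = sm.WLS(y, X, weights=w).fit()          # minimizes sum of w_i * (Y - Yhat)^2
    print(ols.params, ols.bse)
    print(wls.params, wls.bse)                   # typically a smaller standard error for b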


    Techniques for Estimating a Suitable

Value of a Weight w_i

    From theory, the literature, or experience gained in

    prior research.

    Rarely will this approach prove successful,

    except by trial and error

Estimate w_i by regressing e² on the offending independent variable X and transforming the values of X and Y.

This is called residualizing the variable X.

Use log-likelihood estimation to determine a suitable value of w_i

This can be done in SPSS using the regression weight estimation procedure coupled with the WLS>> procedure.


In the following case study both the residualizing and SPSS WLS>> procedures will be demonstrated.


    An Example

    *****

The Relationship Between Drug Dependency & Length of Sentence

    The model

    Sentence = a + b (drug_score)

    The results

    Sentence = 1.97 + 0.6438 (drug_score)

    For this model to be BLUE, the residuals must be

    homoskedastic.

Q: Are the residuals homoskedastic?


    An Example (cont.)

    Scatterplot of the residuals

    Notice how the residuals become larger the

    greater the degree of drug dependency. These

    are heteroskedastic residuals.

[Scatterplot of heteroskedastic residuals: standardized residuals against standardized predicted values]


    Solving the Problem of Heteroskedasticity

    Solution

    Residualize the offending variable X

    Steps in the process

    1. Plot X against Y to determine the presence of

    heteroskedasticity

2. Estimate the following regression equation and save the residuals (e = Y − Ŷ). In SPSS the residuals appear as res_1:


    Y = a + bX


    Solving the Problem of Heteroskedasticity (cont.)

3. Square the residuals

e² = (res_1)² = residsq

4. Regress residsq on X and save the predicted residsq; in SPSS this is called pre_2

residsq = a + bX

5. Transform X and Y, and compute a weight w_i called wtsqroot

wtX = X / √(pre_2)


wtY = Y / √(pre_2)

wtsqroot = 1 / √(pre_2)

Solving the Problem of Heteroskedasticity (cont.)

6. Estimate the following weighted regression equation through the origin, i.e. with a regression constant equal to 0.0 (a code sketch of Steps 2-6 follows below)

wtY = a(wtsqroot) + b(wtX)
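Here is a minimal sketch of Steps 2-6 in Python (an added illustration, assuming numpy and statsmodels; the variable names mirror the SPSS names used in this module):

    import numpy as np
    import statsmodels.api as sm

    def residualize_wls(x, y):
        X = sm.add_constant(x)
        res_1 = sm.OLS(y, X).fit().resid               # Step 2: residuals of Y on X
        residsq = res_1 ** 2                           # Step 3: squared residuals
        pre_2 = sm.OLS(residsq, X).fit().fittedvalues  # Step 4: predicted residsq
        root = np.sqrt(np.abs(pre_2))                  # Step 5: square root of |pre_2|
        wt_y, wt_x, wtsqroot = y / root, x / root, 1.0 / root
        design = np.column_stack([wtsqroot, wt_x])     # columns play the roles of a and b
        fit = sm.OLS(wt_y, design).fit()               # Step 6: through the origin (no constant added)
        return fit.params                              # weighted estimates of a and b

    rng = np.random.default_rng(5)
    x = rng.uniform(1, 12, 70)
    y = 2 + 0.6 * x + rng.normal(0, 0.5 * x)
    print(residualize_wls(x, y))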


    Step 1

    Sentence length as a function of

    drug dependency

SPSS scatterplot of sentence as a function of dr_score. This can only be done when there are two or fewer IVs.

[Scatterplot: sentence as a function of drug dependency]


Heteroskedastic: the variability in sentence length increases as the degree of drug dependency increases.


    Step 2

    Regress sentence length on drug

    dependency, save the residuals (res_1) and

    the predictions (pre_1)

    sentence = 1.975 + 0.644 dr_score

R² = 0.12 (F = 9.24, p = 0.003)

    SPSS results for Step 2

[SPSS output for Step 2: Variables Entered/Removed, Model Summary, ANOVA, and Coefficients tables for the regression of SENTENCE on DR_SCORE]


Step 2 (cont.)

[SPSS output: Casewise Diagnostics and Residuals Statistics tables for the Step 2 regression]


    N.B. The residuals are heteroskedastic. Compare

    this scatterplot with the scatterplot of sentence as a

function of dr_score. Notice that the patterns are the

    same.

[Scatterplot of heteroskedastic residuals: standardized residuals against standardized predicted values]


    Step 3

    Calculate the squared residuals

    In SPSS, the unstandardized residuals are saved as

    res_1.

Step 3 involves squaring the residuals by use of the data transformation procedure in SPSS.

squared residual = (res_1)² = residsq

    The SPSS syntax for this transformation is as

    follows:

    Residsq = res_1**2


    The steps used in this transformation process are

    described in the case study associated with this

    module.

    Step 3 (cont.)

    SPSS results for Step 3

    pre_1 res_1 residsq

    7.76901 -6.76901 45.82

    8.41282 -7.41282 54.95

    8.41282 -7.41282 54.95

    6.48139 -5.48139 30.05

    7.76901 -5.76901 33.28

    7.12520 -5.12520 26.27

    7.76901 -5.76901 33.28

    8.41282 -6.41282 41.12

5.83758 -2.83758 8.05

7.76901 -4.76901 22.74


    2.61852 -.61852 .38

    2.61852 .38148 .15

3.26233 1.73767 3.02

5.19377 1.80623 3.26

    8.41282 -.41282 .17

    7.76901 1.23099 1.52

    7.12520 2.87480 8.26

    6.48139 5.51861 30.46

    7.12520 6.87480 47.26

    7.12520 7.87480 62.01


    Step 4

Regress the squared residuals on the independent variable dr_score and save the predictions as pre_2

residsq = -7.587 + 4.6685 dr_score

R² = 0.065 (F = 4.74, p = 0.0329)

This process is called residualizing a variable.

By OLS definition, the squared residuals (residsq) represent the variance in Y that is unrelated to X.

    Therefore, there should be no significant relationship

    between X and residsq.

    If there is, one or more OLS regression

    assumptions have been violated.

    In this case, the violated assumption is the

    homoskedasticity of the residuals.


    Step 4 (cont.)

    SPSS results for Step 4

[SPSS output for Step 4: Variables Entered/Removed, Model Summary, ANOVA, and Coefficients tables for the regression of RESIDSQ on DR_SCORE]


    Step 5

Compute the absolute value of pre_2 (abspre_2) and three new variables: wtsent, wtdrug, and the weight wtsqroot.

wtsent = sentence / √(abspre_2)

wtdrug = dr_score / √(abspre_2)

wtsqroot = 1 / √(abspre_2)

pre_2 from the previous step is the information in the squared residuals (residsq) that is related to the IV dr_score.

Dividing sentence and dr_score by the square root of abspre_2 reduces the influence of extreme values on the estimation of a and b.

    Finally a third transformation is performed by

    creating the variable wtsqroot.


    This will serve as a weighting factor.


    Step 5 (cont.)

    SPSS results for Step 5

    abspre_2 wtsent wtdr_sco wtsqroot

34.43 .17 1.53 .17

39.10 .16 1.60 .16

39.10 .16 1.60 .16

    25.09 .20 1.40 .20

    34.43 .34 1.53 .17

    29.76 .37 1.47 .18

    34.43 .34 1.53 .17

    39.10 .32 1.60 .16

    20.42 .66 1.33 .22

    34.43 .51 1.53 .17

    20.42 .66 1.33 .22

    39.10 1.28 1.60 .16

    34.43 1.53 1.53 .17

    29.76 1.83 1.47 .18

    25.09 2.40 1.40 .20

    29.76 2.57 1.47 .18

    29.76 2.75 1.47 .18


    Step 6

    Compute the WLS regression

    wtsent = a(wtsqroot) + b (wtdrug)

    Results:

wtsent = 1.159 (wtsqroot) + 0.7833 (wtdrug)

R² = 0.674 (F = 70.29, p = 0.0001)


    Step 6 (cont.)

Model Summary

[SPSS Model Summary table for the weighted regression of WTSENT on WTSQROOT and WTDR_SCO. Note from the output: for regression through the origin (the no-intercept model), R Square measures the proportion of the variability in the dependent variable about the origin explained by regression; this CANNOT be compared to R Square for models which include an intercept.]


    Step 6 (cont.)

[SPSS output: Casewise Diagnostics table for WTSENT]


    N.B. The heteroskedasticity has been reduced.

[Scatterplot: residuals of wtsent regressed on wtdrug, plotted against wtdrug]


    Comparison of the OLS v Residualized

    Regression Models

Comparing the scatterplots, notice the substantial reduction of heteroskedasticity.

Statistical results

Method         a      b      SEa    SEb    R²      p
OLS            1.975  0.644  1.425  0.212  0.1196  0.0034
Residualized   1.159  0.783  0.605  0.139  0.6740  0.0001

The residualized model is more efficient; its SEs are smaller.

Comparison of 95% confidence intervals

Method         95% Confidence Interval   Difference
OLS            0.221 to 1.066            0.845
Residualized   0.504 to 1.062            0.558


    N.B. The width of the residualized 95% confidence interval

    is less than that of the OLS interval.
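These intervals follow from b ± t·SE_b with df = n − k = 68; a quick check of the arithmetic (an added sketch, assuming scipy):

    from scipy import stats

    t_crit = stats.t.ppf(0.975, df=68)               # two-sided 95% critical value
    for name, b, se in [("OLS", 0.644, 0.212), ("Residualized", 0.783, 0.139)]:
        lo, hi = b - t_crit * se, b + t_crit * se
        print(name, round(lo, 3), "to", round(hi, 3), "width", round(hi - lo, 3))
    # reproduces roughly 0.22 to 1.07 (width ~0.85) and 0.51 to 1.06 (width ~0.56)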


    An Alternative Procedure for Correcting

    Heteroskedasticity

Log-Likelihood Estimation of w_i

If it can be assumed that the variance in the DV is proportional to the IV,

log-likelihood estimation can be used to estimate w_i.

In this case it is assumed that

S_Y² ∝ X^w or S_Y² ∝ (1 / X^w)

(∝ is read "proportional to")

In log-likelihood estimation of w_i, the question is:


What power of X, i.e. w_i, is most likely to have produced the proportional relationship between S_Y² and X?


SPSS Weight Estimation and WLS>> Regression Procedures

This procedure begins by using log-likelihood estimation to iteratively determine a weight w_i

To be used in estimating the values of the regression constant (a) and the regression coefficient (b)

Such that the RSS is minimized.

RSS = Σ [ (1 / X^w_i) (Y − Ŷ)² ]

This may solve the heteroskedasticity problem if:

S_Y² ∝ X^w or S_Y² ∝ (1 / X^w)
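The idea behind the iterative search can be sketched as follows (an added illustration, assuming numpy and assuming Var(e_i) = σ²·X_i^w; the grid of candidate powers mirrors the SPSS output shown later, and the exact SPSS likelihood may differ by constants):

    import numpy as np

    def profile_loglik(x, y, power):
        """Profile log-likelihood for the model Var(e_i) = sigma^2 * x_i**power."""
        v = x ** power
        root = np.sqrt(v)
        A = np.column_stack([1.0 / root, x / root])      # weighted design (columns for a and b)
        coef, *_ = np.linalg.lstsq(A, y / root, rcond=None)
        a, b = coef
        resid = y - (a + b * x)
        n = len(y)
        sigma2 = np.mean(resid ** 2 / v)                 # ML estimate of sigma^2
        return (-0.5 * n * np.log(2 * np.pi * sigma2)
                - 0.5 * power * np.sum(np.log(x)) - 0.5 * n)

    rng = np.random.default_rng(6)
    x = rng.uniform(1, 12, 70)
    y = 1 + 0.8 * x + rng.normal(0, 0.3 * x ** 0.9)      # error variance grows roughly as x^1.8

    powers = np.arange(-3.0, 3.01, 0.2)                  # grid of candidate powers
    best = max(powers, key=lambda p: profile_loglik(x, y, p))
    print("power maximizing the log-likelihood:", round(best, 1))   # typically near 1.8 here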


    Step 1

Estimation of the weight w_i using the SPSS weight estimation procedure

The result

The most likely weight = 1.8

The variance in sentence is estimated to be

S_Y² ∝ (dr_score)^1.8

    Regression equation

    sentence = 0.94 + 0.83 (dr_score)


For a drug score of 6, the prediction would be

sentence = 0.94 + 0.83 (6) = 5.92 years

Step 1 (cont.)

Examination of weights for individual subjects

Subject   dr_score   Weight
Jones     10         1/(10)^1.8 = 0.01585
Smith     1          1/(1)^1.8 = 1.00
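In code these per-case weights are simply 1 / dr_score**1.8 (a trivial check of the arithmetic above):

    for name, dr_score in [("Jones", 10), ("Smith", 1)]:
        print(name, round(1 / dr_score ** 1.8, 5))   # 0.01585 for Jones, 1.0 for Smith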


    Step 1 (cont.)

    SPSS results

Weighted Least Squares, MODEL: MOD_1

Source variable.. DR_SCORE   Dependent variable.. SENTENCE

[SPSS weight estimation output: the log-likelihood function is evaluated for POWER values from -3.000 to 3.000 in steps of 0.200]

The value of POWER maximizing the log-likelihood function = 1.800

Log-likelihood estimated weight w_i = 1.8


    Step 1 (cont.)

    Estimation of the weighted regression model

[SPSS weighted regression output: source variable DR_SCORE (POWER value = 1.800), dependent variable SENTENCE; Multiple R, R Square, ANOVA, and Variables in the Equation tables. A new variable WGT_1, "Weight for SENTENCE from WLS, MOD_1 DR_SCORE** -1.800", is created.]

    Weighted equation

    Sentence = 0.9399 + 0.8285 (dr_score)

    Unweighted equation


    Sentence = 1.97 + 0.6438 (dr_score)


    Step 2

    Plot the relationship between dr_score

and the weight w_i

The heteroskedasticity problem

    Recall the previous scatterplot: as the value of drug scores

    increases, the variance in sentences increases as well.

    The log-likelihood estimated weight is such that

    As the value of drug score increases, the weight adjusted

    drug score (wgt_1) decreases.

[Scatterplot of weight-adjusted dr_score (= 1 / dr_score^1.8) against dr_score]


The weight of 1.8 reduces the effect of large errors on the RSS, providing a more efficient estimate of SE_b.


    Step 3

The SPSS WLS>> in linear regression

If an appropriate weight w_i is already known by another means

The WLS>> procedure in SPSS linear regression can be used instead of the SPSS weight estimation procedure

The procedure

Simply specify the regression model

Enter the known weight variable under the WLS>> command and estimate the model

In this case, the weight variable wgt_1 from Step 2 will be used


    Step 3 (cont.)

The results of the WLS>> analysis using the weight variable wgt_1 (w_i = 1.8), with regression through the origin

R² = 0.668, F = 139.14, p = 0.0001

sentence = 1.036 (dr_score)

SE_b = 0.087

N.B. Since this model does not include a constant (a), the R² and the other statistical results cannot be compared with the associated values of a model that does use a constant (a).
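If the weight variable is already available, the same kind of fit can be sketched outside SPSS with any WLS routine (an added illustration, assuming statsmodels; the simulated dr_score and sentence stand in for the case-study data):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    dr_score = rng.uniform(1, 12, 70)
    sentence = 1 + 0.8 * dr_score + rng.normal(0, 0.3 * dr_score ** 0.9)

    wgt_1 = 1.0 / dr_score ** 1.8                        # known weight variable (w_i = 1.8)
    # regression through the origin: no constant is added to the design
    fit = sm.WLS(sentence, dr_score, weights=wgt_1).fit()
    print(fit.params, fit.bse)                           # slope and its standard error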


    Step 3 (cont)

    SPSS results

[SPSS output: Variables Entered/Removed table for the weighted regression of SENTENCE on DR_SCORE]


    Step 3 (cont)

[SPSS output: Coefficients table for the weighted regression of SENTENCE on DR_SCORE]


    Step 3 (cont)

Saved predicted and residual values, and the weighted values of dr_score (i.e. wgt_1)

wgt_1 = 1 / (dr_score)^1.8

[Listing of wgt_1, pre_1, and res_1 for the first cases]


    Step 4

    Variable transformations and

    plot of the residuals

Unfortunately, the weighted residuals and predictions produced by the SPSS weight estimation and WLS>> procedures

Cannot be directly graphed from the saved residuals and predictions

The residuals and the predictions must first be transformed as follows:

Transformed residual = (res_1) (wt)^0.5

Transformed prediction = (pre_1) (wt)^0.5
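In code form, the transformation is just a square-root weighting of the saved variables (an added sketch; res_1, pre_1, and wgt_1 are the saved SPSS columns this module refers to):

    import numpy as np

    def transform_for_plot(res_1, pre_1, wgt_1):
        """Return the transformed residuals and predictions for the weighted residual plot."""
        transres = res_1 * np.sqrt(wgt_1)        # transformed residual = res_1 * wgt_1**0.5
        transpre = pre_1 * np.sqrt(wgt_1)        # transformed prediction = pre_1 * wgt_1**0.5
        return transres, transpre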


    Step 4 (cont.)

    SPSS results

Compare the degree of heteroskedasticity in this scatterplot with the plot of the residuals from the unweighted regression model.

[Scatterplot of the transformed residuals against the transformed weighted predictions]


    Notice the substantial change in the degree of

    heteroskedasticity.


    Step 4 (cont.)

    Transformed variables transres and transpre

    transres = res_1*sqrt(wgt_1)

    transpre = pre_1*sqrt(wgt_1)

transres transpre

[Listing of transres and transpre for the first cases]


Comparison of Results:
OLS, Residualized, and Log-Likelihood Models

Method          a      b      SEa    SEb
OLS             1.975  0.644  1.425  0.212
Residualized    1.159  0.783  0.605  0.139
Log-Likelihood  0.940  0.828  0.394  0.121

N.B. The standard errors of the residualized and log-likelihood models are lower than those of the OLS model. The log-likelihood model produces smaller standard errors than the residualized model.
