Multiple Regression - Extension of Simple Linear Regression


  • Multiple Regression

    Extension of simple linear regression - using multiple predictors

    Each predictor could help predict or explain additional variability in the response/criterion variable

    However: What should be the effect of using any additional predictors?

  • Multiple Regression

    What should be the effect of using additional predictors?

    Logically, unless its correlation with the DV is 0, each predictor will improve prediction (explain additional variance in the DV)

    So just adding variables as predictors at random will usually improve the model

    This creates potential for misuse of the strategy

  • IDEALLY, each predictor should be:
    - correlated with the DV
    - uncorrelated with the other predictors (r over .8 undesirable)

    Each predictor should explain some unique variability in the DV

    Each predictor should make sense!

  • Best situation: CLEAR THEORY or LOGIC determines the predictors selected

    Examples

    Relationship Commitment:
    - satisfaction with outcomes (+)
    - investments in relationship (+)
    - attractiveness of available alternatives (-)

    Job Satisfaction:
    - salary
    - physical conditions
    - social conditions

  • Simple linear regression: Yp = a + bX (+ residuals)

    Multiple regression:

    Yp = a + b1X1 + b2X2 (+ residuals)

    a is the value of Y when all X = 0 (regression constant)
    bs are partial regression coefficients - the slope for each predictor when the other predictors are held constant
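    The equation can be made concrete with a small Python sketch using statsmodels, on made-up data (the variable names are illustrative, not from the course datasets); later sketches below reuse this fitted model and these arrays:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 2.0 + 0.5 * X1 + 1.5 * X2 + rng.normal(size=n)  # data with a known "true" model

X = sm.add_constant(np.column_stack([X1, X2]))  # prepend the intercept column for a
model = sm.OLS(Y, X).fit()
print(model.params)    # [a, b1, b2]: intercept and partial regression coefficients
print(model.rsquared)  # R^2 for the two-predictor model
```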

  • Graph of relationship when two predictors are used

    Now try to fit a plane rather than a line to minimize the errors of prediction

  • Multiple regression

    Yp = a + b1X1 + b2X2 + b3X3 (+ residuals)

    Commitment = a + b1(satisfaction) + b2(investments) + b3(alternatives) (+ residuals), where b3 is expected to be negative

    A weighted linear combination of predictors

    Compare to an ANOVA main-effects-only model

  • Let's return to the question of predicting Exam 2 grades using multiple predictors

    Undergraduate GPA (0-4 scale)
    GRE Verbal (200-800 scale)
    GRE Quantitative (200-800 scale)
    Exam 1 grade (0-100 scale)
    Mean Homework grade (0-10 scale)

    Note the variety of scales across the predictors; the weights (partial regression coefficients) will vary to take those into account

  • Ideally, all predictors are related to the criterion, and are unrelated to each other

  • Exam 2 (Pred) = 16.06 + .19(gpa) - .00(grev) + .00(greq) + .44(exam1) + 3.67(homework)

    Using just the Exam 1 score, the correlation between Exam 1 and Exam 2 was r = .637, r² = .406

    Now the R between the set of predictors and Exam 2 is .735, and R² = .540

    Since gpatot, grev, and greq were all not significant, should they be excluded from the equation?

  • Assumptions - never likely to satisfy all

    Essentially the same as for r, but at a multivariate level

    Independent Observations
    Interval/Ratio Data (or at least pretend)
    Normality - all Predictors (Xs) and the Response (Y); errors of prediction are normally distributed
    Linearity - all Xs have a linear relationship with Y; errors of prediction/predicted scores are linear
    Equality of Variances (Homoscedasticity) - the variability of the errors of Y is the same at all values of X

  • Assumptions can be evaluated within SPSS at the multivariate level. In the Regression window, choose Plots and request *ZRESID (Y) and *ZPRED (X). The tables at the right demonstrate the patterns that would indicate each violation, although deciding when there is enough discrepancy is still subjective. From Tabachnick & Fidell (2007). Using Multivariate Statistics (5th ed.). Boston: Allyn & Bacon. Example to follow.
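    Outside SPSS, a rough analogue of the ZPRED-vs-ZRESID plot can be drawn by hand. A minimal sketch, continuing with the model fitted above:

```python
import matplotlib.pyplot as plt
from scipy import stats

zpred = stats.zscore(model.fittedvalues)   # standardized predicted values
zresid = stats.zscore(model.resid)         # standardized residuals

plt.scatter(zpred, zresid)
plt.axhline(0, linestyle="--")
plt.xlabel("Standardized predicted (ZPRED)")
plt.ylabel("Standardized residual (ZRESID)")
plt.show()  # a shapeless horizontal band is good; funnels or curves suggest violations
```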

  • Predicting Rated Distress - (1) none to (9) extreme - when a partner is emotionally unfaithful, using Age and Rated Distress over Sexual Infidelity as predictors. All 3 variables are skewed.

  • Other Considerations in Multiple Regression

    Truncated Range - same as with r, can lead to a poor assessment of the real R

    Outliers due to multivariate deviation
    - Discrepancy (distance): outlier on the criterion
    - Leverage: outlier on the predictors
    - Influence: combines D & L to assess influence on the solution (change in regression coefficients if the case is deleted)

  • How these would appear in a simple linear regression situation. From Tabachnick & Fidell (2007). Using Multivariate Statistics (5th ed.). Boston: Allyn & Bacon.

  • Other Considerations in Multiple Regression - Outliers due to multivariate deviation

    A simple diagnostic for Influence is to request the Cook's Distance statistic in the Regression window, Save option. Values over 1 would suggest potentially strong influence.

  • Cook's distance = 92.6. Note that the residual for the outlier is not great, but it has strong influence on the solution. (Figure: regression line with the outlier vs. line without the outlier.)
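    The same diagnostic is available outside SPSS; a sketch using statsmodels' influence measures on the model fitted earlier:

```python
from statsmodels.stats.outliers_influence import OLSInfluence

influence = OLSInfluence(model)
cooks_d = influence.cooks_distance[0]  # returns (distances, p-values); keep the distances
print((cooks_d > 1).sum(), "case(s) exceed the D > 1 rule of thumb")
```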

  • Other Considerations in Multiple Regression

    Sample Size - if too small, may get good but meaningless prediction (too little variability)

    Minimum sample sizes recommended (to detect moderate effect sizes, 13%, with power of approximately .80):
    (Green, S. B. (1991). How many subjects does it take to do a regression analysis? Multivariate Behavioral Research, 26, 499-510.)

    For a test of the model: n = 50 + 8p
    For a test of individual predictors in the model: n = 104 + p
    (p = number of predictors; e.g., with p = 5 predictors, n = 50 + 40 = 90 for the model test and n = 104 + 5 = 109 for individual predictors)

    Can also conduct a power analysis based on the effect size you desire to select your sample size

  • Other Considerations in Multiple Regression

    Multicollinearity or Singularity

    Singularity - when one predictor is a combination of the other predictors included

    Multicollinearity - when other predictors can account for a high degree of variability in a predictor

  • Other Considerations in Multiple Regression

    Diagnostics for Multicollinearity or Singularity

    Tolerance is used as the diagnostic statistic

    If the other predictors are used to predict a predictor, what variance is shared? Tolerance is reported as 1 - R², so closer to 1 is better; less than .2 indicates a problem

    Variance Inflation Factor (VIF) is also used. It is the reciprocal of Tolerance, so it can range from 1 up. It reflects the degree to which the standard error of b is increased due to correlations among the predictors.
    - value of 4: cause for some concern
    - value of 10: serious problem
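    A sketch of both diagnostics in Python: statsmodels' variance_inflation_factor regresses each column of the design matrix on the others (reusing the design matrix X from the first sketch):

```python
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Skip column 0, the constant: a VIF for the intercept is not meaningful here
for j in range(1, X.shape[1]):
    vif = variance_inflation_factor(X, j)
    print(f"predictor {j}: VIF = {vif:.2f}, Tolerance = {1 / vif:.2f}")
```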

  • Assessing the Outcome

    Testing the Overall Model as a single outcome

    How well does the set of predictors (Xs) predict the criterion (Y)?

    Ho: all bs = 0 (all partial regression coefficients = 0)
    Or
    Ho: R = 0 (the Multiple Correlation Coefficient = 0)

    R = correlation of actual Y with weighted linear combination of predictors (Xs)

    Or, since the weighted linear combination leads to predicted scores,

    R = correlation of actual Y with predicted Yp

  • Reminder: Partitioning the Variability in Y

    SStotal = Σ(Y - MeanY)²
    variability of the Y scores from the mean

    Separated into:

    SSregression = Σ(Yp - MeanY)²
    improvement in predictions when using X (variability in Y explained by X), rather than assuming everyone gets the Mean

    SSresidual = Σ(Y - Yp)²
    degree to which predictions do not match the actual scores (the prediction errors that have been minimized)

  • (Figure: example from simple linear regression. Mean IQ = 105 - the best guess for every person if you had no useful predictor; Mean GPA = 3.06. Annotations mark the improvement in prediction using GPA and the residual distance from the prediction line, which is much greater for some cases.)

  • Test using F - similar to simple linear regression

    Partition SStotal into SSregression (explained by the weighted combination)

    and SSresidual (unexplained)

    F = (SSregression / dfregression) / (SSresidual / dfresidual)
    dfregression = (p + a) - 1 = p        dfresidual = n - p - 1
    (number of parameters in the model = predictors + intercept; dfregression is often indicated simply as p, since there is always only one a: df = p + 1 - 1 = p)

    F = MSregression / MSresidual = explained (systematic + unsystematic) / unexplained (unsystematic)

    Was R reliably different from 0? Yes, if F is significant

    Recall: Standard Error of the Estimate = SQRT(MSresidual)
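    These quantities can be read straight off a fitted statsmodels result; a sketch, continuing with the model (p = 2 predictors) from the first example:

```python
import numpy as np

p = 2
ss_reg = model.ess  # SS_regression (statsmodels calls this the explained sum of squares)
ss_res = model.ssr  # SS_residual (sum of squared residuals)
F = (ss_reg / p) / (ss_res / (model.nobs - p - 1))
print(F, model.fvalue)           # the hand computation should match statsmodels' F
print(np.sqrt(model.mse_resid))  # Standard Error of the Estimate = sqrt(MS_residual)
```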

  • R² = SSregression / SStotal = explained variability / total variability

    % of variance accounted for by the model (see next slide for ANOVA example)

    Adjusted R² is a better estimate for the population, adjusted based on the number of predictors and the sample size:
    Adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1)
    so it is lower with a small sample but many predictors

    Can use R² for describing a sample
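    A one-function sketch of that adjustment (the R² and p below echo the Exam 2 example; the n is made up for illustration):

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Shrink R^2 based on sample size n and number of predictors p."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# R^2 = .540 with p = 5 predictors, as in the Exam 2 model; n = 60 is illustrative
print(adjusted_r2(0.540, 60, 5))  # ~0.497: noticeably lower with few cases per predictor
```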

  • Example from Handout Packet (page approx. 47): test of a model in which 3 predictors are used to predict the rating on Sensitive, the DV

  • In some cases, the purpose of the regression analysis is simply to see if the Model works.

    Does it explain variance in the criterion? Can it be used to make predictions?

    Thus, the overall test of the model is all you need, and you can interpret R² or R²adj

    and the SEE, if you plan predictions

    In other cases, you might want to know how the individual predictors contributed to the overall model.

  • Assessing the contribution of individual predictors

    Dependent upon the set of predictors included!

    Partial regression coefficient - can test to see if b = 0

    Is b = 0 (slope = 0) when the other predictors are held constant?

    Tested using a t test with df = n - p - 1

    Beta - the partial regression coefficient when all variables are standardized (the standardized slope). If b is significant, so is beta

    Test of Partial Regression Coefficient is like a typical statistical test of significance - it is or is not significant, and is influenced by sample size

  • Can also evaluate predictors based on effect size measures (practical significance)

    These would be significant if b is significant

    Partial correlation (pr), as described in the simple covariation section

    correlation of predictor (X1) with DV (Y) after removing the variance in both explained by the other predictors

    So both X1 and Y are adjusted before the correlation is calculated; all other Xs are partialed out of X1 and Y

    pr² - shared variance within context: what % of the variability in Y does X1 explain after the other variables' contributions to explaining both are removed?

    There is less than 100% of the variability of Y left for X1 to explain

  • Semi-partial (part) correlation (sr)

    correlation of predictor (X1) with DV (Y) after removing the variance of X1 shared with the other predictors
    - So X1 is adjusted by removing variance shared with the other Xs
    - But all variability in Y is left to be explained
    - Assesses the unique contribution of X1 to explaining Y
    - There is 100% of the variability in Y to explain for each X in the model
    - sr² is considered the best measure of individual predictor importance (practical significance)
    - R² will be lowered by sr² for a predictor when it is removed from the model

    (BOTH pr AND sr ARE STILL DEPENDENT ON THE MODEL USED) WHY?

  • (Venn diagram: the variability of the DV (Y) overlaps with IV1 and IV2. Regions: a = variability shared only by the DV and X1; b = DV variability shared with both X1 and X2; c = variability shared only by the DV and X2; d = DV variability left unexplained.)

    Squared partial correlation (X1) = a / (a + d)
    Squared semi-partial correlation (X1) = a / (a + b + c + d)
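    Both indices can also be computed from the R² values of the full model and a reduced model that omits X1. A sketch using the arrays from the first example; this recovers the squared versions, and pr and sr themselves are the square roots carrying the sign of b1:

```python
import numpy as np
import statsmodels.api as sm

full = sm.OLS(Y, sm.add_constant(np.column_stack([X1, X2]))).fit()
reduced = sm.OLS(Y, sm.add_constant(X2)).fit()  # everything except X1

sr2 = full.rsquared - reduced.rsquared  # unique Y-variance explained by X1 (region a)
pr2 = sr2 / (1 - reduced.rsquared)      # same, relative to the Y-variance left: a/(a + d)
print(f"sr^2 = {sr2:.3f}, pr^2 = {pr2:.3f}")
```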

  • Types of Multiple Regression

    Standard - all predictors entered together

    The contribution of each depends on the others in the group

    Assumes the other variables would usually be there and/or are relevant

    Examples:
    - Four Humor Styles
    - Investment Model Variables
    - Big Five Personality Dimensions

  • Hierarchical Regression - enter in a planned sequence
    Can enter individual predictors one at a time
    Or enter groups of variables at separate steps
    As new predictors are added, each one can only explain variability that is left

    Assess the change in R² at each step (did it increase significantly?) and the overall model when done

    Example - predicting adult IQ:
    - Parental IQ
    - Prenatal experience
    - Early infant experience
    - Education

  • Statistical Methods

    Let the data determine inclusion in the model, rather than a logical or theoretical plan

    Assess each step by evaluating change in R or R2

    Usually an exploratory tool in possible model building

    Requires a larger sample to have confidence (40 cases per predictor)

  • Stepwise

    Begins with the single best predictor

    Adds the next best, and assesses whether the model is better

    At each step, each variable is reassessed, and might be kept or removed
    Stops when adding additional variables does not significantly improve the model (R)

  • Forward inclusion

    Begins with the single best predictor

    Adds the next best, and assesses the improvement

    Variables are only added if they improve the model (R); once in, they stay in

    Backward exclusion

    Begins with the full model

    Removes the weakest contributor and assesses the loss

    Keeps removing unless there is a significant drop in R
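    A bare-bones sketch of the forward-inclusion idea, selecting by raw R² gain against a fixed threshold (a real procedure would use F-test or p-value entry criteria); it reuses Y, X1, and X2 from the first example:

```python
import numpy as np
import statsmodels.api as sm

def forward_select(y, predictors, min_gain=0.01):
    """predictors: dict of name -> 1-D array. Greedily add the predictor with
    the largest R^2 gain; stop when no candidate adds at least min_gain."""
    chosen, best_r2 = [], 0.0
    remaining = list(predictors)
    while remaining:
        gains = {}
        for name in remaining:
            cols = np.column_stack([predictors[k] for k in chosen + [name]])
            gains[name] = sm.OLS(y, sm.add_constant(cols)).fit().rsquared - best_r2
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break
        chosen.append(best)
        remaining.remove(best)
        best_r2 += gains[best]
    return chosen

print(forward_select(Y, {"X1": X1, "X2": X2}))  # order of entry
```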

  • Research questions using Multiple Regression

    Assess the overall model
    Assess individual predictors
    Effects of adding or changing predictors
    - on the overall model
    - on other individual predictors
    Predictions in a new sample

  • Other Multiple Regression issues/applications

    Suppressor variables - variables that improve the model due to correlations with the other predictors, not with the criterion. They suppress variance in another predictor that is noise.

    Evident if the simple r with the criterion is very low but the variable contributes to the model (sr is higher). Can also produce a change in sign from r to b (i.e., positive r but negative b).

  • Other issues/applications

    Mediation Models - the relationship of X to Y is mediated by some other variable

    (Path diagram: Positive use of Humor for self -> Positive Personality (optimistic, hopeful, happy) -> Perceived Stress)

    Positive use of Humor for self (high self-enhancing/low self-defeating) and Perceived Stress:
    - Humor use (H) predicts Perceived Stress (c) - the direct path
    - Humor use predicts Positive Personality (PP) (a)
    - Positive Personality predicts Perceived Stress, with Humor in the model (b)
    - In a hierarchical model, enter PP first, then H; if PP mediates H, H no longer contributes to the model (the direct c path is no longer significant)
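    A sketch of those three regressions on fabricated data in which the mediation is built in (the names H, PP, and S are illustrative only):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
H = rng.normal(size=200)                 # humor use
PP = 0.6 * H + rng.normal(size=200)      # positive personality (the mediator)
S = -0.5 * PP + rng.normal(size=200)     # perceived stress, driven only by PP

c_path = sm.OLS(S, sm.add_constant(H)).fit()                         # H -> S
a_path = sm.OLS(PP, sm.add_constant(H)).fit()                        # H -> PP
b_path = sm.OLS(S, sm.add_constant(np.column_stack([PP, H]))).fit()  # PP + H -> S

# Mediation pattern: a and b significant, while H's coefficient shrinks toward 0
print(c_path.params[1], b_path.params[2])  # c vs. c-prime (H with PP in the model)
```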

  • Other issues/applications

    Moderator Models - the relationship of a predictor with the criterion depends upon some other variable (just like an interaction in ANOVA)

    Yp = a + b1x1 + b2x2 + b3(x1x2) + residuals

    Often requires some modification of the data prior to the analysis: centering variables to avoid multicollinearity (if the predictors do not have true 0 scores). In the equation above, b1x1 and b2x2 are the main effects and b3(x1x2) is the added interaction term.
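    A sketch of that setup, centering the two predictors before forming the product term (arrays from the first example; no moderation effect was built into these data, so b3 should come out near 0):

```python
import numpy as np
import statsmodels.api as sm

x1c = X1 - X1.mean()  # centering gives each predictor a meaningful 0
x2c = X2 - X2.mean()  # and reduces collinearity with the product term

design = sm.add_constant(np.column_stack([x1c, x2c, x1c * x2c]))
mod = sm.OLS(Y, design).fit()
print(mod.params)   # [a, b1, b2, b3]
print(mod.pvalues)  # a significant b3 (last entry) would indicate moderation
```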

  • Best situation: CLEAR THEORY to be tested

    Relationship Commitment (low 8 - 72 high)
    - satisfaction with outcomes (+) (low 3 - 21 high)
    - investments in relationship (+) (Subj: low 6 - 54 high; Obj: none 0 - ?? lots)
    - attractiveness of available alternatives (-) (low 6 - 48 high)

  • In Handout Packet:
    Begin by examining the individual variables for normality, outliers, etc.
    Can request Cook's D to assess outlier influence
    Can check assumptions using the plot from the regression analysis

  • Then look at the simple correlations (r)
    Expect the predictors to correlate with the criterion, but not a lot with each other

  • Check to see how well the model worked
    R and R², and the test of significance - to describe the sample and to generalize to the population
    Standard error of the estimate - the typical residual

  • Now can look at individual predictors
    - check collinearity
    - see which predictors are individually significant
    - look at individual contributions (semi-partial or part r²)

  • Go through example in SPSS
    Look at G*Power
    Stepwise example in Handouts
