I271B QUANTITATIVE METHODS Regression and Diagnostics


Page 1: I271B QUANTITATIVE METHODS, Regression and Diagnostics

Page 2: Regression versus Correlation

Correlation makes no assumption about whether one variable depends on the other; it is only a measure of general association.

Regression attempts to describe the dependence of a single dependent variable on one or more explanatory variables. It assumes a one-way causal link from X to Y.

Thus, correlation is a measure of the strength of a relationship (ranging from -1 to 1), while regression measures the exact nature of that relationship (e.g., the specific slope, which is the change in Y for a given change in X).
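One way to connect the two (a standard identity, not on the slide): in simple regression, the slope is the correlation rescaled by the standard deviations of Y and X:

    $b_1 = r \, (s_Y / s_X)$

So a correlation of r = 0.5 with $s_Y = 10$ and $s_X = 2$ implies a slope of 2.5: each one-unit increase in X predicts a 2.5-unit increase in Y.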

Page 3: Basic Linear Model

Page 4: Basic Linear Function
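The equations on these two slides did not survive the transcript. A hedged reconstruction, consistent with the model written out on the inference slide below: the deterministic linear function, and the statistical model that adds an error term:

    Linear function: $y = b_0 + b_1 x$
    Linear model:    $Y_i = b_0 + b_1 x_i + e_i$

Here $b_0$ is the intercept (the value of y when x = 0) and $b_1$ is the slope (the change in y for a one-unit change in x).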

Page 5: Slope

But...what happens if B is negative?
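A quick worked example (not from the slide): with $y = 10 - 2x$, we get $x = 0 \Rightarrow y = 10$ and $x = 3 \Rightarrow y = 4$. A negative slope means the line runs downhill: each one-unit increase in X predicts a 2-unit decrease in Y.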

Page 6: Statistical Inference Using Least Squares

We obtain a sample statistic, b, which estimates the population parameter.

We also have the standard error for b.

Hypothesis testing uses the standard t-distribution with n - 2 degrees of freedom.

$Y_i = b_0 + b_1 x_i + e_i$
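A minimal sketch of this in Stata (the package the deck uses elsewhere), on Stata's bundled auto dataset; the variables are illustrative, not from the lecture:

    sysuse auto, clear     // load Stata's built-in example dataset
    regress price weight   // OLS fit of price = b0 + b1*weight + e
    // the output lists b, its standard error, and the
    // t statistic t = b / se(b), tested with n-2 df

The regress output also reports the p-value and 95% confidence interval for each coefficient.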

Page 7: Why Least Squares?

For any Y and X, there is one and only one line of best fit. The least squares regression equation minimizes the squared error between our observed values of Y and our predicted values of Y (often called y-hat).
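Formally (a standard statement of the criterion, not transcribed from the slide), least squares chooses the intercept and slope that minimize the sum of squared residuals:

    $\min_{b_0, b_1} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, where $\hat{y}_i = b_0 + b_1 x_i$

Squaring penalizes large misses heavily and keeps positive and negative errors from canceling out.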

Page 8: Data points and Regression

http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html
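To reproduce the demo's picture in Stata (scatter and lfit are real commands; the auto variables are illustrative):

    sysuse auto, clear
    // overlay the data points with the least-squares fit line
    twoway (scatter price weight) (lfit price weight)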

Page 9: Multivariate Regression

Control Variables

Alternate Predictor Variables

Nested Models


Page 10: Nested Models
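The slide body is not in the transcript. As a hedged sketch of what a nested-model comparison looks like in Stata (nestreg is a real Stata prefix command; the auto variables are illustrative):

    sysuse auto, clear
    // fit price on weight, then add mpg and foreign as a block;
    // nestreg reports an F test for whether the added block
    // significantly improves the fit
    nestreg: regress price (weight) (mpg foreign)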

Page 11: Regression Diagnostics

Page 12: Lab #4

Stating Hypotheses

Interpreting Hypotheses

Terminology

Appropriate statistics and conventions

Effect Size (revisited): Cohen's d and the .2, .5, .8 interpretation values (see the formula below)

See also: http://web.uccs.edu/lbecker/Psy590/es.htm for a very nice lecture and discussion of the different types of effect size calculations
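For reference (a standard definition, not spelled out on the slide), Cohen's d for two groups is the mean difference in pooled-standard-deviation units:

    $d = (\bar{x}_1 - \bar{x}_2) / s_{pooled}$, with $s_{pooled} = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}$

The .2, .5, .8 values are conventionally read as small, medium, and large effects.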

Page 13: Multicollinearity

Occurs when an IV is very highly correlated with one or more other IVs. Caused by many things (including variables computed from other variables in the same equation, using different operationalizations of the same concept, etc.).

Consequences: For OLS regression, multicollinearity does not violate the assumptions, but standard errors will be much, much larger than normal (confidence intervals become wider, t-statistics become smaller).

We often use VIF (variance inflation factor) scores to detect multicollinearity (see the Stata sketch after this list). Generally, a VIF in the 5-10 range signals a possible problem, and higher values are considered clearly problematic.

Solving the problem: Typically, regressing each IV on the other IVs is a way to find the problem variable(s).
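A minimal VIF check in Stata (estat vif is a real post-estimation command; the auto variables are illustrative):

    sysuse auto, clear
    regress price weight length displacement
    // report the variance inflation factor for each IV;
    // values in the 5-10 range or above flag IVs that are
    // highly correlated with the other IVs
    estat vif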

Page 14: Heteroskedasticity

OLS regression assumes that the variance of the error term is constant. If the error does not have a constant variance, then it is heteroskedastic.
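Formally (standard notation, not from the slide), the assumption and its violation:

    Homoskedasticity:   $Var(e_i \mid x_i) = \sigma^2$ for all $i$
    Heteroskedasticity: $Var(e_i \mid x_i) = \sigma_i^2$ (varies across cases)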

Where it comes from:

The error may really change as an IV increases

Measurement error

An underspecified model

Page 15: Heteroskedasticity (continued)

Consequences: We still get unbiased parameter estimates, but our line may not be the best fit. Why? Because OLS gives more 'weight' to the cases that might actually have the most error from the predicted line.

Detecting it: We have to look at the residuals (the differences between the observed responses and the predicted responses).

First, use a residual-versus-fitted plot (in STATA, rvfplot) or a residual-versus-predictor plot (in STATA, rvpplot), which plots the residuals against one of the independent variables. We should see an even band across the 0 line, indicating that our error is roughly equal.

If we are still concerned, we can run a test such as the Breusch-Pagan/Cook-Weisberg test for heteroskedasticity. It tests the null hypothesis that the error variances are all EQUAL against the alternative hypothesis that there is some difference. Thus, if the test is significant, we reject the null hypothesis and we have a heteroskedasticity problem. A Stata sketch of both checks follows.
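A minimal sketch of both checks in Stata (rvfplot and estat hettest are real post-estimation commands; the auto variables are illustrative):

    sysuse auto, clear
    regress price weight mpg
    rvfplot, yline(0)   // residuals vs. fitted values; look for an even band around 0
    estat hettest       // Breusch-Pagan / Cook-Weisberg test; a small p-value
                        // rejects the null of constant error variance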