Multiple Independent Variables
POLS 300
Butz
Multivariate Analysis
• Problem with bivariate analysis in nonexperimental designs:
– Spuriousness and Causality
• Need for techniques that allow the researcher to control for other independent variables
Multivariate Analysis
• Employed to see how large sets of variables are interrelated.
• Idea is that if one can find a relationship between x and y after accounting for other variables (w and z), we may be able to make a “causal inference”.
Multivariate Analysis
• We know that X and Y may both be caused by Z, producing a spurious relationship.
• Multivariate Analysis allows us to include other variables and test whether there is still a relationship between X and Y.
Multivariate Analysis
• Must ask whether a third variable (and maybe others) is the “true” cause of both the IV and DV
• Experiments can “prove” causation, but only in a laboratory setting…in the “real world” we must use Multivariate Statistical Analyses
• Need to “Control” or “hold constant” other variables to isolate the effect of IV on DV!
Controlling for Other Independent Variables
• Multivariate Crosstabulation – evaluate bivariate relationship within subsets of sample defined by different categories of third variable (“control by grouping”)
• At what level(s) of measurement would we use Multivariate Crosstabulation??
Multivariate Crosstabulation
• Control by grouping: group the observations according to their values on the third variable and…
• then observe the original relationship within each of these groups.
• P. 407/506 – Spending Attitudes and Voting…controlling for Income! – spurious
• Occupational Status and Voter Turnout
• P. 411/510…control for “education”!
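Control by grouping can be sketched in a few lines of Python. This is only an illustration: the records below are made-up counts (not the textbook tables), and the helper names `crosstab_by_group` and `percent_dv` are hypothetical. The data are built so that spending attitude and vote look related overall, but the relationship disappears within each income group.

```python
from collections import defaultdict

# Hypothetical records (invented for illustration, not from the textbook):
# (income group, spending attitude, vote)
records = (
    [("low", "favor", "Dem")] * 6 + [("low", "favor", "Rep")] * 2
    + [("low", "oppose", "Dem")] * 3 + [("low", "oppose", "Rep")] * 1
    + [("high", "favor", "Dem")] * 1 + [("high", "favor", "Rep")] * 3
    + [("high", "oppose", "Dem")] * 2 + [("high", "oppose", "Rep")] * 6
)

def crosstab_by_group(rows):
    """Count (IV, DV) pairs separately within each category of the control."""
    tables = defaultdict(lambda: defaultdict(int))
    for control, iv, dv in rows:
        tables[control][(iv, dv)] += 1
    return tables

def percent_dv(table, iv_value, dv_value):
    """Percent of cases with iv_value that also have dv_value."""
    total = sum(n for (iv, _), n in table.items() if iv == iv_value)
    return 100.0 * table[(iv_value, dv_value)] / total

# Bivariate table: ignore income by putting every case in one group.
overall = crosstab_by_group([("all", iv, dv) for _, iv, dv in records])["all"]
# Multivariate crosstab: one table per income group.
by_income = crosstab_by_group(records)

print("overall:", percent_dv(overall, "favor", "Dem"),
      percent_dv(overall, "oppose", "Dem"))
for group in ("low", "high"):
    print(group + ":", percent_dv(by_income[group], "favor", "Dem"),
          percent_dv(by_income[group], "oppose", "Dem"))
```

If the percentages differ across attitudes overall but match within every income group, the original relationship is a candidate for spuriousness.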
Quick Review: Regression
• In general, the goal of linear regression is to find the line that best predicts Y from X.
• Linear regression does this by estimating a line that minimizes the sum of the squared errors from the line
• Minimizing the vertical distances of the data points from the line.
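As a minimal sketch of what "minimizing the sum of the squared errors" means, the Python below fits a line with the standard closed-form least-squares formulas and then computes the squared vertical distances. The five data points and the function names are hypothetical, chosen only to keep the arithmetic small.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + B*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
print("intercept:", a, "slope:", b)
print("SSE at fitted line:", sum_squared_errors(xs, ys, a, b))
# Any other line, for example one with a slightly different slope,
# gives a larger sum of squared errors.
print("SSE at other line:", sum_squared_errors(xs, ys, a, b + 0.1))
```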
Regression vs. Correlation
• The purpose of regression analysis is to determine exactly what the line is (i.e. to estimate the equation for the line)
• The regression line represents predicted values of Y based on the values of X
Equation for a Line (Perfect Linear Relationship)
Yi = a + BXi
a = Intercept, or Constant = The value of Y when X = 0
B = Slope coefficient = The change (+ or ‑) in Y given a one unit change in X
Slope
• Yi = a + BXi
• B = Slope coefficient
• If B is positive then you have a positive relationship. If it is negative you have a negative relationship.
• The larger the value of B the more steep the slope of the line…Greater (more dramatic) change in Y for a unit change in X
• General Interpretation: For one unit change in X, we expect a B change in Y on average.
Calculating the Regression Equation For “Threat Hypothesis”
• The estimated regression equation is:
E(welfare benefit1995) = 422.7879 + [(-6.292) * %black(1995)]
Number of obs = 50
F( 1, 64) = 76.651
Prob < = 0.001
R-squared = 0.3361
------------------------------------------------------------------------------
welfare1995  |     Coef.   Std. Err.        t      P<     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Black1995(b) |  -6.29211    .771099    -8.162   0.001     -8.1173     -4.0746
_cons(a)     |  422.7879   12.63348    25.551   0.001   317.90407    336.6615
------------------------------------------------------------------------------
Regression Example: “Threat Hypothesis”
• To generate a predicted value for various % of AA in 1995, we could simply plug in the appropriate X values and solve for Y.
10% E(welfare benefit1995) = 422.7879 + [(-6.292) * 10] = $359.87
20% E(welfare benefit1995) = 422.7879 + [(-6.292) * 20] = $296.95
30% E(welfare benefit1995) = 422.7879 + [(-6.292) * 30] = $234.03
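The plug-in step can also be written as a small Python function. The intercept and slope are the ones in the estimated equation; the function name `predicted_benefit` is just a label chosen for this sketch.

```python
A = 422.7879   # intercept (a) from the estimated equation
B = -6.292     # slope (b) on %black(1995), as rounded in the equation

def predicted_benefit(pct_black):
    """Predicted 1995 welfare benefit for a given % black population."""
    return A + B * pct_black

for pct in (10, 20, 30):
    print(pct, round(predicted_benefit(pct), 2))
```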
Regression Analysis and Statistical Significance
• Testing for statistical significance for the slope
– The p-value - probability of observing a sample slope value (Beta Coefficient) at least as large in magnitude as the one we are observing in our sample IF THE NULL HYPOTHESIS IS TRUE
– P-values closer to 0 indicate stronger evidence against the null hypothesis (P < .05 usually the threshold for statistical significance)
– Based on t-value…(Beta/S.E.) = t
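A short sketch of the t-value calculation, using the coefficient and standard error reported for Black1995 in the regression output. The function name is hypothetical, and the "greater than about 2" cutoff is only a rough large-sample rule of thumb, not an exact test.

```python
def t_value(coef, std_err):
    """t statistic for a slope: coefficient divided by its standard error."""
    return coef / std_err

# Values copied from the Black1995 row of the regression output.
t = t_value(-6.29211, 0.771099)
print(round(t, 2))

# Rough rule of thumb (an assumption, not an exact test): with a reasonably
# large sample, |t| greater than about 2 corresponds to p < .05, two-tailed.
print(abs(t) > 2)
```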
Multiple Regression
• At what level(s) of measurement would we employ multiple regression???
• Interval and Ratio DVs
• Now working with a new model:
• Yi = a + b1X1i + b2X2i + … + bkXki + ei
Multiple Regression
• Yi = a + b1X1i + b2X2i + … + bkXki + ei
• b are “Partial” slope coefficients.
• a is the Y-Intercept.
• e is the Error Term.
Slope Coefficients
• Slope coefficients are now Partial Slope Coefficients, although we still refer to them generally as slope coefficients. They have a new interpretation:
• “The expected change in Y given a one‑unit change in X1, holding all other variables constant”
Multiple Regression
• By “holding constant” all other X’s, we are therefore “controlling for” all other X’s, and thus isolating the “independent effect” of the variable of interest.
• “Holding Constant” – group observations according to levels of X2, X3, etc.…then look at impact of X1 on Y!
• This is what Multiple Regression is doing in practice!!!
• Make everyone “equal” in terms of “control” variable then examine the impact of X1 on Y!
“Holding Constant” other IVs
• Income (Y) as a function of Education (X1) and Seniority (X2)
• Look at relationship between Seniority and Income WITHIN different levels of education!!! “Holding Education Constant”
• Look at relationship between Education and Income WITHIN different levels of Seniority!!! “Holding Seniority Constant”
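The idea of "holding seniority constant" can be sketched in Python by comparing the education slope across all cases with the education slope inside each seniority group. Everything here is hypothetical: the six cases are invented, and income is generated without error from illustrative coefficients (5666, 432, 281) so that the true partial effect of education is known to be 432.

```python
def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical cases: (education in years, seniority in years).
# More senior people also tend to have more education in this sample.
cases = [(8, 0), (10, 0), (12, 0), (12, 10), (14, 10), (16, 10)]
# Income generated with no error term from illustrative coefficients.
incomes = [5666 + 432 * edu + 281 * sen for edu, sen in cases]

# Bivariate slope: ignores seniority, so it mixes the two effects.
overall = slope([edu for edu, _ in cases], incomes)

# Slope of income on education WITHIN each level of seniority.
within = {}
for level in sorted({sen for _, sen in cases}):
    xs = [edu for edu, sen in cases if sen == level]
    ys = [inc for (edu, sen), inc in zip(cases, incomes) if sen == level]
    within[level] = slope(xs, ys)

print("ignoring seniority:", overall)
print("within seniority groups:", within)
```

In this constructed sample the bivariate slope overstates the effect of education, while the within-group slopes recover it.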
The Intercept
Yi = a + b1X1i + b2X2i + … + bkXki + ei
Y-Intercept (Constant) value…(a)…is now the expected value of Y when ALL the Independent Variables are set to 0.
Testing for Statistical Significance
• Proceeds as before – a p-value (the probability of a sample slope at least this far from 0 if the null hypothesis is true) is generated for each sample slope coefficient
• Based on “t-value” (Beta/ S.E.)
• And Degrees of Freedom!
Fit of the Regression
• R-squared value – the proportion of variation in the dependent variable explained by ALL of the independent variables combined
• (TSS – ResSS) / TSS… “Explained Variation in DV divided by Total Variation in DV”
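A minimal sketch of the R-squared calculation, taking TSS as the total sum of squares around the mean of Y and ResSS as the residual sum of squares around the predictions. The observed and predicted values are hypothetical (the predictions come from a least-squares line fit to those same five points), and the function name is a label chosen for the sketch.

```python
def r_squared(actual, predicted):
    """(TSS - ResSS) / TSS: share of variation in Y explained by the model."""
    mean_y = sum(actual) / len(actual)
    tss = sum((y - mean_y) ** 2 for y in actual)                   # total variation
    res_ss = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # unexplained
    return (tss - res_ss) / tss

# Hypothetical observed values and least-squares predictions for them.
actual = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

r2 = r_squared(actual, predicted)
print(round(r2, 2))
```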
R-square
• R-square ranges from 0 to 1.
• 0 is no relationship.
• 1 is a perfect relationship…IVs explain 100% of the variance in the DV.
R-square
• Doesn’t tell us WHY the dependent variable varies or explains the results….This is why we need Theory!!!
• Simply a measure of how well your model fits the dependent variable.
• How well are the Xs predicting Y!
• How much variation in Y is explained by Xs!
Multiple Regression
• Y= Income in dollars
• X1= Education in years
• X2= Seniority in years
• Y= a + b1(education) + b2(Seniority) + e
Example
• Y= 5666 + 432X1 + 281X2 + e
- Both Coefficients are statistically significant at the P < .05 Level…
• Because of the positive Beta…expected change in Income (Y) given a one‑unit increase in Education is +$432, holding seniority in years constant.
Predicted Values
• Let's predict income for someone with 10 years of education and 5 years of seniority.
• Y= 5666+432X1+281X2+e
• = 5666+432(10)+281(5)
• = 5666+ 4320+1405
• Predicted value of Y for this case is $11,391.
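The same prediction as a small Python function, using the coefficients from the example equation. The function name `predicted_income` is hypothetical, and the error term is left out because it is unknown for any individual case.

```python
def predicted_income(education, seniority):
    """Predicted income in dollars from the example equation."""
    return 5666 + 432 * education + 281 * seniority

print(predicted_income(10, 5))

# One more year of education, seniority held constant, adds the
# education coefficient to the prediction.
print(predicted_income(11, 5) - predicted_income(10, 5))
```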
R-squared
• r-squared for this model is .56.
• Education and Seniority explain 56% of the variation in income.