Multiple Independent Variables

POLS 300

Butz

Multivariate Analysis

• Problem with bivariate analysis in nonexperimental designs:

– Spuriousness and causality

• Need for techniques that allow the researcher to control for other independent variables


• Employed to see how large sets of variables are interrelated.

• Idea is that if one can find a relationship between X and Y after accounting for other variables (W and Z), we may be able to make a “causal inference”.


• We know that X and Y may both be caused by Z, which would make the relationship between them spurious.

• Multivariate Analysis allows us to include other variables and test whether there is still a relationship between X and Y.


• Must ask whether a third variable (and maybe others) is the “true” cause of both the IV and DV

• Experimental analyses can “prove” causation, but only in a laboratory setting…in the “real world” we must use Multivariate Statistical Analyses

• Need to “Control” or “hold constant” other variables to isolate the effect of IV on DV!

Controlling for Other Independent Variables

• Multivariate Crosstabulation – evaluate bivariate relationship within subsets of sample defined by different categories of third variable (“control by grouping”)

• At what level(s) of measurement would we use Multivariate Crosstabulation??

Multivariate Crosstabulation

• Control by grouping: group the observations according to their values on the third variable and…

• then observe the original relationship within each of these groups.

• P. 407/506 – Spending Attitudes and Voting…controlling for Income! – spurious

• P. 411/510 – Occupational Status and Voter Turnout…control for “education”!
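Control by grouping can be sketched in a few lines of Python. The counts below are invented for illustration (they are not the textbook's tables), and the category names and "party A" are hypothetical. The bivariate table suggests attitudes predict the vote, but within each income group the difference disappears, which is the pattern we would call spurious.

```python
from collections import defaultdict

# Hypothetical counts, invented for illustration only:
# (income, attitude, number of respondents, number voting for party A)
cells = [
    ("low",  "favor_cuts",  10,  2),
    ("low",  "oppose_cuts", 40,  8),
    ("high", "favor_cuts",  40, 32),
    ("high", "oppose_cuts", 10,  8),
]

def percent_voting(rows):
    """Percent voting for party A within each attitude category."""
    totals = defaultdict(lambda: [0, 0])  # attitude -> [respondents, voters]
    for _income, attitude, n, voters in rows:
        totals[attitude][0] += n
        totals[attitude][1] += voters
    return {att: 100 * v / n for att, (n, v) in totals.items()}

# Bivariate table: ignore income entirely.
overall = percent_voting(cells)

# Control by grouping: rebuild the same table within each income category.
by_income = {
    level: percent_voting([c for c in cells if c[0] == level])
    for level in ("low", "high")
}

print("All respondents:", overall)
for level, table in by_income.items():
    print(f"Income = {level}:", table)
```

In these made-up data the overall gap is 68% versus 32%, yet both attitude groups vote identically once income is held constant (20% among low income, 80% among high income).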

Quick Review: Regression

• In general, the goal of linear regression is to find the line that best predicts Y from X.

• Linear regression does this by estimating a line that minimizes the sum of the squared errors from the line

• Minimizing the vertical distances of the data points from the line.
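A minimal sketch of the least-squares idea, using the standard formulas for the slope and intercept of a bivariate regression. The five data points are hypothetical, and the function names are chosen here for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariation of X and Y divided by variation in X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
best = sum_squared_errors(xs, ys, a, b)
# Any other line, such as one with a slightly steeper slope, fits worse.
worse = sum_squared_errors(xs, ys, a, b + 0.1)
print(a, b, best, worse)
```

The fitted line has the smallest sum of squared errors; nudging the slope away from the estimate makes that sum larger.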

Regression vs. Correlation

• The purpose of regression analysis is to determine exactly what the line is (i.e. to estimate the equation for the line)

• The regression line represents predicted values of Y based on the values of X

Equation for a Line (Perfect Linear Relationship)

Yi = a + BXi

a = Intercept, or Constant = The value of Y when X = 0

B = Slope coefficient = The change (+ or ‑) in Y given a one unit change in X

Slope

• Yi = a + BXi

• B = Slope coefficient

• If B is positive then you have a positive relationship. If it is negative you have a negative relationship.

• The larger the value of B the more steep the slope of the line…Greater (more dramatic) change in Y for a unit change in X

• General Interpretation: For a one-unit change in X, we expect a change of B in Y, on average.

Calculating the Regression Equation For “Threat Hypothesis”

• The estimated regression equation is:

E(welfare benefit1995) = 422.7879 + [(-6.292) * %black(1995)]

Number of obs = 50
F(1, 64)      = 76.651
Prob          < 0.001
R-squared     = 0.3361

welfare1995  |      Coef.   Std. Err.        t      P<     [95% Conf. Interval]
-------------+------------------------------------------------------------------
Black1995(b) |   -6.29211     .771099   -8.162   0.001     -8.1173      -4.0746
_cons(a)     |   422.7879    12.63348   25.551   0.001   317.90407     336.6615

Regression Example: “Threat Hypothesis”

• To generate predicted values for various percentages of African Americans (%black) in 1995, we can simply plug in the appropriate X values and solve for Y.

At 10%: E(welfare benefit1995) = 422.7879 + [(-6.292) * 10] = $359.87
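The same plug-in calculation can be repeated for any value of X. A small sketch, using the intercept and slope from the estimated equation; the function name and the extra percentages are chosen here for illustration.

```python
# Coefficients taken from the estimated "Threat Hypothesis" equation.
INTERCEPT = 422.7879
SLOPE = -6.292

def predicted_benefit(pct_black):
    """Expected 1995 welfare benefit for a given %black(1995) value."""
    return INTERCEPT + SLOPE * pct_black

# Predicted values at a few illustrative percentages.
for pct in (0, 10, 20, 30):
    print(f"{pct:>2}%  ${predicted_benefit(pct):.2f}")
```

At 0% the prediction is simply the intercept, and each additional percentage point lowers the predicted benefit by the slope, about $6.29.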



Regression Analysis and Statistical Significance

• Testing for statistical significance for the slope

– The p-value: the probability of observing a sample slope value (Beta Coefficient) at least as large in magnitude as the one we are observing in our sample IF THE NULL HYPOTHESIS IS TRUE

– P-values closer to 0 indicate stronger evidence against the null hypothesis (P < .05 is usually the threshold for statistical significance)

– Based on t-value…(Beta/S.E.) = t
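As a quick check, the t-value can be computed directly from a coefficient and its standard error. The sketch below uses the slope and standard error reported in the “Threat Hypothesis” output; the cutoff of 2 is only a common rule of thumb for larger samples, not an exact test.

```python
def t_value(coef, std_err):
    """t statistic for the null hypothesis that the true slope is 0."""
    return coef / std_err

# Slope and standard error as reported in the "Threat Hypothesis" output.
t = t_value(-6.29211, 0.771099)

# Rule of thumb (an approximation, not an exact test): with a reasonably
# large sample, |t| above about 2 corresponds to p < .05, two-tailed.
significant = abs(t) > 2.0
print(round(t, 2), significant)
```

The exact p-value also depends on the degrees of freedom, so statistical software should be used for the real test.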

Multiple Regression

• At what level(s) of measurement would we employ multiple regression???

• Interval and Ratio DVs

• Now working with a new model:

• Yi = a + b1X1i + b2X2i + … + bkXki + ei

Multiple Regression

• Yi = a + b1X1i + b2X2i + … + bkXki + ei

• b1…bk are “Partial” slope coefficients.

• a is the Y-Intercept.

• e is the Error Term.

Slope Coefficients

• Slope coefficients are now Partial Slope Coefficients, although we still refer to them generally as slope coefficients. They have a new interpretation:

• “The expected change in Y given a one‑unit change in X1, holding all other variables constant”

Multiple Regression

• By “holding constant” all other X’s, we are therefore “controlling for” all other X’s, and thus isolating the “independent effect” of the variable of interest.

• “Holding Constant” – group observations according to levels of X2, X3, etc.…then look at impact of X1 on Y!

• This is, in effect, what Multiple Regression is doing!!!

• Make everyone “equal” in terms of the “control” variable, then examine the impact of X1 on Y!

“Holding Constant” other IVs

• Income (Y) as a function of Education (X1) and Seniority (X2)

• Look at relationship between Seniority and Income WITHIN different levels of education!!! “Holding Education Constant”

• Look at relationship between Education and Income WITHIN different levels of Seniority!!! “Holding Seniority Constant”
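A rough sketch of how “holding constant” connects to multiple regression. Everything here is hypothetical: the six cases are invented, income is generated without any error term from made-up coefficients (1000, 500, 200), and the small solver is written in plain Python only to keep the example self-contained. Regression does not literally split the sample into groups, but in this noise-free example the partial slope for seniority matches the slope found within each education level, while the bivariate slope is inflated because seniority and education move together.

```python
def simple_slope(xs, ys):
    """Bivariate least-squares slope of Y on X."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(columns, y):
    """Least-squares coefficients [a, b1, b2, ...] via the normal equations."""
    X = [[1.0] + [float(c[i]) for c in columns] for i in range(len(y))]
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Hypothetical cases: more educated people also happen to have more seniority.
education = [12, 12, 12, 16, 16, 16]
seniority = [1, 2, 3, 4, 5, 6]
income = [1000 + 500 * e + 200 * s for e, s in zip(education, seniority)]

# Bivariate slope of income on seniority, ignoring education.
bivariate = simple_slope(seniority, income)

# Control by grouping: slope of income on seniority WITHIN each education level.
within = {
    level: simple_slope(
        [s for e, s in zip(education, seniority) if e == level],
        [y for e, y in zip(education, income) if e == level],
    )
    for level in sorted(set(education))
}

# Multiple regression: partial slopes for education and seniority.
a, b_edu, b_sen = ols([education, seniority], income)

print("bivariate slope:", round(bivariate, 1))
print("within-education slopes:", within)
print("partial slope for seniority:", round(b_sen, 1))
```

With real data that include error, the within-group slopes will differ from one group to the next and the partial slope acts more like a weighted summary of them.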

The Intercept

Yi = a + b1X1i + b2X2i + … + bkXki + ei

Y-Intercept (Constant) value…(a)…is now the expected value of Y when ALL the Independent Variables are set to 0.

Testing for Statistical Significance

• Proceeds as before – a p-value is generated for each sample slope coefficient: the probability of observing a slope at least that large in magnitude if the null hypothesis is true

• Based on “t-value” (Beta/ S.E.)

• And Degrees of Freedom!

Fit of the Regression

• R-squared value – the proportion of variation in the dependent variable explained by ALL of the independent variables combined

• (TSS – ResSS) / TSS…“Explained Variation in DV divided by Total Variation in DV”
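The R-squared calculation can be sketched directly from its definition, where TSS is the total sum of squares and ResSS is the residual (unexplained) sum of squares. The observed and predicted values below are hypothetical, and the function name is chosen here for illustration.

```python
def r_squared(actual, predicted):
    """(TSS - ResSS) / TSS: share of variation in Y explained by the model."""
    mean_y = sum(actual) / len(actual)
    # Total variation of Y around its mean.
    tss = sum((y - mean_y) ** 2 for y in actual)
    # Variation left unexplained by the model's predictions.
    res_ss = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return (tss - res_ss) / tss

# Hypothetical observed values and model predictions, invented for illustration.
actual = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]
print(round(r_squared(actual, predicted), 2))
```

Here TSS is 6 and ResSS is 2.4, so the model explains 60% of the variation in Y. Perfect predictions would give 1, and predicting the mean of Y for every case would give 0.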

R-square

• R-square ranges from 0 to 1.

• 0 is no relationship.

• 1 is a perfect relationship…IVs explain 100% of the variance in the DV.

R-square

• Doesn’t tell us WHY the dependent variable varies, and doesn’t explain the results…This is why we need Theory!!!

• Simply a measure of how well your model fits the dependent variable.

• How well are the Xs predicting Y!

• How much variation in Y is explained by Xs!

Multiple Regression

• Y= Income in dollars

• X1= Education in years

• X2= Seniority in years

• Y= a + b1(education) + b2(Seniority) + e

Example

• Y= 5666 + 432X1 + 281X2 + e

• Both Coefficients are statistically significant at the P < .05 Level…

• Because of the positive Beta…expected change in Income (Y) given a one‑unit increase in Education is +$432, holding seniority in years constant.

Predicted Values

• Let's predict the income of someone with 10 years of education and 5 years of seniority.

• Y= 5666+432X1+281X2+e

• = 5666+432(10)+281(5)

• = 5666+ 4320+1405

• Predicted value of Y for this case is $11,391.
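The prediction, and the “holding constant” reading of the education coefficient, can be checked with a short sketch. The coefficients come from the example equation; the function name is chosen here for illustration.

```python
# Coefficients from the example equation: Y = 5666 + 432*X1 + 281*X2.
A, B_EDUCATION, B_SENIORITY = 5666, 432, 281

def predicted_income(education_years, seniority_years):
    """Predicted income in dollars (the error term e is left out of a prediction)."""
    return A + B_EDUCATION * education_years + B_SENIORITY * seniority_years

base = predicted_income(10, 5)
one_more_year = predicted_income(11, 5)  # seniority held constant at 5

print(base)                  # predicted income for 10 years education, 5 years seniority
print(one_more_year - base)  # change from one more year of education
```

The first value reproduces the $11,391 prediction, and the difference equals the education coefficient, $432, because seniority is the same in both cases.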

R-squared

• R-squared for this model is .56.

• Education and Seniority explain 56% of the variation in income.
