Multivariate Models Analysis of Variance and Regression Using Dummy Variables

Upload: shanon-underwood

Post on 04-Jan-2016


Page 1: Multivariate Models Analysis of Variance and Regression Using Dummy Variables

Multivariate Models

Analysis of Variance and Regression Using Dummy Variables


Models

• A Model: A statement of the relationship between a phenomenon to be explained and the factors, or variables, which explain it.

• Steps in the Process of Quantitative Analysis:

– Specification of the model

– Estimation of the model

– Evaluation of the model


Model of Housing Values and Building Size

• Historian A hypothesizes that there is a linear relationship among housing value, building size and the number of families in the dwelling.

• Building Size = Square Feet/1000

• Housing Value = 1905 Property Assessment in 2002 dollars/1000

• Families = Number of families in the dwelling

• Housing Value = a + b1(Building Size) + b2(Families).


The Model of Determinants of Housing Value

Dep Var: NEWVAL   N: 467   Multiple R: 0.724   Squared multiple R: 0.524
Adjusted squared multiple R: 0.522   Standard error of estimate: 20.284

Effect      Coefficient  Std Error  Std Coef  Tolerance       t  P(2 Tail)
CONSTANT         -2.551      3.029     0.000          .  -0.842      0.400
NEWSIZE          25.893      1.146     0.734      0.972  22.595      0.000
FAMILIES         -5.626      2.094    -0.087      0.972  -2.687      0.007

Analysis of Variance

Source      Sum-of-Squares   df  Mean-Square  F-ratio      P
Regression      210541.070    2   105270.535  255.858  0.000
Residual        190908.482  464      411.441
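The summary statistics in this output can be recomputed from the Analysis of Variance rows, which is a useful check when evaluating the model. The sketch below does this in Python using only figures printed above. The function name `predict_value` and the example dwelling (2,000 square feet, one family) are illustrative assumptions, not part of the original slides.

```python
import math

# Figures copied from the regression output above.
ss_regression = 210541.070
ss_residual = 190908.482
n_cases = 467
n_predictors = 2  # NEWSIZE and FAMILIES

df_regression = n_predictors
df_residual = n_cases - n_predictors - 1  # 464, as in the output

mean_sq_regression = ss_regression / df_regression
mean_sq_residual = ss_residual / df_residual

# R squared is the share of the total sum of squares explained by the model.
r_squared = ss_regression / (ss_regression + ss_residual)
adjusted_r_squared = 1 - (1 - r_squared) * (n_cases - 1) / df_residual
f_ratio = mean_sq_regression / mean_sq_residual
std_error_of_estimate = math.sqrt(mean_sq_residual)


def predict_value(newsize, families):
    """Predicted NEWVAL (thousands of 2002 dollars) from the fitted coefficients."""
    return -2.551 + 25.893 * newsize - 5.626 * families


print(f"R squared: {r_squared:.3f}  adjusted: {adjusted_r_squared:.3f}")
print(f"F-ratio: {f_ratio:.3f}  std error of estimate: {std_error_of_estimate:.3f}")
# Hypothetical dwelling: 2,000 square feet (NEWSIZE = 2.0), one family.
print(f"Predicted value: {predict_value(2.0, 1):.3f}")
```

The recomputed values should agree with the printed output up to rounding.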


New Questions…

• Historian B suggests that there will be a neighborhood effect on housing values: even taking size and number of families into consideration, values will differ among the north side, south side and east side.

• Historian B poses the problem to Historian A.


New Possibility: Analysis of Variance

• Compares the level of an interval-level dependent variable across the categories of a categorical (nominal) independent variable.

• Are the property values different in the three neighborhoods, East, NW and South?

• Take a look first at the mean differences.


Value by Neighborhood


But…

• Are the results statistically significant?

• What is the strength of the relationship?

• How would we integrate this information into the earlier regression model?


Concepts

• We partition the total variation or variance into two components:

– (1) variance which is a function of the group membership, that is, the differences between the groups; and

– (2) variance within the groups.

• More formally: Total Sum of Squares = Between Groups Sum of Squares + Within Groups Sum of Squares


Equation

• Total Sum of Squares = Within Groups Sum of Squares + Between Groups Sum of Squares

• TSS= SSW + SSB

$$\sum_i (Y_i - \bar{Y})^2 = \sum_i (Y_i - \bar{Y}_j)^2 + \sum_i (\bar{Y}_j - \bar{Y})^2$$
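A small numeric check can make the partition concrete. The sketch below uses hypothetical toy data (three made-up groups, not the housing sample) and plain Python to confirm that the total sum of squares equals the within-groups plus the between-groups sum of squares. The variable names are illustrative.

```python
from statistics import mean

# Hypothetical toy data: three groups of made-up values, not the housing sample.
groups = {
    "A": [10.0, 12.0, 14.0],
    "B": [20.0, 22.0, 24.0, 26.0],
    "C": [5.0, 7.0],
}

all_values = [y for values in groups.values() for y in values]
grand_mean = mean(all_values)

# Total: squared deviations of every case from the grand mean.
total_ss = sum((y - grand_mean) ** 2 for y in all_values)

# Within: squared deviations of every case from its own group mean.
within_ss = sum(
    (y - mean(values)) ** 2 for values in groups.values() for y in values
)

# Between: squared deviation of each group mean from the grand mean,
# counted once for every case in the group.
between_ss = sum(
    len(values) * (mean(values) - grand_mean) ** 2 for values in groups.values()
)

print(f"TSS = {total_ss:.4f}")
print(f"SSW + SSB = {within_ss + between_ss:.4f}")
```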


Calculations

$$SSW = \sum_{i=1}^{k} (N_i - 1)\, S_i^2$$

$$SSB = \sum_{i=1}^{k} N_i\, (\bar{X}_i - \bar{X})^2$$


LET SSBETWEEN = N* (MEAN-28.818)* (MEAN -28.818)

Case  VAR00001$    MEAN        N      SD  VARIANCE  SSBETWEEN
3     EASTSIDE   47.313   92.000  18.334   336.134  31469.982
4     NW         26.035  308.000  12.096   146.305   2385.316
5     SOUTHSID   17.992   78.000   8.994    80.890   9141.271
6     Total      28.818  478.000  16.171   261.487      0.000

Sum of SSBETWEEN = 42996.569
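The same calculation can be sketched in Python from the group means, counts and variances printed above. The names `group_stats`, `ss_between`, `ss_within` and `ss_total` are illustrative. Because the inputs here are the rounded means from the table, the between-groups figure will differ from the printed 42996.569 in the first decimal place or so; the statistical package presumably worked from unrounded means.

```python
# Group summary statistics copied from the table above: (mean, N, variance).
group_stats = {
    "EASTSIDE": (47.313, 92, 336.134),
    "NW": (26.035, 308, 146.305),
    "SOUTHSID": (17.992, 78, 80.890),
}
grand_mean = 28.818
total_n = 478
total_variance = 261.487

# Between groups: N_i * (group mean - grand mean)^2, summed over groups.
ss_between = sum(n * (m - grand_mean) ** 2 for m, n, _ in group_stats.values())

# Within groups: (N_i - 1) * group variance, summed over groups.
ss_within = sum((n - 1) * var for _, n, var in group_stats.values())

# Total: (N - 1) * overall variance.
ss_total = (total_n - 1) * total_variance

print(f"SSB = {ss_between:.3f}")
print(f"SSW = {ss_within:.3f}")
print(f"SSB + SSW = {ss_between + ss_within:.3f}  vs  TSS = {ss_total:.3f}")
```

The last line shows that the two components add up to the total sum of squares, apart from rounding in the inputs.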


Anova Table

• DF between = k -1

• DF within = N – k


Degrees of Freedom

• DF between = k -1

• DF within = N – k

• Website for F Table:

– http://www.itl.nist.gov/div898/handbook/eda/section3/eda3673.htm#ONE-05-1-10

• Eta Squared = SSBetween/Total SS = .345 (equivalent to R Square)
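As a worked example, the degrees of freedom and Eta Squared can be computed in a few lines of Python. SSBetween is the sum of the SSBETWEEN column from the calculation slide; the within-groups sum of squares is derived here from the group counts and variances, and the F-ratio is computed in this sketch rather than quoted from the slides. All variable names are illustrative.

```python
# Between-groups sum of squares, as totalled on the calculation slide.
ss_between = 42996.569

# Within-groups sum of squares, derived here as sum((N_i - 1) * variance_i)
# for EASTSIDE, NW and SOUTHSID.
ss_within = 91 * 336.134 + 307 * 146.305 + 77 * 80.890

k_groups = 3    # East, NW, South
n_cases = 478

df_between = k_groups - 1
df_within = n_cases - k_groups

# F-ratio: between-groups mean square over within-groups mean square.
f_ratio = (ss_between / df_between) / (ss_within / df_within)

# Eta squared: share of the total sum of squares lying between groups.
eta_squared = ss_between / (ss_between + ss_within)

print(f"DF between = {df_between}, DF within = {df_within}")
print(f"F = {f_ratio:.2f}")
print(f"Eta squared = {eta_squared:.3f}")
```

The resulting F would then be compared with the critical value for (DF between, DF within) in an F table such as the one linked above.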


So, now what…

• We know that the neighborhood affects the value of the house.

• How do we integrate that knowledge into a regression model?


A Dilemma….

• Regression requires interval level measurement.

• A categorical variable such as neighborhood cannot be entered into the equation directly.

• Historian A proposes testing separate models for the three neighborhoods.


Results

• Regression Models for the Three Wards: Determinants of Housing Value

             Northwest  East Side  South Side
Constant         5.90*     -13.26       5.35*
Newsize         11.99*     41.49*      14.88*
Families         1.37     -19.90*      -1.38
N                 295         98          74
R Squared         .57        .55         .60

• *Statistically significant at the .05 level.
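To see what the three separate equations imply, the sketch below plugs one hypothetical dwelling into each of them. The coefficients are copied from the results above; the function name `predict_value` and the example sizes (1,000 and 2,000 square feet, one family) are assumptions made for illustration. Coefficients that are not statistically significant are still used as printed.

```python
# Coefficients copied from the three ward-level regressions above.
ward_models = {
    "Northwest": {"constant": 5.90, "newsize": 11.99, "families": 1.37},
    "East Side": {"constant": -13.26, "newsize": 41.49, "families": -19.90},
    "South Side": {"constant": 5.35, "newsize": 14.88, "families": -1.38},
}


def predict_value(ward, newsize, families):
    """Predicted housing value (thousands of 2002 dollars) for one ward's model."""
    coef = ward_models[ward]
    return (
        coef["constant"]
        + coef["newsize"] * newsize
        + coef["families"] * families
    )


# Hypothetical single-family dwellings of 1,000 and 2,000 square feet.
for ward in ward_models:
    small = predict_value(ward, 1.0, 1)
    large = predict_value(ward, 2.0, 1)
    print(f"{ward}: {small:.2f} -> {large:.2f} (gain {large - small:.2f})")
```

The gain from an extra 1,000 square feet equals each ward's Newsize coefficient, so with separate models the slopes as well as the constants differ by ward.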


Is there another way?

• Can we develop one model instead of three?

• Answer: Yes, by remeasuring the neighborhood at the interval level.

• How? By creating new variables that identify the presence or absence of each neighborhood, that is, a set of binary variables called dummy variables.


Illustration of Dummy Variables

Neighborhood     East Side  South Side  Northwest Side
East Side                1           0               0
South Side               0           1               0
Northwest Side           0           0               1


Illustration continued…

• Two new binary variables provide all the information needed for the three categories.

• Rule: Create k - 1 dummy variables for a categorical variable with k categories.

• The omitted category is the reference category: it corresponds to the value of the equation when all the dummy variables = 0.


New variables: Northwest Side as the Omitted Category

• Variable: Eastside. Codes: Yes=1; No=0

• Variable: South. Codes: Yes=1; No=0

• By implication:

– For a household on the East Side, Eastside=1 and South=0

– For a household on the South Side, Eastside=0 and South=1

– For a household on the Northwest Side, Eastside=0 and South=0
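The coding rule can be sketched as a small Python function. This is only an illustration of the k - 1 rule with the Northwest Side omitted; the names `CATEGORIES`, `DUMMY_NAMES` and `dummy_code` are assumptions and do not come from the original analysis.

```python
# The three neighborhood categories; the Northwest Side is the omitted category.
CATEGORIES = ["East Side", "South Side", "Northwest Side"]

# One dummy variable for each category except the omitted one (k - 1 = 2).
DUMMY_NAMES = {"East Side": "Eastside", "South Side": "South"}


def dummy_code(neighborhood):
    """Return the k - 1 dummy variables for one household."""
    if neighborhood not in CATEGORIES:
        raise ValueError(f"unknown neighborhood: {neighborhood!r}")
    return {
        name: int(neighborhood == category)
        for category, name in DUMMY_NAMES.items()
    }


for neighborhood in CATEGORIES:
    print(neighborhood, dummy_code(neighborhood))
```

A household in the omitted category gets 0 on every dummy, so no third variable is needed to identify it.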


Results

Newval = a + b1(Newsize) + b2(Families) + b3(Eastside) + b4(South)

Dep Var: NEWVAL   N: 467   Multiple R: 0.75   Squared multiple R: 0.56
Adjusted squared multiple R: 0.55   Standard error of estimate: 19.61

Effect      Coefficient  Std Error  Std Coef  Tolerance      t  P(2 Tail)
CONSTANT          -3.32       2.95      0.00          .  -1.13       0.26
NEWSIZE           23.60       1.32      0.67       0.68  17.88       0.00
FAMILIES          -5.27       2.15     -0.08       0.87  -2.46       0.01
EASTSIDE          14.06       2.53      0.20       0.78   5.56       0.00
SOUTH              6.08       2.75      0.08       0.81   2.21       0.03
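A short Python sketch shows how this single equation is read. The coefficients are copied from the output above; the function name `predict_value` and the example dwelling (2,000 square feet, one family) are illustrative assumptions.

```python
# Coefficients copied from the regression output above.
COEF = {
    "constant": -3.32,
    "newsize": 23.60,
    "families": -5.27,
    "eastside": 14.06,
    "south": 6.08,
}


def predict_value(newsize, families, eastside, south):
    """Predicted NEWVAL (thousands of 2002 dollars) from the single pooled model."""
    return (
        COEF["constant"]
        + COEF["newsize"] * newsize
        + COEF["families"] * families
        + COEF["eastside"] * eastside
        + COEF["south"] * south
    )


# Hypothetical single-family dwelling of 2,000 square feet in each neighborhood.
northwest = predict_value(2.0, 1, eastside=0, south=0)  # omitted category
east_side = predict_value(2.0, 1, eastside=1, south=0)
south_side = predict_value(2.0, 1, eastside=0, south=1)

print(f"Northwest: {northwest:.2f}")
print(f"East Side: {east_side:.2f} (difference {east_side - northwest:.2f})")
print(f"South Side: {south_side:.2f} (difference {south_side - northwest:.2f})")
```

Whatever size and number of families are chosen, the East Side and South Side predictions differ from the Northwest prediction by exactly the EASTSIDE and SOUTH coefficients.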


Implications

• 1. Separate regressions for each neighborhood imply that the other coefficients in the equation vary by ward.

• 2. Regression with dummy variables implies that the neighborhood effect is a movement of the Y intercept.

• There may be interactions between the slope coefficients and the dummy variables, i.e., both 1 and 2 may be the case.
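One way such an interaction could be written is sketched below. This equation is an assumption added for illustration: the slides do not specify or estimate it, and the coefficients b5 and b6 are hypothetical.

```latex
\[
\text{Newval} = a + b_1\,\text{Newsize} + b_2\,\text{Families}
  + b_3\,\text{Eastside} + b_4\,\text{South}
  + b_5\,(\text{Eastside} \times \text{Newsize})
  + b_6\,(\text{South} \times \text{Newsize})
\]
```

Under this specification the dummy variables still shift the Y intercept (b3 and b4), while the slope on Newsize would be b1 on the Northwest Side, b1 + b5 on the East Side and b1 + b6 on the South Side.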