MGT 511: Hypothesis Testing and Regression
Lecture 8: Framework for Multiple Regression Analysis
K. Sudhir, Yale SOM-EMBA


Post on 18-Jan-2016




Recall

- Simple Regression
  - t-test of slope coefficients, R-square
  - Forecasts, Prediction and Confidence Intervals
  - Transformations for nonlinearity and non-constant variance

- Multiple Regression
  - Partial Slopes, tradeoff between bias and precision
  - ANOVA, F-test
  - Dummy Variables and Interaction Variables
  - Residual Analysis and Outliers


Framework for Multiple Regression

- Use theory, knowledge to build the initial model
- Residual Analysis and Refinement of model
- Perform F-test; if F-test rejects null, perform t-tests
- Possible Reasons for Insignificance of Individual Slope Coefficients
- Refine the model


Step 1: Using knowledge, theory to specify initial model

- What is the dependent variable? What are the potential predictor variables?
- Should you use
  - Transformations to accommodate nonlinear effects
  - Normalization of the y or x variables (per capita, constant $, etc.)
  - Dummy variables
  - Interaction variables if slope effects can be different
- Collect data, estimate the model
- Are the results plausible? For example, how is the prediction at extreme values? If not, refine the model.


What should be the Y and X variables?

- Y: Sales of personal printers in different sales districts
- What are appropriate X variables?
  - Knowledge suggests several segments: college students, home users, small businesses, computer network workstations
  - Appropriate X variables: college freshmen, household income, small business starts, new network installations


Potential X variables: Tradeoffs

Omitting important variables can bias results or reduce explanatory power

Using too many variables can make all variables insignificant

Prioritize the variables, based on which ones you consider most important


Transformations

- Is the relationship nonlinear?
  - Sales-Advertising relationship
  - Experience Curve effect
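As a minimal sketch of why a transformation can matter for the sales-advertising case, the following compares a fit that is linear in advertising with one that is linear in ln(advertising). The data are hypothetical and built to show diminishing returns; the helper `simple_ols` and the log form are illustrative assumptions, not the lecture's own example.

```python
import math

def simple_ols(x, y):
    """Return (intercept, slope, r_squared) for a one-predictor least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return intercept, slope, 1 - sse / sst

# Hypothetical data with diminishing returns: sales = 10 + 5 * ln(advertising)
advertising = [1, 2, 4, 8, 16, 32]
sales = [10 + 5 * math.log(a) for a in advertising]

linear_fit = simple_ols(advertising, sales)
log_fit = simple_ols([math.log(a) for a in advertising], sales)

print("linear in advertising:     R^2 = %.3f" % linear_fit[2])
print("linear in ln(advertising): R^2 = %.3f" % log_fit[2])
```

Because the data were generated from the log form, the transformed fit recovers the intercept and slope, while the untransformed fit leaves part of the curvature unexplained.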


Normalization of the Variables

- Normalizing the Y variable: Example
  - Y: Unit Sales in different cities (Problem?)
  - X: Price and Feature Advertising
  - Solution?

- Normalizing the X variable: Example
  - Y: Total Market Value of Firm
  - X: Value of Assets, Number of Employees (Problem?)
  - Solution?
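One possible reading of the first example is that unit sales mostly reflect city size, so a per-capita version of Y may be more informative. The sketch below uses hypothetical cities in which per-capita demand falls with price, but the larger cities happen to charge more; all numbers and the helper `slope` are assumptions for illustration.

```python
def slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical cities: population in thousands, shelf price, total unit sales.
# Per-capita demand is built as 10 - 2 * price, so the true price effect is negative.
population = [100, 200, 400, 800]
price = [2.0, 2.5, 2.0, 2.5]
unit_sales = [pop * (10 - 2 * p) for pop, p in zip(population, price)]

per_capita_sales = [s / pop for s, pop in zip(unit_sales, population)]

raw_slope = slope(price, unit_sales)               # dominated by city size
per_capita_slope = slope(price, per_capita_sales)  # recovers the price effect
print(raw_slope, per_capita_slope)
```

In this constructed case the raw regression gives a positive price slope, while the per-capita regression returns the negative slope that was built into the data.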


Interaction Effects

- Y: Sales; X: Price, Feature
- Y: Sales; X: Price, Holiday
- Y: Salary; X: Gender, Experience
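To illustrate the third example, here is a sketch of a regression with a dummy variable, a slope variable, and their product as the interaction term. The salary data are hypothetical and noise-free, and the small `ols` solver (normal equations with Gauss-Jordan elimination) is an assumed helper rather than anything from the lecture.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                A[r] = [vr - A[r][c] * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# Hypothetical salaries (in $000); dummy d = 1 for one gender, 0 for the other.
# Group d = 0: 40 + 3 * experience; group d = 1: 38 + 2 * experience.
experience = [1, 3, 5, 7]
rows, salary = [], []
for d in (0, 1):
    for e in experience:
        rows.append([1, d, e, d * e])  # intercept, dummy, experience, interaction
        salary.append(40 + 3 * e if d == 0 else 38 + 2 * e)

b0, b_dummy, b_exp, b_inter = ols(rows, salary)
print(b0, b_dummy, b_exp, b_inter)
```

The dummy coefficient measures the difference in intercepts between the two groups, and the interaction coefficient measures the difference in experience slopes.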


Plausibility of Results

- Will results make sense at extreme values?
  - Usually alerts to nonlinearity issues
- Examples:
  - What will sales be at very high prices, very high advertising?
  - What will cost be at high levels of experience?
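A quick plausibility check can be sketched by plugging extreme values into the estimated equation. The two equations below are hypothetical, with coefficients invented purely for illustration: one is linear in price and one is linear in ln(sales).

```python
import math

# Hypothetical fitted equations (coefficients invented for illustration only).
def linear_sales(price):
    return 1000 - 50 * price              # sales = 1000 - 50 * price

def loglinear_sales(price):
    return math.exp(7.0 - 0.08 * price)   # ln(sales) = 7.0 - 0.08 * price

for price in (5, 10, 30):
    print(price, linear_sales(price), round(loglinear_sales(price), 1))
```

At a price of 30 the linear equation predicts negative sales, which is impossible, whereas the log-linear form stays positive at any price. An implausible prediction of this kind is the sort of signal that suggests revisiting the functional form.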


Step 2: Residual Analysis

- Check the residuals; refine model
  - Accommodating nonlinear effects
  - Accounting for non-constant variance
  - Accounting for outliers
- Keep refining and re-estimating the model until the residuals are “satisfactory”
  - Remember that residuals will not perfectly follow the “rules” due to randomness; minor deviations will not affect regression results
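As a small sketch of what a residual check for nonlinearity can look like, the code below fits a straight line to hypothetical curved data (y = x squared, an assumption made for illustration) and lists the residuals in order of x.

```python
x = [1, 2, 3, 4, 5, 6, 7]
y = [xi ** 2 for xi in x]  # hypothetical curved relationship

n = len(x)
mx, my = sum(x) / n, sum(y) / n
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx

# Residual = actual value minus fitted value
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
print([round(r, 6) for r in residuals])
```

The residuals are positive at both ends and negative in the middle (5, 0, -3, -4, -3, 0, 5). A systematic U-shape like this, rather than random scatter around zero, points to a nonlinear effect that the model has not yet accommodated.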


Step 3: Performing F-tests and t-tests

If estimated equation and residual analysis are OK, conduct F-test for the model as a whole

If we reject the null using the F-test, conduct t-tests for individual slopes
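For reference, the overall F statistic can be computed from R-square, the number of observations n, and the number of slope coefficients k. The sketch below uses hypothetical inputs; the function name `f_statistic` is an assumption, and the critical value still has to come from an F table or statistical software.

```python
def f_statistic(r_squared, n, k):
    """Overall F statistic for a regression with k slopes and n observations.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), with k and n - k - 1 degrees of freedom.
    """
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))

# Hypothetical fit: R^2 = 0.50 with n = 23 observations and k = 2 predictors
f_value = f_statistic(0.50, 23, 2)
print(round(f_value, 4))
```

Here the statistic is about 10 with 2 and 20 degrees of freedom. If it exceeds the critical value at the chosen significance level, the null that all slopes are zero is rejected and the individual t-tests follow.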

Question: What to do if one or more individual slope coefficients are insignificant?


Possible Reasons for Insignificance of Individual Slope Coefficients

- Omitted Variable Bias
- Nonlinearity not appropriately taken care of
- Multicollinearity
- True effect is non-zero, but small
- True effect is zero


Omitted Variable Bias

- One or more relevant predictor variables are missing
  - action: add the variables to the model
- Example 1
  - Y: Sales
  - X: Price
  - Omitted X variable: Advertising
- Example 2
  - Y: Salary
  - X: Schooling
  - Omitted X variable: Job Experience
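Example 2 can be sketched with hypothetical, noise-free data in which people with more schooling tend to have less experience. The salary equation, the data values, and the `ols` helper are all assumptions made for illustration.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                A[r] = [vr - A[r][c] * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# Hypothetical data (salary in $000): more schooling goes with less experience.
schooling = [12, 14, 16, 18, 20]
experience = [18, 16, 10, 8, 3]
salary = [10 + 4 * s + 2 * e for s, e in zip(schooling, experience)]

short = ols([[1, s] for s in schooling], salary)                       # experience omitted
full = ols([[1, s, e] for s, e in zip(schooling, experience)], salary)  # both included

print("schooling slope, experience omitted: ", round(short[1], 3))
print("schooling slope, experience included:", round(full[1], 3))
```

With experience omitted, the schooling slope shrinks to about 0.2 even though the true slope is 4, because schooling also picks up the effect of the experience it is negatively correlated with. Adding experience restores the slope.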


Regression of Salary against Schooling and Experience

Salary regressed on Schooling only:

            Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept   47334.97      3526.717        13.42182  1.84E-13  40098.75   54571.19
Schooling   311.0538      226.6091        1.372645  0.181158  -153.909   776.017

Salary regressed on Schooling and Experience:

            Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept   -65798        29966.01        -2.19575  0.03723   -127394    -4201.91
Schooling   5793.49       1457.244        3.975647  0.000498  2798.079   8788.901
Experience  1836.442      484.1689        3.792978  0.0008    841.2179   2831.666

Explain this phenomenon
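As a reading aid for the two outputs, the sketch below recomputes the Schooling t statistics and backs out the critical value implied by each 95% interval. The numbers are copied from the outputs above; the function names are assumptions.

```python
# (coefficient, standard error, upper 95% bound) for the Schooling row of each output
schooling_only = (311.0538, 226.6091, 776.017)
schooling_with_experience = (5793.49, 1457.244, 8788.901)

def t_stat(coef, se):
    """t statistic for H0: slope = 0."""
    return coef / se

def implied_critical_value(coef, se, upper):
    """Upper bound = coef + t_crit * se, so t_crit = (upper - coef) / se."""
    return (upper - coef) / se

for label, (coef, se, upper) in [("schooling only", schooling_only),
                                 ("with experience", schooling_with_experience)]:
    print(label, round(t_stat(coef, se), 4),
          round(implied_critical_value(coef, se, upper), 4))
```

The t statistic for Schooling rises from about 1.37 to about 3.98 once Experience is included. The implied critical values, roughly 2.052 and 2.056, are consistent with 27 and 26 residual degrees of freedom, which would suggest about 29 observations.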


Nonlinearity not taken care of

- The X variable affects the Y variable differently than assumed in the model
  - action: use a different transformation
- Example: Recall HW Problem
  - Y: Yield; X: Temperature
  - Solution: Add Temperature^2
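The yield example can be sketched with hypothetical data in which yield peaks at a middle temperature. The homework data are not reproduced here, so the numbers, the peak at 70 degrees, and the `ols` helper are assumptions; temperature is centered before squaring, a common choice that keeps the linear and squared terms from being highly correlated.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                A[r] = [vr - A[r][c] * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

temperature = [50, 60, 70, 80, 90]
centered = [t - 70 for t in temperature]    # centered temperature
yield_ = [500 - c ** 2 for c in centered]   # hypothetical: yield peaks at 70 degrees

linear = ols([[1, c] for c in centered], yield_)
quadratic = ols([[1, c, c ** 2] for c in centered], yield_)

print("linear slope:", round(linear[1], 6))
print("quadratic coefficients:", [round(b, 6) for b in quadratic])
```

In this constructed case the linear-only slope is essentially zero, so temperature would look irrelevant, while the model with the squared term recovers the curvature exactly.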


Multicollinearity

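- Highly correlated X variables inflate the standard errors of their slope estimates, which reduces the significance of those variables
  - action 1: reformulate the model (e.g. per capita; constant $)
  - action 2: obtain more data
  - action 3: delete one of the highly correlated predictor variables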
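One simple way to screen for this problem is to look at the correlation between predictors. The sketch below also computes the variance inflation factor (VIF), a standard diagnostic that the slides do not mention and that is included here only as an illustration; with exactly two predictors it equals 1 / (1 - r^2). The data are hypothetical.

```python
import math

def correlation(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical predictors that move almost in lockstep
x1 = [1, 2, 3, 4]
x2 = [1, 2, 3, 5]

r = correlation(x1, x2)
vif = 1 / (1 - r ** 2)  # variance inflation factor when there are exactly two predictors
print(round(r, 3), round(vif, 1))
```

A VIF near 1 means the predictors are nearly uncorrelated. A large value means the variance of each slope estimate is inflated by roughly that factor relative to the uncorrelated case, which is why both slopes can look insignificant.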


True Effect is Small or Zero

- True effect of X is small, but non-zero
  - action 1: obtain more data (or)
  - action 2: delete this variable
- True effect of X is zero
  - action 2: delete this variable


Possible Reasons for Insignificance of Individual Slope Coefficients

- Omitted Variable Bias
- Nonlinearity not appropriately taken care of
- Multicollinearity
- True effect is non-zero, but small
- True effect is zero


Summary

For multiple regression to provide valid and meaningful results, it is critical that the proposed model is “well done”

Before we can justify statistical inference (about the model, about slope parameters or for predictions), the plausibility of the estimated equation should be checked and the residuals should be examined

Variables should be transformed to accommodate nonlinear effects of the original variables (so that the effects are linear in the transformed variables)

There are many possible reasons for the occurrence of insignificant slope coefficients (and it is not easy to distinguish between these reasons)