multiple regression models. exploring an example: “chapter 4: multiple regression ii” data ...

Multiple Regression Models

Upload: amie-clark

Post on 23-Dec-2015


Page 1: Multiple Regression Models. Exploring an example: “Chapter 4: Multiple Regression II” data  Online stock trading through the Internet has increased dramatically

Multiple Regression Models


Exploring an example: “Chapter 4: Multiple Regression II” data

Online stock trading through the Internet has increased dramatically during the past several years. An article discussing this new method of investing provided data on the major Internet stock brokerages who provide this service. Here we have some data for the top 10 Internet brokerages. The variables are Mshare, the market share of the firm; Accts, the number of Internet accounts in thousands; and Assets, the total assets in billions of dollars.

Describe the data: How many variables does the data set contain?

How would you describe them in terms of levels of measurement?


Explaining Assets with each predictor variable

Find the correlation between Assets, and the explanatory variables Mshare and Accts.

Use a simple linear regression to predict Assets from the number of accounts (Accts). What is the regression equation? What are the results of the significance test for the regression coefficient? Do the same using Mshare.
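The two tasks above can be sketched in plain Python. The numbers below are made-up placeholders, not the brokerage data from the article (which is not reproduced in this transcript), and the function names are illustrative. The sketch returns the t statistic for the slope; the p-value would then be read from a t(n - 2) table or from statistical software.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x.

    Returns (b0, b1, t), where t = b1 / SE(b1) is the statistic for
    H0: beta1 = 0, to be compared with a t(n - 2) distribution.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    se_b1 = sqrt(sse / (n - 2) / sxx)
    return b0, b1, b1 / se_b1

# Hypothetical values for illustration only (NOT the article's data).
accts = [1.0, 2.0, 3.0, 4.0, 5.0]
assets = [2.1, 3.9, 6.2, 7.8, 10.0]

r = pearson_r(accts, assets)
b0, b1, t_stat = simple_regression(accts, assets)
print(f"r = {r:.3f}")
print(f"Assets-hat = {b0:.2f} + {b1:.2f} * Accts, t = {t_stat:.2f}")
```

Swapping the `accts` list for a list of market shares repeats the exercise for Mshare.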


What is Multiple Regression?

Predicting an outcome (dependent variable) based upon several independent variables simultaneously.

Why is this important? Behavior is rarely a function of just one variable; it is instead influenced by many variables. So the idea is that we should be able to obtain a more accurate prediction by using multiple variables to predict our outcome.


Strategy for Multiple Regression

Start

Hypothesize form of the model (choose which independent variables to include)

Conduct exploratory data analysis

Develop one or more tentative models

Identify most suitable model

Make inferences based on model

Stop


The Multiple Linear Regression Model

Regression applications in which there are several independent variables, x1, x2, … , xp. A multiple linear regression model with p independent variables has the equation

$$ y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon $$

β0 is the intercept, and βi determines the contribution of the independent variable xi.

The ε is a random variable with mean 0 and variance σ².


The Prediction Equation

The equation for this model fitted to data is

$$ \hat{y} = b_0 + b_1 x_1 + \cdots + b_p x_p $$

where ŷ denotes the “predicted” value computed from the equation, and bi denotes an estimate of βi.

As with simple linear regression, the estimates are obtained by the method of least squares: among the set of all possible values for the parameter estimates, we find the ones that minimize the sum of squared residuals.
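To make the least-squares step concrete, here is a minimal sketch that solves the normal equations (X'X)b = X'y in plain Python. It is one possible way to get the estimates, not what any particular statistics package does internally (packages usually rely on more numerically stable decompositions). The function names and the data are hypothetical; the data are constructed so that y = 1 + 2*x1 + 3*x2 holds exactly, which makes the recovered coefficients easy to check.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_least_squares(xs, y):
    """Return [b0, b1, ..., bp] minimizing the sum of squared residuals.

    xs is a list of observations, each a list of p predictor values.
    """
    design = [[1.0] + list(row) for row in xs]  # leading 1 = intercept
    k = len(design[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical data built so that y = 1 + 2*x1 + 3*x2 exactly.
xs = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8], [6, 2]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]
b = fit_least_squares(xs, y)
print([round(v, 6) for v in b])
```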


Basic Idea

With multiple regression, we form a 'linear combination' of multiple variables to best predict an outcome, and then we assess the contribution that each predictor variable makes to the equation.

My research question might be: “How much does an independent variable contribute to explaining the dependent variable after the effect of another independent variable is taken into account?”


Doing the Calculations

Computation of the estimates by hand is tedious.

They are ordinarily obtained using a regression computer program.

Standard errors also are usually part of output from a regression program.


Let’s Return to the Example

Construct a 3-D plot.

Come up with a prediction equation for the multiple regression model.


Assessing the Utility of the Model: Hypothesis tests (see MLR handout)

Test if all of the slope parameters are zero: F-test.

Test if a particular slope parameter is zero given that all other x's remain in the model: t-test.


ANOVA: ANalysis Of VAriance

This is a test of the null hypothesis that Multiple R in the population = 0.0. If the p-value is .05 or less, reject the null hypothesis.

For a multiple linear regression model with p independent variables fitted to a data set with n observations, the ANOVA table is:

Source of Variation    DF       SS     MS
Model                  p        SSM    MSM
Error                  n-p-1    SSE    MSE
Total                  n-1      SST


Sums of squares

The sums of squares SSM, SSE, and SST have the same definitions in relation to the model as in simple linear regression:

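$$ SSM = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 $$

$$ SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

$$ SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 $$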


SST = SSM + SSE

The value of SST does not change with the model. It depends only on the values of the dependent variable y.

SSE never increases (and typically decreases) as variables are added to a model, and SSM increases by the same amount.

This amount of increase in SSM is the amount of variation due to variables in the larger model that was not accounted for by variables in the smaller model.
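The decomposition SST = SSM + SSE can be checked numerically. The sketch below uses hypothetical data and a one-predictor least-squares line (with intercept) to produce the fitted values; the identity holds in the same way for fitted values from a larger least-squares model. The function name is illustrative.

```python
def sums_of_squares(y, y_hat):
    """Return (SSM, SSE, SST) for observed y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    ssm = sum((f - y_bar) ** 2 for f in y_hat)
    sse = sum((o - f) ** 2 for o, f in zip(y, y_hat))
    sst = sum((o - y_bar) ** 2 for o in y)
    return ssm, sse, sst

# Hypothetical data; fitted values come from a least-squares line
# with an intercept, which is what makes SST = SSM + SSE hold.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
      / sum((a - x_bar) ** 2 for a in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * a for a in x]

ssm, sse, sst = sums_of_squares(y, y_hat)
print(f"SSM = {ssm:.2f}, SSE = {sse:.2f}, SST = {sst:.2f}")
```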


F statistic

F is the statistic to test if ALL the slope parameters are zero.

• ANOVA gives F statistic and p-value (be sure to set the α level)

• Under the null hypothesis

$$ H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0 $$

the F statistic

$$ F = \frac{MSM}{MSE} $$

has an F(p, n-p-1) distribution and the p-value is ___. According to this distribution, the chance of obtaining an F statistic of __ or larger is _(p-value). We conclude that the model is useful/not useful for predicting…
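As a small arithmetic sketch, the F statistic can be assembled from the ANOVA quantities as follows. The sums of squares, n, and p below are hypothetical, and `f_statistic` is an illustrative name rather than a library function.

```python
def f_statistic(ssm, sse, n, p):
    """Return (F, df_model, df_error) for the overall F test."""
    df_model, df_error = p, n - p - 1
    if df_error <= 0:
        raise ValueError("need n > p + 1 observations")
    msm = ssm / df_model   # mean square for the model
    mse = sse / df_error   # mean square error
    return msm / mse, df_model, df_error

# Hypothetical sums of squares for n = 10 observations, p = 2 predictors.
f_value, df1, df2 = f_statistic(ssm=80.0, sse=20.0, n=10, p=2)
print(f"F({df1}, {df2}) = {f_value:.2f}")
```

The p-value is the upper-tail area of the F(p, n-p-1) distribution beyond the observed F; if SciPy happens to be installed, `scipy.stats.f.sf(f_value, df1, df2)` returns that area.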


Proceed only if F and corresponding p-value indicate sufficient evidence that the overall model is useful

If so, look to the individual variables to determine their contribution

We do this with t-tests: a p-value of .05 or less for a variable indicates a significant contribution.


Interpreting coefficients

Constant = intercept

Other coefficients are the regression coefficients, interpreted as the change in the mean of the dependent variable for each unit change in the corresponding independent variable, all other variables held constant.
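The "all other variables held constant" reading follows directly from the model equation. For a model with two predictors (chosen here only to keep the algebra short), increasing x1 by one unit while x2 stays fixed gives:

```latex
\begin{aligned}
E(y \mid x_1 + 1,\, x_2) - E(y \mid x_1,\, x_2)
  &= \bigl[\beta_0 + \beta_1 (x_1 + 1) + \beta_2 x_2\bigr]
   - \bigl[\beta_0 + \beta_1 x_1 + \beta_2 x_2\bigr] \\
  &= \beta_1 .
\end{aligned}
```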


Confidence Intervals

Use

$$ b_j \pm t^{*}\, SE_{b_j} $$

bj is the least-squares estimate of βj.

t* is the upper (1-C)/2 critical value from the t(n-p-1) distribution, where C is the confidence level.
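A short sketch of the interval arithmetic, assuming the estimate, its standard error, and t* have already been read from regression output and a t table. The estimate and standard error are hypothetical; 2.365 is the tabled upper 0.025 critical value of t(7), which corresponds to C = 0.95 when n - p - 1 = 7.

```python
def coefficient_ci(b_j, se_b_j, t_star):
    """Return (lower, upper) for b_j +/- t* * SE(b_j)."""
    margin = t_star * se_b_j
    return b_j - margin, b_j + margin

# Hypothetical estimate and standard error; t* = 2.365 is the tabled
# upper 0.025 critical value of t(7), i.e. C = 0.95 with n - p - 1 = 7.
lower, upper = coefficient_ci(b_j=1.50, se_b_j=0.40, t_star=2.365)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

An interval that excludes zero agrees with rejecting H0: βj = 0 in the two-sided t-test at level 1 - C.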


Returning to our example…

How good is the model?

Which variables contribute to the model?


What if the Relationship is Curvilinear?

Example: Application journal for chapter 4 (data- Chapter 4: Curvilinear Relationship)

Explore the relationship between IgG (y) as a function of maximal oxygen uptake (x).

Does a linear or curvilinear model better explain the variation in IgG? How do you determine this?


Basic Quadratic Model

E(y) = β0 + β1x + β2x²

• β0 is the y-intercept of the curve; value of E(y) when x = 0

• β1 is the shift parameter; changing the value of β1 shifts the parabola to the left or right (the vertex sits at x = −β1/(2β2), so for a mound-shaped curve with β2 < 0, increasing β1 shifts it to the right)

• β2 is the rate of curvature
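The role of β1 as a shift parameter can be seen by locating the turning point of the curve. Setting the derivative of E(y) to zero (assuming β2 ≠ 0) gives the position and height of the vertex:

```latex
\frac{dE(y)}{dx} = \beta_1 + 2\beta_2 x = 0
\quad\Longrightarrow\quad
x^{*} = -\frac{\beta_1}{2\beta_2},
\qquad
E(y)\big|_{x = x^{*}} = \beta_0 - \frac{\beta_1^{2}}{4\beta_2}.
```

So a change in β1 moves the vertex horizontally, in a direction that depends on the sign of β2, and also changes its height.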


Interpreting the Coefficient (β) Estimates

Estimate of β0 can only be meaningfully interpreted if the sampled range of the independent variable includes zero.

The estimated coefficient of the first-order term no longer represents the slope and typically cannot be meaningfully interpreted.

The sign of the coefficient associated with the quadratic term (x²) indicates whether the curve is concave downward (mound-shaped, negative sign) or concave upward (bowl-shaped, positive sign).

What is the prediction equation, and how would you interpret the βs for the example?


Assessing Model Utility

Again, refer to the F test statistic and associated p-value. If these indicate that the model is useful, proceed to the t-test of the β associated with the quadratic term (x²), β2 here.

H0: β2 = 0 (no curvature in response curve)

Ha: β2 < 0 (downward concavity exists)

Or Ha: β2 > 0 (upward concavity exists)

This is a one-tailed test, so we divide the two-sided p-value reported by the software by 2, provided the estimate of β2 has the sign stated in Ha.

We do not need to consider the test statistics for the coefficients associated with the y-intercept and first-order term(s).
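The halving rule for the one-tailed test can be written as a small helper. This is a sketch that assumes the software reports a two-sided p-value along with the t statistic; the function name and the example numbers are hypothetical. When the estimate's sign contradicts Ha, the one-sided p-value is 1 minus half the two-sided value rather than half of it.

```python
def one_sided_p(t_stat, p_two_sided, alternative):
    """Convert a two-sided p-value for H0: beta2 = 0 to a one-sided one.

    alternative is "less" (Ha: beta2 < 0) or "greater" (Ha: beta2 > 0).
    """
    if alternative not in ("less", "greater"):
        raise ValueError("alternative must be 'less' or 'greater'")
    agrees = (t_stat < 0) if alternative == "less" else (t_stat > 0)
    half = p_two_sided / 2
    return half if agrees else 1 - half

# Hypothetical software output: t = -2.8 with two-sided p = 0.026.
print(one_sided_p(-2.8, 0.026, "less"))     # estimate agrees with Ha
print(one_sided_p(-2.8, 0.026, "greater"))  # estimate contradicts Ha
```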


What if I have a Qualitative Independent Variable?

Create a “dummy” variable (indicator variable.)

Instructions included on Minitab worksheet.

Example: Application journal # 3 (data- Chapter 4: Dummy Variable)

Create a dummy variable for repellent type.

Is repellent type useful for predicting cost per use? Number of hours of protection?
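Outside Minitab, creating a dummy variable amounts to recoding the category labels as 0 and 1. The sketch below uses invented repellent types ("lotion", "spray") and invented cost values, since the worksheet data are not part of this transcript. It also illustrates a known property of a model with a single 0/1 predictor: the least-squares intercept is the baseline group's mean and the slope is the difference between the two group means.

```python
def make_dummy(labels, level):
    """Return 1 where the label equals `level`, else 0."""
    return [1 if lab == level else 0 for lab in labels]

def mean(values):
    return sum(values) / len(values)

# Hypothetical repellent types and cost-per-use values, for illustration.
repellent_type = ["lotion", "spray", "spray", "lotion", "spray", "lotion"]
cost_per_use = [1.0, 2.0, 2.5, 1.5, 3.0, 0.5]

x = make_dummy(repellent_type, level="spray")  # lotion is the baseline (0)

# With a single 0/1 predictor, the least-squares fit reduces to group means:
# b0 = mean of the baseline group, b1 = difference between group means.
b0 = mean([c for c, d in zip(cost_per_use, x) if d == 0])
b1 = mean([c for c, d in zip(cost_per_use, x) if d == 1]) - b0
print(x, b0, b1)
```

Testing whether repellent type is useful for prediction is then the t-test (or overall F-test) on the coefficient of the dummy variable.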


What if the relationship between E(y) and any one IV depends on the value of another IV?

In this case, the two independent variables interact, and we model this with a cross-product of the IVs.


Example: Graph and interpret the following findings

Let’s say we want to study how hard students work on tests. We have some achievement-oriented students and some achievement-avoiders. We create two random halves in each sample, and give half of each sample a challenging test, the other an easy test. We measure how hard the students work on the test. The means of this study are:

                   Achievement-oriented (n=100)    Achievement-avoiders (n=100)
Challenging test   10                              5
Easy test          5                               10


Caution!

Once an interaction has been deemed important in a model, all associated first-order terms should be kept in the model, regardless of the magnitude of their p-values.


Conclusions

E(y) = β0 + β1x1 + β2x2 + β3x1x2

The effect of test difficulty (x1) on effort (y) depends on a student’s achievement orientation (x2).

Thus, the type of achievement orientation and test difficulty interact in their effect on effort.

This is an example of a two-way interaction between achievement orientation and test difficulty.