
Page 1

Copyright © Cengage Learning. All rights reserved.

13 Nonlinear and Multiple Regression

Page 2

13.2 Regression with Transformed Variables

Page 3

Regression with Transformed Variables

The necessity for an alternative to the linear model Y = β₀ + β₁x + ε may be suggested either by a theoretical argument or else by examining diagnostic plots from a linear regression analysis.

In either case, settling on a model whose parameters can be easily estimated is desirable. An important class of such models is specified by means of functions that are “intrinsically linear.”

Page 4

Regression with Transformed Variables

Definition: A function relating y to x is intrinsically linear if, by means of a transformation on x and/or y, the function can be expressed as y′ = β₀ + β₁x′, where x′ = the transformed independent variable and y′ = the transformed dependent variable.

Page 5

Regression with Transformed Variables

Four of the most useful intrinsically linear functions are given in Table 13.1.

In each case, the appropriate transformation is either a log transformation—either base 10 or natural logarithm (base e)—or a reciprocal transformation.

Table 13.1  Useful Intrinsically Linear Functions

  Function                         Transformation(s)             Linear Form
  a. Exponential: y = αe^(βx)      y′ = ln(y)                    y′ = ln(α) + βx
  b. Power: y = αx^β               y′ = log(y), x′ = log(x)      y′ = log(α) + βx′
  c. y = α + β log(x)              x′ = log(x)                   y = α + βx′
  d. Reciprocal: y = α + β(1/x)    x′ = 1/x                      y = α + βx′

Page 6

Regression with Transformed Variables

Representative graphs of the four functions appear in Figure 13.3.

Figure 13.3  Graphs of the intrinsically linear functions given in Table 13.1

Page 7

Regression with Transformed Variables

For an exponential function relationship, only y is transformed to achieve linearity, whereas for a power function relationship, both x and y are transformed.

Because the variable x is in the exponent in an exponential relationship, y increases (if β > 0) or decreases (if β < 0) much more rapidly as x increases than is the case for the power function, though over a short interval of x values it can be difficult to differentiate between the two functions.

Examples of functions that are not intrinsically linear are y = α + γe^(βx) and y = α + γx^β.

Page 8

Regression with Transformed Variables

Intrinsically linear functions lead directly to probabilistic models that, though not linear in x as a function, have parameters whose values are easily estimated using ordinary least squares.

Definition: A probabilistic model relating Y to x is intrinsically linear if, by means of a transformation on Y and/or x, it can be reduced to a linear probabilistic model Y′ = β₀ + β₁x′ + ε′.

Page 9

Regression with Transformed Variables

The intrinsically linear probabilistic models that correspond to the four functions of Table 13.1 are as follows:

a. Y = αe^(βx) · ε, a multiplicative exponential model, from which ln(Y) = Y′ = β₀ + β₁x′ + ε′ with x′ = x, β₀ = ln(α), β₁ = β, and ε′ = ln(ε).

Table 13.1  Useful Intrinsically Linear Functions

Page 10

Regression with Transformed Variables

b. Y = αx^β · ε, a multiplicative power model, so that log(Y) = Y′ = β₀ + β₁x′ + ε′ with x′ = log(x), β₀ = log(α), β₁ = β, and ε′ = log(ε).

c. Y = α + β log(x) + ε, so that x′ = log(x) immediately linearizes the model.

d. Y = α + β(1/x) + ε, so that x′ = 1/x yields a linear model.

The additive exponential and power models, Y = αe^(βx) + ε and Y = αx^β + ε, are not intrinsically linear.
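As a concrete illustration of case (b), here is a minimal Python sketch of fitting a multiplicative power model by transforming both variables and running ordinary least squares on (x′, y′). The data are synthetic and the names (x, y, alpha_hat, beta_hat) are only illustrative, not from the text.

```python
import numpy as np

# Synthetic data from y = alpha * x**beta with multiplicative error
# (alpha = 2000 and beta = -1.5 are arbitrary illustration values).
rng = np.random.default_rng(0)
x = np.linspace(100.0, 1000.0, 12)
y = 2000.0 * x ** -1.5 * np.exp(rng.normal(0.0, 0.1, x.size))

# Transform both variables; the power model becomes a simple linear regression
# ln(y) = ln(alpha) + beta * ln(x) + ln(eps).
x_prime, y_prime = np.log(x), np.log(y)
b1_hat, b0_hat = np.polyfit(x_prime, y_prime, 1)   # slope first, then intercept

beta_hat = b1_hat              # the exponent beta is the slope of the transformed fit
alpha_hat = np.exp(b0_hat)     # transform the intercept back to recover alpha
print(f"estimated function: {alpha_hat:.3g} * x**({beta_hat:.3f})")
```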

Page 11

Regression with Transformed Variables

Notice that both (a) and (b) require a transformation on Y and, as a result, a transformation on the error variable ε.

In fact, if ε has a lognormal distribution, so that ε′ = ln(ε) is normally distributed with mean 0 and variance σ² independent of x, then the transformed models for both (a) and (b) will satisfy all the assumptions of the linear probabilistic model; this in turn implies that all inferences for the parameters of the transformed model based on these assumptions will be valid.

If σ² is small, μ_{Y·x} ≈ αe^(βx) in (a) or αx^β in (b).

Page 12

Regression with Transformed Variables

The major advantage of an intrinsically linear model is that the parameters β₀ and β₁ of the transformed model can be immediately estimated using the principle of least squares simply by substituting x′ and y′ into the estimating formulas:

β̂₁ = [Σx′ᵢy′ᵢ − (Σx′ᵢ)(Σy′ᵢ)/n] / [Σx′ᵢ² − (Σx′ᵢ)²/n],    β̂₀ = ȳ′ − β̂₁x̄′        (13.5)

Parameters of the original nonlinear model can then be estimated by transforming β̂₀ and/or β̂₁ back if necessary.
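For readers who want to evaluate (13.5) directly, here is a minimal Python sketch of the estimating formulas applied to arrays of transformed values; the function name and arguments are illustrative.

```python
import numpy as np

def least_squares_transformed(x_prime, y_prime):
    """Estimate beta0 and beta1 of y' = beta0 + beta1*x' using the formulas in (13.5)."""
    x_prime = np.asarray(x_prime, dtype=float)
    y_prime = np.asarray(y_prime, dtype=float)
    n = x_prime.size
    sxy = np.sum(x_prime * y_prime) - np.sum(x_prime) * np.sum(y_prime) / n
    sxx = np.sum(x_prime ** 2) - np.sum(x_prime) ** 2 / n
    b1 = sxy / sxx                              # beta1-hat
    b0 = y_prime.mean() - b1 * x_prime.mean()   # beta0-hat
    return b0, b1
```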

Page 13

Regression with Transformed Variables

Once a prediction interval for y′ at a particular value of x′ has been calculated, reversing the transformation gives a PI for y itself.

In cases (a) and (b), when σ² is small, an approximate CI for μ_{Y·x} results from taking antilogs of the limits in the CI for β₀ + β₁x′. (Strictly speaking, taking antilogs gives a CI for the median of the Y distribution, i.e., for μ̃_{Y·x}. Because the lognormal distribution is positively skewed, μ_{Y·x} > μ̃_{Y·x}; the two are approximately equal if σ² is close to 0.)

Page 14

Example 3

Taylor's equation for tool life y as a function of cutting time x states that xy^c = k or, equivalently, that y = αx^β.

The article "The Effect of Experimental Error on the Determination of Optimum Metal Cutting Conditions" (J. of Engr. for Industry, 1967: 315–322) observes that the relationship is not exact (deterministic) and that the parameters α and β must be estimated from data.

Page 15

Example 3

Thus an appropriate model is the multiplicative power model Y = αx^β · ε, which the author fit to the accompanying data consisting of 12 carbide tool life observations (Table 13.2).

cont’d

Table 13.2  Data for Example 3

Page 16

Example 3

In addition to the x, y, x′, and y′ values, the predicted transformed values (ŷ′) and the predicted values on the original scale (ŷ, after transforming back) are given.

The summary statistics for fitting a straight line to the transformed data are Σx′ᵢ = 74.41200, Σy′ᵢ = 26.22601, Σx′ᵢ² = 461.75874, Σy′ᵢ² = 67.74609, and Σx′ᵢy′ᵢ = 160.84601, so, from (13.5),

β̂₁ = [160.84601 − (74.41200)(26.22601)/12] / [461.75874 − (74.41200)²/12] = −5.3996

β̂₀ = 26.22601/12 − (−5.3996)(74.41200/12) = 35.6684

cont’d

Page 17

Example 3

The estimated values of α and β, the parameters of the power function model, are β̂ = β̂₁ = −5.3996 and α̂ = e^β̂₀ = e^35.6684 ≈ 3.094 × 10¹⁵.

cont’d

Page 18

Example 3

Thus the estimated regression function is 3.094491530 × 10¹⁵ · x^(−5.3996).

To recapture Taylor's (estimated) equation, set y = 3.094491530 × 10¹⁵ · x^(−5.3996), whence xy^.185 = 740.

cont’d
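The conversion back to Taylor's form can be checked numerically. This short sketch uses the estimates reported above; it reproduces c ≈ .185 and k ≈ 740, with small differences attributable to rounding in the reported estimates.

```python
alpha_hat = 3.094491530e15   # estimated alpha from the fitted power function
beta_hat = -5.3996           # estimated beta (the exponent)

# y = alpha * x**beta is equivalent to x * y**c = k with c = -1/beta and k = alpha**c.
c = -1.0 / beta_hat
k = alpha_hat ** c
print(c, k)   # roughly 0.185 and about 740, as stated in the text (up to rounding)
```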

Page 19

Example 3

Figure 13.4(a) gives a plot of the standardized residuals from the linear regression using transformed variables (for which r² = .922); there is no apparent pattern in the plot, though one standardized residual is a bit large, and the residuals look as they should for a simple linear regression.

cont’d

Figure 13.4(a)  Standardized residuals versus x from Example 3

Page 20

Example 3

Figure 13.4(b) pictures a plot of ŷ versus y, which indicates satisfactory predictions on the original scale.

cont’d

Figure 13.4(b)  ŷ versus y from Example 3

Page 21

Example 3

To obtain a confidence interval for median tool life when cutting time is 500, we transform x = 500 to x′ = ln(500) = 6.21461. Then β̂₀ + β̂₁(6.21461) = 2.1120, and a 95% CI for β₀ + β₁(6.21461) is 2.1120 ± (2.228)(.0824) = (1.928, 2.296).

The 95% CI for the median μ̃_{Y·500} is then obtained by taking antilogs: (e^1.928, e^2.296) = (6.876, 9.930).

It is easily checked that for the transformed data s² ≈ .081. Because this is quite small, (6.876, 9.930) is also an approximate interval for μ_{Y·500}.

cont’d
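The interval arithmetic above can be reproduced directly. This sketch plugs in the values reported in the example (estimate 2.1120, t critical value 2.228 for 10 df, standard error .0824); the variable names are illustrative.

```python
import math

y_prime_hat = 2.1120   # estimated mean of ln(tool life) at x' = ln(500)
t_crit = 2.228         # t critical value, 95% CI, n - 2 = 10 df
se = 0.0824            # estimated standard error of y_prime_hat

lo, hi = y_prime_hat - t_crit * se, y_prime_hat + t_crit * se
print(lo, hi)                        # transformed-scale CI, approximately (1.928, 2.296)
print(math.exp(lo), math.exp(hi))    # about (6.88, 9.93); the text's (6.876, 9.930) uses the rounded limits
```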

Page 22

More General Regression Methods

Page 23

More General Regression Methods

Thus far we have assumed that either Y = f(x) + ε (an additive model) or that Y = f(x) · ε (a multiplicative model). In the case of an additive model, μ_{Y·x} = f(x), so estimating the regression function f(x) amounts to estimating the curve of mean y values.

On occasion, a scatter plot of the data suggests that there is no simple mathematical expression for f (x).

Statisticians have recently developed some more flexible methods that permit a wide variety of patterns to be modeled using the same fitting procedure.

Page 24

More General Regression Methods

One such method is LOWESS (or LOESS), short for locally weighted scatter plot smoother. Let (x*, y*) denote a particular one of the n (x, y) pairs in the sample.

The ŷ value corresponding to (x*, y*) is obtained by fitting a straight line using only a specified percentage of the data (e.g., 25%) whose x values are closest to x*.

Furthermore, rather than use "ordinary" least squares, which gives equal weight to all points, those with x values closer to x* are more heavily weighted than those whose x values are farther away.

Page 25

More General Regression Methods

The height of the resulting line above x* is the fitted value ŷ*. This process is repeated for each of the n points, so n different lines are fit (you surely wouldn't want to do all this by hand).

Finally, the fitted points are connected to produce a LOWESS curve.
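A LOWESS fit of this kind can also be produced outside Minitab. The following Python sketch uses the lowess smoother from statsmodels on synthetic data; frac plays the role of the span, and all variable names are illustrative.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

# Synthetic (x, y) data standing in for a sample; any scatter plot data would work here.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(20.0, 60.0, size=100))
y = 0.1 * x ** 2 + rng.normal(scale=10.0, size=x.size)

# frac is the span: the fraction of the sample (here 50%) used for each locally weighted fit.
smoothed = lowess(y, x, frac=0.5)            # array of (x, fitted y) pairs, sorted by x
x_fit, y_fit = smoothed[:, 0], smoothed[:, 1]
# Connecting the (x_fit, y_fit) points traces out the LOWESS curve.
```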

Page 26

Example 5

Weighing large deceased animals found in wilderness areas is usually not feasible, so it is desirable to have a method for estimating weight from various characteristics of an animal that can be easily determined.

Minitab has a stored data set consisting of various characteristics for a sample of n = 143 wild bears.

Page 27

Example 5

Figure 13.7(a) displays a scatter plot of y = weight versus x = distance around the chest (chest girth).

Figure 13.7(a)  A Minitab scatter plot for the bear weight data

cont’d

Page 28

Example 5

At first glance, it looks as though a single line obtained from ordinary least squares would effectively summarize the pattern. Figure 13.7(b) shows the LOWESS curve produced by Minitab using a span of 50% [the fit at (x*, y*) is determined by the closest 50% of the sample].

cont’d

Figure 13.7(b)  A Minitab LOWESS curve for the bear weight data

Page 29

Example 5

The curve appears to consist of two straight line segments joined together above approximately x = 38.

The steeper line is to the right of 38, indicating that weight tends to increase more rapidly as girth does for girths exceeding 38 in.

cont’d

Page 30

Logistic Regression

Page 31

Logistic Regression

The simple linear regression model is appropriate for relating a quantitative response variable to a quantitative predictor x.

Consider now a dichotomous response variable with possible values 1 and 0 corresponding to success and failure.

Let p = P(S) = P(Y = 1). Frequently, the value of p will depend on the value of some quantitative variable x.

Page 32

Logistic Regression

For example, the probability that a car needs warranty service of a certain kind might well depend on the car's mileage, or the probability of avoiding an infection of a certain type might depend on the dosage in an inoculation.

Instead of using just the symbol p for the success probability, we now use p(x) to emphasize the dependence of this probability on the value of x. The simple linear regression equation Y = β₀ + β₁x + ε is no longer appropriate, for taking the mean value on each side of the equation gives μ_{Y·x} = β₀ + β₁x; since μ_{Y·x} = 1 · p(x) + 0 · [1 − p(x)] = p(x), this would force p(x) = β₀ + β₁x.

Page 33

Logistic Regression

Whereas p(x) is a probability and therefore must be between 0 and 1, β₀ + β₁x need not be in this range.

Instead of letting the mean value of Y be a linear function of x, we now consider a model in which some function of the mean value of Y is a linear function of x.

In other words, we allow p(x) to be a function of β₀ + β₁x rather than β₀ + β₁x itself. A function that has been found quite useful in many applications is the logit function

p(x) = e^(β₀ + β₁x) / [1 + e^(β₀ + β₁x)]

Page 34

Logistic Regression

Figure 13.8 shows a graph of p(x) for particular values of β₀ and β₁ with β₁ > 0.

Figure 13.8  A graph of a logit function

Page 35

Logistic Regression

As x increases, the probability of success increases. For β₁ negative, the success probability would be a decreasing function of x.

Logistic regression means assuming that p(x) is related to x by the logit function. Straightforward algebra shows that

p(x) / [1 − p(x)] = e^(β₀ + β₁x)

The expression on the left-hand side is called the odds. If, for example, p(60) / [1 − p(60)] = 3, then when x = 60 a success is three times as likely as a failure.

Page 36

Logistic Regression

We now see that the logarithm of the odds, ln{p(x) / [1 − p(x)]} = β₀ + β₁x, is a linear function of the predictor.

In particular, the slope parameter β₁ is the change in the log odds associated with a one-unit increase in x.

This implies that the odds itself changes by the multiplicative factor e^(β₁) when x increases by 1 unit.
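The logit, odds, and odds-ratio relationships can be made concrete with a short sketch. The parameter values below are hypothetical, chosen only to illustrate that the odds ratio for a one-unit increase in x equals e^(β₁).

```python
import numpy as np

def logit_p(x, b0, b1):
    """Success probability under the logit model: e^(b0+b1*x) / (1 + e^(b0+b1*x))."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))

b0, b1 = -6.0, 0.1                     # hypothetical parameter values
p60 = logit_p(60.0, b0, b1)
odds_60 = p60 / (1.0 - p60)            # equals e^(b0 + b1*60)
p61 = logit_p(61.0, b0, b1)
odds_61 = p61 / (1.0 - p61)
print(odds_61 / odds_60, np.exp(b1))   # both equal e^(b1): the odds ratio for a one-unit increase
```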

Page 37

Logistic Regression

Fitting the logistic regression to sample data requires that the parameters β₀ and β₁ be estimated. This is usually done using the maximum likelihood technique.

The details are quite involved, but fortunately the most popular statistical computer packages will do this on request and provide quantitative and pictorial indications of how well the model fits.
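For instance, one widely used route in Python is maximum likelihood fitting with statsmodels. The data below are simulated and the parameter values are arbitrary; this is only a sketch of how such a fit is requested, not output from the text's examples.

```python
import numpy as np
import statsmodels.api as sm

# Simulated data: x is a quantitative predictor, y is a 0/1 response.
rng = np.random.default_rng(2)
x = rng.uniform(50.0, 85.0, size=200)
p = 1.0 / (1.0 + np.exp(-(-10.0 + 0.15 * x)))   # an arbitrary "true" logit relationship
y = rng.binomial(1, p)

X = sm.add_constant(x)                 # design matrix with an intercept column
fit = sm.Logit(y, X).fit(disp=False)   # maximum likelihood estimation
print(fit.params)                      # estimates of beta0 and beta1
print(fit.bse)                         # their estimated standard deviations
```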

Page 38

Example 6

Here is data, in the form of a comparative stem-and-leaf display, on launch temperature and the incidence of failure of O-rings in 23 space shuttle launches prior to the Challenger disaster of 1986 (Y = yes, failed; N = no, did not fail).

Observations on the left side of the display tend to be smaller than those on the right side.

[Comparative stem-and-leaf display of launch temperatures for the failed (Y) and non-failed (N) launches. Stem: tens digit; leaf: ones digit.]

Page 39

Example 6

Figure 13.9 shows Minitab output for a logistic regression analysis and a graph of the estimated logit function from the R software.

Figure 13.9  (a) Logistic regression output from Minitab; (b) graph of estimated logistic function and classification probabilities from R

cont’d

Page 40

Example 6

We have chosen to let p denote the probability of failure. The graph of p̂(x) decreases as temperature increases because failures tended to occur at lower temperatures than did successes.

The estimate of β₁ and its estimated standard deviation are β̂₁ = −.232 and s_β̂₁ = .1082, respectively.

We assume that the sample size n is large enough here so that β̂₁ has approximately a normal distribution.

cont’d

Page 41

Example 6

If β₁ = 0 (i.e., temperature does not affect the likelihood of O-ring failure), the test statistic Z = β̂₁ / s_β̂₁ has approximately a standard normal distribution.

The reported value of this ratio is z = −2.14, with a corresponding two-tailed P-value of .032 (some packages report a chi-square value, which is just z², with the same P-value).

At significance level .05, we reject the null hypothesis of no temperature effect.

cont’d
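The z ratio and its two-tailed P-value can be verified from the reported estimate and standard error. This sketch assumes SciPy is available for the normal CDF.

```python
from scipy.stats import norm

b1_hat, se_b1 = -0.232, 0.1082
z = b1_hat / se_b1
p_value = 2 * norm.cdf(-abs(z))
print(round(z, 2), round(p_value, 3))   # about -2.14 and 0.032, matching the reported output
```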

Page 42

Example 6

The estimated odds of failure for any particular temperature value x is

p̂(x) / [1 − p̂(x)] = e^(β̂₀ + β̂₁x)

This implies that the odds ratio (the odds of failure at a temperature of x + 1 divided by the odds of failure at a temperature of x) is

e^(β̂₁) = e^(−.232) = .793

cont’d

Page 43

Example 6

The interpretation is that for each additional degree of temperature, we estimate that the odds of failure will decrease by a factor of .79 (21%). A 95% CI for the true odds ratio also appears on the output.

In addition, Minitab provides three different ways of assessing model lack-of-fit: the Pearson, deviance, and Hosmer-Lemeshow tests. Large P-values are consistent with a good model.

cont’d

Page 44

Example 6

These tests are useful in multiple logistic regression, where there is more than one predictor in the model relationship, so there is no single graph like that of Figure 13.9(b).

cont’d

Figure 13.9(b)  Graph of estimated logistic function and classification probabilities from R

Page 45

Example 6

Various diagnostic plots are also available. The R output provides information based on classifying an observation as a failure if the estimated p(x) is at least .5 and as a non-failure otherwise.

Since p̂(x) = .5 when x = 64.80, three of the seven failures (Ys in the graph) would be misclassified as non-failures (a misclassification proportion of .429), whereas none of the non-failure observations would be misclassified.

cont’d

Page 46

Example 6

A better way to assess the likelihood of misclassification is to use cross-validation: Remove the first observation from the sample, estimate the relationship, then classify the first observation based on this estimated relationship, and repeat this process with each of the other sample observations (so a sample observation does not affect its own classification).
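A minimal sketch of this leave-one-out idea in Python with statsmodels is shown below; it assumes the 23 launch temperatures and 0/1 failure indicators are available in arrays. The names temps and failed (and the function itself) are hypothetical placeholders, not objects from the text.

```python
import numpy as np
import statsmodels.api as sm

def loo_misclassification(x, y, cutoff=0.5):
    """Leave-one-out cross-validated misclassification rate for simple logistic regression.

    Each observation is classified from a model fit to the other n - 1 observations,
    so no observation influences its own classification.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    errors = 0
    for i in range(x.size):
        keep = np.arange(x.size) != i
        fit = sm.Logit(y[keep], sm.add_constant(x[keep])).fit(disp=False)
        p_i = fit.predict(np.array([[1.0, x[i]]]))[0]   # estimated p(x_i) from the held-out fit
        errors += int((p_i >= cutoff) != bool(y[i]))    # classify as failure if p >= cutoff
    return errors / x.size

# Usage (hypothetical arrays):
# rate = loo_misclassification(temps, failed)
```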

The launch temperature for the Challenger mission was only 31°F. This temperature is much smaller than any value in the sample, so it is dangerous to extrapolate the estimated relationship. Nevertheless, it appears that O-ring failure is virtually a sure thing for a temperature this small.

cont’d