example: boats and manatees - florida atlantic...


Upload: trinhbao

Post on 21-Apr-2018


Slide 1

Example: Boats and Manatees

Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant linear correlation between the number of registered boats and the number of manatees killed by boats. Use method 2.

Using the same procedure previously illustrated, we find that r = 0.922.

Method 2: Referring to Table A-6, we conclude that there is a significant linear correlation between the number of registered boats and the number of manatee deaths from boats.

Figure 9-6

Slide 2

Use method 1

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.922}{\sqrt{\dfrac{1 - 0.922^2}{10 - 2}}} = 6.735$$

Slide 3

Using either of the two methods, we find that the absolute value of the test statistic exceeds the critical value:

Method 1: 6.735 > 2.306.

Method 2: 0.922 > 0.632.

That is, the test statistic falls in the critical region.

Conclusion:

We therefore reject the null hypothesis. There is sufficient evidence at significance level 0.05 to support the claim of a linear correlation between the number of registered boats and the number of manatee deaths from boats.
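The two decision rules can be checked numerically. The following is a minimal sketch that takes r = 0.922, n = 10, and the two critical values (2.306 and 0.632) from the slides; the function name `t_statistic` is only illustrative.

```python
import math

def t_statistic(r: float, n: int) -> float:
    """Method 1 test statistic: t = r / sqrt((1 - r^2) / (n - 2))."""
    return r / math.sqrt((1 - r**2) / (n - 2))

r, n = 0.922, 10        # values from the manatee example
t_critical = 2.306      # critical t value quoted on the slide (df = n - 2 = 8)
r_critical = 0.632      # critical r value quoted on the slide (Table A-6, n = 10)

t = t_statistic(r, n)
reject_by_method_1 = abs(t) > t_critical   # compare |t| with the critical t
reject_by_method_2 = abs(r) > r_critical   # compare |r| with the critical r

print(f"t = {t:.3f}")
print("Method 1 rejects H0:", reject_by_method_1)
print("Method 2 rejects H0:", reject_by_method_2)
```

Both comparisons lead to the same decision, which is why either method may be used.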

Slide 4

Figure 9-4 Testing for a Linear Correlation

Slide 5

Interpreting r: Explained Variation

The value of $r^2$ is the proportion of the variation in y that is explained by the linear relationship between x and y.

Manatee example: With r = 0.922, we get $r^2 = 0.850$.

We conclude that 0.850 (or about 85%) of the variation in manatee deaths can be explained by the linear relationship between the number of boat registrations and the number of manatee deaths from boats. This implies that 15% of the variation of manatee deaths cannot be explained by the number of boat registrations.
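As a quick check of this arithmetic, here is a small sketch using the r value from the manatee example (the variable names are illustrative):

```python
r = 0.922                    # linear correlation coefficient from the example

r_squared = r ** 2           # proportion of variation in y explained by the line
unexplained = 1 - r_squared  # proportion left unexplained

print(f"explained:   {r_squared:.3f}")
print(f"unexplained: {unexplained:.3f}")
```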

Slide 6

9.3 Regression

Regression Equation

Variables:

X = independent variable, predictor variable, or explanatory variable

Y = dependent variable or response variable

A straight line (linear relationship):

Y = a X + b,

where a is the slope and b is the y-intercept.

Slide 7

Assumptions

For each x-value,

• Normality: Y is a random variable having a normal (bell-shaped) distribution.

• Homogeneity: All of these y distributions have the same variance.

• Linearity: The distribution of y-values has a mean that lies on the regression line.

(Results are not seriously affected if departures from normal distributions and equal variances are not too extreme.)

Slide 8

Regression Equation

Given paired sample data (x, y) of size n satisfying the population parameter equation below:

                                      Population Parameter         Sample Statistic
y-intercept of regression equation    $\beta_0$                    $b_0$
Slope of regression equation          $\beta_1$                    $b_1$
Equation of the regression line       $y = \beta_0 + \beta_1 x$    $\hat{y} = b_0 + b_1 x$

Regression line: $\hat{y} = b_0 + b_1 x$, where $b_0$ estimates $\beta_0$ and $b_1$ estimates $\beta_1$.

Q: How to get $b_0$ and $b_1$?

Slide 9

Formula for $b_0$ and $b_1$

Formula 9-2:

$$b_1 = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} \quad \text{(slope)}$$

Formula 9-3:

$$b_0 = \bar{y} - b_1 \bar{x} \quad \text{(y-intercept)}$$

Calculators or computers can compute these values.
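To illustrate how a computer might apply Formulas 9-2 and 9-3, here is a minimal sketch in plain Python. The function name `regression_coefficients` and the three sample points are assumptions made for illustration; the points were chosen to lie exactly on the line y = 1 + 2x so the result is easy to verify by eye.

```python
def regression_coefficients(xs, ys):
    """Return (b0, b1) using Formula 9-2 for the slope and Formula 9-3 for the intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired (x, y) values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x-values are equal; the slope is undefined")

    b1 = (n * sum_xy - sum_x * sum_y) / denominator   # Formula 9-2
    b0 = sum_y / n - b1 * sum_x / n                   # Formula 9-3: y-bar - b1 * x-bar
    return b0, b1

# Hypothetical sanity-check data lying exactly on y = 1 + 2x.
b0, b1 = regression_coefficients([0, 1, 2], [1, 3, 5])
print(b0, b1)
```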

Slide 10

The regression line fits the sample points best.

Slide 11

Calculating the Regression Equation

Data:

x    1    1    3    5
y    2    8    6    4

In Section 9-2, we used these values to find that the linear correlation coefficient is $r = -0.135$. Use this sample to find the regression equation.

n = 4, $\sum x = 10$, $\sum y = 20$, $\sum x^2 = 36$, $\sum y^2 = 120$, $\sum xy = 48$

$$b_1 = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} = \frac{4(48) - (10)(20)}{4(36) - (10)^2} = \frac{-8}{44} = -0.181818$$

$$b_0 = \bar{y} - b_1 \bar{x} = 5 - (-0.181818)(2.5) = 5.45$$

The estimated equation of the regression line is: $\hat{y} = 5.45 - 0.182x$
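The arithmetic on this slide can be reproduced directly from the listed sums. The sketch below plugs n = 4, Σx = 10, Σy = 20, Σx² = 36, Σy² = 120, and Σxy = 48 into the slope and intercept formulas, and also recomputes r from the same sums as a consistency check against the value quoted from Section 9-2. The variable names are illustrative.

```python
import math

# Sums taken from the slide.
n = 4
sum_x, sum_y = 10, 20
sum_x2, sum_y2, sum_xy = 36, 120, 48

numerator = n * sum_xy - sum_x * sum_y      # 4(48) - (10)(20) = -8
x_spread = n * sum_x2 - sum_x ** 2          # 4(36) - 10^2 = 44
y_spread = n * sum_y2 - sum_y ** 2          # 4(120) - 20^2 = 80

b1 = numerator / x_spread                   # slope
b0 = sum_y / n - b1 * (sum_x / n)           # intercept: y-bar - b1 * x-bar
r = numerator / (math.sqrt(x_spread) * math.sqrt(y_spread))  # linear correlation

print(f"b1 = {b1:.6f}")
print(f"b0 = {b0:.2f}")
print(f"r  = {r:.3f}")
```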

Slide 12

Example: Boats and Manatees

Slide 13

Example: Boats and Manatees

Given the sample data in Table 9-1, find the regression equation.

Using the same procedure as in the previous example, or a computer, we find that $b_1 = 2.27$ and $b_0 = -113$. The estimated regression equation is:

$$\hat{y} = -113 + 2.27x$$

Slide 14

Predictions

In predicting a value of y based on some given value of x ...

1. If there is not a significant linear correlation, the best predicted y-value is $\bar{y}$.

2. If there is a significant linear correlation, the best predicted y-value is found by substituting the x-value into the regression equation.
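The two-case rule above can be written as a small function. This is only a sketch: the name `best_predicted_y` and the numbers in the usage lines are hypothetical, and the significance decision is passed in as a flag because it comes from a separate hypothesis test.

```python
def best_predicted_y(x, b0, b1, y_bar, significant):
    """Apply the prediction rule: use the regression line only when the
    linear correlation is significant; otherwise fall back to the sample mean of y."""
    if significant:
        return b0 + b1 * x
    return y_bar

# Hypothetical values chosen only to show both branches.
with_correlation = best_predicted_y(x=4, b0=2.0, b1=0.5, y_bar=10.0, significant=True)
without_correlation = best_predicted_y(x=4, b0=2.0, b1=0.5, y_bar=10.0, significant=False)

print(with_correlation)      # value from the regression equation
print(without_correlation)   # the sample mean of y
```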

Slide 15

Figure 9-8 Predicting the Value of a Variable

Slide 16

Revisit Boats and Manatees Example

We do have a significant linear correlation (with r = 0.922).

The best regression equation is $\hat{y} = -113 + 2.27x$.

Assume that in 2001 there were 850,000 registered boats. Because Table 9-1 lists the numbers of registered boats in tens of thousands, this means that for 2001 we have x = 85.

Q: Given that x = 85, find the best predicted value of y, the number of manatee deaths from boats.

Slide 17

Example: Boats and Manatees

$$\hat{y} = -113 + 2.27x = -113 + 2.27(85) \approx 80.0$$

The predicted number of manatee deaths is 80.0. The actual number of manatee deaths in 2001 was 82, so the predicted value of 80.0 is quite close.
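The substitution can be reproduced in a few lines. This sketch uses the rounded coefficients quoted in the example, which give 79.95, the value reported as 80.0; the variable names are illustrative.

```python
b0, b1 = -113, 2.27     # rounded coefficients of the regression equation
x = 85                  # 850,000 registered boats, in tens of thousands
actual_deaths = 82      # actual 2001 count quoted in the example

y_hat = b0 + b1 * x                   # best predicted number of manatee deaths
difference = actual_deaths - y_hat    # observed minus predicted

print(f"predicted: {y_hat:.2f}")
print(f"observed - predicted: {difference:.2f}")
```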

Slide 18

Guidelines for Using the Regression Equation

1. If there is no significant linear correlation, don’t use the regression equation to make predictions.

2. When using the regression equation for predictions, stay within the scope of the available sample data.

3. A regression equation based on old data is not necessarily valid now.

4. Don’t make predictions about a population that is different from the population from which the sample data was drawn.

Slide 19

Definitions

• Marginal Change: The marginal change is the amount that a variable changes when the other variable changes by exactly one unit.

• Outlier: An outlier is a point lying far away from the other data points.

• Influential Points: An influential point strongly affects the graph of the regression line.
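For a regression line, the marginal change in the predicted y is the slope. The sketch below illustrates this with the manatee equation from the earlier example: raising x by one unit (10,000 registered boats) changes the prediction by about 2.27 deaths. The helper name `predict` is an assumption for illustration.

```python
def predict(x, b0=-113, b1=2.27):
    """Predicted y from the manatee regression equation y-hat = -113 + 2.27x."""
    return b0 + b1 * x

# Change in the prediction when x increases by exactly one unit.
marginal_change = predict(86) - predict(85)

print(f"marginal change: {marginal_change:.2f}")
```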

Slide 20

Residuals and the Least-Squares Property

Definitions

• Residual: For a sample of paired (x, y) data, the difference $(y - \hat{y})$ between an observed sample y-value and the value of $\hat{y}$, which is the value of y that is predicted by using the regression equation.

• Least-Squares Property: A straight line satisfies this property if the sum of the squares of the residuals is the smallest sum possible.

Slide 21

Residuals and the Least-Squares Property

x    1    2    4    5
y    4    24   8    32

$$\hat{y} = 5 + 4x$$

Figure 9-9
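The residuals for this data set can be computed directly. The sketch below uses the four (x, y) pairs and the line ŷ = 5 + 4x from this slide, then compares its sum of squared residuals with a few nearby lines. The alternative lines are hypothetical and are included only to illustrate the least-squares property; they are not an exhaustive proof.

```python
xs = [1, 2, 4, 5]
ys = [4, 24, 8, 32]

def sum_squared_residuals(b0, b1):
    """Sum of (y - y_hat)^2 over the sample for the line y_hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Residuals (observed y minus predicted y) for the regression line y_hat = 5 + 4x.
residuals = [y - (5 + 4 * x) for x, y in zip(xs, ys)]
best = sum_squared_residuals(5, 4)

# Hypothetical nearby lines (b0, b1); each should give a larger sum of squares.
alternatives = [(5, 3.5), (5, 4.5), (4, 4), (6, 4)]
alternative_sums = [sum_squared_residuals(b0, b1) for b0, b1 in alternatives]

print("residuals:", residuals)
print("sum of squared residuals:", best)
print("sums for nearby lines:", alternative_sums)
```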