chapter 10 correlation and regression. scatter diagrams and linear correlation

44
Chapter 10 Correlation and Regression

Upload: alexia-williamson

Post on 16-Jan-2016

271 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Chapter 10 Correlation and Regression

Page 2: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

SCATTER DIAGRAMS AND LINEAR CORRELATION

Page 3: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Note:

• Studies of correlation and regression of two variables usually begin with a graph of paired data values (x,y). We call this a scatter diagram

Page 4: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Scatter Plot/Scatter Diagram

• It is a graph in which data pairs (x,y) are plotted as individual points on a grid with horizontal axis x and vertical axis y. We call x the explanatory variable and y the response variable.

Page 5: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Why do we use scatter plot?

• We use scatter plot to observe whether there seems to be a linear relationship between x and y values.

Page 6: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Example:

• Phosphorous is a chemical used in many household and industrial cleaning compound. Unfortunately, phosphorous tends to find its way into surface water, where it can kill fish, plants, and other wetland creatures. Phosphorous reduction programs are required by law and are monitored by the EPA. A random sample of eight sites in a California wetlands study gave the following information about phosphorous reduction in drainage water. X is a random variable that represents phosphorous concentration and y is a random variable that represents total phosphorous concentration.

• Graph the scatter plot and then comment on the relationship between x and yx 5.2 7.3 6.7 5.9 6.1 8.3 5.5 7.0

y 3.3 5.9 4.8 4.5 4.0 7.1 3.6 6.1

Page 7: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Group Work• Here are the safety report of different divisions:

• Make a scatter plot• Does a line fit the data reasonably well?• Draw a line that “fits best”

Division X (hours in safety training)

Y (accidents)

1 10 80

2 19.5 65

3 30 68

4 45 55

5 50 35

6 65 10

7 80 12

Page 8: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Scatter Diagrams with its correlation

Page 9: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Correlation coefficient r• It is a numerical measurement that assesses the strength of a linear relationship

between two variables x and y

• 1. r is a unitless measurement between -1 and 1. . If r=1, there is perfect positive linear correlations. If r=-1, there is perfect negative linear correlation. If r=0, then there is no linear correlation. The closer r is to 1 or -1, the better a line describes the relationship between the two variables x and y.

• 2. Positive values of r imply that as x increases, y tends to increase.• Negative values of r imply that as x increases, y tends to decrease.

• 3. The value of r is the same regardless of which variable is the explanatory variable and which is the response variable. In other words, the value of r is the same for the pairs (x,y) and the corresponding pairs (y,x)

• 4. The value of r does not change when either variable is converted to different units.

Page 10: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

How do you compute the sample correlation coefficient r?

• Obtain a random sample of n data pairs (x,y). The data pairs should have a bivariate normal distribution. This means that for a fixed value of x, the y values should have a normal distribution, and for a fixed y, the x values should have their own normal distribution

• 1. Using the data pairs, compute

• 2. With n=sample size, , you are ready to compute the sample correlation coefficient r using the computation formula

Page 11: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Example: calculate rX 70 115 105 82 93 125 88

y 3 45 21 7 16 62 12

Page 12: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Group work: calculate rX 10 15 16 1 4 6 18 12 14

y 5 2 1 9 7 8 1 5 3

Page 13: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Note:• Usually a scatter diagram does not contain all possible data points

that could be gathered. • r=sample correlation coefficient computed from a random sample of

(x,y) data pairs.• = population correlation coefficient computed from all population

data pairs (x,y)

• In ordered pairs (x,y), x is called the explanatory variable and y is called the response variable. When r indicates a linear correlation between x and y, changes in values of y tend to respond to changes in values of x according to a linear model. A lurking variable is a variable that is neither an explanatory nor a response variable. Yet, a lurking variable may be responsible for changes in both x and y

Page 14: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

TI 83/TI 84

• Enter the data into two columns. Use Stat Plot and choose the first type. Use option 9:ZoomStat under Zoom.

• CATALOG, find DiagnosticON, press enter twice. Then, press STAT, CALC, then option 8:LinReg(a+bx)

Page 15: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Homework Practice

• P503 #1-16 even

Page 16: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

LINEAR REGRESSION AND THE COEFFICIENT OF DETERMINATION

Page 17: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Least-squares criterion

• The sum of the squares of the vertical distances from the data points (x,y) to the line is made as small as possible.

Page 18: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

What does it mean?

• It means that for any given point, the distance from the point to the line has the least amount of error (the sum of the squares of the vertical distance from the points to the line be made as small as possible). We use the least-squares criterion to find the best linear equation.

Page 19: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

How to find the equation of the Least-Squares line

• Slope:

• Y-intercept=

• Another way to calculate b is:

Page 20: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Example: Find the least-squares liney=a+bx

X 30 34 27 25 17 23 20

y 66 79 70 60 48 55 60

Page 21: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Sketch the scatter plot and the least-squares line

Page 22: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Remember

• The point is always on the least-squares line

Page 23: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Using the Least-Squares Line for Prediction

• Predicting values for x values that between observed x values in the data set is called interpolation

• Predicting values of x values that are beyond observed x values in the data set is called extrapolation. Extrapolation may produce unrealistic forecasts.

Page 24: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Group Work

a) Find the least-squares line.

b) Predict when x = 51

X= Super soldier serum y=Captain America

Page 25: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Coefficient of determination• is the coefficient of determination.

• 1) Compute the sample correlation r using the procedure for r. Then simply compute , the sample coefficient of determination

• 2) The value is the ratio of explained variation over total variation. That is, is the fractional amount of total variation in y that can be explained by using the linear model

• 3)Furthermore, is the fractional amount of total variation in y that is due to random chance or to the possibility of lurking variables that influence y.

Page 26: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Homework Practice

• Pg 520 #1-18 eoo

Page 27: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

INFERENCES FOR CORRELATION AND REGRESSION

Page 28: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Sample Statistic to Population Parameter

Sample Statistic Population Parameter

Page 29: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Important Note:To make inferences regarding the population correlation coefficient and the slope of the population least-squares line, we need to be sure that

a) The set (x,y) of ordered pairs is a random sample from the population of all possible such (x,y) pairs.

b) For each fixed value of x, the y values have a normal distribution. All of the y distributions have the same variance, and, for a given x value, the distribution of y values has a mean that lies on the least-squares line. We also assume that for a fixed y, each x has its own normal distribution. In most cases the result are still accurate if the distribution are simply mound-shaped and symmetric, and the y variances are approximately equal.

Page 30: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

How to test the Population Correlation Coefficient

• 1) Use the null hypothesis . In the context of the application, state the alternate hypothesis ( and set the level of significance .

• 2) Obtain a random sample of data pairs (x,y) and compute the sample test statistic

• with degrees of freedom • 3) Use a Student’s t distribution and the type of test, one-

tailed or two-tailed, to find (estimate) the P-value corresponding to the test statistic

• 4) Conclude the test. If P-value If P-value• 5) Interpret your conclusion in the context of the application.

Page 31: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Example:

• Learn more, earn more! We have probably all heard this platitude. The question is whether or not there is some truth in the statement. Do college graduates have an improved chance at a better income? Is there a trend in the general population? Consider the following variables: x=percentage of the population 25 or older with at least four years of college and y = percentage growth in per capita income over the past seven years. A random sample of six communities in Ohio gave the info. Use 1% level of significance

Page 32: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Answer

• First you compute the sample correlation coefficient r.

• with d.f=4• 0.005<P-value<0.010

• Actual P-value=0.0092 so we reject null and conclude that population correlation coefficient between x and y is positive.

Page 33: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Group Work

• A medical research team is studying the effect of a new drug on red blood cells. Let x be a random variable representing milligrams of the drug given to a patient. Let y be a random variable representing red blood cells per cubic milliliter of whole blood. A random sample of n=7 volunteer patients gave the following result.

• Use 1% level of significance to test claim that x 9.2 10.1 9.0 12.5 8.8 9.1 9.5

y 5.0 4.8 4.5 5.7 5.1 4.6 4.2

Page 34: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Standard Error of Estimate

• It is to find the “error” from the data points to the best fit line. You want to minimalize this.

• 1) Obtain a random sample of n data pairs (x,y)

• 2)Use the procedure to find a and b from the sample least-squares line

Page 35: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

TI 83/TI 84

• The value of is given as s under STAT, TEST, Option E:LinREGTTest

Page 36: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

How to Find a Confidence Interval for a predicted y from the Least-Squares Line

• 1) Obtain a random sample of data pairs (x,y)• 2) Use the procedure to find the least-squares

line . You also need to find from the sample data and the standard error of estimate using equation of this section.

• 3) The c confidence interval for y for a specified value x is

Page 37: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Confidence interval Continue.

• is the predicted value of y from the least-squares line for a specified x value

• c=confidence level• n= number of data pairs• critical value from student’s t distribution for

c confidence level using d.f.=n-2 • = standard error of estimate

Page 38: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Example:X Y

10 17

20 21

30 25

40 28

50 33

60 40

70 49

a) Find the least-square equationb) Find rc) Find when x is at 45d) Find the 95% confidence interval

Page 39: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Inferences about the slope

• It is important because it measures the rate at which y changes per unit change in x

Page 40: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

How to Test and find a confidence interval for

• Obtain a random sample of 3 or more data pairs.

• 1) Use the null hypothesis . Use an alternate hypothesis . Set the level of significance .

• 2) sample test statistic• with d.f.= n-2

• 3) Use a Student’s t distribution and the type of test, one-tailed or two-tailed, find the P-value

• 4) Conclude

• 5) Interpret your result

Page 41: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

To Find a confidence interval for

• Where

• c=confidence level• n=number of data pairs• students t distribution critical value for confidence

level c and d.f.=n-2• standard error of estimate

Page 42: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Example:

• Plate tectonics and the spread of the ocrean floor are very important topics in modern studies of earthquakes and earth science in general.

• A) Compute a,b, , ,• B) Use 5% level of significance to test the claim

that is positive• C) Find a 75% confidence interval for

X 120 83 60 50 35 30 20 17

Y 30 16 15.5 14.5 22 18 12 0

Page 43: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Group Work

• How fast do puppies grow? X=age in weeks and y=weight in kilograms for a random sample of male wolf pups.

• A) Find the a, b, and standard error• B) Use a 1% level of significance to test the

claim that • C) Compute an 80% confidence interval for

x 8 10 14 20 28 40 45

Y 7 13 17 23 30 34 35

Page 44: Chapter 10 Correlation and Regression. SCATTER DIAGRAMS AND LINEAR CORRELATION

Homework Practice

• Pg 543 #1-12 odd