ch10: correlation and regression - uc denvermath.ucdenver.edu/~ssantori/math2830sp13/math2830... ·...


Post on 09-May-2018


Page 1: CH10: Correlation and Regression - UC Denvermath.ucdenver.edu/~ssantori/MATH2830SP13/Math2830... ·  · 2013-04-19CH10: Correlation and Regression Santorico - Page 422 TI-83 and

CH10: Correlation and Regression Santorico - Page 410

CH10: Correlation and Regression

Section 10-1: Paired Data and Scatter Plots

We are often interested in whether a relationship exists between two variables. To find out, we can collect data consisting of two measurements that are paired with each other. One variable is the independent (explanatory) variable, x, and the other is the dependent (response) variable, y.

Examples: height and weight of individuals, maximum speed limit of each state versus number of car crash deaths per capita

Once we’ve collected all the pairs of data, listed as (x, y), we can draw a graph to represent the data. This graph is called a scatter plot.

A scatter plot is a graph of ordered pairs of data values that is used to determine if a relationship exists between the two variables.

Drawing a scatter plot:

Step 1: Draw and label the x and y axes.

Step 2: Plot the points for the pairs of data.

Example: Create a scatter plot for the following data set.

Height   Hand span
71       23.5
69       22.0
66       18.5
64       20.5
71       21.0
72       24.0
67       19.5
65       20.5

How would you describe the above relationship?
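The two drawing steps can also be sketched in code. The following is a minimal text-only illustration in Python, not part of the course materials; the function name `text_scatter` and the 0.5 grid step are assumptions chosen to suit this data set, and height is placed on the x axis.

```python
# Height and hand span pairs copied from the example table.
heights = [71, 69, 66, 64, 71, 72, 67, 65]
spans = [23.5, 22.0, 18.5, 20.5, 21.0, 24.0, 19.5, 20.5]

def text_scatter(xs, ys, y_step=0.5):
    """Draw a rough text scatter plot.

    Assumes integer x values and y values that are exact multiples of y_step
    (true for this data set); this is a hypothetical helper, not a general tool.
    """
    points = set(zip(xs, ys))
    x_values = range(min(xs), max(xs) + 1)
    n_rows = int(round((max(ys) - min(ys)) / y_step)) + 1
    lines = []
    for i in range(n_rows):
        y = max(ys) - i * y_step  # top row holds the largest y value
        row = "".join(" * " if (x, y) in points else " . " for x in x_values)
        lines.append(f"{y:5.1f} |{row}")
    # Step 1: the labelled x axis; Step 2 was marking each pair with '*'.
    lines.append("      +" + "---" * len(x_values))
    lines.append("       " + "".join(f"{x:^3d}" for x in x_values))
    return "\n".join(lines)

print(text_scatter(heights, spans))
```

Each `*` marks one (height, hand span) pair, so the overall direction of the cloud of points can be read off the printed grid.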

Analyzing the Scatter Plot

A positive linear relationship exists when the points fall approximately in an ascending straight line from left to right, so that the x and y values increase together.

As the values for the x variable increase, the values for the y variable increase.

A negative linear relationship exists when the points fall approximately in a descending straight line from left to right.

As the values for the x variable increase, the values for the y variable decrease.

A nonlinear relationship exists when the points fall in a curved line.

The relationship is then described by the nature of the curve (e.g., quadratic, cubic, or exponential).

No relationship exists when there is no discernible pattern in the points.

How Can We Summarize Strength of Association?

When the data points follow a roughly straight-line trend, the variables are said to have an approximately linear relationship. If we have a linear relationship, we can use the correlation coefficient to help determine the strength of the association.

Correlation Coefficient – A value computed from the sample data that measures the strength and direction of a linear relationship between two variables.

The symbol for the sample correlation coefficient is r, and the symbol for the population correlation coefficient is ρ.

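The correlation coefficient takes on values between -1 and +1, inclusive.

A positive value for r: a positive linear relationship (y tends to increase as x increases).

A negative value for r: a negative linear relationship (y tends to decrease as x increases).

An r value close to +1 or -1 indicates a strong linear relationship.

An r value close to 0 indicates a weak linear relationship or none at all.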

Calculating the correlation, r

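\[
r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}
\]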

where n is the number of data pairs. You will not be required to compute r manually, but you will need to know how to calculate it using your calculator.
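Although the course does not require computing r by hand, a short sketch can show how the formula maps onto the five sums it uses. The function name `correlation` and the two small data sets below are made up for illustration; they are not course data.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation coefficient r from the summation formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    # The denominator is zero if all x values or all y values are equal;
    # r is undefined in that case and this sketch lets the division fail.
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Made-up illustration data: both sets of points lie exactly on a line.
xs = [1, 2, 3, 4, 5]
rising = [3, 5, 7, 9, 11]    # y increases by 2 for each unit of x
falling = [11, 9, 7, 5, 3]   # y decreases by 2 for each unit of x

print(correlation(xs, rising))   # perfect ascending line: r is +1
print(correlation(xs, falling))  # perfect descending line: r is -1
```

The two extreme cases mark the ends of the range of r; real data, where points only approximately follow a line, give values strictly between them.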

TI-83 and TI-84 Directions

To compute the correlation, the diagnostic setting must be turned on.

Press 2nd, then 0; this takes you to the catalog. Scroll down to the “DiagnosticOn” entry and then press ENTER twice. You will only have to do this once!

Computing correlation:

Type your x variable into L1 and your y variable into L2.

Press STAT, highlight CALC, and select LinReg(ax+b) (or press number 4).

Type L1, L2 then press ENTER.

Determine the correlation coefficient for the height and hand span data.

Height   Hand span
71       23.5
69       22.0
66       18.5
64       20.5
71       21.0
72       24.0
67       19.5
65       20.5
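As a check on the calculator output, the formula for r can be evaluated by hand for these eight pairs. The working below takes height as x and hand span as y, which is an assumption about which variable is explanatory (r itself does not depend on the choice). The sums were computed from the table and the result is rounded to three decimal places.

```latex
\[
\begin{aligned}
n &= 8, \quad \sum x = 545, \quad \sum y = 169.5, \quad \sum xy = 11577.5, \quad \sum x^2 = 37193, \quad \sum y^2 = 3616.25 \\[4pt]
r &= \frac{8(11577.5) - (545)(169.5)}{\sqrt{\left[8(37193) - 545^2\right]\left[8(3616.25) - 169.5^2\right]}}
   = \frac{242.5}{\sqrt{(519)(199.75)}} \approx 0.753
\end{aligned}
\]
```

A value of about 0.753 is positive and fairly close to +1, which is consistent with a positive linear relationship of moderate to strong strength.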

At what point is r far enough from zero to conclude that there is a significant linear relationship between the two variables, rather than a value of r that differs from zero only by chance?

We can use a hypothesis test to determine the significance of r. See pages 536-539.

We will not cover this hypothesis test. You should know that it is possible to test whether the relationship is statistically significant (i.e., whether r is far enough away from 0).

Correlation Does Not Imply Causation

A correlation between x and y means that a linear relationship exists between the two variables. (This should be verified with a scatter plot, because the correlation coefficient can always be computed no matter what the relationship between x and y is.)

A correlation between x and y does not mean that x causes y.

Example: beer sales and ice cream sales

There is NO proof of causation WITHOUT manipulation (i.e., a randomized experiment).

Lurking Variable – a variable, usually unobserved, that influences the association between the variables of primary interest.

Example: What could the lurking variable be for the last example?

Confounding variable or confounder: “A confounder is related to both the exposure of interest and the outcome, but is not on the causal pathway.” (A more commonly used term for a lurking variable.)

X = exposure of interest

Y = outcome

Z = confounder

In the following, Z creates an association between X and Y; however, if Z were controlled for, this association would disappear.

[Diagrams relating X, Z, and Y]
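The idea that Z can create an association that disappears once Z is controlled for can be illustrated with simulated data. The sketch below is an assumption-laden illustration, not course material: X and Y are each generated from Z plus independent noise, so neither causes the other, and "controlling for Z" is implemented as correlating what remains of X and Y after removing their linear fit on Z. The helper names and noise levels are made up.

```python
import random
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def residuals(ys, zs):
    """What is left of ys after removing its least-squares linear fit on zs."""
    n = len(zs)
    mean_z, mean_y = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
             / sum((z - mean_z) ** 2 for z in zs))
    return [(y - mean_y) - slope * (z - mean_z) for z, y in zip(zs, ys)]

random.seed(0)  # fixed seed so the sketch is repeatable
z = [random.gauss(0, 1) for _ in range(2000)]   # the confounder
x = [zi + random.gauss(0, 0.5) for zi in z]     # X depends on Z only
y = [zi + random.gauss(0, 0.5) for zi in z]     # Y depends on Z only

r_xy = correlation(x, y)
r_xy_given_z = correlation(residuals(x, z), residuals(y, z))
print(f"r(X, Y) ignoring Z:        {r_xy:.2f}")
print(f"r(X, Y) controlling for Z: {r_xy_given_z:.2f}")
```

With these settings the first correlation should come out strongly positive and the second close to zero, even though X never influences Y in the simulation.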

Is organic food the real cause of the increase in autism?

Section 10-2: Regression

Once we have discovered that a linear relationship exists, we can then determine the equation of the regression line, which is the data’s line of best fit. The purpose of the regression line is to enable the researcher to make predictions based on the data.

Line of Best Fit

To find the line of best fit, we try to minimize the vertical distance from each point to the regression line. We need a line of best fit so that we can predict values of y from the values of x. Therefore, the closer the points are to the line, the better the fit and the better the prediction.
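One common way to make "closest to the points" precise is the least-squares criterion, stated here as background rather than as something the slides spell out: among all lines y' = a + bx, the regression line is the one whose intercept a and slope b minimize the sum of the squared vertical distances between each observed y and the value the line predicts.

```latex
\[
\text{minimize over } a, b: \qquad \sum_{i=1}^{n} \bigl( y_i - (a + b x_i) \bigr)^2
\]
```

Each term is the squared vertical gap between a data point and the line, so points far from the line are penalized more heavily than points near it.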

Determination of the Regression Line Equation

Recall from algebra that the equation of a line is usually given as y = mx + b, where m is the slope and b is the y-intercept. In statistics the equation of the regression line is written as

y’ = a + bx

where a is the y’ intercept and b is the slope.

Formulas for the Regression Line

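\[
a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}
\qquad
b = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}
\]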

where a is the y’ intercept and b is the slope of the line. We can use the calculator to help us find the regression line without using these formulas. We will use the same process we did to find the correlation coefficient r. Note: Round a and b to 3 decimal places!
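For readers who want to see the formulas in action without a calculator, here is a minimal Python sketch. The function name `regression_line` and the four data points are made up for illustration; the points lie exactly on y = 1 + 2x, so the intercept and slope are known in advance.

```python
def regression_line(xs, ys):
    """Return (a, b) for the line y' = a + b*x using the summation formulas."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator  # y' intercept
    b = (n * sum_xy - sum_x * sum_y) / denominator       # slope
    return a, b

# Made-up illustration data lying exactly on y = 1 + 2x.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]

a, b = regression_line(xs, ys)
print(f"y' = {a:.3f} + {b:.3f}x")  # rounded to 3 decimal places, as the note asks
```

Both formulas share the same denominator, so it is computed once and reused.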

TI-83 and TI-84 Directions

To compute the regression line equation, the diagnostic setting must be turned on.

Press 2nd, then 0; this takes you to the catalog. Scroll down to the “DiagnosticOn” entry and then press ENTER twice. You will only have to do this once!

Computing the regression line equation:

Type your x variable into L1 and your y variable into L2.

Press STAT, highlight CALC, and select LinReg(ax+b) (or press number 4).

Type L1, L2 then press ENTER.

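Notice that the calculator reports its output for y = ax + b, so the a reported by the calculator is the slope and the b is the y-intercept. This is the opposite of the regression line formulas, where a is the y’ intercept and b is the slope.

Example: Age and sick days

Age, x    18  26  39  48  53  58
Days, y   16  12   9   5   6   2

Note: linear relationship confirmed by scatterplot.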

[Scatter plot of days, y, against age, x]

Find the equation of the regression line and the correlation coefficient.
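A hand computation with the summation formulas can be used to check the calculator. The sums below were worked out from the age and sick-day data, and the results are rounded to three decimal places; treat this as one way to verify the output rather than the required method.

```latex
\[
\begin{aligned}
n &= 6, \quad \sum x = 242, \quad \sum y = 50, \quad \sum xy = 1625, \quad \sum x^2 = 10998, \quad \sum y^2 = 546 \\[4pt]
a &= \frac{(50)(10998) - (242)(1625)}{6(10998) - 242^2} = \frac{156650}{7424} \approx 21.100 \\[4pt]
b &= \frac{6(1625) - (242)(50)}{6(10998) - 242^2} = \frac{-2350}{7424} \approx -0.317 \\[4pt]
r &= \frac{-2350}{\sqrt{(7424)\left[6(546) - 50^2\right]}} = \frac{-2350}{\sqrt{(7424)(776)}} \approx -0.979
\end{aligned}
\]
```

This gives the regression line y' = 21.100 - 0.317x. On the calculator screen the same two numbers appear with the labels swapped: a is about -0.317 and b is about 21.100.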

Relationship between r and b

If r is positive, then b will be positive (and vice versa).

If r is negative, then b will be negative (and vice versa).

If r is zero, then b will be zero (and vice versa).
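The reason the signs always agree can be seen from a standard identity, which is not stated on the slides: the slope equals r rescaled by the ratio of the sample standard deviations of y and x, written here as s_y and s_x.

```latex
\[
b = r \cdot \frac{s_y}{s_x}
\]
```

Because s_y and s_x are both positive whenever the data vary, b has the same sign as r and is zero exactly when r is zero.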

Predicting a Response Using the Regression Line

To predict the value of a new response for some value of the explanatory variable, we simply plug that value of the explanatory variable into our regression equation. The resulting value is the predicted value.

Find the number of sick days predicted for someone who is 30 years old.

Find the number of sick days predicted for someone who is 75 years old.
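As a sketch of the plug-in step, suppose the fit to the age and sick-day data gives approximately a = 21.100 and b = -0.317. These rounded values come from applying the summation formulas to the six data pairs and should be checked against your own calculator output.

```latex
\[
\begin{aligned}
y'(30) &= 21.100 - 0.317(30) = 21.100 - 9.510 = 11.590 \\
y'(75) &= 21.100 - 0.317(75) = 21.100 - 23.775 = -2.675
\end{aligned}
\]
```

The first prediction, about 11.6 sick days, is for an age inside the observed range of 18 to 58. The second is for an age well outside that range, and the line returns a negative number of sick days, which is impossible.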

Extrapolation Is Dangerous!

Extrapolation: Using a regression line to predict y values for x values outside the observed range of the data.
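One way to guard against accidental extrapolation is to have the prediction step check the requested x value against the observed range first. The sketch below is a hypothetical illustration using the age and sick-day data from this section; the function names `fit_line` and `predict` are made up, and refusing with an error is just one possible design.

```python
def fit_line(xs, ys):
    """Return (a, b) for y' = a + b*x from the summation formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b

def predict(x_new, xs, ys):
    """Predict y' at x_new, refusing to extrapolate beyond the observed x range."""
    low, high = min(xs), max(xs)
    if not low <= x_new <= high:
        raise ValueError(
            f"x = {x_new} is outside the observed range [{low}, {high}]"
        )
    a, b = fit_line(xs, ys)
    return a + b * x_new

# Age and sick-day data from the example in this section.
ages = [18, 26, 39, 48, 53, 58]
sick_days = [16, 12, 9, 5, 6, 2]

print(round(predict(30, ages, sick_days), 1))  # 30 is inside the range 18 to 58

try:
    predict(75, ages, sick_days)               # 75 is outside the observed ages
except ValueError as error:
    print("refused:", error)
```

Raising an error is deliberately strict; a gentler variant could return the prediction together with a warning, leaving the judgment to the analyst.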

Example: Collect a sample of heights and weights from male children aged 0 to 5. What would happen if we predicted the height or weight of an adult male?

Example: Suppose we give lab rats various levels of amphetamine and observe their subsequent caloric intake for the next hour. Let y = caloric intake and x = amphetamine dosage. Why is it a bad idea to extrapolate in this example?

Comical/sad example of extrapolation: “If current trends continue, by 2606 the US diet will be 100 percent sugar.”

And with that…..WE ARE DONE WITH COURSE MATERIAL!!!!

Important dates:

Fri, 4/26: Exam 3 Study session. Email me if you are interested, and we’ll find a time to fit the maximal number of schedules.

Wednesday, 5/1: Exam 3, covering Chapters 7-10. Dr. Cribari will proctor the exam.

Mon, 5/6, and Wed, 5/8: Project presentations (10 minutes allowed / group). Be sure to place presentation in the Dropbox group.

Thursday 5/9: Final exam Study session. Email me if you are interested, and we’ll try to find a time to fit the maximal number of schedules.

Saturday, May 11, 9-12 Uniform Final Exam in MC-2 (Modular Classroom 2, located between the Tivoli and the Athletic Fields)