bivariate data and scatter plots


Upload: jameson-sparks

Post on 30-Dec-2015


Page 1: Bivariate Data and  Scatter Plots

Bivariate Data and Scatter Plots

Bivariate Data: The values of two different variables that are obtained from the same population element. While the variables may be either categorical or quantitative, we will focus on cases where they are both quantitative.

Can we predict values of one variable from values of the other variable? Do the values of one variable cause the values of the other variable?

Section 3.1, Page 59

Page 2: Bivariate Data and  Scatter Plots

Scatter Plot Example: TI-83

Scatter plots always have an explanatory variable and a response variable. The choice is arbitrary. The explanatory variable is always plotted on the x-axis, and the response variable is always plotted on the y-axis.

STAT – EDIT – ENTER; Enter x data in L1, and y in L2
2nd STAT PLOT – ENTER – 1: Plot 1
Highlight ON
Type: Highlight first icon
XList: 2nd L1
YList: 2nd L2
ZOOM 9: ZoomStat
TRACE; Use arrows to move to points and display values.

Section 3.1, Page 60

Page 3: Bivariate Data and  Scatter Plots

Linear Correlation

Linear Correlation: A measure of the strength of a linear relationship between two variables. The closer to a straight line the dots are, the stronger the relationship.

If there is correlation, then we say the two variables are associated. Changes in the value of one variable are associated with changes in the value of the other variable.

Section 3.1, Page 61

Page 4: Bivariate Data and  Scatter Plots

Coefficient of Correlation: Measure of Strength

$$r = \frac{\sum Z_x Z_y}{n - 1}, \quad \text{where } Z_x = \frac{x - \bar{x}}{s_x}, \quad Z_y = \frac{y - \bar{y}}{s_y}$$

$$-1 \le r \le 1$$

r = −1: perfect straight line with negative slope

r = 0: no linear relationship at all

r = 1: perfect straight line with positive slope

Also known as the Pearson Correlation Coefficient.

Section 3.2, Page 62
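The formula for r can be checked with a few lines of Python. This is only a sketch: the function name `pearson_r` and the three small data sets are made up for illustration and do not come from the slides. It multiplies the z-scores of each point and divides the sum by n − 1, as in the definition.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Correlation coefficient: sum of Z_x * Z_y divided by (n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1)
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (len(xs) - 1)

# Hypothetical data, chosen only to show the three reference cases.
xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2, 4, 6, 8, 10])    # points on a rising line
r_down = pearson_r(xs, [10, 8, 6, 4, 2])  # points on a falling line
r_flat = pearson_r(xs, [2, 1, 0, 1, 2])   # symmetric V shape, no linear trend
print(f"{r_up:.4f} {r_down:.4f} {abs(r_flat):.4f}")
```

The V-shaped data set has an obvious pattern, yet r comes out at zero, which is why r is read as a measure of linear relationship only.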

Page 5: Bivariate Data and  Scatter Plots

Problems

Problems, Page 71

Page 6: Bivariate Data and  Scatter Plots

Correlation Coefficient: TI-83 Add-In Program

Finding r.

STAT – EDIT – ENTER: Enter data in L1 and L2
PRGM – CORRELTN
2nd L1 – ENTER – 2nd L2 – ENTER
SCATTER PLOT? – 1=YES; (Displays scatter plot)
ENTER; (Displays: r=.8394)

This is a moderately strong positive relationship.

Section 3.2, Page 62

Page 7: Bivariate Data and  Scatter Plots

Section 3.2, Page 63

Association and Causality

[Figure: scatter plot titled “Elementary School Students: Reading Scores”; axis labels: Shoe Size, Grade Level]

Is this a reasonable association?

Does giving students bigger shoes cause reading scores to improve?

What explains this association?

Lurking Variable: A variable that is not included in the study but affects the variables in the study, making it appear that those variables are related.

Association alone can never establish causality!
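A small simulation can make the lurking-variable idea concrete. Everything in this sketch is invented for illustration: the grade range, the noise levels, and the helper `pearson_r` are assumptions, not data from the slides. Shoe size and reading level are each generated from grade alone, so neither one influences the other.

```python
import random
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / ((len(xs) - 1) * s_x * s_y)

rng = random.Random(0)  # fixed seed so the run is repeatable
students = []
for _ in range(3000):
    grade = rng.randint(1, 6)            # the lurking variable
    shoe = grade + rng.gauss(0, 0.5)     # simulated: feet grow with grade
    reading = grade + rng.gauss(0, 0.5)  # simulated: reading improves with grade
    students.append((grade, shoe, reading))

# All students together: shoe size and reading level look strongly related.
r_all = pearson_r([s[1] for s in students], [s[2] for s in students])

# Third graders only: with grade held fixed, the association largely disappears.
third = [s for s in students if s[0] == 3]
r_within = pearson_r([s[1] for s in third], [s[2] for s in third])

print(f"all grades: r = {r_all:.2f}; grade 3 only: r = {r_within:.2f}")
```

Comparing the two values of r shows the association coming from the shared dependence on grade rather than from shoe size itself.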

Page 8: Bivariate Data and  Scatter Plots

Problems

Problems, Page 71

Page 9: Bivariate Data and  Scatter Plots

Problems

Problems, Page 72

Page 10: Bivariate Data and  Scatter Plots

Problems

Problems, Page 72

Page 11: Bivariate Data and  Scatter Plots

Linear Regression

Line of Best Fit

If a straight-line model seems appropriate, the best-fit straight line is found by using the method of least squares. Suppose that $\hat{y} = a + bx$ is the equation of a straight line, where $\hat{y}$ (read “y-hat”) represents the predicted value of y that corresponds to a particular value of x. The least squares criterion requires that we find the constants a and b such that $\sum (y - \hat{y})^2$ is as small as possible.

Section 3.3, Page 65
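To see the least squares criterion at work, the sum of squared misses can be computed for several candidate lines and compared. This is a sketch under stated assumptions: the data set and the candidate lines are hypothetical, and the first candidate (a = 2.2, b = 0.6) was worked out by hand as the least squares line for this particular data.

```python
def sum_squared_errors(xs, ys, a, b):
    """Sum of (y - y_hat)^2 for the line y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, not taken from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Candidate (a, b) pairs; the first is the least squares line for this data.
candidates = [(2.2, 0.6), (2.0, 0.6), (2.2, 0.7), (1.0, 1.0)]
sse = {line: sum_squared_errors(xs, ys, *line) for line in candidates}

for (a, b), value in sse.items():
    print(f"y-hat = {a} + {b}x  ->  sum of squares = {value:.2f}")

best_line = min(sse, key=sse.get)
print("smallest sum of squares:", best_line)
```

Nudging either constant away from the least squares values makes the sum of squares larger.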

Page 12: Bivariate Data and  Scatter Plots

Line of Best Fit

The best line will be the one where the sum of the squares of the “misses” is at a minimum. Calculus procedures are used to find the coefficients a and b such that the line ŷ = a + bx has the smallest sum of squares.

$$b = r \times \frac{s_y}{s_x}$$

r is the correlation coefficient, $s_y$ is the standard deviation of the y-values, and $s_x$ is the standard deviation of the x-values.

Section 3.3, Page 66
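The slope formula translates directly into code. In the sketch below, the function name `least_squares_line` and the data are hypothetical. The slide gives only the slope, so the intercept line uses the standard result a = ȳ − b·x̄, which is an assumption added here and not part of the slides.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x using b = r * s_y / s_x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum((x - x_bar) * (y - y_bar)
            for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x      # slope, as given on the slide
    a = y_bar - b * x_bar  # assumption: standard intercept formula, not on the slide
    return a, b

# Hypothetical data, not taken from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(f"y-hat = {a:.4f} + {b:.4f}x")  # y-hat = 2.2000 + 0.6000x
```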

Page 13: Bivariate Data and  Scatter Plots

Linear Regression: TI-83 Add-In Program

a. For the above data, make a scatter plot, and comment on the suitability of the data for regression analysis.

STAT – EDIT; Enter Height in L1, and Weight in L2.
PRGM – REGBASIC
X LIST=2ND L1; Y LIST=2ND L2
SCATTER PLOT: 1=YES

The pattern looks positive and linear, with no outliers that could cause problems.

Scatter Plot

Section 3.3, Page 68

Page 14: Bivariate Data and  Scatter Plots

Linear Regression: TI-83 Add-In Program

b. Find the regression equation and r.

ENTER; The program pauses so the graph can be viewed; pressing ENTER moves the program along.

The equation is: $\hat{y} = -186.4706 + 4.7059x$. The coefficient of correlation is r = .7979, a relatively strong relationship.

c. Check the plot of the regression line versus the scatter plot.

ENTER – 1=YES

Section 3.3, Page 68

Page 15: Bivariate Data and  Scatter Plots

Linear Regression: TI-83 Add-In Program

d. What is the value of the slope of the line, and what does it mean?

b = 4.7059 is the slope of the line. It indicates the change in the predicted y value for every one-unit increase in the x value. In this problem, for each one-inch increase in height, predicted weight increases by 4.7059 lbs. Its units are lbs/inch.

e. What is the value of the intercept of the line, and what does it mean?

a = -186.4706 is the y-intercept. It has no meaning in this problem. It would be the weight of a person of zero height.

f. What is the value of r² and what does it tell you?

It is called the index of determination. It measures the strength of the model, 1 being perfect and 0 being useless. It also equals the proportion of the variance in the y-values explained by the model. r² = .6367, indicating a relatively strong positive correlation, with the model explaining 63.67% of the variance in y.
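The “variance explained” reading of r² can be checked numerically. The sketch below is hypothetical: the function names and the data are invented. It also assumes the standard identity r² = 1 − SSE/SST, which the slides do not state, where SSE is the sum of squared residuals and SST is the total sum of squares of y about its mean.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least squares intercept a and slope b for y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx  # algebraically the same slope as r * s_y / s_x
    return y_bar - b * x_bar, b

def r_squared(xs, ys):
    """Share of the variation in y accounted for by the fitted line."""
    a, b = fit_line(xs, ys)
    y_bar = mean(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    sst = sum((y - y_bar) ** 2 for y in ys)                    # total
    return 1 - sse / sst

# Hypothetical data, not taken from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r2 = r_squared(xs, ys)
print(f"r^2 = {r2:.4f}")  # r^2 = 0.6000
```

For this data the line accounts for 60% of the variation in y and leaves 40% in the residuals.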

Section 3.3, Page 68

Page 16: Bivariate Data and  Scatter Plots

Linear Regression: TI-83 Add-In Program

g. Check the residual plot and explain what it means.

ENTER; 1 = YES

The horizontal line represents the regression line. For each actual value of x, the residual is the actual y-value minus the predicted y-value. The dots show the “misses” or residuals.

If the residuals show some kind of pattern, the linear regression model is not appropriate for the data, and another model, e.g. quadratic, may be better. Since there is no pattern in this plot, the linear model is appropriate for this data.

Section 3.3, Page 68
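For contrast, here is a sketch of what a patterned residual plot looks like in numbers. The data are hypothetical and deliberately follow y = x², so a straight line is the wrong model. The helper `fit_line` is an illustrative name, not something from the add-in program.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least squares intercept a and slope b for y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

# Hypothetical curved data: y = x squared.
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

a, b = fit_line(xs, ys)
# Residual = actual y minus predicted y, one per data point.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print([round(e, 4) for e in residuals])  # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals run positive, negative, positive from left to right. That U shape is the kind of pattern that signals a quadratic model may fit better than a line.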

Page 17: Bivariate Data and  Scatter Plots

Linear Regression: TI-83 Add-In Program

h. Use the model to predict the weight of a woman who is 65 inches tall.

PREDICTED Y: 1 = YES
X=65
Answer: 119.4 lbs

i. Use the model to predict the weight of a woman who is 77 inches tall.

ENTER: 1 = YES
X=77
Answer: 175.9 lbs.

Notice that the range of the x values is from 61 to 69 inches. 77 inches is too far above the actual values used to develop the model. While the result is mathematically correct, the result is not valid in the context of the problem.
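Both predictions can be reproduced from the regression equation. The sketch below uses the coefficients and the 61 to 69 inch range quoted in the slides. The function name `predict_weight` and the in-range flag are illustrative additions, not part of the TI-83 program.

```python
A, B = -186.4706, 4.7059  # intercept and slope from the regression equation
X_MIN, X_MAX = 61, 69     # range of heights used to develop the model

def predict_weight(height_in):
    """Return (predicted weight in lbs, True if the height is inside the data range)."""
    predicted = A + B * height_in
    return round(predicted, 1), X_MIN <= height_in <= X_MAX

print(predict_weight(65))  # (119.4, True)
print(predict_weight(77))  # (175.9, False): extrapolation, not a valid estimate
```

Returning the flag alongside the number keeps the arithmetic and the validity check together, so an out-of-range prediction is not mistaken for a reliable one.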

Section 3.3, Page 68

Page 18: Bivariate Data and  Scatter Plots

Problems

Problems, Page 72

Page 19: Bivariate Data and  Scatter Plots

Problems

Problems, Page 73

a. Construct a scatter diagram.
b. Does the pattern appear linear?
c. Find the equation of best fit.
d. What is the value of r and what does it mean?
e. What is the slope? What are its units? Interpret its meaning.
f. What is the y-intercept value? What does it mean?
g. What does the residual plot show? What does it mean?
h. Estimate the stride rate for a speed of 19.2 ft/sec. Is the estimate reliable? Why?
i. Estimate the stride rate for a speed of 31 ft/sec. Is the estimate reliable? Why?

Page 20: Bivariate Data and  Scatter Plots

Problems

Problems, Page 73

c. What is the value of r and what does it mean?
d. What is the slope? What are its units? Interpret its meaning.
e. What is the y-intercept value? What does it mean?
f. What does the residual plot show? What does it mean?
g. Estimate the # of intersections for a state with 450 miles. Is the estimate reliable? Why?
h. Estimate the # of intersections for a state with 950 miles. Is the estimate reliable? Why?

Page 21: Bivariate Data and  Scatter Plots

Problems

Problems, Page 73

a. Construct a scatter diagram. What does it indicate to you?
b. Find the equation of best fit.
c. What is the value of r and what does it mean?
d. What is the slope? What are its units? Interpret its meaning.
e. What is the y-intercept value? What does it mean?
f. What does the residual plot show? What does it mean?
g. Estimate the price of an 8-year-old car. Is the estimate reliable? Why?
h. Estimate the price of a 22-year-old car. Is the estimate reliable? Why?