9.1 Correlation
• Key Concepts:
  – Scatter Plots
  – Correlation
  – Sample Correlation Coefficient, r
  – Hypothesis Testing for the Population Correlation Coefficient, ρ
9.1 Correlation
• What exactly do we mean by correlation?
  – If two variables are correlated, it means a relationship exists between them.
  – Examples of correlated variables:
    • Job Satisfaction and Job Attendance
    • Number of Cows per Square Mile and Crime Rate
    • Height and Weight
    • High School GPA and College GPA
    • Square Footage and Price (of a house)
9.1 Correlation
• Two questions we need to answer:
  1. Does a linear (or straight-line) correlation exist between the two variables?
  2. If the variables appear linearly correlated, how strong is the correlation?
  – We can answer (1) using a scatter plot
    • The independent (explanatory) variable is x
    • The dependent (response) variable is y
  – Example: How well does High School GPA, x, “explain” College GPA, y?
  – See section 2.2 for a review of scatter plots
9.1 Correlation
• Once the scatter plot is complete, we should be able to see if a linear relationship exists between the two variables.
  – See p. 470 for what we mean by Negative Linear Correlation, Positive Linear Correlation, No Correlation, and Nonlinear Correlation.
• Next, we need a way to quantify or measure the strength of the linear relationship between the two variables.
9.1 Correlation
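• The Correlation Coefficient measures the strength and the direction of the linear relationship between two variables. The sample correlation coefficient, r, is defined as:

\[
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^{2} - \left(\sum x\right)^{2}}\;\sqrt{n\sum y^{2} - \left(\sum y\right)^{2}}}
\]

where n is the number of pairs of data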
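As a rough illustration of the computational formula for r, the sketch below accumulates the five sums (Σx, Σy, Σxy, Σx², Σy²) and combines them. The function name `sample_correlation` and the GPA values are made up for illustration; they are not taken from the textbook.

```python
from math import sqrt


def sample_correlation(xs, ys):
    """Sample correlation coefficient r from paired data (computational formula)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    n = len(xs)  # number of (x, y) pairs
    if n < 2:
        raise ValueError("at least two pairs of data are required")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    if denominator == 0:
        # Happens when every x (or every y) is identical; r is undefined.
        raise ValueError("r is undefined when x or y has no variation")
    return numerator / denominator


# Hypothetical data: High School GPA (x) and College GPA (y) for 5 students.
hs_gpa = [2.0, 2.5, 3.0, 3.5, 4.0]
college_gpa = [1.8, 2.6, 2.7, 3.4, 3.6]

r = sample_correlation(hs_gpa, college_gpa)
print(f"r = {r:.3f}")  # roughly 0.97: a strong positive linear correlation
```

With only five made-up pairs this says nothing about real students; it only shows how the sums feed into the formula.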
9.1 Correlation
• Things we need to know about the sample correlation coefficient, r:
  – r will always lie between -1 and 1, inclusive: -1 ≤ r ≤ 1
  – If r = -1, we say there is a perfect negative linear correlation between the two variables.
  – If r = 1, there is a perfect positive linear correlation between the two variables.
  – The strength of the linear relationship between the variables is determined by r’s proximity to 1 or -1. In other words, the closer r is to 1 or -1, the stronger the linear relationship. The closer r is to 0, the weaker the linear relationship.
• Practice: #22 p. 482 (Age and Vocabulary)
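The properties of r listed above can be checked on small artificial data sets. The sketch below is only an illustration: the helper `r_of` and all data values are invented. It shows r at (approximately, because of floating-point rounding) 1 and -1 for points that lie exactly on a line, and r = 0 for a curved pattern, which is why r measures only the linear part of a relationship.

```python
from math import sqrt


def r_of(xs, ys):
    """Sample correlation coefficient r (hypothetical helper for this sketch)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


xs = [1, 2, 3, 4, 5]

# Points exactly on a rising line: perfect positive linear correlation.
perfect_positive = r_of(xs, [2 * x + 1 for x in xs])

# Points exactly on a falling line: perfect negative linear correlation.
perfect_negative = r_of(xs, [10 - 3 * x for x in xs])

# Points on a parabola, symmetric about 0: clearly related, but not linearly.
symmetric_xs = [-2, -1, 0, 1, 2]
curved = r_of(symmetric_xs, [x * x for x in symmetric_xs])

print(perfect_positive, perfect_negative, curved)
```

The last case is a reminder to look at the scatter plot first: an r near 0 rules out a linear relationship, not every relationship.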
9.1 Correlation
• Once we have the sample linear correlation coefficient, r, we can use it in a t-Test to make an inference about the population linear correlation coefficient, ρ (Greek letter “rho”).
  – Why bother?
    • Remember we found r using a limited set of data. What about the rest of the population? Do we have enough evidence from the sample data to claim that a significant linear correlation exists between our two variables?
  – Example: If we have analyzed the High School GPA and College GPA of 25 students, is there enough evidence to claim that a significant linear correlation exists between the High School GPA and College GPA of all students?
9.1 Correlation
• t-Test for the Population Correlation Coefficient
  – We will use the two-tailed version of this test:
    H0: ρ = 0 (no significant correlation exists)
    Ha: ρ ≠ 0 (a significant correlation exists)
  – The test statistic is r and the standardized test statistic is given by:

\[
t = \frac{r}{\sigma_r} = \frac{r}{\sqrt{\dfrac{1 - r^{2}}{n - 2}}}
\]

Note: t follows a t-distribution with n − 2 degrees of freedom
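To make the test concrete, here is a minimal sketch of the standardized test statistic for the 25-student GPA example. The value r = 0.5 is hypothetical (the slides give no data), the function name `correlation_t_statistic` is invented, and the critical value 2.069 is the two-tailed t-table entry for α = 0.05 with 23 degrees of freedom.

```python
from math import sqrt


def correlation_t_statistic(r, n):
    """Return (t, df) for testing H0: rho = 0 from a sample correlation r."""
    if n < 3:
        raise ValueError("need at least 3 pairs so that n - 2 > 0")
    if not -1 < r < 1:
        raise ValueError("t is undefined when r is exactly -1 or 1")
    df = n - 2                      # degrees of freedom
    sigma_r = sqrt((1 - r * r) / df)  # standard error of r under H0
    return r / sigma_r, df


r_hypothetical = 0.5  # assumed sample correlation, for illustration only
n = 25                # number of students (pairs of data)

t, df = correlation_t_statistic(r_hypothetical, n)

# Two-tailed critical value from a t-table: alpha = 0.05, df = 23.
critical_t = 2.069
reject_h0 = abs(t) > critical_t

print(f"t = {t:.3f}, df = {df}, reject H0: {reject_h0}")
```

Under these assumptions t is about 2.77, which falls in the rejection region, so the sample would give enough evidence at α = 0.05 to claim a significant linear correlation. A different r, n, or α would need its own critical value from the table.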
9.1 Correlation
• Practice using the t-Test:
#32 p. 484 (Braking Distances: Wet Surface)