Chapter 10: Correlation and Regression
Chapter 13: Nonparametric Statistics
Objectives:
❑ Learn how to draw a scatter plot for a set of
ordered pairs.
❑ Learn how to compute the correlation coefficient.
❑ Learn how to compute the equation of the
regression line.
❑ Learn how to compute the Spearman rank
correlation coefficient.
Overview of Chapters 10 and 13
Sec. #    Title                                        Page(s)
10 - 1    Scatter Plots and Correlation                369 – 385
13 - 6    The Spearman Rank Correlation Coefficient    459 – 461
10 - 2    Regression                                   386 – 393
Remember?
The independent variable influences the dependent variable.
At a Glance!
Are two or more variables linearly related?
(Scatter plot and/or correlation coefficient)
If so, what is the strength of the relationship?
(Scatter plot and/or correlation coefficient)
What type of relationship exists?
(Scatter plot, correlation coefficient and/or regression)
What kind of predictions can be made from the relationship?
(Regression)
10 – 1: Scatter Plots and Correlation
A scatter plot is a graph of the ordered pairs
of numbers (x, y) consisting of the independent
variable x and the dependent variable y.
It is a visual way to describe the nature of the
relationship between x and y. It may show:
a positive linear relationship,
a negative linear relationship,
a curvilinear relationship,
or no relationship.
Example 10 – 1, page 372, Example 10 – 2,
page 372 – 373, Example 10 – 3, page 373.
Examples of scatter plot patterns
Correlation
Pearson’s linear correlation coefficient, which will
be denoted by 𝑟, measures the strength and the
direction of a linear relationship between two
quantitative variables.
Calculating 𝒓
The linear correlation coefficient is given by
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
The above coefficient is also known as the Pearson
product-moment correlation coefficient (PPMC).
Properties of 𝒓
The range of the correlation coefficient is from +1 to -1.
If the value of 𝑟 is close to +1, then there is a strong positive linear relationship between the variables.
If the value of 𝑟 is close to -1, then there is a strong negative linear relationship between the variables.
If the value of 𝑟 is close to 0, then there is either a weak or no linear relationship between the variables.
Example 10 – 4: Car Rental Companies
# of Cars (x) Revenue (y)
63 7
29 3.9
20.8 2.1
19.1 2.8
13.4 1.4
8.5 1.5
From the table above, with $n = 6$, we obtain:
$\sum x = 153.8$, $\sum y = 18.7$, $\sum xy = 682.77$,
$\sum x^2 = 5859.26$, $\sum y^2 = 80.67$.
Example 10 – 4 (cont.)
$$r = \frac{6(682.77) - (153.8)(18.7)}{\sqrt{\left[6(5859.26) - (153.8)^2\right]\left[6(80.67) - (18.7)^2\right]}} = 0.982$$
Hence, there is a strong positive linear relationship
between the number of rented cars and revenue.
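The calculation in Example 10 – 4 can be checked with a few lines of Python. This is a minimal sketch using the computational (sums) formula for r; the function name `pearson_r` and the variable names are illustrative and not part of the course material.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient via the sums formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Data from Example 10 - 4 (car rental companies)
cars = [63, 29, 20.8, 19.1, 13.4, 8.5]
revenue = [7, 3.9, 2.1, 2.8, 1.4, 1.5]

r = pearson_r(cars, revenue)
print(round(r, 3))  # 0.982
```

The intermediate sums inside the function match the values listed in the example (for instance, the sum of the x values is 153.8).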
Example 10 – 5, page 377 (Negative correlation),
Example 10 – 6, page 378 (Weak positive correlation).
13 – 6: The Spearman Rank Correlation
Coefficient
If $n$ is the sample size and $d$ is the difference in ranks
for each pair, then the Spearman rank correlation coefficient is
calculated as
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
Example 13 – 7: Bank Branches and
Deposits (page 459)
# of branches (X)    Deposits (Y)    Rank (X)    Rank (Y)
209                  23              4           4
353                  31              2           1
19                   7               8           6
201                  12              5           5
344                  26              3           2
132                  5               6           7
401                  24              1           3
126                  4               7           8
(Rank 1 is assigned to the largest value.)
Example 13 – 7 (cont.)
Rank (X)    Rank (Y)    d     d²
4           4           0     0
2           1           1     1
8           6           2     4
5           5           0     0
3           2           1     1
6           7           -1    1
1           3           -2    4
7           8           -1    1
∑                       0     12 = ∑d²
Example 13 – 7 (cont.)
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \cdot 12}{8(64 - 1)} = 1 - \frac{72}{504} = 0.857$$
The above value indicates that we have a strong positive correlation.
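The ranking and the Spearman formula can be sketched in Python as follows. The helper names `descending_ranks` and `spearman_rs` are illustrative. The sketch assumes no tied values (ties would need averaged ranks, which this code does not handle) and takes the deposit for the branch count 126 to be 4, which is what the distinct ranks 7 and 8 require.

```python
def descending_ranks(values):
    """Rank 1 goes to the largest value. Assumes all values are distinct."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman_rs(xs, ys):
    """Spearman rank correlation: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(xs)
    rank_x = descending_ranks(xs)
    rank_y = descending_ranks(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Data from Example 13 - 7 (bank branches and deposits)
branches = [209, 353, 19, 201, 344, 132, 401, 126]
deposits = [23, 31, 7, 12, 26, 5, 24, 4]

print(descending_ranks(branches))  # [4, 2, 8, 5, 3, 6, 1, 7]
print(descending_ranks(deposits))  # [4, 1, 6, 5, 2, 7, 3, 8]

rs = spearman_rs(branches, deposits)
print(round(rs, 3))  # 0.857
```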
Spearman’s correlation coefficient can also be calculated when the data are ordinal-level qualitative data.
10 – 2: Regression
If the value of the correlation coefficient is
significant, the next step is to determine the
equation of the regression line, which is the data’s
line of best fit.
Best fit means that the sum of the squares of the
vertical distances from each point to the line is at
a minimum.
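To make the "best fit" criterion concrete, the sketch below computes the sum of squared vertical distances for a candidate line y' = a + b·x on the car rental data of Example 10 – 4. It compares the line reported in Example 10 – 9 (a = 0.396, b = 0.106) with two slightly altered lines chosen here purely for illustration. The function name `sum_squared_errors` is an assumption of this sketch.

```python
def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances from each point to the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Data from Example 10 - 4 (car rental companies)
cars = [63, 29, 20.8, 19.1, 13.4, 8.5]
revenue = [7, 3.9, 2.1, 2.8, 1.4, 1.5]

# Line reported in Example 10 - 9
sse_fitted = sum_squared_errors(cars, revenue, 0.396, 0.106)

# Two hypothetical alternatives: a steeper slope and a higher intercept
sse_steeper = sum_squared_errors(cars, revenue, 0.396, 0.116)
sse_higher = sum_squared_errors(cars, revenue, 0.496, 0.106)

print(sse_fitted < sse_steeper)  # True
print(sse_fitted < sse_higher)   # True
```

Both altered lines give a larger sum of squares, which is what the definition of best fit predicts.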
Line of best fit
Determination of the Regression Line
Equation
The equation of the regression line is:
$$y' = a + b \cdot x$$
Here, $a$ is the intercept or the regression constant,
$b$ is the slope or the regression coefficient, and $x$ is the
observed independent variable. Together they are used
to calculate $y'$, which is the predicted dependent
variable.
Determination of the Regression Line
Equation (cont.)
$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2}$$
$$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$
Example 10 – 9 (page 388)
Number of rented cars is the independent variable
𝑥, while the revenue is the dependent variable 𝑦.
The regression line is found to be:
𝒚′ = 𝟎. 𝟑𝟗𝟔 + 𝟎. 𝟏𝟎𝟔 ⋅ 𝒙
This means that when the number of rented cars
increases by 1, the revenue increases by 0.106
on average.
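The coefficients of this regression line can be reproduced from the car rental data of Example 10 – 4. The sketch below uses the standard least-squares formulas, with intercept a = ((∑y)(∑x²) − (∑x)(∑xy)) / (n∑x² − (∑x)²) and slope b = (n∑xy − (∑x)(∑y)) / (n∑x² − (∑x)²). The function name `regression_coefficients` is illustrative.

```python
def regression_coefficients(xs, ys):
    """Return (a, b) for the least-squares line y' = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b

# Data from Example 10 - 4 (car rental companies)
cars = [63, 29, 20.8, 19.1, 13.4, 8.5]
revenue = [7, 3.9, 2.1, 2.8, 1.4, 1.5]

a, b = regression_coefficients(cars, revenue)
print(round(a, 3), round(b, 3))  # 0.396 0.106
```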
Example 10 – 10 (page 389)
Number of absences is the independent variable
𝑥, while the final grade is the dependent variable
𝑦. The regression line is found to be:
𝒚′ = 𝟏𝟎𝟐. 𝟒𝟗𝟑 − 𝟑. 𝟔𝟐𝟐 ⋅ 𝒙
This means that when the number of absences
increases by 1, the final grade decreases by
3.622 on average.
Example 10 – 11 (page 391)
Predict the income of a car rental agency (y) that
has 200,000 automobiles (x).
Note that in Example 10 – 1, the number of rented
automobiles is measured in ten thousands.
Therefore, 200,000 automobiles corresponds to 20 ten
thousands, i.e. x = 20. Hence,
$$y' = 0.396 + 0.106(20) = 2.516$$
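The unit conversion and the prediction can be written as a short Python sketch. The function name `predict_revenue` and its parameter names are assumptions made for illustration; the coefficients are those of the regression line from Example 10 – 9.

```python
def predict_revenue(cars_in_ten_thousands, a=0.396, b=0.106):
    """Predicted revenue y' = a + b*x, with x measured in ten thousands of cars."""
    return a + b * cars_in_ten_thousands

automobiles = 200_000
x = automobiles / 10_000  # convert to ten thousands, giving x = 20

y_pred = predict_revenue(x)
print(round(y_pred, 3))  # 2.516
```

Forgetting the conversion and passing 200,000 directly would give a wildly different value, so it helps to keep the unit in the parameter name.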
Important Rule!
Q. Is there any relationship between Pearson’s
correlation coefficient $r$ and the regression coefficient
$b$?
A. The sign of the correlation coefficient and the sign
of the slope of the regression line will always be the
same.
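A short way to see why this rule holds, sketched from the formulas for r and b: both share the numerator n∑xy − (∑x)(∑y), and the remaining factors are positive whenever x and y each take more than one distinct value.

```latex
\[
b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2},
\qquad
r = \frac{n\sum xy - (\sum x)(\sum y)}
         {\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}
\]
\[
\Longrightarrow\quad
r = b \cdot \sqrt{\frac{n\sum x^2 - (\sum x)^2}{n\sum y^2 - (\sum y)^2}}
\]
```

Since the square-root factor is positive, r and b are either both positive, both negative, or both zero.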
Application Summary
Measure Excel only Excel + MegaStat
Scatter plot ✓
Pearson’s linear correlation coefficient ✓
Spearman’s correlation coefficient ✓
Regression equation ✓