Chapter 11: Correlation and Simple Linear Regression
Statistics for Business (Econ)
1
2
Introduction
• In this chapter we employ regression analysis to examine the relationship among quantitative variables.
• The technique is used to predict the value of one variable (the dependent variable, y) based on the values of other variables (the independent variables $x_1, x_2, \ldots, x_k$).
3
Correlation is a statistical technique that is used to measure and describe a relationship between two variables. The correlation between two variables reflects the degree to which the variables are related. For example:
A researcher interested in the relationship between nutrition and IQ could observe the dietary patterns for a group of children and then measure their IQ scores.
A business analyst may wonder if there is any relationship between profit margin and return on capital for a group of public companies.
4
A set of n = 6 pairs of scores (X and Y values) is shown in a table and in a scatterplot. The scatterplot allows you to see the relationship between X and Y.
5
Positive correlation
6
Negative correlation
7
Non-linear relationship
9
A strong positive relationship, approximately +0.90.
A relatively weak negative correlation, approximately -0.40.
10
A perfect negative correlation, -1.00
No linear trend, 0.00.
11
A demonstration of how one extreme data point (an outlier) can influence the value of a correlation.
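The effect of a single extreme point can be sketched numerically. The data below are hypothetical (not taken from the slides), and `pearson_r` is an illustrative helper name: five points with no linear trend, then the same points plus one extreme point.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation computed from sums of deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores with no linear trend.
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 4, 1, 3]
r_without = pearson_r(xs, ys)

# The same scores plus one extreme point at (10, 10).
r_with = pearson_r(xs + [10], ys + [10])

print(f"r without the extreme point: {r_without:.2f}")
print(f"r with the extreme point:    {r_with:.2f}")
```

In this made-up sample the correlation is essentially zero without the extreme point and strongly positive with it, even though the other five points have not changed.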
13
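Pearson correlation
The most common measure of correlation is the Pearson Product Moment Correlation (called Pearson's correlation for short).
$$ s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} $$
$$ r = \frac{s_{xy}}{s_x s_y} $$
Here $s_{xy}$ is the sample covariance, $s_x$ and $s_y$ are the sample standard deviations of x and y, and r is the correlation coefficient.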
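A minimal sketch of this calculation, assuming the usual sample definitions (covariance and standard deviations with n - 1 in the denominator). The function names and the six data pairs are hypothetical and serve only as illustration.

```python
from math import sqrt

def sample_covariance(xs, ys):
    """Sample covariance s_xy with an n - 1 denominator."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def sample_std(values):
    """Sample standard deviation: square root of the covariance of a variable with itself."""
    return sqrt(sample_covariance(values, values))

def pearson_r(xs, ys):
    """Correlation coefficient r = s_xy / (s_x * s_y)."""
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))

# Hypothetical n = 6 pairs of scores.
xs = [1, 2, 3, 4, 5, 6]
ys_up = [2 * x + 1 for x in xs]     # points exactly on a rising line
ys_down = [10 - 3 * x for x in xs]  # points exactly on a falling line
ys_noisy = [2, 4, 5, 4, 6, 7]       # rising trend with scatter

print(pearson_r(xs, ys_up))     # perfect positive correlation
print(pearson_r(xs, ys_down))   # perfect negative correlation
print(pearson_r(xs, ys_noisy))  # strictly between 0 and 1
```

Points lying exactly on a line give r = +1 or r = -1 (up to floating-point rounding); scatter around the line pulls r toward 0.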
14
The value $r^2$ is called the coefficient of determination because it measures the proportion of variability in one variable that can be determined from the relationship with the other variable. A correlation of r = 0.80 (or -0.80), for example, means that $r^2$ = 0.64 (or 64%) of the variability in the Y scores can be predicted from the relationship with X.
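One way to see the "proportion of variability" reading is to compare the squared correlation with the share of the total variation in Y that a least squares line accounts for. The sketch below uses hypothetical data and illustrative variable names; it assumes a straight line fitted by least squares with an intercept.

```python
# Hypothetical paired scores.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 5, 4, 6, 7]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of squared deviations and cross-products.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)  # total variation in Y

r = sxy / (sxx * syy) ** 0.5

# Least squares line and the variation it leaves unexplained.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
unexplained = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

explained_share = 1 - unexplained / syy

print(f"r^2             = {r ** 2:.4f}")
print(f"explained share = {explained_share:.4f}")
```

The two printed values agree, which is the sense in which $r^2$ is the proportion of variability in Y predicted from X.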
25
Least squares fit
30
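• The linear model
$$ y = \beta_0 + \beta_1 x + \varepsilon $$
y = dependent variable
x = independent variable
$\beta_0$ = y-intercept
$\beta_1$ = slope of the line (= Rise/Run)
$\varepsilon$ = error variable
[Figure: the line drawn with x on the horizontal axis and y on the vertical axis; $\beta_0$ is where the line crosses the y-axis, and the slope $\beta_1$ is Rise/Run.]
$\beta_0$ and $\beta_1$ are unknown and are therefore estimated from the data.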
31
To calculate the estimates of the coefficients that minimize the sum of squared differences between the data points and the line, use the formulas:
$$ b_1 = \frac{\operatorname{cov}(X, Y)}{s_x^2} $$
$$ b_0 = \bar{y} - b_1 \bar{x} $$
The regression equation that estimates the equation of the first-order linear model is:
$$ \hat{y} = b_0 + b_1 x $$
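The two formulas translate directly into code. This is a minimal sketch rather than a production routine: `least_squares_line` is a hypothetical name, and the data are made up so that the answer is known in advance.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) using b1 = cov(X, Y) / s_x^2 and b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    b1 = cov_xy / var_x
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical data lying exactly on y = 2 + 3x, so the fit should recover 2 and 3.
xs = [0, 1, 2, 3, 4]
ys = [2 + 3 * x for x in xs]

b0, b1 = least_squares_line(xs, ys)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")
```

With real data the points do not fall on a line, and the same function returns the intercept and slope of the line with the smallest sum of squared errors.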
32
• Example 12.1 Relationship between odometer reading and a used car’s selling price.
– A car dealer wants to find the relationship between the odometer reading and the selling price of used cars.
– A random sample of 100 cars is selected, and the data recorded.
– Find the regression line.
Car   Odometer (x)   Price (y)
1     37388          5318
2     44758          5061
3     45833          5008
4     30862          5795
5     31705          5784
6     34010          5359
.     .              .
.     .              .
.     .              .
Independent variable x: Odometer
Dependent variable y: Price
33
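• Solution
– Solving by hand
• To calculate $b_0$ and $b_1$ we need to calculate several statistics first:
$$ \bar{x} = 36{,}009.45; \qquad \bar{y} = 5{,}411.41 $$
$$ s_x^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 43{,}528{,}688 $$
$$ \operatorname{cov}(X, Y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = -1{,}356{,}256 $$
where n = 100.
$$ b_1 = \frac{\operatorname{cov}(X, Y)}{s_x^2} = \frac{-1{,}356{,}256}{43{,}528{,}688} = -0.0312 $$
$$ b_0 = \bar{y} - b_1 \bar{x} = 5{,}411.41 - (-0.0312)(36{,}009.45) \approx 6{,}533 $$
(The value 6,533 is obtained by carrying the unrounded slope through the calculation.)
$$ \hat{y} = b_0 + b_1 x = 6{,}533 - 0.0312x $$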
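The hand calculation can be checked from the four summary statistics alone. The sketch below takes the covariance as negative, an assumption consistent with the negative Odometer coefficient in the computer output and with the statement that price decreases as mileage increases. Variable names are illustrative.

```python
# Summary statistics from the slide (n = 100 cars).
mean_x = 36009.45   # mean odometer reading
mean_y = 5411.41    # mean selling price
var_x = 43528688    # sample variance of odometer readings
cov_xy = -1356256   # sample covariance; sign assumed negative (see lead-in)

b1 = cov_xy / var_x          # slope
b0 = mean_y - b1 * mean_x    # intercept, using the unrounded slope

print(f"b1 = {b1:.4f}")
print(f"b0 = {b0:.0f}")
```

Keeping the slope unrounded when computing the intercept matters here: multiplying a rounded slope by a mean of about 36,000 shifts the intercept by more than a dollar.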
34
[Figure: scatterplot of Price (vertical axis, 4500 to 6000) against Odometer (horizontal axis, 19000 to 49000).]
– Using the computer (see file Xm17-01.xls)
Tools > Data analysis > Regression > [Shade the y range and the x range] > OK

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.806308
R Square             0.650132
Adjusted R Square    0.646562
Standard Error       151.5688
Observations         100

ANOVA
             df    SS         MS          F           Significance F
Regression   1     4183528    4183528     182.1056    4.4435E-24
Residual     98    2251362    22973.09
Total        99    6434890

             Coefficients    Standard Error    t Stat      P-value
Intercept    6533.383        84.51232          77.30687    1.22E-89
Odometer     -0.03116        0.002309          -13.4947    4.44E-24

$$ \hat{y} = 6{,}533 - 0.0312x $$
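Several entries in the output are tied together by standard identities, which gives a quick way to sanity-check a regression table. The sketch below recomputes them from the printed sums of squares and coefficients; the variable names are illustrative, and small differences from the printed values are expected because the table shows rounded figures.

```python
from math import sqrt

# Values copied from the ANOVA table and the coefficient table.
ss_regression = 4183528.0
ss_residual = 2251362.0
ss_total = 6434890.0
df_regression = 1
df_residual = 98
odometer_coef = -0.03116
odometer_se = 0.002309

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

r_square = ss_regression / ss_total       # R Square
multiple_r = sqrt(r_square)               # Multiple R
standard_error = sqrt(ms_residual)        # Standard Error of the regression
f_stat = ms_regression / ms_residual      # F
t_odometer = odometer_coef / odometer_se  # t Stat for the slope

print(f"R Square       = {r_square:.6f}")
print(f"Multiple R     = {multiple_r:.6f}")
print(f"Standard Error = {standard_error:.4f}")
print(f"F              = {f_stat:.4f}")
print(f"t (Odometer)   = {t_odometer:.3f}")
```

Multiple R here equals the absolute value of the correlation between odometer reading and price, so about 65% of the variation in price is accounted for by the odometer reading.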
35
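This is the slope of the line. For each additional mile on the odometer, the price decreases by an average of $0.0312.
[Figure: scatterplot of Price against Odometer with the fitted line; the line meets the vertical axis at 6533, at an odometer reading of 0 where the sample has no data.]
The intercept is b0 = 6533.
Do not interpret the intercept as the "price of cars that have not been driven": the sample contains no cars with odometer readings near 0, so the line is not supported by data there.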
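The fitted line can be used for prediction inside the range of the data. The sketch below uses the coefficients from the computer output; `predict_price` is a hypothetical helper, and the range limits are an assumption read off the plot's axis ticks (19000 to 49000), not the true minimum and maximum of the sample.

```python
B0 = 6533.383   # intercept from the regression output
B1 = -0.03116   # slope from the regression output (price change per mile)

# Assumed range of observed odometer readings, taken from the plot's axis ticks.
X_MIN, X_MAX = 19000, 49000

def predict_price(odometer):
    """Predicted selling price; refuses to extrapolate outside the assumed data range."""
    if not X_MIN <= odometer <= X_MAX:
        raise ValueError("odometer reading is outside the range covered by the sample")
    return B0 + B1 * odometer

price_40k = predict_price(40000)
drop_per_1000 = predict_price(30000) - predict_price(31000)

print(f"Predicted price at 40,000 miles: {price_40k:.2f}")
print(f"Average price drop per additional 1,000 miles: {drop_per_1000:.2f}")
```

Calling `predict_price(0)` raises an error by design, which mirrors the warning about the intercept: the line says nothing reliable about cars that have not been driven.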