![Page 1: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/1.jpg)
Scatterplots & Regression
Week 3 Lecture
MG461
Dr. Meredith Rolfe
![Page 2: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/2.jpg)
Key Goals of the Week
• What is regression?• When is regression used?• Formal statement of linear model equation• Identify components of linear model• Interpret regression results:
• decomposition of variance and goodness of fit• estimated regression coefficients• significance tests for coefficients
MG461, Week 3 Seminar
2
![Page 3: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/3.jpg)
Who has studied regression or linear models before?
1. Taken before2. Not taken before
![Page 4: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/4.jpg)
Which group are you in?
1. Group 12. Group 23. Group 34. Group 45. Group 56. Group 67. Group 78. Group 8
Which group are you in?
![Page 5: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/5.jpg)
Regression is a set of statistical tools to model the conditional expectation…
1. of one variable on another variable.
2. of one variable on one or more other variables.
![Page 6: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/6.jpg)
LINEAR MODEL BACKGROUND
MG461, Week 3 Seminar
6
![Page 7: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/7.jpg)
Recap: Theoretical System
MG461, Week 3 Seminar
7
X
Y
![Page 8: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/8.jpg)
What is regression?
• Regression is the study of relationships between variables
• It provides a framework for testing models of relationships between variables
• Regression techniques are used to assess the extent to which the outcome variable of interest, Y, changes dependent on changes in the independent variable(s), X
MG461, Week 3 Seminar
8
![Page 9: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/9.jpg)
What is regression
50% 50%50% 50%
A statistic... A statistic...
Taken before Not taken before
Conditional Dependence: Correct Answer vs. Prior Exposure
![Page 10: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/10.jpg)
When to use Regression
• We want to know whether the outcome, y, varies depending on x• We can use regression to study correlation (not
causation) or make predictions• Continuous variables (but many exceptions)
MG461, Week 3 Seminar
10
![Page 11: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/11.jpg)
Questions we might care about
• Do higher paid employees contribute more to organizational success?
• Do large companies earn more? Do they have lower tax rates?
MG461, Week 3 Seminar
11
Does an change in X lead to an change in Y
![Page 12: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/12.jpg)
How to answer the questions?
Observational (Field) Study• Collect data on income and
a measure of contributions to the organization
• Collect data on corporate profits in various regions (which companies, which regions)
Experimental Study• Random assignment to
different contribution levels or levels of pay (?)
• ? Random assignment to country and/or tax rate, quasi-experiment?
MG461, Week 3 Seminar
12
![Page 13: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/13.jpg)
When to use Regression
• We want to know whether the outcome, y, varies depending on x
• Continuous variables (but many exceptions)• Observational data (mostly)
MG461, Week 3 Seminar
13
![Page 14: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/14.jpg)
Example 1: Pay and Performance
MG461, Week 3 Seminar
14
X
Y
Performance Pay
Runs Yearly Salary
![Page 15: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/15.jpg)
Scatterplot:
Salaries vs. Runs
MG461, Week 3 Seminar
15
X
Y
![Page 16: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/16.jpg)
Scatterplot:
Salaries vs. Runs
MG461, Week 3 Seminar
16
Δx
Δy
![Page 17: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/17.jpg)
What is the equation for
a line?W
hat is the equation for a line?
1.y=ax2
2.y=ax
3.y=ax+b
4.y=x+b
![Page 18: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/18.jpg)
Equation of a (Regression) Line
MG461, Week 3 Seminar
18
Intercept Slope
But… x and y are random variables, we need an equation that accounts for noise and signal
Population Parameters
![Page 19: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/19.jpg)
Simple Linear Model
MG461, Week 3 Seminar
19
DependentVariable
IndependentVariable
Intercept Coefficient(Slope)
Error
Observation or data point, i, goes from 1…n
![Page 20: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/20.jpg)
The relationship must be LINEAR
MG461, Week 3 Seminar
20
• The linear model assumes a LINEAR relationship
• You can get results even if the relationship is not linear
• LOOK at the data!
• Check for linearity
![Page 21: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/21.jpg)
When to use Regression
• We want to know whether the outcome, y, varies depending on x
• Continuous variables (but many exceptions)• Observational data (mostly)• The relationship between x and y is linear
MG461, Week 3 Seminar
21
![Page 22: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/22.jpg)
Understanding the key points
• What is regression?• When is regression used?• Formal statement of linear model equation
MG461, Week 3 Seminar
22
![Page 23: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/23.jpg)
Understand what regression is…
1. Strongly Agree2. Agree3. Disagree4. Strongly Disagree
Mean =
Median =
![Page 24: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/24.jpg)
UNDERSTAND WHEN TO USE REGRESSION
1. Strongly Agree2. Agree3. Disagree4. Strongly Disagree
Mean =
Median =
![Page 25: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/25.jpg)
KNOW HOW TO MAKE FORMAL STATEMENT OF LINEAR MODEL
1. Strongly Agree2. Agree3. Disagree4. Strongly Disagree
Mean =
Median =
![Page 26: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/26.jpg)
DISCUSSION
What is regression?When is regression used?Formal statement of linear model equation
MG461, Week 3 Seminar
26
![Page 27: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/27.jpg)
MODEL ESTIMATION
MG461, Week 3 Seminar
27
![Page 28: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/28.jpg)
WHICH MODEL PARAMETER DO WE NOT NEED TO ESTIMATE?
1 2 3 4
8%
38%
53%
1%
1. β0
2. β1
3. xi
4. σ2
![Page 29: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/29.jpg)
Goal: Estimate the Relationship between X and Y
• Estimate the population parametersβ0 and β1
MG461, Week 3 Seminar
29
• We can also estimate the error variance σ2 as
0̂1̂("beta hat zero") ("beta hat one")
2̂
![Page 30: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/30.jpg)
What would “good” estimates do?
1 2 3
14%
70%
15%
1. Minimize explained variance
2. Minimize distance to outliers
3. Minimize unexplained variance
![Page 31: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/31.jpg)
Finding the Best Line: U
nexplained Variance
MG461, Week 3 Seminar
31
ei
![Page 32: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/32.jpg)
Ordinary Least Squares (OLS) Criteria
MG461, Week 3 Seminar
32
0̂1̂("beta hat zero") ("beta hat one")
Minimize “noise” (unexplained variance) defined as residual sum of squares (RSS)
![Page 33: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/33.jpg)
OLS Estimates of Beta-hat
MG461, Week 3 Seminar
33
![Page 34: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/34.jpg)
Mean of x and y
MG461, Week 3 Seminar
34
![Page 35: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/35.jpg)
Variance of x
MG461, Week 3 Seminar
35
![Page 36: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/36.jpg)
Variance of y
MG461, Week 3 Seminar
36
![Page 37: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/37.jpg)
Covariance of x and y
MG461, Week 3 Seminar
37
![Page 38: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/38.jpg)
OLS Estimates of Beta-hat
MG461, Week 3 Seminar
38
0 1ˆ ˆy x
Note the similarity between ß1 and the slope of a line: change in y over change in x (rise over run)
![Page 39: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/39.jpg)
Why squared residuals?
• Geometric intuition: X and Y are vectors, find the shortest line between them:
MG461, Week 3 Seminar
39
X
Y
![Page 40: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/40.jpg)
Decomposing the Variance
• As in Anova, we now have:• Explained Variation• Unexplained Variation• Total Variation
• This decomposition of variance provides one way to think about how well the estimated model fits the data
MG461, Week 3 Seminar
40
![Page 41: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/41.jpg)
Total Squared Residuals (SYY)
MG461, Week 3 Seminar
41
![Page 42: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/42.jpg)
Explained vs. Unexplained
Squared Residuals
MG461, Week 3 Seminar
42
![Page 43: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/43.jpg)
R2 and goodness of fit
MG461, Week 3 Seminar
43
RSS (residual sum of squares) =
unexplained variationTSS (total sum of squares) =
SYY, total variation of dependent variable yESS (explained sum of squares) =
explained variation (TSS-RSS)
![Page 44: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/44.jpg)
Examples of high and low R2
MG461, Week 3 Seminar
44
Graph 1 Graph 2
![Page 45: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/45.jpg)
Which graph had a high R2?
1 2
67%
33%
1. Graph 12. Graph 2
![Page 46: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/46.jpg)
Recall that R2= ESS/TSS or 1-(RSS/TSS). What values can R2 take on?
1 2 3 4
3% 0%
86%
12%
1. Can be any number2. Any number
between -1 and 13. Any number
between 0 and 14. Any number
between 1 and 100
![Page 47: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/47.jpg)
Examples of high and low R2
R2=0.29 R2=0.87
MG461, Week 3 Seminar
47
![Page 48: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/48.jpg)
Interpretation of ß-hats
• ß-hat0 : intercept, value of yi when xi is 0
• ß-hat1: average or expected change in yi for every 1 unit change in xi
MG461, Week 3 Seminar
48
![Page 49: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/49.jpg)
Visualization of Coeffi
cients
MG461, Week 3 Seminar
49
Δx=1
Δy=β1
β0
![Page 50: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/50.jpg)
OLS estimates: Pay for RunsCoefficient s.e. t p-value (sig)
Intercept -34.29 98.27 -0.35 0.727
Runs 27.47 1.79 15.36 < 0.001
R2
n0.41336
MG461, Week 3 Seminar
50
Salaryi = Beta-hat0 + Beta-hat1 * Runsi + errori
![Page 51: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/51.jpg)
OLS estim
ates of Regression Line
MG461, Week 3 Seminar
51
Salary = -34 + 27.47*Runs
![Page 52: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/52.jpg)
Interpretation of ß-hats
• ß-hat0 : players with no runs don’t get paid (not really – come back to this next week!)
• ß-hat1: Each additional run translates into $27,470 in salary per year
• OR: a difference of almost $1 million/year between a player with an average (median) and an above average (80%) number of runs
MG461, Week 3 Seminar
52
y (Salary) = -34 + 27.47x (Runs)
![Page 53: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/53.jpg)
Significance of Results
Model Significance• H0: None of the 1 (or more)
independent variables covary with the dependent variable
• HA: At least one of the independent variables covaries with d.v.
• Application: compare two fitted models
• Test: Anova/F-Test • **assumes errors (ei) are
normally distributed
Coefficient Significance• H0: ß1=0, there is no
relationship (covariation) between x and y
• HA: ß1≠0, there is a relationship (covariation) between x and y
• Application: a single estimated coefficient
• Test: t-test**assumes errors (ei) are
normally distributed
MG461, Week 3 Seminar
53
![Page 54: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/54.jpg)
OLS estimates: Pay for RunsCoefficient s.e. t p-value (sig)
Intercept -34.29 98.27 -0.35 0.727
Runs 27.47 1.79 15.36 < 0.001
R2
n0.41336
MG461, Week 3 Seminar
54
Assuming normality, we can derive estimated standard errors for the coefficients
![Page 55: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/55.jpg)
OLS estimates: Pay for RunsCoefficient s.e. t p-value (sig)
Intercept -34.29 98.27 -0.35 0.727
Runs 27.47 1.79 15.36 < 0.001
R2
n0.41336
MG461, Week 3 Seminar
55
And using these, calculate a t-statistic and test for whether or not the coefficients are equal to zero
0
0
ˆ
ˆse
1
1
ˆ
ˆse
![Page 56: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/56.jpg)
OLS estimates: Pay for RunsCoefficient s.e. t p-value (sig)
Intercept -34.29 98.27 -0.35 0.727
Runs 27.47 1.79 15.36 < 0.001
R2
n0.41336
MG461, Week 3 Seminar
56
And finally, the probability of being wrong (Type 1) if we reject H0
![Page 57: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/57.jpg)
Plotting Confidence Intervals
MG461, Week 3 Seminar
57
![Page 58: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/58.jpg)
Agree or Disagree, “The lecture was clear and easy to follow”
1 2 3 4 5 6 7
46%
28%
20%
3%1%1%
0%
1. Strongly Agree2. Agree3. Somewhat Agree4. Neutral5. Somewhat Disagree6. Disagree7. Strongly Disagree
![Page 59: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/59.jpg)
Next time..
• Multiple independent variable• OLS assumptions• What to do when OLS assumptions are
violated
MG461, Week 3 Seminar
59
![Page 60: Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe](https://reader035.vdocuments.mx/reader035/viewer/2022070403/56649f2c5503460f94c471e7/html5/thumbnails/60.jpg)
Team Scores