lecture 19 · 2020. 8. 24. · residual variance the mean of residuals is always 0, regardless of...

190F Spring 2020 Foundations of Data Science Lecture 19 Residuals

Post on 18-Jan-2021

Announcements

Residuals

Residuals

●  Error in regression estimate

●  One residual corresponding to each point (x, y)

●  residual = observed y - regression estimate of y
            = observed y - height of regression line at x
            = vertical difference between point and line

(Demo)
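The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the course demo: the helper names (`standard_units`, `correlation`, `fit_line`, `residuals`) and the small data set are assumptions made up for this example, and only NumPy is used.

```python
import numpy as np

def standard_units(a):
    """Convert an array to standard units (mean 0, SD 1)."""
    return (a - np.mean(a)) / np.std(a)

def correlation(x, y):
    """Correlation coefficient r: mean of products in standard units."""
    return np.mean(standard_units(x) * standard_units(y))

def fit_line(x, y):
    """Slope and intercept of the regression line of y on x."""
    r = correlation(x, y)
    slope = r * np.std(y) / np.std(x)
    intercept = np.mean(y) - slope * np.mean(x)
    return slope, intercept

def residuals(x, y):
    """residual = observed y - height of regression line at x."""
    slope, intercept = fit_line(x, y)
    fitted = slope * x + intercept
    return y - fitted

# Hypothetical data, invented for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

res = residuals(x, y)
print(res)  # one residual for each point (x, y)
```

Each entry of `res` is the signed vertical distance from a point to the line: positive when the point lies above the line, negative when it lies below.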

Residual Plot

A scatter diagram of residuals, plotted against x

●  Should look like an unassociated blob for linear relations

●  But still contains patterns for non-linear relations

●  Used to check whether linear regression is appropriate

(Demo)
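To see what a leftover pattern looks like without drawing anything, the sketch below fits a line to data that is purely quadratic and inspects the signs of the residuals. The data is made up for illustration, and `np.polyfit` is used here as one convenient way to get a least-squares line; it is not necessarily what the demo uses.

```python
import numpy as np

# Hypothetical data with a purely non-linear (quadratic) relation
x = np.linspace(-3, 3, 13)
y = x ** 2

# Least-squares line; polyfit with degree 1 returns (slope, intercept)
slope, intercept = np.polyfit(x, y, 1)
res = y - (slope * x + intercept)

# The residuals are not an unassociated blob: they are positive at both
# ends of the x range and negative in the middle, a U-shaped pattern.
for xi, ri in zip(x, res):
    sign = "+" if ri > 0 else "-"
    print(f"x = {xi:5.2f}   residual = {ri:6.2f}   {sign}")
```

A run of same-signed residuals like this is the kind of pattern a residual plot makes visible, and it suggests a straight line is not the right summary of the relation.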

Regression Diagnostics

Dugong

(Demo)

Properties of Residuals

Residual Variance

The mean of residuals is always 0, regardless of the original data

Variance is standard deviation squared: mean squared deviation

(Variance of residuals) / (Variance of y) = 1 - r²

(Variance of fitted values) / (Variance of y) = r²

Variance of y = (Variance of fitted values) + (Variance of residuals)

(Demo)
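These identities can be checked numerically. The sketch below uses simulated data (the slope, sample size, and seed are arbitrary choices for this example) and `np.polyfit` for the least-squares line; the identities hold for any data set with a non-constant x and y.

```python
import numpy as np

# Simulated data; the particular numbers here are arbitrary assumptions
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2 * x + rng.normal(size=200)

r = np.corrcoef(x, y)[0, 1]             # correlation coefficient
slope, intercept = np.polyfit(x, y, 1)  # least-squares regression line
fitted = slope * x + intercept
res = y - fitted

print("mean of residuals:        ", np.mean(res))
print("var(residuals) / var(y):  ", np.var(res) / np.var(y))
print("1 - r^2:                  ", 1 - r ** 2)
print("var(fitted) / var(y):     ", np.var(fitted) / np.var(y))
print("r^2:                      ", r ** 2)
print("var(fitted) + var(res):   ", np.var(fitted) + np.var(res))
print("var(y):                   ", np.var(y))
```

Each pair of printed values should agree up to floating-point rounding, and the mean of the residuals should be 0 up to rounding.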

Discussion Question

How does the SD of the fitted values relate to r?

A.  (SD of fitted) / (SD of y) = r

B.  (SD of fitted) / (SD of y) = |r|

C.  (SD of fitted) / (SD of residuals) = r

D.  (SD of fitted) / (SD of residuals) = |r|
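One way to explore the question is to compute both ratios on data with a negative association, where r and |r| differ. The sketch below uses simulated data with an arbitrary negative slope (an assumption for this example) and prints the quantities named in the four options so they can be compared.

```python
import numpy as np

# Simulated data with a negative association, so r and |r| are different
rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = -1.5 * x + rng.normal(size=300)

r = np.corrcoef(x, y)[0, 1]
slope, intercept = np.polyfit(x, y, 1)  # least-squares regression line
fitted = slope * x + intercept
res = y - fitted

ratio_to_y = np.std(fitted) / np.std(y)      # options A and B
ratio_to_res = np.std(fitted) / np.std(res)  # options C and D

print("r:                             ", r)
print("|r|:                           ", abs(r))
print("(SD of fitted) / (SD of y):    ", ratio_to_y)
print("(SD of fitted) / (SD of resid):", ratio_to_res)
```

Since an SD can never be negative, any ratio of SDs is non-negative, which already rules out the options that equate a ratio to r itself when r is negative.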

Regression Model

A “Model”: Signal + Noise

Distance drawn at random from normal distribution with mean 0

Another distance drawn independently from the same normal distribution
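The signal-plus-noise idea can be sketched as a small simulation: each point starts on a true line (the signal) and is then moved vertically by a distance drawn independently from a normal distribution with mean 0 (the noise). The function name `simulate_points`, the uniform choice of x values, and all parameter values are assumptions made for this illustration.

```python
import numpy as np

def simulate_points(true_slope, true_intercept, n, noise_sd, rng):
    """Generate n points as signal (true line) plus independent normal noise."""
    x = rng.uniform(0, 10, size=n)              # assumed range of x values
    signal = true_slope * x + true_intercept    # height of the true line at x
    noise = rng.normal(0, noise_sd, size=n)     # mean 0, drawn independently
    return x, signal, signal + noise

rng = np.random.default_rng(42)
x, signal, y = simulate_points(true_slope=2.0, true_intercept=1.0,
                               n=1000, noise_sd=1.0, rng=rng)

# Vertical distances between the points and the true line
distances = y - signal
print("average distance from true line:", np.mean(distances))
print("SD of distances:                ", np.std(distances))
```

With many points, the average distance should be close to 0 and the SD of the distances close to `noise_sd`.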

What We Get to See

(Demo)
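In practice only the scattered points are visible, not the true line. A rough sketch of that situation: simulate points from a hidden line plus normal noise, then fit a regression line to the points alone and compare it with the line that generated them. The true slope and intercept, the noise SD, and the sample size are arbitrary assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hidden from the observer: the true line and the noise (assumed values)
true_slope, true_intercept = 2.0, 1.0
x = rng.uniform(0, 10, size=1000)
y = true_slope * x + true_intercept + rng.normal(0, 1.0, size=1000)

# What we get to see: only the points (x, y).
# The regression line fitted to them is an estimate of the true line.
est_slope, est_intercept = np.polyfit(x, y, 1)

print("true slope:     ", true_slope, "  estimated:", est_slope)
print("true intercept: ", true_intercept, "  estimated:", est_intercept)
```

With a large sample, the estimated line should land close to the true one, though it will generally not match it exactly.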