fitting a line to a set of points
Fitting a Line to a Set of Points
• Scatterplot: fitting a line
[Figure: scatterplot with a fitted line; x (independent) on the horizontal axis, y (dependent) on the vertical axis]
• Least squares method
• Minimize the error term e
Minimizing the SSE (Sum of Squared Errors)
• Least squares method
$$\min_{a,b} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \min_{a,b} \sum_{i=1}^{n} (y_i - a - b x_i)^2$$
Finding Regression Coefficients
$$b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
$$a = \bar{y} - b\bar{x}$$
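The two coefficient formulas can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `fit_line` and the sample points are assumptions chosen so the result is easy to check by hand.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical points that lie exactly on y = 2 + 3x
x = [0.0, 1.0, 2.0, 3.0]
y = [2.0, 5.0, 8.0, 11.0]
a, b = fit_line(x, y)
print(a, b)
```

Because the sample points sit exactly on a line, the fit should recover that line's intercept and slope.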
Coefficient of Determination ($r^2$)
[Figure: two scatterplots of y against x, panels (a) and (b)]
• Numerical measure to express the strength of the relationship: the coefficient of determination ($r^2$)
Coefficient of Determination ($r^2$)
• Regression sum of squares (SSR)
$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$
• Total sum of squares (SST)
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
• Coefficient of determination ($r^2$)
$$r^2 = \frac{SSR}{SST}$$
Partitioning the Total Sum of Squares
$$\underbrace{\sum_{i=1}^{n} (y_i - \bar{y})^2}_{SST} = \underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}_{SSR} + \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{SSE}$$
SST = SSR + SSE
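The partition can be checked numerically. The sketch below assumes a least-squares line with an intercept; the function name `sums_of_squares` and the data are hypothetical, chosen only for illustration.

```python
def sums_of_squares(x, y):
    """Return (SSR, SSE, SST) for the least-squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    y_hat = [a + b * xi for xi in x]
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total
    return ssr, sse, sst


# Hypothetical data, roughly linear
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
ssr, sse, sst = sums_of_squares(x, y)
r2 = ssr / sst
print(f"SSR + SSE = {ssr + sse:.6f}, SST = {sst:.6f}, r^2 = {r2:.4f}")
```

Up to floating-point rounding, SSR + SSE should equal SST, and r² should fall between 0 and 1.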
Regression ANOVA Table
Component          Sum of Squares                              df      Mean Square            F
Regression (SSR)   $\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$    1       MSSR = SSR / 1         MSSR / MSSE
Error (SSE)        $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$        n - 2   MSSE = SSE / (n - 2)
Total (SST)        $\sum_{i=1}^{n} (y_i - \bar{y})^2$          n - 1
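As a rough sketch, the entries of the ANOVA table can be assembled as follows. The function name `anova_table`, the dictionary layout, and the four sample points are assumptions made for this illustration, and the code assumes the fit is not perfect (SSE > 0).

```python
def anova_table(x, y):
    """Return the regression ANOVA quantities for a simple linear fit."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with at least 3 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    y_hat = [a + b * xi for xi in x]
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    mssr = ssr / 1          # regression mean square, 1 degree of freedom
    msse = sse / (n - 2)    # error mean square, n - 2 degrees of freedom
    return {
        "SSR": ssr, "SSE": sse, "SST": sst,
        "df": (1, n - 2, n - 1),
        "MSSR": mssr, "MSSE": msse,
        "F": mssr / msse,
    }


# Hypothetical data
table = anova_table([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
for name, value in table.items():
    print(name, value)
```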
Regression Example
[Figure: Glyndon Field Sampled Soil Moisture versus TVDI from a 3x3 kernel. Horizontal axis: TVDI (3x3 kernel), 0.0 to 1.0; vertical axis: Volumetric Soil Moisture, 0.00 to 0.50]
TVDI    Soil Moisture
0.274   0.414
0.542   0.359
0.419   0.396
0.286   0.458
0.374   0.350
0.489   0.357
0.623   0.255
0.506   0.189
0.768   0.171
0.725   0.119
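The ten data pairs above can be run through the coefficient and r² formulas directly. The sketch below is an illustration only: the variable names are assumptions, and its printed values are not taken from the slides, which show the results as a figure.

```python
# TVDI and volumetric soil moisture pairs from the example slide
tvdi = [0.274, 0.542, 0.419, 0.286, 0.374, 0.489, 0.623, 0.506, 0.768, 0.725]
moisture = [0.414, 0.359, 0.396, 0.458, 0.350, 0.357, 0.255, 0.189, 0.171, 0.119]

n = len(tvdi)
x_bar = sum(tvdi) / n
y_bar = sum(moisture) / n

# Slope and intercept from the least-squares formulas
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(tvdi, moisture))
     / sum((xi - x_bar) ** 2 for xi in tvdi))
a = y_bar - b * x_bar

# Coefficient of determination as SSR / SST
y_hat = [a + b * xi for xi in tvdi]
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sst = sum((yi - y_bar) ** 2 for yi in moisture)
r2 = ssr / sst

print(f"b = {b:.3f}, a = {a:.3f}, r^2 = {r2:.3f}")
```

The slope should come out negative, matching the downward trend in the data: soil moisture falls as TVDI rises.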
Regression Example
Excel
Regression ANOVA table
Component          Sum of Squares   Degrees of Freedom   Mean Square   F-Test
Regression (SSR)
Error (SSE)
Total (SST)
A Significance Test for $r^2$
$$F_{test} = \frac{r^2 (n - 2)}{1 - r^2} = \frac{MSSR}{MSSE}$$
F-distribution with degrees of freedom:
df = (1, n - 2)
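The two forms of the F statistic agree because of the definitions $r^2 = SSR / SST$ and $SST = SSR + SSE$. A short derivation, added here as a worked step and not part of the original slides:

$$
\begin{aligned}
SSR &= r^2 \, SST, \qquad SSE = SST - SSR = (1 - r^2)\, SST \\
F_{test} &= \frac{MSSR}{MSSE} = \frac{SSR / 1}{SSE / (n - 2)} = \frac{r^2 \, SST \, (n - 2)}{(1 - r^2)\, SST} = \frac{r^2 (n - 2)}{1 - r^2}
\end{aligned}
$$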
Significance of r2 Example
Assumptions of Regression
1. The relationship is linear
• $y = \alpha + \beta x + \varepsilon$
• Not linear (check the scatterplot): transform one or both of the variables
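As one illustration of transforming a variable, the sketch below uses hypothetical data that follow an exponential curve and fits a line to ln(y) against x. The logarithm is just one possible transformation, chosen here as an assumption; the slides do not name a specific one.

```python
import math


def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b * x_bar, b


# Hypothetical curved data: y = 2 * exp(0.5 * x)
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 * math.exp(0.5 * xi) for xi in x]

# Taking logs turns the curve into a line: ln(y) = ln(2) + 0.5 * x
log_y = [math.log(yi) for yi in y]
a, b = fit_line(x, log_y)

print(f"slope on the log scale = {b:.4f}, exp(intercept) = {math.exp(a):.4f}")
```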
Assumptions of Regression
2. The errors have a mean of zero and a constant variance
• i.e. the errors need to be distributed evenly on either side of the regression line
• The magnitude of their dispersion has to be reasonably constant for all values of x
• If the variation in the errors is larger for some values of x than for others, a linear model is not appropriate
Assumptions of Regression
3. The residuals are independent
• There should be no pattern in the distribution of the residuals
• A pattern means that:
the model is not effectively capturing some systematic aspect of the relationship
another factor cannot be accounted for by this model
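A rough way to inspect the residuals is to compute them and compare their spread across the range of x. The sketch below uses the TVDI and soil moisture pairs from the example slide. The helper names and the split into a lower and an upper half of x are assumptions for illustration, not a formal test; a residual plot is the more usual check.

```python
def residuals(x, y):
    """Return the residuals y_i - y_hat_i of the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def mean_square_by_half(x, res):
    """Mean squared residual for the lower and upper halves of x."""
    ordered = [r for _, r in sorted(zip(x, res))]  # residuals ordered by x
    half = len(ordered) // 2
    low, high = ordered[:half], ordered[half:]
    return (sum(r * r for r in low) / len(low),
            sum(r * r for r in high) / len(high))


tvdi = [0.274, 0.542, 0.419, 0.286, 0.374, 0.489, 0.623, 0.506, 0.768, 0.725]
moisture = [0.414, 0.359, 0.396, 0.458, 0.350, 0.357, 0.255, 0.189, 0.171, 0.119]

res = residuals(tvdi, moisture)
mean_res = sum(res) / len(res)
low_ms, high_ms = mean_square_by_half(tvdi, res)

print(f"mean residual = {mean_res:.2e}")
print(f"mean squared residual: low x = {low_ms:.5f}, high x = {high_ms:.5f}")
```

For a least-squares line with an intercept, the mean residual is zero by construction, so the more informative part is whether the two halves show a similar spread and whether the residuals, listed in order of x, show any run or trend.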
Significance Tests for Regression Parameters
• t-tests: significance of individual regression parameters
• Standard error of the estimate, also known as the standard deviation of the residuals ($s_e$):
$$s_e = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}$$
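The standard error of the estimate can be computed as in the sketch below. The function name `standard_error_of_estimate` and the four data points are hypothetical, picked so that the arithmetic can be followed by hand (SSE = 1.8, n - 2 = 2).

```python
import math


def standard_error_of_estimate(x, y):
    """Return s_e = sqrt(SSE / (n - 2)) for the least-squares line."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with at least 3 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))


# Hypothetical data: fitted line is y = 0.5 + 0.8x, SSE = 1.8
s_e = standard_error_of_estimate([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
print(s_e)
```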
Significance Test for Slope (b)
• $H_0: \beta = 0$
$$t_{test} = \frac{b}{s_b}$$
$s_b$ is the standard deviation of the slope parameter:
$$s_b = \sqrt{\frac{s_e^2}{(n - 1)\, s_x^2}}$$
df = (n - 2)
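A minimal sketch of the slope test statistic follows. It takes $s_x^2$ to be the sample variance of x (divisor n - 1), so that $(n - 1)\, s_x^2$ equals $\sum (x_i - \bar{x})^2$; that reading, the function name `slope_t_statistic`, and the data are assumptions. Comparing the statistic with a critical value from a t table at n - 2 degrees of freedom is left out.

```python
import math
from statistics import variance


def slope_t_statistic(x, y):
    """Return (b, s_b, t) for the test of H0: slope = 0."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with at least 3 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    # Squared standard error of the estimate: SSE / (n - 2)
    se2 = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    # statistics.variance is the sample variance (divisor n - 1)
    s_b = math.sqrt(se2 / ((n - 1) * variance(x)))
    return b, s_b, b / s_b


# Hypothetical data
b, s_b, t = slope_t_statistic([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
print(f"b = {b:.3f}, s_b = {s_b:.3f}, t = {t:.3f}, df = 2")
```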
Hypothesis Testing - Significance Test for Regression Slope Example
Significance Test for Regression Intercept
$$t_{test} = \frac{a}{s_a}$$
where $s_a$ is the standard deviation of the intercept:
$$s_a = \sqrt{s_e^2 \, \frac{\sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n} (x_i - \bar{x})^2}}$$
and degrees of freedom = (n - 2)
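The intercept test statistic can be sketched the same way. The function name `intercept_t_statistic` and the data are hypothetical, and the critical-value lookup at n - 2 degrees of freedom is again left out.

```python
import math


def intercept_t_statistic(x, y):
    """Return (a, s_a, t) for the test of H0: intercept = 0."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with at least 3 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a = y_bar - b * x_bar
    # Squared standard error of the estimate: SSE / (n - 2)
    se2 = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    # s_a^2 = s_e^2 * sum(x_i^2) / (n * sum((x_i - x_bar)^2))
    s_a = math.sqrt(se2 * sum(xi ** 2 for xi in x) / (n * s_xx))
    return a, s_a, a / s_a


# Hypothetical data
a, s_a, t = intercept_t_statistic([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
print(f"a = {a:.3f}, s_a = {s_a:.3f}, t = {t:.3f}, df = 2")
```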
Hypothesis Testing - Significance Test for Regression Intercept Example
Simple Linear Regression in Excel
• Built-in functions (array1 holds the known y values, array2 the known x values)
  • SLOPE(array1, array2)
  • INTERCEPT(array1, array2)
• Data Analysis Tool
S-Plus
TVDI (x)   Theta (y)
0.274      0.414
0.542      0.359
0.419      0.396
0.286      0.458
0.374      0.350
0.489      0.357
0.623      0.255
0.506      0.189
0.768      0.171
0.725      0.119
TVDI
0.413
0.223
0.811
0.513
0.655
0.354
0.198
0.763
0.671
0.424