stats for engineers lecture 9
DESCRIPTION
Stats for Engineers Lecture 9. Summary From Last Time. Confidence Intervals for the mean. If is known , confidence interval for is to , where is obtained from Normal tables. If is unknown ( only know sample variance ):. - PowerPoint PPT PresentationTRANSCRIPT
Stats for Engineers Lecture 9
Summary From Last Time
Confidence Intervals for the mean
If is known, confidence interval for is to , where is obtained from Normal tables.
Assuming independent Normal data, the confidence interval for is: to .
If is unknown (only know sample variance ):
t-tables π (π‘π)=β«ββ
π‘π
π π (π‘ )ππ‘
Qπ=πβ1
Student t-distribution
Sample size
How many random samples do you need to reach desired level of precision?
Suppose we want to estimate to within , where (and the degree of confidence) is given.
βπ=π‘πβ 12 π 2
πΏ2
Want πΏ=π‘πβ1β π 2πNeed:
- Estimate of (e.g. previous experiments)
- Estimate of . This depends on n, but not very strongly.
e.g. take for 95% confidence.
Rule of thumb: for 95% confidence, choose
Example
A large number of steel plates will be used to build a ship. Ten are tested and found to have sample mean weight and sample variance How many need to be tested to determine the mean weight with 95% confidence to within ?
Answer:
βπ=π‘πβ 12 π 2
πΏ2
Assuming plates have independent weights with a Normal distributionπΏ=0.1 kg=π‘πβ 1β π 2πTake for 95% confidence.
i.e. need to test about 28
ΒΏ2.120.252
0.12=27.6
1. 2 3 4
18%
11%
50%
21%
Number of samples
If you need 28 samples for the confidence interval to be approximately how many samples would you need to get a more accurate answer with confidence interval
1. 88.52. 2803. 28004. 28000
πΏ=π‘πβ1β π 2π so need more. i.e. 2800
πΏ=π‘πβ1β π 2π
Linear regression
Linear regression: fitting a straight line to the mean value of as a function of
403020100
250
200
150
100
x
y
We measure a response variable at various values of a controlled variable
e.g. measure fuel efficiency at various values of an experimentally controlled external temperature
π₯
π¦
π₯1 π₯2 π₯3
Distribution of when
Regression curve: fits the mean values of the distributions
π¦=ππ₯+π
403020100
250
200
150
100
x
y
From a sample of values at various , we want to fit the regression curve.
e.g.
Straight line plots
Which graph is of the line ?
1 2 3 4
12%
0%
82%
6%
1. Plot2. Plot23. Plot34. Plot4
1. 2.
3. 4.
π₯
π¦
403020100
250
200
150
100
x
y
From a sample of values at various , we want to fit the regression curve.
e.g.
403020100
250
200
150
100
x
y
Or is it
What do we mean by a line being a βgood fitβ?
Simple model for data:
π¦ π=π+ππ₯ π+ππ
Equation of straight line is
Random errorStraight line
- Linear regression model
Simplest assumption: for all , and 's are independent
Maximum likelihood estimate = least -squares estimate
Minimize πΈ=βπππ2=ΒΏβ
π(π¦ πβ οΏ½ΜοΏ½ π)=β
π( π¦ πβπβππ₯ π )
2ΒΏ
Data point Straight-line prediction
Model is
E is defined and can be minimized even when errors not Normal β least-squares is simple general prescription for fitting a straight line
(but statistical interpretation in general less clear)
Want to estimate parameters a and b, using the data.
e.g. - choose and to minimize the errors
The line has been proposed as a line of best fit for the following four sets of data. For which data set is this line the best fit (minimum )?
1 2 3 4
59%
6%
31%
3%
Question from Derek Bruff
1. Pic12. Pic23. Pic correct4. pic4
1. 2.
3. 4.
How to find and that minimize ?
For minimum want and , see notes for derivation
Solution is the least-squares estimates and :
οΏ½ΜοΏ½=ππ₯π¦
ππ₯π₯β§οΏ½ΜοΏ½=π¦β οΏ½ΜοΏ½π₯
ππ₯π₯=βππ₯ π2β
(βπ π₯π)2
π =βπ
(π₯πβπ₯ )2
ππ₯π¦=βππ₯ π π¦ πβ
βππ₯ πβ
ππ¦ π
π =βπ
(π₯πβπ₯) ( π¦ πβ π¦ )β
Where
Sample means
οΏ½ΜοΏ½=οΏ½ΜοΏ½+οΏ½ΜοΏ½ π₯Equation of the fitted line is
Most of the things you need to use are on the formula sheet
Note that since
β οΏ½ΜοΏ½βπ¦=οΏ½ΜοΏ½ (π₯βπ₯)
i.e. is on the line
403020100
250
200
150
100
x
y
π₯
π¦
y 240 181 193 155 172 110 113 75 94x 1.6 9.4 15.5 20.0 22.0 35.5 43.0 40.5 33.0
Example: The data y has been observed for various values of x, as follows:
Fit the simple linear regression model using least squares.
Answer:
n = 9 ,
οΏ½ΜοΏ½=ππ₯π¦
ππ₯π₯β§οΏ½ΜοΏ½=π¦β οΏ½ΜοΏ½π₯
ππ₯π₯=βππ₯ π2β
(βπ π₯π)2
π
ππ₯π¦=βππ₯ π π¦ πβ
βππ₯ πβ
ππ¦ π
π
Want to fit
,
ππ₯π¦=26864β220.50Γ1333.0
9
ππ₯π₯=7053.7β220.52
9ΒΏ1651.42
ΒΏβ5794.1
βοΏ½ΜοΏ½=ππ₯π¦
ππ₯π₯=β 5794.5
1651.45ΒΏβ3.5086
Answer:
n = 9 ,
οΏ½ΜοΏ½=ππ₯π¦
ππ₯π₯β§οΏ½ΜοΏ½=π¦β οΏ½ΜοΏ½π₯
ππ₯π₯=βππ₯ π2β
(βπ π₯π)2
π
ππ₯π¦=βππ₯ π π¦ πβ
βππ₯ πβ
ππ¦ π
π
Want to fit
,
ππ₯π¦=26864β220.50Γ1333.0
9
ππ₯π₯=7053.7β220.52
9ΒΏ1651.42
ΒΏβ5794.1
βοΏ½ΜοΏ½=ππ₯π¦
ππ₯π₯=β 5794.5
1651.45ΒΏβ3.5086
Now just need
οΏ½ΜοΏ½=π¦βοΏ½ΜοΏ½ π₯ΒΏ 1333.0
9β(β3.5086)Γ (220.50 )
9=234.1
So the fit is approximately
Which of the following data are likely to be most appropriately modelled using a linear regression model?
1 2 3
100%
0%0%
1. Correct2. Errors change3. Not straight
1. 2.
3.
Estimating : variance of y about the fitted line Estimated error is:
, so the ordinary sample variance of the 's is
In fact, this is biased since two parameters, a and b have been estimated. The unbiased estimate is:
ΒΏππ¦π¦ βοΏ½ΜοΏ½ππ₯π¦
πβ2[derivation in notes]
Quantifying the goodness of the fit
Residual sum of squares
Which of the following plots would have the greatest residual sum of squares [variance of about the fitted line]?
1 2 3
25%
63%
13%
Question from Derek Bruff
1. Pic12. Pic23. Pic correct
1. 2. 3.