stats for engineers lecture 9

22
Stats for Engineers Lecture 9

Upload: renee

Post on 23-Feb-2016

32 views

Category:

Documents


4 download

DESCRIPTION

Stats for Engineers Lecture 9. Summary From Last Time. Confidence Intervals for the mean. If is known , confidence interval for is to , where is obtained from Normal tables. If is unknown ( only know sample variance ):. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Stats for Engineers Lecture  9

Stats for Engineers Lecture 9

Page 2: Stats for Engineers Lecture  9

Summary From Last Time

Confidence Intervals for the mean

If is known, confidence interval for is to , where is obtained from Normal tables.

Assuming independent Normal data, the confidence interval for is: to .

If is unknown (only know sample variance ):

t-tables 𝑄 (π‘‘πœˆ)=βˆ«βˆ’βˆž

π‘‘πœˆ

𝑓 𝜈 (𝑑 )𝑑𝑑

Q𝜈=π‘›βˆ’1

Student t-distribution

Page 3: Stats for Engineers Lecture  9

Sample size

How many random samples do you need to reach desired level of precision?

Suppose we want to estimate to within , where (and the degree of confidence) is given.

⇒𝑛=π‘‘π‘›βˆ’ 12 𝑠2

𝛿2

Want 𝛿=π‘‘π‘›βˆ’1√ 𝑠2𝑛Need:

- Estimate of (e.g. previous experiments)

- Estimate of . This depends on n, but not very strongly.

e.g. take for 95% confidence.

Rule of thumb: for 95% confidence, choose

Page 4: Stats for Engineers Lecture  9

Example

A large number of steel plates will be used to build a ship. Ten are tested and found to have sample mean weight and sample variance How many need to be tested to determine the mean weight with 95% confidence to within ?

Answer:

⇒𝑛=π‘‘π‘›βˆ’ 12 𝑠2

𝛿2

Assuming plates have independent weights with a Normal distribution𝛿=0.1 kg=π‘‘π‘›βˆ’ 1√ 𝑠2𝑛Take for 95% confidence.

i.e. need to test about 28

ΒΏ2.120.252

0.12=27.6

Page 5: Stats for Engineers Lecture  9

1. 2 3 4

18%

11%

50%

21%

Number of samples

If you need 28 samples for the confidence interval to be approximately how many samples would you need to get a more accurate answer with confidence interval

1. 88.52. 2803. 28004. 28000

𝛿=π‘‘π‘›βˆ’1√ 𝑠2𝑛 so need more. i.e. 2800

𝛿=π‘‘π‘›βˆ’1√ 𝑠2𝑛

Page 6: Stats for Engineers Lecture  9

Linear regression

Linear regression: fitting a straight line to the mean value of as a function of

403020100

250

200

150

100

x

y

We measure a response variable at various values of a controlled variable

e.g. measure fuel efficiency at various values of an experimentally controlled external temperature

Page 7: Stats for Engineers Lecture  9

π‘₯

𝑦

π‘₯1 π‘₯2 π‘₯3

Distribution of when

Regression curve: fits the mean values of the distributions

𝑦=π‘Žπ‘₯+𝑏

Page 8: Stats for Engineers Lecture  9

403020100

250

200

150

100

x

y

From a sample of values at various , we want to fit the regression curve.

e.g.

Page 9: Stats for Engineers Lecture  9

Straight line plots

Which graph is of the line ?

1 2 3 4

12%

0%

82%

6%

1. Plot2. Plot23. Plot34. Plot4

1. 2.

3. 4.

π‘₯

𝑦

Page 10: Stats for Engineers Lecture  9

403020100

250

200

150

100

x

y

From a sample of values at various , we want to fit the regression curve.

e.g.

Page 11: Stats for Engineers Lecture  9

403020100

250

200

150

100

x

y

Or is it

What do we mean by a line being a β€˜good fit’?

Page 12: Stats for Engineers Lecture  9

Simple model for data:

𝑦 𝑖=π‘Ž+𝑏π‘₯ 𝑖+𝑒𝑖

Equation of straight line is

Random errorStraight line

- Linear regression model

Simplest assumption: for all , and 's are independent

Page 13: Stats for Engineers Lecture  9

Maximum likelihood estimate = least -squares estimate

Minimize 𝐸=βˆ‘π‘–π‘’π‘–2=ΒΏβˆ‘

𝑖(𝑦 π‘–βˆ’ οΏ½Μ‚οΏ½ 𝑖)=βˆ‘

𝑖( 𝑦 π‘–βˆ’π‘Žβˆ’π‘π‘₯ 𝑖 )

2ΒΏ

Data point Straight-line prediction

Model is

E is defined and can be minimized even when errors not Normal – least-squares is simple general prescription for fitting a straight line

(but statistical interpretation in general less clear)

Want to estimate parameters a and b, using the data.

e.g. - choose and to minimize the errors

Page 14: Stats for Engineers Lecture  9

The line has been proposed as a line of best fit for the following four sets of data. For which data set is this line the best fit (minimum )?

1 2 3 4

59%

6%

31%

3%

Question from Derek Bruff

1. Pic12. Pic23. Pic correct4. pic4

1. 2.

3. 4.

Page 15: Stats for Engineers Lecture  9

How to find and that minimize ?

For minimum want and , see notes for derivation

Solution is the least-squares estimates and :

οΏ½Μ‚οΏ½=𝑆π‘₯𝑦

𝑆π‘₯π‘₯βˆ§οΏ½Μ‚οΏ½=π‘¦βˆ’ οΏ½Μ‚οΏ½π‘₯

𝑆π‘₯π‘₯=βˆ‘π‘–π‘₯ 𝑖2βˆ’

(βˆ‘π‘– π‘₯𝑖)2

𝑛 =βˆ‘π‘–

(π‘₯π‘–βˆ’π‘₯ )2

𝑆π‘₯𝑦=βˆ‘π‘–π‘₯ 𝑖 𝑦 π‘–βˆ’

βˆ‘π‘–π‘₯ π‘–βˆ‘

𝑖𝑦 𝑖

𝑛 =βˆ‘π‘–

(π‘₯π‘–βˆ’π‘₯) ( 𝑦 π‘–βˆ’ 𝑦 )❑

Where

Sample means

οΏ½Μ‚οΏ½=οΏ½Μ‚οΏ½+οΏ½Μ‚οΏ½ π‘₯Equation of the fitted line is

Page 16: Stats for Engineers Lecture  9

Most of the things you need to use are on the formula sheet

Page 17: Stats for Engineers Lecture  9

Note that since

β‡’ οΏ½Μ‚οΏ½βˆ’π‘¦=οΏ½Μ‚οΏ½ (π‘₯βˆ’π‘₯)

i.e. is on the line

403020100

250

200

150

100

x

y

π‘₯

𝑦

Page 18: Stats for Engineers Lecture  9

y 240 181 193 155 172 110 113 75 94x 1.6 9.4 15.5 20.0 22.0 35.5 43.0 40.5 33.0

Example: The data y has been observed for various values of x, as follows:

Fit the simple linear regression model using least squares.

Answer:

n = 9 ,

οΏ½Μ‚οΏ½=𝑆π‘₯𝑦

𝑆π‘₯π‘₯βˆ§οΏ½Μ‚οΏ½=π‘¦βˆ’ οΏ½Μ‚οΏ½π‘₯

𝑆π‘₯π‘₯=βˆ‘π‘–π‘₯ 𝑖2βˆ’

(βˆ‘π‘– π‘₯𝑖)2

𝑛

𝑆π‘₯𝑦=βˆ‘π‘–π‘₯ 𝑖 𝑦 π‘–βˆ’

βˆ‘π‘–π‘₯ π‘–βˆ‘

𝑖𝑦 𝑖

𝑛

Want to fit

,

𝑆π‘₯𝑦=26864βˆ’220.50Γ—1333.0

9

𝑆π‘₯π‘₯=7053.7βˆ’220.52

9ΒΏ1651.42

ΒΏβˆ’5794.1

β‡’οΏ½Μ‚οΏ½=𝑆π‘₯𝑦

𝑆π‘₯π‘₯=βˆ’ 5794.5

1651.45ΒΏβˆ’3.5086

Page 19: Stats for Engineers Lecture  9

Answer:

n = 9 ,

οΏ½Μ‚οΏ½=𝑆π‘₯𝑦

𝑆π‘₯π‘₯βˆ§οΏ½Μ‚οΏ½=π‘¦βˆ’ οΏ½Μ‚οΏ½π‘₯

𝑆π‘₯π‘₯=βˆ‘π‘–π‘₯ 𝑖2βˆ’

(βˆ‘π‘– π‘₯𝑖)2

𝑛

𝑆π‘₯𝑦=βˆ‘π‘–π‘₯ 𝑖 𝑦 π‘–βˆ’

βˆ‘π‘–π‘₯ π‘–βˆ‘

𝑖𝑦 𝑖

𝑛

Want to fit

,

𝑆π‘₯𝑦=26864βˆ’220.50Γ—1333.0

9

𝑆π‘₯π‘₯=7053.7βˆ’220.52

9ΒΏ1651.42

ΒΏβˆ’5794.1

β‡’οΏ½Μ‚οΏ½=𝑆π‘₯𝑦

𝑆π‘₯π‘₯=βˆ’ 5794.5

1651.45ΒΏβˆ’3.5086

Now just need

οΏ½Μ‚οΏ½=π‘¦βˆ’οΏ½Μ‚οΏ½ π‘₯ΒΏ 1333.0

9βˆ’(βˆ’3.5086)Γ— (220.50 )

9=234.1

So the fit is approximately

Page 20: Stats for Engineers Lecture  9

Which of the following data are likely to be most appropriately modelled using a linear regression model?

1 2 3

100%

0%0%

1. Correct2. Errors change3. Not straight

1. 2.

3.

Page 21: Stats for Engineers Lecture  9

Estimating : variance of y about the fitted line Estimated error is:

, so the ordinary sample variance of the 's is

In fact, this is biased since two parameters, a and b have been estimated. The unbiased estimate is:

¿𝑆𝑦𝑦 βˆ’οΏ½Μ‚οΏ½π‘†π‘₯𝑦

π‘›βˆ’2[derivation in notes]

Quantifying the goodness of the fit

Residual sum of squares

Page 22: Stats for Engineers Lecture  9

Which of the following plots would have the greatest residual sum of squares [variance of about the fitted line]?

1 2 3

25%

63%

13%

Question from Derek Bruff

1. Pic12. Pic23. Pic correct

1. 2. 3.