Review – Using Standard Deviation


Upload: nibaw

Post on 06-Jan-2016



TRANSCRIPT

Page 1: Review – Using Standard Deviation

Review – Using Standard Deviation

Here are eight test scores from a previous Stats 201 class:

35, 59, 70, 73, 75, 81, 84, 86.

The mean and standard deviation are 70.4 and 16.7, respectively. Work out which data points are within

a) one standard deviation from the mean i.e.

59, 70, 73, 75, 81, 84, 86

b) two standard deviations from the mean i.e.

59, 70, 73, 75, 81, 84, 86

c) three standard deviations from the mean i.e.

35, 59, 70, 73, 75, 81, 84, 86
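The three answers above can be checked mechanically. The following is a minimal sketch in Python; the helper name `within_k_sd` is an illustrative choice, and the mean and standard deviation are the rounded values quoted on the slide.

```python
# Test scores and summary statistics quoted on the slide.
scores = [35, 59, 70, 73, 75, 81, 84, 86]
mean, sd = 70.4, 16.7

def within_k_sd(data, mean, sd, k):
    """Return the points whose distance from the mean is at most k standard deviations."""
    return [x for x in data if abs(x - mean) <= k * sd]

for k in (1, 2, 3):
    inside = within_k_sd(scores, mean, sd, k)
    print(f"within {k} SD: {inside} ({len(inside)}/{len(scores)})")
```

The score 35 is the only point that drops out for one and two standard deviations, because it lies more than 2 × 16.7 = 33.4 below the mean.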


Page 2: Review – Using Standard Deviation

Idea!

• The example suggests that there may be a general rule which allows us to estimate the fraction of data points which are within a given number of standard deviations of the mean.


Page 3: Review – Using Standard Deviation

Distribution curves

If the number of data points n is small, one uses a small number of class intervals and obtains a typical histogram


Page 4: Review – Using Standard Deviation

Histogram (small n)


Page 5: Review – Using Standard Deviation

Distribution curves

If the number of data points n is larger, one can use more subintervals in producing a histogram for the data set.

Page 6: Review – Using Standard Deviation

Histogram (Medium n)


Page 7: Review – Using Standard Deviation

Distribution curves

If the number of data points n is very large, one can use a correspondingly large number of class intervals and obtain a histogram which can be seen to approach a curve in the limit as n increases.

In this case, it is convenient to choose the scale on the vertical axis so that the area of each vertical bar corresponds to the fraction of data points in the corresponding subinterval.

NOTE: The total area under the graph of the histogram will therefore be 1.
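The scaling described here can be sketched in a few lines of Python. This is only an illustration: the function name `density_histogram`, the choice of equal-width bins, and the use of the eight test scores as input are assumptions, and the sketch expects the data to contain at least two distinct values.

```python
def density_histogram(data, num_bins):
    """Return (edges, heights) so that each bar's area is the fraction of data in its bin."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        # Put the maximum value in the last bin instead of one past the end.
        i = min(int((x - lo) / width), num_bins - 1)
        counts[i] += 1
    n = len(data)
    # height = fraction / width, so that height * width = fraction of the data.
    heights = [c / (n * width) for c in counts]
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, heights

scores = [35, 59, 70, 73, 75, 81, 84, 86]
edges, heights = density_histogram(scores, 3)
width = edges[1] - edges[0]
total_area = sum(h * width for h in heights)
print(total_area)  # equals 1 up to floating-point rounding
```

Dividing each count by n × width, rather than by n alone, is what makes the bar areas (and not the bar heights) add up to 1.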

Page 8: Review – Using Standard Deviation

Histogram (large n)


Page 9: Review – Using Standard Deviation

In the limit as the size of the population increases, one obtains a smooth curve.

• This curve was called a probability density function in Math 112

• One obtains different curves corresponding to different population means and variances.


Page 10: Review – Using Standard Deviation

Normal Distributions

The shape of this curve is determined by µ and σ: µ is where it is centered, and σ is how far it is spread out.
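The roles of µ and σ can be seen numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; the three parameter choices are hypothetical and serve only to illustrate the point.

```python
from statistics import NormalDist

# Two hypothetical bell curves with the same spread but different centers ...
a = NormalDist(mu=0, sigma=1)
b = NormalDist(mu=2, sigma=1)
# ... and one with the same center as a but twice the spread.
c = NormalDist(mu=0, sigma=2)

# Each density is highest at its own mean.
print(a.pdf(0) > a.pdf(1))
print(b.pdf(2) > b.pdf(1))

# A larger sigma gives a lower peak and heavier tails.
print(c.pdf(0) < a.pdf(0))
print(c.pdf(3) > a.pdf(3))
```

Changing µ slides the curve left or right without changing its shape, while changing σ stretches or squeezes it about the same center.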

Page 11: Review – Using Standard Deviation

Interpreting the Standard Deviation

Chebyshev’s Theorem

The proportion (or fraction) of any data set lying within K standard deviations of the mean is always at least $1 - 1/K^2$, where K is any number greater than 1.

For K = 2 we obtain that at least 3/4 (75%) of all scores will fall within 2 standard deviations of the mean, i.e. at least 75% of the data will fall between

$\bar{x} - 2s$ and $\bar{x} + 2s$

Page 12: Review – Using Standard Deviation

Interpreting the Standard Deviation

Chebyshev’s Theorem

The proportion (or fraction) of any data set lying within K standard deviations of the mean is always at least $1 - 1/K^2$, where K is any number greater than 1.

For K = 3 we obtain that at least 8/9 (about 89%) of all scores will fall within 3 standard deviations of the mean, i.e. at least 89% of the data will fall between

$\bar{x} - 3s$ and $\bar{x} + 3s$
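As a rough check of Chebyshev's Theorem, the bound can be compared with the fraction actually observed in the eight test scores from the first slide. This is a sketch only; the function names are illustrative and the mean and standard deviation are the rounded values given earlier.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of any data set within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k**2

def fraction_within(data, mean, sd, k):
    """Observed fraction of points within k standard deviations of the mean."""
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)

scores = [35, 59, 70, 73, 75, 81, 84, 86]
mean, sd = 70.4, 16.7
for k in (2, 3):
    print(k, chebyshev_lower_bound(k), fraction_within(scores, mean, sd, k))
```

For these scores the observed fractions (7/8 and 8/8) are above the guaranteed minimums (3/4 and 8/9), as the theorem requires. The bound is a worst-case guarantee, so real data usually exceed it.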

Page 13: Review – Using Standard Deviation

Exercise 1

• Data collected daily at an intersection giving the number of cars passing through.

• Mean = 375 with standard deviation 25

• Estimate the fraction of days that more than 425 cars used the intersection.


Page 14: Review – Using Standard Deviation

Exercise 1

• Data collected daily at an intersection giving the number of cars passing through.

• Mean = 375 with standard deviation 25

• Estimate the fraction of days that more than 425 cars used the intersection.

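• ANS: The interval (325, 425) is the mean plus or minus 2 standard deviations, so by Chebyshev’s Theorem at least 75% of the days lie in it. Therefore, at most 25% lie outside this interval, and at most 25% of the days had more than 425 cars.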

Page 15: Review – Using Standard Deviation

Exercise 1

• NOTE: Assuming the data is symmetric and therefore evenly distributed about the mean, we can conclude that the (at most) 25% which lie outside the interval (325, 425) are split evenly, with at most 12.5% lying above 425 and at most 12.5% lying below 325. Therefore, on at most roughly 12.5% of the days there will be more than 425 cars using the intersection.
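The arithmetic behind Exercise 1 can be written out as a short Python sketch. The variable names are illustrative, and the halving step relies on the symmetry assumption stated in the note, which Chebyshev's Theorem itself does not provide.

```python
mean, sd = 375, 25
threshold = 425

# How many standard deviations above the mean is the threshold?
k = (threshold - mean) / sd

# Chebyshev: at most 1/k^2 of the data lies outside mean ± k*sd.
outside_at_most = 1 / k**2

# Extra assumption: the data is symmetric, so the two tails are equal.
above_if_symmetric = outside_at_most / 2

print(k, outside_at_most, above_if_symmetric)  # 2.0 0.25 0.125
```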

Page 16: Review – Using Standard Deviation

If the data is known to have a histogram which is symmetric about the mean and “bell shaped”, one can improve upon Chebyshev’s Rule.

Page 17: Review – Using Standard Deviation

This Data is Symmetric, Bell Shaped (or Normal Data)

[Figure: relative frequency histogram; vertical axis "Relative Frequency" (0.1 to 0.5), horizontal axis x (0 to 5)]

Page 18: Review – Using Standard Deviation

This Data is Symmetric, Bell Shaped (or Normal Data)

[Figure: relative frequency histogram; vertical axis "Relative Frequency" (0.1 to 0.5), horizontal axis x (0 to 5)]

Page 19: Review – Using Standard Deviation

This Data is Symmetric, Bell Shaped (or Normal Data)

[Figure: relative frequency histogram; vertical axis "Relative Frequency" (0.1 to 0.5), horizontal axis x (0 to 7)]

Page 20: Review – Using Standard Deviation

The Empirical Rule

The Empirical Rule states that for bell shaped (normal) data:

• 68% of all data points are within 1 standard deviation of the mean

• 95% of all data points are within 2 standard deviations of the mean

• 99.7% of all data points are within 3 standard deviations of the mean

Page 21: Review – Using Standard Deviation

The Empirical Rule

The Empirical Rule states that for bell shaped (normal) data, approximately:

• 68% of all data points are within 1 standard deviation of the mean

• 95% of all data points are within 2 standard deviations of the mean

• 99.7% of all data points are within 3 standard deviations of the mean
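The three percentages in the Empirical Rule can be recovered from an exactly normal model. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `normal_fraction_within` is an illustrative choice, and real bell-shaped data will only match these figures approximately.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def normal_fraction_within(k):
    """Area under the standard normal curve between -k and +k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(normal_fraction_within(k), 4))
```

The computed areas are about 0.6827, 0.9545 and 0.9973, which the rule rounds to 68%, 95% and 99.7%.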


Page 22: Review – Using Standard Deviation

Exercise 2

• Data collected daily at an intersection giving the number of cars passing through.

• Mean = 375 with standard deviation 25

• Estimate the fraction of days that more than 425 cars used the intersection assuming the data is bell-shaped.


Page 23: Review – Using Standard Deviation

Exercise 2

• Data collected daily at an intersection giving the number of cars passing through.

• Mean = 375 with standard deviation 25

• Estimate the fraction of days that more than 425 cars used the intersection assuming the data is bell-shaped.

• ANS: 2.5% (Check this!)
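One way to carry out the check is sketched below in Python. The Empirical Rule estimate uses only the 95% figure and symmetry; the second number is the exact upper-tail area under the additional assumption that the counts follow a perfectly normal model, which the exercise does not state.

```python
from statistics import NormalDist

mean, sd, threshold = 375, 25, 425
k = (threshold - mean) / sd  # 425 is 2 standard deviations above the mean

# Empirical Rule: about 95% within 2 SDs, so about 5% outside, half in each tail.
empirical_estimate = (1 - 0.95) / 2

# Exact upper-tail area for a normal model with this mean and SD.
exact_tail = 1 - NormalDist(mean, sd).cdf(threshold)

print(round(empirical_estimate, 3), round(exact_tail, 4))
```

The estimate of 2.5% is far smaller than the 12.5% bound obtained from Chebyshev's Theorem with symmetry alone, which shows how much the bell-shape assumption sharpens the answer.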


Page 24: Review – Using Standard Deviation

Z-Score

To calculate the number of standard deviations a particular point is away from the mean, we use the following formula.

Page 25: Review – Using Standard Deviation

Z-Score

To calculate the number of standard deviations a particular point is away from the mean, we use the following formula:

$z = \frac{x - \bar{x}}{s}$ or $z = \frac{x - \mu}{\sigma}$

The number we calculate is called the z-score of the measurement x.

Page 26: Review – Using Standard Deviation

Example – Z-score

Here are eight test scores from a previous Stats 201 class:

35, 59, 70, 73, 75, 81, 84, 86.

The mean and standard deviation are 70.4 and 16.7, respectively.

a) Find the z-score of the data point 35.

b) Find the z-score of the data point 73.


Page 27: Review – Using Standard Deviation

Example – Z-score

Here are eight test scores from a previous Stats 201 class:

35, 59, 70, 73, 75, 81, 84, 86.

The mean and standard deviation are 70.4 and 16.7, respectively.

a) Find the z-score of the data point 35.

z = (35 - 70.4)/16.7 ≈ -2.12

b) Find the z-score of the data point 73.

z = (73 - 70.4)/16.7 ≈ 0.16
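The two z-scores can be recomputed with a short Python sketch. The function name `z_score` is an illustrative choice, and the mean and standard deviation are the rounded values quoted in the example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

mean, sd = 70.4, 16.7
for x in (35, 73):
    print(x, round(z_score(x, mean, sd), 2))
```

Rounded to two decimal places, this gives about -2.12 for the score of 35 and 0.16 for the score of 73.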


Page 28: Review – Using Standard Deviation

Interpreting Z-scores

The further away the z-score is from zero, the more exceptional the original score.

Values of z less than -2 or greater than +2 can be considered exceptional or unusual (“a suspected outlier”).

Values of z less than -3 or greater than +3 are often exceptional or unusual (“a highly suspected outlier”).

Page 29: Review – Using Standard Deviation


Example: Aptitude tests

Before being accepted into a manufacturing job, one must complete two aptitude tests. Your score on the tests will decide whether you will be in management or whether you will work on the factory floor. One test is a manual dexterity test, the other is a statistics test. The manual dexterity test (out of 10) has a mean of 6 and a standard deviation of 1. The statistics test (out of 50) has a mean of 25 with a standard deviation of 3. Your score is 7/10 on the manual dexterity test, and a 34/50 on the statistics test. In which test were you exceptional?

Page 30: Review – Using Standard Deviation

Example: Aptitude tests


The problem with comparing the two test scores stems from the fact that the tests are on two different scales.

If we are going to do meaningful comparisons, then we must somehow standardize the scores.

Page 31: Review – Using Standard Deviation

Answer

Calculate the z-score for the two tests.

– Z-score of Man. Dex. = (7-6)/1 = 1

– Z-score of Stats. = (34-25)/3 = 3

Your score on the stats test was exceptionally high (3 standard deviations above the mean).
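The comparison can also be written as a small Python sketch. The dictionary layout and the names `tests` and `most_exceptional` are illustrative; the scores, means and standard deviations are the ones given in the example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# (your score, test mean, test standard deviation)
tests = {
    "manual dexterity": (7, 6, 1),
    "statistics": (34, 25, 3),
}

z = {name: z_score(*values) for name, values in tests.items()}

# The test with the z-score furthest from zero is the more exceptional result.
most_exceptional = max(z, key=lambda name: abs(z[name]))
print(z, most_exceptional)
```

Standardizing puts both results on the same scale, so a score out of 10 and a score out of 50 can be compared directly.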
