3.1 Measure of Center (jga001/chapter 3.pdf)
TRANSCRIPT
3.1 - 1
3.1 Measure of Center
Calculate the mean for a given data set
Find the median, and describe why the median is sometimes preferable to the mean
Find the mode of a data set
Describe how skewness affects these measures of center
3.1 - 2
Measure of Center
Measure of Center
the value at the center or middle of a data set
The three common measures of center are the mean, the median, and the mode.
3.1 - 3
Mean
the measure of center obtained by adding the data values and then dividing the total by the number of values
What most people call an average; also called the arithmetic mean.
3.1 - 4
Notation
$\sum$ (Greek letter sigma) is used to denote the sum of a set of values.
x is the variable usually used to represent the data values.
n represents the number of data values in a sample.
3.1 - 5
Example of summation
• If there are n data values that are denoted as:
$x_1, x_2, \ldots, x_n$
Then:
$\sum x = x_1 + x_2 + \cdots + x_n$
3.1 - 6
Example of summation
• data
21, 25, 32, 48, 53, 62, 62, 64
Then:
$\sum x = 21 + 25 + 32 + 48 + 53 + 62 + 62 + 64 = 367$
3.1 - 7
Sample Mean
$\bar{x} = \frac{\sum x}{n}$
$\bar{x}$ is pronounced 'x-bar' and denotes the mean of a set of sample values
3.1 - 8
Example of Sample Mean
• data
21, 25, 32, 48, 53, 62, 62, 64
Then:
$\bar{x} = \frac{21 + 25 + 32 + 48 + 53 + 62 + 62 + 64}{8} = \frac{367}{8} = 45.875 \approx 45.9$
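The sample mean example can be checked with a few lines of Python. This is only an illustrative sketch; the variable names (`data`, `total`, `x_bar`) are not from the slides.

```python
# Data values taken from the slide example
data = [21, 25, 32, 48, 53, 62, 62, 64]

total = sum(data)   # sum of the data values (sigma x)
n = len(data)       # number of data values in the sample
x_bar = total / n   # sample mean

print(total, n, x_bar)
```

Rounding `x_bar` to one decimal place gives the 45.9 reported on the slide.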
3.1 - 9
Notation
µ (Greek letter mu) is used to denote the population mean
N represents the number of data values in a population.
3.1 - 10
Population Mean
$\mu = \frac{\sum x}{N}$
Note: here x represents the data values in the population
3.1 - 11
Advantages
Is relatively reliable: means of samples drawn from the same population don't vary as much as other measures of center
Takes every data value into account
Mean
3.1 - 12
Mean
Disadvantage
Is sensitive to every data value, one
extreme value can affect it dramatically;
is not a resistant measure of center
Example:
21,25,32,48,53,62,62,64 → $\bar{x} = 45.9$
21,25,32,48,53,62,62,300 → $\bar{x} = 75.4$
3.1 - 13
Median
Median
the measure of center which is the middle
value when the original data values are
arranged in order of increasing (or
decreasing) magnitude
3.1 - 14
Finding the Median
First sort the values (arrange them in order), then follow one of these:
1. If the number of data values is odd, the median is the number located in the exact middle of the list. Its position in the list is:
$\left(\frac{n+1}{2}\right)^{\text{th}}$
3.1 - 15
Finding the Median
2. If the number of data values is even, the median is found by computing the mean of the two middle numbers, which are those that lie on either side of the position:
$\left(\frac{n+1}{2}\right)^{\text{th}}$
3.1 - 16
Example of Median
• 6 data values:
5.40 1.10 0.42 0.73 0.48 1.10
• Sorted data:
0.42 0.48 0.73 1.10 1.10 5.40
(even number of values – no exact middle)
median $= \frac{0.73 + 1.10}{2} = 0.915$
3.1 - 17
Example of Median
• 7 data values:
5.40 1.10 0.42 0.73 0.48 1.10 0.66
• Sorted data:
0.42 0.48 0.66 0.73 1.10 1.10 5.40
median $= 0.73$
3.1 - 18
Median
Median is not affected by an
extreme value - is a resistant
measure of the center
Example:
21,25,32,48,53,62,62,64
21,25,32,48,53,62,62,300
Median is 50.5 for both data sets.
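The contrast between the mean and the median on these two data sets can be reproduced with Python's standard `statistics` module. This is a small sketch; the list names are illustrative.

```python
from statistics import mean, median

original = [21, 25, 32, 48, 53, 62, 62, 64]
with_extreme = [21, 25, 32, 48, 53, 62, 62, 300]  # 64 replaced by 300

for label, values in (("original", original),
                      ("with extreme value", with_extreme)):
    # The mean moves with the extreme value; the median does not.
    print(label, "mean =", mean(values), "median =", median(values))
```

The median stays at 50.5 for both lists, while the mean rises from 45.875 to 75.375.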
3.1 - 19
Median
From Example 3.3, page 91
3.1 - 20
Mode
the value that occurs with the greatest
frequency
Data set can have one, more than one, or no mode
3.1 - 21
Mode
Mode is the only measure of central
tendency that can be used with nominal data
Bimodal: two data values occur with the same greatest frequency
Multimodal: more than two data values occur with the same greatest frequency
No Mode: no data value is repeated
3.1 - 22
Mode - Examples
a) 5.40 1.10 0.42 0.73 0.48 1.10 → Mode is 1.10
b) 27 27 27 55 55 55 88 88 99 → Bimodal - 27 & 55
c) 1 2 3 6 7 8 9 10 → No Mode
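The three mode examples can be checked with a short helper. The function name `modes` is hypothetical, and it assumes a non-empty list; it follows the slides' convention that a data set with no repeated value has no mode.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] when no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no value is repeated, which the slides call "No Mode"
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([5.40, 1.10, 0.42, 0.73, 0.48, 1.10]))
print(modes([27, 27, 27, 55, 55, 55, 88, 88, 99]))
print(modes([1, 2, 3, 6, 7, 8, 9, 10]))
```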
3.1 - 23
• These data values represent weight gain or loss in kg for a random sample of 18 college freshmen (negative data values indicate weight loss)
11 3 0 -2 3 -2 -2 5 -2 7 2 4 1 8 1 0 -5 2
• Do these values support the legend that college students gain 15 pounds (6.8 kg) during their freshman year? Explain
3.1 - 24
• Sample Mean
$\bar{x} = \frac{\sum x}{n} = \frac{34}{18} \approx 1.9$ kg
• Median
-5 -2 -2 -2 -2 0 0 1 1 2 2 3 3 4 5 7 8 11
median $= (1 + 2)/2 = 1.5$ kg
• Mode: -2
3.1 - 25
CONCLUSION
• All of the measures of center are below 6.8 kg (15 pounds)
• Based on measures of center, these data values do not support the idea that college students gain 15 pounds (6.8 kg) during their freshman year
3.1 - 26
Mean/Median with Graphing Calculator
• First, enter the list of data values
• Then select 2nd STAT (LIST) and arrow right to MATH option 3: mean( or 4: median(
• and input the desired list
3.1 - 27
Example of Computing the Mean Using Calculator
Sorted amounts of Strontium-90 (in millibecquerels) in a simple random sample of baby teeth obtained from Philadelphia residents born after 1979
Note: this data is related to Three Mile Island nuclear power plant
Accident in 1979.
$\bar{x} = \frac{\sum x}{n} = 149.2$
3.1 - 28
Example of Computing the Mean Using Calculator
Median is 150
3.1 - 29
Skewed and Symmetric
Symmetric: a distribution of data is symmetric if the left half of its histogram is roughly a mirror image of its right half
Skewed: a distribution of data is skewed if it is not symmetric and extends more to one side than the other
3.1 - 30
Skewed Left or Right
Skewed to the left (also called negatively skewed): distributions have a longer left tail; mean and median are to the left of the mode
Skewed to the right (also called positively skewed): distributions have a longer right tail; mean and median are to the right of the mode
3.1 - 31
Distribution Skewed Left
• Mean is smaller than the median.
3.1 - 32
Symmetric Distribution
• Mean, median, mode approximately equal.
3.1 - 33
Distribution Skewed Right
• Mean is larger than the median.
3.1 - 34
Example data set
5 5 5 5 5 10 10 10 10 10 10 15 15 15 15 15
20 20 20 20 25 25 25 30 30 30 35 35 40 45
• Mean: $\bar{x} = \frac{\sum x}{n} = \frac{560}{30} \approx 18.7$
• Median: $\frac{15 + 15}{2} = 15$
Distribution is skewed right.
3.1 - 35
3.2 Measures of Variability
The range
What is a deviation?
The standard deviation and the variance.
3.1 - 36
Why is it important to understand variation?
• A measure of the center by itself can be misleading
• Example:
Two nations with the same median family income are very different if one has extremes of wealth and poverty and the other has little variation among families (see the following table).
3.1 - 37
Example of variation
Data Set A Data Set B
50,000 10,000
60,000 20,000
70,000 70,000
80,000 120,000
90,000 130,000
MEAN 70,000 70,000
MEDIAN 70,000 70,000
Data set B has more variation about the mean
3.1 - 38
Histograms: example of variation
Data set B has more variation about the mean (Target).
3.1 - 39
How do we quantify variation?
3.1 - 40
Definition
The range of a set of data values is the difference between the maximum data value and the minimum data value.
Range = (maximum value) – (minimum value)
3.1 - 41
Example of range.
Data:
27 28 25 6 27 30 26
Range = 30 - 6 = 24
3.1 - 42
Range (cont.)
Ignoring the outlier of 6 in the previous data set gives data 27 28 25 27 30 26
Range = 30 - 25 = 5
This shows that the range is very sensitive to extreme values and is therefore not as useful as other measures of variation.
3.1 - 43
Deviation
The deviation for a given data value is the distance between the data value and the mean, except that the deviation can be negative while a distance is always positive.
3.1 - 44
Deviation
A deviation for a given data value is the difference between the data value and the mean of the data set. If x is the data value,
1. For a sample, the deviation of x is $x - \bar{x}$
2. For a population, the deviation of x is $x - \mu$
3.1 - 45
Deviation
The deviation can be positive, negative, or zero.
1. If the data value is larger than the mean, the deviation will be positive.
2. If the data value is smaller than the mean, the deviation will be negative.
3. If the data value equals the mean, the deviation will be zero.
3.1 - 46
Example:
• data
8, 5, 12, 8, 9, 15, 21, 16, 3
Mean
$\bar{x} = \frac{\sum x}{n} = \frac{8 + 5 + 12 + 8 + 9 + 15 + 21 + 16 + 3}{9} \approx 10.78$
3.1 - 47
Data Value    Deviation
8     $8 - 10.78 = -2.78$
5     $5 - 10.78 = -5.78$
12    $12 - 10.78 = 1.22$
8     $8 - 10.78 = -2.78$
9     $9 - 10.78 = -1.78$
15    $15 - 10.78 = 4.22$
21    $21 - 10.78 = 10.22$
16    $16 - 10.78 = 5.22$
3     $3 - 10.78 = -7.78$
3.1 - 48
Population Variance
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
The population variance is the mean of the squared deviations in the population
3.1 - 49
Population Standard Deviation
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
The population standard deviation is the square root of the population variance.
3.1 - 50
Sample Variance
The sample variance is
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
3.1 - 51
Sample Variance
Note that the sample variance is only approximately the mean of the squared deviations in the sample because we use n-1 instead of n.
3.1 - 52
Sample Variance
A statistic is an unbiased estimator of a parameter if its mean value equals the parameter it is trying to estimate.
Using n-1 instead of n makes the sample variance an unbiased estimator of the population variance.
3.1 - 53
Sample Standard Deviation
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
The sample standard deviation is the square root of the sample variance.
3.1 - 54
Steps to calculate the sample standard deviation
1. Calculate the sample mean $\bar{x}$
2. Find the squared deviations from the sample mean for each sample data value: $(x - \bar{x})^2$
3. Add the squared deviations
4. Divide the sum in step 3 by n-1
5. Take the square root of the quotient in step 4
3.1 - 55
Example: Standard Deviation
Given the data set:
8, 5, 12, 8, 9, 15, 21, 16, 3
Find the standard deviation
3.1 - 56
Example: Standard Deviation
• Find the mean
$\bar{x} = \frac{\sum x}{n} = \frac{8 + 5 + 12 + 8 + 9 + 15 + 21 + 16 + 3}{9} \approx 10.78$
3.1 - 57
Data Value    Squared Deviations From the Mean
8     $(8 - 10.78)^2 = 7.73$
5     $(5 - 10.78)^2 = 33.41$
12    $(12 - 10.78)^2 = 1.49$
8     $(8 - 10.78)^2 = 7.73$
9     $(9 - 10.78)^2 = 3.17$
15    $(15 - 10.78)^2 = 17.81$
21    $(21 - 10.78)^2 = 104.45$
16    $(16 - 10.78)^2 = 27.25$
3     $(3 - 10.78)^2 = 60.53$
3.1 - 58
Example: Standard Deviation
• Add the squared deviations (last column in the table above)
$7.73 + 33.41 + 1.49 + 7.73 + 3.17 + 17.81 + 104.45 + 27.25 + 60.53 = 263.57$
3.1 - 59
Example: Standard Deviation
• Divide the sum by 9-1=8:
$263.57 / 8 = 32.95$
• Take the square root:
$\sqrt{32.95} = 5.74$
$s \approx 5.7$
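The five steps for the sample standard deviation can be traced in Python. This is a sketch with illustrative variable names; it keeps the unrounded mean, so intermediate values may differ from the slide's rounded figures in the last digit.

```python
import math

data = [8, 5, 12, 8, 9, 15, 21, 16, 3]
n = len(data)

x_bar = sum(data) / n                            # step 1: sample mean
squared_devs = [(x - x_bar) ** 2 for x in data]  # step 2: squared deviations
total = sum(squared_devs)                        # step 3: add them
variance = total / (n - 1)                       # step 4: divide by n - 1
s = math.sqrt(variance)                          # step 5: square root

print(round(x_bar, 2), round(total, 2), round(variance, 2), round(s, 1))
```

The final value rounds to 5.7, matching the worked example.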
3.1 - 60
Sample Standard Deviation (Computational Formula)
$s = \sqrt{\frac{\sum x^2 - (\sum x)^2 / n}{n - 1}}$
3.1 - 61
Example: Standard Deviation
Data:
2 1 1 1 1 1 1 4 1 2 2 1 2 3
3 2 3 1 3 1 3 1 3 2 2
Determine the standard deviation
using the previous formula
3.1 - 62
Example: Standard Deviation
• We need to find each of the following:
$n$
$\sum x$
$\sum x^2 = \sum (x^2)$
3.1 - 63
Data Table (25 data values)
TOTALS: 47 109
3.1 - 64
Example: Standard Deviation
• Thus:
$n = 25$
$\sum x = 47$
$\sum x^2 = \sum (x^2) = 109$
3.1 - 65
Example: Standard Deviation
• And:
$s = \sqrt{\frac{\sum x^2 - (\sum x)^2 / n}{n - 1}} = \sqrt{\frac{109 - 47^2 / 25}{24}} = \sqrt{0.86} \approx 0.9$
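The computational formula can be verified against the definitional one. The sketch below uses the 25 data values from the example and compares the result with `statistics.stdev`, which computes the sample standard deviation with the n - 1 divisor; the variable names are illustrative.

```python
import math
from statistics import stdev

data = [2, 1, 1, 1, 1, 1, 1, 4, 1, 2, 2, 1, 2, 3,
        3, 2, 3, 1, 3, 1, 3, 1, 3, 2, 2]

n = len(data)
sum_x = sum(data)                  # sigma x
sum_x2 = sum(x * x for x in data)  # sigma x^2 (square first, then add)

# Computational formula for the sample standard deviation
s = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

print(n, sum_x, sum_x2, round(s, 1), round(stdev(data), 1))
```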
3.1 - 66
Standard Deviation - Important Properties
The standard deviation is a measure of variation of all values from the mean.
The value of the standard deviation s is never negative and usually not zero.
The value of the standard deviation s can increase dramatically with the inclusion of one or more outliers (data values far away from all others).
Unlike variance, the units of the standard deviation s are the same as the units of the original data values.
3.1 - 67
Example: page 116
3.1 - 68
ANSWER
Colony A
range = 134 - 61 = 73
Colony B
range = 158 - 67 = 91
3.1 - 69
28(b) Which colony has the greater variability according to the range?
Example: page 116
ANSWER: colony B
3.1 - 70
Use the previous example and calculate the standard deviation for each colony with a calculator.
3.1 - 71
Data is stored in Lists. Locate and press
the STAT button on the
calculator. Choose EDIT. The calculator
will display the first three of six lists
(columns) for entering data. Simply type
your data and press ENTER. Use your
arrow keys to move between lists.
3.1 - 72
• Enter STAT then arrow right to CALC to get
• then press ENTER
3.1 - 73
Calculator Example
When 1-Var Stats appears on the home screen, enter the name of the list containing the data. You can do this by entering List (= 2nd STAT) and choosing which list has the desired data.
3.1 - 74
• 1-Var Stats
• NOTE: Previous example data will give different values than these.
3.1 - 75
Colony A
standard deviation = 21.9
Colony B
standard deviation = 26.4
ANSWER
3.1 - 76
• Compare histograms (SPSS)
3.1 - 77
3.3 Working with grouped data
• Calculate the weighted mean for a given data set
•Estimate the mean from grouped data
•Estimate the variance and standard deviation from grouped data
3.1 - 78
Weighted Mean
When data values are assigned different weights, we can compute a weighted mean.
Data values: $x_1, x_2, x_3, \ldots, x_n$
Corresponding weights: $w_1, w_2, w_3, \ldots, w_n$
3.1 - 79
Computing the weighted mean
• Multiply each data value by its corresponding weight: $w_i x_i$
• Sum these products.
• Divide the result by the sum of the weights.
3.1 - 80
Weighted Mean
$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$
3.1 - 81
Example: Weighted Mean
Suppose the homework/quiz average is weighted 10%, two exams are weighted 60% in total (30% each), and the final exam is weighted 30%.
If a student makes homework/quiz average 87, exam scores of 80 and 92, and final exam score 85, compute the weighted average.
3.1 - 82
Example: Weighted Mean
ANSWER:
$\bar{x}_w = \frac{0.10(87) + 0.30(80) + 0.30(92) + 0.30(85)}{1.00} = 85.8$
3.1 - 83
Example: Weighted Mean
3.1 - 84
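Weights: Employment. Data Values: Hourly Mean Wage ($).
Employment (weight)    Hourly Mean Wage in $ (data value)    (weight) x (data value)
12,380    60.32    746,761.60
18,580    60.25    1,119,445.00
9,540     59.39    566,580.60
35,550    57.98    2,061,189.00
10,130    55.95    566,773.50
TOTALS: 86,180 (weights), 5,060,749.70 (products)
$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{5{,}060{,}749.70}{86{,}180} = \$58.72$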
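Both weighted mean examples follow the same three steps (multiply, sum, divide by the total weight), which can be sketched as a small function. The name `weighted_mean` is hypothetical, not something the slides define.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Course grade example: homework/quiz 10%, two exams 30% each, final 30%
grade = weighted_mean([87, 80, 92, 85], [0.10, 0.30, 0.30, 0.30])

# Hourly wage example: employment counts act as the weights
wage = weighted_mean([60.32, 60.25, 59.39, 57.98, 55.95],
                     [12380, 18580, 9540, 35550, 10130])

print(round(grade, 1), round(wage, 2))
```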
3.1 - 85
Estimating the mean from grouped data
• Given a frequency distribution, how do we compute the mean?
Heights of 25 Women:
HEIGHT (inches)    FREQUENCY
59-60    3
61-62    3
63-64    4
65-66    7
67-68    6
69-70    1
71-72    1
3.1 - 86
Estimating the mean from grouped data
• Assume all sample values are at the class midpoints
3.1 - 87
Estimating the mean from grouped data
Assume all sample values are at the class midpoints.
HEIGHT (inches)    FREQUENCY
59-60    3
61-62    3
63-64    4
65-66    7
67-68    6
69-70    1
71-72    1
Class midpoints:
59.5, 61.5, 63.5, 65.5, 67.5, 69.5, 71.5
3.1 - 88
Estimating the mean from grouped data
• Multiply each class midpoint by its corresponding frequency
•Add the result
•Divide by the sum of the frequencies (total number of data values)
3.1 - 89
Estimating the mean from grouped data
Class midpoints:
59.5, 61.5, 63.5, 65.5, 67.5, 69.5, 71.5
Frequencies:
3, 3, 4, 7, 6, 1, 1
Estimated mean:
$\frac{3(59.5) + 3(61.5) + 4(63.5) + 7(65.5) + 6(67.5) + 1(69.5) + 1(71.5)}{25} = \frac{1621.5}{25} \approx 64.9$ inches
3.1 - 90
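Estimating the mean from a frequency distribution:
$\hat{\mu} = \frac{\sum m_i f_i}{\sum f_i} = \frac{m_1 f_1 + m_2 f_2 + \cdots + m_n f_n}{f_1 + f_2 + \cdots + f_n}$
Class midpoint = $m_i$, Frequency = $f_i$
• Note: $\hat{\mu}$ is "mu-hat", where the hat denotes the fact that mu is not exact, but approximate.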
3.1 - 91
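Estimating the variance from a frequency distribution:
$\hat{\sigma}^2 = \frac{\sum (m_i - \hat{\mu})^2 f_i}{\sum f_i}$
Estimating the standard deviation from a frequency distribution:
$\hat{\sigma} = \sqrt{\frac{\sum (m_i - \hat{\mu})^2 f_i}{\sum f_i}}$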
3.1 - 92
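HEIGHT (inches)    Frequency $f_i$    Class Midpoint $m_i$    $(m_i - \hat{\mu})^2 f_i$
59-60    3    59.5    86.2
61-62    3    61.5    33.9
63-64    4    63.5    7.4
65-66    7    65.5    2.9
67-68    6    67.5    41.8
69-70    1    69.5    21.5
71-72    1    71.5    44.0
$\hat{\mu} = 64.9$ inches
$\hat{\sigma} = \sqrt{\frac{\sum (m_i - \hat{\mu})^2 f_i}{\sum f_i}} = \sqrt{\frac{237.7}{25}} \approx 3.1$ inches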
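The grouped-data estimates can be reproduced from the class midpoints and frequencies alone. This is a sketch with illustrative names (`mu_hat`, `sigma_hat`); like the slides, it divides by the total frequency rather than by n - 1.

```python
import math

midpoints = [59.5, 61.5, 63.5, 65.5, 67.5, 69.5, 71.5]
frequencies = [3, 3, 4, 7, 6, 1, 1]

n = sum(frequencies)  # total number of data values

# Estimated mean: treat every value in a class as its midpoint
mu_hat = sum(m * f for m, f in zip(midpoints, frequencies)) / n

# Estimated variance and standard deviation from the same midpoints
var_hat = sum((m - mu_hat) ** 2 * f
              for m, f in zip(midpoints, frequencies)) / n
sigma_hat = math.sqrt(var_hat)

print(n, round(mu_hat, 1), round(sigma_hat, 1))
```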
3.1 - 93
3.4 Measures of Position
Find percentiles for small and large data sets
Calculate z-scores and explain why we use them
Use z-scores to detect outliers.
3.1 - 94
Percentile
Let p be any integer between 0 and 100. The pth percentile of a data set is a value for which p percent of the values in the data set are less than or equal to this value.
3.1 - 95
Steps to find the pth percentile for small data sets
Sort the data from small to large. If you are finding the pth percentile of a sample of size n, calculate:
$i = \frac{p}{100} \cdot n$
which is p percent of n
3.1 - 96
Steps to find the pth percentile for small data sets (cont)
If i is an integer, the pth percentile is the mean of the data values in positions i and i+1.
If i is not an integer, round up and use the value in this position as the pth percentile.
3.1 - 97
Example of Finding Percentile
• Find the 25th and 75th percentiles of these 12 data values
36 37 38 39 44 44 47 50 53 57 65 69
3.1 - 98
Example of Finding Percentile
25% of 12: $i = \frac{25}{100} \cdot 12 = 3$
75% of 12: $i = \frac{75}{100} \cdot 12 = 9$
3.1 - 99
Example of Finding Percentile
• The data can be grouped as follows:
36 37 38 39 44 44 47 50 53 57 65 69
25% of the data is below 38.5 (the mean of 38 and 39).
The 25th percentile is 38.5
3rd position
3.1 -
100
Example of Finding Percentile
• The data can be grouped as follows:
36 37 38 39 44 44 47 50 53 57 65 69
75% of the data is below 55 (the mean of 53 and 57).
The 75th percentile is 55
9th position
3.1 -
101
Example of Finding Percentile
• Find the 25th and 75th percentiles of these 7 data values:
36 38 39 44 47 50 65
3.1 -
102
Example of Finding Percentile
25% of 7: $i = \frac{25}{100} \cdot 7 = 1.75$
75% of 7: $i = \frac{75}{100} \cdot 7 = 5.25$
3.1 -
103
Example of Finding Percentile
36 38 39 44 47 50 65
2nd position
The 25th percentile is 38
1.75 round up to position 2
3.1 -
104
Example of Finding Percentile
36 38 39 44 47 50 65
6th position
The 75th percentile is 50
5.25 round up to position 6
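The percentile rule used in the two examples above can be written as a short function. This is a sketch, not a library routine: the name `percentile` is hypothetical, the input is assumed to be already sorted, and p is assumed to be an integer with 0 < p < 100. Integer arithmetic is used to decide whether i is a whole number, which avoids floating-point surprises.

```python
def percentile(sorted_values, p):
    """pth percentile by the small-data-set rule (positions are 1-based)."""
    n = len(sorted_values)
    if (p * n) % 100 == 0:
        # i is an integer: average the values in positions i and i + 1
        i = p * n // 100
        return (sorted_values[i - 1] + sorted_values[i]) / 2
    # i is not an integer: round up and take the value in that position
    i = p * n // 100 + 1
    return sorted_values[i - 1]

twelve = [36, 37, 38, 39, 44, 44, 47, 50, 53, 57, 65, 69]
seven = [36, 38, 39, 44, 47, 50, 65]

print(percentile(twelve, 25), percentile(twelve, 75))
print(percentile(seven, 25), percentile(seven, 75))
```

Statistical software often uses other interpolation conventions, so library percentile functions may return slightly different values for the same data.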
3.1 -
105
Example of Finding Percentile
Page 132
3.1 -
106
Example of Finding Percentile
2.0 2.1 2.4 2.8 3.1 3.5 3.8 4.2 4.3 4.4 5.2 7.1 7.7 8.8 14.7
Note: 15 data values
Data:
3.1 -
107
Example of Finding Percentile
16(a) To find position, 5% of 15: $i = \frac{5}{100} \cdot 15 = 0.75$, rounded up to 1
16(b) To find position, 95% of 15: $i = \frac{95}{100} \cdot 15 = 14.25$, rounded up to 15
3.1 -
108
Example of Finding Percentile
2.0 2.1 2.4 2.8 3.1 3.5 3.8 4.2 4.3 4.4 5.2 7.1 7.7 8.8 14.7
16(a) 5th percentile is 2.0 million
16(b) 95th percentile is 14.7 million
Data:
3.1 -
109
z Score (or standardized value)
the number of standard deviations that a given value x is above or below the mean
Z score
3.1 -
110
Z score Formulas
Sample: $z = \frac{x - \bar{x}}{s}$
Population: $z = \frac{x - \mu}{\sigma}$
3.1 -
111
Interpreting Z Scores
1. A z-score has no units.
2. Whenever a value is greater than the mean, its z score is positive
3. Whenever a value is less than the mean, its z score is negative
3.1 -
112
Example of Finding Z Score
Page 132
3.1 -
113
Example of Finding Z score
Data:
2.0 2.1 2.4 2.8 3.1 3.5 3.8 4.2 4.3 4.4 5.2 7.1 7.7 8.8 14.7
Using a calculator we get that $\bar{x} = 5.1$ and $s = 3.4$
3.1 -
114
Example of Finding Z score
2.0 2.1 2.4 2.8 3.1 3.5 3.8 4.2 4.3 4.4 5.2 7.1 7.7 8.8 14.7
18(a) z-score for fish oil (data value is 4.2)
$z = \frac{4.2 - 5.1}{3.4} = -0.26$
3.1 -
115
Example of Finding Z score
2.0 2.1 2.4 2.8 3.1 3.5 3.8 4.2 4.3 4.4 5.2 7.1 7.7 8.8 14.7
19(a) z-score for Ginseng (data value is 8.8)
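$z = \frac{8.8 - 5.1}{3.4} \approx 1.11$ (computed from the unrounded $\bar{x}$ and $s$; the rounded values shown give about 1.09)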
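The two z-scores can be recomputed directly from the 15 data values. This sketch uses the sample formula with `statistics.mean` and `statistics.stdev`; the helper name `z_score` is illustrative.

```python
from statistics import mean, stdev

data = [2.0, 2.1, 2.4, 2.8, 3.1, 3.5, 3.8, 4.2,
        4.3, 4.4, 5.2, 7.1, 7.7, 8.8, 14.7]

x_bar = mean(data)  # sample mean, about 5.1
s = stdev(data)     # sample standard deviation, about 3.4

def z_score(x):
    """Number of standard deviations that x lies above or below the mean."""
    return (x - x_bar) / s

print(round(z_score(4.2), 2), round(z_score(8.8), 2))
```

Because the code keeps the unrounded mean and standard deviation, it gives -0.26 and 1.11.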
3.1 -
116
Outliers
An outlier is an extreme data value.
We will define a data value as “extreme” if it is at least three standard deviations from the mean.
3.1 -
117
Outliers and z Scores
Data values are not unusual (extreme) if $-2 \le z \le 2$
Data values are unusual or outliers if $z \le -3$ or $z \ge 3$
3.1 -
118
Interpreting Z Scores
page 131: bell shaped distribution
3.1 -
119
3.5 Chebyshev's Rule and the Empirical Rule
Calculate percentages using Chebyshev's Rule
Find percentages and data values using the Empirical Rule
3.1 -
120
Chebyshev's Rule
The proportion (or fraction) of any set of data lying within K standard deviations of the mean is always at least $1 - 1/K^2$, where K is any positive number greater than 1.
For K = 2, at least 3/4 (or 75%) of all values lie within 2 standard deviations of the mean.
For K = 3, at least 8/9 (or 89%) of all values lie within 3 standard deviations of the mean.
3.1 -
121
Example of Chebyshev's Rule
Page 139, problem 11(a)
A data distribution has a mean of 500 and a standard deviation of 100. Suppose we do not know whether the distribution is bell-shaped. (a) Estimate the proportion of data that falls between 300 and 700.
3.1 -
122
Example of Chebyshev's Rule
Page 139, problem 11(a) ANSWER: data values obey $300 \le x \le 700$. First compute k using the given $\bar{x} = 500$ and $s = 100$:
$\bar{x} - ks = 300$ and $\bar{x} + ks = 700$
$500 + k(100) = 700 \Rightarrow k = 2$
3.1 -
123
Example of Chebyshev's Rule
Page 139, problem 11(a) ANSWER: Since k = 2,
$1 - \frac{1}{k^2} = 1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4}$
and Chebyshev's Rule says that at least 75% of the data falls between 300 and 700.
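The Chebyshev calculation for problem 11(a) can be sketched as follows. The function name `chebyshev_lower_bound` is hypothetical; it simply evaluates the lower bound stated in the rule and rejects values of k that the rule does not cover.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Rule requires k > 1")
    return 1 - 1 / k ** 2

x_bar, s = 500, 100
k = (700 - x_bar) / s  # number of standard deviations from 500 to 700

print(k, chebyshev_lower_bound(k))
```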
3.1 -
124
The Empirical Rule
For data sets having a distribution that is approximately bell shaped, the following properties apply:
About 68% of all values fall within 1 standard deviation of the mean.
About 95% of all values fall within 2 standard deviations of the mean.
About 99.7% of all values fall within 3 standard deviations of the mean.
3.1 -
125
The Empirical Rule
For data sets having a distribution that is approximately bell shaped, the following properties apply:
About 68% of all data values obey $-1 \le z \le 1$
About 95% of all data values obey $-2 \le z \le 2$
About 99.7% of all data values obey $-3 \le z \le 3$
3.1 -
126
The Empirical Rule
3.1 -
127
The Empirical Rule
3.1 -
128
The Empirical Rule
3.1 -
129
The Empirical Rule
Explain these percentages.
page 136: bell shaped distribution
3.1 -
130
The Empirical Rule
For the green regions:
1. 50% of the data lies to the left of z=0.
2. 34% (half of 68%) of the data lies between z=-1 and z=0. Therefore, 16% (=50%-34%) of the data is to the left of z=-1.
3. 47.5% (half of 95%) of the data lies between z=-2 and z=0. Therefore, 2.5% (=50%-47.5%) of the data is to the left of z=-2.
4. Subtracting areas gives that 13.5%=16%-2.5% of the data lies between z=-2 and z=-1.
5. Using symmetry, 13.5% of the data also lies between z=1 and z=2.
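These tail percentages can be compared with an exact normal curve. The sketch below assumes that "bell shaped" is modeled by the standard normal distribution, using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the Empirical Rule figures are rounded versions of these areas.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

left_of_minus_1 = z.cdf(-1)  # area to the left of z = -1
left_of_minus_2 = z.cdf(-2)  # area to the left of z = -2
between = left_of_minus_1 - left_of_minus_2  # area between z = -2 and z = -1

print(round(left_of_minus_1, 4), round(left_of_minus_2, 4), round(between, 4))
```

The three areas come out close to the rule's 16%, 2.5%, and 13.5%.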
3.1 -
131
Example of Empirical Rule
Page 139, problem 12(a) A data distribution has a mean of 500 and a standard deviation of 100. Assume that the distribution is bell-shaped. (a) Estimate the proportion of data that falls between 300 and 700.
3.1 -
132
Example of Empirical Rule
Page 139, problem 12(a) ANSWER: data values obey $300 \le x \le 700$. As in problem 11, we are given $\bar{x} = 500$ and $s = 100$, so that the data in the interval are within 2 standard deviations of the mean.
3.1 -
133
Page 139, problem 12(a) ANSWER: Here we are also given that the distribution is bell-shaped. Using the empirical rule, approximately 95% of the data lies between 300 and 700.
Example of Empirical Rule
3.1 -
134
Empirical Rule vs. Chebyshev's Rule
Note the difference in problems 11 and 12: For problem 11 we are not told that the distribution is bell-shaped and we can only say that "at least" 75% of data is between 300 and 700 (using Chebyshev's Rule). We cannot use the Empirical Rule in problem 11.
3.1 -
135
Example
Page 141, problem 24(a) In San Francisco the mean and standard deviation of the wind speed in January are $\mu = 7.2$ mph and $\sigma = 7.2$ mph. (Note these are population parameters.) Assume that the distribution of the wind speed is bell-shaped. (a) Estimate the proportion of times that the wind speed is between 1.2 mph and 13.2 mph.
3.1 -
136
Example
Let the variable x represent wind speed so that $1.2 \le x \le 13.2$. Note that the mean 7.2 is the midpoint of this interval. Calculate how many standard deviations from the mean this interval represents. Use the formula:
$k = \frac{\text{numerical value} - \text{mean}}{\text{standard deviation}}$
3.1 -
137
Example
$k = \frac{1.2 - 7.2}{7.2} = -0.83$ and $k = \frac{13.2 - 7.2}{7.2} = 0.83$
so that the data in the interval are within 0.83 standard deviations of the mean.
ANSWER
The empirical rule implies that less than 68% of the time the wind speed is between 1.2 mph and 13.2 mph.
Note: convince yourself of this by sketching areas below the bell-shaped distribution.
3.1 -
138
Example
Page 141, problem 24(b)
(b) Estimate the proportion of times that the wind speed is less than 1.2 mph.
3.1 -
139
Example
Page 141, problem 24(b)
1. Since 0.0 mph is 1 standard deviation below the mean of 7.2 mph, the empirical rule implies that 34% (half of 68%) of the time, the windspeed is between 0.0 mph and 7.2 mph.
2. Subtracting 34% from 50% gives that 16% of the time the windspeed is less than 0.0 mph.
3. Since 1.2 mph is greater than 0.0 mph but less than 7.2 mph, we can say that at least 16% of the time but no more than 50% of the time the wind speed is less than 1.2 mph.
Note: convince yourself of this by sketching areas below the bell-shaped distribution.
3.1 -
140
3.6 Robust Measures
• Find quartiles and the interquartile range
•Calculate the five number summary of a data set
•Construct a boxplot for a given data set
•Apply robust detection of outliers
3.1 -
141
Quartiles
Quartiles are measures of location, denoted Q1, Q2, and Q3, which divide a set of data into four groups with about 25% of the values in each group.
Q1 (First Quartile) is the 25th percentile
Q2 (Second Quartile) is the 50th percentile or the median
Q3 (Third Quartile) is the 75th percentile
3.1 -
142
Example of Quartiles
• Given the 24 data values (sorted):
36 37 37 39 39 41 43 44 44 47 50 53
54 55 56 56 57 59 61 61 65 69 69 75
Find $Q_1$, $Q_2$, $Q_3$
3.1 -
143
Example of Quartiles
For first quartile (25th percentile), position:
$i = \frac{25}{100} \cdot 24 = 6$
therefore the first quartile is the mean of the data values in positions 6 and 7
$Q_1 = \frac{x_6 + x_7}{2} = \frac{41 + 43}{2} = 42.0$
3.1 -
144
Example of Quartiles
For second quartile (50th percentile), position:
$i = \frac{50}{100} \cdot 24 = 12$
therefore the second quartile is the mean of the data values in positions 12 and 13
$Q_2 = \frac{x_{12} + x_{13}}{2} = \frac{53 + 54}{2} = 53.5$
3.1 -
145
Example of Quartiles
For third quartile (75th percentile), position:
$i = \frac{75}{100} \cdot 24 = 18$
therefore the third quartile is the mean of the data values in positions 18 and 19
$Q_3 = \frac{x_{18} + x_{19}}{2} = \frac{59 + 61}{2} = 60.0$
3.1 -
146
Interquartile Range
The Interquartile Range (IQR) is the difference between the third quartile and the first quartile, which measures the spread of the middle 50% of the data:
$\text{IQR} = Q_3 - Q_1$
It is considered a "robust" measure of variability because it is not affected by outliers in the data (bottom 25% and top 25% of data are ignored).
3.1 -
147
Example of IQR
• Given the 24 data values (sorted):
36 37 37 39 39 41 43 44 44 47 50 53
54 55 56 56 57 59 61 61 65 69 69 75
we found that $Q_1 = 42.0$ and $Q_3 = 60.0$
$\text{IQR} = Q_3 - Q_1 = 60.0 - 42.0 = 18.0$
3.1 -
148
Example of IQR
• Introduce outliers into previous data set:
2 5 37 39 39 41 43 44 44 47 50 53
54 55 56 56 57 59 61 61 65 69 100 200
we still have: $Q_1 = 42.0$ and $Q_3 = 60.0$
$\text{IQR} = Q_3 - Q_1 = 60.0 - 42.0 = 18.0$
3.1 -
149
For a set of data, the 5-number
summary consists of the
minimum value; the first quartile
Q1; the median (or second
quartile Q2); the third quartile,
Q3; and the maximum value.
5-Number Summary
3.1 -
150
Example of Five Number Summary
Given Data (sorted):
128 130 133 137 138 142 142 144 147 149
151 151 151 155 156 161 163 163 166
3.1 -
151
Example of Five Number Summary
• Minimum data value is 128
• First quartile location
$i = \frac{25}{100} \cdot 19 = 4.75$
• round up to get
$Q_1 = x_5 = 138$
3.1 -
152
Example of Five Number Summary
• Second quartile location
$i = \frac{50}{100} \cdot 19 = 9.5$
• round up to get
$Q_2 = x_{10} = 149$
3.1 -
153
Example of Five Number Summary
• Third quartile location
$i = \frac{75}{100} \cdot 19 = 14.25$
• round up to get
$Q_3 = x_{15} = 156$
3.1 -
154
Example of Five Number Summary
• Max data value is 166
• Five Number Summary
min    $Q_1$    $Q_2$    $Q_3$    max
128    138    149    156    166
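The five-number summary above can be assembled by applying the slides' percentile rule at p = 25, 50, and 75. The sketch below uses hypothetical helper names (`percentile`, `five_number_summary`) and assumes a non-empty data set; other software may use different quartile conventions and give slightly different quartiles.

```python
def percentile(sorted_values, p):
    """pth percentile by the small-data-set rule (1-based positions, 0 < p < 100)."""
    n = len(sorted_values)
    if (p * n) % 100 == 0:
        i = p * n // 100  # i is an integer: average positions i and i + 1
        return (sorted_values[i - 1] + sorted_values[i]) / 2
    i = p * n // 100 + 1  # otherwise round up to the next position
    return sorted_values[i - 1]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max)."""
    data = sorted(values)
    return (data[0], percentile(data, 25), percentile(data, 50),
            percentile(data, 75), data[-1])

data = [128, 130, 133, 137, 138, 142, 142, 144, 147, 149,
        151, 151, 151, 155, 156, 161, 163, 163, 166]
print(five_number_summary(data))
```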
3.1 -
155
Using a five number summary,
a data value is an outlier if
1. It is located 1.5(IQR) or more
below the first quartile
2. It is located 1.5(IQR) or more
above the third quartile
Robust Detection of Outliers
3.1 -
156
Robust Detection of Outliers
• Given the data set:
2 5 37 39 39 41 43 44 44 47 50 53
54 55 56 56 57 59 61 61 65 69 100 200
has a five number summary:
2 42.0 53.5 60.0 200
$\text{IQR} = Q_3 - Q_1 = 60.0 - 42.0 = 18.0$
3.1 -
157
Robust Detection of Outliers
Calculate: 1.5(IQR) = 27
1.5(IQR) below the first quartile:
42 - 27 = 15
1.5(IQR) above the third quartile:
60 + 27 = 87
Therefore, 2, 5, 100, 200 are outliers
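The 1.5(IQR) rule can be sketched in code using the same percentile rule for the quartiles. The helper names (`percentile`, `iqr_outliers`) are hypothetical; the comparisons use "or more", matching the definition given in the slides.

```python
def percentile(sorted_values, p):
    """pth percentile by the small-data-set rule (1-based positions, 0 < p < 100)."""
    n = len(sorted_values)
    if (p * n) % 100 == 0:
        i = p * n // 100
        return (sorted_values[i - 1] + sorted_values[i]) / 2
    i = p * n // 100 + 1
    return sorted_values[i - 1]

def iqr_outliers(values):
    """Values lying 1.5(IQR) or more below Q1 or above Q3."""
    data = sorted(values)
    q1, q3 = percentile(data, 25), percentile(data, 75)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # lower and upper cutoffs
    return [x for x in data if x <= low or x >= high]

data = [2, 5, 37, 39, 39, 41, 43, 44, 44, 47, 50, 53,
        54, 55, 56, 56, 57, 59, 61, 61, 65, 69, 100, 200]
print(iqr_outliers(data))
```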
3.1 -
158
A boxplot (or box-and-whisker-
diagram) is a graph of a data set
that consists of a line extending
from the minimum value to the
maximum value, and a box with
lines drawn at the first quartile,
Q1; the median; and the third
quartile, Q3.
Boxplot
3.1 -
159
Example of Boxplot
3.1 -
160
Sorted amounts of Strontium-90 (in millibecquerels) in a random sample of baby teeth obtained from Philadelphia residents born after 1979 Note: this data is related to Three Mile Island nuclear power plant Accident in 1979.
128 130 133 137 138 142 142 144 147 149
151 151 151 155 156 161 163 163 166 172
Example of Boxplot
3.1 -
161
• Five Number Summary
128 140 150 158.5 172
• Boxplot?
Next slide: page 148 constructing a boxplot by hand
or calculator or SPSS
Example of Boxplot
3.1 -
162
Constructing a Boxplot
Page 148 (see example 3.41)
3.1 -
163
Example of Boxplot
• Boxplot
3.1 -
164
• Enter the data in a list:
Calculator Five Number Summary
3.1 -
165
• Go to STAT - CALC and choose 1-Var Stats
Calculator Five Number Summary
3.1 -
166
• On the HOME screen, when 1-Var Stats appears, type the list containing the data.
Calculator Five Number Summary
3.1 -
167
• Arrow down to the five number summary (last five items in the list)
Calculator Five Number Summary
3.1 -
168
• CLEAR out the graphs under y = (or turn them off).
• Enter the data into the calculator lists. (choose STAT, #1 EDIT and type in entries)
Calculator Boxplot
3.1 -
169
• Press 2nd STATPLOT and choose #1 PLOT 1. Be sure the plot is ON, the second box-and-whisker icon is highlighted, and that the list you
will be using is indicated next to Xlist.
Calculator Boxplot
3.1 -
170
• To see the box-and-whisker plot, press ZOOM and #9 ZoomStat. Press the TRACE key to see on-screen data about the box-and-whisker plot.
Calculator Boxplot
3.1 -
171
Boxplot - Symmetric Distribution
Normal Distribution: Heights from a Random Sample of Women
NOTE: $Q_3 - Q_2 \approx Q_2 - Q_1$
value)data(min value)data(max 23 QQ
3.1 -
172
Boxplot - Skewed Distribution
Skewed Distribution: Salaries (in thousands of dollars) of NCAA Football Coaches