04/18/23 Chapter 2 1
Chapter 2
Describing Distributions
with Numbers
Numerical Summaries of:
• Central location
  – Mean
  – Median
• Spread
  – Range
  – Quartiles
  – Standard deviation / variance
• Shape measures not covered
Arithmetic Mean
• Most common measure of central location
• Notation (“xbar”):
$$\bar{x} = \frac{1}{n}\left(x_1 + x_2 + \cdots + x_n\right) = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Where
n is the sample size
∑ is the summation symbol
Example: Sample Mean
Data: Metabolic rates, calories / day:
1792 1666 1362 1614 1460 1867 1439
$$\bar{x} = \frac{1792 + 1666 + 1362 + 1614 + 1460 + 1867 + 1439}{7} = \frac{11{,}200}{7} = 1600$$
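The sample mean calculation can be checked with a short Python sketch. The function name `sample_mean` and the variable `rates` are illustrative choices, not part of the original slides.

```python
# Metabolic rates (calories / day) from the Sample Mean example.
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

def sample_mean(values):
    """Arithmetic mean: the sum of the values divided by the sample size n."""
    n = len(values)
    return sum(values) / n

xbar = sample_mean(rates)
print(sum(rates), len(rates), xbar)  # 11200 7 1600.0
```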
Median (M)
• Half the values are less than the median, half are greater
• If n is odd, the median is the middle ordered value
• If n is even, the median is the average of the two middle ordered values
Examples: Median
• Example 1: 2 4 6
Median = 4
• Example 2: 2 4 6 8
Median = 5 (average of 4 and 6)
• Example 3: 6 2 4
Median ≠ 2
(Values must be ordered first: 2 4 6, so Median = 4)
Example: Median
Data = metabolic rates from the Sample Mean example (n = 7)
Ordered array:
1362 1439 1460 1614 1666 1792 1867
The location of the median in the ordered array: L(M) = (n + 1) / 2 = (7 + 1) / 2 = 4
Value of median = 1614 (the 4th ordered value)
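The median rules (order first, then take the middle value or the average of the two middle values) can be sketched in Python as follows. The function name `median` is an illustrative choice; the 1-based location L(M) = (n + 1) / 2 corresponds to index n // 2 in a 0-based list when n is odd.

```python
def median(values):
    """Median of a list: the values are ordered first, then the middle is taken."""
    ordered = sorted(values)      # values must be ordered first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]       # odd n: the middle ordered value
    # even n: average of the two middle ordered values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([2, 4, 6]))      # 4
print(median([2, 4, 6, 8]))   # 5.0
print(median([6, 2, 4]))      # 4
print(median([1792, 1666, 1362, 1614, 1460, 1867, 1439]))  # 1614
```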
The Median is robust to outliers
This data set:
1362 1439 1460 1614 1666 1792 1867
has median 1614 and mean 1600
This similar data set with a high outlier:
1362 1439 1460 1614 1666 1792 9867
still has median 1614 but now has mean 2742.9
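The comparison can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the variable names `original` and `with_outlier` are illustrative.

```python
from statistics import mean, median

original = [1362, 1439, 1460, 1614, 1666, 1792, 1867]
with_outlier = [1362, 1439, 1460, 1614, 1666, 1792, 9867]  # 1867 replaced by 9867

# The median stays put; the mean is pulled toward the outlier.
print(median(original), f"{mean(original):.1f}")          # 1614 1600.0
print(median(with_outlier), f"{mean(with_outlier):.1f}")  # 1614 2742.9
```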
The skew pulls the mean
• The average (mean) salary at a high-tech firm is $250K / year
• The median salary is $60K
• What does this tell you?
• Answer: There are some very highly paid executives, but most of the workers make modest salaries, i.e., there is a positive skew to the distribution
Spread = Variability
• Amount of spread around the center!
• Statistical measures of spread
  – Range
  – Inter-Quartile Range
  – Standard deviation
Range and IQR
• Range = maximum – minimum
• Easy, but NOT as good as the…
• Quartiles & Inter-Quartile Range (IQR)
  – Quartile 1 (Q1) cuts off the bottom 25% of the data (“25th percentile”)
  – Quartile 2 (Q2) cuts off two-quarters of the data; same as the Median!
  – Quartile 3 (Q3) cuts off three-quarters of the data (“75th percentile”)
Obtaining Quartiles
• Order data
• Find the median
• Look at the lower half of the data set
  – Find “median” of this lower half
  – This is Q1
• Look at the upper half of the data set
  – Find “median” of this upper half
  – This is Q3
Example: Quartiles
Consider these 10 ages:
05 11 21 24 27 | 28 30 42 50 52
The median falls between 27 and 28 (marked “|”).
Bottom half: 05 11 21 24 27. The median of the bottom half (Q1) = 21
Top half: 28 30 42 50 52. The median of the top half (Q3) = 42
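The median-of-halves procedure can be sketched in Python. The function names are illustrative. One assumption is labeled in the code: when n is odd, the median itself is left out of both halves, which matches the n* = 26 halves used for the n = 53 example in these slides. Library routines such as `statistics.quantiles` use interpolation rules and may return slightly different quartiles.

```python
def median_of_ordered(ordered):
    """Median of an already-ordered list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves procedure."""
    ordered = sorted(values)                # order data
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                  # lower half of the data set
    # Assumption: for odd n, the median itself is excluded from both halves.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (median_of_ordered(lower),
            median_of_ordered(ordered),
            median_of_ordered(upper))

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
print(quartiles(ages))  # (21, 27.5, 42)
```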
Example 2: Quartiles, n = 53
Ordered data (read down each column):
100 124 148 170 185 215
101 125 150 170 185 220
106 127 150 172 186 260
106 128 152 175 187
110 130 155 175 192
110 130 157 180 194
119 133 165 180 195
120 135 165 180 203
120 139 165 180 210
123 140 170 185 212
L(M) = (53 + 1) / 2 = 27
Median = 165
Bottom half has n* = 26
L(Q1) = (26 + 1) / 2 = 13.5 from the bottom
Q1 = avg(127, 128) = 127.5
Top half has n* = 26
L(Q3) = (26 + 1) / 2 = 13.5 from the top!
Q3 = avg(185, 185) = 185
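Example 2: Quartiles
Stemplot (stem = hundreds and tens, leaf = ones):
10 | 0166
11 | 009
12 | 0034578
13 | 00359
14 | 08
15 | 00257
16 | 555
17 | 000255
18 | 000055567
19 | 245
20 | 3
21 | 025
22 | 0
23 |
24 |
25 |
26 | 0
Q1 = 127.5
Q2 = 165
Q3 = 185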
"5 point summary"
= {Min, Q1, Median, Q3, Max}
= {100, 127.5, 165, 185, 260}
Inter-quartile Range (IQR)
• Q1 = 127.5
• Q3 = 185
Inter-Quartile Range (IQR) = Q3 − Q1
= 185 − 127.5 = 57.5
“spread of middle 50%”
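The 5-point summary and IQR for the n = 53 example can be checked with the sketch below. The variable name `weights` is taken from the boxplot axis label; the function names are illustrative. The quartiles follow the median-of-halves rule from these slides (the median is excluded from both halves when n is odd), not a library interpolation rule.

```python
# The 53 ordered values from Example 2.
weights = [
    100, 101, 106, 106, 110, 110, 119, 120, 120, 123,
    124, 125, 127, 128, 130, 130, 133, 135, 139, 140,
    148, 150, 150, 152, 155, 157, 165, 165, 165, 170,
    170, 170, 172, 175, 175, 180, 180, 180, 180, 185,
    185, 185, 186, 187, 192, 194, 195, 203, 210, 212,
    215, 220, 260,
]

def median_of_ordered(ordered):
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (ordered[0], median_of_ordered(lower), median_of_ordered(ordered),
            median_of_ordered(upper), ordered[-1])

summary = five_number_summary(weights)
iqr = summary[3] - summary[1]   # IQR = Q3 - Q1, the spread of the middle 50%
print(summary)  # (100, 127.5, 165, 185.0, 260)
print(iqr)      # 57.5
```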
Simple Boxplot: 5-point summary shown graphically
[Figure: boxplot of Weight on an axis from 100 to 275, marking min, Q1, M, Q3, and max]
Boxplots are useful for comparing groups
Standard Deviation & Variance
• Most popular measures of spread
• Each data value has a deviation, defined as:
$$x_i - \bar{x}$$
Example: Deviations
Metabolic data (n = 7)
For the observation 1439: $x_i - \bar{x} = 1439 - 1600 = -161$
For the observation 1792: $x_i - \bar{x} = 1792 - 1600 = 192$
Variance
• Find the mean
• Find the deviation of each value
• Square the deviations
• Sum the squared deviations
• Divide by (n − 1)
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$
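The five steps for the variance translate almost line for line into Python. This is a minimal sketch; `sample_variance` and `rates` are illustrative names, and the data are the metabolic rates used throughout the chapter.

```python
def sample_variance(values):
    """Sample variance s^2, following the five steps listed above."""
    n = len(values)
    xbar = sum(values) / n                     # 1. find the mean
    deviations = [x - xbar for x in values]    # 2. find the deviation of each value
    squared = [d ** 2 for d in deviations]     # 3. square the deviations
    sum_of_squares = sum(squared)              # 4. sum the squared deviations
    return sum_of_squares / (n - 1)            # 5. divide by (n - 1)

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
s2 = sample_variance(rates)
print(round(s2, 2))  # 35811.67
```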
Data
Data: Metabolic rates, n = 7
1792 1666 1362 1614 1460 1867 1439
“Sum of Squares”
Obs (x_i)    Deviation (x_i − x̄)       Squared deviation (x_i − x̄)²
1792         1792 − 1600 = 192         (192)² = 36,864
1666         1666 − 1600 = 66          (66)² = 4,356
1362         1362 − 1600 = −238        (−238)² = 56,644
1614         1614 − 1600 = 14          (14)² = 196
1460         1460 − 1600 = −140        (−140)² = 19,600
1867         1867 − 1600 = 267         (267)² = 71,289
1439         1439 − 1600 = −161        (−161)² = 25,921
SUMS 11,200  0                         214,870
$$\bar{x} = \frac{11{,}200}{7} = 1600$$
$$\sum (x_i - \bar{x})^2 = 214{,}870 \quad \text{(the “Sum of Squares”)}$$
Variance
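$$s^2 = \frac{1}{n-1}\sum \left(x_i - \bar{x}\right)^2 = \frac{1}{7-1}\,(214{,}870) = 35{,}811.67$$
where $\sum (x_i - \bar{x})^2 = 214{,}870$ is the Sum of Squares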
Standard Deviation
$$s = \sqrt{s^2}$$
Square root of variance
$$s = \sqrt{35{,}811.67} = 189.24$$
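As a cross-check on the hand calculation, Python's standard `statistics` module provides `variance` and `stdev`, both of which divide by (n − 1). The sketch below assumes the same metabolic-rate data; the variable names are illustrative.

```python
import math
import statistics

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

s2 = statistics.variance(rates)   # sample variance (divides by n - 1)
s = math.sqrt(s2)                 # standard deviation = square root of variance

print(round(s2, 2), round(s, 2))                 # 35811.67 189.24
print(math.isclose(s, statistics.stdev(rates)))  # True
```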
Standard Deviation: Direct Formula
$$s = \sqrt{\frac{1}{n-1}\sum \left(x_i - \bar{x}\right)^2} = \sqrt{\frac{1}{7-1}\,(214{,}870)} \approx 189$$
Use calculator to check work!
TI-30XIIS sequence:
• On > CLEAR > 2nd > STAT > Scroll > Clear Data > Enter
• 2nd > STAT > 1-VAR or 2-VAR
• DATA > enter data
• STATVAR key
I’m supporting the TI-30XIIS only
Choosing Summary Statistics
• Use the mean and standard deviation to describe symmetrical distributions & distributions free of outliers
• Use the median and quartiles (IQR) to describe distributions that are skewed or have outliers
Example: Number of Books Read
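Ordered data (read down each column):
0 1 2 4 10 30
0 1 2 4 10 99
0 1 2 4 12
0 1 3 5 13
0 2 3 5 14
0 2 3 5 14
0 2 3 5 15
0 2 4 5 15
0 2 4 5 20
1 2 4 6 20
n = 52
L(M) = (52 + 1) / 2 = 26.5 (M lies between the 26th and 27th ordered values)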
Example: Books read, n = 52
5-point summary: 0, 1, 3, 5.5, 99
Highly asymmetric distribution
The mean (“xbar” = 7.06) and standard deviation (s = 14.43) give false impressions of location and spread for this distribution and are considered inappropriate. Use the median and 5-point summary instead.
[Figure: plot of the number of books on an axis from 0 to 100]
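The contrast between the two sets of summaries can be reproduced with the sketch below. The list `books` is transcribed from the 52 ordered values in the example (each row of the list is one column of the slide's table); the variable names are illustrative.

```python
import statistics

# The 52 ordered values from the books-read example.
books = [
    0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
    1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
    2, 2, 2, 3, 3, 3, 3, 4, 4, 4,
    4, 4, 4, 5, 5, 5, 5, 5, 5, 6,
    10, 10, 12, 13, 14, 14, 15, 15, 20, 20,
    30, 99,
]

xbar = statistics.mean(books)
s = statistics.stdev(books)      # sample standard deviation (divides by n - 1)
m = statistics.median(books)

# The mean is more than double the median: the high values (30, 99) pull it up.
print(len(books), round(xbar, 2), round(s, 2), m)  # 52 7.06 14.43 3.0
```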