Chapter 3
Describing Distributions Numerically
Describing the Distribution
• Center
  – Median
  – Mean
• Spread
  – Range
  – Interquartile Range
  – Standard Deviation
Median
• Literally, the middle number (data value)
• When n (the number of observations) is odd:
  – Order the data from smallest to largest
  – The median is the middle number on the list
  – It is in position (n+1)/2, counting from the smallest value
• Ex: If n = 11, the median is the (11+1)/2 = 6th number from the smallest value
• Ex: If n = 37, the median is the (37+1)/2 = 19th number from the smallest value
Example – August Temps
• High temperatures for Des Moines, Iowa, for the first 13 days of August 2005:
71 76 81 81 85 86 90 90 91 93 93 96 96
Remember to order the values if they aren't already in order!
• 13 observations
  – (13+1)/2 = 7th observation from the bottom
• Median = 90
Median
• When n is even:
  – Order the data from smallest to largest
  – The median is the average of the two middle numbers
  – Position (n+1)/2 falls halfway between these two numbers
• Ex: If n = 10, (10+1)/2 = 5.5, so the median is the average of the 5th and 6th numbers from the smallest value
Example – Yankees
• Scores of the last 10 games: 2 3 3 5 5 5 6 7 7 10
• Remember to order the values if they aren't already in order!
• 10 observations
  – (10 + 1)/2 = 5.5, so average the 5th and 6th observations from the bottom
• Median = (5 + 5)/2 = 5
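The (n+1)/2 position rule for both odd and even n can be sketched in a few lines of Python. This is an illustration only: the function name `median` is my own choice, and Python's built-in `statistics.median` gives the same answers for these data.

```python
def median(values):
    """Median via the (n+1)/2 position rule."""
    data = sorted(values)  # order the data from smallest to largest
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2  # 0-based index of the (n+1)/2-th value when n is odd
    if n % 2 == 1:
        return data[mid]
    # n even: (n+1)/2 falls halfway between positions n/2 and n/2 + 1
    return (data[mid - 1] + data[mid]) / 2


august = [71, 76, 81, 81, 85, 86, 90, 90, 91, 93, 93, 96, 96]
yankees = [2, 3, 3, 5, 5, 5, 6, 7, 7, 10]
print(median(august))   # 90
print(median(yankees))  # 5.0
```

Sorting inside the function means the data do not need to be ordered beforehand.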
Mean
• The ordinary average
  – Add up all the observations
  – Divide by the number of observations
• Formula for n observations
  – x1, x2, x3, …, xn are the values
Mean ($\bar{x}$)

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Example – Vikings (as of 1/9)
• Find the mean of the scores (17 values):
13 14 16 18 20 22 23 27 27 28 28 31 31 31 34 35 38

$$\bar{x} = \frac{13 + 14 + 16 + 18 + \cdots + 38}{17} = \frac{436}{17} \approx 25.65$$
Example – Colts (as of 1/9)
• Find the mean of the scores (17 values):
14 20 23 24 24 24 31 31 34 35 35 41 41 45 49 49 51

$$\bar{x} = \frac{14 + 20 + 23 + 24 + \cdots + 51}{17} = \frac{571}{17} \approx 33.59$$
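As a quick check on the hand calculations, the mean formula translates directly into Python. The helper name `mean` is illustrative; the standard library's `statistics.mean` would work equally well.

```python
def mean(values):
    """Ordinary average: add up all observations, divide by how many there are."""
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)


vikings = [13, 14, 16, 18, 20, 22, 23, 27, 27, 28, 28, 31, 31, 31, 34, 35, 38]
colts = [14, 20, 23, 24, 24, 24, 31, 31, 34, 35, 35, 41, 41, 45, 49, 49, 51]
print(f"Vikings mean: {mean(vikings):.2f}")  # 25.65
print(f"Colts mean:   {mean(colts):.2f}")    # 33.59
```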
Mean vs. Median
• Median = the middle number
• Mean = the value where the histogram balances
• Mean and median are similar when
  – The data are symmetric
• Mean and median differ when
  – The data are skewed
  – There are outliers
Mean vs. Median
• The mean is influenced by unusually high or unusually low values
  – Example: Income in a small town of 6 people
$25,000 $27,000 $29,000 $35,000 $37,000 $38,000
• The mean income is $191,000/6 ≈ $31,833
• The median income is ($29,000 + $35,000)/2 = $32,000
Mean vs. Median
• Bill Gates moves to town:
$25,000 $27,000 $29,000 $35,000 $37,000 $38,000 $40,000,000
• The mean income is $5,741,571
• The median income is $35,000
  – The mean is pulled toward the outlier; the median is not
  – The mean is not a good measure of the center of these data
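The effect of a single outlier is easy to reproduce with Python's standard `statistics` module (`mean` and `median` are real functions there). The sketch below simply recomputes both centers before and after the $40,000,000 income is added.

```python
from statistics import mean, median

town = [25_000, 27_000, 29_000, 35_000, 37_000, 38_000]
with_gates = town + [40_000_000]  # one very large outlier

for label, incomes in [("original town", town), ("with Bill Gates", with_gates)]:
    # the mean moves dramatically; the median shifts only one position
    print(f"{label}: mean = ${mean(incomes):,.0f}, median = ${median(incomes):,.0f}")
```

Only the mean changes by orders of magnitude, which is why the median is called a resistant measure of center.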
Mean vs. Median
• Skewness pulls the mean in the direction of the tail
  – Skewed to the right: mean > median
  – Skewed to the left: mean < median
• Outliers pull the mean in their direction
  – Large outlier: mean > median
  – Small outlier: mean < median
Weighted Mean
• Used when values are not equally represented.
• Weighted mean:

$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum w x}{\sum w}$$
Example (weighted mean)
A recent survey of a new diet cola reported the following percentages of people who liked the taste. Find the weighted mean of the percentages.

| Area | % Favored | Number surveyed |
|------|-----------|-----------------|
| 1    | 40        | 1000            |
| 2    | 30        | 3000            |
| 3    | 50        | 800             |
Example (cont.)
x1 = .40
x2 = .30
x3 = .50
w1 = 1000
w2 = 3000
w3 = 800
Use formula:
{.40(1000) + .30(3000) + .50(800)} / {1000+3000+800}
= 1700/4800
= 0.354 = 35.4%
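The same weighted-mean calculation as a small Python function. The name `weighted_mean` is hypothetical; the logic is just the formula Σwx / Σw.

```python
def weighted_mean(values, weights):
    """Sum of w*x divided by the sum of the weights w."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


favored = [0.40, 0.30, 0.50]   # x1, x2, x3 (proportions)
surveyed = [1000, 3000, 800]   # w1, w2, w3 (number surveyed)
print(f"{weighted_mean(favored, surveyed):.1%}")  # 35.4%
```

Because Area 2 surveyed the most people, its 30% pulls the weighted mean below the simple average of the three percentages, which is 40%.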
Spread
• Range (Max – Min) is a very basic measure of spread
  – It is highly affected by outliers
  – A single outlier can make the data look much more spread out than most of the values are
  – Ex. The annual numbers of deaths from tornadoes in the U.S. from 1990 to 2000: 53 39 39 33 69 30 25 67 130 94 40
• Range with outlier: 130 – 25 = 105
• Range without outlier: 94 – 25 = 69
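A minimal Python sketch of the range calculation with and without the 1998 value of 130. The function is named `data_range` (a hypothetical name) to avoid shadowing Python's built-in `range`.

```python
def data_range(values):
    """Range = maximum minus minimum."""
    return max(values) - min(values)


deaths = [53, 39, 39, 33, 69, 30, 25, 67, 130, 94, 40]
without_outlier = [d for d in deaths if d != 130]
print(data_range(deaths))           # 105
print(data_range(without_outlier))  # 69
```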
Spread
• Interquartile Range (IQR)
  – First Quartile (Q1): the 25th percentile
  – Third Quartile (Q3): the 75th percentile
• IQR = Q3 – Q1
  – Covers the center (middle) 50% of the values
Finding Quartiles
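• Order the data
• Split the data into two halves at the median
  – When n is odd, the median is a data value; include it in both halves
  – When n is even, the median falls between two data values; the lower half is the smallest n/2 values and the upper half is the largest n/2 values
• Q1 = median of the lower half
• Q3 = median of the upper half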
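The split-at-the-median rule above can be sketched in Python as follows; run on the three data sets in the worked examples that follow, it reproduces their Q1 and Q3 values. The helper names `median_of_sorted` and `quartiles` are hypothetical. Note that Python's `statistics.quantiles` uses an interpolation method rather than this halves rule, so it can return different quartiles for the same data.

```python
def median_of_sorted(data):
    """Median of an already-sorted, non-empty list."""
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


def quartiles(values):
    """Return (Q1, Q3) using the split-at-the-median rule."""
    data = sorted(values)
    half = len(data) // 2
    if len(data) % 2 == 1:
        lower = data[: half + 1]  # odd n: median included in the lower half...
        upper = data[half:]       # ...and in the upper half
    else:
        lower = data[:half]       # even n: smallest n/2 values
        upper = data[half:]       # even n: largest n/2 values
    return median_of_sorted(lower), median_of_sorted(upper)


cities = [810, 385, 286, 201, 147, 142, 126, 124, 121, 90, 90, 78, 78, 74]
aug_1_13 = [71, 76, 81, 81, 85, 86, 90, 90, 91, 93, 93, 96, 96]
aug_14_25 = [76, 77, 77, 79, 81, 83, 84, 85, 86, 88, 91, 93]
for name, values in [("cities", cities), ("Aug 1-13", aug_1_13), ("Aug 14-25", aug_14_25)]:
    q1, q3 = quartiles(values)
    print(f"{name}: Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")
```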
Top 15 Populations, US Cities, 2004

| City | Population* |
|------|-------------|
| New York, N.Y. | 810 |
| Los Angeles, Calif. | 385 |
| Chicago, Ill. | 286 |
| Houston, Tex. | 201 |
| Philadelphia, Pa. | 147 |
| Phoenix, Ariz. | 142 |
| San Diego, Calif. | 126 |
| San Antonio, Tex. | 124 |
| Dallas, Tex. | 121 |
| San Jose, Calif. | 90 |
| Detroit, Mich. | 90 |
| Indianapolis, Ind. | 78 |
| Jacksonville, Fla. | 78 |
| San Francisco, Calif. | 74 |
* Populations were all divided by 10,000.
Example – Top City Populations
• Order the values (14 values): 74 78 78 90 90 121 124 126 142 147 201 286 385 810
• Lower half = 74 78 78 90 90 121 124
  – Q1 = median of lower half = 90
• Upper half = 126 142 147 201 286 385 810
  – Q3 = median of upper half = 201
• IQR = Q3 – Q1 = 201 – 90 = 111
August High Temps (8/1–8/13)
• Order the values (13 values):
71 76 81 81 85 86 90 90 91 93 93 96 96
• n is odd, so the median (90, the 7th value) goes in both halves
• Lower half = 71 76 81 81 85 86 90
  – Q1 = median of lower half = 81
• Upper half = 90 90 91 93 93 96 96
  – Q3 = median of upper half = 93
• IQR = Q3 – Q1 = 93 – 81 = 12
August High Temps (8/14–8/25)
• Order the values (12 values): 76 77 77 79 81 83 84 85 86 88 91 93
• Lower half = 76 77 77 79 81 83
  – Q1 = median of lower half = (77 + 79)/2 = 78
• Upper half = 84 85 86 88 91 93
  – Q3 = median of upper half = (86 + 88)/2 = 87
• IQR = Q3 – Q1 = 87 – 78 = 9
Five Number Summary
• Minimum
• Q1
• Median
• Q3
• Maximum
Examples
• Vikings (as of 1/9)
  – Min = 13
  – Q1 = 20
  – Median = 27
  – Q3 = 31
  – Max = 38
• Colts (as of 1/9)
  – Min = 14
  – Q1 = 24
  – Median = 34
  – Q3 = 41
  – Max = 51
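A five-number summary can be computed with the same quartile rule used on the earlier slides. This is a self-contained sketch; `five_number_summary` and `median_of_sorted` are hypothetical helper names, not a library API.

```python
def median_of_sorted(data):
    """Median of an already-sorted, non-empty list."""
    mid = len(data) // 2
    if len(data) % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


def five_number_summary(values):
    """(Min, Q1, Median, Q3, Max) using the split-at-the-median quartile rule."""
    data = sorted(values)
    if not data:
        raise ValueError("five-number summary of empty data")
    half = len(data) // 2
    # odd n: the median belongs to both halves; even n: split evenly
    lower = data[: half + 1] if len(data) % 2 == 1 else data[:half]
    upper = data[half:]
    return (data[0], median_of_sorted(lower), median_of_sorted(data),
            median_of_sorted(upper), data[-1])


vikings = [13, 14, 16, 18, 20, 22, 23, 27, 27, 28, 28, 31, 31, 31, 34, 35, 38]
colts = [14, 20, 23, 24, 24, 24, 31, 31, 34, 35, 35, 41, 41, 45, 49, 49, 51]
print("Vikings:", five_number_summary(vikings))  # (13, 20, 27, 31, 38)
print("Colts:  ", five_number_summary(colts))    # (14, 24, 34, 41, 51)
```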
Graph of Five Number Summary
• Boxplot
  – Box between Q1 and Q3
  – Line in the box marks the median
  – Lines (whiskers) extend out to the minimum and maximum
• Best used for comparisons
• We use this simpler method, with the whiskers running all the way to the minimum and maximum
Example – Vikings & Colts
• Boxplot of Vikings scores
  – Box from 20 to 31
  – Line in box at 27
  – Lines extend out from the box to 13 and 38
• Boxplot of Colts scores
  – Box from 24 to 41
  – Line in box at 34
  – Lines extend out from the box to 14 and 51
Side by Side Boxplots of Vikings Scores and Colts Scores
Spread
• Standard deviation
  – The "average" spread of the observations from the mean
  – The most common measure of spread
  – Denoted by the letter s
  – Make a table when calculating by hand
Standard Deviation
$$s = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Example – Deaths from Tornadoes
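$$\bar{x} = \frac{53 + 39 + \cdots + 40}{11} = \frac{619}{11} \approx 56.27$$

| x | x - x̄ | (x - x̄)² |
|---|-------|----------|
| 53 | 53 - 56.27 = -3.27 | 10.69 |
| 39 | 39 - 56.27 = -17.27 | 298.25 |
| 39 | 39 - 56.27 = -17.27 | 298.25 |
| 33 | 33 - 56.27 = -23.27 | 541.49 |
| 69 | 69 - 56.27 = 12.73 | 162.05 |
| 30 | 30 - 56.27 = -26.27 | 690.11 |
| 25 | 25 - 56.27 = -31.27 | 977.81 |
| 67 | 67 - 56.27 = 10.73 | 115.13 |
| 130 | 130 - 56.27 = 73.73 | 5436.11 |
| 94 | 94 - 56.27 = 37.73 | 1423.55 |
| 40 | 40 - 56.27 = -16.27 | 264.71 |

$$s = \sqrt{\frac{10.69 + 298.25 + 298.25 + \cdots + 264.71}{11 - 1}} = \sqrt{\frac{10218.15}{10}} \approx 31.97$$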
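The hand table can be generated in Python, followed by the sample standard deviation (dividing by n - 1). This is a sketch with a hypothetical helper name, `sample_sd`; the standard library's `statistics.stdev` computes the same quantity. Because the code keeps the mean at full precision instead of rounding it to 56.27 first, the printed deviations and squares can differ slightly from the hand table, but s still rounds to 31.97.

```python
import math


def sample_sd(values):
    """Sample standard deviation s, dividing by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    # same columns as the hand table: x, x - x_bar, (x - x_bar)^2
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))


deaths = [53, 39, 39, 33, 69, 30, 25, 67, 130, 94, 40]
x_bar = sum(deaths) / len(deaths)
for x in deaths:
    print(f"{x:>4} {x - x_bar:>9.2f} {(x - x_bar) ** 2:>9.2f}")

without_outlier = [d for d in deaths if d != 130]
print(f"s with outlier:    {sample_sd(deaths):.2f}")           # 31.97
print(f"s without outlier: {sample_sd(without_outlier):.2f}")  # 21.70
```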
Example - Vikings
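• Find the standard deviation of the scores of the Vikings games (the 17 scores listed earlier, with x̄ ≈ 25.65) given the following statistic:

$$\sum (x - \bar{x})^2 \approx 909.88$$

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{909.88}{17 - 1}} \approx 7.54$$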
Properties of s
• s = 0 only when all observations are equal; otherwise, s > 0
• s has the same units as the data
• s is not resistant
  – Skewness and outliers affect s, just as they affect the mean
  – Tornado example:
    • s with outlier: 31.97
    • s without outlier: 21.70
Which summaries should you use with different distributions?
• The appropriate measures of center and spread when your distribution is symmetric are:
  – Mean
  – Standard deviation
• The appropriate measures of center and spread when your distribution is skewed are:
  – Median
  – IQR
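Comparing Variation
• When comparing the spread of two sets of numbers, especially sets with different means or units, find the coefficient of variation of each:

$$C_{\text{var}} = \frac{s}{\bar{x}} \times 100\%$$

  – Then compare the percentages; the larger percentage indicates more spread relative to the mean.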
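A minimal sketch of the coefficient of variation, here applied to the Vikings and Colts scores from earlier purely as an illustration. It uses the real standard-library functions `statistics.mean` and `statistics.stdev` (the latter is the sample standard deviation, dividing by n - 1); the wrapper name `coefficient_of_variation` is hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    x_bar = mean(values)
    if x_bar == 0:
        raise ValueError("coefficient of variation is undefined when the mean is 0")
    return stdev(values) / x_bar * 100


vikings = [13, 14, 16, 18, 20, 22, 23, 27, 27, 28, 28, 31, 31, 31, 34, 35, 38]
colts = [14, 20, 23, 24, 24, 24, 31, 31, 34, 35, 35, 41, 41, 45, 49, 49, 51]
print(f"Vikings: {coefficient_of_variation(vikings):.1f}%")
print(f"Colts:   {coefficient_of_variation(colts):.1f}%")
```

Dividing by the mean makes the result unit-free, which is what allows data sets on different scales to be compared.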
Standardizing (first look)
• I got an 85 on my English test and you got a 35 on your Spanish test. Who did better?
• How can we compare things that come from different scales?
• Standardizing
  – Use the z formula (the result is called a z-score)

$$z = \frac{x - \bar{x}}{s}$$
Standardizing
$$z = \frac{x - \bar{x}}{s}$$

• z = standardized score
• x = raw score
• x̄ (x-bar) = mean of the raw scores
• s = sample standard deviation
• So what does this mean for our test scores?
Standardizing
• I got an 85 on my English test and you got a 35 on your Spanish test. Who did better?
• Now I need to give you more information.
• The English class’s tests had a mean of 83 and a standard deviation of 3.
• The Spanish tests had a mean of 30 and a standard deviation of 2.
Standardizing
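$$z_{\text{English}} = \frac{85 - 83}{3} = \frac{2}{3} \approx 0.667$$

$$z_{\text{Spanish}} = \frac{35 - 30}{2} = \frac{5}{2} = 2.5$$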
Comparing Standardized Scores
• I scored 0.667 standard deviations above the mean on my English test, whereas you scored 2.5 standard deviations above the mean on your Spanish test.
• Relative to your class, you did better on your exam.
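The two z-scores can be checked with a one-line formula in Python. The function name `z_score` is hypothetical; the class means and standard deviations are the ones given on the earlier slide.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


english = z_score(85, mean=83, sd=3)
spanish = z_score(35, mean=30, sd=2)
print(f"English z = {english:.3f}")  # 0.667
print(f"Spanish z = {spanish:.3f}")  # 2.500
if spanish > english:
    print("Relative to their classes, the Spanish score is better.")
```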