mathematical statistics anna janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf ·...
TRANSCRIPT
Mathematical Statistics
Anna Janicka
Lecture II, 25.02.2019
DESCRIPTIVE STATISTICS, PART II
Plan for today
1. Descriptive Statistics, part II:
� mode
� quantiles
� measures of variability
� measures of asymmetry
� the boxplot
Measures of central tendency – reminder
� Classic:
� arithmetic mean
� Position (order, rank):
� median
� mode
� quartile
Example 1 – cont.
Mode – examples
Quartiles – example
Variance – example
Grade Number Frequency
2 59 31.89%
3 63 34.05%
3.5 33 17.84%
4 14 7.57%
4.5 10 5.41%
5 6 3.24%
Total 185 100%
Example 3 – cont.
IntervalClass
markNumber Frequency
Cumulative
numbercni
Cumulative
frequencycfi
(30,40] 35 11 0,11 11 0,11
(40,50] 45 23 0,23 34 0,34
(50,60] 55 33 0,33 67 0,67
(60,70] 65 12 0,12 79 0,79
(70,80] 75 6 0,06 85 0,85
(80,90] 85 8 0,08 93 0,93
(90,100] 95 3 0,03 96 0,96
(100,110] 105 2 0,02 98 0,98
(110,120] 115 2 0,02 100 1
Total 100 1Mode – example
Quartiles – example
Variance – example
Mode
Mode
the value that appears most often
� for grouped data:
Mo = most frequent value
� for grouped class interval data:
where
nMo – number of elements in mode’s class,
cL, b – analogous to the median
bnnnn
nncMo
MoMoMoMo
MoMoL ⋅
−+−−
+≅+−
−
)()( 11
1
Mode – examples
Example 1:
Mo = 3
Example 3:
the mode’s interval is (50,60], with 33 elements
nMo = 33, cL = 50, b = 10, nMo-1 = 23, nMo+1 = 12
23.5310)1233()2333(
233350 ≈⋅
−+−−
+≅Mo
Example 1 –
cont.
Example 3 –
cont.
Which measure should we choose?
� Arithmetic mean: for typical data series
(single max, monotonous frequencies)
� Mode: for typical data series, grouped data
(the lengths of the mode’s class and
neighboring classes should be equal)
� Median: no restrictions. The most robust (in
case of outlier observations, fluctuations
etc.)
Quantiles, quartiles
� p-th quantile (quantile of rank p): number
such that the fraction of observations less
than or equal to it is at least p, and values
greater than or equal to it at least 1-p
� Q1 : first quartile = quantile of rank ¼
� Second quartile = median
= quantile of rank ½
� Q3: Third quartile = quantile of rank ¾
Quantiles – cont.
Empirical quantile of rank p:
∉
∈+
=
+
+
ZnpX
ZnpXX
Q
nnp
nnpnnp
p
:1][
:1:
2
Quartiles – cont.
� Quantiles for p = ¼ and p = ¾.
� For grouped class interval data –
analogous to the median
for k=1 or 3
where M1, M3 – number of the quartile’s class
b – length of quartile class interval
cL – lower end of the quartile class interval
−
⋅+≅ ∑
−
=
1
14
k
k
M
i
i
M
Lk nnk
n
bcQ
Quartiles – examples
Example 1:
so
Example 3:
so
75100 25100 43
41 =⋅=⋅
4M ,2 31 ==M
67,66)6775(12
106009,46)1125(
23
1040 31 ≈−+≅≈−+≅ QQ
Example1 –
cont.
Example 3 –
cont.
75.13818525.46185 41 =⋅=⋅ 4
3
5.3,2 185:1393185:471 ==== XX Q Q
Variability measures
� Classical measures
� variance, standard deviation
� average (absolute) deviation
� coefficient of variation
� Measures based on order statistics
� range
� interquartile range
� quartile deviation
� coefficients of variation (based on order stats)
� median absolute deviation
Measures based on order statistics
� Range
the most simple measure, does not take into
account anything but the extreme values
� Inter Quartile Range (midspread, middle fifty)
more robust than the range
nnn XXr :1: −=
13 QQIQR −=
may be further used to calculate quartile deviation Q= IQR/2, and
coefficients of variation VQ = Q/Med or VQ1Q3 = IQR/(Q3+Q1) (quartile
variation coefficient) or the typical range: [Med – Q, Med + Q]
length of the interval that covers the
middle 50% observations
Range, interquartile range – examples
Example 1:
Example 3:
(in reality
5.125.3
,325
=−=
=−=
IQR
r
58,2009,4667,66
)45,8645329118
9030120
=−≅
=
=−≅
IQR
,-,
r
Classical measures of dispersion
Variance
� raw data
� grouped data
� grouped class interval data
+ Sheppard’s correction
in general
2
1
21
1
212 )()(ˆ ∑∑==
−=−=n
i
in
n
i
inXXXXS
2
1
21
1
212 )()(ˆ ∑∑==
−=−=k
i
iin
k
i
iinXXnXXnS
2
1
21
1
212 )()(ˆ ∑∑==
−=−≅k
i
iin
k
i
iinXcnXcnS
12
22 2ˆ cSS −≅
c=length of
class
interval (for
equal
intervals)
2
1
112122 )(ˆ ∑
=−−−≅
k
i
iiinccnSS
Variance – examples
Example 1:
Example 3:
in reality
( )6)99.25(10)99.25.4(14)99.24(33)99.25.3(63)99.23(59)99.22( 222222
1851 ⋅−+⋅−+⋅−+⋅−+⋅−+⋅−
69.0
ˆ 2
≈
≈S
98.32212
1031.331
31.331
ˆ
22
10012
≈−=
=
⋅≈
S
S
)2)7.58115(2)7.58105(3)7.5895(8)7.5885(6)7.5875(
12)7.5865(33)7.5855(23)7.5845(11)7.5835((
22222
2222
⋅−+⋅−+⋅−+⋅−+⋅−+
⋅−+⋅−+⋅−+⋅−
85.333ˆ 2 =S
Example 1 – cont.
Example 3 – cont.
distrubution not normal or
sample too small for
Sheppard’s correction –
larger errors from small
sample size than from class
grouping.
Standard deviation
In the same units as the initial variable
Example 1:
Example 3:
22 ,ˆˆ SSSS ==
[grade] 83.0ˆ ≈S
][m 22.18ˆ ≈S
Average (absolute) deviation, mean deviation
Nowadays seldom used. Simple
calculations.
for raw data
etc...
We have: d<S
∑=
−=n
i
inXXd
1
1 ||
Coefficient of variation (classical)
For comparisons of the same varaible
accross populations or different
variables for the same population
%)100( or
%),100(ˆ
⋅=
⋅=
X
dV
X
SV
d
S
Skewness (asymmetry)
left symmetry right
(negative) (zero) (positive)
(typical order)
MoMedX << MoMedX == MoMedX >>
Measures of asymmetry
� Skewness
where M3 is the third
central moment
� Skewness coefficient
� Quartile skewness coefficient
3
3
S
MA =
ˆ
or ˆ 11
S
MedXA
S
MoXA
−=
−=
13
132
2
QMedQA
−+−
=measures skewness for the
middle 50% observations only
Interpretation
� positive values= positive asymmetry (right
skewed distribution)
� negative values = negative asymmetry
(left skewed distribution)
� For the skewness coefficient (with the
median) and the quartile skewness
coefficient the strength of asymmetry
(absolute value):
� 0 – 0.33: weak
� 0.34 – 0.66: medium
� 0.67 – 1: strong
Asymmetry – examples
Example 1:
Example 3:
15.009.4667.66
09.4685.54267.66
24.02.18
85.547.583.0
2.18
23.537.58
,15.1
2
11
≈−
+⋅−≅
≈−
=≈−
≅
≅
A
)( A or )( A
A
MedMo
3
1
25.3
2325.3
01.083.0
399.2
01.083.0
399.2
41.0
2
1
1
−=−+⋅−
=
−≈−
=
−≈−
=
≈
A
MoA
MedA
A
)(
)(
Boxplot
(Box and whisker plot)
Allows to compare two (or more)
populations
outliers:
xmax
outliers
X*
Q3
Med
Q1
X*
outliers
xmin
]},[:max{
]},[:min{
23
33
123
1
IQRQQXXX
QIQRQXXX
ii
ii
+∈=
−∈=∗
∗
∗∗ >< XxXx or
Boxpolot – example of comparison
05
10
15
1 2
Examples (1)
Source: CSO Poland, 2009
Examples (2)
Source: European
Commission
Dispersion (coeffcient of variability) of unemployment
rates
Examples (3)
Growth charts
Source: WHO
Examples (4)
Gross hourly earinings
Source: European Commission
Examples(5)
Salary by occupational group and gender
Source: New Zealand State Services