mathematical statistics anna janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf ·...

32
Mathematical Statistics Anna Janicka Lecture II, 25.02.2019 DESCRIPTIVE STATISTICS, PART II

Upload: others

Post on 21-May-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Mathematical Statistics

Anna Janicka

Lecture II, 25.02.2019

DESCRIPTIVE STATISTICS, PART II

Page 2: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Plan for today

1. Descriptive Statistics, part II:

� mode

� quantiles

� measures of variability

� measures of asymmetry

� the boxplot

Page 3: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Measures of central tendency – reminder

� Classic:

� arithmetic mean

� Position (order, rank):

� median

� mode

� quartile

Page 4: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Example 1 – cont.

Mode – examples

Quartiles – example

Variance – example

Grade Number Frequency

2 59 31.89%

3 63 34.05%

3.5 33 17.84%

4 14 7.57%

4.5 10 5.41%

5 6 3.24%

Total 185 100%

Page 5: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Example 3 – cont.

IntervalClass

markNumber Frequency

Cumulative

numbercni

Cumulative

frequencycfi

(30,40] 35 11 0,11 11 0,11

(40,50] 45 23 0,23 34 0,34

(50,60] 55 33 0,33 67 0,67

(60,70] 65 12 0,12 79 0,79

(70,80] 75 6 0,06 85 0,85

(80,90] 85 8 0,08 93 0,93

(90,100] 95 3 0,03 96 0,96

(100,110] 105 2 0,02 98 0,98

(110,120] 115 2 0,02 100 1

Total 100 1Mode – example

Quartiles – example

Variance – example

Page 6: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Mode

Mode

the value that appears most often

� for grouped data:

Mo = most frequent value

� for grouped class interval data:

where

nMo – number of elements in mode’s class,

cL, b – analogous to the median

bnnnn

nncMo

MoMoMoMo

MoMoL ⋅

−+−−

+≅+−

)()( 11

1

Page 7: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Mode – examples

Example 1:

Mo = 3

Example 3:

the mode’s interval is (50,60], with 33 elements

nMo = 33, cL = 50, b = 10, nMo-1 = 23, nMo+1 = 12

23.5310)1233()2333(

233350 ≈⋅

−+−−

+≅Mo

Example 1 –

cont.

Example 3 –

cont.

Page 8: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Which measure should we choose?

� Arithmetic mean: for typical data series

(single max, monotonous frequencies)

� Mode: for typical data series, grouped data

(the lengths of the mode’s class and

neighboring classes should be equal)

� Median: no restrictions. The most robust (in

case of outlier observations, fluctuations

etc.)

Page 9: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Quantiles, quartiles

� p-th quantile (quantile of rank p): number

such that the fraction of observations less

than or equal to it is at least p, and values

greater than or equal to it at least 1-p

� Q1 : first quartile = quantile of rank ¼

� Second quartile = median

= quantile of rank ½

� Q3: Third quartile = quantile of rank ¾

Page 10: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Quantiles – cont.

Empirical quantile of rank p:

∈+

=

+

+

ZnpX

ZnpXX

Q

nnp

nnpnnp

p

:1][

:1:

2

Page 11: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Quartiles – cont.

� Quantiles for p = ¼ and p = ¾.

� For grouped class interval data –

analogous to the median

for k=1 or 3

where M1, M3 – number of the quartile’s class

b – length of quartile class interval

cL – lower end of the quartile class interval

⋅+≅ ∑

=

1

14

k

k

M

i

i

M

Lk nnk

n

bcQ

Page 12: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Quartiles – examples

Example 1:

so

Example 3:

so

75100 25100 43

41 =⋅=⋅

4M ,2 31 ==M

67,66)6775(12

106009,46)1125(

23

1040 31 ≈−+≅≈−+≅ QQ

Example1 –

cont.

Example 3 –

cont.

75.13818525.46185 41 =⋅=⋅ 4

3

5.3,2 185:1393185:471 ==== XX Q Q

Page 13: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Variability measures

� Classical measures

� variance, standard deviation

� average (absolute) deviation

� coefficient of variation

� Measures based on order statistics

� range

� interquartile range

� quartile deviation

� coefficients of variation (based on order stats)

� median absolute deviation

Page 14: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Measures based on order statistics

� Range

the most simple measure, does not take into

account anything but the extreme values

� Inter Quartile Range (midspread, middle fifty)

more robust than the range

nnn XXr :1: −=

13 QQIQR −=

may be further used to calculate quartile deviation Q= IQR/2, and

coefficients of variation VQ = Q/Med or VQ1Q3 = IQR/(Q3+Q1) (quartile

variation coefficient) or the typical range: [Med – Q, Med + Q]

length of the interval that covers the

middle 50% observations

Page 15: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Range, interquartile range – examples

Example 1:

Example 3:

(in reality

5.125.3

,325

=−=

=−=

IQR

r

58,2009,4667,66

)45,8645329118

9030120

=−≅

=

=−≅

IQR

,-,

r

Page 16: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Classical measures of dispersion

Variance

� raw data

� grouped data

� grouped class interval data

+ Sheppard’s correction

in general

2

1

21

1

212 )()(ˆ ∑∑==

−=−=n

i

in

n

i

inXXXXS

2

1

21

1

212 )()(ˆ ∑∑==

−=−=k

i

iin

k

i

iinXXnXXnS

2

1

21

1

212 )()(ˆ ∑∑==

−=−≅k

i

iin

k

i

iinXcnXcnS

12

22 2ˆ cSS −≅

c=length of

class

interval (for

equal

intervals)

2

1

112122 )(ˆ ∑

=−−−≅

k

i

iiinccnSS

Page 17: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Variance – examples

Example 1:

Example 3:

in reality

( )6)99.25(10)99.25.4(14)99.24(33)99.25.3(63)99.23(59)99.22( 222222

1851 ⋅−+⋅−+⋅−+⋅−+⋅−+⋅−

69.0

ˆ 2

≈S

98.32212

1031.331

31.331

ˆ

22

10012

≈−=

=

⋅≈

S

S

)2)7.58115(2)7.58105(3)7.5895(8)7.5885(6)7.5875(

12)7.5865(33)7.5855(23)7.5845(11)7.5835((

22222

2222

⋅−+⋅−+⋅−+⋅−+⋅−+

⋅−+⋅−+⋅−+⋅−

85.333ˆ 2 =S

Example 1 – cont.

Example 3 – cont.

distrubution not normal or

sample too small for

Sheppard’s correction –

larger errors from small

sample size than from class

grouping.

Page 18: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Standard deviation

In the same units as the initial variable

Example 1:

Example 3:

22 ,ˆˆ SSSS ==

[grade] 83.0ˆ ≈S

][m 22.18ˆ ≈S

Page 19: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Average (absolute) deviation, mean deviation

Nowadays seldom used. Simple

calculations.

for raw data

etc...

We have: d<S

∑=

−=n

i

inXXd

1

1 ||

Page 20: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Coefficient of variation (classical)

For comparisons of the same varaible

accross populations or different

variables for the same population

%)100( or

%),100(ˆ

⋅=

⋅=

X

dV

X

SV

d

S

Page 21: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Skewness (asymmetry)

left symmetry right

(negative) (zero) (positive)

(typical order)

MoMedX << MoMedX == MoMedX >>

Page 22: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Measures of asymmetry

� Skewness

where M3 is the third

central moment

� Skewness coefficient

� Quartile skewness coefficient

3

3

S

MA =

ˆ

or ˆ 11

S

MedXA

S

MoXA

−=

−=

13

132

2

QQ

QMedQA

−+−

=measures skewness for the

middle 50% observations only

Page 23: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Interpretation

� positive values= positive asymmetry (right

skewed distribution)

� negative values = negative asymmetry

(left skewed distribution)

� For the skewness coefficient (with the

median) and the quartile skewness

coefficient the strength of asymmetry

(absolute value):

� 0 – 0.33: weak

� 0.34 – 0.66: medium

� 0.67 – 1: strong

Page 24: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Asymmetry – examples

Example 1:

Example 3:

15.009.4667.66

09.4685.54267.66

24.02.18

85.547.583.0

2.18

23.537.58

,15.1

2

11

≈−

+⋅−≅

≈−

=≈−

A

)( A or )( A

A

MedMo

3

1

25.3

2325.3

01.083.0

399.2

01.083.0

399.2

41.0

2

1

1

−=−+⋅−

=

−≈−

=

−≈−

=

A

MoA

MedA

A

)(

)(

Page 25: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Boxplot

(Box and whisker plot)

Allows to compare two (or more)

populations

outliers:

xmax

outliers

X*

Q3

Med

Q1

X*

outliers

xmin

]},[:max{

]},[:min{

23

33

123

1

IQRQQXXX

QIQRQXXX

ii

ii

+∈=

−∈=∗

∗∗ >< XxXx or

Page 26: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Boxpolot – example of comparison

05

10

15

1 2

Page 27: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Examples (1)

Source: CSO Poland, 2009

Page 28: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Examples (2)

Source: European

Commission

Dispersion (coeffcient of variability) of unemployment

rates

Page 29: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Examples (3)

Growth charts

Source: WHO

Page 30: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Examples (4)

Gross hourly earinings

Source: European Commission

Page 31: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of

Examples(5)

Salary by occupational group and gender

Source: New Zealand State Services

Page 32: Mathematical Statistics Anna Janickacoin.wne.uw.edu.pl/azylicz/ms/ms02_2019_presentation.pdf · Descriptive Statistics, part II: mode quantiles measures of variability measures of