# mathematical statistics anna descriptive statistics, part ii: mode quantiles measures of variability

Post on 21-May-2020

0 views

Embed Size (px)

TRANSCRIPT

Mathematical Statistics

Anna Janicka

Lecture II, 25.02.2019

DESCRIPTIVE STATISTICS, PART II

Plan for today

1. Descriptive Statistics, part II:

� mode

� quantiles

� measures of variability

� measures of asymmetry

� the boxplot

Measures of central tendency – reminder

� Classic:

� arithmetic mean

� Position (order, rank):

� median

� mode

� quartile

Example 1 – cont.

Mode – examples

Quartiles – example

Variance – example

Grade Number Frequency

2 59 31.89%

3 63 34.05%

3.5 33 17.84%

4 14 7.57%

4.5 10 5.41%

5 6 3.24%

Total 185 100%

Example 3 – cont.

Interval Class

mark Number Frequency

Cumulative

number cni

Cumulative

frequency cfi

(30,40] 35 11 0,11 11 0,11

(40,50] 45 23 0,23 34 0,34

(50,60] 55 33 0,33 67 0,67

(60,70] 65 12 0,12 79 0,79

(70,80] 75 6 0,06 85 0,85

(80,90] 85 8 0,08 93 0,93

(90,100] 95 3 0,03 96 0,96

(100,110] 105 2 0,02 98 0,98

(110,120] 115 2 0,02 100 1

Total 100 1 Mode – example

Quartiles – example

Variance – example

Mode

Mode

the value that appears most often

� for grouped data:

Mo = most frequent value

� for grouped class interval data:

where

nMo – number of elements in mode’s class,

cL, b – analogous to the median

b nnnn

nn cMo

MoMoMoMo

MoMo L ⋅−+−

− +≅

+−

−

)()( 11

1

Mode – examples

Example 1:

Mo = 3

Example 3:

the mode’s interval is (50,60], with 33 elements

nMo = 33, cL = 50, b = 10, nMo-1 = 23, nMo+1 = 12

23.5310 )1233()2333(

2333 50 ≈⋅

−+− −

+≅Mo

Example 1 –

cont.

Example 3 –

cont.

Which measure should we choose?

� Arithmetic mean: for typical data series

(single max, monotonous frequencies)

� Mode: for typical data series, grouped data

(the lengths of the mode’s class and

neighboring classes should be equal)

� Median: no restrictions. The most robust (in

case of outlier observations, fluctuations

etc.)

Quantiles, quartiles

� p-th quantile (quantile of rank p): number

such that the fraction of observations less

than or equal to it is at least p, and values

greater than or equal to it at least 1-p

� Q1 : first quartile = quantile of rank ¼

� Second quartile = median

= quantile of rank ½

� Q3: Third quartile = quantile of rank ¾

Quantiles – cont.

Empirical quantile of rank p:

∉

∈ +

=

+

+

ZnpX

Znp XX

Q

nnp

nnpnnp

p

:1][

:1:

2

Quartiles – cont.

� Quantiles for p = ¼ and p = ¾.

� For grouped class interval data –

analogous to the median

for k=1 or 3

where M1, M3 – number of the quartile’s class

b – length of quartile class interval

cL – lower end of the quartile class interval

−

⋅ +≅ ∑

−

=

1

14

k

k

M

i

i

M

Lk n nk

n

b cQ

Quartiles – examples

Example 1:

so

Example 3:

so

75100 25100 4341 =⋅=⋅

4M ,2 31 ==M

67,66)6775( 12

10 6009,46)1125(

23

10 40 31 ≈−+≅≈−+≅ QQ

Example1 –

cont.

Example 3 –

cont.

75.13818525.46185 41 =⋅=⋅ 43

5.3,2 185:1393185:471 ==== XX Q Q

Variability measures

� Classical measures

� variance, standard deviation

� average (absolute) deviation

� coefficient of variation

� Measures based on order statistics

� range

� interquartile range

� quartile deviation

� coefficients of variation (based on order stats)

� median absolute deviation

Measures based on order statistics

� Range

the most simple measure, does not take into

account anything but the extreme values

� Inter Quartile Range (midspread, middle fifty)

more robust than the range

nnn XXr :1: −=

13 QQIQR −=

may be further used to calculate quartile deviation Q= IQR/2, and

coefficients of variation VQ = Q/Med or VQ1Q3 = IQR/(Q3+Q1) (quartile

variation coefficient) or the typical range: [Med – Q, Med + Q]

length of the interval that covers the

middle 50% observations

Range, interquartile range – examples

Example 1:

Example 3:

(in reality

5.125.3

,325

=−=

=−=

IQR

r

58,2009,4667,66

)45,8645329118

9030120

=−≅

=

=−≅

IQR

,-,

r

Classical measures of dispersion

Variance

� raw data

� grouped data

� grouped class interval data

+ Sheppard’s correction

in general

2

1

21

1

212 )()(ˆ ∑∑ ==

−=−= n

i

in

n

i

in XXXXS

2

1

21

1

212 )()(ˆ ∑∑ ==

−=−= k

i

iin

k

i

iin XXnXXnS

2

1

21

1

212 )()(ˆ ∑∑ ==

−=−≅ k

i

iin

k

i

iin XcnXcnS

12

22 2ˆ cSS −≅

c=length of

class

interval (for

equal

intervals)

2

1

112 122 )(ˆ ∑

= −−−≅

k

i

iiin ccnSS

Variance – examples

Example 1:

Example 3:

in reality

( )6)99.25(10)99.25.4(14)99.24(33)99.25.3(63)99.23(59)99.22( 222222 185 1 ⋅−+⋅−+⋅−+⋅−+⋅−+⋅−

69.0

ˆ 2

≈

≈S

98.322 12

1031.331

31.331

ˆ

2 2

100 12

≈−=

=

⋅≈

S

S

)2)7.58115(2)7.58105(3)7.5895(8)7.5885(6)7.5875(

12)7.5865(33)7.5855(23)7.5845(11)7.5835((

22222

2222

⋅−+⋅−+⋅−+⋅−+⋅−+

⋅−+⋅−+⋅−+⋅−

85.333ˆ 2 =S

Example 1 – cont.

Example 3 – cont.

distrubution not normal or

sample too small for

Sheppard’s correction –

larger errors from small

sample size than from class

grouping.

Standard deviation

In the same units as the initial variable

Example 1:

Example 3:

22 ,ˆˆ SSSS ==

[grade] 83.0ˆ ≈S

][m 22.18ˆ ≈S

Average (absolute) deviation, mean deviation

Nowadays seldom used. Simple

calculations.

for raw data

etc...

We have: d

Coefficient of variation (classical)

For comparisons of the same varaible

accross populations or different

variables for the same population

%)100( or

%),100( ˆ

⋅=

⋅=

X

d V

X

S V

d

S

Skewness (asymmetry)

left symmetry right

(negative) (zero) (positive)

(typical order)

MoMedX >

Measures of asymmetry

� Skewness

where M3 is the third

central moment

� Skewness coefficient

� Quartile skewness coefficient

3

3

Ŝ

M A =

ˆ

or ˆ 11 S

MedX A

S

MoX A

− =

− =

13

13 2

2

QQ

QMedQ A

− +−

= measures skewness for the

middle 50% observations only

Interpretation

� positive values= positive asymmetry (right

skewed distribution)

� negative values = negative asymmetry

(left skewed distribution)

� For the skewness coefficient (with the

median) and the quartile skewness

coefficient the strength of asymmetry

(absolute value):

� 0 – 0.33: weak

� 0.34 – 0.66: medium

� 0.67 – 1: strong

Asymmetry – examples

Example 1:

Example 3:

15.0 09.4667.66

09.4685.54267.66

24.0 2.18

85.547.58 3.0

2.18

23.537.58

,15.1

2

11

≈ −

+⋅− ≅

≈ −

=≈ −

≅

≅

A

)( A or )( A

A

MedMo

3

1

25.3

2325.3

01.0 83.0

399

Recommended