mathematical statistics anna descriptive statistics, part ii: mode quantiles measures of variability

Download Mathematical Statistics Anna Descriptive Statistics, part II: mode quantiles measures of variability

Post on 21-May-2020

0 views

Category:

Documents

0 download

Embed Size (px)

TRANSCRIPT

  • Mathematical Statistics

    Anna Janicka

    Lecture II, 25.02.2019

    DESCRIPTIVE STATISTICS, PART II

  • Plan for today

    1. Descriptive Statistics, part II:

    � mode

    � quantiles

    � measures of variability

    � measures of asymmetry

    � the boxplot

  • Measures of central tendency – reminder

    � Classic:

    � arithmetic mean

    � Position (order, rank):

    � median

    � mode

    � quartile

  • Example 1 – cont.

    Mode – examples

    Quartiles – example

    Variance – example

    Grade Number Frequency

    2 59 31.89%

    3 63 34.05%

    3.5 33 17.84%

    4 14 7.57%

    4.5 10 5.41%

    5 6 3.24%

    Total 185 100%

  • Example 3 – cont.

    Interval Class

    mark Number Frequency

    Cumulative

    number cni

    Cumulative

    frequency cfi

    (30,40] 35 11 0,11 11 0,11

    (40,50] 45 23 0,23 34 0,34

    (50,60] 55 33 0,33 67 0,67

    (60,70] 65 12 0,12 79 0,79

    (70,80] 75 6 0,06 85 0,85

    (80,90] 85 8 0,08 93 0,93

    (90,100] 95 3 0,03 96 0,96

    (100,110] 105 2 0,02 98 0,98

    (110,120] 115 2 0,02 100 1

    Total 100 1 Mode – example

    Quartiles – example

    Variance – example

  • Mode

    Mode

    the value that appears most often

    � for grouped data:

    Mo = most frequent value

    � for grouped class interval data:

    where

    nMo – number of elements in mode’s class,

    cL, b – analogous to the median

    b nnnn

    nn cMo

    MoMoMoMo

    MoMo L ⋅−+−

    − +≅

    +−

    )()( 11

    1

  • Mode – examples

    Example 1:

    Mo = 3

    Example 3:

    the mode’s interval is (50,60], with 33 elements

    nMo = 33, cL = 50, b = 10, nMo-1 = 23, nMo+1 = 12

    23.5310 )1233()2333(

    2333 50 ≈⋅

    −+− −

    +≅Mo

    Example 1 –

    cont.

    Example 3 –

    cont.

  • Which measure should we choose?

    � Arithmetic mean: for typical data series

    (single max, monotonous frequencies)

    � Mode: for typical data series, grouped data

    (the lengths of the mode’s class and

    neighboring classes should be equal)

    � Median: no restrictions. The most robust (in

    case of outlier observations, fluctuations

    etc.)

  • Quantiles, quartiles

    � p-th quantile (quantile of rank p): number

    such that the fraction of observations less

    than or equal to it is at least p, and values

    greater than or equal to it at least 1-p

    � Q1 : first quartile = quantile of rank ¼

    � Second quartile = median

    = quantile of rank ½

    � Q3: Third quartile = quantile of rank ¾

  • Quantiles – cont.

    Empirical quantile of rank p:

    

      

    ∈ +

    =

    +

    +

    ZnpX

    Znp XX

    Q

    nnp

    nnpnnp

    p

    :1][

    :1:

    2

  • Quartiles – cont.

    � Quantiles for p = ¼ and p = ¾.

    � For grouped class interval data –

    analogous to the median

    for k=1 or 3

    where M1, M3 – number of the quartile’s class

    b – length of quartile class interval

    cL – lower end of the quartile class interval

     

      

     −

    ⋅ +≅ ∑

    =

    1

    14

    k

    k

    M

    i

    i

    M

    Lk n nk

    n

    b cQ

  • Quartiles – examples

    Example 1:

    so

    Example 3:

    so

    75100 25100 4341 =⋅=⋅

    4M ,2 31 ==M

    67,66)6775( 12

    10 6009,46)1125(

    23

    10 40 31 ≈−+≅≈−+≅ QQ

    Example1 –

    cont.

    Example 3 –

    cont.

    75.13818525.46185 41 =⋅=⋅ 43

    5.3,2 185:1393185:471 ==== XX Q Q

  • Variability measures

    � Classical measures

    � variance, standard deviation

    � average (absolute) deviation

    � coefficient of variation

    � Measures based on order statistics

    � range

    � interquartile range

    � quartile deviation

    � coefficients of variation (based on order stats)

    � median absolute deviation

  • Measures based on order statistics

    � Range

    the most simple measure, does not take into

    account anything but the extreme values

    � Inter Quartile Range (midspread, middle fifty)

    more robust than the range

    nnn XXr :1: −=

    13 QQIQR −=

    may be further used to calculate quartile deviation Q= IQR/2, and

    coefficients of variation VQ = Q/Med or VQ1Q3 = IQR/(Q3+Q1) (quartile

    variation coefficient) or the typical range: [Med – Q, Med + Q]

    length of the interval that covers the

    middle 50% observations

  • Range, interquartile range – examples

    Example 1:

    Example 3:

    (in reality

    5.125.3

    ,325

    =−=

    =−=

    IQR

    r

    58,2009,4667,66

    )45,8645329118

    9030120

    =−≅

    =

    =−≅

    IQR

    ,-,

    r

  • Classical measures of dispersion

    Variance

    � raw data

    � grouped data

    � grouped class interval data

    + Sheppard’s correction

    in general

    2

    1

    21

    1

    212 )()(ˆ ∑∑ ==

    −=−= n

    i

    in

    n

    i

    in XXXXS

    2

    1

    21

    1

    212 )()(ˆ ∑∑ ==

    −=−= k

    i

    iin

    k

    i

    iin XXnXXnS

    2

    1

    21

    1

    212 )()(ˆ ∑∑ ==

    −=−≅ k

    i

    iin

    k

    i

    iin XcnXcnS

    12

    22 2ˆ cSS −≅

    c=length of

    class

    interval (for

    equal

    intervals)

    2

    1

    112 122 )(ˆ ∑

    = −−−≅

    k

    i

    iiin ccnSS

  • Variance – examples

    Example 1:

    Example 3:

    in reality

    ( )6)99.25(10)99.25.4(14)99.24(33)99.25.3(63)99.23(59)99.22( 222222 185 1 ⋅−+⋅−+⋅−+⋅−+⋅−+⋅−

    69.0

    ˆ 2

    ≈S

    98.322 12

    1031.331

    31.331

    ˆ

    2 2

    100 12

    ≈−=

    =

    ⋅≈

    S

    S

    )2)7.58115(2)7.58105(3)7.5895(8)7.5885(6)7.5875(

    12)7.5865(33)7.5855(23)7.5845(11)7.5835((

    22222

    2222

    ⋅−+⋅−+⋅−+⋅−+⋅−+

    ⋅−+⋅−+⋅−+⋅−

    85.333ˆ 2 =S

    Example 1 – cont.

    Example 3 – cont.

    distrubution not normal or

    sample too small for

    Sheppard’s correction –

    larger errors from small

    sample size than from class

    grouping.

  • Standard deviation

    In the same units as the initial variable

    Example 1:

    Example 3:

    22 ,ˆˆ SSSS ==

    [grade] 83.0ˆ ≈S

    ][m 22.18ˆ ≈S

  • Average (absolute) deviation, mean deviation

    Nowadays seldom used. Simple

    calculations.

    for raw data

    etc...

    We have: d

  • Coefficient of variation (classical)

    For comparisons of the same varaible

    accross populations or different

    variables for the same population

    %)100( or

    %),100( ˆ

    ⋅=

    ⋅=

    X

    d V

    X

    S V

    d

    S

  • Skewness (asymmetry)

    left symmetry right

    (negative) (zero) (positive)

    (typical order)

    MoMedX >

  • Measures of asymmetry

    � Skewness

    where M3 is the third

    central moment

    � Skewness coefficient

    � Quartile skewness coefficient

    3

    3

    M A =

    ˆ

    or ˆ 11 S

    MedX A

    S

    MoX A

    − =

    − =

    13

    13 2

    2

    QQ

    QMedQ A

    − +−

    = measures skewness for the

    middle 50% observations only

  • Interpretation

    � positive values= positive asymmetry (right

    skewed distribution)

    � negative values = negative asymmetry

    (left skewed distribution)

    � For the skewness coefficient (with the

    median) and the quartile skewness

    coefficient the strength of asymmetry

    (absolute value):

    � 0 – 0.33: weak

    � 0.34 – 0.66: medium

    � 0.67 – 1: strong

  • Asymmetry – examples

    Example 1:

    Example 3:

    15.0 09.4667.66

    09.4685.54267.66

    24.0 2.18

    85.547.58 3.0

    2.18

    23.537.58

    ,15.1

    2

    11

    ≈ −

    +⋅− ≅

    ≈ −

    =≈ −

    A

    )( A or )( A

    A

    MedMo

    3

    1

    25.3

    2325.3

    01.0 83.0

    399