Lecture 03. Numerical Descriptive Statistics

Upload: mrfrederick87

Post on 03-Apr-2018

  • 7/28/2019 Lecture 03. Numerical Descriptive Statistics

    1/43

    Statistics

    ST 361: Statistics for Engineers

    Numerical Descriptive Statistics

    Kimberly Weems

    [email protected]

    5260 SAS Hall

    Numeric Measures

    Why? Kim is in an introductory history class. On the midterm exam, Kim scored 64 out of 100. Did she do well?

    The class average was 42.

    Knowing the class average lets us make a comparison: Kim scored 22 points above it.

    Numeric Measures

    Allow us to make comparisons:

        Of individuals to the group

        Of a group to other groups

    Measures of center: give an idea about the main chunk of the data

    Measures of Central Tendency

    Mean: the average

    Notation:

    Population mean: $\mu$ (mu)

    Sample mean: $\bar{y}$ (y-bar)

    Measures of Central Tendency

    Summation Notation

    $$\sum_{i=1}^{n} y_i = y_1 + y_2 + y_3 + \cdots + y_n$$

    The symbol $\sum$ means "sum of"; the $y_i$ are the individual values.

    Measures of Central Tendency

    $$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

    The numerator is the sum of the values; the denominator $n$ is the sample size.

    Measures of Central Tendency

    Median: the middle value in a data set when the values are put in increasing order

    About 50% of the values lie above it and 50% below

    If there is an even number of observations, average the middle two.

    Simple Example:

    A health researcher examined the amount of soda that a group of teenagers consumed during a day. The resulting amounts in ounces were: 9, 9, 6, 15, 12, 14, and 40.

    Mean:

    $$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{9+9+6+15+12+14+40}{7} = \frac{105}{7} = 15$$
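The same arithmetic can be checked in a few lines of Python. This is a minimal sketch; the function name `sample_mean` is chosen here for illustration and does not come from the lecture.

```python
def sample_mean(values):
    """Return the sum of the values divided by the sample size n."""
    return sum(values) / len(values)

# Soda amounts in ounces, from the example
soda = [9, 9, 6, 15, 12, 14, 40]

print(sum(soda))          # 105
print(sample_mean(soda))  # 15.0
```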

    Simple Example:

    Soda consumed, in increasing order: 6 9 9 12 14 15 40

    Median: 12 (the 4th of the 7 ordered values)
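A short sketch of the median rule, covering both the odd and the even case. The function is written out by hand to mirror the definition; the second data set (the soda amounts with the 40 dropped) is a hypothetical used only to show the even case.

```python
def median(values):
    """Middle value of the sorted data; average the middle two if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

soda = [9, 9, 6, 15, 12, 14, 40]
print(sorted(soda))  # [6, 9, 9, 12, 14, 15, 40]
print(median(soda))  # 12

# Even number of observations (hypothetical: the 40 is dropped)
print(median([9, 9, 6, 15, 12, 14]))  # 10.5
```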

    Mean vs Median

    Soda amounts in ounces: 9, 9, 6, 15, 12, 14, and 40.

    [Figure: dot plot of the soda amounts on an axis from 0 to 40, with the mean (15) and the median (12) marked]

    Problem with the mean:

    Sensitive to unusual values and skewed data: the mean is pulled away from the median

    Skewed right: mean greater than median

    Skewed left: mean less than median

    Symmetric: mean and median are about the same
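To see this sensitivity on the soda data, the sketch below compares the mean and median with and without the unusual value of 40. Dropping the 40 is a hypothetical done only for comparison; the code uses Python's standard `statistics` module.

```python
from statistics import mean, median

soda = [9, 9, 6, 15, 12, 14, 40]
without_40 = [y for y in soda if y != 40]  # hypothetical: unusual value removed

# With the 40: mean 15, median 12
print(mean(soda), median(soda))

# Without the 40: mean 65/6 (about 10.83), median 10.5
print(mean(without_40), median(without_40))
```

Removing one observation moves the mean by more than 4 ounces but the median by only 1.5, which is the sense in which the mean is the more sensitive measure.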

    Trimmed Mean

    A compromise between the average and the median.

    Less sensitive to outliers.

    Observations are ordered from smallest to largest. A trimming percentage 100r% is chosen, where r is a number between 0 and 0.5.

    Suppose r = 0.1, so that the trimming percentage is 10%. Then if n = 20, 10% of 20 is 2: the trimmed mean results from deleting (trimming) the 2 largest and the 2 smallest observations and averaging the remaining 16.
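A minimal sketch of the trimming procedure. The data set is hypothetical (the integers 1 to 19 plus one outlier), and rounding `r * n` down when it is not a whole number is an assumption made here; the lecture only covers the case where it is an integer.

```python
import math

def trimmed_mean(values, r):
    """Drop the k smallest and k largest values, k = floor(r * n), then average."""
    if not 0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    ordered = sorted(values)
    n = len(ordered)
    k = math.floor(r * n + 1e-9)  # small tolerance guards against float round-off
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)

# Hypothetical sample of n = 20: the integers 1..19 plus one outlier, 100
data = list(range(1, 20)) + [100]

print(sum(data) / len(data))    # 14.5, ordinary mean pulled up by the outlier
print(trimmed_mean(data, 0.1))  # 10.5, mean of the 16 values 3..18
```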

    [Figure: two panels, "Before 2% Trim" and "After 2% Trim"]

    Source: Coal Emissions Uncertainty Project (2009-10), Alissa Anderson, Colin Geisenhoffer, Brody Heffner, Michael Shaw & Emily Wisner

    Measures of Variability

    Why?

    Tell us about consistency and predictability

    Allow comparison of groups

    Give a scale of reference for comparing individuals

    Measures of Variability

    Range: the difference between the maximum and the minimum. How spread out are the values?

    Soda amounts: Range = 40 - 6 = 34

    [Figure: dot plot of the soda amounts on an axis from 0 to 40]
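In code, the range is just the largest value minus the smallest. A minimal sketch; `data_range` is a name chosen here to avoid shadowing Python's built-in `range`.

```python
def data_range(values):
    """Difference between the maximum and the minimum."""
    return max(values) - min(values)

soda = [9, 9, 6, 15, 12, 14, 40]
print(data_range(soda))  # 34
```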

    Measures of Variability

    Problem: The range only looks at two values. It does not quantify the spread of the others.

    Solution: Look at all values. How far are they from the mean?

    Variance: summarizes the distance between all individuals and the mean

    Measures of Variability

    Important notation:

    Population variance: $\sigma^2$ (sigma squared)

    Sample variance: $s^2$

    Measures of Variability

    Important Formula:

    $$\sigma^2 = \frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}$$

    Each distance $y_i - \mu$ is squared to get rid of negatives; the formula then calculates the average of the squared distances.
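One way to see why the squaring is needed: without it, the positive and negative distances cancel exactly. Assuming the population mean is defined in the same way as the sample mean, $\mu = \frac{1}{N}\sum_{i=1}^{N} y_i$, the unsquared distances always sum to zero:

```latex
\sum_{i=1}^{N} (y_i - \mu)
  = \sum_{i=1}^{N} y_i - N\mu
  = N\mu - N\mu
  = 0
```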

    Measures of Variability

    Sample Variance

    $$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$$

    The numerator is the sum of squares. The sample variance divides by $n - 1$ instead of $N$.

    Simple Example:

    A health researcher examined the amount of soda that a group of teenagers consumed during a day. The resulting amounts in ounces were: 9, 9, 6, 15, 12, 14, and 40.
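A minimal sketch of the sample variance calculation for these soda amounts, assuming the $n - 1$ formula from the Sample Variance slide. The name `sample_variance` is chosen here for illustration. The result agrees with the value $s^2 = 131.33$ that the lecture reports for the soda data.

```python
def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    y_bar = sum(values) / n
    sum_of_squares = sum((y - y_bar) ** 2 for y in values)
    return sum_of_squares / (n - 1)

soda = [9, 9, 6, 15, 12, 14, 40]

# Deviations from the mean of 15: -6, -6, -9, 0, -3, -1, 25
# Squared: 36, 36, 81, 0, 9, 1, 625, which sum to 788
s2 = sample_variance(soda)
print(round(s2, 2))  # 131.33, that is 788 / 6
```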

    What does it tell us?

    By itself, not much. Some people try lots of tricks to recreate the data set from this number.

    The purpose of the number is to make comparisons with other data sets.

    Example: Another group of teens had soda consumption with a variance of 473.2.

    Since 473.2 is larger than our group's variance of 131.33, the other group was more spread out than our group.

    The Standard Deviation

    Variance is not on the same scale as the original data.

    Standard deviation: the square root of the variance.

    Has the same units as the original data

    Allows more direct comparisons

    The Standard Deviation

    For amount of soda

    $$s^2 = 131.33, \qquad s = \sqrt{s^2} = \sqrt{131.33} \approx 11.46$$
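These two numbers can be reproduced with Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample ($n - 1$) definitions. A short sketch:

```python
import math
import statistics

soda = [9, 9, 6, 15, 12, 14, 40]

s2 = statistics.variance(soda)  # sample variance (divides by n - 1)
s = math.sqrt(s2)               # standard deviation, in ounces

print(round(s2, 2), round(s, 2))         # 131.33 11.46
print(round(statistics.stdev(soda), 2))  # 11.46
```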

    What does it tell us?

    It helps us understand the variability in the data: which group is more consistent?

    City Temperature

    City      Raleigh   Fargo   Fairbanks   Honolulu
    Mean      59        42      28          77
    Median    61        43      31          77
    SD        15        24      28          3
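Reading the table for consistency amounts to ordering the cities by standard deviation. A small sketch using the summary values from the table; the dictionary layout and variable names are chosen here for illustration.

```python
# Summary statistics copied from the City Temperature table
temps = {
    "Raleigh":   {"mean": 59, "median": 61, "sd": 15},
    "Fargo":     {"mean": 42, "median": 43, "sd": 24},
    "Fairbanks": {"mean": 28, "median": 31, "sd": 28},
    "Honolulu":  {"mean": 77, "median": 77, "sd": 3},
}

# Smallest SD first: most consistent temperatures to least consistent
by_sd = sorted(temps, key=lambda city: temps[city]["sd"])
print(by_sd)  # ['Honolulu', 'Raleigh', 'Fargo', 'Fairbanks']
```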

    Coefficient of Variation

    Coefficient of Variation (CV): the ratio of the standard deviation to the mean

    Used to compare variability when scales are very different.

    $$C.V. = \frac{s}{\bar{y}}$$

    Example:

    Students in a midwestern state take an end-of-grade exam that has a maximum of 100 points. A class testing a new teaching method had a standard deviation of 10.

    Students in an east coast state take an end-of-grade exam that has a maximum of 500 points. A class testing the new teaching method had a standard deviation of 30. Which is more varied?

    Example

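    The mean for the midwestern state was 70.

    The mean for the east coast state was 350.

    Midwestern state:

    $$C.V. = \frac{s}{\bar{y}} = \frac{10}{70} \approx 0.143 = 14.3\%$$

    East coast state:

    $$C.V. = \frac{s}{\bar{y}} = \frac{30}{350} \approx 0.086 = 8.6\%$$

    Relative to its mean, the midwestern class is more varied, even though its standard deviation is smaller.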
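The two ratios can be checked with a short sketch. The function name `coefficient_of_variation` and the zero-mean guard are additions made here, not part of the lecture.

```python
def coefficient_of_variation(s, y_bar):
    """Ratio of the standard deviation to the mean (mean must be nonzero)."""
    if y_bar == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return s / y_bar

cv_midwest = coefficient_of_variation(10, 70)
cv_east = coefficient_of_variation(30, 350)

print(f"{cv_midwest:.1%}")  # 14.3%
print(f"{cv_east:.1%}")     # 8.6%
```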