w2 frequency distribution 0

Upload: danny-manno

Post on 06-Apr-2018


TRANSCRIPT

  • 8/2/2019 W2 Frequency Distribution 0

    1/47

    Centre for Computer Technology

    ICT114 Mathematics for Computing

    Week 2

    Statistics and Frequency Distribution


    March 20, 2012 Copyright Box Hill Institute

    Set: Introduction

    A set is a well-defined list, collection or class of objects.

    The objects could be anything: numbers, names, people, cities. These objects are called the elements or members of the set.

    Example 1: The numbers 1, 3, 5, 7, 9, 11, 13, ...
    Example 2: The solutions of the equation $x^2 - 4x + 3 = 0$
    Example 3: The rivers in Australia


    Set Notation

    Sets are usually denoted by capital letters A, B, P, X, ...

    The elements are usually represented by lowercase letters a, b, p, x, ...

    There are two forms for the presentation of a set:
    Tabular form, A = {1, 3, 5, 7, 9, 11, ...}
    Set-builder form, A = {x | x is odd}


    Subsets

    If every element in a set A is also a member of a set B, then A is called a subset of B.

    In other words, if $x \in A \Rightarrow x \in B$ for all x, then A is a subset of B.

    It is written as $A \subseteq B$ or $B \supseteq A$.

    A is called a proper subset of B if $A \subseteq B$ and A is not equal to B.


    Venn Diagram to represent sets

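    U is the universal set.

    A and B are disjoint sets.

    R is a subset of S.

    [Figure: two Venn diagrams, each drawn inside the universal set U: one showing the disjoint sets A and B, the other showing R inside S]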


    Set Operations

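    Let A and B represent two sets. The definitions can be written in a compact manner:

    1. $A \cup B = \{x \mid x \in A \text{ or } x \in B \text{ or } x \in \text{both}\}$
    2. $A \cap B = \{x \mid x \in A \text{ and } x \in B\}$
    3. $A - B = \{x \mid x \in A \text{ and } x \notin B\}$
    4. $A' = \{x \mid x \notin A\}$
    5. $A \oplus B = \{x \mid x \in A \text{ or } x \in B \text{ but } x \notin \text{both}\}$
    6. #A = number of elements in set A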
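The six definitions above map onto Python's built-in `set` type. The sketch below reads definitions 2 to 5 as intersection, difference, complement and symmetric difference; the sets `U`, `A` and `B` are made-up values chosen only for illustration.

```python
# Hypothetical example sets; U plays the role of the universal set.
U = set(range(1, 11))
A = {1, 3, 5, 7, 9}
B = {3, 4, 5, 6}

union = A | B                  # in A or in B (or in both)
intersection = A & B           # in A and in B
difference = A - B             # in A but not in B
complement_A = U - A           # in U but not in A
symmetric_difference = A ^ B   # in A or in B, but not in both
cardinality_A = len(A)         # #A, the number of elements in A

print(union)
print(intersection)
print(difference)
print(complement_A)
print(symmetric_difference)
print(cardinality_A)
```

The complement is only meaningful relative to a chosen universal set, which is why it is written here as `U - A`.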


    Centre for Computer Technology

    Statistics and Frequency Distribution


    Introduction

    Statistics is the medium used to describe the center, spread and shape of a data set.

    Two components:
    Gathering of information or scientific data
    Inferential statistics / statistical methods

    Statistical methods are employed to make judgements in the face of uncertainty and variation.


    Measures of Central Tendency

    Measures of central tendency are single values that act as a representative of the data.

    Three main measures

    Mean

    Mode

    Median


    Mean

    For a given set of n numbers $x_1, x_2, x_3, \ldots, x_n$, the mean, denoted by $\mu$, is

    $$\mu = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$


    Example: Consider the following set of numbers

    S = {1, 2, 3, 4, 5, 6, 7, 8, 9}

    The mean of the set S is

    $$\mu = \frac{1+2+3+4+5+6+7+8+9}{9} = \frac{45}{9} = 5$$
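The same calculation can be checked in a few lines of Python. This is a minimal sketch; the function name `mean` is chosen here for illustration and is not part of the slides.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    values = list(values)
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


S = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(mean(S))  # 45 / 9 = 5.0
```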


    Median

    For a given set of n numbers $x_1, x_2, x_3, \ldots, x_n$, the median is a value such that half of the values are larger than the median and the other half are smaller than the median.

    In other words, the median is the middlemost number.


    Median

    Example: Consider the following set of numbers

    S = {1, 6, 3, 8, 2, 4, 9}

    To find the median, we need to order the list

    S = {1, 2, 3, 4, 6, 8, 9}

    The middlemost number is 4, which is the median of the set.


    What happens when we have to find the median of a set with an even number of elements?

    For example, find the median of

    S = {1, 6, 3, 8, 2, 12, 4, 9}


    Some More Concepts

    For a set of n ordered data points:

    If n is odd, the median is found at location (n+1)/2 of the set.

    If n is even, the median is the average of the two middle terms, which are found at locations n/2 and n/2 + 1.
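The odd and even cases can be sketched as one Python function. The function name `median` is an illustrative choice; note that the slides count locations from 1, while Python lists are indexed from 0.

```python
def median(values):
    """Median of a data set, following the odd/even rule above."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Location (n+1)/2 counted from 1 is index n//2 counted from 0.
        return ordered[mid]
    # Locations n/2 and n/2 + 1 counted from 1 are indices mid-1 and mid.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 6, 3, 8, 2, 4, 9]))      # 4
print(median([1, 6, 3, 8, 2, 12, 4, 9]))  # (4 + 6) / 2 = 5.0
```

For the eight-element set the ordered list is 1, 2, 3, 4, 6, 8, 9, 12, so the two middle terms are 4 and 6.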


    Mode

    The mode of a data set is the value that occurs most often.

    If two, three or more values share the highest number of occurrences, the data is bimodal, trimodal or multimodal.

    Example:

    R = {2, 8, 1, 9, 5, 2, 7, 2, 7, 9, 4, 7, 1, 5, 2}

    The number that appears most often is 2 (four times), which is the mode of R.
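Counting occurrences is easy to sketch with `collections.Counter` from the Python standard library. The helper name `modes` is an illustrative choice; it returns a list so that bimodal and multimodal data can be reported as well.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs the maximum number of times."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


R = [2, 8, 1, 9, 5, 2, 7, 2, 7, 9, 4, 7, 1, 5, 2]
print(modes(R))                # [2]
print(modes([1, 1, 2, 2, 3]))  # [1, 2], a bimodal data set
```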


    Measures of Dispersion

    Consider two sets

    S = {5, 5, 5, 5, 5, 5}

    R = {0, 0, 0, 10, 10, 10}

    For both of the above sets, mean = 5.

    But the above sets are two different data sets.

    Is it good practice to use the mean, median or mode to describe them?


    Measures of Dispersion

    We use another descriptive statistic, called a measure of dispersion, to evaluate the data.

    It is a measure of the scatter of the values about the mean.


    Measures of Dispersion

    What happens to the measure of dispersion

    if the values are concentrated near the mean?

    if the values are distributed far from the mean?


    Measures of Dispersion

    If the values are concentrated near the mean of the data set, the measure is small.

    If they are distributed far from the mean of the data set, the measure will be large.

    There are two main measures of dispersion:

    Variance

    Standard Deviation


    Variance and Standard Deviation

    For a given set of n numbers $x_1, x_2, x_3, \ldots, x_n$ with mean $\mu$, the variance, denoted by $\sigma^2$, is given by

    $$\sigma^2 = \frac{(x_1-\mu)^2 + (x_2-\mu)^2 + \cdots + (x_n-\mu)^2}{n}$$


    Variance (method 2)

    Variance = mean of squares minus square of mean

    $$\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$$

    where $\sum x = x_1 + x_2 + x_3 + \cdots + x_n$
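Both ways of computing the variance can be compared on the two data sets S and R introduced under Measures of Dispersion. This is a minimal sketch; the function names `variance_method_1` and `variance_method_2` are chosen here for illustration.

```python
def variance_method_1(values):
    """Average of the squared deviations from the mean (divides by n)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def variance_method_2(values):
    """Mean of the squares minus the square of the mean."""
    n = len(values)
    mean_of_squares = sum(x * x for x in values) / n
    square_of_mean = (sum(values) / n) ** 2
    return mean_of_squares - square_of_mean


S = [5, 5, 5, 5, 5, 5]
R = [0, 0, 0, 10, 10, 10]
for name, data in (("S", S), ("R", R)):
    print(name, variance_method_1(data), variance_method_2(data))
```

Both sets have mean 5, but the variance is 0 for S and 25 for R, which is exactly the difference that the mean alone cannot show. With floating-point data, method 2 can lose precision when the mean is large compared with the spread, so method 1 is usually the safer choice in code.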


    Variance and Standard Deviation

    The variance is a non-negative number.

    The positive square root of the variance is the standard deviation.

    The simplest measure of variability is the sample range, $x_{\max} - x_{\min}$.


    Variance and Standard Deviation

    Example: Find the variance and standard deviation for the following set of test scores:

    T = {75, 80, 82, 87, 96}

    The mean of the set T is

    $$\mu = \frac{75+80+82+87+96}{5} = \frac{420}{5} = 84$$


    Using the mean, we get the variance as

    $$\sigma^2 = \frac{(75-84)^2 + (80-84)^2 + (82-84)^2 + (87-84)^2 + (96-84)^2}{5} = \frac{254}{5} = 50.8$$

    Standard deviation $= \sqrt{\sigma^2} = \sqrt{50.8} \approx 7.1274$
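The worked example for the test scores T can be reproduced step by step. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
import math

T = [75, 80, 82, 87, 96]

mean_T = sum(T) / len(T)                                  # 420 / 5
variance_T = sum((x - mean_T) ** 2 for x in T) / len(T)   # 254 / 5
std_T = math.sqrt(variance_T)                             # positive square root

print(mean_T)            # 84.0
print(variance_T)        # 50.8
print(round(std_T, 4))   # 7.1274
```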


    Sample Space

    The set of all possible outcomes of a statistical experiment is called a sample space or sample.

    Each outcome is called an element, a member or a sample point.

    A group of samples is called a population.


    Sample Statistics

    Any quantity obtained from a sample for the purpose of estimating a population parameter is called a sample statistic.

    A sample along with inferential statistics allows us to draw conclusions about the population, with inferential statistics making clear use of elements of probability.


    Sample Mean

    For a given sample of n numbers $x_1, x_2, x_3, \ldots, x_n$, the sample mean, denoted by $\bar{X}$, is

    $$\bar{X} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$


    Weighted Mean

    For a given set of data, X = {$x_1, x_2, \ldots, x_n$}, and corresponding non-negative weights, W = {$w_1, w_2, \ldots, w_n$}, the weighted mean (weighted average) is given by

    $$\bar{X} = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n}{w_1 + w_2 + w_3 + \cdots + w_n}$$
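A short sketch of the weighted mean follows. The function name `weighted_mean` and the marks and weights are hypothetical, chosen only to illustrate the formula.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical example: two assignments and an exam that counts double.
marks = [70, 80, 90]
weights = [1, 1, 2]
print(weighted_mean(marks, weights))  # (70 + 80 + 180) / 4 = 82.5
```

When all weights are equal, the weighted mean reduces to the ordinary mean.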


    Sample Variance

    For a given sample of n numbers $x_1, x_2, x_3, \ldots, x_n$ with sample mean $\bar{X}$, the sample variance, denoted by $S^2$, is given by

    $$S^2 = \frac{(x_1-\bar{X})^2 + (x_2-\bar{X})^2 + \cdots + (x_n-\bar{X})^2}{n-1}$$
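The only change from the earlier variance formula is the divisor, n - 1 instead of n. The sketch below treats the test scores T from the earlier example as a sample (an assumption made purely for illustration) and compares the result with the `statistics` module from the Python standard library, whose `variance` divides by n - 1 and whose `pvariance` divides by n.

```python
import statistics


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


T = [75, 80, 82, 87, 96]
print(sample_variance(T))       # 254 / 4 = 63.5
print(statistics.variance(T))   # sample variance (divides by n - 1)
print(statistics.pvariance(T))  # population variance (divides by n)
```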


    Frequency Distributions

    For large samples (or populations) it is difficult to observe various characteristics or to compute statistics.

    Therefore it is useful to organize or group the raw data.

    The data is arranged in intervals of equal width.


    Frequency Distributions

    The intervals are called classes or categories.

    The number of individuals or elements in each class is determined; this is called the class frequency.

    The resulting arrangement is called a frequency distribution or frequency table.
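Grouping raw data into classes of equal width can be sketched as follows. The function name `frequency_table`, its parameters and the list of heights are hypothetical; the sketch assumes whole-number data, so that a class starting at 155 with width 5 can be labelled 155-159.

```python
from collections import Counter


def frequency_table(data, lower, width, n_classes):
    """Count the values in each of n_classes equal-width classes.

    Class i covers lower + i*width up to, but not including,
    lower + (i+1)*width. Values outside all classes are ignored.
    """
    counts = Counter()
    for value in data:
        index = int((value - lower) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return [
        (lower + i * width, lower + (i + 1) * width - 1, counts[i])
        for i in range(n_classes)
    ]


# Made-up raw heights in cm, for illustration only.
heights = [156, 158, 161, 163, 164, 166, 167, 167, 168, 171, 173, 178]
for low, high, frequency in frequency_table(heights, 155, 5, 5):
    print(f"{low}-{high}: {frequency}")
```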


    Frequency Distribution

    Example: Height of students in XYZ university (frequency table)

    Height (cm)    Number of Students
    155-159          5
    160-164         18
    165-169         42
    170-174         27
    175-179          8
    Total          100


    Frequency Distribution

    In the previous example:

    The first category, 155-159, is called a class interval.

    The corresponding class frequency is 5.

    The midpoint of the class interval is called the class mark.
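As a small illustration, the class marks of the height classes are the averages of the lower and upper class limits. The list name `classes` is an illustrative choice.

```python
# (lower limit, upper limit) of each class in the height example.
classes = [(155, 159), (160, 164), (165, 169), (170, 174), (175, 179)]

class_marks = [(low + high) / 2 for low, high in classes]
print(class_marks)  # [157.0, 162.0, 167.0, 172.0, 177.0]
```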


    Frequency Histogram

    [Figure: frequency histogram of the height frequency table. Horizontal axis: Height (cm), with classes 155-159, 160-164, 165-169, 170-174 and 175-179. Vertical axis: number of students, from 0 to 45.]


    Frequency Polygon

    [Figure: frequency polygon of the height frequency table. Horizontal axis: Height (cm), plotted at the class marks from 157 to 177. Vertical axis: number of students, from 0 to 45.]


    Frequency Graphs

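    In the histogram of the previous example, the frequencies represented by the rectangles add up to the total number of students, 100.

    A frequency polygon is a graph connecting the midpoints of the tops of the histogram rectangles.

    In a bar graph of relative frequencies, the ordinates add up to 1 (that is, 100%).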


    Relative Frequency

    Height (cm)    Relative Frequency (% of students)
    155-159          5%
    160-164         18%
    165-169         42%
    170-174         27%
    175-179          8%
    Total          100%

    In relative frequency, the class frequency is replaced by a percentage rather than the number.

    In the histogram, the vertical axis shows relative frequency instead of frequency.
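Converting class frequencies to relative frequencies is a one-line calculation. The sketch below uses the height table; because the total happens to be 100 students, the percentages equal the counts.

```python
# Class frequencies from the height example.
frequencies = {
    "155-159": 5,
    "160-164": 18,
    "165-169": 42,
    "170-174": 27,
    "175-179": 8,
}

total = sum(frequencies.values())
relative = {label: 100 * f / total for label, f in frequencies.items()}

for label, percent in relative.items():
    print(f"{label}: {percent}%")
print("Total:", sum(relative.values()))
```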


    In the previous example, what happens if we have a student with a height of 159.7 cm?

    Height (cm)    Number of Students
    155-159
    160-164
    165-169
    170-174
    175-179
    Total


    Continuous Frequency Distribution

    The class intervals are chosen such that they are continuous, as shown.

    Height (cm)      Number of Students
    154.5-159.4
    159.5-164.4
    164.5-169.4
    169.5-174.4
    174.5-179.4
    Total
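One way to decide which class a measurement such as 159.7 cm belongs to is to look it up against the class boundaries. The sketch below assumes each class includes its lower boundary and excludes its upper one (so the classes are 154.5 up to 159.5, 159.5 up to 164.5, and so on); that convention and the function name `class_index` are assumptions made here, not something stated on the slide.

```python
from bisect import bisect_right

# Assumed class boundaries: class i runs from boundaries[i]
# up to, but not including, boundaries[i + 1].
boundaries = [154.5, 159.5, 164.5, 169.5, 174.5, 179.5]


def class_index(height):
    """Return the 0-based class number for a height, or None if out of range."""
    if not boundaries[0] <= height < boundaries[-1]:
        return None
    return bisect_right(boundaries, height) - 1


print(class_index(159.7))  # 1, the class starting at 159.5
print(class_index(159.4))  # 0, the class starting at 154.5
print(class_index(200.0))  # None, outside every class
```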


    Mean and Variance from Frequency Table

    Interval        Midpoint (x)   Frequency (f)   f.x          f.x^2
    a0 - a1         x1             f1              f1.x1        f1.x1.x1
    a1 - a2         x2             f2              f2.x2        f2.x2.x2
    ...
    a(n-1) - an     xn             fn              fn.xn        fn.xn.xn
    All                            Total f         Total f.x    Total f.x.x

    $$\text{Mean} = \frac{\sum f x}{\sum f}, \qquad \text{Variance} = \frac{\sum f x^2}{\sum f} - (\text{Mean})^2$$

    where $\sum$ denotes the column total.


    Example: Mean and Variance from Frequency Table

    Class interval    Frequency, f
    1.5 - 1.9           2
    2.0 - 2.4           1
    2.5 - 2.9           4
    3.0 - 3.4          15
    3.5 - 3.9          10
    4.0 - 4.4           5
    4.5 - 4.9           3


    Class interval    Class midpoint, x    Frequency, f    f.x       f.x^2
    1.5 - 1.9         1.7                    2               3.4       5.78
    2.0 - 2.4         2.2                    1               2.2       4.84
    2.5 - 2.9         2.7                    4              10.8      29.16
    3.0 - 3.4         3.2                   15              48       153.6
    3.5 - 3.9         3.7                   10              37       136.9
    4.0 - 4.4         4.2                    5              21        88.2
    4.5 - 4.9         4.7                    3              14.1      66.27
    Total                                   40             136.5     484.75


    $$\text{Mean} = \frac{\sum f x}{\sum f} = \frac{136.5}{40} = 3.4125$$

    $$\text{Variance} = \frac{\sum f x^2}{\sum f} - (\text{Mean})^2 = \frac{484.75}{40} - (3.4125)^2 = 12.1188 - 11.6452 = 0.4736$$
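The whole table calculation can be reproduced in Python from the class limits and frequencies alone. This is a minimal sketch with illustrative variable names; results are rounded for display because the midpoints are not exactly representable as binary floating-point numbers.

```python
# (lower limit, upper limit, frequency) for each class in the example table.
classes = [
    (1.5, 1.9, 2),
    (2.0, 2.4, 1),
    (2.5, 2.9, 4),
    (3.0, 3.4, 15),
    (3.5, 3.9, 10),
    (4.0, 4.4, 5),
    (4.5, 4.9, 3),
]

total_f = sum(f for _, _, f in classes)
# The class midpoint x is the average of the two class limits.
total_fx = sum(f * (low + high) / 2 for low, high, f in classes)
total_fxx = sum(f * ((low + high) / 2) ** 2 for low, high, f in classes)

mean = total_fx / total_f
variance = total_fxx / total_f - mean ** 2

print(total_f, round(total_fx, 2), round(total_fxx, 2))
print(round(mean, 4), round(variance, 4))
```

Because every value in a class is represented by its midpoint, the mean and variance obtained this way are approximations to those of the raw data.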


    Summary

    There are three main measures of central tendency: Mean, Mode and Median.

    There are two main measures of dispersion: Variance and Standard Deviation.

    The organization or grouping of raw data in a table is called a frequency distribution.


    References

    M. R. Spiegel: Theory and Problems of Statistics, Schaum's Outline Series, McGraw-Hill.

    http://mathworld.wolfram.com