prob stat.4photocopy

Upload: jamesharrill199459

Post on 04-Apr-2018


  • 7/30/2019 PROB STAT.4photocopy


    Ungrouped data comprise a listing of the observed values, while grouped data represent a lumping together of the observed values. The data can be discrete or continuous. Because unorganized data are virtually meaningless, a method of processing the data is necessary. Table 4-1 below will be used to illustrate the concept.

    Table 4-1 Number of Daily Billing Errors

    0 1 3 0 1 0 1 0 1 5 4 1 2 1
    2 0 1 0 2 0 0 2 0 1 2 1 1 1
    2 1 1 0 4 1 3 1 1 1 1 3 4 0
    0 0 0 1 3 0 1 2 2 3

    The first step is to establish an array, which is an arrangement of raw numerical data in ascending or descending order of magnitude. An array in ascending order from 0 to 5 is shown in the 1st column of Table 4-2. The next step is to tabulate the frequency of each value by placing a tally mark in the tabulation column and in the appropriate row. The last column of Table 4-2 is the numerical value for the number of tallies and is called the frequency.

    Table 4-2 Tally of Number of Daily Billing Errors

    Number Nonconforming   Tabulation                 Frequency
    0                      ||||| ||||| |||||          15
    1                      ||||| ||||| ||||| |||||    20
    2                      ||||| |||                   8
    3                      |||||                       5
    4                      |||                         3
    5                      |                           1

    If the tabulation column is eliminated, the resulting table is classified as a frequency distribution, which is an arrangement of data to show the frequency of values in each category.
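The array and tally steps can be sketched in a few lines of Python. This is only an illustration: the list below is a reading of Table 4-1 that assumes the fused entries "12" and "00" in the transcript each stand for two separate values, and the variable names are hypothetical.

```python
from collections import Counter

# Daily billing errors as read from Table 4-1 (52 observations, assumed reading).
errors = [
    0, 1, 3, 0, 1, 0, 1, 0, 1, 5, 4, 1, 2, 1,
    2, 0, 1, 0, 2, 0, 0, 2, 0, 1, 2, 1, 1, 1,
    2, 1, 1, 0, 4, 1, 3, 1, 1, 1, 1, 3, 4, 0,
    0, 0, 0, 1, 3, 0, 1, 2, 2, 3,
]

# Step 1: establish the array (ascending order of magnitude).
array = sorted(errors)

# Step 2: tally each value; dropping the tally marks leaves the frequency distribution.
frequency = dict(sorted(Counter(array).items()))

for value, count in frequency.items():
    print(value, count)
```

Under that reading, the counts agree with the frequency column of Table 4-2.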

    Frequency distributions are presented in graphical form when greater visual clarity is desired. There are a number

    of different ways to present the frequency distribution:

    A histogram consists of a set of rectangles that represent the frequency in each category. It represents graphically the frequencies of the observed values. Figure 4-2a is a histogram for the data in Table 4-2.

    The relative frequency distribution is calculated by dividing the frequency for each data value by the total, which is the sum of the frequencies, as shown in the 3rd column of Table 4-3. The graphical representation is shown in Figure 4-2b.

    Cumulative frequency is calculated by adding the frequency of each data value to the sum of the frequencies for the previous data values, as shown in the 4th column of Table 4-3. The cumulative frequency for 0 nonconforming units is 15; for 1 nonconforming unit, 15 + 20 = 35; for 2 nonconforming units, 35 + 8 = 43; and so on. The graphical representation is shown in Figure 4-2c.

    Relative cumulative frequency is calculated by dividing the cumulative frequency for each data value by the total, as shown in the 5th column of Table 4-3, and the graphical representation is shown in Figure 4-3d. The graph shows that the proportion of the billing errors that have 2 or fewer nonconforming is 0.83 or 83%.
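The relative, cumulative and relative cumulative columns can be reproduced with a short running-sum loop. The sketch below starts from the frequencies of Table 4-2; the variable names are illustrative only.

```python
# Frequencies from Table 4-2, keyed by the number nonconforming.
frequency = {0: 15, 1: 20, 2: 8, 3: 5, 4: 3, 5: 1}
total = sum(frequency.values())  # sum of the frequencies

rows = []
running = 0  # cumulative frequency so far
for value, f in frequency.items():
    running += f
    # (value, frequency, relative, cumulative, relative cumulative)
    rows.append((value, f, f / total, running, running / total))

for value, f, rel, cum, rel_cum in rows:
    print(f"{value}  {f:2d}  {rel:.2f}  {cum:2d}  {rel_cum:.2f}")
```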


    TABLE 4-3 Different Frequency Distributions of Data Given in Table 4-1

    Number          Frequency   Relative        Cumulative      Relative Cumulative   Pie Chart
    Nonconforming               Frequency       Frequency       Frequency
    0               15          15/52 = 0.29    15              15/52 = 0.29          0.29 x 360 = 104.4°
    1               20          20/52 = 0.38    15 + 20 = 35    35/52 = 0.67          0.38 x 360 = 136.8°
    2                8           8/52 = 0.15    35 + 8 = 43     43/52 = 0.83          0.15 x 360 = 54°
    3                5           5/52 = 0.10    43 + 5 = 48     48/52 = 0.92          0.10 x 360 = 36°
    4                3           3/52 = 0.06    48 + 3 = 51     51/52 = 0.98          0.06 x 360 = 21.6°
    5                1           1/52 = 0.02    51 + 1 = 52     52/52 = 1.00          0.02 x 360 = 7.2°

    The construction of a frequency distribution for grouped data is more complicated because there is usually a

    larger number of categories. See Table 4-4 as example.

    1st: Collect data and construct a tally sheet. See Table 4-5.

    2nd: Determine the Range. The range is the difference between the highest and lowest observed values, R = Xh - Xl. For the sample data, R = 0.044.

    3rd: Determine the Cell Interval. The cell interval is the distance between adjacent cell midpoints. Whenever possible, an odd interval such as 0.001, 0.07, 0.5, or 3 is recommended so that the midpoint values will be to the same number of decimal places as the data values. The simplest technique is to use Sturgis' rule, which is

    $$ i = \frac{R}{1 + 3.322 \log_{10} n} $$

    For the above sample, i = 0.044/[1 + 3.322(2.041)] = 0.0057, and the closest odd interval for the above data is 0.005.
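As a quick check of the interval calculation, the rule can be evaluated directly. The sketch assumes the common statement of the rule, i = R / (1 + 3.322 log10 n), and assumes n = 110 observations, which is inferred from log10(n) = 2.041 rather than stated in the text. The function name is hypothetical.

```python
import math

def sturgis_interval(data_range, n):
    """Suggested cell interval: range divided by (1 + 3.322 * log10(n))."""
    return data_range / (1 + 3.322 * math.log10(n))

# R = 0.044 from the text; n = 110 is an assumption (log10(110) is about 2.041).
i = sturgis_interval(0.044, 110)
print(round(i, 4))
```

The result is then adjusted by hand to the nearest convenient odd interval, 0.005 in the sample problem.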

    4th: Determine the Cell Midpoints. The lowest cell midpoint must be located to include the lowest data value in its cell. The simplest technique is to select the lowest data point (2.531) as the midpoint value for the first cell. A better technique is to use the formula

    $$ MP_l = X_l + \frac{i}{2} \quad \text{(do not round the answer)} $$

    where MPl = midpoint for the lowest cell. For the sample problem, MPl = 2.531 + 0.005/2 = 2.533 (2.5335 with the extra digit dropped, not rounded). Therefore, a midpoint value of 2.533 can be used for the 1st cell.

    Note: The midpoints for the other 8 cells are obtained by adding the cell interval to the previous midpoint: 2.533 + 0.005 = 2.538, 2.538 + 0.005 = 2.543, and so on, as shown in Table 4-6.

    5th: Determine the Cell Boundaries. Cell boundaries are the extreme or limit values of a cell, referred to as the upper and lower boundaries. All the observations that fall between the upper and lower boundaries are classified into that particular cell. Boundaries are established so there is no question as to the location of an observation. Since the interval is odd, there will be an equal number of data values on each side of the midpoint. For the 1st cell, with a midpoint of 2.533 and an interval of 0.005, there will be two values on each side. Therefore, that cell will contain the values 2.531, 2.532, 2.533, 2.534 and 2.535. To prevent any gaps, the true boundaries are extended about halfway to the next number, which gives values of 2.5305 and 2.5355. The following number line illustrates this principle:

    |-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
    2.5305  2.531           2.532           2.533           2.534           2.535   2.5355
    (lower boundary)                                                      (upper boundary)

    6th: Post the Cell Frequency. The number of values in each cell is posted to the frequency column of Table 4-6. An analysis of Table 4-5 shows that for the lowest cell there are one 2.531, two 2.532, one 2.533, two 2.534 and zero 2.535. Therefore, there is a total of six values in the lowest cell, and the cell with a midpoint of 2.533 has a frequency of 6. The amounts are determined for the other cells in a similar manner.

    TABLE 4-6 Frequency Distribution of Steel Shaft Weight (kg)

    Cell Boundaries   Cell Midpoint   Frequency
    2.531 - 2.535     2.533            6
    2.536 - 2.540     2.538            8
    2.541 - 2.545     2.543           12
    2.546 - 2.550     2.548           13
    2.551 - 2.555     2.553           20
    2.556 - 2.560     2.558           19
    2.561 - 2.565     2.563           13
    2.566 - 2.570     2.568           11
    2.571 - 2.575     2.573            8
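The midpoint and boundary steps can be sketched as follows. This is a minimal illustration under stated assumptions: data recorded to three decimal places, nine cells, and the lowest midpoint taken as Xl + i/2 with the extra digit dropped. `Decimal` is used so that values such as 2.5305 come out exactly; all names are hypothetical.

```python
from decimal import Decimal, ROUND_DOWN

lowest = Decimal("2.531")       # lowest data value
interval = Decimal("0.005")     # cell interval
unit = Decimal("0.001")         # precision of the recorded data (assumed)
cells = 9                       # number of cells in Table 4-6

# Lowest midpoint: X_l + i/2, truncated (not rounded) to the data precision.
first_mid = (lowest + interval / 2).quantize(unit, rounding=ROUND_DOWN)

# With an odd interval, (interval - unit) / 2 is the span on each side of the midpoint.
half_width = (interval - unit) / 2

table = []
for k in range(cells):
    mid = first_mid + k * interval
    # True boundaries extend half a unit beyond the outermost data values.
    lower_true = mid - half_width - unit / 2
    upper_true = mid + half_width + unit / 2
    table.append((lower_true, mid, upper_true))

for lower_true, mid, upper_true in table:
    print(lower_true, mid, upper_true)
```

Consecutive cells share a true boundary (for example 2.5355), so no observation recorded to three decimals can fall on a boundary.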


    The bar chart can also represent frequency distributions, as shown in Figure 4-5a using the data of Table 4-1. As mentioned previously, the bar chart is theoretically correct for discrete data but is not commonly used.

    The polygon or frequency polygon is another graphic way of presenting frequency distributions and is illustrated in Figure 4-5b using the data of Table 4-6. It is constructed by placing a dot over each cell midpoint at the height indicated for each frequency. The curve is extended at each end in order for the figure to be enclosed.

    The graph that is used to present the frequency of all values less than the upper cell boundary of a given cell is called a cumulative frequency or ogive. Figure 4-5c shows a cumulative frequency distribution curve for the data in Table 4-6. The cumulative value for each cell is plotted on the graph and joined by a straight line. The true upper cell boundary is labeled on the abscissa except for the 1st cell, which also has the true lower boundary.


    A frequency distribution is sufficient for many problems; however, for a broad range of problems a graphical technique is either undesirable or needs the additional information provided by analytical techniques. Analytical methods of describing a collection of data have the advantage of occupying less space than a graph. They also allow for comparisons between collections of data and for additional calculations and inferences. There are 2 principal analytical methods of describing a collection of data: measures of central tendency and measures of dispersion.

    A measure of central tendency of a distribution is a numerical value that describes the central position of the data or how the data tend to build up in the center. There are three measures: the average, the median and the mode.

    The average is the sum of the observations divided by the number of observations. There are 3 different techniques available for calculating the average: ungrouped data, grouped data and weighted average.

    Ungrouped Data:

    $$ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n} $$

    where: $\bar{X}$ = average
           n = number of observed values
           X1, X2, ..., Xn = observed value identified by the subscript 1, 2, ..., n or general subscript i
           $\sum$ = sum of

    Sample Problem: A technician checks the resistance value of 5 coils and records the values in ohms: x1 = 3.35, x2 = 3.37, x3 = 3.28, x4 = 3.34 and x5 = 3.30. Determine the average.

    Solution: $\bar{X}$ = (3.35 + 3.37 + 3.28 + 3.34 + 3.30) / 5 = 3.33

    Grouped Data:

    $$ \bar{X} = \frac{\sum_{i=1}^{h} f_i X_i}{n} $$

    where: n = sum of the frequencies
           fi = frequency in a cell or frequency of an observed value
           Xi = cell midpoint or an observed value
           h = number of cells or number of observed values

    Sample Problem: Given the frequency distribution of the life of 320 tires in 1000 km as shown below (Table 4-7), determine the average.

    Boundaries     Midpoint (Xi)   Frequency (fi)   fiXi
    23.6 - 26.5    25.0              4               100
    26.6 - 29.5    28.0             36              1008
    29.6 - 32.5    31.0             51              1581
    32.6 - 35.5    34.0             63              2142
    35.6 - 38.5    37.0             58              2146
    38.6 - 41.5    40.0             52              2080
    41.6 - 44.5    43.0             34              1462
    44.6 - 47.5    46.0             16               736
    47.6 - 50.5    49.0              6               294
    Total                          n = 320          sum of fiXi = 11,549

    Solution: $\bar{X}$ = 11,549 / 320 = 36.1 (which is in 1000 km) = 36.1 x 10^3 km
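The grouped average is simply the frequency-weighted sum of the cell midpoints divided by the total frequency. A small sketch using the Table 4-7 values (variable names are illustrative):

```python
# Cell midpoints (in 1000 km) and cell frequencies from Table 4-7.
midpoints = [25.0, 28.0, 31.0, 34.0, 37.0, 40.0, 43.0, 46.0, 49.0]
frequencies = [4, 36, 51, 63, 58, 52, 34, 16, 6]

n = sum(frequencies)                                         # sum of the frequencies
total = sum(f * x for f, x in zip(frequencies, midpoints))   # sum of f_i * X_i
average = total / n

print(n, total, round(average, 1))
```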

    Weighted Average is used when a number of averages are combined with different frequencies.

    $$ \bar{X}_w = \frac{\sum_{i=1}^{n} w_i \bar{X}_i}{\sum_{i=1}^{n} w_i} $$

    where: Xw = weighted average
           wi = weight of the ith average

    Sample Problem: Tensile tests on aluminum alloy rods are conducted at three different times, which results in three different average values in megapascals (MPa). On the 1st occasion, 5 tests are conducted with an average of 207 MPa; on the 2nd occasion, 6 tests, with an average of 203 MPa; and on the last occasion, 3 tests, with an average of 206 MPa. Determine the weighted average.

    Solution: Xw = [5(207) + 6(203) + 3(206)] / (5 + 6 + 3) = 205 MPa
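The tensile-test problem can be checked with a short sketch in which each occasion's average is weighted by its number of tests. The variable names are hypothetical.

```python
# Average tensile strength (MPa) on each occasion and the number of tests behind it.
averages = [207, 203, 206]
weights = [5, 6, 3]

weighted_average = sum(w * x for w, x in zip(weights, averages)) / sum(weights)

print(round(weighted_average))
```

Note that the simple mean of the three averages would be about 205.3; weighting pulls the result toward the second occasion, which has the most tests.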


    The median is defined as the value which divides a series of ordered observations so that the number of items above it is equal to the number below it.

    Ungrouped Technique: two situations are possible in determining the median of a series of ungrouped data: when the number in the series is odd and when the number in the series is even.

    When the number in the series is odd, the median is the midpoint of the values. Thus, the ordered set of numbers 3, 4, 5, 6, 8, 8, and 10 has a median of 6, and the ordered set of numbers 22, 24, 24, 24, and 30 has a median of 24.

    When the number in the series is even, the median is the average of the two middle numbers. Thus, the ordered set of numbers 3, 4, 5, 6, 8, and 8 has a median that is the average of 5 and 6, which is 5.5.

    If both middle numbers are the same, as in the ordered set of numbers 22, 24, 24, 24, 30 and 30, it is still computed as the average of the two middle numbers, since (24 + 24) / 2 = 24.
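The odd and even cases can be checked with Python's standard `statistics.median`, which follows the same rule (middle value for an odd count, average of the two middle values for an even count):

```python
from statistics import median

odd_a = [3, 4, 5, 6, 8, 8, 10]       # 7 values: the middle one is the median
odd_b = [22, 24, 24, 24, 30]         # 5 values
even_a = [3, 4, 5, 6, 8, 8]          # 6 values: average of 5 and 6
even_b = [22, 24, 24, 24, 30, 30]    # 6 values: both middle numbers are 24

print(median(odd_a), median(odd_b), median(even_a), median(even_b))
```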

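    Grouped Technique: the median is obtained by finding the cell that has the middle number and then interpolating within the cell. The interpolation formula for computing the median is

    $$ M_d = L_m + \left( \frac{n/2 - CF}{f_m} \right) i $$

    where: Md = median
           Lm = lower boundary of the cell with the median
           n = total number of observations
           CF = cumulative frequency of all cells below Lm
           fm = frequency of the cell with the median
           i = cell interval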


    The average is the most commonly used measure of central tendency. It is used when the distribution is

    symmetrical or not appreciably skewed to the left or right; and when additional statistics, such as measures of dispersion,

    control charts, etc. are to be computed based on the average; and when a stable value is needed for inductive statistics.

    The median becomes an effective measure of central tendency when the distribution is positively (to the right) or negatively (to the left) skewed. It is used when an exact midpoint of a distribution is desired. When a distribution has extreme values, the average will be adversely affected while the median will remain unchanged. Thus, in a series of numbers such as 12, 13, 14, 15, 16, the median and the average are identical (14). However, if the first value is changed to a 2, the median remains at 14, but the average becomes 12.
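The effect of an extreme value on the two measures is easy to reproduce with the standard `statistics` module:

```python
from statistics import mean, median

original = [12, 13, 14, 15, 16]
with_extreme = [2, 13, 14, 15, 16]   # first value changed from 12 to 2

print(mean(original), median(original))          # both measures agree
print(mean(with_extreme), median(with_extreme))  # only the average shifts
```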

    The mode is used when a quick and approximate measure of the central tendency is desired. Thus, the mode of

    a histogram is easily found by a visual examination.

    Other measures of central tendency are the geometric mean, harmonic mean and quadratic mean.

    Measure          Situation                Purpose                       Further Computation
    Mode             nominal data             most occurring                terminal, since it can no longer be used in advanced statistics
    Median           ordinal data             positional average            terminal, since it can no longer be used in advanced statistics
    Mean (Average)   interval or ratio data   value of single observation   best measure, especially for inferential statistics

    Measures of dispersion measure how spread out the observations are from the mean. They indicate to what degree the individual observations are dispersed or spread out around their mean, and describe how the data are spread out or scattered on each side of the central value. They include the range, the standard deviation and the variance.

    The range is the difference between the highest observation and the lowest observation. Its only advantage is that it is easy to calculate; it does not give a real picture of the data set because it considers only the two extreme values, the lowest and the highest.

    The interquartile range is the difference between the 1st quartile and the 3rd quartile, that is, Q3 - Q1.

    The mean deviation is the average amount by which the observations in a data set vary from the mean and is also called the average deviation. It requires that we subtract the mean from each of the observations and then average the absolute differences. That is,

    $$ \text{mean deviation} = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n} $$

    The variance measures the tendency for the individual observations to deviate from their mean, that is, the spreading tendency of the data.

    It is the mean of the squared deviations from the mean. We find the amount by which each observation deviates from the mean, then square those deviations and find the average of those squared deviations.

    For a sample it has the symbol s^2.

    The standard deviation is by far the most generally useful measure of variation.

    It is a numerical value in the units of the observed values that measures the spreading tendency of the data.

    It is simply the square root of the variance.

    As a measure of dispersion, the variance & the standard deviation.

    A large standard deviation shows greater variability than does a smaller standard deviation.

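    Ungrouped Technique:

    $$ s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2}{n(n - 1)}} $$

    where: s = sample standard deviation
           Xi = observed value
           $\bar{X}$ = average
           n = number of observed values

    Example: Determine the standard deviation of the moisture content of kraft paper, given the following readings: 6.7, 6.0, 6.4, 6.4, 5.9 and 5.8.

    Solution: With the sum of the readings equal to 37.2 and the sum of their squares equal to 231.26, s = sqrt{[6(231.26) - (37.2)^2] / [6(6 - 1)]} = 0.35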
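The moisture-content example can be verified numerically. The sketch below uses the computational form that appears in the solution, s = sqrt{[n ΣX² - (ΣX)²] / [n(n - 1)]}, and compares it with `statistics.stdev`, which computes the sample standard deviation from the deviations about the mean.

```python
import math
import statistics

readings = [6.7, 6.0, 6.4, 6.4, 5.9, 5.8]   # moisture content readings

n = len(readings)
sum_x = sum(readings)                        # about 37.2
sum_x2 = sum(x * x for x in readings)        # about 231.26

# Computational form of the sample standard deviation.
s = math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

print(round(s, 2), round(statistics.stdev(readings), 2))
```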

    Grouped Technique:

    Skewness refers to the shape of the data distribution. It shows the tendency of the distribution to tail off either to the left or to the right. There are 3 shapes: the normal curve, the positively skewed curve and the negatively skewed curve.

    The normal curve is a symmetric distribution where the mean, mode and median are equal. It is bell-shaped. It is asymptotic, meaning it does not and will never intersect the horizontal line. In the positively skewed curve, the curve tails off to the right, which means the mean (average) is greater than the median, which in turn is greater than the mode. On the other hand, the negatively skewed curve tails off to the left, where the mode is greater than the median, which in turn is greater than the mean. To determine the shape of the frequency distribution, we have the formula:

    For ungrouped data only:

    And for grouped data:


    Kurtosis refers to the height of the data distribution.

    It illustrates the peakedness or flatness of the data values.

    Curves may be mesokurtic, platykurtic or leptokurtic.

    Mesokurtic is the normal curve. Curves which are characteristically flat are considered platykurtic and are the most dispersed, while those which are peaked or tall are leptokurtic and are generally less scattered or variable.

    To determine the peakedness or flatness of a frequency distribution, we have the coefficient of kurtosis:

    For ungrouped data only:

    And for grouped data:


    Quartiles divide a set of data into 4 equal parts.

    Ungrouped Technique: Example: 72 74 75 77 78 79 82 85 86 90 93 & 94

    Q1 = (1/4)(12+1) = 3.25th = 75 + 0.25(77-75) = 75.5

    Q2 = (2/4)(12+1) = 6.5th = 79 + 0.5(82-79) = 80.5

    Q3 = (3/4)(12+1) = 9.75th = 86 + 0.75(90-86) = 89
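The positional method used in these examples, locating the (k/parts)(n + 1)th position and interpolating between the neighbouring ordered values, can be sketched as below. The function names are hypothetical, and the sketch only handles positions from 1 to n; positions below the first observation are deliberately left out.

```python
def value_at_position(ordered, position):
    """Interpolate the value at a 1-based, possibly fractional, position."""
    if not 1 <= position <= len(ordered):
        raise ValueError("position must lie between 1 and n")
    whole = int(position)            # e.g. 3 for the 3.25th position
    fraction = position - whole      # e.g. 0.25
    lower = ordered[whole - 1]
    if fraction == 0:
        return lower
    upper = ordered[whole]
    return lower + fraction * (upper - lower)

def quantile(ordered, k, parts):
    """k-th quantile when the data are split into `parts` equal parts."""
    position = k / parts * (len(ordered) + 1)
    return value_at_position(ordered, position)

data = [72, 74, 75, 77, 78, 79, 82, 85, 86, 90, 93, 94]

quartiles = [quantile(data, k, 4) for k in (1, 2, 3)]
print(quartiles)
```

The same function covers deciles (`parts=10`) and percentiles (`parts=100`) wherever the computed position falls inside the data.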

    Grouped Technique: Example:

    Boundary        Frequency   Cum. Frequency
    49.5 - 58.5      3           3
    59.5 - 68.5      7          10
    69.5 - 78.5     18          28
    79.5 - 88.5     12          40
    89.5 - 98.5      8          48
    99.5 - 108.5     2          50

    $$ Q_1 = L_{q1} + \left( \frac{n/4 - CF}{f_{q1}} \right) i $$

    where: Lq1 = lower boundary where Q1 is found
           CF = lower cumulative frequency
           fq1 = frequency of Q1
           i = interval

    Q1 = 69.5 + {[(50/4)-10]/18}10 = 70.8889

    Q2 = 69.5 + {[(50/2)-10]/18}10 = 77.8333

    Q3 = 79.5 + {[(3*50/4)-28]/12}10 = 87.4167
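The grouped interpolation can be written once and reused for quartiles, deciles and percentiles. The sketch below assumes the pattern of the worked examples: find the first cell whose cumulative frequency reaches k·n/parts, then interpolate from that cell's lower boundary. The function name and argument layout are hypothetical.

```python
def grouped_quantile(cells, interval, k, parts):
    """Interpolated k-th quantile for grouped data.

    cells: list of (lower_boundary, frequency) in ascending order.
    """
    n = sum(f for _, f in cells)
    target = k * n / parts          # e.g. n/4 for Q1, 4n/10 for D4
    cum_before = 0                  # cumulative frequency below the current cell
    for lower, f in cells:
        if cum_before + f >= target:
            return lower + (target - cum_before) / f * interval
        cum_before += f
    raise ValueError("k must not exceed parts")

# Lower boundaries and frequencies from the example table; interval = 10.
cells = [(49.5, 3), (59.5, 7), (69.5, 18), (79.5, 12), (89.5, 8), (99.5, 2)]

q1 = grouped_quantile(cells, 10, 1, 4)
q2 = grouped_quantile(cells, 10, 2, 4)
q3 = grouped_quantile(cells, 10, 3, 4)
print(round(q1, 4), round(q2, 4), round(q3, 4))
```

Calling it with `parts=10` or `parts=100` gives the grouped decile and percentile calculations in the same way.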

    Deciles divide a set of data into 10 equal parts.

    Ungrouped Technique: Example: 72 74 75 77 78 79 82 85 86 90 93 & 94

    D3 = (3/10)(12+1) = 3.9th = 75 + 0.9(77-75) = 76.8

    D6 = (6/10)(12+1) = 7.8th = 82 + 0.8(85-82) = 84.4

    Grouped Technique: Example:

    Boundary        F     Cum F
    49.5 - 58.5      3     3
    59.5 - 68.5      7    10
    69.5 - 78.5     18    28
    79.5 - 88.5     12    40
    89.5 - 98.5      8    48
    99.5 - 108.5     2    50

    $$ D_4 = L_{d4} + \left( \frac{4n/10 - CF}{f_{d4}} \right) i $$

    where: Ld4 = lower boundary where D4 is found
           CF = lower cumulative frequency
           fd4 = frequency of D4
           i = interval

    D4 = 69.5 + {[(4*50/10)-10]/18}10 = 75.056

    Percentiles divide a set of data into 100 equal parts.

    Ungrouped Technique: Example: 72 74 75 77 78 79 82 85 86 90 93 & 94

    P3 = (3/100)(12+1) = 0.39th = 0.39(72) = 28.08

    P16 = (16/100)(12+1) = 2.08th = 74 + 0.08(75-74) = 74.08

    Grouped Technique: Example:

    Boundary        F     Cum F
    49.5 - 58.5      3     3
    59.5 - 68.5      7    10
    69.5 - 78.5     18    28
    79.5 - 88.5     12    40
    89.5 - 98.5      8    48
    99.5 - 108.5     2    50

    $$ P_4 = L_{p4} + \left( \frac{4n/100 - CF}{f_{p4}} \right) i $$

    where: Lp4 = lower boundary where P4 is found
           CF = lower cumulative frequency
           fp4 = frequency of P4
           i = interval

    P4 = 49.5 + {[(4*50/100)-0]/3}10 = 56.17