w5inse6220

Upload: jaideep-singh

Post on 09-Mar-2016

  • INSE 6220 -- Week 5: Advanced Statistical Approaches to Quality

    Dr. A. Ben Hamza, Concordia University

    Topics: Process capability; More on Hypothesis Testing; More on Statistical Inference; More on Control Charts: X-bar, R, and S control charts

    [Figure: S chart. Vertical axis: Standard Deviation; horizontal axis: Sample Number; lines marked UCL, CL, and LCL.]

    Process capability analysis

    1. Compute the mean of the sample means, $\bar{\bar{X}}$.

    2. Compute the mean of the sample ranges, $\bar{R}$.

    3. Estimate the population standard deviation: $\hat{\sigma}_x = \bar{R} / d_2$

    4. Estimate the natural tolerance of the process: Natural tolerance = $6\hat{\sigma}_x$

    5. Determine the specification limits: USL = upper specification limit, LSL = lower specification limit

    Process capability analysis (cont.)

    6. Compute capability indices:

    Process capability potential: $C_p = (\mathrm{USL} - \mathrm{LSL}) / (6\hat{\sigma}_x)$

    Upper capability index: $C_{pU} = (\mathrm{USL} - \bar{\bar{X}}) / (3\hat{\sigma}_x)$

    Lower capability index: $C_{pL} = (\bar{\bar{X}} - \mathrm{LSL}) / (3\hat{\sigma}_x)$

    Process capability index: $C_{pk} = \min(C_{pU}, C_{pL})$
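    The six steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function name and all numeric inputs (grand mean, mean range, specification limits) are hypothetical, and d2 = 2.326 is the standard table value for subgroups of size n = 5.

```python
def capability_indices(xbarbar, rbar, d2, usl, lsl):
    """Return (sigma_hat, Cp, CpU, CpL, Cpk) from X-bar/R chart summaries."""
    sigma_hat = rbar / d2                      # step 3: estimate sigma from the mean range
    cp = (usl - lsl) / (6 * sigma_hat)         # process capability potential
    cpu = (usl - xbarbar) / (3 * sigma_hat)    # upper capability index
    cpl = (xbarbar - lsl) / (3 * sigma_hat)    # lower capability index
    cpk = min(cpu, cpl)                        # process capability index
    return sigma_hat, cp, cpu, cpl, cpk

# Hypothetical inputs (not from the lecture); d2 = 2.326 corresponds to n = 5.
sigma_hat, cp, cpu, cpl, cpk = capability_indices(
    xbarbar=10.0, rbar=0.4652, d2=2.326, usl=10.9, lsl=9.4)
print(round(sigma_hat, 3), round(cp, 3), round(cpu, 3), round(cpl, 3), round(cpk, 3))
```

    Because Cpk takes the smaller of the two one-sided indices, it falls below Cp whenever the process mean is off-center between the specification limits, as it is in this made-up case.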

    Control Charts

    Suppose we have a general statistic W. We plot W over time and specify control limits of the form

    $UCL = \mu_W + 3\sigma_W$
    $CL = \mu_W$
    $LCL = \mu_W - 3\sigma_W$

    where $\mu_W$ is the mean of W and $\sigma_W$ is the standard deviation of W.

    A control chart based on a number of standard deviations of the statistic from the mean of the statistic is called a Shewhart control chart.

    Some commonly used Ws: X-bar (average), R (range), s (standard deviation).

    We can also specify control charts using probability limits.

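  • X-bar and R Charts

    $\bar{x}$ chart:
    $UCL = \bar{\bar{x}} + A_2\bar{R}$
    Central line $= \bar{\bar{x}}$
    $LCL = \bar{\bar{x}} - A_2\bar{R}$

    R chart:
    $UCL = D_4\bar{R}$
    Central line $= \bar{R}$
    $LCL = D_3\bar{R}$

    where

    $\bar{\bar{x}} = (\bar{x}_1 + \bar{x}_2 + \dots + \bar{x}_m) / m$ estimates the process mean,
    $\bar{R} = (R_1 + R_2 + \dots + R_m) / m$, with $R = x_{\max} - x_{\min}$ for each sample,

    and typically n = 4~6 observations per sample and m = 20~25 samples.

    To find the control limits, we need to estimate the variance, or standard deviation. The constants A2, D3, and D4 depend on the sample size n and are read from standard tables.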
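    As a rough sketch of how the X-bar and R chart limits are computed from raw subgroups, the following Python snippet applies the formulas above. The data are made up, and only three subgroups are used to keep the example short (the slide recommends m = 20~25). The constants are the standard table values for n = 4; the function name is an illustrative choice.

```python
# Control chart constants for subgroup size n = 4 (standard table values).
A2, D3, D4 = 0.729, 0.0, 2.282

def xbar_r_limits(samples):
    """samples: list of subgroups, each a list of n = 4 measurements.

    Returns ((LCL, CL, UCL) for the X-bar chart, (LCL, CL, UCL) for the R chart).
    """
    xbars = [sum(s) / len(s) for s in samples]       # subgroup means
    ranges = [max(s) - min(s) for s in samples]      # subgroup ranges
    xbarbar = sum(xbars) / len(xbars)                # grand mean
    rbar = sum(ranges) / len(ranges)                 # mean range
    xbar_chart = (xbarbar - A2 * rbar, xbarbar, xbarbar + A2 * rbar)
    r_chart = (D3 * rbar, rbar, D4 * rbar)
    return xbar_chart, r_chart

# Hypothetical measurements, not from the lecture.
data = [[10.1, 9.9, 10.0, 10.2],
        [9.8, 10.0, 10.1, 9.9],
        [10.0, 10.3, 9.9, 10.0]]
xbar_chart, r_chart = xbar_r_limits(data)
print("X-bar chart (LCL, CL, UCL):", [round(v, 4) for v in xbar_chart])
print("R chart     (LCL, CL, UCL):", [round(v, 4) for v in r_chart])
```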

    Control Charts for X-bar and s

    $UCL = \mu_s + 3\sigma_s$
    $CL = \mu_s$
    $LCL = \mu_s - 3\sigma_s$

    If $X_1, X_2, \dots, X_n$ is a random sample from a $N(\mu, \sigma^2)$ population, then $E(s^2) = \sigma^2$ but $E(s) \neq \sigma$.

  • Example

    [Slides 9 to 12: worked example; the figures are not preserved in the transcript.]

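  • Summary of Control Charts

    X-bar & R chart, with $\hat{\sigma} = \bar{R} / d_2$:

    X-bar chart: $UCL = \bar{\bar{x}} + A_2\bar{R}$, $CL = \bar{\bar{x}}$, $LCL = \bar{\bar{x}} - A_2\bar{R}$
    R chart: $UCL = D_4\bar{R}$, $CL = \bar{R}$, $LCL = D_3\bar{R}$

    X-bar & S chart, with $\hat{\sigma} = \bar{S} / c_4$:

    X-bar chart: $UCL = \bar{\bar{x}} + A_3\bar{S}$, $CL = \bar{\bar{x}}$, $LCL = \bar{\bar{x}} - A_3\bar{S}$
    S chart: $UCL = B_4\bar{S}$, $CL = \bar{S}$, $LCL = B_3\bar{S}$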
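    The X-bar & S rows of the summary can be sketched the same way. The snippet below is an illustration under stated assumptions: the data are hypothetical, the constants are the standard table values for n = 4, and the function name is invented for this example.

```python
from statistics import mean, stdev

# Control chart constants for subgroup size n = 4 (standard table values).
A3, B3, B4, C4 = 1.628, 0.0, 2.266, 0.9213

def xbar_s_limits(samples):
    """Return X-bar chart limits, S chart limits (each LCL, CL, UCL) and sigma_hat."""
    xbarbar = mean(mean(s) for s in samples)   # grand mean
    sbar = mean(stdev(s) for s in samples)     # mean sample std dev (n - 1 divisor)
    xbar_chart = (xbarbar - A3 * sbar, xbarbar, xbarbar + A3 * sbar)
    s_chart = (B3 * sbar, sbar, B4 * sbar)
    sigma_hat = sbar / C4                      # corrects for E(s) != sigma
    return xbar_chart, s_chart, sigma_hat

# Hypothetical measurements, not from the lecture.
data = [[10.1, 9.9, 10.0, 10.2],
        [9.8, 10.0, 10.1, 9.9],
        [10.0, 10.3, 9.9, 10.0]]
xbar_chart, s_chart, sigma_hat = xbar_s_limits(data)
print("X-bar chart (LCL, CL, UCL):", [round(v, 4) for v in xbar_chart])
print("S chart     (LCL, CL, UCL):", [round(v, 4) for v in s_chart])
print("sigma_hat:", round(sigma_hat, 4))
```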

    Example: S charts with MATLAB

    This example plots an S chart of measurements on newly machined parts, taken at one-hour intervals for 36 hours. Each row of the runout matrix contains the measurements for 4 parts chosen at random. The values indicate, in thousandths of an inch, the amount the part radius differs from the target radius.

    >> load parts
    >> controlchart(runout,'chart','xbar','sigma','std');
    >> controlchart(runout,'chart','s','sigma','std');

    Hypothesis Testing

    A hypothesis test is a procedure for determining if an assertion about a characteristic of a population is reasonable.

    Example 1: The mean monthly cell phone bill in this city is μ = $42.

    Example 2: The proportion of adults in this city with cell phones is p = 0.68.

    Example 3: Suppose that someone says that the average price of a liter of regular unleaded gas in Montreal is $1.10. How would you decide whether this statement is true? You could try to find out what every gas station in the city was charging and how many liters they were selling at that price. That approach might be definitive, but it could end up costing more than the information is worth. A simpler approach is to find out the price of gas at a small number of randomly chosen stations around the city and compare the average price to $1.10. Of course, the average price you get will probably not be exactly $1.10 due to variability in price from one station to the next. Suppose your average price was $1.18. Is this eight-cent difference a result of chance variability, or is the original assertion incorrect? A hypothesis test can provide an answer.

    Null hypothesis (H0, pronounced "H nought"): H0: μ = 1.10
    Alternative hypothesis: H1: μ ≠ 1.10

    Hypothesis Test Terminology: review

    The significance level is related to the degree of certainty you require in order to reject the null hypothesis in favor of the alternative. By taking a small sample you cannot be certain about your conclusion. So you decide in advance to reject the null hypothesis if the probability of observing your sampled result is less than the significance level. For a typical significance level of 5%, the notation is α = 0.05. For this significance level, the probability of incorrectly rejecting the null hypothesis when it is actually true is 5%. If you need more protection from this error, then choose a lower value of α.

    The p-value is the probability of observing the given sample result under the assumption that the null hypothesis is true. If the p-value is less than α, then you reject the null hypothesis. For example, if α = 0.05 and the p-value is 0.03, then you reject the null hypothesis. The converse does not hold: if the p-value is greater than α, you do not accept the null hypothesis; you merely have insufficient evidence to reject it.

    The outputs for many hypothesis test functions also include confidence intervals. Loosely speaking, a confidence interval is a range of values that have a chosen probability of containing the true hypothesized quantity. Suppose, in the example, 1.10 is inside a 95% confidence interval for the mean, μ. That is equivalent to being unable to reject the null hypothesis at a significance level of 0.05. Conversely, if the 100(1 − α)% confidence interval does not contain 1.10, then you reject the null hypothesis at the α level of significance.

  • Inference on the mean of a population, variance known

    $H_0: \mu = \mu_0$
    $H_1: \mu \neq \mu_0$    (3-22)

    $Z_0 = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$    (3-23)

    H1 in equation (3-22) is a two-sided alternative hypothesis.

    The procedure for testing this hypothesis is to take a random sample of n observations on the random variable x, compute the test statistic Z0 from equation (3-23), and reject H0 if $|Z_0| > Z_{\alpha/2}$, where $Z_{\alpha/2}$ is the upper α/2 percentage point of the standard normal distribution.

    In some situations we may wish to reject H0 only if the true mean is larger than μ0. Thus, the one-sided alternative hypothesis is $H_1: \mu > \mu_0$, and we would reject $H_0: \mu = \mu_0$ only if $Z_0 > Z_\alpha$. If rejection is desired only when μ < μ0, the alternative hypothesis is $H_1: \mu < \mu_0$, and we reject H0 only if $Z_0 < -Z_\alpha$.

  • Example: Glow Toothpaste. Two-Tailed Tests about a Population Mean: Large n

    The production line for Glow toothpaste is designed to fill tubes of toothpaste with a mean weight of 6 ounces. Periodically, a sample of 30 tubes will be selected in order to check the filling process. Quality assurance procedures call for the continuation of the filling process if the sample results are consistent with the assumption that the mean filling weight for the population of toothpaste tubes is 6 ounces; otherwise the filling process will be stopped and adjusted.

    A hypothesis test about the population mean can be used to help determine when the filling process should continue operating and when it should be stopped and corrected.

    Hypotheses:
    $H_0: \mu = 6$
    $H_1: \mu \neq 6$

    Rejection rule: assuming a 0.05 level of significance, reject H0 if Z0 < -1.96 or if Z0 > 1.96.

    Example: Glow Toothpaste. Two-Tailed Test about a Population Mean: Large n

    Assume that a sample of 30 toothpaste tubes provides a sample mean of 6.1 ounces and standard deviation of 0.2 ounces.

    Let n = 30, $\bar{x}$ = 6.1 ounces, σ = 0.2 ounces.

    $Z_0 = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{6.1 - 6}{0.2 / \sqrt{30}} = 2.74$

    Since 2.74 > 1.96, we reject H0.

    Conclusion: We are 95% confident that the mean filling weight of the toothpaste tubes is not 6 ounces. The filling process should be stopped and the filling mechanism adjusted.
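    The toothpaste calculation can be checked with a short Python sketch. It uses only the standard library; the function name is an illustrative choice, and the critical value is computed from the standard normal distribution rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_sided(xbar, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test on a mean; returns (Z0, critical value, reject H0?)."""
    z0 = (xbar - mu0) / (sigma / sqrt(n))          # test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point of N(0, 1)
    return z0, z_crit, abs(z0) > z_crit

# Values from the Glow toothpaste example.
z0, z_crit, reject = z_test_two_sided(xbar=6.1, mu0=6.0, sigma=0.2, n=30)
print(f"Z0 = {z0:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```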

    Example: Glow Toothpaste. Using the p-Value for a Two-Tailed Hypothesis Test

    Suppose we define the p-value for a two-tailed test as double the area found in the tail of the distribution. With Z0 = 2.74, the standard normal probability table shows:

    $1 - \Phi(2.74) = 1 - 0.996928 = 0.0031$

    Considering the same probability of a larger difference in the lower tail of the distribution, we have

    p-value = 2(0.0031) = 0.0062

    The p-value 0.0062 is less than α = 0.05, so H0 is rejected.
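    As a quick numerical check of the two-tailed p-value, the following sketch evaluates the standard normal cdf directly. The function name is illustrative. The result differs slightly from 0.0062 only because the slide rounds the tail area to 0.0031 before doubling.

```python
from statistics import NormalDist

def two_tailed_p_value(z0):
    """p-value = 2 * [1 - Phi(|Z0|)] for a two-tailed z-test."""
    return 2 * (1 - NormalDist().cdf(abs(z0)))

p = two_tailed_p_value(2.74)
print(f"p-value = {p:.4f}; reject H0 at alpha = 0.05: {p < 0.05}")
```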

    Confidence Interval Approach to a Two-Tailed Test about a Population Mean

    Select a simple random sample from the population and use the value of the sample mean $\bar{x}$ to develop the confidence interval for the population mean μ. If the confidence interval contains the hypothesized value μ0, do not reject H0. Otherwise, reject H0.

    The 95% confidence interval for μ is

    $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 6.1 \pm 1.96 (0.2 / \sqrt{30}) = 6.1 \pm 0.0716$

    or 6.0284 to 6.1716.

    Since the hypothesized value for the population mean, μ0 = 6, is not in this interval, the hypothesis-testing conclusion is that the null hypothesis, H0: μ = 6, can be rejected.
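    The interval can be reproduced with a small sketch that mirrors the formula above. The function name is hypothetical; the inputs are the toothpaste values.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided z-based confidence interval for a population mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

lo, hi = mean_confidence_interval(xbar=6.1, sigma=0.2, n=30)
contains_mu0 = lo <= 6.0 <= hi    # False here, so H0: mu = 6 is rejected
print(f"95% CI: ({lo:.4f}, {hi:.4f}); contains mu0 = 6: {contains_mu0}")
```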

  • Inference on the mean of a normal distribution with variance unknown

    The test statistic is $t_0 = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$.

    For the two-sided alternative hypothesis, reject H0 if $|t_0| > t_{\alpha/2, n-1}$, where $t_{\alpha/2, n-1}$ is the upper α/2 percentage point of the t distribution with n − 1 degrees of freedom.

    For the one-sided alternative hypotheses:
    If $H_1: \mu > \mu_0$, reject H0 if $t_0 > t_{\alpha, n-1}$, and
    if $H_1: \mu < \mu_0$, reject H0 if $t_0 < -t_{\alpha, n-1}$.

    One could also compute the P-value for a t-test.

    Confidence interval on the mean of a normal distribution with variance unknown

    p-value for the t-test:

    p-value = $2[1 - F(|t_0|)]$ for a two-tailed test
    p-value = $1 - F(t_0)$ for an upper-tailed test
    p-value = $F(t_0)$ for a lower-tailed test

    where F is the cdf of the t-distribution with n − 1 degrees of freedom.

  • Inference on a population proportion

    Hypothesis testing

    Confidence intervals on a population proportion

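    The probability of type II error and sample size decisions

    For the two-sided test, with $\delta = \mu - \mu_0$ the true shift in the mean:

    $\beta = \Phi\left(Z_{\alpha/2} - \frac{\delta\sqrt{n}}{\sigma}\right) - \Phi\left(-Z_{\alpha/2} - \frac{\delta\sqrt{n}}{\sigma}\right)$

    Sample size calculation for two-tailed tests:

    $n \approx \frac{(Z_{\alpha/2} + Z_\beta)^2 \sigma^2}{\delta^2}$, where $\delta = \mu - \mu_0$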
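    A minimal sketch of the sample size calculation follows, assuming the usual two-tailed approximation n ≈ (Z_{α/2} + Z_β)² σ² / δ². The function name and the inputs (α = 0.05, β = 0.10, σ = 0.2, δ = 0.1) are hypothetical choices for illustration, not values from the lecture.

```python
from math import ceil
from statistics import NormalDist

def sample_size_two_tailed(alpha, beta, sigma, delta):
    """Approximate n so a two-tailed z-test detects a shift delta with power 1 - beta."""
    z_alpha_2 = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point
    z_beta = NormalDist().inv_cdf(1 - beta)           # upper beta point
    n = (z_alpha_2 + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return ceil(n)                                    # round up to a whole sample

# Hypothetical design values, not from the lecture.
n_required = sample_size_two_tailed(alpha=0.05, beta=0.10, sigma=0.2, delta=0.1)
print("required sample size:", n_required)
```

    Rounding up rather than to the nearest integer keeps the type II error at or below the requested β.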

  • Statistical inference for two samples

    Inference for a difference in means, variances known

    Hypothesis tests for a difference in means, variances known

    Confidence interval on a difference in means, variances known

    Inference for a difference in means of two normal distributions: variances unknown

    Hypothesis tests for the difference in means

    Example 3.9

    The top figure shows comparative box plots for the yield data for the two types of catalysts. These comparative boxplots indicate that there is no obvious difference in the median of the two samples, although the second sample has a slightly larger sample dispersion or variance. There are no exact rules for comparing two samples with boxplots; their primary value is in the visual impression they provide as a tool for explaining the results of a hypothesis test, as well as in the verification of assumptions.

    [Figure: comparative box plots. Vertical axis: Yield; horizontal axis: Catalyst type (1, 2).]

    The bottom figure shows the normal probability plot of the two samples of yield data. Note that both samples plot approximately along straight lines, and the straight lines for each sample have similar slopes (i.e. similar standard deviations). Hence, we conclude that the normality and equal variances assumptions are reasonable.

    [Figure: Normal Probability Plot. Vertical axis: Probability; horizontal axis: Data; series: catalyst 1, catalyst 2.]

  • Pooled-Variance t-Test Example

    You are a financial analyst for a brokerage firm. Is there a difference in dividend yield between stocks listed on the NYSE & NASDAQ? You collect the following data:

                      NYSE    NASDAQ
    Number            21      25
    Sample mean       3.27    2.53
    Sample std dev    1.30    1.16

    Assuming both populations are approximately normal with equal variances, is there a difference in mean yield (α = 0.05)?

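    Pooled-Variance t Test Example: Calculating the Test Statistic (continued)

    $H_0: \mu_1 - \mu_2 = 0$, i.e. ($\mu_1 = \mu_2$)
    $H_1: \mu_1 - \mu_2 \neq 0$, i.e. ($\mu_1 \neq \mu_2$)

    The pooled variance is:

    $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{(21 - 1)1.30^2 + (25 - 1)1.16^2}{(21 - 1) + (25 - 1)} = 1.5021$

    The test statistic is:

    $t_0 = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(3.27 - 2.53) - 0}{\sqrt{1.5021 \left(\frac{1}{21} + \frac{1}{25}\right)}} = 2.040$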

    Pooled-Variance t Test Example: Hypothesis Test Solution

    $H_0: \mu_1 - \mu_2 = 0$, i.e. ($\mu_1 = \mu_2$)
    $H_1: \mu_1 - \mu_2 \neq 0$, i.e. ($\mu_1 \neq \mu_2$)

    α = 0.05 (0.025 in each tail)

    df = 21 + 25 - 2 = 44

    Critical values: t = ±2.0154. Reject H0 if t0 < -2.0154 or t0 > 2.0154.

    Test statistic:

    $t_0 = \frac{3.27 - 2.53}{\sqrt{1.5021 \left(\frac{1}{21} + \frac{1}{25}\right)}} = 2.040$

    Decision: Reject H0 at α = 0.05, since 2.040 > 2.0154.

    Conclusion: There is evidence of a difference in means.
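    The pooled-variance calculation can be verified with a short sketch. The function name is illustrative. Because the Python standard library has no t-distribution, the critical value 2.0154 is taken from the slide rather than computed.

```python
from math import sqrt

def pooled_t_statistic(x1, s1, n1, x2, s2, n2):
    """Return (pooled variance, t0, degrees of freedom) for H0: mu1 - mu2 = 0."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df   # pooled variance
    t0 = (x1 - x2) / sqrt(sp2 * (1 / n1 + 1 / n2))         # test statistic
    return sp2, t0, df

# NYSE vs. NASDAQ dividend yield data from the example.
sp2, t0, df = pooled_t_statistic(3.27, 1.30, 21, 2.53, 1.16, 25)
T_CRIT = 2.0154                     # t_{0.025, 44}, value quoted on the slide
reject = abs(t0) > T_CRIT
print(f"Sp^2 = {sp2:.4f}, t0 = {t0:.3f}, df = {df}, reject H0: {reject}")
```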

    Pooled-Variance t Test Example: Confidence Interval for μ1 - μ2

    Since we rejected H0, can we be 95% confident that μ_NYSE > μ_NASDAQ?

    95% confidence interval for μ_NYSE - μ_NASDAQ:

    $(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2, n_1 + n_2 - 2} \sqrt{S_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = 0.74 \pm 2.0154 \times 0.3628 = (0.009, 1.471)$

    Since 0 is less than the entire interval, we can be 95% confident that μ_NYSE > μ_NASDAQ.
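    A short sketch of the interval arithmetic is given below, again using the slide's critical value of 2.0154 as a given input. The function name is hypothetical. The lower endpoint is small but positive, which is why the interval excludes 0.

```python
from math import sqrt

def pooled_diff_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """Confidence interval for mu1 - mu2 using the pooled variance."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    std_err = sqrt(sp2 * (1 / n1 + 1 / n2))   # standard error of the difference
    diff = x1 - x2
    return diff - t_crit * std_err, diff + t_crit * std_err

# t_crit = 2.0154 is t_{0.025, 44} as quoted on the slide.
lower, upper = pooled_diff_interval(3.27, 1.30, 21, 2.53, 1.16, 25, t_crit=2.0154)
print(f"95% CI for mu_NYSE - mu_NASDAQ: ({lower:.3f}, {upper:.3f})")
```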