12. inference about two populations

Upload: nurgazy-nazhimidinov

Post on 03-Apr-2018


  • 7/28/2019 12. Inference About Two Populations

    1/79

    1

    Inference about Two Populations

    Introduction

    A variety of techniques are presented whose objective is to compare two populations.

    We are interested in:

    The difference between two means.
    The difference between two proportions.

    INFERENCE ABOUT THE DIFFERENCE BETWEEN TWO SAMPLES: INDEPENDENT SAMPLES

    POPULATION 1                            POPULATION 2
    Parameters: $\mu_1$, $\sigma_1^2$       Parameters: $\mu_2$, $\sigma_2^2$
    Statistics: $\bar{x}_1$, $s_1^2$        Statistics: $\bar{x}_2$, $s_2^2$
    Sample size: $n_1$                      Sample size: $n_2$

    Inference about the Difference between Two Means: Independent Samples

    Two random samples are drawn from the two populations of interest.

    Because we compare two population means, we use the statistic $\bar{X}_1 - \bar{X}_2$.

    The Sampling Distribution of $\bar{X}_1 - \bar{X}_2$

    1. $\bar{X}_1 - \bar{X}_2$ is normally distributed if the (original) population distributions are normal.

    2. $\bar{X}_1 - \bar{X}_2$ is approximately normally distributed if the (original) populations are not normal, but the sample sizes are sufficiently large (greater than 30).

    3. The expected value of $\bar{X}_1 - \bar{X}_2$ is $\mu_1 - \mu_2$.

    4. The variance of $\bar{X}_1 - \bar{X}_2$ is $\sigma_1^2/n_1 + \sigma_2^2/n_2$.

    Making an inference about $\mu_1 - \mu_2$

    If the sampling distribution of $\bar{X}_1 - \bar{X}_2$ is normal or approximately normal, we can write:

    $$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

    Z can be used to build a test statistic or a confidence interval for $\mu_1 - \mu_2$.

    Making an inference about $\mu_1 - \mu_2$

    $$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

    In practice, the Z statistic is hardly ever used, because the population variances $\sigma_1^2$ and $\sigma_2^2$ are not known.

    Instead, we construct a t statistic using the sample variances ($s_1^2$ and $s_2^2$) in place of $\sigma_1^2$ and $\sigma_2^2$.

    Making an inference about $\mu_1 - \mu_2$: $\sigma_1^2$ and $\sigma_2^2$ unknown

    Two cases are considered when producing the t-statistic:

    The two unknown population variances are equal.
    The two unknown population variances are not equal.

    Inference about $\mu_1 - \mu_2$: Equal variances

    Calculate the pooled variance estimate by:

    $$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

    Example: $s_1^2 = 25$; $s_2^2 = 30$; $n_1 = 10$; $n_2 = 15$. Then,

    $$s_p^2 = \frac{(10 - 1)(25) + (15 - 1)(30)}{10 + 15 - 2} = 28.04347$$

    [Figure: $s_1^2$ ($n_1 = 10$), $s_2^2$ ($n_2 = 15$), and the pooled variance estimator $s_p^2$.]
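The pooled variance formula can be checked with a few lines of Python. This is only a minimal sketch: the function name `pooled_variance` is a hypothetical helper, and the inputs are the example values from the slide.

```python
def pooled_variance(s1_sq: float, n1: int, s2_sq: float, n2: int) -> float:
    """Pooled estimate of a common variance from two sample variances."""
    # Each sample variance is weighted by its degrees of freedom (n - 1).
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


# Example values from the slide: s1^2 = 25, s2^2 = 30, n1 = 10, n2 = 15.
sp_sq = pooled_variance(25, 10, 30, 15)
print(sp_sq)  # about 28.0435
```

Because the weights are the degrees of freedom, the pooled estimate is pulled toward the variance of the larger sample.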


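    Inference about $\mu_1 - \mu_2$: Equal variances

    Construct the t-statistic as follows:

    $$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \text{d.f.} = n_1 + n_2 - 2$$

    Perform a hypothesis test:

    $H_0: \mu_1 - \mu_2 = 0$
    $H_1: \mu_1 - \mu_2 > 0$, or $< 0$, or $\neq 0$

    Build a confidence interval:

    $$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\, n_1 + n_2 - 2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

    where $1 - \alpha$ is the confidence level.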

    EXAMPLE

    The statistics obtained from random sampling are given as

    $n_1 = 8$, $\bar{x}_1 = 93$, $s_1 = 20$
    $n_2 = 9$, $\bar{x}_2 = 129$, $s_2 = 24$

    It is thought that $\mu_1 < \mu_2$. Test the appropriate hypothesis assuming normality with $\alpha = 0.01$.

    SOLUTION

    $\sigma_1$ and $\sigma_2$ are unknown, so we use a t-test.

    Because $s_1$ and $s_2$ are not much different from each other, use the equal-variance t-test.

    $H_0: \mu_1 = \mu_2$
    $H_A: \mu_1 < \mu_2$ (or $\mu_1 - \mu_2 < 0$)

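    $$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(7)(20^2) + (8)(24^2)}{8 + 9 - 2} \approx 494$$

    $$t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(93 - 129) - 0}{\sqrt{494\left(\dfrac{1}{8} + \dfrac{1}{9}\right)}} = -3.33$$

    Decision Rule: Reject $H_0$ if $t < -t_{0.01,\, 8+9-2} = -2.602$.

    Conclusion: Since $t = -3.33 < -t_{0.01,\, 8+9-2} = -2.602$, reject $H_0$ at $\alpha = 0.01$.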
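The arithmetic of this example can be reproduced with a short sketch. The helper name `pooled_t_statistic` is hypothetical, and the critical value 2.602 is copied from the slide rather than computed.

```python
import math


def pooled_t_statistic(x1_bar, s1, n1, x2_bar, s2, n2, hypothesized_diff=0.0):
    """Equal-variances t statistic and its degrees of freedom."""
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp_sq = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    standard_error = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    t = ((x1_bar - x2_bar) - hypothesized_diff) / standard_error
    return t, n1 + n2 - 2


# Summary statistics from the example.
t, df = pooled_t_statistic(93, 20, 8, 129, 24, 9)

T_CRITICAL = 2.602  # t_{0.01,15}, taken from the slide
reject_h0 = t < -T_CRITICAL  # left-tailed test, H_A: mu_1 < mu_2

print(round(t, 2), df, reject_h0)
```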

    Test Statistic for $\mu_1 - \mu_2$ when $\sigma_1 \neq \sigma_2$ and unknown

    Test Statistic:

    $$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

    with the degrees of freedom

    $$\nu = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$

    Inference about $\mu_1 - \mu_2$: Unequal variances

    Conduct a hypothesis test as needed, or build a confidence interval:

    $$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\, \nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

    where $1 - \alpha$ is the confidence level and $\nu$ is the number of degrees of freedom for the unequal-variances case.

    Which case to use: Equal variance or unequal variance?

    Whenever there is insufficient evidence that the variances are unequal, it is preferable to perform the equal-variances t-test.

    This is so because, for any two given samples:

    The number of degrees of freedom for the equal-variances case $\geq$ the number of degrees of freedom for the unequal-variances case.

    Example: Making an inference about $\mu_1 - \mu_2$

    Do people who eat high-fiber cereal for breakfast consume, on average, fewer calories for lunch than people who do not eat high-fiber cereal for breakfast?

    A sample of 30 people was randomly drawn. Each person was identified as a consumer or a non-consumer of high-fiber cereal.

    For each person, the number of calories consumed at lunch was recorded.

    Example: Making an inference about $\mu_1 - \mu_2$

    Consumers   Non-cmrs
    568         705
    498         819
    589         706
    681         509
    540         613
    646         582
    636         601
    739         608
    539         787
    596         573
    607         428
    529         754
    637         741
    617         628
    633         537
    555         748
    .           .

    Solution:

    The data are interval.

    The parameter to be tested is the difference between two means.

    The claim to be tested is: The mean caloric intake of consumers ($\mu_1$) is less than that of non-consumers ($\mu_2$).

    Example: Making an inference about $\mu_1 - \mu_2$

    The hypotheses are:

    $H_0: (\mu_1 - \mu_2) = 0$
    $H_1: (\mu_1 - \mu_2) < 0$

    To check whether the population variances are equal, we use computer output to find the sample variances.

    We have $s_1^2 = 1274.49$ and $s_2^2 = 13,386.49$.

    It appears that the variances are unequal.

    Example: Making an inference about $\mu_1 - \mu_2$

    Compute: Manually

    From the data we have:

    $\bar{x}_1 = 595.8$; $\bar{x}_2 = 661.1$
    $s_1 = 35.7$; $s_2 = 115.7$

    $$\text{df} = \frac{(35.7^2/10 + 115.7^2/20)^2}{\dfrac{(35.7^2/10)^2}{10 - 1} + \dfrac{(115.7^2/20)^2}{20 - 1}} = 25.01$$

    Example: Making an inference about $\mu_1 - \mu_2$

    Compute: Manually

    The rejection region is $t < -t_{\alpha,\nu} = -t_{.05,25} \approx -1.708$.

    $$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(595.8 - 661.1) - 0}{\sqrt{\dfrac{35.7^2}{10} + \dfrac{115.7^2}{20}}} = -2.31$$
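The unequal-variances statistic and its degrees of freedom can be verified numerically. The sketch below uses a hypothetical helper, `welch_t_test`, with the summary statistics of the cereal example ($n_1 = 10$, $n_2 = 20$) and the critical value 1.708 copied from the slide.

```python
import math


def welch_t_test(x1_bar, s1, n1, x2_bar, s2, n2, hypothesized_diff=0.0):
    """Unequal-variances t statistic with its approximate degrees of freedom."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    t = ((x1_bar - x2_bar) - hypothesized_diff) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


t, df = welch_t_test(595.8, 35.7, 10, 661.1, 115.7, 20)

T_CRITICAL = 1.708  # t_{.05,25}, taken from the slide
reject_h0 = t < -T_CRITICAL  # left-tailed test

print(round(t, 2), round(df, 2), reject_h0)
```

The statistic falls in the rejection region, which agrees with the negative confidence interval limits in the Minitab output.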

    MINITAB OUTPUT

    Two Sample T-Test and Confidence Interval

    Twosample T for Consumers vs Non-cmrs

                 N    Mean    StDev   SE Mean
    Consumers    10   595.8   35.7    11
    Non-cmrs     20   661     116     26

    95% C.I. for mu Consumers - mu Non-cmrs: ( -123, -7)
    T-Test mu Consmers = mu Non-cmrs (vs

    Example: Making an inference about $\mu_1 - \mu_2$

    Compute: Manually

    The confidence interval estimator for the difference between two means is

    $$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (604.02 - 633.239) \pm 1.9796\sqrt{\frac{4103}{43} + \frac{10670}{107}} = -29.21 \pm 27.65 = [-56.86,\ -1.56]$$
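A minimal sketch of the interval calculation follows. The function name `welch_confidence_interval` is hypothetical; the inputs (means, sample variances, sample sizes and the critical value 1.9796) are the numbers shown on the slide. Carrying full precision gives limits that differ from the slide's rounded values in the second decimal place.

```python
import math


def welch_confidence_interval(x1_bar, s1_sq, n1, x2_bar, s2_sq, n2, t_critical):
    """Confidence interval for mu_1 - mu_2 without assuming equal variances."""
    diff = x1_bar - x2_bar
    margin = t_critical * math.sqrt(s1_sq / n1 + s2_sq / n2)
    return diff - margin, diff + margin


lower, upper = welch_confidence_interval(
    604.02, 4103, 43, 633.239, 10670, 107, t_critical=1.9796
)
print(round(lower, 2), round(upper, 2))
```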

    Example

    An ergonomic chair can be assembled using two different sets of operations (Method A and Method B).

    The operations manager would like to know whether the assembly times under the two methods differ.

    Example

    Two samples are randomly and independently selected:

    A sample of 25 workers assembled the chair using Method A.
    A sample of 25 workers assembled the chair using Method B.

    The assembly times were recorded.

    Do the assembly times of the two methods differ?

    Example: Making an inference about $\mu_1 - \mu_2$

    Assembly times in minutes

    Method A   Method B
    6.8        5.2
    5.0        6.7
    7.9        5.7
    5.2        6.6
    7.6        8.5
    5.0        6.5
    5.9        5.9
    5.2        6.7
    6.5        6.6
    .          .

    Solution

    The data are interval.

    The parameter of interest is the difference between two population means.

    The claim to be tested is whether a difference between the two methods exists.

    Solution: Making an inference about $\mu_1 - \mu_2$

    Compute: Manually

    The hypotheses are:

    $H_0: (\mu_1 - \mu_2) = 0$
    $H_1: (\mu_1 - \mu_2) \neq 0$

    To check whether the two unknown population variances are equal, we calculate $s_1^2$ and $s_2^2$.

    We have $s_1^2 = 0.8478$ and $s_2^2 = 1.3031$.

    The two population variances appear to be equal.

    Solution: Making an inference about $\mu_1 - \mu_2$

    Compute: Manually

    To calculate the t-statistic we have:

    $\bar{x}_1 = 6.288$, $\bar{x}_2 = 6.016$, $s_1^2 = 0.8478$, $s_2^2 = 1.3031$

    $$s_p^2 = \frac{(25 - 1)(0.848) + (25 - 1)(1.303)}{25 + 25 - 2} = 1.076$$

    $$t = \frac{(6.288 - 6.016) - 0}{\sqrt{1.076\left(\dfrac{1}{25} + \dfrac{1}{25}\right)}} = 0.93$$

    d.f. = 25 + 25 - 2 = 48

    Solution

    For $\alpha = 0.05$, the rejection region is $t < -t_{\alpha/2,\nu} = -t_{.025,48} = -2.009$ or $t > t_{\alpha/2,\nu} = t_{.025,48} = 2.009$.

    [Figure: rejection regions below -2.009 and above 2.009, with the test statistic .093 marked between them.]

    CONCLUSION: Since $-2.009 < t = 0.93 < 2.009$, there is insufficient evidence to reject the null hypothesis.

    Solution: Making an inference about $\mu_1 - \mu_2$

    t-Test: Two-Sample Assuming Equal Variances

                                    Method A   Method B
    Mean                            6.29       6.02
    Variance                        0.8478     1.3031
    Observations                    25         25
    Pooled Variance                 1.08
    Hypothesized Mean Difference    0
    df                              48
    t Stat                          0.93
    P(T

    .3584 > .05

    -2.0106 < .93 < +2.0106

    Solution: Making an inference about $\mu_1 - \mu_2$

    Conclusion: There is no evidence to infer at the 5% significance level that the two assembly methods are different in terms of assembly time.

    Solution: Making an inference about $\mu_1 - \mu_2$

    A 95% confidence interval for $\mu_1 - \mu_2$ is calculated as follows:

    $$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\, n_1 + n_2 - 2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = (6.288 - 6.016) \pm 2.0106\sqrt{1.075\left(\frac{1}{25} + \frac{1}{25}\right)} = 0.272 \pm 0.5896 = [-0.3176,\ 0.8616]$$

    Thus, at the 95% confidence level, $-0.3176 < \mu_1 - \mu_2 < 0.8616$.

    Notice: Zero is included in the confidence interval.
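The equal-variances interval can be reproduced as a quick check. In this sketch the helper `pooled_confidence_interval` is a hypothetical name, and the pooled variance 1.075 and critical value 2.0106 are taken from the slide.

```python
import math


def pooled_confidence_interval(x1_bar, x2_bar, sp_sq, n1, n2, t_critical):
    """Confidence interval for mu_1 - mu_2 under the equal-variances assumption."""
    diff = x1_bar - x2_bar
    margin = t_critical * math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return diff - margin, diff + margin


lower, upper = pooled_confidence_interval(6.288, 6.016, 1.075, 25, 25, 2.0106)
contains_zero = lower < 0 < upper

print(round(lower, 4), round(upper, 4), contains_zero)
```

An interval that contains zero matches the two-tailed test result: the hypothesis of equal means is not rejected at the 5% level.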

    Checking the required conditions for the equal variances case

    The data appear to be approximately normal.

    [Figure: histograms of the assembly times for Design A and Design B.]

    ANALYSIS OF PAIRED DATA

    What is a matched pair experiment?

    Why are matched pairs experiments needed?

    How do we deal with data produced in this way?

    The following example demonstrates a situation where a matched pair experiment is the correct approach to test the difference between two population means.

    ANALYSIS OF PAIRED DATA

    Solution

    Compare two populations of interval data.

    The parameter tested is $\mu_1 - \mu_2$, where

    $\mu_1$: the mean of the highest salary offered to Finance MBAs
    $\mu_2$: the mean of the highest salary offered to Marketing MBAs

    $H_0: (\mu_1 - \mu_2) = 0$
    $H_1: (\mu_1 - \mu_2) > 0$

    Finance   Marketing
    61,228    73,361
    51,836    36,956
    20,620    63,627
    73,356    71,069
    84,186    40,203
    .         .

    ANALYSIS OF PAIRED DATA

    Solution continued

    From the data we have:

    $\bar{x}_1 = 65,624$, $\bar{x}_2 = 60,423$
    $s_1^2 = 360,433,294$, $s_2^2 = 262,228,559$

    Let us assume equal variances.

    Equal Variances
                                    Finance      Marketing
    Mean                            65624        60423
    Variance                        360433294    262228559
    Observations                    25           25
    Pooled Variance                 311330926
    Hypothesized Mean Difference    0
    df                              48
    t Stat                          1.04
    P(T

    The effect of a large sample variability

    Question

    The difference between the sample means is 65624 - 60423 = 5,201. So why could we not reject $H_0$ in favor of $H_1$, where $(\mu_1 - \mu_2 > 0)$?

    Reducing the variability

    The values each sample consists of might markedly vary...

    [Figure: the range of observations in sample A and the range of observations in sample B.]

    Reducing the variability

    ...but the differences between pairs of observations might be quite close to one another, resulting in a small variability of the differences.

    [Figure: the range of the differences, shown relative to 0.]

    Analysis of Paired Data

    Since the difference of the means is equal to the mean of the differences, we can rewrite the hypotheses in terms of $\mu_D$ (the mean of the differences) rather than in terms of $\mu_1 - \mu_2$.

    This formulation has the benefit of a smaller variability.

    Group 1   Group 2   Difference
    10        12        -2
    15        11        +4

    Mean1 = 12.5   Mean2 = 11.5
    Mean1 - Mean2 = 1   Mean of differences = 1

    Analysis of Paired Data

    Data are generated from matched pairs, not independent samples.

    Let $X_i$ and $Y_i$ denote the measurements for the i-th subject. Thus, $(X_i, Y_i)$ is a matched pair of observations.

    Denote $D_i = Y_i - X_i$ or $X_i - Y_i$.

    If there are n subjects studied, we have $D_1, D_2, \ldots, D_n$. Then,

    $$\bar{D} = \frac{\sum_{i=1}^{n} D_i}{n}, \qquad s_D^2 = \frac{\sum_{i=1}^{n} D_i^2 - n\bar{D}^2}{n - 1}, \qquad s_{\bar{D}}^2 = \frac{s_D^2}{n}$$

    CONFIDENCE INTERVAL FOR $\mu_D = \mu_1 - \mu_2$

    A $100(1 - \alpha)\%$ C.I. for $\mu_D = \mu_1 - \mu_2$ is given by:

    $$\bar{x}_D \pm t_{\alpha/2,\, n-1}\frac{s_D}{\sqrt{n}}$$

    For $n \geq 30$, we can use z instead of t.

    HYPOTHESIS TESTS FOR $\mu_D = \mu_1 - \mu_2$

    The test statistic for testing a hypothesis about $\mu_D$ is given by

    $$t = \frac{\bar{x}_D - \mu_D}{s_D/\sqrt{n}}$$

    with n - 1 degrees of freedom.

    EXAMPLE

    Sample data on attitudes before and after viewing an informational film.

    Subject   Before   After   Difference
    i         X_i      Y_i     D_i = Y_i - X_i
    1         41       46.9    5.9
    2         60.3     64.5    4.2
    3         23.9     33.3    9.4
    4         36.2     36      -0.2
    5         52.7     43.5    -9.2
    6         22.5     56.8    34.3
    7         67.5     60.7    -6.8
    8         50.3     57.3    7
    9         50.9     65.4    14.5
    10        24.6     41.9    17.3

    90% CI for $\mu_D = \mu_1 - \mu_2$:

    $\bar{D} = 7.64$, $s_D = 12.57$

    $$\bar{D} \pm t_{\alpha/2,\, n-1}\frac{s_D}{\sqrt{n}} = 7.64 \pm 1.833\,\frac{12.57}{\sqrt{10}}, \qquad t_{0.05,\, 9} = 1.833$$

    $$0.36 \leq \mu_D \leq 14.92$$

    With 90% confidence, the mean attitude measurement after viewing the film exceeds the mean attitude measurement before viewing by between 0.36 and 14.92 units.
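The paired-data interval can be recomputed directly from the before and after measurements. This is a sketch only: the variable names are illustrative, and the critical value 1.833 is copied from the slide. Small differences from the slide's limits come from rounding $s_D$.

```python
import math
import statistics

# Attitude scores before and after the film, from the example table.
before = [41, 60.3, 23.9, 36.2, 52.7, 22.5, 67.5, 50.3, 50.9, 24.6]
after = [46.9, 64.5, 33.3, 36, 43.5, 56.8, 60.7, 57.3, 65.4, 41.9]

# Work with the n differences D_i = Y_i - X_i, not with the two samples.
differences = [y - x for x, y in zip(before, after)]
n = len(differences)
d_bar = statistics.mean(differences)
s_d = statistics.stdev(differences)  # sample standard deviation (n - 1)

T_CRITICAL = 1.833  # t_{0.05, 9}, taken from the slide
margin = T_CRITICAL * s_d / math.sqrt(n)
lower, upper = d_bar - margin, d_bar + margin

print(round(d_bar, 2), round(s_d, 2), round(lower, 2), round(upper, 2))
```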

    EXAMPLE

    How can we design an experiment to show which of two types of tires is better? Install one type of tire on one wheel and the other type on the other (front) wheel. The average difference in tire lifetime distance (in 1000s of miles) is

    $\bar{x}_D = 4.55$

    with a sample standard deviation of the differences of

    $s_D = 7.22$

    There are a total of n = 20 observations.

    SOLUTION

    $H_0: \mu_D = 0$
    $H_A: \mu_D > 0$

    Test statistic:

    $$t = \frac{\bar{x}_D - \mu_D}{s_D/\sqrt{n}} = \frac{4.55 - 0}{7.22/\sqrt{20}} = 2.82$$

    Reject $H_0$ if $t > t_{.05,19} = 1.729$.

    Conclusion: Since 2.82 > 1.729, reject $H_0$ at $\alpha = 0.05$.

    EXAMPLE

    It is claimed that an industrial safety program is effective in reducing the loss of working hours due to factory accidents. The following data are collected concerning the weekly loss of working hours due to accidents in six plants both before and after the safety program is instituted.

    Loss of working hours

    Plant    1    2    3    4    5    6
    Before   12   30   15   37   29   15
    After    10   29   16   35   26   16

    Do the data substantiate the claim?

    Use $\alpha = 0.05$.

    ANSWER

    This is a matched pair experiment because the samples from the two populations are not independent.

    Loss of working hours (Before - After)

    Plant        1   2   3    4   5   6
    Difference   2   1   -1   2   3   -1

    $\bar{x}_D = 1$, $s_D = 1.67$, $n = 6$

    Let $\mu_1$ denote the average loss of working hours due to factory accidents before the safety program.

    Let $\mu_2$ denote the average loss of working hours due to factory accidents after the safety program.

    Also let $\mu_D = \mu_1 - \mu_2$. Then,

    $H_0: \mu_D = 0$
    $H_A: \mu_D > 0$

    Test statistic:

    $$t = \frac{\bar{x}_D - 0}{s_D/\sqrt{n}} = \frac{1}{1.67/\sqrt{6}} = 1.47$$

    Rejection region: $t > t_{\alpha,\, n-1} = t_{0.05,5} = 2.015$

    Conclusion: Do not reject $H_0$ at $\alpha = 0.05$ because $t = 1.47 < t_{0.05,5} = 2.015$. There is not sufficient evidence to conclude that the mean loss of working hours due to factory accidents is reduced after the safety program.
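The matched-pairs test for the safety program can be sketched from the raw plant data. The variable names are illustrative, and the critical value 2.015 is copied from the slide; the unrounded statistic is slightly below the slide's 1.47 because the slide rounds $s_D$ to 1.67.

```python
import math
import statistics

# Weekly loss of working hours in the six plants, from the example.
before = [12, 30, 15, 37, 29, 15]
after = [10, 29, 16, 35, 26, 16]

# Differences are Before - After, so a reduction shows up as a positive mean.
differences = [b - a for b, a in zip(before, after)]
n = len(differences)
d_bar = statistics.mean(differences)
s_d = statistics.stdev(differences)

t = (d_bar - 0) / (s_d / math.sqrt(n))

T_CRITICAL = 2.015  # t_{0.05, 5}, taken from the slide
reject_h0 = t > T_CRITICAL  # right-tailed test, H_A: mu_D > 0

print(differences, round(t, 2), reject_h0)
```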

    PAIRED DATA AND TWO SAMPLE t PROCEDURE

    The two-sample t test is based on the assumption of independence.

    In many paired experiments, there is a strong dependence between variables.

    Inference About the Difference of Two Population Proportions

    Population 1               Population 2
    Parameters: $p_1$          Parameters: $p_2$
    Statistics: $\hat{p}_1$    Statistics: $\hat{p}_2$
    Sample size: $n_1$         Sample size: $n_2$

    Inference about the difference between two population proportions

    In this section we deal with two populations whose data are nominal.

    For nominal data we compare the population proportions of the occurrence of a certain event.

    Examples:

    Comparing the effectiveness of a new drug versus an older one
    Comparing market share before and after an advertising campaign
    Comparing defective rates between two machines

    Parameter and Statistic

    Parameter

    When the data are nominal, we can only count the occurrences of a certain event in the two populations and calculate proportions.

    The parameter is therefore $p_1 - p_2$.

    Statistic

    An unbiased estimator of $p_1 - p_2$ is $\hat{p}_1 - \hat{p}_2$ (the difference between the sample proportions).

    Sampling Distribution of $\hat{p}_1 - \hat{p}_2$

    Two random samples are drawn from two populations.

    The number of successes in each sample is recorded.

    The sample proportions are computed.

    Sample 1                                    Sample 2
    Sample size: $n_1$                          Sample size: $n_2$
    Number of successes: $x_1$                  Number of successes: $x_2$
    Sample proportion: $\hat{p}_1 = x_1/n_1$    Sample proportion: $\hat{p}_2 = x_2/n_2$

    SAMPLING DISTRIBUTION OF $\hat{p}_1 - \hat{p}_2$

    A point estimator of $p_1 - p_2$ is

    $$\hat{p}_1 - \hat{p}_2 = \frac{x_1}{n_1} - \frac{x_2}{n_2}$$

    The sampling distribution of $\hat{p}_1 - \hat{p}_2$ is approximately

    $$\hat{p}_1 - \hat{p}_2 \sim N\left(p_1 - p_2,\ \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}\right)$$

    if $n_i p_i \geq 5$ and $n_i(1 - p_i) \geq 5$, $i = 1, 2$.

    The z-statistic

    $$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}}$$

    Because $p_1$ and $p_2$ are unknown, the standard error must be estimated using the sample proportions. The method depends on the null hypothesis.

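    Testing $p_1 - p_2$

    There are two cases to consider:

    Case 1: $H_0: p_1 - p_2 = 0$

    Calculate the pooled proportion

    $$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$

    Then

    $$Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

    Case 2: $H_0: p_1 - p_2 = D$ (D is not equal to 0)

    Do not pool the data:

    $$\hat{p}_1 = \frac{x_1}{n_1}, \qquad \hat{p}_2 = \frac{x_2}{n_2}$$

    Then

    $$Z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$$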

    EXAMPLE (CASE 1)

    A manufacturer claims that, compared with his closest competitor, fewer of his employees are union members. Of 318 of his employees, 117 are unionists. From a sample of 255 of the competitor's labor force, 109 are union members. Perform a test at $\alpha = 0.05$.

    $p_1$: the proportion of the manufacturer's employees that are union members.
    $p_2$: the proportion of his closest competitor's employees that are union members.

    SOLUTION

    $H_0: p_1 - p_2 = 0$
    $H_A: p_1 - p_2 < 0$

    $\hat{p}_1 = x_1/n_1 = 117/318$ and $\hat{p}_2 = x_2/n_2 = 109/255$, so the pooled sample proportion is

    $$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{117 + 109}{318 + 255} = 0.39$$

    Test Statistic:

    $$z = \frac{(117/318 - 109/255) - 0}{\sqrt{(0.39)(1 - 0.39)\left(\dfrac{1}{318} + \dfrac{1}{255}\right)}} = -1.4518$$

    Decision Rule: Reject $H_0$ if $z < -z_{0.05} = -1.645$.

    Conclusion: Because $z = -1.4518 > -z_{0.05} = -1.645$, do not reject $H_0$ at $\alpha = 0.05$. There is insufficient evidence to support the manufacturer's claim.
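The Case 1 (pooled) calculation can be sketched as below. The function name `pooled_two_proportion_z` is a hypothetical helper, and the critical value 1.645 is copied from the slide. Using the unrounded pooled proportion gives about -1.45, very close to the slide's -1.4518, which was computed with $\hat{p}$ rounded to 0.39.

```python
import math


def pooled_two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0, using the pooled proportion."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / standard_error


# Union membership example: 117 of 318 versus 109 of 255.
z = pooled_two_proportion_z(117, 318, 109, 255)

Z_CRITICAL = 1.645  # z_{0.05}, taken from the slide
reject_h0 = z < -Z_CRITICAL  # left-tailed test, H_A: p1 - p2 < 0

print(round(z, 2), reject_h0)
```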

    Testing $p_1 - p_2$ (Case 1)

    The marketing manager needs to decide which of two new packaging designs to adopt, to help improve sales of his company's soap.

    A study is performed in two supermarkets:

    Brightly-colored packaging is distributed in supermarket 1.
    Simple packaging is distributed in supermarket 2.

    The first design is more expensive; therefore, to be financially viable it has to outsell the second design.

    Testing $p_1 - p_2$ (Case 1)

    Summary of the experiment results:

    Supermarket 1 - 180 purchasers of Johnson Brothers soap out of a total of 904
    Supermarket 2 - 155 purchasers of Johnson Brothers soap out of a total of 1,038

    Use a 5% significance level and perform a test to find which type of packaging to use.

    Testing $p_1 - p_2$ (Case 1)

    Solution

    The problem objective is to compare the population of sales of the two packaging designs.

    The data are nominal (Johnson Brothers or other soap).

    Population 1: purchases at supermarket 1
    Population 2: purchases at supermarket 2

    The hypotheses are

    $H_0: p_1 - p_2 = 0$
    $H_1: p_1 - p_2 > 0$

    We identify this application as case 1.

    Testing $p_1 - p_2$ (Case 1)

    Compute: Manually

    For a 5% significance level the rejection region is $z > z_\alpha = z_{.05} = 1.645$.

    The sample proportions are

    $\hat{p}_1 = 180/904 = .1991$ and $\hat{p}_2 = 155/1,038 = .1493$

    The pooled proportion is

    $\hat{p} = (x_1 + x_2)/(n_1 + n_2) = (180 + 155)/(904 + 1,038) = .1725$

    The z statistic becomes

    $$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{.1991 - .1493}{\sqrt{.1725(1 - .1725)\left(\dfrac{1}{904} + \dfrac{1}{1,038}\right)}} = 2.90$$

    Testing $p_1 - p_2$ (Case 1)

    Excel (Data Analysis Plus)

    z-Test: Two Proportions

                               Supermarket 1   Supermarket 2
    Sample Proportions         0.1991          0.1493
    Observations               904             1038
    Hypothesized Difference    0
    z Stat                     2.90
    P(Z

    Conclusion: There is sufficient evidence to conclude at the 5% significance level that the brightly-colored design will outsell the simple design.

    Testing $p_1 - p_2$ (Case 2)

    The bath soap of Johnson Brothers Company is not selling well. Hoping to improve sales, the company's advertising agency developed two new designs. The first design features several bright colors and the second design is light green in color with the company's logo on it. Management needs to decide which of the two new packaging designs to adopt to help improve sales of the soap.

    A study is performed in two supermarkets.

    For the brightly-colored design to be financially viable, it has to outsell the simple design by at least 3%.

    Testing $p_1 - p_2$ (Case 2)

    Summary of the experiment results:

    Supermarket 1 - 180 purchasers of Johnson Brothers soap out of a total of 904
    Supermarket 2 - 155 purchasers of Johnson Brothers soap out of a total of 1,038

    Use a 5% significance level and perform a test to find which type of packaging to use.

    Testing $p_1 - p_2$ (Case 2)

    Solution

    The hypotheses to test are

    $H_0: p_1 - p_2 = .03$
    $H_1: p_1 - p_2 > .03$

    We identify this application as case 2 (the hypothesized difference is not equal to zero).

    Testing $p_1 - p_2$ (Case 2)

    Compute: Manually

    $$Z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}} = \frac{\left(\dfrac{180}{904} - \dfrac{155}{1,038}\right) - .03}{\sqrt{\dfrac{.1991(1 - .1991)}{904} + \dfrac{.1493(1 - .1493)}{1,038}}} = 1.15$$

    The rejection region is $z > z_\alpha = z_{.05} = 1.645$.

    Conclusion: Since 1.15 < 1.645, do not reject the null hypothesis. There is insufficient evidence to infer that the brightly-colored design will outsell the simple design by 3% or more.
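The Case 2 (unpooled) statistic can be checked in the same way. The helper `unpooled_two_proportion_z` is a hypothetical name, and the critical value 1.645 is copied from the slide. Without intermediate rounding the statistic is about 1.145, which is why it appears as 1.15 in the manual calculation and 1.14 in the Excel output.

```python
import math


def unpooled_two_proportion_z(x1, n1, x2, n2, hypothesized_diff):
    """z statistic for H0: p1 - p2 = D with D != 0 (sample proportions not pooled)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    standard_error = math.sqrt(
        p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2
    )
    return ((p1_hat - p2_hat) - hypothesized_diff) / standard_error


# Soap packaging example with a hypothesized difference of 3%.
z = unpooled_two_proportion_z(180, 904, 155, 1038, hypothesized_diff=0.03)

Z_CRITICAL = 1.645  # z_{.05}, taken from the slide
reject_h0 = z > Z_CRITICAL  # right-tailed test, H1: p1 - p2 > .03

print(round(z, 3), reject_h0)
```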

    Testing $p_1 - p_2$ (Case 2)

    Using Excel (Data Analysis Plus)

    z-Test: Two Proportions

                               Supermarket 1   Supermarket 2
    Sample Proportions         0.1991          0.1493
    Observations               904             1038
    Hypothesized Difference    0.03
    z Stat                     1.14
    P(Z

    ESTIMATING $p_1 - p_2$

    $100(1 - \alpha)\%$ Confidence Interval for $p_1 - p_2$:

    $$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$

    where $\hat{q}_i = 1 - \hat{p}_i$.

    EXAMPLE

    An antibiotic for pneumonia was injected into 100 patients with kidney malfunctions (called uremic patients) and 100 patients with no kidney malfunctions (called normal patients). Some allergic reaction developed in 38 of the uremic patients and 21 of the normal patients.

    a) Do the data provide strong evidence that the rate of incidence of allergic reaction to the antibiotic is higher in uremic patients than in normal patients?

    Let $p_1$: the rate of incidence of allergic reaction to the antibiotic in uremic patients, and

    $p_2$: the rate of incidence of allergic reaction to the antibiotic in normal patients.

    b) Construct a 95% confidence interval for the difference between the population proportions and interpret the result.
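The slides pose these two questions without showing the calculations. The sketch below shows one way the Case 1 test statistic and the confidence interval formula could be applied to the antibiotic data. The variable names are illustrative, and 1.96 is the usual standard normal value for a 95% interval, which is an assumption since the slides do not state it.

```python
import math

# Allergic reactions: 38 of 100 uremic patients, 21 of 100 normal patients.
x1, n1 = 38, 100
x2, n2 = 21, 100
p1_hat, p2_hat = x1 / n1, x2 / n2

# a) Case 1 test of H0: p1 - p2 = 0 against H1: p1 - p2 > 0 (pooled proportion).
p_pooled = (x1 + x2) / (n1 + n2)
z = (p1_hat - p2_hat) / math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

# b) 95% confidence interval for p1 - p2 (sample proportions not pooled).
Z_HALF_ALPHA = 1.96  # assumed critical value z_{0.025}
margin = Z_HALF_ALPHA * math.sqrt(
    p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2
)
lower = (p1_hat - p2_hat) - margin
upper = (p1_hat - p2_hat) + margin

print(round(z, 2), round(lower, 4), round(upper, 4))
```

The value of z would then be compared with the critical value for the chosen significance level, and an interval lying entirely above zero would point to a higher reaction rate in uremic patients.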