Copyright (c) Bani K. Mallick

STAT 651

Lecture #16


Topics in Lecture #16

Inference about two population proportions


Book Sections Covered in Lecture #16

Chapter 10.3


Lecture #15 Review: Categorical Data

In general, we can discuss a problem where the outcome is binary, the success probability is $\pi$, and the number of experiments is $n$.

$X$ = the number of successes in the experiment

$\hat{\pi} = X/n$ = the fraction of successes in the experiment



The number of successes $X$ in $n$ experiments, each with probability of success $\pi$, is called a binomial random variable

There is a formula for this:

$$\Pr(X = k) = \frac{n!}{k!\,(n-k)!}\,\pi^{k}(1-\pi)^{n-k}$$

0! = 1, 1! = 1, 2! = 2 x 1 = 2, 3! = 3 x 2 x 1 = 6, 4! = 4 x 3 x 2 x 1 = 24, etc.



The fraction of successes $\hat{\pi} = X/n$ in $n$ experiments, each with probability of success $\pi$, also has a formula:

$$\Pr(\hat{\pi} = k/n) = \frac{n!}{k!\,(n-k)!}\,\pi^{k}(1-\pi)^{n-k}$$

The binomial formula is used to understand the properties of the sample fraction, e.g., its standard deviation
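As a minimal sketch of the binomial formula (not part of the original slides), the Python below evaluates Pr(X = k) with the standard library and checks numerically that the standard deviation of the sample fraction equals sqrt(pi(1 - pi)/n). The values n = 8 and pi = 0.5 and the function name `binomial_pmf` are illustrative assumptions.

```python
from math import comb, sqrt

def binomial_pmf(k, n, pi):
    """Pr(X = k) when X is binomial with n experiments and success probability pi."""
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

# Illustrative values only (assumed, not taken from the lecture).
n, pi = 8, 0.5
probs = [binomial_pmf(k, n, pi) for k in range(n + 1)]

# Mean and standard deviation of the sample fraction k/n under the binomial formula.
mean_fraction = sum((k / n) * p for k, p in enumerate(probs))
sd_fraction = sqrt(sum((k / n - mean_fraction) ** 2 * p for k, p in enumerate(probs)))

print(sd_fraction, sqrt(pi * (1 - pi) / n))  # the two values should agree
```

The agreement of the two printed values is what justifies using sqrt(pi(1 - pi)/n) as the standard deviation of the sample fraction.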



Lecture #15 Review:

If you code your attribute as “0” and “1” in SPSS, then the sample fraction is the same as the sample mean of these “data”

For example, let the “data” be 0,1,0,0,0,1,0,1

Then $n = 8$, and $\hat{\pi} = 3/8$

What is the sample mean of these “data”?

$$\bar{X} = 3/8 = \hat{\pi}$$
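The identity can be checked directly. This short Python sketch uses the eight 0/1 values from the slide; the variable names are illustrative assumptions.

```python
from statistics import mean

# The 0/1 "data" from the slide: 1 = success, 0 = failure.
data = [0, 1, 0, 0, 0, 1, 0, 1]

n = len(data)          # number of experiments
x = sum(data)          # number of successes
pi_hat = x / n         # sample fraction
x_bar = mean(data)     # ordinary sample mean of the 0/1 values

print(n, x, pi_hat, x_bar)
```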



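$(1-\alpha)100\%$ CI for the population fraction $\pi$:

$$\hat{\pi} \pm z_{\alpha/2}\,\hat{\sigma}_{\hat{\pi}}, \qquad \hat{\sigma}_{\hat{\pi}} = \sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}$$

$z_{\alpha/2}$ is found by looking up $1 - \alpha/2$ in Table 1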
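A minimal Python sketch of this interval follows. It takes the normal quantile from the standard library's `statistics.NormalDist` instead of a printed table. The function name `proportion_ci` and the example of 60 successes in 100 experiments are hypothetical and do not come from the lecture.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, conf=0.95):
    """Large-sample (1 - alpha)100% CI for a population fraction."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, e.g. about 1.96 for 95%
    pi_hat = successes / n                    # sample fraction
    se = sqrt(pi_hat * (1 - pi_hat) / n)      # estimated standard error
    return pi_hat - z * se, pi_hat + z * se

# Hypothetical data: 60 successes in 100 experiments.
lower, upper = proportion_ci(60, 100)
print(round(lower, 3), round(upper, 3))
```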


Lecture #15 Review: Sample Size Calculations

If you want a $(1-\alpha)100\%$ CI to be $\hat{\pi} \pm E$, you should set

$$n = \frac{z_{\alpha/2}^{2}\,\pi(1-\pi)}{E^{2}}$$


Lecture #15 Review: Sample Size Calculations

The small problem is that you do not know $\pi$. You have two choices:

Make a guess for $\pi$

Set $\pi = 0.50$ and calculate (most conservative, since it results in the largest sample size)

$$n = \frac{z_{\alpha/2}^{2}\,\pi(1-\pi)}{E^{2}}$$
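The sample size rule can be sketched in Python as below. The function name `sample_size`, the margin E = 0.03, and the guess of 0.2 are hypothetical illustrations. Rounding up to the next whole number is an added assumption so that the margin is not exceeded.

```python
from math import ceil
from statistics import NormalDist

def sample_size(E, conf=0.95, pi_guess=0.5):
    """Smallest n giving a CI half-width of at most E, for a guessed pi."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    return ceil(z**2 * pi_guess * (1 - pi_guess) / E**2)

# Conservative choice pi = 0.50 versus a hypothetical guess of pi = 0.2.
n_conservative = sample_size(0.03)
n_guess = sample_size(0.03, pi_guess=0.2)
print(n_conservative, n_guess)
```

Because pi(1 - pi) is largest at pi = 0.50, any other guess gives a smaller required n.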


Comparison of Two Population Proportions

In some cases, we may want to compare two populations, with success probabilities $\pi_1$ and $\pi_2$

The null hypothesis is $H_0: \pi_1 = \pi_2$

This is the same as $H_0: \pi_1 - \pi_2 = 0$

There are two ways to test this hypothesis

One is via what is called a chi-squared statistic, which gives you only a p-value

This is bad: why?



If you reject, you have no idea how different the populations are!



The null hypothesis is $H_0: \pi_1 - \pi_2 = 0$

The other way is to form a CI for the difference in population proportions $\pi_1 - \pi_2$

The estimate of this difference is simply the difference in the sample fractions: $\hat{\pi}_1 - \hat{\pi}_2$



The standard error of the difference in the sample fractions:

$$\sigma_{\hat{\pi}_1 - \hat{\pi}_2} = \sqrt{\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}}$$

The usual way to form a CI is to replace the unknown population fractions by the sample fractions



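The estimated standard error of the difference in the sample fractions:

$$\hat{\sigma}_{\hat{\pi}_1 - \hat{\pi}_2} = \sqrt{\frac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_1} + \frac{\hat{\pi}_2(1-\hat{\pi}_2)}{n_2}}$$

The $(1-\alpha)100\%$ CI then is

$$\hat{\pi}_1 - \hat{\pi}_2 \pm z_{\alpha/2}\,\hat{\sigma}_{\hat{\pi}_1 - \hat{\pi}_2}$$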


Comparison of Two Population Proportions: Boxers versus Briefs

Most books force you to compute this by hand

For female preferences in men: $n_1 = 177$, $\hat{\pi}_1 = 0.7345$

For male preferences: $n_2 = 188$, $\hat{\pi}_2 = 0.4681$

$\hat{\pi}_1 - \hat{\pi}_2 = 0.2664$

Think the populations are different?


Comparison of Two Population Proportions: Boxers versus Briefs

The estimated standard error of the difference in the sample fractions is

$$\hat{\sigma}_{\hat{\pi}_1 - \hat{\pi}_2} = \sqrt{\frac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_1} + \frac{\hat{\pi}_2(1-\hat{\pi}_2)}{n_2}} = \sqrt{0.001102 + 0.001324} \approx 0.0493$$

Putting this together we get that the 95% CI is 0.2664 - 1.96 * 0.0493 = 0.17 up to the value 0.2664 + 1.96 * 0.0493 = 0.36

So, the 95% CI is from 0.17 to 0.36


What is this a CI for? The difference in the population fractions preferring boxers, $\pi_1 - \pi_2$: it is between 0.17 and 0.36

What is the conclusion? More females prefer men to wear boxers than do males, by 17 to 36 percentage points
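The hand calculation for the boxers problem can be sketched in Python as follows. The sample sizes and sample fractions are the ones given on the slides. The function name `two_proportion_ci` is a hypothetical helper, and the normal quantile comes from the standard library instead of Table 1, so the last digits may differ slightly from a calculation using 1.96.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ci(p1, n1, p2, n2, conf=0.95):
    """Difference in sample fractions, its estimated SE, and the large-sample CI."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)          # z_{alpha/2}
    diff = p1 - p2                                        # estimate of pi_1 - pi_2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)    # estimated standard error
    return diff, se, (diff - z * se, diff + z * se)

# Females: n1 = 177, fraction 0.7345. Males: n2 = 188, fraction 0.4681.
diff, se, (lower, upper) = two_proportion_ci(0.7345, 177, 0.4681, 188)
print(round(diff, 4), round(se, 4), round(lower, 2), round(upper, 2))
```

Since the whole interval lies above 0, the CI both rejects the hypothesis of equal population fractions and says how large the difference plausibly is.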


Comparison of Two Population Proportions:

Remarkably, but perhaps not surprisingly, you do not have to compute these confidence intervals by hand!

The idea: simply pretend, and I do mean pretend, that the binary outcomes are real numbers, run your ordinary t-test CI, and read the unequal variances line

The results will be slightly different from your hand calculations, but actually a bit more accurate


Illustration with the Boxers Problem

Group Statistics: Boxers versus Briefs Preference

Gender    N     Mean    Std. Deviation   Std. Error Mean
Female    177   .7345   .4429            3.329E-02
Male      188   .4681   .5003            3.649E-02

The value “1” indicates a preference for boxers

Note how women have a higher preference for boxers than do men, in this sample



Independent Samples Test

Levene's Test for Equality of Variances: F = 49.523, Sig. = .000

t-test for Equality of Means

                              t       df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       5.373   363       .000              .2664             4.957E-02               .1689          .3639
Equal variances not assumed   5.393   361.642   .000              .2664             4.939E-02               .1692          .3635

Difference in sample means = 0.2664

Standard error of this difference = 0.04939


Illustration with the Boxers Problem: hand CI is 0.17 to 0.36: note similarities!

p-value = 0.000 as printed by SPSS, meaning it is below 0.0005. Note how you use the unequal variances p-value


The 95% CI from SPSS is 0.1692 to 0.3635. Nearly the same as the hand calculation.

Men and women have different preferences, even at 99.9% confidence.
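To see why the t-test output differs slightly from the hand calculation, the sketch below treats the 0/1 outcomes as numbers and computes the unequal variances (Welch) quantities directly. The counts of 130 of 177 females and 88 of 188 males are an assumption inferred from the reported fractions 0.7345 and 0.4681; they are not stated on the slides. The function name `welch_summary` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Assumed counts (inferred from the reported sample fractions, not given in the lecture).
female = [1] * 130 + [0] * 47    # 177 females, 1 = prefers boxers
male = [1] * 88 + [0] * 100      # 188 males

def welch_summary(a, b):
    """Mean difference, SE, t statistic, and Welch degrees of freedom."""
    va = variance(a) / len(a)    # sample variance uses the n - 1 divisor
    vb = variance(b) / len(b)
    diff = mean(a) - mean(b)
    se = sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return diff, se, diff / se, df

diff, se, t_stat, df = welch_summary(female, male)
print(round(diff, 4), round(se, 5), round(t_stat, 3), round(df, 3))
```

Under these assumed counts the values line up with the "Equal variances not assumed" row. The only change from the hand formula is that the sample variance divides by n - 1 instead of n, which is why the standard error is slightly larger than the hand value.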


US Availability and Rating: Are Better Beers More Widely Available?

Group Statistics: Availability in the U.S.

Very Good versus Other   N    Mean   Std. Deviation   Std. Error Mean
Very Good                11   0.45   .52              .16
Fair or Good             24   0.75   .44              9.03E-02

The “data” are coded as 0 = not widely available, 1 = widely available

With the “data” coded as 0 and 1, this means that in the sample, 45% of the very good beers were widely available



With the “data” coded as 0 and 1, this means that in the sample, 75% of the fair/good beers were widely available



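Independent Samples Test: Availability in the U.S.

Levene's Test for Equality of Variances: F = 3.169, Sig. = .084

t-test for Equality of Means

                              t        df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       -1.734   33       .092              -.30              .17                     -.64           5.12E-02
Equal variances not assumed   -1.628   16.864   .122              -.30              .18                     -.68           8.77E-02

The Sig. (2-tailed) column gives the p-value for the hypothesis that the two population fractions are the same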



Note that the p-value on the unequal variances line was 0.122, which is > 0.10

What does this mean?

There is no statistically significant evidence that those beers which are very good have any more or less national availability than those which are good or fair


Construction Example

The construction example was based on a survey made available to me.

I will look at the percentages of males sampled in Texas and in states outside of Texas

If these were random samples, they would be a measure of how different states are in their gender distributions in the construction industry


Construction Data: Gender Differences by Texas or Not

(1 = male)

Group Statistics: Sex

State: Texas or Not   N     Mean   Std. Deviation   Std. Error Mean
Outside Texas         274   .86    .34              2.07E-02
Texas                 173   .26    .44              3.35E-02

Something strange: 86% of the sample outside Texas is male, but only 26% of the sample in Texas is male


Not surprising: p-value = 0.000

Independent Samples Test: Sex

Levene's Test for Equality of Variances: F = 43.713, Sig. = .000

t-test for Equality of Means

                              t        df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       16.260   445       .000              .60               3.72E-02                .53            .68
Equal variances not assumed   15.379   300.960   .000              .60               3.93E-02                .53            .68




Please study the slides for the next lecture before coming to class

The material is somewhat difficult, and if you do not look at the slides and try to understand them, you will find my lecture all but impossible to understand.
