probability issues in without replacement sampling

10
This article was downloaded by:[King Fahd University of Petroleum and Minerals] On: 5 September 2007 Access Details: [subscription number 777447305] Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK International Journal of Mathematical Education in Science and Technology Publication details, including instructions for authors and subscription information: http://www.informaworld.com/smpp/title~content=t713736815 Probability issues in without replacement sampling Online Publication Date: 01 January 2007 To cite this Article: Joarder, A. H. and Al-Sabah, W. S. (2007) 'Probability issues in without replacement sampling', International Journal of Mathematical Education in Science and Technology, 38:6, 823 - 831 To link to this article: DOI: 10.1080/00207390701228385 URL: http://dx.doi.org/10.1080/00207390701228385 PLEASE SCROLL DOWN FOR ARTICLE Full terms and conditions of use: http://www.informaworld.com/terms-and-conditions-of-access.pdf This article maybe used for research, teaching and private study purposes. Any substantial or systematic reproduction, re-distribution, re-selling, loan or sub-licensing, systematic supply or distribution in any form to anyone is expressly forbidden. The publisher does not give any warranty express or implied or make any representation that the contents will be complete or accurate or up to date. The accuracy of any instructions, formulae and drug doses should be independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims, proceedings, demand or costs or damages whatsoever or howsoever caused arising directly or indirectly in connection with or arising out of the use of this material. © Taylor and Francis 2007

Upload: ulab

Post on 29-Nov-2023

0 views

Category:

Documents


0 download

TRANSCRIPT

This article was downloaded by:[King Fahd University of Petroleum and Minerals]On: 5 September 2007Access Details: [subscription number 777447305]Publisher: Taylor & FrancisInforma Ltd Registered in England and Wales Registered Number: 1072954Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

International Journal of MathematicalEducation in Science andTechnologyPublication details, including instructions for authors and subscription information:http://www.informaworld.com/smpp/title~content=t713736815

Probability issues in without replacement sampling

Online Publication Date: 01 January 2007To cite this Article: Joarder, A. H. and Al-Sabah, W. S. (2007) 'Probability issues inwithout replacement sampling', International Journal of Mathematical Education inScience and Technology, 38:6, 823 - 831To link to this article: DOI: 10.1080/00207390701228385URL: http://dx.doi.org/10.1080/00207390701228385

PLEASE SCROLL DOWN FOR ARTICLE

Full terms and conditions of use: http://www.informaworld.com/terms-and-conditions-of-access.pdf

This article maybe used for research, teaching and private study purposes. Any substantial or systematic reproduction,re-distribution, re-selling, loan or sub-licensing, systematic supply or distribution in any form to anyone is expresslyforbidden.

The publisher does not give any warranty express or implied or make any representation that the contents will becomplete or accurate or up to date. The accuracy of any instructions, formulae and drug doses should beindependently verified with primary sources. The publisher shall not be liable for any loss, actions, claims, proceedings,demand or costs or damages whatsoever or howsoever caused arising directly or indirectly in connection with orarising out of the use of this material.

© Taylor and Francis 2007

Dow

nloa

ded

By:

[Kin

g Fa

hd U

nive

rsity

of P

etro

leum

and

Min

eral

s] A

t: 06

:26

5 S

epte

mbe

r 200

7

Probability issues in without replacement sampling

A. H. JOARDER* and W. S. AL-SABAH

King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia

(Received 12 June 2006)

Sampling without replacement is an important aspect in teaching conditionalprobabilities in elementary statistics courses. Different methods proposed indifferent texts for calculating probabilities of events in this context are reviewedand their relative merits and limitations in applications are pinpointed. Analternative representation of hypergeometric distribution resembling binomialdistribution may provide more insight into the problems often encountered.

1. Introduction

Suppose that we are interested in the number of a particular kind of item say x ashcoloured balls in a sample of n units drawn from a lot containing N¼ aþ b balls,of which there are a ash coloured balls and b blue coloured balls. Let Ai denote theevent of having an ash ball at the ith draw, Bi denote the event of having a blue ballat the ith draw. For a sample of size n¼ 3, the sample space would be

S ¼ fA1A2A3, A1A2B3, A1B2A3, A1B2B3, B1A2A3, B1A2B3, B1B2A3, B1B2B3g

Let us assume that the items are drawn with replacement. Then the probability ofA1A2B3 is given by

PðA1A2B3Þ ¼ PðA1ÞPðA2jA1ÞPðB3jA1A2Þ

¼aþ 0

aþ b

aþ 0

aþ b

0þ b

aþ b

¼a

aþ b

� �2b

aþ bð1:1Þ

Since each of the 32

¼ 3 sample points namely {A1A2B3, A1B2A3, B1A2A3} has the

same probability, the probability of having 2 ash balls in a sample of size 3 drawnwith replacement is given by

PðX ¼ 2Þ ¼32

� �a

aþ b

� �2b

aþ bð1:2Þ

*Corresponding author. Email: [email protected]

DOI: 10.1080/00207390701228385

Classroom notes 823

Dow

nloa

ded

By:

[Kin

g Fa

hd U

nive

rsity

of P

etro

leum

and

Min

eral

s] A

t: 06

:26

5 S

epte

mbe

r 200

7

where X is the number of ash balls in the sample. For a sample of size n, theprobability that all the balls in the sample are ash is given by

PðX ¼ nÞ ¼a

aþ b

� �n

while the probability that all the balls in the sample are blue is given by

PðY ¼ nÞ ¼b

aþ b

� �n

where Y is the number of blue balls in the sample. In general, we have a sequenceA1A2 . . .AxBxþ 1 . . .Bn of x ash balls and n� x blue balls. Then the probability ofthis sequence would be

P A1A2 � � �Ax|fflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflffl}x

Bxþ1 � � �Bn|fflfflfflfflfflfflffl{zfflfflfflfflfflfflffl}n�x

0@

1A ¼ aþ 0

aþ b

aþ 0

aþ b� � �

aþ 0

aþ b

0þ b

aþ b

0þ b

aþ b� � �

0þ b

aþ b,

i.e.

PðA1A2 � � �AxBxþ1 � � �BnÞ ¼a

aþ b

� �xb

aþ b

� �n�x

: ð1:3Þ

Since x ash balls (from a ash balls) and n� x blue balls (from b blue balls) canhappen in n

x

sequences, the probability of any outcome of this type is given by

PðX ¼ xÞ ¼nx

� �a

aþ b

� �xb

aþ b

� �n�x

, ðx ¼ 0, 1, . . . , nÞ: ð1:4Þ

Note that while (1.3) is the probability of a sample point, (1.4) is that of a compoundevent where each of the sample points in the compound event has the sameprobability as given by (1.3). Both the events in (1.3) and (1.4) are important inelementary courses in statistics.

When the sampling is without replacement in the above situations, a closeanalogue of (1.4) is not presented in any text book. An investigation resulted inan alternative derivation of hypergeometric probability function, and is presented insection 3. Since there are some other methods proposed in different textbooks, wefirst review them. It appears that the methods discussed in section 2 may be preferredby readers not acquainted with conditional probabilities (cf. [1], p. 182).

2. Unconditional probability approach

2.1 Labelling method

Let there be aþ b distinguishable items in the lot. They are labelled and n items aredrawn without replacement. If the items in the lot are indistinguishable, they may belabelled to make them distinguishable to use the method.

Suppose that there are three ash balls and two blue balls in an urn. If the balls areindistinguishable, we cannot use the Labelling method unless we label balls of thesame kind. Let A1, A2 and A3 be ash balls and B1 and B2 blue balls. We want todraw 1 ball from them in succession without replacement. Then the sample space

824 Classroom notes

Dow

nloa

ded

By:

[Kin

g Fa

hd U

nive

rsity

of P

etro

leum

and

Min

eral

s] A

t: 06

:26

5 S

epte

mbe

r 200

7

would be S¼ {A1, A2, A3, B1, B2}. The probability that A1 is selected in the samplewould be P(A1)¼1/5. In fact the probability that Ai

¼ (i¼ 1, 2, 3) is selected inthe sample is 1/5, while the probability that Bi (i¼ 1, 2, 3) is selected in the sampleis also 1/5. But the probability that an ash ball is selected would beP(A)¼P(A1)þP(A2)þP(A3)¼ 3/5, while the probability that a blue ball will beselected is P(B)¼P(B1)þP(B2)¼ 2/5.

Example 2.1: Suppose that there are three males and two females in a family. Youwant to invite three of them for a dinner party. What is the probability that twomales are invited?

Let M1, M2 and M3 be males while F1 and F2 be females. Then the sample spacewould be

S ¼nM1M2M3,M1M2F1,M1M2F2,M1M3F1,M1M3F2,

M1F1F2,M2M3F1,M2M3F2,M2F1F2,M3F1F2o:

Assuming that each sample point is equally likely, the probability that the male Mi

(i¼ 1, 2, 3) is selected in the sample is 6/10 and the probability that female Fi isselected in the sample is 6/10. Thus, each element in the lot has the same chance ofbeing selected in the sample.

Assuming that each sample point is equally likely, the probability that two maleswill be invited is given by

PðM1M2F1Þ þ PðM1M2F2Þ þ PðM1M3F1Þ

þ PðM1M3F2Þ þ PðM2M3F1Þ þ PðM2M3F2Þ ¼ 6=10:

The number 10 in the denominator of the above probability is the number ofelements in the sample space. If the sample size increases, it is difficult to label thesample points. This is why we have the following equivalent method.

2.2 Combinatorial method

The Combinatorial method, better known as the Hypergeometric method, is amathematical way of counting the groups or combinations where the items in the lotare distinguishable.

A general proof of the problem encountered in the introduction when thesampling is without replacement is outlined here for some avid readers. The x ashballs can be chosen (from a ash balls in the lot) in a

x

ways, and the (n� x) blue balls

can be chosen (from b blue balls in the lot) in bn�x

ways. Hence the x ash balls and

(n� x) blue balls can be chosen in the sample in ax

b

n�x

ways. Also n balls can be

chosen from aþ b items in aþbn

ways. If we consider the possibilities as equally

likely, the probability of having x ash balls and (n� x) blue balls in a sample of size nis given by

PðX ¼ xÞ ¼ax

� �b

n� x

� ��

aþ bn

� �, x ¼ 0, 1, . . . , n

where x� a and n� x� b, i.e. max{0, n� b}� x�min{a, n} [2, p. 336].

Classroom notes 825

Dow

nloa

ded

By:

[Kin

g Fa

hd U

nive

rsity

of P

etro

leum

and

Min

eral

s] A

t: 06

:26

5 S

epte

mbe

r 200

7

3. Conditional probability approach

In most elementary textbooks the Tree Diagram Method is used to calculate theprobability of a simple event or any compound event. However, it is difficult to fit atree diagram on A4 sized paper when the sample size increases. While calculatingprobability, we show a breakdown of items in the lot size and also in the sample sizeallowing us to solve any problem with relatively larger sample size in an efficientmanner without having recourse to the Tree Diagram itself ([4], pp. 176–179). Whatwe have discussed in the introduction is actually an example of Tree DiagramMethod though the sampling method is with replacement.

Example 3.1: The Example 2.1 is a natural example for the Tree Diagram method.Let Mi be the event that in the ith (i¼ 1, 2) selection without replacement we havea male and Fj be the event that in the jth (j¼ 1, 2) selection we have a female. Thenby the Tree Diagram Method the sample space is given by

S ¼ M1M2M3,M1M2F3,M1F2M3,M1F2F3,F1M2M3,F1M2F3,F1F2M3f g:

Note that sample space does not include F1F2F3 as we have only two females.A tree diagram is provided below:

Since

PðM1M2F3Þ ¼3þ 0

3þ 2

2þ 0

2þ 2

0þ 2

1þ 2¼

2

10,

PðM1F2M3Þ ¼3þ 0

3þ 2

0þ 2

2þ 2

2þ 0

2þ 1¼

2

10,

PðF1M2M3Þ ¼0þ 2

3þ 2

3þ 0

3þ 1

2þ 0

2þ 1¼

2

10,

826 Classroom notes

Dow

nloa

ded

By:

[Kin

g Fa

hd U

nive

rsity

of P

etro

leum

and

Min

eral

s] A

t: 06

:26

5 S

epte

mbe

r 200

7

the probability that two males are invited in the sample of 3 persons is given by

PðM1M2F3ÞþPðM1F2M3Þ þ PðF1M2M3Þ ¼ 6=10:

We now generalize the Tree Diagram method. With the notation used in theintroduction, the probability of selecting x ash balls (from a ash balls) and n� xblue balls (from b blue balls) under sampling without replacement, is obtained asfollows. Let Ai denote the event of having an ash ball at ith draw, Bi denote the eventof having a blue ball at ith draw. For a sample of size n¼ 3 without replacement,the sample space would be

S ¼ fA1A2A3, A1A2B3, A1B2A3, A1B2B3, B1A2A3, B1A2B3, B1B2A3, B1B2B3g:

Then the probability of having 2 ash balls in the sample is given by

PðA1A2B3Þ ¼ PðA1ÞPðA2jA1ÞPðB3jA1A2Þ

¼aþ 0

aþ b

ða� 1Þ þ 0

ða� 1Þ þ b

0þ b

ða� 2Þ þ b

¼ðaþ 0Þf2g

ðaþ bÞf2gb

ða� 2Þ þ bð3:1Þ

where afxg ¼ aða� 1Þ � � � ða� xþ 1Þ. Equation (3.1) is an analogue for (1.1).Also, since each of the 3

2

¼ 3 sample points, namely {A1A2A3, A1A2B3, B1A2A3},

have the same probability, the probability of having two ash balls in a sample ofsize 3 is given by

PðX ¼ 2Þ ¼32

� �ðaþ 0Þf2g

ðaþ bÞf2gb

ða� 2Þ þ bð3:2Þ

where X is the number of ash balls in a sample of size 3. This is ananalogue of (1.2)

It is easy to prove that the probability of having an ash ball in a particulardraw, say, in the 2nd draw, is given by P(A1A2A3)þP(A1A2B3)þP(B1A2A3)þP(B1A2B3)¼ a/N. For a sample of size n the probability of having all ash ballsin the sample is given by

PðX ¼ nÞ ¼ðaþ 0Þfng

ðaþ bÞfng

while the probability of having all blue balls in the sample is given by

PðY ¼ nÞ ¼ð0þ bÞfng

ðaþ bÞfng

where Y is the number of blue balls in a sample.

Classroom notes 827

Dow

nloa

ded

By:

[Kin

g Fa

hd U

nive

rsity

of P

etro

leum

and

Min

eral

s] A

t: 06

:26

5 S

epte

mbe

r 200

7

In general, we have the sequence A1A2 �AxBxþ1 �Bn of x ash balls and n� x blueballs. Then the probability of this sequence would be

PðA1A2 � � �AxBxþ1 � � �BnÞ ¼aþ 0

aþ b

ða� 1Þ þ 0

ða� 1Þ þ b� � �ða� xþ 1Þ þ 0

ða� xþ 1Þ þ b

0þ b

ða� xÞ þ b

0þ ðb� 1Þ

ða� xÞ þ ðb� 1Þ

� � �0þ b� ðn� xÞ þ 1ð Þ

ða� xÞ þ ðb� ðn� xÞ þ 1Þ,

i.e.

PðA1A2 � � �AxBxþ1 � � �BnÞ ¼ðaþ 0Þfxg

ðaþ bÞfxgð0þ bÞfn�xg

ða� xþ bÞfn�xgð3:3Þ

which is analogous to (1.3). Since x ash balls from (a ash balls) and n� x blue balls(from b blue balls) can happen in n

x

sequences, the probability of any outcome of

this type is given by

PðX ¼ xÞ ¼nx

� �aPx

NPx

bPn�x

N�xPn�x¼

nx

� �ðaþ 0Þfxg

ðaþ bÞfxgð0þ bÞfn�xg

ða� xþ bÞfn�xgð3:4Þ

where X is the number of ash balls in a sample of size n selected withoutreplacement from the lot of size aþ b¼N and aPx ¼ aða� 1Þ � � � ða� xþ 1Þ. Thisis analogous to (1.4).

In the case a!1,N!1 as a/N!p, (0< p<1), then (3.4) can beapproximated by a binomial probability function (see appendix).

Since

nPx ¼n!

ðn� xÞ!and

nx

� �¼

n!

x!ðn� xÞ!¼

nPx

x!

the above probability in (3.4) can be represented by

PðX ¼ xÞ ¼n

x

!aPx

NPx

bPn�x

N�xPn�x

¼n!

x!ðn� xÞ!aPxbPn�x

NPn

¼aPx

x!

bPn�x

ðn� xÞ!�

aþbPn

n!,

i.e.

PðX ¼ xÞ ¼ax

� �b

n� x

� ��

aþ bn

� �ð3:5Þ

which is the well-known formula for the hypergeometric distribution [3, p. 110].We highlight below why the use of (3.4) instead of (3.5) would provide more

828 Classroom notes

Dow

nloa

ded

By:

[Kin

g Fa

hd U

nive

rsity

of P

etro

leum

and

Min

eral

s] A

t: 06

:26

5 S

epte

mbe

r 200

7

insight into conditional probabilities and often result in less mistakes bystudents.

Example 3.2: Suppose that a shipment of 9 digital voice recorders contains 4 thatare defective. If n recorders are randomly chosen without replacement for inspection,what is the probability that

(i) the first two of n¼ 3 checked will be defective but the third one will benon-defective?

(ii) 2 of the n¼ 3 recorders will be defective?(iii) 3 of the n¼ 5 recorders will be defective?

Solution:

(i) The probability is

PðD1D2D03Þ ¼

4þ 0

4þ 5

3þ 0

3þ 5

0þ 5

2þ 5¼

5

42:

Students, who memorize combinatorial rule, erroneously calculate theprobability to be

PðX ¼ 2Þ ¼42

� �51

� ��

93

� �¼

5

14:

(ii) Since 4þ 5¼ aþ b, 2þ1¼ xþ (n� x), the probability that 2 of the 3 voicerecorders will be defective is given by

PðX ¼ 2Þ ¼ PðD1D2D03Þ þ PðD1D

02D3Þ þ PðD01D2D3Þ

¼4þ 0

4þ 5

3þ 0

3þ 5

0þ 5

2þ 5

� �þ

4þ 0

4þ 5

0þ 5

3þ 5

3þ 0

3þ 4

� �

þ0þ 5

4þ 5

4þ 0

4þ 4

3þ 0

3þ 4

� �

¼5

14:

The above insight into conditional probabilities and the number of sequencesresulting in 2 defectives is apparent in

PðX ¼ 2Þ ¼32

� �ð4þ 0Þf2g

ð4þ 5Þf2g

0þ 5

ð4� 2Þ þ 5

but not in the combinatorial method

PðX ¼ 2Þ ¼42

� �51

� ��

93

� �:

Classroom notes 829

Dow

nloa

ded

By:

[Kin

g Fa

hd U

nive

rsity

of P

etro

leum

and

Min

eral

s] A

t: 06

:26

5 S

epte

mbe

r 200

7

(iii) Since 4þ 5¼ aþ b, 3þ 2¼ xþ (n� x), by the representation (3.4), the

probability that 3 of the 5 voice recorders will be defective is given by

PðX ¼ 3Þ ¼5

3

0@

1A ð4þ 0Þf3g

ð4þ 5Þf3gð0þ 5Þf2g

ð4� 3þ 5Þf2g

¼ 104ð4� 1Þð4� 2Þ

9ð9� 1Þð9� 2Þ

5ð5� 1Þ

6ð6� 1Þ,

i.e.

PðX ¼ 3Þ ¼20

63:

Though the above can be quickly calculated by the combinatorial method as

PðX ¼ 3Þ ¼43

� �52

� ��

95

� �¼

20

63,

it does not provide any insight into conditional probabilities, which is expected

in sampling without replacement.

4. Conclusions

Students with little background in combinatorics, say in elementary schools, may usethe Labelling method provided the lot size or sample size is small. Most readerswould find the Tree Diagram method suitable, although it can be increasinglydifficult to fit on A4 sized paper when the sample size exceeds 3. The multiplicationrule in the Tree Diagram method, although it seems obvious, provides good insightinto conditional probability. The breakdown of the lot size and the sample size in thenumerator and denominator of (3.4) is easily understood by students, and can bedone with or without tree diagrams. The combinatorial method works regardless ofthe lot size or the sample size and whether the items are distinguishable or not, andwould certainly be preferred by readers not acquainted with conditional probabilities(cf. [1], p. 182).

Although the probability functions in (3.4) and (3.5) are algebraically the same,the former provides insight into conditional probabilities, and is analogous to theBinomial Probability function in (1.4).

Acknowledgements

The authors gratefully acknowledge the excellent research support provided byKing Fahd University of Petroleum and Minerals, Saudi Arabia.

830 Classroom notes

Dow

nloa

ded

By:

[Kin

g Fa

hd U

nive

rsity

of P

etro

leum

and

Min

eral

s] A

t: 06

:26

5 S

epte

mbe

r 200

7

Appendix

In the case a!1,N!1 as a/N!p, (0< p<1), then the second term in (3.4) canbe written as

afxg

Nfxg¼

aða� 1Þ � � � ða� xþ 1Þ

NðN� 1Þ � � � ðN� xþ 1Þ

¼a

N

ða� 1Þ=N

ðN� 1Þ=N� � �ða� xþ 1Þ=N

ðN� xþ 1Þ=N:

Each of the above x ratios will tend to p so that

afxg

Nfxg! px:

Similarly the third term in (3.4) can be written as

bfn�xg

N� xfn�xg¼

bðb� 1Þ � � � ðb� ðn� xÞ þ 1Þ

ðN� nÞðN� ðn� 1ÞÞ � � � ðN� nþ 1Þ

¼b=N

ðN� xÞ=N

ðb� 1Þ=N

ðN� x� 1Þ=N� � �ðb� ðn� xÞ þ 1Þ=N

ðN� n� 1Þ=N

¼1� a=N

ðN� xÞ=N

1� ðaþ 1Þ=N

ðN� x� 1Þ=N� � �

1� ðaþ n� x� 1Þ=N

ðN� n� 1Þ=N,

which will tend to (1� p)n� x. Thus (3.4) will have a binomial distribution in the limitwith probability function

PðX ¼ xÞ ¼nx

� �pxð1� pÞx, ðx ¼ 0, 1, . . . , n; 0 < p < 1Þ:

References

[1] Barnett, S., 1998, Discrete Mathematics: Numbers and Beyond (Harlow: PearsonEducation Limited).

[2] Rohatgi, V.K., 1997, Statistical Inference (New York: John Wiley and Sons).[3] Johnson, R., 2005, Miller and Freund’s Probability and Statistics for Engineers

(New Jersey: Pearson Educational International).[4] Lapin, L.L., 1997, Modern Engineering Statistics (New York: International Thompson

Publishing Co.).

Classroom notes 831