a short course on probability and sampling

A short Course on Probability Theory and Sampling, originally prepared as lecture notes for M.Sc. (Geography) students of Vidyasagar Univ, WB, India. Compiled by Dr. A. Kar Gupta, [email protected], Physics Deptt, Panskura B. College, WB, India

PROBABILITY and SAMPLING

Concept of Probability, the Probability Rules, Probability Distributions and Applications

For randomly occurring events, we would like to know how many times we get a desired result out of all trials. This means we would like to know the fraction of favourable events or trials.

Suppose we flip a coin a number of times. We know there is a 50-50 chance of getting a Head or a Tail. We may count how many times there is a Head or a Tail out of all the flips.

Let, $n$ = No. of favourable events and $N$ = Total no. of events.

$n/N$ = fraction of favourable events. We can also say this is the relative frequency in the usual language of Statistics.

Now, if we do the trials a large number of times, this fraction tends to some fixed value specific to the event. The limiting value of the fraction is what we call probability.
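The idea that the relative frequency settles down as the number of trials grows can be tried out numerically. The following is a minimal sketch, assuming a fair coin simulated with Python's pseudo-random generator; the function name and the fixed seed are illustrative choices, not part of the original notes.

```python
import random

def relative_frequency_of_heads(num_flips, seed=0):
    """Flip a simulated fair coin num_flips times and return the
    fraction of Heads, i.e. (favourable events) / (total events)."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    heads = sum(1 for _ in range(num_flips) if rng.random() < 0.5)
    return heads / num_flips

if __name__ == "__main__":
    # The fraction should drift towards 0.5 as the number of flips grows.
    for num_flips in (10, 100, 1000, 100000):
        print(num_flips, relative_frequency_of_heads(num_flips))
```

For a small number of flips the fraction can be far from 0.5; only for a large number of flips does it stay close to the limiting value.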

Note:

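The total no. of trials plays the role of the sample size when we are drawing samples out of a total population: as the no. of trials is increased, the sample becomes bigger. (Strictly speaking, the term sample space means the set of all possible outcomes of a trial, for example Head and Tail for a coin.)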

Definition of Probability:

Probability is the ratio of the number of favourable events to the total number of events, provided the total number of events is very large (strictly, infinite).

$p = \frac{n}{N}$, when $N \to \infty$ (infinity).

So by definition, $p$ is a fraction between 0 and 1: $0 \le p \le 1$.

$p = 0$: No favourable outcome.

$p = 1$: All the outcomes are in favour.

We can also think in the following way: $p$ = probability of occurring an event, $q$ = probability of not occurring the event. Since either the event will occur or not occur, we must write:

$p + q = 1$

Therefore, we have, $q = 1 - p$.

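Example #1:

In a coin toss, we know from our experience, $p_H = \frac{1}{2}$ and $p_T = 1 - p_H = \frac{1}{2}$. So, $p_H + p_T = 1$.

Example #2:

In a throw of a dice, we know that the probability of the dice facing 1 up, 2 up, 3 up etc. will be $p_1 = \frac{1}{6}$, $p_2 = \frac{1}{6}$, $p_3 = \frac{1}{6}$ and so on.

Here, $p_1 + p_2 + p_3 + p_4 + p_5 + p_6 = 1$.

Probability of not occurring 1 is $q_1 = 1 - p_1 = \frac{5}{6}$.

Note:

The condition that the total probability of all the events has to be 1 is called normalization of probabilities: $\sum_i p_i = 1$.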

Rules of Probability:

When more than one event takes place, we need to calculate the joint probability for all the events.

Mutually Exclusive Events

Two events are mutually exclusive (or disjoint) when they cannot occur at the same time. Suppose two events are A and B and the individual probabilities for them are designated as $P(A)$ and $P(B)$. Mutually exclusive means, $P(A \text{ and } B) = 0$.

Addition Rule: $P(A \text{ or } B) = P(A) + P(B)$

Example#1: The probability of occurring either Head or Tail in a coin toss, $P(H \text{ or } T) = \frac{1}{2} + \frac{1}{2} = 1$.

Example#2: The probability of occurring either 1 or 6 in a dice throw, $P(1 \text{ or } 6) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}$.

Independent Events

When the occurrence of one event does not influence the other but they can occur at the same time, they are called independent. For example, rainfall today and Manchester United winning a match.

Multiplication Rule: $P(A \text{ and } B) = P(A) \times P(B)$

Example #1:

What is the probability that two Heads will occur when we toss two coins together?

$P(H) = \frac{1}{2}$ for the first coin and $P(H) = \frac{1}{2}$ for the second coin.

$P(HH) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$.

Note that if we flip a single coin two times and ask for the probability of getting Heads twice, we would get the same answer.

Example #2:

Now we ask the question, what is the probability of getting one Head and one Tail in the flipping of two coins together?

Consider the probability of obtaining Head in the first coin and Tail in the second coin:

$P(HT) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$.

And the probability of obtaining Tail in the first and Head in the second:

$P(TH) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$.

Now the total probability of the above two events (either of them occurs, and they are mutually exclusive):

$P(HT \text{ or } TH) = \frac{1}{4} + \frac{1}{4} = \frac{1}{2}$.

Note that in the flipping of two coins together, there are 4 types of events, HH, HT, TH, TT, out of which the relative occurrence of one Head and one Tail is 2/4 = 1/2.





When Events are NOT Mutually Exclusive:

If the events are not mutually exclusive, there is some overlap. Suppose we designate an area A corresponding to the probability of some event A and an area B to the probability of another event B. The overlap between the two areas then represents the joint probability, $P(A \text{ and } B)$. Note that for two mutually exclusive events the overlap would be zero.

Addition Rule in this case: $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$

When Events are NOT Independent:

Multiplication rule: $P(A \text{ and } B) = P(A) \times P(B|A)$

$P(B|A)$: The probability of B given A. This is a conditional probability, i.e., the probability of occurring B provided A occurs first.

Similarly, $P(A|B)$: The probability of A, given B.

Note here that

$P(B|A) = P(B)$, when B does not depend on A which means A and B are independent.

$P(A|B) = P(A)$, when A does not depend on B which means A and B are independent.

So, we can write the formula for conditional probability:

$P(B|A) = \frac{P(A \text{ and } B)}{P(A)}$





Let us consider the following table and use the probability rules.

In a survey of 100 people, the question was asked whether they are graduates or not.

          Graduate   Non-graduate   Total
Male         40           20          60
Female       10           30          40
Total        50           50         100

Q.1 What is the probability that a randomly selected person is a male?

Ans. $P(M) = \frac{60}{100} = 0.6$

Q.2 What is the probability that a randomly selected person is a female?

Ans. $P(F) = \frac{40}{100} = 0.4$

Q.3 What is the probability that a randomly selected person is a male who is graduate?

Ans. $P(M \text{ and } G) = \frac{40}{100} = 0.4$

[Also we can think, $P(M \text{ and } G) = P(M) \times P(G|M) = \frac{60}{100} \times \frac{40}{60} = 0.4$]

Q.4 What is the probability that a randomly selected person is a female who is non-graduate?

Ans. $P(F \text{ and } NG) = \frac{30}{100} = 0.3$

[Also, $P(F \text{ and } NG) = P(F) \times P(NG|F) = \frac{40}{100} \times \frac{30}{40} = 0.3$]

Q.5 What is the probability that the randomly selected person is either a male graduate or a female non-graduate?

Ans. These two events are mutually exclusive and by the law of addition, $P = 0.4 + 0.3 = 0.7$.

Q.6 If we now select two persons, what is the probability that one of them is a male graduate and another is a female non-graduate?

Ans. Two independent events are occurring together. So by the law of multiplication of probabilities, the probability that the first person is a male graduate and the second is a female non-graduate is $0.4 \times 0.3 = 0.12$.

Q.7 What is the probability that a randomly selected non-graduate is a female? [Prob. of female among non-graduates]

Ans. This is the no. of females out of total non-graduates, $\frac{30}{50} = 0.6$.

Q.8 What is the probability that a randomly selected graduate is a male?

Ans. This is the no. of males out of total graduates, $\frac{40}{50} = 0.8$.

Note: In Q.7 & 8, each probability is a conditional probability. However, we gave the answers by looking at the table directly. Now we answer them in terms of the law of conditional probability.

Ans. to Q.8: Suppose, A = graduate, B = male, $P(B|A)$ = probability of male given that they are graduates.

We use the formula: $P(B|A) = \frac{P(A \text{ and } B)}{P(A)}$

Here, $P(A \text{ and } B)$ = Prob. of male graduates = $\frac{40}{100} = 0.4$, $P(A)$ = prob. of graduates = $\frac{50}{100} = 0.5$.

So, $P(B|A) = \frac{0.4}{0.5} = 0.8$.

Exercise: Q.7 can also be answered in terms of conditional probability formula. Do this and check

yourself.

Q.9 What is the probability that the selected person is either male or graduate?

Ans. Here the two events are not mutually exclusive (a person can be both male and graduate). So we use the formula:

$P(M \text{ or } G) = P(M) + P(G) - P(M \text{ and } G) = 0.6 + 0.5 - 0.4 = 0.7$.
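The answers obtained from the survey table can be cross-checked with a few lines of code. This is a minimal sketch, assuming the counts of the table (40, 20, 10, 30) are stored in a nested dictionary; the helper names are illustrative and not part of the original notes.

```python
from fractions import Fraction

# Survey counts from the table: counts[sex][education]
counts = {
    "male":   {"graduate": 40, "non-graduate": 20},
    "female": {"graduate": 10, "non-graduate": 30},
}
total = sum(sum(row.values()) for row in counts.values())  # 100 people

def p_joint(sex, edu):
    """P(sex and edu): fraction of all people in one cell of the table."""
    return Fraction(counts[sex][edu], total)

def p_sex(sex):
    """Marginal probability of a sex (row total / grand total)."""
    return Fraction(sum(counts[sex].values()), total)

def p_edu(edu):
    """Marginal probability of an education level (column total / grand total)."""
    return Fraction(sum(row[edu] for row in counts.values()), total)

def p_sex_given_edu(sex, edu):
    """Conditional probability P(sex | edu) = P(sex and edu) / P(edu)."""
    return p_joint(sex, edu) / p_edu(edu)

# Q.8: probability that a randomly selected graduate is a male
q8 = p_sex_given_edu("male", "graduate")
# Q.9: addition rule for events that are not mutually exclusive
q9 = p_sex("male") + p_edu("graduate") - p_joint("male", "graduate")

if __name__ == "__main__":
    print(float(q8), float(q9))
```

Exact fractions are used instead of floating point numbers so that the results can be compared directly with the ratios read off the table.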

Probability Distributions

Let us think of the probabilities for a number of events marked 1, 2, 3 and so on.

For each event we can have $0 \le p_i \le 1$ and also for all the events, $\sum_i p_i = 1$ (normalization).

So, we have a set of probabilities corresponding to a set of events. This collection of probabilities is a probability distribution for all the discrete events.

Imagine, instead of discrete events, we have $x$ as a variable which can have continuous values. Also, there is the probability $p(x)$ for each value of $x$. Now if we plot $p(x)$ against $x$, we get a continuous curve which is the continuous probability distribution curve (commonly referred to as the probability distribution curve).

Fig. 3.1

Area under the curve (above the x-axis) can be obtained by summing up the areas of the approximate rectangular bars (which we may easily find by plotting this on a graph paper). Approximate area of one such bar of width $\Delta x$ and height $p(x)$ is = $p(x)\,\Delta x$. So, the approximate total area between the two end points $a$ and $b$ is = $\sum p(x)\,\Delta x$.

To calculate exactly, we need the help of Integral Calculus, which essentially sums up the areas of rectangles (bars) of infinitesimally small width. Those not familiar with Calculus do not have to worry, as the following explanation and symbols can be understood qualitatively, which serves the purpose for now.

The area under the curve (between the two extreme points shown in the above figure) is the following definite integral:

Area = $\int_a^b p(x)\,dx$ = $P$.

$P$ is the total probability for all the values between the two limits. That is why $p(x)$ is often referred to as the probability density. So, $p(x)\,dx$ is the actual probability in between $x$ and $x + dx$, where $dx$ is the infinitesimally small (smaller than you can think) range! Note that the area of the bar of height $p(x)$ and width $dx$ at some position $x$ is $p(x)\,dx$.

As in the discrete case, the area is the sum of the probabilities of all the mutually exclusive events.

[The sum $\sum$ (called sigma) in the discrete case becomes $\int$ (called integral) for the continuous case.]





Also, $\int_{-\infty}^{+\infty} p(x)\,dx = 1$ (Normalization)

Normalization means that the total area under the curve (extended from negative infinity to positive infinity, that is, over the entire stretch of the curve) is unity. This is true because in the discrete case we know that the sum of all the probabilities for all the events should be 1.
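The picture of an area built up from thin rectangular bars can be tried numerically. The sketch below is only an illustration: it assumes the standard Normal density as the example curve (its formula is not given in this section), and the function names and the number of bars are arbitrary choices.

```python
import math

def density(x, mu=0.0, sigma=1.0):
    """Normal probability density p(x), used here only as an example curve."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def area_by_bars(p, a, b, num_bars=10000):
    """Approximate the area under p between a and b by summing p(x) * dx
    over num_bars rectangles, each evaluated at its midpoint."""
    dx = (b - a) / num_bars
    return sum(p(a + (k + 0.5) * dx) * dx for k in range(num_bars))

if __name__ == "__main__":
    print(area_by_bars(density, -8.0, 8.0))   # total area, close to 1
    print(area_by_bars(density, -1.0, 1.0))   # probability of -1 <= x <= 1
```

The limits -8 and 8 stand in for minus and plus infinity; the density is negligibly small beyond them, so the sum comes out very close to 1.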

For discrete events, we calculated the relative frequency and then the Bar diagram from them.

Here for the continuous case, the bars merge together to form a continuous spectrum and that

is the probability distribution. The relative frequencies tend to the probabilities for

corresponding values of the variable for large number of events.

Now given the probability distribution curve, we would like to know about the shape and size of

the curve, some specific quantities that are representative of the character of the event.

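From a discrete data set to a continuous Prob. distribution:

For any discrete set of data, we measure the central tendency of the data set. We commonly calculate the mean, the mean of square and the variance.

Mean: $\bar{x} = \frac{\sum_i f_i x_i}{\sum_i f_i} = \sum_i \left(\frac{f_i}{N}\right) x_i$,

where $f_i$ is the frequency of occurrence for event $x_i$ and we have total frequency, $N = \sum_i f_i$.

[Note: $\frac{f_i}{N}$ = relative frequency]

Mean of Square: $\overline{x^2} = \sum_i \left(\frac{f_i}{N}\right) x_i^2$

Variance: Var($x$) = $\sigma^2$ = $\frac{1}{N} \sum_i f_i (x_i - \bar{x})^2$

= $\sum_i \left(\frac{f_i}{N}\right) (x_i - \bar{x})^2$

= $\sum_i \left(\frac{f_i}{N}\right) x_i^2 - \left[\sum_i \left(\frac{f_i}{N}\right) x_i\right]^2$ = $\overline{x^2} - \bar{x}^2$

Standard deviation is the square root of the variance.

Now for a large number of events, each of the ratios $\frac{f_i}{N}$ in the above formulas becomes the corresponding probability $p_i$: $\frac{f_i}{N} \to p_i$ as $N$ tends to very large.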

Therefore, we write the above quantities in terms of probabilities:

Mean, $\bar{x} = \sum_i p_i x_i$

Mean of Square, $\overline{x^2} = \sum_i p_i x_i^2$

Variance, $\sigma^2 = \overline{x^2} - \bar{x}^2 = \sum_i p_i x_i^2 - \left(\sum_i p_i x_i\right)^2$

Standard deviation, $\sigma = \sqrt{\overline{x^2} - \bar{x}^2}$

If the probabilities $p_1$, $p_2$, etc. are known for the values $x_1$, $x_2$, and so on, we can say that we have a discrete probability distribution. When the values are so closely spaced that we can have probabilities for all possible continuous values of the variable $x$, we can say that there is a function $p(x)$ of $x$ which is called the continuous probability distribution function.

[Note: However, in a practical calculation, when instead of probabilities we are given the frequencies $f_1$, $f_2$, ... for the quantities $x_1$, $x_2$, ... that appear in a data set, we calculate the mean or average: $\bar{x} = \sum_i \left(\frac{f_i}{N}\right) x_i$.]

Expectation Values:

As the probability distribution (no matter discrete or continuous) for some event or some population is known, we may expect what its mean value would be, either through mathematical calculations or through our experience.

[In Statistics, population means the entire or all possible set of data. Taking a few data (which we call a sample) from the population we often try to estimate the mean, which is in general different from the population mean. But we know that with larger and larger sample size, this mean (which we call the sample mean) should tend towards the population mean. This means we expect the population mean. More on this aspect will be discussed in the chapter on Sampling.]

So, the expectation value $\langle x \rangle$ is the mean of $x$. Likewise, we can have the expectation value of any power of $x$.

$\langle x \rangle = \sum_i p_i x_i$, or $\langle x \rangle = \int x\,p(x)\,dx$ [Continuous case]

$\langle x^2 \rangle = \sum_i p_i x_i^2$, or $\langle x^2 \rangle = \int x^2\,p(x)\,dx$ [Continuous case]

Combination Rules:

When we scale a variable, that is, we multiply a variable by a number or add a number to it, we need to know how this scaled variable behaves. Does it have the same statistical measures? Does it follow the same kind of distribution? Also, we ask the same questions for two or more variables when scaled and added together to form a combined variable.

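When $y = ax$:

Mean: $\bar{y} = a\,\bar{x}$

Variance: Var($y$) = $a^2$ Var($x$)

When $y = ax + b$:

Mean: $\bar{y} = a\,\bar{x} + b$

Variance: Var($y$) = $a^2$ Var($x$)

If $x$ has a Normal distribution, $y$ is also a Normal distribution.

When $z = ax + by$ (with $x$ and $y$ independent):

Mean: $\bar{z} = a\,\bar{x} + b\,\bar{y}$

Variance: Var($z$) = $a^2$ Var($x$) + $b^2$ Var($y$)

If $x$ and $y$ are separately Normal distributions, $z$ is then also a Normal distribution.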





Following the combination rules in the above box, we can solve the following problem.

Example:

The weight of individual people follows Normal distribution, . What will be the

probability distribution of weight of 10 people taking together?

Ans. Here, mean , .

Mean weight of 10 people, + = = 40

Variance, + = = 500

The probability distribution of weight of 10 people taking together, .

Normal Distribution:

For many naturally occurring quantities and for random measurement errors in an experiment, the distribution that occurs is, to a good approximation, the Normal distribution. The bell shaped symmetric curve is called the Normal curve. If we calculate the height or age distribution or the distribution of IQ level among a population, the probability distribution often turns out to be close to Normal. The name normal is given as it occurs normally. In Mathematics or Physics literature, it is also called the Gaussian distribution after the great mathematician, Carl Friedrich Gauss.

Properties of Normal Distribution:

A bell shaped symmetric distribution with the peak at the middle. The distribution curve extends from $-\infty$ to $+\infty$ [from minus infinity to plus infinity].

Mean, Mode and Median at the same position (at the peak).

Mean, Mode and Median at the same position (at the peak).





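Area under the curve:

Total area under the curve = 100%

A = 68%, for $\mu - \sigma \le x \le \mu + \sigma$
[Area within one standard deviation ($\sigma$) from the mean ($\mu$) on both sides]

A = 95%, for $\mu - 2\sigma \le x \le \mu + 2\sigma$

A = 99.7%, for $\mu - 3\sigma \le x \le \mu + 3\sigma$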
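The three percentages quoted above can be checked without a table. This is a small sketch that relies on a standard identity not derived in these notes: for a Normal distribution, the area within $k$ standard deviations of the mean equals $\mathrm{erf}(k/\sqrt{2})$. The function name is an illustrative choice.

```python
import math

def area_within(k):
    """Area under a Normal curve within k standard deviations of the mean.
    Uses the identity P(|x - mu| <= k*sigma) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"within {k} sigma: {100 * area_within(k):.1f}%")
```

The result does not depend on the particular values of the mean and the standard deviation, which is why the rule can be stated once for all Normal distributions.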

The Normal distribution is the most commonly observed and the most widely used and discussed. There are various other kinds of distributions which can be identified by their shapes and mathematical expressions.

NOTE:

If we combine a set of Normal distributions, we get a Normal distribution as a result. Consider some $n$ numbers ($x_1, x_2, \ldots, x_n$), each of which is drawn independently from a Normal distribution. Calculate the mean of the numbers: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$. If we draw $n$ numbers again and again, the mean would be different each time, but the mean would follow a Normal distribution. More interestingly, the individual distributions from which the numbers are drawn do not matter (as long as their variance is finite): provided the number $n$ is sufficiently large, the distribution of the mean always turns out to be close to a Normal distribution. This is the Central Limit Theorem.





Experiment with rolling dice:

So, here we roll dice, calculate probabilities of occurring numbers and try to establish some

truth!





Example #1 Throwing of a single dice:

The chance of turning up of any side is equal, which is 1 out of 6. We take these as the a priori probabilities for each case and find out the mean and variance from the following table.

$x$        1     2     3     4      5      6      Total
$p$        1/6   1/6   1/6   1/6    1/6    1/6    1
$p x$      1/6   2/6   3/6   4/6    5/6    6/6    21/6
$p x^2$    1/6   4/6   9/6   16/6   25/6   36/6   91/6

From the table, we can calculate mean, $\bar{x} = \sum p\,x = \frac{21}{6} = 3.5$ and variance, $\sigma^2 = \sum p\,x^2 - \left(\sum p\,x\right)^2 = \frac{91}{6} - \left(\frac{21}{6}\right)^2 = \frac{35}{12} \approx 2.92$.

If we plot $p$ against $x$, we obtain the probability distribution for this case. This distribution is uninteresting as we can check that the probabilities for all values of $x$ are the same! The curve obtained by joining the points will be a horizontal straight line.

Fig.

Now we do this similar experiment taking two dice together.

Example #2 (Two Dice)

We look for the value of $x$ which is the sum of the two numbers on the top faces of the two dice as rolled.

Here we shall have $6 \times 6 = 36$ possible combinations of events and $x$ can have a minimum value, $x = 2$ and maximum value, $x = 12$.

$x$       2     3     4      5       6       7       8       9       10      11      12      Total
$p$       1/36  2/36  3/36   4/36    5/36    6/36    5/36    4/36    3/36    2/36    1/36    1
$p x$     2/36  6/36  12/36  20/36   30/36   42/36   40/36   36/36   30/36   22/36   12/36   252/36
$p x^2$   4/36  18/36 48/36  100/36  180/36  294/36  320/36  324/36  300/36  242/36  144/36  1974/36

Mean, $\bar{x} = \frac{252}{36} = 7$, Variance, $\sigma^2 = \frac{1974}{36} - 7^2 = \frac{35}{6} \approx 5.83$

Now if we plot $p$ against $x$ taking the values from the above table, we get an interesting symmetric distribution around a peak! The peak is at $x = 7$ (mean value).

The distribution is showing a peak at the middle and it is symmetric!
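The two-dice table can be generated by listing all 36 equally likely pairs. The following is a minimal sketch using exact fractions; the function names are illustrative choices and not part of the original notes.

```python
from fractions import Fraction
from itertools import product

def two_dice_distribution():
    """Return {sum: probability} for the sum of two fair six-sided dice."""
    dist = {}
    for a, b in product(range(1, 7), repeat=2):   # all 36 equally likely pairs
        dist[a + b] = dist.get(a + b, Fraction(0)) + Fraction(1, 36)
    return dist

def mean_and_variance(dist):
    """Mean = sum(p*x); variance = sum(p*x^2) - mean^2."""
    mean = sum(p * x for x, p in dist.items())
    mean_sq = sum(p * x * x for x, p in dist.items())
    return mean, mean_sq - mean ** 2

if __name__ == "__main__":
    dist = two_dice_distribution()
    for x in sorted(dist):
        print(x, dist[x])          # fractions are printed in lowest terms
    print(mean_and_variance(dist))
```

Note that Python prints each fraction in its lowest terms, so 6/36 appears as 1/6.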

We can go on doing such experiments taking 3 or more dice together and ask for the sum of values and the corresponding probabilities as above. As the number of dice increases, the distribution becomes smoother and tends more and more towards a definite shape while retaining the peak at the centre.

[In fact, the envelope of the probability values at different $x$ (joining the tops of the height bars) of the discrete distribution will slowly assume a continuous symmetric curve!]

In the limit of a large number of events obtained from a large number of dice thrown together, we tend to get a continuous bell shaped symmetric distribution.

This is the Normal Distribution.

For a large number of independent random observations, the probability distribution for the mean of the observations can be shown to be a Normal distribution. This is called the Central Limit Theorem.
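The dice experiment can also be carried out on a computer. The sketch below is a simulation under stated assumptions (fair dice, Python's pseudo-random generator, a fixed seed, 20000 repetitions); it compares the fraction of sums lying within one and two standard deviations of the mean with the Normal values of about 68% and 95%. All names are illustrative.

```python
import random
import statistics

def dice_sum_samples(num_dice, num_repeats, seed=0):
    """Roll num_dice fair dice num_repeats times; return the list of sums."""
    rng = random.Random(seed)
    return [sum(rng.randint(1, 6) for _ in range(num_dice))
            for _ in range(num_repeats)]

def fraction_within(samples, k):
    """Fraction of samples lying within k standard deviations of their mean."""
    mean = statistics.fmean(samples)
    sd = statistics.pstdev(samples)
    return sum(1 for s in samples if abs(s - mean) <= k * sd) / len(samples)

if __name__ == "__main__":
    for num_dice in (1, 2, 30):
        sums = dice_sum_samples(num_dice, 20000)
        print(num_dice,
              round(fraction_within(sums, 1), 3),
              round(fraction_within(sums, 2), 3))
```

For a single dice every outcome lies within two standard deviations, which is quite unlike a Normal curve; with 30 dice the two fractions come close to the Normal values.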





Shape of a Distribution: Symmetry, Skewness, Kurtosis

Skewness:

A Normal distribution is symmetric around its peak. The peak corresponds to the most probable

value that is the value for which the probability is the maximum. An interesting thing about a

symmetric distribution is that the mean, median and mode are at the same position.

The skewness is any deviation from symmetry or, we can say, lack of symmetry. For a symmetric distribution, skewness is zero.

Coefficient of skewness = $\frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$

The following mathematical definition is often used to measure the skewness:

Skew = $\frac{1}{N} \sum_{i=1}^{N} \left(\frac{x_i - \bar{x}}{\sigma}\right)^3$,

where $\sigma$ is the standard deviation of the distribution. So, we see that the skewness is a dimensionless quantity.

Skewness can be positive or negative. A distribution with a positive value of skewness is called positively skewed, which means the tail of the distribution is more extended towards the more positive values of $x$. On the other hand, a distribution with a negative value of skewness is called negatively skewed, which means the tail is more extended towards more negative values (or lower values) of $x$.

Below are the two figures demonstrating the negative and positive skewness: the distributions are correspondingly called negatively skewed and positively skewed distributions.

(Negative Skewness: Mean < Mode) (Positive Skewness: Mean > Mode)

Kurtosis:





Kurtosis is another kind of measure of the shape of the distribution. It tells us about the

peakedness (how the peak looks like) or flatness of the probability distribution.

A Normal distribution is considered as a standard (or benchmark) in this regard. So, any change

of shape of the peak of a distribution (peakedness or flatness) compared to a Normal

distribution is measured.

The mathematical expression for kurtosis:

Kurt = $\frac{1}{N} \sum_{i=1}^{N} \left(\frac{x_i - \bar{x}}{\sigma}\right)^4 - 3$

Note that the number 3 is subtracted from the expression so as to make the value of kurtosis for a Normal distribution equal to zero. It can be shown that $\frac{1}{N} \sum_{i=1}^{N} \left(\frac{x_i - \bar{x}}{\sigma}\right)^4 - 3 = 0$ for a Normal distribution.

When kurtosis is positive, the peak of the distribution appears sharper relative to a Normal distribution. The distribution is then called leptokurtic. On the other hand, when the kurtosis is negative, the distribution looks flatter compared to a Normal distribution. As the distribution looks almost flat on top, it is called platykurtic. A distribution with zero kurtosis, like the Normal distribution itself, is called mesokurtic.

Fig.
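Both shape measures are easy to compute for a data set. The following is a minimal sketch, assuming the moment definitions Skew = mean of $((x - \bar{x})/\sigma)^3$ and Kurt = mean of $((x - \bar{x})/\sigma)^4 - 3$ with $\sigma$ taken as the population standard deviation (division by $N$); the function name is an illustrative choice.

```python
import math

def skewness_and_kurtosis(data):
    """Return (Skew, Kurt) of a list of numbers.
    Skew = mean of ((x - mean)/sigma)^3
    Kurt = mean of ((x - mean)/sigma)^4 - 3
    sigma is the population standard deviation (divide by N)."""
    n = len(data)
    mean = sum(data) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    skew = sum(((x - mean) / sigma) ** 3 for x in data) / n
    kurt = sum(((x - mean) / sigma) ** 4 for x in data) / n - 3
    return skew, kurt

if __name__ == "__main__":
    print(skewness_and_kurtosis([1, 2, 3, 4, 5]))    # symmetric, flat data
    print(skewness_and_kurtosis([1, 1, 1, 2, 10]))   # long tail to the right
```

A symmetric data set gives zero skewness, while a data set with a long tail towards larger values gives a positive skewness.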

If a distribution has more than one peak

The distribution we discussed (and we shall consider throughout) is a unimodal distribution, that means a distribution which has a single mode or one peak. But in many practical cases, we can have a distribution with many peaks or many modes. For example, a distribution with two peaks (in the fig. below) is called a bimodal distribution.





Z-Distribution

What is a Z-distribution?

A Z-distribution is nothing but a Normal distribution with the peak (mean) at zero and the standard deviation equal to one.

The peak of a Normal distribution is generally at a finite value $\mu$ with a standard deviation $\sigma$ (say). If we consider a new variable

$z = \frac{x - \mu}{\sigma}$,

the given Normal distribution (of variable $x$) becomes another Normal distribution (of variable $z$) with the peak value at $z = 0$ and this is then called the Z-distribution.

[The derivation of Z-distribution is given in the appendix for those who are interested.]

For solving problems with a Normal distribution, it is often advantageous to obtain a Z-distribution and then to consult a Z-table.

In the following, we demonstrate with some examples how that is done.

Consider the following typical situations where we have to calculate the areas from the Z-distribution:

Fig.
(Total area under the curve = 1)

Fig.
(Area between $z = -\infty$ and $z = 0$ is 0.5 or area between $z = 0$ and $z = +\infty$ is 0.5 because of symmetry)

Fig.
(Area between $z = 0$ and any other value $z$)

Fig.
(Area between two positive values of $z$ or between two negative values)

Fig.
(Area between a negative value and a positive value)

Fig.
(Area less than a negative value or greater than a positive value)

Important:

In the z-score table we always look for the area between zero and any other value (as the integral is actually done that way). So, zero is always the reference point.

Finally, the area between any two values of $z$ is obtained by adding or subtracting the scores involving zero. This will be clear from the following examples.

Examples:

(Some typical problems are discussed, consult the z-score table given in the appendix.)

#1. In the Geography examination, the marks distribution is known to be Normal where the mean is 52 and the standard deviation is 15. Determine the z-scores of students receiving marks: (i) 40, (ii) 95, (iii) 52.

Solution: Here, $\mu = 52$, $\sigma = 15$

(i) $z = \frac{40 - 52}{15} = -0.8$

(ii) $z = \frac{95 - 52}{15} = 2.87$

(iii) $z = \frac{52 - 52}{15} = 0$

So, we see the z-scores can be negative, positive or zero.
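The same standardization can be written as a one-line function. This is only a sketch; the function and parameter names are illustrative.

```python
def z_score(x, mean, sd):
    """Standardize a value: z = (x - mean) / sd."""
    return (x - mean) / sd

if __name__ == "__main__":
    # Marks from the example, with mean 52 and standard deviation 15
    for marks in (40, 95, 52):
        print(marks, round(z_score(marks, mean=52, sd=15), 2))
```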





#2. Find the area under the normal curve in each of the following cases:

(i) Between $z = 0$ and $z = 1.2$

Area = 0.3849 from the table.

(ii) Between $z = -0.68$ and $z = 0$

Area = 0.2518

(Note: The area is equal to the area between $z = 0$ and $z = 0.68$ as the curve is symmetric.)

(iii) Area between $z = -0.46$ and 2.21

Area = (area between $z = 0$ and 2.21) + (area between $z = 0$ and -0.46)

= 0.4864 + 0.1772 = 0.6636

(Note: The areas are added as they are on both sides of $z = 0$.)

(iv) Area between $z = 0.81$ and $z = 1.94$

Required area = (area between $z = 0$ and 1.94) - (area between $z = 0$ and 0.81)

= 0.4738 - 0.2910 = 0.1828

(Note: There is the subtraction as the two areas are on the same side of $z = 0$.)

(v) To the left of $z = -0.6$

Required area = 0.5 - (area between $z = 0$ and $z = -0.6$)

= 0.5 - 0.2257 = 0.2743

(vi) To the right of $z = -1.28$

Required area = (area between $z = -1.28$ and $z = 0$) + 0.5

= (area between $z = 0$ and $z = 1.28$) + 0.5

= 0.3997 + 0.5 = 0.8997
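The table look-ups and the add-or-subtract rule can be mimicked in code. This is a sketch under one assumption not derived in these notes: the area between 0 and $z$ under the standard Normal curve equals $\frac{1}{2}\,\mathrm{erf}(|z|/\sqrt{2})$, which is the quantity a z-table lists. The function names are illustrative.

```python
import math

def area_zero_to(z):
    """Area under the standard Normal curve between 0 and z (always >= 0).
    This is the quantity listed in a z-table."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

def area_between(z1, z2):
    """Area between two z values: add the table areas when z1 and z2 lie on
    opposite sides of zero, subtract them when they lie on the same side."""
    a1, a2 = area_zero_to(z1), area_zero_to(z2)
    if z1 * z2 <= 0:
        return a1 + a2
    return abs(a1 - a2)

def area_left_of(z):
    """Area to the left of z."""
    return 0.5 + area_zero_to(z) if z >= 0 else 0.5 - area_zero_to(z)

def area_right_of(z):
    """Area to the right of z."""
    return 1.0 - area_left_of(z)

if __name__ == "__main__":
    print(round(area_between(0, 1.2), 4))
    print(round(area_between(-0.46, 2.21), 4))
    print(round(area_left_of(-0.6), 4))
```

Since the code works with unrounded areas, its results can differ from the table-based answers in the last decimal place.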

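#3. Among 1000 students, the mean score in the final examination is 25 and the standard deviation is 4.0. Assume the distribution is Normal. Find the following.

(a) How many students score between 22 and 27?

$\mu = 25$, $\sigma = 4.0$

$z_1 = \frac{22 - 25}{4} = -0.75$, $z_2 = \frac{27 - 25}{4} = 0.5$

So the probability is the area under the curve between -0.75 and 0.5

= (area between 0 and -0.75) + (area between 0 and 0.5)

= 0.2734 + 0.1915 = 0.4649

The number of students in this marks range = $1000 \times 0.4649 \approx 465$

(b) How many students score above 30?

$z = \frac{30 - 25}{4} = 1.25$

Probability = area to the right of $z = 1.25$

= 0.5 - (area between 0 and 1.25)

= 0.5 - 0.3944 = 0.1056

The number of students = $1000 \times 0.1056 \approx 106$

(c) How many students score below 15?

$z = \frac{15 - 25}{4} = -2.5$

Area = 0.5 - (area between $z = 0$ and -2.5) = 0.5 - 0.4938 = 0.0062

The number of students = $1000 \times 0.0062 \approx 6$

(d) How many score 24?

Here we have to calculate the area between 23.5 and 24.5.

$z_1 = \frac{23.5 - 25}{4} = -0.375 \approx -0.38$, $z_2 = \frac{24.5 - 25}{4} = -0.125 \approx -0.13$

Area between $z = -0.38$ and $z = -0.13$

= (area between 0 and -0.38) - (area between 0 and -0.13)

= 0.1480 - 0.0517 = 0.0963

The number of students = $1000 \times 0.0963 \approx 96$.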
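The four counts in this example can be estimated directly from the mean and the standard deviation, without rounding the z values first. This is a minimal sketch, assuming the Normal cumulative probability is evaluated through the error function `math.erf`; the function names are illustrative.

```python
import math

def normal_cdf(x, mean, sd):
    """P(X <= x) for a Normal distribution with the given mean and sd."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2))))

def expected_count(total, low, high, mean, sd):
    """Expected number (out of total) with low < X <= high.
    Use -math.inf or math.inf for an open-ended range."""
    p = normal_cdf(high, mean, sd) - normal_cdf(low, mean, sd)
    return total * p

if __name__ == "__main__":
    print(expected_count(1000, 22, 27, 25, 4.0))          # part (a)
    print(expected_count(1000, 30, math.inf, 25, 4.0))    # part (b)
    print(expected_count(1000, -math.inf, 15, 25, 4.0))   # part (c)
    print(expected_count(1000, 23.5, 24.5, 25, 4.0))      # part (d)
```

The results are not whole numbers; rounding them to the nearest student gives the counts asked for in the example.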

Binomial Distribution

Before we discuss the Binomial distribution, we should know certain basic mathematical operations. Those who are not familiar with some of the mathematical notations and rules may consult the necessary introduction given in the following Box.

Factorial: $n! = n\,(n-1)\,(n-2) \cdots 3 \cdot 2 \cdot 1$

For example, $4! = 4 \times 3 \times 2 \times 1 = 24$

Consider that factorials of negative integers have no meaning and $0! = 1$.

Note that we can write $n! = n \times (n-1)!$

Permutation: In how many different ways can $n$ objects be arranged among themselves? The answer is the permutation of $n$ objects, $n!$

For example, for three objects A, B, C, the different arrangements are ABC, ACB, BCA, BAC, CAB, CBA: total 6 ways = $3!$

Combination: $\binom{N}{n}$ or ${}^{N}C_{n}$ = $\frac{N!}{n!\,(N-n)!}$

This is the number of ways $n$ objects can be selected from $N$ objects.

For example, if we want to know in how many ways 2 students can be selected from a total of 3 students, the answer is $\binom{3}{2} = \frac{3!}{2!\,(3-2)!} = \frac{3!}{2!\,1!} = 3$.

Also note for quick calculations, $\binom{N}{0} = \frac{N!}{0!\,N!} = 1$, $\binom{N}{1} = \frac{N!}{1!\,(N-1)!} = N$ and $\binom{N}{N} = \frac{N!}{N!\,0!} = 1$.

Binomial Probability:

Suppose the probability of occurring a certain event is $p$ and of not occurring the event is $q = 1 - p$. In a total of $N$ trials, the particular event occurs $n$ times each with probability $p$ and does not occur $(N - n)$ times each with probability $q$. Also, we have to know which $n$ events will occur out of the total $N$ events. The number of ways we can do that is the number of combinations = $\binom{N}{n}$. Consider a variable $x$ which is equal to the relative frequency, $x = \frac{n}{N}$.

As the events are considered independent, the joint probability will be

$P(n) = \binom{N}{n}\, p^n\, q^{N-n}$

The above probability is called binomial probability.

Now consider the following table based on the binomial probability:

$n$       0       1                            2                              ...    $N$
$P(n)$    $q^N$   $\binom{N}{1} p\,q^{N-1}$    $\binom{N}{2} p^2 q^{N-2}$     ...    $p^N$





If we add all the terms of the second row above, we get the following binomial expansion:

$(q + p)^N = q^N + \binom{N}{1} p\,q^{N-1} + \binom{N}{2} p^2 q^{N-2} + \cdots + p^N$     (1)

From the expression (1) above, we can easily check the following known algebraic formulas:

$(q + p)^2 = q^2 + 2qp + p^2$.

$(q + p)^3 = q^3 + 3q^2 p + 3q p^2 + p^3$ ..

The coefficients of the terms on the right of the above can be arranged in the following triangular form which is called Pascal's triangle:

1 1

1 2 1

1 3 3 1

1 4 6 4 1

1 5 10 10 5 1

1 6 15 20 15 6 1

1 7 21 35 35 21 7 1

1 8 28 56 70 56 28 8 1

The Rule:

As indicated above, a number in a row (except the rightmost and leftmost ones) is the sum of the two numbers on its two sides in the preceding row.

So, from the 8th row in Pascal's triangle we can easily write the binomial expansion:

$(q + p)^8 = q^8 + 8q^7p + 28q^6p^2 + 56q^5p^3 + 70q^4p^4 + 56q^3p^5 + 28q^2p^6 + 8qp^7 + p^8$

Remember that each term represents a binomial probability. A binomial distribution is a collection of these discrete binomial probabilities. Note: since $p + q = 1$, all the terms together add up to $(q + p)^N = 1$ (normalization).
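The rule for building Pascal's triangle translates directly into a short loop. This is a minimal sketch; the function name is illustrative, and row 0 is taken to be the single number 1, so that row $n$ holds the coefficients of $(q + p)^n$.

```python
def pascal_row(n):
    """Return row n of Pascal's triangle (the coefficients of (q + p)^n),
    built with the rule: each inner number is the sum of the two numbers
    above it in the preceding row."""
    row = [1]
    for _ in range(n):
        row = [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]
    return row

if __name__ == "__main__":
    for n in range(1, 9):
        print(pascal_row(n))
```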

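Example #1:

Five independent shots are fired at a target. The probability of a hit from each shot is 0.4.

Q. What is the probability that two shots will hit the target?

Ans. Here $N = 5$, $n = 2$, $p = 0.4$, $q = 0.6$

$P(2) = \binom{5}{2} (0.4)^2 (0.6)^3 = \frac{5!}{2!\,3!} \times 0.16 \times 0.216 = 0.3456$

Q. What is the probability that there will be more than two hits?

Ans. Prob. = $P(3) + P(4) + P(5)$ = $\binom{5}{3} (0.4)^3 (0.6)^2 + \binom{5}{4} (0.4)^4 (0.6) + \binom{5}{5} (0.4)^5$

= $\frac{5!}{3!\,2!} (0.4)^3 (0.6)^2 + \frac{5!}{4!\,1!} (0.4)^4 (0.6) + \frac{5!}{5!\,0!} (0.4)^5$

= 0.2304 + 0.0768 + 0.01024

= 0.31744

Q. What is the expectation value of the hits (that is the mean value of hitting the targets out of all five shots)?

Ans. For this we have to calculate the probabilities $P(0)$, $P(1)$, $P(2)$, .. for the corresponding number of hits 0, 1, 2 ..

The expectation value, $\langle n \rangle = \sum_{n} n\,P(n)$

= 0 + $1 \times \binom{5}{1} (0.4) (0.6)^4$ + $2 \times \binom{5}{2} (0.4)^2 (0.6)^3$ + $3 \times \binom{5}{3} (0.4)^3 (0.6)^2$ + $4 \times \binom{5}{4} (0.4)^4 (0.6)$ + $5 \times \binom{5}{5} (0.4)^5$

= 0.2592 + 0.6912 + 0.6912 + 0.3072 + 0.0512 = 2.0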
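The target-shooting example can be reproduced with the binomial probability formula. The following is a minimal sketch, assuming Python 3.8 or later for `math.comb`; the variable and function names are illustrative.

```python
from math import comb

def binomial_probability(n, N, p):
    """P(n) = C(N, n) * p^n * q^(N - n), with q = 1 - p."""
    return comb(N, n) * p ** n * (1 - p) ** (N - n)

N, p = 5, 0.4                                   # five shots, hit probability 0.4
probs = [binomial_probability(n, N, p) for n in range(N + 1)]

p_two_hits = probs[2]                           # exactly two hits
p_more_than_two = sum(probs[3:])                # three, four or five hits
expected_hits = sum(n * P for n, P in enumerate(probs))

if __name__ == "__main__":
    print(round(p_two_hits, 4), round(p_more_than_two, 5), round(expected_hits, 4))
```

The list `probs` holds the whole binomial distribution for five shots, so other questions (for example, the probability of at least one hit) can be answered from the same list.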

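Example #2:

Now, imagine a situation where we toss 8 coins together or we toss one coin 8 times consecutively. We measure the relative occurrence of Head in 8 trials. Let us attach values, Head = 1 and Tail = 0. So, we can think of a variable $x$ which can take values 0, 1/8, 2/8, 3/8, 4/8 and so on. Thus we can associate probabilities for the values of $x$ directly from Pascal's triangle (or by using the formula). Note that probability of occurring Head, $p = \frac{1}{2}$ and not-occurring Head, $q = \frac{1}{2}$.

$P(0) = \left(\frac{1}{2}\right)^8 = \frac{1}{256}$, $P(1/8) = \binom{8}{1} \left(\frac{1}{2}\right) \left(\frac{1}{2}\right)^7 = \frac{8}{256}$

$P(2/8) = \binom{8}{2} \left(\frac{1}{2}\right)^2 \left(\frac{1}{2}\right)^6 = \frac{28}{256}$, $P(3/8) = \binom{8}{3} \left(\frac{1}{2}\right)^3 \left(\frac{1}{2}\right)^5 = \frac{56}{256}$

$P(4/8) = \binom{8}{4} \left(\frac{1}{2}\right)^4 \left(\frac{1}{2}\right)^4 = \frac{70}{256}$, $P(5/8) = \binom{8}{5} \left(\frac{1}{2}\right)^5 \left(\frac{1}{2}\right)^3 = \frac{56}{256}$

$P(6/8) = \binom{8}{6} \left(\frac{1}{2}\right)^6 \left(\frac{1}{2}\right)^2 = \frac{28}{256}$, $P(7/8) = \binom{8}{7} \left(\frac{1}{2}\right)^7 \left(\frac{1}{2}\right) = \frac{8}{256}$

$P(8/8) = \left(\frac{1}{2}\right)^8 = \frac{1}{256}$

If we now plot $P(x)$ against $x$, we get the following symmetric discrete distribution with the peak value at $x = 4/8 = 1/2$.

Fig.

For a large number of trials, this distribution approaches the Normal distribution. Therefore, we can say the following:

The Binomial probability distribution for a random variable approaches the Normal distribution for a large number of trials.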





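The Z-Table

(Each entry is the area under the Z-distribution curve between 0 and $z$. The first column gives $z$ up to the first decimal place and the column heading gives the second decimal place.)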

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.1 0.03983 0.04380 0.04776 0.05172 0.05567 0.05962 0.06356 0.06749 0.07142 0.07535

0.2 0.07926 0.08317 0.08706 0.09095 0.09483 0.09871 0.10257 0.10642 0.11026 0.11409

0.3 0.11791 0.12172 0.12552 0.12930 0.13307 0.13683 0.14058 0.14431 0.14803 0.15173

0.4 0.15542 0.15910 0.16276 0.16640 0.17003 0.17364 0.17724 0.18082 0.18439 0.18793

0.5 0.19146 0.19497 0.19847 0.20194 0.20540 0.20884 0.21226 0.21566 0.21904 0.22240

0.6 0.22575 0.22907 0.23237 0.23565 0.23891 0.24215 0.24537 0.24857 0.25175 0.25490

0.7 0.25804 0.26115 0.26424 0.26730 0.27035 0.27337 0.27637 0.27935 0.28230 0.28524

0.8 0.28814 0.29103 0.29389 0.29673 0.29955 0.30234 0.30511 0.30785 0.31057 0.31327

0.9 0.31594 0.31859 0.32121 0.32381 0.32639 0.32894 0.33147 0.33398 0.33646 0.33891

1.0 0.34134 0.34375 0.34614 0.34849 0.35083 0.35314 0.35543 0.35769 0.35993 0.36214

1.1 0.36433 0.36650 0.36864 0.37076 0.37286 0.37493 0.37698 0.37900 0.38100 0.38298

1.2 0.38493 0.38686 0.38877 0.39065 0.39251 0.39435 0.39617 0.39796 0.39973 0.40147

1.3 0.40320 0.40490 0.40658 0.40824 0.40988 0.41149 0.41308 0.41466 0.41621 0.41774

1.4 0.41924 0.42073 0.42220 0.42364 0.42507 0.42647 0.42785 0.42922 0.43056 0.43189

1.5 0.43319 0.43448 0.43574 0.43699 0.43822 0.43943 0.44062 0.44179 0.44295 0.44408

1.6 0.44520 0.44630 0.44738 0.44845 0.44950 0.45053 0.45154 0.45254 0.45352 0.45449

1.7 0.45543 0.45637 0.45728 0.45818 0.45907 0.45994 0.46080 0.46164 0.46246 0.46327

1.8 0.46407 0.46485 0.46562 0.46638 0.46712 0.46784 0.46856 0.46926 0.46995 0.47062

1.9 0.47128 0.47193 0.47257 0.47320 0.47381 0.47441 0.47500 0.47558 0.47615 0.47670

2.0 0.47725 0.47778 0.47831 0.47882 0.47932 0.47982 0.48030 0.48077 0.48124 0.48169

2.1 0.48214 0.48257 0.48300 0.48341 0.48382 0.48422 0.48461 0.48500 0.48537 0.48574

2.2 0.48610 0.48645 0.48679 0.48713 0.48745 0.48778 0.48809 0.48840 0.48870 0.48899

2.3 0.48928 0.48956 0.48983 0.49010 0.49036 0.49061 0.49086 0.49111 0.49134 0.49158

2.4 0.49180 0.49202 0.49224 0.49245 0.49266 0.49286 0.49305 0.49324 0.49343 0.49361

2.5 0.49379 0.49396 0.49413 0.49430 0.49446 0.49461 0.49477 0.49492 0.49506 0.49520

2.6 0.49534 0.49547 0.49560 0.49573 0.49585 0.49598 0.49609 0.49621 0.49632 0.49643

2.7 0.49653 0.49664 0.49674 0.49683 0.49693 0.49702 0.49711 0.49720 0.49728 0.49736

2.8 0.49744 0.49752 0.49760 0.49767 0.49774 0.49781 0.49788 0.49795 0.49801 0.49807

2.9 0.49813 0.49819 0.49825 0.49831 0.49836 0.49841 0.49846 0.49851 0.49856 0.49861

3.0 0.49865 0.49869 0.49874 0.49878 0.49882 0.49886 0.49889 0.49893 0.49896 0.49900





Sampling

Basic Concept:

What is sampling?

Sampling is to take a subsection of the population for a particular study. The aim is to

select the data sample in order to represent the total data set.

In statistics, population means the total collection of data. When the population or the

entire collection of data is studied, it is called census.

In short, population is the total set and the sample is the subset of it.

Why is sampling done?

When the number of elements in a population is large, it is often not possible to investigate the population completely due to lack of time, money and resources. This is why sampling is necessary.

Sampling is done in such a way that the subset of data represents the entire set.

Example:

If a TV channel wants to know the popularity of a program, it would be expensive to ask everybody's opinion. Instead, a subsection of viewers is interviewed and the data are collected.

Methods of Sampling:

A sample of size $n$ means there are $n$ data points in the collection. A sample of size $n$ is collected from a population of size $N$ in such a way that all the features of the population are well represented by it.

If a sampling method over-represents or under-represents a feature of the population, it is said to be biased. The aim of any selection method is to reduce the chance of bias as far as possible.

There are several methods of sampling; among them the most common is random sampling.

Random sampling:

For a sample of size , we collect -data from the population. We collect many such

samples for our evaluation. If this is done randomly so that each group of size taken




29

from the population has equal chance of getting selected, we call this random sampling.

Sometimes, it is called simple random sampling.

For random sampling, the successive drawings have to be independent.

Let us suppose we want to select a sample of size 100 from a population of size 10000.

In case of random sampling, we decide which elements are to be picked with the help of random numbers (generated in a computer), or by consulting a random number table, or by some kind of dice throwing.
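The selection of 100 elements out of 10000 can be sketched in Python as follows. This is only an illustration: the function name `simple_random_sample` and the `seed` argument are assumptions made here, and the elements are taken to be numbered 1 to 10000.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Pick `sample_size` distinct serial numbers from 1..population_size.

    Every subset of the given size is equally likely to be chosen.
    """
    rng = random.Random(seed)  # the seed only makes the run repeatable
    return rng.sample(range(1, population_size + 1), sample_size)

# Serial numbers of the 100 elements to be picked out of 10000
picked = simple_random_sample(10000, 100, seed=1)
print(sorted(picked)[:10])
```

The computer-generated random numbers play the role of the random number table mentioned above.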

Systematic Sampling:

If simple random sampling from the population is not possible, systematic sampling may be done. First, the population is enumerated from 1 onwards. If a sample of size $n$ is to be obtained from a population of size $N$, every $k$-th item is selected, where $k = N/n$. First a random number between 1 and $k$ is selected and the item with that number is taken as the 1st element. After this, every $k$-th element is taken.

Example:

Follow the table given below.

Sl no.   value
1        20
2        27
3        33
4        21
5        15
6        22
7        45
8        13
9        32
10       29
11       10
12       16

For a sample of size $n = 4$ from these $N = 12$ data, $k = 12/4 = 3$. Select a random number between 1 and 3: choose 2, for example. Start with #2 and then take the data numbered 5, 8 and 11. The sample is then 27, 15, 13, 10.
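The same selection can be sketched in a few lines of Python. The function name `systematic_sample` and its arguments are hypothetical, chosen for this illustration; the data are the 12 values of the table above.

```python
import random

def systematic_sample(values, sample_size, start=None, seed=None):
    """Take every k-th value, with k = N // n, from a random start in 1..k."""
    k = len(values) // sample_size
    if start is None:
        start = random.Random(seed).randint(1, k)  # random first element
    # Serial numbers start, start + k, start + 2k, ... (1-based)
    serials = range(start, len(values) + 1, k)
    return [values[i - 1] for i in serials][:sample_size]

table = [20, 27, 33, 21, 15, 22, 45, 13, 32, 29, 10, 16]
sample = systematic_sample(table, 4, start=2)
print(sample)  # [27, 15, 13, 10]
```

With `start=2` the serial numbers 2, 5, 8, 11 are picked, as in the worked example.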





Stratified Sampling:

In this method, the population is first divided into groups (strata). Each element of the sample belongs to one such group.

Divide the population into $k$ non-overlapping groups containing $N_1, N_2, \ldots, N_k$ data such that $N_1 + N_2 + \cdots + N_k = N$. Next do simple random sampling to collect one or a few elements from each group.

Suppose, a population is classified into several groups according to age or something

like that. Then from each group random samples are collected.

Note: This is also called restricted random sampling.
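A minimal sketch of stratified sampling is given below. The population of 30 labelled individuals, the three age groups and the function name `stratified_sample` are all made up for illustration.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Draw a simple random sample of fixed size from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Hypothetical population: individuals numbered 1..30, grouped by age
strata = {
    "young": list(range(1, 11)),
    "middle": list(range(11, 21)),
    "old": list(range(21, 31)),
}
sample = stratified_sample(strata, per_stratum=2, seed=0)
print(sample)
```

Every group contributes to the sample, which is the point of the method: no stratum can be missed by chance.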

Cluster Sampling:

In this method, like before, the population is divided into groups called clusters. Then some clusters are chosen at random and the elements of the chosen clusters are collected as the sample.

Probability sampling

Any method of sampling that uses (probabilistically) random selection is in general

called probability sampling.

Sampling variation:

When sampling from a population is done, we may take not one sample but different sets of samples of the same size. The samples differ from one another; we call this sampling variation.

In practice, we often draw only one sample or one set of data from a population. But we may not be sure what would happen if we drew several other samples. Will we get the same result? The answer is No. If we look at the mean value, we see that the mean is not the same for all the samples that we are able to draw. We then get a distribution of the sample means.

$N$ = population size, $n$ = sample size, $f = n/N$ = the sample fraction.

Many samples of the same size yield a sampling distribution.

The sampling distributions are usually assumed to follow some well-known probability distribution.

We look for various properties from the distribution curves.

We then see how varying the sample size affects these properties.





From experience and theory, we can say that the variability of sampling distributions decreases as the sample size increases.
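The decrease of variability with sample size can be seen in a small simulation. The population below (10000 values drawn from a Normal distribution with mean 50 and standard deviation 15) and the numbers of samples are arbitrary choices made for this sketch.

```python
import random
import statistics

def spread_of_sample_means(population, sample_size, n_samples, rng):
    """Standard deviation of the means of many samples of the same size."""
    means = [statistics.mean(rng.sample(population, sample_size))
             for _ in range(n_samples)]
    return statistics.stdev(means)

rng = random.Random(42)
population = [rng.gauss(50, 15) for _ in range(10000)]  # made-up population

spread_small = spread_of_sample_means(population, 10, 500, rng)
spread_large = spread_of_sample_means(population, 100, 500, rng)
print(round(spread_small, 2), round(spread_large, 2))
```

The spread of the sample means for samples of size 100 comes out clearly smaller than for samples of size 10.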

SAMPLING DISTRIBUTIONS

What do you do after the sample is collected?

The first thing one can do with a set of data is to measure its central tendency and its spread. Usually, we calculate the mean and variance.

The calculation of mean (or variance) is done over many samples of the same size. Let us suppose we have collected $m$ samples of the same size $n$. The mean values $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_m$ of the various samples are calculated. It is assumed that the grand mean of all these mean values is the actual sample mean, $\bar{x}$.

The mean of the sample means is the estimate of the population mean. Similarly, the variance of the mean values calculated from the set of samples (of equal size $n$) gives an estimate of the population variance divided by $n$.

It can be shown: $E(\bar{x}) = \mu$, $\mathrm{Var}(\bar{x}) = \sigma^2 / n$

Hypothesis Testing

What is Hypothesis?

On the basis of sample information, we make certain decisions about the population. In taking such decisions we make certain assumptions. These assumptions are known as statistical hypotheses.

[ Note: A collected set of data points which is a part of the population (a small number of data) is called a sample. The process of selection is called sampling. When all the data are considered for a study, this is called the population.]

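Sample mean $\bar{x}$ is the unbiased estimate of the population mean, $\mu$.

For the population variance $\sigma^2$, the unbiased estimate is $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{n}{n-1}\left(\frac{\sum x_i^2}{n} - \bar{x}^2\right)$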
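As a quick check of the $(n-1)$ divisor, the sketch below computes the unbiased estimate by hand and compares it with `statistics.variance` from the Python standard library, which uses the same divisor. The eight data values are invented for the illustration.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample
n = len(data)
mean = sum(data) / n

# Unbiased estimate of the population variance: divide by (n - 1)
s2 = sum((x - mean) ** 2 for x in data) / (n - 1)

# Dividing by n instead gives a smaller (biased) value
biased = sum((x - mean) ** 2 for x in data) / n

print(mean, round(s2, 4), biased)
print(round(statistics.variance(data), 4))  # same as s2
```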





How to test a Hypothesis?

Assuming the hypothesis to be correct, we calculate the probability of getting the observed sample. If this probability is less than a certain assigned value, the hypothesis is rejected.

The hypothesis that there is no significant difference between the observed value and the expected value is called the Null Hypothesis.

Test of significance:

The tests which enable us to decide whether to accept or to reject the null hypothesis are called tests of significance. If the differences between the sample values and the population values are significantly large, the null hypothesis is to be rejected (i.e., Hypothesis is not Null).

It is known that the mean $\bar{x}$ of a sample is an unbiased estimate of the population mean $\mu$. It is called a point estimate. But we know that if we collect different samples, the mean ($\bar{x}$) varies from sample to sample. The means of the samples form a distribution which we call the sampling distribution. Note that the sampling distribution is Normal if the variable in the population is normally distributed.

Now the question is, how close is a calculated mean to the population mean? We have to

estimate that with some level of accuracy.

Confidence Interval:

Confidence interval is a range of values over which we can trap the population mean with some

probability. So, we consider the probability distribution of sample means in order to find that

probability of trapping.

Suppose we have a sample mean $\bar{x}$ and we consider a symmetric interval around this: $[\bar{x} - a,\; \bar{x} + a]$, where $a$ is a value that we shall determine.

If $|\bar{x} - \mu| \le a$, the confidence interval traps the population mean $\mu$.

How to calculate confidence interval?





Suppose, the variable $X$ follows a Normal distribution, with mean $\mu$ and standard deviation $\sigma$.

Symbolically, $X \sim N(\mu, \sigma^2)$

So, for a sample of size $n$, $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.

This means that the distribution of the mean ($\bar{x}$) of samples of size $n$ follows a Normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.

If the confidence interval is 95%, the interval has a probability 0.95 to trap the population mean: $P(|\bar{x} - \mu| \le a) = 0.95$

Now as an example, consider a sampling distribution with $\mu = 0$, $\sigma/\sqrt{n} = 1$.

Here $\bar{x}$ follows the z-distribution, $Z \sim N(0, 1)$. [Normal distribution with mean = 0, stand. dev. = 1]

Now let us look up the z-table. The total area under the curve is 100% which gives us the total probability = 1. The shaded area (as in the fig.) is 95% of the total area which corresponds to probability = 0.95.

The half of the shaded area = 0.95/2 = 0.475 as it is symmetric around zero.

In the z-distribution [$Z \sim N(0, 1)$], we now find the value of $z$ from the z-table, where the area from $0$ to $z$ is 0.475.

$z = 1.96$, which we consider as the critical value, $a = 1.96$.





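Thus 95% confidence interval for the example above: $[\bar{x} - 1.96,\; \bar{x} + 1.96]$

In general, if the sample mean is $\bar{x}$ and the sample size is $n$, the confidence interval: $\left[\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\; \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right]$

So we can say with 95% confidence level that the population mean lies in this interval.

Let us now calculate the width of the confidence interval for 95% confidence: width $= 2 \times 1.96\frac{\sigma}{\sqrt{n}} = 3.92\frac{\sigma}{\sqrt{n}}$

So, we can see that the width of the interval decreases with the increase of sample size. That is, we can narrow down the search for the population mean as we take a larger sample size. Then we can say with more accuracy that our measured mean is closer to the population mean.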

For example, when $n$ is made four times larger, the width $3.92\,\sigma/\sqrt{n}$ of the 95% confidence interval is halved.
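The dependence of the width on the sample size can be tabulated with a short Python sketch. The value $\sigma = 1$ and the sample sizes 25, 100 and 400 are assumed here purely for illustration; the function name `ci_width` is hypothetical.

```python
import math

def ci_width(sigma, n, z=1.96):
    """Width of the confidence interval: 2 * z * sigma / sqrt(n)."""
    return 2 * z * sigma / math.sqrt(n)

sigma = 1.0  # assumed population standard deviation
for n in (25, 100, 400):
    print(n, round(ci_width(sigma, n), 3))
```

Each fourfold increase of $n$ halves the width, since the width is proportional to $1/\sqrt{n}$.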

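For 98% Confidence interval:

Shaded area = 0.98. Half the shaded area = 0.98/2 = 0.49, which is between $z = 0$ and $z = 2.326$.

Thus 98% confidence interval is $\left[\bar{x} - 2.326\frac{\sigma}{\sqrt{n}},\; \bar{x} + 2.326\frac{\sigma}{\sqrt{n}}\right]$.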

NOTE #1:

Symbolically, it is often said that the confidence level is $(1 - \alpha)$, where $\alpha$ is the significance level.

For example, for $\alpha = 0.05$: Confidence level = 95%, Significance level = 5%. Confidence level and significance level are complementary.

NOTE: For a sample of size $n$, with population variance $\sigma^2$, a 95% confidence interval means $\left[\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\; \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right]$

NOTE #2:

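When we are not sure if the population is Normal and we do not know the population variance $\sigma^2$, we can still use the method of calculating the confidence interval by considering the variance of a large sample (usually $n \ge 30$):

$s^2 = \frac{n}{n-1}\left(\frac{\sum x^2}{n} - \bar{x}^2\right)$

Then we consider the interval, $\left[\bar{x} - z\frac{s}{\sqrt{n}},\; \bar{x} + z\frac{s}{\sqrt{n}}\right]$, where $z$ is the critical value for the chosen confidence level (1.96 for 95%).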

Student's t-test:

This is applied to find the confidence interval for a small sample. The population is Normal.

Consider the variable $t$ defined as $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$

[Note here, we use $s$, calculated for the sample, instead of $\sigma$.]

The value of the variable $t$ varies from sample to sample and thus it forms a distribution looking very similar to the Normal distribution. This is the t-distribution. As we take larger and larger samples, the t-distributions become closer and closer to a Normal distribution, $N(0, 1)$, which is nothing but the z-distribution.

Confidence Level   z
90%                1.645
95%                1.96
98%                2.326
99%                2.576
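The z-values in the table above can be reproduced with the Python standard library instead of the z-table. This is a sketch; the helper name `z_critical` is an assumption, while `statistics.NormalDist` and its `inv_cdf` method are part of Python 3.8 and later.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value of the standard Normal distribution.

    For a central area `confidence`, the area to the left of the
    upper critical value is (1 + confidence) / 2.
    """
    return NormalDist().inv_cdf((1 + confidence) / 2)

for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}  {z_critical(level):.3f}")
```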





Now instead of sample size, the family of $t$-distributions is characterized by a parameter called degrees of freedom (df), usually denoted by $\nu$ [Nu].

Degrees of freedom = No. of independent values used for calculation of $s^2$.

For example, if $n$ is the sample size, we use $n$ data points but they are related by their mean, $\bar{x} = \frac{1}{n}\sum x_i$ or $\sum (x_i - \bar{x}) = 0$. Such a condition in the form of a relation or equation is called a constraint. Thus we have $(n - 1)$ independent quantities and this is the degrees of freedom here.

Degrees of freedom, $\nu$ = Number of values $-$ Number of constraints

In this case, $\nu = n - 1$

The $t$-distributions are now designated as $t_\nu$-distributions. As $\nu$ gets higher, the $t_\nu$-distributions tend more and more towards the z-distribution.

Like the z-table, we now have a $t$-table to consult, from where we have the area under the curve within some $t$-range.

So, for a Normal distribution, for a sample of size $n$, we have the confidence interval:

$\left[\bar{x} - t\frac{s}{\sqrt{n}},\; \bar{x} + t\frac{s}{\sqrt{n}}\right]$ for a confidence level $(1 - \alpha)$, for $\nu = n - 1$ degrees of freedom.





EXAMPLE #1:

Consider the following 10 measurements of some variable. The hypothesis is that the

population mean is . We have to verify that. Assume that the readings follow a Normal

Distribution.

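No. of Obs.   1      2      3      4      5      6      7      8      9      10
Values        0.13   -0.09  0.06   0.15   -0.02  0.03   0.01   -0.02  -0.07  0.05

$\bar{x} = \frac{\sum x}{n} = \frac{0.23}{10} = 0.023$

$s^2 = \frac{n}{n-1}\left(\frac{\sum x^2}{n} - \bar{x}^2\right) = \frac{10}{9}\left(\frac{0.0603}{10} - 0.023^2\right) \approx 0.00611$, so $s \approx 0.0782$

Degrees of freedom, $\nu = n - 1 = 9$

From the $t$-table for $\nu = 9$ with 95% confidence level, we have $t = 2.262$.

Confidence interval: $\left[0.023 - 2.262 \times \frac{0.0782}{\sqrt{10}},\; 0.023 + 2.262 \times \frac{0.0782}{\sqrt{10}}\right] \approx [-0.033,\; 0.079]$

The hypothesised mean is trapped inside the above interval. So the hypothesis is accepted: null hypothesis.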
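The arithmetic of this example can be checked with a short Python sketch. The critical value 2.262 is typed in from the $t$-table (9 degrees of freedom, 95% two-sided), since the standard library has no t-distribution; the variable names are chosen here for illustration.

```python
import math
import statistics

values = [0.13, -0.09, 0.06, 0.15, -0.02, 0.03, 0.01, -0.02, -0.07, 0.05]
n = len(values)

mean = statistics.mean(values)
s = statistics.stdev(values)   # square root of the (n - 1) variance estimate
t_crit = 2.262                 # from the t-table: 9 d.o.f., 95% confidence

half_width = t_crit * s / math.sqrt(n)
lower, upper = mean - half_width, mean + half_width
print(round(mean, 3), round(lower, 3), round(upper, 3))
```

Any hypothesised population mean lying between `lower` and `upper` is accepted at this confidence level.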

EXAMPLE #2:

The mean life time (in Hours) of an electric bulb is measured to be 10.4. Now a technology is

introduced to increase the life time. The experimental data collected from a random sample of

size , , . Test whether there is any evidence at the 10%

significance level that the new technology has actually increased the life time.

[Note that it is not asked if there is any decrease in life time. The question is to ask whether

there is any increase or it remains the same.]

Ans.

Null hypothesis, $H_0: \mu = 10.4$; Alternate hypothesis, $H_1: \mu > 10.4$

Here we consider a one tail t-test as we are to look for the increase only.

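Sample mean, $\bar{x} = \frac{\sum x}{n}$

Unbiased estimate of the population variance (from the sample), $s^2 = \frac{n}{n-1}\left(\frac{\sum x^2}{n} - \bar{x}^2\right)$

For the t-test, $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$, with $\mu = 10.4$.

Here, degrees of freedom, $\nu = n - 1$. So we look for the area under the curve for the $t_\nu$ distribution.

For 10% significance, i.e. for 90% confidence level, we find the critical value from the $t$-table for a one tail test. Our observed value of $t$ is above it and thus lies in the rejection region. That means the null hypothesis is rejected and the mean life time is increased: alternate hypothesis.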

EXAMPLE #3:

You are measuring some length which is 10 cm. Five measurements by you are 9.88, 10.18, 10.23, 10.39, 10.25 cm. Assume that the measurements follow a Normal distribution. Test at the 5% significance level whether these results support the claim or the measurement is biased.

Ans.

Since the bias can be in either direction (positive or negative), we consider a two tail test.

The Hypothesis, Null $H_0: \mu = 10$ cm, Alternate $H_1: \mu \neq 10$ cm.

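Sample mean, $\bar{x} = \frac{\sum x}{n} = \frac{50.93}{5} = 10.186$ cm

Variance, $s^2 = \frac{n}{n-1}\left(\frac{\sum x^2}{n} - \bar{x}^2\right) = \frac{5}{4}\left[\frac{518.9143}{5} - 10.186^2\right] \approx 0.0353$; this is an unbiased estimate of the population variance.

For 5% significance level, we consider the area of 0.95 (shaded area in the fig.) around the centre, and an area of 0.025 on each side (at both the tails).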





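We consider the $t_4$ distribution as the degrees of freedom, $\nu = n - 1 = 4$. The rejection region on either side corresponds to $|t| > 2.776$, from the table.

Here $t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{10.186 - 10}{\sqrt{0.0353/5}} \approx 2.21$. We find that the t-value is below the rejection region, that is, in the acceptance region. Thus the hypothesis ($\mu = 10$ cm) is accepted. Null hypothesis.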
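The two tail test of this example can be sketched in Python as below. The critical value 2.776 is typed in from the $t$-table (4 degrees of freedom, 5% significance, two tails); the variable names are illustrative.

```python
import math
import statistics

lengths = [9.88, 10.18, 10.23, 10.39, 10.25]  # measurements in cm
mu0 = 10.0                                    # value claimed by the null hypothesis
n = len(lengths)

mean = statistics.mean(lengths)
s = statistics.stdev(lengths)                 # uses the (n - 1) divisor
t_value = (mean - mu0) / (s / math.sqrt(n))

t_crit = 2.776                                # from the t-table, 4 d.o.f.
reject = abs(t_value) > t_crit
print(round(mean, 3), round(t_value, 2), reject)
```

Since `reject` comes out `False`, the null hypothesis is retained, in agreement with the conclusion above.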

Chi-Squared Test: ($\chi^2$-Test)

In some measurements, we obtain the frequencies of some events. We call them observed frequencies ($O_i$). We have to test whether the observed frequencies are consistent with the expected frequencies ($E_i$) according to some given distribution or hypothesis.

The measure of discrepancy between the observed and expected frequencies is defined by the following quantity:

$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$

NOTE:

In a one tail t-test, we consider only one side of the t-distribution, either the right side (for increase or positive values) or the left side (for decrease or negative values). For a two tail t-test, we consider both sides of the distribution (as we have done before), considering the fact that the value of the variable can increase or decrease from the mean value.





Note: $\chi^2$ (Chi-square) is a positive quantity; the lower its value, the better is the agreement between the observed and expected frequencies. In other words, it gives a goodness of fit of the model or hypothesis. For $\chi^2 = 0$, the agreement is absolute.

Like the t-distribution, we also have the $\chi^2$-distribution. We measure the $\chi^2$ values for different samples of the same size and obtain a distribution. The distribution, here also, is characterized by the degrees of freedom $\nu$. So for $\nu$ degrees of freedom we write $\chi^2_\nu$.

EXAMPLE #1:

In a dice throw experiment, we obtain the following figures, where the dice was thrown 600 times.

Score 1 2 3 4 5 6

Freq. 90 108 110 95 100 97

Let us check the above with respect to the $\chi^2$-test. Our hypothesis is that for each score, the probability = 1/6 (for a fair dice). So the expected frequency = $600 \times \frac{1}{6} = 100$.

Hypothesis, $H_0$: the dice is fair, $H_1$: the dice is not fair.

In this example, the degrees of freedom, $\nu = 6 - 1 = 5$.

So after we calculate $\chi^2$ from the following table, we have to look for the $\chi^2$-table.

Score   O_i   E_i   O_i - E_i   (O_i - E_i)^2 / E_i
1       90    100   -10         1
2       108   100   8           0.64
3       110   100   10          1
4       95    100   -5          0.25
5       100   100   0           0
6       97    100   -3          0.09
Total   600   600   0           2.98





To calculate $\chi^2$:

From the table, we see $\chi^2 = 2.98$. If we consider 90% confidence level, we have $\chi^2_5(0.90) = 9.236$. Our obtained value for $\chi^2$ is below this. So it falls within the acceptance region. The dice is fair; the null hypothesis is accepted.
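The $\chi^2$ value of the dice example can be computed with a few lines of Python. The function name `chi_squared` is chosen here for illustration, and the critical value 9.236 is typed in from the $\chi^2$-table (5 degrees of freedom, 10% significance).

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [90, 108, 110, 95, 100, 97]
expected = [sum(observed) / 6] * 6   # fair dice: 600 * 1/6 = 100 for each score

chi2 = chi_squared(observed, expected)
critical = 9.236                     # from the chi-squared table, 5 d.o.f.
print(round(chi2, 2), chi2 < critical)
```

The same function works for any table of observed and expected frequencies, for instance the blood group data of the next example.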

EXAMPLE #2:

In a genetic study, it is predicted that the children with both parents of blood group AB will fall into blood groups AB, A and B in the ratio 2:1:1. Out of a random sample of 100, we find 55 children have blood group AB, 27 have blood group A and 18 have blood group B. Test at 10% significance level whether the observed results agree with the theoretical prediction.

Ans.

Hypothesis $H_0$: The children's blood group is in ratio 2:1:1

$H_1$: The children's blood group is NOT in ratio 2:1:1

The ratio of probabilities AB, A, B is 2:1:1 = $\frac{1}{2} : \frac{1}{4} : \frac{1}{4}$, so the expected frequencies out of 100 are 50, 25 and 25.

Blood group   O_i   E_i   O_i - E_i   (O_i - E_i)^2 / E_i
AB            55    50    5           0.5
A             27    25    2           0.16
B             18    25    -7          1.96
Total         100   100   0           2.62

Degrees of freedom $\nu = 3 - 1 = 2$

For 10% significance level we look for the $\chi^2$-distribution table: $\chi^2_2(0.90)$

We find $\chi^2_2(0.90) = 4.605$

The rejection region is thus above 4.605.

Here the obtained value of $\chi^2 = 2.62$ is below the rejection region, so it falls in the acceptance region. The hypothesis is correct. Null Hypothesis!

EXAMPLE #3

The rain fall ($x$) at some place is measured in cm in the following table. We assume that $x$ is a random variable and it follows a Normal distribution with mean $\mu = 50$ and standard deviation $\sigma = 15$.

Rainfall x (cm)   x < 35   35 - 45   45 - 55   55 - 65   x ≥ 65
Obs. Freq.        10       18        28        18        12

(i) Calculate the expected frequencies of the different classes.

(ii) Carry out a goodness of fit analysis at the 5% level of significance and test the hypothesis that the random variable actually follows the Normal distribution $N(50, 15^2)$.

Ans.

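(i) $z = \frac{x - \mu}{\sigma} = \frac{x - 50}{15}$

For $x$ = 35, 45, 55, 65 we have $z$ = -1, -0.333, 0.333, 1 respectively.

Now follow the z-table.

For $x < 35$, we have $P(z < -1) = 0.5 - 0.3413 = 0.1587$,

Expected frequency = $86 \times 0.1587 = 13.65$

Here, total frequency = 10 + 18 + 28 + 18 + 12 = 86

For $35 \le x < 45$, $P(-1 < z < -0.333) = 0.3413 - 0.1304 = 0.2109$,

Expected frequency = $86 \times 0.2109 = 18.14$

For $45 \le x < 55$, $P(-0.333 < z < 0.333) = 2 \times 0.1304 = 0.2608$,

Expected frequency = $86 \times 0.2608 = 22.43$

By symmetry, the expected frequencies for the 4th and 5th groups are 18.14 and 13.65 respectively.

To carry out the $\chi^2$-test we prepare the following table.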





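Class (x in cm)   O_i   E_i     O_i - E_i   (O_i - E_i)^2 / E_i
x < 35            10    13.65   -3.65       0.98
35 - 45           18    18.14   -0.14       0.00
45 - 55           28    22.43   5.57        1.38
55 - 65           18    18.14   -0.14       0.00
x ≥ 65            12    13.65   -1.65       0.2
Total             86    86.01   -0.01       2.56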

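Here, $\chi^2 = 2.56$ and the degrees of freedom, $\nu = 5 - 1 = 4$.

From the $\chi^2$-distribution table, $\chi^2_4(0.95) = 9.488$, for 5% significance level.

Since 2.56 is not in the rejection region, the data follows the Normal distribution, $N(50, 15^2)$.

Null hypothesis.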
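The expected frequencies and the $\chi^2$ value of this example can be cross-checked with the sketch below, which uses `statistics.NormalDist` in place of the z-table. The mean 50 and standard deviation 15 are the values implied by the z-values quoted in the answer; because the code does not round the table probabilities, its results differ slightly in the second decimal place from the hand calculation.

```python
from statistics import NormalDist

observed = [10, 18, 28, 18, 12]
edges = [35, 45, 55, 65]             # class boundaries in cm
dist = NormalDist(mu=50, sigma=15)
total = sum(observed)                # 86

# Cumulative probabilities at the class boundaries, padded with 0 and 1
cdf = [0.0] + [dist.cdf(e) for e in edges] + [1.0]
expected = [total * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print([round(e, 2) for e in expected], round(chi2, 2))
```

The computed $\chi^2$ is again far below the critical value 9.488, so the conclusion is unchanged.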

Additional Information

Type I and Type II errors:

In case of Hypothesis testing, we call

Type I Error -> When we incorrectly reject the true Null Hypothesis.

Type II Error -> When we fail to reject the false Null Hypothesis.

Probability Density Function:

In Probability theory, the probability density function (P.D.F.) of a continuous random variable gives the probability per unit interval around a certain value. The P.D.F., when integrated over a finite interval, gives the probability that the variable lies in that interval.

$P(a \le x \le b) = \int_a^b f(x)\,dx$, where $f(x)$ is the P.D.F.
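As an illustration, the sketch below integrates the P.D.F. of the standard Normal distribution numerically from -1.96 to 1.96 and recovers the probability 0.95 used for the 95% confidence interval. The trapezoidal rule and the helper name `integrate` are choices made for this example only.

```python
from statistics import NormalDist

def integrate(f, a, b, steps=10000):
    """Trapezoidal rule for the integral of f from a to b."""
    h = (b - a) / steps
    inner = sum(f(a + i * h) for i in range(1, steps))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))

# Area under the standard Normal P.D.F. between z = -1.96 and z = 1.96
area = integrate(NormalDist().pdf, -1.96, 1.96)
print(round(area, 3))
```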





*The Lecture notes are for private circulation only. Some of the ideas and examples are taken from some books and numerous materials available on the internet.

Books and Websites:

1. Advanced Level Mathematics: STATISTICS 1 & 2 Steve Dobbs and Jane Miller

(Pub: CAMBRIDGE International Examinations)

2. The Analysis of Time Series An Introduction (fifth edition) C. Chatfield (Pub:

Chapman & Hall)

3. Basic & Clinical Biostatistics (fourth edition) Beth Dawson, Robert G. Trapp (Pub: Lange

Medical Books/ McGraw-Hill)

4. Numerical Recipes (2nd Ed, Vol I FORTRAN) William H. Press, Saul A. Teukolsky, William

T. Vetterling, Brian P. Flannery (Pub: Cambridge University Press)

5. Mathematical Physics, H.K. Dass

6. Website: people.richland.edu
