binomial distribution


Upload: kimberly-cross

Post on 11-Jan-2016


Page 1: Binomial Distribution

Binomial distribution

From Wikipedia, the free encyclopedia

"Binomial model" redirects here. For the binomial model in options pricing, see Binomial options pricing model.

See also: Negative binomial distribution

The probability that a ball in a Galton box with 8 layers (n = 8) ends up in the central bin (k = 4) is $\binom{8}{4}/2^{8} = 70/256$.

In probability theory and statistics, the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. A success/failure experiment is also called a Bernoulli experiment or Bernoulli trial; when n = 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of statistical significance.

The binomial distribution is frequently used to model the number of successes in a sample of size n drawn with replacement from a population of size N. If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a hypergeometric distribution, not a binomial one. However, for N much larger than n, the binomial distribution is a good approximation, and widely used.
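To make the last point concrete, here is a minimal Python sketch comparing the exact without-replacement probability (hypergeometric) with its binomial approximation. The population sizes N, the success share of 0.3 and the helper names are illustrative assumptions, not taken from the source.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p): n draws made with replacement."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeometric_pmf(k: int, n: int, K: int, N: int) -> float:
    """P(X = k) when n items are drawn without replacement from N items, K of them successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


n, k, share = 10, 3, 0.3  # sample size, successes of interest, success share in the population

for N in (20, 200, 10_000):
    K = round(share * N)  # number of successes in the population
    exact = hypergeometric_pmf(k, n, K, N)
    approx = binomial_pmf(k, n, K / N)
    print(f"N={N:>6}: hypergeometric={exact:.4f}  binomial={approx:.4f}  diff={abs(exact - approx):.4f}")
```

With N = 20 the two differ noticeably; by N = 10,000 (N much larger than n) the gap is negligible.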

Binomial distribution for p = 0.5, with n and k as in Pascal's triangle

Definition: A polynomial with two terms, usually joined by a plus or minus sign, is called a binomial. Binomials are used in algebra. A polynomial with one term is called a monomial.

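Plots: probability mass function and cumulative distribution function

Notation: B(n, p)

Parameters: n ∈ N0, the number of trials; p ∈ [0, 1], the success probability in each trial

Support: k ∈ { 0, …, n }, the number of successes

pmf: $\binom{n}{k} p^{k} (1-p)^{n-k}$

CDF: $I_{1-p}(n-k,\ 1+k)$, the regularized incomplete beta function

Mean: $np$

Median: $\lfloor np \rfloor$ or $\lceil np \rceil$

Mode: $\lfloor (n+1)p \rfloor$ or $\lceil (n+1)p \rceil - 1$

Variance: $np(1-p)$

Skewness: $\frac{1-2p}{\sqrt{np(1-p)}}$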
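The quantities in the summary can be checked numerically. This is a minimal sketch with hypothetical helper names; the CDF is computed by summing the pmf rather than through the incomplete beta function, and the Galton-box check assumes each peg deflects the ball left or right with probability 1/2.

```python
from math import comb, floor, sqrt


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Probability of at most k successes: the pmf summed from 0 to k."""
    return sum(binomial_pmf(i, n, p) for i in range(0, min(k, n) + 1))


def binomial_summary(n: int, p: float) -> dict:
    """Mean, variance, skewness and one mode of B(n, p)."""
    variance = n * p * (1 - p)
    return {
        "mean": n * p,
        "variance": variance,
        "skewness": (1 - 2 * p) / sqrt(variance),  # division by zero when p is 0 or 1
        "mode": min(floor((n + 1) * p), n),  # capped at n so that p = 1 stays in the support
    }


# Galton box from the caption: 8 layers, central bin k = 4
print(binomial_pmf(4, 8, 0.5), 70 / 256)
print(binomial_summary(8, 0.5))
```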


A monomial could look like 7x. A polynomial with two terms is called a binomial; it could look like 3x + 9. Binomials are easy to remember: bi means two, and a binomial has two terms.

Example: 3x + 4 is a binomial.

A binomial is a polynomial.

Examples of Polynomials

3x - 2

Monomials: 3x, -2

5xy² + 36xy + 52x + 6

Monomials: 5xy², 36xy, 52x, 6

πr² + 2πh

Monomials: πr², 2πh

When to Multiply Polynomials

The instructions will ask you to multiply or simplify exercises that look like this:

(Polynomial) × (Polynomial)

(Polynomial) • (Polynomial)

(Polynomial) * (Polynomial)


(Polynomial) x (Polynomial)

(Polynomial)(Polynomial)

Note: When there is no multiplication symbol between two sets of parentheses, you are being asked to multiply.

When Not to Multiply Polynomials

(Polynomial) + (Polynomial)

(Polynomial) − (Polynomial)

Yes, I understand that parentheses encompass the polynomials, but pay attention to what the exercise is asking you to do.

(3x + 5y) + (2x + -6y) does not equal (3x + 5y)(2x + -6y).

Practice with Constants

Multiply:  (8 + 6)(-2 + 5)

Use order of operations:

1. Parentheses

(8+ 6) = 14

(-2 + 5) = 3

2. Multiply

14 * 3 = 42

Introducing FOIL

Here’s another way of looking at it:

FOIL is a method for multiplying two binomials. It is an acronym for First, Outer, Inner, Last.

1. First: (8 + 6)(-2 + 5). Multiply the first terms: 8 * -2 = -16

2. Outer: (8 + 6)(-2 + 5). Multiply the outer terms: 8 * 5 = 40

3. Inner: (8 + 6)(-2 + 5). Multiply the inner terms: 6 * -2 = -12

4. Last: (8 + 6)(-2 + 5). Multiply the last terms: 6 * 5 = 30

Next, add the results: -16 + 40 + -12 + 30

Simplify: -16 + 40 + -12 + 30 = 42
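The four FOIL steps can be written as a tiny Python function. The name `foil` and the argument order, for (a + b)(c + d), are illustrative choices, not part of the original lesson.

```python
def foil(a: float, b: float, c: float, d: float) -> tuple:
    """Expand (a + b)(c + d) with FOIL; return the four partial products and their sum."""
    first = a * c  # first terms
    outer = a * d  # outer terms
    inner = b * c  # inner terms
    last = b * d   # last terms
    return (first, outer, inner, last), first + outer + inner + last


parts, total = foil(8, 6, -2, 5)
print(parts, total)  # (-16, 40, -12, 30) 42

# FOIL agrees with simplifying inside the parentheses first
assert total == (8 + 6) * (-2 + 5)
```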

Now, let's practice multiplying polynomials with variables.

Practice with Positives

Simplify the following polynomials.

(x + 5)(x + 4)

1. First. Outer. Inner. Last.

x * x + x * 4 + 5 * x + 5 * 4

2. Multiply.

x² + 4x + 5x + 20

3. Simplify.

x² + 9x + 20

Practice with Negatives

Simplify the following polynomials.

(x - 5)(x - 4)

1. Before you start FOILing, rewrite each subtraction as adding a negative:

(x + -5)(x + -4)

2. Now FOIL: First. Outer. Inner. Last.

x * x + x * -4 + -5 * x + -5 * -4

3. Multiply.

x² + -4x + -5x + 20

4. Simplify.

x² + -9x + 20
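The same idea generalizes beyond two binomials: every term of one polynomial multiplies every term of the other, and like terms are then combined. Below is a minimal sketch assuming polynomials are stored as coefficient lists with the highest power first (a representation chosen for illustration); `multiply` is a hypothetical helper name.

```python
def multiply(p: list, q: list) -> list:
    """Multiply two polynomials given as coefficient lists, highest power first.

    Every term of p multiplies every term of q (FOIL is the 2 x 2 case),
    and products that land on the same power of x are added (combining like terms).
    """
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result


print(multiply([1, 5], [1, 4]))    # (x + 5)(x + 4) -> [1, 9, 20], i.e. x² + 9x + 20
print(multiply([1, -5], [1, -4]))  # (x - 5)(x - 4) -> [1, -9, 20], i.e. x² - 9x + 20
```

You can use the same function to check your answers to the practice exercises.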

Practice Exercises

1) (x + 3)(x - 3) =

 

2) (x - 6)(x + 4) =

 

3) (x - 8)(x - 9) =

4) (5j + 11)(j + 1) =

5) (5p - 7)(4p + 3) =

Word Origin and History for binomial

1550s (n.); 1560s (adj.), from Late Latin binomius "having two personal names," a hybrid from bi- + nomius, from nomen ("name"). Taken up 16c. in the algebraic sense "consisting of two terms."

binomial distribution  


Frequency distribution where only two (mutually exclusive) outcomes are possible, such as better or worse, gain or loss, head or tail, rise or fall, success or failure, yes or no. Therefore, if the probability of success in any given trial is known, binomial distributions can be employed to compute the probability of a given number of successes in a given number of trials. They can also be used to determine whether an empirical distribution deviates significantly from a typical outcome. Sometimes also called the Bernoulli distribution (strictly, that name belongs to the single-trial case n = 1), after the Swiss mathematician Jacques Bernoulli (1654-1705). See also Poisson distribution.
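One common way to judge whether an observed count deviates significantly from what a known success probability predicts is an exact binomial test. The sketch below assumes a one-sided (upper-tail) alternative, which is only one of several conventions; the coin data and function names are hypothetical.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail_p_value(successes: int, n: int, p: float) -> float:
    """One-sided exact binomial test: P(X >= successes) if X ~ B(n, p).

    A small value suggests the observed count is unusually high for success probability p.
    """
    return sum(binomial_pmf(k, n, p) for k in range(successes, n + 1))


# Hypothetical data: 16 heads in 20 tosses of a coin assumed to be fair (p = 0.5)
print(round(upper_tail_p_value(16, 20, 0.5), 4))  # 0.0059
```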

The binomial probability distribution is useful when a total of n independent trials are conducted and we want to find the probability of r successes, where each success has probability p of occurring. Several things are stated and implied in this brief description. The definition boils down to these four conditions:

1. Fixed number of trials

2. Independent trials

3. Two different classifications

4. Probability of success stays the same for all trials

All of these must be present in the process under investigation in order to use the binomial probability formula or tables. A brief description of each of these follows.

Fixed Trials

The process being investigated must have a clearly defined number of trials that does not vary. We cannot alter this number midway through our analysis. Each trial must be performed the same way as all of the others, although the outcomes may vary. The number of trials is indicated by an n in the formula.

An example of this is studying the outcomes of rolling a die 11 times. The total number of times that the trial (a roll) is conducted is defined from the outset.

Independent Trials

Each of the trials has to be independent: each trial should have absolutely no effect on any of the others. The classical examples of rolling two dice or flipping several coins illustrate independent events. Since the events are independent, we are able to use the multiplication rule to multiply the probabilities together.

In practice, especially due to some sampling techniques, there can be times when trials are not technically independent.

A binomial distribution can sometimes be used in these situations as long as the population is large relative to the sample.
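The conditions above are exactly what make the binomial formula work. With a fixed n and independent trials, the multiplication rule gives every success/failure sequence the probability p^s (1 − p)^(n − s); adding up the sequences with k successes reproduces C(n, k) p^k (1 − p)^(n − k). The sketch below checks this by brute force for the 11 die rolls, assuming for illustration that "success" means rolling a six.

```python
from itertools import product
from math import comb


def pmf_by_enumeration(k: int, n: int, p: float) -> float:
    """Sum the probabilities of every success/failure sequence with exactly k successes.

    Because trials are independent, the multiplication rule gives each sequence
    probability p**s * (1 - p)**(n - s), where s is its number of successes.
    """
    total = 0.0
    for outcome in product((0, 1), repeat=n):  # 1 = success, 0 = failure
        s = sum(outcome)
        if s == k:
            total += p**s * (1 - p) ** (n - s)
    return total


n, p = 11, 1 / 6  # 11 rolls of a die; success = rolling a six (illustrative assumption)
for k in range(4):
    formula = comb(n, k) * p**k * (1 - p) ** (n - k)
    print(k, round(pmf_by_enumeration(k, n, p), 6), round(formula, 6))
```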

Poisson approximation

The binomial distribution converges towards the Poisson distribution as the number of trials goes to infinity while the product np remains fixed. Therefore the Poisson distribution with parameter λ = np can be used as an approximation to the binomial distribution B(n, p) if n is sufficiently large and p is sufficiently small. According to two rules of thumb, this approximation is good if n ≥ 20 and p ≤ 0.05, or if n ≥ 100 and np ≤ 10.[10]
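A quick way to see how good the approximation is: compute the largest pointwise gap between B(n, p) and the Poisson distribution with λ = np. This is a minimal sketch; the parameter choices simply mirror the two rules of thumb plus one case (p = 0.5) where they are violated.

```python
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    return exp(-lam) * lam**k / factorial(k)


def max_abs_error(n: int, p: float) -> float:
    """Largest gap between B(n, p) and Poisson(np) over k = 0..n."""
    lam = n * p
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(n + 1))


print(max_abs_error(20, 0.05))   # first rule of thumb: n >= 20 and p <= 0.05
print(max_abs_error(100, 0.05))  # second rule of thumb: n >= 100 and np <= 10
print(max_abs_error(20, 0.5))    # p is not small: the approximation is poor
```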

Limiting distributions

Poisson limit theorem: As n approaches ∞ and p approaches 0 while np remains fixed at λ > 0, or at least np approaches λ > 0, the Binomial(n, p) distribution approaches the Poisson distribution with expected value λ.[10]

de Moivre–Laplace theorem: As n approaches ∞ while p remains fixed, the distribution of

$\frac{X - np}{\sqrt{np(1-p)}}$

approaches the normal distribution with expected value 0 and variance 1. This result is sometimes loosely stated by saying that the distribution of X is asymptotically normal with expected value np and variance np(1 − p). This result is a specific case of the central limit theorem.

Beta distribution

Beta distributions provide a family of conjugate prior probability distributions for binomial distributions in Bayesian inference. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value p:[11]

$\mathrm{Beta}(p; \alpha, \beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\mathrm{B}(\alpha, \beta)}$
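Conjugacy means the update from prior to posterior is just parameter arithmetic: with a Beta(α, β) prior on p and k successes in n trials, the posterior is Beta(α + k, β + n − k). A minimal sketch follows; the uniform prior and the 7-out-of-10 data are hypothetical.

```python
def beta_binomial_update(alpha: float, beta: float, successes: int, trials: int) -> tuple:
    """Posterior Beta parameters after observing `successes` in `trials` binomial trials.

    With a Beta(alpha, beta) prior on p, the posterior is
    Beta(alpha + successes, beta + trials - successes).
    """
    return alpha + successes, beta + (trials - successes)


def beta_mean(alpha: float, beta: float) -> float:
    return alpha / (alpha + beta)


# Uniform prior Beta(1, 1); hypothetical data: 7 successes in 10 trials
a, b = beta_binomial_update(1, 1, 7, 10)
print(a, b, round(beta_mean(a, b), 4))  # 8 4 0.6667
```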

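Confidence intervals

Main article: Binomial proportion confidence interval

Even for quite large values of n, the actual distribution of the mean is significantly nonnormal.[12] Because of this problem, several methods to estimate confidence intervals have been proposed.

Let n1 be the number of successes out of n, the total number of trials, and let

$\hat{p} = \frac{n_1}{n}$

be the proportion of successes. Let $z_{\alpha/2}$ be the 100(1 − α/2)th percentile of the standard normal distribution.

Wald method

$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

A continuity correction of 0.5/n may be added.

Agresti-Coull method[13]

$\tilde{p} \pm z_{\alpha/2} \sqrt{\frac{\tilde{p}(1-\tilde{p})}{n + z_{\alpha/2}^{2}}}$

Here the estimate of p is modified to

$\tilde{p} = \frac{n_1 + \frac{1}{2} z_{\alpha/2}^{2}}{n + z_{\alpha/2}^{2}}$

ArcSine method[14]

$\sin^{2}\left(\arcsin\sqrt{\hat{p}} \pm \frac{z_{\alpha/2}}{2\sqrt{n}}\right)$

Wilson (score) method[15]

$\frac{\hat{p} + \frac{z_{\alpha/2}^{2}}{2n} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z_{\alpha/2}^{2}}{4n^{2}}}}{1 + \frac{z_{\alpha/2}^{2}}{n}}$

The exact (Clopper-Pearson) method is the most conservative.[12] The Wald method, although commonly recommended in textbooks, is the most biased.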
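The Wald, Agresti-Coull, ArcSine and Wilson intervals can be sketched in a few lines of Python using the standard library's `statistics.NormalDist` for z. The function name is hypothetical, each formula is written out in a comment, and clipping the ArcSine angle to [0, π/2] is our own choice to keep that interval sensible at n1 = 0 or n1 = n.

```python
from math import asin, pi, sin, sqrt
from statistics import NormalDist


def proportion_intervals(n1: int, n: int, alpha: float = 0.05) -> dict:
    """Approximate 100(1 - alpha)% confidence intervals for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}; about 1.96 when alpha = 0.05
    p_hat = n1 / n                           # observed proportion of successes

    # Wald: p_hat +/- z * sqrt(p_hat (1 - p_hat) / n); collapses to a point when n1 is 0 or n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    wald = (p_hat - half, p_hat + half)

    # Agresti-Coull: p_tilde = (n1 + z^2/2) / (n + z^2), then a Wald-style interval with n + z^2
    n_tilde = n + z**2
    p_tilde = (n1 + z**2 / 2) / n_tilde
    half = z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    agresti_coull = (p_tilde - half, p_tilde + half)  # may extend slightly outside [0, 1]

    # ArcSine: sin^2(arcsin(sqrt(p_hat)) +/- z / (2 sqrt(n))), angles clipped to [0, pi/2]
    angle = asin(sqrt(p_hat))
    step = z / (2 * sqrt(n))
    arcsine = (sin(max(angle - step, 0.0)) ** 2, sin(min(angle + step, pi / 2)) ** 2)

    # Wilson (score): centre and half-width are both divided by 1 + z^2/n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    wilson = (centre - half, centre + half)

    return {"wald": wald, "agresti_coull": agresti_coull, "arcsine": arcsine, "wilson": wilson}


for name, (lo, hi) in proportion_intervals(0, 10).items():
    print(f"{name:>14}: ({lo:.3f}, {hi:.3f})")
```

With zero successes in 10 trials the Wald interval degenerates to a single point, which illustrates why it is considered the weakest of these methods.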

Normal approximation

Binomial probability mass function and normal probability density function approximation for n = 6 and p = 0.5

If n is large enough, then the skew of the distribution is not too great. In this case a reasonable approximation to B(n, p) is given by the normal distribution

$\mathcal{N}(np,\ np(1-p))$

and this basic approximation can be improved in a simple way by using a suitable continuity correction.

The basic approximation generally improves as n increases (at least 20) and is better when p is not near to 0 or 1.[8] Various rules of thumb may be used to decide whether n is large enough, and p is far enough from the extremes of zero or one:

One rule is that both np and n(1 − p) must be greater than 5. However, the specific number varies from source to source, and depends on how good an approximation one wants; some sources give 10, which gives virtually the same results as the following rule for large n until n is very large (ex: np = 11, n = 7752).

A second rule[8] is that for n > 5 the normal approximation is adequate if

$\left| \frac{1}{\sqrt{n}} \left( \sqrt{\frac{1-p}{p}} - \sqrt{\frac{p}{1-p}} \right) \right| < 0.3$

Another commonly used rule holds that the normal approximation is appropriate only if everything within 3 standard deviations of its mean is within the range of possible values, that is if

$\mu \pm 3\sigma = np \pm 3\sqrt{np(1-p)} \in (0, n)$
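The three rules of thumb can be checked mechanically. In this minimal sketch (hypothetical function name) the second rule uses the commonly quoted inequality written out in the comment; note that its left-hand side simplifies to |1 − 2p| / √(np(1 − p)), the absolute skewness, so that rule is literally a cap on skew. The function assumes 0 < p < 1.

```python
from math import sqrt


def normal_approx_checks(n: int, p: float) -> dict:
    """Evaluate three rules of thumb for approximating B(n, p) by N(np, np(1-p)); needs 0 < p < 1."""
    q = 1 - p
    sigma = sqrt(n * p * q)
    return {
        # Rule 1: both np and n(1 - p) exceed 5
        "np_and_nq_gt_5": n * p > 5 and n * q > 5,
        # Rule 2: for n > 5, |(1/sqrt(n)) (sqrt(q/p) - sqrt(p/q))| < 0.3,
        # which equals |1 - 2p| / sqrt(npq), the absolute skewness
        "skew_rule": n > 5 and abs((sqrt(q / p) - sqrt(p / q)) / sqrt(n)) < 0.3,
        # Rule 3: np +/- 3 sigma lies inside (0, n)
        "three_sigma_rule": 0 < n * p - 3 * sigma and n * p + 3 * sigma < n,
    }


print(normal_approx_checks(100, 0.5))  # all rules satisfied
print(normal_approx_checks(20, 0.05))  # all rules fail: p is too close to 0 for this n
```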

The following is an example of applying a continuity correction. Suppose one wishes to calculate

Pr(X ≤ 8) for a binomial random variable X. If Y has a distribution given by the normal

approximation, then Pr(X ≤ 8) is approximated by Pr(Y ≤ 8.5). The addition of 0.5 is the continuity

correction; the uncorrected normal approximation gives considerably less accurate results.
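The effect of the continuity correction can be seen directly by comparing the exact Pr(X ≤ 8) with the normal approximation evaluated at 8.5 and at 8. The parameters n = 20 and p = 0.5 are an illustrative assumption; the normal CDF comes from the standard library's `statistics.NormalDist`.

```python
from math import comb, sqrt
from statistics import NormalDist


def exact_cdf(x: int, n: int, p: float) -> float:
    """Exact binomial Pr(X <= x)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def normal_cdf(x: float, n: int, p: float) -> float:
    """Pr(Y <= x) for Y ~ N(np, np(1 - p))."""
    return NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(x)


n, p = 20, 0.5  # illustrative parameters
exact = exact_cdf(8, n, p)
corrected = normal_cdf(8.5, n, p)  # Pr(Y <= 8.5): with continuity correction
uncorrected = normal_cdf(8, n, p)  # Pr(Y <= 8): without it
print(f"exact={exact:.4f} corrected={corrected:.4f} uncorrected={uncorrected:.4f}")
```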

This approximation, known as the de Moivre–Laplace theorem, is a huge time-saver when

undertaking calculations by hand (exact calculations with large n are very onerous); historically, it

was the first use of the normal distribution, introduced in Abraham de Moivre's book The Doctrine

of Chances in 1738. Nowadays, it can be seen as a consequence of the central limit

theorem since B(n, p) is a sum of n independent, identically distributed Bernoulli variables with

parameter p. This fact is the basis of a hypothesis test, a "proportion z-test", for the value

of p using x/n, the sample proportion and estimator of p, in a common test statistic.[9]
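A minimal sketch of such a proportion z-test, assuming the common form in which the standard error is computed under the null value p0 and the p-value is two-sided; the survey numbers and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(x: int, n: int, p0: float) -> tuple:
    """One-sample proportion z-test of H0: p = p0, using the sample proportion x/n.

    Returns the z statistic and a two-sided p-value from the normal approximation.
    """
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = proportion_z_test(60, 100, 0.5)  # hypothetical: 60 of 100 agree; H0: p = 0.5
print(round(z, 3), round(p_value, 4))
```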

For example, suppose one randomly samples n people out of a large population and asks them whether they agree with a certain statement. The proportion of people who agree will of course depend on the sample. If groups of n people were sampled repeatedly and truly randomly, the proportions would follow an approximate normal distribution with mean equal to the true proportion p of agreement in the population and with standard deviation $\sigma = \sqrt{p(1-p)/n}$.
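A small simulation makes this claim tangible: repeatedly draw samples of n people, record the proportion who agree, and compare the spread of those proportions with √(p(1 − p)/n). This sketch assumes each sampled person agrees independently with probability p (the large-population setting); n, p, the number of repeats and the seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, pstdev


def simulate_proportions(n: int, p: float, repeats: int, seed: int = 1) -> list:
    """Draw `repeats` samples of size n and record each sample proportion of 'agree'."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(repeats)]


n, p = 100, 0.3  # hypothetical sample size and true proportion of agreement
props = simulate_proportions(n, p, repeats=5000)
print(f"mean={mean(props):.4f} (p={p})")
print(f"sd={pstdev(props):.4f} (sqrt(p(1-p)/n)={sqrt(p * (1 - p) / n):.4f})")
```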


Poisson Distribution

A Poisson distribution is the probability distribution that results from a Poisson experiment.

Attributes of a Poisson Experiment

A Poisson experiment is a statistical experiment that has the following properties:

The experiment results in outcomes that can be classified as successes or failures.

The average number of successes (μ) that occurs in a specified region is known.

The probability that a success will occur is proportional to the size of the region.

The probability that a success will occur in an extremely small region is virtually zero.

Note that the specified region could take many forms. For instance, it could be a length, an area, a

volume, a period of time, etc.

Notation

The following notation is helpful when we talk about the Poisson distribution.

e: A constant equal to approximately 2.71828. (Actually, e is the base of the natural

logarithm system.)

μ: The mean number of successes that occur in a specified region.

x: The actual number of successes that occur in a specified region.

P(x; μ): The Poisson probability that exactly x successes occur in a Poisson

experiment, when the mean number of successes is μ.


Poisson Distribution

A Poisson random variable is the number of successes that result from a Poisson experiment.

The probability distribution of a Poisson random variable is called a Poisson distribution.

Given the mean number of successes (μ) that occur in a specified region, we can compute the Poisson

probability based on the following formula:

Poisson Formula. Suppose we conduct a Poisson experiment, in which the average number of successes within a given region is μ. Then, the Poisson probability is:

$P(x; \mu) = \frac{e^{-\mu} \mu^{x}}{x!}$

where x is the actual number of successes that result from the experiment, and e is approximately equal to 2.71828.

The Poisson distribution has the following properties:

The mean of the distribution is equal to μ .

The variance is also equal to μ .
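Both properties can be confirmed numerically by summing over the probabilities. In this sketch (hypothetical helper names) the infinite sum is truncated at x = 100, which is harmless for small μ because the neglected tail is vanishingly small.

```python
from math import exp, factorial


def poisson_pmf(x: int, mu: float) -> float:
    """P(x; mu) = e**(-mu) * mu**x / x!"""
    return exp(-mu) * mu**x / factorial(x)


def mean_and_variance(mu: float, x_max: int = 100) -> tuple:
    """Mean and variance from the pmf, summing over x = 0..x_max (tail beyond is negligible for small mu)."""
    probs = [poisson_pmf(x, mu) for x in range(x_max + 1)]
    m = sum(x * px for x, px in enumerate(probs))
    v = sum((x - m) ** 2 * px for x, px in enumerate(probs))
    return m, v


print(mean_and_variance(2.0))  # both values come out approximately equal to 2
```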

Example 1

The average number of homes sold by the Acme Realty company is 2 homes per day. What is the

probability that exactly 3 homes will be sold tomorrow?

Solution: This is a Poisson experiment in which we know the following:

μ = 2; since 2 homes are sold per day, on average.

x = 3; since we want to find the likelihood that 3 homes will be sold tomorrow.

e = 2.71828; since e is a constant equal to approximately 2.71828.

We plug these values into the Poisson formula as follows:

$P(x; \mu) = \frac{e^{-\mu} \mu^{x}}{x!}$

$P(3; 2) = \frac{2.71828^{-2} \cdot 2^{3}}{3!}$

$P(3; 2) = \frac{(0.13534)(8)}{6}$

$P(3; 2) = 0.180$

Thus, the probability of selling 3 homes tomorrow is 0.180 .
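The same arithmetic in Python, as a quick check of each intermediate value in the example above:

```python
from math import exp, factorial

mu, x = 2, 3  # average homes sold per day, and the count we ask about
p = exp(-mu) * mu**x / factorial(x)
print(f"e^-mu = {exp(-mu):.5f}")  # 0.13534
print(f"P(3; 2) = {p:.3f}")       # 0.180
```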


Cumulative Poisson Probability

A cumulative Poisson probability refers to the probability that the Poisson random variable is

greater than some specified lower limit and less than some specified upper limit.

Example 2

Suppose the average number of lions seen on a 1-day safari is 5. What is the probability that tourists

will see fewer than four lions on the next 1-day safari?

Solution: This is a Poisson experiment in which we know the following:

μ = 5; since 5 lions are seen per safari, on average.

x = 0, 1, 2, or 3; since we want to find the likelihood that tourists will see fewer than

4 lions; that is, we want the probability that they will see 0, 1, 2, or 3 lions.

e = 2.71828; since e is a constant equal to approximately 2.71828.

To solve this problem, we need to find the probability that tourists will see 0, 1, 2, or 3 lions. Thus, we

need to calculate the sum of four probabilities: P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5). To compute this

sum, we use the Poisson formula:

$P(x \le 3; 5) = P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5)$

$P(x \le 3; 5) = \frac{e^{-5} 5^{0}}{0!} + \frac{e^{-5} 5^{1}}{1!} + \frac{e^{-5} 5^{2}}{2!} + \frac{e^{-5} 5^{3}}{3!}$

$P(x \le 3; 5) = \frac{(0.006738)(1)}{1} + \frac{(0.006738)(5)}{1} + \frac{(0.006738)(25)}{2} + \frac{(0.006738)(125)}{6}$

$P(x \le 3; 5) = 0.006738 + 0.03369 + 0.084224 + 0.140375$

$P(x \le 3; 5) = 0.2650$

Thus, the probability of seeing no more than 3 lions is 0.2650.


The Poisson Distribution

In the picture above, several Poisson distributions are portrayed simultaneously. Where the rate of occurrence of some event, r (in this chart called lambda, or λ), is small, the range of likely possibilities will lie near the zero line. This means that when the rate r is small, zero is a very likely number to get. As the rate becomes higher (as the occurrence of the thing we are watching becomes more common), the center of the curve moves toward the right, and eventually, somewhere around r = 7, zero occurrences actually become unlikely. This is how the Poisson world looks graphically. All of it is intuitively obvious. Now we will back up a little and begin over, with you and your mailbox.

Suppose you typically get 4 pieces of mail per day. That becomes your expectation, but there will be a certain spread: sometimes a little more, sometimes a little less, once in a while nothing at all. Given only the average rate, for a certain period of observation (pieces of mail per day, phone calls per hour, whatever), and assuming that the process, or mix of processes, that produces the event flow is essentially random, the Poisson Distribution will tell you how likely it is that you will get 3, or 5, or 11, or any other number, during one period of observation. That is, it predicts the degree of spread around a known average rate of occurrence. (The average or likeliest actual occurrence is the hump on each of the Poisson curves shown above.) For small values of p, the Poisson Distribution can simulate the Binomial Distribution (the pattern of Heads and Tails in coin tosses), and it is much easier to compute.

Contents: Application; Derivation (on separate page); Computing Poisson Probabilities; The Classic Example; Features; Poisson Paper; Types of Problem; Approximation to Binomial; Summary


Application

The Poisson distribution applies when: (1) the event is something that can be counted in whole numbers; (2) occurrences are independent, so that one occurrence neither diminishes nor increases the chance of another; (3) the average frequency of occurrence for the time period in question is known; and (4) it is possible to count how many events have occurred, such as the number of times a firefly lights up in my garden in a given 5 seconds, some evening, but meaningless to ask how many such events have not occurred. This last point sums up the contrast with the Binomial situation, where the probability of each of two mutually exclusive events (p and q) is known. The Poisson Distribution, so to speak, is the Binomial Distribution Without Q. In those circumstances, and they are surprisingly common, the Poisson Distribution gives the expected frequency profile for events. It may be used in reverse, to test whether a given data set was generated by a random process. If the data fit the Poisson Expectation closely, then there is no strong reason to believe that something other than random occurrence is at work. On the other hand, if the data are lumpy, we look for what might be causing the lump.

The Poisson situation is most often invoked for rare events, and it is only with rare events that it can successfully mimic the Binomial Distribution (for larger values of p, the Normal Distribution gives a better approximation to the Binomial). But the Poisson rate may actually be any number. The real contrast is that the Poisson Distribution is asymmetrical: given a rate r = 3, the range of variation ends with zero on one side (you will never find "minus one" letter in your mailbox), but is unlimited on the other side (if the label machine gets stuck, you may find yourself some Tuesday with 4,573 copies of some magazine spilling all over your front yard - it's not likely, but you can't call it impossible). The Poisson Distribution, as a data set or as the corresponding curve, is always skewed toward the right, but it is inhibited by the Zero occurrence barrier on the left. The degree of skew diminishes as r becomes larger, and at some point the Poisson Distribution becomes, to the eye, about as symmetrical as the Normal Distribution. But much though it may come to resemble the Normal Distribution, to the eye of the person who is looking at a graph for, say, r = 35, the Poisson is really coming from a different kind of world event.
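The claim that the skew diminishes as r grows can be quantified. A standard result (not stated in the source) is that the skewness of a Poisson distribution equals 1/√r; the sketch below computes the skewness directly from the probabilities for r = 1, 3 and 35 and compares. Probabilities are computed in log space so that large k does not overflow; truncating at k = 400 is an assumption that is safe for these rates.

```python
from math import exp, lgamma, log, sqrt


def poisson_pmf(k: int, r: float) -> float:
    # computed in log space so large k and r do not overflow
    return exp(-r + k * log(r) - lgamma(k + 1))


def poisson_skewness(r: float, k_max: int = 400) -> float:
    """Skewness (third standardized moment) computed directly from the probabilities."""
    probs = [poisson_pmf(k, r) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    third = sum((k - mean) ** 3 * p for k, p in enumerate(probs))
    return third / var**1.5


for r in (1, 3, 35):
    print(r, round(poisson_skewness(r), 4), round(1 / sqrt(r), 4))
```

The skewness stays positive (skewed toward the right) for every r, but shrinks steadily, which is why the curve for r = 35 looks nearly symmetrical.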

History. The Poisson Distribution is named for its discoverer, who first applied it to the deliberations of juries; in that form it did not attract wide attention. More suggestive was Poisson's application to the science of artillery. The distribution was later and independently discovered by von Bortkiewicz, Rutherford, and Gosset. It was von Bortkiewicz who called it The Law of Small Numbers, but as noted above, though it has a special usefulness at the small end of the range, a Poisson Distribution may also be computed for larger r. The fundamental trait of the Poisson is its asymmetry, and this trait it preserves at any value of r.


Derivation. The Poisson Distribution has a close connection with the Binomial, Hypergeometric, and Exponential Distributions, and can be derived as an extreme case of any of them. The Poisson can also be derived from first principles, which involve the growth constant e. That derivation is given on a separate page, for those who like to see the inner workings of the universe up close. Other readers may proceed directly to the how-to-do-it instructions in the next section.

Computing Poisson Probabilities

We found, on the Derivation page, that when the average rate of occurrence of some event per module of observation is r, we can calculate the probability of any given number of actually observed occurrences, k, by substituting in the formula

$p(k) = \frac{r^{k}}{k!\, e^{r}}$ (5)

Before going on, consider the following:

It will be noticed that in our formula, the only variable quantity is the rate r. That number is the only way in which one Poisson situation differs from another, and it is the only determining variable (parameter) of the Poisson equation. Nothing else enters in.

Each number r defines a different Poisson distribution. We cannot multiply by 10 the values for the distribution whose rate is r = 1 and get the values for r = 10. The latter must be calculated separately, and will be found to have a different shape. Specifically, the larger the r for any given unit of occurrence, the more symmetrical is the resulting frequency profile. This we already noticed in the picture at the top of this page.

What we here call rate of occurrence, or r, is conventionally called lambda (λ). Remember to make that adjustment when consulting other textbooks or tables.

Calculating Poisson probabilities ideally requires a statistical calculator with $x^{y}$ and $e^{x}$ keys (remember that e is the constant 2.71828). Absent such a calculator, certain individual probabilities may be computed with the aid of the $e^{x}$ Table. For selected simple values of r, problems may be solved using the Tables here provided.

Example. Let us suppose that some event, say the arrival of a weird particle from outer space at a counter on some farm outside Topeka, occurs on average 2 times per hour. But there are variations from that average. What is the probability that in a given hour three weird particles will be recorded? Substituting in formula (5) the empirical rate r = 2 and the expectation k = 3, we get:

$p(3) = \frac{r^{3}}{3!\, e^{r}} = \frac{2^{3}}{3!\, e^{2}} = \frac{8}{(6)(7.3891)} = \frac{8}{44.3346} = 0.1804$

This answer may be checked with the one given in the Poisson Table, and will be found to match. This sort of calculation was in fact how the Table was constructed.

In rough terms, then, if our weird particles average 2 per hour but vary randomly around that average, and thus fit the random Poisson model, we would expect to get 3 rather than 2 weird particles per hour, at the counter over by the silo, in about 0.1804 of the hours observed. If we only watch for one hour, our reading will most likely be 1 or 2 particles (for r = 2 these two outcomes are equally likely). But there are 24 hours in a day, and in an average day there should thus be (24)(0.1804) ≈ 4.3, or approximately 4, hours during which 3 particles are registered. Of course, things can vary from that most likely expectation; that is the way the universe works. But now we know what the most likely expectation is. It is such likeliest expectations that the Poisson formula gives us.

Just to show how the whole situation looks, here, from the Table, is the frequency profile for r = 2, omitting the extremely rare possibilities:

r = 2.0

p(0) 0.1353

p(1) 0.2707

p(2) 0.2707

p(3) 0.1804

p(4) 0.0902

p(5) 0.0361

p(6) 0.0120

p(7) 0.0034

p(8) 0.0009

p(9) 0.0002

It will be seen that the realistic possibilities for occurrence per hour go no lower than zero (a negative count would be physically impossible), and that they reach as high as 9 per hour before becoming so minuscule that they do not show up in four decimal places. If we add these probabilities, we get 0.9999, which is 1 (the total probability in the system) apart from rounding error. This, then, is a virtually complete picture of the possibilities. So also with every other column of the Table.
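The r = 2.0 profile can be regenerated from formula (5), which is also a convenient way to build any other column of such a table. A minimal sketch:

```python
from math import exp, factorial

r = 2.0
profile = {k: r**k / (factorial(k) * exp(r)) for k in range(10)}  # formula (5) for k = 0..9
for k, pk in profile.items():
    print(f"p({k}) {pk:.4f}")
print(f"sum {sum(round(pk, 4) for pk in profile.values()):.4f}")  # rounded entries add to 0.9999
```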

Browsing one of those Tables will illustrate the fact that the Poisson is cramped on the zero side, but spreads out on the infinity side. The list of possible values is thus asymmetrical (the statistical term is "skew"). Such situations, where variation from an average is easier in one direction than another, are very common in real life, and this is one thing that accounts for the fact that so many situations are well described by the Poisson distribution. (For the Normal Distribution, the assumption is that variation is equally likely in either direction from the average).

For the set of probabilities (frequency profiles) for selected average rates r, consult the Poisson Table. To calculate individual probabilities, use formula (5) above. Rough probabilities may be obtained by the use of Poisson Paper. This, and clear thinking, are all that are required to work with the Poisson distribution. The clear thinking is the hardest part, as the Problem set will presently demonstrate.

The Classic Example


The classic Poisson example is the data set of von Bortkiewicz (1898), for the chance of a Prussian cavalryman being killed by the kick of a horse. Ten army corps were observed over 20 years, giving a total of 200 observations of one corps for a one year period. The period or module of observation is thus one year. The total deaths from horse kicks were 122, and the average number of deaths per year per corps was thus 122/200 = 0.61. This is a rate of less than 1. It is also obvious that it is meaningless to ask how many times per year a cavalryman was not killed by the kick of a horse. In any given year, we expect to observe, well, not exactly 0.61 deaths in one corps (that is not possible; deaths occur in modules of 1), but sometimes none, sometimes one, occasionally two, perhaps once in a while three, and (we might intuitively expect) very rarely any more. Here, then, is the classic Poisson situation: a rare event, whose average rate is small, with observations made over many small intervals of time.

Let us see if our formula gives a close fit for the actual Prussian data, where r = 0.61 is the average number expected per year for the whole sample, and the successive terms of the Poisson formula are the successive probabilities. Remember that our formula for each term in the distribution is:

$p(k) = \frac{r^{k}}{k!\, e^{r}}$ (5)

We may start by asking, given r = 0.61, what is the probability of no deaths by horse kick in a given year (module of observation)? For k = 0, we get by substitution

$p(0) = \frac{(0.61)^{0}}{0!\, e^{0.61}} = \frac{1}{(1)(1.8404)} = 0.5434$

Given that probability, then over the 200 corps-years observed we should expect to find a total of 108.68 ≈ 109 years with zero deaths. It turns out that 109 is exactly the number of years in which the Prussian data recorded no deaths from horse kicks. The match between expected and actual values is not merely good, it is perfect.

If we had instead used, as an approximation, the value of $e^{-0.6}$ from our table, we would have gotten p(0) = 0.5488, so that the expected number of such years out of 200 would be 109.76 ≈ 110, or 1 too high. Not bad.

For the entire set of Prussian data, where p = the predicted Poisson frequency for a given number of deaths per year, E is the corresponding number of years in which that number of deaths is expected to occur in our 200 samples (that is, our p value times 200), and A is the actual number of years in which that many deaths were observed, we have:

Deaths   p         E        A
0        0.54335   108.67   109
1        0.33145   66.29    65
2        0.10110   20.22    22
3        0.02055   4.11     3
4        0.00315   0.63     1
5        0.00040   0.08     0
6        0.00005   0.01     0

and the match seems very good throughout. (Not perfect. But it is intuitively obvious that another trial, over another 200 years' worth of data, would give slightly different results, and this is a perfectly plausible example of one such result).
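The p and E columns can be recomputed directly from formula (5). The following is a minimal Python sketch using only the standard library; the helper name poisson_pmf is our own, and the A column is copied from the table above. Computed from the unrounded rate, the p values it prints may differ from the table in the last decimal place.

```python
import math

def poisson_pmf(k: int, r: float) -> float:
    """Formula (5): probability of k events when the average rate is r."""
    return r ** k / (math.factorial(k) * math.exp(r))

r = 122 / 200                          # 0.61 deaths per corps per year
n_obs = 200                            # corps-years observed
actual = [109, 65, 22, 3, 1, 0, 0]     # A column: years with 0..6 deaths

print("Deaths  p        E       A")
for k, a in enumerate(actual):
    p = poisson_pmf(k, r)
    print(f"{k:>6}  {p:.5f}  {p * n_obs:6.2f}  {a:>3}")
```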

In sum, then, we assume that the Poisson frequency profile gives the expectation (E) when the events in question are indeed random. Comparing that expectation with our actual results (A), we judge that the Prussian data set appears to be the result of random causes. There is no reason to suspect any systematic cause, or any connection between separate events. These deaths, then, just happened. (If ill-trained horses were supplied to all corps in one year, for instance, the pattern of deaths should be more clustered, and we would have a nonrandom factor). It is the ability of the Poisson Distribution to give a model for stuff that "just happens," that accounts for its power in statistics. Statistics is about stuff that "just happens."

Features

The Poisson distribution has several characteristic features. Most distinctively, as noted above, it has only one parameter, namely the average frequency of the event. That figure is conventionally called lambda (λ); we here use instead the abbreviation r (for rate).

The Poisson distribution is not symmetrical; it is skewed to the right, toward the unbounded high end.

The mean of any Poisson distribution is equal to its variance, that is

m = v

which is a characteristic property of this distribution. (Note that "mean" here is the average of all values, and defines the center of gravity of the distribution; it is not a point from which values diverge symmetrically, since the Poisson Distribution is not symmetrical.) It is sometimes said that the Poisson mean is an "expectation." It is true that the commonest outcome in any Poisson set lies at r itself or, for fractional r, at the whole number just below r. But it is also true that if r is a whole number, the probability of (r-1) is identical to that of r, so that for whole-number r ≥ 1 the "expectation" is a pair of outcomes, not one single outcome.

For r of 1 or less, where 0 is the likeliest (or equally likeliest) outcome, the histogram of a Poisson set of frequencies is high on the left and skewed toward the right. For the Prussian horse data, above, where r = 0.61, it looks like this:

[Figure: histogram of the Poisson probabilities p for 0 to 6 deaths per year, r = 0.61: 0.543, 0.331, 0.101, 0.021, 0.003, 0.000, 0.000]

As the average frequency (r) increases, the histogram becomes a little humpier in the middle (see the Poisson Table for an overview of the pattern up to r = 20), but it never becomes perfectly symmetrical, and thus it never loses its distinctive character as a distribution. That character, however, does weaken with increasing r.

Poisson Paper

Poisson paper is specially printed for the easy analysis of raw data. If you plot data points on Poisson paper, they will lie on a vertical line if the set is random in the sense assumed by the Poisson formula. If the resulting line is not vertical, then to that degree, the data set is non-Poisson.

Types of Problem

The situations to which Poisson distributions apply are diverse, and it is not always easy to see at first glance that they are specimens of one underlying type. We give here examples of three common types of Poisson problem. These sample problems will be repeated on the Practice page, along with other problems of the same general type.

Keep in mind that all we have to work with are (1) a rate of occurrence, r, which may be any positive number; (2) a window of observation: a timespan or a space within which occurrences are observed; and (3) the number of times the event, as seen through that given window, is repeated.

Isolated Events

It has been observed that the average number of traffic accidents on the Hollywood Freeway between 7 and 8 AM on Wednesday mornings is 1 per hour. What is the chance that there will be 2 accidents on the Freeway, on some specified Wednesday morning?

Answer. The basic rate is r = 1 (in hour units), and our window is 1 hour. We wish to know the chance of observing 2 events in that window. The rate r = 1 is included in the Poisson Table, so we don't have to calculate anything. Reading down the r = 1 column, we come to the p(2) row, and there we find that the probability of 2 accidents is 0.1839, or a little less than 1 chance in 5. It's not unlikely. You might get that situation about one Wednesday in five.

Proportions

Coliform bacteria are randomly distributed in a certain Arizona river at an average concentration of 1 per 20cc of water. If we draw from the river a test tube containing 10cc of water, what is the chance that the sample contains exactly 2 coliform bacteria?

Answer. Our window of observation is 10cc. If the concentration is 1 per 20cc, it is also 0.5 per 10cc; that is just another way of saying the same thing. So r = 0.5 is the rate relevant to our chosen window (if we used a 20cc test tube, or window, the rate would be different, and the resulting frequency profile would also be different). We can then read off any desired probability from the r = 0.5 column of the Poisson Table. For the specific value of p(2), the table supplies the answer 0.0758, or about 1 chance in 13. Not common, but not out of the question either. About once in 13 tries with that unit of observation.

Arrivals

The switchboard in a small Denver law office gets an average of 2.5 incoming phone calls during the noon hour on Thursdays. Staffing is reduced accordingly; people are allowed to go out for lunch in rotation. Experience shows that the assigned levels are adequate to handle a high of 5 calls during that hour. What is the chance that 6 calls will be received in the noon hour, some particular Thursday, in which case the firm might miss an important call?

Answer. The rate is 2.5, and the window of observation is 1 hour. The desired result is easily read off the Poisson Table, from the p(6) row of the r = 2.5 column. The answer is p(6) = 0.0278, or about 1 chance in 36: a missed call on roughly one Thursday in 36, or about once every eight months. How acceptable that is will depend on how cranky the firm's clients are, and the firm itself is in the best position to make that judgement.
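As a cross-check on the three table look-ups, here is a small Python sketch (standard library only; poisson_pmf is a name chosen for illustration) that evaluates formula (5) directly at each problem's rate, after rescaling the rate to the window of observation.

```python
import math

def poisson_pmf(k: int, r: float) -> float:
    """Probability of exactly k events when the rate for the chosen window is r."""
    return r ** k * math.exp(-r) / math.factorial(k)

traffic = poisson_pmf(2, 1.0)        # 1 accident per hour, 1-hour window
coliform = poisson_pmf(2, 10 / 20)   # 1 per 20cc, rescaled to a 10cc window
calls = poisson_pmf(6, 2.5)          # 2.5 calls per noon hour

for name, p in [("traffic", traffic), ("coliform", coliform), ("calls", calls)]:
    print(f"{name}: {p:.4f}, about 1 chance in {1 / p:.0f}")
```

The only problem-specific step is the rescaling of the rate; everything else is the same one-line formula.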

Approximation to Binomial

Besides handling Poisson problems proper, the Poisson Distribution can give a useful approximation of the Binomial Distribution when p is small (one rule of thumb is that it should be no greater than 0.1). In these cases, q is known (as in true Poisson problems it is not), but it is simply discarded; we pay no attention to it. In the range where the Poisson approximation is reasonably close, it is much easier to calculate, and is often preferred in practice.

Sample Binomial Problem

Rick has a crooked quarter, which comes up Heads 80% of the time. He tells Jimmie he will get 7 or more Heads in 10 tosses. Jimmie bets the family horse that there will be 6 or fewer Heads. What is Jimmie's chance of riding home from the wager?

This can be worked out by Binomial methods, which are the ones strictly proper to it. To adjust to Poisson perspectives, we take as r the rate of the rarer event (T = Tails), with an average rate of 2 per 10 tosses or 0.2. This exceeds the above recommended level, but we will go ahead anyway, just to see what happens.

We are making a trial observation for another 10 tosses. The expectation for those 10 tosses is ten times the rate for one toss; hence we expect T = 2, and the rate also becomes 2 (per set of 10). Rick bet on 0 to 3 Tails, so Jimmie wins only if the 10 tosses yield 4 or more Tails. From the Poisson Table for r = 2, we find that the sum of probabilities p(4) through p(9) gives 0.1428, or about 1 in 7, as Jimmie's chance of winning. (There is no value for p(10), that frequency being so small that it does not show up in four place decimals, so it is not included in the table.)

If we go back and do this over as a Binomial problem, we would have n = 10 (there are 10 tosses), and p(T) = 0.2000 (the coin comes up Tails, on average, 2 times out of 10). The exact Binomial answer for Jimmie's chance of winning (to four places) is 0.1209. The Poisson approximation was 0.1428. The Poisson approximation in this case is 18% high; that is, it is only roughly right. This is the consequence of our having exceeded the recommended figure of p = 0.1. This may remind us that Poisson is not an easier way of getting any and all Binomial results. It is a different animal, one which under certain conditions leaves similar tracks as it lopes on its own errands through the statistical woods.
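The comparison can be sketched in Python under the same assumptions as the text (n = 10 tosses, P(Tails) = 0.2, Poisson rate r = np = 2). Following the text, the Poisson side sums p(4) through p(9) only. The helper names are illustrative, and math.comb needs Python 3.8 or later.

```python
import math

n, p = 10, 0.2      # 10 tosses, P(Tails) = 0.2
r = n * p           # Poisson rate per set of 10 tosses (2.0)

def binom_pmf(k: int) -> float:
    """Exact Binomial probability of k Tails in n tosses."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k: int) -> float:
    """Poisson approximation to the same probability."""
    return r ** k * math.exp(-r) / math.factorial(k)

# Jimmie wins with 4 or more Tails
exact = sum(binom_pmf(k) for k in range(4, n + 1))
# As in the text: p(4) through p(9) from the r = 2 column
approx = sum(poisson_pmf(k) for k in range(4, 10))

print(f"Binomial {exact:.4f}, Poisson {approx:.4f}, "
      f"Poisson is {approx / exact - 1:.0%} high")
```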

Summary

The Poisson distribution deals with mutually independent events, occurring at a known and constant rate r per unit (of time or space), and observed over a certain unit of time or space.

The probability of k occurrences in that unit can be calculated from $p(k) = r^{k} / (k!\,e^{r})$. The rate r is the expected (average) outcome, and the most likely outcome is r itself or the whole number just below it (for whole-number r ≥ 1, the outcome r-1 is equally likely). The frequency profile of Poisson outcomes for a given r is not symmetrical; it is skewed more or less toward the high end.

For Binomial situations with p < 0.1 and reasonably many trials, the Poisson Distribution can acceptably mimic the Binomial Distribution, and is easier to calculate.

Problems

The first of these items is mandatory, for practice with the above Poisson explanation (you don't know it until you can do it). The second and third are problems dealing with the number e, which plays a fundamental role in questions of this type. The rest are a mix of Sinological and standard puzzlements, some of which have turned up elsewhere in this site, and are gathered here for their recreational value.

Poisson Distribution

Definition 1: The Poisson distribution has pdf given by

$$f(x) = \frac{e^{-\mu}\mu^{x}}{x!}, \qquad x = 0, 1, 2, \ldots$$

The parameter μ is often replaced by λ.


Figure 1 – Poisson Distribution

Observation: Some key statistical properties of the Poisson distribution are:

Mean = µ

Variance = µ

Skewness = 1/√µ

Kurtosis (excess) = 1/µ
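These properties can be checked numerically. The sketch below is an illustration of our own, not part of the Real Statistics material: it builds the pmf term by term and computes the moments directly, assuming that truncating at k_max = 200 leaves a negligible tail for the moderate values of μ used here.

```python
import math

def poisson_moments(mu: float, k_max: int = 200):
    """Mean, variance, skewness and excess kurtosis of Poisson(mu), computed from
    the pmf truncated at k_max (assumed large enough to make the tail negligible)."""
    pmf = [math.exp(-mu)]
    for k in range(1, k_max + 1):
        pmf.append(pmf[-1] * mu / k)      # p(k) = p(k-1) * mu / k
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    m3 = sum((k - mean) ** 3 * p for k, p in enumerate(pmf))
    m4 = sum((k - mean) ** 4 * p for k, p in enumerate(pmf))
    return mean, var, m3 / var ** 1.5, m4 / var ** 2 - 3

mean, var, skew, kurt = poisson_moments(4.0)
print(mean, var, skew, kurt)   # close to 4, 4, 0.5 and 0.25
```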

Excel Function: Excel provides the following function for the Poisson distribution:

POISSON(x, μ, cum) where μ = the mean of the distribution and cum takes the values TRUE and FALSE

POISSON(x, μ, FALSE) = probability density function value f(x) at the value x for the Poisson distribution with mean μ.

POISSON(x, μ, TRUE) = cumulative probability distribution function F(x) at the value x for the Poisson distribution with mean μ.

Excel 2010/2013 provide the additional function POISSON.DIST which is equivalent to POISSON.

Real Statistics Function: Excel doesn’t provide a worksheet function for the inverse of the Poisson distribution. Instead you can use the following function provided by the Real Statistics Resource Pack.

POISSON_INV(p, μ) = smallest integer x such that POISSON(x, μ, TRUE) ≥ p

Note that the maximum value of x is 1,024,000,000. A value higher than this indicates an error.
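For readers working outside Excel, here is a rough Python analogue of POISSON and POISSON_INV. The names poisson and poisson_inv are hypothetical stand-ins, not a real library API. The sketch assumes a moderate μ (math.exp(-μ) underflows to 0 for μ above roughly 745) and uses an arbitrary x_max guard in place of the 1,024,000,000 limit quoted above.

```python
import math

def poisson(x: int, mu: float, cum: bool) -> float:
    """Hypothetical stand-in for Excel's POISSON(x, mu, cum)."""
    term = math.exp(-mu)              # p(0)
    total = term
    for k in range(1, x + 1):
        term *= mu / k                # p(k) from p(k-1)
        total += term
    return total if cum else term

def poisson_inv(p: float, mu: float, x_max: int = 10**6) -> int:
    """Smallest integer x with poisson(x, mu, True) >= p, like POISSON_INV."""
    x = 0
    term = total = math.exp(-mu)
    while total < p:
        x += 1
        if x > x_max:                 # guard against p too close to 1
            raise ValueError("no x found below x_max")
        term *= mu / x
        total += term
    return x

print(round(poisson(3, 2, True), 6))   # 0.857123
print(poisson_inv(0.99, 100))          # 124
```

Building each term from the previous one avoids computing large factorials and powers separately.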

Theorem 1: If the probability p of success of a single trial approaches 0 while the number of trials n approaches infinity and the value μ = np stays fixed, then the binomial distribution B(n, p) approaches the Poisson distribution with mean μ.


Observation: Based on Theorem 1 the Poisson distribution can be used to estimate the binomial distribution when n ≥ 50 and p ≤ .01, preferably with np ≤ 5.

Example 1: A company produces high precision bolts so that the probability of a defect is .05%. In a sample of 4,000 units, what is the probability of having more than 3 defects?

We can solve this problem using the distribution B(4000, .0005), namely the desired probability is

1 – BINOMDIST(3, 4000, .0005, TRUE) = 1 – 0.857169 = 0.142831

We can also use the Poisson approximation as follows:

μ = np = 4000(.0005) = 2

1 – POISSON(3, 2, TRUE) = 1 – 0.857123 = 0.142877

As you can see the approximation is quite accurate.
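Both numbers can be reproduced with Python's standard library (math.comb needs Python 3.8 or later). This sketch sums the first four terms of each distribution and subtracts from 1, exactly as the BINOMDIST and POISSON formulas above do.

```python
import math

n, p = 4000, 0.0005     # sample size and defect probability (.05%)
mu = n * p              # Poisson mean, 2

# P(3 or fewer defects) under each model
binom_le3 = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(4))
poisson_le3 = sum(mu ** k * math.exp(-mu) / math.factorial(k) for k in range(4))

print(f"Binomial: {1 - binom_le3:.6f}")    # more than 3 defects
print(f"Poisson:  {1 - poisson_le3:.6f}")
```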

Observation: If the average number of occurrences of a particular event in an hour (or some other unit of time) is μ and the arrival times are random without any tendency to bunch up, then the probability of x events occurring in an hour is given by

$$f(x) = \frac{e^{-\mu}\mu^{x}}{x!}$$

Example 2: A large department store sells on average 100 MP3 players a week. Assuming that purchases are as described in the above observation, what is the probability that the store will run out of MP3 players in a week if they stock 120 players? How many MP3 players should the store stock in order to make sure that it has a 99% probability of being able to supply a week’s demand?

The probability that they will sell ≤ 120 MP3 players in a week is

POISSON(120, 100, TRUE) = 0.977331

Thus, the answer to the first problem is 1 – 0.977331 = 0.022669, or about 2.3%.

We can answer the second question by using successive approximations until we arrive at the correct answer. E.g. we could try x = 130, which is higher than 120. The cumulative Poisson is 0.998293, which is too high. We then pick x = 125 (halfway between 120 and 130). This yields 0.993202, which is a little too high, and so we try 123. This yields 0.988756, which is a little too low, and so we finally arrive at 124, which has cumulative Poisson of 0.991226. The store should therefore stock 124 MP3 players; equivalently, POISSON_INV(.99, 100) = 124.
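The successive approximation above is essentially a bisection search on the stock level. A minimal Python sketch of that idea follows; poisson_cdf is an illustrative helper, and the starting bracket of 120 to 130 is taken from the text.

```python
import math

def poisson_cdf(x: int, mu: float) -> float:
    """P(X <= x) for X ~ Poisson(mu), built term by term to avoid huge factorials."""
    term = total = math.exp(-mu)
    for k in range(1, x + 1):
        term *= mu / k
        total += term
    return total

mu = 100                                # average weekly sales
p_run_out = 1 - poisson_cdf(120, mu)    # demand exceeds the 120 in stock
print(f"P(run out with 120 in stock) = {p_run_out:.4f}")

# Bisection between a stock level known to be too low (120) and one known to be
# enough (130), keeping poisson_cdf(lo) < 0.99 <= poisson_cdf(hi) at every step
lo, hi = 120, 130
while hi - lo > 1:
    mid = (lo + hi) // 2
    if poisson_cdf(mid, mu) >= 0.99:
        hi = mid
    else:
        lo = mid
stock = hi
print(f"Stock needed for 99% cover: {stock}")
```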


Observation: We have observed that under the appropriate conditions the binomial distribution can be approximated by either the Poisson or normal distribution. We conclude this section by stating that the Poisson distribution can be approximated by the normal distribution.

Theorem 2: For μ sufficiently large (usually μ ≥ 20), if x has a Poisson distribution with mean μ, then x ~ N(μ, √μ).
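To get a feel for Theorem 2, the sketch below compares the exact Poisson CDF with the CDF of N(μ, √μ) at the Example 2 values (μ = 100, x = 120). It is an illustration only: it applies no continuity correction, and the helper names are our own.

```python
import math

def poisson_cdf(x: int, mu: float) -> float:
    """Exact P(X <= x) for X ~ Poisson(mu)."""
    term = total = math.exp(-mu)
    for k in range(1, x + 1):
        term *= mu / k
        total += term
    return total

def normal_cdf(x: float, mean: float, sd: float) -> float:
    """P(X <= x) for X ~ N(mean, sd), via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

mu, x = 100, 120
exact = poisson_cdf(x, mu)
approx = normal_cdf(x, mu, math.sqrt(mu))   # N(mu, sqrt(mu)) from Theorem 2
print(f"Poisson {exact:.6f}  normal {approx:.6f}")
```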

14 Responses to Poisson Distribution

1. Anson says:

August 10, 2015 at 11:30 am

Hi. May I ask a question?

When n increases (from 5 to 10 and 20), what happens on the probability distribution graph? (binomial, poisson and normal)


Charles says:

August 10, 2015 at 9:48 pm

It really depends on what happens with the other parameters.

Binomial: If the other parameters in the BINOMDIST function are held constant, then the cumulative distribution values decrease (e.g. compare BINOMDIST(4,n,.7,TRUE) for n = 5, 10, 20).

Poisson: If you assume that the mean of the distribution = np, then the cumulative distribution values decrease (e.g. compare POISSON(2,np,TRUE) where p = .5 for n = 5, 10, 20).

Normal: It really depends on how you are going to use n, since NORMDIST doesn’t directly use n.

Normal Distribution

Data can be "distributed" (spread out) in different ways.

It can be spread out more on the left

 Or more on the right


Or it can be all jumbled up

But there are many cases where the data tends to be around a central value with no bias left or right, and it gets close to a "Normal Distribution" like this:

A Normal Distribution

The "Bell Curve" is a Normal Distribution. And the yellow histogram shows some data that follows it closely, but not perfectly (which is usual).

It is often called a "Bell Curve" because it looks like a bell.

Many things closely follow a Normal Distribution:

heights of people

size of things produced by machines

errors in measurements

blood pressure

marks on a test

We say the data is "normally distributed":

The Normal Distribution has:

mean  = median = mode

symmetry about the center

50% of values less than the mean and 50% greater than the mean


Quincunx

You can see a normal distribution being created by random chance!

It is called the Quincunx and it is an amazing machine.


Standard Deviations

The Standard Deviation is a measure of how spread out numbers are (read that page for details on how to calculate it).

When we calculate the standard deviation, we find that (generally):

68% of values are within 1 standard deviation of the mean

95% of values are within 2 standard deviations of the mean

99.7% of values are within 3 standard deviations of the mean
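The rounded 68/95/99.7 figures come from the normal curve itself: the share within k standard deviations of the mean equals erf(k/√2). The Python sketch below (standard library only; share_within is an illustrative name) prints it for k = 1, 2, 3, giving about 68.3%, 95.4% and 99.7%.

```python
import math

def share_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```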

 

Example: 95% of students at school are between 1.1m and 1.7m tall. Assuming this data is normally distributed, can you calculate the mean and standard deviation?

The mean is halfway between 1.1m and 1.7m:

Mean = (1.1m + 1.7m) / 2 = 1.4m

95% is 2 standard deviations either side of the mean (a total of 4 standard deviations), so:

1 standard deviation = (1.7m - 1.1m) / 4 = 0.6m / 4 = 0.15m

And this is the result:

It is good to know the standard deviation, because we can say that any value is:

likely to be within 1 standard deviation (68 out of 100 should be)

very likely to be within 2 standard deviations (95 out of 100 should be)

almost certainly within 3 standard deviations (997 out of 1000 should be)

Standard Scores

The number of standard deviations from the mean is also called the "Standard Score", "sigma" or "z-score". Get used to those words!

Example: In that same school one of your friends is 1.85m tall

You can see on the bell curve that 1.85m is 3 standard deviations from the mean of 1.4m, so:

Your friend's height has a "z-score" of 3.0

It is also possible to calculate how many standard deviations 1.85 is from the mean.

How far is 1.85 from the mean?

It is 1.85 - 1.4 = 0.45m from the mean

How many standard deviations is that? The standard deviation is 0.15m, so:

0.45m / 0.15m = 3 standard deviations

So to convert a value to a Standard Score ("z-score"):

first subtract the mean,

then divide by the Standard Deviation

And doing that is called "Standardizing":

We can take any Normal Distribution and convert it to The Standard Normal Distribution.
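The two steps (subtract the mean, then divide by the standard deviation) fit in a one-line function. This Python sketch applies it to the school heights example; z_score is an illustrative name.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Standard score: subtract the mean, then divide by the standard deviation."""
    return (x - mean) / sd

# 95% of heights lie between 1.1 m and 1.7 m, i.e. within 2 SD either side of the mean
low, high = 1.1, 1.7
mean = (low + high) / 2      # 1.4 m
sd = (high - low) / 4        # 0.15 m
print(round(z_score(1.85, mean, sd), 2))   # 3.0
```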

Example: Travel Time

A survey of daily travel time had these results (in minutes):

26, 33, 65, 28, 34, 55, 25, 44, 50, 36, 26, 37, 43, 62, 35, 38, 45, 32, 28, 34

The Mean is 38.8 minutes, and the Standard Deviation is 11.4 minutes (you can copy and paste the values into the Standard Deviation Calculator if you want).

Convert the values to z-scores ("standard scores").

To convert 26:

first subtract the mean: 26 - 38.8 = -12.8,

then divide by the Standard Deviation: -12.8/11.4 = -1.12

So 26 is -1.12 Standard Deviations from the Mean

Here are the first three conversions:

Original Value   Calculation          Standard Score (z-score)
26               (26-38.8) / 11.4 =   -1.12
33               (33-38.8) / 11.4 =   -0.51
65               (65-38.8) / 11.4 =   +2.30
...              ...                  ...

And here they are graphically:


You can calculate the rest of the z-scores yourself!
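If you'd rather let the computer do the rest, here is a short Python sketch using the standard statistics module. Note the assumption: the 11.4 minutes quoted above is the population standard deviation (statistics.pstdev, dividing by n); statistics.stdev divides by n - 1 and gives about 11.7 instead.

```python
import statistics

times = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
         26, 37, 43, 62, 35, 38, 45, 32, 28, 34]   # daily travel times (minutes)

mean = statistics.mean(times)     # 38.8
sd = statistics.pstdev(times)     # 11.4, population standard deviation
z = [round((t - mean) / sd, 2) for t in times]
print(z[:3])                      # [-1.12, -0.51, 2.3]
```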

 

Here is the formula for z-score that we have been using:

$$z = \frac{x - \mu}{\sigma}$$

z is the "z-score" (Standard Score)

x is the value to be standardized

μ is the mean

σ is the standard deviation

Why Standardize ... ?

It can help us make decisions about our data.

Example: Professor Willoughby is marking a test.

Here are the students' results (out of 60 points):

20, 15, 26, 32, 18, 28, 35, 14, 26, 22, 17

Most students didn't even get 30 out of 60, and most will fail.

The test must have been really hard, so the Prof decides to Standardize all the scores and only fail people more than 1 standard deviation below the mean.

The Mean is 23, and the Standard Deviation is 6.6, and these are the Standard Scores:

-0.45, -1.21, 0.45, 1.36, -0.76, 0.76, 1.82, -1.36, 0.45, -0.15, -0.91

Only 2 students will fail (the ones who scored 15 and 14 on the test)
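The professor's rule can be sketched in Python as well. This assumes, as in the travel-time example, that the 6.6 quoted above is the population standard deviation (about 6.63 before rounding), so a few z-scores computed this way differ from the list above in the second decimal; the two failing scores are the same.

```python
import statistics

scores = [20, 15, 26, 32, 18, 28, 35, 14, 26, 22, 17]   # out of 60 points

mean = statistics.mean(scores)     # 23
sd = statistics.pstdev(scores)     # about 6.63, population standard deviation
z = [(s - mean) / sd for s in scores]

# Fail only those more than 1 standard deviation below the mean
failing = [s for s, zs in zip(scores, z) if zs < -1]
print(failing)                     # [15, 14]
```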

It also makes life easier because we only need one table (the Standard Normal Distribution Table), rather than doing calculations individually for each value of mean and standard deviation.

In More Detail

Here is the Standard Normal Distribution with percentages for every half of a standard deviation, and cumulative percentages:

Example: Your score in a recent test was 0.5 standard deviations above the average. How many people scored lower than you did?

Between 0 and 0.5 is 19.1%

Less than 0 is 50% (left half of the curve)

So the total less than you is:

50% + 19.1% = 69.1%

In theory 69.1% scored less than you did (but with real data the percentage may be different)
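The 19.1% and 69.1% figures can be checked with the error function, since the share of a standard normal distribution below z is (1 + erf(z/√2)) / 2. A minimal Python sketch; normal_cdf is an illustrative name.

```python
import math

def normal_cdf(z: float) -> float:
    """Share of a standard normal distribution below z, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

below = normal_cdf(0.5)
print(f"{below:.1%} below; {below - 0.5:.1%} between 0 and 0.5 SD")
```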

A Practical Example: Your company packages sugar in 1 kg bags.

When you weigh a sample of bags you get these results:

1007g, 1032g, 1002g, 983g, 1004g, ... (a hundred measurements)

Mean = 1010g

Standard Deviation = 20g

Some values are less than 1000g ... can you fix that?


The normal distribution of your measurements looks like this:

31% of the bags are less than 1000g, which is cheating the customer!

It is a random thing, so we can't stop bags having less than 1000g, but we can try to reduce it a lot.

Let's adjust the machine so that 1000g is:

at −3 standard deviations:

From the big bell curve above we see that 0.1% are less. But maybe that is too small.

at −2.5 standard deviations:

Below −3 is 0.1% and between −3 and −2.5 standard deviations is 0.5%; together that is 0.1% + 0.5% = 0.6% (a good choice I think)

So let us adjust the machine to have 1000g at −2.5 standard deviations from the mean.

Now, we can adjust it to:

increase the amount of sugar in each bag (which changes the mean), or

make it more accurate (which reduces the standard deviation)

Let us try both.


ADJUST THE MEAN AMOUNT IN EACH BAG

The standard deviation is 20g, and we need 2.5 of them:

2.5 × 20g = 50g

So the machine should average 1050g, like this:

ADJUST THE ACCURACY OF THE MACHINE

Or we can keep the same mean (of 1010g), but then we need 2.5 standard deviations to be equal to 10g:

10g / 2.5 = 4g

So the standard deviation should be 4g, like this:

(We hope the machine is that accurate!)
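Both adjustments can be checked numerically. The Python sketch below (standard library only; share_below is an illustrative helper) computes the share of bags under 1000g for the current machine and for each adjusted setting, assuming, as the text does, that bag weights are normally distributed.

```python
import math

def share_below(x: float, mean: float, sd: float) -> float:
    """Fraction of a normal distribution with this mean and SD lying below x."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

target, k = 1000, 2.5                 # put 1000 g at 2.5 SD below the mean
mean, sd = 1010, 20                   # current machine

print(f"now: {share_below(target, mean, sd):.1%} under 1000 g")   # about 31%

new_mean = target + k * sd            # option 1: raise the mean, keep the SD
new_sd = (mean - target) / k          # option 2: keep the mean, tighten the SD
print(new_mean, f"{share_below(target, new_mean, sd):.1%}")
print(new_sd, f"{share_below(target, mean, new_sd):.1%}")
```

Either way 1000g ends up 2.5 standard deviations below the mean, so both options leave the same share, about 0.6%, of bags underweight.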

Or perhaps we could have some combination of