lesson 5 - probability distributions
TRANSCRIPT
Probability Distributions
Jeremy G. Vicencio
2nd Semester, A.Y. 2010-2011
Bio 180 (Biostatistics), Lesson 5: Probability Distributions, 3rd Week, January 2011
Probability Distribution
• A device which summarizes the relationship between the values of a random variable and the probabilities of their occurrence.
• It may be expressed in the form of a table, graph, or formula.
Probability Distribution of a Discrete Random Variable
• The probability distribution of a discrete random variable is a table, graph, or formula, or other device used to specify all possible values of a discrete random variable along with their respective probabilities.
In an article in the American Journal of Obstetrics and Gynecology, Buitendijk and Bracken state that during the previous 25 years there had been an increasing awareness of the potentially harmful effects of drugs and chemicals on the developing fetus. The authors assessed the use of medication in a population of women who were delivered of infants at a large European hospital between 1980 and 1982, and studied the association of medication use with various maternal characteristics such as alcohol, tobacco, and illegal drug use. Their findings suggest that women who engage in risk-taking behavior during pregnancy are also more likely to use medications while pregnant. Table 5.1 shows the prevalence of prescription and nonprescription drug use in pregnancy among the study subjects.
We wish to construct the probability distribution of the discrete variable X = number of prescription and nonprescription drugs used by the study subjects.
Number of Drugs Frequency
0 1425
1 1351
2 793
3 348
4 156
5 58
6 28
7 15
8 6
9 3
10 1
12 1
Total 4185
Table 5.1 Prevalence of Prescription and Nonprescription Drug Use in Pregnancy Among Women Delivered of Infants at a Large European Hospital
SOURCE: Simone Buitendijk and Michael B. Bracken, “Medication in Early Pregnancy: Prevalence of Use and Relationship to Maternal Characteristics,” American Journal of Obstetrics and Gynecology, 165 (1991), 33-40, as printed in Biostatistics: A Foundation for Analysis in the Health Sciences by Wayne W. Daniel (1995)
Number of Drugs (x) P (X=x)
0 0.3405
1 0.3228
2 0.1895
3 0.0832
4 0.0373
5 0.0139
6 0.0067
7 0.0036
8 0.0014
9 0.0007
10 0.0002
12 0.0002
Total 1.0000
Table 5.2 Probability Distribution of Number of Prescription and Nonprescription Drugs Used in Pregnancy by Women Delivered of Infants at a Large European Hospital
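The step from Table 5.1 to Table 5.2 is a division of each frequency by the total. A minimal Python sketch, assuming the frequencies are typed in by hand from Table 5.1 (the variable names are illustrative and not part of the lesson):

```python
# Frequencies from Table 5.1: number of drugs -> number of women.
frequencies = {0: 1425, 1: 1351, 2: 793, 3: 348, 4: 156, 5: 58,
               6: 28, 7: 15, 8: 6, 9: 3, 10: 1, 12: 1}

total = sum(frequencies.values())  # number of women in the study

# P(X = x) is the relative frequency of x, rounded to four places as in Table 5.2.
probabilities = {x: round(f / total, 4) for x, f in frequencies.items()}

for x, p in probabilities.items():
    print(f"{x:>2}  {p:.4f}")
```

Rounding each entry separately means the printed column may differ from 1 in the last decimal place for other data sets; here it sums to 1.0000.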
Figure 5.1 Graphical representation of the probability distribution shown in Table 5.2 (horizontal axis: x, number of drugs, 0 to 12; vertical axis: probability, 0.00 to 0.40)
Two essential properties of a probability distribution of a discrete variable:
1) 0 ≤ P(X=x) ≤ 1
2) ∑P(X=x) = 1
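Both properties can be checked mechanically. The sketch below is one possible check, not part of the lesson; the function name `is_valid_distribution` and the tolerance (which allows for rounding in a printed table) are assumptions:

```python
import math

def is_valid_distribution(probabilities, tolerance=1e-3):
    """Check the two properties of a discrete probability distribution."""
    # Property 1: every probability lies between 0 and 1.
    each_in_range = all(0 <= p <= 1 for p in probabilities.values())
    # Property 2: the probabilities sum to 1 (within a rounding tolerance).
    sums_to_one = math.isclose(sum(probabilities.values()), 1.0, abs_tol=tolerance)
    return each_in_range and sums_to_one

# Table 5.2, entered by hand.
table_5_2 = {0: 0.3405, 1: 0.3228, 2: 0.1895, 3: 0.0832, 4: 0.0373, 5: 0.0139,
             6: 0.0067, 7: 0.0036, 8: 0.0014, 9: 0.0007, 10: 0.0002, 12: 0.0002}

print(is_valid_distribution(table_5_2))
```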
What is the probability that a randomly selected woman will be one who used seven prescription and nonprescription drugs?
Solution: The desired probability is P(X=7). From Table 5.2, the answer is 0.0036.
What is the probability that a randomly selected woman used either two or three drugs?
Solution: Use the addition rule for mutually exclusive events. Using probability notation and the values from Table 5.2, the answer is
P(2 ∪ 3) = P(2) + P(3) = 0.1895 + 0.0832 = 0.2727
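The two lookups above (a single value, then a union of mutually exclusive values) can be sketched in Python. The helper name `probability_of_any` is hypothetical, and Table 5.2 is entered by hand:

```python
# Table 5.2, entered by hand.
table_5_2 = {0: 0.3405, 1: 0.3228, 2: 0.1895, 3: 0.0832, 4: 0.0373, 5: 0.0139,
             6: 0.0067, 7: 0.0036, 8: 0.0014, 9: 0.0007, 10: 0.0002, 12: 0.0002}

def probability_of_any(values, distribution):
    """P(X equals one of the given values).

    Distinct values of X are mutually exclusive events, so the addition
    rule reduces to a plain sum. set() guards against repeated values.
    """
    return sum(distribution[x] for x in set(values))

p_seven = table_5_2[7]                                   # P(X = 7)
p_two_or_three = probability_of_any([2, 3], table_5_2)   # P(X = 2) + P(X = 3)

print(p_seven, round(p_two_or_three, 4))
```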
Cumulative probability distribution
The cumulative probability for xi is written as
F(xi) = P(X ≤ xi)
It gives the probability that X is less than or equal to a specified value, xi.
Number of Drugs (x) Cumulative Probability P(X ≤ xi)
0 0.3405
1 0.6633
2 0.8528
3 0.9360
4 0.9733
5 0.9872
6 0.9939
7 0.9975
8 0.9989
9 0.9996
10 0.9998
12 1.0000
Table 5.3 Cumulative Probability Distribution of Number of Prescription and Nonprescription Drugs Used in Pregnancy by Women Delivered of Infants at a Large European Hospital
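Table 5.3 is a running total of Table 5.2. A short sketch of that accumulation, assuming Table 5.2 is entered by hand (variable names are illustrative):

```python
from itertools import accumulate

# Table 5.2, entered by hand.
table_5_2 = {0: 0.3405, 1: 0.3228, 2: 0.1895, 3: 0.0832, 4: 0.0373, 5: 0.0139,
             6: 0.0067, 7: 0.0036, 8: 0.0014, 9: 0.0007, 10: 0.0002, 12: 0.0002}

values = sorted(table_5_2)                                  # x in increasing order
running_totals = accumulate(table_5_2[x] for x in values)   # partial sums of P(X = x)

# Round to four places to remove floating-point noise from the sums.
cumulative = {x: round(total, 4) for x, total in zip(values, running_totals)}

for x, f in cumulative.items():
    print(f"{x:>2}  {f:.4f}")
```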
Figure 5.2 Cumulative probability distribution of number of prescription and nonprescription drugs used in pregnancy by women delivered of infants at a large European hospital
What is the probability that a woman picked at random will be one who used three or fewer drugs?
Solution: The probability in question can be found directly in Table 5.3 by reading the cumulative probability opposite x = 3. Therefore,
P(X ≤ 3) = 0.9360
What is the probability that a woman picked at random will be one who used fewer than 3 drugs?
Solution: Since a woman who used fewer than three drugs used either two, one, or no drugs, the answer is the cumulative probability for 2. That is,
P(X < 3) = P(X ≤ 2) = 0.8528
What is the probability that a randomly selected woman used six or more drugs?
Solution: Use the concept of complementary probabilities. P(X ≥ 6) + P(X ≤ 5) = 1. Therefore,
P(X ≥ 6) = 1 – P(X ≤ 5) = 1 – 0.9872 = 0.0128
What is the probability that a randomly selected woman is one who used between two and five drugs inclusive?
Solution: P(X ≤ 5) = 0.9872 is the probability that a woman used between zero and five drugs. To get the probability of between two and five drugs, subtract the probability of one or fewer from 0.9872.
P(2 ≤ X ≤ 5) = P(X ≤ 5) – P(X ≤ 1) = 0.9872 – 0.6633 = 0.3239
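The four cumulative-probability questions above all reduce to lookups in Table 5.3. The sketch below assumes the table is entered by hand; the helper `cdf` is a hypothetical name for F(x) = P(X ≤ x):

```python
# Table 5.3, entered by hand: x -> P(X <= x).
cumulative = {0: 0.3405, 1: 0.6633, 2: 0.8528, 3: 0.9360, 4: 0.9733, 5: 0.9872,
              6: 0.9939, 7: 0.9975, 8: 0.9989, 9: 0.9996, 10: 0.9998, 12: 1.0000}

def cdf(x):
    """F(x) = P(X <= x), read at the largest tabulated value not exceeding x."""
    eligible = [v for v in cumulative if v <= x]
    return cumulative[max(eligible)] if eligible else 0.0

p_at_most_3 = cdf(3)             # three or fewer drugs
p_fewer_than_3 = cdf(2)          # fewer than three is the same as two or fewer
p_at_least_6 = 1 - cdf(5)        # complement of five or fewer
p_2_to_5 = cdf(5) - cdf(1)       # two to five inclusive

print(round(p_at_most_3, 4), round(p_fewer_than_3, 4),
      round(p_at_least_6, 4), round(p_2_to_5, 4))
```

Because X takes only whole-number values, a strict inequality such as X < 3 is handled by stepping down to the next lower value before the lookup.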
Continuous Probability Distribution
Figure 5.3.1 Histogram of the Ages of 169 Subjects Who Participated in a Study of Sparteine and Mephenytoin Oxidation (horizontal axis: age, in classes 10-19 through 60-69; vertical axis: frequency, 0 to 70)
Figure 5.3.2 Histogram of the Ages of 169 Subjects Who Participated in a Study of Sparteine and Mephenytoin Oxidation (horizontal axis: age, in classes 15-19 through 60-64; vertical axis: frequency, 0 to 45)
Figure 5.3.3 Frequency Polygon of the Ages of 169 Subjects Who Participated in a Study of Sparteine and Mephenytoin Oxidation (horizontal axis: age, 0 to 70; vertical axis: frequency, 0 to 45)
Figure 5.3.4 Continuous Probability Distribution of the Ages of Subjects Who Participate in Sparteine and Mephenytoin Oxidation Studies (horizontal axis: age, 0 to 70; vertical axis: frequency, 0 to 45)
Figure 5.4.1 A histogram resulting from a large number of values and small class intervals.
Source: Daniel (1995)
• In general, as the number of observations, n, approaches infinity, and the width of the class intervals approaches zero, the frequency polygon approaches a smooth curve.
• The total area under the curve is equal to one.
Figure 5.4.2 Graphical representation of a continuous distribution
Source: Daniel (1995)
• The relative frequency of occurrence of values between any two points on the x-axis is equal to the total area bounded by the curve, the x-axis, and the perpendicular lines erected at the two points on the x-axis.
Figure 5.4.3 Graph of a continuous distribution showing area between a and b
Source: Daniel (1995)
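• What is the probability of any specific value of the random variable? For a continuous random variable it is zero, because a single point on the x-axis has no area above it.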
Finding area under a smooth curve
• Integral Calculus – to find the area under a smooth curve between any two points a and b, the density function is integrated from a to b.
• Density Function – a formula used to represent the distribution of a continuous random variable.
Definition:
A nonnegative function f(x) is called a probability distribution (sometimes called a probability density function) of the continuous random variable X if the total area bounded by its curve and the x-axis is equal to 1 and if the subarea under the curve bounded by the curve, the x-axis, and perpendiculars erected at any two points a and b gives the probability that X is between the points a and b.
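To make the definition concrete, the sketch below uses a made-up density, f(x) = 2x on the interval from 0 to 1, which is not from the lesson. It approximates areas with the trapezoidal rule, a stand-in for the integration the definition calls for; the function names are illustrative.

```python
def density(x):
    """Hypothetical density for illustration: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def area(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with the trapezoidal rule."""
    if a == b:
        return 0.0  # a single point has no width, hence no area
    width = (b - a) / steps
    inner = sum(f(a + i * width) for i in range(1, steps))
    return width * (0.5 * f(a) + inner + 0.5 * f(b))

total_area = area(density, 0.0, 1.0)       # should be 1 for a valid density
p_between = area(density, 0.25, 0.5)       # P(0.25 < X < 0.5)
p_single_point = area(density, 0.4, 0.4)   # P(X = 0.4)

print(round(total_area, 4), round(p_between, 4), p_single_point)
```

The last value shows why the probability of any one specific value of a continuous random variable is zero: only intervals carry area.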
Normal Distribution
Figure 5.6 Graph of a normal distribution
Source: Daniel (1995)
Characteristics of the Normal Distribution:
1. It is symmetrical about its mean, µ.
2. The mean, median, and the mode are all equal.
3. The total area under the curve above the x-axis is one square unit.
4. If we erect perpendiculars a distance of 1 SD from the mean in both directions, the area enclosed by these perpendiculars, the x-axis, and the curve will be approximately 68% of the total area (2 SD, 95%; 3 SD, 99.7%).
Figure 5.7 Subdivision of the Areas Under the Normal Curve
5. The normal distribution is completely determined by the parameters µ and σ. In other words, a different normal distribution is specified for each different value of µ and σ. Different values of µ shift the graph of the distribution along the x-axis while different values of σ determine the degree of flatness or peakedness of the graph of the distribution.
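Characteristics 4 and 5 can be checked together: the area within k standard deviations of the mean should be the same whatever µ and σ are. A minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `area_within` and the choice of (µ, σ) pairs are illustrative assumptions:

```python
from statistics import NormalDist

def area_within(k, mu, sigma):
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# Arbitrary (mu, sigma) pairs chosen only for illustration.
for mu, sigma in [(0, 1), (70, 3), (1076.80, 105.76)]:
    areas = [round(area_within(k, mu, sigma), 4) for k in (1, 2, 3)]
    print(mu, sigma, areas)
```

Each row prints the same three areas, which round to the 68%, 95%, and 99.7% figures quoted in characteristic 4.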
Figure 5.8.1 Three normal distributions with different means but the same amount of variability
Figure 5.8.2 Three normal distributions with different standard deviations but the same mean
Source: Daniel (1995)
Standard Normal Distribution (Unit Normal Distribution)
• Has a mean of 0 and a standard deviation of 1
Figure 5.9 Graph of the Standard Normal Distribution
Source: Daniel (1995)
• To find the area between z0 and z1, we need to evaluate the following integral of the standard normal density:
\int_{z_0}^{z_1} \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \, dz
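The standard normal density is exp(−z²/2)/√(2π), and its integral has no closed form, which is why tables or software are used. As a rough sketch, the area between two z values can be approximated numerically and compared with the standard library's cumulative function; the limits −1 and 1 are arbitrary illustrations and the function names are hypothetical:

```python
import math
from statistics import NormalDist

def standard_normal_density(z):
    """Height of the standard normal curve at z."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def area_between(z0, z1, steps=10_000):
    """Trapezoidal approximation of the integral of the density from z0 to z1."""
    width = (z1 - z0) / steps
    inner = sum(standard_normal_density(z0 + i * width) for i in range(1, steps))
    ends = 0.5 * (standard_normal_density(z0) + standard_normal_density(z1))
    return width * (ends + inner)

numeric = area_between(-1.0, 1.0)
reference = NormalDist().cdf(1.0) - NormalDist().cdf(-1.0)

print(round(numeric, 4), round(reference, 4))
```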
• Given the standard normal distribution, find the area under the curve, above the z-axis between z = -∞ and z = 2 (0.9772)
Figure 5.10.1 Graph of the standard normal distribution showing area between z = - ∞ and z = 2
Source: Daniel (1995)
The area can be interpreted in several ways:
• The probability that a z picked at random from a population of z’s will have a value between -∞ and 2.
• The relative frequency of occurrence (or proportion) of values of z between -∞ and 2, or we may say that 97.72% of the z’s have a value between -∞ and 2.
• Instead of looking up the areas on the table, you can use Excel’s NORMSDIST function.
=NORMSDIST(z)
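Outside Excel, the same lookup can be done with Python's standard library. This is a sketch of an equivalent, not part of the lesson; the wrapper name `normsdist` simply mirrors the Excel function:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def normsdist(z):
    """Area under the standard normal curve from -infinity to z."""
    return standard_normal.cdf(z)

area_up_to_2 = normsdist(2)
print(round(area_up_to_2, 4))  # 0.9772, matching the earlier example
```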
• What is the probability that a z picked at random from the population of z’s will have a value between -2.55 and +2.55?
Figure 5.10.2 Standard normal curve showing P(-2.55 < z < 2.55)
Source: Daniel (1995)
• What proportion of z-values are between -2.74 and 1.53?
Figure 5.10.3 Standard normal curve showing proportion of z values between z = -2.74 and z = 1.53
Source: Daniel (1995)
• Given the standard normal distribution, find P(z ≥ 2.71)
Figure 5.10.4 Standard normal curve showing P(z ≥ 2.71). Source: Daniel (1995)
• Given the standard normal distribution, find P(0.84 ≤ z ≤ 2.45)
Figure 5.10.5 Standard normal curve showing P(0.84 ≤ z ≤ 2.45)
Source: Daniel (1995)
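The four exercises above each combine one or two cumulative areas. The sketch below works them with Python's `statistics.NormalDist` in place of the printed table; `phi` is an assumed shorthand for the standard normal cumulative function:

```python
from statistics import NormalDist

phi = NormalDist().cdf  # area from -infinity to z under the standard normal curve

# Area between two z values: subtract the smaller cumulative area from the larger.
p_between_pm_2_55 = phi(2.55) - phi(-2.55)
p_between_m2_74_and_1_53 = phi(1.53) - phi(-2.74)

# Upper-tail area: complement of the cumulative area.
p_at_least_2_71 = 1 - phi(2.71)

p_between_0_84_and_2_45 = phi(2.45) - phi(0.84)

for label, p in [("P(-2.55 < z < 2.55)", p_between_pm_2_55),
                 ("P(-2.74 < z < 1.53)", p_between_m2_74_and_1_53),
                 ("P(z >= 2.71)", p_at_least_2_71),
                 ("P(0.84 <= z <= 2.45)", p_between_0_84_and_2_45)]:
    print(f"{label} = {p:.4f}")
```

Because z is continuous, strict and non-strict inequalities give the same areas.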
As part of a study of Alzheimer’s disease (AD), Dusheiko reported data that are compatible with the hypothesis that brain weights of victims of the disease are normally distributed. From the reported data, we may compute a mean of 1076.80 grams and an SD of 105.76 grams. If we assume that these results are applicable to all victims of Alzheimer’s disease, find the probability that a randomly selected victim of the disease will have a brain that weighs less than 800 grams.
Figure 5.11.1 Normal distribution to approximate distribution of brain weights of patients with AD (mean and SD estimated)
Source: Daniel (1995)
Figure 5.11.2 Normal distribution of brain weights (x) and the standard normal distribution (z)
Source: Daniel (1995)
• The formula z = (x − µ)/σ transforms any value of x in any normal distribution to the corresponding value of z in the standard normal distribution.
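Applied to the brain-weight example above, the transformation z = (x − µ)/σ turns "less than 800 grams" into an area under the standard normal curve. A sketch under the stated mean and SD; the function name `standardize` is borrowed from the Excel function of the same purpose:

```python
from statistics import NormalDist

def standardize(x, mean, standard_dev):
    """Convert x from a normal distribution into the corresponding z value."""
    return (x - mean) / standard_dev

# Mean and SD of brain weights as given in the example.
z = standardize(800, 1076.80, 105.76)

# P(X < 800) equals the standard normal area to the left of z.
p_less_than_800 = NormalDist().cdf(z)

print(round(z, 2), round(p_less_than_800, 4))
```

A table lookup uses z rounded to two decimal places, so its answer can differ from the software value in the last digit.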
• Instead of using the formula, you may use Excel’s STANDARDIZE function
=STANDARDIZE(x, mean, standard_dev)
• Then, apply the NORMSDIST function, or…
• You may use the NORMDIST function
=NORMDIST (x, mean, standard_dev, cumulative)
• cumulative – if FALSE, returns the height of the normal density curve at x (a density, not a probability); if TRUE, returns the probability that the value will be less than or equal to x
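The effect of the cumulative argument is easiest to see side by side: one setting evaluates the density curve, the other gives the area to its left. The sketch below imitates that behavior in Python; `normdist` is a hypothetical wrapper, and the mean of 70 and SD of 3 are illustrative values:

```python
from statistics import NormalDist

def normdist(x, mean, standard_dev, cumulative):
    """Imitation of the spreadsheet call: area to the left of x, or curve height at x."""
    dist = NormalDist(mean, standard_dev)
    return dist.cdf(x) if cumulative else dist.pdf(x)

height_at_mean = normdist(70, 70, 3, cumulative=False)  # height of the curve, not a probability
area_below_mean = normdist(70, 70, 3, cumulative=True)  # P(X <= 70)

print(round(height_at_mean, 4), round(area_below_mean, 4))
```

For probability questions about a normal variable, the cumulative form is the one to use.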
• Suppose it is known that the heights of a certain population of individuals are approximately normally distributed with a mean of 70 inches and a standard deviation of 3 inches. What is the probability that a person picked at random from this group will be between 65 and 74 inches tall?
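One way to work this problem is to standardize both limits and take the difference of the two cumulative areas. A sketch using Python's standard library rather than the printed table (variable names are illustrative):

```python
from statistics import NormalDist

mean, standard_dev = 70, 3  # inches, as given in the problem

# Standardize both limits with z = (x - mean) / standard_dev.
z_low = (65 - mean) / standard_dev
z_high = (74 - mean) / standard_dev

# P(65 < X < 74) is the area between the two z values.
standard_normal = NormalDist()
p_between = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

print(round(z_low, 2), round(z_high, 2), round(p_between, 4))
```

A table lookup with z rounded to two decimal places gives a value that may differ slightly in the last digits.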
• In a population of 10,000 people described in the previous example, how many would you expect to be 6 feet 5 inches tall or taller?
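Solution: 6 feet 5 inches is 77 inches, so z = (77 − 70)/3 = 2.33 and P(z ≥ 2.33) = 1 − 0.9901 = 0.0099. Out of 10,000 people, we would expect 10,000(0.0099) = 99 to be 6 feet 5 inches (77 inches) tall or taller.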
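The expected count is the population size times the upper-tail probability. The sketch below reproduces the table-style calculation (z rounded to two places, probability to four) and, for comparison, the unrounded value; all variable names are illustrative:

```python
from statistics import NormalDist

population_size = 10_000
mean, standard_dev = 70, 3      # inches
cutoff = 6 * 12 + 5             # 6 feet 5 inches = 77 inches

# Table-style calculation: round z to two places and the tail area to four.
z_table = round((cutoff - mean) / standard_dev, 2)
p_table = round(1 - NormalDist().cdf(z_table), 4)
expected_table = population_size * p_table

# Same calculation without intermediate rounding.
p_exact = 1 - NormalDist(mean, standard_dev).cdf(cutoff)
expected_exact = population_size * p_exact

print(z_table, p_table, round(expected_table), round(expected_exact, 1))
```

The unrounded calculation comes out about one person lower, which shows how much the two-decimal z value matters this far into the tail.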