Chapter 2 – Econometrics for Health Policy, Health Economics, and Outcomes Research – Probability
©Copyright Joseph V. Terza, Ph.D. 2007 All Rights Reserved
1. Descriptive Statistics and Summation Notation
1.1 Introduction
1.2 Descriptive Statistics
1.3 Summation Notation
1.4 Some Useful Descriptive Statistics
2. Basic Probability Theory
2.1 Set Theory
2.2 Counting Techniques
2.3 Basic Definitions for Probability Theory
2.4 Assigning Probability Values to Events
2.5 Some Important Theorems on Probability Measures
3. Joint Probability
3.1 Joint and Conditional Probability
3.2 Stochastic Independence
3.3 Probabilities Assigned to Samples Drawn With and Without Replacement
3.4 Bayes' Theorem
Exercises
1. Descriptive Statistics and Summation Notation
1.1. Introduction
(2.1) DEF: Statistics is the branch of mathematics that consists of a set of analytical techniques that
can be applied to data to help in making judgements and decisions in problems involving uncertainty.
There are two branches of statistics: Descriptive Statistics and Inferential Statistics
(2.2) DEF: Descriptive statistics consists of procedures for (1) tabulating or graphing the general
characteristics of a set of data and (2) describing some characteristics of this set, such as measures
of central tendency or measures of dispersion.
(2.3) DEF: Inferential statistics consists of a set of procedures that helps in making inferences and
predictions about a whole population based on information from a sample of the population.
1.2 Descriptive Statistics
There are two reasons for learning about descriptive statistics. First, after gathering data one
should always examine it carefully. Your examination should range from directly examining the
data, to applying various techniques for summarizing the data. Such techniques include:
a) tables
b) graphs
c) summary statistics
i) measures of central tendency
ii) measures of dispersion.
Secondly, the discussion of descriptive statistics will serve as a foundation for our study of
probability theory.
The first step in summarizing the data is determining the percentages of data points in each
of the possible categories of the data.
(2.4) DEF: The relative frequency of the ith category is the proportion of observations falling in that
category. If the ith category contains $f_i$ observations, then the relative frequency of the ith category
is $f_i/n$, where $n$ denotes the total number of observations in the sample. If we are working with a
population of $N$ items, then the relative frequency of the ith class is $f_i/N$. A relative frequency
distribution is a table that shows the proportion of observations that falls into each category.
(2.5) DEF: A histogram is a graphical representation of a relative frequency distribution.
Example
wages
$4, 4.60, 4.75, 5, 5, 5.50, 5.90, 6, 6, 6.25, 6.25, 6.75, 6.80, 7, 7.25, 7.30, 7.50, 8, 8.50, 9
wage              relative frequency
4 to under 5      3/20 = .15
5 to under 6      4/20 = .20
6 to under 7      6/20 = .30
7 to under 8      4/20 = .20
8 to under 9      2/20 = .10
9 to under 10     1/20 = .05
(Figure 2-1 Here)
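The relative frequency table above can be reproduced with a few lines of Python. This is only a sketch: the helper name `relative_frequencies` and the choice to label each class by its integer lower bound are assumptions made for illustration, not part of the original notes.

```python
import math
from collections import Counter

# Wage data from the example above.
wages = [4, 4.60, 4.75, 5, 5, 5.50, 5.90, 6, 6, 6.25, 6.25, 6.75,
         6.80, 7, 7.25, 7.30, 7.50, 8, 8.50, 9]

def relative_frequencies(data):
    """Return {k: f_k / n}, where class k means 'k to under k + 1'."""
    counts = Counter(math.floor(x) for x in data)  # f_i for each class
    n = len(data)                                  # total observations
    return {k: counts[k] / n for k in sorted(counts)}

rel_freq = relative_frequencies(wages)
for lower, rf in rel_freq.items():
    print(f"{lower} to under {lower + 1}: {rf:.2f}")
```

The relative frequencies necessarily sum to 1, which is a quick check on any table built this way.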
(2.6) DEF: A relative frequency distribution is said to be symmetric if there exists a value such that
the portion of the distribution to the left of the value is the mirror image of the portion to its right.
(2.7) DEF: A relative frequency distribution is said to be skewed if it is not symmetric.
(Figure 2-2 Here)
1.3 Summation Notation
(2.8) DEF: The symbol
$\sum_{i=1}^{I} g(x_i, y_i, z_i, \ldots)$
is defined by the equation
$\sum_{i=1}^{I} g(x_i, y_i, z_i, \ldots) = g(x_1, y_1, z_1, \ldots) + g(x_2, y_2, z_2, \ldots) + \cdots + g(x_I, y_I, z_I, \ldots)$
where g(...) is any function.
Examples:
1) $g(x_i, y_i, z_i, \ldots) = x_i$
so
$\sum_{i=1}^{I} g(x_i, y_i, z_i, \ldots) = \sum_{i=1}^{I} x_i = x_1 + x_2 + \cdots + x_I$.
2) $g(x_i, y_i, z_i, \ldots) = x_i y_i$
so
$\sum_{i=1}^{I} g(x_i, y_i, z_i, \ldots) = \sum_{i=1}^{I} x_i y_i = x_1 y_1 + x_2 y_2 + \cdots + x_I y_I$.
Useful properties of summations:
Let c be a constant (i.e., it does not change with the counter i)
1) $\sum_{i=1}^{I} c = I c$
2) $\sum_{i=1}^{I} c\, g(x_i, y_i, z_i, \ldots) = c \sum_{i=1}^{I} g(x_i, y_i, z_i, \ldots)$
3) $\sum_{i=1}^{I} \{g(x_i, y_i, z_i, \ldots) + h(x_i, y_i, z_i, \ldots)\} = \sum_{i=1}^{I} g(x_i, y_i, z_i, \ldots) + \sum_{i=1}^{I} h(x_i, y_i, z_i, \ldots)$
(2.9) DEF: The symbol
$\sum_{i=1}^{I} \sum_{j=1}^{J} g(x_{ij}, y_{ij}, z_{ij}, \ldots)$
is defined by
$\sum_{i=1}^{I} \sum_{j=1}^{J} g(x_{ij}, y_{ij}, z_{ij}, \ldots) = (g(x_{11}, y_{11}, z_{11}, \ldots) + g(x_{12}, y_{12}, z_{12}, \ldots) + \cdots + g(x_{1J}, y_{1J}, z_{1J}, \ldots)) +$
$(g(x_{21}, y_{21}, z_{21}, \ldots) + g(x_{22}, y_{22}, z_{22}, \ldots) + \cdots + g(x_{2J}, y_{2J}, z_{2J}, \ldots)) +$
$\cdots + (g(x_{I1}, y_{I1}, z_{I1}, \ldots) + \cdots + g(x_{IJ}, y_{IJ}, z_{IJ}, \ldots))$
Examples:
1) Let I = 2 and J = 2
where
$x_{11} = 1, x_{12} = 2, x_{21} = 3, x_{22} = 4$
and
$y_{11} = 2, y_{12} = 3, y_{21} = 1, y_{22} = 1$.
Then
$\sum_{i=1}^{2} \sum_{j=1}^{2} x_{ij} = (1 + 2) + (3 + 4) = 3 + 7 = 10$
2) Using the same definitions as in Example 1
$\sum_{i=1}^{2} \sum_{j=1}^{2} (x_{ij} + y_{ij})^2 = ((1 + 2)^2 + (2 + 3)^2) + ((3 + 1)^2 + (4 + 1)^2)$
= 9 + 25 + 16 + 25 = 75
[Footnote 1] Note that in this chapter we use the word population to mean finite population. This means
that there is a finite number of individuals in the population. In the next chapter we will discuss the difference between finite and infinite in describing populations.
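The single and double summations above map directly onto Python's built-in `sum` with generator expressions. The sketch below recomputes the two double-sum examples and spot-checks the three "useful properties" on one row of the data; the variable names and the constant `c = 3` are illustrative assumptions.

```python
# x_ij and y_ij from the examples, stored as I x J nested lists.
x = [[1, 2], [3, 4]]
y = [[2, 3], [1, 1]]
I, J = 2, 2

# Example 1: double sum of x_ij.
sum_x = sum(x[i][j] for i in range(I) for j in range(J))

# Example 2: double sum of (x_ij + y_ij)^2.
sum_sq = sum((x[i][j] + y[i][j]) ** 2 for i in range(I) for j in range(J))

# Properties of single summations, checked on the first rows.
c = 3  # an arbitrary constant chosen for illustration
xs, ys = x[0], y[0]
prop1 = sum(c for _ in xs) == len(xs) * c
prop2 = sum(c * v for v in xs) == c * sum(xs)
prop3 = sum(a + b for a, b in zip(xs, ys)) == sum(xs) + sum(ys)

print(sum_x, sum_sq, prop1, prop2, prop3)
```

A numerical check like this confirms a property for particular values only; it does not replace the algebraic argument.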
1.4 Some Useful Descriptive Statistics
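By far the most important measure of central tendency is the mean.
(2.10) DEF: The population mean is defined and denoted (see footnote 1)
$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$
(2.11) DEF: The sample mean is defined and denoted
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$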
(2.12) DEF: Suppose the observations $x_1, x_2, \ldots, x_n$ have been arranged in ascending order. The
pth percentile is the value $x_p$ such that p percent of the observations are less than or equal to $x_p$ and
(100 - p) percent of the observations are greater than $x_p$.
(2.13) DEF: The lower quartile of a set of observations, also called the first quartile, is the 25th
percentile. The first quartile is denoted $Q_1$. The upper quartile, also called the third quartile, is
denoted by the symbol $Q_3$ and is the 75th percentile of the data. The median is the 50th percentile
(or second quartile).
(2.14) DEF: The range of a set of observations is the difference between the largest and smallest
values.
(2.15) DEF: For any population or sample of values, the interquartile range is $(Q_3 - Q_1)$.
(2.16) DEF: The deviation from the mean is the difference $x_i - \mu$ or $x_i - \bar{x}$, where $x_i$ is an
observation (population or sample).
Computing the average deviation may seem like a good idea for measuring variation in the data, but
it is not, because the average deviation is always zero:
$\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu) = 0$
because (as you will show for homework)
$\sum_{i=1}^{N} (x_i - \mu) = 0$
(2.17) DEF: The population variance is defined and denoted
$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$
(2.18) DEF: The sample variance is defined and denoted
$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
(2.19) DEF: The population standard deviation is defined and denoted
$\sigma = \sqrt{\sigma^2}$
(2.20) DEF: The sample standard deviation is defined and denoted
$S = \sqrt{S^2}$
The variance and standard deviation give you a way of measuring the average deviation without
running into the zero problem.
(2.21) THEOREM: (Chebyshev's Inequality) Let c be any number greater than 1. For any
population of data, the proportion of observations that lie fewer than c standard deviations from the
mean is at least $1 - 1/c^2$.
Example
Recall the wage data considered earlier.
$\mu \equiv$ mean = 6.3675
$\sigma \equiv$ standard deviation = 1.3069.
Therefore the proportion of the population between $\mu - 2\sigma$ (3.7537) and $\mu + 2\sigma$ (8.9814) is, according
to the theorem, greater than or equal to
1 - 1/4 = .75
Directly calculating the actual value from the relative frequencies yields
19/20 = .95.
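The numbers in this example can be checked with a short script. The sketch below treats the 20 wages as a population (dividing by N, which is what reproduces the reported standard deviation of 1.3069) and compares the Chebyshev bound for c = 2 with the actual share of observations within two standard deviations. Variable names are illustrative.

```python
import math

wages = [4, 4.60, 4.75, 5, 5, 5.50, 5.90, 6, 6, 6.25, 6.25, 6.75,
         6.80, 7, 7.25, 7.30, 7.50, 8, 8.50, 9]

N = len(wages)
mu = sum(wages) / N                                      # population mean
deviations = [x - mu for x in wages]                     # x_i - mu
sigma = math.sqrt(sum(d ** 2 for d in deviations) / N)   # population std. dev.

c = 2
chebyshev_bound = 1 - 1 / c ** 2                         # lower bound from (2.21)
share_within = sum(1 for d in deviations if abs(d) < c * sigma) / N

print(f"mean = {mu:.4f}, standard deviation = {sigma:.4f}")
print(f"sum of deviations (zero up to rounding): {sum(deviations):.2e}")
print(f"Chebyshev bound = {chebyshev_bound}, actual share = {share_within}")
```

The bound holds for any population, so it is often far from tight, as the gap between .75 and .95 shows here.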
2. Basic Probability Theory
2.1 Set Theory
(2.22) DEF: A set is any collection of objects. These objects are called elements.
Ways to denote sets:
$A = \{a_1, a_2, a_3, \ldots\}$
A = {a: a satisfies certain criteria}
(2.23) DEF: The null set, designated $\emptyset$, is the set with no elements.
(2.24) DEF: A set A is said to be a subset of another set B, denoted $A \subseteq B$, if every element of A is
also an element of B.
(2.25) DEF: The union of the sets A and B is another set containing all the elements belonging to
either A or B. We write $A \cup B$.
(2.26) DEF: The intersection of the sets A and B is another set containing all elements common to
both A and B. We write $A \cap B$.
(2.27) DEF: Two sets A and B are said to be disjoint if $A \cap B = \emptyset$.
(2.28) DEF: The universal set, denoted U, is the set of all elements of interest to a particular
problem. It is the set of which all sets are subsets.
(2.29) DEF: The complement of a set A is the set containing all the elements of U that are not in A.
We write
A complement = $A^c$
[Footnote 2] When I refer to a standard deck of playing cards I mean a deck of 52 cards, four suits each
comprising 13 cards (hearts, spades, diamonds, clubs), cards in each suit marked 2 through 10, J, Q, K, A (jack, queen, king, ace). If you are unfamiliar with playing cards, please consult me or a friend.
(2.30) DEF: A set is said to be countable if its elements can be put in one-to-one correspondence with
a subset of the natural numbers (the positive integers).
(2.31) DEF: A set is said to be finite if it is countable and has a finite number of elements.
(2.32) DEF: A set is said to be countably infinite if it is countable and has an infinite number of
elements.
(2.33) DEF: A set is said to be uncountably infinite if it is not countable and has an infinite number
of elements.
Examples (see footnote 2)
finite - the subset of all spades in a standard deck of playing cards.
countably infinite - the set of heavenly bodies north of the earth’s equator at a
given moment in time.
uncountably infinite - the set of points on the rightmost panel of the blackboard.
(2.34) DEF: Suppose A is a finite set consisting of elements $a_1, a_2, \ldots, a_n$ so that $A = \{a_1, a_2, \ldots, a_n\}$.
We denote the number of elements in A by n(A).
2.2 Counting Techniques
(2.35) THEOREM: Let A and B be finite sets. The number of possible ordered pairs (a,b) where
$a \in A$ and $b \in B$ is n(A)n(B), where "$\in$" means "is an element of."
(2.36) COROLLARY: Let $A_1, A_2, \ldots, A_J$ be finite sets. The number of possible ordered J-tuples
$(a_1, a_2, \ldots, a_J)$, where $a_j \in A_j$, is $n(A_1) n(A_2) \cdots n(A_J)$.
(2.37) DEF: Let N be a positive integer. The product of all integers from 1 to N is called N-factorial
and is denoted N!.
note 0! = 1
e.g., 3! = 1x2x3 = 6
(2.38) DEF: A permutation of N different things taken R at a time is an arrangement in a specific
order of any R of those N things. We write ${}_NP_R$.
e.g.
Three things A B C
ABC ACB BAC BCA CAB CBA
Note that there are 3! permutations of 3 items taken 3 at a time.
Counting up permutations the above way can waste a lot of time. Let's find an easier way. We can
derive a simpler formula in 3 steps.
1. There are N ways of choosing the first item. After choosing the first you have N-1 ways of choosing
the second. So the total number of ways to choose the first and the second is N(N-1).
2. Therefore by the same reasoning there are
N(N-1)(N-2)
ways of choosing the first 3 items.
3. Likewise there are
${}_NP_R = N(N-1)(N-2) \cdots (N-(R-1))$
ways of choosing the first R items.
e.g., 4 things taken 3 at a time
${}_4P_3 = 4 \times 3 \times 2 = 24$
Notice that this formula is still a bit cumbersome. But we can simplify it further. Extending the
expression in 3, we obtain
$[N(N-1)(N-2) \cdots (N-(R-1))] \cdot [(N-R)(N-(R+1)) \cdots 1] = N!$
or
${}_NP_R \cdot (N-R)! = N!$
or
${}_NP_R = \frac{N!}{(N-R)!}$
This implies the following definition
(2.39) DEF: The formula for the number of permutations of N objects taken R at a time is
${}_NP_R = \frac{N!}{(N-R)!}$
Example: A personnel manager interviews 10 candidates and must select and rank 3.
${}_{10}P_3 = \frac{10!}{7!} = 10 \times 9 \times 8 = 720.$
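As a quick check on the permutation formula, the sketch below computes N!/(N-R)! directly, enumerates the arrangements of A, B, C with the standard library's `itertools.permutations`, and compares against `math.perm` (available in Python 3.8 and later). The function name `n_permutations` is an assumption made for illustration.

```python
import math
from itertools import permutations

def n_permutations(N, R):
    """Number of permutations of N objects taken R at a time: N! / (N - R)!."""
    return math.factorial(N) // math.factorial(N - R)

# Enumerate all orderings of three things, as in the A B C example.
arrangements = ["".join(p) for p in permutations("ABC", 3)]
print(arrangements)

print(n_permutations(3, 3))    # three things taken three at a time
print(n_permutations(4, 3))    # 4 things taken 3 at a time
print(n_permutations(10, 3))   # personnel manager example
```

Enumeration is only practical for small N; the formula is what makes larger problems tractable.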
(2.40) DEF: A combination of N things taken R at a time is an arrangement of any R of those things
without regard to order. We use the symbol ${}_NC_R$ to denote the number of combinations of N things
taken R at a time.
e.g., Suppose a drug salesman must visit 5 out of 10 hospitals and the order of the calls is not
important. The number of possible visit combinations is
${}_{10}C_5$.
Note that each combination of R things affords R! permutations. In other words
${}_NP_R = R! \cdot {}_NC_R$
which implies that
${}_NC_R = {}_NP_R / R!$
(2.40) DEF: The formula for the number of combinations of N objects taken R at a time is
${}_NC_R = \frac{N!}{R!(N-R)!}$
sometimes the symbol
${}_NC_R = \binom{N}{R}$ is used.
Back to the example (5 hospitals out of 10)
${}_{10}C_5 = \frac{10!}{5!\,5!} = 252$
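The relationship between permutations and combinations can be verified numerically. This sketch computes the number of combinations from the factorial formula, confirms that each combination of R things accounts for R! permutations, and compares against `math.comb` (Python 3.8+). The function name `n_combinations` is illustrative.

```python
import math

def n_combinations(N, R):
    """Number of combinations of N objects taken R at a time: N! / (R! (N - R)!)."""
    return math.factorial(N) // (math.factorial(R) * math.factorial(N - R))

hospital_visits = n_combinations(10, 5)   # 5 hospitals out of 10
print(hospital_visits)

# Each combination of R things affords R! permutations.
print(math.perm(10, 5) == math.factorial(5) * hospital_visits)
```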
2.3 Basic Definitions for Probability Theory
(2.41) DEF: An experiment is any repeatable process from which an outcome, measurement, or
result is obtained. When the outcomes cannot be predicted with certainty, the experiment is called
a random experiment.
Outcomes of experiments can be:
qualitative attributes (people, objects)
quantitative attributes - real values
continuous - point along the real line
discrete - positive integers
(2.42) DEF: Each of the possible outcomes of an experiment is called a basic outcome and denoted
$\omega$ (the lowercase Greek letter omega). The set of all possible basic outcomes for a given experiment
is called the sample space of the experiment and is denoted $\Omega$.
Basic outcomes correspond to elements in set theory and the sample space corresponds to the
universe.
Examples
$\omega_1$ = Mike, $\omega_2$ = Kathy, $\omega_3$ = Mary Beth
$\omega_i$ denotes a point on the real line
$\omega_0 = 0, \omega_1 = 1, \omega_2 = 2, \ldots, \infty$
(2.43) DEF: An event is any collection of basic outcomes; i.e., an event is any subset of the sample
space.
We denote events using uppercase block letters. We say that an event occurs if any basic outcome
in that event occurs.
(2.44) DEF: The probability of an event is the limiting relative frequency with which the event
occurs when an experiment is repeated a large number of times under identical circumstances. Given
the event A (i.e., the subset of the sample space) we write P(A) = the probability of A. Technically
then
$P(A) = \lim_{T \to \infty} \frac{A_T}{T}$
where T denotes the number of trials of the experiment, and $A_T$ denotes the number of times the
event A occurs in T trials.
2.4 Assigning Probability Values to Events
How we assign probabilities to events defined on a random experiment will depend on the
nature of the sample space and the events defined thereon. Recall that the sample space corresponds
to the concept of the universal set in set theory, and events are subsets of the sample space. We saw
earlier that universes and their subsets can be finite, countably infinite, or uncountably infinite.
Therefore, sample spaces and events can be finite, countably infinite, or uncountably infinite. How
probability is assigned to a specified event will depend on whether that event (and its sample
space) is finite, countably infinite, or uncountably infinite. Each will warrant a different method for
assigning probability.
Examples
a) Finite Case - Suppose all basic outcomes are equiprobable. Then
$P(A) = \frac{n(A)}{n(\Omega)}$.
i) Suppose we have an urn of marbles, 7 of which are red and 3 of which are green. The
random experiment consists of drawing a marble blindly from the urn and observing its
color.
basic outcomes = the ten marbles
sample space = {red, red, red, red, red, red, red, green, green, green}
possible events are A = {red}, B = {green}, and the sample space itself
P(A) = .7
P(B) = .3
$P(\Omega) = 1$
ii) Consider a standard deck of playing cards. Suppose the random experiment consists of
dealing a 5-card hand from the deck after it has been thoroughly shuffled. The basic
outcomes are the possible hands. Is this set infinite? How many basic outcomes are there?
${}_{52}C_5$
Find the probability of getting exactly 3 aces. How many ways to get hands with exactly 3
aces?
3 of the 4 ace cards = ${}_4C_3$
2 of the remaining 48 non-ace cards = ${}_{48}C_2$
A = {all hands with exactly 3 aces}
$P(A) = \frac{{}_4C_3 \cdot {}_{48}C_2}{{}_{52}C_5}$
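To put a number on the three-aces probability, the counting argument above can be evaluated with `math.comb`. This is a sketch of the finite, equiprobable rule n(A)/n(Ω); the variable names are illustrative.

```python
import math
from fractions import Fraction

n_hands = math.comb(52, 5)                         # n(Omega): all 5-card hands
n_three_aces = math.comb(4, 3) * math.comb(48, 2)  # 3 of 4 aces, 2 of 48 non-aces

p_three_aces = Fraction(n_three_aces, n_hands)     # exact n(A) / n(Omega)
print(n_hands, n_three_aces)
print(f"P(exactly 3 aces) = {float(p_three_aces):.6f}")
```

Using `Fraction` keeps the ratio exact; converting to `float` is only for display.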
b) Countably Infinite Case -
Consider the manufacture of cotter pins. Suppose the random experiment consists of examining the
cotter pins produced in a 1 hr. period and counting the number of defectives. The basic outcomes
in this case are the possible numbers of defective pins observed in that 1 hr. period. The sample
space consists of the set of Natural Numbers (the positive integers and zero). So this sample space
is countable but (virtually) infinite. It has been shown that under certain conditions the following
function can be used to assign probabilities to the elementary events
$P(\{j\}) = \frac{\mu^j e^{-\mu}}{j!}$
where $j = 0, 1, 2, \ldots, \infty$, $\mu$ is the only parameter of the function and
$e = \lim_{m \to \infty} (1 + (1/m))^m = 2.7183\ldots$
In other words if this experiment were conducted an infinity of times the relative frequency of the
event {j} would be as is described by the function above. Now as an example suppose that $\mu = .5$
$P(\{0\}) = e^{-\mu} = e^{-.5} = .6065.$
$P(\{1\}) = \mu e^{-\mu} = .5e^{-.5} = (.5)(.6065) = .3033.$
$P(\{2\}) = (\mu^2 e^{-\mu})/2 = (.25e^{-.5})/2 = (.25)(.6065)/2 = .0758.$
.
.
.
Now suppose that $\mu = .001$
$P(\{0\}) = e^{-\mu} = e^{-.001} = .9990$
$P(\{1\}) = \mu e^{-\mu} = (.001)e^{-.001} = (.001)(.9990) = .0010.$
$P(\{2\}) = (\mu^2 e^{-\mu})/2 = ((.001)^2 e^{-.001})/2 = (.000001)(.9990)/2$
= .0000005.
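The probabilities in the cotter pin example can be recomputed with the function $\mu^j e^{-\mu}/j!$. The sketch below evaluates it for both parameter values and checks that the probabilities over j = 0, 1, 2, ... add up to (essentially) one. The function name `defect_prob` and the truncation of the infinite sum at 50 terms are assumptions for illustration.

```python
import math

def defect_prob(j, mu):
    """P({j}) = mu^j * exp(-mu) / j! for j = 0, 1, 2, ..."""
    return mu ** j * math.exp(-mu) / math.factorial(j)

for mu in (0.5, 0.001):
    probs = [defect_prob(j, mu) for j in range(3)]
    print(mu, [f"{p:.7f}" for p in probs])

# The probabilities over the whole (countably infinite) sample space sum to 1;
# 50 terms is more than enough for mu = 0.5.
total = sum(defect_prob(j, 0.5) for j in range(50))
print(total)
```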
c) Uncountably Infinite Case - Suppose the sample space $\Omega$ is a bounded subset of the Cartesian
plane. The basic outcomes are the points in $\Omega$. If each point in the sample space is equiprobable the
rule for assigning probabilities to events is
$P(A) = \frac{\text{area}(A)}{\text{area}(\Omega)}$
where A is a subset of $\Omega$.
The following definition generalizes the concept of the probability assignment rule to cover all types
of sample spaces and events.
(2.45) DEF: Let A be an event (i.e., A is a subset of $\Omega$) and define the set function P(A) such that
a) $0 \le P(A)$
b) $P(A_1 \cup A_2 \cup A_3 \cup \ldots) = P(A_1) + P(A_2) + P(A_3) + \ldots$
where $A_i$ and $A_j$ are disjoint (i not equal to j) and the set of sets (class) $\{A_1, A_2, A_3, \ldots\}$
is countable
c) $P(\Omega) = 1$.
Note that P(A) so defined satisfies the conditions for being a measure. A measure is a set function
that satisfies certain properties. The study of measures is beyond the scope of this course but you
should be aware that there is an entire field of mathematics, called measure theory, devoted to
measures and their properties. The fact that P(A) is a measure means that all of the results of
measure theory can be applied to the study of probability. Henceforth we refer to P(A) as a probability
measure. It is, in fact, property (c) that makes P(.) a probability measure.
2.5 Some Important Theorems on Probability Measures
There are a number of important theorems which follow from Definition (2.45).
(2.46) THEOREM: For each event A, $P(A) = 1 - P(A^c)$.
(2.47) THEOREM: The probability of the null event is zero; i.e., $P(\emptyset) = 0$.
(2.48) THEOREM: If $A_1$ and $A_2$ are events such that $A_1 \subseteq A_2$, then $P(A_1) \le P(A_2)$.
(2.49) THEOREM: For each event A, $0 \le P(A) \le 1$.
(2.50) THEOREM: For every pair of events $A_1$ and $A_2$
$P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2)$
(2.51) DEF: If events A and B have no basic outcomes in common they are said to be mutually
exclusive. In this case $A \cap B = \emptyset$.
(2.52) THEOREM: For mutually exclusive events A and B, $P(A \cap B) = 0$.
(2.53) THEOREM: For mutually exclusive events A and B,
$P(A \cup B) = P(A) + P(B)$.
Example
Let $A = \{\omega_1, \omega_2, \ldots, \omega_K\}$ be an event where $\omega_1, \omega_2, \ldots, \omega_K$ are basic outcomes then
$P(A) = P(\{\omega_1\}) + P(\{\omega_2\}) + \ldots + P(\{\omega_K\})$
or more simply
$P(A) = P(\{\omega_1\}) + P(\{\omega_2\}) + \ldots + P(\{\omega_K\}) = \sum_{i=1}^{K} P(\{\omega_i\})$
(2.54) DEF: If events A and B cover the sample space we say that A and B are collectively
exhaustive. In this case $A \cup B = \Omega$.
(2.55) THEOREM: For collectively exhaustive events A and B
$P(A \cup B) = 1$
(2.56) THEOREM: If A and B are mutually exclusive and collectively exhaustive we have
$P(A \cup B) = 1 = P(A) + P(B)$
Examples
i) $P(\Omega) = \sum_i P(\{\omega_i\}) = 1$
ii) A = {students concentrating in history}
B = {students concentrating in math}
$P(A \cup B)$ = %(hist or math) = .75
P(A) = %(hist) = .6
P(B) = %(math) = .5
What is $P(A \cap B)$?
We know by Theorem 2.50 that $P(A \cap B) = P(A) + P(B) - P(A \cup B)$
= .6 + .5 - .75
= .35
iii) A store has 10 loaves of bread in its inventory
6 fresh
4 stale
The random experiment consists of a customer blindly picking two loaves. The sample space is
$\Omega$ = {all possible pairs of loaves out of 10}.
Let the event
A = {all pairs of loaves in which there is at least one stale loaf}
$A^c$ = {all pairs of loaves in which there are no stale loaves}.
Therefore
$P(A) = 1 - P(A^c)$
$P(A^c) = \frac{n(A^c)}{n(\Omega)}$
$n(\Omega)$ = the number of combinations of 10 loaves taken 2 at a time
= ${}_{10}C_2$
$n(A^c)$ = the number of ways of drawing two loaves from among the
6 fresh ones
= ${}_6C_2$
$n(\Omega) = {}_{10}C_2 = \frac{10!}{2!\,8!} = 45$
$n(A^c) = {}_6C_2 = \frac{6!}{2!\,4!} = 15$
Therefore
$P(A) = 1 - P(A^c) = 1 - \frac{15}{45} = \frac{2}{3}$.
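The complement rule in the bread example can be checked by brute force: list every pair of loaves, count the pairs containing at least one stale loaf, and compare with one minus the probability of two fresh loaves. This is a sketch; the labels `f1..f6` and `s1..s4` are hypothetical names given to the loaves so that pairs can be enumerated.

```python
import math
from fractions import Fraction
from itertools import combinations

# Hypothetical labels for the 6 fresh and 4 stale loaves.
loaves = [f"f{i}" for i in range(1, 7)] + [f"s{i}" for i in range(1, 5)]

pairs = list(combinations(loaves, 2))                       # sample space
at_least_one_stale = [p for p in pairs if any(l.startswith("s") for l in p)]

p_direct = Fraction(len(at_least_one_stale), len(pairs))    # n(A) / n(Omega)
p_complement = 1 - Fraction(math.comb(6, 2), math.comb(10, 2))  # 1 - P(A^c)

print(len(pairs), len(at_least_one_stale), p_direct, p_complement)
```

Both routes give the same answer, but the complement needs only two binomial coefficients, which is why the text takes that route.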
3. Joint Probability
3.1 Joint and Conditional Probability
Often we can partition the sample space into a class of mutually exclusive events based on
any one of a variety of categorization schemes. For example suppose
Ù = {population of individual health care demanders}.
We can categorize individuals (basic outcomes) according to their health insurance coverage (1 if
covered; 0 if not covered), or according to whether or not they visited a physician at least once
during the previous year (1 if they did; 0 if they didn’t). We can then construct a table of the
following form which depicts the four possible joint probabilities assigned to these events:
                 HI = 1            HI = 0
VISIT = 1        $P(1 \cap 1)$     $P(1 \cap 0)$
VISIT = 0        $P(0 \cap 1)$     $P(0 \cap 0)$
Note that if you sum the entries in the first column you obtain
$P(1 \cap 1) + P(0 \cap 1) = P((1 \cap 1) \cup (0 \cap 1)) = P(HI = 1)$.
Likewise if you sum the entries in the second column you obtain P(HI = 0). And summing across
the first and second rows respectively yields P(VISIT = 1) and P(VISIT = 0). Placing these newly
computed probabilities in their appropriate positions on the margins, we rewrite the above table as
                 HI = 1            HI = 0
VISIT = 1        $P(1 \cap 1)$     $P(1 \cap 0)$     P(VISIT = 1)
VISIT = 0        $P(0 \cap 1)$     $P(0 \cap 0)$     P(VISIT = 0)
                 P(HI = 1)         P(HI = 0)
The four probabilities along the margins of the above table are called the marginal probabilities.
They describe the probabilities associated with a particular categorization of the sample space,
regardless of the other possible categorizations. It is easy to see that these concepts of joint and
marginal probabilities can be extended to cases in which there are more than two categorization
schemes and/or more than two events in each categorization.
Example
Suppose there were 100 individuals in the population and the joint frequency table is:
                 HI = 1               HI = 0
VISIT = 1        $n(1 \cap 1) = 35$   $n(1 \cap 0) = 25$
VISIT = 0        $n(0 \cap 1) = 15$   $n(0 \cap 0) = 25$
so the joint/marginal probability table is
                 HI = 1     HI = 0
VISIT = 1        .35        .25        .6
VISIT = 0        .15        .25        .4
                 .5         .5
(2.57) DEF: Given two events A and B the conditional probability of A given B is the probability
that event A occurs on a trial of the random experiment given certainty that B occurs. We write
the conditional probability of A given B = $P(A \mid B)$.
Example
Consider the health insurance example discussed above. Intuitively then, to evaluate
$P(VISIT = 1 \mid HI = 1)$ using our method for assigning probabilities to finite sets we divide the number who both
visit and are covered by the number who are covered. We thus obtain
$P(VISIT = 1 \mid HI = 1) = \frac{n(VISIT = 1 \cap HI = 1)}{n(HI = 1)}$
but dividing both numerator and denominator by $n(\Omega)$ yields
$P(VISIT = 1 \mid HI = 1) = \frac{n(VISIT = 1 \cap HI = 1)/n(\Omega)}{n(HI = 1)/n(\Omega)}$
which leads us to the familiar definition of conditional probability
$P(VISIT = 1 \mid HI = 1) = \frac{P(VISIT = 1 \cap HI = 1)}{P(HI = 1)}$.
(2.58) DEF: $P(A \mid B) = P(A \cap B)/P(B)$ and likewise $P(B \mid A) = P(A \cap B)/P(A)$.
The essence of the concept of conditional probability is redefining the sample space. In our example
the sample space is first redefined (reduced) from the entire population of 100 individuals to the
smaller set including only those 50 who have health insurance.
3.2 Stochastic Independence
(2.59) DEF: Two events A and B are said to be stochastically independent if
$P(A \mid B) = P(A)$
or equivalently
$P(B \mid A) = P(B)$.
In our example the events VISIT = 1 and HI = 1 are not independent. P( VISIT = 1) = .6 and
P(VISIT = 1 | HI = 1) = .7.
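The joint, marginal, and conditional probabilities for the health insurance example can be computed from the frequency table in a few lines. The sketch below stores the counts in a dictionary keyed by (VISIT, HI); the helper names are assumptions for illustration, and `Fraction` is used so the comparisons are exact.

```python
from fractions import Fraction

# Joint frequencies keyed by (VISIT, HI), from the example table.
counts = {(1, 1): 35, (1, 0): 25, (0, 1): 15, (0, 0): 25}
n_total = sum(counts.values())                       # n(Omega) = 100

joint = {k: Fraction(v, n_total) for k, v in counts.items()}

def marginal_visit(v):
    """P(VISIT = v): sum the joint probabilities across the HI categories."""
    return sum(p for (visit, _), p in joint.items() if visit == v)

def marginal_hi(h):
    """P(HI = h): sum the joint probabilities across the VISIT categories."""
    return sum(p for (_, hi), p in joint.items() if hi == h)

def visit_given_hi(v, h):
    """P(VISIT = v | HI = h) = P(VISIT = v and HI = h) / P(HI = h)."""
    return joint[(v, h)] / marginal_hi(h)

p_cond = visit_given_hi(1, 1)
independent = p_cond == marginal_visit(1)            # definition (2.59)
print(float(marginal_visit(1)), float(marginal_hi(1)), float(p_cond), independent)
```

Because the conditional probability differs from the marginal, knowing that someone is insured changes the probability assigned to a physician visit.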
There are a couple of alternative ways to describe independence.
(2.60) DEF: Independence means that no new information (in terms of probability assignments) is
supplied by knowledge of the conditioning event.
or
(2.61) DEF: Independence means that the probability of the event in question relative to the new
(redefined) sample space is the same as the probability of that event relative to the original sample
space.
(2.62) THEOREM: If two events A and B are stochastically independent then
$P(A \cap B) = P(A)P(B)$.
This follows from the fact that by the definition of conditional probability
$P(A \cap B) = P(A \mid B)P(B)$
but when A and B are independent
$P(A \mid B) = P(A)$.
NOTE: A and B being disjoint is not equivalent to their being independent. In fact, if A and B are
disjoint and each has positive probability, they cannot be independent.
3.3 Probabilities Assigned to Samples Drawn With and Without Replacement
Let's return to an earlier example. Recall, a store has 10 loaves of bread in its inventory
6 fresh
4 stale
The random experiment consists of a customer blindly picking two loaves. What is the probability
of getting two fresh loaves? We can view this problem from a perspective that is slightly different
from the one taken earlier. First, define the sample space as S = {f,f,f,f,f,f,s,s,s,s}. Secondly, define
the random experiment as drawing one loaf from S. So here we repeat the experiment twice.
A = {fresh loaf on first draw}
B = {fresh loaf on second draw}
We want the probability of A and B or $A \cap B$. By the definition of conditional probability we know
that $P(A \cap B) = P(B \mid A)P(A)$. In our example then
$P(A \cap B) = P(B \mid A)P(A) = \frac{5}{9} \cdot \frac{6}{10} = \frac{1}{3}$.
Recall that we computed this same probability earlier by viewing this as a random experiment
consisting of a customer blindly picking two loaves. The sample space is
$\Omega$ = {all possible pairs of loaves out of 10}.
Let the event
A = {all pairs of loaves in which there is at least one stale loaf}
$A^c$ = {all pairs of loaves in which there are no stale loaves}.
Therefore
$P(A) = 1 - P(A^c)$
$P(A^c) = \frac{n(A^c)}{n(\Omega)}$
$n(\Omega)$ = the number of combinations of 10 loaves taken 2 at a time
= ${}_{10}C_2$
$n(A^c)$ = the number of ways of drawing two loaves from among the
6 fresh ones
= ${}_6C_2$
$n(\Omega) = {}_{10}C_2 = \frac{10!}{2!\,8!} = 45$
$n(A^c) = {}_6C_2 = \frac{6!}{2!\,4!} = 15$
In other words, the probability of two fresh loaves is
${}_6C_2 / {}_{10}C_2 = 1/3$.
Now suppose that on the second draw the experiment is conducted under different circumstances,
i.e., the loaf that was drawn first is replaced before the second drawing is made. Now the draws are
indistinguishable, so the events can be rewritten
A = {fresh loaf}
B = {fresh loaf}
and $P(A) = P(B) = P(B \mid A) = .6$. Therefore the events are independent and
$P(A \cap B) = P(A)P(B) = P(A)^2 = .36$
The first case is an example of sampling without replacement, i.e. for two events A and B defined
respectively on successive trials of a random experiment such that the item drawn in the first trial
is not replaced before the second draw is conducted
$P(A \cap B) = P(B \mid A)P(A)$
The second case is an example of sampling with replacement, i.e., for two events A and B defined
respectively on successive trials of a random experiment such that the item drawn in the first trial
is replaced before the second draw is conducted
$P(A \cap B) = P(A)P(B)$.
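The two sampling schemes can be compared side by side. The sketch below computes the probability of two fresh loaves without replacement (both by the conditional probability rule and by the counting rule) and with replacement; the function names and parameters are illustrative assumptions.

```python
import math
from fractions import Fraction

def two_fresh_without_replacement(fresh, total):
    """P(A and B) = P(B | A) P(A) when the first loaf is not put back."""
    p_a = Fraction(fresh, total)
    p_b_given_a = Fraction(fresh - 1, total - 1)
    return p_b_given_a * p_a

def two_fresh_with_replacement(fresh, total):
    """P(A and B) = P(A) P(B) when the first loaf is put back (independence)."""
    p_a = Fraction(fresh, total)
    return p_a * p_a

p_without = two_fresh_without_replacement(6, 10)
p_counting = Fraction(math.comb(6, 2), math.comb(10, 2))   # 6C2 / 10C2
p_with = two_fresh_with_replacement(6, 10)

print(p_without, p_counting, p_with, float(p_with))
```

Replacement raises the probability slightly here, because the second draw still faces six fresh loaves out of ten rather than five out of nine.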
3.4 Bayes' Theorem
Consider a random experiment with n mutually exclusive collectively exhaustive outcomes
$E_1, E_2, \ldots, E_n$ and suppose that the respective probabilities assigned to these outcomes are
(*) $P(E_1), P(E_2), \ldots, P(E_n)$.
Suppose also that we are interested in updating these probabilities based on some new information.
Let's denote the new information as an event A defined on the same sample space. We are interested
then in
(**) $P(E_1 \mid A), P(E_2 \mid A), \ldots, P(E_n \mid A)$.
The probabilities in (*) are called the prior probabilities, and (**) are called the posterior
probabilities. The problem is that the only information you have at your disposal is the set of
following conditional probabilities
$P(A \mid E_1), P(A \mid E_2), \ldots, P(A \mid E_n)$.
For example consider assessing the probability that a particular jet engine is made at one of a
company's two plants. The possible events are
$E_1$ = engine made at plant 1
$E_2$ = engine made at plant 2
The known prior probabilities are
$P(E_1) = .4$
$P(E_2) = .6$
You would like to update these probabilities given the knowledge that the engine is defective. Let this
be denoted as the event A. So you want
$P(E_1 \mid A)$
but the only information you have is
$P(A \mid E_1) = .02$ = % of defectives made at plant 1
$P(A \mid E_2) = .03$ = % of defectives made at plant 2
From the definition of conditional probability we know that
$P(E_1 \mid A) = \frac{P(A \cap E_1)}{P(A)}$
and $P(A \cap E_1) = P(A \mid E_1)P(E_1)$. But we still don't know P(A), or do we?
$A = (A \cap E_1) \cup (A \cap E_2)$ but because $E_1$ and $E_2$ are mutually exclusive $(A \cap E_1)$ and $(A \cap E_2)$ are
also mutually exclusive so
$P(A) = P(A \cap E_1) + P(A \cap E_2) = P(A \mid E_1)P(E_1) + P(A \mid E_2)P(E_2)$
and
$P(E_1 \mid A) = \frac{P(A \mid E_1)P(E_1)}{P(A \mid E_1)P(E_1) + P(A \mid E_2)P(E_2)}$
In general then the following is true.
(2.64) THEOREM: (Bayes' Theorem) Let $E_1, E_2, \ldots, E_n$ be mutually exclusive and collectively
exhaustive events with prior probabilities $P(E_1), P(E_2), \ldots, P(E_n)$. Let A be another event defined
on the same sample space. Then the posterior probability of $E_i$ is given by
$P(E_i \mid A) = \frac{P(A \mid E_i)P(E_i)}{\sum_{j=1}^{n} P(A \mid E_j)P(E_j)}$.
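Bayes' Theorem as stated above is easy to apply mechanically. The sketch below implements the posterior formula for a list of priors $P(E_i)$ and likelihoods $P(A \mid E_i)$ and applies it to the jet engine example; the function name `posterior` is an assumption for illustration, and the inputs are assumed to describe mutually exclusive, collectively exhaustive events.

```python
def posterior(priors, likelihoods):
    """Return [P(E_i | A)] given priors P(E_i) and likelihoods P(A | E_i).

    Assumes the E_i are mutually exclusive and collectively exhaustive,
    so that P(A) is the sum of P(A | E_j) P(E_j) over all j.
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(A and E_i)
    p_a = sum(joint)                                      # P(A)
    return [j / p_a for j in joint]

priors = [0.4, 0.6]          # P(E_1), P(E_2): share of engines by plant
likelihoods = [0.02, 0.03]   # P(A | E_1), P(A | E_2): defect rates by plant

post = posterior(priors, likelihoods)
print([round(p, 4) for p in post])
```

Learning that the engine is defective shifts probability toward plant 2, the plant with the higher defect rate, so the posterior for plant 1 falls below its prior of .4.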
Exercises
Chapter 2, Exercise 1
1. Suppose that we have the following four observations on x and y.
x   y
1   2
2   2
3   4
4   4
For each of the following, without using the x-y numbers, attempt to establish the validity of the equality using the three “Useful Properties of Summations” given on p. 5 of Chapter 2 of the text. If you suspect that a particular equality does not hold, plug in the above x-y values to confirm your suspicion.
(a) $\sum 2x_i = 2\sum x_i$
(b) $\sum 2x_i = 2(4)\sum x_i$
(c) $\sum (x_i + y_i) = \sum x_i + \sum y_i$
(d) $\sum 2x_i y_i = 2\sum x_i y_i$
(e) $\sum 2x_i y_i = 2(\sum x_i)(\sum y_i)$
(f) $\sum 2x_i^2 = 2\sum x_i^2$
(g) $\sum 2x_i^2 = 2(\sum x_i)^2$
(h) $\sum (2 + 3x_i + 4y_i) = 2 + 3\sum x_i + 4\sum y_i$
(i) $\sum (2 + 3x_i + 4y_i) = 2(4) + 3\sum x_i + 4\sum y_i$
(j) $\sum (y_i + 3) = \sum y_i + 3$
Chapter 2, Exercise 2
2. Consider a state in which there are ten colleges and universities. Tuition and fees (in dollars) at the institutions are as follows:
8315, 12124, 4239, 14138, 16050
9327, 7136, 9051, 13124, 9250
Calculate the mean and variance of this population of tuition levels.
Chapter 2, Exercise 3
3. Consider a college where tuition and fees are $17,000. The mean student receives a $2000 grant against tuition from the college. If total revenues (tuition net of grants) for the college are $22,500,000 for the year, how many students are enrolled at the college?
Chapter 2, Exercise 4
4. Show that:
Chapter 2, Exercise 5
5. There are 20 flower vendors in a city, with the following profits (in dollars) on a given day:
52.24, 68.01, 73.13, 38.26, 65.06, 58.26, 48.57, 53.42, 71.29, 64.13, 30.05, 48.27, 51.43, 52.67, 49.28, 48.16, 59.04, 63.23, 67.15, 70.21
Using cells with midpoints of 35, 45, 55, 65, and 75, graph the relative-frequency histogram for this variable.
Chapter 2, Exercise 6
6. The following data represent a sample of weekly wages (in dollars) earned by part-time employees at a department store:
80 98 75 69 81 88 78 96 70 88 85 88 75 58 97 67 61 52 76 81 83 70 98 83 85 90 64 95 63 82 108 109 100 96 92 100 73 94 105 78
a) Put the data in order.
b) Construct a frequency distribution using the classes $50 to under $60, $60 to under $70, and so forth.
c) Graph the histogram.
d) Construct and graph the relative frequency distribution.
e) Explain the difference between the height and area of a bar in the histogram.
f) Construct the cumulative frequency distribution.
g) Calculate the sample mean, sample variance, and sample standard deviation.
h) What is the median?
i) Find the first and third quartiles.
j) Compute the range and interquartile range for the data given in this problem.
Chapter 2, Exercise 7
7.
A. Let the values of $x_i$ be 5, 4, 2, 12, 12, 10, 6, 7, 9, and 4, respectively. Also let $y_i = 5x_i$. Calculate the following sums:
a. $\sum y_i$
b. $\sum 2y_i$
c. $\sum y_i^2$
B. Write the following values in summation notation:
a. $x_1 + x_2 + \ldots + x_{15}$
b. $5x_1 + 5x_2 + \ldots + 5x_{22}$
c. $x_1 + y_1 + x_2 + y_2 + \ldots + x_9 + y_9$
C. Are the following expressions true or false?
a. $(\sum x_i)(\sum y_i) = \sum x_i y_i$
b. $\sum x_i^2 = (\sum x_i)^2$
c. $\sum (x_i + y_i)^2 = \sum x_i^2 + \sum y_i^2$
Chapter 2, Exercise 8
8. Use a Venn diagram to illustrate $A \cap B^c$.
Chapter 2, Exercise 9
9. Let $A, B \subseteq U$ (the universe). Using Venn diagrams show that the following statements are true (DeMorgan’s Laws):
a) $(A \cup B)^c = A^c \cap B^c$
b) $(A \cap B)^c = A^c \cup B^c$
Chapter 2, Exercise 10
10. The menu on an airplane offers a choice of four drinks, three salads, five entrees, fourvegetables, two kinds of potatoes, and five desserts. If a meal consists of one of each item, howmany different meals are possible?
Chapter 2, Exercise 11
11. From a group of 5 men and 7 women, how many different committees consisting of 2 men and3 women can be formed?
Chapter 2, Exercise 12
12. A publishing company publishes five different how-to books for the handy person, and it has aspecial offer of three books for $10. A woman has decided to buy three books and give them to herhusband one at a time to entice him to make three desired home repairs. In how many ways canthree books be selected and ordered from the list of five books?
Chapter 2, Exercise 13
13. Compute the probability of being dealt at random and without replacement a 5-card hand in which there are exactly three kings and two queens (from a standard deck of playing cards).
Chapter 2, Exercise 14
14. Compute the probability of being dealt at random and without replacement a 13-card bridge hand from a standard deck of playing cards consisting of:
(a) 6 spades, 4 hearts, 2 diamonds, and 1 club.
(b) 13 cards of the same suit.
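Card-hand probabilities of this kind are ratios of counts: the number of hands with the required composition divided by the number of all possible hands. The following is a minimal Python sketch of that ratio. The function name `hand_probability` and the example hand are illustrative assumptions and are not taken from the exercises.

```python
from fractions import Fraction
from math import comb

def hand_probability(drawn, available):
    """Probability of a hand with drawn[i] cards from a group of available[i] cards.

    The groups must partition the deck, and every hand of the same size is
    assumed equally likely (dealing at random and without replacement).
    """
    favorable = 1
    for k, n in zip(drawn, available):
        favorable *= comb(n, k)
    total = comb(sum(available), sum(drawn))
    return Fraction(favorable, total)

# Illustration (not one of the exercises): exactly two aces in a 5-card hand.
# The deck is split into 4 aces and 48 non-aces.
p_two_aces = hand_probability(drawn=[2, 3], available=[4, 48])
print(p_two_aces, float(p_two_aces))
```

Using `Fraction` keeps the result exact, which makes it easier to compare with a hand-derived answer.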
Chapter 2, Exercise 15
15. Consider a circular dart board whose radius is two feet. If the bull's-eye is six inches in diameter, what is the probability of getting a bull's-eye with a randomly tossed dart? By randomly tossed I mean that the probability of the dart landing in any area of the dart board of a given size is the same, regardless of the shape of the area and where it is on the dartboard.
Chapter 2, Exercise 16
16. An oil company that has purchased a square tract of land in Alaska 20 miles on a side is going to pick a site on the tract at random and drill a well. Assume that oil exists in two rectangular pools, each having dimensions 2 miles by 3 miles. If one well is drilled, what is the probability of striking oil?
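In Exercises 15 and 16 the probability of an event is the ratio of its area to the total area. The following is a minimal Python sketch of that calculation. The function name `area_probability` is illustrative, and the oil calculation assumes that the two pools do not overlap and lie entirely inside the tract, which the exercise does not state explicitly.

```python
import math

def area_probability(target_area, total_area):
    """Probability that a uniformly random point lands in the target region."""
    if not 0 <= target_area <= total_area:
        raise ValueError("target region must lie inside the total region")
    return target_area / total_area

# Exercise 15: convert everything to one unit (inches) before taking the ratio.
board_radius_in = 2 * 12          # radius of two feet
bull_radius_in = 6 / 2            # diameter of six inches
p_bull = area_probability(math.pi * bull_radius_in ** 2,
                          math.pi * board_radius_in ** 2)

# Exercise 16: assumes non-overlapping pools that lie inside the tract.
p_oil = area_probability(2 * (2 * 3), 20 * 20)

print(p_bull, p_oil)
```

The most common slip in the dartboard problem is mixing feet with inches, or a radius with a diameter, so the sketch converts both measurements to radii in inches first.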
Chapter 2, Exercise 17
17. Using Definition 2.45 prove Theorem (2.46).
Chapter 2, Exercise 18
18. Using Definition 2.45, and all theorems and definitions prior to (2.47), prove Theorem (2.47).
Chapter 2, Exercise 19
19. Using Definition 2.45, and all theorems and definitions prior to (2.49), prove Theorem (2.49).
Chapter 2, Exercise 20
20. Using Definition 2.45, and all theorems and definitions prior to (2.52), prove Theorem (2.52).
Chapter 2, Exercise 21
21. Using Definition 2.45, and all theorems and definitions prior to (2.53), prove Theorem (2.53).
Chapter 2, Exercise 22
22. Using Definition 2.45, and all theorems and definitions prior to (2.55), prove Theorem (2.55).
Chapter 2, Exercise 23
23. Using Definition 2.45, and all theorems and definitions prior to (2.56), prove Theorem (2.56).
Chapter 2, Exercise 24
24. Consider three events A, B, and C that are all part of the same sample space. Develop a general formula for determining $P(A \cup B \cup C)$. [HINT: Use Theorem 2.50 more than once, and note that:
$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$
and
$(A \cap B) \cap (A \cap C) = A \cap B \cap C$ ].
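The two set identities in the hint can be checked by brute force on a small universe before they are used. The following is a minimal Python sketch; the four-element universe and the helper `subsets` are illustrative assumptions. An exhaustive check on a small universe is a sanity check, not a formal proof.

```python
from itertools import product

def subsets(s):
    """Return all subsets of s as a list of sets."""
    items = sorted(s)
    out = []
    for mask in product([False, True], repeat=len(items)):
        out.append({x for x, keep in zip(items, mask) if keep})
    return out

# All 16 subsets of a hypothetical four-element universe.
all_subsets = subsets({0, 1, 2, 3})

# Check both identities from the hint for every triple (A, B, C).
# In Python, & is set intersection and | is set union.
identities_hold = all(
    (A & (B | C)) == ((A & B) | (A & C))
    and ((A & B) & (A & C)) == (A & B & C)
    for A in all_subsets
    for B in all_subsets
    for C in all_subsets
)
print(identities_hold)
```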
To formally show that a set equality holds you must show that any element in the left-hand side set must also be in the right-hand side set, and vice versa.
Chapter 2, Exercise 25
25. Formally show that the set equalities in Exercise 2 hold.
To formally show that $A \subset B$ you must show that any element of A must also be in B.
Chapter 2, Exercise 26
26. Formally show that the following statements are equivalent, that is, show that a) implies b) implies c):
a. $A \subset B$
b. $A \cup B = B$
c. $A \cap B = A$
Chapter 2, Exercise 27
27. A hand of 13 cards is to be dealt at random and without replacement from a standard deck of playing cards. Find the conditional probability that there are at least three kings in the hand relative to the hypothesis that the hand contains at least two kings.
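A conditional probability of this kind is the probability of the intersection divided by the probability of the conditioning event. The following is a minimal Python sketch for checking a hand calculation. The function name `p_kings` is illustrative, and the code assumes that all 13-card hands are equally likely.

```python
from fractions import Fraction
from math import comb

def p_kings(k, hand=13, kings=4, deck=52):
    """P(exactly k kings) in a hand dealt at random and without replacement."""
    return Fraction(comb(kings, k) * comb(deck - kings, hand - k),
                    comb(deck, hand))

p_at_least_2 = sum(p_kings(k) for k in range(2, 5))
p_at_least_3 = sum(p_kings(k) for k in range(3, 5))

# "At least three kings" is a subset of "at least two kings",
# so the intersection of the two events is "at least three kings".
p_conditional = p_at_least_3 / p_at_least_2
print(p_conditional, float(p_conditional))
```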
Chapter 2, Exercise 28
28. Suppose there are four roads (A, B, C, and D) from Town X to Town Y, and three roads (E, F, and G) from Town Y to Town Z. Suppose a criminal travels from Town X to Town Y to Town Z, and a detective is asked which route the criminal took from Town X to Town Z. The detective does not know which route was taken and considers each possible route equally likely. Eventually the detective is forced to guess and hypothesizes that the criminal went via road A between Town X and Town Y and via road F from Town Y to Town Z. What is the probability that the detective is correct?
Chapter 2, Exercise 29
29. In the daily lottery, a number from 000 to 999 is picked randomly. Yesterday the number 463 won the lottery.
a. What is the probability that 463 wins today?
b. What is the probability that 463 does not win today?
c. Pick any integer from 000 to 999. What is the probability that your number will win the lottery today?
Chapter 2, Exercise 30
30. Of the employees for a supermarket chain, 60% are women, 50% work part-time, and 35% of the women work part-time.
a. What proportion of employees are females who work part-time?
b. What proportion of employees are males who work full-time?
Chapter 2, Exercise 31
31. A national credit card company is interested in getting more young female customers. In a recent survey, the company gathered data on the marital and employment status of 25-year-old women. Suppose that 60% of all 25-year-old women are married, 50% have full-time jobs, and 20% are both married and have full-time jobs. Given that a 25-year-old woman has a full-time job, what is the probability that she is married?
Chapter 2, Exercise 32
32. Mr. Wilson, the owner of a used car agency, meets about 45% of the agency's potential customers and makes a sale to about 60% of those he meets. The other salespeople meet the other 55% of the potential customers and make a sale to about 50% of these individuals.
a. What proportion of potential customers eventually buy a car?
b. What proportion of sales can be attributed to Mr. Wilson?
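Part a combines the two groups of customers through the law of total probability, and part b reverses the conditioning with Bayes' theorem. The following is a minimal Python sketch for checking a hand calculation. The dictionary names are illustrative, and the code assumes that every potential customer is met either by Mr. Wilson or by the other salespeople, never by both.

```python
from fractions import Fraction

# Share of potential customers met by each seller, and each seller's sale rate.
p_meet = {"Wilson": Fraction(45, 100), "others": Fraction(55, 100)}
p_sale_given_meet = {"Wilson": Fraction(60, 100), "others": Fraction(50, 100)}

# Law of total probability:
# P(sale) = sum over sellers of P(met by seller) * P(sale | met by seller).
p_sale = sum(p_meet[s] * p_sale_given_meet[s] for s in p_meet)

# Bayes' theorem:
# P(Wilson | sale) = P(Wilson) * P(sale | Wilson) / P(sale).
p_wilson_given_sale = p_meet["Wilson"] * p_sale_given_meet["Wilson"] / p_sale

print(float(p_sale), float(p_wilson_given_sale))
```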
Chapter 2, Exercise 33
33. At the McVay Advertising Agency, 80% of newly hired employees quit within one year. A man and a woman are hired on the same day. Assume that their decisions to stay or leave are independent of one another. What is the probability of the following:
a. Both will work there more than a year.
b. At least one will work there more than a year.
c. Neither works there for a year.
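Under independence the probability of each joint outcome is the product of the individual probabilities. The following is a minimal Python sketch that enumerates the four outcomes for the two employees. The labels "quit" and "stay" are illustrative, and the code treats "works there more than a year" as the complement of "quits within one year".

```python
from fractions import Fraction
from itertools import product

# Probability that a newly hired employee quits within a year, or stays.
p = {"quit": Fraction(80, 100), "stay": Fraction(20, 100)}

# Independence: P(man's outcome and woman's outcome) is the product.
joint = {(m, w): p[m] * p[w] for m, w in product(p, repeat=2)}

p_both_stay = joint[("stay", "stay")]
p_neither_stays = joint[("quit", "quit")]

# "At least one stays" is the complement of "neither stays".
p_at_least_one_stays = 1 - p_neither_stays

print(p_both_stay, p_at_least_one_stays, p_neither_stays)
```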
Chapter 2, Exercise 34
34. An oil wildcatter has assigned a probability of .5 to striking oil on a certain plot of property. The wildcatter orders a seismic survey that has proven to be 90% reliable in the past. That is, when oil is present, it predicts favorably 90% of the time, and when no oil is present, it predicts no oil 90% of the time.
a. Given a favorable seismic result, what is the probability for oil?
b. Given an unfavorable seismic result, what is the probability for oil?
Chapter 2, Exercise 35
35. A company introduced a new product and classified the responses of customers as favorable or unfavorable in four different cities. The proportions of responses were as follows:
                             City
Response       New York   Boston   Chicago   Detroit
====================================================
Favorable        .15        .09      .17       .13
Unfavorable      .09        .13      .08       .16
a. What is the probability that a randomly chosen respondent is both favorable to the productand from New York?
b. Find the probability that a respondent from Boston reacts favorably to the product.
c. Is reaction to the product independent of city?
d. Given that a respondent is from Chicago, find the probability that the response is favorable.
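All four parts can be read off the joint table once its marginal totals are known. The following is a minimal Python sketch for checking a hand calculation. It reads the table as four city columns (New York, Boston, Chicago, Detroit), and the names `joint`, `p_city`, and `p_response_given_city` are illustrative.

```python
from fractions import Fraction

cities = ["New York", "Boston", "Chicago", "Detroit"]

def row(percentages):
    """Turn a row of percentages into exact proportions keyed by city."""
    return {c: Fraction(v, 100) for c, v in zip(cities, percentages)}

# Joint proportions P(response and city), as read from the table.
joint = {
    "Favorable": row([15, 9, 17, 13]),
    "Unfavorable": row([9, 13, 8, 16]),
}

# Marginal proportions: sum the joint table across rows or columns.
p_response = {r: sum(cells.values()) for r, cells in joint.items()}
p_city = {c: sum(joint[r][c] for r in joint) for c in cities}

def p_response_given_city(response, city):
    """Conditional proportion P(response | city)."""
    return joint[response][city] / p_city[city]

# Independence requires every joint cell to equal the product of its marginals.
independent = all(
    joint[r][c] == p_response[r] * p_city[c] for r in joint for c in cities
)

print(p_response_given_city("Favorable", "Chicago"), independent)
```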
Chapter 2, Exercise 36
36. For a randomly selected person, define the following events:
A = {The event that the person watches NBC's TONIGHT SHOW}
B = {The event that the person watches Seinfeld}
If $P(A) = .3$; $P(A \cap B) = .2$; $P(A \cup B) = .7$, then find:
a) $P(B)$
b) $P(A^c)$
c) $P(B^c)$
d) $P(A \cap B^c)$
e) $P(A^c \cap B)$
f) $P(A^c \cap B^c)$
g) $P(A | B)$
h) $P(A | B^c)$
i) $P(B | A)$
j) $P(A^c | B^c)$, where $A^c$ denotes the complement of A.
k) Are the events A and B independent?
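Every part of this exercise follows from the four cells of the partition generated by A and B. The following is a minimal Python sketch that builds those cells from the three given probabilities, for checking a hand calculation. The variable names and the cell labels are illustrative.

```python
from fractions import Fraction

p_a = Fraction(3, 10)         # P(A)
p_a_and_b = Fraction(2, 10)   # P(A and B)
p_a_or_b = Fraction(7, 10)    # P(A or B)

# Addition rule rearranged: P(B) = P(A or B) - P(A) + P(A and B).
p_b = p_a_or_b - p_a + p_a_and_b

# The four disjoint cells generated by A and B; they must sum to one.
cells = {
    ("A", "B"): p_a_and_b,
    ("A", "not B"): p_a - p_a_and_b,
    ("not A", "B"): p_b - p_a_and_b,
    ("not A", "not B"): 1 - p_a_or_b,
}

# Conditional probability and the product test for independence.
p_a_given_b = p_a_and_b / p_b
independent = p_a_and_b == p_a * p_b

print(p_b, p_a_given_b, independent)
```

The remaining conditional probabilities follow the same pattern: divide the relevant cell by the probability of the conditioning event.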
Chapter 2, Exercise 37
37. Draw a Venn diagram depicting two independent events.
Chapter 2, Exercise 38
38. Prove that if health insurance coverage and visiting the physician are independent events, then the treatment effect of health insurance is zero.