
Chapter 2 – Econometrics for Health Policy, Health Economics, and Outcomes Research – Probability

©Copyright Joseph V. Terza, Ph.D. 2007 All Rights Reserved

1. Descriptive Statistics and Summation Notation

1.1 Introduction

1.2 Descriptive Statistics

1.3 Summation Notation

1.4 Some Useful Descriptive Statistics

2. Basic Probability Theory

2.1 Set Theory

2.2 Counting Techniques

2.3 Basic Definitions for Probability Theory

2.4 Assigning Probability Values to Events

2.5 Some Important Theorems on Probability Measures

3. Joint Probability

3.1 Joint and Conditional Probability

3.2 Stochastic Independence

3.3 Probabilities Assigned to Samples Drawn With and Without Replacement

3.4 Bayes' Theorem

Exercises


1. Descriptive Statistics and Summation Notation

1.1. Introduction

(2.1) DEF: Statistics is the branch of mathematics that consists of a set of analytical techniques that

can be applied to data to help in making judgements and decisions in problems involving uncertainty.

There are two branches of statistics: descriptive statistics and inferential statistics.

(2.2) DEF: Descriptive statistics consists of procedures for (1) tabulating or graphing the general

characteristics of a set of data and (2) describing some characteristics of this set, such as measures

of central tendency or measures of dispersion.

(2.3) DEF: Inferential statistics consists of a set of procedures that helps in making inferences and

predictions about a whole population based on information from a sample of the population.

1.2 Descriptive Statistics

There are two reasons for learning about descriptive statistics. First, after gathering data one

should always examine it carefully. Your examination should range from directly examining the

data, to applying various techniques for summarizing the data. Such techniques include:

a) tables

b) graphs

c) summary statistics

i) measures of central tendency


ii) measures of dispersion.

Secondly, the discussion of descriptive statistics will serve as a foundation for our study of

probability theory.

The first step in summarizing the data is determining the percentages of data points in each

of the possible categories of the data.

(2.4) DEF: The relative frequency of the ith category is the proportion of observations falling in that category. If the ith category contains $f_i$ observations, then the relative frequency of the ith category is $f_i/n$, where n denotes the total number of observations in the sample. If we are working with a population of N items, then the relative frequency of the ith class is $f_i/N$. A relative frequency distribution is a table that shows the proportion of observations that falls into each category.

(2.5) DEF: A histogram is a graphical representation of a relative frequency distribution.

Example

Wages (in dollars):

4, 4.60, 4.75, 5, 5, 5.50, 5.90, 6, 6, 6.25, 6.25, 6.75, 6.80, 7, 7.25, 7.30, 7.50, 8, 8.50, 9

wage             relative frequency

4 to under 5     3/20 = .15

5 to under 6     4/20 = .20

6 to under 7     6/20 = .30

7 to under 8     4/20 = .20

8 to under 9     2/20 = .10

9 to under 10    1/20 = .05
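The relative frequency table for the wage data can be reproduced with a short computation. The following is a minimal Python sketch, not part of the original notes; the function name `relative_frequencies` and the choice of one-dollar classes keyed by their lower bound are illustrative assumptions.

```python
import math
from collections import Counter

# The 20 wage observations from the example.
wages = [4, 4.60, 4.75, 5, 5, 5.50, 5.90, 6, 6, 6.25, 6.25, 6.75, 6.80,
         7, 7.25, 7.30, 7.50, 8, 8.50, 9]

def relative_frequencies(data):
    """Map each class 'k to under k+1' (keyed by k) to f_i / n."""
    n = len(data)
    counts = Counter(math.floor(x) for x in data)  # f_i for each class
    return {k: counts[k] / n for k in sorted(counts)}

rel_freq = relative_frequencies(wages)
for lower, share in rel_freq.items():
    print(f"{lower} to under {lower + 1}: {share:.2f}")
```

The relative frequencies necessarily sum to one, because every observation falls in exactly one class.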

(Figure 2-1 Here)

(2.6) DEF: A relative frequency distribution is said to be symmetric if there exists a value such that

the portion of the distribution to the left of the value is the mirror image of the portion to its right.

(2.7) DEF: A relative frequency distribution is said to be skewed if it is not symmetric.

(Figure 2-2 Here)

1.3 Summation Notation

(2.8) DEF: The symbol

$$\sum_{i=1}^{I} g(x_i, y_i, z_i, \ldots)$$

is defined by the equation

$$\sum_{i=1}^{I} g(x_i, y_i, z_i, \ldots) = g(x_1, y_1, z_1, \ldots) + g(x_2, y_2, z_2, \ldots) + \cdots + g(x_I, y_I, z_I, \ldots)$$

where g(...) is any function.

Examples:

1) $g(x_i, y_i, z_i, \ldots) = x_i$

so

$$\sum_{i=1}^{I} g(x_i, y_i, z_i, \ldots) = \sum_{i=1}^{I} x_i = x_1 + x_2 + \cdots + x_I.$$

2) $g(x_i, y_i, z_i, \ldots) = x_i y_i$

so

$$\sum_{i=1}^{I} g(x_i, y_i, z_i, \ldots) = \sum_{i=1}^{I} x_i y_i = x_1 y_1 + x_2 y_2 + \cdots + x_I y_I.$$

Useful properties of summations:

Let c be a constant (i.e., it does not change with the counter i).

1) $\sum_{i=1}^{I} c = I c$

2) $\sum_{i=1}^{I} c\, g(x_i, y_i, z_i, \ldots) = c \sum_{i=1}^{I} g(x_i, y_i, z_i, \ldots)$

3) $\sum_{i=1}^{I} \{g(x_i, y_i, z_i, \ldots) + h(x_i, y_i, z_i, \ldots)\} = \sum_{i=1}^{I} g(x_i, y_i, z_i, \ldots) + \sum_{i=1}^{I} h(x_i, y_i, z_i, \ldots)$
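The three summation properties can be checked numerically on any small data set. The sketch below is illustrative only: the values of `x`, `y`, and `c`, and the particular choices of g and h, are made up for the check and do not come from the notes.

```python
# Hypothetical integer data, chosen so every comparison is exact.
x = [2, 5, 7]
y = [1, 3, 4]
c = 3
I = len(x)

# Property 1: summing a constant I times gives I * c.
prop1 = sum(c for _ in range(I)) == I * c

# Property 2: a constant factor moves outside the sum; here g = x_i * y_i.
prop2 = (sum(c * xi * yi for xi, yi in zip(x, y))
         == c * sum(xi * yi for xi, yi in zip(x, y)))

# Property 3: the sum of g + h splits; here g = x_i and h = y_i squared.
prop3 = (sum(xi + yi**2 for xi, yi in zip(x, y))
         == sum(x) + sum(yi**2 for yi in y))

print(prop1, prop2, prop3)
```

A numerical check of this kind confirms the properties for one data set; it is not a proof, which follows from writing out the sums term by term.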

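(2.9) DEF: The symbol

$$\sum_{i=1}^{I} \sum_{j=1}^{J} g(x_{ij}, y_{ij}, z_{ij}, \ldots)$$

is defined by

$$\begin{aligned} \sum_{i=1}^{I} \sum_{j=1}^{J} g(x_{ij}, y_{ij}, z_{ij}, \ldots) = {} & (g(x_{11}, y_{11}, z_{11}, \ldots) + g(x_{12}, y_{12}, z_{12}, \ldots) + \cdots + g(x_{1J}, y_{1J}, z_{1J}, \ldots)) \\ & + (g(x_{21}, y_{21}, z_{21}, \ldots) + g(x_{22}, y_{22}, z_{22}, \ldots) + \cdots + g(x_{2J}, y_{2J}, z_{2J}, \ldots)) \\ & + \cdots + (g(x_{I1}, y_{I1}, z_{I1}, \ldots) + \cdots + g(x_{IJ}, y_{IJ}, z_{IJ}, \ldots)) \end{aligned}$$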

Examples:

1) Let I = 2 and J = 2, where

$x_{11} = 1, x_{12} = 2, x_{21} = 3, x_{22} = 4$

and

$y_{11} = 2, y_{12} = 3, y_{21} = 1, y_{22} = 1.$

Then

$$\sum_{i=1}^{2} \sum_{j=1}^{2} x_{ij} = (1 + 2) + (3 + 4) = 3 + 7 = 10$$

2) Using the same definitions as in Example 1

$$\sum_{i=1}^{2} \sum_{j=1}^{2} (x_{ij} + y_{ij})^2 = ((1 + 2)^2 + (2 + 3)^2) + ((3 + 1)^2 + (4 + 1)^2) = 9 + 25 + 16 + 25 = 75$$

Footnote 1: Note that in this chapter we use the word population to mean finite population. This means that there is a finite number of individuals in the population. In the next chapter we will discuss the difference between finite and infinite in describing populations.
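The two double-summation examples can be verified directly. This is a minimal Python sketch; storing the values as nested lists indexed from zero (so `x[0][1]` plays the role of x with subscript 12) is an implementation choice, not notation from the notes.

```python
# x[i][j] and y[i][j] hold the values from Example 1 (I = 2, J = 2).
x = [[1, 2],
     [3, 4]]
y = [[2, 3],
     [1, 1]]
I, J = 2, 2

# Example 1: double sum of x_ij.
sum_x = sum(x[i][j] for i in range(I) for j in range(J))

# Example 2: double sum of (x_ij + y_ij) squared.
sum_sq = sum((x[i][j] + y[i][j]) ** 2 for i in range(I) for j in range(J))

print(sum_x, sum_sq)  # 10 75
```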

1.4 Some Useful Descriptive Statistics

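By far the most important measure of central tendency is the mean.

(2.10) DEF: The population mean is defined and denoted (see footnote 1)

$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$

(2.11) DEF: The sample mean is defined and denoted

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$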

(2.12) DEF: Suppose the observations $x_1, x_2, \ldots, x_n$ have been arranged in ascending order. The pth percentile is the value $x_p$ such that p percent of the observations are less than or equal to $x_p$ and (100 - p) percent of the observations are greater than $x_p$.

(2.13) DEF: The lower quartile of a set of observations, also called the first quartile, is the 25th percentile. The first quartile is denoted $Q_1$. The upper quartile, also called the third quartile, is denoted by the symbol $Q_3$ and is the 75th percentile of the data. The median is the 50th percentile (or second quartile).

(2.14) DEF: The range of a set of observations is the difference between the largest and smallest

values.

(2.15) DEF: For any population or sample of values, the interquartile range is $(Q_3 - Q_1)$.

(2.16) DEF: The deviation from the mean is the difference $x_i - \mu$ or $x_i - \bar{x}$, where $x_i$ is an observation (population or sample).

Computing the average deviation may seem like a good idea for measuring variation in the data, but it is not, because (as you will show for homework)

$$\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu) = 0.$$

(2.17) DEF: The population variance is defined and denoted

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$

(2.18) DEF: The sample variance is defined and denoted

$$S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

(2.19) DEF: The population standard deviation is defined and denoted

$$\sigma = \sqrt{\sigma^2}$$

(2.20) DEF: The sample standard deviation is defined and denoted

$$S = \sqrt{S^2}$$

These two items give you a way of measuring the average deviation without running into the zero problem.

(2.21) THEOREM: (Chebyshev's Inequality) Let c be any number greater than 1. For any population of data, the proportion of observations that lie fewer than c standard deviations from the mean is at least $1 - 1/c^2$.

Example

Recall the wage data considered earlier.

$\mu \equiv$ mean = 6.3675

$\sigma \equiv$ standard deviation = 1.3069.

Therefore the proportion of the population between $\mu - 2\sigma$ (3.7537) and $\mu + 2\sigma$ (8.9814) is, according to the theorem, greater than or equal to

$1 - 1/2^2 = 1 - 1/4 = .75$

Directly calculating the actual value from the relative frequencies yields

19/20 = .95.
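The numbers in this example can be reproduced from the wage data. The sketch below is a hedged illustration in Python: it treats the 20 wages as a population (dividing by N), which is the convention that matches the reported standard deviation of 1.3069; the variable names are illustrative.

```python
import math

wages = [4, 4.60, 4.75, 5, 5, 5.50, 5.90, 6, 6, 6.25, 6.25, 6.75, 6.80,
         7, 7.25, 7.30, 7.50, 8, 8.50, 9]
N = len(wages)

mu = sum(wages) / N                                        # population mean
sigma = math.sqrt(sum((x - mu) ** 2 for x in wages) / N)   # population s.d.

# The deviations from the mean sum to zero (up to rounding error),
# which is why the average deviation is useless as a measure of spread.
total_deviation = sum(x - mu for x in wages)

c = 2
chebyshev_bound = 1 - 1 / c**2                             # at least this share
within = sum(1 for x in wages if abs(x - mu) < c * sigma) / N

print(round(mu, 4), round(sigma, 4))   # 6.3675 1.3069
print(chebyshev_bound, within)         # 0.75 0.95
```

The bound from Chebyshev's Inequality holds for any population, so it is often far from tight, as the gap between .75 and .95 shows here.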

2. Basic Probability Theory

2.1 Set Theory

(2.22) DEF: A set is any collection of objects. These objects are called elements.

Ways to denote sets:

$A = \{a_1, a_2, a_3, \ldots\}$

A = {a: a satisfies certain criteria}

(2.23) DEF: The null set, designated $\emptyset$, is the set with no elements.

(2.24) DEF: A set A is said to be a subset of another set B, denoted $A \subset B$, if every element of A is also an element of B.

(2.25) DEF: The union of the sets A and B is another set containing all the elements belonging to either A or B. We write $A \cup B$.

(2.26) DEF: The intersection of the sets A and B is another set containing all elements common to both A and B. We write $A \cap B$.

(2.27) DEF: Two sets A and B are said to be disjoint if $A \cap B = \emptyset$.

(2.28) DEF: The universal set, denoted U, is the set of all elements of interest to a particular

problem. It is the set of which all sets are subsets.

(2.29) DEF: The complement of a set A is the set containing all the elements of U that are not in A. We write

A complement = $A^c$

(2.30) DEF: A set is said to be countable if its elements can be put in one-to-one correspondence with the natural numbers (the positive integers) or with a subset of them.

(2.31) DEF: A set is said to be finite if it is countable and has a finite number of elements.

(2.32) DEF: A set is said to be countably infinite if it is countable and has an infinite number of elements.

(2.33) DEF: A set is said to be uncountably infinite if it is not countable and has an infinite number of elements.

Examples (see footnote 2)

finite - the subset of all spades in a standard deck of playing cards.

countably infinite - the set of possible numbers of heavenly bodies north of the earth’s equator at a given moment in time.

uncountably infinite - the set of points on the rightmost panel of the blackboard.

Footnote 2: When I refer to a standard deck of playing cards I mean a deck of 52 cards, four suits each comprising 13 cards (hearts, spades, diamonds, clubs), cards in each suit marked 2 through 10, J, Q, K, A (jack, queen, king, ace). If you are unfamiliar with playing cards, please consult me or a friend.

(2.34) DEF: Suppose A is a finite set consisting of elements $a_1, a_2, \ldots, a_n$ so that $A = \{a_1, a_2, \ldots, a_n\}$. We denote the number of elements in A by n(A).

2.2 Counting Techniques

(2.35) THEOREM: Let A and B be finite sets. The number of possible ordered pairs (a, b), where $a \in A$ and $b \in B$, is n(A)n(B). Here "$\in$" means "is an element of."

(2.36) COROLLARY: Let $A_1, A_2, \ldots, A_J$ be finite sets. The number of possible ordered J-tuples $(a_1, a_2, \ldots, a_J)$, where $a_j \in A_j$, is $n(A_1) n(A_2) \cdots n(A_J)$.

(2.37) DEF: Let N be a positive integer. The product of all integers from 1 to N is called N-factorial

and is denoted N!.

note 0! = 1

e.g., 3! = 1x2x3 = 6

(2.38) DEF: A permutation of N different things taken R at a time is an arrangement in a specific order of any R of those N things. We write ${}_N P_R$ for the number of such permutations.

e.g., three things A, B, C:

ABC   BAC   CAB

ACB   BCA   CBA

Note that there are 3! permutations of 3 items taken 3 at a time.

One can waste a lot of time counting up permutations the above way. Let's find an easier way. We can derive a simpler formula in 3 steps.

1. There are N ways of choosing the first item. After choosing the first you have N-1 ways of choosing the second. So the total number of ways to choose the first and the second is N(N-1).

2. Therefore, by the same reasoning, there are

N(N-1)(N-2)

ways of choosing the first 3 items.

3. Likewise there are

${}_N P_R = N(N-1)(N-2) \cdots (N-(R-1))$

ways of choosing the first R items.

e.g., 4 things taken 3 at a time

${}_4 P_3 = 4 \times 3 \times 2 = 24$

Notice that this formula is still a bit cumbersome. But we can simplify it further. Extending the expression in 3, we obtain

$$[N(N-1)(N-2) \cdots (N-(R-1))] \cdot [(N-R)(N-(R+1)) \cdots 1] = N!$$

or

$${}_N P_R \cdot (N-R)! = N!$$

or

$${}_N P_R = \frac{N!}{(N-R)!}.$$

This implies the following definition.

(2.39) DEF: The formula for the number of permutations of N objects taken R at a time is

$${}_N P_R = \frac{N!}{(N-R)!}$$

Example: A personnel manager interviews 10 candidates and must select and rank 3.

$${}_{10} P_3 = \frac{10!}{7!} = 10 \times 9 \times 8 = 720.$$
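The permutation formula can be checked against brute-force enumeration. The following Python sketch is illustrative; the helper name `n_permutations` is an assumption, while `math.perm` (Python 3.8 and later) and `itertools.permutations` are standard-library functions.

```python
import math
from itertools import permutations

def n_permutations(N, R):
    """Number of permutations of N things taken R at a time: N! / (N-R)!."""
    return math.factorial(N) // math.factorial(N - R)

# Enumerate the orderings of A, B, C explicitly and count them.
arrangements = ["".join(p) for p in permutations("ABC", 3)]
print(arrangements)            # the six orderings listed in the example
print(n_permutations(4, 3))    # 24
print(n_permutations(10, 3))   # 720, the personnel-manager example
print(math.perm(10, 3))        # same value from the standard library
```

Enumeration is only practical for small N; the formula is what makes larger cases tractable.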

(2.40) DEF: A combination of N things taken R at a time is an arrangement of any R of those things without regard to order. We use the symbol ${}_N C_R$ to denote the number of combinations of N things taken R at a time.

e.g., Suppose a drug salesman must visit 5 out of 10 hospitals and the order of the calls is not important. The number of possible visit combinations is

${}_{10} C_5$.

Note that each combination of R things affords R! permutations. In other words

$${}_N P_R = R! \cdot {}_N C_R$$

which implies that

$${}_N C_R = {}_N P_R / R!$$

(2.40) DEF: The formula for the number of combinations of N objects taken R at a time is

$${}_N C_R = \frac{N!}{R!\,(N-R)!}$$

Sometimes the symbol

$${}_N C_R = \binom{N}{R}$$

is used.

Back to the example (5 hospitals out of 10):

$${}_{10} C_5 = \frac{10!}{5!\,5!} = 252$$
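The relationship between permutations and combinations can be confirmed numerically. This is a minimal Python sketch; `n_combinations` is a hypothetical helper name, and `math.comb` and `math.perm` are standard-library functions available from Python 3.8.

```python
import math

def n_combinations(N, R):
    """Number of combinations of N things taken R at a time: N! / (R! (N-R)!)."""
    return math.factorial(N) // (math.factorial(R) * math.factorial(N - R))

hospital_visits = n_combinations(10, 5)
print(hospital_visits)      # 252
print(math.comb(10, 5))     # same value from the standard library

# Each combination of R things affords R! permutations.
identity_holds = math.perm(10, 5) == math.factorial(5) * hospital_visits
print(identity_holds)
```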

2.3 Basic Definitions for Probability Theory


(2.41) DEF: An experiment is any repeatable process from which an outcome, measurement, or

result is obtained. When the outcomes cannot be predicted with certainty, the experiment is called

a random experiment.

Outcomes of experiments can be:

qualitative attributes (people, objects)

quantitative attributes - real values

continuous - point along the real line

discrete - positive integers

(2.42) DEF: Each of the possible outcomes of an experiment is called a basic outcome and denoted $\omega$ (the lowercase Greek letter omega). The set of all possible basic outcomes for a given experiment is called the sample space of the experiment and is denoted $\Omega$.

Basic outcomes correspond to elements in set theory and the sample space corresponds to the

universe.

Examples

$\omega_1$ = Mike, $\omega_2$ = Kathy, $\omega_3$ = Mary Beth

$\omega_i$ denotes a point on the real line

$\omega_0 = 0, \omega_1 = 1, \omega_2 = 2, \ldots, \infty$

(2.43) DEF: An event is any collection of basic outcomes; i.e., an event is any subset of the sample

space.

We denote events using uppercase block letters. We say that an event occurs if any basic outcome

in that event occurs.

(2.44) DEF: The probability of an event is the limiting relative frequency with which the event occurs when an experiment is repeated a large number of times under identical circumstances. Given the event A (i.e., the subset of the sample space) we write P(A) = the probability of A. Technically then

$$P(A) = \lim_{T \to \infty} \frac{A_T}{T}$$

where T denotes the number of trials of the experiment, and $A_T$ denotes the number of times the event A occurs in T trials.
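The idea of probability as a limiting relative frequency can be illustrated by simulation. The sketch below is not from the notes: it assumes a hypothetical experiment (rolling a fair six-sided die) and a hypothetical event (an even face), and uses Python's pseudo-random generator as a stand-in for repeating the experiment under identical circumstances.

```python
import random

def relative_frequency(event, trials, rng):
    """Share of `trials` die rolls whose outcome falls in `event`."""
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) in event)
    return hits / trials

rng = random.Random(2007)   # fixed seed so the run is repeatable
even = {2, 4, 6}            # event A; with equiprobable faces P(A) = 1/2

# The relative frequency settles near 1/2 as the number of trials grows.
for T in (10, 1_000, 100_000):
    print(T, relative_frequency(even, T, rng))
```

A finite simulation can only suggest the limit; the definition refers to what happens as the number of trials grows without bound.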

2.4 Assigning Probability Values to Events

How we assign probabilities to events defined on a random experiment will depend on the

nature of the sample space and the events defined thereon. Recall that the sample space corresponds

to the concept of the universal set in set theory, and events are subsets of the sample space. We saw

earlier that universes and their subsets can be finite, countably infinite, or uncountably infinite.

Therefore, sample spaces and events can be finite, countably infinite, or uncountably infinite. How probability is assigned to a specified event will depend on whether that event (and its sample space) is finite, countably infinite, or uncountably infinite. Each will warrant a different method for assigning probability.

Examples

a) Finite Case - Suppose all basic outcomes are equiprobable. Then

$$P(A) = \frac{n(A)}{n(\Omega)}.$$

i) Suppose we have an urn of marbles, 7 of which are red and 3 of which are green. The

random experiment consists of drawing a marble blindly from the urn and observing its

color.

basic outcomes = the ten marbles

sample space = {red, red, red, red, red, red, red, green, green, green}

possible events are A = {red}, B = {green}, sample space itself

P(A) = .7

P(B) = .3

$P(\Omega) = 1$

ii) Consider a standard deck of playing cards. Suppose the random experiment consists of dealing a 5-card hand from the deck after it has been thoroughly shuffled. The basic outcomes are the possible hands. Is this set infinite? How many basic outcomes are there?

${}_{52} C_5$

Find the probability of getting exactly 3 aces. How many ways are there to get hands with exactly 3 aces?

3 of the 4 ace cards = ${}_4 C_3$

2 of the remaining 48 non-ace cards = ${}_{48} C_2$

A = {all hands with exactly 3 aces}

$$P(A) = \frac{{}_4 C_3 \cdot {}_{48} C_2}{{}_{52} C_5} = \frac{4 \cdot 1128}{2{,}598{,}960} \approx .0017$$
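The three-aces probability follows from the finite-case rule: count the hands in the event and divide by the number of possible hands. A short Python sketch, with illustrative variable names, using the standard-library function `math.comb`:

```python
from math import comb

total_hands = comb(52, 5)                    # all possible 5-card hands
three_ace_hands = comb(4, 3) * comb(48, 2)   # 3 of 4 aces, 2 of 48 non-aces

p_three_aces = three_ace_hands / total_hands
print(total_hands, three_ace_hands)          # 2598960 4512
print(round(p_three_aces, 6))
```

The multiplication of the two counts is an application of Theorem (2.35): each choice of three aces can be paired with each choice of two non-aces.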

b) Countably Infinite Case -

Consider the manufacture of cotter pins. Suppose the random experiment consists of examining the

cotter pins produced in a 1 hr. period and counting the number of defectives. The basic outcomes

in this case are the possible numbers of defective pins observed in that 1 hr. period. The sample

space consists of the set of Natural Numbers (the positive integers and zero). So this sample space

is countable but (virtually) infinite. It has been shown that under certain conditions the following

function can be used to assign probabilities to the elementary events

$$P(\{j\}) = \frac{\mu^j e^{-\mu}}{j!}$$

where $j = 0, 1, 2, \ldots, \infty$, $\mu$ is the only parameter of the function, and

$$e = \lim_{m \to \infty} \left(1 + \frac{1}{m}\right)^m = 2.71828\ldots$$

In other words, if this experiment were conducted an infinity of times, the relative frequency of the event {j} would be as described by the function above. Now as an example suppose that $\mu = .5$:

$P(\{0\}) = e^{-\mu} = e^{-.5} = .6065.$

$P(\{1\}) = \mu e^{-\mu} = .5 e^{-.5} = (.5)(.6065) = .3033.$

$P(\{2\}) = (\mu^2 e^{-\mu})/2 = (.25 e^{-.5})/2 = (.25)(.6065)/2 = .0758.$

and so on.

Now suppose that $\mu = .001$:

$P(\{0\}) = e^{-\mu} = e^{-.001} = .9990$

$P(\{1\}) = \mu e^{-\mu} = (.001) e^{-.001} = (.001)(.9990) = .0010.$

$P(\{2\}) = (\mu^2 e^{-\mu})/2 = ((.001)^2 e^{-.001})/2 = (.000001)(.9990)/2 = .0000005.$
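The probabilities in this example can be recomputed in a few lines. The sketch below assumes the assignment function has the Poisson form, P({j}) equal to mu to the power j, times e to the power minus mu, divided by j factorial, which reproduces the values worked out above; the function name `poisson_prob` is illustrative.

```python
import math

def poisson_prob(j, mu):
    """P({j}) = mu**j * exp(-mu) / j!  for j = 0, 1, 2, ..."""
    return mu**j * math.exp(-mu) / math.factorial(j)

for mu in (0.5, 0.001):
    print(mu, [poisson_prob(j, mu) for j in range(3)])

# The probabilities over the whole sample space sum to one; summing the
# first 50 terms is already accurate to many decimal places for mu = .5.
total = sum(poisson_prob(j, 0.5) for j in range(50))
print(total)
```

Comparing the two parameter values shows how a small $\mu$ concentrates almost all of the probability on zero defectives.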

c) Uncountably Infinite Case - Suppose the sample space $\Omega$ is a bounded subset of the Cartesian plane. The basic outcomes are the points in $\Omega$. If each point in the sample space is equiprobable, the rule for assigning probabilities to events is

$$P(A) = \frac{\text{area}(A)}{\text{area}(\Omega)}$$

where A is a subset of $\Omega$.

The following definition generalizes the concept of the probability assignment rule to cover all types

of sample spaces and events.

(2.45) DEF: Let A be an event (i.e., A is a subset of $\Omega$) and define the set function P(A) such that

a) $0 \le P(A)$

b) $P(A_1 \cup A_2 \cup A_3 \cup \ldots) = P(A_1) + P(A_2) + P(A_3) + \ldots$, where $A_i$ and $A_j$ are disjoint (i not equal to j) and the set of sets (class) $\{A_1, A_2, A_3, \ldots\}$ is countable

c) $P(\Omega) = 1$.

Note that P(A) so defined satisfies the conditions for being a measure. A measure is a set function

that satisfies certain properties. The study of measures is beyond the scope of this course but you

should be aware that there is an entire field of mathematics, called measure theory, devoted to

measures and their properties. The fact that P(A) is a measure means that all of the results of

measure theory can be applied to the study of probability. Henceforth we refer to P(A) as a probability measure. It is, in fact, property (c) that makes P(.) a probability measure.

2.5 Some Important Theorems on Probability Measures

There are a number of important theorems which follow from Definition (2.45).

(2.46) THEOREM: For each event A, $P(A) = 1 - P(A^c)$.

(2.47) THEOREM: The probability of the null event is zero; i.e., $P(\emptyset) = 0$.

(2.48) THEOREM: If $A_1$ and $A_2$ are events such that $A_1 \subset A_2$, then $P(A_1) \le P(A_2)$.

(2.49) THEOREM: For each event A, $0 \le P(A) \le 1$.

(2.50) THEOREM: For every pair of events $A_1$ and $A_2$

$$P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2)$$

(2.51) DEF: If events A and B have no basic outcomes in common they are said to be mutually exclusive. In this case $A \cap B = \emptyset$.

(2.52) THEOREM: For mutually exclusive events A and B, $P(A \cap B) = 0$.

(2.53) THEOREM: For mutually exclusive events A and B,

$$P(A \cup B) = P(A) + P(B).$$

Example

Let $A = \{\omega_1, \omega_2, \ldots, \omega_K\}$ be an event, where $\omega_1, \omega_2, \ldots, \omega_K$ are basic outcomes. Then

$$P(A) = P(\{\omega_1\}) + P(\{\omega_2\}) + \cdots + P(\{\omega_K\})$$

or more simply

$$P(A) = \sum_{i=1}^{K} P(\{\omega_i\})$$

(2.54) DEF: If events A and B cover the sample space we say that A and B are collectively exhaustive. In this case $A \cup B = \Omega$.

(2.55) DEF: For collectively exhaustive events A and B

$$P(A \cup B) = 1$$

(2.56) THEOREM: If A and B are mutually exclusive and collectively exhaustive we have

$$P(A \cup B) = 1 = P(A) + P(B)$$

Examples

i) $P(\Omega) = \sum_i P(\{\omega_i\}) = 1$

ii) A = {students concentrating in history}

B = {students concentrating in math}

$P(A \cup B)$ = %(hist or math) = .75

$P(A)$ = %(hist) = .6

$P(B)$ = %(math) = .5

What is $P(A \cap B)$?

We know by Theorem 2.50 that

$P(A \cap B) = P(A) + P(B) - P(A \cup B) = .6 + .5 - .75 = .35$

iii) A store has 10 loaves of bread in its inventory:

6 fresh

4 stale

The random experiment consists of a customer blindly picking two loaves. The sample space is

$\Omega$ = {all possible pairs of loaves out of 10}.

Let the event

A = {all pairs of loaves in which there is at least one stale loaf}

$A^c$ = {all pairs of loaves in which there are no stale loaves}.

Therefore

$P(A) = 1 - P(A^c)$

$P(A^c) = n(A^c)/n(\Omega)$

$n(\Omega)$ = the number of combinations of 10 loaves taken 2 at a time = ${}_{10} C_2$

$n(A^c)$ = the number of ways of drawing two loaves from among the 6 fresh ones = ${}_6 C_2$

$n(\Omega) = {}_{10} C_2 = \frac{10!}{2!\,8!} = 45$

$n(A^c) = {}_6 C_2 = \frac{6!}{2!\,4!} = 15$

Therefore

$P(A) = 1 - P(A^c) = 1 - 15/45 = 2/3.$
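The bread example can be checked both with the counting formula and by listing every possible pair of loaves. The sketch below is illustrative Python; labelling the loaves with the strings "f" and "s" and using exact fractions are implementation choices, not part of the notes.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Counting approach: P(A^c) = C(6, 2) / C(10, 2), then take the complement.
p_no_stale = Fraction(comb(6, 2), comb(10, 2))
p_at_least_one_stale = 1 - p_no_stale
print(p_no_stale, p_at_least_one_stale)   # 1/3 2/3

# Brute-force check: enumerate all pairs drawn from 6 fresh and 4 stale loaves.
loaves = ["f"] * 6 + ["s"] * 4
pairs = list(combinations(range(len(loaves)), 2))
stale_pairs = [p for p in pairs if any(loaves[i] == "s" for i in p)]
print(len(pairs), len(stale_pairs))       # 45 30
```

Working through the complement is convenient here because "no stale loaves" is a single case, whereas "at least one stale loaf" covers two.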

3. Joint Probability

3.1 Joint and Conditional Probability

Often we can partition the sample space into a class of mutually exclusive events based on

any one of a variety of categorization schemes. For example suppose

$\Omega$ = {population of individual health care demanders}.

We can categorize individuals (basic outcomes) according to their health insurance coverage (1 if

covered; 0 if not covered), or according to whether or not they visited a physician at least once

during the previous year (1 if they did; 0 if they didn’t). We can then construct a table of the

following form which depicts the four possible joint probabilities assigned to these events:

                 HI = 1       HI = 0

VISIT = 1        P(1 ∩ 1)     P(1 ∩ 0)

VISIT = 0        P(0 ∩ 1)     P(0 ∩ 0)

Here P(v ∩ h) denotes the joint probability that VISIT = v and HI = h. Note that if you sum the entries in the first column you obtain

P(1 ∩ 1) + P(0 ∩ 1) = P((1 ∩ 1) ∪ (0 ∩ 1)) = P(HI = 1).

Likewise, if you sum the entries in the second column you obtain P(HI = 0). And summing across the first and second rows respectively yields P(VISIT = 1) and P(VISIT = 0). Placing these newly computed probabilities in their appropriate positions on the margins, we rewrite the above table as

                 HI = 1       HI = 0

VISIT = 1        P(1 ∩ 1)     P(1 ∩ 0)     P(VISIT = 1)

VISIT = 0        P(0 ∩ 1)     P(0 ∩ 0)     P(VISIT = 0)

                 P(HI = 1)    P(HI = 0)

The four probabilities along the margins of the above table are called the marginal probabilities.

They describe the probabilities associated with a particular categorization of the sample space,

regardless of the other possible categorizations. It is easy to see that these concepts of joint and

marginal probabilities can be extended to cases in which there are more than two categorization

schemes and/or more than two events in each categorization.

Example

Suppose there were 100 individuals in the population and the joint frequency table is:

                 HI = 1           HI = 0

VISIT = 1        n(1 ∩ 1) = 35    n(1 ∩ 0) = 25

VISIT = 0        n(0 ∩ 1) = 15    n(0 ∩ 0) = 25

so the joint/marginal probability table is

                 HI = 1    HI = 0

VISIT = 1        .35       .25       .6

VISIT = 0        .15       .25       .4

                 .5        .5

(2.57) DEF: Given two events A and B the conditional probability of A given B is the probability that event A occurs on a trial of the random experiment given certainty that B occurs. We write

the conditional probability of A given B = $P(A \mid B)$.

Example

Consider the health insurance example discussed above. Intuitively then, to evaluate $P(\text{VISIT} = 1 \mid \text{HI} = 1)$ using our method for assigning probabilities to finite sets, we divide the number who both visit and are covered by the number who are covered. We thus obtain

$$P(\text{VISIT} = 1 \mid \text{HI} = 1) = \frac{n(\text{VISIT} = 1 \cap \text{HI} = 1)}{n(\text{HI} = 1)} = \frac{35}{50} = .7$$

but dividing both numerator and denominator by $n(\Omega)$ yields

$$P(\text{VISIT} = 1 \mid \text{HI} = 1) = \frac{n(\text{VISIT} = 1 \cap \text{HI} = 1)/n(\Omega)}{n(\text{HI} = 1)/n(\Omega)}$$

which leads us to the familiar definition of conditional probability

$$P(\text{VISIT} = 1 \mid \text{HI} = 1) = \frac{P(\text{VISIT} = 1 \cap \text{HI} = 1)}{P(\text{HI} = 1)}.$$

(2.58) DEF: $P(A \mid B) = P(A \cap B)/P(B)$ and likewise $P(B \mid A) = P(A \cap B)/P(A)$.

The essence of the concept of conditional probability is redefining the sample space. In our example

the sample space is first redefined (reduced) from the entire population of 100 individuals to the

smaller set including only those 50 who have health insurance.
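The joint, marginal, and conditional probabilities for the health insurance example can all be derived from the frequency table. The following Python sketch is illustrative; keying the counts by (VISIT, HI) pairs is an assumed data layout, not something the notes prescribe.

```python
# Joint frequencies keyed by (VISIT, HI), for the population of 100 individuals.
counts = {(1, 1): 35, (1, 0): 25, (0, 1): 15, (0, 0): 25}
n_total = sum(counts.values())

# Joint probabilities: relative frequency of each cell.
joint = {cell: k / n_total for cell, k in counts.items()}

# Marginal probabilities: sum the joint probabilities over the other variable.
p_visit = {v: sum(p for (vis, hi), p in joint.items() if vis == v) for v in (1, 0)}
p_hi = {h: sum(p for (vis, hi), p in joint.items() if hi == h) for h in (1, 0)}

# Conditional probability: joint probability divided by the marginal
# probability of the conditioning event.
p_visit1_given_hi1 = joint[(1, 1)] / p_hi[1]

print(p_visit, p_hi)
print(p_visit1_given_hi1)
```

Dividing the count 35 by the count 50 gives the same conditional probability, which is the sense in which conditioning redefines the sample space.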

3.2 Stochastic Independence

(2.59) DEF: Two events A and B are said to be stochastically independent if

$$P(A \mid B) = P(A)$$

or equivalently

$$P(B \mid A) = P(B).$$

In our example the events VISIT = 1 and HI = 1 are not independent: P(VISIT = 1) = .6 and P(VISIT = 1 | HI = 1) = .7.

There are a couple of alternative ways to describe independence.

(2.60) DEF: Independence means that no new information (in terms of probability assignments) is

supplied by knowledge of the conditioning event.

or

(2.61) DEF: Independence means that the probability of the event in question relative to the new

(redefined) sample space is the same as the probability of that event relative to the original sample

space.

(2.62) THEOREM: If two events A and B are stochastically independent then

$$P(A \cap B) = P(A)P(B).$$

This follows from the fact that by the definition of conditional probability

$$P(A \cap B) = P(A \mid B)P(B)$$

but when A and B are independent

$$P(A \mid B) = P(A).$$

NOTE: A and B being disjoint is not equivalent to their being independent. In fact, if A and B are disjoint and each has positive probability, they cannot be independent.

3.3 Probabilities Assigned to Samples Drawn With and Without Replacement

Let's return to an earlier example. Recall, a store has 10 loaves of bread in its inventory:

6 fresh

4 stale

The random experiment consists of a customer blindly picking two loaves. What is the probability of getting two fresh loaves? We can view this problem from a perspective that is slightly different from the one taken earlier. First, define the sample space as S = {f,f,f,f,f,f,s,s,s,s}. Secondly, define the random experiment as drawing one loaf from S. So here we repeat the experiment twice.

A = {fresh loaf on first draw}

B = {fresh loaf on second draw}

We want the probability of A and B, or $A \cap B$. By the definition of conditional probability we know that $P(A \cap B) = P(B \mid A)P(A)$. In our example then

$$P(A \cap B) = P(B \mid A)P(A) = \frac{5}{9} \cdot \frac{6}{10} = \frac{1}{3}.$$

Recall that we computed this same probability earlier by viewing this as a random experiment consisting of a customer blindly picking two loaves. In the notation of that earlier example, the sample space is

$\Omega$ = {all possible pairs of loaves out of 10}.

Let the event

A = {all pairs of loaves in which there is at least one stale loaf}

$A^c$ = {all pairs of loaves in which there are no stale loaves}.

Therefore

$P(A) = 1 - P(A^c)$

$P(A^c) = n(A^c)/n(\Omega)$

$n(\Omega)$ = the number of combinations of 10 loaves taken 2 at a time = ${}_{10} C_2 = 45$

$n(A^c)$ = the number of ways of drawing two loaves from among the 6 fresh ones = ${}_6 C_2 = 15$

In other words, returning to the events A and B defined on the two successive draws,

$$P(A \cap B) = {}_6 C_2 / {}_{10} C_2 = 15/45 = 1/3.$$

Now suppose that on the second draw the experiment is conducted under different circumstances, i.e., the loaf that was drawn first is replaced before the second drawing is made. Now the draws are indistinguishable, so the events can be rewritten

A = {fresh loaf}

B = {fresh loaf}

and $P(A) = P(B) = P(B \mid A) = .6$. Therefore the events are independent and

$$P(A \cap B) = P(A)P(B) = P(A)^2 = .36$$

The first case is an example of sampling without replacement, i.e., for two events A and B defined respectively on successive trials of a random experiment such that the item drawn in the first trial is not replaced before the second draw is conducted

$$P(A \cap B) = P(B \mid A)P(A).$$

The second case is an example of sampling with replacement, i.e., for two events A and B defined respectively on successive trials of a random experiment such that the item drawn in the first trial is replaced before the second draw is conducted

$$P(A \cap B) = P(A)P(B).$$
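The two sampling schemes differ only in the conditional probability used for the second draw. A minimal Python sketch for the bread example, using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

fresh, total = 6, 10

p_first_fresh = Fraction(fresh, total)   # P(A) = 6/10

# Without replacement: one fresh loaf is gone, so P(B | A) = 5/9.
p_without = Fraction(fresh - 1, total - 1) * p_first_fresh

# With replacement: the draws are independent, so P(B | A) = P(B) = 6/10.
p_with = p_first_fresh * p_first_fresh

print(p_without, float(p_without))   # 1/3
print(p_with, float(p_with))         # 9/25, i.e. .36
```

The difference between the two answers shrinks as the population grows relative to the number of draws, since removing one item then changes the composition very little.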

3.4 Bayes' Theorem

Consider a random experiment with n mutually exclusive, collectively exhaustive outcomes $E_1, E_2, \ldots, E_n$, and suppose that the respective probabilities assigned to these outcomes are

(*) $P(E_1), P(E_2), \ldots, P(E_n)$.

Suppose also that we are interested in updating these probabilities based on some new information. Let's denote the new information as an event A defined on the same sample space. We are interested then in

(**) $P(E_1 \mid A), P(E_2 \mid A), \ldots, P(E_n \mid A)$.

The probabilities in (*) are called the prior probabilities, and those in (**) are called the posterior probabilities. The problem is that the only information you have at your disposal is the following set of conditional probabilities

$P(A \mid E_1), P(A \mid E_2), \ldots, P(A \mid E_n)$.

For example, consider assessing the probability that a particular jet engine was made at one of a company's two plants. The possible events are

$E_1$ = engine made at plant 1

$E_2$ = engine made at plant 2

The known prior probabilities are

$P(E_1) = .4$

$P(E_2) = .6$

You would like to update these probabilities given the knowledge that the engine is defective; let this be denoted as the event A. So you want

$P(E_1 \mid A)$

but the only information you have is

$P(A \mid E_1) = .02$ = proportion of defectives among engines made at plant 1

$P(A \mid E_2) = .03$ = proportion of defectives among engines made at plant 2

From the definition of conditional probability we know that

$$P(E_1 \mid A) = \frac{P(A \cap E_1)}{P(A)}$$

and $P(A \cap E_1) = P(A \mid E_1)P(E_1)$. But we still don't know P(A), or do we? $A = (A \cap E_1) \cup (A \cap E_2)$, but because $E_1$ and $E_2$ are mutually exclusive, $(A \cap E_1)$ and $(A \cap E_2)$ are also mutually exclusive, so

$$P(A) = P(A \cap E_1) + P(A \cap E_2) = P(A \mid E_1)P(E_1) + P(A \mid E_2)P(E_2)$$

and

$$P(E_1 \mid A) = \frac{P(A \mid E_1)P(E_1)}{P(A \mid E_1)P(E_1) + P(A \mid E_2)P(E_2)} = \frac{(.02)(.4)}{(.02)(.4) + (.03)(.6)} = \frac{.008}{.026} \approx .3077$$

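In general then the following is true.

(2.64) THEOREM: (Bayes' Theorem) Let $E_1, E_2, \ldots, E_n$ be mutually exclusive and collectively exhaustive events with prior probabilities $P(E_1), P(E_2), \ldots, P(E_n)$. Let A be another event defined on the same sample space. Then the posterior probability of $E_i$ is given by

$$P(E_i \mid A) = \frac{P(A \mid E_i)P(E_i)}{\sum_{j=1}^{n} P(A \mid E_j)P(E_j)}.$$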
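Bayes' Theorem translates directly into a short computation: multiply each prior by the corresponding conditional probability of A, then divide by the sum of those products. The Python sketch below applies this to the jet engine example; the function name `posterior` and the list-based interface are illustrative assumptions.

```python
def posterior(priors, likelihoods):
    """Posterior P(E_i | A) from priors P(E_i) and likelihoods P(A | E_i).

    Assumes the events E_i are mutually exclusive and collectively
    exhaustive, so the priors sum to one.
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(A and E_i)
    p_a = sum(joint)                                       # P(A)
    return [j / p_a for j in joint]

priors = [0.4, 0.6]          # P(E_1), P(E_2): share of engines from each plant
likelihoods = [0.02, 0.03]   # P(A | E_1), P(A | E_2): defect rates by plant

post = posterior(priors, likelihoods)
print([round(p, 4) for p in post])   # [0.3077, 0.6923]
```

Learning that the engine is defective lowers the probability that it came from plant 1 from .4 to about .31, because plant 1 has the lower defect rate.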

Exercises

Chapter 2, Exercise 1

1. Suppose that we have the following four observations on x and y.

x    y

1    2

2    2

3    4

4    4

For each of the following, without using the x-y numbers, attempt to establish the validity of the equality using the three “Useful Properties of Summations” given in Section 1.3 of this chapter. If you suspect that a particular equality does not hold, plug in the above x-y values to confirm your suspicion.

(a) $\sum 2x_i = 2\sum x_i$

(b) $\sum 2x_i = 2(4)\sum x_i$

(c) $\sum (x_i + y_i) = \sum x_i + \sum y_i$

(d) $\sum 2x_i y_i = 2\sum x_i y_i$

(e) $\sum 2x_i y_i = 2(\sum x_i)(\sum y_i)$

(f) $\sum 2x_i^2 = 2\sum x_i^2$

(g) $\sum 2x_i^2 = 2(\sum x_i)^2$

(h) $\sum (2 + 3x_i + 4y_i) = 2 + 3\sum x_i + 4\sum y_i$

(i) $\sum (2 + 3x_i + 4y_i) = 2(4) + 3\sum x_i + 4\sum y_i$

(j) $\sum (y_i + 3) = \sum y_i + 3$


2. Consider a state in which there are ten colleges and universities. Tuition and fees (in dollars) at the institutions are as follows:

8315, 12124, 4239, 14138, 16050

9327, 7136, 9051, 13124, 9250

Calculate the mean and variance of this population of tuition levels.


3. Consider a college where tuition and fees are $17,000. The mean student receives a $2000 grant against tuition from the college. If total revenues (tuition net of grants) for the college are $22,500,000 for the year, how many students are enrolled at the college?


4. Show that:



5. There are 20 flower vendors in a city, with the following profits (in dollars) on a given day:

52.24, 68.01, 73.13, 38.26, 65.06, 58.26, 48.57, 53.42, 71.29, 64.13, 30.05, 48.27, 51.43, 52.67, 49.28, 48.16, 59.04, 63.23, 67.15, 70.21

Using cells with midpoints of 35, 45, 55, 65, and 75, graph the relative-frequency histogram for this variable.


6. The following data represent a sample of weekly wages (in dollars) earned by part-time employees at a department store:

80 98 75 69 81 88 78 96 70 88 85 88 75 58 97 67 61 52 76 81 83 70 98 83 85 90 64 95 63 82 108 109 100 96 92 100 73 94 105 78

a) Put the data in order.

b) Construct a frequency distribution using the classes $50 to under $60, $60 to under $70, and so forth.

c) Graph the histogram.

d) Construct and graph the relative frequency distribution.

e) Explain the difference between the height and area of a bar in the histogram.

f) Construct the cumulative frequency distribution.

g) Calculate the sample mean, sample variance, and sample standard deviation.

h) What is the median?

i) Find the first and third quartiles.

j) Compute the range and interquartile range for the data given in this problem.



7.

A. Let the values of $x_i$ be 5, 4, 2, 12, 12, 10, 6, 7, 9, and 4, respectively. Also let $y_i = 5x_i$. Calculate the following sums:

a. $\sum y_i$

b. $\sum 2y_i$

c. $\sum y_i^2$

B. Write the following values in summation notation:

a. $x_1 + x_2 + \cdots + x_{15}$

b. $5x_1 + 5x_2 + \cdots + 5x_{22}$

c. $x_1 + y_1 + x_2 + y_2 + \cdots + x_9 + y_9$

C. Are the following expressions true or false?

a. $(\sum x_i)(\sum y_i) = \sum x_i y_i$

b. $\sum x_i^2 = (\sum x_i)^2$

c. $\sum (x_i + y_i)^2 = \sum x_i^2 + \sum y_i^2$


8. Use a Venn diagram to illustrate $A \cap B^c$.


9. Let $A, B \subset U$ (the universe). Using Venn diagrams show that the following statements are true (DeMorgan’s Laws):

a) $(A \cup B)^c = A^c \cap B^c$

b) $(A \cap B)^c = A^c \cup B^c$


10. The menu on an airplane offers a choice of four drinks, three salads, five entrees, four vegetables, two kinds of potatoes, and five desserts. If a meal consists of one of each item, how many different meals are possible?


11. From a group of 5 men and 7 women, how many different committees consisting of 2 men and 3 women can be formed?


12. A publishing company publishes five different how-to books for the handy person, and it has a special offer of three books for $10. A woman has decided to buy three books and give them to her husband one at a time to entice him to make three desired home repairs. In how many ways can three books be selected and ordered from the list of five books?
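
Exercises 10 through 12 each apply one of the basic counting techniques: the multiplication rule, combinations, and permutations. As a hedged sketch, the counts can be checked with Python's `math` module (`math.prod`, `math.comb`, and `math.perm` require Python 3.8 or later); the variable names are illustrative.

```python
import math

# Exercise 10: multiplication rule, one choice from each menu category
# (drinks, salads, entrees, vegetables, potatoes, desserts).
choices_per_category = [4, 3, 5, 4, 2, 5]
meals = math.prod(choices_per_category)

# Exercise 11: choose 2 of 5 men and, independently, 3 of 7 women.
# Order within a committee does not matter, so combinations are used.
committees = math.comb(5, 2) * math.comb(7, 3)

# Exercise 12: select and order 3 of 5 books.
# Order matters, so permutations are used.
ordered_selections = math.perm(5, 3)

print(meals, committees, ordered_selections)
```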


13. Compute the probability of being dealt, at random and without replacement, a 5-card hand in which there are exactly three kings and two queens (from a standard deck of playing cards).


14. Compute the probability of being dealt, at random and without replacement, a 13-card bridge hand from a standard deck of playing cards consisting of:

(a) 6 spades, 4 hearts, 2 diamonds, and 1 club.

(b) 13 cards of the same suit.
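
For Exercises 13 and 14, every hand is equally likely, so each probability is the number of favorable hands divided by the total number of hands. The following Python sketch computes these ratios exactly with `fractions.Fraction`; the variable names are illustrative and the code is a check on hand calculation, not a substitute for the counting argument.

```python
from fractions import Fraction
from math import comb

# Exercise 13: 3 of the 4 kings and 2 of the 4 queens,
# out of all 5-card hands from a 52-card deck.
p_three_kings_two_queens = Fraction(comb(4, 3) * comb(4, 2), comb(52, 5))

# Exercise 14: all 13-card hands from a 52-card deck.
bridge_hands = comb(52, 13)

# (a) 6 spades, 4 hearts, 2 diamonds, 1 club (each suit has 13 cards).
p_6_4_2_1 = Fraction(
    comb(13, 6) * comb(13, 4) * comb(13, 2) * comb(13, 1), bridge_hands
)

# (b) All 13 cards from one suit: exactly one such hand per suit.
p_one_suit = Fraction(4, bridge_hands)

print(float(p_three_kings_two_queens), float(p_6_4_2_1), float(p_one_suit))
```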


15. Consider a circular dart board whose radius is two feet. If the bull’s-eye is six inches in diameter, what is the probability of getting a bull’s-eye with a randomly tossed dart? By randomly tossed I mean that the probability of the dart landing in any area of the dart board of a given size is the same, regardless of the shape of the area and where it is on the dart board.


16. An oil company that has purchased a square tract of land in Alaska 20 miles on a side is going to pick a site on the tract at random and drill a well. Assume that oil exists in two rectangular pools, each having dimensions 2 miles by 3 miles. If one well is drilled, what is the probability of striking oil?
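
In Exercises 15 and 16 the point is chosen uniformly over a region, so the probability of an event is the ratio of the favorable area to the total area. A minimal Python sketch of that ratio follows. It assumes, for Exercise 16, that the two pools do not overlap and lie entirely inside the tract, which the exercise does not state explicitly; the variable names are illustrative.

```python
from fractions import Fraction

# Exercise 15: work in inches so both lengths share a unit.
board_radius_in = 2 * 12            # radius of two feet
bullseye_radius_in = Fraction(6, 2)  # diameter of six inches
# Ratio of circle areas; the factor pi cancels.
p_bullseye = (bullseye_radius_in / board_radius_in) ** 2

# Exercise 16: areas in square miles.
tract_area = 20 * 20
pool_area = 2 * (2 * 3)  # assumption: two non-overlapping pools inside the tract
p_strike_oil = Fraction(pool_area, tract_area)

print(p_bullseye, p_strike_oil)
```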


17. Using Definition 2.45 prove Theorem (2.46).


18. Using Definition 2.45, and all theorems and definitions prior to (2.47), prove Theorem (2.47).












24. Consider three events A, B, and C that are all part of the same sample space. Develop a general formula for determining $P(A \cup B \cup C)$. [HINT: Use Theorem 2.50 more than once, and note that:

$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$

and

$(A \cap B) \cap (A \cap C) = A \cap B \cap C$ ].

To formally show that a set equality holds, you must show that any element in the left-hand-side set must also be in the right-hand-side set, and vice versa.


25. Formally show that the set equalities in Exercise 2 hold.

To formally show that $A \subseteq B$, you must show that any element of A must also be in B.


26. Formally show that the following statements are equivalent; that is, show that a) implies b) implies c):

a. $A \subseteq B$

b. $A \cup B = B$

c. $A \cap B = A$


27. A hand of 13 cards is to be dealt at random and without replacement from a standard deck of playing cards. Find the conditional probability that there are at least three kings in the hand relative to the hypothesis that the hand contains at least two kings.

28. Suppose there are four roads (A, B, C, and D) from Town X to Town Y, and three roads (E, F, and G) from Town Y to Town Z. Suppose a criminal travels from Town X to Town Y to Town Z, and a detective is asked which route the criminal took from Town X to Town Z. The detective does not know which route was taken and considers each possible route equally likely. Eventually the detective is forced to guess and hypothesizes that the criminal went via road A between Town X and Town Y and via road F from Town Y to Town Z. What is the probability that the detective is correct?

29. In the daily lottery, a number from 000 to 999 is picked randomly. Yesterday the number 463 won the lottery.

a. What is the probability that 463 wins today?

b. What is the probability that 463 does not win today?

c. Pick any integer from 000 to 999. What is the probability that your number will win the lottery today?

30. Of the employees for a supermarket chain, 60% are women, 50% work part-time, and 35% of the women work part-time.

a. What proportion of employees are females who work part-time?

b. What proportion of employees are males who work full-time?

31. A national credit card company is interested in getting more young female customers. In a recent survey, the company gathered data on the marital and employment status of 25-year-old women. Suppose that 60% of all 25-year-old women are married, 50% have full-time jobs, and 20% are both married and have full-time jobs. Given that a 25-year-old woman has a full-time job, what is the probability that she is married?

32. Mr. Wilson, the owner of a used car agency, meets about 45% of the agency's potential customers and makes a sale to about 60% of those he meets. The other salespeople meet the other 55% of the potential customers and make a sale to about 50% of these individuals.

a. What proportion of potential customers eventually buy a car?

b. What proportion of sales can be attributed to Mr. Wilson?

33. At the McVay Advertising Agency, 80% of newly hired employees quit within one year. A man and a woman are hired on the same day. Assume that their decisions to stay or leave are independent of one another. What is the probability of the following:

a. Both will work there more than a year.

b. At least one will work there more than a year.

c. Neither works there for a year.

34. An oil wildcatter has assigned a probability of .5 to striking oil on a certain plot of property. The wildcatter orders a seismic survey that has proven to be 90% reliable in the past. That is, when oil is present, it predicts favorably 90% of the time, and when no oil is present, it predicts no oil 90% of the time.

a. Given a favorable seismic result, what is the probability for oil?

b. Given an unfavorable seismic result, what is the probability for oil?
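
Exercise 34 is a direct application of Bayes' theorem with two hypotheses (oil, no oil) and one piece of evidence (the seismic result). The sketch below shows one way to organize the calculation in Python. The helper `posterior` and all variable names are hypothetical, introduced here only for illustration.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return P(H | E) by Bayes' theorem for a hypothesis H and its complement."""
    joint_h = prior * p_evidence_given_h                 # P(H and E)
    joint_not_h = (1 - prior) * p_evidence_given_not_h   # P(not H and E)
    return joint_h / (joint_h + joint_not_h)             # divide by P(E)


# Values stated in Exercise 34.
prior_oil = 0.5
p_favorable_given_oil = 0.9
p_unfavorable_given_no_oil = 0.9

# Part a): evidence is a favorable result.
p_oil_given_favorable = posterior(
    prior_oil, p_favorable_given_oil, 1 - p_unfavorable_given_no_oil
)

# Part b): evidence is an unfavorable result.
p_oil_given_unfavorable = posterior(
    prior_oil, 1 - p_favorable_given_oil, p_unfavorable_given_no_oil
)

print(p_oil_given_favorable, p_oil_given_unfavorable)
```

Because the prior is .5 and the survey is equally reliable in both directions, the posterior probabilities mirror the reliability figures; changing `prior_oil` shows how strongly the answer depends on the prior.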

35. A company introduced a new product and classified the responses of customers as favorable or unfavorable in four different cities. The proportions of responses were as follows:

               City
Response       New York  Boston    Chicago   Detroit
====================================================
Favorable      .15       .09       .17       .13
Unfavorable    .09       .13       .08       .16

a. What is the probability that a randomly chosen respondent is both favorable to the productand from New York?

b. Find the probability that a respondent from Boston reacts favorably to the product.

c. Is reaction to the product independent of city?

d. Given that a respondent is from Chicago, find the probability that the response is favorable.
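
The table in Exercise 35 is a joint probability distribution, so marginal probabilities are row and column sums, and conditional probabilities are joint probabilities divided by the relevant marginal. The following Python sketch lays out those operations; the dictionary layout and the helper `p_favorable_given` are illustrative choices, not part of the exercise.

```python
import math

cities = ["New York", "Boston", "Chicago", "Detroit"]

# Joint probabilities P(response and city) from the table in Exercise 35.
joint = {
    "Favorable": dict(zip(cities, [0.15, 0.09, 0.17, 0.13])),
    "Unfavorable": dict(zip(cities, [0.09, 0.13, 0.08, 0.16])),
}

# Marginal probabilities: sum across a row or down a column.
p_response = {r: sum(row.values()) for r, row in joint.items()}
p_city = {c: sum(joint[r][c] for r in joint) for c in cities}


def p_favorable_given(city):
    """Conditional probability P(Favorable | city)."""
    return joint["Favorable"][city] / p_city[city]


# Independence requires P(response and city) = P(response) * P(city)
# for every cell of the table.
independent = all(
    math.isclose(joint[r][c], p_response[r] * p_city[c], abs_tol=1e-9)
    for r in joint
    for c in cities
)

print(joint["Favorable"]["New York"])
print(p_favorable_given("Boston"), p_favorable_given("Chicago"))
print(independent)
```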

36. For a randomly selected person, define the following events:

A = {The event that the person watches NBC's TONIGHT SHOW}

B = {The event that the person watches Seinfeld}

If $P(A) = .3$, $P(A \cap B) = .2$, and $P(A \cup B) = .7$, then find:

a) $P(B)$

b) $P(A^c)$

c) $P(B^c)$

d) $P(A \cap B^c)$

e) $P(A^c \cap B)$

f) $P(A^c \cap B^c)$

g) $P(A \mid B)$

h) $P(A \mid B^c)$

i) $P(B \mid A)$

j) $P(A^c \mid B^c)$, where $A^c$ denotes the complement of A.

k) Are the events A and B independent?
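
Several parts of Exercise 36 follow from the addition rule, P(A or B) = P(A) + P(B) - P(A and B), together with the definition of conditional probability. The sketch below works a few of them in Python with exact fractions to avoid floating-point rounding; the variable names are illustrative and the remaining parts follow the same pattern.

```python
from fractions import Fraction

# Values given in Exercise 36.
p_a = Fraction(3, 10)
p_a_and_b = Fraction(2, 10)
p_a_or_b = Fraction(7, 10)

# Addition rule, rearranged to solve for P(B).
p_b = p_a_or_b - p_a + p_a_and_b

# A splits into the disjoint pieces (A and B) and (A and not B).
p_a_and_not_b = p_a - p_a_and_b
p_not_a_and_b = p_b - p_a_and_b

# (not A and not B) is the complement of (A or B).
p_neither = 1 - p_a_or_b

# Conditional probabilities from the definition P(X | Y) = P(X and Y) / P(Y).
p_a_given_b = p_a_and_b / p_b
p_b_given_a = p_a_and_b / p_a

# A and B are independent exactly when P(A and B) = P(A) * P(B).
independent = p_a_and_b == p_a * p_b

print(p_b, p_a_and_not_b, p_not_a_and_b, p_neither)
print(p_a_given_b, p_b_given_a, independent)
```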

37. Draw a Venn diagram depicting two independent events.

38. Prove that if health insurance coverage and visiting the physician are independent events, then the treatment effect of health insurance is zero.
