
Ismor Fischer, 11/5/2017 4.4-1

[Figure for Problem 2: circular wheel divided into four equal regions labeled $10, $20, $30, and $40.]

4.4 Problems

1. Patient noncompliance is one of many potential sources of bias in medical studies. Consider a study where patients are asked to take 2 tablets of a certain medication in the morning, and 2 tablets at bedtime. Suppose, however, that patients do not always fully comply and take both tablets at both times; it can also occur that only 1 tablet, or even none, is taken at either of these times.

(a) Explicitly construct the sample space S of all possible daily outcomes for a randomly selected

patient.

(b) Explicitly list the outcomes in the event that a patient takes at least one tablet at both times,

and calculate its probability, assuming that the outcomes are equally likely.

(c) Construct a probability table and corresponding probability histogram for the random variable

X = “the daily total number of tablets taken by a random patient.”

(d) Calculate the daily mean number of tablets taken.

(e) Suppose that the outcomes are not equally likely, but vary as follows:

# tablets    AM probability    PM probability
0            0.1               0.2
1            0.3               0.3
2            0.6               0.5

Rework parts (b)-(d) using these probabilities. Assume independence between AM and PM.
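As a rough check on parts (a) through (e), the sample space and the distribution of X can be enumerated directly. The sketch below is only illustrative: the variable names are hypothetical, outcomes are written as ordered pairs (AM tablets, PM tablets), and the probabilities are the ones tabulated in part (e) under the stated independence assumption.

```python
from itertools import product

# Marginal probabilities copied from the table in part (e).
am = {0: 0.1, 1: 0.3, 2: 0.6}
pm = {0: 0.2, 1: 0.3, 2: 0.5}

# Sample space S: ordered pairs (AM tablets, PM tablets).
sample_space = list(product(am, pm))

# Under independence, P(a, p) = P(AM = a) * P(PM = p).
joint = {(a, p): am[a] * pm[p] for a, p in sample_space}

# Event in part (b): at least one tablet at both times.
event_b = [(a, p) for a, p in sample_space if a >= 1 and p >= 1]
prob_b = sum(joint[outcome] for outcome in event_b)

# Distribution of X = daily total number of tablets.
pmf_x = {}
for (a, p), pr in joint.items():
    pmf_x[a + p] = pmf_x.get(a + p, 0.0) + pr

# Daily mean number of tablets taken.
mean_x = sum(x * pr for x, pr in pmf_x.items())

print(sample_space)
print(prob_b, pmf_x, mean_x)
```

For the equally likely case in parts (b) through (d), replacing every value in `am` and `pm` with 1/3 gives each of the nine outcomes probability 1/9.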

2. A statistician’s teenage daughter withdraws a certain amount of money X from an ATM every so

often, using a method that is unknown to him: she randomly spins a circular wheel that is equally

divided among four regions, each containing a specific dollar amount, as shown.

Bank statements reveal that over the past n = 80 ATM transactions, $10 was withdrawn thirteen times, $20 sixteen times, $30 nineteen times, and $40 thirty-two times. For this sample, construct a relative frequency table, and calculate the average amount \(\bar{x}\) withdrawn per transaction, and the variance \(s^2\).

Suppose this process continues indefinitely. Construct a probability table, and calculate the expected amount \(\mu\) withdrawn per transaction, and the variance \(\sigma^2\). (Verify that, for this sample, \(s^2\) and \(\sigma^2\) happen to be equal.)
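The comparison between the sample statistics and the population parameters can be sketched with exact fractions. This is a minimal illustration with hypothetical variable names; it assumes the sample variance uses the n - 1 divisor and that the four wheel regions are equally likely, as the problem states.

```python
from fractions import Fraction

# Observed withdrawals: dollar amount -> number of transactions.
counts = {10: 13, 20: 16, 30: 19, 40: 32}
n = sum(counts.values())

# Relative frequency table for the sample.
rel_freq = {x: Fraction(c, n) for x, c in counts.items()}

# Sample mean and sample variance (divisor n - 1).
x_bar = Fraction(sum(x * c for x, c in counts.items()), n)
s2 = sum(c * (x - x_bar) ** 2 for x, c in counts.items()) / (n - 1)

# Probability table for the wheel: four equally likely regions.
prob = {x: Fraction(1, 4) for x in counts}

# Population mean and variance.
mu = sum(x * p for x, p in prob.items())
sigma2 = sum((x - mu) ** 2 * p for x, p in prob.items())

print(rel_freq)
print(x_bar, s2, mu, sigma2)
```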


3. A youngster finds a broken clock, on which the hour and minute hands can be randomly spun at

the same time, independently of one another. Each hand can land in any one of the twelve equal

areas below, resulting in elementary outcomes in the form of ordered pairs (hour hand, minute

hand), e.g., (7, 11), as shown.

Let the simple events A = “hour hand lands on 7” and B = “minute hand lands on 11.”

(a) Calculate each of the following probabilities. Show all work!

P(A and B)

P(A or B)

(b) Let the discrete random variable X = “the product of the two numbers spun”. List all the

elementary outcomes that belong to the event C = “X = 36” and calculate its probability P(C).

(c) After playing for a little while, some of the numbers fall off, creating new areas, as shown. For

example, the configuration below corresponds to the ordered pair (9, 12). Now calculate P(C).


4. An amateur game player throws darts at the dartboard shown below, with each target area worth

the number of points indicated. However, because of the player’s inexperience, all of the darts

hit random points that are uniformly distributed on the dartboard.

(a) Let X = “points obtained per throw.” What is the sample space S of this experiment?

(b) Calculate the probability of each outcome in S. (Hint: The area of a circle is \(\pi r^2\).)

(c) What is the expected value of X, as darts are repeatedly thrown at the dartboard at random?

(d) What is the standard deviation of X?

Suppose that, if the total number of points in three independent random throws is exactly 100, the player wins a prize. With what probability does this occur? (Hint: For the random variable T = “total points in three throws,” calculate the probability of each “ordered triple” outcome \((X_1, X_2, X_3)\) in the event “T = 100.”)

5. Compare this problem with 2.5/10!

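Consider the binary population variable

\[ Y = \begin{cases} 1, & \text{with probability } \pi \\ 0, & \text{with probability } 1 - \pi \end{cases} \]

(see figure).

(a) Construct a probability table for this random variable.

(b) Show that the population mean \(\mu_Y = \pi\).

(c) Show that the population variance \(\sigma_Y^2 = \pi(1 - \pi)\).

Note that \(\pi\) controls both the mean and the variance!

[Figure for Problem 4: dartboard with target areas worth 10, 20, 30, 40, and 50 points; five labels of 1 also appear in the figure.]

[Figure for Problem 5: POPULATION, with one marker for Y = 1 and ◦ for Y = 0.]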


6. SLOT MACHINE

Wheel 1 Wheel 2 Wheel 3

A casino slot machine consists of three wheels, each with images of three types of fruit: apples,

bananas, and cherries. When a player pulls the handle, the wheels spin independently of one

another, until each one stops at a random image displayed in its window, as shown above. Thus,

the sample space S of possible outcomes consists of the 27 ordered triples shown below, where

events A = “Apple,” B = “Banana,” and C = “Cherries.”

(a) Complete the individual tables above, and use them to construct the probability table

(including the outcomes) for the discrete random variable X = “# Apples” that are displayed

when the handle is pulled. Show all work. (Hint: To make calculations easier, express

probabilities as fractions reduced to lowest terms, instead of as decimals.)

Outcome   Probability      Outcome   Probability      Outcome   Probability
A                          A                          A
B                          B                          B
C                          C                          C

X Outcomes Probability f(x)

(A A A), (A A B), (A A C), (A B A), (A B B), (A B C), (A C A), (A C B), (A C C)

(B A A), (B A B), (B A C), (B B A), (B B B), (B B C), (B C A), (B C B), (B C C)

(C A A), (C A B), (C A C), (C B A), (C B B), (C B C), (C C A), (C C B), (C C C)



(b) Sketch the corresponding probability histogram of X. Label all relevant features.

(c) Calculate the mean µ and variance σ2 of X. Show all work.

(d) Similar to X = “# Apples,” define random variables Y = “# Bananas” and Z = “# Cherries”

displayed in one play. The player wins if all three displayed images are of the same fruit.

Using these variables, calculate the probability of a win. Show all work.

(e) Suppose it costs one dollar to play this game once. The result is that either the player loses the

dollar, or if the player wins, the slot machine pays out ten dollars in coins. If the player

continues to play this game indefinitely, should he/she expect to win money, lose money, or

neither, in the long run? If win or lose money, how much per play? Show all work.

7. Formally prove that each of the following is a valid pmf or pdf. [Note: This is a rigorous

mathematical exercise.]

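(a) \( p_{\text{Bin}}(x) = \binom{n}{x}\,\pi^x (1 - \pi)^{n - x} \), x = 0, 1, 2, ..., n

(b) \( p_{\text{Poisson}}(x) = \dfrac{e^{-\lambda}\,\lambda^x}{x!} \), x = 0, 1, 2, ...

(c) \( f_{\text{Normal}}(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2} \), \(-\infty < x < +\infty\)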

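8. Formally prove each of the following, using the appropriate “expected value” definitions. [Note: As with the preceding problem, this is a rigorous mathematical exercise.]

(a) If X ~ Bin(n, \(\pi\)), then \(\mu = n\pi\) and \(\sigma^2 = n\pi(1 - \pi)\).

(b) If X ~ Poisson(\(\lambda\)), then \(\mu = \lambda\) and \(\sigma^2 = \lambda\).

(c) If X ~ N(\(\mu\), \(\sigma\)), then \(E[X] = \mu\) and \(\text{Var}(X) = \sigma^2\).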

9. For any p > 0, sketch the graph of \( f(x) = p\,x^{-(p+1)} \) for \(x \ge 1\) (and \(f(x) = 0\) for x < 1), and formally show that it is a valid density function. Then show the following.

If p > 2, then \(f(x)\) has finite mean \(\mu\) and finite variance \(\sigma^2\).

If \(1 < p \le 2\), then \(f(x)\) has finite mean \(\mu\) but infinite (i.e., undefined) variance.

If \(0 < p \le 1\), then \(f(x)\) has infinite (i.e., undefined) mean (and hence undefined variance).

[Note: As with the preceding problems, this is a rigorous mathematical exercise.]


10. This is a subtle problem that illustrates an important difference

between the normal distribution and many other distributions, the

binomial in particular. Consider a large group of populations of

males and females, such as all Wisconsin counties, and suppose that

the random variable Y = “Age (years)” is normally distributed in all

of them, each with some mean \(\mu\), and some variance \(\sigma^2\).

Clearly, there is no direct relationship between any \(\mu\) and its

corresponding \(\sigma^2\), as we range continuously from county to county.

(In fact, it is not unreasonable to assume that although the means may

be different, the variances – which, recall, are measures of “spread” –

might all be the same (or similar) throughout the counties. This is

known as equivariance, a concept that we will revisit in Chapter 6.)

Suppose that, instead of age, we are now concerned with the different proportion of males from one county to another, i.e., \(\pi = P(\text{Male})\). If we intend to select a random sample of n = 100 individuals from each county, then the random variable X = “Number of males” in each sample is binomially distributed, i.e., X ~ Bin(100, \(\pi\)), for \(0 \le \pi \le 1\). Answer each of the following.

If a county has no males, compute the mean \(\mu\), and variance \(\sigma^2\).

If a county has all males, compute the mean \(\mu\), and variance \(\sigma^2\).

If a county has males and females in equal proportions, compute the mean \(\mu\), and variance \(\sigma^2\).

Sketch an accurate graph of \(\sigma^2\) on the vertical axis, versus \(\pi\) on the horizontal axis, for n = 100 and \(0 \le \pi \le 1\), as we range continuously from county to county. Conclusions?

Note: Also see related problem 4.4/5.


11. Imagine that a certain disease occurs in a large population in such a way that the probability of a

randomly selected individual having the disease remains constant at \(\pi\) = .008, independent of any

other randomly selected individual having the disease. Suppose now that a sample of n = 500

individuals is to be randomly selected from this population. Define the discrete random variable

X = “the number of diseased individuals,” capable of assuming any value in the set {0, 1, 2, …,

500} for this sample.

(a) Calculate the probability distribution function p(x) = P(X = x) – “the probability that the

number of diseased individuals equals x” – for x = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. Do these

computations two ways: first, using the Binomial Distribution and second, using the Poisson

Distribution, and arrange these values into a probability table. (For the sake of comparison,

record at least five decimal places.)

Tip: Use the functions dbinom and dpois in R.

x       Binomial    Poisson
0
1
2
3
4
5
6
7
8
9
10
etc.    etc.        etc.

(b) Using either the Binomial or Poisson Distribution, what is the mean number of diseased

individuals to be expected in the sample, and what is its probability? How does this

probability compare with the probabilities of other numbers of diseased individuals?

(c) Suppose that, after sampling n = 500 individuals, you find that X = 10 of them actually have

this disease. Before performing any formal statistical tests, what assumptions – if any – might

you suspect have been violated in this scenario? What is the estimate of the probability \(\hat{\pi}\) of

disease, based on the data of this sample?
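The tip in part (a) points to the R functions dbinom and dpois. For readers working outside R, the same table can be sketched with the Python standard library. The helper names `binom_pmf` and `poisson_pmf` are hypothetical, and the Poisson mean is assumed to be matched to the binomial mean, i.e., n times .008.

```python
from math import comb, exp, factorial

n, prob = 500, 0.008
lam = n * prob  # Poisson mean matched to the binomial mean

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p); plays the role of R's dbinom."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam); plays the role of R's dpois."""
    return exp(-lam) * lam ** x / factorial(x)

print(" x  Binomial  Poisson")
for x in range(11):
    print(f"{x:2d}  {binom_pmf(x, n, prob):.5f}   {poisson_pmf(x, lam):.5f}")
```

The two columns should agree closely because n is large and the disease probability is small, which is the setting in which the Poisson distribution approximates the binomial.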

12.

(a) The discrete uniform density function of a fair die given in the notes has median and mean = 3.5,

by inspection. Calculate the variance.

(b) Repeat part (a) for the continuous uniform distribution of ages on [1, 6], given in the notes.


13. (These problems need not be solved using calculus, but if you must...)

(a) Let \( f(x) = \dfrac{x}{8} \) for \(0 \le x \le 4\), and = 0 elsewhere, as shown below left.

Confirm that f(x) is indeed a density function.

Determine the formula for the cumulative distribution function \( F(x) = P(X \le x) \), and sketch its graph. Recall that F(x) corresponds to the area under the density curve f(x) up to and including the value x, and therefore must increase monotonically and continuously from 0 to 1, as x increases.

Using F(x), calculate the probabilities \(P(X \le 1)\), \(P(X \le 3)\), and \(P(1 \le X \le 3)\).

Using F(x), calculate the quartiles Q1, Q2, and Q3.

(b) Repeat (a) for the function

\[ f(x) = \begin{cases} \dfrac{x}{6}, & 0 \le x < 2 \\[4pt] \dfrac{1}{3}, & 2 \le x \le 4 \end{cases} \]

and = 0 elsewhere, as shown below right.

14. Define the piecewise uniform function

\[ f(x) = \begin{cases} \frac{1}{8}, & 1 \le x < 3 \\[2pt] \frac{1}{4}, & 3 \le x \le 6 \end{cases} \]

(and = 0 elsewhere). Prove that this is a valid density function, sketch the cdf F(x), and find the median, mean, and variance.

[Figures for Problem 13: graphs of the two densities, labeled x/8 (left) and x/6, 1/3 (right).]


15. Suppose that the continuous random variable X = “age of juniors at the UW-Iwanagoeechees

campus” is symmetrically distributed about its mean, but piecewise linear as illustrated, rather

than being a normally distributed bell curve.

For an individual selected at random from this population, calculate each of the following.

(a) Verify by direct computation that \(P(18 \le X \le 22) = 1\), as it should be.

[Hint: Recall that the area of a triangle = ½ (base × height).]

(b) \(P(18 \le X < 18.5)\)

(c) \(P(18.5 < X \le 19)\)

(d) \(P(19.5 < X < 20.5)\)

(e) What symmetric interval about the mean contains exactly half the population values?

Express in terms of years and months.

16. Suppose that in a certain population of adult males, the variable X = “total serum cholesterol level (mg/dL)” is found to be normally distributed, with mean \(\mu\) = 220 and standard deviation \(\sigma\) = 40. For an individual selected at random, what is the probability that his cholesterol level is…

(a) under 190? under 210? under 230? under 250?

(b) over 240? over 270? over 300? over 330?

(c) Using the R command pnorm, redo parts (a) and (b). [Type ?pnorm for syntax help.

Ex: pnorm(q=190, mean=220, sd=40), or more simply, pnorm(190, 220, 40)]

(d) over 250, given that it is over 240? [Tip: See the last question in (a), and the first in (b).]

(e) between 214 and 276?

(f) between 202 and 238?

(g) Eighty-eight percent of men have a cholesterol level below what value? Hint: First find the approximate critical value of z that satisfies \(P(Z \le +z) = 0.88\), then change back to X.

(h) Using the R command qnorm, redo (g). [Type ?qnorm for syntax help.]

(i) What symmetric interval about the mean contains exactly half the population values? Hint: First find the approximate critical value of z that satisfies \(P(-z \le Z \le +z) = 0.5\), then change back to X.
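Parts (c) and (h) call for R's pnorm and qnorm. As a cross-check outside R, the same quantities can be sketched with `statistics.NormalDist` from the Python standard library, whose `cdf` and `inv_cdf` methods correspond to pnorm and qnorm. The variable names are hypothetical, and only a few representative parts are shown.

```python
from statistics import NormalDist

# X ~ N(mean 220, standard deviation 40), as given in the problem.
chol = NormalDist(mu=220, sigma=40)

# Part (a): P(X < 190); same role as pnorm(190, 220, 40) in R.
p_under_190 = chol.cdf(190)

# Part (b): upper-tail probabilities are complements of the cdf.
p_over_240 = 1 - chol.cdf(240)

# Part (d): conditional probability P(X > 250 | X > 240).
p_over_250_given_over_240 = (1 - chol.cdf(250)) / p_over_240

# Part (g): the value below which 88% of the population falls;
# same role as qnorm(0.88, 220, 40) in R.
x_88 = chol.inv_cdf(0.88)

# Part (i): symmetric interval about the mean holding half the population.
lower, upper = chol.inv_cdf(0.25), chol.inv_cdf(0.75)

print(p_under_190, p_over_240, p_over_250_given_over_240)
print(x_88, (lower, upper))
```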

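[Figure for Problem 15: piecewise linear density f(x) plotted against X, with axis marks at 0, 18, 19, \(\mu\) = 20, 21, and 22, and the height 1/3 marked on the vertical axis.]

Margin note for 16(c) and 16(h): Submit a copy of the output, and clearly show agreement of your answers!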


[Figure for Problem 17: density curves M ~ N(10, 2.5) and F ~ N(16, 5).]

17. A population biologist is studying a certain species of lizard, whose sexes appear alike, except for size. It is known that in the adult male population, length M is normally distributed with mean \(\mu_M\) = 10.0 cm and standard deviation \(\sigma_M\) = 2.5 cm, while in the adult female population, length F is normally distributed with mean \(\mu_F\) = 16.0 cm and standard deviation \(\sigma_F\) = 5.0 cm.

(a) Suppose that a single adult specimen of length 11 cm is captured at random, and its sex

identified as either a larger-than-average male, or a smaller-than-average female.

Calculate the probability that a randomly selected adult male is as large as, or larger

than, this specimen.

Calculate the probability that a randomly selected adult female is as small as, or smaller

than, this specimen.

Based on this information, which of these two events is more likely?

(b) Repeat part (a) for a second captured adult specimen, of length 12 cm.

(c) Repeat part (a) for a third captured adult specimen, of length 13 cm.


18. Consider again the male and female lizard populations in the previous problem.

(a) Answer the following.

Calculate the probability that the length of a randomly selected adult falls between the

two population means, i.e., between 10 cm and 16 cm, given that it is male.

Calculate the probability that the length of a randomly selected adult falls between the

two population means, i.e., between 10 cm and 16 cm, given that it is female.

(b) Suppose it is known that males are slightly less common than females; in particular, males

comprise 40% of the lizard population, and females 60%. Further suppose that the length

of a randomly selected adult specimen of unknown sex falls between the two population

means, i.e., between 10 cm and 16 cm.

Calculate the probability that it is a male.

Calculate the probability that it is a female.

Hint: Use Bayes’ Theorem.
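The Bayes' Theorem calculation in part (b) can be sketched numerically. This is an illustrative outline with hypothetical variable names, using the normal parameters from Problem 17 and the 40% / 60% split stated in part (b).

```python
from statistics import NormalDist

# Length distributions from Problem 17 (mean, standard deviation in cm).
male = NormalDist(10.0, 2.5)
female = NormalDist(16.0, 5.0)

# Prior probabilities of each sex, from part (b).
prior_m, prior_f = 0.4, 0.6

# Part (a): P(10 < length < 16 | sex).
like_m = male.cdf(16) - male.cdf(10)
like_f = female.cdf(16) - female.cdf(10)

# Law of total probability: P(10 < length < 16).
evidence = prior_m * like_m + prior_f * like_f

# Part (b): Bayes' Theorem gives the posterior probability of each sex.
post_m = prior_m * like_m / evidence
post_f = prior_f * like_f / evidence

print(like_m, like_f)
print(post_m, post_f)
```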

19. Bob spends the majority of a certain evening in his favorite drinking establishment.

Eventually, he decides to spend the rest of the night at the house of one of his two friends, each

of whom lives ten blocks away in opposite directions. However, being a bit intoxicated, he

engages in a so-called “random walk” of n = 10 blocks where, at the start of each block, he

first either turns and faces due west with probability 0.4, or independently, turns and faces due

east with probability 0.6, before continuing. Using this information, answer the following.

Hint: Let the discrete random variable X = “number of east turns in n = 10 blocks.” (0, 1, 2, 3, …, 10)

(a) Calculate the probability that he ends up at Al’s house.

(b) Calculate the probability that he ends up at Carl’s house.

(c) Calculate the probability that he ends up back where he started.

(d) How far, and in which direction, from where he started is he expected to end up, on average?

(Hint: Combine the expected number of east and west turns.)

(e) With what probability does this occur?

[Figure for Problem 19: ECK’S BAR on a West–East street, with Al’s house and Carl’s house ten blocks away in opposite directions.]


20.

(a) Let “X = # Heads” in n = 100 tosses of a fair coin (i.e., \(\pi\) = 0.5). Write but DO NOT EVALUATE an expression to calculate the probability \(P(X \le 45 \text{ or } X \ge 55)\).

(b) In R, type ?dbinom, and scroll down to Examples, where P(45 < X < 55) is computed for X ~ Binomial(100, 0.5). Copy, paste, and run the single line of code given, and use it to calculate the probability in (a).

(c) How does this compare with the corresponding probability on page 1.1-4?
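For readers without R at hand, the complement idea behind part (b) can be sketched in Python. The function name `pmf` is hypothetical; the sum over 46 through 54 plays the role of the P(45 < X < 55) computation described in part (b), and the event in part (a) is taken to be its complement.

```python
from math import comb

n = 100  # number of tosses of a fair coin

def pmf(x):
    """P(X = x) for X ~ Binomial(100, 0.5)."""
    return comb(n, x) / 2 ** n

# P(45 < X < 55): sum the pmf over x = 46, ..., 54.
p_inside = sum(pmf(x) for x in range(46, 55))

# The two-tailed event is the complement of the interval above.
p_outside = 1 - p_inside

print(p_inside, p_outside)
```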

21.

(a) How much overlap is there between the bell curves \(Z \sim N(0, 1)\) and \(X \sim N(2, 1)\)? (Take \(\mu = 2\) in the figure below.) That is, calculate the probability that a randomly selected population value is either in the upper tail of \(N(0, 1)\), or in the lower tail of \(N(2, 1)\). Hint: Where on the horizontal axis do the two curves cross in this case?

(b) Suppose \(X \sim N(\mu, 1)\) for a general \(\mu\); see figure. How close to 0 does the mean \(\mu\) have to be, in order for the overlap between the two distributions to be equal to 20%? 50%? 80%?

[Figure for Problem 21: bell curves Z and X, centered at 0 and \(\mu\), each marked with standard deviation 1.]


22. Consider the following two modified Cauchy distributions.

(a) “Truncated” Cauchy: \( f(x) = \dfrac{2}{\pi}\,\dfrac{1}{1 + x^2} \) for \(-1 \le x \le 1\) (and \(f(x) = 0\) otherwise). Show that this is a valid density function, and sketch its graph. Find the cdf \(F(x)\), and sketch its graph. Find the mean and variance.

(b) “One-sided” Cauchy: \( f(x) = \dfrac{2}{\pi}\,\dfrac{1}{1 + x^2} \) for \(x \ge 0\) (and \(f(x) = 0\) otherwise). Show that this is a valid density function, and sketch its graph. Find the cdf \(F(x)\), and sketch its graph. Find the median. Does the mean exist?

23. Suppose that the random variable X = “time-to-failure (yrs)” of a standard model of a medical implant device is known to follow a uniform distribution over ten years, and therefore corresponds to the density function \(f_1(x) = 0.1\) for \(0 \le x \le 10\) (and zero otherwise). A new model of the same implant device is tested, and determined to correspond to a time-to-failure density function \(f_2(x) = .009x^2 - .08x + 0.2\) for \(0 \le x \le 10\) (and zero otherwise). See figure.

(a) Verify that \(f_1(x)\) and \(f_2(x)\) are indeed legitimate density functions.

(b) Determine and graph the corresponding cumulative distribution functions \(F_1(x)\) and \(F_2(x)\).

(c) Calculate the probability that each model fails within the first five years of operation.

(d) Calculate the median failure time of each model.

(e) How do \(F_1(x)\) and \(F_2(x)\) compare? In particular, is one model always superior during the entire ten years, or is there a time in \(0 \le x \le 10\) when a switch occurs in which model outperforms the other, and if so, when (and which model) is it? Be as specific as possible.

[Figure for Problem 23: graphs of the densities \(f_1(x)\) and \(f_2(x)\).]
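Parts (b) through (e) can be explored numerically once the two cdfs are written down. The sketch below is only one possible approach: `F1` and `F2` are obtained by integrating the stated densities term by term from 0, and `bisect_root` is a hypothetical helper that locates a median by simple bisection.

```python
def F1(x):
    """cdf of the standard model on [0, 10]: integral of 0.1."""
    return 0.1 * x

def F2(x):
    """cdf of the new model on [0, 10]: integral of .009 t^2 - .08 t + 0.2."""
    return 0.003 * x ** 3 - 0.04 * x ** 2 + 0.2 * x

def bisect_root(g, lo, hi, tol=1e-10):
    """Find x in [lo, hi] with g(x) = 0, assuming g(lo) < 0 < g(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Part (a): each density integrates to 1 over [0, 10].
print(F1(10), F2(10))

# Part (c): probability of failure within the first five years.
print(F1(5), F2(5))

# Part (d): medians solve F(x) = 0.5.
median_1 = bisect_root(lambda x: F1(x) - 0.5, 0, 10)
median_2 = bisect_root(lambda x: F2(x) - 0.5, 0, 10)
print(median_1, median_2)
```

For part (e), comparing `F1(x)` and `F2(x)` over a grid of x values shows where the two curves cross.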


24. Suppose that a certain random variable X follows a Poisson distribution with mean \(\alpha\) cases – i.e., \(X_1\) ~ Poisson(\(\alpha\)) – in the first year, then independently, follows a Poisson distribution with mean \(\beta\) cases – i.e., \(X_2\) ~ Poisson(\(\beta\)) – in the second year. Then it should seem intuitively correct that the sum \(X_1 + X_2\) follows a Poisson distribution with mean \(\alpha + \beta\) cases – i.e., \(X_1 + X_2\) ~ Poisson(\(\alpha + \beta\)) – over the entire two-year period. Formally prove that this is indeed true. (In other words, the sum of two independent Poisson variables is also a Poisson variable.)

25. [Note: The result of the previous problem might be useful for part (e).] Suppose the occurrence

of a rare disease in a certain population is known to follow a Poisson distribution, with an

average of λ = 2.3 cases per year. In a typical year, what is the probability that…

(a) no cases occur?

(b) exactly one case occurs?

(c) exactly two cases occur?

(d) three or more cases occur?

(e) Answer (a)-(d) for a typical two-year period. (Assume independence from year to year.)

(f) Use the function dpois in R to redo (a), (b), and (c), and include the output as part of your

submitted assignment, clearly showing agreement of your answers.

(g) Use the function ppois in R to redo (d), and include the output as part of your submitted

assignment, clearly showing agreement of your answer.
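Parts (f) and (g) ask for R's dpois and ppois. A rough standard-library equivalent in Python is sketched below; the helper names simply mirror the R functions and are not part of any Python library. The two-year mean in part (e) is assumed to be the sum of the yearly means, which is the result of the previous problem.

```python
from math import exp, factorial

def dpois(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam ** x / factorial(x)

def ppois(q, lam):
    """P(X <= q) for X ~ Poisson(lam)."""
    return sum(dpois(x, lam) for x in range(q + 1))

lam = 2.3  # average cases per year

# Parts (a)-(c): no cases, exactly one case, exactly two cases.
p0, p1, p2 = dpois(0, lam), dpois(1, lam), dpois(2, lam)

# Part (d): three or more cases is the complement of two or fewer.
p3_plus = 1 - ppois(2, lam)

# Part (e): over two independent years the mean is 2.3 + 2.3.
lam_two_years = 2 * lam
p3_plus_two_years = 1 - ppois(2, lam_two_years)

print(p0, p1, p2, p3_plus)
print(p3_plus_two_years)
```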

26.

(a) Population 1 consists of individuals whose ages are uniformly distributed in [0, 50) years old.

What is the mean age of the population?

What proportion of the population is in the interval [30, 50) years old?

(b) Population 2 consists of individuals whose ages are uniformly distributed in [50, 90) years old.

What is the mean age of the population?

What proportion of the population is in the interval [50, 80) years old?

(c) Suppose the two populations are combined into a single population.

What is the mean age of the population?

What proportion of the population is in the interval [30, 80) years old?


27. Let X be a discrete random variable on a population, with corresponding probability mass function \(p(x)\), i.e., P(X = x). Then recall that the population mean, or expectation, of X is defined as

\[ \text{Mean}(X) = \mu = E[X] = \sum_{\text{all } x} x \, p(x), \]

and the population variance of X is defined as

\[ \text{Var}(X) = \sigma^2 = E\left[(X - \mu)^2\right] = \sum_{\text{all } x} (x - \mu)^2 \, p(x). \]

(NOTE: Also recall that if X is a continuous random variable with probability density function \(f(x)\), all of the definitions above – as well as those that follow – can be modified simply by replacing the summation sign ∑ by an integral symbol ∫ over all population values x. For example, \( \text{Mean}(X) = \mu = E[X] = \int_{-\infty}^{+\infty} x \, f(x) \, dx \), etc.)

Now suppose we have two such random variables X and Y, with corresponding joint distribution function \(p(x, y)\), i.e., \(P(X = x, Y = y)\). Then in addition to the individual means \(\mu_X, \mu_Y\) and variances \(\sigma_X^2, \sigma_Y^2\) above, we can also define the population covariance between X and Y:

\[ \text{Cov}(X, Y) = \sigma_{XY} = E\left[(X - \mu_X)(Y - \mu_Y)\right] = \sum_{\text{all } x} \sum_{\text{all } y} (x - \mu_X)(y - \mu_Y) \, p(x, y). \]

Example: A sociological study investigates a certain population of married couples, with

random variables X = “number of husband’s former marriages (0, 1, or 2)” and Y = “number of

wife’s former marriages (0 or 1).” Suppose that the joint probability table is given below.

                               X = # former marriages (Husbands)
                                  0       1       2
Y = # former             0       .19     .20     .01      .40
marriages (Wives)        1       .01     .10     .49      .60
                                 .20     .30     .50     1.00

For instance, the probability p(0, 0) = P(X = 0, Y = 0) = .19, i.e., neither spouse was previously married in 19% of this population of married couples. Similarly, p(2, 1) = P(X = 2, Y = 1) = .49, i.e., in 49% of this population, the husband was married twice before, and the wife once before, etc.

The individual distribution functions \(p_X(x)\) for X, and \(p_Y(y)\) for Y, correspond to the so-called marginal distributions of the joint distribution \(p(x, y)\), as will be seen in the upcoming example.


From their joint distribution above, we can read off the marginal distributions of X and Y :

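X     \(p_X(x)\)        Y     \(p_Y(y)\)
0     0.2               0     0.4
1     0.3               1     0.6
2     0.5                     1.0
      1.0

from which we can compute the corresponding population means and population variances:

\[ \mu_X = (0)(0.2) + (1)(0.3) + (2)(0.5), \quad \text{i.e., } \mu_X = 1.3 \]

\[ \mu_Y = (0)(0.4) + (1)(0.6), \quad \text{i.e., } \mu_Y = 0.6 \]

\[ \sigma_X^2 = (0 - 1.3)^2 (0.2) + (1 - 1.3)^2 (0.3) + (2 - 1.3)^2 (0.5), \quad \text{i.e., } \sigma_X^2 = 0.61 \]

\[ \sigma_Y^2 = (0 - 0.6)^2 (0.4) + (1 - 0.6)^2 (0.6), \quad \text{i.e., } \sigma_Y^2 = 0.24. \]

But now, we can also compute the population covariance between X and Y, using their joint distribution:

\[ \begin{aligned} \sigma_{XY} ={} & (0 - 1.3)(0 - 0.6)\underbrace{(.19)}_{p(0,0)} + (1 - 1.3)(0 - 0.6)\underbrace{(.20)}_{p(1,0)} + (2 - 1.3)(0 - 0.6)\underbrace{(.01)}_{p(2,0)} \\ & + (0 - 1.3)(1 - 0.6)\underbrace{(.01)}_{p(0,1)} + (1 - 1.3)(1 - 0.6)\underbrace{(.10)}_{p(1,1)} + (2 - 1.3)(1 - 0.6)\underbrace{(.49)}_{p(2,1)}, \end{aligned} \]

i.e., \(\sigma_{XY} = 0.30\).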

(A more meaningful context for the covariance will be discussed in Chapter 7.)
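The marginal distributions, means, variances, and covariance in this example can be reproduced with a short script. The sketch below uses hypothetical helper names and stores the joint table as a dictionary keyed by (x, y); the same functions could be applied to the tables in parts (a) and (b) by swapping in their cell probabilities.

```python
# Joint probability table from the example, keyed by (x, y).
joint = {
    (0, 0): 0.19, (1, 0): 0.20, (2, 0): 0.01,
    (0, 1): 0.01, (1, 1): 0.10, (2, 1): 0.49,
}

def marginal(joint, axis):
    """Marginal pmf of X (axis 0) or Y (axis 1): sum over the other variable."""
    out = {}
    for outcome, pr in joint.items():
        out[outcome[axis]] = out.get(outcome[axis], 0.0) + pr
    return out

def mean(pmf):
    """Population mean: sum of value times probability."""
    return sum(v * pr for v, pr in pmf.items())

def variance(pmf):
    """Population variance: sum of squared deviation times probability."""
    m = mean(pmf)
    return sum((v - m) ** 2 * pr for v, pr in pmf.items())

def covariance(joint):
    """Population covariance, using the double-sum definition."""
    mx = mean(marginal(joint, 0))
    my = mean(marginal(joint, 1))
    return sum((x - mx) * (y - my) * pr for (x, y), pr in joint.items())

p_x, p_y = marginal(joint, 0), marginal(joint, 1)
print(mean(p_x), mean(p_y), variance(p_x), variance(p_y), covariance(joint))
```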

(a) Recall that two events A and B are statistically independent if \(P(A \cap B) = P(A)\,P(B)\). Therefore, in this context, two discrete random variables X and Y are statistically independent if, for all population values x and y, we have \(P(X = x, Y = y) = P(X = x)\,P(Y = y)\). That is, \(p(x, y) = p_X(x)\,p_Y(y)\), i.e., the joint probability distribution is equal to the product of the marginal distributions. However, it then follows from the covariance definition above, that

\[ \sigma_{XY} = \sum_{\text{all } x} \sum_{\text{all } y} (x - \mu_X)(y - \mu_Y)\,\underbrace{p_X(x)\,p_Y(y)}_{p(x,\,y)} = \left[\sum_{\text{all } x} (x - \mu_X)\,p_X(x)\right] \left[\sum_{\text{all } y} (y - \mu_Y)\,p_Y(y)\right] = 0, \]

since each of the two factors in this product is the sum of the deviations of the variable from its respective mean, hence = 0. Consequently, we have the important property that

If X and Y are statistically independent, then Cov(X, Y) = 0.


Verify that this statement is true for the joint probability table below.

                               X = # former marriages (Husbands)
                                  0       1       2
Y = # former             0       .08     .12     .20      .40
marriages (Wives)        1       .12     .18     .30      .60
                                 .20     .30     .50     1.00

That is, first confirm that X and Y are statistically independent, by showing that each cell

probability is equal to the product of the corresponding row marginal and column marginal

probabilities (as in Chapter 3). Then, using the previous example as a guide, compute the

covariance, and show that it is equal to zero.

(b) The converse of the statement in (a), however, is not necessarily true! For the table below,

show that Cov(X, Y) = 0, but X and Y are not statistically independent.

                               X = # former marriages (Husbands)
                                  0       1       2
Y = # former             0       .13     .02     .25      .40
marriages (Wives)        1       .07     .28     .25      .60
                                 .20     .30     .50     1.00


28. Using the joint distribution p(x, y), we can also define the sum X + Y and difference X – Y of two discrete random variables in a natural way, as follows.

X + Y = {x + y | x ∈ X, y ∈ Y}          X – Y = {x – y | x ∈ X, y ∈ Y}

That is, the variable X + Y consists of all possible sums x + y, where x comes from the population

distribution of X, and y comes from the population distribution of Y. Likewise, the variable X – Y

consists of all possible differences x – y, where x comes from the population distribution of X,

and y comes from the population distribution of Y. The following important statements can then

be easily proved, from the algebraic properties of mathematical expectation given in the notes.

(Exercise)

Example (cont’d): Again consider the first joint probability table in the previous problem:

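X = # former marriages (Husbands), Y = # former marriages (Wives)

| | X = 0 | X = 1 | X = 2 | Total |
|---|---|---|---|---|
| Y = 0 | .19 | .20 | .01 | .40 |
| Y = 1 | .01 | .10 | .49 | .60 |
| Total | .20 | .30 | .50 | 1.00 |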

We are particularly interested in studying D = X – Y, the difference between these two variables.

As before, we reproduce their respective marginal distributions below. In order to construct a

probability table for D, we must first list all the possible (x, y) ordered-pair outcomes in the

sample space, but use the joint probability table to calculate the corresponding probability values:

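| x | $p_X(x)$ |
|---|---|
| 0 | 0.2 |
| 1 | 0.3 |
| 2 | 0.5 |
| Total | 1.0 |

| y | $p_Y(y)$ |
|---|---|
| 0 | 0.4 |
| 1 | 0.6 |
| Total | 1.0 |

| D = X – Y | Outcomes (x, y) | $p_D(d)$ |
|---|---|---|
| –1 | (0, 1) | .01 |
| 0 | (0, 0), (1, 1) | .29 = .19 + .10 |
| 1 | (1, 0), (2, 1) | .69 = .20 + .49 |
| 2 | (2, 0) | .01 |
| Total | | 1.0 |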

I.

(A) Mean(X + Y) = Mean(X) + Mean(Y)

(B) Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)

II.

(A) Mean(X – Y) = Mean(X) – Mean(Y)

(B) Var(X – Y) = Var(X) + Var(Y) – 2 Cov(X, Y)


We are now able to compute the population mean and variance of the variable D:

$\mu_D = (-1)(.01) + (0)(.29) + (1)(.69) + (2)(.01)$,

i.e., $\mu_D = 0.7$

$\sigma_D^2 = (-1 - 0.7)^2 (.01) + (0 - 0.7)^2 (.29) + (1 - 0.7)^2 (.69) + (2 - 0.7)^2 (.01)$,

i.e., $\sigma_D^2 = 0.25$

To verify properties II(A) and II(B) above, we can use the calculations already done in the previous problem, i.e., $\mu_X = 1.3$, $\mu_Y = 0.6$, $\sigma_X^2 = 0.61$, $\sigma_Y^2 = 0.24$, and $\sigma_{XY} = 0.30$.

Mean(X – Y) = 0.7 = 1.3 – 0.6 = Mean(X) – Mean(Y)

Var(X – Y) = 0.25 = 0.61 + 0.24 – 2(0.30) = Var(X) + Var(Y) – 2 Cov(X, Y)

Using this example as a guide, verify properties II(A) and II(B) for the tables in part (a) and part

(b) of the previous problem. These properties are extremely important, and will be used in §6.2.
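The worked example for D = X – Y can be reproduced with a few lines of code. This is only a sketch under the assumption that the joint table is stored as a dictionary keyed by (x, y); the helper name `mean_var` is hypothetical and not from the notes.

```python
from collections import defaultdict

# Joint table from the example, keyed by (x, y)
joint = {(0, 0): .19, (1, 0): .20, (2, 0): .01,
         (0, 1): .01, (1, 1): .10, (2, 1): .49}

def mean_var(pmf):
    """Return (mean, variance) of a pmf given as {value: probability}."""
    mu = sum(v * p for v, p in pmf.items())
    var = sum((v - mu) ** 2 * p for v, p in pmf.items())
    return mu, var

# Accumulate the marginals of X and Y and the distribution of D = X - Y
p_x, p_y, p_d = defaultdict(float), defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():
    p_x[x] += p
    p_y[y] += p
    p_d[x - y] += p

mu_x, var_x = mean_var(p_x)
mu_y, var_y = mean_var(p_y)
mu_d, var_d = mean_var(p_d)
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())

print("p_D:", {d: round(p, 2) for d, p in sorted(p_d.items())})
print("II(A):", round(mu_d, 4), "vs", round(mu_x - mu_y, 4))
print("II(B):", round(var_d, 4), "vs", round(var_x + var_y - 2 * cov, 4))
```

Replacing `joint` with either table from the previous problem gives a quick check of the hand calculation requested in this exercise.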

29. On his way to work every morning, Bob first takes the bus from his house, exits near his

workplace, and walks the remaining distance. His time spent on the bus (X) is a random

variable that follows a normal distribution, with mean µ = 20 minutes, and standard deviation σ = 2 minutes, i.e., X ~ N(20, 2). Likewise, his walking time (Y) is also a random variable that follows a normal distribution, with mean µ = 10 minutes, and standard deviation σ = 1.5 minutes, i.e., Y ~ N(10, 1.5). Find the probability that Bob arrives at his workplace in 35

minutes or less. [Hint: Total time = X + Y ~ N(?, ?). Recall the “General Fact” on page

4.1-13, which is true for both discrete and continuous random variables.]
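One way to follow the hint is sketched here with Python's standard-library `statistics.NormalDist`. The sketch assumes that X and Y are independent, which the problem does not state explicitly, so that Cov(X, Y) = 0 and the variances simply add; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

bus = NormalDist(mu=20, sigma=2)      # X, minutes spent on the bus
walk = NormalDist(mu=10, sigma=1.5)   # Y, minutes spent walking

# Assumption (not stated in the problem): X and Y are independent,
# so Var(X + Y) = Var(X) + Var(Y) with no covariance term.
total_mean = bus.mean + walk.mean
total_sd = sqrt(bus.variance + walk.variance)
total = NormalDist(mu=total_mean, sigma=total_sd)

p_on_time = total.cdf(35)  # P(X + Y <= 35)
print(f"X + Y ~ N({total_mean}, {total_sd}); P(X + Y <= 35) = {p_on_time:.4f}")
```

Note that the standard deviations themselves do not add: under this assumption the total has standard deviation 2.5 minutes, not 3.5.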

30. The arrival time of my usual morning bus (B) is normally distributed, with a mean ETA at 8:00

AM, and a standard deviation of 4 minutes. My arrival time (A) at the bus stop is also normally

distributed, with a mean ETA at 7:50 AM, and a standard deviation of 3 minutes.

(a) With what probability can I expect to catch the bus? (Hint: What is the distribution of the

random variable X = A – B, and what must be true about X in the event that I catch the bus?)

(b) On average, how much earlier should I arrive, if I expect to catch the bus with 99%

probability?
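The hint in part (a) can be sketched as follows, measuring all times in minutes relative to 8:00 AM. Two assumptions are made that the problem leaves implicit: A and B are independent, and in part (b) only my mean arrival time shifts while both standard deviations stay the same. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Times in minutes relative to 8:00 AM
mean_a, sd_a = -10.0, 3.0   # my arrival A: mean 7:50 AM
mean_b, sd_b = 0.0, 4.0     # bus arrival B: mean 8:00 AM

# Assumption: A and B are independent, so Var(A - B) = Var(A) + Var(B)
sd_x = sqrt(sd_a ** 2 + sd_b ** 2)
x = NormalDist(mu=mean_a - mean_b, sigma=sd_x)   # X = A - B

# (a) I catch the bus when I arrive before it does, i.e. X < 0
p_catch = x.cdf(0)

# (b) Find the mean of X for which P(X < 0) = 0.99:
#     (0 - mean) / sd_x = z_0.99, so mean = -z_0.99 * sd_x
z99 = NormalDist().inv_cdf(0.99)
required_mean = -z99 * sd_x
extra_minutes = (mean_a - mean_b) - required_mean

print(f"(a) P(catch) = {p_catch:.4f}")
print(f"(b) arrive about {extra_minutes:.2f} minutes earlier on average")
```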



31. Discrete vs. Continuous

(a) Discrete: General. Imagine a flea starting from initial position X = 0, only able to move by

making integer jumps X = 1, X = 2, X = 3, X = 4, X = 5, or X = 6, according to the following

probability table and corresponding probability histogram.

| x | p(x) |
|---|---|
| 0 | .05 |
| 1 | .10 |
| 2 | .20 |
| 3 | .30 |
| 4 | .20 |
| 5 | .10 |
| 6 | .05 |

Confirm that P(0 ≤ X ≤ 6) = 1, i.e., this is indeed a legitimate probability distribution.

Calculate the probability P(2 ≤ X ≤ 4).

Determine the mean µ and standard deviation σ of this distribution.

(b) Discrete: Binomial. Now imagine a flea starting from initial position X = 0, only able to

move by making integer jumps X = 1, X = 2, …, X = 6, according to a binomial distribution,

with π = 0.5. That is, X ~ Bin(6, 0.5).

| x | p(x) |
|---|---|
| 0 | |
| 1 | |
| 2 | |
| 3 | |
| 4 | |
| 5 | |
| 6 | |

Complete the probability table above, and confirm that P(0 ≤ X ≤ 6) = 1.

Calculate the probability P(2 ≤ X ≤ 4).

Determine the mean µ and standard deviation σ of this distribution.
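The same three calculations are requested in parts (a) and (b), so they can be sketched with one helper applied to both distributions. The function name `summarize` and the variable `p_success` are assumptions made for this illustration; the binomial probabilities use the usual formula C(n, x) p^x (1 − p)^(n − x).

```python
from math import comb, sqrt

def summarize(pmf):
    """Return (total probability, P(2 <= X <= 4), mean, standard deviation)."""
    total = sum(pmf.values())
    p_mid = sum(p for x, p in pmf.items() if 2 <= x <= 4)
    mu = sum(x * p for x, p in pmf.items())
    sigma = sqrt(sum((x - mu) ** 2 * p for x, p in pmf.items()))
    return total, p_mid, mu, sigma

# Part (a): the probability table given in the problem
general = {0: .05, 1: .10, 2: .20, 3: .30, 4: .20, 5: .10, 6: .05}

# Part (b): X ~ Bin(6, 0.5)
n, p_success = 6, 0.5
binomial = {x: comb(n, x) * p_success ** x * (1 - p_success) ** (n - x)
            for x in range(n + 1)}

for name, pmf in (("general", general), ("Bin(6, 0.5)", binomial)):
    total, p_mid, mu, sigma = summarize(pmf)
    print(f"{name}: total={total:.4f}, P(2<=X<=4)={p_mid:.4f}, "
          f"mean={mu:.4f}, sd={sigma:.4f}")
```

Both distributions are symmetric about 3, so they share the same mean, but they differ in spread and therefore in P(2 ≤ X ≤ 4).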


(c) Continuous: General. Next imagine an ant starting from initial position X = 0, able to move by crawling to any position in the interval [0, 6], according to the following probability density curve.

$$f(x) = \begin{cases} \dfrac{x}{9}, & \text{if } 0 \le x \le 3 \\[4pt] \dfrac{6 - x}{9}, & \text{if } 3 < x \le 6 \end{cases}$$

Confirm that P(0 ≤ X ≤ 6) = 1, i.e., this is indeed

a legitimate probability density.

Calculate the probability P(2 ≤ X ≤ 4).

What distance is the ant able to pass only 2% of the time? That is, P(X ≥ ?) = .02.

(d) Continuous: Normal. Finally, imagine an ant starting from initial position X = 0, able to move by crawling to any position in the interval [0, 6], according to the normal probability curve, with mean µ = 3, and standard deviation σ = 1. That is, X ~ N(3, 1).

Calculate the probability P(2 ≤ X ≤ 4).

What distance is the ant able to pass only 2% of the time? That is, P(X ≥ ?) = .02.
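Parts (c) and (d) can be compared numerically. The sketch below assumes the density in part (c) is the triangle f(x) = x/9 on [0, 3] and f(x) = (6 − x)/9 on (3, 6], whose cumulative distribution function is x²/18 on the left half and 1 − (6 − x)²/18 on the right half. The function names `tri_pdf` and `tri_cdf` are hypothetical; part (d) uses the standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def tri_pdf(x):
    """Assumed triangular density from part (c)."""
    if 0 <= x <= 3:
        return x / 9
    if 3 < x <= 6:
        return (6 - x) / 9
    return 0.0

def tri_cdf(x):
    """P(X <= x), obtained by integrating tri_pdf piece by piece."""
    if x <= 0:
        return 0.0
    if x <= 3:
        return x ** 2 / 18
    if x <= 6:
        return 1 - (6 - x) ** 2 / 18
    return 1.0

# Midpoint-rule check that the total area under the density is 1
steps = 6000
h = 6 / steps
area = sum(tri_pdf((i + 0.5) * h) for i in range(steps)) * h

tri_mid = tri_cdf(4) - tri_cdf(2)   # P(2 <= X <= 4)
tri_cut = 6 - sqrt(18 * 0.02)       # solves (6 - c)^2 / 18 = 0.02

# Part (d): X ~ N(3, 1)
ant = NormalDist(mu=3, sigma=1)
norm_mid = ant.cdf(4) - ant.cdf(2)
norm_cut = ant.inv_cdf(0.98)        # P(X >= c) = 0.02 means P(X <= c) = 0.98

print(f"(c) area={area:.6f}, P(2<=X<=4)={tri_mid:.4f}, cutoff={tri_cut:.4f}")
print(f"(d) P(2<=X<=4)={norm_mid:.4f}, cutoff={norm_cut:.4f}")
```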


32. Suppose that two coins – one with probability of heads P(H) = 0.6 and tails P(T) = 0.4, the

other with probability of heads P(H) = 0.3 and tails P(T) = 0.7 – are repeatedly tossed

according to the following rule: I will randomly toss one of the two coins, depending on

whether my goofy parrot squawks “Hello” or not, within a short predetermined timeframe. If

he does, I toss the first coin; otherwise, if he does not, I toss the second coin. (Assume I have

no reliable way of predicting whether or not he will squawk at any given moment.)

(a) Can the resulting outcomes of Heads and Tails be considered a sequence of Bernoulli trials?

Clearly explain why or why not.

Suppose my parrot is being trained to squawk “Hello” on command, by offering him a favorite

food treat as an incentive to comply, within the same timeframe.

(b) Can the resulting outcomes of Heads and Tails be considered a sequence of Bernoulli trials?

Clearly explain why or why not.

33.

(a) The ages of employees in a certain workplace are normally distributed. It is known that 80% of

the workers are under 65 years old, and 67% are under 55 years old. What percentage of the

workers are under 45 years old? (Hint: First find µ and σ by calculating the z-scores.)

(b) Suppose it is known that the wingspan X of the males of a certain bat species is normally

distributed with some mean µ and standard deviation σ, i.e., X ~ N(µ, σ), while the wingspan Y of the females is normally distributed with the same mean µ, but standard deviation twice that of the males, i.e., Y ~ N(µ, 2σ). It is also known that 80% of the males have a wingspan less

than a certain amount m. What percentage of the females have a wingspan less than this same

amount m? (Hint: Calculate the z-scores.)
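The z-score approach in both hints can be sketched with `statistics.NormalDist`, using `inv_cdf` in place of a printed normal table. Variable names are illustrative; answers from a table with rounded z-scores will differ slightly in the last digits.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, standard deviation 1

# Part (a): P(X < 65) = 0.80 and P(X < 55) = 0.67 give two equations,
#   65 = mu + z80 * sigma   and   55 = mu + z67 * sigma
z80 = std.inv_cdf(0.80)
z67 = std.inv_cdf(0.67)
sigma = (65 - 55) / (z80 - z67)
mu = 65 - z80 * sigma
p_under_45 = NormalDist(mu, sigma).cdf(45)

# Part (b): for males, m = mu + z80 * sigma. For females the standard
# deviation is 2 * sigma, so the z-score of m is (m - mu) / (2 * sigma) = z80 / 2.
p_female_below_m = std.cdf(z80 / 2)

print(f"(a) mu={mu:.2f}, sigma={sigma:.2f}, P(X < 45)={p_under_45:.4f}")
print(f"(b) P(Y < m)={p_female_below_m:.4f}")
```

Part (b) does not need the values of µ, σ, or m at all: only the male z-score matters, and doubling the standard deviation halves it.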

34. Refer to the fair die experiment on page 4.1-1. Because the value X on each face is equally likely

to appear, it follows that the probability mass function is p(x) = 1/6 for every x = 1, 2, 3, 4, 5, 6,

producing the probability table and (uniformly distributed) probability histogram shown.

(a) Determine the expected value (i.e., mean) and standard deviation of X.

Now suppose the die is biased, with probability mass function p(x) = x/21, for x = 1, 2, 3, 4, 5, 6.

(b) Construct a probability table and corresponding probability histogram for X, and verify

that p(x) does indeed yield a legitimate probability distribution.

(c) Determine the expected value (i.e., mean) and standard deviation of X.
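Parts (a) through (c) can be checked with exact arithmetic. This sketch uses `fractions.Fraction` so that the legitimacy test (probabilities summing to exactly 1) is free of rounding error; the helper name `describe` is an assumption made for illustration.

```python
from fractions import Fraction
from math import sqrt

def describe(pmf):
    """Return (mean, variance, standard deviation) of a pmf given as {x: p}."""
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    return mu, var, sqrt(var)

fair = {x: Fraction(1, 6) for x in range(1, 7)}      # p(x) = 1/6
biased = {x: Fraction(x, 21) for x in range(1, 7)}   # p(x) = x/21

fair_mu, fair_var, fair_sd = describe(fair)
biased_mu, biased_var, biased_sd = describe(biased)

print(f"fair:   mean={fair_mu}, variance={fair_var}, sd={fair_sd:.4f}")
print(f"biased: mean={biased_mu}, variance={biased_var}, sd={biased_sd:.4f}")
```

The biased die shifts probability toward the larger faces, so its mean is higher than the fair die's and its spread is smaller.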