applied statistics i - mathlzhang/teaching/3070summer2008/... · liang zhang (uofu) applied...

Applied Statistics I

Liang Zhang

Department of Mathematics, University of Utah

July 8, 2008

Liang Zhang (UofU) Applied Statistics I July 8, 2008 1 / 15

Distribution for Sample Mean

Example (Problem 54)Suppose the sediment density (g/cm) of a randomly selected specimenfrom a certain region is normally distributed with mean 2.65 and standarddeviation .85 (suggested in “Modeling Sediment and Water ColumnInteractions for Hydrophobic Pollutants”, Water Research, 1984:1169-1174).

a. If a random sample of 25 specimens is selected, what is theprobability that the sample average sediment density is at most 3.00?

b. How large a sample size would be required to ensure that the aboveprobability is at least .99?

The Central Limit Theorem (CLT)

Let X1,X2, . . . ,Xn be a random sample from a distribution with meanvalue µ and standard deviation σ. Then if n is sufficiently large, X hasapproximately a normal distribution with mean value µ and standarddeviation σ/

√n, and T0 also has approximately a normal distribution with

mean value nµ and standard deviation√

nσ. The larger the value of n, thebetter the approximation.

Remark:

1. As long as n is sufficiently large, CLT is applicable no matter Xi ’s arediscrete random variables or continuous random variables.

2. How large should n be such that CLT is applicable?Generally, if n > 30, CLT can be used.

Remark:

2. How large should n be such that CLT is applicable?

Generally, if n > 30, CLT can be used.

Remark:

Example (Problem 49)There are 40 students in an elementary statistics class. On the basis ofyears of experience, the instructor knows that the time needed to grade arandomly chosen first examination paper is a random variable with anexpected value of 6 min and a standard deviation of 6 min.

a. If grading times are independent and the instructor begins grading at6:50pm and grades continuously, what is the (approximate) probabilitythat he is through grading before the 11:00pm TV news begins?

b. If the sports report begins at 11:10pm, what is the probability that hemisses part of the report if he waits unitl grading is done beforeturning on the TV?

The original version of CLT

Let X1,X2, . . . be a sequence of i.i.d. random variables from adistribution with mean value µ and standard deviation σ. Define randomvariables

∑ni=1 Xi − nµ√

nσfor n = 1, 2, . . .

Then as n→∞, Yn has approximately a normal distribution.

The original version of CLT

Let X1,X2, . . . be a sequence of i.i.d. random variables from adistribution with mean value µ and standard deviation σ. Define randomvariables

∑ni=1 Xi − nµ√

nσfor n = 1, 2, . . .

Then as n→∞, Yn has approximately a normal distribution.

Corollary

Let X1,X2, . . . ,Xn be a random sample from a distribution for which onlypositive values are possible [P(Xi > 0) = 1]. Then if n is sufficiently large,the product Y = X1X2 · · ·Xn has approximately a lognormal distribution.

Corollary

Let X1,X2, . . . ,Xn be a random sample from a distribution for which onlypositive values are possible [P(Xi > 0) = 1]. Then if n is sufficiently large,the product Y = X1X2 · · ·Xn has approximately a lognormal distribution.

Distribution for Linear Combinations

Proposition

Let X1,X2, . . . ,Xn have mean values µ1, µ2, . . . , µn, respectively, andvariances σ2

1, σ22, . . . , σ

2n, respectively.

1.Whether or not the Xi s are independent,

E (a1X1 + a2X2 + · · ·+ anXn) = a1E (X1) + a2E (X2) + · · ·+ anE (Xn)

= a1µ1 + a2µ2 + · · ·+ anµn

2. If X1,X2, . . . ,Xn are independent,

V (a1X1 + a2X2 + · · ·+ anXn) = a21V (X1) + a2

2V (X2) + · · ·+ a2nV (Xn)

= a1σ21 + a2σ

22 + · · ·+ anσ

Proposition

1, σ22, . . . , σ

2n, respectively.

1.Whether or not the Xi s are independent,

E (a1X1 + a2X2 + · · ·+ anXn) = a1E (X1) + a2E (X2) + · · ·+ anE (Xn)

= a1µ1 + a2µ2 + · · ·+ anµn

2. If X1,X2, . . . ,Xn are independent,

V (a1X1 + a2X2 + · · ·+ anXn) = a21V (X1) + a2

2V (X2) + · · ·+ a2nV (Xn)

= a1σ21 + a2σ

22 + · · ·+ anσ

Proposition (Continued)

1, σ22, . . . , σ

2n, respectively.

3. More generally, for any X1,X2, . . . ,Xn

V (a1X1 + a2X2 + · · ·+ anXn) =n∑

n∑j=1

aiajCov(Xi ,Xj)

We call a1X1 + a2X2 + · · ·+ anXn a linear combination of the Xi ’s.

1, σ22, . . . , σ

2n, respectively.

V (a1X1 + a2X2 + · · ·+ anXn) =n∑

n∑j=1

aiajCov(Xi ,Xj)

1, σ22, . . . , σ

2n, respectively.

V (a1X1 + a2X2 + · · ·+ anXn) =n∑

n∑j=1

aiajCov(Xi ,Xj)

Example (Problem 64)Suppose your waiting time for a bus in the morning is uniformlydistributed on [0,8], whereas waiting time in the evening is uniformlydistributed on [0,10] independent of morning waiting time.

a. If you take the bus each morning and evening for a week, what is yourtotal expected waiting time?

b. What is the variance of your total waiting time?

c. What are the expected value and variance of the difference betweenmorning and evening waiting times on a given day?

d. What are the expected value and variance of the difference betweentotal morning waiting time and total evening waiting time on aparticular week?

Corollary

E (X1 − X2) = E (X1)− E (X2) and, if X1 and X2 are independent,V (X1 − X2) = V (X1) + V (X2).

Proposition

If X1,X2, . . . ,Xn are independent, normally distributed rv’s (with possiblydifferent means and/or variances), then any linear combination of the Xi salso has a normal distribution. In particular, the difference X1 − X2

between two independent, normally distributed variables is itself normallydistributed.

Corollary

Proposition

Corollary

Proposition

Example (Problem 62)Manufacture of a certain component requires three different machingoperations. Machining time for each operation has a normal distribution,and the three times are independent of one another. The mean values are15, 30, and 20min, respectively, and the standard deviations are 1, 2, and1.5min, respectively.What is the probability that it takes at most 1 hour of machining time toproduce a randomly selected component?

Example (Problem 62)Manufacture of a certain component requires three different machingoperations. Machining time for each operation has a normal distribution,and the three times are independent of one another. The mean values are15, 30, and 20min, respectively, and the standard deviations are 1, 2, and1.5min, respectively.

What is the probability that it takes at most 1 hour of machining time toproduce a randomly selected component?

Example (Problem 62)Manufacture of a certain component requires three different machingoperations. Machining time for each operation has a normal distribution,and the three times are independent of one another. The mean values are15, 30, and 20min, respectively, and the standard deviations are 1, 2, and1.5min, respectively.What is the probability that it takes at most 1 hour of machining time toproduce a randomly selected component?

Point Estimation

Example (a variant of Problem 62, Ch5)Manufacture of a certain component requires three different machingoperations. The total time for manufacturing one such component isknown to have a normal distribution. However, the mean µ and varianceσ2 for the normal distribution are unknown. If we did an experiment inwhich we manufactured 10 components and record the operation time,and the sample time is given as following:

1 2 3 4 5time 63.8 60.5 65.3 65.7 61.9

6 7 8 9 10time 68.2 68.1 64.8 65.8 65.4

What can we say about the population mean µ and population varianceσ2?

Point Estimation

Example (a variant of Problem 62, Ch5)Manufacture of a certain component requires three different machingoperations. The total time for manufacturing one such component isknown to have a normal distribution. However, the mean µ and varianceσ2 for the normal distribution are unknown. If we did an experiment inwhich we manufactured 10 components and record the operation time,and the sample time is given as following:

1 2 3 4 5time 63.8 60.5 65.3 65.7 61.9

6 7 8 9 10time 68.2 68.1 64.8 65.8 65.4

What can we say about the population mean µ and population varianceσ2?

Point Estimation

Example (a variant of Problem 64, Ch5)Suppose the waiting time for a certain bus in the morning is uniformlydistributed on [0, θ], where θ is unknown. If we record 10 waiting times asfollwos:

1 2 3 4 5time 7.6 1.8 4.8 3.9 7.1

6 7 8 9 10time 6.1 3.6 0.1 6.5 3.5

What can we say about the parameter θ?

Point Estimation

Example (a variant of Problem 64, Ch5)Suppose the waiting time for a certain bus in the morning is uniformlydistributed on [0, θ], where θ is unknown. If we record 10 waiting times asfollwos:

1 2 3 4 5time 7.6 1.8 4.8 3.9 7.1

6 7 8 9 10time 6.1 3.6 0.1 6.5 3.5

What can we say about the parameter θ?

Point Estimation

Definition

A point estimate of a parameter θ is a single number that can beregarded as a sensible value for θ. A point estimate is obtained byselecting a suitable statistic and computing its value from the given sampledata. The selected statistic is called the point estimator of θ.

e.g. X =∑10

i=1 Xi/10 is a point estimator for µ for the normal distributionexample.The largest sample data X10,10 is a point estimator for θ for the uniformdistribution example.

Point Estimation

Definition

e.g. X =∑10

Point Estimation

Definition

e.g. X =∑10

i=1 Xi/10 is a point estimator for µ for the normal distributionexample.

The largest sample data X10,10 is a point estimator for θ for the uniformdistribution example.

Point Estimation

Definition

e.g. X =∑10

Methods of Point Estimation

Definition

Let X1,X2, . . . ,Xn be a random sample from a distribution with pmf orpdf f (x). For k = 1, 2, 3, . . . , the kth population moment, or kthmoment of the distribution f (x), is E (X k). The kth sample momentis 1

∑ni=1 X k

Definition

Let X1,X2, . . . ,Xn be a random sample from a distribution with pmf orpdf f (x ; θ1, . . . , θm), where θ1, . . . , θm are parameters whose values areunknown. Then the moment estimators θ̂1, . . . , θ̂m are obtained byequating the first m sample moments to the corresponding first mpopulation moments and solving for θ1, . . . , θm.

Definition

∑ni=1 X k

Definition

∑ni=1 X k

Definition

applied statistics i - mathlzhang/teaching/3070summer2008/... · liang zhang (uofu) applied...

Documents