2104253 Eng Stat I
Point Estimation
Definition
Properties of Estimator (Unbiasedness / Efficiency)
Standard Error
Introduction
• Populations are described by their probability distributions and parameters.
– For quantitative populations, the location and shape are described by $\mu$ and $\sigma$.
– For binomial populations, the location and shape are determined by $p$.
• If the values of the parameters are unknown, we make inferences about them using sample information.
Types of Inference
• Estimation:
– Estimating or predicting the value of the parameter
– “What is (are) the most likely value(s) of $\mu$ or $p$?”
• Hypothesis Testing:
– Deciding about the value of a parameter based on some preconceived idea.
– “Did the sample come from a population with a specified value of $\mu$, or with $p = 0.2$?”
Types of Inference
• Examples:
– A consumer wants to estimate the average price of similar homes in her city before putting her home on the market.
Estimation: Estimate $\mu$, the average home price.
– A manufacturer wants to know if a new type of steel is more resistant to high temperatures than an old type was.
Hypothesis test: Is the new average resistance, $\mu_N$, equal to the old average resistance, $\mu_O$?
Types of Inference
• Whether you are estimating parameters or testing hypotheses, statistical methods are important because they provide:
– Methods for making the inference
– A numerical measure of the goodness or reliability of the inference
Definitions
• An estimator of a parameter $\theta$ is the corresponding random variable, computed from the sample. It is written as $\hat{\theta}$.
– Point estimation: A single number is calculated to estimate the parameter $\theta$.
– Interval estimation: Two numbers are calculated to create an interval within which the parameter is expected to lie.
Example
• For continuous measurements with mean $\mu$ and variance $\sigma^2$, we typically use the estimator $\bar{X}$ with value $\bar{x}$. The usual estimator for $\sigma^2$ is $S^2$ with value $s^2$.
• For binomial data the parameter is $p$, the probability of success, and the obvious estimator is $\hat{p} = X/n$, the proportion of successes.
Properties of Point Estimators
• Since an estimator is calculated from sample values, it varies from sample to sample according to its sampling distribution.
• An estimator $\hat{\theta}$ for the parameter $\theta$ is unbiased if the mean of its sampling distribution equals the parameter of interest: $E[\hat{\theta}] = \theta$.
– It does not systematically overestimate or underestimate the target parameter.
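Unbiased Estimator of $\mu$
• A population with mean $\mu$ and variance $\sigma^2$
• Parameter of interest, $\theta$, is the population mean, $\mu$.
• Estimator, $\hat{\theta}$, is the sample mean, $\bar{x} = \frac{\sum x_i}{n}$.
$$E[\hat{\theta}] = E[\bar{x}] = E\left(\frac{\sum x_i}{n}\right) = \frac{1}{n}E(x_1 + x_2 + \dots + x_n) = \frac{1}{n}\left[E(x_1) + E(x_2) + \dots + E(x_n)\right] = \frac{1}{n}(n\mu) = \mu$$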
Unbiased Estimator of $\sigma^2$
• A population with mean $\mu$ and variance $\sigma^2$
• Parameter of interest, $\theta$, is the population variance, $\sigma^2$.
• Estimator, $\hat{\theta}$, is the sample variance, $S^2$:
$$S^2 = \frac{\sum (X_i - \bar{X})^2}{n-1} = \frac{1}{n-1}\left(\sum X_i^2 - \frac{(\sum X_i)^2}{n}\right)$$
For any RV $Y$, $V(Y) = E(Y^2) - [E(Y)]^2$. So, $E(Y^2) = V(Y) + [E(Y)]^2$.
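Unbiased Estimator of $\sigma^2$
$$\begin{aligned}
E[\hat{\theta}] = E[S^2] &= \frac{1}{n-1}\,E\left(\sum X_i^2 - \frac{(\sum X_i)^2}{n}\right) \\
&= \frac{1}{n-1}\left(\sum E(X_i^2) - \frac{1}{n}E\left[\left(\sum X_i\right)^2\right]\right) \\
&= \frac{1}{n-1}\left(\sum (\sigma^2 + \mu^2) - \frac{1}{n}\left\{V\left(\sum X_i\right) + \left[E\left(\sum X_i\right)\right]^2\right\}\right) \\
&= \frac{1}{n-1}\left(n\sigma^2 + n\mu^2 - \frac{1}{n}\,n\sigma^2 - \frac{1}{n}(n\mu)^2\right) \\
&= \frac{1}{n-1}\left(n\sigma^2 - \sigma^2\right) = \sigma^2
\end{aligned}$$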
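The result $E[S^2] = \sigma^2$ can be checked exactly on a small case. The sketch below is only an illustration: the population values and the helper name `expected_sample_variance` are made up for this check. It enumerates every equally likely sample of size 3 drawn with replacement and averages the sample variance with divisor n − 1 and with divisor n.

```python
from fractions import Fraction
from itertools import product

def expected_sample_variance(population, n, divisor_offset):
    """Exact mean of sum((x_i - xbar)^2) / (n - divisor_offset) over all
    equally likely samples of size n drawn with replacement."""
    samples = list(product(population, repeat=n))
    total = Fraction(0)
    for sample in samples:
        xbar = Fraction(sum(sample), n)
        ss = sum((Fraction(x) - xbar) ** 2 for x in sample)
        total += ss / (n - divisor_offset)
    return total / len(samples)

population = [1, 2, 3, 6]  # hypothetical population, each value equally likely
mu = Fraction(sum(population), len(population))
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / len(population)

unbiased = expected_sample_variance(population, n=3, divisor_offset=1)
biased = expected_sample_variance(population, n=3, divisor_offset=0)
print(sigma2, unbiased, biased)  # 7/2 7/2 7/3
```

With divisor n − 1 the average equals $\sigma^2$ exactly; with divisor n it equals $\frac{n-1}{n}\sigma^2$, which systematically underestimates the variance.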
Unbiased Estimator of $p$
• $X$ is Binomial$(n, p)$
• Parameter of interest, $\theta$, is the probability of success, $p$.
• Estimator, $\hat{\theta}$, is the proportion of successes, $\hat{p} = X/n$.
$$E[\hat{\theta}] = E[\hat{p}] = E\left(\frac{X}{n}\right) = \frac{1}{n}E(X) = \frac{np}{n} = p$$
Efficiency
• Given two unbiased estimators, we generally prefer the one with the smaller variance. It tends to provide estimates closer to the population parameter.
• Occasionally it is possible to prove mathematically that an estimator is a minimum variance unbiased estimator (MVUE).
• Ex. If $X_1, \dots, X_n$ are a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$, then $\bar{X}$ and $\tilde{X}$ (the sample median) are both unbiased estimators for $\mu$, but $\bar{X}$ is an MVUE for $\mu$.
Note that: The sample median $\tilde{X}$ is an unbiased estimator for $\mu$ if we can assume that $X_1, \dots, X_n$ are a random sample and the distribution of the $X_i$'s is continuous and symmetric.
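A small simulation can make the efficiency comparison concrete. This is a sketch under assumed settings (normal data with $\mu = 10$, $\sigma = 2$, $n = 25$, a fixed seed); the function name `sampling_variances` is hypothetical. It estimates the variance of the sampling distributions of the sample mean and the sample median.

```python
import random
import statistics

def sampling_variances(n=25, reps=4000, mu=10.0, sigma=2.0, seed=1):
    """Simulate `reps` normal samples of size n and return the observed
    variances of the sample means and of the sample medians."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means), statistics.variance(medians)

var_mean, var_median = sampling_variances()
print(f"Var(mean)   ~ {var_mean:.4f} (theory: sigma^2/n = {2.0 ** 2 / 25:.4f})")
print(f"Var(median) ~ {var_median:.4f}")
```

Both estimators center on $\mu$, but the median's sampling distribution should come out visibly more spread out, which is why $\bar{X}$ is preferred for normal data.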
Properties of Point Estimators
• Of all the unbiased estimators, we generally prefer the estimator whose sampling distribution has the smallest spread or variance.
Measuring the Goodness of an Estimator
• The distance between an estimate and the true value of the parameter is the measure of the precision or goodness of the estimator.
• The standard deviation of the estimator's sampling distribution is a reasonable measure of this goodness.
Measuring the Error of an Estimator
• For example, if the sample sizes are large, our unbiased estimators will have approximately normal sampling distributions by the central limit theorem (CLT).
• For unbiased estimators, 95% of all point estimates will lie within 1.96 standard deviations of the parameter of interest.
Margin of error: The maximum error of estimation, calculated as
$$1.96 \times (\text{std deviation of the estimator})$$
Standard Error of an Estimator
• Unfortunately, the standard deviation of the sampling distribution usually depends on unknown parameters.
• If we use estimates of the unknown parameters in the formula for the standard deviation, we obtain the standard error (SE) of the estimator, which is the estimated standard deviation of the estimator.
Standard Error of an Estimator
• To estimate $\mu$ from a random sample that seems close to a normal distribution, we use the estimator $\bar{X}$, whose standard deviation, $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, depends on an unknown parameter $\sigma$.
• Therefore, the estimated standard deviation of the estimator is $s/\sqrt{n}$.
Standard Error of an Estimator
• For a binomial model the estimator of the probability of success, $\hat{p} = X/n$, has the standard deviation $\sqrt{p(1-p)/n}$, which depends on the parameter we are trying to estimate.
• Therefore, the estimated standard deviation of the estimator is $\sqrt{\hat{p}(1-\hat{p})/n}$.
Estimating Means and Proportions
• For a quantitative population,
Point estimator of population mean $\mu$: $\bar{x}$
Margin of error ($n \ge 30$): $\pm 1.96\, s/\sqrt{n}$
• For a binomial population,
Point estimator of population proportion $p$: $\hat{p} = x/n$
Margin of error ($n\hat{p} > 5$, $n\hat{q} > 5$): $\pm 1.96 \sqrt{\hat{p}\hat{q}/n}$
Example
• To estimate the average time it takes to assemble a certain computer component, the industrial engineer at an electronics firm timed 40 technicians in the performance of this task, getting a mean of 12.73 minutes and a standard deviation of 2.06 minutes.
Point estimate of $\mu$: $\bar{x} = 12.73$
Margin of error: $\pm 1.96\, \dfrac{s}{\sqrt{n}} = \pm 1.96\, \dfrac{2.06}{\sqrt{40}} = \pm 0.638$
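The arithmetic in this example is easy to reproduce. The following is a minimal sketch; the helper name `margin_of_error_mean` is invented here, and it assumes the large-sample normal approximation with $z = 1.96$.

```python
import math

def margin_of_error_mean(s, n, z=1.96):
    """Large-sample margin of error for a mean: z * s / sqrt(n)."""
    return z * s / math.sqrt(n)

me = margin_of_error_mean(s=2.06, n=40)
print(round(me, 3))  # 0.638
```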
Example
A quality control technician wants to estimate the proportion of soda cans that are underfilled. He randomly samples 200 cans of soda and finds 10 underfilled cans.
$n = 200$, $p$ = proportion of underfilled cans
Point estimator of $p$: $\hat{p} = x/n = 10/200 = .05$
Margin of error: $\pm 1.96 \sqrt{\dfrac{\hat{p}\hat{q}}{n}} = \pm 1.96 \sqrt{\dfrac{(.05)(.95)}{200}} = \pm .03$
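The same calculation for a proportion can be sketched as follows. The function name `margin_of_error_proportion` is hypothetical; the check on $n\hat{p}$ and $n\hat{q}$ mirrors the rule of thumb given for binomial populations.

```python
import math

def margin_of_error_proportion(x, n, z=1.96):
    """Return (p_hat, margin of error) using SE = sqrt(p_hat * q_hat / n)."""
    p_hat = x / n
    q_hat = 1 - p_hat
    # Rule of thumb for the normal approximation.
    if n * p_hat <= 5 or n * q_hat <= 5:
        raise ValueError("normal approximation may be poor: need n*p > 5 and n*q > 5")
    return p_hat, z * math.sqrt(p_hat * q_hat / n)

p_hat, me = margin_of_error_proportion(x=10, n=200)
print(p_hat, round(me, 2))  # 0.05 0.03
```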
Exercise: Sports Crazy
Are you “sports crazy”? Most Americans love participating in or at least watching a multitude of sporting events, but many feel that sports have more than just an entertaining value. In a survey of 1000 adults conducted by KRC Research & Consulting, 78% feel that spectator sports have a positive effect on society.
• Find a point estimate for the proportion of American adults who feel that spectator sports have a positive effect on society. Calculate the margin of error. (Ans : ±0.026)
• The poll reports a margin of error of “plus or minus 3.1%.” Does this agree with your results in part a? If not, what value of p produces the margin of error given in the poll? (Ans : 0.5)
Example: Amount Spent
• In an effort to estimate the mean amount spent per customer for dinner at a major restaurant, data were collected for a sample of 49 customers. Assuming a population standard deviation of $5, what is the margin of error at 95% confidence? (Ans : ±1.4)
Example: Web site
A survey of small businesses with Web sites found that the average amount spent on a site was $11,500 per year (Fortune, March 2001). Suppose the sample size was 60 businesses and the population standard deviation is σ = $4,000. What is the margin of error at 95% confidence? What would you recommend if the study required a margin of error of $500? (Ans : ±1,012.14; increase n to 246)
Exercise: Brand X
In a random sample of 100 households, 59 are found to prefer brand X. Determine the margin of error at 95% confidence for the population proportion who prefer brand X. (Ans : 0.096)
Interval Estimation
Basic Properties of Confidence Intervals
Large-Sample Confidence Intervals
Intervals Based on a Normal Population
Confidence Intervals for the Variance and Standard Deviation
Definitions
• An estimator of a parameter $\theta$ is the corresponding random variable, computed from the sample. It is written as $\hat{\theta}$.
– Point estimation: A single number is calculated to estimate the parameter.
– Interval estimation: Two numbers are calculated to create an interval within which the parameter is expected to lie.
Confidence Interval
• Create an interval (a, b) so that you are fairly sure that the parameter lies between these two values.
• “Fairly sure” means “with high probability”, measured using the confidence coefficient, $1-\alpha$.
Usually, $1-\alpha$ is chosen close to 1 (for example, 0.95).
• Suppose $1-\alpha = 0.95$ and that the estimator has a normal distribution.
Parameter $\pm\ 1.96\,SE$
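Confidence Interval
• Suppose that the population is normal with known $\sigma$.
• Then $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ will have a standard normal distribution.
• Hence
$$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$$
• This implies that
$$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$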
Confidence Interval
• Since we don't know the value of the parameter, consider Estimator $\pm\ 1.96\,SE$, which has a variable center.
• Only if the estimator falls in the tail areas will the interval fail to enclose the parameter. This happens only 5% of the time.
[Figure: four sample intervals around the parameter; three worked (enclosed the parameter) and one failed.]
To Change the Confidence Level
• To change to a general confidence level, $1-\alpha$, pick a value of z that puts area $1-\alpha$ in the center of the z distribution.
100(1−α)% Confidence Interval: Estimator $\pm\ z_{\alpha/2}\,SE$
Tail area (α/2)    $z_{\alpha/2}$
.05                1.645
.025               1.96
.005               2.58
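The tabulated z values can be recovered from the standard normal inverse CDF. This is a sketch using Python's standard library `statistics.NormalDist`; the helper name `z_critical` is an assumption made for illustration.

```python
from statistics import NormalDist

def z_critical(tail_area):
    """z value with the given area to its right under the standard normal."""
    return NormalDist().inv_cdf(1 - tail_area)

for tail in (0.05, 0.025, 0.005):
    print(f"tail area {tail}: z = {z_critical(tail):.3f}")
```

The last value comes out near 2.576, which the table rounds to 2.58.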
Large-Sample Confidence Intervals
• For a quantitative population,
A 100(1−α)% confidence interval for a mean $\mu$: $\bar{x} \pm z_{\alpha/2}\, \dfrac{s}{\sqrt{n}}$
• For a binomial population,
A 100(1−α)% confidence interval for a proportion $p$: $\hat{p} \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
Example
• A random sample of n = 50 males showed a mean daily intake of dairy products equal to 756 grams with a standard deviation of 35 grams. Find a 95% confidence interval for the population average $\mu$.
$$\bar{x} \pm 1.96\frac{s}{\sqrt{n}} \;\Rightarrow\; 756 \pm 1.96\frac{35}{\sqrt{50}} \;\Rightarrow\; 756 \pm 9.70$$
or $746.30 < \mu < 765.70$ grams.
Example
• Find a 99% confidence interval for the population average daily intake of dairy products for men.
$$\bar{x} \pm 2.58\frac{s}{\sqrt{n}} \;\Rightarrow\; 756 \pm 2.58\frac{35}{\sqrt{50}} \;\Rightarrow\; 756 \pm 12.77$$
or $743.23 < \mu < 768.77$ grams.
The interval must be wider to provide for the increased confidence that it does indeed enclose the true value of $\mu$.
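Both dairy-intake intervals follow from the same formula with a different z value. A minimal sketch, with a made-up helper name `mean_ci` and the z values 1.96 and 2.58 taken from the table of critical values:

```python
import math

def mean_ci(xbar, s, n, z):
    """Large-sample confidence interval for a mean: xbar +/- z * s / sqrt(n)."""
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

lo95, hi95 = mean_ci(xbar=756, s=35, n=50, z=1.96)
lo99, hi99 = mean_ci(xbar=756, s=35, n=50, z=2.58)
print(f"95%: {lo95:.2f} to {hi95:.2f}")  # 95%: 746.30 to 765.70
print(f"99%: {lo99:.2f} to {hi99:.2f}")  # 99%: 743.23 to 768.77
```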
Example
• Of a random sample of n = 150 college students, 104 of the students said that they had played on a soccer team during their K-12 years. Estimate the proportion of college students who played soccer in their youth with a 98% confidence interval.
$$\hat{p} \pm 2.33\sqrt{\frac{\hat{p}\hat{q}}{n}} \;\Rightarrow\; \frac{104}{150} \pm 2.33\sqrt{\frac{.69(.31)}{150}} \;\Rightarrow\; .69 \pm .09$$
or $.60 < p < .78$.
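For comparison, here is the same interval computed without intermediate rounding. This is a sketch; `proportion_ci` is a hypothetical helper and $z = 2.33$ is the value used on the slide.

```python
import math

def proportion_ci(x, n, z):
    """Large-sample confidence interval for a proportion."""
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

lower, upper = proportion_ci(x=104, n=150, z=2.33)
print(f"{lower:.3f} < p < {upper:.3f}")
```

Carrying full precision gives roughly 0.606 to 0.781; the slide's .60 to .78 comes from rounding $\hat{p}$ to .69 and the margin to .09 before combining them.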
Exercise: Airline Luggage
An airline wants to estimate the proportion of passengers who carry only hand luggage on its New York-to-Chicago flights. A random sample of 50 passengers shows 34 passengers who carry only hand luggage. Construct a 99% confidence interval for the population proportion. (Ans : 0.51 < p < 0.85)
One-Sided Confidence Bounds
• Confidence intervals are by their nature two-sided since they produce upper and lower bounds for the parameter.
• One-sided bounds can be constructed simply by using a value of z that puts $\alpha$ rather than $\alpha/2$ in the tail of the z distribution.
LCB: Estimator $-\ z_{\alpha} \times$ (Std Error of Estimator)
UCB: Estimator $+\ z_{\alpha} \times$ (Std Error of Estimator)
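A one-sided bound differs from the two-sided interval only in the z value. The sketch below reuses the dairy-intake numbers (mean 756, s = 35, n = 50) purely as an illustration; the function name `one_sided_bounds` is an assumption.

```python
import math
from statistics import NormalDist

def one_sided_bounds(estimate, se, alpha):
    """Return (LCB, UCB), each a separate 100(1 - alpha)% one-sided bound."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # puts alpha, not alpha/2, in the tail
    return estimate - z_alpha * se, estimate + z_alpha * se

lcb, ucb = one_sided_bounds(estimate=756, se=35 / math.sqrt(50), alpha=0.05)
print(f"95% lower bound: {lcb:.2f}")
print(f"95% upper bound: {ucb:.2f}")
```

Each bound is read on its own: "we are 95% confident that $\mu$ is below the UCB" or "above the LCB". The two together do not form a 95% two-sided interval.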
Choosing the Sample Size
• The total amount of relevant information in a sample is controlled by two factors:
– The sampling plan or experimental design: the procedure for collecting the information
– The sample size n: the amount of information you collect.
• In a statistical estimation problem, the accuracy of the estimation is measured by the margin of error or the width of the confidence interval.
Choosing the Sample Size
1. Determine the size of the margin of error, E, that you are willing to tolerate.
2. Choose the sample size by solving for n (or $n = n_1 = n_2$) in the inequality $z_{\alpha/2} \times SE \le E$, where SE is a function of the sample size n.
3. For quantitative populations, estimate the population standard deviation using a previously calculated value of s or the range approximation $\sigma \approx$ Range / 4.
4. For binomial populations, use the conservative approach and approximate p using the value $p = .5$.
Example
A producer of PVC pipe wants to survey wholesalers who buy his product in order to estimate the proportion who plan to increase their purchases next year. What sample size is required if he wants his estimate to be within .04 of the actual proportion with probability equal to .95?
$$1.96\sqrt{\frac{pq}{n}} \le .04 \;\Rightarrow\; 1.96\sqrt{\frac{.5(.5)}{n}} \le .04$$
$$\sqrt{n} \ge \frac{1.96\sqrt{.5(.5)}}{.04} = 24.5 \;\Rightarrow\; n \ge 24.5^2 = 600.25$$
He should survey at least 601 wholesalers.
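Solving $z_{\alpha/2} \times SE \le E$ for n can be sketched in code. The helper names are hypothetical; the first call reproduces the PVC example with the conservative $p = .5$, and the second applies the mean formula to the Web site example ($\sigma$ = 4,000, E = 500).

```python
import math

def sample_size_proportion(E, z=1.96, p=0.5):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= E."""
    return math.ceil(z ** 2 * p * (1 - p) / E ** 2)

def sample_size_mean(E, sigma, z=1.96):
    """Smallest n with z * sigma / sqrt(n) <= E."""
    return math.ceil((z * sigma / E) ** 2)

print(sample_size_proportion(E=0.04))       # 601
print(sample_size_mean(E=500, sigma=4000))  # 246
```

Rounding up (`math.ceil`) matters: rounding 600.25 down to 600 would leave the margin of error slightly above the target.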
Exercise: Porosity
Assume that the helium porosity of coal samples taken from any particular seam is normally distributed with true standard deviation 0.75.
• How large a sample size is necessary if the width of the 95% interval is to be 0.40? (Ans : 55)
• What sample size is necessary to estimate true average porosity to within 0.2 with 99% confidence? (Ans : 94)
Exercise: Ice Hockey
A study of how fast ice hockey players accelerate shows that the mean and standard deviation of the 69 individual average acceleration measurements over the 6-meter distance were 2.962 and 0.529 meters per second, respectively.
• Find a 95% confidence interval for this population mean. Interpret the interval. (Ans : 2.837 < µ < 3.087)
• Suppose you were dissatisfied with the width of this confidence interval and wanted to cut the interval in half by increasing the sample size. How many skaters (total) would have to be included in the study? (Ans : 276)
Example: Salary
Annual starting salaries for college graduates with a degree in BA are between $30,000 and $45,000. With 99% confidence, how large a sample should be taken if the desired margin of error is $500? If it is $200? (Ans : 375, 2,341)
Exercise: Company Profit
According to Thomson Financial, the majority of companies reporting profit had beaten estimates. A sample of 162 companies showed 104 beat estimates. How large a sample is needed if the desired margin of error is 0.2 with 95% confidence? (Ans : 23)
Interval Based on a Normal Population Distribution
• When working with a small sample we must make additional assumptions on the distribution to make up for our lack of information about $\sigma$. We assume the $X_i$'s are from a normal distribution.
Interval Based on a Normal Population Distribution
• When we take a sample from a normal population, the sample mean $\bar{x}$ has a normal distribution for any sample size n, and
• $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ has a standard normal distribution.
• But if $\sigma$ is unknown, and we must use s to estimate it, the resulting statistic $\dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ is not normal!
Student's t Distribution
• Fortunately, this statistic does have a sampling distribution that is well known to statisticians, called the Student's t distribution, with n − 1 degrees of freedom.
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
• We can use this distribution to create estimation and testing procedures for the population mean $\mu$.
Properties of Student's t
• Shape depends on the sample size n or the degrees of freedom, n − 1.
• As n increases the shapes of the t and z distributions become almost identical.
• Mound-shaped and symmetric about 0.
• More variable than z, with “heavier tails”
Graphs of t density functions
Using the t-Table
• The t-table gives the values of t that cut off certain critical areas in the tail of the t distribution.
• Index df and the appropriate tail area a to find $t_a$, the value of t with area a to its right.
For a random sample of size n = 10, find a value of t that cuts off .025 in the right tail.
Row = df = n − 1 = 9
Column subscript = a = .025
$t_{.025} = 2.262$
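When no t-table is at hand, the lookup can be approximated numerically. The sketch below is not a library routine: it integrates the Student's t density with Simpson's rule and inverts the tail area by bisection, using only the standard library. The function names are invented for illustration.

```python
import math

def t_pdf(t, df):
    """Density of Student's t with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(a, df):
    """Value t_a with area a (0 < a < 0.5) to its right, found by bisection."""
    lo, hi = 0.0, 200.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > a:
            lo = mid  # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.025, 9), 3))  # 2.262
```

The result for a = .025 and df = 9 agrees with the table value read off above.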
Small-Sample Inference for a Population Mean
• A 100(1−α)% confidence interval for the population mean $\mu$:
Two-sided: $\bar{x} \pm t_{\alpha/2,\,n-1}\, \dfrac{s}{\sqrt{n}}$
where $t_{\alpha/2}$ is the value of t that cuts off area α/2 in the tail of a t-distribution with df = n − 1.
An upper confidence bound: $\bar{x} + t_{\alpha,\,n-1}\, \dfrac{s}{\sqrt{n}}$
where $t_{\alpha}$ is the value of t that cuts off area α in the upper tail of a t-distribution with df = n − 1.
Example
Find the following t-values:
• $t_{0.05}$ for 5 df (Ans : 2.015)
• $t_{0.10}$ for 18 df (Ans : 1.33)
• $t_{0.99}$ for 30 df (Ans : -2.457)
Approximate the probability for the following t:
• A right-tailed prob. with t = 3.21 and 16 df (Ans : 0.0028)
• A left-tailed prob. with t = -8.77 and 7 df (Ans : $2.52 \times 10^{-5}$)
• A two-tailed prob. with t = 2.43 and 12 df (Ans : 0.032)
Example: Test Scores
Test scores on a 100-point test were recorded for 20 students:
71, 93, 91, 86, 75, 73, 86, 82, 76, 57, 84, 89, 67, 62, 72, 77, 68, 65, 75, 84
• Calculate the mean and standard deviation of the scores. (Ans : 76.65, 10.038)
• Find a 95% CI for the average test score in the population. (Ans : 71.95 < µ < 81.35)
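The two answers can be reproduced in a few lines. This is a sketch: the critical value 2.093 is $t_{.025}$ with df = 19 as read from a standard t-table, entered by hand rather than computed.

```python
import math
import statistics

scores = [71, 93, 91, 86, 75, 73, 86, 82, 76, 57,
          84, 89, 67, 62, 72, 77, 68, 65, 75, 84]

n = len(scores)
xbar = statistics.mean(scores)
s = statistics.stdev(scores)   # sample standard deviation (divisor n - 1)
t_crit = 2.093                 # t_.025 with df = 19, taken from a t-table
half_width = t_crit * s / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width

print(f"mean = {xbar:.2f}, s = {s:.3f}")
print(f"95% CI: {lower:.2f} < mu < {upper:.2f}")
```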
Example: Household Income
• Suppose a random sample of 14 people 30-39 years of age produced the household incomes shown below. Determine a point estimate for the population mean of household incomes for people 30-39 years of age and construct a 95% confidence interval. Assume household income is normally distributed.
• 37,600, 33,800, 42,400, 28,100, 46,500, 40,210, 35,550
• 44,900, 36,700, 32,700, 41,800, 38,300, 32,700, 36,600
(Ans : 37,704.29 ± 2,948.16)
The Sampling Distribution of the Sample Variance
If $S^2$ is the variance of a random sample of size n taken from a normal population having the variance $\sigma^2$, then
$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2}$$
is a random variable having the chi-square distribution with the parameter (degrees of freedom) df = n − 1.
pdf of a $\chi^2$ random variable with $\nu$ degrees of freedom:
$$f(x) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{(\nu/2)-1} e^{-x/2}, \quad x > 0$$
Graphs of chi-squared density functions
Finding Probabilities for the Sample Variance
• The $\chi^2$-table contains selected values of $\chi^2_{\alpha}$ for various values of df, again called the number of degrees of freedom.
• $\chi^2_{\alpha}$ is the value such that the area under the chi-square distribution to its right is equal to $\alpha$.
• Unlike the normal distribution, it is necessary to tabulate values of $\chi^2_{\alpha}$ for $\alpha > 0.50$ because the chi-square distribution is not symmetrical.
• Find the appropriate area using the $\chi^2$-table.
Chi-square Table
• The $\chi^2$-table gives both upper and lower critical values of the chi-square statistic for a given df.
For example, the value of chi-square that cuts off .05 in the upper tail of the distribution with df = 5 is $\chi^2_{.05} = 11.07$.
Excel Function
• Excel provides a function that returns both upper and lower critical values of the chi-square distribution for a given df.
The critical value of the chi-square that has area (p) to its right: =CHIINV(p, df)
Upper Value = CHIINV(0.05, 5) = 11.07
Lower Value = CHIINV(0.95, 5) = 1.15
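Outside Excel, the same critical values can be approximated with the standard library alone. The sketch below is an illustration, not a reference implementation: it evaluates the chi-square CDF through the series for the regularized lower incomplete gamma function and inverts it by bisection. The names `chi2_cdf` and `chi2_critical` are invented here, and the search range of 0 to 1000 assumes a moderate df.

```python
import math

def chi2_cdf(x, df):
    """P(chi-square with df degrees of freedom <= x), via the series for
    the regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term = 1.0 / a
    total = term
    k = 0
    while abs(term) > 1e-15 * abs(total):
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def chi2_critical(p, df):
    """Value with area p to its right (same convention as CHIINV(p, df))."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if 1 - chi2_cdf(mid, df) > p:
            lo = mid  # right-tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

print(f"upper: {chi2_critical(0.05, 5):.2f}")
print(f"lower: {chi2_critical(0.95, 5):.3f}")
```

The upper value should match 11.07; the lower value comes out near 1.145, which the slide reports as 1.15.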
A Confidence Interval for the Variance and Standard Deviation
A 100(1−α)% confidence interval for $\sigma^2$:
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{(1-\alpha/2),\,n-1}}$$
A 100(1−α)% confidence interval for $\sigma$:
$$\sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_{(1-\alpha/2),\,n-1}}}$$
Example: Household Income
• Suppose a random sample of 14 people 30-39 years of age produced the household incomes shown below. Determine a point estimate for the population variance of household incomes for people 30-39 years of age and construct a 95% confidence interval. Assume household income is normally distributed.
• 37,600, 33,800, 42,400, 28,100, 46,500, 40,210, 35,550
• 44,900, 36,700, 32,700, 41,800, 38,300, 32,700, 36,600
(Ans : 13,707,053.72 < $\sigma^2$ < 67,691,978.62)
Example: Test Scores
Test scores on a 100-point test were recorded for 20 students:
71, 93, 91, 86, 75, 73, 86, 82, 76, 57, 84, 89, 67, 62, 72, 77, 68, 65, 75, 84
• Find a 99% CI for the population variance. (Ans : 49.62 < $\sigma^2$ < 279.73)
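The variance interval for the test scores can be sketched as follows. The two chi-square critical values for df = 19 (38.582 for upper-tail area .005 and 6.844 for upper-tail area .995) are assumed to be read from a chi-square table and are entered by hand.

```python
import math
import statistics

scores = [71, 93, 91, 86, 75, 73, 86, 82, 76, 57,
          84, 89, 67, 62, 72, 77, 68, 65, 75, 84]

n = len(scores)
s2 = statistics.variance(scores)  # sample variance (divisor n - 1)
chi2_upper = 38.582               # chi-square_.005, df = 19 (table value)
chi2_lower = 6.844                # chi-square_.995, df = 19 (table value)

# The larger critical value gives the lower limit, and vice versa.
var_lo = (n - 1) * s2 / chi2_upper
var_hi = (n - 1) * s2 / chi2_lower
sd_lo, sd_hi = math.sqrt(var_lo), math.sqrt(var_hi)

print(f"99% CI for variance: {var_lo:.2f} to {var_hi:.2f}")
print(f"99% CI for std dev:  {sd_lo:.2f} to {sd_hi:.2f}")
```

Taking square roots of the variance limits gives the corresponding interval for $\sigma$.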