lecture 9: estimation - unbddu/2623/lecture_notes/lecture9_student.pdflecture 9: estimation donglei...

Lecture 9: Estimation

Donglei Du(ddu@unb.edu)

Faculty of Business Administration, University of New Brunswick, NB Canada FrederictonE3B 9Y2

Donglei Du (UNB) ADM 2623: Business Statistics 1 / 1

Table of contents

Layout

What is Statistical inference?

Statistical inference/inferential statistics is the process of drawingconclusions about population from samples that are subject torandom variation.

We will learn three basic Statistical inference techniques in the rest ofthe course

EstimationHypothesis testingLinear regression analysis

Layout

Estimation theory

Estimation theory is a branch of statistics that deals with estimatingthe values of parameters based on measured/empirical data that hasa random component.

An estimate is a single value that is calculated based on samples andused to estimate a population value

An estimator is a function that maps the sample space to a set ofestimates.

The entire purpose of estimation theory is to arrive at an estimator,which takes the sample as input and produces an estimate of theparameters with the corresponding accuracy.

Two Types of Estimator

There are two types of estimators

Point estimatorInterval estimator

Point Estimator

A point estimator is a statistic (that is, a function of the data) thatis used to infer the value of an unknown parameter in a statisticalmodel.

A point estimate is one of the possible values a pointer estimatorcan assume.

Mathematically, suppose there is a fixed parameter θ that needs to beestimated and X is a random variable corresponding to the observeddata. Then an estimator of θ, usually denoted by the symbol θ̂, is afunction of the random variable X, and hence itself a random variableθ̂(X).

A point estimate for a particular observed dataset (i.e. for X = x) isthen θ̂(x), which is a fixed value.

Assessing Point Estimator

A point estimator can be evaluated based on:

Unbiasedness (mean): whether the mean of this estimator is close tothe actual parameter?Efficiency (variance): whether the standard deviation of this estimatoris close to the actual parameterConsistency (size): whether the probability distribution of the estimatorbecomes concentrated on the parameter as the sample sizes increases

Sampling Error, Bias and Mean squared error

Sampling Error: The error of the estimator θ̂(X) for the parameter θis defined as:

e(θ̂(X)) := θ̂(X)− θ

The bias of the estimator θ̂ is defined as the expected value of theerror:

B(θ̂(X)) := E[θ̂(X)

]− θ = E

[θ̂(X)− θ

]The mean squared error of θ̂(X) is defined as the expected value(probability-weighted average, over all samples) of the squared errors;namely,

MSE(θ̂(X)) = E

[(e(θ̂(X))

)2]= E

[(θ̂(X)− θ)2

The MSE, variance, and bias, are related:

MSE(θ̂) = var(θ̂) + (B(θ̂))2

Unbiased Point Estimator

A point estimator θ̂ for the parameter θ is unbiasedness if its bias iszero; namely

E(θ̂(X))− θ = E(θ̂(X)− θ) = 0

Sample mean X̄ is an unbiasedness point estimator for populationmean µ; namely

E(X̄)− µ = E(X̄ − µ) = 0

Sample variance S2 is an unbiasedness point estimator for populationvariance σ2; namely

E(S2)− σ2 = E(S2 − σ2) = 0

Efficient Point Estimator

For the same parameter θ, an unbiased point estimator θ̂1 is moreefficient than another unbiased point estimator θ̂2 if

Var(θ̂1(X)) < Var(θ̂2(X))

The minimum-variance unbiased estimator (MVUE) is an unbiasedestimator that has lower variance than any other unbiased estimatorfor all possible values of the parameter.For a normal distribution with unknown mean and variance:

Sample mean X̄ is the MVUE for population mean µ; namely

E(X̄)− µ = E(X̄ − µ) = 0

Sample variance S2 is the MVUE for population variance σ2; namely

E(S2)− σ2 = E(S2 − σ2) = 0

For other distributions the sample mean and sample variance are notin general MVUEs.

Interval Estimator

An interval estimator of a population parameter under randomsampling consists of two random variables, which are called the upperand lower limits of the interval estimator, whose values decideintervals which expect to contain the parameter estimated.

Interval estimates are the all the ranges an interval estimator canassume. Each interval estimate states an range within which apopulation parameter probably lies.

An interval estimator can be assessed through

Accuracy (confidence level)Precision (margin of error)

Accuracy

Accuracy is also called the confidence level of the estimator, and itis defined to be the probability that an interval estimator obtained willcontain the value of the population parameter.

Any possible outcome of an interval estimator is called an intervalestimate.

An interval estimate with confidence level (1− α) is called an(1− α)-confidence interval.

The upper and lower limits of a confidence interval are called theupper and lower limits, respectively.

Precision

Precision is also called the margin of error.

It is measured by the half of width of the interval estimates, i.e., thedifference of the upper and lower limits of the confidence interval.

Balancing Accuracy with Precision

Accuracy and precision are two opposite driving forces, they areinversely related.

Our objective is to design an interval estimator such that

the confidence level sufficiently highthe margin of error sufficiently small.

How to Design an Interval Estimator

An interval estimator is usually designed in the following way:

(i) take an unbiased point estimator, and(ii) define an interval of reasonable width around it.

We will use the above technique to design an interval estimator forpopulation mean

Layout

An interval Estimator for Population Mean

(i) Sample mean X̄ is an unbiased point estimator.

(ii) Define an interval estimator around it with margin of error equal tow: [

X̄ − w, X̄ + w]

A notation

For any given α, denote zα/2 to be the z-score such that

P(Z ≥ zα/2

For example:α zα/2

0.1 1.650.05 1.960.02 2.330.01 2.58

A notation

2/z2/z

2013/9/23 Donglei Du: Lecture 1 1Donglei Du (UNB) ADM 2623: Business Statistics 21 / 1

(1− α)-confidence interval for Population Mean

Given a the confidence level (1− α), an (1− α)-confidence intervalfor population mean µ is:[

X̄ − zα/2σ√n, X̄ + zα/2

σ√n

]:= X̄ ± zα/2

σ√n

In order to apply the above formula, one of the following assumptionsmust be satisfied:

(i) If n ≥ 30 and σ is known.(ii) Or, if n < 30, σ is known, and the population is normally distributed.

If n ≥ 30 and σ is unknown, the standard deviation s of the sample isused to approximate the population standard deviation σ:[

X̄ − zα/2s√n, X̄ + zα/2

]:= X̄ ± zα/2

(1− α)-confidence interval for Population Mean

If n < 30, σ is unknown, and the population is normally distributed,then we should use the Student t distribution:[

X̄ − tα/2s√n, X̄ + tα/2

]:= X̄ ± tα/2

If n < 30, σ is unknown, and the population is not normallydistributed, then we should use non-parametric statistics, which willbe discussed in Advanced Statistics, such as ADM 3628.

The Weight Example

For the weight example, find the 95 percent confidence interval forthe population mean.

Let us check the assumption first:

(i) n = 62 ≥ 30(ii) σ is unknown

So we should use (2). x̄ = 155.8548, s ≈ 12.0964, andzα/2 = z0.95/2 = 1.96 from the normal table. The 95%-CI for µ is:

x̄± zα/2s√n≈ 155.8548± 1.96

12.0964√62

= [152.8438, 158.8658]

R code: z-test

> weight <- read.csv("weight.csv")

> x<-weight$Weight.01A.2013Fall

> library(BSDA)

> z.test(x, sigma.x=sd(x), conf.level = 0.95)

One-sample z-Test

data: x

z = 101.4519, p-value < 2.2e-16

alternative hypothesis: true mean is not equal to 0

95 percent confidence interval:

152.8439 158.8658

sample estimates:

mean of x

155.8548

Interpreting confidence intervals correctly

Incorrect: A 95%-CI for population mean µ does not mean:The probability that this interval contains the mean

with probability 95%.

Correct (well, almost): Instead it meansIf we randomly construct 100 such confidence intervals,

95 of them will contain the mean!

Selecting the Sample Size

Given the confidence level (1− α) and the margin of error w, whatshould be the minimum number of sample size?

Depending on the assumptions, n should be chosen such that

n ≥(zα/2σ

n ≥(zα/2s

n ≥(tα/2s

Example

Example: A consumer group would like to estimate the meanmonthly electricity charge for a single family house in July (within $5)using a 99 percent level of confidence. Based on similar studies thestandard deviation is estimated to be $20.00.

Problem: How large a sample is required

Solution: Let us check the assumption first:

(i) n is unkonwn and we want to find it out. But assume it is at least 30!(ii) σ is known

So we could use (4): zα/2 = z0.01/2 = 2.58 from the normal table,σ = 20, and w = 5. So

n ≥(zα/2σ

(2.58(20)

≈ (10.32)2 ≈ 106.5.

A minimum of 107 homes must be sampled.

lecture 9: estimation - unbddu/2623/lecture_notes/lecture9_student.pdflecture 9: estimation donglei...

Documents

improving software effort estimation using neuro-fuzzy...

debris estimation debris estimation field guidefield...

social network analysis: lecture 3-network...

optical flow estimation versus motion estimation

parametric density estimation: bayesian estimation. naïve...

1 lbayesian estimation (be) l bayesian parameter estimation:...

mass estimation - github pagesmass estimation has constant...

social network analysis: small world phenomenon and...

socialnetworkanalysis: centralitymeasures -...

debris estimation debris estimation field guidefield … ·...

unit 1 estimation of earthwork earthwork estimation of

lecture 4: measure of dispersion - university of new...

lecture 7: gmm estimation of spatial models1 estimation of...

socialnetworkanalysis:...

image unmixing success estimation in spatially … ·...

relationship between time estimation, cost estimation, and

supply chain management: risk pooling -...

specification estimation andspecification, estimation and...

supply chain management: international issues in...

social network analysis lecture 5{strength of weak ties...