Prof. Dr. S. K. Bhattacharjee Department of Statistics
University of Rajshahi
Statistical Inference
• Statistical inference is the process of making judgments about an unknown population based on a sample.
• An important aspect of statistical inference is using estimates to approximate the value of an unknown population parameter.
• Another type of inference involves choosing between two opposing views or statements about the population; this process is called hypothesis testing.
Statistical Estimation
An estimator is a statistic (a function of the sample data) that provides an estimate of a population parameter. Estimation takes two forms:
• Point Estimation
• Interval Estimation
Point Estimation
A point estimator produces a single numerical estimate of a population parameter. The sample mean, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, is a point estimator of the population mean, μ. The sample proportion, $p = x/n$ (x successes in n trials), is a point estimator of the population proportion, π.
Properties of a Good Estimator (Principles of Parameter Estimation)
Unbiased – The expected value of the estimator is equal to the population parameter.
Consistent – As the sample size n grows without limit, the estimator converges (in probability) to the population parameter.
Efficient – It has the smallest variance among competing (unbiased) estimators.
Minimum Mean-Squared Error – Its mean squared error (variance plus squared bias) is as low as possible.
Sufficient – It uses all the information about the parameter contained in a sample of size n.
Unbiased Estimator
An unbiased estimator is a statistic that has an expected value equal to the population parameter being estimated:
$$E[\hat{\theta}_n] = \theta_0 \quad \text{for any } n.$$
Examples: The sample mean, $\bar{x}$, is an unbiased estimator of the population mean, μ. The sample variance, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, is an unbiased estimator of the population variance, σ².
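A quick simulation can make the definition concrete. The sketch below (Python standard library only; μ = 10, σ = 2 and n = 5 are arbitrary illustrative choices) averages each estimator over many normal samples: the sample mean and s² (divisor n − 1) should land near μ and σ², while the variance with divisor n should settle near (n − 1)σ²/n = 3.2, i.e. it is biased.

```python
import random

def average_estimates(mu=10.0, sigma=2.0, n=5, reps=20000, seed=1):
    """Average each estimator over many simulated normal samples of size n."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]  # sample mean, s^2 (divisor n - 1), divisor-n variance
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        ss = sum((x - xbar) ** 2 for x in sample)  # sum of squared deviations
        totals[0] += xbar
        totals[1] += ss / (n - 1)
        totals[2] += ss / n
    return [t / reps for t in totals]

avg_mean, avg_s2, avg_div_n = average_estimates()
print(f"average of xbar          : {avg_mean:.3f}  (mu = 10)")
print(f"average of s^2 (n - 1)   : {avg_s2:.3f}  (sigma^2 = 4)")
print(f"average of divisor-n var : {avg_div_n:.3f}  ((n-1)/n * 4 = 3.2)")
```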
Consistent Estimators
A statistic is a consistent estimator of a parameter if the probability that it will be close to the parameter's true value approaches 1 with increasing sample size.
The standard error of a consistent estimator typically becomes smaller as the sample size gets larger.
The sample mean and sample proportion are consistent estimators, since their standard-error formulas show that the standard errors become smaller as n gets bigger:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \quad \text{and} \quad \sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$$
Mathematically, a sequence of estimators $\{t_n;\ n \ge 0\}$ is a consistent estimator for parameter θ if and only if, for all ϵ > 0, no matter how small, we have
$$\lim_{n \to \infty} P\left(|t_n - \theta| < \epsilon\right) = 1.$$
Consistent Estimators
An estimator's distribution (like that of any other nontrivial statistic) typically becomes narrower and narrower, and more and more normal-like, as larger and larger samples are considered. If we take for granted that the variance of the estimator tends to 0 as the sample size grows without limit, what consistency really means is that the mean of the estimator's distribution tends to θ0 as the sample size grows without limit, as shown in the upper and lower images below:
In technical terms, a consistent estimator is a sequence of random variables indexed by n (the sample size) that converges in probability to θ0.
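The definition can be checked empirically. This sketch assumes normal data with μ = 0, σ = 1 and a tolerance ϵ = 0.1 (all arbitrary choices) and estimates P(|x̄ − μ| < ϵ) by simulation for growing n; the estimated probability should climb toward 1 as the standard error σ/√n shrinks.

```python
import random

def prob_within(eps, n, mu=0.0, sigma=1.0, reps=1000, seed=7):
    """Monte Carlo estimate of P(|sample mean - mu| < eps) for samples of size n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if abs(xbar - mu) < eps:
            hits += 1
    return hits / reps

results = {n: prob_within(0.1, n) for n in (10, 100, 1000)}
for n, prob in results.items():
    # standard error sigma / sqrt(n) with sigma = 1
    print(f"n = {n:5d}  SE = {1 / n ** 0.5:.4f}  P(|xbar - mu| < 0.1) ~ {prob:.3f}")
```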
Relative Efficiency
A parameter may have several unbiased estimators. For example, given a symmetrical continuous distribution, both
* the sample mean and
* the sample median
are unbiased estimators of the distribution mean (when it exists). Which one should we choose?
Certainly we should choose the estimator that generates estimates that are closer (in a probabilistic sense) to the true value θ0 than the estimates generated by the other one. One way to do that is to select the estimator with the lower variance.
This leads to the definition of the relative efficiency of two unbiased estimators. Given two unbiased estimators $\theta^*_1$ and $\theta^*_2$ of the same parameter θ, one defines the efficiency of $\theta^*_2$ with respect to $\theta^*_1$ (for a given sample size n) as the ratio of their variances:
$$\text{Relative efficiency}\,(\theta^*_2 \text{ with respect to } \theta^*_1)_n = \frac{\operatorname{Var}(\theta^*_1)_n}{\operatorname{Var}(\theta^*_2)_n}$$
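As an illustration of this ratio, the sketch below estimates the efficiency of the sample median ($\theta^*_2$) with respect to the sample mean ($\theta^*_1$) for normal samples of size n = 101 (an arbitrary choice). For large n the theoretical value is 2/π ≈ 0.64, which is the same statement as the median's standard error being about √(π/2) ≈ 1.25 times that of the mean.

```python
import random
import statistics

def efficiency_median_wrt_mean(n=101, reps=4000, seed=3):
    """Var(mean) / Var(median) over simulated N(0, 1) samples of size n."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(sum(sample) / n)
        medians.append(statistics.median(sample))
    return statistics.variance(means) / statistics.variance(medians)

eff = efficiency_median_wrt_mean()
print(f"efficiency of median w.r.t. mean ~ {eff:.3f}")
print(f"implied SE ratio median/mean     ~ {eff ** -0.5:.3f}")
```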
Efficient Estimator
An efficient estimator has a low variance, usually judged relative to other estimators (relative efficiency); in the strict sense, its variance is the smallest achievable.
Efficiency concerns the reliability of an estimator: its tendency to have a smaller standard error than competing estimators for the same sample size.
The median is an unbiased estimator of μ when the sampled population is normally distributed, but its standard error is about 1.25 times that of the sample mean, so the sample mean is a more efficient estimator than the median.
Under standard regularity conditions, the Maximum Likelihood Estimator is asymptotically efficient: for large samples its variance approaches the smallest variance attainable by an unbiased estimator (the Cramér–Rao lower bound).
Minimum Mean-Squared Error Estimator
The practitioner is not particularly keen on unbiasedness. What is really important is that, on average, the estimate θ* be close to the true value θ0. So he will tend to favour estimators whose mean-square error
$$E\left[(\theta^* - \theta_0)^2\right] = \operatorname{Var}(\theta^*) + \left(E[\theta^*] - \theta_0\right)^2$$
is as low as possible, whether θ* is biased or not. Such an estimator is called a minimum mean-square-error estimator. Given two estimators:
• $\theta^*_1$: unbiased, but with a large variance;
• $\theta^*_2$: somewhat biased, but with a small variance,
$\theta^*_2$ might prove a better estimator than $\theta^*_1$ in practice.
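A standard illustration of this trade-off, sketched here by simulation under the assumption of normal data with σ² = 1 and n = 10, compares the unbiased s² (divisor n − 1) with the biased divisor-n variance estimator. For normal data the theoretical MSEs are 2σ⁴/(n − 1) ≈ 0.222 and (2n − 1)σ⁴/n² = 0.19, so the biased estimator has the lower MSE.

```python
import random

def mse_of_variance_estimators(sigma2=1.0, n=10, reps=40000, seed=11):
    """Monte Carlo MSE of s^2 (divisor n - 1) and of the divisor-n estimator."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    sq_err_unbiased = sq_err_biased = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        ss = sum((x - xbar) ** 2 for x in sample)
        sq_err_unbiased += (ss / (n - 1) - sigma2) ** 2
        sq_err_biased += (ss / n - sigma2) ** 2
    return sq_err_unbiased / reps, sq_err_biased / reps

mse_unbiased, mse_biased = mse_of_variance_estimators()
# Theory (normal data, sigma^2 = 1, n = 10):
#   s^2        : 2 / (n - 1)     ~ 0.222  (all variance, no bias)
#   divisor n  : (2n - 1) / n^2  = 0.190  (smaller variance plus squared bias)
print(f"MSE(s^2) ~ {mse_unbiased:.3f}, MSE(divisor n) ~ {mse_biased:.3f}")
```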
Sufficient Estimator
We have shown that $\bar{Y}$ and $S^2$ are unbiased estimators of μ and σ².
Are we losing any information about our target parameters by relying on these statistics?
Statistics that summarize all the information about the target parameters are said to have the property of sufficiency; they are called sufficient statistics.
“Good” estimators are (or can be made to be)
functions of any sufficient statistic.
Sufficient Estimator
* Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a probability distribution with unknown parameter θ. Then the statistic U (a function of $Y_1, Y_2, \ldots, Y_n$) is said to be sufficient for θ if the conditional distribution of $Y_1, Y_2, \ldots, Y_n$ given U does not depend on θ.
* Let U be a statistic based on the random sample $Y_1, Y_2, \ldots, Y_n$. Then U is a sufficient statistic for the estimation of a parameter θ if and only if the likelihood $L(\theta) = L(y_1, y_2, \ldots, y_n \mid \theta)$ can be factored into two nonnegative functions,
$$L(y_1, y_2, \ldots, y_n \mid \theta) = g(u, \theta) \times h(y_1, y_2, \ldots, y_n),$$
where $g(u, \theta)$ is a function only of u and θ, and $h(y_1, y_2, \ldots, y_n)$ is not a function of θ.
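A small exact check of the first definition, assuming Bernoulli(p) observations (a case not worked on the slides): here $L = p^{u}(1-p)^{n-u}$ with $u = \sum y_i$, which factors with $h \equiv 1$, so $U = \sum Y_i$ is sufficient. The sketch enumerates every 0/1 sequence with sum t and confirms, using exact fractions, that the conditional probabilities given U = t are identical for two different values of p, each equal to $1/\binom{n}{t}$.

```python
from fractions import Fraction
from itertools import product
from math import comb

def conditional_given_sum(p, n, t):
    """P(Y = y | sum(Y) = t) for each 0/1 sequence y with sum t,
    where Y_1..Y_n are iid Bernoulli(p). Exact arithmetic via Fraction."""
    joint = {}
    for y in product((0, 1), repeat=n):
        if sum(y) == t:
            joint[y] = p ** t * (1 - p) ** (n - t)  # probability of this sequence
    total = sum(joint.values())  # P(sum(Y) = t) = C(n, t) p^t (1 - p)^(n - t)
    return {y: prob / total for y, prob in joint.items()}

n, t = 4, 2
cond_a = conditional_given_sum(Fraction(1, 5), n, t)
cond_b = conditional_given_sum(Fraction(7, 10), n, t)
print(cond_a == cond_b)                                   # True: free of p
print(set(cond_a.values()) == {Fraction(1, comb(n, t))})  # True: each is 1/6
```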
Example : Sufficient Estimator
Methods of Point Estimation
There are two broad approaches: the Classical approach and the Bayesian approach.
Classical approach:
• Method of Moments
• Method of Maximum Likelihood
• Method of Least Squares
Method of Moments
i) Sample moments should provide good estimates of the corresponding population moments.
ii) Because the population moments are functions of the population parameters, we can equate them to the sample moments in i) and solve for these parameters.
Formal Definition:
Choose as estimates those values of the parameters that are solutions of the equations
$$\mu'_k = m'_k, \quad k = 1, 2, \ldots, t,$$
where $\mu'_k = E(Y^k)$ is the k-th population moment, $m'_k = \frac{1}{n}\sum_{i=1}^{n} Y_i^k$ is the k-th sample moment, and t is the number of parameters to be estimated.
Example
A random sample $Y_1, Y_2, \ldots, Y_n$ is selected from a population in which $Y_i$ possesses a uniform density function over the interval (0, θ), where θ is unknown. Use the method of moments to estimate θ.
Solution
The value of $\mu'_1$ for a uniform random variable is
$$\mu'_1 = \frac{\theta}{2}.$$
The corresponding first sample moment is
$$m'_1 = \frac{1}{n}\sum_{i=1}^{n} Y_i = \bar{Y}.$$
From which:
$$\mu'_1 = \frac{\theta}{2} = \bar{Y}.$$
Thus,
$$\hat{\theta} = 2\bar{Y}.$$
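A minimal simulation of this estimator, assuming a hypothetical true value θ = 5 chosen only to generate data:

```python
import random

def mom_uniform_theta(sample):
    """Method-of-moments estimate for Uniform(0, theta): theta_hat = 2 * Ybar."""
    return 2 * sum(sample) / len(sample)

rng = random.Random(42)
theta_true = 5.0  # hypothetical true value, used only to simulate data
sample = [rng.uniform(0, theta_true) for _ in range(10_000)]
theta_hat = mom_uniform_theta(sample)
print(f"theta_hat = {theta_hat:.3f}, largest observation = {max(sample):.3f}")
```

One known weakness worth keeping in mind: in small samples 2Ȳ can fall below the largest observation, even though θ can never be smaller than any observed value.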
Method of Maximum Likelihood
The likelihood and log-likelihood functions are the basis for deriving estimators for parameters, given data. While the shapes of these two functions are different, they have their maximum point at the same value, because the logarithm is a strictly increasing function. In fact, the value of the parameter that corresponds to this maximum point is defined as the Maximum Likelihood Estimate (MLE). This is the value that is "most likely" relative to the other values. The maximum likelihood estimate of the unknown parameter in the model is the value that maximizes the log-likelihood, given the data.
Method of Maximum Likelihood
Using calculus, one can take the first partial derivative of the likelihood or log-likelihood function with respect to the parameter(s), set it to zero, and solve for the parameter(s). This solution gives the MLE(s), provided it is a maximum rather than a minimum or saddle point.
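Method of Maximum Likelihood
If x is a continuous random variable with pdf
$$f(x; \theta_1, \theta_2, \ldots, \theta_k),$$
where $\theta_1, \theta_2, \ldots, \theta_k$ are k unknown constant parameters which need to be estimated, conduct an experiment and obtain N independent observations, $x_1, x_2, \ldots, x_N$. Then the likelihood function is given by the following product:
$$L(x_1, x_2, \ldots, x_N \mid \theta_1, \ldots, \theta_k) = \prod_{i=1}^{N} f(x_i; \theta_1, \theta_2, \ldots, \theta_k).$$
The logarithmic likelihood function is given by:
$$\Lambda = \ln L = \sum_{i=1}^{N} \ln f(x_i; \theta_1, \theta_2, \ldots, \theta_k).$$
The maximum likelihood estimators (MLE) of $\theta_1, \theta_2, \ldots, \theta_k$ are obtained by maximizing L or Λ.
By maximizing Λ, which is much easier to work with than L, the maximum likelihood estimators of $\theta_1, \theta_2, \ldots, \theta_k$ are the simultaneous solutions of the k equations
$$\frac{\partial \Lambda}{\partial \theta_j} = 0, \quad j = 1, 2, \ldots, k.$$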
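The same recipe can be carried out numerically when the likelihood equations have no closed-form solution. As a sketch, assuming an exponential model $f(x; \lambda) = \lambda e^{-\lambda x}$ (so k = 1) and a few hypothetical observations, the code maximizes the log-likelihood Λ(λ) = N ln λ − λΣxᵢ by golden-section search and compares the result with the closed-form solution of ∂Λ/∂λ = N/λ − Σxᵢ = 0, namely $\hat\lambda = N/\sum x_i$.

```python
import math

def exp_log_likelihood(lam, xs):
    """Lambda = ln L = N ln(lam) - lam * sum(x) for iid Exponential(rate = lam)."""
    return len(xs) * math.log(lam) - lam * sum(xs)

def golden_section_max(f, lo, hi, tol=1e-9):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):          # maximum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                    # maximum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2

data = [0.8, 2.1, 0.3, 1.7, 0.9]  # hypothetical observations
lam_numeric = golden_section_max(lambda lam: exp_log_likelihood(lam, data), 1e-6, 50.0)
lam_closed = len(data) / sum(data)  # root of dLambda/dlam = N/lam - sum(x) = 0
print(f"numerical MLE = {lam_numeric:.6f}, closed form = {lam_closed:.6f}")
```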
Properties of Maximum Likelihood Estimators
For "large" samples ("asymptotically"), MLEs are optimal:
1. MLEs are asymptotically normally distributed.
2. MLEs are asymptotically "minimum variance."
3. MLEs are asymptotically unbiased (MLEs are often biased, but the bias → 0 as n → ∞).
4. MLEs are consistent.
Under standard regularity conditions, the Maximum Likelihood Estimator is therefore asymptotically the most efficient estimator among all the unbiased ones.
Maximum likelihood estimation represents the backbone of statistical estimation.
Example
Suppose. Find the MLE of p.
The likelihood is
and the log-likelihood is
Taking derivatives and solving, we find
Example
Suppose . Find the MLE of .
The likelihood is
and the log-likelihood is
Maximizing this equation,
Method of Least Squares
A statistical technique to determine the line of best fit for a model. The least squares method fits an equation with certain parameters to observed data.
This method is extensively used in regression analysis and estimation.
Ordinary least squares: a straight line is fitted through a number of points so as to minimize the sum of the squares of the vertical distances (hence the name "least squares") from the points to this line of best fit.
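Method of Least Squares
Define the distance of the i-th data point from the fitted line $\hat{y} = a + bx$, denoted by $u_i$, as follows:
$$u_i = y_i - (a + b x_i).$$
The least squares estimates a and b minimize $\sum_{i=1}^{n} u_i^2$; setting the partial derivatives with respect to a and b to zero gives
$$b = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}, \qquad a = \bar{y} - b\bar{x}.$$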
Example: Method of Least Squares
To illustrate the computations of b and a, refer to the following data. All the sums required are computed and shown here:
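The sums for the slide's own data are not reproduced here, so the sketch below uses made-up data to show how b and a follow from the required sums (n, Σx, Σy, Σxy, Σx²).

```python
def least_squares_line(xs, ys):
    """Fit y = a + b x by ordinary least squares (minimizes the sum of u_i^2)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # b = (n Sxy - Sx Sy) / (n Sxx - Sx^2),  a = ybar - b * xbar
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b

# Hypothetical data, not the data from the slide
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares_line(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")  # a = 0.050, b = 1.990
```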
Interval Estimation
A point estimate of the parameter alone is not sufficient. It is necessary to analyse how confident we can be about this particular estimate. One way of doing so is to define confidence intervals. If we have estimated μ, we want to know whether the "true" parameter is close to our estimate. In other words, we want to find an interval that satisfies the following relation:
$$P\{L < \mu < U\} \ge 1 - \alpha$$
I.e., the probability that the "true" parameter lies in the random interval (L, U) is at least 1 − α. The actual realisation of this interval, (L, U), is called a 100(1 − α)% confidence interval; the limits of the interval are called the lower and upper confidence limits, and 1 − α is called the confidence level.
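Example: If the population variance σ² is known and we estimate the population mean μ, then
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \text{ is } N(0, 1).$$
We can find from the table that the probability that Z is greater than 1 is 0.1587, and the probability that Z is less than −1 is again 0.1587. These values come from the tables of the standard normal distribution. Hence $P(-1 < Z < 1) = 1 - 2(0.1587) \approx 0.683$, so $\bar{x} \pm \sigma/\sqrt{n}$ is approximately a 68.3% confidence interval for μ.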
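A sketch of the same calculation in code, using Python's `statistics.NormalDist` in place of printed tables; the values x̄ = 50, σ = 10, n = 25 are hypothetical. It reproduces the tail probability 0.1587 and builds a 95% interval $\bar{x} \pm z_{0.975}\,\sigma/\sqrt{n}$.

```python
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """100*conf % CI for mu when sigma is known: xbar +/- z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # 1.96 for conf = 0.95
    half_width = z * sigma / n ** 0.5
    return xbar - half_width, xbar + half_width

print(round(NormalDist().cdf(-1.0), 4))  # 0.1587, the table value P(Z < -1)
print(z_interval(xbar=50.0, sigma=10.0, n=25))  # hypothetical inputs
```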
Interval Estimation
Confidence Interval, Credible Interval, and Prediction Interval
Confidence intervals are one method of interval estimation, and the most widely used in Classical statistics. An analogous concept in Bayesian statistics is the credible interval, while an alternative used in both Classical and Bayesian statistics is the prediction interval, which, rather than estimating parameters, estimates the outcome of future samples.
An interval estimator of the population mean can be expressed as the probability that the mean lies between two values. Interval estimation ("confidence interval") uses a range of numbers (lower bound, upper bound) within which the parameter is believed to fall, e.g. (10, 20).
Interval Estimation for the mean of a Normal Distribution
Confidence Interval
The simplest and most commonly used formula for a binomial confidence interval relies on approximating the binomial distribution with a normal distribution. This approximation is justified by the central limit theorem. The formula is
$$\hat{p} \pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},$$
where $\hat{p}$ is the proportion of successes in a Bernoulli trial process estimated from the statistical sample, $z_{1-\alpha/2}$ is the $1 - \alpha/2$ quantile of a standard normal distribution, α is the error rate, and n is the sample size. For example, for a 95% confidence level the error (α) is 5%, so $1 - \alpha/2 = 0.975$ and $z_{1-\alpha/2} = 1.96$.
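A minimal sketch of this normal-approximation (Wald) interval; the counts, 40 successes in 100 trials, are hypothetical.

```python
from statistics import NormalDist

def wald_interval(successes, n, conf=0.95):
    """Normal-approximation interval: p_hat +/- z * sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{1 - alpha/2}, 1.96 for 95%
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width

print(wald_interval(40, 100))  # roughly (0.304, 0.496)
```

This approximation can behave poorly when n is small or $\hat p$ is near 0 or 1; for example, $\hat p = 0$ gives an interval of zero width.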
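Exponential Distribution
The 100(1 − α)% exact confidence interval for this estimate is given by[2]
$$\frac{2n}{\hat{\lambda}\,\chi^2_{\alpha/2,\,2n}} < \frac{1}{\lambda} < \frac{2n}{\hat{\lambda}\,\chi^2_{1-\alpha/2,\,2n}}$$
which is equivalent to:
$$\frac{\hat{\lambda}\,\chi^2_{1-\alpha/2,\,2n}}{2n} < \lambda < \frac{\hat{\lambda}\,\chi^2_{\alpha/2,\,2n}}{2n}$$
where $\hat{\lambda} = 1/\bar{x}$ is the MLE, λ is the true value of the parameter, and $\chi^2_{p,\nu}$ is the 100(1 − p) percentile of the chi-squared distribution with ν degrees of freedom.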
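The interval can be computed without statistical tables because 2n is always even: for even ν = 2m the chi-squared CDF has the closed form $1 - e^{-x/2}\sum_{k=0}^{m-1}(x/2)^k/k!$, which the sketch inverts by bisection. The helper names are illustrative, and the data are hypothetical. In the code, `chi2_quantile_even_df(q, df)` returns the point with lower-tail probability q, so $\chi^2_{1-\alpha/2,\,2n}$ in the 100(1 − p) percentile notation corresponds to q = α/2, and $\chi^2_{\alpha/2,\,2n}$ to q = 1 − α/2.

```python
import math

def chi2_cdf_even_df(x, df):
    """Chi-squared CDF, valid only for even df = 2m:
    P(X <= x) = 1 - exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!"""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 0.0
    half, term, total = x / 2.0, 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # (x/2)^k / k!, built incrementally
        total += term
    return 1.0 - math.exp(-half) * total

def chi2_quantile_even_df(q, df):
    """Point with lower-tail probability q, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even_df(hi, df) < q:
        hi *= 2.0  # expand until the quantile is bracketed
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even_df(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def exponential_rate_ci(data, alpha=0.05):
    """Exact CI for the exponential rate, using 2 * lambda * sum(x) ~ chi2(2n)."""
    n = len(data)
    lam_hat = n / sum(data)                                # MLE = 1 / xbar
    lower_q = chi2_quantile_even_df(alpha / 2, 2 * n)      # small quantile
    upper_q = chi2_quantile_even_df(1 - alpha / 2, 2 * n)  # large quantile
    return lam_hat, lam_hat * lower_q / (2 * n), lam_hat * upper_q / (2 * n)

data = [0.8, 2.1, 0.3, 1.7, 0.9]  # hypothetical failure times
lam_hat, lower, upper = exponential_rate_ci(data)
print(f"MLE = {lam_hat:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

The closed-form CDF keeps the sketch dependency-free; for large n a library chi-squared quantile function would be the more robust choice.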
Bayesian Estimation
Bayesian statistics views every unknown as a random quantity. Bayesian statistics is a little more complicated in the simple cases than computing the Maximum Likelihood Estimate. Suppose we have data from a distribution with an unknown parameter θ. Our goal is to estimate the unknown θ. The first step in Bayesian statistics is to select a prior distribution, π(θ), intended to represent prior information about θ. Often, no such information is available. In this case, the prior should be relatively diffuse. For example, if we are trying to guess the average height (in feet) of students at RU, we may know enough to realize that most students are between 5 and 6 feet tall, and therefore the mean should be between 5 and 6 feet, but we may not want to be more specific than that. We wouldn't, for example, want to specify a prior concentrated tightly around 5.6 feet. Even though 5.6 feet may be a good guess, such a prior places almost all its mass between 5.599995 and 5.600005 feet, indicating we are almost sure, before seeing any data, that the mean height is in this range. I'm personally not that sure, so I might choose a much more diffuse prior, such as θ ~ Uniform(5, 6), indicating that I'm sure the mean height is between 5 and 6 feet but every value in there seems about as likely as any other.
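Prior and Posterior Distribution
The tool for guessing at the parameter's value using prior knowledge of the parameter together with the data is called the posterior distribution, which is defined as the conditional distribution of the parameter given the data, formally
$$\pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{\int f(x \mid \theta)\,\pi(\theta)\,d\theta},$$
where $f(x \mid \theta)$ is the likelihood function.
The posterior is a distribution over θ and has all the usual properties of a distribution. In particular,
$$\int \pi(\theta \mid x)\,d\theta = 1.$$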
Prior and Posterior Distribution
1. Although not guaranteed, in almost all practical situations the posterior distribution provides a more refined guess of θ than the prior. We are combining our prior information with the information contained in the data to make better guesses about θ.
2. If we observe a large amount of data, the posterior distribution is determined almost exclusively by the data, and tends to place more and more mass near the true value of θ. Thus, we don't have to be too precise about specifying our prior distribution in advance. Any errors will tend to wash out as we observe more data.
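Point 2 can be seen in the simplest conjugate case, assuming a binomial likelihood with a Beta(a, b) prior, for which the posterior is Beta(a + x, b + n − x). The sketch compares a flat Beta(1, 1) prior with a deliberately opinionated Beta(20, 2) prior on hypothetical data with a 30% success rate: the two posterior means disagree sharply at n = 10 but almost coincide at n = 10,000.

```python
def beta_binomial_posterior_mean(a, b, successes, n):
    """Posterior mean of p under a Beta(a, b) prior with binomial data;
    the posterior is Beta(a + successes, b + n - successes)."""
    return (a + successes) / (a + b + n)

priors = {"flat Beta(1, 1)": (1, 1), "opinionated Beta(20, 2)": (20, 2)}
for n, successes in [(10, 3), (10000, 3000)]:  # hypothetical data, 30% successes
    means = {name: round(beta_binomial_posterior_mean(a, b, successes, n), 4)
             for name, (a, b) in priors.items()}
    print(n, means)
```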
Properties of Posterior Mean
The Bayes estimate of a parameter (under squared-error loss) is the posterior mean. Usually the posterior distribution will have some common distributional form (such as Gamma, Normal, Beta, etc.). Some things to remember about the posterior mean:
• The data only enter the equation for the posterior in terms of the likelihood function. Therefore, the parameters of the posterior distribution, and hence the posterior mean, are functions of the sufficient statistics.
• Often the posterior mean has lower MSE than the MLE for portions of the parameter space, so it's a worthwhile estimator to consider and compare to the MLE.
• The posterior mean is consistent, asymptotically unbiased (meaning the bias tends to 0 as the sample size increases), and the asymptotic efficiency of the MLE compared to the posterior mean is 1. In fact, for large n the MLE and posterior mean are very similar estimators, as we will see in the examples.
Example: Geometric
Suppose we wished to use a general prior. We would like a formula for the posterior in terms of and . We proceed as before, finding the prior density to be
The likelihood is unchanged, so the product of the prior and likelihood simplifies to
The prior parameters and are treated as fixed constants. Thus the Gamma functions in front may be considered part of the normalizing constant C, leaving the kernel
This is the kernel of a distribution, with posterior mean
Example : Binomial
Example: Poisson
Let . Suppose you have a prior on . Compute the posterior distribution of .
As stated above, our first goal is to compute and simplify the product of the likelihood and prior. If the data are , then the likelihood is
and the prior density is
The posterior distribution
which simplifies to
Example: Poisson
All the , , and are constants since is the only random quantity in this expression. The terms that involve are
Hence the posterior distribution is
distribution.
The Bayes estimate is the posterior mean. The posterior mean of a
distribution is
Notice that
The only terms that get large as n increases are $\sum x_i$ and n. Thus, for large n, the posterior mean is approximately $\bar{x} = \sum x_i / n$, the MLE.
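The Poisson example's prior did not survive extraction; the sketch below assumes the standard conjugate choice, a Gamma prior with shape a and rate b on the Poisson mean, under which the posterior is Gamma(a + Σxᵢ, b + n) with mean (a + Σxᵢ)/(b + n). The prior values a = 2, b = 1 and the counts are hypothetical. Replicating the same counts 1,000 times keeps x̄ = 3 fixed while n grows, and the posterior mean moves toward that MLE.

```python
def poisson_gamma_posterior_mean(a, b, xs):
    """Assumed setup: Gamma(shape=a, rate=b) prior on the Poisson mean.
    Posterior: Gamma(a + sum(xs), b + n); its mean is returned."""
    return (a + sum(xs)) / (b + len(xs))

a, b = 2.0, 1.0             # hypothetical prior parameters
xs_small = [2, 4, 3, 1, 5]  # hypothetical counts, xbar = 3
xs_large = xs_small * 1000  # same xbar, n = 5000
post_small = poisson_gamma_posterior_mean(a, b, xs_small)
post_large = poisson_gamma_posterior_mean(a, b, xs_large)
for xs, post in [(xs_small, post_small), (xs_large, post_large)]:
    mle = sum(xs) / len(xs)
    print(f"n = {len(xs):5d}: posterior mean = {post:.4f}, MLE = {mle:.4f}")
```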
Example: Normal
Bayesian Interval Estimation
Prediction
Predictive Distribution : Binomial-Beta
Predictive Density : Normal-Normal