statistical estimation

Biostatistics: Statistical Estimation

Dr Remya.G

Statistics

STATISTICAL ANALYSIS

• Descriptive
  – Numerical
  – Graphical
  (each univariate or multivariate)
• Inferential
  – Estimation: point estimate, interval estimate
  – Hypothesis testing: univariate, multivariate

Inferential statistics

The part of statistics that allows researchers to generalize their findings to a larger population beyond data from the sample collected.

Two ways to make inference

– Estimation of parameters
  * Point estimation
  * Interval estimation
– Hypothesis testing

Basic terminology

• Parameter – a number that describes a characteristic of the population (mean, SD, variance, etc.)

• Statistic – a number that describes a characteristic of the scores in the sample (mean, variance, SD, correlation coefficient, etc.)

                      Parameter                  Statistic
                      (from entire population)   (from sample)
Mean:                 μ                          x̄
Standard deviation:   σ                          s
Proportion:           p                          p̂

Basic Logic

• Information from samples is used to estimate information about the population.

• Statistics are used to estimate parameters.

[Diagram: a SAMPLE is drawn from the POPULATION; the sample STATISTIC is used to estimate the population PARAMETER.]

Estimation

The process by which one makes inferences about a population, based on information obtained from a sample.

• Point estimate
• Interval estimate

Point estimate

• A point estimate is a single value that estimates a parameter directly and serves as a "best guess" or "best estimate" of an unknown population parameter

• sample proportion p̂ (“p hat”) is the point estimate of p

• sample mean x̄ (“x bar”) is the point estimate of μ

• sample standard deviation s is the point estimate of σ

Problem

• In a health survey of 55 school boys, it was found that the mean hemoglobin level was 10.2 g per 100 ml with a standard deviation of 2.1. Estimate the mean hemoglobin level of the population of such school boys.

Point estimate of the population mean is 10.2 g per 100 ml.

Disadvantages of point estimates

Point estimates do not provide information about sample-to-sample variability:

• How precise is x̄ as an estimate of μ?
• How much can we expect x̄ to vary from μ?

Sampling distribution of the mean

[Figure: sampling distribution of the sample mean x̄]

Sampling Distribution

• Sampling Distribution: A theoretical distribution that shows the frequency of occurrence of values of some statistic computed for all possible samples of size N drawn from some population.

• Sampling Distribution of the Mean: A theoretical distribution of the frequency of occurrence of values of the mean computed for all possible samples of size N from a population

[Figure: sampling distribution of the mean as N increases]

Central Limit Theorem

States that the sampling distribution of means, for samples of 30 or more:
– Is approximately normally distributed (regardless of the shape of the population from which the samples were drawn)
– Has a mean equal to the population mean, “mu” (μ), regardless of the shape of the population or the size of the sample
– Has a standard deviation, the standard error of the mean, equal to the population standard deviation divided by the square root of the sample size (σ/√N)
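The three statements above can be checked with a small simulation. This is only a sketch: the skewed population, the seeds, the number of draws and the helper name `simulate_sample_means` are illustrative assumptions, not part of the lecture.

```python
import random
import statistics

def simulate_sample_means(population, n, draws=5000, seed=0):
    """Draw `draws` random samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(draws)]

# Hypothetical, strongly skewed population (exponential, mean about 10).
pop_rng = random.Random(1)
population = [pop_rng.expovariate(1 / 10) for _ in range(50_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

for n in (5, 30, 100):
    means = simulate_sample_means(population, n)
    print(
        f"n={n:>3}  mean of means={statistics.fmean(means):.2f} (mu={mu:.2f})  "
        f"SD of means={statistics.stdev(means):.2f}  "
        f"sigma/sqrt(n)={sigma / n ** 0.5:.2f}"
    )
```

For each n, the mean of the sample means should sit close to μ, and their standard deviation should sit close to σ/√n, shrinking as n grows.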

Square root law

Confidence interval

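A confidence interval (CI) is a range of values, computed from the sample data, that is expected to include the population parameter of interest with a stated level of confidence (for example 95%). The confidence level is the proportion of such intervals, over repeated sampling, that would include the parameter.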

FACTORS AFFECTING CONFIDENCE INTERVAL

Distribution of Means and Standard Error of the Means

[Figure: distribution of sample means centred on the population mean μ, with the horizontal axis marked at −3, −2, −1, +1, +2 and +3 SEM]

Confidence interval

Confidence limits

• The α (“alpha”) level represents the “lack of confidence”
• (1−α)100% represents the confidence level of a confidence interval
• Confidence interval = point estimate ± z_{1−α/2} × standard error
• z_{1−α/2} is used instead of z_{1−α} in this formula because the random error (imprecision) is split between the right and left tails
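As a rough illustration of how α maps to a z value, the sketch below uses Python's standard-library `statistics.NormalDist` in place of a printed z table. The function name `z_critical` is a hypothetical helper introduced here.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z_{1-alpha/2} for a confidence level such as 0.95."""
    alpha = 1 - confidence            # the "lack of confidence"
    # alpha is split between the two tails, so look up 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

The 95% value is the familiar 1.96. The 99% value is about 2.576, which slides usually round to 2.58 (or truncate to 2.57).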

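Z values for different confidence levels

[Figure: area under the normal curve]

Z table, 2 tailed

[Table: rows give z to the first decimal place, columns give the second decimal place, and each cell gives the area under the curve. Example: 1.96 = 1.9 + 0.06]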

Process for Constructing Confidence Intervals

• Compute the sample statistic (e.g. a mean)
• Compute the standard error of the mean
• Make a decision about the level of confidence that is desired (usually 95% or 99%)
• Find the tabled value for a 95% or 99% confidence interval
• Multiply the standard error of the mean by the tabled value
• Form the interval by adding and subtracting the calculated value to and from the mean

Problems



x̄ = 10.2, s = 2.1, n = 55
SE = s/√n = 2.1/√55 = 0.283
95% CI = 10.2 − 1.96 × 0.283 to 10.2 + 1.96 × 0.283 = 9.65 to 10.75
99% CI = 10.2 − 2.58 × 0.283 to 10.2 + 2.58 × 0.283 = 9.47 to 10.93
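The interval-construction steps can be sketched in code for the hemoglobin survey (mean 10.2, SD 2.1, n = 55). The function name `mean_ci` is a hypothetical helper. The z value is computed exactly rather than read from a table, so the last digit may differ slightly from hand calculation.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(mean, sd, n, confidence=0.95):
    """Large-sample confidence interval for a mean: mean +/- z * sd / sqrt(n)."""
    se = sd / sqrt(n)                                   # standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # "tabled" z value
    margin = z * se                                     # SE multiplied by the z value
    return mean - margin, mean + margin                 # subtract and add to the mean

for level in (0.95, 0.99):
    low, high = mean_ci(10.2, 2.1, 55, level)
    print(f"{level:.0%} CI: {low:.3f} to {high:.3f}")
```

Note that the 99% interval is wider than the 95% interval: more confidence costs precision.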

Problem

• In a survey on the hearing level of school children with normal hearing, it was found that at a frequency of 500 cycles per second, 62 children tested in the sound proof room had a mean hearing threshold of 15.5 dB with a standard deviation of 6.5. Another 76 comparable children who were tested in the field had a mean threshold of 20 dB with a standard deviation of 7.1. What is the 95% confidence interval for the difference in means?

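Here two independent samples are given: the sample tested in the sound proof room and the sample tested in the field.

The confidence interval of the difference in means = difference in means ± 1.96 × SE of difference in means

SE of difference in means = sqrt[s1²/n1 + s2²/n2] = sqrt[6.5²/62 + 7.1²/76] = 1.16

or, using the pooled SD,

SE of difference in means = Pooled SD × sqrt[1/n1 + 1/n2] = 6.84 × sqrt[1/62 + 1/76] = 1.17

Difference in means = 20 − 15.5 = 4.5

95% CI (using SE = 1.17) = 4.5 − 1.96 × 1.17 to 4.5 + 1.96 × 1.17 = 2.21 to 6.79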
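Both standard-error formulas can be compared on the hearing-threshold data with a short sketch. The function names are hypothetical helpers. The pooled SD is computed with the usual (n − 1)-weighted variance formula, which is an assumption about what the slide means by "Pooled SD".

```python
from math import sqrt

def se_diff_unpooled(s1, n1, s2, n2):
    """SE of the difference in means: sqrt(s1^2/n1 + s2^2/n2)."""
    return sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def se_diff_pooled(s1, n1, s2, n2):
    """SE of the difference in means: pooled SD * sqrt(1/n1 + 1/n2)."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)

# Sound proof room: n=62, mean 15.5, SD 6.5. Field: n=76, mean 20, SD 7.1.
diff = 20.0 - 15.5
se_unpooled = se_diff_unpooled(6.5, 62, 7.1, 76)
se_pooled = se_diff_pooled(6.5, 62, 7.1, 76)
low, high = diff - 1.96 * se_pooled, diff + 1.96 * se_pooled

print(f"unpooled SE = {se_unpooled:.2f}, pooled SE = {se_pooled:.2f}")
print(f"95% CI for the difference = {low:.2f} to {high:.2f}")
```

With samples this large and SDs this similar, the two standard errors differ only in the second decimal place.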

Problem

• In an otological examination of school children, out of 146 children examined 21 were found to have otological abnormalities. Find the 99% confidence interval for the proportion of children with otological abnormalities.

Answer

• p = 21 × 100/146 = 14.4%
• q = 100 − p = 85.6%
• 99% CI = p ± 2.57 × SE of proportion
• SE of proportion = √(pq/n)
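To carry the calculation through to numbers, here is a minimal sketch working in proportions (0 to 1) rather than percentages. `proportion_ci` is a hypothetical helper. It uses the exact normal quantile (about 2.576) instead of the rounded table value, so results may differ from hand calculation in the last digit.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(events, n, confidence=0.99):
    """Large-sample CI for a proportion: p +/- z * sqrt(p * q / n)."""
    p = events / n
    q = 1 - p
    se = sqrt(p * q / n)                                # SE of the proportion
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p, se, p - z * se, p + z * se

p, se, low, high = proportion_ci(21, 146, 0.99)
print(f"p = {p:.1%}, SE = {se:.1%}")
print(f"99% CI = {low:.1%} to {high:.1%}")
```

The interval comes out at roughly 7% to 22%.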

Problem

• Find the best estimate of the mean and 95% CI of the mean using the data

Sl no   Protein value
1       6
2       7
3       8
4       6
5       8
6       7
7       6
8       7
9       8
10      6

• Best estimate is the mean of the sample, x̄ = 6.9

• Interval estimate: 95% CI = x̄ ± t0.05 × SE of x̄

t0.05 is found from the t table with df = n − 1 = 9
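The small-sample interval for the protein data can be sketched as follows. The t value 2.262 (two-tailed 5%, df = 9) is entered by hand as it would be read from a t table, because the Python standard library has no t distribution. The variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

protein = [6, 7, 8, 6, 8, 7, 6, 7, 8, 6]
T_CRIT_DF9 = 2.262   # two-tailed 5% t value for df = 9, taken from a t table

n = len(protein)
x_bar = mean(protein)             # best (point) estimate of the mean
se = stdev(protein) / sqrt(n)     # stdev() uses n - 1 in the denominator
margin = T_CRIT_DF9 * se
low, high = x_bar - margin, x_bar + margin

print(f"mean = {x_bar:.1f}, SE = {se:.3f}")
print(f"95% CI = {low:.2f} to {high:.2f}")
```

Because n is only 10, the t value (2.262) replaces 1.96, which makes the interval somewhat wider than a z-based interval would be.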

• If two independent samples are given, each with a sample size of less than 30, and the CI of the difference in means is to be found:

• CI = difference in means ± t0.05 × SE of difference in means

t0.05 is found from the t table with df = n1 + n2 − 2

SE of difference in means = pooled SD × sqrt[1/n1 + 1/n2], using n − 1 for each sample in the equation for the pooled SD

T table

What is a Hypothesis?

An assumption about the population parameter.

Example: I assume the mean SBP of the population is 120 mmHg.

How can you test this hypothesis?

Hypothesis testing – Next Class

Thank you
