statistical estimation
Biostatistics: Statistical Estimation
Dr Remya.G
Statistics
STATISTICAL ANALYSIS
• Descriptive
  – Numerical (univariate, multivariate)
  – Graphical (univariate, multivariate)
• Inferential
  – Estimation (point estimate, interval estimate)
  – Hypothesis testing (univariate, multivariate)
Inferential statistics
The part of statistics that allows researchers to generalize their findings to a larger population beyond data from the sample collected.
Two ways to make inference
– Estimation of parameters
  * Point estimation
  * Interval estimation
– Hypothesis testing
Basic terminology
• Parameter: a number that describes a characteristic of the population (mean, standard deviation, variance, etc.)
• Statistic: a number that describes a characteristic of the scores in the sample (mean, variance, standard deviation, correlation coefficient, etc.)
                      Parameter (from entire population)   Statistic (from sample)
Mean:                 μ                                     x̄
Standard deviation:   σ                                     s
Proportion:           p                                     p̂
Basic Logic
• Information from samples is used to estimate information about the population.
• Statistics are used to estimate parameters.
[Diagram: a SAMPLE is drawn from the POPULATION; the sample STATISTIC is used to estimate the population PARAMETER]
Estimation
The process by which one makes inferences about a population, based on information obtained from a sample.
• Point estimate
• Interval estimate
Point estimate
• A point estimate is a single value that estimates a parameter directly; it serves as a "best guess" or "best estimate" of an unknown population parameter
• sample proportion p̂ ("p hat") is the point estimate of p
• sample mean x̄ ("x bar") is the point estimate of μ
• sample standard deviation s is the point estimate of σ
Problem
• In a health survey of 55 school boys, the mean hemoglobin level was found to be 10.2 g per 100 ml with a standard deviation of 2.1 g per 100 ml. Estimate the mean hemoglobin level of the population of such school boys.
The point estimate of the population mean is 10.2 g per 100 ml.
Disadvantages of point estimates
Point estimates do not provide information about sample-to-sample variability:
• How precise is x̄ as an estimate of μ?
• How much can we expect x̄ to vary from μ?
Sampling distribution of the mean
[Figure: sample means x̄ obtained from repeated samples]
Sampling Distribution
• Sampling Distribution: A theoretical distribution that shows the frequency of occurrence of values of some statistic computed for all possible samples of size N drawn from some population.
• Sampling Distribution of the Mean: A theoretical distribution of the frequency of occurrence of values of the mean computed for all possible samples of size N from a population
[Figure: sampling distribution of the mean as N increases]
Central Limit Theorem
States that the sampling distribution of means, for samples of 30 or more:
– Is approximately normally distributed (regardless of the shape of the population from which the samples were drawn)
– Has a mean equal to the population mean μ ("mu"), regardless of the shape of the population or the size of the sample
– Has a standard deviation, the standard error of the mean, equal to the population standard deviation divided by the square root of the sample size: SEM = σ / √N
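The three claims of the theorem can be checked numerically. The following is only a minimal sketch: the skewed (exponential) population, the seed, and the helper name `sample_means` are illustrative assumptions, not part of the lecture.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical, strongly skewed population (exponential, mean about 1).
population = [random.expovariate(1.0) for _ in range(100_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

def sample_means(n, repeats=2_000):
    """Return the means of `repeats` random samples of size n."""
    return [statistics.fmean(random.choices(population, k=n))
            for _ in range(repeats)]

for n in (5, 30, 100):
    means = sample_means(n)
    observed_se = statistics.stdev(means)   # spread of the sample means
    predicted_se = sigma / n ** 0.5         # sigma / sqrt(N)
    print(f"N={n:>3}  mean of means={statistics.fmean(means):.3f}  "
          f"(mu={mu:.3f})  observed SE={observed_se:.3f}  "
          f"sigma/sqrt(N)={predicted_se:.3f}")
```

For each N, the mean of the sample means should sit close to μ and their standard deviation close to σ / √N, even though the population itself is far from normal.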
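Square root law
• The standard error of the mean is inversely proportional to the square root of the sample size: quadrupling N halves the standard error.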
Confidence interval
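A confidence interval (CI) is a range of values, computed from the sample data, that is expected to include the population parameter of interest with a stated probability (the confidence level) over repeated sampling.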
FACTORS AFFECTING CONFIDENCE INTERVAL
[Figure: Distribution of Means and Standard Error of the Means; horizontal axis centred on the population mean μ and marked at ±1, ±2 and ±3 SEM]
Confidence interval
Confidence limits
• The α ("alpha") level represents the "lack of confidence"
• (1−α)100% represents the confidence level of a confidence interval
• Confidence interval = point estimate ± z_{1−α/2} × standard error
• z_{1−α/2} is used instead of z_{1−α} in this formula because the random error (imprecision) is split between the right and left tails
Z values for different confidence levels
[Table: two-tailed Z table giving the area under the curve; rows give z to the first decimal place and columns give the second decimal place, e.g. 1.96 = 1.9 + 0.06]
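The tabled values can also be obtained without a printed Z table. This is a small sketch using Python's standard library; the helper name `z_critical` is an illustrative assumption.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-tailed critical value z_{1-alpha/2} for a level such as 0.95."""
    alpha = 1.0 - confidence
    # Half of alpha goes in each tail, so look up the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
# 90% confidence: z = 1.645
# 95% confidence: z = 1.960
# 99% confidence: z = 2.576
```

The worked problems in these slides use 1.96 for 95% and the slightly truncated 2.57 for 99%.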
Process for Constructing Confidence Intervals
• Compute the sample statistic (e.g. a mean)
• Compute the standard error of the mean
• Make a decision about the level of confidence that is desired (usually 95% or 99%)
• Find the tabled value for the 95% or 99% confidence interval
• Multiply the standard error of the mean by the tabled value
• Form the interval by adding and subtracting the calculated value to and from the mean
Problems
• In a health survey of 55 school boys, the mean hemoglobin level was found to be 10.2 g per 100 ml with a standard deviation of 2.1 g per 100 ml. Estimate the mean hemoglobin level of the population of such school boys.
x̄ = 10.2, s = 2.1, n = 55
SE = s / √n = 2.1 / √55 = 0.283
95% CI = 10.2 − 1.96 × 0.283 to 10.2 + 1.96 × 0.283 = 9.65 to 10.75
99% CI = 10.2 − 2.57 × 0.283 to 10.2 + 2.57 × 0.283 = 9.47 to 10.93
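The arithmetic for this problem can be reproduced by following the steps for constructing a confidence interval. The sketch below assumes a large-sample (z-based) interval computed from summary statistics; the function name `mean_ci` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(mean, sd, n, confidence=0.95):
    """Large-sample (z-based) confidence interval for a mean."""
    se = sd / sqrt(n)                                   # standard error
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # tabled value
    margin = z * se                                     # z times SE
    return mean - margin, mean + margin

# Hemoglobin survey: mean 10.2, SD 2.1, n = 55.
low95, high95 = mean_ci(10.2, 2.1, 55, confidence=0.95)
low99, high99 = mean_ci(10.2, 2.1, 55, confidence=0.99)

print(f"SE     = {2.1 / sqrt(55):.3f}")
print(f"95% CI = {low95:.2f} to {high95:.2f}")
print(f"99% CI = {low99:.2f} to {high99:.2f}")
```

Run on the survey figures, this gives a standard error of about 0.283 and intervals of roughly 9.65 to 10.75 (95%) and 9.47 to 10.93 (99%).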
Problem
• In a survey of the hearing level of schoolchildren with normal hearing, at a frequency of 500 cycles per second, 62 children tested in a sound-proof room had a mean hearing threshold of 15.5 dB with a standard deviation of 6.5 dB. Another 76 comparable children who were tested in the field had a mean threshold of 20 dB with a standard deviation of 7.1 dB. What is the 95% confidence interval for the difference in means?
Here two independent samples are given: the sample tested in the sound-proof room and the sample tested in the field.
Confidence interval of the difference in means = difference in means ± 1.96 × SE of difference in means
Difference in means = 20 − 15.5 = 4.5
SE of difference in means = pooled SD × sqrt[1/n1 + 1/n2] = 1.17
(the unpooled form, sqrt[s1²/n1 + s2²/n2], gives 1.16)
95% CI = 4.5 − 1.96 × 1.17 to 4.5 + 1.96 × 1.17 = 2.21 to 6.79
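Both forms of the standard error can be compared on the figures from this problem. The sketch below assumes the pooled SD is weighted by (n − 1), which the slides state only for small samples; the variable names are illustrative.

```python
from math import sqrt

# Summary statistics from the problem (hearing threshold in dB).
n1, mean1, sd1 = 62, 15.5, 6.5   # tested in the sound-proof room
n2, mean2, sd2 = 76, 20.0, 7.1   # tested in the field
z = 1.96                         # tabled value for 95% confidence

diff = mean2 - mean1

# Pooled SD (assumption: n - 1 weights, equal population variances).
pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
se_pooled = pooled_sd * sqrt(1 / n1 + 1 / n2)

# Unpooled form: sqrt(s1^2/n1 + s2^2/n2).
se_unpooled = sqrt(sd1**2 / n1 + sd2**2 / n2)

low, high = diff - z * se_pooled, diff + z * se_pooled
print(f"difference     = {diff:.1f}")
print(f"SE (pooled)    = {se_pooled:.2f}")
print(f"SE (unpooled)  = {se_unpooled:.2f}")
print(f"95% CI         = {low:.2f} to {high:.2f}")
```

With samples this large the two standard errors differ only in the second decimal place (about 1.17 versus 1.16), and the pooled form gives an interval of about 2.21 to 6.79.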
Problem
• In an otological examination of school children, 21 of the 146 children examined were found to have otological abnormalities. Find the 99% confidence interval for the proportion of children with otological abnormalities.
Answer
• p = 21 × 100 / 146 = 14.4%
• q = 100 − p = 85.6%
• SE of proportion = √(pq/n) = √(14.4 × 85.6 / 146) ≈ 2.9%
• 99% CI = p ± 2.57 × SE of proportion = 14.4 ± 2.57 × 2.9, roughly 7% to 22%
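The same calculation can be carried out on unrounded proportions. This is a minimal sketch of the large-sample interval for a proportion, using the 2.57 multiplier from the slides; the function name `proportion_ci` is an illustrative assumption.

```python
from math import sqrt

def proportion_ci(events, n, z):
    """Large-sample confidence interval for a proportion.

    Returns (p, se, lower, upper) as fractions between 0 and 1.
    """
    p = events / n
    q = 1 - p
    se = sqrt(p * q / n)          # standard error of the proportion
    return p, se, p - z * se, p + z * se

# 21 of 146 children had otological abnormalities; z = 2.57 for 99%.
p, se, low, high = proportion_ci(21, 146, z=2.57)
print(f"p  = {p:.2%}")
print(f"SE = {se:.2%}")
print(f"99% CI = {low:.2%} to {high:.2%}")
```

Working with fractions and converting to percentages only when printing avoids the rounding drift that creeps in when p and q are rounded before the standard error is computed.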
Problem
• Find the best estimate of the mean and 95% CI of the mean using the data
Sl no Protein value
1 6
2 7
3 8
4 6
5 8
6 7
7 6
8 7
9 8
10 6
• Best estimate is the sample mean, x̄ = 6.9
• Interval estimate: 95% CI = x̄ ± t0.05 × SE of x̄
• t0.05 is found from the t table with df = n − 1 = 9
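The interval for this small sample can be computed as follows. This is a sketch only: the t value 2.262 (two-tailed, α = 0.05, df = 9) is taken from a standard t table and hard-coded, since Python's standard library has no t-distribution quantile function.

```python
from math import sqrt
from statistics import mean, stdev

# Protein values from the table (n = 10).
protein = [6, 7, 8, 6, 8, 7, 6, 7, 8, 6]

n = len(protein)
x_bar = mean(protein)      # best (point) estimate of the mean
s = stdev(protein)         # sample SD, uses the n - 1 denominator
se = s / sqrt(n)           # standard error of the mean

T_05_DF9 = 2.262           # assumption: two-tailed t table value, df = 9

low, high = x_bar - T_05_DF9 * se, x_bar + T_05_DF9 * se
print(f"mean = {x_bar:.1f}, SD = {s:.3f}, SE = {se:.3f}")
print(f"95% CI = {low:.2f} to {high:.2f}")
```

Because n is below 30, the t value (2.262) replaces the z value (1.96), which widens the interval to reflect the extra uncertainty in estimating σ from a small sample.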
• If two independent samples with sample sizes less than 30 are given and the CI for the difference in means is required:
• CI = difference in means ± t0.05 × SE of difference in means
• t0.05 is found from the t table with df = n1 + n2 − 2
• SE of difference in means: use n − 1 in the equation for the pooled SD
T table
What is a Hypothesis?
An assumption about the population parameter.
Example: "I assume the mean SBP of the population is 120 mmHg."
How can you test this hypothesis?
Hypothesis testing – Next Class
Thank you