Post on 13-Jan-2016
For 95 out of 100 (large) samples, the interval
$\bar{x} \pm 1.96\,\dfrac{\sigma}{\sqrt{n}}$
will contain the true population mean.
But we don’t know $\sigma$?!
Inference for the Mean of a Population
To estimate $\mu$, we use a confidence interval around $\bar{x}$.
The confidence interval is built with $\sigma$, which we replace with s (the sample standard deviation) if $\sigma$ is not known.
$\bar{x} \pm 1.96\,\dfrac{\sigma}{\sqrt{n}}$
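As a quick check on where the 1.96 comes from, here is a minimal Python sketch using only the standard library. The function name `z_interval` and the inputs passed to it are hypothetical, chosen only for illustration, and the sketch assumes $\sigma$ is known.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, level=0.95):
    """Large-sample interval xbar +/- z* sigma/sqrt(n), assuming sigma is known."""
    # z* leaves area `level` in the middle of the standard Normal curve
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# The 95% critical value of the standard Normal distribution
z_star_95 = NormalDist().inv_cdf(0.975)

# Made-up inputs, for illustration only
low, high = z_interval(xbar=50.0, sigma=10.0, n=100)
print(round(z_star_95, 2), (round(low, 2), round(high, 2)))
```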
t-distributions
$\dfrac{s}{\sqrt{n}}$
The “standard error” of $\bar{x}$.
$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
For an SRS of size n from a Normal population, the one-sample t-statistic has the t-distribution with n-1 degrees of freedom.
(see Table D)
t-distributions with k (= n-1) degrees of freedom
– are labeled t(k),
– are symmetric around 0,
– are bell-shaped,
– but have more variability than Normal distributions, due to the substitution of s in the place of $\sigma$.
Example: Estimating the level of vitamin C
Data: 26 31 23 22 11 22 14 31
Find a 95% confidence interval for $\mu$.
A: ( , )
Write it as “estimate plus margin of error”
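One way to work through this example is sketched below in Python (standard library only). The critical value t* = 2.365 is hard-coded on the assumption that it is read from a standard t table for 95% confidence and 7 degrees of freedom; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

data = [26, 31, 23, 22, 11, 22, 14, 31]  # vitamin C data from the slide

n = len(data)
xbar = mean(data)    # sample mean
s = stdev(data)      # sample standard deviation (divides by n - 1)
se = s / sqrt(n)     # standard error of the mean

# Assumption: t* for 95% confidence with n - 1 = 7 degrees of freedom,
# taken from a standard t table rather than computed here.
t_star = 2.365
margin = t_star * se

print(f"estimate +/- margin: {xbar:.1f} +/- {margin:.1f}")
print(f"interval: ({xbar - margin:.2f}, {xbar + margin:.2f})")
```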
STATA Exercise 1
STATA Exercise 2
STATA Exercises 3 and 4
Paired, unpaired tests
“Paired” tests compare each individual’s values on two variables and ask whether the mean difference (“gain” in this example) is zero.
H0: mean(pretest - posttest) = mean(diff) = 0
STATA Exercise 5
STATA Exercise 6
Robustness of t procedures
t-tests are appropriate for testing a hypothesis on a single mean only in these cases:
– If n<15: only if the data are Normally distributed (with no outliers or strong skewness)
– If n≥15: only if there are no outliers or strong skewness
– If n≥40: even if clearly skewed (because of the Central Limit Theorem)
Comparing Two Means
Suppose we make a change to the registration procedure. Does this reduce the number of mistakes?
Basically, we’re looking at two populations:
– the before-change population (population 1)
– the after-change population (population 2)
Is the mean number of mistakes (per student) different? Is $\mu_1 - \mu_2 = 0$ or $\neq 0$?
Notice that we are not matching pairs. We compare two groups.
Population   Variable   Mean        Standard Deviation
1            $x_1$      $\mu_1$     $\sigma_1$
2            $x_2$      $\mu_2$     $\sigma_2$
Population   Sample Size   Sample Mean     Sample Standard Deviation
1            $n_1$         $\bar{x}_1$     $s_1$
2            $n_2$         $\bar{x}_2$     $s_2$
The population, really, is every single student using each registration procedure, an infinite number of times.
– Suppose we get a “good” result today: how do we know it will be repeated tomorrow?
We can’t repeat the procedure an infinite number of times; we only have a “sample”: numbers from one year.
We estimate $(\mu_1 - \mu_2)$ with $(\bar{x}_1 - \bar{x}_2)$.
Remember $\bar{x}$ is a Random Variable. To estimate $\mu$ we need both $\bar{x}$ and the margin of error around $\bar{x}$, which is
$t^* \, \sigma_{\bar{x}}$
So we need to know $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$, or rather, the appropriate standard error for this estimation.
Because we are estimating a difference, we need the standard error of a difference.
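If the standard error for $\bar{x}_1$ is
$\sigma_{\bar{x}_1} = \dfrac{\sigma_1}{\sqrt{n_1}}$
then the standard error for $(\bar{x}_1 - \bar{x}_2)$ is
$\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
The two-sample t statistic is
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$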
This two-sample t statistic does not have exactly a t-distribution, because we are replacing two standard deviations by their sample equivalents. With the unequal option, STATA approximates its degrees of freedom with the Satterthwaite approximation by default.
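The unequal-variance statistic and the Satterthwaite degrees of freedom can be sketched from summary statistics as follows. This is a minimal Python illustration, not STATA's implementation; the function name `welch_t` and the summary numbers are hypothetical.

```python
from math import sqrt

def welch_t(xbar1, s1, n1, xbar2, s2, n2):
    """Two-sample t statistic (unequal variances) for H0: mu1 - mu2 = 0,
    with Satterthwaite's approximate degrees of freedom."""
    v1 = s1**2 / n1            # estimated variance of xbar1
    v2 = s2**2 / n2            # estimated variance of xbar2
    se = sqrt(v1 + v2)         # standard error of (xbar1 - xbar2)
    t = (xbar1 - xbar2) / se
    # Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Made-up summary statistics, for illustration only
t_stat, df_approx = welch_t(xbar1=5.0, s1=2.0, n1=10, xbar2=3.0, s2=4.0, n2=12)
print(round(t_stat, 3), round(df_approx, 2))
```

The approximate degrees of freedom are usually not a whole number; they fall between the smaller of n1 - 1 and n2 - 1 and n1 + n2 - 2.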
Two-sample significance test
STATA Exercise 7
Paired, unpaired tests
“Paired” tests compare each individual’s values on two variables and ask whether the mean difference (“gain” in this example) is zero.
H0: mean(pretest - posttest) = mean(diff) = 0
“Unpaired” tests take the mean of each variable and test whether the difference of the means is zero.
H0: mean(pretest) - mean(posttest) = diff = 0
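To see how the two approaches differ on the same numbers, here is a small Python sketch (standard library only). The pretest and posttest scores are hypothetical, invented for illustration. The unpaired version uses the unpooled standard error, which equals the pooled one here because the two groups have the same size.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical scores for five individuals (illustration only)
pretest = [10, 12, 9, 14, 11]
posttest = [12, 15, 10, 15, 13]
n = len(pretest)

# Paired: work with each individual's difference
diff = [pre - post for pre, post in zip(pretest, posttest)]
t_paired = mean(diff) / (stdev(diff) / sqrt(n))        # df = n - 1

# Unpaired: compare the two group means, ignoring the pairing
se_unpaired = sqrt(variance(pretest) / n + variance(posttest) / n)
t_unpaired = (mean(pretest) - mean(posttest)) / se_unpaired

print(round(t_paired, 3), round(t_unpaired, 3))
```

Both tests see the same mean difference, but the paired statistic is much larger in magnitude because differencing removes the variation between individuals.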
STATA Exercise 8
ttest ego, by(group) unequal
Robustness and Small Samples
Two-sample methods are more robust than one-sample methods.
– More so if the two samples have similar shapes and sample sizes.
STATA assumes that the variances are the same (what the book calls “pooled t procedures”), unless you tell it the opposite, using the unequal option.
Small samples, as always, make the test less robust.
Pooled two-sample t procedures
Suppose the two Normal population distributions have the same standard deviation.
Then the t-statistic that compares the means of samples from those two populations has exactly a t-distribution.
The common, but unknown, standard deviation of both populations is $\sigma$. The sample standard deviations $s_1$ and $s_2$ estimate $\sigma$.
The best way to combine these estimates is to take a “weighted average” of the two, using the dfs as the weights:
$s_p^2 = \dfrac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}$
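(assuming $\sigma$ is the same for both populations)
A level C confidence interval for $\mu_1 - \mu_2$ is
$(\bar{x}_1 - \bar{x}_2) \pm t^* \, s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
Here, $t^*$ is the value for the $t(n_1 + n_2 - 2)$ density curve with area C between $-t^*$ and $t^*$.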
To test the hypothesis $H_0: \mu_1 = \mu_2$, compute the pooled two-sample t statistic
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
and use P-values from the $t(n_1 + n_2 - 2)$ distribution.
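The pooled calculation can be sketched from summary statistics like this. It is a minimal Python illustration under the equal-variance assumption; the function name `pooled_t` and the summary numbers are hypothetical.

```python
from math import sqrt

def pooled_t(xbar1, s1, n1, xbar2, s2, n2):
    """Pooled two-sample t statistic for H0: mu1 = mu2,
    assuming both populations share the same standard deviation."""
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their dfs
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    sp = sqrt(sp2)
    se = sp * sqrt(1 / n1 + 1 / n2)   # standard error of (xbar1 - xbar2)
    t = (xbar1 - xbar2) / se
    return t, df, sp

# Made-up summary statistics, for illustration only
t_pooled, df_pooled, sp = pooled_t(xbar1=5.0, s1=2.0, n1=10,
                                   xbar2=3.0, s2=4.0, n2=12)
print(round(t_pooled, 3), df_pooled, round(sp, 3))
```

The degrees of freedom here are exactly n1 + n2 - 2, which is why the pooled statistic can be compared directly against a t table.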
THE POOLED TWO-SAMPLE T PROCEDURES
ttest ego, by(group)