two sample test
Chapter 10
Two-Sample Tests and ANOVA
Learning Objectives
In this chapter, you learn hypothesis testing procedures to test:
• The means of two independent populations
• The means of two related populations
• The proportions of two independent populations
• The variances of two independent populations
• The means of more than two populations
Chapter Overview
One-Way Analysis of Variance (ANOVA)
F-test
Tukey-Kramer test
Two-Sample Tests
Population Means, Independent Samples
Means, Related Samples
Population Proportions
Population Variances
Two-Sample Tests
Examples:
• Population Means, Independent Samples: Mean 1 vs. independent Mean 2
• Means, Related Samples: Same population before vs. after treatment
• Population Proportions: Proportion 1 vs. Proportion 2
• Population Variances: Variance 1 vs. Variance 2
Difference Between Two Means
Population means, independent samples
Goal: Test a hypothesis or form a confidence interval for the difference between two population means, μ1 – μ2
The point estimate for the difference is X̄1 – X̄2
Three cases are considered:
• σ1 and σ2 known
• σ1 and σ2 unknown, assumed equal
• σ1 and σ2 unknown, not assumed equal
Independent Samples
Population means, independent samples
• Different data sources
• Unrelated
• Independent
• Sample selected from one population has no effect on the sample selected from the other population
• Use the difference between 2 sample means
• Use a Z test, a pooled-variance t test, or a separate-variance t test
Difference Between Two Means
Population means, independent samples
• σ1 and σ2 known: use a Z test statistic
• σ1 and σ2 unknown, assumed equal: use Sp to estimate the unknown σ; use a t test statistic and the pooled standard deviation
• σ1 and σ2 unknown, not assumed equal: use S1 and S2 to estimate the unknown σ1 and σ2; use a separate-variance t test
σ1 and σ2 Known
Population means, independent samples
Assumptions:
• Samples are randomly and independently drawn
• Population distributions are normal or both sample sizes are at least 30
• Population standard deviations are known
σ1 and σ2 Known (continued)
When σ1 and σ2 are known and both populations are normal or both sample sizes are at least 30, the test statistic is a Z-value, and the standard error of X̄1 – X̄2 is
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
σ1 and σ2 Known (continued)
The test statistic for μ1 – μ2 is:
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
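As a rough illustration of the Z statistic for two independent means with known σ1 and σ2, here is a minimal Python sketch. The function name `two_sample_z` and the input numbers are hypothetical, chosen only to keep the arithmetic easy to follow; they do not come from the chapter. The critical value is taken from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def two_sample_z(x1_bar, x2_bar, sigma1, sigma2, n1, n2, hypothesized_diff=0.0):
    """Z statistic for mu1 - mu2 when both population std devs are known."""
    # Standard error of the difference between the two sample means
    std_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((x1_bar - x2_bar) - hypothesized_diff) / std_error


# Hypothetical inputs (illustration only, not data from the text)
z = two_sample_z(x1_bar=10.0, x2_bar=8.0, sigma1=3.0, sigma2=4.0, n1=36, n2=64)

# Two-tail critical value at alpha = 0.05
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Two-tail decision rule: reject H0 if Z < -z_crit or Z > z_crit
reject_h0 = abs(z) > z_crit
print(f"Z = {z:.3f}, critical value = ±{z_crit:.3f}, reject H0: {reject_h0}")
```

For a lower-tail or upper-tail test, the same statistic would be compared against a one-sided critical value, `NormalDist().inv_cdf(1 - alpha)`, instead.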
Hypothesis Tests for Two Population Means
Two Population Means, Independent Samples
Lower-tail test:
H0: μ1 ≥ μ2
H1: μ1 < μ2
i.e.,
H0: μ1 – μ2 ≥ 0
H1: μ1 – μ2 < 0
Upper-tail test:
H0: μ1 ≤ μ2
H1: μ1 > μ2
i.e.,
H0: μ1 – μ2 ≤ 0
H1: μ1 – μ2 > 0
Two-tail test:
H0: μ1 = μ2
H1: μ1 ≠ μ2
i.e.,
H0: μ1 – μ2 = 0
H1: μ1 – μ2 ≠ 0
Hypothesis tests for μ1 – μ2
Two Population Means, Independent Samples
Lower-tail test:
H0: μ1 – μ2 ≥ 0
H1: μ1 – μ2 < 0
Reject H0 if Z < -Zα (rejection region of area α in the lower tail)
Upper-tail test:
H0: μ1 – μ2 ≤ 0
H1: μ1 – μ2 > 0
Reject H0 if Z > Zα (rejection region of area α in the upper tail)
Two-tail test:
H0: μ1 – μ2 = 0
H1: μ1 – μ2 ≠ 0
Reject H0 if Z < -Zα/2 or Z > Zα/2 (rejection region of area α/2 in each tail)
Confidence Interval, σ1 and σ2 Known
The confidence interval for μ1 – μ2 is:
$$(\bar{X}_1 - \bar{X}_2) \pm Z \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
σ1 and σ2 Unknown, Assumed Equal
Population means, independent samples
Assumptions:
• Samples are randomly and independently drawn
• Populations are normally distributed or both sample sizes are at least 30
• Population variances are unknown but assumed equal
σ1 and σ2 Unknown, Assumed Equal (continued)
Forming interval estimates:
• The population variances are assumed equal, so use the two sample variances and pool them to estimate the common σ²
• The test statistic is a t value with (n1 + n2 – 2) degrees of freedom
σ1 and σ2 Unknown, Assumed Equal (continued)
The pooled variance is
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$$
σ1 and σ2 Unknown, Assumed Equal (continued)
The test statistic for μ1 – μ2 is:
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}$$
Where t has (n1 + n2 – 2) d.f., and
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$$
Confidence Interval, σ1 and σ2 Unknown
The confidence interval for μ1 – μ2 is:
$$(\bar{X}_1 - \bar{X}_2) \pm t_{n_1 + n_2 - 2} \sqrt{S_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$
Where
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$$
Pooled-Variance t Test: Example
You are a financial analyst for a brokerage firm. Is there a difference in dividend yield between stocks listed on the NYSE and NASDAQ? You collect the following data:

                  NYSE    NASDAQ
Number              21        25
Sample mean       3.27      2.53
Sample std dev    1.30      1.16

Assuming both populations are approximately normal with equal variances, is there a difference in average yield (α = 0.05)?
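Calculating the Test Statistic
The pooled variance is:
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{(21 - 1)1.30^2 + (25 - 1)1.16^2}{(21 - 1) + (25 - 1)} = 1.5021$$
The test statistic is:
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} = \frac{(3.27 - 2.53) - 0}{\sqrt{1.5021 \left( \dfrac{1}{21} + \dfrac{1}{25} \right)}} = 2.040$$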
Solution
H0: μ1 – μ2 = 0, i.e. (μ1 = μ2)
H1: μ1 – μ2 ≠ 0, i.e. (μ1 ≠ μ2)
α = 0.05
df = 21 + 25 – 2 = 44
Critical Values: t = ±2.0154 (area .025 in each tail; reject H0 if t < -2.0154 or t > 2.0154)
Test Statistic:
$$t = \frac{3.27 - 2.53}{\sqrt{1.5021 \left( \dfrac{1}{21} + \dfrac{1}{25} \right)}} = 2.040$$
Decision: Reject H0 at α = 0.05
Conclusion: There is evidence of a difference in means.
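The arithmetic in the NYSE/NASDAQ example can be checked with a short Python sketch. The function names are illustrative only, and the critical value 2.0154 is copied from the solution above rather than computed, since the standard library has no t distribution. The confidence interval at the end is an extra step not shown on the slide; it applies the pooled-variance interval formula to the same data.

```python
import math


def pooled_variance(s1, n1, s2, n2):
    """Pooled estimate of the common variance from two sample std devs."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1))


def pooled_t(x1_bar, s1, n1, x2_bar, s2, n2, hypothesized_diff=0.0):
    """Pooled-variance t statistic with n1 + n2 - 2 degrees of freedom."""
    sp2 = pooled_variance(s1, n1, s2, n2)
    std_error = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = ((x1_bar - x2_bar) - hypothesized_diff) / std_error
    return t, sp2, std_error


# Summary statistics from the example: NYSE (sample 1) vs. NASDAQ (sample 2)
t_stat, sp2, se = pooled_t(3.27, 1.30, 21, 2.53, 1.16, 25)

T_CRIT = 2.0154  # two-tail critical value for 44 d.f., alpha = 0.05 (from the text)
reject_h0 = abs(t_stat) > T_CRIT

# 95% confidence interval for mu1 - mu2 (not on the slide; same formula family)
diff = 3.27 - 2.53
ci = (diff - T_CRIT * se, diff + T_CRIT * se)

print(f"Sp^2 = {sp2:.4f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
print(f"95% CI for mu1 - mu2: ({ci[0]:.4f}, {ci[1]:.4f})")
```

The interval excludes zero, which agrees with rejecting H0 in the two-tail test.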
σ1 and σ2 Unknown, Not Assumed Equal
Population means, independent samples
Assumptions:
• Samples are randomly and independently drawn
• Populations are normally distributed or both sample sizes are at least 30
• Population variances are unknown but cannot be assumed to be equal
σ1 and σ2 Unknown, Not Assumed Equal (continued)
Forming the test statistic:
• The population variances are not assumed equal, so include the two sample variances in the computation of the t-test statistic
• The test statistic is a t value (statistical software is generally used to do the necessary computations)
σ1 and σ2 Unknown, Not Assumed Equal (continued)
The test statistic for μ1 – μ2 is:
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$
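The slides leave the separate-variance computation to software. As a minimal sketch of what that software typically does, the Python below computes the t statistic above together with the Welch-Satterthwaite approximation for the degrees of freedom. That degrees-of-freedom formula is a standard result but is not given in this chapter, so treat it as an assumption of the sketch. The inputs reuse the NYSE/NASDAQ summary statistics from the pooled-variance example purely for illustration.

```python
import math


def separate_variance_t(x1_bar, s1, n1, x2_bar, s2, n2, hypothesized_diff=0.0):
    """Separate-variance t statistic and approximate degrees of freedom."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    t = ((x1_bar - x2_bar) - hypothesized_diff) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation (assumption: not stated in the text)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# NYSE vs. NASDAQ summary statistics, reused here only as sample inputs
t_stat, df = separate_variance_t(3.27, 1.30, 21, 2.53, 1.16, 25)
print(f"t = {t_stat:.3f} with approximately {df:.1f} degrees of freedom")
```

The approximate degrees of freedom are generally not a whole number; a critical value would then be looked up from a t table or statistical software.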
Related Populations
Tests Means of 2 Related Populations
• Paired or matched samples
• Repeated measures (before/after)
• Use difference between paired values: Di = X1i – X2i
• Eliminates variation among subjects
• Assumptions:
• Both populations are normally distributed
• Or, if not normal, use large samples
Mean Difference, σD Known
The ith paired difference is Di, where Di = X1i – X2i
The point estimate for the population mean paired difference is D̄:
$$\bar{D} = \frac{\sum_{i=1}^{n} D_i}{n}$$
Suppose the population standard deviation of the difference scores, σD, is known
n is the number of pairs in the paired sample
Mean Difference, σD Known (continued)
The test statistic for the mean difference is a Z value:
$$Z = \frac{\bar{D} - \mu_D}{\dfrac{\sigma_D}{\sqrt{n}}}$$
Where
μD = hypothesized mean difference
σD = population standard deviation of differences
n = the sample size (number of pairs)
Confidence Interval, σD Known
The confidence interval for μD is
$$\bar{D} \pm Z \frac{\sigma_D}{\sqrt{n}}$$
Where n = the sample size (number of pairs in the paired sample)
Mean Difference, σD Unknown
If σD is unknown, we can estimate the unknown population standard deviation with a sample standard deviation.
The sample standard deviation is
$$S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$$
Mean Difference, σD Unknown (continued)
• Use a paired t test; the test statistic for D̄ is now a t statistic, with n – 1 d.f.:
$$t = \frac{\bar{D} - \mu_D}{\dfrac{S_D}{\sqrt{n}}}$$
Where t has n – 1 d.f. and SD is:
$$S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$$
Confidence Interval, σD Unknown
The confidence interval for μD is
$$\bar{D} \pm t_{n-1} \frac{S_D}{\sqrt{n}}$$
where
$$S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$$
Hypothesis Testing for Mean Difference, σD Unknown
Paired Samples
Lower-tail test:
H0: μD ≥ 0
H1: μD < 0
Reject H0 if t < -tα (rejection region of area α in the lower tail)
Upper-tail test:
H0: μD ≤ 0
H1: μD > 0
Reject H0 if t > tα (rejection region of area α in the upper tail)
Two-tail test:
H0: μD = 0
H1: μD ≠ 0
Reject H0 if t < -tα/2 or t > tα/2 (rejection region of area α/2 in each tail)
Where t has n – 1 d.f.
Paired t Test Example
• Assume you send your salespeople to a “customer service” training workshop. Has the training made a difference in the number of complaints? You collect the following data:

Number of Complaints
Salesperson    Before (1)    After (2)    Difference, Di = (2) – (1)
C.B.                6             4             -2
T.F.               20             6            -14
M.H.                3             2             -1
R.K.                0             0              0
M.O.                4             0             -4
Total                                          -21

$$\bar{D} = \frac{\sum D_i}{n} = \frac{-21}{5} = -4.2$$
$$S_D = \sqrt{\frac{\sum (D_i - \bar{D})^2}{n - 1}} = 5.67$$
• Has the training made a difference in the number of complaints (at the 0.01 level)?
Paired t Test: Solution
H0: μD = 0
H1: μD ≠ 0
α = .01, D̄ = -4.2
d.f. = n – 1 = 4
Critical Value = ±4.604 (area α/2 in each tail; reject H0 if t < -4.604 or t > 4.604)
Test Statistic:
$$t = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}} = \frac{-4.2 - 0}{5.67 / \sqrt{5}} = -1.66$$
Decision: Do not reject H0 (t stat is not in the reject region)
Conclusion: There is not a significant change in the number of complaints.
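The paired t test for the complaints data can be reproduced with a few lines of Python using only the standard library. This is a sketch for checking the arithmetic: the variable names are illustrative, and the critical value 4.604 is copied from the solution rather than computed.

```python
import math
from statistics import mean, stdev

# Complaints per salesperson (C.B., T.F., M.H., R.K., M.O.) from the example
before = [6, 20, 3, 0, 4]
after = [4, 6, 2, 0, 0]

# Differences follow the example's convention: after (2) minus before (1)
diffs = [a - b for a, b in zip(after, before)]

n = len(diffs)
d_bar = mean(diffs)   # point estimate of the mean paired difference
s_d = stdev(diffs)    # sample standard deviation (n - 1 in the denominator)

# Paired t statistic with hypothesized mean difference of 0 and n - 1 d.f.
t_stat = (d_bar - 0) / (s_d / math.sqrt(n))

T_CRIT = 4.604  # two-tail critical value for 4 d.f., alpha = 0.01 (from the text)
reject_h0 = abs(t_stat) > T_CRIT

print(f"D-bar = {d_bar:.1f}, S_D = {s_d:.2f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```

Reversing the subtraction order (before minus after) would flip the sign of D̄ and t but leave the two-tail decision unchanged.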
Two Population Proportions
Goal: test a hypothesis or form a confidence interval for the difference between two population proportions, π1 – π2
Assumptions:
n1π1 ≥ 5, n1(1 – π1) ≥ 5
n2π2 ≥ 5, n2(1 – π2) ≥ 5
The point estimate for the difference is p1 – p2
Two Population Proportions (continued)
Since we begin by assuming the null hypothesis is true, we assume π1 = π2 and pool the two sample estimates
The pooled estimate for the overall proportion is:
$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2}$$
where X1 and X2 are the numbers from samples 1 and 2 with the characteristic of interest
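Two Population Proportions (continued)
The test statistic for π1 – π2 is a Z statistic:
$$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\bar{p}(1 - \bar{p}) \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}$$
where
$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2}, \quad p_1 = \frac{X_1}{n_1}, \quad p_2 = \frac{X_2}{n_2}$$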
Confidence Interval for Two Population Proportions
The confidence interval for π1 – π2 is:
$$(p_1 - p_2) \pm Z \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$
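A small Python sketch of this interval follows. Note that it uses the two separate sample proportions in the standard error, not the pooled estimate used in the test statistic. The function name `two_proportion_ci` is hypothetical, and the counts are the Proposition A figures from the example later in this chapter (36 of 72 men, 31 of 50 women), used here only as sample inputs; the slides themselves do not compute this interval.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for pi1 - pi2 using unpooled sample proportions."""
    p1, p2 = x1 / n1, x2 / n2
    # Z value leaving (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - margin, diff + margin


# Men: 36 of 72 vote Yes; women: 31 of 50 vote Yes
lo, hi = two_proportion_ci(36, 72, 31, 50)
print(f"95% CI for pi1 - pi2: ({lo:.4f}, {hi:.4f})")
```

An interval that contains zero is consistent with not rejecting H0: π1 – π2 = 0 in a two-tail test at the matching significance level.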
Hypothesis Tests for Two Population Proportions
Population proportions
Lower-tail test:
H0: π1 ≥ π2
H1: π1 < π2
i.e.,
H0: π1 – π2 ≥ 0
H1: π1 – π2 < 0
Upper-tail test:
H0: π1 ≤ π2
H1: π1 > π2
i.e.,
H0: π1 – π2 ≤ 0
H1: π1 – π2 > 0
Two-tail test:
H0: π1 = π2
H1: π1 ≠ π2
i.e.,
H0: π1 – π2 = 0
H1: π1 – π2 ≠ 0
Hypothesis Tests for Two Population Proportions (continued)
Population proportions
Lower-tail test:
H0: π1 – π2 ≥ 0
H1: π1 – π2 < 0
Reject H0 if Z < -Zα (rejection region of area α in the lower tail)
Upper-tail test:
H0: π1 – π2 ≤ 0
H1: π1 – π2 > 0
Reject H0 if Z > Zα (rejection region of area α in the upper tail)
Two-tail test:
H0: π1 – π2 = 0
H1: π1 – π2 ≠ 0
Reject H0 if Z < -Zα/2 or Z > Zα/2 (rejection region of area α/2 in each tail)
Example: Two Population Proportions
Is there a significant difference between the proportion of men and the proportion of women who will vote Yes on Proposition A?
• In a random sample, 36 of 72 men and 31 of 50 women indicated they would vote Yes
• Test at the .05 level of significance
• The hypothesis test is:
H0: π1 – π2 = 0 (the two proportions are equal)
H1: π1 – π2 ≠ 0 (there is a significant difference between proportions)
The sample proportions are:
Men: p1 = 36/72 = .50
Women: p2 = 31/50 = .62
Example: Two Population Proportions (continued)
The pooled estimate for the overall proportion is:
$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{36 + 31}{72 + 50} = \frac{67}{122} = .549$$
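Example: Two Population Proportions (continued)
The test statistic for π1 – π2 is:
$$z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\bar{p}(1 - \bar{p}) \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} = \frac{(.50 - .62) - 0}{\sqrt{.549(1 - .549) \left( \dfrac{1}{72} + \dfrac{1}{50} \right)}} = -1.31$$
Critical Values = ±1.96 for α = .05 (area .025 in each tail; reject H0 if z < -1.96 or z > 1.96)
Decision: Do not reject H0
Conclusion: There is not significant evidence of a difference between men and women in the proportion who will vote Yes.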
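The Proposition A example can be checked with the following Python sketch. The function name `two_proportion_z` is illustrative. The two-tail p-value at the end is an addition not shown on the slides; it is included only as a second way of reaching the same decision.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled Z statistic for H0: pi1 - pi2 = 0."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)  # pooled estimate under H0
    std_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_error, p_pooled


# Men: 36 of 72 vote Yes; women: 31 of 50 vote Yes
z_stat, p_pooled = two_proportion_z(36, 72, 31, 50)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tail critical value
reject_h0 = abs(z_stat) > z_crit

# Two-tail p-value (not on the slides; shown for comparison with alpha)
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(f"pooled p = {p_pooled:.3f}, z = {z_stat:.2f}, critical = ±{z_crit:.2f}")
print(f"p-value = {p_value:.3f}, reject H0: {reject_h0}")
```

Because the p-value exceeds α = .05, the p-value approach agrees with the critical-value decision not to reject H0.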