Basic Biostat 8 - 10: Intro to Statistical Inference
Post on 21-Dec-2015
Chapters 8 – 10 (Summary)
Statistical Inference
Statistical inference is the act of generalizing from a sample to a population with a calculated degree of certainty.
We want to learn about population parameters using statistics calculated in the sample.
Parameters and Statistics
We MUST draw distinctions between parameters and statistics.

              Parameters    Statistics
Source        Population    Sample
Calculated?   No            Yes
Constants?    Yes           No
Examples      μ, σ, p       x̄, s, p̂
Statistical Inference
There are two forms of statistical inference:
• Hypothesis (“significance”) tests
• Confidence intervals
Example: NAEP math scores
• Young people have a better chance of good jobs and wages if they are good with numbers.
• NAEP math scores
  – Range from 0 to 500
  – Have a Normal distribution
  – Population standard deviation σ is known to be 60
  – Population mean μ not known
• We sample n = 840 young men
• Sample mean (“x-bar”) = 272
• Population mean µ unknown
• We want to estimate population mean NAEP score µ
Reference: Rivera-Batiz, F. L. (1992). Quantitative literacy and the likelihood of employment among young adults. Journal of Human Resources, 27, 313-328.
Conditions for Inference
1. Data acquired by Simple Random Sample
2. Population distribution is Normal, or the sample is large
3. The value of σ is known
4. The value of μ is NOT known
The Sampling Distribution of the Mean
The distribution of potential sample means:
• Sample means will vary from sample to sample
• In theory, the sample means form a sampling distribution
• The sampling distribution of means is Normal with mean μ and standard deviation equal to population standard deviation σ divided by the square root of n:
SExbar = σ / √n
This relationship is known as the square root law
This statistic is known as the standard error
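The square root law can be checked by simulation. The sketch below is only an illustration: the population mean of 100, standard deviation of 15, sample size of 25, and 5,000 replicates are assumed values chosen for speed, not figures from the slides.

```python
import math
import random
import statistics

# Assumed illustration values (not from the slides): a Normal population with
# mean 100 and standard deviation 15, sampled 5,000 times at n = 25.
MU, SIGMA, N, REPLICATES = 100.0, 15.0, 25, 5000

rng = random.Random(0)  # fixed seed so the run is repeatable

# Draw many independent samples and keep each sample mean.
sample_means = [
    statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(REPLICATES)
]

theoretical_se = SIGMA / math.sqrt(N)          # square root law: sigma / sqrt(n)
empirical_se = statistics.stdev(sample_means)  # spread of the simulated means

print(f"theoretical SE               = {theoretical_se:.3f}")
print(f"empirical SD of sample means = {empirical_se:.3f}")
```

The two printed values should be close, and the simulated means should center near the population mean.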
Standard Error of the mean
For our example, the population is Normal with σ = 60 (given). Since n = 840,
SExbar = σ / √n = 60 / √840 = 2.1
Margin of Error m for 95% Confidence
• The 68-95-99.7 rule says 95% of x-bars will fall in the interval μ ± 2∙SExbar
• More accurately, 95% will fall in μ ± 1.96∙SExbar
• 1.96∙SExbar is the margin of error m for 95% confidence
For the example data:
m = 1.96 ∙ SExbar = 1.96 × 2.1 ≈ 4.1
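As a quick arithmetic check, the standard error and the 95% margin of error for the NAEP example can be computed directly. This is a minimal sketch using only the values given on the slides (σ = 60, n = 840); the variable names are illustrative.

```python
import math

sigma = 60  # population standard deviation (given on the slides)
n = 840     # sample size (given on the slides)

se = sigma / math.sqrt(n)  # standard error of the mean
m = 1.96 * se              # margin of error for 95% confidence

print(f"SE = {se:.2f}")
print(f"m  = {m:.2f}")
```

Carrying the unrounded SE through the calculation gives a value of m slightly below the one obtained from the rounded SE of 2.1; both round to 4.1.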
In repeated independent samples, 95% of the intervals x-bar ± m will capture μ.
We call these intervals Confidence Intervals (CIs)
How Confidence Intervals Behave
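One way to see how confidence intervals behave is to simulate repeated sampling and count how often the interval x-bar ± m captures μ. The sketch below assumes a Normal population with μ = 50 and σ = 10 and samples of n = 16. These are hypothetical values; μ is fixed only so that coverage can be checked.

```python
import math
import random
import statistics

# Assumed illustration values: in practice mu is unknown. It is fixed here
# only so that we can check whether each interval captures it.
MU, SIGMA, N = 50.0, 10.0, 16
REPLICATES = 2000
Z_95 = 1.96

rng = random.Random(1)     # fixed seed so the run is repeatable
se = SIGMA / math.sqrt(N)  # standard error of the mean
m = Z_95 * se              # 95% margin of error

covered = 0
for _ in range(REPLICATES):
    xbar = statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(N)])
    if xbar - m <= MU <= xbar + m:  # does this interval capture mu?
        covered += 1

coverage = covered / REPLICATES
print(f"{coverage:.1%} of the {REPLICATES} intervals captured mu")
```

The proportion of intervals that capture μ should land near 95%. Any single interval either contains μ or does not.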
Other Levels of Confidence
Confidence intervals can be calculated at various levels of confidence by altering the coefficient z1-α/2.

α (“lack of confidence level”)    .10      .05      .01
Confidence level (1–α)100%        90%      95%      99%
z1-α/2                            1.645    1.960    2.576
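The coefficients in this table come from the standard Normal distribution. A minimal sketch using Python's standard library follows; the helper name `z_critical` is an illustrative choice, not something from the slides.

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Return z_{1-alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```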
(1–α)100% Confidence Interval for μ when σ known
x-bar ± z1-α/2 ∙ SExbar
where SExbar = σ / √n
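Applied to the NAEP example (x-bar = 272, σ = 60, n = 840), the interval can be sketched as a small function. The function name `z_interval` and its default confidence level are illustrative assumptions.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for mu when sigma is known: xbar ± z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    m = z * sigma / math.sqrt(n)  # margin of error
    return xbar - m, xbar + m

# NAEP example from the slides: x-bar = 272, sigma = 60, n = 840.
for level in (0.90, 0.95, 0.99):
    lo, hi = z_interval(272, 60, 840, level)
    print(f"{level:.0%} CI for mu: {lo:.1f} to {hi:.1f}")
```

Higher confidence levels produce wider intervals around the same sample mean.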
Margin of Error (m)
Margin of error (m) quantifies the precision of the sample mean as an estimator of μ. The direct formula for m is:
m = z1-α/2 ∙ σ / √n
Note that m is a function of
• confidence level 1 – α (as confidence goes up, z increases and m increases)
• population standard deviation σ (this is a function of the variable and cannot be altered)
• sample size n (as n goes up, m decreases; to decrease m, increase n!)
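The dependence of m on sample size and confidence level can be illustrated numerically. In this sketch σ = 60 is the NAEP value from the slides, while the sample sizes other than 840 are hypothetical and chosen only to show the trend.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, confidence=0.95):
    """m = z_{1-alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)

# Effect of sample size at 95% confidence (210 and 3360 are hypothetical).
for n in (210, 840, 3360):
    print(f"n = {n:>4}: m = {margin_of_error(60, n):.2f}")

# Effect of confidence level at the NAEP sample size.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: m = {margin_of_error(60, 840, level):.2f}")
```

Because n enters through its square root, quadrupling the sample size halves the margin of error.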
Tests of Significance
• Recall: two forms of statistical inference
  – Confidence intervals
  – Hypothesis tests of statistical significance
• Objective of confidence intervals: to estimate a population parameter
• Objective of a test of significance: to weigh the evidence against a “claim”
Tests of Significance: Reasoning
• As with confidence intervals, we ask what would happen if we repeated the sample or experiment many times
• Let X ≡ weight gain
  – Assume population standard deviation σ = 1
  – Take an SRS of n = 10
  – SExbar = 1 / √10 = 0.316
  – Ask: Has there been weight gain in the population?
Tests of Statistical Significance: Procedure
A. The claim is stated as a null hypothesis H0 and alternative hypothesis Ha
B. A test statistic is calculated from the data
C. The test statistic is converted to a probability statement called a P-value
D. The P-value is interpreted
Test for a Population Mean – Null Hypothesis
• Example: We want to test whether data in a sample provide reliable evidence for a population weight gain
• The null hypothesis H0 is a statement of “no weight gain”
• In our example, the null hypothesis is H0: μ = 0
Alternative Hypothesis
• The alternative hypothesis Ha is a statement that contradicts the null.
• In our weight gain example, the alternative hypothesis can be stated in one of two ways
  – One-sided alternative Ha: μ > 0 (“positive weight change in population”)
  – Two-sided alternative Ha: μ ≠ 0 (“weight change in the population”)
Test Statistic
zstat = (x-bar − μ0) / SExbar
where
  x-bar ≡ the sample mean
  μ0 ≡ the value of the parameter under the null hypothesis
  SExbar = σ / √n
Test Statistic, Example
Given: σ = 1
Data: x-bar = 1.02; n = 10
SExbar = σ / √n = 1 / √10 = 0.3162
zstat = (x-bar − μ0) / SExbar = (1.02 − 0) / 0.3162 = 3.23
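The z statistic for the weight gain example can be reproduced in a few lines. This is a minimal sketch; the function name `z_statistic` is illustrative.

```python
import math

def z_statistic(xbar, mu0, sigma, n):
    """z_stat = (xbar - mu0) / (sigma / sqrt(n))."""
    se = sigma / math.sqrt(n)  # standard error of the mean
    return (xbar - mu0) / se

# Weight gain example from the slides: x-bar = 1.02, mu0 = 0, sigma = 1, n = 10.
z_stat = z_statistic(1.02, 0, 1, 10)
print(f"z_stat = {z_stat:.2f}")
```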
P-Value from z table
Convert z statistics to a P-value:
• For Ha: μ > μ0
  P-value = Pr(Z > zstat) = right tail beyond zstat
• For Ha: μ < μ0
  P-value = Pr(Z < zstat) = left tail beyond zstat
• For Ha: μ ≠ μ0
  P-value = 2 × one-tailed P-value
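Instead of a printed z table, the same tail areas can be obtained from the standard Normal CDF. The sketch below uses Python's standard library; the function name `p_value` and the labels for the `alternative` argument are illustrative assumptions.

```python
from statistics import NormalDist

def p_value(z_stat, alternative="two-sided"):
    """P-value for a z test; `alternative` is 'greater', 'less', or 'two-sided'."""
    std_normal = NormalDist()       # mean 0, standard deviation 1
    if alternative == "greater":    # Ha: mu > mu0, right tail beyond z_stat
        return 1 - std_normal.cdf(z_stat)
    if alternative == "less":       # Ha: mu < mu0, left tail beyond z_stat
        return std_normal.cdf(z_stat)
    if alternative == "two-sided":  # Ha: mu != mu0, twice the one-tailed area
        return 2 * (1 - std_normal.cdf(abs(z_stat)))
    raise ValueError(f"unknown alternative: {alternative!r}")

# z statistic of 3.23 from the weight gain example.
print(f"one-sided P = {p_value(3.23, 'greater'):.4f}")
print(f"two-sided P = {p_value(3.23):.4f}")
```

Taking the absolute value in the two-sided case means the same code works when the z statistic is negative.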
P-value: Interpretation
• P-value (definition) ≡ the probability, when H0 is true, that the test statistic would take a value as extreme as or more extreme than the one observed
• P-value (interpretation): Smaller-and-smaller P-values → stronger-and-stronger evidence against H0
• P-value (conventions)
.10 < P < 1.0 evidence against H0 not significant
.05 < P ≤ .10 marginally significant
.01 < P ≤ .05 significant
0 < P ≤ .01 highly significant
P-value: Example
• zstat = 3.23
• One-sided P-value = Pr(Z > 3.23) = 1 − 0.9994 = 0.0006
• Two-sided P-value = 2 × one-sided P = 2 × 0.0006 = 0.0012
Conclude: P = .0012. Thus, data provide highly significant evidence against H0
Significance Level
• α ≡ threshold for “significance”
• If we choose α = 0.05, we require evidence so strong that it would occur no more than 5% of the time when H0 is true
• Decision rule
  P-value ≤ α → evidence is significant
  P-value > α → evidence not significant
• For example, let α = 0.01
  P-value = 0.0006
  Thus, P < α → evidence is significant
Summary
Relation Between Tests and CIs
• The value of μ under the null hypothesis is denoted μ0
• Results are significant at the α-level when μ0 falls outside the (1–α)100% CI
• For example, when α = .05, (1–α)100% = (1–.05)100% = 95% confidence
• When we tested H0: μ = 0, two-sided P = 0.0012. Since this is significant at α = .05, we expect “0” to fall outside the 95% confidence interval
Relation Between Tests and CIs
Data: x-bar = 1.02, n = 10, σ = 1
95% CI for μ = x-bar ± z1-α/2 ∙ σ / √n = 1.02 ± 1.96 ∙ 1 / √10 = 1.02 ± 0.62 = 0.40 to 1.64
Notice that 0 falls outside the 95% CI, showing that the test of H0: μ = 0 will be significant at α = .05
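The agreement between the two-sided test and the confidence interval can be checked on the weight gain data. The sketch below is one way to do it with Python's standard library; all variable names are illustrative.

```python
import math
from statistics import NormalDist

# Values from the slides: x-bar = 1.02, sigma = 1, n = 10, H0: mu = 0, alpha = .05.
xbar, sigma, n, mu0, alpha = 1.02, 1, 10, 0, 0.05

std_normal = NormalDist()
se = sigma / math.sqrt(n)

# (1 - alpha)100% confidence interval for mu.
z_crit = std_normal.inv_cdf(1 - alpha / 2)
lower, upper = xbar - z_crit * se, xbar + z_crit * se

# Two-sided P-value for H0: mu = mu0.
z_stat = (xbar - mu0) / se
p_two_sided = 2 * (1 - std_normal.cdf(abs(z_stat)))

outside_ci = not (lower <= mu0 <= upper)
significant = p_two_sided <= alpha

print(f"95% CI: {lower:.2f} to {upper:.2f}")
print(f"two-sided P = {p_two_sided:.4f}")
print(f"mu0 outside CI: {outside_ci}; significant at alpha: {significant}")
```

Both checks give the same answer here. The printed P-value may differ in the last digit from the 0.0012 on the slides, which is based on z rounded to 3.23.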