PREP Course 17:
STATISTICAL DECISION THEORY,
HYPOTHESIS TESTING
AND
COMMON STATISTICAL TESTS
Presented by:
Cristina P. Sison, PhD
CME Disclosure Statement
• The North Shore LIJ Health System adheres to the ACCME’s new Standards for Commercial Support. Any individuals in a position to control the content of a CME activity, including faculty, planners, and managers, are required to disclose all financial relationships with commercial interests. All identified potential conflicts of interest are thoroughly vetted by the North Shore-LIJ for fair balance and scientific objectivity and to ensure appropriateness of patient care recommendations.
• Course Director, Kevin Tracey, has disclosed a commercial interest in Setpoint, Inc. as the cofounder, for stock and consulting support. He has resolved his conflicts by identifying a faculty member to conduct content review of this program who has no conflicts.
• Cristina Sison has nothing to disclose
Outline
• Statistical Inference
• Testing Hypothesis
• Null and Alternative Hypotheses
• Type I error (α) and Type II error (β)
• Alpha Level
• p-value
• Common Statistical Tests
@2002-2012 mcips
Outline
• Common Statistical Tests:
Parametric Tests
• one sample t-test for the mean
• paired t-test
• two-sample t-test, ANOVA
• Chi-square test, Fisher’s Exact Test
• McNemar’s Test
Non-Parametric Tests, etc.
• Sign test
• Mann-Whitney Test, Kruskal-Wallis Test
• Survival analysis, Logistic Regression
Decision Theory
concerned with the problem of making decisions in the presence of statistical knowledge which sheds light on some of the uncertainties involved in the decision problem
Uncertainties = Unknown Numerical Quantities (θ)
Example (Berger, 1985)
Market a new drug? (Y / N)
Factors:
• Proportion of people for which drug is effective (θ1)
• Proportion of the market the drug will capture (θ2)
Conduct experiments; obtain info on θ1, θ2
Decision-Theoretic Framework
θ1: Unknown parameter
a1: Decision / Action
Loss Function: L(θ1, a1) - will never be known with certainty at time of action
Risk Function: Expected [L(θ1, a1)] (average loss over all possibilities)
Optimal Decision?
• Classical (Frequentist)
• Bayesian
The Nature and Purpose of Statistical Inference
Inference = drawing of conclusions from data
Statistical Inference
• Draw conclusions
• Use information (quantitative/qualitative)
• Use statistical methods
  - Describe data
  - Test hypothesis
Statistical Inference from a Sample
[Figure: repeated random samples (Sample 1 through Sample 7) drawn from a population; statistical inference goes from the sample back to the population]
Assumptions must be met
Extrapolating from Sample to Population
• Quality control
• Political polls
• Clinical studies: rarely a random sample; representative
• Laboratory experiments
Assume: population being sampled is infinite.
Statistical Hypothesis Testing
- Helps one decide whether an observed difference is likely to be caused by chance.
Statistical Hypothesis Testing involves stating:
Null Hypothesis vs. Alternative Hypothesis
Which conclusion does my data support? Better: Does my data support the alternative?
“Process” = Statistical Test
“Data don’t make any sense,
we will have to resort to statistics.”
Important: What is your research question?
What is the hypothesis?
• Well-defined endpoints (outcome variables)
• Quantification of outcomes
Example 1. One sample, continuous variable
Outcome variable: Alcohol intake (g/day)
• In the population (men with skin diseases other than psoriasis): mean alcohol intake 21.0 g/day, SD=34.2
• In a study: n=142 men with psoriasis; sample mean alcohol intake 42.99 g/day
• Question: Is 42.99 g/day different from 21.0 g/day by more than chance?
Example 2. Two Independent Samples, continuous variables
Outcome variable: Highest urinary excretions of 5-HIAA (mg per 24 hours), from Dawson & Trapp, p. 113
Carcinoid Heart Disease (n=16): 263 450 283 524 288 1270 274 135 432 220 580 500 890 350 285 120
vs.
No Carcinoid Heart Disease (n=12): 60 124 43 119 196 854 153 14 400 588 23 73
Example 3. Paired Measurements, continuous variables
Effect of low-calorie intake on abnormal pulmonary physiology in patients with chronic hypercapnic respiratory failure
Outcome variable: Arterial Oxygen Tension (mmHg)
Am J Med 1984, 77: 987-994 (from p. 108, Dawson & Trapp)
Patient   Before   After
1         70       82
2         59       66
3         53       65
4         54       62
5         44       74
6         58       77
7         64       68
8         43       59
Example 4. Two Independent Samples, Comparing proportions
Low birth weight and smoking during pregnancy
Women who smoked during pregnancy
versus
Women who did not smoke during pregnancy
Example (Motulsky): Comparing SBPs between 1st and 2nd Year
Med Students (MS1 vs. MS2, n=5 per group/class)
MS1: 120, 80, 90, 110, 95
MS2: 105, 130, 145, 125, 115
MS1 mean: 99 mmHg    MS2 mean: 124 mmHg
Difference in means = 25 mmHg
Substantial? Trivial?
Use confidence intervals!
          MS1              MS2
Mean      99 mmHg          124 mmHg
95% CI    (79.2, 118.8)    (105.2, 142.8)
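The intervals on this slide can be reproduced as a t-based 95% CI for each class mean. The sketch below is illustrative only: the function name `mean_ci` is made up for this example, and the critical value 2.776 (t with 4 degrees of freedom, two-sided 95%) is taken from a standard t-table rather than from the slides.

```python
import math
import statistics

T_CRIT_DF4 = 2.776  # assumed table value: two-sided 95% t critical value, df = 4

def mean_ci(values, t_crit):
    """Return (mean, lower, upper) of a t-based confidence interval."""
    n = len(values)
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    half_width = t_crit * sem
    return mean, mean - half_width, mean + half_width

ms1 = [120, 80, 90, 110, 95]
ms2 = [105, 130, 145, 125, 115]
ci_ms1 = mean_ci(ms1, T_CRIT_DF4)
ci_ms2 = mean_ci(ms2, T_CRIT_DF4)
print(ci_ms1)
print(ci_ms2)
```

The two intervals overlap, which is why the slide asks whether a 25 mmHg difference is substantial or trivial before any formal test is run.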
Q: What is the probability that the difference is due to chance?
Null Hypothesis (H0)
- Represents a theory; assumed to be true
- Want to test if data supports H0
e.g. “No difference” or “No association”
Alternative Hypothesis (H1)
- Represents a theory against the “Null”
- Usually is what the scientist claims and aims to prove
e.g. “There is a difference” or “There is an association”
Stating the hypothesis in terms of statistical hypothesis
H0: “No difference”, “No association” vs. H1: “There is a difference”, “There is an association”
One sample (Example 1):
H0: μ = 21.0 g/day vs. H1: μ ≠ 21.0 g/day (two-tailed)
H0: μ = 21.0 g/day vs. H1: μ > 21.0 g/day (one-tailed)
Two samples:
H0: μ1 = μ2 vs. H1: μ1 ≠ μ2 (two-tailed)
H0: μ1 = μ2 vs. H1: μ1 > μ2 (one-tailed)
Errors in Hypothesis Testing
Type I Error (α): False-Positive Error
α = Level of Significance; chosen before statistical test is performed, also called “alpha value”
α = Maximum probability of incorrectly rejecting the null when in fact it is true, i.e. H0 is wrongly rejected
Type II Error (β): False-Negative Error
β = Probability of incorrectly ‘accepting’ the null when in fact it is false, i.e. H0 is wrongly ‘accepted’
Note: Power = 1 − β
Small sample size results in low power!
Correct Decisions and Errors in Hypothesis Testing
                                      True Situation
Conclusion from Hypothesis Test       Difference Exists (H1)     No Difference (H0)
Difference Exists (Reject H0)         No error                   Type I error (α error)
No Difference (Fail to reject H0)     Type II error (β error)    No error
Just like the US Legal System!
                                      True Situation
The Jury’s Verdict                    Guilty (H1)                Innocent (H0)
Guilty!! (Reject H0)                  No error                   Type I error (α error)
Not Guilty!! (Fail to reject H0)      Type II error (β error)    No error
Convict innocent = Type I
Fail to convict guilty = Type II
Guilty vs. Not Guilty / Significant vs. Not Significant
1. Trial: Presume defendant innocent. Test: Presume null hypothesis is true.
2. Trial: Factual evidence. Test: Observed data.
3. Trial: Evaluate witnesses. Test: Evaluate experimental flaws.
4. Trial: Evidence consistent with innocence? Test: Calculate p-value.
5. Trial: If evidence is inconsistent with the assumption of innocence, declare defendant to be guilty. Else not guilty. Test: If p-value < preset threshold, conclude data are inconsistent with the null hypothesis and declare the difference statistically significant. Else not statistically significant.
What is a P-value (p)?
p = the probability that the observed difference (or a difference more extreme) could have been obtained by chance alone, i.e. if the null hypothesis were true.
“If p < α, then reject the null hypothesis”
e.g. “If p < .05, reject H0”, i.e. conclude that the data supports the alternative: there is a difference between the two groups.
Bond…James Bond
• Can he tell the difference between shaken or stirred martini? (how often is he correct?)
• 16 taste tests (Design: 50% shaken, 50% stirred)
• Bond correct on 13/16 (81.25%)
Bond…James Bond
• If Bond was just guessing (i.e., flipping a coin),
Prob (observing 13 or more correct out of 16 taste tests) = Binomial (p=.5, n=16, x=13,14,15,16) = 0.0106
• He would have to be very lucky to be correct 13 or more times out of 16 if just guessing.
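The tail probability quoted above is a plain binomial sum and can be checked with the standard library. A minimal sketch; the function name `binomial_upper_tail` is illustrative, not from the slides.

```python
from math import comb

def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

# 16 taste tests, 13 or more correct, guessing probability 0.5
p_value = binomial_upper_tail(16, 13, 0.5)
print(round(p_value, 4))  # 0.0106
```

This is the one-sided probability of 13 or more correct answers, which is the quantity the slide reports.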
Bond…James Bond
• Null Hypothesis: Bond is guessing (p = .5)
• Alternative: Bond can tell shaken from stirred (p ≠ .5)
• Prob (>= 13 correct) = .0106. Did we prove the null hypothesis (i.e. that he was guessing) false? NO, but we strongly doubt it.
• Therefore, there is strong evidence that Mr. Bond can tell whether a drink was shaken or stirred (reject the hypothesis that he was just guessing).
“Significant”
• Statistical significance versus scientific importance (caution: large samples → significant results).
• Extremely significant?
significant* (p<.05) vs. highly significant** (p<.01) vs. extremely significant*** (p<.0001)
• Borderline p-values (marginally significant): p=.049 or p=.051
• Not significant: does not prove the null hypothesis; fail to reject the null hypothesis; need to evaluate study (CIs or power)
What if p > α?
• If p < α, then reject H0 (“significant”)
• If p > α, then fail to reject H0 (“not significant”)
• A high p-value does not prove the null hypothesis; it means that data are not strong enough to persuade you to reject the null hypothesis.
• If not significant, evaluate:
- Confidence intervals
- Power of the study
Test Statistic
A test statistic is a quantity calculated from a sample of data.
- sometimes corresponds to a “critical ratio”
e.g. One sample t-test:
$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
Intuitively, a very large value of ‘t’ or a very small value of ‘t’ indicates support for the alternative.
How large or how small should ‘t’ be?
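As a rough illustration, the critical ratio can be evaluated for Example 1 (alcohol intake). This sketch assumes the SD of 34.2 quoted for the reference population may stand in for s; the slides do not give the sample SD, so the result is only indicative.

```python
import math

def one_sample_t(sample_mean, mu0, sd, n):
    """Critical ratio (sample mean - hypothesized mean) / (sd / sqrt(n))."""
    return (sample_mean - mu0) / (sd / math.sqrt(n))

# Example 1: n = 142 men with psoriasis, sample mean 42.99 g/day, H0 mean 21.0 g/day.
# Assumption: sd = 34.2 is the reference-population SD, used here in place of s.
t_alcohol = one_sample_t(42.99, 21.0, 34.2, 142)
print(round(t_alcohol, 2))  # about 7.66
```

A ratio this far from zero would exceed any conventional critical value, which is the intuition behind the question of how large ‘t’ must be.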
Trivia
• The t-distribution is sometimes called “Student’s t,” after the man (his pseudonym) who first noted the distribution of means from small samples in 1908.
• “Student” was really William Gosset, a mathematician who worked for the Guinness Brewery; he was forced to use a pseudonym because company policy prohibited publishing.
STEPS IN TESTING STATISTICAL HYPOTHESIS
Step 1. State the research question in terms of a statistical hypothesis.
Step 2. Decide on the appropriate test statistic.
Step 3. Decide on the level of significance (α).
Step 4. Determine the critical value for the test statistic to be declared as significant (p).
Step 5. Perform calculations.
Step 6. Draw and state your conclusion.
Note: for steps 4 & 5 computer programs might be helpful.
Back to: Comparing SBPs between 1st and 2nd Year Med Students
(MS1 vs. MS2, n=5 per group/class)
MS1: 120, 80, 90, 110, 95
MS2: 105, 130, 145, 125, 115
Comparing Two Means with the t-Test
Example: Comparing SBPs between MS1 and MS2
MS1: n1=5, Mean1=99, SD1=15.97
MS2: n2=5, Mean2=124, SD2=15.17
$t = \frac{\bar{X}_2 - \bar{X}_1}{\text{Pooled SD} \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, where $\text{Pooled SD} = \sqrt{\frac{(n_1-1)\,SD_1^2 + (n_2-1)\,SD_2^2}{n_1+n_2-2}}$
Mean difference = 124 − 99 = 25
t = 2.538 > t-table (df = n1+n2−2 = 8) = 2.306; p = 0.035
If the null hypothesis is true, there is only a 3.5% chance of randomly selecting samples whose means are as far apart as (or further than) we observed.
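The pooled-SD t statistic for the MS1 vs. MS2 example can be recomputed directly from the raw readings. A minimal sketch with an illustrative function name (`pooled_t`); the critical value 2.306 is the table value quoted on the slide.

```python
import math
import statistics

def pooled_t(group1, group2):
    """Two-sample t statistic (group2 minus group1) using the pooled SD."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    se = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    t = (statistics.mean(group2) - statistics.mean(group1)) / se
    return t, n1 + n2 - 2

ms1 = [120, 80, 90, 110, 95]
ms2 = [105, 130, 145, 125, 115]
t_stat, df = pooled_t(ms1, ms2)
print(round(t_stat, 3), df)   # 2.538 8
print(t_stat > 2.306)         # exceeds the two-sided 5% critical value for df = 8
```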
Example 2. Highest urinary excretions of 5-HIAA (mg per 24 hours)
Carcinoid Heart Disease vs No Carcinoid Heart Disease
Carcinoid Heart Disease: 263 450 283 524 288 1270 274 135 432 220 580 500 890 350 285 120
n=16, Mean=429.00, SD=294.67
No Carcinoid Heart Disease: 60 124 43 119 196 854 153 14 400 588 23 73
n=12, Mean=220.58, SD=261.82
t-test for comparing the difference between 2 independent groups
Assumptions:
1. Normality
2. Homogeneity (equal variance)
Histogram of the Raw Data
Comparing Two Means with the t-Test
Example 2. Highest urinary excretions of 5-HIAA (mg per 24 hours)
Carcinoid Heart Disease: n=16, Mean=429.00, SD=294.67
No Carcinoid Heart Disease: n=12, Mean=220.58, SD=261.82
$t = \frac{\bar{X}_1 - \bar{X}_2}{\text{Pooled SD} \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, where $\text{Pooled SD} = \sqrt{\frac{(n_1-1)\,SD_1^2 + (n_2-1)\,SD_2^2}{n_1+n_2-2}}$
Mean difference = 429 − 220.58 = 208.42
t = 1.94; p = 0.063
Random sampling would create a difference this large or larger in 6.3% of experiments if the null were true.
Did data meet assumptions of normality and equal variance?
Normality—questionable.
If n were large, appeal to the Central Limit Theorem (which is what we just did, even though n was small)
If ‘Departure from Normality’ is large, what can be done?
Two Alternatives:
1. Transform the scale of the observations
- normalizing transformations
- variance-stabilizing transformations
2. Use different statistical methods to analyze data (nonparametric procedures)
- distribution-free statistics (based on ranks)
[Figure: distribution of Raw UEX, Carcinoid vs. No Carcinoid]
[Figure: distribution of Log UEX, Carcinoid vs. No Carcinoid]
[Figure: distribution of Square Root UEX, Carcinoid vs. No Carcinoid]
Don’t forget EDA!!!
[Figure: Actual, Log-transformed, and Square root-transformed data]
Nonparametric tests
• Make no rigid assumptions about the distribution of the population
• Calculations based on ranks (rather than the actual data values)
• Resilient to outliers (robust)
• ‘Distribution-free tests’
• E.g. Mann-Whitney Rank Sum Test (alternative to the t-test when distribution is non-Gaussian)
Mann-Whitney Test (a.k.a. Wilcoxon Rank Sum Test)
1. Rank all values regardless of group. (For ties: assign the average.)
2. Sum the ranks in each group. Call the sum of the ranks T1 and T2.
3. Calculate the Mann-Whitney U-statistic:
U = T1 − n1(n1+1)/2 or U = T2 − n2(n2+1)/2
4. Look up table value
$Z = \frac{|U - n_1 n_2/2| - 0.5}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}}$
Intuitively, if Median1 is truly < Median2, then one would expect T1 to be smaller than T2.
Mann-Whitney Test for the Example: Comparing SBPs between MS1 and MS2
MS1 SBP:     120   80   90   110   95
MS1 Ranks:     7    1    2     5    3
MS2 SBP:     105  130  145   125  115
MS2 Ranks:     4    9   10     8    6
T1 = 18    T2 = 37
U1 = T1 − n1(n1+1)/2 = 3    U2 = T2 − n2(n2+1)/2 = 22
Z = 1.88    p = 0.06
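The ranking, U statistic, and continuity-corrected Z for this example can be sketched as follows. Function names are illustrative; the two-sided p-value uses the normal approximation, with no tie correction in the variance (there are no ties in this data set).

```python
import math

def average_ranks(values):
    """Ranks 1..N, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_z(group1, group2):
    """Return (T1, U1, Z, two-sided p) using the normal approximation."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    t1 = sum(ranks[:n1])                      # rank sum of group 1
    u1 = t1 - n1 * (n1 + 1) / 2
    z = (abs(u1 - n1 * n2 / 2) - 0.5) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = math.erfc(z / math.sqrt(2))           # two-sided standard normal tail
    return t1, u1, z, p

ms1 = [120, 80, 90, 110, 95]
ms2 = [105, 130, 145, 125, 115]
t1, u1, z, p = mann_whitney_z(ms1, ms2)
print(t1, u1, round(z, 2), round(p, 2))  # 18.0 3.0 1.88 0.06
```

With only five observations per group, an exact table lookup would normally be preferred over the normal approximation.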
Results from t-test vs. M-W test for MS1 vs. MS2 problem
t-test: p=0.035
M-W test: p=0.06
When you use a non-parametric test with Gaussian data, the p-value tends to be too high.
t-test and Mann-Whitney test
• t-test more powerful than M-W test when the assumptions for t-test are true
• With large samples, the difference in power is trivial
• With smaller samples, the difference is more pronounced
• If there are 7 or fewer total data points, the Mann-Whitney test (two-tailed test) can never yield a p-value<0.05, however different the groups are
Mann-Whitney Test
Carcinoid Heart Disease (n=16)
PTID   5-HIAA   Rank
1      263      13
2      288      17
3      432      20
4      890      27
5      450      21
6      1270     28
7      220      12
8      350      18
9      283      15
10     274      14
11     580      24
12     285      16
13     524      23
14     135      9
15     500      22
16     120      7
Mean, SD Rank: 17.875, 6.13
No CHD (n=12)
PTID   5-HIAA   Rank
1      60       4
2      119      6
3      153      10
4      588      25
5      124      8
6      196      11
7      14       1
8      23       2
9      43       3
10     854      26
11     400      19
12     73       5
Mean, SD Rank: 10, 8.73
Mann-Whitney: a t-test on the ranks!
H0: Mean ranks equal in the two groups (if no difference: fairly even distribution of ranks across groups)
$t = \frac{\bar{R}_1 - \bar{R}_2}{\text{Pooled SD} \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, where $\text{Pooled SD} = \sqrt{\frac{(n_1-1)\,SD_1^2 + (n_2-1)\,SD_2^2}{n_1+n_2-2}}$
$t_{ranks} = \frac{17.875 - 10}{7.34 \cdot \sqrt{\frac{1}{16} + \frac{1}{12}}} = \frac{7.875}{2.80} = 2.81$; p = 0.009
Compare to t-test result: t=1.94; p=0.063 (n.s.)
Why the discrepancy in results?
Normality ???
Histogram of the Log-transformed Data with Normal Curve Overlaid
T-test on the log-transformed values:
Carcinoid Heart Disease (n=16)
PTID   5-HIAA   Rank   LOG
1      263      13     5.57
2      288      17     5.66
3      432      20     6.06
4      890      27     6.79
5      450      21     6.10
6      1270     28     7.14
7      220      12     5.39
8      350      18     5.85
9      283      15     5.64
10     274      14     5.61
11     580      24     6.36
12     285      16     5.65
13     524      23     6.26
14     135      9      4.90
15     500      22     6.21
16     120      7      4.78
Mean, SD Log: 5.88, 0.62
No CHD (n=12)
PTID   5-HIAA   Rank   LOG
1      60       4      4.09
2      119      6      4.77
3      153      10     5.03
4      588      25     6.37
5      124      8      4.82
6      196      11     5.27
7      14       1      2.63
8      23       2      3.13
9      43       3      3.76
10     854      26     6.74
11     400      19     5.99
12     73       5      4.29
Mean, SD Log: 4.75, 1.25
Compare Results: Carcinoid data
VARIABLE     T-TEST P-VALUE   MANN-WHITNEY P-VALUE
1 uex        0.06323          0.01300
2 rank       0.00932          0.01300
3 loguex     0.00402          0.01300
ASSUMPTIONS IMPORTANT!
Parametric tests vs Non-parametric tests
Analogue:
• Parametric: specific bacteria known; use specific antibiotic to fight infection!
• Non-parametric: unknown bacteria; use wide-spectrum antibiotic to fight infection!
Power:
• Parametric: can be more powerful if you’re sure of assumptions!!
• Non-parametric: conservative but “safe”!!!
Requirements & Applicability:
• Parametric: some distributional assumptions; restricted
• Non-parametric: no distributional assumptions; unrestricted
“Still, it is an error to argue in front of your data. You find yourself insensibly twisting them round to fit your theories.”
–Sherlock Holmes
The Adventure of Wisteria Lodge
(quoted by Casella and Berger)
Special Test for Paired Data
• Before and after intervention
• Recruit subjects as pairs, matched for certain variables (age, age range, diagnosis, treatment, etc.)
• Twins
• Child-parent pairs
Purpose: distinguish variation among subjects vs. variation due to differences between groups
Rationale for paired design
Weight Loss Program, 2 months (Dawson & Trapp)
Patient   Weight Before (kg)   Weight After (kg)
1         100                  95
2         89                   84
3         83                   78
4         98                   93
5         108                  103
6         95                   90
Randomly select 3 before program: (89+83+95)/3 = 89 kg
Randomly select 3 after program: (95+93+103)/3 = 97 kg
!!! 8 kg gain ??? But each lost 5 kg!!! VaRiAtIoN
Example 3. Analysis of Arterial Oxygen Tension (mmHg) using paired t-test
Patient   Before   After   Difference (d)
1         70       82      12
2         59       66      7
3         53       65      12
4         54       62      8
5         44       74      30
6         58       77      19
7         64       68      4
8         43       59      16
Mean      55.6     69.1    13.5
SD        9.2      7.9     8.2
Distribution of the differences
Example 3. Analysis of Arterial Oxygen Tension (mmHg) using paired t-test
Difference (d = After − Before): 12 7 12 8 30 19 4 16; Mean = 13.5, SD = 8.2, n = 8
Step 1. H0: μd = 0 vs. H1: μd ≠ 0
Step 2. t-test for mean difference: t = mean(d)/SEmean(d)
Step 3. α = 0.10
Step 4. t-value, df=7: t > 1.895 or t < −1.895
Step 5. t = 13.5/(8.2/sqrt(8)) = 4.7
Step 6. t > 1.895: Reject H0
There is a difference (increase) in AOT after weight reduction
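Steps 2 to 6 of the paired analysis can be sketched from the raw before/after readings. Variable names are illustrative. Because the slide rounds the SD to 8.2 before dividing, the unrounded statistic comes out slightly lower (about 4.6) than the 4.7 shown, without changing the conclusion.

```python
import math
import statistics

before = [70, 59, 53, 54, 44, 58, 64, 43]
after = [82, 66, 65, 62, 74, 77, 68, 59]

# Paired design: analyze the within-patient differences as one sample
diffs = [a - b for a, b in zip(after, before)]
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)
t_paired = mean_d / (sd_d / math.sqrt(len(diffs)))

T_CRIT = 1.895  # critical value quoted on the slide (df = 7, two-tailed alpha = 0.10)
reject_h0 = abs(t_paired) > T_CRIT
print(mean_d, round(sd_d, 1), round(t_paired, 2), reject_h0)
```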
Wilcoxon Signed Rank Test on Ex. 3: Arterial Oxygen sample
Patient   Difference (d = aft − bef)   Absolute Value of d   Rank   Signed Rank
1         +12                          12                    4.5    +4.5
2         +7                           7                     2      +2
3         +12                          12                    4.5    +4.5
4         +8                           8                     3      +3
5         +30                          30                    8      +8
6         +19                          19                    7      +7
7         +4                           4                     1      +1
8         +16                          16                    6      +6
Wilcoxon Signed Rank Test (a.k.a. Wilcoxon Signed Rank Sum Test)
1. For each pair, calculate the signed difference.
2. Rank the absolute values of all differences (i.e. ignore sign temporarily). If any of the differences is equal to 0, ignore them entirely. For ties: assign the average.
3. Add up the ranks of all positive differences and of all negative differences. Call the sums of the ranks T+ and T−.
4. Look up table value (details omitted here) or use the large sample approximation: calculate T* and, if T* > Z(α), reject the null hypothesis.
$T^* = \frac{T_+ - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$
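The steps above can be sketched for the arterial oxygen differences. Function names are illustrative; zero differences are dropped and ties get average ranks, as the procedure describes. With n = 8 the large-sample approximation is rough, so T* here is only indicative.

```python
import math

def average_ranks(values):
    """Ranks 1..N, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def wilcoxon_signed_rank(diffs):
    """Return (T+, T-, T*) with T* from the large-sample approximation."""
    nonzero = [d for d in diffs if d != 0]      # zero differences are ignored
    ranks = average_ranks([abs(d) for d in nonzero])
    t_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, nonzero) if d < 0)
    n = len(nonzero)
    t_star = (t_plus - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return t_plus, t_minus, t_star

diffs = [12, 7, 12, 8, 30, 19, 4, 16]   # after minus before, Example 3
t_plus, t_minus, t_star = wilcoxon_signed_rank(diffs)
print(t_plus, t_minus, round(t_star, 2))  # 36.0 0 2.52
```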
A look at t-tests on the ‘Ratio’
• Paired t-tests: test that the ‘intervention’ always causes the same average absolute difference, regardless of starting value
• Sometimes, the intervention will cause the same average relative difference.
Ex. Enzyme Activity: Control vs. Treated Cells (from Motulsky)
Clone   Ctl   Trt   Diff   Ratio   Log10 (Ctl)   Log10 (Trt)   Diff Log (Trt−Ctl)
1       24    52    28     2.2     1.38          1.72          0.34
2       6     11    5      1.8     0.78          1.04          0.26
3       16    28    12     1.75    1.20          1.45          0.24
4       5     8     3      1.6     0.70          0.90          0.20
5       2     4     2      2.0     0.30          0.60          0.30
Mean                10.0   1.87                                0.27
SD                  10.8   0.22                                0.05
Paired t-test (diff = 0 vs. diff ≠ 0): p=0.107
Ratio: test if ratio = 1
Log Ratio = log (Treated/Control) = log (Treated) − log (Control)
Test if diff of logs = 0: a one-sample problem!
Ex. Enzyme Activity: Control vs. Treated Cells (from Motulsky)
Diff Log (Trt−Ctl): 0.34 0.26 0.24 0.20 0.30; Mean = 0.27, SD = 0.05
Test if diff of logs = 0: p=0.0003
95% CI for Diff of Logs: (0.21, 0.33)
Antilog: 95% CI for the Ratio: (1.62, 2.14)
Doubling in enzyme activity is very unlikely due to coincidence!
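The log-ratio analysis can be sketched from the raw enzyme activities. The critical value 2.776 (t, df = 4, two-sided 95%) is an assumed table value, not given on the slide. Working from unrounded logs gives ratio limits that differ slightly from the slide's (1.62, 2.14), which appear to be antilogs of the already rounded limits 0.21 and 0.33.

```python
import math
import statistics

control = [24, 6, 16, 5, 2]
treated = [52, 11, 28, 8, 4]

# log10(Treated/Control) = log10(Treated) - log10(Control): a one-sample problem
log_ratios = [math.log10(t / c) for t, c in zip(treated, control)]
mean_log = statistics.mean(log_ratios)
sem_log = statistics.stdev(log_ratios) / math.sqrt(len(log_ratios))

T_CRIT_DF4 = 2.776  # assumed table value: two-sided 95% t critical value, df = 4
ci_log = (mean_log - T_CRIT_DF4 * sem_log, mean_log + T_CRIT_DF4 * sem_log)
ci_ratio = (10 ** ci_log[0], 10 ** ci_log[1])  # back-transform to the ratio scale

print([round(x, 2) for x in ci_log])    # [0.21, 0.33]
print([round(x, 2) for x in ci_ratio])
```

Since the ratio interval excludes 1, the relative change is distinguishable from no effect even though the paired t-test on absolute differences was not significant.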
Comparing Observed Counts to Expected Counts
Example (Motulsky). Assume 10% die after brain Sx.
Last month, of n=75 patients, 16 pts died (21.3%).
Want to know if this is just coincidence or there is a change in death rate.
H0: If the prob of dying is 10%, what is the prob of observing 16 or more deaths out of 75 pts?
Example. Death after Sx
              Alive         Dead         Total
# Observed    59            16           75
# Expected    67.5 (90%)    7.5 (10%)    75 (100%)
$\chi^2 = \sum \frac{|\text{Observed} - \text{Expected}|^2}{\text{Expected}}$
χ² = 10.7; p = 0.0011
Reject H0; suspect some factor responsible for increased death rate
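The observed-versus-expected comparison can be checked with a few lines. Function names are illustrative. With two categories there is 1 degree of freedom, for which the chi-square tail probability equals erfc(sqrt(x/2)); that identity is used here instead of a table.

```python
import math

def chi_square_gof(observed, expected):
    """Sum of (Observed - Expected)^2 / Expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_pvalue_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

observed = [59, 16]                 # alive, dead
expected = [0.90 * 75, 0.10 * 75]   # 67.5, 7.5 under a 10% death rate
chi2 = chi_square_gof(observed, expected)
p_value = chi2_pvalue_df1(chi2)
print(round(chi2, 1), round(p_value, 4))  # 10.7 0.0011
```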
Comparing Two Proportions
Disease Progressed versus No Progression
Observed counts (expected counts in parentheses):
Drug        Progressed    No Progression    Total
New Drug    76 (104)      399 (371)         475
Placebo     129 (101)     332 (360)         461
Total       205           731               936
76/205 = 37%    399/731 = 55%
Comparing Two Proportions (New drug vs. Placebo: Disease Progression)
$\chi^2 = \sum \frac{(|\text{Observed} - \text{Expected}| - 0.5)^2}{\text{Expected}}$
$\chi^2 = \frac{(|76-104| - 0.5)^2}{104} + \frac{(|129-101| - 0.5)^2}{101} + \frac{(|399-371| - 0.5)^2}{371} + \frac{(|332-360| - 0.5)^2}{360}$
P < 0.0001 (if the null hypothesis is true, there is a less than 0.01% chance of observing such a large discrepancy between observed and expected counts.)
Calculating the Expected counts in a 2x2 contingency Table
$\text{Expected count} = \frac{\text{Row Tot}}{\text{Grand Tot}} \times \frac{\text{Column Tot}}{\text{Grand Tot}} \times \text{Grand Tot} = \frac{\text{Row Tot} \times \text{Column Tot}}{\text{Grand Tot}}$
Observed counts (expected counts in parentheses):
Drug        Progressed    No Progression    Total
New Drug    76 (104)      399 (371)         475
Placebo     129 (101)     332 (360)         461
Total       205           731               936
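A small sketch of the expected-count rule and the continuity-corrected chi-square for the drug table. Function names are illustrative; the p-value again uses the 1-degree-of-freedom identity p = erfc(sqrt(x/2)). The slides do not report the χ² value itself, so only the expected counts and the p < 0.0001 conclusion can be compared.

```python
import math

def expected_counts(table):
    """Expected count for each cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def yates_chi_square(table):
    """Chi-square with the 0.5 continuity correction, summed over all cells."""
    exp = expected_counts(table)
    return sum(
        (abs(o - e) - 0.5) ** 2 / e
        for obs_row, exp_row in zip(table, exp)
        for o, e in zip(obs_row, exp_row)
    )

# Rows: New Drug, Placebo; columns: Progressed, No Progression
drug_table = [[76, 399], [129, 332]]
exp_table = expected_counts(drug_table)
chi2_yates = yates_chi_square(drug_table)
p_yates = math.erfc(math.sqrt(chi2_yates / 2))   # df = 1 for a 2x2 table

print([[round(e) for e in row] for row in exp_table])  # [[104, 371], [101, 360]]
print(round(chi2_yates, 1), p_yates < 0.0001)
```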
Example 4. Analysis of Low birth weight & smoking during pregnancy (association)
Women who smoked during pregnancy versus women who did not smoke during pregnancy
                   Low Birth Weight
Smoking Status     Yes     No      Total
Smoker             55      45      100
Non-Smoker         40      60      100
Total              95      105     200
Chi-square Test (test of association)
Observed counts (expected counts in parentheses):
                   Low Birthweight
Smoking Status     Yes          No           Total
Smoker             55 (47.5)    45 (52.5)    100
Non-Smoker         40 (47.5)    60 (52.5)    100
Total              95           105          200
Low birthweight: 55% of smokers vs. 40% of non-smokers
$\chi^2 = \frac{(55-47.5)^2}{47.5} + \frac{(40-47.5)^2}{47.5} + \frac{(45-52.5)^2}{52.5} + \frac{(60-52.5)^2}{52.5} = 4.5$
critical value = 3.84
Reject the null hypothesis. Conclude there is an association.
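The arithmetic on this slide (no continuity correction) can be verified cell by cell. A minimal sketch with illustrative names; 3.84 is the critical value quoted on the slide.

```python
# Rows: Smoker, Non-Smoker; columns: low birth weight Yes, No
lbw_observed = [[55, 45], [40, 60]]
lbw_expected = [[47.5, 52.5], [47.5, 52.5]]   # row total * column total / 200

# Pearson chi-square: sum over the four cells of (O - E)^2 / E
chi2_lbw = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(lbw_observed, lbw_expected)
    for o, e in zip(obs_row, exp_row)
)

CRITICAL_VALUE = 3.84  # quoted on the slide (alpha = 0.05, 1 degree of freedom)
print(round(chi2_lbw, 1), chi2_lbw > CRITICAL_VALUE)  # 4.5 True
```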
Calculating Relative Risk (Cohort study), Dawson & Trapp
            Outcome
Exposure    MI      No MI     Total
Aspirin     139     10,898    11,037
Placebo     239     10,795    11,034
Total       378     21,693    22,071
$RR = \frac{\text{Incidence in exposed}}{\text{Incidence in the unexposed}} = \frac{139/11{,}037}{239/11{,}034} = \frac{0.0126}{0.0217} = 0.58$
Fewer MIs with aspirin
95% CI: (.47, .71); does not include 1, reject the null hypothesis.
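A sketch of the relative-risk calculation with an approximate confidence interval. The interval uses the common log-scale (Wald) method, which is an assumption here: the slides do not say how (.47, .71) was obtained, and this method gives an upper limit closer to 0.72.

```python
import math

def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Return (RR, lower, upper) with a log-scale Wald confidence interval."""
    rr = (events_exposed / n_exposed) / (events_unexposed / n_unexposed)
    se_log = math.sqrt(
        1 / events_exposed - 1 / n_exposed + 1 / events_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Aspirin: 139 MIs among 11,037; placebo: 239 MIs among 11,034
rr, rr_lower, rr_upper = relative_risk(139, 11037, 239, 11034)
print(round(rr, 2), round(rr_lower, 2), round(rr_upper, 2))
```

Either way the interval lies entirely below 1, which matches the slide's conclusion.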
Calculating the Odds Ratio (Case-Control study)
              Outcome
Exposure      Stroke    Control    Total
Smoker        73        18         91
Non-Smoker    141       196        337
Total         214       214        428
$OR = \frac{\text{Odds that a Stroke Pt is exposed}}{\text{Odds that a Control Pt is exposed}} = \frac{(73/214)/(141/214)}{(18/214)/(196/214)} = \frac{73 \times 196}{18 \times 141} = 5.64$
95% CI: (3.2, 9.9); does not include 1, reject the null hypothesis. Conclude there is an association.
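The odds ratio and its interval can be sketched from the four cell counts. The interval below uses the usual log-scale (Woolf) method, which is an assumption about how the slide's CI was computed; it does reproduce (3.2, 9.9) after rounding.

```python
import math

def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls, z=1.96):
    """Return (OR, lower, upper) with a log-scale (Woolf) confidence interval."""
    or_value = (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)
    se_log = math.sqrt(
        1 / exposed_cases + 1 / exposed_controls
        + 1 / unexposed_cases + 1 / unexposed_controls
    )
    lower = math.exp(math.log(or_value) - z * se_log)
    upper = math.exp(math.log(or_value) + z * se_log)
    return or_value, lower, upper

# Smokers: 73 stroke, 18 control; non-smokers: 141 stroke, 196 control
or_value, or_lower, or_upper = odds_ratio(73, 18, 141, 196)
print(round(or_value, 2), round(or_lower, 1), round(or_upper, 1))  # 5.64 3.2 9.9
```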
Analyzing Survival Data
Outcome: Time-to-“Event” (“survival” time)
(time always > 0; could be “censored” [start or end])
Time: years, days, hours, minutes from start until an event occurs
Event: death, diagnosis, relapse, discharge, “failure” or any designated experience of interest
Examples:
• Time from birth until death
• Time from diagnosis until death
• Time from surgery to relapse of disease
• Time-to-discharge from Hospital (LOS)
• Time from discharge until ER visit
Censoring: don’t know exact survival time
Examples: Time from birth until death; time from diagnosis until death; time from surgery to relapse of disease
Possibilities:
[-----------------X
[-----------------O
“left censored”
“right censored”
(----------X
“uncensored”
Example: Relapse
Drug A: 6 6 6 6+ 7 9+ 10 10+ 11+ 13 16 17+ 19+ 20+ 22 23 25+ 32+ 32+ 34+ 35+
versus
Placebo: 1 1 2 2 3 4 4 5 5 8 8 8 8 11 11 12 12 15 17 22 23
+ = Censored
Example: Survival (years) in Acute Leukemia
Drug A (N=5): 24 5+ 9 11 13
versus
Drug B (N=5): 2 3 4 5 6+
+ = Censored
Incorrect Analysis
         DRUG A     DRUG B
ALIVE    1 (20%)    1 (20%)
DEAD     4          4
Shows no difference in % alive !!!
[Figure: Time-to-Death survival curves, DRUG A vs. DRUG B]
Log-Rank Test: p=0.013
Median Survival Time: (50% alive), drop a perpendicular (┴) to the time axis
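To show why time-to-event analysis separates the two drugs even though both have 20% alive, here is a minimal Kaplan-Meier sketch for the five-patient example. The function names and the convention of reading the median as the first time the estimated survival drops to 50% or below are assumptions for illustration; the log-rank test itself is not reproduced.

```python
def kaplan_meier(times, died):
    """Return [(time, survival)] at each death time; died[i] is False if censored."""
    data = sorted(zip(times, died))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        at_this_time = [d for tt, d in data if tt == t]
        deaths = sum(at_this_time)
        if deaths:
            survival *= (n_at_risk - deaths) / n_at_risk
            curve.append((t, survival))
        n_at_risk -= len(at_this_time)   # deaths and censored both leave the risk set
        i += len(at_this_time)
    return curve

def median_survival(curve):
    """First time at which the estimated survival is 50% or lower (None if never)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Survival (years); False marks a censored observation ("+")
drug_a = kaplan_meier([24, 5, 9, 11, 13], [True, False, True, True, True])
drug_b = kaplan_meier([2, 3, 4, 5, 6], [True, True, True, True, False])
median_a = median_survival(drug_a)
median_b = median_survival(drug_b)
print(median_a, median_b)  # 11 4
```

The crude alive/dead table hides this difference because it ignores when the deaths occurred.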
Natural Killer cell activity (lytic units**). Grouping based on scores from Social Readjustment Rating Scale, Dawson and Trapp p. 162 (from Irwin et al. Am J Psychiatry 1987; 144)
Low Score (n=13): 22.2 82.0 97.8 56.0 29.1 9.3 37.0 19.9 35.8 39.5 44.2 12.8 37.4
Mean=40.23, SD=25.71
Moderate Score (n=12): 15.1 23.2 10.5 13.9 9.7 19.0 19.8 9.1 30.1 15.5 10.3 11.0
Mean=15.60, SD=6.42
High Score (n=12): 10.2 11.3 11.4 5.3 14.5 11.0 13.6 33.4 25.0 27.0 36.3 17.7
Mean=18.06, SD=9.97
Grand Mean = 25.05
**one lytic unit = number of effector cells killing 20% of target cells
Analysis of Variance (ANOVA)
Multiple-Comparison Procedures
• IDEAL: pre-planned or a priori comparisons
• Actual practice: a posteriori or post-hoc methods
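The slides do not show the ANOVA computation itself, so the following is only a sketch of the standard one-way F ratio (between-group mean square over within-group mean square) applied to the natural killer cell data. The function name is illustrative, and no F value or p-value is claimed from the original presentation.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within, grand_mean) for a one-way layout."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((v - statistics.mean(g)) ** 2 for v in g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within, grand_mean

low = [22.2, 82.0, 97.8, 56.0, 29.1, 9.3, 37.0, 19.9, 35.8, 39.5, 44.2, 12.8, 37.4]
moderate = [15.1, 23.2, 10.5, 13.9, 9.7, 19.0, 19.8, 9.1, 30.1, 15.5, 10.3, 11.0]
high = [10.2, 11.3, 11.4, 5.3, 14.5, 11.0, 13.6, 33.4, 25.0, 27.0, 36.3, 17.7]

f_ratio, df_between, df_within, grand_mean = one_way_anova_f([low, moderate, high])
print(round(f_ratio, 2), df_between, df_within, round(grand_mean, 2))
```

A significant overall F only says that some group means differ; which ones differ is the job of the multiple-comparison procedures listed on this slide.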
• Investigating several variables
• Post-hoc comparisons
Nonparametric ANOVA: a different method of analyzing data
Recall:
• t-test ↔ Wilcoxon Rank Sum
• paired t-test ↔ Wilcoxon Signed Rank Test
• One-way ANOVA ↔ Kruskal-Wallis test
• Two-way ANOVA ↔ Friedman 2-way ANOVA by ranks
Multiple Regression
Uses:
- devise an equation to predict Y from several Xi variables for future subjects. E.g. predict cardiac output from BP, pulse rate, weight
- adjust data: effect of X1 on Y, but adjust for differences with respect to X2
- explore relationships among several X variables to find out which of X1, X2, X3 influence Y
Logistic Regression
- quantifies the association between a risk factor (or treatment) and a disease (binary outcome), after adjusting for other variables.
- similar to linear regression/multiple regression (predicts Y from one or several X-variables)
- Logistic regression finds an equation that best predicts an outcome (binary) from one or several X variables (binary or quantitative)
e.g. Y: Disease (Yes/No) = Age, BP, Gender, FamHx
Example: Logistic Regression
- binary data (diabetes: 1 or 0)
- ordinal data (severity: 1, 2, 3)
Example: Risk factors, Malignant vs. Non-Malignant Pleural Effusion
Possible predictors:
- age at admission, gender, race
- HGB (>=10 or <10)
- presence of infection
- serum LDH ratio >=1 or <1
- Pleural fluid WBC >=1000 or <1000
- Pleural fluid pH (>=7.2 or <7.2), ….
One way to proceed: First do a “univariate screening”, then include all significant factors in the ‘full’ logistic model.
Rule of Thumb: About 10 events per variable are necessary in order to get reasonably stable estimates of the regression coefficients (‘Statistical Rules of Thumb’, van Belle).
Example: 5-year follow-up study of deaths following acute MI; about 1/3 of patients are expected to die during the study. If 7 variables are considered as predictors, by the rule of thumb 70 events (deaths) are needed; therefore 210 subjects should be enrolled. (Has nothing to do with power, just stable estimates.)
Summary---1
Decisions about a Single Group
                                      Parametric (Gaussian)       Non-Parametric (Non-Gaussian)
Comparing to a standard value         One sample t-test (mean)    Sign Test (median)
Matched or paired design              Paired t-test               Wilcoxon Signed-Rank Test
Single Group, Multiple Time Points    RMANOVA                     Friedman Test (ranks)
Summary---2
Decisions about Two or More Groups
                                          Parametric (Gaussian)           Non-Parametric (Non-Gaussian)
Comparing 2 independent groups            Two sample t-test               Mann-Whitney Test (Rank-Sum Test)
Comparing 3 or more independent groups    Analysis of Variance (ANOVA)    Kruskal-Wallis Test
Summary---3
Analyzing Proportions / Contingency Tables
                                           Usual test                                       Caveats
Proportions in single groups               Binomial Test; CI’s                              Small samples
Comparing two independent proportions      Chi-square test, Fisher’s Exact Test; CI’s       Scarce cells, continuous made ordinal
Testing for Association or Independence    Chi-square test, Fisher’s Exact Test; Kappa      Low expected values
Testing for Agreement or Discordance       McNemar’s test; Kappa                            Low expected values
Word to the Wise…
• Statistical Significance vs. Clinical Significance
• Use confidence intervals!!!
• When doing multiple comparisons, use adjusted p-values!!! (Bonferroni-like adjustment)
Research Collaboration
REFERENCES
• Berger J. Statistical Decision Theory and Bayesian Analysis, 2nd Ed. (1985). Springer-Verlag, New York.
• Casella G, Berger RL. Statistical Inference (1990). Duxbury Press.
• Daniel W. Biostatistics: A Foundation for Analysis in the Health Sciences, 7th Ed. (1999). J. Wiley & Sons.
• Dawson B, Trapp RG. Basic & Clinical Biostatistics, 2nd & 3rd Ed. (1994 & 2001). McGraw-Hill Medical Publishing Division (Lange Med Books).
• Hennekens CH, Buring JE. Edited by Mayrent SL. Epidemiology in Medicine, 1st Ed. (1987). Little Brown & Company, Boston/Toronto.
• Motulsky H. Intuitive Biostatistics (1995). Oxford Univ Press.
• Rosner B. Fundamentals of Biostatistics, 5th Ed. (2000). Duxbury-Brooks/Cole.
• Van Belle G. Statistical Rules of Thumb (2002). Wiley.
• Zar JH. Biostatistical Analysis, 2nd Ed. (1984). Prentice-Hall.
Thanks!
Disclosure on next page…
Statistical Consulting: FEINSTEIN INSTITUTE FOR MEDICAL RESEARCH AT NORTH SHORE-LIJ: BIOSTATISTICS UNIT: (516) 562 0300
I can prove it or disprove it! What do you want me to do?
This is not what we do…