
PREP Course 17: STATISTICAL DECISION THEORY, HYPOTHESIS TESTING AND COMMON STATISTICAL TESTS

Presented by:

Cristina P. Sison, PhD

CME Disclosure Statement

• The North Shore LIJ Health System adheres to the ACCME’s new Standards for Commercial Support. Any individuals in a position to control the content of a CME activity, including faculty, planners, and managers, are required to disclose all financial relationships with commercial interests. All identified potential conflicts of interest are thoroughly vetted by the North Shore-LIJ for fair balance and scientific objectivity and to ensure appropriateness of patient care recommendations.

• Course Director, Kevin Tracey, has disclosed a commercial interest in Setpoint, Inc. as the cofounder, for stock and consulting support. He has resolved his conflicts by identifying a faculty member with no conflicts to conduct content review of this program.

• Cristina Sison has nothing to disclose

Outline

• Statistical Inference

• Testing Hypothesis

• Null and Alternative Hypotheses

• Type I error (α) and Type II error (β)

• Alpha Level

• p-value

• Common Statistical Tests

@2002-2012 mcips

Outline

• Common Statistical Tests: Parametric Tests; Non-Parametric Tests, etc.

  • One-sample t-test for the mean
  • Paired t-test
  • Two-sample t-test, ANOVA
  • Chi-square test, Fisher's Exact Test
  • McNemar's Test
  • Sign test
  • Mann-Whitney Test, Kruskal-Wallis Test
  • Survival analysis, Logistic Regression

Decision Theory

Concerned with the problem of making decisions in the presence of statistical knowledge, which sheds light on some of the uncertainties involved in the decision problem.

Uncertainties = Unknown Numerical Quantities (θ)

Example (Berger, 1985)

Market a new drug? (Y / N)

Factors:
• Proportion of people for whom the drug is effective (θ1)
• Proportion of the market the drug will capture (θ2)

Conduct experiments; obtain information on θ1, θ2.

Decision-Theoretic Framework

θ1: Unknown parameter

a1: Decision (action)

Loss Function: L(θ1, a1). It will never be known with certainty at the time of action.

Risk Function: Expected [L(θ1, a1)] (average loss over all possibilities)

Optimal Decision? Two approaches: Classical (Frequentist) and Bayesian.

The Nature and Purpose of Statistical Inference

Inference = drawing of conclusions from data

Statistical Inference

• Draw conclusions
• Use information (quantitative/qualitative)
• Use statistical methods: describe data, test hypotheses

Statistical Inference from a Sample

[Figure: many possible random samples (Sample 1 to Sample 7) can be drawn from the Population; statistical inference goes from the observed Sample back to the Population.]

Assumptions must be met.

Extrapolating from Sample to Population

• Quality control

• Political polls

• Clinical studies: rarely a random sample; should be representative

• Laboratory experiments

Assume: population being sampled is infinite.


Statistical Hypothesis Testing

Helps one decide whether an observed difference is likely to be caused by chance. Statistical hypothesis testing involves stating:

Null Hypothesis vs. Alternative Hypothesis

Which conclusion do my data support? Better: do my data support the alternative?

The "process" = Statistical Test

“Data don’t make any sense,

we will have to resort to statistics.”

Important: What is your research question?

What is the hypothesis?

• Well-defined endpoints (outcome variables) • Quantification of outcomes


Example 1. One sample, continuous variable

Outcome variable: Alcohol intake (g/day)

• In the population (men with skin diseases other than psoriasis): mean alcohol intake 21.0 g/day, SD = 34.2

• In a study: n = 142 men with psoriasis; sample mean alcohol intake 42.99 g/day

• Question: Is 42.99 g/day significantly different from 21.0 g/day?

Example 2. Two independent samples, continuous variable

Outcome variable: Highest urinary excretion of 5-HIAA (mg per 24 hours), from Dawson & Trapp, p. 113

Carcinoid Heart Disease (n=16):
263 450 283 524 288 1270 274 135 432 220 580 500 890 350 285 120

No Carcinoid Heart Disease (n=12):
60 124 43 119 196 854 153 14 400 588 23 73

Example 3. Paired measurements, continuous variable

Effect of low-calorie intake on abnormal pulmonary physiology in patients with chronic hypercapnic respiratory failure: Arterial Oxygen Tension (mmHg)

Am J Med 1984; 77: 987-994 (from Dawson & Trapp, p. 108)

Patient   Before   After
1         70       82
2         59       66
3         53       65
4         54       62
5         44       74
6         58       77
7         64       68
8         43       59

Example 4. Two independent samples, comparing proportions

Low birth weight and smoking during pregnancy

Women who smoked during pregnancy versus women who did not smoke during pregnancy


Example (Motulsky): Comparing SBPs between 1st and 2nd Year

Med Students (MS1 vs. MS2, n=5 per group/class)

MS1: 120, 80, 90, 110, 95

MS2: 105, 130, 145, 125, 115


Difference in means = 25 mmHg (MS1 mean: 99 mmHg; MS2 mean: 124 mmHg)

Substantial? Trivial?

Use confidence intervals!

        Mean        95% CI
MS1     99 mmHg     (79.2, 118.8)
MS2     124 mmHg    (105.2, 142.8)
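The two intervals can be reproduced with a short sketch. It assumes the usual t-based interval, mean ± t × SD/√n, and hard-codes the critical value 2.776 (two-sided 95%, df = 4) from a standard t-table; the function name `mean_ci` is only illustrative.

```python
import math
import statistics

# Two-sided 95% critical value for df = n - 1 = 4, taken from a t-table
# (assumption: hard-coded rather than computed).
T_CRIT_DF4 = 2.776

def mean_ci(values, t_crit):
    """Return (mean, lower, upper) for a t-based confidence interval."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return mean, mean - t_crit * se, mean + t_crit * se

ms1 = [120, 80, 90, 110, 95]
ms2 = [105, 130, 145, 125, 115]

ci_ms1 = mean_ci(ms1, T_CRIT_DF4)
ci_ms2 = mean_ci(ms2, T_CRIT_DF4)
print("MS1: mean %.1f, 95%% CI (%.1f, %.1f)" % ci_ms1)
print("MS2: mean %.1f, 95%% CI (%.1f, %.1f)" % ci_ms2)
```

The intervals overlap, which is why the difference of 25 mmHg needs a formal test before it is called real.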

Q: What is the probability that the difference is due to chance?

Null Hypothesis (H0)
- Represents a theory; assumed to be true
- Want to test whether the data support H0
- e.g. "No difference" or "No association"

Alternative Hypothesis (H1)
- Represents a theory against the "Null"
- Usually is what the scientist claims and aims to prove
- e.g. "There is a difference" or "There is an association"

Stating the hypothesis in terms of a statistical hypothesis

"No difference" / "No association" vs. "There is a difference" / "There is an association"

One sample (Example 1):
H0: μ = 21.0 g/day
vs. H1: μ ≠ 21.0 g/day (two-tailed)
or  H1: μ > 21.0 g/day (one-tailed)

Two samples:
H0: μ1 = μ2
vs. H1: μ1 ≠ μ2 (two-tailed)
or  H1: μ1 > μ2 (one-tailed)

Errors in Hypothesis Testing

Type I Error (α): False-Positive Error
α = Level of significance, chosen before the statistical test is performed; also called the "alpha value"
α = Maximum probability of incorrectly rejecting the null when in fact it is true, i.e. H0 is wrongly rejected

Type II Error (β): False-Negative Error
β = Probability of incorrectly 'accepting' the null when in fact it is false, i.e. H0 is wrongly 'accepted'

Note: Power = 1 − β

Small sample size results in low power!

Correct Decisions and Errors in Hypothesis Testing

                                      True Situation
Conclusion from Hypothesis Test       Difference Exists (H1)     No Difference (H0)
Difference Exists (Reject H0)         No error                   Type I error (α error)
No Difference (Fail to reject H0)     Type II error (β error)    No error

Just like the US Legal System!

                                      True Situation
The Jury's Verdict                    Guilty (H1)                Innocent (H0)
Guilty!! (Reject H0)                  No error                   Type I error (α error)
Not Guilty!! (Fail to reject H0)      Type II error (β error)    No error

Convict innocent = Type I
Fail to convict guilty = Type II

Guilty vs. Not Guilty / Significant vs. Not Significant

1. Trial: Presume the defendant is innocent.
   Test: Presume the null hypothesis is true.

2. Trial: Factual evidence.
   Test: Observed data.

3. Trial: Evaluate witnesses.
   Test: Evaluate experimental flaws.

4. Trial: Is the evidence consistent with innocence?
   Test: Calculate the p-value.

5. Trial: If the evidence is inconsistent with the assumption of innocence, declare the defendant guilty. Else not guilty.
   Test: If p-value < preset threshold, conclude the data are inconsistent with the null hypothesis and declare the difference statistically significant. Else not statistically significant.

What is a P-value (p)?

"If p < α, then reject the null hypothesis"

p = the probability that the observed difference (or a difference more extreme) could have been obtained by chance alone, i.e. if the null hypothesis were true.

e.g. "If p < .05, reject H0", i.e. conclude that the data support the alternative: there is a difference between the two groups.

Bond…James Bond

• Can he tell the difference between a shaken and a stirred martini? (How often is he correct?)

• 16 taste tests (Design: 50% shaken, 50% stirred)

• Bond correct on 13/16 (81.25%)

• If Bond were just guessing (i.e., flipping a coin):

  Prob(correct on 13 or more out of 16 taste tests) = Binomial(p = .5, n = 16, x = 13, 14, 15, 16) = 0.0106

• He would have to be very lucky to be correct 13 or more times out of 16 if just guessing.
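The tail probability above is a direct binomial sum, which can be checked with a minimal sketch; the helper name `binomial_upper_tail` is illustrative, not from the slides.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

# Bond's result: 13 or more correct out of 16 if he is only guessing (p = 0.5).
p_value = binomial_upper_tail(13, 16, 0.5)
print(round(p_value, 4))  # 0.0106
```

This is a one-tailed probability: it counts only outcomes at least as good as 13 correct.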

Bond…James Bond

• Null Hypothesis: Bond is guessing (p = .5)
• Alternative: Bond can tell shaken from stirred (p ≠ .5)
• Prob(≥ 13 correct) = .0106
• Did we prove the null hypothesis (i.e. that he was guessing) false? NO, but we strongly doubt it.
• Therefore, there is strong evidence that Mr. Bond can tell whether a drink was shaken or stirred (reject the hypothesis that he was just guessing).

"Significant"

• Statistical significance versus scientific importance (caution: large samples can make even trivial differences statistically significant).

• Extremely significant?
  significant* (p<.05) vs. highly significant** (p<.01) vs. extremely significant*** (p<.0001)

• Borderline p-values (marginally significant): p = .049 or p = .051

• Not significant: does not prove the null hypothesis; we fail to reject the null hypothesis; need to evaluate the study (CIs or power)

What if p > α?

• If p < α, then reject H0 ("significant")
• If p > α, then fail to reject H0 ("not significant")
• A high p-value does not prove the null hypothesis; it means that the data are not strong enough to persuade you to reject the null hypothesis.
• If not significant, evaluate:
  - Confidence intervals
  - Power of the study

Test Statistic

A test statistic is a quantity calculated from a sample of data; it sometimes corresponds to a "critical ratio".

e.g. One-sample t-test:

$t = \dfrac{\bar{X} - \mu}{s / \sqrt{n}}$

Intuitively, a very large or a very small value of t indicates support for the alternative.

How large or how small should t be?
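As a rough illustration, the statistic can be computed for Example 1 (alcohol intake). This is a sketch under one loud assumption: the slides give only the population SD (34.2), not the sample SD, so the population SD stands in for s and the result is read as a large-sample z rather than a t. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_statistic(sample_mean, mu0, sd, n):
    """(sample mean - hypothesized mean) / standard error."""
    return (sample_mean - mu0) / (sd / math.sqrt(n))

# Example 1 numbers from the slides. Assumption: the population SD (34.2)
# is used in place of the sample SD, which the slides do not report.
z = one_sample_statistic(42.99, 21.0, 34.2, 142)

# Two-tailed p-value from the standard normal (large-sample approximation).
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 2), p_two_tailed)
```

Under that assumption the statistic is far beyond any conventional critical value, so the observed mean of 42.99 g/day is very hard to reconcile with a true mean of 21.0 g/day.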

Trivia

• The t-distribution is sometimes called “Student’s t,” after the man (his pseudonym) who first noted the distribution of means from small samples in 1908.

• “Student” was really William Gosset, a mathematician who worked for the Guinness Brewery; was forced to use a pseudonym because company policy prohibited publishing


STEPS IN TESTING STATISTICAL HYPOTHESIS

Step 1. State the research question in terms of a statistical hypothesis.

Step 2. Decide on the appropriate test statistic.

Step 3. Decide on the level of significance (α).

Step 4. Determine the critical value for the test statistic to be declared as significant (p).

Step 5. Perform calculations.

Step 6. Draw and state your conclusion.

Note: for steps 4 & 5 computer programs might be helpful.


Back to: Comparing SBPs between 1st and 2nd Year Med Students

(MS1 vs. MS2, n=5 per group/class)

MS1: 120, 80, 90, 110, 95

MS2: 105, 130, 145, 125, 115


Comparing Two Means with the t-Test

Example: Comparing SBPs between MS1 and MS2

        n    Mean    SD
MS1     5    99      15.97
MS2     5    124     15.17

Mean difference = 124 − 99 = 25

$t = \dfrac{\bar{X}_2 - \bar{X}_1}{SD_{pooled}\,\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$

$SD_{pooled} = \sqrt{\dfrac{(n_1 - 1)\,SD_1^2 + (n_2 - 1)\,SD_2^2}{n_1 + n_2 - 2}}$

t = 2.538 > t-table (df = n1 + n2 − 2 = 8) = 2.306; p = 0.035

If the null hypothesis is true, there is only a 3.5% chance of randomly selecting samples whose means are as far apart as (or further apart than) we observed.
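The pooled two-sample calculation can be sketched as follows, using only the standard library. The function name `pooled_t` is illustrative; the critical value 2.306 is the table value quoted on the slide, not something the code computes.

```python
import math
import statistics

def pooled_t(group1, group2):
    """Two-sample t statistic (group2 minus group1) with a pooled SD.

    Returns (t, degrees of freedom).
    """
    n1, n2 = len(group1), len(group2)
    var1, var2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    se = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    t = (statistics.mean(group2) - statistics.mean(group1)) / se
    return t, n1 + n2 - 2

ms1 = [120, 80, 90, 110, 95]
ms2 = [105, 130, 145, 125, 115]
t_stat, df = pooled_t(ms1, ms2)

T_CRIT_DF8 = 2.306  # two-tailed 5% critical value for df = 8, from the slide
print(round(t_stat, 3), df, abs(t_stat) > T_CRIT_DF8)
```

Comparing |t| with the table value is equivalent to checking p < 0.05; an exact p-value needs the t-distribution, which statistical software provides.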

Example 2. Highest urinary excretion of 5-HIAA (mg per 24 hours)

Carcinoid Heart Disease: 263 450 283 524 288 1270 274 135 432 220 580 500 890 350 285 120

No Carcinoid Heart Disease: 60 124 43 119 196 854 153 14 400 588 23 73

                               n     Mean      SD
Carcinoid Heart Disease        16    429.00    294.67
No Carcinoid Heart Disease     12    220.58    261.82

t-test for comparing the difference between 2 independent groups. Assumptions:
1. Normality
2. Homogeneity (equal variance)

[Figure: histogram of the raw data]

Comparing Two Means with the t-Test

Example 2. Highest urinary excretion of 5-HIAA (mg per 24 hours)

                               n     Mean      SD
Carcinoid Heart Disease        16    429.00    294.67
No Carcinoid Heart Disease     12    220.58    261.82

Mean difference = 429 − 220.58 = 208.42

$t = \dfrac{\bar{X}_1 - \bar{X}_2}{SD_{pooled}\,\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$

$SD_{pooled} = \sqrt{\dfrac{(n_1 - 1)\,SD_1^2 + (n_2 - 1)\,SD_2^2}{n_1 + n_2 - 2}}$

t = 1.94; p = 0.063

Random sampling would create a difference this large or larger in 6.3% of experiments if the null were true.

Did data meet assumptions of normality and equal variance?

Normality—questionable.

If n were large, appeal to the Central Limit Theorem (which is what we just did, even though n was small)


If the 'departure from normality' is large, what can be done?

Two Alternatives:

1. Transform the scale of the observations
   - normalizing transformations
   - variance-stabilizing transformations

2. Use different statistical methods to analyze the data (nonparametric procedures)
   - distribution-free statistics (based on ranks)

[Figures: histograms of UEX for the Carcinoid and No Carcinoid groups on three scales: raw, log-transformed, and square-root-transformed.]

Don't forget EDA (exploratory data analysis)!

[Figure: actual, log-transformed, and square-root-transformed data.]

Nonparametric tests

• Make no rigid assumptions about the distribution of the population
• Calculations based on ranks (rather than the actual data values)
• Resilient to outliers (robust)
• Also called 'distribution-free tests'
• E.g. Mann-Whitney Rank Sum Test (alternative to the t-test when the distribution is non-Gaussian)

Mann-Whitney Test (a.k.a. Wilcoxon Rank Sum Test)

1. Rank all values regardless of group. (For ties: assign the average rank.)

2. Sum the ranks in each group. Call the sums of the ranks T1 and T2.

3. Calculate the Mann-Whitney U-statistic:
   U = T1 − n1(n1 + 1)/2   or   U = T2 − n2(n2 + 1)/2

4. Look up the table value, or use the normal approximation:

$Z = \dfrac{\left|U - \dfrac{n_1 n_2}{2}\right| - 0.5}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}$

Intuitively, if Median1 is truly < Median2, then one would expect T1 to be smaller than T2.

Mann-Whitney Test for the Example: Comparing SBPs between MS1 and MS2

MS1 SBP:     120    80    90   110    95
MS1 Ranks:     7     1     2     5     3

MS2 SBP:     105   130   145   125   115
MS2 Ranks:     4     9    10     8     6

T1 = 18, T2 = 37

U1 = T1 − n1(n1 + 1)/2 = 3
U2 = T2 − n2(n2 + 1)/2 = 22

Z = 1.88, p = 0.06
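The rank sums, U statistics, and Z for the MS1 vs. MS2 example can be reproduced with a minimal sketch. It follows the continuity-corrected normal approximation shown on the slides and applies no tie correction to the variance (an assumption that does not matter here, since there are no ties); all function names are illustrative.

```python
import math
from statistics import NormalDist

def mann_whitney_z(group1, group2):
    """Return (T1, T2, U1, U2, Z, two-tailed p) via the normal approximation."""
    pooled = sorted(group1 + group2)

    def avg_rank(v):
        first = pooled.index(v) + 1         # 1-based rank of the first occurrence
        last = first + pooled.count(v) - 1  # rank of the last tied occurrence
        return (first + last) / 2           # ties share the average rank

    n1, n2 = len(group1), len(group2)
    t1 = sum(avg_rank(v) for v in group1)
    t2 = sum(avg_rank(v) for v in group2)
    u1 = t1 - n1 * (n1 + 1) / 2
    u2 = t2 - n2 * (n2 + 1) / 2
    z = (abs(u1 - n1 * n2 / 2) - 0.5) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = 2 * (1 - NormalDist().cdf(z))
    return t1, t2, u1, u2, z, p

ms1 = [120, 80, 90, 110, 95]
ms2 = [105, 130, 145, 125, 115]
t1, t2, u1, u2, z, p = mann_whitney_z(ms1, ms2)
print(t1, t2, u1, u2, round(z, 2), round(p, 2))
```

With only five observations per group, an exact table lookup is usually preferred over the normal approximation.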

Results from t-test vs. M-W test for MS1 vs. MS2 problem

t-test: p = 0.035
M-W test: p = 0.06

When you use a non-parametric test with Gaussian data, the p-value tends to be too high.

t-test and Mann-Whitney test

• The t-test is more powerful than the M-W test when the assumptions for the t-test are true
• With large samples, the difference in power is trivial
• With smaller samples, the difference is more pronounced
• If there are 7 or fewer total data points, the Mann-Whitney test (two-tailed test) can never yield a p-value < 0.05, however different the groups are

Mann-Whitney Test

Carcinoid Heart Disease (n=16)        No CHD (n=12)

PTID   5-HIAA   Rank                  PTID   5-HIAA   Rank
1      263      13                    1      60       4
2      288      17                    2      119      6
3      432      20                    3      153      10
4      890      27                    4      588      25
5      450      21                    5      124      8
6      1270     28                    6      196      11
7      220      12                    7      14       1
8      350      18                    8      23       2
9      283      15                    9      43       3
10     274      14                    10     854      26
11     580      24                    11     400      19
12     285      16                    12     73       5
13     524      23
14     135      9
15     500      22
16     120      7

Rank mean, SD: 17.875, 6.13           Rank mean, SD: 10, 8.73

Mann-Whitney: a t-test on the ranks!

H0: Mean ranks are equal in the two groups (if there is no difference, the ranks are spread fairly evenly across the groups)

$t_{ranks} = \dfrac{\bar{R}_1 - \bar{R}_2}{SD_{pooled}\,\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$

$SD_{pooled} = \sqrt{\dfrac{(n_1 - 1)\,SD_1^2 + (n_2 - 1)\,SD_2^2}{n_1 + n_2 - 2}}$

t_ranks = (17.875 − 10) / (7.34 × sqrt(1/16 + 1/12)) = 7.875 / 2.80 = 2.81; p = 0.009

Compare to the t-test result on the raw data: t = 1.94; p = 0.063 (n.s.)

Why the discrepancy in results?

Normality???

[Figure: histogram of the log-transformed data with normal curve overlaid]

T-test on the log-transformed values

Carcinoid Heart Disease (n=16)              No CHD (n=12)

PTID   5-HIAA   Rank   Log                  PTID   5-HIAA   Rank   Log
1      263      13     5.57                 1      60       4      4.09
2      288      17     5.66                 2      119      6      4.77
3      432      20     6.06                 3      153      10     5.03
4      890      27     6.79                 4      588      25     6.37
5      450      21     6.10                 5      124      8      4.82
6      1270     28     7.14                 6      196      11     5.27
7      220      12     5.39                 7      14       1      2.63
8      350      18     5.85                 8      23       2      3.13
9      283      15     5.64                 9      43       3      3.76
10     274      14     5.61                 10     854      26     6.74
11     580      24     6.36                 11     400      19     5.99
12     285      16     5.65                 12     73       5      4.29
13     524      23     6.26
14     135      9      4.90
15     500      22     6.21
16     120      7      4.78

Log mean, SD: 5.88, 0.62                    Log mean, SD: 4.75, 1.25
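A short sketch shows how the pooled t statistic changes once the 5-HIAA values are log-transformed. It assumes natural logarithms (consistent with the tabulated values, e.g. ln 263 ≈ 5.57) and reuses the pooled-SD formula from the earlier slides; the function name is illustrative.

```python
import math
import statistics

chd = [263, 288, 432, 890, 450, 1270, 220, 350,
       283, 274, 580, 285, 524, 135, 500, 120]
no_chd = [60, 119, 153, 588, 124, 196, 14, 23, 43, 854, 400, 73]

def pooled_t(a, b):
    """Two-sample t statistic (mean of a minus mean of b) with a pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var) * math.sqrt(1 / na + 1 / nb)
    return (statistics.mean(a) - statistics.mean(b)) / se

t_raw = pooled_t(chd, no_chd)

# Natural-log transform (assumption: matches the LOG columns on the slide).
t_log = pooled_t([math.log(v) for v in chd], [math.log(v) for v in no_chd])

print(round(t_raw, 2), round(t_log, 2))
```

The statistic on the raw scale is about 1.94, while on the log scale it is a little above 3, which is the shift from p = 0.063 to p = 0.004 reported in the comparison of results.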

Compare Results: Carcinoid data

     VARIABLE    T-TEST P-VALUE    MANN-WHITNEY P-VALUE
1    uex         0.06323           0.01300
2    rank        0.00932           0.01300
3    loguex      0.00402           0.01300

ASSUMPTIONS IMPORTANT!

Parametric tests vs. Non-parametric tests

Parametric tests
• Analogue: specific bacteria known; use a specific antibiotic to fight infection!
• Power: can be more powerful if you're sure of the assumptions!!
• Requirements & applicability: some distributional assumptions; restricted

Non-parametric tests
• Analogue: unknown bacteria; use a wide-spectrum antibiotic to fight infection!
• Power: conservative but "safe"!!!
• Requirements & applicability: no distributional assumptions; unrestricted

“Still, it is an error to argue in front of your data. You find yourself insensibly twisting them round to fit your theories.”

–Sherlock Holmes

The Adventure of Wisteria Lodge

(quoted by Casella and Berger)


Special Test for Paired Data

• Before and after intervention
• Recruit subjects as pairs, matched for certain variables (age, age range, diagnosis, treatment, etc.)
• Twins
• Child-parent pairs

Pairing lets us distinguish variation among subjects from variation due to differences between groups.

Rationale for paired design

Weight Loss Program, 2 months (Dawson & Trapp)

Patient   Weight Before (kg)   Weight After (kg)
1         100                  95
2         89                   84
3         83                   78
4         98                   93
5         108                  103
6         95                   90

Randomly select 3 before the program: (89 + 83 + 95)/3 = 89 kg
Randomly select 3 after the program: (95 + 93 + 103)/3 = 97 kg

An 8 kg gain??? But each patient lost 5 kg!!! The culprit: variation among subjects.

Example 3. Analysis of Arterial Oxygen Tension (mmHg) using paired t-test

Patient   Before   After   Difference (d)
1         70       82      12
2         59       66      7
3         53       65      12
4         54       62      8
5         44       74      30
6         58       77      19
7         64       68      4
8         43       59      16
Mean      55.6     69.1    13.5
SD        9.2      7.9     8.2

[Figure: distribution of the differences]

Example 3. Analysis of Arterial Oxygen Tension (mmHg) using paired t-test

Differences (d = after − before): 12 7 12 8 30 19 4 16; mean 13.5, SD 8.2 (before: mean 55.6, SD 9.2; after: mean 69.1, SD 7.9)

Step 1. H0: δ = 0 vs. H1: δ ≠ 0, where δ is the true mean difference
Step 2. t-test for the mean difference: t = mean(d) / SE of mean(d)
Step 3. α = 0.10
Step 4. Critical t-value, df = 7: reject H0 if t > 1.895 or t < −1.895
Step 5. t = 13.5 / (8.2 / sqrt(8)) = 4.7
Step 6. t > 1.895, so reject H0

There is a difference (an increase) in AOT after weight reduction.
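The six steps can be traced in a few lines. This sketch computes the paired t statistic from the raw before/after values and compares it with the table value 1.895 quoted in Step 4; variable names are illustrative.

```python
import math
import statistics

before = [70, 59, 53, 54, 44, 58, 64, 43]
after = [82, 66, 65, 62, 74, 77, 68, 59]

# Paired design: analyze the within-patient differences as a single sample.
d = [a - b for a, b in zip(after, before)]
mean_d = statistics.mean(d)
se_d = statistics.stdev(d) / math.sqrt(len(d))
t_paired = mean_d / se_d

T_CRIT = 1.895  # df = 7, two-tailed alpha = 0.10, as quoted in Step 4
reject_h0 = abs(t_paired) > T_CRIT
print(mean_d, round(t_paired, 2), reject_h0)
```

Working from the unrounded SD gives t of about 4.63; the slide's 4.7 comes from using the SD rounded to 8.2. The conclusion is the same either way.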

Wilcoxon Signed Rank Test on Ex. 3: Arterial Oxygen sample

Patient   Difference (d = aft − bef)   Absolute Value of d   Rank   Signed Rank
1         +12                          12                    4.5    +4.5
2         +7                           7                     2      +2
3         +12                          12                    4.5    +4.5
4         +8                           8                     3      +3
5         +30                          30                    8      +8
6         +19                          19                    7      +7
7         +4                           4                     1      +1
8         +16                          16                    6      +6

Wilcoxon Signed Rank Test (a.k.a. Wilcoxon Signed Rank Sum Test)

1. For each pair, calculate the signed difference.

2. Rank the absolute values of all differences (i.e. ignore the sign temporarily). If any of the differences are equal to 0, ignore them entirely. For ties: assign the average rank.

3. Add up the ranks of all positive differences and of all negative differences. Call the sums of the ranks T+ and T−.

4. Look up the table value (details omitted here), or use the large-sample approximation: calculate T* and reject the null hypothesis if T* > Z(α).

$T^* = \dfrac{T^+ - \dfrac{n(n+1)}{4}}{\sqrt{\dfrac{n(n+1)(2n+1)}{24}}}$
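The steps above can be sketched for the arterial oxygen differences. The code assumes the grouping (T+ − n(n+1)/4) divided by the square root term for T*, and it applies no tie correction to the variance; the function name `signed_rank` is illustrative.

```python
import math

def signed_rank(differences):
    """Return (T+, T-, T*) for the Wilcoxon signed rank test."""
    d = [x for x in differences if x != 0]   # zero differences are dropped
    abs_sorted = sorted(abs(x) for x in d)

    def avg_rank(v):
        first = abs_sorted.index(v) + 1
        last = first + abs_sorted.count(v) - 1
        return (first + last) / 2            # ties share the average rank

    t_plus = sum(avg_rank(abs(x)) for x in d if x > 0)
    t_minus = sum(avg_rank(abs(x)) for x in d if x < 0)
    n = len(d)
    t_star = (t_plus - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return t_plus, t_minus, t_star

t_plus, t_minus, t_star = signed_rank([12, 7, 12, 8, 30, 19, 4, 16])
print(t_plus, t_minus, round(t_star, 2))
```

All eight differences are positive, so T+ takes its maximum possible value of 36 and T− is 0. With n = 8, the exact table is generally more appropriate than the large-sample approximation.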

A look at t-tests on the 'Ratio'

• Paired t-test: tests that the 'intervention' always causes the same average absolute difference, regardless of starting value.

• Sometimes, the intervention will instead cause the same average relative difference.

Ex. Enzyme Activity: Control vs. Treated Cells (from Motulsky)

Clone   Ctl   Trt   Diff   Ratio   Log10(Ctl)   Log10(Trt)   Diff of Logs (Trt − Ctl)
1       24    52    28     2.2     1.38         1.72         0.34
2       6     11    5      1.8     0.78         1.04         0.26
3       16    28    12     1.75    1.20         1.45         0.24
4       5     8     3      1.6     0.70         0.90         0.20
5       2     4     2      2.0     0.30         0.60         0.30
Mean                10.0   1.87                              0.27
SD                  10.8   0.22                              0.05

Paired t-test of diff = 0 vs. diff ≠ 0: p = 0.107

Test if ratio = 1:
Log Ratio = log(Treated/Control) = log(Treated) − log(Control)
Test if the difference of logs = 0: a one-sample problem!

Ex. Enzyme Activity: Control vs. Treated Cells (from Motulsky)

Summary (mean, SD): Diff 10.0, 10.8; Ratio 1.87, 0.22; Diff of Logs (Trt − Ctl) 0.27, 0.05

Test if diff of logs = 0: p = 0.0003
95% CI for the difference of logs: (0.21, 0.33)
Antilog gives the 95% CI for the ratio: (1.62, 2.14)

The doubling in enzyme activity is very unlikely to be due to coincidence!
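The ratio analysis can be sketched as a one-sample t-test on the differences of base-10 logs. The critical value 2.776 (two-sided 95%, df = 4) is hard-coded from a t-table, which is an assumption of this sketch rather than something the slides state.

```python
import math
import statistics

control = [24, 6, 16, 5, 2]
treated = [52, 11, 28, 8, 4]

# log10(treated / control) = log10(treated) - log10(control)
log_diff = [math.log10(t) - math.log10(c) for c, t in zip(control, treated)]
n = len(log_diff)
mean_ld = statistics.mean(log_diff)
se_ld = statistics.stdev(log_diff) / math.sqrt(n)
t_log = mean_ld / se_ld

T_CRIT_DF4 = 2.776  # assumption: t-table value, two-sided 95%, df = 4
ci_log = (mean_ld - T_CRIT_DF4 * se_ld, mean_ld + T_CRIT_DF4 * se_ld)
ci_ratio = (10 ** ci_log[0], 10 ** ci_log[1])  # antilog back to the ratio scale

print(round(mean_ld, 2), round(t_log, 1), ci_ratio)
```

Working from unrounded logs, the interval for the ratio comes out close to, but not exactly, the (1.62, 2.14) on the slide, which was built from the rounded mean 0.27 and SD 0.05.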

Comparing Observed Counts to Expected Counts

Example (Motulsky). Assume 10% of patients die after brain surgery (Sx).

Last month, of n = 75 patients, 16 died (21.3%).

We want to know whether this is just coincidence or whether there is a change in the death rate.

H0: the probability of dying is 10%. If so, what is the probability of observing 16 or more deaths out of 75 patients?

Example. Death after Sx

             Alive         Dead         Total
# Observed   59            16           75
# Expected   67.5 (90%)    7.5 (10%)    75 (100%)

$\chi^2 = \sum \dfrac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$

χ² = 10.7; p = 0.0011. Reject H0; suspect that some factor is responsible for the increased death rate.
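The observed-versus-expected calculation can be checked with a short sketch. For one degree of freedom the chi-square upper tail equals erfc(sqrt(χ²/2)), so no statistics library is needed; the function names are illustrative.

```python
import math

def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_df1(x2):
    """Upper-tail p-value for chi-square with 1 df: P(Z^2 > x2)."""
    return math.erfc(math.sqrt(x2 / 2))

n = 75
observed = [59, 16]              # alive, dead
expected = [0.90 * n, 0.10 * n]  # 67.5 alive, 7.5 dead under H0

x2 = chi_square(observed, expected)
p = p_value_df1(x2)
print(round(x2, 1), round(p, 4))
```

Two categories with a fixed total leave one degree of freedom, which is why the df = 1 shortcut applies here.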

Comparing Two Proportions: Disease Progressed versus No Progression

Observed counts (expected counts in parentheses):

                Disease Status
Drug            Progressed    No Progression    Total
New Drug        76 (104)      399 (371)         475
Placebo         129 (101)     332 (360)         461
Total           205           731               936

76/205 = 37%; 399/731 = 55%

Comparing Two Proportions (New drug vs. Placebo: Disease Progression)

$\chi^2 = \sum \dfrac{(|\text{Observed} - \text{Expected}| - 0.5)^2}{\text{Expected}}$

The four cell contributions, shown without the 0.5 correction:

$\chi^2 = \dfrac{(76-104)^2}{104} + \dfrac{(129-101)^2}{101} + \dfrac{(399-371)^2}{371} + \dfrac{(332-360)^2}{360}$

P < 0.0001 (there is a less than 0.01% chance of observing such a large discrepancy between observed and expected counts).

Calculating the Expected Counts in a 2x2 Contingency Table

Expected count = (Row Tot / Grand Tot) x (Column Tot / Grand Tot) x Grand Tot

Expected count = (Row Tot x Column Tot) / Grand Tot

Observed counts (expected counts in parentheses):

Drug            Progressed    No Progression    Total
New Drug        76 (104)      399 (371)         475
Placebo         129 (101)     332 (360)         461
Total           205           731               936

Example 4. Analysis of low birth weight and smoking during pregnancy (association)

Women who smoked during pregnancy versus women who did not smoke during pregnancy

                  Low Birth Weight
Smoking Status    Yes    No     Total
Smoker            55     45     100
Non-Smoker        40     60     100
Total             95     105    200

Chi-square Test (test of association)

Observed counts (expected counts in parentheses):

                  Low Birthweight
Smoking Status    Yes          No           Total    % Low Birthweight
Smoker            55 (47.5)    45 (52.5)    100      55%
Non-Smoker        40 (47.5)    60 (52.5)    100      40%
Total             95           105          200

$\chi^2 = \dfrac{(55-47.5)^2}{47.5} + \dfrac{(40-47.5)^2}{47.5} + \dfrac{(45-52.5)^2}{52.5} + \dfrac{(60-52.5)^2}{52.5} = 4.5$

Critical value = 3.84. Since 4.5 > 3.84, reject the null hypothesis. Conclude there is an association.
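The expected counts and the chi-square statistic for the smoking table can be reproduced with a minimal sketch. It uses the uncorrected statistic, as in the worked sum on the slide, and the function names are illustrative.

```python
def expected_counts(table):
    """Expected cell counts: row total x column total / grand total."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    grand = sum(row_tot)
    return [[r * c / grand for c in col_tot] for r in row_tot]

def chi_square_table(table):
    """Pearson chi-square statistic without continuity correction."""
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, exp)
               for o, e in zip(obs_row, exp_row))

# Rows: smoker, non-smoker. Columns: low birth weight yes, no.
smoking = [[55, 45], [40, 60]]
exp_smoking = expected_counts(smoking)
x2_smoking = chi_square_table(smoking)

CRITICAL_DF1 = 3.84  # critical value quoted on the slide
print(exp_smoking, round(x2_smoking, 2), x2_smoking > CRITICAL_DF1)
```

The same two functions apply to the drug-progression table, where `expected_counts([[76, 399], [129, 332]])` gives values that round to 104, 371, 101, and 360.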

Calculating Relative Risk (Cohort study), Dawson & Trapp

RR = Incidence in the exposed / Incidence in the unexposed

              Outcome
Exposure      MI      No MI      Total
Aspirin       139     10,898     11,037
Placebo       239     10,795     11,034
Total         378     21,693     22,071

RR = (139/11,037) / (239/11,034) = 0.0126 / 0.0217 = 0.58

Fewer MIs with aspirin.

95% CI: (.47, .71). The interval does not include 1, so reject the null hypothesis.
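The relative risk and an approximate confidence interval can be sketched as below. The slides do not say how their interval was obtained; this sketch assumes the common log-based (Katz) standard error, so the upper limit may differ slightly from the slide's in the second decimal.

```python
import math

def relative_risk(events_exp, n_exp, events_unexp, n_unexp, z=1.96):
    """Return (RR, lower, upper) using a log-scale normal approximation."""
    rr = (events_exp / n_exp) / (events_unexp / n_unexp)
    # Assumption: Katz standard error of ln(RR).
    se_log = math.sqrt(1 / events_exp - 1 / n_exp
                       + 1 / events_unexp - 1 / n_unexp)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

rr, rr_lo, rr_hi = relative_risk(139, 11037, 239, 11034)
print(round(rr, 2), round(rr_lo, 2), round(rr_hi, 2))
```

Because the whole interval lies below 1, the conclusion matches the slide: fewer MIs with aspirin.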

Calculating the Odds Ratio (Case-Control study)

OR = Odds that a stroke patient is exposed / Odds that a control patient is exposed

              Outcome
Exposure      Stroke    Control    Total
Smoker        73        18         91
Non-Smoker    141       196        337
Total         214       214        428

OR = [(73/214) / (141/214)] / [(18/214) / (196/214)] = (73 x 196) / (18 x 141) = 5.64

95% CI: (3.2, 9.9). The interval does not include 1, so reject the null hypothesis. Conclude there is an association.
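The odds ratio and its interval can be sketched in a few lines. The interval method is an assumption (Woolf's log-based standard error), chosen because it reproduces the slide's limits to one decimal; the function name is illustrative.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """OR for a 2x2 table [[a, b], [c, d]] with a log-scale normal interval.

    a: exposed cases, b: exposed controls,
    c: unexposed cases, d: unexposed controls.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf standard error
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper

or_est, or_lo, or_hi = odds_ratio(73, 18, 141, 196)
print(round(or_est, 2), round(or_lo, 1), round(or_hi, 1))
```

The cross-product form (a x d) / (b x c) is the same simplification shown on the slide, since the column totals of 214 cancel.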

Analyzing Survival Data

Outcome: Time-to-"Event" ("survival" time)
(time is always > 0; could be "censored" [at start or end])

Time: years, days, hours, or minutes from the start until an event occurs

Event: death, diagnosis, relapse, discharge, "failure", or any designated experience of interest

Examples:
• Time from birth until death
• Time from diagnosis until death
• Time from surgery to relapse of disease
• Time-to-discharge from hospital (LOS)
• Time from discharge until ER visit

Censoring: don't know the exact survival time

Examples:
• Time from birth until death
• Time from diagnosis until death
• Time from surgery to relapse of disease

Possibilities:

[-----------------X   "uncensored"
[-----------------O   "right censored"
(----------X          "left censored"

Example: Relapse

Drug A: 6 6 6 6+ 7 9+ 10 10+ 11+ 13 16 17+ 19+ 20+ 22 23 25+ 32+ 32+ 34+ 35+

Placebo: 1 1 2 2 3 4 4 5 5 8 8 8 8 11 11 12 12 15 17 22 23

+ = Censored

Example: Survival (years) in Acute Leukemia

Drug A (N=5): 24 5+ 9 11 13

Drug B (N=5): 2 3 4 5 6+

+ = Censored

Incorrect Analysis

          DRUG A     DRUG B
ALIVE     1 (20%)    1 (20%)
DEAD      4          4

Shows no difference in % alive!!!

[Figure: survival curves of time-to-death for DRUG A and DRUG B. Log-Rank Test: p=0.013]

Median Survival Time: the time at which 50% are alive; drop a perpendicular (┴) from the curve at 50% survival to the time axis.
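To make the median survival time concrete, here is a minimal sketch that assumes the plotted curves are Kaplan-Meier (product-limit) estimates, which the slides do not state explicitly. It uses exact fractions so that the "50% alive" comparison is not affected by floating-point rounding; the function names are illustrative.

```python
from fractions import Fraction

def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each event time.

    times: follow-up times; events: True if death observed, False if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = Fraction(1)
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e)
        leaving = sum(1 for tt, _ in data if tt == t)  # deaths plus censored
        if deaths:
            surv *= Fraction(at_risk - deaths, at_risk)
            curve.append((t, surv))
        at_risk -= leaving
        i += leaving
    return curve

def median_survival(curve):
    """First time at which estimated survival drops to 50% or below."""
    for t, s in curve:
        if s <= Fraction(1, 2):
            return t
    return None  # median not reached

drug_a = kaplan_meier([24, 5, 9, 11, 13], [True, False, True, True, True])
drug_b = kaplan_meier([2, 3, 4, 5, 6], [True, True, True, True, False])
print(median_survival(drug_a), median_survival(drug_b))
```

Under this assumption the medians are 11 years for Drug A and 4 years for Drug B, a difference that the "1 alive, 4 dead" table in the incorrect analysis cannot show.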

Natural Killer cell activity (lytic units**). Grouping based on scores from the Social Readjustment Rating Scale. Dawson and Trapp, p. 162 (from Irwin et al., Am J Psychiatry 1987; 144)

Low Score (n=13): 22.2 82.0 97.8 56.0 29.1 9.3 37.0 19.9 35.8 39.5 44.2 12.8 37.4
Mean = 40.23, SD = 25.71

Moderate Score (n=12): 15.1 23.2 10.5 13.9 9.7 19.0 19.8 9.1 30.1 15.5 10.3 11.0
Mean = 15.60, SD = 6.42

High Score (n=12): 10.2 11.3 11.4 5.3 14.5 11.0 13.6 33.4 25.0 27.0 36.3 17.7
Mean = 18.06, SD = 9.97

Grand Mean = 25.05

**one lytic unit = number of effector cells killing 20% of target cells
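For three independent groups like these, the usual parametric analysis is a one-way ANOVA. The slides do not report the F statistic, so the sketch below is only an illustration of the calculation (between-group mean square over within-group mean square); the function name is hypothetical.

```python
import statistics

low = [22.2, 82.0, 97.8, 56.0, 29.1, 9.3, 37.0,
       19.9, 35.8, 39.5, 44.2, 12.8, 37.4]
moderate = [15.1, 23.2, 10.5, 13.9, 9.7, 19.0,
            19.8, 9.1, 30.1, 15.5, 10.3, 11.0]
high = [10.2, 11.3, 11.4, 5.3, 14.5, 11.0,
        13.6, 33.4, 25.0, 27.0, 36.3, 17.7]

def one_way_anova_f(groups):
    """Return (F, df_between, df_within, grand_mean) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k, n_total = len(groups), len(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    f = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    return f, k - 1, n_total - k, grand_mean

f_stat, df_b, df_w, grand_mean = one_way_anova_f([low, moderate, high])
print(round(f_stat, 2), df_b, df_w, round(grand_mean, 2))
```

An F statistic this size on (2, 34) degrees of freedom is well above the 5% critical value of roughly 3.3 from a standard F-table (an approximate table value, not taken from the slides). The unequal SDs across groups are a reason to also consider the Kruskal-Wallis test.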

Analysis of Variance (ANOVA)

Multiple-Comparison Procedures
• Investigating several variables
• Post-hoc comparisons
• IDEAL: pre-planned or a priori comparisons
• Actual practice: a posteriori or post-hoc methods

Nonparametric ANOVA: a different method of analyzing data. Recall:
• t-test ↔ Wilcoxon Rank Sum
• Paired t-test ↔ Wilcoxon Signed Rank Test
• One-way ANOVA ↔ Kruskal-Wallis test
• Two-way ANOVA ↔ Friedman 2-way ANOVA by ranks

Multiple Regression

Uses:
- Devise an equation to predict Y from several X variables for future subjects, e.g. predict cardiac output from BP, pulse rate, and weight.
- Adjust data: estimate the effect of X1 on Y while adjusting for differences with respect to X2.
- Explore relationships among several X variables to find out which of X1, X2, X3 influence Y.

Logistic Regression

- Quantifies the association between a risk factor (or treatment) and a disease (binary outcome), after adjusting for other variables.
- Similar to linear regression/multiple regression (predicts Y from one or several X variables).
- Finds an equation that best predicts a binary outcome from one or several X variables (binary or quantitative), e.g. Y: Disease (Yes/No) predicted from Age, BP, Gender, FamHx.

Example: Logistic Regression

Outcome:
- binary data (diabetes: 1 or 0)
- ordinal data (severity: 1, 2, 3)

Example: Risk factors for Malignant vs. Non-Malignant Pleural Effusion

Possible predictors:
- age at admission, gender, race
- HGB (>=10 or <10)
- presence of infection
- serum LDH ratio (>=1 or <1)
- pleural fluid WBC (>=1000 or <1000)
- pleural fluid pH (>=7.2 or <7.2), …

One way to proceed: first do a "univariate screening", then include all significant factors in the 'full' logistic model.

Rule of Thumb: About 10 events per variable are necessary in order to get reasonably stable estimates of the regression coefficients ('Statistical Rules of Thumb', van Belle).

Example: 5-year follow-up study of deaths following acute MI; about 1/3 of patients are expected to die during the study. If 7 variables are considered as predictors, then by the rule of thumb 70 events (deaths) are needed, so 210 subjects should be enrolled. (This has nothing to do with power, just stable estimates.)
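The rule-of-thumb arithmetic can be written as a tiny helper. The function name and its default of 10 events per variable are illustrative; the event rate is passed as an exact fraction so that rounding up does not overshoot because of floating-point error.

```python
import math
from fractions import Fraction

def subjects_needed(n_predictors, event_rate, events_per_variable=10):
    """Return (events needed, subjects to enroll) under the events-per-variable rule."""
    events = n_predictors * events_per_variable
    subjects = math.ceil(events / event_rate)  # round up to whole subjects
    return events, subjects

# 7 candidate predictors; about one third of patients expected to die.
events, subjects = subjects_needed(7, Fraction(1, 3))
print(events, subjects)  # 70 210
```

Halving the expected event rate doubles the required enrollment, which is why rare outcomes support only a few predictors.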

Summary---1: Decisions about a Single Group

                                      Parametric (Gaussian)       Non-Parametric (Non-Gaussian)
Comparing to a standard value         One-sample t-test (mean)    Sign Test (median)
Matched or paired design              Paired t-test               Wilcoxon Signed-Rank Test
Single group, multiple time points    RMANOVA                     Friedman Test (ranks)

Summary---2: Decisions about Two or More Groups

                                           Parametric (Gaussian)          Non-Parametric (Non-Gaussian)
Comparing 2 independent groups             Two-sample t-test              Mann-Whitney Test (Rank-Sum Test)
Comparing 3 or more independent groups     Analysis of Variance (ANOVA)   Kruskal-Wallis Test

Summary---3: Analyzing Proportions / Contingency Tables

                                           Usual test                                     Caveats
Proportions in single groups               Binomial Test; CIs                             Small samples
Comparing two independent proportions      Chi-square test, Fisher's Exact Test; CIs      Scarce cells, continuous made ordinal
Testing for association or independence    Chi-square test, Fisher's Exact Test; Kappa    Low expected values
Testing for agreement or discordance       McNemar's test; Kappa                          Low expected values

Word to the Wise…

• Statistical Significance vs. Clinical Significance

• Use confidence intervals!!!

• When doing multiple comparisons, use adjusted p-values!!! (Bonferroni-like adjustment)

Research Collaboration


REFERENCES

• Berger J. Statistical Decision Theory and Bayesian Analysis, 2nd Ed. (1985). Springer-Verlag, New York.
• Casella G, Berger RL. Statistical Inference (1990). Duxbury Press.
• Daniel W. Biostatistics: A Foundation for Analysis in the Health Sciences, 7th Ed. (1999). J. Wiley & Sons.
• Dawson B, Trapp RG. Basic & Clinical Biostatistics, 2nd & 3rd Ed. (1994 & 2001). McGraw-Hill Medical Publishing Division (Lange Med Books).
• Hennekens CH, Buring JE. Edited by Mayrent SL. Epidemiology in Medicine, 1st Ed. (1987). Little Brown & Company, Boston/Toronto.
• Motulsky H. Intuitive Biostatistics (1995). Oxford Univ Press.
• Rosner B. Fundamentals of Biostatistics, 5th Ed. (2000). Duxbury-Brooks/Cole.
• Van Belle G. Statistical Rules of Thumb (2002). Wiley.
• Zar JH. Biostatistical Analysis, 2nd Ed. (1984). Prentice-Hall.

Thanks!


Disclosure on next page…

Statistical Consulting: FEINSTEIN INSTITUTE FOR MEDICAL RESEARCH AT NORTH SHORE-LIJ: BIOSTATISTICS UNIT: (516) 562 0300

I can prove it or disprove it! What do you want me to do?


This is not what we do…
