Correlation, Regression, and Causality Richard L. Amdur, Ph.D. Chief, Biostatistics & Data Management Core DC VAMC Assistant Professor, Depts. of Psychiatry & Surgery Georgetown University Medical Center


Post on 14-Dec-2015


Page 1

Correlation, Regression, and Causality

Richard L. Amdur, Ph.D.

Chief, Biostatistics & Data Management Core, DC VAMC

Assistant Professor, Depts. of Psychiatry & Surgery, Georgetown University Medical Center

Page 2

Association does not mean causality

Why?

Page 3

SSRI & Depression

Study Design: Survey everyone who is currently present at the DCVA to determine whether taking SSRIs reduces depression. Find out whether or not each person is currently taking an SSRI, and measure their level of depression with the Beck Depression Inventory (BDI).

Conceptualization: Dr. Smith believes that if SSRIs reduce depression, then people who take SSRIs should have less depression than those who do not take SSRIs.

Page 4

Results: Mean ± SD BDI scores were 50 ± 18 for those taking SSRIs, and 15 ± 8 for those not taking SSRIs.

Correct Conclusion: SSRI use is positively associated with depression.

Incorrect Conclusion: SSRI use increases depression.

Page 5

Causal Modeling Notation for Discussing Study Design

[Path diagram: Mean Daily Caloric Intake (unit = 100 cal/day) --(0.5)--> Weight (lbs). The left box is the independent variable, the number on the arrow is the effect size, and the right box is the dependent variable.]

Interpretation of path coefficient: For every 1-unit increase in Daily Caloric Intake, there is an increase in weight of 0.5 units. In this case, for every additional 100 calories taken in, subjects will gain ½ pound.

Page 6

[Path diagram: Mean Daily Caloric Intake (unit = 100 cal/day) --(0.5)--> Weight (lbs); Mean Daily Activity (unit = 100 cal/day) --(-0.5)--> Weight (lbs).]

Interpretation of path coefficients: For every 100 cal/day increase in Daily Caloric Intake, there is an increase in weight of 0.5 pounds. For every 100 cal/day increase in activity, there is a decrease in weight of 0.5 pounds.

Page 7

‘Causal’ Model Using a Categorical Independent Variable

[Path diagram: Treatment with SSRI (coded yes = 1, no = 0) --(35.0)--> BDI score. The left box is the independent variable, the number on the arrow is the effect size, and the right box is the dependent variable.]

Interpretation: For every 1-unit increase in Treatment, there is an increase in BDI score of 35 units. In this case, subjects in treatment with an SSRI will have an average BDI score 35 points higher than subjects not taking SSRIs.

Page 8

What is actually going on?

[Path diagram: Was diagnosed with severe depression (yes = 1, no = 0) --(0.80)--> Treatment with SSRI (coded yes = 1, no = 0); Was diagnosed with severe depression --(50.0)--> BDI score; Treatment with SSRI --(-5.0)--> BDI score.]

Interpretation: 80% of those diagnosed with depression are taking an SSRI. Those diagnosed with depression have BDI scores 50 points higher. Taking an SSRI reduces the BDI score by 5 points.

Observed SSRI → BDI effect (35) = 50 × 0.80 − 5.0

Correct Conclusion: After accounting for the effect of Pre-Treatment Depression, SSRI treatment has a direct negative effect on depression score.
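The confounding pattern on this slide can be checked with a small simulation. This is only a sketch under stated assumptions: the 50% prevalence of diagnosed depression, the baseline BDI of 10, the noise level, and the rule that undiagnosed people never take an SSRI are all invented for illustration. Only the 0.80, 50, and -5 values come from the slide, so the naive estimate is large and positive but need not equal 35 exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Assumption: half of the surveyed people were diagnosed with severe depression.
diagnosed = rng.random(n) < 0.5
# From the slide: 80% of diagnosed people take an SSRI.
# Assumption: nobody without a diagnosis takes one.
ssri = diagnosed & (rng.random(n) < 0.80)

# BDI = assumed baseline of 10, +50 if diagnosed, -5 if on an SSRI, plus noise.
bdi = 10 + 50 * diagnosed - 5 * ssri + rng.normal(0, 5, n)


def ols(y, *predictors):
    """Least-squares coefficients: [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors]).astype(float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Regressing BDI on SSRI use alone mixes the direct and spurious paths.
naive_effect = ols(bdi, ssri)[1]
# Adding the diagnosis as a covariate isolates the direct path.
adjusted_effect = ols(bdi, ssri, diagnosed)[1]

print(f"naive SSRI coefficient:    {naive_effect:6.2f}")
print(f"adjusted SSRI coefficient: {adjusted_effect:6.2f}")
```

The naive coefficient has the wrong sign for the treatment effect, while the adjusted coefficient lands close to the -5 that was built into the simulated data.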

Page 9

Case Study: the effect of mindfulness training (MT) on working memory capacity (WMC) and positive and negative emotions in subjects who are under stress

Study Design: One Marine unit was given MT, another was not. Both units underwent stressful preparations for deployment.

Page 10

Question: Does mindfulness training (MT) increase working memory capacity (WMC) and positive emotions in subjects who are under stress?

Results: “In the MT group, WMC decreased over time in those with low MT practice time, but increased in those with high practice time. Higher MT practice time also corresponded to lower levels of negative affect and higher levels of positive affect ….”

Conclusion: “these findings suggest that sufficient MT practice may protect against functional impairments associated with high-stress contexts.”

Page 11

Author’s Model of Mindfulness Effects

MT increases WMC, WMC increases PA, and both WMC & PA increase Job Performance.

[Path diagram: Mindfulness Training (MT) --(a)--> Working Memory Capacity (WMC); WMC --> Positive Affect (PA); WMC and PA --> Job Performance.]

Page 12

Mindfulness Effects are Mediated by Practice Time

[Path diagram: Mindfulness Training (MT) --(b)--> Mindfulness Practice Time --(c(obs))--> Working Memory Capacity (WMC); WMC --> Positive Affect (PA); WMC and PA --> Job Performance.]

a = b × c(obs)

Page 13

Mindfulness Effects: The observed effect of Practice Time on WMC may be spurious

[Path diagram. Time-period labels: Pre-MT, During-MT, Post-MT. Nodes: Trait Mindfulness, Pre-MT Working Memory, Pre-MT Positive Affect, Mindfulness Practice Time, Post-MT Working Memory, Post-MT Positive Affect, Job Performance. Path labels: x, y, c.]

Page 14

Trait Mindfulness Spuriously Increases c(obs)

[Path diagram: Mindfulness Training (MT; yes = 1, no = 0) --(b)--> MT Practice Time --(c)--> Working Memory Capacity (WMC); Trait Mindfulness --(x)--> MT Practice Time; Trait Mindfulness --(y)--> WMC.]

Observed correlation between MT Practice Time and WMC: c(obs) = c + xy

Since x and y are both positive, we know that c(obs) > c.

Observed r = direct effect + spurious effect
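The relation c(obs) = c + xy can be illustrated numerically. The sketch below assumes standardized variables and arbitrary path values (x = 0.5, y = 0.4, c = 0.2); none of these numbers come from the study, and the variable names are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical path coefficients, chosen only for illustration.
x, y, c = 0.5, 0.4, 0.2

# Trait mindfulness is the common cause (standardized).
trait = rng.normal(0, 1, n)
# Practice time depends on trait mindfulness (path x); the residual
# variance is chosen so that practice time also has unit variance.
practice = x * trait + rng.normal(0, np.sqrt(1 - x**2), n)
# WMC depends on practice time (direct path c) and on trait mindfulness (path y).
wmc = c * practice + y * trait + rng.normal(0, 1, n)

# Slope from regressing WMC on practice time alone, ignoring trait mindfulness.
c_obs = np.polyfit(practice, wmc, 1)[0]

print(f"direct effect c:        {c:.3f}")
print(f"predicted c + x*y:      {c + x * y:.3f}")
print(f"observed slope c(obs):  {c_obs:.3f}")
```

With these assumed values the observed slope is about twice the direct effect, even though nothing about the direct path changed.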

Page 15

Lots of variables may spuriously increase c(obs)

[Path diagram: MT Practice Time --(c)--> Working Memory Capacity (WMC). Four other variables (Trait Mindfulness, Pos Affect, IQ, and an unknown variable "??") each have a path to MT Practice Time (x1 to x4) and a path to WMC (y1 to y4).]

c(obs) = c + x1y1 + x2y2 + x3y3 + x4y4 + … + xnyn

There may be many unmeasured variables creating spurious effects, so c(obs) >>> c.

Observed r = direct effect + spurious effect

Page 16

If you randomize subjects to Practice Time, this sets all x’s to 0

[Path diagram: MT Practice Time --(c)--> Working Memory Capacity (WMC). Trait Mindfulness, Pos Affect, IQ, and "??" still have paths to WMC (y1 to y4), but their paths to MT Practice Time (the x’s) are gone.]

c(obs) = c + x1y1 + x2y2 + x3y3 + x4y4 + … + xnyn. This now becomes c(obs) = c + 0.

Observed r = direct effect
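A short simulation can show what randomization buys. It is a sketch only: the path values (x = 0.5, y = 0.4, c = 0.2) and the single confounder are assumptions made for illustration, not estimates from the study.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical path values: y = confounder -> WMC, c = practice time -> WMC.
y, c = 0.4, 0.2

trait = rng.normal(0, 1, n)  # unmeasured confounder (e.g. trait mindfulness)

# Observational setting: practice time is partly driven by the trait (x = 0.5).
self_selected = 0.5 * trait + rng.normal(0, np.sqrt(0.75), n)
# Randomized setting: practice time is assigned independently of the trait (x = 0).
randomized = rng.normal(0, 1, n)


def observed_slope(practice):
    """Simulate WMC and regress it on practice time alone."""
    wmc = c * practice + y * trait + rng.normal(0, 1, n)
    return np.polyfit(practice, wmc, 1)[0]


slope_self_selected = observed_slope(self_selected)
slope_randomized = observed_slope(randomized)

print(f"true direct effect c:        {c:.3f}")
print(f"c(obs), self-selected:       {slope_self_selected:.3f}")
print(f"c(obs), randomized practice: {slope_randomized:.3f}")
```

The trait still affects WMC in both settings (y is unchanged); only its link to practice time is cut, which is enough to make the observed slope match the direct effect.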

Page 17

Carotid Arterial Stent (CAS) vs. Surgical Repair (endarterectomy, CEA) for carotid stenosis

Study Design: Examine a large database to determine outcomes following treatment.

Conceptualization: Dr. Smith believes that if CAS works better than CEA, then patients who received CAS should live longer than those who received CEA.

Page 18

Results: 9-month death rates were 4% for CEA, 5% for CAS.

Correct Conclusion: CAS treatment is positively associated with death at 9 months post-treatment.

Incorrect Conclusion: CEA produces better outcomes than CAS.

Page 19

Lots of variables may spuriously increase c(obs)

[Path diagram: Tx (CAS = 1, CEA = 0) --(c)--> Death at 9 months. Other variables shown: Contralateral carotid occlusion, CHF, Recent MI, Unstable angina, Severe COPD, Age > 80. Labeled paths from these variables to Tx: x1 to x4; labeled paths from these variables to Death at 9 months: y1 to y4.]

c(obs) = c + x1y1 + x2y2 + x3y3 + x4y4 + … + xnyn

There may be many unmeasured variables creating spurious effects, so c(obs) >>> c.

Observed r = direct effect + spurious effect
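A worked example with made-up counts shows how case mix alone can produce a difference in crude death rates. All numbers below are hypothetical and chosen so that the treatments have identical death rates within each risk stratum; they are not from any database, and the two-level "risk" grouping stands in for covariates such as CHF or recent MI.

```python
# (risk stratum, treatment) -> (patients, deaths at 9 months). All hypothetical.
counts = {
    ("high risk", "CAS"): (800, 80),
    ("high risk", "CEA"): (200, 20),
    ("low risk", "CAS"): (200, 4),
    ("low risk", "CEA"): (800, 16),
}


def death_rate(treatment, stratum=None):
    """Death rate for a treatment, overall or within one risk stratum."""
    patients = deaths = 0
    for (s, t), (n, d) in counts.items():
        if t == treatment and (stratum is None or s == stratum):
            patients += n
            deaths += d
    return deaths / patients


for tx in ("CAS", "CEA"):
    print(
        f"{tx}: crude {death_rate(tx):.1%}, "
        f"high risk {death_rate(tx, 'high risk'):.1%}, "
        f"low risk {death_rate(tx, 'low risk'):.1%}"
    )
```

In this constructed example the crude rate is higher for CAS only because high-risk patients were more likely to receive CAS; within each stratum the two treatments are indistinguishable.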

Page 20

Does regression modeling solve this problem?

To some extent, but only if you identify all the possible covariates that have x & y effects, and you have reliable measures for each of these variables. In practice, this is usually difficult to do, and you will not know whether you’ve done it.

How about using a general comorbidity index as a covariate? For example, use the Elixhauser score instead of individual variables.

Page 21

Comorbidity indices

Elixhauser, A., Steiner, C., Harris, D. R., & Coffey, R. M. (1998). Comorbidity measures for use with administrative data. Med Care, 36, 8-27.

Goldstein, L. B., Samsa, G. P., Matchar, D. B., & Horner, R. D. (2004). Charlson Index comorbidity adjustment for ischemic stroke outcome studies. Stroke, 35, 1941-1945.

Dominick, K. L., Dudley, T. K., Coffman, C. J., & Bosworth, H. B. (2005). Comparison of three comorbidity measures for predicting health service use in patients with osteoarthritis. Arthritis Rheum, 53, 666-672.

These indices create a single score which is a sum of all the possible medical problems a patient could have: TB, infection, HIV, cancers, thyroid disorder, DM, MS, epilepsy, headache, hyperlipidemia, gout, anemia, psychiatric disorders, cataracts, dizziness, HTN, cardiac disorders, varicose veins, bronchitis, asthma, abdominal hernia, etc.

Page 22

• Useful to correct for case mix in administrative studies examining treatment outcomes across hospitals or regions.

• The long list of disorders creates noise that swamps the actual covariates of interest when patients are the unit of analysis.

• Use of Propensity Scores is a better option (but you may still have problems with unmeasured covariates, measures with poor reliability, and lack of group overlap).

Page 23

Problems in interpreting correlations

Page 24

Correlation & Regression

Subject   Height   Weight
1         66       125
2         68       150
3         70       160
4         72       195
5         73       180
6         74       175
7         76       200
8         77       205
Mean      72       173.75
SD        3.82     27.48

r = .933

[Scatter plot: Height x Weight. x-axis: Height (64 to 78); y-axis: Weight (90 to 230). Fitted line: f(x) = 6.7157x − 309.78, R² = 0.870]
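The correlation and regression line for the eight subjects in the table can be recomputed directly. The snippet below is a minimal sketch using NumPy; the variable names are arbitrary.

```python
import numpy as np

# Height and weight for the eight subjects in the table.
height = np.array([66, 68, 70, 72, 73, 74, 76, 77], dtype=float)
weight = np.array([125, 150, 160, 195, 180, 175, 200, 205], dtype=float)

# Pearson correlation coefficient.
r = np.corrcoef(height, weight)[0, 1]

# Least-squares line: weight = slope * height + intercept.
slope, intercept = np.polyfit(height, weight, 1)

# Sample standard deviations (n - 1 in the denominator).
sd_height = height.std(ddof=1)
sd_weight = weight.std(ddof=1)

print(f"mean height = {height.mean():.2f}, SD = {sd_height:.2f}")
print(f"mean weight = {weight.mean():.2f}, SD = {sd_weight:.2f}")
print(f"r = {r:.3f}, R^2 = {r**2:.3f}")
print(f"slope = {slope:.4f}, intercept = {intercept:.2f}")
```

The results agree with the values on the slide: r of about .933, R² of about 0.870, and a fitted line with slope 6.7157 and intercept −309.78.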

Page 25

Effect of Non-Linearity

[Scatter plot: Memory Test score (y-axis, 4 to 14) vs. Arousal level (x-axis, 0 to 10).]

Page 26

Effect of Non-Linearity

[Scatter plot with fitted straight line: Memory Test score (y-axis, 4 to 14) vs. Arousal level (x-axis, 0 to 10). R² = 0.0323]

Correlation is not a good statistic to use to measure non-linear relationships

r = .18
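The point can be reproduced with an idealized inverted-U relationship. The data below are hypothetical (a noise-free parabola peaking at an arousal level of 5), not the points plotted on the slide.

```python
import numpy as np

# Hypothetical data: test score rises with arousal up to 5, then falls.
arousal = np.arange(1, 10, dtype=float)   # 1, 2, ..., 9
score = 13 - 0.5 * (arousal - 5) ** 2     # perfect inverted U

# Pearson r only captures the linear component of the relationship.
r = np.corrcoef(arousal, score)[0, 1]

# A quadratic fit captures the curve that the straight line misses.
coeffs = np.polyfit(arousal, score, 2)
fitted = np.polyval(coeffs, arousal)
ss_res = np.sum((score - fitted) ** 2)
ss_tot = np.sum((score - score.mean()) ** 2)
r2_quadratic = 1 - ss_res / ss_tot

print(f"Pearson r = {r:.3f}")
print(f"R^2 of quadratic fit = {r2_quadratic:.3f}")
```

Here score is completely determined by arousal, yet the linear correlation is essentially zero because the rising and falling halves cancel.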

Page 27

Effect of Extreme Score

[Two scatter plots of Weight (y-axis, 100 to 220) vs. Height (x-axis, 65 to 80), each with a fitted line.]

Height x Weight: y = 6.7157x - 309.78, R² = 0.87, r = .933

Height x Weight with Outlier: y = 4.9329x - 176.87, R² = 0.5474, r = .740

Page 28

Outlier Effect

[Scatter plot: Test Performance (y-axis, 0 to 10) vs. Arousal (x-axis, 0 to 12).]

R² = 0.0086, r = .093

Page 29

Outlier Effect

[Scatter plot with y-axis from 0 to 10 and x-axis from 0 to 12; axis titles not shown.]

R² = 0.0563, r = -.237

Page 30

Effect of Subgroups

[Scatter plot: SBP (y-axis, 70 to 130) vs. med dose (x-axis, 0 to 70). Legend: Diagnosis A, Diagnosis B.]

Page 31

Effect of Subgroups

[Scatter plot, both diagnoses pooled: SBP (y-axis, 70 to 130) vs. med dose (x-axis, 0 to 70). Legend: Dx A, Dx B. R² = 0.0003]

[Subgroup panel with SBP from 84 to 96 and dose from 0 to 50: R² = 0.9668]

[Subgroup panel with SBP from 108 to 128 and dose from 0 to 70: R² = 0.9708]
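How a pooled correlation can vanish while each subgroup shows a strong relationship can be sketched with idealized data. Everything below is hypothetical: the doses, the SBP values, and the choice to give the two diagnoses opposite slopes are assumptions made for illustration, since the slide does not report the underlying points.

```python
import numpy as np

# Hypothetical doses, the same in both diagnostic groups.
dose = np.array([10, 20, 30, 40, 50], dtype=float)

# Assumed dose-response lines: SBP rises with dose in Dx A, falls in Dx B.
sbp_a = 88 + 0.15 * dose
sbp_b = 125 - 0.15 * dose


def r_squared(x, y):
    """Squared Pearson correlation between x and y."""
    return np.corrcoef(x, y)[0, 1] ** 2


r2_a = r_squared(dose, sbp_a)
r2_b = r_squared(dose, sbp_b)

# Pool both groups and ignore the diagnosis label.
pooled_dose = np.concatenate([dose, dose])
pooled_sbp = np.concatenate([sbp_a, sbp_b])
r2_pooled = r_squared(pooled_dose, pooled_sbp)

print(f"R^2 within Dx A: {r2_a:.3f}")
print(f"R^2 within Dx B: {r2_b:.3f}")
print(f"R^2 pooled:      {r2_pooled:.3f}")
```

Real data would be noisier than these exact lines, but the pattern is the same as on the slide: analyzing the groups separately reveals relationships that the pooled analysis hides.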