from last time…


Upload: derora

Post on 16-Mar-2016


Page 1: From last time…

From last time….

Page 2: From last time…

Basic Biostats Topics

• Summary Statistics
– mean, median, mode
– standard deviation, standard error

• Confidence Intervals

• Hypothesis Tests
– t-test (paired and unpaired)
– Chi-Square test
– Fisher’s exact test

Page 3: From last time…

More Advanced

• Linear Regression
• Logistic Regression
• Repeated Measures Analysis
• Survival Analysis
• Analyzing fMRI data

Page 4: From last time…

General Biostatistics References

• Practical Statistics for Medical Research. Altman. Chapman and Hall, 1991.
• Medical Statistics: A Common Sense Approach. Campbell and Machin. Wiley, 1993.
• Principles of Biostatistics. Pagano and Gauvreau. Duxbury Press, 1993.
• Fundamentals of Biostatistics. Rosner. Duxbury Press, 1993.

Page 5: From last time…

Lecture 3: Linear Regression

Elizabeth [email protected]

Child Psychiatry Research Methods Lecture Series

Page 6: From last time…

Introduction

• Simple linear regression is most useful for looking at associations between continuous variables.

• We can evaluate if two variables are associated linearly.

• We can evaluate how well we can predict one of the variables if we know the other.

Page 7: From last time…

Motivating Example (Tierney et al. 2001)

• Is there an association between total sterol level and ADI scores in autistic children?

• Hypothesis: Children with lower sterol levels will tend to have poorer performance (i.e. higher scores) on the following components of the ADI:
– social
– nonverbal
– repetitive

Page 8: From last time…

Preliminary Data

• 9 individuals with autism
• Some have been on cholesterol supplementation (7 out of 9)
• Mean age: 14
• Age range: 8 - 32 years
• Sterol is a continuous variable
• ADI scores are continuous variables

Page 9: From last time…

Statistical Language

• Need to choose which variable is the predicted (Y) and which is the predictor (X).
• Y: outcome, dependent variable, endogenous variable
• X: covariate, predictor, regressor, explanatory variable, exogenous variable, independent variable.
• Our example?

Page 10: From last time…

[Figure: three scatterplots of Social Score, Nonverbal Score, and Repetitive Score (y-axes) against Sterol Level (x-axis, 800 to 1400).]

How can we conclude if there is or is not an association between sterol and the ADI scores?

Page 11: From last time…

One approach: Correlation

• Correlation is a measure of LINEAR association between two variables.

• It takes values from -1 to 1.
• Often notated r or $\rho$

r = 1: perfect positive correlation
r = -1: perfect negative correlation
r = 0: no correlation
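To make the definition concrete, here is a minimal sketch of how r can be computed by hand. The function name `pearson_r` and the data are made up for illustration; they are not from the lecture or the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up data (an assumption for illustration, not the study data).
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2 * v + 1 for v in x])     # points on an exact rising line
r_down = pearson_r(x, [10 - 3 * v for v in x])  # points on an exact falling line
print(r_up, r_down)
```

Points that fall exactly on a rising or falling straight line give the two extreme values of r; real data land somewhere in between.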

Page 12: From last time…

[Figure: four example scatterplots of y against x (0 to 10), with r = 0.95, r = 0.09, r = 0.77, and r = -0.95.]

Page 13: From last time…

Correlation between ADI measures and Sterol

[Figure: three scatterplots of ADI scores against Sterol Level (x-axis, 800 to 1400). Social Score: r = -0.70. Nonverbal Score: r = -0.85. Repetitive Score: r = 0.06.]

Page 14: From last time…

Related to r: $R^2$

• $R^2$ = % of variation in Y explained by X.
• Example:
– Correlation between nonverbal score and sterol is -0.85.
– $R^2$ is $(-0.85)^2 = 0.73$ (computed from the unrounded r = -0.8541)
– 73% of the variation in nonverbal score is explained by sterol
• Gives a sense of the value of sterol in predicting nonverbal score
• Other examples
– $R^2$ between sterol and social is 0.49
– $R^2$ between sterol and repetitive is 0.004
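The arithmetic behind these numbers can be checked in a few lines. This sketch squares the correlations reported in these slides; the nonverbal value uses the unrounded -0.8541 from the correlation matrix later in the lecture, while the social and repetitive values are the rounded ones shown on the scatterplots. The dictionary names are illustrative only.

```python
# Correlations with sterol as reported in the slides
# (social and repetitive are rounded to two decimals).
r_values = {"social": -0.70, "nonverbal": -0.8541, "repetitive": 0.06}

# R-squared in simple linear regression is the square of r.
r_squared = {name: r ** 2 for name, r in r_values.items()}

for name, value in r_squared.items():
    print(f"{name}: r = {r_values[name]:+.2f}, R^2 = {value:.3f}")
```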

Page 15: From last time…

Simple Linear Regression (SLR) Approach

(1) Fits “best” line to describe the association between Y and X (note: straight line)

(2) Line can be described by two numbers
- intercept
- slope

(3) By-product of regression: correlation measures how close the points fall to the line.

(4) Why “simple”? Only one X variable.
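As a rough illustration of what fitting the "best" line means, the sketch below uses the usual least-squares formulas for the slope and intercept. The function name `fit_line` and the data are hypothetical; the sterol and nonverbal values are invented to resemble the plotted ranges and are NOT the Tierney et al. data.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for the line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical sterol levels and nonverbal scores (NOT the study data).
sterol = [800, 900, 1000, 1100, 1200, 1300, 1400]
nonverbal = [17, 16, 14, 15, 12, 12, 10]

b0, b1 = fit_line(sterol, nonverbal)
print(f"intercept = {b0:.2f}, slope = {b1:.4f}")
```

In practice a statistics package does this step and also reports standard errors; the hand-written version is only meant to show that the line is summarized by two numbers.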

Page 16: From last time…

Intercept = 24.8

Slope = -0.01

[Figure: scatterplot of Nonverbal Score against Sterol Level (x-axis, 800 to 1400) with the fitted regression line.]

Page 17: From last time…

SLR answers two questions….

• Association?
– Does nonverbal score tend to decrease on average when sterol increases?
– Is slope different than zero?
• Prediction?
– Can we predict nonverbal score if we know sterol level?
– Is the correlation (or $R^2$) high?
• You CAN have association with low correlation!

Page 18: From last time…

Equation of a line: $\mathrm{nonverbal} = \beta_0 + \beta_1 \cdot \mathrm{sterol}$

• $\beta_0$: Intercept
– $\beta_0$ is the estimated nonverbal score if it were possible to have a sterol level of 0 (nonsensical in this case).
– $\beta_0$ calibrates height of line
• $\beta_1$: Slope
– $\beta_1$ is the estimated change in nonverbal score for a one unit change in sterol
– $\beta_1$ is the estimated difference in nonverbal score comparing two kids whose sterol levels differ by one.
– We usually use $\beta_1$ as our measure of association

Page 19: From last time…

The slope, $\beta_1$

Is $\beta_1$ different than zero?

Is each of these reasonable given the data that we have observed?

[Figure: three panels of the Nonverbal Score against Sterol Level scatterplot (x-axis, 800 to 1400).]

Page 20: From last time…

Evaluating Association

• $\beta_1$ is a “statistic,” similar to a sample mean, and as such has a precision estimate.
• The precision estimate is called the standard error of $\beta_1$. Denoted se($\beta_1$).
• We look at how large $\beta_1$ is compared to its standard error
• $\beta_1$ is often called a “regression coefficient” or a “slope.”

Page 21: From last time…

General Rule

• If $|\beta_1| > 2\,\mathrm{se}(\beta_1)$, then we say that $\beta_1$ is statistically significantly different than zero.
• T-test interpretation:
H0: $\beta_1 = 0$
Ha: $\beta_1 \neq 0$
• If $|\beta_1| > 2\,\mathrm{se}(\beta_1)$ is true, then p-value less than 0.05.
• Intuition: $\beta_1$ is large compared to its precision, so it is not likely that $\beta_1$ is 0.

Page 22: From last time…

For large samples….

$\beta_1/\mathrm{se}(\beta_1)$    pvalue
0.50    0.62
1.00    0.32
1.50    0.13
1.65    0.10
1.96    0.050
2.00    0.046
2.25    0.024
2.58    0.010
2.75    0.006
3.00    0.003
3.25    0.001
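For large samples these p-values come from the standard normal distribution, so they can be reproduced with the standard library alone. This is a minimal sketch; the helper name `two_sided_p` is made up for illustration, and the printed values may differ from the table in the last digit because of rounding.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the normal CDF Phi.
    return math.erfc(abs(z) / math.sqrt(2))

for z in [0.50, 1.00, 1.50, 1.65, 1.96, 2.00, 2.25, 2.58, 2.75, 3.00, 3.25]:
    print(f"{z:4.2f}  {two_sided_p(z):.3f}")
```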

Page 23: From last time…

ADI Nonverbal and Sterol

------------------------------------------------------------------------------
  nonvrb |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 totster |  -.0099066   .0022804     -4.344   0.003     -.0152988   -.0045144
   _cons |   24.84349   2.578369      9.635   0.000      18.74661    30.94036
------------------------------------------------------------------------------

R-squared = 0.73

Reading the output:
• Outcome: nonvrb
• Predictor: totster
• $\beta_1$: Coef. for totster
• $\beta_0$: Coef. for _cons
• se($\beta_1$): Std. Err. for totster
• $\beta_1/\mathrm{se}(\beta_1)$: t for totster
• pvalue: P>|t| for totster
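The columns of this output are tied together by simple arithmetic, which the sketch below checks using only the numbers printed for totster. The variable names are illustrative. The last line backs out how many standard errors the reported interval extends on each side of the coefficient.

```python
# Values copied from the regression output row for totster.
coef = -0.0099066
se = 0.0022804
ci_low, ci_high = -0.0152988, -0.0045144

t_stat = coef / se                            # should match the "t" column
multiplier = (ci_high - ci_low) / (2 * se)    # CI half-width in standard errors

print(f"t = {t_stat:.3f}")
print(f"CI half-width = {multiplier:.2f} standard errors")
```

The half-width works out to somewhat more than 2 standard errors, which is what one would expect from a t-based interval with only nine observations rather than the large-sample rule of thumb.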

Page 24: From last time…

Interpretation

“Comparing two autistic kids whose sterol levels differ by 1, we estimate that the one with lower sterol will have an ADI nonverbal score that is higher by 0.01 points.”

Put it in “real” units:

“Comparing two autistic kids whose sterol levels differ by 200, we estimate that the child with the lower sterol level will have an ADI nonverbal score that is higher by 2 points.”

(Note: 200 x 0.01 = 2.0)
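The same comparison can be made with the fitted line directly. This sketch plugs two sterol levels into the reported equation; the levels 1000 and 1200 are arbitrary illustrative values inside the plotted range, and the function name is hypothetical.

```python
# Coefficients from the nonverbal-on-sterol regression output.
INTERCEPT = 24.84349
SLOPE = -0.0099066

def predicted_nonverbal(sterol):
    """Fitted ADI nonverbal score at a given total sterol level."""
    return INTERCEPT + SLOPE * sterol

# Two illustrative sterol levels that differ by 200 (assumed, not study values).
low_sterol = predicted_nonverbal(1000)
high_sterol = predicted_nonverbal(1200)
print(f"sterol 1000: {low_sterol:.2f}")
print(f"sterol 1200: {high_sterol:.2f}")
print(f"difference:  {low_sterol - high_sterol:.2f}")
```

The difference depends only on the slope and the 200-unit gap, not on which two sterol levels are chosen.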

Page 25: From last time…

A few other details...

• 95% Confidence interval interpretation: $\beta_1 \pm 2\,\mathrm{se}(\beta_1)$ does not include zero.
• $\beta_1/\mathrm{se}(\beta_1)$ is called the
– “t-statistic”
– “Z-statistic”
• If you have small sample (i.e. fewer than 50 individuals), need to use a “t-correction.”

N     $\beta_1/\mathrm{se}(\beta_1)$    pvalue
10    2.31    0.05
15    2.16    0.05
20    2.10    0.05
30    2.05    0.05
40    2.02    0.05
50    2.01    0.05

Page 26: From last time…

Relationship between correlation and SLR

Testing that correlation is equal to zero is equivalent to testing that the slope is equal to zero.

Can have strong association and low correlation

[Figure: two scatterplots of y against x (0 to 10). One panel: r = 0.93, $\beta_1$ = 1.86, pvalue < 0.001. Other panel: r = 0.55, $\beta_1$ = 1.88, pvalue < 0.001.]

Page 27: From last time…

Additional Points

(1) Association measured is LINEAR

[Figure: scatterplot of y (0 to 25) against x (-4 to 4); r = 0.02.]
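A small made-up example shows why this matters: y can be completely determined by x and still have a correlation of essentially zero when the relationship is not a straight line. The data below are invented (y is exactly x squared) and the helper `pearson_r` is an illustrative hand-written function, not something from the lecture.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented data: y depends perfectly on x, but along a curve, not a line.
x = [-4, -3, -2, -1, 0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

A near-zero r therefore rules out a linear trend, not every kind of association, which is one reason to look at the scatterplot first.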

Page 28: From last time…

Additional Points

(2) Difference (i.e. distance) between observed data and fitted line is called a residual, $\epsilon$.

Residuals by observation:
1. 0.74
2. -0.95
3. -2.53
4. 3.01
5. 2.52
6. 0.45
7. -3.15
8. -0.07
9. 0.59

[Figure: Nonverbal Score against Sterol Level scatterplot (x-axis, 800 to 1400) with the fitted line; observations 3 and 5 are labeled.]
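As a sketch of how a residual is obtained, the code below subtracts the fitted value from an observed value using the coefficients reported earlier. The two children here are hypothetical, chosen only so that one point sits above the line and one below; they are not observations from the study.

```python
# Coefficients from the nonverbal-on-sterol regression output.
INTERCEPT = 24.84349
SLOPE = -0.0099066

def residual(sterol, observed_nonverbal):
    """Observed nonverbal score minus the score predicted by the fitted line."""
    fitted = INTERCEPT + SLOPE * sterol
    return observed_nonverbal - fitted

# Hypothetical (sterol, nonverbal) pairs, assumed for illustration only.
above_line = residual(1000, 16)   # observed score higher than the line predicts
below_line = residual(1200, 12)   # observed score lower than the line predicts
print(round(above_line, 2), round(below_line, 2))
```

A positive residual means the point lies above the fitted line and a negative one means it lies below.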

Page 29: From last time…

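Additional Points

(3) Often see model equation as

$\mathrm{nonverbal}_i = \beta_0 + \beta_1 \cdot \mathrm{sterol}_i + \epsilon_i; \quad i = 1, \ldots, N$

Generically,

$y_i = \beta_0 + \beta_1 x_i + \epsilon_i; \quad i = 1, \ldots, N$

These two equations refer to the observed data.

$\mathrm{nonverbal} = \beta_0 + \beta_1 \cdot \mathrm{sterol}$

This equation refers to the regression line.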

Page 30: From last time…

Additional Points

(4) Spread of points around line is assumed to be constant (i.e. variance of residuals is constant)

[Figure: scatterplot of y against x (0 to 10) in which the spread of points around the line is not constant, labeled “BAD!”]

Page 31: From last time…

Multiple Linear Regression

• More than one X variable
• Generally the same, except
– Can’t make plots in multi-dimensions
– Interpretation of $\beta$’s is somewhat different

$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \epsilon_i$

Page 32: From last time…

Other ADI and Sterol SLRs

• How is age when supplementation began related to sterol?

• How is age when supplementation began related to nonverbal score?

Page 33: From last time…

[Figure: scatterplot matrix of nonvrb, totster, and agester.]

Page 34: From last time…

How might this change our previous result?

[Diagram: Sterol and Nonverbal Score, with Supplementation Age as a third variable.]

• What if age when cholesterol supplementation began is associated with both sterol level and nonverbal score?

• Is it correct to conclude that total sterol level is associated with nonverbal score?

Page 35: From last time…

We can “adjust”!

------------------------------------------------------------------------------
  nonvrb |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  sterol |  -.0105816   .0022118     -4.784   0.003     -.0159937   -.0051696
 agester |   .1570626   .1158509      1.356   0.224     -.1264143    .4405394
   _cons |   23.81569   2.551853      9.333   0.000      17.57153    30.05985
------------------------------------------------------------------------------

$\mathrm{nonverbal}_i = \beta_0 + \beta_1 \cdot \mathrm{sterol}_i + \beta_2 \cdot \mathrm{agester}_i + \epsilon_i$
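To show what fitting a model with two predictors involves, here is a minimal sketch that solves the least-squares equations for two X variables by hand. Everything in it is an assumption for illustration: the function name, the data, and the "true" coefficients (24, -0.01, 0.15) used to generate the outcome exactly so the result is easy to check. It is not the study data and not how Stata computes its output.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares b0, b1, b2 for y = b0 + b1*x1 + b2*x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]          # centered predictors and outcome
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2         # zero if x1 and x2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

# Hypothetical sterol levels and ages at supplementation (NOT the study data).
sterol = [800, 950, 1000, 1150, 1300, 1400]
agester = [3, 10, 5, 20, 8, 15]
# Outcome generated exactly from assumed coefficients, with no noise.
nonverbal = [24 - 0.01 * s + 0.15 * a for s, a in zip(sterol, agester)]

b0, b1, b2 = fit_two_predictors(sterol, agester, nonverbal)
print(f"b0 = {b0:.3f}, b1 = {b1:.4f}, b2 = {b2:.3f}")
```

Each slope is estimated while accounting for the other predictor, which is what "adjusting" means here. The `det` term also hints at the collinearity issue: as the two predictors become more strongly correlated it shrinks toward zero and the estimates become unstable.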

Page 36: From last time…

Interpretation of Betas

• Now that we have “adjusted” for age at supplementation, we need to include that in our result:

“Comparing two kids who began cholesterol supplementation at the same age and whose sterol levels differ by 250 units, we estimate that the child with the lower sterol level will have an ADI nonverbal score higher by about 2.6 points.”

(Note: 250 x 0.01058 = 2.6)

“Adjusting for age at supplementation, comparing two kids whose sterol levels differ by 250 units, we estimate…”

“Controlling for age at supplementation …..”

“Holding age at supplementation constant…..”

Page 37: From last time…

Collinearity

• If two variables are
– correlated with each other
– correlated with the outcome

• Then, when combined in a MLR model, it could happen that
– neither is significant
– only one is significant
– both remain significant

Page 38: From last time…

ADI and Sterol

We say that age at cholesterol supplementation and sterol are “collinear.”

Correlation Matrix

         |   nonvrb   sterol  agester
---------+---------------------------
  nonvrb |   1.0000
  sterol |  -0.8541   1.0000
 agester |   0.0531   0.2251   1.0000

Page 39: From last time…

Summing up example….

• After adjusting for age at supplementation, it appears that sterol is still a significant predictor of ADI nonverbal score.

• BUT!
– Only NINE observations! Estimates based on so few children are imprecise, and the associations could look quite different with more data.
– We haven’t controlled for other potential confounders:
  • length of time on supplementation
  • nonverbal score prior to supplementation