7 Regression & Correlation: Rates Basic Medical Statistics Course October 2010 W. Heemsbergen


7 Regression & Correlation: Rates

Basic Medical Statistics Course, October 2010, W. Heemsbergen


Event rate

Event rate: rate at which the event occurs per subject per period of time.

Rate = (number of events occurring) / (cumulative units of time*)

* Clinical research: person-years (total number of years of follow-up for all individuals)

[Figure: diagram with labels 1 to 6 and events marked X]

Time should only be counted when information about possible events is available and the subject is at risk.

A subject may contribute one event (e.g. onset of cancer) or several events (e.g. nosebleeds).


Event rate

Incidence rate: no. of new cases per time period

Mortality rate: no. of deaths per time period

A small rate can be re-expressed as, for instance, the rate per 1000 person-years.

Subject  Follow-up  Event (+ yes, - no)
1        10 years   +
2        6 years    -
3        5 years    -
4        1 year     +

2 events / 22 person-years: event rate = 0.09 per person-year (about 91 per 1000 person-years).

If we are only interested in first events (e.g. diagnosis of breast cancer), follow-up must stop at the time of the (first) event.
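The four-subject calculation can be reproduced with a few lines of Python. This is a minimal sketch: the function name `event_rate` and the tuple layout are choices made here for illustration, not part of the course material.

```python
# Follow-up data from the four-subject example:
# (years of follow-up, number of events observed).
follow_up = [(10, 1), (6, 0), (5, 0), (1, 1)]


def event_rate(records):
    """Return the number of events per unit of person-time."""
    total_events = sum(events for _, events in records)
    total_time = sum(years for years, _ in records)
    return total_events / total_time


rate = event_rate(follow_up)   # 2 events / 22 person-years
rate_per_1000 = 1000 * rate    # re-expressed per 1000 person-years
print(f"{rate:.3f} per person-year, {rate_per_1000:.1f} per 1000 person-years")
```

For first events only, each subject's follow-up time would be truncated at the first event before summing.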


Relative rate

Relative rate (rate ratio, incidence rate ratio) = Rate exposed / Rate unexposed

A relative rate equal to 1 indicates that the rate is the same in the two groups.

A relative rate > 1 indicates that the rate is higher in the exposed group.

A relative rate < 1 indicates that the rate is lower in the exposed group.

In most situations in cancer research, a relative rate is interpreted in the same way as a relative risk or an Odds Ratio.
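As a small illustration of the ratio, the sketch below computes a relative rate from event counts and person-years. The counts are hypothetical and chosen only to make the arithmetic easy to follow; the function names are likewise illustrative.

```python
def rate(events, person_years):
    """Events per person-year."""
    return events / person_years


def relative_rate(exposed, unexposed):
    """Rate ratio; each argument is an (events, person_years) tuple."""
    return rate(*exposed) / rate(*unexposed)


# Hypothetical counts, for illustration only.
exposed = (30, 1000.0)     # 30 events in 1000 person-years
unexposed = (15, 1500.0)   # 15 events in 1500 person-years

rr = relative_rate(exposed, unexposed)
print(f"relative rate = {rr:.2f}")  # > 1: the rate is higher in the exposed group
```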


Standardization

A comparison between 2 rates can be misleading or inadequate.

Comparing the (crude) mortality rate (number of deaths per 1000 person-years) between 2 countries is misleading when country A has a relatively young population and country B a relatively old population (e.g. European vs. African country).

Solution 1: age specific death rates (a calculated rate for each age category)

Solution 2: standardization of the mortality rate, using a standard population.

Solution 3: recalculate (adjust) rate of population A, using the age structure of population B.

Standardized mortality/death rate: a standard population is introduced with a fixed age structure. Then the mortality of any population is adjusted for discrepancies in age structure between standard and the specific population.

Factors often used in standardization: calendar year, age, gender, ethnicity.
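A minimal sketch of standardization with a standard population (Solution 2) is given below. All numbers are hypothetical: the two populations are constructed with identical age-specific death rates but different age structures, so the crude rates differ while the standardized rates coincide. The age bands and function names are assumptions made for this illustration.

```python
# Hypothetical data, for illustration only.
# Each entry: age band -> (deaths, person-years).
population_a = {"0-39": (20, 40_000), "40-64": (60, 15_000), "65+": (150, 5_000)}
population_b = {"0-39": (5, 10_000), "40-64": (80, 20_000), "65+": (600, 20_000)}

# Standard population with a fixed age structure (size per age band).
standard = {"0-39": 50_000, "40-64": 30_000, "65+": 20_000}


def crude_rate(population):
    """Deaths per 1000 person-years, ignoring age."""
    deaths = sum(d for d, _ in population.values())
    person_years = sum(t for _, t in population.values())
    return 1000 * deaths / person_years


def standardized_rate(population, standard_population):
    """Weight each age-specific rate by the standard age structure."""
    total = sum(standard_population.values())
    weighted = sum(
        (deaths / person_years) * standard_population[age]
        for age, (deaths, person_years) in population.items()
    )
    return 1000 * weighted / total


for name, pop in (("A", population_a), ("B", population_b)):
    print(
        f"{name}: crude {crude_rate(pop):.2f}, "
        f"standardized {standardized_rate(pop, standard):.2f} per 1000 person-years"
    )
```

In this constructed example the older population B has a much higher crude rate, yet both standardized rates are equal because the age-specific rates are the same.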


Rate vs. Risk

Rate: Total no. events / person-years of follow-up.

Risk: Total no. events / no. of individuals exposed (a probability between 0 and 1, for first events).

Risk is calculated for a certain interval of time, and may differ for longer or shorter intervals.

When follow-up differs from person to person, rates are preferred.
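The difference between the two measures can be made concrete with two hypothetical cohorts that have the same number of individuals and first events but different amounts of follow-up. The numbers below are invented for illustration.

```python
def risk(n_events, n_individuals):
    """Proportion of individuals with a (first) event."""
    return n_events / n_individuals


def rate(n_events, person_years):
    """Events per person-year of follow-up."""
    return n_events / person_years


# Hypothetical cohorts: same size and same number of first events,
# but very different total follow-up.
short_follow_up = {"events": 10, "individuals": 100, "person_years": 190.0}
long_follow_up = {"events": 10, "individuals": 100, "person_years": 950.0}

for label, c in (("short", short_follow_up), ("long", long_follow_up)):
    print(
        f"{label}: risk = {risk(c['events'], c['individuals']):.2f}, "
        f"rate = {1000 * rate(c['events'], c['person_years']):.1f} per 1000 person-years"
    )
```

Both cohorts show a risk of 0.10, but that figure refers to different lengths of observation; the rate takes the follow-up time into account.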


Example

Immediate risk of suicide and cardiovascular death after a prostate cancer diagnosis.

BACKGROUND: Receiving a cancer diagnosis is a stressful event that may increase risks of suicide and cardiovascular death, especially soon after diagnosis.

METHODS: We conducted a cohort study of 342,497 patients diagnosed with prostate cancer from January 1, 1979, through December 31, 2004, in the Surveillance, Epidemiology, and End Results Program. Follow-up started from the date of prostate cancer diagnosis to the end of first 12 calendar months after diagnosis. The relative risks of suicide and cardiovascular death were calculated as standardized mortality ratios (SMRs) comparing corresponding incidences among prostate cancer patients with those of the general US male population, with adjustment for age, calendar period, and state of residence. We compared risks in the first year and months after a prostate cancer diagnosis. The analyses were further stratified by calendar period at diagnosis, tumor characteristics, and other variables.

J Natl Cancer Inst. 2010;102:307-14


Example

RESULTS: During follow-up, 148 men died of suicide (mortality rate = 0.5 per 1000 person-years) and 6845 died of cardiovascular diseases (mortality rate = 21.8 per 1000 person-years).

Patients with prostate cancer were at increased risk of suicide during the first year (SMR = 1.4, 95% confidence interval [CI] = 1.2 to 1.6), especially during the first 3 months (SMR = 1.9, 95% CI = 1.4 to 2.6), after diagnosis. The elevated risk was apparent in pre-prostate-specific antigen (PSA) (1979-1986) and peri-PSA (1987-1992) eras but not since PSA testing has been widespread (1993-2004).

The risk of cardiovascular death was slightly elevated during the first year (SMR = 1.09, 95% CI = 1.06 to 1.12), with the highest risk in the first month (SMR = 2.05, 95% CI = 1.89 to 2.22), after diagnosis. The first-month risk was statistically significantly elevated during the entire study period.

CONCLUSION: A diagnosis of prostate cancer may increase the immediate risks of suicide and cardiovascular death.


Question

SMR (standardized mortality ratio) = 1.09 (cardiovascular death)

A group of 15,000 men is diagnosed with prostate cancer.

Based on statistics of the general male population, the baseline risk of cardiovascular death (without a prostate cancer diagnosis) is 0.8% for the coming year.

How many men in this group are expected to die from cardiovascular disease in the coming year?
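One way to work through the question, assuming the SMR acts as a multiplier on the number of deaths expected from the baseline risk (an interpretation made here, not stated on the slide):

```python
smr = 1.09              # standardized mortality ratio, first year after diagnosis
group_size = 15_000     # men diagnosed with prostate cancer
baseline_risk = 0.008   # 0.8% one-year risk in the general male population

# Deaths expected if the group behaved like the general population.
expected_baseline = group_size * baseline_risk
# Assumption: the SMR scales the expected number of deaths.
expected_with_diagnosis = expected_baseline * smr
excess_deaths = expected_with_diagnosis - expected_baseline

print(f"expected without diagnosis: {expected_baseline:.1f}")
print(f"expected with diagnosis:    {expected_with_diagnosis:.1f}")
print(f"excess deaths:              {excess_deaths:.1f}")
```

Under this reading, about 131 cardiovascular deaths are expected, roughly 11 more than the 120 expected from the baseline risk alone.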


7 Regression & Correlation: Logistic regression

Basic Medical Statistics Course, October 2010, W. Heemsbergen


(Binary) Logistic Regression

We have collected data on N individuals.

We are interested in disease A, which is present in part of the subjects:

• which (risk) factors are predictive of / associated with the disease?

• what is the probability that a subject with a certain risk profile has the disease or will develop the disease?

Example: The development of mucositis of the lower alimentary tract after chemotherapy in cancer patients.

- What are the risk factors predictive for mucositis after chemotherapy?

- What is the probability of developing mucositis after chemotherapy, given an individual risk profile?

- Potential risk factors: age, weight, renal function, type and duration of chemotherapy, …


(Binary) Logistic Regression

Logistic Regression is similar to Linear Regression. It is used when the outcome of interest (the dependent variable) is binary rather than continuous (e.g. cancer yes/no).

A patient with a certain risk profile (the independent factors) has a probability of developing the outcome: risk factor 1, risk factor 2, … (the covariates) result in a probability (between 0 and 1). The outcome itself, however, is always either present (1) or not present (0).

probability(D=1|z) = e^z / (1 + e^z), where e^z = exp(z)

with a set of covariate values x1 .. xk and regression coefficients b1 .. bk:

z = a + b1x1 + b2x2 + … + bkxk
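The model formula translates directly into code. The sketch below is illustrative only: the function names, the intercept and coefficients, and the two covariates in the usage example are hypothetical, not estimates from any data set in this course.

```python
import math


def linear_predictor(intercept, coefficients, covariates):
    """z = a + b1*x1 + ... + bk*xk."""
    if len(coefficients) != len(covariates):
        raise ValueError("need one coefficient per covariate")
    return intercept + sum(b * x for b, x in zip(coefficients, covariates))


def probability(z):
    """exp(z) / (1 + exp(z)), evaluated so that large |z| does not overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical model: intercept a and coefficients b1, b2 are made up.
a = -2.0
b = [0.05, 0.8]
x = [40.0, 1.0]   # e.g. a continuous covariate and a 0/1 risk factor

z = linear_predictor(a, b, x)
p = probability(z)
print(f"z = {z:.2f}, probability(D=1) = {p:.2f}")
```

Whatever the value of z, the resulting probability stays strictly between 0 and 1, which is what makes this form suitable for a binary outcome.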


Example

patnr  mean lung dose (Gy)  Radiation Pneumonitis
1      20.1                 1
2      23.6                 1
3      7.7                  1
4      10.5                 0
5      6.0                  0
6      26.0                 1
7      14.8                 0
8      18.2                 1
9      22.7                 1
10     17.1                 0
11     24.0                 1


Linear Regression: PRED = -0.135 + 0.045 * MLD

Logistic Regression: PROB(D=1) = exp(-3.4 + 0.24 * MLD) / (1 + exp(-3.4 + 0.24 * MLD))

Exp(B) is the Odds Ratio for a one-unit increase in the covariate (Odds = P/(1-P)).

[Figure legend: Logistic Regression, Linear Regression]
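To compare the two fitted equations, the sketch below evaluates them at the mean lung doses (MLD) from the pneumonitis example. The coefficients are the ones quoted on the slide; the code only evaluates the equations and does not refit the models, and the function names are chosen here for illustration.

```python
import math

# (mean lung dose in Gy, radiation pneumonitis 1/0) from the example table.
data = [
    (20.1, 1), (23.6, 1), (7.7, 1), (10.5, 0), (6.0, 0), (26.0, 1),
    (14.8, 0), (18.2, 1), (22.7, 1), (17.1, 0), (24.0, 1),
]


def linear_pred(mld):
    """Linear Regression: PRED = -0.135 + 0.045 * MLD."""
    return -0.135 + 0.045 * mld


def logistic_prob(mld):
    """Logistic Regression: exp(z) / (1 + exp(z)) with z = -3.4 + 0.24 * MLD."""
    z = -3.4 + 0.24 * mld
    return math.exp(z) / (1.0 + math.exp(z))


def odds(p):
    return p / (1.0 - p)


# Exp(B): odds ratio for a 1 Gy increase in mean lung dose.
odds_ratio_per_gy = math.exp(0.24)

for mld, outcome in sorted(data):
    print(f"MLD {mld:5.1f}  observed {outcome}  "
          f"linear {linear_pred(mld):6.3f}  logistic {logistic_prob(mld):5.3f}")
print(f"Exp(B) = {odds_ratio_per_gy:.2f}")
```

The linear prediction exceeds 1 at the highest dose (26.0 Gy) and becomes negative for doses close to 0 Gy, whereas the logistic probability always stays between 0 and 1.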


What is an Odds (Ratio) ?

Are obese patients more at risk to develop diabetes? What is the Odds Ratio (OR)?

Obese  Diabetes
yes    yes
yes    yes
yes    yes
yes    no
no     no
no     no
no     no
no     yes
..     ..

              obese: y   obese: n
diabetes: y      30         10
diabetes: n      10         30

Odds = p/(1-p) = proportion with disease / (1 - proportion with disease)

Odds(obese) = 0.75/0.25 = 3

Odds(not obese) = 0.25/0.75 = 0.33

OR = 0.33 / 3 = 0.11

or OR = 3 / 0.33 = 9 (preferred: ratio exposed/unexposed)
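The odds and Odds Ratio for the obesity/diabetes table (30 and 10 with diabetes, 10 and 30 without) can be checked with a short calculation. The function and variable names are illustrative choices.

```python
def odds(p):
    """Odds = p / (1 - p)."""
    return p / (1.0 - p)


# 2x2 table from the example: counts by obesity and diabetes status.
obese_diabetes, obese_no_diabetes = 30, 10
not_obese_diabetes, not_obese_no_diabetes = 10, 30

# Proportion with diabetes in each exposure group.
p_obese = obese_diabetes / (obese_diabetes + obese_no_diabetes)
p_not_obese = not_obese_diabetes / (not_obese_diabetes + not_obese_no_diabetes)

odds_obese = odds(p_obese)
odds_not_obese = odds(p_not_obese)

# Ratio exposed / unexposed (the preferred direction), and its reciprocal.
or_exposed_vs_unexposed = odds_obese / odds_not_obese
or_unexposed_vs_exposed = odds_not_obese / odds_obese

print(f"Odds obese = {odds_obese:.2f}, Odds not obese = {odds_not_obese:.2f}")
print(f"OR = {or_exposed_vs_unexposed:.2f} (reciprocal {or_unexposed_vs_exposed:.2f})")
```

The two Odds Ratios describe the same association; they are reciprocals of each other and differ only in which group is taken as the reference.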


Variable types

The potential predictive factors of interest can be continuous, categorical, ordinal, or binary. How should these different types be handled in Logistic Regression?

In the Logistic Regression procedure, categorical variables have to be indicated as categorical, and a reference category has to be chosen. For each other category, a regression coefficient is then calculated relative to this reference category. It is advisable to use the largest category as the reference.

If a variable is not indicated as categorical, it is treated as “continuous”: a single regression coefficient is estimated, which applies to each one-unit increase. A normal distribution is not a prerequisite.

An ordinal variable can be entered in the model as a continuous variable. One should, however, always be aware of the underlying assumptions of the model.

For a binary predictive variable, it is not necessary to choose. However, the “reference” will be the lowest value when the variable is entered as continuous, and possibly the highest value when it is indicated as categorical (depending on the chosen reference category).


Example: categorical / continuous

Obese (1 yes, 2 no or 0 no)   Diabetes (1 present, 0 not)
1                             1
1                             1
1                             1
1                             0
2 (0)                         0
2 (0)                         0
2 (0)                         0
2 (0)                         1

              obese: y   obese: n
diabetes: y       3          1
diabetes: n       1          3

Odds(obese) = 0.75/0.25 = 3

Odds(not obese) = 0.25/0.75 = 0.33

OR = 0.33 / 3 = 0.11

or OR = 3 / 0.33 = 9 (preferred)


Example: categorical / continuous

Obese, 1=yes, 2=no, continuous

Obese, 1=yes, 0=no, continuous

Obese, 1=yes, 2=no, category (reference=last)

risk factors present/not present: code 1 and 0, continuous var.
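The effect of the three codings on Exp(B) can be sketched without a fitting routine. The code below relies on a shortcut that holds only for a single predictor with two values: the fitted probabilities then equal the observed proportions, so B is the difference in log odds divided by the difference in codes. The function names are hypothetical, and the numbers are computed here from the eight-patient example, not copied from the original output tables.

```python
import math

# Eight-patient example: (obese code, diabetes 1/0), obese coded 1 = yes, 2 = no.
coded_1_2 = [(1, 1), (1, 1), (1, 1), (1, 0), (2, 0), (2, 0), (2, 0), (2, 1)]
# Same data with obese recoded as 1 = yes, 0 = no.
coded_1_0 = [(1 if x == 1 else 0, y) for x, y in coded_1_2]


def logit(p):
    return math.log(p / (1.0 - p))


def proportion(rows, code):
    """Proportion with diabetes among rows with the given obesity code."""
    outcomes = [y for x, y in rows if x == code]
    return sum(outcomes) / len(outcomes)


def exp_b_continuous(rows):
    """Exp(B) per one-unit increase when the two-valued predictor is continuous."""
    low, high = sorted({x for x, _ in rows})
    slope = (logit(proportion(rows, high)) - logit(proportion(rows, low))) / (high - low)
    return math.exp(slope)


def exp_b_categorical(rows, reference):
    """Exp(B) for the non-reference category versus the reference category."""
    (other,) = {x for x, _ in rows} - {reference}
    return math.exp(logit(proportion(rows, other)) - logit(proportion(rows, reference)))


print(f"1=yes, 2=no, continuous:       Exp(B) = {exp_b_continuous(coded_1_2):.2f}")
print(f"1=yes, 0=no, continuous:       Exp(B) = {exp_b_continuous(coded_1_0):.2f}")
print(f"1=yes, 2=no, category (ref=2): Exp(B) = {exp_b_categorical(coded_1_2, 2):.2f}")
```

With the 1/2 coding entered as continuous, a one-unit increase means going from obese to not obese, which gives the reciprocal Odds Ratio; the 1/0 coding and the categorical coding with the last category as reference both give the ratio of obese to not obese.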


Example: rectal bleeding

Int J Radiat Oncol Biol Phys 2004; 59: 1343.

Dosimetric factors predictive for moderate/severe rectal bleeding, after RT for prostate cancer.


Example: esophagus toxicity

Radiat Oncol 2005; 75: 157.

To correlate acute esophageal toxicity with dosimetric and clinical parameters for patients treated with radiotherapy (RT) alone or with chemo-radiotherapy (CRT).

probability(D=1) = exp(z) / (1 + exp(z)),

can be rewritten as:

probability(D=1) = 1 / (1 + exp(-z))

[Figure: axis label "volume of esophagus"]
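The rewriting step follows from multiplying numerator and denominator by exp(-z). A short derivation in LaTeX notation, where p stands for probability(D=1) (a symbol introduced here for brevity):

```latex
\[
p \;=\; \frac{e^{z}}{1 + e^{z}}
  \;=\; \frac{e^{z}\, e^{-z}}{\left(1 + e^{z}\right) e^{-z}}
  \;=\; \frac{1}{e^{-z} + 1}
  \;=\; \frac{1}{1 + e^{-z}}
\]
```

Both forms therefore give the same probability for every value of z.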


Question

What is the Odds Ratio for V35, and for Concurrent Chemo-RT?
