applied epidemiologic analysis

Applied Epidemiologic Analysis, Fall 2002. Patricia Cohen, Ph.D.; Henian Chen, M.D., Ph.D. Teaching Assistants: Julie Kranick, Sylvia Taylor, Chelsea Morroni, Judith Weissman.

Page 1: Applied Epidemiologic Analysis

Applied Epidemiologic Analysis, Fall 2002

Applied Epidemiologic Analysis

Patricia Cohen, Ph.D.

Henian Chen, M.D., Ph.D.

Teaching Assistants

Julie Kranick, Sylvia Taylor, Chelsea Morroni, Judith Weissman

Page 2: Applied Epidemiologic Analysis

Lecture 7

Categorical analysis

Conditional logistic regression

Unconditional logistic regression

Introduction to stratifiers

Page 3: Applied Epidemiologic Analysis

Objectives

• To understand the basic assumptions of analyses of case-control and cohort data

• To see how assumptions about the predictor variables differ between categorical analyses and some regression models

• To see the connection between stratified analyses and analyses incorporating all stratifiers as predictors

Page 4: Applied Epidemiologic Analysis

Categorical analysis: Analyses of tables of frequencies

Assumptions / requirements

• Adequate sample size in each table cell and in total

• Independence of outcomes
  • no contagion effects
  • single event per person

• For rates, homogeneity: the probability of the outcome is uniform for all time units in a stratum

e.g., it doesn’t matter whether 6 people are observed for 10 years or 10 people are observed for 6 years; both contribute 60 person-years

Page 5: Applied Epidemiologic Analysis

Categorical analysis

Categorical analysis does not assume that distributions of exposure and other predictors are fixed.

In contrast, ordinary regression analysis assumes that distributions of independent variables are fixed (selected or created by the researchers, rather than whatever distributions happen to characterize the sampled population).

Ordinary or “unconditional” logistic regression also assumes that independent variables are fixed.

Page 6: Applied Epidemiologic Analysis

Categorical analysis of incidence rates: A single group in comparison to some expected rate

$A / T$: incidence per time unit in the exposed group ($A$ cases over person-time $T$)

$E / T$: incidence per time unit expected in the reference population for the same distribution of person-time (e.g., based on morbidity rates for equivalent age groups)

Page 7: Applied Epidemiologic Analysis

Categorical analysis: Single group in comparison to some expected rate

Since the person-time distribution is constant:

$$\text{rate ratio} = \frac{A/T}{E/T} = \frac{A}{E} = \text{standardized morbidity ratio}$$

Confidence limits on this rate ratio employ the Poisson distribution and maximum likelihood estimation.

These limits should be adequate when E > 5.
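As an illustration of the ratio A/E and its Poisson-based limits, here is a minimal sketch. It assumes the "exact" Poisson limits on the observed count A (one common choice, not necessarily the method used in the lecture), found by bisection with the standard library only. The function names and the counts A = 10, E = 5 are hypothetical.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu); returns 0.0 for k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mu / i  # P(X = i) obtained from P(X = i - 1)
        total += term
    return total


def _bisect(f, lo, hi, iters=200):
    """Root of a decreasing function f on [lo, hi], with f(lo) > 0 >= f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def smr_with_exact_limits(observed, expected, alpha=0.05):
    """Return (SMR, lower, upper) using exact Poisson limits on the count."""
    smr = observed / expected
    # Lower limit: the mean mu for which P(X >= observed | mu) = alpha / 2.
    if observed == 0:
        mu_lo = 0.0
    else:
        mu_lo = _bisect(
            lambda mu: poisson_cdf(observed - 1, mu) - (1 - alpha / 2),
            0.0, float(observed))
    # Upper limit: the mean mu for which P(X <= observed | mu) = alpha / 2.
    hi = observed + 1.0
    while poisson_cdf(observed, hi) > alpha / 2:
        hi *= 2.0
    mu_hi = _bisect(lambda mu: poisson_cdf(observed, mu) - alpha / 2,
                    float(observed), hi)
    return smr, mu_lo / expected, mu_hi / expected


# Hypothetical example: 10 observed cases where 5 were expected.
smr, lower, upper = smr_with_exact_limits(10, 5.0)
print(f"SMR = {smr:.2f}, 95% limits = ({lower:.2f}, {upper:.2f})")
```

Because E is treated as a fixed constant, the limits on the SMR are simply the limits on the Poisson mean of A divided by E.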

Page 8: Applied Epidemiologic Analysis

Categorical analysis of 2 groups, exposed and unexposed

Maximum likelihood estimates using the Poisson model are used to estimate rate ratios and rate differences, or risk ratios and risk differences.

Hand calculation of these estimates is rare, partly because of the inclusion of multiple confounders and/or exposures in the models.
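For the simplest case (one exposed and one unexposed group, no confounders) the hand calculation is short. The sketch below assumes person-time data and uses the usual large-sample (Wald) limits, which are an approximation chosen here for illustration. The function name and the counts are hypothetical.

```python
import math


def rate_comparison(cases_exposed, pt_exposed, cases_unexposed, pt_unexposed, z=1.96):
    """Rate ratio and rate difference with approximate (Wald) confidence limits."""
    rate1 = cases_exposed / pt_exposed      # incidence rate, exposed
    rate0 = cases_unexposed / pt_unexposed  # incidence rate, unexposed
    ratio = rate1 / rate0
    diff = rate1 - rate0
    # Large-sample standard errors under the Poisson model.
    se_log_ratio = math.sqrt(1 / cases_exposed + 1 / cases_unexposed)
    se_diff = math.sqrt(cases_exposed / pt_exposed ** 2
                        + cases_unexposed / pt_unexposed ** 2)
    return {
        "rate_ratio": ratio,
        "rate_ratio_limits": (ratio * math.exp(-z * se_log_ratio),
                              ratio * math.exp(z * se_log_ratio)),
        "rate_difference": diff,
        "rate_difference_limits": (diff - z * se_diff, diff + z * se_diff),
    }


# Hypothetical data: 30 cases in 1,000 person-years vs. 15 cases in 1,500 person-years.
result = rate_comparison(30, 1000.0, 15, 1500.0)
for name, value in result.items():
    print(name, value)
```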

Page 9: Applied Epidemiologic Analysis

Selecting an analytic model for case-control (or cohort) data

The ordinary least squares (OLS) method of analyzing dichotomous outcomes is problematic because a formal assumption of the model, homoscedasticity, is necessarily violated: the variance of a binary outcome, p(1 − p), depends on its mean.

Nevertheless, for case-control data with similar sample sizes in the two groups, conclusions from OLS and logistic regression may well be similar.

Page 10: Applied Epidemiologic Analysis

Ordinary Least Squares

This model uses as a link function the “identity” function: a difference in the value of the predictor is (linearly) related to a difference in the value of the outcome.

When the outcome is disease or non-disease, this is equivalent to a difference in the proportion with the disease (incidence or prevalence).

For a binary exposure, the coefficient B equals the difference in proportions, i.e., the risk difference.
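The equality between the OLS slope and the risk difference can be checked numerically. The sketch below computes the least-squares slope directly from its definition. The cohort counts are hypothetical.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical cohort: 30 of 100 exposed and 20 of 200 unexposed develop disease.
exposure = [1] * 100 + [0] * 200
disease = [1] * 30 + [0] * 70 + [1] * 20 + [0] * 180

b = ols_slope(exposure, disease)
risk_difference = 30 / 100 - 20 / 200

print("OLS coefficient B:", b)
print("Risk difference:  ", risk_difference)
```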

Page 11: Applied Epidemiologic Analysis

Logistic Regression Model

The link function used by the logistic regression model (fitted by maximum likelihood methods) is the log odds, or logit.

In this model, for a binary exposure, the coefficient B equals the difference in the log odds of the outcome (disease).

It is equivalent to an exponential odds model, so taking the anti-log of B provides the odds ratio, which approximates the risk ratio when the outcome is rare.
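A small numerical sketch of the logit difference and its anti-log follows. With a single binary exposure, the maximum likelihood estimate of B equals the difference in sample log odds, so no iterative fitting is needed here. The counts are hypothetical, and the outcome is deliberately common so that the odds ratio and risk ratio visibly differ.

```python
import math


def log_odds(p):
    """Logit of a proportion p."""
    return math.log(p / (1 - p))


# Hypothetical cohort: 30 of 100 exposed and 20 of 200 unexposed develop disease.
risk_exposed = 30 / 100
risk_unexposed = 20 / 200

b = log_odds(risk_exposed) - log_odds(risk_unexposed)  # logistic coefficient
odds_ratio = math.exp(b)                               # anti-log of B
risk_ratio = risk_exposed / risk_unexposed

print("B (log odds difference):", b)
print("Odds ratio:", odds_ratio)
print("Risk ratio:", risk_ratio)
```

The odds ratio also equals the cross-product ratio of the 2 × 2 table, here (30 × 180) / (70 × 20).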

Page 12: Applied Epidemiologic Analysis

Other models

• Exponential risk models – a log-linear risk model (requires an estimate of risk in the source population)

• Probit model – assumes a normal distribution underlying the outcome; used in bioassay and economics

Note: These alternative models are designed to provide appropriate statistical tests, but do not necessarily match the actual biological mechanisms.

Page 13: Applied Epidemiologic Analysis

Stratification of case-control data

• A means of equating groups on the stratifiers

• Most often on sex and age categories

• Note: If the groups differ non-trivially in age, some mean age difference will remain within the age categories.

Page 14: Applied Epidemiologic Analysis

Stratifying variables: standardization

Standardization of rates or risks with regard to a stratifying variable.

Example: Control group = 40% male

Case group = 60% male

We can standardize the case group to the control group by weighting every female case by 1.5 (= 60/40) and every male case by 0.67 (= 40/60), so that the sum of weights still equals N in the case group.

Page 15: Applied Epidemiologic Analysis

Stratifying variables: standardization

Thus, for every 100 cases we have:

60 males × 0.67 (= 40)

40 females × 1.5 (= 60)

Weighted N = 100.

Could, alternatively, weight both case and control groups to equal male and female sizes.
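The weighting arithmetic above can be sketched as follows. The sex distributions are the ones given in the example. The variable names are illustrative only.

```python
# Sex distribution in each group, from the example above.
case_share = {"male": 0.60, "female": 0.40}
control_share = {"male": 0.40, "female": 0.60}
n_cases = 100

# Weight for each case = share in the control group / share in the case group.
weights = {sex: control_share[sex] / case_share[sex] for sex in case_share}

# Weighted number of cases in each sex category.
weighted_counts = {sex: n_cases * case_share[sex] * weights[sex]
                   for sex in case_share}
weighted_n = sum(weighted_counts.values())

print("Weights:", weights)
print("Weighted counts:", weighted_counts)
print("Weighted N:", weighted_n)
```

After weighting, the case group has the same sex distribution as the control group while the weighted total stays at 100.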

Page 16: Applied Epidemiologic Analysis

Weighting for rate or risk difference, or unconditional logistic regression

This weighting to produce equality on predictors can be done for hand calculation of rate or risk differences or for computer analyses of data by conditional or unconditional logistic regression.

Note: This is only one reason for weighting observations. Another common reason is to take into account sampling strategies with unequal probabilities for inclusion. Such strategies often over-sample certain strata in order to improve the statistical power for analyses of subgroups.

Page 17: Applied Epidemiologic Analysis

Weighting

It is useful to see this as analogous to what the analytic program does when inclusion of a predictor “equates” groups by removing effects of confounders.

Simple standardization assumes a uniform effect of exposure across strata: each stratum provides an estimate of the same quantity.

Statistical tests of homogeneity are commonly used to decide whether this assumption is warranted.

Page 18: Applied Epidemiologic Analysis

Mantel-Haenszel Estimation

Mantel-Haenszel estimation of uniform rate differences (using weights as described above, applied to person-time)

Preferred when some strata have fewer than 10 cases

Unbiased, unlike maximum likelihood estimates, but with a larger SE (much larger for the rate difference, only slightly larger for the rate ratio)
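A minimal sketch of Mantel-Haenszel pooling for person-time data follows. It assumes the standard textbook weights, T1·T0/(T1 + T0) per stratum, which may differ in detail from the weights used in the lecture. The function name and the two strata are hypothetical.

```python
def mantel_haenszel_rates(strata):
    """Pooled rate ratio and rate difference across strata.

    Each stratum is (cases_exposed, pt_exposed, cases_unexposed, pt_unexposed).
    """
    num_ratio = den_ratio = num_diff = den_diff = 0.0
    for a1, t1, a0, t0 in strata:
        total = t1 + t0
        # Mantel-Haenszel rate ratio components.
        num_ratio += a1 * t0 / total
        den_ratio += a0 * t1 / total
        # Mantel-Haenszel weight applied to the stratum rate difference.
        weight = t1 * t0 / total
        num_diff += weight * (a1 / t1 - a0 / t0)
        den_diff += weight
    return num_ratio / den_ratio, num_diff / den_diff


# Hypothetical strata (e.g., two age groups), each with a rate ratio of 2.
strata = [
    (10, 1000.0, 5, 1000.0),
    (40, 2000.0, 10, 1000.0),
]
pooled_ratio, pooled_diff = mantel_haenszel_rates(strata)
print("Pooled rate ratio:     ", pooled_ratio)
print("Pooled rate difference:", pooled_diff)
```

In this example the rate ratio is uniform across strata, so the pooled ratio reproduces it, while the stratum rate differences (0.005 and 0.01) are not uniform and the pooled difference is a weighted average of the two.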

Page 19: Applied Epidemiologic Analysis

First Study: Wine drinking and risk of non-Hodgkin’s lymphoma among men in the United States: a population-based case-control study

Reference:

Nathaniel C. Briggs, Robert S. Levine, Linda D. Bobo, William P. Haliburton, Edward A. Brann, and Charles H. Hennekens, American Journal of Epidemiology, 156, No. 5, 454-462

Page 20: Applied Epidemiologic Analysis

The problem: Lymphoma study

Non-Hodgkin’s lymphoma (NHL) is the fifth most common cancer in the United States with etiology mostly unknown.

Can exploration of protective factors help move toward etiological understanding? Specifically, will this study strengthen prior weak evidence of lower NHL in wine drinkers?

Page 21: Applied Epidemiologic Analysis

Population studied, study design, and sample size: Lymphoma study

960 male NHL cases, born 1929–1953 and diagnosed 1984–1988 (without specific known risks such as HIV)

1,717 male controls, recruited through random-digit dialing and matched geographically

Page 22: Applied Epidemiologic Analysis

Measurement issues: Lymphoma

Data collected by interviews regarding life-time habits

Selection and inclusion of predictors in the analysis:

* All odds ratios (OR) are adjusted for age, race/ethnicity, cancer registry, smoking history, and education.

Odds ratios for each alcohol beverage type are adjusted for the other types. All odds ratios are in reference to nondrinkers.

Page 23: Applied Epidemiologic Analysis

The effect being estimated: Lymphoma

Odds ratios of NHL associated with alcohol consumption by type and quantity over the lifetime

Basic analysis to answer study questions:

Logistic regression analysis

Test for the significance of the trend (dose-response) in the OR as dose increases

Page 24: Applied Epidemiologic Analysis

TABLE 5. Adjusted odds ratios and 95% confidence intervals for risk of developing non-Hodgkin’s lymphoma by type, quantity, and age of onset of alcohol beverage consumption, Selected Cancers Study, 1984–1988

                          No. of cases   No. of controls   OR*   95% CI†      p for trend‡
Never drinkers                     300               510   1.0
All drinkers§                      660             1,207   0.9   0.8, 1.1
  Current drinkers¶                490               930   0.9   0.8, 1.1
  Former drinkers                  170               277   1.0   0.8, 1.3
Wine drinkers
  1–6 drinks/week                  178               352   0.8   0.5, 1.3
  ≥1 drink/day                      46               121   0.4   0.2, 0.9     0.02
Beer drinkers
  1–6 drinks/week                  271               555   0.8   0.6, 1.1
  1–2 drinks/day                   168               242   1.2   0.8, 1.7
  ≥3 drinks/day                     93               160   0.9   0.6, 1.4     0.58
Spirits drinkers
  1–6 drinks/week                  237               454   0.8   0.6, 1.2
  1–2 drinks/day                   109               178   1.1   0.7, 1.8
  ≥3 drinks/day                     53                69   1.1   0.6, 2.1     0.38
Age at onset (years)
  ≤16                               52               130   0.7   0.4, 0.96
  17–18                            182               291   1.0   0.8, 1.3
  19–20                            103               200   0.9   0.6, 1.2
  ≥21                              319               572   0.9   0.8, 1.2     0.75
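To make the dose-response idea concrete, the sketch below computes crude (unadjusted) odds ratios and a simple trend statistic from the wine rows of Table 5. This is only an illustration: the published odds ratios are adjusted for several covariates, so the crude values will differ from the table, and the Cochran-Armitage test with integer scores 0, 1, 2 is an assumption made here, not necessarily the trend test used by the authors.

```python
import math

# (category, cases, controls) taken from the wine rows of Table 5;
# never drinkers are the reference group.
rows = [
    ("never drinkers", 300, 510),
    ("1-6 drinks/week", 178, 352),
    ("1 or more drinks/day", 46, 121),
]


def crude_odds_ratio(cases, controls, ref_cases, ref_controls):
    """Unadjusted odds ratio relative to the reference category."""
    return (cases * ref_controls) / (controls * ref_cases)


def cochran_armitage_z(table, scores):
    """Cochran-Armitage trend statistic for case proportions across ordered groups."""
    total_cases = sum(c for _, c, _ in table)
    total_n = sum(c + k for _, c, k in table)
    p = total_cases / total_n  # overall proportion of cases
    t = sum(s * (c - (c + k) * p) for s, (_, c, k) in zip(scores, table))
    sum_ns = sum(s * (c + k) for s, (_, c, k) in zip(scores, table))
    sum_ns2 = sum(s * s * (c + k) for s, (_, c, k) in zip(scores, table))
    variance = p * (1 - p) * (sum_ns2 - sum_ns ** 2 / total_n)
    return t / math.sqrt(variance)


_, ref_cases, ref_controls = rows[0]
for label, cases, controls in rows[1:]:
    print(label, "crude OR:",
          round(crude_odds_ratio(cases, controls, ref_cases, ref_controls), 2))

z = cochran_armitage_z(rows, scores=[0, 1, 2])
p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
print("trend z:", z, "two-sided p:", p_two_sided)
```

A negative z indicates that the proportion of cases falls as wine intake rises, which is the direction of the adjusted result reported in the table.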

Page 25: Applied Epidemiologic Analysis

Conclusions: Lymphoma

“Among wine drinkers, there was a significant linear decrease in risk of NHL with increasing quantity of wine intake. A more than twofold decrease in risk was seen for consumption of one wine drink or more per day.”

Note that the p for trend tests the dose-response aspect.

Page 26: Applied Epidemiologic Analysis

TABLE 7. Adjusted odds ratios and 95% confidence intervals for risk of developing non-Hodgkin’s lymphoma among drinkers by type and quantity of alcohol beverage consumption stratified by age of onset of drinking, Selected Cancers Study, 1984–1988

Onset age ≤16 years

                          No. of cases   No. of controls   OR*   95% CI†      p for trend‡
Nondrinkers                        300               510   1.0
All drinkers
  Current drinkers                  37                91   0.7   0.4, 1.1
  Former drinkers                   15                39   0.6   0.3, 1.1
Wine drinkers                       12                55   0.4   0.2, 0.7
  1–6 drinks/week                    8                31   0.4   0.2, 0.97
  ≥1 drink/day                       4                24   0.3   0.1, 0.8     0.004
Nonwine drinkers                    40                75   1.0   0.8, 1.2
  1–6 drinks/week                    9                24   0.7   0.3, 1.6
  ≥1 drink/day                      31                51   1.0   0.8, 1.3     0.85

Onset age ≥17 years

                          No. of cases   No. of controls   OR*   95% CI†      p for trend‡
Nondrinkers                        300               510   1.0
All drinkers
  Current drinkers                 450               826   0.9   0.8, 1.1
  Former drinkers                  154               237   1.0   0.8, 1.3
Wine drinkers                      211               412   0.9   0.7, 1.2
  1–6 drinks/week                  169               315   1.0   0.8, 1.3
  ≥1 drink/day                      42                97   0.7   0.4, 1.04    0.05
Nonwine drinkers                   393               651   1.0   0.9, 1.1
  1–6 drinks/week                  140               268   0.8   0.6, 1.1
  ≥1 drink/day                     253               383   1.0   0.9, 1.2     0.88

Page 27: Applied Epidemiologic Analysis

Conclusions: Lymphoma

Early age of onset of drinking was associated with decreased risk of NHL specifically for wine drinkers.

The authors discussed biologic plausibility, probable effects of self-report, data limitations (biases would generally be expected to lower effects), and the age-sex limitations of the sample.

Page 28: Applied Epidemiologic Analysis

Second Study: Occupation and Adult Gliomas

Reference:

Susan E. Carozza, Margaret Wrensch, Rei Miike, Beth Newman, Andrew F. Olshan, David A. Savitz, Michael Yost, and Marion Lee, American Journal of Epidemiology, 152, No. 9, 838-846

Page 29: Applied Epidemiologic Analysis

The problem: Gliomas

Gliomas are the most common form of primary malignant brain tumor in adults. The etiology is largely unknown but prior evidence implicates occupational exposures associated with certain chemically-exposed industrial, agricultural and blue-collar workers.

Page 30: Applied Epidemiologic Analysis

Population studied, study design, and sample size: Gliomas

492 incident cases in the San Francisco Bay Area, age over 20

462 controls recruited through random-digit dialing, matched by:

• 5-year age group
• gender
• ethnicity

(Note: 1/3 declined to participate. Controls more educated because of participation bias.)

Page 31: Applied Epidemiologic Analysis

Measurement issues: Gliomas

Because of rapid death of cases, many proxy informants needed to supply information. How might these interviews be biased?

Are the controls likely to be adequate?

Control variables: age (20–54 vs. 55+), gender, years of education, race

Page 32: Applied Epidemiologic Analysis

Analyses

Exposure measures:

• All jobs held at least 6 months in lifetime

• All jobs up to 10 years previously (assuming a 10-year latency)

• Within each:
  • ever employed
  • < 10 years
  • ≥ 10 years

Page 33: Applied Epidemiologic Analysis

Logic of study: Gliomas

If real, the association should increase with longer exposure.

Also, if real, the effect should be more apparent when the latency period is excluded.

Page 34: Applied Epidemiologic Analysis

No latency period

                                              Ever employed    < 10 years       ≥ 10 years
                                              OR   95% CI      OR   95% CI      OR   95% CI
Managers, administrators                      0.9  0.7, 1.2    0.8  0.6, 1.2    1.1  0.7, 1.6
Engineers, architects, draughtsmen            0.9  0.6, 1.4    0.8  0.4, 1.4    1.1  0.6, 1.9
Mathematical, physical, computer scientists   1.0  0.7, 1.6    1.5  0.9, 2.6    0.6  0.3, 1.1
Biologic scientists                           1.0  0.4, 2.3    1.2  0.4, 3.2    0.6  0.1, 3.2
Chemists, pharmacists, chemical engineers     0.6  0.4, 2.3    1.0  0.3, 3.6    0.0  0.0,
Engineering, science technicians              0.6  0.3, 1.2    0.5  0.2, 1.1
Dentists, dental technicians                  1.0  0.4, 3.0    0.6  0.2, 2.0
Physicians, surgeons                          3.5  0.7, 17.6   2.2  0.2, 25.0   4.7  0.5, 42.7
Nurses, health technicians                    1.3  0.8, 2.1    1.3  0.7, 2.3    1.3  0.6, 2.9
Teachers, librarians                          0.8  0.5, 1.1    0.7  0.4, 1.1    1.0  0.6, 1.7
Legal and social service workers              1.1  0.6, 1.9    0.8  0.4, 1.6    1.8  0.7, 4.8
Entertainers, athletes                        0.6  0.3, 1.0    0.6  0.3, 1.1    0.7  0.2, 2.2
Writers, journalists                          0.8  0.3, 2.0    1.3  0.4, 3.8    0.2  0.0, 2.0
Artists                                       1.9  0.5, 6.5    4.2  0.5, 38.4   1.1  0.2, 5.4
Photographers, photo processors               0.5  0.1, 1.8    0.5  0.1, 2.7    0.4  0.0, 4.4
Printers                                      0.7  0.3, 1.9    1.0  0.3, 3.0    0.4  0.1, 2.2
Salesmen                                      0.7  0.6, 1.0    0.6  0.4, 0.8    1.2  0.8, 1.8
Clerks                                        0.6  0.5, 0.9    0.6  0.5, 0.9    0.8  0.5, 1.3
Shippers                                      1.1  0.7, 1.9    1.1  0.6, 1.8    1.6  0.5, 5.6
Messengers                                    1.0  0.6, 1.9    1.0  0.5, 1.9    1.1  0.2, 5.1
Electronic equipment operators                0.7  0.4, 1.0    0.6  0.4, 1.0    0.9  0.3, 2.8
Firemen                                       2.7  0.3, 26.1   0.0  0.0,
Policemen, guards                             0.7  0.3, 1.3    0.5  0.2, 1.2    1.1  0.3, 3.7
Armed forces                                  0.8  0.5, 1.2    0.7  0.5, 1.1    2.9  0.6, 14.9
Janitors                                      0.9  0.6, 1.5    0.6  0.4, 1.2    2.5  0.9, 7.2
Personal service workers                      0.6  0.3, 1.0    0.6  0.4, 1.2    0.4  0.14, 1.5
Textile workers                               1.2  0.5, 3.1    1.8  0.6, 5.4    0.3  0.0, 2.9
Food service workers                          0.9  0.6, 1.2    0.8  0.6, 1.1    1.3  0.6, 2.9
Food processors                               0.9  0.5, 1.6    1.1  0.6, 2.1    0.4  0.1, 1.5
Farm managers and workers                     0.5  0.4, 0.8    0.5  0.3, 0.8    0.7  0.4, 1.4

Page 35: Applied Epidemiologic Analysis

Gliomas: Odds Ratios

Virtually no odds ratios were statistically significantly different from 1.0. Nevertheless, several were discussed. Is this sensible?
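One way to approach this question is to ask how many nominally significant odds ratios would be expected by chance alone when many are tested. The sketch below assumes every true odds ratio is 1.0 and that the tests are independent, which is a simplification (the "ever employed" estimates overlap with the duration-specific ones). The figure of 90 tests is a hypothetical round number, roughly 30 occupations times 3 exposure columns.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of nominally significant results if every null is true."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha=0.05):
    """Chance of one or more false positives, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests


n_tests = 90  # hypothetical: about 30 occupations x 3 exposure columns

print("Expected false positives:", expected_false_positives(n_tests))
print("P(at least one):", prob_at_least_one(n_tests))
```

Under these assumptions a handful of "significant" odds ratios would be unremarkable, which is a reason for caution in discussing isolated estimates from a table of this size.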