
The Epidemiologist’s Toolbox

Cari Olson, Bonnie Kerker

NYC Department of Health and Mental Hygiene
November 29, 2012

Questioning Health Data

Is there a comparison?
◦ Are the groups really comparable?

Are the differences being reported real?
◦ Are they worth reporting?
◦ How much confidence do we have in them?

Can anything else explain this association?

What can (and can’t) this study tell us?

How should findings be accurately presented?

Comparison / Control Groups

Most data interpretation requires context – a comparison group.
◦ Same group compared over time;
◦ Different groups compared within the same timeframe;
◦ Different groups compared over time.

Without a comparison, the likelihood that findings are due to factors other than the hypothesized cause cannot be assessed:
◦ Selection of study participants;
◦ Chance;
◦ Other factors or trends.

Rates

Rates are a basic epidemiologic tool because they allow for appropriate comparisons.
◦ Comparing counts can be misleading.

$$\text{Rate} = \frac{\text{number of events in a specific time period}}{\text{average population during that time period}} \times 10^{n}$$

The multiplier $10^{n}$ sets the scale: …per 100 (%), …per 1,000, …per 100,000.
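The rate formula above can be sketched in a few lines of Python. This is a minimal illustration, not a Health Department tool: the function name `rate` and the example counts (50 events in an average population of 20,000) are hypothetical. It shows that changing $n$ only changes the scale at which the same rate is reported.

```python
def rate(events: int, avg_population: float, n: int = 5) -> float:
    """Events per 10**n people: (events / average population) * 10**n."""
    if avg_population <= 0:
        raise ValueError("average population must be positive")
    return events / avg_population * 10 ** n

# Hypothetical example: 50 events in a town averaging 20,000 residents.
events, pop = 50, 20_000
per_100 = rate(events, pop, n=2)       # percent
per_1000 = rate(events, pop, n=3)
per_100000 = rate(events, pop, n=5)

print(f"{per_100:.2f} per 100 (%)")
print(f"{per_1000:.1f} per 1,000")
print(f"{per_100000:.0f} per 100,000")
```

All three lines describe the same underlying rate (0.25%, 2.5 per 1,000, 250 per 100,000); the scale is usually chosen so the number is easy to read.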

Start with a Fact…

There were 1,765 heart disease deaths in Flushing, Queens in 2002 and 882 in Pelham Bay, Bronx.

Where are residents at greater risk for dying from heart disease?

A. Flushing, Queens

B. Pelham Bay, Bronx

Heart disease death rates:

A. Flushing, Queens: 354/100,000 pop

B. Pelham Bay, Bronx: 361/100,000 pop

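Why?

Because Flushing (n = 498,318) has a larger population than Pelham Bay (n = 244,452), so more deaths there does not mean a higher death rate.

It is the same reason 25 miles-per-hour is faster than 50 miles-per-day:

◦ 25 miles / 1 hour vs. 50 miles / 1 day (24 hours), i.e., 600 miles per day vs. 50 miles per day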
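To make the Flushing vs. Pelham Bay comparison concrete, the sketch below recomputes both rates from the counts and populations given on these slides, assuming those population figures are the denominators behind the 354 and 361 per 100,000 rates (they do reproduce those values). The helper name `rate_per_100k` is hypothetical and used only for illustration.

```python
# Heart disease deaths and population, as given on the slides.
areas = {
    "Flushing, Queens": {"deaths": 1_765, "population": 498_318},
    "Pelham Bay, Bronx": {"deaths": 882, "population": 244_452},
}

def rate_per_100k(deaths, population):
    """Deaths per 100,000 population."""
    return deaths / population * 100_000

for name, d in areas.items():
    r = rate_per_100k(d["deaths"], d["population"])
    print(f"{name}: {d['deaths']:,} deaths, {r:.0f} per 100,000")
# Flushing, Queens: 1,765 deaths, 354 per 100,000
# Pelham Bay, Bronx: 882 deaths, 361 per 100,000

# Same idea as the speed analogy: put both on a common denominator (per day).
mph_as_miles_per_day = 25 * 24  # 25 miles/hour over 24 hours
print(mph_as_miles_per_day, "vs", 50, "miles per day")  # 600 vs 50 miles per day
```

Flushing has about twice the deaths but also about twice the population, so once the denominator is included its rate is slightly lower.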


Are the differences being reported real?
◦ Are they worth reporting?
◦ How much confidence do we have in them?





What is Statistical Testing?

The process of inferring from your data whether an observed difference is likely due to chance.

Commonly, significance is set at 0.05 (5%), often described as being “95% sure” that the association is not due to chance.
◦ sig = 0.01 (1%): “99% sure.”
◦ sig = 0.10 (10%): “90% sure.”
◦ Strictly, a 0.05 level means accepting a 5% chance of calling a difference significant when there is truly no difference.

The smaller the sample, the more difficult it is to find a significant difference.
◦ In larger samples, it is often easy to find significance – but is it meaningful?
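A rough way to see why sample size matters: the sketch below runs a standard two-sided two-proportion z-test (pooled standard error, normal approximation) on the same hypothetical 16% vs. 12% difference at two sample sizes. The counts are invented for illustration; with 200 people per group the difference is not statistically significant at 0.05, while with 2,000 per group it is, even though the size of the difference is identical.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference in proportions (pooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical data: the same 16% vs. 12% difference at two sample sizes.
z_small, p_small = two_proportion_z_test(32, 200, 24, 200)
z_large, p_large = two_proportion_z_test(320, 2000, 240, 2000)

print(f"n=200 per group:  z={z_small:.2f}, p={p_small:.3f}")
print(f"n=2000 per group: z={z_large:.2f}, p={p_large:.4f}")
```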

Notes on Interpretation

Statistical significance ≠ importance

Not significant ≠ no association

Statistical significance ≠ causation

What is a Confidence Interval (CI)?

An interval or range of values that reflects the precision of an estimate of a population parameter. Statistically, how precisely does our sample estimate pin down the true population value?
◦ E.g., smoking prevalence (2010): 14.0% (12.9%-15.3%)

The more confidence you want (90% vs. 95% vs. 99%), the wider the interval.
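The widening of the interval at higher confidence levels can be illustrated with a simple Wald interval for a proportion. This is an assumption-laden sketch: it uses a hypothetical simple random sample of 140 smokers among 1,000 respondents, whereas survey estimates such as the 2010 figure above are typically computed with weighting and design-based methods, so the numbers will not match the slide.

```python
import math
from statistics import NormalDist

def wald_ci(successes, n, level=0.95):
    """Simple (Wald) confidence interval for a proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Hypothetical sample: 140 smokers among 1,000 respondents (14.0%).
for level in (0.90, 0.95, 0.99):
    lo, hi = wald_ci(140, 1000, level)
    print(f"{level:.0%} CI: {lo:.1%} to {hi:.1%} (width {hi - lo:.1%})")
```

Only the multiplier `z` changes between levels; the estimate and its standard error stay the same.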

Applied Interpretation of a CI

What does it mean if 2 CIs overlap?
– Prevalence of smoking among:
  • Men: 16.1% (14.3%-18.1%)
  • Women: 12.2% (10.8%-13.7%)
– Prevalence of diabetes among:
  • Men: 9.4% (8.3%-10.8%)
  • Women: 9.1% (8.2%-10.2%)

What does it mean if a CI for a difference includes 0?
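One way to work through these examples, assuming the intervals shown are 95% CIs: check whether the two intervals overlap, then build a rough CI for the difference and see whether it includes 0. As a general rule, non-overlapping 95% CIs indicate a statistically significant difference, but overlapping CIs do not by themselves rule one out, so the CI for the difference is the more direct check. The sketch backs each standard error out of its reported interval, which assumes roughly symmetric, normal-based intervals and independent groups; these are simplifying assumptions, not how the survey itself would test the difference.

```python
import math

def overlaps(ci_a, ci_b):
    """True if two (lower, upper) intervals share any values."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

def approx_diff_ci(est_a, ci_a, est_b, ci_b, z=1.96):
    """Rough 95% CI for est_a - est_b, backing each SE out of its 95% CI.

    Assumes roughly symmetric, normal-based intervals and independent groups.
    """
    se_a = (ci_a[1] - ci_a[0]) / (2 * z)
    se_b = (ci_b[1] - ci_b[0]) / (2 * z)
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff - z * se_diff, diff + z * se_diff

# Estimates from the slide, in percent: (estimate, (lower, upper)) for men, then women.
smoking = ((16.1, (14.3, 18.1)), (12.2, (10.8, 13.7)))
diabetes = ((9.4, (8.3, 10.8)), (9.1, (8.2, 10.2)))

for name, ((m, m_ci), (w, w_ci)) in {"Smoking": smoking, "Diabetes": diabetes}.items():
    lo, hi = approx_diff_ci(m, m_ci, w, w_ci)
    print(f"{name}: CIs overlap={overlaps(m_ci, w_ci)}, "
          f"men - women = {m - w:.1f} points (approx. 95% CI {lo:.1f} to {hi:.1f})")
```

Under these assumptions, the smoking CIs do not overlap and the difference CI excludes 0; the diabetes CIs overlap and the difference CI includes 0.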



Confounding

A third factor, associated with both the exposure and the disease, that distorts the observed relationship between them.

If you are interested in actual differences in prevalence across populations, confounders are not that important.

However, if you are interested in assessing risk differences, confounders can and should be controlled for in analyses.

Example: When comparing cardiac disease between men and women, what other factor may confound the relationship between sex and illness? Age! If we don’t adjust for age and find a higher prevalence among women, it might be because, in the general population, women are (on average) older than men.

Age-adjustment is one way to limit confounding. It ensures that any differences you see between groups are NOT due to differences in age.
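Direct age standardization can be sketched with made-up numbers. In this hypothetical example, women and men have identical age-specific rates, but women are concentrated in the older age group: the crude rates differ, while the age-adjusted rates (each group's age-specific rates applied to the same standard population) are equal. The age groups, counts, and standard population are all invented for illustration; real age adjustment typically uses a published standard population and more age groups.

```python
# Hypothetical data: identical age-specific rates, different age mixes.
# Each entry: age group -> (cases, population)
women = {"under 65": (4, 400), "65+": (60, 600)}
men = {"under 65": (6, 600), "65+": (40, 400)}
standard_pop = {"under 65": 500, "65+": 500}  # hypothetical standard population

def crude_rate(group):
    """Total cases divided by total population, ignoring age."""
    cases = sum(c for c, _ in group.values())
    pop = sum(p for _, p in group.values())
    return cases / pop

def age_adjusted_rate(group, standard):
    """Direct standardization: weight each age-specific rate by the standard population."""
    total = sum(standard.values())
    return sum(group[age][0] / group[age][1] * standard[age] for age in standard) / total

for name, group in (("Women", women), ("Men", men)):
    print(f"{name}: crude {crude_rate(group):.1%}, "
          f"age-adjusted {age_adjusted_rate(group, standard_pop):.1%}")
```

Here the crude rates (6.4% vs. 4.6%) differ only because of the age mix; after adjustment both groups are at 5.5%.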

Types of Studies

Cross-sectional
◦ Select a sample from the population and measure predictor and outcome variables at the same time.
◦ Yields prevalence
◦ Cannot talk about incidence or risk of developing a disease
◦ Cannot establish sequence of events
◦ Cannot infer causation
◦ Can be generalizable

Case-control
◦ Select two samples from the population, one with disease and one without, then look back and measure the predictor variable.
◦ Yields odds ratio (measure of association)
◦ Cannot talk about incidence or risk of developing a disease
◦ Can be generalizable
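The odds ratio from a case-control study comes from a 2x2 table of exposure by case status. Because the investigator chooses how many cases and controls to sample, the proportion with disease is fixed by design, which is why incidence cannot be estimated; the odds ratio instead compares the odds of exposure among cases with the odds of exposure among controls. The counts below are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table laid out as:

                 Cases  Controls
    Exposed        a       b
    Unexposed      c       d

    OR = (odds of exposure among cases) / (odds of exposure among controls)
       = (a / c) / (b / d) = (a * d) / (b * c)
    """
    return (a * d) / (b * c)

# Hypothetical case-control study: 100 cases and 100 controls.
print(odds_ratio(a=60, b=30, c=40, d=70))  # 3.5
```

An odds ratio of 1 means exposure is equally common among cases and controls; values above 1 mean it is more common among cases.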

Prospective cohort
◦ Select a sample from the population, measure the predictor variable (presence or absence), then follow up and measure the outcome variable.
◦ Yields incidence, relative risk
◦ Can be generalizable

Randomized Controlled Trial (RCT)
◦ Randomly assign people to treatment or control (exposure), then follow up and measure the outcome.
◦ Can be generalizable
◦ STRONGEST STUDY DESIGN FOR CAUSATION
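By contrast with a case-control design, a prospective cohort follows people forward, so incidence in each exposure group, and their ratio (the relative risk), can be computed directly. A minimal sketch with hypothetical counts, using cumulative incidence (new cases divided by people at risk at the start of follow-up):

```python
def cumulative_incidence(new_cases, at_risk):
    """Proportion of initially disease-free people who develop the outcome."""
    return new_cases / at_risk

def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed divided by risk in the unexposed."""
    return (cumulative_incidence(exposed_cases, exposed_total)
            / cumulative_incidence(unexposed_cases, unexposed_total))

# Hypothetical cohort: 1,000 exposed and 2,000 unexposed people followed over the same period.
risk_exposed = cumulative_incidence(30, 1_000)
risk_unexposed = cumulative_incidence(20, 2_000)
rr = relative_risk(30, 1_000, 20, 2_000)
print(f"Incidence: {risk_exposed:.1%} vs. {risk_unexposed:.1%}; RR = {rr:.1f}")
```

A relative risk of 3 here would mean the exposed group developed the outcome at three times the rate of the unexposed group over follow-up.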

Ecologic Study
◦ Unit of analysis is a population, rather than an individual. For example, looking at rates of disease across countries.
◦ Can’t infer anything about individuals
◦ Cannot infer causality

Qualitative Study
◦ Aims to gather an in-depth understanding
◦ Includes focus groups, in-depth interviews
◦ Subjects are not systematically chosen to represent a target population. Data cannot be generalized.

Causality

Time sequence of events

Biological plausibility

Consistency and replication

Rule out confounding

How meaningful is this study?

Size of study
◦ The bigger the study, the more power you have to detect findings and the more generalizable it will be.

New knowledge vs. replicated finding
◦ First study ever finding this result?
◦ Scientific method requires the ability to replicate findings.
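The link between study size and power can be sketched with a normal approximation for comparing two proportions. The true prevalences (16% vs. 12%) and the sample sizes are hypothetical, and the approximation ignores the tiny chance of a significant result in the wrong direction; it is meant to show the trend, not to replace a formal power calculation.

```python
import math
from statistics import NormalDist

def approx_power(p1, p2, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided test comparing two proportions."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = math.sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    # Ignores the negligible chance of rejecting in the "wrong" direction.
    return NormalDist().cdf(abs(p1 - p2) / se - z_crit)

# Hypothetical true prevalences of 16% vs. 12%.
for n in (200, 500, 1000, 2000):
    print(f"n = {n:>4} per group: power = {approx_power(0.16, 0.12, n):.0%}")
```

With the same true difference, a small study will usually miss it, while a study ten times larger will usually detect it.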

Presentation of Data Findings

Provide clear context: the literature base and the importance of the findings.
◦ How big is the population that these findings apply to, and exactly which population is referenced?

Always source the data clearly, providing a link to (or information on) the original research for the audience.

Ask researchers about the limitations of their data.
◦ Researcher “headlines” (titles/abstracts) can be misleading!

The WHY Question

Best answered by qualitative data (focus groups, interviews).

Speculation vs. evidence.

Reporting “could be” rather than “is.”

Illustrating Data with “Human Interest” Stories

Anecdotes can make data come alive, but…
◦ “Anecdotal evidence” is an oxymoron.

Anecdotes should not be the only “counterfactual” argument against data.
◦ “Fairness” in reporting must insist on data (with stated limitations) from both sides.

Anecdotes must be presented in the context of the data.
◦ A source says “Everyone does X” vs. data showing that 35% of people do X.

Remember Health Department Data Sources for NYC

EpiQuery
◦ Web-based, interactive data tool
◦ Multiple data sources

My Community’s Health: Data and Statistics
◦ www.nyc.gov/health

THANK YOU!

Contact: [email protected]@health.nyc.gov

