Measuring Dietary Intake

Raymond J. Carroll

Department of Statistics

Faculty of Nutrition and Toxicology

Texas A&M University

http://stat.tamu.edu/~carroll

_________________________________________________________

Where I am From

Wichita Falls

(Ranked #10 in the worst jobs in Texas by Texas Monthly, 1980)

Best pecans in the world

_________________________________________________________

What I am Not

I know that potato chips are not a basic healthy food group. However, if you ask me a detailed question about nutrition, then I will ask

Joanne Lupton, Nancy Turner, Meeyoung Hong

_________________________________________________________

You are what you eat, but do you know who you are?

• This talk is concerned with a simple question.

• Will lowering her intake of fat decrease a woman’s chance of developing breast cancer?

• This is a hugely controversial question

• The debate has a huge statistical component

• It is also relevant to questions such as: if I lower my caloric intake, will I live longer?

_________________________________________________________

Evidence in Favor of the Fat-Breast Cancer Hypothesis

• Animal studies

• Ecological comparisons

• Case-control studies

_________________________________________________________

International Comparisons

• There are major differences in fat and saturated fat intake across countries

• Are these related to breast cancer?

_____________________________________________________________

International Comparisons

_________________________________________________________

Case-control studies

• Find women who have breast cancer, and women who do not.

• Compare their current fat intakes

• A problem on its face. We want past intake, not after-the-fact intake

• Not much found in single studies, but pooling them over many diverse studies suggests a fat-breast cancer link

_________________________________________________________

Evidence against the Fat-Breast Cancer Hypothesis

• Prospective studies

• These studies try to assess a woman’s diet, then follow her health progress to see if she develops breast cancer

• The diets of those who developed breast cancer are compared to those who do not

• Only (?) 1 prospective study has found firm evidence suggesting a fat and breast cancer link, and 1 has a negative link

_________________________________________________________

Prospective Studies

• NHANES (National Health and Nutrition Examination Survey): n = 3,145 women aged 25-50

• Nurses Health Study: n = 60,000+

• Pooled Project: n = 300,000+

• Norfolk (UK) study: n = 15,000+

• AARP: n = 250,000+

• WHI Controls: n = 30,000+

• AARP and WHI available soon

_________________________________________________________

The Nurses Health Study, Fat and Breast Cancer

60,000 women, followed for 10 years

Prospective study

Note that the breast cancer cases were eating less fat

Donna Spiegelman, the NHS statistician

_________________________________________________________

Clinical Trials

• The lack of consistent (even positive) findings led to the Women’s Health Initiative

• Approximately 60,000 women randomized to two groups: healthy eating and typical eating

_________________________________________________________

WHI Diet Study Objectives

_________________________________________________________

Objections to WHI

• Cost ($100,000,000+)

• Whether Americans can really lower % Calories from Fat to 20%, from the current 35%

• Even if the study is successful, difficulties in measuring diet mean that we will not know what components led to the decrease in risk.

_________________________________________________________

Ross Prentice of the WHI

How do we measure diet in humans?

• 24 hour recalls

• Diaries

• Food Frequency Questionnaires (FFQ)

_________________________________________________________

Walt Willett has a popular book and a popular FFQ

Objections to the 24 hour recall

• Only measures yesterday’s diet, not typical diet

• A single 24 hour recall finding a diet-cancer link is not universally scientifically acceptable

• Need for repeated applications

• Expensive

• Personal interview

• Phone interview

_________________________________________________________

NHANES: Fat is Protective (?)

Typical % Calories from Fat

Cases: 35%

Controls: 37%

_________________________________________________________

NHANES: Calories are Protective (?)

Typical Calories

Cases: 1,300

Controls: 1,500

Food diaries

• Hot topic at NCI

• Only measures a few days’ diet, not typical diet

• A single 3-day diary finding a diet-cancer link is not universally scientifically acceptable

• Need for repeated applications

• Induces behavioral change??

_________________________________________________________

[Figure: Typical (Median) Values of Reported Caloric Intake Over 6 Diary Days: WISH Study. Categories: FFQ, Diary 1, Diary 2, Diary 3, Diary 4, Diary 5, Diary 6. Calorie axis from 1350 to 1800.]

_________________________________________________________

The Food Frequency Questionnaire

• Do you remember the SAT?

_________________________________________________________

The Pizza Question

_________________________________________________________

The Norfolk Study with ~Diaries and FFQ

15,000 women, aged 45-74, followed for 8 years

163 breast cancer cases

Diary: p = 0.005

FFQ: p = 0.229

Directly contradicts NHANES (women aged 25-50).

Summary

• FFQ does not find a fat and breast cancer link

• 24 hour recalls and diaries are expensive

• They have found links, but in opposite directions

• Diaries also appear to modify behavior

• Question: do any of these things actually measure dietary intake?

• How well or how badly?

• These are statistical questions!

_________________________________________________________

Do We Know Who We Are?

• Karl Pearson was arguably the 1st great modern statistician

• Pearson chi-squared test

• Pearson correlation coefficient

_________________________________________________________

Karl Pearson at age 30

Do We Know Who We Are?

• Pearson was deeply interested in self-reporting errors

• In 1896, Pearson ran the following experiment.

• For each of 3 people, he set up 500 lines on a set of sheets of paper, and had each person bisect every line by hand

_________________________________________________________

A gaggle of lines

Pearson’s Experiment

• He then had a postdoc measure the error made by each person on each line, and averaged the errors

• “Dr. Lee spent several months in the summer of 1896 in the reduction of the observations”

_________________________________________________________

A gaggle of lines, with my bisections

Pearson’s Personal Equations

• Pearson computed the mean error committed by each individual: the “personal equations”

• He found: the errors were individual. His errors were to the right, Dr. Lee’s to the left

_________________________________________________________

Karl Pearson in later life

What Do Personal Equations Mean?

• Given the same set of data, when we are asked to report something, we all make errors, and our errors are personal

• In the context of reporting diet, we call this “person-specific bias”

_________________________________________________________

Laurence Freedman of NCI, with whom I did the work

What Errors Do FFQ Make?

• Pretend you and I eat the same amount of fat on average.

• We each fill out a FFQ twice, take the mean fat intake from the FFQ, and get different answers. Why?

_________________________________________________________

• Random Error: I will give different answers each time

• No one reports all the ice cream he/she eats (fixed bias due to societal factors)

• Personal Equation: we all report differently
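The three error sources listed above can be made concrete with a small simulation. This is only a sketch under assumed numbers: the function name `mean_ffq_reports`, the additive form of the errors, and every numeric value (true intake of 70, societal bias of -5, standard deviations of 10) are illustrative choices, not values from the talk. Everyone in the simulation eats exactly the same amount, so any spread in the mean reports is reporting error.

```python
import random
import statistics


def mean_ffq_reports(true_intake, n_people, n_ffq, societal_bias,
                     sd_person, sd_random, seed=0):
    """Return each person's mean reported intake over n_ffq questionnaires.

    Everyone eats the same true amount, so any spread in the returned
    means comes from reporting error alone. All parameters are
    hypothetical illustration values.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_people):
        # Personal equation: drawn once per person, shared by all their FFQs.
        person_bias = rng.gauss(0.0, sd_person)
        reports = [
            # Random error: drawn fresh for every questionnaire.
            true_intake + societal_bias + person_bias + rng.gauss(0.0, sd_random)
            for _ in range(n_ffq)
        ]
        means.append(statistics.fmean(reports))
    return means


if __name__ == "__main__":
    # Print: number of FFQs, average of the person means, spread of the person means.
    for n_ffq in (2, 20, 200):
        means = mean_ffq_reports(true_intake=70.0, n_people=2000, n_ffq=n_ffq,
                                 societal_bias=-5.0, sd_person=10.0,
                                 sd_random=10.0)
        print(n_ffq, round(statistics.fmean(means), 1),
              round(statistics.pstdev(means), 1))
```

Filling out more questionnaires averages away the random error, but the spread between people does not shrink below the personal-equation standard deviation, and the average stays shifted by the societal bias.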

Model Details for Statisticians

• The model in symbols

• Note how existence of person-specific bias means that variance of true intake is less than one would have thought

_________________________________________________________

$$Q_{ij} = \beta_0 + \beta_1 X_i + r_i + \varepsilon_{ij}$$

$X_i$ = true intake; $r_i$ = personal equation = Normal$(0, \sigma_r^2)$; $\varepsilon_{ij}$ = random error = Normal$(0, \sigma_\varepsilon^2)$
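A short derivation may help show why person-specific bias makes the variance of true intake smaller than it looks. It takes the model as the j-th FFQ report of person i being $\beta_0 + \beta_1 X_i + r_i + \varepsilon_{ij}$, and adds two assumptions that the slide does not state: $X_i$, $r_i$ and $\varepsilon_{ij}$ are mutually independent, and true intake has variance $\sigma_X^2$ (a symbol introduced here). With $k$ repeated FFQ per person:

```latex
\begin{align*}
\bar{Q}_{i\cdot} &= \beta_0 + \beta_1 X_i + r_i + \bar{\varepsilon}_{i\cdot}, \\
\operatorname{Var}\!\left(\bar{Q}_{i\cdot}\right)
  &= \beta_1^2 \sigma_X^2 + \sigma_r^2 + \frac{\sigma_\varepsilon^2}{k}
  \;\longrightarrow\; \beta_1^2 \sigma_X^2 + \sigma_r^2
  \quad (k \to \infty).
\end{align*}
```

Repeating the FFQ removes only the $\sigma_\varepsilon^2 / k$ term. If $\sigma_r^2$ is ignored, all of the remaining between-person variance is attributed to true intake, which overstates $\beta_1^2 \sigma_X^2$ by exactly $\sigma_r^2$.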

Our Hypothesis

• We hypothesized that when measuring Fat intake

  • The personal equation, or person-specific bias, unique to each individual, is large and debilitating.

• The problem: the actual variability in American diets is much smaller than suspected.

• If true, the hypothesis says that one cannot really do an epidemiologic study for total energy or total fat, with any degree of success for cancer

_________________________________________________________

Can We Test Our Hypothesis?

• We need biomarker data that are not much subject to the personal equation

• There is no biomarker for Fat

• There are biomarkers for energy (calories) and Protein

• We expect that studies are too small by orders of magnitude

_________________________________________________________

Biomarker Data

• Protein: Available from a number of European studies

• Calories and Protein: Available from NCI’s OPEN study

• Results are surprising

Victor Kipnis was the driving force behind OPEN

_________________________________________________________

Sample Size Inflation

• There are formulae for how large a study needs to be to detect a doubling of risk from low and high Fat/Energy Diets

• These formulae ignore the personal equation

• We recalculated the formulae accounting for

  • Random error in repeated FFQ

  • Societal factors causing underreporting in general

  • Pearson’s personal equation: we report differently

_________________________________________________________

Biomarker Data: Sample Size Inflation

[Figure: sample size inflation factors, axis from 0 to 12, for Protein, Calories, and %-Protein]

_________________________________________________________

If you are interested in the effect of calories on health, multiply the sample size you thought you needed by 11. For protein, multiply by 4.5.
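One way to read inflation factors such as 11 and 4.5 is through a common approximation: the required sample size grows by roughly 1/ρ², where ρ is the correlation between the FFQ report and true intake. The sketch below uses that approximation. It is an assumption here, not necessarily the formula behind the slide, and the function names and the variance-component inputs are hypothetical.

```python
import math


def reliability(var_true, var_person, var_random, n_ffq=1, beta1=1.0):
    """Squared correlation between true intake and the mean of n_ffq FFQs.

    Assumes report = beta0 + beta1 * truth + personal bias + random error,
    with the three random parts independent.
    """
    signal = beta1 ** 2 * var_true
    noise = var_person + var_random / n_ffq
    return signal / (signal + noise)


def sample_size_inflation(rho_squared):
    """Factor by which the required sample size grows, taken as 1 / rho^2."""
    if not 0.0 < rho_squared <= 1.0:
        raise ValueError("rho_squared must be in (0, 1]")
    return 1.0 / rho_squared


def implied_correlation(inflation):
    """Correlation between FFQ and true intake implied by an inflation factor."""
    return math.sqrt(1.0 / inflation)


if __name__ == "__main__":
    # Inflation factors quoted in the talk for calories and protein.
    for label, factor in (("calories", 11.0), ("protein", 4.5)):
        print(label, round(implied_correlation(factor), 2))
```

Under this approximation, an inflation factor of 11 corresponds to a correlation of about 0.3 between the FFQ and true intake, and a factor of 4.5 to a correlation of about 0.47.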

Relative Odds

• Suppose high fat/energy/? diets lead to twice the risk of breast cancer compared to low fat/energy

• This is called the Relative Risk

• What is the risk we would observe with the FFQ?

_________________________________________________________

Relative Risk

If high calories in fact increase the risk of breast cancer by 100%, and you change your intake dramatically, the FFQ thinks doing so increases the risk by 4%

[Figure: Relative Risk for Changing Your Food Intake, axis from 1 to 2. True: 2.00. Observed, Protein: 1.09. Observed, Calories: 1.04.]

Result: It is not possible to tell if changing your absolute caloric intake, or your fat intake, or your protein intake will have any health effects
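The gap between a true relative risk of 2.00 and observed values of 1.09 and 1.04 can be summarized by an attenuation factor. The sketch below assumes the standard log-linear approximation, observed RR = (true RR)^λ, which the talk does not state explicitly. The function names are hypothetical.

```python
import math


def attenuation_from_relative_risks(true_rr, observed_rr):
    """Attenuation factor lambda such that observed_rr == true_rr ** lambda."""
    return math.log(observed_rr) / math.log(true_rr)


def observed_relative_risk(true_rr, attenuation):
    """Relative risk an error-prone instrument would show for a given lambda."""
    return true_rr ** attenuation


if __name__ == "__main__":
    # Relative risks quoted in the talk (true value 2.00).
    for label, observed in (("protein", 1.09), ("calories", 1.04)):
        lam = attenuation_from_relative_risks(2.0, observed)
        print(label, round(lam, 3))
```

Under this approximation the observed values correspond to attenuation factors of roughly 0.12 for protein and 0.06 for calories, so only a small fraction of the true log relative risk survives in the FFQ-based estimate.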

Relative Risk, Food Composition

If high protein (fat) increases the risk of breast cancer by 100%, your calories remain the same, and you dramatically lower your protein (fat) intake, then the FFQ thinks your risk increases by 20%-30%

[Figure: Relative Risk for Food Composition, axis from 1 to 2. True: 2.00. Observed, Protein Density, no energy effect: 1.31. Observed, Protein Density, energy effect: 1.20.]

Result: It is very difficult to tell if changing your food composition while maintaining your caloric intake will have any health effects

Summary

• Trying to establish a Fat and Breast Cancer link has proved difficult

• Standard instruments hide effects

• 24 hour recalls have found effects, but are very expensive

• Diaries may(?) change behavior: difficult to believe what they say

• There is hope to analyze food composition, not absolute intakes

_________________________________________________________

Summary

• The AARP Study: 250,000+ women, by far the greatest number in any study

• My best case conjecture:

  • Huge size: statistical significance

  • FFQ: small measured increase in risk for dramatic behavioral change

• Statistician’s dream: use Pearson’s idea to get at the true increase in risk

_________________________________________________________

A happy statistician dreaming about AARP

Summary

• The WHI Controls Study: 30,000+ women

• All with > 32% Calories from Fat via FFQ

• Also includes diaries

• Will be able to compare diaries and FFQ

• How many studies with 30,000+ diaries can we afford?

_________________________________________________________

A happy statistician doing field biology in Northwest Australia (the Kimberley)

Summary

• WHI, 2005, clinical trial

• My best case conjecture: Probably no statistical effects (?)

• Even if so, the FFQ is so bad that we will not know what to do: Decrease Fat? Decrease saturated Fat? Eat more grain? Eat more veggies (yuck)?

_________________________________________________________


• Diet is incredibly hard to measure

• Even 100% increases in risk cannot be seen in large studies

• If you read about a diet intervention, measured by a FFQ, and it achieves statistical significance multiple times: wow!

_________________________________________________________


Much work at NCI and WHI and EPIC on new ways of measuring diet

EPIC may be a model, because of the wide distribution of intakes

_________________________________________________________

Reporting Biases

• FFQ are not very good for measuring caloric intake

• We do not want to admit our pizza, ice cream, etc.

_________________________________________________________

Reporting Biases

• 24 hour recalls are not very good for measuring caloric intake

• They are better than FFQ (less bias, for example), but they still are not very good

_________________________________________________________

Reporting Biases

• FFQ are better for % Calories from Protein

• Our food composition is better known to us than the amounts

• Inflation of sample size only 2.3, not 4.5 as for actual protein

_________________________________________________________
