Finding Dietary Patterns in the UK
Michael Fahey, Nutritional Epidemiology Group
In collaboration with:
Andy Coward, Stable Isotopes Group
Chris Thane, Gemma Bramwell, Nutritional Epidemiology Group
Objectives
To find dietary patterns in food consumption that:
1) are representative of the UK population
2) are interpretable and valid
3) can be used to create test meals
Research steps
Random sample of population food groups
Multivariate statistical analysis
Interpret patterns by examining average consumption of the food groups
Divide sample into mutually exclusive sub-groups that have different dietary profiles
Motivation
Nutritional data is massive and messy
Foods eaten together have interactive effects on the bioavailability of nutrients, and possibly on disease risk
The association between a single food and disease incidence is difficult to estimate accurately, e.g. one needs to adjust for intakes of many other foods
The effect of measurement error is not well understood when more than one food is involved
What is a dietary pattern?
Two different approaches can be found in the epidemiological literature:
1) Create an index based upon existing knowledge and score individuals accordingly, e.g. the Healthy Eating Index
2) Use multivariate statistical methods to find patterns that exist in data
These approaches differ with respect to the emphasis that is placed upon uncovering what people actually eat, and to the extent that a priori information is used.
Data
National Diet and Nutrition Survey of Adults (NDNS), 2000-01
population-based survey with representative sampling of 1724 adults (958 women and 766 men)
seven-day weighed diet records used to measure consumption of all food items
food items were grouped to create 25 food variables measuring consumption of major food groups
log of daily food intakes (g) used for 20 food groups, and a dichotomous variable (consumer or not) used for five food groups for which food non-consumption was >=50%
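The variable coding described above can be sketched as follows. This is a minimal illustration, not the survey team's code: the function name `transform_food_group` is hypothetical, and the `offset` added before taking logs is an assumption, since the slides do not say how zero intakes were handled in the log-transformed groups.

```python
import math

def transform_food_group(intakes_g, threshold=0.5, offset=1.0):
    """Code one food group as either a consumer indicator or log intake.

    intakes_g: daily intakes in grams, one value per person.
    threshold: share of non-consumers at or above which the group is
               dichotomised (>= 50% in the talk).
    offset:    ASSUMPTION, added before the log so zero intakes are defined.
    """
    n = len(intakes_g)
    non_consumers = sum(1 for g in intakes_g if g == 0)
    if non_consumers / n >= threshold:
        # Mostly non-consumers: keep only "consumer or not".
        return "binary", [1 if g > 0 else 0 for g in intakes_g]
    # Otherwise analyse the log of daily intake.
    return "log", [math.log(g + offset) for g in intakes_g]

# Three of four people ate none of this food, so it is dichotomised.
kind, values = transform_food_group([0, 0, 0, 12.5])
```

With this rule a food group is never analysed on both scales; which of the two it gets depends only on the share of non-consumers.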
Food Groups
• Vegetables
• Fruit/nuts
• Potatoes
• Wholegrain cereals
• Refined cereals
• Pasta & rice
• Snack foods = (biscuits, cakes, crisps, chocolate)
• Puddings*
• Sugars
• Butter*
• Fat spreads
• Whole milk*
• Skimmed milk
• Dairy products
• Eggs
• Fish (excludes deep-fried fish)
• White meat
• Red meat
• Fast foods = (burgers, fish & chips, pizza, kebabs)
• Soups*
• Coffee & tea
• Soft drinks
• Alcohol
• Fruit juice*
• Water
Table 1 Sample Characteristics by Sex, NDNS Adults 2000-1

                                 Women     Men
                                 N=958     N=766
Age (%)
  18-24                          8         8
  25-34                          22        21
  35-54                          51        51
  55-64                          19        20
Region (%)
  Scotland & North               34        32
  Central, South West & Wales    37        36
  London & South East            29        32
Smoker (%)                       33        31
BMI >=25 (%)                     51        64
Rationale for statistical method
If there exist sub-groups in a sample of individuals that are distinguished by their dietary profiles, then we would expect the probability distributions describing their joint food consumption to be different.
A finite mixture model (FMM) is well-suited to finding such sub-groups
has characteristics in common with both conventional cluster analysis and factor analysis
like cluster analysis, “clusters” of similar individuals are found
the underlying statistical theory is based upon latent variables, like factor analysis
Probabilistic Classification (1)
• An FMM analyses the mix of sub-groups in the entire sample, and uncovers the likely sub-groups by estimating their probability density functions
• Requires estimating probability distributions
• E.g. for a single continuous variable, Y, we need to estimate $\mu_Y$ and $\sigma_Y^2$
• Then the probability density for a univariate normal distribution, $f(Y \mid \mu_Y, \sigma_Y)$, can be used to estimate probabilities
Probabilistic Classification (2)
• For $Y = (Y_1, Y_2, \dots, Y_K)$ continuous food variables, we need to estimate $\mu_Y$ and $\sigma_Y^2$ for each of the K variables
• In general, one estimates the variances and covariances among the elements of Y, represented by the matrix $\Sigma_Y$
• The function $f(Y \mid \mu_Y, \Sigma_Y)$ describes the multivariate probability density
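For reference, the multivariate normal density referred to above has the standard textbook form below, written for a vector Y of K food variables with mean vector $\mu_Y$ and covariance matrix $\Sigma_Y$. It is given as a reminder of the general formula, not as a statement of how the software used in this study parameterises it.

```latex
f(Y \mid \mu_Y, \Sigma_Y)
  = (2\pi)^{-K/2} \, |\Sigma_Y|^{-1/2}
    \exp\left\{ -\tfrac{1}{2} (Y - \mu_Y)^{\top} \Sigma_Y^{-1} (Y - \mu_Y) \right\}
```

With K = 1 this reduces to the univariate normal density with mean $\mu_Y$ and variance $\sigma_Y^2$.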
The Finite Mixture Model 1
The model expresses the probability density of our observed data, $f_o$, as a mixture of K multivariate normals:
$f_o = \pi_1 f_1 + \pi_2 f_2 + \dots + \pi_K f_K$
Each of the $f_k$ above is a probability density function, e.g. a function describing a bell-shaped curve.
Each $f_k$ represents one dietary pattern.
The $\pi_k$ denote the proportion of the sample belonging to each pattern.
Example mixture: two sub-groups
The mixture below can be decomposed into the sum of two normal distributions:
fo = 0.75 N(0,1) + 0.25 N(4,1)
[Figure: density curve of the mixture; x-axis marked at 0 and 4]
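The two-component example can be evaluated directly. The sketch below computes the mixture density and, for a given observation, the probability that it came from each component; the function names are illustrative and not taken from the talk.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of a univariate normal distribution at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# (proportion, mean, sd) for fo = 0.75 N(0,1) + 0.25 N(4,1)
COMPONENTS = [(0.75, 0.0, 1.0), (0.25, 4.0, 1.0)]

def mixture_pdf(y, components=COMPONENTS):
    """Weighted sum of the component densities."""
    return sum(p * normal_pdf(y, m, s) for p, m, s in components)

def membership_probabilities(y, components=COMPONENTS):
    """Probability that y arose from each component (Bayes' rule)."""
    weighted = [p * normal_pdf(y, m, s) for p, m, s in components]
    total = sum(weighted)
    return [w / total for w in weighted]

# Halfway between the two means the component densities are equal,
# so the membership probabilities equal the mixing proportions.
probs_at_midpoint = membership_probabilities(2.0)
```

At y = 2 the result is 0.75 and 0.25: the data cannot separate the components there, so only the mixing proportions matter.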
The Finite Mixture Model 2
An FMM is a probabilistic (or model-based) method
Mutually exclusive sub-groups are created by assigning each individual to the one giving rise to their data with greatest probability
Probabilistic classification 3
Let’s assume that three sub-groups or clusters were identified. The largest probability in each row (marked *) indicates the cluster that each of these three individuals would be assigned to.
Assignment probabilities
Individual   cluster 1   cluster 2   cluster 3
1            0.40*       0.30        0.30
2            0.20        0.75*       0.05
3            0.05        0.10        0.85*
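The assignment rule for these three individuals can be written out as a short sketch. The probabilities are those in the table; the function and variable names are illustrative.

```python
# Assignment probabilities for the three example individuals.
ASSIGNMENT_PROBS = {
    1: [0.40, 0.30, 0.30],
    2: [0.20, 0.75, 0.05],
    3: [0.05, 0.10, 0.85],
}

def assign_cluster(probs):
    """Return (1-based cluster number, its probability) for the most probable cluster."""
    best = max(range(len(probs)), key=lambda k: probs[k])
    return best + 1, probs[best]

assignments = {person: assign_cluster(p) for person, p in ASSIGNMENT_PROBS.items()}
# Individual 1 goes to cluster 1, individual 2 to cluster 2, individual 3 to cluster 3.
```

The retained probability also shows how certain each assignment is: individual 1 is placed in cluster 1 with probability 0.40 only, whereas individual 3 is placed in cluster 3 with probability 0.85.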
Advantages of Model-based Methods
• Can use model fit criteria to identify the number of patterns or clusters
• Can take account of covariates in the pattern identification step
• Can predict the uncertainty in the classification of individuals to patterns
Need for flexible assumptions
Scientists are sometimes described as being expert at solving problems that are convenient to solve, like the drunk looking for keys in the light of the lamp post, but not in the dark where they were dropped.
when knowledge is weak about the shape and size of unknown clusters it is preferable not to make strong assumptions
there are very efficient methods to find clusters that are circles of the same size (in two dimensions); these approaches may not be informative
Model Fitting 1
Compare models making different assumptions about the size and shape of clusters:
A) Clusters are (hyper) spheres and are the same size (conventional clustering methods)
B) Clusters are (hyper) ellipses and are the same size
C) Clusters are (hyper) ellipses and of different sizes
Model Fitting 2
A goodness of fit index (BIC) is used to choose which of the 3 models is better
The BIC is also used to choose the number of clusters (smaller values are better)
Food intakes were adjusted for age and total energy
Model Choice 1
• Begin by fitting a solution with only one cluster for model type (A)
• Refit the model to have additional clusters, stopping at a maximum of 6 clusters
• Repeat above steps for model types (B) and (C)
• Choose the model with the smallest BIC from the 18 possible alternatives
Model Fit (A): same sized spheres
Choose the model with the smallest value of the BIC:
No. clusters   BIC     No. parameters
1              81107   176
2              79980   208
3              79500   240
4              79190   272
5              79014   304
6              78818   336
The 6-cluster solution is the preferred one of this type.
Model Fit (B): same sized ellipses
Choose the model with the smallest value of the BIC:
No. clusters   BIC     No. parameters
1              78605   195
2              77639   227
3              76803   259
4              76482   291
5              76703   323
6              -       -
The 4-cluster solution is the preferred one of this type.
Model Fit (C): ellipses of different sizes
Choose the model with the smallest value of the BIC:
No. clusters   BIC     No. parameters
1              78605   195
2              76343   247
3              75189   299
4              73657   351
5              74541   403
6              -       -
The 4-cluster solution is the preferred one of this type.
Model Choice: summary
Model fit was poor using the conventional clustering assumptions and leads one to choose too many clusters
Of the alternative models examined, fit was best when clusters were allowed to be ellipses of different sizes
A model of this type with 4 clusters was the best among all alternatives
Next step: interpret each cluster’s dietary pattern by examining average food consumption
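The model choice summarised above amounts to taking the smallest BIC among the fitted alternatives. The sketch below does this with the BIC values reported for model types (A), (B) and (C); the 6-cluster fits for (B) and (C) are omitted because no BIC is reported for them. The dictionary layout and function names are illustrative.

```python
# BIC by model type and number of clusters, as reported in the model fit tables.
BIC = {
    "A": {1: 81107, 2: 79980, 3: 79500, 4: 79190, 5: 79014, 6: 78818},
    "B": {1: 78605, 2: 77639, 3: 76803, 4: 76482, 5: 76703},
    "C": {1: 78605, 2: 76343, 3: 75189, 4: 73657, 5: 74541},
}

def best_within(model):
    """Number of clusters with the smallest BIC for one model type."""
    table = BIC[model]
    return min(table, key=table.get)

def best_overall():
    """(model type, number of clusters) with the smallest BIC of all."""
    candidates = [(m, k) for m in BIC for k in BIC[m]]
    return min(candidates, key=lambda mk: BIC[mk[0]][mk[1]])

choice = best_overall()  # ("C", 4)
```

For type (A) the BIC is still falling at 6 clusters, which is the sense in which the conventional assumptions lead one to choose too many clusters.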
Radar Graphs
• Show how the dietary patterns (clusters) differ in terms of food intakes
• The plotted points, for each food group, are calculated as:
100 x (predicted mean / grand mean)
• Food groups are ordered clockwise: plants, refined, fats, dairy, animal, sauces, drinks
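The plotted value is a simple ratio, sketched below. The intake figures in the example are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the survey.

```python
def radar_points(predicted_means, grand_means):
    """100 x (predicted mean / grand mean) for each food group."""
    return {
        food: 100.0 * predicted_means[food] / grand_means[food]
        for food in predicted_means
    }

# HYPOTHETICAL values for one cluster and for the whole sample.
cluster_means = {"Vegetables": 90.0, "Fish": 5.0}
sample_means = {"Vegetables": 75.0, "Fish": 10.0}

points = radar_points(cluster_means, sample_means)
# Vegetables plot at 120 (20% above the grand mean), Fish at 50 (half of it).
```

A value of 100 therefore means the cluster eats the same amount of that food group as the sample as a whole.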
[Figure: radar graphs for Clusters 1 to 4, women. One axis per food group (Vegetables through Water); radial scale 0 to 250.]
Table 3 Cluster Characteristics, Women

Characteristic                   Cluster 1   2       3       4
Size of cluster (% of sample)    33          29      22      16
Median Age (yrs)                 38          46      42      39
Median Energy (MJ/d)             6.82        6.89    7.18    6.33
EI/BMR <=1.35 (%)                72          75      66      75
BMI >=25 (%)                     51          54      51      45
Non-manual Occupation (%)        55          81      69      65
Smoker (%)                       41          15      37      39
London & South East (%)          22          33      29      39
[Figure: radar graphs for Clusters 1 to 6, men. One axis per food group (Vegetables through Water); radial scale 0 to 250.]
Table 4 Cluster Characteristics, Men

Characteristic                   Cluster 1   2       3       4       5       6
Size of cluster (% of sample)    28          20      19      14      11      8
Median Age (yrs)                 41          39      43      45      52      36
Median Energy (MJ/d)             9.67        8.80    10.00   10.24   10.26   8.72
EI/BMR <=1.35 (%)                61          76      61      53      46      73
BMI >=25 (%)                     59          60      71      65      72      65
London & South East (%)          25          21      40      36      42      43
Non-manual Occupation (%)        44          32      74      63      71      49
Smoker (%)                       38          50      22      18      14      21
Table 5 Vitamin K intake by cluster
Mean daily vitamin K intake (µg) by cluster

Gender       Cluster 1   2         3         4         5           6
Women        62          92        71        67        -           -
  (95% CI)   (58,67)     (86,99)   (64,78)   (59,74)
Men          72          62        90        85        125         83
  (95% CI)   (67,77)     (55,68)   (83,97)   (75,94)   (100,150)   (56,110)
Pattern Validation
Validation is an important aspect
In large epidemiologic studies split-sample validation is convenient; smaller studies could use cross-validation and/or re-sampling
Among men, patterns similar to 3 of the female clusters were found
Average classification probabilities were very high (>= 0.93) for all clusters
Conclusions
Four dietary patterns were found among women and their diets could be labelled: 1) “convenience”, 2) “cosmopolitan”, 3) “animal oriented”, 4) “moderate eaters”.
Finite mixture modeling is a flexible approach to pattern identification
Variables of different scale types can be included, covariate adjustment is possible, and assumptions about the nature of clusters can be examined explicitly
One disadvantage is that the method is heavily parameterised
Future Work
choice of food groups is important and arbitrary
improved definition of fast and snack food groups
different food groups may be needed for different populations, e.g. the elderly
inclusion of certain foods for which there are truly many non-consumers suggests using a different probability approach
use of other dietary assessment methods, e.g. FFQs, suggests allowing correlated measurement error among foods
Alcohol non-consumption
Next: destination Cuba!
Comparison with previous work
Pryer’s analysis of 1986-7 NDNS Adults
Pryer probably found more clusters
Pryer does not report a low consuming pattern
Pryer’s “healthy” female pattern has above average intakes of red and white meat
Comparison with previous work
There are important methodological differences between Pryer’s work and ours.
Pryer was restricted to finding clusters of same sized “circles”
FMM yields exhaustive classification
Pryer’s analysis gave less weight to beverages owing to standardisation
Pryer’s is a greater mix of patterns in dietary composition and body size/energy requirements
classification by FMM is probabilistic not deterministic
Association between foods and patterns
Association between a food group and a pattern can be quantified by the amount of variation in intake explained by pattern membership (R2). The six greatest values of R2 are listed below.
For women they were:
fish (55%), coffee & tea (27%), refined cereals (20%), wholegrain cereals (19%), skimmed milk (18%), fast food (16%)
For men they were:
fish (46%), coffee & tea (44%), fruit (31%), wholegrain cereals (27%), fast food (22%), red meat (19%)
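One common way to compute the share of variation explained by group membership is the ratio of the between-cluster sum of squares to the total sum of squares, as in a one-way analysis of variance. The sketch below uses that definition with each person assigned to a single cluster; this is an assumption, since the slides do not state exactly how R2 was calculated, and the data in the example are hypothetical.

```python
def r_squared(intakes, clusters):
    """Between-cluster sum of squares divided by total sum of squares."""
    n = len(intakes)
    grand_mean = sum(intakes) / n
    total_ss = sum((y - grand_mean) ** 2 for y in intakes)

    # Collect the intakes belonging to each cluster.
    groups = {}
    for y, c in zip(intakes, clusters):
        groups.setdefault(c, []).append(y)

    between_ss = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return between_ss / total_ss

# HYPOTHETICAL intakes for four people in two clusters.
example = r_squared([1.0, 3.0, 5.0, 7.0], [1, 1, 2, 2])  # 16 / 20 = 0.8
```

A value near 1 means the clusters differ sharply on that food group; a value near 0 means cluster membership says little about its intake.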
Table 6a Predicted mean intakes (g) or % who consume by cluster, women

Food Group            Cluster 1   2       3       4
Vegetables            56          117     72      59
Fruit/nuts            18          112     22      36
Potatoes              25          56      21      12
Wholegrain cereals    9.1         62      15      15
Refined cereals       56          21      61      16
Pasta/rice            10          24      18      18
Snack food            30          26      27      20
Puddings              33%         37%     25%     26%
Sugars                2.1         2.5     2.4     2.0
40%51%40%42%Butter
2.23.83.23.8Fat spreads
41%49%27%56%Whole milk
163919131Skimmed milk
18164216Dairy products
ClusterFood Group
27
20
0.0
4.4
1
27
21
13
4.2
2
8.447Red meat
1524White meat
4.719Fish
3.66.0Eggs
43
Table 6b Predicted mean intakes (g)or % who consume by cluster, women
27793790Fast foods
34%29%42%35%Soups
84541640639Coffee/tea
34493942Soft drinks
9.021348.8Alcohol
46%46%58%36%Fruit juice
1053119734Water
ClusterFood Group
1 2 43
Table 6c Predicted mean intakes (g)or % who consume by cluster, women
Table 7a Predicted mean intakes (g) or % who consume by cluster, men

Food Group            Cluster 1   2       3       4       5       6
Vegetables            79          47      107     91      117     42
Fruit/nuts            22          5.4     93      41      109     10
Potatoes              31          19      49      64      10      12
Wholegrain cereals    13          5.3     78      22      94      11
Refined cereals       89          90      39      73      19      28
Pasta/rice            15          8.3     64      16      48      33
Snack food            36          19      47      28      50      18
Puddings              31%         20%     41%     34%     34%     26%
Sugars                2.7         0.0     3.5     4.7     6.1     1.6
Butter                47%         26%     42%     45%     43%     28%
Table 7b Predicted mean intakes (g) or % who consume by cluster, men

Food Group            Cluster 1   2       3       4       5       6
Butter                47%         26%     42%     45%     43%     28%
Fat spreads           5.7         6.1     6.0     7.2     4.9     2.9
Whole milk            54%         53%     31%     47%     38%     31%
Skimmed milk          38          27      199     42      81      19
Dairy products        22          11      42      22      49      8.8
Eggs                  8.6         7.6     4.5     10      5.4     4.2
Fish                  0.0         2.5     10      24      12      3.6
White meat            24          26      60      27      21      36
Red meat              80          36      62      80      13      23
17
40%
75
65
655
36%
133
1
Water
Fruit juice
Alcohol
Soft drinks
Coffee/tea
Soups
Fast foods
Food Group Cluster
62
57%
115
48
688
35%
89
3
62
52%
114
50
686
47%
109
4
601398.8
41%66%19%
306774
642519
25474729
23%40%12%
5037183
652
Table 7c Predicted mean intakes (g)or % who consume by cluster, men
How Many Clusters? (1)
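The Bayesian Information Criterion (BIC) can be used to choose between models that differ in terms of number of clusters and/or clustering method.
Its use is analogous to the deviance for other regression models. It is calculated as:
$\mathrm{BIC} = -2\,LL + n_{\mathrm{par}} \log N$
where LL is the log likelihood, $n_{\mathrm{par}}$ the number of parameters, and N the sample size.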
How many clusters? (2)
Choose the model with the smallest value of the BIC:
No. clusters   log like   BIC      No. parameters
1              -126392    253559   102
2              -124587    250648   194
3              -124069    250312   286
4              -123640    250154   378
5              -123374    250321   470
The 4-pattern model is the preferred one of this type.
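The BIC calculation can be sketched and checked against the table. Since the value of N used for this table is not stated on the slide, the sketch does not assume one; instead it recovers the penalty per parameter (the log N term) implied by each row, which should be the same throughout if one N was used. Function names are illustrative.

```python
import math

def bic(log_lik, n_params, n_obs):
    """BIC = -2 LL + npar * log N."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

# (no. clusters, log likelihood, BIC, no. parameters) as listed in the table.
ROWS = [
    (1, -126392, 253559, 102),
    (2, -124587, 250648, 194),
    (3, -124069, 250312, 286),
    (4, -123640, 250154, 378),
    (5, -123374, 250321, 470),
]

def implied_log_n(log_lik, bic_value, n_params):
    """Rearranged formula: log N = (BIC + 2 LL) / npar."""
    return (bic_value + 2.0 * log_lik) / n_params

penalties = [implied_log_n(ll, b, p) for _, ll, b, p in ROWS]
preferred = min(ROWS, key=lambda row: row[2])[0]  # cluster count with smallest BIC
```

Every row implies a penalty of about 7.6 per parameter, so the table is internally consistent with the formula. The 5-cluster model has the highest log likelihood, but its 92 extra parameters cost more than the gain in fit, which is why the 4-cluster model has the smallest BIC.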
Intervention studies
food or nutrient based RCTs have been ineffective
e.g. beta-carotene and lung cancer/CVD, fat/fibre and colorectal adenomas
dietary patterns based interventions may be more effective
e.g. 60% reduction in total mortality among those randomised to a Mediterranean diet in the Lyon Diet Heart Study