finding dietary patterns in the uk - uspmarlyac/palestra.pdf · • vegetables • fruit/nuts ......

Finding Dietary Patterns in the UK

Michael Fahey, Nutritional Epidemiology Group

In collaboration with:

Andy Coward, Stable Isotopes Group

Chris Thane, Gemma Bramwell, Nutritional Epidemiology Group

Objectives

To find dietary patterns in food consumption that:

1) are representative of the UK population

2) are interpretable and valid

3) can be used to create test meals

Research steps

Random sample of population food groups

Multivariate statistical analysis

Interpret patterns by examining average consumption of the food groups

Divide sample into mutually exclusive sub-groups that have different dietary profiles

Motivation

Nutritional data is massive and messy

Foods eaten together have interactive effects on the bioavailability of nutrients, and possibly on disease risk

Association between a single food and disease incidence is difficult to estimate accurately, e.g. need to adjust for intakes of many other foods

Effect of measurement error is not well understood when more than one food is involved

What is a dietary pattern?

Two different approaches can be found in the epidemiological literature:

1) Create an index based upon existing knowledge and score individuals accordingly, e.g. the Healthy Eating Index

2) Use multivariate statistical methods to find patterns that exist in data

These approaches differ with respect to the emphasis that is placed upon uncovering what people actually eat, and to the extent that a priori information is used.

Data

National Diet and Nutrition Survey of Adults (NDNS), 2000-01

population-based survey with representative sampling of 1724 adults (958 women and 766 men)

seven-day weighed diet records used to measure consumption of all food items

food items were grouped to create 25 food variables measuring consumption of major food groups

log of daily food intakes (g) used for 20 food groups, and a dichotomous variable (consumer or not) used for five food groups for which food non-consumption was >=50%
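The coding rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `encode_food_group` is hypothetical, and the slides do not say how zero intakes were handled before taking logs, so the 1 g offset used here is an assumption.

```python
import math


def encode_food_group(daily_intakes_g, cutoff=0.5, offset_g=1.0):
    """Code one food group as either a log intake or a consumer indicator.

    cutoff   -- share of non-consumers at or above which the group is
                treated as dichotomous (>= 50% in the slides)
    offset_g -- ASSUMPTION: grams added before the log so that zero
                intakes are defined; not stated in the slides
    """
    n = len(daily_intakes_g)
    non_consumers = sum(1 for g in daily_intakes_g if g <= 0)
    if non_consumers / n >= cutoff:
        # Mostly non-consumers: keep only "consumer or not".
        return "dichotomous", [1 if g > 0 else 0 for g in daily_intakes_g]
    # Otherwise use the log of the daily intake in grams.
    return "log", [math.log(g + offset_g) for g in daily_intakes_g]
```

For example, `encode_food_group([0, 0, 12.5, 0])` yields a consumer indicator, because three of the four hypothetical individuals are non-consumers.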

Food Groups

• Vegetables
• Fruit/nuts
• Potatoes
• Wholegrain cereals
• Refined cereals
• Pasta & rice
• Snack foods = (biscuits, cakes, crisps, chocolate)
• Puddings*
• Sugars
• Butter*
• Fat spreads
• Whole milk*
• Skimmed milk
• Dairy products
• Eggs
• Fish (excludes deep-fried fish)
• White meat
• Red meat
• Fast foods = (burgers, fish & chips, pizza, kebabs)
• Soups*
• Coffee & tea
• Soft drinks
• Alcohol
• Fruit juice*
• Water

* the five food groups coded as a dichotomous variable (consumer or not)

Table 1 Sample Characteristics by Sex, NDNS Adults 2000-1

Characteristic                   Women (N=958)   Men (N=766)
Age (%)
  18-24                                8               8
  25-34                               22              21
  35-54                               51              51
  55-64                               19              20
Region (%)
  London & South East                 29              32
  Central, South West & Wales         37              36
  Scotland & North                    34              32
Smoker (%)                            33              31
BMI >=25 (%)                          51              64

Rationale for statistical method

If there exist sub-groups in a sample of individuals that are distinguished by their dietary profiles, then we would expect the probability distributions describing their joint food consumption to be different.

A finite mixture model (FMM) is well-suited to finding such sub-groups

has characteristics in common with both conventional cluster analysis and factor analysis

like cluster analysis, “clusters” of similar individuals are found

the underlying statistical theory is based upon latent variables, like factor analysis

Probabilistic Classification (1)

• An FMM analyses the mix of sub-groups in the entire sample, and uncovers the likely sub-groups by estimating their probability density functions

• Requires estimating probability distributions

• E.g. for a single continuous variable, $Y$, we need to estimate $\mu_Y$ and $\sigma_Y^2$

• Then the probability density for a univariate normal distribution, $f(Y \mid \mu_Y, \sigma_Y)$, can be used to estimate probabilities

Probabilistic Classification (2)

• For $Y = (Y_1, Y_2, \ldots, Y_K)$ continuous food variables, we need to estimate $\mu_Y$ and $\sigma_Y^2$ for each of the $K$ variables

• In general, one estimates the variances and covariances among the elements of $Y$, represented by the matrix $\Sigma_Y$

• The function $f(Y \mid \mu_Y, \Sigma_Y)$ describes the multivariate probability density
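For reference, the two densities named on these slides can be written out in full. These are the standard univariate and multivariate normal forms, shown with the slides' symbols; $K$ is the number of food variables and $|\Sigma_Y|$ is the determinant of the covariance matrix.

```latex
f(Y \mid \mu_Y, \sigma_Y)
  = \frac{1}{\sigma_Y \sqrt{2\pi}}
    \exp\!\left( -\frac{(Y - \mu_Y)^2}{2\sigma_Y^2} \right)

f(Y \mid \mu_Y, \Sigma_Y)
  = (2\pi)^{-K/2} \, |\Sigma_Y|^{-1/2}
    \exp\!\left( -\tfrac{1}{2} (Y - \mu_Y)^{\top} \Sigma_Y^{-1} (Y - \mu_Y) \right)
```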

The Finite Mixture Model 1

The model expresses the probability density of our observed data, $f_o$, as a mixture of $K$ multivariate normals (here $K$ counts the sub-groups, not the food variables):

$f_o = \pi_1 f_1 + \pi_2 f_2 + \ldots + \pi_K f_K$

Each of the $f_k$ above is a probability density function, e.g. a function describing a bell-shaped curve.

Each $f_k$ represents one dietary pattern.

The $\pi_k$ denote the proportion of the sample belonging to each pattern.

Example mixture: two sub-groups

This mixture can be decomposed into the sum of two normal distributions:

$f_o = 0.75\,N(0,1) + 0.25\,N(4,1)$

[Figure: density curve of the mixture; horizontal axis marked at 0 and 4]
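The example mixture can be evaluated numerically with a short sketch. The function names are illustrative only, and N(0,1) and N(4,1) are read as mean and variance, so both standard deviations are 1.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a univariate normal distribution at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# (mixing proportion, mean, standard deviation) for each sub-group
COMPONENTS = [(0.75, 0.0, 1.0), (0.25, 4.0, 1.0)]


def mixture_pdf(y, components=COMPONENTS):
    """Mixture density: proportion-weighted sum of component densities."""
    return sum(p * normal_pdf(y, m, s) for p, m, s in components)
```

Evaluating `mixture_pdf` on a grid of values between about -3 and 7 traces the two-humped curve, with the taller hump at 0 and the smaller one at 4.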

The Finite Mixture Model 2

An FMM is a probabilistic (or model-based) method

Mutually exclusive sub-groups are created by assigning each individual to the one giving rise to their data with greatest probability
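A minimal sketch of this assignment rule, using the two-group example mixture 0.75 N(0,1) + 0.25 N(4,1) in one dimension. The membership probabilities follow from Bayes' rule; the function names are illustrative, not taken from any particular software.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a univariate normal distribution at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def membership_probs(y, components):
    """P(sub-group k | y) = pi_k f_k(y) / sum_j pi_j f_j(y)."""
    weighted = [p * normal_pdf(y, m, s) for p, m, s in components]
    total = sum(weighted)
    return [w / total for w in weighted]


def assign(y, components):
    """1-based index of the sub-group with the greatest probability."""
    probs = membership_probs(y, components)
    return max(range(len(probs)), key=lambda k: probs[k]) + 1


EXAMPLE = [(0.75, 0.0, 1.0), (0.25, 4.0, 1.0)]  # (proportion, mean, sd)
```

With these values an observation at 2.2 is assigned to the first sub-group and one at 2.3 to the second: the boundary sits slightly to the right of the midpoint because the first sub-group is three times as common.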

Probabilistic classification 3

Let’s assume that three sub-groups or clusters were identified. The largest probability in each row (marked *) indicates the cluster to which each of these three individuals would be assigned.

Assignment probabilities

Individual   cluster 1   cluster 2   cluster 3
1            0.40*       0.30        0.30
2            0.20        0.75*       0.05
3            0.05        0.10        0.85*
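The assignment shown in this table, together with the uncertainty it leaves behind, can be sketched as follows. The probabilities are the three rows from the slide; taking uncertainty as one minus the largest probability is a common convention and an assumption here, not something the slides define.

```python
# Assignment probabilities for the three individuals on the slide.
ASSIGNMENT_PROBS = {
    1: [0.40, 0.30, 0.30],
    2: [0.20, 0.75, 0.05],
    3: [0.05, 0.10, 0.85],
}


def classify(probs):
    """Return (1-based cluster with largest probability, uncertainty).

    ASSUMPTION: uncertainty is measured as 1 - max probability.
    """
    best = max(range(len(probs)), key=lambda k: probs[k])
    return best + 1, 1.0 - probs[best]


for person, probs in ASSIGNMENT_PROBS.items():
    cluster, uncertainty = classify(probs)
    print(f"individual {person}: cluster {cluster}, uncertainty {uncertainty:.2f}")
```

Individual 1 is placed in cluster 1 but with considerable uncertainty, whereas individual 3 is placed in cluster 3 with little.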

Advantages of Model-based Methods

• Can use model fit criteria to identify the number of patterns or clusters

• Can take account of covariates in the pattern identification step

• Can predict the uncertainty in the classification of individuals to patterns

Need for flexible assumptions

Scientists are sometimes described as being expert at solving problems that are convenient to solve, like the drunk looking for keys in the light of the lamp post, but not in the dark where they were dropped.

when knowledge is weak about the shape and size of unknown clusters it is preferable not to make strong assumptions

there are very efficient methods to find clusters that are circles of same size (in two dimensions) – these approaches may not be informative

Model Fitting 1

Compare models making different assumptions about the size and shape of clusters:

A) Clusters are (hyper) spheres and are the same size (conventional clustering methods)

B) Clusters are (hyper) ellipses and are the same size

C) Clusters are (hyper) ellipses and of different sizes

Model Fitting 2

A goodness of fit index (BIC) is used to choose which of the 3 models is best

The BIC is also used to choose the number of clusters (smaller values are better)

Food intakes were adjusted for age and total energy

Model Choice 1

• Begin by fitting a solution with only one cluster for model type (A)

• Refit the model to have additional clusters, stopping at a maximum of 6 clusters

• Repeat above steps for model types (B) and (C)

• Choose the model with the smallest BIC from the 18 possible alternatives

Model Fit (A): same sized spheres

Choose the model with the smallest value of the BIC:

No. clusters   BIC     No. parameters
1              81107   176
2              79980   208
3              79500   240
4              79190   272
5              79014   304
6              78818   336

The 6-cluster solution is the preferred one of this type.

Model Fit (B): same sized ellipses



No. clusters   BIC     No. parameters
1              78605   195
2              77639   227
3              76803   259
4              76482   291
5              76703   323
6              -       -


Model Fit (C): ellipses of different sizes



No. clusters   BIC     No. parameters
1              78605   195
2              76343   247
3              75189   299
4              73657   351
5              74541   403
6              -       -


Model Choice: summary

Model fit was poor using the conventional clustering assumptions and leads one to choose too many clusters

Of the alternative models examined, fit was best when clusters were allowed to be ellipses of different sizes

A model of this type with 4 clusters was the best among all alternatives

Next step: interpret each cluster’s dietary pattern by examining average food consumption
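The selection rule can be checked against the BIC values reported on the three Model Fit slides. The sketch below simply takes the smallest BIC over all fitted alternatives; the 6-cluster entries for types (B) and (C) are omitted because no value is reported for them, and the dictionary layout is an illustrative choice.

```python
# BIC by model type and number of clusters, as read from the Model Fit slides.
BIC = {
    "A": {1: 81107, 2: 79980, 3: 79500, 4: 79190, 5: 79014, 6: 78818},
    "B": {1: 78605, 2: 77639, 3: 76803, 4: 76482, 5: 76703},
    "C": {1: 78605, 2: 76343, 3: 75189, 4: 73657, 5: 74541},
}


def best_model(bic_table):
    """Return (model type, no. clusters, BIC) with the smallest BIC."""
    candidates = [
        (bic, model, k)
        for model, by_k in bic_table.items()
        for k, bic in by_k.items()
    ]
    bic, model, k = min(candidates)
    return model, k, bic


def best_within_type(bic_table, model):
    """Number of clusters with the smallest BIC for one model type."""
    return min(bic_table[model], key=bic_table[model].get)
```

Under type (A) the BIC is still falling at 6 clusters, which is the sense in which the conventional assumptions lead one to choose too many clusters; types (B) and (C) both reach their minimum at 4.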

Radar Graphs

• Show how the dietary patterns (clusters) differ in terms of food intakes

• The plotted points, for each food group, are calculated as:

100 x (predicted mean / grand mean)

• Food groups are ordered clockwise: plants, refined, fats, dairy, animal, sauces, drinks
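The plotted value is a simple ratio, sketched below. The function name and the numbers in the usage note are made up for illustration; a value of 100 means the cluster's predicted mean equals the grand mean for that food group.

```python
def radar_points(predicted_means, grand_means):
    """100 x (predicted mean / grand mean) for each food group."""
    return {
        food: 100.0 * predicted_means[food] / grand_means[food]
        for food in predicted_means
    }
```

For instance, with a hypothetical cluster mean of 117 g against a hypothetical grand mean of 78 g, `radar_points({"Vegetables": 117.0}, {"Vegetables": 78.0})` gives 150 for vegetables, i.e. 50% above the sample average.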

Cluster 1, women

Cluster 2, women

Cluster 3, women

Cluster 4, women

[Radar graphs, one per cluster. Radial axis: 0 to 250. Spokes in clockwise order: Vegetables, Fruit/nuts, Potatoes, Wholegrain cereals, Refined cereals, Pasta/rice, Snack foods, Puddings*, Sugars, Butter*, Fat spreads, Whole milk*, Skimmed milk, Dairy, Eggs, Fish, White meat, Red meat, Fast foods, Soups*, Coffee/tea, Soft drinks, Alcohol, Fruit juice*, Water]

Table 3 Cluster Characteristics, Women

Characteristic                    Cluster 1   Cluster 2   Cluster 3   Cluster 4
Size of cluster (% of sample)     33          29          22          16
Median Age (yrs)                  38          46          42          39
Median Energy (MJ/d)              6.82        6.89        7.18        6.33
EI/BMR <=1.35 (%)                 72          75          66          75
BMI >=25 (%)                      51          54          51          45
Non-manual Occupation (%)         55          81          69          65
Smoker (%)                        41          15          37          39
London & South East (%)           22          33          29          39

Cluster 1, men

Cluster 2, men

Cluster 3, men

Cluster 4, men

Cluster 5, men

Cluster 6, men

[Radar graphs, one per cluster. Radial axis: 0 to 250. Spokes in clockwise order: Vegetables, Fruit/nuts, Potatoes, Wholegrain cereals, Refined cereals, Pasta/rice, Snack foods, Puddings*, Sugars, Butter*, Fat spreads, Whole milk*, Skimmed milk, Dairy, Eggs, Fish, White meat, Red meat, Fast foods, Soups*, Coffee/tea, Soft drinks, Alcohol, Fruit juice*, Water]

Table 4 Cluster Characteristics, Men

Characteristic                    Cluster 1   Cluster 2   Cluster 3   Cluster 4   Cluster 5   Cluster 6
Size of cluster (% of sample)     28          20          19          14          11          8
Median Age (yrs)                  41          39          43          45          52          36
Median Energy (MJ/d)              9.67        8.80        10.00       10.24       10.26       8.72
EI/BMR <=1.35 (%)                 61          76          61          53          46          73
BMI >=25 (%)                      59          60          71          65          72          65
London & South East (%)           25          21          40          36          42          43
Non-manual Occupation (%)         44          32          74          63          71          49
Smoker (%)                        38          50          22          18          14          21

Table 5 Vitamin K intake by cluster

Mean daily vitamin K intake (µg) by cluster

Gender       Cluster 1   Cluster 2   Cluster 3   Cluster 4   Cluster 5   Cluster 6
Women        62          92          71          67          -           -
(95% CI)     (58,67)     (86,99)     (64,78)     (59,74)
Men          72          62          90          85          125         83
(95% CI)     (67,77)     (55,68)     (83,97)     (75,94)     (100,150)   (56,110)

Pattern Validation

Validation is an important aspect

In large epidemiologic studies split-sample validation is convenient; smaller studies could use cross-validation and/or re-sampling

Among men, patterns similar to 3 of the female clusters were found

Average classification probabilities were very high (>= 0.93) for all clusters

Conclusions

Four dietary patterns were found among women and their diets could be labelled: 1) “convenience”, 2) “cosmopolitan”, 3) “animal oriented”, 4) “moderate eaters”.

Finite mixture modeling is a flexible approach to pattern identification

Variables of different scale types can be included, covariate adjustment is possible, and assumptions about the nature of clusters can be examined explicitly

One disadvantage is that the method is heavily parameterised

Future Work

choice of food groups is important and arbitrary

improved definition of fast and snack food groups

different food groups may be needed for different populations, e.g. the elderly

inclusion of certain foods for which there are truly many non-consumers suggests using a different probability approach

use of other dietary assessment methods, e.g. FFQs, suggests allowing correlated measurement error among foods

Alcohol non-consumption

Next: destination Cuba!

Comparison with previous work

Pryer’s analysis of 1986-7 NDNS Adults

Pryer probably found more clusters

Pryer does not report a low consuming pattern

Pryer’s “healthy” female pattern has above average intakes of red and white meat

Comparison with previous work

There are important methodological differences between Pryer’s work and ours.

Pryer was restricted to finding clusters of same sized “circles”

FMM yields exhaustive classification

Pryer’s analysis gave less weight to beverages owing to standardisation

Pryer’s is a greater mix of patterns in dietary composition and body size/energy requirements

classification by FMM is probabilistic not deterministic

Association between foods and patterns

Association between a food group and a pattern can be quantified by the amount of variation in intake explained by pattern membership ($R^2$). The six greatest values of $R^2$ are listed below.

For women they were:

fish (55%), coffee & tea (27%), refined cereals (20%), wholegrain cereals (19%), skimmed milk (18%), fast food (16%)

For men they were:

fish (46%), coffee & tea (44%), fruit (31%), wholegrain cereals (27%), fast food (22%), red meat (19%)
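One way to obtain such a measure is the one-way analysis-of-variance ratio of between-cluster to total sum of squares, sketched below. This is an assumption about the computation: the slides do not say exactly how the reported values were derived (for example, whether the probabilistic memberships were used as weights), and the function name is illustrative.

```python
def r_squared(intakes, clusters):
    """Share of intake variation explained by cluster membership.

    ASSUMPTION: each individual has one hard cluster label, and the
    measure is between-cluster SS divided by total SS.
    """
    n = len(intakes)
    grand_mean = sum(intakes) / n
    total_ss = sum((y - grand_mean) ** 2 for y in intakes)

    # Collect the intakes of each cluster.
    by_cluster = {}
    for y, c in zip(intakes, clusters):
        by_cluster.setdefault(c, []).append(y)

    between_ss = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
        for ys in by_cluster.values()
    )
    return between_ss / total_ss
```

With the made-up intakes `[1, 3, 5, 7]` and labels `[1, 1, 2, 2]`, the two cluster means are 2 and 6 and the function returns 0.8, i.e. 80% of the variation is between clusters.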

Table 6a Predicted mean intakes (g) or % who consume by cluster, women

Food Group            Cluster 1   Cluster 2   Cluster 3   Cluster 4
Vegetables            56          117         72          59
Fruit/nuts            18          112         22          36
Potatoes              25          56          21          12
Wholegrain cereals    9.1         62          15          15
Refined cereals       56          21          61          16
Pasta/rice            10          24          18          18
Snack food            30          26          27          20
Puddings              33%         37%         25%         26%
Sugars                2.1         2.5         2.4         2.0

40%51%40%42%Butter

2.23.83.23.8Fat spreads

41%49%27%56%Whole milk

163919131Skimmed milk

18164216Dairy products

ClusterFood Group

27

20

0.0

4.4

1

27

21

13

4.2

2

8.447Red meat

1524White meat

4.719Fish

3.66.0Eggs

43

Table 6b Predicted mean intakes (g)or % who consume by cluster, women

27793790Fast foods

34%29%42%35%Soups

84541640639Coffee/tea

34493942Soft drinks

9.021348.8Alcohol

46%46%58%36%Fruit juice

1053119734Water

ClusterFood Group

1 2 43

Table 6c Predicted mean intakes (g)or % who consume by cluster, women

47%

2.7

31%

36

15

89

13

31

22

79

1

Butter

Sugars

Puddings

Snack food

Pasta/rice

Refined cereals

Wholegrain cereals

Potatoes

Fruit/nuts

Vegetables

Food Group Cluster

42%

3.5

41%

47

64

39

78

49

93

107

3

45%

4.7

34%

28

16

73

22

64

41

91

4

28%43%26%

1.66.10.0

26%34%20%

185019

33488.3

281990

11945.3

121019

101095.4

4211747

652

Table 7a Predicted mean intakes (g)or % who consume by cluster, men

80

24

0.0

8.6

22

38

54%

5.7

47%

1

Red meat

White meat

Fish

Eggs

Dairy products

Skimmed milk

Whole milk

Fat spreads

Butter

Food Group Cluster

62

60

10

4.5

42

199

31%

6.0

42%

3

80

27

24

10

22

42

47%

7.2

45%

4

231336

362126

3.6122.5

4.25.47.6

8.84911

198127

31%38%53%

2.94.96.1

28%43%26%

652

Table 7b Predicted mean intakes (g)or % who consume by cluster, men

17

40%

75

65

655

36%

133

1

Water

Fruit juice

Alcohol

Soft drinks

Coffee/tea

Soups

Fast foods

Food Group Cluster

62

57%

115

48

688

35%

89

3

62

52%

114

50

686

47%

109

4

601398.8

41%66%19%

306774

642519

25474729

23%40%12%

5037183

652

Table 7c Predicted mean intakes (g)or % who consume by cluster, men

How Many Clusters? (1)

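The Bayesian Information Criterion (BIC) can be used to choose between models that differ in terms of number of clusters and/or clustering method.

Its use is analogous to the deviance for other regression models. It is calculated as:

$\mathrm{BIC} = -2\,LL + n_{\mathrm{par}} \log N$

where $LL$ is the log-likelihood, $n_{\mathrm{par}}$ the number of parameters and $N$ the sample size.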
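As a small worked sketch, the criterion can be computed directly from a model's log-likelihood, its number of parameters and the sample size. The function name and the numbers in the usage note are hypothetical and are not taken from the analysis.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """BIC = -2 * log-likelihood + (no. parameters) * log(sample size)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)
```

With a made-up log-likelihood of -100, 5 parameters and 100 observations, `bic(-100.0, 5, 100)` is about 223.03. Adding parameters lowers the BIC only if the log-likelihood rises by more than half of log N per extra parameter.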

How many clusters? (2)


No. clusters   log like   BIC      No. parameters
1              -126392    253559   102
2              -124587    250648   194
3              -124069    250312   286
4              -123640    250154   378
5              -123374    250321   470

The 4-pattern model is the preferred one of this type.

Intervention studies

food or nutrient based RCTs have been ineffective

e.g. beta-carotene and lung cancer/CVD, fat/fibre and colorectal adenomas

dietary pattern based interventions may be more effective

e.g. 60% reduction in total mortality among those randomised to a Mediterranean diet in the Lyon Diet Heart Study

