Post on 05-Jan-2016


Miracles can be achieved by mixture modelling of messy data.

Chromopainter/FineSTRUCTURE/Globetrotter

Mixture modelling of an English palette

How does the information from metagenomics compare with what we would know if we had full genome sequences?

Qizhi Cao, Jianzhong Zhang, China CDC, Xavier Didelot, Imperial College

Not quite a metagenomics experiment

18 Helicobacter pylori sequenced from the same biopsy.

All of them were different. 2 clades = 2 infections.

Reconstruction of recombination and mutation using ClonalFrame.

Years in past

Average import size: 940 bp

Interspersions in recombination events

Functional characterization of recombination events

Proportion of genes recombined

Ancestral sequences show interactions between strains in the past and suggest a longer mixed infection.

Classical metagenomic questions

How many different infections?

What proportion of the population does each infection account for?
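The proportion question is the classic mixture-modelling step. The sketch below is not the method used in the talk; it is a minimal expectation-maximisation (EM) illustration that assumes we already have, for every read, the likelihood of that read under each candidate strain. The function name `estimate_proportions` and the input layout are hypothetical.

```python
def estimate_proportions(likelihoods, n_iter=200, tol=1e-10):
    """Estimate strain proportions by EM.

    likelihoods[i][k] is assumed to be P(read i | strain k).
    Returns one proportion per strain, summing to 1.
    """
    n_strains = len(likelihoods[0])
    props = [1.0 / n_strains] * n_strains  # start from equal proportions
    for _ in range(n_iter):
        counts = [0.0] * n_strains
        for row in likelihoods:
            # E-step: share each read among strains by posterior weight.
            weighted = [p * l for p, l in zip(props, row)]
            total = sum(weighted)
            if total == 0.0:
                continue  # read is incompatible with every strain; skip it
            for k, w in enumerate(weighted):
                counts[k] += w / total
        norm = sum(counts)
        if norm == 0.0:
            return props  # no informative reads at all
        # M-step: new proportions are the normalised expected read counts.
        new_props = [c / norm for c in counts]
        converged = max(abs(a - b) for a, b in zip(props, new_props)) < tol
        props = new_props
        if converged:
            break
    return props
```

Reads that fit only one strain pin the proportions down directly; ambiguous reads are shared out according to the current estimate, which is why the procedure has to iterate.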

Microevolutionary questions

What diversification has each infection undergone?

What is the functional effect of diversification?

Heroic questions

How many infections were there in the past?

Ecological questions

Which strains thrive in the same environment?

Which strains thrive in the presence of each other?

Which strains competitively exclude one another?

What determines patterns of "succession"?

What determines infection rates by different strains?

Ecological genetic questions

How do strains adapt to new environments?

Fran Colles, University of Oxford

Campylobacter in chickens

Epidemiology of infection amongst a free-range broiler breeder flock: two stages of infection

Colles et al. (2011) PLoS One 6(12):e22825

A rapid turnover of Campylobacter STs amongst individual birds

[Figure: Campylobacter sequence types isolated from individual chickens (rows labelled 2 to 10) by week of the year (weeks 8 to 52, then 1 to 6). Legend, as far as the extraction allows: sequence types 49, 51, 53, 443, 538, 573, 574, 586, 607, 661, 814, 827, 828, 945, 958, 1089, 1090, 1257, 1487, 1764, 3120, plus "Negative".]

Colles et al, Unpublished

Clonal complexes isolated from a broiler breeder flock over time

[Figure: stacked percentage chart (0% to 100%) of clonal complexes by sampling date, from January through to the following February. Legend: ST-828, ST-692, ST-661, ST-607, ST-574, ST-573, ST-49, ST-45, ST-443, ST-257, ST-21, ST-1287, ST-1275 and ST-1150 complexes.]

Colles et al, Unpublished

Strains are sampled from a population...

Vibrio parahaemolyticus with Yujun Cui and Ruifu Yang

All the strains in CG1

The strains in CG1+S093

The strains in CG1+S093+CG2

The strains in CG1+S093+CG2+1 unrelated strain

SNP density in 1 kbp windows
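As an illustration of the windowed SNP density shown on these slides, here is a minimal sketch. It assumes SNPs are given as 0-based positions on a single reference sequence; the function name `snp_counts_per_window` is hypothetical and the slides do not say how the original plots were produced.

```python
def snp_counts_per_window(snp_positions, genome_length, window=1000):
    """Count SNPs in consecutive non-overlapping windows.

    snp_positions: iterable of 0-based SNP coordinates (assumed input format).
    Returns a list with one count per window; the last window may be
    shorter than `window` if genome_length is not a multiple of it.
    """
    n_windows = (genome_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for pos in snp_positions:
        if 0 <= pos < genome_length:  # ignore out-of-range coordinates
            counts[pos // window] += 1
    return counts


# Example: four SNPs on a 3 kbp sequence.
counts = snp_counts_per_window([5, 999, 1000, 2500], genome_length=3000)
```

Dividing each count by the window length gives SNPs per base. Windows with very low density between two strains point to recent shared ancestry for that stretch of the genome.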

Oceanic gene pools

Amongst 53 unrelated strains, strong non-random associations between loci are almost entirely due to close genetic linkage
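Non-random association between two loci is commonly summarised by the squared correlation r² of their alleles. The slides do not say which statistic was used, so the sketch below is only one standard choice. It assumes haploid genomes and biallelic loci coded as 0/1 per strain; `r_squared` is a hypothetical helper name.

```python
def r_squared(locus_a, locus_b):
    """Squared allelic correlation between two biallelic loci.

    locus_a, locus_b: equal-length lists of 0/1 alleles, one per strain.
    Returns 0.0 if either locus is monomorphic (r^2 is undefined there).
    """
    n = len(locus_a)
    p_a = sum(locus_a) / n  # frequency of allele 1 at locus A
    p_b = sum(locus_b) / n  # frequency of allele 1 at locus B
    p_ab = sum(a * b for a, b in zip(locus_a, locus_b)) / n  # both carry 1
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0
    return d * d / denom
```

Computing this for every pair of loci and plotting it against the distance between them shows whether strong associations are confined to closely linked sites, which is the pattern described above.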

Not all populations are that simple...

Host-restricted and multihost lineages of C. jejuni

[Figure: C. jejuni lineages labelled 952, 22, 42, 45, 177, 682, 48, 1275, 661, 692, 61, 206, 354, 257, 1034, 574 and 21.]

Multihost lineages of C. coli and C. jejuni

Association study. Method: word analysis. ST-45 complex

number  word                            host1 (total=14)  host2 (total=9)  p-value   Info.                Locus
>388    GTTTAAAATTATTTAAATAGAAAGATATTT  4                 9                1.50E-06  Partial match found  CAMP0030
>49     CCATCGATTAAATATAACTTACTTTATCAT  5                 9                4.30E-05  Partial match found  CAMP0043
>6840   TAAAGCTTTAAAAAGTATTTTGTTTAAAAT  5                 9                4.30E-05  Partial match found  CAMP0059
>1130   GATTTATGATTTCAAAAAATTTTCAATAAA  2                 6                4.30E-05  Partial match found  CAMP0092
>1371   TATTTAAATAGAAAGATATTTTATGAAAAA  4                 9                1.50E-06  Partial match found  CAMP0116

[Figure not recovered: numbered labels and a 0.01 scale marker.]

9034 host-associated words in the ST-45 complex

These map to 99 genes in total, but 76% of the words map to 10 contiguous genes (region 3)
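The word analysis summarised above can be sketched roughly as follows. The words in the table are 30 bases long, so the sketch uses k = 30 by default. Everything else is an assumption: the slides do not state which test produced the p-values, so a two-sided Fisher's exact test on presence/absence counts is used here as a stand-in, and it is not expected to reproduce the tabulated values. The function names are hypothetical.

```python
from math import comb


def kmers(seq, k):
    """Set of all words of length k present in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


def host_associated_words(host1_genomes, host2_genomes, k=30, alpha=0.01):
    """Words whose presence differs between two host groups (assumed test)."""
    sets1 = [kmers(g, k) for g in host1_genomes]
    sets2 = [kmers(g, k) for g in host2_genomes]
    results = []
    for word in sorted(set().union(*sets1, *sets2)):
        n1 = sum(word in s for s in sets1)  # host1 genomes carrying the word
        n2 = sum(word in s for s in sets2)  # host2 genomes carrying the word
        p = fisher_exact_two_sided(n1, len(sets1) - n1, n2, len(sets2) - n2)
        if p <= alpha:
            results.append((word, n1, n2, p))
    return results
```

With thousands of overlapping words tested, the raw p-values would need a multiple-testing correction before any word is called host associated. Mapping the significant words back to genes is what yields summaries like the one above, where most words fall in a single contiguous region.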