Comparative analysis of genome methylation in Thermotogae isolates from deep-sea hydrothermal vents
Introduction
The phylum Thermotogae is characterized by extensive horizontal gene transfer (HGT). Highly similar genes are shared between genomes of different Thermotogae genera, with other phyla (Firmicutes) and even with other domains such as the Archaea [1]. Many of these organisms proliferate in hot, extreme environments such as oil fields and hydrothermal vents. How HGT functions in these ecosystems is unclear, but phages might act as transfer agents of genetic material. Thermotogae genomes contain CRISPR repeats, which are part of the defence machinery against phages. Another defence mechanism against phages is the restriction-modification system, and genes related to it are also found in several Thermotogae genomes. The restriction-modification system uses methyltransferase proteins to methylate bases of the host DNA. Under a phage attack, the system detects the non-methylated foreign DNA and uses restriction enzymes to degrade it. With the advancement of single-molecule, real-time (SMRT) sequencing it has become possible to detect N4-methylcytosine (m4C) and N6-methyladenine (m6A) bases in bacterial genomes. Here we use SMRT genome sequencing to compare the defence system set-up, including CRISPRs and base modifications, of four Thermotogae isolates from deep-sea hydrothermal vents, in order to understand their probable response to invading DNA.
Thomas Haverkamp1 ([email protected]), Lossouarn J2, Geslin C2, Nesbø CL1,3
Phylogenetic distance of Thermotogae isolates
Base modifications and DNA motifs
Table 1. Modification and motif analysis for four Thermotogae strains.

Strain (contigs)            Motif              Modification type  Motifs in genome  Fraction methylated motifs  Mean score  Mean IPD# ratio  Mean coverage
Marinitoga sp. 1137 (10)    TANCAY             m6A                9852              0.95                        104.2       4.42             82.4
                            GTNNAC             m6A                3532              0.92                        104.5       4.64             81.8
T. melanesiensis 431 (3)    GATC               m6A                5446              0.99                        72.6        4.60             48.3
                            RTAYNNNNNNTNNCG    m6A                520               0.95                        70.6        5.13             48.0
                            CGNNANNNNNNRTAY    m6A                520               0.94                        66.9        4.75             48.5
                            CCGG               m4C                2968              0.71                        45.8        3.63             49.1
                            CGCC               m4C                2462              0.62                        44.6        3.07             52.3
Thermosipho sp. 1063 (3)$   Not clustered      –                  3584808           0.09                        37.4        –                81.2
Thermosipho sp. 1070 (1)$   CNNNTNCNNTAANATNG  modified base      72                0.50                        41.3        2.60             39.9

Modification and motif analysis was performed using the RS modification and motif analysis v1 pipeline [2]. This pipeline maps the SMRT subreads to the HGAP-assembled genome and determines, for each base, the inter-pulse duration and the likelihood that the base is modified. m6A-methylated bases were identified along the entire chromosome sequences of Thermosipho melanesiensis 431 and Marinitoga sp. 1137.
#) IPD: inter-pulse duration.
$) Non-significant results.
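
The "motifs in genome" counts in Table 1 can be illustrated with a minimal Python sketch that counts occurrences of a degenerate (IUPAC) motif on both strands of a sequence. This is not the pipeline's own code: the function names are hypothetical, the toy sequence is invented, and counting each strand separately is an assumption about how such totals are obtained.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement table that also covers the degenerate codes.
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a (possibly degenerate) sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_on_strand(genome, motif):
    """Count motif matches on the given strand, including overlapping ones."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # A zero-width lookahead lets overlapping matches be counted.
    return sum(1 for _ in re.finditer(f"(?=(?:{pattern}))", genome.upper()))


def count_motif(genome, motif):
    """Count motif occurrences on both strands (assumed counting scheme).

    Reverse-strand sites are found by searching the forward sequence for
    the reverse complement of the motif, so a palindromic motif such as
    GATC is counted once per strand at every site.
    """
    forward = count_on_strand(genome, motif)
    reverse = count_on_strand(genome, reverse_complement(motif))
    return forward + reverse


if __name__ == "__main__":
    toy_genome = "GATCTAACACGATC"  # invented sequence, for illustration only
    for motif in ("GATC", "TANCAY"):
        print(motif, count_motif(toy_genome, motif))
```

Dividing the number of sites called as methylated by such a total would give a value comparable to the "fraction methylated motifs" column, under the same assumption about strand counting.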
Identification of prophage sequences
Figure 3. Chromosome maps for Thermosipho melanesiensis 431 and Marinitoga sp. 1137. Rings from inside to outside indicate: 1) chromosome position; 2) GC content; 3) GC skew; 4) rRNA operon; 5) annotated hypothetical genes; 6) prophage regions. Note: the chromosome of Marinitoga sp. 1137 is based on Mauve-ordered and concatenated contigs. Prophages were identified using the PHAST website [6]. rRNA genes, hypothetical genes, and genes found in prophage-assigned genomic regions were extracted using CLC workbench. Each category of genes was then matched against the chromosome / contigs using BlastN and visualized with BRIG. The chromosome sequences of strains 1063 and 1070 were analysed using PHAST and were found not to contain any prophage regions.
Conclusions
The analysis of four different Thermotogae genomes and their methylation patterns shows clear differences between the methylated and non-methylated genomes. First, the methylated genomes of T. melanesiensis 431 and Marinitoga sp. 1137 contain prophage elements. Second, the methylated genomes also carry genes coding for the restriction-modification system, which is known as another phage defence system. Although all four genomes contained CRISPR regions and CRISPR-associated genes, the composition of the CRISPR-associated genes differed between the non-methylated and methylated genomes. At this stage it is unclear how these differences in defence systems affect the populations of these bacteria and how they support or suppress the process of HGT.
References
1. Zhaxybayeva et al., 2009. PNAS 106: 5865-5870.
2. Methylome Analysis Technical Note: http://tinyurl.com/mfo74u4
3. Makarova et al., 2011. J. Bacteriology 193: 6039-6056.
4. Grissa et al., 2007. Nucleic Acids Res. 35: W52-W57.
5. Makarova et al., 2011. Nature Rev. Microbiology 9: 467-477.
6. Zhou et al., 2011. Nucleic Acids Res.: 1-6.
Affiliations
1. Centre for Ecological and Evolutionary Synthesis, Dept. of Biosciences, University of Oslo, Oslo, Norway.
2. Laboratory of Microbiology of Extreme Environments (LMEE), UMR 6197/CNRS/UBO IUEM, Plouzane, France
3. Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada
Characteristics of CRISPR elements
Table 2. CRISPR repeat elements and typing of CRISPR-associated genes.

Strain     CRISPR locus              CRISPR type#  Positions (bp)     Spacers (n)  Typical repeat sequence
Tm 431     Crispr-1                  -             137986 - 138686    9            GTTTCTACCTTACCTTGGAGGAATTGAAAC
           Crispr-2                  I-B           360035 - 360533    7            ATTTCAATTCCTCCAAGGTAAGGTAAAAAC
           Crispr-3                  -             754395 - 758275    55           ATTTCTATTCCTCATAGGTAGATTCTAAAC
           Crispr-4                  III-B         1638809 - 1639894  15           GTTTAGAATCTACCTATGAGGAATGGAAAC
           Crispr-5                  III-B         1651157 - 1651956  12           GTTTCCATTCCTCATAGGTAGATTCTAAAC
Ts 1063    Crispr-1                  -             125563 - 126130    8            GTTTCCATTCCTCATAGGTATGTTCTAAAC
           Crispr-2                  -             306496 - 310808    60           GTTAAAAAACCTAATTCCATAAATGGAATTCAAAC
           Crispr-3                  -             624605 - 625388    11           GTTTAGAACATACCTATGAGGAATGGAAAC
           Crispr-4                  -             909396 - 911235    27           GTTTCCATTCCTCATAGGTATGTTCTAAAC
Ts 1070    Crispr-1                  -             125564 - 126342    11           GTTTCCATTCCTCATAGGTATGTTCTAAAC
           Crispr-2                  -             306632 - 310795    57           GTTAAAAAACCTAATTCCATAAAATGGAATTCAAAC
           Crispr-3                  -             625089 - 626267    11           GTTTAGAACATACCTATGAGGAATGGAAAC
           Crispr-4                  -             923880 - 925386    22           GTTTCCATTCCTCATAGGTATGTTCTAAAC
Mar 1137   C23* - Crispr-1           -             336790 - 337823    13           GTTTCTATCTCTTTCAGAGAGCAGTTATATTCGGAT
           C23 - Crispr-2            III-B         349422 - 351458    26           GTTTCTATCTCTTTCAGAGAGTAGTGATATTCGGAT
           C23 - Crispr-3            III-B         367928 - 370483    33           GTTTCTATCTCTTTCAGAGAGTAGTGATATTCGGAT
           C23 - Crispr-4            I-B           471993 - 472356    5            ATTTACATTCCAATATGGATTATTAAAGAC
           C26 - Possible Crispr-5$  -             285006 - 285093    1            TTTGTAATTTTACCTTGGACACTCTGCGAG

CRISPRs were identified using CRISPRfinder [4]. To determine the CRISPR type, we screened the genome region around each CRISPR for the presence of CRISPR-associated genes and compared the gene order with the CRISPR types proposed in [5]. We found evidence for CRISPR types I-B and III-B in only two of the isolates, Tm 431 and Mar 1137. For the genomes of Ts 1063 and Ts 1070 we could not conclusively identify the CRISPR system based on the classification scheme used [5], indicating that the CRISPR operating mechanism could be different.
#: CRISPR type was identified using the classification scheme proposed by [5]. A dash means that a classification could not be established.
*: Contig ID for the Marinitoga 1137 genome.
$: This repeat sequence was identified by CRISPRfinder and indicated as a possible CRISPR repeat.
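
Because a CRISPR array can be reported from either DNA strand, repeats listed in Table 2 may be reverse complements of one another. The following Python sketch groups the Tm 431 repeats by a strand-independent form; it is an illustration only, not part of the CRISPRfinder analysis. The helper names are hypothetical, and requiring exact identity is a simplifying assumption, so near-identical repeats are not merged.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def canonical(seq):
    """Strand-independent form: the smaller of the sequence and its reverse complement."""
    seq = seq.upper()
    return min(seq, reverse_complement(seq))


def group_repeats(repeats):
    """Group locus names whose repeats are identical on either strand."""
    groups = {}
    for locus, seq in repeats.items():
        groups.setdefault(canonical(seq), []).append(locus)
    return groups


# Typical repeat sequences for Tm 431, copied from Table 2.
TM431_REPEATS = {
    "Crispr-1": "GTTTCTACCTTACCTTGGAGGAATTGAAAC",
    "Crispr-2": "ATTTCAATTCCTCCAAGGTAAGGTAAAAAC",
    "Crispr-3": "ATTTCTATTCCTCATAGGTAGATTCTAAAC",
    "Crispr-4": "GTTTAGAATCTACCTATGAGGAATGGAAAC",
    "Crispr-5": "GTTTCCATTCCTCATAGGTAGATTCTAAAC",
}

if __name__ == "__main__":
    for loci in group_repeats(TM431_REPEATS).values():
        print(", ".join(loci))
```

A tolerant comparison (for example, allowing one or two mismatches) would be needed to relate repeats that differ by single bases.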
Acknowledgements
We thank the Norwegian Sequencing Platform at the University of Oslo for sequencing our samples and support with the bioinformatic analysis.
Defence system genes
Figure 2. Phage defence system genes in four Thermotogae isolates. To identify defence system proteins in each genome we used a curated database containing 132 COGs present in different prokaryotic defence systems [3]. For each genome we used BlastP (blast+ v2.2.28) to match all protein sequences against the defence systems COGs database, and extracted only those sequences with an e-value below 1.0e-20. To identify the restriction-modification system genes in the total data set, we checked the REBASE database annotations for T. melanesiensis BI429 and Marinitoga piezophila KA3 for reference. CRISPR genes were identified by their annotations (e.g. CRISPR-associated protein cas 1).
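
The e-value filtering step described for Figure 2 can be sketched in Python as follows. The sketch assumes BLAST tabular output with the default twelve columns (`-outfmt 6`), in which the e-value is column 11 and the bit score column 12. Keeping only the best-scoring hit per query is an added assumption, and the function name and the sample identifiers are hypothetical.

```python
import csv
import io

EVALUE_CUTOFF = 1.0e-20  # threshold stated in the Figure 2 caption


def filter_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return the best hit per query with an e-value below the cutoff.

    Expects the default 12-column BLAST tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= cutoff:
            continue  # keep only hits strictly below the cutoff
        # Assumption: one assignment per protein, chosen by highest bit score.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


if __name__ == "__main__":
    # Invented example rows; identifiers are placeholders, not real results.
    sample = (
        "prot_001\tCOG_A\t45.2\t310\t160\t4\t1\t305\t3\t309\t3e-85\t270\n"
        "prot_001\tCOG_B\t30.1\t150\t98\t5\t10\t155\t7\t152\t2e-22\t95.1\n"
        "prot_002\tCOG_C\t28.0\t120\t80\t3\t5\t120\t9\t125\t1e-05\t42.0\n"
    )
    for query, (subject, evalue, bitscore) in filter_hits(sample).items():
        print(query, subject, evalue, bitscore)
```

Reading from a string keeps the sketch self-contained; in practice the same function could be given the contents of a BLAST output file.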
[Figure 1: phylogenetic tree, image not reproduced. Taxa shown: Kosmotoga olearia TBF 19.5.1; Petrotoga mobilis SJ95; Marinitoga sp. 1137; Marinitoga piezophila KA3; Thermosipho sp. 1223; Thermosipho sp. 1074; Thermosipho sp. 1063; Thermosipho sp. 1070; T. africanus H17ap60334; T. africanus Ob7; T. melanesiensis 487, 430, 432, 434, 433, 431 and BI429. Scale bar: 0.1.]
Figure 1. Maximum likelihood phylogeny of the DNA-directed RNA polymerase beta subunit (rpoB) gene sequences from Thermotogae species used in our study and reference strains (425-491 bp). A 500 base pair alignment was used to construct the tree with PhyML (Seaview v4) with the GTR model and 1000 replicates. Numbers at the nodes indicate bootstrap values (only values above 70% are shown). Dots mark strains used in the present analysis. Green: closed genomes; red: contigs only. Black squares are reference genomes.