Genome insights for bacterial species definition - wdcm.org, Dr. Wen Zhang
Genome insights for Bacterial species definition
National Institute for Communicable Disease Control and Prevention,
Chinese Center for Disease Control and Prevention
Wen Zhang
2016-09
• Chinese Center for Disease Control and Prevention (China CDC) is a nonprofit institution working in disease control and prevention, public health management, and public health services.
Pathogen
Management Institution
Resource Center
Research
Mycobacterium tuberculosis
Vibrio cholerae
Streptococcus sp.
Yersinia pestis
Clostridium difficile
Brucella sp.
How have genomes changed our work in the past 10 years?
• 1. Genome: In modern molecular biology and genetics, a genome is the genetic material of an organism. It consists of DNA (or RNA in RNA viruses). The genome includes the genes (the coding regions), the noncoding DNA, and, where present, the genomes of organelles such as mitochondria and chloroplasts.
The explosive growth of genome data
2. DNA sequencing technology
• Determining the precise order of nucleotides within a DNA molecule
ATG TTC ATC CGA TCA ACG TGA
Gene, Fragment, Genome
Cell, Chromosome
• A technology that sequences millions to a billion DNA molecules at once
First generation: Sanger sequencing
NGS (2nd gen) platforms
Illumina (Solexa)
SOLiD: Life Technologies (Applied Biosystems)
454/Roche (Pyrosequencing)
Ion Torrent (semiconductor hydrogen ion detection)
3rd gen platforms
Helicos
PacBio
Oxford Nanopore
Instruments: ABI 3730XL; Illumina HiSeq and MiSeq; Roche/454 FLX; 5500 SOLiD™; Ion Torrent; Pacific Biosciences; Helicos; Oxford Nanopore
The National Center for Biotechnology Information (NCBI) already houses petabytes (millions of gigabytes) of data, and biologists around the world are churning out 15 petabases
Thousands of bacterial genomes have been released through NCBI.
Our publications in recent years
• Zhang, Wen, et al. "Genomic study of the Type IVC secretion system in Clostridium difficile: Understanding C. difficile evolution via horizontal gene transfer." Genome. doi: 10.1139/gen-2016-0053.
• Chen, Chen, Wen Zhang, Han Zheng, Ruiting Lan, Haiyin Wang, Pengcheng Du, Xuemei Bai, Shaobo Ji, Qiong Meng, and Dong Jin. 2013. "Minimum core genome sequence typing of bacterial pathogens: a unified approach for clinical and public health microbiology." Journal of Clinical Microbiology 51 (8):2582-91.
• Du, Pengcheng, Bo Cao, Jing Wang, Wenge Li, Hongbing Jia, Wen Zhang, Jinxing Lu, Zhongjie Li, Hongjie Yu, and Chen Chen. 2014. "Sequence variation in tcdA and tcdB of Clostridium difficile: ST37 with truncated tcdA is a potential epidemic strain in China." Journal of Clinical Microbiology 52 (9):3264-70.
• Du, Pengcheng, Wen Zhang, Haiyin Wang, Chen Chen, and T. Bureau. 2012. "Comparative genomic analysis of Escherichia coli O104:H4 stx2 prophage reveals a potential new method to identify virulence factors." Genome 55 (9):697-700.
• Han, N., Y. Qiang, and W. Zhang. 2016. "ANItools web: a web tool for fast genome comparison within multiple bacterial strains." Database: The Journal of Biological Databases and Curation 2016.
• Jiang, Hai, Pengcheng Du, Wen Zhang, Heng Wang, Hongyan Zhao, Dongri Piao, Guozhong Tian, Chen Chen, and Buyun Cui. 2013. "Comparative Genomic Analysis of Brucella melitensis Vaccine Strain M5 Provides Insights into Virulence Attenuation." PLoS ONE 8 (8):e70852.
• Jiang, Yi, Haican Liu, Haiyin Wang, Xiangfeng Dou, Xiuqin Zhao, Yun Bai, Li Wan, Guilian Li, Wen Zhang, and Chen Chen. 2013. "Polymorphism of Antigen MPT64 in Mycobacterium tuberculosis Strains." Journal of Clinical Microbiology 51 (5):1558-62.
• Li, Jing, Jing Ding, Wen Zhang, Yuanli Zhang, Ping Tang, Jian-Qun Chen, Dacheng Tian, and Sihai Yang. 2010. "Unique evolutionary pattern of numbers of gramineous NBS–LRR genes." Molecular Genetics and Genomics 283 (5):427-38. doi: 10.1007/s00438-010-0527-6.
• Lu, Liang, Douglas Chesters, Wen Zhang, Guichang Li, Ying Ma, Huailei Ma, Xiuping Song, et al. 2012. "Small Mammal Investigation in Spotted Fever Focus with DNA-Barcoding and Taxonomic Implications on Rodents Species from Hainan of China." PLoS ONE 7 (8):e43479. doi: 10.1371/journal.pone.0043479.
• Barrett, Luke G., Joel M. Kniskern, Natacha Bodenhausen, Wen Zhang, and Joy Bergelson. 2009. "Continua of specificity and virulence in plant-host pathogen interactions: causes and consequences." New Phytologist 183:513-29.
• Zhang, Wen, Pengcheng Du, Han Zheng, Weiwen Yu, Li Wan, and Chen Chen. 2014. "Whole-genome sequence comparison as a method for improving bacterial species definition." The Journal of General and Applied Microbiology 60 (2):75-8.
• Zhang, Wen, Chengbo Rong, Chen Chen, and George F. Gao. 2012. "Type-IVC Secretion System: A Novel Subclass of Type IV Secretion System (T4SS) Common Existing in Gram-Positive Genus Streptococcus." PLoS ONE 7 (10):e46390. doi: 10.1371/journal.pone.0046390.
• Zhang, Wen, Xiaoqin Sun, Huizhong Yuan, Hitoshi Araki, Jue Wang, and Dacheng Tian. 2008. "The pattern of insertion/deletion polymorphism in Arabidopsis thaliana." Molecular Genetics and Genomics 280 (4):351-61. doi: 10.1007/s00438-008-0370-1.
• Zhang, Wen, Weiwen Yu, Di Liu, Ming Li, Pengcheng Du, Yilei Wu, George F. Gao, and Chen Chen. 2013. "T4SP: A Novel Tool and Database for Type IV Secretion Systems in Bacterial Genomes." Biomedical and Environmental Sciences: BES 26 (7):614-7.
• Zhang, Wen, Yuanyuan Zhang, Huajun Zheng, Yuanlong Pan, Haican Liu, Pengcheng Du, Li Wan, Jun Liu, Baoli Zhu, and Guoping Zhao. 2013. "Genome sequencing and analysis of BCG vaccine strains." PLoS ONE 8 (8):e71243.
Genome
Sequencing
Bioinformatics
Genomics Evolution
Pathogen
Transcriptomics
Research
Bacterial Species Definition
Genome Typing
Bergey's Manual of Systematic Bacteriology
—Bacterial Species Definition
• Traditional methods: based on phenotypic similarities and chemical characteristics, which can be affected by environmental factors such as temperature and pH and may therefore introduce bias
• Modern genetic methods: based on the nucleotide sequences of gene fragments (16S rRNA and MLST), which can also be biased by one or more sequencing errors
Biochemical PFGE
MLST 16S
Genome
• High resolution at the strain level
• Low error
• Not affected by environmental factors
No two leaves in the world are identical.
Future Method?
Does ANI work for Bacterial Species Definition?
• Average Nucleotide Identity (ANI): calculated from pairwise comparisons of all sequences shared between any two strains
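A toy illustration of the pairwise idea, as a sketch under stated assumptions and not the ANItools algorithm: the query genome is cut into fixed-size fragments, each fragment's best ungapped match anywhere in the other genome stands in for a BLAST alignment, fragments below a minimum identity are treated as "not shared", and the identities of the remaining fragments are averaged. The function names, the 20 bp fragment length and the 0.7 identity filter are hypothetical values chosen for short examples; published ANI tools work on much longer fragments.

```python
def fragment_identity(frag: str, target: str) -> float:
    """Best ungapped identity of `frag` against any window of `target`.

    Toy stand-in for an alignment search such as BLAST (no gaps allowed).
    """
    n = len(frag)
    best = 0.0
    for start in range(len(target) - n + 1):
        window = target[start:start + n]
        matches = sum(a == b for a, b in zip(frag, window))
        best = max(best, matches / n)
    return best


def toy_ani(query: str, reference: str, frag_len: int = 20, min_identity: float = 0.7) -> float:
    """Average identity of query fragments whose best match passes `min_identity`.

    Hypothetical parameters; fragments that fail the filter are treated as not shared.
    """
    identities = []
    for i in range(0, len(query) - frag_len + 1, frag_len):
        ident = fragment_identity(query[i:i + frag_len], reference)
        if ident >= min_identity:
            identities.append(ident)
    if not identities:
        return 0.0
    return sum(identities) / len(identities)
```

Because only fragments that pass the filter are averaged, ANI measures similarity over the shared part of two genomes; how much of the genome is shared is a separate quantity.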
Average Nucleotide Identity (ANI)
• SGC > DGC
• Average ANI
  • 0.936 at species level
  • 0.836 at genus level
  • 0.789 at family level
The 0.92 cutoff does not work for the following genera:
• Shigella sp.
• Brucella sp.
• Rickettsia sp.
• Yersinia sp.
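A minimal sketch of applying a single ANI cutoff while flagging the genera listed above, where the slides report that 0.92 does not work. The function name and the idea of returning a separate "unreliable" flag are illustrative assumptions, not part of ANItools.

```python
# Genera for which the slides report that a 0.92 ANI cutoff does not work.
CUTOFF_EXCEPTIONS = {"Shigella", "Brucella", "Rickettsia", "Yersinia"}


def same_species(ani: float, genus: str, cutoff: float = 0.92) -> tuple[bool, bool]:
    """Return (passes_cutoff, unreliable) for one pairwise ANI value.

    `ani` is a fraction in [0, 1]. `unreliable` is True when the genus is one
    where a single cutoff is known to mislead, so the call needs manual review.
    """
    if not 0.0 <= ani <= 1.0:
        raise ValueError("ANI must be a fraction between 0 and 1")
    return ani >= cutoff, genus in CUTOFF_EXCEPTIONS
```

Keeping the flag separate from the threshold decision makes it explicit that, for these genera, the numeric result alone should not settle the species call.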
We developed a web version of ANItools (http://ani.mypathogen.cn/), which lets users obtain ANI values directly online. It also includes a database of ANI values for every pair of strains within a genus (2773 strains, 1487 species and 668 genera). Importantly, ANItools web can automatically compare an input genome sequence against the database sequences at the genus and species levels and generate a graphical report of the ANI results.
ANItools web is useful for defining the relationship between bacterial strains, further contributing to the classification and identification of bacterial species using genome data.
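The reporting step of such a comparison can be sketched as follows. This is a hypothetical illustration, not ANItools code: it assumes ANI values between the query and each reference strain have already been computed, and it only summarizes the best ANI per species, highest first. The function name and the placeholder species and strain IDs are invented.

```python
from collections import defaultdict


def rank_species_by_ani(hits: dict[str, tuple[str, float]]) -> list[tuple[str, float]]:
    """Return (species, best ANI with the query) pairs, highest ANI first.

    `hits` maps a reference strain ID to (species name, ANI between query and that strain).
    """
    best: dict[str, float] = defaultdict(float)
    for species, ani in hits.values():
        best[species] = max(best[species], ani)
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder values, not real ANItools output.
    hits = {
        "ref_1": ("species_A", 0.97),
        "ref_2": ("species_A", 0.95),
        "ref_3": ("species_B", 0.85),
    }
    print(rank_species_by_ani(hits))  # [('species_A', 0.97), ('species_B', 0.85)]
```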
ANItools
Genome
Sequencing
Bioinformatics
Genomics Evolution
Pathogen
Transcriptomics
Research
Bacterial Species Definition
Genome Typing
Our Method: MCGT (minimum core genome typing)
What is the core genome?
The core genome is the set of genes/genome sequences shared by a group of organisms; the pan genome is the set of all genes/genome sequences found in any of these organisms.
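These two definitions map directly onto set intersection and union. A minimal sketch, assuming each strain has already been reduced to a set of gene identifiers; real pipelines must first cluster homologous genes across strains, which is omitted here. The strain and gene names are placeholders.

```python
from functools import reduce


def core_and_pan(gene_sets: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Core genome: genes present in every strain. Pan genome: genes present in any strain."""
    sets = list(gene_sets.values())
    if not sets:
        return set(), set()
    core = reduce(set.intersection, sets)
    pan = reduce(set.union, sets)
    # Copy so callers never receive one of the input sets by reference.
    return set(core), set(pan)


if __name__ == "__main__":
    strains = {  # hypothetical gene content
        "strain_1": {"geneA", "geneB", "geneC"},
        "strain_2": {"geneA", "geneB", "geneD"},
        "strain_3": {"geneA", "geneB"},
    }
    core, pan = core_and_pan(strains)
    print(sorted(core))  # ['geneA', 'geneB']
    print(sorted(pan))   # ['geneA', 'geneB', 'geneC', 'geneD']
```

Adding strains can only shrink the core or grow the pan genome, which is why the core genome of a large sample set is a conservative basis for typing.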
species evolution
Core genome
Ecotypes
Genome analysis/Comparative genomics
Population analysis
Sample Selection
85 strains (32 serotypes and 75 STs)
58,501 SNPs
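The SNP count for a sample set comes from comparing core-genome sequences across strains. The sketch below shows the basic idea under simplifying assumptions: core-genome sequences are already aligned to equal length, and a column counts as a SNP site if at least two distinct bases (A/C/G/T) occur in it, ignoring gaps and ambiguous bases. Grouping strains with identical SNP profiles is only a simplified stand-in for typing, not the published MCGT procedure; function names and the example alignment are hypothetical.

```python
from collections import defaultdict

VALID_BASES = set("ACGT")


def snp_sites(alignment: dict[str, str]) -> list[int]:
    """Return 0-based alignment columns where at least two different bases occur.

    Gaps ('-') and ambiguous bases (e.g. 'N') are ignored when comparing a column.
    """
    seqs = [s.upper() for s in alignment.values()]
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("core-genome sequences must be aligned to the same length")
    sites = []
    for col in range(length):
        bases = {s[col] for s in seqs} & VALID_BASES
        if len(bases) > 1:
            sites.append(col)
    return sites


def group_by_snp_profile(alignment: dict[str, str]) -> list[list[str]]:
    """Group strains that carry identical bases at every SNP site (a toy 'type')."""
    sites = snp_sites(alignment)
    groups: dict[str, list[str]] = defaultdict(list)
    for strain, seq in alignment.items():
        profile = "".join(seq.upper()[i] for i in sites)
        groups[profile].append(strain)
    return sorted(groups.values())
```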
GC% = 41.1%
1998 outbreak strain: 98HAH12
2005 outbreak strain: 05ZYH33
Chen Chen, et al. A Glimpse of Streptococcal Toxic Shock Syndrome From Comparative Genomics of S. suis 2 Chinese Isolates. PLoS One, 2007; 2(3): e315.
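GC% is the percentage of G and C bases in a genome sequence. A minimal sketch; excluding ambiguous bases such as N from the denominator is one common convention and an assumption here, and the function name is hypothetical.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G+C among unambiguous bases (A, C, G, T); other characters are skipped."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    acgt = gc + sum(seq.count(base) for base in "AT")
    if acgt == 0:
        raise ValueError("no unambiguous bases in sequence")
    return 100.0 * gc / acgt
```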
Ming Li, et al. SalK/SalR, a Two-Component Signal Transduction System Is Essential for Full Virulence of Highly Invasive Streptococcus suis Serotype 2. PLoS One. 2008; 3(5): e2080.
Figure (panels A-C): 05ZYH33, ΔsalKR, CΔsalKR, 05HAS68
The 89K genomic island (GI) is associated with the pathogenicity of S. suis.
The MCGT method has also been used for other species:
• Borrelia burgdorferi
• Legionella pneumophila
Tian Q, Zhang W, Liu W, et al. Population structure and minimum core genome typing of Legionella pneumophila. Scientific Reports, 2016, 6.
Hao Q, Du P, Zhang W, et al. Genomic Characteristics of Chinese Borrelia burgdorferi Isolates. PLoS One, 2016, 11(4).
—Bacterial Species Definition
• Core Genome Typing Method
• ANI for Bacterial Species Definition
Biochemical PFGE
MLST 16S
NGS and Genome
Now Future
Comparing CGT and ANI
• CGT (core genome typing) method
  • Accurate (one or more weeks)
  • Definition at the strain level
  • Typing and evolution studies
• ANI method
  • Fast (10-60 min)
  • Definition at the strain level
  • Finding candidate outbreaks
According to Mindy Goldsborough, ATCC’s chief science and technology officer, the repository acquired its U87 line in 1982 from the Memorial Sloan Kettering Cancer Center in New York City, which itself had received the cell line from Uppsala in 1973. By the time it arrived at the ATCC, U87 had a Y chromosome, even though it was supposed to have come from a female patient. This suggests that the mix-up probably happened at Sloan Kettering or during one of the hand-offs.
A 50-year-old cell line was mislabeled
With the help of genome data, we can:
• (1) Check bacterial strains provided by users.
User
Resource center
The third party
With the help of genome data, we can:
• (2) Self-check deposited biological material
One year later:
Identical: preserve
Not identical: problem!
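The self-check can be sketched as comparing the genome sequenced at deposit with the genome re-sequenced one year later. This is a hypothetical illustration: it assumes both sequences are already aligned to the same coordinates, counts positions where both carry an unambiguous base and the bases differ, and by default tolerates zero differences. The source does not specify a tolerance, and in practice sequencing error would require one; the function names are invented.

```python
def differences(seq_a: str, seq_b: str) -> int:
    """Count aligned positions where both sequences have A/C/G/T and the bases differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )


def deposit_check(at_deposit: str, one_year_later: str, max_diffs: int = 0) -> str:
    """Return 'preserve' if the strain looks unchanged, otherwise 'problem'."""
    return "preserve" if differences(at_deposit, one_year_later) <= max_diffs else "problem"
```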
With the help of genome data, we can:
• (3) Trace the candidate source of a target bacterial strain
• Provide more evolutionary background
Compare with the genome database
Phylogenetic tree
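Tracing a strain's likely source starts with finding its closest relatives in a genome database. A minimal nearest-neighbor sketch, assuming aligned core-genome sequences and using pairwise SNP distance; a real analysis would build a phylogenetic tree rather than rely on a single distance ranking. Function names and database entries are hypothetical placeholders.

```python
def snp_distance(a: str, b: str) -> int:
    """Number of aligned positions where both sequences have A/C/G/T and the bases differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(1 for x, y in zip(a, b) if x in valid and y in valid and x != y)


def closest_relatives(target: str, database: dict[str, str], k: int = 3) -> list[tuple[str, int]]:
    """Rank database strains by SNP distance to the target, closest first (ties by name)."""
    ranked = sorted(
        ((name, snp_distance(target, seq)) for name, seq in database.items()),
        key=lambda item: (item[1], item[0]),
    )
    return ranked[:k]
```

The same pairwise distances, computed for all strains, are also the input a distance-based tree method would use.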
More Data are welcome!
• Now: 2773 strains, 1487 species and 668 genera
• 12,614 genomes in our database (http://data.mypathogen.org)
ANI of each species: top 10 and last 10

Top 10
Species | Strain num. | Average ANI
Leptospira biflexa | 2 | 1.000
Caulobacter crescentus | 2 | 1.000
Clostridium kluyveri | 2 | 1.000
Lactobacillus reuteri | 2 | 1.000
Bifidobacterium animalis | 3 | 0.999
Brucella abortus | 2 | 0.999
Erwinia amylovora | 2 | 0.999
Yersinia pestis | 8 | 0.998
Treponema pallidum | 2 | 0.998
Mycobacterium bovis | 3 | 0.997

Last 10
Species | Strain num. | Average ANI
Candidatus Blochmannia | 3 | 0.738
Buchnera aphidicola | 6 | 0.779
Polynucleobacter necessarius | 2 | 0.781
Prochlorococcus marinus | 12 | 0.788
Candidatus Liberibacter | 2 | 0.791
Enterobacter cloacae | 2 | 0.797
Blattabacterium | 2 | 0.797
Cyanothece PCC | 5 | 0.808
Dickeya dadantii | 3 | 0.808
Pseudomonas fluorescens | 3 | 0.808
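The table gives, for each species, the number of strains and an average ANI. The slide does not state how the average was computed; one plausible reading, assumed here, is the mean ANI over all unordered pairs of strains in the species, which for two strains reduces to the single pairwise value. Because ANI computed in one direction can differ slightly from the reverse direction, the sketch averages both directions when both are available. The function name and the example values are hypothetical.

```python
from itertools import combinations


def average_pairwise_ani(strains: list[str], ani: dict[tuple[str, str], float]) -> float:
    """Mean ANI over all unordered strain pairs.

    `ani` may store each pair in either order; if both directions are present,
    the two values are averaged before taking the overall mean.
    """
    values = []
    for a, b in combinations(strains, 2):
        pair = [ani[p] for p in ((a, b), (b, a)) if p in ani]
        if not pair:
            raise KeyError(f"missing ANI for {a} vs {b}")
        values.append(sum(pair) / len(pair))
    if not values:
        raise ValueError("need at least two strains")
    return sum(values) / len(values)
```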