Genome insights for Bacterial species definition

National Institute for Communicable Disease Control and Prevention,

Chinese Center for Disease Control and Prevention

Wen Zhang

2016-09

• The Chinese Center for Disease Control and Prevention (China CDC) is a nonprofit institution working in disease control and prevention, public health management, and service provision.

Pathogen

Management Institution

Resource Center

Research

Mycobacterium tuberculosis

Vibrio cholerae

Streptococcus sp.

Yersinia pestis

Clostridium difficile

Brucella sp.

How has genomics changed our work in the past 10 years?

Background

Our work on bacterial genomes

Genomes for the resource center

Three Words

• 1. Genome

• 2. Sequencing

• 3. Bioinformatics

• 1. Genome: In modern molecular biology and genetics, a genome is the genetic material of an organism. It consists of DNA (or RNA in RNA viruses). The genome includes the genes (the coding regions), the noncoding DNA, and the genomes of the mitochondria and chloroplasts.

The explosive growth of genome data

2. DNA sequencing technology

• determining the precise order of nucleotides within a DNA molecule

ATG TTC ATC CGA TCA ACG TGA
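Once the order of nucleotides is known, it can be read computationally. The sketch below assumes the example sequence above is meant to be read as space-separated codons and translates it with the standard genetic code. The table is deliberately limited to the codons in the example, and the function name is illustrative only.

```python
# Assumption: the slide's example is a run of space-separated codons.
EXAMPLE = "ATG TTC ATC CGA TCA ACG TGA"

# Standard genetic code, restricted to the codons that occur in the example.
CODON_TABLE = {
    "ATG": "M",  # methionine (start)
    "TTC": "F",  # phenylalanine
    "ATC": "I",  # isoleucine
    "CGA": "R",  # arginine
    "TCA": "S",  # serine
    "ACG": "T",  # threonine
    "TGA": "*",  # stop
}


def translate(codons: str) -> str:
    """Translate space-separated codons into one-letter amino acid codes."""
    return "".join(CODON_TABLE[codon] for codon in codons.split())


print(translate(EXAMPLE))  # MFIRST*
```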

[Figure: cell, chromosome, genome, gene, fragment]

• A technology that sequences from several million to a billion DNA molecules in a single run

First generation: Sanger sequencing

NGS (2nd gen) platforms

Illumina (Solexa)

SOLiD: Life Technologies (Applied Biosystems)

454/Roche (Pyrosequencing)

Ion Torrent (semiconductor hydrogen ion detection)

3rd gen platforms

Helicos

PacBio (Pacific Biosciences)

Oxford Nanopore

[Figure: sequencing instruments. ABI 3730XL; Illumina HiSeq and MiSeq; Roche/454 FLX; 5500 SOLiD™; Ion Torrent; Pacific Biosciences; Helicos; Oxford Nanopore]

Human Genome

In 2000: 3 billion dollars

Now: 1,000 dollars

About 1/1,000,000 of the cost over the past 16 years
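As a quick check of the arithmetic, the sketch below takes the two prices quoted on this slide at face value. The ratio works out to three million, the same order of magnitude as the quoted 1/1,000,000. The function name is illustrative only.

```python
def fold_reduction(cost_before: float, cost_after: float) -> float:
    """How many times cheaper the later price is than the earlier one."""
    if cost_after <= 0:
        raise ValueError("cost_after must be positive")
    return cost_before / cost_after


# Prices as quoted on the slide (US dollars per human genome).
cost_2000 = 3_000_000_000
cost_now = 1_000

fold = fold_reduction(cost_2000, cost_now)
print(f"{fold:,.0f}-fold cheaper")  # 3,000,000-fold cheaper
```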

The National Center for Biotechnology Information (NCBI) already houses petabytes (millions of gigabytes) of data, and biologists around the world are churning out 15 petabases

Thousands of bacterial genomes have been released through NCBI.

3. Bioinformatics

Computer

Bioinformatics

Biology

Background

Our work on bacterial genomes

Genomes for the resource center

Genome

Sequencing

Bioinformatics

Genomics Evolution

Pathogen Transcriptomics

Research

—Pathogen Mechanism

Clostridium difficile Vibrio mimicus

Vibrio cholerae

Streptococcus suis

—BCG Vaccine Research

Genome Comparison

T-epitopes variation

Our publications in recent years

• Zhang, Wen, et al. "Genomic study of the Type IVC secretion system in Clostridium difficile: Understanding C. difficile evolution via horizontal gene transfer." Genome. doi: 10.1139/gen-2016-0053.

• Chen, Chen, Wen Zhang, Han Zheng, Ruiting Lan, Haiyin Wang, Pengcheng Du, Xuemei Bai, Shaobo Ji, Qiong Meng, and Dong Jin. 2013. "Minimum core genome sequence typing of bacterial pathogens: a unified approach for clinical and public health microbiology." Journal of Clinical Microbiology 51 (8):2582-91.

• Du, Pengcheng, Bo Cao, Jing Wang, Wenge Li, Hongbing Jia, Wen Zhang, Jinxing Lu, Zhongjie Li, Hongjie Yu, and Chen Chen. 2014. "Sequence variation in tcdA and tcdB of Clostridium difficile: ST37 with truncated tcdA is a potential epidemic strain in China." Journal of Clinical Microbiology 52 (9):3264-70.

• Du, Pengcheng, Wen Zhang, Haiyin Wang, Chen Chen, and T Bureau. 2012. "Comparative genomic analysis of Escherichia coli O104:H4 stx2 prophage reveals a potential new method to identify virulence factors." Genome 55 (9):697-700.

• Han N, Qiang Y, Zhang W. 2016. "ANItools web: a web tool for fast genome comparison within multiple bacterial strains." Database: The Journal of Biological Databases and Curation 2016.

• Jiang, Hai, Pengcheng Du, Wen Zhang, Heng Wang, Hongyan Zhao, Dongri Piao, Guozhong Tian, Chen Chen, and Buyun Cui. 2013. "Comparative Genomic Analysis of Brucella melitensis Vaccine Strain M5 Provides Insights into Virulence Attenuation." PLoS ONE 8 (8):e70852.

• Jiang, Yi, Haican Liu, Haiyin Wang, Xiangfeng Dou, Xiuqin Zhao, Yun Bai, Li Wan, Guilian Li, Wen Zhang, and Chen Chen. 2013. "Polymorphism of Antigen MPT64 in Mycobacterium tuberculosis Strains." Journal of Clinical Microbiology 51 (5):1558-62.

• Li, Jing, Jing Ding, Wen Zhang, Yuanli Zhang, Ping Tang, Jian-Qun Chen, Dacheng Tian, and Sihai Yang. 2010. "Unique evolutionary pattern of numbers of gramineous NBS–LRR genes." Molecular Genetics and Genomics 283 (5):427-38. doi: 10.1007/s00438-010-0527-6.

• Lu, Liang, Douglas Chesters, Wen Zhang, Guichang Li, Ying Ma, Huailei Ma, Xiuping Song, et al. 2012. "Small Mammal Investigation in Spotted Fever Focus with DNA-Barcoding and Taxonomic Implications on Rodents Species from Hainan of China." PLoS ONE 7 (8):e43479. doi: 10.1371/journal.pone.0043479.

• Barrett, Luke G., Joel M. Kniskern, Natacha Bodenhausen, Wen Zhang, and Joy Bergelson. 2009. "Continua of specificity and virulence in plant-host pathogen interactions: causes and consequences." New Phytologist 183:513-29.

• Zhang, Wen, Pengcheng Du, Han Zheng, Weiwen Yu, Li Wan, and Chen Chen. 2014. "Whole-genome sequence comparison as a method for improving bacterial species definition." The Journal of General and Applied Microbiology 60 (2):75-8.

• Zhang, Wen, Chengbo Rong, Chen Chen, and George F. Gao. 2012. "Type-IVC Secretion System: A Novel Subclass of Type IV Secretion System (T4SS) Common Existing in Gram-Positive Genus Streptococcus." PLoS ONE 7 (10):e46390. doi: 10.1371/journal.pone.0046390.

• Zhang, Wen, Xiaoqin Sun, Huizhong Yuan, Hitoshi Araki, Jue Wang, and Dacheng Tian. 2008. "The pattern of insertion/deletion polymorphism in Arabidopsis thaliana." Molecular Genetics and Genomics 280 (4):351-61. doi: 10.1007/s00438-008-0370-1.

• Zhang, Wen, Wei Wen Yu, Di Liu, Ming Li, Peng Cheng Du, Yi Lei Wu, George F. Gao, and Chen Chen. 2013. "T4SP: A Novel Tool and Database for Type IV Secretion Systems in Bacterial Genomes." Biomedical and Environmental Sciences 26 (7):614-7.

• Zhang, Wen, Yuanyuan Zhang, Huajun Zheng, Yuanlong Pan, Haican Liu, Pengcheng Du, Li Wan, Jun Liu, Baoli Zhu, and Guoping Zhao. 2013. "Genome sequencing and analysis of BCG vaccine strains." PLoS ONE 8 (8):e71243.

Bacterial Species Definition

Genome Typing

Bergey's Manual of Systematic Bacteriology

—Bacterial Species Definition

Traditional method: based on phenotypic similarities and chemical characteristics, which are affected to some extent by environmental factors such as temperature and pH and can therefore be biased

—Bacterial Species Definition

• Modern genetic methods: based on the nucleotide sequences of gene fragments (16S and MLST), which can also be biased by one or more sequencing errors

Biochemical PFGE

MLST 16S

—Bacterial Species Definition

Genome

High resolution at the strain level

Low error

Not affected by environmental factors

No two leaves in the world are identical

Future Method?

Does ANI work for Bacterial Species Definition?

• Average Nucleotide Identity (ANI): calculated from pairwise comparisons of all sequences shared between any two strains
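A minimal sketch of this calculation, assuming the pairwise comparison has already produced one (identity, aligned length) pair for each sequence fragment shared by the two strains. Weighting by aligned length is an assumption made here for illustration; the slides do not say whether ANItools uses a weighted or a plain mean, and the function name is hypothetical.

```python
from typing import Iterable, Tuple


def average_nucleotide_identity(hits: Iterable[Tuple[float, int]]) -> float:
    """Length-weighted mean identity over the fragments shared by two genomes.

    Each hit is (identity as a fraction between 0 and 1, aligned length in bp).
    """
    matched = 0.0
    total_length = 0
    for identity, length in hits:
        matched += identity * length
        total_length += length
    if total_length == 0:
        raise ValueError("the two genomes share no aligned sequence")
    return matched / total_length


# Hypothetical hits for three shared fragments.
example_hits = [(0.98, 1000), (0.90, 1000), (0.95, 2000)]
print(round(average_nucleotide_identity(example_hits), 3))  # 0.945
```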

1,226 strains, 871 species, 466 genera

Average Nucleotide Identity (ANI)

• SGC > DGC

• Average ANI

• 0.936 Species Level

• 0.836 Genus Level

• 0.789 Family Level

The 0.92 cutoff does not work for the following genera

• Shigella sp.

• Brucella sp.

• Rickettsia sp.

• Yersinia sp.
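The cutoff and its exceptions can be sketched as a small decision rule. This is an illustration under stated assumptions, not the ANItools implementation: the slides give the 0.92 value and the four genera, while the function name, the return strings, and the choice of "greater than or equal" at the boundary are assumptions.

```python
SPECIES_CUTOFF = 0.92  # value quoted on the slides

# Genera for which the slides report that the cutoff does not work.
EXCEPTION_GENERA = {"Shigella", "Brucella", "Rickettsia", "Yersinia"}


def species_call(ani: float, genus: str, cutoff: float = SPECIES_CUTOFF) -> str:
    """Classify a pair of strains from the same genus by their ANI value."""
    if genus in EXCEPTION_GENERA:
        return "cutoff not reliable for this genus"
    # Assumption: an ANI exactly at the cutoff counts as the same species.
    return "same species" if ani >= cutoff else "different species"


print(species_call(0.95, "Streptococcus"))  # same species
print(species_call(0.85, "Streptococcus"))  # different species
print(species_call(0.99, "Brucella"))       # cutoff not reliable for this genus
```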

We developed the web version of ANItools (http://ani.mypathogen.cn/), which lets users obtain ANI values directly online. It also includes a database of ANI values between any two strains in a genus (2,773 strains, 1,487 species and 668 genera). ANItools web can automatically compare an input genomic sequence against the database sequences (at genus and species levels) and generate a graphical report of the ANI results.

ANItools web is useful for defining the relationship between bacterial strains, which in turn supports the classification and identification of bacterial species from genome data.

ANItools

[Screenshots of ANItools web: run time, status, and the result report (summary, ANI list, phylogenetic tree)]

Web site: http://ani.mypathogen.cn/

Genome Typing

Streptococcus suis

Core Genome Typing Method

2005 outbreak in Sichuan Province, China: 38 deaths

Why not type using only serotype or MLST?

Patients Healthy

Serotype 2 Serotype 14

ST1, ST6, ST7

Our method: MCGT (minimum core genome typing)

What is the core genome?

The core genome is the set of genes or genome sequences shared by all members of a group of organisms; the pan genome is the set of all genes or genome sequences seen in any of these organisms.
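In set terms, the core genome is an intersection and the pan genome is a union. The sketch below assumes each strain has already been reduced to a set of gene names; the strain labels and the function name are hypothetical, and the gene names are used only as placeholders.

```python
from typing import Dict, Set, Tuple


def core_and_pan(genes_by_strain: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (core genome, pan genome) for a group of strains."""
    gene_sets = list(genes_by_strain.values())
    if not gene_sets:
        return set(), set()
    core = set.intersection(*gene_sets)  # genes present in every strain
    pan = set.union(*gene_sets)          # genes present in at least one strain
    return core, pan


# Hypothetical gene content for three strains.
strains = {
    "strain_A": {"gyrB", "recA", "tcdA"},
    "strain_B": {"gyrB", "recA"},
    "strain_C": {"gyrB", "recA", "stx2"},
}
core, pan = core_and_pan(strains)
print(sorted(core))  # ['gyrB', 'recA']
print(sorted(pan))   # ['gyrB', 'recA', 'stx2', 'tcdA']
```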

species evolution

Core genome

Ecotypes

MCGT for Streptococcus suis

Selection of 85 S. suis strains

• 32 serotypes

• 75 STs based on MLST

Genome analysis/Comparative genomics

Population analysis

Sample Selection

85 strains (32 serotypes and 75 STs)

58,501 SNPs

7 Groups based on SNPs in MCG
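Grouping strains by SNPs starts from pairwise differences between aligned core-genome sequences. The sketch below shows only that distance step, assuming the minimum core genome (MCG) sequences are already aligned to equal length. The slides do not describe how the seven groups were derived from the SNPs, so no clustering is attempted; strain names and sequences are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def pairwise_snp_distances(aligned: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """SNP distance for every unordered pair of strains."""
    return {
        (s, t): snp_distance(aligned[s], aligned[t])
        for s, t in combinations(sorted(aligned), 2)
    }


# Hypothetical aligned core-genome fragments.
aligned_core = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TCGTACGA"}
print(pairwise_snp_distances(aligned_core))
# {('s1', 's2'): 1, ('s1', 's3'): 2, ('s2', 's3'): 1}
```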

Virulence genes in the 7 groups

GC content = 41.1%

1998 outbreak strain: 98HAH12

2005 outbreak strain: 05ZYH33

Chen Chen, et al. A Glimpse of Streptococcal Toxic Shock Syndrome From Comparative Genomics of S. suis 2 Chinese Isolates. PLoS One, 2007; 2(3): e315.
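The GC percentage quoted for the genome on this slide is the fraction of G and C bases among all called bases. A minimal sketch, assuming the genome is available as a plain string and that ambiguous bases such as N should be ignored (an assumption; the slide does not say how it was computed):

```python
def gc_content(sequence: str) -> float:
    """Fraction of G and C among the A, C, G and T bases of a sequence."""
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return (seq.count("G") + seq.count("C")) / called


print(f"{gc_content('ATGCNN'):.1%}")  # 50.0%
```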

Ming Li, et al. SalK/SalR, a Two-Component Signal Transduction System Is Essential for Full Virulence of Highly Invasive Streptococcus suis Serotype 2. PLoS One. 2008; 3(5): e2080.

[Figure, panels A to C: strains 05ZYH33, ΔsalKR, CΔsalKR and 05HAS68]

The 89K genomic island (GI) is related to the pathogenicity of S. suis

MCGT has also been applied to other species

• Borrelia burgdorferi

• Legionella pneumophila

Tian Q, Zhang W, Liu W, et al. Population structure and minimum core genome typing of Legionella pneumophila. Scientific Reports, 2016, 6.

Hao Q, Du P, Zhang W, et al. Genomic Characteristics of Chinese Borrelia burgdorferi Isolates. PLoS One, 2016, 11(4).

—Bacterial Species Definition

• Core Genome Typing Method

• ANI for Bacterial Species Definition

Biochemical PFGE

MLST 16S

NGS and Genome

Now Future

Comparing CGT and ANI

• CGT (Core Genome Typing) method

• Accurate (one or more weeks)

• Definition on strain level

• Typing and evolution study

• ANI method

• Fast (10 to 60 min)

• Definition on strain level

• Finding candidate outbreaks

Background

Our work on bacterial genomes

Genomes for the resource center

According to Mindy Goldsborough, ATCC’s chief science and technology officer, the repository acquired its U87 line in 1982 from the Memorial Sloan Kettering Cancer Center in New York City, which itself had received the cell line from Uppsala in 1973. And by the time it arrived at the ATCC, U87 had a Y chromosome — despite the fact that it was supposed to have come from a female patient. This suggests that the mix-up probably happened at Sloan Kettering or during one of the hand-offs.

A 50-year-old cell line was mislabeled

With the help of genomes, we can

• (1) Check the bacterial strains submitted by users.

User

Resource center

The third party

With the help of genomes, we can

• (2) Self-check deposited biological material

One year later

Identical: preserve

Not identical: problem!
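The self-check can be sketched as a comparison between the genome recorded at deposit and the genome re-sequenced later. The version below tests for exact identity with a SHA-256 digest, which is an assumption made for simplicity: independent sequencing runs rarely yield byte-identical assemblies, so a practical check would more likely compare ANI or SNP counts against a tolerance. Function names are hypothetical.

```python
import hashlib


def sequence_digest(sequence: str) -> str:
    """SHA-256 digest of a sequence, ignoring whitespace and letter case."""
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def deposit_check(at_deposit: str, resequenced: str) -> str:
    """Return 'preserve' if the two genomes are identical, else 'problem'."""
    if sequence_digest(at_deposit) == sequence_digest(resequenced):
        return "preserve"
    return "problem"


print(deposit_check("ATGTTCATC", "atg ttc atc"))  # preserve
print(deposit_check("ATGTTCATC", "ATGTTCATT"))    # problem
```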

With the help of genomes, we can

• (3) Trace the candidate source of the target bacterial strain

• More evolutionary background

Compare with genome database

Phylogenetic Tree
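Tracing a candidate source against a genome database amounts to ranking database strains by similarity to the target. A minimal sketch, assuming ANI values between the target and each database strain have already been computed; the strain labels, the values, and the function name are hypothetical.

```python
from typing import Dict, List, Tuple


def closest_strains(ani_to_database: Dict[str, float], top_n: int = 3) -> List[Tuple[str, float]]:
    """Database strains ranked by ANI to the target strain, highest first."""
    ranked = sorted(ani_to_database.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Hypothetical ANI values between a target strain and four database strains.
ani_values = {"db_strain_1": 0.912, "db_strain_2": 0.998, "db_strain_3": 0.951, "db_strain_4": 0.874}
print(closest_strains(ani_values, top_n=2))
# [('db_strain_2', 0.998), ('db_strain_3', 0.951)]
```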

ANItools

http://ani.mypathogen.cn/

More data are welcome!

• Now: 2,773 strains, 1,487 species and 668 genera

• 12,614 genomes in our database: http://data.mypathogen.org

For co-workers

• Technical guidance

• Achievement sharing

• Free genome sequencing for type strains

CDC operational departments

CDC

Co-workers

And More!

Thank you!

Wen Zhang

• Cutoff value: 0.92

• Accuracy rate: 81.5%
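One way to read an accuracy rate for a cutoff is the share of strain pairs for which the cutoff's call agrees with the established taxonomy. The sketch below computes that share; the slides do not state how the 81.5% figure was obtained, so this definition, the function name, and the example pairs are all assumptions.

```python
from typing import Iterable, Tuple


def cutoff_accuracy(pairs: Iterable[Tuple[float, bool]], cutoff: float = 0.92) -> float:
    """Share of pairs where (ANI >= cutoff) matches the known same-species label.

    Each pair is (ANI value, True if the two strains are named as one species).
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no strain pairs given")
    correct = sum(1 for ani, same_species in pairs if (ani >= cutoff) == same_species)
    return correct / len(pairs)


# Hypothetical labelled pairs: two of the five are misclassified by the cutoff.
labelled_pairs = [(0.98, True), (0.95, True), (0.90, True), (0.85, False), (0.93, False)]
print(cutoff_accuracy(labelled_pairs))  # 0.6
```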

ANI of each species: top 10 and last 10

Top 10

Species                         Strain num.   Average ANI
Leptospira biflexa              2             1.000
Caulobacter crescentus          2             1.000
Clostridium kluyveri            2             1.000
Lactobacillus reuteri           2             1.000
Bifidobacterium animalis        3             0.999
Brucella abortus                2             0.999
Erwinia amylovora               2             0.999
Yersinia pestis                 8             0.998
Treponema pallidum              2             0.998
Mycobacterium bovis             3             0.997

Last 10

Species                         Strain num.   Average ANI
Candidatus Blochmannia          3             0.738
Buchnera aphidicola             6             0.779
Polynucleobacter necessarius    2             0.781
Prochlorococcus marinus         12            0.788
Candidatus Liberibacter         2             0.791
Enterobacter cloacae            2             0.797
Blattabacterium                 2             0.797
Cyanothece PCC                  5             0.808
Dickeya dadantii                3             0.808
Pseudomonas fluorescens         3             0.808