![Page 1: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/1.jpg)
Seven clusters and four types of symmetry in
microbial genomes
Andrei Zinovyev
Bioinformatics service
Math@Bio group of M.Gromov
Tatyana Popova
R&D Centre in Biberach, Germany
Alexander Gorban
Centre for Mathematical Modelling
![Page 2: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/2.jpg)
Symbol of GofG’05
![Page 3: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/3.jpg)
Genomic sequence as a text in unknown language
tagggrcgcacgtggtgagctgatgctaggg
Frequency dictionaries:
t a g g g r c g c a c g t g g t g a g c t g a t g c t a g g g ($N = 4 = 4^1$)
ta gg gr cg ca cg tg gt ga gc tg at gc ta gg ($N = 16 = 4^2$)
tag ggr cgc acg tgg tga gct gat gct agg ($N = 64 = 4^3$)
tagg grcg cacg tggt gagc tgat gcta gggr ($N = 256 = 4^4$)
gggrcgccacgttggtgagctgatgctagggrcgacgtgg
tagggrcgcacgtggtgagctgatgctagggrcgacgtgg
agggrcgcacgtggtgagctgatgctagggrcgacgtggc
..cgtggtgagctgatgctagggrcgcacgtggtgagctgatgctagggrcgacgtggtgagctgatgctagggrcgc…
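A minimal sketch of how such frequency dictionaries could be built, assuming the text is cut into non-overlapping words of length k starting from the first letter. The function name `frequency_dictionary` is illustrative and not taken from the talk. The letter `r` in the slide's example string lies outside the four-letter alphabet, so words containing it are simply counted as they appear.

```python
from collections import Counter

def frequency_dictionary(text, k):
    """Count non-overlapping words of length k, reading from the first letter."""
    words = [text[i:i + k] for i in range(0, len(text) - k + 1, k)]
    return Counter(words)

text = "tagggrcgcacgtggtgagctgatgctaggg"  # example string from the slide

for k in (1, 2, 3, 4):
    dictionary = frequency_dictionary(text, k)
    # 4 ** k is the number of possible words over a four-letter alphabet
    print(k, 4 ** k, dictionary.most_common(3))
```

For k = 3 this reproduces the word list `tag ggr cgc acg tgg tga gct gat gct agg` shown on the slide.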
![Page 4: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/4.jpg)
From text to geometry
cgtggtgagctgatgctagggrcgcacgtggtgagctgatgctagggrcgacgtggtgagctgatgctagggrcgc
$10^7$
cgtggtgagctgatgctagggrcgcacggtgagctgatgctagggrcgcacacttgagctgatgctagggrcgcacaattcgtgagctgatgctagggrcgcacggtg……gagctgatgctagggrcgcacaagtga
length ~200-400
10000-20000 fragments
$\mathbb{R}^N$
![Page 5: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/5.jpg)
Method of visualization: principal components analysis
$\mathbb{R}^N \to \mathbb{R}^2$
PCA plot
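The pipeline from text to a PCA plot can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the window width of 300 is picked from the 200-400 range on the slides, the step of 100 and the random test sequence are invented for the example, and PCA is done with a plain SVD of the centered data. All function names are hypothetical.

```python
import itertools
import numpy as np

ALPHABET = "acgt"
# Map each of the 64 triplets to a coordinate index.
TRIPLET_INDEX = {"".join(w): i
                 for i, w in enumerate(itertools.product(ALPHABET, repeat=3))}

def triplet_vector(fragment):
    """Frequencies of the 64 non-overlapping triplets in one fragment."""
    counts = np.zeros(len(TRIPLET_INDEX))
    for i in range(0, len(fragment) - 2, 3):
        j = TRIPLET_INDEX.get(fragment[i:i + 3])
        if j is not None:  # skip words with letters outside a, c, g, t
            counts[j] += 1
    total = counts.sum()
    return counts / total if total else counts

def fragment_matrix(genome, width=300, step=100):
    """One row per sliding-window fragment; width and step are assumptions."""
    starts = range(0, len(genome) - width + 1, step)
    return np.array([triplet_vector(genome[s:s + width]) for s in starts])

def pca_2d(X):
    """Project the rows of X onto the first two principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

# Random sequence used only to exercise the code; it has no cluster structure.
rng = np.random.default_rng(0)
genome = "".join(rng.choice(list(ALPHABET), size=5000))
X = fragment_matrix(genome)
Y = pca_2d(X)
print(X.shape, Y.shape)
```

Each row of `Y` is one fragment placed on the plane; on a real genome these points would be what the PCA plot shows.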
![Page 6: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/6.jpg)
Caulobacter crescentus
singles N=4
doublets N=16
triplets N=64
quadruplets N=256
!!!
The information in a genomic sequence is encoded by non-overlapping triplets (Nature, 1961)
![Page 7: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/7.jpg)
First explanation
cgtggtgagctgatgctagggrcgcacgtggtgagctgatgctagggrcgacgtggtgagctgatgctagggrcgc
![Page 8: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/8.jpg)
Basic 7-cluster structure
gtgagctgatgctagggrcgcacgtggtgagc
gct gat gct agg grc gca cgt
ctg atg cta ggg rcg cac gtg
tga tgc tag ggr cgc acg tgg
gtgaatcggtgggtgagtgtgctgctatgagc
atc ggt ggg tga gtg tgc tgc
tcg gtg ggt gag tgt gct gct
cgg tgg gtg agt gtg ctg ctg
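The triplet rows on this slide are the same text read in three different phases. A small sketch of that reading, where `read_triplets` is an illustrative helper and the offsets 4, 5 and 6 are chosen only because they reproduce the rows printed for the first sequence:

```python
def read_triplets(sequence, offset):
    """Split the sequence into non-overlapping triplets starting at offset."""
    return [sequence[i:i + 3] for i in range(offset, len(sequence) - 2, 3)]

sequence = "gtgagctgatgctagggrcgcacgtggtgagc"  # first sequence on the slide

for offset in (4, 5, 6):
    # Show the first seven triplets of each phase, as on the slide.
    print(offset, " ".join(read_triplets(sequence, offset)[:7]))
```

Shifting the start by one letter changes every word in the list, which is why fragments read in different phases have different triplet frequencies.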
![Page 9: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/9.jpg)
Non-coding parts
gtgagctgatgctagggr cgcacgaat
Point mutations: insertions, deletions
a
![Page 10: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/10.jpg)
The flower-like 7-cluster structure is flat
[Figure: chart with horizontal axis from 1 to 29 and vertical axis from 0 to 0.16]
![Page 11: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/11.jpg)
Seven classes vs Seven clusters
Stanford
TIGR
Georgia Institute of Technology
![Page 12: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/12.jpg)
Computational gene prediction
Accuracy >90%
![Page 13: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/13.jpg)
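Mean-field approximation for triplet frequencies
$$F_{IJK} \approx P^1_I P^2_J P^3_K$$
$F_{IJK}$: frequency of triplet $IJK$ ($I, J, K \in \{A, C, G, T\}$):
$F_{AAA}, F_{AAT}, F_{AAC}, \ldots, F_{GGC}, F_{GGG}$: 64 numbers
position-specific letter frequency + correlations
$P^j_I$ (frequency of letter $I$ at position $j$ of the triplet): 12 numbers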
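A short numerical sketch of the mean-field idea, assuming the 64 triplet frequencies are stored as a 4×4×4 array indexed by the letters at positions 1, 2 and 3. The position-specific frequencies are the marginals of that array, and the mean-field value is their product. The function names and the random test distribution are illustrative only.

```python
import numpy as np

def position_frequencies(F):
    """Marginal letter frequencies at triplet positions 1, 2 and 3.

    F has shape (4, 4, 4); F[i, j, k] is the frequency of triplet IJK.
    """
    P1 = F.sum(axis=(1, 2))
    P2 = F.sum(axis=(0, 2))
    P3 = F.sum(axis=(0, 1))
    return P1, P2, P3

def mean_field(P1, P2, P3):
    """Product of the position-specific frequencies, shape (4, 4, 4)."""
    return np.einsum("i,j,k->ijk", P1, P2, P3)

# Random distribution used only to exercise the code, not real codon usage.
rng = np.random.default_rng(1)
F = rng.random((4, 4, 4))
F /= F.sum()

P1, P2, P3 = position_frequencies(F)   # 3 x 4 = 12 numbers
M = mean_field(P1, P2, P3)             # 64 numbers built from those 12
correlations = F - M                   # what the product does not capture
print(P1.size + P2.size + P3.size, M.size)
```

The residual `correlations` sums to zero, and `M` has the same position-specific letter frequencies as `F`.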
![Page 14: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/14.jpg)
Why hexagonal symmetry?
0-+
-+0
+0-
+-0
-0+
0+-
GC-content = $P_C + P_G$
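One possible reading of the six labels on this slide is that they are the three cyclic shifts of `0-+` together with the three cyclic shifts of its reversal `+-0`. The sketch below only checks that combinatorial observation. Linking shifts to reading-frame phases and reversal to the opposite strand is an assumption here, not something stated on the slide.

```python
def cyclic_shifts(s):
    """All rotations of the string s, starting with s itself."""
    return [s[i:] + s[:i] for i in range(len(s))]

base = "0-+"
labels = cyclic_shifts(base) + cyclic_shifts(base[::-1])
print(labels)
```

The six strings come out in the same order as they are listed on the slide, which is consistent with a six-fold arrangement.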
![Page 15: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/15.jpg)
Genome codon usage and mean-field approximation
ggtgaATG gat gct agg … gtc gca cgc TAAtgagct
…
correct frameshift
64 frequencies $F_{IJK}$
…
ggtgaATG gat gct agg … gtc gca cgc TAAtgagct
12 frequencies $P^1_I$, $P^2_J$, $P^3_K$
![Page 16: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/16.jpg)
$P^j_I$ are linear functions of GC-content
eubacteria
archaea
![Page 17: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/17.jpg)
THE MYSTERY OF TWO STRAIGHT LINES ???
$\mathbb{R}^{12} \to \mathbb{R}^{64}$
$F_{IJK} = P^1_I P^2_J P^3_K$ + correlations
![Page 18: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/18.jpg)
Codon usage signature
0-+
![Page 19: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/19.jpg)
19 possible eubacterial signatures
![Page 20: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/20.jpg)
Example: Palindromic signatures
![Page 21: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/21.jpg)
Four symmetry types of the basic 7-cluster structure
eubacteria
flower-like
degenerated
perpendicular triangles
parallel triangles
![Page 22: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/22.jpg)
B. halodurans (GC=44%)
S. coelicolor (GC=72%)
F. nucleatum (GC=27%)
E. coli (GC=51%)
![Page 23: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/23.jpg)
Web-site
http://www.ihes.fr/~zinovyev/7clusters
cluster structures in genomic sequences
![Page 24: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/24.jpg)
Human genome (chr19)
non-repetitive sequences
repetitive sequences
singles doublets triplets
![Page 25: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/25.jpg)
Letter frequencies (3 dimensions)
[Figure: values for the letters a, c, g, t along components 1, 2 and 3; vertical axis from -0.8 to 0.8]
GC-content (50%)
Purine-Pyrimidine (33%)
Amino-Keto (17%)
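The three labels match the three ways of splitting the four letters into two pairs: G, C against A, T; purines A, G against pyrimidines C, T; and amino A, C against keto G, T. A minimal sketch that computes these three contrasts from letter frequencies follows. The function names and the toy input are hypothetical, and the sketch makes no attempt to reproduce the percentages on the slide.

```python
from collections import Counter

def letter_frequencies(sequence):
    """Relative frequencies of a, c, g, t (other letters are ignored)."""
    counts = Counter(ch for ch in sequence.lower() if ch in "acgt")
    total = sum(counts.values())
    return {ch: counts[ch] / total for ch in "acgt"}

def letter_contrasts(freq):
    """Three pairwise contrasts of the four letter frequencies."""
    return {
        "gc_content": freq["g"] + freq["c"] - freq["a"] - freq["t"],
        "purine_pyrimidine": freq["a"] + freq["g"] - freq["c"] - freq["t"],
        "amino_keto": freq["a"] + freq["c"] - freq["g"] - freq["t"],
    }

freq = letter_frequencies("aacg")  # toy input, not genomic data
contrasts = letter_contrasts(freq)
print(contrasts)
```

Because the four frequencies sum to one, these three contrasts carry all the information in the letter frequencies, which fits the "3 dimensions" in the slide title.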
![Page 26: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/26.jpg)
Non-linear good 2D representation (elastic principal manifolds)
[Figure: 2D map with regions labeled A, T, G, C; scale from 0% to 100%]
![Page 27: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/27.jpg)
Measuring densities
[Figure: two panels, each labeled A, T, G, C]
![Page 28: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/28.jpg)
Contrasting density distribution (two ideas)
• Noise is Gaussian
• Noise is smooth
![Page 29: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/29.jpg)
Contrasted density
[Figure: two panels, each labeled A, T, G, C]
![Page 30: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/30.jpg)
Excluding repeats
[Figure: two panels, each labeled A, T, G, C]
![Page 31: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/31.jpg)
Excluding repeats
[Figure: two panels, each labeled A, T, G, C]
![Page 32: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/32.jpg)
Papers (type Zinovyev in Google)
Gorban A, Zinovyev A. PCA deciphers genome. 2005. Arxiv preprint
Gorban A, Popova T, Zinovyev A. Codon usage trajectories and 7-cluster structure of 143 complete bacterial genomic sequences. 2005. Physica A 353, 365-387
Gorban A, Popova T, Zinovyev A. Four basic symmetry types in the universal 7-cluster structure of microbial genomic sequences. 2005. In Silico Biology 5, 0025
Gorban A, Zinovyev A, Popova T. Seven clusters in genomic triplet distributions. 2003. In Silico Biology 3, 0039
Zinovyev A, Gorban A, Popova T. Self-Organizing Approach for Automated Gene Identification. 2003. Open Systems and Information Dynamics 10 (4)
![Page 33: Seven clusters and four types of symmetry in microbial genomes Andrei Zinovyev Bioinformatics service Math@Bio group of M.Gromov Tatyana Popova R&D Centre](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492f9e0/html5/thumbnails/33.jpg)
People
Dr. Tanya Popova, Institute of Computational Modeling, Russia
Professor Alexander Gorban, University of Leicester, UK