Download - Using genomic-based information for the modelling of bacterial environments and lifestyle
USING GENOMIC-BASED INFORMATION FOR THE MODELLING OF BACTERIAL ENVIRONMENTS AND LIFESTYLE
Shiri Freilich
Eytan Ruppin, Roded SharanSchool of Computer Sciences Tel Aviv UniversityMay 2009
Species evolve to adapt to their environment.
Environment/lifestyle Phenotype Genotype
Can we use the genotype to predict the lifestyle of a species?
FROM GENOMIC INFORMATION TO PHENOTYPIC (METABOLIC) INFORMATION
Genomicinformation
Metabolicinformation
GeneEnzyme
GENOMIC ERA: SYSTEMATIC CONSTRUCTION OF HUNDREDS OF METABOLIC NETWORKS
Genomicinformation
Metabolicinformation
Hundreds offully sequencedbacterial species
FROM METABOLIC INFORMATION TO ENVIRONMENTAL INFORMATION
Metabolicinformation
Environmentalinformation
Internal metaboliteExternal metabolite
Predicted natural metabolic environments
D-GLUCOSE IS AN EXAMPLE OF AN EXTERNALMETABOLITE
NEW APPROACHES ALLOW RECONSTRUCTION OF SPECIES’ METABOLIC-ENVIRONMENTS
From Borenstein et al, PNAS 2008
Based on the network topology, identifying the set of compounds that are exogenously acquired
Internal metabolite
External metabolite
CONSTRUCTING PREDICTED ENVIRONMENTS ACROSS HUNDREDS OF SPECIES
Metabolicinformation
Environmentalinformation
Predicted metabolic environments
SO WHAT DO WE HAVE AND WHAT IS IT GOOD FOR?
Metabolic networks
Environments
Species
Genomes
?
• Can we characterize the lifestyle of a species based onGenomic attributes?
• How does the structure of the metabolic network reflectadaptation to species’ lifestyle?
• Can we characterize ecological strategies based on genomic attributes?
• Can we characterize ecological communities based on genomic attributes?
• Why should we do it?
FIRST QUESTION
Metabolic networks
Environments
Species
Genomes
?
• Can we characterize the lifestyle of the species based onGenomic attributes?
Can we predict, based on genomic knowledge, whether a speciesis a specialist or generalist?Can we estimate the range of environments it can inhabit?
AND TO BE MORE SPECIFIC – WE SHOULD COUNT IN HOW MANY ENVIRONMENTS A SPECIES LIVES
Predicted metabolic environments
External/input metabolites
Internal/essential metabolites
GENOMIC-BASE PREDICTED DIVERSITY CORRESPONDS WITH ECOLOGICAL KNOWLEDGE
Specific examples :
Pseudomonas aeruginosa
Desulfotalea psychrophila
Genomic- based predicted environments
√ √ √√xx
NCBI annotations
Available systematic estimates/information for environmental variability
Fraction of reg. genes
Multiple
Specialized
High
{Low}
Beyond specific examples:strong correlation between metabolic-environment variability and established measures of environment variability
WE NOW HAVE AN ENVIRONMENTAL MODEL
ViableNot viable
Environmental viability matrix
Env 1 Env 2Env N
Spc N
Spc 1Spc 2 Information on species
Information on environment
SECOND QUESTION
Metabolic networks
Environments
Species
Genomes
?
• How does the structure of the metabolic network reflects adaptation to species’ lifestyle?
Studying essentialityof reactions acrosshundreds of bacterial-species across many simulated growth-environments.
ESSENTIALITY OF ENZYMES ACROSS SPECIES AND ENVIRONMENTS
Predicted environments
√ √ √√xx
x √ √Enzy
mes
Predicted environments
√ √ √√xx
x √ √
Environment I:
α β
γ δ
ε ζ
η
α
γ δ
ε ζ
η
Environment II: Environment III:
β
γ δ
ε ζ
η
External metabolite
Intermediate product
Essential biomass product
Backed-up reaction
Essential reaction
Conditional-dependent reaction
Accuracy: 0.86 (E. coli ) and 0.85 (B. subtilis)
Taking an enzymes’ point of view
High-throughput identification of essential reactions across species species-specific (or group-specific) essentiality looking for drug targets against reaction with a wide phylogenetic coverage.
This approach can be applied for highlighting essentiality in
groups of medical, ecological or agricultural interest, e.g., human pathogens versus human commensals.
pathogens
Enzy
mes
Backed-up reaction
Essential reaction
Conditional-dependent reaction
Non pathogenic-bacteria
One example:
10-Formyl-THF
fMet-tRNA
Initiation of protein synthesis
Purine synthesis
THF 5,10-MethenylTHFFTL MCH
Most commensals Few pathogens
Most commensals Most pathogens
Potential drug targetMCH: Methenyltetrahydrofolate cyclohydrolase FTL: Formyltetrahydrofolate synthetase
TAKING A SPECIES POINT OF VIEW: ESTIMATING ROBUSTNESS OF METABOLIC NETWORK
4/7 Backed-up reaction
1/7 Essential reaction
2/7 Conditional-dependent reaction
α β
γ δ
ε ζ
η
The fraction of reaction across species
Environmental diversity
Frac
tion
Mean ~0.75
Descriptionofnetwork-robustness
E.coli: 0.78 (0.83 backed-up genes)
M. genitalium: 0.35 (0.2-0.45 backed-up genes)
GENETIC ROBUSTNESS What is it? The ability of a biological system
to continue functioning following mutations How robust are biological systems? Under
laboratory conditions most genes are dispensable; dispensability depends on the experimental setting.
How can we explain robustness in evolutionary terms? The origin of robustness is under debate: Direct selection in favor of resistance to
mutations By product of the selection for other traits (e.g.,
increasing steady-state metabolic fluxes) Genetic robustness reflects environmental
robustness
NETWORK ROBUSTNESS: ENVIRONMENTAL-DEPENDENT AND INDEPENDENT COMPONENTS
The fraction of reaction across species
Environmental diversity
Frac
tion
Correlation: 0.8
Correlation: 0.1
environmentally-dependent component component is strongly associated with environmental diversity (rho=0.8) and responsible for the robustness of no more than 20% of metabolic reactions over all species and environments modeled.
How can we explain the environmentally-independent component?
ENVIRONMENTALLY-INDEPENDENT ROBUSTNESS IS ASSOCIATED WITH THE METABOLIC CAPACITIES
Obse
rved
gro
wth
rate
(log
)
Prediction for growth rate (log), based on network robustness
How can we explain the environmentally-independent component?The environmentally-independent component is associated (correlation=~0.6) with the metabolic capacities of a species -- higher robustness is observed in fast-growers or in organisms with an extensive production of secondary-metabolites.
SECOND QUESTION
Metabolic networks
Environments
Species
Genomes
?
• How does the structure of the metabolic network reflectadaptation to species’ lifestyle?
The design of metabolic networks represents a species-specific adaptation to both its needs and its environment.
THIRD QUESTION
Metabolic networks
Environments
Species
Genomes
?
• The structure of the metabolic network reflectsadaptation to species’ lifestyle. Can we characterize complexecological attributes based on genomic attributes?
Can we predict the level ofcompetition a species encounters in its natural environments andits rate of growth?
ONCE AGAIN – OUR ENVIRONMENTAL MODEL
Viable
Not viable
Environmental viability matrix
Env 1 Env 2 Env N
Spc N
Spc 1
Spc 2 Information on species
Information on environment
APPLYING THE ENVIRONMENTAL-MODEL FOR THE CHARACTERIZATION OF ECOLOGICAL ATTRIBUTES – COMPETITION
Viable
Not viable
Environmental Viability MatrixEnv 1 Env 2 Env 3
Spc 3
Spc 1
Spc 2
Spc 4
Env 4
Co-Habitation vector
Spc 3
Spc 1
Spc 2
{1,3,2}
{3}
{3,2}
Max-CHS
Spc 4 {1}
3
3
31
Environments populated by bacteria of an annotated lifestyleM
ean
leve
l of p
opul
atio
n
Population of environments are in agreement withecological knowledge
DELINEATING ECOLOGICAL STRATEGIES FOR RATE OF GROWTH:
Environments diversity
Max
imal
co-
habi
tatio
n
Ecological diversity with intense co-inhabitation, associated with a typical fast rate of growth.
A specialized niche with little co-inhabitation, associated with a typical slow rate of growth
THIRD QUESTION
Metabolic networks
Environments
Species
Genomes
?
• Can we characterize ecological attributes based on genomic attributes?
The patterns observed suggests a universal principle where metabolic flexibility is associated with a need to grow fast, possibly in the face of competition.Beyond specific examples, the interplay between the environmental diversity – and maximal co habitationallows training a predictor for growth rate (ROC score of 0.75 )
FOURTH QUESTION
Metabolic networks
Environments
Species
Genomes
?
• Can we characterize ecological Communities based on genomic attributes?
Characterization of pair-wise relationship between bacterial species to identify competitionand cooperation
SPECIES DO NOT LIVE IN A VACUUM
WHY SHOULD WE MODEL COMMUNITIES? The composition of bacterial communities is
a major factor in human health. Variations in the identity and abundance of
species affect its metabolic potential and hence have important medical implications.
Computational approaches can now be applied for the modeling of bacterial interactions.
The ultimate goal is to be able to manipulate bacterial communities to our advantage.
FIRST STEP: MODELING PAIR-WISE INTERACTIONS
Environmental viability matrix
Env 1 Env 2 Env N
Spc N
Spc 1Spc 2
Pairwise interactions data
Spc1 Spc2 Spc N
Spc N
Spc 1Spc 2
New type of data:
Interaction (competition/cooperation)
No interaction
Viable
Not viable
Environments populated by bacteria of an annotated lifestyle
Mea
n le
vel o
f pop
ulat
ion
LACK OF SYSTEMATIC KNOWLEDGE OF PAIR-WISE INTERACTIONS
Modelling lifestyle
Systematic knowledge for species-specific lifestyle
Modeling pairwise interactions
Env 1 Env 2 Env N
Spc N
Spc 1Spc 2
3
1
2
Spc1 Spc2 Spc N
Spc N
Spc 1Spc 2
Systematic knowledge for pairwise interactions
Metagenomic data?
WHAT ARE METAGENOMIC DATA?
IDENTIFYING “OUR” SPECIES IN ENVIRONMENTAL SAMPLES
Species represented by 16s rRNA
Spc 1
Spc 2
Gut
Spc 3
Environmental samples
Marine
Soil
BLAST
CONSTRUCTING EXPERIMENTALLY-DRIVEN PAIRWISE INTERACTIONS DATA
Environmental samples
Gut Marine PM3
Spc 3
Spc 1Spc 2
Spc 4
Spc 5
Spc 6
Environmentally-drivenDatabase of interactions
Spc 3
Spc 1Spc 2
Spc 4
Spc 5
Spc 6
Spc 1 Spc 2 Spc 3 Spc 4 Spc 5 Spc 6
-134 species (including 47 species in the gut and 81 marine species)-~1200 interactions (limited to interactions within the gut and between marine species)
PUBMED IS A LARGE AND COMPREHENSIVE DATA SOURCE
Müller & Mancuso, Plos ONE, 2008
Co-occurrence analysis is a technique often applied in text mining, comparative genomics, and promoter analysis. Co-occurrence between genes and proteins was shown to reveal functional interactions.
CONSTRUCTING LITERATURE-DRIVEN PAIRWISE INTERACTIONS DATA
Papers in Pubmed
PM1 PM2 PM3
Spc 3
Spc 1Spc 2
Spc 4
Spc 5
Spc 6
Co-occurrence basedDatabase of interactions
Spc 3
Spc 1Spc 2
Spc 4
Spc 5
Spc 6
Spc 1 Spc 2 Spc 3 Spc 4 Spc 5 Spc 6
-All species are covered by the database (~400 species)-~6000 interactions (cut-offs vary in dependence with the statistical approach taken)
CO-OCCURRENCE BASED INTERACTIONS DATA ARE IN AGREEMENT WITH ECOLOGICAL PROPERTIES
Num
ber of co-associated partners
Spc1 Spc2 Spc 3
Spc 3
Spc 1Spc 2
2
1
0
Pairwise interaction data
Significant enrichment in experimentally-based
interactions
Spec. Obli. Aqua. Hoas. Mult. Soil
Strong correlationwith systematic annotations
of ecological diversity
PAIRWISE INTERACTIONS OF GUT BACTERIA
Typical gut bacteria
Potential gut bacteria
Potential human-associatedbacteria
Other
THANKS
Anat KreimerUri GophnaRoded SharanEytan Ruppin
Nir YosefIsaac MeilijsonElhanan BorensteinMoshe Mevarech