environmental conditions and transcriptional regulation in escherichia coli: a physiological...
TRANSCRIPT
Environmental Conditions andTranscriptional Regulation in Escherichiacoli: A Physiological Integrative Approach
Agustino Martınez-Antonio, Heladia Salgado, Socorro Gama-Castro,Rosa Marıa Gutierrez-Rıos, Veronica Jimenez-Jacinto, Julio Collado-Vides
Program of Computational Genomics, CIFN, UNAM AP 565-A Cuernavaca,Morelos 62100, Mexico; telephone: +527-77-313-2063; fax: +527-77-317-5581; e-mail: [email protected]
Published online 24 November 2003 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/bit.10846
Abstract: Bacteria develop a number of devices for sens-ing, responding, and adapting to different environmentalconditions. Understanding within a genomic perspectivehow the transcriptional machinery of bacteria is modulat-ed, as a response to changing conditions, is a major chal-lenge for biologists. Knowledge of which genes are turnedon or turned off under specific conditions is essential forour understanding of cell behavior. In this study we de-scribe how the information pertaining to gene expressionand associated growth conditions (even with very littleknowledge of the associated regulatory mechanisms) isgathered from the literature and incorporated into Regulon-DB, a database on transcriptional regulation and operonorganization in E. coli. The link between growth condi-tions, signal transduction, and transcriptional regulationis modeled in the database in a simple format that high-lights biological relevant information. As far as we know,there is no other database that explicitly clarifies the effectof environmental conditions on gene transcription. We dis-cuss how this knowledge constitutes a benchmark that willimpact future research aimed at integration of regulatoryresponses in the cell; for instance, analysis of microarrays,predicting culture behavior in biotechnological processes,and comprehension of dynamics of regulatory networks.This integrated knowledge will contribute to the future goalof modeling the behavior of E. coli as an entire cell. TheRegulonDB database can be accessed on the web at theURL: http://www.cifn.unam.mx/Computational_Biology/regulondb/. B 2003 Wiley Periodicals, Inc.
Keywords: Escherichia coli; transcriptional regulation;environmental conditions; transcription factors; gene ex-pression
INTRODUCTION
Bacteria are the most successful life forms, with different
species able to colonize in the most extreme and diverse
conditions. Although they have a compact genome, bacteria
develop and exploit all imaginable devices, at the molec-
ular, physiological, and cellular levels, to survive and pro-
liferate (Vicente et al., 1999). Studies of bacteria in the
laboratory can frequently give a false impression of their
normal growth and survival abilities because their behavior
in the laboratory can, for many reasons, differ markedly
from their behavior in nature (Bruggeman et al., 2000). One
of the reasons for this difference is that exposure to stress is
a normal way of life for bacteria in nearly all the natural
situations (Konopka, 2000). In nature, as in bioreactors,
bacteria need to incorporate from the milieu the necessary
nutritional elements required for their survival (Taylor and
Zhulin, 1998). A free-living bacterium maintains a constant
monitoring of the extracellular physicochemical conditions
(Armitage, 1997; Lengeler, 2000; Ninfa, 1996; Pellequer
et al., 1999; Poolman et al., 2002). This is achieved by
genes whose products are involved in sensing and
incorporating different nutritional elements, as well as
sensing the concentration of elements of insult to respond
and modify their gene expression patterns to adjust growth
(Koretke et al., 2000; Rowbury, 2001; Rui and Tse-Dinh,
2003). These sensing systems are either part of the
transcriptional machinery as in the two-component systems
or are connected through intermediates and products of
metabolic pathways. Indeed, many small metabolites act as
allosteric effectors that bind transcriptional factors (TFs) to
modulate their activity (Monod, 1966). With the large
number of available genomes, a new perspective of
molecular mechanisms can be explored that integrates
physiological conditions with regulatory processes at the
cellular level (Cases et al., 2003). For instance, a recent
study with the E.coli TFs identified domains for binding
small metabolites for more than half of the total set of
known regulators (Babu and Teichmann, 2003; Perez-
Rueda and Collado-Vides, 2000).
E. coli is probably the most thoroughly studied model
organism (Blattner et al., 1997) and from which much of our
current knowledge of microbial physiology, biochemistry,
B 2003 Wiley Periodicals, Inc.
Correspondence to: J. Collado-Vides
Contract grant sponsors: NIH; CONACyT of Mexico
Contract grant numbers: GM62205-02; 0028
and bacterial genetics began. Some of the presently more
active fields of research involve modeling of biochemical
and genetic pathways (Shen-Orr et al., 2002), as well as the
articulation of one of the most ambitious projects: modeling
of E. coli as an entire cell (Holden, 2002). Databases that
gather organized information on molecular mechanisms and
function of cell machinery represent an essential infra-
structure for such ambitious goals, but also represent an
invaluable knowledge repository for basic and biotechno-
logical research. Thus, the development of such integrative
databases for the combined analysis of various types of
physiological data is a major goal of much of the current
research in life sciences. RegulonDB is a relational database
containing information on mechanisms at the level of
transcriptional initiation as well as operon organization with
their promoters and terminator signals in the Escherichia
coli K-12 genome (Salgado et al., 2001). The database is
updated constantly by searching in original publications, and
is complemented by computational predictions (Salgado
et al., 2000). RegulonDB has been used in different types of
analyses by the scientific community, such as: prediction of
regulatory binding sites (Thieffry et al., 1998b); prediction
of operons (Ermolaeva et al., 2001; Salgado et al., 2000);
reconstruction of metabolic pathways using regulatory
information (Covert et al, 2001); studies on the evolution
of regulatory mechanisms (Zheng et al., 2002); in a gram-
matical modeling of transcriptional regulation (Collado-
Vides, 1992); and in other analyses of regulatory networks
(Babu and Teichmann, 2003; Shen-Orr et al., 2002; Thieffry
et al., 1998). Furthermore, the mechanistic information
gathered has been included in the EcoCyc database (Karp
et al., 2002).
This review describes how we have incorporated in
RegulonDB the effects of environmental conditions on gene
transcription, and exemplifies possible applications to some
research areas that would benefit from this information.
CELL MECHANISMS FOR SENSING THEENVIRONMENTAL CONDITIONS
In natural environments, there are numerous stresses to
which microorganisms can be exposed, such as heat, cold,
lethal levels of metal ions, chlorine, acrylate, and NH3, or
attack by colicins, bacteriophages, or other biological agents
such as protozoa. In the case of organisms that contaminate
food, stress occurs due to exposure to heat, cold, irradiation,
acidity, salts, or high sugar concentration. Infecting orga-
nisms entering into animals, plants, or the human body en-
counter numerous stresses, such as extreme pH levels,
chlorine, oxidative components, enzymes, and bile salts
(Rowbury, 2001). Usually these responses follow an induc-
ible process whereby specific genes are switched on by the
interaction of the triggering agent with a stimulus sensor.
Information-processing or signal-transduction pathways
recognize various physicochemical stimuli, amplify and pro-
cess signals, and trigger an adaptive response. For instance,
in a two-component transduction system, the DNA-binding
transcriptional regulator is activated by phosphorylation,
generally on an aspartate residue, by a transmembrane ki-
nase. The TFs that belong to this class in E. coli and other
bacteria include BvgA, ComA, DctR, DegU, EvgA, FimZ,
FixJ, GacA, GlpR, NarL, NarP, NodW, RcsB, and UhpA,
among others (Oshima et al., 2002). A second class of
regulators are activated when bound to autoinducer mole-
cules such as N-(3-oxohexanoyl)-L-homoserine lactone
(OHHL), involved in bacteria cell–cell communication.
The proteins that belong to this class are CarR, EchR,
EsaR, ExpR, LasR, LuxR, PhzR, RhlR, TraR, and YenR.
The Sentra database contains a classification of signal-
transduction proteins for over 43 prokaryote genomes
(Maltsev et al., 2002).
The diffusion of substrates into the cell is the central
problem in nutrient limitation conditions; therefore, to maxi-
mize transport, increased synthesis of membrane-bound
permeases for the limiting nutrient or de novo synthesis
of high-affinity transport systems is a frequent strategy
(Konopka, 2000). In this manner, the change in concen-
tration of small-molecule metabolites or physicochemical
Figure 1. Sensing mechanism, pathway transduction, and transcriptional
regulation response in E. coli. The cpx two-component system responds to
alterations in the folding of membrane proteins. This system is induced by
conditions such as high pH or protein overexpression. The flow of infor-
mation to express the yihE –dsbA operon as a response to NlpE over-
expression in an in vitro study is as follows (Pogliano et al., 1997; Raivio and
Silhavy, 2001): (1) When NlpE is not overexpressed, the CpxA sensor
protein is maintained in an inactive state by the periplasmic inhibitor, CpxP.
(2) When NlpE is overexpressed and consequently misfolded, CpxA is
relieved of CpxP. (3) CpxA can be autophosphorylated (CpxA-P), then it
transfers the phosphate to the CpxR response regulator protein (CpxR-P).
(4) CpxR-P binds to �49-bp upstream promoter p2 of the yhiE –dsbA
operon, activating the transcription (yihE: unknown function; dsbA: disul-
fide isomerase, involved in the periplasmic protein folding by catalyzing
disulfide bonds formed between pairs of cysteines). Promoter p1 is not active
under this condition. (5) The system is turned off and returns to its previous
state when NlpE protein is not overexpressed or is correctly folded. OM,
outer membrane; PP, periplasm, IM, inner membrane; P, phosphate.
744 BIOTECHNOLOGY AND BIOENGINEERING, VOL. 84, NO. 7, DECEMBER 30, 2003
determinants can modulate the transcriptional response by
means of allosteric changes in specific TFs. In E. coli,
Salmonella, and other bacteria, some classic examples of
many others, of how gene expression changes as a con-
sequence of specific nutrients, such as carbon and ammo-
nium sources, have been reviewed recently (Dworkin and
Losick, 2001). Figure 1 illustrates one particular example of
the integration of sensing, pathway transduction, and tran-
scriptional regulation in response to misfolding of peri-
plasmic proteins in E. coli (Pogliano et al., 1997; Raivio and
Silhavy, 2001).
LINK BETWEEN ENVIRONMENTAL CONDITIONSAND GENE TRANSCRIPTION
The mechanisms by which cells sense and respond to
changes in their environment are difficult to elucidate in
part because, even for simple cells like bacteria, a change in
growth conditions can perturb the complex network of
metabolic interactions that govern the physiological state
of the cell. The number of genes affected can be appre-
ciated in the several available microarray experiments
wherein we have estimated that no less than 100 genes are
affected in a single shift of conditions (data not shown).
Therefore, it is difficult to identify the early, specific steps
in the myriad of intermediate metabolites responsible for
adaptive response that lead to the overall effects in gene
expression. The total set of metabolites or metabolome, as
identified by nuclear magnetic resonance (NMR) and mass
spectroscopy, has been shown to be considerably different
when comparing shifts from microaerophilic, aerobic, or
anaerobic metabolism in E. coli (Frey et al., 2001).
Mixtures of nonlabeled and labeled substrates enable one
to follow the catabolic and anabolic processes and the
intermediates formed, wherein the resulting flux maps
may be regarded as the phenotypic correlate of a micro-
array expression profile (Phelps et al., 2002). Thus, the
majority of transcription regulators respond to cell physi-
ology directly by the presence or absence of small metab-
olites in addition to two-component sensing systems
(Anantharaman et al., 2001, Babu and Teichmann, 2003,
Oltvai and Barabasi, 2002). Table I shows the metabolites
that interact with TFs in E. coli based on the current
knowledge from RegulonDB. We can observe metabolite
intermediates as sugars, amino acids, nucleotides, vitamins,
metal ions, and physicochemical determinants such as
redox conditioners.
GATHERING THE EFFECT OF ENVIRONMENTALCONDITIONS ON GENE TRANSCRIPTION
Many laboratories have analyzed different experimental
conditions and studied their effect on gene transcription,
and other laboratories continue to do so. If we were to
gather detailed quantitative information on experimental
growth conditions and parameters it would it would be very
labor-intensive, but also its utility would be questionable.
Such an effort in metabolism was already undertaken by
Seiko and colleagues (Overbeek et al., 2000) in what
became the WIT database. After an analysis of several
different conditions and the systems involved, we chose to
favor discrete models that take into account growth
conditions and their effect in the regulation of gene
expression. We concentrated on a strategy whereby the
following properties and descriptions are considered
essential: (i) a global or general descriptor of the condition
studied; (ii) the control or initial condition; (iii) the
specifically tested experimental condition; (iv) the growth
media used, described in simple but precise terms; (v) the
list of affected genes; (vi) the type of effect (induction,
repression, or no effect) on the genes analyzed; (vii) the
experimental evidence; and (viii) the associated references.
We are aware that this is a simplified discrete modeling of a
very rich and complex biological and experimental setting.
It is clear that growth media can be specified in great detail,
as well as the effect of signal transduction mechanisms on
transcriptional regulation.
In addition to the previously mentioned properties and
descriptions considered essential or obligatory, in some
Table I. List of metabolite effectors gathered on RegulonDB that affect
the conformation and regulatory activity of TFs in E. coli.
Sugars Amino acids
6-Phosphogluconate Alanine
Acyl-CoA D-serine
Allolactose Glycine
Arabinose Homocysteine
Deoxyribose-5-phosphate L-arginine
Formate Leucine
Fructose-1,6-biphosphate L-tryptophan
Fructose-1-phosphate N-acetyl-L-serine
Fucose O-acetyl-L-serine
Galactose Phenylalanine
Glucitol Tyrosine
Gluconate S-adenosylmethionine
Glycerol-1-phosphate Choline
Gylcerol-3-phosphate
Glycolate Vitamins
Maltotriose-ATP Biotin-5V-AMP
Melibiose Vitamin B12 (cyanocobalamin)
a-Cetohydroxibutyrate
a-Cetolactate Ions
Phosphoenol pyruvate Ferric complex
Pyruvate Na+
L-rhamnose Zn+
Threalose-6-phosphate Mn+
Raffinose Molybdate
Ribose Phosphate
Xylose
Others
Nucleotides Redox cell state
Adenosine Cyanate
ATP Formate
cAMP Methylation
Cytidine Salicylate
Guanine Sulfur
Hypoxanthine Thiosulfate
Purine
Xanthosine
MARTINEZ-ANTONIO ET AL.: GROWTH CONDITIONS AND REGULATION OF TRANSCRIPTION IN E. COLI 745
experiments, additional knowledge is generated by specify-
ing some of the components involved in the connection
between changes in conditions and gene expression. This
includes mechanistic properties that are already part of
the RegulonDB model, such as: (i) the intermediate metab-
olites or proteins that participate in the regulatory sensing
mechanism; (ii) the TF involved, especially in cases in which
the experiment is studying mutations of TFs; (iii) the
allosteric conformation and associated effector; (iv) the set
of sites in the DNA involved in the regulation; and (v) the
gene and transcription unit that the gene belongs to,
specifying the promoter and terminators involved. Ideally,
an efficient database design requires selection of a
representation that best models the information of interest.
Table II illustrates how the data in Figure 1 are organized in
RegulonDB using a more suitable format for automatic anal-
yses. This information can then be easily transformed into a
discrete format, appropriate for computational analyses,
depending on a particular biological question. For example,
0 might represent absence, inactive conformation, or
repression, whereas 1 might represent the opposite.
INFORMATION GATHERED ABOUTENVIRONMENTAL CONDITIONS ON REGULONDB
Table III provides a summary of the information we gathered
until July 2003, specifically with regard to physiological
conditions and their effect on gene expression. It is important
to note that a global condition may be represented by differ-
ent specific conditions, and a gene may respond, of course, to
one or several experimental conditions. The numbers in the
Table III account for unique cases; for instance, so far we
have conditions for 325 different genes, although some may
be affected by more than one condition. Table IV shows the
global conditions annotated so far, and the corresponding
numbers of genes and TFs involved. These are the main
global conditions that have been analyzed experimentally.
As already suggested, compilation of this type of informa-
tion is usually more complicated than the gathering of
mechanistic information, due mostly to the heterogeneity in
the specific experimental conditions, buffers, and concen-
trations of metabolites used; times of sampling; different
Table II. Textual form of growth conditions and its effect in gene transcription (as gathered RegulonDB)—the same information as in the Figure 1.
Global condition Control condition Specific condition Medium Gene Effect INa CTRb RIc TUd PMe
Protein overproduction No-NlpE overproduction NlpE overproduction Minimal yihE Induction CpxP CpxR-CpxAP �49 YihE-dsbA p2
Protein overproduction No-NlpE overproduction NlpE overproduction Minimal dsbA Induction CpxP CpxR-CpxAP �49 YihE-dsbA p2
aIntermediate.bTF conformation.cCenter position of operator site.dTranscription unit (promoter, terminator).ePromoters.
Table III. Summary of elements gathered related to the effect of growth
conditions on gene transcription, as of July 2003.
Objects Total
Global conditionsa 29
Experimental conditionsb 83
Genes 325
Transcription units 56
Promoters 115
Conformations 39
Transcription factors 32
Intermediates 29
References 227
Types of evidence 3
aGlobal condition refers to one general growth condition; for instance,
anaerobiosis or carbon source.bExperimental conditions refer to more specific growth conditions
included in one global condition; for instance, the transition in growth
media from glucose to glycerol or other precise carbon source.
Table IV. Total of genes affected by global growth conditions and the
TFs involved in its regulation.
Global condition Total genes Promoters
Transcription
factors
Aerobiosis 8 7 0
Alkylating condition 2 1 0
Amino acids 35 27 27
Anaerobiosis 78 48 36
Antibiotic stress 30 18 0
Carbon starvation 21 9 6
Cell density 6 3 1
Cold shock 7 1 1
Exponential phase 11 5 0
Heat shock 38 23 2
Hydrostatic pressure 6 6 0
Iron starvation 2 0 2
Nitrogen starvation 13 1 0
Nonoptimal carbon source 99 1 2
Nucleotides 4 2 1
Osmotic shift 65 22 2
Oxidative stress 32 13 23
Protein overproduction 20 9 3
Rich medium 3 1 0
Stationary phase 69 20 1
Sulfur limited 1 1 1
Metals metabolism 12 0 0
Motility 2 2 2
Optimal carbon source (glucose) 30 18 4
pH shift 20 15 5
Phosphate starvation 8 3 3
Stringent response 6 2 0
Vitamins 1 1 0
746 BIOTECHNOLOGY AND BIOENGINEERING, VOL. 84, NO. 7, DECEMBER 30, 2003
methods of quantification; different ways of preparing the
inocula; different strains, etc.
The different physiological (global) conditions affecting
the same genes are shown in Figure 2. With the limited data
collected so far, it can be observed that most of the genes
(67%) are affected by only one condition, and fewer genes
are affected by two or more conditions. For instance, bolA,
hmpA, and hyaA respond to six different global conditions.
BolA is a regulator of cell-wall biosynthetic enzymes with
different roles in cell morphology and cell division; it is es-
sential for normal cell morphology in stationary phase and
under conditions of starvation (Santos et al., 2002). Flavo-
hemoglobin HmpA is involved in protective responses
to oxidative stress, specifically exposition to nitric oxide
(Cruz-Ramos et al., 2002). HyaA is one hydrogenase-1
small subunit involved in aerobic respiration (Menon et al.,
1991). These proteins involved in cell morphology, oxida-
tive stress response, and aerobic respiration give us an idea
of the type of activity that should be part of the core
required for multiple-condition responses. As the informa-
tion in RegulonDB grows more complete, it will become
possible to develop a more complete picture of the network
of physiological responses in bacteria.
A more detailed example of the overlapping of genes
used in three different global conditions is shown in a Venn
diagram in Figure 3. The number of genes responding to
stationary phase, osmotic shift, and nonoptimal carbon
source (any carbon source other than glucose) is shown in
Figure 3A; we can observe a small fraction of genes
common to all three conditions, as well as others common
to any two conditions. In one of the first global analyses,
Chuang et al. (1993), employing 400 segments of the E. coli
genome, found that the expression pattern was strikingly
similar when comparing responses to a strong osmotic
shock and to those entering stationary phase, indicating that
both conditions trigger a similar response. In fact, growth
stopped in both conditions. In Figure 3B, the Venn diagram
shows the shared known TFs that modulate the responses to
these three conditions. These were obtained based on the
information in RegulonDB describing the regulators of the
shared expressed genes shown in Figure 3A. The well-
known global regulators are present, such as CRP, Hns, and
IHF, among others (Martinez-Antonio and Collado-Vides,
2003). A further observation is that genes shared by the
three conditions, with the exception of osmY (regulation
unknown), tend to be regulated by two or more TFs, and
some of them have two promoters, suggesting that multiple
promoters, sometimes for different sigma factors, and
multiple regulators enable flexible response patterns. In the
example shown, the j38 factor in addition to j70 is required
for response to these conditions. Similar analyses in other
conditions strengthen the notion that few genes respond to
multiple environmental perturbations, whereas the remain-
ing genes fall into two categories: those expressed in a subset
of physiologically equivalent conditions and those expressed
under specific conditions (data not shown). We expect that,
in the near future, it will be possible to elucidate a set of
‘‘housekeeping’’ genes, including global TFs and any other
gene expressed from multiple promoters (for different sigma
factors) and/or multiple DNA-binding sites (for different
TFs); furthermore, genes required for individual states will
also be defined.
Figure 2. Summary of genes affected as a response to different envi-
ronmental conditions. Most genes are affected by only one condition,
although some genes are affected by as many as six different conditions.
Figure 3. Venn diagram of the gene response pattern to three different
growth conditions. (A) Genes affected. (B) TFs modulating the transcrip-
tional response for the same genes. Nonoptimal carbon source refers to
growth in a source other than glucose.
MARTINEZ-ANTONIO ET AL.: GROWTH CONDITIONS AND REGULATION OF TRANSCRIPTION IN E. COLI 747
UTILITY OF INFORMATION COLLECTED ABOUTEFFECT OF ENVIRONMENTAL CONDITIONS ONGENE TRANSCRIPTION
We know that information about the effect of environmental
conditions on gene transcription is essential for making
RegulonDB perhaps the best benchmark of knowledge for
use in analysis, implementation, and comparison of exper-
imental approaches and of novel computational methods
dealing with microarray data and gene regulation. We used
RegulobDB for evaluating the consistency between the
literature knowledge and microarray expression data (R.M.
Gutierrez-Rios et al., 2003). A typical array experiment
generates thousands of data points and poses major
challenges for storing and processing data (Huerta et al.,
2002). Techniques for biological analysis of microarrays
rely mainly on clustering methods using different mathe-
matical approaches (Bowtell, 1999). The clustering algo-
rithms enable us to group genes with a similar expression
pattern. In all cases, a literature comparison of the data
(Richmond et al., 1999; Sabatti et al., 2002; Tao et al., 1999)
or northern blot experiments (Martin and Rosner, 2002;
Schembri et al., 2002) should be made to ensure that some of
the genes induced or repressed in the array act as in the
condition or conditions tested. The fulfillment of a literature
comparison or experimental validation takes valuable time
that could be saved if all or most of the expression values
were collected from the literature for individual experiments
and could be stored in a dependable database.
This information is also useful as a reference to generate
hypotheses when designing experiments on other related
bacteria based on conservation of some aspects of the mo-
lecular mechanisms. Importantly, these data could help to
identify the affected genes in specific conditions and help
to understand and predict the culture behavior in a flask
or in a bioreactor without deliberate manipulation; for in-
stance, predicting the conditions that affect biotechnolog-
ical process, such as overexpression of proteins, polymers,
or other recombinant cellular components.
Another use is related to the dynamic descriptions of
regulatory networks. One way to address this problem is to
focus on the regulation of TFs themselves (Thieffry et al.,
1998). In this case, RegulonDB is an important data reser-
voir that provides an estimate of 25% of the E. coli network.
This occurs without considering comprehensive predictions
of operons, promoters, and TFs also contained in the data-
base. This information, if appropriately organized, can be
used to infer the logic of the regulatory wiring. By putting
together the results of global or isolated experiments that
affect the transcriptional regulation of genes, we can outline
the fraction of the total regulatory network that E. coli uses
in a given condition. Once we are able to identify the con-
nections between TFs and their corresponding environmen-
tal conditions, we may be able to predict additional genes
affected under such conditions.
Finally, the formalization of gene regulation into a
grammatical model that links the gene regulation mecha-
nisms with their physiology is strongly dependent on data
contained in RegulonDB (Collado-Vides et al., 1998). The
long-term goal is to organize a model that describes a
dynamic set of options for gene regulation, starting with a
description of the structural and the physiological properties
of the organization of operons and their regulatory domains.
CONCLUSIONS
Genomes and postgenomic high-throughput experimental
methodologies are paving the way for implementation of
models of a complete cell. Cell modeling expresses much of
the need and tendency of a new biology. Databases with
knowledge gathered from the literature, together with
bioinformatic methods, are an important aspect of this
integrated modeling goal. Because the behavior of the cell is
the component most easily observed, we will eventually need
to deal with a representation of media and growth conditions
in a form linked to, or directly included within, databases.
Given the complexity in physicochemical properties of
growth media, it is not easy to find a simple and adequate
level for their computational representation. We decided to
initiate such modeling in a strategy that combines simplicity
and biological relevance—for instance, by specifying
general or global conditions and the specific experimental
manipulation of such conditions. To our knowledge, there is
no other database that explicitly contains the effect of
environmental conditions on gene transcription. Computa-
tional genomics has grown in methods and goals, moving
from a sequence-centered approach to one in which
regulatory networks and interactions have become the main
focus of research. Physiological states impact the rapidity of
the microbial responses to perturbations that occur in natural
ecosystems, industrial bioreactors, and in situ and ex situ
bioremediation strategies. We believe that the current
expansion of data gathered and organized in RegulonDB
will reinforce and contribute to the work aimed at the goal of
modeling the behavior of E. coli as an entire cell.
The authors acknowledge an anonymous referee and Arturo
Medrano-Soto for the critical comments on the manuscript, and
Cesar Bonavides and Vıctor del Moral for their computer support.
References
Anantharaman V, Koonin EV, Aravind L. 2001. Regulatory potential,
phyletic distribution and evolution of ancient, intracellular small-
molecule-binding domains. J Mol Biol 307:1271– 1292.
Armitage JP. 1997. Behavioural responses of bacteria to light and oxygen.
Arch Microbiol 168:249–261.
Babu MM, Teichmann SA. 2003. Evolution of transcription factors and gene
regulatory network in Escherichia coli. Nucl Acid Res 31:1234– 1244.
Blattner FR, Plunkett G III, Bloch CA, Perna NT, Burland V, Riley M,
Collado-Vides J, Glasner JD, Rode CK, Mayhew G, Gregor J, Davis
NW, Kirkpatrick H, Goeden M, Mau R, Shao Y. 1997. The complete
genome sequence of Escherichia coli. Science 277:1453–1462.
Bowtell DD. 1999. Options available—from start to finish—for obtaining
expression data by microarray. Nat Genet 21:25– 32.
Bruggeman FJ, van Heeswijk WC, Boorgerd FC, Whesterhoff HV. 2000.
748 BIOTECHNOLOGY AND BIOENGINEERING, VOL. 84, NO. 7, DECEMBER 30, 2003
Macromolecular intelligence in microorganisms. Biol Chem 138:
965– 972.
Cases I, de Lorenzo V, Ouzounis CA. 2003. Transcription regulation and
environmental adaptation in bacteria. Trends Microbiol 11:248– 253.
Chuang SE, Daniels DL, Blattner FR. 1993. Global regulation of gene
expression in Escherichia coli. J Bacteriol 175:2026–2036.
Collado-Vides J. 1992. Grammatical model of the regulation of gene
expression. Proc Natl Acad Sci USA 89:9405– 9409.
Collado-Vides J, Gutierrez-Rıos RM, Bel-Enguix G. 1998. Networks of
transcriptional regulation encoded in a grammatical model. Biosys-
tems 47:103– 118.
Cruz-Ramos H, Crack J, Wu G, Hughes MN, Scott C, Thomson AJ, Green
J, Poole RK. 2002. NO sensing by FNR: Regulation of the Escherichia
coliNO-detoxifying flavohaemoglobin, Hmp. EMBO J 21:3235–3244.
Covert MW, Schilling CH, Palsson B. 2001. Regulation of gene expression
in flux balance models of metabolism. J Theor Biol 213:73– 88.
Dworkin J, Losick R. 2001. Linking nutritional status to gene activation
and development. Gene Dev 15:1051– 1054.
Ermolaeva MD, White O, Salzberg SL. 2001. Prediction of operons in
microbial genomes. Nucl Acids Res 29:1216 – 1221.
Frey AD, Fiaux J, Szyperski T, Wuthrich K, Bailey JE, Kallio PT. 2001.
Dissection of central carbon metabolism of hemoglobin-expressing
Escherichia coli by 13C nuclear magnetic resonance flux distribution
analysis in microaerobic bioprocesses. Appl Environ Microbiol 67:
680– 687.
Guitierrez-Rios RM, Rosenblueth DA, Loza JA, Huerta AM, Glasner JD,
Blattner FR, Collado-Vides J. 2003. Regulatory network of Esche-
richia coli: Consistency between literature knowledge and microarray
profiles. Genome Res 13:2435– 2443.
Holden C. 2002. Alliance launched to model E. coli. Science 297:
1459– 1460.
Huerta AM, Glasner JD, Jin H, Blattner FD, Gutierrez-Rıos RM, Collado-
Vides J. 2002. GETools: Gene expression tool for analysis of tran-
scriptome experiments in Escherichia coli. Trends Genet 18:217–218.
Karp PD, Riley M, Saier M, Paulsen IT, Collado-Vides J, Paley SM,
Pellegrini-Toole A, Bonavides C, Gama-Castro S. 2002. The EcoCyc
database. Nucl Acids Res 30:56– 58.
Konopka A. 2000. Microbial physiological state at low growth rate in
natural and engineered ecosystems. Curr Opin Microbiol 3:244–247.
Koretke KK, Lupas AN, Warren PV, Rosenberg M, Brown JR. 2000.
Evolution of two-component signal transduction. Mol Biol Evol 17:
1956– 1970.
Lengeler JW. 2000. Metabolic networks: A signal-oriented approach to
cellular models. Biol Chem 381:911–920.
Maltsev N, Marland E, Yu GX, Bhatnagar S, Lusk R. 2002. Sentra, a
database of signal transduction proteins. Nucl Acids Res 30:349–350.
Martin RG, Rosner JL. 2002. Genomics of the marA/soxS/rob regulon of
Escherichia coli: identification of directly activated promoters by
application of molecular genetics and informatics to microarray data.
Mol Microbiol 44:1611 – 1624.
Martinez-Antonio A, Collado-Vides J. 2003. Identifying global regulators
in transcriptional regulatory networks in bacteria. Current Opinion in
Microbiol 6:482– 489.
Menon NK, Robbins J, Wendt JC, Shanmugam KT, Przybyla AE.
1991. Mutational analysis and characterization of the Escherichia
coli hya operon, which encodes [NiFe] hydrogenase 1. J Bacteriol 173:
4851– 4861.
Monod J. 1966. From enzymatic adaptation to allosteric transitions.
Science 154:475–483.
Ninfa AJ. 1996. Regulation of gene transcription by extracellular stimuli. In:
Cellular and Molecular Biology: Escherichia coli and Salmonella, edn
2nd. Edited by Neidhardt FC, Curtiss III R, Ingraham J, Lin ECC, Low
KB, Magasanik B, Reznikoff W, Schaechter M, Umbarger HE and Riley
M. Washington, DC: American Society for Microbiology. 1246–1261.
Oltvai ZN, Barabasi AL. 2002. Systems biology. Life’s complexity
pyramid. Science 298:763– 764.
Oshima T, Aiba H, Masuda Y, Kanaya S, Sugiura M, Wanner BL, Mori H,
Mizuno T. 2002. Transcriptome analysis of all two-component reg-
ulatory system mutants of Escherichia coli K-12. Mol Microbiol 46:
281–291.
Overbeek R, Larsen N, Pusch GD, D’Souza M, Selkov E Jr, Kyrpides N,
Fonstein M, Maltsev N, Selkov E. 2000. WIT: Integrated system for
high-throughput genome sequence analysis and metabolic reconstruc-
tion. Nucl Acids Res 28:123– 125.
Pellequer JL, Brudler R, Getzoff ED. 1999. Biological sensors: More than
one way to sense oxygen. Curr Biol 9:R416– R418.
Perez-Rueda E, Collado-Vides J. 2000. The repertoire of DNA-binding
TFs in Escherichia coli. Nucl Acids Res 28:1838– 1847.
Phelps TJ, Palumbo AV, Beliaev AS. 2002. Metabolomics and microarrays
for improved understanding of phenotypic characteristics controlled by
both genomics and environmentalal constraints. Curr Opin Biotechnol
13:20– 24.
Pogliano J, Lynch AS, Belin D, Lin EC, Beckwith J. 1997. Regulation
of Escherichia coli cell envelope proteins involved in protein folding
and degradation by the Cpx two-component system. Gene Dev 11:
1169– 1182.
Poolman B, Blount P, Folgering JH, Friesen RH, Moe PC, van der Heide
T. 2002. How do membrane proteins sense water stress? Mol Micro-
biol 44:889– 902.
Raivio TL, Silhavy TJ. 2001. Periplasmic stress and ECF sigma factors.
Annu Rev Microbiol 55:591– 624.
Richmond CS, Glasner JD, Mau R, Jin H, Blattner FR. 1999. Genome-
wide expression profiling in Escherichia coli K-12. Nucl Acids Res
27:3821– 3835.
Rowbury RJ. 2001. Extracellular sensing components and extracellular
induction component alarmones give early warning against stress in
Escherichia coli. Adv Microbiol Physiol 44:215–257.
Rui S, Tse-Dinh YC. 2003. Topoisomerase function during bacterial
responses to environmental challenge. Front Biosci 1:D256– D263.
Sabatti C, Rohlin L, Oh MK, Liao JC. 2002. Co-expression pattern from
DNA microarray experiments as a tool for operon prediction. Nucl
Acids Res 30:2886–2893.
Salgado H, Moreno-Hagelsieb G, Smith TF, Collado-Vides J. 2000.
Operons in Escherichia coli: genomic analyses and predictions. Proc
Natl Acad Sci USA 97:6652– 6657.
Salgado H, Santos-Zavaleta A, Gama-Castro S, Millan-Zarate D, Dıaz-
Peredo E, Sanchez-Solano F, Perez-Rueda E, Bonavides-Martınez C,
Collado-Vides J. 2001. RegulonDB (version 3.2): Transcriptional
regulation and operon organization in Escherichia coli K. Nucl Acids
Res 29:72– 74.
Santos JM, Lobo M, Matos AP, De Pedro MA, Arraiano CM. 2002. The
gene bolA regulates dacA (PBP5), dacC (PBP6) and ampC (AmpC),
promoting normal morphology in Escherichia coli. Mol Microbiol
45:1729– 1740.
Schembri MA, Ussery DW, Workman C, Hasman H, Klemm P. 2002.
DNA microarray analysis of fim mutations in Escherichia coli. Mol
Genet Genomics 267:721–729.
Shen-Orr SS, Milo R, Mangan S, Alon U. 2002. Network motifs in the
transcriptional regulation network of Escherichia coli. Nat Genet 31:
64–68.
Tao H, Bausch C, Richmond C, Blattner FR, Conway T. 1999. Functional
genomics: Expression analysis of Escherichia coli growing on minimal
and rich media. J Bacteriol 181:6425– 6440.
Taylor BL, Zhulin IB. 1998. In search of higher energy: Metabolism-
dependent behavior in bacteria. Mol Microbiol 28:683– 690.
Thieffry D, Huerta AM, Perez-Rueda E, Collado-Vides J. 1998. From
specific gene regulation to genomic networks: A global analysis of
transcriptional regulation in Escherichia coli. BioEssays 20:433– 440.
Thieffry D, Salgado H, Huerta AM, Collado-Vides J. 1998b. Prediction of
transcription regulatory sites in the complete genome of Escherichia
coli. Bioinformatics 14:391– 400.
Vicente M, Chater KF, de Lorenzo V. 1999. Bacterial transcription factors
involved in global regulation. Mol Microbiol 33:8 –17.
Zheng Y, Szustakowski JD, Fortnow L, Roberts RJ, Kasif S. 2002.
Computational identification of operons in microbial genomes. Ge-
nome Res 12:1221 – 1230.
MARTINEZ-ANTONIO ET AL.: GROWTH CONDITIONS AND REGULATION OF TRANSCRIPTION IN E. COLI 749