environmental conditions and transcriptional regulation in escherichia coli: a physiological...

7
Environmental Conditions and Transcriptional Regulation in Escherichia coli: A Physiological Integrative Approach Agustino Martı ´nez-Antonio, Heladia Salgado, Socorro Gama-Castro, Rosa Marı ´a Gutie ´rrez-Rı ´os, Vero ´nica Jime ´nez-Jacinto, Julio Collado-Vides Program of Computational Genomics, CIFN, UNAM AP 565-A Cuernavaca, Morelos 62100, Mexico; telephone: +527-77-313-2063; fax: +527-77-317- 5581; e-mail: ecoli-t1 @ cifn.unam.mx Published online 24 November 2003 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/bit.10846 Abstract: Bacteria develop a number of devices for sens- ing, responding, and adapting to different environmental conditions. Understanding within a genomic perspective how the transcriptional machinery of bacteria is modulat- ed, as a response to changing conditions, is a major chal- lenge for biologists. Knowledge of which genes are turned on or turned off under specific conditions is essential for our understanding of cell behavior. In this study we de- scribe how the information pertaining to gene expression and associated growth conditions (even with very little knowledge of the associated regulatory mechanisms) is gathered from the literature and incorporated into Regulon- DB, a database on transcriptional regulation and operon organization in E. coli. The link between growth condi- tions, signal transduction, and transcriptional regulation is modeled in the database in a simple format that high- lights biological relevant information. As far as we know, there is no other database that explicitly clarifies the effect of environmental conditions on gene transcription. We dis- cuss how this knowledge constitutes a benchmark that will impact future research aimed at integration of regulatory responses in the cell; for instance, analysis of microarrays, predicting culture behavior in biotechnological processes, and comprehension of dynamics of regulatory networks. This integrated knowledge will contribute to the future goal of modeling the behavior of E. coli as an entire cell. The RegulonDB database can be accessed on the web at the URL: http://www.cifn.unam.mx/Computational_Biology/ regulondb/. B 2003 Wiley Periodicals, Inc. Keywords: Escherichia coli; transcriptional regulation; environmental conditions; transcription factors; gene ex- pression INTRODUCTION Bacteria are the most successful life forms, with different species able to colonize in the most extreme and diverse conditions. Although they have a compact genome, bacteria develop and exploit all imaginable devices, at the molec- ular, physiological, and cellular levels, to survive and pro- liferate (Vicente et al., 1999). Studies of bacteria in the laboratory can frequently give a false impression of their normal growth and survival abilities because their behavior in the laboratory can, for many reasons, differ markedly from their behavior in nature (Bruggeman et al., 2000). One of the reasons for this difference is that exposure to stress is a normal way of life for bacteria in nearly all the natural situations (Konopka, 2000). In nature, as in bioreactors, bacteria need to incorporate from the milieu the necessary nutritional elements required for their survival (Taylor and Zhulin, 1998). A free-living bacterium maintains a constant monitoring of the extracellular physicochemical conditions (Armitage, 1997; Lengeler, 2000; Ninfa, 1996; Pellequer et al., 1999; Poolman et al., 2002). This is achieved by genes whose products are involved in sensing and incorporating different nutritional elements, as well as sensing the concentration of elements of insult to respond and modify their gene expression patterns to adjust growth (Koretke et al., 2000; Rowbury, 2001; Rui and Tse-Dinh, 2003). These sensing systems are either part of the transcriptional machinery as in the two-component systems or are connected through intermediates and products of metabolic pathways. Indeed, many small metabolites act as allosteric effectors that bind transcriptional factors (TFs) to modulate their activity (Monod, 1966). With the large number of available genomes, a new perspective of molecular mechanisms can be explored that integrates physiological conditions with regulatory processes at the cellular level (Cases et al., 2003). For instance, a recent study with the E.coli TFs identified domains for binding small metabolites for more than half of the total set of known regulators (Babu and Teichmann, 2003; Pe ´rez- Rueda and Collado-Vides, 2000). E. coli is probably the most thoroughly studied model organism (Blattner et al., 1997) and from which much of our current knowledge of microbial physiology, biochemistry, B 2003 Wiley Periodicals, Inc. Correspondence to: J. Collado-Vides Contract grant sponsors: NIH; CONACyT of Mexico Contract grant numbers: GM62205-02; 0028

Upload: agustino-martinez-antonio

Post on 06-Jun-2016

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Environmental conditions and transcriptional regulation in Escherichia coli: a physiological integrative approach

Environmental Conditions andTranscriptional Regulation in Escherichiacoli: A Physiological Integrative Approach

Agustino Martınez-Antonio, Heladia Salgado, Socorro Gama-Castro,Rosa Marıa Gutierrez-Rıos, Veronica Jimenez-Jacinto, Julio Collado-Vides

Program of Computational Genomics, CIFN, UNAM AP 565-A Cuernavaca,Morelos 62100, Mexico; telephone: +527-77-313-2063; fax: +527-77-317-5581; e-mail: [email protected]

Published online 24 November 2003 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/bit.10846

Abstract: Bacteria develop a number of devices for sens-ing, responding, and adapting to different environmentalconditions. Understanding within a genomic perspectivehow the transcriptional machinery of bacteria is modulat-ed, as a response to changing conditions, is a major chal-lenge for biologists. Knowledge of which genes are turnedon or turned off under specific conditions is essential forour understanding of cell behavior. In this study we de-scribe how the information pertaining to gene expressionand associated growth conditions (even with very littleknowledge of the associated regulatory mechanisms) isgathered from the literature and incorporated into Regulon-DB, a database on transcriptional regulation and operonorganization in E. coli. The link between growth condi-tions, signal transduction, and transcriptional regulationis modeled in the database in a simple format that high-lights biological relevant information. As far as we know,there is no other database that explicitly clarifies the effectof environmental conditions on gene transcription. We dis-cuss how this knowledge constitutes a benchmark that willimpact future research aimed at integration of regulatoryresponses in the cell; for instance, analysis of microarrays,predicting culture behavior in biotechnological processes,and comprehension of dynamics of regulatory networks.This integrated knowledge will contribute to the future goalof modeling the behavior of E. coli as an entire cell. TheRegulonDB database can be accessed on the web at theURL: http://www.cifn.unam.mx/Computational_Biology/regulondb/. B 2003 Wiley Periodicals, Inc.

Keywords: Escherichia coli; transcriptional regulation;environmental conditions; transcription factors; gene ex-pression

INTRODUCTION

Bacteria are the most successful life forms, with different

species able to colonize in the most extreme and diverse

conditions. Although they have a compact genome, bacteria

develop and exploit all imaginable devices, at the molec-

ular, physiological, and cellular levels, to survive and pro-

liferate (Vicente et al., 1999). Studies of bacteria in the

laboratory can frequently give a false impression of their

normal growth and survival abilities because their behavior

in the laboratory can, for many reasons, differ markedly

from their behavior in nature (Bruggeman et al., 2000). One

of the reasons for this difference is that exposure to stress is

a normal way of life for bacteria in nearly all the natural

situations (Konopka, 2000). In nature, as in bioreactors,

bacteria need to incorporate from the milieu the necessary

nutritional elements required for their survival (Taylor and

Zhulin, 1998). A free-living bacterium maintains a constant

monitoring of the extracellular physicochemical conditions

(Armitage, 1997; Lengeler, 2000; Ninfa, 1996; Pellequer

et al., 1999; Poolman et al., 2002). This is achieved by

genes whose products are involved in sensing and

incorporating different nutritional elements, as well as

sensing the concentration of elements of insult to respond

and modify their gene expression patterns to adjust growth

(Koretke et al., 2000; Rowbury, 2001; Rui and Tse-Dinh,

2003). These sensing systems are either part of the

transcriptional machinery as in the two-component systems

or are connected through intermediates and products of

metabolic pathways. Indeed, many small metabolites act as

allosteric effectors that bind transcriptional factors (TFs) to

modulate their activity (Monod, 1966). With the large

number of available genomes, a new perspective of

molecular mechanisms can be explored that integrates

physiological conditions with regulatory processes at the

cellular level (Cases et al., 2003). For instance, a recent

study with the E.coli TFs identified domains for binding

small metabolites for more than half of the total set of

known regulators (Babu and Teichmann, 2003; Perez-

Rueda and Collado-Vides, 2000).

E. coli is probably the most thoroughly studied model

organism (Blattner et al., 1997) and from which much of our

current knowledge of microbial physiology, biochemistry,

B 2003 Wiley Periodicals, Inc.

Correspondence to: J. Collado-Vides

Contract grant sponsors: NIH; CONACyT of Mexico

Contract grant numbers: GM62205-02; 0028

Page 2: Environmental conditions and transcriptional regulation in Escherichia coli: a physiological integrative approach

and bacterial genetics began. Some of the presently more

active fields of research involve modeling of biochemical

and genetic pathways (Shen-Orr et al., 2002), as well as the

articulation of one of the most ambitious projects: modeling

of E. coli as an entire cell (Holden, 2002). Databases that

gather organized information on molecular mechanisms and

function of cell machinery represent an essential infra-

structure for such ambitious goals, but also represent an

invaluable knowledge repository for basic and biotechno-

logical research. Thus, the development of such integrative

databases for the combined analysis of various types of

physiological data is a major goal of much of the current

research in life sciences. RegulonDB is a relational database

containing information on mechanisms at the level of

transcriptional initiation as well as operon organization with

their promoters and terminator signals in the Escherichia

coli K-12 genome (Salgado et al., 2001). The database is

updated constantly by searching in original publications, and

is complemented by computational predictions (Salgado

et al., 2000). RegulonDB has been used in different types of

analyses by the scientific community, such as: prediction of

regulatory binding sites (Thieffry et al., 1998b); prediction

of operons (Ermolaeva et al., 2001; Salgado et al., 2000);

reconstruction of metabolic pathways using regulatory

information (Covert et al, 2001); studies on the evolution

of regulatory mechanisms (Zheng et al., 2002); in a gram-

matical modeling of transcriptional regulation (Collado-

Vides, 1992); and in other analyses of regulatory networks

(Babu and Teichmann, 2003; Shen-Orr et al., 2002; Thieffry

et al., 1998). Furthermore, the mechanistic information

gathered has been included in the EcoCyc database (Karp

et al., 2002).

This review describes how we have incorporated in

RegulonDB the effects of environmental conditions on gene

transcription, and exemplifies possible applications to some

research areas that would benefit from this information.

CELL MECHANISMS FOR SENSING THEENVIRONMENTAL CONDITIONS

In natural environments, there are numerous stresses to

which microorganisms can be exposed, such as heat, cold,

lethal levels of metal ions, chlorine, acrylate, and NH3, or

attack by colicins, bacteriophages, or other biological agents

such as protozoa. In the case of organisms that contaminate

food, stress occurs due to exposure to heat, cold, irradiation,

acidity, salts, or high sugar concentration. Infecting orga-

nisms entering into animals, plants, or the human body en-

counter numerous stresses, such as extreme pH levels,

chlorine, oxidative components, enzymes, and bile salts

(Rowbury, 2001). Usually these responses follow an induc-

ible process whereby specific genes are switched on by the

interaction of the triggering agent with a stimulus sensor.

Information-processing or signal-transduction pathways

recognize various physicochemical stimuli, amplify and pro-

cess signals, and trigger an adaptive response. For instance,

in a two-component transduction system, the DNA-binding

transcriptional regulator is activated by phosphorylation,

generally on an aspartate residue, by a transmembrane ki-

nase. The TFs that belong to this class in E. coli and other

bacteria include BvgA, ComA, DctR, DegU, EvgA, FimZ,

FixJ, GacA, GlpR, NarL, NarP, NodW, RcsB, and UhpA,

among others (Oshima et al., 2002). A second class of

regulators are activated when bound to autoinducer mole-

cules such as N-(3-oxohexanoyl)-L-homoserine lactone

(OHHL), involved in bacteria cell–cell communication.

The proteins that belong to this class are CarR, EchR,

EsaR, ExpR, LasR, LuxR, PhzR, RhlR, TraR, and YenR.

The Sentra database contains a classification of signal-

transduction proteins for over 43 prokaryote genomes

(Maltsev et al., 2002).

The diffusion of substrates into the cell is the central

problem in nutrient limitation conditions; therefore, to maxi-

mize transport, increased synthesis of membrane-bound

permeases for the limiting nutrient or de novo synthesis

of high-affinity transport systems is a frequent strategy

(Konopka, 2000). In this manner, the change in concen-

tration of small-molecule metabolites or physicochemical

Figure 1. Sensing mechanism, pathway transduction, and transcriptional

regulation response in E. coli. The cpx two-component system responds to

alterations in the folding of membrane proteins. This system is induced by

conditions such as high pH or protein overexpression. The flow of infor-

mation to express the yihE –dsbA operon as a response to NlpE over-

expression in an in vitro study is as follows (Pogliano et al., 1997; Raivio and

Silhavy, 2001): (1) When NlpE is not overexpressed, the CpxA sensor

protein is maintained in an inactive state by the periplasmic inhibitor, CpxP.

(2) When NlpE is overexpressed and consequently misfolded, CpxA is

relieved of CpxP. (3) CpxA can be autophosphorylated (CpxA-P), then it

transfers the phosphate to the CpxR response regulator protein (CpxR-P).

(4) CpxR-P binds to �49-bp upstream promoter p2 of the yhiE –dsbA

operon, activating the transcription (yihE: unknown function; dsbA: disul-

fide isomerase, involved in the periplasmic protein folding by catalyzing

disulfide bonds formed between pairs of cysteines). Promoter p1 is not active

under this condition. (5) The system is turned off and returns to its previous

state when NlpE protein is not overexpressed or is correctly folded. OM,

outer membrane; PP, periplasm, IM, inner membrane; P, phosphate.

744 BIOTECHNOLOGY AND BIOENGINEERING, VOL. 84, NO. 7, DECEMBER 30, 2003

Page 3: Environmental conditions and transcriptional regulation in Escherichia coli: a physiological integrative approach

determinants can modulate the transcriptional response by

means of allosteric changes in specific TFs. In E. coli,

Salmonella, and other bacteria, some classic examples of

many others, of how gene expression changes as a con-

sequence of specific nutrients, such as carbon and ammo-

nium sources, have been reviewed recently (Dworkin and

Losick, 2001). Figure 1 illustrates one particular example of

the integration of sensing, pathway transduction, and tran-

scriptional regulation in response to misfolding of peri-

plasmic proteins in E. coli (Pogliano et al., 1997; Raivio and

Silhavy, 2001).

LINK BETWEEN ENVIRONMENTAL CONDITIONSAND GENE TRANSCRIPTION

The mechanisms by which cells sense and respond to

changes in their environment are difficult to elucidate in

part because, even for simple cells like bacteria, a change in

growth conditions can perturb the complex network of

metabolic interactions that govern the physiological state

of the cell. The number of genes affected can be appre-

ciated in the several available microarray experiments

wherein we have estimated that no less than 100 genes are

affected in a single shift of conditions (data not shown).

Therefore, it is difficult to identify the early, specific steps

in the myriad of intermediate metabolites responsible for

adaptive response that lead to the overall effects in gene

expression. The total set of metabolites or metabolome, as

identified by nuclear magnetic resonance (NMR) and mass

spectroscopy, has been shown to be considerably different

when comparing shifts from microaerophilic, aerobic, or

anaerobic metabolism in E. coli (Frey et al., 2001).

Mixtures of nonlabeled and labeled substrates enable one

to follow the catabolic and anabolic processes and the

intermediates formed, wherein the resulting flux maps

may be regarded as the phenotypic correlate of a micro-

array expression profile (Phelps et al., 2002). Thus, the

majority of transcription regulators respond to cell physi-

ology directly by the presence or absence of small metab-

olites in addition to two-component sensing systems

(Anantharaman et al., 2001, Babu and Teichmann, 2003,

Oltvai and Barabasi, 2002). Table I shows the metabolites

that interact with TFs in E. coli based on the current

knowledge from RegulonDB. We can observe metabolite

intermediates as sugars, amino acids, nucleotides, vitamins,

metal ions, and physicochemical determinants such as

redox conditioners.

GATHERING THE EFFECT OF ENVIRONMENTALCONDITIONS ON GENE TRANSCRIPTION

Many laboratories have analyzed different experimental

conditions and studied their effect on gene transcription,

and other laboratories continue to do so. If we were to

gather detailed quantitative information on experimental

growth conditions and parameters it would it would be very

labor-intensive, but also its utility would be questionable.

Such an effort in metabolism was already undertaken by

Seiko and colleagues (Overbeek et al., 2000) in what

became the WIT database. After an analysis of several

different conditions and the systems involved, we chose to

favor discrete models that take into account growth

conditions and their effect in the regulation of gene

expression. We concentrated on a strategy whereby the

following properties and descriptions are considered

essential: (i) a global or general descriptor of the condition

studied; (ii) the control or initial condition; (iii) the

specifically tested experimental condition; (iv) the growth

media used, described in simple but precise terms; (v) the

list of affected genes; (vi) the type of effect (induction,

repression, or no effect) on the genes analyzed; (vii) the

experimental evidence; and (viii) the associated references.

We are aware that this is a simplified discrete modeling of a

very rich and complex biological and experimental setting.

It is clear that growth media can be specified in great detail,

as well as the effect of signal transduction mechanisms on

transcriptional regulation.

In addition to the previously mentioned properties and

descriptions considered essential or obligatory, in some

Table I. List of metabolite effectors gathered on RegulonDB that affect

the conformation and regulatory activity of TFs in E. coli.

Sugars Amino acids

6-Phosphogluconate Alanine

Acyl-CoA D-serine

Allolactose Glycine

Arabinose Homocysteine

Deoxyribose-5-phosphate L-arginine

Formate Leucine

Fructose-1,6-biphosphate L-tryptophan

Fructose-1-phosphate N-acetyl-L-serine

Fucose O-acetyl-L-serine

Galactose Phenylalanine

Glucitol Tyrosine

Gluconate S-adenosylmethionine

Glycerol-1-phosphate Choline

Gylcerol-3-phosphate

Glycolate Vitamins

Maltotriose-ATP Biotin-5V-AMP

Melibiose Vitamin B12 (cyanocobalamin)

a-Cetohydroxibutyrate

a-Cetolactate Ions

Phosphoenol pyruvate Ferric complex

Pyruvate Na+

L-rhamnose Zn+

Threalose-6-phosphate Mn+

Raffinose Molybdate

Ribose Phosphate

Xylose

Others

Nucleotides Redox cell state

Adenosine Cyanate

ATP Formate

cAMP Methylation

Cytidine Salicylate

Guanine Sulfur

Hypoxanthine Thiosulfate

Purine

Xanthosine

MARTINEZ-ANTONIO ET AL.: GROWTH CONDITIONS AND REGULATION OF TRANSCRIPTION IN E. COLI 745

Page 4: Environmental conditions and transcriptional regulation in Escherichia coli: a physiological integrative approach

experiments, additional knowledge is generated by specify-

ing some of the components involved in the connection

between changes in conditions and gene expression. This

includes mechanistic properties that are already part of

the RegulonDB model, such as: (i) the intermediate metab-

olites or proteins that participate in the regulatory sensing

mechanism; (ii) the TF involved, especially in cases in which

the experiment is studying mutations of TFs; (iii) the

allosteric conformation and associated effector; (iv) the set

of sites in the DNA involved in the regulation; and (v) the

gene and transcription unit that the gene belongs to,

specifying the promoter and terminators involved. Ideally,

an efficient database design requires selection of a

representation that best models the information of interest.

Table II illustrates how the data in Figure 1 are organized in

RegulonDB using a more suitable format for automatic anal-

yses. This information can then be easily transformed into a

discrete format, appropriate for computational analyses,

depending on a particular biological question. For example,

0 might represent absence, inactive conformation, or

repression, whereas 1 might represent the opposite.

INFORMATION GATHERED ABOUTENVIRONMENTAL CONDITIONS ON REGULONDB

Table III provides a summary of the information we gathered

until July 2003, specifically with regard to physiological

conditions and their effect on gene expression. It is important

to note that a global condition may be represented by differ-

ent specific conditions, and a gene may respond, of course, to

one or several experimental conditions. The numbers in the

Table III account for unique cases; for instance, so far we

have conditions for 325 different genes, although some may

be affected by more than one condition. Table IV shows the

global conditions annotated so far, and the corresponding

numbers of genes and TFs involved. These are the main

global conditions that have been analyzed experimentally.

As already suggested, compilation of this type of informa-

tion is usually more complicated than the gathering of

mechanistic information, due mostly to the heterogeneity in

the specific experimental conditions, buffers, and concen-

trations of metabolites used; times of sampling; different

Table II. Textual form of growth conditions and its effect in gene transcription (as gathered RegulonDB)—the same information as in the Figure 1.

Global condition Control condition Specific condition Medium Gene Effect INa CTRb RIc TUd PMe

Protein overproduction No-NlpE overproduction NlpE overproduction Minimal yihE Induction CpxP CpxR-CpxAP �49 YihE-dsbA p2

Protein overproduction No-NlpE overproduction NlpE overproduction Minimal dsbA Induction CpxP CpxR-CpxAP �49 YihE-dsbA p2

aIntermediate.bTF conformation.cCenter position of operator site.dTranscription unit (promoter, terminator).ePromoters.

Table III. Summary of elements gathered related to the effect of growth

conditions on gene transcription, as of July 2003.

Objects Total

Global conditionsa 29

Experimental conditionsb 83

Genes 325

Transcription units 56

Promoters 115

Conformations 39

Transcription factors 32

Intermediates 29

References 227

Types of evidence 3

aGlobal condition refers to one general growth condition; for instance,

anaerobiosis or carbon source.bExperimental conditions refer to more specific growth conditions

included in one global condition; for instance, the transition in growth

media from glucose to glycerol or other precise carbon source.

Table IV. Total of genes affected by global growth conditions and the

TFs involved in its regulation.

Global condition Total genes Promoters

Transcription

factors

Aerobiosis 8 7 0

Alkylating condition 2 1 0

Amino acids 35 27 27

Anaerobiosis 78 48 36

Antibiotic stress 30 18 0

Carbon starvation 21 9 6

Cell density 6 3 1

Cold shock 7 1 1

Exponential phase 11 5 0

Heat shock 38 23 2

Hydrostatic pressure 6 6 0

Iron starvation 2 0 2

Nitrogen starvation 13 1 0

Nonoptimal carbon source 99 1 2

Nucleotides 4 2 1

Osmotic shift 65 22 2

Oxidative stress 32 13 23

Protein overproduction 20 9 3

Rich medium 3 1 0

Stationary phase 69 20 1

Sulfur limited 1 1 1

Metals metabolism 12 0 0

Motility 2 2 2

Optimal carbon source (glucose) 30 18 4

pH shift 20 15 5

Phosphate starvation 8 3 3

Stringent response 6 2 0

Vitamins 1 1 0

746 BIOTECHNOLOGY AND BIOENGINEERING, VOL. 84, NO. 7, DECEMBER 30, 2003

Page 5: Environmental conditions and transcriptional regulation in Escherichia coli: a physiological integrative approach

methods of quantification; different ways of preparing the

inocula; different strains, etc.

The different physiological (global) conditions affecting

the same genes are shown in Figure 2. With the limited data

collected so far, it can be observed that most of the genes

(67%) are affected by only one condition, and fewer genes

are affected by two or more conditions. For instance, bolA,

hmpA, and hyaA respond to six different global conditions.

BolA is a regulator of cell-wall biosynthetic enzymes with

different roles in cell morphology and cell division; it is es-

sential for normal cell morphology in stationary phase and

under conditions of starvation (Santos et al., 2002). Flavo-

hemoglobin HmpA is involved in protective responses

to oxidative stress, specifically exposition to nitric oxide

(Cruz-Ramos et al., 2002). HyaA is one hydrogenase-1

small subunit involved in aerobic respiration (Menon et al.,

1991). These proteins involved in cell morphology, oxida-

tive stress response, and aerobic respiration give us an idea

of the type of activity that should be part of the core

required for multiple-condition responses. As the informa-

tion in RegulonDB grows more complete, it will become

possible to develop a more complete picture of the network

of physiological responses in bacteria.

A more detailed example of the overlapping of genes

used in three different global conditions is shown in a Venn

diagram in Figure 3. The number of genes responding to

stationary phase, osmotic shift, and nonoptimal carbon

source (any carbon source other than glucose) is shown in

Figure 3A; we can observe a small fraction of genes

common to all three conditions, as well as others common

to any two conditions. In one of the first global analyses,

Chuang et al. (1993), employing 400 segments of the E. coli

genome, found that the expression pattern was strikingly

similar when comparing responses to a strong osmotic

shock and to those entering stationary phase, indicating that

both conditions trigger a similar response. In fact, growth

stopped in both conditions. In Figure 3B, the Venn diagram

shows the shared known TFs that modulate the responses to

these three conditions. These were obtained based on the

information in RegulonDB describing the regulators of the

shared expressed genes shown in Figure 3A. The well-

known global regulators are present, such as CRP, Hns, and

IHF, among others (Martinez-Antonio and Collado-Vides,

2003). A further observation is that genes shared by the

three conditions, with the exception of osmY (regulation

unknown), tend to be regulated by two or more TFs, and

some of them have two promoters, suggesting that multiple

promoters, sometimes for different sigma factors, and

multiple regulators enable flexible response patterns. In the

example shown, the j38 factor in addition to j70 is required

for response to these conditions. Similar analyses in other

conditions strengthen the notion that few genes respond to

multiple environmental perturbations, whereas the remain-

ing genes fall into two categories: those expressed in a subset

of physiologically equivalent conditions and those expressed

under specific conditions (data not shown). We expect that,

in the near future, it will be possible to elucidate a set of

‘‘housekeeping’’ genes, including global TFs and any other

gene expressed from multiple promoters (for different sigma

factors) and/or multiple DNA-binding sites (for different

TFs); furthermore, genes required for individual states will

also be defined.

Figure 2. Summary of genes affected as a response to different envi-

ronmental conditions. Most genes are affected by only one condition,

although some genes are affected by as many as six different conditions.

Figure 3. Venn diagram of the gene response pattern to three different

growth conditions. (A) Genes affected. (B) TFs modulating the transcrip-

tional response for the same genes. Nonoptimal carbon source refers to

growth in a source other than glucose.

MARTINEZ-ANTONIO ET AL.: GROWTH CONDITIONS AND REGULATION OF TRANSCRIPTION IN E. COLI 747

Page 6: Environmental conditions and transcriptional regulation in Escherichia coli: a physiological integrative approach

UTILITY OF INFORMATION COLLECTED ABOUTEFFECT OF ENVIRONMENTAL CONDITIONS ONGENE TRANSCRIPTION

We know that information about the effect of environmental

conditions on gene transcription is essential for making

RegulonDB perhaps the best benchmark of knowledge for

use in analysis, implementation, and comparison of exper-

imental approaches and of novel computational methods

dealing with microarray data and gene regulation. We used

RegulobDB for evaluating the consistency between the

literature knowledge and microarray expression data (R.M.

Gutierrez-Rios et al., 2003). A typical array experiment

generates thousands of data points and poses major

challenges for storing and processing data (Huerta et al.,

2002). Techniques for biological analysis of microarrays

rely mainly on clustering methods using different mathe-

matical approaches (Bowtell, 1999). The clustering algo-

rithms enable us to group genes with a similar expression

pattern. In all cases, a literature comparison of the data

(Richmond et al., 1999; Sabatti et al., 2002; Tao et al., 1999)

or northern blot experiments (Martin and Rosner, 2002;

Schembri et al., 2002) should be made to ensure that some of

the genes induced or repressed in the array act as in the

condition or conditions tested. The fulfillment of a literature

comparison or experimental validation takes valuable time

that could be saved if all or most of the expression values

were collected from the literature for individual experiments

and could be stored in a dependable database.

This information is also useful as a reference to generate

hypotheses when designing experiments on other related

bacteria based on conservation of some aspects of the mo-

lecular mechanisms. Importantly, these data could help to

identify the affected genes in specific conditions and help

to understand and predict the culture behavior in a flask

or in a bioreactor without deliberate manipulation; for in-

stance, predicting the conditions that affect biotechnolog-

ical process, such as overexpression of proteins, polymers,

or other recombinant cellular components.

Another use is related to the dynamic descriptions of

regulatory networks. One way to address this problem is to

focus on the regulation of TFs themselves (Thieffry et al.,

1998). In this case, RegulonDB is an important data reser-

voir that provides an estimate of 25% of the E. coli network.

This occurs without considering comprehensive predictions

of operons, promoters, and TFs also contained in the data-

base. This information, if appropriately organized, can be

used to infer the logic of the regulatory wiring. By putting

together the results of global or isolated experiments that

affect the transcriptional regulation of genes, we can outline

the fraction of the total regulatory network that E. coli uses

in a given condition. Once we are able to identify the con-

nections between TFs and their corresponding environmen-

tal conditions, we may be able to predict additional genes

affected under such conditions.

Finally, the formalization of gene regulation into a

grammatical model that links the gene regulation mecha-

nisms with their physiology is strongly dependent on data

contained in RegulonDB (Collado-Vides et al., 1998). The

long-term goal is to organize a model that describes a

dynamic set of options for gene regulation, starting with a

description of the structural and the physiological properties

of the organization of operons and their regulatory domains.

CONCLUSIONS

Genomes and postgenomic high-throughput experimental

methodologies are paving the way for implementation of

models of a complete cell. Cell modeling expresses much of

the need and tendency of a new biology. Databases with

knowledge gathered from the literature, together with

bioinformatic methods, are an important aspect of this

integrated modeling goal. Because the behavior of the cell is

the component most easily observed, we will eventually need

to deal with a representation of media and growth conditions

in a form linked to, or directly included within, databases.

Given the complexity in physicochemical properties of

growth media, it is not easy to find a simple and adequate

level for their computational representation. We decided to

initiate such modeling in a strategy that combines simplicity

and biological relevance—for instance, by specifying

general or global conditions and the specific experimental

manipulation of such conditions. To our knowledge, there is

no other database that explicitly contains the effect of

environmental conditions on gene transcription. Computa-

tional genomics has grown in methods and goals, moving

from a sequence-centered approach to one in which

regulatory networks and interactions have become the main

focus of research. Physiological states impact the rapidity of

the microbial responses to perturbations that occur in natural

ecosystems, industrial bioreactors, and in situ and ex situ

bioremediation strategies. We believe that the current

expansion of data gathered and organized in RegulonDB

will reinforce and contribute to the work aimed at the goal of

modeling the behavior of E. coli as an entire cell.

The authors acknowledge an anonymous referee and Arturo

Medrano-Soto for the critical comments on the manuscript, and

Cesar Bonavides and Vıctor del Moral for their computer support.

References

Anantharaman V, Koonin EV, Aravind L. 2001. Regulatory potential,

phyletic distribution and evolution of ancient, intracellular small-

molecule-binding domains. J Mol Biol 307:1271– 1292.

Armitage JP. 1997. Behavioural responses of bacteria to light and oxygen.

Arch Microbiol 168:249–261.

Babu MM, Teichmann SA. 2003. Evolution of transcription factors and gene

regulatory network in Escherichia coli. Nucl Acid Res 31:1234– 1244.

Blattner FR, Plunkett G III, Bloch CA, Perna NT, Burland V, Riley M,

Collado-Vides J, Glasner JD, Rode CK, Mayhew G, Gregor J, Davis

NW, Kirkpatrick H, Goeden M, Mau R, Shao Y. 1997. The complete

genome sequence of Escherichia coli. Science 277:1453–1462.

Bowtell DD. 1999. Options available—from start to finish—for obtaining

expression data by microarray. Nat Genet 21:25– 32.

Bruggeman FJ, van Heeswijk WC, Boorgerd FC, Whesterhoff HV. 2000.

748 BIOTECHNOLOGY AND BIOENGINEERING, VOL. 84, NO. 7, DECEMBER 30, 2003

Page 7: Environmental conditions and transcriptional regulation in Escherichia coli: a physiological integrative approach

Macromolecular intelligence in microorganisms. Biol Chem 138:

965– 972.

Cases I, de Lorenzo V, Ouzounis CA. 2003. Transcription regulation and

environmental adaptation in bacteria. Trends Microbiol 11:248– 253.

Chuang SE, Daniels DL, Blattner FR. 1993. Global regulation of gene

expression in Escherichia coli. J Bacteriol 175:2026–2036.

Collado-Vides J. 1992. Grammatical model of the regulation of gene

expression. Proc Natl Acad Sci USA 89:9405– 9409.

Collado-Vides J, Gutierrez-Rıos RM, Bel-Enguix G. 1998. Networks of

transcriptional regulation encoded in a grammatical model. Biosys-

tems 47:103– 118.

Cruz-Ramos H, Crack J, Wu G, Hughes MN, Scott C, Thomson AJ, Green

J, Poole RK. 2002. NO sensing by FNR: Regulation of the Escherichia

coliNO-detoxifying flavohaemoglobin, Hmp. EMBO J 21:3235–3244.

Covert MW, Schilling CH, Palsson B. 2001. Regulation of gene expression

in flux balance models of metabolism. J Theor Biol 213:73– 88.

Dworkin J, Losick R. 2001. Linking nutritional status to gene activation

and development. Gene Dev 15:1051– 1054.

Ermolaeva MD, White O, Salzberg SL. 2001. Prediction of operons in

microbial genomes. Nucl Acids Res 29:1216 – 1221.

Frey AD, Fiaux J, Szyperski T, Wuthrich K, Bailey JE, Kallio PT. 2001.

Dissection of central carbon metabolism of hemoglobin-expressing

Escherichia coli by 13C nuclear magnetic resonance flux distribution

analysis in microaerobic bioprocesses. Appl Environ Microbiol 67:

680– 687.

Guitierrez-Rios RM, Rosenblueth DA, Loza JA, Huerta AM, Glasner JD,

Blattner FR, Collado-Vides J. 2003. Regulatory network of Esche-

richia coli: Consistency between literature knowledge and microarray

profiles. Genome Res 13:2435– 2443.

Holden C. 2002. Alliance launched to model E. coli. Science 297:

1459– 1460.

Huerta AM, Glasner JD, Jin H, Blattner FD, Gutierrez-Rıos RM, Collado-

Vides J. 2002. GETools: Gene expression tool for analysis of tran-

scriptome experiments in Escherichia coli. Trends Genet 18:217–218.

Karp PD, Riley M, Saier M, Paulsen IT, Collado-Vides J, Paley SM,

Pellegrini-Toole A, Bonavides C, Gama-Castro S. 2002. The EcoCyc

database. Nucl Acids Res 30:56– 58.

Konopka A. 2000. Microbial physiological state at low growth rate in

natural and engineered ecosystems. Curr Opin Microbiol 3:244–247.

Koretke KK, Lupas AN, Warren PV, Rosenberg M, Brown JR. 2000.

Evolution of two-component signal transduction. Mol Biol Evol 17:

1956– 1970.

Lengeler JW. 2000. Metabolic networks: A signal-oriented approach to

cellular models. Biol Chem 381:911–920.

Maltsev N, Marland E, Yu GX, Bhatnagar S, Lusk R. 2002. Sentra, a

database of signal transduction proteins. Nucl Acids Res 30:349–350.

Martin RG, Rosner JL. 2002. Genomics of the marA/soxS/rob regulon of

Escherichia coli: identification of directly activated promoters by

application of molecular genetics and informatics to microarray data.

Mol Microbiol 44:1611 – 1624.

Martinez-Antonio A, Collado-Vides J. 2003. Identifying global regulators

in transcriptional regulatory networks in bacteria. Current Opinion in

Microbiol 6:482– 489.

Menon NK, Robbins J, Wendt JC, Shanmugam KT, Przybyla AE.

1991. Mutational analysis and characterization of the Escherichia

coli hya operon, which encodes [NiFe] hydrogenase 1. J Bacteriol 173:

4851– 4861.

Monod J. 1966. From enzymatic adaptation to allosteric transitions.

Science 154:475–483.

Ninfa AJ. 1996. Regulation of gene transcription by extracellular stimuli. In:

Cellular and Molecular Biology: Escherichia coli and Salmonella, edn

2nd. Edited by Neidhardt FC, Curtiss III R, Ingraham J, Lin ECC, Low

KB, Magasanik B, Reznikoff W, Schaechter M, Umbarger HE and Riley

M. Washington, DC: American Society for Microbiology. 1246–1261.

Oltvai ZN, Barabasi AL. 2002. Systems biology. Life’s complexity

pyramid. Science 298:763– 764.

Oshima T, Aiba H, Masuda Y, Kanaya S, Sugiura M, Wanner BL, Mori H,

Mizuno T. 2002. Transcriptome analysis of all two-component reg-

ulatory system mutants of Escherichia coli K-12. Mol Microbiol 46:

281–291.

Overbeek R, Larsen N, Pusch GD, D’Souza M, Selkov E Jr, Kyrpides N,

Fonstein M, Maltsev N, Selkov E. 2000. WIT: Integrated system for

high-throughput genome sequence analysis and metabolic reconstruc-

tion. Nucl Acids Res 28:123– 125.

Pellequer JL, Brudler R, Getzoff ED. 1999. Biological sensors: More than

one way to sense oxygen. Curr Biol 9:R416– R418.

Perez-Rueda E, Collado-Vides J. 2000. The repertoire of DNA-binding

TFs in Escherichia coli. Nucl Acids Res 28:1838– 1847.

Phelps TJ, Palumbo AV, Beliaev AS. 2002. Metabolomics and microarrays

for improved understanding of phenotypic characteristics controlled by

both genomics and environmentalal constraints. Curr Opin Biotechnol

13:20– 24.

Pogliano J, Lynch AS, Belin D, Lin EC, Beckwith J. 1997. Regulation

of Escherichia coli cell envelope proteins involved in protein folding

and degradation by the Cpx two-component system. Gene Dev 11:

1169– 1182.

Poolman B, Blount P, Folgering JH, Friesen RH, Moe PC, van der Heide

T. 2002. How do membrane proteins sense water stress? Mol Micro-

biol 44:889– 902.

Raivio TL, Silhavy TJ. 2001. Periplasmic stress and ECF sigma factors.

Annu Rev Microbiol 55:591– 624.

Richmond CS, Glasner JD, Mau R, Jin H, Blattner FR. 1999. Genome-

wide expression profiling in Escherichia coli K-12. Nucl Acids Res

27:3821– 3835.

Rowbury RJ. 2001. Extracellular sensing components and extracellular

induction component alarmones give early warning against stress in

Escherichia coli. Adv Microbiol Physiol 44:215–257.

Rui S, Tse-Dinh YC. 2003. Topoisomerase function during bacterial

responses to environmental challenge. Front Biosci 1:D256– D263.

Sabatti C, Rohlin L, Oh MK, Liao JC. 2002. Co-expression pattern from

DNA microarray experiments as a tool for operon prediction. Nucl

Acids Res 30:2886–2893.

Salgado H, Moreno-Hagelsieb G, Smith TF, Collado-Vides J. 2000.

Operons in Escherichia coli: genomic analyses and predictions. Proc

Natl Acad Sci USA 97:6652– 6657.

Salgado H, Santos-Zavaleta A, Gama-Castro S, Millan-Zarate D, Dıaz-

Peredo E, Sanchez-Solano F, Perez-Rueda E, Bonavides-Martınez C,

Collado-Vides J. 2001. RegulonDB (version 3.2): Transcriptional

regulation and operon organization in Escherichia coli K. Nucl Acids

Res 29:72– 74.

Santos JM, Lobo M, Matos AP, De Pedro MA, Arraiano CM. 2002. The

gene bolA regulates dacA (PBP5), dacC (PBP6) and ampC (AmpC),

promoting normal morphology in Escherichia coli. Mol Microbiol

45:1729– 1740.

Schembri MA, Ussery DW, Workman C, Hasman H, Klemm P. 2002.

DNA microarray analysis of fim mutations in Escherichia coli. Mol

Genet Genomics 267:721–729.

Shen-Orr SS, Milo R, Mangan S, Alon U. 2002. Network motifs in the

transcriptional regulation network of Escherichia coli. Nat Genet 31:

64–68.

Tao H, Bausch C, Richmond C, Blattner FR, Conway T. 1999. Functional

genomics: Expression analysis of Escherichia coli growing on minimal

and rich media. J Bacteriol 181:6425– 6440.

Taylor BL, Zhulin IB. 1998. In search of higher energy: Metabolism-

dependent behavior in bacteria. Mol Microbiol 28:683– 690.

Thieffry D, Huerta AM, Perez-Rueda E, Collado-Vides J. 1998. From

specific gene regulation to genomic networks: A global analysis of

transcriptional regulation in Escherichia coli. BioEssays 20:433– 440.

Thieffry D, Salgado H, Huerta AM, Collado-Vides J. 1998b. Prediction of

transcription regulatory sites in the complete genome of Escherichia

coli. Bioinformatics 14:391– 400.

Vicente M, Chater KF, de Lorenzo V. 1999. Bacterial transcription factors

involved in global regulation. Mol Microbiol 33:8 –17.

Zheng Y, Szustakowski JD, Fortnow L, Roberts RJ, Kasif S. 2002.

Computational identification of operons in microbial genomes. Ge-

nome Res 12:1221 – 1230.

MARTINEZ-ANTONIO ET AL.: GROWTH CONDITIONS AND REGULATION OF TRANSCRIPTION IN E. COLI 749