Evolvability – the integron case and the use of synonymous sequences for directed evolution

Phenotypic stability is essential to the success of organisms evolving under steady conditions. However, the environment is subject to perpetual stochastic variations, to which living beings must constantly adapt. Evolvability characterizes the ability of a population to respond to such selective pressures through the generation of heritable phenotypic changes. Since most mutations are deleterious, processes that confine mutations to periods of stress, or to specific loci and well-defined phenotypes, have been selected over the course of evolution.

Integrons constitute a particularly sophisticated illustration of such processes. Initially identified through their involvement in multi-resistance to antibiotics, these bacterial genetic systems are specialized in the exchange and stockpiling of accessory genes and therefore constitute an important source of genetic diversity. This work shows that integrons are directly coupled with the SOS system, a major bacterial stress response. By allowing the generation of significant phenotypic diversity during periods of stress without impacting the rest of the genome, integrons constitute a paradigmatic example of evolvability.

Another aspect of this work demonstrates that synonymous coding sequences – although specifying identical proteins – can access different areas of the phenotypic space through point mutations. When properly exploited, this property can enhance the evolvability of any protein in the context of biotechnological applications.





Discipline – Life Sciences

Specialty – Genetics

Guillaume CAMBRAY




Thesis supervised by Dr. Didier MAZEL

Defended on 10 July 2009


Prof. Isabelle Martin-Verstraete, President

Prof. Pierre Capy, Reviewer

Prof. Fernando De La Cruz, Reviewer

Dr. Antoine Danchin, Examiner

Dr. Ivan Matic, Examiner

Dr. Didier Mazel, Thesis supervisor




“The beauty of the cosmos is given not only by unity into diversity,

but also by diversity into unity.”

– Umberto Eco, in The Name of the Rose

“Science in fact has two aspects: what one might call day science and night science.

Day science calls into play reasoning that meshes like gears, and results that have the force of certainty. One admires its majestic order like that of a painting by da Vinci or a Bach fugue. One strolls through it as through a formal French garden. Conscious of its progress, proud of its past, sure of its future, day science advances in light and glory.

Night science, by contrast, wanders blindly. It hesitates, stumbles, falls back, sweats, wakes with a start. Doubting everything, it feels its way, questions itself, corrects itself again and again. It is a sort of workshop of the possible, where what will become the material of science is worked out…

What guides the mind then is instinct, intuition. It is the need to see clearly. It is the determination to live. It is courage…”

– François Jacob



Phenotypic stability is essential to the success of organisms evolving under constant conditions. The environment is nevertheless subject to perpetual stochastic variations, to which living beings must constantly adapt. Evolvability characterizes the capacity of a population to respond to such selective pressures by generating heritable phenotypic modifications. Since most mutations are deleterious, processes that restrict the production of such variation to periods of stress, or confine it to well-defined loci and phenotypes, have been selected in the course of evolution.

Integrons are a particularly sophisticated illustration of such processes. Initially identified as vectors of resistance to multiple antibiotics, these bacterial genetic systems, specialized in the exchange, collection and expression of accessory genes, constitute an important source of genetic diversity. This work shows that integrons are directly coupled to a major bacterial stress response pathway, the SOS system. By making it possible to generate phenotypic variability during periods of stress without affecting the rest of the genome, integrons thus constitute a paradigmatic example of evolvability.

Another aspect of this work demonstrates that synonymous coding sequences – although specifying identical proteins – can reach different regions of phenotypic space through point mutations. Used appropriately, this property makes it possible to extend the evolvability of any protein in the context of biotechnological applications.

















RÉSUMÉ

ABSTRACT

ACKNOWLEDGEMENTS

TABLE OF CONTENTS

ABBREVIATIONS

INTRODUCTION

I. Control of genetic diversity

I.1. Spontaneous mutations

I.1.1. Effects and origins of mutations

a - General overview

b - Types of mutations

c - The origins of mutations

I.1.2. Genome-wide mutation rates

a - Pattern of spontaneous mutation rates

b - Mechanisms of genetic maintenance

c - The lowest… the best

I.1.3. General mutators and the ambiguity of repair systems

a - Natural occurrence of mutators

b - The short-term advantages of increased mutation rates

c - Long-term consequences of increased mutation rates

d - Lessons from the mutator phenomenon

I.2. Stress-induced mutagenesis

I.2.1. The SOS paradigm

a - The SOS response to DNA damage

b - Survival and variability during SOS induction

c - Extending the SOS response

I.2.2. Other examples of stress-induced mutagenesis

a - Mutagenesis in aging colonies

b - The competence state

I.3. Programmed generation of genetic variations

I.3.1. Localized mutation through slipped-strand mispairing

a - Replication slippage

b - Simple sequence repeats (SSRs) are variable loci

c - Phenotypic impact

d - SSRs as localized mutators

I.3.2. Mutation by intragenomic recombination

a - Meiotic sex

b - Gene conversion

c - Transposition

d - Site-specific recombination

I.3.3. Epigenetics

a - Definition

b - Bistable regulatory switch

c - DNA methylation patterns in bacteria

d - The yeast prion PSI+

II. Phenotypic plasticity, genetic variations and physiological regulation

II.1. Genetic versus physiological changes

II.1.1. Cybernetic genomes

II.1.2. Individual and populational adaptation

II.1.3. Stochastic switches as a bet-hedging strategy

a - Contingency loci

b - Bet-hedging

c - Genetic switches as crude regulatory controls

d - Link with the lifestyle of organisms

II.2. Links between genetic changes and regulation

II.2.1. Impact of expression strength on sequence evolution

II.2.2. Genetic assimilation of long-lasting regulation

II.2.3. Evolution of regulatory patterns

a - Regulatory networks as evolutionary target

b - Switching regulation patterns

II.2.4. Physiological regulation of mutagenesis

a - Stress-induced mutagenesis can be spatially confined

b - Targeted mutagenesis can be regulated

II.3. Evolvability and robustness

II.3.1. Evolvability

II.3.2. Robustness

II.3.3. Links between robustness and evolvability

III. The integron genetic system

III.1. Overview of the system

III.1.1. A brief historical perspective

III.1.2. Structure of integrons

a - The functional platform

b - The cassette array

III.1.3. Different flavors of integrons

a - Mobile integrons

b - Chromosomal integrons

III.2. Functional organization of integrons

III.2.1. A unique site-specific recombination mechanism

a - Double- and single-stranded recombination substrates

b - The different recombination reactions

c - Accessory factors

III.2.2. Expression of cassette genes

a - Transcription

b - Translation

III.3. Integrons and evolution

III.3.1. Chromosomal integrons as the source of mobile integrons

a - Chromosomal integrons are ancient and widespread structures

b - Mounting of mobile integrons

c - Resistance genes and chromosomal integrons

d - The generation of cassettes

III.3.2. A central role in horizontal gene transfer

a - Evidence for interspecies cassette exchanges

b - Mobile integrons and the spread of cassettes

c - A cassette metagenome

III.3.3. Integrons as sophisticated contingency loci

a - Working model

b - An unknown recombination dynamic

RESULTS

I. Evolution of recombination rate in integrons

Background

Methods

Results and discussion

Article I

II. Recombination in integrons is controlled by the SOS response to stress

Background

Methods

Results and discussion

Article II

Article III

III. Intrinsic evolutionary potential of genes

Background

Methods

Results and discussion

Article IV

DISCUSSION

I. Integrons are powerful adaptive systems

I.1. The expression of gene cassettes

I.1.1. Coupling between recombination and expression

I.1.2. The integron system mimics inducible promoters

I.1.3. Increased rate of cassette evolution

I.2. Responsive and oriented mutagenesis

I.2.1. Responsive versus constant mutation rates

I.2.2. Integrons and adaptive mutagenesis

I.2.3. A clear case of stress-induced mutagenesis

I.3. A deep connection with SOS triggers

I.3.1. Single-stranded DNA: a bridge between two systems

I.3.2. Potential SOS triggers relevant to integrons

I.3.3. SOS-controlled accessory factors?

I.4. Are integrons really successful?

II. Implications for health

III. Biotechnological considerations

III.1. The ELP principle

III.2. Synthetic integrons

APPENDIX

Epistemological considerations on the role of variations in biology

Maintenance versus variability: a major evolutionary trade-off

The purpose of evolution

Form, function and the watchmaker

Adaptation, teleonomy and blindness

Impact of the environment

What is the environment?

The inheritance of acquired characteristics

The Neo-Darwinian focus on selection

Anticipating and responding to environmental changes

REFERENCES


Figure 1 – Distribution of fitness effects of mutations

Figure 2 – Spontaneous mutation rates

Figure 3 – Intrinsic replication error rates

Figure 4 – Nucleotide excision repair (NER)

Figure 5 – Schematic pathways of base excision repair (BER)

Figure 6 – Main homologous recombination (HR) pathways

Figure 7 – Schematic functioning of the SOS system

Figure 8 – Disruption of the SOS response dramatically affects survival

Figure 9 – Survival and mutation of E. coli after starting antibiotic therapy in mice

Figure 10 – Variation of aging-induced mutagenesis among natural E. coli isolates

Figure 11 – Polymerase slippage

Figure 12 – Phase variation in the biosynthesis of the LPS molecule of H. influenzae

Figure 13 – Flocculation controlled through slipped-strand mispairing in S. cerevisiae

Figure 14 – Morphological impact of repeat length variation

Figure 15 – Outcomes of gene conversion

Figure 16 – Molecular models of recombination involved in gene conversion

Figure 17 – Gene conversion models for B. hermsii, A. marginale and T. brucei

Figure 18 – Mating-type switching in S. cerevisiae

Figure 19 – Stress-controlled targeting of the Ty5 retrotransposon

Figure 20 – The three possible outcomes of site-specific recombination

Figure 21 – Types of specific inversion identified in B. fragilis

Figure 22 – Phase variation of type 1 fimbrial expression by DNA inversion

Figure 23 – Hysteresis and bistability in the lac operon

Figure 24 – Epigenetic inheritance of the sporulation signal in Bacillus subtilis

Figure 25 – DNA methylation-dependent phase variation of the pap operon in E. coli

Figure 26 – Physiological and genetic adaptation in evolution

Figure 27 – Number of articles dealing with integron-mediated antibiotic resistance

Figure 28 – Organization of the integron recombination system

Figure 29 – Phylogenetic distribution of attC sites found in Vibrio species

Figure 30 – Functional distribution of cassette-encoded proteins in Vibrionales

Figure 31 – Structure of attI sites

Figure 32 – Structure of attC sites

Figure 33 – Model of atypical recombination in integrons

Figure 34 – The Pc promoter of class 1 integrons

Figure 35 – Collapsed phylogenetic tree of integrases

Figure 36 – Phylogenetic tree of integron integrases

Figure 37 – Evolution of mobile clinically derived class 1 integrons

Figure 38 – Working model of integron functioning

Figure 39 – Comparative organization of three V. cholerae chromosomal integrons

Figure 40 – The floral architecture of Linaria vulgaris and Linaria peloria

Figure 41 – Comparison of Lamarck's theory of transformation and a phylogenetic tree


Table 1 – Spontaneous mutation rate in DNA-based microbes

Table 2 – MMR components and their functions

Table 3 – Simple sequence repeats in the genome of H. influenzae

Table 4 – A representative list of bacterial species containing chromosomal integrons

Table 5 – Gene cassettes shared between the integrons of Vibrio species



DSB ......................double strand break

EP..........................error-prone polymerases

HGT ......................horizontal gene transfer

HJ..........................Holliday junction

ICE........................integrative conjugative element

IS...........................insertion sequence

kb ..........................kilobase

LPS .......................lipopolysaccharide

MMR ....................mismatch repair

NER ......................nucleotide excision repair

nt ...........................nucleotide

ORF ......................open reading frame

bp ..........................base pair

ssDNA ..................single-stranded DNA

SSR .......................simple sequence repeats


TE .........................transposable element

TLS .......................translesion synthesis

UV ........................ultraviolet




Introduction – Control of genetic diversity


The modern evolutionary synthesis is the unifying paradigm in biology. The main

lines of this theory were drawn around the 1940s, following the development of population

genetics. Since then, our knowledge of the molecular basis underlying genetic phenomena has

rapidly expanded, and some key concepts have been refined. The mechanisms governing the

generation of mutations constitute an important area of investigation. Indeed, the availability

of genetic diversity determines the adaptation rate of populations – their evolvability.

Contradicting one of the fundamental tenets of the neo-Darwinian theory, the last decades of

research have brought numerous examples showing that the occurrence of mutations is sometimes

not a completely random process. In many respects, the theory of evolution is more than an

academic discipline. It also offers an explanatory edifice to a wide range of metaphysical is-

sues and is an essential component of a materialistic worldview, almost a religion to some ex-

tent (Ruse, 2003). An epistemological account on the importance of biological variations is

thus presented in the appendix to complement the scientific introduction (see p254).

This thesis focuses on evolvability. The first part of the introduction describes the dif-

ferent types of mutations and the diverse molecular mechanisms that affect their generation.

Despite my efforts to include examples from all kingdoms of life, most focus on bacteria – but

after all, are we not living in a bacterial world (Gould, 1996)? The second chapter discusses

the respective impacts of genetic and physiological variations in the production of phenotypic

diversity. The last chapter is dedicated to the presentation of the integron system, the sophisti-

cated adaptive properties of which constitute the main subject of this work.


The availability of heritable phenotypic diversity determines the evolvability of

biological populations. However, most mutations are deleterious and thus detrimental to the

maintenance of individuals in the short term. Although this situation creates an important

evolutionary trade-off, the early neo-Darwinian theory regarded mutational events as purely

random accidents. Advances in molecular biology led to the description of sophisticated

mechanisms to cope with alterations affecting DNA molecules. Alongside, many ingenious

systems able to regulate, modify and restructure the genetic information with minimal risk to

ongoing adaptation were discovered. The understanding of these specific processes of

mutagenesis is particularly important from both fundamental and applied standpoints. The re-

Spontaneous mutations – Effects and origins of mutations


alization that evolvability is itself an evolvable trait is an essential refinement of the theory of

evolution: evolution per se can then be considered as a biological function. Many health-

threatening phenomena – such as antibiotic resistance, microbial pathogenesis, tumor progres-

sion, genetic diseases, radiation- and chemotherapy-resistance – result from the capacity of

cells to tune their adaptation rate. Such processes may be challenged by the design of anti-

evolution drugs that would short-circuit mutagenesis responses.

This chapter first provides an overview of the processes responsible for spontaneous

mutagenesis and briefly addresses the sophisticated mechanisms developed over evolution to

keep mutations in check. Then, I describe how these mechanisms can be subverted to increase

the generation of genetic diversity. While a constitutive increase in mutation rate is

necessarily deleterious in the long term, its coupling with cellular responses to stress enables a refined

mechanism wherein genetic novelties are created specifically when individuals are mal-

adapted. The last part presents a detailed review of the genetic systems involved in localized

and phenotypically oriented mutations.

I.1. Spontaneous mutations

I.1.1. Effects and origins of mutations

a - General overview

The genetic information is encoded in the DNA sequences that form the genome of

living entities. These sequences constitute information because they interact in a very specific

manner with the cellular machinery (see Cybernetic genomes, p100). Generally speaking, a

mutation is a heritable alteration of the DNA sequence. Heritable modifications that do not af-

fect DNA sequences constitute a specific case of mutation – epimutations (see Epigenetics,

p79). Although this blunt statement will be qualified later, mutations are a priori random

events. Indeed, no one can precisely predict when and where a genetic alteration will occur, nor

can one foresee the exact functional impact of a mutation.

There are distinct forms of mutations, ranging from single base substitutions to whole

genome duplications. As we will see, these different types arise from diverse causes and their

generation may involve sophisticated repair mechanisms. The phenotypic impact of a muta-

tion depends considerably on its type, but also on its location (Streisinger et al., 1966; and see

below). Mutations are distributed along a continuum of fitness effects, which determine their

fate in a given population. Though the exact shape of this distribution is still

debated, several general principles can be outlined (Eyre-Walker and Keightley, 2007).

Most mutations are deleterious to the organism, and thus discarded by natural selection; some

mutations do not produce effects strong enough to permit selection – and are thus neutral;

while only very few are adaptive (see Figure 1). Importantly, the fate of mutations depends on

the effective size of the population: mutations that are deleterious or advantageous in a large

population may be essentially neutral in a small population, wherein random drift outweighs

selection coefficients. As a rule of thumb, it is generally considered that a mutation with a

selection coefficient s smaller than the inverse of the effective population size Ne – i.e. Ne·s < 1 –

is effectively neutral (Kimura, 1983). In mutation accumulation and mutagenesis experiments

carried out in yeast (Wloch et al., 2001) and in the vesicular stomatitis virus (Wloch et al.,

2001) respectively, it was estimated that 30–40% of mutations are lethal in laboratory

conditions (see Figure 1). One can easily see that introducing absolutely random

modifications in a complex system is much more likely to impair it than to enhance it. Consider a

watch, for instance: removing a screw, adding a spring or slightly altering a cog will almost

certainly break the fine arrangement of the mechanism and prevent the device from being

functional. What is more, a second random alteration is very unlikely to restore the system

back to its functional state. In most cases, mutations are thus irreversible and accumulate in

genomes if they are not counter-selected – a process known as Muller’s ratchet (Muller,
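
The rule of thumb quoted above (Ne·s < 1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the use of the absolute value of s (so that deleterious and advantageous mutations are treated alike) and the numerical values are hypothetical and do not come from the cited studies.

```python
def is_effectively_neutral(s: float, ne: float) -> bool:
    """Rule of thumb from the text: a mutation behaves as neutral when Ne * |s| < 1.

    Taking the absolute value of s is an assumption made here so that the same
    test applies to deleterious (s < 0) and advantageous (s > 0) mutations.
    """
    return ne * abs(s) < 1.0


# Hypothetical selection coefficient and population sizes, for illustration only.
selection_coefficient = 1e-4
for ne in (1e3, 1e6):
    verdict = "effectively neutral" if is_effectively_neutral(selection_coefficient, ne) else "visible to selection"
    print(f"Ne = {ne:.0e}, s = {selection_coefficient:.0e}: {verdict}")
```

With these illustrative numbers, the same mutation drifts as if neutral in the small population but is exposed to selection in the large one.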


b - Types of mutations

Different types of mutation have different phenotypic effects, depending on the loci

affected. In this respect, one can mainly distinguish point mutations from

chromosomal rearrangements.

i. Point mutations

Point mutations include substitution, insertion and deletion of nucleotides. At worst, a

substitution can change a key amino acid that is essential to the protein function; introduce

a termination codon; or affect the affinity of a DNA region to cognate proteins. At best, it may

fall into a non-informative region and be virtually neutral. When located in a coding region,

insertions and deletions almost always dramatically impact the protein sequence by shifting

the reading frame of the gene (Streisinger et al., 1966).
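
The contrast between substitutions and frameshifts can be made concrete with a toy translation. The sketch below assumes the standard genetic code and uses a short made-up coding sequence; the sequence and the helper name `translate` are hypothetical and serve only as an illustration.

```python
# Standard genetic code, listed with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing incomplete codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


original = "ATGGCTAAAGGTTGG"       # made-up sequence: M A K G W
synonymous = "ATGGCCAAAGGTTGG"     # GCT -> GCC, same amino acid
nonsense = "ATGGCTTAAGGTTGG"       # AAA -> TAA, premature stop codon
frameshift = "ATGCGCTAAAGGTTGG"    # one C inserted after the first codon

for label, seq in [("original", original), ("synonymous", synonymous),
                   ("nonsense", nonsense), ("frameshift", frameshift)]:
    print(f"{label:>10}: {translate(seq)}")
```

In this toy case the single inserted base changes every downstream codon, whereas the substitutions alter at most one position of the protein.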

ii. Chromosomal rearrangements

Chromosomal mutations are structural changes of higher order that result from

illegitimate recombination events occurring during replication or repair of DNA molecules.

These alterations include deletions, inversions, duplications and translocations. In this con-

text, a deletion corresponds to the loss of a whole region of a chromosome. Obviously, all

genes present in this region will be subsequently absent from the genome, and this can deeply

affect the organism’s phenotype. Usually, an inversion has less dramatic consequences. It

mainly affects the regions at the ends of the inverted segment, though the expression of genes

located within the segment can be subtly altered (Rocha, 2004). A duplication event refers to

the repetition of a whole genomic region and results in increased gene-dosage. This can be

deleterious if a toxic gene is over-expressed, or if it destabilizes the functioning of cellular

networks. Notably, paralogous genes resulting from duplication events can evolve in diver-

gent ways and enrich the genome with new functions (Taylor and Raes, 2004; Conant and

Wolfe, 2008a; and see Impact of expression strength on sequence evolution, p106). The same

mechanisms that cause inversions, deletions or duplications can lead to translocations in

multichromosomal genomes when fragments of DNA are exchanged between different

chromosomes. All chromosomal rearrangements can alter DNA topology and therefore indirectly

affect expression of the surrounding regions (Reymond et al., 2007).

iii. Genetic exchanges

In prokaryotes, the acquisition of exogenous DNA is an important source of genetic

novelty, and a complete set of functional traits can be instantaneously gained through

horizontal gene transfer (HGT) (Ochman et al., 2000; Redfield, 2001). In some eukaryotes, sexual

reproduction allows mixing the genetic information of two individuals during meiosis, which leads

to the emergence of new trait combinations (Otto and Lenormand, 2002). In both cases, the

rearrangement of pre-existing genetic variations can result in large and swift phenotypic


In eukaryotes that evolved specialized reproductive tissues, mutations show different

evolutionary impact depending on whether they affect the soma or the germline (see Appendix

– The inheritance of acquired characteristics, p262). Mutations in somatic cells may affect the

ability of the organism to survive and reproduce, but are limited to the individual and not

transmitted to the next generation. In contrast, mutations affecting the germline are largely

silent in parental individuals and are only expressed in their offspring. Thus, only mutations

established in the germline are hereditary and have evolutionary consequences.

c - The origins of mutations

DNA molecules are continuously damaged by a variety of factors including

environmental influences, mutagenic chemicals and replication. The proliferation of

transposable elements is also a significant source of structural variations. Most mutations are

subject to advanced mechanisms of repair (see p33). Although these processes primarily

evolved as caretakers of genome integrity in order to increase individual survival, their modus

operandi can also be a substantial source of mutations.

i. Mutagenic radiations

Most environments are exposed to physical radiations. Despite the protective effect of

the atmosphere, organisms are frequently exposed to mutagenic ultraviolet (UV) light. UV radiation

principally induces the formation of covalent bonds between adjacent pyrimidine bases on a

single DNA strand. Around 80–90% of the resulting photoproducts are cyclobutane

pyrimidine dimers, while the remaining 10–20% correspond to more mutagenic pyrimidine-

pyrimidone (6–4) photoproducts (Sancar, 2008). Both dimers alter the conformation of the

DNA double helix, thereby preventing normal transcription and replication. Besides, UVs

also promote the hydrolysis of cytosines, which ultimately results in C:G→T:A transitions

through mispairing with adenine during replication (Clancy, 2008a). In addition, organisms are

exposed to more energetic and penetrating radiations. Ionizing radiations, such as γ-rays and

X-rays, can induce double strand breaks (DSB) either directly or indirectly through the mas-

sive production of free oxygen radicals (Cadet et al., 2003).

ii. Chemical mutagens

Organisms are often exposed to specific chemical mutagens of biotic or abiotic

origins. Such agents interfere with the normal behavior of the DNA molecule by preventing

correct base pairing, substituting for standard bases or disrupting the integrity of the DNA helix

(Clancy, 2008b). Oxidative environments are a major source of DNA alterations. Particularly,

oxidation of guanines into 8-hydroxyguanine frequently leads to G:C→T:A and A:T→C:G

transversions. Spontaneous hydrolysis can remove purine bases from the sugar-phosphate

backbone of corresponding nucleotides. For instance, the N7 position of guanine is

particularly vulnerable to alkylation, and this alteration frequently results in spontaneous

depurination (Mishina et al., 2006). If not repaired, such damage results in the incorporation of an

incorrect base during the next round of replication. Some bases are also subject to

spontaneous loss of an amine group. The most common deamination converts cytosine to uracil, which

can pair with adenine instead of the required guanine, leading to the fixation of a G:C→A:T

transition upon replication (Clancy, 2008b).
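
The substitutions mentioned in this section fall into two classes: transitions (purine to purine, or pyrimidine to pyrimidine) and transversions (purine to pyrimidine, or the reverse). A minimal sketch of this classification follows; the function name is hypothetical, and each base-pair change is described by the base change on one strand.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' if both bases belong to the same chemical class, else 'transversion'."""
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError("expected two different bases among A, C, G, T")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Changes cited in the text, read on one strand:
print("C->T (cytosine deamination, G:C->A:T):", classify_substitution("C", "T"))
print("G->T (guanine oxidation, G:C->T:A):", classify_substitution("G", "T"))
print("A->C (A:T->C:G):", classify_substitution("A", "C"))
```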

Aside from direct environmental exposure to mutagens, unfavorable chemical condi-

tions can also arise endogenously from the normal functioning of the cell. Indeed, numerous

metabolic by-products – notably from respiration – are free radicals or reactive oxygen spe-

cies that can alter DNA through base oxidation, alkylation, or hydrolysis (Cadet et al., 2003).

Thus, mutations can occur spontaneously without explicit insults from the external environment.


iii. Transcription and replication

To some extent, the conformation of DNA duplexes protects their sequence from

mutations. Indeed, the inward orientation of the nitrogenous bases toward the axis of the helix limits their

exposure to the cellular milieu (Watson and Crick, 1953). However, the two complementary

strands are separated during replication and transcription. The consequence of single-stranded

DNA (ssDNA) exposure is particularly evident in bacterial genomes, wherein a sharp

inversion in strand composition has been observed around the single origin and terminus of replication.

This bias reflects both the enrichment of the leading strand in coding sequences and the longer

time spent in single-stranded form by the lagging strand (Lobry, 1996; Lobry and Sueoka,

2002; Rocha, 2004).
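
One common way to visualize such a strand bias is the GC skew, (G − C)/(G + C), computed in windows along one strand. The sketch below assumes this statistic as the measure of strand composition, which is a choice made here for illustration; the toy sequence is artificial and only mimics a sign change such as the one expected around an origin or terminus.

```python
def gc_skew(seq: str) -> float:
    """GC skew of a sequence: (G - C) / (G + C); 0.0 if it contains no G or C."""
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def windowed_skew(seq: str, window: int) -> list[float]:
    """GC skew in consecutive, non-overlapping windows of the given length."""
    return [gc_skew(seq[i:i + window]) for i in range(0, len(seq) - window + 1, window)]


# Artificial sequence: G-rich first half, C-rich second half.
toy_genome = "GGGC" * 5 + "CCCG" * 5
print(windowed_skew(toy_genome, window=20))
```

In a real genome the windows would span kilobases, and the position where the skew changes sign would be the quantity of interest.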

Like every process involving the transfer of information, replication is intrinsically

inaccurate: it commits occasional copying errors and misincorporations – e.g. uracil in place of thymine

(but see Replication in Hi-Fi, p33). The availability and relative concentrations of the

different nucleotides in the cell can affect error frequencies.

iv. Transposition

Structural instability due to the mobilization of transposable elements (TEs) was

initially observed by B. McClintock (1902–1992) in Zea mays (McClintock, 1950, 1984). TEs

are mobile DNA segments that contain the information required to produce self-copies and/or

change their genomic location. These elements have been found in nearly all genomes

investigated so far. One can distinguish two major classes of TEs. Class I retroelements (e.g.

LTR-retrotransposons and LINEs) encode a reverse transcriptase that is used to process

transcribed RNA copies prior to reinsertion in a new location. This mechanism mediates a

“copy-and-paste” transposition process. The Class II elements are flanked by terminal inverted

repeats that are processed by a dedicated transposase encoded in the elements. This mediates a

conservative “cut-and-paste” mechanism. While both classes are present in eukaryotes,

prokaryotic genomes only contain class II elements. TEs are generally autonomous – though

either the reverse transcriptase (e.g. SINEs) or the transposase must be supplied in trans to

some altered elements (Miller and Capy, 2004).
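
The difference between the two transposition modes can be sketched on a toy genome represented as an ordered list of features. The representation, the function names and the feature labels are hypothetical simplifications: the only point illustrated is that copy-and-paste increases the copy number of the element while cut-and-paste conserves it.

```python
def copy_and_paste(genome: list[str], element: str, target: int) -> list[str]:
    """Class I style: the donor copy stays in place and a new copy is inserted at `target`."""
    if element not in genome:
        raise ValueError("element not found in genome")
    return genome[:target] + [element] + genome[target:]


def cut_and_paste(genome: list[str], element: str, target: int) -> list[str]:
    """Class II style: the element is excised, then reinserted at `target` (index after excision)."""
    source = genome.index(element)
    remaining = genome[:source] + genome[source + 1:]
    return remaining[:target] + [element] + remaining[target:]


toy_genome = ["geneA", "TE", "geneB", "geneC"]
print(copy_and_paste(toy_genome, "TE", target=3))  # two copies of TE afterwards
print(cut_and_paste(toy_genome, "TE", target=2))   # still a single copy, new location
```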

TEs can essentially be regarded as selfish parasites that spread within genomes – and

even between them, through transposition into plasmids or viruses (Dawkins, 1976). In some

eukaryotes, they can represent more than half of the genome. TEs play an important role in the

generation of genetic rearrangements. Because their target sites generally consist of only a few

base pairs, they are expected to insert randomly in a genome. However, some exceptions have

been reported (see for instance Parks and Peters, 2009). Their mobilization can result in gene

inactivations, gene chimaerisations and altered expression patterns (Bushman, 2004), as well

as chromosomal deletions, inversions and translocations (Kazazian, 2004; Miller and Capy,

2004). Some TEs excise themselves in a precise and accurate manner while others leave scars

behind them, thereby permanently altering their target sites. As described below, chromoso-

mal rearrangements can also arise as an indirect consequence of the proliferation of TEs in the


v. Repair- and replication-mediated rearrangement

The occurrence of mutations is not always a purely random event. Instead, some pat-

terns can influence the appearance of mutations. Notably, the very structure of genomes can

promote the occurrence of chromosomal rearrangements through recombination between

repeated sequence motifs (Rocha, 2003; Achaz et al., 2003; Cooper et al., 2007a). Such events

may result from misannealing of the replicating strand upon rescue of a stalled replication

fork or through more sophisticated repair mechanisms, such as homologous recombination

(HR, see Overview of repair mechanisms, p36 and Replication slippage, p61). In most ge-

nomes, the proliferation of TEs is an important source of large repeated sequences that can

drive chromosomal rearrangements (Kazazian, 2004). In this process, the orientation of the

repeats matters: direct repeats result in deletion or duplication, while inverted repeats generate

inversions. Tandem repeats result in specific patterns of facilitated duplication-deletion that

can prove adaptive. Transient gene amplifications can mediate a specific increase in gene

dosage (Roth et al., 2006; Hastings, 2007). In contrast, the expansion-retraction of repeated

nucleotide tracts can alter the expression of functional proteins (see Localized mutation through

slipped-strand mispairing, p61).
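
The effect of repeat orientation can be illustrated on a single strand written as a string. The sketch below is a deliberate simplification: it only models the deletion outcome for direct repeats (not the reciprocal duplication) and the inversion outcome for inverted repeats, and the sequences and function names are made up for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def recombine(genome: str, repeat: str) -> tuple[str, str]:
    """Recombine the first copy of `repeat` with the next direct or inverted copy.

    Direct copies: the segment between them and one repeat copy are lost (deletion).
    Inverted copies: the segment between them is reverse-complemented (inversion).
    """
    first = genome.find(repeat)
    if first == -1:
        return "none", genome
    after_first = first + len(repeat)

    second = genome.find(repeat, after_first)
    if second != -1:
        return "deletion", genome[:after_first] + genome[second + len(repeat):]

    second = genome.find(reverse_complement(repeat), after_first)
    if second != -1:
        middle = genome[after_first:second]
        return "inversion", genome[:after_first] + reverse_complement(middle) + genome[second:]

    return "none", genome


print(recombine("AAGATTCCCCGATTTT", "GATT"))  # two direct copies of GATT
print(recombine("AAGATTCCAGAATCTT", "GATT"))  # GATT followed by its reverse complement AATC
```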

vi. Meiotic sex and horizontal gene transfer

Broadly speaking, sex is the combination of genetic material from two distinct origins

to form a new genotype. In sexual eukaryotes, the process of meiosis regularly mixes the

complete sets of maternal and paternal genes, thereby randomly reassorting the alleles into

new individuals. Although the exact evolutionary role of meiotic sex is far from being fully

understood (Otto and Lenormand, 2002), this process is driven by a dedicated molecular ma-

chinery which must have evolved to promote efficient homologous recombination (Cavalier-

Smith, 2002; Benavente and Volff, 2009; Wilkins and Holliday, 2009). In contrast, sex occurs

in prokaryotes through processes that are non-reciprocal and fragmentary. Horizontal gene

transfer (HGT) is not a regular component of these organisms’ life cycles (Redfield, 2001).

HGT can occur through i) transduction by phages; ii) conjugation by plasmids; or iii)

transformation in naturally competent bacteria; it is also potentiated by other mobile elements such

as TEs. The repair machinery of the recipient cells is often implicated in genomic integration

of incoming DNA (see p36, HR and MMR). As phages, conjugative plasmids and TEs are

essentially selfish elements, their involvement in genetic exchanges is likely to be an

unselected side effect of processes that evolved for more immediate functions. In contrast, natural

competence involves specialized and highly regulated machinery. Its implication in the

generation of genetic diversity is thus more ambiguous and will be discussed in detail later (see

The competence state, p58).

I.1.2. Genome-wide mutation rates

a - Pattern of spontaneous mutation rates

i. Mutation rate

The mutation rate comprises all kinds of mutations occurring in a mutational target

during a given amount of time, including point mutations and complex chromosomal rear-

rangements. This rate is expected to vary greatly depending on target length, expression

strength (if the target is a gene) and other idiosyncrasies such as exact base sequence and the

presence of repetitive elements. The calculation of mutation rates cannot rely on post-hoc

comparison of natural sequences, because these generally result from complex and unknown

evolutionary histories. Such comparisons actually measure substitution rates, which depend

on mutation rate but also reflect the specific selective pressures exerted on the target, the

linkage with other genetic determinants, as well as demographic factors. One must dissociate

mutational events from any evolutionary process in order to obtain an unbiased picture of the

spontaneous mutation rates.

Spontaneous mutations – Genome-wide mutation rates


ii. DNA-based microbes display a constant genomic mutation rate

When measured in controlled experimental settings, the mutation rates exhibited by

representative organisms reveal remarkable taxonomic patterns. The most striking and most

accurate illustration concerns DNA-based microbes – which include representatives of

bacteria, archaea, bacteriophages and unicellular eukaryotes. When expressed on a per-nucleotide

basis, mutation rates (µb) are collectively very low but vary over four orders of magnitude

between different organisms (see Table 1 and Figure 2). Interestingly, genome sizes (G) are

inversely correlated with µb. As a direct consequence, the mutation rates of diverse DNA-based

microbes are outstandingly steady when extrapolated to the genome as a whole. The

genomic mutation rates (µg) average 3.4·10⁻³ mutations per genome per generation and their

distribution is very narrow (2.5–4.6·10⁻³) (Drake et al., 1998). Hence, one single mutation is

expected to occur roughly every 300 generations anywhere in the genome, irrespective of the actual

microbe species. Overall, the measure of µg is biologically sound because the individual is the

entity on which selection operates.
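
The arithmetic behind these figures can be checked with a short sketch. Only the mean genomic rate (3.4·10⁻³) is taken from the text; the two genome sizes are hypothetical round numbers, and the Poisson assumption used for the last line is a simplification introduced here.

```python
import math

MU_G = 3.4e-3  # mean genomic mutation rate quoted in the text (per genome per generation)


def genomic_rate(mu_b: float, genome_size: float) -> float:
    """Genomic mutation rate as per-base rate times genome size (µg = µb · G)."""
    return mu_b * genome_size


def per_base_rate(mu_g: float, genome_size: float) -> float:
    """Per-base rate implied by a given genomic rate and genome size (µb = µg / G)."""
    return mu_g / genome_size


# A constant µg implies that µb scales inversely with genome size.
for label, size in [("hypothetical 5 kb genome", 5e3), ("hypothetical 5 Mb genome", 5e6)]:
    print(f"{label}: µb = {per_base_rate(MU_G, size):.1e} per base per generation")

# Expected waiting time between mutations anywhere in the genome.
print(f"about {1 / MU_G:.0f} generations per mutation")

# Assuming mutations follow a Poisson process, chance that one replication is mutation-free.
print(f"P(no mutation in one generation) = {math.exp(-MU_G):.4f}")
```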

The measure of mutation rates relies on the scoring of altered phenotypes, in cases

where the mutational target is presumably well defined. Because not all mutations equally

impact the phenotype, the measured rate must be corrected. Furthermore, the extrapolation to

a genomic mutation rate relies on the assumption that the mutation-reporter gene is

representative of the whole genome (Drake, 1991). In spite of these unavoidable approximations – and

considering the general paucity of constant values in evolutionary processes – the observation

of such a conserved constant is particularly meaningful. Specifically, this suggests that muta-

tion rates would naturally evolve toward an optimal equilibrium. This issue will be further

discussed below.

iii. Mutation rates in other organisms

Consistent measures of mutation rates are difficult to obtain experimentally in

organisms other than DNA-based microbes and the resulting estimates must be considered

cautiously (see Figure 2). The mean µg calculated for lytic RNA viruses is ca. 1–2, but individual

values are considerably scattered. In retroelements, including retroviruses and retrotransposons,

the mean µg is ca. 0.1–0.2, again with several outliers. The fact that genome sizes do not

readily reflect genome contents complicates the situation in many eukaryotes. Some genomes are

indeed mostly composed of introns, TEs and non-coding sequences whose functions are difficult

to appreciate. Most mutations in these regions may be neutral and are therefore unlikely to

affect the second-order selection of mutation rate (see The lowest… the best, p39). A proper

estimate of µg in these species may then be based on the effective genome size, by only taking

the functional parts of the genome into account. In such cases, the mean mutation rate per

effective genome has been estimated at ca. 6·10⁻³ (range 4–14·10⁻³) – values that are strikingly

close to those observed in DNA-based microbes (see Figure 2). Nevertheless, these

calculations derive from very imprecise data. Notably, the actual extent of the so-called junk

DNA is far from being known. Another significant caveat is that the values mentioned above

correspond to mutation rates per effective genome per cellular generation. The strict

equivalent of the DNA-based microbe’s µg would arguably be better expressed per sexual

generation. The corresponding mutation rates in higher eukaryotes vary widely in the range 3.6·10⁻²–1.6

mutations per effective genome per sexual generation (Drake et al., 1998).

b - Mechanisms of genetic maintenance

Despite relentless environmental injuries and repeated rounds of genetic information

copying, most organisms achieve strikingly low mutation rates. Accurate maintenance of the

genomic integrity is mediated by dedicated mechanisms that ensure both the fidelity of the

replication process and the repair of DNA lesions.

i. Replication in Hi-Fi

Like any copying process, DNA replication is very sensitive to noise. One would

therefore expect it to be prone to a high rate of error. Yet, in the absence of environmental injuries and

repair mechanisms, in vivo substitution rates are in the range of 10⁻⁷–10⁻⁸ errors per base pair in

both prokaryotes and eukaryotes (Kunkel, 2004 and see Figure 3). The intrinsic fidelity of

replicative polymerases is mediated by the topology of their active sites, which ensures a

stringent geometric selection for the shape and size of correct base pairs. Moreover, these

polymerases are unable to replicate past lesions or extend from mismatches, thereby dampening

the fixation of mutations during replication. The inability to extend mismatches further

provides polymerases with the opportunity to proofread. Replicative polymerases are indeed

endowed with a 3’→5’ exonuclease function that allows them to edit the last base incorporated and

readily corrects 90–99.9% of erroneous pairings (Kunkel, 2004; and see Figure 3). Besides,

organisms evolved various strategies to control the balance and availability of nucleotides and

to limit exposure to mutagens.

ii. Overview of repair mechanisms

Despite the overall fidelity of the replicative process, a substantial number of mispairs

arise during replication. Moreover, damaged bases arise continuously in a time-dependent and

replication-independent manner (Drake et al., 1998). If not repaired, these mutations are

fixed in the genome during replication, while their expression may lead to deleterious

phenotypes in the meantime. Both prokaryotic and eukaryotic cells have evolved a number of

mechanisms to detect and repair various types of DNA damage. Many of the proteins

involved in these mechanisms are highly conserved between extremely remote organisms,

illustrating their fundamental importance. Nevertheless, the molecular details of the underlying

pathways have considerably diversified over evolutionary times. Besides, different organisms

exhibit varying sets of mechanisms. Different types of damage are processed by specific and

sometimes functionally redundant systems. Four major strategies can be distinguished: i) in

situ reversal of mutations; ii) resynthesis using the undamaged opposite strand; iii) recombi-

nation; and iv) transient tolerance of mutations.

(i) Some altered bases can be repaired in situ by specialized enzymes, thereby directly

reversing mutations. This form of repair requires neither cleavage of the DNA backbone nor

polymerization, and thus limits the risk of producing breaks and replication errors. However,

the associated mechanisms are inherently limited in scope. So far, this strategy has been re-

ported for the repair of UV-induced cyclobutane pyrimidine dimers and (6–4)-photoproducts

by photoreactivation (Sancar, 2008), and for the repair of some alkylation damage (Mishina

et al., 2006). While photoreactivation uses light as a source of energy to break chemical bonds,

the removal of alkyl groups involves the stoichiometric consumption of the dedicated alkyl-

transferase, which is metabolically costly.

(ii) Three repair systems rely on the more or less precise excision of erroneous bases

from the damaged strand, followed by resynthesis using the information carried on the

opposite strand. This strategy can deal with a wide variety of damage, but also promotes the

occurrence of DSBs by introducing nicks and exposing ssDNA. The nucleotide excision re-

pair (NER) pathway specifically targets mutations introducing bends in the DNA helix, such

as those produced by UV (see Figure 4). In E. coli, this function is carried by the UvrABCD

proteins (Truglio et al., 2006). Importantly, NER can be coupled to transcription, which fa-

vors the repair of regions that are likely to be of phenotypic importance in both prokaryotes

and eukaryotes (Deaconescu et al., 2007). The base excision repair (BER) pathway fixes lesions that

are similar in size and shape to normal bases. It is the predominant mechanism to handle

spontaneous DNA damage caused by free radicals and other reactive species. The initiation

of BER essentially relies on the recognition of lesions by specific DNA glycosylases (see

Figure 5). The glycosylase repertoire of a given genome hence specifies the range of damages

that can be addressed (Baute and Depicker, 2008). The mismatch repair (MMR) is dedicated

to the processing of erroneous base pairs as well as various other damages. Its activity alone

results in a 50–1000-fold increase in fidelity. The core MMR machinery consists of the MutSLH

proteins (Li, 2008; and see Table 2).

(iii) DSBs resulting from the cleavage of both DNA strands in the same DNA region

constitute one of the most hazardous types of genomic damage. Indeed, no proximal source of infor-

mation is available to direct the religation of broken ends. This situation notably arises from

exposure to ionizing radiation, replication of nicked sites, collapse of replication forks and

spontaneous cleavage of ssDNA exposed in the course of other repair mechanisms. Two main

recombination pathways are used to deal with such damages: homologous recombination

(HR) and non-homologous end joining (NHEJ). In E. coli, the RecA recombinase is the cen-

tral protein of the HR machinery. RecA specifically binds and coats ssDNA to form recom-

binogenic nucleoprotein filaments. RecA-coated nucleofilaments can invade duplex DNA

regions and search for sequence homologies in an ATP-dependent manner. The identification

of a homology results in the formation of a four-stranded DNA structure, called a Holliday

junction (HJ) (see Figure 6A). At best, the resolution of HJs is only associated with the non-

reciprocal exchange of genetic information, a process called gene conversion (see p71).

However, if the contacted homology is not the true counterpart of the damaged region, resolu-

tion can lead to chromosomal rearrangements (see p29). Through its activity, the MMR ma-

chinery is involved in controlling mitotic recombination in prokaryotes and eukaryotes, as

well as crossing over during meiotic sex (see Table 2). Particularly, MMR promotes intrage-

nomic stability by preventing recombination between too divergent sequences. In HGT-prone

prokaryotes, this function is a major determinant of the species barrier (Ishino et al., 2006;

Dillingham and Kowalczykowski, 2008).

NHEJ is a straightforward mechanism to deal with two-sided DSBs, but is far less

faithful than HR. During this process two broken DNA ends are simply joined together after

limited processing of the DNA ends, resulting in a quick but error-prone repair. The central

player in NHEJ is the DNA-end binding protein Ku. This pathway was first demonstrated in

eukaryotes (Burma et al., 2006). It is not present in E. coli, but has been found in a variety of

other bacteria and archaea (Shuman and Glickman, 2007).

(iv) The last strategy consists in tolerating damages to buy time for other repair

mechanisms to operate on the lesion. As mentioned above, the replicative polymerase cannot

accommodate most altered base pairs, leading replication forks to pause. The HR machinery

is essential to the recovery of stalled forks through the formation of a reversed structure,

whereby the annealing of the two nascent strands forms a four-way junction similar to a HJ

(see Figure 6B). In this structure, the interrupted nascent strand can be replicated using the

other nascent strand as a template, thereby allowing the lesion to be bypassed indirectly. Al-

ternatively, HJ resolution can reinstate the DNA duplex containing the lesion, while produc-

ing a one-sided DSB that can be further processed by the HR machinery as described above.

Yet, this latter process is a risky endeavor because it creates an intermediate that may promote

chromosomal rearrangements. Stalled replication forks can also recruit specialized error-

prone (EP) polymerases to replicate past the lesion (see Figure 3, p34). These poorly proces-

sive enzymes lack editing properties but can accommodate various altered templates

and incorporate complementary bases with varying degrees of accuracy – a process referred to

as translesion synthesis. This strategy is a two-edged sword: while accurately replicated le-

sions are transiently tolerated, the others directly result in the fixation of mutations.

c - The lowest… the best

The production of genetic variation can be regarded as a necessary evil. Most muta-

tions are deleterious (Eyre-Walker and Keightley, 2007) and their continuous appearance

jeopardizes the maintenance of an organism in the short term. Nevertheless, the production of

genetic novelty is required in the long term to keep in line with ever changing environments

(see Appendix, p260). The existence of taxonomic patterns of mutability strongly suggests

that genomic mutation rates might be adjusted to an evolutionary trade-off between these two

trends. Because repair mechanisms are genomic caretakers, they constitute privileged agents

to affect mutation rates. Accordingly, the action of this impressive arsenal of mechanisms

could be fine-tuned in order to tolerate just the amount of mutations necessary to

ensure proper evolutionary power, without hampering instantaneous survival. Evolution,

however, is a shortsighted process: phenotypes – and their underlying determinants – are se-

lected on the basis of their immediate reproductive advantage, not for the sake of potential fu-

ture benefits.

In this context, what would be the fate of an allele that modifies the global mutation

rate? This question was first asked by A. H. Sturtevant (1891-1970) some 70 years ago

(Sturtevant, 1937) and has been the object of numerous studies since then (Kondrashov, 1995;

Sniegowski et al., 2000; De Visser, 2002). From a theoretical point of view, a gene affecting

the mutation rate is called a modifier. A modifier allele per se has no effect on fitness, and is

thus not directly subjected to selection. Instead, it is involved in altering other genes, thereby

producing mutations which may affect fitness. The prospective selective regime undergone by

modified genes then reflects on the modifier, because all are present in the same genome.

Modifier alleles are thus selected indirectly through their effects on other genes to which they

are genetically linked – a process termed genetic hitchhiking (Maynard-Smith and Haigh,

1974). The effectiveness of hitchhiking depends on the linkage disequilibrium between the

loci considered, i.e. on the propensity of the physical link between genes to be broken by re-

combination. Increased recombination rates decrease the average time during which the modi-

fier benefits from indirect selection by hitchhiking. The indirect selection of modifier genes is

often referred to as second-order selection, because it relates to the evolution of the capacity

to evolve.
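
The effect of recombination on hitchhiking described above can be illustrated with a toy calculation. The sketch below is not taken from the cited studies: it assumes a constant per-generation probability `r` (a hypothetical parameter) that recombination separates the modifier from the mutation it produced, and ignores selection, drift and population structure. Under that assumption, the association survives `t` generations with probability (1 - r)^t and lasts 1/r generations on average.

```python
def linkage_persistence(r: float, generations: int) -> float:
    """Probability that modifier and selected mutation are still associated.

    Assumes (hypothetically) a constant probability r per generation that
    recombination separates the two loci.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie between 0 and 1")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return (1.0 - r) ** generations


def expected_linked_generations(r: float) -> float:
    """Mean number of generations until the first separating event.

    This is the mean of a geometric distribution with parameter r.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie between 0 and 1")
    if r == 0.0:
        return float("inf")  # strictly clonal: the association is never broken
    return 1.0 / r


if __name__ == "__main__":
    # Illustrative values only, not measured recombination rates.
    for r in (0.0, 0.001, 0.1, 0.5):
        print(r, linkage_persistence(r, 100), expected_linked_generations(r))
```

With `r = 0` the modifier stays attached to the mutations it generates indefinitely, whereas at high `r` the association is lost within a few generations, which is the qualitative contrast drawn in the text between asexual and sexual organisms.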

An individual bearing a modifier allele associated with a decreased mutation rate has a

higher probability of maintaining its genetic integrity. Because most mutations are deleterious in

steady conditions (Eyre-Walker and Keightley, 2007), its mean fitness is on average higher

than that of its surrounding competitors, ensuring its evolutionary success. At the popula-

tion level, the rise in frequency of the modifier reduces the overall genetic load of deleterious

alleles which is nonetheless maintained by mutation-selection balance. Thus, proximal evolu-

tionary forces tend to decrease the mutation rate as much as possible. Because recombination

erodes genetic linkage, there is theoretically a much stronger selection for the reduction of

mutation rates in asexual or selfing organisms than in sexual species. Hence, the indirect se-

lection of the weakest modifier is expected to be more significant in prokaryotes.

If selection systematically favors lower mutation rates, the following question arises:

why do mutation rates not fall to zero? There are three major answers to this question: i)

the observed level of fidelity has reached a maximum level, and cannot be heightened for

physicochemical reasons; ii) the sophisticated mechanisms required to achieve high fidelity

are costly in both energy and time, and this trades off against the production of deleterious muta-

tions; and iii) the previous reasoning assumes the primacy of deleterious mutations, but the

rare generation of advantageous mutations can introduce a counterbalancing selection for an

increased mutation rate – thereby leading to an equilibrium.

The measures of mutation rates per base pair per generation (µb) presented earlier are

disparate and must be normalized by the genome size to reveal a constant pattern (see Figure

2, p32). Thus, different genomes achieve different absolute levels of fidelity and lower muta-

tion rates are likely to be attainable. This suggests that proposition (i) must be rejected, at least

in some taxa. Hypothesis (ii) was first posited by M. Kimura (1924–1994) (Kimura,

1967), but remains difficult to test experimentally. A proper demonstration requires the estab-

lishment of a direct link between altered mechanisms of fidelity and effective fitness. How-

ever, the mechanisms of fidelity are responsible for mutations that result in indirect pheno-

typic effects and strongly affect fitness measures. Impairing repair mechanisms may indeed

result in energy savings and increased growth rate, but the mutational load incurred by the

population might rapidly outweigh these effects. It is difficult to disentangle these direct and

indirect effects to obtain a clear picture. Nevertheless, the assumption that fidelity imposes a

physiological cost is perfectly sound. Indeed, repair mechanisms involve the synthesis of spe-

cialized machineries and their functioning is very costly in ATP. Furthermore, repair func-

tions often introduce new mutations at the expense of overall fidelity to avoid individual cell

death. Some of these mutations (e.g. illegitimate recombinations) are irreversible, while others

(tolerated mismatches) have a chance to be further repaired. That the processes of mainte-

nance and survival produce genetic alterations by themselves results in an infinite regression.

Intuitively, the zero mutation point is thus unattainable, and the costs of repair are likely to

rise sharply with the level of fidelity. Despite the lack of decisive experimental evidence, hy-

pothesis (ii) is generally given most credit. Then, selection would favor the highest practical

level of fidelity. Broadly speaking, the equilibrium between the short sighted fight against

mutations and the inherent physiological cost of doing so would thus provide enough muta-

tions for successful evolution to occur.

However – as we will see in the next sections – some increases in mutation rates can

be advantageous when the basal mutation rate does not suffice to drive efficient adaptation.

Yet such increases are better kept transient in time or restricted to specific loci to be

adaptive. The necessity for a temporal containment will be first illustrated by a very rough

mechanism (see Lessons from the mutator phenomenon, p46). Well described examples of

fine-tuned global processes achieving this goal will then be presented (see Stress-induced

mutagenesis, pp 47–60). Genetic mechanisms allowing spatial targeting and phenotypic orien-

tation of mutations will be extensively described in the next chapter (see Programmed genera-

tion of genetic variations, pp 60–99). Such spatio-temporal containments of mutations do not

significantly affect the mean genomic mutation rates, which argues for rejecting hypothesis (iii). Altera-

tions that increase mutation rates have been identified easily (see next section). The corre-

sponding mutator alleles generally result from impairment of the repair machinery. If

mutation rates were fine tuned to equilibrium between production and avoidance of altera-

tions, it would likewise be possible to identify alleles responsible for decrease in mutation

rates. Nevertheless, evidence for such anti-mutators is very sparse. Alleles inactivating EP polymerases consti-

tute anti-mutators because they may decrease the mutation rate in certain condi-

tions. However, these enzymes are deeply integrated in repair mechanisms and their

impairment affects immediate cell survival. The case of translesion polymerases is thus

somewhat controversial and will be further discussed later (see Survival and variability during

SOS induction, p50). Apart from these specialized polymerases, very few anti-mutator alleles

have been described and their phenotypes are generally unclear (Schaaper, 1998; Schaaper

and Dunn, 2001; Dzidic and Petranovic, 2003). While this scarcity may simply reflect the dif-

ficulty of isolation, it rather highlights the invalidity of hypothesis (iii) with respect to hy-

pothesis (ii).

I.1.3. General mutators and the ambiguity of repair systems

a - Natural occurrence of mutators

Mutator strains displaying high mutation rates have been found at significant frequen-

cies (0.1%–60%) in natural populations of pathogenic bacteria, including Escherichia coli,

Salmonella enterica, Neisseria meningitidis, Haemophilus influenzae, Staphylococcus aureus,

Helicobacter pylori, Streptococcus pneumoniae, and Pseudomonas aeruginosa (Denamur and

Matic, 2006). Furthermore, among the twelve parallel cultures of E. coli experimentally

propagated for decades by Lenski and co-workers, three fixed a mutator phenotype

(Sniegowski et al., 1997). Altogether, these observations strongly suggest that global in-

creases in mutation rates can be selected in particular conditions. In E. coli, ca. 20 different

genes that are typically involved in maintaining genome integrity can confer mutator pheno-

types of different strengths (Horst et al., 1999). Nevertheless, almost all natural mutators cor-

respond to the inactivation of the MMR genes mutS and mutL (Denamur and Matic, 2006).

b - The short-term advantages of increased mutation rates

When a population is perfectly adapted to its current environment most – if not all –

mutations have negative or at best neutral effects on fitness (Eyre-Walker and Keightley,

2007). As highlighted above, mutators are counter-selected in these conditions. However, the

likelihood that a particular mutation may prove advantageous increases when the population

is under-adapted. Increased mutation rate may then speed up adaptation when the environment

is heterogeneous, or when it changes continually through time (De Visser, 2002). This effect

was observed in experimental infection of axenic mice. The mutator showed a strong advan-

tage in colonizing the mouse gut when the initial inoculum was a poorly adapted laboratory

strain. However, when the inoculated bacteria were already well-adapted to the mouse envi-

Spontaneous mutations – General mutators and the ambiguity of repair systems


ronment, mutator derivatives were slightly counterselected (Giraud et al., 2001). The exam-

ples mentioned above fit within this framework. Indeed, pathogens are exposed to

ever-changing environments corresponding to colonization of new hosts or new niches in the

same host, and are constantly challenged by the host immune system. In Lenski’s long-term

experiment, bacteria are confronted with particularly unusual conditions that they have

not been selected to cope with in nature (continuous exponential growth in a nutrient-limiting


When a population is challenged with a new environment, a mutator has more oppor-

tunities to produce an advantageous mutation. The corresponding modifier allele can then

hitchhike on the mutation to reach a significant frequency in the population. Because advanta-

geous mutations are rarer, the indirect selection to increase the mutation rate is much more

sensitive to recombination than the indirect selection to decrease the mutation rate. In organ-

isms subjected to high recombination, the advantageous mutation is therefore rapidly segre-

gated away from the mutator allele that caused it (Tenaillon et al., 2000). Consequently, the

hitchhiking of the modifier allele is very transient, and the genomic mutation rate of the popu-

lation is largely unaffected. In contrast, low levels of recombination allow sustained hitchhik-

ing in bacteria, accounting for the natural occurrence of mutator phenotypes in these


Two factors particularly influence the rise of mutators in asexual populations: i) the

strength of the mutator alleles; and ii) the population size. Computer simulations showed that

mutator alleles of large effects have the highest probability of hitchhiking (Taddei et al.,

1997b). Consequently, indirect selection on mutators does not result in fine-tuning the muta-

tion rates toward an optimal equilibrium level, as proposed in hypothesis (iii) of the previous

section (see p40). In contrast, asexual populations have a propensity to fix sharply elevated

mutation rates through hitchhiking. The role of mutator strength is better understood when

considered in conjunction with the population size. Experimental competitions between muta-

tor and non-mutator bacterial strains showed that a mutator subpopulation requires a mutation

rate which is increased by more than the inverse of its numerical disadvantage to have a sig-

nificant chance to produce the next beneficial mutation, and hence invade the population

(Chao et al., 1983). In the absence of directional selective pressure, the frequency of mutators

in bacterial populations has been estimated at ca. 10⁻⁶–10⁻⁵ (LeClerc et al., 1998; Boe et al.,

2000). Then, a typical mutator subpopulation should exhibit mutation rates increased by

>10⁵–10⁶-fold to consistently take over the population. Yet, mutator phenotypes do not gener-

ally exceed a ca. 10³-fold increase in mutagenesis (Denamur and Matic, 2006). The rise of mu-

tators in natural populations would then involve accidental enrichments in mutator individuals.

Alternatively, a robust benefit in favor of the mutator subpopulation arises when several

epistatic mutations are required to produce an adaptive phenotype. Indeed, these mutations

would occur in independent non-mutator cells while a single mutator individual has a higher

probability to produce the required sequence of mutations in a given amount of time. This

suggests that larger populations favor the rise of mutators (Tenaillon et al., 1999). However, a

subtle effect may nuance this conclusion. Excessive increases in the rate of beneficial muta-

tions may decrease the overall adaptation pace of a population. Indeed, several independent

advantageous clones may arise at the same time in a sufficiently large population with a

strong mutator phenotype. These clones then compete with each other – a phe-

nomenon known as clonal interference (Gerrish and Lenski, 1998). The relative selective ad-

vantage of any beneficial mutation is decreased in the presence of the others, resulting in

slower rise in frequencies. Due to the hampering effect of clonal interference, large or

poorly adapted mutator populations may thus not adapt faster than similar populations with

a lower mutation rate (De Visser et al., 1999).
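
The threshold argument attributed to Chao et al. (1983) in this paragraph can be checked with simple arithmetic. The sketch below is only an illustration under a stated assumption: the supply of beneficial mutations from each subpopulation is taken to be proportional to its frequency multiplied by its relative mutation rate. The function names are hypothetical, and drift, multiple-mutation requirements and clonal interference are ignored.

```python
def mutator_share(frequency: float, fold_increase: float) -> float:
    """Expected fraction of new beneficial mutations arising in mutator cells.

    frequency: fraction of the population carrying the mutator allele.
    fold_increase: mutation rate of mutators relative to non-mutators.
    """
    if not 0.0 < frequency < 1.0:
        raise ValueError("frequency must lie strictly between 0 and 1")
    if fold_increase <= 0.0:
        raise ValueError("fold_increase must be positive")
    mutator_supply = frequency * fold_increase
    non_mutator_supply = 1.0 - frequency
    return mutator_supply / (mutator_supply + non_mutator_supply)


def threshold_fold_increase(frequency: float) -> float:
    """Fold increase at which mutators supply half of the beneficial mutations.

    For rare mutators this is close to 1 / frequency, the inverse of the
    numerical disadvantage mentioned in the text.
    """
    if not 0.0 < frequency < 1.0:
        raise ValueError("frequency must lie strictly between 0 and 1")
    return (1.0 - frequency) / frequency


if __name__ == "__main__":
    # Frequencies and fold increase taken from the orders of magnitude
    # quoted in the text.
    for f in (1e-6, 1e-5):
        print(f, threshold_fold_increase(f), mutator_share(f, 1e3))
```

At a frequency of 10⁻⁵, a 10³-fold mutator is expected to contribute only about 1% of new beneficial mutations under this assumption, which is why the text invokes accidental enrichment or multi-mutation adaptive paths to explain the rise of mutators.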

c - Long-term consequences of increased mutation rates

Apart from the transient situation presented above, wherein the sequential production

of mutations is needed to reach an optimal fitness, the selection of a mutator must be regarded

as an indirect consequence of adaptation, and not an adaptation by itself (Sniegowski et al.,

2000). Indeed, as the population becomes adapted to its new environment, the distribution of

fitness effects is shifted toward increased prevalence of deleterious mutations (Silander et al.,

2007; Martin and Lenormand, 2006). The selective pressure against deleterious mutations rap-

idly comes to prevail and the relative advantage of being a mutator vanishes. Instead, the mu-

tator allele induces a load of deleterious alleles that accumulate in the population, and

selection for a lower mutation rate is soon renewed. If the population does not revert to a non-

mutator state, the short-term benefit of mutagenesis irremediably turns into extinction

in the long run. In this light, the rise of mutators in bacteria can be viewed as a deleterious by-

product of asexuality (Sniegowski and Murphy, 2006).

The naturally occurring mutS and mutL mutators are generated by a variety of muta-

tions including frameshifts, insertions, premature stop codons and deletions. While simple re-

version may occasionally occur in the first cases, mutator populations resulting from deletions

seem doomed. The selection of compensatory mutations is a possible alternative, but is also

improbable. Hence, a mutator population has limited possibilities to reestablish a low muta-

tion rate. Nonetheless, a particular feature of mutS and mutL mutants may incidentally explain

their prevalence among natural mutators. The mismatch repair system is indeed involved in

setting the specificity of homologous recombination. As a result, mutS and mutL mutator

strains show a 100-fold increase in recombination rate (Denamur and Matic, 2006). Not only

may increased recombination facilitate the association between beneficial mutations through

HGT, but functional alleles of the impaired mismatch repair gene may also be acquired from

surrounding non-mutator bacteria, thereby increasing the odds of reversion toward a standard

mutation rate. In support of this idea, the mutS and mutL genes display a patchy pattern of se-

quence polymorphism that indicates frequent events of homologous recombination between

E. coli isolates (Denamur et al., 2000). Besides, the three lines that evolved a mutator pheno-

type in the Lenski experiment are all impaired in mutL. A 6-bp repeat present in three copies

in the wild-type gene is present in four copies in one of the three mutator lines and two copies

in another (Shaver and Sniegowski, 2003). This might constitute a mechanism of facilitated and

reversible switching between mutator and non-mutator states (see Localized mutation through

slipped-strand mispairing, p61). In the same light, eukaryotic mismatch repair genes (Chang

et al., 2001) and bacterial genes involved in stress-responses (Rocha et al., 2002) seem to be

particularly enriched in small repeated units.

In rapidly changing environments, the population is never expected to be really

adapted, and it would continuously benefit from increased genetic variance. However, even in

this specific context, the indirect selection of strong mutators becomes highly problematic

with time and reversion to a non-mutator state seems ultimately mandatory in the long term.

Indeed, in the course of adaptation to a new environment, genomes accumulate many muta-

tions that are immediately neutral, but may prove deleterious in subsequent environments. Al-

ternatively, mutations that improve functions that are needed in a given environment may

negatively affect other functions that are essential elsewhere – a principle termed antagonistic

pleiotropy (Cooper and Lenski, 2000). In other words, if increased rates of mutation can raise

the adaptation pace they also turn organisms into niche specialists at the expense of adaptive

flexibility in the long run (Giraud et al., 2001). This effect is potentiated by recurring selective

sweeps. Indeed, each time a particular individual generates an adaptive mutation, its whole

genotype – including currently neutral and slightly deleterious mutations – increases in fre-

quency in the population, at the expense of the overall genetic diversity. This phenomenon

reduces the effective population size and quickly results in a mutational meltdown. Passages

through real population bottlenecks – such as those that often occur during colonization of

new hosts by pathogens – are similarly expected to facilitate the action of Muller’s ratchet.

During an experimental study, wild-type and mutS defective cells were subjected to 40 cycles

of single-cell bottlenecks. By the end of the experiment, 4% of mutS lineages had died out,

55% had auxotrophic requirements, 70% had defects in at least one sugar or catabolic path-

way, 33% had a defect in cell motility and 26% became temperature-sensitive lethals. In sharp

contrast, only 3% of the wild-type lineages displayed detectable phenotypes (Funchain et al.,

2000). In a similar experiment involving wild-type and msh2 yeast populations, two mutator

lineages out of twelve had gone extinct by the 175th cycle, while none of the wild-type lineages had (Zeyl

et al., 2001).

d - Lessons from the mutator phenomenon

The overall picture emerging from the mutator phenomenon is one in which the muta-

tion rate is continually buffeted between its lowest standard and largely increased values, ow-

ing to successive and antagonistic indirect selections on modifiers. Mutator phenotypes can be

transiently advantageous, but soon become deleterious and standard mutation rates must be

restored to ensure long term survival. These observations reveal the evolutionary advantage of

targeted hypermutability in avoiding the accumulation of mutational loads. Despite the inher-

ent cost of producing essentially deleterious mutations, global increase in mutation rate en-

sures maximal genetic “creativity”. Any mechanism that limits the time spent in the

hypermutable state when not necessary would soften the associated genetic burden, without

restricting the range of attainable mutations. This can be achieved by favoring the constitutive

wavering between increased and standard rates of mutation, a process that relies on the indi-

rect selection of mutagenic states by environmental pressures. Alternatively, refined mecha-

nisms can target the genome-wide production of mutations to periods wherein innovation is

needed. This will be discussed in the next section (see Stress-induced mutagenesis, below).

Nevertheless, any increase in genomic mutation rate induces its share of long-term detrimen-

tal mutations. When specific genetic elements particularly benefit from increased diversity, an

advantageous strategy is to specifically target variation to these elements without incurring

deleterious mutations elsewhere in the genome. As will be exemplified throughout the next

chapter, mutations can be targeted to specific loci by a variety of mechanisms (see Pro-

grammed generation of genetic variations, pp 60–99). This strategy radically reduces the mu-

tational load – though at the cost of decreasing the scope of accessible mutations. Obviously,

an ideal adaptation system would combine the best features of the strategies mentioned above.

As we will see, the integron system achieves this goal to some extent.

Stress-induced mutagenesis – The SOS paradigm


I.2. Stress-induced mutagenesis

The need for genetic diversity is not constant through time but depends on changing

selective environments. Such varying demands can be fulfilled through the rise and fall of

constitutive mutator alleles. However, this process relies on stochastic cycles of second-order

selection affecting modifier alleles, which is clearly not optimal. In this light, it seems intui-

tively advantageous for organisms to evolve mechanisms whereby the environment could in-

fluence the availability of variations on which selection can act. Because they are keepers of

the genome integrity, existing repair mechanisms – such as the SOS response – stand as privi-

leged agents to mediate this phenomenon.

I.2.1. The SOS paradigm

a - The SOS response to DNA damage

Some of the pathways involved in DNA repair are subjected to specific patterns of

physiological regulation that restricts their action to periods when genomic integrity is jeop-

ardized. The setup of complex regulatory circuits in place of constitutive expression is proba-

bly motivated by the deleterious interference of some repair enzymes with the functioning of

the cell under normal conditions, as well as by the energetic cost imposed by the synthesis of

the repair machineries. In addition, this control allows coordinating the action of different re-

pair mechanisms. All organisms are endowed with integrated responses to DNA damage,

though with different degrees of complexity.

The recognition that such physiological responses may exist initially came from

the description of the SOS system in E. coli. The SOS system is a regulatory network com-

prising ca. 50 genes in E. coli (Wade et al., 2005). The expression of at least 31 of these genes

is directly controlled by LexA, the negative regulator of the system (Fernandez De Henestrosa

et al., 2000). The other genes are likely to be either secondary targets or regulated in non-

canonical ways. LexA exerts its repressive effect by binding to specific DNA motifs known as

LexA-boxes (consensus 5’-CTGTATATATATACAG-3’). One or several LexA-boxes are

generally located in the promoter region of the regulated gene, preventing access of the RNA poly-

merase complex by steric hindrance when LexA is bound. Derepression of the regulon de-

pends on the auto-cleavage of LexA – a phenomenon mediated by the pleiotropic RecA

protein associated with ssDNA. RecA constitutes the positive regulator of the SOS response,

and its action is conditioned by the presence of excess single-stranded DNA (see Figure 7).
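
As an illustration of how the LexA-box consensus quoted above can be used, the following minimal sketch scans a DNA sequence for windows resembling the 16-bp consensus. It is a naive mismatch count, not the scoring method used in the cited studies; the function names and the `max_mismatches` cutoff are assumptions made for the example. Because the consensus is its own reverse complement, scanning a single strand is sufficient.

```python
# Consensus LexA-box as quoted in the text (5' to 3').
LEXA_BOX_CONSENSUS = "CTGTATATATATACAG"

_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(sequence))


def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_lexa_like_sites(sequence: str, max_mismatches: int = 4):
    """List (position, window, mismatches) for windows close to the consensus.

    max_mismatches is an arbitrary illustrative cutoff, not a validated one.
    """
    seq = sequence.upper()
    width = len(LEXA_BOX_CONSENSUS)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        mismatches = count_mismatches(window, LEXA_BOX_CONSENSUS)
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


if __name__ == "__main__":
    # Made-up test sequence with the consensus embedded at position 3.
    toy_promoter = "GGG" + LEXA_BOX_CONSENSUS + "CCC"
    print(find_lexa_like_sites(toy_promoter, max_mismatches=0))
```

In practice, natural LexA-boxes diverge from the consensus to various extents, so a real search would need a weighted score calibrated on known sites rather than a flat mismatch count.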

In E. coli, the SOS genes are involved in various functions, including DNA replication

(umuC, umuD, dinB, polB); DNA recombination (recA, ruvA, ruvB, recN); transcription

(lexA, fis, dinI, dea, suhB); nucleoside metabolism (grxA); transport (ycgH, glvB, ybeW, corA)

and cell division (sulA). This list only comprises the direct targets of LexA for which a clear

function has been defined. For a comprehensive table see Courcelle et al., 2001. The induc-

tion level of SOS-controlled genes upon activation of the response relies on several factors:

the number of binding sites, the intrinsic affinity of the sites to LexA and their location in the

regulatory region (Friedberg et al., 2005a). LexA binding may occur cooperatively when sev-

eral boxes are present in the same region (Gillor et al., 2008). Accordingly, the level of induc-

tion varies widely in both strength and dynamics among members of the regulon (Courcelle et

al., 2001; Ronen et al., 2002). These induction dynamics reflect different sensitivities to de-

creased levels of LexA, the quickest genes being the ones that can be induced with a limited

drop in LexA. Conversely, the genes that are induced late need significant RecA-mediated de-

pletion of LexA to be activated. Therefore, properties of the LexA-boxes are involved in fine-

tuning the hierarchy of gene induction. Because lexA is part of the SOS regulon, the master

repressor LexA controls its own expression. This negative auto-regulatory pattern typically

ensures tight regulation and rapid recovery of repressing conditions after induction (Alon,

2007). Indeed, standard SOS repression is reestablished within 1.5-2 cell cycles after induc-

tion (Ronen et al., 2002). The positive regulator RecA is also controlled by the response, re-

sulting in a positive feedback loop provided sufficient amounts of ssDNA are available. This

interplay between antagonistic feedback loops – and maybe other unclear mechanisms impli-

cating a UmuDC-mediated checkpoint – establishes a multi-peaked pattern of expression

from SOS promoters at the single cell level (Friedman et al., 2005).
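
The induction hierarchy described in this paragraph can be pictured with a deliberately simplified repression model. The sketch below assumes that promoter activity follows a simple binding curve, 1 / (1 + [LexA]/K), where K is a hypothetical LexA level giving half-maximal repression (high K standing for a weakly bound box). Gene names, K values and the LexA trajectory are invented for illustration and do not correspond to measured parameters; cooperativity and feedback on lexA and recA are ignored.

```python
def relative_expression(lexa_level: float, half_repression: float,
                        hill: float = 1.0) -> float:
    """Fraction of maximal promoter activity at a given free LexA level."""
    if lexa_level < 0.0 or half_repression <= 0.0:
        raise ValueError("lexa_level must be >= 0 and half_repression > 0")
    return 1.0 / (1.0 + (lexa_level / half_repression) ** hill)


def induction_order(half_repression_by_gene, lexa_trajectory, threshold=0.5):
    """Gene names ordered by the step at which they first reach `threshold`.

    Genes that never reach the threshold along the trajectory are omitted.
    """
    first_step = {}
    for step, lexa in enumerate(lexa_trajectory):
        for gene, k in half_repression_by_gene.items():
            if gene in first_step:
                continue
            if relative_expression(lexa, k) >= threshold:
                first_step[gene] = step
    return sorted(first_step, key=first_step.get)


# Hypothetical promoters: a larger K means weaker LexA binding.
EXAMPLE_GENES = {
    "weak_box_gene": 0.8,
    "medium_box_gene": 0.4,
    "strong_box_gene": 0.1,
}

# Hypothetical decline of free LexA (arbitrary units) after induction.
EXAMPLE_TRAJECTORY = [1.0 - 0.05 * i for i in range(20)]

if __name__ == "__main__":
    print(induction_order(EXAMPLE_GENES, EXAMPLE_TRAJECTORY))
```

In this toy setting the gene with the weakest box crosses the threshold after a small drop in LexA, while the gene with the strongest box requires near-complete depletion, mirroring the early and late classes described in the text.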

RecA is primarily involved in homologous recombination by catalyzing the invasion

of duplex DNA by ssDNA (see p36). RecA monomers assemble on single-stranded DNA to

form recombinogenic nucleoprotein filaments. The processing of DSBs by the RecBCD com-

plex is a significant source of RecA nucleofilaments (Rosenberg, 2001). DSBs can arise upon

exposure to mutagenic factors either directly or as side-products of the collapsed replication

forks resulting from various DNA alterations. Besides, DNA damage can uncouple the repli-

cation of the leading and lagging strands (Pagès and Fuchs, 2003), and dissociation of the

helicase from stalled replisomes may result in continued DNA unwinding beyond arrested

forks (Rangarajan et al., 2002). Both phenomena might be non-negligible sources of RecA-

coated ssDNA, probably in a RecFOR-dependent manner (Rangarajan et al., 2002; Fujii et al.,

2006). An additional role of RecA associated with ssDNA is to promote the auto-cleavage of

LexA. Therefore, RecA plays the role of a general and integrating sensor of DNA damage in

the SOS system.

Because DNA alterations occur spontaneously as the result of endogenous metabo-

lism, ca. 0.3% of a bacterial population cultured in standard conditions undergoes strong SOS

induction (McCool et al., 2004). As increased amounts of DNA lesions often arise from envi-

ronmental injuries, DNA-damage stresses are potentially a good indicator of selective envi-

ronments (Radman, 1975). Some potent inducers – such as UV radiation and the DNA-

damaging antibiotic mitomycin C – have long been known, while others are still being re-

ported. Strong oxidative stresses have been shown to induce the SOS response (Imlay

and Linn, 1987). The response is also promoted in aging colonies in a cAMP-dependent man-

ner suggesting links with other physiological responses (Taddei et al., 1995; see also

Mutagenesis in aging colonies, p56). Commonly used antibiotics that are known to interfere

more or less directly with proper replication (such as fluoroquinolones, rifamycins and

trimethoprim) potently induce SOS (Kelley and William, 2006). Other unexpected triggers are

Introduction – Control of genetic diversity


discussed below (see p54). Broadly speaking, any situation generating an abnormal amount of

ssDNA can potentially lead to SOS induction.

The SOS system is referred to as a response because a substantial part of the regulon is

precisely dedicated to repairing the DNA damage that induced its derepression. The proteins

UvrA, UvrB and UvrD are essential components of nucleotide excision repair (NER). RecA,

RuvA and RuvB are implicated in homologous recombination and processing of stalled repli-

cation forks. Furthermore, the three SOS-induced polymerases Pol II (polB), Pol IV (dinB) and

Pol V (umuCD) are implicated in translesion synthesis, thereby favoring replication in times

of stress (Napolitano et al., 2000).

b - Survival and variability during SOS induction

Repair mechanisms are devoted to maintaining genome integrity and to ensuring imme-

diate survival of individuals. The upregulation of repair proteins by the SOS response is pri-

marily aimed at providing further resources when increased amounts of DNA damage are

sensed in the cell. For instance, exposure to UV efficiently triggers the SOS response. Fol-

lowing exposure, the NER pathway is overwhelmed by the increased number of UV-induced

photoproducts, which then cause more frequent stalling of the replication forks. The RecA

nucleofilament that results from the processing of collapsed forks leads to a decreased level of

functional LexA and SOS induction. The subsequent increase in UvrABD allows NER to

cope with the excess of photoproducts. In the same light, higher levels of RecA, RecN, RuvA

and RuvB allow cells to overcome increased amounts of DSBs and stalled replication forks.

The expression of the otherwise tightly repressed sulA gene blocks cell division and causes

cell filamentation. This process constitutes a kind of checkpoint that buys time for cells to

perform the required repairs before potential template copies of the genetic information are

separated (Friedberg et al., 2005a). Another checkpoint may implicate Pol V (see below). The

SOS-induced upregulation of LexA ensures the effective repression of the regulon as soon as

the amount of ssDNA has dropped due to effective repair.

The inactivation of the SOS response leads to tremendously increased lethality upon

exposure to genotoxic stresses such as UV (see Figure 8 and Figure 9). The response is thus

clearly involved in promoting individual survival. By doing so, increased genetic variability is

incidentally introduced in the population. Indeed, repair mechanisms occasionally produce

mutations, such as illegitimate recombination events. In this respect, the role of the SOS-

induced EP polymerases is particularly ambiguous. Their primary role is probably to alleviate

the workload of other repair mechanisms (Matic et al., 2004). By replicating past lesions that

would otherwise lead to collapse of the rep-

lication fork, they limit the production of

problematic DSBs. Besides, damaged sub-

strates can be replicated accurately during

translesion synthesis. Lesions are then tran-

siently tolerated with the perspective that

they might be properly repaired subse-

quently. However, the derepression of Pol

II, Pol IV and Pol V is responsible for a sig-

nificant increase in mutation rate.

Pol V is the most mutagenic EP po-

lymerase in E. coli. Its functional expression

is nevertheless subjected to complex control

mechanisms. Pol V consists of two

UmuD’ monomers assembled with UmuC

(Goodman, 2002). LexA tightly represses

the umuD and umuC genes, so that no ex-

pression is detectable under normal circum-

stances. Both genes are strongly induced,

though late in the course of the response,

and reach only fairly low absolute levels of ex-

pression (Courcelle et al., 2001). To be

functional in Pol V, UmuD must undergo a

post-translational activation into UmuD’.

This modification corresponds to a self-cleavage induced by RecA nucleofilaments following

a mechanism similar to the one undergone by LexA (Friedberg et al., 2005a). UmuD and

UmuD’ each form homodimers, and interact with each other to form a heterodimer that is

more stable than either of the homodimers (Matic et al., 2004). Thus, the effective formation

of Pol V requires a sufficient amount of RecA nucleofilaments to ensure the predominance of

UmuD’. In addition, interaction with a RecA nucleofilament polymerized downstream of the

stalled forks seems to be a required co-factor of the polymerase activity (Friedberg et al.,

2005b). Overall, only ca. 15 functional Pol V molecules per cell are assembled late after

SOS induction (Fuchs et al., 2004). The requirement for the late presence of RecA nu-

cleofilaments can be regarded as a kind of checkpoint (Opperman et al., 1999; Friedman et

al., 2005). In this view, Pol V-

mediated translesion synthesis is used

as a last line of defense to deal with

damage that other mechanisms were

not able to overcome.

Contrasting with this situation,

Pol IV is expressed at an appreciable

concentration under normal conditions

(ca. 250 molecules per cell) (Fuchs et

al., 2004). This may reflect the in-

volvement of this protein in restarting

spontaneously arrested forks. Pol IV is

encoded by dinB, the expression of

which is upregulated ca. 10-fold

relatively rapidly after SOS in-

duction. Pol IV is mostly implicated in

mismatch extension, which constitutes

a tolerance mechanism relying on the

subsequent action of MMR. However,

the EP polymerase is also able to replicate

past some types of lesions, thereby promoting mutagenesis. Pol II is occasionally able to

replicate past a few lesions, but is mainly implicated in restarting stalled replication forks by in-

directly bypassing the lesion using the other nascent strand as a transient template for replica-

tion (see p38). A wide variety of lesions can be tolerated through this mechanism. Pol II is

encoded by polB and is overexpressed 4-fold very early after SOS induction.

Overall, Pol II and Pol IV seem to compete for rescuing arrested replication forks. Pol II is

overexpressed first – maybe to deal with arrested forks in the least mutagenic fashion. Pol IV

then enters into competition with Pol II, and offers increased opportunities to extend misaligned

strands and bypass certain lesions. If these mechanisms are not sufficient to deal with the ar-

rested forks, large amounts of DSBs sustaining a high level of RecA nucleofilament may be

produced, thereby favoring the derepression of umuC and umuD. Ultimately, RecA-coated

ssDNA filaments stimulate the expression of Pol V, which bypasses the excess of lesions in a

mostly mutagenic manner.

Even though the hierarchical pattern of regulation imposed on the SOS-induced EP po-

lymerases may minimize their mutagenic effects, they are responsible for the accumulation of

most mutations during the SOS response. Exposure of E. coli to UV leads to a 100-fold in-

crease in mutations, which are primarily targeted at damaged sites (Friedberg et al., 2005a). In

umuC or umuD mutant backgrounds, UV-induced mutations are largely reduced (Goodman, 2002).

This observation, however, may reflect increased death rates. The important point is that muta-

tions are not introduced for their own sake, but in order to promote survival – as other repair

mechanisms do. The SOS response promotes survival of individuals at the cost of increased

mutagenesis. If mutations were to be avoided at any price, the best strategy would be to let al-

tered cells die instead of repairing them. The evolution of such a behavior would require

selective pressure at the population level, because no individual-based selection can favor

death. Such a phenomenon is conceivable in multicellular organisms through apoptosis. Sev-

eral observations suggest that controlled cell death may play an important role in bacterial

populations experiencing certain stressful conditions (Yarmolinsky, 1995; Lewis,

2000; Bayles, 2007; Rice and Bayles, 2008). In all these situations, termination of a subpopu-

lation promotes survival of the other cells, thereby benefiting the clonal population. In the

case of DNA-damaging stress, the existence of such a benefit is unlikely. Indeed, if a muta-

tion created while attempting repair proves deleterious, the cell will eventually die, incurring

little cost to the population. The generation of increased variability is thus most likely a by-

product of a desperate effort to survive.

Nevertheless, this shortsighted survival mechanism promotes the generation of heri-

table genetic variance in fitness. This can prove adaptive at the population level by occa-

sionally producing mutations capable of overcoming the triggering stress. This process is

particularly well illustrated by reports on the acquisition of antibiotic resistance. For in-

stance, in a mouse infection model using E. coli, the appearance of resistant strains can be

evidenced soon after treatment with a widely used fluoroquinolone antibiotic (ciprofloxacin).

However, a mutant strain that is unable to induce the SOS response does not give rise to resis-

tant derivatives in the same conditions (see Figure 9). Ciprofloxacin is a potent inducer of the

SOS response. Resistance to this antibiotic is easily acquired through point mutations in the

gyrA gene, which codes for the DNA topoisomerase targeted by the drug. Further genetic

analyses showed that all three SOS-inducible polymerases are required for the generation of

these point mutations (Cirz et al., 2005). An increased number of HR events may similarly result

in advantageous genome rearrangements. In this light, the generation of mutations consecu-

tive to the SOS response incidentally results in a transient mutator phenotype – which can

help bacteria to adapt to stressful situations. Such stress-induced mutagenesis can be subjected

to second-order selection, just as standard mutators are (Bjedov et al., 2003; Tenaillon et al.,

2004). However, clear-cut evidence indicating that this is indeed the case is lacking because

the observed phenomenon is also a side effect of stress-resistance mechanisms emerging from

first-order selection.

c - Extending the SOS response

Two different lines of evidence may strengthen the view that the SOS response is in-

volved in adaptive mutagenesis: i) the hijacking of the response by elements unrelated to

DNA repair; and ii) its induction by stresses that are not directly linked to DNA damage. Par-

alleling the situation in E. coli, 33 genes are regulated by LexA in Bacillus subtilis (Au et al.,

2005). Nevertheless, the corresponding LexA-box consensus is remarkably divergent from E.

coli’s (5'-CGAACN(4)GTTCG-3' versus 5’-CTGTA(10)CAG-3’, respectively) (Groban et al.,

2005). Only a handful of genes are shared between the two regulons (recA, lexA, ruvA, ruvB,

uvrA, uvrB, and uvrC). Strikingly enough, all are involved in DNA repair. Because E. coli and

B. subtilis are very distantly related bacteria, this subset of genes may be regarded as the core

of the LexA regulon. However, distinct LexA-boxes and regulons have been identified in

other bacteria, mostly using bioinformatic inferences. When all these data are taken into ac-

count, the core regulon is drastically reduced to lexA alone – even if recA, ruvAB, uvrA and

ssb are commonly found under the control of LexA (Erill et al., 2007). Besides, three other

LexA-regulated genes involved in translesion synthesis (imuA, imuB, and dnaE2) were found

to be widely associated with lexA and may constitute an alternative ancestral operon (Erill et

al., 2006). In this view, the use of easily implementable – though mutagenic – translesion

functions consistently predates the incorporation of more sophisticated repair mechanisms in

the SOS response. In any case, these results highlight the universality and the tremendous

plasticity of the LexA regulon over evolutionary time in both gene content and binding motif.
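
Bioinformatic identification of LexA boxes of the kind mentioned above typically starts from a motif search. The sketch below is a deliberately simplified illustration: it looks only for exact matches to the B. subtilis consensus quoted in the text (5'-CGAAC-N4-GTTCG-3'), whereas real sites deviate from the consensus and are usually scored with position weight matrices. The function name and the promoter fragment are invented for the example.

```python
import re

# Consensus quoted in the text for Bacillus subtilis: 5'-CGAAC-N4-GTTCG-3'.
# The motif is its own reverse complement, so scanning one strand is enough
# for exact matches. The lookahead lets overlapping sites be reported.
LEXA_BOX = re.compile(r"(?=(CGAAC[ACGT]{4}GTTCG))")

def find_lexa_boxes(sequence):
    """Return (0-based start, matched site) for every exact consensus match."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in LEXA_BOX.finditer(seq)]

# Hypothetical promoter fragment, invented for illustration only
promoter = "ttgacaCGAACATATGTTCGtataat"
hits = find_lexa_boxes(promoter)
for start, site in hits:
    print(f"candidate LexA box at position {start}: {site}")
```

Exact matching is the crudest possible criterion; it is used here only to make the notion of a box consensus concrete.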


(i) In this context, it should not be surprising that new genes plug into the SOS regulon

so as to take advantage of its responsiveness. Notably, several mobile genetic elements have

been more or less directly linked to the response. The lytic cycle of several phages is re-

pressed by CI, a DNA binding protein that undergoes a RecA-mediated autocatalytic cleavage

similar to LexA and UmuD (Sauer et al., 1982). This suggests either a co-option of the RecA

induction pathway by bacteriophages or a possible bacteriophage-related origin of the lexA

gene. Then, the same triggers that induce the SOS response also promote prophage excisions

in a lexA-independent fashion, allowing them to selfishly evade potentially compromised

hosts (Redfield, 2001). The mobilization and dissemination of the SXT integrative conjuga-

tive element (ICE), which carries several determinants of antibiotic resistance, is controlled in

a similar fashion (Beaber et al., 2004). Some prophages are more directly plugged into the SOS

response. Two studies carried out in E. coli (Shearwin et al., 1998) and S. enterica (Bunny et al.,

2002) reported situations wherein prophages are repressed by a non-cleavable version of CI,

but include a gene (tum) encoding a LexA-repressed CI antirepressor. In V. cholerae, the

cholera toxin encoding phage CTXφ is controlled by a complicated mechanism involving two

repressors, including LexA (Quinones et al., 2005).

Aside from phages, the mobilization of a few TEs has been linked with SOS induction.

Transposition of IS10R, the right-hand module of Tn10, is indirectly increased upon induction

of the SOS response by UV. The exact mechanism mediating this phenomenon is unknown

but may involve upregulation of the ihfA gene encoding a subunit of the integration host fac-

tor IHF (Eichenbaum and Livneh, 1998). In contrast, LexA directly represses the transposase

carried by the IS50R element of Tn5, and genetic data show that SOS functions result in in-

creased transposition frequency. However, no such increase could be detected using extrin-

sic inducers such as UV or mitomycin C (Kuan and Tessman, 1991, 1992). UV was

nonetheless found to promote the excision of Tn5 and Tn10 in a similar manner, and to lead

to Tn1 excision at higher doses (Aleshkin et al., 1998).

Most of these behaviors can be interpreted as selfish elements escaping in the face of ad-

verse conditions (Matic et al., 2004). Nevertheless, they can also increase the adaptive poten-

tial of the population. Although the induction of bacteriophages results in cell lysis, they can

transfer host genes to new recipient cells and may bring together the adequate set of determi-

nants to overcome the triggering stress. For instance, the mobilization of prophages upon SOS

induction is involved in the specific recruitment, encapsidation and subsequent transfer of

pathogenicity islands present in the genome of S. aureus (Ubeda et al., 2005; Maiques et al.,

2006). In addition, a significant proportion of phages contain factors enhancing the fitness of

their host, such as virulence or antibiotic resistance determinants. This is readily illustrated by

CTXφ, which encodes the cholera toxin (Quinones et al., 2005). The spread of such determi-

nants is particularly facilitated in the case of the conjugative element SXT (Beaber et al.,

2004). TEs are also frequently associated with such factors. Furthermore, their mobilization

can have a significant impact on the architecture of individual genomes (see Transposition,

p28). The transposition of both Tn5 and Tn10 has been shown to promote adaptation in a

chemostat (Chao et al., 1983). Because it may rely on an SOS-induced host factor, the in-

creased transposition of Tn10 may provide an example of an SOS function that is primarily targeted

at generating variability. As we will see later, the discovery that LexA controls recombination

in integrons provides a clear-cut example that the responsiveness afforded by the SOS system

can be purposefully co-opted by an adaptive mechanism (see Results – Recombination in in-

tegrons is controlled by the SOS response to stress, p171).

(ii) The adaptive potential of the SOS response can also be increased by extending the

range of triggering situations. As mentioned earlier, several antibiotics are able to trigger the re-

sponse. All of them impact mechanisms linked with DNA processing, thereby providing a

straightforward causal explanation to their inducing effects. The discovery that β-lactam anti-

biotics can induce the response was unexpected. Indeed, this class of antibiotics exerts a bac-

tericidal effect by inhibiting the synthesis of the peptidoglycan layer of bacterial cell-walls

(Koch, 2003). The stress seems to be relayed to the SOS system by the DpiBA two-

component sensor system (Miller et al., 2004). In the same light, acoustic cavitation

(Vollmer et al., 1998) and high-pressure stresses (Aertsen and Michiels, 2005) have been

shown to induce the SOS system. In the latter case, activation of the cryptic type-IV restric-

tion endonuclease Mrr would mediate induction by introducing DSBs in the genome. These

examples illustrate an evolutionary strategy that consists in diverting an existing diversity-

generating system in order to increase adaptability to unrelated stresses.

I.2.2. Other examples of stress-induced mutagenesis

Other examples of stress-induced mutagenesis have been reported in various bacteria,

yeasts and human cancer cells. A comprehensive review is provided by Galhardo et al., 2007.

Although these processes vary widely in their molecular details, all rely on a common theme,

whereby existing DNA processing machineries are more or less directly subverted by one (or

more) regulatory response to stress. Two significant processes implicated in different sorts of

mutations are discussed below. A general picture emerging from the coupling of physiological

responses to diversity-generating mechanisms will be further discussed later (see Physiological

regulation of mutagenesis, p110).

a - Mutagenesis in aging colonies

A linear increase in mutagenesis has been observed in aging colonies of E. coli grow-

ing for seven days on agar plates (Taddei et al., 1995). The induction of the SOS response –

and specifically the proteins RecA, UvrB and DNA Pol I – is required in this process (Taddei

Stress-induced mutagenesis – Other examples of stress-induced mutagenesis


et al., 1995; Taddei et al., 1997a). Increased mutagenesis is also dependent on the excretion of

cAMP, which is part of the release of catabolite repression in response to carbon starvation

(Taddei et al., 1995). In this light, this phenomenon can be regarded as a cellular response to

starvation stress.

To assess the prevalence of

stress-induced mutagenesis in the

wild, the impact of aging was moni-

tored in a large collection of E. coli

isolates originating from a wide

range of habitats worldwide

(Bjedov et al., 2003). Among 787

natural isolates, >80% exhibited in-

creased mutagenesis in 7-day-old

as compared to 1-day-old colonies.

Focusing on a single isolate, the au-

thors found that the response to ag-

ing is mostly LexA-independent

and thus does not directly rely on

the SOS response – unlike what was observed for

the laboratory strain. However, it

relies on genes that are part of the

SOS regulon (polB and recA). Con-

sistent with earlier results,

mutagenesis was dependent on the cAMP/CRP regulatory network. The global response to

stress orchestrated by RpoS/σS was also found to be required. This alternative sigma fac-

tor upregulates ca. 340 genes – many of which play various roles in stress resistance (Weber

et al., 2005). Its expression is induced by a wide variety of conditions, including entry into

stationary phase, starvation, acid pH, osmotic shocks, cold shocks and oxidative stresses

(Hengge-Aronis, 2002). These genetic requirements further strengthen the idea that general

stress responses are implicated in mutagenesis through repair mechanisms.

The genetic discrepancies evidenced between laboratory and natural strains provide a

glimpse of the mechanistic diversity underlying stress-induced mutagenesis. In the aforemen-

tioned E. coli collection, the magnitude of stress-inducible mutability ranged from <1 to

>1000-fold among isolates (Bjedov et al., 2003; and see Figure 10). This pattern further em-

phasizes the large variability that affects the genetic determinants of this phenomenon among

different strains.
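
The magnitude of stress-inducible mutability quoted above is a simple ratio of mutant frequencies in aged versus young colonies. As a hedged illustration of the arithmetic only, the sketch below computes this fold induction for three hypothetical isolates; the isolate names and the frequencies are invented and do not come from Bjedov et al., 2003.

```python
def fold_induction(freq_old, freq_young):
    """Ratio of mutant frequencies in aged versus young colonies."""
    if freq_young <= 0:
        raise ValueError("freq_young must be positive")
    return freq_old / freq_young

# Invented mutant frequencies (day 1, day 7) for three hypothetical isolates
isolates = {
    "isolate_A": (2e-8, 1e-8),   # mutagenesis decreases with aging (fold < 1)
    "isolate_B": (1e-8, 5e-7),   # moderate induction
    "isolate_C": (1e-8, 2e-5),   # strong induction (fold > 1000)
}
folds = {name: fold_induction(day7, day1) for name, (day1, day7) in isolates.items()}

# Fraction of isolates showing increased mutagenesis in aged colonies
fraction_induced = sum(f > 1 for f in folds.values()) / len(folds)
```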

b - The competence state

At least 40 taxonomically diverse bacterial species are naturally transformable (Lorenz

and Wackernagel, 1994). Most of them essentially use the same machinery to acquire naked

ssDNA from the extracellular environment (Chen and Dubnau, 2004). The assembly of this

machinery is tightly controlled, and defines a specialized physiological state termed compe-

tence. The distribution of genes involved in competence suggests that many more species

might be transformable – though the conditions triggering this developmental state are not

known (Claverys and Martin, 2003). Contrasting with other mechanisms involved in HGT –

such as transduction and conjugation – transformation relies on a mechanism that is inherent

to the species, and independent of extrachromosomal elements. Although competence can be

regarded as the only process to be genuinely dedicated to DNA exchange in bacteria, the ex-

act role of DNA uptake remains unclear. Three major hypotheses have been suggested: i) the

incoming DNA may provide a source of genetic diversity after recombination; ii) it may be

used as a template for DNA repair; or iii) it could be used as a food source (Redfield, 2001).

The major model organisms to study natural competence are the Gram-positive Bacillus sub-

tilis and Streptococcus pneumoniae and the Gram-negative Haemophilus influenzae Rd and

Neisseria gonorrhoeae. Although similar overall, their mechanisms of competence differ in

such a way that a generalization concerning their evolutionary role may not be relevant.

In B. subtilis, S. pneumoniae and H. influenzae, the competence regulons (com) have

been identified using microarrays (Claverys et al., 2006). In both Gram-positive species, the

number of induced genes greatly exceeds that required solely for genetic transformation. This

strongly suggests that the competence state is involved in a wider response than transforma-

tion alone. Notably, a few SOS genes are upregulated in the competent state of B. subtilis (Love

et al., 1985; Hamoen et al., 2002). Although the basis of this pattern is largely un-

clear, the competence activator ComK overrides the repressive action of LexA in the pro-

moter region of recA (Hamoen et al., 2001). S. pneumoniae does not seem to contain an SOS

response, but recA is induced in competence-triggering conditions only if the competence

regulatory cascade is intact (Claverys et al., 2006). Altogether, these data suggest that the in-

duction of the competence regulon is coupled with the recombination machinery – which sup-

ports a role for transformation in repair and genetic diversity.

However, in H. influenzae the com regulon is essentially composed of genes for DNA

uptake and processing and does not comprise recA. Besides, this regulon is controlled by CRP,

which is responsive to carbon source depletion (Redfield et al., 2005). These data rather sup-

port the idea that DNA is taken up to spare the metabolic cost of nucleotide synthesis or as a

source of carbon, energy, nitrogen and phosphorus (Redfield, 2001). At odds with this

idea, H. influenzae is known to selectively take up DNA fragments containing short

and specific signal sequences (Smith et al., 1999). These sequences are highly overrepre-

sented in its genome compared to other bacterial species. One hardly sees the point in restrict-

ing access to DNA if it is taken up only to be degraded. In contrast, both B. subtilis and S.

pneumoniae efficiently take up DNA from any source – although only homologous DNA

from related species is normally recombined into the cell's chromosome due to the restrictive

action of MMR (see p36).

In the two Gram-positive bacteria, the induction of competence involves a quorum

sensing mechanism whereby the accumulation of a small peptide pheromone in crowded

populations activates a two-component regulatory system that relays the signal down to the

competence regulon. This can be interpreted as an adaptation favoring genetic exchanges

when DNA from conspecifics is most likely available (Morrison and Lee, 2000). In the same

light, competent S. pneumoniae cells can trigger the lysis of non-competent cells (Steinmoen

et al., 2003; Guiral et al., 2005). This fratricide process may serve to increase the local avail-

ability of DNA sequences from distant strains with a different pheromone type (Claverys et

al., 2006). However, quorum-sensing systems often have well-established roles in nutrient acquisition

and might act as an early warning signal of nutrient shortages, which are likely to result from

high population density (Redfield, 2001). Moreover, the competence state is induced by star-

vation in both H. influenzae and B. subtilis. These observations rather support the “DNA-as-

food” hypothesis.

The failure of mitomycin C to induce transformability in B. subtilis and H. influenzae

(Redfield, 1993) and the report that a small genomic fragment had the same effect on UV sur-

vival as total chromosomal DNA upon transformation in H. influenzae (Dubnau, 1999) led to

the rejection of the “competence-for-repair” hypothesis (Redfield, 2001). However, mitomy-

cin C as well as aminoglycoside and quinolone antibiotics have recently been shown to induce

competence in S. pneumoniae (Prudhomme et al., 2006). Altogether, the available evidence

suggests that the competence state in this species may function as a global stress response,

wherein genetic transformation is used for repair in the absence of an SOS system (Claverys et

al., 2006).

As with the SOS response, the involvement of competence in the production of genetic

diversity may arise as a side effect of repair functions. The many contributions of HGT to

modern bacterial genomes evidenced by comparative genomics suggest that these side effects

can prove advantageous (Ochman et al., 2000; Koonin and Wolf, 2008). However, contempo-

rary genomes only grant us access to successful evolutionary events. New genetic combina-

tions are likely to be more often harmful than beneficial, just as point mutations are (Redfield,

2001). In this context, transformation can be seen as a modifier trait that is essentially subjected

to the same constraints as other mutators. To date, there is no evidence supporting the com-

mitment of transformation to genetic diversity rather than DNA repair in some species (e.g. S.

pneumoniae), and the uptake of DNA as a nutrient source appears to be a sound hypothesis in

others (e.g. H. influenzae). Interestingly, in B. subtilis the competence state has been found to

be required for cells to revert point mutations in auxotrophic alleles when grown on minimal

medium. This process relies on a putative EP polymerase and limiting concentrations of MMR

proteins, but is RecA-independent (Robleto et al., 2007). The involvement of competence in a

process of stress-induced mutagenesis that is not directly linked with recombination of incom-

ing DNA underlines the complex interconnections existing between adaptive responses.

In the context of this thesis, it is noteworthy that the marine bacterium Vibrio cholerae

has recently been demonstrated to be naturally transformable. Although the underlying ge-

netic pathways are not known in detail, the induction of competence has been shown to rely

on: i) the availability of chitin (the most abundant polymer in the marine environment); ii)

stresses, such as nutrient starvation (RpoS general response); and iii) a high bacterial density,

such as in biofilms (quorum sensing) (Meibom et al., 2005). The competence-mediated acqui-

sition of large genomic segments in these conditions can account for part of the remarkable

genomic plasticity evidenced in Vibrionaceae (Miller et al., 2007a; Thompson et al., 2005).

I.3. Programed generation of genetic variations

In a changeable world, long-term stability of fitness is found in the adaptive variation

that mutability provides. The previous chapter highlighted the deleterious nature of most mu-

tations, and the resulting advantages of restricting increased mutagenesis to stressful peri-

ods in which innovations are really needed. Although a genome-wide increase in mutation

rate is the most “creative” source of genetic novelty, it always leads to the accumulation of a

Programed generation of genetic variations – Localized mutation through slipped-strand



genetic load in the population by affecting traits that should be kept stable (Roth et al., 2006).

When recurrent modifications of the same traits are required over time, natural selection can

favor the emergence of specific and local mechanisms facilitating targeted variations. Dis-

crete regions of the genome are thus evolutionarily programmed to be inherently hypervariable.

In many cases, mutations are controlled both quantitatively and qualitatively – thereby permit-

ting variations to be channeled toward a specific region of the phenotypic space. Overall, these

mechanisms enable the generation of mostly adaptive innovations only where they are fre-

quently needed – with little harm to the global integrity of the genome.

I.3.1. Localized mutation through slipped-strand mispairing

a - Replication slippage

The terms replication slippage and slipped-strand mispairing refer to a general

mechanistic process accounting for the higher than expected instability of repeated DNA

stretches through replication. Stalling of the replication fork is a common event which is de-

termined by exogenous (DNA damaging agents) as well as intrinsic (presence of DNA bind-

ing proteins, unusual DNA structures…) factors in both eukaryotes and prokaryotes (Mirkin

and Mirkin, 2007). It frequently results in the dissociation of the replisome and in the

subsequent separation of the daughter and parental DNA strands. Reannealing of repetitive

sequences can lead to stable misaligned intermediates containing a bulge. Larger numbers of

repeated motifs increase the stability of illegitimate intermediates and permit the formation of

mismatches away from the replication complex, thereby favoring polymerization over

proofreading (Kunkel, 2004). If the resulting mismatches are not corrected in the meantime, a second

round of replication fixes either a decrease or an increase in repeat number, depending on

whether the bulge is on the template strand or the nascent strand, respectively (see Figure 11).
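
As a purely illustrative sketch, the outcome rule stated above (bulge on the template strand: contraction; bulge on the nascent strand: expansion) can be written in a few lines of Python. The function names and the default of one bulged unit are assumptions made for the example, and the sketch ignores mismatch repair and proofreading altogether.

```python
def slipped_repeat_count(repeat_count: int, bulge_strand: str, bulged_units: int = 1) -> int:
    """Repeat count fixed by the next replication round after an uncorrected slip.

    bulge_strand is 'template' or 'nascent'; bulged_units is the (assumed)
    number of repeat units looped out in the misaligned intermediate.
    """
    if repeat_count < 1 or bulged_units < 1:
        raise ValueError("repeat_count and bulged_units must be positive")
    if bulge_strand == "template":
        # Looped-out template units are not copied: the tract contracts.
        return max(repeat_count - bulged_units, 0)
    if bulge_strand == "nascent":
        # Looped-out nascent units are synthesized again: the tract expands.
        return repeat_count + bulged_units
    raise ValueError("bulge_strand must be 'template' or 'nascent'")


def tract(motif: str, repeat_count: int) -> str:
    """Spell out a perfect tandem repeat of the given motif."""
    return motif * repeat_count


if __name__ == "__main__":
    for strand in ("template", "nascent"):
        new_count = slipped_repeat_count(5, strand)
        print(strand, new_count, tract("CAA", new_count))
```

The same rule applies to any motif length; only the repeat number changes, which is why the event is readily reversible.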

b - Simple Sequence Repeats (SSRs) are variable loci

Genomic regions made up of contiguous iterations of simple DNA motifs are overrep-

resented in natural DNA sequences. The repeated motifs consist of one to several nucleotides,

forming for example a homopolymeric tract of guanine or tandem repeats of 5′-CAA-3′.

These structures are generically referred to as simple sequence repeats (SSRs). In eukaryotes,

SSRs are often called micro- or mini-satellites (where the unit of repetitive DNA is 1-6 or >6

nucleotides, respectively). Whereas eukaryotic microsatellites often comprise hundreds of

repeats, in bacteria the number of repeated units is generally substantially less than 100. SSR

loci are often polymorphic. Indeed, both eukaryotic and bacterial SSRs are prone to

expansions or contractions in the number of repeat units through slipped-strand mispairing.

In eukaryotes, where the number of repeats is higher, unequal homologous recombination

may also be involved. The initial development of an SSR must rely on the fortuitous

formation of a repetitive seed through random substitutions or duplication of a small DNA


The mutation rates of SSR loci generally range between 10⁻² and 10⁻⁵,

numbers that are orders of magnitude higher than those

corresponding to standard spontaneous mutations (see p30). The spontaneous, stochastic and

reversible phenomenon of replication slippage thus endows SSR loci with a high and specific

mutation rate. Importantly, this mechanism does not require any specific apparatus and can be

seen as an emergent property of the DNA replication process.
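
To give a feel for what such rates imply, the short Python sketch below computes the fraction of lineages expected to have changed at least once at a locus after a given number of generations, using 1 − (1 − μ)ⁿ. Two assumptions are made for illustration only: the rates are read as per-locus, per-generation probabilities, and the baseline value of 10⁻⁹ is a placeholder for a standard spontaneous mutation rate, not a figure taken from this text.

```python
def fraction_changed(rate_per_generation: float, generations: int) -> float:
    """Probability that a locus has mutated at least once after n generations.

    Assumes independent generations and a constant per-generation rate.
    """
    if not 0.0 <= rate_per_generation <= 1.0:
        raise ValueError("rate_per_generation must lie in [0, 1]")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return 1.0 - (1.0 - rate_per_generation) ** generations


if __name__ == "__main__":
    generations = 100
    # 1e-2 and 1e-5 bracket the SSR range quoted above; 1e-9 is a hypothetical baseline.
    for label, rate in (("SSR, high", 1e-2), ("SSR, low", 1e-5), ("baseline (assumed)", 1e-9)):
        print(f"{label:20s} {fraction_changed(rate, generations):.3e}")
```

Under these assumptions, a clonal population of modest size is expected to contain SSR variants after only a few generations, whereas an ordinary locus would remain essentially invariant over the same period.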

SSRs are commonly regarded as junk DNA. Widely used techniques such as DNA

fingerprinting, lineage analysis and gene mapping actually rely on the assumption that SSR

mutations are neutral. This biased view is mostly based on observations from eukaryotic mi-

crosatellites. Indeed, SSRs were first described to be rather rare in prokaryotes (Tautz et al.,

1986). The prolificacy of fast growing organisms such as bacteria is largely dependent on ge-

nome size. In contrast to most eukaryotes, bacterial genomes are thus under strong selection

for compactness, leaving few non-functional intergenic regions where SSR mutations could

actually be neutral. This probably helped to establish a direct link between SSR loci and

relevant phenotypes. The analysis of complete genome sequences provides a systematic ap-

proach to identify putative SSR loci and determine their overall distribution (Medini et al.,

2008). The first study of this kind was carried out on the genome of Haemophilus influenzae

(Hood et al., 1996). All SSR regions identified in this genome are reported in Table 3. Strik-

ingly, most of the repeated motifs are embedded in functional genes. Furthermore, the annota-

tions of these genes are enriched in specific functions, such as lipopolysaccharide (LPS)

synthesis. These examples readily illustrate that SSRs, located either within the reading

frames of genes or their promoters, can impact quantitative traits. When SSRs are located

within reading frames, the length of the repeated unit is often a multiple of three. In eukaryotes,

many structural and cell-surface proteins, as well as transcription factors, seem to have

evolved by expansion of minisatellites, with each repeated unit encoding an oligopeptide motif.

Mutations in these SSRs can result in qualitative variations of the proteins by finely modulating

their function (Kashi and King, 2006). However, most mutations affecting SSRs have a

quantitative impact on gene expression. Most expansion-contraction events result in coding

sequence frameshifts, thereby enabling binary ON-OFF phenotypic switches through the

production of a non-functional, truncated protein. The impact of SSR variations on expression can

also result in more nuanced phenotypes. Mutations at the beginning of coding sequences can

shift the translation initiation site, thereby modifying the level of expression of the protein

without altering its function (Kashi et al., 1997; Dawid et al., 1999). In addition, expansion or

contraction of repeats located upstream of the translational start codon can affect the binding

of transcription factors, alter the spacing between regulatory elements and promote competition

between alternate promoters. This provides an effective way to fine-tune phenotypic traits

through the graded modulation of transcription rate (Moxon et al., 2006). In eukaryotes,

some SSRs have also been reported to impact the DNA structure and packaging, and to affect

mRNA splicing (Kashi and King, 2006).
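
The frameshift-based ON-OFF switch described in this section follows from simple arithmetic on the reading frame, as the following Python sketch illustrates. The function names and the reference repeat number (16, taken here as the in-frame allele) are hypothetical choices made for the example.

```python
def reading_frame_shift(unit_length: int, repeat_change: int) -> int:
    """Frame offset (0, 1 or 2) caused by gaining or losing repeat units."""
    return (unit_length * repeat_change) % 3


def is_in_frame(unit_length: int, repeat_count: int, in_frame_count: int) -> bool:
    """True when the downstream coding sequence is read in its original frame."""
    return reading_frame_shift(unit_length, repeat_count - in_frame_count) == 0


if __name__ == "__main__":
    unit_length = 4        # tetranucleotide repeat
    in_frame_count = 16    # hypothetical repeat number of the ON allele
    for count in range(14, 20):
        state = "ON" if is_in_frame(unit_length, count, in_frame_count) else "OFF"
        print(count, state)
```

For a tetranucleotide unit, the ON state recurs every three repeat units, so two out of three neighbouring alleles are OFF. When the unit length is a multiple of three, every allele stays in frame and variation affects the protein qualitatively rather than switching it off.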

c - Phenotypic impact

Most instances of SSR have been investigated in silico and proper experimental vali-

dations of the potential role of variations are lacking. This section provides detailed examples

and emphasizes the functional impact of SSR on fitness.

In prokaryotes, early-recognized phenomena generally involve a dual and reversible

switching between ON and OFF states, corresponding to full and no (or markedly decreased)

expression of a phenotypic trait. Such processes are known as phase variation and result in the

diversification of a clonal population into two distinct subpopulations. More complex pheno-

types bestowed by the combinatorial effects of several phase varying loci are referred to as

antigenic variation, because they essentially affect cell surface structures implicated in host-

pathogen interactions.

A well documented example of antigenic variation concerns the aforementioned LPS

synthesis in H. influenzae – whereby tetranucleotide repeats within genes alter expression

through alternative transcription start sites or protein inactivation (Moxon et al., 2006). LPS is

the major antigenic structure of the cell envelope in gram-negative bacteria. Its nature deter-

mines the physiological properties of the envelope and is critical in defining the virulence of

the bacteria. The role of three phase-variable proteins in generating different LPS through the

combinatorial addition of core sugars to the molecule backbone has been elucidated in detail

(see Figure 12). Importantly, clear selective advantages have been associated with different

forms (Weiser and Pan, 1998). For instance, addition of phosphorylcholine by the lic1-

encoded protein is associated with more efficient colonization of nasopharyngeal epithelia,

but is also targeted efficiently by innate immunity. In contrast, variants wherein lic1 expres-

sion is switched off are more resistant to host clearance. Besides, switching of several glyco-

syl-transferases implicated in LPS modification was also shown to promote resistance to anti-

body-mediated clearance. Similar modifications of the LPS have been documented in

Neisseria spp. (Yang and Gotschlich, 1996) and Helicobacter pylori (Bergman et al., 2004).

The rapid phenotypic switching may be important for adaptation in the course of infection or

during transmission from host to host.
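
The combinatorial potential of a few independently switching loci can be made explicit with a short enumeration. In the Python sketch below, lic1 is the locus named above, while "locus_b" and "locus_c" are placeholder names for the two other phase-variable genes; the count of expression states is an upper bound, since biosynthetic dependencies between sugar additions may make some states phenotypically identical.

```python
from itertools import product


def expression_states(loci):
    """Enumerate every ON/OFF combination of independently phase-variable loci."""
    return [dict(zip(loci, combo)) for combo in product(("ON", "OFF"), repeat=len(loci))]


if __name__ == "__main__":
    loci = ["lic1", "locus_b", "locus_c"]  # the last two names are placeholders
    states = expression_states(loci)
    print(len(states), "expression states")
    for state in states:
        print(state)
```

With k such loci a clonal population can explore up to 2^k expression states, that is eight for the three loci considered here.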

Most studies of phase variation have focused on pathogens, and the association

with virulence and immune evasion has therefore been emphasized. A metabolic role of

SSRs has nevertheless been identified with the ahpC gene of E. coli. A one-motif expansion of a (TCT)4

tract has been shown to drive the functional conversion of the encoded protein from a peroxi-

dase to a glutathione-glutaredoxin reductase. Interconversion between the two alleles should

mediate alternate survival to oxidative and disulfide-mediated stress with a single phase vari-

able gene (Ritz et al., 2001).

The biological role of most eukaryotic microsatellites remains uncertain but some have

been elucidated. Paralleling the bacterial case, gene-associated SSR seem to affect predomi-

nantly cell-surface proteins involved in cell adhesion and flocculation in Saccharomyces cere-

visiae (see Figure 13). In this organism, two alleles of the ras2 gene that differ by the pres-

ence of A9 and A10 poly-A tracts in the promoter region were shown to confer high and very

low sporulation frequency, respectively. As sporulation efficiency is a significant life-history

trait for yeast, this polymorphism is likely to be of adaptive significance (Kashi and King,


In higher eukaryotes, non-neutral SSRs have long been associated with deleterious

effects. The first phenotypes that were associated with SSR polymorphism were related to

diseases. In particular, the so-called triplet repeat diseases include well-known hereditary

pathologies (Fragile X, Huntington’s disease, spinocerebellar ataxia, cleidocranial dysplasia…)

and are caused by homopolymeric amino acid stretches within proteins. These disorders are

characterized by a peculiar pattern of inheritance that has been referred to as genetic

anticipation, because symptoms become more severe and tend to appear earlier in successive

generations. This exemplifies the potentially graded effect of SSR-based mutations.

More recently, SSR polymorphism has been implicated in diverse adaptive pheno-

types. In Drosophila melanogaster, the per gene is implicated in circadian clock control and

contains hexanucleotide repeats encoding (Thr-Gly) iterations. One allele containing 17 repeats

yields a circadian period closer to 24 hours, whereas a 20-repeat variant is less well adjusted but

proves less sensitive to temperature fluctuations. The geographical distribution of the two al-

leles correlates with temperature so that the buffering allele is associated with colder regions,

a pattern that is indicative of positive selection (Sawyer et al., 1997). Strong correlations be-

tween selected phenotypes and SSRs have been reported in dogs. Emblematically, the pres-

ence of extra toes in the breed Great Pyrenees is consistently linked with a 51-bp contraction

of a hexanucleotide repeat in Alx-4. Reinforcing the observation, this gene was previously as-

sociated with polydactyly in mice (see Figure 14). The fact that dogs were actively subjected

to intense artificial breeding has certainly provided good material to identify such

relationships. This illustrates how mutations at SSR loci might be important in morphological

adaptation to natural selection (Fondon and Garner, 2004). The last example extends the range of

phenotypes altered by SSRs to include social behaviors. Indeed, different species of voles have

been observed to display distinct social behaviors ranging from highly social and monoga-

mous to definitely asocial. These differences have been linked to the expression pattern of the

vasopressin receptor. Social species exhibit a compound SSR in the 5’ regulatory region of

the corresponding avpr1a gene, much of which is absent in asocial species. This was shown to

result in differential expression with respect to cell types, most probably through differential

binding properties to transcription factors. Moreover, after artificial selection for longer and

shorter alleles, it was shown that the length of the selected tract correlated with quantitative

differences in brain distribution of vasopressin receptor and in individual social behavior

(Hammock and Young, 2005).

d - SSRs as localized mutators

Because they enable a specific mechanism of mutation, SSRs display an increased

mutation rate. Moreover, these mutations can result in adaptive phenotypic changes. SSRs thus

provide a simple strategy to increase phenotypic variability by specifically altering the expression

or function of single genes. In this respect, an important characteristic of SSRs is their

ability to modulate their own mutation rates within certain constraints. The

mutability of SSRs is indeed affected by the length, sequence, number, and purity of the repeated


i. The structures of SSRs influence their mutability

Experimental studies evidenced a correlation between the tract length and the intrinsic

mutability of SSRs. Higher repeat numbers increase the stability of mispaired strands arising

after a replication fork stalls (Kunkel, 2004). Models based on E. coli propose that some

SSRs form barriers for the DNA polymerase because of their tendency to form secondary

structures, thereby promoting stalling of the replication fork (Bichara et al., 2006). This sug-

gests a direct effect of these SSRs in initiating the slippage responsible for their own mutabil-

ity. In addition, the mispairing frequency is highly sensitive to the degree of homology

between repeats. The accumulation of point mutations can thus stabilize SSRs, whereas active

mutational slippage tends to eliminate imperfect repeats. The purity of the repetitive structure

is thus differentially affected by the directionality of the selective pressure: stabilizing point

mutations are favored by purifying selection against facilitated variations, while diversifying

selection favors the maintenance of highly mutable SSRs. Altogether, these mechanisms

provide a direct coupling between the intrinsic mutability of SSR loci and their effects on fitness.

The system is nevertheless constrained by the function of the element in which the SSRs are

located. It seems reasonable to think that the length variations of repeat-encoded peptides are

limited. However, the nature and position of the repeated unit might allow a certain tolerance.

For instance, removal of the tetranucleotide repeats located in 3 H. influenzae genes involved

in LPS synthesis does not impact the protein function. The major role of the SSRs is thus to

mediate frameshift-mediated switching of expression (Moxon et al., 2006). In dogs, variations

in facial shape are best explained by the length ratio of two adjacent SSRs in the transcription

factor Runx-2 (Fondon and Garner, 2004). That the function of the protein depends on a rela-

tive ratio rather than an absolute length hints at a possible mechanism to relieve constraints.

ii. Ideal mutators

Broadly speaking, selection should act against the presence of unstable and mutagenic

SSRs within a gene. Indeed, it has recently been shown that synonymous codons are used in a

way that avoids the emergence of nucleotide repeats within coding sequence (Ackermann and

Chao, 2006; Wanner et al., 2008). This relates to the general observation that minimal muta-

tion rates are favored in the absence of favorable conditions (see The lowest… the best, p39).

Each allele of a given SSR locus encodes both a phenotypic effect – its repeat number – and a

mutation rate. These loci are thus subject to a second order selection process: natural selection

acting on the fitness effects of SSR alleles also indirectly selects their mutation rates. Because

its site for mutation is itself, an SSR locus can then be viewed as a nearly ideal mutator (Kashi

and King, 2006). This setup maximizes the genetic linkage between what mutates and what is

mutated, thereby avoiding the breakage of hitchhiking by recombination. Furthermore, this

strategy comes with a minimal genetic load because surrounding loci are maintained under

standard mutation rate. Another important characteristic of SSR-based mutations is their

facile reversibility, because extensions and contractions are produced according to the same

mechanism. This allows alleles to fluctuate between a discrete number of states, thereby

orienting the phenotypic impact of variations.
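
One consequence of this reversibility can be sketched with a minimal two-state model, in which a locus switches between ON and OFF at constant per-generation rates and selection is ignored. The rates used below are hypothetical values chosen within the range quoted earlier for SSR loci, and the function names are assumptions made for the example.

```python
def on_fraction_over_time(p_on_start: float, rate_on_to_off: float,
                          rate_off_to_on: float, generations: int) -> list:
    """Deterministic ON fraction in a population, one value per generation."""
    p_on = p_on_start
    history = [p_on]
    for _ in range(generations):
        # ON cells that stay ON plus OFF cells that switch back ON.
        p_on = p_on * (1.0 - rate_on_to_off) + (1.0 - p_on) * rate_off_to_on
        history.append(p_on)
    return history


def equilibrium_on_fraction(rate_on_to_off: float, rate_off_to_on: float) -> float:
    """Stationary ON fraction of the two-state model."""
    return rate_off_to_on / (rate_on_to_off + rate_off_to_on)


if __name__ == "__main__":
    history = on_fraction_over_time(1.0, 1e-3, 1e-3, 10_000)
    print(history[0], history[100], history[-1])
    print(equilibrium_on_fraction(1e-3, 1e-3))
```

In this simplified picture, a population founded by a single ON cell drifts toward a stable mixture of both states, so that each phenotype remains continuously available to selection.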

iii. Extrinsic influences affecting SSR mutation rates

Apart from the cis effect of SSRs on their own mutation rates, some factors may be in-

volved in trans regulation. In N. meningitidis, the inactivation of the mutS or mutL genes in-

volved in MMR leads to striking phase-variation increases in genes containing

mononucleotide repeats (Richardson and Stojiljkovic, 2001). Furthermore, meningococcal

isolates containing natural variations in mutS and mutL were also associated with increases in

phase variation rates (Richardson et al., 2002). In the same light, saturation of MMR upon

natural transformation also results in increased variations at SSR loci. Together, these data

provide mechanistic insight into the coupling of SSRs with processes implicated in the

modulation of global mutation rates. This effect may however depend on specific characteristics of

the repeat. Indeed, experiments on artificial systems in E. coli showed that MMR is more effi-

cient in processing shorter mispairing (Parker and Marinus, 1992). In H. influenzae, tetranu-

cleotide repeats are refractory to the activity of MMR while artificial dinucleotide repeats are

affected (Bayliss et al., 2002).

These results suggest that in some cases SSR mutability can be induced by stressful

conditions, such as those leading to saturation of the MMR machinery. Another hint in this

direction comes from the observation that oxidative stresses can destabilize artificial

microsatellites in E. coli (Jackson et al., 1998). Being prone to elongate mismatched termini, the

SOS-induced error-prone (EP) polymerase Pol IV may be expected to play a role in this process. However, this

does not seem to be the case (Jacob and Eckert, 2007). In wheat, a report suggested that SSR

mutation rates are increased by fungal infections (Schmidt and Mitter, 2004).

Programed generation of genetic variations – Mutation by intragenomic recombination


I.3.2. Mutation by intragenomic recombination

The quantitative and qualitative adjustments of phenotypes provided by the targeted

mutation of SSRs might be one of the simplest and most straightforward means of increasing the

genetic diversity on which selection can act. The repeated motifs involved in this mechanism

are typically short. Longer repetitions are involved in a variety of recombination mechanisms

that impact the structure of the genome. These events subvert the standard recombination

machinery of the cell, or alternatively rely on specifically dedicated systems to produce

potentially adaptive variations.

a - Meiotic sex

By mediating the reassortment of alleles within a population, meiotic sex is expected to

increase the actual variance in fitness. Although this process occasionally produces fitter

genotypes, it also breaks advantageous combinations down – thereby incurring a

recombinational load (de Visser and Elena, 2007). While obligatory in some species, sexual

reproduction is a facultative trait in others. The choice of sexual, as opposed to asexual, reproduction

may provide a strategy to increase the variation in a population during hardship

(Greig et al., 1998; Grimberg and Zeyl, 2005). In this context, individuals essentially gamble

on whether a process of genetic shuffling can result in fitter offspring in future environments.

b - Gene conversion

i. Overview

Gene conversion corresponds to the non-reciprocal transfer of information between

homologous sequences that arises during recombination of degraded DNA ends. This process

homogenizes the genetic information carried by different DNA molecules or loci. Although

gene conversion can be associated with important genomic rearrangements due to

concomitant crossovers, in half of

the cases it is restricted to a small region and preserves the overall genome architecture,

thereby minimizing potentially deleterious effects (see Figure 15). Studies on gene conver-

sion were initially conducted in ascomycete fungi (Saccharomyces, Neurospora or Sordaria),

wherein all the products of a single meiosis are clearly separated from each other in an ascus.

In the absence of such structures, it is often difficult to ascertain if an event arose through

gene conversion or by double reciprocal crossovers between sister molecules. Despite these

difficulties, molecular systems have been set up to successfully accumulate evidence of gene

conversion in prokaryotes (Santoyo and Romero, 2005). The most widely accepted mecha-

nism for gene conversion is currently the DSB repair (DSBR) model shown in Figure 16.

Conversion events can switch the expression states of closely related sequences and create

new sequences through the combinatorial rearrangement of different segments. Representa-

tive examples of adaptive gene conversion are presented below.

ii. Programmed rearrangement through gene conversion

The most remarkable examples of adaptive gene conversion arise from host-pathogen

relationships, wherein pathogens diversify their antigenic transmembrane proteins to escape

the host immune system and establish chronic infections. Ironically, gene conversion is also

responsible for immunoglobulin diversification, notably in chickens (Stavnezer et al., 2008).

The common theme to these mechanisms is that variant gene sequences are transferred

through gene conversion from unexpressed genes (pseudogenes or gene cassettes) into an ex-

pressed locus. Overall, this process corresponds to the duplicative transposition of the donor

sequence. The number of unexpressed cassettes and the diversity of sequences condition the

generation of variability at the expression site. As only the exposed domains of an antigen

need effective variation, gene conversion can readily occur between invariant regions

flanking the variable parts of the sequence. In some instances, successive events of segmental gene

conversion generate combinatorial variations (see Figure 17, p75). Variable systems based on

gene conversion have been evidenced in diverse organisms, providing a good example of

convergent evolution (Palmer and Brayton, 2007). Prokaryotic examples are reviewed in

(Santoyo and Romero, 2005) and (Wisniewski-Dyé and Vial, 2008). Eukaryotes are best rep-

resented by Trypanosoma brucei (Taylor and Rudenko, 2006).

B. hermsii (transmitted by soft ticks) and B. burgdorferi are the tick-borne causal agents of

relapsing fever and Lyme disease, respectively. In B. hermsii, persistence within mammalian hosts

is ensured by the sequential emergence and replication of variants expressing unique variable

membrane proteins (VMPs) from a single locus. At least 59 full-length and unexpressed vmp gene cassettes are

located on linear plasmids and serve as donors for gene conversion into the vmp expression

site. Recombination of the full-length donor involves sequences overlapping the 5’-coding

sequence on one end and external to the coding sequence on the 3’ end. This mechanism

readily provides a strategy to switch between an appreciable number of surface proteins

(Dai et al., 2006). In this system, the switching rate between different alleles is determined by

the degree of sequence identity on the 5’ side and by the distance of the 3’ homologous

region with respect to the coding sequence in non-expressed cassettes. The variability of the

central sequences was not found to affect switching frequencies (Barbour et al., 2006).

The closely related B. burgdorferi illustrates an important refinement to this mechanism.

One of its major surface-exposed lipoproteins (VlsE) is expressed from a single locus

located on a linear plasmid. The spirochete also uses gene conversion to convert the

expressed sequence from a repository of silent donor sequences. However, the genome of

B. burgdorferi contains only 15 silent vls cassettes (Zhang et al., 1997). Antigenic variants

produced during mouse infection result from combinatorial gene conversion events, so that

a single variant may arise from 6 to 11 segmental recombination events (Zhang and Norris,

1998). This phenomenon may also occur in B. hermsii, but might be less obvious because

of the larger number of donor sequences.
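
How a handful of silent cassettes can yield many mosaic variants may be easier to grasp with a toy simulation. The Python sketch below is only an illustration under strong assumptions: donor cassettes are pre-aligned strings of equal length representing the variable region alone (the conserved flanks are left out), each conversion event copies one random contiguous segment from one random donor, and all names and sequences are invented for the example.

```python
import random


def segmental_conversion(expressed: str, donors: list, rng: random.Random) -> str:
    """Copy one random contiguous segment from a random silent donor
    into the same position of the expressed cassette."""
    if not donors or any(len(donor) != len(expressed) for donor in donors):
        raise ValueError("donors must be non-empty and aligned to the expressed cassette")
    donor = rng.choice(donors)
    start = rng.randrange(len(expressed))
    end = rng.randrange(start + 1, len(expressed) + 1)
    return expressed[:start] + donor[start:end] + expressed[end:]


def mosaic(expressed: str, donors: list, events: int, seed: int = 0) -> str:
    """Apply successive segmental conversion events to the expression site."""
    rng = random.Random(seed)
    for _ in range(events):
        expressed = segmental_conversion(expressed, donors, rng)
    return expressed


if __name__ == "__main__":
    expression_site = "A" * 24
    silent_cassettes = ["C" * 24, "G" * 24, "T" * 24]  # toy donors, not real vls sequences
    for seed in range(3):
        print(mosaic(expression_site, silent_cassettes, events=6, seed=seed))
```

Each run with a different seed yields a different patchwork of donor segments, while the silent cassettes themselves are never modified, which mirrors the unidirectional transfer from donors to the expression site.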

In Anaplasma marginale, a tick-borne rickettsia responsible for anaplasmosis in

mammals, avoidance of the immune system is achieved by antigenic variation in genes of the

major surface protein 2 family (msp2). The genome sequence revealed only 5 to 7 donor

sequences, none of which is full-length. That donor sequences are pseudogenes ensures that

effective expression occurs only at the expression site. Variants arising early in infection are

characterized by simple gene conversion of the msp2 gene. Persistence of the pathogen,

characterized by the continuous appearance of mosaic variants through segmental gene conversion,

has been monitored over a period of two years. Simple gene conversion seems initially fa-

vored because the native sequences are more competitive than the chimeric ones when naïve

animals are infected. The advantage of combinatorial variants then arises with training of the

host immune system. Up to four sequential changes have been detected in the expressed gene.

Based on this number, roughly 6500 (9⁴) potential variants could be generated in this

seemingly limited system (Palmer et al., 2007).

The most striking example in terms of achievable diversity comes with the etiologic

agent of sleeping sickness in humans, the flagellated unicellular protozoan Trypanosoma

brucei. Bloodstream-form cells of this extracellular parasite are coated with a unique form of

the variant surface glycoprotein (VSG). The VSG coat is an effective protection against

complement-mediated lysis, but is effectively targeted by the adaptive immune response. This

selective pressure results in sequential parasitemic cycles, whereby new VSG variants emerge

and thrive until a new effective antibody is derived by the host. Dwarfing the number of vari-

ant antigen genes found in other organisms, T. brucei contains a repertoire of at least 1250 to

1500 silent vsg genes, as estimated from an incomplete genome sequence (Berriman et al.,

2005). The vast majority (>1250) are present in tandem arrays ranging from three to 250

copies and located at subtelomeres, while another set is present on about a hundred stable

minichromosomes that seem to have arisen solely to increase the number of telomeric VSGs. In a

process reminiscent of B. hermsii, full-length vsg donors can be mobilized by gene conversion

using characteristic 70-bp repeats upstream of the genes and a conserved domain within the 3’

end of the coding sequence. However, the genome project revealed that >90% of silent vsg are

in fact pseudogenes. These can only be used through segmental gene conversion, as in

A. marginale (see Figure 17).

The mechanism of VSG variation reveals additional complications. The VSG expression

pattern depends on the phase of the parasite life cycle. T. brucei contains ca. 20 telomeric

bloodstream-form VSG expression sites and ca. 25 metacyclic VSG expression sites, which

are also telomeric but structurally different from the former. The latter are active

immediately after infection, but are quickly silenced as the trypanosome switches to the

exclusive activation of one of the bloodstream-form expression sites. Then, only a single type of

VSG is exposed at a time. Such a hierarchy presumably avoids exhaustion of the antigenic

repertoire. The process ensuring the mutually exclusive expression of a single VSG gene out

of 20 bloodstream-form expression sites is unknown, but can lead to in situ expression

switches that do not rely on recombination. Irrespective of gene conversion, the expression

pattern can also be switched through telomere exchange, whereby silent cassettes at the end of

the (mini)chromosomes are simply recombined to the currently active telomeric expression

site. Increased variability is not only needed in host-pathogen interactions, but is a desirable

characteristic for thriving in a variety of ecological niches. The overrepresentation of pathogens

in the reported examples of adaptive gene conversion probably results from a biased focus on

medically or economically relevant organisms. It is not far-fetched to expect the discovery of

similar systems to perform varied functions in organisms with different lifestyles. The

regulation of sexual reproduction in yeast through mating type switching provides a good illustration.


Homothallic strains of S. cerevisiae grow as haploid cells of either the a or α mating

type. Only cells of opposite mating type can fuse to form a/α diploids capable of producing

meiotic spores in response to starvation. All physiological differences between a and α cells

and between haploid and diploid yeast cells are ultimately determined by the DNA sequences

present at the MAT locus on chromosome III. Haploid – but not diploid – cells undergo

frequent inter-conversion of mating type during growth, with a frequency reaching once every cell

division. Two regions located at each end of chromosome III, HMRa and HMLα, correspond

to silenced backups of the sequences specifying the a and α types, respectively. The MAT

locus actually ensures the expression of a duplicate copy of one of these regions (see Figure 18).

Owing to conserved regions between the two types, replacement of the current expressed

copy by a silent donor from the opposite type can occur by gene conversion. Importantly, the

switch is initiated by a DSB in the MAT locus specifically introduced by the HO endonuclease

(Haber, 1998).

iii. Setup of a gene conversion system

Operational gene conversion between two sequences requires a specific pattern of ge-

netic diversity, wherein homologies that are sufficiently conserved to allow recombination

flank a region that is sufficiently variable to mediate effective phenotypic changes. How can such

an asymmetric pattern emerge?

Gene conversion plays an important role in the evolution of multigene families. Nu-

merous phylogenetic studies evidenced a different pattern of evolution between orthologous

and paralogous genes, with paralogs evolving in a non-independent way. Gene conversion is

thought to spread identical mutations between closely related paralogs, leading to sequence

homogenization within a multigene family – a process known as concerted evolution

(Santoyo and Romero, 2005). Because high similarity favors recombination and thus gene

conversion, homogeneous families are more prone to gene conversion, which establishes a

self-sustained dynamic that limits divergence between genes. As advantageous mutations are likely

to maximize their fitness effects when they spread to all the members of a family, directional

selection might be an important driver of concerted evolution.

The specific pattern required for adaptive gene conversion may simply emerge from

concerted evolution. After duplication, two paralogs may stochastically begin to diverge, either

during a period of relaxed selection or because one copy happens to be non-expressed.

This latter condition would even be a necessary prerequisite. As soon as the trait encoded by

the duplicated gene experiences a diversifying pressure, gene conversion events become ori-

ented by natural selection from the silent copy toward the expression site. In the meantime,

the non-expressed copy is free to accumulate mutations through neutral drift. Only mutations affecting

the exposed domains of the protein will be repeatedly selected. The interplay between

frequent gene conversion and positive selection would then lead to the homogenization of the

flanking sequences through concerted evolution, while promoting the maintenance of the

variability accumulated in the central region.

Introduction – Control of genetic diversity


The paralogous babA and babB genes, which code for outer membrane proteins in H.

pylori, may illustrate an intermediate step in the formation of a complex gene conversion sys-

tem. A thorough phylogenetic analysis identified a pattern of concerted evolution restricted to

the 3’ region of the sequence, and it was demonstrated experimentally that gene conversion

indeed occurs between the two genes at a rate of 10⁻³ (Pride and Blaser, 2002). Upon experimental

infection of rhesus monkeys with H. pylori, most of the cells lost the ability to express

BabA. In some isolates, this phenotype corresponded to effective gene conversion with babB

replacing babA. The resulting higher BabB expression was proposed to promote chronic in-

fection by facilitating adherence to the gastric epithelium (Solnick et al., 2004). The main

drawback of the babA-babB system lies in the rapid exhaustion of the available diversity. H.

pylori being naturally transformable, this may be compensated by the acquisition of exoge-

nous bab sequences. The ability to generate advantageous variants may provide the selective

dynamic necessary for the emergence of a refined system at these loci.

Several factors are likely to favor the maintenance of diversity (Taylor and Rudenko,

2006): i) the bigger the repertoire of silent donors, the larger the sequence space explored by

drift; ii) an increased number of possible donors results in each particular gene being activated

less frequently, enabling the accumulation of more mutations and pseudogenization; iii) the

lifestyle of the organism may potentiate the diversifying action of genetic drift. This is

particularly evident in pathogens which are exposed to repeated selective sweeps through the

generation of antigenic variants; to population bottlenecks during transmission between hosts;

and sometimes to epidemic dynamics that reduce the effective population size; and iv) some

idiosyncratic phenomena may increase the diversity. For instance the VSG repertoire of T.

brucei is located near the telomeres, which are particularly recombinogenic areas and muta-

tional hotspots.

iv. Specific mechanisms of gene conversion?

The exact mechanisms leading to gene conversion are not completely understood and

may vary between species. Broadly speaking, gene conversion seems to involve the standard

HR machinery and fit into the DSB model (see Figure 16, p73). In bacteria, several studies

have shown a requirement for RecA, and it seems that the RecBCD, RecE and RecFOR recombination

pathways can mediate gene conversion. The RecFOR pathway in particular might generate

conversions without crossovers. Not surprisingly, impairment of the MMR

apparatus increases the frequency of conversion and allows recombination between more di-

vergent homologies (Santoyo and Romero, 2005). In T. brucei, Rad51 is required for VSG

switching and the conversion frequency is dependent on the lengths and similarities of the

homologies as well as on the MMR machinery (Barnes and McCulloch, 2007).

Nonetheless, rates of gene conversion are often much higher than expected for HR.

Furthermore, conversions often occur between short regions of much lower identities than is

usually considered necessary for HR. In Neisseriales, conversion events were reported with

micro-homologies as short as 11 bp. In addition, in artificial systems, conversion was

observed frequently only with large homologies (ca. 40 kb), while simple crossover

predominated with shorter repeats (ca. 5 kb) (Smith, 2001). Altogether, these observations

suggest that cis-acting factors or specialized systems may be implicated on top of the regular

machinery of recombination. So far, such systems have resisted detailed characterization. The

mating type switching system of S. cerevisiae (see Figure 18, p76) exemplifies how high

conversion rates can be achieved. Here, gene conversion is initiated by introduction of a DSB

at the expressed locus. This event relies on the specific recognition of a 16 bp motif located in

the middle of the MAT locus by the site-specific HO endonuclease (Haber, 1998). This sys-

tem leads to high and controllable switching rates by providing the exact event that triggers

gene conversion. Nonetheless, this strategy is costly because DSBs also promote other DNA

rearrangements that are deleterious to the cell. Moreover, the requirement for a site-specific

recognition system would constitute a significant evolutionary constraint on the setup of a gene

conversion system.

As far as we currently understand it, gene conversion primarily depends on the stan-

dard DNA repair apparatus. The apparent independence from specialized enzymes ensures the

applicability of this diversity-generating strategy to a variety of genetic situations. One can

speculate that some refined systems have improved on the basic mechanism to include more

specific functions, including site-specific activities. The next two sections focus on mechanisms

that specifically rely on site-specific enzymes. The first is the mobilization of TEs by

transposases. The second involves very specialized and sophisticated recombination systems.

c - Transposition

TEs are selfish DNA segments that contain the genetic information necessary to their

mobilization and spread within a genome (see p28). The mobilization process relies on DDD-

or DDE-transposases and does not involve the formation of covalent bonds between these proteins

and the processed DNA molecules. Transposases specifically discriminate recognition

sequences flanking TEs to catalyze their insertion and/or excision. In contrast, target sites for

insertion generally consist of only a few base pairs, so that most transposons can insert almost

anywhere in a genome (but see for instance Parks and Peters, 2009). The impact of trans-

posons on the phenotype depends on the position of their insertion sites. Although TEs have

been observed in all kinds of genomic compartments, they are predominant in heterochro-

matin and in regulatory regions. It is generally difficult to assess whether this prevalence re-

flects target sequence specificity, chromatin accessibility or filtering of random transposition

events through natural selection (Miller and Capy, 2004).

B. McClintock (1902-1992) suggested early on that genome restructuring mediated by TE

activity can be seen as an essential component of the hosts’ response to stress (McClintock,

1950, 1984). Several reports support the idea that environmental stresses indeed increase

transposition rates in some cases. In bacteria, the SOS response has been shown to promote

the mobilization of some ISs (see p54). Starvation stresses (Hall, 1999; Twiss et al., 2005) and

the stationary-phase alternative sigma factor RpoS (Ilves et al., 2001) have also been impli-

cated in increased transposition. Besides, transposition efficiency can be modulated by host

factors. Remarkably, the mobilization of Tn10 depends on the interplay between the host in-

tegration factor IHF, a DNA bending protein, and the histone-like nucleoid structuring protein

H-NS (Wardle et al., 2005). H-NS is also involved in Tn5 transposition (Whitfield et al.,

2009). Any stress affecting the expression level of these factors can potentially alter transposi-

tion rates. In the plant Nicotiana tabacum, transcription and spread of the Tnt1 retrotranspo-

son is inducible by several biotic and abiotic stress factors (Melayah et al., 2001). In

mammalian cells, telomere damage promotes the transposition of LINE-1 elements (17% of

the human genome) (Morrish et al., 2007).

Stressful conditions may also affect the TEs’ target capture. For instance, H-NS has

been shown to confine the integration of IS903 to a few integration hot-spots in the chromosome

of E. coli. Thus, a decreased level of this protein may result in wider target distributions.

A striking example provided by the Ty5 retrotransposon of Saccharomyces cerevisiae de-

serves a longer presentation. Long terminal repeat (LTR) retrotransposons form a major class

of eukaryotic TEs (Class I). They are structurally similar to retroviruses and also propagate

using an RNA intermediate, though they have lost the ability to autonomously spread from

one cell to another (Kazazian, 2004). The Ty family of LTR retrotransposons in S. cerevisiae

constitutes a particularly well-studied transposition model. Elements from this family are

known to direct their integration into gene-poor regions, thereby alleviating the burden imposed

on the host genome. The best understood mechanism is that of Ty5, which generally

integrates into the heterochromatin of telomeres and silent mating loci (Ebina and Levin,

2007). Targeting of the heterochromatin is mediated by a direct interaction between the inte-

grase and Sir4p, an important component of heterochromatin (Zhu et al., 2003). However,

strong interaction between these two proteins relies on the phosphorylation of the C-terminal

end of the Ty5 integrase. Importantly, the level of phosphorylation and thus the association

with Sir4p has been shown to decrease in stressful conditions, such as nutrient deprivation

(Dai et al., 2007). This work strongly supports a model in which Ty5 leads a double life im-

posed by the host genome, which has hence domesticated the phenotypic impact of the TE

(Ebina and Levin, 2007). Under normal conditions, most of the integrations are directed toward

neutral sites, whereas stresses trigger the host-mediated relief of this constraint so as to

favor the generation of non-silent variations (see Figure 19).

d - Site-specific recombination

i. Recombinases

Site-specific recombination is mediated by recombinases of the tyrosine or serine

families, depending on the nature of the residue forming a covalent bond with the DNA

molecule. Though their recombination mechanisms are distinct, members of both families are

able to catalyze DNA synapsis, cleavage, strand exchange and subsequent ligation without requirement

for DNA synthesis or high-energy cofactors. Contrasting with transposases, recombinases mediate

a reciprocal exchange between well-defined DNA target sites and establish covalent links with the

processed molecules. The simplest recombination sites are short duplex DNA segments (20 to 30 bp)

displaying an inverted pair of recognition sequences that bind one dimer (or two monomers) of the

recombinase. These sites contain the point of

DNA breakage and joining. The substrate specificity of recombinases provides greater opportunity

for precise genome rearrangements than that of transposases because the system is orthogonal

to the rest of the genome. Depending on the initial arrangement of the parental recombination

sites, site-specific recombination has one of three possible outcomes: integration, excision, or

inversion of DNA segments (see Figure 20). Many recombination sites are more complicated

and comprise binding sequences for additional proteins that can exert a regulatory and/or

structural role in the recombination process. Such proteins can notably modulate the efficiency

of recombination and favor one particular recombination outcome over another (for example,

excision over inversion or deletion) (Grindley et al., 2006). Site-specific recombination sys-

tems have been extensively used to engineer conditional mutants in a variety of organisms.

These biotechnological achievements reflect the natural role of these systems, which are often

selected to introduce reversible genetic diversity at defined loci within a population.
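
The dependence of the outcome on the arrangement of the two sites can be written down as a small decision rule. The sketch below follows the standard textbook scheme (sites on separate molecules, directly repeated sites on one molecule, inverted sites on one molecule); it is assumed, not verified, to match the layout of Figure 20, and the names `Outcome` and `recombination_outcome` are illustrative.

```python
from enum import Enum


class Outcome(Enum):
    INTEGRATION = "integration"
    EXCISION = "excision"
    INVERSION = "inversion"


def recombination_outcome(same_molecule: bool, inverted: bool = False) -> Outcome:
    """Outcome of recombination between two sites, given their arrangement.

    same_molecule: both sites lie on one DNA molecule.
    inverted: the sites are in opposite orientation; this flag is only
              meaningful when same_molecule is True.
    """
    if not same_molecule:
        # Sites on separate molecules (one of them circular) are fused.
        return Outcome.INTEGRATION
    if inverted:
        # Inverted sites on one molecule flip the intervening segment.
        return Outcome.INVERSION
    # Directly repeated sites on one molecule loop out the segment.
    return Outcome.EXCISION


print(recombination_outcome(same_molecule=True, inverted=True).value)
```

Accessory proteins that bias one outcome over another, as mentioned above, are outside the scope of this rule.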

ii. Modification through DNA inversion

Most examples of phenotypic variations driven by site-specific DNA inversion pertain

to the bacterial kingdom. Nevertheless, different bacterial species use very diverse strategies

to alter gene expression using this mechanism. The genome sequence of Bacteroides fragilis,

a commensal inhabitant of the human gastrointestinal tract and opportunistic pathogen, illustrates

many of these possibilities. This bacterium uses DNA inversion to control a greater

breadth of systems than any organism described to date.

Two independent teams reported the genome sequence of B. fragilis (Kuwahara et al.,

2004; Cerdeno-Tarraga et al., 2005). In both cases, particular regions were difficult to assem-

ble from the shotgun sequencing data, because the corresponding reads were highly chimeric.

It appeared that these segments were present in two alternative orientations in the starting ge-

nomic material, even though it was extracted from a seemingly pure culture grown for 24h

from a single clone. These projects illustrate the ability of whole genome sequencing to

identify loci that vary at high frequency (Medini et al., 2008). Further analysis led to the identification

of 31 invertible DNA regions. These have been classified into 6 different classes corresponding

to different recombination sites – and presumably to different associated

recombinases (Kuwahara et al., 2004). No fewer than 30 recombination enzymes including 26

tyrosine integrases, 3 serine resolvase-invertases and 1 transposase-invertase have been identi-

fied in the genome (Cerdeno-Tarraga et al., 2005). As shown in Figure 21, the inversion sys-

tems could be associated with four distinct types of modification with different functional

outcomes: i) inversion of promoter-containing segments; ii) generation of hybrid proteins

through exchange of C-terminal domains; iii) rearrangement of genes within operons; and iv)

complex and combinatorial modifications mediated by shufflon-like multiple inversions. Many

genes regulated by these mechanisms are implicated in the synthesis of surface architectures

that may be involved in immune evasion or colonization of different sites in the host, such as

capsular polysaccharides and other outer membrane proteins. Interestingly, many other func-

tions are also affected by site-specific inversions, including several transporters, signal trans-

duction systems, carbohydrate degradation systems, one restriction-modification system and

the molecular chaperone GroES/EL (see p116). This suggests that DNA inversions are used to

control adaptation to a wide range of challenges.
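
At the sequence level, the simplest of these modifications (inversion of a promoter-containing segment) amounts to replacing the segment by its reverse complement. The following minimal sketch uses a made-up locus and a promoter-like hexamer chosen for illustration only; it shows that the motif is no longer readable on the original strand after inversion and that a second inversion restores the starting sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_segment(genome: str, start: int, end: int) -> str:
    """Return genome with genome[start:end] replaced by its reverse complement."""
    if not 0 <= start <= end <= len(genome):
        raise ValueError("segment bounds out of range")
    return genome[:start] + reverse_complement(genome[start:end]) + genome[end:]


# Hypothetical locus: the invertible segment (positions 4 to 14) carries "TATAAT".
locus = "GGCC" + "TTTATAATGC" + "CCGG"
flipped = invert_segment(locus, 4, 14)

print("TATAAT" in locus, "TATAAT" in flipped)  # True False
print(invert_segment(flipped, 4, 14) == locus)  # True
```

The recombination sites themselves and the recombinase are not modelled; only the topological result of the reaction is.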

Only one recombinase has been validated experimentally. Fourteen invertible regions

are flanked by inverted repeats similar to those acted on by the Hin invertase, a model serine

recombinase involved in flagellar phase variation in Salmonella typhimurium (van de Putte

and Goosen, 1992). Two homologues (FinA and FinB) of this enzyme could be identified in

one genome of B. fragilis (Cerdeno-Tarraga et al., 2005). FinB being located on a plasmid, it

was absent from the other genome (Kuwahara et al., 2004). Seven of these regions were pre-

viously shown to define invertible promoters controlling the ON-OFF expression switching of

polysaccharide biosynthesis operons (see Figure 21, type 1-a; Krinos et al., 2001; Coyne et

al., 2003). The resulting structural variation in expressed capsular polysaccharides is thought

to play a major role in allowing B. fragilis to live in close association with the mucosal sur-

face of the intestine. Interestingly, expression of the finA gene itself might be determined by

the orientation of a flippable promoter segment putatively controlled by a nearby recombinase

(Kuwahara et al., 2004). This model would account for the coordinated modification of cap-

sular expression through the hierarchical chaining of several inversion events.

Apart from the polysaccharide operons, the evidence for structural inversions in these

genomes derives from computational analyses and experimental verification by PCR. The likelihood

that these inversions are indeed functional is nonetheless supported by their close resemblance

to well described model systems. Unraveling the intricate regulatory controls that

affect the site-specific recombination rates requires in-depth experimental dissection of particular

systems. To illustrate this, the next section provides an overview of the data accumulated on

the fim model system.

iii. Control of the fim promoter inversion system

The phase variation of type 1 fimbriae in E. coli K-12 constitutes one of the best-

understood models of site-specific promoter inversion. A wealth of detailed analysis allowed

researchers in the field to describe the overall functioning of the switch as a system and pro-

vide a glimpse of the complex mechanisms involved in modulating switching frequencies.

Type 1 fimbriae are the most common fimbrial adhesins in E. coli isolates and seem to be of

particular importance in mediating attachment to host tissue during colonization of the human

bladder. These fimbriae are encoded by the fim operon. The main subunit of the fimbriae is

FimA, the expression of which phase varies through the inversion of a segment containing its

promoter (Pa). The invertible element is flanked by inverted repeats recognized by two differ-

ent tyrosine-recombinases encoded by the upstream genes fimB and fimE (see Figure 22).

The involvement of two independent enzymes in catalyzing the recombination of a single

element is an uncommon feature that provides complex opportunities for controlling switch-

ing frequencies. FimB and FimE share 48% amino acid identity, but their activity and affinity

to recombination sites differ. FimB mediates inversion in both directions, whereas FimE al-

most exclusively mediates ON→OFF inversions. Two factors explain this bias: i) sequence

differences between the external parts of the recombination sites provide FimE with higher

affinity to its cognate substrate in the ON phase; ii) the activity of Pa in the OFF orientation

indirectly decreases FimE expression through a complex post-transcriptional phenomenon (see

Figure 22). In addition to the regulation of fimA, the invertible element thus modulates

switching frequencies – a phenomenon known as orientational control (Chu and Blomfield,

2007). Other interactions complicate the picture. Because the -10 part of the Pa promoter

overlaps the internal side of the inverted repeats, the sole binding of the recombinases prevents

expression of fimA. Conversely, the Pa activity prevents binding of both recombinases in the

ON phase. Recombination of the invertible segment and transcription of fimA hence appear to

be mutually exclusive processes. In the same light, the activities of both the fimE promoter

and Pa seem to specifically inhibit the FimB-mediated OFF→ON transition.

Under typical laboratory conditions, the frequency of inversion mediated by FimB is

10⁻³ to 10⁻⁴ per cell per generation in both orientations, while the FimE-mediated ON→OFF

switch reaches a frequency of 10⁻¹. In these conditions, steady-state equilibrium is reached

with ca. 97% of the population in the afimbriate phase. The net phase variation rate of type 1

fimbriae depends on the relative amounts of the two recombinases. The components of the system

are linked by a complex network of interactions, including intricate feedback loops. Indeed,

variations in the expression level of fimA, fimB or fimF impact the switching frequency

and consequently affect expression of the other genes. In general, the contribution of FimB

to the ON→OFF transitions is negligible compared with that of FimE. Therefore, either a slight decrease

in the level of FimE or an increased level of FimB greatly affects the frequency of OFF→ON inversions

(Blomfield, 2001).
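
To make the orders of magnitude concrete, the switch can be approximated as a two-state process with constant per-generation switching probabilities. This is a deliberate simplification, since the text stresses that the real rates depend on recombinase levels and feedback loops. The rate values below are picked from the ranges quoted above, the two ON→OFF contributions are simply added, and the function names are illustrative.

```python
def steady_state_off_fraction(p_off_to_on: float, p_on_to_off: float) -> float:
    """Equilibrium OFF fraction of a two-state switch with constant rates."""
    return p_on_to_off / (p_off_to_on + p_on_to_off)


def simulate_off_fraction(p_off_to_on: float, p_on_to_off: float,
                          generations: int, off_start: float = 0.0) -> float:
    """Iterate the deterministic recursion for the OFF fraction."""
    off = off_start
    for _ in range(generations):
        # OFF cells that stay OFF, plus ON cells that switch OFF.
        off = off * (1.0 - p_off_to_on) + (1.0 - off) * p_on_to_off
    return off


# Assumed values: FimB acts at 1e-3 in both directions, FimE adds 1e-1 ON->OFF.
fimb = 1e-3
fime = 1e-1
off_eq = steady_state_off_fraction(fimb, fimb + fime)
off_sim = simulate_off_fraction(fimb, fimb + fime, generations=500)
print(round(off_eq, 3), round(off_sim, 3))
```

With these inputs the sketch gives about 99% of cells in the OFF phase, the same strongly OFF-biased regime as the ca. 97% quoted above; an exact match is not expected from a model that ignores the regulatory interactions described in this section.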

Several cellular factors modulate recombination rates by directly affecting the expres-

sion of the recombinases or by influencing the recombination process. All these factors act in

very pleiotropic and interdependent ways that make it difficult to clearly establish their concerted

effect (global trends are indicated in Figure 22). Importantly, several environmental

signals can modulate the frequency of switching through these cellular factors. Deprivation of

isoleucine, leucine, valine or alanine decreases recombination frequencies in both directions.

This effect is at least partly mediated by Lrp. Indeed, these amino-acids are allosteric co-

factors that modify the binding affinity of Lrp. The absence of these amino acids seems to

promote interaction with binding site 3 (see Figure 22), which limits the inversion reaction

(Blomfield, 2001). H-NS is a global regulator implicated in thermoregulation, with lower

temperature associated with higher expression of the protein. The OFF→ON switching fre-

quency increases with temperature over the range of 30 to 39°C. Regulation by H-NS thus

participates in the temperature responsiveness of the fim system, which may increase the production

of type 1 fimbriae upon host colonization. Finally, fimB transcription is also repressed by

a distant cis-active silencer, which promotes the OFF state. Sialic acid (N-acetylneuraminic

acid) and GlcNAc (N-acetylglucosamine) have been shown to suppress this silencing effect.

Both amino sugars are released during the host inflammatory response and promote the

expression of regulatory proteins capable of repressing the silencer, and ultimately increase

the OFF→ON transition (El-Labany et al., 2003; Sohanpal et al., 2004; Sohanpal et al., 2007).

Overall, several intricate mechanisms seem to modulate the switching rates and thus

the production of variants in response to cues indicative of host colonization. Further deepening

the complexity of the system, the switching frequencies of different types of fimbriae

seem to be coordinated to favor the expression of a single type at a time (van der Woude,

2006). Particularly, the PapB transcriptional regulator can specifically increase FimE-

mediated inversion while exerting the opposite influence on FimB, thereby silencing type 1

fimbriae when Pap pili are expressed (Blomfield, 2001). PapB is a key element of the Pap

fimbrial operon, which phase varies according to an epigenetic mechanism (see p95).

iv. Variation controlled by DNA insertions and excisions

DNA inversion systems form a stable device to alternate the expression state of a

handful of genes. Site-specific recombination can also mediate the insertion-excision of circular

intermediates (see Figure 20, p82). Most mobile elements, including phages, ICEs and genomic

islands, use self-encoded site-specific recombination systems. Genomic islands are

peculiar in the sense that they are not self-mobilizable – their excision from a chromosome

leading to non-replicative circular forms (Boyd et al., 2009). The insertion of such mobile

elements in the chromosome ensures their correct replication, but also tightly links their fate

to the host success. As mentioned earlier (see p54), several mobile elements carry accessory

genes that can benefit them indirectly, by proving adaptive to their host. From the host point

of view, the insertion of such mobile elements corresponds to the instantaneous acquisition of

prepacked adaptive functions, including resistance and detoxification factors, metabolic capa-

bilities and virulence determinants. If it does not kill the cell, the excision of these elements

may reduce metabolic cost when the encoded accessory functions are not needed.

The excision of mobile elements can also be co-opted for other purposes. For instance,

the excision of a prophage-like remnant is involved in the developmental differentiation of

forespores in B. subtilis (Krogh et al., 1996). In the same light, the site-specific excision of

two integrated elements triggers differentiation into heterocysts, which are resistant cells spe-

cialized in nitrogen fixation in many filamentous cyanobacteria. In both cases, the excision

restores the coding regions inactivated by an original insertion back to their functional forms.

The excision event is tightly controlled to occur only in terminally differentiated cells – corresponding to

progenitor cells after they have been compartmentalized from the forespore in B. subtilis and

to differentiated heterocysts, which are unable to divide, in cyanobacteria. Thus, the restructured

genomes do not participate in the next generation, while either the differentiated (spore)

or the undifferentiated (vegetative) cells maintain the integrity of the system over time

(Haselkorn, 1992; Prozorov, 2001).

The integron system is a prominent diversity-generating structure that relies on both

insertions and excisions of dedicated elements. Because this system constitutes the essential

matter of this work, it will be described in detail in a dedicated chapter (see The integron ge-

netic system, p118).

Programed generation of genetic variations – Epigenetics


I.3.3. Epigenetics

a - Definition

The word epigenetic is somewhat ambiguous because it has been used to convey different

ideas throughout the history of biological science. The concept of epigenesis was introduced

by Aristotle (ca. 384-322 BC) to refer to the embryological development of multicellular or-

ganisms. Particularly, it emphasizes development as an active process leading to the forma-

tion of a complex and organized organism through the gradual differentiation of an

amorphous zygote. The word epigenetic appeared in the 18th century to contrast this idea with

the preformationist theory, which held that the germ cells of each organism contain preformed

miniature adults that unfold passively during development without gain of complexity (Van

Speybroeck et al., 2002). Epigenetics was later used by C.H. Waddington (1905-1975) to

mean the external manifestation of genetic activity in interaction with the environment during

development (Waddington, 1942a). Etymologically, the word is based on the Greek prefix

epi-, denoting "on top of" or "in addition", and genetic, meaning "pertaining to or produced from

genes". In a broad sense, epigenetics is then a bridge between genotype and phenotype – a

phenomenon that changes the final outcome of a locus or chromosome without changing the

underlying DNA sequence. More specifically, epigenetics may now be defined as the study of

any mitotically and/or meiotically heritable change in gene expression or cellular phenotype

that occurs without changes in Watson-Crick base-pairing of DNA (Russo et al., 1996; Gold-

berg et al., 2007).

Epigenetic phenomena rely on diverse mechanisms that usually involve positive feed-

back loops to stabilize phenotypic states over time and rely on stochastic fluctuations to

switch between alternate phases. These mechanisms include: i) covalent histone

modifications, which maintain active and silent chromatin states (in eukaryotes); ii) DNA methylation

patterns, which alter the affinity of specific binding proteins for their cognate sites; iii)

non-coding RNA, which can heritably affect gene regulation by diverse means, including the

covalent modifications of histones and DNA; iv) multistable regulatory switches, in which

expression states are maintained after the triggering signal has disappeared; and v) prions, in

which protein structure is heritably transmitted (Casadesus and Low, 2006; Goldberg et al.,

2007; Feil, 2008; Veening et al., 2008a).

It has been argued that the requirement for heredity in the definition of epigenetics is

too restrictive, because some epigenetic mechanisms are also involved in very transient phenomena

(Bird, 2007). In such cases, the outcome of the epigenetic process is closer to physiological

regulation than genetic modification. Importantly, these changes are particularly sensitive

to environmental inputs (Bird, 2007). Epigenetics then provides a bridge between the fast

and transient adaptation afforded by physiological regulation and the slow and heritable ge-

netic adaptation. The intricate relationships between regulatory changes and genetic variations

will be further discussed in the next chapter (see Phenotypic plasticity, genetic variations and

physiological regulation, p99). In the following, selected examples of heritable epigenetic

phenotypes in microbes are presented to highlight their functional similarities with pro-

grammed genetic changes.

b - Bistable regulatory switch

i. Hysteresis in the lac regulatory system of E. coli

The first epigenetic process described in bacteria affects the lac operon of E. coli. M.

Delbrück (1906-1981) first formulated the idea that discontinuous transitions between alterna-

tive states exist in this context in 1949. The phenomenon of all-or-none enzyme induction was

subsequently verified experimentally (Novick and Weiner, 1957; Cohn and Horibata, 1959).

The genes encoding the proteins required for the uptake (lacY) and utilization (lacZ) of lactose

are induced in the presence of lactose analogues, such as isopropyl-β-D-thiogalactopyranoside

(IPTG) or thio-methylgalactoside (TMG). Being non-metabolizable, such

compounds allow disentangling the physiological effects of induction from their metabolic

consequence (Novick and Weiner, 1957). At high inducer concentrations the lac operon is

fully derepressed, cells express the LacY permease at high concentrations and thus remain ac-

tivated (ON state). In contrast, at low concentrations cells that were previously uninduced and

do not have any permease in their membranes do not respond to the low level of inducer and

remain in the OFF state. A single cell cultured at an intermediate concentration of inducer leads to

the development of two phenotypically distinct subpopulations, wherein cells in the OFF

state coexist with cells in the ON state without alteration of their genotypes (see Figure 23a).

It has recently been shown that stochastic dissociation of the LacI repressor from its operator

was responsible for this phenomenon (Choi et al., 2008). When cells that were previously induced

are shifted to a medium with a lower level of inducer, the presence of permease in their

membranes ensures sufficient uptake to maintain the ON state (see Figure 23b). Importantly,

under low inducer conditions, either the ON or OFF state can be epigenetically inherited by

the offspring through multiple rounds of growth and division. Hence, the past environment

Programed generation of genetic variations – Epigenetics


experienced by a given cell can influence the phenotype of its offspring – a phenomenon re-

ferred to as hysteresis or cellular memory.

The LacY permease plays a pivotal role in this system: high permease levels keep the

levels of intracellular inducer high, thus inducing permease gene expression. This positive-

feedback loop drives the bistability of the system and is responsible for the sigmoidal shape of

the curve discernable in Figure 23b. The lac operon is also subject to catabolite repression via

cAMP/CRP, which results in low induction when glucose is present. This additional regula-

tory input affects the pattern of hysteresis, showing that other signals can act on top of the

feedback loop to modulate the output of the system (Ozbudak et al., 2004; and Figure 23c).

Interestingly, the switching behavior is essentially stochastic due to noise in the intracellular

concentration of the LacI repressor (Choi et al., 2008). Although only DNA mutations are

usually considered heritable, it has recently been demonstrated that increased rate of transcrip-

Introduction – Control of genetic diversity


tion error altered the OFF→ON switching behavior by increasing the noise in functional LacI

production. In such noisy systems, heritable phenotypic changes can thus result from

non-heritable mutations (Gordon et al., 2009).
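
The positive-feedback argument above can be illustrated with a toy model. The following Python sketch is only a minimal illustration under stated assumptions: a dimensionless permease level x is produced at a basal rate plus a Hill-type term that depends on intracellular inducer (taken here as external inducer times x), and is diluted at unit rate. The function name, the equation and all parameter values are hypothetical choices made for illustration; they are not taken from the cited studies.

```python
def permease_steady_state(inducer, x0, t_end=200.0, dt=0.01):
    """Integrate dx/dt = basal + vmax * u^n / (1 + u^n) - x with u = inducer * x.

    x is a dimensionless permease level; all parameters are illustrative
    assumptions, not measured values.
    """
    basal, vmax, hill_n = 0.05, 1.0, 2
    x = x0
    for _ in range(int(t_end / dt)):
        u = inducer * x                      # uptake scales with permease level
        production = basal + vmax * u**hill_n / (1.0 + u**hill_n)
        x += dt * (production - x)           # forward Euler step, unit dilution rate
    return x


# Same intermediate inducer level, two different histories.
naive = permease_steady_state(inducer=2.0, x0=0.0)      # previously uninduced cell
induced = permease_steady_state(inducer=2.0, x0=1.0)    # previously induced cell
print(f"naive: {naive:.2f}, previously induced: {induced:.2f}")
```

With these assumed parameters, the two histories settle on different permease levels at the same intermediate inducer value, whereas a lower (1.0) or higher (3.0) inducer value leads to a single outcome regardless of history.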

ii. Sporulation in B. subtilis

Three instances of bistable phe-

notypic switches have been reported in

B. subtilis (Dubnau and Losick, 2006):

i) initiation of the competence state; ii)

separation of single swimming cells

from chains of physically linked sib-

lings; and iii) developmental commit-

ment to sporulation. Heritability was

only demonstrated for the last process.

Sporulation is driven by the mas-

ter sporulation regulator Spo0A, whose

activity is governed by phosphorylation

via a multicomponent phosphorelay

(Burbulys et al., 1991). Although activa-

tion is triggered by specific environ-

mental signals, such as high cell

densities and nutrient deprivation, only

a fraction of cells are found in a Spo0A-

ON state in adequate conditions (Chung

et al., 1994). This observation initially

led to the description of sporulation as a bistable process. This phenomenon depends on a

complex and noisy positive feedback loop involving both the transcription of the gene for

Spo0A and its phosphorylation (Veening et al., 2005). Time-lapse tracking of cell lineage

showed that the actual decision to sporulate could often be traced back more than two genera-

tions before the actual appearance of the phenotype (see Figure 24), thereby demonstrating

that the sporulating signal is epigenetically passed on from cell to cell (Veening et al., 2008b).

The benefit arising from supplementing a physiological control with a bistable mechanism

may be to prevent the whole population from embarking on an irreversible differentiation process, in

case the initial trigger is erroneous. Also, Spo0A-ON cells in a population trigger the lysis of

non-sporulating Spo0A-OFF siblings, a process termed fratricide or cannibalism (Gonzalez-Pastor

et al., 2003). By doing so, the population density is decreased and nutrients are released

in the environment, which may alter the Spo0A activating feedback loop and delay commitment

to sporulation (Dubnau and Losick, 2006). The inheritance of sporulation ability might

be beneficial in increasing the response of the progeny to future conditions, which have a higher

probability of being challenging considering the parent’s experience (Veening et al., 2008a).

c - DNA methylation patterns in bacteria

i. General mechanism

So far, all described cases of strict epigenetic regulation in bacteria rely on a methylation-dependent

mechanism. These systems require a DNA methylase and a regulatory protein

that binds to DNA sequences overlapping the target methylation site. Upon binding, the regulatory

protein hampers methylation by preventing access of the methylase to the target site. In

turn, methylation of the target site inhibits protein binding to its cognate sequence. This reciprocal

relationship results in the existence of two alternative and stable patterns – fully methylated

and nonmethylated (i.e. not methylated on either strand) – which correlate with the

binding of a regulatory protein and are thus associated with the control of a target gene’s

expression (Casadesus and Low, 2006).

Only a handful of such systems have been reported, reflecting both the difficulty of

detecting these phenotypes and the multiplicity of conditions required to build up such

mechanisms. All described systems pertain to E. coli and involve the orphan DNA adenine

methylase (Dam), which processively catalyses the methylation of adenine in the ca. 20,000

5’-GATC-3’ motifs present in E. coli’s genome. Comprehensive studies indicated that only

few target sites are consistently protected from methylation by binding proteins. Among these

proteins, few play a regulatory function and fewer are likely to be affected by methylation

patterns (Casadesus and Low, 2006). Dam apparently methylates only one DNA strand at a

time, but full methylation is promoted by the higher affinity of the enzyme for hemimethylated

sites (Lim and van Oudenaarden, 2007). After replication, DNA duplexes are transiently

hemimethylated and this state serves as an important cue for several processes, including syn-

chronization of replication, DNA repair and transposition (Casadesus and Low, 2006). While

such processes can be viewed as epigenetic phenomena, they are not heritable. In contrast, the

methylation patterns described above are heritable over several generations.

The exact determinants of this cellular memory are not understood in detail, but some key

points can nevertheless be outlined. Nonmethylated naked sites are subject to competition between

the binding regulatory protein and the methylase. The outcome of this competition depends

on the relative concentrations and affinities of the two proteins. As there is no DNA

demethylation reaction in E. coli, the passage from a fully methylated to a nonmethylated

state necessitates two successive rounds of replication. Replication thus provides an intermediate

state that can initiate switching toward nonmethylation. Because replication forks

probably remove regulatory proteins associated with nonmethylated sites, replication also affects

the switch toward the methylated state. Overall, a functional switch involves several intermediate

states that tend to revert to the stable state from which they originate. As a result, the

extreme states are strongly stabilized and the switching frequency is maintained at a level

compatible with sustained heritability (Lim and van Oudenaarden, 2007).
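
The requirement for two rounds of replication can be checked with a small bookkeeping sketch. It assumes, for illustration only, that the regulatory protein rebinds immediately after each fork passage, so that Dam never remethylates the site. Each GATC site is represented as a hypothetical pair of booleans (top strand methylated, bottom strand methylated); the function names are not taken from any cited work.

```python
def replicate(site):
    """Semi-conservative replication of one GATC site.

    Each daughter duplex keeps one parental strand and receives a newly
    synthesized, unmethylated complementary strand.
    """
    top, bottom = site
    return [(top, False), (False, bottom)]


def descendants(site, rounds):
    """All duplexes derived from `site` after `rounds` replications,
    assuming Dam is kept off the site throughout (illustrative assumption)."""
    population = [site]
    for _ in range(rounds):
        population = [d for s in population for d in replicate(s)]
    return population


fully_methylated = (True, True)
for rounds in (1, 2):
    duplexes = descendants(fully_methylated, rounds)
    n_non = sum(1 for d in duplexes if d == (False, False))
    print(rounds, "round(s):", n_non, "of", len(duplexes), "duplexes nonmethylated")
```

Under this assumption, both duplexes are hemimethylated after one round, and half of the duplexes are nonmethylated after two rounds.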

Several parameters can influence the switching frequency of these epigenetic systems.

Obviously, the affinities of the regulatory protein for the different methylated states are of primary

importance. The steady state switching frequency then depends on the relative concentration

of regulatory protein and Dam, and any factor impacting this ratio can displace the

dynamic equilibrium. Growth rates may be a source of particular variations: increased delays

between replication rounds would favor the occurrence of two consecutive methylations,

while faster growth would titrate Dam activity thereby favoring dilution of the methylation

pattern (Lim and van Oudenaarden, 2007). To some extent, this effect may be buffered, be-

cause Dam levels are known to specifically increase with growth rate (Lobner-Olesen et al., 1992).

ii. The agn43 system

The agn43 (flu) gene of E. coli is a straightforward example of a bacterial epigenetic

switch involving covalent DNA modification. Indeed, it is not embedded in a complex net-

work of feedback regulations and conforms perfectly to the general features presented above.

agn43 encodes the outer membrane protein Ag43, which is notably implicated in autoaggregation

and biofilm formation (Casadesus and Low, 2006). The expression of Ag43 phase varies

with a switching rate of 10⁻³ to 10⁻⁴ per cell per generation (Lim and van Oudenaarden,

2007). The oxidative stress response protein OxyR represses the transcription of agn43. The

binding site is located immediately downstream of the gene promoter. This regulator exists in

two redox states, associated with distinct binding affinities to DNA. Binding of the reduced

protein represses gene expression in the OxyR regulon, a constraint that is generally relieved

by oxidation of the protein. Surprisingly, the redox state of OxyR does not affect binding to

the agn43 regulatory region. However, the binding region contains three Dam target sites.

Methylation of the target sites prevents OxyR binding, and consequently promotes Ag43 expression

(Henderson and Owen, 1999). Conversely, the binding of OxyR to its cognate sequence

protects the region from methylation, thereby maintaining proper conditions for

effective repression. The affinity of OxyR for its hemimethylated substrate is six-fold lower

than for the nonmethylated one, while its affinity for the fully methylated region is too low to be

measured. As discussed in the previous section, this must be a critical feature in controlling

the switching frequency. In addition, methylation of any two of the three target sites is suffi-

cient to prevent binding (Casadesus and Low, 2006). The third site might then be involved in

decreasing the frequency of ON→OFF transitions.

Additional observations slightly complicate this simple model. First, an intermediate

agn43 expression level is measured in an oxyR dam double mutant background. This likely

results from non-specific transcription, and indicates that methylation of the promoter region

is required for full expression. Furthermore, the regions upstream of the promoter and between

the OxyR binding region and the agn43 gene are both necessary for correct regulation. It thus

seems that binding of OxyR by itself is not sufficient for repression. Instead, effective repres-

sion would rely on an interaction between the upstream and downstream regions mediated by

the DNA bending properties of the protein. This additional structural state seems to stabilize

the nonmethylated state and decrease the probability of OFF→ON switching (Lim and van

Oudenaarden, 2007).

The expression of the mom gene of bacteriophage Mu is also affected by OxyR. In this

case the binding of the protein is regulated both by redox control and by the methylation state

of three Dam target sites (Bolker and Kahmann, 1989; Hattman and Sun, 1997).

iii. The pap system

The pap operon, which encodes the pyelonephritis-associated pili in E. coli, constitutes

the paradigmatic example of an epigenetic switch mediated by methylation patterns. Several

roles have been suggested for the switching of Pap expression: escape from the host immune

response, facilitation of a bind-release-bind series implicated in urinary tract colonization and

control of growth through contact-dependent growth inhibition (Aoki et al., 2005). Phase

variation of Pap relies on the same general principle as agn43. The system is nonetheless

more complex, because it involves a dual methylation pattern and is subject to intricate feed-

back controls.

The system consists of the main operon and a divergently encoded upstream gene

papI. Two stable expression states can be distinguished for the pap operon, depending on the

binding of Lrp (see Figure 25). In the ON state, GATCprox is methylated preventing binding of

Lrp to the Pb region, while Lrp binding protects GATCdist from methylation. In the OFF

phase, Lrp prevents expression of the pap operon and protects GATCprox from Dam activity,

while methylation of the GATCdist prevents binding to this site. Importantly, Lrp plays a dual

regulatory role in this system. In the OFF phase Lrp represses papB by preventing access to

Pba, whereas in the ON phase it is bound to GATCdist and activates papI transcription through

the formation of a complex involving PapI, PapB, and CRP. PapI plays an important role in

regulating the affinity of Lrp to its cognate sites. PapB is an activator of PapI expression, and

also exerts a negative autoregulation on the pap operon (Blomfield, 2001). The most convenient

way to appreciate the intricacies of the system is to consider the transitions between the

ON and OFF states.

Let us first consider the OFF→ON transition, which occurs at a frequency of ca. 10⁻⁴

per cell per replication in standard conditions. Two factors stabilize the OFF state: i) the sole

binding of Lrp to the proximal region decreases binding at the distal region by 10-fold, a phe-

nomenon termed mutual exclusion. The underlying mechanism is unknown but requires DNA

supercoiling, potentially mediated by Lrp binding to the proximal region; ii) the methylation

state of GATCdist: a mutation preventing methylation at this site is sufficient to lock the system

in the ON phase (Braaten et al., 1994). PapI is essential to the OFF→ON switch. In the

absence of PapI, Lrp has a slightly higher affinity for the proximal region. When expressed at physiological

concentrations, PapI increases the affinity of Lrp for the distal region, even when

hemimethylated. Furthermore, PapI specifically decreases the affinity of Lrp for the methylated

proximal region. A rise in either PapI or PapB levels seems mandatory to initiate the

switch and replication is presumably required in this process. By destabilizing Lrp bound to

GATCprox, replication probably results in a slight increase in PapB expression. This would initiate

a positive feedback loop because papI expression is promoted by PapB. Increased levels

of PapI and release of mutual exclusion would favor Lrp binding to the hemimethylated distal

region, which further promotes the expression of PapI in the presence of CRP. High levels of

PapI further decrease Lrp binding to the proximal region provided it has been methylated by

Dam, thereby stabilizing the ON phase and the production of PapB. A second round of repli-

cation is required to obtain a nonmethylated GATCdist and fully stabilize Lrp binding in this

region. In steady state, the expression of pap is limited by binding of PapB to the Pb region.

This autoregulation also indirectly controls the level of PapI, and probably permits the reverse

switch to occur.

The ON→OFF transition occurs at a 100-fold higher rate. Initiation of the switch

probably also involves replication, which destabilizes the Lrp complex at the distal site and

promotes competition with Dam. Replication also produces a hemimethylated GATCprox site to

which Lrp affinity is increased in the presence of PapI. Once Lrp is bound to the proximal re-

gion, it would favor methylation of the distal site by mutual exclusion. Altered formation of

the Lrp enhancer, together with a lower PapB level, decreases PapI expression, thereby stabilizing

the OFF state. A second round of replication leads to a nonmethylated GATCprox (Blomfield,

2001; Casadesus and Low, 2006).
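
The two switching rates quoted above imply a strongly skewed population at equilibrium. As a rough sketch, assuming a simple two-state model in which ON and OFF cells grow equally well, rates are constant and no selection acts (none of which is claimed by the cited work), the expected ON fraction can be iterated generation by generation. The function name is hypothetical, and the ON→OFF value is simply taken as 100-fold the OFF→ON rate of ca. 10⁻⁴.

```python
def on_fraction(k_off_to_on, k_on_to_off, generations, start_on=0.0):
    """Expected fraction of ON cells in a two-state switching model.

    Assumes constant per-generation switching probabilities and equal
    growth of ON and OFF cells (illustrative simplifications).
    """
    f_on = start_on
    for _ in range(generations):
        f_on += (1.0 - f_on) * k_off_to_on - f_on * k_on_to_off
    return f_on


k_on = 1e-4          # OFF->ON, ca. 10^-4 per cell per replication
k_off = 100 * k_on   # ON->OFF, 100-fold higher
print(on_fraction(k_on, k_off, generations=5000))
print(k_on / (k_on + k_off))   # analytical steady state of the same model
```

Both values are close to 1%: under these assumptions, only about one cell in a hundred would be in the ON phase at equilibrium, whatever the starting composition.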

This switching system is subject to several environmental influences. The papI expression

enhancing complex comprises the CRP protein, which links the regulation of the pap

operon with catabolite repression and availability of carbon sources. Specifically, the presence

of glucose dramatically lowers the OFF→ON switching rate. The global regulatory protein H-NS

is indirectly implicated in controlling the switching rate via PapB expression. H-NS seems to

bind to the pap regulatory region and represses papB transcription in response to low tem-

perature (Goransson et al., 1990), high osmolarity and rich culture medium (White-Ziegler et

al., 2000). Both of these regulatory mechanisms may be used by cells to adapt the production

of pili in response to their immediate environment. Another environmental input into Pap

phase variation is mediated by the CpxAR two-component system. Under certain conditions

that stress the cell envelope, CpxA activates CpxR which then binds to sites overlapping all

pap Lrp binding sites, thereby shutting papB expression off. This response is not affected by

methylation patterns and thus overrides Lrp control (Hernday et al., 2004). The biological role

of this phenomenon remains unclear.

Another layer of complexity results from the presence of several paralogous fimbrial

operons in E. coli. These operons share a similar organization and crosstalk occurs between

homologs of PapB and PapI. As a result, a particular stochastic switching event can influence

the transition frequency of related operons. This regulatory network even extends to unrelated

systems. For instance, PapB and its paralogue SfaB repress phase variation of the type 1 fimbriae

encoded by the fim operon, which is mediated by site-specific promoter inversion (van

der Woude, 2006; and see p85). Overall, this interdependence may account for the observed

co-ordination of pili expression in individual cells.

d - The yeast prion PSI+

Prions are proteins that can adopt at least two distinct and stable conformational states,

one of which – the prion form – can stimulate the non-prion conformation to convert into the

prion form (Uptain and Lindquist, 2002). The yeast prion PSI+ is generated by the aggrega-

tion of the Sup35p translation termination factor which allows readthrough of nonsense

codons (Patino et al., 1996). The PSI+ prion is a metastable element that is generated and lost

spontaneously at low rates. Sup35 aggregates into amyloid fibers which leads to a range of

phenotype strengths that make characterization of switching rates difficult (Uptain and

Lindquist, 2002). By uncovering hidden genetic variations in 3’ untranslated regions, ca. 25%

of [PSI+] cells exhibit a survival advantage under adverse conditions (True et al., 2004).

The [PSI+] phenotype is maintained by the ability of prion fibers to convert native

Sup35p protein to the prion form (Satpute-Krishnan and Serio, 2005). Upon cell division, the

prion conformation is probably passed from the mother cell on to the daughter cell through

the cytoplasm (Uptain and Lindquist, 2002). Consequently, the [PSI+] state can be stably

maintained for approximately 10⁵ to 10⁷ generations (Lund and Cox, 1981). The protein chaperone

Hsp104 modulates the propagation of the [PSI+] state (Chernoff et al., 1995). During

heat and chemical stress it is observed that the [PSI+] phenotype is suppressed, presumably

due to increased Hsp104 activity that releases functional Sup35 from prion aggregates

(Eaglestone et al., 1999). In this case, stresses transiently decrease the readthrough phenotype

of [PSI+] cells, which diminishes the phenotypic variance instead of promoting it. In contrast,

it was recently reported that signal transducers and stress response genes are prominent factors

modulating the frequency of switching to the prion state. Particularly, stressful conditions

such as oxidative stress or high salt concentrations greatly increase the induction of [PSI+]

phenotypes (Tyedmers et al., 2008).




Natural selection acts on the external manifestation of the genetic information – the

phenotype. The availability of heritable phenotypic variations determines the response of a

population to selection. Phenotypic diversity is not only the consequence of differences in the

genetic makeup of individual organisms but also results from variations in gene expression

(Bennett and Hasty, 2007). This chapter discusses the relationship between physiological

regulation and the diverse mechanisms to generate genetic diversity that have been introduced

above. Overall, the multiplicity of processes involved in phenotypic plasticity provides a more

complete view of evolution than initially conceived (see Figure 26, p113). The concepts of

evolvability and robustness are overviewed in this framework.

Introduction – Phenotypic plasticity, genetic variations and physiological regulation


II.1. Genetic versus physiological changes

II.1.1. Cybernetic genomes

It has been observed for a long time that individual organisms are able to react and

adapt to their environment. C. Bernard (1813-1878) first defined the concept of homeostasis

to acknowledge the ability of higher metazoans to actively maintain the stability of certain

physiological parameters in the face of varying conditions, via the coordinated action of their organs.

Such processes guarantee the self-consistency of individual organisms and ensure a relative

independence from their immediate environment. The molecular mechanisms underlying

these physiological processes were first described by F. Jacob and J. Monod (1910-1976),

with their molecular studies on the lactose operon of E. coli (Müller-Hill, 1996).

Generally speaking, genetic information is encoded in the DNA sequences that form

the genome of living entities. These sequences constitute information because they are interpreted

by the molecular machinery present in the cell to construct the phenotype. To some extent,

proteins and nucleic acid molecules constitute the cell’s genetic hardware, while the

exact sequence of bases in DNA represents the genetic software – which is decoded by the

hardware. This computer analogy is, however, limited. In a computer the composition of the

hardware is fixed through time, and this ensures the unambiguous interpretation of the soft-

ware’s information according to a defined flow of execution. In a cell, the flow of execution is

not explicitly directed by the structure of the software. Instead, the execution of the software

determines the manufacture of hardware components which – in turn – can affect the interpre-

tation and structure of the software. In a first approximation, DNA sequences are accessible

continuously and the logical decisions directing the execution of this program depend on the

hardware available at a given time point.

In this view, the genetic program results from the interpretation of pure genetic infor-

mation by a dynamically changing molecular machinery. This machinery contains processing

elements (polymerases, ribosomes, tRNAs…) to express the information and controlling elements

(regulators, operators…) to regulate this phenotypic expression. While the processing

fraction is relatively stable, the composition of the controlling pool varies through time in re-

sponse to diverse stimuli. Regulation of gene expression can occur at different levels (i.e.

transcriptional, translational or posttranslational) through various logical mechanisms includ-

ing feedback loops (e.g. catabolite derepression during glucose limitation), signaling through

Genetic versus physiological changes – Individual and populational adaptation


coupled sensor-transducers (e.g. histidine protein kinases) and global responses to stress (e.g.

SOS signaling, see p47). Overall, this physiological regulation allows organisms to exhibit a

prescriptive phenotypic plasticity in response to changes (see below).

II.1.2. Individual and populational adaptation

Living organisms have two major strategies for adapting to environmental changes: i)

random and heritable phenotypic changes mediated by genetic mutation; and ii) programmed,

directed and transient changes in gene expression patterns which only alter the phenotype.

The first strategy relies on the blind force of natural selection iteratively operating in a popu-

lation made of different individual variants. The second requires a set of prescriptive re-

sponses, which allow cells to maximize their fitness in response to a wide range of situations

and in a controlled manner. Because the phenotypic characteristics acquired through such

physiological responses are usually not heritable (but see Bistable regulatory switch, p90),

this form of phenotypic plasticity only affects individual survival transiently, and therefore

has no long-term evolutionary impact. Nonetheless, regulatory responses result from sustained

selection for cells to maintain their homeostasis in the face of external and internal

environmental challenges.

Any mutation affecting a regulatory protein – or one of its cognate DNA binding sites

– can be selected based on its impact on fitness in the current environment. Yet, the effective

evolution of responsive regulation patterns is limited in several ways (Moxon et al., 2006): i)

the stressing conditions must not be too harsh for organisms to survive long enough to express

the response; ii) the environment must provide indicative cues, which can be sensed by cells

to elicit the response. Such cues have to be highly specific to avoid detrimental activation un-

der inappropriate environments. For some stressing conditions, such triggers may be very

scarce; iii) the frequency at which triggering conditions occur heavily constrains the evolution

of a dedicated response. Challenging conditions that are too rare cannot sustain the selective pressure

required to orient the mounting of a sophisticated response. Ideally, the environment should

cycle rapidly to ensure proper selection of advantageous responsive variants – though slowly

enough to avoid selection of constitutive regulation patterns (see Genetic assimilation of long

lasting regulation, p107); and iv) once functional, the sensing and operator systems are likely

to impose a substantial energetic cost on the organism. The maintenance of these systems de-

pends on a trade-off between their permanent cost and the occasional fitness advantages they

confer over time. The outcome of this trade-off ultimately relies on the rate at which appro-

priate environmental triggers occur, and rarely demanded systems are expected to be elimi-

nated by selection. Overall, the repertoire of programmed physiological responses resulting

from these constraints is necessarily limited.

Responsive regulation and natural selection are two processes with different scopes.

Mutations are the sole source of genetic innovations and can influence fitness in different

ways – including regulation – to cope with a very wide range of challenges. However, muta-

tions generally occur randomly in few individuals and are essentially destructive (Eyre-

Walker and Keightley, 2007; and see Figure 1, p24). Their occasional selection by the envi-

ronment slowly impacts the population, depending on the strength of selection. In sharp con-

trast, physiological regulation involves teleonomic systems that responsively orient

phenotypic modifications when exposed to known challenges. This process increases the

adaptedness of all individuals in the population at the same time. Selection is thus a general

and populational process operating on individuals over evolutionary time, while regulation is

a specialized and individual process that simultaneously affects the whole population. To

some extent epigenetic mechanisms form the middle way – blurring the difference between

simple regulation and heritable change. Aside from prions (see p98), epigenetic mechanisms

essentially affect regulation patterns. Owing to self-reinforcing behaviors mediated by posi-

tive-feedback loops, epigenetic phenomena stabilize particular physiological states so far as to

allow their inheritance (Casadesus and Low, 2006). This permits elements of the genetic

hardware that have been specifically expressed in response to stress to constitute historical in-

formation that can durably influence the interpretation of the genomic software. This hystere-

sis enables adaptation to occur on an intermediate timescale between regulation and selection

(Jablonka et al., 1995; Rando and Verstrepen, 2007).

II.1.3. Stochastic switches as a bet-hedging strategy

a - Contingency loci

Although most mutations are irreversible and random in time, space, and nature, sev-

eral mechanisms mediating high-frequency, stochastic, heritable, and reversible switching be-

tween well defined phenotypic states have been presented in detail earlier (see Programmed

generation of genetic variations, pp 60-99). In bacteria, the loci affected by these mechanisms

are often referred to as contingency loci and are involved in phase and/or antigenic variations.

These phenomena are not restricted to bacteria and have also been described in eukaryotic

Genetic versus physiological changes – Stochastic switches as a bet-hedging strategy


microbes (yeast, trypanosomes…), as well as in higher metazoans (e.g. immune system and

SSR loci, p64). Contingency loci are hypermutable compared to the genomic background but

channel variations toward specific phenotypes. This generally involves the ON-OFF switch-

ing of a gene’s expression state. Combinatorial variations allow a clonal population to rapidly

diversify into phenotypically distinct subpopulations – which remain almost unaltered at the

genetic level. The actual switching rate and the population size determine the extent of diversification

(De Bolle et al., 2000). Accurate estimations of switching rates in the absence of selection

are often difficult to achieve experimentally. In bacteria, they are commonly found on

the order of one switch in every 10³–10⁵ generations, though rates as high as 10⁻¹ have been reported

in some systems (van der Woude and Baumler, 2004). In most cases, contingency loci

are modifiers of their own mutation rates and are thus hardly affected by recombination, in contrast

to general mutators (see p42).
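
To make the combinatorial argument concrete, the following sketch computes how a clonal, all-OFF population is expected to spread over the 2ⁿ possible ON/OFF combinations of n contingency loci. It assumes, purely for illustration, independent loci, identical and symmetric switching rates, no selection and an effectively infinite population; the function name and the numerical values are hypothetical.

```python
from itertools import product


def phenotype_frequencies(n_loci, rate, generations):
    """Expected frequency of each ON/OFF combination, starting all-OFF.

    For symmetric switching, p_on(t+1) = rate + p_on(t) * (1 - 2 * rate),
    which gives p_on(t) = 0.5 * (1 - (1 - 2 * rate) ** t) from p_on(0) = 0.
    """
    p_on = 0.5 * (1.0 - (1.0 - 2.0 * rate) ** generations)
    freqs = {}
    for states in product((0, 1), repeat=n_loci):
        n_on = sum(states)
        freqs[states] = p_on ** n_on * (1.0 - p_on) ** (n_loci - n_on)
    return freqs


freqs = phenotype_frequencies(n_loci=3, rate=1e-3, generations=100)
for states, f in sorted(freqs.items()):
    print(states, f"{f:.4f}")
```

Multiplying these expected frequencies by the population size gives a rough idea of which combinations are likely to be represented at all, which is the sense in which switching rate and population size jointly set the extent of diversification.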

What are the advantages of pre-diversifying a population at such contingency loci?

Broadly speaking, evolution occurs through the selection and rise in frequency of fitter vari-

ants in a population. As most mutations are deleterious, mutation rates are kept as low as pos-

sible on a genomic scale to ensure the maintenance of genetic characteristics on the short term

(see p39). In these conditions, a population may not be variable enough to efficiently respond

to selection. Recurrent selective pressures exerted on a given trait can drive the development

of contingency mechanisms targeted to loci at which increased rates of specific mutations

prove beneficial to the organism. In this light, the subpopulations resulting from the combina-

torial variations of several contingency loci are poised to adapt to a variety of sudden – but

expectable – environmental changes. In contrast to the loss of genetic information that follows

the take over of a population by one rare advantageous variant (selective sweep), a subpopula-

tion selected on the basis of contingency loci will be able to rediversify efficiently – owing to

the reversibility of these specific types of genetic mutations. This kind of genetic plasticity

underlies a phenotypic plasticity that is close to the one afforded by physiological regulation.

b - Bet-hedging

The idea that stochastic phenotypic switching can be advantageous in fluctuating envi-

ronments accords well with theoretical studies in evolutionary ecology – and is often referred

to as bet-hedging. In ecology, this strategy implies the diversification of life-history traits ex-

perienced by the progeny of an individual. Classical examples include variable maturation

rates in insects or germination events in plant seeds. Formally, bet-hedging refers to a risk-

spreading strategy which favors genotypes with lower variance in fitness at the cost of lower

Introduction – Phenotypic plasticity, genetic variations and physiological regulation


mean fitness (Hopper, 1999). The lower variance in fitness reflects the fact that diversified

subpopulations are potentially able to cope with a larger panel of environments. In a given

environment however, only a subpart of the population may be well adapted, resulting in

lower mean fitness. Strictly speaking, this strategy supposes that the phenotypic variability is

expressed from a constant genotype. Indeed, phenotypic diversification can be achieved on

purely regulatory grounds in multistable systems (Dubnau and Losick, 2006; Veening et al.,

2008a; and see p79). Contingency loci are close to this ideal because the mutations underlying

phenotypic changes are reversible. Interestingly, such a genetic determinism may be a benefi-

cial feature because it straightforwardly enables a heritable cellular memory of past environ-

mental conditions (Jablonka et al., 1995; Lachmann and Jablonka, 1996). At a given

generation, a population will be essentially biased toward the expression of traits that just

proved advantageous for the parental generation – instead of being randomly drawn from the

whole set of potentially adaptive variation. Such hysteresis is less reliably implemented using

regulatory networks, which are more sensitive to molecular noise (Lim and van Oudenaarden,


What is the optimal switching frequency in these systems? Obviously, excessively high

frequencies would unduly decrease the mean fitness, while overly rare switches would

negatively impact the fitness variance. Both modeling (Lachmann and Jablonka, 1996; Kussell et

al., 2005; Wolf et al., 2005) and experimental (Acar et al., 2008) studies suggested that sto-

chastic and heritable phenotypic switches would be beneficial when the environment

fluctuates randomly over timescales that roughly match the phenotypic switching rate.

Interestingly, this optimum frequency would be the one observed at the population level if the

switch were controlled by responsive regulation. Another example corroborating these findings

will be presented in the results section (see Evolution of recombination rate in integrons,
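
The trade-off between overly frequent and overly rare switching can be sketched numerically. The model below is a deliberate simplification and not the one used in the cited studies: two phenotypes, an environment that alternates periodically (rather than randomly) between two states every `epoch_length` generations, symmetric switching, and arbitrary fitness values. All names and parameter values are assumptions chosen for illustration.

```python
import math


def long_run_growth_rate(switch_rate, epoch_length=20, w_match=2.0,
                         w_mismatch=0.5, n_epochs=200):
    """Average per-generation log growth of a two-phenotype population.

    The environment alternates between states 0 and 1; phenotype p has
    fitness w_match when p equals the current state and w_mismatch otherwise.
    After selection, each cell switches phenotype with probability switch_rate.
    """
    fractions = [0.5, 0.5]
    log_growth = 0.0
    generations = 0
    for epoch in range(n_epochs):
        env = epoch % 2
        for _ in range(epoch_length):
            # Selection step: weight each phenotype by its fitness.
            grown = [fractions[p] * (w_match if p == env else w_mismatch)
                     for p in (0, 1)]
            # Stochastic switching, applied deterministically to fractions.
            switched = [
                grown[0] * (1 - switch_rate) + grown[1] * switch_rate,
                grown[1] * (1 - switch_rate) + grown[0] * switch_rate,
            ]
            total = switched[0] + switched[1]
            log_growth += math.log(total)
            fractions = [switched[0] / total, switched[1] / total]
            generations += 1
    return log_growth / generations


if __name__ == "__main__":
    for rate in (1e-6, 1e-3, 0.05, 0.2, 0.5):
        print(rate, long_run_growth_rate(rate))
```

With these illustrative parameters, a switching rate close to the inverse of the epoch length outperforms both a very rare and a very frequent switch, in line with the qualitative conclusion of the studies cited above.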


c - Genetic switches as crude regulatory controls

Most genetic switches impact the expression of an existing trait rather than generating

functional novelty. On this ground, the phenotypic impact of these systems is comparable to

physiological regulation. Why, then, are some phenotypic traits controlled through a

pre-emptive bet-hedging strategy rather than a prescriptive regulation pattern? As highlighted in

the previous section, several factors constrain the evolution of responsive regulation (see

p101). To some extent, the bet-hedging strategy provides an alternative to overcome these

constraints: i) while the time delay between sensing and phenotypic change may not be short

Genetic versus physiological changes – Stochastic switches as a bet-hedging strategy


enough to ensure effective response, the pre-existence of substantial variability guarantees the

immediate survival of a subpopulation upon sudden and instantly lethal challenges (Borst and

Greaves, 1987); ii) a bet-hedging strategy does not require sensing of specific cues and is

probably well suited to cope with a wide range of situations (Wolf et al., 2005); iii)

contingency mechanisms are less costly in terms of molecular machinery and some are easy to set up.

This is particularly true for SSR and gene conversion, which mostly subvert existing

mechanisms of repair. Supporting this idea, it has been noted that regulation by contingency loci is

particularly prevalent in bacteria with smaller genome sizes (Moxon et al., 2006). In addition,

some sophisticated contingency mechanisms can produce phenotypes that are difficult or im-

possible to achieve through physiological regulation – such as coordinating mutually exclu-

sive expression patterns and creating new combinatorial diversity through rearrangements.

d - Link with the lifestyle of organisms

Most examples of stochastic genetic switches are found in microbes, probably because

their biology is particularly well suited to this evolutionary strategy. Microbes tend to rapidly

establish clonal populations with large effective population sizes. In this context, subpopula-

tions differing at their contingency loci can readily differentiate – and heritable switches can

efficiently substitute for regulation. Microorganisms also undergo severe and extremely vari-

able selective pressures. On a macroscopic scale, microbial cells can be considered as sessile

organisms and are often subjected to rapid environmental change without any means of es-

cape (Andrews, 1998). They routinely experience rapid changes in nutrient levels, osmolarity

and exposure to toxic compounds. In particular, pathogens must face a continuous and

dynamic battle against immune defenses of their hosts. Accordingly, most of the described

variability is in the cell surface and many contingency loci have been implicated in immune

evasion, or possibly in colonization of different host niches (van der Woude, 2006).

Generally, multicellular organisms form smaller populations. Their evolution is then

more influenced by genetic drift, and contingency loci cannot reliably generate a diversified

panel of phenotypes. Furthermore, long generation times rule out genetic switching as an ef-

fective response to rapidly fluctuating conditions, and physiological regulation is a more

appropriate alternative in this context. Besides, cells in multicellular organisms are less

exposed to environmental changes. Remarkably, the immune system of higher eukaryotes

relies on several diversity-generating mechanisms. Although this constitutes the other side of

the arms race with pathogens, the evolutionary implications are different because somatic

changes are not transmissible to the next generation. Most examples of stochastic genetic

switching described in multicellular organisms essentially rely on SSR, the simplest contin-

gency mechanism (see p64). The general significance of these mechanisms in multicellular

organisms may become apparent with further studies (Fondon and Garner, 2004).

II.2. Links between genetic changes and regulation

II.2.1. Impact of expression strength on sequence evolution

Genes are selected when their expression affects the fitness of the organism. Genes

whose products are at the interface with the external environment are often subjected to diver-

sifying selection at specific positions. In contrast, genes that are essential to the functioning of

the organisms undergo strong and constant purifying selection. Following S. Wright's

(1889-1988) metaphor of the adaptive landscape (Wright, 1931), continuous exposure to directional

selective pressures traps functional genes at local fitness maxima and heavily constrains

effective exploration of the surrounding landscapes (Weinreich et al., 2006).

The evolution of essential and highly expressed genes is particularly constrained. In-

deed, expression strength seems to be the most important determinant of the protein evolution

rate, with the more expressed genes undergoing fewer non-synonymous changes in a wide

range of organisms (Pal et al., 2001; Rocha and Danchin, 2004; Subramanian and Kumar,

2004; Lemos et al., 2005). Although essential genes tend to display higher expression levels,

the importance of expression strength may be decoupled from the functional activity of the

protein and probably involves a selective pressure to limit translation errors (Drummond and

Wilke, 2008). In contrast, accessory genes whose expression infrequently impacts the

phenotype, as well as genes that have been subjected to recent duplication, undergo relaxed selective

pressures. Even if they do not participate in fitness in a given environment, the evolution of

constitutively expressed genes is constrained because alterations of their product can interfere

with the normal functioning of the cell (negative epistasis). Genes that are silenced through

physiological repression or genetic switch are not subject to any selective pressure.

In a context of relaxed selection, mutations can neutrally accumulate in the population

and evolution is mainly driven by genetic drift. While this introduces a substantial amount of

deleterious mutations and may eventually lead to pseudogenisation, it also provides a favorable

opportunity to generate diversity, and relieve adaptive constraints. This is illustrated by the

frequently observed sub- or neo-functionalization of duplicated genes (Conant and Wolfe,

Links between genetic changes and regulation –


2008b). Mimicking extended periods of neutral drift recently proved to be an efficient strat-

egy to evolve proteins experimentally (Gupta and Tawfik, 2008). This issue is further dis-

cussed in my work on directed evolution of proteins (see Results – Intrinsic evolutionary

potential of genes, p212) – and its relevance to the functioning of integrons will be addressed

later (see Discussion – Increased rate of cassette evolution, p237).

II.2.2. Genetic assimilation of long-lasting regulation

What might happen when sustained environmental conditions induce a physiological

response for a long time? This type of question addressing the connection between short-term

and long-term phenotypic variation was raised by J.M. Baldwin (1861-1934) to deal with be-

havioral traits (Baldwin, 1896). The term Baldwin effect is now used to refer to a scenario in

which a phenotypic change occurring in an organism as a result of its interaction with its envi-

ronment becomes gradually assimilated into its genetic or epigenetic repertoire (Simpson,

1953). This concept was further developed by I.I. Schmalhausen (1884-1963) (Levit et al.,

2006) and C.H. Waddington (1905-1975) (Waddington, 1953). Waddington provided

experimental support for this idea. He evoked phenotypic changes in Drosophila by exposure to

ether, heat, or salt treatment and obtained flies that heritably exhibited new phenotypes in the

absence of treatment after a few generations under selection. More recently, compromising the

activity of the heat-shock protein Hsp90 with environmental stresses, such as temperature,

was shown to increase the phenotypic variation in D. melanogaster (Rutherford and

Lindquist, 1998) and A. thaliana (Queitsch et al., 2002). These phenotypes could be subjected

to selection and stabilized so that their appearance was no longer dependent on the triggering

conditions, thereby demonstrating that Hsp90 is implicated in uncovering cryptic genetic – or

epigenetic (Sollars et al., 2003) – variations (see below pp 115 and 116). These results suggest

that environmentally evoked phenotypic changes can be regarded as new internal environ-

ments to which organisms can adapt genetically – provided they are maintained long enough.

II.2.3. Evolution of regulatory patterns

a - Regulatory networks as evolutionary targets

The importance of regulatory variation in evolution has recently been emphasized

(Gerhart and Kirschner, 2007). Indeed, rewiring of existing genetic components can easily

promote the emergence of sophisticated and advantageous phenotypes. To some extent, bio-

logical entities at different levels of organization can be described as arrangements of largely

independent modules. For instance, the diversity of proteins seemingly arose from the

combination of a limited set of functional domains (Caetano-Anollés et al., 2009). In the context of

regulatory networks, modularity refers to a pattern of connectedness in which genes and their

products are grouped into highly connected subsets – i.e. modules – which perform integrated

functions and are more loosely connected to other such groups (Wagner et al., 2007). The

practical idea behind this abstract concept is that modules form functional bricks that – once

evolved on one occasion – can be reused by evolution in the construction of more and more

complex structures. Although the notion of modular organization is now widely accepted in biology,

it remains unclear whether consistent modules are assembled by natural selection or emerge as

side effects of other processes (Wagner et al., 2007).

The rearrangement of regulatory networks is particularly flexible because the links be-

tween their constituting elements are weak. Two properties have been put forward to highlight

the impact of such weak regulatory linkages on evolution (Gerhart and Kirschner, 2007): i)

the signal input and response output interact indirectly through an intermediate agency, and

hence do not require stereochemical complementarity to each other; and ii) the output can be

much more complex than the regulatory input because it may be produced by a module that

has been previously built by natural selection – independent of the nature of the signal. The

regulatory input and functional output being decoupled, they need not coevolve. Instead, regu-

latory signals are selectable just for their regulatory value, without regard to their chemical re-

lationship to the response or to their intrinsic instructive capacity. In this light, functional

modules could be rewired and coupled with another module or environmental signals with

only few actual mutations. This process may greatly facilitate the contingent emergence of

complex phenotypic variations over evolutionary time.

The evolutionary implications of weak regulatory linkage were first perceived on very

simple systems: allosteric proteins (Monod, 1970). These proteins comprise both regulatory

and productive activities and thus constitute integrated systems by themselves. They are able to

switch between ON and OFF states of activity – the OFF state being usually preferred

intrinsically. Regulatory agents operate a state selection simply by binding preferentially to one

or the other conformation. Any regulator stabilizing the ON state is an activator, while any fa-

voring the OFF state is an inhibitor. The activity and inactivity states are built into the protein

and regulators only influence the conformation of the protein. Consequently, the actual

function of the enzyme and its regulation are partly decoupled. The segment

mediating interaction with regulators can evolve largely independently of the functional sites.

This property readily allows diversification of an ancestral allosteric protein, with variants ei-

ther deriving new activities in response to the same regulatory inputs or displaying the same

activity in response to new triggers. Other examples of weak regulatory linkage involve more

complex modules at higher levels of organization (see Gerhart and Kirschner, 2007 and

references therein). The incorporation of the integron system into the SOS regulon presented in the

Results section also provides a striking illustration of how complex phenotypes can emerge

from the association of two preexisting modules (see p171).

b - Switching regulation patterns

Although contingency loci only affect their own single locus, their variations can have

far-reaching phenotypic consequences. Indeed, the expression of several regulatory proteins

is known to phase-vary (van der Woude and Baumler, 2004; van der Woude, 2006). The

expression state of all genes in the corresponding regulons is indirectly affected by such

switches. Documented examples of global regulators include Mga, a virulence-associated

regulatory protein in S. pyogenes (Bormann and Cleary, 1997); BvgS, a member of the global

two-component BvgAS regulatory system in Bordetella pertussis (Mattoo et al., 2001); and

possibly the type III restriction-modification system coded by the mod gene in H. influenzae

(Srikhanta et al., 2005). More local regulators can be involved in the coordination of specific

phenotypes – as exemplified by fimbriae, whereby cross-talk exists between fim and

homologues of the pap operon (van der Woude, 2006). Besides, the contribution of contingency

loci to the diversification of regulatory patterns can be more subtle. For instance, the binding

affinity of a transcription factor is modulated by a mononucleotide tract in the operator site of

an adhesin-encoding gene in N. meningitidis (Martin et al., 2005). Thus, mechanisms

facilitating mutations can further promote the evolution of weak regulatory linkage.

II.2.4. Physiological regulation of mutagenesis

a - Stress-induced mutagenesis can be spatially confined

Several phenomena suggesting a profound connection between major stress responses

and increased mutagenesis have been presented earlier (see Stress-induced mutagenesis, p47).

Conceptually similar genome-wide instabilities induced by a variety of stressing conditions

have been experimentally reported in various bacteria, yeast and human cancer cells. For a

comprehensive review, see Galhardo et al., 2007. There is no well defined or universal mo-

lecular mechanism responsible for stress-inducible mutagenesis, but rather a collection of in-

terconnected processes and recurring themes. Global stress responses coordinate the

expression of proteins that alleviate the deleterious effects of inadequate environmental condi-

tions on the organism. Because genomic alterations heavily compromise survival, it is not

surprising that mechanisms involved in DNA repair are incorporated in these responses.

However, the processes ensuring genome maintenance are not error free, and increased

activity in times of stress incidentally promotes the fixation of mutations. By avoiding the constant

production of deleterious mutations, such phenomena promote the evolvability of populations

when it is most needed (see below Evolvability, p111).

Stress-induced mutagenesis essentially bestows a temporal control over diversification

on a genome-wide scale. However, spatial targeting of mutations may be superimposed in

some instances. In E. coli, the coincident induction of the RpoS and SOS responses defines a

specific hypermutable state, whereby the relatively error-safe repair of DSBs is switched to a

DinB-dependent highly mutagenic process (Ponder et al., 2005). Because the repair of DSBs

is restricted to the region surrounding the lesion, such a hypermutagenic switch would actually

result in a targeted increase in mutations. The location of DSBs being random, the whole

genome would be collectively affected at the population level. But only regions near DSBs

would be mutagenized in individual cells, thereby limiting the overall mutational load

(Galhardo et al., 2007).

b - Targeted mutagenesis can be regulated

Contingency loci readily achieve spatial confinement of mutation rates – enabling a

genetic bet-hedging strategy (see p103). Nevertheless, in a number of cases environmental

signals and intercellular regulatory networks can be integrated with – or superimposed on –

some switching mechanisms (van der Woude, 2006). The most straightforward examples con-

Evolvability and robustness – Evolvability


cern those contingency mechanisms that are closely dependent on DNA repair processes. For

instance, mutability at SSR loci is influenced by Pol IV and the MMR machinery (see

Extrinsic influences affecting SSR mutation rates, p70) – whereas gene conversion events

depend on the recombination apparatus. Any stress affecting these repair mechanisms can

impact the variation rate at such contingency loci. Natural populations of N. meningitidis and H.

influenzae contain a significant proportion of mutL general mutators. As both bacteria encode

a large number of SSR contingency loci, there might be a synergistic effect between increased

bet-hedging and impaired MMR (Denamur and Matic, 2006; Moxon et al., 2006).

The most striking examples of complex coupling between contingency mechanisms and

physiological regulation concern the promoter inversion in the fim system and the epigenetic

switch at the pap operon. The mechanisms involved in their regulation have been detailed

previously (see pp 87 and 98, respectively). In both systems, the switching frequencies are

regulated by a battery of cellular factors that are likely to provide integrated cues with regard

to the environment of the cell. Because switching events remain essentially stochastic, the

overall strategy still corresponds to bet-hedging. Nonetheless, these systems gain information

so as to tune their bet to fit specific environments. These examples are among the most docu-

mented illustrations of genuine directed mutations resulting in inheritance of the acquired

characters. One can appreciate the extent to which the apparent teleological outcome reflects

the combined action of chance with a sophisticated apparatus teleonomically designed to

produce variability (see Appendix, p253).

II.3. Evolvability and robustness

II.3.1. Evolvability

A classical result of population genetics is that the rate of adaptation in a population is

proportional to the genetic variance in fitness in this population (Fisher, 1930). However, the

entities under selection are not genotypes but their physical embodiment – phenotypes. The

parameter that really constrains the rate of adaptation is thus the phenotypic variance – and

specifically the availability of heritable phenotypic diversity in the population. In this context,

the notion of evolvability refers to the capacity to evolve through the generation of heritable

variations in fitness. Two distinct positions concerning evolvability can be contrasted in the

literature (Sniegowski and Murphy, 2006): i) organisms evolved the capacity to evolve – i.e.

evolvability is an adaptation resulting from second-order selection; and ii) evolvability is a

by-product of other mechanisms that evolved to directly benefit the organisms (first-order selection).
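
The classical result recalled at the beginning of this section (Fisher, 1930) can be checked numerically in its simplest setting. The sketch below assumes a clonal (asexual) population with discrete generations and fixed fitness values per type, a case in which the gain in mean fitness after one round of selection equals the variance in fitness divided by the mean fitness. Fisher's theorem itself is more general and concerns additive genetic variance; the function name and the example values are illustrative assumptions.

```python
def mean_fitness_gain(frequencies, fitnesses):
    """One generation of clonal selection.

    Returns a pair: (observed gain in mean fitness, variance / mean fitness).
    Under the stated assumptions the two values coincide.
    """
    mean_w = sum(p * w for p, w in zip(frequencies, fitnesses))
    var_w = sum(p * (w - mean_w) ** 2 for p, w in zip(frequencies, fitnesses))
    # Selection reweights each type by its relative fitness.
    new_frequencies = [p * w / mean_w for p, w in zip(frequencies, fitnesses)]
    new_mean_w = sum(p * w for p, w in zip(new_frequencies, fitnesses))
    return new_mean_w - mean_w, var_w / mean_w


if __name__ == "__main__":
    print(mean_fitness_gain([0.7, 0.2, 0.1], [1.0, 1.1, 1.3]))
    print(mean_fitness_gain([0.5, 0.5], [1.0, 1.0]))  # no variance, no gain
```

A population without heritable variance in fitness does not adapt in this model, whatever its mean fitness, which is the point the text makes when it shifts the emphasis to the availability of heritable phenotypic diversity.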


The first point may appear teleological. Indeed, natural selection cannot adapt a popu-

lation for future contingencies any more than an effect can precede its cause (Dickinson and

Seger, 1999). As discussed previously, the generation of diversity is strongly promoted under

fluctuating selection (see p42). Even if such conditions are met, the concept of evolvability as

an adaptation implicitly involves selection among populations rather than individuals in this

particular context. Indeed, the average fitness effect of mutations strongly influences the

optimal mutation rate (Orr, 2000). As most random mutations are deleterious, a modifier driving

increased mutation rates would generally be counter-selected at the individual level, even

though it would increase the evolvability of the population. Mutator alleles impose a heavy

deleterious burden on the population over time, and continuous rise in global mutagenesis is

not a sustainable strategy (see p44). Because the average fitness effect actually depends on the

adaptedness of the population (Eyre-Walker and Keightley, 2007), increased mutation rates

are arguably more advantageous in adverse conditions. To some extent, mechanisms of

stress-induced mutagenesis limit the production of mutations to such situations. However, the

mutability they afford often arises as an opportune side effect of processes that evolved for more

immediate functions (Redfield, 2001; Matic et al., 2004). In these cases, there is no ground to

support the first hypothesis rather than the more parsimonious idea that evolvability is a by-

product of simpler functions subjected to first-order selection (Tenaillon et al., 2004; Gal-

hardo et al., 2007).

Nonetheless, the mechanisms underlying phenotypic variations are not limited to

increases in genome-wide mutagenesis. Several mechanisms capable of generating stable

phenotypic variations have been presented hitherto. Most of these variations being teleonomically

confined to restricted areas of the phenotypic space, such processes limit the production of

deleterious mutations that hampers the evolutionary success of global modifiers. While some

of these mechanisms subvert other cellular functions (e.g. SSR and gene conversion), they

cannot be considered as mere by-products – but are better described as specialized mutators.

Hence, all kinds of contingency loci rather support the “evolvability-as-adaptation” hypothe-

sis (Sniegowski and Murphy, 2006). Because independent modules can be easily reused (see

above, p108), the modularity of biological entities is often presented as a factor promoting

evolvability. In this light, evolution appears as a self-promoting and explosive process. The

concept of modularity being fairly young in biological science, it remains unclear if this prop-

erty is selected to favor evolutionary tinkering – or if it arises as a by-product of other con-

straints (Wagner et al., 2007).

Overall, the combination of global processes with a range of specialized and targeted

mechanisms ensures adaptability to a wide panel of situations (see Figure 26). Although es-

sentially deleterious, genome-wide mutagenesis is essential in producing unbiased genetic in-

novations. This process is particularly inefficient and slow because it acts blindly at the

sequence level. In contrast, physiological regulation and contingency loci are programmed by

natural selection to rapidly and/or responsively create functional phenotypic variability – but

not true novelty. Besides, elements facilitating the exchange and acquisition of DNA

sequences enable the instantaneous gain or loss of whole functions. Thus, not only do different

processes reach different phenotypic scopes, but they also operate on different timescales,

thereby increasing the flexibility of the organism’s response to change (Jablonka et al., 1995;

Rando and Verstrepen, 2007).

II.3.2. Robustness

Some phenotypes are inherently less sensitive to perturbations. This property is often

referred to as robustness (de Visser et al., 2003) or canalization (Waddington, 1942b). Pheno-

typic robustness is likely to be a beneficial feature because adapted traits must be kept stable

in order to ensure the maintenance of fitness over time. Robustness appears at various levels

of biological organization, including gene expression; protein folding; metabolic flux; physio-

logical homeostasis; development and – ultimately – organism fitness. The mechanisms un-

derlying robustness are diverse – ranging from thermodynamic stability at the RNA and

protein level to behavior at the organism level. Phenotypes can be robust either against herita-

ble perturbations (e.g., mutations) or non-heritable perturbations (for instance noise or envi-

ronmental variations). These phenomena are referred to as genetic and environmental

robustness, respectively (Elena et al., 2006). Moreover, robustness can be an intrinsic property

of a phenotype or be extrinsically mediated by a dedicated mechanism.

The robustness of a trait faced with heritable perturbations depends on its so-called target

size. The more proteins are involved in the correct expression of this trait, the higher the odds

for the trait to be altered by mutation. Practically, the target size is difficult to estimate be-

cause it relies on a multiplicity of parameters such as the overall sequence size, the sequence

composition and the tolerance of each protein to mutations. The structure of the genetic code

illustrates a straightforward mechanism to modulate the intrinsic target size of a gene. Each

group of 6-fold degenerate codons (corresponding to leucine, arginine and serine) can be de-

composed into two groups of 4 and 2 codons, wherein only the last base of the codon varies.

In the smaller group, two of the three possible substitutions at the third position of a codon result in

amino-acid changes at the protein level. In contrast, any such mutation affecting a codon of

the larger group leads to a synonymous codon, and hence would not impact fitness (but see

Results – Intrinsic evolutionary potential of genes, p212). Then, using a codon belonging to

the larger group decreases the mutational target size of the gene – thereby supporting purely

genetic robustness. It has been claimed that this property – termed codon volatility – could be

used to detect selection based on a single sequence (Plotkin and Dushoff, 2003; Plotkin et al.,

2004). While this possibility is unlikely (Dagan and Graur, 2005; Sharp, 2005; Plotkin et al.,

2006), it remains that low volatility could be selected to minimize error during translation

(Archetti, 2006). The redundancy of the genetic code can also be exploited to avoid the occurrence of SSRs,

thereby preventing increases in local mutation rates when robustness is desirable (Wanner et al.,

2008; Ackermann and Chao, 2006). Besides, large populations of proteins subjected to intense

Evolvability and robustness –


neutral drift under purifying selection and high mutation rates are enriched in stable variants

(Bloom et al., 2007; Bershtein et al., 2008), demonstrating that intrinsic robustness can be ef-

fectively selected experimentally. Another factor influencing the target size is functional re-

dundancy arising from gene duplication, polyploidy and alternative metabolic pathways.
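
The decomposition of six-fold degenerate codons described above can be verified directly against the standard genetic code. The sketch below counts, for each leucine, arginine and serine codon, how many of the three possible third-position substitutions are synonymous. It weights all substitutions equally, which is an assumption: real mutation spectra are biased (for instance toward transitions), so actual proportions of synonymous mutations may differ. The identifiers are illustrative.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order at each position.
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def synonymous_third_position_changes(codon):
    """Number of the 3 possible third-base substitutions that keep the amino acid."""
    amino_acid = CODON_TABLE[codon]
    return sum(
        CODON_TABLE[codon[:2] + base] == amino_acid
        for base in BASES
        if base != codon[2]
    )


if __name__ == "__main__":
    for amino_acid in "LRS":  # the three six-fold degenerate amino acids
        for codon, encoded in sorted(CODON_TABLE.items()):
            if encoded == amino_acid:
                print(amino_acid, codon, synonymous_third_position_changes(codon))
```

Under this uniform count, every codon of the four-codon blocks tolerates all three third-position substitutions, whereas codons of the two-codon blocks tolerate only one of them.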

Mechanisms mediating extrinsic robustness are best illustrated by chaperones – a spe-

cific subset of the heat-shock proteins (Hartl et al., 1994). Chaperones belong to structurally

unrelated protein families that share the ability to recognize and bind to aberrant protein con-

formations. Under normal physiological conditions, chaperones are involved in guiding pro-

tein folding during translation; assisting the entrance of newly synthesized polypeptides into

organelles; or facilitating the building of multimeric complexes. Under stressful conditions,

chaperones prevent misfolding and aggregation – or can even actively disaggregate damaged

proteins and restore their proper conformation. Some mutations that would normally affect the

folding or stability of the protein can be rescued by chaperones, which thus canalize the phe-

notypic expression of genetic mutations. In E. coli, the overproduction of the chaperone

GroEL is able to partly rescue fitness losses resulting from intense genetic drift (Fares et al.,

2002). In D. melanogaster (Rutherford and Lindquist, 1998) and A. thaliana (Queitsch et al.,

2002), the Hsp90 chaperone is also specifically involved in stabilizing several signaling

pathways. Whether extrinsic robustness is an adaptation or a contingent phenomenon will be

discussed below (see p116). Chaperone genes are upregulated in response to various stresses.

Among different species, the thresholds for expression are correlated with the levels of stress

that they naturally undergo (Feder and Hofmann, 1999). By rescuing damaged proteins,

chaperones mediate robustness to non-heritable environmental perturbations.

It is likely that environmental and genetic robustness are linked. Notably, RNA sequences

that have been evolved in silico for the ability to fold into a given structure

across a wide range of temperatures are also less prone to structural change as a consequence

of mutations. Interestingly, the selection for increased stability also promoted the emergence

of modularity within the RNA molecules (Ancel and Fontana, 2000). In this particular case,

genetic robustness incidentally arose as a consequence of environmental robustness. The

aforementioned selection of proteins with increased stability exemplifies the converse relationship

(Bloom et al., 2007; Bershtein et al., 2008) – wherein increased stability is selected to

favor genetic robustness to high mutation load, but also favors tolerance to a wider range of


Introduction – Phenotypic plasticity, genetic variations and physiological regulation


II.3.3. Links between robustness and evolvability

While evolvability promotes the adaptability of populations faced with new environments,

robustness stabilizes phenotypes in the face of adverse conditions. Although these two

processes seemingly serve opposite functions, their relationship is often ambiguous.

Indeed, mechanisms that generally promote robustness can also favor evolvability on

occasion, and vice versa.

The buffering of phenotypic variations enlarges the neutral space accessible to a popu-

lation and thereby also widens the exploration of the adaptive landscape. By supporting the

diversification of a population over time, robustness can thus increase its evolvability. This

process cannot occur if the tolerance to mutation is purely grounded on genotypic robustness

– as is the case with codon volatility (see above p114). In silico evolution of RNA molecules

demonstrated that selection for genotypic robustness effectively antagonizes evolvability,

while selection for general phenotypic robustness increased evolvability (Wagner, 2008). In a

population experimentally subjected to directional selection, a mutation promoting the intrinsic

stability of the target protein was shown to increase the fraction of advantageous amino

acid changes (Bloom et al., 2006). In this case, intrinsic robustness promoted evolvability be-

cause thermodynamic stability affects the phenotype of the gene – not its genotype.

The proteins involved in extrinsic robustness are often referred to as evolutionary capacitors.

Their alteration can release phenotypic canalization – uncovering hitherto hidden

variations. In the case of Hsp90, inactivation of the protein; modulation of its expression level

in response to environmental cues; or diversion from its usual targets through stress-mediated

saturation effectively lead to phenotypic diversification in D. melanogaster (Rutherford and

Lindquist, 1998) and A. thaliana (Queitsch et al., 2002). The exact mechanisms underlying

these changes are unknown, but they can be maintained independently of Hsp90 after selection

for a few generations – thereby increasing evolvability (see Genetic assimilation of

long lasting regulation, p107). Interestingly, expression of the GroES/EL chaperone is under

the control of a DNA inversion switch in B. fragilis (Kuwahara et al., 2004; and see p83).

This mechanism may promote the stochastic or controlled release of hidden phenotypic variation.


The hide-and-release of genetic variations is probably an intrinsic property of epistatic

genetic systems subjected to environmental interactions (Hermisson and Wagner, 2004;

Zhang, 2008). Computer simulations showed that inactivating a gene in a regulatory network

that has previously been evolved for phenotypic stability increases the phenotypic variance

and speed of adaptation toward a new optimal phenotype (Bergman and Siegal, 2003). In S.

cerevisiae, many single gene deletions increase the phenotypic variance of the cells, and can

thus act as evolutionary capacitors. These genes tend to be less dispensable with respect to

growth rate and are highly connected within cellular networks (Levy and Siegal, 2008). In

these reports, robustness is inferred from the phenotypic effects induced by genetic alterations

rather than from actual tolerance to perturbations. To a large extent, the phenomenon of ca-

pacitation can be regarded as an emergent feature of complex biological systems – and not

necessarily as an adaptation for canalization nor a consequence of robustness. The perceived

role of Hsp90 in extrinsic robustness could therefore simply reflect its central position in cel-

lular networks. Nonetheless, it has been proposed that selection for robustness can affect the

topology of interaction networks. The effect of a mutation increases with the number of char-

acters it affects – its pleiotropy. By reducing the number of negative pleiotropic effects per

mutation, selection for robustness can limit the interaction to a restricted set of elements –

thereby defining a module (Cooper et al., 2007b; Wagner et al., 2007). In this light, modular-

ity would be the consequence of selection for robustness. Yet modularity probably favors

evolvability on a larger timescale.

To some extent, processes that are evolvable at a given level of organization may

guarantee robustness higher in the hierarchy. For instance, the diversification of outer

structures through contingency loci ensures the robustness of microbial infections within their

hosts. In this regard, RNA viruses constitute an extreme case. These entities exhibit the highest

mutation rates monitored across different taxa, which underlies their evolvability (see Figure

2, p32). The median selection coefficient against single mutations across different RNA viruses

has been estimated at ca. 10.8% per generation – a figure that contrasts sharply with the

ca. 1.7% measured in DNA organisms (Elena et al., 2006). Thus, RNA viruses are poorly ro-

bust at the genetic level. Now, the efficiency of natural selection increases with population

size and selective coefficient. These two parameters probably act in positive synergy within

populations of RNA viruses – which are typically very large. The efficiency of selection may

then drive the effective elimination of unfit variants to the benefit of non-mutated

genotypes, thereby promoting robustness at the clonal population level. Such a strategy has

been referred to as anti-redundant in contrast to direct robustness mechanisms (Krakauer and

Plotkin, 2002).
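
The claim that selection becomes more efficient as population size and selection coefficient increase can be made concrete with Kimura's diffusion approximation for the fixation probability of a new mutation. The sketch below is an illustration only and is not drawn from the studies cited above: it assumes a diploid Wright-Fisher population whose effective size equals its census size, and the function names and population sizes are arbitrary choices. The selection coefficients are the two median values quoted in the text, entered as negative numbers because they act against the mutation.

```python
import math

def _log_expm1(x):
    """log(exp(x) - 1) for x > 0, written to avoid overflow when x is large."""
    if x > 30.0:
        return x + math.log1p(-math.exp(-x))
    return math.log(math.expm1(x))

def fixation_probability(s, n):
    """Kimura's approximation for a single new mutation (initial frequency 1/(2n)).

    s: selection coefficient (negative if deleterious); n: population size.
    Assumes a diploid population with effective size equal to n.
    """
    if s == 0:
        return 1.0 / (2 * n)  # neutral limit
    if s > 0:
        return math.expm1(-2 * s) / math.expm1(-4 * n * s)
    # Deleterious case: (exp(2|s|) - 1) / (exp(4n|s|) - 1), computed in log space.
    return math.exp(_log_expm1(-2 * s) - _log_expm1(-4 * n * s))

# Median selection coefficients quoted above, treated as deleterious effects.
for n in (10, 100, 1000):
    for s in (-0.017, -0.108):
        relative = fixation_probability(s, n) * 2 * n  # 1.0 would mean "as likely as neutral"
        print(f"n={n:5d}  s={s:+.3f}  fixation relative to neutral = {relative:.3e}")
```

Under these assumptions, the fixation probability of a deleterious mutation relative to a neutral one falls as either the population size or the magnitude of the selection coefficient grows, which is the synergy invoked above for large populations of RNA viruses.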

Introduction – The integron genetic system



III.1. Overview of the system

III.1.1. A brief historical perspective

Most antibiotics are natural compounds synthesized by microorganisms. Although

their actual role in the wild has been questioned (Davies, 2006), they exhibit significant bacte-

ricidal or bacteriostatic activities by interfering with essential cellular processes of bacteria at

high concentrations. The unprecedented therapeutic benefits of antibiotics were readily illustrated

during the Second World War, shortly after the initial discovery (Fleming, 1929) and

subsequent purification (Chain et al., 1940) of penicillin. This success inaugurated a new era

in the fight against bacterial pathogens, and initiated a massive effort toward the identification

and industrial production of new antibiotics.

The spontaneous appearance of rare resistant clones did not go unnoticed at the time. However,

mutations were then fundamentally regarded as discrete, independent and random events

(see Appendix), and an early study even confirmed this assumption in the case of antibiotic

resistance (Lederberg and Lederberg, 1952). In this context, the concomitant development of

resistance to several antibiotics was largely considered beyond the adaptive potential of bacterial

populations. As exemplified in the previous chapter, however, bacteria have developed numerous genetic

tools and strategies to overcome the drastic environmental changes they routinely undergo. In

hindsight, it is not surprising that clinical isolates of Shigella dysenteriae that were simultane-

ously resistant to four antibiotics (streptomycin, tetracycline, chloramphenicol and sulphona-

mides) were identified shortly after the introduction of these antibiotics in the 1950s

Overview of the system – A brief historical perspective


(Mitsuhashi et al., 1961). Today, multi-resistance has become a major public health issue.

Some pathogenic bacterial strains are resistant to virtually all known antibiotics. Novel classes

of antibiotics are ever more difficult to isolate and most of the innovations in the drug industry

rely on slight modifications of existing chemical scaffolds. Resistance determinants adapt so

quickly to these minor changes that drug development is hardly viable, economically speaking.


We now know that different strategies can result in resistance phenotypes: i) modification

of the antibiotic target; ii) bypass of the targeted pathway; iii) inactivation of the antibiotic;

iv) modification of membrane permeability; and v) active efflux of antibiotics

(Tenover, 2006). The genetic requirements to achieve these functions range from mere point

mutations to the acquisition of whole operons. It is clear that pre-existing genetic mechanisms

have been recruited to support the rapid evolution, acquisition and spread of multi-resistant

functions. Several genetic elements, nested within one another like Russian dolls, are implicated

in the initial 1950s outbreak of multi-resistant Shigella dysenteriae and in those that followed.

During the 1970s, it was determined that multi-resistance phenotypes are frequently associated

with transmissible plasmids, and more specifically with transposable elements located in these

plasmids (Liebert et al., 1999). The genetic

system responsible for the gathering of resistance determinants on these mobile elements was

first described in the late 1980s (Martinez and de la Cruz, 1988) and termed integrons (Stokes

and Hall, 1989). It is now clear that such mobile integrons constitute the major vectors of antibiotic

multi-resistance in Gram-negative and, to a lesser extent, in Gram-positive bacteria.

Their importance in clinical and agricultural settings is reflected by the impressive number of

epidemiological studies monitoring their prevalence and evolution (see Figure 27).

More recently, a much larger integron was found on a chromosome of Vibrio

cholerae (Mazel et al., 1998) and similar loci were subsequently identified in a significant fraction

of environmental bacteria (Rowe-Magnus et al., 2001; Boucher et al., 2007). Such chromo-

somal integrons are ancient and relatively sedentary adaptive systems developed by bacteria

to face a changing world. Their discovery provided a paradigm to understand the emergence

of the multi-resistance integrons.

III.1.2. Structure of integrons

All integrons are composed of a stable platform, which contains the functional ele-

ments required for the functioning of the system, associated with a variable array of discrete

gene cassettes encoding accessory functions (see Figure 28).

a - The functional platform

The principal component of the integron functional platform is the intI gene which en-

codes a site-specific tyrosine recombinase. This enzyme catalyzes the specific excision and

integration of dedicated and discrete genetic elements, known as gene cassettes (Stokes and

Hall, 1989). The integration of cassettes essentially occurs at a specific locus lying immediately

adjacent to intI, referred to as the primary recombination site attI (Collis et al., 1993). The expression

of the genes contained in the integrated cassettes is ensured by a dedicated promoter,

Pc, which is generally embedded in the intI gene and oriented toward attI (Collis and Hall,

1995). In itself, the functional platform is stable and non-mobile.

b - The cassette array

Successive integration at the attI site results in the streamlined assembly of different

gene cassettes (Recchia et al., 1994). This cassette array constitutes the versatile part of the

integron. Gene cassettes are minimal functional elements that are mobilized by integron

integrases. They generally consist of a single ORF immediately followed by

Overview of the system – Different flavors of integron


a recombination site termed attC, which is specifically recognized by IntI. Cassette-borne genes are

generally promoterless and their expression is hence conditioned by the proximity of an external

promoter, essentially Pc. Accordingly, the ORFs in cassettes are usually oriented toward the

attC site. The excision of cassettes by the IntI integrase leads to non-replicative covalently

closed circular intermediates (Collis and Hall, 1992).

III.1.3. Different flavors of integron

Although they present a similar organization, mobile and chromosomal integrons display

distinctive characteristics that reflect different evolutionary histories. This section outlines

the paradigmatic characteristics of both groups. With the accumulation of genomic data,

it is becoming clear that a continuum of intermediate forms exists between these two extremes.


a - Mobile integrons

Mobile integrons correspond to functional platforms that are physically associated

with mobile DNA elements, such as TEs and conjugative plasmids. These elements are used

as natural genetic vehicles, enabling efficient transmission between bacterial individuals of the

same or different species. Mobile integrons contain few cassettes of heterogeneous origins

that are probably collected successively in different genomic backgrounds. The longest array

identified is composed of 8 cassettes (Naas et al., 2001a). The heterogeneity of the cassettes is

attested by the unusual codon usage of their associated ORFs and by the sequence and size diversity

of their attC sites. In contrast with their heterogeneous origins, cassettes associated

with mobile integrons display a striking functional homogeneity. A pool of >130 different cassettes

harboring antibiotic resistance genes has been identified in mobile integrons (based on a

98% nucleotide identity threshold) (Partridge et al., 2009). Together, these cassettes provide

resistance to most classes of antibiotics including β-lactams, all aminoglycosides, chloram-

phenicol, trimethoprim, streptothricin, rifampin, erythromycin, fosfomycin, lincomycin, qui-

nolones and antiseptics of the quaternary-ammonium-compound family (Mazel, 2006;

Partridge et al., 2009). Only a few cassettes in mobile integrons from clinical isolates harbor

ORFs of unknown functions.

Five different classes of mobile integrons have been defined to date, based on the sequence

of the encoded integrases (40–58% identity). Although only the first three have

been historically involved in the spread of multi-resistance phenotypes, all five classes have

been associated with antibiotic-resistance determinants (Mazel, 2006). Class 1 integrons are

the most widespread and clinically important, as they are detected in 22 to 59% of Gram-

negative clinical isolates (Labbate et al., 2009). As such, they constitute the main experimental

model for integrons. They are associated with functional and non-functional transposons derived

from Tn402 that can be embedded in larger transposons, such as Tn21 (see Mounting of

mobile integrons, p137). Class 2 integrons are exclusively associated with Tn7 derivatives.

The integrase gene of class 2 integrons, intI2, contains a nonsense mutation that yields a

non-functional protein. Class 3 integrons are also thought to be located in a transposon and are

less prevalent than class 2. The other two classes of mobile integrons have been identified

through their involvement in the development of trimethoprim resistance in Vibrio species.

The class 4 integron is embedded in the integrative and conjugative element SXT found in Vibrio

cholerae. The class 5 integron is located in a compound transposon carried on the pRSV1 plasmid

of Aliivibrio salmonicida.

Interestingly, class 1 (Stokes et al., 2006) and class 3 (Xu et al., 2007) integrons that

are not associated with resistance genes have been recovered from environmental bacteria. In

each case, these integrons carried cassettes of unknown functions. In addition, a functional

class 2 integron isolated from beef cattle was associated with four non-antibiotic-resistance

gene cassettes (Barlow and Gobius, 2006). These data indicate that mobile integrons are not

specifically dedicated to antibiotic resistance. The prevalence of these functions likely results

from biased sampling focused on clinically relevant environments and reflects the evolutionary

success of integrons in these settings.

b - Chromosomal integrons

The integron described on the small chromosome of V. cholerae serotype O1 biotype

El Tor strain N16961 is the paradigmatic example of a chromosomal integron (Mazel et al.,

1998). Its array contains 179 cassettes and spans ca. 3% of the whole genome. Contrasting

with the largely variable attCs found in mobile integrons, 149 cassettes comprise attC sites

that differ in sequence by less than 10% over their entire length of 122 to 124 nucleotides (see

Figure 29).

Chromosomal integrons have been found in a wide panel of bacterial species. A list of

representative examples is shown in Table 4. The cassette array of chromosomal integrons can

contain a much larger number of cassettes (up to 217 in V. vulnificus), though some contain

only a few or even no cassettes. The homogeneity of attC sites within a single integron is not confined

to V. cholerae but has also been observed in V. fischeri, V. metschnikovii, P. alcaligenes,

P. stutzeri, X. campestris, and T. denticola (Labbate et al., 2009). Homologous attC sites are

essentially species specific and define a genetically relevant typology (Rowe-Magnus et al.,

2001; Rowe-Magnus et al., 2003; and see Figure 29). To a large extent, chromosomal integrons

are sedentary residents of their host genomes (Rowe-Magnus et al., 2001; Boucher et al.,

2007). This point will be further discussed in the light of IntI phylogeny (see Chromosomal

integrons are ancient and widespread structures, p135).

In sharp contrast with mobile integrons, chromosomal integrons maintain highly diverse

cassettes – mostly of unknown function. The analysis of Vibrionales genomes available

in 2007 led to the identification of 1677 cassettes (Boucher et al., 2007). Among these, 65%

have no homologues in the database; another 4% correspond to proteins with homologues of

unknown functions; while 6% can only be assigned a vague general function. Altogether, 75%

of the cassette pool corresponds to accessory genes of undefined function – which emphasizes

the importance of integrons in gathering genetic diversity. These data parallel the observations

made in environmental mobile integrons mentioned above. The remaining 25% of

cassettes contain genes with a wide functional distribution (see Figure 30). The most prevalent

functions are phage-related proteins; toxin-antitoxin systems; acetyltransferases; DNA

modification and virulence. The few functions which have been experimentally confirmed include

restriction or methylation systems, sulfate-binding proteins, lipases, polysaccharide

biosynthesis and dNTP pyrophosphohydrolases (Rowe-Magnus et al., 2001; Smith and Siebeling,

2003; Robinson et al., 2008). As mentioned above, most cassettes in large integrons

are silent – thus any mutation affecting these cassettes is expected to be neutral. In this light,

the overrepresentation of phage-related functions in integrons probably reflects the neutral

incidence of mobile element insertions in silent cassettes. Large cassette arrays are enriched in

Toxin-Antitoxin (TA) systems – also known as post-segregational killing systems, which en-

code a stable toxin and its unstable cognate antitoxin. The genome of V. cholerae N16961

contains 13 TA loci, all of which are present in the integron array. Recently, these addiction

modules were shown to maintain the stability of the cassette array, by preventing excision and

loss of the surrounding silent cassettes (Rowe-Magnus et al., 2003; Szekeres et al., 2007).

A substantial part of the functions identified in chromosomal integrons are involved in

substrate modification (acetyltransferases) or interactions with biotic factors (virulence factors

and DNA modification). About 10 to 30% of the cassettes potentially encode proteins carry-

ing a signal peptide region for either membrane association or export from the cell (Koenig et

al., 2008). Besides, we found that ca. 30% of the cassette-encoded proteins display signatures

of multiple transmembrane domains (unpublished observations). Altogether, these data indicate

that chromosomal integrons carry important functions that mediate interactions with external

environments.

III.2. Functional organization of integrons

III.2.1. A unique site-specific recombination mechanism

a - Double and single stranded recombination substrates

i. Integron integrases are tyrosine recombinases

Integron integrases belong to the family of tyrosine recombinases (see p82), though

they exhibit an additional and specific functional domain compared to other closely related

recombinases (Messier and Roy, 2001). These enzymes usually perform recombination between

two DNA regions by establishing a synapse between their cognate binding sites and

subsequently resolving the resulting HJ. Their typical core recombination sites consist of a

pair of highly conserved 9-13 bp inverted binding sites separated by a 6-8 bp central region

Functional organization of integrons – A unique site-specific recombination mechanism


(Grainge and Jayaram, 1999; Grindley et al., 2006). As described below, attI sites differ significantly

from this canonical organization, while attC sites are processed in a very unconventional

way.

ii. Structure of attI sites

The core recombination site of attI is composed of two binding sites termed L and R. The

recombination point is located in a conserved 5’-GTT-3’ triplet, between G and TT (Hall et al.,

1991). The inverted L binding site, always degenerate with respect to R, is hardly recognizable.

In addition, the central region differs

greatly between different attI sites. In vitro experiments performed with the class 1 integrase

IntI1 on its cognate double-stranded attI1 site demonstrated that four regions are actually

bound by the enzyme. Two of these regions correspond to the core site, while the other two

regions, dubbed DR1 and DR2, form direct repeats located 5' to the core site (Gravel et al.,

1998; Collis et al., 1998). The structure of attI1 is shown in Figure 31. The attI sites from

different integrons diverge significantly, paralleling the pattern observed for integrases

(Rowe-Magnus et al., 2001). Cross-recombination studies involving attI sites and IntI integrases of

heterologous origins showed that integrases preferentially recognize their cognate attI sites,

but did not rule out the possibility of cross-talk between different systems (Hall et al., 1999;

Collis et al., 2002b). For instance, the inactivated integrase generally found in class 2 integrons

can be complemented in trans by IntI1.

iii. attC sites form single stranded substrates

The structure of attC sites is more complex (see Figure 32). Each extremity contains a

degenerate core site, dubbed R”-L” and L’-R’, separated by a central region which is highly

variable in sequence and size (20-104 bp) (Mazel, 2006). A comparison of attC sites shows

that sequence conservation is restricted to two triplets 5’-AAC-3’ and 5’-GTT-3’ located in

the R” and R’ boxes, respectively. Consistent with attI sites, the recombination point is lo-

cated between the G and TT of the latter motif. Thus only the L’–R’ core site is recombino-

genic. Contrasting with their sequence heterogeneity, attC sites display a strikingly conserved

palindromic organization that can form cruciform structures through the extrusion and self-

pairing of both DNA strands (Stokes et al., 1997; Rowe-Magnus et al., 2003). Upon folding,

single stranded attC sites present an almost canonical core site consisting of L"-L' and R"-R'

duplexes separated by a bulged


The importance of this secondary structure for proper interaction with the integrase was initially

put forward by in vitro binding experiments (Francia et al., 1999; Johansson et al., 2004). The

delivery of single stranded DNA by conjugation showed that the bottom strand of attC sites is ca. 10³-fold more

recombinogenic than the top strand in vivo (Bouvier et al., 2005). The elucidation of the structure

of vchIntIA bound to a folded attC bottom strand further confirmed that attC sites constitute

atypical recombination substrates (MacDonald et al., 2006). Generally, upon folding, the top and

bottom strands of attC sites differ by the orientation of conserved extra-helical bases with respect to

the 5’-GTT-3’ recombination motif. The structure of the IntI1-attC complex

showed that these extra-helical bases are precisely contacted by the IntI-specific additional

domain (MacDonald et al., 2006). Sequences modified in such a way that extra-helical

bases are appropriately oriented on the top strand led to a switch of integrase strand specificity.

However, this genetic manipulation resulted in cassettes being inserted in the wrong direction

with respect to Pc, preventing expression of the associated ORF (Bouvier et al.,


In contrast to canonical core recombination sites, the genetic information required for

proper recombination is not contained in the primary sequence of attC sites, but mostly in

their secondary structures. This mechanism readily explains how cassettes with diverse attC

sites can be mobilized by a single mobile integron. Nevertheless, the evolutionary advantages

arising from this atypical process are unclear, and will be discussed later in light of new results

(see Discussion – Single stranded DNA: a bridge between two systems, p241).

b - The different recombination reactions

i. attC x attC recombination

Recombination between two attC sites located in the same cassette array leads to the

excision of a circular cassette intermediate (Collis and Hall, 1992). Two single stranded and

appropriately folded attCs must be contacted at the same time by an integrase complex to es-

tablish a HJ. Resolution of the synapse leads to the excision of a covalently closed single

stranded intermediate from the bottom strand, while the top strand remains unchanged. Cas-

sette excision is thus an asymmetric and semi-conservative process. Upon replication, one of

the molecules remains unchanged while a cassette is effectively deleted from the other. Several

cassettes can be excised at the same time when the recombination does not occur between

two immediately successive sites. However, circular intermediates made of two cassettes seem

to be further resolved into single cassette circles prior to reintegration (Collis et al., 1993). In

this respect, it is worth noting that the sequence downstream of the R’ box of a given attC generally

forms a palindrome with the sequence upstream of the next cassette’s R” box. The circular

cassette resulting from the recombination between these two sites then contains an attC

that folds into a longer and probably more stable stem-loop structure. This feature may promote

a high rate of reintegration of excised cassettes.
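
The palindromic relationship between the two flanks can be checked with a reverse-complement comparison. The following is a minimal sketch under simplifying assumptions: it only tests for perfect complementarity, whereas real stems tolerate mismatches and bulges, and the function names and the short sequences in the usage lines are invented for illustration rather than taken from an actual attC site.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def forms_perfect_palindrome(downstream_flank, upstream_flank):
    """True if the two flanks could pair over their full length when on the same strand."""
    return downstream_flank.upper() == reverse_complement(upstream_flank)

# Made-up flanks: the second is the reverse complement of the first.
print(forms_perfect_palindrome("GCTAAC", "GTTAGC"))  # True
print(forms_perfect_palindrome("GCTAAC", "GTTAGG"))  # False
```

A realistic analysis would score partial complementarity or predict folding energies; the exact-match test is only meant to make the notion of a palindrome between adjacent sites explicit.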

Recombination can also occur between attC sites located on two different molecules.

This intermolecular reaction is disfavored compared to the intramolecular recombination be-

tween physically linked sites described above. In the case of mobile integrons, recombination

frequently occurs between two copies of the same plasmid, resulting in co-integration.

Co-integration of two newly replicated chromosomes may occur, but it prevents proper segregation

and leads to the formation of chromosome dimers after another round of replication. Although

such dimers can be resolved by dedicated mechanisms (Sivanathan et al., 2009), this makes

co-integration of two chromosomal integrons unlikely. When one of the partners is a previously

excised circular cassette intermediate, recombination leads to the insertion of the cassette just

after the attC site contacted on the other molecule (Collis et al., 1993). Again, the process is

semi-conservative and only one replicated strand is modified.

ii. attI x attC recombination

The integration of a circular cassette intermediate preferentially occurs at the attI site

compared to an arbitrary attC site within the array (Collis et al., 1993; Collis et al., 2001).

This feature is essential in ensuring immediate expression of integrated cassettes by the up-

stream Pc promoter. In mobile integrons, attI x attC recombinations can occur between sites

located on two different plasmids, leading to co-integration. The excision of cassettes resulting

from attI x attC recombination has never been accurately monitored.
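
The bookkeeping of excision followed by reintegration at attI can be sketched with an ordered list, where index 0 stands for the position adjacent to attI and Pc. This toy model rests on several assumptions that go beyond the text: it ignores strand asymmetry and replication, treats integration at attI as certain rather than merely preferred, and uses placeholder cassette names and hypothetical function names.

```python
def excise(array, first, last):
    """Excise cassettes first..last (inclusive, 0-based) as one circular intermediate.

    Returns the shortened array and the list of excised cassettes.
    """
    if not 0 <= first <= last < len(array):
        raise ValueError("cassette indices out of range")
    circle = array[first:last + 1]
    remaining = array[:first] + array[last + 1:]
    return remaining, circle

def integrate_at_attI(array, cassette):
    """Insert a single cassette at the attI-proximal position (index 0)."""
    return [cassette] + array

# Placeholder names; cassette_C starts far from Pc and ends up right next to it.
array = ["cassette_A", "cassette_B", "cassette_C"]
array, circle = excise(array, 2, 2)
array = integrate_at_attI(array, circle[0])
print(array)  # ['cassette_C', 'cassette_A', 'cassette_B']
```

In this picture, a previously distal cassette becomes the first one downstream of Pc, which is why preferential integration at attI matters for immediate expression.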

The attI sites seem to be processed in the usual double stranded form. Conventional

resolution of the HJ formed between a single stranded attC and a double stranded attI would

lead to an abortive product. Proper resolution is thus dependent on unidentified host factors

and a model relying on replication has been proposed (Bouvier et al., 2005; and see Figure

33). According to this hypothesis, cassette insertion would only affect one of the daughter

DNA molecules upon replication. Supposing replication occurred between the excision of a

cassette and its subsequent reintegration, the cassette may be duplicated if integration involves

the attI site of the daughter DNA deriving from the top strand.

iii. attI x attI recombination

The recombination between two attI sites has been observed but is particularly inefficient

(Collis et al., 2001). This reaction may happen if the number of attI sites in the cell is

high – i.e. in the context of mobile integrons harbored by high copy number plasmids.

iv. Recombination at secondary sites

Insertion events at unconventional sites outside of the integrons have been occasionally

observed. This can occur at very low frequency between either an attI or an attC site and

5’-GTT-3’-containing sequences (Francia et al., 1993; Recchia et al., 1994; Recchia and Hall,

1995; Francia and Garcia Lobo, 1996; Francia et al., 1997; Hansson et al., 1997). In any case,

the expression of the inserted element is conditioned by the presence of a promoter at the in-

sertion site. The absence of surrounding recombination partners renders subsequent excision

unlikely, ensuring the stability of the inserted cassette.
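
Because the only sequence feature named above for secondary sites is the 5’-GTT-3’ triplet, candidate crossover points can be listed with a simple motif scan. This is a sketch under stated assumptions: containing the triplet is treated as necessary but certainly not sufficient for recombination, only the given strand is scanned, and the function name and test sequence are invented.

```python
def candidate_crossover_points(seq):
    """Return 0-based positions of the junction between G and TT for every GTT in seq.

    A returned value p means the crossover would fall between seq[p - 1] and seq[p].
    """
    seq = seq.upper()
    points = []
    start = seq.find("GTT")
    while start != -1:
        points.append(start + 1)  # the junction lies just after the G
        start = seq.find("GTT", start + 1)
    return points

print(candidate_crossover_points("ACGTTAGGCGTTA"))  # [3, 10]
```

Since such a short motif occurs very often in any genome, the scan mainly illustrates why additional structural or sequence context must limit the use of secondary sites to the very low frequencies reported.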

c - Accessory factors

IntI-mediated recombination does not seem to rely on any absolutely required acces-

sory factor. The attI x attC insertion reaction was recently reconstructed in vitro with the class

1 system, which suggests that IntI1 indeed possesses all the functions required to carry out

this reaction (Dubois et al., 2007). However, this observation does not rule out the existence

of accessory factors that might increase recombination efficiency in some systems. For in-

stance, attI x attC recombination occurred at a 2,600-fold higher rate in V. cholerae than in E.

coli using a system derived from V. cholerae. In contrast, the recombination frequencies in a

system derived from the class 1 integron were identical in both species (Biskri et al., 2005).

This suggests that host factors in V. cholerae increase recombination in the resident chromosomal

integron, while the class 1 mobile integrons achieve a higher degree of independence.

III.2.2. Expression of cassettes’ genes

a - Transcription

With few exceptions (e.g. Bissonnette et al., 1991; Stokes and Hall, 1991), gene cassettes

are promoterless and their correct expression relies on their position relative to the

Pc promoter. Cassette expression has been studied in detail in class 1 integrons, which

display two potential promoters, Pc1 and Pc2 (Collis and Hall, 1995; and see Figure 34).

Pc1 is embedded in the intI1 sequence (Stokes and Hall, 1989). Four versions (one strong,

two weak and one hybrid intermediate) that differ in the sequence of the -35 and -15 boxes

have been distinguished (Levesque et al., 1994; Bunny et al., 1995). Pc2 is located in the

upper part of attI but is usually

Functional organization of integrons – Expression of cassettes’ genes


inactivated by incorrect spacing between the -35 and -15 elements. However, a canonical

spacer has been observed in some attIs, generally associated with a weak version of the Pc1

(Collis and Hall, 1995). The Pc promoter of class 2 integrons has not been accurately mapped

but seems to lie within attI2, hence resembling Pc2 (Levesque et al., 1994). In class 3 inte-

grons (Collis et al., 2002a) and in the chromosomal integron of Pseudomonas stutzeri strain Q

(Coleman and Holmes, 2005), the Pc has been located in the 5’ part of the integrase gene,

similarly to Pc1. The location of the Pc in V. cholerae has never been explored experimentally

and is also assumed to be embedded in intI.

In class 1 integrons, the transcripts originating from both Pc promoters are of varying

lengths and can span several cassettes. Folded attC sites may determine this pattern by func-

tioning as transcriptional terminators (Collis and Hall, 1995). The representation of a given

cassette’s gene in the transcript pool, and hence its expression level, decreases with increasing

distance from the promoter. Consequently, only the first few cassettes are expressed at a significant

level, depending on the strength of the promoter (Collis and Hall, 1995). This model seems

applicable to all types of integrons. In large chromosomal integrons, the majority of genes in

the array are thus silent.
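
The positional effect described above can be illustrated numerically. The following Python sketch is only a toy model: it assumes that each attC site lets a fixed fraction of transcripts continue into the next cassette. The function name and the read-through value of 0.5 are hypothetical and are not taken from the cited studies.

```python
def relative_expression(n_cassettes, read_through=0.5, promoter_strength=1.0):
    """Return the relative transcript level of each cassette in an array.

    Assumption (illustrative only): every attC site acts as a partial
    terminator and lets a fixed fraction `read_through` of transcripts
    continue into the next cassette.
    """
    levels = []
    level = promoter_strength
    for _ in range(n_cassettes):
        levels.append(level)
        level *= read_through  # partial termination at the attC site
    return levels


if __name__ == "__main__":
    # Print the predicted level for the first six cassettes of an array.
    for position, level in enumerate(relative_expression(6), start=1):
        print(f"cassette {position}: {level:.3f}")
```

Under this assumption the transcript level falls geometrically with position, which is consistent with the idea that most cassettes of a large array are effectively silent.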

b - Translation

The presence of a ribosome binding site (RBS), the motif that initiates ribosome assembly, is a major

determinant of gene expression levels (Shultzaberger et al., 2001). Some cassette-borne genes

are preceded by a functional RBS, while the motif is seemingly absent from others. In the latter

case, translation can be initiated at an upstream ORF. In class 1 integrons, a

small ORF preceded by a functional RBS is located in attI1 and is thus present in all tran-

scripts originating from the Pcs. This ORF – dubbed orf11 – overlaps the recombination point

so that its actual 3’ end depends on the first cassette inserted in the array. When the first cassette

in the array brings an appropriately located termination codon, the translation complex

responsible for ORF11 expression can proceed to the cassette-encoded ORF. This mechanism

accounts for a significant part of the expression of some cassettes (Hanau-Berçot et al., 2002).

In the same vein, the presence of small ORFs in attC sites may serve to increase expression

by promoting processivity of the translational complex.

Integrons and evolution – Chromosomal integron as the source of mobile integrons


III.3. Integrons and evolution

III.3.1. Chromosomal integron as the source of mobile integrons

a - Chromosomal integrons are ancient and widespread structures

Integrases of the tyrosine-recombinase family are essential in the processing of a wide

variety of mobile elements, including integrons, phages, ICEs and genomic islands. The

plasticity associated with the content of these elements as well as with their genomic locations

makes integrases stable markers for resolving their phylogenetic relationships (Boyd et al., 2009).

From a broad perspective, the integron integrases form a well defined clade among the tyro-

sine-recombinase family (see Figure 35).

A comprehensive analysis of the 603 completely or partially sequenced genomes

available in 2007 showed that 9% of them carry an integron integrase (Boucher et al., 2007).

The phylogenetic relationship between the 56 corresponding integrases is shown in Figure

36. Three major groups can be distinguished in this tree: i) the soil-freshwater proteobacteria

group of integrons, mostly composed of proteobacteria from freshwater and soil environ-

ments; ii) the marine γ-proteobacteria group; and iii) the inverted integrase group, character-

ized by the co-linear orientation of the integrase with respect to the cassette array, so that the

attI site is found at the 3’ end of the integrase. The first two clades form ecologically relevant

taxa (Mazel, 2006), while the third groups taxonomically diverse organisms but correlates

with a structural peculiarity (Boucher et al., 2007).

Overall, the distribution of chromosomal integrons spans several bacterial phyla and

the branching pattern of integrases is in good agreement with the organismal phylogeny. This

clearly shows that integrons are ancient and generally stable genomic structures (Rowe-

Magnus et al., 2001; Rowe-Magnus et al., 2003; Mazel, 2006; Nemergut et al., 2008). Never-

theless, several phylogenetic incongruences can be noted (Boucher et al., 2007; and see

Figure 36). Notably, alteromonadales (including Shewanella, Pseudoalteromonas and Alteromonas)

and vibrionales are normally sister taxa that are closely related to pseudomonadales,

together forming a consistent group of marine species. In the IntI-based dendrogram, vibrionales atypically

branch within the alteromonadales to cluster with Alteromonas and Pseudoalteromonas,

thereby separating these groups from Shewanella. The vibrionales taxon mostly keeps its

monophyletic structure in the IntI tree, with the exception of V. fischeri. Consequently, the

pseudomonadales taxon is split into discrete clusters that branch at diverse locations in the

tree. One cluster is related to the soil-freshwater proteobacteria group, another branches with V.

fischeri, while others are scattered within the alteromonadales. Overall, the striking

polyphyletic structure of alteromonadales, together with the widespread distribution of

pseudomonadales in the IntI-based phylogeny, suggests that several transfers of the integron platform

occurred independently along the evolutionary history of chromosomal integrons. The obser-

vation that the monophyletic group of integrons with inverted integrases includes representa-

tives from unrelated phyla (Proteobacteria, Planctomycetes, Chlorobi, Spirochaetes and

Cyanobacteria) further supports the occasional mobility of chromosomal integrons.

Importantly, the integrases of mobile integrons do not group together but are rather

scattered in the tree. IntI1 and IntI3 are closely related and belong to the soil-freshwater

proteobacteria group. IntI2 and the integrases corresponding to the two other mobile integrons

reported in Vibrio branch at various points within the marine γ-proteobacteria group. Hence,

despite their common functional contribution to antibiotic resistance phenotypes, mobile integrons

do not form an evolutionarily consistent group. Instead, they arose several times

independently, most probably from chromosomal integrons incidentally mobilized by other

mobile elements (Rowe-Magnus et al., 2001).

b - Mounting of mobile integrons

Phylogenetic evidence strongly suggests that mobile integrons originated from the association

of chromosomal integrons with mobile elements. The mobilization processes that

led to the successful radiation of class 1 integrons within clinical environments are relatively

well understood. The following paragraph provides a reconstruction of the evolutionary history

that hypothetically generated the most prevalent forms of class 1 integrons (Labbate et

al., 2009). The different steps of this scenario are outlined in Figure 37.

Class 1 integrons not associated with resistance cassettes have been reported in the chromo-

somes of the environmental bacteria Azoarcus communis MUL2G9 and Acidovorax sp.

MUL2G8A (Stokes et al., 2006). These integrons are not associated with mobile elements and

may be regarded as the ancestor of mobile class 1 integrons. Such a chromosomal integron

was subsequently embedded in a functional transposon, through the addition of transposition

genes associated with two cognate inverted repeats. This structure then acquired a cassette

harboring qacE (resistance to quaternary ammonium compounds). The resulting element is

known as Tn402. This transposition system particularly targets the resolution (res) sites of

plasmids and other transposons. In this context, the qacE cassette potentially provided an

adaptive advantage that drove the transfer of Tn402 to different plasmids and transposons. The

next important step was the recruitment of sul1 (resistance to sulfonamides). This event followed

an unconventional mechanism that resulted in the deletion of the 3’ end of qacE and its

associated attC site. The acquisition of this major resistance determinant probably boosted the

spread of the Tn402 derivative in a wide range of genetic backgrounds. The mobile integron

then gained access to a vast repertoire of cassettes, including potential resistance cassettes

(see next section). This spread was accompanied by diverse alterations and associations with

other ISs. A prominent event was the deletion of part of the transposition region, leading to

the inactivation of Tn402. This led to the genetic context that is now prevalent in clinical isolates:

an upstream region termed 5’-CS consisting of the left-hand inverted repeat associated

with the intI1/attI1 functional platform, and a downstream 3’-CS consisting of the qacE fragment

associated with the sul1 gene and the partially deleted transposition region (see Figure


This structure is still mobilizable by related transposases in trans or through the association

with another active transposon. This latter case is best illustrated by the Tn21 transposon,

which results from the insertion of such a structure into a Tn501-like transposon harboring the mer genes

conferring resistance to mercury. Tn21 borne by plasmid NR1 (R100) contributed to the ini-

tial outbreak of multi-resistance identified in the 1950s mentioned above (Mitsuhashi et al.,

1961), and led to the historical discovery of integrons (Martinez and de la Cruz, 1988).

Evidence that other mobile integrons followed a similar evolutionary path

is abundant. Class 3 integrons are also associated with Tn402-like elements and versions

lacking these transposition features have been found in the chromosomes of two Delftia species

(Xu et al., 2007). Class 2 integrons are generally associated with Tn7, an active transposon

that can preferentially target conjugative plasmids or a unique conserved site within bacterial

chromosomes (Parks and Peters, 2009). The integron platform harbored by the Vibrio salmonicida

plasmid pRVS1 is very closely related to that of Pseudoalteromonas

haloplanktis TAC125, providing a clear-cut example of recent chromosomal integron mobilization

(Szekeres et al., 2007).

c - Resistance genes and chromosomal integrons

The functional platforms of mobile integrons probably arose from essentially stable

chromosomal integrons. However, chromosomal integrons mostly harbor cassettes of unknown

function. In this context, where do the resistance cassettes found in many mobile integrons

come from? Two lines of evidence suggest that resistance cassettes are gathered by

mobile integrons wandering in different host genomes: i) some resistance cassettes display an

attC sequence that matches the typical sequence of chromosomal integrons; ii) some chromosomal

integrons carry identifiable resistance genes.

The first attC sites to be described were 59 bp long and were all closely related in sequence

(Cameron et al., 1986; Martinez and de la Cruz, 1988). It was initially thought that the

length and structure of these elements, termed 59-be, were characteristic of resistance cassettes

(Stokes and Hall, 1989). These 59-be were subsequently found to be closely related to the

attC sites typically harbored in chromosomal integrons of Xanthomonas spp. (Rowe-Magnus et

al., 2001; Gillings et al., 2005). This implies that many common resistance cassettes (such as

aadA1, aadA6, aadA7, aadB, aacA and qacF) probably originated in these genomes (Rowe-

Magnus et al., 2001). Likewise, several resistance cassettes with longer attC sites probably

originated in Vibrio spp. This feature contributed to the realization that previously identified

Vibrio cholerae repeats (VCR) (Barker et al., 1994) were part of a large chromosomal inte-

gron (Recchia and Hall, 1997; Clark et al., 1997; Mazel et al., 1998). Examples include the

CARB-4 and dfrVI cassettes, which harbor typical attC sites of V. metschnikovii and V. para-

haemolyticus respectively (Rowe-Magnus et al., 2001). Cassettes harboring a dfr gene are

also present in the mobile integrons located in SXT and plasmid pRVS1. Recently, two cas-

settes comprising the qnr gene (resistance to fluoroquinolones) associated with attC sites

typical of V. parahaemolyticus and V. cholerae were found in class 1 integrons isolated from

V. cholerae (Fonseca et al., 2008). Interestingly, these genes are closely related to non-mobile

qnr genes present in vibrionales genomes (Poirel et al., 2005). It is tempting to speculate that

these structural genes were independently recruited to the chromosomal integron present in

the genome of these Vibrio species, and were later recruited by class 1 integrons. Closely related

qnr genes are also associated with class 1 integrons in various enterobacteria (Nordmann

and Poirel, 2005). However, these genes lack an attC site and their integration relies on independent

mechanisms (Robicsek et al., 2006a). It might be that these cassettes initially drove

the spread of qnr genes from vibrionales reservoirs, and were later atypically mobilized in the

so-called complex integrons.

Several cassettes encoding resistance determinants have been found in chromosomal integrons.

An unexpressed but functional catB9 cassette specifying resistance to chloramphenicol

is present in the array of V. cholerae N16961 and is associated with an attC site

corresponding to this species (Mazel et al., 1998). Similarly, CARB-7 and CARB-9 cassettes

harboring typical attCs were independently found in the chromosomal array of environmental

V. cholerae isolates (Melano et al., 2002; Petroni et al., 2004). These cassettes provide resistance

to β-lactams and resemble the CARB-4 cassette found in class 1 integrons. A dfr cassette

was recently found in the array of V. splendidus LGP32 (Le Roux et al., 2009), paralleling

the presence of such cassettes in diverse mobile integrons. Also, an aacC-A7 cassette conferring

resistance to aminoglycosides was identified in Saccharophagus degradans (Elbourne and

Hall, 2006). The attC signature is indicative of an exogenous origin related to Nitrosococcus

oceani, which provides an example of gene cassette exchange between chromosomal inte-


d - The generation of cassettes

Some cassettes harbor ORFs that are homologous to structural genes, indicating that

chromosomal sequences can be recruited into cassettes. In several instances, the analysis of the

phylogenetic distribution of genes found in different cassettes and outside a cassette context

clearly suggests that recruitment into cassettes occurred several times independently (Recchia and

Hall, 1997; Rowe-Magnus et al., 2003; Boucher et al., 2006). This point is strengthened by

the observation of cassettes harboring closely related genes associated with clearly distinct

attC types (Recchia and Hall, 1997; unpublished data). Together, these data imply that cassette

generation is not a rare event. Even though hypotheses have been put forward, the

mechanisms responsible for the creation of cassettes remain unknown.

The observation that most cassettes lack a promoter and contain very little non-coding

sequence led R. M. Hall and colleagues to suggest that the process of cassette formation may

involve the reverse transcription of an mRNA molecule (Hall et al., 1991; Recchia and Hall,

1997). The repeated identification of bacterial group II introns inserted behind attC sites

(Sunde, 2005) prompted P. Roy and colleagues to propose a mechanism relying on the retrotranspositional

and RNA catalytic properties of these elements (Centron and Roy, 2002; Leon

and Roy, 2003). Although this model may account for the occasional creation of cassettes, it

relies on processes that are too complicated and contingent to account for the tremendous diversity

of cassettes observed in single genomes.

The fact that some genomes harbor hundreds of cassettes with closely related attC se-

quences may indicate that they encode the machinery required for gene recruitment and addi-

tion of specific attC sites (Rowe-Magnus et al., 2003). These genomes would then stand as

genuine cassette factories. The corresponding integrons are often referred to as superintegrons

(Mazel, 2006). As will be further discussed in the next sections, this hypothesis implies that

superintegrons are the source of cassettes observed in mobile integrons.

Anecdotally, new cassettes can arise through the modification of existing cassettes.

Cases of cassette fusion have been reported (for instance, see Centron and Roy, 2002). By fa-

voring the genetic linkage and co-expression of genes, such events may participate in the

creation of novel operons, such as those evidenced in various genomic islands. As will be discussed

below, mobile elements are often inserted in integrons. Such elements can introduce

a new gene between two attC sites, thereby providing the material for subsequent evolution of a

novel cassette.

III.3.2. A central role in horizontal gene transfer

a - Evidence for interspecies cassette exchange

The recognition that the sequence of attC sites tends to be species-specific provides a

valuable tool to trace the origins of cassettes (Rowe-Magnus et al., 2001). As illustrated

above, the origin of several antibiotic resistance cassettes has been determined this way.

Other cases implicating cassettes unrelated to antibiotic resistance pervade chromosomal inte-

grons (unpublished data). However, a systematic and rigorous analysis based on attC signa-

tures remains to be carried out.

These data show that cassette exchange between different species can readily occur.

The mobilization of gene cassettes with diverse recombination sequences relies on the specific

recombination mechanisms at work in integrons. The information carried by attC sites is

mostly expressed upon folding of single-stranded molecules (see above, p127), which allows

integron integrases to recognize a wide variety of seemingly unrelated sites. The recombination

efficiency on different substrates seems to vary among integrases. For instance, IntI1 efficiently

recombines a wider range of structures than vchIntIA (Biskri et al., 2005), a feature

that may partly explain the success of the class 1 integron.

Cassettes excised as circular intermediates might be acquired through transformation,

for instance. In this respect, it is noteworthy that major contributors to the cassette

pool, such as V. cholerae and most probably other Vibrio, enter a transformable state when

forming biofilms in the presence of chitin (Meibom et al., 2005; Bartlett and Azam, 2005; and

see p60). Another route for the transmission of cassettes involves hijacking of mobile ele-


Integrons and evolution – A central role in horizontal gene transfer


b - Mobile integrons and the spread of cassettes

The complex genetic structures resulting from the coupling of sedentary chromosomal

integrons to mobile elements have been described in detail above (see p137). Association

with mobile elements that have a wide host range, such as most transposons, or that are

self-mobilizable, such as conjugative plasmids, greatly enhances the dispersion potential of integrons.

Although the prevalence of mobile integrons unrelated to antibiotic resistance has

hardly been studied in most natural environments, at least five clearly independent

cases of mobile integrons have been described so far. The mobilization of gene cassettes

by mobile integrons is probably a frequent event in natural settings, and this may play a

significant role in the dissemination of cassettes away from the factory genomes (see The generation

of cassettes, p141). Indeed, mobile integrons are likely to experience a variety of genetic

backgrounds and can readily exchange cassettes with the chromosomal integron of their

hosts and with other mobile integrons. This process is reflected by the diverse origins of the

attC sites identified in the arrays of mobile integrons. Mobile integrons may also function as

efficient vehicles to shuttle cassettes between chromosomal integrons. The direct recruitment

of a chromosomally borne cassette by a class 1 integron was demonstrated with the catB9 cas-

sette of V. cholerae N16961 (Rowe-Magnus et al., 2002).

c - A cassette metagenome

The contribution of integrons to genome evolution clearly extends beyond the classical

vertical transfer of genetic information, as illustrated by the common exchange of gene cassettes

between different species. HGT is an important contributor to bacterial evolution

(Ochman et al., 2000), and integrons certainly stand as a major facilitating mechanism. Overall,

the global pool of cassettes can be viewed as a shared metagenome potentially accessible

to a diverse bacterial community (Stokes et al., 2001).

Available genomic sequences provide a glimpse of the genetic diversity encoded in

gene cassettes (see Figure 30, p125). However, sequenced genomes constitute a particularly

biased sample of the bacterial biosphere and the actual diversity can only be assessed through

metagenomic approaches (Keller and Zengler, 2004). Several techniques based on PCR am-

plification using degenerate primers have been developed to extract integrons from metage-

nomic samples in a culture-independent manner. These techniques can be used to recover intI

sequences, using primers matching the seemingly conserved regions (Nemergut et al., 2004);

or gene cassettes, using primers targeting attC sites (Stokes et al., 2001; Holmes et al., 2003).

Several studies led to the identification of integrases and a limited number of associated cas-

settes in atypical environments such as heavy-metal-contaminated mine tailings (Nemergut et

al., 2004) or hydrothermal vents (Elsaied et al., 2007). Other reports specifically focused on

the analysis of cassette diversity. Because the sequences of attC sites are very variable, primers

designed to amplify cassettes are necessarily restricted to a subset of the actual cassette

pool. Despite these limitations, a study restricted to a 50 m² soil plot suggested that the area

contained >2300 different cassettes (Michael et al., 2004). This estimate was based on the

limited information provided by the fine resolution of cassette lengths after cassette PCR.

More recently, a similar study was conducted on marine sediment samples, in which 2145 amplified

cassettes were entirely sequenced. Four different – though geographically close – sites were

sampled in Halifax harbor, Canada. Diversity analysis suggested that these locations collectively

contain ca. 3000 different cassettes. Here again, a major fraction (80%) of the recovered

cassettes harbored genes of unknown functions (Koenig et al., 2008). Together, these data

highlight that the cassette metagenome constitutes an incredible source of genetic diversity.
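
The text does not state which estimator was used to extrapolate the total number of cassette types from the sequenced sample. Purely as an illustration of how such an extrapolation can work, the Python sketch below uses the bias-corrected Chao1 richness estimator, a common choice that is swapped in here as an assumption. The function name and the cassette labels in the example are invented.

```python
from collections import Counter


def chao1(observations):
    """Bias-corrected Chao1 estimate of the total number of distinct types.

    `observations` is a list with one label per sampled cassette.
    The estimate is S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)), where F1 and F2
    are the numbers of types seen exactly once and exactly twice.
    """
    counts = Counter(observations)
    s_obs = len(counts)                                  # distinct types observed
    f1 = sum(1 for c in counts.values() if c == 1)       # singletons
    f2 = sum(1 for c in counts.values() if c == 2)       # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


if __name__ == "__main__":
    # Hypothetical sample: each string stands for one sequenced cassette type.
    sample = ["cas_a", "cas_a", "cas_b", "cas_c", "cas_c", "cas_d"]
    print(f"observed types: {len(set(sample))}")
    print(f"estimated total: {chao1(sample):.2f}")
```

The more cassette types are seen only once in a sample, the larger the estimated number of types that were missed, which is why a sample dominated by rare cassettes points to a much larger underlying pool.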

Is the cassette metagenome globally available to all integrons? Or is it subdivided into

smaller pools restricted to specific communities? This important question remains largely unresolved.

It is likely that integron integrases developed some substrate preferences along their

evolutionary history. However, the existence of mobile integrons not only offers facilitated access

to novel cassettes but also provides genomes with integrases potentially selected to accommodate

a wide range of substrates (Biskri et al., 2005). Nonetheless, the overall cassette

diversity available in a given environment is likely to be limited and somewhat adapted to a

specific niche, leading to specialized local pools or cassette ecotypes. In the Halifax harbor

study mentioned above, the authors found that two geographically distant – but ecologically

related – sites contain more cassette types in common than expected by chance, which supports

the ecotype concept (Koenig et al., 2008). In this context, access to the global gene pool is

necessarily limited and depends on the migration pattern of individual bacteria and/or mobile

integrons between specific niches.

III.3.3. Integrons as sophisticated contingency loci

a - Working model

This section provides an overview of the essential features characterizing integrons.

These properties can be integrated in a consistent conceptual model which highlights the

Integrons and evolution – Integrons as sophisticated contingency loci


adaptive role of the system (see Figure 38).

The minimal integron consists of a functional platform composed of a site-specific integrase

gene intI, tightly associated with a cognate recombination site attI and a promoter Pc

ensuring the expression of integrated cassettes. The genetic linkage between these elements is

very strong, as the Pc promoters are embedded in the intI and/or attI sequence. This tightly

packed locus is able to integrate and express an indefinite number of accessory traits mobilized

as gene cassettes that are stockpiled downstream of attI. Because gene cassettes are

promoterless, only the first cassettes are expressed by Pc. While random excisions occur

throughout the cassette array to form non-replicable circular intermediates containing one or

several cassettes, integrations preferentially occur at the attI site. These observations support a

model in which newly integrated cassettes are immediately expressed, thereby gradually

driving previously integrated cassettes away from expression. Integration events are thus subjected

to selection. Non-expressed cassettes in the array constitute a reservoir of standing genetic

variability that can be mobilized through excision and subsequent reintegration at attI. In

this light, an integron functions as a sophisticated contingency locus that is able to switch the

expression state of accessory traits owing to dedicated site-specific recombination machinery.

This defines a system that generates genetic diversity at a targeted locus.
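
The working model above can be expressed as a toy data structure. The Python sketch below is a minimal illustration in which the cassette array is a list whose first element sits next to attI and Pc. All function names are hypothetical, and the number of expressed positions (`n_expressed=2`) is an arbitrary assumption, not a measured value.

```python
import random


def integrate(array, cassette):
    """Insert a cassette at attI, i.e. at the promoter-proximal position."""
    return [cassette] + array


def excise(array, start, length=1):
    """Remove `length` contiguous cassettes starting at index `start`.

    Returns the excised block (the circular intermediate) and the
    remaining array; the input list is left unchanged.
    """
    block = array[start:start + length]
    remaining = array[:start] + array[start + length:]
    return block, remaining


def expressed(array, n_expressed=2):
    """Cassettes close enough to Pc to be expressed (cutoff is an assumption)."""
    return array[:n_expressed]


def reshuffle(array, rng):
    """Excise one randomly chosen cassette and reintegrate it at attI."""
    if not array:
        return array
    start = rng.randrange(len(array))
    block, remaining = excise(array, start)
    return block + remaining


if __name__ == "__main__":
    rng = random.Random(0)
    array = ["cas1", "cas2", "cas3", "cas4", "cas5"]
    array = integrate(array, "new")      # newly acquired cassette is expressed
    print(array, "->", expressed(array))
    array = reshuffle(array, rng)        # a stored cassette may regain expression
    print(array, "->", expressed(array))
```

In this representation, integration changes which cassettes are expressed without losing any stored cassette, which mirrors the reservoir and switching roles attributed to the array in the model.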

While most contingency loci can only access a narrow range of phenotypes, integrons

can potentially access a huge metagenomic repertoire of gene cassettes that encode a wide

range of different functions. In at least some species, an extended version of the system is

necessarily able to generate new cassettes to enrich the existing pool. In addition, the stepwise

attenuation of expression resulting from successive integration events leads to the progressive

relaxation of selective pressure. This particular regime may drive accelerated diversification

of cassette-encoded proteins (see Impact of expression strength on sequence evolution, p106).

Overall, the functions carried in the cassettes might provide ready-made opportunities

to adapt to a vast panel of different environments. The integration of a new cassette can

either provide an advantageous phenotypic trait in the current conditions, in which case it will

be selected, or unproductively take previously integrated advantageous traits away from expression,

an event that must be counter-selected. In this light, the integron system, by stockpiling

elements that previously proved adaptive, provides a form of molecular memory. Drastic

reductive evolution leading to the loss of unexpressed gene cassettes is limited by the presence

of addiction modules (TA systems) in large arrays. As discussed previously, the rate of

phenotypic switching is an essential parameter in such systems (see Stochastic switches as a

bet-hedging strategy, p102 and Evolution of recombination rate in integrons, p152). However,

the effective recombination dynamics of integrons were essentially unknown until recently.

b - An unknown recombination dynamic

The functional activities of integron integrases have been demonstrated experimentally

over a wide range of substrates and model systems. However, in all these studies the integrase

is overexpressed, generally from a plasmidic vector. In fact, spontaneous changes in cassette

arrays within a single strain have never been observed in controlled conditions.

Nevertheless, evidence of modification in integron cassette arrays is abundant. This

is manifest in the continuum of slightly different arrays identified in mobile integrons recovered

from clinical isolates. An overview of these different structures can be found on the integrall

website (http://integrall.bio.ua.pt/; Moura et al., 2009). Valuable information concerning

large chromosomal integrons comes from the comparison of completely sequenced arrays. The

large chromosomal arrays reported in several vibrionaceae are highly variable and most cassettes

are only found in one genome (Boucher et al., 2006; and see Table 5). Even the arrays of

the two closely related V. vulnificus strains CMCP6 and YJ016, which respectively harbor 211

and 188 cassettes, show little similarity in composition or order (Chen et al., 2003). In fact,

integrons stand as one of the most variable genomic loci in these organisms. This feature has

been used as a typing system to finely resolve phylogenetic relationships between otherwise

identical strains (Labbate et al., 2007). However, this method relies on the length resolution of

cassette PCR amplicons and does not allow clear identification of the recombination events.

A fine analysis of the recombination dynamic would require sequencing the array of

very closely related isolates. Such a situation is unlikely to occur by the random sequencing of

environmental bacteria. Nevertheless, the comparison of three complete genome of

Integrons and evolution – Integrons as sophisticated contingency loci


pathogenic V. cholerae strains isolated from related pandemic outbreaks recently provided a better picture (Feng et al., 2008). Overall, 207 different cassettes, of which only 36 are unique, have been identified. As illustrated in Figure 39, there is substantial conservation of cassette order between genomes. More precisely, the pattern of synteny is organized in cassette blocks. This strongly supports the idea that several cassettes can be excised and reintegrated together. As expected from the model discussed above, the distal part of the array is less variable than the proximal one. Several observations are consistent with the mobilization of distal cassettes and their subsequent reintegration at attI (see arrows in Figure 39B). Interestingly, many cassettes seem to have been duplicated, most probably through the copying of pre-existing cassettes in the same integron. For instance, of the 40 cassettes which were apparently integrated at the attI site of M66-2 since divergence from N16961 (block A, Figure 39), 21 were copied from downstream cassettes. Among these, one is located in block A, 12 are scattered in downstream blocks and 8 correspond to cassettes present in N16961 but absent in M66-2. These last cassettes may have been duplicated and then lost, or may correspond to effective mobilization from the distal to the proximal part of the array. These data are in accordance with the mobilization of cassettes as single-stranded substrates (see Figure 33). Based on a sophisticated whole-genome analysis, the authors estimated that the M66-2 and N16961 strains (which were isolated in 1937 and 1971 respectively) diverged around 1923, while strain O395 (isolated in 1965) diverged from the former clade in 1880. Hence, substantial variation in the cassette arrays accumulated in a few decades.

A better understanding of the recombination dynamic of integrons would benefit from experimental evolution studies. However, the apparent absence of recombination under laboratory conditions has seriously hampered this approach. What is the source of the discrepancy between the overwhelming variability observed in natura and the stability observed in controlled conditions? Obviously, the natural selection of newly expressed traits upon integration plays a significant role in the cassette dynamic. As no adaptive recombination was observed when a

Introduction – The integron genetic system


resistance cassette capable of overcoming an antibiotic selection was provided in experimental systems, the absence of recombination cannot stem from a mismatch between the selective pressure and the adaptive potential carried in the array. Instead, the stability of the array results from limiting rates of recombination. The integrase is indeed expressed at very low levels in standard laboratory cultures. Despite extensive efforts, the intI1 promoter has proved impossible to map in these conditions (M.C. Ploy, unpublished observation). Interestingly, overexpression



of integron integrases is deleterious, as attested by growth and survival rates (Mazel lab, unpublished observations). In addition, a recent study reported that about one third of the intI genes identified in public databases are pseudogenes, inactivated by either an internal stop codon or a frameshift mutation (Nemergut et al., 2008).

We found that integrase expression is actually controlled by a broad physiological stress response in a majority of integrons. This feature endows the integron system with the ability to adapt responsively to environmental changes. This is presented in the first section of the results (see Recombination in integrons is controlled by the SOS response to stress,





Results – Evolution of recombination rate in integrons





Integrons can be regarded as sophisticated contingency loci (see p144) capable of

switching the expression state of independent gene cassettes harboring diverse functions.

These systems thus exploit a bet-hedging strategy wherein the population diversifies by site-

specific recombination into subpopulations expressing different cassettes. The extent of this

diversification is constrained by the actual recombination rate. Although this parameter is es-

sential in understanding the adaptive properties of integrons, the dynamic of cassette recom-

bination remains largely unknown (see p146). Theoretical studies on phenotypic switches

suggest that the recombination rate should display a strong dependency on the rate at which

challenging environmental changes occur (a challenging environment being one that can be

overcome through cassette swapping) (see p103). In this work, we developed a model to track

the optimal recombination rate of an “idealized” integron under fluctuating selection.


We used C-implemented Monte-Carlo simulations to model the evolution of the site-

specific recombination rate in an integron subjected to a randomly fluctuating environment.

The fitness conferred by each cassette in the integron array is given by a stabilizing fitness

function. The optimum of this function is randomly shifted during environmental changes,

which occur stochastically at a predefined rate. Only the first cassette in the array is expressed

and hence contributes to fitness. Deleterious mutations in non-expressed cassettes do not af-

fect the phenotype. The integrase is treated as a modifier locus which is indirectly selected

through its effect on cassette expression. Mutation between integrase alleles associated with different recombination rates is allowed. The mean recombination rate of the population, its mean fitness and the proportion of deleterious mutations are recorded at each generation.


Low rates of cassette recombination ensure the stability of the system in steady conditions. However, too little recombination would limit the adaptability of integrons facing variable environments. In contrast, excessively high recombination rates decrease the mean fitness and are thus counter-selected. We found that the optimal recombination rate selected in fluctuating environments is linearly proportional to the rate of environmental change. Though it has no detectable effect on recombination rate, the accumulation of deleterious mutations in non-expressed cassettes is inversely proportional to the rate of environmental variation. This suggests that cassette-borne functions may evolve at a higher rate than continuously expressed genes. The optimal mutation rate can be achieved with a distinct strategy, wherein recombination events occur punctually in the whole population as a stress response to environmental changes. This possibility is addressed in articles 2 and 3 (see Recombination in integrons is controlled by the SOS response to stress, p171).


This article is in preparation for submission to PLoS One or to the Journal of Evolutionary Biology (pp 154-170).


Article I


Title: Evolution of site-specific recombination rates in integrons

under fluctuating selection.

Cambray Guillaume1*, Chevin Luis-Miguel2* and Didier Mazel1

1 Institut Pasteur, Unité de Plasticité du Génome Bactérien, CNRS URA 2171, 75015 Paris,


2 Ecologie, Systématique & Evolution, Université Paris-Sud XI, 91400 Orsay, France

*: These authors contributed equally to this work


Integrons are complex genetic structures, which are notably responsible for most of the antibiotic multi-resistance phenotypes that threaten our control over pathogenic bacteria. The

system maintains an array of unexpressed gene cassettes that stands as a reservoir of potential

genetic variation. Silent cassettes can be randomly recombined at an expression site by a

dedicated integrase. This establishes a switching mechanism that allows instantaneous expres-

sion of potentially adaptive functions. Here, we model the evolution of integrase-mediated re-

combination rate in a stochastically fluctuating environment. The integrase gene is a modifier

locus at which alleles can change the turnover rate of the expressed cassettes. Simulations

show that the mean recombination rate of a population would tend to fit the environmental

change rate. Cassette-borne genes are under relaxed selective constraints when not expressed.

We show that deleterious mutations tend to accumulate in the unexpressed part of the cassette

array. While this process does not affect the mutation rate in large populations, it may pro-

mote the functional diversification of cassette-encoded functions. We suggest that a stress re-

sponsive control of recombination rate may be an efficient alternative to a constitutively

determined bet-hedging strategy. This work highlights the importance of integrons as a major bacterial adaptive system through their effect on evolvability.




Bacteria are one of the most successful life forms (Gould, 1996). They have been isolated from a wide range of natural environments, some of them quite extreme. However, direct observation of natural microbial communities is difficult and little is known about the actual ecology of bacterial populations. Indeed, a great majority of bacterial species (>99%) cannot be cultivated, so that their detection and study are only possible through metagenomic approaches (Streit and Schmitz, 2004; Rusch et al., 2007). Despite the inherent difficulties in characterizing the ecological specificities of individual bacterial species, it is safe to generally regard those organisms as essentially sessile on a macroscopic scale. As a consequence, they cannot track their environment in space when it changes, and hence experience a wide variety of environmental variations, be they physical, chemical or biotic (Andrews,


The ubiquity of bacteria underlies the remarkable diversity of metabolisms developed over evolutionary time. The ability of bacterial populations to adapt rapidly to new and ever-changing environments has been documented in both experimental and natural conditions.

The long-term evolution experiment initiated in 1988 by R. Lenski and colleagues monitored

the adaptation of twelve replicate populations of Escherichia coli to a regime of exponential

growth in a nutrient-limited environment (Lenski et al., 1991; Lenski and Travisano, 1994).

Since then, numerous fitness-enhancing phenotypic variations have occurred more or less repeatedly in the different populations. These variations include morphological diversification (Lenski and Mongold, 2000; Philippe et al., 2009), topological modification of DNA (Crozat et al., 2005), global changes in gene expression (Cooper et al., 2003; Pelosi et al., 2006; Philippe et al., 2007; Cooper et al., 2008), and specialization and diversification of metabolic abilities (Cooper and Lenski, 2000; Cooper et al., 2001; Blount et al., 2008). Adaptive changes occurring in natural conditions are more difficult to identify and most examples involve medically or economically relevant pathogenic bacteria. The most striking illustration is certainly the ever more rapid development of resistance phenotypes following the introduction of new antibiotics (Hawkey, 2008).

According to Fisher’s fundamental theorem of natural selection (Fisher, 1930), the rate

of adaptation in a population equals the heritable variance in fitness in this population. The

speed of adaptive evolution can thus theoretically be increased by increasing the variance in

fitness in the population – its evolvability. The most straightforward way to achieve this is

through a genome-wide increase in mutation rate. Mutator strains are indeed found at non-negligible frequencies (0.1% to >60%) in pathogenic bacterial isolates (Denamur and Matic, 2006). Moreover, a mutator phenotype reached fixation in 3 of the 12 replicate lines propagated in Lenski’s long-term evolution experiment (Sniegowski et al., 1997). These strains are generally impaired in their ability to perform mismatch repair, a major DNA repair system also involved in recombination. In particular, mutations in the mutS and mutL genes result in up to a 100-fold increase in mutation rates (Denamur and Matic, 2006).

Genes that can affect the mutation rate by their activity can be abstracted as modifier

loci which are subjected to indirect selection by hitchhiking with the mutations they contrib-

uted to generate (Kondrashov, 1995). Because most mutations are deleterious (Eyre-Walker

and Keightley, 2007), a general mutator is usually counter-selected. Nevertheless, the production rate of advantageous mutations increases with the initial maladaptation of the population (Silander et al., 2007; Martin and Lenormand, 2006). Hence, increased mutation rates are more likely to be beneficial in fitness-compromising conditions. Indeed, mutators readily provide a short-term adaptive advantage in new, changing or heterogeneous environments (Giraud et al., 2001). The rise in frequency of a mutator genetically linked to a beneficial mutation is only a consequence of adaptation, and does not constitute an adaptation in itself (Sniegowski et al., 2000). Instead, the accumulation of deleterious mutations over time hampers the long-term success of mutator populations (Funchain et al., 2000; Cooper and Lenski, 2000; Zeyl et al., 2001). Once adaptation has occurred, a modifier locus experiences strong selective pressure toward a lower mutation rate if the environment remains constant (De Visser, 2002). As a result, the spontaneous mutation rates observed in various microbes are surprisingly steady and are thought to be actively maintained at the lowest level afforded by the cost of fidelity (Drake et al., 1998).

The evolution of increased evolvability through an increase in the genome-wide muta-

tion rate is thus heavily constrained by the prevalence of deleterious mutations. To circumvent

this limitation, some loci are subjected to frequent, stochastic and heritable modifications me-

diated by dedicated genetic or epigenetic mechanisms (van der Woude and Baumler, 2004;

and see Discussion). These specific mutations are easily reversible, which promotes the con-

stitutive wavering between well defined phenotypic states. Such localized increase in muta-

tion allows the combinatorial diversification of target functions while limiting the potential

deleterious effects of mutations at loci that do not need to evolve.

Integrons constitute a particularly sophisticated example of such systems. A typical integron consists of a stable platform associated with a variable array of dedicated gene cassettes. The functional platform constitutes a tightly packed locus comprising an intI gene, coding for a site-specific recombinase, a primary recombination site attI and a promoter Pc oriented toward attI (figure 1). The gene cassettes integrated in integron arrays are generally composed of a single promoterless ORF flanked by two attC recombination sites (Mazel, 2006). The integrase catalyses the recombination of cassettes through a cut-and-paste mechanism whereby cassettes are randomly excised from the cassette array (attC x attC recombination) to be preferentially integrated at attI, downstream of the Pc promoter (Collis et al., 1993; Collis et al., 2001). This oriented process ensures instantaneous expression of the mobilized cassettes, while previously integrated cassettes are progressively moved away from Pc (Collis and Hall, 1995). Overall, only the first few cassettes in the array are expressed and hence subjected to selection, while the others constitute a silent reservoir of potential genetic variation (figure 1). Exogenous cassettes taken up from the environment or brought in by mobile elements can be incorporated in the array, thereby enriching the repertoire of available functions (Holmes et al., 2003; Biskri et al., 2005).

Two distinct forms of integrons are generally distinguished in the literature. Mobile integrons (MI) were the first to be identified, through their involvement in antibiotic multi-resistance phenotypes (Martinez and de la Cruz, 1988; Stokes and Hall, 1989). They are located on mobile genetic elements such as ICEs, plasmids and transposons, which permit their dissemination and potentially make them efficient shuttles for the transfer of cassettes between genomes (Biskri et al., 2005). They comprise only a few cassettes (up to 8; Naas et al., 2001b), which typically encode antibiotic resistance proteins (Fluit and Schmitz, 2004). Chromosomal integrons (CI), in contrast, are essentially sedentary. They have been identified in around 10% of the bacterial genomes sequenced to date (Boucher et al., 2007). A subset of these CIs, often referred to as superintegrons, comprise large arrays that can span hundreds of cassettes and are hypothesized to play a major role in the generation of cassettes (Mazel, 2006), a process which otherwise remains to be unraveled. Most CI cassettes harbor genes of unknown function. Nevertheless, the functions that can be predicted are very diverse and a substantial part of them are involved in substrate modification (acetyltransferases) or interactions with biotic factors (virulence factors and DNA modification) (Boucher et al., 2007). Moreover, 10 to 30% of the genes potentially encode proteins carrying a signal peptide region for either membrane association or export from the cell (Koenig et al., 2008). Altogether, these data suggest that cassette-encoded genes can mediate adaptation to a wide range of environmental conditions. Both the functional platform and the cassettes of MIs are thought to derive from CIs (Mazel, 2006; Labbate et al., 2009). In this light, the impressive ability of bacteria to rapidly overcome such drastic environmental changes as those imposed by the human use of antibiotics relies heavily on the recruitment of pre-existing integrons. This illustrates the capacity of the system to cope with ever-changing environments.

Obviously, the shuffling of gene cassettes introduces variability in the integron regarding which traits are expressed, and this process depends directly on the system’s recombination rate. The functionality of integron integrases has been demonstrated experimentally over a wide range of substrates and model systems. However, the integrase was always artificially overexpressed in these studies and spontaneous recombination events have never been observed in controlled conditions. Thorough epidemiological studies designed to monitor the spread of multi-resistant MIs revealed a continuum of cassette arrangements, attesting to their effective diversification in natura (Moura et al., 2009). Considerable variability has been observed in CIs, even between closely related bacterial species and strains (Boucher et al., 2006). The integron locus is actually one of the most variable genomic loci, a feature that has been used to finely resolve phylogenetic relationships between otherwise identical isolates (Labbate et al., 2007). One of the most precise examples to date identified numerous rearrangements between three pandemic V. cholerae strains over a one-century period. More accurate estimates of recombination dynamics would rely on the comparison of very closely related arrays, which is difficult to achieve in practice through the sequencing of random natural isolates. Hence, although the evolvability bestowed by integrons relies on cassette rearrangement, the recombination rate in these systems remains enigmatic.

To shed light on this question, we model the evolution of site-specific recombination in an integron subjected to a fluctuating environment, which entails the need to constantly evolve. In essence, environmental changes in nature are stochastic. Their effects on natural selection are difficult to capture in a purely analytical model without strong simplifying assumptions. To avoid oversimplification and to accurately describe the integron system, we develop a Monte-Carlo simulation scheme incorporating a quantitative-genetic-based modeling of fitness. We show that the selected recombination rate tends to fit the rate of environmental shifts. Moreover, we highlight that mutations tend to accumulate in non-expressed cassettes, particularly under slowly fluctuating conditions.




We used Monte-Carlo simulations to model the evolution of the site-specific recombination rate in an integron, in a randomly fluctuating environment. An integron with $C$ cassettes was modeled such that, for each cassette $c_i$, its genetic value $z_i$ was drawn from a uniform distribution of variance $\sigma_c^2$. The fitness of each cassette relative to the best possible genotype was then assigned by applying a stabilizing fitness function, such that for a genotype of value $z$, its relative fitness is:

$$W(z) = W_0 + (1 - W_0)\, e^{-\frac{(z - \theta(t))^2}{2\sigma^2}}$$

The Gaussian term in this function, $e^{-(z-\theta(t))^2/(2\sigma^2)}$, refers to selection for an optimum genotype whose value $\theta(t)$ can change in time; $\sigma$ sets the width of the Gaussian. The term $W_0$ ($0<W_0<1$) is a constant representing the basal fitness of the organism irrespective of which cassette is currently expressed. This latter term can be viewed as the dispensability of the integron: when $W_0$ equals 1, the fitness of the individual is maximal whatever cassette is expressed, such that the integron does not improve the fitness of individuals; in contrast, when it equals 0, the fitness of individuals depends strongly on which cassette is expressed by the integron. For simplicity, a single cassette in the array, the one integrated at the expression site attI, is considered to be expressed. The remaining $C-1$ cassettes are silent and constitute the reservoir of genetic variation. Besides the expressed and unexpressed cassettes, integrons bear an integrase locus, the product of which is responsible for cassette excision and subsequent integration. This locus determines the site-specific recombination rate, and hence the turnover pace of expressed cassettes. Mutations at this locus can potentially impact the mean cassette turnover rate, making it a modifier of the system, just as in models of modifiers of homologous recombination, segregation or mutation, for instance (Kondrashov, 1995). Polymorphism in the recombination rate trait was allowed at this locus, such that there could be up to $I$ different alleles, the recombination rate of each allele $int_i$ being $r_i$. At the beginning of each run of the simulation, we set $r_1=0$ (no recombination), while the recombination rate $r_i$ of each of the other alleles was drawn randomly. Specifically, we used $r_i=10^{-x_i}$, where the exponent $x_i$ was drawn uniformly between 1 and 4.5, so that a wide range of recombination values was explored. During the course of the simulation, mutations were then allowed to occur at the integrase locus at equal rate $\mu$ among all pairs of alleles. To model the effect of the integrase locus on the recombination rate, at each generation a proportion $r_i$ of the expressed cassettes of individuals carrying allele $int_i$ at the integrase locus was replaced by other cassettes from the unexpressed pool of the same individuals.
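As a minimal sketch of the two ingredients described above (it is not the C implementation used in this work), the stabilizing fitness function and the per-generation cassette turnover could be written as follows. Two assumptions are made here: the Gaussian term is taken in its conventional form with a width parameter `sigma`, and the replacement cassette is drawn uniformly among the C-1 silent cassettes. The function names are purely illustrative.

```python
import math


def relative_fitness(z, theta, w0, sigma):
    """Stabilizing fitness of a cassette of genetic value z.

    Assumed form: W(z) = W0 + (1 - W0) * exp(-(z - theta)^2 / (2 * sigma^2)),
    where theta is the current optimum and w0 the dispensability.
    """
    gaussian = math.exp(-((z - theta) ** 2) / (2.0 * sigma ** 2))
    return w0 + (1.0 - w0) * gaussian


def recombine(freqs, r):
    """Apply one generation of cassette turnover for one integrase allele.

    freqs[k] is the frequency of individuals expressing cassette k.
    A proportion r of each class swaps its expressed cassette for one of
    the other C-1 cassettes, chosen uniformly (an assumption of this sketch).
    """
    c = len(freqs)
    moved = [f * r for f in freqs]
    total_moved = sum(moved)
    # Class k keeps its non-recombinants and receives an equal share of
    # the recombinants leaving every other class.
    return [f - m + (total_moved - m) / (c - 1) for f, m in zip(freqs, moved)]
```

With `r = 0` the frequencies are left unchanged, which corresponds to the non-recombining allele, and with `w0 = 1` every cassette has the same fitness regardless of the optimum.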



Fluctuations in the environment were modeled as changes in the optimal genetic value $\theta(t)$. These changes happened randomly in time at rate $e$. A new genetic value was drawn from a uniform distribution of variance $\sigma_e^2=\sigma_c^2$, such that the potential values of the optimum and those of the actual genetic values of cassettes fully overlapped; we assumed no autocorrelation, meaning that $\theta(t)$ was independent of $\theta(t-1)$. To improve computational efficiency in cases where $e<1/20$, environmental shifts were modeled as a Poisson process, such that the time in generations between two changes was drawn from an exponential distribution of parameter $e$. When mentioned, mutation was also allowed inside the cassettes at a specified rate. We assumed that those mutations had deleterious effects, as a consequence of either pleiotropic effects on traits not considered in the model, or of a general decrease in the efficiency of the protein encoded by the affected cassette. The Gaussian term in the fitness of a cassette affected by $m$ mutations was then multiplied by $(1-s)^m$ if $m < m_{max}$, and by 0 if $m \geq m_{max}$. In practice, $m_{max}$ was set to 0 or 1 in this work to model the absence of mutations and the occurrence of drastically deleterious mutations.
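The environmental process and the mutational penalty can be illustrated with a short sketch, under stated assumptions: the uniform distribution of the optimum is taken to be centered on zero (the text only specifies its variance), so that its half-width is the standard deviation times the square root of 3, and the helper names are hypothetical.

```python
import math
import random


def draw_optimum(sigma_e, rng):
    """Draw a new optimum from a uniform distribution of variance sigma_e^2.

    A uniform law on [-a, a] has variance a^2 / 3, hence a = sigma_e * sqrt(3).
    Centering on zero is an assumption of this sketch.
    """
    half_width = sigma_e * math.sqrt(3.0)
    return rng.uniform(-half_width, half_width)


def generations_to_next_shift(e, rng):
    """Waiting time until the next shift for a Poisson process of rate e."""
    return rng.expovariate(e)


def mutation_factor(m, s, m_max):
    """Multiplier applied to the Gaussian fitness term after m mutations."""
    return (1.0 - s) ** m if m < m_max else 0.0


# Example: with e = 0.01, shifts are separated by about 100 generations on average.
rng = random.Random(1)
delays = [generations_to_next_shift(0.01, rng) for _ in range(20000)]
mean_delay = sum(delays) / len(delays)
```

Drawing the waiting time directly avoids testing for a shift at every generation, which is presumably why this shortcut pays off when shifts are rare.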

To model genetic drift, we used a genotype-based framework with multinomial sampling adapted from that of Tenaillon et al. (1999). This framework has the caveat that it requires prior knowledge of all possible genotypes, but it is efficient in terms of computer time and well suited to modeling very large populations over many generations.
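The principle of a genotype-based step with multinomial sampling can be sketched as below. This is only an illustration of the idea, not the framework of Tenaillon et al.: it samples individuals one by one with the standard library, which is practical only for small populations, whereas a dedicated multinomial generator would be needed for the population sizes used in this work. The function name and signature are hypothetical.

```python
import random
from collections import Counter


def next_generation(freqs, fitnesses, n, rng):
    """Selection followed by drift on a fixed list of genotypes.

    freqs[g] and fitnesses[g] describe genotype g. The n offspring are
    drawn with probabilities proportional to freqs[g] * fitnesses[g],
    which amounts to one multinomial draw over the genotype classes.
    """
    weights = [f * w for f, w in zip(freqs, fitnesses)]
    draws = rng.choices(range(len(freqs)), weights=weights, k=n)
    counts = Counter(draws)
    return [counts[g] / n for g in range(len(freqs))]
```

Because only genotype frequencies are tracked, the memory cost depends on the number of genotypes rather than on the number of individuals, at the price of having to enumerate all genotypes beforehand.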

We aim at understanding how the genetic properties of the integron, such as the turnover rate of expressed cassettes, change as an adaptation to environmental fluctuations. In each simulation run, a burn-in period of 10 environmental changes was allowed to elapse so that the system could reach its dynamical equilibrium. The population was then left to evolve for another 100 environmental changes, during which the mean recombination rate r and the mean fitness of the population were recorded at each generation and averaged over the 100 environmental changes.


We developed a simulation framework to study the evolution of recombination rate in a stochastically fluctuating environment. To provide an overview of the model behavior, we plot the evolution of the frequency of integrase alleles in a population of integrons over a period spanning 75 environmental changes in one simulation run (figure 2A). The population consisted of $N=10^8$ integrons and comprised $I=6$ different integrase alleles associated with an array of $C=5$ different gene cassettes, of which only one is expressed in each individual. All 30 possible genotypes are initially introduced at equal frequency in the population. Environmental shifts occur stochastically according to a predefined rate. To highlight the dynamics of integrase alleles in diverse contexts, the rate of environmental change was initially set to $10^{-2.5}$ and was decreased by half a log unit (a factor of $10^{-0.5}$) every 25 shifts, resulting in 3 successive regimes of selection.
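The three regimes of this illustrative run follow a simple schedule, sketched here under the assumption that the exponent of the rate drops by 0.5 at every block of 25 shifts; the function and parameter names are hypothetical.

```python
def shift_rate(shift_index, initial_exponent=-2.5, step=0.5, shifts_per_regime=25):
    """Rate of environmental change in force after `shift_index` shifts.

    The rate starts at 10**initial_exponent and its exponent is lowered
    by `step` each time `shifts_per_regime` shifts have occurred.
    """
    regime = shift_index // shifts_per_regime
    return 10.0 ** (initial_exponent - step * regime)


# Rates applying to the three successive regimes of the 75-shift run.
regime_rates = [shift_rate(k) for k in (0, 25, 50)]
```

Under this reading, the three regimes correspond to exponents of -2.5, -3 and -3.5.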

Before any environmental shift occurs, the genotype expressing the fittest cassette associated with the non-functional integrase (no recombination) is transiently favored. This allele is strongly counter-selected at the first shift, because it does not allow the generation of the diversity necessary to adapt to new conditions. The two alleles with the highest recombination rates were also rapidly counter-selected, for the opposite reason: their associated array is not stable enough to sustain selection in steady environments. In contrast, the three integrase alleles characterized by intermediate recombination rates rose in frequency. A burst of successive environmental variations quickly led to the drop of the allele with the lowest rate. Then, one allele predominated in this regime, with a few occasional takeovers by the allele with the next lower rate, correlating with periods of relative environmental stasis. After a lag, the passage to the next regime of environmental change indirectly drove the fixation of this latter allele. Similarly, the slowest fluctuating regime led to the rise of the allele with the lowest recombination rate, which was previously counter-selected. We also calculated the mean recombination rate in the population at each generation. As illustrated in figure 2B, the mean recombination rate selected indirectly via the fitness effect of gene cassettes tends to stabilize in each regime to fit the imposed fluctuation rate. Overall, each regime promotes either the fixation of one specific allele, or the maintenance of a polymorphism at the integrase locus, resulting in a relatively stable recombination rate over the long term; this steady-state recombination rate changes with the speed of environmental fluctuations.

Prompted by this observation, we undertook a more systematic approach to monitor the evolution of the mean recombination rate r over a range of environmental change rates e, for different values of specificity and dispensability. Overall, the results confirmed the existence of a linear relationship between these two variables (figure 3). Each point represents the average of 100 independent runs. In each run, different cassette values and recombination alleles were randomly sampled and the mean recombination rate over 100 stochastic environmental shifts was calculated. Despite the high level of noise imposed by this method, the mean recombination rate conforms remarkably well to the fluctuation rate corrected by the expected number of switches necessary to recombine the best cassette (figure 3). The dispensability of the integron, i.e. its contribution to global fitness, does not seem to influence this result on its own. In contrast, low cassette specificity, which is modeled by a wider Gaussian fitness curve, results in lower recombination rates, especially at high rates of environmental change. This can be understood as a consequence of clonal interference (Gerrish and Lenski, 1998; De Visser et al., 1999). The simultaneous occurrence of cassettes with similar effects on fitness decreases their relative selection coefficients and results in slower evolutionary dynamics. Under rapid environmental change, the frequency of cassettes does not change fast enough to allow adaptation, even when the best cassette has been reached by recombination. In this context, there is no advantage in increasing the mutation rate. Moreover, it has been suggested that under very rapid environmental change, it may be advantageous to decrease evolvability, since a genetic response in one generation often decreases adaptation in the next generation (Kawecki, 2000). The combination of these two factors may explain why the recombination rate decreases at high rates of environmental change and weak selection, a result that has not been described in previous models of mutators. Note also that the dispensability of cassettes does reduce the mean recombination rate under high specificity of cassettes.

In a given integron, only the cassettes proximal to the Pc promoter are expressed and thus subjected to directional selection. The remaining cassettes experience relaxed selective pressure and may accumulate deleterious mutations, because those cannot be efficiently purged by natural selection. This should produce a decrease in mean fitness – a genetic load – that is different from the one directly caused by mutation itself, and more similar to the drift load (Hartl and Taubes, 1998; Poon and Otto, 2000). We term it the silencing load. To address the importance of this silencing load, we incorporated a rate of deleterious mutations in the previous framework. We considered a drastic mutational model wherein a single mutation leads to gene inactivation. We carried out the same simulations as described previously, and monitored the mean frequency of inactivated genes in the cassette reservoir, L.
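To give a feel for the time scale of the drastic mutational model, the following Python sketch computes the expected fraction of silent cassettes that carry an inactivating mutation when no purging occurs at all. It is a back-of-the-envelope illustration, not the simulation used in this work. It assumes that each unexpressed cassette is hit independently with a constant probability u per generation, and it takes u = 10⁻⁶ from the legend of figure 4, read here as a per-cassette rate. The function names are hypothetical.

```python
import math


def inactivated_fraction(u: float, generations: int) -> float:
    """Expected fraction of silent cassettes inactivated after `generations`
    of relaxed selection, assuming no purging (an upper bound on the load)."""
    return 1.0 - (1.0 - u) ** generations


def generations_to_reach(u: float, fraction: float) -> int:
    """Smallest t such that 1 - (1 - u)**t >= fraction."""
    return math.ceil(math.log1p(-fraction) / math.log1p(-u))


if __name__ == "__main__":
    u = 1e-6  # assumed per-cassette inactivation rate per generation
    for t in (10**3, 10**4, 10**5):
        print(f"t = {t:>6} generations: {inactivated_fraction(u, t):.5f}")
    print("generations of uninterrupted silence needed for 8%:",
          generations_to_reach(u, 0.08))
```

Under these assumptions, a cassette must remain silent for tens of thousands of generations before the inactivated fraction reaches a few percent. Any episode of expression under selection can only lower this neutral expectation.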

Results – Evolution of recombination rate in integrons

Article I

We found that L is inversely proportional to e (figure 4). Under rapidly cycling environments (e=10⁻⁴), the non-expressed compartment had the time to accumulate up to 8% of inactivated cassettes, irrespective of the set of parameters used. As the frequency of environmental shifts decreases, simulations run with a low cassette specificity (σ=0.8) progressively accumulate more deleterious mutations than their counterparts. Similarly, simulations run with higher dispensability (ω0=0.5) display an increased silencing load. Because both of these parameters decrease the intensity of selection, these data strongly suggest that natural selection is involved in purging the silencing load. In rapidly fluctuating environments, cassettes experience environments in which they prove adaptive at a higher rate and thus tend to spend less time in the non-expressed compartment. The silencing load clearly reflects the frequency at which deleterious mutations are purged by selection when cassettes are put under expression in a favorable environment. Overall, the silencing load does not affect the population fitness. Although heavy loads are essentially accumulated in slowly fluctuating environments, these conditions also provide the sustained periods of stasis required for efficient purifying selection. In these conditions, the dynamics of natural selection are fast enough to mediate efficient adaptation. In rapidly changing environments, the efficiency of selection is reduced, but favorable environments occur fast enough to limit the impact of deleterious mutations on fitness. The introduction of deleterious mutations had no impact on the selected recombination rate (figure 3).


To be written.

Essential points that will be raised include:

Comparison of the integron system with other loci subjected to diversity-generating mechanisms (e.g. SSRs, gene conversion, epigenetic switches and other systems relying on site-specific recombination). Highlight the general scope of integrons with respect to these systems.

Similar relationships between the optimal rate of phenotypic switches and the rate of environmental variations have been reported in theoretical (Kimura, 1967; Lachmann and Jablonka, 1996; Kussell et al., 2005) and experimental (Acar et al., 2008) studies. However, these models only consider two phenotypic states in binary environments that change with a constant period. Discuss the advantage of this stochastic model to address the complex case of integrons without such simplification.

Discuss the control of phenotypic plasticity by recombination-mediated expression. Contrast it with classical physiological regulation.

Discuss the bet-hedging strategy.

Discuss the benefit of stress-responsive regulation of integrase expression with respect to constitutive regulation.

Highlight the impact of the system (silencing load) on cassette diversification.




Acar M, Mettetal JT, van Oudenaarden A (2008) Stochastic switching as a survival strategy in fluctuating environments. Nat Genet 40(4): 471-475. Epub 2008 Mar 2023.

Andrews JH (1998) Bacteria as modular organisms. Annual review of microbiology 52: 126.

Biskri L, Bouvier M, Guerout AM, Boisnard S, Mazel D (2005) Comparative study of class 1 integron and Vibrio cholerae superintegron integrase activities. J Bacteriol 187(5): 1740-1750.

Blount Z, Borland C, Lenski R (2008) Historical contingency and the evolution of a key innovation in an experimental population of Escherichia coli. Proceedings of the National Academy of Sciences: 0803151105.

Boucher Y, Labbate M, Koenig JE, Stokes HW (2007) Integrons: mobilizable platforms that promote genetic diversity in bacteria. Trends in microbiology 15(7): 309.

Boucher Y, Nesbo C, Joss M, Robinson A, Mabbutt B et al. (2006) Recovery and evolutionary analysis of complete integron gene cassette arrays from Vibrio. BMC Evol Biol 6(1).

Collis CM, Grammaticopoulos G, Briton J, Stokes HW, Hall RM (1993) Site-specific insertion of gene cassettes into integrons. Molecular microbiology 9(1): 52.

Collis CM, Hall RM (1995) Expression of antibiotic resistance genes in the integrated cassettes of integrons. Antimicrob Agents Chemother 39(1): 162.

Collis CM, Recchia GD, Kim MJ, Stokes HW, Hall RM (2001) Efficiency of recombination reactions catalyzed by class 1 integron integrase IntI1. Journal of bacteriology 183(8): 2542.

Cooper TF, Remold SK, Lenski RE, Schneider D (2008) Expression profiles reveal parallel evolution of epistatic interactions involving the CRP regulon in Escherichia coli. PLoS Genet 4(2): e35.

Cooper TF, Rozen DE, Lenski RE (2003) Parallel changes in gene expression after 20,000 generations of evolution in Escherichia coli. Proc Natl Acad Sci U S A 100(3): 1072-1077.

Cooper VS, Lenski RE (2000) The population genetics of ecological specialization in evolving Escherichia coli populations. Nature 407(6805): 736-739.

Cooper VS, Schneider D, Blot M, Lenski RE (2001) Mechanisms causing rapid and parallel losses of ribose catabolism in evolving populations of Escherichia coli B. J Bacteriol 183(9): 2834-2841.

Crozat E, Philippe N, Lenski RE, Geiselmann J, Schneider D (2005) Long-term experimental evolution in Escherichia coli. XII. DNA topology as a key target of selection. Genetics 169(2): 523-532.

De Visser JAGM (2002) The fate of microbial mutators. Microbiology (Reading, England) 148(Pt 5): 1252.

De Visser JAGM, Zeyl CW, Gerrish PJ, Blanchard JL, Lenski RE (1999) Diminishing returns from mutation supply rate in asexual populations. Science 283(5400): 404-406.

Denamur E, Matic I (2006) Evolution of mutation rates in bacteria. Molecular Microbiology 60(4): 827.

Drake J, Charlesworth B, Charlesworth D, Crow J (1998) Rates of Spontaneous Mutation. Genetics 148(4): 1686.

Eyre-Walker A, Keightley PD (2007) The distribution of fitness effects of new mutations. Nat Rev Genet 8(8): 610-618.

Fisher RA (1930) The Genetical Theory of Natural Selection: Oxford University Press.



Fluit AC, Schmitz FJ (2004) Resistance integrons and super-integrons. Clinical microbiology and infection: the official publication of the European Society of Clinical Microbiology and Infectious Diseases 10(4): 288.

Funchain P, Yeung A, Stewart JL, Lin R, Slupska MM et al. (2000) The consequences of growth of a mutator strain of Escherichia coli as measured by loss of function among multiple gene targets and loss of fitness. Genetics 154(3): 970.

Gerrish PJ, Lenski RE (1998) The fate of competing beneficial mutations in an asexual population. Genetica 102-103(1-6): 127-144.

Giraud A, Matic I, Tenaillon O, Clara A, Radman M et al. (2001) Costs and Benefits of High Mutation Rates: Adaptive Evolution of Bacteria in the Mouse Gut. Science 291(5513): 2608.

Gould SJ (1996) Full House: The Spread of Excellence from Plato to Darwin: Three Rivers Press.

Hartl DL, Taubes CH (1998) Towards a theory of evolutionary adaptation. Genetica 102-103(1-6): 525-533.

Hawkey PM (2008) The growing burden of antimicrobial resistance. J Antimicrob Chemother 62 Suppl 1: i1-9.

Holmes AJ, Gillings MR, Nield BS, Mabbutt BC, Nevalainen KM et al. (2003) The gene cassette metagenome is a basic resource for bacterial genome evolution. Environ Microbiol 5(5): 383-394.

Kawecki TJ (2000) The evolution of genetic canalization under fluctuating selection. Evolution 54(1): 1-12.

Kimura M (1967) On the evolutionary adjustment of spontaneous mutation rates. Genet Res 9: 23-34.

Koenig JE, Boucher Y, Charlebois RL, Nesbo C, Zhaxybayeva O et al. (2008) Integron-associated gene cassettes in Halifax Harbour: assessment of a mobile gene pool in marine sediments. Environ Microbiol 10(4): 1024-1038.

Kondrashov AS (1995) Modifiers Of Mutation-Selection Balance - General-Approach And The Evolution Of Mutation-Rates. Genet Res 66(1): 53-69.

Kussell E, Kishony R, Balaban NQ, Leibler S (2005) Bacterial persistence: a model of survival in changing environments. Genetics 169(4): 1807-1814. Epub 2005 Jan 1831.

Labbate M, Boucher Y, Joss MJ, Michael CA, Gillings MR et al. (2007) Use of chromosomal integron arrays as a phylogenetic typing system for Vibrio cholerae pandemic strains. Microbiology (Reading, England) 153(Pt 5): 1498.

Labbate M, Case RJ, Stokes HW (2009) The integron/gene cassette system: an active player in bacterial adaptation. Methods in molecular biology (Clifton, NJ) 532: 125.

Lachmann M, Jablonka E (1996) The inheritance of phenotypes: an adaptation to fluctuating environments. J Theor Biol 181(1): 1-9.

Lenski RE, Mongold JA (2000) Cell size, shape, and fitness in evolving populations of bacteria. Scaling in biology: Oxford University Press. pp. 221-235.

Lenski RE, Rose MR, Simpson SC, Tadler SC (1991) Long-Term Experimental Evolution In Escherichia-Coli.1. Adaptation And Divergence During 2,000 Generations. Am Nat 138(6): 1315-1341.

Lenski RE, Travisano M (1994) Dynamics of adaptation and diversification: a 10,000-generation experiment with bacterial populations. Proc Natl Acad Sci U S A 91(15): 6808-6814.

Martin G, Lenormand T (2006) A general multivariate extension of Fisher's geometrical model and the distribution of mutation fitness effects across species. Evolution 60(5): 893-907.

Martinez E, de la Cruz F (1988) Transposon Tn21 encodes a RecA-independent site-specific integration system. Molecular & general genetics: MGG 211(2): 325.

Mazel D (2006) Integrons: agents of bacterial evolution. Nature Reviews Microbiology 4(8): 620.

Moura A, Soares Mr, Pereira C, Leitão N, Henriques I et al. (2009) INTEGRALL: a database and search engine for integrons, integrases and gene cassettes. Bioinformatics (Oxford, England).

Naas T, Mikami Y, Imai T, Poirel L, Nordmann P (2001) Characterization of In53, a class 1 plasmid- and composite transposon-located integron of Escherichia coli which carries an unusual array of gene cassettes. J Bacteriol 183(1): 235-249.

Pelosi L, Kuhn L, Guetta D, Garin J, Geiselmann J et al. (2006) Parallel changes in global protein profiles during long-term experimental evolution in Escherichia coli. Genetics 173(4): 1851-1869.

Philippe N, Crozat E, Lenski RE, Schneider D (2007) Evolution of global regulatory networks during a long-term experiment with Escherichia coli. Bioessays 29(9): 846-860.

Philippe N, Pelosi L, Lenski RE, Schneider D (2009) Evolution of penicillin-binding protein 2 concentration and cell shape during a long-term experiment with Escherichia coli. J Bacteriol 191(3): 909-921.

Poon A, Otto SP (2000) Compensating for our load of mutations: freezing the meltdown of small populations. Evolution 54(5): 1467-1479.

Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S et al. (2007) The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific. PLoS Biol 5(3): e77.

Silander OK, Tenaillon O, Chao L (2007) Understanding the evolutionary fate of finite populations: the dynamics of mutational effects. PLoS Biol 5(4): e94.

Sniegowski PD, Gerrish PJ, Johnson T, Shaver A (2000) The evolution of mutation rates: separating causes from consequences. BioEssays: news and reviews in molecular, cellular and developmental biology 22(12): 1066.

Sniegowski PD, Gerrish PJ, Lenski RE (1997) Evolution of high mutation rates in experimental populations of E. coli. Nature 387(6634): 703-705.

Stokes HW, Hall RM (1989) A novel family of potentially mobile DNA elements encoding site-specific gene-integration functions: integrons. Molecular microbiology 3(12): 1683.

Streit WR, Schmitz RA (2004) Metagenomics--the key to the uncultured microbes. Curr Opin Microbiol 7(5): 492-498.

Tenaillon O, Toupance B, Le Nagard H, Taddei F, Godelle B (1999) Mutators, population size, adaptive landscape and the adaptation of asexual populations of bacteria. Genetics 152(2): 485-493.

van der Woude M, Baumler A (2004) Phase and Antigenic Variation in Bacteria. Clin Microbiol Rev 17(3): 611.

Zeyl C, Mizesko M, de Visser JA (2001) Mutational meltdown in laboratory yeast populations. Evolution 55(5): 909-917.




Figure 1 - Schematic organization of the integron locus

Integrons form integrated genetic systems. The intI gene encodes a site-specific tyrosine recombinase capable of mobilizing dedicated gene cassettes. Most gene cassettes in the array are unexpressed. Excision of non-replicative cassette intermediates occurs through random attC x attC recombination mediated by IntI. Such intermediates are preferentially recombined in attI through IntI-mediated attC x attI recombination. Newly integrated cassettes are thus directly put under expression by the Pc promoter. These specific properties enable a versatile switching mechanism whereby recombination affects the expression of potentially adaptive






Figure 2 - [Caption not recoverable: the text was fragmented during extraction. The surviving fragments refer to the recombination rate and to the rate of environmental change.]




Figure 3 - Impact of the environmental change rate on the selection of recombination rate

Each data point corresponds to the recombination rate averaged over 100 simulation runs. In each run, the mean mutation rate selected over 100 environmental changes is recorded. Environmental changes occur stochastically according to a predefined rate. Filled triangles and filled squares indicate whether deleterious mutations were allowed or not, respectively. Colors distinguish different combinations of dispensability ω0 and specificity σ as follows: dark blue, ω0=0 and σ=0.2; light blue, ω0=0 and σ=0.8; violet, ω0=0.5 and σ=0.2; and red, ω0=0.5 and σ=0.8. The black line corresponds to y=(C-1)·x, where C-1 is the number of unexpressed cassettes in the array.



Figure 4 - Accumulation of deleterious mutations in unexpressed gene cassettes (silencing load)

The silencing load is defined as the frequency of unexpressed cassettes inactivated by deleterious mutations. Each data point corresponds to the silencing load averaged over 100 simulations. In each run, the mean mutation rate selected over 100 environmental changes is recorded. Environmental changes occur stochastically according to a predefined rate. Deleterious mutations occurred at a rate of 10⁻⁶ per generation. Colors distinguish different combinations of dispensability ω0 and specificity σ as follows: dark blue, ω0=0 and σ=0.2; light blue, ω0=0 and σ=0.8; violet, ω0=0.5 and σ=0.2; and red, ω0=0.5 and σ=0.8.

Results – Recombination in integrons is controlled by the SOS response to stress






Theoretical considerations strongly suggest that the optimal recombination rate in integrons must match the average rate of environmental changes (see Article 1, p153). Two distinct strategies can be envisioned to implement such a relationship: i) the recombination rate could be constitutively coded in the integrase, in which case it can be slowly fine-tuned through mutations affecting the protein activity and/or its expression level; and ii) the expression of the integrase could be responsively regulated by environmental changes. So far, the expression pattern of integrases has resisted experimental analysis (see p146), which is consistent with the second hypothesis. We thus undertook to identify stress responses capable of modulating the expression of the integrases.


The promoter regions of all intI genes deposited in GenBank were recovered and analyzed for the presence of specific sequence motifs using custom scripts, leading to the identification of LexA binding sites. To investigate the involvement of the SOS response, electromobility shift assays (EMSA) were performed with material from V. cholerae N16961. The expression of the integrase in different genetic backgrounds and stressing conditions was monitored using LacZ reporters in V. cholerae and a class 1 integron in E. coli. We devised a positively selectable reporter of recombination in order to further examine the link between integrase induction and recombination rate.


We identified a LexA binding motif in the promoter region of most integron integrases. The site is effectively bound by LexA in V. cholerae. The expression of the integrase is induced by classical triggers of the SOS response, including widely used antibiotics, in both V. cholerae and E. coli. In contrast, no induction was measured when the SOS response was impaired. The induction of the integrase has a clear functional impact and strongly increases the recombination rate. By ensuring diversification in response to a wide range of environmental challenges, the regulation of recombination rate promotes the evolvability of the organism. This complex adaptive phenotype arises from the coupling of two simpler genetic modules and involves only a few mutations. Mapping of the LexA binding sites identified in silico to the IntI phylogenetic tree revealed that the SOS control of recombination pervades marine species. In contrast, this trait appears very sporadically in soil and freshwater species, suggesting different selective pressures in these niches. Strikingly, all clinically relevant multi-resistance integrons are subjected to SOS control, irrespective of their phylogenetic relationships. This observation is particularly meaningful in the light of antibiotic-mediated induction of the integrase.

This article has been published as a Brevia in Science and concisely reports the SOS control of recombination rate (pp 154-186).

This manuscript is in preparation to be submitted to Nucleic Acids Research. It further discusses the implications of the coupling between the SOS and integron systems and its phylogenetic distribution (pp 186-211).

Article II


Article III


A manuscript to NAR

SOS control of recombination in integron is a primeval feature

Guillaume Cambray1*, Neus Sanchez-Alberola2*, Ivan Erill3*, Susana Campoy3, Émilie Guerin4, Sandra Da Re4, Bruno Gonzalez-Zorn5, Marie-Cécile Ploy4, Jordi Barbé3, Didier Mazel1

1 Institut Pasteur, Unité de Plasticité du Génome Bactérien, CNRS URA 2171, 75015 Paris, France

2 Departament de Genètica i Microbiologia, Universitat Autònoma de Barcelona, Barcelona, Spain

3 Department of Biological Sciences, University of Maryland Baltimore County, Baltimore 21228, USA.

4 Université de Limoges, Faculté de Médecine, EA3175, INSERM, Equipe Avenir, Limoges 87000, France

5 Departamento de Sanidad Animal, Facultad de Veterinaria, Universidad Complutense de Madrid, 28040 Madrid, Spain.

*: equal contribution




Integrons are found in the genome of hundreds of environmental bacterial species, but are mainly known as the genetic agents responsible for the capture and spread of antibiotic resistance determinants among Gram-negative pathogens. The SOS response is a regulatory network under control of the repressor protein LexA and is targeted at repairing and bypassing DNA damage, thus promoting genetic variation in times of stress. We recently reported a direct link between the SOS response and the expression of integron integrases in Vibrio cholerae and a plasmid-borne class 1 mobile integron. Here we conduct a systematic study of all integron integrase promoter regions available in genomic databases and we show that LexA controls the expression of most integron integrases. We also provide experimental validation of integrase LexA control for another Vibrio chromosomal integron and a multi-resistance plasmid harbouring two integrons. By mapping the distribution of predicted LexA-binding sites onto an IntI phylogeny, we propose that SOS control arose early and was probably the ancestral state in integron evolution. Importantly, these data indicate that SOS regulation has been positively selected for in mobile integrons. The coupling of both genetic systems enhances the potential for cassette swapping and capture in cells undergoing stress and changing conditions, while freezing the cassette arrangement in steady environments. In agreement with this, we find a strong correlation between the lack of LexA control and integrase inactivation by mutation, which suggests that unregulated integrase activity may be deleterious. This discovery highlights the role of integrons and the SOS response as integrated adaptive systems and will likely have important implications for antibiotic treatment policies.



Integrons are bacterial genetic elements capable of incorporating exogenous and promoter-less open reading frames (ORF), referred to as gene cassettes, by site-specific recombination (Figure 1). First described in the late 1980s in connection with the emergence of antibiotic resistance (Stokes and Hall, 1989), integrons always contain three functional components: an integrase gene (intI), which mediates recombination, a primary recombination site (attI) and an outward-orientated promoter (Pc) (Mazel, 2006a). Cassette integrations mainly occur at the attI site (Collis et al., 2002), ensuring the correct expression of mobilized cassettes’ genes by placing them under the control of Pc (Levesque et al., 1994). To date, two main subsets of integrons have been described. On the one hand, mobile integrons, also referred to as resistance integrons, contain relatively few (2-8) cassettes and encode resistance to a broad spectrum of antibiotics (Rowe-Magnus and Mazel, 2002; Fluit and Schmitz, 2004; Partridge et al., 2009). They have been conventionally divided into five different classes according to their intI gene sequence (Mazel, 2006a). These are typically associated with mobile elements, such as transposons and conjugative plasmids, ensuring their dissemination across bacterial species. They are present mostly in the Proteobacteria, but have also been reported in other bacterial phyla, such as Gram-positive bacteria (Mazel, 2006a). On the other hand, chromosomal integrons have been identified in the genomes of many bacterial species (Boucher et al., 2007). Although many chromosomal integrons comprise a limited number of cassettes (ref ACID), a subset of them – termed superintegrons (SI) – exhibits large arrays spanning hundreds of cassettes (Mazel, 2006a). SIs have been specifically identified in the Vibrionaceae and, to some extent, in the Xanthomonadaceae and Pseudomonadaceae (Mazel et al., 1998; Rowe-Magnus et al., 1999; Rowe-Magnus et al., 2001; Vaisvila et al., 2001; Rowe-Magnus et al., 2003; Gillings et al., 2005) and seem to be ancient residents of the host genome (Rowe-Magnus et al., 2001). In contrast with mobile integrons, most cassettes in a given SI display recombination sites (attC) that are typical of the species, suggesting that SI-harbouring bacteria are implicated in cassette genesis (Rowe-Magnus et al., 2003). Most cassette-borne genes in chromosomal integrons are of unknown function (Boucher et al., 2007), though some of them are related to existing resistance cassettes (Rowe-Magnus et al., 2002; Melano et al., 2002; Petroni et al., 2004). While stable under laboratory conditions, superintegrons have been reported to be the most variable loci among V. cholerae natural isolates (Rowe-Magnus et al., 1999; Labbate et al., 2007).

Despite the importance of integrons in the acquisition and spread of antibiotic resistance determinants and – from a broader perspective – in bacterial adaptation, little was known about the dynamics of cassette recombination. Integron integrases mediate recombination by interacting with single-stranded (ss) attC sites present in all reported cassettes, employing a unique site-specific recombination process (MacDonald et al., 2006; Bouvier et al., 2005; Bouvier, submitted). However, the level and control of integrase expression, which are central to this process, remained enigmatic until recently, when we reported that expression of the integrases of the V. cholerae superintegron and of a class 1 mobile integron were controlled by the SOS response (Guerin et al., 2009).

The SOS response is a global regulatory network governed by a repressor protein (LexA) and principally targeted at addressing DNA damage (Walker, 1984; Erill et al., 2007). LexA represses SOS genes by binding to highly specific binding sites present in their promoter regions. In E. coli and most β- and γ-Proteobacteria these sites consist of a 16 bp long palindromic motif (5’-CTGTatatatatACAG-3’), commonly known as the LexA box (Walker, 1984). The SOS response is typically induced by the presence of single-stranded DNA fragments (ssDNA), which can arise from a number of environmental stresses (Aertsen and Michiels, 2006), but is normally linked to replication fork stalling due to DNA lesions. These ssDNA fragments bind non-specifically to the universal RecA protein (Sassanfar and Roberts, 1990), enabling it to promote LexA inactivation by autocatalytic cleavage (Little, 1991) and thus inducing the SOS response. Up to 40 genes have been shown to be directly regulated by LexA in E. coli (Fernandez De Henestrosa et al., 2000; Courcelle et al., 2001), encoding proteins to stabilize the replication fork, repair DNA, promote translesion synthesis and arrest cell division. Following its initial description in E. coli (Walker, 1984), the SOS response has been characterized in many other bacterial classes and phyla and LexA has been shown to bind very different motifs in different phyla (Erill et al., 2007).
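The E. coli-like LexA box described above lends itself to a simple sequence scan. The Python sketch below slides a 16 bp window along a sequence and reports windows whose two 4 bp arms match CTGT and ACAG, tolerating a configurable number of mismatches in the arms. It is only an illustration of the idea: the function name, the mismatch threshold and the example sequence are assumptions, and it is not the custom script used in this work. Searches of this kind often score candidate sites against a weight matrix instead of a fixed pattern.

```python
LEFT_ARM = "CTGT"   # conserved 5' arm of the E. coli-like LexA box
RIGHT_ARM = "ACAG"  # conserved 3' arm
SPACER = 8          # variable AT-rich spacer between the two arms


def scan_lexa_like(seq: str, max_mismatches: int = 1):
    """Return (position, window, mismatches) for every 16 bp window whose
    arms differ from CTGT...ACAG at no more than `max_mismatches` positions."""
    seq = seq.upper()
    width = len(LEFT_ARM) + SPACER + len(RIGHT_ARM)
    consensus_arms = LEFT_ARM + RIGHT_ARM
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        arms = window[:len(LEFT_ARM)] + window[-len(RIGHT_ARM):]
        mismatches = sum(a != b for a, b in zip(arms, consensus_arms))
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits


if __name__ == "__main__":
    # Hypothetical promoter fragment containing one perfect LexA box.
    fragment = "ggCTGTatatatatACAGtt"
    for position, site, mismatches in scan_lexa_like(fragment):
        print(position, site, mismatches)
```

Because the motif is palindromic, a perfect site reads identically on both strands, so scanning one strand is sufficient for exact matches. Degenerate sites would require scanning the reverse complement as well.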

In recent years, the SOS response has been linked to clinically relevant phenotypes, such as the activation and dissemination of virulence factors carried in bacteriophages (Kimmitt et al., 1999; Waldor and Friedman, 2005), transposons, pathogenicity islands and integrating conjugative elements encoding antibiotic resistance genes (Erill et al., 2007; Kelley, 2006). Moreover, it has recently become established that some widely used antibiotics, such as fluoroquinolones, trimethoprim and β-lactams, are able to trigger SOS induction and are thus able to promote the dissemination of antibiotic resistance genes (Erill et al., 2007; Kelley, 2006) or the generation of resistant alleles (Cirz et al., 2005). This puts forward a positive feedback loop that has been postulated to have important consequences for the emergence and dissemination of antibiotic resistance (Avison, 2005). Our recent work demonstrating a direct link between the SOS response and integrase-mediated recombination further reinforces this line of reasoning, as it provides bacteria with an antibiotic-induced mechanism for gene acquisition, functional expression and dispersal (Guerin et al., 2009). Here we expand on this recent connection by means of a systematic study of integron integrase promoter regions. Putative LexA-binding sites are found in the majority of integron integrase promoters, suggesting that the SOS response controls recombination in most integrons. We provide further experimental validation of this control in the Vibrio parahaemolyticus chromosomal integron and in the integrons of E. coli multi-resistance plasmid pMUR (Gonzalez-Zorn et al., 2005). The phylogenetic distribution of the LexA-controlled integrases suggests that SOS control evolved in the ancestor of chromosomal integrons, and that it has been positively maintained in mobile integrons. We also find a correlation between the loss of LexA control and integrase inactivation by mutation, indicating that unregulated recombination may be deleterious in these genetic elements. The only exceptions to this rule appear to be multi-resistance mobile integrons, in which SOS deregulation leads to the creation of a secondary cassette promoter. We discuss the significance of these findings for the adaptive dynamics of integrons and their implications for the acquisition and dissemination of antibiotic resistance genes.


Identification of LexA-binding sites in intI promoters

We recently identified Escherichia coli-like LexA binding sites overlapping the promoter of the integrase genes intI from all clinically relevant mobile integrons and intIA from the V. cholerae SI (Figure 2A). We have shown that intI expression was indeed controlled by the SOS response, eventually resulting in heightened rates of integrase-mediated recombination upon SOS induction (Guerin et al., 2009).

To gain insight into the general relevance of this observation, we undertook an exhaus-

tive in silico study. Using BLASTP, we identified 296 homologues of intIA in the GenBank

database and systematically searched the nucleotide sequences corresponding to their coding

region plus 501 bp upstream. We conducted independent searches for each of the described

LexA-binding motifs (Erill et al., 2007). Putative LexA-binding sites were detected in 66%

(196) of the 296 sequences (Table S1) All the identified LexA-binding sites corresponded to

the motif found in E. coli and most β/γ-Proteobacteria,. This suggests that the putative LexA

regulation of intI genes probably originated after the split of the α- and β/γ -Proteobacteria

subclasses, since the LexA-binding motif of α-Proteobacteria is markedly divergent from the

E. coli one (Tapias and Barbe, 1998). When we examined the core 16 bp of the identified E.

Results – Recombination in integrons is controled by the SOS response to stress

Article III


coli-like LexA-binding sites, 54 distinct sequences were identified (Table S2). Nonetheless,

the LexA-binding sites exhibit a high level of conservation, as reflected in their joint information

content logo (Figure 2C), which contrasts with the immediately surrounding sequences.

This strongly supports the functionality of these motifs. Importantly, E. coli-like LexA sites

were detected in all but one of the mobile integron classes and in almost all Vibrionaceae su-

per-integrons (Figure 2B), evidencing that putative LexA regulation of intI genes is a wide-

spread phenomenon pervading all integron divisions.
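The search described above can be illustrated with a minimal sketch. It is not the profile-based method used in the study: it assumes the simplest possible description of the E. coli-like LexA box, the canonical CTG-N10-CAG pattern, and scans an upstream region for exact matches. The function name and the example sequence are hypothetical. Because the pattern is its own reverse complement, scanning a single strand is sufficient here.

```python
import re

# Canonical E. coli-like LexA box: CTG, ten unconstrained bases, CAG (16 bp).
# The lookahead makes overlapping candidate sites visible as well.
LEXA_BOX = re.compile(r"(?=(CTG[ACGT]{10}CAG))")


def find_lexa_boxes(upstream):
    """Return (start, site) for every CTG-N10-CAG match in an upstream region.

    This exact-match scan is a simplification of a motif-profile search:
    it misses degenerate sites and reports any sequence fitting the pattern.
    """
    seq = upstream.upper()
    return [(m.start(), m.group(1)) for m in LEXA_BOX.finditer(seq)]


# Hypothetical 16 bp site embedded in a short flanking sequence.
site = "CTG" + "TATATACTCA" + "CAG"
print(find_lexa_boxes("AAAA" + site + "GGG"))
```

A real search would score each window against a position-specific profile, so that sites deviating from the consensus at one or two positions are still recovered.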

Predicted LexA-binding sites correspond to functional transcriptional control elements

We have previously shown that LexA regulates the expression of intI in V. cholerae,

and our in-silico search identified LexA-binding sites in the promoter region of intI for all se-

quenced Vibrio species but V. fischeri (Table S1). To further assess the overall functionality

of the in silico predicted LexA-binding sites, we evaluated integrase LexA regulation in V.

parahaemolyticus ATCC 17802, which harbours a LexA-binding site upstream of its intIA

gene in a genetic context that is substantially different from the one of V. cholerae (Figure

2A). Using RT-PCR, we determined the intIA expression level in both the wild-type strain

and its lexA(Def) derivative. We found an expression ratio of 6.18, revealing a strong LexA

regulation of the intIA gene expression (Figure 3A).

In several class 1 integrons, heightened expression of the cassette genes has been

shown to rely on a secondary cassette promoter called P2, located just upstream of the intI1

gene (Figure S1). P2 is enabled by a CCC insertion that increases the distance between a -35

box sequence and a sequence resembling the -10 box consensus from 13 to 16 bp, thereby

generating a functional σ70 promoter (Kim et al., 2007; Collis and Hall, 1995). In all its reported

instances, this CCC insertion takes place in what appears to be a disrupted LexA-binding site.

Therefore, the CCC insertion that enables P2 should simultaneously abolish integrase

regulation by LexA. Here we tested this hypothesis using the E. coli multi-resistant


plasmid pMUR050 (Gonzalez-Zorn et al., 2005). This plasmid provides an ideal material to

address this issue because it harbours two integrons with inactivated copies of the intI1 gene

(Figure S1). However, only one of these intI genes contains a functional LexA-binding site in

its promoter, while the other presents a CCC insertion disrupting the LexA-binding site (Fig-

ure S1). Using EMSA, we found that the CCC insertion effectively prevents LexA-binding

(Figure 3B). Furthermore, RT-PCR in WT and lexA(Def) backgrounds confirmed that LexA

regulation was only observed in the integron carrying an intact LexA-binding site, with a

strong deregulation (6.55 ratio) in the lexA(Def) background (Figure 3A). Thus, the CCC in-

sert does not only enable the secondary cassette promoter P2, but concomitantly disrupts the

LexA-binding site of the integrase promoter. Evidence of increased cassette expression due to

the CCC insert was obtained by comparing RT-PCR expression profiles for the first cassette

gene of both pMUR integrons, and this increase was found to be independent of the lexA(Def)

background (data not shown). In-silico searches for disrupted LexA-binding sites revealed 31

such instances in integrons from a wide variety of species (Table 1). Furthermore, all the

identified CCC insertions corresponded to multi-resistance mobile integrons. Together, these

results suggest that LexA regulation may be eventually lost under heavy selection to promote

higher basal levels of the antibiotic resistance transcript.

Analysis of LexA-binding sites distribution

The presence of confirmed LexA regulation in V. cholerae and V. parahaemolyticus

SIs suggested that SOS induction of intI genes probably originated very early in the evolu-

tionary history of integrons. To further explore this hypothesis, we mapped the in silico iden-

tified LexA-binding sites onto a phylogenetic tree of IntI protein sequences. The tree shown

in Figure 4 is in overall agreement with previously published IntI phylogenies (Diaz-Mejia et

al., 2008; Mazel, 2006a; Nemergut et al., 2008; Boucher et al., 2007). It distinguishes two

major ecological groups. Integrons borne by marine species form a monophyletic clade, with

the Vibrionaceae super-integrons sitting at the root of the tree. Integrons from soil and fresh-

water bacteria, on the other hand, seem to constitute a more recent branch. As has been noted

previously, the tree also suggests that multi-resistance mobile integrons probably arose several

times in both ecological groups (classes 2, 4 and 5 [green panel] and classes 1 and 3 [orange

panel] in Figure 4).

The distribution of identified LexA boxes in Figure 4 shows that LexA regulation of

intI genes is prevalent among marine chromosomal integrons and their cognate mobile rela-

tives. Conversely, no functional LexA-binding sites can be identified in the chromosomal in-

tegrons from soil and freshwater species. Nonetheless, the mobile integrons branching off

from soil and freshwater bacteria do contain functional LexA-binding sites, hinting that LexA

regulation could have been lost in most non-marine chromosomal integrons but has been pre-

served in their related mobile counterparts.


Coupling of integrons with the SOS response

We have recently demonstrated that the SOS response regulates the expression of two

integron integrase genes, leading to heightened recombination rates upon SOS induction, both

in a class 1 mobile integron and in the V. cholerae SI (Guerin et al., 2009). The extensive in

silico search reported here shows that about two thirds of the available integron integrase se-

quences are putatively regulated by LexA, and this regulation has been confirmed here for ad-

ditional integrase genes. In hindsight, the coupling of genetic elements capable of cassette

integration with a global response to stress comes out as an elegant and powerful pairing. As

illustrated in Figure 1, integrons can be seen as stockpiling agents of genetic diversity, which

in addition, can tap into a huge and variable pool of cassettes through horizontal gene transfer

from the surrounding bacterial communities (Boucher et al., 2007; Michael et al., 2004;

Koenig et al., 2008). Nonetheless, the efficient expression of these acquired traits is highly

dependent on integrase-mediated recombination. Newly integrated cassettes sitting in the

proximal region of the integron are highly expressed by the Pc promoter, but they can be

moved to distal parts of the integron and thus progressively put away from expression by con-

secutive recombination events (Figure 1), which may also reinstate formerly acquired cas-

settes under full expression.
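The reordering described above can be sketched as a toy model. It assumes that an integron array is an ordered list of cassettes, that attC x attC excision removes one cassette, that integration occurs at attI (the first position), and that only a fixed number of proximal cassettes are expressed from Pc. The cassette names and the `n_expressed` cut-off are hypothetical.

```python
def excise(array, index):
    """attC x attC excision: return the excised cassette and the remaining array."""
    return array[index], array[:index] + array[index + 1:]


def integrate_at_attI(array, cassette):
    """Integration at attI places the cassette first, next to the Pc promoter."""
    return [cassette] + array


def expressed(array, n_expressed=1):
    """Toy rule: only the first n_expressed cassettes are expressed from Pc."""
    return array[:n_expressed]


# A formerly acquired cassette sitting in a distal, silent position...
array = ["cassetteA", "cassetteB", "cassetteC"]
cassette, rest = excise(array, 2)
# ...is reinstated under full expression by excision and reintegration at attI.
reordered = integrate_at_attI(rest, cassette)
print(reordered, expressed(reordered))
```

In this picture, the integrase expression level only sets how often such excision and integration events occur, which is where SOS control comes in.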

The SOS response comes thus as an obvious choice for regulation of integrase activ-

ity, as it is already a key component of adaptive mutagenesis in bacteria, triggering both tran-

slesion synthesis and activation of transposable elements (Bjedov et al., 2003; Ubeda et al.,

2007). Furthermore, SOS induction is carefully timed to those periods of stress in which adap-

tive mutagenesis can be particularly advantageous. In the early chromosomal integrons, where

SOS regulation apparently arose, LexA repression of the intI gene may have contributed to

integron stability by minimizing the basal expression levels of intI and thus decreasing the

rates of integrase-mediated recombination. Then again, SOS regulation would have ensured

that both the occasional cassette reordering and the acquisition of exogenous cassettes took

place at a time of need for innovation, such as in reaction to antibiotic exposure. Therefore,

regulation of integrase activity by the SOS response appears as a natural way to optimize

integron-mediated adaptation without incurring excessive integron destabilization or the

possible toxic effects of sustained integrase expression.

Loss and persistence of integrase LexA-regulation

The phylogenetic distribution of LexA-binding sites reveals an apparent loss of LexA

regulation in several instances. LexA regulation of intI genes is clearly prevalent among ma-

rine species. Loss of LexA regulation is only observed in the SXT integrating-conjugative

element, for which SOS-dependent transfer has been reported (Kelley, 2006) and in Vibrio

fischeri. Curiously enough, in V. fischeri LexA shares its binding motif with the LuxR quo-

rum sensing regulator (Shadel et al., 1990), suggesting that LexA regulation of intI may have

been lost in this species to prevent interference with the lux regulon.

Conversely, loss of LexA regulation seems to be the norm among soil and freshwater

species harbouring chromosomal integrons. In some cases, this loss of regulation has an obvi-

ous explanation. Some families, like the Nitrosomonadaceae and the Chromatiaceae, simply

do not possess any LexA homologues, thus explaining the absence of this motif upstream of

their intI genes (Erill et al., 2007). A similar, yet less powerful argument can be made for the

Xanthomonadaceae, in which neither of the two identified LexA proteins recognizes the

β/γ-Proteobacteria LexA-binding site (Yang et al., 2002). However, the main mechanism associated

with the loss of LexA regulation appears to be the inactivation of the integrase gene. The

majority of Xanthomonadaceae chromosomal integrases, for instance, are inactivated by di-

verse types of mutations and deletions (Gillings et al., 2005). There is also evidence that

frame-shift mutations may have inactivated most of the remaining intI genes lacking apparent

LexA regulation (Nemergut et al., 2008). Thus, it seems that many species may have opted

for inactivating their intI gene upon loss of LexA regulation, or that accidental inactivation of

intI has made LexA regulation superfluous. Both lines of reasoning strongly suggest that un-

regulated intI expression must be somehow detrimental to the cell, thereby introducing an ad-

ditional selective pressure towards the initial emergence of LexA regulation of integron

integrase genes. Further strengthening this conclusion, it is a well-known informal observation

among workers in the integron field that experimental overexpression of the integrase is

deleterious to the cell.

In contrast to their soil and freshwater chromosomal relatives, most class 1 and class 3

integrases are both LexA regulated and functional. This indicates that the capability of cas-

sette uptake and shuffling is a useful trait in mobile integrons, since it would allow their hosts

to express novel phenotypes in selective environments. This parallels the persistent regulation

by LexA of functional integrase genes in marine species, in which reorganization of the su-

perintegrons has been evidenced by comparative genomics (Labbate et al., 2007)(+ref feng

2008). In any case, the preservation of LexA regulation in integrons harbouring functional in-

tegrases suggests again that, if not mandatory, LexA regulation of intI genes must be overtly

beneficial for integron hosts when the product of the intI gene is a functional protein. The

only exception to this general rule appears to be deregulation mediated by a CCC insertion

that disrupts the LexA-binding site. This same CCC insert, however, enables a secondary cas-

sette promoter (P2) that enhances cassette expression, and in silico search results evidence

that this insert is only found in multi-resistance plasmids. This suggests that the detrimental

effects of unregulated intI expression trade off against the selective pressure towards increased

expression of multi-resistance phenotypes. Nonetheless, as the pMUR case illustrates, an

enhanced cassette promoter may be maintained alongside a subsequent inactivation of the integrase,

in this case by an IS26 insertion.

Clinical implications of SOS-induced integrase activity

Both integrons and the SOS response have been previously singled out as elements of

clinical importance and have therefore been the focus of abundant research in the fight against

antibiotic resistance and antibacterial drug development (Weldhagen, 2004; Nijssen et al.,

2005; Cirz et al., 2005). Beyond its fundamental relevance to bacterial adaptation, serious

clinical implications emerge from the discovery of a direct link between SOS induction and

integrase activity. Since most multi-resistant Gram-negative bacteria carry mobile integrons,

this establishes a generic system for genetic interchange under control of a general stress re-

sponse shared by a large group of human and animal pathogens. In this setting, it is important

to note that integron cassettes encoding resistance to several antibiotics known to induce the

SOS response, such as trimethoprim, quinolones and β-lactams, are common today (Rowe-

Magnus and Mazel, 2002; Fonseca et al., 2008). This suggests that the indirect triggering ef-

fect of these antibiotics on the capture of resistance cassettes has been very efficient.

A less obvious consequence of integrase SOS regulation is its repercussion on antibi-

otic resistance policies. Current policies in the fight against antibiotic resistance rely largely

on the detrimental effects most resistance mechanisms inflict on bacteria, which eventually

lead to loss of resistance genes in the absence of antibiotic exposure (Andersson, 2006). Since

most cassettes are promoter-less, the most ancient cassettes (located at the distal part of the

integron) are subject to severe polar effects, leading to rare or non-existent protein products

(see Figure 1) (Collis and Hall, 1995). In this context, the incorporation of SOS regulation in

integrons puts forward a mechanism by which antibiotic-resistance genes and other useful ad-

aptations can be silently set aside, while current adaptive traits are steadily kept under expres-

sion. In time of stress, such as exposure to antibiotics, the relevant resistance cassette can be

called upon by integrase-mediated translocation, and thus selected for only when its expression

is required. Furthermore, cassette genes temporarily relegated to distal positions in integrons

may also sustain increased evolution rates, generating a substantial pool of variability

from which to draw on when the appropriate selective pressures resurface. Therefore, SOS

mediated regulation of integron integrases should be taken into account regarding the time

spans currently being considered for spontaneous loss of antibiotic resistance in restrictive use

policies and, ultimately, concerning the future development and assessment of antibiotic


In silico searches and phylogenetic analyses were made on sequences deposited in

GenBank as described previously (Abella et al., 2007; Mazel, 2006b). The electro-mobility

shift assays were performed as described before (Abella et al., 2007). The different lacZ reporter

constructions were made by fusion at the initiation codon of the tested genes, and

β-galactosidase activities were measured in Miller units. The full list of strains and plasmids is

available as Table S7, and oligonucleotide primers are listed in Table S8. Full methods and

associated references are described in the supplementary text.


We thank Mike C. O’Neill for his careful reading and comments on the different ver-

sions of this manuscript. This work was supported by grants from the Ministère de la Recher-

che et de l’Enseignement supérieur, the Conseil Régional du Limousin, the Fondation pour la

Recherche médicale (FRM) and from the Institut National de la Santé et de la Recherche

Médicale (Inserm) for the Ploy lab; by the Institut Pasteur, the Centre National de la Recher-

che Scientifique (CNRS-URA 2171), the FRM and the EU (STREP CRAB, LSHM-CT-2005-

019023, and NoE EuroPathoGenomics, LSHB-CT-2005-512061), for the Mazel lab; and by

grants BFU2008-01078/BMC from the Ministerio de Ciencia e Innovación de España and

2005SGR-533 from the Generalitat de Catalunya, for the Jordi lab.

Figure 1 – Schematic organization of integrons

The functional platform of integrons is constituted by an intI gene encoding an integrase, a

cassette promoter Pc and a primary recombination site attI. The system maintains an array

that can consist of more than 200 cassettes in chromosomal superintegrons. Only the first few

cassettes are expressed by the Pc, a feature represented by the fading filling color. The rest of

the array can be seen as a reservoir of standing genetic variation. A cassette is generally con-

stituted of a promoterless ORF flanked by two recombination sites termed attCs. Cassettes

can be excised from any position in the array through attC x attC recombination mediated by

the integrase. The resulting circular intermediate can then be integrated by the integrase, pref-

erentially at attI bringing the cassette under control of Pc. Note that exogenous circular inter-

mediate can be integrated owing to the low specificity of the integrase activity, rendering the

system prone to horizontal transfer. In the present study, the integrase promoter Pint is shown

to be under the control of the SOS system, thus conditioning recombination to periods of SOS

inducing stress.

Figure 2 – In silico analysis of integrase promoters

(A) Alignment of representative promoter regions of Vibrionaceae intIA homologues. Puta-

tive LexA-binding sequences are boxed, while putative σ70 promoter elements (-35 and -10)

are underlined and the translation start site of intIA is boxed and highlighted in bold type. (B)

Representative examples of LexA-binding sites identified upstream of different integrase

genes. MI stands for mobile integron, while SI stands for superintegron and the subsequent

number (1-5) denotes integrase class. The provided accession numbers correspond to IntI proteins

from: E. coli pSa (AAA92752), Providencia stuartii ABR23a (ABG21674), Serratia marces-

cens AK9373 (BAA08929), V. cholerae 569B (AAC38424) and Vibrio salmonicida VS224

pRVS1 (CAC35342). (C) Sequence logos (Crooks et al., 2004) of the profile used to search

for β/γ- Proteobacteria LexA-binding sites (top) and the profile emerging from the distinct 54

located binding sites (bottom). Abbreviations for panel A are as follows: Lan, Listonella an-

guillarum; Lpe_CIP, L. pelagia CIP 102762T; Val_12G01, Vibrio alginolyticus 12G01;

Vch_N16961, V. cholerae O1 biovar Eltor str. N16961; Vha_ATCC, V. harveyi ATCC BAA-

1116; Vha_HY01, V. harveyi HY01; Vme, V. metschnikovii; Vmi, V. mimicus; Vna_CIP, V.

natriegens strain CIP 10319; Vpa, V. parahaemolyticus; Vpa_RIMD, V. parahaemolyticus

RIMD 2210633; Vsh_AK1, V. shilonii AK1; Vsp_DAT722, Vibrio sp. DAT722; Vsp_Ex25,

Vibrio sp. Ex25; Vvu_CIP754, V. vulnificus CIP 75.4; Vvu_YJ016, V. vulnificus YJ016.

Corresponding accession numbers can be found in Table S1.
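The conservation displayed by a sequence logo can be quantified per position. The sketch below assumes the usual definition of information content for DNA, 2 bits minus the Shannon entropy of the base frequencies at each aligned position, and omits the small-sample correction that logo software may apply. The aligned sites are hypothetical and are not taken from Table S2.

```python
import math
from collections import Counter


def information_content(sites):
    """Per-position information content (bits) of equal-length aligned DNA sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        profile.append(2.0 - entropy)  # 2 bits is the maximum for four bases
    return profile


# Hypothetical alignment: three invariant positions, one fully variable position.
print(information_content(["CTGA", "CTGC", "CTGG", "CTGT"]))
```

Invariant positions reach the 2-bit maximum, whereas positions with equiprobable bases carry no information, which is the contrast between the LexA box and its flanking sequences described in the text.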

Figure 3 – Electro-mobility shift assays on different intIA promoter mutants.

(A) Sequence of the V. cholerae intIA promoter region and of the different LexA box mutants

tested. The putative LexA-binding sequence is boxed in red, the putative σ70 promoter elements

(-35 and -10) are framed in green and the intIA 5’ region is indicated by a grey open

frame. (B) EMSA of the different LexA box mutants in the presence of purified V. cholerae LexA

protein. F, free DNA; R, retarded complex.

Figure 5 – Phylogenetic tree of intI genes.

The tree illustrates the distribution of identified and experimentally verified LexA-binding

sites. This is the best-distance neighbour-joining tree obtained using MEGA3 and was rooted

using E. coli and Thiobacillus denitrificans XerCD protein sequences as outgroup (Mazel,

2006). Bootstrap values are based on 1,000 pseudo-replicates and the scale bar indicates

the number of substitutions per site. Broken LexA-binding sites denote disrupted sites located

through in silico searches. Abbreviations are as follows: Azo, Azoarcus sp. EbN1; Dar,

Dechloromonas aromatica; Eco, E. coli; Gme, Geobacter metallireducens; Lan, Listonella

anguillarum; Lpe, Listonella pelagia; Mfl, Methylobacillus flagellatus; Neu, Nitrosomonas

europaea; Nmo, Nitrococcus mobilis; Pal, Pseudomonas alcaligenes; Pme, Pseudomonas

mendocina; Ppr, Photobacterium profundum; PstuBA, Pseudomonas stutzeri BAM; PstuQ,

Pseudomonas stutzeri Q; Rei, Reinekea sp.; Rba, Rhodopirellula baltica; Rge, Rubrivivax ge-

latinosus; Sde, Saccharophagus degradans; Sam, Shewanella amazonensis; Ssp, Shewanella

sp. MR-7; Son, Shewanella oneidensis; Spu, Shewanella putrefaciens; Tden, Treponema den-

ticola; Tde, Thiobacillus denitrificans; Vch, Vibrio cholerae; Vfi, Vibrio fischeri; Vme, Vi-

brio metschnikovii; Vmi, Vibrio mimicus; Vpa, Vibrio parahaemolyticus; Vsp, Vibrio

splendidus; Vvu, Vibrio vulnificus; Xca, Xanthomonas campestris; Xor, Xanthomonas oryzae;

Xsp, Xanthomonas sp.

Figure 4 – Impact of SOS induction on the integron system

The increase in expression levels of V. cholerae’s intIA (A) and of pAT674’s intI1 in E. coli

(B) genes upon induction of the SOS response was monitored using a β-galactosidase re-

porter. Several antibiotics were used to induce the SOS response in a WT background, as in-

dicated by the letter below the histogram’s bars (M, mitomycin; C, ciprofloxacin; T,

trimethoprim; A, ampicillin). To validate the involvement of the SOS system, induction of intI

expression was further assessed in several SOS defective mutant backgrounds. Mutants are

specified in the insets. The LexA box mutants correspond to substitution of the canonical site

CTG-N10-CAG by TAA-N10-CAG (#1) and CTC-N10-GAG (#2), in the promoter of intIA;

or by TAA-N10-CAG (#1) and CTG-N10-ACT (#2), in the promoter of intI1. The functional

impact of SOS mediated induction of intI expression on cassette recombination was measured

using a dedicated reporter. In V. cholerae recombination rate was monitored upon mitomycin

C treatment (C), while in E. coli, induction was genetically mimicked by comparison of re-

combination rate between wild type and LexA box mutant #2 (see Material and Methods).

Supplementary materials

Figure S1 – Organization of the plasmid pMUR050

(A) Schematic diagram of the pMUR050 plasmid (AY522431) showing both copies of the in-

tegron integrase (red arrows). (B) Detail of both integron integrase promoter regions. LexA-

binding sites are denoted by red rectangles. Black circles indicate the integron integrase Pint

promoter elements (-10 and -35). Green rectangles define the Pc2 promoter elements (-10 and

-35). Pc2 is a secondary cassette gene promoter activated by the disruption of the LexA-

binding site.


[Figure S1 labels: genes intI1, ant3'9, linF, tnp, intI1, ant3'9, qacEdelta1; promoters Pint1, Pint2, Pc, Pc2; orientation 5’ to 3’]



Figure S2 – CCC insertions in the intI’s LexA-box prevent LexA-binding.

Electrophoretic mobility shift assay using E. coli purified LexA protein and the two

pMUR050 integrase promoters PintI1D1 (Pint1) and PintI1D2 (Pint2), the last presenting a CCC in-

sertion in the LexA-box (see material and methods). F: free DNA; R: retarded DNA.

Figure S3 – Schematic representation of the positively selectable IntI-mediated excision

reporter assay

(A) A cassette bearing cat (CmR) associated with a VCR is inserted between an attCaadA7 site

and the aac(6’)-Ib gene whose 3’ part is identical to its wild-type counterpart except for the

start codon, thereby preventing expression of the gene aac(6’)-Ib*. This construction allows

resistance to chloramphenicol only. (B) IntI-mediated excision of the cat-VCR cassette allows

expression of a functional AAC(6’)-Ib fused in its N-terminal part with the translation of a small

attC site (in red) and the peptide from pSU38. Homologous recombination between the two

attC sites is impossible because these are different in sequence, ensuring that proper expres-

sion of the functional gene only relies on site-specific recombination. Expression of the hy-

brid aac(6’)-Ib* provides selectable resistance to tobramycin, while the cat cassette deletion

leads to the loss of Cm resistance.

The standard genetic code is redundant and its structure is not random. Irrespective of

its function, the sequence of a gene heavily constrains the genotypic and thus phenotypic

space accessible by point mutations. Particularly, synonymous codons can access different

sets of amino-acids from each other. A given protein would then reach different areas of the

phenotype landscape depending on its actual nucleotide sequence. Over evolutionary time, the

essentially neutral diversification of a coding sequence – corresponding to the exploration of

its synonymous space – would then grant access to new phenotypes. We develop a strategy to

take advantage of this observation in the framework of directed evolution. Because directed

and natural evolution are based on the same principles, this approach also sheds light on the

constraints experienced by natural sequences.


We implemented an algorithm (ELP) to output synonymous sequences with evolutionary

perspectives as different as possible from each other and from an input sequence. ELP

was used to design a synonymous version of the aac(6’)-Ib gene, which is naturally borne by

an integron cassette and encodes resistance to aminoglycoside antibiotics. The synthetic gene

was then constructed. Both versions of the gene were mutagenized by error-prone PCR. The

resulting libraries were concurrently screened for increased resistance on a variety of

antibiotics. Monte-Carlo simulations were performed to assess the impact of selection on the

exploration of adaptive landscape.


The conceptual translation of alleles randomly picked from the mutant libraries before

selection demonstrated that the two gene versions effectively experience different areas of the

phenotypic space. Accordingly, distinct advantageous variants were isolated from these ver-

sions. Considering the fast development of DNA synthesis services, the incorporation of ELP-

designed synonymous sequences can thus greatly enhance the efficiency of directed evolution

experiments at little cost.

In nature, the diversification of sequences is possibly constrained by rugged adaptive

landscapes. Our simulations show that even slightly deleterious intermediates can significantly

constrain the following of evolutionary routes. In this context, selective regimes characterized

by the alternation of relaxed and intense selection periods – such as the one

experienced by gene cassettes – are likely to promote the exploration of neutral spaces. This

effect would favor protein diversification and evolvability at the population level.


This article has been published in PLoS Genetics (pp 214-231).

Figure S1: Relative Evolutionary Potentials of the different synonymous codons.

Ten amino-acids display synonymous codons with different evolutionary potential

from each other. These include all 6-fold degenerate amino-acids: leucine, serine and argin-

ine; five out of six 4-fold degenerate amino-acids: proline, threonine, valine, alanine and gly-

cine; and finally isoleucine and lysine. These tables show the REP for every pair of

synonymous codons corresponding to these amino-acids. We define the Relative Evolutionary

Potential of codon XXX relative to its synonymous counterpart YYY (REPXXX/YYY) as the

number of different amino-acids reachable from XXX but not from YYY through single mu-

tation. Note that the REP is not a symmetrical index.
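The REP definition above can be computed directly from the standard genetic code. The sketch below is an illustration, not the ELP implementation. It assumes that stop codons and the codon's own amino acid are excluded from the set of reachable amino acids; the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reachable(codon):
    """Amino acids reachable from a codon by a single point mutation."""
    own = GENETIC_CODE[codon]
    found = set()
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                aa = GENETIC_CODE[codon[:pos] + base + codon[pos + 1:]]
                if aa != "*" and aa != own:  # assumption: stops and own aa excluded
                    found.add(aa)
    return found


def rep(xxx, yyy):
    """REP of codon xxx relative to its synonymous counterpart yyy."""
    if GENETIC_CODE[xxx] != GENETIC_CODE[yyy]:
        raise ValueError("codons are not synonymous")
    return len(reachable(xxx) - reachable(yyy))


# The index is not symmetrical: TTA reaches Ile but not Met or Trp, and vice versa.
print(rep("TTA", "TTG"), rep("TTG", "TTA"))
```

For the two TTR leucine codons, this gives 1 in one direction (isoleucine) and 2 in the other (methionine and tryptophan), illustrating the asymmetry noted above.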


Leucine           Serine            Arginine
TTA 1 1 1 2 3     TCT 0 3 3 4 4     CGT 0 3 3 4 4
TTG 2 3 3 4 3     TCC 0 3 3 4 4     CGC 0 3 3 4 4
CTT 3 4 0 2 3     TCA 1 1 0 3 3     CGA 1 1 0 3 3
CTC 3 4 0 2 3     TCG 2 2 1 4 4     CGG 2 2 1 4 3
CTA 3 4 1 1 1     AGT 4 5 5 5 0     AGA 3 3 4 4 1
CTG 4 3 2 2 1     AGC 4 5 5 5 0     AGG 4 4 5 4 2

Proline       Threonine     Valine        Alanine       Glycine
CCT 0 1 1     ACT 0 2 3     GTT 0 2 3     GCT 0 1 1     GGT 0 2 2
CCC 0 1 1     ACC 0 2 3     GTC 0 2 3     GCC 0 1 1     GGC 0 2 2
CCA 1 1 0     ACA 2 2 1     GTA 2 2 1     GCA 1 1 0     GGA 2 2 0
CCG 1 1 0     ACG 3 3 1     GTG 3 3 1     GCG 1 1 0     GGG 3 3 1

Isoleucine    Lysine
ATT 0 3       AAA 1
ATC 0 3       AAG 1
ATA 3 3

Figure S2: Alignment of aacWT and aacELP sequences

While encoding identical proteins, aacWT and aacELP only share 61% identity. In this figure,

different bases are highlighted in red. Overall, 119 codons out of 184 are different between

the two sequences.
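The two figures quoted in this caption, nucleotide identity and number of differing codons, can be computed as sketched below. The helper names are hypothetical. The two 42-nucleotide fragments are the short stretches legible in the extracted alignment (labelled aac_wt and aac_syn); they cover only 14 codons, so the values obtained for them are not expected to match those of the full-length genes.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq):
    """Conceptual translation of an in-frame coding sequence ('*' for stops)."""
    return "".join(GENETIC_CODE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def compare_synonymous(a, b):
    """Return (nucleotide identity, number of differing codons) for two synonymous sequences."""
    if len(a) != len(b) or len(a) % 3:
        raise ValueError("sequences must be in frame and of equal length")
    if translate(a) != translate(b):
        raise ValueError("sequences do not encode the same protein")
    identity = sum(x == y for x, y in zip(a, b)) / len(a)
    differing_codons = sum(a[i:i + 3] != b[i:i + 3] for i in range(0, len(a), 3))
    return identity, differing_codons


# Fragments transcribed from the extracted alignment (last 14 codons, stop included).
aac_wt = "CAAACACGCCAGGCATTCGAGCGAACACGCAGTGATGCCTAA"
aac_syn = "CAGACTAGACAAGCTTTTGAAAGAACTAGATCGGACGCATGA"
identity, differing = compare_synonymous(aac_wt, aac_syn)
print(f"{identity:.0%} identity, {differing} differing codons out of {len(aac_wt) // 3}")
```

The translation check guards against comparing sequences that are not actually synonymous, which would make the identity figure meaningless for this purpose.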

Figure S3: Number of mutations and protein space exploration

Average % of aa accessible

Min. number of mutations        1    2    3
From single codon              30   51   19
From all synonymous codons     40   53    7

No amino acid shows more than four codons with different REP. Thus, at any position, a set

of four ELP-designed sequences accesses the same evolutionary landscape as do all the synonymous

codons corresponding to the position considered. We compared the minimum number

of mutations necessary to reach the other 19 aa from either a single codon or all

synonymous codons. This figure summarizes the percentage of amino acids accessible in 1, 2

or 3 mutations, averaged over the 61 single codons or the 20 sets of synonymous codons. The

use of four ELP-designed sequences achieves a shift toward a lower number of mutations. It

drastically decreases the number of substitutions requiring three mutations per codon.
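The comparison summarized in this figure can be sketched as follows. The sketch assumes that the minimum number of mutations between a codon and an amino acid is the smallest Hamming distance to any codon of that amino acid, regardless of whether intermediate codons are stops. Names are hypothetical, and the script is a way to recompute such a table rather than the authors' procedure.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Sense codons grouped by amino acid (20 sets, 61 codons in total).
CODONS_BY_AA = {}
for codon, aa in GENETIC_CODE.items():
    if aa != "*":
        CODONS_BY_AA.setdefault(aa, []).append(codon)


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def min_mutations(start_codons, target_aa):
    """Fewest point mutations from any starting codon to any codon of target_aa."""
    return min(hamming(s, t) for s in start_codons for t in CODONS_BY_AA[target_aa])


def accessibility_profile(start_codons):
    """Percentage of the 19 other amino acids first reached in 1, 2 or 3 mutations."""
    own = GENETIC_CODE[start_codons[0]]
    counts = {1: 0, 2: 0, 3: 0}
    for aa in CODONS_BY_AA:
        if aa != own:
            counts[min_mutations(start_codons, aa)] += 1
    return {n: 100.0 * c / 19 for n, c in counts.items()}


def average_profile(profiles):
    profiles = list(profiles)
    return {n: sum(p[n] for p in profiles) / len(profiles) for n in (1, 2, 3)}


# Averages over the 61 single codons and over the 20 sets of synonymous codons.
single = average_profile(accessibility_profile([c]) for cs in CODONS_BY_AA.values() for c in cs)
pooled = average_profile(accessibility_profile(cs) for cs in CODONS_BY_AA.values())
print(single, pooled)
```

As an example of the shift, methionine is two mutations away from the leucine codon TTA but only one mutation away when all six leucine codons are available as starting points.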











[Alignment excerpt legible from the figure]
aac_wt   CAAACACGCCAGGCATTCGAGCGAACACGCAGTGATGCCTAA
aac_syn  CAGACTAGACAAGCTTTTGAAAGAACTAGATCGGACGCATGA

Table S1: Oligonucleotides used in this study

Name Sequence (5' → 3')3




















1 AACelps were designed to construct the synthetic gene aacELP
2 AACmuts were used to amplify cloned genes
3 The letter p indicates phosphorylation

Table S2: Properties of the mutant libraries

Gene version   Pool #   Size (2)   Mut. rate (mut./kb)
aacWT          1        >10^6      0.5
aacWT          2        >10^6      1.3
aacWT          3        >10^6      3.1
aacWT          4        >10^6      5.2
aacELP         1        >10^6      1.3
aacELP         2        >10^6      0.9
aacELP         3        >10^6      2.5
aacELP         4        >10^6      3.2

1 Libraries were generated by error-prone PCR from each version of the gene aac(6’)-Ib.
2 Each pool contained approximately the same number of clones, as estimated on plates before selection.
3 The mean mutation rate is 2.5 mut./kb for aacWT and 2 mut./kb for aacELP.

Results – Intrinsic evolutionary potential of genes

Text S1: Modeling of the relationship between protein space exploration and library size

Let us assume that the sequence is L nucleotides long and that any modification in a fraction f

of its positions is not lethal (i.e. leads to properly folded proteins [1]).

The probability that a sequence codes for a properly folded protein after m independent mutations is:

$$P_f(m) = \binom{fL}{m} \Big/ \binom{L}{m} = \frac{(fL)!\,(L-m)!}{L!\,(fL-m)!} \tag{1}$$

The denominator is the total number of mutants bearing m mutations, while the numerator is

the number of combinations in which these mutations do not adversely affect protein function.

Assuming the sequence length L is much larger than the number of introduced mutations

m (L >> m), this equation simplifies into:

$$P_f(m) \approx f^m \tag{2}$$

which is consistent with several studies [1, 2].

Let us now consider a given target optimal genotype that is k mutations away from the

reference one. Among sequences with m mutations, the probability that the k desired mutations

are present is:

$$P(\text{solution} \mid m\ \text{mutations}) = \binom{L-k}{m-k} \Big/ \binom{L}{m} \approx \frac{m!}{(m-k)!\,L^k} \quad \text{if } m \ge k, \text{ and } 0 \text{ otherwise} \tag{3}$$

The probability that a sequence with m mutations encodes a properly folded protein and contains

the k desired mutations directly stems from equations (2) and (3):

$$P(\text{solution and folded} \mid m\ \text{mutations}) \approx \frac{m!\,f^m}{(m-k)!\,L^k} \quad \text{if } m \ge k, \text{ and } 0 \text{ otherwise} \tag{4}$$

If we assume, as usual, that a library is composed of sequences with a Poisson distributed

number of mutations with mean X, then the probability to find the target sequence coding for

a properly folded protein is:













which simplifies into:





$$P(\text{solution and folded} \mid X) = \left(\frac{fX}{L}\right)^{k} e^{-(1-f)X} \qquad (6)$$

The inverse of (6) is the mean library size required to generate one target clone.
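
The simplification of the Poisson sum, and the optimum that follows from it, can be sketched as below. This is only a worked outline under stated assumptions: it takes the probability for a clone with m mutations to be $f^{m}\,m!/((m-k)!\,L^{k})$ for $m \geq k$ (the L >> m approximation) and uses the substitution $j = m - k$, which is introduced here for illustration.

```latex
\begin{align*}
P(\text{solution and folded} \mid X)
  &= \sum_{m=k}^{\infty} \frac{e^{-X} X^{m}}{m!}\, f^{m}\, \frac{m!}{(m-k)!\,L^{k}} \\
  &= e^{-X} \left(\frac{fX}{L}\right)^{k} \sum_{j=0}^{\infty} \frac{(fX)^{j}}{j!}
     \qquad (j = m - k) \\
  &= \left(\frac{fX}{L}\right)^{k} e^{-(1-f)X}, \\[6pt]
\frac{\partial}{\partial X} \ln P
  &= \frac{k}{X} - (1-f) = 0
  \quad\Longrightarrow\quad X_{opt} = \frac{k}{1-f}.
\end{align*}
```

Under these assumptions, the optimal mean mutation rate grows linearly with the mutational distance k and depends on the gene only through the fraction f of tolerant positions.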

Differentiating equation (6) with respect to X gives the optimal mean mutation rate for targets k mutations away:

$$X_{opt} = \frac{k}{1-f} \qquad (7)$$



The graph below displays the inverses of equation (6) for target variants at k=1 (red), k=2

(orange) and k=3 (yellow) mutations away from the template. We assumed a standard bacte-

rial gene length (L=1000) and a conservative proportion of non-deleterious mutations at the

DNA level (f=3/4, corresponding to 1/3 of lethal aa substitutions (Dawid et al.)). Numbers on

the left side scale are obtained by calculating the inverse of equation (6) for X equal to Xopt

from equation (7).
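
As a rough numerical companion to the graph described above, the following sketch evaluates the inverse of equation (6) at the optimal mutation rate for k = 1, 2 and 3, with L = 1000 and f = 3/4. It assumes that equation (6) has the closed form (fX/L)^k e^(-(1-f)X) and that X_opt = k/(1-f). The function names are illustrative and are not part of the original analysis.

```python
import math


def p_target(X, k, L=1000, f=0.75):
    """Probability that a clone carries the k target mutations and still folds.

    Assumed closed form of equation (6) for a Poisson-distributed number of
    mutations with mean X: (f*X/L)**k * exp(-(1-f)*X).
    """
    return (f * X / L) ** k * math.exp(-(1.0 - f) * X)


def optimal_rate(k, f=0.75):
    """Mean mutation rate maximising p_target (assumed form of equation (7))."""
    return k / (1.0 - f)


def library_size(k, L=1000, f=0.75):
    """Mean library size needed for one target clone, evaluated at the optimum."""
    return 1.0 / p_target(optimal_rate(k, f), k, L, f)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"k={k}  X_opt={optimal_rate(k):.1f}  size={library_size(k):.3g}")
```

With these parameter values, the required library size grows by several orders of magnitude as the target moves from one to three mutations away, which is the trend discussed in the text.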

Article IV


The increase in required size between a library covering a mutational distance k+i and one

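targeting k mutations is:

$$\frac{P(\text{solution}_{k} \text{ and folded} \mid X)}{P(\text{solution}_{k+i} \text{ and folded} \mid X)} = \left(\frac{L}{fX}\right)^{i} \qquad (8)$$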

The larger the mean number of mutations, the higher the chance to recover a target further

away. However, optimal mutation rates for error-prone PCR derived libraries are predicted to

be rather low, even when subtle advantages of high mutation rate are taken into account [2].

The following graph displays equation (8) for i=1 (red) and i=2 (orange). A substantial in-

crease in library size is required to fully explore possibilities, even with a somewhat high mu-

tation rate of 4 mutations on average per gene (dotted line). As the occurrence of several

mutations in the same codon is very rare using error-prone PCR, these curves can be interpreted as lower bounds to the increase in library size necessary to obtain 2 or 3 mutations in the same codon instead of 1.
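
The order of magnitude of this increase can be illustrated with a short sketch. It assumes that equation (8) reduces to (L/(fX))^i, i.e. that each additional target mutation multiplies the required library size by L/(fX) regardless of k. This form and the function name are assumptions made for illustration.

```python
def fold_increase(X, i, L=1000, f=0.75):
    """Fold increase in library size needed to reach i additional mutations.

    Assumed form of equation (8): (L / (f * X)) ** i, with X the mean
    number of mutations per gene.
    """
    return (L / (f * X)) ** i


if __name__ == "__main__":
    # Mean of 4 mutations per gene, as for the dotted line in the graph.
    for i in (1, 2):
        print(f"i={i}  fold increase={fold_increase(4, i):.3g}")
```

Under this assumption, raising the mean mutation rate lowers the penalty for each extra mutation, but the penalty remains large for realistic values of X.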



The overall picture could have been worse if we had assumed a cumulative effect of muta-

tions: due to negative epistasis neutral mutations may become deleterious when they accumu-

late [4].

1. Bloom JD, Silberg JJ, Wilke CO, Drummond DA, Adami C, Arnold FH (2005) Thermodynamic prediction of protein neutrality. PNAS :606–611.

2. Drummond DA, Iverson BL, Georgiou G, Arnold FH (2005) Why high-error-rate random mutagenesis libraries are enriched in functional and improved proteins. J. Mol. Biology 350: 806–816.

3. Guo HH, Choe J, Loeb LA (2004) Protein tolerance to random amino acid change. PNAS 101: 9205–9210.

4. Bershtein S, Segal M, Bekerman R, Tokuriki N, Tawfik DS (2006) Robustness-epistasis link shapes the fitness landscape of a randomly drifting protein. Nature 444: 929.



Data S1: Alignment of the aac(6')-Ib homologs identified by BlastP

The protein sequence AAC(6')-Ib was blasted against the NCBI nr protein database as of

2007/08/26. Corresponding nucleotide sequences were fetched, sorted and aligned using a

dedicated BioPerl script.

The file can be downloaded from the PLoS Genetics server





Discussion – Integrons are powerful adaptive systems


Integrons are mostly known for their involvement in the emergence of multi-resistance

to antibiotics in gram-negative and – to a lesser extent – gram-positive bacteria. However,

chromosomally borne integrons associated with arrays containing 0 to >200 cassettes have

been identified in ca. 10% of all sequenced bacteria, and have been detected in a variety of

environments. Although most of the genes present in these cassettes are of unknown function, their mere existence underscores their functional value as accessory factors. The rapid

success of multi-resistance integrons in overcoming the human-imposed antibiotic selective

pressure reflects the co-option of ancient chromosomal systems by mobile elements, and pro-

vides the best illustration of their adaptive potential. Nevertheless, the propensity of integrons

to recombine cassettes – a very specific type of mutation – was unknown. The discovery that

recombination is controlled by the SOS response in most integrons sheds new light on this

problem and provides the opportunity to highlight the idiosyncrasies of the system with re-

spect to other diversity-generating mechanisms. Interestingly, ssDNA is a central component

of both the SOS and integron systems. Beyond its impact on evolvability, the coupling of

these systems may thus have a profound mechanistic significance. Overall, these observations

may have important medical implications. The two themes developed in this work address the

evolvability of biological systems. From a biotechnological perspective, the results presented

above can be used to increase the generation of diversity at different levels of organization.


I.1. The expression of gene cassettes

I.1.1. Coupling between recombination and expression

Two characteristics are essential to our understanding of the integron system: i) the Pc

promoter located upstream of the attI recombination point is the only element responsible for

the consistent expression of cassette-borne genes (Stokes and Hall, 1989; Levesque et al.,

1994; Bunny et al., 1995; Collis and Hall, 1995); and ii) the excision of cassettes through attC

x attC recombinations occurs randomly within the cassette array, while subsequent reintegra-

tions preferentially involve attI x attC recombination events (Collis et al., 1993; Collis et al.,

2001). Taken together, these two features straightforwardly couple recombination with variation in cassette expression. Indeed, only the few proximal cassettes of a given array are subjected to expression, while the remaining cassettes are kept silent (Collis and Hall, 1995).
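
The coupling described above can be made concrete with a toy sketch, in which a cassette array is an ordered list whose first element sits next to attI and Pc. A mobilization event excises a cassette at a random position (attC x attC) and reintegrates it at attI, so that it becomes expressed while previously proximal cassettes are pushed away from the promoter. The function names, the cassette labels and the number of expressed positions are hypothetical choices made for illustration. Loss of the excised cassette is ignored.

```python
import random


def mobilize(array, rng):
    """Return a new array after one excision/reintegration event.

    A cassette is excised from a random position (attC x attC) and
    reinserted at the first position (attI x attC). Reinsertion is
    assumed to always succeed in this simplified sketch.
    """
    if not array:
        return []
    new_array = list(array)
    cassette = new_array.pop(rng.randrange(len(new_array)))
    new_array.insert(0, cassette)
    return new_array


def expressed(array, n_expressed=2):
    """Cassettes close enough to Pc to be expressed (n_expressed is assumed)."""
    return list(array[:n_expressed])


if __name__ == "__main__":
    rng = random.Random(1)
    array = ["c1", "c2", "c3", "c4", "c5"]  # hypothetical cassette labels
    for _ in range(3):
        array = mobilize(array, rng)
        print(array, "expressed:", expressed(array))
```

Each event changes the set of expressed cassettes without altering the content of the array, which is the sense in which recombination is coupled with variation in expression.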

The mobilization of a gene cassette can lead to three different outcomes: i) the expres-

sion of a fitness-enhancing trait, in which case the recombination is positively selected; the

expression of a trait irrelevant to the current environment, which can either prove: ii) nearly

neutral, if none of the previously expressed cassettes are adaptive; or iii) deleterious, if the

event brings previously integrated adaptive cassettes away from expression, in which case it

would be counter-selected. As evidenced by the model presented in Article I (see Evolution of recombination rate in integrons, p154), this behavior enables a strong link between integron-mediated fitness and recombination rate, thereby allowing second-order selection to operate on this latter modifier trait. To some extent, this model would still hold if cassettes were reintegrated randomly in the array. However, higher recombination rates would then be required – with all the drawbacks that this implies (see below). In contrast, the conditional expression of cassettes according to their position in the array is a strictly required feature in this framework.


A genome-wide expression profiling study reported a slight pattern of differential expression for about a hundred cassette-borne genes in hapR, rpoS and rpoN mutants with respect

to wild-type (Yildiz et al., 2004). Decreased expression ratios averaging 0.74 and 0.66 were

observed in the hapR and rpoS backgrounds respectively, while increased ratios averaging

1.68 were measured in an rpoN mutant. Although these data suggest that the typical attC sites

of V. cholerae (VCR) can act as cryptic promoters, they must be interpreted with caution. In-

deed, these experiments were not specifically undertaken to monitor cassette expression and

the microarray used in this work was designed from the publicly available annotation of the V. cholerae genome (Heidelberg et al., 2000). Now, the nucleotide composition of the genes found in cassettes is generally at odds with the rest of the genome – a phenomenon that underpins their exogenous origin. Consequently, gene identification algorithms perform particularly badly in integrons, and several misannotated ORFs overlap a VCR in the released annotation. The probes designed for these genes may thus hybridize with the VCR-containing transcripts

that originate from the Pc, blurring the signal. Besides, only log-ratios are presented in this

publication while absolute expression values are not available. Faint expression increases may

result from the diversification of the integron through induction of IntI-mediated recombina-

tion in mutant backgrounds. Despite their weaknesses, the biological implications of these ob-

servations are interesting. HapR is part of the quorum-sensing system of V. cholerae and is

induced at high cell densities, while RpoS orchestrates a general stress response. Thus, cas-



sette expression might be slightly activated in crowded and stressed bacterial communities

that contain non-replicating organisms. The design of accurate experiments using a dedicated

microarray would be an interesting project to confirm these results, and further assay the ef-

fect of various conditions on cassette expression.

Some cassettes exceptionally carry their own promoter (Bissonnette et al., 1991;

Stokes and Hall, 1991; Naas et al., 2001b; Biskri and Mazel, 2003). Such promoters may also

control the expression of downstream cassettes – thereby providing an alternative to the Pc

promoter. The Toxin-Antitoxin cassettes that are frequently found in large chromosomal inte-

grons also harbor a functional promoter (Szekeres et al., 2007). However, these genes are

generally oriented divergently with respect to canonical cassettes so that they cannot directly

influence the expression of neighboring cassettes. Although no data support this hypothesis,

small cassettes that are unlikely to encode proteins might constitute floating promoters. Because they would not benefit from being recombined in attI, such cassettes do not really fit

within our model: attC x attC integrations being rare events, the generation of variability

downstream of floating promoters would then rely on cassette excision.

Clearly, these unusual cassettes short-circuit the normal functioning of the system.

However, they seem anecdotal enough not to challenge our working model. If all cassettes were to harbor their own promoter, the role of integrons would be restricted to the facilitated collection of constitutively expressed traits. As highlighted below, the maintenance of an unexpressed reservoir of cassettes endows the system with valuable properties – especially when

associated with the SOS response.

I.1.2. The integron system mimics inducible promoters

As discussed previously (see Genetic versus physiological changes, p100), physiologi-

cal regulation allows responsive, targeted and non-heritable modification of the phenotype.

This process relies on the molecular sensing of environmental cues (either internal or exter-

nal) and their subsequent transduction into functional effects. The evolution and maintenance

of such fine-tuned machineries is slow and costly – and may not even be possible in most cases. Nevertheless, the modification of expression pattern is a major source of phenotypic novelty (Gerhart and Kirschner, 2007). Several mechanisms, including slipped strand mispairing (p61), gene conversion (p71), site-specific recombination (p71) and epigenetics (p89), have

evolved to facilitate the constitutive alteration of gene expression.



Most of these systems are limited in the range of phenotypes they can confer. In the simplest cases, they implement a stochastic switch between two genotypes – and hence two phenotypes – which enables a bet-hedging strategy (see p102). In a few instances, the switching frequency has been shown to be modulated by the environment (see p110). This last situation tends to resemble a context of physiological regulation. Although it can generate a huge number of slightly different proteins and ensure their mutually exclusive expression, the mechanism of segmental gene conversion is focused on a single trait (generally surface antigens or immune system, see p71). Some genetic shufflons can access about ten interdependent phenotypes using site-specific recombination (see p83).

In contrast, integrons provide a standardized system to switch the expression of an in-

definite number of functionally independent traits. To some extent, the coupling with the SOS

response enables a complex behavior that mimics the function of a multitude of inducible

promoters. Instead of controlling each cassette-borne gene by specialized stress-responsive

promoters, the system maintains a single constitutive promoter and a global stress signal –

DNA damage – is used to switch expression through a site-specific recombination mechanism. This setup is simply mediated by the connection to an existing regulatory network, and is thus very economical for the cell. Unlike inducible promoters, changes in cassette expression rely on modifications of the genome structure, and are thus heritable. Besides, while switching a cassette ON is fairly straightforward, driving an expressed cassette to the OFF state is difficult and requires the excision – and probable loss – of the cassette, or the successive occurrence of several cassette insertions.

I.1.3. Increased rate of cassette evolution

Overall, the time spent under selection by a gene cassette depends on the occurrence of

environments in which it confers an adaptive phenotype. This is generally true for any induc-

ible gene – and to a lesser extent for any accessory gene. Continuous cycles of effective and

relaxed selection are known to increase the rate of protein evolution. As shown in Article I

(see p154), non-expressed cassettes tend to accumulate mutations. As exemplified by the

common insertion of ISs in cassette arrays, many of these mutations are deleterious. However,

occasional selective episodes can purge this deleterious load and only maintain those variants

that retained function. Overall, this process widens the exploration of neutral phenotypic

spaces, which would increase gene evolvability according to a similar principle as the one de-

veloped in Article IV (see p214). Because cassettes are exposed to varying expression



strength depending on their proximity to the Pc promoter, this effect is probably more efficient than in a binary ON-OFF switch (Bershtein and Tawfik, 2008). Besides, the recombination mechanism characteristic of integrons is likely to result in frequent duplication of cassettes (see pp 129 and 146). Such cassettes may experience a pattern of concerted evolution comparable to the one driving diversification through segmental gene conversion (see


I.2. Responsive and oriented mutagenesis

I.2.1. Responsive versus constant mutation rates

The regulatory switch afforded by the integron system essentially enables a bet-

hedging strategy (see Bet-hedging, p103). Accordingly, the optimal recombination rate is ex-

pected to equal the mean rate of environmental change (see Article I, p154). Such a relation-

ship may be achieved by fine-tuning the constitutive activity of integrase alleles. According to

this strategy, inadequate variants are produced at a loss in steady environments. This represents a variability cost at the level of a clonal population. The second figure of Article I shows that longer periods of environmental stasis entail the rise in frequency of the intI null allele, abolishing recombination (see p168). This probably accounts for the inactivation of roughly one third of the integrase alleles in natural integrons (Nemergut et al., 2008). In addition, environmental shifts are not strictly periodic, and episodes with frequent disturbances may alternate with sustained stasis. Rapid modification of the recombination rate would be required to meet the needs imposed by such a variance. It might be that specific processes have been selected to facilitate the ON-OFF switching of the integrase gene. In class 2 integrons, the intI2 gene is inactivated but different array arrangements have been isolated, demonstrating that recombination can occur. While this probably involves occasional cross-talk with other IntIs (Hansson et al., 2002), it is noteworthy that functional intI2 alleles have been reported. The gene might then be subjected to a mechanism facilitating its reversible inactivation.

As discussed in the introduction, the mechanisms mediating stress-induced mutagene-

sis can be viewed as refinements to constitutive mutator phenotypes (see p47). In the same

light, the branching of integrons onto the SOS response permits a faster and more appropriate

response, whereby the fitness costs of untimely recombination are decreased. Interestingly, if

all individuals in the population recombined precisely at environmental shifts, the recombination rate integrated over a large period would still equal the rate of environmental changes.

An unknown parameter in the integron system is the probability of cassette reinsertion

following excision. Significant loss of excised cassettes would impoverish the available reservoir of standing genetic variations, thereby severely limiting the adaptive potential of the system. Excised cassettes exhibit more stable attC sites, which may ensure higher recombinogenic potential (see p129). Carefully designed experiments based on the excision assay developed in Article II (see p173) would shed light on this property. In any case, the regulation of integrase potentially limits the loss of cassettes. In support of this point, it seems that integrons lacking SOS regulation tend to have shorter arrays (unpublished observations).


Globally, bet-hedging strategies correspond to the pre-emptive generation of pheno-

typic variations – which may permit a fraction of the population to survive sudden and harsh

changes. In integrons, the resolution of recombination intermediates may require a round of

genomic replication (see Figure 33, p130). This mechanism may not be fast enough upon exposure to severe bactericidal or bacteriostatic stresses. Besides, as general as the SOS response may be, the range of inducing conditions is necessarily limited (but see Single stranded DNA: a bridge between two systems, p241). Some challenging environments may not be able to trigger cassette shuffling, limiting the advantages of responsive mutability. In contrast, the continuous diversification driven by constitutive recombination rates ensures that the population is poised to adapt to any situation that can be dealt with using the available cassettes. The SOS response is induced by a factor >20-fold in ca. 0.3% of exponentially growing E. coli cells – a phenomenon that probably reflects the occurrence of spontaneous DNA damage (McCool et al., 2004). This subpopulation probably experiences increased recombination rates following derepression of the integrase. To some extent, by allowing the coexistence of pre-emptive and responsive strategies, the SOS regulation thus combines the best of both worlds.

I.2.2. Integron and adaptive mutagenesis

As compared to other systems enabling programmed genetic variations, integrons bring together a unique set of features that turn them into potent adaptive systems.

(i) The functional platform consists of a tightly packed locus – which ensures maximal genetic linkage between recombinational mutations and integrase alleles. This feature is essential for efficient second-order selection (Tenaillon et al., 2000). However, the genetic link with the silent part of the cassette array is inherently loose. In some genomes, the cassette array is even split between several locations, e.g. Saccharophagus degradans 2-40T (Weiner et al., 2008) and Vibrio splendidus LGP32 (Le Roux et al., 2009). Cassettes are prone to IntI-mediated excision and may not be reintegrated. Furthermore, the homogeneity of attC sites may also favor cassette loss by homologous recombination or replication slippage in superintegrons. Nonetheless, the TA-harboring cassettes exert a stabilizing influence on the surrounding cassettes and may play a significant role in limiting the loss of silent cassettes (Szekeres et al., 2007).

(ii) Gene cassettes essentially correspond to modular and independent units of genetic information. Cassette recombination allows instantaneous expression of single and well-defined traits. Although the recombination process is random, the type of genetic mutation afforded by integrons is definitely oriented in the phenotype space. In this way, integrons shift the stochasticity of the mutational process from the level of DNA sequence to that of functional genes.

(iii) Silent gene cassettes constitute a source of standing genetic diversity that can be mobilized through site-specific recombination, just like meiotic sex can bring about new adaptive alleles from preexisting mutations by homologous recombination (see pp 29 and 71). Contrasting with genome-wide mutagenesis, integrons rely on a preexisting set of possible variations. Nevertheless, gene cassettes constitute an extremely vast and diverse metagenome into which integrons can tap through HGT (see p143). At least some species must encode the machinery to manufacture new gene cassettes in their respective genomes. Overall, integrons can thus access an extensive and potentially limitless amount of genetic diversity.

(iv) Most of the cassettes that are present in a given array must have been initially recombined at attI, expressed and selected accordingly. Integrons thus provide a kind of long-lasting genetic memory, whereby cassettes that previously proved adaptive are stockpiled and can efficiently be mobilized when past selective conditions are renewed.

(v) The coupling of integrons with the SOS response enables a temporal regulation of recom-

bination rate. The system thus combines both spatial and structural confinement of mutagene-

sis, which profoundly limits the cost of increased mutagenesis. Overall, integrons constitute

the perfect embodiment of a Lamarckian process (see Appendix, p260).

I.2.3. A clear case of stress-induced mutagenesis

All documented examples of stress-induced mutagenesis arise as side-effects of DNA

repair mechanisms (Redfield, 2001; Matic et al., 2004; Tenaillon et al., 2004; Roth et al.,



2006; Galhardo et al., 2007). Proximal benefits of increased survival and distal effects on

evolvability are thus intricately linked – and it is very difficult to disentangle which of these

two phenomena is actually selected for. While evolvability is a complex trait that arises from

second-order selection, immediate survival is the matter of first-order selection. Now, the sci-

entific method generally favors the most parsimonious and straightforward explanation – a

rule known as Ockham’s razor. Stress-induced mutagenesis should thus be regarded as a by-product of a desperate effort to survive that incidentally increases evolvability (for further discussion see Evolvability, p111). A few of the systems that are specialized in the generation of targeted variability seem to be controlled by environmental cues (see Targeted mutagenesis can be regulated, p110). Although these constitute cases of stress-induced mutagenesis, they are clearly confined to a very limited repertoire of phenotypes. Hence, their general contribution to adaptation remains largely anecdotal.

In this context, the incorporation of IntI to the SOS regulon provides a clear-cut exam-

ple of stress-induced mutagenesis. As discussed above, the broad evolutionary potential of in-

tegrons is guaranteed by the wide repertoire of genetic diversity that is potentially available in

the cassette metagenome. To the best of our knowledge, there is no short-term advantage in

increasing the recombination rate in times of SOS-inducing stress, aside from improved capacity of adaptation. Therefore, we are compelled to admit that the coupling with the SOS response is driven by increased evolvability. The integron case thus constitutes a strong line of evidence advocating the advantageous evolutionary repercussions of stress-induced mutagenesis.


I.3. A deep connection with SOS triggers

I.3.1. Single stranded DNA: a bridge between two systems

The recombination mechanism exhibited by the integron integrases is atypical with respect to other tyrosine recombinases (Grainge and Jayaram, 1999). Indeed, attC sites are only

recognized and processed as folded ssDNA by the integrase (Bouvier et al., 2005; MacDonald

et al., 2006). This interaction is mediated by a characteristic functional domain (Messier and

Roy, 2001; MacDonald et al., 2006). Cassettes are consequently mobilized in a single

stranded form, and their insertion at the double stranded attI site may involve accessory host

factors – such as the replication machinery (Bouvier et al., 2005; and see Figure 33, p130).



The exact pressure that drove the evolution of such an idiosyncratic process remains elusive.

Interestingly, ssDNA is also the central trigger of the SOS response. The combination of inte-

grons with this particular stress-response puts forward a system whereby ssDNA is both the

trigger and the substrate of recombination. The phylogenetic mapping of SOS-controlled integrases suggests that the association between the two systems is an ancient trait – if not ancestral (see Article III, p186). Although the dual role of ssDNA might be purely contingent, it may also offer a mechanistic coupling that benefits the system and has been selected accordingly.


Stalling of a replication fork in an integron array potentially provides a RecA-coated

nucleofilament, leading to a local depletion of LexA and subsequent integrase expression. The

integrase may then readily access the structured attC sites located on the available ssDNA.

Besides, restart of the replication fork may help resolve recombination intermediates. In this perspective, an interesting experiment would consist in monitoring the recombination of a reporter cassette upon generation of DNA damage in the integron of V. cholerae N16961. The introduction of homing endonuclease restriction sites may be used to induce DSBs at various locations (Ponder et al., 2005) – e.g. near intI, in the middle of the cassette array and at an unrelated locus in the genome. Alternatively, a targeted lesion (e.g. G-AAF adduct) can be introduced on either strand of a plasmid to uncouple synthesis of the leading and lagging strands, which results in the accumulation of ssDNA (Pagès and Fuchs, 2003). With this method, each strand can be monitored separately, which may be convenient because only structured attC sites from the bottom strand are recombinogenic (Bouvier et al., 2005). In large

chromosomal arrays, the high density of secondary structures due to the extrusion of attC

sites may promote stalling of replication forks (Bichara et al., 2006), thereby favoring the

production of ssDNA. This process may also lead to gene conversion between similar attC

sites or duplicated cassettes.

As exemplified by the gathering of antibiotic resistance cassettes from various chro-

mosomal integrons by mobile integrons, the horizontal transfer of gene cassettes is an essen-

tial source of genetic diversity in the integron system (see p142). Both conjugation and

transformation involve the entry of ssDNA in the cell. Several lines of evidence suggest that

conjugation can indeed trigger the SOS response. Mating between S. typhimurium Hfr and E.

coli F- is known to strongly induce the response. However, most of this effect is probably due

to sequence divergence between the two genomes, because intraspecific mating only slightly

induces the response (Matic et al., 1995). Another hint comes from the presence of the psiB

anti-SOS gene in some conjugative plasmids (Bagdasarian et al., 1986; Golub et al., 1988;

Bailone et al., 1988). This gene is specifically expressed upon entry in the recipient cell – a feature that involves the folding of the incoming ssDNA molecule to form a promoter (Jones et al., 1992; Bates et al., 1999). Ongoing work in the lab indicates that conjugation is indeed able to significantly induce the SOS system (Z. Baharoglu, unpublished results). In this context, the conjugation of an integron-containing plasmid would trigger the expression of its own and/or a resident IntI, thereby promoting cassette swapping between mobile and chromosomal integrons. Likewise, massive transformation may favor the acquisition of new cassettes by inducing the expression of intI genes. This may be particularly significant in Vibrionales, which become competent in the presence of chitin in crowded environments (Meibom et al., 2005; Bartlett and Azam, 2005; Miller et al., 2007b; and see p60). Preliminary experiments carried out in the lab show that natural transformation of competent V. cholerae cells indeed induces the expression of the integrase (Z. Baharoglu, unpublished results).


I.3.2. Potential SOS triggers relevant to integrons

The functional and mechanistic link between SOS induction and integron recombination sheds new light on essential aspects of integron biology. As mentioned above, access to the cassette metagenome is probably potentiated by the integrated role of ssDNA as both a trigger and substrate of recombination. Aside from this far-reaching example, several other conditions relevant to integrons may favor the induction of the SOS response.

As reported in Article II, several antibiotics trigger integrase expression by eliciting

the SOS response (see p172). This effect is particularly interesting because cassettes harboring resistance determinants against these antibiotics have been isolated. Indeed, gene cassettes providing resistance to fluoroquinolones (Robicsek et al., 2006b; Fonseca et al., 2008) and trimethoprim (Le Roux et al., 2009) have been repeatedly reported. This mechanism is

highly relevant to the spread of antibiotic resistance via mobile integrons (see below

Implications for health, p246).

By leaving gaps in the donor site, the excision of some transposons may induce the

SOS response (Roberts and Kleckner, 1988; Lane et al., 1994). Specifically, this phenomenon

has been demonstrated for Tn7 in which class 2 integrons are embedded (Stellwagen and

Craig, 1997). Moreover, as the mobility of some transposons is also SOS-promoted (see p80),

the SOS response might coordinate the exchange of cassettes with the mobilization of genetic

shuttles to spread integrons between individuals.



The chromosomal integron of V. cholerae contains numerous TA-harboring cassettes.

TA modules are commonly found in plasmids – where they are thought to ensure correct segregation of the replicons. Likewise, the addictive properties of TA cassettes have been shown

to stabilize the cassette arrays (Szekeres et al., 2007). The observation of TA systems at stable

genomic locations in E. coli led to the suggestion that these may be implicated in programmed cell death (Aizenman et al., 1996; Hazan et al., 2004) or cell growth arrest (Gerdes et al., 2005) – thereby mediating cooperation in times of stress. Although the validity of these results has been questioned (Tsilibaris et al., 2007), TA cassettes may also serve to relay

stress signals to the integron system in order to promote recombination. In this respect, the

ccdBA module found in several integron-containing bacteria (Rowe-Magnus et al., 2003; un-

published observations) is particularly interesting. Indeed, the stable CcdB toxin induces the

SOS response by poisoning the DNA gyrase – just as quinolone antibiotics do (Karoui et al.,

1983; Aertsen and Michiels, 2006). Under standard growth conditions, this effect is thwarted

by the unstable CcdA antitoxin. Stresses impacting the steady production of antitoxin may

thus indirectly elicit the SOS response, widening the range of triggering conditions.

I.3.3. SOS-controlled accessory factors?

As mentioned in the introduction, it is possible that additional host-factors are required

in some integrons to reach high recombination frequency (Biskri et al., 2005; and see p131).

Furthermore, the generation of gene cassettes must rely on unidentified determinants (see

p141). Given the deep association between the SOS and integron systems, the incorporation

of such host factors to the LexA regulon would be biologically relevant. Various DNA proc-

essing proteins have been tested in the lab for their impact on recombination in E. coli using a

standardized assay, whereby an attC-containing suicide plasmid is delivered by conjugation to

a recipient cell harboring attI. Several of these proteins are encoded by genes belonging to the

SOS regulon. No effect was observed in recA, uvrD, ruvB, ruvC and ssb backgrounds, while a

slight decrease of recombination was observed when ruvA was inactivated (C. Loot, unpub-

lished results). Apparently, known members of the SOS response thus play little role in the

recombination process. By crossing the results of a microarray experiment carried out after

exposure to UV (M. Waldor, unpublished) with in silico data from a whole-genome search for

LexA binding sites, we identified potential members of the V. cholerae N16961 SOS regulon

that have no counterpart in E. coli. These genes constitute prominent candidates, but their

influence on recombination has not been tested yet. Because the mechanism of cassette

generation is totally unknown, no assay has been developed to assess this phenotype – rendering

any effort to unravel this process particularly difficult to set up.

I.4. Are integrons really successful?

The data presented so far strongly advocate the advantages of the integron adaptive

system. If integrons are so powerful, however, why are they not a basic genetic component

common to all microorganisms? While integrons are present in ca. 10% of all sequenced bac-

terial genomes (Boucher et al., 2007), their phylogenetic distribution is globally restricted to

characteristic taxa. In addition, the distribution of IntI is very patchy in some clades, such as the

Shewanella (Nemergut et al., 2008). Integrons may then be regarded as a declining system

that was revived a few decades ago by antibiotic selective pressure. In this context,

what kind of forces would drive the apparent loss of integrons?

The success of an integron principally lies in the availability of diverse gene cassettes.

Integrons with large cassette arrays harboring a specific attC signature seem to be responsible

for the primary assembly of gene cassettes – and may largely prove self-sufficient in this re-

spect (see p141). The IntI phylogeny is congruent with the organismal phylogeny in most

clades, indicating that integrons are maintained for a long time. However, some species ap-

pear to have acquired an integron more recently through HGT (see p135). As the genetic

neighborhood of chromosomal integrons is not conserved, the factors required for cassette

generation are unlikely to be transferred with the functional platform. Some chromosomal

integrons would thus essentially depend on exogenous sources of cassettes, which they can

tap through transformation or via mobile integrons. Such sources may be unavailable or limited

in particular ecological niches (see p143), thereby leading to the decay and eventual loss

of the useless integron system.

Integrons may also prove directly deleterious in some cases. While the occasional incorporation

of toxic cassettes can clearly impose a cost on the population, it only affects a subset

of individuals at any given time – which is most probably not sufficient to drive the

systematic counter-selection of the whole system. In contrast, overexpression of the integrase

greatly impacts the growth rate of a population by causing cell death (Mazel lab, unpublished

observations). Although the exact mechanism underlying this phenomenon is unknown, one

may venture that integrases lose their site-specificity at high concentration and cause irrecoverable

DNA damage. In this perspective, even weak expression of the integrase may exert

Discussion – Implications for health


a deleterious pressure if constitutive, eventually leading to the loss of the integron in the ab-

sence of counterbalancing advantageous effects. This putative effect provides an additional

explanation for the frequent inactivation of integrases and the emergence of SOS regulation.


Host-pathogen interactions typically favor an explosive mechanism of evolution

whereby the constant generation of genetic novelty is required to keep up with competitors

(Van Valen, 1973; Dawkins, 1986). Hence, it is not surprising that most traits affected by

programmed mechanisms of genetic variation are involved in biotic interactions. While mi-

crobes use various processes of phase and antigenic variation to modify their surface antigens

and evade the host immune system, eukaryotic hosts use the very same kinds of mechanisms to

diversify and refine their antibody repertoire. Even prokaryotes developed the complex

CRISPR system that allows them to acquire exogenous sequences to fight phage infections

(Sorek et al., 2008). A substantial fraction of integron cassettes seems to code for functions

involved in the interaction with biotic factors (see p123). However, integrons are implicated

in the expression of a wide variety of accessory factors rather than the stealth evasion of the

immune system. Although some integron cassettes have been involved in virulence (Labbate

et al., 2009), the clinical importance of integrons rather lies in their involvement in multi-

resistance to antibiotics (Fluit and Schmitz, 2004; Partridge et al., 2009).

The chromosomal origin of mobile integrons and their associated cassettes is now well

established (see Chromosomal integron as the source of mobile integrons, p135). However,

the evolutionary dynamic of these elements was largely unknown. The discovery that inte-

grons are governed by an almost ubiquitous stress response carries serious clinical implica-

tions. Indeed, the integrases of the three clinically relevant multi-resistant integrons are

controlled by the SOS response. Yet, IntI1 and IntI3 branch within a clade in which SOS regulation

is mostly absent. This pattern strongly suggests that the SOS control is a particularly

advantageous feature in the context of antibiotic resistance (see Article III, p186).

Several studies recently highlighted the profound involvement of the SOS response in

the evolution of pathogenic bacteria (see pp 50 and 54). In particular, a functional SOS system

was shown to be required for the rise of clones resistant to fluoroquinolones and rifamycin in

a mouse model (Cirz et al., 2005). Besides, SOS is implicated in the mobilization of transposons

(Aleshkin et al., 1998), prophages (Bunny et al., 2002; Quinones et al., 2005),

pathogenicity islands (Ubeda et al., 2005) and ICEs (Beaber et al., 2004) – all of which carry

virulence and/or resistance determinants. In this context, the SOS-mediated regulation of recombination

rates not only allows multi-resistance integrons to swap cassette expression in

times of stress, but also coordinates these events with other potentiating mechanisms.

Multi-resistance integrons use mobile elements as shuttles between different genomic

backgrounds. As illustrated above (see p243), the mobilization of these shuttles is dependent

on the SOS response to some extent. Besides, some of these elements such as TEs and conju-

gative plasmids are able to trigger the SOS response. Altogether, the SOS system thus con-

trols a set of processes promoting both cassette exchanges and the spread of integrons in the

population. Furthermore, the use of many antibiotics triggers the SOS response with the unin-

tended consequence of promoting the spread of both bacterial virulence factors and antibiotic

resistances (see p54). Both trimethoprim and fluoroquinolones induce the expression of the

integrase (see Article II, p173). As pointed out earlier, cassettes encoding resistance to these

antibiotics have been isolated. In this perspective, some antibiotics would directly promote the

recruitment and dissemination of appropriate resistance determinants. Antibiotic guidelines

should take these considerations into account, and limit the use of these drugs accordingly.

Current policies in the fight against antibiotic resistance mostly rely on the fitness cost

presumably imposed by the resistance mechanisms themselves. In the absence of antibiotic

selective pressures, such deleterious effects are expected to be counter-selected in the popula-

tion. Because integrons tightly couple recombination and expression, potentially deleterious

resistance traits can be put away from expression and kept silent in the cassette array without

impacting fitness until they are needed again. The relaxed selective pressure experienced by

non-expressed cassettes may favor the diversification of the resistance gene and promote its

efficient adaptation to successive generations of antibiotics – as exemplified by the evolutionary

success of extended spectrum β-lactamases (Gniadkowski, 2008).

Several authors have noted the advantages of developing drugs targeting the SOS system

in order to combat the adaptive properties of bacteria (Avison, 2005; Cirz et al., 2005;

Kelley and William, 2006; Potts et al., 2008). The integron case further illustrates the potential

benefit of this endeavor, but drastically restricts the range of suitable targets. Indeed, since

integrons are a fully independent system that is simply plugged into the SOS regulon, the RecA

protein stands as the sole target that can alter their functioning.

Discussion – Biotechnological considerations



As put forward in Article IV (see p214), directed evolution is a powerful tool to engi-

neer elaborate behaviors that are too complex to be fully predictable in biological systems.

The generation of large amounts of diversity is a general prerequisite to successful selection,

be it natural or artificial. The two themes developed in this work – the integron system and the

ELP principle – deal with the evolvability of biological systems. In the framework of directed

evolution, these properties can be used to increase the generation of diversity at different lev-

els of organization.

III.1. The ELP principle

The benefits of incorporating several ELP-designed synonymous sequences in directed

evolution protocols are presented in Article IV (see p214). This principle and its embodiment

as software (see http://mobyle.pasteur.fr/cgi-bin/portal.py?form=elp) have been patented with

the support of the Pasteur Institute under the name Modulating mutational frequency to opti-

mize protein evolution. This technology would benefit biotechnological companies involved

in directed evolution and/or can enrich the range of services proposed by companies special-

ized in de novo gene synthesis.

So far, we applied this strategy to identify variants of the Aac(6’)-Ib enzyme that

confer increased levels of resistance to aminoglycosides in E. coli. Despite significant widening

of the explored protein space before selection, we only identified a few advantageous mutants.

These mixed results probably reflect the narrow evolutionary prospects of the

enzyme. We are now using two ELP-designed synonymous genes to isolate IntI1 mutants

achieving increased attI x attI recombination frequency. Such variants may exhibit increased

affinity for attI and would eventually allow IntI1-attI complexes to be crystallized, which is

currently precluded by the low affinity of the wild-type enzyme for this substrate. The screening

procedure is not straightforward, but has already been successfully used to isolate variants

with increased attI x attC recombination rate (Demarre et al., 2007). Nevertheless, a definitive

demonstration of the advantages of the ELP approach would rely on well-known model genes,

for which efficient screening strategies and extensive datasets obtained with other directed

evolution assays are available. The TEM β-lactamases (Zaccolo and Gherardi, 1999; Bershtein

et al., 2006; Weinreich et al., 2006) and the fluorescent proteins – such as GFP

(Crameri et al., 1996; Sacchetti et al., 2000; Miyawaki et al., 2005; Shaner et al., 2007) –

would be privileged candidates in this respect.

III.2. Synthetic integrons

While the ELP principle can speed up protein evolution through point mutations, the

integron system may prove useful to engineer whole metabolic pathways. Indeed, integrons

basically perform combinatorial rearrangement of silent gene cassettes under the control of a

single promoter, thereby providing the opportunity to select for the best arrangement of expressed

cassettes under appropriate conditions.

We developed a directed evolution protocol whereby a library of synthetic cassettes

harboring genes of interest is introduced in an E. coli strain containing an inducible intI1 gene

and a chromosomal attI site associated with a strong inducible promoter. Upon induction of

the integrase, cassettes from the library are randomly recombined at the attI site – so that ex-

tensive variability is generated at the population level. Expression of the array can then be

turned on to screen individual cells for desired properties. Successive rounds of recombination-selection

may be chained until no further improvement can be detected. As a proof of

principle, we introduced the five genes of the E. coli tryptophan operon into separate cas-

settes. These functional cassettes – interspersed with three inappropriate ones (one containing

lacZ and two harboring a transcription terminator) – were cloned in disarray into a library

plasmid. Cells transformed by this plasmid were selected for growth in minimal medium after

they had been subjected to IntI1-mediated shuffling. Preliminary data indicate that several

functional arrangements can readily be selected (D. Bikard, unpublished results). Although

the involvement of the first few cassettes of a natural integron in the same functional pathway

has never been reported (but see Elsaied et al., 2007), these data suggest that integrons

may facilitate the emergence of operons.
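
The recombination-selection cycle described above can be illustrated with a toy simulation. The following Python sketch is not the protocol used in the laboratory. It assumes that each recombination event inserts one randomly drawn cassette at the first position of the array, that a terminator cassette silences everything located downstream, and that a cell grows on minimal medium whenever the five trp genes are all transcribed. The cassette labels term1 and term2 and all function names are hypothetical.

```python
import random

# The five structural genes of the E. coli tryptophan operon.
TRP_GENES = ["trpE", "trpD", "trpC", "trpB", "trpA"]
# Hypothetical labels for the three inappropriate cassettes.
DECOYS = ["lacZ", "term1", "term2"]
LIBRARY = TRP_GENES + DECOYS


def shuffle_array(rng, n_events):
    """Build a cassette array by successive insertions at attI.

    Assumption: each event integrates one randomly chosen library
    cassette at the first position, next to the promoter.
    """
    array = []
    for _ in range(n_events):
        array.insert(0, rng.choice(LIBRARY))
    return array


def is_functional(array):
    """True if all five trp genes are transcribed before any terminator."""
    expressed = set()
    for cassette in array:
        if cassette.startswith("term"):
            break  # transcription stops; downstream cassettes stay silent
        expressed.add(cassette)
    return set(TRP_GENES) <= expressed


def select(population):
    """Mimic growth on minimal medium: keep only functional arrays."""
    return [array for array in population if is_functional(array)]


if __name__ == "__main__":
    rng = random.Random(0)
    population = [shuffle_array(rng, n_events=8) for _ in range(20_000)]
    survivors = select(population)
    distinct = {tuple(array) for array in survivors}
    print(f"{len(survivors)} functional arrays out of {len(population)}")
    print(f"{len(distinct)} distinct functional arrangements")
```

In this simplified model, several different gene orders pass the selection step, which is the qualitative behavior reported for the preliminary experiment. The numbers printed by the script depend entirely on the assumed parameters and carry no experimental meaning.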

The synthetic cassettes have been constructed according to a standardized, fast, and ef-

ficient procedure inspired by the rise of synthetic biology standards. This integron-based

combinatorial strategy would be easily applicable to the optimization of artificial biochemical

pathways, when competing candidate genetic elements are available. Alternatively, the design

of attC sites with primary sequences coding for flexible polypeptides would make it possible to use this

system to shuffle protein domains. Such a tool may prove particularly valuable in the generation

of the multi-modular enzymes which are responsible for the synthesis of polyketides

(Menzella and Reeves, 2007). In this perspective, the very system that initially led to the rise

of antibiotic resistance would ironically be subverted to produce brand new antibiotics.





Appendix – Epistemological considerations on the role of variations in biology




Maintenance versus variability: a major evolutionary trade-off


Cats do not make dogs and children tend to resemble their parents. These simple ob-

servations are accessible to everyone’s immediate experience. They nonetheless underlie two

related, essential and long mysterious biological processes: the maintenance of species char-

acteristics and the inheritance of individual traits over time. Another compelling observation

is that, beyond the astonishing diversity of living forms on Earth, some species exhibit obvious

similarities with one another. This allowed generations of naturalists since Aristotle to undertake

systematic classifications of animals and plants. These two concepts are seemingly

difficult to reconcile: on the one hand, heredity entails the transfer of unchanged information

from parent to offspring within species, while on the other hand the study of diversity unveils

the profound link between species.

For ages, at least in the West, the most successful explanation of this paradox was one

coupling the Platonic idea of essential types to the existence of an intelligent and omnipotent

agent responsible for their creation and embodiment. Essentialism holds that, for any specific

kind of entity, there is a set of permanent, unalterable, and eternal characteristics, all of which

any entity of that kind must possess. Real entities then stand as imperfect manifestations of

their ideal essence. This view goes along well with the concept of biological species. Indeed,

although there are some accidental differences between individuals, no one had ever observed the

modification of a species’ representative traits over a human lifetime. The belief that an intelligent

force, a demiurge, created the essences reaches back to Plato (ca. 428-348 BC) and may

somehow account for the observed relationship between species. Suffice it to say that, in the course of

its work, the demiurge was inspired by its former creations and accordingly developed a

range of resembling forms. Some pre-Socratic philosophers, e.g. Empedocles (ca. 490–430

BC) and Democritus (ca. 460-370 BC) and their followers, such as Epicurus (ca. 341-270 BC)

and Lucretius (ca. 99-55 BC) rejected these deterministic ideas and left much room for chance

and contingency in their world views. Such metaphysical edifices are not meant to satisfy the

scientific principle of objectivity, but rather appear as a posteriori constructions justifying

ethical and political beliefs. The essentialist view established itself in Western and

Middle-Eastern thought because it fitted well with the precepts of the Abrahamic religions (essentially

Judaism, Christianity and Islam). These religions had, and still have, a profound impact on human

societies, ethics and sciences. A famous illustration, to which we will return later, is

the natural theology and fixism of Carl Linnaeus. The Swedish taxonomist believed his

classification scheme to reveal the divine order of God's creation. In his own words: “There

are as many species as the number of different forms created by the Infinite Being in the be-

ginning. These forms have then according to the inherent laws of creation always produced

offspring like themselves, so that we do not now find more species than have previously ex-

isted. Thus, there are as many species as there are different forms or structures if we exclude

the non-essential deviations (varieties) that are conditioned by the habitat or by fortuities”

(as cited in Gustafsson, 1979). Beyond the three main monotheisms and to the best of my

knowledge, all civilizations developed cosmogonies to account for the biosphere by the ex nihilo

appearance of demiurge-like entities and the subsequent transformations, emanations or

creations of living forms.

Because of complex socio-cultural factors, many people still believe in these types of

metaphysical explanations. However, the last 150 years have seen the birth and development

of a consistent and powerful body of knowledge, i.e. a theory, the modern evolutionary synthesis,

which allows biologists to account for these natural facts in an objective, scientific, and

much more satisfactory manner. We now know the nature and mechanisms of transmission of

the genetic material responsible for heredity. We also know that all diverse living forms, be-

yond their perceptible macroscopic similarities, dwell in a profound mechanistic unity and re-

late to each other by common descent.

Given the aforementioned fixist grasp of life, the explanation of the relationships between

species by way of common descent was difficult to accept. Indeed, the concept of common descent

presupposes the appearance and perpetuation of modifications within species. When considered

separately, the ideas of modification and perpetuation were not so problematic. For

instance, the appearance of viable variations in cultivated plants is so common that it cannot

go unnoticed. Regarding this issue, Linnaeus wrote: “Let a garden be sown with a thousand

different seeds, let to these be given the incessant care of the Gardener in producing

abnormal forms, and in a few years it will contain six thousand varieties, which the common

herd of Botanists calls species. And so I distinguish the species of the almighty Creator which

are true from the abnormal varieties of the gardener: the former I reckon of the highest importance

because of their author, the latter I reject because of their authors. The former

persist and have persisted from the beginning of the world, the latter, being monstrosities, can

boast of but a brief life” (as cited in Gouyon et al., 2002). This way, modifications were

usually seen as fortuitous anomalies that could only be perpetuated by artificial selection, but

would be quickly eliminated otherwise. The real trouble arose when naturally occurring and

heritably stable variants were discovered. Linnaeus was once confronted with a mutant of the

otherwise well described Linaria vulgaris species (see Figure 40), in which the fundamental

symmetry of the flower is changed from bilateral to radial. The specimen was in complete

contradiction with the botanist’s classifying system, which is grounded on flower morphol-

ogy. This naturally led him to name the plant Linaria peloria, i.e. monster in ancient Greek

(Gustafsson, 1979). The case troubled the taxonomist’s faith, and eventually had him embrace

the possibility that “all species belonging to the same genus originally formed a single species

which diversified by hybridization” (as cited in Gouyon et al., 2002). In other words, the constancy

of divine creation might only hold down to the level of genera. We now know that the flower’s altered

phenotype is the consequence of an epimutation (Cubas et al., 1999), i.e. an epigenetic

phenomenon (see Epigenetics, p89). As we will see throughout this work, bacteria are

particularly prone to genetic modifications. It is interesting to muse that, if basic techniques of

microbiology had been available before L. Pasteur (1822-1895) and colleagues established them, many

spontaneous and self-perpetuating variations could have been observed. In hindsight and

despite its falseness, the fixist paradigm initially underpinned the construction of classifications

and may thus be considered as a necessary epistemological intermediate. Indeed, the devel-

opment of systematic classifications drove a fantastic accumulation of specimens, such as

peloria, which in turn reinforced the idea of evolution.

Aside from these paradigmatic and religious considerations, the question of genetic

modification remains intricate and is somehow at odds with the concept of heredity. If the existence

of variations is necessary for evolution to occur, their introduction in the hereditary

equation raises tricky questions concerning the control of their generation. Maintenance of genetic

integrity is indeed essential for the continuation of important traits over time. Because

most alterations are deleterious, too much variation is likely to hinder the stability of the organism.

However, too little variation might not allow sufficient adaptation to changing living

conditions. In this respect, successful adaptation obviously requires an exquisite balance be-

tween stability and variability. How such a balance can be established is the central theme of

this work. Before addressing this issue, it is worth wondering why evolution is necessary,

what exactly is meant by adaptation and what constitutes its fundamental mechanisms.

The purpose of evolution

The title of this section is deliberately provocative. Evolution is often presented as a

blind and contingent process, and I will certainly not argue against that. Nevertheless, this description

alone might be misleading and I would like to highlight the reasons for that. At the

same time, this will bring out the necessity of evolutionary processes and thus the

necessity of variations.

Form, function and the watchmaker

When I was an undergraduate student, I was taught not to say that eyes are made to

see. In the same vein, I was also told that an animal has sight because it has eyes, while being forbidden

to think that it has eyes because it needs sight. Without further explanation

(and there was none), these assertions are absolute nonsense in the light of modern biology. In

his essay Chance and Necessity, J. Monod (1910-1976) highlights “how much arbitrary and

pointless it would be to deny that the natural organ, the eye, represents the materialization of

a ‘project’ (the one of capturing image)”, and that “one of the fundamental properties com-

mon to all living beings without exception [is] that of being objects endowed with a purpose,

which at the same time they exhibit in their structure and carry out in their performance”

(Monod, 1970). The seemingly perfect match between forms and functions pervades all

levels of biological organization, from whole organs to nanoscopic molecular machines.

Whether the function followed the form or the contrary (i.e. do we see because we

have eyes or do we have eyes in order to see?) is an age-old question again tracing back to Greek

philosophers such as Plato (ca. 428-348 BC), Democritus (ca. 460-370 BC) and Aristotle (ca.

384-322 BC). The implications of this debate extend beyond natural sciences and reach metaphysical

concepts. The true question behind the alternative is to determine whether the existence

of purposeful biological structures is merely accidental or whether a force drives the development

of their intrinsic projects. That the former point must be false stands as an obvious

fact today. The functions, i.e. the adaptations of structures toward defined ends that are apparent

in biological entities, put them at odds with other physical manifestations. It is a statistical

impossibility that structures as complex and refined as an eye, a bacterium or even a

functional enzyme can suddenly emerge with fully functional features (Salisbury, 1969;

Dawkins, 1986). Besides, the existence of spontaneous generation has been definitively ruled

out since L. Pasteur (1822-1895) (Pasteur, 1861). A rational mind feels compelled to admit the

existence of a creative force to account for the projects expressed in biological functions.

What is however the nature of this creative force? An immediate explanation would be

a theological one: the purpose apparent in living entities is the reflection of the will of the

creator. This issue is best illustrated by the so-called watchmaker analogy. This argument was

famously put forward by W. Paley (1743-1805) (Paley, 1809), but similar ideas were formerly

evoked by numerous thinkers. Let us imagine one happens to find a watch in the middle of a

virgin natural landscape. In contrast to simpler natural objects, such as stones, the obvious

complexity of the artifact and the fine and purposeful arrangement of perfectly suited mechanics

inevitably argue for the existence of an intelligent watchmaker, who designed and crafted it.

Similarly, Paley argues, the complexity of living forms and their exquisite adaptations to specific

functions definitely prove the existence of an intelligent designer. There is, however, no logical

demonstration in this reasoning: the long watch preamble is not a sound premise to an argument,

but merely serves to establish the plausibility of the general premise that one can tell,

simply by looking at something, whether or not it was the product of intelligent design, which

eventually remains unproven. This rhetorical slippage is known as the design inference and is

frequently used as an argument for the existence of God.

For a long time there was no satisfying alternative to those kinds of argument. Never-

theless, the methods of natural sciences are grounded on objectivity not projectivity and hence

do not leave room for supernatural explanations. A heuristic scientific principle, known as

Ockham’s razor after the logician and Franciscan friar William of Ockham (ca. 1288-1348),

holds that the explanation of any phenomenon should make as few assumptions as possible.

In this respect, the assumption of an omnipotent and omnipresent creature transcending the

laws of the universe is a particularly unparsimonious explanation. Before Darwin (1809-1882),

people who did not accept divine intervention as an explanation of life were somehow

compelled to admit its spontaneous appearance.



Adaptation, teleonomy and blindness

Living organisms are able to reproduce themselves in an almost identical manner, with

the exception of a few variations. Because the resources of a given environment are limited, a

population of organisms cannot grow indefinitely but soon becomes restricted. This creates extrinsic

selective conditions: any genetic variant that has higher chances to reproduce is stabilized

and increases in frequency in the population. In this light, the propagation of self appears

as the ultimate end of a living organism. It relies on proper capacity to survive, exploit the environment

and reproduce. These performances are carried out by specialized devices that an

organism produces as part of its own self, which together constitute its phenotype. The phenotype

is defined as the expression of an organism’s genetic information in a given environment.

Any random phenotypic variation allowing a particular function to be performed more

efficiently is selected provided it ultimately results in higher fecundity. If the variation is the result

of a heritable genetic mutation, the sustained selection on the phenotype leads to a rise in

frequency of the mutation in the population. When repeated iteratively, the short-sighted selection

of small-effect mutations can progressively lead to the appearance of sophisticated

structures. This cumulative selection is a creative process that is not driven by final ends but by

the instantaneous action of the environment on the available variability. The trade-off between

the production of phenotypic variation and genetic stability mentioned earlier is essential in this

process. Sustained selection of a genetically encoded trait relies on the relative invariance of

the global phenotype. In contrast, the existence of variations is mandatory for adaptation.
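
The logic of cumulative selection outlined above can be made concrete with a minimal simulation. The Python sketch below is only an illustration under stated assumptions: genomes are strings of bits, fitness is arbitrarily defined as the number of 1s (a hypothetical stand-in for reproductive efficiency), every individual leaves two mutated offspring, and limited resources are modeled by keeping only the fittest offspring up to a fixed population size. None of these choices comes from the present work.

```python
import random


def mutate(genome, rate, rng):
    """Copy a genome, flipping each bit independently with probability `rate`."""
    return [bit ^ int(rng.random() < rate) for bit in genome]


def fitness(genome):
    """Hypothetical fitness: the number of 1s carried by the genome."""
    return sum(genome)


def evolve(length=50, pop_size=100, rate=0.01, generations=200, seed=0):
    """Iterate mutation and short-sighted selection; return best fitness per generation."""
    rng = random.Random(seed)
    population = [[0] * length for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Reproduction is almost faithful: each parent leaves two offspring.
        offspring = [mutate(g, rate, rng) for g in population for _ in range(2)]
        # Limited resources: only the fittest `pop_size` offspring persist.
        offspring.sort(key=fitness, reverse=True)
        population = offspring[:pop_size]
        history.append(fitness(population[0]))
    return history


if __name__ == "__main__":
    history = evolve()
    print("best fitness after generation 1:", history[0])
    print("best fitness after generation 200:", history[-1])
```

No step of this loop looks ahead: selection only compares the variants available in the current generation. Raising the mutation rate in this toy model increases the supply of variation but also erodes genomes that are already fit, which echoes the trade-off between variability and stability discussed in this section.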

The seemingly intrinsic projects that single out the phenotypes of organisms reflect

their adaptations to the environment. Adaptation is not only the static state that we can observe,

and which unavoidably appears as purposeful design. Above all, it is a continuous and

dynamic process driven by the environment and resulting from the cumulative selection of

genetic determinants through their impact on the phenotype.

Unambiguously, eyes are made to see, wings to fly and, at another scale, DNA polymerases to replicate DNA. The ambiguity does not lie in the fact of adaptation but rather in the process of adaptation. The verb to make inevitably alludes to the existence of an almighty watchmaker responsible for crafting the universe and its inhabitants. The modern synthesis of evolution provides a robust and scientific framework to explain the apparent finality of biological entities. The process of cumulative selection, initially described by C. Darwin (1809-1882) and A. R. Wallace (1823-1913) (Darwin and Wallace, 1858), fulfills the role of the creative force that accounts for the finality of biological artifacts. As eloquently written by R.

Appendix – Epistemological considerations on the role of variations in biology


Dawkins (born 1941): “All appearances to the contrary, the only watchmaker in nature is the blind forces of physics, albeit deployed in a very special way. A true watchmaker has foresight: he designs his cogs and springs, and plans their interconnections, with a future purpose in his mind's eye. Natural selection, the blind, unconscious, automatic process which Darwin discovered, and which we now know is the explanation for the existence and apparently purposeful form of all life, has no purpose in mind. It has no mind and no mind's eye. It does not plan for the future. It has no vision, no foresight, no sight at all. If it can be said to play the role of watchmaker in nature, it is the blind watchmaker” (Dawkins, 1986).

The decisive subtlety differentiating the artifacts produced by an intelligent watchmaker from those generated by Dawkins’ blind watchmaker is captured by the words teleology and teleonomy, which both refer to the issue of finality. On the one hand, teleology refers to purposeful systems that are able to elaborate their own ends. Such systems are characterized by intentionality and foresight. As described above, the concept of teleology is closely linked to that of theology in the context of biological systems. On the other hand, teleonomy is a property of goal-seeking systems. Such systems are not internally driven toward a defined end, but result from an exploratory process composed of several rounds of variation and subsequent stabilization supervised by extrinsic conditions. Biological evolution is a perfect illustration of a goal-seeking process. In this case, the goal that is sought is to maximize the reproduction of an organism. The adaptations of this organism are both the results and the consequences of this process. In this light, the debate concerning the respective primacy of forms and functions is pointless. Forms and functions interact in a dialectic manner to result in adaptation through time. Neither has primacy over the other: a given function is the consequence of the form, but the form has been selected through the function it confers. Besides, even the simplest cell is a complex network of interdependent processes that evolved on top of each other. As a result, the expression of a given form and function relies on the preexistence of other forms and associated functions.

Impact of the environment

What is the environment?

From the standpoint of an organism, the environment represents all that is outside of the self and comprises abiotic and biotic components. The abiotic factors correspond to the physicochemical conditions experienced by the organism. They are subject to both random variations and regular fluctuations over a wide range of timescales. Random fluctuations typically result from meteorological phenomena: variation in temperature, drought, rain and the subsequent influx of chemicals, among others. Examples of regular fluctuations include the day-night and seasonal cycles. The biotic factors comprise all other living forms, from the same or different species. The essential difference between the biotic and abiotic factors lies in the ability of the former to evolve. The co-evolution of different species results in the establishment of diverse and interdependent relationships in the ecosystem, such as competition, symbiosis or commensalism. The antagonistic interactions resulting from competition, predation and parasitism are particularly interesting. They determine situations in which the survival of one species is threatened by the existence of another species, while the survival of the latter is strictly dependent on these harassments. In such contexts, every innovation developed by one camp must be counteracted by the other, resulting in an explosive mechanism of evolution, which is often referred to as an arms race (Dawkins, 1986). The broad ecological significance of such interactions is captured in the Red Queen analogy (Van Valen, 1973), which refers to a chapter in L. Carroll's novel Through the Looking-Glass in which Alice and the Red Queen are running in one direction, while the entire world is moving in the opposite direction. As the net movement of the protagonists is null, the Red Queen explains: “It takes all the running you can do, to keep in the same place”. This highlights the fact that continuous adaptation is mandatory to simply persist in an evolving biotic world.

Exposure to environmental variations depends on the actual biology of organisms. Mobile organisms can actively forage for food and, to some extent, can escape or avoid challenging environments, whether biotic or abiotic. The ability to sample a larger set of conditions renders them less dependent on local variations. In contrast, sessile organisms are constrained to one location, condemned to undergo the vicissitudes of the weather and of food availability, and cannot escape from other organisms. Although they are generally able to move actively on a microscopic scale, microorganisms can be considered as mostly sessile organisms on a macroscopic scale (Andrews, 1998).

An organism delimits a physical separation between its self and the surrounding environment, thereby establishing an internal compartment. Apart from the particular case of niche construction (see below), the external environment is not controlled by the organism. In contrast, the internal environment is part of the individual phenotype and is subject to sophisticated mechanisms that adjust its composition. This maintains a certain physicochemical homeostasis, which is fundamental to the occurrence of metabolic reactions. The first replicating molecules to undergo evolution had to cope with direct exposure to the environment. The evolution of cells allowed emancipation from the hazards of the external environment. However, this process required the coordinated association of different replicators and their subsequent functional specialization, driven by increased collective survival over evolutionary time (Szathmary and Maynard Smith, 1997). Individuality was eventually shifted from single independent replicators to a consortium of interdependent molecules replicating in a coordinated fashion. In the same way, individuality was shifted from single cells to several related cells during the evolution of multicellular organisms. The progressive liberation from external conditions relies on the construction of a phenotype resulting from the coordinated action of several entities and is accompanied by a modification of the units of selection. In this view, an organism constitutes a microenvironment constructed by an assembly of genes to cooperatively increase their survival. The coordinated action of the genes is the result of ongoing evolution. Any genetic variation in one of these genes can be selected if it increases the survival of the cell. However, a net survival increase may hide possibly inadequate interactions between some components of the cellular environment. Hence, the phenotypic variation of one gene product may constitute a change in the internal environment for another gene, resulting in complex epistatic relationships. The emancipation from the environment relieved some constraints, while creating others.

Beyond the construction of an internal environment, an organism’s phenotype can also significantly impact its external environment. Modified environments can affect the progeny of the organism, resulting in ecological inheritance and influencing biological evolution. This overlooked biotic factor is often referred to as niche construction (Laland et al., 2000).

The inheritance of acquired characteristics

The first formal theory of evolution was proposed by Jean-Baptiste de Lamarck (Gould, 2002). Lamarck recognized the transformation of species by way of progressive modifications and highlighted the theoretical necessity for evolution (Lamarck, 1809). Following naturalists since Aristotle, he favored the ordering of living forms along a scale of complexity. He proposed that an inner complexifying force drives evolution from the simplest living entities to the most complex. In his time, the absence of spontaneous generation was accepted for higher animals, but the issue was not yet settled for microorganisms. Lamarck proposed that complex organisms arose by progressive transformation of simpler ones, all the way down to microorganisms, which were conceived as simple enough to appear spontaneously (see Figure 41). In this view, evolution is necessary to explain the existence of complex creatures that cannot appear by chance alone. In Lamarck’s thought, the evolution of complexity rests on an idea of progress that is an inherent feature of life. This process is driven by an elusive force referred to as “Le pouvoir de la vie” (the power of life). The creative component of Lamarckian evolution is thus teleological. Nevertheless, Lamarck clearly outlined the importance of the environment in evolution. By determining the use and disuse of phenotypic characteristics, the environment drives the modifications necessary for evolutionary change. The frequent and continuous use of a function is expected to promote the development of the structure carrying this function. In contrast, a characteristic that is not used in an environment will progressively shrink, until its eventual loss. These mechanisms are grounded on observations of various forms of phenotypic plasticity, for instance the development of the musculature upon physical exercise or the deformation of certain organs subjected to constant physical constraints.

Following the predominant idea of his time, Lamarck assumed that the changes affecting the phenotype of the parental organisms are transmitted to their offspring. In Lamarck’s own words: “All the acquisitions or losses wrought by nature on individuals, through the influence of the environment in which their race has long been placed, and hence through the influence of the predominant use or permanent disuse of any organ; all these are preserved by reproduction to the new individuals which arise, provided that the acquired modifications are common to both sexes, or at least to the individuals which produce the young” (Lamarck, 1809). At the time, the mechanisms underlying heredity were completely unknown and the inheritance of acquired characters was a common belief. This idea was notably refuted by A. Weismann (1834-1914), who established the distinction between germline and soma in metazoans. Only a subset of cells is transmitted to the next generation, while the vast majority of cells participate in the elaboration of the phenotype and only serve the individual. In this context, the transmission of acquired characters requires that the genetic information supposedly received by the soma be communicated to the germline. The establishment of the central dogma of molecular biology, which can be summarized as DNA ↔ RNA → Protein, was the ultimate proof that no information modifying the phenotype (proteins) can flow back to the genetic information (DNA) (Monod, 1970). The reciprocal DNA ↔ RNA relationship reflects the existence of reverse transcriptases encoded in retroelements.

Lamarck had a remarkable intuition concerning the role of the environment in directing the adaptation of individual organisms. Nevertheless, he failed to identify the actual mechanisms driving this evolution. Half a century later, Darwin proposed that evolution is driven by natural selection. In his time, the laws of heredity and the nature of the genetic information were still unknown and his ideas about the generation of variability were extremely fuzzy. In On the Origin of Species, he wrote: “I have hitherto sometimes spoken as if the variations… were due to chance. This, of course, is a wholly incorrect expression, but it serves to acknowledge plainly our ignorance of the cause of each particular variation. [The facts] lead to the conclusion that variability is generally related to the conditions of life to which each species has been exposed during several successive generations” (Darwin, 1859). Ignorant of the source of mutations, Darwin did not reject the idea that organisms may respond to environmental conditions and furnish the gametes with information enhancing the next generation’s response. He even suggested that stress might generate the variability upon which natural selection operates.

The idea that the environment can directly influence heritable variation is appealing because it straightforwardly couples the rate of evolution to its immediate necessity. The whole concept was however firmly rejected by the neo-Darwinian synthesis, which established the unilateral primacy of selection. Any mechanism that somehow suggested a coupling between environment and mutation was discredited and dubbed Lamarckian.

The Neo-Darwinian focus on selection

A fundamental tenet of the synthetic theory of evolution is that mutations occur randomly in time and genomic space. Mutations are conceived as accidental errors altering the integrity of the genetic information transmitted from one generation to the next. Apart from exposure to mutagenic conditions, the environment is considered to play no role in this process. In contrast, selection by the environment is conceived as the sole driving force in evolution, an ordering process that sorts the preexisting random variations generated spontaneously. Teleological forces are no longer required to account for the orientation of evolution; the process is blindly directed by the selective action of the natural environment. Evolution is essentially a random process. The shifting balance theory developed by S. Wright contributed to showing that, theoretically, evolution is short-sighted and favors immediate adaptation irrespective of its long-term consequences (Johnson, 2008). S. Luria and M. Delbrück provided the first experimental demonstration of the precedence of mutations over selection. They exposed bacterial populations to phage infection and carefully analyzed the distribution of resistant variants selected in independent experiments. They showed that this distribution was in agreement with the random accumulation of resistance mutations prior to exposure to the phage (Luria and Delbrück, 1943). E. and J. Lederberg reached similar conclusions by using replica plating to monitor the appearance of phage- and streptomycin-resistant clones (Lederberg and Lederberg, 1952).
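The logic of the fluctuation analysis can be sketched with a small simulation that contrasts two hypotheses: resistance mutations arising at random during growth, before exposure, versus resistance induced in each cell only upon exposure. This is a simplified illustration rather than a reconstruction of the original protocol. The function names, the number of cultures, the number of generations and the mutation rate are all assumptions chosen for the example, and new mutants are drawn from a Poisson approximation.

```python
import math
import random
import statistics


def poisson(lam, rng):
    """Draw a Poisson variate with mean lam (Knuth's method, adequate for small means)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def spontaneous_culture(generations, mu, rng):
    """Mutations arise at random during growth, before any selection is applied."""
    sensitive, resistant = 1, 0
    for _ in range(generations):
        sensitive, resistant = 2 * sensitive, 2 * resistant  # every cell divides
        new = min(sensitive, poisson(mu * sensitive, rng))   # mutants arising this generation
        sensitive -= new
        resistant += new
    return resistant


def induced_culture(generations, p_induced, rng):
    """Resistance is acquired only upon exposure, independently in each final cell."""
    cells = 2 ** generations
    return min(cells, poisson(p_induced * cells, rng))


def fano(counts):
    """Variance-to-mean ratio; close to 1 for Poisson-distributed counts."""
    return statistics.pvariance(counts) / statistics.fmean(counts)


def run_fluctuation_test(n_cultures=500, generations=15, mu=1e-4, seed=1):
    """Return the variance-to-mean ratios under the two hypotheses."""
    rng = random.Random(seed)
    spontaneous = [spontaneous_culture(generations, mu, rng) for _ in range(n_cultures)]
    # Same expected number of resistant cells per culture as in the spontaneous model.
    induced = [induced_culture(generations, mu * generations, rng) for _ in range(n_cultures)]
    return fano(spontaneous), fano(induced)


if __name__ == "__main__":
    f_spontaneous, f_induced = run_fluctuation_test()
    print(f"variance/mean, mutation before selection: {f_spontaneous:.1f}")
    print(f"variance/mean, mutation induced by selection: {f_induced:.1f}")
```

Under these assumptions, the variance-to-mean ratio stays near one when resistance is induced upon exposure, whereas rare early mutations yield occasional cultures with very many resistant cells and inflate the ratio when mutations precede selection. This contrast between the two distributions is the signature on which the analysis relies.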

Anticipating and responding to environmental changes

The teleological idea that a mysterious force directs evolution is known as orthogenesis. This concept has ideological implications and has been used to legitimate the incorporation of evolution into various doctrines. Under Stalin in the USSR, Lysenko emphasized the capacity of the environment to direct heritable variation. The geneticists who did not agree with that position were harassed, incarcerated or expelled. The rising synthetic theory of evolution was seen as a bourgeois science that denied the aspirations of the regime (Fisher, 1948). Religions rather tend to hold that the mysterious force is the hand of God. The Jesuit P. Teilhard de Chardin famously tried to reconcile the Christian faith with the idea of evolution (Teilhard de Chardin, 1955). Presently, an institution such as the Catholic Church officially accepts the existence of theories of evolution, but in practice favors an interpretation whereby important variations are dictated by God rather than by contingent mutations. Other congregations are less subtle and the belief in creationism is widespread in some countries (Miller et al., 2006; Berkman et al., 2008). Even the most moderate proponents of the Intelligent Design movement still argue that molecular machines are too complex to have evolved by cumulative selection. Reviving Paley’s design inference, they present this argument as a proof of the existence of a superior intelligence (Behe, 1996).

Persistent pseudoscientific interpretations of evolution somehow prompted the proponents of the mainstream synthetic theory to harden their position concerning the primacy of selection. The idea that the environment may influence a step other than selection in the evolutionary algorithm was outside the paradigm and difficult to defend in the scientific community. A classic illustration concerns the observation of stress-induced chromosomal rearrangements in maize by B. McClintock (McClintock, 1950), which showed that high-order genetic changes can be elicited by the environment. It received little credit until the loci implicated were identified as transposons, providing a mechanistic basis for the phenomenon (McClintock, 1984).

However, the idea that mutations can be induced by the environment does not contradict the existing evolutionary theory, but rather appears as a sound consequence of it. A huge controversy that fuelled extensive research in this field was initiated by the publication of a paper by John Cairns and colleagues in 1988 (Cairns et al., 1988). In this paper and several subsequent works, the authors established a genetic system to follow the appearance of mutations in E. coli. In this setting, bacteria carrying a lacZ gene inactivated by a reversible frameshift are plated on lactose agar, so that only revertants can grow. The plates were incubated for six days. The fluctuating appearance of spontaneous revertants was observed, as in the Luria-Delbrück experiments, during the first two days of incubation. But revertants continued to appear during the following days. The late mutants were not slow growers and their number exceeded expectations under the Luria-Delbrück model. Instead, their distribution was consistent with appearance under selection. Overall, the observed reversion rate in the selective environment exceeded by 100-fold the rate measured in a non-selective one. It was initially reported that the increased mutation rate was specifically directed to lacZ. This process, referred to as directed mutation, supposes that a kind of molecular cognitive system is able to predict the consequences of mutations, so that only adaptive loci are targeted. Alternatively, it can easily be interpreted as evidence of orthogenesis. Subsequent studies showed that the increased mutation rate under selection was not restricted to the lacZ gene, but distributed over the whole genome, though the region surrounding lacZ was more variable. Furthermore, the mutational signature observed under selection was found to be different from the one observed in the absence of selection. It thus appeared that a distinct mechanism is responsible for the increased mutagenesis. These results fostered a comprehensive research effort and the experimental system was fully dissected (Roth et al., 2006; Galhardo et al., 2007).

These studies and others highlighted that the role of the environment is not restricted to selection alone. Clearly, the results of the Luria-Delbrück (Luria and Delbrück, 1943) and Lederberg-Lederberg (Lederberg and Lederberg, 1952) experiments rapidly gained general acceptance because they fitted paradigmatic expectations, in spite of their restricted biological significance (harsh selective pressure, specific type of mutations). As will be illustrated below, cells evolved mechanisms to sense stressful conditions, which happen to convert the collected information into mutations in response (see Stress-induced mutagenesis, pp 47-60). Thus, the generation of variability is not necessarily constant in time but can be informed by the environment. Moreover, mutation rates are not only variable through time but are also variable in genomic space. Indeed, descriptions of genetic mechanisms dedicated to or favoring localized and oriented mutations have accumulated over the last decades. The detailed presentation of these mechanisms is covered in the introduction of this thesis (see Programed generation of genetic variations, pp 60-99).

Collectively, these processes participate in a kind of molecular intelligence that allows cells to anticipate changes or to direct genetic changes according to the environment, and they are often perceived as Lamarckian. However, these discoveries do not contradict but extend the classical synthetic theory of evolution (Thaler, 1994). In the final analysis, mutations always appear in a random fashion. Genomes simply evolved the capacity to control this randomness, so that the production of variation can be tuned to the demands of the environment. To borrow a well-put sentence, this reflects the fact that chance favors the prepared genome (Caporale, 1999).





Evolvability – The integron case & the use of synonymous sequences for directed evolution

Phenotypic stability is essential to the success of organisms evolving under steady conditions. However, the environment is subject to perpetual stochastic variations, to which living beings must constantly adapt. Evolvability characterizes the ability of a population to respond to such selective pressures through the generation of heritable phenotypic changes. Most mutations being deleterious, processes enabling the confinement of mutations to periods of stress, or to specific loci and well-defined phenotypes, have been selected in the course of evolution.

Integrons constitute a particularly sophisticated illustration of such processes. Initially identified through their involvement in multi-resistance to antibiotics, these bacterial genetic systems are specialized in the exchange and stockpiling of accessory genes and therefore constitute an important source of genetic diversity. This work shows that integrons are directly coupled with the SOS system, a major bacterial stress response. By allowing the generation of significant phenotypic diversity during periods of stress without impacting the rest of the genome, integrons hence constitute a paradigmatic example of evolvability.

Another aspect of this work demonstrates that synonymous coding sequences – although specifying identical proteins – can access different areas of the phenotypic space through point mutations. When properly exploited, this property can enhance the evolvability of any protein in the context of biotechnological applications.
